Keyboard Focus and Event Trickling in Immediate Mode GUIs

One thing I haven’t seen addressed a lot in writings about immediate mode GUIs (IMGUIs) is keyboard focus and event trickling. Probably because it can be a tricky subject, and if you are just making a simple UI you can get away with not supporting it.

What do I mean by event trickling?

In a “normal” retained mode GUI (RMGUI), when you press a key on the keyboard, the control that has input focus gets the first chance to respond to the event. If the control doesn’t handle the event, it gets passed to the control’s owner, then to the owner’s owner, and so on.

As an example, suppose the focus is on a regular text box. If the user presses A, the text box takes the event and adds an a to the text. However, if the user presses PgDn, the control doesn’t handle it (assuming it’s a single line text box) and it gets passed to the owner, which may be a scrollable pane. The scrollable pane swallows the PgDn event and scrolls the pane. A Ctrl+O key press makes it all the way to the application window (the root owner), where it gets processed as the Open... menu accelerator.

In UI parlance, the text box is called the First Responder and the sequence of owners that get to look at the event next is called the Responder Chain.

Most UI texts talk about generic events travelling up the responder chain, but that seems to be an unnecessary abstraction to me. We already have a good solution for mouse interaction in our IMGUIs, which means the only thing we need the responder chain for is key presses. So instead of thinking of some generic events, we can just assume that we are dealing with key presses.

Side note: Some UI systems allow user-defined events to travel the responder chain. In such a system a Ctrl+V event might travel all the way up to the top responder, which translates that to a user defined Paste event and then sends that event back to the first responder to travel all the way up the responder chain again. There are two points to this. First, it’s nice to do the accelerator resolution in a single place, especially if you want to support things like user defined keyboard mappings. But more importantly, Paste is a command that has different meanings depending on context. What it should do depends on what has focus, and the simplest way of dealing with that is to send it through the responder chain.

For this post, I’ll ignore user defined events, but once we have a good solution for keyboard input, it is not complicated to add support for them. (You could just think of them as special keys.)

The RMGUI responder chain is deeply steeped in object-oriented concepts. We have a bunch of permanently existing objects in different relationships to each other, sending messages back and forth. To make this work in an IMGUI we need to reformulate it without objects and messages.

Specifically we have the following problems:

  • How can we construct and represent a responder chain when we don’t have any permanent objects?
  • How can we handle trickling of events through the chain? The first responder will typically be drawn last. How can it “pass” the event to an earlier control that has already been drawn?
  • How can we handle tabbing and shift-tabbing between controls for a fully keyboard operated UI?

Constructing a responder chain

As is common in IMGUIs, we use unique IDs to identify our controls. These IDs are used, among other things, for keeping track of which control the mouse is currently interacting with (the active control).

Since controls are identified by IDs we can represent our responder chain as a sequence of IDs. We can use a fixed size array (32 or so) for this, since we never expect to have more than a few levels of control nesting.
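As a rough sketch of that representation (the names responder_chain_t, chain_push, chain_contains and chain_first_responder are made up for illustration, and I am assuming that 0 is never used as a control ID and that the outermost owner is stored first):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical capacity; the text above suggests "32 or so".
enum { MAX_RESPONDERS = 32 };

// The responder chain as a flat array of control IDs.
// ids[0] is the outermost owner, ids[count - 1] is the first responder.
typedef struct responder_chain_t {
    uint64_t ids[MAX_RESPONDERS];
    uint32_t count;
} responder_chain_t;

// Appends a control to the end of the chain.
void chain_push(responder_chain_t *chain, uint64_t id)
{
    assert(chain->count < MAX_RESPONDERS);
    chain->ids[chain->count++] = id;
}

// Returns true if `id` appears anywhere in the chain.
bool chain_contains(const responder_chain_t *chain, uint64_t id)
{
    for (uint32_t i = 0; i < chain->count; ++i)
        if (chain->ids[i] == id)
            return true;
    return false;
}

// Returns the first responder (the last item), or 0 if nothing has focus.
uint64_t chain_first_responder(const responder_chain_t *chain)
{
    return chain->count ? chain->ids[chain->count - 1] : 0;
}
```

A linear search is fine here, since the chain only holds a handful of IDs.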

One way of building this array would be to let the function for each control check if it should be part of the responder chain, and in that case add itself. Something like this:

if (mouse_pressed && in_rect(mouse_pos, my_rect))
    add_to_responder_chain(my_id);

Since parent controls are called before child controls, they would end up before their children in the array. So the first responder would actually be the last item in the array, but that doesn’t really matter; we can just interpret the array in reverse order.

This method actually works pretty well if all items in the responder chain are nested geometrically (which you typically expect). However it has two drawbacks:

  • It doesn’t really work with overlapping controls — if two controls on the same level of the hierarchy overlap, they would both become part of the responder chain. We only want the topmost control in the responder chain.

  • It is dependent on having a known mouse position when an item gets focus. If an item gets focus some other way (say by tabbing) we can’t reconstruct the responder chain unless we do something hacky, such as simulating a click in the center of the item that got focus.

What we want instead is to know all the parent responders when a control function gets called, so we can do something like:

if (gained_focus)
   set_responder_chain(parent_responders, my_id);

In order to know the parent responders we need to explicitly track them. We can do that with something like:

void owning_control(...)
{
     begin_responder_scope(my_id);
     // Create child controls
     end_responder_scope(my_id);
}

Here, begin_responder_scope() pushes the control’s ID to a stack that keeps track of the current responder scopes and end_responder_scope() pops it from the same stack. Just like the responder chain, this stack can be a fixed-size array, and when a control gets focus, the current responder scope stack plus the control’s own ID becomes the responder chain.
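Here is a minimal sketch of how the scope stack and the chain could fit together. The ui_t struct and the exact signatures are my assumptions for the sake of a self-contained example; in particular, the snippets in this post pass parent_responders explicitly, whereas here the stack lives inside ui_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MAX_RESPONDERS = 32 };

typedef struct ui_t {
    // Scopes that are currently open, outermost first.
    uint64_t scope_stack[MAX_RESPONDERS];
    uint32_t num_scopes;
    // Snapshot of the scope stack taken when a control last gained focus.
    uint64_t responder_chain[MAX_RESPONDERS];
    uint32_t num_responders;
} ui_t;

void begin_responder_scope(ui_t *ui, uint64_t id)
{
    assert(ui->num_scopes < MAX_RESPONDERS);
    ui->scope_stack[ui->num_scopes++] = id;
}

void end_responder_scope(ui_t *ui, uint64_t id)
{
    // Passing the ID again lets us catch mismatched begin/end pairs.
    assert(ui->num_scopes > 0 && ui->scope_stack[ui->num_scopes - 1] == id);
    --ui->num_scopes;
}

// Called by a control when it gains focus: the open scopes plus the
// control itself become the new responder chain.
void set_responder_chain(ui_t *ui, uint64_t id)
{
    assert(ui->num_scopes < MAX_RESPONDERS);
    memcpy(ui->responder_chain, ui->scope_stack,
           ui->num_scopes * sizeof(uint64_t));
    ui->responder_chain[ui->num_scopes] = id;
    ui->num_responders = ui->num_scopes + 1;
}
```

The copy matters: the scope stack is empty again by the end of the frame, while the responder chain has to survive until focus changes.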

Having to insert function calls to explicitly begin and end scopes for every control that can be a part of the responder chain is a bit annoying, but I can’t think of any other way of making sure that we have the right responder chain.

A bigger drawback is that this approach makes it harder to write generic container controls. With the need for responder scopes, we can no longer write code like this:

// Draws chrome for scroll view, handles interaction with scrollbars
// and returns a scroll offset
scrollview(..., &offset);

// Draws controls that should go in the scroll view
button(..., button_pos - offset);
checkbox(..., checkbox_pos - offset);

The problem here is that to get the scroll view in the responder chain — which we want for PgUp, PgDn events — we need calls to begin_responder_scope() and end_responder_scope(). However, we can’t put these calls inside the scrollview() function because end_responder_scope() needs to be called after all child controls have been drawn. Possible ways of solving this:

  1. Have scrollview() return its ID and make the caller responsible for calling begin/end around the child controls.
  2. Make scrollview() take a callback function for drawing the child controls. scrollview() will place appropriate begin/end calls around the call to the callback function.
  3. Split the call to scrollview() into separate begin_scrollview() and end_scrollview() calls so that the right responder scope tracking functions can be added.

None of these options is super attractive to me, but out of the three I think I prefer 3. Callbacks get too verbose, and between 1 and 3, option 3 needs one less function call and puts less responsibility on the caller.
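Option 3 could look roughly like the sketch below. The scrollview_t struct and the signatures are hypothetical, and the chrome drawing is left out; the only point is where the scope calls end up.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_SCOPES = 32 };
uint64_t scope_stack[MAX_SCOPES];
uint32_t num_scopes;

void begin_responder_scope(uint64_t id)
{
    assert(num_scopes < MAX_SCOPES);
    scope_stack[num_scopes++] = id;
}

void end_responder_scope(uint64_t id)
{
    assert(num_scopes > 0 && scope_stack[num_scopes - 1] == id);
    --num_scopes;
}

// Hypothetical per-scroll-view state owned by the caller.
typedef struct scrollview_t {
    uint64_t id;
    float offset;
} scrollview_t;

// Draws the chrome, handles the scrollbars, opens the responder scope
// and returns the scroll offset for the child controls.
float begin_scrollview(scrollview_t *sv)
{
    // (Chrome drawing and scrollbar interaction would go here.)
    begin_responder_scope(sv->id);
    return sv->offset;
}

// Closes the responder scope after all child controls have been drawn.
void end_scrollview(scrollview_t *sv)
{
    end_responder_scope(sv->id);
}
```

The caller then draws button(), checkbox() and so on between the two calls, using the offset returned by begin_scrollview().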

Trickling events

Having built the responder chain we need to figure out how to send events along the chain. In the RMGUI this was done by having controls explicitly send events to their owners. Something like:

  • Text Edit Box → Scroll View → Tab

In our IMGUI this is tricky, because we don’t have any permanently existing objects and the call order is the opposite of the way the messages need to go. I.e., when we are processing the call to window() we can’t know whether or not there will be a later call to textedit() that will consume the event we are interested in.

But here the solution to our previous problem comes to rescue us. With explicit scopes, the call order for the example above will in fact look something like:

begin_tab();
begin_scrollview();
textedit();
end_scrollview();
end_tab();

If we process keyboard events in the end_* functions, the items before us in the responder chain will already have had a chance to look at the events, and if they processed the events we can ignore them in the end_* function.

So the only things we need to check are whether we are in the responder chain and whether the controls before us processed the event. We could do this with a return value from the control functions, or with just a flag:

void end_scrollview(...)
{
    if (key_pressed[KEY_PGDN] && in_responder_chain(my_id) && !event_processed) {
        // Handle scrolling...
        event_processed = true;
    }
}

Or even simpler — just have the control functions clear out the events they “eat” so that later functions won’t see them:

if (key_pressed[KEY_PGDN] && in_responder_chain(my_id)) {
    // Handle scrolling...
    key_pressed[KEY_PGDN] = false;
}
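To see the trickling in action, here is a small self-contained simulation with two nested scroll views around a text box. The IDs, the counters and the update() driver are invented for the demo, and the responder chain is hard-coded instead of being built from scopes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { KEY_A, KEY_PGDN, NUM_KEYS };
enum { ID_OUTER_SCROLL = 1, ID_INNER_SCROLL, ID_TEXTEDIT, NUM_IDS };

bool key_pressed[NUM_KEYS];
// Fixed responder chain for the demo, outermost owner first.
const uint64_t responder_chain[] = { ID_OUTER_SCROLL, ID_INNER_SCROLL, ID_TEXTEDIT };
int scroll_count[NUM_IDS];  // PgDn presses handled, per scroll view
int text_length;            // Characters typed into the text box

bool in_responder_chain(uint64_t id)
{
    size_t n = sizeof responder_chain / sizeof responder_chain[0];
    for (size_t i = 0; i < n; ++i)
        if (responder_chain[i] == id)
            return true;
    return false;
}

void textedit(uint64_t id)
{
    if (key_pressed[KEY_A] && in_responder_chain(id)) {
        ++text_length;
        key_pressed[KEY_A] = false;
    }
}

void end_scrollview(uint64_t id)
{
    if (key_pressed[KEY_PGDN] && in_responder_chain(id)) {
        ++scroll_count[id];
        key_pressed[KEY_PGDN] = false;  // Eaten: owners further out never see it.
    }
}

// One UI pass with a single key press.
void update(int key)
{
    assert(key >= 0 && key < NUM_KEYS);
    for (int i = 0; i < NUM_KEYS; ++i)
        key_pressed[i] = (i == key);
    // The matching begin_scrollview() calls are omitted in this demo.
    textedit(ID_TEXTEDIT);
    end_scrollview(ID_INNER_SCROLL);
    end_scrollview(ID_OUTER_SCROLL);
}
```

The text box ignores PgDn, so the inner scroll view picks it up in its end function and eats it. The outer scroll view is also in the chain, but by the time its end function runs, the key is gone.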

Another side note: You can get your head twisted around trying to figure out how to handle multiple events in a single IMGUI update loop. For example, what if the user pressed KEY_PGDN twice during the same frame? We cannot use a simple key_pressed array to represent that. What if there is a whole bunch of events? What if one of them is a Shift-Tab that will move the focus to a control that we have already processed? Etc, etc.

The right solution, in my opinion, is to skip this whole mess. Instead of trying to deal with multiple events, just define that the IMGUI update() will only handle a single event per call. If there is more than one event in a frame, just call the update() function multiple times. This shouldn’t be a performance problem, because it should only happen very rarely. Your UI should run at 60 Hz and the user shouldn’t be able to produce events faster than that.
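A sketch of what that frame loop might look like. The event_queue_t type, push_key() and run_frame() are hypothetical names, and update() is just a stand-in for the real UI pass. I am also assuming that we want at least one pass per frame, even without input, so that the UI still gets drawn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { KEY_PGDN, KEY_TAB, NUM_KEYS };
enum { MAX_EVENTS = 16 };

// Key presses collected from the OS since the last frame.
typedef struct event_queue_t {
    int keys[MAX_EVENTS];
    uint32_t count;
} event_queue_t;

bool key_pressed[NUM_KEYS];
int scroll_lines;  // Demo state changed by the UI pass

void push_key(event_queue_t *q, int key)
{
    assert(q->count < MAX_EVENTS && key >= 0 && key < NUM_KEYS);
    q->keys[q->count++] = key;
}

// Stand-in for the full IMGUI pass; it sees at most one key press.
void update(void)
{
    if (key_pressed[KEY_PGDN]) {
        scroll_lines += 10;
        key_pressed[KEY_PGDN] = false;
    }
}

// Runs the UI for one frame and returns the number of update() calls.
uint32_t run_frame(event_queue_t *q)
{
    uint32_t passes = 0;
    do {
        for (int i = 0; i < NUM_KEYS; ++i)
            key_pressed[i] = false;
        if (passes < q->count)
            key_pressed[q->keys[passes]] = true;
        update();
        ++passes;
    } while (passes < q->count);
    q->count = 0;
    return passes;
}
```

With two PgDn presses queued in the same frame, update() runs twice and each pass sees exactly one key press, so the simple key_pressed array is enough.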

Tabbing and Shift-Tabbing

For our final challenge, we want to add support for tabbing and shift-tabbing between all controls in the interface, so that the user can operate it using just the keyboard.

Again, this requires us to define a relationship between the controls. For each control, we need to know which one is the previous and the next control in the tab order. Luckily, we already have a good order for tabbing — the order in which the controls are rendered. We can just reuse that.

Pressing Tab should focus on the next control that can receive keyboard focus, while Shift-Tab should focus on the previous one. Two problems here. First, when rendering the previous control we don’t know that we are at the “previous control”, since we don’t know what will be rendered after us. Second, when rendering the current control, we don’t know the ID of the next control, since it hasn’t been rendered yet.

But all this can be solved with judicious use of state variables in the UI. Here is one way:

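void acquire_tab_status(uint64_t my_id)
{
    // Take focus if an earlier Shift-Tab targeted this control, or if the
    // control rendered just before this one handled a Tab.
    if (tab_focus_on_id == my_id || focus_on_next) {
        set_responder_chain(parent_responders, my_id);
        tab_focus_on_id = 0;
        focus_on_next = false;
    }

    if (in_focus(my_id) && is_pressed(KEY_TAB)) {
        if (is_down(KEY_SHIFT))
            tab_focus_on_id = last_id;  // Previous control takes focus on its next call.
        else
            focus_on_next = true;       // Next control to call this function takes focus.
    }

    // Remember this control as the "previous control" for whoever comes next.
    last_id = my_id;
}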

This function should be called by every control function that can get focus from tabbing. Note that on Shift-Tab, there is a frame delay. The focus won’t actually be set until next frame, when the previous control calls acquire_tab_status(). We can’t call set_responder_chain(parent_responders, last_id) immediately, because we need the parent_responders stack from the control we are shift-tabbing to, not the current control.
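To check the behavior, here is a small simulation of three focusable controls with IDs 1, 2 and 3. It makes a few simplifications that are mine: a single focus_id stands in for the responder chain, key state lives in plain arrays, and the Tab press is cleared once it has been handled, so that a control that gains focus in the same pass doesn’t immediately pass the focus on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { KEY_TAB, KEY_SHIFT, NUM_KEYS };

bool key_pressed[NUM_KEYS];  // Went down this update; cleared when consumed
bool key_down[NUM_KEYS];     // Currently held
uint64_t focus_id;           // Stand-in for the first responder
uint64_t tab_focus_on_id;
uint64_t last_id;
bool focus_on_next;

void acquire_tab_status(uint64_t my_id)
{
    assert(my_id != 0);  // 0 means "no control" in this sketch

    if (tab_focus_on_id == my_id || focus_on_next) {
        focus_id = my_id;  // The real code would set the responder chain here.
        tab_focus_on_id = 0;
        focus_on_next = false;
    }

    if (focus_id == my_id && key_pressed[KEY_TAB]) {
        if (key_down[KEY_SHIFT])
            tab_focus_on_id = last_id;
        else
            focus_on_next = true;
        key_pressed[KEY_TAB] = false;  // Assumption: consume the key press.
    }

    last_id = my_id;
}

// One UI pass over the three controls.
void update(bool tab, bool shift)
{
    key_pressed[KEY_TAB] = tab;
    key_down[KEY_SHIFT] = shift;
    for (uint64_t id = 1; id <= 3; ++id)
        acquire_tab_status(id);
}
```

Starting with focus on control 1, update(true, false) moves focus to control 2 within the same pass. A following update(true, true) leaves focus on control 2 until the next pass, which is the frame delay described above. Because last_id and focus_on_next persist between passes, tabbing also wraps around at both ends of the tab order.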

Another side note: By introducing the responder chain, we now have two ways of representing the control the user is interacting with: the first responder (the last item in the responder chain array) and the traditional IMGUI active item, which is the item the user is interacting with using the mouse. This is kind of messy, since they have similar meanings and we typically want to change them together.

I’ve decided to keep them separate for now and use the distinction for highlighting. If a user tabs to a radio button, it becomes active and first responder and I draw a blue highlight around it to show that it can receive keyboard input. However, if the user just clicks on the button, I don’t want the focus highlight, so I just make it active without making it first responder. But maybe it would be better to merge these concepts and use a separate flag for highlighting.

Conclusions

Working with this has reinforced my belief that IMGUIs can be used for rich and full user experiences, just like RMGUIs.

There are definitely some mental gymnastics needed to make things work. Most problems can be solved by introducing appropriate state variables and frame delaying some transitions. But thinking about how those state variables mutate can be complicated. Multiple times I’ve been tripped up by doing things in the wrong order. For example, in the last code example above, it is important that the test for focus_on_next comes before the setting of focus_on_next or the control will just set focus to itself. This is easy enough to see in a simple example like this, but with more controls and more variables involved it is easy to mess up.

I’m wondering if maybe all mutable state should be broken out into its own struct and only applied at the end of the frame as a single transition (forcing a frame delay for all mutations). Immutable code is simpler to reason about and it would also make it possible to multithread the UI.
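I haven’t tried this, but the idea could be sketched as a double-buffered state struct. All names here (ui_state_t, begin_frame, end_frame, request_focus, has_focus) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// All mutable UI state gathered in one place.
typedef struct ui_state_t {
    uint64_t focus_id;
    float scroll_offset;
} ui_state_t;

typedef struct ui_t {
    ui_state_t current;  // Read-only while controls run
    ui_state_t next;     // Written by controls, applied at end of frame
} ui_t;

void begin_frame(ui_t *ui)
{
    ui->next = ui->current;
}

// The single state transition of the frame.
void end_frame(ui_t *ui)
{
    ui->current = ui->next;
}

// Controls only ever read from `current`...
bool has_focus(const ui_t *ui, uint64_t id)
{
    return ui->current.focus_id == id;
}

// ...and only ever write to `next`.
void request_focus(ui_t *ui, uint64_t id)
{
    assert(id != 0);  // 0 means "no control" in this sketch
    ui->next.focus_id = id;
}
```

With this split, the order in which controls read and write within a frame no longer matters, at the price of a frame delay on every change.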

Despite the complexities, IMGUIs still feel simpler and more straightforward than RMGUIs. There are no long complicated inheritance chains, everything hot-reloads, it is easy to put a breakpoint anywhere for debugging, and the whole UI system, including the drawing library, is only about 5 K lines.

Here is a screen cap of the focus system in action:

[Screen capture: focus changes, keyboard navigation, and input.]