Waxed Display Server

Waxed Render Loop

Overview and Architecture

The Render Loop is the heart of the Waxed compositor, orchestrating all rendering, input handling, display management, and event processing in a unified epoll-based event loop. It implements a “single source of truth” design where only DRM page flip events (VBlank) trigger frame rendering, ensuring consistent frame pacing and eliminating EBUSY errors.

Key Design Principles

  1. Unified Event Loop: All events (DRM, IPC, Input, Seat, Fence) are dispatched through a single epoll_wait() call
  2. Per-Display VSync: Each display runs at its native refresh rate independently (60Hz + 144Hz mixed support)
  3. Atomic Operations: State shared across threads uses atomic variables for lock-free synchronization
  4. Triple Buffering: Three buffer slots per display enable pipelined rendering
  5. Late Latching: Input is read as close to scanout as possible for minimal latency
  6. Zero Allocation: Event dispatch uses marker addresses instead of dynamic allocations

Unified Epoll Event Loop

The render loop centers on a single epoll_wait() call that blocks until any event occurs. Events are dispatched based on pointer comparison with marker addresses.

// Main event loop (simplified)
while (!stop_requested_) {
    int nfds = epoll_wait(epoll_fd_.get(), events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; ++i) {
        void* ptr = events[i].data.ptr;

        if (ptr == &drm_marker_)     handle_drm_event();
        else if (ptr == &ipc_marker_) handle_ipc_event();
        else if (ptr == &input_marker_) handle_input_event();
        else if (ptr == &seat_marker_) dispatch_seat_event();
        else if (ptr == &shutdown_marker_) drain_self_pipe();
        else handle_fence_signaled_event(ptr);  // Not a marker: FenceClosure* embedded in a BufferSlot
    }
}
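
The marker-address dispatch can be tried in isolation. The following is a minimal sketch, not the Waxed implementation: it assumes Linux, and the names drm_marker, ipc_marker, register_fd, and wait_and_classify are hypothetical stand-ins. Each FD is registered with the address of a static byte in data.ptr, so classifying an event is a pointer comparison and needs no heap allocation.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical stand-ins for drm_marker_ / ipc_marker_: only their addresses matter.
static char drm_marker;
static char ipc_marker;

enum class Source { Drm, Ipc, Other, None };

// Register fd with epoll, tagging it with a marker address (no allocation).
inline bool register_fd(int epoll_fd, int fd, void* marker) {
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.ptr = marker;
    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Wait for at most one event and classify it by pointer comparison.
inline Source wait_and_classify(int epoll_fd, int timeout_ms) {
    epoll_event ev{};
    int n = epoll_wait(epoll_fd, &ev, 1, timeout_ms);
    if (n <= 0) return Source::None;               // timeout, EINTR, or error
    if (ev.data.ptr == &drm_marker) return Source::Drm;
    if (ev.data.ptr == &ipc_marker) return Source::Ipc;
    return Source::Other;                          // e.g. a per-slot fence pointer
}
```

In this sketch, any pointer that matches no marker falls through to Source::Other, mirroring the final else branch of the loop above.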

Handler Types

FENCE Handler (Display completion signal)

  • Trigger: When a display fence signals (frame no longer on screen)
  • Data: void* ptr = FenceClosure* embedded in BufferSlot
  • Action: Release buffer slot, mark display ready for next frame
  • Key: Fence generation checking prevents ABA problem

DRM Handler (Page flip + hotplug)

  • Trigger: Kernel sends DRM event on readable DRM FD
  • Data: void* ptr = &drm_marker_
  • Action: Call drmHandleEvent() which invokes page_flip_handler_wrapper() for each completed flip
  • Key: Each display’s page flip triggers render_next_frame() for that display only

IPC Handler (waxedctl commands)

  • Trigger: IPC socket has pending connection
  • Data: void* ptr = &ipc_marker_
  • Action: Drain up to 10 pending connections
  • Mode: Edge-triggered (EPOLLET) to prevent starvation

INPUT Handler (libinput events)

  • Trigger: libinput FD has mouse/keyboard events
  • Data: void* ptr = &input_marker_
  • Action: Process input, update cursor position atomics, mark render needed
  • Key: Cursor position clamped to valid display area for multi-monitor

SEAT Handler (VT switch events)

  • Trigger: libseat FD has VT switch enable/disable event
  • Data: void* ptr = &seat_marker_
  • Action: Dispatch to SeatManager, pause/resume rendering
  • Key: Pause before VT switch away prevents DRM errors

VSync Configuration

VSync is controlled by set_vsync_config(bool enabled):

  • Enabled (default): Frame rate limited to display refresh rate via DRM page flip events
  • Disabled: Render as fast as possible (unthrottled)

VSync is implemented at the hardware level through the DRM_MODE_PAGE_FLIP_EVENT flag on DRM commits. The kernel only delivers page flip events at VBlank, which naturally throttles rendering to the display’s refresh rate.

Frame Dump System

Frame dumping captures rendered frames to disk for debugging or recording.

Architecture

  • Worker Thread: Dedicated thread handles CPU-intensive dump operations
  • Non-blocking: Main render thread never waits for dump completion
  • Simplified: Single atomic pointer tracks which slot is being dumped

Flow

1. Frame completes render (with GPU render_fence)
2. Check dump_fps rate limit
3. If due: queue_frame_for_dump(slot)
   - If worker busy: abort current dump, wait for slot release
   - Set dumping_slot_ = slot
4. Worker thread wakes up
   - Wait for render_fence (30ms timeout)
   - mmap DMA-BUF, DMA_BUF_SYNC_START
   - Call frame_dumper_.dump_frame()
   - DMA_BUF_SYNC_END, munmap
   - Clear dumping_slot_, notify main thread

Rate Limiting

dump_interval_ms_ = 1000.0 / dump_fps_;  // e.g., 60fps = ~16.67ms; floating-point, since integer division would truncate to 16. Assumes dump_fps_ > 0
if (now - last_dump_time_ms_ >= dump_interval_ms_) {
    queue_frame_for_dump(slot);
}
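
As a hedged illustration of the rate limit, the sketch below wraps the same comparison in a small helper. DumpRateLimiter and its members are hypothetical names, not part of Waxed. It keeps the interval in floating point and treats a non-positive FPS as "dumping disabled" to avoid a division by zero.

```cpp
// Hypothetical helper: decides whether a frame is due for dumping.
struct DumpRateLimiter {
    double interval_ms = 0.0;    // 0 means dumping is disabled
    double last_dump_ms = 0.0;
    bool has_dumped = false;

    // Convert a target dump rate into a minimum spacing between dumps.
    void set_fps(double fps) { interval_ms = fps > 0.0 ? 1000.0 / fps : 0.0; }

    // Returns true (and records the timestamp) if a dump is due at now_ms.
    bool due(double now_ms) {
        if (interval_ms <= 0.0) return false;
        if (has_dumped && now_ms - last_dump_ms < interval_ms) return false;
        last_dump_ms = now_ms;
        has_dumped = true;
        return true;
    }
};
```

With set_fps(10), for example, frames arriving at 0ms and 100ms are due while a frame at 50ms is skipped.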

Bootstrap Sequence (First Frame)

The render loop faces a chicken-and-egg problem on startup:

  1. epoll_wait() blocks until a page flip event arrives
  2. A page flip event only arrives after a frame has been submitted
  3. Frame submission happens in execute_render()
  4. execute_render() is only called from the page flip handler

Result: deadlock. Nothing can run first.

Solution: Bootstrap First Frame

run() {
    // Phase 1: Bootstrap (synchronous modeset)
    bootstrap_first_frame();

    // Phase 2: Trigger second frame (generates first page flip event)
    on_vblank_render();

    // Phase 3: Enter epoll loop (steady state)
    while (!stop_requested_) {
        epoll_wait(...);  // Now page flip events will arrive
    }
}

Bootstrap Flags

  • was_crtc_enabled: False on first frame, true thereafter
  • bootstrap_complete_: Set by first page flip handler, enables async cursor
  • Commit flags:
    • First frame: DRM_MODE_ATOMIC_ALLOW_MODESET (blocking, no event)
    • Subsequent: DRM_MODE_ATOMIC_NONBLOCK | DRM_MODE_PAGE_FLIP_EVENT

VT Switch Handling

VT (virtual terminal) switching lets the user move between sessions on the same seat (Ctrl+Alt+F1-F12). The compositor must stop touching DRM while its VT is inactive.

Pause (Switch Away)

Triggered by:

  • libseat seat_disable callback
  • InputManager VT switch keybinding

handle_seat_disable() {
    if (paused_.exchange(true)) return;  // Already paused
    LOGC_INFO("VT switch away - pausing rendering");
    // No DRM operations will occur until resumed
}

Effect:

  • paused_ flag checked before each render
  • In-flight commits complete harmlessly
  • Kernel revokes DRM master

Resume (Switch Back)

Triggered by:

  • libseat seat_enable callback

handle_seat_enable() {
    if (!paused_.exchange(false)) return;  // Wasn't paused
    LOGC_INFO("VT switch back - resuming rendering");
    mark_render_needed();  // Refresh screen on next VBlank
}

Effect:

  • Kernel grants DRM master
  • Next page flip triggers frame render
  • Screen refreshes to current state
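
The idempotent pause/resume guard reduces to one atomic exchange per call. The sketch below is an illustration under assumptions, not the actual handlers: PauseGate and its member names are hypothetical, and logging and DRM work are omitted.

```cpp
#include <atomic>

// Hypothetical reduction of the pause/resume guard to its atomic core.
struct PauseGate {
    std::atomic<bool> paused{false};
    std::atomic<bool> needs_render{false};

    // Returns true only for the call that actually transitions to paused.
    bool on_seat_disable() { return !paused.exchange(true); }

    // Returns true only for the call that actually transitions to running;
    // that call also requests a repaint for the next VBlank.
    bool on_seat_enable() {
        if (!paused.exchange(false)) return false;  // was not paused
        needs_render.store(true);
        return true;
    }

    // Checked before each render.
    bool may_render() const { return !paused.load(); }
};
```

Because exchange() returns the previous value, duplicate disable or enable notifications (for example from both libseat and a keybinding) become no-ops.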

Per-Display Rendering

Each display runs at its own refresh rate, driven by its page flip events.

Independent Timeline

Per-Display Rendering Timeline (each page flip, PF, calls render_next_frame(display) for its own display only):

Display 0 (60Hz):  PF0, then PF1 at 16.7ms, PF2 at 33.3ms, PF3 at 50ms
Display 1 (144Hz): PF0, then PF1 at 6.9ms, PF2 at 13.9ms, PF3 at 20.8ms

EBUSY Protection

Each display has a commit_pending flag:

if (display.runtime.commit_pending.load()) {
    return;  // Previous commit still in-flight
}
// Submit atomic commit
display.runtime.commit_pending.store(true);

The page flip handler clears the flag when the commit completes.
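
A variation on this guard, shown here only as a sketch, folds the check and the set into one compare-exchange so the two steps cannot interleave. This is a swapped-in technique rather than the code above, and DisplayRuntime, try_begin_commit, and on_page_flip are hypothetical names. A caller that claims the flag but then fails to submit would have to clear it again.

```cpp
#include <atomic>

// Hypothetical per-display runtime state.
struct DisplayRuntime {
    std::atomic<bool> commit_pending{false};
};

// Returns true if the caller may submit a commit now (and marks it pending);
// returns false if a previous commit is still in flight.
inline bool try_begin_commit(DisplayRuntime& rt) {
    bool expected = false;
    return rt.commit_pending.compare_exchange_strong(expected, true);
}

// Called from the page flip handler once the commit has completed.
inline void on_page_flip(DisplayRuntime& rt) {
    rt.commit_pending.store(false);
}
```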

Fence-Triggered Rendering

Fences signal when buffers are no longer on screen (NOT when they enter).

Fence Lifecycle

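Participants: Render Loop, Kernel, Epoll

  1. Render Loop -> Kernel: Submit frame with OUT_FENCE_PTR
  2. Kernel -> Render Loop: Return fence FD
  3. Render Loop: Store in slot.release_fence_fd
  4. Render Loop -> Epoll: Add fence to epoll
  5. Kernel -> Epoll: Fence signals (EPOLLIN), ~16ms later at next VBlank
  6. Epoll -> Render Loop: handle_fence_signaled_event()
  7. Render Loop: Release slot for reuse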

Generation Counter

The generation counter prevents the ABA problem, where a slot is reused while an older fence for it is still pending:

slot.fence_generation = next_fence_gen_.fetch_add(1);
fence_closure.generation = slot.fence_generation;
// Later, check before processing:
if (slot.fence_generation != closure.generation) {
    return;  // Stale fence, ignore
}
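
To make the stale-fence check concrete, here is a small sketch under assumptions: FenceEvent is a hypothetical snapshot that travels with the epoll registration, and in_use stands in for the real slot bookkeeping. Re-arming a slot bumps its generation, so an event from the earlier arming no longer matches.

```cpp
#include <atomic>
#include <cstdint>

struct BufferSlot {
    uint64_t fence_generation = 0;
    bool in_use = false;            // stand-in for real slot state
};

// Hypothetical snapshot taken when the fence is armed.
struct FenceEvent {
    BufferSlot* slot;
    uint64_t generation;
};

static std::atomic<uint64_t> next_fence_gen{1};

// Arm a fence for the slot and return the snapshot delivered with the event.
inline FenceEvent arm_fence(BufferSlot& slot) {
    slot.fence_generation = next_fence_gen.fetch_add(1);
    slot.in_use = true;
    return FenceEvent{&slot, slot.fence_generation};
}

// Returns true if the slot was released, false if the event was stale.
inline bool handle_fence_signaled(const FenceEvent& ev) {
    if (ev.slot->fence_generation != ev.generation) return false;  // stale
    ev.slot->in_use = false;
    return true;
}
```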

Lazy Cleanup

Fences are checked for a signaled state during slot acquisition, not in the epoll callback:

if (is_fence_signaled(slot.release_fence_fd)) {
    // Transfer to pending_release_fence_fd for next frame
    slot.pending_release_fence_fd.reset(slot.release_fence_fd.release());
}

Adaptive VBlank Timing (Late Latching)

Late latching reads input as close to scanout as possible, minimizing input latency.

Calculate Render Deadline

deadline = next_vblank - render_duration - safety_margin;

Components:

  • render_duration: EMA estimate (1/8 weight new, 7/8 old)
  • safety_margin: 20% of render duration
  • vblank_interval: ~16.67ms for 60Hz

Late Latching Loop

deadline_ms = calculate_render_deadline(display);
if (deadline_ms > now) {
    ppoll(self_pipe, deadline_ms - now);  // Interruptible
}
// Now render - input is fresh
execute_render();
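
The interruptible wait can be sketched with the real ppoll() call. This is an illustration under assumptions: it is Linux-specific (ppoll() is a GNU extension, which g++ enables by default), wait_until_deadline is a hypothetical name, and wake_fd stands for the read end of the self-pipe.

```cpp
#include <poll.h>
#include <unistd.h>
#include <ctime>

// Sleeps for up to wait_ms, returning early if wake_fd becomes readable.
// Returns true if woken by the fd, false on timeout, EINTR, or error.
inline bool wait_until_deadline(int wake_fd, long wait_ms) {
    if (wait_ms < 0) wait_ms = 0;           // deadline already passed
    pollfd pfd{};
    pfd.fd = wake_fd;
    pfd.events = POLLIN;
    timespec ts{};
    ts.tv_sec = wait_ms / 1000;
    ts.tv_nsec = (wait_ms % 1000) * 1000000L;
    int n = ppoll(&pfd, 1, &ts, nullptr);   // nullptr: leave the signal mask unchanged
    return n > 0 && (pfd.revents & POLLIN) != 0;
}
```

A true return means a shutdown (or other wakeup) arrived before the deadline, so the caller can skip the render instead of sleeping through it.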

Cursor Position Tracking

Cursor position is tracked in atomic variables for lock-free access.

Update Flow

  1. libinput event arrives
  2. handle_input_event()
  3. clamp_cursor_to_display_area()
  4. cursor_x_, cursor_y_ atomics updated
  5. Async cursor update (1000Hz+)
  6. mark_render_needed() for frame commit

Multi-Monitor Clamping

The cursor is confined to the union of all display areas:

Display 0: [0, 1920) x [0, 1080)
Display 1: [1920, 3840) x [0, 1440)

Valid cursor area:
  X: [0, 3840)
  Y: At X in [0, 1920): [0, 1080)
  Y: At X in [1920, 3840): [0, 1440)
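
One way to confine a point to a union of rectangles is to clamp it to each display and keep the nearest result. The sketch below takes that approach as an assumption; the text does not say which policy clamp_cursor_to_display_area() actually uses. Rect, Point, and clamp_to_displays are hypothetical names, rectangles are assumed non-empty, and std::clamp requires C++17.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Rect { int x0, y0, x1, y1; };   // half-open: [x0, x1) x [y0, y1)
struct Point { int x, y; };

// Returns p if it lies inside any display; otherwise the closest point
// that lies inside one of them. With no displays, p is returned unchanged.
inline Point clamp_to_displays(Point p, const std::vector<Rect>& displays) {
    Point best = p;
    long long best_d2 = std::numeric_limits<long long>::max();
    for (const Rect& r : displays) {
        // Clamp to the last valid pixel, since the upper bounds are exclusive.
        Point c{std::clamp(p.x, r.x0, r.x1 - 1), std::clamp(p.y, r.y0, r.y1 - 1)};
        long long dx = c.x - p.x;
        long long dy = c.y - p.y;
        long long d2 = dx * dx + dy * dy;
        if (d2 < best_d2) { best_d2 = d2; best = c; }
    }
    return best;
}
```

With the two displays above, a point at (500, 1200) lies below Display 0 and outside Display 1, so it is pulled back to (500, 1079).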

Cursor in Frame Commit

The cursor position is merged into each atomic commit:

submit_frame(..., cursor_x, cursor_y);

This prevents the “jump back” artifact, where a frame commit overwrites the async cursor position with a stale one.

Self-Pipe for Interruptible Shutdown

The self-pipe trick allows immediate shutdown response without using signals.

Architecture

Participants: Main Thread, Shutdown Caller, Epoll

  1. Main Thread -> Epoll: epoll_wait(self_pipe_fd)
  2. Shutdown Caller -> Epoll: write(self_pipe[1], "X")
  3. Epoll: self_pipe[0] readable
  4. Epoll -> Main Thread: EPOLLIN event
  5. Main Thread: Check stop_requested flag

Shutdown Flow

stop() {
    stop_requested_ = true;                // Set flag first so the woken loop sees it
    write(self_pipe[1], &wake_byte, 1);  // Then wake epoll
}

// In epoll loop:
if (ptr == &shutdown_marker_) {
    drain(self_pipe[0]);  // Clear pipe
    // Loop will check stop_requested_ and exit
}

Benefits

  • No signals required (portable)
  • Immediate response (no timeout)
  • Works during ppoll() in late latching
  • Single byte wakes epoll regardless of wait time
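
The pieces of the self-pipe trick fit together roughly as in the sketch below. It assumes Linux, and StoppableLoop with its members is a hypothetical name, not the Waxed class. The sketch publishes the stop flag before writing the wake byte, so the loop cannot drain the pipe and block again without seeing the flag.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <atomic>

struct StoppableLoop {
    int epoll_fd = -1;
    int pipe_fds[2] = {-1, -1};
    std::atomic<bool> stop_requested{false};
    char shutdown_marker = 0;   // only its address is used

    bool init() {
        epoll_fd = epoll_create1(0);
        if (epoll_fd < 0 || pipe(pipe_fds) != 0) return false;
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.ptr = &shutdown_marker;
        return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, pipe_fds[0], &ev) == 0;
    }

    // May be called from another thread: publish the flag, then wake the loop.
    void stop() {
        stop_requested.store(true);
        const char wake = 'X';
        ssize_t r = write(pipe_fds[1], &wake, 1);
        (void)r;   // a full pipe already guarantees a pending wakeup
    }

    // Waits for the shutdown pipe; drains it and returns true if it fired.
    bool wait_for_wakeup(int timeout_ms) {
        epoll_event ev{};
        int n = epoll_wait(epoll_fd, &ev, 1, timeout_ms);
        if (n != 1 || ev.data.ptr != &shutdown_marker) return false;
        char buf[16];
        ssize_t r = read(pipe_fds[0], buf, sizeof buf);   // drain
        (void)r;
        return true;
    }

    // Blocks until stop() has been called.
    void run() {
        while (!stop_requested.load()) wait_for_wakeup(-1);
    }

    ~StoppableLoop() {
        if (pipe_fds[0] >= 0) close(pipe_fds[0]);
        if (pipe_fds[1] >= 0) close(pipe_fds[1]);
        if (epoll_fd >= 0) close(epoll_fd);
    }
};
```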

ASCII Diagrams

Epoll Event Flow

epoll_wait()
  Event arrives?
    Timeout/EINTR: back to epoll_wait()
    Yes: ptr == marker?
      drm_marker    -> drmHandleEvent
      ipc_marker    -> drain IPC
      input_marker  -> process input
      seat_marker   -> dispatch seat
      fence_handler -> release slot

Render Pipeline

VBlank Event (Page Flip)
  page_flip_handler_wrapper(crtc_id)
    Look up DisplayState by CRTC ID
    Clear commit_pending flag
  render_next_frame(display)
    acquire_frame_for_display()
      Find available slot (triple buffering)
      Check fence signaled (lazy cleanup)
    render_display()
      Build DisplayConfig
      plugin_manager_.render_to_target()
        Plugin renders to DMA-BUF
        Returns render_fence_fd
    display.output.submit_frame()
      drmModeAtomicCommit
      Sets commit_pending = true
      Gets release_fence_fd (OUT_FENCE)
      Add fence to epoll for cleanup

Frame Timeline

[Figure: Frame Timeline for Multiple Displays. Display 0 (60Hz) cycles through PF0 (slot0), PF1 (slot1), PF2 (slot2), PF3 (slot0); Display 1 (144Hz) shows PF0 through PF7 over the same period.]

Buffer Slot States

Transitions:

  AVAILABLE -> ACQUIRED:  Plugin acquires slot
  ACQUIRED -> ON_SCREEN:  Plugin renders & submits
  ON_SCREEN -> SIGNALING: Frame on screen, fence in epoll
  SIGNALING -> AVAILABLE: Fence signals, slot released

States:

  • ACQUIRED: Plugin rendering to this buffer
  • ON_SCREEN: Buffer currently displayed, commit_pending = true
  • SIGNALING: Fence fd in epoll, waiting for VBlank signal
  • AVAILABLE: Free to acquire for new frame
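
The four slot states named in this section, and the events that move a slot between them, can be modelled as a small state machine. This is a sketch under assumptions: SlotState, SlotEvent, and advance are hypothetical names, and the one-event-per-transition mapping is one reading of the diagram rather than the actual data structure.

```cpp
enum class SlotState { Available, Acquired, OnScreen, Signaling };
enum class SlotEvent { Acquire, Submit, FenceArmed, FenceSignaled };

// Applies e to s. Returns false and leaves s unchanged if the transition
// is not one of the four legal ones.
inline bool advance(SlotState& s, SlotEvent e) {
    switch (s) {
    case SlotState::Available:   // free to acquire for a new frame
        if (e == SlotEvent::Acquire) { s = SlotState::Acquired; return true; }
        break;
    case SlotState::Acquired:    // plugin is rendering into this buffer
        if (e == SlotEvent::Submit) { s = SlotState::OnScreen; return true; }
        break;
    case SlotState::OnScreen:    // buffer displayed, commit pending
        if (e == SlotEvent::FenceArmed) { s = SlotState::Signaling; return true; }
        break;
    case SlotState::Signaling:   // fence fd registered with epoll
        if (e == SlotEvent::FenceSignaled) { s = SlotState::Available; return true; }
        break;
    }
    return false;
}
```

Rejecting out-of-order events in one place makes it easier to catch a slot being acquired while its fence is still pending.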

Bootstrap Sequence

Startup
  bootstrap_first_frame()
    execute_render()
      render_display() for each display
      submit_frame(ALLOW_MODESET) - BLOCKING
      CRTC enabled (no page flip event)
  on_vblank_render() - Manual trigger
    execute_render()
      render_display() for each display
      submit_frame(NONBLOCK | PAGE_FLIP_EVENT)
      commit_pending = true
  enter epoll_wait() loop
    PAGE_FLIP_EVENT arrives (from second frame)
    page_flip_handler_wrapper() called
    mark_bootstrap_complete() - Enables async cursor
    render_next_frame() - Normal steady state

Key Functions

Public API

  • init(drm_fd, seat_manager) - Initialize render loop with DRM device
  • run() - Enter render loop (blocking)
  • stop() - Request graceful shutdown
  • set_vsync_config(enabled) - Enable/disable VSync
  • set_dump_fps(fps) - Configure frame dump rate
  • get_display_manager() - Access display manager
  • render_next_frame(display) - Per-display render trigger
  • handle_seat_disable() - VT switch away
  • handle_seat_enable() - VT switch back

Internal

  • bootstrap_first_frame() - Break startup deadlock
  • execute_render() - Render all active displays
  • render_display() - Render single display
  • acquire_frame_for_display() - Get buffer slot
  • release_slot_for_display() - Return buffer slot
  • on_vblank_render() - VBlank render trigger
  • mark_render_needed() - Defer render to next VBlank
  • handle_drm_event() - Process DRM events
  • handle_fence_signaled_event() - Cleanup signaled fence
  • calculate_render_deadline() - Late latching timing
  • clamp_cursor_to_display_area() - Multi-monitor cursor

State Variables

Variable              Type               Purpose
running_              atomic             Main loop active
stop_requested_       atomic             Shutdown requested
paused_               atomic             VT switch paused
bootstrap_complete_   atomic             First flip done
needs_render_         atomic             Input/IPC pending
cursor_x_             atomic             Global cursor X
cursor_y_             atomic             Global cursor Y
next_fence_gen_       atomic<uint64_t>   Fence generation
frame_count_          uint64_t           Total frames rendered
fence_pending_        bool               Fence waiting (deprecated)
vsync_enabled_        bool               VSync on/off

Thread Safety

  • Main thread: Runs render loop, handles all events
  • Dump worker: Runs in background thread, accesses slot via atomic pointer
  • Atomic variables: All cross-thread state uses atomics
  • Mutexes: buffer_state.mutex protects slot lists
  • Condition variable: worker_cv_ for dump worker signaling

Performance Considerations

  1. Zero epoll allocation: Marker addresses instead of new
  2. Lazy fence cleanup: Check signaling during acquire, not callback
  3. Triple buffering: Always have slot ready for render
  4. Per-display VSync: No global frame rate bottleneck
  5. Async cursor: 1000Hz+ updates without EBUSY
  6. Late latching: Minimizes input-to-photon latency