Waxed Render Loop
Overview and Architecture
The Render Loop is the heart of the Waxed compositor, orchestrating all rendering, input handling, display management, and event processing in a unified epoll-based event loop. It implements a “single source of truth” design where only DRM page flip events (VBlank) trigger frame rendering, ensuring consistent frame pacing and eliminating EBUSY errors.
Key Design Principles
- Unified Event Loop: All events (DRM, IPC, Input, Seat, Fence) are dispatched through a single epoll_wait() call
- Per-Display VSync: Each display runs at its native refresh rate independently (60Hz + 144Hz mixed support)
- Atomic Operations: All state changes use atomic variables for lock-free synchronization
- Triple Buffering: Three buffer slots per display enable pipelined rendering
- Late Latching: Input is read as close to scanout as possible for minimal latency
- Zero Allocation: Event dispatch uses marker addresses instead of dynamic allocations
Unified Epoll Event Loop
The render loop centers on a single epoll_wait() call that blocks until any event occurs. Events are dispatched based on pointer comparison with marker addresses.
// Main event loop (simplified)
while (!stop_requested_) {
    int nfds = epoll_wait(epoll_fd_.get(), events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; ++i) {
        void* ptr = events[i].data.ptr;
        if (ptr == &drm_marker_) handle_drm_event();
        else if (ptr == &ipc_marker_) handle_ipc_event();
        else if (ptr == &input_marker_) handle_input_event();
        else if (ptr == &seat_marker_) dispatch_seat_event();
        else if (ptr == &shutdown_marker_) drain_self_pipe();
        else handle_fence_signaled_event(ptr); // BufferSlot* fence
    }
}
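The loop above only shows the dispatch side. A minimal sketch of the registration side, assuming each marker is a plain object whose address is stored in `epoll_event.data.ptr`, could look like this. The `Marker` type and the helper names `register_marker` and `wait_one` are hypothetical and are not taken from the Waxed sources.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical marker type: only its address matters, never its contents.
struct Marker { char unused; };

// Register `fd` for readability and tag it with the marker's address.
// No heap allocation is needed because the marker outlives the loop.
inline bool register_marker(int epoll_fd, int fd, Marker* marker) {
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.ptr = marker;
    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Wait up to `timeout_ms` for one event and return its tag
// (nullptr on timeout or error).
inline void* wait_one(int epoll_fd, int timeout_ms) {
    epoll_event ev{};
    int n = epoll_wait(epoll_fd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.ptr : nullptr;
}
```

Because the tag is just an address, the dispatcher can tell the fixed sources apart with pointer comparisons and treat every other pointer as a fence closure.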
Handler Types
FENCE Handler (Display completion signal)
- Trigger: When a display fence signals (frame no longer on screen)
- Data: void* ptr = FenceClosure* embedded in BufferSlot
- Action: Release buffer slot, mark display ready for next frame
- Key: Fence generation checking prevents ABA problem
DRM Handler (Page flip + hotplug)
- Trigger: Kernel sends DRM event on readable DRM FD
- Data: void* ptr = &drm_marker_
- Action: Call drmHandleEvent(), which invokes page_flip_handler_wrapper() for each completed flip
- Key: Each display’s page flip triggers render_next_frame() for that display only
IPC Handler (waxedctl commands)
- Trigger: IPC socket has pending connection
- Data: void* ptr = &ipc_marker_
- Action: Drain up to 10 pending connections
- Mode: Edge-triggered (EPOLLET) to prevent starvation
INPUT Handler (libinput events)
- Trigger: libinput FD has mouse/keyboard events
- Data: void* ptr = &input_marker_
- Action: Process input, update cursor position atomics, mark render needed
- Key: Cursor position clamped to valid display area for multi-monitor
SEAT Handler (VT switch events)
- Trigger: libseat FD has VT switch enable/disable event
- Data: void* ptr = &seat_marker_
- Action: Dispatch to SeatManager, pause/resume rendering
- Key: Pause before VT switch away prevents DRM errors
VSync Configuration
VSync is controlled by set_vsync_config(bool enabled):
- Enabled (default): Frame rate limited to display refresh rate via DRM page flip events
- Disabled: Render as fast as possible (unthrottled)
VSync is implemented at the hardware level through DRM DRM_MODE_PAGE_FLIP_EVENT. The kernel only delivers page flip events at VBlank, naturally throttling rendering to the display’s refresh rate.
Frame Dump System
Frame dumping captures rendered frames to disk for debugging or recording.
Architecture
- Worker Thread: Dedicated thread handles CPU-intensive dump operations
- Non-blocking: Main render thread never waits for dump completion
- Simplified: Single atomic pointer tracks which slot is being dumped
Flow
1. Frame completes render (with GPU render_fence)
2. Check dump_fps rate limit
3. If due: queue_frame_for_dump(slot)
   - If worker busy: abort current dump, wait for slot release
   - Set dumping_slot_ = slot
4. Worker thread wakes up
   - Wait for render_fence (30ms timeout)
   - mmap DMA-BUF, DMA_BUF_SYNC_START
   - Call frame_dumper_.dump_frame()
   - DMA_BUF_SYNC_END, munmap
   - Clear dumping_slot_, notify main thread
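The handoff between the render thread and the worker can be sketched as follows. This is a simplified illustration, not the Waxed implementation: `Slot` and `DumpWorker` are hypothetical stand-ins, the fence wait and DMA-BUF mapping are reduced to a comment, and a busy worker causes the frame to be skipped instead of aborting the current dump as the flow above describes.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

struct Slot { int id; };  // hypothetical stand-in for BufferSlot

class DumpWorker {
public:
    DumpWorker() : thread_([this] { run(); }) {}
    ~DumpWorker() {
        { std::lock_guard<std::mutex> lk(m_); quit_ = true; }
        cv_.notify_all();
        thread_.join();
    }
    // Called by the render thread. Never blocks on dump work;
    // returns false if a dump is already in progress.
    bool queue(Slot* slot) {
        Slot* expected = nullptr;
        if (!dumping_slot_.compare_exchange_strong(expected, slot)) return false;
        { std::lock_guard<std::mutex> lk(m_); }  // order against the worker's predicate check
        cv_.notify_one();
        return true;
    }
    int dumped() const { return dumped_.load(); }
    bool idle() const { return dumping_slot_.load() == nullptr; }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return quit_ || dumping_slot_.load() != nullptr; });
            if (quit_) return;
            lk.unlock();
            // Real code would wait for render_fence, map the DMA-BUF,
            // and write the frame to disk here.
            dumped_.fetch_add(1);
            dumping_slot_.store(nullptr);  // release the slot
            lk.lock();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::atomic<Slot*> dumping_slot_{nullptr};
    std::atomic<int> dumped_{0};
    bool quit_ = false;
    std::thread thread_;  // declared last so the other members exist before it starts
};
```

The single atomic pointer doubles as the busy flag: a non-null value means the worker owns that slot, and storing nullptr hands it back.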
Rate Limiting
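dump_interval_ms_ = 1000 / dump_fps_; // e.g., 60fps = 16.67ms (16ms if stored as an integer)
if (now - last_dump_time_ms_ >= dump_interval_ms_) {
    queue_frame_for_dump(slot);
    last_dump_time_ms_ = now; // Measure the next interval from this dump
}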
Bootstrap Sequence (First Frame)
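The render loop faces a chicken-and-egg problem on startup: rendering is triggered only by page flip events, but no page flip event arrives until a frame has been committed.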
Solution: Bootstrap First Frame
run() {
    // Phase 1: Bootstrap (synchronous modeset)
    bootstrap_first_frame();
    // Phase 2: Trigger second frame (generates first page flip event)
    on_vblank_render();
    // Phase 3: Enter epoll loop (steady state)
    while (!stop_requested_) {
        epoll_wait(...); // Now page flip events will arrive
    }
}
Bootstrap Flags
- was_crtc_enabled: False on first frame, true thereafter
- bootstrap_complete_: Set by first page flip handler, enables async cursor
- Commit flags:
  - First frame: DRM_MODE_ATOMIC_ALLOW_MODESET (blocking, no event)
  - Subsequent: DRM_MODE_ATOMIC_NONBLOCK | DRM_MODE_PAGE_FLIP_EVENT
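The flag selection can be written as one small function. The sketch below is an illustration only: the helper `commit_flags` is hypothetical, and the constants are redefined locally with the values of the same-named macros in the kernel's drm_mode.h so the snippet compiles without libdrm. Real code should include the libdrm headers instead.

```cpp
#include <cstdint>

// Local copies for illustration; values mirror drm_mode.h.
constexpr uint32_t kPageFlipEvent      = 0x01;    // DRM_MODE_PAGE_FLIP_EVENT
constexpr uint32_t kAtomicNonblock     = 0x0200;  // DRM_MODE_ATOMIC_NONBLOCK
constexpr uint32_t kAtomicAllowModeset = 0x0400;  // DRM_MODE_ATOMIC_ALLOW_MODESET

// First frame: blocking modeset, no event requested.
// Later frames: non-blocking commit that requests a page flip event.
constexpr uint32_t commit_flags(bool was_crtc_enabled) {
    return was_crtc_enabled ? (kAtomicNonblock | kPageFlipEvent)
                            : kAtomicAllowModeset;
}
```

Since the first commit requests no event, something else has to submit the first event-generating commit, which is the role of Phase 2 in run().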
VT Switch Handling
VT (Virtual Terminal) switching allows switching between graphical sessions (Ctrl+Alt+F1-F12).
Pause (Switch Away)
Triggered by:
- libseat seat_disable callback
- InputManager VT switch keybinding
handle_seat_disable() {
    if (paused_.exchange(true)) return; // Already paused
    LOGC_INFO("VT switch away - pausing rendering");
    // No DRM operations will occur until resumed
}
Effect:
- paused_ flag checked before each render
- In-flight commits complete harmlessly
- Kernel revokes DRM master
Resume (Switch Back)
Triggered by:
- libseat seat_enable callback
handle_seat_enable() {
    if (!paused_.exchange(false)) return; // Wasn't paused
    LOGC_INFO("VT switch back - resuming rendering");
    mark_render_needed(); // Refresh screen on next VBlank
}
Effect:
- Kernel grants DRM master
- Next page flip triggers frame render
- Screen refreshes to current state
Per-Display Rendering
Each display runs at its own refresh rate, driven by its page flip events.
Independent Timeline
EBUSY Protection
Each display has commit_pending flag:
if (display.runtime.commit_pending.load()) {
    return; // Previous commit still in-flight
}
// Submit atomic commit
display.runtime.commit_pending.store(true);
Cleared by page flip handler when commit completes.
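The check and the store can also be folded into a single atomic exchange, which keeps the guard correct even if a second thread ever submits commits. This is a variant sketch, not the code Waxed uses: `DisplayRuntime` is reduced to the one flag, and `try_begin_commit` and `on_page_flip` are hypothetical helper names.

```cpp
#include <atomic>

struct DisplayRuntime {  // hypothetical, reduced to the one flag
    std::atomic<bool> commit_pending{false};
};

// Returns true if the caller may submit a commit now. The exchange
// sets the flag and reports its previous value in one step.
inline bool try_begin_commit(DisplayRuntime& rt) {
    return !rt.commit_pending.exchange(true, std::memory_order_acq_rel);
}

// Called from the page flip handler once the kernel reports completion.
inline void on_page_flip(DisplayRuntime& rt) {
    rt.commit_pending.store(false, std::memory_order_release);
}
```

If the commit call itself fails, the flag has to be cleared on that error path too. Otherwise the display never renders again, because no page flip event will arrive to clear it.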
Fence-Triggered Rendering
Fences signal when buffers are no longer on screen (NOT when they enter).
Fence Lifecycle
Generation Counter
Prevents ABA problem where slot is reused while fence is pending:
slot.fence_generation = next_fence_gen_.fetch_add(1);
fence_closure.generation = slot.fence_generation;
// Later, check before processing:
if (slot.fence_generation != closure.generation) {
    return; // Stale fence, ignore
}
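A self-contained sketch of the same check, under assumed type layouts, is shown below. The field names follow the snippet above, but the struct definitions and the helpers `arm_fence` and `on_fence_signaled` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

struct BufferSlot;

struct FenceClosure {  // hypothetical layout
    BufferSlot* slot = nullptr;
    uint64_t generation = 0;
};

struct BufferSlot {  // hypothetical layout
    uint64_t fence_generation = 0;
    bool in_use = false;
    FenceClosure closure;
};

// Arm the slot's fence: stamp slot and closure with a fresh generation.
inline void arm_fence(BufferSlot& slot, std::atomic<uint64_t>& next_gen) {
    slot.fence_generation = next_gen.fetch_add(1);
    slot.in_use = true;
    slot.closure.slot = &slot;
    slot.closure.generation = slot.fence_generation;
}

// Handle a fence wake-up. Returns true if the slot was released,
// false if the wake-up belonged to an earlier use of the slot.
inline bool on_fence_signaled(const FenceClosure& closure) {
    BufferSlot& slot = *closure.slot;
    if (slot.fence_generation != closure.generation) return false;  // stale
    slot.in_use = false;
    return true;
}
```

A closure captured before the slot was re-armed carries the old generation, so its late wake-up is ignored instead of releasing a buffer that is back in use.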
Lazy Cleanup
Fences are checked for signal during slot acquisition (not epoll callback):
if (is_fence_signaled(slot.release_fence_fd)) {
    // Transfer to pending_release_fence_fd for next frame
    slot.pending_release_fence_fd.reset(slot.release_fence_fd.release());
}
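This section does not show how is_fence_signaled() is implemented. One plausible approach, assuming the fence is a sync_file style descriptor that polls readable once it has signaled, is a zero-timeout poll(). Unlike the call above, the sketch takes a raw descriptor instead of an owning wrapper.

```cpp
#include <poll.h>
#include <unistd.h>

// Non-blocking check: true if `fd` is readable right now.
// Assumption: a signaled fence fd reports POLLIN.
inline bool is_fence_signaled(int fd) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    int rc = poll(&pfd, 1, 0);  // timeout 0: return immediately
    return rc == 1 && (pfd.revents & POLLIN) != 0;
}
```

Because the call never blocks, it is cheap enough to run on every slot acquisition.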
Adaptive VBlank Timing (Late Latching)
Late latching reads input as close to scanout as possible, minimizing input latency.
Calculate Render Deadline
deadline = next_vblank - render_duration - safety_margin;
Components:
- render_duration: EMA estimate (1/8 weight new, 7/8 old)
- safety_margin: 20% of render duration
- vblank_interval: ~16.67ms for 60Hz
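The arithmetic behind these components can be sketched with integer math. The function names are hypothetical, and the sketch works in microseconds (an assumption; the snippets in this document use milliseconds) so that sub-millisecond render times are not truncated.

```cpp
#include <cstdint>

// All times are in microseconds on one monotonic clock.

// EMA with 1/8 weight on the new sample and 7/8 on the old estimate.
constexpr int64_t update_render_estimate(int64_t old_us, int64_t sample_us) {
    return (old_us * 7 + sample_us) / 8;
}

// deadline = next_vblank - render_duration - safety_margin,
// where safety_margin is 20% of the render duration.
constexpr int64_t render_deadline(int64_t next_vblank_us, int64_t render_us) {
    return next_vblank_us - render_us - render_us / 5;
}
```

For example, an estimate of 4000 µs updated with an 8000 µs sample becomes 4500 µs. With the next VBlank 16667 µs away, rendering should then start no later than 11267 µs from now.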
Late Latching Loop
deadline_ms = calculate_render_deadline(display);
if (deadline_ms > now) {
    ppoll(self_pipe, deadline_ms - now); // Interruptible
}
// Now render - input is fresh
execute_render();
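The ppoll() call above is shorthand. A fuller sketch of an interruptible wait on the read end of the self-pipe might look like the following. The helper name `wait_until_deadline` is hypothetical, and ppoll() is Linux-specific.

```cpp
#include <poll.h>
#include <time.h>
#include <unistd.h>
#include <cstdint>

// Sleep until `timeout_ns` elapses or `wake_fd` becomes readable.
// Returns true if woken by the fd, false on timeout or error.
inline bool wait_until_deadline(int wake_fd, int64_t timeout_ns) {
    if (timeout_ns < 0) timeout_ns = 0;
    timespec ts{};
    ts.tv_sec = static_cast<time_t>(timeout_ns / 1000000000LL);
    ts.tv_nsec = static_cast<long>(timeout_ns % 1000000000LL);
    pollfd pfd{};
    pfd.fd = wake_fd;
    pfd.events = POLLIN;
    // nullptr keeps the current signal mask unchanged.
    int rc = ppoll(&pfd, 1, &ts, nullptr);
    return rc == 1 && (pfd.revents & POLLIN) != 0;
}
```

ppoll() is used here for its nanosecond timeout, which matters when the remaining slack before VBlank is only a few hundred microseconds.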
Cursor Position Tracking
Cursor position is tracked in atomic variables for lock-free access.
Update Flow
Multi-Monitor Clamping
Cursor confined to union of all display areas:
Display 0: [0, 1920) x [0, 1080)
Display 1: [1920, 3840) x [0, 1440)
Valid cursor area:
X: [0, 3840)
Y: At X in [0, 1920): [0, 1080)
Y: At X in [1920, 3840): [0, 1440)
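One way to implement this confinement is to clamp the cursor against every display rectangle and keep the closest result. The sketch below assumes half-open rectangles as in the layout above. `Rect`, `Point`, and `clamp_cursor` are hypothetical names, and the nearest-point rule is an assumption about how positions in the gap are resolved.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Rect { int x, y, w, h; };  // covers [x, x+w) x [y, y+h)
struct Point { int x, y; };

inline int clamp_int(int v, int lo, int hi) {
    return std::max(lo, std::min(v, hi));
}

// Returns `p` unchanged if it lies on some display, otherwise the
// nearest valid point on any display.
inline Point clamp_cursor(Point p, const std::vector<Rect>& displays) {
    Point best = p;
    int64_t best_d2 = -1;
    for (const Rect& r : displays) {
        Point c{clamp_int(p.x, r.x, r.x + r.w - 1),
                clamp_int(p.y, r.y, r.y + r.h - 1)};
        int64_t dx = c.x - p.x;
        int64_t dy = c.y - p.y;
        int64_t d2 = dx * dx + dy * dy;
        if (best_d2 < 0 || d2 < best_d2) { best = c; best_d2 = d2; }
    }
    return best;
}
```

With the two displays above, a cursor at (1000, 1200) lies below Display 0 and left of Display 1, so it snaps to (1000, 1079), the bottom edge of Display 0.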
Cursor in Frame Commit
Cursor position merged into atomic commits:
submit_frame(..., cursor_x, cursor_y);
Prevents “jump back” artifact where async cursor position gets overwritten.
Self-Pipe for Interruptible Shutdown
The self-pipe trick allows immediate shutdown response without using signals.
Architecture
Shutdown Flow
stop() {
    stop_requested_ = true;             // Set flag first so the woken loop sees it
    write(self_pipe[1], &wake_byte, 1); // Wake epoll
}
// In epoll loop:
if (ptr == &shutdown_marker_) {
    drain(self_pipe[0]); // Clear pipe
    // Loop will check stop_requested_ and exit
}
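A compact sketch of the pipe setup, wake-up, and drain is shown below. It assumes a non-blocking pipe created with pipe2(), which is Linux-specific. The `SelfPipe` wrapper and its method names are hypothetical.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <atomic>
#include <cerrno>

struct SelfPipe {  // hypothetical wrapper
    int fds[2] = {-1, -1};
    std::atomic<bool> stop_requested{false};

    bool init() { return pipe2(fds, O_NONBLOCK | O_CLOEXEC) == 0; }

    void close_all() {
        if (fds[0] >= 0) ::close(fds[0]);
        if (fds[1] >= 0) ::close(fds[1]);
        fds[0] = fds[1] = -1;
    }

    // Publish the flag before waking the loop, so the loop cannot
    // wake, see the flag still clear, and go back to sleep.
    void stop() {
        stop_requested.store(true);
        const char wake = 1;
        ssize_t n;
        do { n = ::write(fds[1], &wake, 1); } while (n < 0 && errno == EINTR);
        // EAGAIN means the pipe is full, so a wake-up is already pending.
    }

    // Read until the pipe is empty; returns the number of bytes drained.
    int drain() {
        char buf[64];
        int total = 0;
        for (;;) {
            ssize_t n = ::read(fds[0], buf, sizeof buf);
            if (n > 0) { total += static_cast<int>(n); continue; }
            if (n < 0 && errno == EINTR) continue;
            break;  // EOF, EAGAIN (empty), or a real error
        }
        return total;
    }
};
```

Making both ends non-blocking means stop() can never block on a full pipe, and drain() returns as soon as the pipe is empty.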
Benefits
- No signals required (portable)
- Immediate response (no timeout)
- Works during ppoll() in late latching
- Single byte wakes epoll regardless of wait time
ASCII Diagrams
Epoll Event Flow
Render Pipeline
Frame Timeline
Buffer Slot States
Bootstrap Sequence
Key Functions
Public API
- init(drm_fd, seat_manager) - Initialize render loop with DRM device
- run() - Enter render loop (blocking)
- stop() - Request graceful shutdown
- set_vsync_config(enabled) - Enable/disable VSync
- set_dump_fps(fps) - Configure frame dump rate
- get_display_manager() - Access display manager
- render_next_frame(display) - Per-display render trigger
- handle_seat_disable() - VT switch away
- handle_seat_enable() - VT switch back
Internal
- bootstrap_first_frame() - Break startup deadlock
- execute_render() - Render all active displays
- render_display() - Render single display
- acquire_frame_for_display() - Get buffer slot
- release_slot_for_display() - Return buffer slot
- on_vblank_render() - VBlank render trigger
- mark_render_needed() - Defer render to next VBlank
- handle_drm_event() - Process DRM events
- handle_fence_signaled_event() - Cleanup signaled fence
- calculate_render_deadline() - Late latching timing
- clamp_cursor_to_display_area() - Multi-monitor cursor
State Variables
| Variable | Type | Purpose |
|---|---|---|
| running_ | atomic | Main loop active |
| stop_requested_ | atomic | Shutdown requested |
| paused_ | atomic | VT switch paused |
| bootstrap_complete_ | atomic | First flip done |
| needs_render_ | atomic | Input/IPC pending |
| cursor_x_ | atomic | Global cursor X |
| cursor_y_ | atomic | Global cursor Y |
| next_fence_gen_ | atomic<uint64_t> | Fence generation |
| frame_count_ | uint64_t | Total frames rendered |
| fence_pending_ | bool | Fence waiting (deprecated) |
| vsync_enabled_ | bool | VSync on/off |
Thread Safety
- Main thread: Runs render loop, handles all events
- Dump worker: Runs in background thread, accesses slot via atomic pointer
- Atomic variables: All cross-thread state uses atomics
- Mutexes: buffer_state.mutex protects slot lists
- Condition variable: worker_cv_ for dump worker signaling
Performance Considerations
- Zero epoll allocation: Marker addresses instead of new
- Lazy fence cleanup: Check signaling during acquire, not callback
- Triple buffering: Always have slot ready for render
- Per-display VSync: No global frame rate bottleneck
- Async cursor: 1000Hz+ updates without EBUSY
- Late latching: Minimizes input-to-photon latency