# Triple Buffer System
## Overview
The Waxed compositor uses a triple-buffering strategy to manage frame presentation across multiple displays. Triple buffering provides:
- Decoupled rendering and display: The rendering thread can work on the next frame while the current frame is being displayed
- No tearing: Frames are only swapped during vertical blank
- No stalls: The renderer never waits for the display to release a buffer
- Zero-copy DMA-BUF: Frames are shared between GPU and display controller via DMA-BUF file descriptors
Each display maintains its own independent triple buffer state, allowing multiple displays to operate at different refresh rates without interference.
## Core Data Structures
### BufferSlot
The `BufferSlot` structure (defined in `include/waxed/composer/buffer_slot.h`) represents a single buffer in the triple-buffered system. Each slot contains:
```cpp
struct BufferSlot {
    FrameHandle frame;                                      // The frame handle (we own the FD)
    std::atomic<bool> in_use{false};                        // True if display or worker is using this
    uint64_t sequence_number{0};                            // Frame sequence counter
    uint64_t acquire_time_ms{0};                            // When we acquired this frame
    uint64_t fence_generation{0};                           // Generation counter for ABA prevention
    std::atomic<int> dump_ref_count{0};                     // Active dump operations count
    waxed::core::utils::UniqueFd release_fence_fd;          // [OWNED] Fence FD (KMS OUT_FENCE)
    waxed::core::utils::UniqueFd pending_release_fence_fd;  // [OWNED] OUT_FENCE from previous cycle
    int64_t out_fence_storage{0};                           // [KERNEL-WRITE] Persistent storage for OUT_FENCE_PTR
    uint32_t slot_index{UINT32_MAX};                        // Slot index (0, 1, 2) for event dispatch
    uint32_t display_id{UINT32_MAX};                        // CRITICAL: Owner display ID
    FenceClosure fence_closure;                             // Pre-allocated fence closure
};
```
**Key Characteristics:**

- **Non-movable**: the `std::atomic` members prohibit move semantics, so the slots are default-initialized in place
- **Owned FD**: `frame.dma_buf_fd` is `dup()`’d from the plugin; the slot owns this FD and closes it on reset
- **Per-display ownership**: each slot is bound to one display via `display_id`, preventing cross-display slot confusion
### DisplayBufferState
The `DisplayBufferState` class (defined in `include/waxed/composer/display_buffer_state.h`) consolidates all buffer state for a single display:
```cpp
class DisplayBufferState {
    static constexpr size_t TRIPLE_BUFFER_COUNT = 3;

    std::array<BufferSlot, TRIPLE_BUFFER_COUNT> slots{};
    std::array<CoreBuffer, TRIPLE_BUFFER_COUNT> swapchain_buffers{};

    std::atomic<uint32_t> write_index{0};    // Slot to acquire next frame into
    std::atomic<uint32_t> current_slot{0};   // Slot currently being displayed
    std::atomic<uint32_t> next_slot{0};      // Slot queued for next frame

    std::mutex mutex;
};
```
### FenceClosure: Zero-Allocation Fence Handling
The `FenceClosure` structure enables zero-allocation fence event handling by embedding all necessary context directly in the epoll event:
```cpp
struct FenceClosure {
    int fd{-1};                       // Set at submission time (varies per frame)
    uint64_t generation{0};           // Generation counter at submission (prevents ABA)
    BufferSlot* slot{nullptr};        // Set at slot creation time (constant)
    uint32_t display_id{UINT32_MAX};  // CRITICAL: Store display_id directly
};
```
**How it works:**

- At slot initialization, `fence_closure.slot` and `fence_closure.display_id` are set (constant)
- At frame submission, `fence_closure.fd` and `fence_closure.generation` are set (varies per frame)
- When the fence signals, epoll returns the closure pointer directly via `epoll_data.ptr`
- The handler dereferences the closure, validates the generation counter, and processes the slot

This avoids heap allocations entirely in the fence-signaling path.
## Slot Lifecycle
Each `BufferSlot` cycles through four states during its lifetime:
### State Descriptions
| State | Description | Who Owns It? |
|---|---|---|
| READY | Slot is available for acquiring a new frame | Compositor (free to acquire) |
| IN_USE | Slot has been acquired, frame is being rendered | Render thread |
| QUEUED | Frame is complete, queued for display | Compositor (waiting for vsync) |
| DISPLAY_PENDING | Frame is being displayed, OUT fence pending | Display hardware |
### Transitions
- **READY → IN_USE**: `acquire_frame()` claims the slot via `write_index`
- **IN_USE → QUEUED**: the plugin calls `queue_frame()`; the frame is ready for presentation
- **QUEUED → DISPLAY_PENDING**: `drmModePageFlip()` is called and the OUT fence is created
- **DISPLAY_PENDING → READY**: the OUT fence signals; the slot is available again
## Fence Generation Counters (ABA Prevention)
The `fence_generation` counter prevents the ABA problem in fence handling:
**The Problem:**

- Slot A is submitted with fence F1 at generation G=0
- Fence F1 signals, but the epoll event is delayed
- Slot A completes its lifecycle and returns to READY
- Slot A is reused and submitted with fence F2 at generation G=1
- The delayed F1 event arrives, still carrying G=0
- Without the generation check, we might mistakenly process the stale event
**The Solution:**

- Each slot submission increments `fence_generation`
- `FenceClosure` captures the generation at submission time
- On fence signal, compare `closure.generation == slot.fence_generation`
- A mismatch means the event is stale and must be ignored
With a 64-bit counter, overflow is practically impossible (on the order of billions of years at 60 fps).
## Out Fence Storage
The `out_fence_storage` field provides persistent storage for the kernel’s `OUT_FENCE_PTR` write-back:

```cpp
int64_t out_fence_storage{0};  // [KERNEL-WRITE] Persistent storage
```
**Why this exists:**

- KMS `drmModeAtomicCommit()` with `DRM_MODE_ATOMIC_NONBLOCK` requires a pointer to a `__u64` for `OUT_FENCE_PTR`
- The kernel writes the fence FD to this location
- The storage must remain valid until the commit completes
- Using a member variable ensures the storage lives as long as the slot
**Usage flow:**

- Point the `OUT_FENCE_PTR` property at `&out_fence_storage`
- Call `drmModeAtomicCommit()`
- The kernel writes the fence FD to `out_fence_storage`
- Read `out_fence_storage` and wrap it in `release_fence_fd`
## `dump_ref_count` for Frame Dumping
The `dump_ref_count` atomic counter enables safe frame dumping:

```cpp
std::atomic<int> dump_ref_count{0};
```
**Purpose:**

- When a frame is being dumped (e.g., for debugging or a screenshot), `dump_ref_count` is incremented
- The slot cannot be reset or reused while `dump_ref_count > 0`
- This prevents use-after-free if the dump operation is slow
**Usage:**

- Before dumping: `slot->dump_ref_count++`
- After dumping completes: `slot->dump_ref_count--`
- Check before reset: `if (slot->dump_ref_count == 0) { reset(); }`
## Per-Display Slot Ownership
Each slot is bound to a single display via `display_id`:

```cpp
uint32_t display_id{UINT32_MAX};  // CRITICAL: Owner display ID
```
**Why this matters:**

- Multiple displays can run at different refresh rates
- A slot cannot be shared between displays (their mode settings differ)
- Fence events are dispatched per-display via `FenceClosure.display_id`
- This prevents confusion when processing fence callbacks
## EBUSY Prevention Mechanism
The `in_use` atomic prevents concurrent access to slots:

```cpp
std::atomic<bool> in_use{false};
```
**EBUSY Scenario:**

- The renderer tries to acquire a slot that’s still being displayed
- The display tries to flip to a slot that’s still being rendered into
- `in_use` acts as a lock, preventing both scenarios
**Acquire pattern:**

```cpp
bool expected = false;
if (!slot->in_use.compare_exchange_strong(expected, true)) {
    return EBUSY;  // Slot is in use, try next slot
}
```
**Release pattern:**

```cpp
slot->in_use.store(false, std::memory_order_release);
```
## Triple Buffer State Machine
### Three Slots Rotating
The slots rotate through states as the display progresses:
- CURRENT: The slot currently being displayed
- NEXT: The slot queued for the next vsync
- WRITE: The slot ready for the renderer to acquire
### Frame Lifecycle with Fences
**Timeline Detail:**
| Time | Event | State Change |
|---|---|---|
| T0 | acquire_frame() claims slot | READY → IN_USE |
| T1 | Render complete, queue_frame() | IN_USE → QUEUED |
| T2 | drmModePageFlip() submits slot | Creates OUT fence F0 |
| T3 | Fence stored in fence_closure.fd | QUEUED → DISPLAY_PENDING |
| T4 | Renderer acquires next slot (now READY) | New slot: READY → IN_USE |
| T5 | Fence F0 signals, epoll delivers FenceClosure | - |
| T6 | Handler validates generation, marks slot READY | DISPLAY_PENDING → READY |
| T7 | Next page flip submits slot, cycle repeats | READY → IN_USE |
## Implementation Notes
### Thread Safety
- **Slot access**: protected by the `in_use` atomic flag
- **Display state**: protected by `DisplayBufferState::mutex`
- **Write index**: atomic, allowing lock-free slot acquisition
- **Fence closure**: read-only after submission, safe for concurrent access
### Zero-Allocation Hot Path
The fence-signaling path is zero-allocation:

- `FenceClosure` is embedded in `BufferSlot`, so no separate allocation is needed
- `epoll_data.ptr` points directly to the closure
- No heap allocation occurs in the signal handler
### RAII Fence Management
Fence FDs are managed via `UniqueFd`:

- `release_fence_fd`: the current OUT fence (owned by the slot)
- `pending_release_fence_fd`: the previous OUT fence (passed to the plugin)
- Both close automatically on reset or in the destructor
## Initialization
```cpp
display_buffer_state.initialize_slots();
```
Sets up each slot with:

- `slot_index`: 0, 1, 2 (for event dispatch)
- `fence_closure.slot`: points to the containing slot
- `fence_closure.display_id`: the owning display ID
- All fields reset to their initial state
## Summary
The Waxed Triple Buffer System provides:
- Triple buffering for decoupled rendering and display
- Per-display ownership for multi-display independence
- Zero-allocation fence handling via the embedded `FenceClosure`
- ABA prevention via 64-bit generation counters
- EBUSY prevention via the atomic `in_use` flag
- Safe frame dumping via `dump_ref_count`
- Kernel-compatible OUT fence storage via `out_fence_storage`
All designed for zero-copy DMA-BUF presentation with Vulkan rendering.