Waxed Display Server

Triple Buffer System

Overview

The Waxed compositor uses a triple-buffering strategy to manage frame presentation across multiple displays. Triple buffering provides:

  • Decoupled rendering and display: The rendering thread can work on the next frame while the current frame is being displayed
  • No tearing: Frames are only swapped during vertical blank
  • No stalls: The renderer never waits for the display to release a buffer
  • Zero-copy DMA-BUF: Frames are shared between GPU and display controller via DMA-BUF file descriptors

Each display maintains its own independent triple buffer state, allowing multiple displays to operate at different refresh rates without interference.

Core Data Structures

BufferSlot

The BufferSlot structure (defined in include/waxed/composer/buffer_slot.h) represents a single buffer in the triple-buffered system. Each slot contains:

struct BufferSlot {
    FrameHandle frame;                      // The frame handle (we own the FD)
    std::atomic<bool> in_use{false};        // True if display or worker is using this
    uint64_t sequence_number{0};            // Frame sequence counter
    uint64_t acquire_time_ms{0};            // When we acquired this frame
    uint64_t fence_generation{0};           // Generation counter for ABA prevention
    std::atomic<int> dump_ref_count{0};     // Active dump operations count
    waxed::core::utils::UniqueFd release_fence_fd;  // [OWNED] Fence FD (KMS OUT_FENCE)
    waxed::core::utils::UniqueFd pending_release_fence_fd;  // [OWNED] OUT_FENCE from previous cycle
    int64_t out_fence_storage{0};          // [KERNEL-WRITE] Persistent storage for OUT_FENCE_PTR
    uint32_t slot_index{UINT32_MAX};        // Slot index (0, 1, 2) for event dispatch
    uint32_t display_id{UINT32_MAX};        // CRITICAL: Owner display ID
    FenceClosure fence_closure;             // Pre-allocated fence closure
};

Key Characteristics:

  • Non-movable: std::atomic members prohibit move semantics. The slots are default-initialized in-place.
  • Owned FD: The frame.dma_buf_fd is dup()’d from the plugin. The slot owns this FD and closes it on reset.
  • Per-display ownership: Each slot is bound to one display via display_id, preventing cross-display slot confusion.

DisplayBufferState

The DisplayBufferState class (defined in include/waxed/composer/display_buffer_state.h) consolidates all buffer state for a single display:

class DisplayBufferState {
    static constexpr size_t TRIPLE_BUFFER_COUNT = 3;

    std::array<BufferSlot, TRIPLE_BUFFER_COUNT> slots{};
    std::array<CoreBuffer, TRIPLE_BUFFER_COUNT> swapchain_buffers{};
    std::atomic<uint32_t> write_index{0};   // Slot to acquire next frame into
    std::atomic<uint32_t> current_slot{0};  // Slot currently being displayed
    std::atomic<uint32_t> next_slot{0};     // Slot queued for next frame
    std::mutex mutex;
};

FenceClosure: Zero-Allocation Fence Handling

The FenceClosure structure enables zero-allocation fence event handling by embedding all necessary context directly in the epoll event:

struct FenceClosure {
    int fd{-1};                          // Set at submission time (varies per frame)
    uint64_t generation{0};              // Generation counter at submission (prevents ABA)
    BufferSlot* slot{nullptr};           // Set at slot creation time (constant)
    uint32_t display_id{UINT32_MAX};     // CRITICAL: Store display_id directly
};

How it works:

  1. At slot initialization, fence_closure.slot and fence_closure.display_id are set (constant)
  2. At frame submission, fence_closure.fd and fence_closure.generation are set (varies per frame)
  3. When the fence signals, epoll returns the closure pointer directly via epoll_data.ptr
  4. The handler dereferences the closure, validates the generation counter, and processes the slot

This avoids heap allocations entirely in the fence signaling path.

Slot Lifecycle

Each BufferSlot cycles through four states during its lifetime:

READY --acquire_frame()--> IN_USE --queue_frame()--> QUEUED --drmModePageFlip()--> DISPLAY_PENDING --fence signals--> READY

State Descriptions

State           | Description                                     | Who Owns It?
READY           | Slot is available for acquiring a new frame     | Compositor (free to acquire)
IN_USE          | Slot has been acquired, frame is being rendered | Render thread
QUEUED          | Frame is complete, queued for display           | Compositor (waiting for vsync)
DISPLAY_PENDING | Frame is being displayed, OUT fence pending     | Display hardware

Transitions

  1. READY -> IN_USE: acquire_frame() claims the slot via write_index
  2. IN_USE -> QUEUED: Plugin calls queue_frame(), frame is ready for presentation
  3. QUEUED -> DISPLAY_PENDING: drmModePageFlip() is called, OUT fence is created
  4. DISPLAY_PENDING -> READY: OUT fence signals, slot is available again

Fence Generation Counters (ABA Prevention)

The fence_generation counter prevents the ABA problem in fence handling:

The Problem:

  • Slot A is submitted with fence F1, generation G=0
  • Fence F1 signals, but epoll event is delayed
  • Slot A completes its lifecycle, returns to READY
  • Slot A is reused and submitted with fence F2, generation G=1
  • Delayed F1 event arrives
  • Without generation check, we might mistakenly process the stale event

The Solution:

  • Each slot submission increments fence_generation
  • FenceClosure captures the generation at submission time
  • On fence signal, compare closure.generation == slot.fence_generation
  • Mismatch means the event is stale and must be ignored

With a 64-bit counter, overflow is practically impossible (billions of years of frames at 60fps).

Out Fence Storage

The out_fence_storage field provides persistent storage for the kernel’s OUT_FENCE_PTR:

int64_t out_fence_storage{0};  // [KERNEL-WRITE] Persistent storage

Why this exists:

  • KMS drmModeAtomicCommit() with DRM_MODE_ATOMIC_NONBLOCK requires a pointer to a signed 64-bit integer (s64) for OUT_FENCE_PTR
  • The kernel writes the fence FD to this location
  • The storage must remain valid until the commit completes
  • Using a member variable ensures the storage lives as long as the slot

Usage flow:

  1. Point DRM_MODE_ATOMIC_OUT_FENCE_PTR property to &out_fence_storage
  2. Call drmModeAtomicCommit()
  3. Kernel writes fence FD to out_fence_storage
  4. Read out_fence_storage and wrap in release_fence_fd

dump_ref_count for Frame Dumping

The dump_ref_count atomic counter enables safe frame dumping:

std::atomic<int> dump_ref_count{0};

Purpose:

  • When a frame is being dumped (e.g., for debugging or screenshot), dump_ref_count is incremented
  • The slot cannot be reset or reused while dump_ref_count > 0
  • Prevents use-after-free if the dump operation is slow

Usage:

  1. Before dumping: slot->dump_ref_count++
  2. After dumping completes: slot->dump_ref_count--
  3. Check before reset: if (slot->dump_ref_count == 0) { reset(); }

Per-Display Slot Ownership

Each slot is bound to a single display via display_id:

uint32_t display_id{UINT32_MAX};  // CRITICAL: Owner display ID

Why this matters:

  • Multiple displays can have different refresh rates
  • A slot cannot be shared between displays (different modesettings)
  • Fence events are dispatched per-display via FenceClosure.display_id
  • Prevents confusion when processing fence callbacks

EBUSY Prevention Mechanism

The in_use atomic prevents concurrent access to slots:

std::atomic<bool> in_use{false};

EBUSY Scenario:

  • Renderer tries to acquire a slot that’s still being displayed
  • Display tries to flip to a slot that’s being rendered into
  • in_use acts as a lock, preventing both scenarios

Acquire pattern:

bool expected = false;
if (!slot->in_use.compare_exchange_strong(expected, true)) {
    return EBUSY;  // Slot is in use, try next slot
}

Release pattern:

slot->in_use.store(false, std::memory_order_release);

Triple Buffer State Machine

READY --acquire_frame()--> IN_USE --queue_frame()--> QUEUED --drmModePageFlip()--> DISPLAY_PENDING --fence signals--> READY

  • READY: available for acquiring a new frame; owned by the compositor
  • IN_USE: frame being rendered; owned by the render thread
  • QUEUED: frame ready for presentation; waiting for vsync
  • DISPLAY_PENDING: frame being displayed, OUT fence pending; owned by the display hardware

Three Slots Rotating

Before the page flip:

  • Slot 0: DISPLAY_PENDING (being displayed)
  • Slot 1: QUEUED (queued for next vsync)
  • Slot 2: READY (ready for acquire)

After the page flip:

  • Slot 0: READY (now available)
  • Slot 1: DISPLAY_PENDING (now being displayed)
  • Slot 2: QUEUED (queued)

Each display has its own independent set of 3 slots.

The slots rotate through states as the display progresses:

  • CURRENT: The slot currently being displayed
  • NEXT: The slot queued for the next vsync
  • WRITE: The slot ready for the renderer to acquire

Frame Lifecycle with Fences

The pipelined lifecycles of frame N and frame N+1, as a message sequence between the renderer, slot, display, GPU, and fence system:

  1. Renderer → Slot: acquire_frame() (READY → IN_USE)
  2. GPU: render frame N
  3. Renderer → Slot: queue_frame() (IN_USE → QUEUED)
  4. Display: drmModePageFlip()
  5. FenceSystem: create OUT fence F0
  6. Slot: state → DISPLAY_PENDING (frame N being displayed)
  7. Renderer → Slot: acquire_frame() on a different slot (READY → IN_USE)
  8. GPU: render frame N+1
  9. FenceSystem: fence F0 signals
  10. Slot: state → READY (slot available again)
  11. Display: submit next frame
  12. FenceSystem: create OUT fence F1

Timeline Detail:

Time | Event                                          | State Change
T0   | acquire_frame() claims slot                    | READY → IN_USE
T1   | Render complete, queue_frame()                 | IN_USE → QUEUED
T2   | drmModePageFlip() submits slot                 | Creates OUT fence F0
T3   | Fence stored in fence_closure.fd               | QUEUED → DISPLAY_PENDING
T4   | Renderer acquires next slot (now READY)        | New slot: READY → IN_USE
T5   | Fence F0 signals, epoll delivers FenceClosure  | -
T6   | Handler validates generation, marks slot READY | DISPLAY_PENDING → READY
T7   | Next page flip submits slot, cycle repeats     | READY → IN_USE

Implementation Notes

Thread Safety

  • Slot access: Protected by in_use atomic flag
  • Display state: Protected by DisplayBufferState::mutex
  • Write index: Atomic, allows lock-free slot acquisition
  • Fence closure: Read-only after submission, safe for concurrent access

Zero Allocation Hot Path

The fence signaling path is zero-allocation:

  1. FenceClosure is embedded in BufferSlot (no separate allocation)
  2. epoll_data.ptr points directly to the closure
  3. No heap allocation in the signal handler

RAII Fence Management

Fence FDs are managed via UniqueFd:

  • release_fence_fd: Current OUT fence (owned by slot)
  • pending_release_fence_fd: Previous OUT fence (passed to plugin)
  • Both close automatically on reset or destructor

Initialization

display_buffer_state.initialize_slots();

Sets up each slot with:

  • slot_index: 0, 1, 2 (for event dispatch)
  • fence_closure.slot: Points to containing slot
  • fence_closure.display_id: Owning display ID
  • All fields reset to initial state

Summary

The Waxed Triple Buffer System provides:

  1. Triple buffering for decoupled rendering and display
  2. Per-display ownership for multi-display independence
  3. Zero-allocation fence handling via embedded FenceClosure
  4. ABA prevention via 64-bit generation counters
  5. EBUSY prevention via atomic in_use flag
  6. Safe frame dumping via dump_ref_count
  7. Kernel-compatible OUT fence storage via out_fence_storage

All designed for zero-copy DMA-BUF presentation with Vulkan rendering.