Waxed Display Server

Triple Buffer System

Overview

The Waxed compositor uses a triple-buffering strategy to manage frame presentation across multiple displays. Triple buffering provides:

  • Decoupled rendering and display: The rendering thread can work on the next frame while the current frame is being displayed
  • No tearing: Frames are only swapped during vertical blank
  • No stalls: The renderer never waits for the display to release a buffer
  • Zero-copy DMA-BUF: Frames are shared between GPU and display controller via DMA-BUF file descriptors

Each display maintains its own independent triple buffer state, allowing multiple displays to operate at different refresh rates without interference.

Core Data Structures

BufferSlot

The BufferSlot structure (defined in include/waxed/composer/buffer_slot.h) represents a single buffer in the triple-buffered system. Each slot contains:

struct BufferSlot {
    FrameHandle frame;                      // The frame handle (we own the FD)
    std::atomic<bool> in_use{false};        // True if display or worker is using this
    uint64_t sequence_number{0};            // Frame sequence counter
    uint64_t acquire_time_ms{0};            // When we acquired this frame
    uint64_t fence_generation{0};           // Generation counter for ABA prevention
    std::atomic<int> dump_ref_count{0};     // Active dump operations count
    waxed::core::utils::UniqueFd release_fence_fd;  // [OWNED] Fence FD (KMS OUT_FENCE)
    waxed::core::utils::UniqueFd pending_release_fence_fd;  // [OWNED] OUT_FENCE from previous cycle
    int64_t out_fence_storage{0};          // [KERNEL-WRITE] Persistent storage for OUT_FENCE_PTR
    uint32_t slot_index{UINT32_MAX};        // Slot index (0, 1, 2) for event dispatch
    uint32_t display_id{UINT32_MAX};        // CRITICAL: Owner display ID
    FenceClosure fence_closure;             // Pre-allocated fence closure
};

Key Characteristics:

  • Non-movable: std::atomic members prohibit move semantics. The slots are default-initialized in-place.
  • Owned FD: The frame.dma_buf_fd is dup()’d from the plugin. The slot owns this FD and closes it on reset.
  • Per-display ownership: Each slot is bound to one display via display_id, preventing cross-display slot confusion.

DisplayBufferState

The DisplayBufferState class (defined in include/waxed/composer/display_buffer_state.h) consolidates all buffer state for a single display:

class DisplayBufferState {
    static constexpr size_t TRIPLE_BUFFER_COUNT = 3;

    std::array<BufferSlot, TRIPLE_BUFFER_COUNT> slots{};
    std::array<CoreBuffer, TRIPLE_BUFFER_COUNT> swapchain_buffers{};
    std::atomic<uint32_t> write_index{0};   // Slot to acquire next frame into
    std::atomic<uint32_t> current_slot{0};  // Slot currently being displayed
    std::atomic<uint32_t> next_slot{0};     // Slot queued for next frame
    std::mutex mutex;
};

FenceClosure: Zero-Allocation Fence Handling

The FenceClosure structure enables zero-allocation fence event handling by embedding all necessary context directly in the epoll event:

struct FenceClosure {
    int fd{-1};                          // Set at submission time (varies per frame)
    uint64_t generation{0};              // Generation counter at submission (prevents ABA)
    BufferSlot* slot{nullptr};           // Set at slot creation time (constant)
    uint32_t display_id{UINT32_MAX};     // CRITICAL: Store display_id directly
};

How it works:

  1. At slot initialization, fence_closure.slot and fence_closure.display_id are set (constant)
  2. At frame submission, fence_closure.fd and fence_closure.generation are set (varies per frame)
  3. When the fence signals, epoll returns the closure pointer directly via epoll_data.ptr
  4. The handler dereferences the closure, validates the generation counter, and processes the slot

This avoids heap allocations entirely in the fence signaling path.

Slot Lifecycle

Each BufferSlot cycles through four states during its lifetime:

READY --acquire_frame()--> IN_USE --queue_frame()--> QUEUED --drmModePageFlip()--> DISPLAY_PENDING --fence signals--> READY

State Descriptions

State           | Description                                     | Who Owns It?
READY           | Slot is available for acquiring a new frame     | Compositor (free to acquire)
IN_USE          | Slot has been acquired, frame is being rendered | Render thread
QUEUED          | Frame is complete, queued for display           | Compositor (waiting for vsync)
DISPLAY_PENDING | Frame is being displayed, OUT fence pending     | Display hardware

Transitions

  1. READY -> IN_USE: acquire_frame() claims the slot via write_index
  2. IN_USE -> QUEUED: Plugin calls queue_frame(), frame is ready for presentation
  3. QUEUED -> DISPLAY_PENDING: drmModePageFlip() is called, OUT fence is created
  4. DISPLAY_PENDING -> READY: OUT fence signals, slot is available again

Fence Generation Counters (ABA Prevention)

The fence_generation counter prevents the ABA problem in fence handling:

The Problem:

  • Slot A is submitted with fence F1, generation G=0
  • Fence F1 signals, but epoll event is delayed
  • Slot A completes its lifecycle, returns to READY
  • Slot A is reused and submitted with fence F2, generation G=1
  • Delayed F1 event arrives
  • Without generation check, we might mistakenly process the stale event

The Solution:

  • Each slot submission increments fence_generation
  • FenceClosure captures the generation at submission time
  • On fence signal, compare closure.generation == slot.fence_generation
  • Mismatch means the event is stale and must be ignored

With a 64-bit counter, overflow is practically impossible (billions of years of frames at 60fps).

Out Fence Storage

The out_fence_storage field provides persistent storage for the kernel’s OUT_FENCE_PTR:

int64_t out_fence_storage{0};  // [KERNEL-WRITE] Persistent storage

Why this exists:

  • KMS drmModeAtomicCommit() with DRM_MODE_ATOMIC_NONBLOCK requires a pointer to a signed 64-bit integer (s64) for OUT_FENCE_PTR
  • The kernel writes the fence FD to this location
  • The storage must remain valid until the commit completes
  • Using a member variable ensures the storage lives as long as the slot

Usage flow:

  1. Point DRM_MODE_ATOMIC_OUT_FENCE_PTR property to &out_fence_storage
  2. Call drmModeAtomicCommit()
  3. Kernel writes fence FD to out_fence_storage
  4. Read out_fence_storage and wrap in release_fence_fd

dump_ref_count for Frame Dumping

The dump_ref_count atomic counter enables safe frame dumping:

std::atomic<int> dump_ref_count{0};

Purpose:

  • When a frame is being dumped (e.g., for debugging or screenshot), dump_ref_count is incremented
  • The slot cannot be reset or reused while dump_ref_count > 0
  • Prevents use-after-free if the dump operation is slow

Usage:

  1. Before dumping: slot->dump_ref_count++
  2. After dumping completes: slot->dump_ref_count--
  3. Check before reset: if (slot->dump_ref_count == 0) { reset(); }

Per-Display Slot Ownership

Each slot is bound to a single display via display_id:

uint32_t display_id{UINT32_MAX};  // CRITICAL: Owner display ID

Why this matters:

  • Multiple displays can have different refresh rates
  • A slot cannot be shared between displays (different modesettings)
  • Fence events are dispatched per-display via FenceClosure.display_id
  • Prevents confusion when processing fence callbacks

EBUSY Prevention Mechanism

The in_use atomic prevents concurrent access to slots:

std::atomic<bool> in_use{false};

EBUSY Scenario:

  • Renderer tries to acquire a slot that’s still being displayed
  • Display tries to flip to a slot that’s being rendered into
  • in_use acts as a lock, preventing both scenarios

Acquire pattern:

bool expected = false;
if (!slot->in_use.compare_exchange_strong(expected, true)) {
    return EBUSY;  // Slot is in use, try next slot
}

Release pattern:

slot->in_use.store(false, std::memory_order_release);

Triple Buffer State Machine

READY --acquire_frame()--> IN_USE --queue_frame()--> QUEUED --drmModePageFlip()--> DISPLAY_PENDING --fence signals--> READY

  • READY: available for acquiring a new frame; owned by the compositor
  • IN_USE: frame being rendered; owned by the render thread
  • QUEUED: frame ready for presentation; waiting for vsync
  • DISPLAY_PENDING: frame being displayed, OUT fence pending; owned by the display hardware

Three Slots Rotating

Before the page flip:

  • Slot 0: DISPLAY_PENDING (being displayed)
  • Slot 1: QUEUED (queued for next vsync)
  • Slot 2: READY (ready for acquire)

After the page flip:

  • Slot 0: READY (now available)
  • Slot 1: DISPLAY_PENDING (now being displayed)
  • Slot 2: QUEUED (queued)

Each display has its own independent set of 3 slots.

The slots rotate through states as the display progresses:

  • CURRENT: The slot currently being displayed
  • NEXT: The slot queued for the next vsync
  • WRITE: The slot ready for the renderer to acquire

Frame Lifecycle with Fences

The pipelined lifecycles of frame N and frame N+1, as a message sequence between the renderer, slot, display, GPU, and fence system:

  1. Renderer → Slot: acquire_frame() (READY → IN_USE)
  2. GPU: render frame N
  3. Renderer → Slot: queue_frame() (IN_USE → QUEUED)
  4. Display: drmModePageFlip()
  5. FenceSystem: create OUT fence F0
  6. Slot: state → DISPLAY_PENDING (frame N being displayed)
  7. Renderer → Slot: acquire_frame() on a different slot (READY → IN_USE)
  8. GPU: render frame N+1
  9. FenceSystem: fence F0 signals
  10. Slot: state → READY (slot available again)
  11. Display: submit next frame
  12. FenceSystem: create OUT fence F1

Timeline Detail:

Time | Event                                          | State Change
T0   | acquire_frame() claims slot                    | READY → IN_USE
T1   | Render complete, queue_frame()                 | IN_USE → QUEUED
T2   | drmModePageFlip() submits slot                 | Creates OUT fence F0
T3   | Fence stored in fence_closure.fd               | QUEUED → DISPLAY_PENDING
T4   | Renderer acquires next slot (now READY)        | New slot: READY → IN_USE
T5   | Fence F0 signals, epoll delivers FenceClosure  | -
T6   | Handler validates generation, marks slot READY | DISPLAY_PENDING → READY
T7   | Next page flip submits slot, cycle repeats     | READY → IN_USE

Implementation Notes

Thread Safety

  • Slot access: Protected by in_use atomic flag
  • Display state: Protected by DisplayBufferState::mutex
  • Write index: Atomic, allows lock-free slot acquisition
  • Fence closure: Read-only after submission, safe for concurrent access

Zero Allocation Hot Path

The fence signaling path is zero-allocation:

  1. FenceClosure is embedded in BufferSlot (no separate allocation)
  2. epoll_data.ptr points directly to the closure
  3. No heap allocation in the signal handler

RAII Fence Management

Fence FDs are managed via UniqueFd:

  • release_fence_fd: Current OUT fence (owned by slot)
  • pending_release_fence_fd: Previous OUT fence (passed to plugin)
  • Both close automatically on reset or destructor

Initialization

display_buffer_state.initialize_slots();

Sets up each slot with:

  • slot_index: 0, 1, 2 (for event dispatch)
  • fence_closure.slot: Points to containing slot
  • fence_closure.display_id: Owning display ID
  • All fields reset to initial state

Summary

The Waxed Triple Buffer System provides:

  1. Triple buffering for decoupled rendering and display
  2. Per-display ownership for multi-display independence
  3. Zero-allocation fence handling via embedded FenceClosure
  4. ABA prevention via 64-bit generation counters
  5. EBUSY prevention via atomic in_use flag
  6. Safe frame dumping via dump_ref_count
  7. Kernel-compatible OUT fence storage via out_fence_storage

All designed for zero-copy DMA-BUF presentation with Vulkan rendering.