Frame Pipeline

Overview

The Waxed Frame Pipeline manages the flow of rendered frames from plugins to physical displays using DRM/KMS atomic mode setting with explicit synchronization (fences). The pipeline implements zero-copy DMA-BUF sharing, triple-buffered swapchains, and GPU-to-display synchronization without CPU blocking.

Key Characteristics:

Zero-copy DMA-BUF transfer from GPU to display
Core-owned swapchain buffers (ABI v5)
Explicit synchronization via DRM fences
Triple buffering for consistent frame pacing
Per-display independent refresh rates

Architecture Overview

FrameHandle Structure

FrameHandle represents a rendered frame from a plugin. In ABI v5, the core owns the DMA-BUF and plugins render directly into core-allocated buffers.

struct FrameHandle {
    void* data = nullptr;                          // CPU-accessible pixel data (if mapped)
    core::utils::UniqueFd dma_buf_fd;              // DMA-BUF file descriptor (RAII-managed)
    uint32_t width = 0;                            // Frame width in pixels
    uint32_t height = 0;                           // Frame height in pixels
    uint32_t stride = 0;                           // Bytes per row
    uint32_t format = 0;                           // Pixel format (0 = BGRA 8-bit)
    uint64_t modifier = DRM_FORMAT_MOD_INVALID;    // DRM format modifier
    void* internal_handle = nullptr;               // Internal handle for cleanup (BufferSlot*)
    core::utils::UniqueFd render_fence_fd;         // Render complete fence (IN_FENCE_FD)
    uint64_t buffer_id = 0;                        // Stable buffer identifier (ABI v4)
    uint64_t generation = 0;                       // Content change counter (ABI v4)
};

Move-Only Semantics

FrameHandle uses RAII and move-only semantics to enforce ownership transfer:

FrameHandle() = default;
FrameHandle(FrameHandle&&) noexcept = default;
FrameHandle& operator=(FrameHandle&&) noexcept = default;

// Deleted copy operations (compile-time enforcement)
FrameHandle(const FrameHandle&) = delete;
FrameHandle& operator=(const FrameHandle&) = delete;
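
The effect of these declarations can be illustrated with a reduced model. The sketch below is not the project's code: UniqueFdSketch is a hypothetical stand-in for core::utils::UniqueFd (whose implementation is not shown here), and FrameSketch and hand_off() are invented for illustration. It shows how a move-only FD wrapper makes the enclosing frame type move-only as well.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <unistd.h>

// Hypothetical stand-in for core::utils::UniqueFd (the real class is not shown).
class UniqueFdSketch {
public:
    UniqueFdSketch() noexcept = default;
    explicit UniqueFdSketch(int fd) noexcept : fd_(fd) {}
    ~UniqueFdSketch() { reset(); }

    UniqueFdSketch(UniqueFdSketch&& other) noexcept : fd_(other.release()) {}
    UniqueFdSketch& operator=(UniqueFdSketch&& other) noexcept {
        if (this != &other) reset(other.release());
        return *this;
    }
    UniqueFdSketch(const UniqueFdSketch&) = delete;
    UniqueFdSketch& operator=(const UniqueFdSketch&) = delete;

    int get() const noexcept { return fd_; }
    bool valid() const noexcept { return fd_ >= 0; }
    int release() noexcept { return std::exchange(fd_, -1); }  // give up ownership
    void reset(int fd = -1) noexcept {                          // close old FD, adopt new one
        if (fd_ >= 0) ::close(fd_);
        fd_ = fd;
    }

private:
    int fd_ = -1;
};

// Reduced frame type: the member's deleted copy operations propagate to the struct.
struct FrameSketch {
    UniqueFdSketch dma_buf_fd;
    std::uint32_t width = 0;
    std::uint32_t height = 0;
};

static_assert(!std::is_copy_constructible_v<FrameSketch>);
static_assert(std::is_nothrow_move_constructible_v<FrameSketch>);

// Taking the frame by value forces callers to std::move() it in.
inline FrameSketch hand_off(FrameSketch frame) {
    assert(frame.dma_buf_fd.valid() && "frame must carry a live FD");
    return frame;
}
```

After `FrameSketch b = hand_off(std::move(a));` the source frame no longer holds the descriptor, so exactly one owner closes it.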

Field Descriptions

Field	Type	Purpose
`data`	`void*`	CPU-accessible pixel data (if memory-mapped)
`dma_buf_fd`	`UniqueFd`	DMA-BUF file descriptor (RAII-managed, closes on destruct)
`width`	`uint32_t`	Frame width in pixels
`height`	`uint32_t`	Frame height in pixels
`stride`	`uint32_t`	Bytes per row (row pitch)
`format`	`uint32_t`	DRM format code (e.g., `DRM_FORMAT_XBGR8888`)
`modifier`	`uint64_t`	DRM format modifier (0 = linear, or tiled modifier)
`internal_handle`	`void*`	Internal handle (points to owning `BufferSlot`)
`render_fence_fd`	`UniqueFd`	Render completion fence (IN_FENCE for DRM)
`buffer_id`	`uint64_t`	Stable buffer ID for fast-path validation
`generation`	`uint64_t`	Incremented on content change
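
As a rough illustration of how width, height, stride, and format relate, the sketch below packs a DRM FourCC the same way the fourcc_code() macro in drm_fourcc.h does and derives byte offsets for a linear buffer. FrameLayout, layout_is_valid(), linear_size_bytes(), and pixel_offset() are hypothetical helpers, not part of the Waxed API. The stride may be larger than width times bytes-per-pixel because drivers often pad rows for alignment.

```cpp
#include <cassert>
#include <cstdint>

// Same packing as fourcc_code() in drm_fourcc.h: four ASCII bytes, lowest byte first.
constexpr std::uint32_t fourcc(char a, char b, char c, char d) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(a)) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// DRM_FORMAT_XBGR8888 is defined as fourcc_code('X', 'B', '2', '4').
constexpr std::uint32_t kFormatXbgr8888 = fourcc('X', 'B', '2', '4');
static_assert(kFormatXbgr8888 == 0x34324258u);

// Hypothetical subset of the FrameHandle geometry fields.
struct FrameLayout {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t stride;  // bytes per row, may include padding
};

// A row must be at least wide enough to hold `width` pixels.
constexpr bool layout_is_valid(const FrameLayout& f, std::uint32_t bytes_per_pixel) {
    return f.width > 0 && f.height > 0 &&
           static_cast<std::uint64_t>(f.width) * bytes_per_pixel <= f.stride;
}

// Minimum byte size of a linear (untiled) buffer with this layout.
constexpr std::uint64_t linear_size_bytes(const FrameLayout& f) {
    return static_cast<std::uint64_t>(f.stride) * f.height;
}

// Byte offset of pixel (x, y) in a linear buffer.
inline std::uint64_t pixel_offset(const FrameLayout& f, std::uint32_t x, std::uint32_t y,
                                  std::uint32_t bytes_per_pixel) {
    assert(x < f.width && y < f.height);
    return static_cast<std::uint64_t>(y) * f.stride +
           static_cast<std::uint64_t>(x) * bytes_per_pixel;
}
```

These formulas apply only to linear buffers; tiled modifiers use layouts that cannot be addressed this way.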

RenderTarget Structure (ABI v5)

RenderTarget represents a core-allocated DMA-BUF that plugins render into. The core allocates and owns the buffer; the plugin receives a borrowed handle.

struct RenderTarget {
    int dma_buf_fd;          // Core-owned FD (Plugin MUST NOT close)
    uint32_t buffer_id;      // Buffer index: 0, 1, or 2 (for import caching)
    uint32_t width;          // Buffer width in pixels
    uint32_t height;         // Buffer height in pixels
    uint32_t stride;         // Row pitch in bytes
    uint32_t format;         // DRM FourCC format code
    uint64_t modifier;       // DRM format modifier
    int release_fence_fd;    // KMS completion fence (Plugin waits before writing)

    // Cursor handoff fields (ABI v9)
    uint32_t display_id;     // Display identifier (0, 1, 2, ...)
    int32_t cursor_x;        // Display-local cursor X position
    int32_t cursor_y;        // Display-local cursor Y position
};

RenderTarget Ownership Model (v9)

The core allocates DMA-BUF buffers and manages their lifecycle. Plugins receive borrowed references.

Key Rules:

Core allocates DMA-BUF via Vulkan with external memory export
Core owns the FD throughout its lifetime
Plugin receives borrowed FD, must NOT close it
Plugin waits on release_fence_fd before rendering (if valid)
Plugin returns render_fence_fd for completion signaling

DMA-BUF Ownership Model

Core-Owned Buffers (ABI v5)

In ABI v5, the core allocates all swapchain buffers. This simplifies ownership:

Core allocates buffers via Vulkan with DMA-BUF export capability
Core owns the FD for the lifetime of the buffer
Plugin borrows the FD via RenderTarget::dma_buf_fd
Plugin renders into the borrowed buffer
Plugin returns a render fence for completion signaling
Core submits the DMA-BUF to DRM for scanout

// Core allocates (DisplayManager)
auto allocate_swapchain_buffers(DisplayState& display) -> Result<void> {
    for (size_t i = 0; i < 3; ++i) {
        // Create Vulkan image with external memory
        vk::ExternalMemoryImageCreateInfo external_info = {};
        external_info.handleTypes = vk::ExternalMemoryHandleTypeFlagBits::eDmaBufEXT;

        vk::ImageCreateInfo image_info = {};
        image_info.pNext = &external_info;  // Mark the image as exportable
        image_info.flags = vk::ImageCreateFlagBits::eMutableFormat;
        image_info.imageType = vk::ImageType::e2D;
        image_info.extent.width = width;
        image_info.extent.height = height;
        image_info.extent.depth = 1;
        image_info.mipLevels = 1;
        image_info.arrayLayers = 1;
        image_info.format = vk::Format::eR8G8B8A8Unorm;
        image_info.tiling = vk::ImageTiling::eLinear;  // Linear layout, matches modifier 0 below
        image_info.initialLayout = vk::ImageLayout::eUndefined;
        image_info.usage = vk::ImageUsageFlagBits::eColorAttachment |
                          vk::ImageUsageFlagBits::eTransferSrc;
        image_info.sharingMode = vk::SharingMode::eExclusive;

        // Create the image first so its memory requirements can be queried
        vk::raii::Image vk_image(device, image_info);
        vk::MemoryRequirements memory_requirements = vk_image.getMemoryRequirements();

        // Export memory with DMA-BUF capability
        vk::ExportMemoryAllocateInfo export_info = {};
        export_info.handleTypes = vk::ExternalMemoryHandleTypeFlagBits::eDmaBufEXT;

        vk::MemoryAllocateInfo alloc_info = {};
        alloc_info.pNext = &export_info;
        alloc_info.allocationSize = memory_requirements.size;
        alloc_info.memoryTypeIndex = memory_type_index;  // Memory type selection elided

        // Allocate and bind
        vk::raii::DeviceMemory vk_memory(device, alloc_info);
        vk_image.bindMemory(*vk_memory, 0);

        // Export as DMA-BUF FD
        vk::MemoryGetFdInfoKHR get_fd_info = {};
        get_fd_info.pNext = nullptr;
        get_fd_info.memory = *vk_memory;
        get_fd_info.handleType = vk::ExternalMemoryHandleTypeFlagBits::eDmaBufEXT;

        int dma_buf_fd = device.getMemoryFdKHR(get_fd_info);

        // Store in swapchain_buffers[i]
        swapchain_buffers[i].fd.reset(dma_buf_fd);
        swapchain_buffers[i].width = width;
        swapchain_buffers[i].height = height;
        swapchain_buffers[i].stride = row_pitch;
        swapchain_buffers[i].format = DRM_FORMAT_XBGR8888;
        swapchain_buffers[i].modifier = 0;  // Linear
        swapchain_buffers[i].id = i;
    }

    return {};  // Success (assumes Result<void> value-initializes to success)
}

Transfer Semantics

When submitting to KMS, the core duplicates the FD for the frame slot:

// In render_loop.cpp, render_display()
slot->frame.dma_buf_fd.reset(dup(core_buf.fd.get()));

This allows the core to retain ownership of the original while DRM takes ownership of the duplicate. DRM closes the duplicate when the framebuffer is released.

dup() for Caching

Plugins that want to retain access to a buffer must use dup():

// Plugin wants to keep the buffer for next frame
int retained_fd = dup(render_target.dma_buf_fd);
// ... use retained_fd ...
close(retained_fd);  // Close when done

When to close:

Core: Never closes swapchain buffers (owned for lifetime)
Plugin: Never closes RenderTarget::dma_buf_fd (borrowed)
DRM: Closes the dup’d FD when framebuffer is removed
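
The closing rules above rely on the fact that dup() creates a second descriptor for the same underlying kernel object, and that closing one descriptor leaves the other usable. The sketch below demonstrates this with a pipe standing in for a DMA-BUF, since dup() behaves the same way for any file descriptor. The function name duplicate_outlives_close() is made up for this example.

```cpp
#include <cassert>
#include <unistd.h>

// Returns true if the original descriptor keeps working after its duplicate is closed.
// A pipe stands in for the DMA-BUF here; dup() semantics do not depend on the FD type.
inline bool duplicate_outlives_close() {
    int fds[2];
    if (::pipe(fds) != 0) return false;

    // Second reference to the same open file description (what a plugin would retain).
    int retained = ::dup(fds[1]);
    if (retained < 0) {
        ::close(fds[0]);
        ::close(fds[1]);
        return false;
    }
    assert(retained != fds[1]);  // distinct descriptor number, same underlying object

    const char byte = 'x';
    bool ok = ::write(retained, &byte, 1) == 1;
    ::close(retained);  // drops only the retained reference

    // The original (the "borrowed" FD in the document's terms) is unaffected.
    ok = ok && ::write(fds[1], &byte, 1) == 1;

    char buf[2] = {};
    ok = ok && ::read(fds[0], buf, sizeof(buf)) == 2;

    ::close(fds[0]);
    ::close(fds[1]);
    return ok && buf[0] == 'x' && buf[1] == 'x';
}
```

The same reasoning explains why a plugin must never close RenderTarget::dma_buf_fd itself: that descriptor number belongs to the core, and closing it would invalidate the core's reference rather than a private copy.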

v5 Render API Flow

The v5 API entry point is waxed_plugin_render_v5(), which receives a RenderTarget and returns the render completion fence FD:

auto waxed_plugin_render_v5(
    PluginState* state,
    const RenderTarget* target
) noexcept -> int;

Render Flow Sequence

Plugin Implementation Example

extern "C" auto waxed_plugin_render_v5(
    PluginState* state,
    const RenderTarget* target
) noexcept -> int {
    // 1. Wait for previous scanout to complete
    if (target->release_fence_fd >= 0) {
        sync_wait(target->release_fence_fd, -1);  // -1 = infinite wait
        close(target->release_fence_fd);  // We own this fence
    }

    // 2. Import DMA-BUF to Vulkan
    VkMemoryDedicatedAllocateInfo dedicated_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_DEDICATED_ALLOCATE_INFO,
        .image = vk_image,  // Image that is bound to this memory in step 3
    };

    VkImportMemoryFdInfoKHR import_info = {
        .sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR,
        .pNext = &dedicated_info,  // Designated initializers must follow declaration order
        .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT,
        .fd = dup(target->dma_buf_fd),  // dup() - we don't own the original
    };

    VkMemoryAllocateInfo alloc_info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .pNext = &import_info,
        .allocationSize = size,
        .memoryTypeIndex = memory_type,
    };

    VkDeviceMemory vk_memory;
    vkAllocateMemory(device, &alloc_info, nullptr, &vk_memory);

    // 3. Bind image to imported memory
    vkBindImageMemory(device, vk_image, vk_memory, 0);

    // 4. Render frame
    vkBeginCommandBuffer(cmd, &begin_info);
    // ... rendering commands ...
    vkEndCommandBuffer(cmd);
    vkQueueSubmit(queue, 1, &submit_info, VK_NULL_HANDLE);

    // 5. Create render completion fence (sync_file)
    int render_fence_fd = -1;

    VkExportSemaphoreCreateInfo export_info = {
        .sType = VK_STRUCTURE_TYPE_EXPORT_SEMAPHORE_CREATE_INFO,
        .handleTypes = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
    };

    VkSemaphoreCreateInfo sem_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,
        .pNext = &export_info,
    };

    VkSemaphore vk_semaphore;
    vkCreateSemaphore(device, &sem_info, nullptr, &vk_semaphore);

    // Submit with semaphore signal
    VkSemaphoreSubmitInfoKHR signal_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_SUBMIT_INFO_KHR,
        .semaphore = vk_semaphore,
    };
    // ... submit with signal ...

    // Export as sync_file (FD)
    VkSemaphoreGetFdInfoKHR get_fd_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
        .semaphore = vk_semaphore,
        .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
    };

    vkGetSemaphoreFdKHR(device, &get_fd_info, &render_fence_fd);

    vkDestroySemaphore(device, vk_semaphore, nullptr);

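    // 6. Free imported memory. A successful import took ownership of the dup'd FD,
    //    so freeing the memory releases it. This is only valid once the GPU work
    //    above has finished; see "Buffer ID Caching for Fast Path" for keeping the import.
    vkFreeMemory(device, vk_memory, nullptr);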

    return render_fence_fd;  // Ownership transferred to core
}
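
Step 1 of the example blocks with an infinite timeout. A bounded wait can be sketched with poll(), because a sync_file descriptor reports POLLIN once its fence has signaled (this is also how libsync's sync_wait() works). The names FenceWait and wait_fence() below are hypothetical and not part of the Waxed API.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <unistd.h>

enum class FenceWait { Signaled, TimedOut, Error };

// Waits until fence_fd becomes readable or timeout_ms elapses.
// timeout_ms follows poll() conventions: 0 = just check, -1 = wait forever.
inline FenceWait wait_fence(int fence_fd, int timeout_ms) {
    assert(fence_fd >= 0);

    pollfd pfd{};
    pfd.fd = fence_fd;
    pfd.events = POLLIN;

    for (;;) {
        const int rc = ::poll(&pfd, 1, timeout_ms);
        if (rc > 0) {
            // POLLERR / POLLNVAL indicate a failed fence or a bad descriptor.
            return (pfd.revents & (POLLERR | POLLNVAL)) ? FenceWait::Error
                                                        : FenceWait::Signaled;
        }
        if (rc == 0) return FenceWait::TimedOut;
        if (errno != EINTR && errno != EAGAIN) return FenceWait::Error;
        // Interrupted by a signal: retry. For simplicity the full timeout restarts.
    }
}
```

A plugin could call `wait_fence(target->release_fence_fd, 100)` and skip the frame on FenceWait::TimedOut instead of stalling; whether skipping is acceptable depends on the core's contract, which this sketch does not define.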

Fence Lifecycle

The frame pipeline uses two types of fences for explicit synchronization:

IN_FENCE_FD (Render Completion)

Signals when GPU rendering is complete. DRM waits for this fence before displaying the buffer.

Code Flow:

// Plugin creates render fence
int render_fence_fd = create_sync_file_from_gpu_work();

// Returns render_fence_fd via waxed_plugin_render_v5()

// Core adds to atomic request
drmModeAtomicAddProperty(req, plane_id, in_fence_fd_prop, render_fence_fd);

// DRM waits for fence before scanout
drmModeAtomicCommit(drm_fd, req, flags, nullptr);

OUT_FENCE_PTR (Scanout Completion)

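Signals when the atomic commit has taken effect, that is, when the newly submitted buffer has replaced the previous one on screen. The buffer that was displayed before is then no longer being scanned out, and the next frame must wait for this fence before rendering into it.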

Code Flow:

// In submit_frame(), slot->out_fence_storage is persistent
slot->out_fence_storage = -1;

// Pass pointer to kernel
drmModeAtomicAddProperty(req, crtc_id, out_fence_ptr_prop,
                         reinterpret_cast<uint64_t>(&slot->out_fence_storage));

// Kernel writes fence FD to slot->out_fence_storage during commit
drmModeAtomicCommit(drm_fd, req, flags, nullptr);

// Extract fence (may have AMDGPU encoding) and keep it for the plugin
int fence_fd = extract_fence_from_storage(slot->out_fence_storage);
slot->pending_release_fence_fd.reset(fence_fd);

// Pass to plugin as release_fence_fd in next RenderTarget
target.release_fence_fd = slot->pending_release_fence_fd.release();
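
The extraction step is not shown in this document. One possible reading, offered here only as an assumption, follows from the DRM uAPI documentation, which describes OUT_FENCE_PTR as a pointer to a signed 32-bit value. If the storage is a 64-bit integer preset to -1, a 32-bit store changes only four of its eight bytes, so the raw 64-bit value is not a usable FD. The sketch below reads back just those four bytes. extract_fence_sketch() and simulate_kernel_store() are hypothetical helpers; the project's real extract_fence_from_storage() may do something different.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical version of the extraction step: read the first four bytes of the
// storage as a signed 32-bit FD. Returns -1 if no valid FD was written.
inline int extract_fence_sketch(const std::int64_t& storage) {
    std::int32_t fd = -1;
    std::memcpy(&fd, &storage, sizeof(fd));  // the bytes a 32-bit store would touch
    return fd >= 0 ? fd : -1;
}

// Test helper that mimics a 32-bit store into the start of the 64-bit slot.
inline void simulate_kernel_store(std::int64_t& storage, std::int32_t fd) {
    assert(fd >= 0);
    std::memcpy(&storage, &fd, sizeof(fd));
}
```

With the storage preset to -1 and a simulated store of FD 42, the raw 64-bit value differs from 42 while extract_fence_sketch() recovers 42.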

Fence State Flow

Buffer ID Caching for Fast Path

Plugins can cache imported DMA-BUFs to avoid re-importing on every frame. The buffer_id field identifies which core buffer is being used (0, 1, or 2).

// Plugin maintains cache
struct CachedImport {
    uint32_t buffer_id;
    VkDeviceMemory vk_memory;
    VkImage vk_image;
    bool valid;
};
std::array<CachedImport, 3> import_cache;

extern "C" auto waxed_plugin_render_v5(
    PluginState* state,
    const RenderTarget* target
) noexcept -> int {
    uint32_t buffer_id = target->buffer_id;  // 0, 1, or 2

    CachedImport& entry = import_cache[buffer_id];

    if (!entry.valid) {
        // Slow path: import the DMA-BUF once and remember it
        entry.buffer_id = buffer_id;
        entry.vk_memory = import_dma_buf(target);
        // ... create entry.vk_image and bind it to entry.vk_memory ...
        entry.valid = true;
    }

    // Fast path on later frames: reuse the cached import
    VkImage image = entry.vk_image;

    // ... render ...

    return render_fence_fd;
}

Important: Cached imports become stale on display configuration changes (hotplug, resolution change). Plugins should clear the cache on waxed_plugin_visibility_changed(false, true) or when the target's dimensions no longer match the cached import.
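
The invalidation rule can be sketched without any Vulkan code. In the example below, an integer handle stands in for the VkDeviceMemory/VkImage pair, and the import itself is supplied by the caller. TargetInfo, CacheEntry, and ImportCache are hypothetical names; only the three-entry layout and the dimension check come from the text above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical subset of RenderTarget that matters for cache validation.
struct TargetInfo {
    std::uint32_t buffer_id;
    std::uint32_t width;
    std::uint32_t height;
};

struct CacheEntry {
    bool valid = false;
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    int import_handle = 0;  // stand-in for the VkDeviceMemory / VkImage pair
};

class ImportCache {
public:
    // Returns the cached handle, calling import_fn only when the entry is
    // missing or was created for different dimensions.
    template <class ImportFn>
    int acquire(const TargetInfo& target, ImportFn&& import_fn) {
        assert(target.buffer_id < entries_.size());
        CacheEntry& entry = entries_[target.buffer_id];
        const bool stale =
            entry.valid && (entry.width != target.width || entry.height != target.height);
        if (!entry.valid || stale) {
            // A real plugin would destroy the old image and free its memory here.
            entry = CacheEntry{true, target.width, target.height, import_fn(target)};
            ++imports_;
        }
        return entry.import_handle;
    }

    // Drop everything, e.g. when the plugin is told the display configuration changed.
    void clear() { entries_.fill(CacheEntry{}); }

    std::size_t import_count() const { return imports_; }

private:
    std::array<CacheEntry, 3> entries_{};
    std::size_t imports_ = 0;
};
```

Rendering the same target twice triggers one import; changing the dimensions or calling clear() forces a fresh import on the next frame.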

Frame Lifecycle Diagrams

Frame Ownership Transfer

Fence Signaling Flow

Timeline of Frame Through Pipeline

Triple Buffer Slot State

BufferSlot tracks the state of each buffer in the triple-buffered swapchain:

struct BufferSlot {
    FrameHandle frame;                          // Frame data
    std::atomic<bool> in_use{false};           // True if display or worker is using
    uint64_t sequence_number{0};               // Frame sequence counter
    uint64_t acquire_time_ms{0};               // When we acquired this frame
    uint64_t fence_generation{0};              // Generation counter for ABA prevention
    std::atomic<int> dump_ref_count{0};        // Active dump operations count
    waxed::core::utils::UniqueFd release_fence_fd;     // OUT_FENCE (KMS completion)
    waxed::core::utils::UniqueFd pending_release_fence_fd;  // OUT_FENCE for plugin
    int64_t out_fence_storage{0};              // Persistent storage for OUT_FENCE_PTR
    uint32_t slot_index{UINT32_MAX};           // Slot index (0, 1, 2)
    uint32_t display_id{UINT32_MAX};           // Owner display ID
    FenceClosure fence_closure;                // Pre-allocated fence event data
};
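
How the in_use flag might be used to claim a slot can be sketched as follows. This is an assumption about usage, not the project's implementation: SlotSketch keeps only two of the BufferSlot fields, and SlotPool with its acquire() and release() methods is invented for the example. The sketch also assumes a single thread acquires slots, so the sequence counter is a plain integer.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Reduced stand-in for BufferSlot with only the fields used here.
struct SlotSketch {
    std::atomic<bool> in_use{false};
    std::uint64_t sequence_number{0};
};

class SlotPool {
public:
    // Claims the first free slot and stamps it with the next sequence number.
    // Returns std::nullopt when all three slots are busy.
    std::optional<std::size_t> acquire() {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            bool expected = false;
            // Only one caller can flip in_use from false to true.
            if (slots_[i].in_use.compare_exchange_strong(expected, true,
                                                         std::memory_order_acq_rel)) {
                slots_[i].sequence_number = ++next_sequence_;
                return i;
            }
        }
        return std::nullopt;
    }

    // Marks the slot as free again, e.g. once its release fence has signaled.
    void release(std::size_t index) {
        assert(index < slots_.size());
        slots_[index].in_use.store(false, std::memory_order_release);
    }

    std::uint64_t sequence_of(std::size_t index) const {
        assert(index < slots_.size());
        return slots_[index].sequence_number;
    }

private:
    std::array<SlotSketch, 3> slots_{};
    std::uint64_t next_sequence_ = 0;  // assumes a single acquiring thread
};
```

With three slots, a fourth acquire() fails until one slot is released, which is the back-pressure that triple buffering imposes on the producer.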

Slot States

Summary

The Waxed Frame Pipeline provides:

Zero-copy rendering via DMA-BUF sharing between GPU and display
Core-owned buffers simplifying ownership and lifetime management
Explicit synchronization via IN_FENCE and OUT_FENCE for GPU-Display coordination
Triple buffering for consistent frame pacing
Per-display VSync enabling mixed refresh rate configurations