Waxed Display Server

Texture Streamer

Overview

The TextureStreamer is a core system component that provides asynchronous texture loading for the Waxed compositor. It enables butter-smooth GUI experiences by offloading texture loading operations to a dedicated worker thread, ensuring the main render thread never blocks on disk I/O or GPU uploads.

Key Design Principles:

  • Non-blocking: All texture operations happen off the main thread
  • Zero-GPU-wait: Textures are marked ready immediately after command submission, not after GPU completion
  • Implicit synchronization: Vulkan pipeline barriers handle GPU-side synchronization
  • Ring buffer architecture: Pre-allocated slots minimize runtime allocations
  • Ownership transfer: Consumers take ownership of ready textures, enabling efficient resource management

TextureSlot Structure

The TextureSlot represents a single texture’s lifecycle in the streaming system. Each slot contains all Vulkan resources needed for a texture and maintains its own state machine.

Members

| Member | Type | Description |
|---|---|---|
| image | vk::raii::Image | The Vulkan image handle |
| view | vk::raii::ImageView | Image view for shader access |
| memory | vk::raii::DeviceMemory | Device-local memory backing |
| descriptor_set | vk::raii::DescriptorSet | Optional descriptor set for binding |
| width | uint32_t | Texture width (post-resize) |
| height | uint32_t | Texture height (post-resize) |
| stride | uint32_t | Row stride in bytes (width * 4) |
| format | VkFormat | Pixel format (always VK_FORMAT_R8G8B8A8_UNORM) |
| sequence | uint64_t | Unique request identifier |
| state | std::atomic<State> | Current slot state (see below) |
| on_ready | function | Optional per-slot callback |
| dma_buf_fd | UniqueFd | Exported DMA-BUF file descriptor |
| dma_buf_modifier | uint64_t | DRM format modifier (0 = LINEAR) |

State Machine

Each TextureSlot progresses through four states:

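  • Empty → Loading: request_load() queues the request; the worker claims a slot via find_empty_slot()
  • Loading → Ready: load_and_submit() succeeds (disk → staging → GPU)
  • Loading → Error: load failed
  • Ready → Empty: take_texture()
  • Error → Empty: slot reclaimed

State notes:

  • Loading: the image is being loaded from disk and uploaded to the GPU
  • Ready: the texture is ready for consumption; the slot is marked Ready immediately after GPU command submission (not completion)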
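The state changes can be modelled as guarded atomic transitions. The following is a minimal sketch, not the Waxed implementation: the edge set is an interpretation of the state machine described in this section, and `is_valid_transition` and `try_transition` are hypothetical helper names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

enum class State : std::uint8_t { Empty, Loading, Ready, Error };

// Allowed edges, as read from the state machine in this section
// (an interpretation, not taken from the Waxed sources).
constexpr bool is_valid_transition(State from, State to) {
    switch (from) {
        case State::Empty:   return to == State::Loading;
        case State::Loading: return to == State::Ready || to == State::Error;
        case State::Ready:   return to == State::Empty;  // take_texture()
        case State::Error:   return to == State::Empty;  // slot reclaimed
    }
    return false;
}

// Atomically moves `state` from `from` to `to`. Fails if the edge is not
// allowed or if another thread changed the state first.
inline bool try_transition(std::atomic<State>& state, State from, State to) {
    if (!is_valid_transition(from, to)) return false;
    return state.compare_exchange_strong(from, to,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire);
}
```

Using compare-and-exchange rather than a plain store means a stale caller cannot overwrite a state that another thread has already advanced.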

Thread Safety

  • state is std::atomic<State> and can be read safely from any thread
  • All other members are accessed only by the worker thread while the slot is in the Loading state
  • Once a slot is Ready, consumer threads treat it as read-only until take_texture() moves its resources out and resets it to Empty

Ring Buffer Architecture

The TextureStreamer uses a fixed-size ring buffer of TextureSlot objects. This design provides:

  • Predictable memory usage: Pre-allocated at initialization
  • No fragmentation: Slots are reused rather than reallocated
  • Cache efficiency: Contiguous memory layout
  • Simple lifecycle: Slots cycle through states independently

Ring Buffer Layout

Ring buffer (size = config.ring_buffer_size, default = 3):

[ Slot 0 ] [ Slot 1 ] [ Slot 2 ] [ Slot N... ]

State distribution example:

  • Slot 0: Empty
  • Slot 1: Ready
  • Slot 2: Loading

Slot Allocation

  • find_empty_slot(): Linear search for Empty state slot
  • Returns nullptr if no slots available (caller must retry or handle error)
  • Slots are not pre-reserved; first Empty slot wins
  • After take_texture(), slot is reset to default state (becomes Empty)
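The allocation rules above can be sketched as follows. This is an illustration under assumptions: `MiniSlot`, `MiniRing`, `release`, and `at` are hypothetical stand-ins, and the real TextureSlot also holds Vulkan handles.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class State : std::uint8_t { Empty, Loading, Ready, Error };

// Hypothetical, stripped-down slot: only the fields needed for allocation.
struct MiniSlot {
    std::atomic<State> state{State::Empty};
    std::uint64_t sequence = 0;
};

template <std::size_t N>
class MiniRing {
public:
    // Linear search: the first slot whose state is Empty wins.
    // Returns nullptr when every slot is occupied.
    MiniSlot* find_empty_slot() {
        for (auto& slot : slots_) {
            if (slot.state.load(std::memory_order_acquire) == State::Empty) {
                return &slot;
            }
        }
        return nullptr;
    }

    // Mimics the tail end of take_texture(): the slot returns to Empty.
    void release(MiniSlot& slot) {
        assert(slot.state.load() != State::Empty);
        slot.sequence = 0;
        slot.state.store(State::Empty, std::memory_order_release);
    }

    MiniSlot& at(std::size_t i) { return slots_[i]; }

private:
    std::array<MiniSlot, N> slots_{};  // pre-allocated, contiguous
};
```

With the default of three slots, a caller that gets nullptr back has to wait until a consumer calls take_texture() on a Ready slot before the next load can start.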

Worker Thread Operation

The worker thread runs continuously from construction until destruction, processing load requests from a thread-safe queue.

Worker Loop Flow

  1. Wait on request_cv_ (condition variable) until requests_ is not empty or shutdown is requested.
  2. Pop a LoadRequest from the queue.
  3. Call find_empty_slot(). If no slot is available, log a warning and continue the loop.
  4. Mark the slot as Loading.
  5. Run load_and_submit():
       1. Load from disk (stb)
       2. Copy to staging buffer
       3. Create temp image
       4. Create final image
       5. Record command buffer
       6. Submit to GPU
  6. On success, mark the slot Ready and fire the callback. On failure, mark the slot Error and log the error.
  7. Loop back to wait.
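The loop structure can be sketched with standard C++ primitives. This is a simplified model under assumptions: `MiniRequest` and `MiniWorker` are hypothetical names, the `load` callable stands in for slot lookup plus load_and_submit(), and dropping queued requests at shutdown is a choice made for the sketch, not documented behavior.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical request type; the real LoadRequest also carries size and mode.
struct MiniRequest {
    std::uint64_t sequence = 0;
    std::string path;
};

class MiniWorker {
public:
    // `load` stands in for load_and_submit(); it returns true on success.
    explicit MiniWorker(std::function<bool(const MiniRequest&)> load)
        : load_(std::move(load)), thread_([this] { run(); }) {}
    ~MiniWorker() { shutdown(); }

    void enqueue(MiniRequest req) {
        {
            std::lock_guard<std::mutex> lock(request_mutex_);
            requests_.push_back(std::move(req));
        }
        request_cv_.notify_one();
    }

    void shutdown() {
        shutdown_requested_.store(true);
        {   // Lock briefly so the flag cannot change between the worker's
            // predicate check and its wait.
            std::lock_guard<std::mutex> lock(request_mutex_);
        }
        request_cv_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

    std::uint64_t succeeded() const { return succeeded_.load(); }
    std::uint64_t failed() const { return failed_.load(); }

private:
    void run() {
        for (;;) {
            MiniRequest req;
            {
                std::unique_lock<std::mutex> lock(request_mutex_);
                request_cv_.wait(lock, [this] {
                    return !requests_.empty() || shutdown_requested_.load();
                });
                if (shutdown_requested_.load()) return;  // sketch: drop the rest
                req = std::move(requests_.front());
                requests_.pop_front();
            }
            // The slow work runs without the queue lock held.
            if (load_(req)) ++succeeded_; else ++failed_;
        }
    }

    std::function<bool(const MiniRequest&)> load_;
    std::mutex request_mutex_;
    std::condition_variable request_cv_;
    std::deque<MiniRequest> requests_;
    std::atomic<bool> shutdown_requested_{false};
    std::atomic<std::uint64_t> succeeded_{0};
    std::atomic<std::uint64_t> failed_{0};
    std::thread thread_;  // declared last so it starts after the other members
};
```

Releasing the lock before calling `load_` is the important detail: producers can keep enqueueing while a slow disk read is in progress.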

Thread Synchronization

| Component | Type | Purpose |
|---|---|---|
| request_mutex_ | std::mutex | Protects requests_ queue |
| request_cv_ | std::condition_variable | Wakes worker for new requests |
| shutdown_requested_ | std::atomic<bool> | Signals worker to exit |
| slot.state | std::atomic<State> | Cross-thread state visibility |

Public API

Initialization

TextureStreamer(
    vk::raii::Instance& instance,
    vk::raii::PhysicalDevice& physical_device,
    vk::raii::Device& device,
    vk::raii::Queue& transfer_queue,
    uint32_t transfer_queue_family,
    const Config& config
);

Note: The constructor receives RAII references, not owned pointers. The caller must ensure these objects outlive the TextureStreamer.

Configuration

struct Config {
    size_t ring_buffer_size = 3;                        // Number of pre-allocated slots
    size_t min_staging_buffer_size = 32 * 1024 * 1024;  // Minimum staging buffer (32 MiB)
    bool enable_callback = true;                        // Enable callback notification
    TextureReadyCallback on_texture_ready;              // Global callback
};

Requesting a Texture Load

auto request_load(std::string_view path,
                  uint32_t target_width,
                  uint32_t target_height,
                  int mode) -> uint64_t;

Parameters:

  • path: Filesystem path to image file (PNG, JPEG, etc. via stb_image)
  • target_width: Desired output width (0 = use original)
  • target_height: Desired output height (0 = use original)
  • mode: Resize mode (0=contain, 1=cover, 2=stretch, 3=tile)

Returns: Sequence number for tracking this request

Behavior:

  1. Generates sequence number via atomic fetch-add
  2. Pushes LoadRequest to queue
  3. Notifies worker thread
  4. Returns immediately (non-blocking)
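The four steps of the producer side might look like the sketch below. It assumes a `QueuedLoad` record and a `RequestQueue` wrapper, both hypothetical, and it assumes sequence numbers start at 1; the source does not state the starting value.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <string_view>

// Hypothetical mirror of the queued request; field names are assumptions.
struct QueuedLoad {
    std::uint64_t sequence = 0;
    std::string path;  // owned copy: the string_view may not outlive the call
    std::uint32_t target_width = 0;
    std::uint32_t target_height = 0;
    int mode = 0;
};

class RequestQueue {
public:
    std::uint64_t request_load(std::string_view path, std::uint32_t target_width,
                               std::uint32_t target_height, int mode) {
        assert(mode >= 0 && mode <= 3);  // 0=contain, 1=cover, 2=stretch, 3=tile
        // 1. Unique id via atomic fetch-add.
        const std::uint64_t seq =
            next_sequence_.fetch_add(1, std::memory_order_relaxed);
        {
            // 2. Push under the mutex; the lock is held only for the push.
            std::lock_guard<std::mutex> lock(request_mutex_);
            requests_.push_back(
                QueuedLoad{seq, std::string(path), target_width, target_height, mode});
        }
        // 3. Wake the worker, 4. return without waiting for the load.
        request_cv_.notify_one();
        return seq;
    }

    std::size_t pending_count() const {
        std::lock_guard<std::mutex> lock(request_mutex_);
        return requests_.size();
    }

private:
    std::atomic<std::uint64_t> next_sequence_{1};  // assumed starting value
    mutable std::mutex request_mutex_;
    std::condition_variable request_cv_;
    std::deque<QueuedLoad> requests_;
};
```

Copying the path into an owned std::string matters because the caller's string_view only has to stay valid for the duration of the call, while the request may sit in the queue for much longer.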

Checking Status

auto get_ready_texture(uint64_t sequence) -> const TextureSlot*;

Returns pointer to slot if Ready, nullptr otherwise.

auto get_freshest_ready() -> const TextureSlot*;

Returns the ready texture with the highest sequence number, or nullptr.

auto is_ready(uint64_t sequence) const -> bool;

Simple boolean check for readiness.
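One plausible way to implement these lookups is a scan over the ring that considers only Ready slots. The sketch below assumes a reduced `StatusSlot` type and free functions taking the slot array; the real methods are members of TextureStreamer and may be implemented differently.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class SlotState : std::uint8_t { Empty, Loading, Ready, Error };

// Hypothetical reduced slot: state and sequence only.
struct StatusSlot {
    std::atomic<SlotState> state{SlotState::Empty};
    std::uint64_t sequence = 0;
};

// Returns the Ready slot carrying `sequence`, or nullptr.
template <std::size_t N>
const StatusSlot* get_ready_texture(const std::array<StatusSlot, N>& slots,
                                    std::uint64_t sequence) {
    for (const auto& slot : slots) {
        if (slot.state.load(std::memory_order_acquire) == SlotState::Ready &&
            slot.sequence == sequence) {
            return &slot;
        }
    }
    return nullptr;
}

// Returns the Ready slot with the highest sequence number, or nullptr.
template <std::size_t N>
const StatusSlot* get_freshest_ready(const std::array<StatusSlot, N>& slots) {
    const StatusSlot* best = nullptr;
    for (const auto& slot : slots) {
        if (slot.state.load(std::memory_order_acquire) != SlotState::Ready) continue;
        if (best == nullptr || slot.sequence > best->sequence) best = &slot;
    }
    return best;
}
```

Loading the state with acquire ordering before reading `sequence` pairs with a release store on the worker side, so a consumer that observes Ready also observes the fields written before it.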

Taking Ownership

auto take_texture(uint64_t sequence, TextureSlot& out_slot) -> bool;

Ownership Transfer Semantics:

  1. Moves all Vulkan resources from the internal slot to out_slot
  2. Resets internal slot to default state (Empty)
  3. Consumer now owns the texture and is responsible for its lifetime

Usage Pattern:

TextureSlot my_texture;
if (streamer.take_texture(sequence, my_texture)) {
    // my_texture now owns the Vulkan resources
    // Use my_texture.image, my_texture.view, etc.
} // my_texture destructors clean up automatically
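The transfer itself can be illustrated without Vulkan. In this sketch a std::unique_ptr plays the role of the move-only vk::raii handles, `OwnedSlot` is a hypothetical stand-in for TextureSlot, and the function takes the internal slot as a parameter instead of looking it up by sequence.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

enum class OwnState : std::uint8_t { Empty, Loading, Ready, Error };

// Hypothetical stand-in: unique_ptr models a move-only GPU resource handle.
struct OwnedSlot {
    std::unique_ptr<int> image;
    std::uint64_t sequence = 0;
    std::atomic<OwnState> state{OwnState::Empty};
};

bool take_texture(OwnedSlot& internal, std::uint64_t sequence, OwnedSlot& out_slot) {
    if (internal.state.load(std::memory_order_acquire) != OwnState::Ready ||
        internal.sequence != sequence) {
        return false;  // not ready yet, or the slot holds a different request
    }
    // 1. Move the resources to the consumer.
    out_slot.image = std::move(internal.image);
    out_slot.sequence = internal.sequence;
    out_slot.state.store(OwnState::Ready, std::memory_order_relaxed);
    // 2. Reset the internal slot so the worker can reuse it.
    internal.sequence = 0;
    internal.state.store(OwnState::Empty, std::memory_order_release);
    assert(internal.image == nullptr);  // moved-from handle owns nothing
    return true;
}
```

Publishing Empty last, with release ordering, keeps the worker from claiming the slot while the consumer is still moving resources out of it.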

Queue Status

auto pending_count() const -> size_t;

Returns number of requests currently waiting in the queue (not yet started).

Shutdown

void shutdown();

Called automatically by destructor. Signals worker thread to exit, waits for completion, and ensures GPU operations finish.


Async Load Pipeline

request_load() Flow

Participants: Main Thread, Request Queue, Worker Thread, Disk I/O, GPU.

  1. Main Thread: request_load(path)
  2. next_sequence_++ (atomic increment)
  3. Push to requests_ (mutex-protected)
  4. Notify request_cv_; request_load() returns immediately
  5. Wake worker thread
  6. Worker Thread: pop request
  7. find_empty_slot()
  8. Mark slot Loading
  9. stbi_load() (disk I/O, decompression)
  10. Pixel data returned
  11. Copy to staging buffer
  12. Create temp image
  13. Create final image
  14. Record command buffer
  15. queue.submit()
  16. Submit returns; the GPU continues work (upload, blit, barriers)
  17. Mark slot Ready
  18. Fire callback

The “No GPU Wait” Optimization

The critical design decision: the slot is marked Ready immediately after queue.submit(), not after GPU completion.

Why this works:

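  1. Vulkan's submission-order guarantee means commands submitted to the same queue start in the order they were submitted
  2. Pipeline barriers recorded with the upload prevent a shader from sampling the image before the data has been written
  3. The consumer can bind the texture immediately; the GPU waits as needed
  4. No vkQueueWaitIdle() or fence wait sits on the load path

These guarantees cover work submitted to the same queue. A consumer that renders on a different queue than the transfer queue still needs a semaphore to order its work after the upload.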

Latency comparison:

Traditional (wait):
  submit -> waitIdle -> callback: ~5-15ms

TextureStreamer (fire-and-forget):
  submit -> callback: ~0.1ms

Staging Buffer Management

The staging buffer is a CPU-visible, GPU-accessible buffer used as an intermediate step for texture uploads.

Lifecycle

  • Created on-demand: First call to load_and_submit()
  • Dynamically resized: Grows if current size is insufficient
  • Host-coherent: No manual flushing required
  • Persisted: Reused across all loads (not recreated per texture)

Sizing Logic

required_size = image_width * image_height * 4 (RGBA)

if (required_size < min_staging_size):
    required_size = min_staging_size

actual_size = required_size * 1.25  (25% headroom)

The 25% headroom reduces reallocations when loading images of varying sizes.
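The sizing rule translates directly into integer arithmetic. The following sketch assumes the default 32 MiB minimum from Config; `staging_size` and `needs_realloc` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMiB = 1024 * 1024;

// Bytes to allocate for the staging buffer, following the sizing rule above.
constexpr std::size_t staging_size(std::uint32_t width, std::uint32_t height,
                                   std::size_t min_staging_size = 32 * kMiB) {
    std::size_t required = static_cast<std::size_t>(width) * height * 4;  // RGBA8
    if (required < min_staging_size) required = min_staging_size;
    return required + required / 4;  // 25% headroom, no floating point
}

// The buffer only has to grow when it cannot hold the incoming image.
constexpr bool needs_realloc(std::size_t current_size,
                             std::uint32_t width, std::uint32_t height) {
    return current_size < static_cast<std::size_t>(width) * height * 4;
}
```

For example, a 1920x1080 image needs about 7.9 MiB, so the 32 MiB minimum applies and the buffer is allocated at 40 MiB. A 4096x4096 image needs 64 MiB, which forces a reallocation to 80 MiB.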

Memory Properties

vk::MemoryPropertyFlagBits::eHostVisible |   // CPU can access
vk::MemoryPropertyFlagBits::eHostCoherent   // Writes auto-visible to GPU

DMA-BUF Export Capability

The TextureStreamer supports exporting textures as DMA-BUF file descriptors for zero-copy sharing with other system components (e.g., video encoders, display servers).

Export Process

auto export_slot_to_dma_buf(uint64_t sequence) -> bool;

Requirements:

  1. Slot must be in Ready state
  2. Image must be created with VK_IMAGE_TILING_LINEAR (already done)
  3. Image memory must be allocated with export info (already done)

After Export:

  • slot.dma_buf_fd contains the exported file descriptor
  • slot.dma_buf_modifier is set to DRM_FORMAT_MOD_LINEAR (0)
  • FD ownership is managed by UniqueFd (RAII)
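RAII ownership of a file descriptor can be sketched as below. `FdGuard` is a hypothetical stand-in written for illustration; the actual UniqueFd class may expose a different interface. The sketch assumes a POSIX system.

```cpp
#include <cassert>
#include <unistd.h>  // close(), POSIX
#include <utility>   // std::exchange

// Hypothetical move-only wrapper that closes its descriptor on destruction.
class FdGuard {
public:
    FdGuard() = default;
    explicit FdGuard(int fd) : fd_(fd) {}
    FdGuard(const FdGuard&) = delete;
    FdGuard& operator=(const FdGuard&) = delete;
    FdGuard(FdGuard&& other) noexcept : fd_(other.release()) {}
    FdGuard& operator=(FdGuard&& other) noexcept {
        if (this != &other) reset(other.release());
        return *this;
    }
    ~FdGuard() { reset(); }

    int get() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

    // Gives up ownership without closing; the caller must close the fd.
    int release() { return std::exchange(fd_, -1); }

    // Closes the current descriptor (if any) and adopts `fd`.
    void reset(int fd = -1) {
        if (fd_ >= 0) ::close(fd_);
        fd_ = fd;
    }

private:
    int fd_ = -1;  // -1 means "owns nothing"
};
```

Because the wrapper is move-only, moving a slot out with take_texture() would carry the exported descriptor along with the Vulkan resources, and exactly one owner closes it.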

DMA-BUF Creation Details

The final image is created with external memory capabilities:

vk::ExternalMemoryImageCreateInfo ext_mem_info;
ext_mem_info.setHandleTypes(vk::ExternalMemoryHandleTypeFlagBits::eDmaBufEXT);

vk::ImageCreateInfo image_info;
image_info.setPNext(&ext_mem_info);
image_info.setTiling(vk::ImageTiling::eLinear);  // Required for DMA-BUF

Memory allocation includes export info:

vk::ExportMemoryAllocateInfo export_alloc_info;
export_alloc_info.setHandleTypes(vk::ExternalMemoryHandleTypeFlagBits::eDmaBufEXT);

vk::MemoryAllocateInfo alloc_info;
alloc_info.setPNext(&export_alloc_info);

Integration with BackgroundService

The BackgroundService uses TextureStreamer to load and cycle through background images smoothly.

Integration Pattern

// At initialization
TextureStreamer streamer(instance, physical_device, device,
                         transfer_queue, transfer_queue_family,
                         {.ring_buffer_size = 3,
                          // Capture the service by reference; it must outlive the streamer
                          .on_texture_ready = [&background](uint64_t seq, const TextureSlot& slot) {
                              background.on_texture_ready(seq, slot);
                          }});

// Request next background
streamer.request_load(next_image_path, width, height, mode);

// In callback (fired by the worker thread after the slot is marked Ready)
void BackgroundService::on_texture_ready(uint64_t seq, const TextureSlot& slot) {
    // Take ownership when ready
    if (streamer.take_texture(seq, pending_texture_)) {
        // Schedule swap for next frame. This flag is written here and read by
        // the render thread, so it should be a std::atomic<bool>.
        texture_pending_ = true;
    }
}

// During render
if (texture_pending_) {
    current_texture_ = std::move(pending_texture_);
    texture_pending_ = false;
}

This pattern ensures:

  • Multiple images can be in flight simultaneously
  • No blocking on the render thread
  • Smooth transitions between backgrounds

Thread Safety Guarantees

Safe Operations from Any Thread

  • request_load() - lock-protected queue push
  • is_ready() - reads atomic state
  • pending_count() - lock-protected queue size

Worker-Thread-Only Operations

  • All TextureSlot mutation (except state)
  • load_and_submit() - entire pipeline
  • ensure_staging_buffer() - buffer management

Consumer-Thread Operations

  • get_ready_texture() - reads slot state
  • get_freshest_ready() - reads slot states
  • take_texture() - moves from slot (state must be Ready)

Atomicity Notes

  • slot.state is always std::atomic<State> for cross-thread visibility
  • next_sequence_ uses atomic fetch-add for unique IDs
  • shutdown_requested_ is atomic for clean termination

Summary Diagram

Complete Request Lifecycle

  • Main Thread: calls request_load(path, w, h, mode) and returns immediately
  • TextureStreamer Core: generates the sequence number, pushes the request, and notifies the worker through the condition variable
  • Thread-Safe Request Queue: Req1 | Req2 | Req3 | ...
  • Worker Thread: dequeue request → find_empty_slot() → stbi_load() (disk I/O) → staging copy → Vulkan command recording (create temp image, create final image, buffer→image copy, blit (resize), layout transitions) → queue.submit() → mark slot Ready → fire callback
  • GPU (continues in background): executes the commands: copy staging to temp image, blit temp to final (resize), transition to shader-readable
  • Consumer Thread: get_ready_texture(seq) / take_texture(seq) → use texture in rendering


Key Takeaways

  1. Zero blocking: Main thread never waits for disk I/O or GPU operations
  2. Fire-and-forget: Submit commands and mark ready immediately
  3. Ring buffer: Fixed slots for predictable memory usage
  4. Ownership transfer: take_texture() moves resources to consumer
  5. DMA-BUF ready: Can export for zero-copy sharing
  6. Thread-safe: Careful atomic and mutex usage for correctness
  7. Vulkan RAII: All resources managed automatically