Texture Streamer
Overview
The TextureStreamer is a core system component that provides asynchronous texture loading for the Waxed compositor. It keeps the GUI responsive by moving texture loading to a dedicated worker thread, so the main render thread never blocks on disk I/O or GPU uploads.
Key Design Principles:
- Non-blocking: All texture operations happen off the main thread
- Zero-GPU-wait: Textures are marked ready immediately after command submission, not after GPU completion
- GPU-side synchronization: Pipeline barriers recorded with the upload order the GPU work, so the CPU never waits
- Ring buffer architecture: Pre-allocated slots minimize runtime allocations
- Ownership transfer: Consumers take ownership of ready textures, enabling efficient resource management
TextureSlot Structure
The TextureSlot represents a single texture’s lifecycle in the streaming system. Each slot contains all Vulkan resources needed for a texture and maintains its own state machine.
Members
| Member | Type | Description |
|---|---|---|
| `image` | `vk::raii::Image` | The Vulkan image handle |
| `view` | `vk::raii::ImageView` | Image view for shader access |
| `memory` | `vk::raii::DeviceMemory` | Device-local memory backing the image |
| `descriptor_set` | `vk::raii::DescriptorSet` | Optional descriptor set for binding |
| `width` | `uint32_t` | Texture width (post-resize) |
| `height` | `uint32_t` | Texture height (post-resize) |
| `stride` | `uint32_t` | Row stride in bytes (`width * 4`) |
| `format` | `VkFormat` | Pixel format (always `VK_FORMAT_R8G8B8A8_UNORM`) |
| `sequence` | `uint64_t` | Unique request identifier |
| `state` | `std::atomic<State>` | Current slot state (see below) |
| `on_ready` | function | Optional per-slot callback |
| `dma_buf_fd` | `UniqueFd` | Exported DMA-BUF file descriptor |
| `dma_buf_modifier` | `uint64_t` | DRM format modifier (0 = `DRM_FORMAT_MOD_LINEAR`) |
State Machine
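Each TextureSlot progresses through four states. The ones referenced in this document are `Empty` (free for allocation), `Loading` (owned by the worker thread), and `Ready` (published to consumers).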
Thread Safety
- `state` is `std::atomic<State>` and can be read safely from any thread
- All other members are accessed only by the worker thread while the slot is in the `Loading` state
- After `Ready`, the slot becomes read-only for consumer threads
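The rules above amount to a publish/consume handshake on `state`. A minimal sketch, assuming release/acquire ordering (the document only says `state` is atomic): the worker writes the slot's other members, then stores `Ready` with release semantics, and a consumer that loads `Ready` with acquire semantics is guaranteed to see those writes. `SlotState` and its methods are hypothetical names, and the enum lists only the three states named in this document.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical subset: only the states named in this document (the real enum has four).
enum class State : std::uint8_t { Empty, Loading, Ready };

struct SlotState {
    std::atomic<State> state{State::Empty};

    // Worker: claim an Empty slot. The acquire pairs with the consumer's release in reset().
    bool try_claim() {
        State expected = State::Empty;
        return state.compare_exchange_strong(expected, State::Loading,
                                             std::memory_order_acquire);
    }

    // Worker: called after every other slot member has been written. The release
    // store makes those writes visible to any thread that later observes Ready.
    void publish() { state.store(State::Ready, std::memory_order_release); }

    // Consumer: the acquire load pairs with publish().
    bool is_ready() const { return state.load(std::memory_order_acquire) == State::Ready; }

    // Consumer: hand the slot back after taking its resources.
    void reset() { state.store(State::Empty, std::memory_order_release); }
};
```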
Ring Buffer Architecture
The TextureStreamer uses a fixed-size ring buffer of TextureSlot objects. This design provides:
- Predictable memory usage: Pre-allocated at initialization
- No fragmentation: Slots are reused rather than reallocated
- Cache efficiency: Contiguous memory layout
- Simple lifecycle: Slots cycle through states independently
Slot Allocation
- `find_empty_slot()`: Linear search for a slot in the `Empty` state
- Returns `nullptr` if no slot is available (the caller must retry or handle the error)
- Slots are not pre-reserved; the first `Empty` slot wins
- After `take_texture()`, the slot is reset to its default state (becomes `Empty`)
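A sketch of the allocation rule above, assuming the slots live in a vector sized once from `Config::ring_buffer_size`. `Slot` and `SlotRing` are illustrative names; the real slot carries the Vulkan resources listed earlier. Constructing a `std::vector` with a count works even though `std::atomic` is not movable, because the elements are default-constructed in place and the vector is never resized.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class State : std::uint8_t { Empty, Loading, Ready };  // hypothetical subset

struct Slot {
    std::atomic<State> state{State::Empty};
    std::uint64_t sequence = 0;
    // Vulkan resources omitted in this sketch
};

class SlotRing {
public:
    // Allocated once and never resized, so the non-movable atomics are fine.
    explicit SlotRing(std::size_t ring_buffer_size) : slots_(ring_buffer_size) {}

    // Linear search; the first Empty slot wins. nullptr means the caller must retry or fail.
    Slot* find_empty_slot() {
        for (Slot& slot : slots_) {
            if (slot.state.load(std::memory_order_acquire) == State::Empty) {
                return &slot;
            }
        }
        return nullptr;
    }

    Slot& at(std::size_t i) { return slots_.at(i); }

private:
    std::vector<Slot> slots_;
};
```

A plain load is enough when only the worker thread allocates. If several threads could call `find_empty_slot()`, the check should become a `compare_exchange` from `Empty` to `Loading` so two callers cannot claim the same slot.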
Worker Thread Operation
The worker thread runs continuously from construction until destruction, processing load requests from a thread-safe queue.
Worker Loop Flow
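A minimal sketch of such a worker loop, assuming a `std::deque` for the queue and using the synchronization members listed in the table below. `Worker`, the two-field `LoadRequest`, and the `process` callback (standing in for `load_and_submit()`) are hypothetical. Two details matter: the thread starts only after every other member is constructed, and `shutdown_requested_` is set while holding `request_mutex_`, so the worker cannot check its predicate, miss the notification, and then sleep forever.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical request layout; the real LoadRequest also carries size and mode.
struct LoadRequest {
    std::uint64_t sequence = 0;
    std::string path;
};

class Worker {
public:
    // `process` stands in for load_and_submit(): decode, stage, submit.
    explicit Worker(std::function<void(const LoadRequest&)> process)
        : process_(std::move(process)) {
        thread_ = std::thread([this] { run(); });  // all members exist by now
    }
    ~Worker() { shutdown(); }

    void push(LoadRequest req) {
        {
            std::lock_guard<std::mutex> lock(request_mutex_);
            requests_.push_back(std::move(req));
        }
        request_cv_.notify_one();
    }

    // Idempotent; the flag is set under the mutex to avoid a lost wake-up.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(request_mutex_);
            if (shutdown_requested_) return;
            shutdown_requested_ = true;
        }
        request_cv_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

private:
    void run() {
        for (;;) {
            LoadRequest req;
            {
                std::unique_lock<std::mutex> lock(request_mutex_);
                request_cv_.wait(lock, [this] {
                    return shutdown_requested_.load() || !requests_.empty();
                });
                if (shutdown_requested_) return;  // pending requests are dropped
                req = std::move(requests_.front());
                requests_.pop_front();
            }
            process_(req);  // heavy work runs without holding the lock
        }
    }

    std::function<void(const LoadRequest&)> process_;
    std::mutex request_mutex_;
    std::condition_variable request_cv_;
    std::deque<LoadRequest> requests_;
    std::atomic<bool> shutdown_requested_{false};
    std::thread thread_;
};
```

Whether the real worker drains or drops queued requests on shutdown is not specified; this sketch drops them.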
Thread Synchronization
| Component | Type | Purpose |
|---|---|---|
| `request_mutex_` | `std::mutex` | Protects the `requests_` queue |
| `request_cv_` | `std::condition_variable` | Wakes the worker for new requests |
| `shutdown_requested_` | `std::atomic<bool>` | Signals the worker to exit |
| `slot.state` | `std::atomic<State>` | Cross-thread state visibility |
Public API
Initialization
```cpp
TextureStreamer(
    vk::raii::Instance& instance,
    vk::raii::PhysicalDevice& physical_device,
    vk::raii::Device& device,
    vk::raii::Queue& transfer_queue,
    uint32_t transfer_queue_family,
    const Config& config
);
```
Note: The constructor takes references to existing RAII objects and does not take ownership of them. The caller must ensure these objects outlive the TextureStreamer.
Configuration
```cpp
struct Config {
    size_t ring_buffer_size = 3;                        // Number of pre-allocated slots
    size_t min_staging_buffer_size = 32 * 1024 * 1024;  // Minimum staging buffer size (32 MiB)
    bool enable_callback = true;                        // Enable callback notification
    TextureReadyCallback on_texture_ready;              // Global callback
};
```
Requesting a Texture Load
```cpp
auto request_load(std::string_view path,
                  uint32_t target_width,
                  uint32_t target_height,
                  int mode) -> uint64_t;
```
Parameters:
- `path`: Filesystem path to the image file (PNG, JPEG, etc., decoded via stb_image)
- `target_width`: Desired output width (0 = use original)
- `target_height`: Desired output height (0 = use original)
- `mode`: Resize mode (0 = contain, 1 = cover, 2 = stretch, 3 = tile)
Returns: Sequence number for tracking this request
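The document does not define the resize modes precisely. Assuming they follow the usual CSS-style meanings, the scale factor is the smaller per-axis ratio for contain and the larger one for cover, while stretch uses the target size directly. `Extent` and `scaled_extent` are hypothetical, and treating tile as "keep the source size" is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Extent {
    std::uint32_t width;
    std::uint32_t height;
};

// Size of the scaled source image for each mode, before any cropping or letterboxing.
// Assumes src has non-zero dimensions.
Extent scaled_extent(Extent src, std::uint32_t target_width, std::uint32_t target_height,
                     int mode) {
    const std::uint32_t tw = target_width != 0 ? target_width : src.width;     // 0 = original
    const std::uint32_t th = target_height != 0 ? target_height : src.height;  // 0 = original
    const double sx = static_cast<double>(tw) / src.width;
    const double sy = static_cast<double>(th) / src.height;
    double s = 1.0;
    switch (mode) {
        case 0: s = std::min(sx, sy); break;  // contain: whole image fits, may letterbox
        case 1: s = std::max(sx, sy); break;  // cover: fills target, overflow gets cropped
        case 2: return {tw, th};              // stretch: aspect ratio ignored
        default: return src;                  // tile (assumed): keep the source size
    }
    return {static_cast<std::uint32_t>(std::lround(src.width * s)),
            static_cast<std::uint32_t>(std::lround(src.height * s))};
}
```

For a 1920x1080 image and a 960x960 target, contain yields 960x540 and cover yields 1707x960, which would then be cropped to the target.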
Behavior:
- Generates sequence number via atomic fetch-add
- Pushes a `LoadRequest` onto the queue
- Notifies the worker thread
- Returns immediately (non-blocking)
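The behavior above maps onto a few lines of C++. This sketch assumes a `std::deque` queue and a sequence counter starting at 1; `RequestQueue` and the `LoadRequest` field layout are hypothetical. `pending_count()` is included because it reads the same lock-protected queue. A relaxed `fetch_add` is enough here: it still hands out unique values, and the mutex orders the queue itself.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <string_view>

struct LoadRequest {  // hypothetical field layout
    std::uint64_t sequence = 0;
    std::string path;
    std::uint32_t target_width = 0;
    std::uint32_t target_height = 0;
    int mode = 0;
};

class RequestQueue {
public:
    auto request_load(std::string_view path, std::uint32_t target_width,
                      std::uint32_t target_height, int mode) -> std::uint64_t {
        const std::uint64_t seq = next_sequence_.fetch_add(1, std::memory_order_relaxed);
        {
            std::lock_guard<std::mutex> lock(request_mutex_);
            requests_.push_back({seq, std::string(path), target_width, target_height, mode});
        }
        request_cv_.notify_one();  // wake the worker; never waits for the load itself
        return seq;
    }

    auto pending_count() const -> std::size_t {
        std::lock_guard<std::mutex> lock(request_mutex_);
        return requests_.size();
    }

private:
    std::atomic<std::uint64_t> next_sequence_{1};  // assumption: 0 is never a valid ID
    mutable std::mutex request_mutex_;
    std::condition_variable request_cv_;
    std::deque<LoadRequest> requests_;
};
```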
Checking Status
```cpp
auto get_ready_texture(uint64_t sequence) -> const TextureSlot*;
```
Returns a pointer to the slot if it is `Ready`, `nullptr` otherwise.
```cpp
auto get_freshest_ready() -> const TextureSlot*;
```
Returns the `Ready` texture with the highest sequence number, or `nullptr` if none is ready.
```cpp
auto is_ready(uint64_t sequence) const -> bool;
```
Simple boolean check for readiness.
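`get_freshest_ready()` can be a single pass over the ring that checks `state` before reading `sequence`. A sketch with a stripped-down, hypothetical `Slot` and a free function; the real method returns `const TextureSlot*` from the streamer's own ring.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

enum class State : std::uint8_t { Empty, Loading, Ready };  // hypothetical subset

struct Slot {
    std::atomic<State> state{State::Empty};
    std::uint64_t sequence = 0;
};

// Keep the Ready slot with the highest sequence number; nullptr if none is Ready.
const Slot* get_freshest_ready(const std::vector<Slot>& slots) {
    const Slot* best = nullptr;
    for (const Slot& slot : slots) {
        // Acquire first: sequence is only safe to read once Ready is observed.
        if (slot.state.load(std::memory_order_acquire) != State::Ready) continue;
        if (best == nullptr || slot.sequence > best->sequence) best = &slot;
    }
    return best;
}
```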
Taking Ownership
```cpp
auto take_texture(uint64_t sequence, TextureSlot& out_slot) -> bool;
```
Ownership Transfer Semantics:
- Moves all Vulkan resources from the internal slot to `out_slot`
- Resets the internal slot to its default state (`Empty`)
- The consumer now owns the texture and is responsible for its lifetime
Usage Pattern:
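```cpp
TextureSlot my_texture;
if (streamer.take_texture(sequence, my_texture)) {
    // my_texture now owns the Vulkan resources
    // Use my_texture.image, my_texture.view, etc.
}
// my_texture's RAII members destroy the resources when my_texture goes out of scope
// (not at the brace above); the GPU must be finished with them by then
```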
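One wrinkle in the move: `std::atomic` is neither copyable nor movable, so the compiler-generated move operations of a struct holding one are deleted and the state has to be handled explicitly. The sketch below uses `std::unique_ptr<int>` as a stand-in for the move-only `vk::raii` handles and assumes a single consumer thread. `Slot` and the free-function form of `take_texture` are illustrative only.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

enum class State : std::uint8_t { Empty, Loading, Ready };  // hypothetical subset

// Stand-in for vk::raii::Image / ImageView / DeviceMemory: a move-only RAII owner.
using Handle = std::unique_ptr<int>;

struct Slot {
    Handle image;
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::uint64_t sequence = 0;
    std::atomic<State> state{State::Empty};  // not movable, so handled explicitly below
};

// Assumes a single consumer thread.
bool take_texture(std::vector<Slot>& slots, std::uint64_t sequence, Slot& out_slot) {
    for (Slot& slot : slots) {
        // Check state first: sequence is only safe to read once Ready is observed.
        if (slot.state.load(std::memory_order_acquire) != State::Ready ||
            slot.sequence != sequence) {
            continue;
        }
        out_slot.image = std::move(slot.image);  // consumer now owns the resources
        out_slot.width = slot.width;
        out_slot.height = slot.height;
        out_slot.sequence = slot.sequence;
        out_slot.state.store(State::Ready, std::memory_order_relaxed);

        slot.width = slot.height = 0;  // reset the internal slot to its defaults
        slot.sequence = 0;
        slot.state.store(State::Empty, std::memory_order_release);  // worker may reuse it
        return true;
    }
    return false;
}
```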
Queue Status
```cpp
auto pending_count() const -> size_t;
```
Returns number of requests currently waiting in the queue (not yet started).
Shutdown
```cpp
void shutdown();
```
Called automatically by the destructor. Signals the worker thread to exit, waits for it to finish, and ensures outstanding GPU operations complete.
Async Load Pipeline
The “No GPU Wait” Optimization
The critical design decision: the slot is marked Ready immediately after queue.submit(), not after GPU completion.
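Why this works:
- A pipeline barrier recorded after the copy (transfer write to shader read) makes later commands on the same queue that sample the image wait until the upload has finished
- The consumer can bind the texture immediately; the GPU, not the CPU, waits as needed
- No `vkQueueWaitIdle()` or CPU-side fence wait is needed before the slot is reported `Ready`

Vulkan does not order command execution implicitly; the guarantee comes from the barrier. A barrier only orders work within one queue, so if the texture is sampled on a queue other than `transfer_queue`, the upload submission must also signal a semaphore that the rendering submission waits on, plus a queue family ownership transfer when the two families differ.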
Latency comparison:
```
Traditional (wait):
    submit -> waitIdle -> callback: ~5-15 ms
TextureStreamer (fire-and-forget):
    submit -> callback: ~0.1 ms
```
Staging Buffer Management
The staging buffer is a CPU-visible, GPU-accessible buffer used as an intermediate step for texture uploads.
Lifecycle
- Created on demand: First call to `load_and_submit()`
- Dynamically resized: Grows if the current size is insufficient
- Host-coherent: No manual flushing required
- Persisted: Reused across all loads (not recreated per texture)
Sizing Logic
```cpp
// 4 bytes per pixel (RGBA8)
size_t required_size = size_t(image_width) * image_height * 4;
if (required_size < min_staging_buffer_size) {
    required_size = min_staging_buffer_size;
}
size_t actual_size = required_size + required_size / 4;  // 25% headroom
```
The 25% headroom reduces reallocations when loading images of varying sizes.
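A sketch of the sizing rule plus a grow-only reuse policy for `ensure_staging_buffer()`. The grow-only behavior follows from "grows if current size is insufficient" but is otherwise an assumption, and the hypothetical `StagingBuffer` only tracks sizes; it does not allocate Vulkan memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Sizing rule: RGBA8, clamp to the minimum, add 25% headroom.
std::size_t staging_size_for(std::uint32_t width, std::uint32_t height, std::size_t min_size) {
    std::size_t required = static_cast<std::size_t>(width) * height * 4;
    required = std::max(required, min_size);
    return required + required / 4;
}

// Hypothetical grow-only policy for ensure_staging_buffer().
struct StagingBuffer {
    std::size_t capacity = 0;
    int allocations = 0;  // counts (re)allocations, for illustration only

    void ensure(std::uint32_t width, std::uint32_t height, std::size_t min_size) {
        const std::size_t needed = static_cast<std::size_t>(width) * height * 4;
        if (capacity != 0 && capacity >= needed) {
            // Reuse. Since the worker does not wait for the GPU, real code must still make
            // sure the previous copy has finished reading before overwriting the contents.
            return;
        }
        // Real code would recreate the vk::raii::Buffer and its memory here; the old
        // buffer must not be destroyed while a submitted copy may still read from it.
        capacity = staging_size_for(width, height, min_size);
        ++allocations;
    }
};
```

With the default 32 MiB minimum, a 1920x1080 image gets a 40 MiB buffer (32 MiB plus 25%), which also covers a later 2560x1440 image without reallocating.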
Memory Properties
```cpp
vk::MemoryPropertyFlagBits::eHostVisible |  // CPU can map and write the memory
vk::MemoryPropertyFlagBits::eHostCoherent   // Writes reach the GPU without explicit flushing
```
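These flags are matched against the device's memory types when the staging memory is allocated. The document does not show how TextureStreamer picks a type; the usual Vulkan pattern is sketched below, with plain integers standing in for `VkPhysicalDeviceMemoryProperties` so it runs without a GPU. The bit values match `VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT` (0x1), `VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT` (0x2) and `VK_MEMORY_PROPERTY_HOST_COHERENT_BIT` (0x4); `find_memory_type` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::uint32_t kDeviceLocal = 0x1;
constexpr std::uint32_t kHostVisible = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;

// type_flags[i] plays the role of memoryTypes[i].propertyFlags;
// type_bits plays the role of VkMemoryRequirements::memoryTypeBits.
std::optional<std::uint32_t> find_memory_type(const std::vector<std::uint32_t>& type_flags,
                                              std::uint32_t type_bits,
                                              std::uint32_t required) {
    for (std::uint32_t i = 0; i < type_flags.size(); ++i) {
        const bool allowed = (type_bits & (1u << i)) != 0;               // usable for this resource
        const bool has_props = (type_flags[i] & required) == required;  // all requested flags set
        if (allowed && has_props) return i;
    }
    return std::nullopt;
}
```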
DMA-BUF Export Capability
The TextureStreamer supports exporting textures as DMA-BUF file descriptors for zero-copy sharing with other system components (e.g., video encoders, display servers).
Export Process
```cpp
auto export_slot_to_dma_buf(uint64_t sequence) -> bool;
```
Requirements:
- Slot must be in the `Ready` state
- Image must be created with `VK_IMAGE_TILING_LINEAR` (already done)
- Image memory must be allocated with export info (already done)
After Export:
- `slot.dma_buf_fd` contains the exported file descriptor
- `slot.dma_buf_modifier` is set to `DRM_FORMAT_MOD_LINEAR` (0)
- FD ownership is managed by `UniqueFd` (RAII)
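`UniqueFd` itself is not shown in the document. A minimal move-only file-descriptor owner is sketched below, assuming it behaves like other RAII fd wrappers: it closes the descriptor on destruction and offers `release()` for handing the fd to code that takes ownership. The class is illustrative and may differ from the real one.

```cpp
#include <unistd.h>

#include <utility>

class UniqueFd {
public:
    UniqueFd() = default;
    explicit UniqueFd(int fd) : fd_(fd) {}

    UniqueFd(UniqueFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
    UniqueFd& operator=(UniqueFd&& other) noexcept {
        if (this != &other) reset(std::exchange(other.fd_, -1));
        return *this;
    }
    UniqueFd(const UniqueFd&) = delete;             // exactly one owner per descriptor
    UniqueFd& operator=(const UniqueFd&) = delete;
    ~UniqueFd() { reset(); }

    int get() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

    // Give up ownership without closing; the caller must close the returned fd.
    int release() { return std::exchange(fd_, -1); }

    // Close the current descriptor (if any) and take ownership of `fd`.
    void reset(int fd = -1) {
        if (fd_ >= 0) ::close(fd_);
        fd_ = fd;
    }

private:
    int fd_ = -1;
};
```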
DMA-BUF Creation Details
The final image is created with external memory capabilities:
```cpp
vk::ExternalMemoryImageCreateInfo ext_mem_info;
ext_mem_info.setHandleTypes(vk::ExternalMemoryHandleTypeFlagBits::eDmaBufEXT);

vk::ImageCreateInfo image_info;
image_info.setPNext(&ext_mem_info);
image_info.setTiling(vk::ImageTiling::eLinear);  // Linear layout, exported as DRM_FORMAT_MOD_LINEAR
```
Memory allocation includes export info:
```cpp
vk::ExportMemoryAllocateInfo export_alloc_info;
export_alloc_info.setHandleTypes(vk::ExternalMemoryHandleTypeFlagBits::eDmaBufEXT);

vk::MemoryAllocateInfo alloc_info;
alloc_info.setPNext(&export_alloc_info);
```
Integration with BackgroundService
The BackgroundService uses TextureStreamer to load and cycle through background images smoothly.
Integration Pattern
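```cpp
// At initialization
TextureStreamer streamer(instance, physical_device, device,
                         transfer_queue, transfer_queue_family,
                         {.ring_buffer_size = 3,
                          .on_texture_ready = [](auto seq, auto& slot) {
                              background.on_texture_ready(seq, slot);
                          }});

// Request next background
streamer.request_load(next_image_path, width, height, mode);

// In callback: presumably invoked from the streamer's worker thread, not the render
// thread, so the hand-off is guarded by pending_mutex_ (a hypothetical std::mutex member)
void BackgroundService::on_texture_ready(uint64_t seq, const TextureSlot& slot) {
    std::lock_guard<std::mutex> lock(pending_mutex_);
    // Take ownership when ready
    if (streamer.take_texture(seq, pending_texture_)) {
        // Schedule swap for next frame
        texture_pending_ = true;
    }
}

// During render
{
    std::lock_guard<std::mutex> lock(pending_mutex_);
    if (texture_pending_) {
        // The previous current_texture_ is destroyed here; no in-flight frame may still sample it
        current_texture_ = std::move(pending_texture_);
        texture_pending_ = false;
    }
}
```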
This pattern ensures:
- Multiple images can be in flight simultaneously
- No blocking on the render thread
- Smooth transitions between backgrounds
Thread Safety Guarantees
Safe Operations from Any Thread
- `request_load()`: lock-protected queue push
- `is_ready()`: reads atomic state
- `pending_count()`: lock-protected queue size
Worker-Thread-Only Operations
- All `TextureSlot` mutation (except `state`, and the reset performed by `take_texture()`)
- `load_and_submit()`: entire pipeline
- `ensure_staging_buffer()`: buffer management
Consumer-Thread Operations
- `get_ready_texture()`: reads slot state
- `get_freshest_ready()`: reads slot states
- `take_texture()`: moves from the slot (state must be `Ready`)
Atomicity Notes
- `slot.state` is always `std::atomic<State>` for cross-thread visibility
- `next_sequence_` uses atomic fetch-add for unique IDs
- `shutdown_requested_` is atomic for clean termination
Key Takeaways
- Zero blocking: Main thread never waits for disk I/O or GPU operations
- Fire-and-forget: Submit commands and mark ready immediately
- Ring buffer: Fixed slots for predictable memory usage
- Ownership transfer: `take_texture()` moves resources to the consumer
- DMA-BUF ready: Can export for zero-copy sharing
- Thread-safe: Careful atomic and mutex usage for correctness
- Vulkan RAII: All resources managed automatically