Status note:
- This document started as an implementation plan.
- The most important pacing issue was ultimately not solved by two frames in flight.
- The practical fixes were:
- removing blocking/resize behavior from the DefconDraw Vulkan upload path
- preferring `VK_PRESENT_MODE_FIFO_KHR`
- using `caps.minImageCount` for FIFO swapchains instead of `minImageCount + 1`
- Two frames in flight were prototyped, but are intentionally not the active shipping path right now. The app currently runs with one frame in flight.
This document captures the implementation plan for improving frame pacing and output smoothness in v-type without shifting focus to shader micro-optimization.
The priorities here are:
- Remove structural CPU/GPU lockstep.
- Remove blocking behavior from active frame paths.
- Stabilize dynamic upload behavior.
- Improve perceptual smoothness with render interpolation.
- Keep resource creation/recreation out of steady-state gameplay.
The renderer currently uses one global set of sync objects:
- `image_available` semaphore
- `render_finished` semaphore
- `in_flight` fence

and waits on that fence every frame before acquiring the next image.
Effect:
- CPU and GPU run in lockstep: each frame's CPU work must wait for the previous frame's GPU work to finish.
- Any long GPU frame immediately stalls the CPU.
- This looked like the main structural problem initially, but in practice it was not the dominant shipping hitch source.
Relevant code:
- `src/main.c:create_sync(...)`
- `src/main.c:record_submit_present(...)`
- `src/main.c:main(...)`
- `src/main.c:create_vg_context(...)`
The vg Vulkan backend currently:
- grows a single vertex buffer on demand
- calls `vkDeviceWaitIdle()` before reallocating it
- maps, copies, and unmaps vertex memory every frame
Effect:
- rare but severe hitches when the buffer grows
- avoidable per-frame CPU/GPU synchronization pressure
- no clean path to multi-frame upload ownership
Relevant code:
- `DefconDraw/src/backends/vulkan/vg_vk.c:vg_vk_ensure_vertex_buffer(...)`
- `DefconDraw/src/backends/vulkan/vg_vk.c:vg_vk_upload_vertices(...)`
Status:
- Fixed.
- The backend now uses persistently mapped per-frame upload buffers with fixed steady-state capacity.
- The old active-path `vkDeviceWaitIdle()` behavior was removed from normal gameplay rendering.
The game sim runs at a fixed step, but rendering uses the latest sim state directly.
Effect:
- motion can appear stepped even when frame time is stable
- high-refresh output is smoother than low-refresh output, but still not maximally smooth
- improvements to frame pacing will not fully translate into perceived smoothness unless interpolation is added
Relevant code:
- `src/main.c` fixed-step loop
- `src/render.h:render_metrics`
- `src/render.c`
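For reference, the fixed-step accumulator pattern involved here can be sketched as below. The names (`sim_clock`, `SIM_DT_S`, `sim_advance`) are illustrative, not the actual identifiers in `src/main.c`; the key point is that the leftover accumulator fraction is exactly the interpolation alpha that later phases consume.

```c
#include <assert.h>

/* Illustrative fixed-step clock; SIM_DT_S is an assumed 60 Hz step. */
#define SIM_DT_S (1.0 / 60.0)

typedef struct {
    double sim_accum_s; /* wall time not yet consumed by fixed steps */
    int steps_run;      /* stands in for "fixed sim updates executed" */
} sim_clock;

/* Advance the fixed-step sim by one rendered frame's wall time.
 * Returns the leftover fraction of a step, usable as sim_alpha. */
static double sim_advance(sim_clock *c, double frame_dt_s)
{
    c->sim_accum_s += frame_dt_s;
    while (c->sim_accum_s >= SIM_DT_S) {
        c->steps_run++;            /* the fixed sim update would run here */
        c->sim_accum_s -= SIM_DT_S;
    }
    return c->sim_accum_s / SIM_DT_S;
}
```

Rendering directly from the latest sim state ignores that leftover fraction, which is what produces stepped motion even at stable frame times.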
Status:
- Partially fixed.
- Camera, player, enemies, bullets, enemy bullets, and missiles are now interpolated for rendering.
- Secondary visuals can still be extended later if needed.
Some resource validation/loading remains in the steady-state render loop.
Effect:
- normally cheap, but dangerous if a resource change occurs during active play
- can introduce visible stalls unrelated to actual frame rendering
Relevant code:
- `src/main.c:ensure_active_structure_tile_resources(...)`
- frame loop call site in `src/main.c`
Status:
- Fixed for the known structure-tile path.
- Resource sync for structure-tile atlases was moved to explicit transition points instead of running as an unconditional steady-state frame-loop guard.
The remaining major hitch pattern turned out to be present-side pacing, not scene cost.
Observed behavior from instrumentation:
- scene GPU time stayed low, often around 1-4 ms
- some bad hitches were dominated by `vkQueuePresentKHR`
- the next frame would often inherit the stall as fence wait time
The effective fixes were:
- prefer `VK_PRESENT_MODE_FIFO_KHR` instead of `VK_PRESENT_MODE_MAILBOX_KHR`
- when using FIFO, request `caps.minImageCount` instead of `minImageCount + 1`
Reason:
- deeper FIFO buffering increased latency and made pacing less stable for this game
- this project is not GPU-bound in steady play, so tighter swapchain buffering gave better results
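The image-count policy can be sketched with plain integers, so the logic is testable without a Vulkan device. The struct below mirrors the two relevant `VkSurfaceCapabilitiesKHR` fields; per the Vulkan spec, `maxImageCount == 0` means the surface imposes no upper limit.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the relevant VkSurfaceCapabilitiesKHR fields. */
typedef struct {
    uint32_t minImageCount;
    uint32_t maxImageCount; /* 0 == no upper limit */
} surface_caps;

/* FIFO: request the minimum for tighter pacing; otherwise min + 1.
 * Always clamp to the surface's maximum when one exists. */
static uint32_t choose_image_count(surface_caps caps, int is_fifo)
{
    uint32_t count = is_fifo ? caps.minImageCount : caps.minImageCount + 1;
    if (caps.maxImageCount != 0 && count > caps.maxImageCount)
        count = caps.maxImageCount;
    return count;
}
```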
Current status:
- Implemented experimentally, then backed out as the active mode.
- The current app configuration remains one frame in flight.
- Keep this phase deferred until there is a concrete reason to revisit it and the renderer/resource lifetime model is audited again.
Goal:
- decouple CPU submission from GPU completion enough to absorb transient long frames
- preserve existing behavior as much as possible while removing hard lockstep
Target model:
- `FRAME_OVERLAP = 2`
- per-frame sync objects
- per-frame command-buffer ownership
- per-frame upload ownership
Add a small per-frame struct in `src/main.c`, for example:

```c
typedef struct frame_sync {
    VkSemaphore image_available;
    VkSemaphore render_finished;
    VkFence in_flight;
} frame_sync;
```

and a frame index:

```c
uint32_t current_frame;
frame_sync frames[2];
```

If later needed, extend this to include per-frame command buffer, per-frame query pool, and per-frame transient upload state.
Add per-image tracking:
```c
VkFence images_in_flight[APP_MAX_SWAPCHAIN_IMAGES];
```

Purpose:
- if an acquired swapchain image is still in flight from an older frame, wait only on that image's fence
- avoid assuming one global fence protects all swapchain images
Change `create_sync(...)` to allocate two semaphore pairs and two fences instead of one.
Destroy all of them in cleanup paths.
Current shape:
- wait one fence
- acquire image
- record
- reset same fence
- submit
- present
Replace with standard two-frame pattern:
- Select `frame = frames[current_frame]`
- Wait on `frame.in_flight`
- Acquire swapchain image using `frame.image_available`
- If `images_in_flight[image_index] != VK_NULL_HANDLE`, wait on that fence
- Set `images_in_flight[image_index] = frame.in_flight`
- Reset `frame.in_flight`
- Submit using `frame.image_available` and `frame.render_finished`
- Present waiting on `frame.render_finished`
- Advance `current_frame = (current_frame + 1) % FRAME_OVERLAP`
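The per-image fence bookkeeping in that pattern can be simulated with plain integers standing in for fences (`-1` playing the role of `VK_NULL_HANDLE`). This is a logic sketch only, not the actual `src/main.c` code; it checks that reacquiring an image returns the frame slot whose fence must be waited on first.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_OVERLAP 2
#define MAX_IMAGES 8

typedef struct {
    int images_in_flight[MAX_IMAGES]; /* owning frame slot per image, or -1 */
    uint32_t current_frame;
} pacing_state;

static void pacing_init(pacing_state *s, uint32_t image_count)
{
    for (uint32_t i = 0; i < image_count; i++)
        s->images_in_flight[i] = -1; /* -1 stands in for VK_NULL_HANDLE */
    s->current_frame = 0;
}

/* Returns the frame slot whose fence must be waited on before reusing
 * image_index (or -1 if none), records the new ownership, and advances
 * current_frame, mirroring the step list above. */
static int pacing_acquire(pacing_state *s, uint32_t image_index)
{
    int wait_slot = s->images_in_flight[image_index];
    s->images_in_flight[image_index] = (int)s->current_frame;
    s->current_frame = (s->current_frame + 1) % FRAME_OVERLAP;
    return wait_slot;
}
```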
Change:

```c
desc.api.vulkan.max_frames_in_flight = 1;
```

to:

```c
desc.api.vulkan.max_frames_in_flight = 2;
```
This should happen together with the backend upload rework below. If done alone, it improves the API contract but not the backend ownership model.
After Phase 1:
- game still builds and runs
- no use-after-submit on command buffers or uploads
- no visual regressions in gameplay/menu/editor
- if re-enabled in the future, frame pacing should be less sensitive to occasional long GPU frames
Current status:
- Completed for the active Vulkan gameplay path.
Goal:
- no `vkDeviceWaitIdle()` in active rendering
- no shared mutable upload buffer with implicit single-frame ownership
This is the highest-risk structural issue after frames-in-flight.
Before any structural rework, fix the backend bug where the upload memory type is chosen against `0xffffffff` instead of the actual `memoryTypeBits`.
Correct behavior:
- call `vkGetBufferMemoryRequirements(...)`
- choose a memory type from `req.memoryTypeBits`
- require host-visible/coherent for upload memory
This should be done whether or not the rest of the upload redesign lands immediately.
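A minimal sketch of the correct selection logic, with plain integers standing in for the Vulkan structures: `memory_type_bits` corresponds to `VkMemoryRequirements::memoryTypeBits`, and `type_flags[i]` to each memory type's `propertyFlags` from `VkPhysicalDeviceMemoryProperties`.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the first memory type that is both allowed by the resource's
 * memoryTypeBits mask AND has all required property flags. */
static int find_memory_type(uint32_t memory_type_bits,
                            uint32_t required_flags,
                            const uint32_t *type_flags,
                            uint32_t type_count)
{
    for (uint32_t i = 0; i < type_count; i++) {
        int allowed   = (memory_type_bits & (1u << i)) != 0;
        int has_props = (type_flags[i] & required_flags) == required_flags;
        if (allowed && has_props)
            return (int)i;
    }
    return -1; /* fail explicitly instead of silently using type 0 */
}
```

Passing `0xffffffff` as the mask makes every type "allowed", which happens to work until a driver orders its memory types differently; honoring `memoryTypeBits` is the correct contract.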
Recommended direction:
- one upload buffer per frame in flight
- each persistently mapped at creation
- each sized for expected peak vector geometry
Suggested backend struct change in `DefconDraw/src/backends/vulkan/vg_vk.c`:

```c
typedef struct vg_vk_frame_upload {
    vg_vk_gpu_buffer vertex_buffer;
    void* mapped;
} vg_vk_frame_upload;
```

and:

```c
vg_vk_frame_upload frame_uploads[VG_MAX_FRAMES_IN_FLIGHT];
uint32_t frame_slot;
```

The exact array sizing should follow the backend's configured `max_frames_in_flight`.
At `vg_begin_frame(...)` or an equivalent frame-start point:
- choose backend upload slot from frame index modulo frames-in-flight
- reset only CPU-side counters for that slot
- do not destroy/recreate buffers during active frames
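The slot rotation can be sketched as follows; `frame_upload_state` and `begin_frame_slot` are illustrative names, not the backend's actual API. The point is that only CPU-side counters reset at frame start, never the buffers themselves.

```c
#include <assert.h>
#include <stdint.h>

#define VG_MAX_FRAMES_IN_FLIGHT 2

typedef struct {
    uint32_t vertex_count; /* CPU-side write cursor for this slot */
} frame_upload_state;

/* Choose the upload slot for this frame and reset only its CPU cursor;
 * the persistently mapped buffer behind each slot stays alive. */
static uint32_t begin_frame_slot(frame_upload_state uploads[],
                                 uint64_t frame_index,
                                 uint32_t frames_in_flight)
{
    uint32_t slot = (uint32_t)(frame_index % frames_in_flight);
    uploads[slot].vertex_count = 0;
    return slot;
}
```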
Persistently map each host-visible upload buffer once at creation time.
Per frame:
- memcpy into the already-mapped region
- flush only if memory is not coherent
Current code assumes coherent memory, which is acceptable if preserved. If non-coherent support is added later, explicit flushes can be introduced.
Preferred behavior:
- choose a practical initial capacity based on current workload
- if exceeded, allocate a larger replacement buffer for future frames without `vkDeviceWaitIdle()`
- switch over only once frame ownership makes it safe
Pragmatic first step:
- significantly overprovision initial size
- only resize outside active gameplay or during explicit renderer recreation
That avoids stall spikes immediately, even before a fully elegant allocator lands.
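One possible shape for the stall-free growth path, with `malloc`/`free` standing in for GPU buffer allocation: the replacement buffer is used for future frames immediately, while the old buffer is retired only after `frames_in_flight` more frames, once no in-flight frame can still reference it. All names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void *mem;
    size_t capacity;
    void *retired;         /* old buffer still owned by in-flight frames */
    uint64_t retire_frame; /* frame index at which `retired` may be freed */
} grow_buffer;

/* Grow by doubling when needed; keep the old allocation alive until
 * frame ownership guarantees nothing references it. */
static void grow_buffer_ensure(grow_buffer *b, size_t needed,
                               uint64_t frame_index, uint32_t frames_in_flight)
{
    if (needed <= b->capacity)
        return;
    size_t new_cap = b->capacity ? b->capacity : 1;
    while (new_cap < needed)
        new_cap *= 2;
    b->retired = b->mem;
    b->retire_frame = frame_index + frames_in_flight;
    b->mem = malloc(new_cap);
    b->capacity = new_cap;
}

/* Call once per frame; frees the retired buffer when it is safe. */
static void grow_buffer_collect(grow_buffer *b, uint64_t frame_index)
{
    if (b->retired && frame_index >= b->retire_frame) {
        free(b->retired);
        b->retired = NULL;
    }
}
```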
After Phase 2:
- no `vkDeviceWaitIdle()` in any steady-state frame submission path
- no per-frame map/unmap on the `vg` vector upload path
- vector rendering still matches current output
- no steady-state gameplay allocations on the active Vulkan vector upload path
Current status:
- Completed for the main moving gameplay objects.
Goal:
- improve perceived smoothness independently of GPU utilization
- make motion smoother at 120 Hz and above
Extend `render_metrics` with:

```c
float sim_alpha;
```

where:
- `sim_alpha = sim_accum_s / sim_fixed_dt_s`
- clamped to `[0, 1]`
Do not try to interpolate the entire game_state struct blindly.
Instead, add explicit render snapshots for the objects that matter visually:
- player position/orientation
- enemies
- bullets
- enemy bullets
- camera
- particles if needed
Recommended approach:
- store previous transform state before each fixed sim update
- store current transform state after update
- render as `lerp(prev, curr, sim_alpha)`
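A minimal render-only interpolation sketch; `vec2` and `render_snapshot` are illustrative, not the actual `game_state` types. Snapshots are captured around the fixed-step update, and only the rendered position is blended.

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y; } vec2;

typedef struct {
    vec2 prev; /* captured before the fixed sim update */
    vec2 curr; /* captured after the fixed sim update */
} render_snapshot;

static float lerpf(float a, float b, float t) { return a + (b - a) * t; }

/* Position to render between two sim boundaries; gameplay state untouched. */
static vec2 snapshot_lerp(const render_snapshot *s, float sim_alpha)
{
    if (sim_alpha < 0.0f) sim_alpha = 0.0f; /* clamp to [0, 1] */
    if (sim_alpha > 1.0f) sim_alpha = 1.0f;
    vec2 out = { lerpf(s->prev.x, s->curr.x, sim_alpha),
                 lerpf(s->prev.y, s->curr.y, sim_alpha) };
    return out;
}
```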
Interpolation should be render-only.
Do not:
- change gameplay collision timing
- change spawn timing
- modify fixed-step simulation behavior
Start with:
- camera
- player
- enemies
- projectiles
These deliver most of the visible benefit.
After Phase 3:
- gameplay logic remains unchanged
- motion appears smoother at steady 120 Hz
- no visible rubber-banding or temporal lag
Current status:
- Completed for the known structure-tile resource path.
Goal:
- make the steady-state frame loop purely update/record/submit/present
Current known item:
- `ensure_active_structure_tile_resources(...)`
There may be others worth auditing after the primary work lands.
Examples:
- menu transitions
- level load
- atlas change in editor
- swapchain recreation
If a resource must change during play, do it through an explicit “renderer dirty” path, not opportunistically every frame.
Do not add hidden fallback behavior.
If a required resource is missing:
- fail explicitly
- log clearly
- rebuild at a deliberate synchronization point
That matches project policy in AGENTS.md.
After Phase 4:
- no heavyweight create/destroy path runs from the main steady-state render loop
- level/editor transitions remain correct
Current status:
- Implemented and useful enough to keep.
- CPU hitch timing, low-perturbation rolling trace capture, and GPU timestamps are now part of the debug toolbox.
Goal:
- verify pacing improvements with data
- avoid relying on average FPS alone
Track and optionally display:
- frame time
- sim time
- record time
- submit/present wait time
Use Vulkan timestamp queries around:
- scene pass
- split background passes
- bloom pass
- composite pass
This is not for shader micro-optimization. It is to confirm whether pacing spikes come from:
- GPU work expansion
- queue backpressure
- synchronization stalls
Primary success metrics:
- lower 99th percentile frame time
- reduced max frame spikes
- less present jitter under steady play
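The 99th-percentile metric can be computed with a simple sort over a recent sample window; this is a debug-HUD sketch (index-based percentile, floor of `pct/100 * n`), not part of the shipping hot path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_float(const void *a, const void *b)
{
    float fa = *(const float *)a, fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

/* Index-based percentile over n samples; pct in [0, 100], n > 0. */
static float frame_time_percentile(const float *samples_ms, size_t n, float pct)
{
    float *sorted = malloc(n * sizeof *sorted);
    memcpy(sorted, samples_ms, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_float);
    size_t rank = (size_t)((pct / 100.0f) * (float)n);
    if (rank >= n) rank = n - 1;
    float v = sorted[rank];
    free(sorted);
    return v;
}
```

A single present hitch barely moves the average, but shows up directly in p99, which is why p99 is the primary pacing metric here.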
What actually worked in practice:
- Fix VG memory-type selection correctness.
- Rework VG uploads to per-frame, persistently mapped fixed-capacity buffers.
- Add render interpolation for the main moving gameplay objects.
- Remove steady-state structure-tile resource churn from the frame loop.
- Add timing instrumentation and GPU timestamps.
- Switch swapchain policy to:
  - prefer `VK_PRESENT_MODE_FIFO_KHR`
  - use `caps.minImageCount` for FIFO swapchains
- Leave two frames in flight deferred unless future evidence justifies reopening that work.
- offscreen resources are currently shared globally
- command buffer and upload ownership assumptions are currently one-frame oriented
- resource lifetime bugs may only appear under load
Mitigation:
- switch to two frames, not three, initially
- keep per-frame ownership explicit
- validate on menu, gameplay, level editor, and swapchain recreate
- stale state if previous/current snapshots are not updated consistently
- camera and entity interpolation can diverge if not sourced from the same sim boundaries
Mitigation:
- centralize snapshot capture around fixed-step update boundaries
- start with a narrow interpolated set
- this is shared renderer infrastructure
- backend changes affect all vector drawing
Mitigation:
- keep initial redesign minimal
- avoid parallel upload systems if one clean per-frame system can serve all vector draws
This work is complete when:
- steady-state gameplay has no active-frame `vkDeviceWaitIdle()`
- vector uploads do not map/unmap every frame
- render interpolation is active for key moving objects
- no heavyweight resource creation remains in the steady-state frame loop
- swapchain policy is intentionally documented as:
  - prefer `VK_PRESENT_MODE_FIFO_KHR`
  - use `caps.minImageCount` for FIFO
- frame pacing is measurably more stable, especially in 99th percentile frame time
- two frames in flight remains an optional future experiment, not a completion requirement