The Brain Dump: This is the blog and personal web page of Andre Weissflog (Floh, floooh, flohofwoe), mostly about programming stuff.
https://floooh.github.io/
Sat, 21 Feb 2026 10:58:15 +0000
The experimental Sokol Vulkan backend
<p>Update: merge happened on <a href="https://github.com/floooh/sokol/blob/master/CHANGELOG.md#02-dec-2025">02-Dec-2025</a>.</p>
<p>In a couple of days I will merge the first implementation of a sokol-gfx
Vulkan backend. Please consider this backend as ‘experimental’, it has
only received limited testing, has limited platform coverage and some
known shortcomings and feature gaps which I will address in followup
updates.</p>
<p>The related PRs are here:</p>
<ul>
<li><a href="https://github.com/floooh/sokol/pull/1350">sokol/#1350</a> - this one also
has all the embedded shaders for the sokol ‘utility headers’, so it looks much
bigger than it actually is (the Vulkan backend is around the same size as
the GL backend, a bit over 3 kloc)</li>
<li><a href="https://github.com/floooh/sokol-tools/pull/196">sokol-tools/#196</a> - this
is the update for the shader compiler which is already merged</li>
</ul>
<p>The currently known limitations are:</p>
<ul>
<li>the entire code expects a ‘desktop GPU feature set’ and doesn’t implement
fallback paths for mobile or generally ancient GPUs</li>
<li>the window system glue in sokol_app.h is only implemented for Linux/X11 - and before the question comes up again: it works just fine on Wayland-only distros</li>
<li>only tested on an Intel Meteor Lake integrated GPU (which also means
that some buffer types may be allocated in memory types that are not
optimal on GPUs without unified memory)</li>
<li>barriers for CPU => GPU updates are currently quite conservative
(e.g. more barriers might be inserted than needed, or at a too
early point in a frame)</li>
<li>there’s currently no GPU memory allocator, nor a way to inject
an external GPU memory allocator like VMA (at least the latter is planned)</li>
<li>rendering is currently only supported to a single swapchain
(not a problem when used with sokol_app.h because that also only
supports a single window)</li>
<li>it’s currently not possible to inject native Vulkan buffers and images
into sokol-gfx (that’s a somewhat esoteric feature supported by
the other backends)</li>
<li>I couldn’t get RenderDoc to work, but it’s unclear why</li>
</ul>
<p>On the upside:</p>
<ul>
<li>no sokol-gfx API or shader-authoring changes are required
(there are some minor breaking API changes because of some code cleanup work
I had planned already and which are not directly related to Vulkan, but
most code should work without changes or with only minimal changes)</li>
<li>the Vulkan validation layer is silent on all sokol-samples (which try to cover
most sokol-gfx features and their combined usage), and this includes the tricky
optional synchronization2 validations (I’m pretty proud of that considering
that most Vulkan samples I tried have sync-validation errors)</li>
<li>performance on my Intel Meteor Lake laptop in the <a href="https://floooh.github.io/sokol-html5/drawcallperf-sapp.html">drawcallperf-sample</a>
is already slightly better than the OpenGL backend (on a vanilla
Kubuntu system)</li>
</ul>
<p>It’s also important to understand what actually motivated the Vulkan backend
(i.e. why now, and not earlier or much later):</p>
<p>It’s <em>not</em> mainly about performance, but about ‘future potential’ and OpenGL
rot. Essentially, the Vulkan backend is the first step towards deprecating the
OpenGL backend. First, an alternative to WebGL2 had to happen (which exists now
with WebGPU), and next an alternative for OpenGL on Linux (and, less importantly,
Android) had to be implemented (which is the Vulkan backend). So far Linux and
Android were the only sokol-gfx target platforms limited to a single backend: OpenGL.
All other target platforms already have a more modern alternative (Windows with
D3D11 and macOS/iOS with Metal). Deprecating the OpenGL backend won’t happen for
a while, but personally I can’t wait to free sokol-gfx from the ‘shackles of
OpenGL’ ;)</p>
<p>Another reason why I felt that now is the right time to tackle Vulkan support
is that the Vulkan API has improved quite a bit since 1.0 in ways that make it a much
better fit for sokol-gfx. In a nutshell (if you already know Vulkan concepts),
the sokol-gfx backend makes use of the following ‘modern’ Vulkan features:</p>
<ul>
<li>‘dynamic rendering’ (i.e. render passes are enclosed by begin/end
calls instead of being baked into render-pass objects), which is pretty much
a copy of the Metal render pass model. This is a perfect match for
sokol-gfx sg_begin_pass()/sg_end_pass()</li>
<li><code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> - this is a controversial choice, but it’s a perfect
match for the sokol-gfx resource binding model and I really did not want to deal
with the traditional rigid Vulkan descriptor API (which is an overengineered boondoggle
if I’ve ever seen one). This is also the main reason why mobile GPUs had to be left out
for now, and apparently descriptor buffers are also a poor match for NVIDIA
GPUs. The plan here is to wait until Khronos completes work on a
descriptor pool replacement which AFAIK will be a mix of descriptor buffers and
D3D12-style descriptor heaps and then port the <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> code
over to that new resource binding API</li>
<li>‘synchronization2’ (not a drastic change from the original barrier model,
I’m just listing it here for completeness)</li>
</ul>
<p>Work on the Vulkan backend spans three sub-projects:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sokol-shdc</code>: added Vulkan-flavoured SPIRV output</li>
<li><code class="language-plaintext highlighter-rouge">sokol_app.h</code>: device creation, swapchain management and frame loop</li>
<li><code class="language-plaintext highlighter-rouge">sokol_gfx.h</code>: rendering and compute features</li>
</ul>
<h2 id="sokol-shdc-changes">sokol-shdc changes</h2>
<p>From the outside, the shader compiler changes are minimal (so minimal that
the update has actually already been live for a little while).</p>
<p>The only change is that a new output shader format has been added: <code class="language-plaintext highlighter-rouge">spirv_vk</code>
for ‘Vulkan-flavoured SPIRV’. To compile a GLSL input shader to SPIRV:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sokol-shdc -i bla.glsl -o bla.h -l spirv_vk
</code></pre></div></div>
<p>Internally the changes are also fairly small since sokol-shdc input shaders
are already authored in ‘Vulkan-flavoured GLSL’; the only missing information
is the descriptor set for resource bindings.</p>
<p>Sokol-shdc shaders only declare a bindslot on resource bindings with
different ‘bind spaces’ for uniform blocks, samplers and anything else,
for instance:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">fs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">smp</span><span class="p">;</span>
</code></pre></div></div>
<p>Sokol-shdc performs a backend-specific bindslot allocation which for SPIRV
output assigns descriptor sets (uniform blocks live in descriptor set 0 and
everything else in descriptor set 1), and remaps sampler bindings to resolve
bindslot collisions with textures, storage buffers and storage images, so the
above code snippet essentially becomes:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">set</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">fs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
<span class="k">layout</span><span class="p">(</span><span class="n">set</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">set</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">binding</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">smp</span><span class="p">;</span>
</code></pre></div></div>
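<p>To make the remapping rule concrete, here is a minimal C sketch of it. This is not sokol-shdc’s actual code: the <code>res_kind</code> enum, the <code>vk_slot</code> struct and <code>remap_bindslot()</code> are hypothetical, the sampler offset of 32 is inferred from the example output above, and the assumption that bindslots stay below that offset is mine.</p>

```c
#include <assert.h>

/* Hypothetical resource kinds seen by the remapper (illustration only). */
typedef enum {
    RES_UNIFORM_BLOCK,
    RES_TEXTURE,
    RES_STORAGE_BUFFER,
    RES_STORAGE_IMAGE,
    RES_SAMPLER
} res_kind;

/* A Vulkan-style (descriptor set, binding) pair. */
typedef struct {
    int set;
    int binding;
} vk_slot;

/* Assumed sampler offset, inferred from the example above
   (sampler bindslot 0 -> binding 32). */
#define SAMPLER_BINDING_OFFSET 32

static vk_slot remap_bindslot(res_kind kind, int bindslot) {
    /* assumption: bindslots stay below the offset, so shifted samplers
       can never collide with textures or storage resources */
    assert((bindslot >= 0) && (bindslot < SAMPLER_BINDING_OFFSET));
    vk_slot slot;
    if (kind == RES_UNIFORM_BLOCK) {
        /* uniform blocks: descriptor set 0, bindslot unchanged */
        slot.set = 0;
        slot.binding = bindslot;
    } else if (kind == RES_SAMPLER) {
        /* samplers: descriptor set 1, shifted past the texture range */
        slot.set = 1;
        slot.binding = bindslot + SAMPLER_BINDING_OFFSET;
    } else {
        /* textures, storage buffers, storage images: set 1, bindslot unchanged */
        slot.set = 1;
        slot.binding = bindslot;
    }
    return slot;
}
```

<p>With this rule, <code>fs_params</code>, <code>tex</code> and <code>smp</code> from the snippet above map to (0, 0), (1, 0) and (1, 32) respectively.</p>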
<p>The one thing that’s not straightforward is that sokol-shdc does a ‘double-tap’
for SPIRV-output:</p>
<ul>
<li>the input shader code is compiled from GLSL to SPIRV</li>
<li>SPIRVTools optimizer passes are applied to the SPIRV</li>
<li>bindings are remapped (in this case: simply add descriptor set decorators
but keep the bindslots intact)</li>
<li>the SPIRV is translated back to GLSL via SPIRVCross</li>
<li>finally the SPIRVCross output is compiled <em>again</em> to SPIRV</li>
</ul>
<p>The weird double compilation is a compromise to avoid large structural changes
to the sokol-shdc code base and make the Vulkan shader pipeline less of a
special case. Essentially, SPIRV is used as an intermediate format in the
first compile pass, and then as output bytecode format in the second pass.</p>
<h2 id="sokol_apph-changes">sokol_app.h changes</h2>
<p>Apart from the actual Vulkan-related update I took the opportunity to do some
public API cleanup which had been rolling around in my head for a while.</p>
<p>First, the backend-specific config options in the <code class="language-plaintext highlighter-rouge">sapp_desc</code> struct are
now grouped into per-backend-nested structs, e.g. from this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sapp_desc</span> <span class="nf">sokol_main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sapp_desc</span><span class="p">){</span>
<span class="c1">// ...</span>
<span class="p">.</span><span class="n">win32_console_utf8</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">win32_console_attach</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">html5_bubble_mouse_events</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">html5_use_emsc_set_main_loop</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…to this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sapp_desc</span> <span class="nf">sokol_main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sapp_desc</span><span class="p">){</span>
<span class="c1">// ...</span>
<span class="p">.</span><span class="n">win32</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">console_utf8</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">console_attach</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">html5</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">bubble_mouse_events</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">use_emsc_set_main_loop</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>A new enum <code class="language-plaintext highlighter-rouge">sapp_pixel_format</code> has been introduced which will play a bigger
role in the future to allow more configuration options for the sokol-app swapchain.</p>
<p>A ton of backend-specific functions to query backend-specific objects have been
merged to better harmonize with sokol-gfx:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_metal_get_device</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_metal_get_current_drawable</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_metal_get_depth_stencil_texture</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_metal_get_msaa_color_texture</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_device</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_device_context</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_render_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_resolve_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_depth_stencil_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_wgpu_get_device</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_wgpu_get_render_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_wgpu_get_resolve_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_wgpu_get_depth_stencil_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="kt">uint32_t</span> <span class="nf">sapp_gl_get_framebuffer</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
</code></pre></div></div>
<p>…those have been merged into:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sapp_environment</span> <span class="nf">sapp_get_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="n">sapp_swapchain</span> <span class="nf">sapp_get_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span>
</code></pre></div></div>
<p>The new structs <code class="language-plaintext highlighter-rouge">sapp_environment</code> and <code class="language-plaintext highlighter-rouge">sapp_swapchain</code> conceptually plug into
the sokol-gfx structs <code class="language-plaintext highlighter-rouge">sg_environment</code> and <code class="language-plaintext highlighter-rouge">sg_swapchain</code> (with the emphasis on
<strong>conceptually</strong>: you still need a mapping from the sokol-app structs and enums
to the sokol-gfx structs and enums, and this mapping is still performed by the
sokol_glue.h header).</p>
<p>That’s it for the public API changes in sokol_app.h, now on to the Vulkan
specific parts:</p>
<p>The new struct <code class="language-plaintext highlighter-rouge">sapp_environment</code> contains a nested struct
<code class="language-plaintext highlighter-rouge">sapp_vulkan_environment vulkan;</code> with Vulkan object pointers (as type-erased
void-pointers so that they can be tunneled through backend-agnostic code):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sapp_vulkan_environment</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">physical_device</span><span class="p">;</span> <span class="c1">// VkPhysicalDevice</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">device</span><span class="p">;</span> <span class="c1">// VkDevice</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">queue</span><span class="p">;</span> <span class="c1">// VkQueue</span>
<span class="kt">uint32_t</span> <span class="n">queue_family_index</span><span class="p">;</span>
<span class="p">}</span> <span class="n">sapp_vulkan_environment</span><span class="p">;</span>
</code></pre></div></div>
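<p>The type erasure means backend-agnostic code can carry Vulkan handles around without including the Vulkan headers, while Vulkan-aware code casts them back. A minimal sketch of that round trip, assuming a locally declared <code>VkDevice</code> (declared the same way <code>vulkan_core.h</code>’s <code>VK_DEFINE_HANDLE</code> macro does it, so the sketch builds without the Vulkan SDK) and a hypothetical <code>example_vulkan_environment</code> struct standing in for <code>sapp_vulkan_environment</code>:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Declared like VK_DEFINE_HANDLE(VkDevice): an opaque struct pointer. */
typedef struct VkDevice_T* VkDevice;

/* Hypothetical stand-in for sapp_vulkan_environment: the handle is stored
   type-erased so backend-agnostic code never needs the Vulkan headers. */
typedef struct {
    const void* device;            /* VkDevice */
    uint32_t queue_family_index;
} example_vulkan_environment;

/* Vulkan-aware side: erase the handle type on the way in... */
static example_vulkan_environment example_make_env(VkDevice device, uint32_t queue_family_index) {
    assert(device != NULL);
    example_vulkan_environment env = { (const void*)device, queue_family_index };
    return env;
}

/* ...and restore it on the way out. */
static VkDevice example_env_device(const example_vulkan_environment* env) {
    return (VkDevice)env->device;
}
```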
<p>…and likewise the new struct <code class="language-plaintext highlighter-rouge">sapp_swapchain</code> contains a nested struct
<code class="language-plaintext highlighter-rouge">sapp_vulkan_swapchain vulkan;</code> with Vulkan object pointers which are needed
for a sokol-gfx swapchain render pass:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sapp_vulkan_swapchain</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">render_image</span><span class="p">;</span> <span class="c1">// VkImage</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">render_view</span><span class="p">;</span> <span class="c1">// VkImageView</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">resolve_image</span><span class="p">;</span> <span class="c1">// VkImage;</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">resolve_view</span><span class="p">;</span> <span class="c1">// VkImageView</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">depth_stencil_image</span><span class="p">;</span> <span class="c1">// VkImage</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">depth_stencil_view</span><span class="p">;</span> <span class="c1">// VkImageView</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">render_finished_semaphore</span><span class="p">;</span> <span class="c1">// VkSemaphore</span>
<span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">present_complete_semaphore</span><span class="p">;</span> <span class="c1">// VkSemaphore</span>
<span class="p">}</span> <span class="n">sapp_vulkan_swapchain</span><span class="p">;</span>
</code></pre></div></div>
<p>The Vulkan-specific startup code path looks like this (the usual boilerplate-heavy
initialization dance):</p>
<ul>
<li>A <code class="language-plaintext highlighter-rouge">VkInstance</code> object is created.</li>
<li>A platform- and window-system-specific <code class="language-plaintext highlighter-rouge">VkSurfaceKHR</code> object is created;
this is essentially the glue between a Vulkan swapchain and a specific
window system. In the first release this window system glue code is only
implemented for X11 via <code class="language-plaintext highlighter-rouge">vkCreateXlibSurfaceKHR</code>.</li>
<li>A <code class="language-plaintext highlighter-rouge">VkPhysicalDevice</code> is picked. This is the first place where the sokol-app
backend takes a couple of shortcuts; initialization will fail if:
<ul>
<li>EXT_descriptor_buffer is not supported (this currently rules out
most mobile devices)</li>
<li>the supported Vulkan API version is not at least 1.3</li>
<li>no ‘queue family’ exists which supports graphics, compute, transfer
and presentation commands all on the same queue</li>
</ul>
</li>
<li>Next a logical <code class="language-plaintext highlighter-rouge">VkDevice</code> object is created with the following required
features and extensions (with the exception of compressed texture formats
which are optional):
<ul>
<li>a single queue for all commands</li>
<li>EXT_descriptor_buffer</li>
<li>extendedDynamicState</li>
<li>bufferDeviceAddress</li>
<li>dynamicRendering</li>
<li>synchronization2</li>
<li>samplerAnisotropy</li>
<li>optional:
<ul>
<li>textureCompressionBC</li>
<li>textureCompressionETC2</li>
<li>textureCompressionASTC_LDR</li>
</ul>
</li>
</ul>
</li>
<li>The swapchain is initialized:
<ul>
<li>a <code class="language-plaintext highlighter-rouge">VkSwapchainKHR</code> object is created:
<ul>
<li>pixel format currently either RGBA8 or BGRA8 (no sRGB)</li>
<li>present-mode hardwired to <code class="language-plaintext highlighter-rouge">VK_PRESENT_MODE_FIFO_KHR</code></li>
<li>composite-alpha hardwired to <code class="language-plaintext highlighter-rouge">VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR</code></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">VkImage</code> and <code class="language-plaintext highlighter-rouge">VkImageView</code> objects are obtained or created
for the swapchain images, depth-stencil-buffer
and optional MSAA surface</li>
</ul>
</li>
<li>Finally a couple of VkSemaphore objects are created for each
swapchain image (the number of swapchain images is essentially
dictated by the Vulkan driver):
<ul>
<li>one <code class="language-plaintext highlighter-rouge">render_finished_semaphore</code> which signals that the GPU
has finished rendering to a swapchain surface</li>
<li>one <code class="language-plaintext highlighter-rouge">present_complete_semaphore</code> which signals that presenting
a swapchain image has completed and the image is ready for reuse</li>
</ul>
</li>
</ul>
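<p>The device-selection shortcuts above boil down to a small filter function. The following is a hedged sketch, not the sokol_app.h code: <code>ex_device_caps</code> is a hypothetical summary of what would be queried through functions like <code>vkGetPhysicalDeviceProperties()</code> and <code>vkGetPhysicalDeviceSurfaceSupportKHR()</code>, and the version macro and queue bits are redefined locally with the same bit layout as Vulkan’s <code>VK_MAKE_API_VERSION</code> and <code>VK_QUEUE_*_BIT</code> so it builds without the SDK.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as VK_MAKE_API_VERSION and VK_QUEUE_*_BIT. */
#define EX_MAKE_API_VERSION(variant, major, minor, patch) \
    ((((uint32_t)(variant)) << 29U) | (((uint32_t)(major)) << 22U) | \
     (((uint32_t)(minor)) << 12U) | ((uint32_t)(patch)))
#define EX_QUEUE_GRAPHICS_BIT 0x1u
#define EX_QUEUE_COMPUTE_BIT  0x2u
#define EX_QUEUE_TRANSFER_BIT 0x4u
#define EX_MAX_QUEUE_FAMILIES 8

/* Hypothetical summary of a VkPhysicalDevice's relevant capabilities. */
typedef struct {
    uint32_t api_version;
    bool has_descriptor_buffer;                /* VK_EXT_descriptor_buffer */
    int num_queue_families;
    uint32_t queue_flags[EX_MAX_QUEUE_FAMILIES];
    bool can_present[EX_MAX_QUEUE_FAMILIES];   /* surface support per family */
} ex_device_caps;

/* Returns the index of a queue family usable for everything, or -1 if the
   device has to be rejected. */
static int ex_pick_queue_family(const ex_device_caps* caps) {
    assert(caps->num_queue_families <= EX_MAX_QUEUE_FAMILIES);
    if (caps->api_version < EX_MAKE_API_VERSION(0, 1, 3, 0)) {
        return -1;  /* Vulkan 1.3 required */
    }
    if (!caps->has_descriptor_buffer) {
        return -1;  /* EXT_descriptor_buffer required */
    }
    /* graphics/compute families implicitly support transfer per the Vulkan spec */
    const uint32_t required = EX_QUEUE_GRAPHICS_BIT | EX_QUEUE_COMPUTE_BIT;
    for (int i = 0; i < caps->num_queue_families; i++) {
        if (((caps->queue_flags[i] & required) == required) && caps->can_present[i]) {
            return i;  /* one family for graphics, compute, transfer and present */
        }
    }
    return -1;
}
```

<p>One deliberate detail: the sketch doesn’t test the transfer bit, because the Vulkan spec makes reporting it optional for queue families that already support graphics or compute.</p>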
<p>At this point, the Vulkan-specific code in sokol_app.h is at about 600
lines of code, which is a lot of boilerplate, but OTOH is a lot less messy
than the combined OpenGL window system code for GLX, EGL, WGL or NSOpenGL
(yet still a lot more than the window system glue for the other backends).</p>
<p>The actually interesting stuff happens in the last two Vulkan backend functions:</p>
<p>The internal function <code class="language-plaintext highlighter-rouge">_sapp_vk_swapchain_next()</code> is a wrapper around
<code class="language-plaintext highlighter-rouge">vkAcquireNextImageKHR()</code> and obtains the next free swapchain image. The
function will also signal the associated <code class="language-plaintext highlighter-rouge">present_complete_semaphore</code>.</p>
<p>The last function in the sokol-app Vulkan backend is <code class="language-plaintext highlighter-rouge">_sapp_vk_present()</code>, which
is a wrapper for <code class="language-plaintext highlighter-rouge">vkQueuePresentKHR()</code>. The present operation uses the
<code class="language-plaintext highlighter-rouge">render_finished_semaphore</code> to make sure that presentation happens after the GPU
has finished rendering to the swapchain image. When the <code class="language-plaintext highlighter-rouge">vkQueuePresentKHR()</code>
function returns with <code class="language-plaintext highlighter-rouge">VK_ERROR_OUT_OF_DATE_KHR</code> or <code class="language-plaintext highlighter-rouge">VK_SUBOPTIMAL_KHR</code>, the
swapchain resources are recreated (this happens for instance when the window is
resized).</p>
<p>There are a couple of open todo points in the sokol-app Vulkan backend which
I’ll take care of later:</p>
<ul>
<li>Any non-success return values from <code class="language-plaintext highlighter-rouge">vkAcquireNextImageKHR()</code> are currently
only logged but not handled. Normally the application is either supposed
to re-create the swapchain resources or skip rendering and presentation.
Since I couldn’t coerce my Kubuntu laptop to ever return a non-success value
from <code class="language-plaintext highlighter-rouge">vkAcquireNextImageKHR()</code> I would have to implement behaviour I couldn’t
test, so I had to skip this part for now. Maybe when moving the code over
to my Windows/NVIDIA PC I’ll be able to handle that situation properly.</li>
<li>Currently the swapchain image size must match the window client rectangle
size (same as OpenGL via GLX). The Vulkan swapchain API has an
optional scaling feature, but I couldn’t get this to work on my Kubuntu
laptop. Window-system scaling is mainly useful when the system has a
high-dpi display but lower-end GPU, and all other sokol-app backends
depend on the system to scale a smaller framebuffer to the window client
rectangle when needed.</li>
</ul>
<p>The main area I struggled with in the sokol-app Vulkan backend was
swapchain resizing. Most sokol-app backends kick off any swapchain
resize operation from the window system’s resize event, e.g.:</p>
<ul>
<li>window is resized by user</li>
<li>window system resize event fires giving the new window size</li>
<li>sokol-app listens for the window system resize event and initiates
a swapchain resize with the new size coming from the window system
event, then stores the new size for sapp_width/height() and finally
fires an <code class="language-plaintext highlighter-rouge">SAPP_EVENTTYPE_RESIZED</code> event</li>
</ul>
<p>This doesn’t work on the Vulkan backend: the validation layer would sometimes
complain that there’s a difference between actual and expected swapchain
surface dimensions (I forgot the exact error circumstances, which is forgivable since
implementing a Vulkan backend is basically crawling from one validation
layer error to the next).</p>
<p>Long story short: I got it to work by leaving the host window system entirely
out of the loop and letting the Vulkan swapchain take full control of the resize
process:</p>
<ul>
<li>window is resized by user</li>
<li>window system resize event fires, but is now ignored by sokol-app</li>
<li>the next time <code class="language-plaintext highlighter-rouge">vkQueuePresentKHR()</code> is called it returns
<code class="language-plaintext highlighter-rouge">VK_ERROR_OUT_OF_DATE_KHR</code> or <code class="language-plaintext highlighter-rouge">VK_SUBOPTIMAL_KHR</code>, and this triggers
a swapchain-resource resize with the size coming from the Vulkan surface
object instead of the window system; finally an
<code class="language-plaintext highlighter-rouge">SAPP_EVENTTYPE_RESIZED</code> event is fired</li>
</ul>
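<p>The present-driven resize flow can be modeled without any Vulkan calls at all. In this hypothetical sketch, <code>ex_present()</code> stands in for <code>vkQueuePresentKHR()</code> and reports out-of-date whenever the surface extent no longer matches the swapchain, and the <code>ex_result</code> values stand in for <code>VK_SUCCESS</code>, <code>VK_SUBOPTIMAL_KHR</code> and <code>VK_ERROR_OUT_OF_DATE_KHR</code>. All it demonstrates is the ordering: the new size is taken from the surface at present time, and the window system’s resize event plays no role.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for VK_SUCCESS, VK_SUBOPTIMAL_KHR, VK_ERROR_OUT_OF_DATE_KHR. */
typedef enum {
    EX_SUCCESS,
    EX_SUBOPTIMAL,
    EX_ERROR_OUT_OF_DATE
} ex_result;

typedef struct {
    int width;
    int height;
} ex_extent;

typedef struct {
    ex_extent surface;    /* current size of the Vulkan surface */
    ex_extent swapchain;  /* size the swapchain images were created with */
    int resize_events;    /* number of RESIZED events fired */
} ex_app;

static bool ex_extent_equal(ex_extent a, ex_extent b) {
    return (a.width == b.width) && (a.height == b.height);
}

/* Models the user resizing the window: only the surface changes,
   the app deliberately does not react to the window-system event. */
static void ex_window_system_resize(ex_app* app, int width, int height) {
    assert((width > 0) && (height > 0));
    app->surface.width = width;
    app->surface.height = height;
}

/* Stand-in for vkQueuePresentKHR(): out-of-date once the sizes diverge. */
static ex_result ex_present(const ex_app* app) {
    return ex_extent_equal(app->surface, app->swapchain) ? EX_SUCCESS : EX_ERROR_OUT_OF_DATE;
}

/* End of frame: present, and on OUT_OF_DATE/SUBOPTIMAL recreate the
   swapchain with the size taken from the surface, then fire RESIZED. */
static void ex_end_frame(ex_app* app) {
    const ex_result res = ex_present(app);
    if ((res == EX_ERROR_OUT_OF_DATE) || (res == EX_SUBOPTIMAL)) {
        app->swapchain = app->surface;  /* recreate swapchain resources */
        app->resize_events++;           /* SAPP_EVENTTYPE_RESIZED equivalent */
    }
}
```

<p>Note how the swapchain keeps its old size until the next present, which is one plausible source of the brief lag behind the window border described below.</p>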
<p>This fixes any validation layer warnings and is in the end a cleaner
implementation compared to letting the window system dictate the swapchain size.</p>
<p>There are downsides though: At least on my Kubuntu laptop it looks like the
window system and Vulkan swapchain code doesn’t run in lock step. Instead the
Vulkan swapchain seems to lag behind the window system a bit and this results in
minor artefacts during resizing: sometimes there’s a visible gap between the
Vulkan surface and window border, and the frame rate gets slightly out of whack
during resize. In comparison, on macOS rendering with Metal during window resize
is buttery smooth and without resize-jitter or border-gaps (although tbf,
removing the resize-jitter on macOS had to be explicitly implemented by
anchoring the NSView object to a window border).</p>
<p>That’s all there is to the Vulkan backend in sokol_app.h, on to sokol_gfx.h!</p>
<h2 id="sokol_gfxh-changes">sokol_gfx.h changes</h2>
<p>For the most part, the actual mapping of the sokol-gfx functions to Vulkan API
functions is very straightforward, and often the mapping is 1:1. This is
mainly thanks to using a couple of modern Vulkan features and extensions:</p>
<ul>
<li>Dynamic rendering (e.g. <code class="language-plaintext highlighter-rouge">vkBeginRendering()/vkEndRendering()</code>) is a perfect match
for sokol-gfx <code class="language-plaintext highlighter-rouge">sg_begin_pass()/sg_end_pass()</code>, this is not very surprising though
because the dynamic rendering Vulkan API is basically a ‘de-OOP-ed’ version of the Metal
render pass API.</li>
<li><code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> is an absolutely perfect match for sokol-gfx’s
<code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call, and a ‘pretty good’ match for <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code></li>
</ul>
<p>The main areas for future improvements are the barrier system and the staging
system, but let’s not get ahead of ourselves.</p>
<h3 id="a-10000-foot-view">A 10000 foot view</h3>
<p>Apart from the straight mapping of sokol-gfx API calls to Vulkan-API calls, the
Vulkan backend has to implement a couple of low-level subsystems. This isn’t
all that unusual, other backends also have such subsystems, but the Vulkan
backend definitely is the most ‘subsystem heavy’.</p>
<p>OTOH some concepts of modern Vulkan are quite similar to WebGPU, Metal and even
D3D11 - and this conceptual overlap significantly simplified the Vulkan
backend implementation.</p>
<p>In some areas the Vulkan backend has even more straightforward implementations than
some of the other backends. For instance, the implementation of the resource binding
call <code class="language-plaintext highlighter-rouge">sg_apply_bindings</code> in the Vulkan backend is one of the most straightforward
of all backends, especially compared to the WebGPU backend. In Vulkan it’s
literally just a bunch of memcpy’s followed by a single Vulkan API call to
record an offset into the descriptor buffer (ok, it’s actually a bit more
complicated because of the barrier system). Compared to that, the WebGPU backend
needs to use a ‘hash-and-cache’ approach for baked BindGroup objects, i.e.
calling <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> may involve creating and destroying WebGPU objects.</p>
<p>The low-level subsystems in the sokol-gfx Vulkan backend are:</p>
<ul>
<li>a ‘delete queue’ system for delayed Vulkan object destruction</li>
<li>the GPU memory allocation system (very rudimentary at the moment)</li>
<li>the frame-sync system (e.g. ensuring that the CPU and GPU can work
in parallel in typical render frames)</li>
<li>the uniform update system</li>
<li>the bindings update system</li>
<li>two ‘staging systems’ for copying CPU-side data into GPU-side resources:
<ul>
<li>a ‘copy’ staging system</li>
<li>a ‘stream’ staging system</li>
</ul>
</li>
<li>the resource barrier system</li>
</ul>
<p>Let’s look at those one by one:</p>
<h3 id="the-delete-queue-system">The Delete Queue System</h3>
<p>Vulkan doesn’t have any automatic lifetime management like some other 3D APIs
(e.g. no D3D-style reference counting). When you call a destroy function on
an object, it’s gone. When you do that while the object is still in flight
(i.e. referenced in a queue and waiting to be consumed by the GPU), hilarity
ensues.</p>
<p>IMHO this is much better than any automatic lifetime management system, because
it avoids any confusion about reference counts (e.g. questions like: when I call
this function to get an object reference, will that bump the refcount or not?),
but this means that a Vulkan backend needs to implement some sort of garbage
collection on its own.</p>
<p>Sokol-gfx uses a double-buffered delete-queue system for this. Each
‘double-buffer-frame-context’ owns a delete queue which is a simple fixed-size
array of pointer-pairs. Each queue item consists of:</p>
<ul>
<li>one type-erased Vulkan object pointer (i.e. a void-pointer)</li>
<li>a function pointer for a destructor function which takes a
void* as argument and knows how to destroy that Vulkan object</li>
</ul>
<p>All Vulkan object types which may be referenced in command buffers will not
call their <code class="language-plaintext highlighter-rouge">vkDestroy*()</code> functions directly, but instead add them to the
delete-queue that’s associated with the currently recorded command buffer. At
the start of a new frame (what ‘new frame’ actually means is explained further down in
the section on the frame-sync system), the delete-queue for that frame-context is drained by
calling each queue item’s destructor function with that item’s Vulkan object pointer.
This makes sure that any Vulkan objects are kept alive until the GPU has finished
processing any command buffers which might hold references to those objects.</p>
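<p>To make the mechanism a bit more concrete, here is a minimal C sketch of such a
double-buffered delete queue. All names (<code class="language-plaintext highlighter-rouge">delete_queue_t</code>,
<code class="language-plaintext highlighter-rouge">MAX_DELETE_ITEMS</code>, <code class="language-plaintext highlighter-rouge">fake_destroy()</code>
etc.) are hypothetical and not the actual sokol-gfx internals, and the fence wait before
draining is only hinted at in a comment. In a real backend the object pointer would be a
type-erased Vulkan handle and the destructor a small wrapper around the matching
<code class="language-plaintext highlighter-rouge">vkDestroy*()</code> call.</p>

```c
#include <assert.h>

// hypothetical names, not the actual sokol-gfx internals
#define NUM_INFLIGHT_FRAMES (2)
#define MAX_DELETE_ITEMS (256)

typedef void (*destructor_fn_t)(void* obj);

// one queue item: a type-erased Vulkan object plus a function that knows how to destroy it
typedef struct {
    void* obj;
    destructor_fn_t destructor;
} delete_item_t;

typedef struct {
    int num_items;
    delete_item_t items[MAX_DELETE_ITEMS];
} delete_queue_t;

// one delete queue per double-buffered frame context
typedef struct {
    int frame_slot;     // frame context whose command buffer is currently recorded
    delete_queue_t queues[NUM_INFLIGHT_FRAMES];
} frame_contexts_t;

// stand-in destructor so the sketch runs without a Vulkan device,
// a real one would e.g. call vkDestroyBuffer() on the handle
static int num_destroyed = 0;
static void fake_destroy(void* obj) {
    (void)obj;
    num_destroyed++;
}

// instead of destroying an object right away, defer it to the current frame context
static void delete_queue_push(frame_contexts_t* fc, void* obj, destructor_fn_t destructor) {
    delete_queue_t* q = &fc->queues[fc->frame_slot];
    assert(q->num_items < MAX_DELETE_ITEMS);
    q->items[q->num_items].obj = obj;
    q->items[q->num_items].destructor = destructor;
    q->num_items++;
}

static void delete_queue_drain(delete_queue_t* q) {
    for (int i = 0; i < q->num_items; i++) {
        q->items[i].destructor(q->items[i].obj);
    }
    q->num_items = 0;
}

// start of a new frame: switch to the oldest frame context and, after its
// fence has been waited on (not modelled here), destroy everything that was
// queued into it NUM_INFLIGHT_FRAMES frames ago
static void new_frame(frame_contexts_t* fc) {
    fc->frame_slot = (fc->frame_slot + 1) % NUM_INFLIGHT_FRAMES;
    delete_queue_drain(&fc->queues[fc->frame_slot]);
}
```

<p>Since a queue is only drained when its frame context comes around again, a queued
object survives exactly as long as the command buffer which might still reference it.</p>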
<h3 id="the-gpu-memory-allocation-system">The GPU Memory Allocation System</h3>
<p>Currently GPU allocations do <em>not</em> go through a custom allocator, instead
all granular allocations directly call into <code class="language-plaintext highlighter-rouge">vkAllocateMemory()</code>. Originally
I had intended to use SebAaltonen’s <a href="https://github.com/sebbbi/OffsetAllocator">OffsetAllocator</a>
as the default GPU allocator, but also expose an allocator interface to allow
users to inject more complex allocators like <a href="https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator">VMA</a>.</p>
<p>Historically a custom allocator was pretty much required because some Vulkan
drivers only allowed 4096 unique GPU allocations. Today though it looks
like pretty much all (desktop) Vulkan drivers allow 4 billion allocations (at least
according to the <a href="https://vulkan.gpuinfo.org/">Vulkan hardware database</a>).</p>
<p>The plan is still to at least allow injecting a custom GPU allocator via an
allocator interface, and also maybe to integrate OffsetAllocator as default
allocator, but without knowing the memory allocation strategy of Vulkan drivers
this may be redundant. E.g. if a Vulkan driver essentially integrates something
like VMA anyway there’s not much point stacking another allocator on top of it,
at least for a fairly high level API wrapper like sokol-gfx.</p>
<p>In any case, the current GPU memory allocation implementation is prepared for
a bit more abstraction in the future. All GPU allocations go through a single
internal function <code class="language-plaintext highlighter-rouge">_sg_vk_mem_alloc_device_memory()</code> which takes a ‘memory type’
enum and a <code class="language-plaintext highlighter-rouge">VkMemoryRequirements</code> pointer as input. The memory type enum is
sokol-gfx specific and includes:</p>
<ul>
<li>storage buffer (an sg_buffer object with storage buffer usage)</li>
<li>generic buffer (all other sg_buffer types)</li>
<li>image (all usages)</li>
<li>internal staging buffer for the ‘copy-staging system’</li>
<li>internal staging buffer for the ‘stream-staging system’</li>
<li>internal uniform buffer</li>
<li>internal descriptor buffer</li>
</ul>
<p>Currently all resources are either in ‘device-local’ memory, or in
‘host-visible + host-coherent’ memory. Having the mapping from sokol-specific
memory type to Vulkan memory flags in one place makes it easier to tweak those
flags in the future (or delegate that decision to an external memory allocator).</p>
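<p>Here’s a hedged sketch of what such a central ‘sokol memory type to Vulkan memory
flags’ mapping could look like, together with the standard loop every Vulkan backend
needs to pick a memory type index from <code class="language-plaintext highlighter-rouge">VkMemoryRequirements::memoryTypeBits</code>.
The enum names and the exact flag assignment are assumptions derived from the description
above (staging, uniform and descriptor buffers are written by the CPU via memcpy, so they
are assumed to live in host-visible memory). The flag values match the
<code class="language-plaintext highlighter-rouge">VK_MEMORY_PROPERTY_*_BIT</code> constants but are
redefined so the sketch compiles without Vulkan headers, and <code class="language-plaintext highlighter-rouge">mem_props_t</code>
is a stripped-down stand-in for <code class="language-plaintext highlighter-rouge">VkPhysicalDeviceMemoryProperties</code>.</p>

```c
#include <assert.h>
#include <stdint.h>

// same bit values as VK_MEMORY_PROPERTY_*_BIT, redefined to avoid the Vulkan headers
#define MEM_PROP_DEVICE_LOCAL  (0x1u)
#define MEM_PROP_HOST_VISIBLE  (0x2u)
#define MEM_PROP_HOST_COHERENT (0x4u)

// hypothetical mirror of the sokol-gfx internal memory type enum
typedef enum {
    MEMTYPE_STORAGE_BUFFER,
    MEMTYPE_GENERIC_BUFFER,
    MEMTYPE_IMAGE,
    MEMTYPE_STAGING_COPY,
    MEMTYPE_STAGING_STREAM,
    MEMTYPE_UNIFORMS,
    MEMTYPE_DESCRIPTORS,
} memtype_t;

// the one place which maps memory types to Vulkan property flags
// (the concrete assignment is an assumption)
static uint32_t memtype_to_property_flags(memtype_t t) {
    switch (t) {
        case MEMTYPE_STORAGE_BUFFER:
        case MEMTYPE_GENERIC_BUFFER:
        case MEMTYPE_IMAGE:
            return MEM_PROP_DEVICE_LOCAL;
        default:
            // written by the CPU via memcpy, read directly by the GPU
            return MEM_PROP_HOST_VISIBLE | MEM_PROP_HOST_COHERENT;
    }
}

// minimal stand-in for VkPhysicalDeviceMemoryProperties
typedef struct {
    uint32_t type_count;
    uint32_t property_flags[32];
} mem_props_t;

// pick the first memory type which is allowed by memoryTypeBits and has
// all required property flags, returns -1 if there is none
static int find_memory_type_index(const mem_props_t* props, uint32_t type_bits, memtype_t t) {
    assert(props->type_count <= 32);
    const uint32_t required = memtype_to_property_flags(t);
    for (uint32_t i = 0; i < props->type_count; i++) {
        const int allowed = (type_bits & (1u << i)) != 0;
        if (allowed && ((props->property_flags[i] & required) == required)) {
            return (int)i;
        }
    }
    return -1;
}
```

<p>Swapping in a different strategy later (e.g. preferring device-local + host-visible
memory on GPUs with unified memory) would then only touch
<code class="language-plaintext highlighter-rouge">memtype_to_property_flags()</code>.</p>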
<h3 id="the-frame-sync-system">The Frame Sync System</h3>
<p>The frame sync system is mainly concerned with letting the CPU and GPU work
in parallel without stepping on each other’s toes. This basically comes down
to double-buffering all resources which are written by the CPU and read by the
GPU, and to having one sync-point in a sokol-gfx frame where the CPU needs to
wait for the oldest ‘frame-context’ to become available (i.e. it is no longer
‘in flight’).</p>
<p>This single <code class="language-plaintext highlighter-rouge">CPU <=> GPU</code> sync point is implemented in a function
<code class="language-plaintext highlighter-rouge">_sg_vk_acquire_frame_command_buffers()</code>. The name indicates the main feature
of that function: it acquires command buffers to record the Vulkan commands of
the current frame. Command buffers are reused, so this involves waiting for the
command buffers to become available (i.e. they are no longer read from by the
GPU). “Command buffers” is plural because there are two command buffers per frame:
one which records all staging-commands, and one for the actual compute/render
commands - more on that later in the staging system section.</p>
<p>For this <code class="language-plaintext highlighter-rouge">CPU <=> GPU</code> synchronization, each double-buffered frame-context owns
a <code class="language-plaintext highlighter-rouge">VkFence</code> which is signalled when the GPU is done processing a ‘queue submit’.</p>
<p>So the first and most important thing the <code class="language-plaintext highlighter-rouge">_sg_vk_acquire_frame_command_buffers()</code> function
does is to wait for the fence of the oldest frame-context with a call to <code class="language-plaintext highlighter-rouge">vkWaitForFences()</code>.</p>
<p>This potential wait is the reason why sokol-gfx applications should move
sokol-gfx calls towards the end of the frame callback and try to do all
heavy non-rendering-related CPU work at the start of the frame callback.
More specifically, calls to:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sg_begin_pass()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_update_buffer()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_update_image()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_append_buffer()</code></li>
</ul>
<p>…these are basically the ‘potential new-frame entry points’ of the sokol-gfx
API which may require the CPU to wait for the GPU.</p>
<p>The <code class="language-plaintext highlighter-rouge">_sg_vk_acquire_frame_command_buffers()</code> function does a couple more things
after <code class="language-plaintext highlighter-rouge">vkWaitForFences()</code> returns:</p>
<ul>
<li>first (actually before the <code class="language-plaintext highlighter-rouge">vkWaitForFences()</code> call) it checks whether the function
has already been called in the current frame, and if so, returns immediately</li>
<li><code class="language-plaintext highlighter-rouge">vkResetFences()</code> is called on the fence we just waited on</li>
<li>the delete-queue is drained (i.e. all resources which were recorded for
destruction in the frame-context we just waited on are finally destroyed)</li>
<li>any command buffers associated with the new frame are reset via <code class="language-plaintext highlighter-rouge">vkResetCommandBuffer()</code></li>
<li>…and recording into those command buffers is started via <code class="language-plaintext highlighter-rouge">vkBeginCommandBuffer()</code></li>
<li>additionally the other subsystems are notified because they might want to do their
own thing:
<ul>
<li><code class="language-plaintext highlighter-rouge">_sg_vk_uniform_after_acquire()</code></li>
<li><code class="language-plaintext highlighter-rouge">_sg_vk_bind_after_acquire()</code></li>
<li><code class="language-plaintext highlighter-rouge">_sg_vk_staging_stream_after_acquire()</code></li>
</ul>
</li>
</ul>
<p>The other internal function of the frame-sync system is <code class="language-plaintext highlighter-rouge">_sg_vk_submit_frame_command_buffers()</code>.
This is called at the end of a ‘sokol-gfx frame’ in the <code class="language-plaintext highlighter-rouge">sg_commit()</code> call. The main job
of this function is to submit the recorded command buffers for the current frame
via <code class="language-plaintext highlighter-rouge">vkQueueSubmit()</code>. This submit operation uses the two semaphores we got handed
from the outside world (e.g. sokol-app) as part of the swapchain information
in <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code>:</p>
<ul>
<li>the <code class="language-plaintext highlighter-rouge">present_complete_semaphore</code> is used as the wait-semaphore of the
<code class="language-plaintext highlighter-rouge">vkQueueSubmit()</code> call (the GPU basically needs to wait for the swapchain image
of the render pass to become available for reuse)</li>
<li>the <code class="language-plaintext highlighter-rouge">render_finished_semaphore</code> is used as the signal-semaphore to be signalled
when the GPU is done processing the submit payload</li>
</ul>
<p>Before the <code class="language-plaintext highlighter-rouge">vkQueueSubmit()</code> call there’s a bit more housekeeping happening:</p>
<ul>
<li>the other subsystems are notified about the submit via:
<ul>
<li><code class="language-plaintext highlighter-rouge">_sg_vk_staging_stream_before_submit()</code></li>
<li><code class="language-plaintext highlighter-rouge">_sg_vk_bind_before_submit()</code></li>
<li><code class="language-plaintext highlighter-rouge">_sg_vk_uniform_before_submit()</code></li>
</ul>
</li>
<li>recording into the command buffers which are associated with the current
frame context is finished via <code class="language-plaintext highlighter-rouge">vkEndCommandBuffer()</code></li>
</ul>
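<p>The acquire/submit pair can be modelled on the CPU side to show why the early-out
matters: no matter how many ‘new-frame entry points’ are called, the fence wait happens
at most once per frame, and the two frame contexts alternate. This is a hypothetical
model, not sokol-gfx code: <code class="language-plaintext highlighter-rouge">in_flight</code> stands in for
an unsignalled <code class="language-plaintext highlighter-rouge">VkFence</code>, <code class="language-plaintext highlighter-rouge">gpu_finish()</code>
stands in for the GPU completing a submit, and the actual Vulkan calls only appear as comments.</p>

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_INFLIGHT_FRAMES (2)

// host-side model: 'in_flight' stands in for an unsignalled VkFence,
// all names are hypothetical
typedef struct {
    bool in_flight;
} frame_context_t;

typedef struct {
    int slot;             // frame context used by the current frame
    bool acquired;        // command buffers already acquired in this frame?
    int num_fence_waits;  // how often vkWaitForFences() would have been called
    frame_context_t ctx[NUM_INFLIGHT_FRAMES];
} frame_sync_t;

// stand-in for the GPU finishing a submit and signalling its fence
static void gpu_finish(frame_sync_t* fs, int slot) {
    fs->ctx[slot].in_flight = false;
}

// called from every 'potential new-frame entry point'
// (sg_begin_pass(), sg_update_buffer(), sg_update_image(), sg_append_buffer())
static void acquire_frame(frame_sync_t* fs) {
    if (fs->acquired) {
        return;   // already acquired earlier in this frame
    }
    fs->acquired = true;
    fs->slot = (fs->slot + 1) % NUM_INFLIGHT_FRAMES;
    // vkWaitForFences() would block here until the GPU is done with this frame
    // context, in this model the 'GPU' must already have finished it
    fs->num_fence_waits++;
    assert(!fs->ctx[fs->slot].in_flight);
    // ...followed by vkResetFences(), draining the delete queue,
    // vkResetCommandBuffer(), vkBeginCommandBuffer() and subsystem notifications
}

// called from sg_commit()
static void submit_frame(frame_sync_t* fs) {
    if (!fs->acquired) {
        return;   // nothing recorded this frame (assumed behaviour)
    }
    // subsystem notifications, vkEndCommandBuffer() and vkQueueSubmit()
    // with this frame context's fence would happen here
    fs->ctx[fs->slot].in_flight = true;
    fs->acquired = false;
}
```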
<p>It’s also important to note that there is one other potential <code class="language-plaintext highlighter-rouge">CPU <=> GPU</code>
sync-point in a frame, and that’s in the first <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> for a
swapchain render pass: the swapchain-info struct that’s passed into
<code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> contains a swapchain image which must be acquired via
<code class="language-plaintext highlighter-rouge">vkAcquireNextImageKHR()</code> (when using sokol_app.h this happens in the
<code class="language-plaintext highlighter-rouge">sapp_get_swapchain()</code> call - usually indirectly via <code class="language-plaintext highlighter-rouge">sglue_swapchain()</code>).</p>
<p>That is all for the frame-sync system in sokol-gfx, all in all quite similar to
Metal or WebGPU, just with more code bloat (as is the Vulkan way).</p>
<h3 id="resource-binding-via-ext_descriptor_buffer">Resource binding via EXT_descriptor_buffer</h3>
<p>…a little detour into Vulkan descriptors and how the sokol-gfx resource binding
model maps to Vulkan.</p>
<p>Conceptually and somewhat simplified, a Vulkan <strong>descriptor</strong> is an abstract
reference to a Vulkan buffer, image or sampler which needs to be accessible in a
shader. Basically what shows up on the shader side whenever you see a
<code class="language-plaintext highlighter-rouge">layout(binding=x) ...</code>. In sokol-gfx lingo this is called a ‘binding’.</p>
<p>In an ideal world, such a binding would simply be a ‘GPU pointer’ to some
opaque struct living in GPU memory which describes to shader code how
to access bytes in a storage buffer, pixels in a storage image, or how
to perform a texture-sampling operation.</p>
<p>In the real world it’s not that simple because this is exactly the one area
where GPU architectures still differ dramatically: on some GPUs this information might be
hardwired into register tables and/or involves fixed-function features instead
of being just ‘structs in GPU memory’ - and unfortunately those differences are
not limited to shitty mobile GPUs, but are also still present in desktop GPUs.
Intel, AMD and NVIDIA all have different opinions on how this whole resource binding
thing should work - and I’m not sure anything has changed in the last decade
since Vulkan promised us a more-or-less direct mapping to the underlying hardware.</p>
<p>So in the real world 3D APIs still need to come up with some sort of abstraction
layer to get all those different hardware resource binding models under a common
programming model (and yes, even the apparently ‘low-level’ Vulkan API had to
come up with a high-level abstraction for resource binding - and this went quite
poorly… but I digress).</p>
<p>(side note: traditional vertex- and index-buffer-bindings are <em>not</em> performed
through Vulkan descriptors, but through regular ‘bindslot-setter’ calls like in
any other 3D API - go figure).</p>
<p>A Vulkan <strong>descriptor-set</strong> is a group of such concrete bindings which can be
applied as an atomic unit instead of applying each binding individually. In
the end the traditional Vulkan descriptor model isn’t all that different from
the ‘old’ bindslot model used in Metal V1 or D3D11, the one big and important
difference is that bindings are not applied individually but as groups.</p>
<p>The downside of such a ‘bind group model’ is of course that specific binding
combinations may be unpredictable - which is the one big recurring topic in
Vulkan’s (very slow) API evolution.</p>
<p>In ‘old Vulkan’ pretty much all state-combinations in all areas of the API need
to be known upfront in order to move as much work as possible into the
init-phase and out of the render-phase. Theoretically a pretty sensible plan,
but unfortunately only theoretically. In practice there are a lot of use cases
where pre-baking everything is simply not possible, especially outside the game
engine world, and even in gaming it doesn’t quite work - whenever you see
stuttering when something new appears on screen in modern games built on top of
state-of-the-art engines calling into modern 3D APIs - that’s most likely the core design
philosophy of Vulkan and D3D12 crashing and burning after colliding with
reality. Thankfully - but unfortunately very slowly - this is changing. Most of Vulkan’s
progress in the last decade was about rolling the core API back to a more ‘dynamic’
programming model.</p>
<p>Ok, back to Vulkan’s resource binding lingo:</p>
<p>A Vulkan <strong>descriptor-set-layout</strong> is the <em>shape</em> of a descriptor-set.
It basically says ‘there will be a sampled texture at binding 0, a buffer at
binding 1 and a sampler at binding 2’, but not the concrete texture, buffer or
sampler objects (those are referenced in the concrete <strong>descriptor-sets</strong>).</p>
<p>And finally a Vulkan <strong>pipeline-layout</strong> groups all descriptor-set-layouts required
by the shader stages of a Vulkan pipeline-state-object.</p>
<p>When coming from WebGPU this should all sound quite familiar since the
WebGPU bindgroups model is essentially the Vulkan 1.0 descriptor model
(for better or worse):</p>
<ul>
<li>WebGPU BindGroupEntry maps to Vulkan descriptors</li>
<li>WebGPU BindGroup maps to Vulkan descriptor sets</li>
<li>WebGPU BindGroupLayout maps to Vulkan descriptor set layouts</li>
<li>WebGPU PipelineLayout maps to Vulkan pipeline layouts</li>
</ul>
<p>‘Old Vulkan’ then adds descriptor pools on top of that but tbh I didn’t
even bother to deal with those and skipped right to <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code>.</p>
<p>With the descriptor buffer extension, descriptors and descriptor sets are ‘just
memory’ with opaque memory layouts for each descriptor type which are
specific to the Vulkan driver (depending on the driver and descriptor type, such
opaque memory blobs seem to be between 16 and 256 bytes per descriptor).</p>
<p>Binding resources with <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> essentially looks like this:</p>
<p>In the init-phase:</p>
<ul>
<li>create a descriptor buffer big enough to hold all descriptors needed in
a worst-case frame</li>
<li>for each item in a descriptor-set-layout, ask Vulkan for the descriptor size and
relative offset to the start of the descriptor-set data in the descriptor buffer</li>
<li>similarly, for all concrete descriptors, ask Vulkan to copy their opaque memory
representation into some private memory location and keep those around for the render
phase (of course it’s also possible to move this step into the render phase)</li>
</ul>
<p>In the render-phase:</p>
<ul>
<li>memcpy the concrete descriptor blobs we stored upfront into the descriptor
buffer to compose an ad-hoc descriptor set, using the offsets we also stored upfront</li>
<li>finally record the start offset in the descriptor buffer into a Vulkan command
buffer via a Vulkan API call, and that’s it!</li>
</ul>
<p>This is pretty much the same procedure by which uniform data updates are
performed in the sokol-gfx Metal and WebGPU backends, now just extended to
resource bindings.</p>
<p>TL;DR: both uniform data snippets and resource bindings are
‘just frame-transient data snippets’ which are memcpy’ed into per-frame
buffers and the buffer offsets recorded before the next draw- or dispatch-call.</p>
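<p>The render-phase part of this procedure can be sketched in plain C, with the opaque
descriptor blobs and layout offsets as ordinary byte arrays and numbers. In a real
implementation those would come from <code class="language-plaintext highlighter-rouge">vkGetDescriptorEXT()</code>,
<code class="language-plaintext highlighter-rouge">vkGetDescriptorSetLayoutSizeEXT()</code> and
<code class="language-plaintext highlighter-rouge">vkGetDescriptorSetLayoutBindingOffsetEXT()</code> in the init-phase, and the
returned offset would be recorded via <code class="language-plaintext highlighter-rouge">vkCmdSetDescriptorBufferOffsetsEXT()</code>.
All type and function names below are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BINDINGS (4)
#define MAX_DESCRIPTOR_SIZE (256)

// cached opaque descriptor blob (init-phase: filled via vkGetDescriptorEXT())
typedef struct {
    uint32_t size;
    uint8_t data[MAX_DESCRIPTOR_SIZE];
} descriptor_blob_t;

// shape of a descriptor set: total size and per-binding offsets
typedef struct {
    uint32_t num_bindings;
    uint64_t set_size;
    uint64_t binding_offsets[MAX_BINDINGS];
} set_layout_t;

// per-frame descriptor buffer in mapped, host-visible memory
typedef struct {
    uint8_t* base;
    uint64_t capacity;   // sized for a worst-case frame
    uint64_t alignment;  // descriptorBufferOffsetAlignment (power of two)
    uint64_t pos;        // next free byte
} descriptor_buffer_t;

static uint64_t align_up(uint64_t v, uint64_t a) {
    return (v + a - 1) & ~(a - 1);
}

// render-phase: memcpy the cached blobs into the descriptor buffer and return
// the start offset of the ad-hoc descriptor set, or UINT64_MAX on overflow
static uint64_t compose_descriptor_set(descriptor_buffer_t* buf, const set_layout_t* layout, const descriptor_blob_t* blobs) {
    const uint64_t start = align_up(buf->pos, buf->alignment);
    if (start + layout->set_size > buf->capacity) {
        return UINT64_MAX;
    }
    for (uint32_t i = 0; i < layout->num_bindings; i++) {
        assert(layout->binding_offsets[i] + blobs[i].size <= layout->set_size);
        memcpy(buf->base + start + layout->binding_offsets[i], blobs[i].data, blobs[i].size);
    }
    buf->pos = start + layout->set_size;
    return start;
}
```

<p>Note that no Vulkan call is needed to compose the set itself, the only thing the
command buffer ever sees is the returned offset.</p>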
<p>In sokol-gfx, the VkDescriptorSetLayout and VkPipelineLayout objects are created
in <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> using the shader interface reflection information provided
in the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> arg (which is usually code-generated by the sokol-shdc
shader compiler).</p>
<ul>
<li>the first descriptor set layout (set 0) describes all uniform block bindings
used by the shader across all shader stages</li>
<li>the second descriptor set layout (set 1) describes all texture, storage buffer,
storage image and sampler bindings</li>
</ul>
<p>…additionally, <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> queries the descriptor sizes and offsets
within their descriptor set.</p>
<h3 id="the-uniform-update-system">The uniform update system</h3>
<p>Conceptually uniform updates in the Vulkan backend are similar to the Metal backend:</p>
<ul>
<li>a double-buffered uniform buffer big enough to hold all uniform updates
for a worst-case frame, allocated in host-visible memory (so that the memory
is directly writable by the CPU and directly readable by the GPU)</li>
<li>a call to <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> memcpy’s the uniform data snippet into the
next free uniform buffer location (taking alignment requirements into account),
this happens individually for the up to 8 ‘uniform block slots’</li>
<li>before the next draw- or dispatch-call, the offsets into the uniform buffer for the up to
8 uniform block slots are recorded into the current command buffer</li>
</ul>
<p>The last step of recording the uniform-buffer offsets is delayed into the next
draw- or dispatch-call to avoid redundant work. This is because <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code>
works on a single uniform block slot, but in Vulkan all uniform block slots are
grouped into one descriptor set, and we only want to apply that descriptor-set
at most once per draw/dispatch call.</p>
<p>The actual <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> call is extremely cheap since no Vulkan API
calls are performed:</p>
<ul>
<li>a simple memcpy of the uniform data snippet into the per-frame uniform buffer</li>
<li>writing the ‘GPU buffer address’ and snippet size into a cached
array of <code class="language-plaintext highlighter-rouge">VkDescriptorAddressInfoEXT</code> structs</li>
<li>setting a ‘uniforms dirty flag’.</li>
</ul>
<p>…then later, in the next draw or dispatch call, if the ‘uniforms dirty flag’
is set, the actual uniform block descriptor set binding happens:</p>
<ul>
<li>for each uniform block used in the current pipeline/shader, an opaque descriptor
memory blob is directly written into the frame’s descriptor buffer via
a call to <code class="language-plaintext highlighter-rouge">vkGetDescriptorEXT()</code></li>
<li>the start offset of the descriptor-set in the descriptor buffer is recorded
into the current frame command buffer via <code class="language-plaintext highlighter-rouge">vkCmdSetDescriptorBufferOffsetsEXT()</code></li>
</ul>
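<p>The cheap <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> part and the
deferred flush can be sketched as an aligned bump allocator plus a dirty flag. This is a
hypothetical sketch, not the sokol-gfx code: <code class="language-plaintext highlighter-rouge">uniform_state_t</code>,
<code class="language-plaintext highlighter-rouge">gpu_addr</code> and the function names are made up, the
per-slot address/range pairs only mirror the <code class="language-plaintext highlighter-rouge">address</code>
and <code class="language-plaintext highlighter-rouge">range</code> fields of
<code class="language-plaintext highlighter-rouge">VkDescriptorAddressInfoEXT</code>, and the actual
<code class="language-plaintext highlighter-rouge">vkGetDescriptorEXT()</code> and
<code class="language-plaintext highlighter-rouge">vkCmdSetDescriptorBufferOffsetsEXT()</code> calls only appear as comments.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_UB_SLOTS (8)

typedef struct {
    uint8_t* base;       // mapped host-visible memory of this frame's uniform buffer
    uint64_t gpu_addr;   // hypothetical device address of 'base'
    uint64_t capacity;
    uint64_t alignment;  // uniform buffer offset alignment (power of two)
    uint64_t pos;        // next free byte
    struct { uint64_t address; uint64_t range; } slots[MAX_UB_SLOTS];
    bool dirty;
} uniform_state_t;

static uint64_t align_up(uint64_t v, uint64_t a) {
    return (v + a - 1) & ~(a - 1);
}

// sg_apply_uniforms() equivalent: memcpy, remember address/size, set dirty flag,
// no Vulkan calls at all; returns false on uniform buffer overflow
static bool apply_uniforms(uniform_state_t* u, int slot, const void* data, uint64_t size) {
    assert(slot >= 0 && slot < MAX_UB_SLOTS);
    const uint64_t offset = align_up(u->pos, u->alignment);
    if (offset + size > u->capacity) {
        return false;
    }
    memcpy(u->base + offset, data, size);
    u->slots[slot].address = u->gpu_addr + offset;
    u->slots[slot].range = size;
    u->pos = offset + size;
    u->dirty = true;
    return true;
}

// before a draw/dispatch: only when dirty would the real code write the uniform
// descriptors via vkGetDescriptorEXT() and record one
// vkCmdSetDescriptorBufferOffsetsEXT(); returns whether that would happen
static bool flush_uniforms_before_draw(uniform_state_t* u) {
    if (!u->dirty) {
        return false;
    }
    u->dirty = false;
    return true;
}
```

<p>Two <code class="language-plaintext highlighter-rouge">apply_uniforms()</code> calls for different slots
followed by a draw thus result in a single descriptor set update instead of two.</p>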
<p>…delaying the operation to record the uniform buffer offsets into the draw- or
dispatch-call to avoid redundant API calls is actually something that I will also
need to implement in the WebGPU backend (while implementing the Vulkan backend I was taking
notes on which improvements could be back-ported to the WebGPU backend, and I’ll take
care of those right after the Vulkan backend is merged).</p>
<h3 id="the-resource-binding-system">The resource binding system</h3>
<p>Updating resource bindings via <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> is very similar to the
uniform update system, but actually even simpler because no extra uniform buffer
is involved, and some more initialization can be moved into the init-phase when
creating view objects:</p>
<p>When creating a texture-, storage-buffer- or storage-image-view object via
<code class="language-plaintext highlighter-rouge">sg_make_view()</code> or a sampler object via <code class="language-plaintext highlighter-rouge">sg_make_sampler()</code>, the concrete
descriptor data (those little 16..256 byte opaque memory blobs) is copied into
the sokol-gfx view or sampler object via <code class="language-plaintext highlighter-rouge">vkGetDescriptorEXT()</code>.</p>
<p>Then <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> is just a couple of memcpy’s and a Vulkan call:</p>
<ul>
<li>for each view and sampler in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> argument, a memcpy of the
descriptor memory blob which was stored in the sokol-gfx object
into the current frame’s descriptor buffer happens - i.e. no
Vulkan calls for that…</li>
<li>finally a single call to <code class="language-plaintext highlighter-rouge">vkCmdSetDescriptorBufferOffsetsEXT()</code> records
the descriptor buffer offset into the current frame’s command buffer</li>
</ul>
<p>Vertex- and index-buffer bindings happen via traditional bindslot calls
(<code class="language-plaintext highlighter-rouge">vkCmdBindVertexBuffers</code> and <code class="language-plaintext highlighter-rouge">vkCmdBindIndexBuffer</code>). Additionally,
barriers may be inserted inside <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> but that will be explained
further down in the barrier system.</p>
<h3 id="the-two-staging-systems">The two staging systems</h3>
<p>Sokol-gfx currently has two separate staging systems for uploading CPU-side
data into GPU-memory with the rather arbitrary names ‘copy-staging-system’ and
‘stream-staging-system’. Both can upload data into buffers and images, but with
different compromises:</p>
<ul>
<li>the ‘copy-staging-system’ can upload large amounts of data through a single
small staging buffer (default size: 4 MB), with the downside that the Vulkan
queue needs to be flushed (e.g. a <code class="language-plaintext highlighter-rouge">vkQueueWaitIdle()</code> is involved)</li>
<li>the ‘stream-staging-system’ can upload a limited amount of data per-frame
through a fixed-size double-buffered staging buffer (default size: 16 MB -
but this can be tweaked in the <code class="language-plaintext highlighter-rouge">sg_setup()</code> call of course), this doesn’t cause
any frame-pacing ‘disruptions’ like the copy-staging-system does</li>
</ul>
<p>The copy-staging-system is currently used:</p>
<ol>
<li>to upload initial content into immutable buffers and images within
<code class="language-plaintext highlighter-rouge">sg_make_buffer()</code> and <code class="language-plaintext highlighter-rouge">sg_make_image()</code></li>
<li>to upload data into <code class="language-plaintext highlighter-rouge">usage.dynamic_update</code> images and buffers
in the <code class="language-plaintext highlighter-rouge">sg_update_buffer()</code>, <code class="language-plaintext highlighter-rouge">sg_append_buffer()</code> and <code class="language-plaintext highlighter-rouge">sg_update_image()</code> calls</li>
</ol>
<p>The stream-staging system is only used for <code class="language-plaintext highlighter-rouge">usage.stream_update</code> resources
when calling <code class="language-plaintext highlighter-rouge">sg_update_buffer()</code>, <code class="language-plaintext highlighter-rouge">sg_append_buffer()</code> and <code class="language-plaintext highlighter-rouge">sg_update_image()</code>.</p>
<p>This means that the correct choice of <code class="language-plaintext highlighter-rouge">usage.dynamic_update</code> and
<code class="language-plaintext highlighter-rouge">usage.stream_update</code> for buffers and images is much more important in the
Vulkan backend than in other backends.</p>
<p>In general:</p>
<ul>
<li>creating an immutable buffer or image <strong>with initial content</strong> in the
render-phase will ‘disrupt’ rendering (how bad this disruption actually is
remains to be seen though)</li>
<li>the same disruption happens for updating a buffer or image with <code class="language-plaintext highlighter-rouge">usage.dynamic_update</code></li>
<li>make sure to use <code class="language-plaintext highlighter-rouge">usage.stream_update</code> for buffers and images that need to be updated each
frame, but be aware that those uploads go through a single per-frame staging
buffer which needs to be big enough to hold all stream-uploads in a single
frame (staging buffer sizes can be adjusted in the <code class="language-plaintext highlighter-rouge">sg_setup()</code> call)</li>
</ul>
<p>The strategy for updating <code class="language-plaintext highlighter-rouge">usage.dynamic_update</code> resources may change in the future. For
instance I was considering treating dynamic-updates exactly the same as
stream-updates (e.g. going through the per-frame staging buffer to avoid
the <code class="language-plaintext highlighter-rouge">vkQueueWaitIdle()</code>), and when the staging buffer would overflow
fall back to the copy-staging system (also for stream-updates). This
felt too unpredictable to me, so I didn’t go that way for now.</p>
<p>Note that the staging system is the most likely system to drastically change
in the future (together with the barrier system). One of the important planned
changes in my mental sokol-gfx roadmap is a rewrite of the resource update API,
and this rewrite will most likely ‘favour’ modern 3D APIs and not worry about
OpenGL as much as the current very restrictive resource update API does.</p>
<p>The common part in both staging systems is how the actual upload happens:</p>
<ul>
<li>staging buffers are allocated in CPU-visible + cache-coherent memory
(the copy-staging system uses a single small buffer, while the stream-staging
system uses double-buffering)</li>
<li>a staging operation first memcpy’s a chunk of memory into the staging
buffer and then records a Vulkan command to copy that data from the
staging buffer into a Vulkan buffer or image (via <code class="language-plaintext highlighter-rouge">vkCmdCopyBuffer()</code> or
<code class="language-plaintext highlighter-rouge">vkCmdCopyBufferToImage2()</code>)</li>
<li>in the stream-staging system each buffer update is always a single call
to <code class="language-plaintext highlighter-rouge">vkCmdCopyBuffer()</code> and each image update is always one call to
<code class="language-plaintext highlighter-rouge">vkCmdCopyBufferToImage2()</code> per mipmap</li>
<li>in the copy-staging-system, staging operations which are bigger than the
staging buffer size will be split into multiple copy operations,
each copy-step involving a <code class="language-plaintext highlighter-rouge">vkQueueWaitIdle</code></li>
<li>overflowing the stream-staging buffer is a ‘soft error’, i.e. an
error will be logged, but otherwise the update is a no-op</li>
</ul>
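<p>The difference between the two staging strategies can be illustrated with a small
host-only C sketch: the copy-staging path pushes arbitrarily large uploads through a small
buffer in chunks (each chunk standing for one copy command plus a
<code class="language-plaintext highlighter-rouge">vkQueueWaitIdle()</code> round trip), while the
stream-staging path bump-allocates from a per-frame buffer and treats overflow as a
logged soft error. Function and type names are hypothetical, the ‘GPU resource’ is just
a byte array, and alignment requirements (e.g. for image copies) are ignored here.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// copy-staging: split an upload into staging-buffer-sized chunks,
// returns the number of chunks (i.e. vkQueueWaitIdle() round trips)
static int copy_staging_upload(uint8_t* staging, size_t staging_size, uint8_t* dst, const uint8_t* src, size_t size) {
    assert(staging_size > 0);
    int num_chunks = 0;
    size_t done = 0;
    while (done < size) {
        size_t n = size - done;
        if (n > staging_size) {
            n = staging_size;
        }
        memcpy(staging, src + done, n);   // CPU => staging buffer
        // stand-in for recording vkCmdCopyBuffer(), submitting and vkQueueWaitIdle()
        memcpy(dst + done, staging, n);
        done += n;
        num_chunks++;
    }
    return num_chunks;
}

// stream-staging: one bump-allocated per-frame staging buffer
typedef struct {
    uint8_t* base;
    size_t capacity;
    size_t pos;
} stream_staging_t;

// copies into the staging buffer and returns the source offset for the later
// vkCmdCopyBuffer(), overflow is a 'soft error': log and skip the update
static bool stream_staging_upload(stream_staging_t* s, const void* src, size_t size, size_t* out_offset) {
    if (s->pos + size > s->capacity) {
        fprintf(stderr, "stream staging buffer overflow, update skipped\n");
        return false;
    }
    memcpy(s->base + s->pos, src, size);
    *out_offset = s->pos;
    s->pos += size;
    return true;
}
```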
<p>There is another notable implementation detail in the stream-staging
system which is related to the barrier system:</p>
<p>All stream-staging copy commands are recorded into a separate Vulkan command
buffer object so that they are not interleaved with the compute/render commands
which are recorded into the regular per-frame command buffer.</p>
<p>This is done to move any staging commands out of render passes which is pretty
much required for barrier management (I don’t quite remember though if the
Vulkan validation layer only complained about issuing barriers inside
<code class="language-plaintext highlighter-rouge">vkBeginRendering/vkEndRendering</code> or if copy commands were also prohibited during
the render phase).</p>
<p>Long story short: all Vulkan commands used for staging operations are recorded
into a separate command buffer so that all CPU => GPU copies can be moved in
front of any compute/render commands because of various Vulkan API usage
restrictions. This was necessary because sokol-gfx allows calling the
resource update functions at any point in a frame, most importantly within render passes.</p>
<h3 id="the-resource-barrier-system">The resource barrier system</h3>
<p>This was by far the biggest hassle and took a long time to get right, involving
several rewrites (and there’s <em>still</em> quite a lot of room for improvement).</p>
<p>The first implementation phase was basically to come up with a general barrier
insertion strategy which isn’t completely dumb yet still satisfies the Vulkan
default validation layer, the second and much harder step was then to also satisfy
the optional synchronization2 validation layer (which even most ‘official’
Vulkan samples don’t seem to get right - go figure).</p>
<p>I won’t bore you with what Vulkan barriers are or why they are necessary, just
that barriers are usually needed when a Vulkan buffer or image changes the way
it is accessed by the GPU (for instance when a resource changes from
being a staging-upload target to being accessed by a shader, or when an image
object changes from being used as a pass attachment to being sampled as a
texture).</p>
<p>In sokol-gfx I tried as much as possible to use a ‘lazy barrier system’, i.e.
a barrier is inserted at the latest possible moment before a resource is used.</p>
<p>The basic idea is that sokol-gfx buffers and images keep track of their current
‘access state’, this may be a combination of:</p>
<ul>
<li>staging upload target</li>
<li>vertex buffer binding</li>
<li>index buffer binding</li>
<li>read-only storage buffer binding</li>
<li>read-write storage buffer binding</li>
<li>texture binding</li>
<li>storage image binding (always read-write)</li>
<li>a pass attachment (in the flavours color, resolve, depth or stencil)</li>
<li>a special ‘discard’ access modifier for pass attachments
(used with <code class="language-plaintext highlighter-rouge">SG_LOADACTION_DONTCARE</code>)</li>
<li>swapchain presentation</li>
</ul>
<p>Implicitly, those access states carry additional information which may be needed
for picking the right barrier type, like whether shader accesses are read-only,
read-write or write-only, and whether the access may happen exclusively in
compute passes, exclusively in render passes, or in both.</p>
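<p>One way to picture these access states is as a bitmask, with the ‘implicit’ information derived from which bits are set. The following is a minimal sketch with hypothetical names (not sokol-gfx’s actual internal types). It assumes that read-write storage buffers and storage images are only accessed in compute passes. It also collapses ‘read-write’ and ‘write-only’ into a single ‘writes’ query.</p>

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t access_bits_t;

// Hypothetical access-state bits mirroring the list above. Several bits
// may be set at the same time (e.g. vertex + storage buffer).
enum {
    ACCESS_NONE          = 0,
    ACCESS_STAGING_DST   = 1 << 0,
    ACCESS_VERTEX_BUF    = 1 << 1,
    ACCESS_INDEX_BUF     = 1 << 2,
    ACCESS_STORAGE_RO    = 1 << 3,
    ACCESS_STORAGE_RW    = 1 << 4,
    ACCESS_TEXTURE       = 1 << 5,
    ACCESS_STORAGE_IMAGE = 1 << 6,   // always read-write
    ACCESS_COLOR_ATT     = 1 << 7,
    ACCESS_RESOLVE_ATT   = 1 << 8,
    ACCESS_DEPTH_ATT     = 1 << 9,
    ACCESS_STENCIL_ATT   = 1 << 10,
    ACCESS_DISCARD       = 1 << 11,  // modifier, e.g. for SG_LOADACTION_DONTCARE
    ACCESS_PRESENT       = 1 << 12,
};

// Bits that imply the GPU writes to the resource (via copy, shader or attachment).
#define ACCESS_WRITE_BITS (ACCESS_STAGING_DST | ACCESS_STORAGE_RW | \
    ACCESS_STORAGE_IMAGE | ACCESS_COLOR_ATT | ACCESS_RESOLVE_ATT | \
    ACCESS_DEPTH_ATT | ACCESS_STENCIL_ATT)

// Assumption for this sketch: these accesses only happen in compute passes.
#define ACCESS_COMPUTE_ONLY_BITS (ACCESS_STORAGE_RW | ACCESS_STORAGE_IMAGE)

// Implicit info: does this access state involve a GPU write?
static bool access_writes(access_bits_t a) {
    return (a & ACCESS_WRITE_BITS) != 0;
}

// Implicit info: is every access in this state compute-pass-only?
static bool access_compute_only(access_bits_t a) {
    return (a != 0) && ((a & ~(access_bits_t)ACCESS_COMPUTE_ONLY_BITS) == 0);
}
```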
<p>Ideally barriers would always be inserted right at the point before a resource
is bound (because only at that point it’s clear what the new access state is).</p>
<p>Unfortunately it’s not that simple: there’s a metric shitton of arbitrary
restrictions in Vulkan on where exactly barriers may be inserted. The main
limitation is that no barriers can be inserted between <code class="language-plaintext highlighter-rouge">vkCmdBeginRendering</code> and
<code class="language-plaintext highlighter-rouge">vkCmdEndRendering</code> (which is hella weird: it would make sense to disallow barriers
that involve the current pass attachments, but not barriers for other resources used
in the pass).</p>
<p>This limitation is currently the main reason why the sokol-gfx barrier system
is not optimal in some cases. It requires moving any barriers that would
be inserted inside render passes to before the start of the render pass. However, at that point sokol-gfx
can’t predict which resources will actually be used in the render pass
(spoiler: there’s a surprisingly simple solution to this problem which I
should have thought of myself much earlier - but that will be for a later
Vulkan backend update).</p>
<p>Currently, barrier insertion points are in the following sokol-gfx functions:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sg_begin_pass()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_end_pass()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_update/append_*()</code></li>
</ul>
<p>The obvious barriers in begin- and end-pass are for image objects transitioning
in and out of attachment state.</p>
<p>In <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> barriers are only inserted inside compute passes (because
of the above mentioned ‘no barriers inside render passes’ rule).</p>
<p>In staging operations, barriers are issued at the start and end of the staging
operation. The ‘after-barrier’ is not great and eventually needs to be
moved elsewhere.</p>
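<p>In the compute-pass case, the lazy approach boils down to comparing a resource’s tracked state with the state required by the new binding. Below is a minimal sketch with hypothetical types. Here a ‘barrier’ is just a recorded before/after pair instead of an actual <code class="language-plaintext highlighter-rouge">vkCmdPipelineBarrier2()</code> call.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t access_bits_t;

// Hypothetical subset of access-state bits.
enum {
    ACCESS_NONE          = 0,
    ACCESS_VERTEX_BUF    = 1 << 0,
    ACCESS_STORAGE_RO    = 1 << 1,
    ACCESS_STORAGE_RW    = 1 << 2,
    ACCESS_TEXTURE       = 1 << 3,
    ACCESS_STORAGE_IMAGE = 1 << 4,
};
#define ACCESS_WRITE_BITS (ACCESS_STORAGE_RW | ACCESS_STORAGE_IMAGE)

typedef struct { access_bits_t cur; } resource_t;  // tracked current state
typedef struct { access_bits_t before, after; } barrier_t;

#define MAX_BARRIERS 64
typedef struct {
    barrier_t items[MAX_BARRIERS];
    int num;
    bool in_render_pass;  // true between begin/end rendering
} cmd_state_t;

// A barrier is needed when the access state changes, or when the previous
// access was a write (read-after-write or write-after-write hazard).
static bool needs_barrier(access_bits_t cur, access_bits_t next) {
    return (cur != next) || ((cur & ACCESS_WRITE_BITS) != 0);
}

// Lazy transition: called right before a resource is used, e.g. when it is
// bound in a compute pass. Returns true if a barrier was recorded.
static bool lazy_transition(cmd_state_t* cs, resource_t* res, access_bits_t next) {
    if (!needs_barrier(res->cur, next)) {
        return false;
    }
    // Vulkan disallows these barriers between vkCmdBeginRendering and
    // vkCmdEndRendering, so render-pass usage must be handled elsewhere.
    assert(!cs->in_render_pass);
    assert(cs->num < MAX_BARRIERS);
    cs->items[cs->num++] = (barrier_t){ res->cur, next };
    res->cur = next;
    return true;
}
```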
<p>Now the tricky part: moving barriers out of render passes. There is one
situation where this is relevant: a compute pass writes to a buffer or
image, and that buffer or image is then read by a shader in a render pass. Ideally
the barrier for this would happen inside the render pass in <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>,
but the Vulkan validation layer says “no”.</p>
<p>What happens instead is that any resource that’s (potentially) written in a
compute pass is tracked as ‘dirty’, and then in the <code class="language-plaintext highlighter-rouge">sg_end_pass()</code> of the compute
pass, very conservative barriers are inserted for all those dirty resources.
‘Conservative’ means that, since I cannot predict how the resource will be used next,
buffers are generally transitioned into the ‘vertex+index+storage-buffer access
state’ and images into the ‘texture access state’.</p>
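<p>Here is a rough sketch of this dirty-tracking, again with hypothetical names and with barriers only counted rather than recorded. Writable bindings in a compute pass are added once to a dirty list. Ending the pass then moves every dirty resource into a broad ‘readable by anything’ state. The buffer state here assumes read-only storage access, since that is what render passes use.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t access_bits_t;

// Hypothetical subset of access-state bits.
enum {
    ACCESS_VERTEX_BUF    = 1 << 0,
    ACCESS_INDEX_BUF     = 1 << 1,
    ACCESS_STORAGE_RO    = 1 << 2,
    ACCESS_STORAGE_RW    = 1 << 3,
    ACCESS_TEXTURE       = 1 << 4,
    ACCESS_STORAGE_IMAGE = 1 << 5,
};

typedef enum { RES_BUFFER, RES_IMAGE } res_type_t;
typedef struct {
    res_type_t type;
    access_bits_t cur;
    bool dirty;
} resource_t;

#define MAX_DIRTY 32
typedef struct {
    resource_t* dirty[MAX_DIRTY];
    int num_dirty;
    int num_barriers;  // this sketch only counts barriers
} compute_pass_t;

// Called when a resource is bound for (potential) writing in a compute pass.
// The 'dirty' flag makes sure each resource is added only once.
static void track_compute_write(compute_pass_t* pass, resource_t* res) {
    if (!res->dirty) {
        assert(pass->num_dirty < MAX_DIRTY);
        res->dirty = true;
        pass->dirty[pass->num_dirty++] = res;
    }
}

// At the end of the compute pass the next usage is unknown, so every dirty
// resource is transitioned into a broad read state.
static void end_compute_pass(compute_pass_t* pass) {
    const access_bits_t buffer_state = ACCESS_VERTEX_BUF | ACCESS_INDEX_BUF | ACCESS_STORAGE_RO;
    for (int i = 0; i < pass->num_dirty; i++) {
        resource_t* res = pass->dirty[i];
        res->cur = (res->type == RES_BUFFER) ? buffer_state : ACCESS_TEXTURE;
        res->dirty = false;
        pass->num_barriers++;  // a real backend would record a barrier here
    }
    pass->num_dirty = 0;
}
```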
<p>This generally appears to work but is not optimal. We’d like to delay those
barriers to when the resources are actually used, and also tighten the scope
of the barriers to their actual usage.</p>
<p>The solution for this is surprisingly simple: use the same ‘time warp’ that is
used for recording staging operations by recording barrier commands that would
need to be issued from within sokol-gfx render passes into a separate command
buffer which can then be enqueued <strong>before</strong> another command buffer which holds
all render/compute commands for the pass.</p>
<p>This is a perfect solution, but it requires a couple of changes which I didn’t want
to make in the first Vulkan backend release, so as not to delay it even further:</p>
<ul>
<li>instead of a single command buffer per frame to hold all render/compute
commands, one command buffer per sokol-gfx pass is needed</li>
<li>for render passes, a separate command buffer per pass is needed to record
barrier commands so that the barriers can be moved out of the
<code class="language-plaintext highlighter-rouge">vkCmdBeginRendering/vkCmdEndRendering</code> bracket</li>
</ul>
<p>…with those changes in place, <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> and <code class="language-plaintext highlighter-rouge">sg_end_pass()</code> can do some serious
time-travelling shit:</p>
<p>Each resource that’s used in a render pass will keep track of all the ‘access
states’ it’s used with in <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> calls (for buffers that may be
vertex-, index- or read-only-storage-buffer-binding, and for images it can only
be texture-binding). Additionally, the resource is added exactly once to a tracking
array.</p>
<p>In <code class="language-plaintext highlighter-rouge">sg_end_pass()</code> we now have a list of all bound resources and their binding
types, and this information can be used to record ‘just the right’ barriers into
the <strong>separate</strong> command buffer that’s been set aside for render pass
barriers. This barrier command buffer is then enqueued <strong>before</strong> the command
buffer which holds the render commands for that pass and voila: perfectly scoped
render pass barriers. But as I said, this will need to wait until a followup
update.</p>
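<p>The per-pass tracking for this planned update could look roughly like the following sketch. All names are hypothetical. The barrier check is simplified: any difference between the pre-pass state and the accumulated usage triggers a barrier, which can occasionally be more than strictly required.</p>

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t access_bits_t;

// Hypothetical subset of access-state bits.
enum {
    ACCESS_VERTEX_BUF = 1 << 0,
    ACCESS_INDEX_BUF  = 1 << 1,
    ACCESS_STORAGE_RO = 1 << 2,
    ACCESS_STORAGE_RW = 1 << 3,
    ACCESS_TEXTURE    = 1 << 4,
};

typedef struct { access_bits_t cur; } resource_t;
typedef struct { resource_t* res; access_bits_t access; } tracked_t;

#define MAX_TRACKED 32
typedef struct {
    tracked_t items[MAX_TRACKED];
    int num;
} pass_tracker_t;

// Called from the apply-bindings step inside a render pass: add the resource
// once, and OR together all the ways it is used in this pass.
static void track_binding(pass_tracker_t* t, resource_t* res, access_bits_t access) {
    for (int i = 0; i < t->num; i++) {
        if (t->items[i].res == res) {
            t->items[i].access |= access;
            return;
        }
    }
    assert(t->num < MAX_TRACKED);
    t->items[t->num++] = (tracked_t){ res, access };
}

// Called at the end of the render pass. A real backend would record each
// barrier into the pass's separate barrier command buffer, which is then
// submitted before the command buffer holding the pass's render commands.
// Returns the number of barriers.
static int emit_pass_barriers(pass_tracker_t* t) {
    int num_barriers = 0;
    for (int i = 0; i < t->num; i++) {
        resource_t* res = t->items[i].res;
        if (res->cur != t->items[i].access) {
            res->cur = t->items[i].access;
            num_barriers++;
        }
    }
    t->num = 0;
    return num_barriers;
}
```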
<h3 id="everything-else">Everything else…</h3>
<p>The rest of the Vulkan backend is so straightforward that it’s not
worth writing about, essentially 1:1 mappings from sokol-gfx API functions
to Vulkan API functions (the blog post is long enough as it is).</p>
<p>Apart from the resource update system (which is overly restrictive and
conservative in sokol-gfx, mainly because of OpenGL/WebGL), the sokol-gfx API
actually is a really good match for Vulkan. There are no expensive operations
(like creating and discarding Vulkan objects) happening in the ‘hot-path’. The
use of <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> is not a great choice for some GPU architectures,
but as I said at the start: I’m waiting for Khronos to finish their new resource
binding API which apparently will be a mix of D3D12-style descriptor heaps and
<code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code>.</p>
<p>The next steps will most likely be:</p>
<ul>
<li>port the backend to Windows (still limited to Intel GPUs though)</li>
<li>port the backend to NVIDIA GPUs (will have to wait until around January because
I’ll be away from my NVIDIA PC for the rest of the year)</li>
<li>expose a GPU memory allocator interface, and add a sample which hooks up VMA</li>
<li>…maaaybe integrate SebAaltonen’s OffsetAllocator as default allocator
(still not clear if I need that when all modern Vulkan drivers no longer
seem to have that infamous 4096 unique allocations limit)</li>
<li>tinker around with GPU memory heap types for uniform- and descriptor-buffers
on GPUs without unified memory (e.g. host-visible + device-local)</li>
<li>figure out why exactly RenderDoc doesn’t work (apparently it’s because
of <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code>, but RenderDoc claims to support the extension since 1.41)</li>
<li>add support for debug labels (not much point to implement this before
RenderDoc works)</li>
<li>implement the improved resource barrier system outlined above</li>
<li>add support for multiple swapchain passes (not needed when used with sokol_app.h,
but required for any ‘multi-window-scenario’)</li>
<li>improve interoperability with Vulkan code that exists outside sokol-gfx
(injecting Vulkan buffers and images into <code class="language-plaintext highlighter-rouge">sg_make_buffer/sg_make_image</code>
and adding the missing <code class="language-plaintext highlighter-rouge">sg_vk_query_*()</code> functions to expose internal
Vulkan object handles)</li>
</ul>
<p>Originally I also had a long rant about the Vulkan API design in this
blog post, maybe I’ll put that into a separate post and also
change the style from rant into ‘constructive criticism’ (as hard as that will be lol).</p>
<p>My verdict about Vulkan so far is basically: Not great, not terrible.</p>
<p>It’s better than OpenGL but not as good (from an API user’s perspective) as pretty
much any other 3D API. In many places Vulkan is already the same mess as
OpenGL: sediment layers of outdated, deprecated or competing features and
extensions which are incredibly hard to make sense of when not closely following
Vulkan’s development since its initial release in 2016 (which is the exact same
problem that ruined OpenGL).</p>
<p>At the very least, please, please, PLEASE aggressively remove cruft and reduce
the ‘optional-features creep’ in minor Vulkan API versions (which I think should
actually be major versions - 4 breaking versions in 10 years sounds just about right).</p>
<p>For instance when I’m working against the Vulkan 1.3 API I really don’t care
about any legacy features which have been replaced by newer systems (like
synchronization2 replacing the old synchronization API). Don’t expose the
extensions that have been incorporated into core up to 1.3, and also let me filter
out all those outdated declarations from the Vulkan headers so that code-completion
doesn’t suggest outdated API types and functions. Don’t require me to explicitly enable every
little feature (like anisotropic filtering) when creating a Vulkan device. If
some shitty old-school GPU doesn’t have anisotropic filtering, then just
silently ignore it instead of polluting the 3D API for all eternity just for
this one GPU model which probably wasn’t even being produced anymore back in
2016.</p>
<p>Vulkan profiles are a good idea in theory, but please move them into the
core API instead of implementing them as a Vulkan SDK feature. Give
me a <code class="language-plaintext highlighter-rouge">vkCreateSystemDefaultDevice(VK_PROFILE_*)</code> function to get rid of those
500 lines of boilerplate that <strong>every single Vulkan programmer</strong> needs to
duplicate line by line (people who need more control over the setup
process can still use that traditional initialization dance).</p>
<p>And PLEASE get somebody into Khronos who has the power to inject at least a
minimal amount of taste and elegance into Vulkan and who has a clear idea what should
and shouldn’t go into the core API, because just promoting random vendor
extensions into core is really not a good way to build an API (and that was
clear since OpenGL - and the <strong>one</strong> thing that Vulkan should have done better).</p>
<p>Also, a low-level and explicit API <strong>DOES NOT HAVE TO BE</strong> a hassle to use.</p>
<p>Somehow modern software systems always seem to be built around the ‘no pain, no
gain’ philosophy (see Rust, Vulkan, Wayland, …), this sort of self-inflicted
suffering for the sake of purity is such a weird Christian flex that
I’m starting to wonder if ‘religious memes’ surviving under the surface in even
the most rational and atheist developer brains is actually a thing…</p>
<p>Maybe we should return to the ‘Californian hippie attitude’ for building computer
systems and software - apparently that worked pretty great in the 70’s and 80’s ;)</p>
<p>…ok I’m getting into old-man-yells-at-cloud-mode again, so I’ll better stop here :D</p>
Mon, 01 Dec 2025 00:00:00 +0000
https://floooh.github.io/2025/12/01/sokol-vulkan-backend-1.html
https://floooh.github.io/2025/12/01/sokol-vulkan-backend-1.htmlThe sokol-gfx resource view update.<p><strong>Update:</strong> merge happened on 23-Aug-2025.</p>
<p>In a couple of days I will merge the next big (and breaking) sokol-gfx
update which adds resource view objects and in turn removes pre-baked
pass-attachment objects.</p>
<p>The update also requires updating sokol-shdc and recompiling shaders.</p>
<p>The root PR is here: <a href="https://github.com/floooh/sokol/pull/1287">https://github.com/floooh/sokol/pull/1287</a></p>
<p>After merging the update I will spend a couple of weeks taking care of
pending issues and PRs before moving on to a followup <a href="https://github.com/floooh/sokol/issues/1302">resource views update 2</a>.</p>
<h2 id="what-are-resource-view-objects">What are resource view objects?</h2>
<p>If you’re familiar with D3D10 and later you’ll feel right at home since
resource views are a fundamental concept in D3D, and sokol-gfx’s concept
of resource views is closest to D3D11. Other 3D APIs either don’t have
view objects at all (WebGL2 and GL before version 4.3), or only associate resource
views with texture data but not buffer data (GL >= 4.3, Metal and WebGPU).</p>
<p>Typically resource views have a number of different purposes in the various
3D-APIs:</p>
<ul>
<li>they specialize a parent resource object for a specific usage in shaders
(for instance sampling an image object as a texture versus using the
same image object as render target)</li>
<li>they can reinterpret the data in a resource object (for instance to a
different pixel format or image type)</li>
<li>they can define a subset of the data in the resource object (for instance
selecting a specific mipmap or range of mipmaps in a texture)</li>
</ul>
<p>In sokol-gfx you can think of view objects mainly as specializations of an
<code class="language-plaintext highlighter-rouge">sg_image</code> or <code class="language-plaintext highlighter-rouge">sg_buffer</code> object for how the image or buffer is going to be accessed in
shaders:</p>
<ul>
<li>sampling a texture in a shader requires a <strong>texture view</strong></li>
<li>writing to a storage image in a compute shader requires a <strong>storage image view</strong></li>
<li>accessing a storage buffer in a shader requires a <strong>storage buffer view</strong></li>
<li>each render pass attachment type requires its own view object type:
<ul>
<li><strong>color-attachment views</strong></li>
<li><strong>resolve-attachment views</strong></li>
<li><strong>depth-stencil-attachment views</strong></li>
</ul>
</li>
</ul>
<p>Alternatively you can think of view objects as specializations of a resource
object for a specific binding type (I was actually considering calling this new
object type <code class="language-plaintext highlighter-rouge">sg_binding</code>, but since ‘view’ is the more established term I went
with <code class="language-plaintext highlighter-rouge">sg_view</code> instead).</p>
<p>In sokol-gfx, resource view types are ‘runtime flavours’ of the same handle type
<code class="language-plaintext highlighter-rouge">sg_view</code>. This means that setting the wrong view type on a bindslot won’t
be a compilation error, but a runtime error in the sokol-gfx validation layer,
so please make sure to test your code in debug build mode from time to time.</p>
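<p>To illustrate why a mismatched view is only caught at runtime, here is a minimal sketch with a hypothetical handle type and pool. This is not how sokol-gfx stores views internally. All view flavours share one handle type, and the flavour lives in the pool entry, so only a validation check at bind time can tell them apart.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical single handle type for all view flavours (like sg_view).
typedef struct { uint32_t id; } view_t;

typedef enum {
    VIEWTYPE_INVALID = 0,
    VIEWTYPE_TEXTURE,
    VIEWTYPE_STORAGE_BUFFER,
    VIEWTYPE_STORAGE_IMAGE,
    VIEWTYPE_COLOR_ATTACHMENT,
    VIEWTYPE_RESOLVE_ATTACHMENT,
    VIEWTYPE_DEPTH_STENCIL_ATTACHMENT,
} view_type_t;

// Hypothetical pool: the flavour is stored per handle id (id 0 is invalid).
#define MAX_VIEWS 64
static view_type_t view_types[MAX_VIEWS];
static uint32_t next_view_id = 1;

static view_t make_view(view_type_t type) {
    assert(next_view_id < MAX_VIEWS);
    view_t v = { next_view_id++ };
    view_types[v.id] = type;
    return v;
}

// Every flavour has the type view_t, so passing a storage-buffer view where a
// texture view is expected compiles fine. Only a runtime check like this one
// (in sokol-gfx: the debug-mode validation layer) catches the mistake.
static bool validate_view_binding(view_t v, view_type_t expected) {
    if (v.id == 0 || v.id >= MAX_VIEWS) {
        return false;
    }
    return view_types[v.id] == expected;
}
```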
<h2 id="new-unlocked-features">New unlocked features</h2>
<p>This first sokol-gfx resource view update unlocks the following features:</p>
<ul>
<li>Storage buffer bindings can now have an offset. Binding storage buffers
with offsets is mainly useful when the same buffer contains different
types of items in different sections of the buffer, and processing
those items in separate compute shaders - or if you only need to access a section
of a buffer with a compute shader.</li>
<li>Texture views can define a subset of the parent image by defining
their own mipmap- and slice-ranges (not on WebGL2, GLES3 or GL4.1 as used on macOS)</li>
<li>Storage images are no longer ‘compute pass attachments’, but instead
bound like regular textures in the <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call. This
allows writing to many different storage images in the same compute pass
(the number of simultaneously bound storage images is still very restricted
though)</li>
<li>Combinations of render pass attachment images are no longer ‘pre-baked’
into <code class="language-plaintext highlighter-rouge">sg_attachments</code> objects. Instead <code class="language-plaintext highlighter-rouge">sg_attachments</code> is now a
transient struct like <code class="language-plaintext highlighter-rouge">sg_bindings</code>. This relaxes another
‘combinatorial explosion scenario’ because rendering code no longer needs
to predict all possible render-pass attachment combinations upfront.</li>
</ul>
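<p>When packing different item types into one storage buffer, each section’s start offset usually has to respect a device-specific alignment. For instance, WebGPU’s default <code class="language-plaintext highlighter-rouge">minStorageBufferOffsetAlignment</code> is 256 bytes. Below is a small sketch for computing such a layout. The item types are hypothetical, and the alignment is passed in as a parameter rather than queried from sokol-gfx.</p>

```c
#include <assert.h>
#include <stddef.h>

// Round 'offset' up to the next multiple of 'align' (a power of two).
static size_t align_up(size_t offset, size_t align) {
    assert(align > 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

// Hypothetical item types stored in separate sections of one buffer.
typedef struct { float pos[4]; float vel[4]; } particle_t;
typedef struct { unsigned int first, count; } emitter_t;

typedef struct {
    size_t particles_offset;
    size_t emitters_offset;
    size_t total_size;
} buffer_layout_t;

// Place the particle section at offset 0 and the emitter section at the
// next aligned offset after it.
static buffer_layout_t layout_buffer(size_t num_particles, size_t num_emitters, size_t align) {
    buffer_layout_t l;
    l.particles_offset = 0;
    l.emitters_offset = align_up(num_particles * sizeof(particle_t), align);
    l.total_size = l.emitters_offset + num_emitters * sizeof(emitter_t);
    return l;
}
```

<p>Each section would then get its own storage buffer view with the computed offset, so that separate compute shaders can each process just their part of the buffer.</p>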
<h2 id="current-restrictions-and-planned-features">Current restrictions and planned features</h2>
<p>The following resource view features are planned for a followup ‘resource view update 2’:</p>
<ul>
<li>Reinterpret the pixel format and image type of image objects in a view object.</li>
<li>Change the max number of per-shader-stage resource bindings of the same type
from hardwired conservative limits to dynamic device limits exposed in the
<code class="language-plaintext highlighter-rouge">sg_limits</code> struct (e.g. allow more than 4 storage image, 8 storage buffer or 16 texture
bindings, and try to push those limits closer to 32)</li>
</ul>
<p>For more details about planned ‘update 2’ features see:</p>
<p><a href="https://github.com/floooh/sokol/issues/1302">https://github.com/floooh/sokol/issues/1302</a></p>
<h2 id="high-level-overview-of-public-api-changes">High level overview of public API changes</h2>
<ul>
<li>the <code class="language-plaintext highlighter-rouge">sg_attachments</code> object type and related functions have been removed</li>
<li>a new object type <code class="language-plaintext highlighter-rouge">sg_view</code> has been added along with related functions</li>
<li><code class="language-plaintext highlighter-rouge">sg_features</code> gained a new flag <code class="language-plaintext highlighter-rouge">.gl_texture_views</code>. When this is false, the GL backend doesn’t
have full texture view support (i.e. it’s not possible to limit a view to a subset of mip levels
or slices)</li>
<li>the <code class="language-plaintext highlighter-rouge">sg_attachments</code> name has been repurposed for a transient struct of render pass
attachment views:
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_attachments</span> <span class="p">{</span>
<span class="n">sg_view</span> <span class="n">colors</span><span class="p">[</span><span class="n">SG_MAX_COLOR_ATTACHMENTS</span><span class="p">];</span>
<span class="n">sg_view</span> <span class="n">resolves</span><span class="p">[</span><span class="n">SG_MAX_COLOR_ATTACHMENTS</span><span class="p">];</span>
<span class="n">sg_view</span> <span class="n">depth_stencil</span><span class="p">;</span>
<span class="p">}</span> <span class="n">sg_attachments</span><span class="p">;</span>
</code></pre></div> </div>
</li>
<li>the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct now has a unified array for views instead of separate
arrays for each ‘shader resource type’ (textures, storage images and storage buffers):
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_bindings</span> <span class="p">{</span>
<span class="c1">// ...</span>
<span class="n">sg_view</span> <span class="n">views</span><span class="p">[</span><span class="n">SG_MAX_VIEW_BINDSLOTS</span><span class="p">];</span>
<span class="c1">// ...</span>
<span class="p">}</span> <span class="n">sg_bindings</span><span class="p">;</span>
</code></pre></div> </div>
</li>
<li>the <code class="language-plaintext highlighter-rouge">sg_image_usage</code> struct now has more detailed usage flags for
render pass attachments, and the <code class="language-plaintext highlighter-rouge">.storage_attachment</code> usage flag
has been renamed to <code class="language-plaintext highlighter-rouge">.storage_image</code>:
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_image_usage</span> <span class="p">{</span>
<span class="n">bool</span> <span class="n">storage_image</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">color_attachment</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">resolve_attachment</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">depth_stencil_attachment</span><span class="p">;</span>
<span class="c1">// ...</span>
<span class="p">}</span> <span class="n">sg_image_usage</span><span class="p">;</span>
</code></pre></div> </div>
</li>
<li>in <code class="language-plaintext highlighter-rouge">sg_image_desc</code> the items to directly inject backend-specific view
objects have been removed:
<ul>
<li><code class="language-plaintext highlighter-rouge">d3d11_shader_resource_view</code></li>
<li><code class="language-plaintext highlighter-rouge">wgpu_texture_view</code></li>
</ul>
</li>
<li>in <code class="language-plaintext highlighter-rouge">sg_shader_desc</code>:
<ul>
<li>the parts of the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct which describe the shader
binding interface have been changed to a unified array of <code class="language-plaintext highlighter-rouge">sg_shader_view</code>
structs:
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_shader_desc</span> <span class="p">{</span>
<span class="c1">// ...</span>
<span class="n">sg_shader_view</span> <span class="n">views</span><span class="p">[</span><span class="n">SG_MAX_VIEW_BINDSLOTS</span><span class="p">];</span>
<span class="c1">// ...</span>
<span class="p">}</span> <span class="n">sg_shader_desc</span><span class="p">;</span>
</code></pre></div> </div>
</li>
<li>some renaming to better differentiate between ‘(storage) image and texture
bindings’, for instance ‘image-sampler-pairs’ are now called ‘texture-sampler-pairs’,
since only texture bindings are ‘sampled’, but not storage-image bindings</li>
</ul>
</li>
<li>many new items in the <code class="language-plaintext highlighter-rouge">sg_frame_stats</code> struct, mostly not directly related
to resource views, but filling some gaps</li>
</ul>
<h2 id="shader-authoring-changes">Shader Authoring Changes</h2>
<blockquote>
<p>TL;DR: When recompiling existing shaders you might get new errors about bindslot
collisions which need to be resolved by changing the <code class="language-plaintext highlighter-rouge">layout(binding=N)</code>
decorations.</p>
</blockquote>
<p>When using sokol-shdc, the only change on the shader side is that textures,
storage buffers and storage images now share a common bindslot range, previously
each binding type had its own slot range:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">cs</span> <span class="n">cs</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">cs_inp_tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">rgba8</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">writeonly</span> <span class="kr">image2D</span> <span class="n">cs_outp_tex</span><span class="p">;</span>
<span class="c1">// ...</span>
<span class="err">@</span><span class="n">end</span>
</code></pre></div></div>
<p>Note how in this (old) code-snippet the texture- and storage-image bindings use
the same bindslot 0 because previously textures and storage images had their own
bindslot space.</p>
<p>This code will now produce a ‘bindslot collision error’ when compiled with
sokol-shdc, because texture and storage-image bindings now share the same bindslot
space. The bindslots of all texture, storage-buffer and storage-image bindings across
all shader stages must be fixed so that they don’t collide:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">cs</span> <span class="n">cs</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">cs_inp_tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">rgba8</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">writeonly</span> <span class="kr">image2D</span> <span class="n">cs_outp_tex</span><span class="p">;</span>
<span class="c1">// ...</span>
<span class="err">@</span><span class="n">end</span>
</code></pre></div></div>
<p>This bindslot fixup is the only change required on the shader side.</p>
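<p>The collision rule itself is simple. The following sketch uses hypothetical types and is not sokol-shdc’s actual implementation. It checks a list of view bindings once with the old ‘one slot space per binding type’ rule and once with the new shared slot space.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { BIND_TEXTURE, BIND_STORAGE_BUFFER, BIND_STORAGE_IMAGE } bind_kind_t;

typedef struct {
    const char* name;
    bind_kind_t kind;
    int slot;  // the N in layout(binding=N)
} binding_t;

// Returns the index of the first binding whose slot collides with an earlier
// binding, or -1 if there is no collision. 'shared' selects the new rule (one
// slot space for all view bindings) instead of the old per-kind slot spaces.
static int find_collision(const binding_t* b, int n, bool shared) {
    assert(b != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++) {
            bool same_space = shared || (b[i].kind == b[j].kind);
            if (same_space && (b[i].slot == b[j].slot)) {
                return i;
            }
        }
    }
    return -1;
}
```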
<h2 id="working-with-texture-views">Working with Texture Views</h2>
<p>Sample code:</p>
<ul>
<li><strong>texcube-sapp</strong> (simple textured rendering): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/texcube-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/texcube-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/texcube-sapp-ui.html">WebGPU sample</a></li>
<li><strong>dyntex-sapp</strong> (CPU-update dynamic texture): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/dyntex-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/dyntex-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/dyntex-sapp-ui.html">WebGPU sample</a></li>
</ul>
<p>Let’s say a shader defines a texture binding at slot 3:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">tex</span><span class="p">;</span>
</code></pre></div></div>
<p>To ‘populate’ this bindslot on the CPU side you need two objects now: an image
object, and a texture view on the image object:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
<span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
<span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">subimage</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">});</span>
<span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span> <span class="p">},</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Since this is C you can also chain the designated initializers which looks a bit more compact
(unfortunately this isn’t supported in most other languages):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span> <span class="p">});</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct passed to <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> now has an array of <code class="language-plaintext highlighter-rouge">sg_view</code> handles instead
of separate arrays for images and storage buffers:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span>
<span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">tex_view</span><span class="p">,</span>
<span class="p">.</span><span class="n">samplers</span><span class="p">[</span><span class="n">SMP_smp</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Since the texture binding was defined as <code class="language-plaintext highlighter-rouge">layout(binding=3)</code> it’s also
safe to just use the bind slot index directly instead of the code-generated
constant:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span>
<span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">tex_view</span><span class="p">,</span>
<span class="p">.</span><span class="n">samplers</span><span class="p">[</span><span class="n">SMP_smp</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>In many situations you only need the view handle and not the
separate image handle. In that case you can nest the <code class="language-plaintext highlighter-rouge">sg_make_image()</code>
call inside the <code class="language-plaintext highlighter-rouge">sg_make_view()</code> call:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_view</span><span class="p">){</span>
<span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
<span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
<span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">subimage</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">}),</span>
<span class="p">});</span>
</code></pre></div></div>
<p>If you need the image handle later you can extract it from the
view object via <code class="language-plaintext highlighter-rouge">sg_query_view_image()</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_query_view_image</span><span class="p">(</span><span class="n">tex_view</span><span class="p">);</span>
</code></pre></div></div>
<p>Texture views can select a subrange of mipmaps and slices of their parent
image (not supported on WebGL2, GLES3 or GL4.1):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span>
<span class="p">.</span><span class="n">mip_levels</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="p">.</span><span class="n">count</span> <span class="o">=</span> <span class="mi">3</span> <span class="p">},</span>
<span class="p">.</span><span class="n">slices</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="p">.</span><span class="n">count</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">},</span>
<span class="p">},</span>
<span class="p">});</span>
</code></pre></div></div>
<p>If <code class="language-plaintext highlighter-rouge">.count</code> is left at default-zero it means ‘all remaining mipmaps or slices’.
For instance this will only skip the most detailed mipmap but keep the
remaining mipmap chain in place:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span>
<span class="p">.</span><span class="n">mip_levels</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">},</span>
<span class="p">},</span>
<span class="p">});</span>
</code></pre></div></div>
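<p>To make the ‘default-zero count’ rule concrete, here’s a small standalone C sketch of how a <code class="language-plaintext highlighter-rouge">{ base, count }</code> subrange could be resolved against the total number of mipmaps or slices in the parent image. The <code class="language-plaintext highlighter-rouge">subrange_t</code> type and the <code class="language-plaintext highlighter-rouge">resolve_subrange()</code> function are hypothetical illustrations, not part of the sokol-gfx API:</p>

```c
#include <assert.h>

/* Hypothetical helper type, NOT part of sokol_gfx.h: mirrors the
   { .base, .count } pairs used for .mip_levels and .slices */
typedef struct {
    int base;   /* first mipmap or slice visible through the view */
    int count;  /* number of mipmaps or slices, 0 means 'all remaining' */
} subrange_t;

/* Resolve a subrange against the parent image's total number of
   mipmaps (or slices), applying the 'default-zero means all
   remaining' rule. */
subrange_t resolve_subrange(subrange_t r, int total) {
    assert((r.base >= 0) && (r.base < total));
    assert(r.count >= 0);
    subrange_t res = r;
    if (res.count == 0) {
        /* default-zero: everything from base to the end of the chain */
        res.count = total - res.base;
    }
    /* the resolved range must fit inside the parent image */
    assert((res.base + res.count) <= total);
    return res;
}
```

<p>For instance, with a 32x32 image that has a full chain of 6 mipmaps (32, 16, 8, 4, 2 and 1 pixels), <code class="language-plaintext highlighter-rouge">{ .base = 1 }</code> resolves to 5 mipmaps starting at the 16x16 level, while <code class="language-plaintext highlighter-rouge">{ .base = 1, .count = 3 }</code> stays at 3 mipmaps.</p>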
<h2 id="view-vs-parent-resource-lifetime-considerations">View vs parent resource lifetime considerations</h2>
<p>Before moving on to the other view types, a little interlude about
lifetimes and resource states:</p>
<p>If you’re coming from 3D APIs with ref-counted lifetime management like D3D, WebGPU
or Metal you might be tempted to ‘release’ a view’s parent resource object right after creating
its view object if the image object handle isn’t needed anymore:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
<span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
<span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">subimage</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">});</span>
<span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span> <span class="p">});</span>
<span class="n">sg_destroy_image</span><span class="p">(</span><span class="n">img</span><span class="p">);</span>
</code></pre></div></div>
<p>In sokol-gfx lifetimes are explicit. If you pull the rug out from under a view
like this, nothing catastrophic will happen (e.g. no crashes or hard
validation-layer errors), but rendering operations involving such ‘dangling views’
will be silently skipped (this is basically the same behavior as before
when trying to render with images or buffers in a non-valid resource state).</p>
<p>Another slightly counter-intuitive behavior might be that a view object
remains in the valid resource state even though its parent resource has been destroyed.
Continuing the example code above:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// get the destroyed image's resource state</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sg_query_image_state</span><span class="p">(</span><span class="n">img</span><span class="p">)</span> <span class="o">==</span> <span class="n">SG_RESOURCESTATE_INVALID</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// if-branch taken, since the image had been destroyed</span>
<span class="c1">// ...</span>
<span class="p">}</span>
<span class="c1">// get the image's texture view resource state</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sg_query_view_state</span><span class="p">(</span><span class="n">tex_view</span><span class="p">)</span> <span class="o">==</span> <span class="n">SG_RESOURCESTATE_VALID</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// if-branch *also* taken!</span>
<span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>I went a bit back and forth on this decision, but I think the behavior makes
sense from the perspective that all resource state changes in sokol-gfx
are explicit (i.e. there are no ‘automatic’ state changes as
a side effect of a ‘remote’ state change of another object; instead, all resource state changes
are directly caused by a function call on that resource object). The same
has always been true for pipelines and their shader objects, it just wasn’t
specifically documented.</p>
<p>If you want to check whether a view is ‘renderable’ you can use
the following shortcut:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">sg_query_image_state</span><span class="p">(</span><span class="n">sg_query_view_image</span><span class="p">(</span><span class="n">tex_view</span><span class="p">))</span> <span class="o">==</span> <span class="n">SG_RESOURCESTATE_VALID</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// the view is 'renderable'</span>
<span class="p">}</span>
<span class="c1">// or for storage buffer views:</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sg_query_buffer_state</span><span class="p">(</span><span class="n">sg_query_view_buffer</span><span class="p">(</span><span class="n">sbuf_view</span><span class="p">))</span> <span class="o">==</span> <span class="n">SG_RESOURCESTATE_VALID</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// the view is 'renderable'</span>
<span class="p">}</span>
</code></pre></div></div>
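<p>The following standalone C sketch models the behavior described in this section with a tiny hypothetical handle pool: destroying an image only changes the image’s own state, the view stays valid, and the ‘renderable’ check goes through the parent handle. All names (<code class="language-plaintext highlighter-rouge">image_t</code>, <code class="language-plaintext highlighter-rouge">make_image()</code>, <code class="language-plaintext highlighter-rouge">view_is_renderable()</code> and so on) are made up for illustration, and this is not how sokol-gfx actually implements its resource pools:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of explicit resource states, NOT sokol-gfx's actual
   implementation. Handle id 0 is the 'invalid handle', and ids are
   never reused, so a destroyed handle stays invalid. */
typedef enum { STATE_INVALID = 0, STATE_VALID } res_state_t;
typedef struct { uint32_t id; } image_t;
typedef struct { uint32_t id; } view_t;

#define MAX_ITEMS (16)
static res_state_t image_state[MAX_ITEMS];
static res_state_t view_state[MAX_ITEMS];
static image_t view_parent[MAX_ITEMS];
static uint32_t next_image_id = 1;
static uint32_t next_view_id = 1;

image_t make_image(void) {
    assert(next_image_id < MAX_ITEMS);
    image_t img = { next_image_id++ };
    image_state[img.id] = STATE_VALID;
    return img;
}

void destroy_image(image_t img) {
    /* only the image's own state changes, its views are not touched */
    if ((img.id != 0) && (img.id < MAX_ITEMS)) {
        image_state[img.id] = STATE_INVALID;
    }
}

view_t make_view(image_t img) {
    assert(next_view_id < MAX_ITEMS);
    view_t view = { next_view_id++ };
    view_state[view.id] = STATE_VALID;
    view_parent[view.id] = img;
    return view;
}

res_state_t query_image_state(image_t img) {
    /* the invalid handle (or an out-of-range one) reports STATE_INVALID */
    if ((img.id == 0) || (img.id >= MAX_ITEMS)) {
        return STATE_INVALID;
    }
    return image_state[img.id];
}

res_state_t query_view_state(view_t view) {
    if ((view.id == 0) || (view.id >= MAX_ITEMS)) {
        return STATE_INVALID;
    }
    return view_state[view.id];
}

image_t query_view_image(view_t view) {
    /* returns the parent handle, or the invalid handle for a dead view */
    if (query_view_state(view) != STATE_VALID) {
        return (image_t){ 0 };
    }
    return view_parent[view.id];
}

/* the 'renderable' shortcut: check the parent's state through the view */
int view_is_renderable(view_t view) {
    return query_image_state(query_view_image(view)) == STATE_VALID;
}
```

<p>In this model, destroying the image makes <code class="language-plaintext highlighter-rouge">view_is_renderable()</code> return false while <code class="language-plaintext highlighter-rouge">query_view_state()</code> still reports the view as valid, which mirrors the two if-branches in the sokol-gfx example above.</p>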
<p>This works because no matter what state the view object is in (or whether it exists at all),
<code class="language-plaintext highlighter-rouge">sg_query_view_image()</code> will return either an image handle or the invalid handle,
and both can be passed into <code class="language-plaintext highlighter-rouge">sg_query_image_state()</code>. For the invalid image handle
this returns <code class="language-plaintext highlighter-rouge">SG_RESOURCESTATE_INVALID</code>, while for a valid image handle it returns
the actual <code class="language-plaintext highlighter-rouge">SG_RESOURCESTATE_*</code> of the image object.</p>
<h2 id="tracking-uninit--init-cycles">Tracking uninit => init cycles</h2>
<p>If the parent resource goes through a ‘destroy => make’ or ‘uninit => init’ cycle,
all views which had been created from this parent resource must also be
re-initialized, otherwise rendering operations involving such ‘dangling views’
will silently be skipped.</p>
<p>A common pattern for this situation is to use the ‘uninit => init’ calls instead
of ‘destroy => make’, because the handles will remain valid (i.e. you don’t need
to distribute new object handles into all corners of your code base):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// first uninit/init the parent image with new params:</span>
<span class="n">sg_uninit_image</span><span class="p">(</span><span class="n">img</span><span class="p">);</span>
<span class="n">sg_init_image</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">...</span> <span class="p">});</span>
<span class="c1">// then 'cycle' the image's view objects</span>
<span class="n">sg_uninit_view</span><span class="p">(</span><span class="n">tex_view</span><span class="p">);</span>
<span class="n">sg_init_view</span><span class="p">(</span><span class="n">tex_view</span><span class="p">,</span> <span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span> <span class="p">});</span>
</code></pre></div></div>
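<p>One way to picture why a view needs to be cycled together with its parent is the toy C model below: handles stay the same across uninit/init, each init of an image bumps a counter, and a view remembers which ‘incarnation’ of its parent it was created from. This bookkeeping is a hypothetical illustration only and is not meant to describe how sokol-gfx detects dangling views internally; all function names are made up:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT sokol-gfx's implementation: handles survive
   uninit/init, and each image init bumps a counter that views
   remember, so a view can detect a 'cycled' parent. */
typedef struct { uint32_t id; } image_t;
typedef struct { uint32_t id; } view_t;

#define MAX_ITEMS (16)
typedef struct { bool valid; uint32_t init_count; } image_slot_t;
typedef struct { bool valid; image_t parent; uint32_t parent_init_count; } view_slot_t;

static image_slot_t images[MAX_ITEMS];
static view_slot_t views[MAX_ITEMS];
static uint32_t next_image_id = 1;
static uint32_t next_view_id = 1;

/* allocate handles without initializing the resources */
image_t alloc_image(void) {
    assert(next_image_id < MAX_ITEMS);
    return (image_t){ next_image_id++ };
}

view_t alloc_view(void) {
    assert(next_view_id < MAX_ITEMS);
    return (view_t){ next_view_id++ };
}

void init_image(image_t img) {
    assert((img.id != 0) && (img.id < MAX_ITEMS));
    images[img.id].valid = true;
    images[img.id].init_count++;   /* new 'incarnation' of the image */
}

void uninit_image(image_t img) {
    images[img.id].valid = false;
}

void init_view(view_t view, image_t parent) {
    assert((view.id != 0) && (view.id < MAX_ITEMS));
    views[view.id].valid = true;
    views[view.id].parent = parent;
    /* remember which incarnation of the parent this view was created from */
    views[view.id].parent_init_count = images[parent.id].init_count;
}

void uninit_view(view_t view) {
    views[view.id].valid = false;
}

/* a view is only usable if it and its parent are valid, and the parent
   hasn't gone through an uninit/init cycle since the view was initialized */
bool view_is_usable(view_t view) {
    const view_slot_t* v = &views[view.id];
    if (!v->valid) {
        return false;
    }
    const image_slot_t* img = &images[v->parent.id];
    return img->valid && (img->init_count == v->parent_init_count);
}
```

<p>Cycling only the image leaves the view unusable; cycling the view afterwards (exactly like the <code class="language-plaintext highlighter-rouge">sg_uninit_view()</code>/<code class="language-plaintext highlighter-rouge">sg_init_view()</code> pair above) makes it usable again, and neither handle value ever changes.</p>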
<p>At first I considered adding a ‘managed mode’ for views which would track
the state of their parent resource and automatically go through an uninit/init
cycle when needed, but this just didn’t fit into the sokol philosophy of explicit
lifetimes and resource states, and having this one special case for view objects
caused more confusion than the small gain in convenience was worth (this
decision also wasn’t purely based on gut feeling, since I actually <em>had</em>
implemented the ‘managed mode’ already but then kicked it out again after
starting to port the sokol sample code over - it just didn’t ‘feel right’).</p>
<p>When porting existing code over to resource view objects, don’t forget
that you need to destroy at least two objects now for complete cleanup
(views <em>and</em> their parent resource).</p>
<p>The order in which you destroy the views and parent resources doesn’t
matter; this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_destroy_view</span><span class="p">(</span><span class="n">view</span><span class="p">);</span>
<span class="n">sg_destroy_image</span><span class="p">(</span><span class="n">img</span><span class="p">);</span>
</code></pre></div></div>
<p>…works just as well as this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_destroy_image</span><span class="p">(</span><span class="n">img</span><span class="p">);</span>
<span class="n">sg_destroy_view</span><span class="p">(</span><span class="n">view</span><span class="p">);</span>
</code></pre></div></div>
<p><strong>BUT BE AWARE OF THIS TRAP:</strong></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_destroy_view</span><span class="p">(</span><span class="n">view</span><span class="p">);</span>
<span class="n">sg_destroy_image</span><span class="p">(</span><span class="n">sg_query_view_image</span><span class="p">(</span><span class="n">view</span><span class="p">));</span>
</code></pre></div></div>
<p>Since the view is already destroyed, <code class="language-plaintext highlighter-rouge">sg_query_view_image()</code> will return the invalid
handle, and passing the invalid handle into <code class="language-plaintext highlighter-rouge">sg_destroy_image()</code> is a silent no-op
(i.e. your image will leak).</p>
<p>…this is actually a nice example of how convenience in one situation
(<code class="language-plaintext highlighter-rouge">sg_query_view_image()</code> and <code class="language-plaintext highlighter-rouge">sg_destroy_image()</code> silently accepting
an invalid handle) can cause trouble in other situations. I’ll need to think
about whether this should at least be logged as an error instead.</p>
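<p>Here’s the trap and the safe order side by side in a standalone toy model (hypothetical names, not sokol-gfx code). The fix is simply to look up the parent handle <em>before</em> the view is destroyed; in real sokol-gfx code that means calling <code class="language-plaintext highlighter-rouge">sg_query_view_image()</code> before <code class="language-plaintext highlighter-rouge">sg_destroy_view()</code>:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT sokol-gfx's implementation: id 0 is the invalid
   handle, and destroying via the invalid handle is a silent no-op. */
typedef struct { uint32_t id; } image_t;
typedef struct { uint32_t id; } view_t;

#define MAX_ITEMS (16)
static bool image_alive[MAX_ITEMS];
static bool view_alive[MAX_ITEMS];
static image_t view_parent[MAX_ITEMS];
static uint32_t next_id = 1;
static int live_images = 0;   /* counts images that haven't been destroyed */

image_t make_image(void) {
    assert(next_id < MAX_ITEMS);
    image_t img = { next_id++ };
    image_alive[img.id] = true;
    live_images++;
    return img;
}

view_t make_view(image_t img) {
    assert(next_id < MAX_ITEMS);
    view_t view = { next_id++ };
    view_alive[view.id] = true;
    view_parent[view.id] = img;
    return view;
}

image_t query_view_image(view_t view) {
    /* a destroyed view yields the invalid handle */
    if ((view.id != 0) && view_alive[view.id]) {
        return view_parent[view.id];
    }
    return (image_t){ 0 };
}

void destroy_view(view_t view) {
    if (view.id != 0) {
        view_alive[view.id] = false;
    }
}

void destroy_image(image_t img) {
    /* silent no-op for the invalid handle or an already destroyed image */
    if ((img.id != 0) && image_alive[img.id]) {
        image_alive[img.id] = false;
        live_images--;
    }
}

/* safe order: fetch the parent handle while the view is still alive */
void destroy_view_and_image(view_t view) {
    image_t img = query_view_image(view);
    destroy_view(view);
    destroy_image(img);
}
```

<p>Running the trap sequence in this model leaves <code class="language-plaintext highlighter-rouge">live_images</code> at 1 (the leaked image), while <code class="language-plaintext highlighter-rouge">destroy_view_and_image()</code> cleans up both objects.</p>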
<h2 id="working-with-render-pass-attachment-views">Working with render pass attachment views</h2>
<p>Sample code:</p>
<ul>
<li><strong>offscreen-sapp</strong> (simple offscreen rendering): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/offscreen-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/offscreen-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/offscreen-sapp-ui.html">WebGPU sample</a></li>
<li><strong>offscreen-msaa-sapp</strong> (multi-sampled offscreen rendering): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/offscreen-msaa-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/offscreen-msaa-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/offscreen-msaa-sapp-ui.html">WebGPU sample</a></li>
<li><strong>mrt-sapp</strong> (multiple-render-target, multi-sampled offscreen rendering): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/mrt-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/mrt-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/mrt-sapp-ui.html">WebGPU sample</a></li>
<li><strong>mrt-pixelformats-sapp</strong> (multiple render target rendering with different pixel formats): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/mrt-pixelformats-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/mrt-pixelformats-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/mrt-pixelformats-sapp-ui.html">WebGPU sample</a></li>
<li><strong>shadows-sapp</strong> (shadow-mapping with regular shadow map texture): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/shadows-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/shadows-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/shadows-sapp-ui.html">WebGPU sample</a></li>
<li><strong>shadows-depthtex-sapp</strong> (shadow-mapping with a depth-buffer texture): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/shadows-depthtex-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/shadows-depthtex-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/shadows-depthtex-sapp-ui.html">WebGPU sample</a></li>
<li><strong>miprender-sapp</strong> (render into mipmaps): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/miprender-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/miprender-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/miprender-sapp-ui.html">WebGPU sample</a></li>
<li><strong>layerrender-sapp</strong> (render into array slice): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/layerrender-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/layerrender-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/layerrender-sapp-ui.html">WebGPU sample</a></li>
</ul>
<p>In the previous sokol-gfx version, offscreen rendering into an image object
required a ‘pre-baked’ attachments object which was then passed into <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code>.</p>
<p>For example, the old code looked like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// create a color and depth-buffer image for offscreen rendering</span>
<span class="n">sg_image</span> <span class="n">color_img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">render_attachment</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span>
<span class="c1">// ...</span>
<span class="p">});</span>
<span class="n">sg_image</span> <span class="n">depth_img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">render_attachment</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span>
<span class="c1">// ...</span>
<span class="p">});</span>
<span class="c1">// create an attachments object from those images...</span>
<span class="n">sg_attachments</span> <span class="n">atts</span> <span class="o">=</span> <span class="n">sg_make_attachments</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_attachments_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_stencil</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">depth_img</span><span class="p">,</span>
<span class="p">});</span>
<span class="c1">// ... in the render loop for the offscreen render pass:</span>
<span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="n">atts</span> <span class="p">});</span>
<span class="c1">// ...</span>
<span class="n">sg_end_pass</span><span class="p">();</span>
<span class="c1">// ... and in the swapchain pass, bind the color image as texture:</span>
<span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span>
<span class="c1">// ...</span>
<span class="p">.</span><span class="n">images</span><span class="p">[</span><span class="n">TEX_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span>
<span class="c1">// ...</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Now, instead of creating a pre-baked attachments object, separate ‘attachment-view’
objects are created upfront, and the combination of attachments used for rendering
is defined on-the-fly in the <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> call, much like
bindings are defined in the <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// create color- and depth-buffer images</span>
<span class="c1">// NOTE the more detailed usage flags</span>
<span class="n">sg_image</span> <span class="n">color_img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">color_attachment</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span>
<span class="c1">// ...</span>
<span class="p">});</span>
<span class="n">sg_image</span> <span class="n">depth_img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">depth_stencil_attachment</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span>
<span class="c1">// ...</span>
<span class="p">});</span>
<span class="c1">// create color- and depth-stencil attachment views</span>
<span class="n">sg_view</span> <span class="n">color_att_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">color_attachment</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span>
<span class="p">});</span>
<span class="n">sg_view</span> <span class="n">depth_att_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">depth_stencil_attachment</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">depth_img</span><span class="p">,</span>
<span class="p">});</span>
<span class="c1">// since the color-attachment image is also sampled as texture,</span>
<span class="c1">// we'll also need a texture view:</span>
<span class="n">sg_view</span> <span class="n">color_tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span>
<span class="p">});</span>
<span class="c1">// later in the offscreen render pass, the attachment views</span>
<span class="c1">// are passed directly into sg_begin_pass:</span>
<span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">color_att_view</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_stencil</span> <span class="o">=</span> <span class="n">depth_att_view</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">});</span>
<span class="c1">// ...</span>
<span class="n">sg_end_pass</span><span class="p">();</span>
<span class="c1">// and in the swapchain pass, the texture view is bound</span>
<span class="c1">// to sample the offscreen-rendered image as texture:</span>
<span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span>
<span class="c1">// ...</span>
<span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">color_tex_view</span><span class="p">,</span>
<span class="c1">// ...</span>
<span class="p">});</span>
</code></pre></div></div>
<h2 id="working-with-storage-image-views">Working with storage image views</h2>
<p>Samples:</p>
<ul>
<li><strong>write-storageimage-sapp</strong> (write into storage image with compute shader): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/write-storageimage-sapp-ui.html">WebGPU sample</a></li>
<li><strong>imageblur-sapp</strong> (image blurring with compute shaders): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/imageblur-sapp.html">WebGPU sample</a></li>
</ul>
<p>Storage image bindings are no longer defined as compute-pass attachments in <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code>, but instead
like regular texture- or storage-buffer-bindings in <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// first create an image object with storage-image usage:</span>
<span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">storage_image</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span>
<span class="c1">// ...</span>
<span class="p">});</span>
<span class="c1">// to write to the image with a compute shader, a storage image view is needed:</span>
<span class="n">sg_view</span> <span class="n">simg_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">storage_image</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span>
<span class="p">.</span><span class="n">mip_level</span> <span class="o">=</span> <span class="p">...,</span> <span class="c1">// optional: select a specific miplevel</span>
<span class="p">.</span><span class="n">slice</span> <span class="o">=</span> <span class="p">...,</span> <span class="c1">// optional: select a specific slice</span>
<span class="p">},</span>
<span class="p">});</span>
<span class="c1">// ...and to sample that same image as a texture for rendering, a texture view is needed:</span>
<span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span>
<span class="p">});</span>
<span class="c1">// storage image views are now applied as regular bindings in a compute pass:</span>
<span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">});</span>
<span class="c1">// ...</span>
<span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span>
<span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_simg</span><span class="p">]</span> <span class="o">=</span> <span class="n">simg_view</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">sg_dispatch</span><span class="p">(...);</span>
<span class="n">sg_end_pass</span><span class="p">();</span>
<span class="c1">// and to use the compute-shader-updated image as a texture in a render pass,</span>
<span class="c1">// bind the texture view as usual:</span>
<span class="n">sg_begin_pass</span><span class="p">(...);</span>
<span class="c1">// ...</span>
<span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span>
<span class="c1">// ...</span>
<span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">tex_view</span><span class="p">,</span>
<span class="p">.</span><span class="n">samplers</span><span class="p">[</span><span class="n">SMP_smp</span><span class="p">]</span> <span class="o">=</span> <span class="n">smp</span><span class="p">,</span>
<span class="p">});</span>
<span class="n">sg_draw</span><span class="p">(...);</span>
<span class="n">sg_end_pass</span><span class="p">();</span>
</code></pre></div></div>
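<p>The <code class="language-plaintext highlighter-rouge">sg_dispatch(...)</code> call above takes the number of workgroups per dimension, not the number of pixels. Below is a minimal sketch of how those counts could be derived from the storage image size. It assumes a compute shader that declares a 16x16 workgroup size, and the helper name <code class="language-plaintext highlighter-rouge">num_groups()</code> is made up for illustration:</p>

```c
#include <assert.h>

// Hypothetical helper (not part of sokol_gfx.h): returns how many
// workgroups of 'local_size' invocations are needed to cover
// 'num_items' pixels along one axis (integer ceiling division).
static int num_groups(int num_items, int local_size) {
    assert((num_items >= 0) && (local_size > 0));
    return (num_items + local_size - 1) / local_size;
}
```

<p>For a hypothetical 1000x600 storage image, this gives <code class="language-plaintext highlighter-rouge">sg_dispatch(num_groups(1000, 16), num_groups(600, 16), 1)</code>, i.e. 63x38 workgroups. The last row and column of workgroups overshoot the image edges, so the compute shader should either bounds-check or wrap its invocation IDs.</p>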
<h2 id="working-with-storage-buffer-views">Working with storage buffer views</h2>
<p>Samples:</p>
<ul>
<li><strong>vertexpull-sapp</strong> (vertex pulling from storage buffer): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexpull-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexpull-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/vertexpull-sapp-ui.html">WebGPU sample</a></li>
<li><strong>sbuftex-sapp</strong> (access storage buffer in fragment shader): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbuftex-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbuftex-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/sbuftex-sapp-ui.html">WebGPU sample</a></li>
<li><strong>instancing-compute-sapp</strong> (update instancing data with compute shader): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/instancing-compute-sapp-ui.html">WebGPU sample</a></li>
<li><strong>sbufoffset-sapp</strong> (demonstrate storage buffer bindings with offset): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbufoffset-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbufoffset-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/sbufoffset-sapp-ui.html">WebGPU sample</a></li>
</ul>
<p>To bind a buffer object as storage buffer for vertex-pulling or compute-shader access you now need a storage-buffer-view object:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// create a buffer with storage-buffer usage:</span>
<span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">storage_buffer</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span>
<span class="c1">// ...</span>
<span class="p">});</span>
<span class="c1">// create a storage buffer view</span>
<span class="n">sg_view</span> <span class="n">sbuf_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">storage_buffer</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">buffer</span> <span class="o">=</span> <span class="n">buf</span><span class="p">,</span>
<span class="p">.</span><span class="n">offset</span> <span class="o">=</span> <span class="p">...,</span> <span class="c1">// optional 256-byte aligned offset</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="c1">// ...later in a render- or compute-pass bind the storage buffer view:</span>
<span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span>
<span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_ssbo</span><span class="p">]</span> <span class="o">=</span> <span class="n">sbuf_view</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>The 256-byte-alignment restriction for the offset is a bit unfortunate, since
vertex-buffer and index-buffer bind offsets don’t have that restriction. The
alignment restriction comes in via WebGPU, which requires this 256-byte alignment
on some Android devices. The only realistic lower choice would be 64 bytes, which
frankly isn’t much better, since it would still exclude about 8 percent of Android
devices, which is quite a lot
(see: <a href="https://vulkan.gpuinfo.org/displaydevicelimit.php?platform=android&name=minStorageBufferOffsetAlignment">https://vulkan.gpuinfo.org/displaydevicelimit.php?platform=android&name=minStorageBufferOffsetAlignment</a>).</p>
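<p>When several storage-buffer views share one buffer, each view offset needs to be rounded up to that 256-byte boundary. A small sketch of the usual power-of-two round-up follows; the helper name <code class="language-plaintext highlighter-rouge">align_up()</code> is hypothetical and not part of sokol_gfx.h:</p>

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not part of sokol_gfx.h): rounds 'offset' up to
// the next multiple of 'align', which must be a power of two
// (e.g. 256 for storage-buffer view offsets).
static size_t align_up(size_t offset, size_t align) {
    assert((align > 0) && ((align & (align - 1)) == 0));
    return (offset + align - 1) & ~(align - 1);
}
```

<p>For example, a second storage-buffer section following 300 bytes of data would start at <code class="language-plaintext highlighter-rouge">align_up(300, 256)</code>, i.e. byte offset 512, and that value would go into <code class="language-plaintext highlighter-rouge">.storage_buffer.offset</code> of the <code class="language-plaintext highlighter-rouge">sg_view_desc</code>. The 212 bytes in between are padding.</p>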
<h2 id="when-not-using-sokol-shdc">When not using sokol-shdc…</h2>
<p>Samples:</p>
<ul>
<li>for <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">D3D11</a></li>
<li>for <a href="https://github.com/floooh/sokol-samples/tree/master/metal">Metal</a></li>
<li>for <a href="https://github.com/floooh/sokol-samples/tree/master/glfw">desktop GL</a></li>
<li>for <a href="https://github.com/floooh/sokol-samples/tree/master/html5">WebGL2</a></li>
<li>for <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">WebGPU</a></li>
</ul>
<p>Some tweaks on the manually populated <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> structs are needed when not
using sokol-shdc:</p>
<ul>
<li>The separate bindslot reflection arrays for images, storage-buffers and storage-images
have been unified into a <code class="language-plaintext highlighter-rouge">views[]</code> array which mirrors the <code class="language-plaintext highlighter-rouge">views[]</code> array in the
<code class="language-plaintext highlighter-rouge">sg_bindings</code> struct. The actual reflection information in each view bindslot
has remained the same though.</li>
<li>The <code class="language-plaintext highlighter-rouge">.image_sampler_pairs</code> array has been renamed to <code class="language-plaintext highlighter-rouge">.texture_sampler_pairs</code>, and
the struct member <code class="language-plaintext highlighter-rouge">.image_slot</code> has been renamed to <code class="language-plaintext highlighter-rouge">.view_slot</code>.</li>
</ul>
<p>Example from the <a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/mrt-wgpu.c">wgpu/mrt-wgpu.c sample</a>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_shader</span> <span class="n">fsq_shd</span> <span class="o">=</span> <span class="n">sg_make_shader</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_shader_desc</span><span class="p">){</span>
<span class="c1">// ...</span>
<span class="p">.</span><span class="n">views</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">wgsl_group1_binding_n</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">},</span>
<span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">wgsl_group1_binding_n</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">},</span>
<span class="p">[</span><span class="mi">2</span><span class="p">].</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">wgsl_group1_binding_n</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">},</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">samplers</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">wgsl_group1_binding_n</span> <span class="o">=</span> <span class="mi">3</span> <span class="p">},</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">texture_sampler_pairs</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">view_slot</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="p">.</span><span class="n">sampler_slot</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">},</span>
<span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">view_slot</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="p">.</span><span class="n">sampler_slot</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">},</span>
<span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">view_slot</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="p">.</span><span class="n">sampler_slot</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">},</span>
<span class="p">},</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Shader code changes are only needed on WebGPU when using storage images. Those have
moved from <code class="language-plaintext highlighter-rouge">@group(2)</code> into <code class="language-plaintext highlighter-rouge">@group(1)</code> (this is because storage images are no longer
special compute-pass-attachments, but regular bindings just like texture- and
storage-buffer bindings).</p>
<h2 id="q--a">Q & A</h2>
<h3 id="why-no-vertex--and-index-buffer-views">Why no vertex- and index-buffer views?</h3>
<p>I had actually implemented vertex- and index-buffer views at first because they
would have reduced the size of <code class="language-plaintext highlighter-rouge">sg_bindings</code> by 36 bytes (32 bytes for the vertex-buffer offsets and 4 bytes for the
index-buffer offset). In the end I rolled that change back, since none of the
backend 3D APIs require creating view objects for binding vertex- and index-buffers, and
some rendering scenarios (like writing a renderer backend for Dear ImGui) heavily
depend on dynamic offsets for vertex- and index-data.</p>
<p>I might come back to that idea once additional drawing functions with base-offsets
are added (which is planned for the ‘not-too-distant future’). <del>Also adding
a D3D12 backend would require adding view objects for vertex- and index-buffers,
since D3D12 has removed the ability to bind vertex- and index-buffers directly
with a dynamic offset (at least that’s what I’m seeing in the D3D12 docs).</del></p>
<p><strong>Update:</strong> Nvm, I was wrong here: D3D12 just uses the name ‘view’ both for transient
structs and for baked objects, and <code class="language-plaintext highlighter-rouge">D3D12_VERTEX_BUFFER_VIEW</code> and <code class="language-plaintext highlighter-rouge">D3D12_INDEX_BUFFER_VIEW</code> are such transient structs. Thanks to ‘@[email protected]’ for making me aware of my misconception!</p>
<h3 id="why-no-texture-field-in-sg_image_usage-to-indicate-that-texture-views-may-be-created-for-an-image-object">Why no ‘texture’ field in sg_image_usage to indicate that texture views may be created for an image object?</h3>
<p>Simply because creating a texture view is always supported for image objects, so
that flag could be implicitly hardwired to true anyway (with one ‘legacy edge
case’: WebGL2 and GL 4.1 don’t support binding multi-sampled images as
textures). In that edge case, an explicit <code class="language-plaintext highlighter-rouge">.usage.texture</code> flag would allow failing early at
image object creation instead of failing later when creating a texture view on a
multi-sampled image object. But this is such a minor detail, affecting only
‘legacy APIs’ (WebGL2 and GL 4.1), that I didn’t think an explicit texture
usage flag was worth it.</p>
<h3 id="whats-up-with-sg_max_view_bindslots-being-this-odd-28-instead-of-some-2n-value">What’s up with SG_MAX_VIEW_BINDSLOTS being this odd 28 instead of some 2^N value?</h3>
<p>That way the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct is a nice round 256 bytes (64 bytes for vertex
buffer handles and offsets, 8 bytes for index buffer and offset, 112 bytes for
view handles, 64 bytes for sampler handles plus 2*4 bytes for the start and end
canaries).</p>
<p>16 separate samplers might be overkill, so I might tweak the number of views vs
samplers a bit in the ‘resource view update 2’.</p>
Sun, 17 Aug 2025 00:00:00 +0000
https://floooh.github.io/2025/08/17/sokol-gfx-view-update.html
https://floooh.github.io/2025/08/17/sokol-gfx-view-update.htmlThe sokol-gfx 'compute milestone 2' update<blockquote>
<p>Update: merge happened on 24-May-2025</p>
</blockquote>
<p>In a couple of days I will merge the next breaking sokol_gfx.h update (aka the <code class="language-plaintext highlighter-rouge">compute-ms2</code>
update) which makes working with buffer objects a bit more flexible and will allow
compute shaders to write to <code class="language-plaintext highlighter-rouge">sg_image</code> objects via ‘compute pass attachments’.</p>
<p>The update also comes with a matching sokol-shdc update which writes additional
reflection information for storage images used in compute shaders into the
code-generated <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct.</p>
<blockquote>
<p>NOTE: all WASM sample URLs in the blog post require a WebGPU capable browser and will only
be valid after the merge.</p>
</blockquote>
<p>The implementation ticket is here, and this also has links to all related
PRs: <a href="https://github.com/floooh/sokol/issues/1244">https://github.com/floooh/sokol/issues/1244</a></p>
<h3 id="updated-documentation-sections">Updated documentation sections</h3>
<ul>
<li>in sokol_gfx.h, re-read the updated section <a href="https://github.com/floooh/sokol/blob/afc74bd88eab597665f5e4f10962c73524d7cbc1/sokol_gfx.h#L707-L798">ON COMPUTE PASSES</a></li>
<li>if you’re not using sokol-shdc for shader compilation, also re-read the
updated section <a href="https://github.com/floooh/sokol/blob/afc74bd88eab597665f5e4f10962c73524d7cbc1/sokol_gfx.h#L801-L1036">ON SHADER CREATION</a>
(most of that information is only needed when <em>not</em> using sokol-shdc though)</li>
<li>read the new doc section <a href="https://github.com/floooh/sokol/blob/afc74bd88eab597665f5e4f10962c73524d7cbc1/sokol_gfx.h#L1390-L1436">ON STORAGE IMAGES</a></li>
</ul>
<h3 id="an-important-behaviour-change-for-immutable-buffer-objects">An important behaviour change for immutable buffer objects</h3>
<p>The initial ‘compute shader’ update allowed creating immutable buffers without
initial data and guaranteed that the buffer content would be zero-initialized.
On some backend APIs this required a temporary memory allocation of the buffer
size which obviously wasn’t great.</p>
<p>This guaranteed zero-initialization has been rolled back now and the rules
for creating immutable buffer objects have been changed like this:</p>
<ul>
<li>when creating an immutable non-storage-buffer object (i.e. the buffer cannot
be written to with a compute shader), initial data <em>must</em> be provided</li>
<li>when creating an immutable storage-buffer object, no initial data needs to be
provided, but in that case the buffer content will be ‘undefined’</li>
</ul>
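<p>The two rules above boil down to a single check. Below is a minimal sketch using a hypothetical reduced usage struct and a made-up function name. It is not the actual sokol-gfx validation code:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical reduced usage struct with only the flags this rule needs.
typedef struct {
    bool storage_buffer;
    bool immutable;
} usage_flags;

// Sketch of the creation rules listed above: immutable buffers need
// initial data, except immutable storage buffers, whose content then
// starts out undefined.
static bool initial_data_ok(usage_flags usage, const void* data_ptr, size_t data_size) {
    // the sketch assumes pointer and size are consistent (both set or both zero)
    assert((data_ptr != NULL) == (data_size > 0));
    if (!usage.immutable) {
        return true;  // this rule only concerns immutable buffers
    }
    return (data_ptr != NULL) || usage.storage_buffer;
}
```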
<p>In practice this means that when you use a compute shader to initialize
storage buffer content, you can no longer rely on the initial buffer content being
zero-initialized. Instead, write <em>all</em> buffer items in the compute shader,
even those that are supposed to be zero.</p>
<h3 id="multi-purpose-buffer-objects">Multi-purpose buffer objects</h3>
<p>It’s now possible to bind the same buffer object to different bind points (e.g.
bind the same buffer as vertex buffer, index buffer and/or storage buffer).
This means the following scenarios are now enabled:</p>
<ul>
<li>It’s possible to stash vertices and indices into the same buffer
(with the exception of WebGL2 where this is explicitly disallowed)</li>
<li>It’s now possible to use a compute shader to write data to a buffer, and
then bind this buffer as vertex- or index-buffer.</li>
</ul>
<p>To achieve this, the <code class="language-plaintext highlighter-rouge">sg_buffer_desc</code> struct has been changed to merge the previous
buffer type and buffer usage enum items into a new <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code> struct which is a
boolean flag group:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_buffer_usage</span> <span class="p">{</span>
<span class="n">bool</span> <span class="n">vertex_buffer</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">index_buffer</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">storage_buffer</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">immutable</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">dynamic_update</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">stream_update</span><span class="p">;</span>
<span class="p">}</span> <span class="n">sg_buffer_usage</span><span class="p">;</span>
</code></pre></div></div>
<p>The default setup configures an immutable vertex buffer (just as before), e.g.
creating a buffer object like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…is identical to:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">immutable</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…to create an immutable index buffer:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">indices</span><span class="p">),</span>
<span class="p">});</span>
</code></pre></div></div>
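<p>The index buffer example above only sets <code class="language-plaintext highlighter-rouge">.index_buffer</code>, so the update mode falls back to immutable, just like the buffer type falls back to vertex buffer in the default setup. Below is a sketch of that defaulting logic on a hypothetical mirror of <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>. The function name and the assert-based check for conflicting update modes are assumptions, not the sokol-gfx source:</p>

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical mirror of sg_buffer_usage with the same members as above.
typedef struct {
    bool vertex_buffer;
    bool index_buffer;
    bool storage_buffer;
    bool immutable;
    bool dynamic_update;
    bool stream_update;
} buffer_usage;

// Sketch of resolving zero-initialized flags to defaults:
// no buffer type set => vertex buffer, no update mode set => immutable.
static buffer_usage usage_with_defaults(buffer_usage u) {
    if (!(u.vertex_buffer || u.index_buffer || u.storage_buffer)) {
        u.vertex_buffer = true;
    }
    if (!(u.immutable || u.dynamic_update || u.stream_update)) {
        u.immutable = true;
    }
    // assumed validation rule: exactly one update mode after defaulting
    assert((u.immutable + u.dynamic_update + u.stream_update) == 1);
    return u;
}
```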
<p>…to create an index buffer with stream-update hint:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">stream_update</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…to create a buffer that can be written by a compute shader and then
bound to a vertex buffer bindpoint:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">storage_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…and the same as index buffer:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">storage_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>To stash both vertices and indices into the same buffer object:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices_and_indices</span><span class="p">),</span>
<span class="p">});</span>
</code></pre></div></div>
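<p>One way to lay out <code class="language-plaintext highlighter-rouge">vertices_and_indices</code> is a plain C struct with the vertex array followed by the index array, so that <code class="language-plaintext highlighter-rouge">offsetof()</code> yields the byte offset of the index data inside the buffer. The quad data and the names <code class="language-plaintext highlighter-rouge">quad_data</code> and <code class="language-plaintext highlighter-rouge">index_data_offset</code> are hypothetical. The requirement that index data start at a multiple of the index size is also an assumption, checked here with a static assert:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical combined layout: 4 quad vertices (x, y) followed by
// 6 16-bit indices, so SG_RANGE(vertices_and_indices) covers both.
typedef struct {
    float vertices[4][2];
    uint16_t indices[6];
} quad_data;

static const quad_data vertices_and_indices = {
    .vertices = { { -0.5f, -0.5f }, { 0.5f, -0.5f }, { 0.5f, 0.5f }, { -0.5f, 0.5f } },
    .indices = { 0, 1, 2, 0, 2, 3 },
};

// Byte offset of the index data inside the buffer (vertex data starts at 0).
static const size_t index_data_offset = offsetof(quad_data, indices);

// Assumption: index data should start at a multiple of the index size.
static_assert(offsetof(quad_data, indices) % sizeof(uint16_t) == 0, "misaligned index data");
```

<p>The same buffer handle would then go into both <code class="language-plaintext highlighter-rouge">.vertex_buffers[0]</code> (at offset 0) and <code class="language-plaintext highlighter-rouge">.index_buffer</code> in <code class="language-plaintext highlighter-rouge">sg_bindings</code>, with <code class="language-plaintext highlighter-rouge">.index_buffer_offset</code> set to <code class="language-plaintext highlighter-rouge">index_data_offset</code>.</p>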
<p>Note that ‘multi-purpose buffer usage’ is explicitly disallowed on WebGL2 (which is
only relevant for using a single buffer to hold vertex- and index-data, since
storage buffers are not available on WebGL2 anyway). To check for this restriction
use the new <code class="language-plaintext highlighter-rouge">sg_features.separate_buffer_types</code> boolean:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">sg_query_features</span><span class="p">().</span><span class="n">separate_buffer_types</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices_and_indices</span><span class="p">),</span>
<span class="p">});</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Any invalid combination of usage flags will also be checked in the sokol-gfx validation layer.</p>
<p>The following new sample uses a combined vertex/index buffer:</p>
<ul>
<li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexindexbuffer-sapp.c">https://github.com/floooh/sokol-samples/blob/master/sapp/vertexindexbuffer-sapp.c</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexindexbuffer-sapp.glsl">https://github.com/floooh/sokol-samples/blob/master/sapp/vertexindexbuffer-sapp.glsl</a></li>
<li>WASM: <a href="https://floooh.github.io/sokol-webgpu/vertexindexbuffer-sapp-ui.html">https://floooh.github.io/sokol-webgpu/vertexindexbuffer-sapp-ui.html</a></li>
</ul>
<p>The <code class="language-plaintext highlighter-rouge">instancing-compute-sapp</code> sample has been updated to bind the compute-shader-updated
storage buffer as vertex buffer with hardware instancing:</p>
<ul>
<li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.c">https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.c</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.glsl">https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.glsl</a></li>
<li>WASM: <a href="https://floooh.github.io/sokol-webgpu/instancing-compute-sapp-ui.html">https://floooh.github.io/sokol-webgpu/instancing-compute-sapp-ui.html</a></li>
</ul>
<p>There is no sample yet which uses a compute shader to write index data.</p>
<h3 id="breaking-changes-when-creating-image-objects">Breaking changes when creating image objects</h3>
<p>Similar to the above <code class="language-plaintext highlighter-rouge">sg_buffer_desc</code> change, usage hints in the <code class="language-plaintext highlighter-rouge">sg_image_desc</code> struct
are now provided through a new <code class="language-plaintext highlighter-rouge">sg_image_usage</code> struct looking like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_image_usage</span> <span class="p">{</span>
<span class="n">bool</span> <span class="n">render_attachment</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">storage_attachment</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">immutable</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">dynamic_update</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">stream_update</span><span class="p">;</span>
<span class="p">}</span> <span class="n">sg_image_usage</span><span class="p">;</span>
</code></pre></div></div>
<p>E.g. creating a ‘render-target texture’ for offscreen rendering now looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">render_attachment</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">...</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…and creating an image that is dynamically updated with CPU data, using stream-update
behaviour:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">stream_update</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">...</span>
<span class="p">});</span>
</code></pre></div></div>
<p>As with <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>, invalid usage flag combinations are caught in the
sokol-gfx validation layer.</p>
<h3 id="compute-pass-attachments-aka-storage-images">Compute pass attachments (aka storage images)</h3>
<p>It’s now possible to use compute shaders to write to <code class="language-plaintext highlighter-rouge">sg_image</code> objects. The way this is currently
implemented is very similar to offscreen rendering (but will change in a future ‘resource view update’,
more info on that at the end of the blog post).</p>
<p>Let’s first write a simple compute shader in the sokol-shdc GLSL flavour which writes some
animated color gradient to a storage image:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">cs</span> <span class="n">cs</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">cs_params</span> <span class="p">{</span>
    <span class="kt">float</span> <span class="n">offset</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">rgba8</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">writeonly</span> <span class="kr">image2D</span> <span class="n">cs_out_tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">local_size_x</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span> <span class="n">local_size_y</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span> <span class="k">in</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">ivec2</span> <span class="n">size</span> <span class="o">=</span> <span class="n">imageSize</span><span class="p">(</span><span class="n">cs_out_tex</span><span class="p">);</span>
    <span class="kt">ivec2</span> <span class="n">pos</span> <span class="o">=</span> <span class="kt">ivec2</span><span class="p">(</span><span class="n">mod</span><span class="p">(</span><span class="kt">vec2</span><span class="p">(</span><span class="n">gl_GlobalInvocationID</span><span class="p">.</span><span class="n">xy</span><span class="p">)</span> <span class="o">+</span> <span class="kt">vec2</span><span class="p">(</span><span class="n">size</span><span class="p">)</span> <span class="o">*</span> <span class="n">offset</span><span class="p">,</span> <span class="n">size</span><span class="p">));</span>
    <span class="kt">vec4</span> <span class="n">color</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="kt">vec2</span><span class="p">(</span><span class="n">gl_GlobalInvocationID</span><span class="p">.</span><span class="n">xy</span><span class="p">)</span> <span class="o">/</span> <span class="kt">vec2</span><span class="p">(</span><span class="n">size</span><span class="p">),</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
    <span class="n">imageStore</span><span class="p">(</span><span class="n">cs_out_tex</span><span class="p">,</span> <span class="n">pos</span><span class="p">,</span> <span class="n">color</span><span class="p">);</span>
<span class="p">}</span>
<span class="err">@</span><span class="n">end</span>
<span class="err">@</span><span class="n">program</span> <span class="n">compute</span> <span class="n">cs</span>
</code></pre></div></div>
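<p>The wrap-around in the shader relies on GLSL’s <code class="language-plaintext highlighter-rouge">mod()</code>, which is defined as <code class="language-plaintext highlighter-rouge">x - y * floor(x / y)</code> and therefore stays in <code class="language-plaintext highlighter-rouge">[0, y)</code> for a positive <code class="language-plaintext highlighter-rouge">y</code>, unlike C’s truncating <code class="language-plaintext highlighter-rouge">fmodf()</code>. Below is a small CPU-side sketch that mirrors one axis of the position computation, e.g. to sanity-check the scroll math for a negative <code class="language-plaintext highlighter-rouge">offset</code>. It is not part of the sample, and the helper names are hypothetical:</p>

```c
#include <assert.h>

// floor() for floats without pulling in <math.h>/libm;
// only valid for values within the range of int.
static float floor_f(float v) {
    int i = (int)v;  // C casts truncate toward zero
    return (v < (float)i) ? (float)(i - 1) : (float)i;
}

// GLSL-style mod(): x - y * floor(x / y). Unlike C's fmodf() (which
// truncates, so fmodf(-1, 16) == -1), the result has the sign of y,
// i.e. for a positive y it lies in [0, y).
static float glsl_mod(float x, float y) {
    return x - y * floor_f(x / y);
}

// One axis of the shader line:
//   ivec2 pos = ivec2(mod(vec2(gl_GlobalInvocationID.xy) + vec2(size) * offset, size));
// gid:    global invocation id on this axis
// size:   image size in pixels on this axis
// offset: value of the cs_params.offset uniform
static int scrolled_pos(int gid, int size, float offset) {
    assert(size > 0);
    float p = glsl_mod((float)gid + (float)size * offset, (float)size);
    int ip = (int)p;  // like ivec2(), truncates (p is non-negative here)
    // defensive: float rounding could in rare cases yield exactly 'size'
    return (ip >= size) ? ip - size : ip;
}
```

<p>For example, with a 16 pixel wide image and <code class="language-plaintext highlighter-rouge">offset = -0.25</code>, invocation 0 writes to pixel 12 instead of a negative coordinate.</p>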
<p>On the CPU side, create an <code class="language-plaintext highlighter-rouge">sg_image</code> object with ‘storage attachment usage’:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span>
    <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span>
        <span class="p">.</span><span class="n">storage_attachment</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
    <span class="p">},</span>
    <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">WIDTH</span><span class="p">,</span>
    <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">HEIGHT</span><span class="p">,</span>
    <span class="p">.</span><span class="n">pixel_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_RGBA8</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Next, the image must be wrapped in an <code class="language-plaintext highlighter-rouge">sg_attachments</code> object. This allows picking a specific
image surface (mip-level and/or slice) for the compute shader to access.
Up to 4 (or <code class="language-plaintext highlighter-rouge">SG_MAX_STORAGE_ATTACHMENTS</code>) storage images can be defined in a single attachments object:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_attachments</span> <span class="n">atts</span> <span class="o">=</span> <span class="n">sg_make_attachments</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_attachments_desc</span><span class="p">){</span>
    <span class="p">.</span><span class="n">storages</span><span class="p">[</span><span class="n">SIMG_cs_out_tex</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
        <span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span>
        <span class="c1">// optionally pick a mip level and slice:</span>
        <span class="p">.</span><span class="n">mip_level</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">.</span><span class="n">slice</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
    <span class="p">},</span>
<span class="p">});</span>
</code></pre></div></div>
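<p>When a storage attachment picks a mip level other than 0, the compute shader sees a smaller surface (<code class="language-plaintext highlighter-rouge">imageSize()</code> returns the size of the selected mip level), so any CPU-side dispatch size should be derived from that level’s dimensions rather than from the base size. A minimal sketch of the usual mip-chain rule, where each level halves the previous one and is clamped to at least 1 pixel (the helper is hypothetical and not part of sokol_gfx.h):</p>

```c
#include <assert.h>

// Size of mip level 'level' along one axis, given the size of the
// base level (level 0): each level halves the previous one, but a
// mip level is never smaller than 1 pixel.
static int mip_dim(int base_dim, int level) {
    assert(base_dim > 0 && level >= 0 && level < 31);
    int d = base_dim >> level;
    return (d < 1) ? 1 : d;
}
```

<p>E.g. with <code class="language-plaintext highlighter-rouge">.mip_level = 2</code> on a 256x256 image, the compute shader writes a 64x64 surface, which a 16x16 workgroup size covers with 4x4 workgroups.</p>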
<p>…next a compute pipeline object which wraps the above compute shader:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_pipeline</span> <span class="n">pip</span> <span class="o">=</span> <span class="n">sg_make_pipeline</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pipeline_desc</span><span class="p">){</span>
    <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
    <span class="p">.</span><span class="n">shader</span> <span class="o">=</span> <span class="n">sg_make_shader</span><span class="p">(</span><span class="n">compute_shader_desc</span><span class="p">(</span><span class="n">sg_query_backend</span><span class="p">())),</span>
<span class="p">});</span>
</code></pre></div></div>
<p>In the frame loop, run a compute pass and provide the attachments object,
apply the compute pipeline and uniform data, and finally call <code class="language-plaintext highlighter-rouge">sg_dispatch()</code>
to kick off the compute shader:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="n">atts</span> <span class="p">});</span>
<span class="n">sg_apply_pipeline</span><span class="p">(</span><span class="n">pip</span><span class="p">);</span>
<span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="n">UB_cs_params</span><span class="p">,</span> <span class="o">&</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">cs_params</span><span class="p">));</span>
<span class="n">sg_dispatch</span><span class="p">(</span><span class="n">WIDTH</span> <span class="o">/</span> <span class="mi">16</span><span class="p">,</span> <span class="n">HEIGHT</span> <span class="o">/</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">sg_end_pass</span><span class="p">();</span>
</code></pre></div></div>
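<p>The arguments to <code class="language-plaintext highlighter-rouge">sg_dispatch()</code> are workgroup counts, not pixel counts, so <code class="language-plaintext highlighter-rouge">WIDTH / 16</code> only covers the whole image when <code class="language-plaintext highlighter-rouge">WIDTH</code> is a multiple of the shader’s <code class="language-plaintext highlighter-rouge">local_size_x=16</code>. A minimal sketch of the usual round-up division for arbitrary image sizes (the helper name is hypothetical):</p>

```c
#include <assert.h>

// Workgroups needed to cover 'dim' pixels with workgroups of
// 'local_size' invocations along one axis (rounds up, so that a
// partially covered tile at the image border still gets a workgroup).
static int num_groups(int dim, int local_size) {
    assert(dim > 0 && local_size > 0);
    return (dim + local_size - 1) / local_size;
}
```

<p>With a width of 250 pixels, <code class="language-plaintext highlighter-rouge">250 / 16</code> yields 15 workgroups and leaves the rightmost 10 pixel columns unwritten, while <code class="language-plaintext highlighter-rouge">num_groups(250, 16)</code> yields 16. The extra invocations in the last workgroup would then need an early-out in the shader.</p>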
<p>…after the compute pass the image object can then be used as a texture binding in a regular render pass.</p>
<p>Find the complete sample here:</p>
<ul>
<li>WASM: <a href="https://floooh.github.io/sokol-webgpu/write-storageimage-sapp.html">https://floooh.github.io/sokol-webgpu/write-storageimage-sapp.html</a></li>
<li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.c">https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.c</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.glsl">https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.glsl</a></li>
</ul>
<p>…and a more advanced example which has been ported from WebGPU:</p>
<ul>
<li>WASM: <a href="https://floooh.github.io/sokol-webgpu/imageblur-sapp.html">https://floooh.github.io/sokol-webgpu/imageblur-sapp.html</a></li>
<li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.c">https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.c</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.glsl">https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.glsl</a></li>
</ul>
<h3 id="detailed-change-list">Detailed change list</h3>
<h4 id="sokol_apph">sokol_app.h:</h4>
<p>The D3D11/DXGI backend now creates a <code class="language-plaintext highlighter-rouge">D3D_FEATURE_LEVEL_11_1</code> device (with a
fallback to <code class="language-plaintext highlighter-rouge">D3D_FEATURE_LEVEL_11_0</code>). Feature Level 11.1 is needed to allow
more than 8 UAV (Unordered Access View) bindings. D3D11.1 was released in
2012 with Windows 8, so this is only an issue if Windows 7 still needs to be
supported, or on very old GPUs. Win7 is now at 0.12% on the Steam Hardware Survey,
and even if this turns out to be a problem, only the bindslot allocation
strategy in sokol-shdc for HLSL5 UAV bindslots would need to be changed.</p>
<h4 id="sokol_gfxh">sokol_gfx.h:</h4>
<ul>
<li>A new constant <code class="language-plaintext highlighter-rouge">SG_MAX_STORAGE_ATTACHMENTS = 4</code> has been added (most likely
bumped to at least 8 in the future)</li>
<li>The struct <code class="language-plaintext highlighter-rouge">sg_pixelformat_info</code> has gained two new flags:
<ul>
<li><code class="language-plaintext highlighter-rouge">bool read</code>: true if the pixel format supports compute shader read access</li>
<li><code class="language-plaintext highlighter-rouge">bool write</code>: true if the pixel format supports compute shader write access</li>
</ul>
<p>Currently the compute shader accessible pixel formats are hardwired to
the following list, which is safe to use across all GPUs and backend APIs
(all of those formats support read+write access):</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA8</code></li>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA8SN/UI/SI</code></li>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA16UI/SI/F</code></li>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_R32UI/SI/F</code></li>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RG32UI/SI/F</code></li>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA32UI/SI/F</code></li>
</ul>
</li>
<li>A new feature flag <code class="language-plaintext highlighter-rouge">sg_features.separate_buffer_types</code> has been added;
it is only true on WebGL2. The only effect of that flag is that
the same buffer object cannot be bound as both a vertex- and an index-buffer.</li>
<li>The enums <code class="language-plaintext highlighter-rouge">sg_usage</code> and <code class="language-plaintext highlighter-rouge">sg_buffer_type</code> have been removed.</li>
<li>The struct <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code> has been added.</li>
<li>The enum field <code class="language-plaintext highlighter-rouge">sg_buffer_desc.type</code> has been removed and replaced by
boolean flags in <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>.</li>
<li>The enum field <code class="language-plaintext highlighter-rouge">sg_buffer_desc.usage</code> has been repurposed as a nested
struct item of type <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>.</li>
<li>The struct <code class="language-plaintext highlighter-rouge">sg_image_usage</code> has been added.</li>
<li>The boolean <code class="language-plaintext highlighter-rouge">sg_image_desc.render_target</code> has been removed and replaced
by <code class="language-plaintext highlighter-rouge">sg_image_usage.render_attachment</code></li>
<li>The enum field <code class="language-plaintext highlighter-rouge">sg_image_desc.usage</code> has been repurposed as a nested struct
item of type <code class="language-plaintext highlighter-rouge">sg_image_usage</code>.</li>
<li>A new struct <code class="language-plaintext highlighter-rouge">sg_shader_storage_image</code> has been added; it is nested
in <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> and holds reflection information about storage image
bindings in compute shaders.</li>
<li>A new array <code class="language-plaintext highlighter-rouge">sg_shader_desc.storage_images[]</code> has been added to communicate
reflection information about storage image usage in compute shaders to sokol_gfx.h</li>
<li>A new array <code class="language-plaintext highlighter-rouge">sg_attachments_desc.storages[]</code> has been added to describe
‘storage image attachments’ for compute passes.</li>
<li>The function <code class="language-plaintext highlighter-rouge">sg_query_buffer_usage()</code> now returns a struct <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>.</li>
<li>The function <code class="language-plaintext highlighter-rouge">sg_query_image_usage()</code> now returns a struct <code class="language-plaintext highlighter-rouge">sg_image_usage</code>.</li>
</ul>
<h3 id="whats-next">What’s next</h3>
<p>Long story short: while working on the storage image update it became clear
that sokol_gfx.h needs resource-view objects.</p>
<p>This will allow more flexible resource bindings without creating temporary
3D-backend objects in the ‘hot path’, while keeping the sokol_gfx.h backend
implementations simple (e.g. I want to avoid a dynamic ‘hash-and-cache’
approach for 3D-backend resource objects as much as possible; it’s already bad
enough that this is needed for WebGPU BindGroups).</p>
<p>Currently resource view objects are managed under the hood, for instance
in the D3D11 backend:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sg_buffer</code> objects with storage buffer usage generally create a Shader Resource View
for read-only access in vertex-, fragment- and compute-shaders, and if the buffer
is immutable, also an Unordered Access View for write access in compute shaders.
Notably, any starting offsets are hardwired to zero in both view objects.</li>
<li><code class="language-plaintext highlighter-rouge">sg_image</code> objects generally create a Shader Resource View object, but without
a way to specify a mip-level range, array-slice range or different pixel format.</li>
<li><code class="language-plaintext highlighter-rouge">sg_attachments</code> objects create:
<ul>
<li>one Render Target View object per color attachment</li>
<li>an optional Depth Stencil View object for the depth-stencil attachment</li>
<li>one Unordered Access View object per storage attachment</li>
</ul>
</li>
</ul>
<p>The reason why storage images are currently treated as pass attachments instead
of regular bindings applied via <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> is that storage image
bindings need to pick a mip-level and/or slice, and at least on D3D11 this
requires a baked UAV object. Likewise, binding the same storage buffer with
different offsets would require one SRV or UAV object per offset.</p>
<p>The current plan for view objects in sokol_gfx.h looks like this:</p>
<ul>
<li>a single new resource object type is added: <code class="language-plaintext highlighter-rouge">sg_view</code>, with matching structs and functions
(<code class="language-plaintext highlighter-rouge">sg_view_desc</code>, <code class="language-plaintext highlighter-rouge">sg_make_view()</code>, <code class="language-plaintext highlighter-rouge">sg_destroy_view()</code>, etc…)</li>
<li>in return, the <code class="language-plaintext highlighter-rouge">sg_attachments</code> resource object type is removed (along with <code class="language-plaintext highlighter-rouge">sg_attachments_desc</code>,
<code class="language-plaintext highlighter-rouge">sg_make_attachments()</code>, <code class="language-plaintext highlighter-rouge">sg_destroy_attachments()</code> etc…)</li>
<li>view objects can be thought of as a specialization of a resource object for
a specific bindslot type (I actually thought about calling the new resource type <code class="language-plaintext highlighter-rouge">sg_binding</code>,
but ‘view’ is the established name for this type of thing across backend 3D APIs), e.g. views will come in the
following ‘runtime flavours’:
<ul>
<li>texture views</li>
<li>storage buffer views</li>
<li>storage image views</li>
<li>color attachment views</li>
<li>resolve attachment views</li>
<li>depth-stencil attachment views</li>
</ul>
</li>
<li>…and maybe (but not sure yet):
<ul>
<li>vertex buffer views</li>
<li>index buffer views</li>
</ul>
<p>…vertex- and index-buffer-views would allow removing the bind offset for
vertex- and index-buffers from <code class="language-plaintext highlighter-rouge">sg_bindings</code>, with the downside that one view
object would be required per offset, but I can’t think of a situation where a
highly dynamic starting offset would be required for vertex- and index-data.
To be clear: no backend API requires a view object for vertex-
and index-buffer bindings; it would be purely a sokol_gfx.h thing (this also
means that it would be very cheap to create and destroy vertex- and
index-buffer-view objects on the fly, since no calls into backend APIs would happen).</p>
</li>
<li>
<p>the new <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct would then look like this (notably, storage
images for compute shader access would move from ‘pass attachments’ to
regular ‘bindings’):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_bindings</span> <span class="p">{</span>
    <span class="n">sg_view</span> <span class="n">vertex_buffers</span><span class="p">[</span><span class="n">SG_MAX_VERTEXBUFFER_BINDINGS</span><span class="p">];</span>
    <span class="n">sg_view</span> <span class="n">index_buffer</span><span class="p">;</span>
    <span class="n">sg_view</span> <span class="n">textures</span><span class="p">[</span><span class="n">SG_MAX_TEXTURE_BINDINGS</span><span class="p">];</span>
    <span class="n">sg_view</span> <span class="n">storage_buffers</span><span class="p">[</span><span class="n">SG_MAX_STORAGEBUFFER_BINDINGS</span><span class="p">];</span>
    <span class="n">sg_view</span> <span class="n">storage_images</span><span class="p">[</span><span class="n">SG_MAX_STORAGEIMAGE_BINDINGS</span><span class="p">];</span>
    <span class="n">sg_sampler</span> <span class="n">samplers</span><span class="p">[</span><span class="n">SG_MAX_SAMPLER_BINDINGS</span><span class="p">];</span>
<span class="p">}</span> <span class="n">sg_bindings</span><span class="p">;</span>
</code></pre></div> </div>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">sg_attachments</code> would become a ‘transient struct’ similar to
<code class="language-plaintext highlighter-rouge">sg_bindings</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_attachments</span> <span class="p">{</span>
    <span class="n">sg_view</span> <span class="n">colors</span><span class="p">[</span><span class="n">SG_MAX_COLOR_ATTACHMENTS</span><span class="p">];</span>
    <span class="n">sg_view</span> <span class="n">resolves</span><span class="p">[</span><span class="n">SG_MAX_COLOR_ATTACHMENTS</span><span class="p">];</span>
    <span class="n">sg_view</span> <span class="n">depth_stencil</span><span class="p">;</span>
<span class="p">}</span> <span class="n">sg_attachments</span><span class="p">;</span>
</code></pre></div> </div>
</li>
</ul>
<p>This ‘view update’ would have the following advantages:</p>
<ul>
<li>storage buffer bindings can have a starting offset, which simplifies
managing different types of data in the same buffer</li>
<li>texture and storage image bindings can (to some extent) reinterpret
the image data (e.g. casting to a different pixel format or selecting
a miplevel and slice range - this will have to be behind a feature flag
though)</li>
<li>multiple-render-target combinations no longer need to be prebaked</li>
</ul>
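<p>With per-binding starting offsets, several arrays could share one storage buffer, but backend APIs typically require such offsets to be a multiple of some minimum alignment (WebGPU, for instance, reports this as the <code class="language-plaintext highlighter-rouge">minStorageBufferOffsetAlignment</code> limit, with a default of 256 bytes). Below is a minimal sketch of laying out two arrays in one buffer under that assumption; the alignment constant, struct and helper names are illustrative and not sokol_gfx.h API:</p>

```c
#include <assert.h>
#include <stddef.h>

// Assumed minimum storage buffer offset alignment (WebGPU's default
// limit); a real application would query the backend limit instead.
#define STORAGE_OFFSET_ALIGN ((size_t)256)

// Round 'offset' up to the next multiple of 'align' (a power of two).
static size_t align_up(size_t offset, size_t align) {
    assert(align > 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

// Hypothetical layout of two arrays sharing one storage buffer.
typedef struct {
    size_t offset_a;    // start of the first array in bytes
    size_t offset_b;    // start of the second array in bytes
    size_t total_size;  // required buffer size in bytes
} shared_layout;

// Place two arrays of 'size_a' and 'size_b' bytes back to back,
// with each start offset satisfying the assumed alignment.
static shared_layout layout_two_arrays(size_t size_a, size_t size_b) {
    shared_layout l;
    l.offset_a = 0;
    l.offset_b = align_up(size_a, STORAGE_OFFSET_ALIGN);
    l.total_size = l.offset_b + size_b;
    return l;
}
```

<p>E.g. a 1000-byte array followed by a 64-byte array needs a 1088-byte buffer, with the second array starting at offset 1024.</p>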
<p>No ETA yet on the ‘view update’ though, first I want to fix a couple of
internal things:</p>
<ul>
<li>the GL texture creation code is currently an unholy combination of
<code class="language-plaintext highlighter-rouge">glTexStorage</code> and <code class="language-plaintext highlighter-rouge">glTexImage</code> functions. I want to cleanly split
this into two code paths (unfortunately macOS, being stuck at GL 4.1,
doesn’t have the <code class="language-plaintext highlighter-rouge">glTexStorage</code> functions, although I heard
that those functions are implemented but just not present in the
core GL headers - which I’ll need to investigate)</li>
<li>I want to improve the internal ‘lifetime tracking’ for referenced
resources (e.g. one resource object holding a reference to another
object). Currently it’s not possible to detect when such a referenced
object has gone through an ‘uninit/init’ cycle because this keeps
the same public handle while discarding and recreating backend
3D API objects. Especially for view objects (which need to track
their original resource object) it is important that views can
detect when their referenced resource object is discarded (and
I’m thinking about ‘auto-managed’ view objects which can recreate
themselves on the fly when their resource object goes through
uninit/init - no promises yet though).</li>
</ul>
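<p>One common way to detect that a referenced object went through an uninit/init cycle while keeping the same public handle is a per-slot counter that is bumped on every uninit: a referencing object stores the counter value it saw at creation and compares it later. The sketch below is a hypothetical illustration of that general idea (all names are made up), not the design from the planning ticket:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical resource slot: the public handle stays the same, but
// 'uninit_count' is bumped whenever the backend objects are discarded.
typedef struct {
    uint32_t uninit_count;
    bool initialized;
} resource_slot;

// Hypothetical view: remembers the referenced resource and the
// resource's uninit_count at the time the view was created.
typedef struct {
    resource_slot* res;
    uint32_t res_uninit_count;
} view_ref;

static void resource_init(resource_slot* r) { r->initialized = true; }

static void resource_uninit(resource_slot* r) {
    r->initialized = false;
    r->uninit_count++;  // invalidates all views created before this point
}

static view_ref view_make(resource_slot* r) {
    view_ref v = { r, r->uninit_count };
    return v;
}

// A view is only usable if its resource is alive AND has not gone
// through an uninit/init cycle since the view was created.
static bool view_valid(const view_ref* v) {
    return v->res->initialized && (v->res->uninit_count == v->res_uninit_count);
}
```

<p>An ‘auto-managed’ view could use the same check to decide when to recreate its backend objects, instead of just reporting itself as invalid.</p>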
<p>More info on those planned updates is in the following planning
tickets:</p>
<ul>
<li>resource views: <a href="https://github.com/floooh/sokol/issues/1252">https://github.com/floooh/sokol/issues/1252</a></li>
<li>better internal reference tracking: <a href="https://github.com/floooh/sokol/issues/1260">https://github.com/floooh/sokol/issues/1260</a></li>
<li>glTexStorage vs glTexImage: <a href="https://github.com/floooh/sokol/issues/1263">https://github.com/floooh/sokol/issues/1263</a></li>
</ul>
<p>…and that is all for today :)</p>
Mon, 19 May 2025 00:00:00 +0000
https://floooh.github.io/2025/05/19/sokol-gfx-compute-ms2.html
https://floooh.github.io/2025/05/19/sokol-gfx-compute-ms2.htmlThe sokol-gfx compute shader update<p><strong>Update:</strong> merge happened on 08-Mar-2025</p>
<p>In the next couple of days I will merge initial compute shader support
for sokol_gfx.h (and sokol-shdc). The update is surprisingly ‘low-profile’ in terms
of API changes: the only breaking change is that the runtime feature flag
<code class="language-plaintext highlighter-rouge">sg_features.storage_buffer</code> has been renamed to <code class="language-plaintext highlighter-rouge">sg_features.compute</code>
(this is because the same backends that supported storage buffers before
now also support compute shaders).</p>
<h2 id="availability-and-restrictions">Availability and Restrictions</h2>
<p>Compute shader support is available on the following platform/backend combos:</p>
<ul>
<li>macOS and iOS with Metal</li>
<li>Windows with D3D11 and GL</li>
<li>Linux with GL</li>
<li>Web with WebGPU</li>
</ul>
<p>…which means that compute shaders are not available on:</p>
<ul>
<li>macOS with GL</li>
<li>iOS with GLES3</li>
<li>Web with WebGL2</li>
<li>Android with GLES3</li>
</ul>
<p>The initial compute shader support comes with a couple of restrictions
which will most likely be lifted in later updates (in about that order):</p>
<ul>
<li>storage buffers cannot be bound as vertex- or index-buffers</li>
<li>no storage textures, i.e. compute shaders can only write buffer data but not texture data</li>
<li>there’s no way to read data from GPU resources back to the CPU side (or
copy data between GPU resources)</li>
</ul>
<p>Right now compute shaders are mostly useful for replacing
dynamic- and streaming-buffer update scenarios, where dynamic render
data is computed on the CPU and uploaded to buffers via <code class="language-plaintext highlighter-rouge">sg_update_buffer()</code>.</p>
<h2 id="new-compute-shader-samples">New compute shader samples</h2>
<p>To get an idea of how compute shaders work in sokol-gfx, it’s best to read the
new sample code:</p>
<ul>
<li><a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.c">C code</a></li>
<li><a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.glsl">GLSL code</a></li>
<li><a href="https://floooh.github.io/sokol-webgpu/instancing-compute-sapp.html">WebGPU demo</a></li>
</ul>
<p>This is an evolution of the <a href="https://floooh.github.io/sokol-webgpu/instancing-sapp-ui.html">instancing-sapp</a>
sample, and moves all particle computations into compute shaders.</p>
<p>The other compute shader sample is a straight port of the <a href="https://webgpu.github.io/webgpu-samples/?sample=computeBoids">WebGPU compute boids sample</a> to
sokol-gfx:</p>
<ul>
<li><a href="https://github.com/floooh/sokol-samples/blob/master/sapp/computeboids-sapp.c">C code</a></li>
<li><a href="https://github.com/floooh/sokol-samples/blob/master/sapp/computeboids-sapp.glsl">GLSL code</a></li>
<li><a href="https://floooh.github.io/sokol-webgpu/computeboids-sapp.html">WebGPU demo</a></li>
</ul>
<p>Those two samples use ‘cross-backend’ GLSL shader code compiled to the underlying
shading languages via <a href="https://github.com/floooh/sokol-tools/">sokol-shdc</a>.</p>
<p>For authoring compute shaders with sokol-shdc it might make sense to read up
on <a href="https://www.khronos.org/opengl/wiki/Compute_Shader">GLSL compute shaders in the GL Wiki</a> -
note though that not all features have been properly tested yet (like sampling
textures in compute shaders, or accessing shared memory).</p>
<p>For using sokol-gfx compute shaders without sokol-shdc, check out the following
backend specific versions of the <code class="language-plaintext highlighter-rouge">instancing-compute</code> sample:</p>
<ul>
<li>D3D11: <a href="https://github.com/floooh/sokol-samples/blob/master/d3d11/instancing-compute-d3d11.c">instancing-compute-d3d11.c</a></li>
<li>Metal: <a href="https://github.com/floooh/sokol-samples/blob/master/metal/instancing-compute-metal.c">instancing-compute-metal.c</a></li>
<li>WebGPU: <a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/instancing-compute-wgpu.c">instancing-compute-wgpu.c</a></li>
<li>GL4.3: <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/instancing-compute-glfw.c">instancing-compute-glfw.c</a></li>
</ul>
<p>Also check out the updated documentation of <a href="https://github.com/floooh/sokol-tools/blob/master/docs/sokol-shdc.md">sokol-shdc</a>,
and the new documentation comment section on compute shaders in the sokol_gfx.h
header (search for: <code class="language-plaintext highlighter-rouge">ON COMPUTE PASSES</code> and re-read the updated section <code class="language-plaintext highlighter-rouge">ON SHADER CREATION</code>).</p>
<h2 id="shader-authoring-changes">Shader Authoring Changes</h2>
<p>The sokol-gfx update comes with a matching sokol-shdc update for
authoring compute shaders.</p>
<p>A new tag <code class="language-plaintext highlighter-rouge">@cs [name]</code> (similar to the existing <code class="language-plaintext highlighter-rouge">@vs [name]</code> and <code class="language-plaintext highlighter-rouge">@fs [name]</code>)
is used to identify a compute shader snippet, i.e. everything inside <code class="language-plaintext highlighter-rouge">@cs / @end</code>
will be compiled as a <a href="https://www.khronos.org/opengl/wiki/Compute_Shader">GLSL compute shader</a>.</p>
<p>NOTE that the distinction between readonly and read/write storage buffer
bindings is important, e.g.:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">readonly</span> <span class="n">buffer</span> <span class="n">cs_ssbo_in</span> <span class="p">{</span> <span class="n">particle</span> <span class="n">prt_in</span><span class="p">[];</span> <span class="p">};</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="n">buffer</span> <span class="n">cs_ssbo_out</span> <span class="p">{</span> <span class="n">particle</span> <span class="n">prt_out</span><span class="p">[];</span> <span class="p">};</span>
</code></pre></div></div>
<p>If your compute shader only reads (but doesn’t write) storage buffer content,
its binding declaration should be marked as <code class="language-plaintext highlighter-rouge">readonly</code>. This information will
be extracted by sokol-shdc and used by sokol-gfx for hazard-tracking
needed in some 3D-APIs.</p>
<p>The other notable shader specialty is the ‘workgroup size’, which in
GLSL is defined as:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">local_size_x</span><span class="o">=</span><span class="n">X</span><span class="p">,</span> <span class="n">local_size_y</span><span class="o">=</span><span class="n">Y</span><span class="p">,</span> <span class="n">local_size_z</span><span class="o">=</span><span class="n">Z</span><span class="p">)</span> <span class="k">in</span><span class="p">;</span>
</code></pre></div></div>
<p>…if you’re used to HLSL, this is the same as <code class="language-plaintext highlighter-rouge">[numthreads(X,Y,Z)]</code>, or in WGSL
<code class="language-plaintext highlighter-rouge">@workgroup_size(X,Y,Z)</code>. On Metal this is called <code class="language-plaintext highlighter-rouge">threadsPerThreadgroup</code> and
is <strong>not</strong> defined in the shader code, but on the CPU side when issuing a dispatch
call (this is another case where sokol-shdc comes in handy, since it extracts
the workgroup size from the GLSL shader and passes it into sokol-gfx as
<code class="language-plaintext highlighter-rouge">sg_shader_desc.mtl_threads_per_threadgroup</code>).</p>
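<p>To make the relationship between the workgroup size and the individual shader invocations more concrete, here is a small plain-C sketch (not sokol-gfx code) of how GLSL defines an invocation’s <code class="language-plaintext highlighter-rouge">gl_GlobalInvocationID</code>: the workgroup index multiplied by the <code class="language-plaintext highlighter-rouge">local_size_*</code> values, plus the invocation’s index inside its workgroup. The <code class="language-plaintext highlighter-rouge">uvec3_t</code> type and the <code class="language-plaintext highlighter-rouge">global_invocation_id()</code> function are hypothetical helpers for illustration only.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdint.h>

/* hypothetical stand-in for GLSL's uvec3 type */
typedef struct { uint32_t x, y, z; } uvec3_t;

/* GLSL defines, per component:
     gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
   where gl_WorkGroupSize is (local_size_x, local_size_y, local_size_z) */
uvec3_t global_invocation_id(uvec3_t workgroup_id, uvec3_t workgroup_size, uvec3_t local_id) {
    uvec3_t gid = {
        workgroup_id.x * workgroup_size.x + local_id.x,
        workgroup_id.y * workgroup_size.y + local_id.y,
        workgroup_id.z * workgroup_size.z + local_id.z,
    };
    return gid;
}
</code></pre></div></div>
<p>For example, with <code class="language-plaintext highlighter-rouge">local_size_x=64</code>, invocation 5 of workgroup 2 ends up with the global index 2 * 64 + 5 = 133.</p>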
<p>Other than that, you mainly need to be aware that your compute shader code must
be thread-safe, because compute shaders allow random write access into storage buffers
and the GPU runs many invocations of your shader in parallel.</p>
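<p>A common way to stay thread-safe is to let every invocation write only to ‘its own’ output element. The following sketch emulates a 1D dispatch sequentially on the CPU in plain C, so it is only a model of what happens on the GPU: the <code class="language-plaintext highlighter-rouge">particle_t</code> layout, the <code class="language-plaintext highlighter-rouge">update_particle()</code> and <code class="language-plaintext highlighter-rouge">emulate_dispatch()</code> functions and the simple position update are all hypothetical, loosely mirroring the <code class="language-plaintext highlighter-rouge">prt_in</code> / <code class="language-plaintext highlighter-rouge">prt_out</code> storage buffers declared above.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stddef.h>

/* hypothetical particle layout, loosely mirroring the GLSL 'particle' struct */
typedef struct { float pos[3]; float vel[3]; } particle_t;

/* Body of one compute shader invocation, written in plain C:
   invocation 'gid' reads prt_in[gid] and writes ONLY prt_out[gid],
   so no two invocations ever touch the same output element. */
static void update_particle(size_t gid, size_t num_particles, float dt,
                            const particle_t* prt_in, particle_t* prt_out) {
    /* the dispatch is rounded up to whole workgroups, so invocations
       past the end of the buffer must not do anything */
    if (gid >= num_particles) {
        return;
    }
    particle_t p = prt_in[gid];
    for (int i = 0; i < 3; i++) {
        p.pos[i] += p.vel[i] * dt;
    }
    prt_out[gid] = p;
}

/* Sequential CPU emulation of a 1D dispatch; on the GPU the
   invocations may run in parallel and in any order. */
void emulate_dispatch(size_t num_groups, size_t local_size, size_t num_particles,
                      float dt, const particle_t* prt_in, particle_t* prt_out) {
    for (size_t wg = 0; wg < num_groups; wg++) {
        for (size_t lid = 0; lid < local_size; lid++) {
            update_particle(wg * local_size + lid, num_particles, dt, prt_in, prt_out);
        }
    }
}
</code></pre></div></div>
<p>Since no two invocations write the same element of <code class="language-plaintext highlighter-rouge">prt_out</code>, the result doesn’t depend on the order in which the invocations happen to run, and the bounds check takes care of the extra invocations in a partially filled last workgroup.</p>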
<h2 id="on-the-cpu-side">On the CPU side</h2>
<p>The <code class="language-plaintext highlighter-rouge">sg_setup()</code> call gets a new config item <code class="language-plaintext highlighter-rouge">sg_desc.max_dispatch_calls_per_pass</code>
(default: 1024). This is used to allocate an internal array to keep track of
written storage buffers in a compute pass for hazard tracking purposes.</p>
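<p>As a rough illustration of what such a pre-allocated tracking array could look like, here is a hypothetical plain-C sketch (not the actual sokol-gfx implementation; how the capacity is derived from <code class="language-plaintext highlighter-rouge">max_dispatch_calls_per_pass</code> is an assumption): storage buffers bound as read/write are recorded once per pass, readonly bindings are skipped, and running out of space is reported instead of silently growing the array.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* hypothetical fixed-capacity list of storage buffers written in the
   current compute pass, allocated once at setup time */
typedef struct {
    uint32_t* buf_ids;   /* ids of storage buffers bound as read/write */
    int num;
    int capacity;        /* assumed to be derived from max_dispatch_calls_per_pass */
} written_tracker_t;

bool tracker_init(written_tracker_t* t, int capacity) {
    t->buf_ids = calloc((size_t)capacity, sizeof(uint32_t));
    t->num = 0;
    t->capacity = capacity;
    return t->buf_ids != NULL;
}

void tracker_discard(written_tracker_t* t) {
    free(t->buf_ids);
    t->buf_ids = NULL;
    t->num = t->capacity = 0;
}

/* called at the start of each compute pass */
void tracker_begin_pass(written_tracker_t* t) {
    t->num = 0;
}

/* records a read/write binding; readonly bindings are ignored and each
   buffer is recorded only once; returns false when the array is full */
bool tracker_add(written_tracker_t* t, uint32_t buf_id, bool readonly) {
    if (readonly) {
        return true;
    }
    for (int i = 0; i < t->num; i++) {
        if (t->buf_ids[i] == buf_id) {
            return true;
        }
    }
    if (t->num >= t->capacity) {
        return false;
    }
    t->buf_ids[t->num++] = buf_id;
    return true;
}
</code></pre></div></div>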
<p>There’s a minor change when creating buffers: It’s now allowed to create
immutable buffers without initial content, and such buffers will be
zero-initialized (note though that dynamic- and streaming-buffers may
still have undefined buffer content after creation). Zero-initialization is useful
when using a compute shader to write the initial buffer content instead
of providing the data from the CPU side during the <code class="language-plaintext highlighter-rouge">sg_make_buffer()</code> call.</p>
<p>Shaders, pipelines and passes now come in two runtime flavours: ‘render’ vs ‘compute’,
where the ‘render flavours’ are fully compatible with existing code.</p>
<p>For shaders, nothing changes either when using sokol-shdc for shader authoring.
In that case you just write a compute shader and sokol-shdc will code-generate
a matching <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct which can be plugged directly into the
<code class="language-plaintext highlighter-rouge">sg_make_shader()</code> call.</p>
<p>A compute pipeline is a regular pipeline object without any render state,
but with a compute shader attached:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_pipeline</span> <span class="n">pip</span> <span class="o">=</span> <span class="n">sg_make_pipeline</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pipeline_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">shader</span> <span class="o">=</span> <span class="n">a_compute_shader</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Finally, kicking off ‘compute workloads’ happens with a new function <code class="language-plaintext highlighter-rouge">sg_dispatch()</code>
inside ‘compute passes’:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">});</span>
<span class="n">sg_apply_pipeline</span><span class="p">(</span><span class="n">pip</span><span class="p">);</span>
<span class="n">sg_apply_bindings</span><span class="p">(...);</span>
<span class="n">sg_apply_uniforms</span><span class="p">(...);</span>
<span class="n">sg_dispatch</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">);</span>
<span class="n">sg_end_pass</span><span class="p">();</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">sg_dispatch()</code> call takes the number of ‘workgroups’ as arguments
(same convention as GL, D3D11 and WebGPU, but different from Metal’s <code class="language-plaintext highlighter-rouge">dispatchThreads</code> method).</p>
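<p>Since <code class="language-plaintext highlighter-rouge">sg_dispatch()</code> expects workgroup counts, the number of items to process usually needs to be divided by the workgroup size and rounded up. A minimal plain-C sketch, where both helper names are hypothetical:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdint.h>

/* number of workgroups needed to cover num_items invocations, rounded up
   (assumes num_items + workgroup_size - 1 doesn't overflow); surplus
   invocations in the last workgroup must be bounds-checked in the shader */
uint32_t num_workgroups(uint32_t num_items, uint32_t workgroup_size) {
    return (num_items + workgroup_size - 1) / workgroup_size;
}

/* total number of invocations launched by such a dispatch, which is
   the kind of 'total threads' value Metal's dispatchThreads expects */
uint32_t num_invocations(uint32_t num_groups, uint32_t workgroup_size) {
    return num_groups * workgroup_size;
}
</code></pre></div></div>
<p>For example, 1000 particles with <code class="language-plaintext highlighter-rouge">local_size_x=64</code> need 16 workgroups, i.e. 1024 invocations, so the last 24 invocations must be skipped by a bounds check in the shader.</p>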
<p>Compute- vs render-passes now impose a couple of restrictions (checked
by the validation layer):</p>
<ul>
<li>the following functions must only be called in render passes:
<ul>
<li><code class="language-plaintext highlighter-rouge">sg_apply_viewport[f]()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_apply_scissor_rect[f]()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_draw()</code></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">sg_dispatch()</code> must only be called in a compute pass</li>
<li><code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> in a compute pass must not attempt to bind vertex- or index-buffers</li>
<li>the <code class="language-plaintext highlighter-rouge">sg_apply_pipeline()</code> pipeline type must match the pass type (e.g. render pipeline
objects can only be applied in render passes, and compute pipeline objects only
in compute passes)</li>
</ul>
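<p>The rules above are simple enough to be written down as a small table of checks. The following plain-C sketch is only a hypothetical model of them: the enums and functions are made up for illustration and are not part of sokol_gfx.h, whose validation layer performs the real checks.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdbool.h>

/* hypothetical model of the render vs compute pass rules */
typedef enum { PASS_RENDER, PASS_COMPUTE } pass_kind_t;

typedef enum {
    CALL_APPLY_VIEWPORT,
    CALL_APPLY_SCISSOR_RECT,
    CALL_DRAW,
    CALL_DISPATCH,
} call_kind_t;

/* viewport, scissor and draw calls: render passes only;
   dispatch calls: compute passes only */
bool call_allowed(pass_kind_t pass, call_kind_t call) {
    switch (call) {
        case CALL_APPLY_VIEWPORT:
        case CALL_APPLY_SCISSOR_RECT:
        case CALL_DRAW:
            return pass == PASS_RENDER;
        case CALL_DISPATCH:
            return pass == PASS_COMPUTE;
    }
    return false;
}

/* the pipeline type must match the pass type */
bool pipeline_allowed(pass_kind_t pass, bool compute_pipeline) {
    return compute_pipeline == (pass == PASS_COMPUTE);
}

/* compute passes must not bind vertex or index buffers */
bool bindings_allowed(pass_kind_t pass, int num_vertex_buffers, bool has_index_buffer) {
    if (pass == PASS_COMPUTE) {
        return (num_vertex_buffers == 0) && !has_index_buffer;
    }
    return true;
}
</code></pre></div></div>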
<h2 id="when-not-using-sokol-shdc">When not using sokol-shdc</h2>
<p>If you don’t use sokol-shdc for shader authoring you’ll need to populate the
all-important <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct passed into <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> yourself
with information that matches your shader code:</p>
<ul>
<li>A nested struct <code class="language-plaintext highlighter-rouge">compute_func</code> has been added (similar to existing
<code class="language-plaintext highlighter-rouge">vertex_func</code> and <code class="language-plaintext highlighter-rouge">fragment_func</code>) to pass a compute shader function as
backend-specific source code or bytecode blob</li>
<li>A Metal-specific <code class="language-plaintext highlighter-rouge">mtl_threads_per_threadgroup</code> nested struct which
defines the ‘workgroup size’ to the Metal API (this is in <code class="language-plaintext highlighter-rouge">sg_shader_desc</code>
because those values are normally extracted from shader code via reflection)</li>
<li>The <code class="language-plaintext highlighter-rouge">readonly</code> boolean in the storage buffer bindslot declaration is now
allowed to be false, but only in compute shaders. This flag is
now used by sokol-gfx as a hint for ‘resource hazard tracking’ in some backend APIs.</li>
<li>A new HLSL/D3D11 specific item <code class="language-plaintext highlighter-rouge">uint8_t register_u_n</code> has been added to
the nested <code class="language-plaintext highlighter-rouge">storage_buffers[]</code> declarations (struct <code class="language-plaintext highlighter-rouge">sg_shader_storage_buffer</code>), this is used to communicate the
HLSL bindslot for writable storage buffer bindings (which are bound as D3D11
‘unordered access views’, while readonly storage buffers continue to be
bound as ‘shader resource views’).</li>
</ul>
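<p>On the HLSL side the distinction boils down to the register class a storage buffer lives in: readonly storage buffers use <code class="language-plaintext highlighter-rouge">t</code> registers (shader resource views), while writable storage buffers use <code class="language-plaintext highlighter-rouge">u</code> registers (unordered access views). A small hypothetical plain-C helper that builds the matching HLSL register annotation, for illustration only:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdbool.h>
#include <stdio.h>

/* hypothetical: HLSL register class for a storage buffer binding on D3D11,
   't' for readonly (SRV) and 'u' for writable (UAV) storage buffers */
char hlsl_storage_register_class(bool readonly) {
    return readonly ? 't' : 'u';
}

/* writes e.g. "register(t2)" or "register(u0)" into buf and
   returns the number of characters reported by snprintf() */
int hlsl_storage_register_decl(bool readonly, unsigned slot, char* buf, size_t buf_size) {
    return snprintf(buf, buf_size, "register(%c%u)", hlsl_storage_register_class(readonly), slot);
}
</code></pre></div></div>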
<p>Also please carefully review the backend-specific compute shader samples
which directly pass backend-specific shader code into sokol-gfx:</p>
<ul>
<li>D3D11: <a href="https://github.com/floooh/sokol-samples/blob/master/d3d11/instancing-compute-d3d11.c">instancing-compute-d3d11.c</a></li>
<li>Metal: <a href="https://github.com/floooh/sokol-samples/blob/master/metal/instancing-compute-metal.c">instancing-compute-metal.c</a></li>
<li>WebGPU: <a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/instancing-compute-wgpu.c">instancing-compute-wgpu.c</a></li>
<li>GL4.3: <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/instancing-compute-glfw.c">instancing-compute-glfw.c</a></li>
</ul>
<h2 id="under-the-hood">Under the hood</h2>
<p>Most of the new code in sokol_gfx.h is just a straightforward mapping from
sokol-gfx types and functions into backend 3D-API types and functions.</p>
<p>Only two details are worth mentioning:</p>
<ul>
<li>On Metal, and only on systems without unified memory, GPU-written
managed storage buffers are ‘synchronized’ at the end of a compute
pass inside <code class="language-plaintext highlighter-rouge">sg_end_pass()</code>. This synchronization basically updates the
CPU-side shadow copy of the buffer with the new data that’s been written
by a compute shader. This requires keeping track of all read/write storage
buffer bindings inside a compute pass (this is what the new <code class="language-plaintext highlighter-rouge">sg_desc.max_dispatch_calls_per_pass</code>
config item is used for).</li>
<li>On GL, <code class="language-plaintext highlighter-rouge">glMemoryBarrier()</code> calls are issued (at most once per <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>
call) when a storage buffer was previously bound as read/write (which sets
an internal ‘gpu_dirty’ flag).</li>
</ul>
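<p>The GL ‘gpu_dirty’ logic can be modelled in a few lines. This is a hypothetical plain-C sketch, not the actual sokol-gfx code: the memory barrier is only counted instead of calling <code class="language-plaintext highlighter-rouge">glMemoryBarrier()</code>, and all type and function names are made up. A read/write binding marks a buffer dirty, and the next bindings call that binds any dirty buffer issues a single barrier.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdbool.h>

/* hypothetical per-buffer state */
typedef struct { bool gpu_dirty; } buffer_state_t;

int num_barriers = 0;

/* stand-in for a glMemoryBarrier() call, counted instead of executed */
static void issue_memory_barrier(void) { num_barriers++; }

void apply_storage_bindings(buffer_state_t** bufs, const bool* readonly, int num) {
    bool needs_barrier = false;
    for (int i = 0; i < num; i++) {
        if (bufs[i]->gpu_dirty) {
            needs_barrier = true;
        }
    }
    if (needs_barrier) {
        issue_memory_barrier();   /* at most once per call */
        /* conservatively only clears the flags of the buffers bound here */
        for (int i = 0; i < num; i++) {
            bufs[i]->gpu_dirty = false;
        }
    }
    /* read/write bindings may be written by the following dispatch */
    for (int i = 0; i < num; i++) {
        if (!readonly[i]) {
            bufs[i]->gpu_dirty = true;
        }
    }
}
</code></pre></div></div>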
<h2 id="whats-next">What’s next</h2>
<p>…mainly patching remaining feature gaps in a couple of minor updates:</p>
<ul>
<li>allow storage buffers to be bound as vertex- and index-buffers</li>
<li>introducing storage textures which can be written by compute shaders</li>
<li>more ‘feature coverage’ by writing a handful more interesting compute samples</li>
</ul>
<p>…and what will most likely be a bigger update: figure out a proper
sub-API for <code class="language-plaintext highlighter-rouge">CPU => GPU</code>, <code class="language-plaintext highlighter-rouge">GPU => CPU</code> and <code class="language-plaintext highlighter-rouge">GPU => GPU</code> copies.</p>
Mon, 03 Mar 2025 00:00:00 +0000
https://floooh.github.io/2025/03/03/sokol-gfx-compute-update.html
https://floooh.github.io/2025/03/03/sokol-gfx-compute-update.htmlUpcoming Sokol header API changes (Nov 2024)<p>Update: the ‘bindings cleanup’ update has been merged on 07-Nov-2024</p>
<p>In a couple of days I will merge the next breaking sokol_gfx.h update (aka the
“Bindings Cleanup”). The update also affects sokol-shdc, so if you’re using
sokol-shdc for shader compilation make sure to update that as well.</p>
<ul id="markdown-toc">
<li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
<li><a href="#updated-documentation-and-example-code" id="markdown-toc-updated-documentation-and-example-code">Updated documentation and example code</a> <ul>
<li><a href="#when-using-sokol-shdc" id="markdown-toc-when-using-sokol-shdc">When using sokol-shdc:</a></li>
<li><a href="#when-not-using-sokol-shdc" id="markdown-toc-when-not-using-sokol-shdc">When <em>not</em> using sokol-shdc</a></li>
</ul>
</li>
<li><a href="#change-recipes" id="markdown-toc-change-recipes">Change Recipes</a> <ul>
<li><a href="#when-using-sokol-shdc-1" id="markdown-toc-when-using-sokol-shdc-1">When using sokol-shdc:</a></li>
<li><a href="#when-not-using-sokol-shdc-1" id="markdown-toc-when-not-using-sokol-shdc-1">When <em>not</em> using sokol-shdc:</a></li>
</ul>
</li>
</ul>
<h2 id="overview">Overview</h2>
<p>In general, the update makes the relationship between the shader resource interface
and the sokol-gfx resource binding model more explicit, but also more flexible.
Another motivation for the change was to prepare the sokol-gfx API for compute
shader support.</p>
<p>The root PR is here: <a href="https://github.com/floooh/sokol/pull/1111">https://github.com/floooh/sokol/pull/1111</a>.</p>
<p>The TL;DR is:</p>
<ul>
<li>When using sokol-shdc for shader compilation, the input GLSL source
now requires explicit binding annotations via <code class="language-plaintext highlighter-rouge">layout(binding=N)</code>, where
<code class="language-plaintext highlighter-rouge">N</code> directly maps to bindslot indices in the sokol-gfx resource binding API.</li>
<li>The concept of ‘shader stages’ mostly disappears from the sokol-gfx API,
shader stages are now only a minor detail of the shader interface reflection
information in the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct passed into the <code class="language-plaintext highlighter-rouge">sg_make_shader()</code>
function.</li>
<li>When <em>not</em> using sokol-shdc there’s now an explicit mapping from sokol-gfx bindslots
to 3D backend-specific bindslots. This reduces the sokol-gfx internal
magic for mapping the backend-agnostic sokol-gfx binding model to the specific binding
models of the backend 3D APIs (there <em>are</em> still some restrictions but only
when they allow a more efficient resource binding implementation in sokol-gfx).</li>
</ul>
<p>In general, all changes result in compile errors, and cleaning up the
compile errors by following the ‘change recipes’ below should be enough
to make your existing code work.</p>
<p>The following parts of the public sokol_gfx.h API have changed:</p>
<ul>
<li>In the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct, the nested vertex- and fragment-stage structs
for the image-, sampler- and storage-buffer-bindings have been removed,
and the bindings arrays have moved up into the root struct.</li>
<li>In the <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> call, the shader stage parameter has been removed</li>
<li>The interior of the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct and the type names of nested structs
have changed completely (but if you are using sokol-shdc for shader authoring
you don’t need to worry about that, since sokol-shdc will code-generate
the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct).</li>
<li>A number of public API constants have been removed or renamed (but those
should rarely show up in user code).</li>
<li>The enum items in <code class="language-plaintext highlighter-rouge">sg_shader_stage</code> have been renamed, and those are now
only used in the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct and nowhere else:
<ul>
<li><code class="language-plaintext highlighter-rouge">SG_SHADERSTAGE_VS</code> => <code class="language-plaintext highlighter-rouge">SG_SHADERSTAGE_VERTEX</code></li>
<li><code class="language-plaintext highlighter-rouge">SG_SHADERSTAGE_FS</code> => <code class="language-plaintext highlighter-rouge">SG_SHADERSTAGE_FRAGMENT</code></li>
</ul>
</li>
</ul>
<p>The update also has some minor behaviour changes:</p>
<ul>
<li>Resource bindings can now have gaps, and validation for <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>
has been relaxed to allow bindslots in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct to be occupied
even when the current shader doesn’t use those bindings. This makes it possible to use
the same <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct for different but related shader variants.</li>
<li>Likewise, uniform block bindslots can now be explicitly defined in the shaders
which makes it possible to ‘share’ bindslot indices across shaders. Trying to call
<code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> for a bindslot that isn’t used by the current shader
is still an error though (not sure yet if this makes sense, it could probably
be relaxed in a later update).</li>
<li>There’s now a new (debug-mode only) error check in <code class="language-plaintext highlighter-rouge">sg_draw()</code> to make sure
that <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> and/or <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> had been called since the
last <code class="language-plaintext highlighter-rouge">sg_apply_pipeline()</code> when required.</li>
</ul>
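<p>A check like the new <code class="language-plaintext highlighter-rouge">sg_draw()</code> error can be modelled with a few flags that are reset whenever a pipeline is applied. The following plain-C sketch is hypothetical and simplified to one flag each for bindings and uniforms (a real implementation would likely need to track individual uniform block slots); none of these names exist in sokol_gfx.h.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdbool.h>

/* hypothetical, simplified debug state */
typedef struct {
    bool shader_needs_bindings;
    bool shader_needs_uniforms;
    bool bindings_applied;
    bool uniforms_applied;
} draw_state_t;

/* applying a pipeline resets the 'applied since last pipeline' flags */
void on_apply_pipeline(draw_state_t* s, bool needs_bindings, bool needs_uniforms) {
    s->shader_needs_bindings = needs_bindings;
    s->shader_needs_uniforms = needs_uniforms;
    s->bindings_applied = false;
    s->uniforms_applied = false;
}

void on_apply_bindings(draw_state_t* s) { s->bindings_applied = true; }
void on_apply_uniforms(draw_state_t* s) { s->uniforms_applied = true; }

/* returns true if a draw call would pass the check */
bool draw_is_valid(const draw_state_t* s) {
    if (s->shader_needs_bindings && !s->bindings_applied) {
        return false;
    }
    if (s->shader_needs_uniforms && !s->uniforms_applied) {
        return false;
    }
    return true;
}
</code></pre></div></div>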
<h2 id="updated-documentation-and-example-code">Updated documentation and example code</h2>
<blockquote>
<p>NOTE: these links will only be up to date after <a href="https://github.com/floooh/sokol/pull/1111">PR #1111</a> has been merged.</p>
</blockquote>
<h3 id="when-using-sokol-shdc">When using sokol-shdc:</h3>
<p>Please re-read the sokol-shdc documentation:</p>
<p><a href="https://github.com/floooh/sokol-tools/blob/master/docs/sokol-shdc.md">https://github.com/floooh/sokol-tools/blob/master/docs/sokol-shdc.md</a></p>
<p>Especially the section <code class="language-plaintext highlighter-rouge">Shader Authoring Considerations</code>.</p>
<p>In the <a href="https://github.com/floooh/sokol/blob/master/sokol_gfx.h">sokol_gfx.h header</a>, re-read the documentation header above
the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct.</p>
<p>Check the updated sokol samples here:</p>
<p><a href="https://github.com/floooh/sokol-samples/tree/master/sapp">https://github.com/floooh/sokol-samples/tree/master/sapp</a></p>
<h3 id="when-not-using-sokol-shdc">When <em>not</em> using sokol-shdc</h3>
<p>In the <a href="https://github.com/floooh/sokol/blob/master/sokol_gfx.h">sokol_gfx.h header</a>, re-read the updated documentation section <code class="language-plaintext highlighter-rouge">ON SHADER CREATION</code>.</p>
<p>Next read the updated documentation above the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> and <code class="language-plaintext highlighter-rouge">sg_bindings</code> structs.</p>
<p>Finally check the updated backend-specific samples:</p>
<ul>
<li>for Metal: <a href="https://github.com/floooh/sokol-samples/tree/master/metal">https://github.com/floooh/sokol-samples/tree/master/metal</a></li>
<li>for D3D11: <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">https://github.com/floooh/sokol-samples/tree/master/d3d11</a></li>
<li>for desktop GL: <a href="https://github.com/floooh/sokol-samples/tree/master/glfw">https://github.com/floooh/sokol-samples/tree/master/glfw</a></li>
<li>for WebGL/GLES3: <a href="https://github.com/floooh/sokol-samples/tree/master/html5">https://github.com/floooh/sokol-samples/tree/master/html5</a></li>
<li>for WebGPU: <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">https://github.com/floooh/sokol-samples/tree/master/wgpu</a></li>
</ul>
<p>Especially note the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct interiors in the <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> calls.</p>
<h2 id="change-recipes">Change Recipes</h2>
<p>General rule of thumb: fix all places that throw compile errors and
you should be good.</p>
<h3 id="when-using-sokol-shdc-1">When using sokol-shdc:</h3>
<p>First you’ll need to fix your shaders and add explicit binding annotations. When running
sokol-shdc over your current shader code you’ll get errors looking like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error: 'binding' : uniform/buffer blocks require layout(binding=X)
</code></pre></div></div>
<p>…or this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error: 'binding' : sampler/texture/image requires layout(binding=X)
</code></pre></div></div>
<p>To fix those errors for the different resource types add <code class="language-plaintext highlighter-rouge">layout(binding=N)</code> annotations:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">vs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">smp</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">readonly</span> <span class="n">buffer</span> <span class="n">ssbo</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
</code></pre></div></div>
<p>Note that each resource type (uniform blocks, textures, samplers and storage buffers)
has its own bindslot space which is shared across shader stages. Trying to use
bindslot indices outside the ranges listed below, or using the same bindslot for a resource
type in different shader stages, will cause a compilation error.</p>
<p>The binding ranges per resource type are:</p>
<ul>
<li>uniform blocks: 0..7</li>
<li>textures: 0..15</li>
<li>samplers: 0..15</li>
<li>storage buffers: 0..7</li>
</ul>
<p>…these are also the maximum number of resources of that type that can be bound
on a shader across all shader stages.</p>
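<p>The two rules (per-type slot ranges, and no slot sharing between stages within one resource type) can be expressed as a small bookkeeping table. This is a hypothetical plain-C sketch, not sokol-shdc code; only the range limits are taken from the list above.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdbool.h>

/* hypothetical model of the bindslot rules */
typedef enum { RES_UNIFORM_BLOCK, RES_TEXTURE, RES_SAMPLER, RES_STORAGE_BUFFER, RES_NUM } res_type_t;
typedef enum { STAGE_NONE = 0, STAGE_VERTEX, STAGE_FRAGMENT } stage_t;

/* binding ranges from the list above: 0..7, 0..15, 0..15, 0..7 */
static const int max_slots[RES_NUM] = { 8, 16, 16, 8 };

typedef struct {
    stage_t owner[RES_NUM][16];   /* STAGE_NONE means the slot is free */
} slot_table_t;

/* returns false if the slot is out of range or already used by another stage */
bool use_slot(slot_table_t* t, res_type_t type, int slot, stage_t stage) {
    if ((slot < 0) || (slot >= max_slots[type])) {
        return false;
    }
    stage_t owner = t->owner[type][slot];
    if ((owner != STAGE_NONE) && (owner != stage)) {
        return false;
    }
    t->owner[type][slot] = stage;
    return true;
}
</code></pre></div></div>
<p>With this model, a vertex shader texture at slot 0 and fragment shader textures at slots 1..3 are accepted, while a fragment shader texture at slot 0 would be rejected.</p>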
<p>Next fix the compile errors on the CPU side, you should see errors
when initializing an <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct, when calling <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code>
and possibly when setting up vertex attributes in the <code class="language-plaintext highlighter-rouge">sg_pipeline_desc</code>
struct:</p>
<ul>
<li>in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct, the nested structs for the vertex
and fragment shader stage have been removed, and the former per-stage
binding arrays have moved up into the root</li>
<li>in the <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> call, the shader stage argument has been removed</li>
<li>all code-generated slot constants have new naming schemes (also the vertex
attribute slot constants)</li>
</ul>
<p>For instance if your shader resource interface looks like this:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">vs</span>
<span class="c1">// a vertex shader uniform block</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">vs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
<span class="c1">// a vertex shader texture and sampler</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">vs_tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">vs_smp</span><span class="p">;</span>
<span class="c1">// a vertex shader storage buffer</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">readonly</span> <span class="n">buffer</span> <span class="n">vs_ssbo</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
<span class="err">@</span><span class="n">end</span>
<span class="err">@</span><span class="n">fs</span>
<span class="c1">// a fragment shader uniform block</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">fs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span>
<span class="c1">// diffuse, normal and specular textures</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">diffuse_tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">specular_tex</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">normal_tex</span><span class="p">;</span>
<span class="c1">// a common sampler for the above textures</span>
<span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">smp</span><span class="p">;</span>
<span class="err">@</span><span class="n">end</span>
</code></pre></div></div>
<p>…the matching <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct on the CPU side needs to look like
this - note how the array indices match the shader <code class="language-plaintext highlighter-rouge">layout(binding=N)</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_bindings</span> <span class="n">bnd</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">.</span><span class="n">images</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_tex</span><span class="p">,</span>
<span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">diffuse_tex</span><span class="p">,</span>
<span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">specular_tex</span><span class="p">,</span>
<span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">normal_tex</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">samplers</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_smp</span><span class="p">,</span>
<span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">smp</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">storage_buffers</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_ssbo</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">};</span>
</code></pre></div></div>
<p>…and the <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> calls to write the uniform data for the
<code class="language-plaintext highlighter-rouge">vs_params</code> and <code class="language-plaintext highlighter-rouge">fs_params</code> uniform blocks now look like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o">&</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vs_params</span><span class="p">));</span>
<span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">fs_params</span><span class="p">));</span>
</code></pre></div></div>
<p>…instead of hardwired numeric indices you can also use code-generated constants
(note that those have been renamed from a generic <code class="language-plaintext highlighter-rouge">SLOT_*</code> to a per-resource-type
naming scheme):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_bindings</span> <span class="n">bnd</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">.</span><span class="n">images</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="n">IMG_vs_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_tex</span><span class="p">,</span>
<span class="p">[</span><span class="n">IMG_diffuse_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">diffuse_tex</span><span class="p">,</span>
<span class="p">[</span><span class="n">IMG_specular_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">specular_tex</span><span class="p">,</span>
<span class="p">[</span><span class="n">IMG_normal_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">normal_tex</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">samplers</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="n">SMP_vs_smp</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_smp</span><span class="p">,</span>
<span class="p">[</span><span class="n">SMP_smp</span><span class="p">]</span> <span class="o">=</span> <span class="n">smp</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">storage_buffers</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="n">SBUF_vs_ssbo</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_ssbo</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">};</span>
</code></pre></div></div>
<p>…or for the uniform block updates:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="n">UB_vs_params</span><span class="p">,</span> <span class="o">&</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vs_params</span><span class="p">));</span>
<span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="n">UB_fs_params</span><span class="p">,</span> <span class="o">&</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">fs_params</span><span class="p">));</span>
</code></pre></div></div>
<p>…using the code-generated constants has the advantage that changing the
bindslots in the shader code doesn’t require updating the CPU-side code, but other
than that it’s totally fine to use numeric indices.</p>
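<p>The designated array initializers used above are plain C99, which also explains how bindings can have gaps: every array element that isn’t explicitly initialized is zero-initialized, and a handle with id 0 counts as ‘invalid’ in sokol-gfx. A self-contained sketch using stand-in types (the <code class="language-plaintext highlighter-rouge">image_handle_t</code>, <code class="language-plaintext highlighter-rouge">bindings_t</code>, <code class="language-plaintext highlighter-rouge">MAX_IMAGE_BINDSLOTS</code> and <code class="language-plaintext highlighter-rouge">IMG_*</code> definitions are hypothetical, not the real sokol-gfx types or sokol-shdc output):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <stdint.h>

/* stand-ins for this sketch only: a handle wrapping a uint32_t id
   (0 = invalid), and made-up code-generated bindslot constants */
typedef struct { uint32_t id; } image_handle_t;
#define MAX_IMAGE_BINDSLOTS (16)
#define IMG_diffuse_tex (1)
#define IMG_normal_tex (3)

typedef struct { image_handle_t images[MAX_IMAGE_BINDSLOTS]; } bindings_t;

bindings_t make_bindings(image_handle_t diffuse, image_handle_t normal) {
    /* each handle lands at its bindslot index; all slots that aren't
       mentioned (here 0, 2 and 4..15) are zero-initialized, which
       leaves them as 'invalid' handles, i.e. gaps in the bindings */
    bindings_t bnd = {
        .images = {
            [IMG_diffuse_tex] = diffuse,
            [IMG_normal_tex] = normal,
        },
    };
    return bnd;
}
</code></pre></div></div>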
<p>The naming scheme for the code-generated vertex attribute slots has changed
to use the shader program name for ‘namespacing’ instead of the vertex shader
snippet name.</p>
<p>For instance with the following shader fragment:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">vs</span> <span class="n">vs</span>
<span class="k">in</span> <span class="kt">vec4</span> <span class="n">position</span><span class="p">;</span>
<span class="k">in</span> <span class="kt">vec4</span> <span class="n">color0</span><span class="p">;</span>
<span class="p">...</span>
<span class="err">@</span><span class="n">end</span>
<span class="err">@</span><span class="n">fs</span> <span class="n">fs</span>
<span class="p">...</span>
<span class="err">@</span><span class="n">end</span>
<span class="err">@</span><span class="n">program</span> <span class="n">cube</span> <span class="n">vs</span> <span class="n">fs</span>
</code></pre></div></div>
<p>The generated vertex attribute slot constants <code class="language-plaintext highlighter-rouge">ATTR_*</code> previously looked like this
(in the sg_pipeline_desc struct):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_pipeline_desc</span> <span class="n">desc</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="n">ATTR_vs_position</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">[</span><span class="n">ATTR_vs_color0</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="p">...</span>
<span class="p">};</span>
</code></pre></div></div>
<p>…now the <code class="language-plaintext highlighter-rouge">ATTR_*</code> names look like this (i.e. <code class="language-plaintext highlighter-rouge">ATTR_vs_*</code> becomes <code class="language-plaintext highlighter-rouge">ATTR_cube_*</code>):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_pipeline_desc</span> <span class="n">desc</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="n">ATTR_cube_position</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">[</span><span class="n">ATTR_cube_color0</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="p">...</span>
<span class="p">};</span>
</code></pre></div></div>
<p>…it’s also possible to use explicit attribute locations and ignore
the code-generated constants, for instance:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">vs</span> <span class="n">vs</span>
<span class="k">layout</span><span class="p">(</span><span class="n">location</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">in</span> <span class="kt">vec4</span> <span class="n">position</span><span class="p">;</span>
<span class="k">layout</span><span class="p">(</span><span class="n">location</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">in</span> <span class="kt">vec4</span> <span class="n">color0</span><span class="p">;</span>
<span class="p">...</span>
<span class="err">@</span><span class="n">end</span>
</code></pre></div></div>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_pipeline_desc</span> <span class="n">desc</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="p">...</span>
<span class="p">};</span>
</code></pre></div></div>
<p>…note though that it’s still not allowed to have gaps in the vertex
attribute slots (this may be supported at a later time).</p>
<h3 id="when-not-using-sokol-shdc-1">When <em>not</em> using sokol-shdc:</h3>
<p>The interior of <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> has changed to match the new
‘shader-stage-agnostic’ sokol-gfx binding model. The top-level structure
now looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_shader_desc</span> <span class="n">desc</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">vertex_func</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// vertex shader source or bytecode</span>
<span class="p">.</span><span class="n">fragment_func</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// fragment shader source or bytecode</span>
<span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// vertex attribute reflection info</span>
<span class="p">.</span><span class="n">uniform_blocks</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// reflection info for uniform block bindings</span>
<span class="p">.</span><span class="n">storage_buffers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// reflection info for storage buffer bindings</span>
<span class="p">.</span><span class="n">images</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// reflection info for texture bindings</span>
<span class="p">.</span><span class="n">samplers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// reflection info for sampler bindings</span>
<span class="p">.</span><span class="n">image_sampler_pairs</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// how images and samplers are used together in the shader</span>
<span class="p">};</span>
</code></pre></div></div>
<p>The array indices in the <code class="language-plaintext highlighter-rouge">uniform_blocks[]</code> array match the <code class="language-plaintext highlighter-rouge">ub_slot</code> parameter
in the <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> call:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sg_shader_desc.uniform_blocks[N] => sg_apply_uniforms(N, ...)
</code></pre></div></div>
<p>The array indices in the <code class="language-plaintext highlighter-rouge">storage_buffers[]</code>, <code class="language-plaintext highlighter-rouge">images[]</code> and <code class="language-plaintext highlighter-rouge">samplers[]</code> arrays
match the respective indices in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sg_shader_desc.images[N] => sg_bindings.images[N]
sg_shader_desc.samplers[N] => sg_bindings.samplers[N]
sg_shader_desc.storage_buffers[N] => sg_bindings.storage_buffers[N]
</code></pre></div></div>
<p>Fields that are only required for a specific 3D backend now have consistent
prefixes:</p>
<ul>
<li>D3D11/HLSL: <code class="language-plaintext highlighter-rouge">hlsl_*</code></li>
<li>GL/GLSL: <code class="language-plaintext highlighter-rouge">glsl_*</code></li>
<li>Metal/MSL: <code class="language-plaintext highlighter-rouge">msl_*</code></li>
<li>WebGPU/WGSL: <code class="language-plaintext highlighter-rouge">wgsl_*</code></li>
</ul>
<p>The resource binding slots now require two new types of information:</p>
<ul>
<li>the shader stage this resource binding appears on</li>
<li>a 3D backend specific bindslot</li>
</ul>
<p>The backend-specific bindslot struct members must be filled with the
shader-language-specific resource bindslot numbers, which also need to
lie within specific ranges:</p>
<ul>
<li>for uniform block items:
<ul>
<li><code class="language-plaintext highlighter-rouge">.hlsl_register_b_n = N;</code> <= HLSL <code class="language-plaintext highlighter-rouge">register(bN)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 8)</code></li>
<li><code class="language-plaintext highlighter-rouge">.msl_buffer_n = N;</code> <= MSL <code class="language-plaintext highlighter-rouge">[[buffer(N)]]</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 8)</code></li>
<li><code class="language-plaintext highlighter-rouge">.wgsl_group0_binding_n = N;</code> <= WGSL <code class="language-plaintext highlighter-rouge">@group(0) @binding(N)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 8)</code></li>
</ul>
</li>
<li>for images:
<ul>
<li><code class="language-plaintext highlighter-rouge">.hlsl_register_t_n = N;</code> <= HLSL <code class="language-plaintext highlighter-rouge">register(tN)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 24)</code></li>
<li><code class="language-plaintext highlighter-rouge">.msl_texture_n = N;</code> <= MSL <code class="language-plaintext highlighter-rouge">[[texture(N)]]</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 16)</code></li>
<li><code class="language-plaintext highlighter-rouge">.wgsl_group1_binding_n = N;</code> <= WGSL <code class="language-plaintext highlighter-rouge">@group(1) @binding(N)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 128)</code></li>
</ul>
</li>
<li>for samplers:
<ul>
<li><code class="language-plaintext highlighter-rouge">.hlsl_register_s_n = N;</code> <= HLSL <code class="language-plaintext highlighter-rouge">register(sN)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 16)</code></li>
<li><code class="language-plaintext highlighter-rouge">.msl_sampler_n = N;</code> <= MSL <code class="language-plaintext highlighter-rouge">[[sampler(N)]]</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 16)</code></li>
<li><code class="language-plaintext highlighter-rouge">.wgsl_group1_binding_n = N;</code> <= WGSL <code class="language-plaintext highlighter-rouge">@group(1) @binding(N)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 128)</code></li>
</ul>
</li>
<li>for storage buffers:
<ul>
<li><code class="language-plaintext highlighter-rouge">.hlsl_register_t_n = N;</code> <= HLSL <code class="language-plaintext highlighter-rouge">register(tN)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 24)</code></li>
<li><code class="language-plaintext highlighter-rouge">.msl_buffer_n = N;</code> <= MSL <code class="language-plaintext highlighter-rouge">[[buffer(N)]]</code> where <code class="language-plaintext highlighter-rouge">(N >= 8) && (N < 16)</code></li>
<li><code class="language-plaintext highlighter-rouge">.wgsl_group1_binding_n = N;</code> <= WGSL <code class="language-plaintext highlighter-rouge">@group(1) @binding(N)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 128)</code></li>
<li><code class="language-plaintext highlighter-rouge">.glsl_binding_n = N;</code> <= GLSL <code class="language-plaintext highlighter-rouge">layout(binding=N)</code> where <code class="language-plaintext highlighter-rouge">(N >= 0) && (N < 16)</code></li>
</ul>
</li>
</ul>
<p>These backend-specific bindslots allow a more flexible mapping from the sokol-gfx
resource binding model to the backend 3D-API binding models, but there are still
some restrictions (which typically exist to allow a more efficient resource binding implementation
in sokol_gfx.h):</p>
<ul>
<li>in WebGPU/WGSL, all uniform blocks must be in <code class="language-plaintext highlighter-rouge">@group(0)</code> and all other
resource types in <code class="language-plaintext highlighter-rouge">@group(1)</code></li>
<li>in Metal/MSL, the <code class="language-plaintext highlighter-rouge">[[buffer(N)]]</code> slots 0..7 are reserved for uniform blocks,
and <code class="language-plaintext highlighter-rouge">[[buffer(N)]]</code> slots 8..15 are reserved for storage buffers</li>
</ul>
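<p>To make those restrictions a bit more tangible, here’s a small C sketch that encodes the Metal and WebGPU rules from the lists above as plain validation functions. The <code class="language-plaintext highlighter-rouge">res_kind_t</code> enum and all function names are hypothetical illustrations; sokol_gfx.h has its own internal validation layer, and this is not its code:</p>

```c
#include <stdbool.h>

/* hypothetical resource categories, for illustration only */
typedef enum {
    RES_UNIFORM_BLOCK,
    RES_STORAGE_BUFFER,
    RES_IMAGE,
    RES_SAMPLER,
} res_kind_t;

/* Metal: [[buffer(N)]] slots are split between uniform blocks (0..7)
   and storage buffers (8..15); images and samplers use their own
   [[texture(N)]] and [[sampler(N)]] slots instead */
static bool msl_buffer_n_valid(res_kind_t kind, int n) {
    switch (kind) {
        case RES_UNIFORM_BLOCK:  return (n >= 0) && (n < 8);
        case RES_STORAGE_BUFFER: return (n >= 8) && (n < 16);
        default:                 return false;
    }
}

/* WebGPU: uniform blocks live in @group(0), all other resource types in @group(1) */
static int wgsl_group(res_kind_t kind) {
    return (kind == RES_UNIFORM_BLOCK) ? 0 : 1;
}

/* WGSL @binding(N) ranges: 0..7 for uniform blocks, 0..127 for everything else */
static bool wgsl_binding_n_valid(res_kind_t kind, int n) {
    const int max = (kind == RES_UNIFORM_BLOCK) ? 8 : 128;
    return (n >= 0) && (n < max);
}
```

<p>For instance, a storage buffer mapped to <code class="language-plaintext highlighter-rouge">[[buffer(3)]]</code> fails the check, because slots 0..7 are reserved for uniform blocks.</p>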
<p>For code examples, check out the backend-specific samples:</p>
<ul>
<li>for Metal: <a href="https://github.com/floooh/sokol-samples/tree/master/metal">https://github.com/floooh/sokol-samples/tree/master/metal</a></li>
<li>for D3D11: <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">https://github.com/floooh/sokol-samples/tree/master/d3d11</a></li>
<li>for desktop GL: <a href="https://github.com/floooh/sokol-samples/tree/master/glfw">https://github.com/floooh/sokol-samples/tree/master/glfw</a></li>
<li>for WebGL/GLES3: <a href="https://github.com/floooh/sokol-samples/tree/master/html5">https://github.com/floooh/sokol-samples/tree/master/html5</a></li>
<li>for WebGPU: <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">https://github.com/floooh/sokol-samples/tree/master/wgpu</a></li>
</ul>
<p>…and that should be it! Next big thing on the roadmap: compute shader support :)</p>
Mon, 04 Nov 2024 00:00:00 +0000
https://floooh.github.io/2024/11/04/sokol-fall-2024-update.html
https://floooh.github.io/2024/11/04/sokol-fall-2024-update.html
Zig and Emulators
<p>Some quick Zig feedback in the context of a new 8-bit emulator project I started
a little while ago:</p>
<p><a href="https://github.com/floooh/chipz">https://github.com/floooh/chipz</a></p>
<p>Currently the project consists of:</p>
<ul>
<li>a cycle-stepped Z80 CPU emulator (similar to the emulator described
here: <a href="https://floooh.github.io/2021/12/17/cycle-stepped-z80.html">https://floooh.github.io/2021/12/17/cycle-stepped-z80.html</a>)</li>
<li>chip emulators for Z80 PIO, Z80 CTC and three variants of the AY-3-8910 sound chip</li>
<li>system emulators for Bombjack, Pengo and Pacman arcade machines,
and the East German KC85/2../4 home computer series</li>
<li>a code generation tool to create the Z80 instruction decoder code block</li>
<li>various tests to check Z80 emulation correctness</li>
</ul>
<p>With the exception of an external C dependency for ‘host system glue’ (the
cross-platform <a href="https://github.com/floooh/sokol-zig">sokol headers</a> used for
wrapping the platform-specific windowing, input, rendering and audio-output
code), the project is around 16 kloc of pure Zig code.</p>
<p>I’m not yet sure how this new project will evolve in relation to the <a href="https://github.com/floooh/chips">original C/C++ ‘chips’ emulator project</a>, but I expect
that the Zig project will overtake the C/C++ project at some point in the future.</p>
<h2 id="dev-environment">Dev Environment</h2>
<p>I’m coding on an M1 Mac in VSCode with the <a href="https://marketplace.visualstudio.com/items?itemName=ziglang.vscode-zig">Zig Language Extension</a>, and <a href="https://marketplace.visualstudio.com/items?itemName=vadimcn.vscode-lldb">CodeLLDB</a>
for step-debugging.</p>
<p>The Zig and ZLS (Zig Language Server) installation is managed with <a href="https://github.com/tristanisham/zvm">ZVM</a>.</p>
<p>For the most part this setup works pretty well, with a few tweaks:</p>
<ul>
<li>I’m doing ‘build-on-save’ to get more complete error information as described here:
<a href="https://kristoff.it/blog/improving-your-zls-experience/">Improving Your Zig Language Server Experience</a>
(I’m not bothering with creating separate non-install build targets though)</li>
<li>With the default Zig VSCode extension settings, I noticed that in long coding
sessions (5..6 hours or so) saving would take longer and longer until it
eventually got stuck. After asking around on the Zig Discord, I learned that this can be solved
by explicitly setting the Zig Language Server as ‘VSCode Formatting Provider’
in the Zig Extension settings.</li>
<li>When debugging, there’s a somewhat annoying issue where the debug line information
seems to be off in some places; for instance, the debugger appears to step into the last
line of an inactive if-else block. Again, Discord to the rescue:
this seems to be a known issue.</li>
</ul>
<p>All in all, not yet perfect, but good enough to get shit done.</p>
<h2 id="zig-comptime-and-generics">Zig Comptime and Generics</h2>
<p>Before diving into language details, I’ll need to provide some minimal
background information on how the chipz emulators work:</p>
<p>Microchips of the 70s and 80s were very much like ‘software libraries, but
implemented in hardware’. They followed a minimal standard for interoperability
so that chips from different manufacturers could be combined into computer
systems without requiring too much custom glue logic between them. I think
it’s fair to say that this ‘competition through interoperability’ was the main
driver for the Cambrian Explosion of cheap 8-bit computer systems in the
70s and 80s.</p>
<p>Microchips communicate with the outside world via input/output pins, and a
typical 8-bit home computer system is essentially just a handful of microchips
talking to each other through their ‘pin API’.</p>
<p>The chipz project follows that same idea: The basic building blocks are
self-contained chip emulators which communicate with other chip emulators via
virtual input/output pins which are mapped to bits in an integer.</p>
<p>Chips of that era typically had up to 40 pins, which makes them a good
fit for the 64-bit integers used by today’s CPUs.</p>
<p>The API of such a chip emulator only has one important function:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">tick</span><span class="p">(</span><span class="n">pins</span><span class="p">:</span> <span class="kt">u64</span><span class="p">)</span> <span class="kt">u64</span>
</code></pre></div></div>
<p>This tick function executes exactly one clock cycle: it takes an integer
as input where the bits represent input/output pins, and returns
that same integer with modified bits.</p>
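<p>A minimal C sketch of that idea, assuming an imaginary 8-bit latch chip (not one of the chipz emulators) with its data bus on bits 0..7 and chip-select, write and read pins on bits 8..10. The pin layout and all names are made up for illustration:</p>

```c
#include <stdint.h>

/* hypothetical pin layout of an imaginary 8-bit latch chip */
#define LATCH_PIN_D0    0   /* data bus occupies bits 0..7 */
#define LATCH_PIN_CS    8   /* chip select */
#define LATCH_PIN_WR    9   /* write strobe */
#define LATCH_PIN_RD    10  /* read strobe */

#define LATCH_CS        (1ULL << LATCH_PIN_CS)
#define LATCH_WR        (1ULL << LATCH_PIN_WR)
#define LATCH_RD        (1ULL << LATCH_PIN_RD)
#define LATCH_DBUS_MASK (0xFFULL << LATCH_PIN_D0)

typedef struct {
    uint8_t value;  /* internal register */
} latch_t;

/* execute exactly one clock cycle: inspect the input pins,
   update the chip state, and return the (modified) pin integer */
static uint64_t latch_tick(latch_t* chip, uint64_t pins) {
    if (pins & LATCH_CS) {
        if (pins & LATCH_WR) {
            /* copy the data bus pins into the internal register */
            chip->value = (uint8_t)((pins & LATCH_DBUS_MASK) >> LATCH_PIN_D0);
        } else if (pins & LATCH_RD) {
            /* drive the internal register onto the data bus pins */
            pins = (pins & ~LATCH_DBUS_MASK) | ((uint64_t)chip->value << LATCH_PIN_D0);
        }
    }
    return pins;
}
```

<p>All chip state lives in the struct, while every interaction with the outside world goes through the pin integer; without the chip-select pin active, the pins pass through unchanged.</p>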
<p>Fitting a CPU emulator into such a ‘cycle-stepped model’ can be a bit of
a challenge and is described in these blog posts (for the 6502 and Z80):</p>
<ul>
<li>
<p><a href="https://floooh.github.io/2019/12/13/cycle-stepped-6502.html">A new cycle-stepped 6502 CPU emulator</a></p>
</li>
<li>
<p><a href="https://floooh.github.io/2021/12/17/cycle-stepped-z80.html">A new cycle-stepped Z80 emulator</a></p>
</li>
</ul>
<p>A whole computer system is then emulated by writing a ‘system tick function’
which emulates a single clock cycle for the whole system by calling the
tick functions of each chip emulator and passing pin-state integers
from one chip emulator to the next.</p>
<p>There are two related problems to solve with the above approach:</p>
<ul>
<li>There aren’t enough bits in a 64-bit integer to assign one bit to each
inter-chip connection of a complete computer system. This means a system
tick function will need to maintain one pin-state integer for each chip, and
shuffle bits around before each chip’s tick function is called.</li>
<li>For direct pin-to-pin connections it makes sense to assign the same bit position
in different chip emulators to avoid ‘runtime bit shuffling’ from an output
pin position of one chip to a different input pin position of another chip. Those
direct pin-to-pin connections are different in each emulated computer
system, so to make this idea work a specialized chip emulator needs to be
‘stamped out’ for each computer system.</li>
</ul>
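<p>Here’s a hypothetical C sketch of a system tick function with two made-up chips whose pin layouts don’t line up: an 8-bit counter that raises an overflow output pin on bit 8, and an interrupt counter that expects its IRQ input on bit 20. Each chip gets its own 64-bit pin-state integer, and the system tick function has to move the bit from one position to the other at runtime:</p>

```c
#include <stdint.h>

/* hypothetical chip A: 8-bit counter, value on pins 0..7, overflow output on bit 8 */
#define CNT_DBUS_MASK 0xFFULL
#define CNT_OVF       (1ULL << 8)

typedef struct { uint8_t value; } counter_t;

static uint64_t counter_tick(counter_t* c, uint64_t pins) {
    c->value++;  /* wraps from 255 to 0 */
    pins = (pins & ~(CNT_DBUS_MASK | CNT_OVF)) | c->value;
    if (c->value == 0) {
        pins |= CNT_OVF;
    }
    return pins;
}

/* hypothetical chip B: counts pulses on its IRQ input, which its own layout puts on bit 20 */
#define IRQC_IRQ (1ULL << 20)

typedef struct { uint32_t irq_count; } irq_counter_t;

static uint64_t irq_counter_tick(irq_counter_t* c, uint64_t pins) {
    if (pins & IRQC_IRQ) {
        c->irq_count++;
    }
    return pins;
}

/* the whole system: chip state plus one pin-state integer per chip */
typedef struct {
    counter_t cnt;
    irq_counter_t irqc;
    uint64_t cnt_pins;
    uint64_t irqc_pins;
} system_t;

/* emulate one clock cycle of the whole system */
static void system_tick(system_t* sys) {
    sys->cnt_pins = counter_tick(&sys->cnt, sys->cnt_pins);
    /* runtime bit shuffling: counter OVF output (bit 8) => IRQ input (bit 20) */
    if (sys->cnt_pins & CNT_OVF) {
        sys->irqc_pins |= IRQC_IRQ;
    } else {
        sys->irqc_pins &= ~IRQC_IRQ;
    }
    sys->irqc_pins = irq_counter_tick(&sys->irqc, sys->irqc_pins);
}
```

<p>Running 512 system ticks wraps the 8-bit counter twice, so the interrupt counter sees two overflow pulses. With dozens of such connections per system, this shuffling code quickly adds up.</p>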
<p>Both problems can be solved quite elegantly in Zig:</p>
<ul>
<li>Instead of 64-bit integers for the pin-state we can switch to wide integers
(u128, u192, u256, …) with enough bits to assign each chip in a system
its own reserved bit range instead of juggling with multiple 64-bit integers.</li>
<li>With Zig’s comptime generics it’s possible to stamp out chip emulators
which are specialized by a specific mapping of pins to bit positions in the
shared wide integer.</li>
</ul>
<p>This means a chip emulator is specialized by two comptime configuration values:</p>
<ul>
<li>a <code class="language-plaintext highlighter-rouge">Bus</code> type which is an unsigned integer type with enough bits for all pin-to-pin
connections in a system</li>
<li>a <code class="language-plaintext highlighter-rouge">Pins</code> structure which defines a bit position for each input/output pin
of a chip emulator</li>
</ul>
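<p>C has nothing like comptime, but the ‘stamping out’ idea can be roughly approximated with a macro that generates a chip type and tick function for a given bus integer type and pin positions. This is a hypothetical sketch of the concept only (it is not how chipz works, and it’s far less pleasant than the Zig version): two instances of the same made-up latch chip share the data bus at bits 0..7 and a write pin on bit 42, but each gets its own chip-select pin. The instances use <code class="language-plaintext highlighter-rouge">uint64_t</code> as bus type; on GCC and Clang, <code class="language-plaintext highlighter-rouge">unsigned __int128</code> could be passed instead to mirror a Zig <code class="language-plaintext highlighter-rouge">u128</code> bus:</p>

```c
#include <stdint.h>

/* generates NAME_t and NAME_tick(), specialized by bus type and pin positions */
#define DEFINE_LATCH(NAME, BUS_T, DBUS0, PIN_CS, PIN_WR)           \
    typedef struct { uint8_t value; } NAME##_t;                    \
    static BUS_T NAME##_tick(NAME##_t* chip, BUS_T pins) {         \
        const BUS_T cs = (BUS_T)1 << (PIN_CS);                     \
        const BUS_T wr = (BUS_T)1 << (PIN_WR);                     \
        if ((pins & cs) && (pins & wr)) {                          \
            /* latch the 8 data bus bits starting at DBUS0 */      \
            chip->value = (uint8_t)(pins >> (DBUS0));              \
        }                                                          \
        return pins;                                               \
    }

/* two specialized instances on one shared bus: same data bus and
   write pin, but separate chip-select pins (bits 40 and 41) */
DEFINE_LATCH(latch_a, uint64_t, 0, 40, 42)
DEFINE_LATCH(latch_b, uint64_t, 0, 41, 42)
```

<p>Because every chip reads its pins directly at their shared-bus positions, a write with only latch A’s chip-select pin active updates latch A and leaves latch B untouched, without any bit shuffling between the two tick calls.</p>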
<p>For the Z80 CPU emulator this pin definition struct looks like this:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">Pins</span> <span class="o">=</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">DBUS</span><span class="p">:</span> <span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="nb">comptime_int</span><span class="p">,</span>
<span class="n">ABUS</span><span class="p">:</span> <span class="p">[</span><span class="mi">16</span><span class="p">]</span><span class="nb">comptime_int</span><span class="p">,</span>
<span class="n">M1</span><span class="p">:</span> <span class="nb">comptime_int</span><span class="p">,</span>
<span class="n">MREQ</span><span class="p">:</span> <span class="nb">comptime_int</span><span class="p">,</span>
<span class="n">IORQ</span><span class="p">:</span> <span class="nb">comptime_int</span><span class="p">,</span>
<span class="c">// ...more pins...</span>
<span class="p">};</span>
</code></pre></div></div>
<p>…which is used as a nested struct in a <code class="language-plaintext highlighter-rouge">TypeConfig</code> struct which holds
all generic parameters to stamp out a specialized Z80 emulator:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">TypeConfig</span> <span class="o">=</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">pins</span><span class="p">:</span> <span class="n">Pins</span><span class="p">,</span>
<span class="n">bus</span><span class="p">:</span> <span class="k">type</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>
<p>This <code class="language-plaintext highlighter-rouge">TypeConfig</code> struct is used as the parameter of a comptime Zig function
which returns a specialized type (this is how Zig does generics):</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">Type</span><span class="p">(</span><span class="k">comptime</span> <span class="n">cfg</span><span class="p">:</span> <span class="n">TypeConfig</span><span class="p">)</span> <span class="k">type</span> <span class="p">{</span>
<span class="k">return</span> <span class="k">struct</span> <span class="p">{</span>
<span class="c">// the returned struct is a new type which is comptime-configured</span>
<span class="c">// by the 'cfg' type configuration parameter</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…now we can stamp out a Z80 CPU emulator that’s specialized for a specific
computer system by the system bus integer type and the Z80 pins mapped to
specific bit positions of this integer type:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">z80</span> <span class="o">=</span> <span class="nb">@import</span><span class="p">(</span><span class="s">"z80"</span><span class="p">);</span>
<span class="k">const</span> <span class="n">Z80</span> <span class="o">=</span> <span class="n">z80</span><span class="p">.</span><span class="nf">Type</span><span class="p">(</span><span class="o">.</span><span class="p">{</span>
<span class="p">.</span><span class="py">bus</span> <span class="o">=</span> <span class="kt">u128</span><span class="p">,</span>
<span class="p">.</span><span class="py">pins</span> <span class="o">=</span> <span class="o">.</span><span class="p">{</span>
<span class="p">.</span><span class="py">DBUS</span> <span class="o">=</span> <span class="o">.</span><span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span> <span class="p">},</span>
<span class="p">.</span><span class="py">ABUS</span> <span class="o">=</span> <span class="o">.</span><span class="p">{</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="c">// ...</span>
<span class="p">},</span>
<span class="c">// ...</span>
<span class="p">}</span>
<span class="p">});</span>
</code></pre></div></div>
<p>This specific <code class="language-plaintext highlighter-rouge">Z80</code> type uses a 128-bit pin-state integer and maps its own
pins to bit positions starting at bit 0, with the first 8 bits being the
data bus (most other chips in any computer system will also map their
data bus pins to the same bit range, since the data bus is usually shared
between all chips in a system).</p>
<p>Note that <code class="language-plaintext highlighter-rouge">Z80</code> is just a type, not a runtime object. To get a default-initialized
Z80 CPU object:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">cpu</span> <span class="o">=</span> <span class="n">Z80</span><span class="p">{};</span>
</code></pre></div></div>
<p>This example doesn’t look like much, it’s “just Zig code” after all, but this
is exactly what makes generic programming in Zig so elegant and powerful.</p>
<p>Arbitrarily complex comptime config options can be ‘baked’ into types,
and dynamic runtime configuration options can be passed to a ‘construction’ function
on that type, and it’s all just regular Zig code from top to bottom:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">obj</span> <span class="o">=</span> <span class="n">Type</span><span class="p">(</span><span class="o">.</span><span class="p">{</span>
<span class="c">// comptime options...</span>
<span class="p">.</span><span class="py">bus</span> <span class="o">=</span> <span class="kt">u128</span><span class="p">,</span>
<span class="p">.</span><span class="py">pins</span> <span class="o">=</span> <span class="o">.</span><span class="p">{</span> <span class="o">...</span> <span class="p">},</span>
<span class="p">}).</span><span class="nf">init</span><span class="p">(</span><span class="o">.</span><span class="p">{</span>
<span class="c">// additional runtime options...</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…and this is just scratching the surface. There are a couple of really
interesting side effects of this 2-step approach (first build the type,
then build an object from that type):</p>
<ul>
<li>You can use designated-initializer syntax to configure the type, which is just *chef’s kiss*
because it makes the code very readable (no guessing what a generic parameter
actually does because the name is right there in the code).</li>
<li>TypeConfig structs can be composed by nesting other TypeConfig structs,
or generic parameters in general, which then can be used to build
types inside types (Yo Dawg…).</li>
<li>It’s possible to build different struct interiors based on comptime
parameters (for instance the different KC85 models have different
runtime-config struct interiors for configuring model-specific features,
which makes ‘accidental misconfiguration’ an immediate compile error).</li>
</ul>
<p>In conclusion, the idea to use Zig’s comptime features to stamp out specialized
per-system chip and system emulators works exceptionally well and is (IMHO)
<em>much</em> more enjoyable than C++ or Rust generic programming (I’m sure C++ and
Rust can do the same things with sufficient template magic, but this code
definitely won’t look as straightforward as the Zig version).</p>
<h2 id="bit-twiddling-and-integer-math-can-be-awkward">Bit Twiddling and Integer Math can be awkward</h2>
<p>This section is hard to write because it’s criticizing without offering an
obviously better solution, so please read it as ‘constructive criticism’. Hopefully Zig will
be able to fix some of those things on the road towards 1.0.</p>
<p>Zig’s integer handling is quite different from C:</p>
<ul>
<li>arbitrary bit-width integers are the norm, not the exception</li>
<li>there is no concept of integer promotion in math expressions
(at least none that I noticed)</li>
<li>implicit conversion between different integer types is only
allowed when no data loss can happen (e.g. a u8 can be assigned to a
u16, but assigning a u16 to a u8 requires an explicit cast)</li>
<li>mixing signed and unsigned values in expressions isn’t allowed</li>
<li>overflow is checked in Debug and ReleaseSafe mode, and there are separate
operators for ‘intended wraparound’</li>
</ul>
<p>At first glance these features look pretty nice because they fix some obvious
footguns in C and C++. Arbitrary width integer types are especially useful for
emulator code, because hardware chips are full of ‘odd-width’ counters and
registers (3, 5, 20 bits etc…). Directly mapping such registers to types like
u3, u5 or u20 should potentially allow for more readable and ‘expressive’ code.</p>
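<p>For contrast, here’s a small C sketch of the footguns these rules guard against: silent integer promotion and truncation, surprising signed/unsigned comparisons, and the manual masking needed to emulate an odd-width register. The function names are made up for illustration:</p>

```c
#include <stdint.h>

/* uint8_t operands are promoted to int, so the sum does not wrap at 8 bits... */
static int sum_u8_as_int(uint8_t a, uint8_t b) {
    return a + b;
}

/* ...but it is silently truncated when converted back to uint8_t */
static uint8_t sum_u8_truncated(uint8_t a, uint8_t b) {
    return a + b;
}

/* mixed signed/unsigned comparison: 'a' is converted to unsigned first,
   so a negative value turns into a huge positive one */
static int signed_less_than_unsigned(int a, unsigned b) {
    return a < b;
}

/* a 5-bit counter register needs explicit masking in C
   (in Zig this could be a u5 incremented with +%) */
static uint8_t inc_u5(uint8_t counter) {
    return (uint8_t)((counter + 1) & 0x1F);
}
```

<p>With these functions, <code class="language-plaintext highlighter-rouge">sum_u8_as_int(200, 100)</code> yields 300, <code class="language-plaintext highlighter-rouge">sum_u8_truncated(200, 100)</code> silently yields 44, and <code class="language-plaintext highlighter-rouge">signed_less_than_unsigned(-1, 0)</code> returns false.</p>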
<p>Unfortunately, in reality it’s not so clear cut. While C is definitely too
sloppy when it comes to integer math, Zig might swing the pendulum a bit too
far into the other direction by requiring too much explicit casting.</p>
<p>The most extreme example I stumbled over was implementing the Z80’s indexed
addressing mode (e.g. those instructions involving <code class="language-plaintext highlighter-rouge">(IX+d)</code> or <code class="language-plaintext highlighter-rouge">(IY+d)</code>). This
takes the byte <code class="language-plaintext highlighter-rouge">d</code> and adds it as a signed quantity, with wraparound, to a 16-bit
address (i.e. the byte is sign-extended to a 16-bit value before the
addition).</p>
<p>In C this is quite straightforward:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint16_t</span> <span class="nf">addi8</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">,</span> <span class="kt">uint8_t</span> <span class="n">offset</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">addr</span> <span class="o">+</span> <span class="p">(</span><span class="kt">int8_t</span><span class="p">)</span><span class="n">offset</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The simplest way I could come up with to do the same in Zig is:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">addi8</span><span class="p">(</span><span class="n">addr</span><span class="p">:</span> <span class="kt">u16</span><span class="p">,</span> <span class="n">offset</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u16</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">addr</span> <span class="o">+%</span> <span class="nb">@as</span><span class="p">(</span><span class="kt">u16</span><span class="p">,</span> <span class="nb">@bitCast</span><span class="p">(</span><span class="nb">@as</span><span class="p">(</span><span class="kt">i16</span><span class="p">,</span> <span class="nb">@as</span><span class="p">(</span><span class="kt">i8</span><span class="p">,</span> <span class="nb">@bitCast</span><span class="p">(</span><span class="n">offset</span><span class="p">)))));</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note how the integer conversion gets totally drowned in ‘@-litter’.</p>
<p>Both functions result in the same assembly output (with -O3 for C
and any of the Release modes in Zig). For x86-64:</p>
<pre><code class="language-assembly">addi8:
movsx eax, sil ; move low byte of esi into eax with sign-extension
add eax, edi ; eax += edi
ret
</code></pre>
<p>For ARM64 (it looks like ARM handles the sign-extension right in the add instruction, not very RISC-y but neat!):</p>
<pre><code class="language-assembly">addi8:
add w0, w0, w1, sxtb
ret
</code></pre>
<p>IMHO when the assembly output of a compiler looks so much more straightforward
than the high level compiler input, it becomes a bit hard to justify why
high level programming languages had been invented in the first place ;)</p>
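<p>The following C sketch makes the wraparound semantics concrete with a few spot checks. It isn’t taken from the emulator code base. The name <code class="language-plaintext highlighter-rouge">addi8_portable()</code> and the test values are made up for illustration. The sketch also avoids the <code class="language-plaintext highlighter-rouge">uint8_t</code> to <code class="language-plaintext highlighter-rouge">int8_t</code> cast, because converting a value above 127 to a signed type is strictly speaking implementation-defined in ISO C, even though mainstream compilers simply reinterpret the bits:</p>

```c
#include <assert.h>
#include <stdint.h>

// Same semantics as addi8() above (hypothetical name), but without the
// implementation-defined uint8_t => int8_t conversion:
// (offset ^ 0x80) - 0x80 maps 0..255 to -128..127 using plain int math.
static uint16_t addi8_portable(uint16_t addr, uint8_t offset) {
    int d = (offset ^ 0x80) - 0x80;
    // converting the int result to uint16_t wraps modulo 65536
    return (uint16_t)(addr + d);
}

// spot checks for the (IX+d) / (IY+d) address computation
static void check_addi8(void) {
    assert(addi8_portable(0x1000, 0x05) == 0x1005);  // d = +5
    assert(addi8_portable(0x1000, 0xFF) == 0x0FFF);  // d = -1
    assert(addi8_portable(0x1000, 0x80) == 0x0F80);  // d = -128
    assert(addi8_portable(0xFFFF, 0x01) == 0x0000);  // wraps around at the top
    assert(addi8_portable(0x0000, 0xFF) == 0xFFFF);  // wraps around at the bottom
}
```

<p>The XOR-then-subtract pair is a common branch-free idiom for sign-extending an 8-bit value using only well-defined int arithmetic.</p>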
<p>Apart from that extreme case (which only exists once in the whole code
base), narrowing conversions are the much more common issue in code that
mixes different integer widths: each of them requires an explicit cast, and
those explicit casts may reduce readability quite a bit.</p>
<p>The basic idea to only allow implicit conversions that can’t lose data
is definitely a good one, but very often a cast is required even though the
compiler has all the information it needs at compile time to prove that no
information is lost.</p>
<p>For instance, this Zig code is currently a compile error:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">trunc4</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u4</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">val</span> <span class="o">&</span> <span class="mi">0xF</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The expression result would always fit into a u4, yet an <code class="language-plaintext highlighter-rouge">@intCast</code> or
<code class="language-plaintext highlighter-rouge">@truncate</code> is required to make it work:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">trunc4</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u4</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">@intCast</span><span class="p">(</span><span class="n">val</span> <span class="o">&</span> <span class="mi">0xF</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Similar situation with a right-shift:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">broken</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u4</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">val</span> <span class="o">>></span> <span class="mi">4</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">works</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u4</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">@truncate</span><span class="p">(</span><span class="n">val</span> <span class="o">>></span> <span class="mi">4</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
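<p>Somewhat surprisingly, this works fine though (presumably because <code class="language-plaintext highlighter-rouge">a</code> is a comptime-known
constant here, so the compiler can check that the actual results fit into a u4):</p>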
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">const</span> <span class="n">a</span><span class="p">:</span> <span class="kt">u8</span> <span class="o">=</span> <span class="mi">0xFF</span><span class="p">;</span>
<span class="k">const</span> <span class="n">b</span><span class="p">:</span> <span class="kt">u4</span> <span class="o">=</span> <span class="n">a</span> <span class="o">&</span> <span class="mi">0xF</span><span class="p">;</span>
<span class="k">const</span> <span class="n">c</span><span class="p">:</span> <span class="kt">u4</span> <span class="o">=</span> <span class="n">a</span> <span class="o">>></span> <span class="mi">4</span><span class="p">;</span>
</code></pre></div></div>
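<p>For comparison, here’s a C sketch of the same two nibble extractions (the function names are hypothetical). C needs no casts here because the <code class="language-plaintext highlighter-rouge">uint8_t</code> argument is promoted to <code class="language-plaintext highlighter-rouge">int</code>. An exhaustive loop over all 256 inputs confirms that both results always fit into 4 bits. The Zig compiler could derive exactly this information from the mask and the shift amount alone:</p>

```c
#include <assert.h>
#include <stdint.h>

// C versions of trunc4() and the right-shift example: no casts needed,
// because the uint8_t argument is promoted to int before the operation
static unsigned lo_nibble(uint8_t val) { return val & 0xF; }
static unsigned hi_nibble(uint8_t val) { return val >> 4; }

// exhaustively verify that both results always fit into 4 bits, and that
// the two nibbles together reassemble the original byte
static void check_nibbles(void) {
    for (unsigned v = 0; v < 256; v++) {
        uint8_t byte = (uint8_t)v;
        assert(lo_nibble(byte) <= 0xF);
        assert(hi_nibble(byte) <= 0xF);
        assert(((hi_nibble(byte) << 4) | lo_nibble(byte)) == v);
    }
}
```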
<p>A similar problem exists with loop variables, which are always of type usize and
which need to be explicitly narrowed even if the loop count is guaranteed to
fit into a smaller type:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="mi">0</span><span class="o">..</span><span class="mi">16</span><span class="p">)</span> <span class="p">|</span><span class="mi">_</span><span class="n">i</span><span class="p">|</span> <span class="p">{</span>
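    <span class="k">const</span> <span class="n">i</span><span class="p">:</span> <span class="kt">u4</span> <span class="o">=</span> <span class="nb">@intCast</span><span class="p">(</span><span class="mi">_</span><span class="n">i</span><span class="p">);</span>
    <span class="n">_</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span> <span class="c">// ...do something with i here</span>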
<span class="p">}</span>
</code></pre></div></div>
<p>There are also surprising cases like this one.</p>
<p>Assuming that:</p>
<ul>
<li>a: u16 = 0xF000</li>
<li>b: u16 = 0x1000</li>
<li>c: u32 = 0x10000</li>
</ul>
<p>This expression creates an overflow error:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">const</span> <span class="n">d</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span><span class="p">;</span>
</code></pre></div></div>
<p>…but this doesn’t:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">const</span> <span class="n">e</span> <span class="o">=</span> <span class="n">c</span> <span class="o">+</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">;</span>
</code></pre></div></div>
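<p>Both <code class="language-plaintext highlighter-rouge">d</code> and <code class="language-plaintext highlighter-rouge">e</code> are of type <code class="language-plaintext highlighter-rouge">u32</code> btw, which I also find a bit surprising:
it means that Zig already picks the widest input type as the result type, but
it doesn’t promote the other inputs to this widest type up front. Widening only happens per operator,
left to right, so in the first expression <code class="language-plaintext highlighter-rouge">a + b</code> is still computed as a u16 and overflows
before <code class="language-plaintext highlighter-rouge">c</code> comes into play.</p>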
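<p>Here’s a C sketch with the same values for comparison. The function names are hypothetical, and the sketch assumes a 32-bit <code class="language-plaintext highlighter-rouge">int</code>, as on all mainstream desktop platforms. Integer promotion turns both <code class="language-plaintext highlighter-rouge">uint16_t</code> operands into <code class="language-plaintext highlighter-rouge">int</code> before the first addition. The intermediate result 0x10000 therefore fits, and both evaluation orders produce 0x20000:</p>

```c
#include <assert.h>
#include <stdint.h>

// Assumes a 32-bit int: both uint16_t operands are promoted to int, so the
// intermediate a + b (0x10000) doesn't overflow. Adding the uint32_t c then
// converts that int to the unsigned type (usual arithmetic conversions).
static uint32_t sum_abc(uint16_t a, uint16_t b, uint32_t c) { return a + b + c; }
static uint32_t sum_cab(uint16_t a, uint16_t b, uint32_t c) { return c + a + b; }

static void check_sums(void) {
    // same values as in the Zig example: both orders give 0x20000
    assert(sum_abc(0xF000, 0x1000, 0x10000) == 0x20000);
    assert(sum_cab(0xF000, 0x1000, 0x10000) == 0x20000);
}
```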
<p>And here’s another surprising behaviour I stumbled over:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// self.sprite_coords[] is an array of bytes</span>
<span class="k">const</span> <span class="n">px</span><span class="p">:</span> <span class="kt">usize</span> <span class="o">=</span> <span class="mi">272</span> <span class="o">-</span> <span class="n">self</span><span class="p">.</span><span class="py">sprite_coords</span><span class="p">[</span><span class="n">sprite_index</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
</code></pre></div></div>
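<p>This produces the error <code class="language-plaintext highlighter-rouge">error: type 'u8' cannot represent integer value '272'</code>.
Why Zig tries to fit the constant 272 into a u8 instead of picking a wider type
is a bit of a mystery tbh (most likely the <code class="language-plaintext highlighter-rouge">usize</code> result type isn’t propagated
into the operands of the subtraction, so the constant is coerced to the type of the other operand).</p>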
<p>One solution is to widen the value read from the array:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">px</span><span class="p">:</span> <span class="kt">usize</span> <span class="o">=</span> <span class="mi">272</span> <span class="o">-</span> <span class="nb">@as</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span> <span class="n">self</span><span class="p">.</span><span class="py">sprite_coords</span><span class="p">[</span><span class="n">sprite_index</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]);</span>
</code></pre></div></div>
<p>But this works too:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">px</span><span class="p">:</span> <span class="kt">usize</span> <span class="o">=</span> <span class="nb">@as</span><span class="p">(</span><span class="kt">u9</span><span class="p">,</span> <span class="mi">272</span><span class="p">)</span> <span class="o">-</span> <span class="n">self</span><span class="p">.</span><span class="py">sprite_coords</span><span class="p">[</span><span class="n">sprite_index</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
</code></pre></div></div>
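<p>In C the same line needs neither a cast nor a widened constant, because the <code class="language-plaintext highlighter-rouge">uint8_t</code> array element is promoted to <code class="language-plaintext highlighter-rouge">int</code> before the subtraction. Below is a minimal sketch. The function <code class="language-plaintext highlighter-rouge">sprite_px()</code> and the test data are hypothetical stand-ins for the emulator code:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the emulator code: the uint8_t array element is
// promoted to int, so 272 - coord is computed as an int (range 17..272) and
// then converted to size_t on return.
static size_t sprite_px(const uint8_t* sprite_coords, size_t sprite_index) {
    return 272 - sprite_coords[sprite_index * 2 + 1];
}

static void check_sprite_px(void) {
    const uint8_t coords[4] = { 0, 16, 0, 255 };
    assert(sprite_px(coords, 0) == 256);
    assert(sprite_px(coords, 1) == 17);
}
```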
<p>In conclusion, I only understood that C’s integer promotion actually has an
important purpose after missing it so badly in Zig :D</p>
<p>I think C’s main problem with integer promotion is that it promotes to <code class="language-plaintext highlighter-rouge">int</code>,
and that <code class="language-plaintext highlighter-rouge">int</code> is stuck at 32 bits even on 64-bit CPUs (not moving the <code class="language-plaintext highlighter-rouge">int</code> type
to 64 bits during the transition from 32- to 64-bit CPUs was a pretty stupid
decision in hindsight).</p>
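<p>A quick way to see the promotion rules in action is to ask the compiler for the type of an expression. In this C11 sketch, the <code class="language-plaintext highlighter-rouge">TYPE_NAME</code> macro is a made-up helper built on <code class="language-plaintext highlighter-rouge">_Generic</code>. It shows that two <code class="language-plaintext highlighter-rouge">uint8_t</code> operands are added as <code class="language-plaintext highlighter-rouge">int</code> even on a 64-bit CPU. A 64-bit operand, on the other hand, pulls the other operand up to 64 bits:</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Maps the type of an expression to a readable name (C11 _Generic).
#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long: "unsigned long", \
    long long: "long long", \
    unsigned long long: "unsigned long long", \
    default: "other")

static void check_promotion(void) {
    uint8_t a = 200, b = 100;
    uint64_t w = 1;
    // uint8_t operands are promoted to int (not to a 64-bit type)
    assert(strcmp(TYPE_NAME(a + b), "int") == 0);
    // ...which is why the sum doesn't wrap around at 8 bits
    assert(a + b == 300);
    // with a 64-bit operand, the usual arithmetic conversions pick 64 bits
    assert(sizeof(w + a) == 8);
}
```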
<p>TBF though, just extending to the natural word size (e.g. 64 bits) wouldn’t
help much in Zig when using wide integers like u128.</p>
<p>In any case, I hope that the current status quo isn’t what ends up in Zig 1.0
and that a way can be found to reduce ‘@-litter’ in mixed-width integer expressions
without going back entirely to C’s admittedly too sloppy integer promotion and
implicit conversion rules.</p>
<p>When I asked around on the Zig Discord, it turned out that there seems to be a proposal which lets
operators narrow the result type for comptime-known values (which, if I understand
it right, would make the result type of the expression <code class="language-plaintext highlighter-rouge">a & 0xF</code> a <code class="language-plaintext highlighter-rouge">u4</code> instead of
whatever wider type <code class="language-plaintext highlighter-rouge">a</code> is).</p>
<p>Another idea that might make sense is to promote integers to the widest
input type. Currently the compiler already seems to use the widest
input type in an expression as result type, promoting the other
inputs to this widest type looks like a logical step to me.</p>
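<p>To illustrate what such comptime-aware narrowing could mean, here’s a hypothetical C model. It is not actual Zig compiler code, and not necessarily how the proposal would specify the rule. It computes the smallest unsigned bit width that can hold every possible result of a mask, or of a right-shift by a comptime-known amount:</p>

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model, not actual Zig compiler code: given an unsigned operand
// of 'bits' width and a comptime-known second operand, compute the smallest
// unsigned width that can hold every possible result (0 is a valid width,
// since Zig has a u0 type).

// number of bits needed to represent v
static unsigned bit_len(uint64_t v) {
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

// x & mask can't have more bits than either x or the mask
static unsigned and_result_bits(unsigned bits, uint64_t mask) {
    unsigned mask_bits = bit_len(mask);
    return (mask_bits < bits) ? mask_bits : bits;
}

// x >> shift drops the lowest 'shift' bits
static unsigned shr_result_bits(unsigned bits, unsigned shift) {
    return (shift >= bits) ? 0 : bits - shift;
}

static void check_narrowing(void) {
    // the two cases from above: u8 & 0xF and u8 >> 4 would both become a u4
    assert(and_result_bits(8, 0xF) == 4);
    assert(shr_result_bits(8, 4) == 4);
}
```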
<p>I would keep the strict separation of signed and unsigned integer types
though, i.e. mixed-sign expressions are not allowed, and any theoretical
integer promotion should never happen ‘across signedness’.</p>
<p>From my own experience in C (where I don’t allow implicit sign-conversion
via -Wsign-conversion warnings) I can tell that this will feel painful
in the beginning for C and C++ coders, but it makes for better code and API
design in the long run.</p>
<p>This experience (of transitioning to more restrictive but also more correct C
code by enabling certain warnings) is also why I’m giving Zig some slack about
its integer conversion strictness. After all, maybe I’m just not used to it
yet. But OTOH, I have by now written enough Zig code that I should slowly get
used to it, but it <em>still</em> feels bumpy. All in all I think this is an area
where ‘strict design purity’ can harm the language in the long run though, and
a better balance should be found between strictness, coding convenience and
readability.</p>
<h2 id="using-wide-integers-with-bit-twiddling-code-is-fast">Using wide integers with bit twiddling code is fast</h2>
<p>Using a 128-bit integer variable for the emulator system bus works
nicely and doesn’t have a relevant performance impact. In fact, with a bit of
care (by not using bit twiddling operations that cross a 64-bit boundary) the
produced assembly code is identical to doing the same operation on a simple
64-bit variable.</p>
<p>For instance extracting an 8-bit value from the upper half of a 128-bit integer:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">getu8</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u128</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">@truncate</span><span class="p">(</span><span class="n">val</span> <span class="o">>></span> <span class="mi">64</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…is just moving the register which holds the upper 64 bits into the return
value register:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>getu8:
mov rax, rsi
ret
</code></pre></div></div>
<p>…which is the same cost as extracting an 8-bit value from a 64-bit variable:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">getu8</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u64</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">@truncate</span><span class="p">(</span><span class="n">val</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>getu8:
mov rax, rdi
ret
</code></pre></div></div>
<p>…but make sure that the operation doesn’t cross a 64-bit boundary, unlike this one:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">getu8</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u128</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">@truncate</span><span class="p">(</span><span class="n">val</span> <span class="o">>></span> <span class="mi">60</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…because this now involves actual bit twiddling:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>getu8:
shl esi, 4
shr rdi, 60
lea eax, [rdi + rsi]
ret
</code></pre></div></div>
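<p>The same experiment can be done in C with the non-standard <code class="language-plaintext highlighter-rouge">unsigned __int128</code> type, which GCC and Clang provide on 64-bit targets. Below is a minimal sketch with hypothetical function names. It also checks which bits end up in the result when the extraction straddles the 64-bit boundary:</p>

```c
#include <assert.h>
#include <stdint.h>

// unsigned __int128 is a GCC/Clang extension (not ISO C), used here as a
// stand-in for Zig's u128, e.g. a 128-bit wide emulator system bus.
typedef unsigned __int128 u128;

// 8 bits from the upper 64-bit half: only the high register is involved
static uint8_t getu8_hi(u128 val) {
    return (uint8_t)(val >> 64);
}

// 8 bits starting at bit 60: straddles the 64-bit boundary, so bits from
// both halves need to be combined
static uint8_t getu8_cross(u128 val) {
    return (uint8_t)(val >> 60);
}

static void check_getu8(void) {
    u128 bus = ((u128)0xAB << 64) | ((u128)0xC << 60);
    assert(getu8_hi(bus) == 0xAB);
    // bits 60..63 come from the low half (0xC), bits 64..67 from the high half (0xB)
    assert(getu8_cross(bus) == 0xBC);
}
```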
<h2 id="debug-performance">Debug Performance</h2>
<p>Release performance of my C emulator code (with -O3) and my Zig code (with
ReleaseFast) is roughly in the same ballpark, but I’m seeing a pretty big
difference in Debug performance:</p>
<ul>
<li>in C, debug performance is roughly 2x slower than -O3</li>
<li>in Zig, debug performance is roughly 3 to 4x slower than ReleaseFast</li>
</ul>
<p>I haven’t figured out why yet, but it’s not the most obvious candidate (range and
overflow checks), since ReleaseSafe performance is nearly identical to ReleaseFast
(interestingly, ReleaseSmall is the slowest Release build config; it’s about 40% slower
than both ReleaseFast and ReleaseSafe).</p>
<p>One important difference between my C and Zig code is that in C I’m using tons
of small preprocessor macros to make bit twiddling expressions more readable.
In Zig these are replaced with inline functions (<code class="language-plaintext highlighter-rouge">inline</code> in Zig isn’t just an
optimization hint, it causes the function body to be inlined also in debug
mode).</p>
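<p>For illustration, here’s the kind of C helper I mean, once as a preprocessor macro and once as a function. The names and the bit layout are made up for this example. The preprocessor expands the macro textually, so there are no function parameters that could end up being copied through the stack in a debug build:</p>

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical bit twiddling helper in both flavours. The macro is plain
// text substitution, so there's no function call and no parameter copies,
// not even in a debug build.
#define GET_BITS(val, shift, mask) (((val) >> (shift)) & (mask))

// The function version is type-checked and evaluates each argument once,
// but whether it gets inlined in a debug build depends on the compiler.
static inline uint64_t get_bits(uint64_t val, unsigned shift, uint64_t mask) {
    return (val >> shift) & mask;
}

static void check_get_bits(void) {
    uint64_t pins = 0x0000123456789ABCull;
    // both variants extract the same 16-bit field starting at bit 16
    assert(GET_BITS(pins, 16, 0xFFFF) == 0x5678);
    assert(get_bits(pins, 16, 0xFFFF) == 0x5678);
}
```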
<p>At first glance Zig’s inline functions seem to be a good replacement for
C preprocessor macros, but when looking at the generated code in debug mode,
the compiler still pushes and pops function arguments through the stack even though
the function body is inlined.</p>
<p>Consider this Zig code:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">inline</span> <span class="k">fn</span> <span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="kt">u8</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">a</span> <span class="o">+%</span> <span class="n">b</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">add_1</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="kt">u8</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">add_2</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="kt">u8</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">a</span> <span class="o">+%</span> <span class="n">b</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…in release mode, both functions produce the same code as expected:</p>
<pre><code class="language-assembly">add_1:
lea eax, [rsi + rdi]
ret
add_2:
lea eax, [rsi + rdi]
ret
</code></pre>
<p>But in debug mode, the function which calls the inline function has a slightly
higher overhead because of additional stack traffic:</p>
<pre><code class="language-assembly">add_1:
push rbp
mov rbp, rsp
sub rsp, 5
mov cl, sil
mov al, dil
mov byte ptr [rbp - 4], al
mov byte ptr [rbp - 3], cl
mov byte ptr [rbp - 2], al
mov byte ptr [rbp - 1], cl
add al, cl
mov byte ptr [rbp - 5], al
mov al, byte ptr [rbp - 5]
movzx eax, al
add rsp, 5
pop rbp
ret
add_2:
push rbp
mov rbp, rsp
sub rsp, 2
mov cl, sil
mov al, dil
mov byte ptr [rbp - 2], al
mov byte ptr [rbp - 1], cl
add al, cl
movzx eax, al
add rsp, 2
pop rbp
ret
</code></pre>
<p>TBH though, inline function overhead is unlikely to be the only contributor to the
slower debug performance; it could be many such small papercuts combined.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I enjoy working with Zig immensely despite the few warts I encountered; for the
most part the code just ‘flows out of the hand’, which IMHO is an
important property of a programming language. It’s encouraging to see how areas
which were a bumpy ride during the 0.10 to 0.11 versions have improved and
stabilized (most importantly the build and package management system).</p>
<p>It’s also interesting how the ‘most popular design fault’ that comes up in every
single Zig discussion (currently that’s ‘unused variables are errors’) is a
complete non-issue (for me at least, not once in that 16-kloc project was that
an annoyance), while the issue that actually mildly annoyed me in real world
code (the <code class="language-plaintext highlighter-rouge">@-litter</code> in mixed-width integer expressions) is still very much
under the radar. Maybe that’s also because mixed-width and bit twiddling code
isn’t all that common in typical Zig projects: most integer code is probably
about computing array indices or data offsets, which happens in usize.</p>
<p>I also completely left out a whole chapter about code generation with Zig
(which would have been mostly about string processing and memory management), simply
because the blog post would have become too big, and it is probably an
interesting enough topic for its own blog post. This is also an area where Zig is
different enough from C, mid-level languages like C++ or Rust, and high
level memory-managed languages that I don’t feel quite confident enough yet to
have found the right solution to questions like ‘who owns the underlying
memory of a slice returned from a function’ - I have solutions of course,
but I’m not entirely happy with them because it feels like a throwback
to my first forays into C and C++.</p>
<p>In short, I don’t want to burden myself (too much) with memory ownership questions, even
in low level systems programming languages. Typically in C I avoid such
problems with a ‘mostly value-driven approach’: instead of returning references
to data, I return a copy of the data (unless of course it’s about bulk data
like images, 3D meshes, file content etc., but those are special cases which
are easy to deal with using manual memory management).</p>
<p>Zig is leaning in heavily on slices though, which are just pointer/size pairs
without any concept of ownership. It would be nice if Zig had some syntax sugar
to make working with arrays just as flexible as with slices, because arrays are
value types and avoid all the ownership footguns of slices. I think mostly this
comes down to implementing a handful of ‘missing features’ from C99 designated
initialization (like <a href="https://github.com/ziglang/zig/issues/6068">#6068</a>) or maybe even
looking at languages like JS and TS (…shock and gasps from the audience!!! I
know but bear with me) for a couple of features which make working with struct
and array values more convenient (like destructuring and spreading).</p>
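<p>Here’s a small C illustration of the value-driven style (all names are hypothetical). The function below returns a struct with an embedded fixed-size array by value. It builds the struct with a C99 compound literal and designated initializers, including array index designators. There’s no ownership question to answer, because the caller simply receives its own copy:</p>

```c
#include <assert.h>

// Hypothetical value types: a polygon with a fixed-capacity point array.
typedef struct { int x, y; } point_t;
typedef struct {
    point_t pts[4];  // embedded array, copied together with the struct
    int num_pts;
} polygon_t;

// Returns an axis-aligned quad by value. Members that aren't mentioned in
// the designated initializers (like pts[0]) are zero-initialized.
static polygon_t make_quad(int w, int h) {
    return (polygon_t){
        .pts = { [1] = { .x = w }, [2] = { .x = w, .y = h }, [3] = { .y = h } },
        .num_pts = 4,
    };
}

static void check_quad(void) {
    polygon_t a = make_quad(10, 20);
    polygon_t b = a;  // a plain copy, no shared memory between a and b
    b.pts[0].x = 5;
    assert(a.pts[0].x == 0);
    assert(a.pts[2].x == 10 && a.pts[2].y == 20);
    assert(b.num_pts == 4);
}
```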
<p>…but I’m already halfway into that other blog post which I wanted to
avoid, so let’s end it here lol.</p>
Sat, 24 Aug 2024 00:00:00 +0000
https://floooh.github.io/2024/08/24/zig-and-emulators.html
Upcoming Sokol header API changes (May 2024)<p>Aka: “the storage buffer update”</p>
<p>In a couple of days I will merge the next sokol-gfx feature update which adds
initial storage buffer support. The update also affects other headers and tools
(most notably sokol_app.h, all headers with embedded shaders, and sokol-shdc -
the cross-backend shader compiler).</p>
<p>The bad news first:</p>
<ul>
<li>This is ‘gpu-readonly’ support, i.e. it’s not possible (yet) to write to storage buffers
from shader code; gpu-write support will come in a future ‘compute shaders’ update.</li>
<li>The following platform/backend combos don’t get storage buffer support:
<ul>
<li>all GLES3 backends (WebGL2, iOS+GLES3, Android): for WebGL2 and iOS there is no
other choice since they are stuck with GLES 3.0; for Android, storage buffer
support may be added later</li>
<li>macOS+GL: macOS is stuck at GL 4.1, while storage buffers require at least
GL 4.3</li>
</ul>
</li>
<li>This leaves the following platform/backend combos which support storage buffers:
<ul>
<li>macOS + Metal</li>
<li>iOS + Metal</li>
<li>Windows + D3D11</li>
<li>Windows + GL</li>
<li>Linux + GL</li>
<li>Web + WebGPU</li>
</ul>
</li>
</ul>
<p>Storage buffers provide a convenient way to communicate
large array-like data to shaders (the minimum guaranteed size for storage buffers
is 128 MBytes), for instance:</p>
<ul>
<li>for ‘vertex pulling’ to load per-vertex and/or per-instance data from
storage buffers instead of relying on the fixed function vertex input
stage</li>
<li>as a more convenient and flexible way to load random access
data in shaders compared to the old-school way of using
‘data textures’.</li>
</ul>
<p>…and as a ‘drive-by’ feature: sokol-gfx now finally allows you to kick off
a draw call without any resource bindings and instead synthesize vertices
‘out of thin air’ in the vertex shader.</p>
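<p>One common way to synthesize vertices without any buffers is the ‘fullscreen triangle’ trick, which derives the clip-space position purely from the vertex index. Below is a CPU-side C model of that computation. It illustrates the general technique and is not the code of the triangle-bufferless-sapp sample. In a real vertex shader the index would come from <code class="language-plaintext highlighter-rouge">gl_VertexIndex</code> (Vulkan-style GLSL), <code class="language-plaintext highlighter-rouge">SV_VertexID</code> (HLSL) or <code class="language-plaintext highlighter-rouge">[[vertex_id]]</code> (Metal):</p>

```c
#include <assert.h>

typedef struct { float x, y; } vec2_t;

// Vertex index 0, 1, 2 => clip-space positions (-1,-1), (3,-1), (-1,3):
// one oversized triangle which covers the entire [-1, 1] viewport.
static vec2_t bufferless_pos(int vertex_index) {
    vec2_t p = {
        .x = (float)((vertex_index << 1) & 2) * 2.0f - 1.0f,
        .y = (float)(vertex_index & 2) * 2.0f - 1.0f,
    };
    return p;
}

static void check_bufferless(void) {
    vec2_t p1 = bufferless_pos(1);
    assert(p1.x == 3.0f && p1.y == -1.0f);
}
```

<p>Presumably the CPU side of such a triangle needs nothing more than a draw call for 3 vertices, e.g. <code class="language-plaintext highlighter-rouge">sg_draw(0, 3, 1)</code>, without applying any bindings first.</p>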
<p>The root PR for the update is here: <a href="https://github.com/floooh/sokol/pull/1007">#1007</a>.</p>
<h2 id="new-sample-code">New sample code</h2>
<p>The following backend-agnostic samples have been added (those use sokol_app.h and sokol-shdc).</p>
<blockquote>
<p>NOTE: You’ll need a recent Chrome for the WebGPU sample links to work. Also expect
some general breakage and rendering artifacts depending on the platform (for
instance Chrome on Android straight up crashes the tab
on most samples). Please also note that the source code links in those samples
will not be valid until all the update PRs have been merged.</p>
</blockquote>
<ul>
<li><strong>triangle-bufferless-sapp</strong>: this demonstrates rendering without buffers (and
is the only new sample that also works on backends without storage buffer support):
<ul>
<li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/triangle-bufferless-sapp.html">triangle-bufferless-sapp.html</a></li>
<li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/triangle-bufferless-sapp.c">sapp/triangle-bufferless-sapp.c</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/triangle-bufferless-sapp.glsl">sapp/triangle-bufferless-sapp.glsl</a></li>
</ul>
</li>
<li><strong>vertexpull-sapp</strong>: the cube-sapp sample ported to vertex pulling:
<ul>
<li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/vertexpull-sapp.html">vertexpull-sapp.html</a></li>
<li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexpull-sapp.c">sapp/vertexpull-sapp.c</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexpull-sapp.glsl">sapp/vertexpull-sapp.glsl</a></li>
</ul>
</li>
<li><strong>sbuftex-sapp</strong>: a sample which uses a storage buffer in the fragment shader stage:
<ul>
<li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/sbuftex-sapp.html">sbuftex-sapp.html</a></li>
<li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbuftex-sapp.c">sapp/sbuftex-sapp.c</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbuftex-sapp.glsl">sapp/sbuftex-sapp.glsl</a></li>
</ul>
</li>
<li><strong>instancing-pull-sapp</strong>: vertex pulling and instancing via storage buffers:
<ul>
<li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/instancing-pull-sapp.html">instancing-pull-sapp.html</a></li>
<li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-pull-sapp.c">sapp/instancing-pull-sapp.c</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-pull-sapp.glsl">sapp/instancing-pull-sapp.glsl</a></li>
</ul>
</li>
<li><strong>ozz-storagebuffer-sapp</strong>: the ozz-skin sample rewritten to pull vertices, instance- and skinning-matrices from storage buffers:
<ul>
<li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/ozz-storagebuffer-sapp.html">ozz-storagebuffer-sapp.html</a></li>
<li>C++ code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/ozz-storagebuffer-sapp.cc">sapp/ozz-storagebuffer-sapp.cc</a></li>
<li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/ozz-storagebuffer-sapp.glsl">sapp/ozz-storagebuffer-sapp.glsl</a></li>
</ul>
</li>
</ul>
<p>The following backend-specific samples demonstrate how to use storage buffers without the sokol-shdc shader compiler:</p>
<ul>
<li><strong>D3D11</strong>: <a href="https://github.com/floooh/sokol-samples/blob/master/d3d11/vertexpulling-d3d11.c">d3d11/vertexpulling-d3d11.c</a></li>
<li><strong>Metal</strong>: <a href="https://github.com/floooh/sokol-samples/blob/master/metal/vertexpulling-metal.c">metal/vertexpulling-metal.c</a></li>
<li><strong>WebGPU</strong>: <a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/vertexpulling-wgpu.c">wgpu/vertexpulling-wgpu.c</a></li>
<li><strong>desktop GL</strong>: <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/vertexpulling-glfw.c">glfw/vertexpulling-glfw.c</a></li>
</ul>
<h2 id="how-to-check-for-storage-buffer-support">How to check for storage buffer support</h2>
<p>To check for storage buffer support at runtime, call <code class="language-plaintext highlighter-rouge">sg_query_features()</code> and check the <code class="language-plaintext highlighter-rouge">storage_buffer</code> boolean in the result:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">sg_query_features</span><span class="p">().</span><span class="n">storage_buffer</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// storage buffers are supported...</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// storage buffers are *NOT* supported...</span>
<span class="p">}</span>
</code></pre></div></div>
<h2 id="desktop-gl-version-caveats-and-a-minor-breaking-change">Desktop GL version caveats (and a minor breaking change)</h2>
<p>The sokol_gfx.h desktop-GL backend will now query what GL version it runs on
to decide whether storage buffers are supported (storage buffers were added in GL 4.3).</p>
<p>The expected minimal version has been bumped to 4.1 on macOS and 4.3 on other platforms. This
also means that sokol_app.h will now by default create a 4.1 context on macOS and a 4.3 context
on other platforms.</p>
<p>Since the GL version is now flexible, the configuration define <code class="language-plaintext highlighter-rouge">SOKOL_GLCORE33</code> doesn’t make
much sense anymore and has been renamed to <code class="language-plaintext highlighter-rouge">SOKOL_GLCORE</code>. You’ll get a proper compile
error when trying to build with the old <code class="language-plaintext highlighter-rouge">SOKOL_GLCORE33</code> define.</p>
<p>Apart from rebuilding your shaders via an updated sokol-shdc, this is the only required
change for existing code.</p>
<p>In sokol-shdc, the target language <code class="language-plaintext highlighter-rouge">glsl330</code> has been removed and replaced
with <code class="language-plaintext highlighter-rouge">glsl410</code> and <code class="language-plaintext highlighter-rouge">glsl430</code>. When targeting the macOS GL backend, use <code class="language-plaintext highlighter-rouge">glsl410</code>,
otherwise <code class="language-plaintext highlighter-rouge">glsl430</code>.</p>
<h2 id="a-simple-vertex-pulling-example">A simple vertex pulling example</h2>
<p>First let’s rewrite the <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/cube-sapp.glsl">cube-sapp.glsl</a> shader
to pull vertices from a storage buffer instead of the fixed function vertex input.</p>
<p>The original shader declares the vertex input with vertex attributes:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">in</span> <span class="kt">vec4</span> <span class="n">position</span><span class="p">;</span>
<span class="k">in</span> <span class="kt">vec4</span> <span class="n">color0</span><span class="p">;</span>
</code></pre></div></div>
<blockquote>
<p>NOTE: the cube-sapp.glsl shader makes use of a fixed function vertex input
feature which extends float[3] vertex data on the CPU side to vec4 with a w-component 1.0
on the GPU side. Magic like this isn’t supported when reading from storage buffers (as far as
I’m aware at least).</p>
</blockquote>
<p>For vertex pulling the input vertex attributes are replaced with a flexible-array struct inside a buffer interface block.</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">sb_vertex</span> <span class="p">{</span>
<span class="kt">vec3</span> <span class="n">pos</span><span class="p">;</span>
<span class="kt">vec4</span> <span class="n">color</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">readonly</span> <span class="n">buffer</span> <span class="n">ssbo</span> <span class="p">{</span>
<span class="n">sb_vertex</span> <span class="n">vtx</span><span class="p">[];</span>
<span class="p">};</span>
</code></pre></div></div>
<blockquote>
<p>NOTE: I’m using <code class="language-plaintext highlighter-rouge">sb_vertex</code> for the struct name here because <code class="language-plaintext highlighter-rouge">vertex</code> is a reserved keyword
in the Metal Shading Language and would cause a compile error when outputting MSL.</p>
</blockquote>
<p>Don’t add a layout qualifier like <code class="language-plaintext highlighter-rouge">layout(std430, binding=0)</code> to the buffer interface block;
sokol-shdc takes care of those details.</p>
<p>The original vertex shader looks like this:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="nb">gl_Position</span> <span class="o">=</span> <span class="n">mvp</span> <span class="o">*</span> <span class="n">position</span><span class="p">;</span>
<span class="n">color</span> <span class="o">=</span> <span class="n">color0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Converted to vertex pulling it looks like this:</p>
<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">vec4</span> <span class="n">position</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">vtx</span><span class="p">[</span><span class="n">gl_VertexIndex</span><span class="p">].</span><span class="n">pos</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="nb">gl_Position</span> <span class="o">=</span> <span class="n">mvp</span> <span class="o">*</span> <span class="n">position</span><span class="p">;</span>
<span class="n">color</span> <span class="o">=</span> <span class="n">vtx</span><span class="p">[</span><span class="n">gl_VertexIndex</span><span class="p">].</span><span class="n">color</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note how <code class="language-plaintext highlighter-rouge">gl_VertexIndex</code> (not <code class="language-plaintext highlighter-rouge">gl_VertexID</code>!) is used to index into the storage buffer.
This is because sokol-shdc shaders are written in ‘Vulkan style’, not ‘GL style’.</p>
<p>We also need to expand the vec3 input pos manually to a vec4 with w-component = 1.0.</p>
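<p>To make the indexing explicit, here is a CPU-side sketch in plain C of what the modified vertex shader computes during an indexed draw. For indexed draws, <code class="language-plaintext highlighter-rouge">gl_VertexIndex</code> receives the value fetched from the index buffer, plus the base vertex offset, which is assumed to be zero here. The vec3 position is widened to a vec4 with w = 1.0 and then multiplied with the column-major <code class="language-plaintext highlighter-rouge">mvp</code> matrix. All type and function names are illustrative only. The vertex struct also omits the std430 padding for simplicity:</p>

```c
#include <stdint.h>

// Simplified vertex struct for this sketch (no std430 padding)
typedef struct { float pos[3]; float color[4]; } pulled_vertex_t;

// Outputs of one emulated vertex shader invocation
typedef struct { float pos[4]; float color[4]; } vs_output_t;

// Emulates the vertex shader above for a single gl_VertexIndex
vs_output_t emulate_vs(const float mvp[16], const pulled_vertex_t* vtx, uint32_t vertex_index) {
    const pulled_vertex_t* v = &vtx[vertex_index];
    // vec4 position = vec4(vtx[gl_VertexIndex].pos, 1.0);
    const float p[4] = { v->pos[0], v->pos[1], v->pos[2], 1.0f };
    vs_output_t out;
    // gl_Position = mvp * position; GLSL matrices are column-major: m[col*4 + row]
    for (int row = 0; row < 4; row++) {
        out.pos[row] = 0.0f;
        for (int col = 0; col < 4; col++) {
            out.pos[row] += mvp[col * 4 + row] * p[col];
        }
    }
    // color = vtx[gl_VertexIndex].color;
    for (int i = 0; i < 4; i++) {
        out.color[i] = v->color[i];
    }
    return out;
}

// Emulates an indexed draw: gl_VertexIndex is the index buffer value (base vertex 0)
void emulate_indexed_draw(const float mvp[16], const pulled_vertex_t* vtx,
                          const uint16_t* indices, int num_indices, vs_output_t* out) {
    for (int i = 0; i < num_indices; i++) {
        out[i] = emulate_vs(mvp, vtx, indices[i]);
    }
}
```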
<p>That’s all the changes needed on the shader side. Next compile the modified shader with:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sokol-shdc <span class="nt">-i</span> shader.glsl <span class="nt">-o</span> shader.h <span class="nt">-l</span> metal_macos:hlsl5:glsl430:wgsl <span class="nt">-f</span> sokol
</code></pre></div></div>
<p>Apart from the ‘traditional’ code-generation output, sokol-shdc will create two new
declarations:</p>
<ul>
<li>
<p>A define <code class="language-plaintext highlighter-rouge">#define SLOT_ssbo (0)</code>, this is the bind slot index to be used in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct</p>
</li>
<li>
<p>A C struct <code class="language-plaintext highlighter-rouge">sb_vertex_t</code> which maps the GLSL struct <code class="language-plaintext highlighter-rouge">sb_vertex</code> to the C side looking like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">SOKOL_SHDC_ALIGN</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sb_vertex_t</span> <span class="p">{</span>
<span class="kt">float</span> <span class="n">pos</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span>
<span class="kt">uint8_t</span> <span class="n">_pad_12</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
<span class="kt">float</span> <span class="n">color</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
<span class="p">}</span> <span class="n">sb_vertex_t</span><span class="p">;</span>
</code></pre></div> </div>
</li>
</ul>
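<p>The padding in the generated struct follows from the std430 rules. A vec3 has a base alignment of 16 bytes, so the following vec4 member starts at offset 16 and the array stride becomes 32 bytes. The self-contained sketch below mirrors the generated struct and checks this layout at compile time. It assumes that <code class="language-plaintext highlighter-rouge">SOKOL_SHDC_ALIGN(16)</code> amounts to a 16-byte alignment attribute, and it uses C11 <code class="language-plaintext highlighter-rouge">_Alignas</code> as a stand-in:</p>

```c
#include <stddef.h>
#include <stdint.h>

// Mirror of the sokol-shdc generated struct; _Alignas(16) on the first member
// stands in for SOKOL_SHDC_ALIGN(16) (an assumption about what the macro does).
typedef struct sb_vertex_t {
    _Alignas(16) float pos[3];   // vec3: 12 bytes, std430 base alignment is 16
    uint8_t _pad_12[4];          // explicit padding so that color starts at 16
    float color[4];              // vec4: 16 bytes at offset 16
} sb_vertex_t;

_Static_assert(offsetof(sb_vertex_t, pos) == 0, "pos must be at offset 0");
_Static_assert(offsetof(sb_vertex_t, color) == 16, "color must be at offset 16 (std430)");
_Static_assert(sizeof(sb_vertex_t) == 32, "array stride must be 32 bytes (std430)");
_Static_assert(_Alignof(sb_vertex_t) == 16, "struct must be 16-byte aligned");
```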
<blockquote>
<p>NOTE: with the right <code class="language-plaintext highlighter-rouge">@ctype</code> tags at the top of the shader we could
also map the struct members to C or C++ types, for instance with HandmadeMath.h types:</p>
</blockquote>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SOKOL_SHDC_ALIGN</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sb_vertex_t</span> <span class="p">{</span>
<span class="n">hmm_vec3</span> <span class="n">pos</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">_pad_12</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
<span class="n">hmm_vec4</span> <span class="n">color</span><span class="p">;</span>
<span class="p">}</span> <span class="n">sb_vertex_t</span><span class="p">;</span>
</code></pre></div></div>
<p>Next let’s see how the <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/cube-sapp.c">cube-sapp C code</a> needs to be changed:</p>
<p>The original code creates a vertex buffer like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">float</span> <span class="n">vertices</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span>
<span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span>
<span class="p">...</span>
<span class="p">};</span>
<span class="n">sg_buffer</span> <span class="n">vbuf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span>
<span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-vertices"</span>
<span class="p">});</span>
</code></pre></div></div>
<p>By default <code class="language-plaintext highlighter-rouge">sg_make_buffer()</code> creates a vertex buffer, so the above is
identical to the more explicit:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_buffer</span> <span class="n">vbuf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">type</span> <span class="o">=</span> <span class="n">SG_BUFFERTYPE_VERTEXBUFFER</span><span class="p">,</span>
<span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span>
<span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-vertices"</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…when changing the code to use storage buffers we can use the
code-generated <code class="language-plaintext highlighter-rouge">sb_vertex_t</code> struct to initialize the vertex data.
This has the advantage that we don’t need to care about the obscure
<code class="language-plaintext highlighter-rouge">std430</code> memory layout rules:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sb_vertex_t</span> <span class="n">vertices</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span> <span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="p">{</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="p">},</span> <span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="p">}</span> <span class="p">},</span>
<span class="p">{</span> <span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="p">},</span> <span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="p">}</span> <span class="p">},</span>
<span class="p">...</span>
<span class="p">};</span>
<span class="n">sg_buffer</span> <span class="n">sbuf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">type</span> <span class="o">=</span> <span class="n">SG_BUFFERTYPE_STORAGEBUFFER</span><span class="p">,</span>
<span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span>
<span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-vertices"</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…note how the buffer type has changed to <code class="language-plaintext highlighter-rouge">SG_BUFFERTYPE_STORAGEBUFFER</code>.</p>
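<p>Existing vertex data may only be available in the old interleaved float layout, with 3 position floats followed by 4 color floats per vertex. In that case it can be copied into the padded layout at load time instead of rewriting the initializers. The following is a minimal sketch. The function name is hypothetical, and the struct mirrors the generated <code class="language-plaintext highlighter-rouge">sb_vertex_t</code>:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Mirror of the generated sb_vertex_t (16-byte aligned, 32 bytes per item)
typedef struct sb_vertex_t {
    _Alignas(16) float pos[3];
    uint8_t _pad_12[4];
    float color[4];
} sb_vertex_t;

// Hypothetical helper: converts 'num_vertices' interleaved vertices
// (x, y, z, r, g, b, a) into the padded std430 layout expected by the shader.
void convert_interleaved_vertices(const float* src, size_t num_vertices, sb_vertex_t* dst) {
    for (size_t i = 0; i < num_vertices; i++) {
        const float* v = &src[i * 7];
        memset(&dst[i], 0, sizeof(sb_vertex_t));      // also zeroes the padding bytes
        memcpy(dst[i].pos, &v[0], 3 * sizeof(float));
        memcpy(dst[i].color, &v[3], 4 * sizeof(float));
    }
}
```

<p>The converted array would then go into <code class="language-plaintext highlighter-rouge">sg_make_buffer()</code> with <code class="language-plaintext highlighter-rouge">SG_BUFFERTYPE_STORAGEBUFFER</code> and a data size of <code class="language-plaintext highlighter-rouge">num_vertices * sizeof(sb_vertex_t)</code>.</p>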
<p>On to the <code class="language-plaintext highlighter-rouge">sg_pipeline</code> object. In the original code, a vertex layout
must be defined in the <code class="language-plaintext highlighter-rouge">sg_pipeline_desc</code> struct to configure the
fixed function vertex input stage:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state</span><span class="p">.</span><span class="n">pip</span> <span class="o">=</span> <span class="n">sg_make_pipeline</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pipeline_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="n">ATTR_vs_position</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="n">SG_VERTEXFORMAT_FLOAT3</span><span class="p">,</span>
<span class="p">[</span><span class="n">ATTR_vs_color0</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="n">SG_VERTEXFORMAT_FLOAT4</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">shader</span> <span class="o">=</span> <span class="n">shd</span><span class="p">,</span>
<span class="p">.</span><span class="n">index_type</span> <span class="o">=</span> <span class="n">SG_INDEXTYPE_UINT16</span><span class="p">,</span>
<span class="p">.</span><span class="n">cull_mode</span> <span class="o">=</span> <span class="n">SG_CULLMODE_BACK</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">write_enabled</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">compare</span> <span class="o">=</span> <span class="n">SG_COMPAREFUNC_LESS_EQUAL</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-pipeline"</span>
<span class="p">});</span>
</code></pre></div></div>
<p>When pulling vertex data from storage buffers such a vertex layout description isn’t needed, so
the pipeline creation can be simplified to this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state</span><span class="p">.</span><span class="n">pip</span> <span class="o">=</span> <span class="n">sg_make_pipeline</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pipeline_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">shader</span> <span class="o">=</span> <span class="n">shd</span><span class="p">,</span>
<span class="p">.</span><span class="n">index_type</span> <span class="o">=</span> <span class="n">SG_INDEXTYPE_UINT16</span><span class="p">,</span>
<span class="p">.</span><span class="n">cull_mode</span> <span class="o">=</span> <span class="n">SG_CULLMODE_BACK</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">write_enabled</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">compare</span> <span class="o">=</span> <span class="n">SG_COMPAREFUNC_LESS_EQUAL</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-pipeline"</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…the original <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct that’s passed into <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state</span><span class="p">.</span><span class="n">bind</span> <span class="o">=</span> <span class="p">(</span><span class="n">sg_bindings</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">vbuf</span><span class="p">,</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="n">ibuf</span>
<span class="p">};</span>
</code></pre></div></div>
<p>…is changed like this (i.e. the vertex buffer binding is replaced with a storage
buffer binding on the vertex shader stage):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state</span><span class="p">.</span><span class="n">bind</span> <span class="o">=</span> <span class="p">(</span><span class="n">sg_bindings</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="n">ibuf</span>
<span class="p">.</span><span class="n">vs</span><span class="p">.</span><span class="n">storage_buffers</span><span class="p">[</span><span class="n">SLOT_ssbo</span><span class="p">]</span> <span class="o">=</span> <span class="n">sbuf</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>
<p>…and that’s it! On the CPU side, storage buffers actually simplify a lot of code because
you don’t need a vertex layout in the <code class="language-plaintext highlighter-rouge">sg_pipeline_desc</code> struct, and you get a properly
aligned and padded C struct for the storage buffer content from sokol-shdc.</p>
<blockquote>
<p>NOTE: A ‘proper’ cross-backend sample should also check whether storage buffers are
actually supported via <code class="language-plaintext highlighter-rouge">sg_query_features().storage_buffer</code> and render some
sort of fallback.</p>
</blockquote>
<h2 id="shader-authoring-caveats">Shader Authoring Caveats</h2>
<p>Shader authoring via sokol-shdc is a bit more restricted than vanilla GLSL:</p>
<ol>
<li>A storage buffer interface block must contain exactly one item, and this
item must be a flexible struct array member. In vanilla GLSL you can have
additional ‘header items’ in front of the flexible array member, but this
turned out to be tricky to map to CPU-side non-C languages that don’t allow
flexible array members (I still need to research the various target languages a bit more; maybe
this rule can be relaxed in the future for some of them).</li>
<li>Currently the following types are valid inside a storage buffer struct:
<ul>
<li><code class="language-plaintext highlighter-rouge">bool, bvec2..4</code>: mapped to int32_t, and int32_t[2..4]</li>
<li><code class="language-plaintext highlighter-rouge">int, ivec2..4</code>: mapped to int32_t, and int32_t[2..4]</li>
<li><code class="language-plaintext highlighter-rouge">uint, uvec2..4</code>: mapped to uint32_t, and uint32_t[2..4]</li>
<li><code class="language-plaintext highlighter-rouge">float, vec2..4</code>: mapped to float and float[2..4]</li>
<li><code class="language-plaintext highlighter-rouge">matNxM</code> where N=2..4 and M=2..4: mapped to float[4..16]</li>
<li>nested structs</li>
<li>arrays of the above</li>
</ul>
</li>
</ol>
<p>Please note that only a few of those combinations have been tested, especially when it
comes to correct array item padding and alignment. If you stumble over any problems,
please write a ticket at <a href="https://github.com/floooh/sokol-tools/issues">https://github.com/floooh/sokol-tools/issues</a>.</p>
<p>To load packed vertex components from storage buffers, use the following GLSL builtins:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">vec2 unpackUnorm2x16(uint p)</code></li>
<li><code class="language-plaintext highlighter-rouge">vec2 unpackSnorm2x16(uint p)</code></li>
<li><code class="language-plaintext highlighter-rouge">vec4 unpackUnorm4x8(uint p)</code></li>
<li><code class="language-plaintext highlighter-rouge">vec4 unpackSnorm4x8(uint p)</code></li>
</ul>
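<p>On the CPU side, the packed values need to be produced with the matching bit layout. The following C sketch follows the GLSL specification of <code class="language-plaintext highlighter-rouge">packUnorm4x8</code> and <code class="language-plaintext highlighter-rouge">packUnorm2x16</code>. The first component goes into the least significant bits, and each value is clamped to [0, 1], scaled and rounded. The function names are hypothetical. In the shader, the struct member would be declared as a <code class="language-plaintext highlighter-rouge">uint</code> and decoded with the matching unpack builtin:</p>

```c
#include <stdint.h>

// Clamp to [0, 1] like GLSL clamp(c, 0.0, 1.0)
static float clamp01(float v) {
    return (v < 0.0f) ? 0.0f : ((v > 1.0f) ? 1.0f : v);
}

// CPU-side counterpart to GLSL unpackUnorm4x8(): component 0 goes into bits 0..7
uint32_t pack_unorm4x8(float x, float y, float z, float w) {
    const float c[4] = { x, y, z, w };
    uint32_t packed = 0;
    for (int i = 0; i < 4; i++) {
        // round-to-nearest of a non-negative value
        uint32_t b = (uint32_t)(clamp01(c[i]) * 255.0f + 0.5f);
        packed |= b << (8 * i);
    }
    return packed;
}

// CPU-side counterpart to GLSL unpackUnorm2x16(): component 0 goes into bits 0..15
uint32_t pack_unorm2x16(float x, float y) {
    uint32_t lo = (uint32_t)(clamp01(x) * 65535.0f + 0.5f);
    uint32_t hi = (uint32_t)(clamp01(y) * 65535.0f + 0.5f);
    return lo | (hi << 16);
}
```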
<h2 id="under-the-hood">Under the hood</h2>
<blockquote>
<p>NOTE: the following information about shader bind slots is only relevant if
you do not use the sokol shader compiler (sokol-shdc), but instead pass ‘raw’
HLSL, MSL, GLSL or WGSL shaders into sokol_gfx.h. Also, this information will
become obsolete with another future update I have in mind, which will allow
more flexibility when mapping sokol-gfx bind slots to backend 3D API bind slots
(see this planning ticket for more info: <a href="https://github.com/floooh/sokol/issues/1037">#1037</a>)</p>
</blockquote>
<h3 id="metal">Metal</h3>
<p>On Metal there is no ‘buffer zoo’ like in other 3D APIs: uniform-, vertex-,
index- and storage-buffers are all the same thing. The vertex-
and fragment-shader stages have their own buffer bind slot spaces though.</p>
<p>The following bind slot ranges are used for the various sokol-gfx
buffer types:</p>
<ul>
<li>on the vertex shader stage:
<ul>
<li><strong><code class="language-plaintext highlighter-rouge">slots 0..3</code></strong> for uniform buffer bindings (sokol-gfx internally
manages an uniform buffer which might be bound at up to four different
offsets)</li>
<li><strong><code class="language-plaintext highlighter-rouge">slots 4..11</code></strong> for vertex buffer bindings</li>
<li><strong><code class="language-plaintext highlighter-rouge">slots 12..19</code></strong> for storage buffer bindings</li>
</ul>
</li>
<li>on the fragment shader stage:
<ul>
<li><strong><code class="language-plaintext highlighter-rouge">slots 0..3</code></strong> for uniform buffer bindings</li>
<li><strong><code class="language-plaintext highlighter-rouge">slots 4..11</code></strong> for storage buffer bindings</li>
</ul>
</li>
</ul>
<p>When authoring Metal shaders directly you’ll need to use the above bind slots
(also see the low-level <a href="https://github.com/floooh/sokol-samples/tree/master/metal">Metal backend samples</a>).</p>
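<p>In code, the Metal mapping from a sokol-gfx storage buffer slot to a Metal buffer index is a fixed offset per shader stage. The hypothetical helper below is not part of sokol_gfx.h. It simply encodes the ranges listed above:</p>

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical helper encoding the Metal bind slot ranges listed above:
// vertex stage storage buffers use [[buffer(12..19)]],
// fragment stage storage buffers use [[buffer(4..11)]].
int metal_storage_buffer_index(bool vertex_stage, int sbuf_slot) {
    assert((sbuf_slot >= 0) && (sbuf_slot < 8));   // 8 storage buffer slots per stage
    return vertex_stage ? (12 + sbuf_slot) : (4 + sbuf_slot);
}
```

<p>For example, a vertex shader that reads the storage buffer at sokol-gfx slot 0 would declare it as <code class="language-plaintext highlighter-rouge">const device sb_vertex* vtx [[buffer(12)]]</code>.</p>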
<h3 id="d3d11">D3D11</h3>
<p>On D3D11, so-called <a href="https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-resources-intro#raw-views-of-buffers"><em>Byte Address
Buffers</em></a>
are used for storage buffers, which makes using them directly in manually written
HLSL a bit awkward (this is not an issue when using sokol-shdc).</p>
<p>If this turns out to be a problem, I might add D3D11-specific creation flags to
<code class="language-plaintext highlighter-rouge">sg_buffer_desc</code> to allow using different D3D11 buffer and buffer-view types
under the hood. Details like this might also change again once compute shader
support is added.</p>
<p>In D3D11/HLSL, storage buffers share a bind slot range with texture bindings, which is
why sokol-gfx defines the following bind ranges for textures and storage buffers in
HLSL:</p>
<ul>
<li><strong><code class="language-plaintext highlighter-rouge">register(t0..t15)</code></strong>: reserved for texture bindings</li>
<li><strong><code class="language-plaintext highlighter-rouge">register(t16..t23)</code></strong>: reserved for storage buffer bindings</li>
</ul>
<p>Also see the low-level <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">D3D11 backend samples</a> for details.</p>
<h3 id="webgpu">WebGPU</h3>
<p>Storage buffers are created with <code class="language-plaintext highlighter-rouge">WGPUBufferUsage_Storage</code>. WebGPU uses a common bind slot
space across all shader resource types and shader stages. Sokol-gfx reserves the following bind
slot ranges for the different shader stages and resource types; use those when feeding manually
written WGSL shaders into sokol-gfx:</p>
<ul>
<li>vertex shader stage:
<ul>
<li>textures: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(0..15)</code></strong></li>
<li>samplers: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(16..31)</code></strong></li>
<li>storage buffers: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(32..47)</code></strong></li>
</ul>
</li>
<li>fragment shader stage:
<ul>
<li>textures: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(48..63)</code></strong></li>
<li>samplers: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(64..79)</code></strong></li>
<li>storage buffers: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(80..95)</code></strong></li>
</ul>
</li>
</ul>
<p>Also see the low-level <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">WebGPU backend samples</a> for details</p>
<h3 id="gl">GL</h3>
<p>In GL, storage buffers are bound to the <code class="language-plaintext highlighter-rouge">GL_SHADER_STORAGE_BUFFER</code> target. Sokol-gfx
does not look up GLSL storage buffer interface blocks by name, but instead expects that
the GLSL code that’s passed into <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> uses a <code class="language-plaintext highlighter-rouge">layout(std430, binding=N)</code>
annotation to define the bind slot.</p>
<p>The vertex- and fragment-shader stage use a common bind space:</p>
<ul>
<li>on the vertex shader stage, use <strong><code class="language-plaintext highlighter-rouge">binding 0..7</code></strong></li>
<li>on the fragment shader stage, use <strong><code class="language-plaintext highlighter-rouge">binding 8..15</code></strong></li>
</ul>
<p>Also see the low-level <a href="https://github.com/floooh/sokol-samples/tree/storage-buffers/glfw">desktop GL backend samples</a> for details.</p>
<h2 id="sokol-shdc-updates">sokol-shdc updates</h2>
<p>Sokol-shdc has been massively refactored, mainly with the goal to have a more
robust base for extracting reflection information from shaders and a more
‘structured’ approach to code generation so that supporting additional CPU-side
languages will be easier in the future (I’m not yet sure if that last goal was actually achieved though, but time will tell).</p>
<p>Unfortunately this massive refactoring also means that there’s a possibility that new
bugs have sneaked in. If you notice anything weird, please write tickets here:</p>
<p><a href="https://github.com/floooh/sokol-tools/issues">https://github.com/floooh/sokol-tools/issues</a>.</p>
<p>A couple of unrelated lingering bugs have been fixed as well:</p>
<ul>
<li>C++ exceptions are now enabled and exceptions coming out of SPIRVCross are now caught
and turned into proper error messages. Previously sokol-shdc would simply appear to
crash if SPIRVCross emitted an error (because without C++ exceptions enabled,
those errors would be turned into a panic which looks like a segfault).</li>
<li>Error and warning line numbers had been off by a couple of lines recently.
This has been fixed and error messages now point to the correct line again.</li>
<li>A couple of somewhat esoteric code generation bugs in non-C code generators
were fixed (but as I said, it’s also quite likely that I have introduced new bugs
in that area, since code generators were completely rewritten)</li>
</ul>
<h2 id="whats-next">What’s next:</h2>
<p>In short:</p>
<ul>
<li>A resource binding cleanup (see
<a href="https://github.com/floooh/sokol/issues/1037">#1037</a>), the main motivation
for this is that the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct is growing quite large and would
grow even larger if a new compute shader stage is added. Furthermore, the artificial
separation of shader stages when binding resources also doesn’t map
particularly well to some modern 3D APIs.</li>
<li>After that it’s finally time to tackle compute shaders. For this I need to
come up with a resource synchronization strategy, but I will most likely
just copy what WebGPU does.</li>
</ul>
<p>But first I will probably take a little break and dabble a bit with Zig and emulator coding :)</p>
Mon, 06 May 2024 00:00:00 +0000
https://floooh.github.io/2024/05/06/sokol-storage-buffers.html
https://floooh.github.io/2024/05/06/sokol-storage-buffers.htmlUpcoming Sokol header API changes (Feb 2024)<p>In a couple of days I will merge the first big API update of 2024 for sokol_gfx.h (with some
related changes in sokol_app.h, sokol_glue.h and sokol_gfx_imgui.h).</p>
<blockquote>
<p>NOTE: most links to code examples will only point to the right code after <a href="https://github.com/floooh/sokol/pull/985">PR #985</a> has been merged!</p>
</blockquote>
<p>The API update in sokol_gfx.h is a <strong>BREAKING CHANGE</strong> for all code, but for most use cases
the required changes are fairly minimal.</p>
<p>Apologies for the broken syntax highlighting, apparently <a href="https://github.com/rouge-ruby/rouge">Rouge</a> doesn’t understand C99.</p>
<h2 id="table-of-contents">Table of Contents</h2>
<ul id="markdown-toc">
<li><a href="#table-of-contents" id="markdown-toc-table-of-contents">Table of Contents</a></li>
<li><a href="#overview-and-motivation" id="markdown-toc-overview-and-motivation">Overview and Motivation</a></li>
<li><a href="#detailed-change-list" id="markdown-toc-detailed-change-list">Detailed change list</a> <ul>
<li><a href="#sokol_gfxh" id="markdown-toc-sokol_gfxh">sokol_gfx.h</a></li>
<li><a href="#sokol_apph" id="markdown-toc-sokol_apph">sokol_app.h</a></li>
<li><a href="#sokol_glueh" id="markdown-toc-sokol_glueh">sokol_glue.h</a></li>
<li><a href="#sokol_gfx_imguih" id="markdown-toc-sokol_gfx_imguih">sokol_gfx_imgui.h</a></li>
</ul>
</li>
<li><a href="#link-collection-with-example-code-changes" id="markdown-toc-link-collection-with-example-code-changes">Link collection with example code changes</a></li>
<li><a href="#detailed-change-recipes" id="markdown-toc-detailed-change-recipes">Detailed Change Recipes</a> <ul>
<li><a href="#for-sokol_gfxh--sokol_apph--sokol_glueh" id="markdown-toc-for-sokol_gfxh--sokol_apph--sokol_glueh">…for sokol_gfx.h + sokol_app.h + sokol_glue.h</a></li>
<li><a href="#for-offscreen-render-passes" id="markdown-toc-for-offscreen-render-passes">…for offscreen render passes</a></li>
<li><a href="#for-custom-window-system-glue" id="markdown-toc-for-custom-window-system-glue">…for custom window system glue</a> <ul>
<li><a href="#using-d3d11" id="markdown-toc-using-d3d11">…using D3D11</a></li>
<li><a href="#using-metal" id="markdown-toc-using-metal">…using Metal</a></li>
<li><a href="#using-webgpu" id="markdown-toc-using-webgpu">…using WebGPU</a></li>
<li><a href="#gl-with-glfw" id="markdown-toc-gl-with-glfw">…GL with GLFW</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#q-why-still-have-a-baked-pass-attachments-object" id="markdown-toc-q-why-still-have-a-baked-pass-attachments-object">Q: Why still have a baked pass attachments object?</a></li>
</ul>
<h2 id="overview-and-motivation">Overview and Motivation</h2>
<p>The general topic of this update is a cleanup of the sokol-gfx render pass
functions and how external swapchain information is passed into sokol-gfx.</p>
<p>Previously there was a special ‘default render pass’ into a ‘default framebuffer’,
and the concept of ‘contexts’ to allow switching between different rendering contexts
and their default framebuffers (very similar to traditional OpenGL contexts,
and in fact this old behavior only ever matched OpenGL, not the other backend
APIs).</p>
<p>This setup was needlessly complicated for people who want to use sokol-gfx
to render into multiple windows, leading to planning <a href="https://github.com/floooh/sokol/issues/904">ticket #904</a>,
and then to <a href="https://github.com/floooh/sokol/pull/985">PR #985</a>.</p>
<p>The gist is:</p>
<ul>
<li>There is now only a single ‘unified’ <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> function which covers
both rendering into sokol-gfx render target textures (aka ‘offscreen passes’)
and externally managed ‘swapchains’ (aka ‘swapchain passes’).</li>
<li>The entire concept of <code class="language-plaintext highlighter-rouge">contexts</code> has been removed from sokol_gfx.h.</li>
<li>External swapchain properties are now passed directly into <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code>
in a transient structure.</li>
</ul>
<p>Instead of having a special and unique ‘default-render-pass’ per frame and context,
an application can now simply call <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> multiple times per frame,
each time with properties for a different swapchain, and all that without having to
create ‘context objects’ upfront or ‘switching contexts’.</p>
<p>Most simple applications that don’t render into offscreen passes and
use sokol_gfx.h together with sokol_app.h and sokol_glue.h only need to change
two calls: <code class="language-plaintext highlighter-rouge">sg_setup()</code> and <code class="language-plaintext highlighter-rouge">sg_begin_default_pass()</code>. For other situations,
please check the ‘Detailed Change Recipes’ section further down.</p>
<p>In addition to this blog post, please also re-read the documentation headers
in sokol_gfx.h and sokol_app.h, and specifically the struct documentation
for the new sokol-gfx structs <code class="language-plaintext highlighter-rouge">sg_environment</code> and <code class="language-plaintext highlighter-rouge">sg_swapchain</code>.</p>
<h2 id="detailed-change-list">Detailed change list</h2>
<h3 id="sokol_gfxh">sokol_gfx.h</h3>
<p>The following public API structs and functions have been <strong>removed</strong>:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sg_begin_default_pass()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_begin_default_passf()</code></li>
<li><code class="language-plaintext highlighter-rouge">struct sg_context_desc</code></li>
<li><code class="language-plaintext highlighter-rouge">struct sg_context</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_setup_context()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_activate_context()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_discard_context()</code></li>
</ul>
<p>The following top-level structs have been <strong>added</strong>:</p>
<ul>
<li>
<p><code class="language-plaintext highlighter-rouge">struct sg_environment</code>: this is passed as a nested struct of <code class="language-plaintext highlighter-rouge">sg_desc</code> into
the <code class="language-plaintext highlighter-rouge">sg_setup()</code> call to provide information about the environment sokol-gfx
runs in (most importantly 3D API device pointers).</p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">struct sg_swapchain</code>: this is passed into <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> for render passes
which should render into an externally managed swapchain. The struct contains the
following information:</p>
<ul>
<li>the pixel format of the swapchain’s rendering surface</li>
<li>the pixel format of the optional depth/stencil surface</li>
<li>an MSAA sample count</li>
<li>3D backend specific resource handles, like D3D11/WebGPU texture views, Metal drawables,
or GL framebuffers</li>
</ul>
</li>
</ul>
<p>The resource handle type <code class="language-plaintext highlighter-rouge">sg_pass</code> has been <strong>renamed</strong> to <code class="language-plaintext highlighter-rouge">sg_attachments</code> (to
free up the name for another purpose), which also causes the following related renames:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sg_pass</code> => <code class="language-plaintext highlighter-rouge">sg_attachments</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_pass_desc</code> => <code class="language-plaintext highlighter-rouge">sg_attachments_desc</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_pass_info</code> => <code class="language-plaintext highlighter-rouge">sg_attachments_info</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_make_pass()</code> => <code class="language-plaintext highlighter-rouge">sg_make_attachments()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_destroy_pass()</code> => <code class="language-plaintext highlighter-rouge">sg_destroy_attachments()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_query_pass_state()</code> => <code class="language-plaintext highlighter-rouge">sg_query_attachments_state()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_query_pass_info()</code> => <code class="language-plaintext highlighter-rouge">sg_query_attachments_info()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_query_pass_desc()</code> => <code class="language-plaintext highlighter-rouge">sg_query_attachments_desc()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_alloc_pass()</code> => <code class="language-plaintext highlighter-rouge">sg_alloc_attachments()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_dealloc_pass()</code> => <code class="language-plaintext highlighter-rouge">sg_dealloc_attachments()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_init_pass()</code> => <code class="language-plaintext highlighter-rouge">sg_init_attachments()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_fail_pass()</code> => <code class="language-plaintext highlighter-rouge">sg_fail_attachments()</code></li>
<li><code class="language-plaintext highlighter-rouge">sg_[*]_pass_info()</code> => <code class="language-plaintext highlighter-rouge">sg_[*]_attachments_info()</code> (where ‘*’ is one of ‘d3d11’, ‘gl’, ‘metal’ or ‘wgpu’)</li>
</ul>
<p>Inside the <code class="language-plaintext highlighter-rouge">sg_attachments_desc</code> struct there has been some renaming to reduce redundancy:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">.color_attachments[]</code> => <code class="language-plaintext highlighter-rouge">.colors[]</code></li>
<li><code class="language-plaintext highlighter-rouge">.resolve_attachments[]</code> => <code class="language-plaintext highlighter-rouge">.resolves[]</code></li>
<li><code class="language-plaintext highlighter-rouge">.depth_stencil_attachment</code> => <code class="language-plaintext highlighter-rouge">.depth_stencil</code></li>
</ul>
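<p>If you have a lot of code to port, the renames above can also be kept in a small lookup table, for instance to drive a search-and-replace script. The following is a minimal sketch under that assumption; <code class="language-plaintext highlighter-rouge">rename_entry</code>, <code class="language-plaintext highlighter-rouge">rename_table</code> and <code class="language-plaintext highlighter-rouge">migrated_name()</code> are hypothetical helpers for illustration, not part of any sokol header, and the backend-specific <code class="language-plaintext highlighter-rouge">sg_[*]_pass_info()</code> variants are left out:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical migration aid: old identifier => new identifier,
   copied from the rename lists above. Not part of sokol itself. */
typedef struct {
    const char* old_name;
    const char* new_name;
} rename_entry;

static const rename_entry rename_table[] = {
    { "sg_pass_desc",             "sg_attachments_desc" },
    { "sg_pass_info",             "sg_attachments_info" },
    { "sg_make_pass",             "sg_make_attachments" },
    { "sg_destroy_pass",          "sg_destroy_attachments" },
    { "sg_query_pass_state",      "sg_query_attachments_state" },
    { "sg_query_pass_info",       "sg_query_attachments_info" },
    { "sg_query_pass_desc",       "sg_query_attachments_desc" },
    { "sg_alloc_pass",            "sg_alloc_attachments" },
    { "sg_dealloc_pass",          "sg_dealloc_attachments" },
    { "sg_init_pass",             "sg_init_attachments" },
    { "sg_fail_pass",             "sg_fail_attachments" },
    /* sg_attachments_desc struct members */
    { "color_attachments",        "colors" },
    { "resolve_attachments",      "resolves" },
    { "depth_stencil_attachment", "depth_stencil" },
};

/* Returns the new name for an old identifier (exact match only),
   or NULL if the identifier was not renamed. Plain "sg_pass" is
   deliberately missing: the name still exists, but now means the
   sg_begin_pass() parameter struct, so it needs a manual review. */
const char* migrated_name(const char* old_name) {
    assert(old_name != NULL);
    const size_t count = sizeof(rename_table) / sizeof(rename_table[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(rename_table[i].old_name, old_name) == 0) {
            return rename_table[i].new_name;
        }
    }
    return NULL;
}
```

<p>Matching on whole identifiers instead of substrings matters here: names like <code class="language-plaintext highlighter-rouge">sg_pass_action</code> and <code class="language-plaintext highlighter-rouge">sg_begin_pass</code> keep their names, and plain <code class="language-plaintext highlighter-rouge">sg_pass</code> now names the new <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> parameter struct, so those uses have to be checked by hand.</p>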
<p>The typename <code class="language-plaintext highlighter-rouge">sg_pass</code> has been repurposed to serve as the <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> parameter,
i.e. the begin-pass function signature now looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">sg_begin_pass</span><span class="p">(</span><span class="k">const</span> <span class="n">sg_pass</span><span class="o">*</span> <span class="n">pass</span><span class="p">);</span>
</code></pre></div></div>
<p>The struct <code class="language-plaintext highlighter-rouge">sg_pass</code> now looks like this (start/end canaries omitted):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_pass</span> <span class="p">{</span>
<span class="n">sg_pass_action</span> <span class="n">action</span><span class="p">;</span>
<span class="n">sg_attachments</span> <span class="n">attachments</span><span class="p">;</span>
<span class="n">sg_swapchain</span> <span class="n">swapchain</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">label</span><span class="p">;</span>
<span class="p">}</span> <span class="n">sg_pass</span><span class="p">;</span>
</code></pre></div></div>
<p>For an ‘offscreen-render-pass’, an <code class="language-plaintext highlighter-rouge">.attachments</code> item must be provided, but no
<code class="language-plaintext highlighter-rouge">.swapchain</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span>
<span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span>
<span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="n">attachments</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…and for a ‘swapchain-render-pass’, a <code class="language-plaintext highlighter-rouge">.swapchain</code> item must be provided, but no
<code class="language-plaintext highlighter-rouge">.attachments</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span>
<span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span>
<span class="p">.</span><span class="n">swapchain</span> <span class="o">=</span> <span class="n">sglue_swapchain</span><span class="p">(),</span>
<span class="p">});</span>
</code></pre></div></div>
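<p>The rule of providing either <code class="language-plaintext highlighter-rouge">.attachments</code> or <code class="language-plaintext highlighter-rouge">.swapchain</code>, but not both, can be summarized in a few lines of C. This is a simplified, hypothetical model for illustration only, not how sokol_gfx.h implements its validation: <code class="language-plaintext highlighter-rouge">attachments_id</code> stands in for the <code class="language-plaintext highlighter-rouge">.id</code> of the <code class="language-plaintext highlighter-rouge">sg_attachments</code> handle (zero meaning ‘no object’), and <code class="language-plaintext highlighter-rouge">has_swapchain</code> stands in for ‘any <code class="language-plaintext highlighter-rouge">sg_swapchain</code> field is set’.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of an sg_begin_pass() argument. */
typedef enum {
    PASS_KIND_INVALID,
    PASS_KIND_OFFSCREEN,
    PASS_KIND_SWAPCHAIN,
} pass_kind;

/* attachments_id: stand-in for sg_attachments.id (0 = no attachments object)
   has_swapchain:  stand-in for "the .swapchain item was provided" */
pass_kind classify_pass(uint32_t attachments_id, bool has_swapchain) {
    const bool has_attachments = (attachments_id != 0);
    if (has_attachments && !has_swapchain) {
        return PASS_KIND_OFFSCREEN;   /* render into sokol-gfx render targets */
    }
    if (!has_attachments && has_swapchain) {
        return PASS_KIND_SWAPCHAIN;   /* render into an external swapchain */
    }
    return PASS_KIND_INVALID;         /* both or neither were provided */
}
```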
<p>Other unrelated ‘drive-by-changes’ in sokol_gfx.h:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sg_limits.gl_max_vertex_uniform_vectors</code> has been replaced with <code class="language-plaintext highlighter-rouge">sg_limits.gl_max_vertex_uniform_components</code>
(see <a href="https://github.com/floooh/sokol/issues/714">#714</a>)</li>
<li>the start and end canaries in <code class="language-plaintext highlighter-rouge">sg_pass_action</code> have been removed (since <code class="language-plaintext highlighter-rouge">sg_pass_action</code> is now a nested
struct of <code class="language-plaintext highlighter-rouge">sg_pass</code>, the canaries are redundant)</li>
<li>a new initialization config item <code class="language-plaintext highlighter-rouge">sg_desc.mtl_use_command_buffer_with_retained_references</code> has been added
(see <a href="https://github.com/floooh/sokol/issues/981">#981</a>)</li>
</ul>
<h3 id="sokol_apph">sokol_app.h</h3>
<p>The following public API function has been removed:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sapp_metal_get_renderpass_descriptor()</code></li>
</ul>
<p>The following functions have been renamed:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sapp_metal_get_drawable()</code> => <code class="language-plaintext highlighter-rouge">sapp_metal_get_current_drawable()</code></li>
<li><code class="language-plaintext highlighter-rouge">sapp_d3d11_get_render_target_view()</code> => <code class="language-plaintext highlighter-rouge">sapp_d3d11_get_render_view()</code></li>
</ul>
<p>…and the following functions are new:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sapp_metal_get_depth_stencil_texture()</code></li>
<li><code class="language-plaintext highlighter-rouge">sapp_metal_get_msaa_color_texture()</code></li>
<li><code class="language-plaintext highlighter-rouge">sapp_d3d11_get_resolve_view()</code></li>
<li><code class="language-plaintext highlighter-rouge">sapp_gl_get_framebuffer()</code></li>
</ul>
<p>These functions plug directly into the new <code class="language-plaintext highlighter-rouge">sg_swapchain</code> struct in sokol_gfx.h.</p>
<h3 id="sokol_glueh">sokol_glue.h</h3>
<p>sokol_glue.h is now a regular library header without the ‘preprocessor magic’
which created a different API depending on what other sokol headers had been
included before sokol_glue.h (this was an ‘interesting’ but ultimately
pretty stupid idea).</p>
<p>The API prefix has changed from a somewhat confusing <code class="language-plaintext highlighter-rouge">sapp_</code> to the expected <code class="language-plaintext highlighter-rouge">sglue_</code>.</p>
<p>The old function <code class="language-plaintext highlighter-rouge">sapp_sgcontext()</code> has been split into two new functions:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">sglue_environment()</code> which plugs directly into <code class="language-plaintext highlighter-rouge">sg_desc.environment</code>, and…</li>
<li><code class="language-plaintext highlighter-rouge">sglue_swapchain()</code> which plugs into <code class="language-plaintext highlighter-rouge">sg_pass.swapchain</code></li>
</ul>
<p>Note that <code class="language-plaintext highlighter-rouge">sglue_swapchain()</code> may return different values each frame depending
on the 3D API backend.</p>
<h3 id="sokol_gfx_imguih">sokol_gfx_imgui.h</h3>
<p>In a similar vein, the public API prefix of sokol_gfx_imgui.h has been changed from the
weird ‘double prefix’ <code class="language-plaintext highlighter-rouge">sg_imgui_</code> to a more conventional <code class="language-plaintext highlighter-rouge">sgimgui_</code>.</p>
<p>Apart from this publicly visible change, all the internals have been updated to reflect
the sokol-gfx API changes.</p>
<h2 id="link-collection-with-example-code-changes">Link collection with example code changes</h2>
<p>If you use sokol_gfx.h + sokol_app.h + sokol_glue.h, check out the updated samples
here (first click on a sample, and then on the ‘src’ link at the bottom):</p>
<ul>
<li><a href="https://floooh.github.io/sokol-html5/">sokol samples</a></li>
</ul>
<p>Specifically look at <a href="https://floooh.github.io/sokol-html5/clear-sapp.html">clear-sapp</a>
for the simple case of only rendering to a default framebuffer, and
<a href="https://floooh.github.io/sokol-html5/offscreen-sapp.html">offscreen-sapp</a> for
rendering to an offscreen render target.</p>
<p>If you use sokol_gfx.h with your own window system glue, or a library like GLFW or SDL,
check out the updated backend specific examples:</p>
<ul>
<li>for D3D11: <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">https://github.com/floooh/sokol-samples/tree/master/d3d11</a></li>
<li>for Metal: <a href="https://github.com/floooh/sokol-samples/tree/master/metal">https://github.com/floooh/sokol-samples/tree/master/metal</a></li>
<li>for GL with GLFW: <a href="https://github.com/floooh/sokol-samples/tree/master/glfw">https://github.com/floooh/sokol-samples/tree/master/glfw</a></li>
<li>for WebGL2: <a href="https://github.com/floooh/sokol-samples/tree/master/html5">https://github.com/floooh/sokol-samples/tree/master/html5</a></li>
<li>for WebGPU: <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">https://github.com/floooh/sokol-samples/tree/master/wgpu</a></li>
</ul>
<p>The GLFW subdirectory also contains an updated <code class="language-plaintext highlighter-rouge">multiwindow-glfw</code> sample, and
a <code class="language-plaintext highlighter-rouge">metal-glfw</code> sample which demonstrates how to use GLFW in NO_API mode together
with the sokol_gfx.h Metal backend.</p>
<p>Also please be aware of the following behaviour and expectation changes if you
are using your own window system glue:</p>
<ul>
<li>
<p>For <strong>D3D11/DXGI</strong> the MSAA resolve operation is now performed in <code class="language-plaintext highlighter-rouge">sg_end_pass()</code>,
previously this was expected to be performed in the window system glue before
presentation.</p>
</li>
<li>
<p>For <strong>Metal</strong> it is now expected that the window system glue provides
a <code class="language-plaintext highlighter-rouge">CAMetalDrawable</code> and optional <code class="language-plaintext highlighter-rouge">MTLTexture</code> objects instead of an
<code class="language-plaintext highlighter-rouge">MTLRenderPassDescriptor</code>. This was also done to better ‘harmonize’
with the other backends (it’s just as easy to get those individual
objects from an <code class="language-plaintext highlighter-rouge">MTKView</code> as it is to get the <code class="language-plaintext highlighter-rouge">MTLRenderPassDescriptor</code>).</p>
</li>
<li>
<p>For <strong>GL</strong>, sokol-gfx now expects that <em>all</em> rendering goes through a single
GL context. This may require changes to existing code which renders into
multiple windows (for instance in GLFW, every window has its own GL context).
Refer to the new
<a href="https://github.com/floooh/sokol-samples/blob/master/glfw/multiwindow-glfw.c">multiwindow-glfw.c</a>
example for a possible solution.</p>
</li>
</ul>
<p>Additionally, check out the following PRs for required changes in my toy
projects:</p>
<ul>
<li><a href="https://github.com/floooh/pacman.c/pull/12">pacman.c</a></li>
<li><a href="https://github.com/floooh/doom-sokol/pull/1">Doom on Sokol</a></li>
<li><a href="https://github.com/floooh/v6502r/pull/24/files">Visual 6502 Remix</a></li>
<li><a href="https://github.com/floooh/qoiview/pull/10">qoiview</a></li>
<li><a href="https://github.com/floooh/chips-test/pull/33">chips</a></li>
</ul>
<p>When using the language bindings, check out the following PRs:</p>
<ul>
<li><a href="https://github.com/floooh/sokol-zig/pull/57/files">sokol-zig</a></li>
<li><a href="https://github.com/floooh/sokol-odin/pull/8">sokol-odin</a></li>
<li><a href="https://github.com/floooh/sokol-nim/pull/28">sokol-nim</a></li>
<li><a href="https://github.com/floooh/sokol-rust/pull/22">sokol-rust</a></li>
<li><a href="https://github.com/floooh/pacman.zig/pull/23">pacman.zig</a></li>
<li><a href="https://github.com/floooh/kc85.zig/pull/4">kc85.zig</a></li>
</ul>
<h2 id="detailed-change-recipes">Detailed Change Recipes</h2>
<h3 id="for-sokol_gfxh--sokol_apph--sokol_glueh">…for sokol_gfx.h + sokol_app.h + sokol_glue.h</h3>
<p>When using sokol_gfx.h together with sokol_app.h and sokol_glue.h…</p>
<p>…change your <code class="language-plaintext highlighter-rouge">sg_setup()</code> call from this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_setup</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">context</span> <span class="o">=</span> <span class="n">sapp_sgcontext</span><span class="p">(),</span>
<span class="p">.</span><span class="n">logger</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">slog_func</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…to this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_setup</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">environment</span> <span class="o">=</span> <span class="n">sglue_environment</span><span class="p">(),</span>
<span class="p">.</span><span class="n">logger</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">slog_func</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Change the <code class="language-plaintext highlighter-rouge">sg_begin_default_pass()</code> call from this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_default_pass</span><span class="p">(</span><span class="o">&</span><span class="n">pass_action</span><span class="p">,</span> <span class="n">sapp_width</span><span class="p">(),</span> <span class="n">sapp_height</span><span class="p">());</span>
</code></pre></div></div>
<p>…to this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span>
<span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span>
<span class="p">.</span><span class="n">swapchain</span> <span class="o">=</span> <span class="n">sglue_swapchain</span><span class="p">()</span>
<span class="p">});</span>
</code></pre></div></div>
<h3 id="for-offscreen-render-passes">…for offscreen render passes</h3>
<p>Change <code class="language-plaintext highlighter-rouge">sg_make_pass()</code> calls from this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_pass</span> <span class="n">pass</span> <span class="o">=</span> <span class="n">sg_make_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">color_attachments</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span>
<span class="p">.</span><span class="n">resolve_attachments</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">resolve_img</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_stencil_attachment</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">depth_img</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>…to this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_attachments</span> <span class="n">attachments</span> <span class="o">=</span> <span class="n">sg_make_attachments</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_attachments_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span>
<span class="p">.</span><span class="n">resolves</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">resolve_img</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_stencil</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">depth_img</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Change <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> calls from this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="n">pass</span><span class="p">,</span> <span class="o">&</span><span class="n">pass_action</span><span class="p">);</span>
</code></pre></div></div>
<p>…to this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span>
<span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span>
<span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="n">attachments</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div></div>
<h3 id="for-custom-window-system-glue">…for custom window system glue</h3>
<p>Create two helper functions, one which returns an initialized <code class="language-plaintext highlighter-rouge">sg_environment</code>
struct and one which returns an initialized <code class="language-plaintext highlighter-rouge">sg_swapchain</code> struct. The following
examples show what these functions might look like for the different backend 3D APIs.</p>
<h4 id="using-d3d11">…using D3D11</h4>
<p>Example implementations:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_environment</span> <span class="nf">d3d11_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sg_environment</span><span class="p">){</span>
<span class="p">.</span><span class="n">defaults</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_BGRA8</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span>
<span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">d3d11</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">device</span> <span class="o">=</span> <span class="n">d3d11_device</span><span class="p">,</span> <span class="c1">// ID3D11Device*</span>
<span class="p">.</span><span class="n">device_context</span> <span class="o">=</span> <span class="n">d3d11_device_context</span><span class="p">,</span> <span class="c1">// ID3D11DeviceContext*</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">.defaults.color_format</code>, <code class="language-plaintext highlighter-rouge">.defaults.depth_format</code> and <code class="language-plaintext highlighter-rouge">.defaults.sample_count</code>
should match the ‘most common’ swapchain surface properties. These values
will be used as fallbacks for zero-initialized fields in various
sokol-gfx calls. <code class="language-plaintext highlighter-rouge">.defaults.depth_format</code> can also be <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_NONE</code> if no
depth-buffer exists, or <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH</code> if no stencil buffer is used.</p>
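<p>The ‘zero means use the default’ convention described above can be sketched as follows. The pixel format enum and struct are hypothetical stand-ins, not the sokol_gfx.h types. The sketch assumes that the zero value of the enum means ‘not set’ while ‘no surface’ is a separate, explicit value, which is what allows an explicitly requested NONE depth format to survive the defaulting step instead of being overwritten.</p>

```c
#include <assert.h>

/* Hypothetical stand-in pixel format enum: 0 means "not set, use default",
   PIXFMT_NONE is an explicit "no surface" value. */
typedef enum {
    PIXFMT_DEFAULT = 0,
    PIXFMT_NONE,
    PIXFMT_BGRA8,
    PIXFMT_DEPTH,
    PIXFMT_DEPTH_STENCIL,
} pixfmt;

typedef struct {
    pixfmt color_format;
    pixfmt depth_format;
    int sample_count;
} surface_desc;

/* Replace zero-initialized fields of 'desc' with the environment defaults,
   leaving explicitly set fields (including PIXFMT_NONE) untouched. */
surface_desc apply_defaults(surface_desc desc, surface_desc defaults) {
    assert(defaults.sample_count > 0);
    if (desc.color_format == PIXFMT_DEFAULT) {
        desc.color_format = defaults.color_format;
    }
    if (desc.depth_format == PIXFMT_DEFAULT) {
        desc.depth_format = defaults.depth_format;
    }
    if (desc.sample_count == 0) {
        desc.sample_count = defaults.sample_count;
    }
    return desc;
}
```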
<p>The associated DXGI depth-stencil-view pixel formats are:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH_STENCIL</code> => <code class="language-plaintext highlighter-rouge">DXGI_FORMAT_D24_UNORM_S8_UINT</code></li>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH</code> => <code class="language-plaintext highlighter-rouge">DXGI_FORMAT_D32_FLOAT</code></li>
</ul>
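<p>In custom D3D11 glue code, this mapping is needed when creating the depth-stencil view (and the texture backing it). A minimal sketch with a hypothetical <code class="language-plaintext highlighter-rouge">depth_fmt</code> enum standing in for the sokol-gfx pixel formats; it returns the DXGI format names as strings so that it compiles without the Windows SDK, while real glue code would return <code class="language-plaintext highlighter-rouge">DXGI_FORMAT</code> values instead:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the depth formats discussed above. */
typedef enum {
    DEPTH_FMT_NONE,           /* SG_PIXELFORMAT_NONE: no depth buffer */
    DEPTH_FMT_DEPTH,          /* SG_PIXELFORMAT_DEPTH */
    DEPTH_FMT_DEPTH_STENCIL,  /* SG_PIXELFORMAT_DEPTH_STENCIL */
} depth_fmt;

/* Returns the name of the matching DXGI depth-stencil-view format,
   or NULL if no depth-stencil view should be created. */
const char* dxgi_depth_format_name(depth_fmt f) {
    switch (f) {
        case DEPTH_FMT_DEPTH:         return "DXGI_FORMAT_D32_FLOAT";
        case DEPTH_FMT_DEPTH_STENCIL: return "DXGI_FORMAT_D24_UNORM_S8_UINT";
        default:                      return NULL;
    }
}
```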
<p>The helper function to obtain an <code class="language-plaintext highlighter-rouge">sg_swapchain</code> struct might look like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_swapchain</span> <span class="nf">d3d11_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sg_swapchain</span><span class="p">){</span>
<span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">width</span><span class="p">,</span>
<span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">height</span><span class="p">,</span>
<span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">sample_count</span><span class="p">,</span>
<span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_BGRA8</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span>
<span class="p">.</span><span class="n">d3d11</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">render_view</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">sample_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="n">state</span><span class="p">.</span><span class="n">rt_view</span> <span class="o">:</span> <span class="n">state</span><span class="p">.</span><span class="n">msaa_view</span><span class="p">,</span>
<span class="p">.</span><span class="n">resolve_view</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">sample_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="n">state</span><span class="p">.</span><span class="n">rt_view</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_stencil_view</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">ds_view</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">state.rt_view</code> and <code class="language-plaintext highlighter-rouge">state.msaa_view</code> are of type <code class="language-plaintext highlighter-rouge">ID3D11RenderTargetView</code> and <code class="language-plaintext highlighter-rouge">state.ds_view</code> is
of type <code class="language-plaintext highlighter-rouge">ID3D11DepthStencilView</code>.</p>
<p>Note how a different <code class="language-plaintext highlighter-rouge">.d3d11.render_view</code> is selected depending on whether multisampled rendering
is used or not. For non-multisampled rendering, sokol-gfx renders into the same view that’s
presented. For multisampled rendering, sokol-gfx will render into an intermediate MSAA texture
view (<code class="language-plaintext highlighter-rouge">state.msaa_view</code>) which is then resolved into the <code class="language-plaintext highlighter-rouge">d3d11.resolve_view</code> inside
<code class="language-plaintext highlighter-rouge">sg_end_pass()</code>.</p>
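<p>The view selection rule can be pulled out into a small helper. This is a minimal sketch, not sokol-gfx code. <code class="language-plaintext highlighter-rouge">view_t</code>, <code class="language-plaintext highlighter-rouge">swapchain_views_t</code> and <code class="language-plaintext highlighter-rouge">select_swapchain_views()</code> are hypothetical names. The opaque pointer type stands in for <code class="language-plaintext highlighter-rouge">ID3D11RenderTargetView*</code> so that the sketch compiles without the D3D11 headers.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ID3D11RenderTargetView* (or WGPUTextureView);
   only the pointer identity matters in this sketch. */
typedef const void* view_t;

typedef struct {
    view_t render_view;   /* the view sokol-gfx renders into */
    view_t resolve_view;  /* MSAA resolve target, NULL without MSAA */
} swapchain_views_t;

/* rt_view: the swapchain view that gets presented,
   msaa_view: an intermediate multisampled view (only used if sample_count > 1) */
static swapchain_views_t select_swapchain_views(int sample_count, view_t rt_view, view_t msaa_view) {
    assert(sample_count >= 1);
    swapchain_views_t views;
    if (sample_count == 1) {
        /* no MSAA: render directly into the presented view, nothing to resolve */
        views.render_view = rt_view;
        views.resolve_view = NULL;
    } else {
        /* MSAA: render into the multisampled view, which is then
           resolved into the presented view at the end of the pass */
        views.render_view = msaa_view;
        views.resolve_view = rt_view;
    }
    return views;
}
```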
<p>Also check out the example D3D11 window system glue code here:</p>
<p><a href="https://github.com/floooh/sokol-samples/blob/master/d3d11/d3d11entry.c">https://github.com/floooh/sokol-samples/blob/master/d3d11/d3d11entry.c</a></p>
<h4 id="using-metal">…using Metal</h4>
<p>Example function which returns an initialized <code class="language-plaintext highlighter-rouge">sg_environment</code> struct:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_environment</span> <span class="nf">osx_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sg_environment</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">defaults</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">sample_count</span><span class="p">,</span>
<span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_BGRA8</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">metal</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">device</span> <span class="o">=</span> <span class="p">(</span><span class="n">__bridge</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="n">mtl_device</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The ObjC type of <code class="language-plaintext highlighter-rouge">mtl_device</code> is <code class="language-plaintext highlighter-rouge">id&lt;MTLDevice&gt;</code>. Note the special <code class="language-plaintext highlighter-rouge">__bridge</code> cast to
a void pointer for tunneling through the sokol_app.h and sokol_gfx.h C APIs.</p>
<p>…and the function which returns an <code class="language-plaintext highlighter-rouge">sg_swapchain</code> struct (in this case using an <code class="language-plaintext highlighter-rouge">MTKView</code>
to manage the swapchain surfaces):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_swapchain</span> <span class="nf">osx_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sg_swapchain</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">drawableSize</span><span class="p">].</span><span class="n">width</span><span class="p">,</span>
<span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">drawableSize</span><span class="p">].</span><span class="n">height</span><span class="p">,</span>
<span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">sample_count</span><span class="p">,</span>
<span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_BGRA8</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span>
<span class="p">.</span><span class="n">metal</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">current_drawable</span> <span class="o">=</span> <span class="p">(</span><span class="n">__bridge</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">currentDrawable</span><span class="p">],</span>
<span class="p">.</span><span class="n">depth_stencil_texture</span> <span class="o">=</span> <span class="p">(</span><span class="n">__bridge</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">depthStencilTexture</span><span class="p">],</span>
<span class="p">.</span><span class="n">msaa_color_texture</span> <span class="o">=</span> <span class="p">(</span><span class="n">__bridge</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">multisampleColorTexture</span><span class="p">],</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Also check out the Metal window system glue code here:</p>
<p><a href="https://github.com/floooh/sokol-samples/blob/master/metal/osxentry.m">https://github.com/floooh/sokol-samples/blob/master/metal/osxentry.m</a></p>
<p>…alternatively check out the GLFW+Metal example here which doesn’t use an MTKView (but also doesn’t
support a depth-buffer or MSAA rendering):</p>
<p><a href="https://github.com/floooh/sokol-samples/blob/master/glfw/metal-glfw.m">https://github.com/floooh/sokol-samples/blob/master/glfw/metal-glfw.m</a></p>
<h4 id="using-webgpu">…using WebGPU</h4>
<p>The environment- and swapchain-helper-functions look very similar to D3D11:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_environment</span> <span class="nf">wgpu_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sg_environment</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">defaults</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_</span><span class="p">...,</span>
<span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_</span><span class="p">...,</span>
<span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">desc</span><span class="p">.</span><span class="n">sample_count</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">.</span><span class="n">wgpu</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">device</span> <span class="o">=</span> <span class="p">(</span><span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="n">state</span><span class="p">.</span><span class="n">device</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>For <code class="language-plaintext highlighter-rouge">.defaults.color_format</code> you should use the result of
<code class="language-plaintext highlighter-rouge">wgpuSurfaceGetPreferredFormat()</code> translated to a sokol-gfx pixel format
(either <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_BGRA8</code> or <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA8</code>).</p>
<p>For the depth format use either <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH_STENCIL</code>,
<code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH</code> or <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_NONE</code> (no depth-stencil buffer). The first two translate to WebGPU
pixel formats as follows:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH_STENCIL</code> => <code class="language-plaintext highlighter-rouge">WGPUTextureFormat_Depth32FloatStencil8</code></li>
<li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH</code> => <code class="language-plaintext highlighter-rouge">WGPUTextureFormat_Depth32Float</code></li>
</ul>
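<p>Below is a sketch of both translations described above: the preferred surface color format and the depth format. It uses stand-in enums. The <code class="language-plaintext highlighter-rouge">DEMO_*</code> constants and <code class="language-plaintext highlighter-rouge">demo_*</code> functions are hypothetical. They only mirror the names of the real <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_*</code> and <code class="language-plaintext highlighter-rouge">WGPUTextureFormat_*</code> constants, so the code compiles without <code class="language-plaintext highlighter-rouge">sokol_gfx.h</code> or <code class="language-plaintext highlighter-rouge">webgpu.h</code>. Real glue code would switch over the actual constants instead. Other surface formats are treated as unsupported here.</p>

```c
#include <assert.h>

/* Hypothetical stand-in enums: the names mirror the real SG_PIXELFORMAT_*
   and WGPUTextureFormat_* constants, the numeric values do not. */
typedef enum {
    DEMO_SG_PIXELFORMAT_NONE,
    DEMO_SG_PIXELFORMAT_RGBA8,
    DEMO_SG_PIXELFORMAT_BGRA8,
    DEMO_SG_PIXELFORMAT_DEPTH,
    DEMO_SG_PIXELFORMAT_DEPTH_STENCIL,
} demo_sg_pixel_format;

typedef enum {
    DEMO_WGPU_Undefined,
    DEMO_WGPU_RGBA8Unorm,
    DEMO_WGPU_BGRA8Unorm,
    DEMO_WGPU_Depth32Float,
    DEMO_WGPU_Depth32FloatStencil8,
} demo_wgpu_format;

/* Translate the surface's preferred color format into the sokol-gfx
   pixel format used for .defaults.color_format and .color_format. */
static demo_sg_pixel_format demo_color_format_from_wgpu(demo_wgpu_format fmt) {
    switch (fmt) {
        case DEMO_WGPU_BGRA8Unorm: return DEMO_SG_PIXELFORMAT_BGRA8;
        case DEMO_WGPU_RGBA8Unorm: return DEMO_SG_PIXELFORMAT_RGBA8;
        default: return DEMO_SG_PIXELFORMAT_NONE; /* not handled in this sketch */
    }
}

/* Depth format mapping as listed above; NONE (no depth buffer) maps to 'undefined'. */
static demo_wgpu_format demo_depth_format_to_wgpu(demo_sg_pixel_format fmt) {
    switch (fmt) {
        case DEMO_SG_PIXELFORMAT_DEPTH_STENCIL: return DEMO_WGPU_Depth32FloatStencil8;
        case DEMO_SG_PIXELFORMAT_DEPTH: return DEMO_WGPU_Depth32Float;
        default: return DEMO_WGPU_Undefined;
    }
}
```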
<p>The type of <code class="language-plaintext highlighter-rouge">state.device</code> is <code class="language-plaintext highlighter-rouge">WGPUDevice</code>.</p>
<p>The WebGPU swapchain helper function might look like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_swapchain</span> <span class="nf">wgpu_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sg_swapchain</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">width</span><span class="p">,</span>
<span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">height</span><span class="p">,</span>
<span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">sample_count</span><span class="p">,</span>
<span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_</span><span class="p">...,</span>
<span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_</span><span class="p">...,</span>
<span class="p">.</span><span class="n">wgpu</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">render_view</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">sample_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="n">state</span><span class="p">.</span><span class="n">rt_view</span> <span class="o">:</span> <span class="n">state</span><span class="p">.</span><span class="n">msaa_view</span><span class="p">,</span>
<span class="p">.</span><span class="n">resolve_view</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">sample_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="n">state</span><span class="p">.</span><span class="n">rt_view</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_stencil_view</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">ds_view</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…note the selection for <code class="language-plaintext highlighter-rouge">.wgpu.render_view</code> and <code class="language-plaintext highlighter-rouge">.wgpu.resolve_view</code> based on the MSAA
sample count, which works the same as in the <code class="language-plaintext highlighter-rouge">d3d11_swapchain()</code> function.</p>
<p>The types for all view objects are <code class="language-plaintext highlighter-rouge">WGPUTextureView</code>.</p>
<p>Also check out the WebGPU window system glue code here:</p>
<p><a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/wgpu_entry.c">https://github.com/floooh/sokol-samples/blob/master/wgpu/wgpu_entry.c</a></p>
<h4 id="gl-with-glfw">…GL with GLFW</h4>
<p>The environment-helper-function only returns default pixel formats and sample count:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_environment</span> <span class="nf">glfw_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sg_environment</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">defaults</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_RGBA8</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span>
<span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…the swapchain function additionally returns a GL framebuffer object handle. For the default framebuffer
this is always zero; otherwise it’s a handle created with <code class="language-plaintext highlighter-rouge">glGenFramebuffers()</code>.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_swapchain</span> <span class="nf">glfw_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">;</span>
<span class="n">glfwGetFramebufferSize</span><span class="p">(</span><span class="n">_window</span><span class="p">,</span> <span class="o">&</span><span class="n">width</span><span class="p">,</span> <span class="o">&</span><span class="n">height</span><span class="p">);</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sg_swapchain</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">width</span><span class="p">,</span>
<span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">height</span><span class="p">,</span>
<span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">_sample_count</span><span class="p">,</span>
<span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_RGBA8</span><span class="p">,</span>
<span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span>
<span class="p">.</span><span class="n">gl</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">framebuffer</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Also see <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/glfw_glue.c">https://github.com/floooh/sokol-samples/blob/master/glfw/glfw_glue.c</a></p>
<h2 id="q-why-still-have-a-baked-pass-attachments-object">Q: Why still have a baked pass attachments object?</h2>
<p>For a while I’ve been pondering whether to get rid of pre-baked pass-attachments
objects altogether (i.e. what were formerly <code class="language-plaintext highlighter-rouge">sg_pass</code> objects and are now
<code class="language-plaintext highlighter-rouge">sg_attachments</code> objects). Instead, a transient struct with the same
information that’s in <code class="language-plaintext highlighter-rouge">sg_attachments_desc</code> would be passed into the <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code>
function, similar to how <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> takes a transient <code class="language-plaintext highlighter-rouge">sg_bindings</code>
struct with all the resource bindings.</p>
<p>I didn’t follow through with that idea because it would mean creating
temporary objects inside <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> and discarding them again in
<code class="language-plaintext highlighter-rouge">sg_end_pass()</code> (or alternatively using a ‘hash-and-cache’ approach).</p>
<p>In D3D11 and WebGPU, one temporary texture view object would need
to be created per pass-attachment (which may add up to 9 temporary objects),
and in the GL backend, a GL framebuffer object must be created,
configured and checked for completeness. All this work currently
only happens once in <code class="language-plaintext highlighter-rouge">sg_make_attachments()</code>, but would need to happen
inside <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> without baked attachments objects.</p>
<p>While these backend API objects should be ‘reasonably cheap’ to create, I still
decided against it.</p>
<p>Currently the only other place where such temporary objects are created and
discarded on the fly is the <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call in the WebGPU
backend. There, temporary BindGroup objects are created and discarded
dynamically via a ‘hash-and-cache’ approach, and I hate it :) I don’t want that
type of code to creep into other places.</p>
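<p>Here is what ‘hash-and-cache’ means in practice. First, hash the description of the object that is needed. Then look the hash up in a cache, and only create a new backend object on a miss. The following is a minimal, hypothetical sketch of that general idea, not the actual WebGPU backend code. The key struct, the direct-mapped 64-slot cache, and the counter that stands in for creating a real API object are all assumptions made for illustration.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: whatever identifies a temporary backend object, e.g. the
   image handles and mip levels of a pass's attachments. All fields are
   uint32_t, so the struct has no padding bytes that would upset hashing. */
typedef struct {
    uint32_t color_images[4];
    uint32_t color_mip_levels[4];
    uint32_t depth_stencil_image;
} attachments_key_t;

typedef struct {
    int valid;
    uint32_t hash;
    attachments_key_t key;
    uint32_t object_id;  /* stands in for a real backend object handle */
} cache_slot_t;

#define CACHE_SIZE 64 /* must be a power of two */

typedef struct {
    cache_slot_t slots[CACHE_SIZE];
    uint32_t next_object_id;
    uint32_t num_creates;
    uint32_t num_destroys;
} object_cache_t;

/* FNV-1a hash over the raw key bytes. */
static uint32_t hash_key(const attachments_key_t* key) {
    const uint8_t* bytes = (const uint8_t*)key;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(*key); i++) {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the cached object for a key, creating it on a miss. A collision in
   the direct-mapped cache simply evicts the previous occupant of the slot. */
static uint32_t cache_lookup_or_create(object_cache_t* cache, const attachments_key_t* key) {
    assert(cache && key);
    const uint32_t h = hash_key(key);
    cache_slot_t* slot = &cache->slots[h & (CACHE_SIZE - 1)];
    if (slot->valid && (slot->hash == h) && (0 == memcmp(&slot->key, key, sizeof(*key)))) {
        return slot->object_id;  /* hit: reuse the existing object */
    }
    if (slot->valid) {
        cache->num_destroys++;   /* a real backend would release the evicted object here */
    }
    slot->valid = 1;
    slot->hash = h;
    slot->key = *key;
    slot->object_id = ++cache->next_object_id;  /* a real backend would create the object here */
    cache->num_creates++;
    return slot->object_id;
}
```

<p>Even this simplified version needs hashing and key comparisons on every call, plus eviction handling. That is the kind of per-frame bookkeeping that baked attachments objects make unnecessary.</p>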
<p>Now, <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> and <code class="language-plaintext highlighter-rouge">sg_end_pass()</code> are called far less frequently than
<code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>, and creating view and framebuffer objects <em>should</em> be
cheap enough. But it still feels ‘wrong’ to create and discard backend API
objects willy-nilly during the frame.</p>
Mon, 26 Feb 2024 00:00:00 +0000
https://floooh.github.io/2024/02/26/sokol-spring-cleaning-2024.html
https://floooh.github.io/2024/02/26/sokol-spring-cleaning-2024.htmlVSCode, WASM, WASI<p>I did a neat little thing during my year-end vacation: A VSCode extension for retro-assembly coding with the assembler
and home computer emulator integrated right into VSCode via WASM and WASI.</p>
<p>The extension is here (careful: it must be installed as <strong>pre-release</strong>, otherwise installing a dependency extension
won’t work, more on that later):</p>
<p><a href="https://marketplace.visualstudio.com/items?itemName=floooh.vscode-kcide">https://marketplace.visualstudio.com/items?itemName=floooh.vscode-kcide</a></p>
<p>This is what it looks like in action when debugging a KC85/4 demo I wrote for dog-fooding the extension:</p>
<p><img src="/images/vscode-wasm-wasi-1.webp" alt="Screenshot 1" /></p>
<p>The VSCode extension project is here:</p>
<p><a href="https://github.com/floooh/vscode-kcide">https://github.com/floooh/vscode-kcide</a></p>
<p>…and the samples for KC85/4, C64 and Amstrad CPC are here:</p>
<p><a href="https://github.com/floooh/kcide-sample">https://github.com/floooh/kcide-sample</a></p>
<p>The extension also integrates the following projects:</p>
<ul>
<li><a href="https://github.com/floooh/easmx">a fork</a> of the <a href="http://svn.xi6.com/svn/asmx/branches/2.x/asmx-doc.html">ASMX multi-cpu assembler</a></li>
<li>the KC85/4, C64 and CPC emulators from my <a href="https://floooh.github.io/tiny8bit/">chips project</a></li>
</ul>
<p>Creating a simple VSCode extension is fairly straightforward (see: <a href="https://code.visualstudio.com/api/get-started/your-first-extension">Your First Extension</a>),
so I won’t go into too many details there. What’s interesting is the use of WASM and WASI to integrate projects written
in languages other than JS/TS into a VSCode extension.</p>
<p>This makes it possible to bundle the assembler (written in C89) and the emulator (C99 and C++11) directly with the
extension as WASM blobs. Similar extensions without WASM components
would either need to port the assembler and emulator to JS/TS, ask the user to install and run native tools
(most other retro-dev extensions seem to use that approach), or automatically download and install separate platform-specific native tools
(the approach used by the Microsoft C/C++ extension), which is asking for a lot of trust from the extension user.</p>
<p>WASM fixes all those issues:</p>
<ul>
<li>it’s completely hassle-free for the user because the WASM blobs can be bundled with the extension and everything works out of the box</li>
<li>it’s less hassle for the extension developer, because a single WASM blob automatically works on all platforms supported by VSCode (including the VSCode web version)</li>
<li>…and unlike native binaries, WASM and WASI don’t add any more security concerns over regular VSCode extensions written in TS/JS</li>
</ul>
<p>Also, how cool is it that I can take an assembler written in C89 in the 90’s and safely run that without
code changes in the VSCode web version?</p>
<p>(I <strong>did</strong> actually consider writing my own assembler in Typescript a long time ago just for the purpose of running
it in VSCode but quickly abandoned that idea. Here are the ruins of that folly: <a href="https://github.com/floooh/hcasm">https://github.com/floooh/hcasm</a>)</p>
<h2 id="paths-not-taken">Paths not taken</h2>
<p>I considered various approaches:</p>
<ol>
<li>a native IDE via Qt similar to Goran Devic’s <a href="https://baltazarstudios.com/z80explorer/">Z80 Explorer</a></li>
<li>integrate the IDE features right into the emulator via <a href="https://github.com/ocornut/imgui">Dear ImGui</a> (the emulators already have an extensive Dear ImGui debugging UI)</li>
<li>create a VSCode extension which calls into an assembler and emulator written in Typescript</li>
<li>create a VSCode extension which calls into native assembler and emulator binaries</li>
<li>create a VSCode extension which uses WASM for the assembler and emulator</li>
</ol>
<p>The final decision to use VSCode with WASM comes down to a couple of central problems:</p>
<ul>
<li>dealing with native tools in a cross-platform scenario is a massive PITA these days:
<ul>
<li>running the same binary across different Linux distros is still pretty much an unsolved problem</li>
<li>on Windows and macOS you’ll get all sorts of scare popups when trying to run an executable downloaded from the internet</li>
</ul>
</li>
<li>porting a code base to TS/JS just so that it can be hooked up into a VSCode extension is almost always a massive waste of time</li>
</ul>
<p>In the end it was a decision between (2: extend the existing Dear ImGui emulator UI with IDE features) and
(5: figure out how to integrate the assembler and emulator as WASM blobs into a VSCode extension).</p>
<p>While I enjoy writing Dear ImGui UIs immensely, a robust text editing experience that can rival a dedicated text editor like VSCode would be
a massive project on its own.</p>
<p>…which leaves (5) as the option that gives the most robust result for the least amount of work. This matters, since this is
a ‘vacation side project’ which shouldn’t increase my spare-time software maintenance burden even more.</p>
<p>All in all the extension was finished in about 3 weeks of focused work (spread over 6 real-world weeks, with 2 weeks spent dog-fooding on a little <a href="https://floooh.github.io/kcide-sample/kc854.html?file=demo.kcc">KC85/4 assembly demo</a>).</p>
<p>Of the 3 weeks working on the VSCode extension, about 2 weeks were spent on the Debug Adapter alone
(a lot more effort than I initially expected).</p>
<h2 id="the-boring-parts">The boring parts</h2>
<p>I’ll run very quickly over the parts of the extension that are not all that interesting (since all of that
is just reading the <a href="https://code.visualstudio.com/api">VSCode extension documentation</a> about what features
can be provided by extensions and how to implement them).</p>
<p>The KC IDE extension implements:</p>
<ul>
<li>a handful of <strong>Commands</strong> which can be invoked via the <code class="language-plaintext highlighter-rouge">Ctrl-Shift-P</code> command palette:
<ul>
<li><code class="language-plaintext highlighter-rouge">KCIDE: Build</code>: assembles the source code into a binary file compatible with the
current emulator</li>
<li><code class="language-plaintext highlighter-rouge">KCIDE: Debug</code>: builds the source and starts a debugging session</li>
<li><code class="language-plaintext highlighter-rouge">KCIDE: Open Emulator</code>: (re-)opens the emulator tab</li>
<li><code class="language-plaintext highlighter-rouge">KCIDE: Reboot Emulator</code>: cold-boots the emulator and stops the active debug session</li>
<li><code class="language-plaintext highlighter-rouge">KCIDE: Reset Emulator</code>: resets the emulator and stops the active debug session (on some home computers,
a reset preserves the memory content)</li>
</ul>
</li>
<li>two <strong>Key Bindings</strong>: <code class="language-plaintext highlighter-rouge">F5</code> to start a debug session and <code class="language-plaintext highlighter-rouge">F7</code> to build the project source code into a binary file</li>
<li>a <strong>JSON Schema</strong> for a <code class="language-plaintext highlighter-rouge">kcide.project.json</code> file which defines the target
computer system, assembly dialect, file paths and output binary file format loadable by the emulator</li>
<li>a <strong>Language Grammar</strong> for regex-based syntax highlighting (Z80 and 6502 assembly statements, plus ASMX-specific keywords)</li>
<li>a <strong>Debug Adapter</strong> to connect the VSCode debugging UI with the (already existing) debugger that’s integrated into the emulator</li>
</ul>
<p>Some notable VSCode extension features which are <strong>not</strong> implemented:</p>
<ul>
<li>No <strong>Language Server</strong> (to provide error squiggles and code completion while typing). The LSP
is a bit of overkill for low-level languages like assembly, and features similar to those of a full LSP implementation
can most likely be provided without one. VSCode has a couple of other language features
like <a href="https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide">semantic highlighting</a>,
<a href="https://code.visualstudio.com/api/language-extensions/snippet-guide">snippets</a> or
<a href="https://code.visualstudio.com/api/language-extensions/programmatic-language-features">programmatic language features</a>.
It would have been a ‘nice to have’ feature, but in the end I simply ran out of time. Maybe in the next round of updates…</li>
<li>No <strong>Task Providers</strong> (i.e. proper integration with <code class="language-plaintext highlighter-rouge">tasks.json</code> and <code class="language-plaintext highlighter-rouge">launch.json</code>). This also seemed
like overkill. Just adding two key bindings while the extension is active (<code class="language-plaintext highlighter-rouge">F5</code> for debugging and <code class="language-plaintext highlighter-rouge">F7</code> for building)
achieves the same thing with less hassle for the user.</li>
</ul>
<p>Finally, a VSCode extension may run in 3 environments, which has some subtle consequences for what APIs can be
used in the extension code:</p>
<ul>
<li><strong>desktop</strong>: the extension only works in ‘desktop VSCode’ and can use the full set of node.js APIs</li>
<li><strong>web</strong>: the extension works in ‘VSCode for the web’, which means only the VSCode extension API and browser APIs can be called</li>
<li><strong>universal</strong>: the extension can run both in desktop and web VSCode</li>
</ul>
<p>The KC IDE is a universal extension, but still has some issues when running in the web version of VSCode (which comes down to a mix
of VSCode issues and some file-IO related issues I will most likely need to fix on my side).</p>
<h2 id="integrating-the-assembler-via-wasi">Integrating the assembler via WASI</h2>
<p>This turned out a lot easier than expected, because the <a href="https://code.visualstudio.com/blogs/2023/06/05/vscode-wasm-wasi">VSCode WASI extension</a>
does all the hard work.</p>
<p>What this extension basically does is to allow any POSIX commandline tool to run inside VSCode without requiring
changes to the source (most notably, no changes are required for blocking file IO code via fopen/fread/fwrite/fclose).</p>
<p>The only thing I had to fix in the ASMX assembler was adding a separately provided root path for the assembler’s
<code class="language-plaintext highlighter-rouge">include</code> statement (which is supposed to work with relative paths). WASI currently doesn’t have the concept
of a ‘current working directory’, so all filesystem paths must be resolved to absolute paths within the
WASI container’s virtual filesystem (a WASI environment doesn’t use direct filesystem paths of the host system,
but instead defines its own virtual filesystem with mount points mapped to host system directories).</p>
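<p>Below is a minimal sketch of that kind of fix. It assumes a hypothetical <code class="language-plaintext highlighter-rouge">resolve_include_path()</code> helper; the actual change in the ASMX fork may look different. Relative include paths are joined with an explicitly provided root directory inside the WASI virtual filesystem, and absolute paths are passed through unchanged. The <code class="language-plaintext highlighter-rouge">/workspace/...</code> paths used below are made-up mount points.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: WASI has no current working directory, so a relative
   include path is resolved against an explicitly provided root directory.
   Absolute paths are copied unchanged. Returns 1 on success, 0 if the result
   doesn't fit into the output buffer. */
static int resolve_include_path(const char* root, const char* include, char* out, size_t out_size) {
    assert(root && include && out && (out_size > 0));
    int n;
    if (include[0] == '/') {
        n = snprintf(out, out_size, "%s", include);
    } else {
        /* avoid a double slash if the root already ends with one */
        const size_t root_len = strlen(root);
        const char* sep = ((root_len > 0) && (root[root_len - 1] == '/')) ? "" : "/";
        n = snprintf(out, out_size, "%s%s%s", root, sep, include);
    }
    return (n >= 0) && ((size_t)n < out_size);
}
```

<p>For instance, with a root of <code class="language-plaintext highlighter-rouge">/workspace/src</code>, an include of <code class="language-plaintext highlighter-rouge">macros.s</code> resolves to <code class="language-plaintext highlighter-rouge">/workspace/src/macros.s</code>.</p>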
<p>The basic procedure to get the assembler working inside VSCode is:</p>
<ul>
<li>compile the assembler to a WASI blob using the <a href="https://github.com/WebAssembly/wasi-sdk">WASI SDK Clang toolchain</a>,
this happens manually outside the extension project, the resulting .wasm blob is then simply committed into the
extension’s git repo and bundled with the published extension. The size of the WASM blob is about 200 KBytes.</li>
<li>
<p>in the VSCode extension code: initialize the WASI runtime, set up a virtual filesystem, and load and compile the
assembler WASM blob (this happens only once during the extension’s life cycle):</p>
<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">export</span> <span class="k">async</span> <span class="kd">function</span> <span class="nx">requireWasiEnv</span><span class="p">(</span><span class="nx">ext</span><span class="p">:</span> <span class="nx">ExtensionContext</span><span class="p">):</span> <span class="nb">Promise</span><span class="o"><</span><span class="nx">WasiEnv</span><span class="o">></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">wasiEnv</span> <span class="o">===</span> <span class="kc">null</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">wasm</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">Wasm</span><span class="p">.</span><span class="nx">load</span><span class="p">();</span>
<span class="kd">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">wasm</span><span class="p">.</span><span class="nx">createRootFileSystem</span><span class="p">([</span> <span class="p">{</span> <span class="na">kind</span><span class="p">:</span> <span class="dl">'</span><span class="s1">workspaceFolder</span><span class="dl">'</span> <span class="p">}</span> <span class="p">]);</span>
<span class="kd">const</span> <span class="nx">bits</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">workspace</span><span class="p">.</span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFile</span><span class="p">(</span><span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">ext</span><span class="p">.</span><span class="nx">extensionUri</span><span class="p">,</span> <span class="dl">'</span><span class="s1">media/asmx.wasm</span><span class="dl">'</span><span class="p">));</span>
<span class="kd">const</span> <span class="nx">asmx</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">WebAssembly</span><span class="p">.</span><span class="nx">compile</span><span class="p">(</span><span class="nx">bits</span><span class="p">);</span>
<span class="nx">wasiEnv</span> <span class="o">=</span> <span class="p">{</span> <span class="nx">wasm</span><span class="p">,</span> <span class="nx">fs</span><span class="p">,</span> <span class="nx">asmx</span> <span class="p">};</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">wasiEnv</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div> </div>
</li>
<li>
<p>run the assembler WASM blob, capture stdout and stderr, and check the exit code. This is quite similar to how a native tool would be launched:</p>
<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">export</span> <span class="k">async</span> <span class="kd">function</span> <span class="nx">runAsmx</span><span class="p">(</span><span class="nx">ext</span><span class="p">:</span> <span class="nx">ExtensionContext</span><span class="p">,</span> <span class="nx">args</span><span class="p">:</span> <span class="kr">string</span><span class="p">[]):</span> <span class="nb">Promise</span><span class="o"><</span><span class="nx">RunAsmxResult</span><span class="o">></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">wasiEnv</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">requireWasiEnv</span><span class="p">(</span><span class="nx">ext</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">process</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">wasiEnv</span><span class="p">.</span><span class="nx">wasm</span><span class="p">.</span><span class="nx">createProcess</span><span class="p">(</span><span class="dl">'</span><span class="s1">asmx</span><span class="dl">'</span><span class="p">,</span> <span class="nx">wasiEnv</span><span class="p">.</span><span class="nx">asmx</span><span class="p">,</span> <span class="p">{</span>
<span class="na">rootFileSystem</span><span class="p">:</span> <span class="nx">wasiEnv</span><span class="p">.</span><span class="nx">fs</span><span class="p">,</span>
<span class="na">stdio</span><span class="p">:</span> <span class="p">{</span>
<span class="na">out</span><span class="p">:</span> <span class="p">{</span> <span class="na">kind</span><span class="p">:</span> <span class="dl">'</span><span class="s1">pipeOut</span><span class="dl">'</span> <span class="p">},</span>
<span class="na">err</span><span class="p">:</span> <span class="p">{</span> <span class="na">kind</span><span class="p">:</span> <span class="dl">'</span><span class="s1">pipeOut</span><span class="dl">'</span> <span class="p">},</span>
<span class="p">},</span>
<span class="nx">args</span><span class="p">,</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">decoder</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TextDecoder</span><span class="p">(</span><span class="dl">'</span><span class="s1">utf-8</span><span class="dl">'</span><span class="p">);</span>
<span class="kd">let</span> <span class="nx">stderr</span> <span class="o">=</span> <span class="dl">''</span><span class="p">;</span>
<span class="kd">let</span> <span class="nx">stdout</span> <span class="o">=</span> <span class="dl">''</span><span class="p">;</span>
<span class="nx">process</span><span class="p">.</span><span class="nx">stderr</span><span class="o">!</span><span class="p">.</span><span class="nx">onData</span><span class="p">((</span><span class="nx">data</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">stderr</span> <span class="o">+=</span> <span class="nx">decoder</span><span class="p">.</span><span class="nx">decode</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">process</span><span class="p">.</span><span class="nx">stdout</span><span class="o">!</span><span class="p">.</span><span class="nx">onData</span><span class="p">((</span><span class="nx">data</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">stdout</span> <span class="o">+=</span> <span class="nx">decoder</span><span class="p">.</span><span class="nx">decode</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span>
<span class="p">});</span>
<span class="kd">const</span> <span class="nx">exitCode</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">process</span><span class="p">.</span><span class="nx">run</span><span class="p">();</span>
<span class="k">return</span> <span class="p">{</span> <span class="nx">exitCode</span><span class="p">,</span> <span class="nx">stdout</span><span class="p">,</span> <span class="nx">stderr</span> <span class="p">};</span>
<span class="p">}</span>
</code></pre></div> </div>
</li>
<li>the KC IDE extension will then parse the assembler error messages in stderr and convert the error messages into VSCode Diagnostic objects,
which then show up in the <code class="language-plaintext highlighter-rouge">Problems</code> panel and as error squiggles in the text editor</li>
<li>the actual assembler output files are written directly into the host filesystem via the virtual filesystem mapping that was provided
when initializing the WASI runtime</li>
</ul>
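<p>A minimal sketch of the stderr parsing step, assuming (hypothetically) that ASMX reports errors as <code class="language-plaintext highlighter-rouge">file:line: message</code> lines; the real output format may differ, and <code class="language-plaintext highlighter-rouge">parseAsmErrors</code> is an illustrative name. In the extension each parsed entry would then be turned into a <code class="language-plaintext highlighter-rouge">vscode.Diagnostic</code>, which is left out here so the sketch runs standalone:</p>

```typescript
// Hypothetical shape of a parsed assembler error.
interface AsmError {
  file: string;
  line: number; // 0-based, as expected by VSCode's Position/Range
  message: string;
}

// ASSUMPTION: error lines look like "<file>:<line>: <message>".
const ERROR_LINE = /^(.+?):(\d+):\s*(.*)$/;

function parseAsmErrors(stderr: string): AsmError[] {
  const errors: AsmError[] = [];
  for (const rawLine of stderr.split(/\r?\n/)) {
    const m = ERROR_LINE.exec(rawLine.trim());
    if (m === null) {
      continue; // ignore lines that don't match the assumed format
    }
    errors.push({
      file: m[1],
      line: Math.max(0, parseInt(m[2], 10) - 1), // assembler line numbers are 1-based
      message: m[3],
    });
  }
  return errors;
}

console.log(parseAsmErrors('src/main.asm:12: undefined label: loop\nsome other output\n'));
```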
<h2 id="integrating-the-emulator">Integrating the emulator</h2>
<p>The embedded home computer emulators are taken from the <a href="https://github.com/floooh/chips">chips project</a>.
They are implemented in C/C++, use the <a href="https://github.com/floooh/sokol">sokol headers</a> to abstract platform details,
and run both as natively compiled executables and <a href="https://floooh.github.io/tiny8bit/">in the browser</a> via WASM and WebGL (compiled
with the Emscripten SDK).</p>
<p>One emulator WASM blob is about 700 to 800 KBytes (most of that is the Dear ImGui debugging UI, which accounts for about 450 KBytes).</p>
<p>Currently the KC IDE extension contains 4 emulators (KC85/3, KC85/4, C64 and CPC), which adds up to about 3 MBytes. If
drastically more systems are supported in the future, I’ll need to come up with a way to reduce the size of the embedded
emulators: either downloading them on demand, merging them into a single ‘multi-system-emulator’ binary, or maybe moving the UI into a shared
WASM module that’s loaded like a DLL.</p>
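<p>To put the ‘shared UI module’ option into numbers, here is a back-of-the-envelope calculation. It assumes each emulator is about 750 KBytes (the middle of the range above), of which about 450 KBytes is UI, and that a shared UI module would be as big as one embedded copy:</p>

```typescript
// Rough size estimate. ASSUMPTIONS: ~750 KBytes per emulator blob, ~450 KBytes
// of that is the Dear ImGui UI, and a shared UI module costs the same as one copy.
const EMU_KB = 750;
const UI_KB = 450;

// every emulator blob carries its own copy of the UI
function totalSeparateKB(numEmulators: number): number {
  return numEmulators * EMU_KB;
}

// emulator cores without UI, plus one shared UI module
function totalSharedUiKB(numEmulators: number): number {
  return numEmulators * (EMU_KB - UI_KB) + UI_KB;
}

console.log(totalSeparateKB(4)); // prints 3000
console.log(totalSharedUiKB(4)); // prints 1650
```

<p>Under these assumptions the shared module saves (n - 1) × 450 KBytes for n emulators, so the benefit grows linearly with the number of supported systems.</p>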
<p>The emulator runs inside a VSCode <a href="https://code.visualstudio.com/api/extension-guides/webview">webview panel</a>. For an
Emscripten WebGL application this is mostly straightforward: take an <a href="https://github.com/floooh/vscode-kcide/blob/b062aa56609fafeffc70ef0ac440c6ee1d70fe5b/media/shell.html">index.html like this</a> (note the placeholders <code class="language-plaintext highlighter-rouge">{{{shell}}}</code> and <code class="language-plaintext highlighter-rouge">{{{emu}}}</code>, which must be replaced with runtime-generated URLs), and set up
a <a href="https://github.com/floooh/vscode-kcide/blob/b062aa56609fafeffc70ef0ac440c6ee1d70fe5b/src/emu.ts#L22-L77">webview panel object like this</a>.</p>
<p>There are a couple of interesting details in that code:</p>
<p>The webview panel cannot simply load resources from anywhere in the host filesystem; instead, a <code class="language-plaintext highlighter-rouge">localResourceRoots</code> array must be provided
in the <code class="language-plaintext highlighter-rouge">window.createWebviewPanel()</code> call, which here points to the extension subdirectory <code class="language-plaintext highlighter-rouge">media/</code> (i.e. anything that’s loaded in the
webview panel needs to be located in that <code class="language-plaintext highlighter-rouge">media/</code> subdirectory):</p>
<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kd">const</span> <span class="nx">rootUri</span> <span class="o">=</span> <span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">getExtensionUri</span><span class="p">(),</span> <span class="dl">'</span><span class="s1">media</span><span class="dl">'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">panel</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">createWebviewPanel</span><span class="p">(</span>
<span class="c1">// ...</span>
<span class="p">{</span>
<span class="na">localResourceRoots</span><span class="p">:</span> <span class="p">[</span> <span class="nx">rootUri</span> <span class="p">],</span>
<span class="p">}</span>
<span class="p">);</span>
</code></pre></div></div>
<p>…next, all URLs referenced in the webview panel’s HTML content must be generated via the webview panel API (<code class="language-plaintext highlighter-rouge">webview.asWebviewUri()</code>). I’m doing that by loading an HTML template file and then replacing the placeholders inside <code class="language-plaintext highlighter-rouge">{{{...}}}</code> with the generated URLs (and while
at it, I also select the correct emulator to load):</p>
<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kd">let</span> <span class="nx">emuFilename</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="nx">project</span><span class="p">.</span><span class="nx">emulator</span><span class="p">.</span><span class="nx">system</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="nx">System</span><span class="p">.</span><span class="nx">KC853</span><span class="p">:</span> <span class="nx">emuFilename</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">kc853-ui.js</span><span class="dl">'</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nx">System</span><span class="p">.</span><span class="nx">C64</span><span class="p">:</span> <span class="nx">emuFilename</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">c64-ui.js</span><span class="dl">'</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nx">System</span><span class="p">.</span><span class="nx">CPC6128</span><span class="p">:</span> <span class="nx">emuFilename</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">cpc-ui.js</span><span class="dl">'</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
<span class="nl">default</span><span class="p">:</span> <span class="nx">emuFilename</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">kc854-ui.js</span><span class="dl">'</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">emuUri</span> <span class="o">=</span> <span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">asWebviewUri</span><span class="p">(</span><span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">rootUri</span><span class="p">,</span> <span class="nx">emuFilename</span><span class="p">));</span>
<span class="kd">const</span> <span class="nx">shellUri</span> <span class="o">=</span> <span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">asWebviewUri</span><span class="p">(</span><span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">rootUri</span><span class="p">,</span> <span class="dl">'</span><span class="s1">shell.js</span><span class="dl">'</span><span class="p">));</span>
<span class="kd">const</span> <span class="nx">templ</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">readTextFile</span><span class="p">(</span><span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">rootUri</span><span class="p">,</span> <span class="dl">'</span><span class="s1">shell.html</span><span class="dl">'</span><span class="p">));</span>
<span class="kd">const</span> <span class="nx">html</span> <span class="o">=</span> <span class="nx">templ</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="dl">'</span><span class="s1">{{{emu}}}</span><span class="dl">'</span><span class="p">,</span> <span class="nx">emuUri</span><span class="p">.</span><span class="nx">toString</span><span class="p">()).</span><span class="nx">replace</span><span class="p">(</span><span class="dl">'</span><span class="s1">{{{shell}}}</span><span class="dl">'</span><span class="p">,</span> <span class="nx">shellUri</span><span class="p">.</span><span class="nx">toString</span><span class="p">());</span>
<span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">html</span> <span class="o">=</span> <span class="nx">html</span><span class="p">;</span>
</code></pre></div></div>
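<p>The code above calls <code class="language-plaintext highlighter-rouge">String.prototype.replace()</code> with string patterns, which only replaces the first occurrence of each placeholder. That’s fine as long as every placeholder appears once in the template. If a template might repeat placeholders, a small helper with a global regex would handle it; <code class="language-plaintext highlighter-rouge">fillTemplate</code> below is a hypothetical name, and the sample template is made up:</p>

```typescript
// Hypothetical helper: replace every {{{key}}} placeholder in a template.
// The global regex catches repeated placeholders; unknown keys are left
// untouched so they are easy to spot in the generated page.
function fillTemplate(template: string, values: { [key: string]: string }): string {
  return template.replace(/\{\{\{(\w+)\}\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

// In the extension, the values would be the emuUri/shellUri strings from above.
const filled = fillTemplate('shell={{{shell}}} emu={{{emu}}} again={{{emu}}}', {
  shell: 'media/shell.js',
  emu: 'media/kc854-ui.js',
});
console.log(filled);
// prints "shell=media/shell.js emu=media/kc854-ui.js again=media/kc854-ui.js"
```

<p>Using a replacer function (instead of a replacement string) also means that <code class="language-plaintext highlighter-rouge">$</code> characters in the inserted URLs are not interpreted as special replacement patterns.</p>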
<p>Communication between VSCode and the webview panel content works via bi-directional message passing. On the VSCode side,
the extension registers a listener function which dispatches received messages to their handler
functions:</p>
<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">onDidReceiveMessage</span><span class="p">((</span><span class="nx">msg</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">command</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">emu_cpustate</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">cpuStateResolved</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">state</span> <span class="k">as</span> <span class="nx">CPUState</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">command</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">emu_disassembly</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">disassemblyResolved</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">result</span> <span class="k">as</span> <span class="nx">DisasmLine</span><span class="p">[]);</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">command</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">emu_memory</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">readMemoryResolved</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">result</span> <span class="k">as</span> <span class="nx">ReadMemoryResult</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">command</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">emu_ready</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">state</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">state</span><span class="p">.</span><span class="nx">ready</span> <span class="o">=</span> <span class="nx">msg</span><span class="p">.</span><span class="nx">isReady</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">KCIDEDebugSession</span><span class="p">.</span><span class="nx">onEmulatorMessage</span><span class="p">(</span><span class="nx">msg</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
</code></pre></div></div>
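<p>The <code class="language-plaintext highlighter-rouge">...Resolved()</code> handler names hint at a request/response pattern on top of the one-way messages: a request is posted, and a promise is resolved once the matching reply arrives. Here is a minimal sketch of that pattern, assuming at most one outstanding request per reply command; <code class="language-plaintext highlighter-rouge">RequestTracker</code> and the <code class="language-plaintext highlighter-rouge">cpuState</code> command name are made up, and the transport is faked:</p>

```typescript
// Hypothetical sketch (not the KC IDE code) of turning message passing into
// awaitable requests. ASSUMPTION: one request per reply command at a time.
type Message = { [key: string]: unknown };
type Resolver = (value: unknown) => void;

class RequestTracker {
  // reply command name -> resolver of the request waiting for that reply
  private pending: { [replyCommand: string]: Resolver | undefined } = {};
  private post: (msg: Message) => void;

  constructor(post: (msg: Message) => void) {
    this.post = post;
  }

  // Post a request; the returned promise resolves with the reply payload.
  request(msg: Message, replyCommand: string) {
    return new Promise((resolve) => {
      this.pending[replyCommand] = resolve;
      this.post(msg);
    });
  }

  // Call from the onDidReceiveMessage listener; returns true if the message
  // answered a pending request.
  handleReply(msg: Message): boolean {
    const command = msg.command;
    if (typeof command !== 'string') {
      return false;
    }
    const resolve = this.pending[command];
    if (resolve === undefined) {
      return false;
    }
    delete this.pending[command];
    // replies in this post carry their payload in either `state` or `result`
    resolve(msg.state !== undefined ? msg.state : msg.result);
    return true;
  }
}

// Usage with a fake transport that answers immediately.
const tracker: RequestTracker = new RequestTracker((msg) => {
  if (msg.cmd === 'cpuState') {
    tracker.handleReply({ command: 'emu_cpustate', state: [0x1234, 0x5678] });
  }
});
tracker.request({ cmd: 'cpuState' }, 'emu_cpustate').then((state) => console.log(state));
```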
<p>…sending a message in the opposite direction (from the debug session to the webview panel) simply
looks like this:</p>
<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">await</span> <span class="nx">state</span><span class="p">.</span><span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">postMessage</span><span class="p">({</span> <span class="na">cmd</span><span class="p">:</span> <span class="dl">'</span><span class="s1">boot</span><span class="dl">'</span> <span class="p">});</span>
</code></pre></div></div>
<p>…the message structure is entirely custom (and I’m just noticing that I’m using <code class="language-plaintext highlighter-rouge">command</code> in one direction,
but <code class="language-plaintext highlighter-rouge">cmd</code> in the other direction… but anyway…).</p>
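<p>One way to keep the two directions from drifting apart is to describe the messages as a TypeScript discriminated union with a single discriminator key. This is only a sketch: the <code class="language-plaintext highlighter-rouge">ToEmulator</code> type and the payload fields (including <code class="language-plaintext highlighter-rouge">data</code> as a plain number array) are assumptions loosely modelled on the handlers shown in this post:</p>

```typescript
// Hypothetical type for extension -> emulator messages; fields are assumptions.
type ToEmulator =
  | { cmd: 'boot' }
  | { cmd: 'reset' }
  | { cmd: 'load'; data: number[] }
  | { cmd: 'disassemble'; addr: number; offsetLines: number; numLines: number }
  | { cmd: 'readMemory'; addr: number; numBytes: number };

// The switch is checked for exhaustiveness, and each branch only sees the fields
// of its own variant; a message with an unknown `cmd` or missing fields is a compile error.
function describeMessage(msg: ToEmulator): string {
  switch (msg.cmd) {
    case 'boot':
      return 'reboot';
    case 'reset':
      return 'reset';
    case 'load':
      return `load ${msg.data.length} bytes`;
    case 'disassemble':
      return `disassemble ${msg.numLines} lines at 0x${msg.addr.toString(16)}`;
    case 'readMemory':
      return `read ${msg.numBytes} bytes at 0x${msg.addr.toString(16)}`;
  }
}

console.log(describeMessage({ cmd: 'readMemory', addr: 0x200, numBytes: 16 }));
// prints "read 16 bytes at 0x200"
```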
<p>There is one missing piece in the communication between the VSCode debug session on one side and the emulator on the other: a <a href="https://github.com/floooh/vscode-kcide/blob/main/media/shell.js">Javascript shim</a>
running in the context of the webpage, which translates between the JSON-like message objects sent and received
by the VSCode debug session and a lower-level WASM function call interface implemented by the emulator.</p>
<p>When a message is received from the VSCode debug session in the emulator’s HTML page, it’s dispatched to a Javascript function via
an event listener added to the <code class="language-plaintext highlighter-rouge">window</code> object (note that this code is plain Javascript, not Typescript):</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">message</span><span class="dl">'</span><span class="p">,</span> <span class="nx">ev</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">msg</span> <span class="o">=</span> <span class="nx">ev</span><span class="p">.</span><span class="nx">data</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">cmd</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="dl">'</span><span class="s1">boot</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_boot</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="dl">'</span><span class="s1">reset</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_reset</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="dl">'</span><span class="s1">ready</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_ready</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="dl">'</span><span class="s1">load</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_load</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">data</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span>
<span class="c1">// ...</span>
<span class="k">case</span> <span class="dl">'</span><span class="s1">disassemble</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_dbgDisassemble</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">addr</span><span class="p">,</span> <span class="nx">msg</span><span class="p">.</span><span class="nx">offsetLines</span><span class="p">,</span> <span class="nx">msg</span><span class="p">.</span><span class="nx">numLines</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="dl">'</span><span class="s1">readMemory</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_dbgReadMemory</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">addr</span><span class="p">,</span> <span class="nx">msg</span><span class="p">.</span><span class="nx">numBytes</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span>
<span class="nl">default</span><span class="p">:</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`unknown cmd called: </span><span class="p">${</span><span class="nx">msg</span><span class="p">.</span><span class="nx">cmd</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">});</span>
</code></pre></div></div>
<p>Such a handler function looks like this:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">kcide_boot</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">Module</span><span class="p">.</span><span class="nx">_webapi_boot</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This is an ‘Emscripten-ism’. The easiest way to export a C function from WASM to Javascript is via the
<code class="language-plaintext highlighter-rouge">EMSCRIPTEN_KEEPALIVE</code> attribute in the C source, like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">EMSCRIPTEN_KEEPALIVE</span> <span class="kt">void</span> <span class="nf">webapi_boot</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">inited</span> <span class="o">&&</span> <span class="n">state</span><span class="p">.</span><span class="n">funcs</span><span class="p">.</span><span class="n">boot</span><span class="p">)</span> <span class="p">{</span>
<span class="n">state</span><span class="p">.</span><span class="n">funcs</span><span class="p">.</span><span class="n">boot</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>When Emscripten builds the project, it keeps track of all <code class="language-plaintext highlighter-rouge">EMSCRIPTEN_KEEPALIVE</code> C functions and makes them
available as Javascript functions (with an underscore prepended to the name) on a global <code class="language-plaintext highlighter-rouge">Module</code> object created by the Emscripten entry stub. Calling
such an <code class="language-plaintext highlighter-rouge">EMSCRIPTEN_KEEPALIVE</code> C function from the Javascript side then looks like this:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nx">Module</span><span class="p">.</span><span class="nx">_webapi_boot</span><span class="p">();</span>
</code></pre></div></div>
<p>…and that’s essentially how the communication between VSCode and the WASM emulator works. For instance, when the command
<code class="language-plaintext highlighter-rouge">KCIDE: Reboot Emulator</code> is run from the VSCode command palette, the C function <code class="language-plaintext highlighter-rouge">webapi_boot()</code> in the WASM emulator eventually gets
called, which reboots the emulator.</p>
<p>Currently the emulators implement the following ‘web API’ functions callable from Javascript:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">webapi_dbg_connect</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// a VSCode debug session has started</span>
<span class="kt">void</span> <span class="nf">webapi_dbg_disconnect</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// a VSCode debug session has ended</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">webapi_alloc</span><span class="p">(</span><span class="kt">int</span> <span class="n">size</span><span class="p">);</span> <span class="c1">// helper function to allocate on the WASM heap from Javascript</span>
<span class="kt">void</span> <span class="nf">webapi_free</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">);</span> <span class="c1">// helper function to free memory allocated via webapi_alloc()</span>
<span class="kt">void</span> <span class="nf">webapi_boot</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// reboot the emulator (e.g. switch off and on)</span>
<span class="kt">void</span> <span class="nf">webapi_reset</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// reset the emulator (e.g. press the reset button)</span>
<span class="n">bool</span> <span class="nf">webapi_ready</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// returns true when the emulator is ready to start a debug session after rebooting</span>
<span class="n">bool</span> <span class="nf">webapi_load</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">size</span><span class="p">);</span> <span class="c1">// load binary data into the emulator</span>
<span class="kt">void</span> <span class="nf">webapi_dbg_add_breakpoint</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">);</span> <span class="c1">// add a debug breakpoint at a 16-bit address</span>
<span class="kt">void</span> <span class="nf">webapi_dbg_remove_breakpoint</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">);</span> <span class="c1">// delete a debug breakpoint at a 16-bit address</span>
<span class="kt">void</span> <span class="nf">webapi_dbg_break</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// break into the debugger</span>
<span class="kt">void</span> <span class="nf">webapi_dbg_continue</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// continue execution when stopped in debugger</span>
<span class="kt">void</span> <span class="nf">webapi_dbg_step_next</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// execute a 'step over' in the debugger</span>
<span class="kt">void</span> <span class="nf">webapi_dbg_step_into</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// execute a 'step into' in the debugger</span>
<span class="kt">uint16_t</span><span class="o">*</span> <span class="nf">webapi_dbg_cpu_state</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// request a raw 'CPU state' dump (current register values)</span>
<span class="n">webapi_dasm_line_t</span><span class="o">*</span> <span class="nf">webapi_dbg_request_disassembly</span><span class="p">(</span><span class="cm">/*...*/</span><span class="p">);</span> <span class="c1">// request a disassembly dump over a range of addresses</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="nf">webapi_dbg_read_memory</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">num_bytes</span><span class="p">);</span> <span class="c1">// request a memory dump over a range of addresses</span>
</code></pre></div></div>
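<p>The <code class="language-plaintext highlighter-rouge">webapi_alloc()</code>/<code class="language-plaintext highlighter-rouge">webapi_free()</code> helpers suggest how binary data gets into the emulator: allocate a buffer on the WASM heap, copy the bytes in, call <code class="language-plaintext highlighter-rouge">webapi_load()</code>, then free the buffer. Below is a TypeScript sketch of that flow with a fake module so it runs outside the browser. The <code class="language-plaintext highlighter-rouge">EmuModule</code> interface, the helper names, and the assumption that <code class="language-plaintext highlighter-rouge">webapi_load()</code> copies the data (so the buffer can be freed right away) are mine, not taken from the KC IDE sources. Note that a C <code class="language-plaintext highlighter-rouge">bool</code> returned from WASM arrives in Javascript as a number, and that newer Emscripten versions may require <code class="language-plaintext highlighter-rouge">HEAPU8</code> to be listed in <code class="language-plaintext highlighter-rouge">EXPORTED_RUNTIME_METHODS</code>:</p>

```typescript
// ASSUMED minimal typing for the parts of the Emscripten Module object used here.
interface EmuModule {
  HEAPU8: Uint8Array; // byte view on the WASM heap
  _webapi_alloc(size: number): number; // returns a pointer (offset into HEAPU8)
  _webapi_free(ptr: number): void;
  _webapi_load(ptr: number, size: number): number; // C bool arrives as 0 or 1
  _webapi_dbg_read_memory(addr: number, numBytes: number): number;
}

// Copy bytes into the WASM heap, hand them to the emulator, then free the buffer.
// ASSUMPTION: webapi_load() copies the data and doesn't keep the pointer.
function loadIntoEmulator(mod: EmuModule, data: Uint8Array): boolean {
  const ptr = mod._webapi_alloc(data.length);
  try {
    // read mod.HEAPU8 only after allocating, since heap growth replaces the view
    mod.HEAPU8.set(data, ptr);
    return mod._webapi_load(ptr, data.length) !== 0;
  } finally {
    mod._webapi_free(ptr);
  }
}

// Copy a memory dump out of the WASM heap; slice() makes a copy, so the result
// stays valid after the emulator reuses its internal buffer.
function readEmulatorMemory(mod: EmuModule, addr: number, numBytes: number): Uint8Array {
  const ptr = mod._webapi_dbg_read_memory(addr, numBytes);
  return mod.HEAPU8.slice(ptr, ptr + numBytes);
}

// Fake module with a bump allocator, only for trying the helpers outside the browser.
const fakeHeap = new Uint8Array(1024);
let nextFree = 16;
let loadedBytes: number[] = [];
const fakeModule: EmuModule = {
  HEAPU8: fakeHeap,
  _webapi_alloc(size: number): number {
    const ptr = nextFree;
    nextFree += size;
    return ptr;
  },
  _webapi_free(_ptr: number): void {},
  _webapi_load(ptr: number, size: number): number {
    loadedBytes = Array.from(fakeHeap.subarray(ptr, ptr + size));
    return 1;
  },
  // pretend the emulator's memory lives at the same offsets in the fake heap
  _webapi_dbg_read_memory(addr: number, _numBytes: number): number {
    return addr;
  },
};

console.log(loadIntoEmulator(fakeModule, new Uint8Array([0x3e, 0x01, 0xc9]))); // prints true
```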
<p>In the opposite direction (from the emulator to the VSCode debug session), the emulator calls into
the following C callback functions, which in turn call into Javascript to create a JSON-like message
object to send back into the VSCode debug session:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">webapi_event_stopped</span><span class="p">(</span><span class="kt">int</span> <span class="n">stop_reason</span><span class="p">,</span> <span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">);</span> <span class="c1">// debugger has stopped at addr for a specific reason</span>
<span class="kt">void</span> <span class="nf">webapi_event_continued</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// the debugger has continued execution</span>
<span class="kt">void</span> <span class="nf">webapi_event_reboot</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// the emulator has been rebooted</span>
<span class="kt">void</span> <span class="nf">webapi_event_reset</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// the emulator has been reset</span>
</code></pre></div></div>
<p>…in a nutshell, this is the minimal ‘virtual machine’ interface required to implement
a somewhat feature-complete VSCode Debug Adapter.</p>
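<p>The event callbacks are called from the emulator's own debugger logic. The following sketch shows one way a breakpoint check could trigger <code class="language-plaintext highlighter-rouge">webapi_event_stopped()</code> before each instruction is executed. The breakpoint table, the <code class="language-plaintext highlighter-rouge">STOP_REASON_*</code> values, <code class="language-plaintext highlighter-rouge">dbg_add_breakpoint()</code> and <code class="language-plaintext highlighter-rouge">dbg_check_break()</code> are hypothetical. The callback is a stand-in that only logs and records its arguments instead of forwarding a message to Javascript:</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical stop reasons (the real values are defined by the emulator).
enum { STOP_REASON_BREAKPOINT = 1, STOP_REASON_STEP = 2 };

// Stand-in for the real callback, which forwards a 'stopped' message to the
// VSCode debug session via Javascript. This one only logs and records it.
static int last_stop_reason = 0;
static uint16_t last_stop_addr = 0;
void webapi_event_stopped(int stop_reason, uint16_t addr) {
    last_stop_reason = stop_reason;
    last_stop_addr = addr;
    printf("stopped: reason=%d addr=%04X\n", stop_reason, (unsigned)addr);
}

// Hypothetical fixed-size breakpoint table.
#define MAX_BREAKPOINTS (8)
static uint16_t breakpoints[MAX_BREAKPOINTS];
static int num_breakpoints = 0;

// Add a breakpoint, returns false if the table is full.
bool dbg_add_breakpoint(uint16_t addr) {
    if (num_breakpoints >= MAX_BREAKPOINTS) {
        return false;
    }
    breakpoints[num_breakpoints++] = addr;
    return true;
}

// Called by the emulator loop before executing the instruction at pc.
// Notifies the debug session and returns true if execution must halt.
bool dbg_check_break(uint16_t pc) {
    for (int i = 0; i < num_breakpoints; i++) {
        if (breakpoints[i] == pc) {
            webapi_event_stopped(STOP_REASON_BREAKPOINT, pc);
            return true;
        }
    }
    return false;
}
```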
<p>One downside of the <a href="https://microsoft.github.io/debug-adapter-protocol/">Debug Adapter Protocol</a>
is that it is clearly designed for high-level languages, and its feature set
has little overlap with the debugging features that are desirable in an emulator.</p>
<p>But thankfully, the Debug Adapter Protocol is also flexible enough that it can work side
by side with the much more powerful debugger that’s already integrated in the chips-emulators
via Dear ImGui:</p>
<p><img src="/images/vscode-wasm-wasi-3.webp" alt="Screenshot 3" /></p>
<p>…for instance, the embedded Dear ImGui debugger allows stepping the emulator forward by single
clock cycles, while the VSCode debugger only steps at instruction or source-line granularity.</p>
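<p>The difference between the two step granularities can be sketched with a toy CPU model: a 'step' in VSCode has to keep ticking the CPU until the current instruction has completed, while the Dear ImGui debugger can stop after any single tick. The <code class="language-plaintext highlighter-rouge">cpu_t</code> struct, the fixed 1-byte instruction length and the per-instruction cycle counts below are made up for illustration; a real CPU emulator derives them from the decoded opcode:</p>

```c
#include <stdbool.h>
#include <stdint.h>

// Toy CPU model, just enough state to show the two step granularities.
typedef struct {
    uint16_t pc;       // address of the current instruction
    int cycles_left;   // clock cycles remaining in the current instruction
    uint64_t ticks;    // total number of clock cycles executed
} cpu_t;

// Hypothetical cycle count per instruction (4, 7 or 10 cycles depending on
// the address); a real emulator gets this from decoding the opcode.
static int instr_cycles(uint16_t pc) {
    return 4 + (pc % 3) * 3;
}

void cpu_init(cpu_t* cpu, uint16_t pc) {
    cpu->pc = pc;
    cpu->cycles_left = instr_cycles(pc);
    cpu->ticks = 0;
}

// Advance by exactly one clock cycle (Dear ImGui debugger granularity).
// Returns true if this tick completed an instruction.
bool cpu_tick(cpu_t* cpu) {
    cpu->ticks++;
    if (--cpu->cycles_left == 0) {
        cpu->pc++;   // assumption: every instruction is 1 byte long
        cpu->cycles_left = instr_cycles(cpu->pc);
        return true;
    }
    return false;
}

// Tick until the current instruction has completed (VSCode 'step' granularity).
// Returns the number of clock cycles this took.
int cpu_step_instruction(cpu_t* cpu) {
    int cycles = 0;
    bool done;
    do {
        done = cpu_tick(cpu);
        cycles++;
    } while (!done);
    return cycles;
}
```

<p>After a single tick the CPU sits in the middle of an instruction, a state that an instruction-level debugger like the VSCode one never stops in.</p>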
<h2 id="known-issues-and-future-updates">Known Issues and future updates</h2>
<p>There are a couple of issues which are currently worked around or don’t work at all,
and which I want to fix in future updates (most of them only affect the VSCode
web version, so they’re not exactly showstoppers):</p>
<ul>
<li>
<p>Hopefully the <a href="https://github.com/microsoft/vscode-wasm">VSCode WASI extension</a> will leave
pre-release-only mode sooner rather than later; at that point I can also move the KC IDE
extension out of pre-release. The problem is that installing a VSCode extension which depends
on a pre-release-only extension fails to install the dependency, with a cryptic error message. In the worst
case I’ll need to implement my own VSCode WASI runtime, or figure out another way to run the
assembler inside VSCode (maybe as a regular WASM blob which replaces the C stdlib IO calls with
asynchronous functions that take a completion callback and are delegated to Javascript).</p>
</li>
<li>
<p>Currently, any binary blob data that needs to be transferred from VSCode into the emulator has to go through a base64-encoded string,
which is expensive to encode and decode. The reason for that hack is that transferring Uint8Array objects doesn’t
work when VSCode is running in the web (it’s supposed to work, but the data gets corrupted).</p>
</li>
<li>
<p>Working directly on GitHub repositories in the VSCode web version doesn’t work (weird
virtual filesystem issues).</p>
</li>
<li>
<p>…and of course some sort of Language-Server-like editing experience (proper code
completion and error squiggles while typing), but without implementing a full-blown
language server.</p>
</li>
</ul>
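<p>The cost of the base64 workaround is easy to quantify: base64 maps every 3 input bytes to 4 ASCII characters, so the payload grows by about a third, and the data needs an extra encode and decode pass. The post doesn’t say on which side the decoding happens; assuming it were done in C inside the WASM blob, a minimal decoder could look like this (a sketch that only accepts padded input without whitespace, and <code class="language-plaintext highlighter-rouge">b64_decode()</code> is a hypothetical helper name):</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Map a base64 character to its 6-bit value, or -1 if it is not in the alphabet.
static int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode a padded base64 string into dst, which must have room for at least
// 3 * strlen(src) / 4 bytes. Returns the number of decoded bytes,
// or -1 on malformed input.
int b64_decode(const char* src, uint8_t* dst) {
    size_t len = strlen(src);
    if ((len % 4) != 0) return -1;
    int out = 0;
    for (size_t i = 0; i < len; i += 4) {
        int v[4];
        int pad = 0;
        for (int k = 0; k < 4; k++) {
            char c = src[i + k];
            if (c == '=') {
                // padding is only allowed in the last two slots of the final group
                if ((i + 4 != len) || (k < 2)) return -1;
                v[k] = 0;
                pad++;
            } else {
                if (pad > 0) return -1;   // data after padding
                v[k] = b64_value(c);
                if (v[k] < 0) return -1;
            }
        }
        // 4 x 6 bits => 24 bits => up to 3 output bytes
        uint32_t bits = ((uint32_t)v[0] << 18) | ((uint32_t)v[1] << 12) |
                        ((uint32_t)v[2] << 6) | (uint32_t)v[3];
        dst[out++] = (uint8_t)(bits >> 16);
        if (pad < 2) dst[out++] = (uint8_t)(bits >> 8);
        if (pad < 1) dst[out++] = (uint8_t)bits;
    }
    return out;
}
```

<p>A production decoder would usually also skip whitespace and line breaks; this sketch rejects them as malformed input.</p>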
Sun, 31 Dec 2023 00:00:00 +0000
https://floooh.github.io/2023/12/31/vscode-wasm-wasi.html
https://floooh.github.io/2023/12/31/vscode-wasm-wasi.htmlWASM Debugging with Emscripten and VSCode<p><strong>TL;DR</strong>: glueing together VSCode, Cmake and the Emscripten SDK to enable an IDE-like workflow (including debugging).</p>
<p><strong>17-Nov-2024</strong>: looks like the problem that ‘early breakpoints’ are not caught is fixed, woohoo!</p>
<p><strong>09-Oct-2024</strong>: updated for the latest sokol_gfx.h and VSCode extension versions.</p>
<p>This is written from the perspective of a UNIX-like OS (macOS or Linux), but should also work on Windows with some minor tweaks.</p>
<h2 id="prerequisites">Prerequisites</h2>
<p>First make sure that the following tools are in the path:</p>
<ul>
<li>git</li>
<li>cmake</li>
<li>ninja</li>
</ul>
<p>You’ll also need VSCode and Chrome installed.</p>
<p>On macOS I’d recommend using <a href="https://brew.sh/">Homebrew</a> and on Windows <a href="https://scoop.sh/">Scoop</a>
to install those. On Linux, of course, use your system’s standard package manager.</p>
<h2 id="emscripten-hello-world">Emscripten Hello World</h2>
<p>Let’s start from scratch. On the command line:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir </span>hello
<span class="nb">cd </span>hello
git init
</code></pre></div></div>
<p>Add a <code class="language-plaintext highlighter-rouge">.gitignore</code> file:</p>
<p><code class="language-plaintext highlighter-rouge">.gitignore</code></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>build/
emsdk/
</code></pre></div></div>
<p>Install the Emscripten SDK. We’ll do this in a way that leaves no trace on your system once the directory is
deleted, so don’t worry. Still inside the <code class="language-plaintext highlighter-rouge">hello</code> directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone --depth=1 https://github.com/emscripten-core/emsdk
cd emsdk
./emsdk install latest
./emsdk activate --embedded latest
cd ..
</code></pre></div></div>
<p>Don’t forget the <code class="language-plaintext highlighter-rouge">./emsdk activate --embedded latest</code> step! (happens to me all the time)</p>
<p>…let’s check if that worked. Create a <code class="language-plaintext highlighter-rouge">hello.c</code> source file in the <code class="language-plaintext highlighter-rouge">hello</code> project directory:</p>
<p><code class="language-plaintext highlighter-rouge">hello.c</code></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf"><stdio.h></span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hello World!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…compile that into a .wasm/.js pair runnable with node.js:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/upstream/emscripten/emcc hello.c <span class="nt">-o</span> hello.js
</code></pre></div></div>
<p>…there should be a hello.js and hello.wasm file now:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ls
</span>emsdk hello.c hello.js hello.wasm
</code></pre></div></div>
<p>…run the hello.js file via node.js (depending on the emsdk version the path may differ):</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/node/18.20.3_64bit/bin/node hello.js
</code></pre></div></div>
<p>…you should see a <code class="language-plaintext highlighter-rouge">Hello World!</code> printed to the terminal.</p>
<p>Delete the compiler output, we don’t need that anymore:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">rm </span>hello.js hello.wasm
</code></pre></div></div>
<h2 id="cmake--emscripten">CMake + Emscripten</h2>
<p>Let’s bake the build process into a cmake file. Create a CMakeLists.txt
file in the <code class="language-plaintext highlighter-rouge">hello</code> project directory:</p>
<p><code class="language-plaintext highlighter-rouge">CMakeLists.txt</code></p>
<div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cmake_minimum_required</span><span class="p">(</span>VERSION 3.21<span class="p">)</span>
<span class="nb">project</span><span class="p">(</span>hello<span class="p">)</span>
<span class="nb">add_executable</span><span class="p">(</span>hello hello.c<span class="p">)</span>
<span class="nb">if</span> <span class="p">(</span>CMAKE_SYSTEM_NAME STREQUAL Emscripten<span class="p">)</span>
<span class="nb">set</span><span class="p">(</span>CMAKE_EXECUTABLE_SUFFIX .js<span class="p">)</span>
<span class="nb">endif</span><span class="p">()</span>
</code></pre></div></div>
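<p>The <code class="language-plaintext highlighter-rouge">if (CMAKE_SYSTEM_NAME STREQUAL Emscripten)</code> block has a C-side counterpart: the Emscripten compiler predefines the <code class="language-plaintext highlighter-rouge">__EMSCRIPTEN__</code> macro, so source files can branch on the target platform at compile time. A minimal sketch (the function names are just for illustration):</p>

```c
#include <stdbool.h>
#include <stdio.h>

// emcc predefines __EMSCRIPTEN__, native compilers don't.
bool is_emscripten(void) {
#if defined(__EMSCRIPTEN__)
    return true;
#else
    return false;
#endif
}

// Report which toolchain built the program.
void print_greeting(void) {
    printf("Hello World from %s!\n", is_emscripten() ? "Emscripten" : "a native build");
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">print_greeting()</code> from <code class="language-plaintext highlighter-rouge">main()</code> is a quick way to confirm that the CMake preset really picked up the Emscripten toolchain file.</p>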
<p>…and since this is a cross-compilation scenario, let’s also create
a CMakeUserPresets.json file. This simplifies calling cmake with the
right arguments for cross-compilation, and will help us later when
integrating with VSCode:</p>
<p><code class="language-plaintext highlighter-rouge">CMakeUserPresets.json</code></p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w">
</span><span class="nl">"cmakeMinimumRequired"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"major"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w">
</span><span class="nl">"minor"</span><span class="p">:</span><span class="w"> </span><span class="mi">21</span><span class="p">,</span><span class="w">
</span><span class="nl">"patch"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"configurePresets"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"default"</span><span class="p">,</span><span class="w">
</span><span class="nl">"displayName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Emscripten"</span><span class="p">,</span><span class="w">
</span><span class="nl">"binaryDir"</span><span class="p">:</span><span class="w"> </span><span class="s2">"build"</span><span class="p">,</span><span class="w">
</span><span class="nl">"generator"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Ninja Multi-Config"</span><span class="p">,</span><span class="w">
</span><span class="nl">"toolchainFile"</span><span class="p">:</span><span class="w"> </span><span class="s2">"emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">],</span><span class="w">
</span><span class="nl">"buildPresets"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Debug"</span><span class="p">,</span><span class="w">
</span><span class="nl">"configurePreset"</span><span class="p">:</span><span class="w"> </span><span class="s2">"default"</span><span class="p">,</span><span class="w">
</span><span class="nl">"configuration"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Debug"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Release"</span><span class="p">,</span><span class="w">
</span><span class="nl">"configurePreset"</span><span class="p">:</span><span class="w"> </span><span class="s2">"default"</span><span class="p">,</span><span class="w">
</span><span class="nl">"configuration"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Release"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>…let’s configure and build with cmake:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmake <span class="nt">--preset</span> default <span class="nt">-B</span> build
cmake <span class="nt">--build</span> build <span class="nt">--preset</span> Debug
</code></pre></div></div>
<p>…and run with node.js:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/node/18.20.3_64bit/bin/node build/Debug/hello.js
</code></pre></div></div>
<p>…this should again print <code class="language-plaintext highlighter-rouge">Hello World!</code>.</p>
<h2 id="vscode--cmake--emscripten">VSCode + CMake + Emscripten</h2>
<p>Let’s integrate what we have so far with VSCode!</p>
<p>You’ll need the following VSCode extensions:</p>
<ul>
<li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.cpptools">ms-vscode.cpptools</a></li>
<li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.cmake-tools">ms-vscode.cmake-tools</a></li>
<li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.wasm-dwarf-debugging">ms-vscode.wasm-dwarf-debugging</a></li>
<li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.live-server">ms-vscode.live-server</a></li>
</ul>
<p>…with those installed, start VSCode from within the <code class="language-plaintext highlighter-rouge">hello</code> project directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>code .
</code></pre></div></div>
<p>You should see something like this. Pay attention to the status bar at the bottom (underlined in red); these
items are used to control the cmake build config and target:</p>
<p>(<strong>NOTE 09-Oct-2024</strong>: the underlined items in the bottom bar have moved into the CMake Tools sidepanel in recent versions).</p>
<p><img src="/images/emscripten-ide-1.png" alt="VSCode Screenshot 1" /></p>
<p>Clicking those allows you to select a Configure- and Build-Preset, and a build target.</p>
<p>Change those so that it looks like this:</p>
<p><img src="/images/emscripten-ide-2.png" alt="VSCode Screenshot 2" /></p>
<p>Here we also encounter the first wart: the CMake Tools extension isn’t able to communicate the
correct Emscripten sysroot include paths to the C/C++ extension. You’ll see an error squiggle
under the stdio.h include path:</p>
<p><img src="/images/emscripten-ide-3.png" alt="VSCode Screenshot 3" /></p>
<p>I haven’t found a solution to this problem; it looks like a bug in
the CMake Tools extension. Annoying for sure, but not a showstopper, because
only IntelliSense is affected; building works fine.</p>
<p>You can test that by pressing <code class="language-plaintext highlighter-rouge">F7</code>, or run the palette command <code class="language-plaintext highlighter-rouge">CMake: Build</code>. You should
see something like this in the VSCode Output panel:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[main] Building folder: hello
[build] Starting build
[proc] Executing command: /opt/homebrew/bin/cmake --build /Users/floh/scratch/hello/build --config Debug --target hello
[build] [1/2] Building C object CMakeFiles/hello.dir/Debug/hello.c.o
[build] [2/2] Linking C executable Debug/hello.js
[driver] Build completed: 00:00:00.361
[build] Build finished with exit code
</code></pre></div></div>
<h2 id="debugging">Debugging</h2>
<p>…next let’s make debugging work!</p>
<p>Create a <code class="language-plaintext highlighter-rouge">launch.json</code> file in the <code class="language-plaintext highlighter-rouge">.vscode</code> subdirectory:</p>
<p><code class="language-plaintext highlighter-rouge">.vscode/launch.json</code></p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0.2.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"configurations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Launch"</span><span class="p">,</span><span class="w">
</span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"node"</span><span class="p">,</span><span class="w">
</span><span class="nl">"request"</span><span class="p">:</span><span class="w"> </span><span class="s2">"launch"</span><span class="p">,</span><span class="w">
</span><span class="nl">"program"</span><span class="p">:</span><span class="w"> </span><span class="s2">"build/Debug/${command:cmake.launchTargetFilename}"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Pressing <code class="language-plaintext highlighter-rouge">F5</code> should now work. You should see a <code class="language-plaintext highlighter-rouge">Hello World!</code> in the VSCode <code class="language-plaintext highlighter-rouge">Debug Console</code>.</p>
<p>But when trying to debug there’s the next wart. Try to set a breakpoint in the C source code:</p>
<p><img src="/images/emscripten-ide-4.png" alt="VSCode Screenshot 4" /></p>
<p>Now hit <code class="language-plaintext highlighter-rouge">F5</code>. We’d expect that the execution stops at the breakpoint, but that doesn’t happen.</p>
<p>This is a known issue in the DWARF debugging extension. From the <a href="https://code.visualstudio.com/docs/nodejs/nodejs-debugging#_debugging-webassembly">documentation</a>:</p>
<blockquote>
<p>Breakpoints in WebAssembly code are resolved asynchronously, so breakpoints hit early on in a program’s lifecycle may be missed. There are plans to fix this in the future. If you’re debugging in a browser, you can refresh the page for your breakpoint to be hit. If you’re in Node.js, you can add an artificial delay, or set another breakpoint, after your WebAssembly module is loaded but before your desired breakpoint is hit.</p>
</blockquote>
<p>Hopefully this problem will be fixed soon-ish, since it’s currently the most annoying one.</p>
<p>One workaround is to first set a breakpoint in the Javascript launch file at a point where the WASM blob has been loaded.</p>
<p>Load the file <code class="language-plaintext highlighter-rouge">build/Debug/hello.js</code> into the editor, search for the function <code class="language-plaintext highlighter-rouge">callMain</code>, and set a breakpoint there:</p>
<p><img src="/images/emscripten-ide-5.png" alt="VSCode Screenshot 5" /></p>
<p>Press <code class="language-plaintext highlighter-rouge">F5</code> and execution should stop at that breakpoint. Now press <code class="language-plaintext highlighter-rouge">F5</code> again and execution should stop in the C code’s main() function
(assuming that breakpoint is still set):</p>
<p><img src="/images/emscripten-ide-6.png" alt="VSCode Screenshot 6" /></p>
<p>Yay. This is how debugging works for a Node.js Emscripten application.</p>
<h3 id="moving-into-the-web-browser">Moving into the web browser</h3>
<p>Let’s extend our <code class="language-plaintext highlighter-rouge">hello.c</code> to render something in WebGL2.</p>
<p>Clone the sokol repository into the <code class="language-plaintext highlighter-rouge">hello</code> project directory and copy the required headers up into the project:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone <span class="nt">--depth</span><span class="o">=</span>1 https://github.com/floooh/sokol
<span class="nb">cp </span>sokol/sokol_gfx.h sokol/sokol_app.h sokol/sokol_log.h sokol/sokol_glue.h <span class="nb">.</span>
</code></pre></div></div>
<p>…delete the sokol directory since we don’t need it anymore:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm -rf sokol
</code></pre></div></div>
<p>Replace the <code class="language-plaintext highlighter-rouge">hello.c</code> file with the following code which just clears the canvas with
a dynamically changing color:</p>
<p><code class="language-plaintext highlighter-rouge">hello.c</code></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define SOKOL_IMPL
#define SOKOL_GLES3
#include</span> <span class="cpf">"sokol_gfx.h"</span><span class="cp">
#include</span> <span class="cpf">"sokol_app.h"</span><span class="cp">
#include</span> <span class="cpf">"sokol_log.h"</span><span class="cp">
#include</span> <span class="cpf">"sokol_glue.h"</span><span class="cp">
</span>
<span class="k">static</span> <span class="n">sg_pass_action</span> <span class="n">pass_action</span><span class="p">;</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">sg_setup</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">environment</span> <span class="o">=</span> <span class="n">sglue_environment</span><span class="p">(),</span>
<span class="p">.</span><span class="n">logger</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">slog_func</span><span class="p">,</span>
<span class="p">});</span>
<span class="n">pass_action</span> <span class="o">=</span> <span class="p">(</span><span class="n">sg_pass_action</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">load_action</span> <span class="o">=</span> <span class="n">SG_LOADACTION_CLEAR</span><span class="p">,</span>
<span class="p">.</span><span class="n">clear_value</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">frame</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">float</span> <span class="n">g</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">clear_value</span><span class="p">.</span><span class="n">g</span> <span class="o">+</span> <span class="mi">0</span><span class="p">.</span><span class="mo">01</span><span class="n">f</span><span class="p">;</span>
<span class="n">pass_action</span><span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">clear_value</span><span class="p">.</span><span class="n">g</span> <span class="o">=</span> <span class="p">(</span><span class="n">g</span> <span class="o">></span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">)</span> <span class="o">?</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span> <span class="o">:</span> <span class="n">g</span><span class="p">;</span>
<span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span> <span class="p">.</span><span class="n">swapchain</span> <span class="o">=</span> <span class="n">sglue_swapchain</span><span class="p">()</span> <span class="p">});</span>
<span class="n">sg_end_pass</span><span class="p">();</span>
<span class="n">sg_commit</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">cleanup</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">sg_shutdown</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">sapp_desc</span> <span class="nf">sokol_main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
<span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">argc</span><span class="p">;</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">argv</span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="n">sapp_desc</span><span class="p">){</span>
<span class="p">.</span><span class="n">init_cb</span> <span class="o">=</span> <span class="n">init</span><span class="p">,</span>
<span class="p">.</span><span class="n">frame_cb</span> <span class="o">=</span> <span class="n">frame</span><span class="p">,</span>
<span class="p">.</span><span class="n">cleanup_cb</span> <span class="o">=</span> <span class="n">cleanup</span><span class="p">,</span>
<span class="p">.</span><span class="n">window_title</span> <span class="o">=</span> <span class="s">"clear"</span><span class="p">,</span>
<span class="p">.</span><span class="n">icon</span><span class="p">.</span><span class="n">sokol_default</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span>
<span class="p">.</span><span class="n">logger</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">slog_func</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
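<p>The clear-color animation in <code class="language-plaintext highlighter-rouge">frame()</code> can be pulled out into a pure function to check its behaviour without sokol: the green channel grows by 0.01 per frame while red stays at 1.0, so the color fades from red to yellow and then snaps back to red once green exceeds 1.0. The function names below are hypothetical helpers for this sketch, not part of the sokol API:</p>

```c
// Same update rule as in frame(): advance green by 0.01 per frame and
// wrap back to 0.0 once it exceeds 1.0 (red stays at 1.0 throughout).
float next_green(float g) {
    g += 0.01f;
    return (g > 1.0f) ? 0.0f : g;
}

// Count the frames needed for one full red -> yellow -> red cycle,
// starting from the initial clear color (g = 0.0).
int frames_per_cycle(void) {
    float g = 0.0f;
    int frames = 0;
    do {
        g = next_green(g);
        frames++;
    } while (g != 0.0f);
    return frames;
}
```

<p>Depending on float rounding, one cycle takes 100 or 101 frames, so on a 60 Hz display (assuming the frame callback runs once per display refresh) the color loops roughly every 1.7 seconds.</p>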
<p>…we’ll also need to make a few changes to our CMakeLists.txt file. Emscripten
needs to know that we want a program that runs in the browser. To do that we’ll
simply change the executable file extension to <code class="language-plaintext highlighter-rouge">.html</code>. Next we need to tell
Emscripten to link with WebGL2.</p>
<p>Open the CMakeLists.txt file and change the <code class="language-plaintext highlighter-rouge">Emscripten</code> if-block like this:</p>
<p><code class="language-plaintext highlighter-rouge">CMakeLists.txt</code></p>
<div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">if</span> <span class="p">(</span>CMAKE_SYSTEM_NAME STREQUAL Emscripten<span class="p">)</span>
<span class="nb">set</span><span class="p">(</span>CMAKE_EXECUTABLE_SUFFIX .html<span class="p">)</span>
<span class="nb">target_link_options</span><span class="p">(</span>hello PUBLIC -sUSE_WEBGL2=1<span class="p">)</span>
<span class="nb">endif</span><span class="p">()</span>
</code></pre></div></div>
<p>In VSCode press <code class="language-plaintext highlighter-rouge">F7</code> to rebuild the program. This should generate three output files
in the <code class="language-plaintext highlighter-rouge">build/Debug</code> directory:</p>
<ul>
<li>hello.html</li>
<li>hello.js</li>
<li>hello.wasm</li>
</ul>
<p>Let’s try to run that in the browser. On the command line in the project
directory:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/upstream/emscripten/emrun build/Debug/hello.html
</code></pre></div></div>
<p>This should open the system’s default web browser and you should see something like this,
with the orange rectangle cycling between yellow and red:</p>
<p><img src="/images/emscripten-ide-7.png" alt="Browser Screenshot" /></p>
<p>…let’s get rid of the ‘window chrome’ by injecting our own minimal <code class="language-plaintext highlighter-rouge">shell.html</code> file.</p>
<p>In the project directory, create a file <code class="language-plaintext highlighter-rouge">shell.html</code> that looks like this:</p>
<p><code class="language-plaintext highlighter-rouge">shell.html</code></p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"></span>
<span class="nt"><html></span>
<span class="nt"><head></span>
<span class="nt"><meta</span> <span class="na">charset=</span><span class="s">"UTF-8"</span><span class="nt">/></span>
<span class="nt"><title></span>Clear<span class="nt"></title></span>
<span class="nt"><style </span><span class="na">type=</span><span class="s">"text/css"</span><span class="nt">></span>
<span class="nc">.game</span> <span class="p">{</span>
<span class="nl">position</span><span class="p">:</span> <span class="nb">absolute</span><span class="p">;</span>
<span class="nl">top</span><span class="p">:</span> <span class="m">0px</span><span class="p">;</span>
<span class="nl">left</span><span class="p">:</span> <span class="m">0px</span><span class="p">;</span>
<span class="nl">margin</span><span class="p">:</span> <span class="m">0px</span><span class="p">;</span>
<span class="nl">border</span><span class="p">:</span> <span class="m">0</span><span class="p">;</span>
<span class="nl">width</span><span class="p">:</span> <span class="m">100%</span><span class="p">;</span>
<span class="nl">height</span><span class="p">:</span> <span class="m">100%</span><span class="p">;</span>
<span class="nl">overflow</span><span class="p">:</span> <span class="nb">hidden</span><span class="p">;</span>
<span class="nl">display</span><span class="p">:</span> <span class="nb">block</span><span class="p">;</span>
<span class="nl">image-rendering</span><span class="p">:</span> <span class="n">optimizeSpeed</span><span class="p">;</span>
<span class="nl">image-rendering</span><span class="p">:</span> <span class="n">-moz-crisp-edges</span><span class="p">;</span>
<span class="nl">image-rendering</span><span class="p">:</span> <span class="n">-o-crisp-edges</span><span class="p">;</span>
<span class="nl">image-rendering</span><span class="p">:</span> <span class="n">-webkit-optimize-contrast</span><span class="p">;</span>
<span class="nl">image-rendering</span><span class="p">:</span> <span class="n">optimize-contrast</span><span class="p">;</span>
<span class="nl">image-rendering</span><span class="p">:</span> <span class="n">crisp-edges</span><span class="p">;</span>
<span class="nl">image-rendering</span><span class="p">:</span> <span class="n">pixelated</span><span class="p">;</span>
<span class="nl">-ms-interpolation-mode</span><span class="p">:</span> <span class="n">nearest-neighbor</span><span class="p">;</span>
<span class="p">}</span>
<span class="nt"></style></span>
<span class="nt"></head></span>
<span class="nt"><body</span> <span class="na">style=</span><span class="s">"background:black"</span><span class="nt">></span>
<span class="nt"><canvas</span> <span class="na">class=</span><span class="s">"game"</span> <span class="na">id=</span><span class="s">"canvas"</span> <span class="na">oncontextmenu=</span><span class="s">"event.preventDefault()"</span><span class="nt">></canvas></span>
<span class="nt"><script </span><span class="na">type=</span><span class="s">"text/javascript"</span><span class="nt">></span>
<span class="kd">var</span> <span class="nx">Module</span> <span class="o">=</span> <span class="p">{</span>
<span class="na">preRun</span><span class="p">:</span> <span class="p">[],</span>
<span class="na">postRun</span><span class="p">:</span> <span class="p">[],</span>
<span class="na">print</span><span class="p">:</span> <span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="kd">function</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">text</span> <span class="o">=</span> <span class="nb">Array</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">slice</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="nx">arguments</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1"> </span><span class="dl">'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">text</span><span class="p">);</span>
<span class="p">};</span>
<span class="p">})(),</span>
<span class="na">printErr</span><span class="p">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">text</span> <span class="o">=</span> <span class="nb">Array</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">slice</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="nx">arguments</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1"> </span><span class="dl">'</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">text</span><span class="p">);</span>
<span class="p">},</span>
<span class="na">canvas</span><span class="p">:</span> <span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">canvas</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="dl">'</span><span class="s1">canvas</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">canvas</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">"</span><span class="s2">webglcontextlost</span><span class="dl">"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span> <span class="nx">alert</span><span class="p">(</span><span class="dl">'</span><span class="s1">FIXME: WebGL context lost, please reload the page</span><span class="dl">'</span><span class="p">);</span> <span class="nx">e</span><span class="p">.</span><span class="nx">preventDefault</span><span class="p">();</span> <span class="p">},</span> <span class="kc">false</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">canvas</span><span class="p">;</span>
<span class="p">})(),</span>
<span class="na">setStatus</span><span class="p">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span> <span class="p">{</span> <span class="p">},</span>
<span class="na">monitorRunDependencies</span><span class="p">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">left</span><span class="p">)</span> <span class="p">{</span> <span class="p">},</span>
<span class="p">};</span>
<span class="nb">window</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">onerror: </span><span class="dl">"</span> <span class="o">+</span> <span class="nx">event</span><span class="p">.</span><span class="nx">message</span><span class="p">);</span>
<span class="p">};</span>
<span class="nt"></script></span>
{{{ SCRIPT }}}
<span class="nt"></body></span>
<span class="nt"></html></span>
</code></pre></div></div>
<p>…and in the <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> file, change the linker options like this:</p>
<p><code class="language-plaintext highlighter-rouge">CMakeLists.txt</code></p>
<div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nb">target_link_options</span><span class="p">(</span>hello PUBLIC -sUSE_WEBGL2=1 --shell-file=../shell.html<span class="p">)</span>
</code></pre></div></div>
<p>…build the project again by pressing <code class="language-plaintext highlighter-rouge">F7</code> and try opening the result
in the browser:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/upstream/emscripten/emrun build/Debug/hello.html
</code></pre></div></div>
<p>…the WebGL canvas should now stretch over the entire window client area:</p>
<p><img src="/images/emscripten-ide-8.png" alt="Browser Screenshot" /></p>
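<p>The order inside the shell file matters: the inline script that defines <code class="language-plaintext highlighter-rouge">var Module</code> runs before the generated loader script that Emscripten substitutes for the <code class="language-plaintext highlighter-rouge">{{{ SCRIPT }}}</code> placeholder. The following JavaScript is a heavily simplified model of why, assuming a regular (non-<code class="language-plaintext highlighter-rouge">MODULARIZE</code>) build. The generated loader reuses an already existing global <code class="language-plaintext highlighter-rouge">Module</code> object and takes its stdout/stderr handlers from <code class="language-plaintext highlighter-rouge">Module.print</code> and <code class="language-plaintext highlighter-rouge">Module.printErr</code> if they are set. This is an illustration only, not the actual code emitted by emcc. On a real page the two parts live in two separate script blocks.</p>

```javascript
// Heavily simplified model (NOT real emcc output) of how the shell file's
// inline script and the generated loader cooperate in a non-MODULARIZE build.
// On a real page, parts 1 and 2 live in two separate script blocks.

// Part 1: inline script in shell.html, runs first and defines overrides.
var Module = {
  print: (...args) => console.log(args.join(' ')),
  printErr: (...args) => console.error(args.join(' ')),
};

// Part 2: the generated loader (substituted for {{{ SCRIPT }}}) reuses an
// existing global Module object instead of replacing it...
var Module = typeof Module !== 'undefined' ? Module : {};

// ...and picks up the stdout/stderr handlers, falling back to the console.
var out = Module.print || console.log.bind(console);
var err = Module.printErr || console.error.bind(console);

// printf() output from the C side ends up in calls like this one:
out('hello from printf'); // logs "hello from printf"
```

<p>This is why the script with the <code class="language-plaintext highlighter-rouge">print</code>, <code class="language-plaintext highlighter-rouge">printErr</code> and <code class="language-plaintext highlighter-rouge">canvas</code> overrides sits above the placeholder in <code class="language-plaintext highlighter-rouge">shell.html</code>.</p>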
<h3 id="browser-remote-debugging">Browser Remote Debugging</h3>
<p>Now on to the last step: making remote debugging work!</p>
<p>First, <code class="language-plaintext highlighter-rouge">.vscode/launch.json</code> needs to be changed to start a Chrome remote debug session and a local web server:</p>
<p><code class="language-plaintext highlighter-rouge">.vscode/launch.json</code></p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0.2.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"configurations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Launch"</span><span class="p">,</span><span class="w">
</span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"chrome"</span><span class="p">,</span><span class="w">
</span><span class="nl">"request"</span><span class="p">:</span><span class="w"> </span><span class="s2">"launch"</span><span class="p">,</span><span class="w">
</span><span class="nl">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://localhost:3000/build/Debug/${command:cmake.launchTargetFilename}"</span><span class="p">,</span><span class="w">
</span><span class="nl">"preLaunchTask"</span><span class="p">:</span><span class="w"> </span><span class="s2">"StartServer"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>…note the <code class="language-plaintext highlighter-rouge">preLaunchTask</code>: it starts a local web server via the Live Preview VSCode extension before Chrome is launched.</p>
<p>To define the <code class="language-plaintext highlighter-rouge">StartServer</code> task, create a file <code class="language-plaintext highlighter-rouge">.vscode/tasks.json</code> and populate it like this:</p>
<p><code class="language-plaintext highlighter-rouge">.vscode/tasks.json</code></p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2.0.0"</span><span class="p">,</span><span class="w">
</span><span class="nl">"tasks"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"label"</span><span class="p">:</span><span class="w"> </span><span class="s2">"StartServer"</span><span class="p">,</span><span class="w">
</span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"process"</span><span class="p">,</span><span class="w">
</span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"${input:startServer}"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">],</span><span class="w">
</span><span class="nl">"inputs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"startServer"</span><span class="p">,</span><span class="w">
</span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w">
</span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"livePreview.runServerLoggingTask"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>…and that’s it!</p>
<p>When pressing <code class="language-plaintext highlighter-rouge">F5</code>, Chrome should now open and load our program:</p>
<p><img src="/images/emscripten-ide-9.png" alt="Browser Screenshot" /></p>
<p>…while the program is running in the browser, set a breakpoint in <code class="language-plaintext highlighter-rouge">hello.c</code>
at the start of function <code class="language-plaintext highlighter-rouge">void frame(void)</code>. The debugger should now stop
at the function and you can step through the code:</p>
<p><img src="/images/emscripten-ide-10.png" alt="VSCode Screenshot" /></p>
<p>You can now make changes to your code and then compile and
run/debug with <code class="language-plaintext highlighter-rouge">F5</code>. The only downside is a known issue: early breakpoints
(for instance in code that runs right after the program starts) are not caught. There are two workarounds. The first, already mentioned, is to set
a breakpoint on the JS side in the <code class="language-plaintext highlighter-rouge">callMain</code> function in <code class="language-plaintext highlighter-rouge">build/Debug/hello.js</code>
and stop there first. This seems to catch any early breakpoints on the C side too.</p>
<p>The second option for programs with a render loop is to simply restart the debug
session by pressing the ‘Refresh’ button in the VSCode debugger controls:</p>
<p><img src="/images/emscripten-ide-11.png" alt="VSCode Screenshot" /></p>
<p>This also catches early breakpoints on the C side, but pops up a warning
that the ‘Live Preview…’ task is already running. This warning can simply be ignored.</p>
<p>You can also find the project described in this blog post on Github:</p>
<p><a href="https://github.com/floooh/vscode-emscripten-debugging">https://github.com/floooh/vscode-emscripten-debugging</a></p>
<h2 id="known-issues">Known Issues</h2>
<p>Here's the list of issues I stumbled over; hopefully they will be fixed in the future:</p>
<ul>
<li>The CMake Tools extension doesn’t properly communicate the Emscripten system include
path to the C/C++ extension, so IntelliSense doesn’t work for system headers.</li>
<li>The WASM DWARF debugging extension doesn’t catch early breakpoints on the C side
(known issue).</li>
<li>The Live Preview extension pops up a warning when refreshing a debug session.</li>
</ul>
Sat, 11 Nov 2023 00:00:00 +0000
https://floooh.github.io/2023/11/11/emscripten-ide.html