The Brain Dump This is the blog and personal web page of Andre Weissflog (Floh, floooh, flohofwoe) mostly about programming stuff. https://floooh.github.io/ Sat, 21 Feb 2026 10:58:15 +0000 Sat, 21 Feb 2026 10:58:15 +0000 Jekyll v3.10.0 The experimental Sokol Vulkan backend <p>Update: merge happened on <a href="https://github.com/floooh/sokol/blob/master/CHANGELOG.md#02-dec-2025">02-Dec-2025</a>.</p> <p>In a couple of days I will merge the first implementation of a sokol-gfx Vulkan backend. Please consider this backend as ‘experimental’, it has only received limited testing, has limited platform coverage and some known shortcomings and feature gaps which I will address in followup updates.</p> <p>The related PRs are here:</p> <ul> <li><a href="https://github.com/floooh/sokol/pull/1350">sokol/#1350</a> - this one also has all the embedded shaders for the sokol ‘utility headers’, so it looks much bigger than it actually is (the Vulkan backend is around the same size as the GL backend, a bit over 3 kloc)</li> <li><a href="https://github.com/floooh/sokol-tools/pull/196">sokol-tools/#196</a> - this is the update for the shader compiler which is already merged</li> </ul> <p>The currently known limitations are:</p> <ul> <li>the entire code expects a ‘desktop GPU feature set’ and doesn’t implement fallback paths for mobile or generally ancient GPUs</li> <li>the window system glue in sokol_app.h is only implemented for Linux/X11 - and before the question comes up again: it works just fine on Wayland-only distros</li> <li>only tested on an Intel Meteor Lake integrated GPU (which also means that some buffer types may be allocated in memory types that are not optimal on GPUs without unified memory)</li> <li>barriers for CPU =&gt; GPU updates are currently quite conservative (e.g. 
more barriers might be inserted than needed, or too early in a frame)</li> <li>there’s currently no GPU memory allocator, nor a way to inject an external GPU memory allocator like VMA (at least the latter is planned)</li> <li>rendering is currently only supported to a single swapchain (not a problem when used with sokol_app.h because that also only supports a single window)</li> <li>it’s currently not possible to inject native Vulkan buffers and images into sokol-gfx (that’s a somewhat esoteric feature supported by the other backends)</li> <li>I couldn’t get RenderDoc to work, but it’s unclear why</li> </ul> <p>On the upside:</p> <ul> <li>no sokol-gfx API or shader-authoring changes are required (there are some minor breaking API changes because of some code cleanup work I had planned already and which are not directly related to Vulkan, but most code should work without or with only minimal changes)</li> <li>the Vulkan validation layer is silent on all sokol-samples (which try to cover most sokol-gfx features and their combined usage), and this includes the tricky optional synchronization2 validations (I’m pretty proud of that considering that most Vulkan samples I tried have sync-validation errors)</li> <li>performance on my Intel Meteor Lake laptop in the <a href="https://floooh.github.io/sokol-html5/drawcallperf-sapp.html">drawcallperf-sample</a> is already slightly better than the OpenGL backend (on a vanilla Kubuntu system)</li> </ul> <p>It’s also important to understand what actually motivated the Vulkan backend (e.g. why now, and not earlier or much later):</p> <p>It’s <em>not</em> mainly about performance, but about ‘future potential’ and OpenGL rot. Essentially, the Vulkan backend is the first step towards deprecating the OpenGL backend: first, an alternative to WebGL2 had to happen (which exists now with WebGPU), and next an alternative for OpenGL on Linux (and less important: Android) had to be implemented (which is the Vulkan backend). 
So far Linux and Android were the only sokol-gfx target platforms limited to a single backend: OpenGL. All other target platforms already have a more modern alternative (Windows with D3D11 and macOS/iOS with Metal). Deprecating the OpenGL backend won’t happen for a while, but personally I can’t wait to free sokol-gfx from the ‘shackles of OpenGL’ ;)</p> <p>Also another reason why I felt that now is the right time to tackle Vulkan support is that the Vulkan API has improved quite a bit since 1.0 in ways that make it a much better fit for sokol-gfx. In a nutshell (if you already know Vulkan concepts), the sokol-gfx backend makes use of the following ‘modern’ Vulkan features:</p> <ul> <li>‘dynamic rendering’ (e.g. render passes are enclosed by begin/end calls instead of being baked into render-pass objects) - e.g. pretty much a copy of the Metal render pass model. This is a perfect match for sokol-gfx sg_begin_pass()/sg_end_pass()</li> <li><code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> - this is a controversial choice, but it’s a perfect match for the sokol-gfx resource binding model and I really did not want to deal with the traditional rigid Vulkan descriptor API (which is an overengineered boondoggle if I’ve ever seen one). This is also the main reason why mobile GPUs had to be left out for now, and apparently descriptor buffers are also a poor match for NVIDIA GPUs. 
The plan here is to wait until Khronos completes work on a descriptor pool replacement which AFAIK will be a mix of descriptor buffers and D3D12-style descriptor heaps and then port the <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> code over to that new resource binding API</li> <li>‘synchronization2’ (not a drastic change from the original barrier model, I’m just listing it here for completeness)</li> </ul> <p>Work on the Vulkan backend spans three sub-projects:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sokol-shdc</code>: added Vulkan-flavoured SPIRV output</li> <li><code class="language-plaintext highlighter-rouge">sokol_app.h</code>: device creation, swapchain management and frame loop</li> <li><code class="language-plaintext highlighter-rouge">sokol_gfx.h</code>: rendering and compute features</li> </ul> <h2 id="sokol-shdc-changes">sokol-shdc changes</h2> <p>From the outside, the shader compiler changes are minimal (so minimal that the update has actually already been live for a little while).</p> <p>The only change is that a new output shader format has been added: <code class="language-plaintext highlighter-rouge">spirv_vk</code> for ‘Vulkan-flavoured SPIRV’. 
To compile a GLSL input shader to SPIRV:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sokol-shdc -i bla.glsl -o bla.h -l spirv_vk </code></pre></div></div> <p>Internally the changes are also fairly small since sokol-shdc input shaders are already authored in ‘Vulkan-flavoured GLSL’, the only missing information is the descriptor set for resource bindings.</p> <p>Sokol-shdc shaders only declare a bindslot on resource bindings with different ‘bind spaces’ for uniform blocks, samplers and anything else, for instance:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">fs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">tex</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">smp</span><span class="p">;</span> </code></pre></div></div> <p>Sokol-shdc performs a backend-specific bindslot allocation which for SPIRV output assigns descriptor sets (uniform blocks live in descriptor set 0 and everything else in descriptor set 1), and remaps sampler bindings to resolve bindslot collisions with textures, storage buffers and storage images, so the above code snippet essentially becomes:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">set</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">fs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> <span class="k">layout</span><span class="p">(</span><span class="n">set</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">tex</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">set</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">binding</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">smp</span><span class="p">;</span> </code></pre></div></div> <p>The one thing that’s not straightforward is that sokol-shdc does a ‘double-tap’ for SPIRV-output:</p> <ul> <li>the input shader code is compiled from GLSL to SPIRV</li> <li>SPIRVTools optimizer passes are applied to the SPIRV</li> <li>bindings are remapped (in this case: simply add descriptor set decorators but keep the bindslots intact)</li> <li>the SPIRV is translated back to GLSL via SPIRVCross</li> <li>finally the SPIRVCross output is compiled <em>again</em> to SPIRV</li> </ul> <p>The weird double compilation is a compromise to avoid large structural changes to the sokol-shdc code base and make the Vulkan shader pipeline less of a special case. 
Essentially, SPIRV is used as an intermediate format in the first compile pass, and then as output bytecode format in the second pass.</p> <h2 id="sokol_apph-changes">sokol_app.h changes</h2> <p>Apart from the actual Vulkan-related update I took the opportunity to do some public API cleanup which was rolling around in my head for a while.</p> <p>First, the backend-specific config options in the <code class="language-plaintext highlighter-rouge">sapp_desc</code> struct are now grouped into per-backend-nested structs, e.g. from this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sapp_desc</span> <span class="nf">sokol_main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sapp_desc</span><span class="p">){</span> <span class="c1">// ...</span> <span class="p">.</span><span class="n">win32_console_utf8</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">win32_console_attach</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">html5_bubble_mouse_events</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">html5_use_emsc_set_main_loop</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>…to this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sapp_desc</span> <span class="nf">sokol_main</span><span class="p">(</span><span class="kt">int</span> <span 
class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sapp_desc</span><span class="p">){</span> <span class="c1">// ...</span> <span class="p">.</span><span class="n">win32</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">console_utf8</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">console_attach</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">html5</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">bubble_mouse_events</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">use_emsc_set_main_loop</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>A new enum <code class="language-plaintext highlighter-rouge">sapp_pixel_format</code> has been introduced which will play a bigger role in the future to allow more configuration options for the sokol-app swapchain.</p> <p>A ton of backend-specific functions to query backend-specific objects have been merged to better harmonize with sokol-gfx:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_metal_get_device</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span 
class="nf">sapp_metal_get_current_drawable</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_metal_get_depth_stencil_texture</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_metal_get_msaa_color_texture</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_device</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_device_context</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_render_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_resolve_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_d3d11_get_depth_stencil_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_wgpu_get_device</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_wgpu_get_render_view</span><span class="p">(</span><span 
class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_wgpu_get_resolve_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">sapp_wgpu_get_depth_stencil_view</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="kt">uint32_t</span> <span class="nf">sapp_gl_get_framebuffer</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> </code></pre></div></div> <p>…those have been merged into:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sapp_environment</span> <span class="nf">sapp_get_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="n">sapp_swapchain</span> <span class="nf">sapp_get_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> </code></pre></div></div> <p>The new structs <code class="language-plaintext highlighter-rouge">sapp_environment</code> and <code class="language-plaintext highlighter-rouge">sapp_swapchain</code> conceptually plug into the sokol-gfx structs <code class="language-plaintext highlighter-rouge">sg_environment</code> and <code class="language-plaintext highlighter-rouge">sg_swapchain</code> (with the emphasis on <strong>conceptually</strong>, you still need a mapping from the sokol-app structs and enums to the sokol-gfx structs and enums, and this mapping is still performed by the sokol_glue.h header).</p> <p>That’s it for the public API changes in sokol_app.h, now on to the Vulkan specific parts:</p> <p>The new struct <code class="language-plaintext highlighter-rouge">sapp_environment</code> contains a nested struct <code class="language-plaintext 
highlighter-rouge">sapp_vulkan_environment vulkan;</code> with Vulkan object pointers (as type-erased void-pointers so that they can be tunneled through backend-agnostic code):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sapp_vulkan_environment</span> <span class="p">{</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">physical_device</span><span class="p">;</span> <span class="c1">// VkPhysicalDevice</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">device</span><span class="p">;</span> <span class="c1">// VkDevice</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">queue</span><span class="p">;</span> <span class="c1">// VkQueue</span> <span class="kt">uint32_t</span> <span class="n">queue_family_index</span><span class="p">;</span> <span class="p">}</span> <span class="n">sapp_vulkan_environment</span><span class="p">;</span> </code></pre></div></div> <p>…and likewise the new struct <code class="language-plaintext highlighter-rouge">sapp_swapchain</code> contains a nested struct <code class="language-plaintext highlighter-rouge">sapp_vulkan_swapchain vulkan;</code> with Vulkan object pointers which are needed for a sokol-gfx swapchain render pass:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sapp_vulkan_swapchain</span> <span class="p">{</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">render_image</span><span class="p">;</span> <span class="c1">// VkImage</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">render_view</span><span 
class="p">;</span> <span class="c1">// VkImageView</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">resolve_image</span><span class="p">;</span> <span class="c1">// VkImage</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">resolve_view</span><span class="p">;</span> <span class="c1">// VkImageView</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">depth_stencil_image</span><span class="p">;</span> <span class="c1">// VkImage</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">depth_stencil_view</span><span class="p">;</span> <span class="c1">// VkImageView</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">render_finished_semaphore</span><span class="p">;</span> <span class="c1">// VkSemaphore</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span> <span class="n">present_complete_semaphore</span><span class="p">;</span> <span class="c1">// VkSemaphore</span> <span class="p">}</span> <span class="n">sapp_vulkan_swapchain</span><span class="p">;</span> </code></pre></div></div> <p>The Vulkan-specific startup code path looks like this (the usual boilerplate-heavy initialization dance):</p> <ul> <li>A <code class="language-plaintext highlighter-rouge">VkInstance</code> object is created.</li> <li>A platform- and window-system-specific <code class="language-plaintext highlighter-rouge">VkSurfaceKHR</code> object is created, this is essentially the glue between a Vulkan swapchain and a specific window system. 
In the first release this window system glue code is only implemented for X11 via <code class="language-plaintext highlighter-rouge">vkCreateXlibSurfaceKHR</code>.</li> <li>A <code class="language-plaintext highlighter-rouge">VkPhysicalDevice</code> is picked, this is the first time where the sokol-app backend takes a couple of shortcuts, initialization will fail if: <ul> <li>EXT_descriptor_buffer is not supported (this currently rules out most mobile devices)</li> <li>the supported Vulkan API version is not at least 1.3</li> <li>no ‘queue family’ exists which supports graphics, compute, transfer and presentation commands all on the same queue</li> </ul> </li> <li>Next a logical <code class="language-plaintext highlighter-rouge">VkDevice</code> object is created with the following required features and extensions (with the exception of compressed texture formats which are optional): <ul> <li>a single queue for all commands</li> <li>EXT_descriptor_buffer</li> <li>extendedDynamicState</li> <li>bufferDeviceAddress</li> <li>dynamicRendering</li> <li>synchronization2</li> <li>samplerAnisotropy</li> <li>optional: <ul> <li>textureCompressionBC</li> <li>textureCompressionETC2</li> <li>textureCompressionASTC_LDR</li> </ul> </li> </ul> </li> <li>The swapchain is initialized: <ul> <li>a <code class="language-plaintext highlighter-rouge">VkSwapchainKHR</code> object is created: <ul> <li>pixel format currently either RGBA8 or BGRA8 (no sRGB)</li> <li>present-mode hardwired to <code class="language-plaintext highlighter-rouge">VK_PRESENT_MODE_FIFO_KHR</code></li> <li>composite-alpha hardwired to <code class="language-plaintext highlighter-rouge">VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR</code></li> </ul> </li> <li><code class="language-plaintext highlighter-rouge">VkImage</code> and <code class="language-plaintext highlighter-rouge">VkImageView</code> objects are obtained or created for the swapchain images, depth-stencil-buffer and optional MSAA surface</li> </ul> </li> <li>Finally a 
couple of VkSemaphore objects are created for each swapchain image (the number of swapchain images is essentially dictated by the Vulkan driver): <ul> <li>one <code class="language-plaintext highlighter-rouge">render_finished_semaphore</code> which signals that the GPU has finished rendering to a swapchain surface</li> <li>one <code class="language-plaintext highlighter-rouge">present_complete_semaphore</code> which signals that presenting a swapchain image has completed and the image is ready for reuse</li> </ul> </li> </ul> <p>At this point, the Vulkan specific code in sokol_app.h is at about 600 lines of code, which is a lot of boilerplate, but OTH is a lot less messy than the combined OpenGL window system code for GLX, EGL, WGL or NSOpenGL (yet still a lot more than the window system glue for the other backends).</p> <p>The actually interesting stuff happens in the last two Vulkan backend functions:</p> <p>The internal function <code class="language-plaintext highlighter-rouge">_sapp_vk_swapchain_next()</code> is a wrapper around <code class="language-plaintext highlighter-rouge">vkAcquireNextImageKHR()</code> and obtains the next free swapchain image. The function will also signal the associated <code class="language-plaintext highlighter-rouge">present_complete_semaphore</code>.</p> <p>The last function in the sokol-app Vulkan backend is <code class="language-plaintext highlighter-rouge">_sapp_vk_present()</code>, this is a wrapper for <code class="language-plaintext highlighter-rouge">vkQueuePresentKHR()</code>. The present operation uses the <code class="language-plaintext highlighter-rouge">render_finished_semaphore</code> to make sure that presentation happens after the GPU has finished rendering to the swapchain image. 
When the <code class="language-plaintext highlighter-rouge">vkQueuePresentKHR()</code> function returns with <code class="language-plaintext highlighter-rouge">VK_ERROR_OUT_OF_DATE_KHR</code> or <code class="language-plaintext highlighter-rouge">VK_SUBOPTIMAL_KHR</code>, the swapchain resources are recreated (this happens for instance when the window is resized).</p> <p>There’s a couple of open todo points in the sokol-app Vulkan backend which I’ll take care of later:</p> <ul> <li>Any non-success return values from <code class="language-plaintext highlighter-rouge">vkAcquireNextImageKHR()</code> are currently only logged but not handled. Normally the application is either supposed to re-create the swapchain resources or skip rendering and presentation. Since I couldn’t coerce my Kubuntu laptop to ever return a non-success value from <code class="language-plaintext highlighter-rouge">vkAcquireNextImageKHR()</code> I would have to implement behaviour I couldn’t test, so I had to skip this part for now. Maybe when moving the code over to my Windows/NVIDIA PC I’ll be able to handle that situation properly.</li> <li>Currently the swapchain image size must match the window client rectangle size (same as OpenGL via GLX). The Vulkan swapchain API has an optional scaling feature, but I couldn’t get this to work on my Kubuntu laptop. Window-system scaling is mainly useful when the system has a high-dpi display but lower-end GPU, and all other sokol-app backends depend on the system to scale a smaller framebuffer to the window client rectangle when needed.</li> </ul> <p>The main area I struggled with in the sokol-app Vulkan backend was swapchain resizing. 
Most sokol-app backends kick off any swapchain resize operation from the window system’s resize event, e.g.:</p> <ul> <li>window is resized by user</li> <li>window system resize event fires giving the new window size</li> <li>sokol-app listens for the window system resize event and initiates a swapchain resize with the new size coming from the window system event, then stores the new size for sapp_width/height() and finally fires an <code class="language-plaintext highlighter-rouge">SAPP_EVENTTYPE_RESIZED</code> event</li> </ul> <p>This doesn’t work on the Vulkan backend, the validation layer would sometimes complain that there’s a difference between actual and expected swapchain surface dimensions (I forgot the exact error circumstances, forgivable since implementing a Vulkan backend is basically crawling from one validation layer error to the next).</p> <p>Long story short: I got it to work by leaving the host window system entirely out of the loop and letting the Vulkan swapchain take full control of the resize process:</p> <ul> <li>window is resized by user</li> <li>window system resize event fires, but is now ignored by sokol-app</li> <li>the next time <code class="language-plaintext highlighter-rouge">vkQueuePresentKHR()</code> is called it returns with an error code and this triggers a swapchain-resource resize, with the size coming from the Vulkan surface object instead of the window system, finally an <code class="language-plaintext highlighter-rouge">SAPP_EVENTTYPE_RESIZED</code> event is fired</li> </ul> <p>This fixes any validation layer warnings and is in the end a cleaner implementation compared to letting the window system dictate the swapchain size.</p> <p>There are downsides though: At least on my Kubuntu laptop it looks like the window system and Vulkan swapchain code don’t run in lock step. 
Instead the Vulkan swapchain seems to lag behind the window system a bit and this results in minor artefacts during resizing: sometimes there’s a visible gap between the Vulkan surface and window border, and the frame rate gets slightly out of whack during resize. In comparison, on macOS rendering with Metal during window resize is buttery smooth and without resize-jitter or border-gaps (although tbf, removing the resize-jitter on macOS had to be explicitly implemented by anchoring the NSView object to a window border).</p> <p>That’s all there is to the Vulkan backend in sokol_app.h, on to sokol_gfx.h!</p> <h2 id="sokol_gfxh-changes">sokol_gfx.h changes</h2> <p>For the most part, the actual mapping of the sokol-gfx functions to Vulkan API functions is very straightforward, often the mapping is 1:1. This is mainly thanks to using a couple of modern Vulkan features and extensions:</p> <ul> <li>Dynamic rendering (e.g. <code class="language-plaintext highlighter-rouge">vkCmdBeginRendering()/vkCmdEndRendering()</code>) is a perfect match for sokol-gfx <code class="language-plaintext highlighter-rouge">sg_begin_pass()/sg_end_pass()</code>, this is not very surprising though because the dynamic rendering Vulkan API is basically a ‘de-OOP-ed’ version of the Metal render pass API.</li> <li><code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> is an absolutely perfect match for sokol-gfx’s <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call, and a ‘pretty good’ match for <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code></li> </ul> <p>The main areas for future improvements are the barrier system and the staging system, but let’s not get ahead of ourselves.</p> <h3 id="a-10000-foot-view">A 10000 foot view</h3> <p>Apart from the straight mapping of sokol-gfx API calls to Vulkan-API calls, the Vulkan backend has to implement a couple of low-level subsystems. 
This isn’t all that unusual, other backends also have such subsystems, but the Vulkan backend definitely is the most ‘subsystem heavy’.</p> <p>OTH some concepts of modern Vulkan are quite similar to WebGPU, Metal and even D3D11 - and this conceptual overlap significantly simplified the Vulkan backend implementation.</p> <p>In some areas the Vulkan backend has even more straightforward implementations than some of the other backends, for instance the implementation of the resource binding call <code class="language-plaintext highlighter-rouge">sg_apply_bindings</code> in the Vulkan backend is one of the most straightforward of all backends and especially compared to the WebGPU backend. In Vulkan it’s literally just a bunch of memcpy’s followed by a single Vulkan API call to record an offset into the descriptor buffer (ok, it’s actually a bit more complicated because of the barrier system). Compared to that, the WebGPU backend needs to use a ‘hash-and-cache’ approach for baked BindGroup objects, e.g. calling <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> may involve creating and destroying WebGPU objects.</p> <p>The low-level subsystems in the sokol-gfx Vulkan backend are:</p> <ul> <li>a ‘delete queue’ system for delayed Vulkan object destruction</li> <li>the GPU memory allocation system (very rudimentary at the moment)</li> <li>the frame-sync system (e.g. ensuring that the CPU and GPU can work in parallel in typical render frames)</li> <li>the uniform update system</li> <li>the bindings update system</li> <li>two ‘staging systems’ for copying CPU-side data into GPU-side resources: <ul> <li>a ‘copy’ staging system</li> <li>a ‘stream’ staging system</li> </ul> </li> <li>the resource barrier system</li> </ul> <p>Let’s look at those one by one:</p> <h3 id="the-delete-queue-system">The Delete Queue System</h3> <p>Vulkan doesn’t have any automatic lifetime management like some other 3D APIs (e.g. no D3D-style reference counting). 
When you call a destroy function on an object, it’s gone. When you do that while the object is still in flight (e.g. referenced in a queue and waiting to be consumed by the GPU), hilarity ensues.</p> <p>IMHO this is much better than any automatic lifetime management system, because it avoids any confusion about reference counts (e.g. questions like: when I call this function to get an object reference, will that bump the refcount or not?), but this means that a Vulkan backend needs to implement some sort of garbage collection on its own.</p> <p>Sokol-gfx uses a double-buffered delete-queue system for this. Each ‘double-buffer-frame-context’ owns a delete queue which is a simple fixed-size array of pointer-pairs. Each queue item consists of:</p> <ul> <li>one type-erased Vulkan object pointer (e.g. a void-pointer)</li> <li>a function pointer for a destructor function which takes a void* as argument and knows how to destroy that Vulkan object</li> </ul> <p>All Vulkan object types which may be referenced in command buffers will not call their <code class="language-plaintext highlighter-rouge">vkDestroy*()</code> functions directly, but instead add them to the delete-queue that’s associated with the currently recorded command buffer. At the start of a new frame (what ‘new frame’ actually means is explained down in the ‘frame-sync system’), the delete-queue for that frame-context is drained by calling the destructor function with the Vulkan object pointer of a queue item. This makes sure that any Vulkan objects are kept alive until the GPU has finished processing any command buffers which might hold references to those objects.</p> <h3 id="the-gpu-memory-allocation-system">The GPU Memory Allocation System</h3> <p>Currently GPU allocations do <em>not</em> go through a custom allocator, instead all granular allocations directly call into <code class="language-plaintext highlighter-rouge">vkAllocateMemory()</code>. 
Originally I had intended to use SebAaltonen’s <a href="https://github.com/sebbbi/OffsetAllocator">OffsetAllocator</a> as the default GPU allocator, but also expose an allocator interface to allow users to inject more complex allocators like <a href="https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator">VMA</a>.</p> <p>Historically a custom allocator was pretty much required because some Vulkan drivers only allowed 4096 unique GPU allocations. Today though it looks like pretty much all (desktop) Vulkan drivers allow 4 billion allocations (at least according to the <a href="https://vulkan.gpuinfo.org/">Vulkan hardware database</a>).</p> <p>The plan is still to at least allow injecting a custom GPU allocator via an allocator interface, and also maybe to integrate OffsetAllocator as default allocator, but without knowing the memory allocation strategy of Vulkan drivers this may be redundant. E.g. if a Vulkan driver essentially integrates something like VMA anyway there’s not much point stacking another allocator on top of it, at least for a fairly high level API wrapper like sokol-gfx.</p> <p>In any case, the current GPU memory allocation implementation is prepared for a bit more abstraction in the future. All GPU allocations go through a single internal function <code class="language-plaintext highlighter-rouge">_sg_vk_mem_alloc_device_memory()</code> which takes a ‘memory type’ enum and a <code class="language-plaintext highlighter-rouge">VkMemoryRequirements</code> pointer as input. 
The memory type enum is sokol-gfx specific and includes:</p> <ul> <li>storage buffer (an sg_buffer object with storage buffer usage)</li> <li>generic buffer (all other sg_buffer types)</li> <li>image (all usages)</li> <li>internal staging buffer for the ‘copy-staging system’</li> <li>internal staging buffer for the ‘stream-staging system’</li> <li>internal uniform buffer</li> <li>internal descriptor buffer</li> </ul> <p>Currently all resources are either in ‘device-local’ memory, or in ‘host-visible + host-coherent’ memory. Having the mapping from sokol-specific memory type to Vulkan memory flags in one place makes it easier to tweak those flags in the future (or delegate that decision to an external memory allocator).</p> <h3 id="the-frame-sync-system">The Frame Sync System</h3> <p>The frame sync system is mainly concerned about letting the CPU and GPU work in parallel without stepping on each other’s feet. This basically comes down to double-buffering all resources which are written by the CPU and read by the GPU, and to have one sync-point in a sokol-gfx frame where the CPU needs to wait for the oldest ‘frame-context’ to become available (e.g. is no longer ‘in flight’).</p> <p>This single <code class="language-plaintext highlighter-rouge">CPU &lt;=&gt; GPU</code> sync point is implemented in a function <code class="language-plaintext highlighter-rouge">_sg_vk_acquire_frame_command_buffers()</code>. The name indicates the main feature of that function: it acquires command buffers to record the Vulkan commands of the current frame. Command buffers are reused, so this involves waiting for the command buffers to become available (e.g. they are no longer read from by the GPU). 
“Command buffers” is plural because there are two command buffers per frame: one which records all staging-commands, and one for the actual compute/render commands - more on that later in the staging system section.</p> <p>For this <code class="language-plaintext highlighter-rouge">CPU &lt;=&gt; GPU</code> synchronization, each double-buffered frame-context owns a <code class="language-plaintext highlighter-rouge">VkFence</code> which is signalled when the GPU is done processing a ‘queue submit’.</p> <p>So the first and most important thing the <code class="language-plaintext highlighter-rouge">_sg_vk_acquire_frame_command_buffers()</code> function does is to wait for the fence of the oldest frame-context with a call to <code class="language-plaintext highlighter-rouge">vkWaitForFences()</code>.</p> <p>This potential-wait-operation is the reason why sokol-gfx applications should move sokol-gfx calls towards the end of the frame callback and try to do all heavy non-rendering-related CPU work at the start of the frame callback. 
More specifically calls to:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sg_begin_pass()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_update_buffer()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_update_image()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_append_buffer()</code></li> </ul> <p>…these are basically the ‘potential new-frame entry points’ of the sokol-gfx API which may require the CPU to wait for the GPU.</p> <p>The <code class="language-plaintext highlighter-rouge">_sg_vk_acquire_frame_command_buffers()</code> function does a couple more things after <code class="language-plaintext highlighter-rouge">vkWaitForFences()</code> returns:</p> <ul> <li>first (actually before the <code class="language-plaintext highlighter-rouge">vkWaitForFences()</code> call) it checks if the function had already been called in the current frame, if yes it returns immediately</li> <li><code class="language-plaintext highlighter-rouge">vkResetFences()</code> is called on the fence we just waited on</li> <li>the delete-queue is drained (e.g. 
all resources which were recorded for destruction in the frame-context we just waited on are finally destroyed)</li> <li>any command buffers associated with the new frame are reset via <code class="language-plaintext highlighter-rouge">vkResetCommandBuffer()</code></li> <li>…and recording into those command buffers is started via <code class="language-plaintext highlighter-rouge">vkBeginCommandBuffer()</code></li> <li>additionally the other subsystems are notified because they might want to do their own thing: <ul> <li><code class="language-plaintext highlighter-rouge">_sg_vk_uniform_after_acquire()</code></li> <li><code class="language-plaintext highlighter-rouge">_sg_vk_bind_after_acquire()</code></li> <li><code class="language-plaintext highlighter-rouge">_sg_vk_staging_stream_after_acquire()</code></li> </ul> </li> </ul> <p>The other internal function of the frame-sync system is <code class="language-plaintext highlighter-rouge">_sg_vk_submit_frame_command_buffers()</code>. This is called at the end of a ‘sokol-gfx frame’ in the <code class="language-plaintext highlighter-rouge">sg_commit()</code> call. The main job of this function is to submit the recorded command buffers for the current frame via <code class="language-plaintext highlighter-rouge">vkQueueSubmit()</code>. This submit operation uses the two semaphores we got handed from the outside world (e.g. 
sokol-app) as part of the swapchain information in <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code>:</p> <ul> <li>the <code class="language-plaintext highlighter-rouge">present_complete_semaphore</code> is used as the wait-semaphore of the <code class="language-plaintext highlighter-rouge">vkQueueSubmit()</code> call (the GPU basically needs to wait for the swapchain image of the render pass to become available for reuse)</li> <li>the <code class="language-plaintext highlighter-rouge">render_finished_semaphore</code> is used as the signal-semaphore to be signalled when the GPU is done processing the submit payload</li> </ul> <p>Before the <code class="language-plaintext highlighter-rouge">vkQueueSubmit()</code> call there’s a bit more housekeeping happening:</p> <ul> <li>the other subsystems are notified about the submit via: <ul> <li><code class="language-plaintext highlighter-rouge">_sg_vk_staging_stream_before_submit()</code></li> <li><code class="language-plaintext highlighter-rouge">_sg_vk_bind_before_submit()</code></li> <li><code class="language-plaintext highlighter-rouge">_sg_vk_uniform_before_submit()</code></li> </ul> </li> <li>recording into the command buffers which are associated with the current frame context is finished via <code class="language-plaintext highlighter-rouge">vkEndCommandBuffer()</code></li> </ul> <p>It’s also important to note that there is one other potential <code class="language-plaintext highlighter-rouge">CPU &lt;=&gt; GPU</code> sync-point in a frame, and that’s in the first <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> for a swapchain render pass: the swapchain-info struct that’s passed into <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> contains a swapchain image which must be acquired via <code class="language-plaintext highlighter-rouge">vkAcquireNextImageKHR()</code> (when using sokol_app.h this happens in the <code class="language-plaintext
highlighter-rouge">sapp_get_swapchain()</code> call - usually indirectly via <code class="language-plaintext highlighter-rouge">sglue_swapchain()</code>).</p> <p>That is all for the frame-sync system in sokol-gfx, all in all quite similar to Metal or WebGPU, just with more code bloat (as is the Vulkan way).</p> <h3 id="resource-binding-via-ext_descriptor_buffer">Resource binding via EXT_descriptor_buffer</h3> <p>…a little detour into Vulkan descriptors and how the sokol-gfx resource binding model maps to Vulkan.</p> <p>Conceptually and somewhat simplified, a Vulkan <strong>descriptor</strong> is an abstract reference to a Vulkan buffer, image or sampler which needs to be accessible in a shader. Basically what shows up on the shader side whenever you see a <code class="language-plaintext highlighter-rouge">layout(binding=x) ...</code>. In sokol-gfx lingo this is called a ‘binding’.</p> <p>In an ideal world, such a binding would simply be a ‘GPU pointer’ to some opaque struct living in GPU memory which describes to shader code how to access bytes in a storage buffer, pixels in a storage image, or how to perform a texture-sampling operation.</p> <p>In the real world it’s not that simple because this is exactly the one main area where GPU architectures still differ dramatically: on some GPUs this information might be hardwired into register tables and/or involves fixed-function features instead of being just ‘structs in GPU memory’ - and unfortunately those differences are not limited to shitty mobile GPUs, but are also still present in desktop GPUs. 
Intel, AMD and NVIDIA all have different opinions on how this whole resource binding thing should work - and I’m not sure anything has changed in the last decade since Vulkan promised us a more-or-less direct mapping to the underlying hardware.</p> <p>So in the real world 3D APIs still need to come up with some sort of abstraction layer to get all those different hardware resource binding models under a common programming model (and yes, even the apparently ‘low-level’ Vulkan API had to come up with a high-level abstraction for resource binding - and this went quite poorly… but I digress).</p> <p>(side note: traditional vertex- and index-buffer-bindings are <em>not</em> performed through Vulkan descriptors, but through regular ‘bindslot-setter’ calls like in any other 3D API - go figure).</p> <p>A Vulkan <strong>descriptor-set</strong> is a group of such concrete bindings which can be applied as an atomic unit instead of applying each binding individually. In the end the traditional Vulkan descriptor model isn’t all that different from the ‘old’ bindslot model used in Metal V1 or D3D11, the one big and important difference is that bindings are not applied individually but as groups.</p> <p>The downside of such a ‘bind group model’ is of course that specific binding combinations may be unpredictable - which is the one big recurring topic in Vulkan’s (very slow) API evolution.</p> <p>In ‘old Vulkan’ pretty much all state-combinations in all areas of the API need to be known upfront in order to move as much work as possible into the init-phase and out of the render-phase. Theoretically a pretty sensible plan, but unfortunately only theoretically. 
In practice there are a lot of use cases where pre-baking everything is simply not possible, especially outside the game engine world, and even in gaming it doesn’t quite work - whenever you see stuttering when something new appears on screen in modern games built on top of state-of-the-art engines calling into modern 3D APIs - that’s most likely the core design philosophy of Vulkan and D3D12 crashing and burning after colliding with reality. Thankfully - but unfortunately very slowly - this is changing. Most of Vulkan’s progress in the last decade was about rolling the core API back to a more ‘dynamic’ programming model.</p> <p>Ok, back to Vulkan’s resource binding lingo:</p> <p>A Vulkan <strong>descriptor-set-layout</strong> is the <em>shape</em> of a descriptor-set. It basically says ‘there will be a sampled texture at binding 0, a buffer at binding 1 and a sampler at binding 2’, but not the concrete texture, buffer or sampler objects (those are referenced in the concrete <strong>descriptor-sets</strong>).</p> <p>And finally a Vulkan <strong>pipeline-layout</strong> groups all descriptor-set-layouts required by the shader stages of a Vulkan pipeline-state-object.</p> <p>When coming from WebGPU this should all sound quite familiar since the WebGPU bindgroups model is essentially the Vulkan 1.0 descriptor model (for better or worse):</p> <ul> <li>WebGPU BindGroupEntry maps to Vulkan descriptors</li> <li>WebGPU BindGroup maps to Vulkan descriptor sets</li> <li>WebGPU BindGroupLayout maps to Vulkan descriptor set layouts</li> <li>WebGPU PipelineLayout maps to Vulkan pipeline layouts</li> </ul> <p>‘Old Vulkan’ then adds descriptor pools on top of that but tbh I didn’t even bother to deal with those and skipped right to <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code>.</p> <p>With the descriptor buffer extension, descriptors and descriptor sets are ‘just memory’ with opaque memory layouts for each descriptor type which are specific to 
the Vulkan driver (depending on the driver and descriptor type, such opaque memory blobs seem to be between 16 and 256 bytes per descriptor).</p> <p>Binding resources with <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> essentially looks like this:</p> <p>In the init-phase:</p> <ul> <li>create a descriptor buffer big enough to hold all descriptors needed in a worst-case frame</li> <li>for each item in a descriptor-set-layout, ask Vulkan for the descriptor size and relative offset to the start of the descriptor-set data in the descriptor buffer</li> <li>similar for all concrete descriptors, ask Vulkan to copy their opaque memory representation into some private memory location and keep those around for the render phase (of course it’s also possible to move this step into the render phase)</li> </ul> <p>In the render-phase:</p> <ul> <li>memcpy the concrete descriptor blobs we stored upfront into the descriptor buffer to compose an ad-hoc descriptor set, using the offsets we also stored upfront</li> <li>finally record the start offset in the descriptor buffer into a Vulkan command buffer via a Vulkan API call, and that’s it!</li> </ul> <p>This is pretty much the same procedure by which uniform data updates are performed in the sokol-gfx Metal and WebGPU backends, now just extended to resource bindings.</p> <p>E.g. 
TL;DR: both uniform data snippets and resource bindings are ‘just frame-transient data snippets’ which are memcpy’ed into per-frame buffers and the buffer offsets recorded before the next draw- or dispatch-call.</p> <p>In sokol-gfx, the VkDescriptorSetLayout and VkPipelineLayout objects are created in <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> using the shader interface reflection information provided in the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> arg (which is usually code-generated by the sokol-shdc shader compiler).</p> <ul> <li>the first descriptor set layout (set 0) describes all uniform block bindings used by the shader across all shader stages</li> <li>the second descriptor set layout (set 1) describes all texture, storage buffer, storage image and sampler bindings</li> </ul> <p>…additionally, <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> queries the descriptor sizes and offsets within their descriptor set.</p> <h3 id="the-uniform-update-system">The uniform update system:</h3> <p>Conceptually uniform updates in the Vulkan backend are similar to the Metal backend:</p> <ul> <li>a double-buffered uniform buffer big enough to hold all uniform updates for a worst-case frame, allocated in host-visible memory (so that the memory is directly writable by the CPU and directly readable by the GPU)</li> <li>a call to <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> memcpy’s the uniform data snippet into the next free uniform buffer location (taking alignment requirements into account), this happens individually for the up to 8 ‘uniform block slots’</li> <li>before the next draw- or dispatch-call, the offsets into the uniform buffer for the up to 8 uniform block slots are recorded into the current command buffer</li> </ul> <p>The last step of recording the uniform-buffer offsets is delayed into the next draw- or dispatch-call to avoid redundant work. 
This is because <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> works on a single uniform block slot, but in Vulkan all uniform block slots are grouped into one descriptor set, and we only want to apply that descriptor-set at most once per draw/dispatch call.</p> <p>The actual <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> call is extremely cheap since no Vulkan API calls are performed:</p> <ul> <li>a simple memcpy of the uniform data snippet into the per-frame uniform buffer</li> <li>writing the ‘GPU buffer address’ and snippet size into a cached array of <code class="language-plaintext highlighter-rouge">VkDescriptorAddressInfoEXT</code> structs</li> <li>setting a ‘uniforms dirty flag’.</li> </ul> <p>…then later in the next draw- or dispatch-calls if the ‘uniforms dirty flag’ is set the actual uniform block descriptor set binding happens:</p> <ul> <li>for each uniform block used in the current pipeline/shader, an opaque descriptor memory blob is directly written into the frame’s descriptor buffer via a call to <code class="language-plaintext highlighter-rouge">vkGetDescriptorEXT()</code></li> <li>the start offset of the descriptor-set in the descriptor buffer is recorded into the current frame command buffer via <code class="language-plaintext highlighter-rouge">vkCmdSetDescriptorBufferOffsetsEXT()</code></li> </ul> <p>…delaying the operation to record the uniform buffer offsets into the draw- or dispatch-call to avoid redundant API calls is actually something that I will also need to implement in the WebGPU backend (I was taking notes while implementing the Vulkan backend which improvements could be back-ported to the WebGPU backend, and I’ll take care of those right after the Vulkan backend is merged).</p> <h3 id="the-resource-binding-system">The resource binding system</h3> <p>Updating resource bindings via <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> is very similar 
to the uniform update system, but actually even simpler because no extra uniform buffer is involved, and some more initialization can be moved into the init-phase when creating view objects:</p> <p>When creating a texture-, storage-buffer- or storage-image-view object via <code class="language-plaintext highlighter-rouge">sg_make_view()</code> or a sampler object via <code class="language-plaintext highlighter-rouge">sg_make_sampler()</code>, the concrete descriptor data (those little 16..256 byte opaque memory blobs) is copied into the sokol-gfx view or sampler object via <code class="language-plaintext highlighter-rouge">vkGetDescriptorEXT()</code>.</p> <p>Then <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> is just a couple of memcpy’s and a Vulkan call:</p> <ul> <li>for each view and sampler in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> argument, a memcpy of the descriptor memory blob which was stored in the sokol-gfx object into the current frame’s descriptor buffer happens - e.g. no Vulkan calls for that…</li> <li>finally a single call to <code class="language-plaintext highlighter-rouge">vkCmdSetDescriptorBufferOffsetsEXT()</code> records the descriptor buffer offset into the current frame’s command buffer</li> </ul> <p>Vertex- and index-buffer bindings happen via traditional bindslot calls (<code class="language-plaintext highlighter-rouge">vkCmdBindVertexBuffers</code> and <code class="language-plaintext highlighter-rouge">vkCmdBindIndexBuffer</code>). Additionally, barriers may be inserted inside <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> but that will be explained further down in the barrier system.</p> <h3 id="the-two-staging-systems">The two staging systems</h3> <p>Sokol-gfx currently has two separate staging systems for uploading CPU-side data into GPU-memory with the rather arbitrary names ‘copy-staging-system’ and ‘stream-staging-system’. 
Both can upload data into buffers and images, but with different compromises:</p> <ul> <li>the ‘copy-staging-system’ can upload large amounts of data through a single small staging buffer (default size: 4 MB), with the downside that the Vulkan queue needs to be flushed (e.g. a <code class="language-plaintext highlighter-rouge">vkQueueWaitIdle()</code> is involved)</li> <li>the ‘stream-staging-system’ can upload a limited amount of data per-frame through a fixed-size double-buffered staging buffer (default size: 16 MB - but this can be tweaked in the <code class="language-plaintext highlighter-rouge">sg_setup()</code> call of course), this doesn’t cause any frame-pacing ‘disruptions’ like the copy-staging-system does</li> </ul> <p>The copy-staging-system is currently used:</p> <ol> <li>to upload initial content into immutable buffers and images within <code class="language-plaintext highlighter-rouge">sg_make_buffer()</code> and <code class="language-plaintext highlighter-rouge">sg_make_image()</code></li> <li>to upload data into <code class="language-plaintext highlighter-rouge">usage.dynamic_update</code> images and buffers in the <code class="language-plaintext highlighter-rouge">sg_update_buffer()</code>, <code class="language-plaintext highlighter-rouge">sg_append_buffer()</code> and <code class="language-plaintext highlighter-rouge">sg_update_image()</code> calls</li> </ol> <p>The stream-staging system is only used for <code class="language-plaintext highlighter-rouge">usage.stream_update</code> resources when calling <code class="language-plaintext highlighter-rouge">sg_update_buffer()</code>, <code class="language-plaintext highlighter-rouge">sg_append_buffer()</code> and <code class="language-plaintext highlighter-rouge">sg_update_image()</code>.</p> <p>This means that the correct choice of <code class="language-plaintext highlighter-rouge">usage.dynamic_update</code> and <code class="language-plaintext highlighter-rouge">usage.stream_update</code> for 
buffers and images is much more important in the Vulkan backend than in other backends.</p> <p>In general:</p> <ul> <li>creating an immutable buffer or image <strong>with initial content</strong> in the render-phase will ‘disrupt’ rendering (how bad this disruption actually is remains to be seen though)</li> <li>the same disruption happens for updating a buffer or image with <code class="language-plaintext highlighter-rouge">usage.dynamic_update</code>,</li> <li>make sure to use <code class="language-plaintext highlighter-rouge">usage.stream_update</code> for buffers and images that need to be updated each frame, but be aware that those uploads go through a single per-frame staging buffer which needs to be big enough to hold all stream-uploads in a single frame (staging buffer sizes can be adjusted in the sg_setup() call)</li> </ul> <p>The strategy for updating <code class="language-plaintext highlighter-rouge">usage.dynamic_update</code> resources may change in the future. For instance I was considering treating dynamic-updates exactly the same as stream-updates (e.g. going through the per-frame staging buffer to avoid the <code class="language-plaintext highlighter-rouge">vkQueueWaitIdle()</code>), and when the staging buffer would overflow fall back to the copy-staging system (also for stream-updates). This felt too unpredictable to me, so I didn’t go that way for now.</p> <p>Note that the staging system is the most likely system to drastically change in the future (together with the barrier system). 
One of the important planned changes in my mental sokol-gfx roadmap is a rewrite of the resource update API, and this rewrite will most likely ‘favour’ modern 3D APIs and not worry about OpenGL as much as the current very restrictive resource update API does.</p> <p>The common part in both staging systems is how the actual upload happens:</p> <ul> <li>staging buffers are allocated in CPU-visible + cache-coherent memory (the copy-staging system uses a single small buffer, while the stream-staging system uses double-buffering)</li> <li>a staging operation first memcpy’s a chunk of memory into the staging buffer and then records a Vulkan command to copy that data from the staging buffer into a Vulkan buffer or image (via <code class="language-plaintext highlighter-rouge">vkCmdCopyBuffer()</code> or <code class="language-plaintext highlighter-rouge">vkCmdCopyBufferToImage2()</code>)</li> <li>in the stream-staging system each buffer update is always a single call to <code class="language-plaintext highlighter-rouge">vkCmdCopyBuffer()</code> and each image update is always one call to <code class="language-plaintext highlighter-rouge">vkCmdCopyBufferToImage2()</code> per mipmap</li> <li>in the copy-staging-system, staging operations which are bigger than the staging buffer size will be split into multiple copy operations, each copy-step involving a <code class="language-plaintext highlighter-rouge">vkQueueWaitIdle</code></li> <li>overflowing the stream-staging buffer is a ‘soft error’, e.g. 
an error will be logged but otherwise this is a no-op</li> </ul> <p>There is another notable implementation detail in the stream-staging system which is related to the barrier system:</p> <p>All stream-staging copy commands are recorded into a separate Vulkan command buffer object so that they are not interleaved with the compute/render commands which are recorded into the regular per-frame command buffer.</p> <p>This is done to move any staging commands out of render passes which is pretty much required for barrier management (I don’t quite remember though if the Vulkan validation layer only complained about issuing barriers inside <code class="language-plaintext highlighter-rouge">vkBeginRendering/vkEndRendering</code> or if copy commands were also prohibited during the render phase).</p> <p>Long story short: all Vulkan commands used for staging operations are recorded into a separate command buffer so that all CPU =&gt; GPU copies can be moved in front of any compute/render commands because of various Vulkan API usage restrictions. 
This was necessary because sokol-gfx allows calling the resource update functions at any point in a frame, most importantly within render passes.</p> <h3 id="the-resource-barrier-system">The resource barrier system</h3> <p>This was by far the biggest hassle and took a long time to get right, involving several rewrites (and there’s <em>still</em> quite a lot of room for improvement).</p> <p>The first implementation phase was basically to come up with a general barrier insertion strategy which isn’t completely dumb yet still satisfies the Vulkan default validation layer, the second and much harder step was then to also satisfy the optional synchronization2 validation layer (which even most ‘official’ Vulkan samples don’t seem to get right - go figure).</p> <p>I won’t bore you with what Vulkan barriers are or why they are necessary, just that barriers are usually needed when a Vulkan buffer or image changes the way it is accessed by the GPU (for instance when a resource changes from being a staging-upload target to being accessed by a shader, or when an image object changes from being used as a pass attachment to being sampled as a texture).</p> <p>In sokol-gfx I tried as much as possible to use a ‘lazy barrier system’, e.g. 
a barrier is inserted at the latest possible moment before a resource is used.</p> <p>The basic idea is that sokol-gfx buffers and images keep track of their current ‘access state’, this may be a combination of:</p> <ul> <li>staging upload target</li> <li>vertex buffer binding</li> <li>index buffer binding</li> <li>read-only storage buffer binding</li> <li>read-write storage buffer binding</li> <li>texture binding</li> <li>storage image binding (always read-write)</li> <li>a pass attachment (in the flavours color, resolve, depth or stencil)</li> <li>a special ‘discard’ access modifier for pass attachments (used with <code class="language-plaintext highlighter-rouge">SG_LOADACTION_DONTCARE</code>)</li> <li>swapchain presentation</li> </ul> <p>Implicitly those access states carry additional information which may be needed for picking the right barrier type, like whether shader accesses are read-only, read-write or write-only, and whether the access may happen exclusively in compute passes, render passes, or both.</p> <p>Ideally barriers would always be inserted right at the point before a resource is bound (because only at that point it’s clear what the new access state is).</p> <p>Unfortunately it’s not that simple: there’s a metric shitton of arbitrary restrictions in Vulkan on where exactly barriers may be inserted. The main limitation is that no barriers can be inserted between <code class="language-plaintext highlighter-rouge">vkBeginRendering</code> and <code class="language-plaintext highlighter-rouge">vkEndRendering</code> (which is hella weird, it would be obvious to disallow barriers that involve the current pass attachments, but not for any other resources used in the pass).</p> <p>This limitation is currently the main reason why the sokol-gfx barrier system is not optimal in some cases, because it requires moving any barriers that would be inserted inside render passes before the start of the render pass. 
However sokol-gfx can’t predict what resources will actually be used in the render pass (spoiler: there’s a surprisingly simple solution to this problem which I should have thought of myself much earlier - but that will be for a later Vulkan backend update).</p> <p>Currently, barrier insertion points are in the following sokol-gfx functions:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sg_begin_pass()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_end_pass()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_update/append_*()</code></li> </ul> <p>The obvious barriers in begin- and end-pass are for image objects transitioning in and out of attachment state.</p> <p>In <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> barriers are only inserted inside compute passes (because of the above mentioned ‘no barriers inside render passes’ rule).</p> <p>In staging operations, barriers are issued at the start and end of the staging operation, the ‘after-barrier’ is not great and eventually needs to be moved elsewhere.</p> <p>Now the tricky part: moving barriers out of render passes… there is one situation where this is relevant: a compute pass writes to a buffer or image, and that buffer or image is then read by a shader in a render pass. Ideally the barrier for this would happen inside the render pass in <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>, but Vulkan validation layer says “no”.</p> <p>What happens instead is that any resource that’s (potentially) written in a compute pass is tracked as ‘dirty’, and then in the <code class="language-plaintext highlighter-rouge">sg_end_pass()</code> of the compute pass, very conservative barriers are inserted for all those dirty resources. 
‘Conservative’ means that I cannot predict how the resource will be used next, so buffers are generally transitioned into ‘vertex+index+storage-buffer access state’ and images are generally transitioned into ‘texture access state’.</p> <p>This generally appears to work but is not optimal. We’d like to delay those barriers to when the resources are actually used, and also tighten the scope of the barriers to their actual usage.</p> <p>The solution for this is surprisingly simple: use the same ‘time warp’ that is used for recording staging operations by recording barrier commands that would need to be issued from within sokol-gfx render passes into a separate command buffer which can then be enqueued <strong>before</strong> another command buffer which holds all render/compute commands for the pass.</p> <p>This is a perfect solution but requires a couple of changes which I didn’t want to do in the first Vulkan backend release to not push that out even further:</p> <ul> <li>instead of a single command buffer per frame to hold all render/compute commands, one command buffer per sokol-gfx pass is needed</li> <li>for render passes, a separate command buffer per pass is needed to record barrier commands so that the barriers can be moved out of Vulkan’s <code class="language-plaintext highlighter-rouge">vkCmdBeginRendering/vkCmdEndRendering</code> scope</li> </ul> <p>…inside <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> and <code class="language-plaintext highlighter-rouge">sg_end_pass()</code> we’re now doing some serious time-travelling-shit:</p> <p>Each resource that’s used in a render pass will keep track of all the ‘access states’ it’s used as in the <code class="language-plaintext highlighter-rouge">sg_apply_bindings</code> call (for buffers that may be vertex-, index- or read-only-storage-buffer-binding and for images it can only be texture-binding), additionally the resource is uniquely-added to a tracking array.</p> <p>In <code 
class="language-plaintext highlighter-rouge">sg_end_pass()</code> we now have a list of all bound resources and their binding types, and this information can be used to record ‘just the right’ barriers into the <strong>separate</strong> command buffer that’s been set aside for render pass barriers. This barrier command buffer is then enqueued <strong>before</strong> the command buffer which holds the render commands for that pass and voila: perfectly scoped render pass barriers. But as I said, this will need to wait until a followup update.</p> <h3 id="everything-else">Everything else…</h3> <p>The rest of the Vulkan backend is so straightforward that it’s not worth writing about, essentially 1:1 mappings from sokol-gfx API functions to Vulkan API functions (the blog post is long enough as it is).</p> <p>Apart from the resource update system (which is overly restrictive and conservative in sokol-gfx, mainly because of OpenGL/WebGL), the sokol-gfx API actually is a really good match for Vulkan. There are no expensive operations (like creating and discarding Vulkan objects) happening in the ‘hot-path’. 
The use of <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code> is not a great choice for some GPU architectures, but as I said at the start: I’m waiting for Khronos to finish their new resource binding API which apparently will be a mix of D3D12-style descriptor heaps and <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code>.</p> <p>The next steps will most likely be:</p> <ul> <li>porting the backend to Windows (still limited to Intel GPU though)</li> <li>port the backend to NVIDIA (will have to wait until around January because I’ll be away from my NVIDIA PC for the rest of the year)</li> <li>expose a GPU memory allocator interface, and add a sample which hooks up VMA</li> <li>…maaaybe integrate SebAaltonen’s OffsetAllocator as default allocator (still not clear if I need that when all modern Vulkan drivers no longer seem to have that infamous 4096 unique allocations limit)</li> <li>tinker around with GPU memory heap types for uniform- and descriptor-buffers on GPUs without unified memory (e.g. 
host-visible + device-local)</li> <li>figure out why exactly RenderDoc doesn’t work (apparently it’s because of <code class="language-plaintext highlighter-rouge">EXT_descriptor_buffer</code>, but RenderDoc claims to support the extension since 1.41)</li> <li>add support for debug labels (not much point to implement this before RenderDoc works)</li> <li>implement the improved resource barrier system outlined above</li> <li>add support for multiple swapchain passes (not needed when used with sokol_app.h, but required for any ‘multi-window-scenario’)</li> <li>improve interoperability with Vulkan code that exists outside sokol-gfx (injecting Vulkan buffers and images into <code class="language-plaintext highlighter-rouge">sg_make_buffer/sg_make_image</code> and add the missing <code class="language-plaintext highlighter-rouge">sg_vk_query_*()</code> functions to expose internal Vulkan object handles)</li> </ul> <p>Originally I also had a long rant about the Vulkan API design in this blog post, maybe I’ll put that into a separate post and also change the style from rant into ‘constructive criticism’ (as hard as that will be lol).</p> <p>My verdict about Vulkan so far is basically: Not great, not terrible.</p> <p>It’s better than OpenGL but not as good (from an API user’s perspective) as pretty much any other 3D API. In many places Vulkan is already the same mess as OpenGL. 
Sediment layers of outdated, deprecated or competing features and extensions which are incredibly hard to make sense of when not closely following Vulkan’s development since its initial release in 2016 (which is the exact same problem that ruined OpenGL).</p> <p>At the very least, please, please, PLEASE aggressively remove cruft and reduce the ‘optional-features creep’ in minor Vulkan API versions (which I think should actually be major versions - 4 breaking versions in 10 years sounds just about right).</p> <p>For instance when I’m working against the Vulkan 1.3 API I really don’t care about any legacy features which have been replaced by newer systems (like synchronization2 replacing the old synchronization API). Don’t expose the extensions that have been incorporated into core up to 1.3, and also let me filter out all those outdated declarations from the Vulkan headers so that code-completion doesn’t suggest outdated API types and functions. Don’t require me to explicitly enable every little feature (like anisotropic filtering) when creating a Vulkan device. If some shitty old-school GPU doesn’t have anisotropic filtering, then just silently ignore it instead of polluting the 3D API for all eternity just for this one GPU model which probably wasn’t even produced anymore even back in 2016.</p> <p>Vulkan profiles are a good idea in theory, but please move them into the core API instead of implementing them as a Vulkan SDK feature. 
Give me a <code class="language-plaintext highlighter-rouge">vkCreateSystemDefaultDevice(VK_PROFILE_*)</code> function to get rid of those 500 lines of boilerplate that <strong>every single Vulkan programmer</strong> needs to duplicate line by line (people who need more control over the setup process can still use that traditional initialization dance).</p> <p>And PLEASE get somebody into Khronos who has the power to inject at least a minimal amount of taste and elegance into Vulkan and who has a clear idea what should and shouldn’t go into the core API, because just promoting random vendor extensions into core is really not a good way to build an API (and that has been clear since OpenGL - and the <strong>one</strong> thing that Vulkan should have done better).</p> <p>Also, a low-level and explicit API <strong>DOES NOT HAVE TO BE</strong> a hassle to use.</p> <p>Somehow modern software systems always seem to be built around the ‘no pain, no gain’ philosophy (see Rust, Vulkan, Wayland, …), this sort of self-inflicted suffering for the sake of purity is such a weird Christian flex that I’m starting to wonder if ‘religious memes’ surviving under the surface in even the most rational and atheist developer brains is actually a thing…</p> <p>Maybe we should return to the ‘Californian hippie attitude’ for building computer systems and software - apparently that worked pretty great in the 70’s and 80’s ;)</p> <p>…ok I’m getting into old-man-yells-at-cloud-mode again, so I’d better stop here :D</p> Mon, 01 Dec 2025 00:00:00 +0000 https://floooh.github.io/2025/12/01/sokol-vulkan-backend-1.html The sokol-gfx resource view update. 
<p><strong>Update:</strong> merge happened on 23-Aug-2025.</p> <p>In a couple of days I will merge the next big (and breaking) sokol-gfx update which adds resource view objects and in turn removes pre-baked pass-attachment objects.</p> <p>The update also requires updating sokol-shdc and recompiling shaders.</p> <p>The root PR is here: <a href="https://github.com/floooh/sokol/pull/1287">https://github.com/floooh/sokol/pull/1287</a></p> <p>After merging the update I will spend a couple of weeks taking care of pending issues and PRs before moving on to a followup <a href="https://github.com/floooh/sokol/issues/1302">resource views update 2</a>.</p> <h2 id="what-are-resource-view-objects">What are resource view objects?</h2> <p>If you’re familiar with D3D10 and later you’ll feel right at home since resource views are a fundamental concept in D3D, and sokol-gfx’s concept of resource views is closest to D3D11. Other 3D APIs either don’t have view objects at all (WebGL2 and GL before version 4.3), or only associate resource views with texture data but not buffer data (GL &gt;= 4.3, Metal and WebGPU).</p> <p>Typically resource views have a number of different purposes in the various 3D-APIs:</p> <ul> <li>they specialize a parent resource object for a specific usage in shaders (for instance sampling an image object as a texture versus using the same image object as render target)</li> <li>they can reinterpret the data in a resource object (for instance to a different pixel format or image type)</li> <li>they can define a subset of the data in the resource object (for instance selecting a specific mipmap or range of mipmaps in a texture)</li> </ul> <p>In sokol-gfx you can think of view objects mainly as specializations of an <code class="language-plaintext highlighter-rouge">sg_image</code> or <code class="language-plaintext highlighter-rouge">sg_buffer</code> object for how the image or buffer is going to be accessed in shaders:</p> <ul> <li>sampling a texture in a shader 
requires a <strong>texture view</strong></li> <li>writing to a storage image in a compute shader requires a <strong>storage image view</strong></li> <li>accessing a storage buffer in a shader requires a <strong>storage buffer view</strong></li> <li>each render pass attachment type requires its own view object type: <ul> <li><strong>color-attachment views</strong></li> <li><strong>resolve-attachment views</strong></li> <li><strong>depth-stencil-attachment views</strong></li> </ul> </li> </ul> <p>Alternatively you can think of view objects as specializations of a resource object for a specific bindings type (I was actually considering calling this new object type <code class="language-plaintext highlighter-rouge">sg_binding</code>, but since ‘view’ is the more established term I went with <code class="language-plaintext highlighter-rouge">sg_view</code> instead).</p> <p>In sokol-gfx, resource view types are ‘runtime flavours’ of the same handle type <code class="language-plaintext highlighter-rouge">sg_view</code>. This means that setting the wrong resource type on a bindslot won’t be a compilation error, but a runtime error in the sokol-gfx validation layer, so please make sure to test your code in debug build mode from time to time.</p> <h2 id="new-unlocked-features">New unlocked features</h2> <p>This first sokol-gfx resource view update unlocks the following features:</p> <ul> <li>Storage buffer bindings can now have an offset. Binding storage buffers with offsets is mainly useful when the same buffer contains different types of items in different sections of the buffer, and processing those items in separate compute shaders - or if you only need to access a section of a buffer with a compute shader.</li> <li>Texture views can define a subset of the parent image by defining their own mipmap- and slice-ranges (not on WebGL, GLES3 or GL4.1 - e.g. 
macOS)</li> <li>Storage images are no longer ‘compute pass attachments’, but instead bound like regular textures in the <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call. This allows writing to many different storage images in the same compute pass (the number of simultaneously bound storage images is still very restricted though)</li> <li>Combinations of render pass attachment images are no longer ‘pre-baked’ into <code class="language-plaintext highlighter-rouge">sg_attachments</code> objects. Instead <code class="language-plaintext highlighter-rouge">sg_attachments</code> is now a transient struct like <code class="language-plaintext highlighter-rouge">sg_bindings</code>. This relaxes another ‘combinatorial explosion scenario’ because rendering code no longer needs to predict all possible render-pass attachment combinations upfront.</li> </ul> <h2 id="current-restrictions-and-planned-features">Current restrictions and planned features</h2> <p>The following resource view features are planned for a followup ‘resource view update 2’:</p> <ul> <li>Reinterpret the pixel format and image type of image objects in a view object.</li> <li>Change the max number of per-shader-stage resource bindings of the same type from hardwired conservative limits to dynamic device limits exposed in the <code class="language-plaintext highlighter-rouge">sg_limits</code> struct (e.g. 
more than 4 storage image, 8 storage buffer or 16 texture bindings - instead try to push those limits closer to 32)</li> </ul> <p>For more details about planned ‘update 2’ features see:</p> <p><a href="https://github.com/floooh/sokol/issues/1302">https://github.com/floooh/sokol/issues/1302</a></p> <h2 id="high-level-overview-of-public-api-changes">High level overview of public API changes</h2> <ul> <li>the <code class="language-plaintext highlighter-rouge">sg_attachments</code> object type and related functions have been removed</li> <li>a new object type <code class="language-plaintext highlighter-rouge">sg_view</code> has been added along with related functions</li> <li><code class="language-plaintext highlighter-rouge">sg_features</code> gained a new flag <code class="language-plaintext highlighter-rouge">.gl_texture_views</code>, when this is false the GL backend doesn’t have full texture view support (e.g. it’s not possible to limit a view to a miplevel or slices subset)</li> <li>the <code class="language-plaintext highlighter-rouge">sg_attachments</code> name has been repurposed for a transient struct of render pass attachment views: <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_attachments</span> <span class="p">{</span> <span class="n">sg_view</span> <span class="n">colors</span><span class="p">[</span><span class="n">SG_MAX_COLOR_ATTACHMENTS</span><span class="p">];</span> <span class="n">sg_view</span> <span class="n">resolves</span><span class="p">[</span><span class="n">SG_MAX_COLOR_ATTACHMENTS</span><span class="p">];</span> <span class="n">sg_view</span> <span class="n">depth_stencil</span><span class="p">;</span> <span class="p">}</span> <span class="n">sg_attachments</span><span class="p">;</span> </code></pre></div> </div> </li> <li>the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct now has a 
unified array for views instead of separate arrays for each ‘shader resource type’ (textures, storage images and storage buffers): <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_bindings</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="n">sg_view</span> <span class="n">views</span><span class="p">[</span><span class="n">SG_MAX_VIEW_BINDSLOTS</span><span class="p">];</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="n">sg_bindings</span><span class="p">;</span> </code></pre></div> </div> </li> <li>the <code class="language-plaintext highlighter-rouge">sg_image_usage</code> struct now has more detailed usage flags for render pass attachments, and the <code class="language-plaintext highlighter-rouge">.storage_attachment</code> usage flag has been renamed to <code class="language-plaintext highlighter-rouge">.storage_image</code>: <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_image_usage</span> <span class="p">{</span> <span class="n">bool</span> <span class="n">storage_image</span><span class="p">;</span> <span class="n">bool</span> <span class="n">color_attachment</span><span class="p">;</span> <span class="n">bool</span> <span class="n">resolve_attachment</span><span class="p">;</span> <span class="n">bool</span> <span class="n">depth_stencil_attachment</span><span class="p">;</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="n">sg_image_usage</span><span class="p">;</span> </code></pre></div> </div> </li> <li>in <code class="language-plaintext highlighter-rouge">sg_image_desc</code> the items to directly inject backend-specific view objects have been removed: <ul> <li><code class="language-plaintext 
highlighter-rouge">d3d11_shader_resource_view</code></li> <li><code class="language-plaintext highlighter-rouge">wgpu_texture_view</code></li> </ul> </li> <li>in <code class="language-plaintext highlighter-rouge">sg_shader_desc</code>: <ul> <li>the internals of the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct which describe the shader binding interface have been changed to a unified array of <code class="language-plaintext highlighter-rouge">sg_shader_view</code> structs: <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_shader_desc</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="n">sg_shader_view</span> <span class="n">views</span><span class="p">[</span><span class="n">SG_MAX_VIEW_BINDSLOTS</span><span class="p">];</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="n">sg_shader_desc</span><span class="p">;</span> </code></pre></div> </div> </li> <li>some renaming to better differentiate between ‘(storage) image and texture bindings’, for instance ‘image-sampler-pairs’ are now called ‘texture-sampler-pairs’, since only texture bindings are ‘sampled’, but not storage-image bindings</li> </ul> </li> <li>many new items in the <code class="language-plaintext highlighter-rouge">sg_frame_stats</code> struct, mostly not directly related to resource views, but filling some gaps</li> </ul> <h2 id="shader-authoring-changes">Shader Authoring Changes</h2> <blockquote> <p>TL;DR: When recompiling existing shaders you might get new errors about bindslot collisions which need to be resolved by changing the <code class="language-plaintext highlighter-rouge">layout(binding=N)</code> decorations.</p> </blockquote> <p>When using sokol-shdc, the only change on the shader side is that textures, storage buffers and storage images now share a common bindslot range; previously each 
binding type had its own slot range:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">cs</span> <span class="n">cs</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">cs_inp_tex</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">rgba8</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">writeonly</span> <span class="kr">image2D</span> <span class="n">cs_outp_tex</span><span class="p">;</span> <span class="c1">// ...</span> <span class="err">@</span><span class="n">end</span> </code></pre></div></div> <p>Note how in this (old) code-snippet the texture- and storage-image bindings use the same bindslot 0 because previously textures and storage images had their own bindslot space.</p> <p>This code will now produce a ‘bindslot collision error’ when compiled with sokol-shdc, because texture- and storage-image bindings now use the same bindslot space, so bindings for texture-, storage-buffer- and storage-image-bindings across all shader stages need to be fixed to not collide:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">cs</span> <span class="n">cs</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">cs_inp_tex</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span 
class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">rgba8</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">writeonly</span> <span class="kr">image2D</span> <span class="n">cs_outp_tex</span><span class="p">;</span> <span class="c1">// ...</span> <span class="err">@</span><span class="n">end</span> </code></pre></div></div> <p>This bindslot fixup is the only change required on the shader side.</p> <h2 id="working-with-texture-views">Working with Texture Views</h2> <p>Sample code:</p> <ul> <li><strong>texcube-sapp</strong> (simple textured rendering): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/texcube-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/texcube-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/texcube-sapp-ui.html">WebGPU sample</a></li> <li><strong>dyntex-sapp</strong> (CPU-update dynamic texture): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/dyntex-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/dyntex-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/dyntex-sapp-ui.html">WebGPU sample</a></li> </ul> <p>Let’s say a shader defines a texture binding at slot 3:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">tex</span><span class="p">;</span> </code></pre></div></div> <p>To ‘populate’ this bindslot on the CPU side you need two objects now: an image object, and a texture view on the image object:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_image</span> <span class="n">img</span> 
<span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">subimage</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">});</span> <span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span> <span class="p">},</span> <span class="p">});</span> </code></pre></div></div> <p>Since this is C you can also chain the designated initializers which looks a bit more compact (unfortunately this isn’t supported in most other languages):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span> <span 
class="p">});</span> </code></pre></div></div> <p>The <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call now has an array of <code class="language-plaintext highlighter-rouge">sg_view</code> handles instead of separate arrays for images and storage buffers:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span> <span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">tex_view</span><span class="p">,</span> <span class="p">.</span><span class="n">samplers</span><span class="p">[</span><span class="n">SMP_smp</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">});</span> </code></pre></div></div> <p>Since the texture binding was defined as <code class="language-plaintext highlighter-rouge">layout(binding=3)</code> it’s also safe to just use the bind slot index directly instead of the code-generated constant:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span> <span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="mi">3</span><span 
class="p">]</span> <span class="o">=</span> <span class="n">tex_view</span><span class="p">,</span> <span class="p">.</span><span class="n">samplers</span><span class="p">[</span><span class="n">SMP_smp</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">});</span> </code></pre></div></div> <p>In many situations you only need the view handle and don’t need the separate image handle; this means you can nest the <code class="language-plaintext highlighter-rouge">sg_make_image()</code> inside the <code class="language-plaintext highlighter-rouge">sg_make_view()</code> call:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">subimage</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">}),</span> <span class="p">});</span> </code></pre></div></div> <p>If you need the image handle later you can extract it from the view object via <code 
class="language-plaintext highlighter-rouge">sg_query_view_image()</code>:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_query_view_image</span><span class="p">(</span><span class="n">tex_view</span><span class="p">);</span> </code></pre></div></div> <p>Texture views can select a subrange of mipmaps and slices of their parent image (not supported on WebGL2, GLES3 or GL4.1):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span> <span class="p">.</span><span class="n">mip_levels</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="p">.</span><span class="n">count</span> <span class="o">=</span> <span class="mi">3</span> <span class="p">},</span> <span class="p">.</span><span class="n">slices</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="p">.</span><span class="n">count</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">},</span> <span class="p">},</span> <span class="p">});</span> </code></pre></div></div> <p>If <code class="language-plaintext highlighter-rouge">.count</code> 
is left at default-zero it means ‘all remaining mipmaps or slices’. For instance this will only skip the most detailed mipmap but keep the remaining mipmap chain in place:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span> <span class="p">.</span><span class="n">mip_levels</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">base</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">},</span> <span class="p">},</span> <span class="p">});</span> </code></pre></div></div> <h2 id="view-vs-parent-resource-lifetime-considerations">View vs parent resource lifetime considerations</h2> <p>Before moving on to the other view types, a little interlude about lifetimes and resource states:</p> <p>If you’re coming from 3D APIs with ref-counted lifetime management like D3D, WebGPU or Metal you might be tempted to ‘release’ a view’s parent resource object right after creating its view object if the image object handle isn’t needed anymore:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span 
class="mi">4</span><span class="p">,</span> <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">subimage</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">});</span> <span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span> <span class="p">});</span> <span class="n">sg_destroy_image</span><span class="p">(</span><span class="n">img</span><span class="p">);</span> </code></pre></div></div> <p>In sokol-gfx lifetimes are explicit. If you pull the rug out from under a view like this, nothing catastrophic will happen (e.g. no crashes or hard validation layer errors), but rendering operations involving such ‘dangling views’ will be silently skipped (this is basically the same behavior as before when trying to render with images or buffers in a non-valid resource state).</p> <p>Another slightly counter-intuitive behavior might be that a view object remains in a valid resource state despite its parent resource being destroyed, e.g.
following the above example code:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// get the destroyed image's resource state</span> <span class="k">if</span> <span class="p">(</span><span class="n">sg_query_image_state</span><span class="p">(</span><span class="n">img</span><span class="p">)</span> <span class="o">==</span> <span class="n">SG_RESOURCESTATE_INVALID</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// if-branch taken, since the image had been destroyed</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="c1">// get the image's texture view resource state</span> <span class="k">if</span> <span class="p">(</span><span class="n">sg_query_view_state</span><span class="p">(</span><span class="n">tex_view</span><span class="p">)</span> <span class="o">==</span> <span class="n">SG_RESOURCESTATE_VALID</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// if-branch *also* taken!</span> <span class="c1">// ...</span> <span class="p">}</span> </code></pre></div></div> <p>I went a bit back and forth on this decision but I think the behavior makes sense from the perspective that all resource state changes in sokol-gfx are explicit (e.g. there are no ‘automatic’ state changes as a side effect of a ‘remote’ state change of another object, instead all resource state changes are directly caused by a function call on that resource object). 
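</p> <p>The explicit-state rule can be sketched with a tiny toy model. This is not sokol-gfx code: the <code class="language-plaintext highlighter-rouge">toy_*</code> names and the fixed-size slot arrays are made up purely for illustration, and the real implementation may look quite different. The only point is that destroying the parent changes nothing but the parent’s own state, while ‘renderable’ is a derived check across both objects:</p>

```c
/* Toy model only: NOT sokol-gfx code, all names here are hypothetical. */
enum toy_state { TOY_INVALID = 0, TOY_VALID = 1 };

static enum toy_state toy_image_state[4];  /* resource state per image slot */
static enum toy_state toy_view_state[4];   /* resource state per view slot */
static int toy_view_parent[4];             /* image slot each view was created from */

/* each function only changes the state of the object it is called on */
static void toy_make_image(int img) {
    toy_image_state[img] = TOY_VALID;
}

static void toy_destroy_image(int img) {
    /* deliberately does not touch any view created from this image */
    toy_image_state[img] = TOY_INVALID;
}

static void toy_make_view(int view, int img) {
    toy_view_state[view] = TOY_VALID;
    toy_view_parent[view] = img;
}

/* a view is 'renderable' only if the view and its parent image are both valid */
static int toy_view_renderable(int view) {
    if (toy_view_state[view] != TOY_VALID) {
        return 0;
    }
    return toy_image_state[toy_view_parent[view]] == TOY_VALID;
}
```

<p>In this model, calling <code class="language-plaintext highlighter-rouge">toy_destroy_image()</code> on a view’s parent leaves the view’s own state at <code class="language-plaintext highlighter-rouge">TOY_VALID</code>, while <code class="language-plaintext highlighter-rouge">toy_view_renderable()</code> starts returning zero.</p> <p>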
The same has always been true for pipelines and their shader objects, it just wasn’t specifically documented.</p> <p>If you want to check whether a view is ‘renderable’ you can use the following shortcut:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">sg_query_image_state</span><span class="p">(</span><span class="n">sg_query_view_image</span><span class="p">(</span><span class="n">tex_view</span><span class="p">))</span> <span class="o">==</span> <span class="n">SG_RESOURCESTATE_VALID</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// the view is 'renderable'</span> <span class="p">}</span> <span class="c1">// or for storage buffer views:</span> <span class="k">if</span> <span class="p">(</span><span class="n">sg_query_buffer_state</span><span class="p">(</span><span class="n">sg_query_view_buffer</span><span class="p">(</span><span class="n">sbuf_view</span><span class="p">))</span> <span class="o">==</span> <span class="n">SG_RESOURCESTATE_VALID</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// the view is 'renderable'</span> <span class="p">}</span> </code></pre></div></div> <p>This works because no matter what state the view object is in (or whether it even exists), <code class="language-plaintext highlighter-rouge">sg_query_view_image()</code> will return either an image handle or an invalid handle, and both can be passed into <code class="language-plaintext highlighter-rouge">sg_query_image_state()</code>.
An invalid image handle will return <code class="language-plaintext highlighter-rouge">SG_RESOURCESTATE_INVALID</code> while a valid image handle will return the actual <code class="language-plaintext highlighter-rouge">SG_RESOURCESTATE_*</code> of the image object.</p> <h2 id="tracking-uninit--init-cycles">Tracking uninit =&gt; init cycles</h2> <p>If the parent resource goes through a ‘destroy =&gt; make’ or ‘uninit =&gt; init’ cycle, all views which had been created from this parent resource must also be re-initialized, otherwise rendering operations involving such ‘dangling views’ will silently be skipped.</p> <p>A common pattern for this situation is to use the ‘uninit =&gt; init’ calls instead of ‘destroy =&gt; make’ because the handles will remain valid (e.g. you don’t need to distribute new object handles into all corners of your code base):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// first uninit/init the parent image with new params:</span> <span class="n">sg_uninit_image</span><span class="p">(</span><span class="n">img</span><span class="p">);</span> <span class="n">sg_init_image</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">...</span> <span class="p">});</span> <span class="c1">// then 'cycle' the image's view objects</span> <span class="n">sg_uninit_view</span><span class="p">(</span><span class="n">tex_view</span><span class="p">);</span> <span class="n">sg_init_view</span><span class="p">(</span><span class="n">tex_view</span><span class="p">,</span> <span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span 
class="n">img</span> <span class="p">});</span> </code></pre></div></div> <p>At first I considered adding a ‘managed mode’ for views which would track the state of their parent resource and automatically go through an uninit/init cycle when needed, but this just didn’t fit the sokol philosophy of explicit lifetimes and resource states, and having this one special case for view objects caused more confusion than the small gain in convenience was worth (this decision also wasn’t purely based on gut feeling: I actually <em>had</em> implemented the ‘managed mode’ already, but kicked it out again after starting to port the sokol sample code over - it just didn’t ‘feel right’).</p> <p>When porting existing code over to resource view objects, don’t forget that you need to destroy at least two objects now for complete cleanup (views <em>and</em> their parent resource).</p> <p>The order in which you destroy the views and parent resources doesn’t matter. This:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_destroy_view</span><span class="p">(</span><span class="n">view</span><span class="p">);</span> <span class="n">sg_destroy_image</span><span class="p">(</span><span class="n">img</span><span class="p">);</span> </code></pre></div></div> <p>…works just as well as this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_destroy_image</span><span class="p">(</span><span class="n">img</span><span class="p">);</span> <span class="n">sg_destroy_view</span><span class="p">(</span><span class="n">view</span><span class="p">);</span> </code></pre></div></div> <p><strong>BUT BE AWARE OF THIS TRAP:</strong></p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_destroy_view</span><span class="p">(</span><span class="n">view</span><span class="p">);</span>
<span class="n">sg_destroy_image</span><span class="p">(</span><span class="n">sg_query_view_image</span><span class="p">(</span><span class="n">view</span><span class="p">));</span> </code></pre></div></div> <p>Since the view is already destroyed, <code class="language-plaintext highlighter-rouge">sg_query_view_image()</code> will return the invalid handle, and passing the invalid handle into <code class="language-plaintext highlighter-rouge">sg_destroy_image()</code> is a silent no-op (e.g. your image will leak).</p> <p>…this is actually a nice example of how convenience in one situation (calling <code class="language-plaintext highlighter-rouge">sg_query_view_image(view)</code> and <code class="language-plaintext highlighter-rouge">sg_destroy_image()</code> with an invalid handle being a silent no-op) can cause trouble in other situations. I’ll need to think about whether this should at least be logged as an error instead.</p> <h2 id="working-with-render-pass-attachment-views">Working with render pass attachment views</h2> <p>Sample code:</p> <ul> <li><strong>offscreen-sapp</strong> (simple offscreen rendering): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/offscreen-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/offscreen-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/offscreen-sapp-ui.html">WebGPU sample</a></li> <li><strong>offscreen-msaa-sapp</strong> (multi-sampled offscreen rendering): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/offscreen-msaa-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/offscreen-msaa-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/offscreen-msaa-sapp-ui.html">WebGPU sample</a></li> <li><strong>mrt-sapp</strong> (multiple-render-target, multi-sampled offscreen rendering): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/mrt-sapp.c">C code</a>, <a 
href="https://github.com/floooh/sokol-samples/blob/master/sapp/mrt-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/mrt-sapp-ui.html">WebGPU sample</a></li> <li><strong>mrt-pixelformats-sapp</strong> (multiple render target rendering with different pixel formats): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/mrt-pixelformats-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/mrt-pixelformats-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/mrt-pixelformats-sapp-ui.html">WebGPU sample</a></li> <li><strong>shadows-sapp</strong> (shadow-mapping with regular shadow map texture): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/shadows-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/shadows-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/shadows-sapp-ui.html">WebGPU sample</a></li> <li><strong>shadows-depthtex-sapp</strong> (shadow-mapping with a depth-buffer texture): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/shadows-depthtex-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/shadows-depthtex-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/shadows-depthtex-sapp-ui.html">WebGPU sample</a></li> <li><strong>miprender-sapp</strong> (render into mipmaps): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/miprender-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/miprender-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/miprender-sapp-ui.html">WebGPU sample</a></li> <li><strong>layerrender-sapp</strong> (render into array slice): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/layerrender-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/layerrender-sapp.glsl">GLSL code</a>, <a 
href="https://floooh.github.io/sokol-webgpu/layerrender-sapp-ui.html">WebGPU sample</a></li> </ul> <p>In the previous sokol-gfx version, when doing offscreen rendering into an image object a ‘pre-baked’ attachments object had to be created which was then passed into <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code>:</p> <p>E.g. old code:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// create a color and depth-buffer image for offscreen rendering</span> <span class="n">sg_image</span> <span class="n">color_img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">render_attachment</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span> <span class="c1">// ...</span> <span class="p">});</span> <span class="n">sg_image</span> <span class="n">depth_img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">render_attachment</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span> <span class="c1">// ...</span> <span class="p">});</span> <span class="c1">// create an attachments object from those images...</span> <span class="n">sg_attachments</span> <span class="n">atts</span> <span class="o">=</span> <span class="n">sg_make_attachments</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span 
class="n">sg_attachments_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_stencil</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">depth_img</span><span class="p">,</span> <span class="p">});</span> <span class="c1">// ... in the render loop for the offscreen render pass:</span> <span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="n">atts</span> <span class="p">});</span> <span class="c1">// ...</span> <span class="n">sg_end_pass</span><span class="p">();</span> <span class="c1">// ... 
and in the swapchain pass, bind the color image as texture:</span> <span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span> <span class="c1">// ...</span> <span class="p">.</span><span class="n">images</span><span class="p">[</span><span class="n">TEX_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span> <span class="c1">// ...</span> <span class="p">});</span> </code></pre></div></div> <p>Now, instead of creating a pre-baked attachments object, separate ‘attachment-view’ objects are created upfront, but their combined use for rendering is no longer pre-baked but defined on-the-fly in the <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> call, much like bindings in the <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// create color- and depth-buffer images</span> <span class="c1">// NOTE the more detailed usage flags</span> <span class="n">sg_image</span> <span class="n">color_img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">color_attachment</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span> <span class="c1">// ...</span> <span class="p">});</span> <span class="n">sg_image</span> <span class="n">depth_img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span 
class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">depth_stencil_attachment</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span> <span class="c1">// ...</span> <span class="p">});</span> <span class="c1">// create color- and depth-stencil attachment views</span> <span class="n">sg_view</span> <span class="n">color_att_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">color_attachment</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span> <span class="p">});</span> <span class="n">sg_view</span> <span class="n">depth_att_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">depth_stencil_attachment</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">depth_img</span><span class="p">,</span> <span class="p">});</span> <span class="c1">// since the color-attachment image is also sampled as texture,</span> <span class="c1">// we'll also need a texture view:</span> <span class="n">sg_view</span> <span class="n">color_tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span 
class="n">color_img</span><span class="p">,</span> <span class="p">});</span> <span class="c1">// later in the offscreen render pass, the attachment views</span> <span class="c1">// are passed directly into sg_begin_pass:</span> <span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">color_att_view</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_stencil</span> <span class="o">=</span> <span class="n">depth_att_view</span><span class="p">,</span> <span class="p">},</span> <span class="p">});</span> <span class="c1">// ...</span> <span class="n">sg_end_pass</span><span class="p">();</span> <span class="c1">// and in the swapchain pass, the texture view is bound</span> <span class="c1">// to sample the offscreen-rendered image as texture:</span> <span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span> <span class="c1">// ...</span> <span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">color_tex_view</span><span class="p">,</span> <span class="c1">// ...</span> <span class="p">});</span> </code></pre></div></div> <h2 id="working-with-storage-image-views">Working with storage image views</h2> <p>Samples:</p> <ul> <li><strong>write-storageimage-sapp</strong> (write into storage image with compute shader): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.c">C code</a>, <a
href="https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/write-storageimage-sapp-ui.html">WebGPU sample</a></li> <li><strong>imageblur-sapp</strong> (image blurring with compute shaders): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/imageblur-sapp.html">WebGPU sample</a></li> </ul> <p>Storage image bindings are no longer defined as compute-pass attachments in <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code>, but instead like regular texture- or storage-buffer-bindings in <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// first create an image object with storage-image usage:</span> <span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">storage_image</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span> <span class="c1">// ...</span> <span class="p">});</span> <span class="c1">// to write to the image with a compute shader, a storage image view is needed:</span> <span class="n">sg_view</span> <span class="n">simg_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span 
class="p">.</span><span class="n">storage_image</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span> <span class="p">.</span><span class="n">mip_level</span> <span class="o">=</span> <span class="p">...,</span> <span class="c1">// optional: select a specific miplevel</span> <span class="p">.</span><span class="n">slice</span> <span class="o">=</span> <span class="p">...,</span> <span class="c1">// optional: select a specific slice</span> <span class="p">},</span> <span class="p">});</span> <span class="c1">// ...and to sample that same image as a texture for rendering, a texture view is needed:</span> <span class="n">sg_view</span> <span class="n">tex_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">texture</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span> <span class="p">});</span> <span class="c1">// storage image views are now applied as regular bindings in a compute pass:</span> <span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">});</span> <span class="c1">// ...</span> <span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span> <span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_simg</span><span class="p">]</span> <span class="o">=</span> <span 
class="n">simg_view</span><span class="p">,</span> <span class="p">});</span> <span class="n">sg_dispatch</span><span class="p">(...);</span> <span class="n">sg_end_pass</span><span class="p">();</span> <span class="c1">// and to use the compute-shader-updated image as a texture in a render pass,</span> <span class="c1">// bind the texture view as usual:</span> <span class="n">sg_begin_pass</span><span class="p">(...);</span> <span class="c1">// ...</span> <span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span> <span class="c1">// ...</span> <span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">tex_view</span><span class="p">,</span> <span class="p">.</span><span class="n">samplers</span><span class="p">[</span><span class="n">SMP_smp</span><span class="p">]</span> <span class="o">=</span> <span class="n">smp</span><span class="p">,</span> <span class="p">});</span> <span class="n">sg_draw</span><span class="p">(...);</span> <span class="n">sg_end_pass</span><span class="p">();</span> </code></pre></div></div> <h2 id="working-with-storage-buffer-views">Working with storage buffer views</h2> <p>Samples:</p> <ul> <li><strong>vertexpull-sapp</strong> (vertex pulling from storage buffer): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexpull-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexpull-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/vertexpull-sapp-ui.html">WebGPU sample</a></li> <li><strong>sbuftex-sapp</strong> (access storage buffer in fragment shader): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbuftex-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbuftex-sapp.glsl">GLSL
code</a>, <a href="https://floooh.github.io/sokol-webgpu/sbuftex-sapp-ui.html">WebGPU sample</a></li> <li><strong>instancing-compute-sapp</strong> (update instancing data with compute shader): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/instancing-compute-sapp-ui.html">WebGPU sample</a></li> <li><strong>sbufoffset-sapp</strong> (demonstrate storage buffer bindings with offset): <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbufoffset-sapp.c">C code</a>, <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbufoffset-sapp.glsl">GLSL code</a>, <a href="https://floooh.github.io/sokol-webgpu/sbufoffset-sapp-ui.html">WebGPU sample</a></li> </ul> <p>To bind a buffer object as storage buffer for vertex-pulling or compute-shader access you now need a storage-buffer-view object:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// create a buffer with storage-buffer usage:</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">storage_buffer</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">},</span> <span class="c1">// ...</span> <span class="p">});</span> <span class="c1">// create a storage buffer view</span> <span class="n">sg_view</span> <span class="n">sbuf_view</span> <span class="o">=</span> <span class="n">sg_make_view</span><span class="p">(</span><span 
class="o">&amp;</span><span class="p">(</span><span class="n">sg_view_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">storage_buffer</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">buffer</span> <span class="o">=</span> <span class="n">buf</span><span class="p">,</span> <span class="p">.</span><span class="n">offset</span> <span class="o">=</span> <span class="p">...,</span> <span class="c1">// optional 256-byte aligned offset</span> <span class="p">}</span> <span class="p">});</span> <span class="c1">// ...later in a render- or compute-pass bind the storage buffer view:</span> <span class="n">sg_apply_bindings</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_bindings</span><span class="p">){</span> <span class="p">.</span><span class="n">views</span><span class="p">[</span><span class="n">VIEW_ssbo</span><span class="p">]</span> <span class="o">=</span> <span class="n">sbuf_view</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>The 256-byte-alignment restriction for the offset is a bit unfortunate, since vertex-buffer and index-buffer bind offsets don’t have that restriction. 
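</p> <p>If your view offsets come from arbitrary sub-allocations inside a larger buffer, they need to be rounded up before creating the view. A minimal sketch, assuming a hypothetical helper <code class="language-plaintext highlighter-rouge">sbuf_align_offset()</code> (this is not a sokol-gfx function) and the 256-byte value mentioned above:</p>

```c
/* Hypothetical helper, not part of sokol-gfx: rounds a byte offset up to the
   next multiple of 256 so that it satisfies the storage-buffer-view offset
   alignment described in the text. */
enum { SBUF_OFFSET_ALIGN = 256 };

static unsigned int sbuf_align_offset(unsigned int offset) {
    /* integer division truncates, so adding (align - 1) first rounds up */
    return ((offset + SBUF_OFFSET_ALIGN - 1) / SBUF_OFFSET_ALIGN) * SBUF_OFFSET_ALIGN;
}
```

<p>For instance an offset of 1000 bytes would be rounded up to 1024, while offsets that are already a multiple of 256 are returned unchanged.</p> <p>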
The alignment restriction is coming in via WebGPU which on some Android devices requires this 256 byte alignment, but the only realistic lower choice would be 64 bytes which frankly isn’t that much better (see: <a href="https://vulkan.gpuinfo.org/displaydevicelimit.php?platform=android&amp;name=minStorageBufferOffsetAlignment">https://vulkan.gpuinfo.org/displaydevicelimit.php?platform=android&amp;name=minStorageBufferOffsetAlignment</a>) and would still exclude about 8 percent of Android devices which is quite a lot.</p> <h2 id="when-not-using-sokol-shdc">When not using sokol-shdc…</h2> <p>Samples:</p> <ul> <li>for <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">D3D11</a></li> <li>for <a href="https://github.com/floooh/sokol-samples/tree/master/metal">Metal</a></li> <li>for <a href="https://github.com/floooh/sokol-samples/tree/master/glfw">desktop GL</a></li> <li>for <a href="https://github.com/floooh/sokol-samples/tree/master/html5">WebGL2</a></li> <li>for <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">WebGPU</a></li> </ul> <p>Some tweaks on the manually populated <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> structs are needed when not using sokol-shdc:</p> <ul> <li>The separate bindslot reflection arrays for images, storage-buffers and storage-images have been unified into a <code class="language-plaintext highlighter-rouge">views[]</code> array which mirrors the <code class="language-plaintext highlighter-rouge">views[]</code> array in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct. 
The actual reflection information in each view bindslot has remained the same though.</li> <li>The <code class="language-plaintext highlighter-rouge">.image_sampler_pairs</code> array has been renamed to <code class="language-plaintext highlighter-rouge">.texture_sampler_pairs</code>, and the struct member <code class="language-plaintext highlighter-rouge">.image_slot</code> has been renamed to <code class="language-plaintext highlighter-rouge">.view_slot</code>.</li> </ul> <p>Example from the <a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/mrt-wgpu.c">wgpu/mrt-wgpu.c sample</a>:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_shader</span> <span class="n">fsq_shd</span> <span class="o">=</span> <span class="n">sg_make_shader</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_shader_desc</span><span class="p">){</span> <span class="c1">// ...</span> <span class="p">.</span><span class="n">views</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">wgsl_group1_binding_n</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">},</span> <span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">wgsl_group1_binding_n</span> <span class="o">=</span> <span class="mi">1</span> <span
class="p">},</span> <span class="p">[</span><span class="mi">2</span><span class="p">].</span><span class="n">texture</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">wgsl_group1_binding_n</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">},</span> <span class="p">},</span> <span class="p">.</span><span class="n">samplers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">wgsl_group1_binding_n</span> <span class="o">=</span> <span class="mi">3</span> <span class="p">},</span> <span class="p">},</span> <span class="p">.</span><span class="n">texture_sampler_pairs</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">view_slot</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="p">.</span><span class="n">sampler_slot</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">},</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span 
class="p">.</span><span class="n">view_slot</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="p">.</span><span class="n">sampler_slot</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">},</span> <span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stage</span> <span class="o">=</span> <span class="n">SG_SHADERSTAGE_FRAGMENT</span><span class="p">,</span> <span class="p">.</span><span class="n">view_slot</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="p">.</span><span class="n">sampler_slot</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">},</span> <span class="p">},</span> <span class="p">});</span> </code></pre></div></div> <p>Shader code changes are only needed on WebGPU when using storage images. Those have moved from <code class="language-plaintext highlighter-rouge">@group(2)</code> into <code class="language-plaintext highlighter-rouge">@group(1)</code> (this is because storage images are no longer special compute-pass-attachments, but regular bindings just like texture- and storage-buffer bindings).</p> <h2 id="q--a">Q &amp; A</h2> <h3 id="why-no-vertex--and-index-buffer-views">Why no vertex- and index-buffer views</h3> <p>I had actually implemented vertex- and index-buffer views at first because it would have reduced the size of <code class="language-plaintext highlighter-rouge">sg_bindings</code> by 36 bytes (32 bytes vertex-buffer-offsets and 4 bytes index-buffer-offset). 
In the end I rolled that change back since none of the backend 3D APIs require creating view objects for binding vertex- and index-buffers, while some rendering scenarios (like writing a renderer backend for Dear ImGui) heavily depend on dynamic offsets for vertex- and index-data.</p> <p>I might come back to that idea once additional drawing functions with base-offsets are added (which is planned for the ‘not-too-distant future’). <del>Also adding a D3D12 backend would require adding view objects for vertex- and index-buffers, since D3D12 has removed the ability to bind vertex- and index-buffers directly with a dynamic offset (at least that’s what I’m seeing in the D3D12 docs).</del></p> <p><strong>Update:</strong> Nvm, I was wrong here, D3D12 just uses the name ‘view’ both for transient structs and for baked objects, and <code class="language-plaintext highlighter-rouge">D3D12_VERTEX_BUFFER_VIEW</code> and <code class="language-plaintext highlighter-rouge">D3D12_INDEX_BUFFER_VIEW</code> are such transient structs. Thanks to ‘@[email protected]’ for making me aware of my misconception!</p> <h3 id="why-no-texture-field-in-sg_image_usage-to-indicate-that-texture-views-may-be-created-for-an-image-object">Why no ‘texture’ field in sg_image_usage to indicate that texture views may be created for an image object?</h3> <p>Simply because creating a texture view is always supported for image objects, so that flag could be implicitly hardwired to true anyway (with one ‘legacy edge case’: WebGL2 and GL 4.1 not supporting binding multi-sampled images as textures). 
In that edge-case, an explicit <code class="language-plaintext highlighter-rouge">.usage.texture</code> flag would allow failing already at image object creation instead of failing to create a texture view on a multi-sampled image object, but this is such a minor detail which only affects ‘legacy APIs’ (WebGL2 and GL 4.1) that I didn’t think adding an explicit texture usage flag was worth it.</p> <h3 id="whats-up-with-sg_max_view_bindslots-being-this-odd-28-instead-of-some-2n-value">What’s up with SG_MAX_VIEW_BINDSLOTS being this odd 28 instead of some 2^N value?</h3> <p>That way the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct is a nice round 256 bytes (64 bytes for vertex buffer handles and offsets, 8 bytes for index buffer and offset, 112 bytes for view handles, 64 bytes for sampler handles plus 2*4 bytes for the start and end canaries).</p> <p>16 separate samplers might be overkill, so I might tweak the number of views vs samplers a bit in the ‘resource view update 2’.</p> Sun, 17 Aug 2025 00:00:00 +0000 https://floooh.github.io/2025/08/17/sokol-gfx-view-update.html https://floooh.github.io/2025/08/17/sokol-gfx-view-update.html The sokol-gfx 'compute milestone 2' update <blockquote> <p>Update: merge happened on 24-May-2025</p> </blockquote> <p>In a couple of days I will merge the next breaking sokol_gfx.h update (aka the <code class="language-plaintext highlighter-rouge">compute-ms2</code> update) which makes working with buffer objects a bit more flexible and will allow compute shaders to write to <code class="language-plaintext highlighter-rouge">sg_image</code> objects via ‘compute pass attachments’.</p> <p>The update also comes with a matching sokol-shdc update which writes additional reflection information for storage images used in compute shaders into the code-generated <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct.</p> <blockquote> <p>NOTE: all WASM sample URLs in the blog post 
require a WebGPU capable browser and will only be valid after the merge.</p> </blockquote> <p>The implementation ticket is here, and this also has links to all related PRs: <a href="https://github.com/floooh/sokol/issues/1244">https://github.com/floooh/sokol/issues/1244</a></p> <h3 id="updated-documentation-sections">Updated documentation sections</h3> <ul> <li>in sokol_gfx.h, re-read the updated section <a href="https://github.com/floooh/sokol/blob/afc74bd88eab597665f5e4f10962c73524d7cbc1/sokol_gfx.h#L707-L798">ON COMPUTE PASSES</a></li> <li>if you’re not using sokol-shdc for shader compilation, also re-read the updated section <a href="https://github.com/floooh/sokol/blob/afc74bd88eab597665f5e4f10962c73524d7cbc1/sokol_gfx.h#L801-L1036">ON SHADER CREATION</a> (most of that information is only needed when <em>not</em> using sokol-shdc though)</li> <li>read the new doc section <a href="https://github.com/floooh/sokol/blob/afc74bd88eab597665f5e4f10962c73524d7cbc1/sokol_gfx.h#L1390-L1436">ON STORAGE IMAGES</a></li> </ul> <h3 id="an-important-behaviour-change-for-immutable-buffer-objects">An important behaviour change for immutable buffer objects</h3> <p>The initial ‘compute shader’ update made it possible to create immutable buffers without initial data and guaranteed that the buffer content would be zero-initialized. On some backend APIs this required a temporary memory allocation of the buffer size which obviously wasn’t great.</p> <p>This guaranteed zero-initialization has been rolled back now and the rules for creating immutable buffer objects have been changed like this:</p> <ul> <li>when creating an immutable non-storage-buffer object (e.g. 
the buffer cannot be written to with a compute shader), initial data <em>must</em> be provided</li> <li>when creating an immutable storage-buffer object, no initial data needs to be provided, but in that case the buffer content will be ‘undefined’</li> </ul> <p>In practice this means that when you use a compute shader to initialize storage buffer content you can no longer rely on the initial buffer content being zero-initialized. Instead, write <em>all</em> buffer items in the compute shader, even when they are supposed to be zero.</p> <h3 id="multi-purpose-buffer-objects">Multi-purpose buffer objects</h3> <p>It’s now possible to bind the same buffer object to different bind points (e.g. bind the same buffer as vertex buffer, index buffer and/or storage buffer). This means the following scenarios are now enabled:</p> <ul> <li>It’s possible to stash vertices and indices into the same buffer (with the exception of WebGL2 where this is explicitly disallowed)</li> <li>It’s now possible to use a compute shader to write data to a buffer, and then bind this buffer as vertex- or index-buffer.</li> </ul> <p>To achieve this, the <code class="language-plaintext highlighter-rouge">sg_buffer_desc</code> struct has been changed to merge the previous buffer type and buffer usage enum items into a new <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code> struct which is a boolean flag group:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_buffer_usage</span> <span class="p">{</span> <span class="n">bool</span> <span class="n">vertex_buffer</span><span class="p">;</span> <span class="n">bool</span> <span class="n">index_buffer</span><span class="p">;</span> <span class="n">bool</span> <span class="n">storage_buffer</span><span class="p">;</span> <span class="n">bool</span> <span class="n">immutable</span><span class="p">;</span> <span
class="n">bool</span> <span class="n">dynamic_update</span><span class="p">;</span> <span class="n">bool</span> <span class="n">stream_update</span><span class="p">;</span> <span class="p">}</span> <span class="n">sg_buffer_usage</span><span class="p">;</span> </code></pre></div></div> <p>The default setup configures an immutable vertex buffer (just as before), e.g. creating a buffer object like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span> <span class="p">});</span> </code></pre></div></div> <p>…is identical with:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">immutable</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span
class="n">vertices</span><span class="p">),</span> <span class="p">});</span> </code></pre></div></div> <p>…to create an immutable index buffer:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">indices</span><span class="p">),</span> <span class="p">});</span> </code></pre></div></div> <p>…to create an index buffer with stream-update hint:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">stream_update</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="p">...,</span> 
<span class="p">});</span> </code></pre></div></div> <p>…to create a buffer that can be written by a compute shader and then bound to a vertex buffer bindpoint:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">storage_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">});</span> </code></pre></div></div> <p>…and the same as index buffer:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">storage_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">size</span> 
<span class="o">=</span> <span class="p">...,</span> <span class="p">});</span> </code></pre></div></div> <p>To stash both vertices and indices into the same buffer object:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices_and_indices</span><span class="p">),</span> <span class="p">});</span> </code></pre></div></div> <p>Note that ‘multi-purpose buffer usage’ is explicitly disallowed on WebGL2 (which is only relevant for using a single buffer to hold vertex- and index-data, since storage buffers are not available on WebGL2 anyway). 
To check for this restriction use the new <code class="language-plaintext highlighter-rouge">sg_features.separate_buffer_types</code> boolean:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">sg_query_features</span><span class="p">().</span><span class="n">separate_buffer_types</span><span class="p">)</span> <span class="p">{</span> <span class="k">const</span> <span class="n">sg_buffer</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices_and_indices</span><span class="p">),</span> <span class="p">});</span> <span class="p">}</span> </code></pre></div></div> <p>Any invalid combination of usage flags will also be checked in the sokol-gfx validation layer.</p> <p>The following new sample uses a combined vertex/index buffer:</p> <ul> <li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexindexbuffer-sapp.c">https://github.com/floooh/sokol-samples/blob/master/sapp/vertexindexbuffer-sapp.c</a></li> <li>GLSL code: <a 
href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexindexbuffer-sapp.glsl">https://github.com/floooh/sokol-samples/blob/master/sapp/vertexindexbuffer-sapp.glsl</a></li> <li>WASM: <a href="https://floooh.github.io/sokol-webgpu/vertexindexbuffer-sapp-ui.html">https://floooh.github.io/sokol-webgpu/vertexindexbuffer-sapp-ui.html</a></li> </ul> <p>The <code class="language-plaintext highlighter-rouge">instancing-compute-sapp</code> sample has been updated to bind the compute-shader-updated storage buffer as vertex buffer with hardware instancing:</p> <ul> <li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.c">https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.c</a></li> <li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.glsl">https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.glsl</a></li> <li>WASM: <a href="https://floooh.github.io/sokol-webgpu/instancing-compute-sapp-ui.html">https://floooh.github.io/sokol-webgpu/instancing-compute-sapp-ui.html</a></li> </ul> <p>There is no sample yet which uses a compute shader to write index data.</p> <h3 id="breaking-changes-when-creating-image-objects">Breaking changes when creating image objects</h3> <p>Similar to the above <code class="language-plaintext highlighter-rouge">sg_buffer_desc</code> change, usage hints in the <code class="language-plaintext highlighter-rouge">sg_image_desc</code> struct are now provided through a new <code class="language-plaintext highlighter-rouge">sg_image_usage</code> struct looking like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_image_usage</span> <span class="p">{</span> <span class="n">bool</span> <span class="n">render_attachment</span><span class="p">;</span> <span 
class="n">bool</span> <span class="n">storage_attachment</span><span class="p">;</span> <span class="n">bool</span> <span class="n">immutable</span><span class="p">;</span> <span class="n">bool</span> <span class="n">dynamic_update</span><span class="p">;</span> <span class="n">bool</span> <span class="n">stream_update</span><span class="p">;</span> <span class="p">}</span> <span class="n">sg_image_usage</span><span class="p">;</span> </code></pre></div></div> <p>E.g. creating a ‘render-target texture’ for offscreen rendering now looks like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">render_attachment</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">...</span> <span class="p">});</span> </code></pre></div></div> <p>…and creating an image that is updated dynamically with CPU data, with stream-update behaviour:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">stream_update</span> <span class="o">=</span> <span class="nb">true</span><span
class="p">,</span> <span class="p">},</span> <span class="p">...</span> <span class="p">});</span> </code></pre></div></div> <p>As with <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>, invalid usage flag combinations are caught in the sokol-gfx validation layer.</p> <h3 id="compute-pass-attachments-aka-storage-images">Compute pass attachments (aka storage images)</h3> <p>It’s now possible to use compute shaders to write to <code class="language-plaintext highlighter-rouge">sg_image</code> objects. The way this is currently implemented is very similar to offscreen rendering (but will change in a future ‘resource view update’, more info on that at the end of the blog post).</p> <p>Let’s first write a simple compute shader in the sokol-shdc GLSL flavour which writes some animated color gradient to a storage image:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">cs</span> <span class="n">cs</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">cs_params</span> <span class="p">{</span> <span class="kt">float</span> <span class="n">offset</span><span class="p">;</span> <span class="p">};</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">rgba8</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">writeonly</span> <span class="kr">image2D</span> <span class="n">cs_out_tex</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">local_size_x</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span> <span class="n">local_size_y</span><span class="o">=</span><span 
class="mi">16</span><span class="p">)</span> <span class="k">in</span><span class="p">;</span> <span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="kt">ivec2</span> <span class="n">size</span> <span class="o">=</span> <span class="n">imageSize</span><span class="p">(</span><span class="n">cs_out_tex</span><span class="p">);</span> <span class="kt">ivec2</span> <span class="n">pos</span> <span class="o">=</span> <span class="kt">ivec2</span><span class="p">(</span><span class="n">mod</span><span class="p">(</span><span class="kt">vec2</span><span class="p">(</span><span class="n">gl_GlobalInvocationID</span><span class="p">.</span><span class="n">xy</span><span class="p">)</span> <span class="o">+</span> <span class="kt">vec2</span><span class="p">(</span><span class="n">size</span><span class="p">)</span> <span class="o">*</span> <span class="n">offset</span><span class="p">,</span> <span class="n">size</span><span class="p">));</span> <span class="kt">vec4</span> <span class="n">color</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="kt">vec2</span><span class="p">(</span><span class="n">gl_GlobalInvocationID</span><span class="p">.</span><span class="n">xy</span><span class="p">)</span> <span class="o">/</span> <span class="kt">float</span><span class="p">(</span><span class="n">size</span><span class="p">),</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span> <span class="n">imageStore</span><span class="p">(</span><span class="n">cs_out_tex</span><span class="p">,</span> <span class="n">pos</span><span class="p">,</span> <span class="n">color</span><span class="p">);</span> <span class="p">}</span> <span class="err">@</span><span class="n">end</span> <span class="err">@</span><span class="n">program</span> <span class="n">compute</span> <span class="n">cs</span> </code></pre></div></div> <p>On 
the CPU side, create an <code class="language-plaintext highlighter-rouge">sg_image</code> object with ‘storage attachment usage’:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_image</span> <span class="n">img</span> <span class="o">=</span> <span class="n">sg_make_image</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_image_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">storage_attachment</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">WIDTH</span><span class="p">,</span> <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">HEIGHT</span><span class="p">,</span> <span class="p">.</span><span class="n">pixel_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_RGBA8</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>Next the image must be wrapped in an <code class="language-plaintext highlighter-rouge">sg_attachments</code> object. This allows picking a specific image surface (mip-level and/or slice) for the compute shader to access. 
Up to 4 (or <code class="language-plaintext highlighter-rouge">SG_MAX_STORAGE_ATTACHMENTS</code>) images can be defined in a single attachment:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_attachments</span> <span class="n">atts</span> <span class="o">=</span> <span class="n">sg_make_attachments</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_attachments_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">storages</span><span class="p">[</span><span class="n">SIMG_cs_out_tex</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">img</span><span class="p">,</span> <span class="c1">// optionally pick a mip level and slice:</span> <span class="p">.</span><span class="n">mip_level</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="p">.</span><span class="n">slice</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="p">},</span> <span class="p">});</span> </code></pre></div></div> <p>…next a compute pipeline object which wraps the above compute shader:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_pipeline</span> <span class="n">pip</span> <span class="o">=</span> <span class="n">sg_make_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pipeline_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">shader</span> <span class="o">=</span> <span class="n">sg_make_shader</span><span 
class="p">(</span><span class="n">compute_shader_desc</span><span class="p">(</span><span class="n">sg_query_backend</span><span class="p">())),</span> <span class="p">});</span> </code></pre></div></div> <p>In the frame loop, run a compute pass and provide the attachments object, apply the compute pipeline and uniform data, and finally call <code class="language-plaintext highlighter-rouge">sg_dispatch()</code> to kick off the compute shader:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="n">atts</span> <span class="p">});</span> <span class="n">sg_apply_pipeline</span><span class="p">(</span><span class="n">pip</span><span class="p">);</span> <span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="n">UB_cs_params</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">cs_params</span><span class="p">));</span> <span class="n">sg_dispatch</span><span class="p">(</span><span class="n">WIDTH</span> <span class="o">/</span> <span class="mi">16</span><span class="p">,</span> <span class="n">HEIGHT</span> <span class="o">/</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span> <span class="n">sg_end_pass</span><span class="p">();</span> </code></pre></div></div> <p>…after the compute pass the image object can then be used as a texture binding in a regular render pass.</p> <p>Find the complete sample here:</p> <ul> <li>WASM: <a 
href="https://floooh.github.io/sokol-webgpu/write-storageimage-sapp.html">https://floooh.github.io/sokol-webgpu/write-storageimage-sapp.html</a></li> <li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.c">https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.c</a></li> <li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.glsl">https://github.com/floooh/sokol-samples/blob/master/sapp/write-storageimage-sapp.glsl</a></li> </ul> <p>…and a more advanced example which has been ported from WebGPU:</p> <ul> <li>WASM: <a href="https://floooh.github.io/sokol-webgpu/imageblur-sapp.html">https://floooh.github.io/sokol-webgpu/imageblur-sapp.html</a></li> <li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.c">https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.c</a></li> <li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.glsl">https://github.com/floooh/sokol-samples/blob/master/sapp/imageblur-sapp.glsl</a></li> </ul> <h3 id="detailed-change-list">Detailed change list</h3> <h4 id="sokol_apph">sokol_app.h:</h4> <p>The D3D11/DXGI backend now creates a <code class="language-plaintext highlighter-rouge">D3D_FEATURE_LEVEL_11_1</code> device (with a fallback to <code class="language-plaintext highlighter-rouge">D3D_FEATURE_LEVEL_11_0</code>). Feature Level 11.1 is needed to allow more than 8 UAV (Unordered Access View) bindings. 
D3D11.1 was released around 2011 with Windows 8, so this is only an issue if support for Windows 7 is still required or on very old GPUs (Win7 is now at 0.12% on Steam Hardware Survey, but even if this turns out to be a problem, only the bindslot allocation strategy in sokol-shdc for HLSL5 UAV bindslots needs to be changed).</p> <h4 id="sokol_gfxh">sokol_gfx.h:</h4> <ul> <li>A new constant <code class="language-plaintext highlighter-rouge">SG_MAX_STORAGE_ATTACHMENTS = 4</code> has been added (most likely bumped to at least 8 in the future)</li> <li>The struct <code class="language-plaintext highlighter-rouge">sg_pixelformat_info</code> has gained two new flags: <ul> <li><code class="language-plaintext highlighter-rouge">bool read</code>: true if the pixel format supports compute shader read access</li> <li><code class="language-plaintext highlighter-rouge">bool write</code>: true if the pixel format supports compute shader write access</li> </ul> <p>Currently the list of compute shader accessible pixel formats is hardwired to the following list which is safe to use across all GPUs and backend APIs (all those formats support read+write access):</p> <ul> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA8</code></li> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA8SN/UI/SI</code></li> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA16UI/SI/F</code></li> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_R32UI/SI/F</code></li> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RG32UI/SI/F</code></li> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA32UI/SI/F</code></li> </ul> </li> <li>A new feature flag <code class="language-plaintext highlighter-rouge">sg_features.separate_buffer_types</code> has been added, this is only true on WebGL2. 
The only effect of that flag is that the same buffer object cannot be used as vertex- and index-buffer bindings.</li> <li>The enums <code class="language-plaintext highlighter-rouge">sg_usage</code> and <code class="language-plaintext highlighter-rouge">sg_buffer_type</code> have been removed.</li> <li>The struct <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code> has been added.</li> <li>The enum field <code class="language-plaintext highlighter-rouge">sg_buffer_desc.type</code> has been removed and replaced by boolean flags in <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>.</li> <li>The enum field <code class="language-plaintext highlighter-rouge">sg_buffer_desc.usage</code> has been repurposed as a nested struct item of type <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>.</li> <li>The struct <code class="language-plaintext highlighter-rouge">sg_image_usage</code> has been added.</li> <li>The boolean <code class="language-plaintext highlighter-rouge">sg_image_desc.render_target</code> has been removed and replaced by <code class="language-plaintext highlighter-rouge">sg_image_usage.render_attachment</code>.</li> <li>The enum field <code class="language-plaintext highlighter-rouge">sg_image_desc.usage</code> has been repurposed as a nested struct item of type <code class="language-plaintext highlighter-rouge">sg_image_usage</code>.</li> <li>A new struct <code class="language-plaintext highlighter-rouge">sg_shader_storage_image</code> has been added, this is nested in <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> and holds reflection information about storage image bindings in compute shaders.</li> <li>A new array <code class="language-plaintext highlighter-rouge">sg_shader_desc.storage_images[]</code> has been added to communicate reflection information about storage image usage in compute shaders to sokol_gfx.h.</li> <li>A new array <code class="language-plaintext 
highlighter-rouge">sg_attachments_desc.storages[]</code> has been added to describe ‘storage image attachments’ for compute passes.</li> <li>The function <code class="language-plaintext highlighter-rouge">sg_query_buffer_usage()</code> now returns a struct <code class="language-plaintext highlighter-rouge">sg_buffer_usage</code>.</li> <li>The function <code class="language-plaintext highlighter-rouge">sg_query_image_usage()</code> now returns a struct <code class="language-plaintext highlighter-rouge">sg_image_usage</code>.</li> </ul> <h3 id="whats-next">What’s next</h3> <p>Long story short: while working on the storage image update it became clear that sokol_gfx.h needs resource-view objects.</p> <p>This will allow more flexible resource bindings without creating temporary 3D-backend objects in the ‘hot path’ while keeping the sokol_gfx.h backend implementations simple (e.g. I want to avoid a dynamic ‘hash-and-cache’ approach for 3D-backend resource objects as much as possible, it’s already bad enough that this is needed with WebGPU BindGroups).</p> <p>Currently resource view objects are managed under the hood, for instance in the D3D11 backend:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sg_buffer</code> objects with storage buffer usage generally create a Shader Resource View for readonly-access in vertex-, fragment- and compute-shaders, and if the buffer is immutable, also an Unordered Access View for write-access in compute shaders. 
Notably, any starting offsets are hardwired to zero in both view objects.</li> <li><code class="language-plaintext highlighter-rouge">sg_image</code> objects generally create a Shader Resource View object, but without a way to specify a mip-level range, array-slice range or different pixel format.</li> <li><code class="language-plaintext highlighter-rouge">sg_attachments</code> objects create: <ul> <li>one Render Target View object per color attachment</li> <li>an optional Depth Stencil View object for the depth-stencil attachment</li> <li>one Unordered Access View object per storage attachment</li> </ul> </li> </ul> <p>The reason why storage images are currently treated as pass attachments instead of regular bindings applied via <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> is that storage image bindings need to pick a mip-level and/or slice, and at least on D3D11 this requires a baked UAV object. Likewise, binding the same storage buffer with different offsets would require one SRV or UAV object per offset.</p> <p>The current plan for view objects in sokol_gfx.h looks like this:</p> <ul> <li>a single new resource object type is added: <code class="language-plaintext highlighter-rouge">sg_view</code>, with matching structs and functions (<code class="language-plaintext highlighter-rouge">sg_view_desc</code>, <code class="language-plaintext highlighter-rouge">sg_make_view()</code>, <code class="language-plaintext highlighter-rouge">sg_destroy_view()</code>, etc…)</li> <li>in return, the <code class="language-plaintext highlighter-rouge">sg_attachments</code> resource object type is removed (along with <code class="language-plaintext highlighter-rouge">sg_attachments_desc</code>, <code class="language-plaintext highlighter-rouge">sg_make_attachments()</code>, <code class="language-plaintext highlighter-rouge">sg_destroy_attachments()</code> etc…)</li> <li>view objects can be thought of as a specialization of a resource object for 
a specific bindslot type (I actually thought about calling the new resource type <code class="language-plaintext highlighter-rouge">sg_binding</code>, but ‘view’ is the established name for this type of thing across backend 3D APIs), e.g. views will come in the following ‘runtime flavours’: <ul> <li>texture views</li> <li>storage buffer views</li> <li>storage image views</li> <li>color attachment views</li> <li>resolve attachment views</li> <li>depth-stencil attachment views</li> </ul> </li> <li>…and maybe (but not sure yet): <ul> <li>vertex buffer views</li> <li>index buffer views</li> </ul> <p>…vertex- and index-buffer-views would make it possible to remove the bind offset for vertex- and index-buffers from <code class="language-plaintext highlighter-rouge">sg_bindings</code>, with the downside that one view object would be required per offset, but I can’t think of a situation where a highly dynamic starting offset would be required for vertex- and index-data. To be clear: there is no backend API which requires a view object for vertex- and index-buffer bindings, it would be purely a sokol_gfx.h thing (this also means that it would be very cheap to build and destroy vertex- and index-buffer-view objects on the fly since no calls into backend APIs would happen)</p> </li> <li> <p>the new <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct would then look like this (notably storage images for compute shader access would move from ‘pass attachments’ to regular ‘bindings’)</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_bindings</span> <span class="p">{</span> <span class="n">sg_view</span> <span class="n">vertex_buffers</span><span class="p">[</span><span class="n">SG_MAX_VERTEXBUFFER_BINDINGS</span><span class="p">];</span> <span class="n">sg_view</span> <span class="n">index_buffer</span><span class="p">;</span> <span 
class="n">sg_view</span> <span class="n">textures</span><span class="p">[</span><span class="n">SG_MAX_TEXTURE_BINDINGS</span><span class="p">];</span> <span class="n">sg_view</span> <span class="n">storage_buffers</span><span class="p">[</span><span class="n">SG_MAX_STORAGEBUFFER_BINDINGS</span><span class="p">];</span> <span class="n">sg_view</span> <span class="n">storage_images</span><span class="p">[</span><span class="n">SG_MAX_STORAGEIMAGE_BINDINGS</span><span class="p">];</span> <span class="n">sg_sampler</span> <span class="n">samplers</span><span class="p">[</span><span class="n">SG_MAX_SAMPLER_BINDINGS</span><span class="p">];</span> <span class="p">}</span> <span class="n">sg_bindings</span><span class="p">;</span> </code></pre></div> </div> </li> <li> <p><code class="language-plaintext highlighter-rouge">sg_attachments</code> would become a ‘transient struct’ similar to <code class="language-plaintext highlighter-rouge">sg_bindings</code>:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_attachments</span> <span class="p">{</span> <span class="n">sg_view</span> <span class="n">colors</span><span class="p">[</span><span class="n">SG_MAX_COLOR_ATTACHMENTS</span><span class="p">];</span> <span class="n">sg_view</span> <span class="n">resolves</span><span class="p">[</span><span class="n">SG_MAX_COLOR_ATTACHMENTS</span><span class="p">];</span> <span class="n">sg_view</span> <span class="n">depth_stencil</span><span class="p">;</span> <span class="p">}</span> <span class="n">sg_attachments</span><span class="p">;</span> </code></pre></div> </div> </li> </ul> <p>This ‘view update’ would have the following advantages:</p> <ul> <li>storage buffer bindings can have a starting offset, which simplifies managing different types of data in the same buffer</li> <li>texture and storage image bindings can (to some extent) reinterpret the 
image data (e.g. casting to a different pixel format or selecting a miplevel and slice range - this will have to be behind a feature flag though)</li> <li>multiple-render-target combinations no longer need to be prebaked</li> </ul> <p>No ETA yet on the ‘view update’ though, first I want to fix a couple of internal things:</p> <ul> <li>the GL texture creation code is currently an unholy combination of <code class="language-plaintext highlighter-rouge">glTexStorage</code> and <code class="language-plaintext highlighter-rouge">glTexImage</code> functions. I want to cleanly split this into two code paths (unfortunately macOS being stuck at GL 4.1 doesn’t have the <code class="language-plaintext highlighter-rouge">glTexStorage</code> functions, although I heard that those functions are implemented but just not present in the core GL headers - which I’ll need to investigate)</li> <li>I want to improve the internal ‘lifetime tracking’ for referenced resources (e.g. one resource object holding a reference to another object). Currently it’s not possible to detect when such a referenced object has gone through an ‘uninit/init’ cycle because this keeps the same public handle while discarding and recreating backend 3D API objects. 
Especially for view objects (which need to track their original resource object) it is important that views can detect when their referenced resource object is discarded (and I’m thinking about ‘auto-managed’ view objects which can recreate themselves on the fly when their resource object goes through uninit/init - no promises yet though).</li> </ul> <p>More info on those planned updates is in the following planning tickets:</p> <ul> <li>resource views: <a href="https://github.com/floooh/sokol/issues/1252">https://github.com/floooh/sokol/issues/1252</a></li> <li>better internal reference tracking: <a href="https://github.com/floooh/sokol/issues/1260">https://github.com/floooh/sokol/issues/1260</a></li> <li>glTexStorage vs glTexImage: <a href="https://github.com/floooh/sokol/issues/1263">https://github.com/floooh/sokol/issues/1263</a></li> </ul> <p>…and that is all for today :)</p> Mon, 19 May 2025 00:00:00 +0000 https://floooh.github.io/2025/05/19/sokol-gfx-compute-ms2.html https://floooh.github.io/2025/05/19/sokol-gfx-compute-ms2.html The sokol-gfx compute shader update <p><strong>Update:</strong> merge happened on 08-Mar-2025</p> <p>In the next couple of days I will merge initial compute shader support for sokol_gfx.h (and sokol-shdc). 
The update is surprisingly ‘low-profile’ in terms of API changes, the only breaking change is that the runtime feature flag <code class="language-plaintext highlighter-rouge">sg_features.storage_buffer</code> has been renamed to <code class="language-plaintext highlighter-rouge">sg_features.compute</code> (this is because the same backends that supported storage buffers before now also support compute shaders).</p> <h2 id="availability-and-restrictions">Availability and Restrictions</h2> <p>Compute shader support is available on the following platform/backend combos:</p> <ul> <li>macOS and iOS with Metal</li> <li>Windows with D3D11 and GL</li> <li>Linux with GL</li> <li>Web with WebGPU</li> </ul> <p>…which means that compute shaders are not available on:</p> <ul> <li>macOS with GL</li> <li>iOS with GLES3</li> <li>Web with WebGL2</li> <li>Android with GLES3</li> </ul> <p>The initial compute shader support comes with a couple of restrictions which will most likely be lifted in later updates (in about that order):</p> <ul> <li>storage buffers cannot be bound as vertex- or index-buffers</li> <li>no storage textures, e.g. 
compute shaders can only write buffer data but not texture data</li> <li>there’s no way to read data from GPU resources back to the CPU side (or copy data between GPU resources)</li> </ul> <p>Right now compute shaders are mostly useful for replacing dynamic- and streaming-buffer update scenarios, where dynamic render data is computed on the CPU and uploaded to buffers via <code class="language-plaintext highlighter-rouge">sg_update_buffer()</code>.</p> <h2 id="new-compute-shader-samples">New compute shader samples</h2> <p>To get an idea how compute shaders work in sokol-gfx, it’s best to read the new sample code:</p> <ul> <li><a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.c">C code</a></li> <li><a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-compute-sapp.glsl">GLSL code</a></li> <li><a href="https://floooh.github.io/sokol-webgpu/instancing-compute-sapp.html">WebGPU demo</a></li> </ul> <p>This is an evolution of the <a href="https://floooh.github.io/sokol-webgpu/instancing-sapp-ui.html">instancing-sapp</a> sample, and moves all particle computations into compute shaders.</p> <p>The other compute shader sample is a straight port of the <a href="https://webgpu.github.io/webgpu-samples/?sample=computeBoids">WebGPU compute boids sample</a> to sokol-gfx:</p> <ul> <li><a href="https://github.com/floooh/sokol-samples/blob/master/sapp/computeboids-sapp.c">C code</a></li> <li><a href="https://github.com/floooh/sokol-samples/blob/master/sapp/computeboids-sapp.glsl">GLSL code</a></li> <li><a href="https://floooh.github.io/sokol-webgpu/computeboids-sapp.html">WebGPU demo</a></li> </ul> <p>Those two samples use ‘cross-backend’ GLSL shader code compiled to the underlying shading languages via <a href="https://github.com/floooh/sokol-tools/">sokol-shdc</a>.</p> <p>For authoring compute shaders with sokol-shdc it might make sense to read up on <a href="https://www.khronos.org/opengl/wiki/Compute_Shader">GLSL 
compute shaders in the GL Wiki</a> - note though that not all features have been properly tested yet (like sampling textures in compute shaders, or accessing shared memory).</p> <p>For using sokol-gfx compute shaders without sokol-shdc, check out the following backend specific versions of the <code class="language-plaintext highlighter-rouge">instancing-compute</code> sample:</p> <ul> <li>D3D11: <a href="https://github.com/floooh/sokol-samples/blob/master/d3d11/instancing-compute-d3d11.c">instancing-compute-d3d11.c</a></li> <li>Metal: <a href="https://github.com/floooh/sokol-samples/blob/master/metal/instancing-compute-metal.c">instancing-compute-metal.c</a></li> <li>WebGPU: <a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/instancing-compute-wgpu.c">instancing-compute-wgpu.c</a></li> <li>GL4.3: <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/instancing-compute-glfw.c">instancing-compute-glfw.c</a></li> </ul> <p>Also check out the updated documentation of <a href="https://github.com/floooh/sokol-tools/blob/master/docs/sokol-shdc.md">sokol-shdc</a>, and the new documentation comment section on compute shaders in the sokol_gfx.h header (search for: <code class="language-plaintext highlighter-rouge">ON COMPUTE PASSES</code> and re-read the updated section <code class="language-plaintext highlighter-rouge">ON SHADER CREATION</code>).</p> <h2 id="shader-authoring-changes">Shader Authoring Changes</h2> <p>The sokol-gfx update comes with a matching sokol-shdc update for authoring compute shaders.</p> <p>A new tag <code class="language-plaintext highlighter-rouge">@cs [name]</code> (similar to the existing <code class="language-plaintext highlighter-rouge">@vs [name]</code> and <code class="language-plaintext highlighter-rouge">@fs [name]</code>) is used to identify a compute shader snippet, e.g. 
everything inside <code class="language-plaintext highlighter-rouge">@cs / @end</code> will be compiled as a <a href="https://www.khronos.org/opengl/wiki/Compute_Shader">GLSL compute shader</a>.</p> <p>NOTE that the distinction between readonly and read/write storage buffer bindings is important, e.g.:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">readonly</span> <span class="n">buffer</span> <span class="n">cs_ssbo_in</span> <span class="p">{</span> <span class="n">particle</span> <span class="n">prt_in</span><span class="p">[];</span> <span class="p">};</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="n">buffer</span> <span class="n">cs_ssbo_out</span> <span class="p">{</span> <span class="n">particle</span> <span class="n">prt_out</span><span class="p">[];</span> <span class="p">};</span> </code></pre></div></div> <p>If your compute shader only reads (but doesn’t write) storage buffer content, its binding declaration should be marked as <code class="language-plaintext highlighter-rouge">readonly</code>. 
This information will be extracted by sokol-shdc and used by sokol-gfx for hazard-tracking needed in some 3D-APIs.</p> <p>The other notable shader specialty is the ‘workgroup size’, which in GLSL is defined as:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">local_size_x</span><span class="o">=</span><span class="n">X</span><span class="p">,</span> <span class="n">local_size_y</span><span class="o">=</span><span class="n">Y</span><span class="p">,</span> <span class="n">local_size_z</span><span class="o">=</span><span class="n">Z</span><span class="p">)</span> <span class="k">in</span><span class="p">;</span> </code></pre></div></div> <p>…if you’re used to HLSL, this is the same as <code class="language-plaintext highlighter-rouge">[numthreads(X,Y,Z)]</code>, or in WGSL <code class="language-plaintext highlighter-rouge">@workgroup_size(X,Y,Z)</code>. On Metal this is called <code class="language-plaintext highlighter-rouge">threadsPerThreadGroup</code> and is <strong>not</strong> defined in the shader code, but on the CPU side when issuing a dispatch call (this is another case where sokol-shdc comes in handy, since it extracts the workgroup size from the GLSL shader and passes it into sokol-gfx as <code class="language-plaintext highlighter-rouge">sg_shader_desc.mtl_threads_per_threadgroup</code>).</p> <p>Other than that you mainly need to be aware that your compute shader code must be thread safe because compute shaders allow random write access into storage buffers and the GPU is spawning many invocations of your shader running in parallel.</p> <h2 id="on-the-cpu-side">On the CPU side</h2> <p>The <code class="language-plaintext highlighter-rouge">sg_setup()</code> call gets a new config item <code class="language-plaintext highlighter-rouge">sg_desc.max_dispatch_calls_per_pass</code> (default: 1024). 
This is used to allocate an internal array to keep track of written storage buffers in a compute pass for hazard tracking purposes.</p> <p>There’s a minor change when creating buffers: It’s now allowed to create immutable buffers without initial content, and such buffers will be zero-initialized (note though that dynamic- and streaming-buffers may still have undefined buffer content after creation). Zero-initialization is useful when using a compute shader to write the initial buffer content instead of providing the data from the CPU side during the <code class="language-plaintext highlighter-rouge">sg_make_buffer()</code> call.</p> <p>Shaders, pipelines and passes now come in two runtime flavours: ‘render’ vs ‘compute’, where the ‘render flavours’ are fully compatible with existing code.</p> <p>For shaders, nothing changes either when using sokol-shdc for shader authoring. In that case you just write a compute shader and sokol-shdc will code-generate a matching <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct which can be plugged directly into the <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> call.</p> <p>A compute pipeline is a regular pipeline object without any render state, but with a compute shader attached:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_pipeline</span> <span class="n">pip</span> <span class="o">=</span> <span class="n">sg_make_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pipeline_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">shader</span> <span class="o">=</span> <span class="n">a_compute_shader</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>Finally, 
kicking off ‘compute workloads’ happens with a new function <code class="language-plaintext highlighter-rouge">sg_dispatch()</code> inside ‘compute passes’:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">compute</span> <span class="o">=</span> <span class="nb">true</span> <span class="p">});</span> <span class="n">sg_apply_pipeline</span><span class="p">(</span><span class="n">pip</span><span class="p">);</span> <span class="n">sg_apply_bindings</span><span class="p">(...);</span> <span class="n">sg_apply_uniforms</span><span class="p">(...);</span> <span class="n">sg_dispatch</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">);</span> <span class="n">sg_end_pass</span><span class="p">();</span> </code></pre></div></div> <p>The <code class="language-plaintext highlighter-rouge">sg_dispatch()</code> call takes the number of ‘workgroups’ as arguments (same convention as GL, D3D11 and WebGPU, but different from Metal’s <code class="language-plaintext highlighter-rouge">dispatchThreads</code> method).</p> <p>Compute- vs render-passes now impose a couple of restrictions (checked by the validation layer):</p> <ul> <li>the following functions must only be called in render passes: <ul> <li><code class="language-plaintext highlighter-rouge">sg_apply_viewport[f]()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_apply_scissor_rect[f]()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_draw()</code></li> </ul> </li> <li><code class="language-plaintext highlighter-rouge">sg_dispatch()</code> must only be called in a compute pass</li> <li><code 
class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> in a compute pass must not attempt to bind vertex- or index-buffers</li> <li>the <code class="language-plaintext highlighter-rouge">sg_apply_pipeline()</code> pipeline type must match the pass type (e.g. render pipeline objects can only be applied in render passes, and compute pipeline objects only in compute passes)</li> </ul> <h2 id="when-not-using-sokol-shdc">When not using sokol-shdc</h2> <p>If you don’t use sokol-shdc for shader authoring you’ll need to populate the all-important <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct passed into <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> yourself with information that matches your shader code:</p> <ul> <li>A nested struct <code class="language-plaintext highlighter-rouge">compute_func</code> has been added (similar to existing <code class="language-plaintext highlighter-rouge">vertex_func</code> and <code class="language-plaintext highlighter-rouge">fragment_func</code>) to pass a compute shader function as backend-specific source code or bytecode blob</li> <li>A Metal-specific <code class="language-plaintext highlighter-rouge">mtl_threads_per_threadgroup</code> nested struct which defines the ‘workgroup size’ to the Metal API (this is in <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> because those values are normally extracted from shader code via reflection)</li> <li>The <code class="language-plaintext highlighter-rouge">readonly</code> boolean in the storage buffer bindslot declaration is now allowed to be false, but only in compute shaders. 
This flag is now used by sokol-gfx as hint for ‘resource hazard tracking’ in some backend APIs.</li> <li>A new HLSL/D3D11 specific item <code class="language-plaintext highlighter-rouge">uint8_t register_u_n</code> has been added to the nested <code class="language-plaintext highlighter-rouge">storage_buffers[]</code> declarations (struct <code class="language-plaintext highlighter-rouge">sg_shader_storage_buffer</code>), this is used to communicate the HLSL bindslot for writable storage buffer bindings (which are bound as D3D11 ‘unordered access views’, while readonly storage buffers continue to be bound as ‘shader resource views’).</li> </ul> <p>Also please carefully review the backend-specific compute shader samples which directly pass backend-specific shader code into sokol-gfx:</p> <ul> <li>D3D11: <a href="https://github.com/floooh/sokol-samples/blob/master/d3d11/instancing-compute-d3d11.c">instancing-compute-d3d11.c</a></li> <li>Metal: <a href="https://github.com/floooh/sokol-samples/blob/master/metal/instancing-compute-metal.c">instancing-compute-metal.c</a></li> <li>WebGPU: <a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/instancing-compute-wgpu.c">instancing-compute-wgpu.c</a></li> <li>GL4.3: <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/instancing-compute-glfw.c">instancing-compute-glfw.c</a></li> </ul> <h2 id="under-the-hood">Under the hood</h2> <p>Most of the new code in sokol_gfx.h is just a straight-forward mapping from sokol-gfx types and functions into backend 3D-API types and functions.</p> <p>Only two details are worth mentioning:</p> <ul> <li>On Metal, and only on systems without unified memory, GPU-written managed storage buffers are ‘synchronized’ at the end of a compute pass inside <code class="language-plaintext highlighter-rouge">sg_end_pass()</code>. This synchronization basically updates the CPU-side shadow copy of the buffer with the new data that’s been written by a compute shader. 
This requires keeping track of all read/write storage buffer bindings inside a compute pass (this is what the new <code class="language-plaintext highlighter-rouge">sg_desc.max_dispatch_calls_per_pass</code> config item is used for).</li> <li>On GL, <code class="language-plaintext highlighter-rouge">glMemoryBarrier()</code> calls are issued (at most once per <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call) when a storage buffer was previously bound as read/write (which sets an internal ‘gpu_dirty’ flag).</li> </ul> <h2 id="whats-next">What’s next</h2> <p>…mainly patching remaining feature gaps in a couple of minor updates:</p> <ul> <li>allowing storage buffers to be bound as vertex- and index-buffers</li> <li>introducing storage textures which can be written by compute shaders</li> <li>more ‘feature coverage’ by writing a handful of more interesting compute samples</li> </ul> <p>…and what will most likely be a bigger update: figuring out a proper sub-API for <code class="language-plaintext highlighter-rouge">CPU =&gt; GPU</code>, <code class="language-plaintext highlighter-rouge">GPU =&gt; CPU</code> and <code class="language-plaintext highlighter-rouge">GPU =&gt; GPU</code> copies.</p> Mon, 03 Mar 2025 00:00:00 +0000 https://floooh.github.io/2025/03/03/sokol-gfx-compute-update.html https://floooh.github.io/2025/03/03/sokol-gfx-compute-update.html Upcoming Sokol header API changes (Nov 2024) <p>Update: the ‘bindings cleanup’ update was merged on 07-Nov-2024.</p> <p>In a couple of days I will merge the next breaking sokol_gfx.h update (aka the “Bindings Cleanup”). 
The update also affects sokol-shdc, so if you’re using sokol-shdc for shader compilation make sure to update that as well.</p> <ul id="markdown-toc"> <li><a href="#overview" id="markdown-toc-overview">Overview</a></li> <li><a href="#updated-documentation-and-example-code" id="markdown-toc-updated-documentation-and-example-code">Updated documentation and example code</a> <ul> <li><a href="#when-using-sokol-shdc" id="markdown-toc-when-using-sokol-shdc">When using sokol-shdc:</a></li> <li><a href="#when-not-using-sokol-shdc" id="markdown-toc-when-not-using-sokol-shdc">When <em>not</em> using sokol-shdc</a></li> </ul> </li> <li><a href="#change-recipes" id="markdown-toc-change-recipes">Change Recipes</a> <ul> <li><a href="#when-using-sokol-shdc-1" id="markdown-toc-when-using-sokol-shdc-1">When using sokol-shdc:</a></li> <li><a href="#when-not-using-sokol-shdc-1" id="markdown-toc-when-not-using-sokol-shdc-1">When <em>not</em> using sokol-shdc:</a></li> </ul> </li> </ul> <h2 id="overview">Overview</h2> <p>In general, the update makes the relationship between the shader resource interface and the sokol-gfx resource binding model more explicit, but also more flexible. 
Another motivation for the change was to prepare the sokol-gfx API for compute shader support.</p> <p>The root PR is here: <a href="https://github.com/floooh/sokol/pull/1111">https://github.com/floooh/sokol/pull/1111</a>.</p> <p>The TL;DR is:</p> <ul> <li>When using sokol-shdc for shader compilation, the input GLSL source now requires explicit binding annotations via <code class="language-plaintext highlighter-rouge">layout(binding=N)</code>, where <code class="language-plaintext highlighter-rouge">N</code> directly maps to bindslot indices in the sokol-gfx resource binding API.</li> <li>The concept of ‘shader stages’ mostly disappears from the sokol-gfx API, shader stages are now only a minor detail of the shader interface reflection information in the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct passed into the <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> function.</li> <li>When <em>not</em> using sokol-shdc there’s now an explicit mapping from sokol-gfx bindslots to 3D backend-specific bindslots. 
This reduces the sokol-gfx internal magic for mapping the backend-agnostic sokol-gfx binding model to the specific binding models of the backend 3D APIs (there <em>are</em> still some restrictions but only when they allow a more efficient resource binding implementation in sokol-gfx).</li> </ul> <p>In general, all changes result in compile errors, and cleaning up the compile errors by following the ‘change recipes’ below should be enough to make your existing code work.</p> <p>The following parts of the public sokol_gfx.h API have changed:</p> <ul> <li>In the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct, the nested vertex- and fragment-stage structs for the image-, sampler- and storage-buffer-bindings have been removed, and the bindings arrays have moved up into the root struct.</li> <li>In the <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> call, the shader stage parameter has been removed.</li> <li>The interior of the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct and the type names of nested structs have changed completely (but if you are using sokol-shdc for shader authoring you don’t need to worry about that, since sokol-shdc will code-generate the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct).</li> <li>A number of public API constants have been removed or renamed (but those should rarely show up in user code).</li> <li>The enum items in <code class="language-plaintext highlighter-rouge">sg_shader_stage</code> have been renamed, and those are now only used in the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct and nowhere else: <ul> <li><code class="language-plaintext highlighter-rouge">SG_SHADERSTAGE_VS</code> =&gt; <code class="language-plaintext highlighter-rouge">SG_SHADERSTAGE_VERTEX</code></li> <li><code class="language-plaintext highlighter-rouge">SG_SHADERSTAGE_FS</code> =&gt; <code 
class="language-plaintext highlighter-rouge">SG_SHADERSTAGE_FRAGMENT</code></li> </ul> </li> </ul> <p>The update also has some minor behaviour changes:</p> <ul> <li>Resource bindings can now have gaps, and validation for <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> has been relaxed to allow bindslots in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct to be occupied even when the current shader doesn’t use those bindings. This makes it possible to use the same <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct for different but related shader variants.</li> <li>Likewise, uniform block bindslots can now be explicitly defined in the shaders, which makes it possible to ‘share’ bindslot indices across shaders. Trying to call <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> for a bindslot that isn’t used by the current shader is still an error though (not sure yet if this makes sense, could probably be relaxed in a later update).</li> <li>There’s now a new (debug-mode only) error check in <code class="language-plaintext highlighter-rouge">sg_draw()</code> to make sure that <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> and/or <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> have been called since the last <code class="language-plaintext highlighter-rouge">sg_apply_pipeline()</code> when required.</li> </ul> <h2 id="updated-documentation-and-example-code">Updated documentation and example code</h2> <blockquote> <p>NOTE: these links will only be up to date after <a href="https://github.com/floooh/sokol/pull/1111">PR #1111</a> has been merged.</p> </blockquote> <h3 id="when-using-sokol-shdc">When using sokol-shdc:</h3> <p>Please re-read the sokol-shdc documentation:</p> <p><a 
href="https://github.com/floooh/sokol-tools/blob/master/docs/sokol-shdc.md">https://github.com/floooh/sokol-tools/blob/master/docs/sokol-shdc.md</a></p> <p>Especially the section <code class="language-plaintext highlighter-rouge">Shader Authoring Considerations</code>.</p> <p>In the <a href="https://github.com/floooh/sokol/blob/master/sokol_gfx.h">sokol_gfx.h header</a>, re-read the documentation header above the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct.</p> <p>Check the updated sokol samples here:</p> <p><a href="https://github.com/floooh/sokol-samples/tree/master/sapp">https://github.com/floooh/sokol-samples/tree/master/sapp</a></p> <h3 id="when-not-using-sokol-shdc">When <em>not</em> using sokol-shdc</h3> <p>In the <a href="https://github.com/floooh/sokol/blob/master/sokol_gfx.h">sokol_gfx.h header</a>, re-read the updated documentation section <code class="language-plaintext highlighter-rouge">ON SHADER CREATION</code>.</p> <p>Next read the updated documentation above the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> and <code class="language-plaintext highlighter-rouge">sg_bindings</code> structs.</p> <p>Finally check the updated backend-specific samples:</p> <ul> <li>for Metal: <a href="https://github.com/floooh/sokol-samples/tree/master/metal">https://github.com/floooh/sokol-samples/tree/master/metal</a></li> <li>for D3D11: <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">https://github.com/floooh/sokol-samples/tree/master/d3d11</a></li> <li>for desktop GL: <a href="https://github.com/floooh/sokol-samples/tree/master/glfw">https://github.com/floooh/sokol-samples/tree/master/glfw</a></li> <li>for WebGL/GLES3: <a href="https://github.com/floooh/sokol-samples/tree/master/html5">https://github.com/floooh/sokol-samples/tree/master/html5</a></li> <li>for WebGPU: <a 
href="https://github.com/floooh/sokol-samples/tree/master/wgpu">https://github.com/floooh/sokol-samples/tree/master/wgpu</a></li> </ul> <p>Especially note the <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> struct interiors in the <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> calls.</p> <h2 id="change-recipes">Change Recipes</h2> <p>General rule of thumb: fix all places that throw compile errors and you should be good.</p> <h3 id="when-using-sokol-shdc-1">When using sokol-shdc:</h3> <p>First you’ll need to fix your shaders and add explicit binding annotations. When running sokol-shdc over your current shader code you’ll get errors looking like this:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error: 'binding' : uniform/buffer blocks require layout(binding=X) </code></pre></div></div> <p>…or this:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error: 'binding' : sampler/texture/image requires layout(binding=X) </code></pre></div></div> <p>To fix those errors for the different resource types add <code class="language-plaintext highlighter-rouge">layout(binding=N)</code> annotations:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">vs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">tex</span><span class="p">;</span> <span class="k">layout</span><span 
class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">smp</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">readonly</span> <span class="n">buffer</span> <span class="n">ssbo</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> </code></pre></div></div> <p>Note that each resource type (uniform blocks, textures, samplers and storage buffers) has its own bindslot space which is shared across shader stages. Trying to use bindslot indices outside those ranges, or using the same bindslot for a resource type in different shader stages will cause a compilation error.</p> <p>The binding ranges per resource type are:</p> <ul> <li>uniform blocks: 0..7</li> <li>textures: 0..15</li> <li>samplers: 0..15</li> <li>storage buffers: 0..7</li> </ul> <p>…these are also the maximum number of resources of that type that can be bound on a shader across all shader stages.</p> <p>Next fix the compile errors on the CPU side, you should see errors when initializing an <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct, when calling <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> and possibly when setting up vertex attributes in the <code class="language-plaintext highlighter-rouge">sg_pipeline_desc</code> struct:</p> <ul> <li>in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct, the nested structs for the vertex and fragment shader stage have been removed, and the former per-stage binding arrays have moved up into the root</li> <li>in the <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> call, the shader stage argument has been removed</li> <li>all 
code-generated slot constants have new naming schemes (also the vertex attribute slot constants)</li> </ul> <p>For instance if your shader resource interface looks like this:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">vs</span> <span class="c1">// a vertex shader uniform block</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">vs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> <span class="c1">// a vertex shader texture and sampler</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">vs_tex</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">vs_smp</span><span class="p">;</span> <span class="c1">// a vertex shader storage buffer</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">readonly</span> <span class="n">buffer</span> <span class="n">vs_ssbo</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> <span class="err">@</span><span class="n">end</span> <span class="err">@</span><span class="n">fs</span> <span class="c1">// a fragment shader uniform block</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span 
class="p">)</span> <span class="k">uniform</span> <span class="n">fs_params</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> <span class="c1">// diffuse, normal and specular textures</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">diffuse_tex</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">specular_tex</span><span class="p">;</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">texture2D</span> <span class="n">normal_tex</span><span class="p">;</span> <span class="c1">// a common sampler for the above textures</span> <span class="k">layout</span><span class="p">(</span><span class="n">binding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">uniform</span> <span class="n">sampler</span> <span class="n">smp</span><span class="p">;</span> <span class="err">@</span><span class="n">end</span> </code></pre></div></div> <p>…the matching <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct on the CPU side needs to look like this - note how the array indices match the shader <code class="language-plaintext highlighter-rouge">layout(binding=N)</code>:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_bindings</span> <span class="n">bnd</span> <span class="o">=</span> <span class="p">{</span> <span 
class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">.</span><span class="n">images</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_tex</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">diffuse_tex</span><span class="p">,</span> <span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">specular_tex</span><span class="p">,</span> <span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">normal_tex</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">samplers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_smp</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">smp</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">storage_buffers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_ssbo</span><span class="p">,</span> <span class="p">},</span> <span class="p">};</span> </code></pre></div></div> <p>…and the <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> calls to write the uniform data for the <code class="language-plaintext 
highlighter-rouge">vs_params</code> and <code class="language-plaintext highlighter-rouge">fs_params</code> uniform blocks now look like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vs_params</span><span class="p">));</span> <span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">fs_params</span><span class="p">));</span> </code></pre></div></div> <p>…instead of hardwired numeric indices you can also use code-generated constants (note that those have been renamed from a generic <code class="language-plaintext highlighter-rouge">SLOT_*</code> to a per-resource-type naming scheme):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_bindings</span> <span class="n">bnd</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">.</span><span class="n">images</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="n">IMG_vs_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_tex</span><span class="p">,</span> <span class="p">[</span><span class="n">IMG_diffuse_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">diffuse_tex</span><span class="p">,</span> <span 
class="p">[</span><span class="n">IMG_specular_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">specular_tex</span><span class="p">,</span> <span class="p">[</span><span class="n">IMG_normal_tex</span><span class="p">]</span> <span class="o">=</span> <span class="n">normal_tex</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">samplers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="n">SMP_vs_smp</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_smp</span><span class="p">,</span> <span class="p">[</span><span class="n">SMP_smp</span><span class="p">]</span> <span class="o">=</span> <span class="n">smp</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">storage_buffers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="n">SBUF_vs_ssbo</span><span class="p">]</span> <span class="o">=</span> <span class="n">vs_ssbo</span><span class="p">,</span> <span class="p">},</span> <span class="p">};</span> </code></pre></div></div> <p>…or for the uniform block updates:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="n">UB_vs_params</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vs_params</span><span class="p">));</span> <span class="n">sg_apply_uniforms</span><span class="p">(</span><span class="n">UB_fs_params</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">SG_RANGE</span><span class="p">(</span><span class="n">fs_params</span><span class="p">));</span> </code></pre></div></div> <p>…using the code-generated constants has the advantage that changing the bindslots in the shader code doesn’t require updating the 
CPU-side code, but other than that it’s totally fine to use numeric indices.</p> <p>The naming scheme for the code-generated vertex attribute slots has changed to use the shader program name for ‘namespacing’ instead of the vertex shader snippet name.</p> <p>For instance with the following shader fragment:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">vs</span> <span class="n">vs</span> <span class="k">in</span> <span class="kt">vec4</span> <span class="n">position</span><span class="p">;</span> <span class="k">in</span> <span class="kt">vec4</span> <span class="n">color0</span><span class="p">;</span> <span class="p">...</span> <span class="err">@</span><span class="n">end</span> <span class="err">@</span><span class="n">fs</span> <span class="n">fs</span> <span class="p">...</span> <span class="err">@</span><span class="n">end</span> <span class="err">@</span><span class="n">program</span> <span class="n">cube</span> <span class="n">vs</span> <span class="n">fs</span> </code></pre></div></div> <p>The generated vertex attribute slot constants <code class="language-plaintext highlighter-rouge">ATTR_*</code> previously looked like this (in the sg_pipeline_desc struct):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_pipeline_desc</span> <span class="n">desc</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="n">ATTR_vs_position</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">[</span><span class="n">ATTR_vs_color0</span><span class="p">].</span><span 
class="n">format</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">},</span> <span class="p">},</span> <span class="p">...</span> <span class="p">};</span> </code></pre></div></div> <p>…now the <code class="language-plaintext highlighter-rouge">ATTR_*</code> names look like this (e.g. <code class="language-plaintext highlighter-rouge">ATTR_vs_*</code> to <code class="language-plaintext highlighter-rouge">ATTR_cube_*</code>):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_pipeline_desc</span> <span class="n">desc</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="n">ATTR_cube_position</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">[</span><span class="n">ATTR_cube_color0</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">},</span> <span class="p">},</span> <span class="p">...</span> <span class="p">};</span> </code></pre></div></div> <p>…it’s also possible to use explicit attribute locations and ignore the code-generated constants, for instance:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">@</span><span class="n">vs</span> <span class="n">vs</span> <span class="k">layout</span><span class="p">(</span><span class="n">location</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="k">in</span> <span class="kt">vec4</span> <span class="n">position</span><span class="p">;</span> <span class="k">layout</span><span 
class="p">(</span><span class="n">location</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">in</span> <span class="kt">vec4</span> <span class="n">color0</span><span class="p">;</span> <span class="p">...</span> <span class="err">@</span><span class="n">end</span> </code></pre></div></div> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_pipeline_desc</span> <span class="n">desc</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="p">...,</span> <span class="p">},</span> <span class="p">},</span> <span class="p">...</span> <span class="p">};</span> </code></pre></div></div> <p>…note though that it’s still not allowed to have gaps in the vertex attribute slots (this may be supported at a later time).</p> <h3 id="when-not-using-sokol-shdc-1">When <em>not</em> using sokol-shdc:</h3> <p>The interior of <code class="language-plaintext highlighter-rouge">sg_shader_desc</code> has changed to match the new ‘shader-stage-agnostic’ sokol-gfx binding model. 
The toplevel-structure now looks like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">sg_shader_desc</span> <span class="n">desc</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">vertex_func</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// vertex shader source or bytecode</span> <span class="p">.</span><span class="n">fragment_func</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// fragment shader source or bytecode</span> <span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// vertex attribute reflection info</span> <span class="p">.</span><span class="n">uniform_blocks</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// reflection info for uniform block bindings</span> <span class="p">.</span><span class="n">storage_buffers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// reflection info for storage buffer bindings</span> <span class="p">.</span><span class="n">images</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// reflection info for texture bindings</span> <span class="p">.</span><span class="n">samplers</span> <span class="o">=</span> <span class="p">{</span> <span class="p">...</span> <span class="p">},</span> <span class="c1">// reflection info for sampler bindings</span> <span class="p">.</span><span class="n">image_sampler_pairs</span> <span class="o">=</span> <span class="p">{</span> <span 
class="p">...</span> <span class="p">},</span> <span class="c1">// how images and samplers are used together in the shader</span> <span class="p">};</span> </code></pre></div></div> <p>The array indices in the <code class="language-plaintext highlighter-rouge">uniform_blocks[]</code> array match the <code class="language-plaintext highlighter-rouge">ub_slot</code> parameter in the <code class="language-plaintext highlighter-rouge">sg_apply_uniforms()</code> call:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sg_shader_desc.uniform_blocks[N] =&gt; sg_apply_uniforms(N, ...) </code></pre></div></div> <p>The array indices in the <code class="language-plaintext highlighter-rouge">storage_buffers[]</code>, <code class="language-plaintext highlighter-rouge">images[]</code> and <code class="language-plaintext highlighter-rouge">samplers[]</code> arrays match the respective indices in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sg_shader_desc.images[N] =&gt; sg_bindings.images[N] sg_shader_desc.samplers[N] =&gt; sg_bindings.samplers[N] sg_shader_desc.storage_buffers[N] =&gt; sg_bindings.storage_buffers[N] </code></pre></div></div> <p>Fields that are only required for a specific 3D backend now have consistent prefixes:</p> <ul> <li>D3D11/HLSL: <code class="language-plaintext highlighter-rouge">hlsl_*</code></li> <li>GL/GLSL: <code class="language-plaintext highlighter-rouge">glsl_*</code></li> <li>Metal/MSL: <code class="language-plaintext highlighter-rouge">msl_*</code></li> <li>WebGPU/WGSL: <code class="language-plaintext highlighter-rouge">wgsl_*</code></li> </ul> <p>The resource binding slots now require two new types of information:</p> <ul> <li>the shader stage this resource binding appears on</li> <li>a 3D backend specific bindslot</li> </ul> <p>The backend specific 
bindslot struct members need to be filled with the shader language specific resource bindslot numbers which also need to lie within specific ranges:</p> <ul> <li>for uniform block items: <ul> <li><code class="language-plaintext highlighter-rouge">.hlsl_register_b_n = N;</code> &lt;= HLSL <code class="language-plaintext highlighter-rouge">register(bN)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 8)</code></li> <li><code class="language-plaintext highlighter-rouge">.msl_buffer_n = N;</code> &lt;= MSL <code class="language-plaintext highlighter-rouge">[[buffer(N)]]</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 8)</code></li> <li><code class="language-plaintext highlighter-rouge">.wgsl_group0_binding_n = N;</code> &lt;= WGSL <code class="language-plaintext highlighter-rouge">@group(0) @binding(N)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 8)</code></li> </ul> </li> <li>for images: <ul> <li><code class="language-plaintext highlighter-rouge">.hlsl_register_t_n = N;</code> &lt;= HLSL <code class="language-plaintext highlighter-rouge">register(tN)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 24)</code></li> <li><code class="language-plaintext highlighter-rouge">.msl_texture_n = N;</code> &lt;= MSL <code class="language-plaintext highlighter-rouge">[[texture(N)]]</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 16)</code></li> <li><code class="language-plaintext highlighter-rouge">.wgsl_group1_binding_n = N;</code> &lt;= WGSL <code class="language-plaintext highlighter-rouge">@group(1) @binding(N)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 128)</code></li> </ul> </li> <li>for samplers: <ul> <li><code class="language-plaintext highlighter-rouge">.hlsl_register_s_n = N;</code> 
&lt;= HLSL <code class="language-plaintext highlighter-rouge">register(sN)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 16)</code></li> <li><code class="language-plaintext highlighter-rouge">.msl_sampler_n = N;</code> &lt;= MSL <code class="language-plaintext highlighter-rouge">[[sampler(N)]]</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 16)</code></li> <li><code class="language-plaintext highlighter-rouge">.wgsl_group1_binding_n = N;</code> &lt;= WGSL <code class="language-plaintext highlighter-rouge">@group(1) @binding(N)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 128)</code></li> </ul> </li> <li>for storage buffers: <ul> <li><code class="language-plaintext highlighter-rouge">.hlsl_register_t_n = N;</code> &lt;= HLSL <code class="language-plaintext highlighter-rouge">register(tN)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 24)</code></li> <li><code class="language-plaintext highlighter-rouge">.msl_buffer_n = N;</code> &lt;= MSL <code class="language-plaintext highlighter-rouge">[[buffer(N)]]</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 8) &amp;&amp; (N &lt; 16)</code></li> <li><code class="language-plaintext highlighter-rouge">.wgsl_group1_binding_n = N;</code> &lt;= WGSL <code class="language-plaintext highlighter-rouge">@group(1) @binding(N)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 128)</code></li> <li><code class="language-plaintext highlighter-rouge">.glsl_binding_n = N;</code> &lt;= GLSL <code class="language-plaintext highlighter-rouge">layout(binding=N)</code> where <code class="language-plaintext highlighter-rouge">(N &gt;= 0) &amp;&amp; (N &lt; 16)</code></li> </ul> </li> </ul> <p>These backend-specific bindslots allow a more flexible mapping from the sokol-gfx resource 
binding model to the backend 3D-API binding models, but there are still some restrictions (which typically exist to allow a more efficient resource binding implementation in sokol_gfx.h):</p> <ul> <li>in WebGPU/WGSL, all uniform blocks must be in <code class="language-plaintext highlighter-rouge">@group(0)</code> and all other resource types in <code class="language-plaintext highlighter-rouge">@group(1)</code></li> <li>in Metal/MSL, the <code class="language-plaintext highlighter-rouge">[[buffer(N)]]</code> slots 0..7 are reserved for uniform blocks, and <code class="language-plaintext highlighter-rouge">[[buffer(N)]]</code> slots 8..15 are reserved for storage buffers</li> </ul> <p>For code examples, check out the backend-specific samples:</p> <ul> <li>for Metal: <a href="https://github.com/floooh/sokol-samples/tree/master/metal">https://github.com/floooh/sokol-samples/tree/master/metal</a></li> <li>for D3D11: <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">https://github.com/floooh/sokol-samples/tree/master/d3d11</a></li> <li>for desktop GL: <a href="https://github.com/floooh/sokol-samples/tree/master/glfw">https://github.com/floooh/sokol-samples/tree/master/glfw</a></li> <li>for WebGL/GLES3: <a href="https://github.com/floooh/sokol-samples/tree/master/html5">https://github.com/floooh/sokol-samples/tree/master/html5</a></li> <li>for WebGPU: <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">https://github.com/floooh/sokol-samples/tree/master/wgpu</a></li> </ul> <p>…and that should be it! 
Next big thing on the roadmap: compute shader support :)</p> Mon, 04 Nov 2024 00:00:00 +0000 https://floooh.github.io/2024/11/04/sokol-fall-2024-update.html https://floooh.github.io/2024/11/04/sokol-fall-2024-update.html Zig and Emulators <p>Some quick Zig feedback in the context of a new 8-bit emulator project I started a little while ago:</p> <p><a href="https://github.com/floooh/chipz">https://github.com/floooh/chipz</a></p> <p>Currently the project consists of:</p> <ul> <li>a cycle-stepped Z80 CPU emulator (similar to the emulator described here: <a href="https://floooh.github.io/2021/12/17/cycle-stepped-z80.html">https://floooh.github.io/2021/12/17/cycle-stepped-z80.html</a>)</li> <li>chip emulators for Z80 PIO, Z80 CTC and three variants of the AY-3-8910 sound chip</li> <li>system emulators for Bombjack, Pengo and Pacman arcade machines, and the East German KC85/2../4 home computer series</li> <li>a code generation tool to create the Z80 instruction decoder code block</li> <li>various tests to check Z80 emulation correctness</li> </ul> <p>With the exception of an external C dependency for ‘host system glue’ (the cross-platform <a href="https://github.com/floooh/sokol-zig">sokol headers</a> used for wrapping the platform-specific windowing, input, rendering and audio-output code), the project is around 16 kloc of pure Zig code.</p> <p>I’m not yet sure how this new project will evolve in relation to the <a href="https://github.com/floooh/chips">original C/C++ ‘chips’ emulator project</a>, but I expect that the Zig project will overtake the C/C++ project at some point in the future.</p> <h2 id="dev-environment">Dev Environment</h2> <p>I’m coding on an M1 Mac in VSCode with the <a href="https://marketplace.visualstudio.com/items?itemName=ziglang.vscode-zig">Zig Language Extension</a>, and <a href="https://marketplace.visualstudio.com/items?itemName=vadimcn.vscode-lldb">CodeLLDB</a> for step-debugging.</p> <p>The Zig and ZLS (Zig Language Server) installation is 
managed with <a href="https://github.com/tristanisham/zvm">ZVM</a>.</p> <p>For the most part this setup works pretty well, with a few tweaks:</p> <ul> <li>I’m doing ‘build-on-save’ to get more complete error information as described here: <a href="https://kristoff.it/blog/improving-your-zls-experience/">Improving Your Zig Language Server Experience</a> (I’m not bothering with creating separate non-install build targets though)</li> <li>With the default Zig VSCode extension settings I was seeing that in long coding sessions (5..6 hours or so) saving would take longer and longer until it would eventually get stuck. After asking around on the Zig Discord this could be solved by explicitly setting the Zig Language Server as ‘VSCode Formatting Provider’ in the Zig Extension settings.</li> <li>When debugging, there’s a somewhat annoying issue that the debug line information seems to be off in some places, the debugger appears to step into the last line of an inactive if-else block for instance. Again, Discord to the rescue, this seems to be a known issue.</li> </ul> <p>All in all, not yet perfect, but good enough to get shit done.</p> <h2 id="zig-comptime-and-generics">Zig Comptime and Generics</h2> <p>Before diving into language details, I’ll need to provide some minimal background information of how the chipz emulators work:</p> <p>Microchips of the 70s and 80s were very much like ‘software libraries, but implemented in hardware’, they followed a minimal standard for interoperability so that chips from different manufacturers could be combined into computer systems without requiring too much custom glue logic between them. 
I think it’s fair to say that this ‘competition through interoperability’ was the main driver for the Cambrian Explosion of cheap 8-bit computer systems in the 70s and 80s.</p> <p>Microchips communicate with the outside world via input/output pins, and a typical 8-bit home computer system is essentially just a handful of microchips talking to each other through their ‘pin API’.</p> <p>The chipz project follows that same idea: The basic building blocks are self-contained chip emulators which communicate with other chip emulators via virtual input/output pins which are mapped to bits in an integer.</p> <p>Chips of that era typically had up to 40 pins which makes them a good fit for 64-bit integers used in today’s CPUs.</p> <p>The API of such a chip emulator only has one important function:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">tick</span><span class="p">(</span><span class="n">pins</span><span class="p">:</span> <span class="kt">u64</span><span class="p">)</span> <span class="kt">u64</span> </code></pre></div></div> <p>This tick function executes exactly one clock cycle, it takes an integer as input where the bits represent input/output pins, and returns that same integer with modified bits.</p> <p>Fitting a CPU emulator into such a ‘cycle-stepped model’ can be a bit of a challenge and is described in these blog posts (for the 6502 and Z80):</p> <ul> <li> <p><a href="https://floooh.github.io/2019/12/13/cycle-stepped-6502.html">A new cycle-stepped 6502 CPU emulator</a></p> </li> <li> <p><a href="https://floooh.github.io/2021/12/17/cycle-stepped-z80.html">A new cycle-stepped Z80 emulator</a></p> </li> </ul> <p>A whole computer system is then emulated by writing a ‘system tick function’ which emulates a single clock cycle for the whole system by calling the tick functions of each chip emulator and passing pin-state integers from one chip 
emulator to the next.</p> <p>There’s two related problems to solve with the above approach:</p> <ul> <li>There’s not enough bits in a 64-bit integer to assign one bit for each inter-chip connection of a complete computer system. This means a system tick function will need to maintain one pin-state integer for each chip, and shuffle bits around before each chip’s tick function is called.</li> <li>For direct pin-to-pin connections it makes sense to assign the same bit position in different chip emulators to avoid ‘runtime bit shuffling’ from an output pin position of one chip to a different input pin position of another chip. Those direct pin-to-pin connections are different in each emulated computer system, so to make this idea work a specialized chip emulator needs to be ‘stamped out’ for each computer system.</li> </ul> <p>Both problems can be solved quite elegantly in Zig:</p> <ul> <li>Instead of 64-bit integers for the pin-state we can switch to wide integers (u128, u192, u256, …) with enough bits to assign each chip in a system its own reserved bit range instead of juggling with multiple 64-bit integers.</li> <li>With Zig’s comptime generics it’s possible to stamp out chip emulators which are specialized by a specific mapping of pins to bit positions in the shared wide integer.</li> </ul> <p>This means a chip emulator is specialized by two comptime configuration values:</p> <ul> <li>a <code class="language-plaintext highlighter-rouge">Bus</code> type which is an unsigned integer type with enough bits for all pin-to-pin connections in a system</li> <li>a <code class="language-plaintext highlighter-rouge">Pins</code> structure which defines a bit position for each input/output pin of a chip emulator</li> </ul> <p>For the Z80 CPU emulator this pin definition struct looks like this:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">Pins</span> 
<span class="o">=</span> <span class="k">struct</span> <span class="p">{</span> <span class="n">DBUS</span><span class="p">:</span> <span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="nb">comptime_int</span><span class="p">,</span> <span class="n">ABUS</span><span class="p">:</span> <span class="p">[</span><span class="mi">16</span><span class="p">]</span><span class="nb">comptime_int</span><span class="p">,</span> <span class="n">M1</span><span class="p">:</span> <span class="nb">comptime_int</span><span class="p">,</span> <span class="n">MREQ</span><span class="p">:</span> <span class="nb">comptime_int</span><span class="p">,</span> <span class="n">IORQ</span><span class="p">:</span> <span class="nb">comptime_int</span><span class="p">,</span> <span class="c">// ...more pins...</span> <span class="p">};</span> </code></pre></div></div> <p>…which is used as nested struct in a <code class="language-plaintext highlighter-rouge">TypeConfig</code> struct which holds all generic parameters to stamp out a specialized Z80 emulator:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">const</span> <span class="n">TypeConfig</span> <span class="o">=</span> <span class="k">struct</span> <span class="p">{</span> <span class="n">pins</span><span class="p">:</span> <span class="n">Pins</span><span class="p">,</span> <span class="n">bus</span><span class="p">:</span> <span class="k">type</span><span class="p">,</span> <span class="p">};</span> </code></pre></div></div> <p>This <code class="language-plaintext highlighter-rouge">TypeConfig</code> struct is used as parameter for a comptime Zig function which returns a specialized type (this is how Zig does generics):</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="n">Type</span><span 
class="p">(</span><span class="k">comptime</span> <span class="n">cfg</span><span class="p">:</span> <span class="n">TypeConfig</span><span class="p">)</span> <span class="k">type</span> <span class="p">{</span> <span class="k">return</span> <span class="k">struct</span> <span class="p">{</span> <span class="c">// the returned struct is a new type which is comptime-configured</span> <span class="c">// by the 'cfg' type configuration parameter</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>…now we can stamp out a Z80 CPU emulator that’s specialized for a specific computer system by the system bus integer type and the Z80 pins mapped to specific bit positions of this integer type:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">z80</span> <span class="o">=</span> <span class="nb">@import</span><span class="p">(</span><span class="s">"z80"</span><span class="p">);</span> <span class="k">const</span> <span class="n">Z80</span> <span class="o">=</span> <span class="n">z80</span><span class="p">.</span><span class="nf">Type</span><span class="p">(</span><span class="o">.</span><span class="p">{</span> <span class="p">.</span><span class="py">bus</span> <span class="o">=</span> <span class="kt">u128</span><span class="p">,</span> <span class="p">.</span><span class="py">pins</span> <span class="o">=</span> <span class="o">.</span><span class="p">{</span> <span class="p">.</span><span class="py">DBUS</span> <span class="o">=</span> <span class="o">.</span><span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span 
class="mi">7</span> <span class="p">},</span> <span class="p">.</span><span class="py">ABUS</span> <span class="o">=</span> <span class="o">.</span><span class="p">{</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="c">// ... },</span> <span class="c">// ...</span> <span class="p">}</span> <span class="p">});</span> </code></pre></div></div> <p>This specific <code class="language-plaintext highlighter-rouge">Z80</code> type uses a 128-bit pin-state integer and maps its own pins to bit positions starting at bit 0, with the first 8 bits being the data bus (most other chips in any computer system will also map their data bus pins to the same bit range, since the data bus is usually shared between all chips in a system).</p> <p>Note that <code class="language-plaintext highlighter-rouge">Z80</code> is just a type, not a runtime object. To get a default-initialized Z80 CPU object:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">cpu</span> <span class="o">=</span> <span class="n">Z80</span><span class="p">{};</span> </code></pre></div></div> <p>This example doesn’t look like much, it’s “just Zig code” after all, but this is exactly what makes generic programming in Zig so elegant and powerful.</p> <p>Arbitrarily complex comptime config options can be ‘baked’ into types, and dynamic runtime configuration options can be passed in a ‘construction’ function on that type, and all is just regular Zig code from top to bottom:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">obj</span> <span class="o">=</span> <span class="n">Type</span><span class="p">(</span><span class="o">.</span><span class="p">{</span> <span class="c">// comptime options...</span> <span class="p">.</span><span class="py">bus</span> <span 
class="o">=</span> <span class="kt">u128</span><span class="p">,</span> <span class="p">.</span><span class="py">pins</span> <span class="o">=</span> <span class="o">.</span><span class="p">{</span> <span class="o">...</span> <span class="p">},</span> <span class="p">}).</span><span class="nf">init</span><span class="p">(</span><span class="o">.</span><span class="p">{</span> <span class="c">// additional runtime options...</span> <span class="p">});</span> </code></pre></div></div> <p>…and this is just scratching the surface. There’s a couple of really interesting side effects of this 2-step approach (first build the type, then build an object from that type):</p> <ul> <li>Can use designated-init-syntax for configuring the type which is just *chef’s kiss* because it makes the code very readable (no guessing what a generic parameter actually does because the name is right there in the code).</li> <li>TypeConfig structs can be composed by nesting other TypeConfig structs, or generic parameters in general, which then can be used to build types inside types (Yo Dawg…).</li> <li>It’s possible to build different struct interiors based on comptime parameters (for instance the different KC85 models have different runtime-config struct interiors for configuring model-specific features, which makes ‘accidental misconfiguration’ an immediate compile error).</li> </ul> <p>In conclusion, the idea to use Zig’s comptime features to stamp out specialized per-system chip and system emulators works exceptionally well and is (IMHO) <em>much</em> more enjoyable than C++ or Rust generic programming (I’m sure C++ and Rust can do the same things with sufficient template magic, but this code definitely won’t look as straightforward as the Zig version).</p> <h2 id="bit-twiddling-and-integer-math-can-be-awkward">Bit Twiddling and Integer Math can be awkward</h2> <p>This section is hard to write because it’s criticizing without offering an obviously better solution, please read it as 
‘constructive criticism’. Hopefully Zig will be able to fix some of those things on the road towards 1.0.</p> <p>Zig’s integer handling is quite different from C:</p> <ul> <li>arbitrary bit-width integers are the norm, not the exception</li> <li>there is no concept of integer promotion in math expressions (not that I noticed at least)</li> <li>implicit conversion between different integer types is only allowed when no data loss can happen (e.g. an u8 can be assigned to an u16, but assigning an u16 to an u8 requires an explicit cast)</li> <li>mixing signed and unsigned values in expressions isn’t allowed</li> <li>overflow is checked in Debug and ReleaseSafe mode, and there are separate operators for ‘intended wraparound’</li> </ul> <p>At first glance these features look pretty nice because they fix some obvious footguns in C and C++. Arbitrary width integer types are especially useful for emulator code, because hardware chips are full of ‘odd-width’ counters and registers (3, 5, 20 bits etc…). Directly mapping such registers to types like u3, u5 or u20 should potentially allow for more readable and ‘expressive’ code.</p> <p>Unfortunately, in reality it’s not so clear cut. While C is definitely too sloppy when it comes to integer math, Zig might swing the pendulum a bit too far into the other direction by requiring too much explicit casting.</p> <p>The most extreme example I stumbled over was implementing the Z80’s indexed addressing mode (e.g. those instructions involving <code class="language-plaintext highlighter-rouge">(IX+d)</code> or <code class="language-plaintext highlighter-rouge">(IY+d)</code>). This takes the byte <code class="language-plaintext highlighter-rouge">d</code> and adds it as a signed quantity and with wraparound to a 16 bit address (e.g. 
the byte is sign-extended to a 16-bit value before the addition).</p> <p>In C this is quite straightforward:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint16_t</span> <span class="nf">addi8</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">,</span> <span class="kt">uint8_t</span> <span class="n">offset</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">addr</span> <span class="o">+</span> <span class="p">(</span><span class="kt">int8_t</span><span class="p">)</span><span class="n">offset</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>The simplest way I could come up with to do the same in Zig is:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">addi8</span><span class="p">(</span><span class="n">addr</span><span class="p">:</span> <span class="kt">u16</span><span class="p">,</span> <span class="n">offset</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u16</span> <span class="p">{</span> <span class="k">return</span> <span class="n">addr</span> <span class="o">+%</span> <span class="nb">@as</span><span class="p">(</span><span class="kt">u16</span><span class="p">,</span> <span class="nb">@bitCast</span><span class="p">(</span><span class="nb">@as</span><span class="p">(</span><span class="kt">i16</span><span class="p">,</span> <span class="nb">@as</span><span class="p">(</span><span class="kt">i8</span><span class="p">,</span> <span class="nb">@bitCast</span><span class="p">(</span><span class="n">offset</span><span class="p">)))));</span> <span class="p">}</span> </code></pre></div></div> <p>Note how the integer conversion gets totally drowned in ‘@-litter’.</p> <p>Both functions result in the same x86 and ARM 
assembly output (with -O3 for C and any of the Release modes in Zig):</p> <pre><code class="language-assembly">addi8: movsx eax, sil ; move low byte of esi into eax with sign-extension add eax, edi ; eax += edi ret </code></pre> <p>For ARM (looks like ARM handles the sign-extension right in the add instruction, not very RISC-y but neat!):</p> <pre><code class="language-assembly">addi8: add w0, w0, w1, sxtb ret </code></pre> <p>IMHO when the assembly output of a compiler looks so much more straightforward than the high level compiler input, it becomes a bit hard to justify why high level programming languages had been invented in the first place ;)</p> <p>Apart from that extreme case (which only exists once in the whole code base), narrowing conversions are much more common when writing code that mixes different integer widths, and those narrowing conversions require explicit casts, and those explicit casts may reduce readability quite a bit.</p> <p>The basic idea to only allow implicit conversions that can’t lose data is definitely a good one, but very often a cast is required even though the compiler has all the information it needs at compile time to prove that no information is lost.</p> <p>For instance this Zig code currently is an error:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">trunc4</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u4</span> <span class="p">{</span> <span class="k">return</span> <span class="n">val</span> <span class="o">&amp;</span> <span class="mi">0xF</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>The expression result would fit into an u4, yet an <code class="language-plaintext highlighter-rouge">@intCast</code> or <code class="language-plaintext highlighter-rouge">@truncate</code> is required to make it 
work:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">trunc4</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u4</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">@intCast</span><span class="p">(</span><span class="n">val</span> <span class="o">&amp;</span> <span class="mi">0xF</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <p>Similar situation with a right-shift:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">broken</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u4</span> <span class="p">{</span> <span class="k">return</span> <span class="n">val</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">;</span> <span class="p">}</span> <span class="k">fn</span> <span class="n">works</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u4</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">@truncate</span><span class="p">(</span><span class="n">val</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <p>Somewhat surprisingly, this works fine though:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">const</span> <span class="n">a</span><span class="p">:</span> <span class="kt">u8</span> <span class="o">=</span> <span class="mi">0xFF</span><span class="p">;</span> <span class="k">const</span> <span class="n">b</span><span 
class="p">:</span> <span class="kt">u4</span> <span class="o">=</span> <span class="n">a</span> <span class="o">&amp;</span> <span class="mi">0xF</span><span class="p">;</span> <span class="k">const</span> <span class="n">c</span><span class="p">:</span> <span class="kt">u4</span> <span class="o">=</span> <span class="n">a</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">;</span> </code></pre></div></div> <p>A similar problem exists with loop variables, which are always of type usize and which need to be explicitly narrowed even if the loop count is guaranteed to fit into a smaller type:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="mi">0</span><span class="o">..</span><span class="mi">16</span><span class="p">)</span> <span class="p">|</span><span class="mi">_</span><span class="n">i</span><span class="p">|</span> <span class="p">{</span> <span class="k">const</span> <span class="n">i</span><span class="p">:</span> <span class="kt">u4</span> <span class="o">=</span> <span class="nb">@intCast</span><span class="p">(</span><span class="mi">_</span><span class="n">i</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <p>There’s also surprising cases like this:</p> <p>Assuming that:</p> <ul> <li>a: u16 = 0xF000</li> <li>b: u16 = 0x1000</li> <li>c: u32 = 0x10000</li> </ul> <p>This expression creates an overflow error:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">const</span> <span class="n">d</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span><span class="p">;</span> </code></pre></div></div> <p>…but this doesn’t:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span 
class="k">const</span> <span class="n">e</span> <span class="o">=</span> <span class="n">c</span> <span class="o">+</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">;</span> </code></pre></div></div> <p>Both <code class="language-plaintext highlighter-rouge">d</code> and <code class="language-plaintext highlighter-rouge">e</code> have the type <code class="language-plaintext highlighter-rouge">u32</code> btw (which I also find a bit surprising: it means that Zig already picks the widest input type as the result type, but it doesn’t promote the other inputs to this widest type).</p> <p>And here’s another surprising behaviour I stumbled over:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// self.sprite_coords[] is an array of bytes</span> <span class="k">const</span> <span class="n">px</span><span class="p">:</span> <span class="kt">usize</span> <span class="o">=</span> <span class="mi">272</span> <span class="o">-</span> <span class="n">self</span><span class="p">.</span><span class="py">sprite_coords</span><span class="p">[</span><span class="n">sprite_index</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span> </code></pre></div></div> <p>This produces the error <code class="language-plaintext highlighter-rouge">error: type 'u8' cannot represent integer value '272'</code>. 
Why Zig tries to fit the constant 272 into a u8 instead of picking a wider type is a bit of a mystery tbh.</p> <p>One solution is to widen the value read from the array:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">px</span><span class="p">:</span> <span class="kt">usize</span> <span class="o">=</span> <span class="mi">272</span> <span class="o">-</span> <span class="nb">@as</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span> <span class="n">self</span><span class="p">.</span><span class="py">sprite_coords</span><span class="p">[</span><span class="n">sprite_index</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]);</span> </code></pre></div></div> <p>But this works too:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">px</span><span class="p">:</span> <span class="kt">usize</span> <span class="o">=</span> <span class="nb">@as</span><span class="p">(</span><span class="kt">u9</span><span class="p">,</span> <span class="mi">272</span><span class="p">)</span> <span class="o">-</span> <span class="n">self</span><span class="p">.</span><span class="py">sprite_coords</span><span class="p">[</span><span class="n">sprite_index</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span> </code></pre></div></div> <p>In conclusion, I only understood that C’s integer promotion actually has an important purpose after missing it so badly in Zig :D</p> <p>I think C’s main problem with integer promotion is that it promotes to <code class="language-plaintext highlighter-rouge">int</code>, and that int is stuck at 32 bits even on 64-bit CPUs (not moving the <code class="language-plaintext 
highlighter-rouge">int</code> type to 64 bits during the transition from 32- to 64-bit CPUs was a pretty stupid decision in hindsight).</p> <p>TBF though, just extending to the natural word size (e.g. 64 bits) wouldn’t help much in Zig when using wide integers like u128.</p> <p>In any case, I hope that the current status quo isn’t what ends up in Zig 1.0 and that a way can be found to reduce ‘@-litter’ in mixed-width integer expressions without going back entirely to C’s admittedly too sloppy integer promotion and implicit conversion rules.</p> <p>Asking around on the Zig Discord there seems to be a proposal which lets operators narrow the result type for comptime known values (which if I understand it right would make the result type of the expression <code class="language-plaintext highlighter-rouge">a &amp; 0xF</code> a <code class="language-plaintext highlighter-rouge">u4</code> instead of whatever wider type <code class="language-plaintext highlighter-rouge">a</code> is).</p> <p>Another idea that might make sense is to promote integers to the widest input type. Currently the compiler already seems to use the widest input type in an expression as result type, promoting the other inputs to this widest type looks like a logical step to me.</p> <p>I would keep the strict separation of signed and unsigned integer types though, e.g. mixed-sign expressions are not allowed, and any theoretical integer promotion should never happen ‘across signedness’.</p> <p>From my own experience in C (where I don’t allow implicit sign-conversion via -Wsign-conversion warnings) I can tell that this will feel painful in the beginning for C and C++ coders, but it makes for better code and API design in the long run.</p> <p>This experience (of transitioning to more restrictive but also more correct C code by enabling certain warnings) is also why I’m giving Zig some slack about its integer conversion strictness. After all, maybe I’m just not used to it yet. 
But OTOH, I have by now written enough Zig code that I should slowly get used to it, but it <em>still</em> feels bumpy. All in all I think this is an area where ‘strict design purity’ can harm the language in the long run though, and a better balance should be found between strictness, coding convenience and readability.</p> <h2 id="using-wide-integers-with-bit-twiddling-code-is-fast">Using wide integers with bit twiddling code is fast</h2> <p>Using a 128-bit integer variable for the emulator system bus works nicely and doesn’t have a relevant performance impact. In fact, with a bit of care (by not using bit twiddling operations that cross a 64-bit boundary) the produced assembly code is essentially identical to doing the same operation on a simple 64-bit variable.</p> <p>For instance extracting an 8-bit value from the upper half of a 128-bit integer:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">getu8</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u128</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">@truncate</span><span class="p">(</span><span class="n">val</span> <span class="o">&gt;&gt;</span> <span class="mi">64</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <p>…is just moving the register which holds the upper 64 bits into the return value register:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>getu8: mov rax, rsi ret </code></pre></div></div> <p>…which is the same cost as extracting an 8-bit value from a 64-bit variable:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">getu8</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span 
class="kt">u64</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">@truncate</span><span class="p">(</span><span class="n">val</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>getu8: mov rax, rdi ret </code></pre></div></div> <p>…just make sure that the operation doesn’t cross 64-bit boundaries:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">getu8</span><span class="p">(</span><span class="n">val</span><span class="p">:</span> <span class="kt">u128</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">@truncate</span><span class="p">(</span><span class="n">val</span> <span class="o">&gt;&gt;</span> <span class="mi">60</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <p>…because this now involves actual bit twiddling:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>getu8: shl esi, 4 shr rdi, 60 lea eax, [rdi + rsi] ret </code></pre></div></div> <h2 id="debug-performance">Debug Performance</h2> <p>Release performance of my C emulator code (with -O3) and my Zig code (with -ReleaseFast) is roughly in the same ballpark, but I’m seeing a pretty big difference in Debug performance:</p> <ul> <li>in C, debug performance is roughly 2x slower than -O3</li> <li>in Zig, debug performance is roughly 3..4x slower than ReleaseFast</li> </ul> <p>I haven’t figured out why yet, but it’s not the most obvious candidate (range and overflow checks) since ReleaseSafe performance is nearly identical with ReleaseFast (interestingly ReleaseSmall is the slowest Release build config, it’s about 40% slower than both 
ReleaseFast and ReleaseSafe).</p> <p>One important difference between my C and Zig code is that in C I’m using tons of small preprocessor macros to make bit twiddling expressions more readable. In Zig these are replaced with inline functions (<code class="language-plaintext highlighter-rouge">inline</code> in Zig isn’t just an optimization hint, it causes the function body to be inlined also in debug mode).</p> <p>At first glance Zig’s inline functions seem to be a good replacement for C preprocessor macros, but when looking at the generated code in debug mode, the compiler still pushes and pops function arguments through the stack even though the function body is inlined.</p> <p>Consider this Zig code:</p> <div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">inline</span> <span class="k">fn</span> <span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="kt">u8</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span> <span class="k">return</span> <span class="n">a</span> <span class="o">+%</span> <span class="n">b</span><span class="p">;</span> <span class="p">}</span> <span class="k">fn</span> <span class="n">add_1</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="kt">u8</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span> <span class="k">return</span> <span class="n">add</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span> <span class="p">}</span> <span class="k">fn</span> <span class="n">add_2</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span 
class="kt">u8</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="kt">u8</span> <span class="p">{</span> <span class="k">return</span> <span class="n">a</span> <span class="o">+%</span> <span class="n">b</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>…in release mode, both functions produce the same code as expected:</p> <pre><code class="language-assembly">add_1: lea eax, [rsi + rdi] ret add_2: lea eax, [rsi + rdi] ret </code></pre> <p>But in debug mode, the function which calls the inline function has a slightly higher overhead because of additional stack traffic:</p> <pre><code class="language-assembly">add_1: push rbp mov rbp, rsp sub rsp, 5 mov cl, sil mov al, dil mov byte ptr [rbp - 4], al mov byte ptr [rbp - 3], cl mov byte ptr [rbp - 2], al mov byte ptr [rbp - 1], cl add al, cl mov byte ptr [rbp - 5], al mov al, byte ptr [rbp - 5] movzx eax, al add rsp, 5 pop rbp ret add_2: push rbp mov rbp, rsp sub rsp, 2 mov cl, sil mov al, dil mov byte ptr [rbp - 2], al mov byte ptr [rbp - 1], cl add al, cl movzx eax, al add rsp, 2 pop rbp ret </code></pre> <p>TBH though it’s unlikely that inline function overhead is the only contributor to the slower debug performance, but it could be many such small papercuts combined.</p> <h2 id="conclusion">Conclusion</h2> <p>I enjoy working with Zig immensely despite the few warts I encountered, for the most part the code just ‘flows out of the hand’ which IMHO is an important property of a programming language. 
It’s encouraging to see how areas which were a bumpy ride during the 0.10 to 0.11 versions have improved and stabilized (most importantly the build and package management system).</p> <p>It’s also interesting how the ‘most popular design fault’ that comes up in every single Zig discussion (currently that’s ‘unused variables are errors’) is a complete non-issue (for me at least, not once in that 16-kloc project was that an annoyance), while the issue that actually mildly annoyed me in real world code (the <code class="language-plaintext highlighter-rouge">@-litter</code> in mixed-width integer expressions) is still very much under the radar. Maybe that’s also because mixed-width and bit twiddling code might not be all that common in typical Zig projects; most integer code is probably about computing array indices or data offsets and happens in usize.</p> <p>I also completely left out a whole chapter about code generation with Zig (which would have been mostly about string processing and memory management), simply because the blog post would have become too big, and it is probably an interesting enough topic for its own blog post. This is also an area where Zig is different enough from C, mid-level languages like C++ or Rust, and high level memory-managed languages that I don’t feel quite confident enough yet to have found the right solution to questions like ‘who owns the underlying memory of a slice returned from a function’ - I have solutions of course, but I’m not entirely happy with them because it feels like a throwback to my first forays into C and C++.</p> <p>In short, I don’t want to burden myself (too much) with memory ownership questions, even in low level systems programming languages. Typically in C I avoid such problems with a ‘mostly value-driven approach’: instead of returning references to data, I return a copy of the data (unless of course it’s about bulk data like images, 3d meshes, file content etc., 
but those are special cases which are easy to deal with using manual memory management).</p> <p>Zig is leaning in heavily on slices though, which are just pointer/size pairs without any concept of ownership. It would be nice if Zig had some syntax sugar to make working with arrays just as flexible as with slices, because arrays are value types and avoid all the ownership footguns of slices. I think mostly this comes down to implementing a handful of ‘missing features’ from C99 designated initialization (like <a href="https://github.com/ziglang/zig/issues/6068">#6068</a>) or maybe even looking at languages like JS and TS (…shock and gasps from the audience!!! I know but bear with me) for a couple of features which make working with struct and array values more convenient (like destructuring and spreading).</p> <p>…but I’m already halfway into that other blog post which I wanted to avoid, so let’s end it here lol.</p> Sat, 24 Aug 2024 00:00:00 +0000 https://floooh.github.io/2024/08/24/zig-and-emulators.html Upcoming Sokol header API changes (May 2024) <p>Aka: “the storage buffer update”</p> <p>In a couple of days I will merge the next sokol-gfx feature update which adds initial storage buffer support. The update also affects other headers and tools (most notably sokol_app.h, all headers with embedded shaders, and sokol-shdc - the cross-backend shader compiler).</p> <p>The bad news first:</p> <ul> <li>This is ‘gpu-readonly’ support, i.e. 
it’s not possible (yet) to write to storage buffers from shader code; gpu-write support will come in a future ‘compute shaders’ update.</li> <li>The following platform/backend combos don’t get storage buffer support: <ul> <li>all GLES3 backends (WebGL2, iOS+GLES3, Android): for WebGL2 and iOS there is no other choice since they are stuck with GLES 3.0; for Android, storage buffer support may be added later</li> <li>macOS+GL: macOS is stuck at GL 4.1, while storage buffers require at least GL 4.3</li> </ul> </li> <li>This leaves the following platform/backend combos which support storage buffers: <ul> <li>macOS + Metal</li> <li>iOS + Metal</li> <li>Windows + D3D11</li> <li>Windows + GL</li> <li>Linux + GL</li> <li>Web + WebGPU</li> </ul> </li> </ul> <p>Storage buffers provide a convenient way to communicate large array-like data to shaders (the minimum guaranteed size for storage buffers is 128 MBytes), for instance:</p> <ul> <li>for ‘vertex pulling’ to load per-vertex and/or per-instance data from storage buffers instead of relying on the fixed function vertex input stage</li> <li>as a more convenient and flexible way to load random access data in shaders compared to the old-school way of using ‘data textures’.</li> </ul> <p>…and as a ‘drive-by’ feature: sokol-gfx now finally allows kicking off a draw call without any resource bindings and instead synthesizing vertices ‘out of thin air’ in the vertex shader.</p> <p>The root PR for the update is here: <a href="https://github.com/floooh/sokol/pull/1007">#1007</a>.</p> <h2 id="new-sample-code">New sample code</h2> <p>The following backend-agnostic samples have been added (those use sokol_app.h and sokol-shdc).</p> <blockquote> <p>NOTE: You’ll need a recent Chrome for the WebGPU sample links to work, also expect some general breakage and rendering artifacts depending on the platform (for instance Chrome on Android straight up crashes the tab on most samples). 
Also please note that the source code links in those samples will not be valid until all the update PRs have been merged.</p> </blockquote> <ul> <li><strong>triangle-bufferless-sapp</strong>: this demonstrates rendering without buffers (and is the only new sample that also works on backends without storage buffer support): <ul> <li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/triangle-bufferless-sapp.html">triangle-bufferless-sapp.html</a></li> <li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/triangle-bufferless-sapp.c">sapp/triangle-bufferless-sapp.c</a></li> <li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/triangle-bufferless-sapp.glsl">sapp/triangle-bufferless-sapp.glsl</a></li> </ul> </li> <li><strong>vertexpull-sapp</strong>: the cube-sapp sample ported to vertex pulling: <ul> <li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/vertexpull-sapp.html">vertexpull-sapp.html</a></li> <li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexpull-sapp.c">sapp/vertexpull-sapp.c</a></li> <li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/vertexpull-sapp.glsl">sapp/vertexpull-sapp.glsl</a></li> </ul> </li> <li><strong>sbuftex-sapp</strong>: a sample which uses a storage buffer in the fragment shader stage: <ul> <li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/sbuftex-sapp.html">sbuftex-sapp.html</a></li> <li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbuftex-sapp.c">sapp/sbuftex-sapp.c</a></li> <li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/sbuftex-sapp.glsl">sapp/sbuftex-sapp.glsl</a></li> </ul> </li> <li><strong>instancing-pull-sapp</strong>: vertex pulling and instancing via storage buffers: <ul> <li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/instancing-pull-sapp.html">instancing-pull-sapp.html</a></li> <li>C code: <a 
href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-pull-sapp.c">sapp/instancing-pull-sapp.c</a></li> <li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/instancing-pull-sapp.glsl">sapp/instancing-pull-sapp.glsl</a></li> </ul> </li> <li><strong>ozz-storagebuffer-sapp</strong>: the ozz-skin sample rewritten to pull vertices, instance- and skinning-matrices from storage buffers: <ul> <li>WebGPU: <a href="https://floooh.github.io/sokol-webgpu/ozz-storagebuffer-sapp.html">ozz-storagebuffer-sapp.html</a></li> <li>C code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/ozz-storagebuffer-sapp.cc">sapp/ozz-storagebuffer-sapp.cc</a></li> <li>GLSL code: <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/ozz-storagebuffer-sapp.glsl">sapp/ozz-storagebuffer-sapp.glsl</a></li> </ul> </li> </ul> <p>The following backend-specific samples demonstrate how to use storage buffers without the sokol-shdc shader compiler:</p> <ul> <li><strong>D3D11</strong>: <a href="https://github.com/floooh/sokol-samples/blob/master/d3d11/vertexpulling-d3d11.c">d3d11/vertexpulling-d3d11.c</a></li> <li><strong>Metal</strong>: <a href="https://github.com/floooh/sokol-samples/blob/master/metal/vertexpulling-metal.c">metal/vertexpulling-metal.c</a></li> <li><strong>WebGPU</strong>: <a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/vertexpulling-wgpu.c">wgpu/vertexpulling-wgpu.c</a></li> <li><strong>desktop GL</strong>: <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/vertexpulling-glfw.c">glfw/vertexpulling-glfw.c</a></li> </ul> <h2 id="how-to-check-for-storage-buffer-support">How to check for storage buffer support</h2> <p>To check for storage buffer support at runtime, call <code class="language-plaintext highlighter-rouge">sg_query_features()</code> and check the <code class="language-plaintext highlighter-rouge">storage_buffer</code> boolean in the result:</p> <div class="language-c 
highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">sg_query_features</span><span class="p">().</span><span class="n">storage_buffer</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// storage buffers are supported...</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c1">// storage buffers are *NOT* supported...</span> <span class="p">}</span> </code></pre></div></div> <h2 id="desktop-gl-version-caveats-and-a-minor-breaking-change">Desktop GL version caveats (and a minor breaking change)</h2> <p>The sokol_gfx.h desktop-GL backend will now query what GL version it runs on to decide whether storage buffers are supported (storage buffers were added in GL 4.3).</p> <p>The expected minimal version has been bumped to 4.1 on macOS and 4.3 on other platforms, this also means that sokol_app.h will now by default create a 4.1 context on macOS, and 4.3 context on other platforms.</p> <p>Since the GL version is now flexible, the configuration define <code class="language-plaintext highlighter-rouge">SOKOL_GLCORE33</code> doesn’t make much sense anymore and has been renamed to <code class="language-plaintext highlighter-rouge">SOKOL_GLCORE</code>. You’ll get a proper compile error when trying to build with the old <code class="language-plaintext highlighter-rouge">SOKOL_GLCORE33</code> define.</p> <p>Apart from rebuilding your shaders via an updated sokol-shdc, this is the only required change for existing code.</p> <p>In sokol-shdc, the target language <code class="language-plaintext highlighter-rouge">glsl330</code> has been removed and replaced with <code class="language-plaintext highlighter-rouge">glsl410</code> and <code class="language-plaintext highlighter-rouge">glsl430</code>. 
When targeting the macOS GL backend, use <code class="language-plaintext highlighter-rouge">glsl410</code>, otherwise <code class="language-plaintext highlighter-rouge">glsl430</code>.</p> <h2 id="a-simple-vertex-pulling-example">A simple vertex pulling example</h2> <p>First let’s rewrite the <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/cube-sapp.glsl">cube-sapp.glsl</a> shader to pull vertices from a storage buffer instead of the fixed function vertex input.</p> <p>The original shader declares the vertex input with vertex attributes:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">in</span> <span class="kt">vec4</span> <span class="n">position</span><span class="p">;</span> <span class="k">in</span> <span class="kt">vec4</span> <span class="n">color0</span><span class="p">;</span> </code></pre></div></div> <blockquote> <p>NOTE: the cube-sapp.glsl shader makes use of a fixed function vertex input feature which extends float[3] vertex data on the CPU side to vec4 with a w-component 1.0 on the GPU side. 
Magic like this isn’t supported when reading from storage buffers (as far as I’m aware at least).</p> </blockquote> <p>For vertex pulling the input vertex attributes are replaced with a flexible-array struct inside a buffer interface block.</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">sb_vertex</span> <span class="p">{</span> <span class="kt">vec3</span> <span class="n">pos</span><span class="p">;</span> <span class="kt">vec4</span> <span class="n">color</span><span class="p">;</span> <span class="p">};</span> <span class="n">readonly</span> <span class="n">buffer</span> <span class="n">ssbo</span> <span class="p">{</span> <span class="n">sb_vertex</span> <span class="n">vtx</span><span class="p">[];</span> <span class="p">};</span> </code></pre></div></div> <blockquote> <p>NOTE: I’m using <code class="language-plaintext highlighter-rouge">sb_vertex</code> for the struct name here because <code class="language-plaintext highlighter-rouge">vertex</code> is a reserved keyword in the Metal Shading Language and would cause a compile error when outputting MSL.</p> </blockquote> <p>Do not use an attribute like <code class="language-plaintext highlighter-rouge">layout(std430, binding=0)</code> for the buffer interface block, sokol-shdc will take care of those details.</p> <p>The original vertex shader looks like this:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="nb">gl_Position</span> <span class="o">=</span> <span class="n">mvp</span> <span class="o">*</span> <span class="n">position</span><span class="p">;</span> <span class="n">color</span> <span class="o">=</span> <span class="n">color0</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>Converted to vertex 
pulling it looks like this:</p> <div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="kt">vec4</span> <span class="n">position</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">vtx</span><span class="p">[</span><span class="n">gl_VertexIndex</span><span class="p">].</span><span class="n">pos</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span> <span class="nb">gl_Position</span> <span class="o">=</span> <span class="n">mvp</span> <span class="o">*</span> <span class="n">position</span><span class="p">;</span> <span class="n">color</span> <span class="o">=</span> <span class="n">vtx</span><span class="p">[</span><span class="n">gl_VertexIndex</span><span class="p">].</span><span class="n">color</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>Note how <code class="language-plaintext highlighter-rouge">gl_VertexIndex</code> (not <code class="language-plaintext highlighter-rouge">gl_VertexID</code>!) is used to index into the storage buffer, this is because sokol-shdc shaders are written in ‘Vulkan style’, not ‘GL style’.</p> <p>We also need to expand the vec3 input pos manually to a vec4 with w-component = 1.0.</p> <p>That’s all the changes needed on the shader side. 
Next compile the modified shader with:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sokol-shdc <span class="nt">-i</span> shader.glsl <span class="nt">-o</span> shader.h <span class="nt">-l</span> metal_macos:hlsl5:glsl430:wgsl <span class="nt">-f</span> sokol </code></pre></div></div> <p>Apart from the ‘traditional’ code-generation output, sokol-shdc will create two new declarations:</p> <ul> <li> <p>A define <code class="language-plaintext highlighter-rouge">#define SLOT_ssbo (0)</code>, this is the bind slot index to be used in the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct</p> </li> <li> <p>A C struct <code class="language-plaintext highlighter-rouge">sb_vertex_t</code> which maps the GLSL struct <code class="language-plaintext highlighter-rouge">sb_vertex</code> to the C side looking like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">SOKOL_SHDC_ALIGN</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sb_vertex_t</span> <span class="p">{</span> <span class="kt">float</span> <span class="n">pos</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span> <span class="kt">uint8_t</span> <span class="n">_pad_12</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span> <span class="kt">float</span> <span class="n">color</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span> <span class="p">}</span> <span class="n">sb_vertex_t</span><span class="p">;</span> </code></pre></div> </div> </li> </ul> <blockquote> <p>NOTE: with the right <code class="language-plaintext highlighter-rouge">@ctype</code> tags at the top of the shader we could also map the struct members to C or C++ types, for instance with HandmadeMath.h types:</p> 
</blockquote> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SOKOL_SHDC_ALIGN</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">sb_vertex_t</span> <span class="p">{</span> <span class="n">hmm_vec3</span> <span class="n">pos</span><span class="p">;</span> <span class="kt">uint8_t</span> <span class="n">_pad_12</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span> <span class="n">hmm_vec4</span> <span class="n">color</span><span class="p">;</span> <span class="p">}</span> <span class="n">sb_vertex_t</span><span class="p">;</span> </code></pre></div></div> <p>Next let’s see how the <a href="https://github.com/floooh/sokol-samples/blob/master/sapp/cube-sapp.c">cube-sapp C code</a> needs to be changed:</p> <p>The original code creates a vertex buffer like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">float</span> <span class="n">vertices</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span 
class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="p">...</span> <span class="p">};</span> <span class="n">sg_buffer</span> <span class="n">vbuf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span> <span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-vertices"</span> <span class="p">});</span> </code></pre></div></div> <p>By default <code class="language-plaintext highlighter-rouge">sg_make_buffer()</code> creates a vertex buffer, so the above is identical with a more explicit:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_buffer</span> <span class="n">vbuf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">type</span> <span class="o">=</span> <span 
class="n">SG_BUFFERTYPE_VERTEXBUFFER</span><span class="p">,</span> <span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span> <span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-vertices"</span> <span class="p">});</span> </code></pre></div></div> <p>…when changing the code to use storage buffers we can use the code-generated <code class="language-plaintext highlighter-rouge">sb_vertex_t</code> struct to initialize the vertex data. This has the advantage that we don’t need to care about the obscure <code class="language-plaintext highlighter-rouge">std430</code> memory layout rules:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sb_vertex_t</span> <span class="n">vertices</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">{</span> <span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="p">{</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="p">},</span> <span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span 
class="p">}</span> <span class="p">},</span> <span class="p">{</span> <span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="p">},</span> <span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="p">}</span> <span class="p">},</span> <span class="p">...</span> <span class="p">};</span> <span class="n">sg_buffer</span> <span class="n">sbuf</span> <span class="o">=</span> <span class="n">sg_make_buffer</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_buffer_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">type</span> <span class="o">=</span> <span class="n">SG_BUFFERTYPE_STORAGEBUFFER</span><span class="p">,</span> <span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">SG_RANGE</span><span class="p">(</span><span class="n">vertices</span><span class="p">),</span> <span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-vertices"</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>…note how the buffer type has changed to <code class="language-plaintext 
highlighter-rouge">SG_BUFFERTYPE_STORAGEBUFFER</code>.</p> <p>On to the <code class="language-plaintext highlighter-rouge">sg_pipeline</code> object. In the original code, a vertex layout must be defined in the <code class="language-plaintext highlighter-rouge">sg_pipeline_desc</code> struct to configure the fixed function vertex input stage:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state</span><span class="p">.</span><span class="n">pip</span> <span class="o">=</span> <span class="n">sg_make_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pipeline_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">layout</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span> <span class="p">[</span><span class="n">ATTR_vs_position</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="n">SG_VERTEXFORMAT_FLOAT3</span><span class="p">,</span> <span class="p">[</span><span class="n">ATTR_vs_color0</span><span class="p">].</span><span class="n">format</span> <span class="o">=</span> <span class="n">SG_VERTEXFORMAT_FLOAT4</span> <span class="p">}</span> <span class="p">},</span> <span class="p">.</span><span class="n">shader</span> <span class="o">=</span> <span class="n">shd</span><span class="p">,</span> <span class="p">.</span><span class="n">index_type</span> <span class="o">=</span> <span class="n">SG_INDEXTYPE_UINT16</span><span class="p">,</span> <span class="p">.</span><span class="n">cull_mode</span> <span class="o">=</span> <span class="n">SG_CULLMODE_BACK</span><span class="p">,</span> <span class="p">.</span><span class="n">depth</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">write_enabled</span> <span class="o">=</span> 
<span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">compare</span> <span class="o">=</span> <span class="n">SG_COMPAREFUNC_LESS_EQUAL</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-pipeline"</span> <span class="p">});</span> </code></pre></div></div> <p>When pulling vertex data from storage buffers such a vertex layout description isn’t needed, so the pipeline creation can be simplified to this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state</span><span class="p">.</span><span class="n">pip</span> <span class="o">=</span> <span class="n">sg_make_pipeline</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pipeline_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">shader</span> <span class="o">=</span> <span class="n">shd</span><span class="p">,</span> <span class="p">.</span><span class="n">index_type</span> <span class="o">=</span> <span class="n">SG_INDEXTYPE_UINT16</span><span class="p">,</span> <span class="p">.</span><span class="n">cull_mode</span> <span class="o">=</span> <span class="n">SG_CULLMODE_BACK</span><span class="p">,</span> <span class="p">.</span><span class="n">depth</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">write_enabled</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">compare</span> <span class="o">=</span> <span class="n">SG_COMPAREFUNC_LESS_EQUAL</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"cube-pipeline"</span> <span class="p">});</span> </code></pre></div></div> <p>…the original <code class="language-plaintext 
highlighter-rouge">sg_bindings</code> struct that’s passed into <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state</span><span class="p">.</span><span class="n">bind</span> <span class="o">=</span> <span class="p">(</span><span class="n">sg_bindings</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">vertex_buffers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">vbuf</span><span class="p">,</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="n">ibuf</span> <span class="p">};</span> </code></pre></div></div> <p>…is changed like this (i.e. replace the vertex buffer binding with a storage buffer binding on the vertex shader stage):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state</span><span class="p">.</span><span class="n">bind</span> <span class="o">=</span> <span class="p">(</span><span class="n">sg_bindings</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">index_buffer</span> <span class="o">=</span> <span class="n">ibuf</span><span class="p">,</span> <span class="p">.</span><span class="n">vs</span><span class="p">.</span><span class="n">storage_buffers</span><span class="p">[</span><span class="n">SLOT_ssbo</span><span class="p">]</span> <span class="o">=</span> <span class="n">sbuf</span><span class="p">,</span> <span class="p">};</span> </code></pre></div></div> <p>…and that’s it! 
On the CPU side, storage buffers actually simplify a lot of code because you don’t need a vertex layout in the <code class="language-plaintext highlighter-rouge">sg_pipeline_desc</code> struct, and you get a properly aligned and padded C struct for the storage buffer content from sokol-shdc.</p> <blockquote> <p>NOTE: A ‘proper’ cross-backend sample should also check whether storage buffers are actually supported via <code class="language-plaintext highlighter-rouge">sg_query_features().storage_buffer</code> and render some sort of fallback.</p> </blockquote> <h2 id="shader-authoring-caveats">Shader Authoring Caveats</h2> <p>Shader authoring via sokol-shdc is a bit more restricted than vanilla GLSL:</p> <ol> <li>A storage buffer interface block must contain exactly one item, and this item must be a flexible struct array member. In vanilla GLSL you can have additional ‘header items’ in front of the flexible array member, but this turned out to be tricky to map to CPU-side non-C languages that don’t allow flexible array members (I actually need to research the various target languages a bit more, maybe this rule can be relaxed in the future for some of the target languages).</li> <li>Currently the following types are valid inside a storage buffer struct: <ul> <li><code class="language-plaintext highlighter-rouge">bool, bvec2..4</code>: mapped to int32_t, and int32_t[2..4]</li> <li><code class="language-plaintext highlighter-rouge">int, ivec2..4</code>: mapped to int32_t, and int32_t[2..4]</li> <li><code class="language-plaintext highlighter-rouge">uint, uvec2..4</code>: mapped to uint32_t, and uint32_t[2..4]</li> <li><code class="language-plaintext highlighter-rouge">float, vec2..4</code>: mapped to float and float[2..4]</li> <li><code class="language-plaintext highlighter-rouge">matNxM</code> where N=2..4 and M=1..4 mapped to float[2..16]</li> </ul> </li> <li>nested structs</li> <li>arrays of the above</li> </ol> <p>Please note that only a few of those combinations are 
tested, especially when it comes to correct array item padding and alignment. If you stumble over any problems, please write a ticket at <a href="https://github.com/floooh/sokol-tools/issues">https://github.com/floooh/sokol-tools/issues</a>.</p> <p>To load packed vertex components from storage buffers, use the following GLSL builtins:</p> <ul> <li><code class="language-plaintext highlighter-rouge">vec2 unpackUnorm2x16(uint p)</code></li> <li><code class="language-plaintext highlighter-rouge">vec2 unpackSnorm2x16(uint p)</code></li> <li><code class="language-plaintext highlighter-rouge">vec4 unpackUnorm4x8(uint p)</code></li> <li><code class="language-plaintext highlighter-rouge">vec4 unpackSnorm4x8(uint p)</code></li> </ul> <h2 id="under-the-hood">Under the hood</h2> <blockquote> <p>NOTE: the following information about shader bind slots is only relevant if you do not use the sokol shader compiler (sokol-shdc), but instead pass ‘raw’ HLSL, MSL, GLSL or WGSL shaders into sokol_gfx.h. Also, this information will become obsolete/irrelevant with another future update I have in mind which will allow more flexibility when mapping sokol-gfx bind slots to backend 3D API bind slots (see this planning ticket for more info: <a href="https://github.com/floooh/sokol/issues/1037">#1037</a>).</p> </blockquote> <h3 id="metal">Metal</h3> <p>On Metal there is no ‘buffer zoo’ like in other 3D APIs: uniform-, vertex-, index- and storage-buffers are all the same thing. 
The vertex- and fragment-shader stages have their own buffer bind slot spaces though.</p> <p>The following bind slot ranges are used for the various sokol-gfx buffer types:</p> <ul> <li>on the vertex shader stage: <ul> <li><strong><code class="language-plaintext highlighter-rouge">slots 0..3</code></strong> for uniform buffer bindings (sokol-gfx internally manages a uniform buffer which might be bound at up to four different offsets)</li> <li><strong><code class="language-plaintext highlighter-rouge">slots 4..11</code></strong> for vertex buffer bindings</li> <li><strong><code class="language-plaintext highlighter-rouge">slots 12..19</code></strong> for storage buffer bindings</li> </ul> </li> <li>on the fragment shader stage: <ul> <li><strong><code class="language-plaintext highlighter-rouge">slots 0..3</code></strong> for uniform buffer bindings</li> <li><strong><code class="language-plaintext highlighter-rouge">slots 4..11</code></strong> for storage buffer bindings</li> </ul> </li> </ul> <p>When authoring Metal shaders directly you’ll need to use the above bind slots (also see the low-level <a href="https://github.com/floooh/sokol-samples/tree/master/metal">Metal backend samples</a>).</p> <h3 id="d3d11">D3D11</h3> <p>On D3D11, so-called <a href="https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-resources-intro#raw-views-of-buffers"><em>Byte Address Buffers</em></a> are used for storage buffers, which makes their direct usage in manually written HLSL a bit awkward (but is not an issue when using sokol-shdc).</p> <p>If this turns out to be a problem I might add D3D11-specific creation flags to <code class="language-plaintext highlighter-rouge">sg_buffer_desc</code> to allow using different D3D11 buffer and buffer-view types under the hood. Details like this might also change again once compute shader support is added.</p> <p>On D3D11 and in HLSL, storage buffers share a bind slot range with texture bindings, that’s why sokol-gfx defines 
the following bind ranges for textures and storage buffers in HLSL:</p> <ul> <li><strong><code class="language-plaintext highlighter-rouge">register(t0..t15)</code></strong>: reserved for texture bindings</li> <li><strong><code class="language-plaintext highlighter-rouge">register(t16..t23)</code></strong>: reserved for storage buffer bindings</li> </ul> <p>Also see the low-level <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">D3D11 backend samples</a> for details.</p> <h3 id="webgpu">WebGPU</h3> <p>Storage buffers are created with <code class="language-plaintext highlighter-rouge">WGPUBufferUsage_Storage</code>. WebGPU uses a common bind slot space across all shader resource types and shader stages. Sokol-gfx reserves the following bind slot ranges for the different shader stages and resource types; use those when feeding manually written WGSL shaders into sokol-gfx:</p> <ul> <li>vertex shader stage: <ul> <li>textures: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(0..15)</code></strong></li> <li>samplers: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(16..31)</code></strong></li> <li>storage buffers: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(32..47)</code></strong></li> </ul> </li> <li>fragment shader stage: <ul> <li>textures: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(48..63)</code></strong></li> <li>samplers: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(64..79)</code></strong></li> <li>storage buffers: <strong><code class="language-plaintext highlighter-rouge">@group(1) @binding(80..95)</code></strong></li> </ul> </li> </ul> <p>Also see the low-level <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">WebGPU backend samples</a> for details.</p> <h3 id="gl">GL</h3> <p>In GL, storage buffers are bound to the <code class="language-plaintext 
highlighter-rouge">GL_SHADER_STORAGE_BUFFER</code> target. Sokol-gfx does not look up GLSL storage buffer interface blocks by name, but instead expects that the GLSL code that’s passed into <code class="language-plaintext highlighter-rouge">sg_make_shader()</code> uses a <code class="language-plaintext highlighter-rouge">layout(std430, binding=N)</code> annotation to define the bind slot.</p> <p>The vertex- and fragment-shader stages use a common bind space:</p> <ul> <li>on the vertex shader stage, use <strong><code class="language-plaintext highlighter-rouge">binding 0..7</code></strong></li> <li>on the fragment shader stage, use <strong><code class="language-plaintext highlighter-rouge">binding 8..15</code></strong></li> </ul> <p>Also see the low-level <a href="https://github.com/floooh/sokol-samples/tree/storage-buffers/glfw">desktop GL backend samples</a> for details.</p> <h2 id="sokol-shdc-updates">sokol-shdc updates</h2> <p>Sokol-shdc has been massively refactored, mainly with the goal to have a more robust base for extracting reflection information from shaders and a more ‘structured’ approach to code generation so that supporting additional CPU-side languages will be easier in the future (I’m not yet sure if that last goal was actually achieved though, but time will tell).</p> <p>Unfortunately this massive refactoring also means that there’s a possibility that new bugs have sneaked in. If you notice anything weird, please write tickets here:</p> <p><a href="https://github.com/floooh/sokol-tools/issues">https://github.com/floooh/sokol-tools/issues</a>.</p> <p>A couple of unrelated lingering bugs have been fixed as well:</p> <ul> <li>C++ exceptions are now enabled and exceptions coming out of SPIRVCross are now caught and turned into proper error messages. 
Previously sokol-shdc would simply appear to crash if SPIRVCross emitted an error (because without C++ exceptions enabled, those errors would be turned into a panic which looks like a segfault).</li> <li>Error and warning line numbers had been off by a couple of lines recently. This has been fixed and error messages now point to the correct line again.</li> <li>A couple of somewhat esoteric code generation bugs in non-C code generators were fixed (but as I said, it’s also quite likely that I have introduced new bugs in that area, since code generators were completely rewritten)</li> </ul> <h2 id="whats-next">What’s next:</h2> <p>In short:</p> <ul> <li>A resource binding cleanup (see <a href="https://github.com/floooh/sokol/issues/1037">#1037</a>), the main motivation for this is that the <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct is growing quite large and would grow even larger if a new compute shader stage is added. Furthermore, the artificial separation of shader stages when binding resources also doesn’t map particularly well to some modern 3D APIs.</li> <li>After that it’s finally time to tackle compute shaders. 
For this I need to come up with a resource synchronization strategy, but I will most likely just copy what WebGPU does.</li> </ul> <p>But first I will probably take a little break and dabble a bit with Zig and emulator coding :)</p> Mon, 06 May 2024 00:00:00 +0000 https://floooh.github.io/2024/05/06/sokol-storage-buffers.html https://floooh.github.io/2024/05/06/sokol-storage-buffers.html Upcoming Sokol header API changes (Feb 2024) <p>In a couple of days I will merge the first big API update of 2024 for sokol_gfx.h (with some related changes in sokol_app.h, sokol_glue.h and sokol_gfx_imgui.h).</p> <blockquote> <p>NOTE: most links to code examples will only point to the right code after <a href="https://github.com/floooh/sokol/pull/985">PR #985</a> has been merged!</p> </blockquote> <p>The API update in sokol_gfx.h is a <strong>BREAKING CHANGE</strong> for all code, but for most use cases the required changes are fairly minimal.</p> <p>Apologies for the broken syntax highlighting, apparently <a href="https://github.com/rouge-ruby/rouge">Rouge</a> doesn’t understand C99.</p> <h2 id="table-of-contents">Table of Contents</h2> <ul id="markdown-toc"> <li><a href="#table-of-contents" id="markdown-toc-table-of-contents">Table of Contents</a></li> <li><a href="#overview-and-motivation" id="markdown-toc-overview-and-motivation">Overview and Motivation</a></li> <li><a href="#detailed-change-list" id="markdown-toc-detailed-change-list">Detailed change list</a> <ul> <li><a href="#sokol_gfxh" id="markdown-toc-sokol_gfxh">sokol_gfx.h</a></li> <li><a href="#sokol_apph" id="markdown-toc-sokol_apph">sokol_app.h</a></li> <li><a href="#sokol_glueh" id="markdown-toc-sokol_glueh">sokol_glue.h</a></li> <li><a href="#sokol_gfx_imguih" id="markdown-toc-sokol_gfx_imguih">sokol_gfx_imgui.h</a></li> </ul> </li> <li><a href="#link-collection-with-example-code-changes" id="markdown-toc-link-collection-with-example-code-changes">Link collection with example code changes</a></li> <li><a 
href="#detailed-change-recipes" id="markdown-toc-detailed-change-recipes">Detailed Change Recipes</a> <ul> <li><a href="#for-sokol_gfxh--sokol_apph--sokol_glueh" id="markdown-toc-for-sokol_gfxh--sokol_apph--sokol_glueh">…for sokol_gfx.h + sokol_app.h + sokol_glue.h</a></li> <li><a href="#for-offscreen-render-passes" id="markdown-toc-for-offscreen-render-passes">…for offscreen render passes</a></li> <li><a href="#for-custom-window-system-glue" id="markdown-toc-for-custom-window-system-glue">…for custom window system glue</a> <ul> <li><a href="#using-d3d11" id="markdown-toc-using-d3d11">…using D3D11</a></li> <li><a href="#using-metal" id="markdown-toc-using-metal">…using Metal</a></li> <li><a href="#using-webgpu" id="markdown-toc-using-webgpu">…using WebGPU</a></li> <li><a href="#gl-with-glfw" id="markdown-toc-gl-with-glfw">…GL with GLFW</a></li> </ul> </li> </ul> </li> <li><a href="#q-why-still-have-a-baked-pass-attachments-object" id="markdown-toc-q-why-still-have-a-baked-pass-attachments-object">Q: Why still have a baked pass attachments object?</a></li> </ul> <h2 id="overview-and-motivation">Overview and Motivation</h2> <p>The general topic of this update is a cleanup of the sokol-gfx render pass functions and how external swapchain information is passed into sokol-gfx.</p> <p>Previously there was a special ‘default render pass’ into a ‘default framebuffer’, and the concept of ‘contexts’ to allow switching between different rendering contexts and their default framebuffers (very similar to traditional OpenGL contexts, and in fact this old behavior only ever matched OpenGL, but not the other backend APIs).</p> <p>This setup was needlessly complicated for people who want to use sokol-gfx to render into multiple windows, leading to planning <a href="https://github.com/floooh/sokol/issues/904">ticket #904</a>, and then to <a href="https://github.com/floooh/sokol/pull/985">PR #985</a>.</p> <p>The gist is:</p> <ul> <li>There is now only a single ‘unified’ <code 
class="language-plaintext highlighter-rouge">sg_begin_pass()</code> function which covers both rendering into sokol-gfx render target textures (aka ‘offscreen passes’) and externally managed ‘swapchains’ (aka ‘swapchain passes’).</li> <li>The entire concept of <code class="language-plaintext highlighter-rouge">contexts</code> has been removed from sokol_gfx.h.</li> <li>External swapchain properties are now passed directly into <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> in a transient structure.</li> </ul> <p>Instead of having a special and unique ‘default-render-pass’ per frame and context, an application can now simply call <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> multiple times per frame, each time with properties for a different swapchain, and all that without having to create ‘context objects’ upfront or ‘switching contexts’.</p> <p>Most simple applications that don’t render into offscreen passes and use sokol_gfx.h together with sokol_app.h and sokol_glue.h only need to change two calls: <code class="language-plaintext highlighter-rouge">sg_setup()</code> and <code class="language-plaintext highlighter-rouge">sg_begin_default_pass()</code>, for other situations please check the ‘Change Recipes’ section further down.</p> <p>In addition to this blog post, please also re-read the documentation headers in sokol_gfx.h and sokol_app.h, and specifically the struct documentation for the new sokol-gfx structs <code class="language-plaintext highlighter-rouge">sg_environment</code> and <code class="language-plaintext highlighter-rouge">sg_swapchain</code>.</p> <h2 id="detailed-change-list">Detailed change list</h2> <h3 id="sokol_gfxh">sokol_gfx.h</h3> <p>The following public API structs and functions have been <strong>removed</strong>:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sg_begin_default_pass()</code></li> <li><code class="language-plaintext 
highlighter-rouge">sg_begin_default_passf()</code></li> <li><code class="language-plaintext highlighter-rouge">struct sg_context_desc</code></li> <li><code class="language-plaintext highlighter-rouge">struct sg_context</code></li> <li><code class="language-plaintext highlighter-rouge">sg_setup_context()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_activate_context()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_discard_context()</code></li> </ul> <p>The following top-level structs have been <strong>added</strong>:</p> <ul> <li> <p><code class="language-plaintext highlighter-rouge">struct sg_environment</code>: this is passed as a nested struct of <code class="language-plaintext highlighter-rouge">sg_desc</code> into the <code class="language-plaintext highlighter-rouge">sg_setup()</code> call to provide information about the environment sokol-gfx runs in (most importantly 3D API device pointers).</p> </li> <li> <p><code class="language-plaintext highlighter-rouge">struct sg_swapchain</code>: this is passed into <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> for render passes which should render into an externally managed swapchain. 
The struct contains the following information:</p> <ul> <li>the pixel format of the swapchain’s rendering surface</li> <li>the pixel format of the optional depth/stencil surface</li> <li>an MSAA sample count</li> <li>3D backend specific resource handles, like D3D11/WebGPU texture views, Metal drawables, or GL framebuffers</li> </ul> </li> </ul> <p>The resource handle type <code class="language-plaintext highlighter-rouge">sg_pass</code> has been <strong>renamed</strong> to <code class="language-plaintext highlighter-rouge">sg_attachments</code> (to free the name for another purpose), this also causes related renames:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sg_pass</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_attachments</code></li> <li><code class="language-plaintext highlighter-rouge">sg_pass_desc</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_attachments_desc</code></li> <li><code class="language-plaintext highlighter-rouge">sg_pass_info</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_attachments_info</code></li> <li><code class="language-plaintext highlighter-rouge">sg_make_pass()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_make_attachments()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_destroy_pass()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_destroy_attachments()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_query_pass_state()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_query_attachments_state()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_query_pass_info()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_query_attachments_info()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_query_pass_desc()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_query_attachments_desc()</code></li> 
<li><code class="language-plaintext highlighter-rouge">sg_alloc_pass()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_alloc_attachments()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_dealloc_pass()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_dealloc_attachments()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_init_pass()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_init_attachments()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_fail_pass()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_fail_attachments()</code></li> <li><code class="language-plaintext highlighter-rouge">sg_[*]_pass_info()</code> =&gt; <code class="language-plaintext highlighter-rouge">sg_[*]_attachments_info()</code> (where ‘*’ is ‘d3d11|gl|metal|wgpu’)</li> </ul> <p>Inside the <code class="language-plaintext highlighter-rouge">sg_attachments_desc</code> struct there has been some renaming to reduce redundancy:</p> <ul> <li><code class="language-plaintext highlighter-rouge">.color_attachments[]</code> =&gt; <code class="language-plaintext highlighter-rouge">.colors[]</code></li> <li><code class="language-plaintext highlighter-rouge">.resolve_attachments[]</code> =&gt; <code class="language-plaintext highlighter-rouge">.resolves[]</code></li> <li><code class="language-plaintext highlighter-rouge">.depth_stencil_attachment</code> =&gt; <code class="language-plaintext highlighter-rouge">.depth_stencil</code></li> </ul> <p>The typename <code class="language-plaintext highlighter-rouge">sg_pass</code> has been repurposed to serve as the <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> parameter, i.e. 
the begin-pass function signature now looks like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">sg_begin_pass</span><span class="p">(</span><span class="k">const</span> <span class="n">sg_pass</span><span class="o">*</span> <span class="n">pass</span><span class="p">);</span> </code></pre></div></div> <p>With the struct <code class="language-plaintext highlighter-rouge">sg_pass</code> now looking like this (with omitted start/end canaries):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="n">sg_pass</span> <span class="p">{</span> <span class="n">sg_pass_action</span> <span class="n">action</span><span class="p">;</span> <span class="n">sg_attachments</span> <span class="n">attachments</span><span class="p">;</span> <span class="n">sg_swapchain</span> <span class="n">swapchain</span><span class="p">;</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">label</span><span class="p">;</span> <span class="p">}</span> <span class="n">sg_pass</span><span class="p">;</span> </code></pre></div></div> <p>For an ‘offscreen-render-pass’, an <code class="language-plaintext highlighter-rouge">.attachments</code> item must be provided, but no <code class="language-plaintext highlighter-rouge">.swapchain</code>:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span> <span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span 
class="n">attachments</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>…and for a ‘swapchain-render-pass’, a <code class="language-plaintext highlighter-rouge">.swapchain</code> item must be provided, but no <code class="language-plaintext highlighter-rouge">.attachments</code>:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span> <span class="p">.</span><span class="n">swapchain</span> <span class="o">=</span> <span class="n">sglue_swapchain</span><span class="p">(),</span> <span class="p">});</span> </code></pre></div></div> <p>Other unrelated ‘drive-by-changes’ in sokol_gfx.h:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sg_limits.gl_max_vertex_uniform_vectors</code> has been replaced with <code class="language-plaintext highlighter-rouge">sg_limits.gl_max_vertex_uniform_components</code> (see <a href="https://github.com/floooh/sokol/issues/714">#714</a>)</li> <li>the start and end canaries in <code class="language-plaintext highlighter-rouge">sg_pass_action</code> have been removed (since <code class="language-plaintext highlighter-rouge">sg_pass_action</code> is now a nested struct of <code class="language-plaintext highlighter-rouge">sg_pass</code>, the canaries are redundant)</li> <li>a new initialization config item <code class="language-plaintext highlighter-rouge">sg_desc.mtl_use_command_buffer_with_retained_references</code> has been added (see <a href="https://github.com/floooh/sokol/issues/981">#981</a>)</li> </ul> <h3 id="sokol_apph">sokol_app.h</h3> <p>The following public API function has been removed:</p> <ul> <li><code class="language-plaintext 
highlighter-rouge">sapp_metal_get_renderpass_descriptor()</code></li> </ul> <p>The following functions have been renamed:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sapp_metal_get_drawable()</code> =&gt; <code class="language-plaintext highlighter-rouge">sapp_metal_get_current_drawable()</code></li> <li><code class="language-plaintext highlighter-rouge">sapp_d3d11_get_render_target_view()</code> =&gt; <code class="language-plaintext highlighter-rouge">sapp_d3d11_get_render_view()</code></li> </ul> <p>…and the following functions are new:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sapp_metal_get_depth_stencil_texture()</code></li> <li><code class="language-plaintext highlighter-rouge">sapp_metal_get_msaa_color_texture()</code></li> <li><code class="language-plaintext highlighter-rouge">sapp_d3d11_get_resolve_view()</code></li> <li><code class="language-plaintext highlighter-rouge">sapp_gl_get_framebuffer()</code></li> </ul> <p>…These functions directly plug into the new <code class="language-plaintext highlighter-rouge">sg_swapchain</code> struct in sokol_gfx.h.</p> <h3 id="sokol_glueh">sokol_glue.h</h3> <p>sokol_glue.h is now a regular library header without the ‘preprocessor magic’ which created a different API depending on what other sokol headers had been included before sokol_glue.h (this was an ‘interesting’ but ultimately pretty stupid idea).</p> <p>The API prefix has changed from a somewhat confusing <code class="language-plaintext highlighter-rouge">sapp_</code> to the expected <code class="language-plaintext highlighter-rouge">sglue_</code>.</p> <p>The old function <code class="language-plaintext highlighter-rouge">sapp_sgcontext()</code> has been split into two new functions:</p> <ul> <li><code class="language-plaintext highlighter-rouge">sglue_environment()</code> which plugs directly into <code class="language-plaintext highlighter-rouge">sg_desc.environment</code>, and…</li> <li><code class="language-plaintext 
highlighter-rouge">sglue_swapchain()</code> which plugs into <code class="language-plaintext highlighter-rouge">sg_pass.swapchain</code></li> </ul> <p>Note that <code class="language-plaintext highlighter-rouge">sglue_swapchain()</code> may return different values each frame depending on the 3D API backend.</p> <h3 id="sokol_gfx_imguih">sokol_gfx_imgui.h</h3> <p>In a similar vein, the public API prefix of sokol_gfx_imgui.h has been changed from the weird ‘double prefix’ <code class="language-plaintext highlighter-rouge">sg_imgui_</code> to a more conventional <code class="language-plaintext highlighter-rouge">sgimgui_</code>.</p> <p>Apart from this publicly visible change, all the internals have been updated to reflect the sokol-gfx API changes.</p> <h2 id="link-collection-with-example-code-changes">Link collection with example code changes</h2> <p>If you use sokol_gfx.h + sokol_app.h + sokol_glue.h, check out the updated samples here (first click on a sample, and then on the ‘src’ link at the bottom):</p> <ul> <li><a href="https://floooh.github.io/sokol-html5/">sokol samples</a></li> </ul> <p>Specifically look at <a href="https://floooh.github.io/sokol-html5/clear-sapp.html">clear-sapp</a> for the simple case of only rendering to a default framebuffer, and <a href="https://floooh.github.io/sokol-html5/offscreen-sapp.html">offscreen-sapp</a> for rendering to an offscreen render target.</p> <p>If you use sokol_gfx.h with your own window system glue, or a library like GLFW or SDL, check out the updated backend specific examples:</p> <ul> <li>for D3D11: <a href="https://github.com/floooh/sokol-samples/tree/master/d3d11">https://github.com/floooh/sokol-samples/tree/master/d3d11</a></li> <li>for Metal: <a href="https://github.com/floooh/sokol-samples/tree/master/metal">https://github.com/floooh/sokol-samples/tree/master/metal</a></li> <li>for GL with GLFW: <a 
href="https://github.com/floooh/sokol-samples/tree/master/glfw">https://github.com/floooh/sokol-samples/tree/master/glfw</a></li> <li>for WebGL2: <a href="https://github.com/floooh/sokol-samples/tree/master/html5">https://github.com/floooh/sokol-samples/tree/master/html5</a></li> <li>for WebGPU: <a href="https://github.com/floooh/sokol-samples/tree/master/wgpu">https://github.com/floooh/sokol-samples/tree/master/wgpu</a></li> </ul> <p>The GLFW subdirectory also contains an updated <code class="language-plaintext highlighter-rouge">multiwindow-glfw</code> sample, and a <code class="language-plaintext highlighter-rouge">metal-glfw</code> sample which demonstrates how to use GLFW in NO_API mode together with the sokol_gfx.h Metal backend.</p> <p>Also please be aware of the following behaviour and expectation changes if you are using your own window system glue:</p> <ul> <li> <p>For <strong>D3D11/DXGI</strong> the MSAA resolve operation is now performed in <code class="language-plaintext highlighter-rouge">sg_end_pass()</code>, previously this was expected to be performed in the window system glue before presentation.</p> </li> <li> <p>For <strong>Metal</strong> it is now expected that the window system glue provides a <code class="language-plaintext highlighter-rouge">CAMetalDrawable</code> and optional <code class="language-plaintext highlighter-rouge">MTLTexture</code> objects instead of an <code class="language-plaintext highlighter-rouge">MTLRenderPassDescriptor</code>. This was also done to better ‘harmonize’ with the other backends (it’s just as easy getting those individual objects from an <code class="language-plaintext highlighter-rouge">MTKView</code> as the <code class="language-plaintext highlighter-rouge">MTLRenderPassDescriptor</code>).</p> </li> <li> <p>For <strong>GL</strong>, sokol-gfx now expects that <em>all</em> rendering goes through a single GL context. 
This may require changes to existing code which renders into multiple windows (for instance in GLFW, every window has its own GL context). Refer to the new <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/multiwindow-glfw.c">multiwindow-glfw.c</a> example for a possible solution.</p> </li> </ul> <p>Additionally, check out the following PRs for required changes in my toy projects:</p> <ul> <li><a href="https://github.com/floooh/pacman.c/pull/12">pacman.c</a></li> <li><a href="https://github.com/floooh/doom-sokol/pull/1">Doom on Sokol</a></li> <li><a href="https://github.com/floooh/v6502r/pull/24/files">Visual 6502 Remix</a></li> <li><a href="https://github.com/floooh/qoiview/pull/10">qoiview</a></li> <li><a href="https://github.com/floooh/chips-test/pull/33">chips</a></li> </ul> <p>When using the language bindings, check out the following PRs:</p> <ul> <li><a href="https://github.com/floooh/sokol-zig/pull/57/files">sokol-zig</a></li> <li><a href="https://github.com/floooh/sokol-odin/pull/8">sokol-odin</a></li> <li><a href="https://github.com/floooh/sokol-nim/pull/28">sokol-nim</a></li> <li><a href="https://github.com/floooh/sokol-rust/pull/22">sokol-rust</a></li> <li><a href="https://github.com/floooh/pacman.zig/pull/23">pacman.zig</a></li> <li><a href="https://github.com/floooh/kc85.zig/pull/4">kc85.zig</a></li> </ul> <h2 id="detailed-change-recipes">Detailed Change Recipes</h2> <h3 id="for-sokol_gfxh--sokol_apph--sokol_glueh">…for sokol_gfx.h + sokol_app.h + sokol_glue.h</h3> <p>When using sokol_gfx.h together with sokol_app.h and sokol_glue.h…</p> <p>…change your <code class="language-plaintext highlighter-rouge">sg_setup()</code> call from this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_setup</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_desc</span><span class="p">){</span> <span class="p">.</span><span 
class="n">context</span> <span class="o">=</span> <span class="n">sapp_sgcontext</span><span class="p">(),</span> <span class="p">.</span><span class="n">logger</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">slog_func</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>…to this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_setup</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">environment</span> <span class="o">=</span> <span class="n">sglue_environment</span><span class="p">(),</span> <span class="p">.</span><span class="n">logger</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">slog_func</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>Change the <code class="language-plaintext highlighter-rouge">sg_begin_default_pass()</code> call from this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_default_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pass_action</span><span class="p">,</span> <span class="n">sapp_width</span><span class="p">(),</span> <span class="n">sapp_height</span><span class="p">());</span> </code></pre></div></div> <p>…to this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span> <span class="p">.</span><span 
class="n">swapchain</span> <span class="o">=</span> <span class="n">sglue_swapchain</span><span class="p">()</span> <span class="p">});</span> </code></pre></div></div> <h3 id="for-offscreen-render-passes">…for offscreen render passes</h3> <p>Change <code class="language-plaintext highlighter-rouge">sg_make_pass()</code> calls from this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_pass</span> <span class="n">pass</span> <span class="o">=</span> <span class="n">sg_make_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">color_attachments</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span> <span class="p">.</span><span class="n">resolve_attachments</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">resolve_img</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_stencil_attachment</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">depth_img</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>…to this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_attachments</span> <span class="n">attachments</span> <span class="o">=</span> <span class="n">sg_make_attachments</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_attachments_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span 
class="n">image</span> <span class="o">=</span> <span class="n">color_img</span><span class="p">,</span> <span class="p">.</span><span class="n">resolves</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">image</span> <span class="o">=</span> <span class="n">resolve_img</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_stencil</span><span class="p">.</span><span class="n">image</span> <span class="o">=</span> <span class="n">depth_img</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <p>Change <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> calls from this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="n">pass</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">pass_action</span><span class="p">);</span> </code></pre></div></div> <p>…to this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span> <span class="p">.</span><span class="n">attachments</span> <span class="o">=</span> <span class="n">attachments</span><span class="p">,</span> <span class="p">});</span> </code></pre></div></div> <h3 id="for-custom-window-system-glue">…for custom window system glue</h3> <p>Create two helper functions, one which returns an initialized <code class="language-plaintext highlighter-rouge">sg_environment</code> struct and one which returns an initialized <code class="language-plaintext highlighter-rouge">sg_swapchain</code> struct. 
The following are examples of how these functions might look for different backend 3D APIs.</p> <h4 id="using-d3d11">…using D3D11</h4> <p>Example implementations:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_environment</span> <span class="nf">d3d11_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sg_environment</span><span class="p">){</span> <span class="p">.</span><span class="n">defaults</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_BGRA8</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span> <span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">d3d11</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">device</span> <span class="o">=</span> <span class="n">d3d11_device</span><span class="p">,</span> <span class="c1">// ID3D11Device*</span> <span class="p">.</span><span class="n">device_context</span> <span class="o">=</span> <span class="n">d3d11_device_context</span><span class="p">,</span> <span class="c1">// ID3D11DeviceContext*</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p><code class="language-plaintext highlighter-rouge">.defaults.color_format</code>, <code class="language-plaintext highlighter-rouge">.defaults.depth_format</code> and <code class="language-plaintext highlighter-rouge">.defaults.sample_count</code> should match the ‘most 
common’ swapchain surface properties. These defaults will be used to fill in defaults for zero-initialized values in various sokol-gfx calls. <code class="language-plaintext highlighter-rouge">.depth_format</code> can also be <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_NONE</code> if no depth-buffer exists, or <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH</code> if no stencil buffer is used.</p> <p>The associated DXGI depth-stencil-view pixel formats are:</p> <ul> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH_STENCIL</code> =&gt; <code class="language-plaintext highlighter-rouge">DXGI_FORMAT_D24_UNORM_S8_UINT</code></li> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH</code> =&gt; <code class="language-plaintext highlighter-rouge">DXGI_FORMAT_D32_FLOAT</code></li> </ul> <p>The helper function to obtain an <code class="language-plaintext highlighter-rouge">sg_swapchain</code> struct might look like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_swapchain</span> <span class="nf">d3d11_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sg_swapchain</span><span class="p">){</span> <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">sample_count</span><span class="p">,</span> <span 
class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_BGRA8</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span> <span class="p">.</span><span class="n">d3d11</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">render_view</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">sample_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="n">state</span><span class="p">.</span><span class="n">rt_view</span> <span class="o">:</span> <span class="n">state</span><span class="p">.</span><span class="n">msaa_view</span><span class="p">,</span> <span class="p">.</span><span class="n">resolve_view</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">sample_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="n">state</span><span class="p">.</span><span class="n">rt_view</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_stencil_view</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">ds_view</span><span class="p">,</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p><code class="language-plaintext highlighter-rouge">state.rt_view</code> and <code class="language-plaintext highlighter-rouge">state.msaa_view</code> are of type <code class="language-plaintext highlighter-rouge">ID3D11RenderTargetView</code> and <code class="language-plaintext 
highlighter-rouge">state.ds_view</code> is of type <code class="language-plaintext highlighter-rouge">ID3D11DepthStencilView</code>.</p> <p>Note how a different <code class="language-plaintext highlighter-rouge">.d3d11.render_view</code> is selected depending on whether multisampled rendering is used or not. For non-multisampled rendering, sokol-gfx renders into the same view that’s presented. For multisampled rendering, sokol-gfx will render into an intermediate MSAA texture view (<code class="language-plaintext highlighter-rouge">state.msaa_view</code>) which is then resolved into the <code class="language-plaintext highlighter-rouge">d3d11.resolve_view</code> inside <code class="language-plaintext highlighter-rouge">sg_end_pass()</code>.</p> <p>Also check out the example D3D11 window system glue code here:</p> <p><a href="https://github.com/floooh/sokol-samples/blob/master/d3d11/d3d11entry.c">https://github.com/floooh/sokol-samples/blob/master/d3d11/d3d11entry.c</a></p> <h4 id="using-metal">…using Metal</h4> <p>Example function which returns an initialized <code class="language-plaintext highlighter-rouge">sg_environment</code> struct:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_environment</span> <span class="nf">osx_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sg_environment</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">defaults</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">sample_count</span><span class="p">,</span> <span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_BGRA8</span><span class="p">,</span> <span 
class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">metal</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">device</span> <span class="o">=</span> <span class="p">(</span><span class="n">__bridge</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="n">mtl_device</span><span class="p">,</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>The ObjC type of <code class="language-plaintext highlighter-rouge">mtl_device</code> is <code class="language-plaintext highlighter-rouge">id&lt;MTLDevice&gt;</code>. Note the special <code class="language-plaintext highlighter-rouge">__bridge</code> cast to a void pointer for tunneling through the sokol_app.h and sokol_gfx.h C APIs.</p> <p>…and the function which returns an <code class="language-plaintext highlighter-rouge">sg_swapchain</code> struct (in this case using an <code class="language-plaintext highlighter-rouge">MTKView</code> to manage the swapchain surfaces):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_swapchain</span> <span class="nf">osx_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sg_swapchain</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">drawableSize</span><span class="p">].</span><span class="n">width</span><span 
class="p">,</span> <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">drawableSize</span><span class="p">].</span><span class="n">height</span><span class="p">,</span> <span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">sample_count</span><span class="p">,</span> <span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_BGRA8</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span> <span class="p">.</span><span class="n">metal</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">current_drawable</span> <span class="o">=</span> <span class="p">(</span><span class="n">__bridge</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">currentDrawable</span><span class="p">],</span> <span class="p">.</span><span class="n">depth_stencil_texture</span> <span class="o">=</span> <span class="p">(</span><span class="n">__bridge</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span class="n">depthStencilTexture</span><span class="p">],</span> <span class="p">.</span><span class="n">msaa_color_texture</span> <span class="o">=</span> <span class="p">(</span><span class="n">__bridge</span> <span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="p">[</span><span class="n">mtk_view</span> <span 
class="n">multisampleColorTexture</span><span class="p">],</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>Also check out the Metal window system glue code here:</p> <p><a href="https://github.com/floooh/sokol-samples/blob/master/metal/osxentry.m">https://github.com/floooh/sokol-samples/blob/master/metal/osxentry.m</a></p> <p>…alternatively check out the GLFW+Metal example here which doesn’t use an MTKView (but also doesn’t support a depth-buffer or MSAA rendering):</p> <p><a href="https://github.com/floooh/sokol-samples/blob/master/glfw/metal-glfw.m">https://github.com/floooh/sokol-samples/blob/master/glfw/metal-glfw.m</a></p> <h4 id="using-webgpu">…using WebGPU</h4> <p>The environment- and swapchain-helper-functions look very similar to D3D11:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_environment</span> <span class="nf">wgpu_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sg_environment</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">defaults</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_</span><span class="p">...,</span> <span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_</span><span class="p">...,</span> <span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">desc</span><span class="p">.</span><span class="n">sample_count</span><span class="p">,</span> <span class="p">},</span> <span class="p">.</span><span class="n">wgpu</span> <span class="o">=</span> 
<span class="p">{</span> <span class="p">.</span><span class="n">device</span> <span class="o">=</span> <span class="p">(</span><span class="k">const</span> <span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="n">state</span><span class="p">.</span><span class="n">device</span><span class="p">,</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>For <code class="language-plaintext highlighter-rouge">.defaults.color_format</code> you should use the result of <code class="language-plaintext highlighter-rouge">wgpuSurfaceGetPreferredFormat()</code> translated to a sokol-gfx pixel format (either <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_BGRA8</code> or <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_RGBA8</code>).</p> <p>For the depth format use either <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH_STENCIL</code>, <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH</code> or <code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_NONE</code>, which translate to WebGPU pixel formats as follows:</p> <ul> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH_STENCIL</code> =&gt; <code class="language-plaintext highlighter-rouge">WGPUTextureFormat_Depth32FloatStencil8</code></li> <li><code class="language-plaintext highlighter-rouge">SG_PIXELFORMAT_DEPTH</code> =&gt; <code class="language-plaintext highlighter-rouge">WGPUTextureFormat_Depth32Float</code></li> </ul> <p>The type of <code class="language-plaintext highlighter-rouge">state.device</code> is <code class="language-plaintext highlighter-rouge">WGPUDevice</code>.</p> <p>The WebGPU swapchain helper function might look like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_swapchain</span> <span class="nf">wgpu_swapchain</span><span 
class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sg_swapchain</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">sample_count</span><span class="p">,</span> <span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_</span><span class="p">...,</span> <span class="p">.</span><span class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_</span><span class="p">...,</span> <span class="p">.</span><span class="n">wgpu</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">render_view</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">sample_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span class="n">state</span><span class="p">.</span><span class="n">rt_view</span> <span class="o">:</span> <span class="n">state</span><span class="p">.</span><span class="n">msaa_view</span><span class="p">,</span> <span class="p">.</span><span class="n">resolve_view</span> <span class="o">=</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">sample_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">?</span> <span 
class="mi">0</span> <span class="o">:</span> <span class="n">state</span><span class="p">.</span><span class="n">rt_view</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_stencil_view</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">ds_view</span><span class="p">,</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>…note the selection for <code class="language-plaintext highlighter-rouge">.wgpu.render_view</code> and <code class="language-plaintext highlighter-rouge">.wgpu.resolve_view</code> based on the MSAA sample count, which works the same as in the <code class="language-plaintext highlighter-rouge">d3d11_swapchain()</code> function.</p> <p>The types for all view objects are <code class="language-plaintext highlighter-rouge">WGPUTextureView</code>.</p> <p>Also check out the WebGPU system glue code here:</p> <p><a href="https://github.com/floooh/sokol-samples/blob/master/wgpu/wgpu_entry.c">https://github.com/floooh/sokol-samples/blob/master/wgpu/wgpu_entry.c</a></p> <h4 id="gl-with-glfw">…GL with GLFW</h4> <p>The environment-helper-function only returns default pixel formats and sample count:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_environment</span> <span class="nf">glfw_environment</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="n">sg_environment</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">defaults</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_RGBA8</span><span class="p">,</span> <span class="p">.</span><span class="n">depth_format</span> <span 
class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span> <span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="p">},</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>…the swapchain function also returns a GL framebuffer object, for the default framebuffer this is always zero, otherwise this is a handle created with <code class="language-plaintext highlighter-rouge">glGenFramebuffers()</code>.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sg_swapchain</span> <span class="nf">glfw_swapchain</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">;</span> <span class="n">glfwGetFramebufferSize</span><span class="p">(</span><span class="n">_window</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">width</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">height</span><span class="p">);</span> <span class="k">return</span> <span class="p">(</span><span class="n">sg_swapchain</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">width</span><span class="p">,</span> <span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">height</span><span class="p">,</span> <span class="p">.</span><span class="n">sample_count</span> <span class="o">=</span> <span class="n">_sample_count</span><span class="p">,</span> <span class="p">.</span><span class="n">color_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_RGBA8</span><span class="p">,</span> <span class="p">.</span><span 
class="n">depth_format</span> <span class="o">=</span> <span class="n">SG_PIXELFORMAT_DEPTH_STENCIL</span><span class="p">,</span> <span class="p">.</span><span class="n">gl</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">framebuffer</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>Also see <a href="https://github.com/floooh/sokol-samples/blob/master/glfw/glfw_glue.c">https://github.com/floooh/sokol-samples/blob/master/glfw/glfw_glue.c</a></p> <h2 id="q-why-still-have-a-baked-pass-attachments-object">Q: Why still have a baked pass attachments object?</h2> <p>For a while I’ve been pondering whether to get rid of pre-baked pass-attachments objects altogether (i.e. what were formerly <code class="language-plaintext highlighter-rouge">sg_pass</code> objects and are now <code class="language-plaintext highlighter-rouge">sg_attachments</code> objects), and instead pass a transient struct with the same information that’s in <code class="language-plaintext highlighter-rouge">sg_attachments_desc</code> into the <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> function, similar to how <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> takes a transient <code class="language-plaintext highlighter-rouge">sg_bindings</code> struct with all the resource bindings.</p> <p>I didn’t follow through with that idea because this would mean creating temporary objects inside <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> and discarding them again in <code class="language-plaintext highlighter-rouge">sg_end_pass()</code> (or alternatively using a ‘hash-and-cache’ approach).</p> <p>In D3D11 and WebGPU, one temporary texture view object would need to be created per pass-attachment (which may add up to 9 temporary objects), and in the 
GL backend, a GL framebuffer object must be created, configured and checked for completeness. All this work currently only happens once in <code class="language-plaintext highlighter-rouge">sg_make_attachments()</code>, but would need to happen inside <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> without baked attachments objects.</p> <p>While these backend API objects should be ‘reasonably cheap’ to create, I still decided against it.</p> <p>Currently the only other place where such temporary objects are created and discarded on the fly is the <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code> call for the WebGPU backend, where temporary BindGroup objects are created and discarded dynamically via a ‘hash-and-cache’ approach and I hate it :) I don’t want that type of code to creep into other places.</p> <p>Now, <code class="language-plaintext highlighter-rouge">sg_begin_pass()</code> and <code class="language-plaintext highlighter-rouge">sg_end_pass()</code> are called far less frequently than <code class="language-plaintext highlighter-rouge">sg_apply_bindings()</code>, and creating view- and framebuffer-objects <em>should</em> be cheap enough, but it still feels ‘wrong’ to create and discard backend API objects willy-nilly during the frame.</p> Mon, 26 Feb 2024 00:00:00 +0000 https://floooh.github.io/2024/02/26/sokol-spring-cleaning-2024.html https://floooh.github.io/2024/02/26/sokol-spring-cleaning-2024.html VSCode, WASM, WASI <p>I did a neat little thing during my year-end vacation: A VSCode extension for retro-assembly coding with the assembler and home computer emulator integrated right into VSCode via WASM and WASI.</p> <p>The extension is here (careful: it must be installed as <strong>pre-release</strong>, otherwise installing a dependency extension won’t work, more on that later):</p> <p><a 
href="https://marketplace.visualstudio.com/items?itemName=floooh.vscode-kcide">https://marketplace.visualstudio.com/items?itemName=floooh.vscode-kcide</a></p> <p>This is what it looks like in action when debugging a KC85/4 demo I wrote for dog-fooding the extension:</p> <p><img src="/images/vscode-wasm-wasi-1.webp" alt="Screenshot 1" /></p> <p>The VSCode extension project is here:</p> <p><a href="https://github.com/floooh/vscode-kcide">https://github.com/floooh/vscode-kcide</a></p> <p>…and the samples for KC85/4, C64 and Amstrad CPC are here:</p> <p><a href="https://github.com/floooh/kcide-sample">https://github.com/floooh/kcide-sample</a></p> <p>The extension also integrates the following projects:</p> <ul> <li><a href="https://github.com/floooh/easmx">a fork</a> of the <a href="http://svn.xi6.com/svn/asmx/branches/2.x/asmx-doc.html">ASMX multi-cpu assembler</a></li> <li>the KC85/4, C64 and CPC emulators from my <a href="https://floooh.github.io/tiny8bit/">chips project</a></li> </ul> <p>Creating a simple VSCode extension is fairly straightforward (see: <a href="https://code.visualstudio.com/api/get-started/your-first-extension">Your First Extension</a>), so I won’t go into too many details there. What’s interesting is the use of WASM and WASI to integrate projects written in languages other than JS/TS into a VSCode extension.</p> <p>This makes it possible to bundle the assembler (written in C89) and the emulator (C99 and C++11) directly with the extension as WASM blobs. 
Similar extensions without WASM components would either need to port the assembler and emulator to JS/TS, ask the user to install and run native tools (most other retro-dev extensions seem to use that approach), or automatically download and install separate platform-specific native tools (the approach used by the Microsoft C/C++ extension), which is asking for a lot of trust from the extension user.</p> <p>WASM fixes all those issues:</p> <ul> <li>it’s completely hassle-free for the user because the WASM blobs can be bundled with the extension and everything works out of the box</li> <li>it’s less hassle for the extension developer, because a single WASM blob automatically works on all platforms supported by VSCode (including the VSCode web version)</li> <li>…and unlike native binaries, WASM and WASI don’t add any more security concerns over regular VSCode extensions written in TS/JS</li> </ul> <p>Also, how cool is it that I can take an assembler written in C89 in the 90’s and safely run that without code changes in the VSCode web version?</p> <p>(I <strong>did</strong> actually consider writing my own assembler in TypeScript a long time ago just for the purpose of running it in VSCode but quickly abandoned that idea, here are the ruins of that folly: <a href="https://github.com/floooh/hcasm">https://github.com/floooh/hcasm</a>)</p> <h2 id="paths-not-taken">Paths not taken</h2> <p>I considered various approaches:</p> <ol> <li>a native IDE via Qt similar to Goran Devic’s <a href="https://baltazarstudios.com/z80explorer/">Z80 Explorer</a></li> <li>integrate the IDE features right into the emulator via <a href="https://github.com/ocornut/imgui">Dear ImGui</a> (the emulators already have an extensive Dear ImGui debugging UI)</li> <li>create a VSCode extension which calls into an assembler and emulator written in TypeScript</li> <li>create a VSCode extension which calls into native assembler and emulator binaries</li> <li>create a VSCode extension which uses WASM for 
the assembler and emulator</li> </ol> <p>The final decision to use VSCode with WASM comes down to a couple of central problems:</p> <ul> <li>dealing with native tools in a cross-platform scenario is a massive PITA these days: <ul> <li>running the same binary across different Linux distros is still pretty much an unsolved problem</li> <li>on Windows and macOS you’ll get all sorts of scare popups when trying to run an executable downloaded from the internet</li> </ul> </li> <li>porting a code base to TS/JS just so that it can be hooked up into a VSCode extension is almost always a massive waste of time</li> </ul> <p>In the end it was a decision between (2: extend the existing Dear ImGui emulator UI with IDE features), and (5: figure out how to integrate the assembler and emulator as WASM blobs into a VSCode extension).</p> <p>While I enjoy writing Dear ImGui UIs immensely, a robust text editing experience which can rival a dedicated text editor like VSCode would be a massive project on its own.</p> <p>…which leaves (5) as the one option which enables the most robust result for the least amount of work (important, since this is a ‘vacation side project’ which shouldn’t increase my spare time software maintenance burden even more).</p> <p>All in all the extension was finished in about 3 weeks of focused work (spread over 6 real-world weeks, with 2 weeks spent dog-fooding on a little <a href="https://floooh.github.io/kcide-sample/kc854.html?file=demo.kcc">KC85/4 assembly demo</a>).</p> <p>Of the 3 weeks working on the VSCode extension, about 2 weeks were spent on the Debug Adapter alone (a lot more effort than I initially expected).</p> <h2 id="the-boring-parts">The boring parts</h2> <p>I’ll run very quickly over the parts of the extension that are not all that interesting (since all of that is just reading the <a href="https://code.visualstudio.com/api">VSCode extension documentation</a> about what features can be provided by extensions and how to implement them).</p> 
<p>The KC IDE extension implements:</p> <ul> <li>a handful of <strong>Commands</strong> which can be invoked via the <code class="language-plaintext highlighter-rouge">Ctrl-Shift-P</code> command palette: <ul> <li><code class="language-plaintext highlighter-rouge">KCIDE: Build</code>: assembles the source code into a binary file compatible with the current emulator</li> <li><code class="language-plaintext highlighter-rouge">KCIDE: Debug</code>: builds the source and starts a debugging session</li> <li><code class="language-plaintext highlighter-rouge">KCIDE: Open Emulator</code>: (re-)opens the emulator tab</li> <li><code class="language-plaintext highlighter-rouge">KCIDE: Reboot Emulator</code>: cold-boots the emulator and stops the active debug session</li> <li><code class="language-plaintext highlighter-rouge">KCIDE: Reset Emulator</code>: resets the emulator and stops the active debug session (on some home computers, a reset preserves the memory content)</li> </ul> </li> <li>two <strong>Key Bindings</strong>: <code class="language-plaintext highlighter-rouge">F5</code> to start a debug session and <code class="language-plaintext highlighter-rouge">F7</code> to build the project source code into a binary file</li> <li>a <strong>JSON Schema</strong> for a <code class="language-plaintext highlighter-rouge">kcide.project.json</code> file which defines the target computer system, assembly dialect, file paths and output binary file format loadable by the emulator</li> <li>a <strong>Language Grammar</strong> for regex-based syntax highlighting (Z80 and 6502 assembly statements, plus ASMX-specific keywords)</li> <li>a <strong>Debug Adapter</strong> to connect the VSCode debugging UI with the (already existing) debugger that’s integrated into the emulator</li> </ul> <p>Some notable VSCode extension features which are <strong>not</strong> implemented:</p> <ul> <li>No <strong>Language Server</strong> (to provide error squiggles and code completion while typing), the LSP protocol is a 
bit of overkill for low level languages like assembly, while it would have been a ‘nice to have’ feature, it wasn’t doable in the available time, and features similar to a full LSP can most likely also be implemented without a full LSP implementation (VSCode has a couple of other language features like <a href="https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide">semantic highlighting</a>, <a href="https://code.visualstudio.com/api/language-extensions/snippet-guide">snippets</a> or <a href="https://code.visualstudio.com/api/language-extensions/programmatic-language-features">programmatic language features</a>). In the end I simply ran out of time, maybe in the next round of updates…</li> <li>No <strong>Task Providers</strong> (e.g. proper integration with <code class="language-plaintext highlighter-rouge">tasks.json</code> and <code class="language-plaintext highlighter-rouge">launch.json</code>). This also seemed like overkill. Just adding two key bindings while the extension is active (<code class="language-plaintext highlighter-rouge">F5</code> for debugging and <code class="language-plaintext highlighter-rouge">F7</code> for building) achieves the same thing with less hassle for the user.</li> </ul> <p>Finally, a VSCode extension may run in 3 environments, which has some subtle consequences for what APIs can be used in the extension code:</p> <ul> <li><strong>desktop</strong>: the extension only works in ‘desktop VSCode’ and can use the full set of node.js APIs</li> <li><strong>web</strong>: the extension works in ‘VSCode for the web’, which means only the VSCode extension API and browser APIs can be called</li> <li><strong>universal</strong>: the extension can run both in desktop and web VSCode</li> </ul> <p>The KC IDE is a universal extension, but still has some issues when running in the web version of VSCode (which comes down to a mix of VSCode issues and some file-IO related issues I will most likely need to fix on my side).</p> 
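<p>To illustrate what the ‘universal’ constraint means in practice: file IO can’t go through the node.js <code class="language-plaintext highlighter-rouge">fs</code> module, but has to use APIs that exist in both environments, such as <code class="language-plaintext highlighter-rouge">workspace.fs.readFile()</code> (which yields a <code class="language-plaintext highlighter-rouge">Uint8Array</code>) plus <code class="language-plaintext highlighter-rouge">TextDecoder</code>. The following is only a minimal sketch and not the actual KC IDE code: the <code class="language-plaintext highlighter-rouge">FileSystemLike</code> interface and the in-memory <code class="language-plaintext highlighter-rouge">memFs</code> stub are hypothetical names, introduced so that the helper can run outside of VSCode.</p>

```typescript
// Hypothetical minimal stand-in for the part of `vscode.workspace.fs` used here.
// In a real extension one would pass `workspace.fs` and a `vscode.Uri` instead.
interface FileSystemLike<U> {
    readFile(uri: U): PromiseLike<Uint8Array>;
}

// Read a file as UTF-8 text without touching node.js-only APIs such as `fs`.
// TextDecoder is available both in node.js and in browsers.
async function readTextFile<U>(fs: FileSystemLike<U>, uri: U): Promise<string> {
    const bytes = await fs.readFile(uri);
    return new TextDecoder('utf-8').decode(bytes);
}

// In-memory stub (an assumption for illustration only), keyed by plain strings.
const memFs: FileSystemLike<string> = {
    readFile: async (uri) => new TextEncoder().encode(`; source of ${uri}`),
};

readTextFile(memFs, 'demo.asm').then((text) => console.log(text));
```

<p>Because the helper only depends on the shape of <code class="language-plaintext highlighter-rouge">readFile()</code>, the same code path should work in desktop and web VSCode, and it can be exercised in isolation with a stub like the one above.</p>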
<h2 id="integrating-the-assembler-via-wasi">Integrating the assembler via WASI</h2> <p>This turned out a lot easier than expected, because the <a href="https://code.visualstudio.com/blogs/2023/06/05/vscode-wasm-wasi">VSCode WASI extension</a> does all the hard work.</p> <p>What this extension basically does is to allow any POSIX commandline tool to run inside VSCode without requiring changes to the source (most notably, no changes are required for blocking file IO code via fopen/fread/fwrite/fclose).</p> <p>The only thing I had to fix in the ASMX assembler was a separately provided root path for the assembler’s <code class="language-plaintext highlighter-rouge">include</code> statement (which is supposed to work with relative paths). WASI currently doesn’t have the concept of a ‘current working directory’, so all filesystem paths must be resolved to absolute paths within the WASI container’s virtual filesystem (a WASI environment doesn’t use direct filesystem paths of the host system, but instead defines its own virtual filesystem with mount points mapped to host system directories).</p> <p>The basic procedure to get the assembler working inside VSCode is:</p> <ul> <li>compile the assembler to a WASI blob using the <a href="https://github.com/WebAssembly/wasi-sdk">WASI SDK Clang toolchain</a>, this happens manually outside the extension project, the resulting .wasm blob is then simply committed into the extension’s git repo and bundled with the published extension. 
The size of the WASM blob is about 200 KBytes.</li> <li> <p>in the VSCode extension code: initialize the WASI runtime, set up a virtual filesystem, and load and compile the assembler WASM blob (this happens only once during the extension’s life cycle):</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">export</span> <span class="k">async</span> <span class="kd">function</span> <span class="nx">requireWasiEnv</span><span class="p">(</span><span class="nx">ext</span><span class="p">:</span> <span class="nx">ExtensionContext</span><span class="p">):</span> <span class="nb">Promise</span><span class="o">&lt;</span><span class="nx">WasiEnv</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="nx">wasiEnv</span> <span class="o">===</span> <span class="kc">null</span><span class="p">)</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">wasm</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">Wasm</span><span class="p">.</span><span class="nx">load</span><span class="p">();</span> <span class="kd">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">wasm</span><span class="p">.</span><span class="nx">createRootFileSystem</span><span class="p">([</span> <span class="p">{</span> <span class="na">kind</span><span class="p">:</span> <span class="dl">'</span><span class="s1">workspaceFolder</span><span class="dl">'</span> <span class="p">}</span> <span class="p">]);</span> <span class="kd">const</span> <span class="nx">bits</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">workspace</span><span class="p">.</span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFile</span><span class="p">(</span><span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span 
class="p">(</span><span class="nx">ext</span><span class="p">.</span><span class="nx">extensionUri</span><span class="p">,</span> <span class="dl">'</span><span class="s1">media/asmx.wasm</span><span class="dl">'</span><span class="p">));</span> <span class="kd">const</span> <span class="nx">asmx</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">WebAssembly</span><span class="p">.</span><span class="nx">compile</span><span class="p">(</span><span class="nx">bits</span><span class="p">);</span> <span class="nx">wasiEnv</span> <span class="o">=</span> <span class="p">{</span> <span class="nx">wasm</span><span class="p">,</span> <span class="nx">fs</span><span class="p">,</span> <span class="nx">asmx</span> <span class="p">};</span> <span class="p">}</span> <span class="k">return</span> <span class="nx">wasiEnv</span><span class="p">;</span> <span class="p">}</span> </code></pre></div> </div> </li> <li> <p>run the assembler WASM blob, capture stdout and stderr and check the exit code, this is quite similar to how a native tool would be launched:</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">export</span> <span class="k">async</span> <span class="kd">function</span> <span class="nx">runAsmx</span><span class="p">(</span><span class="nx">ext</span><span class="p">:</span> <span class="nx">ExtensionContext</span><span class="p">,</span> <span class="nx">args</span><span class="p">:</span> <span class="kr">string</span><span class="p">[]):</span> <span class="nb">Promise</span><span class="o">&lt;</span><span class="nx">RunAsmxResult</span><span class="o">&gt;</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">wasiEnv</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">requireWasiEnv</span><span class="p">(</span><span class="nx">ext</span><span class="p">);</span> <span class="kd">const</span> <span 
class="nx">process</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">wasiEnv</span><span class="p">.</span><span class="nx">wasm</span><span class="p">.</span><span class="nx">createProcess</span><span class="p">(</span><span class="dl">'</span><span class="s1">asmx</span><span class="dl">'</span><span class="p">,</span> <span class="nx">wasiEnv</span><span class="p">.</span><span class="nx">asmx</span><span class="p">,</span> <span class="p">{</span> <span class="na">rootFileSystem</span><span class="p">:</span> <span class="nx">wasiEnv</span><span class="p">.</span><span class="nx">fs</span><span class="p">,</span> <span class="na">stdio</span><span class="p">:</span> <span class="p">{</span> <span class="na">out</span><span class="p">:</span> <span class="p">{</span> <span class="na">kind</span><span class="p">:</span> <span class="dl">'</span><span class="s1">pipeOut</span><span class="dl">'</span> <span class="p">},</span> <span class="na">err</span><span class="p">:</span> <span class="p">{</span> <span class="na">kind</span><span class="p">:</span> <span class="dl">'</span><span class="s1">pipeOut</span><span class="dl">'</span> <span class="p">},</span> <span class="p">},</span> <span class="nx">args</span><span class="p">,</span> <span class="p">});</span> <span class="kd">const</span> <span class="nx">decoder</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TextDecoder</span><span class="p">(</span><span class="dl">'</span><span class="s1">utf-8</span><span class="dl">'</span><span class="p">);</span> <span class="kd">let</span> <span class="nx">stderr</span> <span class="o">=</span> <span class="dl">''</span><span class="p">;</span> <span class="kd">let</span> <span class="nx">stdout</span> <span class="o">=</span> <span class="dl">''</span><span class="p">;</span> <span class="nx">process</span><span class="p">.</span><span class="nx">stderr</span><span class="o">!</span><span 
class="p">.</span><span class="nx">onData</span><span class="p">((</span><span class="nx">data</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">stderr</span> <span class="o">+=</span> <span class="nx">decoder</span><span class="p">.</span><span class="nx">decode</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span> <span class="p">});</span> <span class="nx">process</span><span class="p">.</span><span class="nx">stdout</span><span class="o">!</span><span class="p">.</span><span class="nx">onData</span><span class="p">((</span><span class="nx">data</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">stdout</span> <span class="o">+=</span> <span class="nx">decoder</span><span class="p">.</span><span class="nx">decode</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span> <span class="p">});</span> <span class="kd">const</span> <span class="nx">exitCode</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">process</span><span class="p">.</span><span class="nx">run</span><span class="p">();</span> <span class="k">return</span> <span class="p">{</span> <span class="nx">exitCode</span><span class="p">,</span> <span class="nx">stdout</span><span class="p">,</span> <span class="nx">stderr</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div> </div> </li> <li>the KC IDE extension will then parse the assembler error messages in stderr and convert the error messages into VSCode Diagnostic objects, which then show up in the <code class="language-plaintext highlighter-rouge">Problems</code> panel and as error squiggles in the text editor</li> <li>the actual assembler output files are written directly into the host filesystem via the virtual filesystem mapping that was provided when initializing the WASI runtime</li> </ul> <h2 id="integrating-the-emulator">Integrating the 
emulator</h2> <p>The embedded home computer emulators are taken from the <a href="https://github.com/floooh/chips">chips project</a>, those are implemented in C/C++, use the <a href="https://github.com/floooh/sokol">sokol headers</a> for abstracting platform details and run both as natively compiled executables and <a href="https://floooh.github.io/tiny8bit/">in the browser</a> via WASM and WebGL, compiled with the Emscripten SDK.</p> <p>One emulator WASM blob is about 700..800 KBytes (most of that is the Dear ImGui debugging UI which costs about 450 KBytes).</p> <p>Currently the KC IDE extension contains 4 emulators (KC85/3, KC85/4, C64 and CPC) which adds up to about 3 MBytes (if there are drastically more supported systems in the future I’ll need to come up with a solution to reduce the size of the embedded emulators, either downloading them on demand, merging them into a single ‘multi-system-emulator’ binary, or maybe moving the UI into a shared WASM module that’s loaded like a DLL).</p> <p>The emulator is running inside a VSCode <a href="https://code.visualstudio.com/api/extension-guides/webview">webview panel</a>. 
For an Emscripten WebGL application this is mostly straightforward: take an <a href="https://github.com/floooh/vscode-kcide/blob/b062aa56609fafeffc70ef0ac440c6ee1d70fe5b/media/shell.html">index.html like this</a> (note the placeholders <code class="language-plaintext highlighter-rouge">{{{shell}}}</code> and <code class="language-plaintext highlighter-rouge">{{{emu}}}</code>, those must be replaced with runtime-generated URLs), and set up a <a href="https://github.com/floooh/vscode-kcide/blob/b062aa56609fafeffc70ef0ac440c6ee1d70fe5b/src/emu.ts#L22-L77">webview panel object like this</a>.</p> <p>There’s a couple of interesting details in that code:</p> <p>The webview panel cannot simply load resources from anywhere in the host file system, instead a <code class="language-plaintext highlighter-rouge">localResourceRoots</code> entry must be provided in the <code class="language-plaintext highlighter-rouge">window.createWebviewPanel()</code> call which points to the extension subdirectory <code class="language-plaintext highlighter-rouge">media/</code> (e.g. 
anything that’s loaded in the webview panel needs to be located in that <code class="language-plaintext highlighter-rouge">media/</code> subdirectory):</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kd">const</span> <span class="nx">rootUri</span> <span class="o">=</span> <span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">getExtensionUri</span><span class="p">(),</span> <span class="dl">'</span><span class="s1">media</span><span class="dl">'</span><span class="p">);</span> <span class="kd">const</span> <span class="nx">panel</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">createWebviewPanel</span><span class="p">(</span> <span class="c1">// ...</span> <span class="p">{</span> <span class="na">localResourceRoots</span><span class="p">:</span> <span class="p">[</span> <span class="nx">rootUri</span> <span class="p">],</span> <span class="p">}</span> <span class="p">);</span> </code></pre></div></div> <p>…next, all URLs referenced in the webview panel’s HTML content must be generated via the webview panel API, I’m doing that by loading an HTML template file and then replacing the placeholders inside <code class="language-plaintext highlighter-rouge">{{{...}}}</code> with generated URLs (and while at it, I also select the correct emulator to load):</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kd">let</span> <span class="nx">emuFilename</span><span class="p">;</span> <span class="k">switch</span> <span class="p">(</span><span class="nx">project</span><span class="p">.</span><span class="nx">emulator</span><span class="p">.</span><span class="nx">system</span><span class="p">)</span> <span class="p">{</span> <span class="k">case</span> <span class="nx">System</span><span class="p">.</span><span 
class="nx">KC853</span><span class="p">:</span> <span class="nx">emuFilename</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">kc853-ui.js</span><span class="dl">'</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span> <span class="k">case</span> <span class="nx">System</span><span class="p">.</span><span class="nx">C64</span><span class="p">:</span> <span class="nx">emuFilename</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">c64-ui.js</span><span class="dl">'</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span> <span class="k">case</span> <span class="nx">System</span><span class="p">.</span><span class="nx">CPC6128</span><span class="p">:</span> <span class="nx">emuFilename</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">cpc-ui.js</span><span class="dl">'</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span> <span class="nl">default</span><span class="p">:</span> <span class="nx">emuFilename</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">kc854-ui.js</span><span class="dl">'</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span> <span class="kd">const</span> <span class="nx">emuUri</span> <span class="o">=</span> <span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">asWebviewUri</span><span class="p">(</span><span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">rootUri</span><span class="p">,</span> <span class="nx">emuFilename</span><span class="p">));</span> <span class="kd">const</span> <span class="nx">shellUri</span> <span class="o">=</span> <span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span 
class="nx">asWebviewUri</span><span class="p">(</span><span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">rootUri</span><span class="p">,</span> <span class="dl">'</span><span class="s1">shell.js</span><span class="dl">'</span><span class="p">));</span> <span class="kd">const</span> <span class="nx">templ</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">readTextFile</span><span class="p">(</span><span class="nx">Uri</span><span class="p">.</span><span class="nx">joinPath</span><span class="p">(</span><span class="nx">rootUri</span><span class="p">,</span> <span class="dl">'</span><span class="s1">shell.html</span><span class="dl">'</span><span class="p">));</span> <span class="kd">const</span> <span class="nx">html</span> <span class="o">=</span> <span class="nx">templ</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="dl">'</span><span class="s1">{{{emu}}}</span><span class="dl">'</span><span class="p">,</span> <span class="nx">emuUri</span><span class="p">.</span><span class="nx">toString</span><span class="p">()).</span><span class="nx">replace</span><span class="p">(</span><span class="dl">'</span><span class="s1">{{{shell}}}</span><span class="dl">'</span><span class="p">,</span> <span class="nx">shellUri</span><span class="p">.</span><span class="nx">toString</span><span class="p">());</span> <span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">html</span> <span class="o">=</span> <span class="nx">html</span><span class="p">;</span> </code></pre></div></div> <p>Communication between VSCode and the WebView panel content works via bi-directional message passing, this means the VSCode extension needs to register a listener function which dispatches received messages to their handler functions:</p> <div class="language-typescript 
highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">onDidReceiveMessage</span><span class="p">((</span><span class="nx">msg</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">command</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">emu_cpustate</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span> <span class="nx">cpuStateResolved</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">state</span> <span class="k">as</span> <span class="nx">CPUState</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">command</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">emu_disassembly</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span> <span class="nx">disassemblyResolved</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">result</span> <span class="k">as</span> <span class="nx">DisasmLine</span><span class="p">[]);</span> <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">command</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">emu_memory</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span> <span class="nx">readMemoryResolved</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">result</span> <span class="k">as</span> <span 
class="nx">ReadMemoryResult</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">command</span> <span class="o">===</span> <span class="dl">'</span><span class="s1">emu_ready</span><span class="dl">'</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="nx">state</span><span class="p">)</span> <span class="p">{</span> <span class="nx">state</span><span class="p">.</span><span class="nx">ready</span> <span class="o">=</span> <span class="nx">msg</span><span class="p">.</span><span class="nx">isReady</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="nx">KCIDEDebugSession</span><span class="p">.</span><span class="nx">onEmulatorMessage</span><span class="p">(</span><span class="nx">msg</span><span class="p">);</span> <span class="p">}</span> <span class="p">});</span> </code></pre></div></div> <p>…sending a message into the opposite direction (from the debug session to the webview panel) simply looks like this:</p> <div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">await</span> <span class="nx">state</span><span class="p">.</span><span class="nx">panel</span><span class="p">.</span><span class="nx">webview</span><span class="p">.</span><span class="nx">postMessage</span><span class="p">({</span> <span class="na">cmd</span><span class="p">:</span> <span class="dl">'</span><span class="s1">boot</span><span class="dl">'</span> <span class="p">});</span> </code></pre></div></div> <p>…the message structure is entirely custom (and I’m just noticing that I’m using <code class="language-plaintext highlighter-rouge">command</code> in one direction, but <code class="language-plaintext 
highlighter-rouge">cmd</code> in the other direction… but anyway…).</p> <p>There is one missing step in the communication between VSCode debug session on one side, and the emulator on the other. There’s a <a href="https://github.com/floooh/vscode-kcide/blob/main/media/shell.js">Javascript shim</a> running in the context of the webpage which translates between the JSON-like message objects which are sent and received by the VSCode debug session, and a lower level WASM function call interface implemented by the emulator.</p> <p>When a message is received from the VSCode debug session in the emulator’s HTML page, it’s dispatched to a Javascript function via an event listener added to the <code class="language-plaintext highlighter-rouge">window</code> object (note that this code is plain Javascript, not Typescript):</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nb">window</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">message</span><span class="dl">'</span><span class="p">,</span> <span class="nx">ev</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">msg</span> <span class="o">=</span> <span class="nx">ev</span><span class="p">.</span><span class="nx">data</span><span class="p">;</span> <span class="k">switch</span> <span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">cmd</span><span class="p">)</span> <span class="p">{</span> <span class="k">case</span> <span class="dl">'</span><span class="s1">boot</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_boot</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span> <span class="k">case</span> <span class="dl">'</span><span class="s1">reset</span><span class="dl">'</span><span class="p">:</span> <span 
class="nx">kcide_reset</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span> <span class="k">case</span> <span class="dl">'</span><span class="s1">ready</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_ready</span><span class="p">();</span> <span class="k">break</span><span class="p">;</span> <span class="k">case</span> <span class="dl">'</span><span class="s1">load</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_load</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">data</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span> <span class="c1">// ...</span> <span class="k">case</span> <span class="dl">'</span><span class="s1">disassemble</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_dbgDisassemble</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">addr</span><span class="p">,</span> <span class="nx">msg</span><span class="p">.</span><span class="nx">offsetLines</span><span class="p">,</span> <span class="nx">msg</span><span class="p">.</span><span class="nx">numLines</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span> <span class="k">case</span> <span class="dl">'</span><span class="s1">readMemory</span><span class="dl">'</span><span class="p">:</span> <span class="nx">kcide_dbgReadMemory</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">addr</span><span class="p">,</span> <span class="nx">msg</span><span class="p">.</span><span class="nx">numBytes</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span> <span class="nl">default</span><span class="p">:</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`unknown cmd called: </span><span 
class="p">${</span><span class="nx">msg</span><span class="p">.</span><span class="nx">cmd</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span> <span class="p">});</span> </code></pre></div></div> <p>Such a handler function looks like this:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">kcide_boot</span><span class="p">()</span> <span class="p">{</span> <span class="nx">Module</span><span class="p">.</span><span class="nx">_webapi_boot</span><span class="p">();</span> <span class="p">}</span> </code></pre></div></div> <p>This is an ‘Emscripten-ism’. The easiest way to export a C function from WASM to Javascript is via the <code class="language-plaintext highlighter-rouge">EMSCRIPTEN_KEEPALIVE</code> attribute in the C source, like this:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">EMSCRIPTEN_KEEPALIVE</span> <span class="kt">void</span> <span class="nf">webapi_boot</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">inited</span> <span class="o">&amp;&amp;</span> <span class="n">state</span><span class="p">.</span><span class="n">funcs</span><span class="p">.</span><span class="n">boot</span><span class="p">)</span> <span class="p">{</span> <span class="n">state</span><span class="p">.</span><span class="n">funcs</span><span class="p">.</span><span class="n">boot</span><span class="p">();</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>When Emscripten builds the project, it keeps track of all <code class="language-plaintext highlighter-rouge">EMSCRIPTEN_KEEPALIVE</code> C 
functions and makes them available as Javascript functions on a global <code class="language-plaintext highlighter-rouge">Module</code> object created by the Emscripten entry stub. Calling such an <code class="language-plaintext highlighter-rouge">EMSCRIPTEN_KEEPALIVE</code> C function from the Javascript side then looks like this:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nx">Module</span><span class="p">.</span><span class="nx">_webapi_boot</span><span class="p">();</span> </code></pre></div></div> <p>…and that’s essentially how the communication between VSCode and the WASM emulator works. For instance, when the VSCode palette command <code class="language-plaintext highlighter-rouge">KCIDE: Reboot Emulator</code> is executed, eventually the C function <code class="language-plaintext highlighter-rouge">webapi_boot()</code> in the WASM emulator will be called, which reboots the emulator.</p> <p>Currently the emulators implement the following ‘web API’ functions callable from Javascript:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">webapi_dbg_connect</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// a VSCode debug session has started</span> <span class="kt">void</span> <span class="nf">webapi_dbg_disconnect</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// a VSCode debug session has ended</span> <span class="kt">void</span><span class="o">*</span> <span class="nf">webapi_alloc</span><span class="p">(</span><span class="kt">int</span> <span class="n">size</span><span class="p">);</span> <span class="c1">// helper function to allocate on the WASM heap from Javascript</span> <span class="kt">void</span> <span class="nf">webapi_free</span><span class="p">(</span><span 
class="kt">void</span><span class="o">*</span><span class="p">);</span> <span class="c1">// helper function to free memory allocated via webapi_alloc()</span> <span class="kt">void</span> <span class="nf">webapi_boot</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// reboot the emulator (e.g. switch off and on)</span> <span class="kt">void</span> <span class="nf">webapi_reset</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// reset the emulator (e.g. press the reset button)</span> <span class="n">bool</span> <span class="nf">webapi_ready</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// returns true when the emulator is ready to start a debug session after rebooting</span> <span class="n">bool</span> <span class="nf">webapi_load</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">size</span><span class="p">);</span> <span class="c1">// load binary data into the emulator</span> <span class="kt">void</span> <span class="nf">webapi_dbg_add_breakpoint</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">);</span> <span class="c1">// add a debug breakpoint at a 16-bit address</span> <span class="kt">void</span> <span class="nf">webapi_dbg_remove_breakpoint</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">);</span> <span class="c1">// delete a debug breakpoint at a 16-bit address</span> <span class="kt">void</span> <span class="nf">webapi_dbg_break</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// break into the debugger</span> <span class="kt">void</span> <span class="nf">webapi_dbg_continue</span><span class="p">(</span><span 
class="kt">void</span><span class="p">);</span> <span class="c1">// continue execution when stopped in debugger</span> <span class="kt">void</span> <span class="nf">webapi_dbg_step_next</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// execute a 'step over' in the debugger</span> <span class="kt">void</span> <span class="nf">webapi_dbg_step_into</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// execute a 'step into' in the debugger</span> <span class="kt">uint16_t</span><span class="o">*</span> <span class="nf">webapi_dbg_cpu_state</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// request a raw 'CPU state' dump (current register values)</span> <span class="n">webapi_dasm_line_t</span><span class="o">*</span> <span class="nf">webapi_dbg_request_disassembly</span><span class="p">(</span><span class="cm">/*...*/</span><span class="p">);</span> <span class="c1">// request a disassembly dump over a range of addresses</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="nf">webapi_dbg_read_memory</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">,</span> <span class="kt">int</span> <span class="n">num_bytes</span><span class="p">);</span> <span class="c1">// request a memory dump over a range of addresses</span> </code></pre></div></div> <p>In the opposite direction (from the emulator to the VSCode debug session), the emulator calls into the following C callback functions, which in turn call into Javascript to create a JSON-like message object to send back into the VSCode debug session:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">webapi_event_stopped</span><span class="p">(</span><span class="kt">int</span> <span 
class="n">stop_reason</span><span class="p">,</span> <span class="kt">uint16_t</span> <span class="n">addr</span><span class="p">);</span> <span class="c1">// debugger has stopped at addr for a specific reason</span> <span class="kt">void</span> <span class="nf">webapi_event_continued</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// the debugger has continued execution</span> <span class="kt">void</span> <span class="nf">webapi_event_reboot</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// the emulator has been rebooted</span> <span class="kt">void</span> <span class="nf">webapi_event_reset</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// the emulator has been reset</span> </code></pre></div></div> <p>…in a nutshell, this is the minimal ‘virtual machine’ interface required to implement a somewhat feature-complete VSCode Debug Adapter.</p> <p>One downside of the <a href="https://microsoft.github.io/debug-adapter-protocol/">Debug Adapter Protocol</a> is that it is clearly designed for high-level languages, and the protocol feature set has little overlap with debugging features that are desired in an emulator virtual machine.</p> <p>But thankfully, the Debug Adapter Protocol is also flexible enough that it can work side by side with the much more powerful debugger that’s already integrated in the chips-emulators via Dear ImGui:</p> <p><img src="/images/vscode-wasm-wasi-3.webp" alt="Screenshot 3" /></p> <p>…for instance, the embedded Dear ImGui debugger can step the emulator forward in single clock cycles, while the VSCode debugger only steps at instruction or source line granularity.</p> <h2 id="known-issues-and-future-updates">Known Issues and future updates</h2> <p>There’s a couple of issues which are currently worked around or don’t work at all, and which I want to 
fix in future updates (most of those are only an issue in the VSCode web version, so not exactly show stoppers):</p> <ul> <li> <p>Hopefully the <a href="https://github.com/microsoft/vscode-wasm">VSCode WASI extension</a> will go out of pre-release-only mode sooner rather than later; at that point I can also move the KC IDE extension out of pre-release. The problem is that trying to install a VSCode extension which depends on a pre-release-only extension will fail to install the dependency with a cryptic error message. Worst case is that I need to implement my own VSCode WASI runtime, or figure out another way to run the assembler inside VSCode (maybe as a regular WASM blob which replaces the C stdlib IO calls with asynchronous functions with completion callbacks, delegated to Javascript)</p> </li> <li> <p>Currently, any binary-blob data that needs to be transferred from VSCode into the emulator needs to go through a base64-encoded string which is expensive to encode and decode. The reason for that hack is that transferring Uint8Array objects doesn’t work when VSCode is running on the web (it’s supposed to work, but the data gets corrupted).</p> </li> <li> <p>Working directly on GitHub repositories in the VSCode web version doesn’t work (weird virtual filesystem issues).</p> </li> <li> <p>…and of course some sort of Language-Server-like editing experience (proper code completion and error squiggles while typing), but without implementing a full-blown language server.</p> </li> </ul> Sun, 31 Dec 2023 00:00:00 +0000 https://floooh.github.io/2023/12/31/vscode-wasm-wasi.html WASM Debugging with Emscripten and VSCode <p><strong>TL;DR</strong>: glueing together VSCode, CMake and the Emscripten SDK to enable an IDE-like workflow (including debugging).</p> <p><strong>17-Nov-2024</strong>: looks like the problem that ‘early breakpoints’ are not caught is fixed, woohoo!</p> <p><strong>09-Oct-2024</strong>: updated for 
the latest sokol_gfx.h and VSCode extension versions.</p> <p>This is written from the perspective of a UNIX-like OS (macOS or Linux), but should also work on Windows with some minor tweaks.</p> <h2 id="prerequisites">Prerequisites</h2> <p>First make sure that the following tools are in the path:</p> <ul> <li>git</li> <li>cmake</li> <li>ninja</li> </ul> <p>You’ll also need VSCode and Chrome installed.</p> <p>On macOS I’d recommend using <a href="https://brew.sh/">Homebrew</a> and on Windows <a href="https://scoop.sh/">Scoop</a> to install those. On Linux of course, your system’s standard package manager.</p> <h2 id="emscripten-hello-world">Emscripten Hello World</h2> <p>Let’s start from scratch. On the command line:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir </span>hello <span class="nb">cd </span>hello git init </code></pre></div></div> <p>Add a <code class="language-plaintext highlighter-rouge">.gitignore</code> file:</p> <p><code class="language-plaintext highlighter-rouge">.gitignore</code></p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>build/ emsdk/ </code></pre></div></div> <p>Install the Emscripten SDK, we’ll do so in a way that it doesn’t leave a trace on your system when deleted so don’t worry. Still inside the <code class="language-plaintext highlighter-rouge">hello</code> directory:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone --depth=1 https://github.com/emscripten-core/emsdk cd emsdk ./emsdk install latest ./emsdk activate --embedded latest cd .. </code></pre></div></div> <p>Don’t forget the <code class="language-plaintext highlighter-rouge">./emsdk activate --embedded latest</code> step! (happens to me all the time)</p> <p>…let’s check if that worked. 
Create a <code class="language-plaintext highlighter-rouge">hello.c</code> source file in the <code class="language-plaintext highlighter-rouge">hello</code> project directory:</p> <p><code class="language-plaintext highlighter-rouge">hello.c</code></p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"> </span> <span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="n">printf</span><span class="p">(</span><span class="s">"Hello World!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>…compile that into a .wasm/.js pair runnable with node.js:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/upstream/emscripten/emcc hello.c <span class="nt">-o</span> hello.js </code></pre></div></div> <p>…there should be a hello.js and hello.wasm file now:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ls </span>emsdk hello.c hello.js hello.wasm </code></pre></div></div> <p>…run the hello.js file via node.js (depending on the emsdk version the path may differ):</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/node/18.20.3_64bit/bin/node hello.js </code></pre></div></div> <p>…you should see a <code class="language-plaintext highlighter-rouge">Hello World!</code> printed to the terminal.</p> <p>Delete the compiler output, we don’t need that anymore:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">rm </span>hello.js hello.wasm </code></pre></div></div> <h2 id="cmake--emscripten">CMake 
+ Emscripten</h2> <p>Let’s bake the build process into a cmake file. Create a CMakeLists.txt file in the <code class="language-plaintext highlighter-rouge">hello</code> project directory:</p> <p><code class="language-plaintext highlighter-rouge">CMakeLists.txt</code></p> <div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cmake_minimum_required</span><span class="p">(</span>VERSION 3.21<span class="p">)</span> <span class="nb">project</span><span class="p">(</span>hello<span class="p">)</span> <span class="nb">add_executable</span><span class="p">(</span>hello hello.c<span class="p">)</span> <span class="nb">if</span> <span class="p">(</span>CMAKE_SYSTEM_NAME STREQUAL Emscripten<span class="p">)</span> <span class="nb">set</span><span class="p">(</span>CMAKE_EXECUTABLE_SUFFIX .js<span class="p">)</span> <span class="nb">endif</span><span class="p">()</span> </code></pre></div></div> <p>…and since this is a cross-compilation scenario, let’s also create a CMakeUserPresets.json file. 
This simplifies calling cmake with the right arguments for cross-compilation, and will help us later when integrating with VSCode:</p> <p><code class="language-plaintext highlighter-rouge">CMakeUserPresets.json</code></p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nl">"cmakeMinimumRequired"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"major"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nl">"minor"</span><span class="p">:</span><span class="w"> </span><span class="mi">21</span><span class="p">,</span><span class="w"> </span><span class="nl">"patch"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nl">"configurePresets"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"default"</span><span class="p">,</span><span class="w"> </span><span class="nl">"displayName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Emscripten"</span><span class="p">,</span><span class="w"> </span><span class="nl">"binaryDir"</span><span class="p">:</span><span class="w"> </span><span class="s2">"build"</span><span class="p">,</span><span class="w"> </span><span class="nl">"generator"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Ninja Multi-Config"</span><span class="p">,</span><span class="w"> 
</span><span class="nl">"toolchainFile"</span><span class="p">:</span><span class="w"> </span><span class="s2">"emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">],</span><span class="w"> </span><span class="nl">"buildPresets"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Debug"</span><span class="p">,</span><span class="w"> </span><span class="nl">"configurePreset"</span><span class="p">:</span><span class="w"> </span><span class="s2">"default"</span><span class="p">,</span><span class="w"> </span><span class="nl">"configuration"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Debug"</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Release"</span><span class="p">,</span><span class="w"> </span><span class="nl">"configurePreset"</span><span class="p">:</span><span class="w"> </span><span class="s2">"default"</span><span class="p">,</span><span class="w"> </span><span class="nl">"configuration"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Release"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <p>…let’s configure and build with cmake:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmake <span class="nt">--preset</span> default <span class="nt">-B</span> build cmake <span class="nt">--build</span> build <span class="nt">--preset</span> 
Debug </code></pre></div></div> <p>…and run with node.js:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/node/18.20.3_64bit/bin/node build/Debug/hello.js </code></pre></div></div> <p>…this should again print <code class="language-plaintext highlighter-rouge">Hello World!</code>.</p> <h2 id="vscode--cmake--emscripten">VSCode + CMake + Emscripten</h2> <p>Let’s integrate what we have so far with VSCode!</p> <p>You’ll need the following VSCode extensions:</p> <ul> <li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.cpptools">ms-vscode.cpptools</a></li> <li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.cmake-tools">ms-vscode.cmake-tools</a></li> <li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.wasm-dwarf-debugging">ms-vscode.wasm-dwarf-debugging</a></li> <li><a href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.live-server">ms-vscode.live-server</a></li> </ul> <p>…with those installed, start VSCode from within the <code class="language-plaintext highlighter-rouge">hello</code> project directory:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>code . 
</code></pre></div></div> <p>You should see something like this. Pay attention to the status bar at the bottom (underlined in red); these items control the cmake build config and target:</p> <p>(<strong>NOTE 09-Oct-2024</strong>: the underlined items in the bottom bar have moved into the CMake Tools sidepanel in recent versions).</p> <p><img src="/images/emscripten-ide-1.png" alt="VSCode Screenshot 1" /></p> <p>Clicking those allows you to select a Configure- and Build-Preset, and a build target.</p> <p>Change them so that it looks like this:</p> <p><img src="/images/emscripten-ide-2.png" alt="VSCode Screenshot 2" /></p> <p>Here we also encounter the first wart: the CMake Tools extension isn’t able to communicate the correct Emscripten sysroot include paths over to the C/C++ extension. You’ll see an error squiggle under the stdio.h include path:</p> <p><img src="/images/emscripten-ide-3.png" alt="VSCode Screenshot 3" /></p> <p>I haven’t found a solution to this problem; it looks like a bug in the CMake Tools extension. Annoying for sure, but not a showstopper: only IntelliSense is affected, building should work fine.</p> <p>You can test that by pressing <code class="language-plaintext highlighter-rouge">F7</code>, or by running the palette command <code class="language-plaintext highlighter-rouge">CMake: Build</code>. 
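</p> <p>For reference, roughly the same build can also be started from a terminal in the project directory. This is only a sketch modelled on the command that CMake Tools logs when building; it assumes the relative <code class="language-plaintext highlighter-rouge">build</code> directory from the preset and the <code class="language-plaintext highlighter-rouge">hello</code> target, and it isn’t needed for the VSCode workflow:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># build the Debug config of the 'hello' target in the existing build directory
cmake --build build --config Debug --target hello
</code></pre></div></div> <p>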
You should see something like this in the VSCode Output panel:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[main] Building folder: hello [build] Starting build [proc] Executing command: /opt/homebrew/bin/cmake --build /Users/floh/scratch/hello/build --config Debug --target hello [build] [1/2] Building C object CMakeFiles/hello.dir/Debug/hello.c.o [build] [2/2] Linking C executable Debug/hello.js [driver] Build completed: 00:00:00.361 [build] Build finished with exit code </code></pre></div></div> <h2 id="debugging">Debugging</h2> <p>…next let’s make debugging work!</p> <p>Create a <code class="language-plaintext highlighter-rouge">launch.json</code> file in the <code class="language-plaintext highlighter-rouge">.vscode</code> subdirectory:</p> <p><code class="language-plaintext highlighter-rouge">.vscode/launch.json</code></p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0.2.0"</span><span class="p">,</span><span class="w"> </span><span class="nl">"configurations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Launch"</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"node"</span><span class="p">,</span><span class="w"> </span><span class="nl">"request"</span><span class="p">:</span><span class="w"> </span><span class="s2">"launch"</span><span class="p">,</span><span class="w"> </span><span class="nl">"program"</span><span class="p">:</span><span class="w"> </span><span 
class="s2">"build/Debug/${command:cmake.launchTargetFilename}"</span><span class="p">,</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <p>Pressing <code class="language-plaintext highlighter-rouge">F5</code> should now work. You should see a <code class="language-plaintext highlighter-rouge">Hello World!</code> in the VSCode <code class="language-plaintext highlighter-rouge">Debug Panel</code>.</p> <p>But when trying to debug there’s the next wart. Try to set a breakpoint in the C source code:</p> <p><img src="/images/emscripten-ide-4.png" alt="VSCode Screenshot 4" /></p> <p>Now hit <code class="language-plaintext highlighter-rouge">F5</code>. We’d expect that the execution stops at the breakpoint, but that doesn’t happen.</p> <p>This is a known issue in the DWARF debugging extension. From the <a href="https://code.visualstudio.com/docs/nodejs/nodejs-debugging#_debugging-webassembly">documentation</a>:</p> <blockquote> <p>Breakpoints in WebAssembly code are resolved asynchronously, so breakpoints hit early on in a program’s lifecycle may be missed. There are plans to fix this in the future. If you’re debugging in a browser, you can refresh the page for your breakpoint to be hit. 
If you’re in Node.js, you can add an artificial delay, or set another breakpoint, after your WebAssembly module is loaded but before your desired breakpoint is hit.</p> </blockquote> <p>Hopefully this problem will be fixed soon-ish, since it’s currently the most annoying one.</p> <p>One workaround is to first set a breakpoint in the JavaScript launch file at a point where the WASM blob has been loaded.</p> <p>Load the file <code class="language-plaintext highlighter-rouge">build/Debug/hello.js</code> into the editor, search for the function <code class="language-plaintext highlighter-rouge">callMain</code>, and set a breakpoint there:</p> <p><img src="/images/emscripten-ide-5.png" alt="VSCode Screenshot 5" /></p> <p>Press <code class="language-plaintext highlighter-rouge">F5</code> and execution should stop at that breakpoint. Now press <code class="language-plaintext highlighter-rouge">F5</code> again and execution should stop in the C code’s main() function (assuming that breakpoint is still set):</p> <p><img src="/images/emscripten-ide-6.png" alt="VSCode Screenshot 6" /></p> <p>Yay. 
This is how debugging works for a Node.js Emscripten application.</p> <h3 id="moving-into-the-web-browser">Moving into the web browser</h3> <p>Let’s extend our <code class="language-plaintext highlighter-rouge">hello.c</code> to render something in WebGL2.</p> <p>Clone the sokol headers into the <code class="language-plaintext highlighter-rouge">hello</code> project directory and copy some headers up into the project:</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone <span class="nt">--depth</span><span class="o">=</span>1 https://github.com/floooh/sokol <span class="nb">cp </span>sokol/sokol_gfx.h sokol/sokol_app.h sokol/sokol_log.h sokol/sokol_glue.h <span class="nb">.</span> </code></pre></div></div> <p>…delete the sokol directory since we don’t need it anymore:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rm -rf sokol </code></pre></div></div> <p>Replace the <code class="language-plaintext highlighter-rouge">hello.c</code> file with the following code which just clears the canvas with a dynamically changing color:</p> <p><code class="language-plaintext highlighter-rouge">hello.c</code></p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define SOKOL_IMPL #define SOKOL_GLES3 #include</span> <span class="cpf">"sokol_gfx.h"</span><span class="cp"> #include</span> <span class="cpf">"sokol_app.h"</span><span class="cp"> #include</span> <span class="cpf">"sokol_log.h"</span><span class="cp"> #include</span> <span class="cpf">"sokol_glue.h"</span><span class="cp"> </span> <span class="k">static</span> <span class="n">sg_pass_action</span> <span class="n">pass_action</span><span class="p">;</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span 
class="n">sg_setup</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">environment</span> <span class="o">=</span> <span class="n">sglue_environment</span><span class="p">(),</span> <span class="p">.</span><span class="n">logger</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">slog_func</span><span class="p">,</span> <span class="p">});</span> <span class="n">pass_action</span> <span class="o">=</span> <span class="p">(</span><span class="n">sg_pass_action</span><span class="p">)</span> <span class="p">{</span> <span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">load_action</span> <span class="o">=</span> <span class="n">SG_LOADACTION_CLEAR</span><span class="p">,</span> <span class="p">.</span><span class="n">clear_value</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span> <span class="p">}</span> <span class="p">}</span> <span class="p">};</span> <span class="p">}</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">frame</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="kt">float</span> <span class="n">g</span> <span class="o">=</span> <span 
class="n">pass_action</span><span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">clear_value</span><span class="p">.</span><span class="n">g</span> <span class="o">+</span> <span class="mi">0</span><span class="p">.</span><span class="mo">01</span><span class="n">f</span><span class="p">;</span> <span class="n">pass_action</span><span class="p">.</span><span class="n">colors</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">clear_value</span><span class="p">.</span><span class="n">g</span> <span class="o">=</span> <span class="p">(</span><span class="n">g</span> <span class="o">&gt;</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span><span class="p">)</span> <span class="o">?</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="n">f</span> <span class="o">:</span> <span class="n">g</span><span class="p">;</span> <span class="n">sg_begin_pass</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">sg_pass</span><span class="p">){</span> <span class="p">.</span><span class="n">action</span> <span class="o">=</span> <span class="n">pass_action</span><span class="p">,</span> <span class="p">.</span><span class="n">swapchain</span> <span class="o">=</span> <span class="n">sglue_swapchain</span><span class="p">()</span> <span class="p">});</span> <span class="n">sg_end_pass</span><span class="p">();</span> <span class="n">sg_commit</span><span class="p">();</span> <span class="p">}</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">cleanup</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="n">sg_shutdown</span><span class="p">();</span> <span class="p">}</span> <span class="n">sapp_desc</span> <span 
class="nf">sokol_main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span> <span class="p">{</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">argc</span><span class="p">;</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">argv</span><span class="p">;</span> <span class="k">return</span> <span class="p">(</span><span class="n">sapp_desc</span><span class="p">){</span> <span class="p">.</span><span class="n">init_cb</span> <span class="o">=</span> <span class="n">init</span><span class="p">,</span> <span class="p">.</span><span class="n">frame_cb</span> <span class="o">=</span> <span class="n">frame</span><span class="p">,</span> <span class="p">.</span><span class="n">cleanup_cb</span> <span class="o">=</span> <span class="n">cleanup</span><span class="p">,</span> <span class="p">.</span><span class="n">window_title</span> <span class="o">=</span> <span class="s">"clear"</span><span class="p">,</span> <span class="p">.</span><span class="n">icon</span><span class="p">.</span><span class="n">sokol_default</span> <span class="o">=</span> <span class="nb">true</span><span class="p">,</span> <span class="p">.</span><span class="n">logger</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">slog_func</span><span class="p">,</span> <span class="p">};</span> <span class="p">}</span> </code></pre></div></div> <p>…we’ll also need to make a few changes to our CMakeLists.txt file. Emscripten needs to know that we want a program that runs in the browser. To do that we’ll simply change the executable file extension to <code class="language-plaintext highlighter-rouge">.html</code>. 
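</p> <p>The file extension matters because <code class="language-plaintext highlighter-rouge">emcc</code> picks its output type from the name of the output file. As a rough sketch outside of cmake (the source file <code class="language-plaintext highlighter-rouge">main.c</code> is a hypothetical stand-in for any simple C program, and the <code class="language-plaintext highlighter-rouge">emcc</code> path assumes the emsdk checkout inside the project directory):</p> <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># .js output: generates main.js + main.wasm, e.g. for running in node.js
emsdk/upstream/emscripten/emcc main.c -o main.js
# .html output: additionally generates a main.html page for the browser
emsdk/upstream/emscripten/emcc main.c -o main.html
</code></pre></div></div> <p>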
Next we need to tell Emscripten to link with WebGL2.</p> <p>Open the CMakeLists.txt file and change the <code class="language-plaintext highlighter-rouge">Emscripten</code> if-block like this:</p> <p><code class="language-plaintext highlighter-rouge">CMakeLists.txt</code></p> <div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">if</span> <span class="p">(</span>CMAKE_SYSTEM_NAME STREQUAL Emscripten<span class="p">)</span> <span class="nb">set</span><span class="p">(</span>CMAKE_EXECUTABLE_SUFFIX .html<span class="p">)</span> <span class="nb">target_link_options</span><span class="p">(</span>hello PUBLIC -sUSE_WEBGL2=1<span class="p">)</span> <span class="nb">endif</span><span class="p">()</span> </code></pre></div></div> <p>In VSCode press <code class="language-plaintext highlighter-rouge">F7</code> to rebuild the program. This should generate three output files in the <code class="language-plaintext highlighter-rouge">build/Debug</code> directory:</p> <ul> <li>hello.html</li> <li>hello.js</li> <li>hello.wasm</li> </ul> <p>Let’s try to run that in the browser. 
On the command line in the project directory:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/upstream/emscripten/emrun build/Debug/hello.html </code></pre></div></div> <p>This should open the system’s default web browser and you should see something like this, with the orange rectangle cycling between yellow and red:</p> <p><img src="/images/emscripten-ide-7.png" alt="Browser Screenshot" /></p> <p>…let’s get rid of the ‘window chrome’ by injecting our own minimal <code class="language-plaintext highlighter-rouge">shell.html</code> file.</p> <p>In the project directory, create a file <code class="language-plaintext highlighter-rouge">shell.html</code> looking like this:</p> <p><code class="language-plaintext highlighter-rouge">shell.html</code></p> <div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"&gt;</span> <span class="nt">&lt;html&gt;</span> <span class="nt">&lt;head&gt;</span> <span class="nt">&lt;meta</span> <span class="na">charset=</span><span class="s">"UTF-8"</span><span class="nt">/&gt;</span> <span class="nt">&lt;title&gt;</span>Clear<span class="nt">&lt;/title&gt;</span> <span class="nt">&lt;style </span><span class="na">type=</span><span class="s">"text/css"</span><span class="nt">&gt;</span> <span class="nc">.game</span> <span class="p">{</span> <span class="nl">position</span><span class="p">:</span> <span class="nb">absolute</span><span class="p">;</span> <span class="nl">top</span><span class="p">:</span> <span class="m">0px</span><span class="p">;</span> <span class="nl">left</span><span class="p">:</span> <span class="m">0px</span><span class="p">;</span> <span class="nl">margin</span><span class="p">:</span> <span class="m">0px</span><span class="p">;</span> <span class="nl">border</span><span class="p">:</span> <span 
class="m">0</span><span class="p">;</span> <span class="nl">width</span><span class="p">:</span> <span class="m">100%</span><span class="p">;</span> <span class="nl">height</span><span class="p">:</span> <span class="m">100%</span><span class="p">;</span> <span class="nl">overflow</span><span class="p">:</span> <span class="nb">hidden</span><span class="p">;</span> <span class="nl">display</span><span class="p">:</span> <span class="nb">block</span><span class="p">;</span> <span class="nl">image-rendering</span><span class="p">:</span> <span class="n">optimizeSpeed</span><span class="p">;</span> <span class="nl">image-rendering</span><span class="p">:</span> <span class="n">-moz-crisp-edges</span><span class="p">;</span> <span class="nl">image-rendering</span><span class="p">:</span> <span class="n">-o-crisp-edges</span><span class="p">;</span> <span class="nl">image-rendering</span><span class="p">:</span> <span class="n">-webkit-optimize-contrast</span><span class="p">;</span> <span class="nl">image-rendering</span><span class="p">:</span> <span class="n">optimize-contrast</span><span class="p">;</span> <span class="nl">image-rendering</span><span class="p">:</span> <span class="n">crisp-edges</span><span class="p">;</span> <span class="nl">image-rendering</span><span class="p">:</span> <span class="n">pixelated</span><span class="p">;</span> <span class="nl">-ms-interpolation-mode</span><span class="p">:</span> <span class="n">nearest-neighbor</span><span class="p">;</span> <span class="p">}</span> <span class="nt">&lt;/style&gt;</span> <span class="nt">&lt;/head&gt;</span> <span class="nt">&lt;body</span> <span class="na">style=</span><span class="s">"background:black"</span><span class="nt">&gt;</span> <span class="nt">&lt;canvas</span> <span class="na">class=</span><span class="s">"game"</span> <span class="na">id=</span><span class="s">"canvas"</span> <span class="na">oncontextmenu=</span><span class="s">"event.preventDefault()"</span><span 
class="nt">&gt;&lt;/canvas&gt;</span> <span class="nt">&lt;script </span><span class="na">type=</span><span class="s">"text/javascript"</span><span class="nt">&gt;</span> <span class="kd">var</span> <span class="nx">Module</span> <span class="o">=</span> <span class="p">{</span> <span class="na">preRun</span><span class="p">:</span> <span class="p">[],</span> <span class="na">postRun</span><span class="p">:</span> <span class="p">[],</span> <span class="na">print</span><span class="p">:</span> <span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="kd">function</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span> <span class="p">{</span> <span class="nx">text</span> <span class="o">=</span> <span class="nb">Array</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">slice</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="nx">arguments</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1"> </span><span class="dl">'</span><span class="p">);</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">text</span><span class="p">);</span> <span class="p">};</span> <span class="p">})(),</span> <span class="na">printErr</span><span class="p">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span> <span class="p">{</span> <span class="nx">text</span> <span class="o">=</span> <span class="nb">Array</span><span class="p">.</span><span class="nx">prototype</span><span class="p">.</span><span class="nx">slice</span><span class="p">.</span><span class="nx">call</span><span class="p">(</span><span class="nx">arguments</span><span class="p">).</span><span 
class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1"> </span><span class="dl">'</span><span class="p">);</span> <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="nx">text</span><span class="p">);</span> <span class="p">},</span> <span class="na">canvas</span><span class="p">:</span> <span class="p">(</span><span class="kd">function</span><span class="p">()</span> <span class="p">{</span> <span class="kd">var</span> <span class="nx">canvas</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="dl">'</span><span class="s1">canvas</span><span class="dl">'</span><span class="p">);</span> <span class="nx">canvas</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">"</span><span class="s2">webglcontextlost</span><span class="dl">"</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span> <span class="nx">alert</span><span class="p">(</span><span class="dl">'</span><span class="s1">FIXME: WebGL context lost, please reload the page</span><span class="dl">'</span><span class="p">);</span> <span class="nx">e</span><span class="p">.</span><span class="nx">preventDefault</span><span class="p">();</span> <span class="p">},</span> <span class="kc">false</span><span class="p">);</span> <span class="k">return</span> <span class="nx">canvas</span><span class="p">;</span> <span class="p">})(),</span> <span class="na">setStatus</span><span class="p">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span> <span class="p">{</span> <span class="p">},</span> <span class="na">monitorRunDependencies</span><span class="p">:</span> <span 
class="kd">function</span><span class="p">(</span><span class="nx">left</span><span class="p">)</span> <span class="p">{</span> <span class="p">},</span> <span class="p">};</span> <span class="nb">window</span><span class="p">.</span><span class="nx">onerror</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(</span><span class="nx">event</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">onerror: </span><span class="dl">"</span> <span class="o">+</span> <span class="nx">event</span><span class="p">.</span><span class="nx">message</span><span class="p">);</span> <span class="p">};</span> <span class="nt">&lt;/script&gt;</span> {{{ SCRIPT }}} <span class="nt">&lt;/body&gt;</span> <span class="nt">&lt;/html&gt;</span> </code></pre></div></div> <p>…and in the <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> file, change the linker options like this:</p> <p><code class="language-plaintext highlighter-rouge">CMakeLists.txt</code></p> <div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nb">target_link_options</span><span class="p">(</span>hello PUBLIC -sUSE_WEBGL2=1 --shell-file=../shell.html<span class="p">)</span> </code></pre></div></div> <p>…build the project again by pressing <code class="language-plaintext highlighter-rouge">F7</code> and try opening the result in the browser:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>emsdk/upstream/emscripten/emrun build/Debug/hello.html </code></pre></div></div> <p>…the WebGL canvas should now stretch over the entire window client area:</p> <p><img src="/images/emscripten-ide-8.png" alt="Browser Screenshot" /></p> <h3 id="browser-remote-debugging">Browser Remote Debugging</h3> <p>Now on to the last step: 
making remote debugging work!</p> <p>First, <code class="language-plaintext highlighter-rouge">.vscode/launch.json</code> needs to be changed to start a Chrome remote debug session and a local web server:</p> <p><code class="language-plaintext highlighter-rouge">.vscode/launch.json</code></p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0.2.0"</span><span class="p">,</span><span class="w"> </span><span class="nl">"configurations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Launch"</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"chrome"</span><span class="p">,</span><span class="w"> </span><span class="nl">"request"</span><span class="p">:</span><span class="w"> </span><span class="s2">"launch"</span><span class="p">,</span><span class="w"> </span><span class="nl">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"http://localhost:3000/build/Debug/${command:cmake.launchTargetFilename}"</span><span class="p">,</span><span class="w"> </span><span class="nl">"preLaunchTask"</span><span class="p">:</span><span class="w"> </span><span class="s2">"StartServer"</span><span class="p">,</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <p>…note the <code class="language-plaintext highlighter-rouge">preLaunchTask</code>, this will start a web server using the Live Preview VSCode extension.</p> <p>To define 
the <code class="language-plaintext highlighter-rouge">StartServer</code> task, create a file <code class="language-plaintext highlighter-rouge">.vscode/tasks.json</code> and populate it like this:</p> <p><code class="language-plaintext highlighter-rouge">.vscode/tasks.json</code></p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2.0.0"</span><span class="p">,</span><span class="w"> </span><span class="nl">"tasks"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"label"</span><span class="p">:</span><span class="w"> </span><span class="s2">"StartServer"</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"process"</span><span class="p">,</span><span class="w"> </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"${input:startServer}"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">],</span><span class="w"> </span><span class="nl">"inputs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"startServer"</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w"> </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"livePreview.runServerLoggingTask"</span><span class="w"> 
</span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <p>…and that’s it!</p> <p>When pressing <code class="language-plaintext highlighter-rouge">F5</code>, Chrome should now open and load our program:</p> <p><img src="/images/emscripten-ide-9.png" alt="Browser Screenshot" /></p> <p>…while the program is running in the browser, set a breakpoint in <code class="language-plaintext highlighter-rouge">hello.c</code> at the start of function <code class="language-plaintext highlighter-rouge">void frame(void)</code>. The debugger should now stop at the function and you can step through the code:</p> <p><img src="/images/emscripten-ide-10.png" alt="VSCode Screenshot" /></p> <p>And that’s it! You can now make changes to your code and then compile and run/debug with <code class="language-plaintext highlighter-rouge">F5</code>. The only downside is the known issue that early breakpoints are not caught. There are two workarounds. The first, already mentioned, is to set a breakpoint on the JS side in <code class="language-plaintext highlighter-rouge">build/Debug/hello.js</code> in the <code class="language-plaintext highlighter-rouge">callMain</code> function and stop on that first. This seems to catch any early breakpoints on the C side too.</p> <p>The second option, for programs with a render loop, is to simply restart the debug session by pressing the ‘Refresh’ button in the VSCode debugger controls:</p> <p><img src="/images/emscripten-ide-11.png" alt="VSCode Screenshot" /></p> <p>This will also catch early breakpoints on the C side, but will pop up a warning that the ‘Live Preview…’ task is already running. 
This can simply be ignored.</p> <p>You can also find the project described in this blog post on GitHub:</p> <p><a href="https://github.com/floooh/vscode-emscripten-debugging">https://github.com/floooh/vscode-emscripten-debugging</a></p> <h2 id="known-issues">Known Issues</h2> <p>Here’s the list of issues I stumbled over; hopefully they will be fixed in the future:</p> <ul> <li>The CMake Tools extension doesn’t properly communicate the Emscripten system include path to the C/C++ extension, so IntelliSense doesn’t work for system headers.</li> <li>The WASM DWARF debugging extension doesn’t catch early breakpoints on the C side (known issue).</li> <li>The Live Preview extension pops up a warning when refreshing a debug session.</li> </ul> Sat, 11 Nov 2023 00:00:00 +0000 https://floooh.github.io/2023/11/11/emscripten-ide.html https://floooh.github.io/2023/11/11/emscripten-ide.html