Planet Igalia
https://planet.igalia.com//atom.xml
2026-03-16T14:00:29+00:00
Planet/2.0 +http://www.planetplanet.org

Andy Wingo: nominal types in webassembly
https://wingolog.org/2026/03/10/nominal-types-in-webassembly
2026-03-10T08:19:34+00:00
<div><p>Before the managed data types extension to WebAssembly was incorporated
in the standard, there was a huge debate about type equality. The end
result is that if you have two types in a Wasm module that look the
same, like this:</p><pre class="pre-wat">(type $t (struct i32))
(type $u (struct i32))
</pre><p>Then they are for all intents and purposes equivalent. When a Wasm
implementation loads up a module, it has to partition the module’s types
into equivalence classes. When the Wasm program references a given type
by name, as in <tt>(struct.get $t 0)</tt> which would get the first field of
type <tt>$t</tt>, it maps <tt>$t</tt> to the equivalence class containing <tt>$t</tt> and
<tt>$u</tt>. See the <a href="https://webassembly.github.io/spec/core/valid/conventions.html#rolling-and-unrolling">spec</a> for more details.</p><p>This is a form of <i>structural type equality</i>. Sometimes this is what you
want. But not always! Sometimes you want <i>nominal types</i>, in which no
type declaration is equivalent to any other. WebAssembly doesn’t have
that, but it has something close: <i>recursive type groups</i>. In fact, the
type declarations above are equivalent to these:</p><pre class="pre-wat">(rec (type $t (struct i32)))
(rec (type $u (struct i32)))
</pre><p>Which is to say, each type is in a group containing just itself. One
thing that this allows is self-recursion, as in:</p><pre class="pre-wat">(type $succ (struct (ref null $succ)))
</pre><p>Here the struct’s field is itself a reference to a <tt>$succ</tt> struct, or
null (because it’s <tt>ref null</tt> and not just <tt>ref</tt>).</p><p>To allow for mutual recursion between types, you put them in the same <tt>rec</tt>
group, instead of each having its own:</p><pre class="pre-wat">(rec
(type $t (struct i32))
(type $u (struct i32)))
</pre><p>Between <tt>$t</tt> and <tt>$u</tt> we don’t have mutual recursion though, so why
bother? Well <tt>rec</tt> groups have another role, which is that they are the
unit of structural type equivalence. In this case, types <tt>$t</tt> and <tt>$u</tt>
are not in the same equivalence class, because they are part of the same
<tt>rec</tt> group. Again, see <a href="https://webassembly.github.io/spec/core/valid/conventions.html#defined-types">the spec</a>.</p><p>Within a Wasm module, <tt>rec</tt> gives you an approximation of nominal
typing. But what about between modules? Let’s imagine that <tt>$t</tt>
carries important capabilities, and you don’t want another module to be
able to forge those capabilities. In this case, <tt>rec</tt> is not enough:
the other module could define an equivalent <tt>rec</tt> group, construct a
<tt>$t</tt>, and pass it to our module; because of isorecursive type equality,
this would work just fine. What to do?</p><h3>cursèd nominal typing</h3><p>I said before that Wasm doesn’t have nominal types. That was true in
the past, but no more! The <a href="https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md">nominal typing
proposal</a>
was incorporated in the standard last July. Its vocabulary is a bit
odd, though. You have to define your data types with the <a href="https://webassembly.github.io/spec/core/syntax/types.html#tag-types"><tt>tag</tt> keyword</a>:</p><pre>(tag $v (param $secret i32))
</pre><p>Syntactically, these data types are a bit odd: you have to declare
fields using <tt>param</tt> instead of <tt>field</tt> and you don’t have to wrap the
fields in <tt>struct</tt>.</p><p>They also omit some features relative to isorecursive structs, namely
subtyping and mutability. However, sometimes subtyping is not
necessary, and one can always assignment-convert mutable fields, wrapping them in mutable structs as needed.</p><p>To construct a nominally-typed value, the mechanics are somewhat
involved; instead of <tt>(struct.new $t (i32.const 42))</tt>, you use <a href="https://webassembly.github.io/spec/core/exec/instructions.html#xref-syntax-instructions-syntax-instr-control-mathsf-throw-x"><tt>throw</tt></a>:</p><pre>(block $b (result (ref exn))
(try_table
(catch_all_ref $b)
(throw $v (i32.const 42)))
(unreachable))
</pre><p>Of course, as this is a new proposal, we don’t yet have precise type
information on the Wasm side; the new instance is instead returned as
the top type for nominally-typed values, <tt>exn</tt>.</p><p>To check if a value is a <tt>$v</tt>, you need to write a bit of code:</p><pre>(func $is-v? (param $x (ref exn)) (result i32)
(block $yep (result (ref exn))
(block $nope
(try_table
(catch_ref $v $yep)
(catch_all $nope)
(throw_ref (local.get $x))))
(return (i32.const 0)))
(return (i32.const 1)))
</pre><p>Finally, field access is a bit odd; unlike structs which have
<tt>struct.get</tt>, nominal types receive all their values via a <tt>catch</tt>
handler.</p><pre>(func $v-fields (param $x (ref exn)) (result i32)
(try_table
(catch $v 0)
(throw_ref (local.get $x)))
(unreachable))
</pre><p>Here, the <tt>0</tt> in the <tt>(catch $v 0)</tt> refers to the function call itself:
all fields of <tt>$v</tt> get returned from the function call. In this case
there’s only one; otherwise, a get-fields function would return multiple
values. Happily, this accessor preserves type safety: if <tt>$x</tt> is not
actually <tt>$v</tt>, an exception will be thrown.</p><p>Now, sometimes you want to be quite strict about your nominal type
identities; in that case, just define your <tt>tag</tt> in a module and don’t
export it. But if you want to enable composition in a principled way,
not just subject to the randomness of whether another module happens to
implement a type structurally the same as your own, the nominal typing
proposal also gives a preview of <a href="https://github.com/WebAssembly/proposal-type-imports/blob/main/proposals/type-imports/Overview.md">type
imports</a>.
The facility is direct: you simply export your <tt>tag</tt> from your module,
and allow other modules to import it. Everything will work as expected!</p><h3>fin</h3><p>Friends, as I am sure is abundantly clear, this is a troll post :) It’s
not wrong, though! All of the facilities for nominally-typed structs
without subtyping or field mutability are present in the
exception-handling proposal.</p><p>The context for this work was that I was updating
<a href="https://spritely.institute/hoot/">Hoot</a> to use the newer version of
Wasm exception handling, instead of the pre-standardization version. It
was a nice change, but as it introduces the <tt>exnref</tt> type, it does open
the door to some funny shenanigans, and I find it hilarious that the
committee has been hemming and hawing about type imports for 7 years
and then goes and ships it in this backward kind of way.</p><p>Next up, exception support in
<a href="https://codeberg.org/andywingo/wastrel">Wastrel</a>, as soon as I can
figure out where to allocate type tags for this new nominal typing
facility. Onwards and upwards!</p></div>

Andy Wingo
https://wingolog.org/

Yeunjoo Choi: Smarter Chromium GN in Vim with gn-language-server
https://duswnchl.github.io/posts/smarter-chromium-gn-in-vim-with-gn-language-server/
2026-03-10T03:06:00+00:00
<p>GN Language Server for Chromium development was announced on <a href="https://groups.google.com/a/chromium.org/g/chromium-dev/c/uTa5mrlvbvw/m/vTVpKZPVDwAJ">chromium-dev</a>.
It’s very easy to install in VSCode, NeoVim or Emacs. But how can we configure
it with classic Vim + <a href="https://github.com/ycm-core/YouCompleteMe">YCM</a>?</p>
<h2 id="setup">Setup</h2>
<p>First, install the language server with Cargo.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cargo <span class="nb">install</span> <span class="nt">--locked</span> gn-language-server
</code></pre></div></div>
<p>Then, add this to your vimrc.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">let </span>g:ycm_language_server <span class="o">=</span> <span class="o">[</span>
<span class="se">\ </span><span class="o">{</span>
<span class="se">\ </span> <span class="s1">'name'</span>: <span class="s1">'gn'</span>,
<span class="se">\ </span> <span class="s1">'cmdline'</span>: <span class="o">[</span> <span class="s1">'gn-language-server'</span> <span class="o">]</span>,
<span class="se">\ </span> <span class="s1">'filetypes'</span>: <span class="o">[</span> <span class="s1">'gn'</span> <span class="o">]</span>,
<span class="se">\ </span><span class="o">}</span>
<span class="se">\ </span><span class="o">]</span>
</code></pre></div></div>
<p>That easy, right?</p>
<h2 id="whats-working">What’s Working</h2>
<h3 id="hover-documentation">Hover Documentation</h3>
<p><img src="https://duswnchl.github.io/assets/posts/smarter-chromium-gn-in-vim-with-gn-language-server/hover.gif" alt="hover" /></p>
<h3 id="go-to-imports">Go To Imports</h3>
<p><img src="https://duswnchl.github.io/assets/posts/smarter-chromium-gn-in-vim-with-gn-language-server/jump_import.gif" alt="jump_import" /></p>
<h3 id="go-to-dependencies">Go To Dependencies</h3>
<p><img src="https://duswnchl.github.io/assets/posts/smarter-chromium-gn-in-vim-with-gn-language-server/jump_deps.gif" alt="jump_deps" /></p>
<h2 id="current-limitations">Current Limitations</h2>
<p>The following features are not working yet. They may need more configuration or
further work:</p>
<h3 id="code-folding">Code Folding</h3>
<p>Classic Vim and YCM don’t support LSP-based folding, and I’m not a big fan of
that feature anyway. But you can configure another plugin that supports
LSP-based folding, or simply rely on indent-based folding.</p>
<h3 id="go-to-definition">Go To Definition</h3>
<p>When I try to go to the definition of <code class="language-plaintext highlighter-rouge">template</code>, I get an error <code class="language-plaintext highlighter-rouge">KeyError:
'uri'</code>. I’m not sure whether this is caused by my local configuration, but it
needs further investigation.
<img src="https://duswnchl.github.io/assets/posts/smarter-chromium-gn-in-vim-with-gn-language-server/go_def_error.gif" alt="go_def_error" /></p>

Yeunjoo Choi
https://duswnchl.github.io/tags/igalia-planet/

Igalia WebKit Team: WebKit Igalia Periodical #59
https://blogs.igalia.com/webkit/blog/2026/wip-59/
2026-03-09T20:02:33+00:00
<p>Update on what happened in WebKit in the week from March 2 to March 9.</p>
<p>
As part of this week's handful of news, WebKitGTK and WPE WebKit
now have support for Gamepad's "vibrationActuator" property, the
video decoding limit is now configurable at runtime in addition
to build time, and there is an interesting fix that makes WebKit
render fonts like other browsers by making it blend text incorrectly (!).
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<div class="wip-item">
<p>Using <code>libmanette</code>'s <em>rumble</em> support, enabled <a rel="external" href="https://developer.mozilla.org/en-US/docs/Web/API/Gamepad/vibrationActuator">Gamepad <em>VibrationActuator</em></a> for <a rel="external" href="https://commits.webkit.org/308799@main">WebKitGTK</a> and <a rel="external" href="https://commits.webkit.org/308792@main">WPE WebKit</a>.</p>
<p>With these changes, <a rel="external" href="https://developer.mozilla.org/en-US/docs/Web/API/GamepadHapticActuator/playEffect">playEffect()</a> can be used to play <em>dual-rumble</em> vibration effects.</p>
</div>
<h3 id="multimedia-movie-camera">Multimedia 🎥</h3>
<div class="wip-description">
<p>GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.</p>
</div>
<div class="wip-item">
<p><code>VIDEO_DECODING_LIMIT</code> is now <a rel="external" href="https://bugs.webkit.org/show_bug.cgi?id=308969">configurable at runtime</a>, in addition to build time. That will allow vendors that share a single binary build on different platforms to fine-tune their needs without a rebuild.</p>
</div>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p>Landed <a rel="external" href="https://github.com/WebKit/WebKit/pull/59880">a change</a> that tweaks the text rendering done with Skia. With this change, the text looks more natural now - just like in other browsers. However, this is done by blending text incorrectly as a compromise.</p>
</div>
<h2 id="releases-package">Releases 📦️</h2>
<div class="wip-item">
<p>One more set of release candidates for the upcoming stable branch,
<a rel="external" href="https://webkitgtk.org/2026/03/06/webkitgtk2.51.93-released.html">WebKitGTK 2.51.93</a> and
<a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.51.93.html">WPE WebKit 2.51.93</a>,
has been published. For those interested in previewing the upcoming 2.52.x
series, this release is expected to be quite stable. Reporting <a rel="external" href="https://bugs.webkit.org/">issues in Bugzilla</a> is,
as usual, more than welcome.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div>

Igalia WebKit Team
https://blogs.igalia.com/webkit

Tiago Vignatti: Accessibility and PDF documents
https://vignatti.com/posts/accessibility-and-pdfs/
2026-03-04T13:00:00+00:00
<div>
<h2 class="relative group">Accessibility
<div id="accessibility" class="anchor"></div>
<span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none">
<a class="text-primary-300 dark:text-neutral-700 !no-underline" href="https://vignatti.com/posts/index.xml#accessibility">#</a>
</span>
</h2>
<p>When we think of <strong>accessibility</strong>, we tend to picture it as something designed for a small minority. The reality is much broader: <strong>16% of the world’s population — 1.3 billion people — live with a significant disability</strong><a href="https://www.who.int/news-room/fact-sheets/detail/disability-and-health" target="_blank" rel="noreferrer">¹</a>. In <strong>Brazil</strong> alone, where I live, that means around <strong>14.4 million people report some form of disability</strong><a href="https://agenciadenoticias.ibge.gov.br/en/agencia-news/2184-news-agency/news/43477-2022-census-brazil-has-14-4-million-persons-with-disabilities" target="_blank" rel="noreferrer">²</a>. And those numbers capture only permanent disabilities.</p></div>

Tiago Vignatti
https://vignatti.com/posts/

Igalia WebKit Team: WebKit Igalia Periodical #58
https://blogs.igalia.com/webkit/blog/2026/wip-58/
2026-03-02T20:11:00+00:00
<p>Update on what happened in WebKit in the week from February 23 to March 2.</p>
<p>
This installment of the periodical brings news about support
for Qualcomm qtivdec2 and qtivenc2 on GStreamer, GPU texture
atlas creation and replay substitution, enhancement of the scroll
gesture in WPE, and two new releases: WebKitGTK 2.51.92 and WPE
WebKit 2.51.92.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<h3 id="multimedia-movie-camera">Multimedia 🎥</h3>
<div class="wip-description">
<p>GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.</p>
</div>
<div class="wip-item">
<p>Work on adding support for the Qualcomm GStreamer qtivdec2 and qtivenc2 elements is ongoing.</p>
</div>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/308458@main">Implemented GPU texture atlas creation and replay substitution</a> in the Skia painting engine on GTK/WPE. After recording, raster images are packed into GPU atlases via <code>BitmapTexture</code>, with two upload paths: an optimized DMA-buf path that memory-maps GPU buffers and dispatches uploading to a dedicated worker thread, and a synchronous GL fallback using <code>BitmapTexture::updateContents()</code>. Atlas uploads are synchronized across workers using a countdown-latch fence. During replay, <code>SkiaReplayCanvas</code> intercepts raster image draws and substitutes them with atlas texture draws, mapping source coordinates into atlas space.</p>
</div>
<h2 id="wpe-webkit-pager">WPE WebKit 📟</h2>
<h3 id="wpe-platform-api-jigsaw">WPE Platform API 🧩</h3>
<div class="wip-description">
<p>New, modern platform API that supersedes usage of libwpe and WPE backends.</p>
</div>
<div class="wip-item">
<p>The recent WPE WebKit 2.51.92 release is the first one to have its <a rel="external" href="https://wpewebkit.org/reference/2.51.92/wpe-platform-2.0/">WPEPlatform documentation online</a>, but it was not included in the tarball. This issue <a rel="external" href="https://commits.webkit.org/308408@main">has been corrected</a> and tarballs for future releases will also include this documentation.</p>
</div>
<div class="wip-item">
<p>Scrolling using touch input with WPEPlatform would result in scrolling faster when more than one touch point was in effect. The gesture detector <a rel="external" href="https://commits.webkit.org/308271@main">has been fixed</a> to make scrolling always have a consistent speed.</p>
</div>
<h2 id="releases-package">Releases 📦️</h2>
<div class="wip-item">
<p>The third —and likely the last— release candidates for the upcoming stable branch, <a rel="external" href="https://webkitgtk.org/2026/02/27/webkitgtk2.51.92-released.html">WebKitGTK 2.51.92</a> and <a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.51.92.html">WPE WebKit 2.51.92</a>, have been published. For those interested in previewing the upcoming 2.52.x series, this release is expected to be quite stable, but there might still be some rough edges. Reporting <a rel="external" href="https://bugs.webkit.org/">issues in Bugzilla</a> is, as usual, more than welcome.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div>

Igalia WebKit Team
https://blogs.igalia.com/webkit

Ziran Sun: A Day in “State of the Browser 2026” Conference
https://blogs.igalia.com/zsun/?p=837
2026-03-02T18:11:17+00:00
<p>The “State of the Browser 2026” Conference was held on Saturday, the 28th of February, at <a href="https://www.barbican.org.uk/" target="_blank" rel="noreferrer noopener">The Barbican Centre</a> in London. It is a yearly conference organised by <a href="https://londonwebstandards.org/">London Web Standards</a>; this year was the 14th edition.</p>
<p>From Igalia, this year <a href="https://www.igalia.com/team/lwarlow">Luke Warlow</a> and I attended in person, while <a href="https://www.igalia.com/team/jfernandez">Javier Fernández</a> attended online. My colleague <a href="https://www.igalia.com/team/sstimac">Stephanie Stimac</a> introduced this event to <a href="https://www.igalia.com/">Igalia</a> a couple of years ago, and <a href="https://www.igalia.com/">Igalia</a> has since become one of the sponsors of this great event. Luke had attended previously, so his notes were very helpful for getting a better picture of the event.</p>
<p>The event is a one-day, single-track, community-focused conference. While queuing for registration, a couple of attendees commented that the talks had been very good in the past few years. I’d say this year was no exception: I thoroughly enjoyed the talks and the whole experience.</p>
<p><a href="https://www.youtube.com/@londonwebstandards8403">Talks</a> throughout the day covered a wide variety of topics, including CSS features, accessibility, JS footprint, playing with gaming APIs, and the art of connecting with people. As someone who loves food, maybe I can describe it as a feast with content, taste, depth, variety… and a bit of a fun factor?</p>
<p>The opening talk was <a href="https://www.bram.us/2026/02/28/anchors-aweigh-sotb2026/">Anchor positioning by Bramus Van Damme</a>. The walk-through of the feature with examples was pretty cool, especially the case of a popover… with a little triangle (you’ll know what I mean if you look up the talk). <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1808823">Igalia worked on popover for Firefox</a> in 2024, sponsored by Google. It’s really great to see that anchor positioning is in Firefox – popover has now found its place.</p>
<p>It was nice to hear Igalians’ names mentioned in the <a href="https://2026.stateofthebrowser.com/speaker/jason-williams/">Temporal talk by Jason Williams from Bloomberg</a>. A big shout out to <a href="https://www.igalia.com/team/pchimento">Philip Chimento</a> and <a href="https://www.igalia.com/team/usharma">Ujjwal Sharma</a>, who have participated substantially in the discussions about standardizing Temporal over the years, and to my fellow Igalians who have been writing spec PRs and tests for the feature. Check out <a href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">Tim Chevalier’s blog on “Implementing the Temporal proposal in JavaScriptCore”</a> if you’d like to find out more.</p>
<p>The atmosphere of the event was friendly, inclusive and energetic. I was very happy bumping into some ex-colleagues and making new friends.</p>
<p>One final note – this event brings a range of attendees, many of them web developers, along with representatives from companies, browser vendors, and so on. For some web developers, “<a href="https://www.igalia.com/">Igalia</a>” is a new name. I got questions like “Oh, is it the company with rainbow colours among the sponsors?”. Yes, <em>Igalia</em> is <em>a private, worker-owned, employee-run cooperative model consultancy focused on open source software</em> [<a href="https://en.wikipedia.org/wiki/Igalia">1</a>], and Igalia has been part of the Interop Project since its inception in 2021. Here is Igalia’s “rainbowy” logo :-).</p>
<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img width="940" height="426" src="https://blogs.igalia.com/zsun/files/2026/03/igalia_logo-940x426.png" alt="" class="wp-image-844" /></figure></div>
<p></p>

zsun
https://blogs.igalia.com/zsun

Frédéric Wang: Web standards implementation internship (2026 session)
https://frederic-wang.fr//2026/02/25/stage-implementation-des-normes-web
2026-02-24T23:00:00+00:00
<p>Applications for the <a href="https://www.igalia.com/2026/02/27/Igalia-2026-Coding-Experience-Open-for-Applications.html">“coding experience” internships</a> at <a href="https://www.igalia.com/about/">Igalia</a> are officially open until early April. They give students the opportunity to take part in free software development while being paid €7,000 gross for 450 hours, spread from June to December 2026.</p>
<p>As every year, I will mentor a student on “Web Standards implementation”. The goal is to modify browsers (<a href="https://fr.wikipedia.org/wiki/Chromium">Chromium</a>, <a href="https://fr.wikipedia.org/wiki/Mozilla_Firefox">Firefox</a> or <a href="https://fr.wikipedia.org/wiki/Safari_(navigateur_web)">Safari</a>…) in order to improve their support for Web technologies (<a href="https://fr.wikipedia.org/wiki/Hypertext_Markup_Language">HTML</a>, <a href="https://fr.wikipedia.org/wiki/Feuilles_de_style_en_cascade">CSS</a>, <a href="https://fr.wikipedia.org/wiki/Document_Object_Model">DOM</a>…). In particular, this involves studying the corresponding specifications and writing <a href="https://web-platform-tests.org/">conformance tests</a>. Note that this is <em>not</em> a Web development internship, but a <a href="https://fr.wikipedia.org/wiki/C%2B%2B">C++</a> development one.</p>
<p>Since one of the goals of this program is to fight professional discrimination, everyone (including those who feel underrepresented in the IT sector) is invited to apply. Since 2016, my “Web Platform” team has mentored 13 students from different countries around the world (Spain, India, Italy, <a href="https://www.azabani.com/2020/09/27/my-internship-with-igalia.html">Australia</a>, Cameroon, China, Vietnam, England and the United States), including 7 women. Last year we selected <a href="https://github.com/Charlotte-McCleary">Charlotte McCleary</a>, a blind American student who worked on accessibility in Firefox during her internship and has since joined <a href="https://fizz.studio/about.html">Fizz Studio</a>. I would like to encourage Deaf students to apply, and in the video below I give a brief presentation of the program in French Sign Language (hoping it is understandable, and that you will be forgiving of my poor signing skills 😅):</p>
<div></div>
<p>If you are interested, fill in <a href="https://www.igalia.com/coding-experience/">this form</a>, ticking the <em>Web Standards</em> box and optionally mentioning that you found this offer via my website. Finally, if you know students who might participate, feel free to share the announcement!</p>

Frédéric Wang
https://frederic-wang.fr//

Igalia WebKit Team: WebKit Igalia Periodical #57
https://blogs.igalia.com/webkit/blog/2026/wip-57/
2026-02-23T19:52:49+00:00
<p>Update on what happened in WebKit in the week from February 9 to February 23.</p>
<p>
In this week we have a nice fix for video streams timestamps, a fix
for a PDF rendering regression, support for rendering video buffers
provided by Qualcomm video decoders, and a fix for a font selection
issue. Also notable we had a new WPE Android release, and the libsoup
3.6.6 release.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<div class="wip-item">
<p>Added a <a rel="external" href="https://commits.webkit.org/307348@main">new <code>webkit_feature_list_find()</code> convenience function</a> to the public API, which searches for a <a rel="external" href="https://webkitgtk.org/reference/webkitgtk/2.51.91/struct.Feature.html">WebKitFeature</a> given its identifier.</p>
</div>
<h3 id="multimedia-movie-camera">Multimedia 🎥</h3>
<div class="wip-description">
<p>GStreamer-based multimedia support for WebKit, including (but not limited to) playback, capture, WebAudio, WebCodecs, and WebRTC.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/307359@main">Opportunistically fix decoding timestamps to prevent deletion of preexisting samples when PTS doesn't conflict</a>, fixing potential glitches when inserting videos (e.g., ad insertion).</p>
</div>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/308033@main">Fixed</a> a <a rel="external" href="https://bugs.webkit.org/show_bug.cgi?id=306621">PDF rendering regression</a> caused by the canvas 2D operation recording feature, where switching between the recording canvas and the GPU surface canvas failed to preserve the full save/restore nesting, clip stack, and transparency layer state. Replaced the fragile state-copying approach with a state replay mechanism in GraphicsContextSkia that tracks the full sequence of save restore, clip, and transparency layer operations, then reconstructs the exact nesting on the target canvas when flushing a recording.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/307174@main">Added support</a> for rendering video buffers provided by Qualcomm hardware-accelerated decoders, with aid from the <a rel="external" href="https://registry.khronos.org/OpenGL/extensions/EXT/EXT_YUV_target.txt">EXT_YUV_target</a> OpenGL extension.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/307565@main">Fixed</a> a font selection issue where the system fallback font cache mixed up different font styles.</p>
</div>
<h2 id="releases-package">Releases 📦️</h2>
<div class="wip-item">
<p><a rel="external" href="https://github.com/Igalia/wpe-android/releases/tag/v0.3.2">WPE Android 0.3.2</a> has been released, and prebuilt packages are available <a rel="external" href="https://central.sonatype.com/artifact/org.wpewebkit.wpeview/wpeview/">at the Maven Central repository</a>. This is a stable maintenance release which updates WPE WebKit to 2.50.5, which is the most recent stable release.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://gitlab.gnome.org/GNOME/libsoup/-/releases/3.6.6">libsoup 3.6.6</a> has been released with numerous bug and security fixes.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div>

Igalia WebKit Team
https://blogs.igalia.com/webkit

Mauricio Faria de Oliveira: page_owner Part 2: optimizing output
https://mfo.dev.br/posts/2026-02-23-page_owner-part-2-optimizing-output/
2026-02-23T00:00:00+00:00
<p>This blog post is <a href="https://mfo.dev.br/tags/page_owner/">part of a series</a> about the <code>page_owner</code> debug feature in the Linux memory management subsystem, related to the talk <em><a href="http://www.youtube.com/watch?v=qFdjO3t5F9I">Improving <code>page_owner</code> for profiling and monitoring memory usage per allocation stack trace</a></em> presented at <a href="https://lpc.events/event/19/contributions/2202/">Linux Plumbers Conference 2025</a>.</p>
<ul>
<li><a href="https://mfo.dev.br/posts/2026-02-23-page_owner-part-1-quick-introduction/">Part 1</a> is a quick introduction to <code>page_owner</code> and its debugfs files.</li>
<li>Part 2 describes challenges with processing <code>page_owner</code> files over time and a solution with new debugfs files in Linux v6.19.</li>
</ul>
<h1 id="problem-stack-traces-over-time">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#problem-stack-traces-over-time">
Problem: stack traces over time
</a>
</h1>
<p>As described in Part 1, <code>page_owner</code>’s debugfs files contain stack traces for the most part:</p>
<ul>
<li><code>/sys/kernel/debug/page_owner</code> has one stack trace per allocated page, and</li>
<li><code>/sys/kernel/debug/page_owner_stacks/show_stacks</code> lists the stack traces that allocated pages.</li>
</ul>
<p>Reading and processing a significant number of stack traces incurs a non-trivial computational cost in CPU and memory (copying to, and processing in, userspace) and in storage usage, as the total size of such long strings might become large. This shouldn’t be an issue if done only once, but it does pose a concern if done repeatedly.</p>
<p>Take the processing of stack traces one step further and that concern materializes into a technical problem:</p>
<blockquote>
<p>How to store information (say, number of pages) <em>per-stack trace</em> and <em>over time</em>?</p></blockquote>
<p>For that, the stack trace must become a <em>key</em> to be assigned <em>values</em> from multiple reads over time. However, keys are usually numbers or somewhat short identifiers, not long strings like stack traces (using those as keys is doable, but computationally more expensive in CPU and memory usage).</p>
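<p>As a minimal sketch of that problem (the traces and page counts below are hypothetical stand-ins for entries parsed from the debugfs files), one can key a table of per-read samples directly by the full stack-trace string; it works, but every dictionary operation then has to hash and compare a potentially very long string:</p>

```python
from collections import defaultdict

# Hypothetical samples: (stack trace, page count) pairs, as if parsed
# from /sys/kernel/debug/page_owner_stacks/show_stacks at two points in time.
read_t0 = [("alloc_pages\nhandle_mm_fault\n", 100), ("alloc_pages\nkmalloc\n", 40)]
read_t1 = [("alloc_pages\nhandle_mm_fault\n", 120)]

# Values per stack trace over time, keyed by the full trace string.
# Correct, but each operation hashes/compares the whole long key.
pages_over_time = defaultdict(list)
for t, read in enumerate([read_t0, read_t1]):
    for trace, pages in read:
        pages_over_time[trace].append((t, pages))

assert pages_over_time["alloc_pages\nhandle_mm_fault\n"] == [(0, 100), (1, 120)]
```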
<h1 id="workaround-stack-trace-hashing">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#workaround-stack-trace-hashing">
Workaround: stack trace hashing
</a>
</h1>
<p>One possible solution to this problem is hashing the stack traces and using the resulting hash values as keys.</p>
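<p>A sketch of that workaround follows (hypothetical data; the post’s script uses the third-party <code>XXH3_64</code>, so a stdlib 64-bit digest stands in for it here). Each stack trace is reduced to a short integer key, and duplicate traces collapse onto the same key:</p>

```python
import hashlib
from collections import defaultdict

def stack_key(trace: str) -> int:
    # Stand-in for XXH3_64: any stable 64-bit digest of the trace text works.
    return int.from_bytes(
        hashlib.blake2b(trace.encode(), digest_size=8).digest(), "big")

# Number of pages accumulated per hashed stack trace.
pages_per_key = defaultdict(int)

# Hypothetical page_owner entries: (stack trace, number of pages).
entries = [("alloc\nfault\n", 4), ("alloc\nfault\n", 4), ("alloc\nkmalloc\n", 1)]
for trace, pages in entries:
    pages_per_key[stack_key(trace)] += pages

# Duplicate traces collapse onto one short integer key: 4 + 4 = 8.
assert sorted(pages_per_key.values()) == [1, 8]
```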
<p>However, this is inefficient with <code>page_owner</code> since there is significant duplication of stack traces on both debugfs files:</p>
<ul>
<li>In the <code>page_owner</code> file, even on a single read, some stack traces may have tens/hundreds/thousands of duplicates; and they compound on multiple reads over time.</li>
<li>In the <code>show_stacks</code> file, there are no duplicates on a single read, but duplicates frequently happen on multiple reads over time.</li>
</ul>
<p>With a high ratio of duplication, the dominant component in computational cost is the hashing step, which is significantly more expensive than the remaining step, which simply uses the resulting keys to store values.</p>
<p>Additionally, the hashing step is usually repeated with the same data set (stack traces present in previous reads), which means that most of the calculations are discarded and done again on every read – wasting time and computational resources.</p>
<p>For illustration purposes, compare the execution time of <a href="https://mfo.dev.br/posts/2026-02-23-page_owner-part-2-optimizing-output/#script-1-page_owner-to-show_stackspy">script <code>page_owner-to-show_stacks.py</code></a> – which parses the <code>page_owner</code> file, hashes the stack traces (with the <a href="https://github.com/Cyan4973/xxHash?tab=readme-ov-file#benchmarks">extremely fast</a> <code>XXH3_64</code>), accumulates the number of pages per stack trace, and reports it at the end, basically mimicking <code>show_stacks</code> – with the time to just read the equivalent file.</p>
<p>The single read with hashing is 38.55 times slower:</p>
<div class="highlight"><pre tabindex="0"><code class="language-shell"><span><span><span># time ./page_owner-to-show_stacks.py </sys/kernel/debug/page_owner >/dev/null</span>
</span></span><span><span>
</span></span><span><span>real 0m1.542s
</span></span><span><span>user 0m1.486s
</span></span><span><span>sys 0m0.057s
</span></span><span><span>
</span></span><span><span><span># time cat /sys/kernel/debug/page_owner_stacks/show_stacks >/dev/null</span>
</span></span><span><span>
</span></span><span><span>real 0m0.040s
</span></span><span><span>user 0m0.000s
</span></span><span><span>sys 0m0.040s
</span></span></code></pre></div><p>So, considering the single-read results with the <code>page_owner</code> file, it’s not compelling to use it for multiple reads. Multiple reads of the <code>show_stacks</code> file, however, should perform better, as it contains unique stack traces and likely has a lower ratio of duplication across reads than a single read of the former file.</p>
<p>Check the execution time of <a href="https://mfo.dev.br/posts/2026-02-23-page_owner-part-2-optimizing-output/#script-2-show_stacks-over-timepy">script <code>show_stacks-over-time.py</code></a>, which parses copies of <code>show_stacks</code> (collected over time), similarly hashing the stack traces and storing the number of pages per stack trace over time (that is, per copy).</p>
<p>For 100 copies, the execution time is almost 1 second:</p>
<div class="highlight"><pre tabindex="0"><code class="language-shell"><span><span><span># time ./show_stacks-over-time.py show_stacks.{1..100} >/dev/null</span>
</span></span><span><span>
</span></span><span><span>real 0m0.944s
</span></span><span><span>user 0m0.900s
</span></span><span><span>sys 0m0.044s
</span></span></code></pre></div><p>That is a great improvement (compared to processing a single read of the <code>page_owner</code> file), but this is just a particular case on a lightly stressed, small VM with 1 GiB RAM. There is still the computational cost of hashing, which might increase processing time in cases with more stack traces (that is, when a greater number of different memory-allocation code paths were exercised in the kernel).</p>
<h1 id="solution-stack-trace-handle-numbers">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#solution-stack-trace-handle-numbers">
Solution: stack trace handle numbers
</a>
</h1>
<p>The hashing of stack traces is only required in order to obtain a <em>unique identifier</em> for each stack trace, so that it can be used as a <em>key</em>. However, if such an identifier were already available, the hashing step (and associated computational cost) could be avoided altogether.</p>
<p>Fortunately, that is now the case with Linux 6.19! The stack trace storage used by <code>page_owner</code> ( <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/stackdepot.h?h=v6.19"><code>stackdepot</code></a>) provides a <em>handle number</em> to uniquely refer to stack traces – which meets the requirement.</p>
<p>Linux 6.19 <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/mm/page_owner.rst?h=v6.19&id=0de9a442eeba4a6435af74120822b10b12ab8449">contains two new debugfs files</a> with <em>handle numbers</em> for optimized output:</p>
<ul>
<li><code>/sys/kernel/debug/page_owner_stacks/show_handles</code>: this lists <code>nr_base_pages:</code> per <code>handle:</code> (instead of per stack trace as in <code>show_stacks</code>)</li>
<li><code>/sys/kernel/debug/page_owner_stacks/show_stacks_handles</code>: this lists <code>handle:</code> per stack trace (for resolving handle numbers to stack traces)</li>
</ul>
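<p>Since each record in the handle-based format is just two numbers, parsing it is cheap; a hypothetical sketch (sample data made up for illustration):</p>

```python
import re

SAMPLE = """\
handle: 27000838
nr_base_pages: 9643

handle: 27000911
nr_base_pages: 120
"""

def parse_show_handles(text):
    # handle number -> nr_base_pages; the handle itself serves as the key,
    # so no hashing of stack-trace text is needed.
    return {int(h): int(n)
            for h, n in re.findall(r'handle: (\d+)\nnr_base_pages: (\d+)', text)}
```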
<p>For the example in the previous post, <code>show_stacks</code> contains:</p>
<div class="highlight"><pre tabindex="0"><code class="language-shell"><span><span><span># cat /sys/kernel/debug/page_owner_stacks/show_stacks</span>
</span></span><span><span>...
</span></span><span><span> get_page_from_freelist+0x1416/0x1600
</span></span><span><span> __alloc_frozen_pages_noprof+0x18c/0x1000
</span></span><span><span> alloc_pages_mpol+0x43/0x100
</span></span><span><span> folio_alloc_noprof+0x56/0xa0
</span></span><span><span> page_cache_ra_unbounded+0xd9/0x230
</span></span><span><span> filemap_fault+0x305/0x1000
</span></span><span><span> __do_fault+0x2c/0xb0
</span></span><span><span> __handle_mm_fault+0x6f4/0xeb0
</span></span><span><span> handle_mm_fault+0xd9/0x210
</span></span><span><span> do_user_addr_fault+0x205/0x600
</span></span><span><span> exc_page_fault+0x61/0x130
</span></span><span><span> asm_exc_page_fault+0x26/0x30
</span></span><span><span>nr_base_pages: <span>9643</span>
</span></span><span><span>
</span></span><span><span>...
</span></span></code></pre></div><p>While, for the same snippet, <code>show_handles</code> contains:</p>
<pre tabindex="0"><code>...
handle: 27000838
nr_base_pages: 9643
...
</code></pre><p>And the handle number can be resolved to a stack trace with <code>show_stacks_handles</code>:</p>
<pre tabindex="0"><code>...
get_page_from_freelist+0x1416/0x1600
__alloc_frozen_pages_noprof+0x18c/0x1000
alloc_pages_mpol+0x43/0x100
folio_alloc_noprof+0x56/0xa0
page_cache_ra_unbounded+0xd9/0x230
filemap_fault+0x305/0x1000
__do_fault+0x2c/0xb0
__handle_mm_fault+0x6f4/0xeb0
handle_mm_fault+0xd9/0x210
do_user_addr_fault+0x205/0x600
exc_page_fault+0x61/0x130
asm_exc_page_fault+0x26/0x30
handle: 27000838
...
</code></pre><h2 id="comparison-show_stacks-vs-show_handles">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#comparison-show_stacks-vs-show_handles">
Comparison: <code>show_stacks</code> vs. <code>show_handles</code>
</a>
</h2>
<p>From the previous post, for <code>show_stacks</code>:</p>
<pre tabindex="0"><code># time cat /sys/kernel/debug/page_owner_stacks/show_stacks \
| wc --bytes | numfmt --to=iec
402K
real 0m0.042s
user 0m0.004s
sys 0m0.046s
</code></pre><p>Now, for <code>show_handles</code>:</p>
<div class="highlight"><pre tabindex="0"><code class="language-shell"><span><span><span># time cat /sys/kernel/debug/page_owner_stacks/show_handles \</span>
</span></span><span><span> | wc --bytes | numfmt --to<span>=</span>iec
</span></span><span><span>31K
</span></span><span><span>
</span></span><span><span>real 0m0.015s
</span></span><span><span>user 0m0.004s
</span></span><span><span>sys 0m0.019s
</span></span></code></pre></div><p>That is only 7.7% of the size and 35.7% of the time! Nice improvements.</p>
<p>Finally, compare the execution time of <a href="https://mfo.dev.br/posts/2026-02-23-page_owner-part-2-optimizing-output/#script-3-show_handles-over-timepy">script <code>show_handles-over-time.py</code></a> with the previous one; it uses handle numbers as keys for stack traces instead of hashing them.</p>
<p>For 100 copies, the execution time is approximately 1/3 of a second – roughly 3 times faster than the hashing-based script:</p>
<div class="highlight"><pre tabindex="0"><code class="language-shell"><span><span><span># time ./show_handles-over-time.py show_stacks_handles show_handles.ln.{1..100} >/dev/null</span>
</span></span><span><span>
</span></span><span><span>real 0m0.348s
</span></span><span><span>user 0m0.319s
</span></span><span><span>sys 0m0.030s
</span></span></code></pre></div><h1 id="conclusion">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#conclusion">
Conclusion
</a>
</h1>
<p>The original debugfs files provided by <code>page_owner</code> consist mainly of stack traces, which isn’t an efficient format for reading and processing repeatedly.</p>
<p>In order to store the number of pages used per stack trace over time, the stack traces must be converted to keys, for which hashing can be used. However, even efficient hashing algorithms incur a significant overhead.</p>
<p>In order to address this issue, Linux 6.19 provides new debugfs files for <code>page_owner</code> with <em>handle numbers</em>, which are unique identifiers for stack traces and can be used as keys, instead of hashing.</p>
<p>This optimizes the reading and processing of <code>page_owner</code> information, as it reduces the amount of data copied from kernel to userspace and allows storing the number of pages per stack trace over time without the overhead of hashing.</p>
<h1 id="scripts">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#scripts">
Scripts
</a>
</h1>
<h2 id="script-1-page_owner-to-show_stackspy">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#script-1-page_owner-to-show_stackspy">
Script 1: <code>page_owner-to-show_stacks.py</code>
</a>
</h2>
<div class="highlight"><pre tabindex="0"><code class="language-python"><span><span><span>#!/usr/bin/env python3</span>
</span></span><span><span><span># SPDX-License-Identifier: GPL-2.0</span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Script to parse /sys/kernel/debug/page_owner, hashing the stack trace</span>
</span></span><span><span><span># of each page and accumulating the number of pages per stack trace.</span>
</span></span><span><span><span># At the end, print all stack traces and their number of pages in a format</span>
</span></span><span><span><span># like /sys/kernel/debug/page_owner_stacks/show_stacks.</span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Usage: page_owner-to-show_stacks.py </sys/kernel/debug/page_owner</span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Author: Mauricio Faria de Oliveira <[email protected]></span>
</span></span><span><span>
</span></span><span><span><span>import</span> re
</span></span><span><span><span>import</span> sys
</span></span><span><span><span>import</span> xxhash
</span></span><span><span>
</span></span><span><span>re_page <span>=</span> re<span>.</span>compile(<span>'^Page allocated via order ([0-9]+)'</span>)
</span></span><span><span>re_stack <span>=</span> re<span>.</span>compile(<span>'^ '</span>)
</span></span><span><span>re_empty <span>=</span> re<span>.</span>compile(<span>'^$'</span>)
</span></span><span><span>
</span></span><span><span>pages <span>=</span> {} <span># key -> number of pages</span>
</span></span><span><span>stacks <span>=</span> {} <span># key -> stack trace</span>
</span></span><span><span>
</span></span><span><span><span>for</span> line <span>in</span> sys<span>.</span>stdin:
</span></span><span><span>
</span></span><span><span> <span># middle lines: try stack trace first as it occurs more often</span>
</span></span><span><span> <span>if</span> re_stack<span>.</span><span>match</span>(line):
</span></span><span><span> stack <span>=</span> stack <span>+</span> line
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> <span># first line</span>
</span></span><span><span> <span>match</span> <span>=</span> re_page<span>.</span><span>match</span>(line)
</span></span><span><span> <span>if</span> <span>match</span>:
</span></span><span><span> order <span>=</span> int(<span>match</span><span>.</span>group(<span>1</span>));
</span></span><span><span> stack <span>=</span> <span>''</span>
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> <span># last line</span>
</span></span><span><span> <span>if</span> re_empty<span>.</span><span>match</span>(line):
</span></span><span><span> key <span>=</span> xxhash<span>.</span>xxh3_64_hexdigest(stack)
</span></span><span><span> nr_pages <span>=</span> <span>2</span> <span>**</span> order
</span></span><span><span>
</span></span><span><span> <span>if</span> key <span>in</span> pages:
</span></span><span><span> pages[key] <span>+=</span> nr_pages
</span></span><span><span> <span>else</span>:
</span></span><span><span> pages[key] <span>=</span> nr_pages
</span></span><span><span> stacks[key] <span>=</span> stack
</span></span><span><span>
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span><span>for</span> key <span>in</span> stacks<span>.</span>keys():
</span></span><span><span> print(<span>" "</span> <span>+</span> stacks[key]<span>.</span>strip())
</span></span><span><span> print(<span>"nr_base_pages: "</span> <span>+</span> str(pages[key]))
</span></span><span><span> print()
</span></span></code></pre></div><h2 id="script-2-show_stacks-over-timepy">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#script-2-show_stacks-over-timepy">
Script 2: <code>show_stacks-over-time.py</code>
</a>
</h2>
<div class="highlight"><pre tabindex="0"><code class="language-python"><span><span><span>#!/usr/bin/env python3</span>
</span></span><span><span><span># SPDX-License-Identifier: GPL-2.0</span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Script to parse /sys/kernel/debug/page_owner_stacks/show_stacks in multiple</span>
</span></span><span><span><span># reads, hashing each stack trace and recording the number of base pages per</span>
</span></span><span><span><span># stack trace in each read.</span>
</span></span><span><span><span># At the end, print all stack traces and their number of pages in each read.</span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Usage: show_stacks-over-time.py <read1> <read2> <read3> ... <read N></span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Author: Mauricio Faria de Oliveira <[email protected]></span>
</span></span><span><span>
</span></span><span><span><span>import</span> re
</span></span><span><span><span>import</span> sys
</span></span><span><span><span>import</span> xxhash
</span></span><span><span>
</span></span><span><span>re_pages <span>=</span> re<span>.</span>compile(<span>'^nr_base_pages: ([0-9]+)'</span>)
</span></span><span><span>re_stack <span>=</span> re<span>.</span>compile(<span>'^ '</span>)
</span></span><span><span>re_empty <span>=</span> re<span>.</span>compile(<span>'^$'</span>)
</span></span><span><span>
</span></span><span><span>stacks <span>=</span> {} <span># key -> stack trace (all reads)</span>
</span></span><span><span>pages <span>=</span> {} <span># key -> array of number of pages (per read)</span>
</span></span><span><span>read <span>=</span> <span>0</span> <span># number of the current read</span>
</span></span><span><span>
</span></span><span><span><span>if</span> len(sys<span>.</span>argv) <span><</span> <span>2</span>:
</span></span><span><span> exit(<span>1</span>)
</span></span><span><span>
</span></span><span><span>files <span>=</span> sys<span>.</span>argv[<span>1</span>:]
</span></span><span><span>nr_files <span>=</span> len(files)
</span></span><span><span>
</span></span><span><span><span>for</span> file <span>in</span> files:
</span></span><span><span> <span>with</span> open(file, <span>'r'</span>) <span>as</span> fd:
</span></span><span><span> stack <span>=</span> <span>''</span>
</span></span><span><span> <span>for</span> line <span>in</span> fd:
</span></span><span><span>
</span></span><span><span> <span># first lines</span>
</span></span><span><span> <span>if</span> re_stack<span>.</span><span>match</span>(line):
</span></span><span><span> stack <span>=</span> stack <span>+</span> line
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> <span># next to last line</span>
</span></span><span><span> <span>match</span> <span>=</span> re_pages<span>.</span><span>match</span>(line)
</span></span><span><span> <span>if</span> <span>match</span>:
</span></span><span><span> nr_pages <span>=</span> int(<span>match</span><span>.</span>group(<span>1</span>));
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> <span># last line</span>
</span></span><span><span> <span>if</span> re_empty<span>.</span><span>match</span>(line):
</span></span><span><span> key <span>=</span> xxhash<span>.</span>xxh3_64_hexdigest(stack)
</span></span><span><span>
</span></span><span><span> <span>if</span> key <span>not</span> <span>in</span> stacks:
</span></span><span><span> stacks[key] <span>=</span> stack;
</span></span><span><span>
</span></span><span><span> <span>if</span> key <span>not</span> <span>in</span> pages:
</span></span><span><span> pages[key] <span>=</span> {}
</span></span><span><span>
</span></span><span><span> pages[key][read] <span>=</span> nr_pages
</span></span><span><span>
</span></span><span><span> stack <span>=</span> <span>''</span>
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> read <span>+=</span> <span>1</span>
</span></span><span><span>
</span></span><span><span><span>for</span> key <span>in</span> stacks<span>.</span>keys():
</span></span><span><span> print(<span>" "</span> <span>+</span> stacks[key]<span>.</span>strip())
</span></span><span><span>
</span></span><span><span> pages_per_read <span>=</span> []
</span></span><span><span> <span>for</span> read <span>in</span> range(nr_files):
</span></span><span><span> nr_pages <span>=</span> <span>0</span>
</span></span><span><span> <span>if</span> read <span>in</span> pages[key]:
</span></span><span><span> nr_pages <span>=</span> pages[key][read]
</span></span><span><span> pages_per_read<span>.</span>append(str(nr_pages))
</span></span><span><span>
</span></span><span><span> print(<span>' '</span><span>.</span>join(pages_per_read))
</span></span><span><span> print()
</span></span></code></pre></div><h2 id="script-3-show_handles-over-timepy">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#script-3-show_handles-over-timepy">
Script 3: <code>show_handles-over-time.py</code>
</a>
</h2>
<div class="highlight"><pre tabindex="0"><code class="language-python"><span><span><span>#!/usr/bin/env python3</span>
</span></span><span><span><span># SPDX-License-Identifier: GPL-2.0</span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Script to parse /sys/kernel/debug/page_owner_stacks/show_handles in multiple</span>
</span></span><span><span><span># reads, collecting handle numbers and recording the number of base pages per</span>
</span></span><span><span><span># handle number in each read.</span>
</span></span><span><span><span># At the end, print all stack traces and their number of pages in each read,</span>
</span></span><span><span><span># resolving handle numbers with /sys/kernel/debug/page_owner_stacks/show_stacks_handles.</span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Usage: show_handles-over-time.py <show_stacks_handles> <read1> <read2> <read3> ... <read N></span>
</span></span><span><span><span>#</span>
</span></span><span><span><span># Author: Mauricio Faria de Oliveira <[email protected]></span>
</span></span><span><span>
</span></span><span><span><span>import</span> re
</span></span><span><span><span>import</span> sys
</span></span><span><span>
</span></span><span><span>re_pages <span>=</span> re<span>.</span>compile(<span>'^nr_base_pages: ([0-9]+)'</span>)
</span></span><span><span>re_stack <span>=</span> re<span>.</span>compile(<span>'^ '</span>)
</span></span><span><span>re_empty <span>=</span> re<span>.</span>compile(<span>'^$'</span>)
</span></span><span><span>re_handle <span>=</span> re<span>.</span>compile(<span>'^handle: ([0-9]+)'</span>)
</span></span><span><span>
</span></span><span><span>stacks <span>=</span> {} <span># handle number -> stack trace (all reads)</span>
</span></span><span><span>pages <span>=</span> {} <span># handle number -> array of number of pages (per read)</span>
</span></span><span><span>read <span>=</span> <span>0</span> <span># number of the current read</span>
</span></span><span><span>
</span></span><span><span><span>if</span> len(sys<span>.</span>argv) <span><</span> <span>3</span>:
</span></span><span><span> exit(<span>1</span>)
</span></span><span><span>
</span></span><span><span>resolver <span>=</span> sys<span>.</span>argv[<span>1</span>]
</span></span><span><span>files <span>=</span> sys<span>.</span>argv[<span>2</span>:]
</span></span><span><span>nr_files <span>=</span> len(files)
</span></span><span><span>
</span></span><span><span><span>for</span> file <span>in</span> files:
</span></span><span><span> <span>with</span> open(file, <span>'r'</span>) <span>as</span> fd:
</span></span><span><span> <span>for</span> line <span>in</span> fd:
</span></span><span><span>
</span></span><span><span> <span># first line</span>
</span></span><span><span> <span>match</span> <span>=</span> re_handle<span>.</span><span>match</span>(line)
</span></span><span><span> <span>if</span> <span>match</span>:
</span></span><span><span> handle <span>=</span> int(<span>match</span><span>.</span>group(<span>1</span>))
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> <span># next to last line</span>
</span></span><span><span> <span>match</span> <span>=</span> re_pages<span>.</span><span>match</span>(line)
</span></span><span><span> <span>if</span> <span>match</span>:
</span></span><span><span> nr_pages <span>=</span> int(<span>match</span><span>.</span>group(<span>1</span>));
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> <span># last line</span>
</span></span><span><span> <span>if</span> re_empty<span>.</span><span>match</span>(line):
</span></span><span><span> key <span>=</span> handle
</span></span><span><span>
</span></span><span><span> <span>if</span> key <span>not</span> <span>in</span> pages:
</span></span><span><span> pages[key] <span>=</span> {}
</span></span><span><span>
</span></span><span><span> pages[key][read] <span>=</span> nr_pages
</span></span><span><span>
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> read <span>+=</span> <span>1</span>
</span></span><span><span>
</span></span><span><span><span>with</span> open(resolver, <span>'r'</span>) <span>as</span> fd:
</span></span><span><span> stack <span>=</span> <span>''</span>
</span></span><span><span>
</span></span><span><span> <span>for</span> line <span>in</span> fd:
</span></span><span><span>
</span></span><span><span> <span># first line</span>
</span></span><span><span> <span>if</span> re_stack<span>.</span><span>match</span>(line):
</span></span><span><span> stack <span>=</span> stack <span>+</span> line
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> <span># next to last line</span>
</span></span><span><span> <span>match</span> <span>=</span> re_handle<span>.</span><span>match</span>(line)
</span></span><span><span> <span>if</span> <span>match</span>:
</span></span><span><span> handle <span>=</span> int(<span>match</span><span>.</span>group(<span>1</span>))
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span> <span># last line</span>
</span></span><span><span> <span>if</span> re_empty<span>.</span><span>match</span>(line):
</span></span><span><span> stacks[handle] <span>=</span> stack
</span></span><span><span> stack <span>=</span> <span>''</span>
</span></span><span><span> <span>continue</span>
</span></span><span><span>
</span></span><span><span><span>for</span> key <span>in</span> pages<span>.</span>keys():
</span></span><span><span> print(<span>" "</span> <span>+</span> stacks[key]<span>.</span>strip())
</span></span><span><span>
</span></span><span><span> pages_per_read <span>=</span> []
</span></span><span><span> <span>for</span> read <span>in</span> range(nr_files):
</span></span><span><span> nr_pages <span>=</span> <span>0</span>
</span></span><span><span> <span>if</span> read <span>in</span> pages[key]:
</span></span><span><span> nr_pages <span>=</span> pages[key][read]
</span></span><span><span> pages_per_read<span>.</span>append(str(nr_pages))
</span></span><span><span>
</span></span><span><span> print(<span>' '</span><span>.</span>join(pages_per_read))
</span></span><span><span> print()
</span></span></code></pre></div> Mauricio Faria de Oliveirahttps://mfo.dev.br/tags/igalia/Mauricio Faria de Oliveira: page_owner Part 1: a quick introductionhttps://mfo.dev.br/posts/2026-02-23-page_owner-part-1-quick-introduction/2026-02-20T00:00:00+00:00
<p>This blog post is <a href="https://mfo.dev.br/tags/page_owner/">part of a series</a> about the <code>page_owner</code> debug feature in the Linux memory management subsystem, related to the talk <em><a href="http://www.youtube.com/watch?v=qFdjO3t5F9I">Improving <code>page_owner</code> for profiling and monitoring memory usage per allocation stack trace</a></em> presented at <a href="https://lpc.events/event/19/contributions/2202/">Linux Plumbers Conference 2025</a>.</p>
<h1 id="what-is-page_owner">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#what-is-page_owner">
What is <code>page_owner</code>?
</a>
</h1>
<p>In the Linux kernel, <code>page_owner</code> is a debug feature that tracks the memory allocation (and release) of pages in the system – so as to tell the ‘<em>owner of a page</em>’ ;-).</p>
<p>For each memory allocation, <code>page_owner</code> stores its order, <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/gfp_types.h?h=v6.19">GFP flags</a>, stack trace, timestamp, command, process ID (PID) and thread-group ID (TGID), and more. It also stores some information when pages are freed (stack trace, timestamp, PID and TGID).</p>
<p>With <code>page_owner</code>, one can find out “<em>What allocated this page?</em>” and “<em>How many pages are allocated by this particular stack trace, PID, or comm?</em>”, for example.</p>
<p>This is <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_owner.c?h=v6.19#n24">struct page_owner</a> in Linux <code>v6.19</code>. It stores additional information per-page, as an extension of <code>struct page</code> with <code>CONFIG_PAGE_EXTENSION</code>.</p>
<div class="highlight"><pre tabindex="0"><code class="language-c"><span><span><span>struct</span> page_owner {
</span></span><span><span> <span>unsigned</span> <span>short</span> order;
</span></span><span><span> <span>short</span> last_migrate_reason;
</span></span><span><span> <span>gfp_t</span> gfp_mask;
</span></span><span><span> <span>depot_stack_handle_t</span> handle;
</span></span><span><span> <span>depot_stack_handle_t</span> free_handle;
</span></span><span><span> u64 ts_nsec;
</span></span><span><span> u64 free_ts_nsec;
</span></span><span><span> <span>char</span> comm[TASK_COMM_LEN];
</span></span><span><span> <span>pid_t</span> pid;
</span></span><span><span> <span>pid_t</span> tgid;
</span></span><span><span> <span>pid_t</span> free_pid;
</span></span><span><span> <span>pid_t</span> free_tgid;
</span></span><span><span>};
</span></span></code></pre></div><h1 id="usage">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#usage">
Usage
</a>
</h1>
<p>In order to use <code>page_owner</code>, build the kernel with <code>CONFIG_PAGE_OWNER=y</code> (see <code>mm/Kconfig.debug</code>) and boot the kernel with <code>page_owner=on</code>.</p>
<p>The debugfs file <code>/sys/kernel/debug/page_owner</code> provides the information in <code>struct page_owner</code> for every page, listed per <code>PFN</code> (page frame number).</p>
<p>This example shows the entry for a page (line continuation added for clarity) – it tells “<em>What allocated this page?</em>”:</p>
<pre tabindex="0"><code># cat /sys/kernel/debug/page_owner
...
Page allocated via order 0, \
mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), \
pid 5640, tgid 5640 (stress-ng-brk), ts 414987114269 ns
PFN 0x114 type Unmovable Block 0 type Unmovable Flags 0x200(workingset|node=0|zone=0)
get_page_from_freelist+0x1416/0x1600
__alloc_frozen_pages_noprof+0x18c/0x1000
alloc_pages_mpol+0x43/0x100
new_slab+0x349/0x460
___slab_alloc+0x811/0xd90
__kmem_cache_alloc_bulk+0xb8/0x1f0
__prefill_sheaf_pfmemalloc+0x42/0x90
kmem_cache_prefill_sheaf+0xa9/0x240
mas_preallocate+0x32f/0x420
__split_vma+0xdc/0x300
vms_gather_munmap_vmas+0xa4/0x240
do_vmi_align_munmap+0xe9/0x180
do_vmi_munmap+0xcb/0x160
__vm_munmap+0xa7/0x150
__x64_sys_munmap+0x16/0x20
do_syscall_64+0xa4/0x310
...
</code></pre><p>One can use <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/mm/page_owner_sort.c?h=v6.19">tools/mm/page_owner_sort</a> to process the information in the file, or come up with custom commands, scripts, or programs.</p>
<p>For example: calculate the total size of pages allocated by <code>stress-ng-brk</code> with any order, in MiB:</p>
<div class="highlight"><pre tabindex="0"><code class="language-shell"><span><span><span># COMM=stress-ng-brk</span>
</span></span><span><span><span># cat /sys/kernel/debug/page_owner \</span>
</span></span><span><span> | awk -F <span>'[ ,]'</span> <span>\
</span></span></span><span><span><span></span> <span>'/^Page allocated via order .* \('</span><span>${</span>COMM<span>}</span><span>'\)/ { PAGES+=2^$5 }
</span></span></span><span><span><span> END { print PAGES*4096/2**20 " MiB" }'</span>
</span></span><span><span>0.0429688 MiB
</span></span></code></pre></div><p>More information about <code>page_owner</code> is available in <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/mm/page_owner.rst?h=v6.19">Documentation/mm/page_owner.rst</a>.</p>
<h1 id="problem-output-size">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#problem-output-size">
Problem: output size
</a>
</h1>
<p>In the <code>page_owner</code> file, note the significant amount of text that is produced <em>per-page</em>: 745 bytes, in the example above.</p>
<p>Considering a system with 1 GiB of RAM and 4 kB pages, fully allocated, with similarly sized entries per page, the output size might reach approximately 186 MiB! (<code>745 [bytes/page] * (2**30 [bytes of RAM] / 4096 [bytes/page]) / 2**20 [bytes/MiB]</code>)</p>
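<p>Spelling that estimate out:</p>

```python
bytes_per_entry = 745            # size of the example entry above
nr_pages = 2**30 // 4096         # pages in 1 GiB of RAM with 4 KiB pages
total_mib = bytes_per_entry * nr_pages / 2**20
# total_mib == 186.25
```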
<p>For validation, a test VM with 1 GiB of RAM after just a warm-up level of stress (<code>stress-ng --sequential --timeout 1</code>) produced 125 MiB, which was not quick to read even in idle state:</p>
<pre tabindex="0"><code># time cat /sys/kernel/debug/page_owner \
| wc --bytes | numfmt --to=iec
125M
real 0m3.009s
user 0m0.512s
sys 0m3.542s
</code></pre><p>While this might not be a serious issue for reading and processing the file only once, it can likely impact a sequence of operations.</p>
<h1 id="alternative-optimized-output">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#alternative-optimized-output">
Alternative: optimized output
</a>
</h1>
<p>Fortunately, another debugfs file, <code>/sys/kernel/debug/page_owner_stacks/show_stacks</code>, provides an optimized output for obtaining the memory usage per stack trace. Even though it doesn’t address all the needs that the generic output does, it resembles the default operation of <code>page_owner_sort</code> (without <code>PFN</code> lines) and provides information that is often useful for kernel development or analysis.</p>
<p>This example shows the entry for a stack trace – it tells “<em>How many pages are allocated by this particular stack trace?</em>”</p>
<div class="highlight"><pre tabindex="0"><code class="language-shell"><span><span><span># cat /sys/kernel/debug/page_owner_stacks/show_stacks</span>
</span></span><span><span>...
</span></span><span><span> get_page_from_freelist+0x1416/0x1600
</span></span><span><span> __alloc_frozen_pages_noprof+0x18c/0x1000
</span></span><span><span> alloc_pages_mpol+0x43/0x100
</span></span><span><span> folio_alloc_noprof+0x56/0xa0
</span></span><span><span> page_cache_ra_unbounded+0xd9/0x230
</span></span><span><span> filemap_fault+0x305/0x1000
</span></span><span><span> __do_fault+0x2c/0xb0
</span></span><span><span> __handle_mm_fault+0x6f4/0xeb0
</span></span><span><span> handle_mm_fault+0xd9/0x210
</span></span><span><span> do_user_addr_fault+0x205/0x600
</span></span><span><span> exc_page_fault+0x61/0x130
</span></span><span><span> asm_exc_page_fault+0x26/0x30
</span></span><span><span>nr_base_pages: <span>9643</span>
</span></span><span><span>
</span></span><span><span>...
</span></span></code></pre></div><p>The <code>nr_base_pages</code> field tells the number of base pages (i.e., not huge pages) allocated by a stack trace. So, this particular stack trace for <em>readahead</em> (<code>page_cache_ra_unbounded()</code>) has allocated approximately 37 MiB (<code>9643 [pages] * 4096 [bytes/page] / 2**20 [bytes/MiB]</code>).</p>
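<p>To illustrate, here is a sketch of how this output could be post-processed to rank stack traces by allocated memory. The parser is my own assumption about the entry format, based only on the example above (blank-line-separated entries ending in <code>nr_base_pages: N</code>); it is not a tool shipped with the kernel:</p>

```python
# Hypothetical helper: rank show_stacks entries by nr_base_pages.
# Assumes entries are separated by blank lines and each ends with a
# "nr_base_pages: N" line, as in the example output above.
PAGE_SIZE = 4096

def top_stacks(text, limit=5):
    results = []
    for block in text.split("\n\n"):
        lines = [l for l in block.splitlines() if l.strip()]
        if not lines or not lines[-1].startswith("nr_base_pages:"):
            continue
        pages = int(lines[-1].split(":", 1)[1])
        mib = pages * PAGE_SIZE / 2**20
        # Label the stack by its first (deepest) frame, without offsets.
        label = lines[0].strip().split("+", 1)[0]
        results.append((mib, label))
    return sorted(results, reverse=True)[:limit]
```

<p>For the entry above, this would report roughly 37.7 MiB under the <code>get_page_from_freelist</code> frame.</p>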
<p>Note this file is more efficient for this particular purpose: just 402 KiB in less than 0.05 seconds (that is 0.3% of the size and 1.7% of the time):</p>
<pre tabindex="0"><code># time cat /sys/kernel/debug/page_owner_stacks/show_stacks \
| wc --bytes | numfmt --to=iec
402K
real 0m0.042s
user 0m0.004s
sys 0m0.046s
</code></pre><h1 id="conclusion">
<a class="header-link" href="https://mfo.dev.br/tags/igalia/index.xml#conclusion">
Conclusion
</a>
</h1>
<p>The <code>page_owner</code> debug feature (enabled with <code>CONFIG_PAGE_OWNER=y</code> and <code>page_owner=on</code>) provides information about the memory allocation of pages in the system in debugfs files <code>/sys/kernel/debug/page_owner</code> with a generic format (dense description per-page) and <code>/sys/kernel/debug/page_owner_stacks/show_stacks</code> with an optimized format (number of base pages per stack trace).</p> Mauricio Faria de Oliveirahttps://mfo.dev.br/tags/igalia/Alex Bradbury: Minipost: Additional figures for per-query energy consumption of LLMshttps://muxup.com/2026q1/minipost-additional-figures-for-per-query-energy-consumption-of-LLMs2026-02-17T12:00:00+00:00
<p>Last month I wrote up a fairly long piece on <a href="https://muxup.com/2026q1/per-query-energy-consumption-of-llms">per-query energy consumption of
LLMs using the data from
InferenceMAX</a> (note:
InferenceMAX has since been renamed to InferenceX). Much of the write-up was
dedicated to exploring what you can actually conclude from these figures and
how that interacts with some of the implementation decisions in the benchmark,
but I feel the results still give a useful yardstick. Beyond concerns about
overly-specialised serving engine configurations and whether the workload is
representative of real-world model serving in a paid API host, the other
obvious limitation is that InferenceMAX is only testing GPT-OSS 120b and
DeepSeek R1 0528 when there is a world of other models out there. I dutifully
added "run my own tests using other models" to the todo list and here we are.
By "here we are" I of course mean I made no progress towards that goal but
<a href="https://muellerzr.github.io/">Zach Mueller</a> at <a href="https://lambda.ai/">Lambda</a>
started publishing <a href="https://lambda.ai/inference-models">model cards with the needed
data</a> - thanks Zach!</p>
<p>The setup for Lambda is simple - each model card lists the observed token
generation throughput and total throughput (along with other stats) for an
input sequence length / output sequence length (ISL/OSL) of 8192/1024, as
benchmarked using <code>vllm bench serve</code>. The command used to serve the LLM (using
sglang or vllm depending on the model) is also given. As a starting point this
is no worse than the InferenceMAX data, and potentially somewhat better due to
figures being taken from a configuration that's not <a href="https://github.com/SemiAnalysisAI/InferenceX/issues/359#issue-3750796719">overly specialised to a
particular query
length</a>.</p>
<p>The figures each Lambda model card gives us that are relevant for calculating
the energy per query are: the hardware used, token generation throughput and
total token throughput (input+output tokens). Other statistics such as the
time to first token, inter-token latency, and parallel requests tested help
confirm whether this is a configuration someone would realistically use. Using
an equivalent methodology to before, we get the Watt hours per query by:</p>
<ul>
<li>Determining the total Watts for the GPU cluster. We take the figures used by
SemiAnalysis (2.17kW for a single B200) and multiply by the number of GPUs.</li>
<li>Calculate the joules per token by dividing this total Watts figure by the
total token throughput. This gives a weighted average of the joules per
token for the measured workload, reflecting the ratio of isl:osl.</li>
<li>Multiply this weighted average of joules per token by the tokens per query
(isl+osl) to get the joules per query. Then divide by 3600 to get Wh.</li>
</ul>
<p>Collecting the data from the individual model cards we can generate the
following (as before, using minutes of PlayStation 5 gameplay as a point of
comparison):</p>
<div class="highlight"><pre><span></span><code><span>data</span> <span>=</span> {
<span>"Qwen/Qwen3.5-397B-A17B"</span>: {
<span>"num_b200"</span>: <span>8</span>,
<span>"total_throughput"</span>: <span>11092</span>,
},
<span>"MiniMaxAI/MiniMax-M2.5"</span>: {
<span>"num_b200"</span>: <span>2</span>,
<span>"total_throughput"</span>: <span>8062</span>,
},
<span>"zai-org/GLM-5-FP8"</span>: {
<span>"num_b200"</span>: <span>8</span>,
<span>"total_throughput"</span>: <span>6300</span>,
},
<span>"zai-org/GLM-4.7-Flash"</span>: {
<span>"num_b200"</span>: <span>1</span>,
<span>"total_throughput"</span>: <span>8125</span>,
},
<span>"arcee-ai/Trinity-Large-Preview"</span>: {
<span>"num_b200"</span>: <span>8</span>,
<span>"total_throughput"</span>: <span>15611</span>,
},
}
<span># 8192 + 1024</span>
<span>TOKENS_PER_QUERY</span> <span>=</span> <span>9216</span>
<span># Taken from <https://inferencex.semianalysis.com/></span>
<span>B200_KW</span> <span>=</span> <span>2.17</span>
<span># Reference power draw for PS5 playing a game. Taken from</span>
<span># <https://www.playstation.com/en-gb/legal/ecodesign/> ("Active Power</span>
<span># Consumption"). Ranges from ~217W to ~197W depending on model.</span>
<span>PS5_KW</span> <span>=</span> <span>0.2</span>
<span>def</span> <span>wh_per_query</span>(<span>num_b200</span>, <span>total_throughput</span>, <span>tokens_per_query</span>):
<span>total_cluster_kw</span> <span>=</span> <span>num_b200</span> <span>*</span> <span>B200_KW</span>
<span>total_cluster_watts</span> <span>=</span> <span>total_cluster_kw</span> <span>*</span> <span>1000</span>
<span># joules_per_token is a weighted average for the measured mix of input</span>
<span># and output tokens.</span>
<span>joules_per_token</span> <span>=</span> <span>total_cluster_watts</span> <span>/</span> <span>total_throughput</span>
<span>joules_per_query</span> <span>=</span> <span>joules_per_token</span> <span>*</span> <span>tokens_per_query</span>
<span># Convert joules to watt-hours</span>
<span>return</span> <span>joules_per_query</span> <span>/</span> <span>3600.0</span>
<span>def</span> <span>ps5_minutes</span>(<span>wh</span>):
<span>ps5_watts</span> <span>=</span> <span>PS5_KW</span> <span>*</span> <span>1000</span>
<span>return</span> (<span>wh</span> <span>/</span> <span>ps5_watts</span>) <span>*</span> <span>60.0</span>
<span>MODEL_WIDTH</span> <span>=</span> <span>31</span>
<span>WH_WIDTH</span> <span>=</span> <span>8</span>
<span>PS5_WIDTH</span> <span>=</span> <span>8</span>
<span>header</span> <span>=</span> <span>f"{'Model':<{</span><span>MODEL_WIDTH</span><span>}} | {'Wh/q':<{</span><span>WH_WIDTH</span><span>}} | {'PS5 min':<{</span><span>PS5_WIDTH</span><span>}}"</span>
<span>separator</span> <span>=</span> <span>f"{'-'</span> <span>*</span> <span>MODEL_WIDTH</span><span>} | {'-'</span> <span>*</span> <span>WH_WIDTH</span><span>} | {'-'</span> <span>*</span> <span>PS5_WIDTH</span><span>}"</span>
<span>print</span>(<span>header</span>)
<span>print</span>(<span>separator</span>)
<span>for</span> <span>model</span>, <span>vals</span> <span>in</span> <span>data.items</span>():
<span>wh</span> <span>=</span> <span>wh_per_query</span>(<span>vals</span>[<span>"num_b200"</span>], <span>vals</span>[<span>"total_throughput"</span>], <span>TOKENS_PER_QUERY</span>)
<span>ps5_min</span> <span>=</span> <span>ps5_minutes</span>(<span>wh</span>)
<span>wh_str</span> <span>=</span> <span>f"{</span><span>wh</span><span>:.2f}"</span> <span>if</span> <span>wh</span> <span><</span> <span>10</span> <span>else</span> <span>f"{</span><span>wh</span><span>:.1f}"</span>
<span>print</span>(<span>f"{</span><span>model.strip</span>()<span>:<{</span><span>MODEL_WIDTH</span><span>}} | {</span><span>wh_str</span><span>:<{</span><span>WH_WIDTH</span><span>}} | {</span><span>ps5_min</span><span>:.2f}"</span>)
</code></pre></div>
<p>This gives the following figures (reordered to show Wh per query in ascending
order, with an added column for interactivity (1/TPOT)):
<table>
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Intvty (tok/s)</th>
<th align="left">Wh/q</th>
<th align="left">PS5 min.</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">zai-org/GLM-4.7-Flash (bf16)</td>
<td align="left">34.0</td>
<td align="left">0.68</td>
<td align="left">0.21</td>
</tr>
<tr>
<td align="left">MiniMaxAI/MiniMax-M2.5 (fp8)</td>
<td align="left">30.3</td>
<td align="left">1.38</td>
<td align="left">0.41</td>
</tr>
<tr>
<td align="left">arcee-ai/Trinity-Large-Preview (bf16)</td>
<td align="left">58.8</td>
<td align="left">2.85</td>
<td align="left">0.85</td>
</tr>
<tr>
<td align="left">Qwen/Qwen3.5-397B-A17B (bf16)</td>
<td align="left">41.7</td>
<td align="left">4.01</td>
<td align="left">1.20</td>
</tr>
<tr>
<td align="left">zai-org/GLM-5-FP8 (fp8)</td>
<td align="left">23.3</td>
<td align="left">7.05</td>
<td align="left">2.12</td>
</tr>
</tbody>
</table>
<p>As a point of comparison, the most efficient 8 GPU deployment of fp8 DeepSeek
R1 0528 from my figures in the <a href="https://muxup.com/2026q1/per-query-energy-consumption-of-llms">previous
article</a> was 3.32 Wh
per query.</p>
<p>And that's all I really have for today. Some interesting datapoints with
hopefully more to come as Lambda puts up more model cards in this format.
There's a range of interesting potential further experiments to do, but for
now, I just wanted to share this initial look.</p>
<hr /><a href="https://muxup.com/feed.xml#article-changelog" class="anchor" tabindex="-1"></a>Article changelog
<ul>
<li>2026-02-17: Initial publication date.</li>
</ul> Alex Bradburyhttps://muxup.comAlex Bradbury: shandboxhttps://muxup.com/shandbox2026-02-11T12:00:00+00:00
<p><a href="https://github.com/muxup/medley/blob/main/shandbox"><code>shandbox</code></a> is a simple
Linux sandboxing script that serves my needs well. Perhaps it works for you
too? No dependencies beyond a shell and util-linux (<code>unshare</code> and <code>nsenter</code>).</p>
<p>In short, it aims to provide fairly good isolation for personal files (i.e.
your <code>$HOME</code>) while being very convenient for day to day use. It's designed to
be run as an unprivileged user - as long as you can make new namespaces you
should be good to go. By default <code>/home/youruser/sandbox</code> shows up as
<code>/home/sandbox</code> within the sandbox, and other than standard paths like <code>/usr</code>,
<code>/etc</code>, <code>/tmp</code>, and so on it's left for you to either copy things into the
sandbox or expose them via a mount. There's a single shared sandbox (i.e.
processes within the sandbox can see and interact with each other, and the
exposed sandbox filesystem is shared as well), which trades off some ease of
use for the security you might get with a larger number of more targeted
sandboxes. On the other hand, you only gain security from a sandbox if you
actually use it and this is a setup that offers very low friction for me. The
network is not namespaced (although this is something you could change with a
simple edit).</p>
<p>Usability is both subjective and highly dependent on your actual use case, so
the tradeoffs may or may not align with what is interesting for you!
<a href="https://github.com/containers/bubblewrap">Bubblewrap</a> is an example of a
mature alternative unprivileged sandboxing
tool that offers a lot of configurability as well as options with greater
degrees of sandboxing. Beyond that, look to
<a href="https://firecracker-microvm.github.io/">Firecracker</a> based solutions or
<a href="https://gvisor.dev/">gvisor</a>. <code>shandbox</code> aims to provide as
reasonable a sandbox as Linux namespaces alone are able to offer, but if
you're looking for a security property stronger than "makes it harder for
something to edit or access unwanted files" it's down to you to both carefully
review its implementation and consider alternatives.</p>
<h2 id="usage-example"><a href="https://muxup.com/feed.xml#usage-example" class="anchor" tabindex="-1"></a>Usage example</h2>
<pre><code>$ shandbox run uvx pycowsay
Installed 1 package in 5ms
------------
< Hello, world >
------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
$ shandbox status
running (pid 1589364)
log:
2026-02-11 13:02:51 stopped
2026-02-11 13:05:06 started (pid 1589289)
$ shandbox add-mount ~/repos/llvm-project /home/sandbox/llvm-project
mounted /home/asb/repos/llvm-project -> /home/sandbox/llvm-project
$ shandbox run touch /home/sandbox/llvm-project/write-attempt
touch: cannot touch '/home/sandbox/llvm-project/write-attempt': Read-only file system
$ shandbox remove-mount /home/sandbox/llvm-project
unmounted /home/sandbox/llvm-project
$ shandbox add-mount --read-write ~/repos/llvm-project /home/sandbox/llvm-project
mounted /home/asb/repos/llvm-project -> /home/sandbox/llvm-project
$ shandbox run touch /home/sandbox/llvm-project/write-attempt
</code></pre>
<p><code>shandbox enter</code> will open a shell within the sandbox for easy interactive
usage. As a convenience, if the current working directory is in
<code>$HOME/sandbox</code> (e.g. <code>$HOME/sandbox/foo</code>) then the working directory within
the sandbox for <code>shandbox run</code> or <code>shandbox enter</code> will be set to the
appropriate path within the sandbox (<code>/home/sandbox/foo</code> in this case). i.e.,
the case where this mapping is trivial. Environment variables are not passed
through.</p>
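<p>The working-directory translation described above can be sketched as follows. This is a hypothetical Python re-implementation, not the script itself (which is shell), and the fallback to the sandbox home when the mapping is non-trivial is my assumption:</p>

```python
import os

# Sketch of shandbox's cwd mapping: a path under $HOME/sandbox maps to the
# corresponding path under /home/sandbox; anything else falls back to the
# sandbox home directory (an assumption, not taken from the script).
def sandbox_cwd(cwd, home="/home/youruser", sb_home="/home/sandbox"):
    sandbox_dir = os.path.join(home, "sandbox")
    if cwd == sandbox_dir or cwd.startswith(sandbox_dir + os.sep):
        return sb_home + cwd[len(sandbox_dir):]
    return sb_home
```

<p>For example, <code>/home/youruser/sandbox/foo</code> maps to <code>/home/sandbox/foo</code>.</p>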
<h2 id="functionality-overview"><a href="https://muxup.com/feed.xml#functionality-overview" class="anchor" tabindex="-1"></a>Functionality overview</h2>
<ul>
<li><code>shandbox start</code>: Start the sandbox, creating the necessary namespaces and
mount layout. Fails if the sandbox is already running.</li>
<li><code>shandbox stop</code>: Stop the sandbox by killing the process holding the
namespaces. Fails if the sandbox is not running.</li>
<li><code>shandbox restart</code>: Stop the sandbox and start it again.</li>
<li><code>shandbox status</code>: Print whether the sandbox is running and if it is, the
pid. Also print the last 20 lines of the log.</li>
<li><code>shandbox enter</code>: Open bash within the sandbox, starting the sandbox first
if it's not already running.</li>
<li><code>shandbox run <command> [args...]</code>: Run a command inside the sandbox. The
current working directory is translated to an in-sandbox path if it falls
within the sandbox home directory. Starts the sandbox first if it isn't
already running.</li>
<li><code>shandbox add-mount [--read-write] <host-path> <sandbox-path></code>: Bind-mount a
host path into the running sandbox. Mounts are read-only by default; pass
<code>--read-write</code> to allow writes. The sandbox must already be running.
Both directories and individual files are supported.</li>
<li><code>shandbox remove-mount <sandbox-path></code>: Remove a previously added bind mount
from the running sandbox.</li>
</ul>
<h2 id="implementation-approach"><a href="https://muxup.com/feed.xml#implementation-approach" class="anchor" tabindex="-1"></a>Implementation approach</h2>
<p>The core sandboxing functionality is provided by the Linux namespaces
functionality exposed by
<a href="https://manpages.debian.org/unstable/util-linux/unshare.1.en.html"><code>unshare</code></a>
and
<a href="https://manpages.debian.org/unstable/util-linux/nsenter.1.en.html"><code>nsenter</code></a>.
The <a href="https://github.com/muxup/medley/blob/main/shandbox">script's
implementation</a> should be
quite readable but I'll try to summarise some key points here.</p>
<p>The goal is that:</p>
<ul>
<li>Within the sandbox, you appear as an unprivileged user, with uid and gid
equal to your usual Linux user.</li>
<li>It should be possible to expose additional files or directories to the
sandbox once it's running.</li>
<li>Applications running within the sandbox have no way (modulo bugs or
vulnerabilities in the kernel or accessible applications) of reaching files
on the host filesystem that aren't explicitly exposed.
<ul>
<li>To underline: This is a goal, it is <em>not</em> a guarantee.</li>
</ul>
</li>
<li>It's possible to launch multiple processes within the sandbox which can all
see each other, and have the same shared sandboxed filesystem.</li>
<li>This is all doable as an unprivileged user.</li>
</ul>
<p>To implement that:</p>
<ul>
<li>Two sets of namespaces are used to provide this isolation: the outer
'shandbox_root' has the user mapped to root within the namespace and retains
access to the standard / (allowing us to mount additional paths into it after the
sandbox has started). The inner 'shandbox_user' represents a new user
namespace mapping our uid/gid to an unprivileged user, but other namespaces
are shared with 'shandbox_root'. Sandboxed processes are launched within the
namespaces of 'shandbox_user'.</li>
<li>The process IDs of the initial processes within 'shandbox_root' and
'shandbox_user' are saved and recalled so the script can use <code>nsenter</code> to
enter their namespaces.</li>
<li>To help make it easier to tell when you're in the sandbox, a dummy
<code>/etc/passwd</code> is bind-mounted naming the current user as <code>sandbox</code>.</li>
<li>When <code>shandbox start</code> is executed, the necessary directories are
bind-mounted into <code>.local/share/shandbox/root</code>, which will be used as
root (<code>/</code>) for the user sandbox. This happens within the shandbox_root
namespace, which then uses <code>unshare</code> again to create a new user namespace
with an unprivileged user, executing within a chroot.</li>
<li>'shandbox_root' retains access to the host filesystem, which is necessary to
allow mounting additional paths after the fact. Without this requirement, we
could likely rewrite <code>shandbox start</code> to use <code>pivot_root</code>.</li>
</ul>
<h2 id="making-it-your-own"><a href="https://muxup.com/feed.xml#making-it-your-own" class="anchor" tabindex="-1"></a>Making it your own</h2>
<p>The script should be straightforward enough to customise to your needs if
they're not too dissimilar to what is offered out of the box. Some variables
at the top provide things you may be more likely to want to change, such as
the home directory location, and a list of files or directories in <code>$HOME</code> to
always bind-mount into the sandbox home:</p>
<div class="highlight"><pre><span></span><code><span>SANDBOX_HOME_DIR=</span><span>"</span><span>$HOME</span><span>/sandbox"</span>
<span>HOME_FILES_TO_MAP=</span><span>".bashrc .vimrc"</span>
<span>HOME_DIRS_TO_MAP=</span><span>".vim bin"</span>
<span>SB_HOME=</span><span>"/home/sandbox"</span>
<span>SB_PATH=</span><span>"</span><span>$SB_HOME</span><span>/bin:/usr/local/bin:/usr/bin"</span>
</code></pre></div>
<hr /><a href="https://muxup.com/feed.xml#article-changelog" class="anchor" tabindex="-1"></a>Article changelog
<ul>
<li>2026-02-11: Initial publication date.</li>
</ul> Alex Bradburyhttps://muxup.comJosé Dapena: Container Timing: measuring web components performancehttps://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/2026-02-10T00:00:00+00:00
<img class="face" src="/images/dape.png" width="74" height="100" alt="" align="right" style="float: right" />
<p>Over the last year, as part of the collaboration between <a href="https://www.igalia.com">Igalia</a> and <a href="https://www.techatbloomberg.com/">Bloomberg</a> to improve web performance observability, I worked on a new web performance API: <strong>Container Timing</strong>. This standard aims to make component-level performance measurement as easy as page-level metrics like LCP and FCP.</p>
<p>My focus has been writing the native implementation in Chromium, which is now available behind a feature flag.</p>
<p>In this post, I will explain why this API is needed, how it works, and how you can experiment with it today. In a follow-up post, I will dive deep into the implementation details within the Blink rendering engine.</p>
<h2 id="the-problem-measuring-component-performance" tabindex="-1">The problem: measuring component performance <a class="header-anchor" href="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/">#</a></h2>
<p>We currently use <a href="https://web.dev/articles/lcp">Largest Contentful Paint (LCP)</a> and <a href="https://web.dev/articles/fcp">First Contentful Paint (FCP)</a> to measure web page loading performance. Both metrics are page-scoped, meaning they evaluate the user-perceived load speed of the full page.</p>
<p>The <a href="https://w3c.github.io/element-timing/">Element Timing API</a> shifts the focus to individual DOM elements. By targeting specific elements, like a hero image or a header, we can measure their rendering performance independently of the rest of the page.</p>
<p>However, modern web development is component-based. Developers build complex widgets (such as grids, charts, feeds or panels) that are made of many elements. It is not trivial to understand the performance of those components:</p>
<ul>
<li>LCP may not be useful, as the painting of another large image elsewhere on the page could delay it.</li>
<li>Measuring a web component with Element Timing may require instrumenting all the significant elements one by one.</li>
</ul>
<p><img src="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/images/container_timing_problem.png" alt="A representation of a news web page, where the scope of LCP is the full web page, and Element Timing is a specific element, but we want to measure the latest news feed widget." class="dark-invert" /></p>
<h2 id="the-solution-container-timing" tabindex="-1">The solution: Container Timing <a class="header-anchor" href="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/">#</a></h2>
<p>This is where <strong>Container Timing</strong> comes in! With the new specification, a web developer can mark subtrees of the DOM as “containers”. The browser then provides performance entries aggregating the painting time of each such subtree.</p>
<p><img src="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/images/container_timing_solution.png" alt="A representation of a news web page, where aggregating the paints of the children of the news feed widget allows to know when its painting has finished." class="dark-invert" /></p>
<p>This way, we can answer: “when did a specific component finish painting its content?”.</p>
<p>Some examples:</p>
<ul>
<li><strong>Breaking down the contributors to the initial page load</strong>: with <strong>Container Timing</strong> we can focus on the components that are most relevant to the user experience.</li>
<li><strong>Single page application navigation</strong>: when a soft navigation shows a new component on the screen, we can obtain painting information for it.</li>
<li><strong>Lazy-loaded components</strong>: Tracking when a widget that loads below the fold is fully visible.</li>
<li><strong>Third-party content</strong>: Monitoring the performance of ads or embedded widgets.</li>
</ul>
<p>You just need to add, to the top element of the subtree, the new attribute <code>containertiming</code>. When you add it to an HTML element, the browser will track all the painting updates of that element and its descendants.</p>
<p>What happens under the hood? The browser will start monitoring the rendering pipeline for paints that contribute to representing the subtree. When a new frame is painted, if it paints new areas of that subtree, the browser reports a performance entry showing the increase in painted area. It is similar to LCP, but for a specific subtree!</p>
<h2 id="how-to-use-container-timing" tabindex="-1">How to use Container Timing? <a class="header-anchor" href="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/">#</a></h2>
<p>Using the API is straightforward. First, mark the containers you want to track in HTML:</p>
<pre class="language-html" tabindex="0"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>my-widget<span class="token punctuation">"</span></span> <span class="token attr-name">containertiming</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>widget-load<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>graph.png<span class="token punctuation">"</span></span> <span class="token punctuation">/></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>Loading data...<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span></code></pre>
<p>Then, use a <code>PerformanceObserver</code> to listen for container entries:</p>
<pre class="language-javascript" tabindex="0"><code class="language-javascript"><span class="token keyword">const</span> observer <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">PerformanceObserver</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token parameter">list</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span><br /> list<span class="token punctuation">.</span><span class="token function">getEntries</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">forEach</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token parameter">entry</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span><br /> console<span class="token punctuation">.</span><span class="token function">log</span><span class="token punctuation">(</span><span class="token template-string"><span class="token template-punctuation string">`</span><span class="token string">Container '</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>entry<span class="token punctuation">.</span>identifier<span class="token interpolation-punctuation punctuation">}</span></span><span class="token string">' painted.</span><span class="token template-punctuation string">`</span></span><span class="token punctuation">)</span><span class="token punctuation">;</span><br /> console<span class="token punctuation">.</span><span class="token function">log</span><span class="token punctuation">(</span><span class="token template-string"><span class="token template-punctuation string">`</span><span class="token string">Time: </span><span class="token interpolation"><span 
class="token interpolation-punctuation punctuation">${</span>entry<span class="token punctuation">.</span>startTime<span class="token interpolation-punctuation punctuation">}</span></span><span class="token template-punctuation string">`</span></span><span class="token punctuation">)</span><span class="token punctuation">;</span><br /> console<span class="token punctuation">.</span><span class="token function">log</span><span class="token punctuation">(</span><span class="token template-string"><span class="token template-punctuation string">`</span><span class="token string">Size: </span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>entry<span class="token punctuation">.</span>size<span class="token interpolation-punctuation punctuation">}</span></span><span class="token template-punctuation string">`</span></span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// The area painted</span><br /> <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br /><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br /><br />observer<span class="token punctuation">.</span><span class="token function">observe</span><span class="token punctuation">(</span><span class="token punctuation">{</span> <span class="token literal-property property">type</span><span class="token operator">:</span> <span class="token string">"container"</span><span class="token punctuation">,</span> <span class="token literal-property property">buffered</span><span class="token operator">:</span> <span class="token boolean">true</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
<p>When the web contents load, new Performance entries will be emitted with the container updates.</p>
<p>Which entry will be interesting? The API lets you choose what best fits your needs! Some ideas:</p>
<ul>
<li>The most important entry could be the last one: the one that increased the painted area for the last time. Something similar to LCP.</li>
<li>Or maybe the last one that contributed a significant size increase?</li>
<li>Or the last one before a user interaction?</li>
</ul>
<h2 id="a-native-implementation-for-chromium" tabindex="-1">A native implementation for Chromium <a class="header-anchor" href="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/">#</a></h2>
<p>In the early stages of the specification, Jason Williams wrote a <a href="https://github.com/bloomberg/container-timing/tree/main/polyfill">polyfill</a> that worked on top of Element Timing. This was very useful for understanding and polishing the kind of information the specification could provide. However, the polyfill had its own performance impact.</p>
<div class="markdown-alert markdown-alert-warning"><p class="markdown-alert-title">Deprecation Notice:</p><p>The polyfill is now deprecated and no longer maintained, as the native API cannot be fully replicated using Element Timing. Please use the native implementation for accurate results.</p>
</div>
<p>So I started a native implementation in Chromium. The main idea was to build on top of the existing Element Timing implementation and add the remaining bits.</p>
<p>In my next blog post I will go through the implementation details. But for this post, it is enough to state the goals of this native implementation:</p>
<ul>
<li>Minimizing the overhead: it should be almost zero when elements are not relevant to <strong>Container Timing</strong>, and very fast and light when they are.</li>
<li>Reusing as much as possible of the existing Element Timing logic.</li>
</ul>
<p>The native implementation has landed and is available in Chromium 144+, but still behind the <code>ContainerTiming</code> feature flag.</p>
<p>You can experiment with this feature locally by passing the following flag to Chromium at startup:</p>
<pre class="language-bash" tabindex="0"><code class="language-bash">chrome --enable-blink-features<span class="token operator">=</span>ContainerTiming</code></pre>
<p>Or you can just enable the “Experimental Web Platform features” flag in <code>chrome://flags</code>.</p>
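<p>Since the feature is still behind a flag, it is worth feature-detecting before observing. Here is a small sketch, using only the standard <code>PerformanceObserver.supportedEntryTypes</code> static property; the <code>supportsContainerTiming</code> helper name is made up:</p>

```javascript
// Sketch: detect whether the engine knows about "container" entries.
// The parameter default lets the helper run (and be tested) outside a
// browser as well.
function supportsContainerTiming(
  supported = globalThis.PerformanceObserver?.supportedEntryTypes ?? []
) {
  return supported.includes("container");
}

if (supportsContainerTiming()) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      console.log(entry.startTime, entry.size);
    }
  });
  observer.observe({ type: "container", buffered: true });
}
```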
<h2 id="upcoming-trials" tabindex="-1">Upcoming trials <a class="header-anchor" href="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/">#</a></h2>
<p>So now it is time to collect feedback from actual web developers.</p>
<p>We have already presented the specification at several conferences (such as <a href="https://www.igalia.com/downloads/slides/josedapenapaz-containertiming.pdf">BlinkOn 20</a> and <a href="https://perfnow.nl/2024/">Performance.now() 2024</a>), and discussions are ongoing in the <a href="https://www.w3.org/webperf/">Web Performance Working Group</a>.</p>
<p>We just <a href="https://groups.google.com/a/chromium.org/g/blink-dev/c/FnM3lweVssM/m/eVhhCtG5AQAJ">announced the Dev Trial in the blink-dev mailing list</a>! The feature is now officially ready for testing.</p>
<p>What’s next? We are also preparing an Origin Trial, which will allow developers to test the specification in production for a subset of their users.</p>
<p>If you want to provide feedback, we are collecting it in the explainer <a href="https://github.com/bloomberg/container-timing/issues">ticket tracker</a>.</p>
<h2 id="wrapping-up" tabindex="-1">Wrapping up <a class="header-anchor" href="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/">#</a></h2>
<p>With Container Timing, you will be able to measure paint timing at the web component level, filling a significant gap in the web performance monitoring landscape.</p>
<p>If you have struggled to find out the ready time of your widgets, just try it! It is available in Chromium Stable today, behind the <code>ContainerTiming</code> feature flag.</p>
<p>And stay tuned! In a follow-up post, I will go through the native implementation details in Chromium.</p>
<h2 id="thanks" tabindex="-1">Thanks! <a class="header-anchor" href="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/">#</a></h2>
<p>This has been done as part of the collaboration between <a href="https://techatbloomberg.com">Bloomberg</a> and <a href="https://www.igalia.com">Igalia</a>. Thanks!</p>
<p><a href="https://www.igalia.com">
<img src="https://blogs.igalia.com/dape/img/igalia_-_500px_-_RGB_-_Feb23-580x210.png" alt="Igalia" />
</a> <a href="https://techatbloomberg.com"><img src="https://blogs.igalia.com/dape/img/Bloomberg-logo-580x117.png" alt="Bloomberg" class="dark-invert" /></a></p>
<h2 id="references" tabindex="-1">References <a class="header-anchor" href="https://blogs.igalia.com/dape/2026/02/10/container-timing-measuring-web-components-performance/">#</a></h2>
<ul>
<li><a href="https://github.com/bloomberg/container-timing">Container Timing explainer</a></li>
<li><a href="https://bloomberg.github.io/container-timing/">Container Timing specification draft</a></li>
<li><a href="https://github.com/bloomberg/container-timing/issues">Ticket tracker for specification discussion</a></li>
<li><a href="https://chromestatus.com/feature/5110962817073152">Chrome status feature: Container Timing</a></li>
<li><a href="https://groups.google.com/a/chromium.org/g/blink-dev/c/FnM3lweVssM/m/eVhhCtG5AQAJ">Container Timing ready for testing announcement in blink-dev</a></li>
<li><a href="https://issues.chromium.org/382422286">Container Timing native implementation Chromium issue</a></li>
</ul> José Dapenahttps://blogs.igalia.com/dape/Igalia WebKit Team: WebKit Igalia Periodical #56https://blogs.igalia.com/webkit/blog/2026/wip-56/2026-02-09T23:21:30+00:00
<p>Update on what happened in WebKit in the week from February 2 to February 9.</p>
<p>
The main event this week was FOSDEM (pun intended), which included
presentations related to WebKit; we also got a batch of stable
and development releases, asynchronous scrolling work, OpenGL
logging, cleanups, and Web Inspector improvements for WPE.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p>While asynchronous scrolling for mouse wheel events was already supported,
scrollbar layers were still being painted on the main thread. This has been
<a rel="external" href="https://commits.webkit.org/306838@main">changed</a> to paint scrollbars on the
scrolling thread instead, which prevents scrollbars from “lagging” behind scrolled
content.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/306987@main">Fixed</a> flickering caused by the
combination of damage tracking and asynchronous scrolling for mouse wheel
events.</p>
</div>
<div class="wip-item">
<p>It is now possible to <a rel="external" href="https://commits.webkit.org/306778@main">enable debug logging for OpenGL
contexts</a> using the new <code>GLContext</code> log
channel, which takes advantage of the message events produced by the
<a rel="external" href="https://wikis.khronos.org/opengl/Debug_Output">widespread KHR_debug
extension</a>.</p>
<p>Figuring out the exact location inside WebKit that triggered an OpenGL issue
may still be challenging with this aid, and therefore <a rel="external" href="https://commits.webkit.org/306862@main">a backtrace will be
appended</a> in case of errors to help
pinpoint the source, when the log channel is enabled at the “debug” level with
<code>GLContext=debug</code>.</p>
</div>
<div class="wip-item">
<p>Configuring the build with <code>USE_SKIA=OFF</code> to make WebKit use the
<a rel="external" href="https://cairographics.org/">Cairo</a> graphics library <a rel="external" href="https://commits.webkit.org/306343@main">is no longer
supported</a>. Using
<a rel="external" href="https://skia.org">Skia</a> has been the default <a rel="external" href="https://blogs.igalia.com/carlosgc/2024/09/27/graphics-improvements-in-webkitgtk-and-wpewebkit-2-46/">since late
2024</a>,
and after two full years the 2.54.0 release (due in September 2026)
will be the first one where the choice is no longer possible.</p>
</div>
<h2 id="webkitgtk-desktop">WebKitGTK 🖥️</h2>
<div class="wip-item">
<p>The “on demand” hardware acceleration policy has been rarely used lately, and
thus support for it has been <a rel="external" href="https://commits.webkit.org/306855@main">removed</a>.
Note that this affects only the GTK port when built with GTK 3—the option never
existed when using GTK 4.</p>
<p>Existing GTK 3 applications that use
<code>WEBKIT_HARDWARE_ACCELERATION_POLICY_ON_DEMAND</code> will continue to work and do
<strong>not</strong> need rebuilding: they will be promoted to use the “always enabled” policy
starting with WebKitGTK 2.54.0 (due in September 2026).</p>
</div>
<h2 id="wpe-webkit-pager">WPE WebKit 📟</h2>
<div class="wip-item">
<p>The Web Inspector <a rel="external" href="https://commits.webkit.org/306914@main">has received
support</a> for saving data to local
files, allowing things such as saving page resources or exporting the network
session to a <a rel="external" href="https://en.wikipedia.org/wiki/HAR_(file_format)">HAR archive</a>.</p>
<p>Note that using the Web Inspector locally is supported when using the
WPEPlatform API, and the keyboard shortcut <kbd title="Control + Shift + I">Ctrl+Shift+I</kbd> may be used to bring it up.</p>
</div>
<h2 id="releases-package">Releases 📦️</h2>
<div class="wip-item">
<p><a rel="external" href="https://webkitgtk.org/2026/02/09/webkitgtk2.50.5-released.html">WebKitGTK
2.50.5</a> and
<a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.50.5.html">WPE WebKit 2.50.5</a> have
been released. These are stable maintenance releases that improve stability,
correct bugs, and fix small rendering issues.</p>
<p>The second release candidates for the upcoming stable branch, <a rel="external" href="https://webkitgtk.org/2026/02/06/webkitgtk2.51.91-released.html">WebKitGTK
2.51.91</a> and
<a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.51.91.html">WPE WebKit 2.51.91</a>,
have been published as well. Those using them to preview the upcoming 2.52.x
series are encouraged to provide <a rel="external" href="https://bugs.webkit.org/">bug reports in
Bugzilla</a> for any issue they may experience.</p>
</div>
<h2 id="community-events-handshake">Community & Events 🤝</h2>
<div class="wip-item">
<p>We have published a <a rel="external" href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">blog
post</a>
on our work implementing the
<a rel="external" href="https://tc39.es/proposal-temporal/docs/">Temporal</a> proposal in JavaScriptCore,
WebKit's JavaScript engine.</p>
</div>
<div class="wip-item">
<p>This year's edition of <a rel="external" href="https://fosdem.org/2026/">FOSDEM</a> took place in
Brussels between January 31st and February 1st, and featured a number of
sessions related to WebKitGTK and WPE:</p>
<ul>
<li><a rel="external" href="https://fosdem.org/2026/schedule/event/8ZL9BZ-web-platform-on-linux-devices-with-webkit/">The Web Platform on Linux devices with WebKit: where are we
now?</a>,
by Mario Sánchez, is a good introduction-level talk about the GTK and WPE
WebKit ports.</li>
<li><a rel="external" href="https://fosdem.org/2026/schedule/event/KMMLGM-webrtc_support_in_webkitgtk_and_wpewebkit_with_gstreamer_current_status_and_plan/">WebRTC support in WebKitGTK and WPEWebKit with GStreamer: Current status and
plans</a>
by Philippe Normand. Exactly what it says on the tin.</li>
<li><a rel="external" href="https://fosdem.org/2026/schedule/event/NJM3KB-mathml-core/">Interop and MathML
Core</a> by Eri
Pazos, about the ongoing effort to improve how different Web engines handle
MathML—including WebKit!</li>
</ul>
<p>The videos for the talks are already available, too.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div> Igalia WebKit Teamhttps://blogs.igalia.com/webkitAndy Wingo: six thoughts on generating chttps://wingolog.org/2026/02/09/six-thoughts-on-generating-c2026-02-09T13:47:44+00:00
<div><p>So I work in compilers, which means that I write programs that translate
programs to programs. Sometimes you will want to target a language at a
higher level than just, like, assembler, and oftentimes C is that
language. Generating C is less fraught than writing C by hand, as the
generator can often avoid the undefined-behavior pitfalls that one has
to be so careful about when writing C by hand. Still, I have found some
patterns that help me get good results.</p><p>Today’s note is a quick summary of things that work for me. I won’t be
so vain as to call them “best practices”, but they are my practices, and
you can have them too if you like.</p><h3>static inline functions enable data abstraction</h3><p>When I learned C, in the early days of
<a href="https://gstreamer.freedesktop.org/">GStreamer</a> (oh bless its heart it
still has the same web page!), we used lots of preprocessor macros.
Mostly we got the message over time that <a href="https://gcc.gnu.org/onlinedocs/gcc/Inline.html">many macro uses should have
been inline functions</a>;
macros are for token-pasting and generating names, not for data access
or other implementation.</p><p>But what I did not appreciate until much later was that always-inline
functions remove any possible performance penalty for data abstractions.
For example, in <a href="https://codeberg.org/andywingo/wastrel">Wastrel</a>, I can
describe a bounded range of WebAssembly memory via a <tt>memory</tt> struct,
and an access to that memory in another struct:</p><pre class="pre-c">struct memory { uintptr_t base; uint64_t size; };
struct access { uint32_t addr; uint32_t len; };
</pre><p>And then if I want a writable pointer to that memory, I can do so:</p><pre class="pre-c">#define static_inline \
static inline __attribute__((always_inline))
static_inline void* write_ptr(struct memory m, struct access a) {
BOUNDS_CHECK(m, a);
char *base = __builtin_assume_aligned((char *) m.base, 4096);
return (void *) (base + a.addr);
}
</pre><p>(Wastrel usually omits any code for <tt>BOUNDS_CHECK</tt>, and just relies on
memory being mapped into a <tt>PROT_NONE</tt> region of an appropriate size.
We use a macro there because if the bounds check fails and kills the
process, it’s nice to be able to use <tt>__FILE__</tt> and <tt>__LINE__</tt>.)</p><p>Regardless of whether explicit bounds checks are enabled, the
<tt>static_inline</tt> attribute ensures that the abstraction cost is entirely
burned away; and in the case where bounds checks are elided, we don’t
need the <tt>size</tt> of the memory or the <tt>len</tt> of the access, so they won’t
be allocated at all.</p><p>If <tt>write_ptr</tt> wasn’t <tt>static_inline</tt>, I would be a little worried that
somewhere one of these <tt>struct</tt> values would get passed through memory.
This is mostly a concern with functions that return structs by value;
whereas in e.g. AArch64, returning a <tt>struct memory</tt> would use the same
registers that a call to <tt>void (*)(struct memory)</tt> would use for the
argument, the x86-64 System V ABI only allocates two general-purpose registers
to be used for return values. I would mostly prefer to not think about
this flavor of bottleneck, and that is what static inline functions do
for me.</p><h3>avoid implicit integer conversions</h3><p>C has an odd set of default integer conversions, for example promoting
<tt>uint8_t</tt> to <tt>signed int</tt>, and also has weird boundary conditions for
signed integers. When generating C, we should probably sidestep these
rules and instead be explicit: define static inline <tt>u8_to_u32</tt>,
<tt>s16_to_s32</tt>, etc. conversion functions, and turn on <tt>-Wconversion</tt>.</p><p>Using static inline cast functions also allows the generated code to assert
that operands are of a particular type. Ideally, you end up in a
situation where all casts are in your helper functions, and no cast is
in generated code.</p><h3>wrap raw pointers and integers with intent</h3><p><a href="https://github.com/wingo/whippet">Whippet</a> is a garbage collector
written in C. A garbage collector cuts across all data abstractions:
objects are sometimes viewed as absolute addresses, or ranges in a paged
space, or offsets from the beginning of an aligned region, and so on.
If you represent all of these concepts with <tt>size_t</tt> or <tt>uintptr_t</tt> or
whatever, you’re going to have a bad time. So Whippet has <a href="https://github.com/wingo/whippet/blob/main/api/gc-ref.h#L9-L11"><tt>struct gc_ref</tt></a>,
<a href="https://github.com/wingo/whippet/blob/main/api/gc-edge.h#L6-L8"><tt>struct gc_edge</tt></a>,
and the like: single-member structs whose purpose it is to avoid
confusion by partitioning sets of applicable operations. A
<tt>gc_edge_address</tt> call will never apply to a <tt>struct gc_ref</tt>, and so on
for other types and operations.</p><p>This is a great pattern for hand-written code, but it’s particularly
powerful for compilers: you will often end up compiling a term of a
known type or kind and you would like to avoid mistakes in the residualized
C.</p><p>For example, when compiling WebAssembly, consider <a href="https://webassembly.github.io/spec/core/exec/instructions.html#xref-syntax-instructions-syntax-instr-struct-mathsf-struct-set-x-i"><tt>struct.set</tt>‘s
operational
semantics</a>:
the textual rendering states, “Assert: Due to validation, <i>val</i> is some
<tt>ref.struct structaddr</tt>.” Wouldn’t it be nice if this assertion could
translate to C? Well in this case it can: with single-inheritance
subtyping (as WebAssembly has), you can make a forest of pointer
subtypes:</p><pre class="pre-c">typedef struct anyref { uintptr_t value; } anyref;
typedef struct eqref { anyref p; } eqref;
typedef struct i31ref { eqref p; } i31ref;
typedef struct arrayref { eqref p; } arrayref;
typedef struct structref { eqref p; } structref;
</pre><p>So for a <tt>(type $type_0 (struct (mut f64)))</tt>, I might generate:</p><pre class="pre-c">typedef struct type_0ref { structref p; } type_0ref;
</pre><p>Then if I generate a field setter for <tt>$type_0</tt>, I make it take a
<tt>type_0ref</tt>:</p><pre class="pre-c">static inline void
type_0_set_field_0(type_0ref obj, double val) {
...
}
</pre><p>In this way the types carry through from source to target language.
There is a similar type forest for the actual object representations:</p><pre class="pre-c">typedef struct wasm_any { uintptr_t type_tag; } wasm_any;
typedef struct wasm_struct { wasm_any p; } wasm_struct;
typedef struct type_0 { wasm_struct p; double field_0; } type_0;
...
</pre><p>And we generate little cast routines to go back and forth between
<tt>type_0ref</tt> and <tt>type_0*</tt> as needed. There is no overhead because all
routines are static inline, and we get pointer subtyping for free: if a
<tt>struct.set $type_0 0</tt> instruction is passed a subtype of <tt>$type_0</tt>, the
compiler can generate an upcast that type-checks.</p><h3>fear not <tt>memcpy</tt></h3><p>In WebAssembly, accesses to linear memory are not necessarily aligned,
so we can’t just cast an address to (say) <tt>int32_t*</tt> and dereference.
Instead we <tt>memcpy(&i32, addr, sizeof(int32_t))</tt>, and trust the compiler
to just emit an unaligned load if it can (and it can). No need for more
words here!</p><h3>for ABI and tail calls, perform manual register allocation</h3><p>So, <a href="https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html#index-musttail-statement-attribute">GCC finally has
<tt>__attribute__((musttail))</tt></a>:
praise be. However, when compiling WebAssembly, it could be that you
end up compiling a function with, like 30 arguments, or 30 return
values; I don’t trust a C compiler to reliably shuffle between different
stack argument needs at tail calls to or from such a function. It could
even refuse to compile a file if it can’t meet its <tt>musttail</tt>
obligations; not a good characteristic for a target language.</p><p>Really you would like it if all function parameters were allocated to
registers. You can ensure this is the case if, say, you only pass the
first <i>n</i> values in registers, and then pass the rest in global
variables. You don’t need to pass them on a stack, because you can make
the callee load them back to locals as part of the prologue.</p><p>What’s fun about this is that it also neatly enables multiple return
values when compiling to C: simply go through the set of function types
used in your program, allocate enough global variables of the right
types to store all return values, and make a function epilogue store any
“excess” return values—those beyond the first return value, if any—in
global variables, and have callers reload those values right after
calls.</p><h3>what’s not to like</h3><p>Generating C is a local optimum: you get the industrial-strength
instruction selection and register allocation of GCC or Clang, you don’t
have to implement many peephole-style optimizations, and you get to link
to possibly-inlinable C runtime routines. It’s hard to improve over
this design point in a marginal way.</p><p>There are drawbacks, of course. As a Schemer, my largest source of
annoyance is that I don’t have control of the stack: I don’t know how
much stack a given function will need, nor can I extend the stack of my
program in any reasonable way. I can’t iterate the stack to precisely
enumerate embedded pointers (<a href="https://wingolog.org/archives/2024/09/07/conservative-gc-can-be-faster-than-precise-gc">but perhaps that’s
fine</a>).
I certainly can’t slice a stack to capture a delimited continuation.</p><p>The other major irritation is about side tables: one would like to be
able to implement so-called <a href="https://devblogs.microsoft.com/oldnewthing/20220228-00/?p=106296">zero-cost
exceptions</a>,
but without support from the compiler and toolchain, it’s impossible.</p><p>And finally, source-level debugging is gnarly. You would like to be
able to embed DWARF information corresponding to the code you
residualize; I don’t know how to do that when generating C.</p><p>(Why not Rust, you ask? Of course you are asking that. For what it is
worth, I have found that lifetimes are a frontend issue; if I had a
source language with explicit lifetimes, I would consider producing
Rust, as I could machine-check that the output has the same guarantees
as the input. Likewise if I were using a Rust standard library. But if
you are compiling <i>from</i> a language without fancy lifetimes, I don’t
know what you would get from Rust: fewer implicit conversions, yes, but
less mature tail call support, longer compile times... it’s a wash, I
think.)</p><p>Oh well. Nothing is perfect, and it’s best to go into things with your
eyes wide open. If you got down to here, I hope these notes help you in
your generations. For me, once my generated C type-checked, it worked:
very little debugging has been necessary. Hacking is not always like
this, but I’ll take it when it comes. Until next time, happy hacking!</p></div> Andy Wingohttps://wingolog.org/Andy Wingo: ahead-of-time wasm gc in wastrelhttps://wingolog.org/2026/02/06/ahead-of-time-wasm-gc-in-wastrel2026-02-06T15:48:17+00:00
<div><p>Hello friends! Today, a quick note: the
<a href="https://codeberg.org/andywingo/wastrel/">Wastrel</a> ahead-of-time
WebAssembly compiler now supports managed memory via garbage collection!</p><h3>hello, world</h3><p>The quickest demo I have is that you should check out and build wastrel
itself:</p><pre>git clone https://codeberg.org/andywingo/wastrel
cd wastrel
guix shell
# alternately: sudo apt install guile-3.0 guile-3.0-dev \
# pkg-config gcc automake autoconf make
autoreconf -vif && ./configure
make -j
</pre><p>Then run a quick check with <a href="https://codeberg.org/andywingo/wastrel/src/branch/main/examples/simple-string.wat">hello, world</a>:</p><pre>$ ./pre-inst-env wastrel examples/simple-string.wat
Hello, world!
</pre><p>Now give a check to
<a href="https://codeberg.org/andywingo/wastrel/src/branch/main/examples/gcbench.wat"><tt>gcbench</tt></a>,
a classic GC micro-benchmark:</p><pre>$ WASTREL_PRINT_STATS=1 ./pre-inst-env wastrel examples/gcbench.wat
Garbage Collector Test
Creating long-lived binary tree of depth 16
Creating a long-lived array of 500000 doubles
Creating 33824 trees of depth 4
Top-down construction: 10.189 msec
Bottom-up construction: 8.629 msec
Creating 8256 trees of depth 6
Top-down construction: 8.075 msec
Bottom-up construction: 8.754 msec
Creating 2052 trees of depth 8
Top-down construction: 7.980 msec
Bottom-up construction: 8.030 msec
Creating 512 trees of depth 10
Top-down construction: 7.719 msec
Bottom-up construction: 9.631 msec
Creating 128 trees of depth 12
Top-down construction: 11.084 msec
Bottom-up construction: 9.315 msec
Creating 32 trees of depth 14
Top-down construction: 9.023 msec
Bottom-up construction: 20.670 msec
Creating 8 trees of depth 16
Top-down construction: 9.212 msec
Bottom-up construction: 9.002 msec
Completed 32 major collections (0 minor).
138.673 ms total time (12.603 stopped); 209.372 ms CPU time (83.327 stopped).
0.368 ms median pause time, 0.512 p95, 0.800 max.
Heap size is 26.739 MB (max 26.739 MB); peak live data 5.548 MB.
</pre><p>We set <tt>WASTREL_PRINT_STATS=1</tt> to get those last 4 lines.
So, this is a microbenchmark: it runs for only 138 ms, and the heap is
tiny (26.7 MB). It does collect 32 times, which is something.</p><h3>is it good?</h3><p>I know what you are thinking: OK, it’s a microbenchmark, but can it tell us anything about how Wastrel compares to V8? Well, probably so:</p><pre>$ guix shell node time -- \
time node js-runtime/run.js -- \
js-runtime/wtf8.wasm examples/gcbench.wasm
Garbage Collector Test
[... some output elided ...]
total_heap_size: 48082944
[...]
0.23user 0.03system 0:00.20elapsed 128%CPU (0avgtext+0avgdata 87844maxresident)k
0inputs+0outputs (0major+13325minor)pagefaults 0swaps
</pre><p>Which is to say, V8 takes more CPU time (230ms vs 209ms) and more
wall-clock time (200ms vs 138ms). Also it uses twice as much
managed memory (48 MB vs 26.7 MB), and more than that for the total
process (88 MB vs 34 MB, not shown).</p><h3>improving on v8, really?</h3><p>Let’s try with
<a href="https://codeberg.org/andywingo/wastrel/src/branch/main/examples/quads.wat"><tt>quads</tt></a>,
which at least has a larger active heap size. This time we’ll compile a binary and then run it:</p><pre>$ ./pre-inst-env wastrel compile -o quads examples/quads.wat
$ WASTREL_PRINT_STATS=1 guix shell time -- time ./quads
Making quad tree of depth 10 (1398101 nodes).
construction: 23.274 msec
Allocating garbage tree of depth 9 (349525 nodes), 60 times, validating live tree each time.
allocation loop: 826.310 msec
quads test: 860.018 msec
Completed 26 major collections (0 minor).
848.825 ms total time (85.533 stopped); 1349.199 ms CPU time (585.936 stopped).
3.456 ms median pause time, 3.840 p95, 5.888 max.
Heap size is 133.333 MB (max 133.333 MB); peak live data 82.416 MB.
1.35user 0.01system 0:00.86elapsed 157%CPU (0avgtext+0avgdata 141496maxresident)k
0inputs+0outputs (0major+231minor)pagefaults 0swaps
</pre><p>Compare to V8 via node:</p><pre>$ guix shell node time -- time node js-runtime/run.js -- js-runtime/wtf8.wasm examples/quads.wasm
Making quad tree of depth 10 (1398101 nodes).
construction: 64.524 msec
Allocating garbage tree of depth 9 (349525 nodes), 60 times, validating live tree each time.
allocation loop: 2288.092 msec
quads test: 2394.361 msec
total_heap_size: 156798976
[...]
3.74user 0.24system 0:02.46elapsed 161%CPU (0avgtext+0avgdata 382992maxresident)k
0inputs+0outputs (0major+87866minor)pagefaults 0swaps
</pre><p>Which is to say, <i>wastrel is almost three times as fast, while using
almost three times less memory</i>: 2460ms (v8) vs 849ms (wastrel), and
383MB vs 141 MB.</p><h3>zowee!</h3><p>So, yes, the V8 times include the time to compile the wasm module on the fly. No idea what is going on with tiering, either, but I understand that tiering up is a thing these days; this is node v22.14, released about a year ago, for what that’s worth. Also, there is a V8-specific module to do some impedance-matching with regards to strings; in Wastrel they are WTF-8 byte arrays, whereas in Node they are JS strings. But it’s not a string benchmark, so I doubt that’s a significant factor.</p><p>I think the performance edge comes in having the program ahead-of-time: you can statically allocate type checks, statically allocate object shapes, and the compiler can see through it all. But I don’t really know yet, as I just got everything working this week.</p><p>Wastrel with GC is demo-quality, thus far. If you’re interested in the back-story and the making-of, see <a href="https://wingolog.org/archives/2025/10/30/wastrel-a-profligate-implementation-of-webassembly">my intro to Wastrel</a> article from October, or the FOSDEM talk from last week:</p><video poster="https://wingolog.org/pub/fosdem-2026-wastrel-webassembly-without-the-runtime.jpg" controls="controls" width="100%">
<source src="https://wingolog.org/pub/fosdem-2026-wastrel-webassembly-without-the-runtime.vp9.webm" type="video/webm; codecs=vp9,opus"></source>
<source src="https://video.fosdem.org/2026/ub4136/HT9HAG-wastrel-webassembly-without-the-runtime.mp4" type="video/mp4"></source>
</video><p>Slides <a href="https://wingolog.org/pub/fosdem-wastrel-2026-slides.pdf">here</a>, if that’s your thing.</p><p>More to share on this next week, but for now I just wanted to get the
word out. Happy hacking and have a nice weekend!</p></div> Andy Wingohttps://wingolog.org/Igalia Compilers Team: Igalia’s Compilers Team - A 2025 Retrospectivehttps://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/2026-02-06T00:00:00+00:00
<p>Hey, hey, it’s the beginning of a new year and before we sprint too far into 2026, let’s take a quick breather, zoom out, and celebrate what Igalia’s awesome compilers team got up to in 2025.
Over the past year we’ve been deeply involved in shaping <em>and</em> shipping key Web and JavaScript standards, which includes not just participating in committees but also chairing and actively moving the proposals forward.
We worked on major JavaScript runtimes and foundational ahead-of-time compilers including LLVM and Mesa, as well as JIT CPU emulation, and smaller language VMs.</p>
<p>Some big highlights of this year included our work on FEX and Mesa that helped Valve with their upcoming gaming devices - the Steam Frame and the Steam Machine (we talk more about this in a dedicated <a href="https://www.igalia.com/2025/11/helpingvalve.html">blog post</a>), our continued involvement in supporting RISC-V in contemporary compilers, and our key role in multiple WebAssembly implementations.</p>
<h2 id="standards" tabindex="-1">Standards <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h2>
<p>In 2025, our standards work focused on parts of JavaScript developers touch every day like time, numbers, modules and more. Across TC39, WHATWG, WinterTC and internationalization ecosystems, we helped move proposals forward while turning specifications into running, interoperable code. So yep, let’s talk about our most significant standards contributions from the year!</p>
<h3 id="temporal" tabindex="-1">Temporal <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h3>
<p>It’s been an exciting year for the <a href="https://github.com/tc39/proposal-temporal/">Temporal proposal</a>, which adds a modern date-and-time API to JavaScript. For starters, MDN published their <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Temporal">API documentation for it</a>, which created a huge surge of interest.</p>
<p>On the shipping front: Firefox shipped their implementation of the proposal and it’s now available in Firefox 139. Chrome moved their implementation to beta in late 2025, and released it in early 2026. Meanwhile, we’ve been steadily working on getting Temporal into Safari, with support for correct duration math and the <code>PlainMonthDay</code> and <code>PlainYearMonth</code> types added during 2025/early 2026. You can read more about this in our recent <a href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">post on implementing Temporal</a>.</p>
<p>Alongside that, we’ve been working on the <a href="https://github.com/tc39/proposal-intl-era-monthcode/">Intl Era and Month Code proposal</a>, which has expanded in scope beyond era codes and month codes to cover other calendar-specific things that a JS engine with <code>Intl</code> must implement. This allows developers to make use of a number of commonly-used non-Gregorian calendars, including but not limited to the calendar used in Thailand, the Japanese Imperial calendar, and Islamic calendars.</p>
<h3 id="decimal" tabindex="-1">Decimal <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h3>
<p>A lot of our recent work around the <a href="https://github.com/tc39/proposal-decimal">Decimal</a> proposal has now migrated to a newer similarly number-focused effort called <a href="https://github.com/tc39/proposal-amount">Amount</a> (formerly known as "Measure" and officially renamed in 2025). The proposal reached Stage 1 at the November 2024 TC39 plenary. We also launched a <a href="https://www.npmjs.com/package/proposal-amount">polyfill</a>.
Since then, we have iterated on the Amount API and data model a number of times in plenary. So while it started 2025 at <a href="https://tc39.es/process-document/">stage 1</a> and remains at stage 1 heading into 2026, the design is noticeably sharper, thanks to a lot of TC39 discussions. We’re lined up to keep pushing it forward next year.</p>
<p>And because numerics work benefits a ton from regular iteration, in late 2024, we also kicked off a biweekly community call ("JS Numerics") for those in TC39 interested in proposals related to numbers, such as Decimal, Amount, <a href="https://github.com/tc39/proposal-intl-keep-trailing-zeros">intl-keep-trailing-zeros</a>, etc. We still host it, and it’s turned out to be a genuinely productive place to hash things out without waiting for plenary.</p>
<h3 id="source-maps" tabindex="-1">Source Maps <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h3>
<p>We implemented draft support for range mappings in a number of systems: WebKit, <a href="https://github.com/jridgewell/sourcemaps/tree/main/packages/sourcemap-codec">Justin Ridgewell’s source map decoder</a>, a source map validator, and more.</p>
<p>We also facilitated source map TG4 meetings and assisted with advancing proposals such as the scopes proposal.
Throughout the year, we continued serving as editors for the ECMA-426 specification, landing a steady stream of improvements and clarifications.</p>
<h3 id="modules" tabindex="-1">Modules <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h3>
<p>We pushed JavaScript’s module system forward on multiple fronts, especially around reducing the impact of modules on application startup:</p>
<ul>
<li>we advanced the <a href="https://github.com/tc39/proposal-defer-import-eval"><code>import defer</code></a> proposal, which allows a module’s evaluation to be deferred and then performed synchronously on first use, to Stage 3 in TC39. We are working on its implementations in V8 and WebKit, and we implemented it in Babel, webpack (together with other community members) and TypeScript.</li>
<li>we presented <a href="https://github.com/tc39/proposal-deferred-reexports"><code>export defer</code></a> and pushed it to Stage 2 in TC39: it allows more granular lazy evaluation, as well as built-in browser support for tree-shaking of re-exports.</li>
</ul>
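<p>To make the deferred-evaluation idea concrete: with the actual proposal you would write <code>import defer * as math from "./math.js"</code>, and the module body would only run on first property access. The sketch below is a rough, runnable approximation in today’s JavaScript; the <code>lazyNamespace</code> helper is purely hypothetical and just mimics that behavior with a Proxy.</p>

```javascript
// Hypothetical helper mimicking `import defer` semantics: the "module"
// (the load callback) is only evaluated on first property access.
function lazyNamespace(load) {
  let ns = null;
  return new Proxy({}, {
    get(_target, prop) {
      if (ns === null) ns = load(); // evaluate on first access
      return ns[prop];
    },
  });
}

let evaluated = false;
const math = lazyNamespace(() => {
  evaluated = true; // stands in for module-evaluation side effects
  return { add: (a, b) => a + b };
});

console.log(evaluated);      // false — nothing has been evaluated yet
console.log(math.add(1, 2)); // 3 — evaluation happened on first access
console.log(evaluated);      // true
```

<p>With real <code>import defer</code>, the engine does this wiring for you; the proposal handles modules that need asynchronous evaluation (e.g. top-level await) specially so that first access can stay synchronous.</p>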
<p>We are among the most active members of the "Modules Harmony" group, an unofficial group within TC39 that aims to improve the capabilities of ESM so as to drive native adoption, while making sure that all module proposals are well coordinated with each other.</p>
<h3 id="asynccontext" tabindex="-1">AsyncContext <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h3>
<p>And over in the <a href="https://github.com/tc39/proposal-async-context">AsyncContext proposal</a> world, we spent 2025 focusing on how the proposal should integrate with various <a href="https://github.com/tc39/proposal-async-context/blob/master/WEB-INTEGRATION.md">web APIs</a>. The way AsyncContext interacts with the web platform is unusually pervasive, and more challenging to figure out than the core TC39 proposal itself.</p>
<p>In a first for a TC39 proposal, it is now also going through the <a href="https://whatwg.org/stages">WHATWG stages process</a>, where it has reached Stage 1. This gives us a clearer path to iterate with direct feedback from browser engines.</p>
<h3 id="unicode-standards" tabindex="-1">Unicode standards <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h3>
<p>We have been working on <a href="https://messageformat.unicode.org/">Unicode MessageFormat</a>, a Unicode standard for localizable dynamic message strings, designed to make it simple to create natural-sounding localized messages.</p>
<p>In 2025, we helped the <a href="https://icu.unicode.org/">ICU4C</a> implementation of Unicode MessageFormat align with ongoing specification changes. We also carried out <a href="https://github.com/unicode-org/icu/pull/3536">experimental work</a> on the custom function interface to support more extensible formatting capabilities, which is currently under review.</p>
<h3 id="wintertc" tabindex="-1">WinterTC <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h3>
<p>In December 2024, <a href="https://wintertc.org/">WinterTC</a> was formed to replace WinterCG as an official ECMA <a href="https://ecma-international.org/technical-committees/">Technical Committee</a>, to achieve some level of API interoperability across server-side JavaScript runtimes, especially for APIs that are shared with the web.</p>
<p>We started chairing it (together with folks from Deno), and became involved in admin tasks.
Over the course of the year, we:</p>
<ul>
<li>Identified a core set of Web APIs that should be shared across runtimes and standardized it as the <a href="https://min-common-api.proposal.wintertc.org/">Minimum Common Web API specification</a>, which was officially published at the ECMA General Assembly in December.</li>
<li>Started identifying a subset of the WPT test suite that covers the Minimum Common Web API, and made some headway towards clarifying which parts of the Fetch specification server-side runtimes should support, and which they shouldn’t.</li>
</ul>
<p>Additionally, if you’re curious, we gave two talks about WinterTC: <a href="https://youtu.be/elGNcCv57ZE">one at the Web Engines Hackfest, together with the other WinterTC chair, from Deno</a>; and <a href="https://youtu.be/T9g3DtdTsGU">one at JSConf.JP</a>.</p>
<h2 id="node-js" tabindex="-1">Node.js <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h2>
<p>In Node.js, our work in 2025 spanned module interoperability, built-in HTTP/HTTPS proxy support, and shipping system CA certificate integration across platforms.</p>
<p>On the module side, we delivered interoperability features and bug fixes for <code>require(esm)</code> and helped stabilize it (read more about it in our colleague <a href="https://joyeecheung.github.io/blog/2025/12/30/require-esm-in-node-js-from-experiment-to-stability/">Joyee’s blog</a>), <a href="https://github.com/nodejs/node/pull/55698">shipped synchronous and universal loader hooks</a> (now promoted to release candidate), integrated TypeScript into the <a href="https://github.com/nodejs/node/issues/52696">compile cache</a>, and improved the portability of the cache. Check out <a href="https://www.youtube.com/watch?v=MYVn6TuZCEQ">Joyee’s talk at JSConf JP</a> if you are interested in learning more about these new module loader features.</p>
<p>We also strengthened <a href="https://github.com/nodejs/node/issues/58990">System CA certificate integration</a> along with JavaScript APIs for reading and configuring trusted CAs globally, <a href="https://github.com/nodejs/node/issues/57872">added built-in HTTP/HTTPS proxy support</a>, and expanded the <a href="https://nodejs.org/en/learn/http/enterprise-network-configuration">documentation for using Node.js in enterprise environments</a>.</p>
<p>Additionally, we started migrating Node.js to the new V8 CppHeap model and improved its V8 Platform integration.</p>
<h2 id="v8" tabindex="-1">V8 <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h2>
<p>On the V8 side of things, we worked on <code>HeapProfiler::QueryHolders</code>, a companion API to the <a href="https://chromium-review.googlesource.com/c/v8/v8/+/5006373">QueryObjects API</a>.</p>
<p>We worked on extending the <a href="https://v8.github.io/api/head/classv8_1_1HeapStatistics.html">HeapStatistics API</a> to include a new field that tracks the total number of bytes allocated in an Isolate since its creation. This counter excludes allocations that happen due to GC operations and is intended to be used to create memory regression tests. Here’s the <a href="https://chromium-review.googlesource.com/c/v8/v8/+/6996467">CL</a> highlighting these changes.</p>
<p>We also started working on the implementation of the <a href="https://github.com/tc39/proposal-defer-import-eval">import defer proposal</a> in V8. This proposal extends the syntax of ESM imports to allow a mode where the evaluation of an imported module is deferred until its first access.
From our work in Node.js, we upstreamed a few improvements and bug fixes in V8’s embedder API and startup snapshot implementation. We also contributed to Node.js’s V8 upgrade and upstreamed patches to address issues discovered in the upgrade.</p>
<p>As part of our collaboration with Cloudflare we added <code>v8::IsolateGroup</code>: a new unit that owns an independent pointer-compression cage. We then also enabled multiple cages per process (“multi-cage”), so thousands of isolates aren’t forced into a single region of less than 4 GiB. Finally, we extended this to multiple sandboxes: one sandbox per isolate group instead of a single process-wide sandbox. In the end this work helped Cloudflare enable the sandbox in Cloudflare Workers.</p>
<h2 id="babel" tabindex="-1">Babel <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h2>
<p>Our team also helps co-maintain <a href="https://babeljs.io">Babel</a>. The build tools area is very active nowadays, and we strongly believe that, alongside the innovation happening in the ecosystem, companies need to invest in ensuring that older, widely used tools remain actively maintained and keep improving over time.</p>
<h2 id="llvm" tabindex="-1">LLVM <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h2>
<p>In LLVM, we helped extend auto-vectorization to take full advantage of the RISC-V vector extension’s many innovative features.</p>
<p>After four years of development by contributors from multiple organizations including Igalia, we finally enabled <a href="https://github.com/llvm/llvm-project/pull/151681">EVL tail folding for RISC-V</a> as an LLVM default.</p>
<p>This work took advantage of the new VPlan infrastructure, extending it and developing it iteratively in-tree when needed to give us the ability to model a relatively complex vectorization scheme.</p>
<p>We also added <a href="https://github.com/llvm/llvm-project/pull/141865">full scalable segmented access support</a> and <a href="https://github.com/llvm/llvm-project/pull/158690">taught the loop vectorizer to make smarter cost model decisions</a>.</p>
<p>Building on top of this, we achieved <a href="https://blogs.igalia.com/compilers/2025/05/28/improvements-to-risc-v-vector-code-generation-in-llvm/">improvements in RISC-V vectorization</a>. In parallel, we also worked on LLVM scheduling models for the SpacemiT-x60 RISC-V processor, scoring a whopping <a href="https://blogs.igalia.com/compilers/2025/11/22/unlocking-15-more-performance-a-case-study-in-llvm-optimization-for-risc-v/">16% performance improvement</a>.</p>
<p>Regarding WebAssembly in LLVM, we landed a number of commits that improve the size and performance of generated code, and added support for a few ISD nodes that enable vectorization for otherwise sequential codegen.</p>
<h2 id="mesa-ir3" tabindex="-1">Mesa/IR3 <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h2>
<p>We continued work on improving IR3, the Mesa compiler backend for Qualcomm Adreno GPUs. We <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31222">implemented support for alias instructions</a> novel to the <em>a7xx</em> generation of GPUs, significantly improving register pressure for texture instructions. We also <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34108">refactored the post-RA scheduler</a> to be able to reuse the legalization logic, significantly improving its accuracy when calculating instruction delays and, consequently, reducing latency.</p>
<p>We also <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33602">added debug tooling</a> to easily identify the shader that causes problems, among many other <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/?sort=merged_at_desc&state=merged&author_username=jnoorman&first_page_size=20">optimizations, implementations of new instructions, and bug fixes</a>.</p>
<h2 id="guile-and-whippet" tabindex="-1">Guile and Whippet <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h2>
<p>This year we also made some interesting progress on <a href="https://github.com/wingo/whippet">Whippet</a>, a no-dependencies embeddable garbage collector. We were able to integrate Whippet into the <a href="https://gnu.org/s/guile">Guile</a> Scheme implementation, replacing Guile’s use of the venerable Boehm-Demers-Weiser library. We hope to merge the <a href="https://codeberg.org/guile/guile/src/branch/wip-whippet">integration branch</a> upstream over the next months. We also wrote up a <a href="https://arxiv.org/abs/2503.16971">paper describing the innards of some of Whippet’s algorithms</a>.</p>
<p>We think Whippet is interesting wherever a programming language needs a garbage collector: it’s customizable and easy to manage, as it is designed to be "vendored" directly into a user’s source code repository. We are now in the phase of building out examples to allow for proper performance evaluation; after a <a href="https://github.com/wingo/whiffle">bespoke Scheme implementation</a> and Guile itself, we also wrote a <a href="https://codeberg.org/wingo/wastrel">fresh ahead-of-time compiler for WebAssembly</a>, which in the near future will gain support for the garbage collection WebAssembly extensions, thanks to Whippet. For more info on our progress, check out <a href="https://wingolog.org/tags/whippet">Andy Wingo’s blog series</a>.</p>
<h2 id="fex" tabindex="-1">FEX <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/06/igalia-s-compilers-team-a-2025-retrospective/">#</a></h2>
<p>For the FEX x86 JIT emulator for ARM64, we worked on x87 floating-point emulation: we <a href="https://github.com/FEX-Emu/FEX/pull/4642">implemented x87 invalid operation bit handling in F80 mode</a>, <a href="https://github.com/FEX-Emu/FEX/pull/4811">fixed IEEE 754 unordered comparison detection</a>, and <a href="https://github.com/FEX-Emu/FEX/pull/5009">added f80 stack xchg optimization for the fast path</a>.</p>
<p>Besides <a href="https://github.com/FEX-Emu/FEX/pull/4846">further</a> <a href="https://github.com/FEX-Emu/FEX/pull/5062">fixes</a> for instruction implementations, we also worked on memory and stability improvements, <a href="https://github.com/FEX-Emu/FEX/pull/4540">protecting the last page of CodeBuffer</a>, and <a href="https://github.com/FEX-Emu/FEX/pull/5035">implementing gradual memory growth</a>. Finally, we also did some infrastructure work by <a href="https://github.com/FEX-Emu/FEX/pull/4661">upgrading the codebase to clang-format-19</a> and <a href="https://github.com/FEX-Emu/FEX/pull/4488">adding UBSAN support</a>.</p>
<p>This year’s FEX work focused on x87 floating-point correctness and 32-bit compatibility—both critical for Valve’s Steam Frame, the ARM-powered VR headset they announced in November that uses FEX to run x86 games.</p>
<p>The x87 improvements matter because many games and middleware still use legacy floating-point code. Subtle deviations from Intel’s behavior—wrong exception flags, incorrect comparison semantics—cause crashes or weird behavior. Fixing invalid operation exceptions, IEEE 754 comparisons, and optimizing the x87 stack pass eliminated entire classes of compatibility bugs.</p>
<p>The 32-bit fixes are just as important. A huge chunk of Steam’s catalog is still 32-bit, and even 64-bit games often ship 32-bit launchers. Getting <code>fcntl</code> and addressing modes right means these games just work without users needing to do anything.</p>
<p>In total, this work gave Valve confidence that the Steam Frame could ship with solid library coverage, letting them announce the device on schedule.</p>
<hr />
<p>Alright, that’s a wrap on our 2025 retrospective! We hope you had as much fun reading it as we had writing it, and building all the things we talked about along the way. We’ll see you next year with another roundup; until then, you can keep up with our latest work on the <a href="https://blogs.igalia.com/compilers/">team blog</a>.</p>
<p><em>Igalia Compilers Team, <a href="https://blogs.igalia.com/compilers/">blogs.igalia.com/compilers</a></em></p>
<hr />
<p><strong>Manuel Rego: <a href="https://blogs.igalia.com/mrego/fosdem-2026/">FOSDEM 2026</a></strong> (2026-02-05)</p>
<p>Last weekend I was in Brussels attending <a href="https://fosdem.org/2026/"><strong>FOSDEM</strong></a>. A big event with lots of people and lots of things happening in parallel, where it’s impossible to be everywhere.</p>
<h2 id="browser-and-web-platform" tabindex="-1">Browser and web platform 🌐 <a class="header-anchor" href="https://blogs.igalia.com/mrego/fosdem-2026/">#</a></h2>
<p>My main participation was in the <a href="https://fosdem.org/2026/schedule/track/browser-and-web-platform/"><strong>Browser and web platform</strong></a> devroom, a <em>new</em> devroom (it seems it already existed back in the day, but not in recent years) where I had the chance to speak about <a href="https://servo.org/">Servo</a> with a talk titled <a href="https://fosdem.org/2026/schedule/event/LXFKS9-servo-project-impact/"><strong>The Servo project and its impact on the web platform ecosystem</strong></a>. My colleagues from <a href="https://www.igalia.com/">Igalia</a>, <a href="https://www.igalia.com/team/eri">Eri Pazos</a> and <a href="https://www.igalia.com/team/msanchez">Mario Sánchez Prada</a>, were also speaking in that devroom, about <a href="https://fosdem.org/2026/schedule/event/NJM3KB-mathml-core/"><strong>Interop and MathML Core</strong></a> and <a href="https://fosdem.org/2026/schedule/event/8ZL9BZ-web-platform-on-linux-devices-with-webkit/"><strong>The Web Platform on Linux devices with WebKit: where are we now?</strong></a> respectively. The room was fully packed for a big part of the day; not unexpectedly, many people are interested in the web platform.</p>
<h3 id="mathml" tabindex="-1">MathML <a class="header-anchor" href="https://blogs.igalia.com/mrego/fosdem-2026/">#</a></h3>
<p><a href="https://fosdem.org/2026/schedule/event/NJM3KB-mathml-core/"><strong>Eri was the first Igalian to speak in the room</strong></a>; they summarized the work Igalia has been doing for many years on MathML, on the standards side proposing the <a href="https://w3c.github.io/mathml-core/">MathML Core spec</a>, and on the implementation side <a href="https://www.igalia.com/2023/01/10/Igalia-Brings-MathML-Back-to-Chromium.html">bringing it back to Chromium</a> and improving it in Gecko and WebKit.</p>
<p>In the talk, Eri went into deep detail about the latest additions around MathML: <code>math-depth</code>, <code>math-shift</code>, RTL mirroring, <code>font-family: math</code>, etc. This work is part of an agreement with the <a href="https://www.sovereign.tech/">Sovereign Tech Fund</a>; big thanks for your support.</p>
<p>MathML is more ready than ever for production; someone from <a href="https://arxiv.org/">arXiv.org</a> in the audience mentioned that they are shipping it on millions of webpages today. I’m still waiting for the day when <a href="https://www.wikipedia.org/">Wikipedia</a> switches to it by default; that will be a huge milestone.</p>
<h3 id="webkit-on-linux" tabindex="-1">WebKit on Linux <a class="header-anchor" href="https://blogs.igalia.com/mrego/fosdem-2026/">#</a></h3>
<p><a href="https://fosdem.org/2026/schedule/event/8ZL9BZ-web-platform-on-linux-devices-with-webkit/"><strong>Mario talked about the Linux ports of WebKit: WebKitGTK and WPE</strong></a>, both maintained by Igalia.</p>
<p>He explained what they are and the differences between them, reviewed their history, and highlighted the big progress in recent years, with multiple improvements in several areas: WebPlatform API, WebKit Container SDK, the switch from Cairo to the Skia graphics library, etc.</p>
<p>If you are curious about the status of things regarding them, you shouldn’t miss his talk.</p>
<h3 id="servo" tabindex="-1">Servo <a class="header-anchor" href="https://blogs.igalia.com/mrego/fosdem-2026/">#</a></h3>
<p><a href="https://fosdem.org/2026/schedule/event/LXFKS9-servo-project-impact/"><strong>My talk</strong></a> started with an introduction to the Servo project and the current status of things. I showed a few demos about how Servo works and some of the things it can do already. After that introduction, I explained how Servo has been contributing to the wider web platform ecosystem.</p>
<p>As with the rest of the talks, <a href="https://servo.org/slides/2026-02-fosdem-servo-web-platform/">slides</a> and <a href="https://mirrors.dotsrc.org/fosdem/2026/h1309/LXFKS9-servo-project-impact.av1.webm">video</a> are already available if you want to know all the details. Kudos to the organization for being so quick.</p>
<figure>
<p><img src="https://blogs.igalia.com/mrego/files/2026/02/servo-talk.jpg" alt="Picture of my talk with the slides about conclusions at the back" /></p>
<figcaption>Picture of my talk with the slides about conclusions at the back</figcaption>
</figure>
<p>As an anecdote, the night before the talk a <a href="https://tangled.org/me.webbeef.org/browser.html/">new project based on Servo was published</a>, a browser developed fully with web technologies using Servo underneath. I couldn’t resist the urge to build it, play with it and add it to the presentation. It looks really cool what Servo can do these days.</p>
<figure>
<video src="https://blogs.igalia.com/mrego/files/2026/02/servo-demo-fosdem-2026-beaver.webm" controls="" title="Screencast of the new Beaver browser based on Servo">
Screencast of the new Beaver browser based on Servo
</video>
<figcaption>Screencast of the new <a href="https://tangled.org/me.webbeef.org/browser.html/">Beaver browser</a> based on Servo</figcaption>
</figure>
<p>In addition, in the same devroom there was another Servo talk, this time by <a href="https://github.com/Taym95">Taym</a>, one of the Servo maintainers, <a href="https://fosdem.org/2026/schedule/event/3J8GUD-servo-streams-reimplementation/"><strong>Implementing Streams Spec in Servo web engine</strong></a>, where he explained all the work behind adding Streams support to Servo.</p>
<p>I am also very happy to have met many Servo contributors at the event:
<a href="https://github.com/delan">Delan Azabani (@delan)</a>,
<a href="https://github.com/eerii">Eri Pazos (@eerii)</a>,
<a href="https://github.com/jschwe">Jonathan Schwender (@jschwe)</a>,
<a href="https://github.com/mrobinson">Martin Robinson (@mrobinson)</a>,
<a href="https://github.com/Taym95">Taym Haddadi (@Taym95)</a>,
<a href="https://github.com/TimvdLippe">Tim van der Lippe (@TimvdLippe)</a>.
We had the chance to have some informal conversations about the project, discussing some technical topics and ideas about things we can do in Servo.</p>
<p>The feedback about Servo has been extremely positive, people are really happy with the evolution of the project and excited about the future.</p>
<p>Apart from that, we also had the opportunity to talk to the nice folks from <a href="https://nlnet.nl/">NLnet</a> and the <a href="https://www.sovereign.tech/">Sovereign Tech Agency</a>, who both have ongoing collaborations around Servo. The work these organizations do is really important for open source software development, and more organizations should learn from them and join forces to try to fix the funding issues in <acronym title="Free/Libre and Open-Source Software">FLOSS</acronym> (more about this later, when talking about <a href="https://blogs.igalia.com/mrego/fosdem-2026/">Marga’s keynote</a>).</p>
<h2 id="igalia" tabindex="-1">Igalia <a class="header-anchor" href="https://blogs.igalia.com/mrego/fosdem-2026/">#</a></h2>
<p><a href="https://www.igalia.com/"><strong>Igalia</strong></a>’s presence in the open source community is very big, and not unexpectedly we had more talks at FOSDEM this year. This is the full list:</p>
<ul>
<li><a href="https://fosdem.org/2026/schedule/event/KMMLGM-webrtc_support_in_webkitgtk_and_wpewebkit_with_gstreamer_current_status_and_plan/">WebRTC support in WebKitGTK and WPEWebKit with GStreamer: Current status and plans</a> by <a href="https://igalia.com/team/pnormand">Philippe Normand</a></li>
<li><a href="https://fosdem.org/2026/schedule/event/BMFQSE-raspberry-pi-gpu-drivers-from-bookworm-to-trixie/">From Bookworm to Trixie: Upgrading the Raspberry Pi graphics stack</a> by <a href="https://igalia.com/team/chema">José María Casanova Crespo</a></li>
<li><a href="https://fosdem.org/2026/schedule/event/HT9HAG-wastrel-webassembly-without-the-runtime/">Wastrel: WebAssembly Without the Runtime</a> by <a href="https://igalia.com/team/awingo">Andy Wingo</a></li>
<li><a href="https://fosdem.org/2026/schedule/event/HX9XAY-mesa3d_the_heart_of_the_linux_graphics_stack/">Mesa3D: the heart of the linux graphics stack</a> by <a href="https://igalia.com/team/jasuarez">Juan A. Suarez</a></li>
</ul>
<p>Our work on different projects was also mentioned in several talks and conversations; we’re really happy about all the good feedback we got on Igalia’s contributions.</p>
<h3 id="keynote" tabindex="-1">Keynote <a class="header-anchor" href="https://blogs.igalia.com/mrego/fosdem-2026/">#</a></h3>
<p><a href="https://www.igalia.com/team/marga">Marga Manterola</a> gave one of the keynotes, about funding open source software: <a href="https://fosdem.org/2026/schedule/event/L3BK7S-free-as-in-burned-out/"><strong>Free as in Burned Out: Who Really Pays for Open Source?</strong></a>. It was the first time Igalia gave a keynote at FOSDEM, in the year we celebrate our 25th anniversary, and in <a href="https://bib.ulb.be/fr/documents/digitheque/institutionalia/histoire-de-lulb/historique/histoire-de-lulb-la-crise-de-68">a historic auditorium where May ’68 started in Brussels</a>. 🎉</p>
<figure>
<p><img src="https://blogs.igalia.com/mrego/files/2026/02/marga-talk.jpg" alt="Picture of Marga Manterola's keynote at FOSDEM 2026" /></p>
<figcaption>Picture of Marga Manterola's keynote at FOSDEM 2026</figcaption>
</figure>
<p>The talk was great, Marga explained many of the issues with open source software sustainability and some potential ideas about how to improve the situation. This is a recurring topic in many conversations these days, we should find a way to get this fixed somehow.</p>
<p>There I learnt about the <a href="https://opensourcepledge.com/">Open Source Pledge</a>, an interesting initiative to get companies to donate 2,000 USD per developer per year to open source software maintainers. 💰</p>
<h2 id="wrap-up" tabindex="-1">Wrap up <a class="header-anchor" href="https://blogs.igalia.com/mrego/fosdem-2026/">#</a></h2>
<p>All in all, it was a nice but very busy weekend in Brussels, weather was ok (a bit cold but not rainy) and waffles were delicious as usual. 🧇😋</p>
<p>The next big event on my calendar is the <a href="https://webengineshackfest.org/"><strong>Web Engines Hackfest</strong></a> in June; more than 50 people have already registered, and a bunch of Servo folks will be there too. If you’re interested in the web platform and willing to discuss different topics, we would be very happy to host you there.</p>
<p><em>Manuel Rego, <a href="https://blogs.igalia.com/mrego/">blogs.igalia.com/mrego</a></em></p>
<hr />
<p><strong>Brian Kardell: <a href="https://bkardell.com/blog/WhatIfYouJustTellMe.html">What if we just</a></strong> (2026-02-04)</p>
<h1 class="contextual-heading">What if we just</h1>
<p class="segue">What if a better answer to a question I've been struggling with for more than a decade is just... Way simpler? Sharing a potentially half-baked idea for discussion.</p>
<p>Back in 2013 I wrote <a href="https://bkardell.com/blog/Dropping-The-F-Bomb-On-Standards.html">Dropping the F-Bomb on Web Standards</a>. The core argument was simple: the web works best when developers can invent “slang,” and standards bodies behave more like dictionary editors — watching what people actually say, then paving the cow paths that clearly matter.</p>
<p>It fed into the <a href="https://extensiblewebmanifesto.org/">Extensible Web Manifesto</a> (which followed) and over the years I've continued to push for study of what people are really doing. I have helped add features to the HTTPArchive crawl and built tools to analyze this data. </p>
<p>But it's hard. It's <a href="https://bkardell.com/blog/SecretLifeOfCustomElements.html#:~:text=What's%20difficult%20about%20it...">biased</a>. It's incomplete. Even the best crawl misses huge swaths of the web — anything behind logins, paywalls, dashboards, internal tools, or private deployments. And all of them have limits. It requires a ton of follow-up analysis and raises almost as many questions as it answers.</p>
<p>So lately I've been wondering (a bit like <a href="https://www.youtube.com/watch?v=L2DqcXeGTyc">Kramer</a>):</p>
<blockquote>
<p>What if we just... voluntarily shared this information?</p>
</blockquote>
<p>We don't need a formal standard or anyone's permission, we could just... share it, and build tools to share it easily in a well known format at a well known URL.</p>
<p>It could give us insight into the use of custom elements behind logins and paywalls and so on too, and tell us where they come from (a git repo, for example)... </p>
<p>Lots of things that are common happened through community effort and adoption. Normally you <em>get</em> something from it: <code>robots.txt</code> helped keep your site from being aggressively scraped in problematic ways, <code>ads.txt</code> helped say something about monetization, <code>feed.rss</code> helped syndicate, and so on. What do you get out of sharing this kind of info?</p>
<p>Individually, I'm not sure. But collectively, the benefit is clear: we’d finally have a real, ecosystem‑wide index of custom elements and how they're used, and hopefully a way to shape useful standards around them easily.</p>
<p>As to what that would look like, I'm not sure.</p>
<p>The community-defined <a href="https://github.com/webcomponents/custom-elements-manifest/">Custom Element Manifest</a> already has a bit of uptake and tooling; we <em>could</em> just publish that to a well-known URL. A simpler manifest of just element names and the URLs of the packages/repositories that supply them would even be nice.</p>
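<p>As a sketch of what that simpler manifest might look like, served from some agreed-upon well-known URL (every name here, including the hypothetical <code>/.well-known/custom-elements.json</code> path and the field names, is made up for illustration; nothing is standardized):</p>

```json
{
  "elements": [
    { "name": "x-rating", "source": "https://example.com/pkgs/x-rating" },
    { "name": "fancy-tabs", "source": "https://github.com/example/fancy-tabs" }
  ]
}
```

<p>Just a name plus a pointer to where the element comes from would already be enough to build an ecosystem-wide index.</p>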
<p>Is it too much? Too little? Maybe.</p>
<p>Is it worth trying? I think so.</p>
<p>What do you think? Is this worth trying somehow?</p>
<p><em>Brian Kardell, <a href="http://bkardell.com/">bkardell.com</a></em></p>
<hr />
<p><strong>Igalia WebKit Team: <a href="https://blogs.igalia.com/webkit/blog/2026/wip-55/">WebKit Igalia Periodical #55</a></strong> (2026-02-02)</p>
<p>Update on what happened in WebKit in the week from January 26 to February 2.</p>
<p>
A calm week for sure! The highlight this week is the fix for scrolling not starting when the main thread is blocked.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/306396@main">Fixed</a> an issue where asynchronous scrolling driven by wheel events would not start while the main thread was blocked. This should make WebKit feel more responsive even on heavier websites.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div> Igalia WebKit Teamhttps://blogs.igalia.com/webkitBrian Kardell: Maintaining the Bridgeshttps://bkardell.com/blog/Bridges.html2026-02-02T05:00:00+00:00
<h1 class="contextual-heading">Maintaining the Bridges</h1>
<p class="segue">Thoughts and analogies about infrastructure...</p>
<p>I live in Pittsburgh, Pennsylvania — “The Steel City,” once the beating heart of American steelmaking. In 1902, U.S. Steel’s first full year of operation, it produced 67% of all steel in the United States. By 1943, the company employed more than 340,000 people. We burned so much coal that Pittsburgh earned the nickname “Hell with the lid off.” Streetlights sometimes ran at noon because the sky was that dark. </p>
<figure class="captioned-image">
<img src="https://bkardell.com/media/2026/pittsburgh-midday.jpg" />
<figcaption>A photo of Pittsburgh dark with smoke at midday. (You can find more about this if it interests you; here's <a href="https://www.treehugger.com/think-air-quality-doesnt-matter-look-at-pittsburgh-in-the-s-4862509">one nice piece with a few pictures</a>.)</figcaption>
</figure>
<p>The city’s geography didn’t make things any easier. Pittsburgh is carved by mountains, valleys, and the three rivers — the Allegheny and Monongahela merging to form the Ohio. That topography, combined with the industrial boom, meant we built <em>a lot of bridges</em>. It helps that when your city is literally manufacturing the materials, you get a hometown discount. </p>
<figure class="captioned-image">
<img src="https://bkardell.com/media/2026/3-sisters.jpg" alt="" />
<figcaption>A view downriver of Pittsburgh's <a href="https://en.wikipedia.org/wiki/Three_Sisters_(Pittsburgh)">"Three Sisters" bridges</a> and several others.</figcaption>
</figure>
<p>One of them, the Hot Metal Bridge — just a mile or two from my house — once carried ladle cars full of molten iron between the blast furnaces and mills of J&L Steel. During World War II, 15% of America’s steelmaking capacity crossed that bridge, up to 180 tons per hour.</p>
<p>These bridges were originally built by private companies with a clear profit motive: to move coal, ore, steel, or workers. Others were toll bridges, run by private companies the way you’d run a turnpike or ferry.</p>
<p>But more bridges meant more industry, which meant more people, which meant more bridges. You can see where this goes.</p>
<p>Even by the late 1800s we were beginning to publicly fund them. By the 1920s–1930s, Allegheny County's bridge program had bought out many of the private bridges and replaced a good number of them. By the time the New Deal and Interstate era arrived, the private‑toll era was basically over - and since then, over 90% of Pittsburgh's public bridges have been funded by federal programs (we still have some private industrial-use bridges).</p>
<h2 class="contextual-heading">So what does any of this have to do with software?</h2>
<p>Aside from giving me an excuse to talk about my city (which I enjoy), Pittsburgh’s bridges are a useful metaphor for the infrastructure we rely on in tech, in two important ways:</p>
<ol>
<li><p><em>Becoming</em> a public good
Private investment built the early bridges, just like private companies built much of what we have now in terms of browser engines, search indexes, foundational libraries, and so on - but eventually they stopped being optional. I think we're only now starting to really understand that we need a lot of this to be a public good in the same kind of way. These are the roads and bridges of the modern world.</p>
</li>
<li><p>Building something new is exciting. Maintaining it, not so much.
A lot of my city's physical infrastructure is aging and some of it has been neglected. It's somehow way easier to get people to build new things than to take care of the old stuff. The public notices a new bridge! The ribbon-cutting gets a photo op and celebration. The maintenance budgets and crews struggle to even get funding.</p>
</li>
</ol>
<p>In fact, even when things are fairly well funded, it doesn't mean they're kept up to date. While researching this piece I realized that a lot of the Wikipedia data about Pittsburgh (and many topics!) is actually <em>really</em> out of date. It's cool to write the article with all these interesting facts, but it's not so cool to do the work of keeping it current... Or maybe that's just not what you want to do anymore. Or maybe you were incarcerated, or you died, or you went to Mars - idk. </p>
<p>The point is that writing the thing in the first place is only half the battle. If most of your entry on a city was written two decades ago, a lot of what it details about the economics, population, jobs, and so on are probably not very accurate!</p>
<p>It's no different with software. It's cool and fun to build a new thing or add a new feature to an existing thing, but keeping them maintained is a slog. New mechanisms arrive that you might need to adapt to. Underlying code bit-rots. All of it needs release teams and QA and reviews and fixes and updates at global scale, even if no new features were added. But very few people actually want to do that, and almost nobody wants to pay for it.</p>
<h2 class="contextual-heading">More Public Funding</h2>
<p>I'd really love for societies around the world to come to the realization that a lot of the online things we've built are, like roads and bridges, now <em>necessary</em> - and figure out how we can publicly fund enough of them that important things without an obvious and direct profit motive can get done. MathML and SVG are two easy examples of this, but there are plenty more. Maybe XSLT is another example. Perhaps if we had good funding for those things, their ongoing survival wouldn't be questioned.</p>
<p>I feel like there is a lot of room here for improvement over the status quo. It doesn't even have to start with governments. Any way that we expand and diversify the pool of funding available helps.</p> Brian Kardellhttp://bkardell.com/Igalia Compilers Team: Implementing the Temporal proposal in JavaScriptCorehttps://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/2026-02-02T00:00:00+00:00
<p><em><a href="https://www.publicdomainpictures.net/en/view-image.php?image=10690&picture=prague-astronomical-clock-detail">Image source</a></em></p>
<p>For the past year, I've been working on implementing the <a href="https://tc39.es/proposal-temporal/docs/">Temporal</a> proposal for date and time handling in JavaScript, in JavaScriptCore (JSC). JavaScriptCore is the JavaScript engine that's part of the WebKit browser engine. When I started, Temporal was partially implemented, with support for the <code>Duration</code>, <code>PlainDate</code>, <code>PlainDateTime</code>, and <code>Instant</code> types. However, many <a href="https://github.com/tc39/test262">test262</a> tests related to Temporal didn't pass, and there was no support for <code>PlainMonthDay</code>, <code>PlainYearMonth</code>, or <code>ZonedDateTime</code> objects. Further, there was no support for the <code>relativeTo</code> parameter, and only the "iso8601" calendar was supported.</p>
<h2 id="duration-precision-landed" tabindex="-1">Duration precision (landed) <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">#</a></h2>
<p>Conceptually, a duration is a 10-tuple of time components, or a record with the fields "years", "months", "weeks", "days", "hours", "minutes", "seconds", "milliseconds", "microseconds", and "nanoseconds".</p>
<p>One way durations are used is to represent the difference between two dates. For example, to find the length of time from a given date until the end of 2027, I could write the following JS code:</p>
<pre class="language-javascript" tabindex="0"><code class="language-javascript"><span class="token operator">></span> <span class="token keyword">const</span> duration <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Temporal<span class="token punctuation">.</span>PlainDate</span><span class="token punctuation">(</span><span class="token number">2026</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">26</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">until</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Temporal<span class="token punctuation">.</span>PlainDate</span><span class="token punctuation">(</span><span class="token number">2027</span><span class="token punctuation">,</span> <span class="token number">12</span><span class="token punctuation">,</span> <span class="token number">31</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">{</span> <span class="token literal-property property">largestUnit</span><span class="token operator">:</span> <span class="token string">"years"</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><br /><span class="token operator">></span> duration<br />Temporal<span class="token punctuation">.</span>Duration <span class="token operator"><</span><span class="token constant">P1Y11M5D</span><span class="token operator">></span> <span class="token punctuation">{</span><br /> <span class="token literal-property property">years</span><span class="token operator">:</span> <span class="token number">1</span><span class="token punctuation">,</span><br /> <span 
class="token literal-property property">months</span><span class="token operator">:</span> <span class="token number">11</span><span class="token punctuation">,</span><br /> <span class="token literal-property property">days</span><span class="token operator">:</span> <span class="token number">5</span><br /><span class="token punctuation">}</span></code></pre>
<p>The <code>until</code> method in this case returns a duration comprising one year, eleven months, and five days. Because durations can represent differences between dates, they can also be negative:</p>
<pre class="language-javascript" tabindex="0"><code class="language-javascript"><span class="token operator">></span> <span class="token keyword">const</span> duration <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Temporal<span class="token punctuation">.</span>PlainDate</span><span class="token punctuation">(</span><span class="token number">2027</span><span class="token punctuation">,</span> <span class="token number">12</span><span class="token punctuation">,</span> <span class="token number">31</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">until</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Temporal<span class="token punctuation">.</span>PlainDate</span><span class="token punctuation">(</span><span class="token number">2026</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">26</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">{</span> <span class="token literal-property property">largestUnit</span><span class="token operator">:</span> <span class="token string">"years"</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><br /><span class="token operator">></span> duration<br />Temporal<span class="token punctuation">.</span>Duration <span class="token operator"><</span><span class="token constant">-P1Y11M5D</span><span class="token operator">></span> <span class="token punctuation">{</span><br /> <span class="token literal-property property">years</span><span class="token operator">:</span> <span class="token operator">-</span><span class="token number">1</span><span class="token 
punctuation">,</span><br /> <span class="token literal-property property">months</span><span class="token operator">:</span> <span class="token operator">-</span><span class="token number">11</span><span class="token punctuation">,</span><br /> <span class="token literal-property property">days</span><span class="token operator">:</span> <span class="token operator">-</span><span class="token number">5</span><br /><span class="token punctuation">}</span></code></pre>
<p>When converted to nanoseconds, the total of days, hours, minutes, seconds, milliseconds, microseconds, and nanoseconds for a duration may be a number whose absolute value is as large as 10<sup>9</sup> × 2<sup>53</sup>. This number is too large to represent either as a 32-bit integer or as a 64-bit double-precision value. (If you're wondering about the significance of the number 2<sup>53</sup>, see the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER">MDN documentation on JavaScript's <code>MAX_SAFE_INTEGER</code></a>.)</p>
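<p>A quick demonstration of that precision cliff in plain JavaScript:</p>

```javascript
// Integers above Number.MAX_SAFE_INTEGER (2^53 - 1) can no longer all be
// represented exactly as 64-bit doubles, so adjacent values start to collide:
console.log(Number.MAX_SAFE_INTEGER);               // 9007199254740991
console.log(9007199254740992 === 9007199254740993); // true (!) - the literals
                                                    // round to the same double
// BigInt keeps them distinct:
console.log(9007199254740992n === 9007199254740993n); // false
```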
<p>To understand why we need to be able to work with such large numbers, consider totaling the number of nanoseconds in a duration. Following on the previous example’s definition of the variable <code>duration</code>:</p>
<pre class="language-javascript" tabindex="0"><code class="language-javascript"><span class="token operator">></span> duration<span class="token punctuation">.</span><span class="token function">total</span><span class="token punctuation">(</span><span class="token punctuation">{</span><span class="token literal-property property">unit</span><span class="token operator">:</span> <span class="token string">"nanoseconds"</span><span class="token punctuation">,</span> <span class="token literal-property property">relativeTo</span><span class="token operator">:</span> <span class="token keyword">new</span> <span class="token class-name">Temporal<span class="token punctuation">.</span>PlainDate</span><span class="token punctuation">(</span><span class="token number">2025</span><span class="token punctuation">,</span> <span class="token number">12</span><span class="token punctuation">,</span> <span class="token number">15</span><span class="token punctuation">)</span><span class="token punctuation">}</span><span class="token punctuation">)</span><br /><span class="token number">60912000000000000</span></code></pre>
<p>There are 60912000000000000 nanoseconds, or about 6.1e16, in a period of one year, eleven months, and five days. Since we want to allow this computation to be done with any valid start and end date, and valid years in Temporal range from -271821 to 275760, the result can get quite large. (By default, Temporal follows the <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO 8601 standard</a> for calendars, which entails using a <a href="https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar">proleptic Gregorian calendar</a>. Also note that this example uses a <code>PlainDate</code>, which has no time zone, so computations are not affected by daylight savings time; when computing with the Temporal <code>ZonedDateTime</code> type, the specification ensures that time zone math is done properly.)</p>
<p>To make it easier for implementations to fulfill these requirements, the specification represents durations internally as <a href="https://tc39.es/proposal-temporal/#sec-temporal-internal-duration-records">Internal Duration Records</a> and converts between JavaScript-level duration objects and Internal Duration Records (which I'll call "internal durations") as needed. An internal duration pairs the date component of the duration (the years, months, weeks, and days fields) with a "time duration", which is a single integer that falls within an accepted range, and can be as large as 2<sup>53</sup> × 10<sup>9</sup> - 1.</p>
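<p>To make the "single integer" idea concrete, here is an illustrative sketch (not JSC's actual code) of collapsing a duration's time components into one exact nanosecond count, using BigInt because the result can exceed double precision:</p>

```javascript
// Illustrative sketch (not JSC's implementation): total the time components
// of a duration into a single nanosecond count, as the spec's "time duration"
// does. Magnitudes can approach 2^53 * 10^9, beyond what a double can hold
// exactly, so BigInt keeps the arithmetic exact.
function timeDurationNs({ hours = 0, minutes = 0, seconds = 0,
                          milliseconds = 0, microseconds = 0,
                          nanoseconds = 0 }) {
  return BigInt(hours)        * 3_600_000_000_000n
       + BigInt(minutes)      * 60_000_000_000n
       + BigInt(seconds)      * 1_000_000_000n
       + BigInt(milliseconds) * 1_000_000n
       + BigInt(microseconds) * 1_000n
       + BigInt(nanoseconds);
}

// 705 days' worth of hours (the earlier 1-year, 11-month, 5-day example,
// resolved relative to 2025-12-15) totals the same ~6.1e16 nanoseconds:
console.log(timeDurationNs({ hours: 705 * 24 })); // 60912000000000000n
```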
<p>Implementations don't <em>have</em> to use this representation, as long as the results are observably the same as what the specification dictates. However, the pre-existing implementation didn't suffice, so I re-implemented durations in a way that closely follows the approach in the specification.</p>
<p>This work has been landed in JSC.</p>
<h2 id="new-date-types" tabindex="-1">New date types <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">#</a></h2>
<p>Temporal's date types include <code>PlainDate</code>, <code>PlainDateTime</code>, <code>Instant</code>, <code>ZonedDateTime</code>, <code>PlainMonthDay</code>, and <code>PlainYearMonth</code>. The latter two represent partial dates: either a pair of a month and a day within that month, or a pair of a year and month within that year. Partial dates are a better solution for representing dates where not all of the fields are known (or not all of the fields matter) than full dates with default values for the missing bits.</p>
<p>Temporal's <code>ZonedDateTime</code> type represents a date along with a time zone, which can either be a numeric offset from UTC, or a named time zone.</p>
<p>I implemented <code>PlainMonthDay</code> and <code>PlainYearMonth</code> with all their operations. <code>ZonedDateTime</code> is fully implemented and the first pull request in a series of PRs for it has been submitted.</p>
<h2 id="the-relativeto-parameter" tabindex="-1">The relativeTo parameter <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">#</a></h2>
<p>What if you want to convert a number of years to a number of days? Temporal can do that, but there's a catch. When using the ISO 8601 calendar (similar to the Gregorian calendar), converting years to days depends on which year it is, because the calendar has leap years. Some calendars have leap months as well, so converting years to months can likewise depend on the year. And converting months to days doesn't have a consistent answer either, because months vary in length.</p>
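<p>You can see this concretely even with the legacy <code>Date</code> API (not Temporal): the number of days a year contains depends on which year it is.</p>

```javascript
// Count the days in a given (Gregorian) year by taking the difference
// between Jan 1 of consecutive years, in UTC milliseconds.
function daysInYear(year) {
  return (Date.UTC(year + 1, 0, 1) - Date.UTC(year, 0, 1)) / 86_400_000;
}

console.log(daysInYear(2024)); // 366 (leap year)
console.log(daysInYear(2025)); // 365
```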
<p>For that reason, the following code will throw an exception, because there's not enough information to compute the result:</p>
<pre class="language-javascript" tabindex="0"><code class="language-javascript"><span class="token operator">></span> <span class="token keyword">const</span> duration <span class="token operator">=</span> Temporal<span class="token punctuation">.</span>Duration<span class="token punctuation">.</span><span class="token function">from</span><span class="token punctuation">(</span><span class="token punctuation">{</span> <span class="token literal-property property">years</span><span class="token operator">:</span> <span class="token number">1</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><br /><span class="token operator">></span> duration<span class="token punctuation">.</span><span class="token function">total</span><span class="token punctuation">(</span><span class="token punctuation">{</span> <span class="token literal-property property">unit</span><span class="token operator">:</span> <span class="token string">"days"</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><br />Uncaught RangeError<span class="token operator">:</span> a starting point is required <span class="token keyword">for</span> years total</code></pre>
<p>The above definition of <code>duration</code> can still be made to work if we pass in a starting point, which we can do using the <code>relativeTo</code> parameter:</p>
<pre class="language-javascript" tabindex="0"><code class="language-javascript"><span class="token operator">></span> duration<span class="token punctuation">.</span><span class="token function">total</span><span class="token punctuation">(</span><span class="token punctuation">{</span> <span class="token literal-property property">unit</span><span class="token operator">:</span> <span class="token string">"days"</span><span class="token punctuation">,</span> <span class="token literal-property property">relativeTo</span><span class="token operator">:</span> <span class="token string">"2025-01-01"</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><br /><span class="token number">365</span><br /><span class="token operator">></span> duration<span class="token punctuation">.</span><span class="token function">total</span><span class="token punctuation">(</span><span class="token punctuation">{</span> <span class="token literal-property property">unit</span><span class="token operator">:</span> <span class="token string">"days"</span><span class="token punctuation">,</span> <span class="token literal-property property">relativeTo</span><span class="token operator">:</span> <span class="token string">"2024-01-01"</span> <span class="token punctuation">}</span><span class="token punctuation">)</span><br /><span class="token number">366</span></code></pre>
<p>The string passed in for the <code>relativeTo</code> parameter is automatically converted to either a <code>PlainDate</code> or a <code>ZonedDateTime</code>, depending on which format it conforms to.</p>
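<p>The dispatch hinges on whether the string carries a time zone. A rough sketch of that decision (a hypothetical helper - the spec uses a full ISO 8601 string grammar, not a regex):</p>

```javascript
// Hypothetical helper, a simplification of the spec's parsing: a relativeTo
// string with a bracketed time zone annotation becomes a ZonedDateTime;
// otherwise it becomes a PlainDate. Calendar annotations like [u-ca=...]
// also use brackets but are not time zones, so we skip them here.
function relativeToKind(s) {
  return /\[(?!u-ca=)[^\]]+\]/.test(s) ? "ZonedDateTime" : "PlainDate";
}

console.log(relativeToKind("2025-01-01"));                         // "PlainDate"
console.log(relativeToKind("2025-01-01T00:00[America/New_York]")); // "ZonedDateTime"
```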
<p>I implemented support for the <code>relativeTo</code> parameter on all the operations that have it; once the implementations for all the date types land, I'll be submitting this work as a series of pull requests.</p>
<h2 id="calendars" tabindex="-1">Calendars <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">#</a></h2>
<p>Representing dates with non-ISO8601 calendars is still very much a work in progress. The <a href="https://icu.unicode.org/">ICU library</a> can already do the basic date computations, but much glue code is necessary to internally represent dates with non-ISO8601 calendars and call the correct ICU functions to do the computations. This work is still underway. The Temporal specification does not require support for non-ISO8601 calendars, but a separate proposal, <a href="https://tc39.es/proposal-intl-era-monthcode/">Intl Era Month Code</a>, proposes a set of calendars to be supported by conformant implementations.</p>
<h2 id="testing-temporal" tabindex="-1">Testing Temporal <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">#</a></h2>
<p>The JavaScript test suite is called <a href="https://github.com/tc39/test262">test262</a> and every new proposal in JavaScript must be accompanied by test262 tests. Not all JS implementations are required to support internationalization, so Temporal tests that involve non-ISO calendars or named time zones (other than the UTC time zone) are organized in a separate <code>intl402</code> subdirectory in test262.</p>
<p>The test262 suite includes 6,764 tests for Temporal, with 1,791 of them added in 2025. Igalia invested hundreds of hours in increasing test coverage over the past year.</p>
<h2 id="status-of-work" tabindex="-1">Status of work <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/02/02/implementing-the-temporal-proposal-in-javascriptcore/">#</a></h2>
<p>All of this work is behind a flag in JSC in Technology Preview, so to try it out, you'll have to pass the <code>--useTemporal=1</code> flag.</p>
<p>All of the implementation work discussed above (except for non-ISO calendars) is complete, but I've been following an incremental approach to submitting the code for review by the JSC code owners. I've already landed about 40 pull requests over the course of 2025, and expect to be submitting at least 25 more to complete the work on <code>PlainYearMonth</code>, <code>ZonedDateTime</code>, and <code>relativeTo</code>.</p>
<p>Based on all the code that I've implemented, 100% of the non-intl402 test262 tests for Temporal pass, while the current HEAD version of JSC passes less than half the tests.</p>
<p>My colleagues at Igalia and I look forward to a future JavaScript standard that fully integrates Temporal, enabling JavaScript programs to handle dates more robustly and efficiently. Consistent implementation of the proposal across browsers is a key step towards this future. Step by step, we're getting closer to this goal.</p>
<p>We thank <a href="https://www.bloomberg.com/">Bloomberg</a> for sponsoring this work.</p> Igalia Compilers Teamhttps://blogs.igalia.com/compilers/Qiuyi Zhang (Joyee): Tinkering with Node.js Core on ARM64 Windowshttps://joyeecheung.github.io/blog/2026/01/31/tinkering-with-nodejs-core-on-arm64-windows/2026-01-31T10:25:49+00:00
<p>A while back, I wrote about <a href="https://joyeecheung.github.io/blog/2025/02/16/building-nodejs-on-windows-with-clang-cl/" title="Building Node.js on Windows using the new ClangCL support">Building Node.js on Windows using the new ClangCL support</a>, which was done on an actual x64 Windows machine.</p> Qiuyi Zhang (Joyee)https://joyeecheung.github.io/blog/Manuel Regohttps://blogs.igalia.com/mrego/blog/2026-01-30/2026-01-30T00:00:00+00:00
<p>On my way to <a href="https://fosdem.org/2026/">FOSDEM</a> where tomorrow I’ll be <a href="https://fosdem.org/2026/schedule/event/LXFKS9-servo-project-impact/">talking about Servo</a> in the <em>Browser and web platform</em> devroom. See you there!</p>
<p><img src="https://blogs.igalia.com/mrego/files/2026/01/fosdem-2026-servo-talk-banner.png" alt="Banner of my talk at FOSDEM that reads: Igalia @ FOSDEM'26. The Servo project and its impact on the web platform ecosystem. Manuel Rego. Saturday, Jan. 31, 2:00pm" /></p> Manuel Regohttps://blogs.igalia.com/mrego/Alice Boxhall: Reference Target: having your encapsulation and eating it toohttps://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/2026-01-30T00:00:00+00:00
<p>Three years ago, I wrote a blog post about <a href="https://blogs.igalia.com/alice/how-shadow-dom-and-accessibility-are-in-conflict/">How Shadow DOM and accessibility are in conflict</a>.</p>
<p>I explained how the encapsulation provided by shadow roots is a double-edged sword, particularly when it comes to accessibility. Being able to programmatically express relationships from one element to another is critical for creating user experiences which don’t rely on visual cues - but elements inside a shadow root aren’t available to be referenced from elements in the light DOM. This encapsulation, however, is what allows component authors to create accessible components which can be safely reused in any context, without necessarily requiring any particular dependencies or extra build steps.</p>
<p>In the year or so following, even more heroic attempts were made to square this circle, and finally one seems likely to stick: <a href="https://github.com/WICG/webcomponents/blob/gh-pages/proposals/reference-target-explainer.md">Reference Target</a>. In this post I’ll explain how this feature works, why I like it, and what the situation is right now with the spec and implementation (thanks in part to <a href="https://blogs.igalia.com/mrego/solving-cross-root-aria-issues-in-shadow-dom/">Igalia’s NLNet funding</a>).</p>
<h2 id="a-quick-introduction" tabindex="-1">A quick introduction <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h2>
<p><code>referenceTarget</code> is a new property on shadow root objects which lets you nominate an element in the shadow root’s subtree which should be the <strong>target</strong> of any attribute-based <strong>reference</strong> to the shadow host.</p>
<p>As an example, imagine that you have a <code><custom-input></code> component, which has an <code><input></code> tucked away in its shadow root.
This is a pattern which is ubiquitous in custom element libraries, as it allows the custom element to use <a href="https://en.wikipedia.org/wiki/Object_composition">composition</a> to enhance the behaviour of a built-in element.</p>
<pre class="language-html" tabindex="0"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>label</span> <span class="token attr-name">for</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>track<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Track name:<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>label</span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-input</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>track<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /> #shadowRoot<br /> | <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>input</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>inner-input<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-input</span><span class="token punctuation">></span></span></code></pre>
<p>We can set the shadow root’s <code>referenceTarget</code> to allow the <code><label></code> to correctly label the inner <code><input></code>:</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token comment">// in the constructor for the custom-input:</span><br /><br /><span class="token keyword">const</span> shadowRoot <span class="token operator">=</span> <span class="token keyword">this</span><span class="token punctuation">.</span><span class="token function">attachShadow</span><span class="token punctuation">(</span><span class="token punctuation">{</span><span class="token literal-property property">mode</span><span class="token operator">:</span> <span class="token string">'open'</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br />shadowRoot<span class="token punctuation">.</span>referenceTarget <span class="token operator">=</span> <span class="token string">'inner-input'</span><span class="token punctuation">;</span><br />shadowRoot<span class="token punctuation">.</span>innerHTML <span class="token operator">=</span> '<span class="token operator"><</span>input id<span class="token operator">=</span><span class="token string">"inner-input"</span><span class="token operator">></span>'<span class="token punctuation">;</span></code></pre>
<p>This lets the label refer to the <code><custom-input></code> just like it would refer to an <code><input></code>; the <code><custom-input></code> transparently proxies the reference through to the encapsulated <code><input></code>.</p>
<p>In this example, we’ve set the <code>referenceTarget</code> property directly on the <code>ShadowRoot</code> object, but it can also be set declaratively when using the <code><template></code> element to create the shadow root:</p>
<pre class="language-html" tabindex="0"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>label</span> <span class="token attr-name">for</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>track<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Track name:<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>label</span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-input</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>track<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>template</span> <span class="token attr-name">shadowRootMode</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>open<span class="token punctuation">"</span></span><br /> <span class="token attr-name">shadowRootReferenceTarget</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>inner-input<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>input</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>inner-input<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /> <span class="token 
tag"><span class="token tag"><span class="token punctuation"></</span>template</span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-input</span><span class="token punctuation">></span></span></code></pre>
<p>This works equally well for <em>any</em> attribute which refers to other elements like this - even if you set it via a reflected property like <code>commandForElement</code>:</p>
<pre class="language-html" tabindex="0"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>button</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>settings-trigger<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Site settings<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>button</span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-dialog</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>settings-dialog<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /> #shadowRoot referenceTarget="inner-dialog"<br /> | <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>dialog</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>inner-dialog<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /> | <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>button</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>close<span class="token punctuation">"</span></span> <span class="token attr-name">aria-label</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>close<span class="token punctuation">"</span></span><br /> <span class="token 
attr-name">|</span> <span class="token attr-name">commandFor</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>inner-dialog<span class="token punctuation">"</span></span><br /> <span class="token attr-name">|</span> <span class="token attr-name">command</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>request-close<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>button</span><span class="token punctuation">></span></span><br /> | <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>slot</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>slot</span><span class="token punctuation">></span></span><br /> | <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>dialog</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>fieldset</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>legend</span><span class="token punctuation">></span></span>Colour scheme:<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>legend</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>label</span> <span class="token attr-name">for</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>dark<span class="token punctuation">"</span></span><span class="token 
punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>input</span> <span class="token attr-name">type</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>radio<span class="token punctuation">"</span></span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>dark<span class="token punctuation">"</span></span> <span class="token attr-name">name</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>appearance<span class="token punctuation">"</span></span> <span class="token attr-name">value</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>dark<span class="token punctuation">"</span></span> <span class="token attr-name">checked</span><span class="token punctuation">></span></span><br /> Dark<br /> <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>label</span><span class="token punctuation">></span></span><br /> <span class="token comment"><!-- TODO: more colour schemes --></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>fieldset</span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-dialog</span><span class="token punctuation">></span></span></code></pre>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token comment">// Someone probably has a good reason why they'd do it this way, right?</span><br /><br /><span class="token keyword">const</span> settingsButton <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementById</span><span class="token punctuation">(</span><span class="token string">'settings-trigger'</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br />settingsButton<span class="token punctuation">.</span>command <span class="token operator">=</span> <span class="token string">'show-modal'</span><span class="token punctuation">;</span><br />settingsButton<span class="token punctuation">.</span>commandForElement <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementById</span><span class="token punctuation">(</span><span class="token string">'settings-dialog'</span><span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
<p>This lets the <code><custom-dialog></code> behave exactly like a <code><dialog></code> for the purposes of the <code>command</code> and <code>commandForElement</code> properties.</p>
<h2 id="why-i-like-it" tabindex="-1">Why I like it <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h2>
<p>In my <a href="https://blogs.igalia.com/alice/how-shadow-dom-and-accessibility-are-in-conflict/">earlier blog post</a> I explained that I was concerned that the Cross-root ARIA delegation and reflection proposals introduced a <a href="https://blogs.igalia.com/alice/how-shadow-dom-and-accessibility-are-in-conflict/#limitations-of-these-apis">bottleneck problem</a>. This problem arose because it was only possible to refer to one element per attribute, rather than allowing <em>arbitrary</em> cross-shadow root references.</p>
<p>This proposal <em>absolutely doesn’t</em> solve that problem, but it reframes the overall problem such that I don’t think it matters any more.</p>
<p>The key difference between reference target and the earlier proposals is that reference target is a catch-all for references to the shadow host, rather than requiring each attribute to be forwarded separately. This solves a specific problem, which I alluded to above: how can custom element authors encapsulate the behaviour of a given built-in HTML element while also allowing other elements to refer to the custom element as if it <em>were</em> the built-in element?</p>
<p>I believe this narrower problem definition accounts for a significant proportion - not all, but many - of the cases where references need to be able to cross into shadow roots. And the API makes much more sense to me - if you’re using the <code>for</code> attribute to refer to a <code><custom-input></code>, you shouldn’t need to know that you’re actually referring to an enclosed <code><input></code>, you just want the <code><custom-input></code> to be labelled. This API makes the enclosed <code><input></code> an implementation detail. And since a shadow root can only have one host, it makes sense that it can only have one reference target.</p>
<h2 id="adjacent-as-yet-unsolved-problems" tabindex="-1">Adjacent, as-yet unsolved problems <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h2>
<h3 id="arbitrary-cross-shadow-root-references" tabindex="-1">Arbitrary cross-shadow root references <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h3>
<p>As mentioned above, one adjacent problem is the problem of element references which do need to refer to specific elements within a shadow root, rather than a stand-in for the shadow host.</p>
<p>The explainer gives two examples of this: <a href="https://github.com/WICG/webcomponents/blob/gh-pages/proposals/reference-target-explainer.md#aria-activedescendant-and-comboboxes"><code>aria-activedescendant</code></a> on a combobox element which needs to refer to an option inside of a shadow root, and ARIA attributes like <code>aria-labelledby</code>, <code>aria-describedby</code> and <code>aria-errormessage</code> which may need <a href="https://github.com/WICG/webcomponents/blob/gh-pages/proposals/reference-target-explainer.md#fine-grained-aria-labelledby-and-aria-describedby">a computed name for the component which excludes some parts</a>.</p>
<p>I think we need to be careful about generalising this problem, though. As I describe later in the explainer, I think we might be able to <a href="https://github.com/WICG/webcomponents/blob/gh-pages/proposals/reference-target-explainer.md#addressing-individual-use-cases-separately">get better solutions by solving more specific problems</a> - as we have with reference target.</p>
<p>If you have another example of where you need to refer to specific elements within a shadow root, you can leave a comment on <a href="https://github.com/WICG/webcomponents/issues/1111">this issue collecting use cases</a>.</p>
<h3 id="attribute-forwarding" tabindex="-1">Attribute forwarding <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h3>
<p>While reference target allows other elements to refer to the encapsulated element, custom element authors may also want to allow developers using their component to use standard HTML and ARIA attributes on the host element and have those apply to the encapsulated element.</p>
<p>For example, you might like to support <code>popoverTarget</code> on your <code><custom-button></code> element:</p>
<pre class="language-html" tabindex="0"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-button</span> <span class="token attr-name">popoverTarget</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>languages<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Language<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-button</span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-menu</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>languages<span class="token punctuation">"</span></span> <span class="token attr-name">popover</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-menuitem</span><span class="token punctuation">></span></span>Nederlands<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-menuitem</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-menuitem</span><span class="token punctuation">></span></span>Fryslân<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-menuitem</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-menuitem</span><span class="token punctuation">></span></span>Vlaams<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-menuitem</span><span 
class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-menu</span><span class="token punctuation">></span></span></code></pre>
<p>There is an <a href="https://github.com/WICG/webcomponents/issues/1068">issue for the attribute forwarding idea</a>; leave a comment there if this is an idea you’d like to see pursued.</p>
<h3 id="form-association" tabindex="-1">Form association <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h3>
<p>Custom elements can be specified as <a href="https://html.spec.whatwg.org/dev/custom-elements.html#custom-elements-face-example">form-associated</a>, but there’s no way to associate an encapsulated form-associated built-in element (such as <code><input></code>) with an enclosing <code><form></code>.</p>
<p>For example, the <code><custom-input></code> above could be nested in a <code><form></code> element, but the enclosed <code><input></code> wouldn’t be associated with the <code><form></code> - instead, you’d have to use <code>setFormValue()</code> on the custom element and copy the value of the <code><input></code>.</p>
<h2 id="spec-and-implementation-status" tabindex="-1">Spec and implementation status <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h2>
<p>In brief: the spec changes seem to be in good shape, Chromium has the most feature-complete implementation and there are significantly less-baked implementations in WebKit and Firefox.</p>
<h3 id="spec-changes" tabindex="-1">Spec changes <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h3>
<p>There are open pull requests on the <a href="https://github.com/whatwg/html/pull/10995">HTML</a> and <a href="https://github.com/whatwg/dom/pull/1353">DOM</a> specs. Since these PRs are still being reviewed, the concepts and terminology below might change, but this is what we have right now. These changes have already had a few rounds of reviews, thanks to Anne van Kesteren, Olli Pettay and Keith Cirkel.</p>
<p>The <a href="https://whatpr.org/dom/1353/388779b...64676d6.html">DOM</a> change:</p>
<ul>
<li>adds the concept of a <a href="https://whatpr.org/dom/1353/388779b...64676d6.html#shadowroot-reference-target">reference target</a></li>
<li>adds the <a href="https://whatpr.org/dom/1353/388779b...64676d6.html#interface-shadowroot"><code>referenceTarget</code></a> property to the <code>ShadowRoot</code> object.</li>
</ul>
<p>The <a href="https://github.com/whatwg/html/pull/10995">HTML</a> change is where the actual effect of the reference target is defined.</p>
<h4 id="element-reference-attribute-type" tabindex="-1">Element reference attribute type <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h4>
<p>One key change in the HTML spec is the addition of an attribute type for <a href="https://whatpr.org/html/10995/common-microsyntaxes.html#element-reference-attributes">“element reference” attributes</a>. This formalises in HTML what has previously been referred to as an <a href="https://www.w3.org/TR/wai-aria/#valuetype_idref">ID reference</a> or <a href="https://www.w3.org/TR/xmlschema11-2/#IDREF">IDREF</a>. This term isn’t currently used in HTML, and since the addition of <a href="https://html.spec.whatwg.org/#reflecting-content-attributes-in-idl-attributes:element">reflected IDL Element attributes</a>, IDs aren’t strictly necessary, either.</p>
<p>Before this change, whenever an attribute in the HTML spec was required to match another element based on its ID, this was written out explicitly where the attribute was defined. For example, the <a href="https://html.spec.whatwg.org/#attr-label-for">definition of the <code><label></code> element’s <code>for</code> attribute</a>
currently reads:</p>
<blockquote>
<p>The <code>for</code> attribute may be specified to indicate a form control with which the caption is to be associated. If the attribute is specified, the attribute’s value must be the ID of a labelable element in the same tree as the <code>label</code> element. If the attribute is specified and there is an element in the tree whose ID is equal to the value of the for attribute, and the first such element in tree order is a labelable element, then that element is the <code>label</code> element’s labeled control.</p>
</blockquote>
<p>Since reference target affects how this type of reference works, and is intended to apply to every attribute which refers to another element, it was simpler to have one central definition.</p>
<h4 id="reference-target-resolution" tabindex="-1">Reference target resolution <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h4>
<p>For a reference target to actually do something, we need to define what effect it has. This is defined, quite straightforwardly, in the <a href="https://whatpr.org/html/10995/common-microsyntaxes.html#resolve-the-reference-target">steps to resolve the reference target</a>:</p>
<blockquote>
<ol>
<li>If <em>element</em> is not a shadow host, or <em>element</em>’s shadow root’s reference target is null, then return <em>element</em>.</li>
<li>Let <em>referenceTargetValue</em> be the value of <em>element</em>’s shadow root’s reference target.</li>
<li>Let <em>candidate</em> be the first element in <em>element</em>’s shadow root whose ID matches <em>referenceTargetValue</em>.</li>
<li>If no such element exists, return null.</li>
<li>Return the result of resolving the reference target on <em>candidate</em>.</li>
</ol>
</blockquote>
<p>These steps are recursive: if a shadow root’s reference target has its own shadow root, and that shadow root has a reference target, we keep descending into the nested shadow root.</p>
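<p>The recursive resolution can be sketched as a small function. This is only an illustration of the spec steps: plain objects stand in for DOM nodes, and the object shapes (including a <code>shadowRoot.elements</code> array) are assumptions of the sketch, not real DOM API.</p>

```javascript
// Minimal sketch of the spec's "resolve the reference target" steps.
// Plain objects stand in for DOM nodes; `shadowRoot.elements` is an
// assumption of this model, not a real DOM property.
function resolveReferenceTarget(element) {
  // Step 1: not a shadow host, or no reference target: the element itself.
  if (!element.shadowRoot || element.shadowRoot.referenceTarget == null) {
    return element;
  }
  // Steps 2-3: first element in the shadow root whose ID matches.
  const value = element.shadowRoot.referenceTarget;
  const candidate = element.shadowRoot.elements.find((el) => el.id === value);
  // Step 4: an invalid reference target resolves to null, not the host.
  if (!candidate) return null;
  // Step 5: recurse, in case the candidate is itself a shadow host
  // whose shadow root has its own reference target.
  return resolveReferenceTarget(candidate);
}

const innerInput = { id: 'inner-input' };
const customInput = {
  id: 'track',
  shadowRoot: { referenceTarget: 'inner-input', elements: [innerInput] },
};
console.log(resolveReferenceTarget(customInput) === innerInput); // true
```

Note how the base case (a plain element with no shadow root) is what terminates the descent through nested reference targets.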
<p>One slightly subtle design choice here is that if a shadow root has a reference target which doesn’t refer to <em>any</em> element - for example, an empty string, or a value which doesn’t match the ID of any element in its subtree - the resolved reference target is null, <strong>not</strong> the shadow host.</p>
<p>For example, if you try to use <code>popoverTarget</code> to refer to a shadow host which has a <code>popover</code> attribute, but has an invalid reference target on its shadow root, the <code>popoverTarget</code> attribute won’t actually target anything:</p>
<pre class="language-html" tabindex="0"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>button</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>more-actions<span class="token punctuation">"</span></span> <span class="token attr-name">popoverTarget</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>actions-popover<span class="token punctuation">"</span></span> <span class="token attr-name">aria-label</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>more actions<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>…<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>button</span><span class="token punctuation">></span></span><br /><br /><span class="token comment"><!-- Even though this has a popover attribute, the button won't toggle it! 
--></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>custom-popover</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>actions-popover<span class="token punctuation">"</span></span> <span class="token attr-name">popover</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>template</span> <span class="token attr-name">shadowRootMode</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>open<span class="token punctuation">"</span></span><br /> <span class="token attr-name">shadowRootReferenceTarget</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0xDEADBEEF<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>help-im-trapped-in-a-shadow-root<span class="token punctuation">"</span></span> <span class="token attr-name">popover</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>slot</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>slot</span><span class="token punctuation">></span></span><br /> <span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><br /> <span class="token 
tag"><span class="token tag"><span class="token punctuation"></</span>template</span><span class="token punctuation">></span></span><br /><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>custom-popover</span><span class="token punctuation">></span></span></code></pre>
<h4 id="resolved-and-unresolved-attr-target-elements" tabindex="-1">Resolved and unresolved <em>attr</em> target elements <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h4>
<p>Like many spec concepts, this one is a real mouthful.</p>
<p>This lets us be very clear about whether reference target resolution has happened when we’re talking about what element an attribute refers to.</p>
<p>If we’re <a href="https://whatpr.org/html/10995/common-dom-interfaces.html#reflecting-content-attributes-in-idl-attributes:element">reflecting an attribute to its IDL counterpart</a>, we now use the <a href="https://whatpr.org/html/10995/common-microsyntaxes.html#get-the-unresolved-attr-target-element">unresolved <em>attr</em> target element</a>. For example, if we had the DOM defined in the previous example, and we wanted to get the <code>popoverTargetElement</code> for the <code>"more-actions"</code> button:</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token keyword">const</span> moreActions <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementById</span><span class="token punctuation">(</span><span class="token string">"more-actions"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br /><br /><span class="token comment">// This will log the <custom-popover> element (!)</span><br />console<span class="token punctuation">.</span><span class="token function">log</span><span class="token punctuation">(</span>moreActions<span class="token punctuation">.</span>popoverTargetElement<span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
<p>(In spec terms: the <code><custom-popover></code> element is the <strong>unresolved <code>popoverTarget</code> target element</strong> for the <code><button></code>.)</p>
<p>This might also be a bit surprising; we spent quite a bit of time going back and forth on this, since we thought developers might want to know that the <code>popoverTarget</code> isn’t actually targeting anything. However, using the unresolved target lets us have a very close parallel between setting and getting the <code>popoverTargetElement</code>, as well as preserving the shadow root’s encapsulation.</p>
<p>The <a href="https://whatpr.org/html/10995/common-microsyntaxes.html#get-the-resolved-attr-target-element">resolved <em>attr</em> target element</a>, meanwhile, is what will be used when actually doing something with the attribute - such as triggering a popover, or computing a label’s labeled control, or determining an element’s <a href="https://w3c.github.io/accname/#mapping_additional_nd_description">accessible description</a>.</p>
<p>In the above example, the <strong>resolved <code>popoverTarget</code> target element</strong> for the button is null. And, going back to the examples we’ve seen earlier:</p>
<ul>
<li>the <strong>resolved <code>commandFor</code> target element</strong> for the Settings button is the inner <code><dialog></code> - clicking the button will open the <code><dialog></code>.</li>
<li>the <strong>resolved <code>for</code> target element</strong> for the <code><label></code> is the inner <code><input></code> - clicking the label will focus the input, and the input’s computed <a href="https://w3c.github.io/accname/#mapping_additional_nd_name">accessible name</a> will be “Track name”.</li>
</ul>
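<p>The split between the two concepts can be sketched in the same plain-object style as before (the object shapes and <code>shadowRoot.elements</code> are assumptions of the sketch, not DOM API): reflection stops at the host, while actions resolve through it.</p>

```javascript
// Hypothetical model: reflection returns the unresolved target (the
// host), while actions use the resolved target (after descending
// through the shadow root's reference target).
function unresolvedTargetElement(referrer, tree) {
  // Reflection stops at the first element whose ID matches: the host.
  return tree.find((el) => el.id === referrer.popovertarget) ?? null;
}

function resolvedTargetElement(referrer, tree) {
  const host = unresolvedTargetElement(referrer, tree);
  if (!host) return null;
  // Descend through the reference target (simplified to one level).
  if (host.shadowRoot && host.shadowRoot.referenceTarget != null) {
    return host.shadowRoot.elements.find(
      (el) => el.id === host.shadowRoot.referenceTarget) ?? null;
  }
  return host;
}

const innerPopover = { id: 'inner' };
const customPopover = {
  id: 'actions-popover',
  shadowRoot: { referenceTarget: '0xDEADBEEF', elements: [innerPopover] },
};
const button = { id: 'more-actions', popovertarget: 'actions-popover' };
const tree = [button, customPopover];

console.log(unresolvedTargetElement(button, tree) === customPopover); // true
console.log(resolvedTargetElement(button, tree)); // null: invalid reference target
```

This mirrors the surprise described above: reflection hands back the host even when the resolved target, which actions would use, is null.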
<h4 id="referring-to" tabindex="-1">“Referring to” <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h4>
<p>For convenience, we define the concept of an attribute <a href="https://whatpr.org/html/10995/common-microsyntaxes.html#single-element-reference-refers-to">“referring to”</a> an element:</p>
<blockquote>
<p>A single-element reference attribute <em>attr</em> on an element <em>X</em> <strong>refers to</strong> an element <em>Y</em> as its target if <em>Y</em> is the resolved <em>attr</em> target element for <em>X</em>.</p>
</blockquote>
<p>So, for example, the <code>commandFor</code> attribute on the Settings button <strong>refers to</strong> the inner <code><dialog></code> element.</p>
<h4 id="sets-of-element-references" tabindex="-1">Sets of element references <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h4>
<p>All of the above used single element references as examples, but there are attributes which can refer to more than one element. For example, almost all of the ARIA attributes which refer to other elements refer to multiple elements in an ordered list - one such is <a href="https://w3c.github.io/aria/#aria-errormessage"><code>aria-errormessage</code></a>, which can refer to one or more elements which should be exposed specifically as an error message for an element which is marked as invalid.</p>
<p>We define a <a href="https://whatpr.org/html/10995/common-microsyntaxes.html#set-of-element-references">set of element references</a> attribute type, along with a couple of subtypes which impose constraints such as ordering or uniqueness. We also define what it means for one of these attributes to <strong>refer to</strong> another element, and how to get the resolved and unresolved <em>attr</em> target elements for these attributes.</p>
<p>While these are slightly more complex than the single element versions, they follow the same basic logic. The only marginally significant difference is that since they produce lists of elements, if a shadow root’s reference target is invalid, no element is added to the list for that unresolved <em>attr</em> target, instead of returning null.</p>
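<p>As a rough sketch of that difference, using the same hypothetical plain-object model: when resolving a set of references, entries that fail to resolve are simply dropped from the list.</p>

```javascript
// Hypothetical sketch: resolving a set of element references (as in
// aria-errormessage). An entry whose reference target fails to resolve
// is skipped, rather than contributing null to the resulting list.
function resolveReferenceList(ids, tree) {
  const resolved = [];
  for (const id of ids) {
    const host = tree.find((el) => el.id === id);
    if (!host) continue; // no element with this ID at all
    const target = resolveOne(host);
    if (target !== null) resolved.push(target); // skip invalid, don't add null
  }
  return resolved;
}

function resolveOne(element) {
  if (!element.shadowRoot || element.shadowRoot.referenceTarget == null) {
    return element;
  }
  const candidate = element.shadowRoot.elements.find(
    (el) => el.id === element.shadowRoot.referenceTarget);
  return candidate ? resolveOne(candidate) : null;
}

const plainError = { id: 'err-1' };
const brokenHost = {
  id: 'err-2',
  shadowRoot: { referenceTarget: 'missing', elements: [] },
};
const refTree = [plainError, brokenHost];
console.log(resolveReferenceList(['err-1', 'err-2'], refTree)); // only err-1 survives
```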
<h4 id="using-these-concepts-in-the-rest-of-the-spec" tabindex="-1">Using these concepts in the rest of the spec <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h4>
<p>Now that we’ve defined these spec concepts, we have to update each place in the spec where we previously used the “whose ID is equal to the value of the blahblah attribute” wording.</p>
<p>Returning to our good friend <code>popoverTarget</code>, we can see a relatively straightforward example.</p>
<p>The definition of the <code>popoverTarget</code> attribute now reads:</p>
<blockquote>
<p>If specified, the <code>popovertarget</code> attribute value must be a valid single-element reference attribute referring to an element with the <code>popover</code> attribute.</p>
</blockquote>
<p>And now, <a href="https://whatpr.org/html/10995/59931bd...99765dc/popover.html#popover-target-element">get the <strong>popover target element</strong></a> determines <em>popoverElement</em> like this:</p>
<blockquote>
<ol start="4">
<li>Let <em>popoverElement</em> be <em>node</em>’s resolved <code>popovertarget</code> target element.</li>
</ol>
</blockquote>
<p><a href="https://whatpr.org/html/10995/forms.html#labeled-control"><code>&lt;label&gt;</code> association</a> is a bit more complex, since we wanted descendants of the <code>&lt;label&gt;</code> to be correctly labelled when using reference target:</p>
<blockquote>
<p>To determine a <code>label</code> element <em>label</em>’s <strong>labeled control</strong>, run these steps:</p>
<ol>
<li>If <em>label</em>’s <code>for</code> attribute is specified, then:
<ol>
<li>If the resolved <code>for</code> target element is not null, and the resolved <code>for</code> target element is a labelable element, return that element.</li>
<li>Otherwise, return null.</li>
</ol>
</li>
<li>For each descendant <em>descendant</em> of <em>label</em> in tree order:
<ol>
<li>Let <em>candidate</em> be the result of resolving the reference target on <em>descendant</em>.</li>
<li>If <em>candidate</em> is a labelable element, return <em>candidate</em>.</li>
</ol>
</li>
<li>Return null.</li>
</ol>
</blockquote>
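<p>The steps above can be modeled in a few lines of Python. The element objects and resolver callbacks here are invented for the sketch; a real implementation operates on DOM nodes:</p>

```python
# Toy model of the "labeled control" steps quoted above.
# An element is a dict; `labelable` marks elements that can be labeled.

def labeled_control(label, resolve_for, resolve_reference_target):
    # Step 1: an explicit for= attribute wins, with no fallback.
    if label.get("for_attr") is not None:
        target = resolve_for(label)  # the resolved `for` target element
        if target is not None and target.get("labelable"):
            return target
        return None
    # Step 2: otherwise, scan descendants in tree order and return the
    # first one whose resolved reference target is labelable.
    for descendant in label.get("descendants", []):
        candidate = resolve_reference_target(descendant)
        if candidate is not None and candidate.get("labelable"):
            return candidate
    # Step 3: nothing matched.
    return None
```

<p>Note how a specified <code>for</code> attribute short-circuits the descendant walk entirely, even when it resolves to a non-labelable element.</p>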
<p>There is also a <a href="https://github.com/w3c/aria/pull/2474">PR open on the ARIA spec</a> to introduce this terminology there.</p>
<h3 id="implementation-status" tabindex="-1">Implementation status <a class="header-anchor" href="https://blogs.igalia.com/alice/reference-target-having-your-encapsulation-and-eating-it-too/">#</a></h3>
<p>Chromium has the <a href="https://wpt.fyi/results/shadow-dom/reference-target/tentative?label=master&label=experimental&product=chrome&product=edge&aligned">most complete implementation</a>, though it may not quite be up to date with the latest spec changes. Any developers wanting to try it out should get the latest build of Chrome or Edge and flip on the Experimental Web Platform Features flag. If you do try it out, I’d love to hear any <a href="https://github.com/WICG/webcomponents/issues/1120">feedback</a> you might have!</p>
<p>WebKit and Firefox <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1952585">(tracking bug)</a> each have a prototype implementation, available behind respective feature flags (<code>ShadowRootReferenceTargetEnabled</code> for WebKit and <code>dom.shadowdom.referenceTarget.enabled</code> for Firefox), which should pass at least most of the existing WPT tests. However, the WPT tests are insufficient to test <em>all</em> of the functionality, and the functionality which couldn’t be tested via WPTs hasn’t been implemented yet in these engines. The Chromium implementation included adding many Chromium-specific tests for the behaviour which can’t be tested via WPTs, as well as implementing that behaviour.</p>
<p>Currently, WPT tests can only test the computed accessible name and computed accessible role for an element, as well as testing DOM methods and user actions like clicking. However, reference target impacts the accessibility tree in many ways - not only via ARIA attributes, but via <a href="https://www.w3.org/TR/html-aam-1.0/#att-popovertarget">attributes like <code>popoverTarget</code> being exposed in the accessibility tree</a> as an accessible relation.</p>
<p>And, importantly, <em>changes</em> to the accessibility tree can require certain notifications to be fired to assistive technology APIs - and reference target introduces several new ways to change the accessibility tree. Adding, changing, or removing a shadow root’s <code>referenceTarget</code> may cause changes in the resolved target elements for attributes, causing accessibility tree changes and potentially requiring notifications. Likewise, inserting an element with an ID which matches a shadow root’s <code>referenceTarget</code> could also cause a shadow host’s resolved reference target to change, also potentially causing the accessibility tree to change.</p>
<p>There are two complementary projects currently underway which will allow us to write much richer tests for accessibility tree functionality in browsers:</p>
<ul>
<li><a href="https://github.com/web-platform-tests/wpt/pull/53733">support for writing WPT tests which directly test what browsers expose to accessibility APIs</a></li>
<li><a href="https://github.com/web-platform-tests/wpt/pull/55784">support for an “accessible properties” API</a>.</li>
</ul>
<p>Once we can write WPT tests which actually test the full spectrum of expected behaviour for reference target, we’ll be able to actually make it an <a href="https://github.com/web-platform-tests/interop/issues/1011">official interop focus area</a>.</p>
<p>The prototype implementation work in WebKit and Firefox, as well as the spec work done by Igalia, was generously funded by a <a href="https://blogs.igalia.com/mrego/solving-cross-root-aria-issues-in-shadow-dom/">grant from NLNet Foundation</a>, while the implementation work in Chromium and much of the remainder of the spec work was done by Microsoft engineers on the Edge team.</p>
Alice Boxhall (https://blogs.igalia.com/alice/alice/)
Qiuyi Zhang (Joyee): Improving Single Executable Application Building for Node.js (https://joyeecheung.github.io/blog/2026/01/26/improving-single-executable-application-building-for-node-js/, 2026-01-26T22:11:31+00:00)
<p>Recently, I <a target="_blank" rel="noopener" href="https://github.com/nodejs/node/pull/61167">landed a change that moves the Single Executable Application (SEA) build process directly into Node.js core</a> - a hobby project I’d been tinkering with for some time.</p>
Qiuyi Zhang (Joyee) (https://joyeecheung.github.io/blog/)
Igalia WebKit Team: WebKit Igalia Periodical #54 (https://blogs.igalia.com/webkit/blog/2026/wip-54/, 2026-01-26T21:00:55+00:00)
<p>Update on what happened in WebKit in the week from January 19 to January 26.</p>
<p>
The main event this week has been the creation of the branch for the upcoming stable series, accompanied by the first release candidate before 2.52.0. But there's more: the WPE port gains hyphenation support and the ability to notify of graphics buffer changes; both ports get graphics fixes and a couple of new Web features, and WPE-Android also gets a new stable release.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/305917@main">Implemented</a> support for the <code>:open</code>
pseudo-class on dialog and details elements. This is currently behind the
<code>OpenPseudoClass</code> feature flag.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/306152@main">Implemented</a> the <code>source</code> property for
<code>ToggleEvent</code>. This can be used to run code dependent on the triggering element
in response to a popover or dialog toggle.</p>
</div>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/306119@main">Fixed</a> the rendering glitches with
wheel event asynchronous scrolling, which occurred when the page was scrolled
to areas not covered by tiles while the main thread was blocked.</p>
</div>
<h2 id="wpe-webkit-pager">WPE WebKit 📟</h2>
<div class="wip-item">
<p>Support for
<a rel="external" href="https://developer.mozilla.org/en-US/docs/Web/CSS/Reference/Properties/hyphens">hyphenation</a>
has been <a rel="external" href="https://commits.webkit.org/305816@main">added to WPE</a>. This requires
<code>libhyphen</code> and can be disabled at build-time with the <code>USE_LIBHYPHEN=OFF</code>
CMake option.</p>
</div>
<h3 id="wpe-platform-api-jigsaw">WPE Platform API 🧩</h3>
<div class="wip-description">
<p>New, modern platform API that supersedes usage of libwpe and WPE backends.</p>
</div>
<div class="wip-item">
<p>WPEPlatform <a rel="external" href="https://commits.webkit.org/306008@main">gained support</a> to notify
changes in the configuration of graphics buffers allocated to render the
contents of a web view, either by handling the <code>WPEView::buffers-changed</code>
signal or by overriding the <code>WPEViewClass.buffers_changed</code> virtual function.
This feature is mainly useful for platform implementations which may need to
perform additional setup in advance, before updated web view contents are
provided in the buffers configured by WebKit.</p>
</div>
<h2 id="releases-package">Releases 📦️</h2>
<div class="wip-item">
<p><a rel="external" href="https://github.com/Igalia/wpe-android/releases/tag/v0.3.1">WPE-Android 0.3.1</a>
has been released, and prebuilt packages are available <a rel="external" href="https://central.sonatype.com/artifact/org.wpewebkit.wpeview/wpeview/">at the Maven Central
repository</a>.
The main change in this version is the update to WPE WebKit 2.50.4, which
is the most recent stable release.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://github.com/WebKit/WebKit/commits/webkitglib/2.52">A new branch has been
created</a> for the
upcoming 2.52.x stable release series of the GTK and WPE WebKit ports. The
first release candidates from this branch, <a rel="external" href="https://webkitgtk.org/2026/01/23/webkitgtk2.51.90-released.html">WebKitGTK
2.51.90</a> and
<a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.51.90.html">WPE WebKit 2.51.90</a> are
now available. Testing and <a rel="external" href="https://bugs.webkit.org">issue reports in Bugzilla</a>
are welcome to help with stabilization before the first stable release, which
is planned for mid-March.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div>
Igalia WebKit Team (https://blogs.igalia.com/webkit)
Enrique Ocaña: Igalia Multimedia contributions in 2025 (https://eocanha.org/blog/?p=701, 2026-01-26T09:34:37+00:00)
<img class="face" src="/images/eocanha.png" width="100" height="100" alt="" align="right" style="float: right" />
<p>Now that 2025 is over, it’s time to look back and feel proud of the path we’ve walked. Last year has been really exciting in terms of contributions to GStreamer and WebKit for the Igalia Multimedia team.</p>
<p>With more than 459 contributions over the year, we’ve been one of the top contributors to the GStreamer project, in areas like Vulkan Video, GstValidate, VA, GStreamer Editing Services, WebRTC and H.266 support.</p>
<figure class="wp-block-image size-full"><a href="https://eocanha.org/blog/wp-content/uploads/2026/01/gstreamer-contributions.jpg"><img width="943" height="530" src="https://eocanha.org/blog/wp-content/uploads/2026/01/gstreamer-contributions.jpg" alt="Pie chart of Igalia's contributions to different areas of the GStreamer project:
other (30%)
vulkan (24%)
validate (7%)
va (6%)
ges (4%)
webrtc (3%)
h266parse (3%)
python (3%)
dots-viewer (3%)
tests (2%)
docs (2%)
devtools (2%)
webrtcbin (1%)
tracers (1%)
qtdemux (1%)
gst (1%)
ci (1%)
y4menc (1%)
videorate (1%)
gl (1%)
alsa (1%)" class="wp-image-706" /></a><figcaption>Igalia’s contributions to the GStreamer project</figcaption></figure>
<p>In Vulkan Video we’ve worked on the VP9 video decoder, and cooperated with other contributors to push the AV1 decoder as well. There’s now an H.264 base class for video encoding that is designed to support general hardware-accelerated processing.</p>
<p>GStreamer Editing Services, the framework to build video editing applications, has gained time remapping support, which makes it possible to include fast/slow motion effects in videos. Video transformations (scaling, cropping, rounded corners, etc.) are now hardware-accelerated thanks to the addition of new Skia-based GStreamer elements and integration with OpenGL. Buffer pool tuning and pipeline improvements have helped to optimize memory usage and performance, enabling the editing of 4K video at 60 frames per second. Much of this work to improve and ensure quality in GStreamer Editing Services has also brought improvements to the GstValidate testing framework, which will be useful for other parts of GStreamer.</p>
<p>Regarding H.266 (VVC), full playback support (with decoders such as <code>vvdec</code> and <code>avdec_h266</code>, demuxers and muxers for Matroska, MP4 and TS, and parsers for the <code>vvc1</code> and <code>vvi1</code> formats) is now available in GStreamer 1.26 thanks to Igalia’s work. This allows user applications such as the WebKitGTK web browser to leverage the hardware accelerated decoding provided by VAAPI to play H.266 video using GStreamer.</p>
<p>Igalia has also been one of the top contributors to GStreamer Rust, with 43 contributions. Most of the commits there have been related to Vulkan Video.</p>
<figure class="wp-block-image size-full"><a href="https://eocanha.org/blog/wp-content/uploads/2026/01/gstreamer-rs-contributions.jpg"><img width="943" height="530" src="https://eocanha.org/blog/wp-content/uploads/2026/01/gstreamer-rs-contributions.jpg" alt="Pie chart of Igalia's contributions to different areas of the GStreamer Rust project:
vulkan (28%)
other (26%)
gstreamer (12%)
ci (12%)
tracer (7%)
validate (5%)
ges (7%)
examples (5%)" class="wp-image-708" /></a><figcaption>Igalia’s contributions to the GStreamer Rust project</figcaption></figure>
<p>In addition to GStreamer, the team also has a strong presence in WebKit, where we leverage our GStreamer knowledge to implement many multimedia-related features of the web engine. Of the 1739 contributions to the WebKit project made last year by Igalia, 323 came from the Multimedia team. Nearly one third of those were related to generic multimedia playback, and the rest were in areas such as WebRTC, MediaStream, MSE, WebAudio, a new Quirks system to provide adaptations for specific hardware multimedia platforms at runtime, WebCodecs and MediaRecorder.</p>
<figure class="wp-block-image size-full"><a href="https://eocanha.org/blog/wp-content/uploads/2026/01/webkit-contributions.jpg"><img width="943" height="530" src="https://eocanha.org/blog/wp-content/uploads/2026/01/webkit-contributions.jpg" alt="Pie chart of Igalia's contributions to different areas of the WebKit project:
Generic Gstreamer work (33%)
WebRTC (20%)
Regression bugfixing (9%)
Other (7%)
MSE (6%)
BuildStream SDK (4%)
MediaStream (3%)
WPE platform (3%)
WebAudio (3%)
WebKitGTK platform (2%)
Quirks (2%)
MediaRecorder (2%)
EME (2%)
Glib (1%)
WTF (1%)
WebCodecs (1%)
GPUProcess (1%)
Streams (1%) " class="wp-image-709" /></a><figcaption>Igalia Multimedia Team’s contributions to different areas of the WebKit project</figcaption></figure>
<p>We’re happy about what we’ve achieved over the year and look forward to maintaining this success and bringing even more exciting features and contributions in 2026.</p>
eocanha (https://eocanha.org/blog)
Luke Lau: Closing the gap, part 2: Probability and profitability (http://lukelau.me/2026/01/26/closing-the-gap-pt2, 2026-01-25T16:00:00+00:00)
<p>Welcome back to the second post in this series looking at how we can
improve the performance of RISC-V code from LLVM.</p>
<p>Previously in <a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html">part 1</a>
we looked at how we can use <a href="https://cc-perf.igalia.com">LNT</a> to
analyze performance gaps, then identified and fixed a missed <code class="language-plaintext highlighter-rouge">fmsub.d</code>
opportunity during instruction selection, giving a modest 1.77%
speedup on a SPEC CPU 2017 benchmark.</p>
<p>In this post we’ll be improving another SPEC benchmark by <strong>7%</strong> by
teaching the loop vectorizer to make smarter cost modelling
decisions. It involves a relatively non-trivial analysis, but thanks
to LLVM’s modular infrastructure we can do it in just a handful of
lines of code. Let’s get started.</p>
<h2 id="analysis">Analysis</h2>
<p>Just like last time, all fruitful performance work begins by analysing
some workloads. In the last post we had already run some comparisons
of SPEC CPU 2017 benchmarks on LNT, so we can return to those results
and pick another benchmark to focus on. Here’s one that’s 12% slower
than GCC:</p>
<p><img src="http://lukelau.me/assets/531.deepsjeng_r-before.png" alt="Screenshot of LNT showing the 531.deepsjeng_r benchmark being 12.14% slower on Clang vs GCC" /></p>
<p>531.deepsjeng_r is a <a href="https://en.wikipedia.org/wiki/Sjeng_(software)#Deep_Sjeng">chess
engine</a>
that tied first in the World Computer Chess Championships back
in 2009. It consists of a lot of bitwise arithmetic and complex loops,
since the state of the game is encoded in 64 element arrays: one
element for each square on the board. Unlike 508.namd_r from last
time, there’s no floating point arithmetic.</p>
<p>Drilling into the profile and its list of functions, right off the bat
we can see that one function is much slower on LLVM. On GCC
<code class="language-plaintext highlighter-rouge">qsearch(state_t*, int, int, int, int)</code> makes up 9.1% of the overall
cycles, but on LLVM it’s 16.1%. And if we click in on the function and
view the cumulative total of cycles spent in user mode, Clang takes
74.6 billion cycles to do what takes GCC only 37.7 billion cycles.</p>
<figure>
<img src="http://lukelau.me/assets/531.deepsjeng_r-total-cumulative-cycles.png" alt="Screenshot of Clang disassembly and GCC disassembly side by side, with inline total cumulative cycle annotations showing Clang taking 74.6 billion cycles and GCC taking 37.7" />
<figcaption>Left shows Clang taking 74.6 billion cycles, right shows GCC taking 37.7 billion.</figcaption>
</figure>
<p>So there’s probably something we can improve upon here, but it’s not
immediately obvious from staring at the disassembly. <code class="language-plaintext highlighter-rouge">qsearch</code> is a
pretty big function with a couple hundred instructions, so switching
to the CFG view gives us a better overview.</p>
<p>On LLVM’s side we see the offending loop that’s consuming so many
cycles: It’s long, vectorized, and completely if-predicated: there’s
no control flow inside the loop itself. This is typical of a loop
that’s been auto-vectorized by the loop vectorized. If you look at the
load and store instructions you can see that they are masked with the
<code class="language-plaintext highlighter-rouge">v0.t</code> operand, stemming from the original control flow that was
flattened.</p>
<p><img src="http://lukelau.me/assets/531.deepsjeng_r-disassembly-before.png" alt="Screenshot of the disassembly from Clang, showing a very hot block
with a lot of masked vector
instructions." /></p>
<p>But on the GCC side there’s no equivalent vectorized loop. The loop is
in there somewhere, but <strong>all the loops are still in their original
scalar form with the control flow intact</strong>. And if we look at the
edges coming from the loop headers, we can see that most of the time
it visits one or two basic blocks and then branches back up to the
header. Most of the blocks in the loop are completely cold.</p>
<p>Unfortunately the sources for deepsjeng aren’t open source so we can’t
share them in this post, but the very rough structure of the loop is
something like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">foo</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">a</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">bar</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">baz</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="n">qux</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">123</span><span class="p">;</span>
<span class="c1">// lots of work here...</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>For any given iteration, it’s statistically unlikely that we enter the
first if statement. It’s even more unlikely that the second if’s
condition is also true. And even more so for the third nested if where
we eventually have lots of work to compute.</p>
<p>In a scalar loop this doesn’t matter because if an if statement’s
condition is false, then we don’t execute the code inside it. We just
branch back to the start of the loop. But with a vectorized loop, we
execute every single instruction regardless of the condition.</p>
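<p>A back-of-the-envelope model makes the trade-off concrete. This is a sketch with invented numbers, not LLVM’s actual cost model: it ignores loop overhead and assumes a masked vector op costs the same to issue as its scalar counterpart.</p>

```python
# Rough work estimate for a predicated loop body.

def scalar_body_cost(n, p_exec, body_cost):
    # A scalar loop branches around the body, so it only pays for the
    # body on the iterations where the condition chain succeeds.
    return n * p_exec * body_cost

def vector_body_cost(n, vf, body_cost):
    # An if-converted vector loop issues the whole (masked) body once
    # per group of `vf` elements, whether or not any lane is active.
    return (n / vf) * body_cost

# Under these assumptions, vectorization only pays off when
# p_exec > 1 / vf.
```

<p>With an execution probability of 1/8 (like the innermost block above) and a vectorization factor of 4, the scalar loop does half the work of the vectorized one; flip the probability to 1/2 and the vectorized loop wins.</p>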
<p>This is the core of the performance gap that we’re seeing versus GCC:
Given that the majority of the work in this loop is so deeply nested
in the control flow, it would have been better to have not vectorized
it given that we need to if-convert it.</p>
<h1 id="cost-modelling">Cost modelling</h1>
<p>One of the hardest problems when making an optimizing compiler is to
know when an optimization is profitable. Some optimizations are a
double edged sword that can harm performance just as much as they can
improve it (if not more), and loop vectorization falls squarely into
this category. So rather than blindly applying optimizations at any
given opportunity, LLVM has detailed cost models for each target to
try and estimate how expensive or cheap a certain sequence of
instructions is, which it can then use to evaluate whether or not a
transform will be a net positive.</p>
<p>It’s hard to overstate the amount of effort in LLVM spent fine tuning
these cost models, applying various heuristics and approximations to
make sure different optimizations don’t shoot themselves in the
foot. In fact there are some optimizations like loop distribute that
are in-tree but disabled by default due to the difficulty in getting
the cost model right.</p>
<p>So naturally, we would expect that the loop vectorizer already has a
sophisticated solution for the problem we’re seeing in our analysis:
Given any predicated block that’s if-converted during vectorization,
we would expect the scalar cost for that block to be made slightly
cheaper because the scalar block may not always be executed. And the
less likely it is to be executed, the cheaper it should be — the
most deeply nested if block should be discounted more than the
outermost if block.</p>
<p>So how does the loop vectorizer handle this?</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">/// A helper function that returns how much we should divide the cost of a</span>
<span class="c1">/// predicated block by. Typically this is the reciprocal of the block</span>
<span class="c1">/// probability, i.e. if we return X we are assuming the predicated block will</span>
<span class="c1">/// execute once for every X iterations of the loop header so the block should</span>
<span class="c1">/// only contribute 1/X of its cost to the total cost calculation, but when</span>
<span class="c1">/// optimizing for code size it will just be 1 as code size costs don't depend</span>
<span class="c1">/// on execution probabilities.</span>
<span class="c1">///</span>
<span class="c1">/// TODO: We should use actual block probability here, if available. Currently,</span>
<span class="c1">/// we always assume predicated blocks have a 50% chance of executing.</span>
<span class="kr">inline</span> <span class="kt">unsigned</span>
<span class="nf">getPredBlockCostDivisor</span><span class="p">(</span><span class="n">TargetTransformInfo</span><span class="o">::</span><span class="n">TargetCostKind</span> <span class="n">CostKind</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">CostKind</span> <span class="o">==</span> <span class="n">TTI</span><span class="o">::</span><span class="n">TCK_CodeSize</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We’ve come across a load-bearing TODO here. Either the block is
executed or it’s not, so the cost model treats it as a fifty/fifty chance.</p>
<p>On its own this hardcoded probability doesn’t seem like an
unreasonable guess. But whilst 50% may be an accurate estimate as to
whether or not a <strong>branch</strong> will be taken, it’s an inaccurate estimate
as to whether or not a <strong>block</strong> will be executed. Assuming that a
branch has a 1/2 chance of being taken, the most deeply nested block
in our example ends up having a <code class="language-plaintext highlighter-rouge">1/2 * 1/2 * 1/2 = 1/8</code> chance of
being executed.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">foo</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">a</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// 1/2 chance of being executed</span>
<span class="k">if</span> <span class="p">(</span><span class="n">bar</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// 1/4 chance of being executed</span>
<span class="k">if</span> <span class="p">(</span><span class="n">baz</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// 1/8 chance of being executed</span>
<span class="c1">// ...</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
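<p>Under that independence assumption, the divisor should grow with nesting depth rather than stay fixed at 2. A quick Python sketch of the arithmetic (it mirrors the reasoning above, not the eventual LLVM patch):</p>

```python
# If each enclosing branch is taken with probability 1/2 (independently),
# a block nested under `depth` if-statements runs with probability
# (1/2) ** depth, so its scalar cost should be divided by 2 ** depth
# instead of the hardcoded 2.

def pred_block_cost_divisor(depth: int) -> int:
    return 2 ** depth

def effective_scalar_cost(raw_cost: float, depth: int) -> float:
    return raw_cost / pred_block_cost_divisor(depth)
```

<p>For the innermost block of the example, depth 3 gives a divisor of 8, matching the 1/8 execution probability worked out above.</p>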
<p>The fix to get the loop vectorizer to not unprofitably vectorize this
loop will be to teach <code class="language-plaintext highlighter-rouge">getPredBlockCostDivisor</code> to take into account
control flow between blocks.</p>
<p>It’s worth mentioning that the fact a hardcoded constant managed to
work well enough up until this point is the sign of a good trade-off:
1% of the effort for 90% of the benefit. A patch can go off the
rails very easily by trying to implement too much in one go, so
deferring the more complex cost modelling here till later was an
astute choice. Incremental development is key to making progress
upstream.</p>
<h2 id="vplan-cost-modeling">VPlan cost modeling</h2>
<p>To get a better picture of how the loop vectorizer is calculating the
cost for each possible loop, let’s start with a simplified LLVM IR reproducer:</p>
<div class="language-llvm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">; for (int i = 0; i < 1024; i++)</span>
<span class="c1">; if (c0)</span>
<span class="c1">; if (c1)</span>
<span class="c1">; p1[p0[i]] = 0; // extra work to increase the cost in the predicated block</span>
<span class="k">define</span> <span class="kt">void</span> <span class="vg">@nested</span><span class="p">(</span><span class="kt">ptr</span> <span class="k">noalias</span> <span class="nv">%p0</span><span class="p">,</span> <span class="kt">ptr</span> <span class="k">noalias</span> <span class="nv">%p1</span><span class="p">,</span> <span class="kt">i1</span> <span class="nv">%c0</span><span class="p">,</span> <span class="kt">i1</span> <span class="nv">%c1</span><span class="p">)</span> <span class="p">{</span>
<span class="nl">entry:</span>
<span class="k">br</span> <span class="kt">label</span> <span class="nv">%loop</span>
<span class="nl">loop:</span>
<span class="nv">%iv</span> <span class="p">=</span> <span class="k">phi</span> <span class="kt">i32</span> <span class="p">[</span> <span class="m">0</span><span class="p">,</span> <span class="nv">%entry</span> <span class="p">],</span> <span class="p">[</span> <span class="nv">%iv.next</span><span class="p">,</span> <span class="nv">%latch</span> <span class="p">]</span>
<span class="k">br</span> <span class="kt">i1</span> <span class="nv">%c0</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%then.0</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%latch</span>
<span class="nl">then.0:</span>
<span class="k">br</span> <span class="kt">i1</span> <span class="nv">%c1</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%then.1</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%latch</span>
<span class="nl">then.1:</span>
<span class="nv">%gep0</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="kt">i32</span><span class="p">,</span> <span class="kt">ptr</span> <span class="nv">%p0</span><span class="p">,</span> <span class="kt">i32</span> <span class="nv">%iv</span>
<span class="nv">%x</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">i32</span><span class="p">,</span> <span class="kt">ptr</span> <span class="nv">%gep0</span>
<span class="nv">%gep1</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="kt">i32</span><span class="p">,</span> <span class="kt">ptr</span> <span class="nv">%p1</span><span class="p">,</span> <span class="kt">i32</span> <span class="nv">%x</span>
<span class="k">store</span> <span class="kt">i32</span> <span class="m">0</span><span class="p">,</span> <span class="kt">ptr</span> <span class="nv">%gep1</span>
<span class="k">br</span> <span class="kt">label</span> <span class="nv">%latch</span>
<span class="nl">latch:</span>
<span class="nv">%iv.next</span> <span class="p">=</span> <span class="k">add</span> <span class="kt">i32</span> <span class="nv">%iv</span><span class="p">,</span> <span class="m">1</span>
<span class="nv">%done</span> <span class="p">=</span> <span class="k">icmp</span> <span class="k">eq</span> <span class="kt">i32</span> <span class="nv">%iv.next</span><span class="p">,</span> <span class="m">1024</span>
<span class="k">br</span> <span class="kt">i1</span> <span class="nv">%done</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%exit</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%loop</span>
<span class="nl">exit:</span>
<span class="k">ret</span> <span class="kt">void</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We can run <code class="language-plaintext highlighter-rouge">opt -p loop-vectorize -debug</code> on this example to see how the loop
vectorizer decides if it’s profitable to vectorize the loop or not:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ opt -p loop-vectorize -mtriple riscv64 -mattr=+v nested.ll -disable-output -debug
...
LV: Found an estimated cost of 0 for VF 1 For instruction: %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %c0, label %then.0, label %latch
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %c1, label %then.1, label %latch
LV: Found an estimated cost of 0 for VF 1 For instruction: %gep0 = getelementptr i32, ptr %p0, i32 %iv
LV: Found an estimated cost of 1 for VF 1 For instruction: %x = load i32, ptr %gep0, align 4
LV: Found an estimated cost of 0 for VF 1 For instruction: %gep1 = getelementptr i32, ptr %p1, i32 %x
LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 0, ptr %gep1, align 4
LV: Found an estimated cost of 0 for VF 1 For instruction: br label %latch
LV: Found an estimated cost of 1 for VF 1 For instruction: %iv.next = add i32 %iv, 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %done = icmp eq i32 %iv.next, 1024
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %done, label %exit, label %loop
LV: Scalar loop costs: 3.
...
Cost of 1 for VF vscale x 4: induction instruction %iv.next = add i32 %iv, 1
Cost of 0 for VF vscale x 4: induction instruction %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
Cost of 1 for VF vscale x 4: exit condition instruction %done = icmp eq i32 %iv.next, 1024
Cost of 0 for VF vscale x 4: EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
Cost of 0 for VF vscale x 4: EXPLICIT-VECTOR-LENGTH-BASED-IV-PHI vp<%5> = phi ir<0>, vp<%index.evl.next>
Cost of 0 for VF vscale x 4: EMIT-SCALAR vp<%avl> = phi [ ir<1024>, vector.ph ], [ vp<%avl.next>, vector.body ]
Cost of 1 for VF vscale x 4: EMIT-SCALAR vp<%6> = EXPLICIT-VECTOR-LENGTH vp<%avl>
Cost of 0 for VF vscale x 4: vp<%7> = SCALAR-STEPS vp<%5>, ir<1>, vp<%6>
Cost of 0 for VF vscale x 4: CLONE ir<%gep0> = getelementptr ir<%p0>, vp<%7>
Cost of 0 for VF vscale x 4: vp<%8> = vector-pointer ir<%gep0>
Cost of 2 for VF vscale x 4: WIDEN ir<%x> = vp.load vp<%8>, vp<%6>, vp<%3>
Cost of 0 for VF vscale x 4: WIDEN-GEP Inv[Var] ir<%gep1> = getelementptr ir<%p1>, ir<%x>
Cost of 12 for VF vscale x 4: WIDEN vp.store ir<%gep1>, ir<0>, vp<%6>, vp<%3>
Cost of 0 for VF vscale x 4: EMIT vp<%index.evl.next> = add nuw vp<%6>, vp<%5>
Cost of 0 for VF vscale x 4: EMIT vp<%avl.next> = sub nuw vp<%avl>, vp<%6>
Cost of 0 for VF vscale x 4: EMIT vp<%index.next> = add nuw vp<%4>, vp<%0>
Cost of 0 for VF vscale x 4: EMIT branch-on-count vp<%index.next>, vp<%1>
Cost of 0 for VF vscale x 4: vector loop backedge
Cost of 0 for VF vscale x 4: EMIT-SCALAR vp<%bc.resume.val> = phi [ ir<0>, ir-bb<entry> ]
Cost of 0 for VF vscale x 4: IR %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ] (extra operand: vp<%bc.resume.val> from scalar.ph)
Cost of 0 for VF vscale x 4: EMIT vp<%3> = logical-and ir<%c0>, ir<%c1>
Cost for VF vscale x 4: 17 (Estimated cost per lane: 2.1)
...
LV: Selecting VF: vscale x 4.
LV: Minimum required TC for runtime checks to be profitable:0
LV: Interleaving is not beneficial.
LV: Found a vectorizable loop (vscale x 4) in nested.ll
LV: Vectorizing: innermost loop.
LEV: Unable to vectorize epilogue because no epilogue is allowed.
LV: Loop does not require scalar epilogue
LV: Loop does not require scalar epilogue
Executing best plan with VF=vscale x 4, UF=1
</code></pre></div></div>
<p>First we see it work out the cost of the original scalar loop, or as
the vectorizer sees it, the loop with a vectorization factor (VF)
of 1. It goes through each instruction calling into
TargetTransformInfo, and arrives at a total scalar cost of 3. You
might have noticed, though, that manually summing up the individual
instruction costs gives a total of 4. The difference is that the load
and store instructions belong to the predicated
<code class="language-plaintext highlighter-rouge">then.1</code> block, so their cost is divided by 2 by
<code class="language-plaintext highlighter-rouge">getPredBlockCostDivisor</code>.</p>
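<p>To make the arithmetic concrete, here is a simplified model of that costing scheme (a sketch using the instruction costs from the debug output above, not the vectorizer’s actual code): instruction costs are summed per block, and a predicated block’s total is then divided by the divisor using integer arithmetic.</p>

```cpp
#include <cassert>

// Simplified model (not LLVM's actual code): sum instruction costs per
// block, then divide a predicated block's total by its cost divisor
// using plain integer division.
struct Block {
  int instructionCostSum; // sum of TargetTransformInfo costs in the block
  int costDivisor;        // 2 for a predicated block, 1 otherwise
};

int scalarLoopCost(const Block *blocks, int numBlocks) {
  int total = 0;
  for (int i = 0; i < numBlocks; ++i)
    total += blocks[i].instructionCostSum / blocks[i].costDivisor;
  return total;
}
```

<p>With the costs above (add and icmp in the unpredicated blocks, load and store in the predicated <code class="language-plaintext highlighter-rouge">then.1</code> block), this gives 2 + 2/2 = 3, matching the reported scalar loop cost.</p>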
<p>For the vectorized loop, the loop vectorizer uses
<a href="https://llvm.org/docs/VectorizationPlan.html">VPlan</a> to cost the one
plan for a range of different VFs<sup id="fnref:scalar-vplan"><a href="http://lukelau.me/2026/01/26/closing-the-gap-pt2.html#fn:scalar-vplan" class="footnote" rel="footnote">1</a></sup>. VPlan is an IR
specific to the loop vectorizer to help represent various
vectorization strategies, which is why you see all the <code class="language-plaintext highlighter-rouge">EMIT</code> and
<code class="language-plaintext highlighter-rouge">WIDEN</code> “recipes” in the output. It calculates a total cost for the
loop and divides it by the estimated number of lanes — we’re working
with scalable vectors on RISC-V so the target needs to make an
estimate of what <code class="language-plaintext highlighter-rouge">vscale</code> is — and arrives at 2.1 per lane. There’s
no predication discount applied here because it’s a vectorized
loop. 2.1 is cheaper than 3, so it ultimately picks the vectorized
loop.</p>
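<p>As a rough sketch of that comparison (the vscale estimate of 2 here is an assumption for a 128-bit RISC-V vector configuration; it is not printed in the output above):</p>

```cpp
#include <cassert>

// Sketch: the VPlan cost is divided by the estimated number of lanes
// (vscale estimate times the known VF multiplier) before being
// compared against the scalar loop cost.
double costPerLane(int vplanCost, int vscaleEstimate, int vfMultiplier) {
  return double(vplanCost) / (vscaleEstimate * vfMultiplier);
}
```

<p>With a VPlan cost of 17 for VF vscale x 4 and an assumed vscale of 2, this gives 17 / 8 = 2.125 per lane, which beats the scalar cost of 3.</p>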
<h2 id="blockfrequencyinfo">BlockFrequencyInfo</h2>
<p>Computing an accurate probability that a given block will be executed
is a non-trivial task, but thankfully LLVM already has an analysis we
can use for this called BlockFrequencyInfo.</p>
<p>BlockFrequencyInfo computes how often a block can be expected to
execute relative to other blocks in a function. It in turn uses
another analysis called BranchProbabilityInfo to work out how likely a
branch to a specific block is going to be taken. And because
BranchProbabilityInfo uses profiling information when available, it
can give you much more accurate block frequencies when compiling with
<a href="https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization">PGO</a>. Otherwise
it will fall back to guessing the probability of a branch being taken,
which is just 50/50 a lot of the time, but sometimes influenced by
interesting heuristics too: like the probability of <code class="language-plaintext highlighter-rouge">icmp eq i32 %x,
0</code> is 0.375 instead of 0.5, and floats have a near zero chance of
being NaN.</p>
<p>Plugging BlockFrequencyInfo into the loop vectorizer is
straightforward: all we need to do is tell the pass manager that we
want to access BlockFrequencyInfo from LoopVectorizePass:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PreservedAnalyses</span> <span class="n">LoopVectorizePass</span><span class="o">::</span><span class="n">run</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">,</span>
<span class="n">FunctionAnalysisManager</span> <span class="o">&</span><span class="n">AM</span><span class="p">)</span> <span class="p">{</span>
<span class="p">...</span>
<span class="n">BFI</span> <span class="o">=</span> <span class="o">&</span><span class="n">AM</span><span class="p">.</span><span class="n">getResult</span><span class="o"><</span><span class="n">BlockFrequencyAnalysis</span><span class="o">></span><span class="p">(</span><span class="n">F</span><span class="p">);</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>(BlockFrequencyAnalysis is the pass that computes the analysis result BlockFrequencyInfo, if you’re wondering why the names are different)</p>
<p>Then we can use it to look up the relative frequency of a given
block and work out the probability of it being executed in the loop:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span> <span class="n">LoopVectorizationCostModel</span><span class="o">::</span><span class="n">getPredBlockCostDivisor</span><span class="p">(</span>
<span class="n">TargetTransformInfo</span><span class="o">::</span><span class="n">TargetCostKind</span> <span class="n">CostKind</span><span class="p">,</span> <span class="k">const</span> <span class="n">BasicBlock</span> <span class="o">*</span><span class="n">BB</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">CostKind</span> <span class="o">==</span> <span class="n">TTI</span><span class="o">::</span><span class="n">TCK_CodeSize</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">HeaderFreq</span> <span class="o">=</span>
<span class="n">BFI</span><span class="o">-></span><span class="n">getBlockFreq</span><span class="p">(</span><span class="n">TheLoop</span><span class="o">-></span><span class="n">getHeader</span><span class="p">()).</span><span class="n">getFrequency</span><span class="p">();</span>
<span class="kt">uint64_t</span> <span class="n">BBFreq</span> <span class="o">=</span> <span class="n">BFI</span><span class="o">-></span><span class="n">getBlockFreq</span><span class="p">(</span><span class="n">BB</span><span class="p">).</span><span class="n">getFrequency</span><span class="p">();</span>
<span class="k">return</span> <span class="n">HeaderFreq</span> <span class="o">/</span> <span class="n">BBFreq</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The frequencies returned from BlockFrequencyInfo are relative to
the entry block of a function. So if a block has a frequency of 50 and
the entry block has a frequency of 100, then you can expect that block
to execute 50 times for every 100 times the entry block is executed.</p>
<p>You can use this to work out probabilities of a block being taken in a
function, so in this example that block has a 50/100 = 50% chance of
being executed every time the function is executed. However this only
works in the case that the CFG has no loops: otherwise a block may be
executed more times than the entry block and we’d end up with
probabilities greater than 100%.</p>
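<p>A quick sketch of that pitfall (the frequencies here are made-up illustrative numbers):</p>

```cpp
#include <cassert>

// Entry-relative frequency only behaves like a probability in a
// loop-free CFG: a block inside a loop can execute far more often than
// the function entry, pushing the ratio above 1.
double entryRelativeProbability(double blockFreq, double entryFreq) {
  return blockFreq / entryFreq;
}
```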
<p>If we want to calculate the probability of a block being executed
<em>inside a loop</em> though, we can sidestep this problem, since the loop vectorizer
currently only vectorizes inner-most loops<sup id="fnref:vplan-native"><a href="http://lukelau.me/2026/01/26/closing-the-gap-pt2.html#fn:vplan-native" class="footnote" rel="footnote">2</a></sup>, i.e. loops
that contain no other loops.</p>
<p>We can consider the frequencies of each block in the loop relative to
the frequency of the header block. To give a brief <a href="https://llvm.org/docs/LoopTerminology.html#id7">loop
terminology</a> recap,
the header is the first block inside the loop body which dominates all
other blocks in the loop, and is the destination of all backedges. So
the header is guaranteed to have a frequency greater than or equal to
any other block in the loop — this invariant is important as we’ll
see later.</p>
<p><img src="https://llvm.org/docs/_images/loop-terminology.svg" alt="A diagram showing off terminology for different parts of a loop" /></p>
<p>Then to calculate the probability of a block in a loop being executed,
we divide the block frequency by the header frequency. To work out how
much we should divide the cost of the scalar block by, we return the
inverse of that.</p>
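<p>In other words, the divisor is just the ratio of the two frequencies. A minimal sketch with hypothetical frequencies:</p>

```cpp
#include <cassert>
#include <cstdint>

// The probability of the block executing per loop iteration is
// bbFreq / headerFreq, and the cost divisor is its inverse. Because the
// header's frequency is always >= any block's in the loop, this integer
// division always yields a divisor >= 1.
uint64_t predBlockCostDivisor(uint64_t headerFreq, uint64_t bbFreq) {
  return headerFreq / bbFreq;
}
```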
<p>Trying out this change on our sample loop, first we’ll see the debug
output from BlockFrequencyInfo as it’s computed:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ opt -p loop-vectorize -mtriple riscv64 -mattr=+v nested.ll -disable-output -debug
...
block-frequency-info: nested
- entry: float = 1.0, int = 562949953421312
- loop: float = 32.0, int = 18014398509481984
- then.0: float = 16.0, int = 9007199254740992
- then.1: float = 8.0, int = 4503599627370496
- latch: float = 32.0, int = 18014398509481984
- exit: float = 1.0, int = 562949953421312
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">loop</code> is the header block and <code class="language-plaintext highlighter-rouge">then.1</code> is the nested if block, and
with BlockFrequencyInfo’s frequency we get a probability of 8/32 =
0.25. So we would expect <code class="language-plaintext highlighter-rouge">then.1</code>’s scalar cost to be divided by 4:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
LV: Found an estimated cost of 0 for VF 1 For instruction: %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %c0, label %then.0, label %latch
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %c1, label %then.1, label %latch
LV: Found an estimated cost of 0 for VF 1 For instruction: %gep0 = getelementptr i32, ptr %p0, i32 %iv
LV: Found an estimated cost of 1 for VF 1 For instruction: %x = load i32, ptr %gep0, align 4
LV: Found an estimated cost of 0 for VF 1 For instruction: %gep1 = getelementptr i32, ptr %p1, i32 %x
LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 0, ptr %gep1, align 4
LV: Found an estimated cost of 0 for VF 1 For instruction: br label %latch
LV: Found an estimated cost of 1 for VF 1 For instruction: %iv.next = add i32 %iv, 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %done = icmp eq i32 %iv.next, 1024
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %done, label %exit, label %loop
LV: Scalar loop costs: 2.
...
Cost for VF vscale x 4: 17 (Estimated cost per lane: 2.1)
...
LV: Selecting VF: 1.
LV: Vectorization is possible but not beneficial.
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">then.1</code>’s scalar cost is now 2/4 = 0, so the total cost of the
scalar loop is now 2 and the loop vectorizer no longer decides to
vectorize. If we try this out on 531.deepsjeng_r, we can see that it
no longer vectorizes that loop in <code class="language-plaintext highlighter-rouge">qsearch</code> either. Success!</p>
<p><img src="http://lukelau.me/assets/531.deepsjeng_r-after.png" alt="Screenshot of LNT showing a 6.82% improvement on 531.deepsjeng_r" /></p>
<p>Running it again on LNT showed a ~7% speedup in execution time. Not
quite as fast as GCC yet, but a welcome improvement for only a handful
of lines of code.</p>
<h2 id="upstreaming">Upstreaming</h2>
<p>Now that we know the fix we want to land, we can start to think about
how we want to upstream this into LLVM.</p>
<p>If we run <code class="language-plaintext highlighter-rouge">llvm-lit --update-tests
llvm/test/Transforms/LoopVectorize</code>, we actually get quite a few
unexpected test changes. One of the side effects of using
BlockFrequencyInfo is that <a href="https://github.com/llvm/llvm-project/pull/160449">tail folded loops no longer discount the
scalar loop if it wasn’t predicated to begin
with</a>. A tail folded
loop is a loop where the scalar epilogue is folded into the vector loop itself by predicating the vector operations:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// non-tail folded loop:</span>
<span class="c1">// process as many VF sized vectors that fit in n</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span> <span class="o">-</span> <span class="p">(</span><span class="n">n</span> <span class="o">%</span> <span class="n">VF</span><span class="p">);</span> <span class="n">i</span> <span class="o">+=</span> <span class="n">VF</span><span class="p">)</span>
<span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">..</span><span class="n">i</span><span class="o">+</span><span class="n">VF</span><span class="p">]</span> <span class="o">=</span> <span class="n">y</span><span class="p">[</span><span class="n">i</span><span class="p">..</span><span class="n">i</span><span class="o">+</span><span class="n">VF</span><span class="p">];</span>
<span class="c1">// process the remaining n % VF scalar elements</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">n</span> <span class="o">-</span> <span class="p">(</span><span class="n">n</span> <span class="o">%</span> <span class="n">VF</span><span class="p">);</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">y</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
</code></pre></div></div>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// tail folded loop:</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="n">VF</span><span class="p">)</span>
<span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">..</span><span class="n">i</span><span class="o">+</span><span class="n">VF</span><span class="p">]</span> <span class="o">=</span> <span class="n">y</span><span class="p">[</span><span class="n">i</span><span class="p">..</span><span class="n">i</span><span class="o">+</span><span class="n">VF</span><span class="p">]</span> <span class="n">mask</span><span class="o">=</span><span class="p">[</span><span class="n">i</span><span class="o"><</span><span class="n">n</span><span class="p">,</span> <span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="o"><</span><span class="n">n</span><span class="p">,</span> <span class="p">...,</span> <span class="n">i</span><span class="o">+</span><span class="n">VF</span><span class="o">-</span><span class="mi">1</span><span class="o"><</span><span class="n">n</span><span class="p">];</span>
</code></pre></div></div>
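<p>The masked store above can be emulated in scalar code to see what the predication buys us (a sketch assuming a fixed VF of 4):</p>

```cpp
#include <cassert>

// Scalar emulation of a tail-folded copy loop: each VF-wide group is
// processed under a per-lane mask i + k < n, so the tail elements need
// no separate scalar epilogue.
void tailFoldedCopy(const int *y, int *x, int n) {
  const int VF = 4; // assumed vectorization factor
  for (int i = 0; i < n; i += VF)
    for (int k = 0; k < VF; ++k)
      if (i + k < n) // the lane mask
        x[i + k] = y[i + k];
}
```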
<p>However because this block is technically predicated due to the mask
on the vector instructions, the loop vectorizer applied
<code class="language-plaintext highlighter-rouge">getPredBlockCostDivisor</code> to the scalar loop cost even if the original
scalar loop had no control flow in its body. With BlockFrequencyInfo,
a block with no control flow around it has an execution probability of
1, so the scalar loop cost is no longer discounted more than it should
be. I split off and landed this change separately,
<a href="http://lukelau.me/2024/07/17/how-to-land-a-change-to-llvm-in-20-easy-patches.html">since it makes the test changes easier to review</a>.</p>
<p>Now that the remaining changes in <code class="language-plaintext highlighter-rouge">llvm/test/Transforms/LoopVectorize</code>
looked more contained, I was almost ready to open a pull request. I
just wanted to quickly kick the tyres on
<a href="https://github.com/llvm/llvm-test-suite">llvm-test-suite</a> with a few
other targets, since this wasn’t a RISC-V specific change. The plan
was to quickly collect some stats on how many loops were vectorized,
check for any anomalies when compared to beforehand, and then be on
our way:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd llvm-test-suite
$ ninja -C build
...
[222/7278] Building C object External/...nchspec/CPU/500.perlbench_r/src/pp.c.o
FAILED: External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.c.o
/root/llvm-test-suite/build.x86_64-ReleaseLTO-a/tools/timeit --summary External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.c.o.time /root/llvm-project/build/bin/clang -DDOUBLE_SLASHES_SPECIAL=0 -DNDEBUG -DPERL_CORE -DSPEC -DSPEC_AUTO_BYTEORDER=0x12345678 -DSPEC_AUTO_SUPPRESS_OPENMP -DSPEC_CPU -DSPEC_LINUX -DSPEC_LINUX_X64 -DSPEC_LP64 -DSPEC_SUPPRESS_OPENMP -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGE_FILES -I/root/cpu2017/benchspec/CPU/500.perlbench_r/src -I/root/cpu2017/benchspec/CPU/500.perlbench_r/src/dist/IO -I/root/cpu2017/benchspec/CPU/500.perlbench_r/src/cpan/Time-HiRes -I/root/cpu2017/benchspec/CPU/500.perlbench_r/src/cpan/HTML-Parser -I/root/cpu2017/benchspec/CPU/500.perlbench_r/src/ext/re -I/root/cpu2017/benchspec/CPU/500.perlbench_r/src/specrand -march=x86-64-v3 -save-temps=obj -O3 -fomit-frame-pointer -flto -DNDEBUG -w -Werror=date-time -save-stats=obj -save-stats=obj -fno-strict-aliasing -MD -MT External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.c.o -MF External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.c.o.d -o External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.c.o -c /root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.c
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /root/llvm-project/build/bin/clang-19 -cc1 -triple x86_64-unknown-linux-gnu -O3 -emit-llvm-bc -flto=full -flto-unit -save-temps=obj -disable-free -clear-ast-before-backend -main-file-name pp.c -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=none -relaxed-aliasing -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64-v3 -debugger-tuning=gdb -fdebug-compilation-dir=/root/llvm-test-suite/build.x86_64-ReleaseLTO-a -fcoverage-compilation-dir=/root/llvm-test-suite/build.x86_64-ReleaseLTO-a -resource-dir /root/llvm-project/build/lib/clang/23 -Werror=date-time -w -ferror-limit 19 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -vectorize-loops -vectorize-slp -stats-file=External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.stats -faddrsig -fdwarf2-cfi-asm -o External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.c.o -x ir External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.bc
1. Optimizer
2. Running pass "function<eager-inv>(float2int,lower-constant-intrinsics,chr,loop(loop-rotate<header-duplication;prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-arithmetic;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O3>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;switch-to-arithmetic;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "External/SPEC/CINT2017rate/500.perlbench_r/CMakeFiles/500.perlbench_r.dir/root/cpu2017/benchspec/CPU/500.perlbench_r/src/pp.bc"
3. Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "Perl_pp_coreargs"
#0 0x0000556ff93ab158 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/root/llvm-project/build/bin/clang-19+0x2d5c158)
#1 0x0000556ff93a8835 llvm::sys::RunSignalHandlers() (/root/llvm-project/build/bin/clang-19+0x2d59835)
#2 0x0000556ff93abf01 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
#3 0x00007f305ce49df0 (/lib/x86_64-linux-gnu/libc.so.6+0x3fdf0)
#4 0x0000556ffaa0dbfb llvm::LoopVectorizationCostModel::expectedCost(llvm::ElementCount) (/root/llvm-project/build/bin/clang-19+0x43bebfb)
#5 0x0000556ffaa22a0d llvm::LoopVectorizationPlanner::computeBestVF() (/root/llvm-project/build/bin/clang-19+0x43d3a0d)
#6 0x0000556ffaa36f3b llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/root/llvm-project/build/bin/clang-19+0x43e7f3b)
#7 0x0000556ffaa413eb llvm::LoopVectorizePass::runImpl(llvm::Function&) (/root/llvm-project/build/bin/clang-19+0x43f23eb)
...
...
</code></pre></div></div>
<p>A crash when building for X86. No assertion message, but a backtrace
that points to the loop vectorizer cost model. Unfortunately this did
not turn out to be simple to debug and instead turned into a whole
other ordeal, so I’ll leave the details of that rabbit hole to the
next post. But in the meantime, here are some hints if you want to
guess what went wrong:</p>
<ul>
<li>The crash stems from a SIGFPE signal</li>
<li>It only occurs when building on X86. Building on AArch64 is
unaffected, even when cross-compiling to X86</li>
<li>It only occurs with LTO</li>
</ul>
<p>Hopefully this also gives a bit of insight into the type of upstream
work that we carry out at <a href="https://www.igalia.com">Igalia</a>. If you
have an LLVM or RISC-V project that we could help with, <a href="mailto:[email protected]">feel free to
reach out</a>.</p>
<div class="footnotes">
<ol>
<li id="fn:scalar-vplan">
<p>The scalar loop is also modeled in VPlan, but
currently costed with the legacy cost model and not the VPlan
itself. This is another <a href="https://github.com/llvm/llvm-project/blob/a51eab9492c9d79f5717975bfa659d91e27985a3/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L7182">load bearing TODO</a>. <a href="http://lukelau.me/2026/01/26/closing-the-gap-pt2.html#fnref:scalar-vplan" class="reversefootnote">↩</a></p>
</li>
<li id="fn:vplan-native">
<p>Whilst not enabled by default, there is experimental
support for outer loop vectorization in the VPlan native path. <a href="http://lukelau.me/2026/01/26/closing-the-gap-pt2.html#fnref:vplan-native" class="reversefootnote">↩</a></p>
</li>
</ol>
</div> Luke Lauhttp://lukelau.me/Igalia Compilers Team: Legacy RegExp features in JavaScripthttps://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/2026-01-20T00:00:00+00:00
<p>In June 2025, I joined the <a href="https://www.igalia.com/coding-experience/">Igalia Coding Experience</a> program. My role was to implement the TC39 proposal <a href="https://github.com/tc39/proposal-regexp-legacy-features">Legacy RegExp Features</a> in SpiderMonkey, the JavaScript engine in Mozilla Firefox. This wasn't my first proposal implementation. I'd already implemented the <a href="https://tc39.es/proposal-is-error/#sec-fundamental-objects">Error.isError</a> and <a href="https://spidermonkey.dev/blog/2025/03/05/iterator-range.html">Iterator.range</a> TC39 proposals in SpiderMonkey, but implementing the Legacy RegExp Features proposal involved delving deeper into the Mozilla codebase, and new challenges for me.</p>
<p>To begin with, I created an implementation plan with a timeline of how I was going to approach the proposal. Additionally, I added links to the codebase where I thought I was going to make changes as per the specification, which helped me have a clear starting point and path for integrating the feature. It also meant I could get feedback from SpiderMonkey developers before actually beginning the implementation.</p>
<p>The Legacy RegExp features proposal disables legacy static properties and RegExp.prototype.compile for instances of proper subclasses of RegExp as well as for cross-realm regexps.</p>
<p>The following operations are modified in SpiderMonkey:</p>
<h3 id="regexp-prototype-compile-pattern-flags" tabindex="-1">RegExp.prototype.compile(pattern, flags) <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p>This method reinitializes an existing RegExp object with a new pattern and/or flags. It modifies the RegExp object in place rather than creating a new one.</p>
<p><strong>Modification:</strong> The proposal modifies <code>RegExp.prototype.compile</code> to throw errors for objects that are not direct instances of the RegExp as well as for cross-realm mismatches. The <code>compile()</code> method initializes a RegExp object similar to the way a RegExp literal is created, bypassing any preprocessing of the pattern that might be done by a RegExp subclass's constructor, and potentially breaking a subclass's custom "exec" method. Thus, compile is disallowed for subclasses. It is now forbidden for a RegExp compile method to be applied to a RegExp object belonging to a different realm, as this would typically result in static properties of the incorrect realm being updated.</p>
<p>Example of newly restricted behaviour:</p>
<pre class="language-bash" tabindex="0"><code class="language-bash"><span class="token punctuation">(</span>base<span class="token punctuation">)</span> $ ./mach run<br /> <span class="token number">0</span>:00.29 /Users/default/firefox/obj-aarch64-apple-darwin25.2.0/dist/bin/js<br />js<span class="token operator">></span> <span class="token builtin class-name">let</span> g <span class="token operator">=</span> newGlobal<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br />js<span class="token operator">></span> <span class="token builtin class-name">let</span> re <span class="token operator">=</span> g.RegExp<span class="token punctuation">(</span><span class="token string">"x"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br />js<span class="token operator">></span> RegExp.prototype.compile.call<span class="token punctuation">(</span>re<span class="token punctuation">)</span><span class="token punctuation">;</span><br />typein:3:26 TypeError: RegExp operation not permitted on object from different realm<br />Stack:<br /> @typein:3:26<br />js<span class="token operator">></span></code></pre>
<p>To explain each line of the JavaScript code in detail:</p>
<ul>
<li>
<p><code>let g</code> = <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/shell/js.cpp#7449"><code>newGlobal()</code></a> creates a new JavaScript global object in SpiderMonkey, similar to opening a new window in a browser. Each global object has its own realm.
A realm is a JavaScript execution context that contains its own set of global objects and built-in functions. Every object in SpiderMonkey has a realm pointer which identifies which realm it belongs to.</p>
</li>
<li>
<p><code>let re = g.RegExp("x")</code> creates a new RegExp object from <code>g</code>'s realm, with a distinct instance of the RegExp constructor. Although the object behaves like one created from <code>RegExp("x")</code>, the two are not wholly compatible with one another.</p>
</li>
<li>
<p><code>RegExp.prototype.compile.call(re)</code> invokes the <a href="https://searchfox.org/firefox-main/rev/e1eada69e2ddd86a398ccb141dcbf772254162eb/js/src/builtin/RegExp.cpp"><code>compile()</code></a> method with the regexp created above, which belongs to the realm returned by <code>newGlobal()</code>. Per <a href="https://github.com/tc39/proposal-regexp-legacy-features?tab=readme-ov-file#regexpprototypecompile--pattern-flags-">step 5</a> of the modified <code>RegExp.prototype.compile()</code> algorithm in the proposal, this results in a <code>TypeError</code> exception being thrown.</p>
</li>
</ul>
<p>Initially, I added my changes in <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/builtin/RegExp.cpp#582"><code>regexp_compile_impl()</code></a>, but when testing with <code>./mach try auto</code>, the feature failed test262 cross-realm tests when run with the <code>--ion-eager</code> and <code>--more-compartments</code> flags. Debug output showed that when invoking <code>RegExp.prototype.compile(re)</code>, both the receiver (<code>this</code>) of the <code>RegExp.prototype.compile()</code> method and the RegExp object appeared to be in the same realm, even though they weren’t. In other words, the cross-realm check was passing when it should have been failing, according to the test expectations.</p>
<p>By the time execution reached <code>regexp_compile()</code>, the <code>CallNonGenericMethod&lt;IsRegExpObject, regexp_compile_impl&gt;</code> wrapper had already processed the "receiver" or "this" of the compile method. According to the <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/public/CallNonGenericMethod.h#88-100">CallNonGenericMethod documentation</a>, if <code>args.thisv()</code> is not of the correct type, it will attempt to unwrap <code>this</code> and, if successful, call the implementation function on the unwrapped <code>this</code>. For a bit of context on this, SpiderMonkey has a concept of <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/public/Wrapper.h#120-133">Wrapper</a> objects, which decorate an object in a sort of proxy membrane to provide security boundary enforcement. For instance, ensuring that a method can be invoked or a field can be written to from the presently entered compartment. Unwrapping an object means removing that proxy membrane, to access the actual object, similar to how you’d unwrap a gift. This can be done using <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/proxy/Wrapper.cpp#353"><code>js::CheckedUnwrapStatic()</code></a>.</p>
<p>With <code>--more-compartments</code>, <code>CallNonGenericMethod</code> in <code>regexp_compile()</code> was automatically unwrapping cross-compartment proxies through <code>CallMethodIfWrapped</code> before calling <code>regexp_compile_impl()</code>.</p>
<p>This unwrapping process also switched the JSContext to the target object's realm. This meant that by the time my realm checks executed in <code>regexp_compile_impl()</code>, both <code>cx->realm()</code> and the RegExp object's realm pointed to the same realm (the object's home realm), making them appear equal even in genuine cross-realm call scenarios where the original call came from a different realm.</p>
<p>So I moved the same-realm testing and [[LegacyFeaturesEnabled]] bit testing to <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/builtin/RegExp.cpp#647"><code>regexp_compile()</code></a>, just before <code>CallNonGenericMethod</code> is called and added <code>js::CheckedUnwrapStatic()</code> to unwrap any proxy wrappers before checking the realm. This ensures we’re checking the realm of the actual RegExp object and not the compartment wrappers around it.</p>
<h3 id="subclass-instances" tabindex="-1">Subclass Instances <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p>As mentioned above, the RegExp method <code>RegExp.prototype.compile()</code> re-initializes a RegExp using a newly created matcher for the specified pattern and flags. The proposal adds some restrictions to this which prevent oddities such as subclasses not functioning as expected (for instance, by not preprocessing the pattern and adding context used by their <code>exec()</code> implementation). More importantly, when applied to a cross-realm object, this would result in execution modifying the static RegExp members for the incorrect realm.</p>
<p>The proposal modifies the behavior so that legacy static properties are only updated when direct instances of the built-in RegExp constructor are used, not subclass instances or cross-realm objects, using similar logic to <code>RegExp.prototype.compile()</code>:</p>
<blockquote>
<ol start="7">
<li>If SameValue(thisRealm, rRealm) is true, then
<ul>
<li>i. If the value of R’s [[LegacyFeaturesEnabled]] internal slot is true, then
<ul>
<li>a. Perform UpdateLegacyRegExpStaticProperties(%RegExp%, S, lastIndex, e, capturedValues).</li>
</ul>
</li>
<li>ii. Else,
<ul>
<li>a. Perform InvalidateLegacyRegExpStaticProperties(%RegExp%).</li>
</ul>
</li>
</ul>
</li>
</ol>
</blockquote>
<p>The properties are specced and implemented as accessors with a getter and no setter, except for <code>RegExp.input</code> (and its alias <code>RegExp.$_</code>), which remains writable. Inside each of the accessors, if the receiver <code>this</code> and the %RegExp% realm intrinsic (the standard RegExp constructor) are not the same, we throw a <code>TypeError</code>.</p>
<pre class="language-bash" tabindex="0"><code class="language-bash"><span class="token punctuation">(</span>base<span class="token punctuation">)</span> $ ./mach run<br /> <span class="token number">0</span>:00.28 /Users/default/firefox/obj-aarch64-apple-darwin25.2.0/dist/bin/js<br />js<span class="token operator">></span> /a<span class="token punctuation">(</span>b<span class="token punctuation">)</span>c/.exec<span class="token punctuation">(</span><span class="token string">"abc"</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <br /><span class="token punctuation">[</span><span class="token string">"abc"</span>, <span class="token string">"b"</span><span class="token punctuation">]</span><br />js<span class="token operator">></span> RegExp.<span class="token variable">$1</span> <br /><span class="token string">"b"</span><br />js<span class="token operator">></span> new RegExp<span class="token punctuation">(</span><span class="token string">"a(b)"</span><span class="token punctuation">)</span>.exec<span class="token punctuation">(</span><span class="token string">"ab"</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <br /><span class="token punctuation">[</span><span class="token string">"ab"</span>, <span class="token string">"b"</span><span class="token punctuation">]</span><br />js<span class="token operator">></span> RegExp.<span class="token variable">$1</span> <br /><span class="token string">"b"</span><br />js<span class="token operator">></span> new <span class="token punctuation">(</span>class extends RegExp <span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">(</span><span class="token string">"a(b)"</span><span class="token punctuation">)</span>.exec<span class="token punctuation">(</span><span class="token string">"ab"</span><span class="token punctuation">)</span><span 
class="token punctuation">;</span> <br /><span class="token punctuation">[</span><span class="token string">"ab"</span>, <span class="token string">"b"</span><span class="token punctuation">]</span><br />js<span class="token operator">></span> RegExp.<span class="token variable">$1</span> <br />typein:6:1 TypeError: RegExp static property <span class="token string">'static_paren1_getter'</span> is invalid<br />Stack:<br /> @typein:6:1<br />js<span class="token operator">></span> </code></pre>
<pre class="language-bash" tabindex="0"><code class="language-bash">/a<span class="token punctuation">(</span>b<span class="token punctuation">)</span>c/.exec<span class="token punctuation">(</span><span class="token string">"abc"</span><span class="token punctuation">)</span><span class="token punctuation">;</span> RegExp.<span class="token variable">$1</span> // should <span class="token builtin class-name">return</span> <span class="token string">"b"</span><br />new RegExp<span class="token punctuation">(</span><span class="token string">"a(b)"</span><span class="token punctuation">)</span>.exec<span class="token punctuation">(</span><span class="token string">"ab"</span><span class="token punctuation">)</span><span class="token punctuation">;</span> RegExp.<span class="token variable">$1</span> // <span class="token string">"b"</span><br />new <span class="token punctuation">(</span>class extends RegExp <span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">(</span><span class="token string">"a(b)"</span><span class="token punctuation">)</span>.exec<span class="token punctuation">(</span><span class="token string">"ab"</span><span class="token punctuation">)</span><span class="token punctuation">;</span> RegExp.<span class="token variable">$1</span> // throws</code></pre>
<h3 id="normalisation-of-regexp-static-properties" tabindex="-1">Normalisation of RegExp Static Properties <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p>RegExp static properties are now defined as configurable and non-enumerable, so that the associated features can be easily removed using the JavaScript <code>delete</code> operator. This is important for consistency with modern ECMA-262, and it allows applications to further reduce the number of side-effect-producing globals, including VM native methods.</p>
<p>In SpiderMonkey, the legacy static properties are defined <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/builtin/RegExp.cpp#1499-1552">in RegExp.cpp</a>. To implement the proposal, I enclosed the properties in an <code>#ifdef NIGHTLY_BUILD</code> directive, removing the <code>JSPROP_PERMANENT</code> and <code>JSPROP_ENUMERATE</code> flags to make them configurable and non-enumerable in the Nightly environment, where they can be tested by the community. Outside of Nightly, we continue supporting the old implementation for beta/release environments.</p>
<p>Then, I updated the test262 AnnexB RegExp tests to support the change and to limit the tests to Nightly.</p>
<h2 id="understanding-the-implementation-challenges-and-solutions" tabindex="-1">Understanding the Implementation: Challenges and Solutions <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h2>
<h3 id="1-creative-bit-packing" tabindex="-1">1. Creative Bit Packing <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p>Once the legacy RegExp statics were normalised, the next step was adding a <code>LegacyFeaturesEnabled</code> internal slot. This slot records whether legacy features are enabled for the instance and is checked whenever legacy features are accessed. If the <code>RegExp</code> is a subclass instance or is associated with a different realm, the slot indicates that accessing legacy features should throw an error.</p>
<p>Initially, I added the slot to the <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/vm/RegExpObject.h#43">RegExpObject </a>:</p>
<pre class="language-cpp" tabindex="0"><code class="language-cpp"><span class="token keyword">static</span> <span class="token keyword">const</span> <span class="token keyword">unsigned</span> LEGACY_FEATURES_ENABLED_SLOT <span class="token operator">=</span> <span class="token number">3</span><span class="token punctuation">;</span> </code></pre>
<p>This presented a couple of problems for me:</p>
<ul>
<li>
<p>The number of reserved slots must match the allocation kind defined in <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/gc/AllocKind.h#60">FOR_EACH_OBJECT_ALLOCKIND(D)</a>. The number of reserved slots increased to 5, which meant that I had to choose between OBJECT6 and OBJECT8. During implementation, I somehow missed OBJECT6 and went with OBJECT8.</p>
</li>
<li>
<p>I knew that I’d get some pushback in code review, as my changes increased the size of the RegExp Object by 32 bytes (four 8-byte slots). I could see that there was a way for <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/public/RegExpFlags.h">Boolean flags to share a slot</a> but I didn't know how to implement my changes without breaking the JIT.</p>
</li>
</ul>
<p>I decided to leave the implementation as is and wait for SpiderMonkey engineers / reviewers to give me feedback and their preference on how to add the Boolean.</p>
<p>During code review, my reviewer Iain pointed out that since we’re only storing a single bit of information (whether legacy features are enabled or not), and the existing <code>FLAGS_SLOT</code> only uses 8 bits, I could store the legacy features in the unused higher bits.</p>
<p>The slot implementation includes a getter, <code>bool legacyFeaturesEnabled()</code>, that reads the bit from the <code>FLAGS_SLOT</code>; and a setter, <code>setLegacyFeaturesEnabled(bool)</code>, that writes the bit to the <code>FLAGS_SLOT</code>.</p>
<p>The new approach involved defining some constants based on the size of RegExp Flags so that the code keeps working if RegExpFlags gets bigger in future:</p>
<pre class="language-cpp" tabindex="0"><code class="language-cpp">static const size_t RegExpFlagsMask = JS::RegExpFlag::AllFlags;<br />static const size_t LegacyFeaturesEnabledBit = Bit(8);<br /><br />static_assert((RegExpFlagsMask &amp; LegacyFeaturesEnabledBit) == 0,<br />              "LegacyFeaturesEnabledBit must not overlap");</code></pre>
<p><a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/vm/RegExpObject.h#65">RegExpFlagsMask</a> has a bit set to 1 if that bit is part of the RegExpFlags, and 0 otherwise. The lowest 8 bits are currently set to other RegExp flags, which leaves us with the highest bits to pack our slot in.</p>
<p>We perform two operations: <code>raw &amp; RegExpFlagsMask</code>, which extracts only the traditional RegExp flags, and <code>raw &amp; ~RegExpFlagsMask</code>, which extracts everything apart from the RegExp flags. The RegExp flags occupy bits 0–7; we use bit 8 to store <code>LegacyFeaturesEnabled</code>. When we read the flags, we mask off any bits that are not part of the RegExpFlags.</p>
<pre class="language-cpp" tabindex="0"><code class="language-cpp">return JS::RegExpFlags(raw &amp; RegExpFlagsMask);</code></pre>
<p>When we write to the flags, we combine the new value of the RegExpFlags bits (<code>flags.value()</code>) with the old value of the other bits (<code>raw &amp; ~RegExpFlagsMask</code>).</p>
<pre class="language-cpp" tabindex="0"><code class="language-cpp">uint32_t newValue = flags.value() | (raw &amp; ~RegExpFlagsMask);<br />setFixedSlot(FLAGS_SLOT, Int32Value(newValue));</code></pre>
<p>When we read the <code>LegacyFeaturesEnabledBit</code>, we check if it’s set. When we write it, we take the existing raw value and either set or clear the <code>LegacyFeaturesEnabledBit</code>.</p>
<h3 id="2-lazy-evaluation" tabindex="-1">2. Lazy Evaluation <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p>The proposal specifies RegExp properties as internal slots of the RegExp Object, and the abstract operations <code>UpdateLegacyRegExpStaticProperties(C, S, startIndex, endIndex, capturedValues)</code> and <code>InvalidateLegacyRegExpStaticProperties(C)</code> were initially confusing to me. The confusion came from a specification detail: the spec describes the properties as being eagerly updated at a specific point in time, whereas SpiderMonkey evaluates them lazily.</p>
<p>It was the first time I had come across lazy evaluation, and I thought, naively, that it would be possible to change the implementation to eagerly update static properties after a successful match. This didn't work, for a few reasons.</p>
<p>First, lazy evaluation is heavily embedded in the JIT, so the idea of just changing that was… ambitious. Second, lazy evaluation is a way to defer regexp evaluation until RegExp properties are accessed. Third, there’s no observable difference to the end user whether the RegExp properties were lazily or eagerly evaluated. Lastly, internal slots are a way for ECMA262 to describe the internal state of the object.</p>
<p>So, <code>UpdateLegacyRegExpStaticProperties(C, S, startIndex, endIndex, capturedValues)</code> wasn’t needed, as it codifies already existing behaviour in SpiderMonkey. For <code>InvalidateLegacyRegExpStaticProperties(C)</code>, my mentor suggested implementing it as a <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/vm/RegExpStatics.h#17">boolean flag in RegExpStatics</a>.</p>
<p>When a subclass or cross-realm regexp executes, this flag is set to true, preventing legacy static properties from being accessed. The flag is cleared after normal RegExp executions, allowing legacy features to work for standard RegExp instances.</p>
<p>Because <code>InvalidateLegacyRegExpStaticProperties(C)</code> marks the values of the static properties as unavailable by setting the internal slots to empty, in step 4 of the accessors <code>GetLegacyRegExpStaticProperty(C, thisValue, internalSlotName)</code>, we throw a TypeError if the static properties are invalidated.</p>
<p>Then, we add the equivalent code in the <a href="https://searchfox.org/firefox-main/rev/6ece603789f6751c37c48b23f39dbbb16b290592/js/src/jit/CodeGenerator.cpp#2229-2243">JIT path</a>, so that when a regexp is executed, we lazily store enough information to be able to rerun the regexp later if the RegExpStatics are accessed.</p>
<h3 id="3-gating-the-implementation-behind-a-preference" tabindex="-1">3. Gating the implementation behind a preference <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p>The first step to implementing a TC39 proposal in SpiderMonkey is adding a preference for it. This allows the feature to be enabled or disabled at runtime, which is important in gating the feature until it has been tested enough for release.</p>
<p>With this proposal, it was awkward, because this was not a new syntax or library method, but behavioral modifications to the existing RegExp static properties and the <code>compile()</code> method.</p>
<p>At first, I enclosed my changes in an <code>#ifdef NIGHTLY_BUILD</code> directive so that they are only available in the Nightly environment. But given the potential for web compatibility risks, we needed to put the changes behind a preference. That way, we can flip the feature off in case we break something.</p>
<p>This created an awkward situation: the static RegExp properties themselves (like <code>RegExp.$1, RegExp.input</code>) are defined in <code>regexp_static_props</code>, which is baked into the static RegExp JSClass and embedded in the binary at compile time. I ended up wrapping these property definitions in an <code>#ifdef NIGHTLY_BUILD</code>, meaning they only exist in Nightly builds.</p>
<p>But the behavior of these properties — that is, whether accessing them should throw errors for subclasses and cross-realm regexps — is gated behind a runtime preference. This is even more awkward, because it will change behaviour in Nightly even without the preference enabled.</p>
<p>Thus, the preference only controls whether the new throwing behavior is active. As Iain noted, there wasn't a particularly clean way to avoid this. We'd need two parallel RegExp classes and then have to switch between them at runtime based on the pref, which seemed like overkill.</p>
<p>The compromise was to ship the properties in Nightly, use the preference to control the new behavior, and rely on extra-careful testing.</p>
<h3 id="4-wild-goose-chase" tabindex="-1">4. Wild Goose Chase <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p>Around August, when I had the initial implementation working without memory optimization or centralized legacy and realm checks, I was updating legacy regexp statics in <code>RegExpBuiltinExec()</code> only when matches succeeded.</p>
<p><code>RegExpBuiltinExec()</code> has two execution paths: a <code>forTest</code> path for <code>RegExp.prototype.test</code> (where we can skip allocating a result object) and a normal path for full execution. I had legacy feature validation in both paths, but only for successful matches.</p>
<p>My mentor suggested we needed to update the legacy regexp statics not just on success, but also on failure. That made sense from a spec perspective, so I spent the next week and a half trying to figure out how to implement this. I was looking into the execution paths, trying to understand where and how to trigger updates on failed matches.</p>
<p>After about a week, we realized that we had misread the proposal! Oops. Turns out, SpiderMonkey doesn't update legacy regexp properties on failure at all: it just returns the last successful result. I'd been chasing a solution to a problem that didn't actually exist in the implementation.</p>
<h3 id="next-steps-and-final-thoughts" tabindex="-1">Next Steps and Final Thoughts <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p>The "Legacy RegExp features in JavaScript" proposal is, at the time of this writing, in <a href="https://tc39.es/process-document/">stage 3</a> of the TC39 process, meaning the design is considered complete and only changes motivated by implementation experience are expected. There are potential backward compatibility risks, and any attempt to use a disabled feature will throw a <code>TypeError</code>. More on that can be found in <a href="https://github.com/tc39/proposal-regexp-legacy-features/blob/918a4b09723b34e4f857f10b4576028a8a02e97d/web-breaking-hazards.md">the Breaking Hazards portion of the proposal</a>.</p>
<p>Before implementing this proposal I had briefly interacted with C++ on a production level codebase when working on the Error.isError proposal, but working on legacy RegExp properties was a deeper dive into C++ and browser internals, which was difficult but also very much appreciated!</p>
<p>Working on this proposal exposed gaps in my knowledge but also gave me confidence in navigating large C++ codebases. I’m particularly grateful to my mentor, and Daniel Minor and Iain Ireland (from the SpiderMonkey team) for pointing me in the right direction and brainstorming solutions with me.</p>
<h3 id="you-may-also-like" tabindex="-1">You may also like: <a class="header-anchor" href="https://blogs.igalia.com/compilers/2026/01/20/legacy-regexp-features-in-javascript/">#</a></h3>
<p><a href="https://hacks.mozilla.org/2020/06/a-new-regexp-engine-in-spidermonkey/">A New RegExp Engine in SpiderMonkey</a></p>
<p><a href="https://spidermonkey.dev/blog/2025/03/05/iterator-range.html">Implementing Iterator.range in SpiderMonkey</a></p>
<hr /> Igalia Compilers Team — https://blogs.igalia.com/compilers/

Igalia WebKit Team: WebKit Igalia Periodical #53 — https://blogs.igalia.com/webkit/blog/2026/wip-53/ — 2026-01-19T19:25:32+00:00
<p>Update on what happened in WebKit in the week from December 26 to January 19.</p>
<p>
We're back! The first periodical of 2026 brings you performance optimizations, improvements to the memory footprint calculation, new APIs, the removal of the legacy Qt5 WPE backend, and as always, progress on JSC's Temporal implementation.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<div class="wip-item">
<p>The memory footprint calculation mechanism <a rel="external" href="https://github.com/WebKit/WebKit/pull/56493">has been unified</a> across the GTK, JSC, and WPE ports. The expensive <code>/proc/self/smaps</code> is no longer used; WPE now reads <code>/proc/self/statm</code> instead, with an extra cache to avoid frequent file reads.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/305444@main">Added</a> a new <code>webkit_context_menu_get_position()</code> function to the API that allows obtaining the pointer coordinates, relative to the web view origin, at the moment when a context menu was triggered.</p>
<p>Additionally, behaviour of context menus <a rel="external" href="https://commits.webkit.org/305461@main">has been made more consistent</a> between the GTK and WPE ports, and handling of <code>GAction</code> objects attached to menu items has been <a rel="external" href="https://commits.webkit.org/305267@main">rewritten</a> and <a rel="external" href="https://commits.webkit.org/305504@main">improved</a> with the goal of better supporting context menus in the WPE port.</p>
</div>
<h3 id="javascriptcore-fish">JavaScriptCore 🐟</h3>
<div class="wip-description">
<p>The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.</p>
</div>
<div class="wip-item">
<p>In JavaScriptCore's implementation of Temporal, <a rel="external" href="https://github.com/WebKit/WebKit/pull/56210/">fixed a bug</a> in <code>Temporal.PlainTime.from</code> that read options in the wrong order, which caused a test262 test to fail.</p>
</div>
<div class="wip-item">
<p>In JavaScriptCore's implementation of Temporal, <a rel="external" href="https://github.com/WebKit/WebKit/pull/56460">fixed several bugs</a> in <code>PlainYearMonth</code> methods and enabled all <code>PlainYearMonth</code> tests that don't depend on the <code>Intl</code> object. This completes the implementation of Temporal <code>PlainYearMonth</code> objects in JSC.</p>
</div>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p>In WebKit's Skia graphics backend, <a rel="external" href="https://commits.webkit.org/304898@main">fixed GrDirectContext management</a> for GPU resources. Operations on GPU-backed resources must use the context that created them, not the current thread's context. The fix stores <code>GrDirectContext</code> at creation time for <code>NativeImage</code> and uses <code>surface->recordingContext()->asDirectContext()</code> for SkSurface, correcting multiple call sites that previously used the shared display's context incorrectly.</p>
</div>
<div class="wip-item">
<p>Damage propagation <a rel="external" href="https://github.com/WebKit/WebKit/pull/55697">has been added</a> to the recently-added, non-composited mode in WPE.</p>
</div>
<div class="wip-item">
<p>In WebKit's Skia graphics backend for GTK/WPE, <a rel="external" href="https://commits.webkit.org/305273@main">added canvas 2D operation recording</a> for GPU-accelerated rendering. Instead of executing drawing commands immediately, operations are recorded into an <code>SkPicture</code> and replayed in batch when the canvas contents are needed, reducing GPU state change overhead for workloads with many small drawing operations, improving the MotionMark <em>Canvas Lines</em> performance on embedded devices with low-end tiled GPUs.</p>
</div>
<h2 id="wpe-webkit-pager">WPE WebKit 📟</h2>
<div class="wip-item">
<p>Due to Qt5 not receiving maintenance since mid-2025, the WPE Qt5 binding that used the legacy libwpe API <a rel="external" href="https://commits.webkit.org/305824@main">has been removed</a> from the tree. The Qt6 binding <a rel="external" href="https://github.com/WebKit/WebKit/tree/main/Source/WebKit/UIProcess/API/wpe/qt6">remains part of the source tree</a>, which is a better alternative that allows using supported Qt versions, and is built atop the new WPEPlatform API, making it a future-proof option. The WPE Qt API may be enabled when configuring the build with CMake, using the <code>ENABLE_WPE_QT_API</code> option.</p>
</div>
<h3 id="wpe-platform-api-jigsaw">WPE Platform API 🧩</h3>
<div class="wip-description">
<p>New, modern platform API that supersedes usage of libwpe and WPE backends.</p>
</div>
<div class="wip-item">
<p>The <code>WPEScreenSyncObserver</code> class has been improved to <a rel="external" href="https://commits.webkit.org/305509@main">support multiple callbacks</a>. Instead of a single callback set with <code>wpe_screen_sync_observer_set_callback()</code>, clients of the API can now use <code>wpe_screen_sync_observer_add_callback()</code> and <code>wpe_screen_sync_observer_remove_callback()</code>. The observer will be paused automatically when there are no callbacks attached to it.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div> Igalia WebKit Team — https://blogs.igalia.com/webkit

Manuel Rego: Servo 2025 Stats — https://blogs.igalia.com/mrego/servo-2025-stats/ — 2026-01-14T00:00:00+00:00
<p>This is a brief blog post to highlight the growth of the Servo community in recent years, particularly since <a href="https://igalia.com">Igalia</a> took over the project maintenance in 2023.</p>
<p>Note that this doesn’t talk about the technical achievements, though there have been tons of them in the last years. <em>A picture is worth a thousand words</em> so just take a look at <a href="https://blogs.igalia.com/mrego/servo-a-new-web-engine-written-in-rust/">this slide from my latest Servo talk</a> which shows how <a href="https://www.google.com/">google.com</a> was rendered with Servo at the beginning of 2023 vs September 2025.</p>
<figure>
<p><img src="https://blogs.igalia.com/mrego/files/2025/09/servo-talk-at-gosim/slide-11.png" alt="Slide showing screenshots of Servo rendering google.com in January 2025 vs September 2025" /></p>
<figcaption>Slide showing screenshots of Servo rendering google.com in January 2023 vs September 2025</figcaption>
</figure>
<h2 id="prs-numbers" tabindex="-1">PRs numbers <a class="header-anchor" href="https://blogs.igalia.com/mrego/servo-2025-stats/">#</a></h2>
<p>So like <a href="https://blogs.igalia.com/mrego/servo-revival-2023-2024/">we did last year</a>, let’s take a look at the PRs merged on the main <a href="https://github.com/servo/servo">Servo repository on GitHub</a> since 2018.</p>
<div>
<table>
<thead>
<tr>
<th></th>
<th>2018</th>
<th>2019</th>
<th>2020</th>
<th>2021</th>
<th>2022</th>
<th>2023</th>
<th>2024</th>
<th>2025</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>PRs</strong></td>
<td>1,188</td>
<td>986</td>
<td>669</td>
<td>118</td>
<td>65</td>
<td>776</td>
<td>1,771</td>
<td>3,183</td>
</tr>
<tr>
<td><strong>Contributors</strong></td>
<td>27.33</td>
<td>27.17</td>
<td>14.75</td>
<td>4.92</td>
<td>2.83</td>
<td>11.33</td>
<td>26.33</td>
<td>42.42</td>
</tr>
<tr>
<td><strong>Contributors ≥ 10</strong></td>
<td>2.58</td>
<td>1.67</td>
<td>1.17</td>
<td>0.08</td>
<td>0.00</td>
<td>1.58</td>
<td>4.67</td>
<td>8.50</td>
</tr>
</tbody>
</table>
</div>
<ul>
<li><strong>PRs</strong>: total number of PRs merged.</li>
<li><strong>Contributors</strong>: average number of contributors per month.</li>
<li><strong>Contributors ≥ 10</strong>: average number of contributors that have merged more than 10 PRs per month.</li>
</ul>
<p>As a clarification, these numbers don’t include PRs from bots (<code>dependabot</code> and <code>Servo WPT Sync</code>).</p>
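<p>As an illustration of how figures like these can be derived (a hypothetical sketch; the bot account names and exact methodology here are assumptions, not the script actually used for this post), the yearly totals and average monthly contributors can be computed from a list of merged-PR records:</p>

```python
from collections import defaultdict

# Assumed bot account names; the real exclusion list may differ.
BOTS = {"dependabot[bot]", "servo-wpt-sync"}

def pr_stats(prs):
    """prs: iterable of (author, "YYYY-MM") pairs, one per merged PR.

    Returns {year: (total_prs, avg_contributors_per_month)}, where the
    average is taken over the months that saw at least one non-bot PR.
    """
    per_month = defaultdict(set)   # "YYYY-MM" -> authors active that month
    totals = defaultdict(int)      # "YYYY"    -> merged PR count
    for author, month in prs:
        if author in BOTS:
            continue
        per_month[month].add(author)
        totals[month[:4]] += 1
    stats = {}
    for year, total in totals.items():
        counts = [len(a) for m, a in per_month.items() if m.startswith(year)]
        stats[year] = (total, sum(counts) / len(counts))
    return stats
```

<p>Note the table averages over all twelve months of a year, while this sketch averages over active months only; the two only differ noticeably in very quiet years.</p>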
<p>Looking at this data, we came close to <strong>doubling last year’s numbers</strong>! The 2025 figures are far bigger than in any previous year (even the 2018-2019 peak), showing a healthy community working on Servo.</p>
<p>The next chart is a different view of the same data but split per month, with the number of PRs landed every month, the number of contributors and the number of contributors with more than 10 patches. It shows the evolution over the years and the high activity last year.</p>
<h2 id="number-of-contributors" tabindex="-1">Number of contributors <a class="header-anchor" href="https://blogs.igalia.com/mrego/servo-2025-stats/">#</a></h2>
<p>Now let’s focus on the last 3 years, since the project reactivation, and the numbers of contributors to the Servo project.</p>
<div>
<table>
<thead>
<tr>
<th></th>
<th>2023</th>
<th>2024</th>
<th>2025</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Contributors</strong></td>
<td>54</td>
<td>129</td>
<td>146</td>
</tr>
<tr>
<td><strong>≥ 100 PRs</strong></td>
<td>1 (2%)</td>
<td>3 (2%)</td>
<td>8 (5%)</td>
</tr>
<tr>
<td><strong>≥ 10 PRs</strong></td>
<td>8 (15%)</td>
<td>29 (22%)</td>
<td>43 (29%)</td>
</tr>
<tr>
<td><strong>Only 1 PR</strong></td>
<td>31 (57%)</td>
<td>53 (41%)</td>
<td>55 (38%)</td>
</tr>
</tbody>
</table>
</div>
<p>The number of contributors to Servo has nearly tripled since 2023, reaching <strong>146 different contributors in 2025</strong>.</p>
<p>If we analyze the rest of the data in this table, we can see that the percentage of contributors making only a single PR in a year has dropped from 57% to 38%, meaning that Servo contributors now usually make more than one PR to the project.</p>
<p>If we check the number of contributors that have done more than 10 PRs in a year, we see the percentage almost doubling from 15% to 29% in the last 3 years.</p>
<p>And for the top contributors doing more than 100 PRs in a year, we have gone from 1 in 2023 and 3 in 2024 to 8 last year, which represents 5% of Servo contributors, showing a solid core of very active contributors to the project.</p>
<h2 id="wpt-pass-rate" tabindex="-1">WPT pass-rate <a class="header-anchor" href="https://blogs.igalia.com/mrego/servo-2025-stats/">#</a></h2>
<p>Let’s take a look at <a href="https://web-platform-tests.org/">WPT</a> evolution in 2025.</p>
<div>
<table>
<thead>
<tr>
<th>2025</th>
<th>January 1st</th>
<th>December 31st</th>
<th>Diff</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Score %</strong></td>
<td>48.2%</td>
<td>61.6%</td>
<td><strong>+13.4pp</strong></td>
</tr>
<tr>
<td><strong>Subtests (passed/total)</strong></td>
<td>1,396,647/1,998,146</td>
<td>1,866,247/1,998,146</td>
<td><strong>+469,600</strong></td>
</tr>
<tr>
<td><strong>Subtests %</strong></td>
<td>69.9%</td>
<td>93.4%</td>
<td><strong>+23.5pp</strong></td>
</tr>
</tbody>
</table>
</div>
<figure>
<p><img src="https://blogs.igalia.com/mrego/files/2026/01/wpt.png" alt="Evolution of WPT pass rates for Servo in 2025" /></p>
<figcaption>Evolution of WPT pass rates for Servo in 2025</figcaption>
</figure>
<p>You can check more information about WPT pass-rates at <a href="https://servo.org/wpt/">Servo’s website</a> (where you can also find an explanation of the <em>Score</em> number).</p>
<p>Note that these numbers differ from <a href="https://wpt.fyi/">wpt.fyi</a> because we’re still not running all the WPT tests in Servo, so the total numbers here are smaller.</p>
<p>It’s not easy to extract conclusions from this data, but it shows the Servo project keeps progressing and supporting more web platform features as time passes.</p>
<p>Sometimes these numbers grow artificially as new tests are added to WPT for features that Servo already supports (for example, the biggest jump last year was in October getting 188,281 new subtests passing without any change in Servo, just because new tests were added to WPT).</p>
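<p>The percentage-point figures in this table can be recomputed directly from the raw subtest counts; a small sketch using the numbers above:</p>

```python
def pass_rate_pp_diff(passed_start, passed_end, total):
    """Subtest pass rates at the start and end of a period, plus the
    improvement in percentage points."""
    start = 100 * passed_start / total
    end = 100 * passed_end / total
    return round(start, 1), round(end, 1), round(end - start, 1)

# Subtest counts from the table above (the total was unchanged over 2025).
print(pass_rate_pp_diff(1_396_647, 1_866_247, 1_998_146))  # (69.9, 93.4, 23.5)
```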
<h2 id="github-stars" tabindex="-1">GitHub stars <a class="header-anchor" href="https://blogs.igalia.com/mrego/servo-2025-stats/">#</a></h2>
<figure>
<p><img src="https://blogs.igalia.com/mrego/files/2026/01/github-stars.png" alt="Evolution of GitHub stars for Servo from https://www.star-history.com/#servo/servo" /></p>
<figcaption>Evolution of GitHub stars for Servo from <a href="https://www.star-history.com/#servo/servo">star-history.com</a></figcaption>
</figure>
<p>We are about to reach <strong>35,000 stars on GitHub</strong>. It’s good to see the project has not stopped growing since the beginning, and the curve has become steeper in recent years.</p>
<h2 id="other" tabindex="-1">Other <a class="header-anchor" href="https://blogs.igalia.com/mrego/servo-2025-stats/">#</a></h2>
<p>If we check the <a href="https://github.com/servo/project/blob/main/governance/README.md">official project roles</a>, we now have:</p>
<ul>
<li>5 administrators</li>
<li>17 TSC members</li>
<li>25 maintainers</li>
<li>18 contributors</li>
</ul>
<p>We have also started doing <a href="https://servo.org/blog/2025/10/20/servo-0.0.1-release/"><strong>Servo releases</strong></a>; we have done <a href="https://github.com/servo/servo/releases">three so far</a>.</p>
<p>The TSC has also set up <a href="https://servo.org/blog/2025/11/21/sponsorship-tiers/">sponsorship tiers</a> for donations. We got <a href="https://servo.org/#acknowledgements"><strong>4 bronze sponsors</strong></a> in 2025 and we hope to increase the number of sponsorships in 2026.</p>
<p>Regarding donations, we have defined a <a href="https://github.com/servo/project/blob/main/FUNDING_REQUEST.md"><strong>funding process</strong></a> to request usage of that money. We are currently using it to sponsor <a href="https://servo.org/blog/2025/09/17/your-donations-at-work-funding-jdm/">Josh Matthews’ contributions</a>, and <a href="https://www.azabani.com/2025/12/18/shoestring-web-engine-ci.html">pay for self-hosted runners to speed up CI times</a>.</p>
<p>Servo was present at several events last year; we ended up giving <a href="https://servo.org/about/"><strong>10 talks</strong></a> all around the globe.</p>
<h2 id="wrap-up" tabindex="-1">Wrap-up <a class="header-anchor" href="https://blogs.igalia.com/mrego/servo-2025-stats/">#</a></h2>
<p>The idea here was to do a quick recap of Servo’s 2025 stats. Taking a look at these numbers every now and then is useful, and gives you a different perspective on the status of the project, one that can easily be lost in day-to-day tasks.</p>
<p>In general, things have grown a lot in 2025. Who knows what will happen in 2026, but we hope to at least sustain similar numbers, or maybe even keep growing. That would be really great news for the Servo project.</p>
<p>Igalia is really proud of what the whole Servo community has achieved together in recent years, and we hope for a bright future for the project going forward.</p>
<p>As a side note, at the end of the month I’ll be at <a href="https://pretalx.fosdem.org/fosdem-2026/talk/review/PQPRDZ8DM7L8SYHBKGNZUJZUWGEXQTTP">FOSDEM talking about Servo</a>; other Servo folks like <a href="https://www.igalia.com/team/dazabani">Delan Azabani</a> and <a href="https://www.igalia.com/team/mrobinson">Martin Robinson</a> will also be there. If you are around, don’t hesitate to say hi and ask anything about the project.</p> Manuel Regohttps://blogs.igalia.com/mrego/Miyoung Shin: Our Journey to support Extensions for embeddershttps://blogs.igalia.com/mshin/?p=1452026-01-13T02:00:28+00:00
<h1 class="wp-block-heading"><strong>A History of Extensions for Embedders — and Where We’re Heading</strong></h1>
<p>Chromium’s Extensions platform has long been a foundational part of the desktop browsing experience. Major Chromium-based browsers—such as Chrome and Microsoft Edge—ship with full support for the Chrome Extensions ecosystem, and user expectations around extension availability and compatibility continue to grow.</p>
<p>In contrast, some Chromium embedders—for instance, products built directly on the //content API without the full //chrome stack—do not naturally have access to Extensions. Similarly, the traditional Chrome for Android app does not support Extensions. While some embedders have attempted to enable limited Extensions functionality by pulling in selected pieces of the //chrome layer, this approach is heavyweight, difficult to maintain, and fundamentally incapable of delivering full feature parity.</p>
<p>At Igalia we have been working toward the long-term goal of making Extensions usable on lightweight, //content-based products, without requiring embedders to depend on //chrome. This post outlines the background of that effort, the phases of work so far, the architectural challenges involved, and where the project is headed.</p>
<pre class="wp-block-preformatted"><strong>Note:</strong> ChromeOS supporting extensions (ChromeOS has <a href="https://blog.chromium.org/2024/06/building-faster-smarter-chromebook.html">announced</a> plans to incorporate more of the Android build stack) is not the same thing as Chrome-Android App supporting extensions. The two codepaths and platform constraints differ significantly. While the traditional Chrome app on Android phones and tablets still does not officially support extensions, recent beta builds of desktop-class Chrome on Android have begun to close this gap by enabling native extension installation and execution.<br /><br />Tracking bug: <a href="https://issues.chromium.org/issues/356905053">https://issues.chromium.org/issues/356905053</a></pre>
<h3 class="wp-block-heading">Extensions Architecture — Layered View</h3>
<p>The following diagram illustrates the architectural evolution of Extensions support for Chromium embedders.</p>
<h4 class="wp-block-heading">Traditional Chromium Browser Stack</h4>
<p>At the top of the stack, Chromium-based browsers such as Chrome and Edge rely on the full <code>//chrome</code> layer. Historically, the Extensions platform has lived deeply inside this layer, tightly coupled with Chrome-specific concepts such as <code>Profile</code>, browser windows, UI surfaces, and Chrome services.</p>
<pre class="wp-block-code"><code>+-----------------------+
| //chrome |
| (UI, Browser, etc.) |
+-----------------------+
| //extensions |
+-----------------------+
| //content |
+-----------------------+
</code></pre>
<p>This architecture works well for full browsers, but it is problematic for embedders. Products built directly on <code>//content</code> cannot reuse Extensions without pulling in a large portion of <code>//chrome</code>, leading to high integration and maintenance costs.</p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<h3 class="wp-block-heading"><strong>Phase 1 — Extensions on Android (Downstream Work)</strong></h3>
<p>In 2023, a downstream project at Igalia required extension support on a Chromium-based <strong>Android</strong> application. The scope was limited—we only needed to support a small number of specific extensions—so we implemented:</p>
<ul class="wp-block-list">
<li>basic installation logic,</li>
<li>manifest handling,</li>
<li>extension launch/execution flows, and</li>
<li>a minimal subset of Extensions APIs that those extensions depended on.</li>
</ul>
<p>This work demonstrated that Extensions <em>can</em> function in an Android environment. However, it also highlighted a major problem: <strong>modifying the Android <code>//chrome</code> codepath is expensive</strong>. Rebasing costs are high, upstream alignment is difficult, and the resulting solution is tightly coupled to Chrome-specific abstractions. The approach was viable only because the downstream requirements were narrow and controlled.</p>
<p>I shared this experience at <a href="https://youtu.be/FYYi_XiyL74?si=jG6ZrqZANRygo6AJ&t=300"><strong>BlinkOn Lightning Talk: “Extensions on Android”</strong></a>.</p>
<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
</div></figure>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<h3 class="wp-block-heading">Phase 2 — Extensions for Embedders <br />( //content + //extensions + //components/extensions )</h3>
<p>Following Phase 1, we began asking a broader question:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>Can we provide a reusable, upstream-friendly Extensions implementation that works for embedders without pulling in the <code>//chrome</code> layer?</em></p>
</blockquote>
<h4 class="wp-block-heading">Motivation</h4>
<p>Many embedders aim to remain as lightweight as possible. Requiring <code>//chrome</code> introduces unnecessary complexity, long build times, and ongoing maintenance costs. Our hypothesis was that <strong>large portions of the Extensions stack could be decoupled from Chrome and reused directly by content-based products</strong>.</p>
<p>One early idea was to componentize the Extensions code by migrating substantial parts of <code>//chrome/*/extensions</code> into <code>//components/extensions</code>.</p>
<pre class="wp-block-code"><code>+-------------------------+
| //components/extensions |
+-------------------------+
| //extensions |
+-------------------------+
| //content |
+-------------------------+</code></pre>
<h4 class="wp-block-heading">Proof-of-concept : Wolvic</h4>
<p>We tested this idea through <a href="https://www.wolvic.com/en/">Wolvic</a>, a VR browser used in several commercial solutions. Wolvic has two implementations:</p>
<ul class="wp-block-list">
<li>a Gecko-based version, and </li>
<li>a Chromium-based version built directly on the <code>//content</code> API.</li>
</ul>
<p>Extensions were already supported in Wolvic-Gecko, but not in Wolvic-Chromium. To close that gap, we migrated core pieces of the Extensions machinery into <code>//components/extensions</code> and enabled extension loading and execution in a content-only environment.</p>
<p>By early 2025, this work successfully demonstrated that Extensions could run without the <code>//chrome</code> layer.</p>
<p>Demo video:<br /><a href="https://youtube.com/shorts/JmQnpC-lxR8?si=Xf0uB6q__j4pmlSj">https://youtube.com/shorts/JmQnpC-lxR8?si=Xf0uB6q__j4pmlSj</a></p>
<p>Design document:<br /><a href="https://docs.google.com/document/d/1I5p4B0XpypR7inPqq1ZnGMP4k-IGeOpKGvCFS0EDWHk/edit?usp=sharing">https://docs.google.com/document/d/1I5p4B0XpypR7inPqq1ZnGMP4k-IGeOpKGvCFS0EDWHk/edit?usp=sharing</a></p>
<p>However, this work lived entirely in the Wolvic repository, which is a fork of Chromium. While open source, this meant that other embedders could not easily benefit without additional rebasing and integration work.</p>
<p>This raised an important question:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Why not do this work directly in the Chromium upstream so that all embedders can benefit?</p>
</blockquote>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<h3 class="wp-block-heading">Phase 3 — Extensions for Embedders<br />(//content + //extensions)</h3>
<p>Following discussions with the Extensions owner ([email protected]), we refined the approach further.</p>
<p>Rather than migrating functionality into <code>//components</code>, the preferred long-term direction is to <strong>move Extensions logic directly into the <code>//extensions</code> layer wherever possible</strong>.</p>
<pre class="wp-block-code"><code>+-----------------------+
| Embedder UI | (minimal interfaces)
+-----------------------+
| //extensions |
+-----------------------+
| //content |
+-----------------------+</code></pre>
<p>This approach offers several advantages:</p>
<ul class="wp-block-list">
<li>clearer layering and ownership,</li>
<li>fewer architectural violations,</li>
<li>reduced duplication between Chrome and embedders,</li>
<li>a cleaner API surface for integration.</li>
</ul>
<p>We aligned on this direction and began upstream work accordingly.</p>
<p>Tracking bug: <img src="https://s.w.org/images/core/emoji/16.0.1/72x72/1f517.png" alt="🔗" class="wp-smiley" /> <a href="https://issues.chromium.org/issues/358567092">https://issues.chromium.org/issues/358567092</a></p>
<figure class="wp-block-image size-full"><img width="697" height="366" src="https://blogs.igalia.com/mshin/files/2026/01/image.png" alt="" class="wp-image-151" /></figure>
<h4 class="wp-block-heading">Our goals for Content Shell + //extensions are:</h4>
<ol class="wp-block-list">
<li>Embedders should only implement a small set of interfaces, primarily for UI surfaces (install prompts, permission dialogs) and optional behaviors.</li>
<li><strong>Full WebExtensions API support</strong><br />W3C standard: <a href="https://w3c.github.io/webextensions/specification/">https://w3c.github.io/webextensions/specification/</a></li>
<li><strong>Chrome Web Store compatibility</strong><br />Embedders should be able to install and run extensions directly from the Chrome Web Store.</li>
</ol>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<h4 class="wp-block-heading"><strong>Short-term Goal: Installation Support</strong></h4>
<p>Our immediate milestone is to make <strong>installation</strong> work entirely using //content + //extensions.</p>
<p><strong>Current progress:</strong></p>
<ul class="wp-block-list">
<li><img src="https://s.w.org/images/core/emoji/16.0.1/72x72/2705.png" alt="✅" class="wp-smiley" /> <code>.zip</code> installation support already lives in //extensions</li>
<li><img src="https://s.w.org/images/core/emoji/16.0.1/72x72/1f6a7.png" alt="🚧" class="wp-smiley" /> Migrating <strong>Unpacked directory installation</strong> from //chrome to //extensions<br />(including replacing Profile with BrowserContext abstractions)</li>
<li><img src="https://s.w.org/images/core/emoji/16.0.1/72x72/1f51c.png" alt="🔜" class="wp-smiley" /> Moving <strong>.crx installation</strong> code from //chrome → //extensions<br /><br />As part of this effort, we are introducing <strong>clean, well-defined interfaces</strong> for install prompts and permission confirmations:</li>
<li>Chrome will continue to provide its full-featured UI</li>
<li>Embedders can implement minimal, custom UI as needed</li>
</ul>
<p><strong>What Comes Next</strong>:</p>
<p>Once installation is fully supported, we will move on to:</p>
<ul class="wp-block-list">
<li>Chrome Web Store integration flows</li>
<li>Core WebExtensions APIs required by commonly used extensions</li>
</ul>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<h4 class="wp-block-heading"><strong>Main Engineering Challenge — Detaching from the Chrome Layer</strong></h4>
<p>The hardest part of this migration is not moving files—it is <strong>breaking long-standing dependencies on the <code>//chrome</code> layer</strong>.</p>
<p>The Extensions codebase is large and historically coupled to Chrome-only concepts such as:</p>
<ul class="wp-block-list">
<li><code>Profile</code></li>
<li><code>Browser</code></li>
<li>Chrome-specific <code>WebContents</code> delegates</li>
<li>Chrome UI surfaces</li>
<li>Chrome services (sync, signin, prefs)</li>
</ul>
<p>Each migration requires careful refactoring, layering reviews, and close collaboration with component owners. While the process is slow, it has already resulted in meaningful architectural improvements.</p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<h3 class="wp-block-heading"><strong>What’s Next?</strong></h3>
<p>In the next post, we’ll demonstrate:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>A functioning version of Extensions running on top of<br />//content + //extensions only — capable of installing and running extensions.</strong></p>
</blockquote>
<p>From Igalia’s side, we continue working on ways to make it easier to integrate Chromium into other products and platforms. This will mark the first end-to-end, //chrome-free execution path for extensions in content-based browsers.</p>
<p>Stay tuned!</p> mshinhttps://blogs.igalia.com/mshinAlex Bradbury: Per-query energy consumption of LLMshttps://muxup.com/2026q1/per-query-energy-consumption-of-llms2026-01-07T12:00:00+00:00
<p>How much energy is consumed when querying an LLM? We're largely in the dark
when it comes to proprietary models, but for open weight models that anyone
can host on readily available, albeit eye-wateringly expensive, hardware this
is something that can be measured and reported, right? In fact, given other
people are <a href="https://inferencemax.semianalysis.com/">doing the hard work</a> of
setting up and running benchmarks across all kinds of different hardware and
software configurations for common open weight models, can we just re-use that
to get a reasonable figure in terms of Watt-hours (Wh) per query?</p>
<p>For the kind of model you can run locally on a consumer GPU, there's of
course some value in seeing how low the per-query energy usage might be on a
large-scale commercial setup. But my main interest is in larger and more
capable models, the kind that you wouldn't realistically run locally and end
up using in a pay-per-token manner either directly with your host of choice or
through an intermediary like <a href="https://openrouter.ai/">OpenRouter</a>. In these
cases where models are efficiently served with a minimum of 4-8 GPUs or even
<a href="https://www.perplexity.ai/hub/blog/lower-latency-and-higher-throughput-with-multi-node-deepseek-deployment">multi-node
clusters</a>
it's not easy to get a feel for the resources you're using. I'm pretty happy
that simple back of the envelope maths shows that whether providers are
properly amortising the cost of their GPUs or not, it's implausible that
they're selling per-token API access for open models at below the cost of
electricity. That gives a kind of upper bound on energy usage, and looking at
the pennies I spend on such services it's clearly a drop in the ocean compared
to my overall energy footprint. But it's not a very tight bound, which means
it's hard to assess the impact of increasing my usage.</p>
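<p>To make that envelope calculation concrete, here is a sketch under assumed round numbers (an 8-GPU node drawing roughly 10 kW at the wall, an aggregate throughput of 10,000 output tokens per second, and electricity at $0.10/kWh; none of these figures come from any specific provider):</p>

```python
def electricity_cost_per_mtok(node_power_kw, tokens_per_s, price_per_kwh):
    """Electricity cost to generate one million output tokens on a node
    running at a steady aggregate throughput."""
    hours_per_mtok = 1e6 / tokens_per_s / 3600
    return node_power_kw * hours_per_mtok * price_per_kwh

# Assumed: 10 kW node, 10,000 tok/s aggregate, $0.10/kWh.
cost = electricity_cost_per_mtok(10, 10_000, 0.10)
print(f"${cost:.3f} per million output tokens")  # about $0.028
```

<p>Under these assumptions the electricity comes to roughly three cents per million output tokens, orders of magnitude below typical per-token API prices, which is exactly why this upper bound is so loose.</p>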
<p>We can look at things like <a href="https://arxiv.org/pdf/2508.15734">Google's published figures on energy usage for
Gemini</a> but this doesn't help much. They
don't disclose the length of the median prompt and its response, or details of
the model used to serve that median query meaning it's not helpful for
either estimating how it might apply to other models or how it might apply to
your own usage (which may be far away from this mysterious median query).
Mistral <a href="https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai">released
data</a>
on the per-query environmental impact (assuming a 400-token query), but
the size of the Mistral Large 2 model is not disclosed and they don't calculate
a Wh per query figure. CO2 and water per query are very helpful to evaluate a
particular deployment, but the actual energy used is a better starting point
that can be applied to other providers assuming different levels of carbon
intensity. If one of the API providers were to share statistics based on a
real world deployment of one of the open models with a much higher degree of
transparency (i.e. sharing stats on the number of queries served during the
period, statistics on their length, and measured system power draw) that would
be a useful source of data. But today we're looking at what we can conclude
from the <a href="https://inferencemax.semianalysis.com/">InferenceMAX benchmark
suite</a> published results.</p>
<p>I'd started looking at options for getting good figures thinking I might
have to invest in the hassle and expense of renting a multi-GPU cloud
instance to run my own benchmarks, then felt InferenceMAX may make that
unnecessary. After writing this up along with all my provisos I'm perhaps
tempted again to try to generate figures myself. Anyway, read on for a more
detailed look at that benchmark suite. You can scroll past all the provisos
and <a href="https://muxup.com/feed.xml#results">jump ahead to the figures</a> giving the Wh/query
figures implied by the benchmark results across different GPUs, different
average input/output sequence lengths, and for gpt-oss 120B and
DeepSeek-R1-0528. But I hope you'll feel a bit guilty about it.</p>
<p>If you see any errors, please let me know.</p>
<h2 id="high-level-notes-on-inferencemax"><a href="https://muxup.com/feed.xml#high-level-notes-on-inferencemax" class="anchor" tabindex="-1"></a>High-level notes on InferenceMAX</h2>
<p><a href="https://inferencemax.semianalysis.com/">InferenceMAX benchmark suite</a> has the
<a href="https://newsletter.semianalysis.com/p/inferencemax-open-source-inference">stated
goal</a>
to "provide benchmarks that both emulate real world applications as much as
possible and reflect the continuous pace of software innovation." They
differentiate themselves from other benchmarking efforts noting "Existing
performance benchmarks quickly become obsolete because they are static, and
participants often game the benchmarks with unrealistic, highly specific
configurations."</p>
<p>The question I'm trying to answer is "what is the most 'useful AI' I can
expect for a modern GPU cluster in a realistic deployment and how much energy
does it consume". Any benchmark is going to show peak throughput higher than
you'd expect to achieve in real workload and there's naturally a desire to
keep it pinned on a specific model for as long as it isn't <em>totally</em>
irrelevant in order to enable comparisons as hardware and software evolves
with a common point of reference. But although I might make slightly
different choices about what gets benchmarked and how, the InferenceMAX setup
at first look seems broadly aligned with what I want to achieve.</p>
<p>They benchmark
<a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-0528">DeepSeek-R1-0528</a> (both
at the native fp8 quantisation and at fp4) which is a 671B parameter model
with 37B active weights released ~7 months ago and seems a fair representative
of a large MoE open weight model.
<a href="https://huggingface.co/openai/gpt-oss-120b">gpt-oss-120b</a> is also
benchmarked, providing a point of comparison for a much smaller and efficient
to run model. Different input sequence length and output sequence length (ISL
and OSL - the number of input and output tokens) are tested: 1k/1k, 1k/8k,
8k/1k, which provides coverage of different query types. Plus tests against a
wide range of GPUs (including the 72-GPU GB200 NVL72 cluster) and sweeps
different settings.</p>
<p>At the time of writing, what you might reasonably consider to be 'InferenceMAX' is
split into three pieces:</p>
<ul>
<li>The frontend website you can <a href="https://inferencemax.semianalysis.com/">see at
inferencemax.semianalysis.com</a> (not
currently open source but <a href="https://github.com/SemiAnalysisAI/InferenceX/issues/315">planned to
be</a>)</li>
<li>The <a href="https://github.com/kimbochen/bench_serving">script for executing queries against the LLM serving infrastructure and
collecting stats</a> (currently in
a separate repo but <a href="https://github.com/SemiAnalysisAI/InferenceX/issues/338">planned to be incorporated into the main InferenceMAX
repository</a>),</li>
<li>The wrapper/runner scripts and GitHub actions workflows that live in the
<a href="https://github.com/SemiAnalysisAI/InferenceX">main InferenceMAX
repository</a>.
<ul>
<li>This is actively contributed to by at least Nvidia and AMD engineers.</li>
</ul>
</li>
</ul>
<p>GitHub Actions is used to orchestrate the runs, ultimately producing a zip
file containing JSON with the statistics of each configuration (e.g.
<a href="https://github.com/SemiAnalysisAI/InferenceX/actions/runs/20216709902/job/58149531774">here</a>).
The <code>benchmark_serving.py</code> script is invoked via the <a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/benchmark_lib.sh#L107"><code>run_benchmark_serving</code> wrapper
in
<code>benchmark_lib.sh</code></a>
which hardcodes some options and passes through some others from the workflow
YAML. The results logged by <code>benchmark_serving.py</code> are <a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/utils/process_result.py">processed in
InferenceMAX's <code>process_result.py</code>
helper</a>
which will produce JSON in the desired output format. Together, these scripts
provide statistics like throughput (input and output token), end to end
latency, interactivity (output tokens per second) etc.</p>
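<p>Given those statistics, converting to the Wh-per-query figure I'm after is
mechanical: the aggregate output throughput tells you how much system time each
query consumes, and multiplying by a power figure gives energy. InferenceMAX
does not report measured power draw, so the value in this sketch is an
assumption you would need to substitute (rated accelerator power plus host
overhead is a starting point; a wall measurement would be better):</p>

```python
def wh_per_query(system_power_w, output_tok_per_s, output_tokens_per_query):
    """Energy per query in watt-hours, assuming the system draws
    system_power_w while sustaining the given aggregate output throughput."""
    seconds_per_query = output_tokens_per_query / output_tok_per_s
    return system_power_w * seconds_per_query / 3600

# Assumed example: 10 kW system, 10,000 tok/s aggregate, 1k-token responses.
print(round(wh_per_query(10_000, 10_000, 1_000), 3))  # 0.278
```

<p>This attributes all energy to output tokens; prefill work is implicitly folded
into the sustained throughput figure, which is a reasonable first-order
allocation given the benchmark runs a mixed input/output workload.</p>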
<h2 id="further-studying-the-benchmark-setup"><a href="https://muxup.com/feed.xml#further-studying-the-benchmark-setup" class="anchor" tabindex="-1"></a>Further studying the benchmark setup</h2>
<p>So, let's look at the benchmarking logic in more detail to look for any
surprises or things that might affect the accuracy of the Wh-per-query figure
I want to generate. I'll note that InferenceMAX is an ongoing project that is
actively being developed. These observations are based on a recent repo
checkout, but of course things may have changed since then if you're reading
this post some time after it was first published.</p>
<p>Looking through I made the following observations. Some represent potential
issues (see the next subheading for a list of the upstream issues I filed),
while others are just notes based on aspects of the benchmark I wanted to
better understand.</p>
<ul>
<li>One of the required arguments to the benchmark serving script is
<code>--random-range-ratio</code>. This is set by default <a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/.github/workflows/benchmark-tmpl.yml#L56">to 0.8 in
<code>benchmark-tmpl.yml</code></a>
and <a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/.github/workflows/benchmark-multinode-tmpl.yml#L49">in
<code>benchmark-multinode-tmpl.yml</code></a>
and is not overridden elsewhere.
<ul>
<li>This argument is ultimately used in
<a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L366"><code>sample_random_requests</code></a>.
It uses <code>np.random.randint</code> to sample input/output lengths between the
<code>range_ratio * {input,output}_len</code> and <code>{input,output}_len</code>.</li>
<li>Taken together, this logic means that for a workload advertised as having
8k input or output tokens (8192), the benchmark will actually run with an
average of ~7373 tokens (<code>0.9*num_tokens</code>, as the length is a random number
between <code>0.8*num_tokens</code> and <code>num_tokens</code>).</li>
<li>Because the throughput figures are <a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L498">calculated using the actual input and
output token
lengths</a>,
the figure <em>does</em> represent what was observed, it's just the workload
doesn't quite match the description. The reported end to end latency for
instance will be misleadingly lower than you would get for a workload that
actually did have the expected input / output sequence lengths.</li>
</ul>
</li>
<li>The various request functions in
<a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/backend_request_func.py">backend_request_func.py</a>
will set <code>output.success = False</code> if they don't get a HTTP 200 status code
back for a request. There is no logic to retry a refused request and
<a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L485">metrics will be calculated skipping any failed
requests</a>.
This means an overloaded server will perform better on this benchmark for
metrics like E2E latency and TTFT if it refuses requests rather than accept
them and serve them slowly. As the number of failed requests isn't included
in the results json it's not easy to tell if this is a factor for any
benchmarks.</li>
<li>Many of the various scripts in the benchmarks/ subdirectory <a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/gptoss_fp4_b200_docker.sh#L22">set a
max-model-len
parameter</a>
or the similar <code>--max_seq_len</code> parameter for trt-llm (e.g. <a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/gptoss_fp4_b200_trt_docker.sh#L65">the b200
config</a>),
which, if I'm not mistaken, will ultimately be set from the max_model_len
<a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/utils/matrix_logic/generate_sweep_configs.py">defined in
generate_sweep_configs.py</a>.
This parameter is <a href="https://docs.vllm.ai/en/latest/cli/serve/#-max-model-len">documented in
vllm</a> and <a href="https://nvidia.github.io/TensorRT-LLM/1.0.0rc2/commands/trtllm-serve.html#cmdoption-trtllm-serve-serve-max_seq_len">in
TensorRT-LLM</a>
and controls the maximum supported length of a request, including both the
prompt and any generated output. Setting it 20 or 200 tokens above the sum
of the benchmarked ISL+OSL to minimise memory use does not reflect a
realistic real-world deployment, which seems the wrong choice given the
InferenceMAX complaint that in other suites "participants often
game the benchmarks with unrealistic, highly specific configurations".
Benchmarks naturally show a 'best case', but a figure like $ per M tokens
makes little sense if it reflects a configuration you wouldn't feasibly
use/sell.</li>
<li>Throughput is <a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L546">calculated in
<code>benchmark_serving.py</code></a>
based on the total number of tokens divided by the duration of the
benchmark. This is then normalised on a per-GPU basis <a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/utils/process_result.py#L90">in
process_result.py</a>.
No problems here, I just wanted to clarify the source of the figure.</li>
<li>In terms of the source of the input tokens themselves, we can see that
<a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/benchmark_lib.sh#L222"><code>--dataset-name random</code> is always passed to
<code>benchmark_serving.py</code></a>.
This leads to
<a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L366"><code>sample_random_requests</code></a>
being called, which will pick random token ids and create a list of tokens
of the desired length (mapping these randomly picked IDs to tokens).
<ul>
<li>The <code>--ignore-eos</code> flag is passed to the <code>benchmark_serving.py</code> script
which will in turn set this option in the JSON when making the LLM request.
<a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/backend_request_func.py"><code>backend_request_func.py</code></a>
sets this and also sets <code>max_tokens</code> to the desired <code>output_len</code> which
<em>should</em> ensure that the response has that exact desired number of output
tokens. <code>ignore_eos</code> means that the LLM server will keep generating tokens
even after seeing the end of sequence token.</li>
<li>It's interesting that some of the benchmark configurations enable
multi-token prediction, and presumably find it beneficial even given the
totally random token inputs. Is it possible that such configurations
benefit from undesirable looped outputs (due to a combination of random
inputs and continuing to sample tokens past the EOS marker) that
potentially are very predictable and give an extra boost?</li>
</ul>
</li>
<li>The <code>--num-prompts</code> parameter controls the total number of requests that are
issued. The benchmark script is written so it will wait for all of these to
complete (either successfully or unsuccessfully). This is
<a href="https://github.com/SemiAnalysisAI/InferenceX/blob/84320a0aadacae1114265b553830f48b56231817/benchmarks/gptoss_fp4_h100_slurm.sh#L51">typically</a>
set to the concurrency times 10, but some benchmark setups set it higher
(presumably as the default figure finishes too quickly for good results).</li>
<li>In terms of how requests are submitted with a certain level of concurrency:
<ul>
<li>See above for a discussion of the total number of requests</li>
<li><code>--request-rate inf</code> is always passed, so there's no limit on submitting
requests up to the concurrency limit.</li>
<li>It <a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L962">precomputes a list of requests to
submit</a>
and then <a href="https://github.com/kimbochen/bench_serving/blob/499c0b171b499b02a1fd546fb2326d2175a5d66e/benchmark_serving.py#L664">uses a semaphore to limit
concurrency</a>
but otherwise continuously submits requests up to the concurrency limit,
and then waits until they all complete.</li>
</ul>
</li>
<li>There are no tests that the configuration is serving the model with the
expected quality currently, but there's an <a href="https://github.com/SemiAnalysisAI/InferenceX/issues/123">issue tracking at least adding a
simple quality
benchmark</a>.
Although none of the explored settings <em>should</em> impact the quality of output,
it's always possible they trigger a bug and in this case it's not
interesting to benchmark.</li>
<li>It would be helpful for reproducibility if more complete system information
for the benchmark runners was released. This is <a href="https://github.com/SemiAnalysisAI/InferenceX/issues/393">being worked
on</a>.</li>
<li>You should of course consider whether the tested input and output sequence
lengths correspond to a workload you are interested in (thank you to Aaron
Zhao for <a href="https://www.linkedin.com/feed/update/urn:li:activity:7414767337058242562?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7414767337058242562%2C7415321431900905472%29&dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287415321431900905472%2Curn%3Ali%3Aactivity%3A7414767337058242562%29">reminding me to mention
this</a>.
This benchmarking approach also doesn't consider caching. Both factors could
be highly relevant if trying to estimate energy cost for a long context chat
or 'agentic' flow. But I'm happy enough with the tested workloads as a
starting point, and my main focus here is trying to get a degree of comfort
with the reported numbers for the ISL/OSL combinations they've chosen to
test.</li>
</ul>
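<p>The averaging effect of the range-ratio sampling described in the first
observation above can be sketched in a few lines of Python (an illustrative
stdlib version, not InferenceMAX's actual code, which uses
<code>np.random.randint</code>):</p>

```python
import random

def sample_lengths(target_len, range_ratio, n, seed=0):
    # Mirror the benchmark's sampling: uniform integers in
    # [range_ratio * target_len, target_len].
    rng = random.Random(seed)
    lo = int(range_ratio * target_len)
    return [rng.randint(lo, target_len) for _ in range(n)]

lengths = sample_lengths(8192, 0.8, 100_000)
avg = sum(lengths) / len(lengths)
# The expected value is (0.8 + 1.0) / 2 * 8192 = 0.9 * 8192, i.e. ~7373,
# rather than the advertised 8192 tokens.
print(round(avg))
```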
<h2 id="filed-issues"><a href="https://muxup.com/feed.xml#filed-issues" class="anchor" tabindex="-1"></a>Filed issues</h2>
<p>I ended up filing the following issues upstream:</p>
<ul>
<li>FIXED <a href="https://github.com/SemiAnalysisAI/InferenceX/issues/293">Token throughput per MW is described as reflecting the generated tokens but
is actually processed+generated
tokens</a>
<ul>
<li>The companion <a href="https://newsletter.semianalysis.com/p/inferencemax-open-source-inference">article introducing
InferenceMAX</a>
has previously defined throughput as the rate at which the GPU
<strong>generates</strong> tokens yet the figure displayed in the UI was the total
number of output <em>and</em> input tokens per second. The definition in the
article has now been fixed, and changes to the UI make it more obvious
based on context that throughput refers to input+output tokens (as y-axis
metric options now exist to show "input token throughput per GPU" and
"output token throughput per GPU").</li>
<li><a href="https://www.youtube.com/watch?v=yYha_OtxA14">This talking head video from
Nvidia</a> seems to make the
same error, talking about the number of tokens 'generated' per second per
GPU when, looking at the relevant results, these seem to be the total throughput
(i.e. output <strong>plus</strong> the much faster to process input tokens).</li>
</ul>
</li>
<li><a href="https://github.com/SemiAnalysisAI/InferenceX/issues/299">Presented input/output token throughput per GPU for disaggregated setups
not usefully comparable to standard multi-gpu
setups</a>
<ul>
<li>In disaggregated setups you have some number of GPUs dedicated to prefill
(processing input tokens) and some number dedicated to decode (generating
output tokens). In this case, the reported input/output throughput figures
refer to the input or output throughput per prefill GPU or per decode GPU.
It doesn't make sense (IMHO) to plot this figure against the input/output
throughput figures for a non-disaggregated setup. To make it comparable,
the input/output throughput per GPU should be calculated by averaging
across the whole cluster rather than just the GPUs dedicated to prefill or
decode respectively.</li>
</ul>
</li>
<li><a href="https://github.com/SemiAnalysisAI/InferenceX/issues/300">Standard deviation of interactivity (std_intvty) in result json is
incorrectly
calculated</a>
<ul>
<li>Not a big issue as the figure isn't used anywhere. Interactivity
(tokens/second) metrics are calculated from the recorded time per output
token. <code>1000/$tpot_metric</code> is correct for the mean, median, and p99 figures
but mathematically incorrect for the standard deviation. e.g. a small
standard deviation for time per output token will result in a huge
standard deviation being computed for interactivity.</li>
</ul>
</li>
<li>FIXED <a href="https://github.com/SemiAnalysisAI/InferenceX/issues/349">Reference kW figures no longer shown in frontend for each
GPU</a>
<ul>
<li>At some point updates to the frontend logic meant that the per-GPU kW
figures used in calculating the token throughput per utility MW were no
longer displayed. This has now been fixed.</li>
</ul>
</li>
<li><a href="https://github.com/SemiAnalysisAI/InferenceX/issues/350">How will full workflow run output be retained beyond 90
days</a>
<ul>
<li>The benchmark frontend helpfully links to the GitHub Actions run that
generated the displayed results and has a datepicker to view previous
results. Clicking through to GitHub means you can download the original
.zip of the JSON format benchmark results which is something I take
advantage of in the analysis later in this article. According to GitHub
docs, <a href="https://docs.github.com/en/organizations/managing-organization-settings/configuring-the-retention-period-for-github-actions-artifacts-and-logs-in-your-organization">the maximum retention period for Actions artifacts and logs is 90
days for a public
repo</a>.
It would be good to have a mechanism so that this information is backed up
rather than lost.</li>
</ul>
</li>
<li><a href="https://github.com/SemiAnalysisAI/InferenceX/issues/365">Contents of CONFIG_DIR path as used in launch_gb200-nv.sh is
undisclosed</a>
<ul>
<li>Most benchmark configuration lives in the main repository, but
unfortunately one of the Nvidia DeepSeek R1 configurations <a href="https://github.com/SemiAnalysisAI/InferenceX/blob/ff7dfc7365034aa84245f41c517c38618860d484/runners/launch_gb200-nv.sh#L26">relies on
a config dir that's not publicly
available</a>
meaning it can't be audited or reproduced. This is a case where tightening
up benchmark rules and review process can hopefully avoid it happening in
the future.</li>
</ul>
</li>
<li><a href="https://github.com/SemiAnalysisAI/InferenceX/issues/359">Reconsider allowing setting max_model_len / max_seq_len to
isl+osl+tiny_margin</a>
<ul>
<li>As explained above, a number of benchmarks set <code>max_model_len</code> (or for
Nvidia's TensorRT, <code>--max_seq_len</code>) to some figure that is just above
ISL+OSL. Although some degree of tuning is expected, to me this goes
against the idea that "<a href="https://newsletter.semianalysis.com/p/inferencemax-open-source-inference">We want server configs to reflect real world
deployments as much as
possible</a>"
and the stated goal "to provide benchmarks that both emulate real world
applications as much as possible and reflect the continuous pace of
software innovation". It's hard to imagine a realistic deployment that
would configure their serving engine in a way such that it errors if
input+output tokens passes ~2k tokens for instance. Looking at the
<a href="https://openrouter.ai/deepseek/deepseek-r1-0528">DeepSeek R1 0528 providers on
OpenRouter</a>, the vast
majority offer greater than 128k context.</li>
<li>By my understanding, with PagedAttention the KV cache is dynamically
allocated anyway so this setting would largely impact other data
structures. Plus vllm at least contains a startup check that there is
sufficient VRAM to serve at least one request at the maximum configured
context. I would really like to see what impact this setting has on
benchmarks.</li>
<li>The repository maintainers renamed my issue to a title that doesn't
reflect my report. I'm hopeful they will review my recent comment and
title it back.</li>
</ul>
</li>
<li><a href="https://github.com/SemiAnalysisAI/InferenceX/issues/357">Some reported metrics will be inflated if a serving engine sheds
load</a>
<ul>
<li>This covers the observation made above that failed requests are simply
skipped. As the number of failed requests isn't tracked, it's not easy to
see if a particular configuration may appear better (better E2E latency,
lower time to first token) as a result of shedding load rather than
queueing.</li>
<li>The repository maintainers renamed this issue to "[feature suggestion for
vllm/vllm benchmark_serving]" and closed it. I'm hopeful they will read my
<a href="https://github.com/SemiAnalysisAI/InferenceX/issues/357#issuecomment-3680821210">response</a>
and reconsider on the grounds that:
<ul>
<li>The benchmark_serving script isn't doing anything "wrong" necessarily.
It is simply making an implementation choice with potential impact on
results that the InferenceMAX harness isn't tracking.</li>
<li>The script is planned to be added to the repo soon anyway.</li>
</ul>
</li>
</ul>
</li>
<li><a href="https://github.com/SemiAnalysisAI/InferenceX/issues/356">Benchmarked ISL and OSL averages 0.9*target_length meaning results are
over-optimistic</a>.
<ul>
<li>This is the problem mentioned above where the introduced variance in
input/output sequence length has an average lower than the headline rate.
As noted, this means specifically the end to end latency figure is
misleading, but it also impacts tokens/second and throughput, to the extent
that the cost of serving a query doesn't scale linearly with sequence length.</li>
<li>This will be fixed by <a href="https://github.com/SemiAnalysisAI/InferenceX/pull/339">PR
339</a> which
upstreams the <code>benchmark_serving.py</code> script and in that modified branch
changes <code>sample_random_requests</code> to sample a range with multiplier between
<code>1 - RANGE_RATIO</code> and <code>1 + RANGE_RATIO</code>.</li>
</ul>
</li>
</ul>
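<p>To make the std_intvty miscalculation concrete, here's a tiny example with
made-up time-per-output-token samples:</p>

```python
import statistics

# Hypothetical per-request time-per-output-token (TPOT) samples, in ms.
tpot_ms = [24.0, 25.0, 26.0]

# Correct: convert each sample to interactivity (tokens/s), then take the
# standard deviation of those values.
intvty = [1000.0 / t for t in tpot_ms]
correct_std = statistics.stdev(intvty)

# Incorrect (what the result JSON does): apply 1000/x to the TPOT standard
# deviation. A small TPOT spread produces a huge "interactivity spread".
wrong_std = 1000.0 / statistics.stdev(tpot_ms)

print(f"correct std: {correct_std:.2f} tok/s")  # ~1.6 tok/s
print(f"wrong std:   {wrong_std:.2f} tok/s")    # 1000.00 tok/s
```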
<p>In the best case, you'd hope to look at the benchmark results, accept that they
probably represent a higher degree of efficiency than you'd likely get on a
real workload, assume an API provider might achieve 50% of that, and double the
effective cost per query to give a very rough upper estimate on per-query cost.
But that only really works if the reported benchmark results roughly match the
achievable throughput in a setup configured for commercial serving. Given the
tuning to specific ISL/OSL values, I'm not at all confident that's the case and
I don't know how wide the gap is.</p>
<h2 id="generating-results"><a href="https://muxup.com/feed.xml#generating-results" class="anchor" tabindex="-1"></a>Generating results</h2>
<p>Firstly I wrote a <a href="https://gist.github.com/asb/44fe17f4f5b7abed7836481be45c5a38#file-check-py">quick
script</a>
to check some assumptions about the data and look for anything that seems
anomalous. Specifically:</p>
<ul>
<li>Check that total throughput per GPU matches what you'd expect based on the
input token and output token throughput per GPU, even in the disaggregated
case. i.e. the total throughput per GPU averaged over the whole cluster
should equal the sum of the input and output throughput per GPU provided
those figures are averaged over the whole cluster.</li>
<li>The ratio of input token throughput to output token throughput should be
almost equal to the ratio of input to output tokens in the
benchmark's workload. If not, there is something surprising that needs
investigating.</li>
</ul>
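<p>Those two checks can be sketched as follows (the field names are illustrative
rather than the exact keys in the result JSON):</p>

```python
def check_result(r, rel_tol=0.05):
    """Sanity-check one benchmark result (illustrative field names)."""
    # Per-GPU throughputs averaged over the whole cluster should be additive.
    total = r["input_tput_per_gpu"] + r["output_tput_per_gpu"]
    assert abs(total - r["total_tput_per_gpu"]) <= rel_tol * r["total_tput_per_gpu"]

    # The observed input:output throughput ratio should roughly match the
    # workload's ISL:OSL ratio.
    observed = r["input_tput_per_gpu"] / r["output_tput_per_gpu"]
    expected = r["isl"] / r["osl"]
    assert abs(observed - expected) <= rel_tol * expected

# A consistent (made-up) 8k/1k result passes both checks.
check_result({
    "input_tput_per_gpu": 8000.0,
    "output_tput_per_gpu": 1000.0,
    "total_tput_per_gpu": 9000.0,
    "isl": 8192,
    "osl": 1024,
})
```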
<p>Based on the information available in the generated result JSON and the
reported all-in power per GPU (based on SemiAnalysis' model), we can calculate
the Watt hours per query. First calculate the joules per token (watts per GPU
divided by the total throughput per GPU). This gives a weighted average of the
joules per token for the measured workload (i.e. reflecting the ratio of
isl:osl). Multiplying joules per token by the tokens per query (isl+osl) gives
the joules per query, and we can just divide by 3600 to get Wh.</p>
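<p>As a worked sketch of that calculation (the power and throughput numbers
below are purely hypothetical, not taken from any real configuration):</p>

```python
def wh_per_query(watts_per_gpu, total_tput_per_gpu, isl, osl):
    # W divided by tokens/s gives joules per token; scale by tokens per
    # query (isl + osl) and convert joules to watt hours.
    joules_per_token = watts_per_gpu / total_tput_per_gpu
    joules_per_query = joules_per_token * (isl + osl)
    return joules_per_query / 3600.0

# Hypothetical: 1000 W per GPU, 2500 total tokens/s per GPU, 8k/1k workload.
print(round(wh_per_query(1000.0, 2500.0, 8192, 1024), 2))  # -> 1.02
```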
<p>There is some imprecision because we're constructing the figure for e.g.
8192/1024 ISL based on measurements with an average <code>0.9*8192</code> input and
<code>0.9*1024</code> output length. The whole calculation would be much simpler if the
benchmark harness recorded the number of queries executed and in what time:
we could then directly calculate the Wh/query from the Wh for the system over
the benchmark duration divided by the number of queries served (remembering
that in the current setup each query is on average 90% of the
advertised sequence length).</p>
<p>This logic is wrapped up in a <a href="https://gist.github.com/asb/44fe17f4f5b7abed7836481be45c5a38#file-process_results-py">simple
script</a>.</p>
<p>There's been a recent change to <a href="https://github.com/SemiAnalysisAI/InferenceX/pull/381">remove the 'full sweep'
workflows</a> in favour of
only triggering a subset of runs when there is a relevant change. But I
grabbed my results from before this happened, from a December 15th 2025 run.
However when finalising this article I spotted Nvidia managed to land some new
NVL72 DeepSeek R1 0528 configurations just before Christmas, so I've merged in
those results as well, using a run from December 19th. All data and scripts are
collected together <a href="https://gist.github.com/asb/44fe17f4f5b7abed7836481be45c5a38">in this
Gist</a>.</p>
<h2 id="results"><a href="https://muxup.com/feed.xml#results" class="anchor" tabindex="-1"></a>Results</h2>
<p>As well as giving the calculated Wh per query, the script also gives a
comparison point of minutes of PS5 gameplay (<a href="https://www.playstation.com/en-gb/legal/ecodesign/">according to
Sony</a>, "Active Power
Consumption" ranges from ~217W to ~197W depending on model - we'll just use
200W). The idea here is to provide some kind of reference point for what a
given Wh figure means in real-world times, rather than focusing solely on the
relative differences between different deployments. Comparisons to "minutes of
internet streaming" seem popular at the moment, presumably because it's an
activity basically everyone does. I'm steering away from that because I'd
be comparing one value that's hard to estimate accurately and has many
provisos to another figure that's hard to estimate accurately and has many
provisos, which just injects more error and uncertainty into this effort to
better measure/understand/contextualise energy used for LLM inference.</p>
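<p>The conversion used for this comparison point is simply energy divided by the
assumed 200 W draw:</p>

```python
def ps5_minutes(wh, ps5_watts=200.0):
    # Minutes of gameplay with the same energy at ~200 W active power draw.
    return wh / ps5_watts * 60.0

# e.g. 3.32 Wh/query works out to roughly one minute of PS5 gaming.
print(round(ps5_minutes(3.32), 2))  # -> 1.0
```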
<p>I'm now going to cherry-pick some results for discussion. Firstly for DeepSeek
R1 0528 with 8k/1k ISL/OSL, we see that the reported configurations that give
a usable level of interactivity at fp8 report between 0.96-3.74 Wh/query
(equivalent to 0.29-1.12 minutes of PS5 gaming). The top row, which is
substantially more efficient, is the newer <a href="https://github.com/SemiAnalysisAI/InferenceX/commit/c040b5cf23ced2c7e23d1da03e1abae89e6426aa">GB200 NVL72 configuration added at the end of
last
year</a>.
It's not totally easy to trace the configuration changes given they're
accompanied by a reworking of the associated scripts, but as far as I can see
the configuration ultimately used is <a href="https://github.com/ai-dynamo/dynamo/blob/b7107d008/examples/backends/sglang/slurm_jobs/scripts/gb200-fp8/disagg/8k1k-max-tpt.sh">this file from the dynamo
repository</a>.
Looking at the JSON the big gain comes from significantly higher prefill
throughput (with output throughput per GPU remaining roughly the same). This
indicates the older results (the second row) were bottlenecked waiting for
prefill to complete.</p>
<table>
<thead>
<tr>
<th align="left">Workload</th>
<th align="left">Intvty (tok/s)</th>
<th align="left">E2EL (s)</th>
<th align="left">Details</th>
<th align="left">Wh/Q</th>
<th align="left">PS5 min</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">fp8 DS R1 0528 8k/1k</td>
<td align="left">39.5</td>
<td align="left">36.5</td>
<td align="left">gb200 dynamo-sglang (72 GPUs disagg, conc: 2048, pfill_dp_attn, dec_dp_attn)</td>
<td align="left">0.96</td>
<td align="left">0.29</td>
</tr>
<tr>
<td align="left">fp8 DS R1 0528 8k/1k</td>
<td align="left">31.3</td>
<td align="left">55.2</td>
<td align="left">gb200 dynamo-sglang (72 GPUs disagg, conc: 1024, pfill_dp_attn, dec_dp_attn)</td>
<td align="left">3.13</td>
<td align="left">0.94</td>
</tr>
<tr>
<td align="left">fp8 DS R1 0528 8k/1k</td>
<td align="left">20.9</td>
<td align="left">48.8</td>
<td align="left">h200 trt (8 GPUs, conc: 64, dp_attn)</td>
<td align="left">3.32</td>
<td align="left">1.00</td>
</tr>
<tr>
<td align="left">fp8 DS R1 0528 8k/1k</td>
<td align="left">19.5</td>
<td align="left">49.6</td>
<td align="left">h200 sglang (8 GPUs, conc: 64)</td>
<td align="left">3.39</td>
<td align="left">1.02</td>
</tr>
<tr>
<td align="left">fp8 DS R1 0528 8k/1k</td>
<td align="left">23.9</td>
<td align="left">39.9</td>
<td align="left">b200-trt trt (8 GPUs, conc: 64)</td>
<td align="left">3.39</td>
<td align="left">1.02</td>
</tr>
<tr>
<td align="left">fp8 DS R1 0528 8k/1k</td>
<td align="left">22.3</td>
<td align="left">44.5</td>
<td align="left">b200 sglang (8 GPUs, conc: 64)</td>
<td align="left">3.74</td>
<td align="left">1.12</td>
</tr>
</tbody>
</table>
<p>Now taking a look at the results for an fp4 quantisation of the same workload,
the model is significantly cheaper to serve with similar or better
interactivity, and the NVL72 setup Nvidia submitted does have a significant
advantage over the 4/8 GPU clusters. This time we see 0.63-1.67 Wh/query
(equivalent to 0.19-0.50 minutes of PS5 power draw while gaming). Serving at a
lower quantisation impacts the quality of results of course, but the improved
efficiency, including on smaller 4 GPU setups, helps demonstrate why models like
<a href="https://huggingface.co/moonshotai/Kimi-K2-Thinking">Kimi K2 thinking</a> are
distributed as "native int4", with benchmark results reported at this
quantisation and quantisation aware training used to maintain quality of
result.</p>
<table>
<thead>
<tr>
<th align="left">Workload</th>
<th align="left">Intvty (tok/s)</th>
<th align="left">E2EL (s)</th>
<th align="left">Details</th>
<th align="left">Wh/Q</th>
<th align="left">PS5 min</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">fp4 DS R1 0528 8k/1k</td>
<td align="left">41.6</td>
<td align="left">24.6</td>
<td align="left">gb200 dynamo-trt (40 GPUs disagg, conc: 1075, pfill_dp_attn, dec_dp_attn)</td>
<td align="left">0.63</td>
<td align="left">0.19</td>
</tr>
<tr>
<td align="left">fp4 DS R1 0528 8k/1k</td>
<td align="left">22.8</td>
<td align="left">43.2</td>
<td align="left">b200-trt trt (4 GPUs, conc: 128, dp_attn)</td>
<td align="left">0.93</td>
<td align="left">0.28</td>
</tr>
<tr>
<td align="left">fp4 DS R1 0528 8k/1k</td>
<td align="left">18.7</td>
<td align="left">59.3</td>
<td align="left">b200 sglang (4 GPUs, conc: 128)</td>
<td align="left">1.25</td>
<td align="left">0.38</td>
</tr>
<tr>
<td align="left">fp4 DS R1 0528 8k/1k</td>
<td align="left">30.3</td>
<td align="left">39.4</td>
<td align="left">b200 sglang (4 GPUs, conc: 64)</td>
<td align="left">1.67</td>
<td align="left">0.50</td>
</tr>
</tbody>
</table>
<p>Looking now at the 1k/8k workload (i.e. generating significant output), the
cost is 15.0-16.3 Wh/query (equivalent to 4.49-4.89 minutes of PS5 power draw
while gaming). As expected this is significantly higher than the 8k/1k
workload, as prefill (processing input tokens) is much cheaper per token than
decode (generating output tokens).</p>
<table>
<thead>
<tr>
<th align="left">Workload</th>
<th align="left">Intvty (tok/s)</th>
<th align="left">E2EL (s)</th>
<th align="left">Details</th>
<th align="left">Wh/Q</th>
<th align="left">PS5 min</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">fp8 DS R1 0528 1k/8k</td>
<td align="left">42.5</td>
<td align="left">176.3</td>
<td align="left">b200 sglang (8 GPUs, conc: 64)</td>
<td align="left">15.0</td>
<td align="left">4.49</td>
</tr>
<tr>
<td align="left">fp8 DS R1 0528 1k/8k</td>
<td align="left">31.9</td>
<td align="left">232.2</td>
<td align="left">h200 sglang (8 GPUs, conc: 64)</td>
<td align="left">15.9</td>
<td align="left">4.76</td>
</tr>
<tr>
<td align="left">fp8 DS R1 0528 1k/8k</td>
<td align="left">31.2</td>
<td align="left">237.9</td>
<td align="left">h200 trt (8 GPUs, conc: 64)</td>
<td align="left">16.3</td>
<td align="left">4.88</td>
</tr>
<tr>
<td align="left">fp8 DS R1 0528 1k/8k</td>
<td align="left">39.1</td>
<td align="left">189.5</td>
<td align="left">b200-trt trt (8 GPUs, conc: 64)</td>
<td align="left">16.3</td>
<td align="left">4.89</td>
</tr>
</tbody>
</table>
<p>Again, fp4 has a significant improvement in efficiency:</p>
<table>
<thead>
<tr>
<th align="left">Workload</th>
<th align="left">Intvty (tok/s)</th>
<th align="left">E2EL (s)</th>
<th align="left">Details</th>
<th align="left">Wh/Q</th>
<th align="left">PS5 min</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">fp4 DS R1 0528 1k/8k</td>
<td align="left">29.7</td>
<td align="left">251.5</td>
<td align="left">b200-trt trt (4 GPUs, conc: 256, dp_attn)</td>
<td align="left">2.73</td>
<td align="left">0.82</td>
</tr>
<tr>
<td align="left">fp4 DS R1 0528 1k/8k</td>
<td align="left">37.7</td>
<td align="left">197.5</td>
<td align="left">b200-trt trt (8 GPUs, conc: 256, dp_attn)</td>
<td align="left">4.31</td>
<td align="left">1.29</td>
</tr>
<tr>
<td align="left">fp4 DS R1 0528 1k/8k</td>
<td align="left">34.2</td>
<td align="left">221.2</td>
<td align="left">b200 sglang (4 GPUs, conc: 128)</td>
<td align="left">4.75</td>
<td align="left">1.43</td>
</tr>
<tr>
<td align="left">fp4 DS R1 0528 1k/8k</td>
<td align="left">33.1</td>
<td align="left">223.1</td>
<td align="left">b200-trt trt (4 GPUs, conc: 128)</td>
<td align="left">4.79</td>
<td align="left">1.44</td>
</tr>
</tbody>
</table>
<p>As you'd expect for a much smaller model at native fp4 quantisation,
GPT-OSS-120B is much cheaper to serve. e.g. for 8k/1k:</p>
<table>
<thead>
<tr>
<th align="left">Workload</th>
<th align="left">Intvty (tok/s)</th>
<th align="left">E2EL (s)</th>
<th align="left">Details</th>
<th align="left">Wh/Q</th>
<th align="left">PS5 min</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">fp4 GPT-OSS 120B 8k/1k</td>
<td align="left">45.8</td>
<td align="left">20.8</td>
<td align="left">b200-trt trt (1 GPUs, conc: 128)</td>
<td align="left">0.11</td>
<td align="left">0.03</td>
</tr>
<tr>
<td align="left">fp4 GPT-OSS 120B 8k/1k</td>
<td align="left">93.1</td>
<td align="left">10.5</td>
<td align="left">b200-trt trt (2 GPUs, conc: 128, dp_attn)</td>
<td align="left">0.11</td>
<td align="left">0.03</td>
</tr>
<tr>
<td align="left">fp4 GPT-OSS 120B 8k/1k</td>
<td align="left">44.3</td>
<td align="left">21.4</td>
<td align="left">b200 vllm (1 GPUs, conc: 128)</td>
<td align="left">0.11</td>
<td align="left">0.03</td>
</tr>
<tr>
<td align="left">fp4 GPT-OSS 120B 8k/1k</td>
<td align="left">145.7</td>
<td align="left">6.7</td>
<td align="left">b200-trt trt (2 GPUs, conc: 64, dp_attn)</td>
<td align="left">0.14</td>
<td align="left">0.04</td>
</tr>
<tr>
<td align="left">fp4 GPT-OSS 120B 8k/1k</td>
<td align="left">103.8</td>
<td align="left">9.2</td>
<td align="left">b200 vllm (2 GPUs, conc: 64)</td>
<td align="left">0.20</td>
<td align="left">0.06</td>
</tr>
</tbody>
</table>
<p>Or for 1k/8k:</p>
<table>
<thead>
<tr>
<th align="left">Workload</th>
<th align="left">Intvty (tok/s)</th>
<th align="left">E2EL (s)</th>
<th align="left">Details</th>
<th align="left">Wh/Q</th>
<th align="left">PS5 min</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">fp4 GPT-OSS 120B 1k/8k</td>
<td align="left">80.5</td>
<td align="left">91.6</td>
<td align="left">b200-trt trt (1 GPUs, conc: 128)</td>
<td align="left">0.49</td>
<td align="left">0.15</td>
</tr>
<tr>
<td align="left">fp4 GPT-OSS 120B 1k/8k</td>
<td align="left">72.3</td>
<td align="left">102.0</td>
<td align="left">b200 vllm (1 GPUs, conc: 128)</td>
<td align="left">0.55</td>
<td align="left">0.16</td>
</tr>
<tr>
<td align="left">fp4 GPT-OSS 120B 1k/8k</td>
<td align="left">144.9</td>
<td align="left">51.1</td>
<td align="left">b200-trt trt (2 GPUs, conc: 128, dp_attn)</td>
<td align="left">0.55</td>
<td align="left">0.17</td>
</tr>
<tr>
<td align="left">fp4 GPT-OSS 120B 1k/8k</td>
<td align="left">129.4</td>
<td align="left">57.0</td>
<td align="left">b200-trt trt (2 GPUs, conc: 128)</td>
<td align="left">0.61</td>
<td align="left">0.18</td>
</tr>
</tbody>
</table>
<h2 id="conclusion"><a href="https://muxup.com/feed.xml#conclusion" class="anchor" tabindex="-1"></a>Conclusion</h2>
<p>Well, this took rather a lot more work than I thought it would and I'm
not yet fully satisfied with the result. Partly we have to accept a degree of
fuzziness about marginal energy usage of an individual query - it's going to
depend on the overall workload of the system so there's going to be some
approximation when you try to cost a single query.</p>
<p>I'm glad that InferenceMAX exists and am especially glad that it's open and
publicly developed, which is what has allowed me to dive into its
implementation to the extent I have and flag concerns/issues. I feel it's not
yet fully living up to its aim of providing results that reflect real world
application, but I hope that will improve with further maturation and better
rules for benchmark participants. Of course, it may still make most sense to
collect benchmark figures myself, and even then, being able to refer to
the benchmarked configurations and get an indication of what hardware can
achieve what performance is helpful. Renting a 72-GPU cluster is
expensive and as far as I can see not typically available for a short time, so
any benchmarking run by myself would be limited to 4-8 GPU configurations. If
the gap in efficiency is huge for such setups vs the NVL72 then these smaller
setups are maybe less interesting.</p>
<p>If I found the time to run benchmarks myself, what would I be testing? I'd
move to <a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2">DeepSeek V3.2</a>. One
of the big features of this release was the movement to a new attention
mechanism which <a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/assets/paper.pdf#section.3">scales <em>much</em> closer to linearly with sequence
length</a>.
With e.g. <a href="https://github.com/MoonshotAI/Kimi-Linear">Kimi Linear</a> and
<a href="https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd">Qwen3-Next</a>,
other labs are moving in a similar direction experimentally at least. I'd
try to set up an 8 GPU configuration with sglang/vllm configured in a way that it
would be capable of serving a commercial workload with varied input/output
sequence lengths, and test that this is the case (Chutes <a href="https://chutes.ai/app/chute/398651e1-5f85-5e50-a513-7c5324e8e839?tab=source">provide their deployed
configs</a>
which may be another reference point). I'd want to see how much the effective
Wh per million input/output tokens varies depending on the different isl/osl
workloads. These <em>should</em> be relatively similar given the linear attention
mechanism, and if so it's a lot easier to estimate the rough energy cost of a
series of your own queries of varied length. I would stick with the random
input tokens for the time being.</p>
<p>So where does that leave us? All of this and we've got figures for two
particular models, with one benchmark harness, a limited set of input/output
sequence lengths, and a range of
potential issues that might impact the conclusion. I think this is a useful
yardstick / datapoint, though I'd like to get towards something that's even
more useful and that I have more faith in.</p>
<hr /><a href="https://muxup.com/feed.xml#article-changelog" class="anchor" tabindex="-1"></a>Article changelog
<ul>
<li>2026-02-17:
<ul>
<li>Changed GitHub links to point to SemiAnalysisAI/InferenceX rather than
InferenceMAX/InferenceMAX, as they were broken by the upstream rename.</li>
</ul>
</li>
<li>2026-01-09:
<ul>
<li>Fix broken link.</li>
<li>Add note that more complete system info would be helpful for
reproducibility.</li>
<li>Add note about variety of input/output sequence lengths tested.</li>
</ul>
</li>
<li>2026-01-07: Initial publication date.</li>
</ul> Alex Bradburyhttps://muxup.comBrian Kardell: The Secret Life of Custom Elementshttps://bkardell.com/blog/SecretLifeOfCustomElements.html2026-01-06T05:00:00+00:00
<h1 class="contextual-heading">The Secret Life of Custom Elements</h1>
<p class="segue">Twenty years ago last month, Google published <a href="https://web.archive.org/web/20060203035414/http://code.google.com/webstats/index.html">an analysis of "slightly over a billion documents,"</a> a snapshot of the web that helped shape the early direction of HTML5. It followed a lineage of smaller, more personal studies — individuals poking at the web to answer some narrow question, often with datasets that would easily fit on a thumb drive today. For about half those two decades, I’ve been arguing that we need <strong>more</strong> study of the web, not less. The platform evolves faster than our understanding of it, and the only way to know what the web actually is — not what we imagine it to be — is to look.</p>
<p>Every month the <a href="https://en.wikipedia.org/wiki/Internet_Archive">HTTP Archive</a> quietly captures a snapshot of the web as it actually exists—not the idealized web that we hope for, but the messy, improvised, duct‑taped reality of millions of sites in the wild. I’ve been collecting and studying the non-standard elements in these snapshots for the last six years. </p>
<p>This new dataset is the largest I’ve ever worked with: Billions of pages, hundreds of thousands of distinct non-standard element names, and a long tail that stretches into places no standards body has ever seriously examined. And unlike the Google study, which looked for patterns in class names, this dataset captures the long tail of non‑standard elements — the names people invent for actual elements when the platform doesn’t give them what they need.</p>
<p>What emerges is a portrait of the web as it is lived: messy, inventive, repetitive, global, and full of reinvention. It’s also a mirror held up to the platform itself.</p>
<p>But, it's also much more complex to study than I could have imagined a decade ago, and I really wish that the W3C (and member orgs which include academia) had taken up the charge to begin to figure out how to really study the web and use that information to inform standards work.</p>
<h2 class="contextual-heading">What's difficult about it...</h2>
<p>One problem is that the dataset itself has some fairly extreme bias. The crawl doesn't hit anything that isn't on the public internet - that means it excludes intranets which are <em>massive</em>. In fact, most of my career was spent working on intranets. The crawl captures only home pages, plus the target of whatever it interprets as the largest link on that page. It also can't get to anything that requires login - which means that for a site like twitter or bluesky or mastodon, you're going to get something very unrepresentative of any of those. So, one challenge I'd love to see us trying to tackle is how to get even better data representation. It's hard to "pave cowpaths" if they're in a country we can't even see into.</p>
<p>Initially I had this idea that we could watch for the adoption of tags - imagining that we'd get some that would become <em>very</em> popular, just like we did with JavaScript libraries and frameworks. However, it turns out that this is not the signal it might first appear to be. An element appearing in tens of thousands or even hundreds of thousands of pages is often simply because they are part of a larger successful system. If Wix or Shopify create some custom elements that work behind the WYSIWYG tooling, and lots of people use it to create their pages - then suddenly that element gets very very popular - even if it isn't actually particularly good. In fact, we can see shifts in the data where the teams themselves changed their minds and another version supplants the first very quickly because it's simply internal.</p>
<p>Then, I thought that perhaps what we can do with the dataset instead, is to squint at it and look a little more abstractly at what people are naming their elements and see if people are re-solving similar problems. Do we find, for example, multiple non-standard element names that appear to be about tabs? Yes! Clearly that is indicative that we need a native element, right? <em>Maybe</em>. It's a bit more nuanced than that. Here are the most commonly re-created/repeated non-standard element themes:</p>
<ul>
<li>Navigation </li>
<li>Headers and footers </li>
<li>Carousels and sliders</li>
<li>Modals </li>
<li>Search bars </li>
<li>Product cards </li>
<li>Login forms </li>
<li>Cookie banners </li>
<li>Accordions </li>
<li>Tabs </li>
<li>Toasts </li>
<li>Breadcrumbs</li>
</ul>
<p>While we don't have several of these in standard HTML, we <em>do</em> have native <code><header></code>, <code><footer></code>, <code><nav></code>, <code><dialog></code>, and <code><search></code> elements, and even accordions via the <code>name</code> attribute of <code><details></code>. And yet, the wild still contains hundreds or thousands of custom elements with names like <code><app-header></code>, <code><site-footer></code>, <code><main-nav></code>, <code><modal-dialog></code>, <code><search-box></code>, and <code><accordion-panel></code>.</p>
<p>Native primitives may exist, but not at the same level of abstraction as these. <code><header></code> and <code><footer></code> in HTML are structural, not behavioral. <code><dialog></code> is behavioral, but not styled. <code><search></code> exists, but doesn’t solve autocomplete, filtering, or results.</p>
<p>So developers build those - and, if you stop and think about it, not all non-standard elements are equally undesirable. Many of them will be simple decorations or thin wrappers that <em>do</em> use their native counterparts. What is definitely interesting to study is where there is a clear generic need and the platform doesn't provide anything close. Tabs, above, for example.</p>
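<p>For the curious, the kind of "squinting" at names described above can be sketched as a simple keyword bucketing pass. This is an illustrative toy, not the methodology actually used on the dataset, and the keyword lists are invented for the example:</p>

```javascript
// Toy thematic bucketing of non-standard element names by keyword.
// The theme names and keyword lists are illustrative only.
const THEMES = {
  navigation: ['nav', 'breadcrumb', 'menu'],
  dialogs:    ['modal', 'dialog', 'popup'],
  carousels:  ['carousel', 'slider'],
  tabs:       ['tab'],
};

function themeOf(tagName) {
  const name = tagName.toLowerCase();
  for (const [theme, keywords] of Object.entries(THEMES)) {
    if (keywords.some((k) => name.includes(k))) return theme;
  }
  return 'other';
}

console.log(themeOf('main-nav'));     // → "navigation"
console.log(themeOf('modal-dialog')); // → "dialogs"
```

Real analysis has to be fuzzier than this (a hypothetical <code>tabular-data</code> would be a false positive for tabs, for instance), which is part of why the data is harder to read than it first appears.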
<h2 class="contextual-heading">Observations..</h2>
<p>Here are many observations from the data, in no real particular order of importance.</p>
<h3 class="contextual-heading">Forms and Inputs: Tweaked, Wrapped, and Re‑Wrapped</h3>
<p>Forms and inputs are a great example of the constant re-invention I just described. Sometimes it's because the native element is insufficient, but that's not <em>necessarily</em> the case. In some cases they're just slight wrappers. Among them are lots and lots of "pickers" and "selectors" that show up...</p>
<ul>
<li><code><custom-select></code></li>
<li><code><date-picker></code></li>
<li><code><variant-picker></code></li>
<li><code><quantity-selector></code></li>
</ul>
<p>There is already a lot of ongoing work to make native form elements (including selects) require less code and just be more stylable and flexible, and the data at least suggests that such efforts will be very welcome.</p>
<h3 class="contextual-heading">Hidden Machinery</h3>
<p>A surprising number of elements aren’t UI components at all. They’re runtime markers:</p>
<ul>
<li><code><ng-container></code></li>
<li><code><router-outlet></code></li>
<li><code><astro-island></code></li>
<li><code><ion-router-outlet></code></li>
<li><code><next-route-announcer></code></li>
</ul>
<p>These exist because frameworks need declarative boundaries for hydration, routing, rendering or template expansion. I suppose it is debatable whether these are an indicator of “missing HTML features”, or just how much.</p>
<h3 class="contextual-heading">Carousels (and sliders... and toasts)</h3>
<p>I don't love carousels, but it's hard to deny that they are popular. There are <em>dozens</em> of distinct and identifiable carousel/slider elements in the dataset and they appear <em>a lot</em>. I really dislike a few bits of Google's attempt to make CSS-only carousels possible, but it's pretty clear why they chose to tackle that problem. I guess it is worth stressing again the bias in the dataset here - if there is a page I most expect to see a carousel, it is exactly the primary one the archive crawls. So, while it is the most popular in the dataset, I don't know that it is the most popular all-around. You can see why Google winds up with their proposals though, toasts are on that top list too.</p>
<h3 class="contextual-heading">Structural semantics?</h3>
<p>There are a few broad categories where the main point seems to be "semantics". That is, very often many of these don't actually <em>do</em> anything, beyond provide some hooks, mainly for styling. They aren't actually even custom elements sometimes (or maybe even often) - just non-standard elements. </p>
<h4 class="contextual-heading">e-commerce</h4>
<p>Dozens of these surround e-commerce. There are tens of thousands of sites that use elements with names like the following (and variants):</p>
Product & merchandising
<ul>
<li><code><product-card></code></li>
<li><code><product-title></code></li>
<li><code><product-price></code></li>
<li><code><product-rating></code></li>
<li><code><product-variant></code></li>
<li><code><product-gallery></code></li>
<li><code><product-description></code></li>
<li><code><product-badge></code></li>
</ul>
Pricing & money
<ul>
<li><code><price-money></code></li>
<li><code><sale-price></code></li>
<li><code><compare-at-price></code></li>
<li><code><discount-amount></code></li>
<li><code><currency-display></code></li>
</ul>
Inventory & availability
<ul>
<li><code><stock-status></code></li>
<li><code><pickup-availability></code></li>
<li><code><delivery-estimate></code></li>
<li><code><inventory-level></code></li>
</ul>
Cart & checkout
<ul>
<li><code><cart-items></code></li>
<li><code><cart-count></code></li>
<li><code><checkout-button></code></li>
<li><code><order-summary></code></li>
</ul>
<p>Very interestingly, they are often used alongside actual machine-readable semantics via JSON-LD in the same markup.</p>
<p>While the vast majority of these elements appear because of common tooling, the fact that there are dozens of variants of similar names appearing on smaller numbers of sites indicates there is something widely interesting here. It's hard to say what it is other than that it would be nice to have a common structural semantic that would work for both purposes.</p>
<p>I guess the biggest surprise here is that if it's true, why hasn't such a thing arisen already? It is entirely within the community's power to develop such a thing. Perhaps the answer is that there is just so much variance that no single vocabulary is easily plausible. Maybe templating would somehow allow us to achieve a common pattern based on the shared JSON-LD semantics.</p>
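<p>To make the templating idea slightly more concrete, here is a hypothetical sketch of deriving product-card-style structural markup from the schema.org JSON-LD that is often already present in the page. The element names and property mapping are invented for illustration, not a proposal:</p>

```javascript
// Hypothetical sketch: derive product-card-style markup from a
// schema.org Product object expressed as JSON-LD. The element names
// and the property mapping are illustrative only.
function productCardMarkup(product) {
  const offer = product.offers || {};
  return [
    '<product-card>',
    `  <product-title>${product.name}</product-title>`,
    `  <product-price>${offer.price} ${offer.priceCurrency}</product-price>`,
    '</product-card>',
  ].join('\n');
}

const markup = productCardMarkup({
  '@type': 'Product',
  name: 'Widget',
  offers: { '@type': 'Offer', price: '9.99', priceCurrency: 'USD' },
});
console.log(markup); // four lines of <product-card> markup
```

The point is only that the structural elements and the machine-readable data could, in principle, come from a single shared source rather than being maintained in parallel.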
<h4 class="contextual-heading">Publishing & Editorial Semantics</h4>
<p>CMSes and news sites often invent tags for editorial structure, and many of these are sticking around.</p>
Content structure
<ul>
<li><code><article-header></code></li>
<li><code><article-summary></code></li>
<li><code><article-author></code></li>
<li><code><article-date></code></li>
<li><code><article-tags></code></li>
<li><code><article-tag></code></li>
<li><code><article-category></code></li>
<li><code><byline></code></li>
<li><code><dateline></code></li>
<li><code><pullquote></code></li>
<li><code><footnote></code></li>
</ul>
Taxonomy
<ul>
<li><code><tag-list></code></li>
<li><code><category-label></code></li>
<li><code><topic-header></code></li>
</ul>
<p>These reflect the needs of journalism and long‑form content.</p>
<h4 class="contextual-heading">Social & Community Semantics</h4>
<p>These show up in comment systems, forums, and social platforms.</p>
User‑generated content
<ul>
<li><code><comment></code></li>
<li><code><comment-list></code></li>
<li><code><comment-item></code></li>
<li><code><comment-author></code></li>
<li><code><comment-content></code></li>
<li><code><comment-date></code></li>
<li><code><comment-form></code></li>
</ul>
Identity
<ul>
<li><code><user-avatar></code></li>
<li><code><user-name></code></li>
<li><code><profile-card></code></li>
</ul>
<p>These encode relationships and interactions, not UI patterns.</p>
Events
<ul>
<li><code><event-date></code></li>
<li><code><event-location></code></li>
<li><code><event-schedule></code></li>
<li><code><event-details></code></li>
</ul>
<p>Again, these are domain objects, not widgets - and they have well-established schema.org or microformat vocabularies as well. </p>
Invoicing
<ul>
<li><code><invoice></code></li>
<li><code><invoice-line></code></li>
<li><code><invoice-total></code></li>
<li><code><invoice-summary></code></li>
</ul>
<p>Before the web came along, there were already national and international standards around electronically trading information like invoices - and when XML was sold, invoices were a common example. Here we are again.</p>
<h3 class="contextual-heading">"Namespaced" Elements</h3>
<p>Several elements like <code>o:p</code>, <code>rdf:rdf</code>, <code>dc:format</code>, <code>cc:work</code>, <code>fb:like</code>, and <code>g:plusone</code> appear in the top 100. These were anticipating an XHTML future (namespacing) that never really arrived. However, HTML has always allowed the syntax - the colon is simply part of the tag name. In many ways, it's just as good. Interestingly, these may be some of the better examples of what I'd like to see happen - they are widely understood.</p>
<p>Conversely, while hugely successful, the share buttons are more an indication of a desire than something we could actually standardize in precisely that way. They also point to a desire <em>in time</em>. Google Plus doesn't even exist anymore, and <code>fb:like</code> is from a time when Facebook was at the top of the most interesting places to be. Maybe one of the things we've learned is that this is way handier to do at the browser/OS levels? I suppose the Web Share API was a part of thinking about how we'd deal with this.</p>
<p>The fact that they both still appear so much is also kind of an indication of the age of the pages and the slow replacement of underlying tools.</p>
<h3 class="contextual-heading">Typos, Encoding Errors, and the Weird Stuff</h3>
<p>One of the most delightful parts of the dataset is the long tail of what are almost certainly just typos:</p>
<ul>
<li><code><prodcut-card></code></li>
<li><code><navgation></code></li>
<li><code><contianer></code></li>
</ul>
<p>The fact that these can appear on tens of thousands of sites because they are part of common tooling helps reinforce that not every non-standard element is a signal. :)</p>
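<p>Spotting likely typos programmatically is straightforward: a rare name within a small edit distance of a much more common one is almost certainly a misspelling rather than an independent invention. A minimal sketch using plain Levenshtein distance (real analysis would also need frequency data to decide which spelling is the "canonical" one):</p>

```javascript
// Levenshtein edit distance between two tag names. A distance of 1-2
// from a far more common name (e.g. <prodcut-card> vs <product-card>)
// suggests a typo rather than a deliberately different element.
function editDistance(a, b) {
  // d[i][j] = distance between the first i chars of a and first j of b.
  const d = Array.from({ length: a.length + 1 },
    (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                    // deletion
        d[i][j - 1] + 1,                                    // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),  // substitution
      );
    }
  }
  return d[a.length][b.length];
}

console.log(editDistance('prodcut-card', 'product-card')); // → 2
console.log(editDistance('navgation', 'navigation'));      // → 1
```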
<h3 class="contextual-heading">In conclusion...</h3>
<p>I wish that I could say "Ah ha - the data says very clearly that <em>these</em> are the specific things we should definitely 'just write down' now" in the way that I imagined a decade ago, but I don't think we're there yet. I guess if I had to give three things I'd like to see happen from here they'd be:</p>
<ol>
<li><p>We need lots more effort in thinking about how to study these things. I would love to see real investment in this space. This year, at last, the W3C is hiring someone to study the web. I'm not yet sure what that looks like but I look forward to trying to discuss more with them.</p>
</li>
<li><p>We need a real community effort - an <a href="https://ul.org/about/our-history/">Underwriters Labs</a> for custom elements, with participation and funding from orgs with money. We don't necessarily need "the one true tabs" as much as we need a place to find what I expect will be a very few sets of tabs as custom elements which we can trust like we trust native elements. Given a little bit of time, I have faith that this will naturally sort itself into a few 'winners'. </p>
</li>
<li><p>That community effort might also include things which won't ever have native implementations, but which lay down some kind of light semantic meaning or compound styling structure that we all begin to agree on - like product cards or breadcrumbs.</p>
</li>
</ol>
<p>A lot of this is pretty adjacent/close to the ideas behind OpenUI and it's possible some of this could just happen there. However, due mainly to limited participation, OpenUI has not really produced custom elements or worked to somehow list or grade and promote them (though we did study them quite a bit in the tabs research). The effort led by Brad Frost to think about a "global design system" in particular might be closer to some of these ideas.</p>
<div><p>Hey hey happy new year, friends! Today I was going over some V8 code
that touched <i>pre-tenuring</i>: allocating objects directly in the old
space instead of the nursery. I knew the theory here but I had never
looked into the mechanism. Today’s post is a quick overview of how it’s
done.</p><h3>allocation sites</h3><p>In a JavaScript program, there are a number of source code locations
that allocate. Statistically speaking, any given allocation is likely
to be short-lived, so generational garbage collection partitions
freshly-allocated objects into their own space. In that way, when the
system runs out of memory, it can preferentially reclaim memory from the
nursery space instead of groveling over the whole heap.</p><p>But you know what they say: there are lies, damn lies, and statistics.
Some programs are outliers, allocating objects in such a way that they
don’t die young, or at least not young enough. In those cases,
allocating into the nursery is just overhead, because minor collection
won’t reclaim much memory (because too many objects survive), and
because of useless copying as the object is scavenged within the nursery
or promoted into the old generation. It would have been better to
eagerly tenure such allocations into the old generation in the first
place. (The more I think about it, the funnier <i>pre-tenuring</i> is as a
term; what if some PhD programs could pre-allocate their graduates into
named chairs? Is going straight to industry the equivalent of dying
young? Does collaborating on a paper with a full professor imply a
write barrier? But I digress.)</p><p>Among the set of allocation sites in a program, a subset should
pre-tenure their objects. How can we know which ones? There is a
literature of static techniques, but this is JavaScript, so the answer
in general is dynamic: we should observe how many objects survive
collection, organized by allocation site, then optimize to assume that
the future will be like the past, falling back to a general path if the
assumptions fail to hold.</p><h3>my runtime doth object</h3><p>The high-level overview of how V8 implements pre-tenuring is based on
per-program-point <i>AllocationSite</i> objects, and per-allocation
<i>AllocationMemento</i> objects that point back to their corresponding
AllocationSite. Initially, V8 doesn’t know what program points would
profit from pre-tenuring, and instead allocates everything in the
nursery. Here’s a quick picture:</p><figure><img src="https://wingolog.org/pub/v8-allocation-sites.png" alt="diagram of linear allocation buffer containing interleaved objects and allocation mementos" />
<figcaption><i>A linear allocation buffer containing objects allocated with allocation mementos</i></figcaption>
</figure><p>Here we show that there are two allocation sites, <tt>Site1</tt> and <tt>Site2</tt>.
V8 is currently allocating into a linear allocation buffer (LAB) in the
nursery, and has allocated three objects. After each of these objects
is an <tt>AllocationMemento</tt>; in this example, <tt>M1</tt> and <tt>M3</tt> are
<tt>AllocationMemento</tt> objects that point to <tt>Site1</tt> and <tt>M2</tt> points to
<tt>Site2</tt>. When V8 allocates an object, it <a href="https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/factory.cc;l=343-345;drc=56535b80c32d3a618b8fc8cfd7b1afdd3862e1b2">increments the “created”
counter on the corresponding
<tt>AllocationSite</tt></a>
(if available; it’s possible an allocation comes from C++ or something
where we don’t have an <tt>AllocationSite</tt>).</p><p>When the free space in the LAB is too small for an allocation, V8 gets
another LAB, or collects if there are no more LABs in the nursery. When
V8 does a minor collection, as the scavenger visits objects, it will
<a href="https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/pretenuring-handler-inl.h;l=73-150">look to see if the object is followed by an
AllocationMemento</a>.
If so, it dereferences the memento to find the <tt>AllocationSite</tt>, then
increments its “found” counter, and adds the <tt>AllocationSite</tt> to a set.
<a href="https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/pretenuring-handler.cc;l=158">Once an AllocationSite has had 100
allocations</a>,
it is enqueued for a pre-tenuring decision; <a href="https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/pretenuring-handler.cc;l=50">sites with 85%
survival</a>
get marked for pre-tenuring.</p><p>If an allocation site is marked as needing pre-tenuring, the code in
which it is embedded will get de-optimized, and then next time it is
optimized, the code generator arranges to allocate into the old
generation instead of the default nursery.</p><p>Finally, if a major collection collects more than 90% of the old
generation, V8 <a href="https://source.chromium.org/chromium/chromium/src/+/main:v8/src/heap/heap.cc;l=3034-3054">resets all pre-tenured allocation
sites</a>,
under the assumption that pre-tenuring was actually premature.</p><h3>tenure for me but not for thee</h3><p>What kinds of allocation sites are eligible for pre-tenuring? Sometimes
it depends on object kind; wasm memories, for example, are almost always
long-lived, so they are always pre-tenured. Sometimes it depends on who
is doing the allocation; allocations from the bootstrapper, literals
allocated by the parser, and many allocations from C++ go straight to
the old generation. And sometimes the compiler has enough information
to determine that pre-tenuring might be a good idea, as when it
<a href="https://source.chromium.org/chromium/chromium/src/+/main:v8/src/maglev/maglev-graph-builder.cc;l=4719">generates a store of a fresh object to a field in an known-old
object</a>.</p><p>But otherwise I thought that the whole AllocationSite mechanism would
apply generally, to any object creation. It turns out, nope: it seems
to only apply to object literals, array literals, and <tt>new Array</tt>.
Weird, right? I guess it makes sense in that these are the ways to
create objects that also create the field values at creation-time,
allowing the whole block to be allocated to the same space. If instead
you make a pre-tenured object and then initialize it via a sequence of
stores, this would likely create old-to-new edges, preventing the new
objects from dying young while incurring the penalty of copying and
write barriers. Still, I think there is probably some juice to squeeze
here for pre-tenuring of class-style allocations, at least in the
optimizing compiler or in short inline caches.</p><p>I suspect this state of affairs is somewhat historical, as the
AllocationSite mechanism seems to have originated with <a href="https://dl.acm.org/doi/10.1145/2509136.2509531">typed array
storage strategies</a> and
V8’s “boilerplate” object literal allocators; both of these predate
per-AllocationSite pre-tenuring decisions.</p><h3>fin</h3><p>Well that’s adaptive pre-tenuring in V8! I thought the “just stick a
memento after the object” approach pleasantly simple, and if you are
only bumping creation counters from baseline compilation tiers, it
likely amortizes out to a win. But does the restricted application to
literals point to a fundamental constraint, or is it just accident? If
you have any insight, let me know :) Until then, happy hacking!</p></div> Andy Wingohttps://wingolog.org/Qiuyi Zhang (Joyee): require(esm) in Node.js: from experiment to stabilityhttps://joyeecheung.github.io/blog/2025/12/30/require-esm-in-node-js-from-experiment-to-stability/2026-01-05T10:36:13+00:00
<p>More than a year ago, I set out to revive <a target="_blank" rel="noopener" href="https://nodejs.org/docs/latest-v25.x/api/modules.html#loading-ecmascript-modules-using-require"><code>require(esm)</code> in Node.js</a> and </p> Qiuyi Zhang (Joyee)https://joyeecheung.github.io/blog/Qiuyi Zhang (Joyee): require(esm) in Node.js: implementer's taleshttps://joyeecheung.github.io/blog/2025/12/30/require-esm-in-node-js-implementers-tales/2026-01-05T10:36:11+00:00
<p>In earlier posts, I wrote about <a href="https://joyeecheung.github.io/blog/2024/03/18/require-esm-in-node-js/" title="reviving require(esm)">reviving require(esm)</a> and <a href="https://joyeecheung.github.io/blog/2025/12/30/require-esm-in-node-js-from-experiment-to-stability/" title="its iteration process">its iteration process</a></p> Qiuyi Zhang (Joyee)https://joyeecheung.github.io/blog/Jasmine Tang: Rewriting analysis based warnings fall throughhttps://badumbatish.github.io/blog/rewriting_analysisbasedwarnings2026-01-05T00:00:00+00:00
Uhhh hey :) Jasmine Tanghttps://badumbatish.github.io/blog/rss.xmlQiuyi Zhang (Joyee): require(esm) in Node.jshttps://joyeecheung.github.io/blog/2024/03/18/require-esm-in-node-js/2025-12-30T18:58:11+00:00
<p>Recently I landed experimental <a target="_blank" rel="noopener" href="https://github.com/nodejs/node/pull/51977">support for <code>require()</code>-ing synchronous ES modules in Node.js</a>, a feature that has been long overdue</p> Qiuyi Zhang (Joyee)https://joyeecheung.github.io/blog/Igalia WebKit Team: WebKit Igalia Periodical #52https://blogs.igalia.com/webkit/blog/2025/wip-52/2025-12-25T18:26:43+00:00
<p>Update on what happened in WebKit in the week from December 16 to December 25.</p>
<p>
Right during the holiday season 🎄, the last WIP installment of the year comes packed with new releases, a couple of functions added to the public API, cleanups, better timer handling, and improvements to MathML and WebXR support.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<div class="wip-item">
<p><a rel="external" href="https://github.com/WebKit/WebKit/pull/52066">Landed support for <code>font-size: math</code></a>. Now
<a rel="external" href="https://developer.mozilla.org/en-US/docs/Web/CSS/Reference/Properties/math-depth"><code>math-depth</code></a>
can automatically control the font size inside of <code><math></code> blocks, making
scripts and nested content smaller to improve readability and presentation.</p>
</div>
<div class="wip-item">
<p>Two new functions <a rel="external" href="https://commits.webkit.org/304810@main">have been added</a> to
the public API:</p>
<ul>
<li>
<p><code>webkit_context_menu_item_get_gaction_target()</code> to obtain the <code>GVariant</code>
associated with a context menu item created from a <code>GAction</code>.</p>
</li>
<li>
<p><code>webkit_context_menu_item_get_title()</code> may be used to obtain
the title of a context menu item.</p>
</li></ul></div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/304553@main">Improved timers</a>, by making some of
them use the <a rel="external" href="https://lwn.net/Articles/251413/">timerfd API</a>. This reduces
timer “lateness”—the amount of time elapsed between the configured trigger
time and the effective one—which in turn improves the perceived smoothness
of animations thanks to steadier frame delivery timings. Systems where the
<code>timerfd_create</code> and <code>timerfd_settime</code> functions are not available will
continue working as before.</p>
</div>
<div class="wip-item">
<p>On the WebXR front, support <a rel="external" href="https://commits.webkit.org/304631@main">was added</a>
for <code>XR_TRACKABLE_TYPE_DEPTH_ANDROID</code> through the <code>XR_ANDROID_trackables</code>
extension, which allows reporting depth information for elements that take part
in hit testing.</p>
</div>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p>Landed <a rel="external" href="https://github.com/WebKit/WebKit/pull/54518">a change</a> that implements
non-composited page rendering in the WPE port. This new mode is disabled by
default, and may be activated by disabling the <code>AcceleratedCompositing</code> runtime
preference. In such case, the frames are rendered using a simplified code path
that does not involve the internal WebKit compositor. Therefore it may offer
better performance in some specific cases on constrained embedded devices.
</div>
<div class="wip-item">
<p>Since version 2.10.2, the <a rel="external" href="https://freetype.org">FreeType</a> library can be built
with direct support for loading fonts in the
<a rel="external" href="https://www.w3.org/TR/WOFF2/">WOFF2</a> format. Until now, the WPE and GTK WebKit
ports used <a rel="external" href="https://github.com/google/woff2">libwoff2</a> in an intermediate step
to convert those fonts on-the-fly before handing them to FreeType for
rendering. The CMake build system will now <a rel="external" href="https://commits.webkit.org/304864@main">detect when FreeType supports WOFF2
directly</a> and skip the conversion step.
This way, in systems which provide a suitable version of FreeType, <code>libwoff2</code>
will no longer be needed.</p>
</div>
<h2 id="wpe-webkit-pager">WPE WebKit 📟</h2>
<h3 id="wpe-platform-api-jigsaw">WPE Platform API 🧩</h3>
<div class="wip-description">
<p>New, modern platform API that supersedes usage of libwpe and WPE backends.</p>
</div>
<div class="wip-item">
<p>The legacy libwpe-based API <a rel="external" href="https://commits.webkit.org/304671@main">can now be disabled at build
time</a>, by toggling the
<code>ENABLE_WPE_LEGACY_API</code> CMake option. This allows removal of unneeded code when
an application is exclusively using the new WPEPlatform API.</p>
</div>
<h3 id="wpe-android-robot">WPE Android <a rel="external" href="https://github.com/Igalia/wpe-android">↗</a> 🤖</h3>
<div class="wip-description">
<p>Adaptation of WPE WebKit targeting the Android operating system.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://developer.android.com/ndk/reference/group/a-hardware-buffer">AHardwareBuffer</a>
is <a rel="external" href="https://commits.webkit.org/304567@main">now supported</a> as backing for
accelerated graphics surfaces that can be shared across processes. This is the
last piece of the puzzle to use WPEPlatform on Android without involving
expensive operations to copy rendered frames back-and-forth between GPU and
system memory.</p>
</div>
<h2 id="releases-package">Releases 📦️</h2>
<div class="wip-item">
<p><a rel="external" href="https://webkitgtk.org/2025/12/16/webkitgtk2.50.4-released.html">WebKitGTK
2.50.4</a> and
<a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.50.4.html">WPE WebKit 2.50.4</a> have
been released. These stable releases include a number of important patches for
security issues, and we urge users and distributors to update to this release
if they have not yet done so. An accompanying security advisory,
<code>WSA-2025-0010</code>, has been published
(<a rel="external" href="https://webkitgtk.org/security/WSA-2025-0010.html">GTK</a>,
<a rel="external" href="https://wpewebkit.org/security/WSA-2025-0010.html">WPE</a>).</p>
<p>Development releases of <a rel="external" href="https://webkitgtk.org/2025/12/19/webkitgtk2.51.4-released.html">WebKitGTK
2.51.4</a> and
<a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.51.4.html">WPE WebKit 2.51.4</a> are
available as well, and may be used to preview upcoming features. As usual, bug
reports are <a rel="external" href="https://bugs.webkit.org">welcome in Bugzilla</a>.</p>
</div>
<h2 id="community-events-handshake">Community & Events 🤝</h2>
<div class="wip-item">
<p>Paweł Lampe has published a <a rel="external" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">blog
post</a>
that discusses various pre-rendering techniques useful in the context of using
WPE on embedded devices.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div> Igalia WebKit Teamhttps://blogs.igalia.com/webkitEric Meyer: Targeting by Reference in the Shadow DOMhttps://meyerweb.com/eric/thoughts/?p=57232025-12-19T15:04:58+00:00
<p>I’ve long made it clear that I don’t particularly care for the whole Shadow <abbr title="Document Object Model">DOM</abbr> thing. I believe I understand the problems it tries to solve, and I fully acknowledge that those are problems worth solving. There are just a bunch of things about it that don’t feel right to me, like how it can break accessibility in a number of ways.</p>
<p>One of those things is how it breaks stuff like the <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/button#commandfor"> <code>commandFor</code></a> attribute on <code><button></code>s, or the <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/button#popovertarget"> <code>popoverTarget</code></a> attribute, or a variety of <abbr title="Accessible Rich Internet Applications">ARIA</abbr> attributes such as <a href="https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Reference/Attributes/aria-labelledby"><code>aria-labelledby</code></a>. This happens because a Shadow DOMmed component creates a whole separate node tree, which creates a barrier (for a lot of things, to be clear; this is just one class of them).</p>
<p>At least, that’s <em>been</em> the case. There’s now <a href="https://github.com/WICG/webcomponents/blob/gh-pages/proposals/reference-target-explainer.md">a proposal to fix that</a>, and prototype implementations in both Chrome and Safari! In Chrome, it’s covered by the <strong>Experimental Web Platform features</strong> flag in <code>chrome://flags</code>. In Safari, you open the <strong>Develop > Feature Flags…</strong> dialog, search for “referenceTarget”, and enable both flags.</p>
<p>(Disclosure: My employer, <a href="https://igalia.com/">Igalia</a>, with support from <a href="https://nlnet.nl">NLnet</a>, did the WebKit implementation, and also a Gecko implementation that’s being reviewed as I write this.)</p>
<p>If you’re familiar with Shadow DOMming, you know that there are attributes for the <code><template></code> element like <code>shadowRootClonable</code> that set how the Shadow DOM for that particular component can be used. The proposal at hand is for a <code>shadowRootReferenceTarget</code> attribute, which is a string used to identify an element within the Shadowed DOM tree that should be the actual target of any references. This is backed by a <code>ShadowRoot.referenceTarget</code> API feature.</p>
<p>Take this simple setup as a quick example.</p>
<pre class="html"> <code><label for="consent">I agree to join your marketing email list for some reason</label>
<sp-checkbox id="consent">
<template>
<input id="setting" type="checkbox" aria-checked="indeterminate">
<span id="box"></span>
</template> </sp-checkbox></code> </pre>
<p>Assume there’s some JavaScript to make that stuff inside the Shadow DOM work as intended. (No, nothing this simple should really be a web component, but let’s assume that someone has created a whole multi-faceted component system for handling rich user interactions or whatever, and someone else has to use it for job-related reasons, and this is one small use of that system.) </p>
<p>The problem is, the <code><label></code> element’s <code>for</code> is pointing at <code>consent</code>, which is the ID of the component. The actual thing that should be targeted is the <code><input></code> element with the ID of <code>setting</code> . We can’t just change the markup to <code><label for="setting"></code> because that <code><input></code> is trapped in the Shadow tree, where none in the Light beyond may call for it. So it just plain old doesn’t work.</p>
<p>Under the Reference Target proposal, one way to fix this would look something like this in HTML:</p>
<pre class="html"> <code><label for="consent">I agree to join your marketing email list for some reason</label>
<sp-checkbox id="consent">
<template shadowRootReferenceTarget="setting">
<input id="setting" type="checkbox" aria-checked="indeterminate">
<span id="box"></span>
</template> </sp-checkbox></code> </pre>
<p>With this markup in place, if someone clicks/taps/otherwise activates the label, it points to the ID <code>consent</code> . That Shadowed component takes that reference and redirects it to an <em>effective target</em>  —  the reference target identified in its <code>shadowRootReferenceTarget</code> attribute.</p>
<p>You could also set up the reference with JavaScript instead of an HTML template:</p>
<pre class="html"> <code><label for="consent">I agree to join your marketing email list for some reason</label>
<sp-checkbox id="consent"></sp-checkbox></code> </pre>
<pre class="js"><code>class SpecialCheckbox extends HTMLElement {
checked = "mixed";
constructor() {
super();
this.shadowRoot_ = this.attachShadow({
referenceTarget: "setting"
});
// lines of code to Make It Go
}
}</code> </pre>
<p>Either way, the effective target is the <code><input></code> with the ID of <code>setting</code> .</p>
<p>This can be used in any situation where one element targets another, not just with <code>for</code> . The <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/input#form"> <code>form</code></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/input#list"> <code>list</code></a> attributes on inputs would benefit from this. So, too, would the relatively new <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/button#popovertarget"> <code>popoverTarget</code> </a>and <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/button#commandfor"> <code>commandFor</code></a> button attributes. And all of the <a href="https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Reference/Attributes#relationship_attributes">ARIA targeting attributes</a>, like <code>aria-controls</code> and <code>aria-errormessage</code> and <code>aria-owns</code> as well.</p>
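<p>As a sketch of how this could apply to <code>popovertarget</code>: the markup below is hypothetical, assuming an <code><sp-menu></code> component (not from the proposal) whose reference target is an inner popover element. The button targets the component’s ID, and the component forwards the reference to <code>panel</code>.</p>

```html
<button popovertarget="menu">Open the menu</button>
<sp-menu id="menu">
  <template shadowRootReferenceTarget="panel">
    <div id="panel" popover>…menu items…</div>
  </template>
</sp-menu>
```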
<p>If reference targets are something you think would be useful in your own work, <strong>please</strong> give it a try in Chrome or Safari or both, to see if your use cases are being met. If not, you can <a href="https://github.com/WICG/webcomponents/issues/1120"> leave feedback on issue 1120</a> to share any problems you run into. If we’re going to have a Shadow DOM, the least we can do is make it as accessible and useful as possible.</p>
<hr /><p>Have something to say to all that? You can <a href="https://meyerweb.com/eric/thoughts/2025/12/19/targeting-by-reference-in-the-shadow-dom/#commentform">add a comment to the post</a>, or <a href="mailto:[email protected]?subject=In%20reply%20to%20%22Targeting%20by%20Reference%20in%20the%20Shadow%20DOM%22">email Eric directly</a>.</p> Eric Meyerhttps://meyerweb.com/eric/thoughtsPawel Lampe: WPE performance considerations: pre-renderinghttps://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/2025-12-19T00:00:00+00:00
<p>This article is a continuation of the series on <strong>WPE performance considerations</strong>. While the <a href="https://blogs.igalia.com/plampe/wpe-performance-considerations-dom-tree/">previous article</a> touched upon fairly low-level aspects of the DOM tree overhead,
this one focuses on more high-level problems related to managing the application’s workload over time. Similarly to before, the considerations and conclusions made in this blog post are strongly related to web applications
in the context of embedded devices, and hence the techniques presented should be used with extra care (and benchmarking) if one would like to apply those on desktop-class devices.</p>
<h2 id="the-workload" tabindex="-1">The workload <a class="header-anchor" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">#</a></h2>
<p>Typical web applications on embedded devices have their workloads distributed over time in various ways. In practice, however, the workload distributions can usually be fitted into one of the following categories:</p>
<ol>
<li><strong>Idle applications with occasional updates</strong> - the applications that present static content and are updated at very low intervals. As an example, one can think of a dashboard that presents static content and switches
the page every, say, 60 seconds - such as a departures/arrivals board at an airport.</li>
<li><strong>Idle applications with frequent updates</strong> - the applications that present static content yet are updated frequently (or are presenting some dynamic content, such as animations occasionally). In that case, one can imagine a similar
airport departures/arrivals dashboard, yet with the animated page scrolling happening quite frequently.</li>
<li><strong>Active applications with occasional updates</strong> - the applications that present some dynamic content (animations, multimedia, etc.), yet with major updates happening very rarely. An example one can think of in this case is an application
playing video along with presenting some metadata about it, and switching between other videos every few minutes.</li>
<li><strong>Active applications with frequent updates</strong> - the applications that present some dynamic content and change the surroundings quite often. In this case, one can think of a stock market dashboard continuously animating the charts
and updating the presented real-time statistics very frequently.</li>
</ol>
<p>Such workloads can be well demonstrated on charts plotting the browser’s CPU usage over time:</p>
<center>
<img alt="Typical web application workloads." src="https://blogs.igalia.com/plampe/img/obgL44nHKc-1385.png" width="1385" height="360" />
</center>
<p>As long as the peak workload (due to updates) is small, no negative effects are perceived by the end user. However, when the peak workload is significant, some negative effects may start getting noticeable.</p>
<p>In case of applications from groups (1) and (2) mentioned above, a significant peak workload may not be a problem at all. As long as there are no continuous visual changes and no interaction is allowed during updates, the end-user
is unable to notice that the browser was not responsive or missed some frames for some period of time. In such cases, the application designer does not need to worry much about the workload.</p>
<p>In other cases, especially the ones involving applications from groups (3) and (4) mentioned above, the significant peak workload may lead to visual stuttering, as any processing making the browser busy for longer than 16.6 milliseconds
will lead to lost frames. In such cases, the workload has to be managed in a way that the peaks are reduced either by optimizing them or distributing them over time.</p>
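<p>To make the idea of distributing a peak concrete, here is an illustrative sketch (the helper names are ours, not a standard API): the pending updates are split into per-frame batches, and one batch runs per animation frame instead of all of them at once.</p>

```javascript
// Split `updates` into batches, one batch per available idle frame,
// so that no single frame has to absorb the whole peak workload.
function planBatches(updates, idleFrames) {
  const perFrame = Math.ceil(updates.length / Math.max(1, idleFrames));
  const batches = [];
  for (let i = 0; i < updates.length; i += perFrame) {
    batches.push(updates.slice(i, i + perFrame));
  }
  return batches;
}

// In a browser, one batch of update thunks would run per frame:
function runSpread(updateThunks, idleFrames) {
  const pending = planBatches(updateThunks, idleFrames);
  (function step() {
    const batch = pending.shift();
    if (!batch) return;
    batch.forEach((fn) => fn());
    if (typeof requestAnimationFrame === "function") {
      requestAnimationFrame(step);
    } else {
      setTimeout(step, 16); // rough 60 FPS fallback outside the browser
    }
  })();
}

// 400 cell updates spread over 59 idle frames: at most 7 per frame.
const batches = planBatches(Array.from({ length: 400 }, (_, i) => i), 59);
console.log(batches.length, batches[0].length); // → 58 7
```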
<h4 id="first-step-optimization" tabindex="-1">First step: optimization <a class="header-anchor" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">#</a></h4>
<p>The first step to addressing the peak workload is usually optimization. Modern web platform gives a full variety of tools to optimize all the stages of web application processing done by the browser. The usual process of optimization is a
2-step cycle starting with measuring the bottlenecks and followed by fixing them. In the process, the usual improvements involve:</p>
<ul>
<li>using CSS containment,</li>
<li>using shadow DOM,</li>
<li>promoting certain parts of the DOM to layers and manipulating them with transforms,</li>
<li>parallelizing the work with workers/worklets,</li>
<li>using the <code>visibility</code> CSS property to separate painting from layout,</li>
<li>optimizing the application itself (JavaScript code, the structure of the DOM, the architecture of the application),</li>
<li>etc.</li>
</ul>
<h4 id="second-step-pre-rendering" tabindex="-1">Second step: pre-rendering <a class="header-anchor" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">#</a></h4>
<p>Unfortunately, in practice, it’s not uncommon that even very well optimized applications still have too much of a peak workload for the constrained embedded devices they’re used on. In such cases, the last resort solution is
<strong>pre-rendering</strong>. As long as it’s possible from the application business-logic perspective, having at least some web page content pre-rendered is very helpful in situations when workload has to be managed, as <strong>pre-rendering</strong>
allows the web application designer to choose the precise moment when the content should actually be rendered and how it should be done. With that, it’s possible to establish a proper trade-off between reduction in peak workload and
the amount of extra memory used for storing the pre-rendered contents.</p>
<h2 id="pre-rendering-techniques" tabindex="-1">Pre-rendering techniques <a class="header-anchor" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">#</a></h2>
<p>Nowadays, the web platform provides at least a few widely adopted APIs that give the application means to perform various kinds of pre-rendering. Also, due to the ways the browsers are implemented, some APIs can be purposely misused
to provide pre-rendering techniques not necessarily supported by the specification. However, in the pursuit of good trade-offs, all the possibilities should be taken into account.</p>
<p>Before jumping into particular pre-rendering techniques, it’s necessary to emphasize that the <strong>pre-rendering</strong> term used in this article refers to the actual rendering being done earlier than it’s visually presented. In that
sense, the resource is rasterized to some intermediate form when desired and then just composited by the browser engine’s compositor later.</p>
<h4 id="pre-rendering-offline" tabindex="-1">Pre-rendering offline <a class="header-anchor" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">#</a></h4>
<p>The most basic (and limited at the same time) pre-rendering technique is one that involves rendering offline, i.e. before the browser even starts. In that case, the first limitation is that the content to be rendered must be known
beforehand. If that’s the case, the rendering can be done in any way, and the result may be captured as e.g. a raster or vector image (depending on the desired trade-off). However, the other problem is that such rendering is usually out of
the given web application scope and thus requires extra effort. Moreover, depending on the amount of extra memory used, the longer web application startup (due to loading the pre-rendered resources), and the processing
power required to composite a given resource, it may not always be trivial to obtain the desired gains.</p>
<h4 id="pre-rendering-using-canvas" tabindex="-1">Pre-rendering using canvas <a class="header-anchor" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">#</a></h4>
<p>The first group of actual pre-rendering techniques happening during web application runtime is related to <a href="https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API">Canvas</a> and
<a href="https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas">OffscreenCanvas</a>. Those APIs are really useful as they offer great flexibility in terms of usage and are usually very performant.
However, in this case, the natural downside is the lack of support for rendering the DOM inside the canvas. Moreover, canvas has very limited support for painting text — unlike the DOM, where
CSS has a significant number of features related to it. Interestingly, there’s an ongoing proposal called <a href="https://github.com/WICG/html-in-canvas">HTML-in-Canvas</a> that could resolve those limitations
to some degree. In fact, Blink has a functioning prototype of it already. However, it may take a while before the spec is mature and widely adopted by other browser engines.</p>
<p>When it comes to actual usage of canvas APIs for pre-rendering, the possibilities are numerous, and there are even more of them when combined with processing using <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API">workers</a>.
The most popular ones are as follows:</p>
<ul>
<li>rendering to an invisible canvas and showing it later,</li>
<li>rendering to a canvas detached from the DOM and attaching it later,</li>
<li>rendering to an invisible/detached canvas and producing an image out of it to be shown later,</li>
<li>rendering to an offscreen canvas and producing an image out of it to be shown later.</li>
</ul>
<p>When combined with workers, some of the above techniques may be used in the worker threads with the rendered artifacts transferred to the main thread for presentation purposes. In that case, one must be careful with
the transfer itself, as some objects may get serialized, which is very costly. To avoid that, it’s recommended to use <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Transferable_objects">transferable objects</a>
and always benchmark properly to make sure the transfer does not involve serialization in the particular case.</p>
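<p>The difference between a copy and a transfer is easy to observe with <code>structuredClone</code>, which follows the same semantics as <code>postMessage</code>: a buffer named in the transfer list is detached from the sender instead of being serialized. A small sketch (the byte array is just a stand-in for rendered pixel data):</p>

```javascript
// A stand-in for a rendered frame: 256 KiB of pixel data.
const pixels = new Uint8Array(256 * 1024);

// Plain clone: the source stays usable, but every byte is duplicated.
const copy = structuredClone(pixels);

// Transfer: ownership of the underlying ArrayBuffer moves to the clone;
// the source is detached (its byteLength drops to 0) and nothing is copied.
const moved = structuredClone(pixels, { transfer: [pixels.buffer] });

console.log(copy.byteLength, moved.byteLength, pixels.byteLength);
// → 262144 262144 0
```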
<p>While the use of canvas APIs is usually very straightforward, one must be aware of two extra caveats.</p>
<p>First of all, in the case of many techniques mentioned above, there is no guarantee that the browser will perform actual rasterization at the given point in time. To ensure the rasterization is triggered, it’s usually
necessary to enforce it using e.g. a dummy readback (<code>getImageData()</code>).</p>
<p>Finally, one should be aware that the usage of canvas comes with some overhead. Therefore, creating many canvases or creating them often may lead to performance problems that could outweigh the gains from the
pre-rendering itself.</p>
<h4 id="pre-rendering-using-eventually-invisible-layers" tabindex="-1">Pre-rendering using eventually-invisible layers <a class="header-anchor" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">#</a></h4>
<p>The second group of pre-rendering techniques happening during web application runtime is limited to DOM rendering and comes out of a combination of purposeful spec misuse and tricking the browser engine into
rasterizing on demand. As one can imagine, this group of techniques is very much browser-engine-specific. Therefore, it should always be backed by proper benchmarking of all the use cases on the target browsers and target hardware.</p>
<p>In principle, all the techniques of this kind consist of 3 parts:</p>
<ol>
<li>Ensuring that the content to be pre-rendered is placed on a separate layer backed by an actual buffer internally in the browser,</li>
<li>Tricking the browser’s compositor into thinking that the layer needs to be rasterized right away,</li>
<li>Ensuring that the layer ends up invisible once composited.</li>
</ol>
<p>When all the elements are combined together, the browser engine will allocate an internal buffer (e.g. texture) to back the given DOM fragment, it will process that fragment (style recalc, layout), and rasterize it right away. It will do so
as it will not have enough information to allow delaying the rasterization of the layer (as e.g. in case of <code>display: none</code>). Then, when the compositing time comes, the layer will turn out to be invisible in practice
due to e.g. being occluded, clipped, etc. This way, the rasterization will happen right away, but the results will remain invisible until a later time when the layer is made visible.</p>
<p>In practice, the following approaches can be used to trigger the above behavior:</p>
<ul>
<li>for <strong>(1)</strong>, the CSS properties such as <code>will-change: transform</code>, <code>z-index</code>, <code>position: fixed</code>, <code>overflow: hidden</code> etc. can be used depending on the browser engine,</li>
<li>for <strong>(2)</strong> and <strong>(3)</strong>, the CSS properties such as <code>opacity: 0</code>, <code>overflow: hidden</code>, <code>contain: strict</code> etc. can be utilized, again, depending on the browser engine.</li>
</ul>
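<p>As an illustrative combination only (the property choices are engine-specific guesses that must be benchmarked, not a recipe from any spec), a pre-render container could look like:</p>

```css
/* Hypothetical pre-render container. */
.prerender {
  will-change: transform; /* (1) promote to a separate, buffer-backed layer */
  contain: strict;        /* isolate layout/paint and clip the contents */
  opacity: 0;             /* (3) keep the composited result invisible */
}
.prerender.ready {
  opacity: 1;             /* later: reveal the already-rasterized layer */
}
```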
<h5>The scrolling trick</h5>
<p>While the above CSS properties allow for various combinations, in case of WPE WebKit in the context of embedded devices (tested on <strong>NXP i.MX8M Plus</strong>), the combination that has proven to yield the best performance benefits turns
out to be a simple approach involving <code>overflow: hidden</code> and scrolling. The example of such an approach is explained below.</p>
<p>Suppose the goal of the application is to update a big table with numbers once every N frames — like in the following demo:
<a href="https://scony.github.io/web-examples/text/random-numbers-bursting-in-table.html?cs=20&rs=20&if=59">random-numbers-bursting-in-table.html?cs=20&rs=20&if=59</a></p>
<center>
<img alt="Bursting numbers demo." src="https://blogs.igalia.com/plampe/img/0PwhZK5P7T-654.png" width="654" height="653" />
</center>
<p>With the number of idle frames (<code>if</code>) set to 59, the idea is that the application does nothing significant for the 59 frames, and then every 60th frame it updates all the numbers in the table.</p>
<p>As one can imagine, on constrained embedded devices, such an approach leads to a very heavy workload during every 60th frame and hence to lost frames and unstable application’s FPS.</p>
<p>As long as the numbers are available earlier than every 60th frame, the above application is a perfect example where pre-rendering could be used to reduce the peak workload.</p>
<p>To simulate that, the 3 variants of the approach involving the <strong>scrolling trick</strong> were prepared for comparison with the above:</p>
<ul>
<li><a href="https://scony.github.io/web-examples/text/random-numbers-bursting-in-table-prerendered-1.html?cs=20&rs=20&if=59">random-numbers-bursting-in-table-prerendered-1.html?cs=20&rs=20&if=59</a></li>
<li><a href="https://scony.github.io/web-examples/text/random-numbers-bursting-in-table-prerendered-2.html?cs=20&rs=20&if=59">random-numbers-bursting-in-table-prerendered-2.html?cs=20&rs=20&if=59</a></li>
<li><a href="https://scony.github.io/web-examples/text/random-numbers-bursting-in-table-prerendered-3.html?cs=20&rs=20&if=59">random-numbers-bursting-in-table-prerendered-3.html?cs=20&rs=20&if=59</a></li>
</ul>
<p>In the above demos, the idea is that each cell with a number becomes a scrollable container with 2 numbers actually — one above the other. In that case, because <code>overflow: hidden</code> is set, only one of the numbers is visible while the
other is hidden — depending on the current scrolling:</p>
<center>
<img alt="Scrolling trick explained." src="https://blogs.igalia.com/plampe/img/qFqjTXuuSo-611.png" width="611" height="348" />
</center>
<p>With such a setup, it’s possible to update the invisible numbers during <strong>idle</strong> frames without the user noticing. Due to how WPE WebKit accelerates the scrolling, changing the invisible
numbers, in practice, triggers the layout and rendering right away. Moreover, the actual rasterization to the buffer backing the scrollable container happens immediately (depending on the tiling settings), and hence the high cost of layout
and text rasterization can be distributed. When the time comes, and all the numbers need to be updated, the scrollable containers can be just scrolled, which in that case turns out to be ~2 times faster than updating all the numbers in place.</p>
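<p>A minimal sketch of one such cell (the class names and sizes are ours): the outer container clips to one row and holds two stacked values, so swapping them is just a <code>scrollTop</code> change.</p>

```html
<div class="cell" style="height: 1.5em; overflow: hidden;">
  <div class="value">42</div>  <!-- currently visible -->
  <div class="value">17</div>  <!-- pre-rendered, hidden below the clip -->
</div>
<script>
  // During idle frames: rewrite the hidden value. When the update is due,
  // reveal it by scrolling one row height instead of re-laying-out the
  // whole table in a single frame.
  function swapCell(cell) {
    const rowHeight = cell.firstElementChild.offsetHeight;
    cell.scrollTop = cell.scrollTop === 0 ? rowHeight : 0;
  }
</script>
```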
<p>To better understand the above effect, it’s recommended to compare the mark views from sysprof traces of the
<a href="https://scony.github.io/web-examples/text/random-numbers-bursting-in-table.html?cs=10&rs=10&if=11">random-numbers-bursting-in-table.html?cs=10&rs=10&if=11</a> and
<a href="https://scony.github.io/web-examples/text/random-numbers-bursting-in-table-prerendered-1.html?cs=10&rs=10&if=11">random-numbers-bursting-in-table-prerendered-1.html?cs=10&rs=10&if=11</a> demos:</p>
<center>
<img alt="Sysprof from basic demo." src="https://blogs.igalia.com/plampe/img/NVtyG7e_K1-2363.png" width="2363" height="1169" />
</center>
<br /><br /><br />
<center>
<img alt="Sysprof from pre-rendering demo." src="https://blogs.igalia.com/plampe/img/6du_zbm-hI-2363.png" width="2363" height="1172" />
</center>
<p>While the first sysprof trace shows very little processing during 11 idle frames and a big chunk of processing (21 ms) every 12th frame, the second sysprof trace shows how the distribution of load looks. In
that case, the amount of work during 11 idle frames is much bigger (yet manageable), but at the same time, the formerly big chunk of processing every 12th frame is reduced almost 2 times (to 11 ms). Therefore, the overall
frame rate in the application is much better.</p>
<h5>Results</h5>
<p>Despite the above improvement speaking for itself, it’s worth summarizing the improvement with the benchmarking results of the above demos obtained from the <strong>NXP i.MX8M Plus</strong> and presenting the application’s average
frames per second (FPS):</p>
<center>
<img alt="Benchmarking results." src="https://blogs.igalia.com/plampe/img/YqNYgMaEpQ-1104.png" width="1104" height="204" />
</center>
<p>Clearly, the positive impact of pre-rendering can be substantial depending on the conditions. In practice, when the rendered DOM fragment is more complex, a trick such as the above can yield even better results.
However, due to how tiling works, the effect can be diminished if the content to be pre-rendered spans multiple tiles. In that case, the browser may defer rasterization until the tiles are actually needed. Therefore,
the above needs to be used with care and always with proper benchmarking.</p>
<h2 id="conclusions" tabindex="-1">Conclusions <a class="header-anchor" href="https://blogs.igalia.com/plampe/wpe-performance-considerations-pre-rendering/">#</a></h2>
<p>As demonstrated in the above sections, when it comes to pre-rendering the contents to distribute the web application workload over time, the web platform gives both the official APIs to do it, as well as unofficial
means through purposeful misuse of APIs and exploitation of browser engine implementations. While this article hasn’t covered all the possibilities available, the above should serve as a good initial read with some easy-to-try
solutions that may yield surprisingly good results. However, as some of the ideas mentioned above are very much browser-engine-specific, they should be used with extra care and with the limitations (lack of portability)
in mind.</p>
<p>As the web platform constantly evolves, the pool of pre-rendering techniques and tricks should keep evolving as well. Also, as more and more web applications are used on embedded devices, more pressure should be
put on the specification, which should yield more APIs targeting the low-end devices in the future. With that in mind, it’s recommended for the readers to stay up-to-date with the latest specification and
perhaps even to get involved if some interesting use cases would be worth introducing new APIs.</p> Pawel Lampehttps://blogs.igalia.com/plampe/Delan Azabani: Web engine CI on a shoestring budgethttps://www.azabani.com/2025/12/18/shoestring-web-engine-ci2025-12-18T10:00:00+00:00
<p>Servo is a greenfield web browser engine that supports many platforms. Automated testing for the project requires building Servo for all of those platforms, plus several additional configurations, and running nearly two million tests including the entire Web Platform Tests. How do we do all of that in under half an hour, <em>without</em> a hyperscaler budget for compute and an entire team to keep it all running smoothly, and securely enough to run untrusted code from contributors?
<p>We’ve answered these questions by building a CI runner orchestration system for GitHub Actions that we can run on our own servers, using ephemeral virtual machines for security and reproducibility. We also discuss how we implemented graceful fallback from self-hosted runners to GitHub-hosted runners, the lessons we learned in automating image rebuilds, and how we could port the system to other CI platforms like Forgejo Actions.
<p>This is a transcript post for a talk I gave internally at Igalia.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0033.jpg" alt="Web engine CI on a shoestring budget
delan azabani (she/her)
azabani.com
November 2025" />
</div>
<div class="_spacer"></div>
<p>Let's talk about how Servo can have fast CI for not a lot of money.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0034.jpg" alt="Servo's situation
- Servo currently uses GitHub Actions (GHA) quite heavily
- Many platforms: Linux, Windows, macOS, Android, and OpenHarmony
- Many tests: Web Platform Tests (50K+ tests, 1.8M+ subtests), WebGPU CTS, devtools, unit tests…
- Many configurations: MSRV, libservo, linting, release, benchmarking…
- GHA is a frustrating CI service with baffling limitations" />
</div>
<div class="_spacer"></div>
<p>Servo is a greenfield web browser engine. And being a web browser engine, it has some pretty demanding requirements for its CI setup.
<p>We have to build and run Servo for many platforms, including three desktop platforms and two mobile platforms.
<p>We have to run many, many tests, the main bulk of which is the entire Web Platform Tests suite, which is almost 2 million subtests. We also have several smaller test suites as well, like the WebGPU tests and the DevTools tests and so on.
<p>We have to build Servo in many different configurations for special needs. So we might build Servo with the oldest version of Rust that we still support, just to make sure that still works. We might build Servo as a library in the same way that it would be consumed by embedders. We have to lint the codebase. We have to build with optimizations for nightly and monthly releases. We have to build with other optimizations for benchmarking work, and so on.
<p>And as you'll see throughout this talk, we do this on GitHub and GitHub Actions, but GitHub Actions is a <em>very</em> frustrating CI service, and it has many baffling limitations. And as time goes on, I think Servo being on GitHub and GitHub Actions will be more for the network effects we had early on than for any particular merits of these platforms.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0035.jpg" alt="GitHub-hosted runners
- GitHub provides their own runners
- Essential for glue logic and small workloads
- Painful for building and testing a browser engine
- The gratis runners are tiny and slow
- The paid runners are very expensive
- Need more tools or deps? Install them every time
- Caching is a joke, not suitable for incremental builds" />
</div>
<div class="_spacer"></div>
<p>On GitHub Actions, GitHub provides their own first-party runners.
<p>And these runners are very useful for small workloads, as well as the logic that coordinates workloads. So this would include things like taking a tryjob request for "linux", and turning that into a run that just builds Servo for Linux. Or you might get a try job request for "full", and we'll turn that into a run that builds Servo for all of the platforms and runs all of the tests.
<p>But for a project of our scale, these runners really fall apart once you get into any workload beyond that.
<p>They have runners that are free of charge. They are very tiny, resource constrained, and it seems GitHub tries to cram as many of these runners onto each server as they possibly can.
<p>They have runners that you can pay for, but they're very very expensive. And I believe this is because you not only pay a premium for like hyperscaler cloud compute rates, but you also pay a premium on top of that for the convenience of having these runners where you can essentially just flip a switch and get faster builds. So they really make you pay for this.
<p>Not only that, but using GitHub hosted runners you can't really customize the image that runs on the runners besides being able to pull in containers, which is also kind of slow and not useful all the time. If there are tools and dependencies that you need that aren't in those images, you need to install them every single time you run a job or a workflow, which is such a huge waste of time, energy, and money, no matter whose money it is.
<p>There are also some caching features on GitHub Actions, but they're honestly kind of a joke. The caching performs really poorly, and there's not a lot you can cache with them. So in general, they're not really suitable for doing things like incremental builds.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0036.jpg" alt="Alternatives considered
- SaaS providers: expensive, often no Win and/or macOS
- RunsOn: less expensive, AWS-only, no macOS support
- " />
</div>
<div class="_spacer"></div>
<p>So we have all these slow builds, and they're getting slower, and we want to make them faster. So we considered several alternatives.
<p>The first that comes to mind are these third-party runner providers. These are things like Namespace, Warpbuild, Buildjet. There's so many of them. The caveat with these is that they're almost always... almost as expensive per hour as GitHub's first-party runners. And I think this is because they try to focus on providing features like better caching, that allow you to accumulate less hours on their runners. And in doing so, they don't really have any incentive to also compete on the hourly rate.
<p>There is one exception: there's a system called RunsOn. It's sort of an off-the-shelf thing that you can grab this software and operate it yourself, but you do have to use AWS. So it's very tightly coupled to AWS. And both of these alternatives, they often lack support for certain platforms on their runners. Many of them are missing Windows or missing macOS or missing both. And RunsOn is missing macOS support, and probably won't get macOS support for the foreseeable future.
<p>We considered offloading some of the work that these free GitHub hosted runners do onto our own servers with, let's call them like, "proxy jobs". The idea would be that you'd still use free GitHub hosted runners, but you'd do the work remotely on another server. The trouble with this is that then you're still using these free GitHub hosted runners, which take up your quota of your maximum concurrent free runners.
<p>And it's also tricky to do this without losing access to the broader ecosystem of prebuilt Actions. So these Actions are like steps that you can pull in that will let you do useful things like managing artifacts and installing dependencies and so on. But it's one thing to say, okay, my workload is this shell script, and I'm going to run it remotely now. That's easy enough to do, but it's a lot harder to say, okay, well, I've got this workflow that has a bunch of scripts and a bunch of external Actions made by other people, and I'm going to run all of this remotely. I'm going to make sure that all of these Actions are also compatible with being run remotely. That's a lot trickier. That said, you should probably avoid relying too heavily on this ecosystem anyway. It's a great way to get locked into the platform, and working with YAML honestly really sucks.
<p>So we could set up an entire CI service. There are some CI services like Jenkins, Bamboo... honestly, most CI services nowadays have support for built-in container orchestration. So they can spin up containers for each runner, for each job, autonomously. And while they can do containers, none of them have really solved the problem of virtual machine orchestration out of the box. And this is a problem for us, because we want to use virtual machines for that security and peace of mind, which I'll explain in more detail a little bit later. And seeing as we lack the dedicated personnel to operate an entire CI service and have that be on the critical path — we didn't want to have someone on call for outages — this was probably not the best option for us at the moment.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0037.jpg" alt="Self-hosted runners
- Self-hosted runners are better!
- Give the runners as much RAM and CPU as we want
- Custom build environments tailored to the project
- Bake in whatever tools we want
- Bake in a prebuilt Servo for incremental builds" />
</div>
<div class="_spacer"></div>
<p>What we decided to do was set up some self-hosted runners.
<p>These solve most of our problems, primarily because we can throw as much hardware and resources at these runners as we want. We can define the contention ratios.
<p>And better yet, we can also customize the images that the runners use. By customizing the runner images, we can take a lot of work that used to be done on every single workflow run and do it only once, per image build. We only rebuild the runner images, say, once a week, which significantly cuts down on the amount of work each job has to do.
<p>This is not just installing tools and dependencies, but it's also enabling incremental builds quite powerfully, which we'll see a little bit later on.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0038.jpg" alt="How much faster?
- mach try full workflow: 61m30s → 25m47s (−58%)
- linux-unit-tests job: 34m29s → 3m15s (−90%)
- windows-unit-tests job: 59m14s → 8m4s (−86%)
- lint job: 11m54s → 2m25s (−79%)
- wpt jobs: 25m35s → 20m50s (−18%)
- But we also went from 20 runners → 3 runners" />
</div>
<div class="_spacer"></div>
<p>How much faster is this system with the self-hosted runners? Well, it turns out, quite a lot faster.
<p>We've got this typical workflow that you use when you're testing your commits, when you're making a pull request. And if you kick off a tryjob like this, several of the jobs that we've offloaded onto self-hosted runners are now taking 70, 80, even 90% less time than they did on GitHub hosted runners, which is excellent.
<p>And even for the web platform tests, we found more modest time savings with the capacity we've allocated to them. But even though those savings are more modest, what's worth highlighting here is that we went from twenty runners — we had to parallelise this test suite across twenty runners before — down to only three.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0039.jpg" alt="What makes our system unique
- Augments the built-in CI service of the forge
- Almost transparent user experience
- Linux, Windows, and macOS runners
- Graceful fallback to GitHub-hosted runners
- Secure enough for a large public project
- Completely self-hosted, so it's dirt cheap^
^ operating expenses, not necessarily labour" />
</div>
<div class="_spacer"></div>
<p>Some things that make our system unique and that we're pretty proud of — and these things apply to all versions of our system, which we've been developing over the last 12 to 18 months.
<p>The first is that we build on top of the native CI service of the Git forge. So in this case, it's GitHub and GitHub Actions. It could be Forgejo and Forgejo Actions in the future, and we're working on that already.
<p>We also want to give users more or less a transparent user experience, the idea being that users should not notice any changes in their day-to-day work, besides the fact that their builds have gotten faster. And I think for the most part, we have achieved that.
<p>Our system supports all of the platforms that GitHub Actions supports for their runners, including Linux, Windows, and macOS, and we could even support some of the other platforms that Forgejo Actions supports in the future, including BSD.
<p>We have the ability to set up a job so that it tries to use self-hosted runners if they're available, but falls back to GitHub hosted runners if none are available for whatever reason, like maybe they're all busy for the foreseeable future, or the servers are down. This graceful fallback was quite complicated to build, actually, and we have a whole section of this talk explaining how it works, because it's not something that's actually possible with the feature set that GitHub provides in their CI system.
<p>It's secure enough for a large public project like Servo, where we don't necessarily know all of our contributors all that well personally. And this is in large part because we use virtual machines, instead of just containers, for each run and for each job. My understanding is that it is possible to build a system like this securely with containers, but in practice it's a lot harder to get that right as a security boundary than if you had the benefit of a hypervisor.
<p>And of course it is all completely self-hosted, which makes it about as cheap as it gets, because your operating expenses are really just the costs of bare metal compute.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0040.jpg" alt="Completely self-hosted
so it's dirt cheap^
- We spend 312 EUR/month on general-purpose runners
- On comparable GitHub runners: 1421–2500 EUR/month
- On comparable third-party runners: 503–1077 EUR/month
^ operating expenses, not necessarily labour" />
</div>
<div class="_spacer"></div>
<p>Now, how cheap is that? Well, in our deployment in Servo, we spend about 300 EUR per month on servers that do general-purpose runners, and these handle most of our workload.
<p>If we were to compare that to if we were running on GitHub-hosted runners, there'd be almost an order of magnitude increase in costs, somewhere like 1400 EUR to over 2000 EUR per month if we were doing the same work on GitHub-hosted runners.
<p>There are also significant increases if we went with third-party runner providers as well, although not quite as much.
<p>But in truth, this is actually kind of an unfair comparison, because it assumes that we would need the same number of hours whether we were running on self-hosted runners or on GitHub hosted runners. And something that you'll see throughout this talk is that this is very much not the case. We spend far fewer hours running our jobs, because we have to do so much less work in them. So in reality, the gap between our expenses with these two approaches would actually be a lot wider.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0041.jpg" alt="Three ways to use runners
- Mandatory self-hosted: (with natural queueing)
- runs-on: self-hosted-image:servo-ubuntu2204
- Graceful fallback to GitHub-hosted:
- Decision job: POST https://monitor/select-runner
- runs-on: ${{ needs.decision.outputs.label }}
- Graceful fallback plus queueing:
- Decision job: POST https://queue/select-runner
- runs-on: ${{ needs.decision.outputs.label }}" />
</div>
<div class="_spacer"></div>
<p>There are three ways to use our CI runner system.
<p>The first one is the way that GitHub designed self-hosted runners to be used. The way they intend you to use self-hosted runners is that when you define a job, you use this <code>runs-on</code> setting to declare what kind of runners your job should run on. You can use labels for GitHub hosted runners, or you can define arbitrary labels for your own runners and select those instead. But you have to choose one or the other in advance, and if you do that, you can't have any kind of fallback, which was a bit of a drawback for us, especially early on. One benefit of this, though, is that you get the natural ability to queue up jobs: if there are no self-hosted runners available at the time a new job is queued, the job will stay in a queued state until a self-hosted runner becomes available. It's nice that you essentially get that for free. <em>But</em> you have no ability to have this kind of graceful fallback.
<p>So we built some features to allow you to do graceful fallback. And how this works is that each of the servers that operates these runners has a web API as well. And you can hit that web API to check if there are runners available and reserve them. Reserving them is something you have to do if you're doing graceful fallback. But I'll explain that in more detail a bit later on. And in doing so, because you have a job that can check if there are runners available, you can now parameterize the <code>runs-on</code> setting and decide "I want to use a self-hosted runner, or a GitHub hosted runner". It's unclear if this is going to be possible on Forgejo Actions yet, so we may have to add that feature, but it's certainly possible on GitHub Actions.
<p>Now, the downside of this is that you do lose the ability, that natural ability, to queue up jobs, and I'll explain why that is a bit later. But in short, we have a queue API that mitigates this problem, because you can hit the queue API, and it can have a full view of the runner capacity, and either tell you to wait, or forward your request once capacity becomes available.
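<p>Roughly, a decision-plus-workload pair looks like this in workflow YAML. This is an illustrative sketch based on the slides, not Servo's exact configuration: the endpoint URL, label names, and fallback label are stand-ins.

```yaml
jobs:
  # Decision job: ask the monitor (or queue) service to reserve a runner.
  # If one is available, the service returns a self-hosted runner label;
  # otherwise we fall back to a GitHub-hosted label.
  decision:
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.select.outputs.label }}
    steps:
      - id: select
        run: |
          label=$(curl -fsS -X POST "https://monitor/select-runner" \
            || echo "ubuntu-latest")
          echo "label=${label}" | tee -a "$GITHUB_OUTPUT"

  # Workload job: runs-on is parameterised on the decision job's output.
  build:
    needs: decision
    runs-on: ${{ needs.decision.outputs.label }}
    steps:
      - run: echo "running on whichever runner the decision job chose"
```

The key trick is that <code>runs-on</code> accepts an expression over the <code>needs</code> context, so the runner choice can be made at workflow run time rather than authoring time.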
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0042.jpg" alt="Faster checkouts
- No repo checkout unless you run actions/checkout
- But servo/servo has 130K+ files and 6K+ directories
- This is especially slow on Windows and macOS
- Bake a repo checkout into every runner image
- Replace actions/checkout with our own action:
git fetch --depth=1 origin $commit
git switch --detach
git reset --hard FETCH_HEAD" />
</div>
<div class="_spacer"></div>
<p>Some things that you can do with our system, that you can't do with GitHub hosted runners. One of them is check out the repo significantly faster.
<p>Something about GitHub Actions and how it works is that, if you run a job, you run a workflow in a repo, you don't actually get a checkout of the repo. You don't get a clone of the repo out of the box, unless you explicitly add a step that does a checkout. And this is fine for the most part, it works well enough for most users and most repos.
<p>But the Servo repo has over 130,000 files across 6,000 directories, and that's just the <em>tracked</em> files and directories. And as a result, even if we use things like shallow clones, there's just kind of no getting around the fact that cloning this repo and checking it out is just <em>slow</em>. It's unavoidably slow.
<p>And it's <em>especially</em> slow on Windows and macOS, where the filesystem performance is honestly often pretty poor compared to Linux. So we want to make our checkouts faster.
<p>Well, what we can do is, we can actually move the process of cloning and checking out the repo from the build process, and move that into the image build process. So we only do it once, when we're building the runner images.
<p>Then what we can do is go into the jobs that run on self-hosted runners, and switch out the stock checkout action with our own action. And our own action will just use the existing clone of the repo, it will fetch the commit that we actually need to build on, then switch to it, and check it out.
<p>And as a result, we can check out the Servo repo pretty reliably in a couple of seconds, instead of having to check out the entire repo from scratch.
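<p>To make the idea concrete, here is a minimal, self-contained sketch of the scheme. The repos and paths are stand-ins, not Servo's actual action; the three-command checkout at the end mirrors the slide.

```shell
set -eu
tmp=$(mktemp -d)

# Stand-in for the upstream repo (think servo/servo) at image-build time.
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.name=ci -c user.email=ci@example.com \
    commit -q --allow-empty -m "state baked into the image"
# GitHub permits fetching by commit SHA; enable that for this local stand-in.
git -C "$tmp/origin" config uploadpack.allowAnySHA1InWant true

# The runner image bakes in a clone, made once per image rebuild.
git clone -q "$tmp/origin" "$tmp/runner"

# Later, upstream gains the commit that a CI job actually needs to build.
git -C "$tmp/origin" -c user.name=ci -c user.email=ci@example.com \
    commit -q --allow-empty -m "commit under test"
commit=$(git -C "$tmp/origin" rev-parse HEAD)

# Our replacement for actions/checkout: shallow-fetch just that commit
# into the existing clone, then hard-switch the work tree to it.
cd "$tmp/runner"
git fetch --depth=1 origin "$commit"
git switch --detach
git reset --hard FETCH_HEAD
```

In the real runner images the baked-in clone already holds those 130,000+ files, so reusing it turns a multi-minute checkout into a couple of seconds.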
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0043.jpg" alt="Incremental builds
- Cargo supports incremental builds
- We're now baking a repo checkout into every image
- Why not build Servo and bake that in too?
- Not perfect — some compilation units get false rebuilds
- Probably don't use this for release artifacts" />
</div>
<div class="_spacer"></div>
<p>Something that flows on from that, though, is that if we are baking a copy of the Servo repo into our runner images, well, what if we just bake a copy of the built artifacts as well? Like, why don't we just build Servo, and bake that into the image?
<p>And this will allow us to do incremental builds, because Servo is a Rust project, and we use Cargo, and Cargo supports incremental builds. As a result, by doing this, when you run a job on our CI system, most of the time we only have to rebuild a handful of crates that have changed, and not have to rebuild all of Servo from scratch.
<p>Now, this is not perfect. Sometimes we'll have some compilation units that get falsely rebuilt, but this works well enough, for the most part, that it's actually a significant time savings.
<p>I also probably wouldn't trust this for building artifacts that you actually want to release in a finished product, just because of the kinds of bugs that we've seen in Cargo's incremental build support.
<p>But for everything else, just your typical builds where you do like commit checks and such, this is very very helpful.
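<p>As a sketch of what the image builder does, where <code>./mach build</code> is Servo's real build entry point but the surrounding steps are illustrative:

```
# At image-build time, once per weekly image rebuild:
git clone https://github.com/servo/servo
cd servo
./mach build --release        # warm Cargo's target/ directory

# The clone and its target/ directory are baked into the runner image.
# At job time, the same build command only recompiles the crates whose
# inputs changed since the image was built.
```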
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0044.jpg" alt="Servo's deployment
- Five servers on Hetzner^
- 3x AX102 (Zen 4 16c/32t, 128G RAM) = 312 EUR/month
- 2x AX42 (Zen 3 8c/16t, 64G RAM) = 92 EUR/month
- NixOS + libvirt + KVM + ZFS
- Custom orchestration
^ not including OpenHarmony runners" />
</div>
<div class="_spacer"></div>
<p>Servo has this system deployed, and it's had it deployed for at least the last year or so.
<p>Nowadays we have three servers which are modestly large, and we use these for the vast majority of our workload. We also have two smaller servers that we use for specific benchmarking tasks.
<p>The stack on these servers, if you could call it that, is largely things that I personally was very familiar with, because I built this. So we've got NixOS for config management, the hypervisor is libvirt and Linux KVM, and the storage is backed by ZFS. The process of actually building the images, and managing the lifecycle of the virtual machines though, is done with some custom orchestration tooling that we've written.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0045.jpg" alt="How does it work?
- Monitor service orchestrates the runners
- Rebuilds virtual machine images
- Spawns virtual machines for runners
- Registers runners with CI service API
- Labels runners when asked to reserve them
(optional, but required for graceful fallback)
- Queue service allows queueing with fallback (optional)" />
</div>
<div class="_spacer"></div>
<p>This tooling consists of two services. The monitor service, which runs on every server that operates self-hosted runners, and the queue service, which is single and global.
<p>So the monitor service does the vast majority of the work. It rebuilds virtual machine images, these templates. It clones the templates into virtual machines for each runner, for each job. It registers the runners with the CI service using its API, so that it can receive jobs. And it can also label the runners to tie them to specific jobs when asked to reserve them. This is optional, but it is a required step if we're doing graceful fallback.
<p>Now, with graceful fallback, you do lose the ability to naturally queue up jobs. So what we've put on top of that is a single queue service that sits in front of the cluster, and it essentially acts as a reverse proxy. It's quite thin and simple, and there's a single one of them, not one per server, for the same reason that when you go to a supermarket, it's more efficient to have a single long queue of customers that gets dispatched to a bunch of cashiers than to have a per-cashier queue, especially when it comes to moving jobs dynamically in response to availability.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0046.jpg" alt="Graceful fallback" />
</div>
<div class="_spacer"></div>
<p>So we've got a whole section here about how graceful fallback works. I might cut this from shorter versions of the talk, but let's jump into it.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0047.jpg" alt="Graceful fallback
- Every job has to choose a runner label in advance
runs-on: ubuntu-latest # GitHub-hosted
runs-on: self-hosted-image:servo-ubuntu2204
- Once you choose the runner label, there's no turning back
- Borrowing a built-in label does not prioritise self-hosted runners over GitHub-hosted runners
- So there's no way to fall back… or is there?" />
</div>
<div class="_spacer"></div>
<p>On GitHub Actions, every job has this <code>runs-on</code> setting, that defines what kind of runner it needs to get assigned to. It has to define this in advance, before the job runs.
<p>And annoyingly, once you choose, "I want to run on a GitHub hosted runner" or "I want to run on a self-hosted runner", once you decide that, there's no turning back. Now, if you've decided that my job needs to run on a self-hosted runner, it can <em>only</em> run on a self-hosted runner. And now the outcome of your job, and the outcome of your workflow, now depends on that job actually <em>getting</em> a self-hosted runner with the matching criteria and succeeding. There's no way for that to become optional anymore.
<p>And you might say, okay, well, maybe I can work around this by spinning up my self-hosted runners, but just giving them the same labels that GitHub assigns to their runners, maybe it'll do something smart? Like maybe... it will run my self-hosted runners if they're available and fall back. But no, the system has no sense of priority and you can't even define any sense of priority of like, I want to try this kind of runner and fall back to another kind of runner. This is simply not possible with the GitHub Action system.
<p>So it may seem like it's impossible to do graceful fallback, but we found a way.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0048.jpg" alt="Decision jobs
- Prepend a job that chooses a runner label
1. let label = runner available?
| yes => [self-hosted label]
| no => [GitHub-hosted label]
2. $ echo &quot;label=${label}&quot; | tee -a $GITHUB_OUTPUT
- Use the step output in runs-on
runs-on: ${{ needs.decision.outputs.label }}
- But two decisions must not be made concurrently" />
</div>
<div class="_spacer"></div>
<p>What we can do is add a decision job, for each workload job that may need to run on self-hosted runners. And we prepend this job, and its job is to choose a runner label.
<p>So how it essentially works is: it checks if a runner is available, and based on that, either chooses a self-hosted label or a GitHub hosted label. And it chooses it and sets it in an output. And this output can get pulled in... in our workload job, the job that actually does our work, because now you can parameterize the <code>runs-on</code> setting, so that it takes the value from this previous decision job.
<p>Unfortunately, it seems like this may not be possible in Forgejo Actions just yet, so we might have to develop support for that, but it's certainly possible in GitHub Actions today, and it has been for quite a while.
<p>The one caveat with this approach is that you need to be careful to only do this decision process one at a time. You should not do two instances of this process concurrently and interleave them.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0049.jpg" alt="Decisions must be serialised
- Stack Overflow and GitHub answers are inherently racy:
any idle runners? —(TOCTOU)→ commit to self-hosted
- We can reserve a runner by applying a unique label to it!
- GH API: add custom label: reserved-for:&lt;UUIDv4&gt;
- runs-on: ${{ needs.decision.outputs.label }}
runs-on: 6826776b-5c18-4ef5-8129-4644a698ae59
- Initially do this directly in the decision job (servo#33081)" />
</div>
<div class="_spacer"></div>
<p>The reason for this is something you'll see if you think about a lot of the answers that you get on Stack Overflow and the GitHub forums. If you look up, "how do I solve this problem? How do I define a job where I can fall back from self-hosted runners to GitHub hosted runners?"
<p>And most of the answers there, they'll have a problem where they check if there are any runners that are available, <em>and then</em>, they will make the decision, committing to either a self-hosted runner or a GitHub hosted runner. The trouble is that in between, if another decision job comes in and tries to make the same kind of decision, they can end up "earmarking" the same runner for two jobs. But each runner is only meant to run one job, and it <em>can</em> only run one job, so one of the jobs will get left without a runner.
<p>So we can start to fix this by actually reserving the runners when we're doing graceful fallback. And how we've done it so far, is that we've used the GitHub Actions API to label the runner when we want to reserve it, and we label it with a unique ID. Then the workload job can put that unique ID, that label, in its <code>runs-on</code> setting. Instead of a general runner label, it can tie itself to this specific, uniquely identified runner label.
<p>And we did it this way initially because it allowed us to do the reservation inside the decision job. I think in the future we will have to move away from this, because on Forgejo Actions the way runner labels work is quite different. They're not something you can update after the fact; they're actually defined by the runner process. So this approach for reserving runners won't work on Forgejo Actions, and we'll probably have to do it internally on the runner servers. But in the meantime, we use labelling.
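<p>Concretely, the reservation relied on the GitHub REST API endpoint for adding custom labels to a self-hosted runner. A sketch of the request, with an illustrative runner id and UUID:

```
POST /repos/servo/servo/actions/runners/{runner_id}/labels

{"labels": ["reserved-for:6826776b-5c18-4ef5-8129-4644a698ae59"]}
```

Once the label is applied, the workload job can name that unique label in its <code>runs-on</code>, tying itself to the one runner that was reserved for it.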
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0050.jpg" alt="Decisions must be serialised
- Labelling runners requires a privileged GitHub API token
- Even with reservation, decisions must still be serialised:
runner not yet reserved? —(TOCTOU)→ label the runner
- But hey, at least we have job concurrency… right?
- Wrong: runs will fail under even modest contention :(" />
</div>
<div class="_spacer"></div>
<p>There are some problems with this. One of them is that now the decision job has to have a GitHub token that has permissions to manage runners and update their labels. And this is something that we'd like to avoid if possible, because we'd like our jobs to have least privilege that they need to do their work.
<p>And something I didn't mention is that reserving runners in this way doesn't actually solve the problem on its own, because you've now transformed the problem to being, okay, we're going to check if the runner is not yet reserved. We're going to check if there's an unreserved runner, <em>and then</em>, we're going to label the runner. But in between, there's a moment where another process doing the same thing could make the same decision. And as a result, if we just did this, we could end up with a situation where one runner gets assigned two unique labels, but it can only fulfill one of them. So we have that same problem again.
<p>So you might say, okay, well, it looks like GitHub Actions has this neat job concurrency feature. I mean, they say you can use this to define a job in a way where only one of them will run at a time, and you can't run them concurrently, so let's try using that to solve this problem.
<p>What you'll find, though, is that if you try to solve the problem with job concurrency, as soon as there's even the slightest bit of contention, you'll just have your runs starting to fail spuriously, and you'll have to keep rerunning your jobs, and it'll be so much more annoying.
<p>And the reason for this is that, if you look more closely at the docs, job concurrency essentially has a queue limited to one job. So at a maximum, you can have one job, one instance of the job that's running, one instance of the job that's queued, and then if another instance of the job comes in while you have one running and one queued, then those extra jobs just get cancelled, and the build fails. So unfortunately, job concurrency does not solve this problem.
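To make that failure mode concrete, here is a hedged sketch (job and group names are illustrative) of the job-concurrency approach that falls over under contention:

```yaml
# Sketch of the approach that does NOT work under contention.
# A GitHub Actions concurrency group holds at most ONE pending job:
# with one job running and one pending, any further job arriving in
# the same group is cancelled outright, and its run fails.
jobs:
  reserve-runner:
    runs-on: ubuntu-latest
    concurrency:
      group: reserve-self-hosted-runner  # serialises the reservation step...
      cancel-in-progress: false          # ...but the queue only holds one job
    steps:
      - run: ./reserve-runner.sh         # illustrative reservation script
```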
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0051.jpg" alt="Decisions must be serialised
- So move decisions out of the decision jobs (servo#33315)
- But what happens if reserved runners fail to materialise?
- You can limit in_progress time in GHA, but not queued time" />
</div>
<div class="_spacer"></div>
<p>So to solve these problems, what we did is we moved that decision and runner reservation process out of the decision jobs, and into the servers that operate the runners themselves. And we do this with an API that runs on the servers.
<p>One smaller problem you might notice though, is that there's still a possibility that you could reserve a runner, but then after you've reserved the runner, it might fail to actually run. And this has become a lot less likely in our system, in our experience nowadays, now that we've ironed out most of the bugs, but it can still happen from time to time, usually due to infrastructure or network connectivity failures.
<p>And we wanted to solve this by setting a time limit on how long a job can be <code>queued</code> for, because if it can't actually get a runner in practice, it will get stuck in that <code>queued</code> state indefinitely. But unfortunately, while we can set a time limit on how long a job can <em>run for</em> once it's been assigned, we can't actually set a time limit on how long a job can be <code>queued</code> for.
<p>So we have to rely on the default built-in limit for all jobs, where I think there's like a limit of a day or two? So like, 24 or 48 hours or so, for how long a job can be <code>queued</code>? And this is just a really long time. So as a result, whenever this happens, you essentially have to cancel the job manually, or if you don't have permission to do that, you have to go <em>ask</em> someone who <em>has</em> permission, to go and cancel your job for you, which is really annoying.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0052.jpg" alt="Timeout jobs
- Watchdog for your workload job, ensuring it gets a runner
1. Wait a short amount of time (e.g. 120 seconds)
2. Query the CI service API for the workload job
3. If the job is still queued, cancel the run
- Only run this when you actually use a self-hosted runner:
if: ${{ fromJSON(needs.decision.outputs.is-self-hosted) }}" />
</div>
<div class="_spacer"></div>
<p>So we solved that using timeout jobs. A timeout job is a sibling to your workload job, and it acts like a watchdog, and it ensures that it actually got a runner when you expect to get a self-hosted runner.
<p>And how that works is, we wait a short amount of time, just like a minute or two, which should be long enough for the runner to actually start running the job, and then we query the API of the CI service to check if the workload job actually started running, or if it's still in that <code>queued</code> state. If it's still queued after two minutes, we cancel the run.
<p>Unfortunately, we can't just cancel the job run. We do have to cancel the whole workflow run, which is quite annoying. But, you know, it's GitHub, nothing is surprising anymore.
<p>Thankfully, we only have to run this when we actually get a self-hosted runner, so we can make it conditional. But over time, in Servo's deployment, we have actually stopped using these, to free up some of those GitHub-hosted runner resources.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0053.jpg" alt="Uniquely identifying jobs
- How do we know the run id of the workload job?
- Jobs can be instantiated many times via workflow calls
- The only supported job relationship is needs
- Workload job needs decision job
- Timeout job needs decision job
- Timeout job can't needs workload job
- needs relationships are not exposed in the API" />
</div>
<div class="_spacer"></div>
<p>One challenge that comes in making these timeout jobs is identifying the workload jobs uniquely, so we can look up whether it's still <code>queued</code> or whether it's started running. There are unique IDs for each job run. These are just like an incrementing number, and you'd think we'd be able to use this number to look up the workload job uniquely and robustly.
<p>Unfortunately, you can't know the run ID of the job [correction 2025-12-24] <del>until it starts, and it may not ever start... or at least you may not know it</del> until the workflow runs, and there can be many instances of the job in the workflow because of workflow calls. Workflow calls are a feature that essentially allows you to inline the contents of a workflow in another as many times as you like. And as a result, you can have multiple copies, multiple instances of a job that run independently within one workflow run. So we definitely need a way to uniquely look up our instance of the workload job.
<p>The trouble is that the only job relationship you can do in GitHub Actions is a <code>needs</code> relationship, and that's inappropriate for our situation here, because we can say that the workload job <code>needs</code> the decision job, we can say the timeout job <code>needs</code> the decision job — and in fact we do both of these, we "need" to do both of these — but we can't say that the timeout job <code>needs</code> the workload job, because of how <code>needs</code> works.
<p>How <code>needs</code> works is that if job A <code>needs</code> job B, then job B has to actually get assigned a runner, and run, and complete its run — it has to finish — before job A can even start. And in this situation, we're making a timeout job to catch situations where the workload job never ends up running, so if we expressed a <code>needs</code> relationship between them, then the timeout job would never run, in these cases at least.
<p>And even if we could express a <code>needs</code> relationship between jobs, like maybe we could walk the job tree, and go from the timeout job, through the decision job via the <code>needs</code> relationship, and then walk back down to the workload job using the same kind of <code>needs</code> relationship... unfortunately, none of these <code>needs</code> relationships are actually exposed in the API for a running workflow. So like, they're used for scheduling, but when you actually go and query the API, you can't tell what job <code>needs</code> what other job. They're just three jobs, and they're unrelated to one another.
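In workflow terms, the relationships we can and cannot express look roughly like this (job names and scripts are illustrative):

```yaml
jobs:
  decision:
    runs-on: ubuntu-latest
    steps:
      - run: ./decide.sh   # illustrative: picks self-hosted vs fallback
  workload:
    needs: decision        # fine: the workload waits for the decision
    runs-on: ubuntu-latest
    steps:
      - run: ./build.sh
  timeout:
    needs: decision        # fine: the watchdog also waits for the decision
    # needs: workload      # would defeat the purpose: `needs` requires the
    #                      # workload to finish before the watchdog starts,
    #                      # but the watchdog exists precisely for runs
    #                      # where the workload never starts
    runs-on: ubuntu-latest
    steps:
      - run: ./watchdog.sh
```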
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0054.jpg" alt="Uniquely identifying jobs
- Tie them together by putting the <UUIDv4> in the name:
name: Linux [${{ needs.decision.outputs.unique-id }}]
name: Linux [6826776b-5c18-4ef5-8129-4644a698ae59]
- Query the CI service API for all jobs in the workflow run
- Check the status of the job whose name contains
[${{ needs.decision.outputs.unique-id }}]
- Yes, really, we have to string-match the name :)))" />
</div>
<div class="_spacer"></div>
<p>So how we had to end up solving this is, we had to tie these jobs together by generating a unique ID, a UUID, and putting it in the friendly name, like the display name of the job, like this.
<p>And to query the CI service to find out if that job is still queued, we need to query it for the whole workflow run, and just look at all of the jobs, and then find the job whose name contains that unique ID.
<p>Then we can check the <code>status</code>, and see if it's still <code>queued</code>. This is really, really silly. Yes, we really do have to string match the name, which is bananas! But this is GitHub Actions, so this is what we have to do.
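Putting the pieces together, here is a hedged sketch of the watchdog (job and output names are illustrative; the jobs and cancel endpoints are GitHub's documented REST API):

```yaml
workload:
  # The UUID from the decision job ties the display name to this instance
  name: Linux [${{ needs.decision.outputs.unique-id }}]
  needs: decision
  runs-on: ubuntu-latest
  steps:
    - run: ./build.sh
timeout:
  needs: decision
  # Only run the watchdog when we actually expect a self-hosted runner
  if: ${{ fromJSON(needs.decision.outputs.is-self-hosted) }}
  runs-on: ubuntu-latest
  env:
    GH_TOKEN: ${{ github.token }}
    UNIQUE_ID: ${{ needs.decision.outputs.unique-id }}
  steps:
    - run: |
        sleep 120  # long enough for a runner to pick up the workload job
        status=$(gh api "repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/jobs" \
          --paginate --jq ".jobs[] | select(.name | contains(\"[$UNIQUE_ID]\")) | .status")
        if [ "$status" = "queued" ]; then
          # We can only cancel the whole workflow run, not a single job
          gh api -X POST "repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/cancel"
        fi
```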
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0055.jpg" alt="Tokenless API
- Monitor API requires access to secrets in the workflow
- All pull_request_target runs have access to secrets
- …but you generally don't want to use it anyway
- Most pull_request runs do not have access to secrets
- How do we prove the request is genuine and authorised, if we can't authenticate with a token?" />
</div>
<div class="_spacer"></div>
<p>One thing I didn't mention is that being able to reserve runners needs to be kind of a privileged operation, because we don't just want an arbitrary client on the internet to be able to erroneously or maliciously reserve runners. Even if they may not be able to do a whole lot with those runners, they can still deny service.
<p>So to use the monitor API to do graceful fallback and to request and reserve a runner, normally we would require knowledge of some kind of shared secret, like an API token, and that's what we've done for most of the life of this system.
<p>The trouble with this is that there are many kinds of workflow runs that don't have access to the secrets defined in the repo. A big one is <code>pull_request</code> runs. Most of the time, <code>pull_request</code> runs don't have access to secrets defined in the repo. And there is another kind of run called a <code>pull_request_target</code> run, and those do have access to secrets, but they also have some pretty gnarly security implications that mean that in general, you wanna avoid using these for pull requests anyway.
<p>So if you're stuck with <code>pull_request</code> runs for your pull requests, does that mean that you can't use self-hosted runners? How do we allow <code>pull_request</code> runs to request and reserve runners in a way that it can prove that its request is genuine and authorized?
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0056.jpg" alt="Tokenless API
- Upload an artifact representing the request!
- Hit the monitor API
- /select-runner ?unique_id &qualified_repo &run_id
- (the profile_key is in the artifact)
- Important: delete the artifact, so it can't be reused (and set the minimum auto-delete, in case that fails)" />
</div>
<div class="_spacer"></div>
<p>What we do is we use artifacts. We upload a small artifact that encodes the details of the request and publish the artifact against the run. So in the artifact, we'd say, "I want two Ubuntu runners" or "one Windows runner" or something like that.
<p>And then we would hit the monitor API, we hit a different endpoint that just says, go to this repo, go to this run ID, and then check the artifacts! You'll see my request there! And this does not require an API token, it's not a privileged operation. What's privileged is publishing the artifact, and that's unforgeable; the only entity who can publish the artifact is the workflow itself.
<p>All we have to do then, all we have to be careful to do, is to delete the artifact after we reserve the runner, so that it can't be replayed by a malicious client. And in the event that deleting the artifact fails, we can also set the minimum auto-delete period of, I think at the moment it's 24 hours, just in case that fails.
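A hedged sketch of the client side (the monitor hostname and request format are illustrative; the query parameters are the ones from the slide):

```yaml
steps:
  - name: Write the runner request (the profile_key travels in the artifact)
    run: echo "profile_key=servo-ubuntu2204" > runner-request.txt  # illustrative format
  - uses: actions/upload-artifact@v4
    with:
      name: runner-request-${{ needs.decision.outputs.unique-id }}
      path: runner-request.txt
      retention-days: 1  # minimum auto-delete, in case explicit deletion fails
  - name: Ask the monitor to reserve a runner, no API token required
    run: |
      # The monitor fetches the artifact from this repo and run id itself,
      # which proves the request came from the workflow, then deletes it.
      curl --fail "https://monitor.example/select-runner?unique_id=${{ needs.decision.outputs.unique-id }}&qualified_repo=${{ github.repository }}&run_id=${{ github.run_id }}"
```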
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0057.jpg" alt="Global queue
- Fallback happens immediately if no runners are available
- But if GitHub-hosted runs take 5x as long as self-hosted, we can wait up to 80% of that time and still win
- Run a queue service that allows jobs to wait for capacity
- Decision jobs hit the queue API instead of the monitor API
- Queue API says " />
</div>
<div class="_spacer"></div>
<p>Graceful fallback normally means that you lose the ability to naturally queue jobs for self-hosted runners. And this happens because when you hit the API, requesting, you know, are there any available runners, please reserve one for me. If there aren't any available at the time of the request, we will immediately fall back to running on a GitHub hosted runner.
<p>But the thing is that our self-hosted runners are generally so much faster! A GitHub-hosted run might take five times as long as the equivalent self-hosted run, and if that's the case, it would actually be beneficial to wait up to 80% of that time, and we'd still probably save time.
<p>So to increase the utilization of our self-hosted runners, what we can do is run a small queue service that sits in front of the monitors, and it essentially acts as a reverse proxy. It will take the same kind of requests for reserving runners as before, but it will have a global view of the availability and the runner capacity at any given moment.
<p>And based on that, it will either respond to the client saying, go away and please try again later, or it will forward the request onto one of the monitors based on the available capacity.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0058.jpg" alt="Runner images" />
</div>
<div class="_spacer"></div>
<p>We also have some lessons here about how to automate building virtual machine images, essentially. And these are lessons that we've learned over the last year or two.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0059.jpg" alt="Runner images
- GitHub uses Packer for their stock runner images
- Our monitor service manages image rebuilds
- Initially kicked off manually, now fully automated (#6)
- Driven by Rust with reflink copies (#32)
- Mounting images to inject data is no longer viable (#30)
- macOS has no usable FS with Linux write support
- Get tools and deps from the monitor's web server (#32)" />
</div>
<div class="_spacer"></div>
<p>GitHub uses Packer to build their first-party runner images. Our monitor service also automates building images, but we don't use Packer.
<p>Initially, we had a handful of scripts that we just kicked off manually whenever we needed to update our images, but we've now fully automated the process. And we've done this using some modules in our monitor service, so there's some high-level Rust code that drives these image rebuilds, and it even uses reflink copies to take advantage of copy-on-write time savings and space savings with ZFS.
<p>Now, one of the complications we ran into when building images, over the past year or so, is that we used to pull a sort of clever trick where, we would do as much of the process of configuring the virtual machine image on the host, actually, as possible, rather than doing configuration inside the guest, and having to spin up the guest so that we can configure it. And we were able to do this by essentially mounting the root file system of the guest image, on the host, and then injecting files, like injecting tools and scripts and other things that are needed by the image and needed to configure the image.
<p>But we stopped being able to do this eventually because, well, essentially because of macOS. macOS has no file system that's usable for building Servo that can also be mounted on Linux with write support. Because like, just think about them, right? We've got HFS+, which Linux can write to but only if there's no journaling and you can't install macOS on HFS+ without journaling. There's APFS, which Linux has no support for. There's exFAT, which has no support for symlinks, so a lot of tools like uv will break. There's NTFS, which we thought would be our savior, but when we tried to use it, we ran into all sorts of weird build failures, which we believe are due to some kind of filesystem syncing race condition or something like that, so NTFS was unusable as well.
<p>In the end, what we had to do is, if a guest image needed any tools or dependencies, it had to download them from a web server on the host.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0060.jpg" alt="Runner images
- Consistent approach to automating operating systems
- OS installation: whatever is easiest
- Config bootstrap: native config management (if any)
- Use it as little as possible
- Image config: native scripting
- Runner boot: native scripting" />
</div>
<div class="_spacer"></div>
<p>We eventually settled on a consistent approach to automating the process of installing the OSes and configuring them for each runner template.
<p>And the approach that we used was to install the OS using whatever method is easiest, and to bootstrap the config using any native config management system, if there is one included with the operating system.
<p>But once we've kicked off that process, we then use the native config management as little as possible. And we do this because a lot of the config management tools that are built into these operating systems are quite quirky and annoying, and they are built for needs that we don't have, the primary need being that they can manage the configuration of a system over time, keeping it up to date with any changes. The thing about these runner images, though, is that each runner image only needs to get built once, it only needs to get configured once, and then after that it'll be cloned for each runner, and it will only run once. Config management systems are overkill for this kind of "one shot" use case.
<p>We can just do it with the usual scripting and automation facilities of the operating system. What does that look like in practice?
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0061.jpg" alt="Linux runners
- OS installation: prebuilt Ubuntu cloud images
- Config bootstrap: cloud-init config
- Use it as little as possible (systemd journal to tty7, netplan config, curl and run next stage)
- Image config: bash script
- Runner boot: same bash script" />
</div>
<div class="_spacer"></div>
<p>Well for Linux, we install the OS using pre-built Ubuntu cloud images, that we just download from the mirrors.
<p>We bootstrap the config using cloud-init, which is so painful to use. We use it because it's included with the operating system, so it's the fastest possible way to get started.
<p>We use it as little as possible: we just configure the logs to go to a TTY, we configure the network, so we can connect to the network, and then once we're on the network, we just curl and run a bash script which does the rest.
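As a rough sketch (the interface name, host URL, and file paths are all illustrative), the cloud-init user-data does little more than this:

```yaml
#cloud-config
write_files:
  # Send the systemd journal to tty7 so boot problems show on the console
  - path: /etc/systemd/journald.conf.d/console.conf
    content: |
      [Journal]
      ForwardToConsole=yes
      TTYPath=/dev/tty7
  # Minimal netplan config: just get on the network
  - path: /etc/netplan/50-runner.yaml
    content: |
      network:
        version: 2
        ethernets:
          enp1s0:
            dhcp4: true
runcmd:
  - netplan apply
  # Everything else happens in a bash script served by the host
  - curl -fsSL http://host.example/image-config.sh | bash
```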
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0062.jpg" alt="Windows runners
- OS installation: autounattend.xml (generator)
- Config bootstrap: same autounattend.xml
- Use it as little as possible
- Create elevated scheduled task to curl and run next stage
- Install NetKVM driver, do some reg imports, reboot
- Image config: PowerShell script
- Runner boot: same PowerShell script" />
</div>
<div class="_spacer"></div>
<p>The same goes for Windows.
<p>We install the operating system using an automated answers file called autounattend.xml. There's a <a href="https://schneegans.de/windows/unattend-generator/">nice little generator here</a> which you can use if you don't want to have to set up a whole Windows server to set up unattended installations. You generate that XML file, and you can also use that XML file to bootstrap the config.
<p>Again we use it as little as possible, because writing automations as XML elements is kind of a pain. So we essentially just set up a scheduled task to run the next stage of the config, we install the network driver, we import some registry settings, and we reboot. That's it. The rest of it is done with a PowerShell script.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0063.jpg" alt="macOS runners
- OS installation: by hand :( but only once
- Config bootstrap: curl|sh by hand :( but only once
- Use it as little as possible (zsh script)
- Enable SSH, enable autologin, enable sudo NOPASSWD
- Install a LaunchAgent to curl and run next stage
- Disable broken session restore feature in Terminal.app
- Image config and runner boot: zsh script" />
</div>
<div class="_spacer"></div>
<p>The same goes for macOS.
<p>Now, unfortunately, installing the OS and bootstrapping the config does have to be done by hand. This is because if you want to automate a macOS installation, your two options, more or less, are enterprise device management solutions, which cost a lot of money and require a macOS server around to control and orchestrate the machines, or what most open systems faced with this problem end up doing: throwing OpenCV at the problem. I've seen several projects use OpenCV to OCR the setup wizard, which is... it's certainly a bold strategy. It's not really for me.
<p>What I decided to do instead is just install the OS by hand, and pipe <code>curl</code> into <code>sh</code> to kick off the config management process. And this is something that we only really have to do once, because we do it once, and then we take a snapshot of it, and then we never have to do it again, at least until the next version of macOS comes out.
<p>So this bootstrap script just does a handful of minimal things: it enables automatic login, it sets up a LaunchAgent to ensure that we can run our own code on each boot, and then it does a handful of other things which it honestly doesn't really have to do in this script. We could probably do these things in the <code>zsh</code> script which we then <code>curl</code> and run. And that's where the remainder of the work is done.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0064.jpg" alt="Future directions
- Decoupling the system from Servo
- macOS arm64 runners (#64)
- Support for Forgejo Actions (#94)
- Support for other CI services?
- Dynamic runner counts / autoscaling
- Hot runners with memory ballooning
- microVM runners?" />
</div>
<div class="_spacer"></div>
<p>So looking forward, some things that we'd like to do with this system.
<p>The first is to decouple it from Servo. So we built this CI system quite organically over the past 12 to 18 months, and we built it around Servo's needs specifically, but we think this system could be more broadly useful for other projects. We'll just have to abstract away some of the Servo specific bits, so that it can be more easily used on other projects, and that's something we're looking into now.
<p>Something else that we'll have to do sooner or later is add support for macOS runners on Apple Silicon, that is, arm64. The reason we have to do this is that macOS 26, the most recent version of macOS, which came out in September, just a couple of months ago, is the last version of macOS that will support x86 CPUs. And at the moment, our macOS runners run on x86 CPUs, on the host and in the guest.
<p>This is a little bit complicated because at the moment, our macOS runners actually run on Linux hosts, using a Linux-KVM-based sort of Hackintosh-y kind of solution. And there is no counterpart for this for arm64 hosts and guests, and I'm not sure there ever will be one. So we're going to have to port the system so that it can run on macOS hosts, so we can use actual Mac hardware for this, which is easy enough to do, and that's in progress.
<p>But we're also going to have to port the system so it can run with other hypervisors. And this is because, although libvirt supports macOS hosts, the support for the macOS Hypervisor framework and Virtualization framework is not mature enough to actually run macOS guests in libvirt. And I'm not sure how long that will take to develop, so in the meantime, we've been looking at porting the system so that when you're on a Mac, you can run with UTM instead, and that's been working fairly well so far.
<p>We're also looking at porting the system so that it can run with Forgejo Actions and not just GitHub Actions. So Forgejo Actions is an open alternative to GitHub Actions that tries to be loosely compatible with GitHub Actions, and in our experience, from experimentation so far, we found that it mostly <em>is</em> loosely compatible. We think we'll only have to make some fairly minor changes to our system to make it work on both CI systems.
<p>That said, this CI system could potentially be more broadly applicable to other CI services as well, because virtual machine orchestration is something that we haven't really seen any CI services have a great built-in solution for. So if this interests you and you want to use it on your project on some other CI service, then we'd appreciate knowing about that, because that could be something we would explore next.
<p>The remaining ideas are things that I think we could look into to make our runners more efficient.
<p>The big one is autoscaling. So at the moment when you set up a server to operate some self-hosted runners, you essentially have to statically pre-configure how many runners of each kind of runner you want to be kept operating. And this has worked well enough for us for the most part, but it does mean that there's some kind of wasted resources sometimes, when the moment-to-moment needs of the jobs that are queued up aren't well fitted to the composition of your runner configuration. So if we had the ability to dynamically respond to demand, or some kind of autoscaling, I think we could improve our runner utilization rates a little bit, and sort of get more out of the same amount of runner capacity, the same amount of server capacity.
<p>There's a couple ideas here, also, about reducing boot times for the runners, which can be quite helpful if you have a big backlog of jobs queued up for these servers, and this is because time spent booting up each runner, each virtual machine, is time that cannot be spent doing real work.
<p>So two ways we can think of to reduce these boot times are, first, to have hot spares ready to go. The idea is that if you can spin up more runners than you actually intend to run concurrently, and just have them sitting around, then you can amortize the boot times and get the boot process out of the way. And the way you do this is by spinning up a whole bunch of runners, say maybe you spin up twenty runners, even though you only intend to run four of them concurrently.
<p>And what you do is you give these runners a token amount of RAM to start with. You give them like one gig of RAM instead of 16 or 32 gigs of RAM. And then when a job comes in, and you actually want to assign the runner out so that it can do the work, then you dynamically increase the RAM from one gig, or that token amount, to the actual amount, like 16 gigs or 32 gigs. And this should be fairly easy to do in practice. This is actually supported in libvirt using a feature known as memory ballooning. But there are some minor caveats, like you do lose the ability to do certain optimizations, like you can't do huge pages backing on the memory anymore. But for the most part, this should be fairly technically simple to implement.
<p>Something that could be more interesting in the longer term is microVMs, things like Firecracker, which, as I understand it, take the concept of paravirtualization to its logical extreme. On kernels that support being run as microVMs, you can boot them in one or two seconds, instead of 20 or 30 or 60 seconds. And this could save a great deal of time, at least for jobs that run on Linux and BSD. I don't know if I said Linux and macOS, but I meant Linux and BSD.
<hr />
<div class="_slide">
<img src="https://www.azabani.com/images/servo-ci/mpv-shot0065.jpg" alt="github.com/servo/ci-runners
Slides: go.daz.cat/3tdhp" />
</div>
<div class="_spacer"></div>
<p>So yeah, we now have a system that we use to speed up our builds in Servo's CI, and it works fairly well for us. And we think that it's potentially useful for other projects as well.
<p>So if you're interested to find out more, or you're interested to find out how you can use the system in your own projects, go to our GitHub repo at <a href="https://github.com/servo/ci-runners">servo/ci-runners</a>, or you can go <a href="https://go.daz.cat/3tdhp">here for a link to the slides</a>. Thanks!
Delan Azabani (https://www.azabani.com/)

Andy Wingo: in which our protagonist dreams of laurels
https://wingolog.org/2025/12/17/in-which-our-protagonist-dreams-of-laurels (2025-12-17)
<div><p>I had a dream the other evening, in which I was at a large event full of
hackers—funny, that this is the extent of my dreams at the moment; as a
parent of three young kids, I don’t get out much—and, there, I was to
receive an award and give a speech. (I know, I am a ridiculous man,
even when sleeping.) The award was something about free software; it
had the trappings of victory, but the vibe among attendees was numbness
and bitter loss. Palantir had a booth; they use free software, and
isn’t that just great?</p><p>My talk was to be about Guile, I think: something technical, something
interesting, but, I suspected, something inadequate: in its place and
time it would be a delight to go deep on mechanism but the moment seemed
to call for something else.</p><p>These days are funny. We won, objectively, in the sense of the goals we
set in the beginning; most software is available to its users under a
free license: Firefox, Chromium, Android, Linux, all the programming
languages, you know the list. So why aren’t we happy?</p><p>When I reflect back on what inspired me about free software 25 years
ago, it was much more political than technical. The idea that we should
be able to modify our own means of production and share those
modifications was a part of a political project of mutual care: we
should be empowered to affect the systems that surround us, to the
extent that they affect us.</p><p>To give you an idea of the milieu, picture me in 1999. I left my home
to study abroad on another continent. When I would go to internet cafés
I would do my email and read slashdot and freshmeat as one did back
then, but also I would often read <a href="https://znetwork.org/">Z magazine</a>,
Noam Chomsky and Michael Albert and Michael Parenti and Arundhati Roy
and Zapatistas and all. I remember reading El País the day after “we”
shut down the World Trade Organization meeting in Seattle, seeing
front-page pictures of pink-haired kids being beat up by the cops and
wishing I were there with them. For me, free software fit with all of
this: the notion that a better world was possible, and we could build it
together.</p><p>I won’t lie and say that the ideals were everything. I think much of my
motivation to program is selfish: I like to learn, to find out, to do.
But back then I felt the social component more strongly. Among my
cohort, though, I think we now do free software because we did free
software; the motive sedimented into mechanism. These are the spoils of
victory: free is the default. But defaults lack a sense of urgency, of
the political.</p><p>Nowadays the commons that we built is the feedlot of large language
models, and increasingly also its waste pond. The software we make is
free, but the system in which it is made is not; Linux Magazine 1, Z
magazine 0.</p><p>All of this makes me think that free software as a cause has run its
course. We were the vanguard, and we won. Our dreams of 25 years ago
are today’s table stakes. Specifically for my copyleft comrades, it
seems that the role of copyright as a societal lever has much less purchase; taken to its conclusion, we might find ourselves siding with
Disney and OpenAI against Google.</p><p>If I had to choose an idea from the 90s to keep, I would take “another
world is possible” over the four freedoms. For me, software freedom is
a strategy within a broader humanist project of liberation. It was
clever, in that it could motivate people from a variety of backgrounds
in a way that was on the whole positive for the humanist project. It
inspired me as a meaningful way in which I could work towards a world of
people caring for each other. In that spirit, I would like to invite my
comrades to reflect on their own hierarchy of principles; too often I
see people arguing the fine points of “is this software free” according
to a specific definition without appreciating the ends to which the
software freedom definition is a means.</p><p>Anyway, it turns out that I did win something, <a href="https://www.fsf.org/news/2024-free-software-awards-winners">the Award for the
Advancement of Free
Software</a>,
for my work on Guile over the years. My work on Guile has waxed and waned,
and in these last few years of parenthood it has been rather the latter,
but I am proud of some of the technical hacks; and it has been with a
heart-warming, wondrous delight that I have been a spectator to the rise
of <a href="https://guix.gnu.org/">Guix</a>, a complete operating system built on
Guile. Apart from its quite compelling technical contributions, I just
love that Guix is a community of people working together to build a
shared project. I am going to the <a href="https://libreplanet.org/wiki/Group:Guix/FOSDEM2026">Guix
days</a> in a month or
so and in past years it has been such a pleasure to see so many people
there, working to make possible another world.</p><p>In my dream, instead of talking about Guile, I gave a rousing and
compelling impromptu invective against Palantir and their ilk. I
thought it quite articulate; I was asleep. In these waking hours, some
days later, I don’t know what I did say, but I think I know what I would
like to have said: that if we take the means of free software to be the
ends, then we will find ourselves arguing our enemies are our friends.
Saying that it’s OK if some software we build on is made by people who
facilitate ICE raids. People who build spy software for controlling
domestic populations. People who work for empire.</p><p>What I would like to say is that free software is a strategy. As a
community of people that share some kind of liberatory principles of
which free software has been a part, let us use free software as best we
can, among many other strategies. If it fits, great. If you find
yourself on the same side of an argument as Palantir, it’s time to back
up and try something else.</p></div> Andy Wingohttps://wingolog.org/Pablo Saavedra: Verifying ARM vs THUMB2 Instruction Sets in ELF Binarieshttp://http503.gvatas.in/?p=26922025-12-16T08:17:53+00:00
When working with embedded Linux systems on ARM-based boards, you sometimes need to determine whether a binary was compiled with ARM or THUMB2 instruction sets. This is a quick reference guide for checking this without relying on heavy tools like readelf or objdump. The Core Concept ARM uses a clever trick to distinguish between ARM […] Pablo Saavedrahttps://http503.gvatas.inIgalia WebKit Team: WebKit Igalia Periodical #51https://blogs.igalia.com/webkit/blog/2025/wip-51/2025-12-15T19:58:42+00:00
<p>Update on what happened in WebKit in the week from December 8 to December 15.</p>
<p>
In this end-of-year special we have a new GMallocString helper that makes
management of malloc-based strings more efficient, development releases,
and a handful of advancements on JSC's implementation of Temporal, in
particular the PlainYearMonth class.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<div class="wip-item">
<p>Added <a rel="external" href="https://github.com/WebKit/WebKit/pull/55162">GMallocString</a> class to WTF to adopt UTF8 C strings and make them first-class WebKit citizens efficiently (no copies). Applied it in GStreamer code together with <a rel="external" href="https://github.com/WebKit/WebKit/pull/51259">other improvements by using CStringView</a>. Fixed <a rel="external" href="https://github.com/WebKit/WebKit/pull/54312">two</a> other <a rel="external" href="https://github.com/WebKit/WebKit/pull/54762">bugs</a> related to string management.</p>
</div>
<h3 id="javascriptcore-fish">JavaScriptCore 🐟</h3>
<div class="wip-description">
<p>The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.</p>
</div>
<div class="wip-item">
<p>In JavaScriptCore's implementation of Temporal, <a rel="external" href="https://github.com/WebKit/WebKit/pull/55153">added the <code>with</code> method for <code>PlainYearMonth</code> objects</a>, as well as <a rel="external" href="https://github.com/WebKit/WebKit/pull/55076">the <code>equals</code>, <code>compare</code>, and <code>valueOf</code> methods</a>, <a rel="external" href="https://github.com/WebKit/WebKit/pull/54800">and also the <code>from</code> method</a>. Also <a rel="external" href="https://github.com/WebKit/WebKit/pull/55201">implemented the <code>toPlainDate</code> method for PlainYearMonth objects</a>.</p>
</div>
<h2 id="releases-package">Releases 📦️</h2>
<div class="wip-item">
<p>Development releases of <a rel="external" href="https://webkitgtk.org/2025/12/09/webkitgtk2.51.3-released.html">WebKitGTK 2.51.3</a>
and <a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.51.3.html">WPE WebKit 2.51.3</a>
are now available. These include a number of API additions and new features,
and are intended to allow interested parties to test those in advance, prior
to the next stable release series. As usual, bug reports are
<a rel="external" href="https://bugs.webkit.org/">welcome in Bugzilla</a>.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div> Igalia WebKit Teamhttps://blogs.igalia.com/webkitJosé Dapena: Maintaining Chromium downstream: how can upstream help?https://blogs.igalia.com/dape/2025/12/11/maintaining-chromium-downstream-how-can-upstream-help/2025-12-11T00:00:00+00:00
<img class="face" src="/images/dape.png" width="74" height="100" alt="" align="right" style="float: right" />
<p>As I write often, maintaining a downstream of Chromium is not easy. A lot of effort falls on the shoulders of the teams embedding Chromium, or creating products on top of the upstream Chromium project.</p>
<p>We covered this in the previous chapters of my <a href="https://blogs.igalia.com/dape/tags/downstream-maintenance/">series of blog posts about maintaining Chromium downstreams</a>. Now, this post is going to be a bit different.</p>
<p>I start with a question:</p>
<blockquote>
<p>What can upstream Chromium do to help downstreams?</p>
</blockquote>
<p>This very same question was discussed in the <a href="https://webengineshackfest.org/">Web Engines Hackfest</a> breakout session that originated most of these posts. In this blog post, I will share some of the most interesting answers that came up in that session.</p>
<h2 id="better-componentization" tabindex="-1">Better componentization <a class="header-anchor" href="https://blogs.igalia.com/dape/2025/12/11/maintaining-chromium-downstream-how-can-upstream-help/">#</a></h2>
<p>One of the ideas was to move code around more aggressively to make it easier to reuse. Specifically, refactoring to move more and more code from <code>//chrome</code> to <code>//components</code>.</p>
<p>Chromium has gone a long way in that direction. Each of these changes allows downstreams to directly use only the components they need, instead of working on top of <code>//chrome</code>. But there is still room for improvement.</p>
<p>Some parts of <code>//chrome</code> are still not refactored and could be very useful, especially for downstreams shipping a browser. Some examples:</p>
<ul>
<li>Tabs implementation.</li>
<li>Profiles.</li>
<li>Synchronization.</li>
</ul>
<h2 id="improve-extensibility" tabindex="-1">Improve extensibility <a class="header-anchor" href="https://blogs.igalia.com/dape/2025/12/11/maintaining-chromium-downstream-how-can-upstream-help/">#</a></h2>
<p>In the same direction, supporting easier ways to provide alternative implementations, and add custom software components, was considered important.</p>
<p>Some examples:</p>
<ul>
<li>Making it easier to support Chrome extensions without using <code>//chrome</code> would allow implementing new browsers without bundling the Chromium UI.</li>
<li>Going further in the direction of what has been done with <a href="https://chromium.googlesource.com/chromium/src/+/lkgr/docs/ozone_overview.md">Ozone</a>: the Chromium platform abstraction layer that helps implement support for a variety of platforms (including Linux and X11). Similar steps could be taken at other levels to improve OS integration (system hardware encryption, accelerated video codecs, system IPC, and so on).</li>
</ul>
<h2 id="downstream-advocacy" tabindex="-1">Downstream advocacy <a class="header-anchor" href="https://blogs.igalia.com/dape/2025/12/11/maintaining-chromium-downstream-how-can-upstream-help/">#</a></h2>
<p>A very interesting proposal was to create the role of downstream advocates in the Chrome community.</p>
<p>They would act as an entry point for downstream projects wanting to interact with the Chrome community and be an official communication channel for downstreams to report their needs.</p>
<p>This would also increase awareness of the different ways Chromium is used by downstreams.</p>
<p>Today there are two channels that are somewhat similar: the <a href="https://groups.google.com/a/chromium.org/g/embedder-dev"><em>Chromium Embedders</em> mailing list</a> and the <code>#embedders</code> Slack channel.</p>
<h2 id="a-two-way-problem" tabindex="-1">A two-way problem <a class="header-anchor" href="https://blogs.igalia.com/dape/2025/12/11/maintaining-chromium-downstream-how-can-upstream-help/">#</a></h2>
<p>So far, three different problems raised by downstreams have been covered, and they seem like fair requests to the Chromium community.</p>
<p>But there is also work to do on the downstream side.</p>
<p>Can downstreams contribute more of their work to upstream? Not only in code, but also in all the maintenance activities.</p>
<p>There is also code written for very specific downstream needs that could land upstream, as long as it does not become a burden to the common project. That means ownership and enough work bandwidth need to be in place.</p>
<h2 id="where-are-we-now" tabindex="-1">Where are we now? <a class="header-anchor" href="https://blogs.igalia.com/dape/2025/12/11/maintaining-chromium-downstream-how-can-upstream-help/">#</a></h2>
<p>There is a major change in the Chromium community: the creation of the <a href="https://www.linuxfoundation.org/supporters-of-chromium-based-browsers">Supporters of Chromium Based Browsers</a>. What does it mean for embedders? Could it be a good way to channel requirements from the different downstream projects?</p>
<p>Two years after the Web Engines Hackfest session, we can see some improvements. But the general question is still valid:</p>
<blockquote>
<p>What can upstream Chromium do to help downstreams?</p>
</blockquote>
<h2 id="the-last-post" tabindex="-1">The last post <a class="header-anchor" href="https://blogs.igalia.com/dape/2025/12/11/maintaining-chromium-downstream-how-can-upstream-help/">#</a></h2>
<p>The next post in this series will be the last one. It will cover some typical problems downstream projects are facing today.</p>
<h2 id="references" tabindex="-1">References <a class="header-anchor" href="https://blogs.igalia.com/dape/2025/12/11/maintaining-chromium-downstream-how-can-upstream-help/">#</a></h2>
<ul>
<li><a href="https://webengineshackfest.org/2023/">Web Engines Hackfest 2023</a> - <a href="https://github.com/Igalia/webengineshackfest/issues/9">Maintaining Chromium downstream breakout session</a>.</li>
<li><a href="https://www.linuxfoundation.org/supporters-of-chromium-based-browsers">Supporters of Chromium-Based Browsers</a>.</li>
</ul> José Dapenahttps://blogs.igalia.com/dape/Luke Lau: Closing the LLVM RISC-V gap to GCC, part 1http://lukelau.me/2025/12/10/closing-the-gap-pt12025-12-09T16:00:00+00:00
<p>At the time of writing, GCC beats Clang on several SPEC CPU 2017 benchmarks on
RISC-V<sup id="fnref:1"><a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>
<p><img src="http://lukelau.me/assets/lnt.png" alt="LNT results comparing GCC and Clang" /></p>
<p>LLVM developers upstream have been working hard on the performance of
generated code, in every part of the pipeline from the frontend all
the way through to the backend. So when we first saw these results we
were naturally a bit surprised. But as it turns out, the GCC
developers have been hard at work too.</p>
<p>Sometimes a bit of healthy competition isn’t a bad thing, so this blog
post is the first in a series looking at the work going on upstream to
improve performance and catch up to GCC.</p>
<p>Please note that this series focuses on RISC-V. Other targets may have
more competitive performance but we haven’t measured them yet. We’ll
specifically be focusing on the high-performance application processor
use case for RISC-V, e.g. compiling for a
<a href="https://github.com/riscv/riscv-profiles">profile</a> like
RVA23. Unfortunately since we don’t have access to RVA23 compatible
hardware just yet we’ll be benchmarking on a SpacemiT-X60 powered
Banana Pi BPI-F3 with <code class="language-plaintext highlighter-rouge">-march=rva22u64_v</code>. We don’t want to use
<code class="language-plaintext highlighter-rouge">-mcpu=spacemit-x60</code> since we want to emulate a portable configuration
that an OS distribution might compile packages with. And we want to
include the vector extension, as we’ll see in later blog posts that
optimization like auto-vectorization can have a major impact on
performance.</p>
<h2 id="where-to-start">Where to start?</h2>
<p>It goes without saying that a vague task like “make LLVM faster” is
easier said than done. The first thing is to find something to make
fast, and while you could read through the couple dozen million lines
of code in LLVM until inspiration strikes, it’s generally easier to
start the other way around by analyzing the code it generates.</p>
<p>Sometimes you’ll get lucky by just stumbling across something that
could be made faster when hacking or poring through generated
assembly. But there’s an endless amount of optimizations to be
implemented and not all of them are equally impactful. If we really
want to make large strides in performance we need to take a step back
and triage what’s actually worth spending time on.</p>
<p><a href="https://llvm.org/docs/lnt/intro.html">LNT</a>, LLVM’s nightly testing
infrastructure, is a great tool for this task. It’s both a web server
that allows you to analyze benchmark results, and a command line tool
to help run the benchmarks and submit the results to said web
server.</p>
<p>As the name might imply, it’s normally used for detecting performance
regressions by running benchmarks daily with the latest revision of
Clang, flagging any tests that may have become slower or faster since
the last revision.</p>
<p>But it also allows you to compare benchmark results across arbitrary
configurations. You can run experiments to see what effects a flag
has, or see the difference in performance on two pieces of hardware.</p>
<p>Moreover, you can pass in different compilers. In our case, we can do
two “runs” with Clang and GCC. Here’s how we would kick these off:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span>CC <span class="k">in </span>clang riscv64-linux-gnu-gcc
<span class="k">do
</span>lnt runtest test-suite bpi-f3-rva22u64_v-ReleaseLTO <span class="se">\</span>
<span class="nt">--sandbox</span> /var/lib/lnt/ <span class="se">\</span>
<span class="nt">--test-suite</span><span class="o">=</span>path/to/llvm-test-suite <span class="se">\</span>
<span class="nt">-DTEST_SUITE_SPEC2017_ROOT</span><span class="o">=</span>path/to/cpu2017 <span class="se">\</span>
<span class="nt">--cc</span><span class="o">=</span><span class="nv">$CC</span> <span class="se">\</span>
<span class="nt">--cflags</span><span class="o">=</span><span class="s2">"-O3 -flto -march=rva22u64_v"</span> <span class="se">\</span>
<span class="nt">--cxxflags</span><span class="o">=</span><span class="s2">"-O3 -flto -march=rva22u64_v"</span> <span class="se">\</span>
<span class="nt">--benchmarking-only</span> <span class="se">\</span>
<span class="nt">--build-threads</span><span class="o">=</span>16 <span class="se">\</span>
<span class="c"># cross-compile and run on another machine over ssh</span>
<span class="nt">--toolchain</span><span class="o">=</span>rva22u64_v.cmake <span class="se">\</span>
<span class="nt">--remote-host</span><span class="o">=</span>bpi-f3 <span class="se">\</span>
<span class="c"># fight noise, run each benchmark 3 times on the same core</span>
<span class="nt">--exec-multisample</span><span class="o">=</span>3 <span class="se">\</span>
<span class="nt">--run-under</span><span class="o">=</span><span class="s2">"taskset -c 5"</span> <span class="se">\</span>
<span class="c"># submit the results to a web server for easy viewing</span>
<span class="nt">--submit</span><span class="o">=</span>https://mylntserver.com/submitRun
<span class="k">done</span>
</code></pre></div></div>
<p>This command does a lot of heavy lifting. First off it invokes CMake
to configure a new build of llvm-test-suite and SPEC CPU 2017 with
<code class="language-plaintext highlighter-rouge">-O3 -flto -march=rva22u64_v</code>. But because compiling the benchmarks
on the Banana Pi BPI-F3 would be painfully slow, we’ve specified a
CMake <a href="https://cmake.org/cmake/help/book/mastering-cmake/chapter/Cross%20Compiling%20With%20CMake.html#toolchain-files">toolchain
file</a>
to <strong>cross-compile to riscv64-linux-gnu</strong> from an x86-64 build
machine. Here’s what the toolchain file looks like:</p>
<div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># rva22u64_v.cmake</span>
<span class="nb">set</span><span class="p">(</span>CMAKE_SYSTEM_NAME Linux<span class="p">)</span>
<span class="nb">set</span><span class="p">(</span>CMAKE_C_COMPILER_TARGET riscv64-linux-gnu<span class="p">)</span>
<span class="nb">set</span><span class="p">(</span>CMAKE_CXX_COMPILER_TARGET riscv64-linux-gnu<span class="p">)</span>
<span class="nb">set</span><span class="p">(</span>ARCH riscv64<span class="p">)</span>
</code></pre></div></div>
<p>If you’ve got your cross toolchain sysroots set up in the right place
in <code class="language-plaintext highlighter-rouge">/usr/riscv64-linux-gnu/</code>, things should “just work” and CMake will
magically build RISC-V binaries. On Debian distros you can install the
*-cross packages for this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ apt install libc6-dev-riscv64-cross libgcc-14-dev-riscv64-cross
libstdc++-12-dev-riscv64-cross
</code></pre></div></div>
<p>(You could also use mmdebstrap, or see <a href="https://muxup.com/2024q4/rootless-cross-architecture-debootstrap">Alex Bradbury’s guide to this</a>)</p>
<p>After the benchmarks are built it rsyncs over the binaries to the
remote machine, and then sshes into it to begin running the
benchmarks. It expects the sandbox path where the binaries are
built on the host to also exist on the remote, so something like
<code class="language-plaintext highlighter-rouge">/var/lib/lnt</code> should work across both. The BPI-F3 can also produce
some noisy results, so the <code class="language-plaintext highlighter-rouge">--exec-multisample=3</code> and
<code class="language-plaintext highlighter-rouge">--run-under="taskset -c 5"</code> tell it to run the benchmarks multiple
times and pin them to the same core.</p>
<p>Finally it generates a <code class="language-plaintext highlighter-rouge">report.json</code> file and submits it to the web
server of choice. Navigate to the web interface and you’ll be shown
two “machines”, LNT’s parlance for a specific combination of hardware,
compiler and flags. You should see something like:
<code class="language-plaintext highlighter-rouge">bpif3-rva22u64_v-ReleaseLTO__clang_DEV__riscv64</code> and
<code class="language-plaintext highlighter-rouge">bpif3-rva22u64_v-ReleaseLTO__gcc_DEV__riscv64</code>. Clicking into one of
these machines will allow you to compare it against the other.</p>
<p><img src="http://lukelau.me/assets/lnt-machine-compare.png" alt="LNT UI for comparing results across two machines" /></p>
<h2 id="profiling">Profiling</h2>
<p>Once on the LNT web interface you’ll be presented with a list of
benchmarks with a lot of red percentages beside them. We now know
<em>what</em> is slower, but next we need to know <em>why</em> they’re slower. We
need to profile these benchmarks to see where all the cycles are spent
and to figure out what Clang is doing differently from GCC.</p>
<p>LNT makes this easy: all you need to do is add <code class="language-plaintext highlighter-rouge">--use-perf=profile</code> to
the <code class="language-plaintext highlighter-rouge">lnt runtest</code> invocation and it will perform an additional run of
each benchmark wrapped in <code class="language-plaintext highlighter-rouge">perf record</code>. Profiling impacts run time so
LNT runs it separately to avoid interfering with the final results. If
you want to override the default events that are sampled you can
specify them with <code class="language-plaintext highlighter-rouge">--perf-events=cycles:u,instructions:u,...</code>.</p>
<p>LNT will take care of copying back the collected profiles to the host
machine and encoding them in the report, and in the web interface
you’ll notice a “Profile” button beside the benchmark. Click on that
and you’ll be brought to a side by side comparison of the profiles
from the two machines:</p>
<p><img src="http://lukelau.me/assets/lnt-profile.png" alt="LNT UI for comparing profiles" /></p>
<p>From here you can dive in and see where the benchmark spends most of
its time. Select a function from the dropdown and choose one with a
particularly high percentage: this is the function’s overall share of
whatever counter is active in the top right, such as cycles or
instructions. Then do the same for the other run and you’ll be
presented with the disassemblies side-by-side below. Most importantly,
information about the counters is displayed inline with each
instruction, much like the output of <code class="language-plaintext highlighter-rouge">perf annotate</code>.</p>
<p>You might find the per-instruction counter information to be a
bit too fine-grained, so personally I like to use the “Control-Flow
Graph” view mode in the top left. This groups the instructions into
blocks and lets you see which blocks are the hottest. It also shows
the edges between branches and their destinations which makes
identifying loops a lot easier.</p>
<h2 id="a-real-example">A real example</h2>
<p>Let’s take a look at how we can use LNT’s web interface to identify
something that GCC does but Clang doesn’t (but should). Going back to
the list of SPEC benchmark results we can see 508.namd_r is about 17%
slower, so hopefully we should find something to optimize in
there.</p>
<p>Jumping into the profile we can see there’s a bunch of functions that
all contribute a similar amount to the runtime. We’ll just pick the
hottest one at 14.3%,
<code class="language-plaintext highlighter-rouge">ComputeNonbondedUtil::calc_pair_energy_fullelect(nonbonded*)</code>. It’s a
pretty big function, but in GCC’s profile 71% of the dynamic
instruction count comes from this single, albeit large, block.</p>
<p><img src="http://lukelau.me/assets/508.namd_r-block-gcc.png" alt="A hot block in the profile for GCC's 508.namd_r" /></p>
<p>Looking at Clang’s profile on the opposite side we see a similar block
that accounts for 85% of the function’s instruction count. This
slightly higher proportion is a small hint that the block that Clang’s
producing is sub-optimal. If we take the hint and stare at it for long
enough, one thing starts to stand out is that Clang generates a
handful of <code class="language-plaintext highlighter-rouge">fneg.d</code> instructions which GCC doesn’t:</p>
<pre><code class="language-asm"> fneg.d fa0, fa0
fneg.d ft0, ft0
fneg.d ft2, ft2
fmul.d fa3, ft5, fa3
fmul.d fa0, fa3, fa0
fmul.d ft0, fa3, ft0
fmul.d fa3, fa3, ft2
fmadd.d fa2, fa4, fa2, fa0
fmadd.d ft6, fa4, ft6, ft0
fmadd.d fa4, fa4, ft1, fa3
</code></pre>
<p><code class="language-plaintext highlighter-rouge">fneg.d rd, rs1</code> negates a double and <code class="language-plaintext highlighter-rouge">fmul.d</code> multiplies two
doubles. <code class="language-plaintext highlighter-rouge">fmadd.d rd, rs1, rs2, rs3</code> computes <code class="language-plaintext highlighter-rouge">(rs1*rs2)+rs3</code>, so here
we’re doing some calculation like <code class="language-plaintext highlighter-rouge">(a*b)+(c*-d)</code>.</p>
<p>These <code class="language-plaintext highlighter-rouge">fneg.d</code>s and <code class="language-plaintext highlighter-rouge">fmadd.d</code>s are missing on GCC. Instead it emits
<code class="language-plaintext highlighter-rouge">fmsub.d</code>, which is entirely absent from the Clang code:</p>
<pre><code class="language-asm"> fmul.d fa1,fa4,fa1
fmul.d ft10,fa4,fa5
fmsub.d ft10,ft7,fa0,ft10
fmsub.d fa5,ft7,fa5,fa1
fmul.d fa1,fa4,fa1
fmsub.d fa1,ft7,fa0,fa1
</code></pre>
<p><code class="language-plaintext highlighter-rouge">fmsub.d rd, rs1, rs2, rs3</code> computes <code class="language-plaintext highlighter-rouge">(rs1*rs2)-rs3</code>, so GCC is
instead doing something like <code class="language-plaintext highlighter-rouge">(a*b)-(c*d)</code> and in doing so avoids the
need for the <code class="language-plaintext highlighter-rouge">fneg.d</code>. This sounds like a missed optimization in LLVM,
so let’s take a look at fixing it.</p>
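<p>The rewrite GCC applies is value-preserving: in IEEE 754 arithmetic, negation only flips the sign bit, so <code class="language-plaintext highlighter-rouge">c * -d</code> is exactly <code class="language-plaintext highlighter-rouge">-(c * d)</code>, and adding a negated product is exactly a subtraction. A quick Python sketch checks this (an illustrative model, not the backend's code; it also ignores that a real <code class="language-plaintext highlighter-rouge">fmadd.d</code> rounds once rather than twice, which doesn't affect the identity):</p>

```python
import random

def fmadd(a, b, c):
    # models fmadd.d: (a * b) + c  (a hardware FMA rounds once;
    # the identity checked below holds either way)
    return a * b + c

def fmsub(a, b, c):
    # models fmsub.d: (a * b) - c
    return a * b - c

random.seed(0)
for _ in range(10000):
    a, b, c, d = (random.uniform(-1e6, 1e6) for _ in range(4))
    # Clang's sequence: fneg.d, then fmul.d, then fmadd.d
    via_fneg = fmadd(a, b, c * -d)
    # GCC's sequence: fmul.d, then fmsub.d -- one instruction shorter
    via_fmsub = fmsub(a, b, c * d)
    assert via_fneg == via_fmsub
```

Because IEEE multiplication computes the sign as an XOR of the operand signs, the two instruction sequences produce bit-identical results, so the backend is free to pick the shorter one.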
<h2 id="writing-the-right-fix">Writing the (right) fix</h2>
<p>The LLVM RISC-V scalar backend is pretty mature at this stage so it’s
surprising that we aren’t able to match <code class="language-plaintext highlighter-rouge">fmsub.d</code>. But if you take a
look in <code class="language-plaintext highlighter-rouge">RISCVInstrInfoD.td</code>, you’ll see that the pattern already
exists:</p>
<pre><code class="language-tablegen">// fmsub: rs1 * rs2 - rs3
def : Pat<(any_fma FPR64:$rs1, FPR64:$rs2, (fneg FPR64:$rs3)),
(FMSUB_D FPR64:$rs1, FPR64:$rs2, FPR64:$rs3, FRM_DYN)>;
</code></pre>
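<p>To build intuition for how a pattern like this can exist yet fail to fire, here is a toy tree matcher (illustrative only; this is not how SelectionDAG instruction selection actually works): the pattern only matches when the <code class="language-plaintext highlighter-rouge">fneg</code> is the <em>direct</em> third operand of the fma, so an <code class="language-plaintext highlighter-rouge">fneg</code> buried one level deeper, under an intervening <code class="language-plaintext highlighter-rouge">fmul</code>, does not count:</p>

```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str
    operands: tuple = ()

def matches_fmsub(n):
    # toy version of: Pat<(any_fma $rs1, $rs2, (fneg $rs3)), (FMSUB_D ...)>
    # the fneg must be the direct third operand of the fma
    return (n.op == "fma"
            and len(n.operands) == 3
            and n.operands[2].op == "fneg")

a, b, c, d = (Node("reg") for _ in range(4))

# shape the pattern expects: fma(a, b, fneg(c * d))
direct = Node("fma", (a, b, Node("fneg", (Node("fmul", (c, d)),))))
assert matches_fmsub(direct)

# shape with the fneg nested under the fmul: fma(a, b, fmul(c, fneg(d)))
# structurally different, so the pattern does not fire
nested = Node("fma", (a, b, Node("fmul", (c, Node("fneg", (d,))))))
assert not matches_fmsub(nested)
```

The two trees compute the same value, but tree patterns match on structure, not value, which is why the placement of the negation in the incoming IR matters.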
<p>We’ll need to figure out why this pattern isn’t getting selected, so
let’s start by extracting the build commands so we can look under the
hood and dump the LLVM IR:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>cmake <span class="nt">-B</span> build <span class="nt">-C</span> cmake/caches/ReleaseLTO.cmake <span class="nt">--toolchain</span><span class="o">=</span>...
<span class="nv">$ </span>ninja <span class="nt">-C</span> build 508.namd_r <span class="nt">-t</span> clean
<span class="nv">$ </span>ninja <span class="nt">-C</span> build 508.namd_r <span class="nt">-v</span>
...
<span class="o">[</span>44/45] : <span class="o">&&</span> llvm-project/build.release/bin/clang++ <span class="nt">--target</span><span class="o">=</span>riscv64-linux-gnu <span class="nt">-march</span><span class="o">=</span>rva22u64_v <span class="nt">-O3</span> <span class="nt">-fomit-frame-pointer</span> <span class="nt">-flto</span> <span class="nt">-DNDEBUG</span> <span class="nt">-fuse-ld</span><span class="o">=</span>lld ... <span class="nt">-o</span> External/SPEC/CFP2017rate/508.namd_r/508.namd_r
</code></pre></div></div>
<p>This is an LTO build so the code generation step is actually happening
during link time. To dump the IR we can copy and paste the link
command from the verbose output and append <code class="language-plaintext highlighter-rouge">-Wl,--save-temps</code> to it,
which in turn tells the Clang driver to pass <code class="language-plaintext highlighter-rouge">--save-temps</code> to the
linker<sup id="fnref:lld"><a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html#fn:lld" class="footnote" rel="footnote">2</a></sup>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>llvm-project/build.release/bin/clang++ <span class="nt">-Wl</span>,--save-temps <span class="nt">--target</span><span class="o">=</span>riscv64-linux-gnu <span class="nt">-march</span><span class="o">=</span>rva22u64_v <span class="nt">-O3</span> <span class="nt">-fomit-frame-pointer</span> <span class="nt">-flto</span> <span class="nt">-DNDEBUG</span> <span class="nt">-fuse-ld</span><span class="o">=</span>lld ... <span class="nt">-o</span> External/SPEC/CFP2017rate/508.namd_r/508.namd_r
<span class="nv">$ </span><span class="nb">ls </span>External/SPEC/CFP2017rate/508.namd_r/508.namd_r<span class="k">*</span>
External/SPEC/CFP2017rate/508.namd_r/508.namd_r
External/SPEC/CFP2017rate/508.namd_r/508.namd_r.0.0.preopt.bc
External/SPEC/CFP2017rate/508.namd_r/508.namd_r.0.2.internalize.bc
External/SPEC/CFP2017rate/508.namd_r/508.namd_r.0.4.opt.bc
External/SPEC/CFP2017rate/508.namd_r/508.namd_r.0.5.precodegen.bc
</code></pre></div></div>
<p>The bitcode is dumped at various stages, and
<code class="language-plaintext highlighter-rouge">508.namd_r.0.5.precodegen.bc</code> is the particular stage we’re looking
for. This is after all the middle-end optimizations have run and is as
close as we’ll get before the backend begins. It contains the bitcode
for the entire program though, so let’s find the symbol for the C++
function and extract just that corresponding LLVM IR function:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ llvm-objdump -t 508.namd_r | grep calc_pair_energy_fullelect
...
000000000004562e l F .text 0000000000001c92 _ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded
$ llvm-extract -f 508.namd_r.0.5.precodegen.bc --func _ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded \
| llvm-dis > calc_pair_energy_fullelect.precodegen.ll
</code></pre></div></div>
<p>Now quickly grep the disassembled LLVM IR to see if we can find the
source of the <code class="language-plaintext highlighter-rouge">fneg</code>s:</p>
<div class="language-llvm highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">%316</span> <span class="p">=</span> <span class="k">fneg</span> <span class="kt">double</span> <span class="nv">%315</span>
<span class="nv">%neg</span> <span class="p">=</span> <span class="k">fmul</span> <span class="kt">double</span> <span class="nv">%mul922</span><span class="p">,</span> <span class="nv">%316</span>
<span class="nv">%317</span> <span class="p">=</span> <span class="k">tail</span> <span class="k">call</span> <span class="kt">double</span> <span class="vg">@llvm.fmuladd.f64</span><span class="p">(</span><span class="kt">double</span> <span class="nv">%mul919</span><span class="p">,</span> <span class="kt">double</span> <span class="nv">%314</span><span class="p">,</span> <span class="kt">double</span> <span class="nv">%neg</span><span class="p">)</span>
</code></pre></div></div>
<p>This looks promising. We have an <code class="language-plaintext highlighter-rouge">@llvm.fmuladd</code> that’s being fed by an
<code class="language-plaintext highlighter-rouge">fmul</code> of an <code class="language-plaintext highlighter-rouge">fneg</code>, which is similar to the <code class="language-plaintext highlighter-rouge">(a*b)+(c*-d)</code> pattern in
the resulting assembly. But looking back at our TableGen pattern for
<code class="language-plaintext highlighter-rouge">fmsub.d</code>, we want <code class="language-plaintext highlighter-rouge">(any_fma $rs1, $rs2, (fneg $rs3))</code>, i.e. an
<code class="language-plaintext highlighter-rouge">llvm.fmuladd</code> fed by an <code class="language-plaintext highlighter-rouge">fneg</code> of an <code class="language-plaintext highlighter-rouge">fmul</code>.</p>
<p>One helpful property of floating point arithmetic is that, whilst it’s
generally not associative, negation is exact: all it does is flip the
sign bit, so we can hoist the <code class="language-plaintext highlighter-rouge">fneg</code> out of the <code class="language-plaintext highlighter-rouge">fmul</code>. So we can try to teach
InstCombine to hoist the <code class="language-plaintext highlighter-rouge">fneg</code> outwards like <code class="language-plaintext highlighter-rouge">(fmul x, (fneg y)) ->
(fneg (fmul x, y))</code>. But if we try that out, we’ll see that
InstCombine already does the exact opposite:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Instruction</span> <span class="o">*</span><span class="n">InstCombinerImpl</span><span class="o">::</span><span class="n">visitFNeg</span><span class="p">(</span><span class="n">UnaryOperator</span> <span class="o">&</span><span class="n">I</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Value</span> <span class="o">*</span><span class="n">Op</span> <span class="o">=</span> <span class="n">I</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="c1">// ...</span>
<span class="n">Value</span> <span class="o">*</span><span class="n">OneUse</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">match</span><span class="p">(</span><span class="n">Op</span><span class="p">,</span> <span class="n">m_OneUse</span><span class="p">(</span><span class="n">m_Value</span><span class="p">(</span><span class="n">OneUse</span><span class="p">))))</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">Instruction</span> <span class="o">*</span><span class="n">R</span> <span class="o">=</span> <span class="n">hoistFNegAboveFMulFDiv</span><span class="p">(</span><span class="n">OneUse</span><span class="p">,</span> <span class="n">I</span><span class="p">))</span>
<span class="k">return</span> <span class="n">replaceInstUsesWith</span><span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">R</span><span class="p">);</span>
<span class="c1">// ...</span>
<span class="p">}</span>
<span class="n">Instruction</span> <span class="o">*</span><span class="n">InstCombinerImpl</span><span class="o">::</span><span class="n">hoistFNegAboveFMulFDiv</span><span class="p">(</span><span class="n">Value</span> <span class="o">*</span><span class="n">FNegOp</span><span class="p">,</span>
<span class="n">Instruction</span> <span class="o">&</span><span class="n">FMFSource</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Value</span> <span class="o">*</span><span class="n">X</span><span class="p">,</span> <span class="o">*</span><span class="n">Y</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">match</span><span class="p">(</span><span class="n">FNegOp</span><span class="p">,</span> <span class="n">m_FMul</span><span class="p">(</span><span class="n">m_Value</span><span class="p">(</span><span class="n">X</span><span class="p">),</span> <span class="n">m_Value</span><span class="p">(</span><span class="n">Y</span><span class="p">))))</span> <span class="p">{</span>
<span class="c1">// Push into RHS which is more likely to simplify (const or another fneg).</span>
<span class="c1">// FIXME: It would be better to invert the transform.</span>
<span class="k">return</span> <span class="n">cast</span><span class="o"><</span><span class="n">Instruction</span><span class="o">></span><span class="p">(</span><span class="n">Builder</span><span class="p">.</span><span class="n">CreateFMulFMF</span><span class="p">(</span>
<span class="n">X</span><span class="p">,</span> <span class="n">Builder</span><span class="p">.</span><span class="n">CreateFNegFMF</span><span class="p">(</span><span class="n">Y</span><span class="p">,</span> <span class="o">&</span><span class="n">FMFSource</span><span class="p">),</span> <span class="o">&</span><span class="n">FMFSource</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
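<p>Whichever direction the <code class="language-plaintext highlighter-rouge">fneg</code> moves, the transform is value-preserving. As a quick standalone sanity check (plain Python, not LLVM): in IEEE 754 negation only flips the sign bit, and rounding is symmetric under negation, so <code class="language-plaintext highlighter-rouge">x * -y</code> and <code class="language-plaintext highlighter-rouge">-(x * y)</code> produce bit-identical doubles:</p>

```python
import struct

def bits(x: float) -> int:
    # Raw IEEE 754 bit pattern of a 64-bit double.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

for x, y in [(1.1, 3.7), (-2.5, 0.1), (1e308, 1e-308), (0.5, -0.0)]:
    # Hoisting the negation out of the multiply is exact...
    assert bits(x * -y) == bits(-(x * y))
    # ...because negation just flips the top (sign) bit.
    assert bits(-x) == bits(x) ^ (1 << 63)

print("fneg hoisting is exact")
```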
<p>InstCombine usually has good reasons for canonicalizing certain IR
patterns, so we need to seriously reconsider whether we want to change the
canonical form. InstCombine affects all targets, and it could be the
case that some other backends have patterns that match <code class="language-plaintext highlighter-rouge">(fmul x, (fneg
y))</code>, in which case we don’t want to disturb them. However, for RISC-V we
know what our instruction selection patterns are and what form we
want our incoming IR to be in. So a much better place to handle this
is RISCVISelLowering.cpp, which lets us massage the IR into shape at
the SelectionDAG level, in a way that’s localized to just our
target. “Un-canonicalizing” the IR is a common task that backends end
up performing, and this is what the <a href="https://github.com/llvm/llvm-project/pull/157388">resulting
patch</a> ended up
looking like:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
</span><span class="gi">+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
</span><span class="p">@@ -20248,6 +20248,17 @@</span> SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
return V;
break;
case ISD::FMUL: {
<span class="gi">+ using namespace SDPatternMatch;
+ SDLoc DL(N);
+ EVT VT = N->getValueType(0);
+ SDValue X, Y;
+ // InstCombine canonicalizes fneg (fmul x, y) -> fmul x, (fneg y), see
+ // hoistFNegAboveFMulFDiv.
+ // Undo this and sink the fneg so we match more fmsub/fnmadd patterns.
+ if (sd_match(N, m_FMul(m_Value(X), m_OneUse(m_FNeg(m_Value(Y))))))
+ return DAG.getNode(ISD::FNEG, DL, VT,
+ DAG.getNode(ISD::FMUL, DL, VT, X, Y));
+
</span></code></pre></div></div>
<p>And if we rebuild our benchmark after applying it, we can see the
<code class="language-plaintext highlighter-rouge">fmsub.d</code>s getting matched, saving a couple of instructions:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">@@ -983,18 +983,15 @@</span>
fld ft2, 48(a5)
fld ft3, 64(a5)
fld ft4, 72(a5)
<span class="gd">- fneg.d fa0, fa0
- fneg.d ft0, ft0
- fneg.d ft2, ft2
</span> fmul.d fa3, ft5, fa3
fmul.d fa0, fa3, fa0
fmul.d ft0, fa3, ft0
fmul.d fa3, fa3, ft2
fld ft2, 0(s1)
fmul.d fa4, ft5, fa4
<span class="gd">- fmadd.d fa2, fa4, fa2, fa0
- fmadd.d ft6, fa4, ft6, ft0
- fmadd.d fa4, fa4, ft1, fa3
</span><span class="gi">+ fmsub.d fa2, fa4, fa2, fa0
+ fmsub.d ft6, fa4, ft6, ft0
+ fmsub.d fa4, fa4, ft1, fa3
</span></code></pre></div></div>
<p>All in all this ended up giving a <a href="https://lnt.lukelau.me/db_default/v4/nts/profile/1/1022/1021">1.77% improvement in instruction
count for the 508.namd_r
benchmark.</a>
It’s still not nearly as fast as GCC, but we’re a little bit closer
than before we started.</p>
<h2 id="whats-next">What’s next?</h2>
<p>Hopefully this has given you an overview of how to identify
opportunities for optimization in LLVM, and what a typical fix might
look like. The analysis is really the most important part, but if you
don’t feel like setting up an LNT instance yourself locally, Igalia
runs one at
<a href="https://cc-perf.igalia.com">cc-perf.igalia.com</a><sup id="fnref:llvm-lnt"><a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html#fn:llvm-lnt" class="footnote" rel="footnote">3</a></sup>. We run
llvm-test-suite and SPEC CPU 2017 nightly built with Clang and GCC on
a small set of RISC-V hardware<sup id="fnref:3"><a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html#fn:3" class="footnote" rel="footnote">4</a></sup>, but hopefully to be expanded in
future. Feel free to use it to investigate some of the <a href="https://cc-perf.igalia.com/db_default/v4/nts/70?compare_to=69">differences
between Clang and
GCC</a>
yourself, and maybe you’ll find some inspiration for optimizations.</p>
<p>In the next post in this series I’ll talk about a performance
improvement that recently landed related to cost modelling.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Compiled with <code class="language-plaintext highlighter-rouge">-march=rva22u64_v -O3 -flto</code>, running the train
dataset on a 16GB Banana Pi BPI-F3 (SpacemiT X60), with GCC and
Clang from ToT on 2025-11-25. <a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:lld">
<p>LLD in this case, configurable through CMake with
<code class="language-plaintext highlighter-rouge">-DCMAKE_LINKER_TYPE=LLD</code>. <a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html#fnref:lld" class="reversefootnote">↩</a></p>
</li>
<li id="fn:llvm-lnt">
<p>The LLVM foundation is also in the process of <a href="https://discourse.llvm.org/t/status-of-lnt-llvm-org/88480?u=lukel">rebooting
its canonical public
server</a>,
which should hopefully be up and running in the coming months. <a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html#fnref:llvm-lnt" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Currently it consists of a few Banana Pi BPI-F3s and some HiFive
Premier P550s, the latter of which were generously donated by
RISC-V International. <a href="http://lukelau.me/2025/12/10/closing-the-gap-pt1.html#fnref:3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div> Luke Lauhttp://lukelau.me/Igalia WebKit Team: WebKit Igalia Periodical #50https://blogs.igalia.com/webkit/blog/2025/wip-50/2025-12-08T20:26:32+00:00
<p>Update on what happened in WebKit in the week from December 1 to December 8.</p>
<p>
In this edition of the periodical we have further advancements on
the Temporal implementation, support for Vivante super-tiled format,
and an adaptation of the DMA-BUF formats code to the Android port.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<h3 id="javascriptcore-fish">JavaScriptCore 🐟</h3>
<div class="wip-description">
<p>The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://github.com/WebKit/WebKit/pull/54717">Implemented</a> the <code>toString</code>, <code>toJSON</code>, and <code>toLocaleString</code> methods for <code>PlainYearMonth</code> objects in JavaScriptCore's implementation of Temporal.</p>
</div>
<h3 id="graphics-frame-photo">Graphics 🖼️</h3>
<div class="wip-item">
<p><code>BitmapTexture</code> and <code>TextureMapper</code> <a rel="external" href="https://commits.webkit.org/303849@main">were prepared</a> to handle textures where the logical size (e.g. 100×100) differs from the allocated size (e.g. 128×128) due to alignment requirements. This allowed us <a rel="external" href="https://commits.webkit.org/303900@main">to add support</a> for using memory-mapped GPU buffers in the Vivante super-tiled format available on i.MX platforms. Set <code>WEBKIT_SKIA_USE_VIVANTE_SUPER_TILED_TILE_TEXTURES=1</code> to activate it at runtime.</p>
</div>
<h2 id="wpe-webkit-pager">WPE WebKit 📟</h2>
<h3 id="wpe-platform-api-jigsaw">WPE Platform API 🧩</h3>
<div class="wip-description">
<p>New, modern platform API that supersedes usage of libwpe and WPE backends.</p>
</div>
<div class="wip-item">
<p>The <code>WPEBufferDMABufFormats</code> class has been <a rel="external" href="https://commits.webkit.org/303891@main">renamed</a> to <code>WPEBufferFormats</code>, as it can be used in situations where mechanisms other than DMA-BUF may be used for buffer sharing—on Android targets <a rel="external" href="https://developer.android.com/ndk/reference/group/a-hardware-buffer">AHardwareBuffer</a> is used instead, for example. The naming change also involved <code>WPEBufferFormatsBuilder</code> (renamed from <code>WPEBufferDMABufFormatsBuilder</code>), and methods and signals in other classes that use these types. Other than the renames, there is no change in functionality.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div> Igalia WebKit Teamhttps://blogs.igalia.com/webkitEnrique Ocaña: Meow: Process log text files as if you could make cat speakhttps://eocanha.org/blog/?p=6842025-12-05T11:16:07+00:00
<img class="face" src="/images/eocanha.png" width="100" height="100" alt="" align="right" style="float: right" />
<p>Some years ago I had mentioned <a href="https://eocanha.org/blog/2021/05/25/gstreamer-webkit-debugging-by-using-external-tools-2-2/">some command line tools</a> I used to analyze and find useful information on GStreamer logs. I’ve been using them consistently along all these years, but some weeks ago I thought about unifying them in a single tool that could provide more flexibility in the mid term, and also as an excuse to unrust my Rust knowledge a bit. That’s how I wrote Meow, a tool to make <code>cat</code> speak (that is, to provide meaningful information).</p>
<p>The idea is that you can <code>cat</code> a file through <code>meow</code> and apply the filters, like this:</p>
<p><code>cat /tmp/log.txt | meow appsinknewsample n:V0 n:video ht: \<br /> ft:-0:00:21.466607596 's:#([A-Za-z][A-Za-z]*/)*#'</code></p>
<p>which means “select those lines that contain <code>appsinknewsample</code> (with case insensitive matching), but don’t contain <code>V0</code> nor <code>video</code> (that is, by exclusion, only those that contain audio, probably because we’ve analyzed both and realized that we should focus on audio for our specific problem), highlight the different thread ids, only show those lines with timestamp lower than 21.46 sec, and change strings like <code>Source/WebCore/platform/graphics/gstreamer/mse/AppendPipeline.cpp</code> to become just <code>AppendPipeline.cpp</code>”, to get an output as shown in this terminal screenshot:</p>
<figure class="wp-block-image size-large"><a href="https://eocanha.org/blog/wp-content/uploads/2025/12/image.png"><img width="1024" height="254" src="https://eocanha.org/blog/wp-content/uploads/2025/12/image-1024x254.png" alt="Screenshot of a terminal output showing multiple log lines. Some of them have the word " /></a></figure>
<p>Cool, isn’t it? After all, I’m convinced that the answer to any GStreamer bug is always hidden in the logs (or will be, as soon as I add “<em>just a couple of log lines more, bro</em>” <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f92d.png" alt="🤭" class="wp-smiley" />).</p>
<p>Currently, meow supports this set of manipulation commands:</p>
<ul><li><strong>Word filter and highlighting by regular expression</strong> (<code>fc:REGEX</code>, or just <code>REGEX</code>): Every expression will highlight its matched words in a different color.</li><li><strong>Filtering without highlighting</strong> (<code>fn:REGEX</code>): Same as <code>fc:</code>, but without highlighting the matched string. This is useful for those times when you want to match lines that have two expressions (<code>E1</code>, <code>E2</code>) but the highlighting would pollute the line too much. In those cases you can use a regex such as <code>E1.*E2</code> and then highlight the subexpressions manually later with an <code>h:</code> rule.</li><li><strong>Negative filter</strong> (<code>n:REGEX</code>): Selects only the lines that don’t match the regex filter. No highlighting.</li><li><strong>Highlight with no filter</strong> (<code>h:REGEX</code>): Doesn’t discard any line, just highlights the specified regex.</li><li><strong>Substitution</strong> (<code>s:/REGEX/REPLACE</code>): Replaces one pattern with another. Any other delimiter character can be used instead of /, if that’s more convenient to the user (for instance, using # when dealing with expressions to manipulate paths).</li><li><strong>Time filter</strong> (<code>ft:TIME-TIME</code>): Assuming the lines start with a GStreamer log timestamp, this filter selects only the lines between the target start and end time. Any of the time arguments (or both) can be omitted, but the <code>-</code> delimiter must be present. Specifying multiple time filters will generate matches that fit in any of the time ranges, but overlapping ranges can trigger undefined behaviour.</li><li><strong>Highlight threads</strong> (<code>ht:</code>): Assuming a GStreamer log, where the thread id appears as the third word in the line, highlights each thread in a different color.</li></ul>
<p>The <code>REGEX</code> pattern is a regular expression. All the matches are case insensitive. When used for substitutions, capture groups can be defined as <code>(?&lt;<span class="has-inline-color has-medium-pink-color">CAPTURE_NAME</span>&gt;REGEX)</code>.</p>
<p>The <code>REPLACE</code>ment string is the text that the <code>REGEX</code> will be replaced by when doing substitutions. Text captured by a named capture group can be referred to by <code>${<span class="has-inline-color has-medium-pink-color">CAPTURE_NAME</span>}</code>.</p>
<p>The <code>TIME</code> pattern can be any sequence of numbers, <code>:</code>, or <code>.</code> characters. Typically it will be a GStreamer timestamp (e.g. 0:01:10.881123150), but it can actually be any other numerical sequence. Times are compared lexicographically, so it’s important that all of them have the same string length.</p>
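<p>A quick illustration (plain Python, not meow itself) of why equal string lengths matter when times are compared lexicographically:</p>

```python
# Same-length timestamps: string order matches numeric order.
assert "0:00:24.787450146" < "0:01:10.881123150"

# Uneven lengths: string order and numeric order disagree,
# because strings compare character by character and '9' > '1'.
assert "9.5" > "10.2"
assert 9.5 < 10.2

# Zero-padding to a common width restores the expected ordering.
assert "09.5" < "10.2"
print("ok")
```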
<p>The filtering algorithm has a custom set of priorities for operations, so that they get executed in an intuitive order. For instance, a sequence of filter matching expressions (<code>fc:</code>, <code>fn:</code>) will have the same priority (that is, any of them will let a text line pass if it matches, not forbidding any of the lines already allowed by sibling expressions), while a negative filter will only be applied on the results left by the sequence of filters before it. Substitutions will be applied at their specific position (not before or after), and will therefore modify the line in a way that can alter the matching of subsequent filters. In general, the user doesn’t have to worry about any of this, because the rules are designed to generate the result that you would expect.</p>
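<p>As a rough illustration of those priorities, here is a minimal Python sketch (hypothetical, not meow’s actual Rust implementation): <code>fc:</code>/<code>fn:</code> filters OR together, an <code>n:</code> filter vetoes matching lines, and an <code>s:</code> substitution runs at its own position so later rules see the modified line:</p>

```python
import re

def process(line, rules):
    # Hypothetical sketch of meow's rule priorities, not the real tool:
    # fc:/fn: filters OR together, n: vetoes matching lines, and s:
    # substitutions apply in place so later rules see the modified line.
    saw_positive = keep = False
    for kind, arg in rules:
        if kind in ("fc", "fn"):          # positive filters OR together
            saw_positive = True
            keep = keep or re.search(arg, line, re.IGNORECASE) is not None
        elif kind == "n":                 # negative filter: veto the line
            if re.search(arg, line, re.IGNORECASE):
                return None
        elif kind == "s":                 # substitution at this position
            pattern, replacement = arg
            line = re.sub(pattern, replacement, line, flags=re.IGNORECASE)
    return line if (keep or not saw_positive) else None

lines = ["000 one small orange", "003 two apples", "005 one big orange"]
rules = [("fc", "one"), ("n", "small"), ("s", (r"orange", "ORANGE"))]
kept = [out for out in (process(l, rules) for l in lines) if out is not None]
print(kept)  # only "one" lines without "small", with the substitution applied
```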
<p>Now some practical examples:</p>
<p><strong>Example 1</strong>: Select lines with the word “one”, or the word “orange”, or a number, highlighting each pattern in a different color except the number, which will have no color:<br /><br /><code>$ cat file.txt | meow one fc:orange 'fn:[0-9][0-9]*'<br />000 <span class="has-inline-color has-medium-pink-color">one</span> small <span class="has-inline-color has-yellow-color">orange</span><br />005 <span class="has-inline-color has-medium-pink-color">one</span> big <span class="has-inline-color has-yellow-color">orange</span></code></p>
<p><strong>Example 2</strong>: Assuming a listing of picture filenames, select filenames not ending in “jpg” nor “jpeg”, and append “.bak” to the base name, preserving the extension at the end:<br /><br /><code>$ cat list.txt | meow 'n:jpe?g' \</code><br /><code> </code> <code>'s:#^(?&lt;f&gt;[^.]*)(?&lt;e&gt;[.].*)$#${f}.bak${e}'<br />train.bak.png<br />sunset.bak.gif</code></p>
<p><strong>Example 3</strong>: Only print the log lines with times between 0:00:24.787450146 and 0:00:24.790741865 or those at 0:00:30.492576587 or after, and highlight every thread in a different color:<br /><br /><code>$ cat log.txt | meow ft:0:00:24.787450146-0:00:24.790741865 \<br /> </code> <code>ft:0:00:30.492576587- ht:<br />0:00:24.787450146 739 <span class="has-inline-color has-medium-pink-color">0x1ee2320</span> DEBUG …<br />0:00:24.790382735 739 <span class="has-inline-color has-yellow-color">0x1f01598</span> INFO …<br />0:00:24.790741865 739 <span class="has-inline-color has-medium-pink-color">0x1ee2320</span> DEBUG …<br />0:00:30.492576587 739 <span class="has-inline-color has-yellow-color">0x1f01598</span> DEBUG …<br />0:00:31.938743646 739 <span class="has-inline-color has-yellow-color">0x1f01598</span> ERROR …</code></p>
<p>This is only the beginning. I have great ideas for this new tool (as time allows), such as support for parentheses (so that expressions can be grouped), or call stack indentation on logs generated by tracers, in a similar way to what Alicia’s <a href="https://github.com/ntrrgc/dotfiles/blob/master/bin/gst-log-indent-tracers"><code>gst-log-indent-tracers</code> tool</a> does. I might also predefine some common subexpressions for use in regular expressions, such as the ones to match paths (so that the user doesn’t have to think about them and reinvent the wheel every time). Anyway, these are only ideas. Only time and hyperfocus slots will tell…</p>
<p>By now, you can <a href="https://github.com/eocanha/meow">find the source code on my github</a>. Meow! <img width="128" height="128" class="wp-image-698" src="https://eocanha.org/blog/wp-content/uploads/2025/12/blobcat.png" alt="" /></p> eocanhahttps://eocanha.org/blogBrian Kardell: Standards Queueshttps://bkardell.com/blog/Queues.html2025-12-04T05:00:00+00:00
<h1 class="contextual-heading">Standards Queues</h1>
<p class="segue">The hardest part of web standards isn’t even the technology — it’s the queues. And that’s the real problem I keep coming back to.</p>
<h2 class="contextual-heading">Pools, Queues, and Bottlenecks</h2>
<p>As programmers, we’re familiar with these kinds of problems: if things enter faster than they leave, they back up. We often need to prioritize among the backlog. The standards process is like several of those queues stacked together.</p>
<p>Ideas enter the system far faster than they leave — and they can come from anywhere. But to progress, you need implementers. They are finite, already busy, and often advancing their own priorities. On top of that, every proposal competes for wide review in privacy, security, architecture, accessibility, and internationalization. Each of those specialties is even more finite, even more busy, and even more backed up.</p>
<p>So an idea lands in hundreds — even thousands — of inboxes, waiting for attention. We might not even notice it as it whips past among all the others. Even if we do, it might just get starred in email or left open in a tab for “later.” Sometimes that idea is a book, or an explainer, or suddenly has 20 replies. Instead of needing five minutes to read and consider, it becomes intimidating.</p>
<p>At some point it just sits. It might wait weeks, months, or even years before someone comments. Why? Because everyone has jobs with other tasks. The queues are full.</p>
<p>And the longer it sits, the more things change around it. The more it unloads from memory. The more intimidating it becomes to return to. It has to get through a whole lot of asynchronous back-and-forth between implementers, spec writers, and test writers before reaching baseline usability.</p>
<p>Along the way, if coordination isn’t strong (and historically it hasn’t been), once work is invested it’s hard to throw away. It’s hard to propose breaking changes or add stop energy.</p>
<h2 class="contextual-heading">Real Impacts</h2>
<p>This is why something like :focus-visible can take seven years. Realistically, it required only a few days of effective discussion. The tests and development weren’t that hard.</p>
<p>The hard part was agreeing on what it should do and which tests it should pass. Most of that difficulty came from the fact that you couldn’t get everyone to sit down and focus concurrently. Implementations — and thus real focus — were years apart.</p>
<h2 class="contextual-heading">Checking Fitness</h2>
<p>Getting “unstuck” isn’t just about moving something forward. One major appeal of the standards process is wide review, but for this to work we need ways for things to fail early enough to shift efforts.</p>
<p>Sometimes failure happens after months or even years. That’s frustrating and demoralizing. It’s like standing in a long DMV line, inching forward, only to discover at the end that you’re in the wrong building.</p>
<p>All of this is made worse by the fact that queues keep getting fuller.</p>
<h2 class="contextual-heading">Things that help</h2>
<h3 class="contextual-heading">Interop</h3>
<p>The Interop project illustrates the very end of the process, and in many ways the simplest.</p>
<p>Without intervention, each implementer historically built their own priority queue from all possible shortcomings of their browser engine. There’s a huge pool of things to choose from. I’ve written before about how WPT and the dashboard aren’t the best way to view this, but there are currently almost 23k subtests that fail in every browser (or almost 11k that fail and aren’t marked tentative).</p>
<p>Interop coordinates efforts to choose an achievable set of things from this gigantic pool that meet strict criteria: all that’s left is implementation work. It’s been hugely successful because it ensures delivery. It also helps us deliver early when people are excited about common priorities. In those cases, the impact is huge — we can go from mostly words to three interoperable implementations in one year. Amazing.</p>
<p>Still, every year a huge number of things remain in the pool that we can’t afford to take up. The pool keeps growing.</p>
<h3 class="contextual-heading">Joint Meetings</h3>
<p>The WHATWG has started holding regular joint meetings with groups like OpenUI and CSSWG. This is valuable because it allows the right people to agree on an agenda and discuss directly, rather than leaving issues to sit unnoticed or requiring endless pings for attention.</p>
<p>W3C's TPAC is an annual event with five days of meetings (both W3C and WHATWG), many of them joint with wide-review specialists. These are dedicated times to get a lot of people in the same rooms for extended periods. The availability for hallway conversations also matters a lot: You can corner people there in ways that are much harder when everyone is remote. More progress happens at TPAC than in half the rest of the year combined.</p>
<p>Timely coordination — and investment to make it plausible — is still the single biggest problem we face in standards. I'd love to see us find ways to improve that.</p> Brian Kardellhttp://bkardell.com/Igalia WebKit Team: WebKit Igalia Periodical #49https://blogs.igalia.com/webkit/blog/2025/wip-49/2025-12-02T14:15:36+00:00
<p>Update on what happened in WebKit in the week from November 24 to December 1.</p>
<p>
The main highlights for this week are the completion of `PlainMonthDay`
in Temporal, moving networking access for GstWebRTC to the WebProcess,
and Xbox Cloud Gaming now working in the GTK and WPE ports.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<h3 id="multimedia-movie-camera">Multimedia 🎥</h3>
<div class="wip-description">
<p>GStreamer-based multimedia support for WebKit, including (but not limited to)
playback, capture, WebAudio, WebCodecs, and WebRTC.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://xbox.com/play">Xbox Cloud Gaming</a> is now usable in WebKitGTK and WPE
with the GstWebRTC backend; we had to fix <a rel="external" href="https://commits.webkit.org/303668@main">non-spec-compliant ICE candidate
handling</a> and add a <a rel="external" href="https://commits.webkit.org/303669@main">WebRTC quirk
forcing <code>max-bundle</code> in PeerConnections</a>
to make it work. Happy cloud gaming!</p>
</div>
<div class="wip-item">
<p>Support for remote inbound RTP statistics was improved in
<a rel="external" href="https://commits.webkit.org/303671@main">303671@main</a>, we now properly report
<code>framesPerSecond</code> and <code>totalDecodeTime</code> metrics, those fields are used in the
Xbox Cloud Gaming service to show live stats about the connection and video
decoder performance in an overlay.</p>
</div>
<div class="wip-item">
<p>The GstWebRTC backend now relies on
<a rel="external" href="https://github.com/ystreet/librice">librice</a> for its
<a rel="external" href="https://en.wikipedia.org/wiki/Interactive_Connectivity_Establishment">ICE</a>.
The Sans-IO architecture of librice allows us to keep the WebProcess sandboxed
and to route WebRTC-related UDP and (eventually) TCP packets using the
NetworkProcess. This work landed in
<a rel="external" href="https://commits.webkit.org/303623@main">303623@main</a>. The GNOME SDK should
also soon <a rel="external" href="https://gitlab.gnome.org/GNOME/gnome-build-meta/-/merge_requests/4146">ship
librice</a>.</p>
</div>
<div class="wip-item">
<p>Support for seeking in <code>loop</code>ing videos was fixed in
<a rel="external" href="https://commits.webkit.org/303539@main">303539@main</a>.</p>
</div>
<h3 id="javascriptcore-fish">JavaScriptCore 🐟</h3>
<div class="wip-description">
<p>The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.</p>
</div>
<div class="wip-item">
<p>Implemented the <a rel="external" href="https://github.com/WebKit/WebKit/pull/54563"><code>valueOf</code></a> and
<a rel="external" href="https://github.com/WebKit/WebKit/pull/54342"><code>toPlainDate</code></a> for <code>PlainMonthDay</code> objects.
This completes the implementation of
<a rel="external" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Temporal">Temporal</a>
<code>PlainMonthDay</code> objects in JSC!</p>
</div>
<h2 id="webkitgtk-desktop">WebKitGTK 🖥️</h2>
<div class="wip-item">
<p>The GTK port has <a rel="external" href="https://commits.webkit.org/303532@main">gained support</a> for
interpreting touch input as <a rel="external" href="https://developer.mozilla.org/en-US/docs/Web/API/Pointer_events">pointer
events</a>. This
matches the behaviour of other browsers by following the corresponding
specifications.</p>
</div>
<h2 id="wpe-webkit-pager">WPE WebKit 📟</h2>
<div class="wip-item">
<p><a rel="external" href="https://github.com/WebKit/WebKit/pull/54398">Fixed</a> an issue that prevented
WPE from processing further input events after receiving a secondary mouse
button press.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/303518@main">Fixed</a> an issue that caused right
mouse button clicks to prevent processing of further pointer events.</p>
</div>
<h3 id="wpe-platform-api-jigsaw">WPE Platform API 🧩</h3>
<div class="wip-description">
<p>New, modern platform API that supersedes usage of libwpe and WPE backends.</p>
</div>
<div class="wip-item">
<p>We landed a <a rel="external" href="https://commits.webkit.org/303531@main">patch</a> to add a new signal
in <code>WPEDisplay</code> to notify when the connection to the native display has been lost.</p>
</div>
<h2 id="infrastructure-construction-site">Infrastructure 🏗️</h2>
<div class="wip-item">
<p>Modernized the CMake modules used to find
<a rel="external" href="https://commits.webkit.org/303333@main">libtasn1</a>,
<a rel="external" href="https://commits.webkit.org/303341@main">libsecret</a>,
<a rel="external" href="https://commits.webkit.org/303118@main">libxkbcommon</a>,
<a rel="external" href="https://commits.webkit.org/303127@main">libhyphen</a>, and
<a rel="external" href="https://commits.webkit.org/303387@main">Enchant</a> libraries.</p>
<p>Note that this work removed the support for building against
<a rel="external" href="https://rrthomas.github.io/enchant/">Enchant</a> 1.x, and only version 2 will be
supported. The first stable release to require Enchant 2.x will be 2.52.0 due
in March 2026. Major Linux and BSD distributions have included Enchant 2
packages for years, and therefore this change is not expected to cause any
trouble. The Enchant library is used by the GTK port for spell checking.</p>
</div>
<h2 id="community-events-handshake">Community & Events 🤝</h2>
<div class="wip-item">
<p>We have published <a rel="external" href="https://conflor.es/blog/2025-11-27-interop-and-mathml/">an
article</a> detailing our
work making <a rel="external" href="https://developer.mozilla.org/en-US/docs/Web/MathML">MathML</a>
interoperable across browser engines! It has live demonstrations and feature
tables with our progress on WebKit support.</p>
</div>
<div class="wip-item">
<p>We have published new blog posts highlighting the most important changes in
both <a rel="external" href="https://wpewebkit.org/blog/2025-11-27-wpewebkit-2.50.html">WPE WebKit</a>
and <a rel="external" href="https://webkitgtk.org/2025/11/26/webkitgtk-2.50.html">WebKitGTK</a> 2.50.
Enjoy!</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div> Igalia WebKit Teamhttps://blogs.igalia.com/webkitAlex Bradbury: QEMU-based instruction execution countinghttps://muxup.com/2025q4/qemu-based-instruction-execution-counting2025-12-02T12:00:00+00:00
<p>Although analysing performance by way of instruction counting has obvious
limitations, it can be helpful (especially when combined with appropriate
analysis scripts) to get rapid feedback on the impact of code generation
changes or to explore hypotheses about why code from one compiler might be
performing differently from another - for instance, by looking at instruction
mix in the most executed translation blocks. In this post we'll look at how to
capture the necessary data to perform such an analysis using a QEMU plugin.
Future posts will give details of the analysis scripts I've used, and walk
through an example or two of putting them to use.</p>
<h2 id="modifying-qemu"><a href="https://muxup.com/feed.xml#modifying-qemu" class="anchor" tabindex="-1"></a>Modifying QEMU</h2>
<p>Over the past few years, QEMU's plugin API has developed a fair bit. QEMU
includes several plugins, and <code>hotblocks</code> provides <em>almost</em> what we want but
doesn't allow configurability of the number of blocks it will print
information on. I submitted a <a href="https://lore.kernel.org/qemu-devel/cf5a00136738b981a12270b76572e8d502daf208.1753857212.git.asb@igalia.com/T/">small patch
series</a>
(and <a href="https://lore.kernel.org/qemu-devel/[email protected]/">submitted it a second
time</a>)
addressing this and other minor issues found along the way. The series has now
been <a href="https://lore.kernel.org/qemu-devel/[email protected]/">accepted by the
maintainer</a>.</p>
<p>To build QEMU with this patch:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/qemu/qemu <span>&&</span> <span>cd</span> qemu
git checkout v10.1.2
cat - <span><<'EOF' > hotblocks.patch</span>
<span>index 98404b6885..8ecf033997 100644</span>
<span>--- a/contrib/plugins/hotblocks.c</span>
<span>+++ b/contrib/plugins/hotblocks.c</span>
<span>@@ -73,28 +73,29 @@ static void exec_count_free(gpointer key, gpointer value, gpointer user_data)</span>
<span> static void plugin_exit(qemu_plugin_id_t id, void *p)</span>
<span> {</span>
<span> g_autoptr(GString) report = g_string_new("collected ");</span>
<span>- GList *counts, *it;</span>
<span>+ GList *counts, *sorted_counts, *it;</span>
<span> int i;</span>
<span> g_string_append_printf(report, "%d entries in the hash table\n",</span>
<span> g_hash_table_size(hotblocks));</span>
<span> counts = g_hash_table_get_values(hotblocks);</span>
<span>- it = g_list_sort_with_data(counts, cmp_exec_count, NULL);</span>
<span>+ sorted_counts = g_list_sort_with_data(counts, cmp_exec_count, NULL);</span>
<span>- if (it) {</span>
<span>+ if (sorted_counts) {</span>
<span> g_string_append_printf(report, "pc, tcount, icount, ecount\n");</span>
<span>- for (i = 0; i < limit && it->next; i++, it = it->next) {</span>
<span>+ for (i = 0, it = sorted_counts; (limit == 0 || i < limit) && it;</span>
<span>+ i++, it = it->next) {</span>
<span> ExecCount *rec = (ExecCount *) it->data;</span>
<span> g_string_append_printf(</span>
<span>- report, "0x%016"PRIx64", %d, %ld, %"PRId64"\n",</span>
<span>+ report, "0x%016"PRIx64", %d, %ld, %"PRIu64"\n",</span>
<span> rec->start_addr, rec->trans_count,</span>
<span> rec->insns,</span>
<span> qemu_plugin_u64_sum(</span>
<span> qemu_plugin_scoreboard_u64(rec->exec_count)));</span>
<span> }</span>
<span>- g_list_free(it);</span>
<span>+ g_list_free(sorted_counts);</span>
<span> }</span>
<span> qemu_plugin_outs(report->str);</span>
<span>@@ -170,6 +171,13 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,</span>
<span> fprintf(stderr, "boolean argument parsing failed: %s\n", opt);</span>
<span> return -1;</span>
<span> }</span>
<span>+ } else if (g_strcmp0(tokens[0], "limit") == 0) {</span>
<span>+ char *endptr = NULL;</span>
<span>+ limit = g_ascii_strtoull(tokens[1], &endptr, 10);</span>
<span>+ if (endptr == tokens[1] || *endptr != '\0') {</span>
<span>+ fprintf(stderr, "unsigned integer parsing failed: %s\n", opt);</span>
<span>+ return -1;</span>
<span>+ }</span>
<span> } else {</span>
<span> fprintf(stderr, "option parsing failed: %s\n", opt);</span>
<span> return -1;</span>
<span>diff --git a/docs/about/emulation.rst b/docs/about/emulation.rst</span>
<span>index 4a7d1f4178..e8793b0f9c 100644</span>
<span>--- a/docs/about/emulation.rst</span>
<span>+++ b/docs/about/emulation.rst</span>
<span>@@ -463,6 +463,18 @@ Example::</span>
<span> 0x000000004002b0, 1, 4, 66087</span>
<span> ...</span>
<span>+Behaviour can be tweaked with the following arguments:</span>
<span>+</span>
<span>+.. list-table:: Hot Blocks plugin arguments</span>
<span>+ :widths: 20 80</span>
<span>+ :header-rows: 1</span>
<span>+</span>
<span>+ * - Option</span>
<span>+ - Description</span>
<span>+ * - inline=true|false</span>
<span>+ - Use faster inline addition of a single counter.</span>
<span>+ * - limit=N</span>
<span>+ - The number of blocks to be printed. (Default: N = 20, use 0 for no limit).</span>
<span> Hot Pages</span>
<span> .........</span>
<span>EOF</span>
patch -p1 < hotblocks.patch
./configure --prefix<span>=</span><span>$(pwd)</span>/inst --target-list<span>=</span><span>"riscv32-linux-user riscv64-linux-user"</span>
make -j<span>$(</span>nproc<span>)</span>
<span>cd</span> ..
</code></pre></div>
<h2 id="using-this-plugin-to-capture-statistics-from-running-a-binary-under-qemu-user"><a href="https://muxup.com/feed.xml#using-this-plugin-to-capture-statistics-from-running-a-binary-under-qemu-user" class="anchor" tabindex="-1"></a>Using this plugin to capture statistics from running a binary under qemu-user</h2>
<p>Assuming you have an <a href="https://muxup.com/2024q4/rootless-cross-architecture-debootstrap">appropriate
sysroot</a>, you can
run a binary and have the execution information emitted to stderr by doing
something like:</p>
<div class="highlight"><pre><span></span><code><span>QEMUDIR=$HOME</span>/qemu/build
<span>SYSROOT=$HOME</span>/rvsysroot
<span>$QEMUDIR</span>/qemu-riscv64 <span>\</span>
-L <span>$SYSROOT</span> <span>\</span>
-plugin <span>$QEMUDIR</span>/contrib/plugins/libhotblocks.so,limit<span>=</span><span>0</span>,inline<span>=</span>on <span>\</span>
-d plugin,nochain <span>\</span>
my_rv64_binary
</code></pre></div>
<p>This produces output like:</p>
<pre><code>collected 2229 entries in the hash table
pc, tcount, icount, ecount
0x00007fffee7012ba, 1, 1, 3737
0x00007fffee7012be, 1, 3, 3737
0x00007ffff741e738, 1, 23, 1074
0x00007fffee71bb38, 1, 5, 884
0x00007ffff741bb2e, 1, 11, 662
...
</code></pre>
<p>This listing indicates the address of the translation block, the number of
times it's been translated, the number of instructions it contains, and the
number of times it was executed. Note that a translation block is not the same
as a basic block in the compiler. A translation block can span multiple basic
blocks in the case of fallthrough, and this can also mean an instruction may
show up in multiple translation blocks.</p>
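<p>If all you want from this listing is an overall dynamic instruction count, it
can be recovered by summing icount times ecount over the data lines. A minimal
Python sketch (just illustrative arithmetic over the example output above, not
part of the plugin):</p>

```python
# Parse hotblocks-style output and sum icount * ecount over all
# translation blocks to get a total dynamic instruction count.
sample = """collected 2229 entries in the hash table
pc, tcount, icount, ecount
0x00007fffee7012ba, 1, 1, 3737
0x00007fffee7012be, 1, 3, 3737
0x00007ffff741e738, 1, 23, 1074
"""

total = 0
for line in sample.splitlines():
    parts = [p.strip() for p in line.split(",")]
    # Skip the summary and header lines; data lines have four fields
    # and a numeric icount field.
    if len(parts) == 4 and parts[2].isdigit():
        total += int(parts[2]) * int(parts[3])

print(total)  # 1*3737 + 3*3737 + 23*1074 = 39650
```

<p>The comparison script later in this post applies the same calculation across
whole directories of captured output.</p>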
<p>At least for my use cases, I need something a bit more involved than this. In
order to add collection of these statistics to an existing benchmark harness I
need a wrapper script that transparently collects these statistics to a file.
It's also helpful to capture the runtime address of executable mappings for
loaded libraries, allowing translation blocks to be attributed easily to
either the binary itself or <code>libc</code>, <code>libm</code> etc. We have <code>gdb</code> connect to
QEMU's gdbserver in order to dump those mappings. Do ensure you're using a
recent version of QEMU (the version suggested in the patch application
instructions is definitely good) for this as I wasted quite some time running
into a <a href="https://github.com/qemu/qemu/commit/8b647bd352505234cab2acd2422aba183a1aa1fd">bug with file descriptor
numbers</a>
that caused odd breakage.</p>
<p>This <code>qemu-forwarder.sh</code> script will capture the plugin's output in a
<code>.qemu_out</code> file and the mappings in a <code>.map</code> file, both of which can be later
consumed by a detailed analysis script.</p>
<div class="highlight"><pre><span></span><code><span>#!/bin/sh</span>
<span>QEMUDIR=$HOME</span>/qemu/build
<span>SYSROOT=$HOME</span>/rvsysroot
<span>QEMU=</span><span>"</span><span>$QEMUDIR</span><span>/qemu-riscv64 \</span>
<span> -L </span><span>$SYSROOT</span><span> \</span>
<span> -plugin </span><span>$QEMUDIR</span><span>/contrib/plugins/libhotblocks.so,limit=0,inline=on \</span>
<span> -d plugin,nochain"</span>
<span>SUFFIX=</span><span>""</span>
<span>if</span> <span>[</span> -e <span>"</span><span>$1</span><span>.qemu_out"</span> <span>]</span>; <span>then</span>
<span>NUM=</span><span>1</span>
<span>while</span> <span>[</span> -e <span>"</span><span>$1</span><span>.qemu_out.</span><span>$NUM</span><span>"</span> <span>]</span>; <span>do</span>
<span>NUM=</span><span>$((</span><span>NUM</span> <span>+</span> <span>1</span><span>))</span>
<span>done</span>
<span>SUFFIX=</span><span>".</span><span>$NUM</span><span>"</span>
<span>fi</span>
<span>GDB_SOCK=</span><span>$(</span>mktemp -u<span>)</span>
setarch <span>$(</span>uname -m<span>)</span> -R <span>$QEMU</span> -g <span>$GDB_SOCK</span> -D <span>$1</span>.qemu_out<span>$SUFFIX</span> <span>"</span><span>$@</span><span>"</span> &
<span>QEMU_PID=$!</span>
<span>RETRY_COUNT=</span><span>0</span>
<span>while</span> ! <span>[</span> -e <span>"</span><span>$GDB_SOCK</span><span>"</span> <span>]</span>; <span>do</span>
<span>RETRY_COUNT=</span><span>$((</span><span>RETRY_COUNT</span> <span>+</span> <span>1</span><span>))</span>
<span>if</span> <span>[</span> <span>$RETRY_COUNT</span> -eq <span>10</span> <span>]</span>; <span>then</span>
<span>echo</span> <span>"Timed out waiting for gdb socket to be created"</span>
<span>exit</span> <span>1</span>
<span>fi</span>
sleep <span>0</span>.1
<span>if</span> ! <span>kill</span> -0 <span>$QEMU_PID</span> <span>2</span>>/dev/null; <span>then</span>
<span>echo</span> <span>"QEMU process died before gdb socket was created"</span>
<span>wait</span> <span>$QEMU_PID</span>
<span>exit</span> <span>$?</span>
<span>fi</span>
<span>done</span>
gdb -batch <span>\</span>
-ex <span>"set pagination off"</span> <span>\</span>
-ex <span>"target remote </span><span>$GDB_SOCK</span><span>"</span> <span>\</span>
-ex <span>"break main"</span> <span>\</span>
-ex <span>"continue"</span> <span>\</span>
-ex <span>"set logging file </span><span>$1</span><span>.map</span><span>$SUFFIX</span><span>"</span> <span>\</span>
-ex <span>"set logging enabled on"</span> <span>\</span>
-ex <span>"info proc mappings"</span> <span>\</span>
-ex <span>"detach"</span> > /dev/null <span>2</span>>&<span>1</span>
<span>wait</span> <span>$QEMU_PID</span>
</code></pre></div>
<p>The above will work under LLVM's <code>lit</code>, though you will need to use a recent
enough version that doesn't strip <code>HOME</code> from the environment (or else edit
the script accordingly). It also produces output in sequentially numbered
files, again motivated by the desire to run under this script from <code>lit</code> as
used by <code>llvm-test-suite</code>'s SPEC configuration which can involve multiple
invocations of the same binary for a given benchmark (e.g. 500.perlbench_r).</p>
<h2 id="analysing-the-output"><a href="https://muxup.com/feed.xml#analysing-the-output" class="anchor" tabindex="-1"></a>Analysing the output</h2>
<p>A follow-up post will introduce the scripting I've built around this.</p>
<h2 id="recording-and-analysing-results-from-running-spec"><a href="https://muxup.com/feed.xml#recording-and-analysing-results-from-running-spec" class="anchor" tabindex="-1"></a>Recording and analysing results from running SPEC</h2>
<p>Assuming you have <code>qemu-forwarder.sh</code>, in your llvm-test-suite directory:</p>
<div class="highlight"><pre><span></span><code><span>CONF=</span>clang-head-test
<span>CLANG_BIN_DIR=$HOME</span>/llvm-project/build/release/bin
<span>CFLAGS=</span><span>"-march=rv64gc_zba_zbb_zbs"</span>
cat - <span><<EOF > $CONF.cmake</span>
<span>set(CMAKE_SYSTEM_NAME Linux)</span>
<span>set(CMAKE_SYSROOT $HOME/rvsysroot)</span>
<span>set(CMAKE_C_COMPILER $CLANG_BIN_DIR/clang)</span>
<span>set(CMAKE_CXX_COMPILER $CLANG_BIN_DIR/clang++)</span>
<span>set(CMAKE_C_COMPILER_TARGET riscv64-linux-gnu)</span>
<span>set(CMAKE_CXX_COMPILER_TARGET riscv64-linux-gnu)</span>
<span>set(CMAKE_C_FLAGS_INIT "$CFLAGS")</span>
<span>set(CMAKE_CXX_FLAGS_INIT "$CFLAGS")</span>
<span>set(CMAKE_LINKER_TYPE LLD)</span>
<span>set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)</span>
<span>set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)</span>
<span>set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)</span>
<span>set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)</span>
<span>EOF</span>
cmake -G Ninja <span>\</span>
-B build.<span>$CONF</span> <span>\</span>
--toolchain<span>=$CONF</span>.cmake <span>\</span>
-DTEST_SUITE_SPEC2017_ROOT<span>=</span>~/cpu2017 <span>\</span>
-DTEST_SUITE_SUBDIRS<span>=</span>External/SPEC <span>\</span>
-DTEST_SUITE_COLLECT_CODE_SIZE<span>=</span>OFF <span>\</span>
-DTEST_SUITE_COLLECT_COMPILE_TIME<span>=</span>OFF <span>\</span>
-DTEST_SUITE_USER_MODE_EMULATION<span>=</span>ON <span>\</span>
-DTEST_SUITE_RUN_UNDER<span>=</span><span>$(pwd)</span>/qemu-forwarder.sh
cmake --build build.<span>$CONF</span>
<span>$CLANG_BIN_DIR</span>/llvm-lit -v --filter-out<span>=</span><span>'.+_s|specrand'</span> build.<span>$CONF</span>
</code></pre></div>
<p>The <code>526.blender_r</code> test takes twice as long as the others, so you may wish to
skip it by instead executing something like:</p>
<div class="highlight"><pre><span></span><code><span>$CLANG_BIN_DIR</span>/llvm-lit -v --filter-out<span>=</span><span>'.+_s|specrand|blender'</span> build.<span>$CONF</span>
</code></pre></div>
<p>If you want to re-run tests you must delete the previous <code>.qemu_out</code> and
<code>.map</code> files, which can be done with:</p>
<div class="highlight"><pre><span></span><code><span>[</span> -n <span>"build.</span><span>$CONF</span><span>"</span> <span>]</span> <span>&&</span> find <span>"build.</span><span>$CONF</span><span>"</span> -type f -name <span>"*.qemu_out*"</span> -exec sh -c <span>'</span>
<span> for q_file do</span>
<span> base_path="${q_file%.qemu_out*}"</span>
<span> rm -f "$q_file" "${base_path}.map"*</span>
<span> done</span>
<span>'</span> sh <span>{}</span> +
</code></pre></div>
<p>In order to compare two SPEC builds, you can use something like the following
hacky script. Using the captured translation block execution data to generate
a plain executed instruction count is overkill as the example
<a href="https://www.qemu.org/docs/master/about/emulation.html#instruction">tests/tcg/plugin/insn.c</a>
can easily dump this for you directly. But by collecting the data upfront,
you can easily dive right into a more detailed analysis when you see a
surprising difference in executed instruction counts without rerunning the
binary.</p>
<div class="highlight"><pre><span></span><code><span>#!/usr/bin/env python3</span>
<span>from</span> <span>pathlib</span> <span>import</span> <span>Path</span>
<span>from</span> <span>collections</span> <span>import</span> <span>defaultdict</span>
<span>import</span> <span>sys</span>
<span>def</span> <span>collect_totals</span>(<span>root_dir</span>):
<span>totals</span> <span>=</span> <span>defaultdict</span>(<span>int</span>)
<span>root_path</span> <span>=</span> <span>Path</span>(<span>root_dir</span>)<span>/</span><span>"External"</span>
<span>for</span> <span>file_path</span> <span>in</span> <span>root_path.rglob</span>(<span>"*.qemu_out*"</span>):
<span>benchmark_name</span> <span>=</span> <span>file_path.parts</span>[<span>4</span>]
<span>try</span>:
<span>with</span> <span>file_path.open</span>(<span>"r"</span>) <span>as</span> <span>f</span>:
<span>file_total</span> <span>=</span> <span>0</span>
<span>for</span> <span>line</span> <span>in</span> <span>f</span>:
<span>parts</span> <span>=</span> <span>line.strip</span>()<span>.split</span>(<span>','</span>)
<span># Only sum lines that match the expected format.</span>
<span>if</span> <span>len</span>(<span>parts</span>) <span>==</span> <span>4</span> <span>and</span> <span>parts</span>[<span>2</span>]<span>.strip</span>()<span>.isdigit</span>():
<span># icount * ecount.</span>
<span>file_total</span> <span>+=</span> <span>int</span>(<span>parts</span>[<span>2</span>]) <span>*</span> <span>int</span>(<span>parts</span>[<span>3</span>])
<span>totals</span>[<span>benchmark_name</span>] <span>+=</span> <span>file_total</span>
<span>except</span> <span>Exception</span> <span>as</span> <span>e</span>:
<span>print</span>(<span>f"Error reading {</span><span>file_path</span><span>}: {</span><span>e</span><span>}"</span>)
<span>return</span> <span>totals</span>
<span>if</span> <span>__name__</span> <span>==</span> <span>"__main__"</span>:
<span>if</span> <span>len</span>(<span>sys.argv</span>) <span>!=</span> <span>3</span>:
<span>print</span>(<span>"Usage: spec-compare-helper <dir_a> <dir_b>"</span>)
<span>sys.exit</span>(<span>1</span>)
<span>dir_a</span>, <span>dir_b</span> <span>=</span> <span>sys.argv</span>[<span>1</span>], <span>sys.argv</span>[<span>2</span>]
<span>totals_a</span> <span>=</span> <span>collect_totals</span>(<span>dir_a</span>)
<span>totals_b</span> <span>=</span> <span>collect_totals</span>(<span>dir_b</span>)
<span>benchmarks</span> <span>=</span> <span>sorted</span>(<span>set</span>(<span>totals_a.keys</span>()) <span>|</span> <span>set</span>(<span>totals_b.keys</span>()))
<span>print</span>(<span>f"{'Benchmark':<20} {'DirA':>15} {'DirB':>15} {'Diff (%)':>10}"</span>)
<span>print</span>(<span>"="</span> <span>*</span> <span>60</span>)
<span>for</span> <span>benchmark</span> <span>in</span> <span>benchmarks</span>:
<span>val_a</span> <span>=</span> <span>totals_a.get</span>(<span>benchmark</span>, <span>0</span>)
<span>val_b</span> <span>=</span> <span>totals_b.get</span>(<span>benchmark</span>, <span>0</span>)
<span>diff_pct</span> <span>=</span> ((<span>val_b</span> <span>-</span> <span>val_a</span>) <span>/</span> <span>val_a</span> <span>*</span> <span>100</span>) <span>if</span> <span>val_a</span> <span>else</span> <span>float</span>(<span>"inf"</span>)
<span>print</span>(<span>f"{</span><span>benchmark</span><span>:<20} {</span><span>val_a</span><span>:>15} {</span><span>val_b</span><span>:>15} {</span><span>diff_pct</span><span>:>9.2f}%"</span>)
</code></pre></div>
<p>Which produces output looking something like this:</p>
<pre><code>Benchmark DirA DirB Diff (%)
============================================================
500.perlbench_r 180245097594 182078714777 1.02%
502.gcc_r 220874510659 219647717585 -0.56%
505.mcf_r 131589945456 134271153130 2.04%
508.namd_r 220648061019 216682202888 -1.80%
510.parest_r 291341820355 291844973715 0.17%
511.povray_r 31911866906 31103201809 -2.53%
519.lbm_r 94166321698 86910581403 -7.71%
520.omnetpp_r 138002605692 137676301622 -0.24%
523.xalancbmk_r 283566182007 284735075518 0.41%
525.x264_r 380165035845 379862173371 -0.08%
526.blender_r 660528270138 659361380750 -0.18%
531.deepsjeng_r 355058534962 349621355155 -1.53%
538.imagick_r 238573643488 238560676372 -0.01%
541.leela_r 421886351310 405423320484 -3.90%
544.nab_r 415595728542 391443973852 -5.81%
557.xz_r 132548718317 130229753780 -1.75%
</code></pre>
<p>It's worth highlighting that as we're running this under user-mode emulation,
the dynamic instruction count naturally excludes any kernel-side instructions
that you would see when profiling a real system.</p>
<hr /><a href="https://muxup.com/feed.xml#article-changelog" class="anchor" tabindex="-1"></a>Article changelog
<ul>
<li>2025-12-15: Note that the qemu patches have now been accepted in the
maintainer's tree.</li>
<li>2025-12-02: Initial publication date.</li>
</ul> Alex Bradburyhttps://muxup.comManuel Regohttps://blogs.igalia.com/mrego/blog/2025-12-02/2025-12-02T00:00:00+00:00
<p>You can now easily customize find-in-page with the new <a href="https://drafts.csswg.org/css-pseudo/#selectordef-search-text"><code>::search-text</code> pseudo-element</a>, that is shipping in Chromium 144.0.7547. 🚀</p>
<p><img src="https://blogs.igalia.com/mrego/files/2025/12/search-text.png" alt="Screenshot of the following CSS as an example of how to customize find-in-page: :root::search-text { background: yellow; } :root::search-text:current { color: white; background: olive; text-decoration: underline; } aside::search-text { background: magenta; } aside::search-text:current { background: darkmagenta; text-decoration: underline; }" /></p>
<p><video src="https://blogs.igalia.com/mrego/files/2025/12/search-text.mp4" controls=""></video></p>
<p><a href="https://blogs.igalia.com/schenney/find-in-page-highlight-styling/">Find more details on the blog post by Stephen Chenney</a>. Thanks to Bloomberg for sponsoring this work.</p> Manuel Regohttps://blogs.igalia.com/mrego/Alex Bradbury: Minipost: Olmo 3 training costhttps://muxup.com/2025q4/minipost-olmo3-training-cost2025-12-01T12:00:00+00:00
<p>Recently I jotted down some notes on <a href="https://muxup.com/2025q4/minipost-llm-inference-vs-training-cost-for-deepseek">LLM inference vs training costs for
DeepSeek</a>
and I wanted to add on an additional datapoint for training cost based on the
recently released <a href="https://allenai.org/blog/olmo3">Olmo3 models</a> from the
Allen Institute for AI ("Ai2"). The model family has 7B and 32B parameter
models, with 'Think' variants available for 7B and 32B but so far only a 7B
'Instruct' non-reasoning version (but <a href="https://xcancel.com/allen_ai/status/1991545790263857609">watch this
space</a>). What's
particularly interesting about the Olmo models to me is that beyond providing
open weights, the training scripts and datasets are openly available as well.</p>
<p>Going by the reported benchmarks it's at least competitive with less open
models of a similar size, and importantly they've increased the supported
context length from the rather limiting 4k tokens supported by the Olmo 2
series to a much more usable 64k tokens. Given the relatively small size these
models are less capable than relatively chunky models like DeepSeek R1/V3.x or
Kimi K2, but I've been impressed by the capability of 32B dense models for
basic queries, and from my non-scientific testing both the 32B and 7B Olmo3
variants seem to do a reasonable job of summarising things like discussion
threads. You can experiment yourself at
<a href="https://playground.allenai.org/">playground.allenai.org</a>.</p>
<h2 id="energy-required-for-training-olmo-3"><a href="https://muxup.com/feed.xml#energy-required-for-training-olmo-3" class="anchor" tabindex="-1"></a>Energy required for training Olmo 3</h2>
<p>One of the neat things about this level of openness is that it <em>should</em> act as
a floor in terms of performance for future models of this size class assuming
they're appropriately funded and don't take too many risks chasing novelty.
Rerunning the training process with an updated dataset and some minor tweaks
is something you could imagine doing on some regular cadence, ideally as a
shared endeavour. Imagining this effort in the future, how much energy is
required? The initial version of the <a href="http://allenai.org/papers/olmo3">detailed Olmo 3 technical
report</a> unfortunately has little to say on
this. We can get a back of the envelope figure in terms of GPU hours for
pre-training based on the reported 7700 tokens per second per GPU for the 7B
base model and 1900 tokens per second for the 32B base model and the ~6T token
dataset. But even better than that, we can just <strong>ask</strong> the Ai2 folks
(sometimes the internet really does work wonderfully!). After asking on their
<a href="https://discord.gg/ai2">public Discord</a> I was rapidly furnished with this
helpful answer:</p>
<blockquote>
For some detailed numbers, we measured power consumption throughout training,
along with total GPU hours. We used ~234k H100 hours to pretrain the 7B, and
~1.05m H100 hours to pretrain the 32B. 1900 TPS is generally what our trainer
is capable of, but with restarts, evaluations, checkpointing, and occasional
network issues, the 32B took 1.05m hours. We measured an average power
consumption of ~621W while pretraining the 7B and ~649W while pretraining the
32B, and this means that our GPUs consumed ~146MWh for the 7B and ~681MWh for
the 32B. We'll include more detailed GPU hour information in a future version
of the paper, including for post-training!
<p><em>Ai2 Olmo 3 team <a href="https://discord.com/channels/1241138968448340109/1441462011618922647/1441471645046014038">on their
Discord</a>.</em></p>
</blockquote>
<p>So that's 0.681 GWh in GPU power draw for pretraining the 32B model and
0.146 GWh in GPU power draw for pretraining the 7B model. As noted in the
quote, this is inclusive of restarts, checkpointing etc. But perhaps won't
include previous early stage experimentation. I look forward to an updated
technical report with full details, but pretraining should cover the bulk of
the compute requirements (as a reference point, today's <a href="https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf">DeepSeek V3.2
paper</a>
found it notable that the post-training compute budget exceeded 10% of the
pretraining cost).</p>
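<p>As a cross-check, the back-of-the-envelope estimate from the reported
per-GPU throughput and the ~6T-token corpus lands in the same ballpark,
undershooting the reported totals as you'd expect given it excludes restarts,
evaluations, and checkpointing:</p>

```python
# Rough GPU-hour and GPU-energy estimates from the reported throughput
# (7700 and 1900 tokens/s/GPU) and average power draw (621 W and 649 W),
# over a ~6T token pretraining corpus.
TOKENS = 6e12

for name, tps, watts in [("7B", 7700, 621), ("32B", 1900, 649)]:
    gpu_hours = TOKENS / tps / 3600
    gwh = gpu_hours * watts / 1e9  # watt-hours -> GWh
    print(f"{name}: ~{gpu_hours / 1e3:.0f}k H100 hours, ~{gwh:.2f} GWh")
```

<p>That gives roughly 216k H100 hours / 0.13 GWh for the 7B and 877k hours /
0.57 GWh for the 32B, versus the reported 234k hours / 0.146 GWh and 1.05M
hours / 0.681 GWh.</p>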
<p>The 0.681 GWh figure doesn't account for full system power and cooling
cost. I'd love to be corrected, but I believe a 1.5x-2x multiplier would be
towards the upper end of reasonable assumptions. But for the sake of this yardstick
comparison let's look at a few comparisons based on the reported number:</p>
<ul>
<li>0.681 GWh of electricity would cost about £180k at UK residential rates
(capped at 26.35p per kWh currently). Substantially less in the USA.</li>
<li><a href="https://www.gov.wales/sites/default/files/publications/2023-11/leisure-centre-decarbonisation-guidance-note.pdf">A larger leisure centre with a pool consumes ~2.5 GWh of energy per
year</a>.
I don't know if the idea of a "leisure centre" translates outside of the UK,
but basically it's a swimming pool plus gym, squash/tennis courts etc.
<ul>
<li>The linked page claims ~2 GWh of energy in gas and 0.5 GWh in electricity.
For the gas, to compare like with like you'd need to consider the source
of energy for the electricity used for Olmo training.</li>
</ul>
</li>
<li>0.681 GWh is ~0.11% of <a href="https://www.home.cern/resources/faqs/facts-and-figures-about-lhc">LHC's annual 600 GWh energy
consumption</a>
or ~0.05% of CERN's annual consumption.</li>
<li>We can estimate a Boeing 787-9 flying from London Heathrow to SFO
consumes jet fuel containing ~0.58 GWh of energy.
<ul>
<li>Calculated with 8638km distance, 5.62kg fuel/km (taking the most economic
787-9 long haul figure from <a href="https://en.wikipedia.org/wiki/Fuel_economy_in_aircraft">this table on
Wikipedia</a> and
<a href="https://en.wikipedia.org/wiki/Jet_fuel#Typical_physical_properties_for_Jet_A_and_Jet_A-1">11.95kWh/kg specific energy of jet
fuel</a>).</li>
<li>This is a yardstick rather than a direct comparison. A direct comparison
to the GWh of electricity used for the GPU compute of the LLM would depend
on the source of the electricity. If it was e.g. gas rather than
solar/hydro/wind then you'd want to compare the number of GWh consumed to
create that electricity which would of course be higher.</li>
<li>As a further point of reference, FlightAware indicates 5 separate
direct LHR to SFO flights scheduled per day.</li>
</ul>
</li>
</ul>
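<p>For anyone who wants to reproduce the yardsticks, the jet fuel and UK
electricity cost figures above follow directly from the quoted inputs:</p>

```python
# LHR -> SFO jet fuel energy: distance * fuel burn per km * specific
# energy of Jet A-1 (figures as quoted in the bullets above).
distance_km = 8638
fuel_kg_per_km = 5.62   # most economic 787-9 long-haul figure
kwh_per_kg = 11.95      # specific energy of jet fuel

gwh = distance_km * fuel_kg_per_km * kwh_per_kg / 1e6  # kWh -> GWh
print(f"flight: ~{gwh:.2f} GWh")

# UK residential cost of 0.681 GWh at the 26.35p/kWh price cap.
cost_gbp = 0.681e9 / 1000 * 0.2635
print(f"electricity: ~£{cost_gbp / 1000:.0f}k")
```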
<h2 id="more-efficient-llm-training"><a href="https://muxup.com/feed.xml#more-efficient-llm-training" class="anchor" tabindex="-1"></a>More efficient LLM training</h2>
<p>We can hope for new breakthroughs, more efficient hardware, better datasets
and so on. But here is some work I noticed in the area. Fair warning: this
isn't my field, and we have to recognise applying a research result to a
production training run is sure to have challenges even if the research
suggests the trade-offs are worthwhile. So consider this vague gesticulating
about seemingly interesting work that is going on and find someone who knows
what they're talking about to confirm the degree to which it is
interesting/viable.</p>
<ul>
<li>Mixture of Experts (MoE) models are substantially cheaper to train which is
one reason the industry has moved in that direction. The next Ai2 Olmo
model is <a href="https://old.reddit.com/r/LocalLLaMA/comments/1p24aet/ai2_just_announced_olmo_3_a_leading_fully_open_lm/npzqw4h/?context=3">expected to be
MoE</a>.
The <a href="https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd">Qwen
blog</a> has
a
<a href="https://img.alicdn.com/imgextra/i1/O1CN01FUbdQa1i6J7tAfCCn_%21%216000000004363-2-tps-2860-1114.png">graph</a>
comparing the relative training cost in GPU hours of the dense Qwen3-32B vs
Qwen3-30B-A3b vs Qwen3-Next-80B-A3B, where the latter
makes further architectural changes, reporting a 10.7x reduction. ~2.5x of
that is going to come from the reduced corpus size (15T tokens down from
36T), but that still leaves plenty of improvement from other factors.</li>
<li>Maybe it will be shown viable to train in lower precision such as MXFP8 or
even NVFP4, which would allow much more throughput for a similar energy
budget. Nvidia have worked to demonstrate this can be effective for
<a href="https://arxiv.org/pdf/2506.08027">both</a>
<a href="https://arxiv.org/pdf/2509.25149">formats</a> (see also <a href="https://arxiv.org/pdf/2512.02010">this work from
MIT</a>).</li>
<li>Also from Nvidia, <a href="https://arxiv.org/pdf/2511.16664">Nemotron Elastic</a>
showed a model architecture that allows deriving smaller models without
doing separate pre-training runs.</li>
</ul>
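<p>A quick sketch of the decomposition suggested above, taking the 10.7x figure from the Qwen blog graph and assuming (as a rough approximation) that cost scales linearly with corpus size:</p>

```python
# Rough decomposition of the reported 10.7x training-cost reduction
# (dense Qwen3-32B vs Qwen3-Next-80B-A3B, per the Qwen blog graph).
# Assumption: corpus-size savings scale roughly linearly with token count.
total_reduction = 10.7
corpus_reduction = 36 / 15   # 36T tokens down to 15T => 2.4x
other_factors = total_reduction / corpus_reduction
print(f"corpus: {corpus_reduction:.1f}x, architecture/other: {other_factors:.1f}x")
```

<p>So roughly 4.5x of the reduction would be attributable to the architectural changes and other factors, under that (crude) linearity assumption.</p>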
<p>Finally, the cheapest way to train an LLM from scratch is...to find a way to
avoid the need to. For models like Olmo 3 that release the base model and
checkpoints, people can apply their own post-training or perform additional
pre-training.</p>
<h2 id="bonus-comparison-point-apertus"><a href="https://muxup.com/feed.xml#bonus-comparison-point-apertus" class="anchor" tabindex="-1"></a>Bonus comparison point: Apertus</h2>
<p><a href="https://www.swiss-ai.org/apertus">Apertus</a> is a Swiss project to produce an
open LLM, with 70B and 8B models released so far. Their <a href="https://arxiv.org/pdf/2509.14233">full tech
report</a> notes the following: "Once a
production environment has been set up, we estimate that the model can be
realistically trained in approximately 90 days on 4096 GPUs, accounting for
overheads. If we assume 560 W power usage per Grace-Hopper module in this
period, below the set power limit of 660 W, we can estimate 5 GWh power usage
for the compute of the pretraining run."</p>
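<p>The quoted estimate is easy to sanity-check from the numbers it gives (4096 modules, an assumed 560 W each, ~90 days):</p>

```python
# Sanity-check of Apertus's ~5 GWh estimate for the pretraining run.
gpus = 4096
watts_per_gpu = 560   # assumed draw per Grace-Hopper module, per the report
hours = 90 * 24
gwh = gpus * watts_per_gpu * hours / 1e9
print(f"{gwh:.2f} GWh")   # ~4.95 GWh, consistent with the reported ~5 GWh
```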
<hr /><a href="https://muxup.com/feed.xml#article-changelog" class="anchor" tabindex="-1"></a>Article changelog
<ul>
<li>2025-12-04: Add link to "Four Over Six" NVFP4 training paper.</li>
<li>2025-12-02: Added clarifying note about energy via gas in the
leisure centre comparison.</li>
<li>2025-12-01: Initial publication date.</li>
</ul> Alex Bradburyhttps://muxup.comAlex Bradbury: Minipost: Benchmarking the Hetzner AX102 vs CCX53https://muxup.com/2025q4/minipost-benchmarking-hetzner-ax102-vs-ccx532025-11-30T12:00:00+00:00
<p>I recently had reason to do a quick comparison of the performance of the
<a href="https://www.hetzner.com/dedicated-rootserver/ax102/">Hetzner AX102</a> dedicated
server and the high-end 'dedicated' CCX53 VPS on <a href="https://www.hetzner.com/cloud/">Hetzner
Cloud</a> and thought I may as well write up the
results for posterity. I'm incapable of starting a post without some kind of
disclaimer so here comes the one for this post: naturally the two products
have major differences in terms of flexibility (spin-up/down at will, vs pay a
small setup fee and endure a wait time depending on hardware availability). So
depending on your use case, your requirements with respect to that flexibility
may override any cost differential.</p>
<h2 id="specs"><a href="https://muxup.com/feed.xml#specs" class="anchor" tabindex="-1"></a>Specs</h2>
<p>All costs are exclusive of VAT, assuming the lowest cost data center location,
and inclusive of IPv4 address.</p>
<p><strong>AX102</strong>:</p>
<ul>
<li>16 core Ryzen 9 7950X3D (32 threads)</li>
<li>128GB DDR5 RAM</li>
<li>2 x 1.92TB NVMe</li>
<li>104 EUR/month, 39 EUR one-off setup fee.</li>
</ul>
<p><strong>CCX53</strong></p>
<ul>
<li>Unknown AMD CPU exposing 32vCPU (physical cores? threads?)</li>
<li>128GB RAM</li>
<li>600GB NVMe</li>
<li>192.49 EUR/month maximum charge. 0.3085 EUR per hour (if you keep the same
VPS active over the month it won't exceed the monthly price cap, so you
effectively get a small discount on the per-hour cost).</li>
</ul>
<h2 id="benchmark"><a href="https://muxup.com/feed.xml#benchmark" class="anchor" tabindex="-1"></a>Benchmark</h2>
<p>Building Clang+LLVM+LLD, everyone's favourite workload! Both systems are
running an up to date Arch Linux (more details on setting this up on the CCX53
in the appendix below) with clang 21.1.6. The dedicated machine has the
advantage of RAID 0 across the two SSDs, but also has encrypted rootfs
configured. I didn't bother to set that up for the CCX53 VPS.</p>
<div class="highlight"><pre><span></span><code>sudo pacman -Syu --needed clang lld cmake ninja wget
<span>LLVM_VER=</span><span>21</span>.1.6
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-<span>${</span><span>LLVM_VER</span><span>}</span>/llvm-project-<span>${</span><span>LLVM_VER</span><span>}</span>.src.tar.xz
tar -xvf llvm-project-<span>${</span><span>LLVM_VER</span><span>}</span>.src.tar.xz
<span>cd</span> llvm-project-<span>${</span><span>LLVM_VER</span><span>}</span>.src
cmake -G Ninja <span>\</span>
-DLLVM_ENABLE_PROJECTS<span>=</span><span>'clang;lld'</span> <span>\</span>
-DLLVM_TARGETS_TO_BUILD<span>=</span><span>"all"</span> <span>\</span>
-DLLVM_CCACHE_BUILD<span>=</span>OFF <span>\</span>
-DCMAKE_C_COMPILER<span>=</span>clang <span>\</span>
-DCMAKE_CXX_COMPILER<span>=</span>clang++ <span>\</span>
-DLLVM_ENABLE_LLD<span>=</span>ON <span>\</span>
-DCMAKE_BUILD_TYPE<span>=</span>Release <span>\</span>
-DLLVM_ENABLE_ASSERTIONS<span>=</span>ON <span>\</span>
-S llvm <span>\</span>
-B build
<span>time</span> cmake --build build
<span>printf</span> <span>"### Version info ###\n"</span>
clang --version | head -n <span>1</span>
</code></pre></div>
<p>On both machines, ninja shows 5575 build steps.</p>
<p>Results:</p>
<ul>
<li>AX102
<ul>
<li>10m27s (627s)</li>
</ul>
</li>
<li>CCX53
<ul>
<li>14m11s (851s, about 1.36x the AX102)</li>
</ul>
</li>
</ul>
<p>Running the clang and LLVM tests with <code>./build/bin/llvm-lit -s --order=lexical llvm/test clang/test</code> (which shows 9402 tests) gives:</p>
<ul>
<li>AX102
<ul>
<li>3m39s (219s)</li>
</ul>
</li>
<li>CCX53
<ul>
<li>4m28s (268s, about 1.24x the AX102)</li>
</ul>
</li>
</ul>
<p>I ran these multiple times, and in the case of the CCX53 across two different
VMs in different regions and saw only a few percentage points variance.</p>
<p>Focusing on the results for building clang/llvm/lld, let's figure out the cost
of 1000 from-scratch builds. Not so much because it's a representative workload, but
because it gives an easy-to-compare metric that captures both the difference
in price and in performance. So calculating <code>time_per_build_in_hours * 1000 * cost_per_hour</code>:</p>
<ul>
<li>AX102
<ul>
<li>(626.6 / 3600) * 1000 * (104/720) = <strong>25.14 EUR</strong></li>
<li>Or if you include the setup fee and assume it's amortised over 12 months:
<ul>
<li>(626.6/3600) * 1000 * ((104 + (39/12))/720) = <strong>25.93 EUR</strong></li>
</ul>
</li>
</ul>
</li>
<li>CCX53
<ul>
<li>(850.6 / 3600) * 1000 * (192.49/720) = <strong>63.17 EUR</strong></li>
<li>Or using the 0.3085 EUR/hr price which you would pay if you didn't run for
the whole month:
<ul>
<li>(850.6 / 3600) * 1000 * 0.3085 = <strong>72.89 EUR</strong></li>
</ul>
</li>
</ul>
</li>
</ul>
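<p>The calculations above can be reproduced in a few lines (720 being the hours in a 30-day month, which matches how the monthly prices were converted to hourly rates):</p>

```python
# Cost of 1000 from-scratch clang/llvm/lld builds, using the measured build
# times and (VAT-exclusive) prices above. 720 = hours in a 30-day month.
def cost(build_seconds, eur_per_hour, builds=1000):
    return build_seconds / 3600 * builds * eur_per_hour

print(f"AX102:          {cost(626.6, 104 / 720):.2f} EUR")
print(f"AX102 w/ setup: {cost(626.6, (104 + 39 / 12) / 720):.2f} EUR")
print(f"CCX53 (capped): {cost(850.6, 192.49 / 720):.2f} EUR")
print(f"CCX53 (hourly): {cost(850.6, 0.3085):.2f} EUR")
```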
<h2 id="appendix-ccx53-arch-linux-setup"><a href="https://muxup.com/feed.xml#appendix-ccx53-arch-linux-setup" class="anchor" tabindex="-1"></a>Appendix: CCX53 Arch Linux setup</h2>
<p>This could be scripted, but I just created the VPS via their web UI. Then after
it was provisioned, used that web UI to have it boot into a rescue system.
Then do an Arch bootstrap that roughly mirrors the <a href="https://muxup.com/arch-linux-on-remote-server-setup-runbook">one I use on a dedicated
build machine</a> except
that we don't bother with encrypting the rootfs. The CCX* server types at
least <a href="https://docs.hetzner.cloud/changelog#2023-08-23-new-server-types-with-dedicated-amd-vcpus">use
UEFI</a>
so we can keep using efistub for boot.</p>
<p>First get a bootstrap environment and enter it:</p>
<div class="highlight"><pre><span></span><code>wget http://mirror.hetzner.de/archlinux/iso/latest/archlinux-bootstrap-x86_64.tar.zst
tar -xvf archlinux-bootstrap-x86_64.tar.zst --numeric-owner
sed -i <span>'1s;^;Server=https://mirror.hetzner.de/archlinux/$repo/os/$arch\n\n;'</span> root.x86_64/etc/pacman.d/mirrorlist
mount --bind root.x86_64/ root.x86_64/ <span># See <https://bugs.archlinux.org/task/46169></span>
<span>printf</span> <span>"About to enter bootstrap chroot\n===============================\n"</span>
./root.x86_64/bin/arch-chroot root.x86_64/
</code></pre></div>
<p>Now set info that will be used throughout the process:</p>
<div class="highlight"><pre><span></span><code><span>export</span> <span>NEW_HOST_NAME=</span>archvps
<span>export</span> <span>PUBLIC_SSH_KEY=</span><span>"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOfpPQ1j+XLsapAhONAQmvu6TZGT5y8jeziM4Vio1NrA asb@plurp"</span>
<span>export</span> <span>NEW_USER=</span>asb
</code></pre></div>
<p>And now proceed to set up the disks, create filesystems, perform an initial
bootstrap and chroot into the new rootfs:</p>
<div class="highlight"><pre><span></span><code>pacman-key --init
pacman-key --populate archlinux
pacman -Sy --noconfirm xfsprogs dosfstools
sfdisk /dev/sda <span><<EOF</span>
<span>label: gpt</span>
<span>start=1MiB, size=255MiB, type=uefi</span>
<span>start=256MiB, type=linux</span>
<span>EOF</span>
mkfs.fat -F32 /dev/sda1
mkfs.xfs /dev/sda2
mount /dev/sda2 /mnt
mkdir /mnt/boot
mount /dev/sda1 /mnt/boot
pacstrap /mnt base linux linux-firmware efibootmgr <span>\</span>
xfsprogs dosfstools <span>\</span>
python3 <span>\</span>
openssh sudo net-tools git man-db man-pages vim
genfstab -U /mnt >> /mnt/etc/fstab
<span>printf</span> <span>"About to enter newrootfs chroot\n===============================\n"</span>
arch-chroot /mnt
</code></pre></div>
<p>Do final configuration from within the chroot:</p>
<div class="highlight"><pre><span></span><code>sed /etc/locale.gen -i -e <span>"s/^\#en_GB.UTF-8 UTF-8.*/en_GB.UTF-8 UTF-8/"</span>
locale-gen
<span># Ignore "System has not been booted with systemd" and "Failed to connect to bus" error for next command.</span>
systemd-firstboot --locale<span>=</span>en_GB.UTF-8 --timezone<span>=</span>UTC --hostname<span>=</span><span>"</span><span>$NEW_HOST_NAME</span><span>"</span>
ln -s /dev/null /etc/udev/rules.d/80-net-setup-link.rules <span># disable persistent network names</span>
<span># No longer need to disable large fallback image as Arch stopped generating it</span>
<span># by default</span>
<span>printf</span> <span>"efibootmgr before changes:\n==========================\n"</span>
efibootmgr -u
<span># Set up efistub</span>
efibootmgr <span>\</span>
--disk /dev/sda <span>\</span>
--part <span>1</span> <span>\</span>
--create <span>\</span>
--label <span>'Arch Linux'</span> <span>\</span>
--loader /vmlinuz-linux <span>\</span>
--unicode <span>"root=/dev/sda2 rw initrd=\initramfs-linux.img"</span> <span>\</span>
--verbose
<span>printf</span> <span>"efibootmgr after changes:\n=========================\n"</span>
efibootmgr -u
mkswap --size<span>=</span>8G --file /swapfile
cat - <span><<EOF > /etc/systemd/system/swapfile.swap</span>
<span>[Unit]</span>
<span>Description=Swap file</span>
<span>[Swap]</span>
<span>What=/swapfile</span>
<span>[Install]</span>
<span>WantedBy=multi-user.target</span>
<span>EOF</span>
systemctl <span>enable</span> swapfile.swap
cat - <span><<EOF > /etc/systemd/network/10-eth0.network</span>
<span>[Match]</span>
<span>Name=eth0</span>
<span>[Network]</span>
<span>DHCP=yes</span>
<span>Address=$(ip -6 addr show dev eth0 scope global | grep "scope global" | cut -d' ' -f6)</span>
<span>Gateway=$(ip route show | head -n 1 | cut -d' ' -f 3)</span>
<span>Gateway=fe80::1</span>
<span>EOF</span>
systemctl <span>enable</span> systemd-networkd.service systemd-resolved.service systemd-timesyncd.service
<span>printf</span> <span>"PasswordAuthentication no\n"</span> > /etc/ssh/sshd_config.d/20-no-password-auth.conf
systemctl <span>enable</span> sshd.service
useradd -m -g users -G wheel -s /bin/bash <span>"</span><span>$NEW_USER</span><span>"</span>
usermod --pass<span>=</span><span>'!'</span> root <span># disable root login</span>
chmod +w /etc/sudoers
<span>printf</span> <span>"%%wheel ALL=(ALL) ALL\n"</span> >> /etc/sudoers
chmod -w /etc/sudoers
mkdir <span>"/home/</span><span>$NEW_USER</span><span>/.ssh"</span>
<span>printf</span> <span>"%s\n"</span> <span>"</span><span>$PUBLIC_SSH_KEY</span><span>"</span> > <span>"/home/</span><span>$NEW_USER</span><span>/.ssh/authorized_keys"</span>
chmod <span>700</span> <span>"/home/</span><span>$NEW_USER</span><span>/.ssh"</span>
chmod <span>600</span> <span>"/home/</span><span>$NEW_USER</span><span>/.ssh/authorized_keys"</span>
chown -R <span>"</span><span>$NEW_USER</span><span>:users"</span> <span>"/home/</span><span>$NEW_USER</span><span>/.ssh"</span>
</code></pre></div>
<p>Now set password:</p>
<pre><code>passwd "$NEW_USER"
</code></pre>
<p>Then ctrl-d twice and set a symlink for resolv.conf:</p>
<pre><code>ln -sf ../run/systemd/resolve/stub-resolv.conf root.x86_64/mnt/etc/resolv.conf
</code></pre>
<p>Finally, <code>reboot</code>.</p>
<p>Remember to <code>ssh-keygen -R $THE_IP_ADDRESS</code> so you don't get ssh host
verification errors.</p>
<hr /><a href="https://muxup.com/feed.xml#article-changelog" class="anchor" tabindex="-1"></a>Article changelog
<ul>
<li>2025-11-30: Initial publication date.</li>
</ul> Alex Bradburyhttps://muxup.comAlex Bradbury: Minipost: LLM inference vs training costs for DeepSeekhttps://muxup.com/2025q4/minipost-llm-inference-vs-training-cost-for-deepseek2025-11-29T12:00:00+00:00
<p>Tl;dr: Based on published data from DeepSeek, we can estimate it takes
something like ~70 days of inference traffic (served by DeepSeek themselves,
ignoring any other providers) to match the GPU hours used for the final
training run for V3 and R1.</p>
<p>Simon Willison recently <a href="https://bsky.app/profile/simonwillison.net/post/3m6qdf5rffs2l">reshared some figures on inference costs for
LLMs</a>. I
couldn't agree more with the comment further down that thread "The big AI labs
continue to be infuriatingly opaque about the actual figures for their total
electricity and water consumption".</p>
<p>A number of responses wonder about the cost of training. If you accept the
reported figures for serving a query, what impact does it have if you amortise
the energy spent training the model over the served queries? Mistral did this
for their <a href="https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai">lifecycle
analysis</a>
but they grouped together "training and inference" and kept confidential the
ratio of energy for training vs inference by reporting a figure that combined
the training cost with 18 months of usage. The thread reminded me of another
datapoint available for DeepSeek that seemed worth writing up. I think this
gives some helpful intuition for the amortised cost of training for a widely
used model of that size, but to state the obvious any attempt to apply that
intuition to other models is totally reliant on how widely used it is.</p>
<p>DeepSeek have published figures both on training and on inference for
DeepSeek's website and API users. I will attempt to consistently refer to the
figure for training as "final run training cost" to reflect the fact the
number of GPU hours used in experimentation and failed attempts isn't
reported. For final run training for DeepSeek-R1:</p>
<ul>
<li>2.788M H800 GPU hours for V3 which serves as the R1 base (see Table 1
in the <a href="https://arxiv.org/pdf/2412.19437">V3 technical report</a>).</li>
<li>0.147M H800 GPU hours for building R1 on top of V3 (see Supplementary Table
4 in the <a href="https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-025-09422-z/MediaObjects/41586_2025_9422_MOESM1_ESM.pdf">supplementary
information</a>
for the <a href="https://www.nature.com/articles/s41586-025-09422-z">R1 Nature
article</a>).</li>
<li><strong>Total</strong>: 2.935M H800 GPU hours</li>
</ul>
<p>Now for inference, back in February DeepSeek wrote up <a href="https://github.com/deepseek-ai/open-infra-index/blob/main/202502OpenSourceWeek/day_6_one_more_thing_deepseekV3R1_inference_system_overview.md">details of their
inference
system</a>
giving details of cost of serving, profit margin, and load over a 24h period.
So yes, we're extrapolating from this datapoint and assuming it's
representative. Given the worldwide inference of DeepSeek R1/V3 is surely much
larger (being openly licensed there are many vendors who are serving it), I'm
not overly worried about this aspect. Their reported average inference serving
infrastructure occupancy is 226.75 nodes (each node containing 8 H800 GPUs),
meaning <strong>43536 H800 GPU hours per day</strong>. At that rate, it will take <strong>~67.5
days</strong> of traffic for the same number of H800 GPU hours to be used for
inference as for the final training run.</p>
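<p>Putting the published numbers together:</p>

```python
# Days of DeepSeek's own inference traffic needed to match the GPU hours
# of the final training run, using the published figures above.
train_gpu_hours = 2.788e6 + 0.147e6            # V3 base + R1 on top
inference_gpu_hours_per_day = 226.75 * 8 * 24  # avg nodes * GPUs/node * hours
days = train_gpu_hours / inference_gpu_hours_per_day
print(f"{days:.1f} days")   # ~67.4 days
```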
<p>All this to say, for a widely used model of DeepSeek R1 scale when looking at
the cost of inference, accounting for the amortised final run training cost is
more likely to be a multiplier of 2x or less rather than something much
larger. In terms of energy, this does assume that the power draw of the H800
GPUs while running inference is similar to the draw during training. And to
underline again, the reported training cost surely doesn't include
experimentation, aborted runs etc.</p>
<hr /><a href="https://muxup.com/feed.xml#article-changelog" class="anchor" tabindex="-1"></a>Article changelog
<ul>
<li>2025-11-29: Initial publication date.</li>
</ul> Alex Bradburyhttps://muxup.comEri Pazos: interop and mathml corehttps://conflor.es/blog/2025-11-27-interop-and-mathml/2025-11-27T00:00:00+00:00
math {
font-size: 2em;
}
.math-example {
display: flex;
justify-content: center;
align-items: center;
& > div {
display: inline-grid;
grid-template-columns: 1fr 2px 1fr;
grid-template-rows: fit-content(0);
grid-gap: var(--spacing);
& > img {
height: 100%;
width: auto;
border: none;
padding: 0.25rem;
}
@media (width <= 768px) {
grid-template-columns: 1fr;
text-align: center;
justify-items: center;
& > hr {
width: 100%;
}
}
}
}
<p class="p-summary">Interoperability makes the web better for everyone, allowing users to have a great experience regardless of their choice of browser.
We have been working on making MathML Core interoperable across browser engines as part of an agreement with the Sovereign Tech Fund.
There are some exciting developments and new features!</p>
<p><strong>Interoperability</strong> makes the web better for everyone, allowing users to have a great experience regardless of their choice of browser.
We have many standards that shape how the internet should work, drafted from <strong>consensus</strong> between different engine makers and third parties.
While having specs on how everything should function is great, we still need to <strong>align the different browser implementations</strong>.
This can be tricky as all of them have their peculiarities, and not all browsers agree on what is a priority for them.
The goal of the <a href="https://wpt.fyi/interop-2025">Interop</a> program is to select a few important features that all engines will prioritize, so users and editors can finally benefit from them.</p>
<p>A few months ago I joined <a href="https://www.igalia.com">Igalia</a>'s web platform team (and I'm really happy about it!).
Thanks to <a href="https://www.igalia.com/2025/07/14/Igalia,-Interop-and-the-Sovereign-Tech-Fund.html">an agreement</a> with the <a href="https://www.sovereign.tech/programs/fund">Sovereign Tech Fund</a>, this year we will be working on MathML and other important Interop areas.</p>
<blockquote>
<p>This post contains MathML examples. Each formula is represented twice.
Your browser renders the left one from the HTML code, while on the right there is a pre-printed SVG as a reference of how it should look.
Keep in mind that most of these features are either experimental or have just landed, so <strong>you may need the latest version of a browser to view them correctly</strong>.</p>
</blockquote>
<h2>A bit of history</h2>
<p><strong><a href="https://en.wikipedia.org/wiki/MathML">MathML</a></strong> was first published in 1998, and it grew to be a gigantic project that sought to define how mathematical notation should be rendered.
However, due to its complexity, the implementations of the browser engines were wildly different and incomplete.
This meant that editors could not rely on it, since users would see very different content depending on what they were browsing with.</p>
<pre class="language-html"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>math</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msubsup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>∫<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>0<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>1<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msubsup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mrow</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>x<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>2<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>+<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>1<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mrow</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>math</span><span class="token punctuation">></span></span></code></pre>
<div class="math-example">
<div>
∫
0
1
x
2
+
1
<hr />
<img class="no-index" alt="An integral from 0 to 1 of x squared plus one" src="https://conflor.es/images/2025/mathml-integral.svg" />
</div>
</div>
<p>This is why <strong><a href="https://w3c.github.io/mathml-core">MathML Core</a></strong> was born.
It is a small subset of <a href="https://www.w3.org/TR/MathML3">MathML 3</a> that is feasible to implement in browsers.
It is based on the parts of the specification that are <strong>used in practice</strong>, adding important implementation details and testing.</p>
<p>To illustrate why this is important, Chromium had support for some parts of MathML when it was forked from WebKit.
However, it proved to be very difficult to maintain and complete, so it was removed in 2013.
My colleague Frédéric Wang led the effort to create a new implementation based on MathML Core, which was <a href="https://www.igalia.com/2023/01/10/Igalia-Brings-MathML-Back-to-Chromium.html">shipped in 2023</a>, a huge milestone for the standard.</p>
<p>We are in a very exciting moment in the history of MathML, since <strong>all three major browser engines have overlapping support</strong>.
However, there is still work to be done to align the different implementations so they follow the MathML Core specification.
The goal is that one could write formulas on a website and have them look the same everywhere (like Wikipedia, which is now <a href="https://phabricator.wikimedia.org/T271001">transitioning to native MathML</a> instead of prerendered SVGs).</p>
<p>So, what have we been working on?</p>
<h2>RTL mirroring</h2>
<p>Some scripts are written from <strong>right to left</strong>, including <a href="https://en.wikipedia.org/wiki/Arabic_alphabet">Arabic</a>.
Browsers should be able to correctly render text and math in either direction, making use of the <a href="https://www.unicode.org/reports/tr9/">Unicode BiDi</a> specification and the <a href="https://learn.microsoft.com/en-us/typography/opentype/spec/features_pt#tag-rtlm"><code>rtlm</code></a> font feature.
However, the existing implementations either didn't support mirroring or had hacky behaviour that didn't work correctly for all cases. Read <a href="https://people.igalia.com/fwang/mathml-operator-mirroring-explainer.html">this explainer</a> that Frédéric made for a great visualization of the differences.</p>
<pre class="language-html"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>link</span> <span class="token attr-name">rel</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>stylesheet<span class="token punctuation">"</span></span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>https://fred-wang.github.io/MathFonts/XITS/mathfonts.css<span class="token punctuation">"</span></span><span class="token punctuation">/></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>math</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mrow</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>{<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>5<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>6<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>)<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mrow</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>3<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>4<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msub</span> <span class="token attr-name">displaystyle</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>true<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>∲<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>C<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msub</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>math</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>math</span> <span class="token attr-name">dir</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>rtl<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mrow</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>{<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>٥<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>٦<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>)<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mrow</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>٣<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span><span class="token punctuation">></span></span>٤<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msub</span> <span class="token attr-name">displaystyle</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>true<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>∲<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>ج<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msub</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>math</span><span class="token punctuation">></span></span></code></pre>
<div class="math-example">
<div>
<span>
</span>
<hr />
<img class="no-index" alt="A series of math formulas, first from left to right, then from right to left" src="https://conflor.es/images/2025/mathml-rtl.svg" />
</div>
</div>
<p>There are two cases when it comes to mirroring. When a character has a corresponding mirrored counterpart (e.g. an opening parenthesis maps to a closing parenthesis), this is called <strong>character-level mirroring</strong>, or Unicode BiDi mirroring, and the browser just needs to swap one character for the other.
Sadly, this doesn't apply to every operator.</p>
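As a minimal illustration (hypothetical markup, not from the examples above), a parenthesis pair has Unicode mirrored counterparts, so in an RTL formula the browser can simply swap one character for the other:

```html
<!-- "(" (U+0028) and ")" (U+0029) form a Unicode mirrored pair,
     so under dir="rtl" each renders as its counterpart. -->
<math dir="rtl">
  <mrow>
    <mo>(</mo>
    <mi>س</mi>
    <mo>)</mo>
  </mrow>
</math>
```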
<p>Take the <em>contour clockwise integral</em>.
If we just mirror the symbol by applying a reflection symmetry about a vertical line, the arrow is suddenly pointing in the other direction, making it <em>counterclockwise</em>.
This changes the meaning of the formula!</p>
<p><img src="https://conflor.es/images/2025/mathml-integral-comparison.svg" alt="Three clockwise integrals: left to right, incorrectly mirrored (arrow pointing to the other side), and right to left" class="no-index" /></p>
<p>To avoid this, the <code>rtlm</code> font feature can use <strong>glyph-level mirroring</strong> to provide a different set of correctly mirrored glyphs.
<em>Glyphs</em>, plural, since a math symbol can have different size variants to accommodate contents of different sizes.
Not only that: when the variants are not enough, there are component glyphs for assembling arbitrarily long operators.</p>
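For fonts that implement the substitutions, <code>rtlm</code> (a registered OpenType feature, "right-to-left mirrored forms") can also be toggled by hand from CSS. This is only a sketch for inspecting the mirrored glyphs; in normal use the browser applies the feature automatically for RTL math, and the class name here is hypothetical:

```css
/* Force the OpenType "rtlm" feature, substituting each operator
   glyph with its mirrored variant (if the font provides one). */
.preview-mirrored {
  font-feature-settings: "rtlm" 1;
}
```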
<pre class="language-html"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>link</span> <span class="token attr-name">rel</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>stylesheet<span class="token punctuation">"</span></span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>https://fred-wang.github.io/MathFonts/XITS/mathfonts.css<span class="token punctuation">"</span></span><span class="token punctuation">/></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>math</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mspace</span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">background</span><span class="token punctuation">:</span> tomato</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mspace</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mspace</span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>1.5em<span class="token punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">background</span><span class="token punctuation">:</span> gold</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mspace</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mspace</span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>2.5em<span class="token punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">background</span><span class="token punctuation">:</span> mediumseagreen</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mspace</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mspace</span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>4.5em<span class="token punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">background</span><span class="token punctuation">:</span> cornflowerblue</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mspace</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>math</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>math</span> <span class="token attr-name">dir</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>rtl<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mspace</span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">background</span><span class="token punctuation">:</span> tomato</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mspace</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mspace</span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>1.5em<span class="token punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">background</span><span class="token punctuation">:</span> gold</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mspace</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mspace</span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>2.5em<span class="token punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">background</span><span class="token punctuation">:</span> mediumseagreen</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mspace</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mspace</span> <span class="token attr-name">height</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>4.5em<span class="token punctuation">"</span></span> <span class="token attr-name">width</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>0.8em<span class="token punctuation">"</span></span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">background</span><span class="token punctuation">:</span> cornflowerblue</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mspace</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>math</span><span class="token punctuation">></span></span></code></pre>
<div class="math-example">
<div>
<div>
</div>
<hr />
<img class="no-index" alt="A series of square roots, each taller than the last. First from left to right, then from right to left" src="https://conflor.es/images/2025/mathml-roots.svg" />
</div>
</div>
<p>No browser engine supported glyph-level mirroring for MathML operators, so we had to implement it in all of them.
Thankfully <a href="https://github.com/harfbuzz/harfbuzz">harfbuzz</a>, the text shaping library used by Chromium and Firefox, already supported the feature.
WebKit support is still a work in progress, since its different ports use different font backends, which adds complexity.
As for character-level mirroring, Chromium and WebKit already did it right, but Firefox applied reflection symmetry instead of substituting the mirrored counterpart.
The changes in Firefox and Chromium are now stable and ready to be used!</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Feature</th>
<th>Firefox</th>
<th>WebKit</th>
<th>Chromium</th>
</tr>
</thead>
<tbody>
<tr>
<td>Character level mirroring (BiDi)</td>
<td>✅✨</td>
<td>✅</td>
<td>✅</td>
</tr>
<tr>
<td>Glyph level mirroring (rtlm)</td>
<td>✅✨</td>
<td>🚧</td>
<td>✅✨</td>
</tr>
</tbody>
</table>
</div>
<h2><code>math-shift</code> and <code>math-depth</code></h2>
<p>Details are important, especially when rendering complex and layered formulas.
One may think that a few pixels do not make that much of a difference.
However, when you have multiple levels of nesting, offsets, and multiple elements, a slight change can make everything look ugly at best, wrong at worst.</p>
<p>Enter <code>math-shift: compact</code>. Look at this example from the <a href="https://w3c.github.io/mathml-core/#the-math-shift">MathML Core spec</a>:</p>
<pre class="language-html"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>math</span> <span class="token attr-name">display</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>block<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>x<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">color</span><span class="token punctuation">:</span> mediumseagreen</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>2<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msqrt</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>≠<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>x<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mn</span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">color</span><span class="token punctuation">:</span> cornflowerblue</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>2<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mn</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>math</span><span class="token punctuation">></span></span></code></pre>
<div class="math-example">
<div>
<hr />
<img class="no-index" alt="Square root of x squared does not equal x squared. The exponent under the root is lower than the exponent on the right" src="https://conflor.es/images/2025/mathml-math-shift.svg" />
</div>
</div>
<p>At first glance, you may not see anything too different.
But looking closely, the green "2" on the left is a bit lower than the blue one on the right.
It is trying to <em>fit</em> under the square root bar. This is what LaTeX calls <strong>cramped mode</strong>.</p>
<p>Chromium already supported the definition given by MathML Core, which left Firefox and WebKit, both of which relied on hardcoded rules for specific cases in C++ objects.
MathML Core takes another approach and <strong>encourages using CSS styling rules</strong> instead.</p>
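Since <code>math-shift</code> is a plain CSS property, cramped rendering can be expressed in stylesheet rules. The selectors below are an illustrative sketch, not the exact rules from the MathML Core user-agent stylesheet:

```css
/* Content under a radical renders in cramped mode: superscripts
   (like the green "2" above) are shifted down slightly so they
   fit under the radical bar. */
msqrt > *,
mroot > :first-child {
  math-shift: compact;
}
```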
<p>Another interesting property is <a href="https://w3c.github.io/mathml-core/#the-math-script-level-property"><code>math-depth</code></a>.
It is used to make nested elements, such as those inside fractions, scripts, or radicals, a bit smaller.
That way, if you have an exponent of an exponent of an exponent (of an exponent...), each one is displayed a bit smaller than the last.</p>
<pre class="language-html"><code class="language-html"><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>math</span> <span class="token attr-name">display</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>block<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">color</span><span class="token punctuation">:</span> cornflowerblue</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">color</span><span class="token punctuation">:</span> mediumseagreen</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">color</span><span class="token punctuation">:</span> tomato</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>msup</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>+<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mroot</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">color</span><span class="token punctuation">:</span> mediumseagreen</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mroot</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>+<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mrow</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mo</span><span class="token punctuation">></span></span>+<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mo</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">color</span><span class="token punctuation">:</span> cornflowerblue</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span> <span class="token special-attr"><span class="token attr-name">style</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value css language-css"><span class="token property">color</span><span class="token punctuation">:</span> cornflowerblue</span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mrow</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>mi</span><span class="token punctuation">></span></span>A<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mi</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>mfrac</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>math</span><span class="token punctuation">></span></span></code></pre>
<div class="math-example">
<div>
<img class="no-index" alt="A variable with nested exponents, each smaller than the last. A radical with index A, smaller than the value inside the root. A nested fraction, whose variables are also displayed smaller." src="https://conflor.es/images/2025/mathml-math-depth.svg" />
</div>
</div>
<p>In this case, Firefox and Chromium already had compliant implementations, so only WebKit needed to catch up.
Support for <code>math-depth</code> and the <a href="https://w3c.github.io/mathml-core/#dfn-scriptlevel"><code>scriptlevel</code></a> attribute (which allows this depth to be modified) has now landed,
while a patch for <a href="https://www.w3.org/TR/css-fonts-4/#valdef-font-size-math"><code>font-size: math</code></a> (which sets the size of the element based on its depth) is on the way.</p>
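<p>As a minimal sketch (the attribute values here are illustrative), <code>scriptlevel</code> on <code>&lt;mstyle&gt;</code> maps onto the <code>math-depth</code> property:</p>
<pre><code>&lt;math&gt;
  &lt;mi&gt;A&lt;/mi&gt;
  &lt;mstyle scriptlevel="+1"&gt;&lt;mi&gt;A&lt;/mi&gt;&lt;/mstyle&gt; &lt;!-- math-depth: add(1) --&gt;
  &lt;mstyle scriptlevel="2"&gt;&lt;mi&gt;A&lt;/mi&gt;&lt;/mstyle&gt;  &lt;!-- math-depth: 2 --&gt;
&lt;/math&gt;
</code></pre>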
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Feature</th>
<th>Firefox</th>
<th>WebKit</th>
<th>Chromium</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>math-shift: compact</code></td>
<td>✅✨</td>
<td>✅✨</td>
<td>✅</td>
</tr>
<tr>
<td><code>math-depth</code></td>
<td>✅</td>
<td>✅✨</td>
<td>✅</td>
</tr>
<tr>
<td><code>font-size: math</code></td>
<td>✅</td>
<td>🚧</td>
<td>✅</td>
</tr>
<tr>
<td><code>scriptlevel</code></td>
<td>✅</td>
<td>✅✨</td>
<td>✅</td>
</tr>
</tbody>
</table>
</div>
<h2>Other work</h2>
<h3>Rendering unknown elements as mrow</h3>
<p>MathML 3 defined 195 elements.
MathML Core focuses on about <strong>30</strong>, leaving the rest to styling or polyfills.
This means deprecating some features that were previously implemented in some browsers, like <code>mfenced</code>, <code>semantics</code>, and <code>maction</code>, as it would be too difficult to make them interoperable right now.
To prevent breaking existing content too much, they are rendered like an <code>mrow</code>.</p>
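<p>For instance (a sketch, not a full demo), a deprecated element such as <code>mfenced</code> now just lays out its children like an <code>mrow</code>, without the MathML 3 behaviour of adding parentheses and separators:</p>
<pre><code>&lt;math&gt;
  &lt;mfenced&gt;&lt;mi&gt;a&lt;/mi&gt;&lt;mi&gt;b&lt;/mi&gt;&lt;/mfenced&gt;
  &lt;!-- renders the same as: --&gt;
  &lt;mrow&gt;&lt;mi&gt;a&lt;/mi&gt;&lt;mi&gt;b&lt;/mi&gt;&lt;/mrow&gt;
&lt;/math&gt;
</code></pre>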
<h3><code>font-family: math</code></h3>
<p>Selecting a <strong>good math font</strong> is essential for rendering.
Stretchy operators, math symbols, and italics are not available in every font, so without a suitable one formulas render very poorly.
<a href="https://drafts.csswg.org/css-fonts/#math-def"><code>font-family: math</code></a> is a CSS value that tells the browser to use a font suitable for mathematics.
Previously, browsers relied on a hardcoded list of font fallbacks, but this behaviour has now been standardized and implemented.</p>
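<p>As a minimal sketch, opting all formulas on a page into a math-capable font looks like this:</p>
<pre><code>math {
  /* ask the engine for a font suitable for mathematics,
     with explicit fallbacks as an extra safety net */
  font-family: math, "Latin Modern Math", "STIX Two Math", serif;
}
</code></pre>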
<p>Android doesn't come with a math font installed, so it mixes symbols from different fonts, producing a rather unappealing result:</p>
<p><img src="https://conflor.es/images/2025/mathml-poor-rendering.webp" alt="A math formula containing different symbols, all of them with varying font styling and weights as the result of not having a unified math font family" class="no-index" /></p>
<h3><code>mathvariant</code> and <code>text-transform: math-auto</code></h3>
<p>Single letter identifiers inside a <code><mi></code> tag are treated as variables, and so they should be rendered with <strong><em>fancy italics</em></strong>.
This is still supported by MathML Core.
However, MathML 3 allows a plethora of transformations using <code>mathvariant</code>, from bold to gothic text.
The new spec says that while italic transformation should still happen by default, other text should <strong>use the specific Unicode codepoint directly</strong>, as it just adds too much complexity for the browser implementation.</p>
<p><code>text-transform: math-auto</code> is a CSS property applied by default to <code>&lt;mi&gt;</code> elements that enables the italic transformation for them.
Setting the new <code>mathvariant</code> attribute to <code>normal</code> makes the element's <code>text-transform</code> compute to <code>none</code>, removing the italic styling.</p>
<p><img src="https://conflor.es/images/2025/mathml-mathvariant.svg" alt="Different stylings of the letter A. Italic, regular, bold italic, bold regular, double struck, script, fraktur, sans serif and monospace" class="no-index" /></p>
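<p>A short sketch of the two behaviours described above:</p>
<pre><code>&lt;mi&gt;x&lt;/mi&gt;                       &lt;!-- italic by default (text-transform: math-auto) --&gt;
&lt;mi mathvariant="normal"&gt;x&lt;/mi&gt;  &lt;!-- upright: text-transform computes to none --&gt;
</code></pre>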
<h3><code>DisplayOperatorMinHeight</code> and Cambria Math</h3>
<p>Microsoft <a href="https://github.com/MicrosoftDocs/typography-issues/issues/1136">made a mistake</a> in Cambria Math, one of the math fonts used in Windows.
They swapped the values of <code>DisplayOperatorMinHeight</code> and <code>DelimitedSubFormulaMinHeight</code>, so operators <a href="https://github.com/w3c/mathml-core/issues/126">weren't displayed correctly</a>.
Some browsers had a workaround for this, but a more general fix was implemented in HarfBuzz, so we removed the workarounds in favour of relying on the upstream library instead.</p>
<h3>Animation for <code>math-*</code> properties</h3>
<p>When implementing <code>math-shift</code> in Firefox, we noticed that the spec said the new properties were not supposed to be animatable.
In newer CSS specifications, most properties are defined as animatable (<em>fun!</em>).
After some discussion with the MathML Working Group, we decided to change the spec, and we are now adding this feature to the browser engines.</p>
<style>
@keyframes math-anim {
  0% { color: royalblue; math-depth: 1; }
  20% { color: mediumseagreen; }
  40% { color: gold; }
  60% { color: tomato; math-depth: 3; }
  80% { color: mediumpurple; }
  100% { color: royalblue; math-depth: 1; }
}
#anim-target {
  animation: math-anim 5s infinite;
}
#anim-container {
  height: 4.5rem;
  & > math {
    font-size: 4rem;
  }
}
</style>
<p id="anim-container">
<math><msup id="anim-target"><mi>x</mi><mn>2</mn></msup></math>
</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Feature</th>
<th>Firefox</th>
<th>WebKit</th>
<th>Chromium</th>
</tr>
</thead>
<tbody>
<tr>
<td>Render unknown elements as <code>mrow</code></td>
<td>✅✨</td>
<td>✅✨</td>
<td>✅</td>
</tr>
<tr>
<td><code>font-family: math</code></td>
<td>✅✨</td>
<td>✅✨</td>
<td>✅</td>
</tr>
<tr>
<td><code>text-transform: math-auto</code></td>
<td>✅</td>
<td>✅✨</td>
<td>✅</td>
</tr>
<tr>
<td>New <code>mathvariant</code> behaviour</td>
<td>✅</td>
<td>🚧</td>
<td>✅</td>
</tr>
<tr>
<td><code>DisplayOperatorMinHeight</code> fix</td>
<td>✅✨</td>
<td>✅✨</td>
<td>✅✨</td>
</tr>
<tr>
<td>Animation for <code>math-*</code> properties</td>
<td>✅✨</td>
<td>🚧</td>
<td>🚧</td>
</tr>
</tbody>
</table>
</div>
<h2>What's next?</h2>
<p>Many of these improvements have already shipped, but our work continues on making mathematics more interoperable in browsers.
This includes some <em>exciting</em> new features ahead:</p>
<ul>
<li><strong>Updates to the operator dictionary:</strong>
MathML Core revamped the existing list of operators and their default layouts.
Additionally, there is a new compact form that removes redundancies.</li>
<li><strong>More improvements to operator stretching and spacing:</strong>
There are still some inconsistencies between browsers and some long-standing bugs that we would love to tackle.</li>
<li><strong>Handling positioned elements and forbidding floats in MathML:</strong>
Like flex or grid, MathML doesn't create floating children for elements with a <code>math</code> display type.
However, they can still have out of flow positioned children.
At the moment this isn't consistent across browsers and it is something we want to improve.</li>
</ul>
<p>Working on MathML is very rewarding, especially because of the people who have helped along the way.
I'd like to especially thank my colleague <a href="https://github.com/fred-wang">@fredw</a>, reviewers from Mozilla, Apple, and Google, and the <a href="https://www.w3.org/groups/wg/math/">W3C Math Working Group</a>.
Also <a href="https://github.com/delan">@delan</a> for reviewing the first draft of this post.</p>
<p>We are very grateful to the Sovereign Tech Fund for supporting this work!</p> Eri Pazoshttps://conflor.es/Igalia WebKit Team: WebKit Igalia Periodical #48https://blogs.igalia.com/webkit/blog/2025/wip-48/2025-11-24T20:12:28+00:00
<p>Update on what happened in WebKit in the week from November 17 to November 24.</p>
<p>
In this week's rendition, the WebView snapshot API was enabled on the WPE
port, further progress on the Temporal and Trusted Types implementations,
and the release of WebKitGTK and WPE WebKit 2.50.2.
</p>
<h2 id="cross-port-cat">Cross-Port 🐱</h2>
<div class="wip-item">
<p>A WebKitImage-based implementation of WebView snapshot <a rel="external" href="https://commits.webkit.org/303449@main">landed</a> this week, enabling this feature on WPE, where it was previously only available in the GTK port. This means you can now use <code>webkit_web_view_get_snapshot</code> (and <code>webkit_web_view_get_snapshot_finish</code>) to get a WebKitImage representation of your screenshot.</p>
<p>WebKitImage implements the <code>GLoadableIcon</code> interface (as well as <code>GIcon</code>), so you can get a PNG-encoded image using <code>g_loadable_icon_load</code>.</p>
</div>
<div class="wip-item">
<p><a rel="external" href="https://commits.webkit.org/303376@main">Removed</a> an incorrect early return in Trusted Types DOM attribute handling to align with spec changes.</p>
</div>
<h3 id="javascriptcore-fish">JavaScriptCore 🐟</h3>
<div class="wip-description">
<p>The built-in JavaScript/ECMAScript engine for WebKit, also known as JSC or SquirrelFish.</p>
</div>
<div class="wip-item">
<p>In JavaScriptCore's implementation of Temporal, <a rel="external" href="https://github.com/WebKit/WebKit/pull/52251">implemented</a> the <code>with</code> method for <code>PlainMonthDay</code> objects.</p>
</div>
<div class="wip-item">
<p>In JavaScriptCore's implementation of Temporal, <a rel="external" href="https://github.com/WebKit/WebKit/pull/54281">implemented</a> the <code>from</code> and <code>equals</code> methods for <code>PlainMonthDay</code> objects.</p>
</div>
<h2 id="releases-package">Releases 📦️</h2>
<div class="wip-item">
<p><a rel="external" href="https://webkitgtk.org/2025/11/19/webkitgtk2.50.2-released.html">WebKitGTK 2.50.2</a> and <a rel="external" href="https://wpewebkit.org/release/wpewebkit-2.50.2.html">WPE WebKit 2.50.2</a> have been released.</p>
<p>These stable releases include a number of patches for security issues, and as such a new security advisory, <code>WSA-2025-0008</code>, has been issued (<a rel="external" href="https://webkitgtk.org/security/WSA-2025-0008.html">GTK</a>, <a rel="external" href="https://wpewebkit.org/security/WSA-2025-0008.html">WPE</a>).</p>
<p>It is recommended to apply an <a rel="external" href="https://github.com/WebKit/WebKit/commit/730bffd856d2a1e56dd3bd2a0702282f19c5242a">additional patch</a> that fixes building when the JavaScriptCore “CLoop” interpreter is enabled, which is typical for architectures where JIT compilation is unsupported. Releases after 2.50.2 will include it, and manual patching will no longer be needed.</p>
</div>
<div class="wip-end">
<p>That’s all for this week!</p>
</div> Igalia WebKit Teamhttps://blogs.igalia.com/webkit