<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Ohm Blog</title>
        <link>https://ohmjs.org/blog</link>
        <description>Ohm Blog</description>
        <lastBuildDate>Thu, 12 Mar 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Inside Ohm's PEG-to-Wasm compiler]]></title>
            <link>https://ohmjs.org/blog/2026/03/12/peg-to-wasm</link>
            <guid>/2026/03/12/peg-to-wasm</guid>
            <pubDate>Thu, 12 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Ohm is a user-friendly parsing toolkit for JavaScript and TypeScript. You can use it to parse custom file formats or quickly build parsers, interpreters, and compilers for programming languages.]]></description>
            <content:encoded><![CDATA[<div class="admonition admonition-note alert alert--secondary"><div class="admonition-heading"><h5><span class="admonition-icon"><svg xmlns="http://www.w3.org/2000/svg" width="14" height="16" viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>About Ohm</h5></div><div class="admonition-content"><p><em>Ohm is a user-friendly parsing toolkit for JavaScript and TypeScript. You can use it to parse custom file formats or quickly build parsers, interpreters, and compilers for programming languages. <a href="https://ohmjs.org" target="_blank" rel="noopener noreferrer">Learn more</a></em></p></div></div><p>A few weeks ago, we announced the <a href="/blog/ohm-v18">Ohm v18 beta</a>, which involved a complete rewrite of the core parsing engine. Since then, we've implemented even more performance improvements: v18 is now <strong>more than 50x faster for real-world grammars</strong> while using about 10% of the memory.</p><img loading="lazy" src="/img/blog/v18-results.svg" alt="v18 benchmark results" style="width:100%;margin:1.5rem 0" class="img_E7b_"><p>The new parsing engine works by compiling an Ohm grammar (a form of <a href="https://en.wikipedia.org/wiki/Parsing_expression_grammar" target="_blank" rel="noopener noreferrer">parsing expression grammar</a>, or PEG) into a WebAssembly module that implements a parser. 
In this post, we'll dive into the technical details of how that works, and talk about some of the optimizations that made it even faster.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="pexpr-trees">PExpr trees<a class="hash-link" href="#pexpr-trees" title="Direct link to heading">​</a></h2><p>In previous versions of Ohm (up to and including v17), the parsing engine used an approach called <em>AST interpretation</em>. Here's how that works.</p><p>When you instantiate a grammar with Ohm, it parses your grammar and converts it to an abstract syntax tree. You can think of this tree as a kind of program, which describes a parser for the language. The nodes of the tree are <em>parsing expressions</em>, or <code>PExprs</code> as they're called in the source code.</p><p>We'll use the following grammar as an example:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">JSONLike {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  Value = Object</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        | "true"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        | "false"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        | "null"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  Object = "{" Members? 
"}"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  Members = Member ("," Member)*</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  Member = string ":" Value</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  string = "\"" (~"\"" any)* "\""</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>This grammar is parsed by Ohm (using the "grammar grammar", or <em>metagrammar</em>, defined in <a href="https://github.com/ohmjs/ohm/blob/main/packages/ohm-js/src/ohm-grammar.ohm" target="_blank" rel="noopener noreferrer">ohm-grammar.ohm</a>). 
The result is a <code>Map&lt;string, PExpr&gt;</code>:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  'Value' =&gt; Alt(</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     Apply('Object'),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     Term('true'),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     Term('false'),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     Term('null')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  'Object' =&gt; Seq(</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    Term('{'),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    Opt(Apply('Members')),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    Term('}')</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path 
d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>You can think of each rule ('Value', 'Object', etc.) as being like a function, and the function bodies are parsing expressions. <code>Alt</code>, <code>Apply</code>, <code>Opt</code>, <code>Seq</code>, and <code>Term</code> are all subclasses of the abstract <code>PExpr</code> class, and they all have an <code>eval</code> method. These methods are mostly pretty small and straightforward — here's the implementation for <code>Alt</code>:</p><div class="codeBlockContainer_I0IT language-javascript theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-javascript codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">pexprs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token class-name">Alt</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">prototype</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method-variable function-variable method function property-access" style="color:#d73a49">eval</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">function</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token parameter">state</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" 
style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword control-flow" style="color:#00009f">for</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token keyword" style="color:#00009f">let</span><span class="token plain"> idx </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"> idx </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">this</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">terms</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">length</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"> idx</span><span class="token operator" style="color:#393A34">++</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword control-flow" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">state</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">eval</span><span class="token punctuation" style="color:#393A34">(</span><span class="token keyword" style="color:#00009f">this</span><span class="token 
punctuation" style="color:#393A34">.</span><span class="token property-access">terms</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">idx</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token keyword control-flow" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword control-flow" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">false</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">;</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" 
aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>To evaluate an <code>Alt</code>, we recursively evaluate its children in order. As soon as one of them succeeds, the <code>Alt</code> succeeds; if none does, it fails. This approach is straightforward but slow: every parsing expression is evaluated at run time through a generic, recursive <code>eval</code> call.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="wasm-compilation">Wasm compilation<a class="hash-link" href="#wasm-compilation" title="Direct link to heading">​</a></h2><p>At a high level, the v18 code has two parts:</p><ol><li>Runtime support code written in <a href="https://www.assemblyscript.org/" target="_blank" rel="noopener noreferrer">AssemblyScript</a>.</li><li>The WebAssembly code generator, which is written in TypeScript.</li></ol><p>The code generation phase begins with the same <code>PExpr</code> tree as v17, but instead of interpreting it, we compile it to WebAssembly. (We actually first convert it to a slightly lower-level intermediate representation, but let's not worry about that for now.) 
Then we link the generated WebAssembly code with the runtime support code to produce the final Wasm module.</p><p>Here's what that code generation looks like for <code>Alt</code>:</p><div class="codeBlockContainer_I0IT language-typescript theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-typescript codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token keyword" style="color:#00009f">function</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">emitAlt</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">exp</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> ir</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access maybe-class-name">Alt</span><span class="token punctuation" style="color:#393A34">)</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">void</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">asm</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">this</span><span class="token punctuation" style="color:#393A34">;</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> saved </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> asm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">maybeSaveBacktrackPoint</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    asm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">block</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">w</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">blocktype</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">empty</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token arrow operator" style="color:#393A34">=&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token keyword control-flow" style="color:#00009f">for</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
keyword" style="color:#00009f">const</span><span class="token plain"> term </span><span class="token keyword" style="color:#00009f">of</span><span class="token plain"> exp</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">children</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">this</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">emitPExpr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">term</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        asm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">localGet</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'ret'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        asm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">condBreak</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">asm</span><span class="token punctuation" 
style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">depthOf</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'pexprEnd'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic">// `return true`</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        saved</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">pos</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">restore</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" 
aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>Unsurprisingly, it has a similar structure: we loop over all the children and emit the code for each child. Notice, however, that this is a <em>compile-time</em> loop, not a run-time one: it runs once in the code generator, and the emitted code contains no loop at all. So the structure of the final code, expressed as pseudocode, looks like this:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">try matching terms[0]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">succeeded? return true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">try matching terms[1]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">succeeded? 
return true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">// ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">return false</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>Note that in the generated WebAssembly code, we're also not dispatching to any kind of generic <code>eval</code> function — we just inline the code for each individual expression. The exception is rule application: by default, each rule gets compiled to its own function, so a rule application (like <code>Apply('Object')</code> in the JSONLike grammar) just compiles to a <code>call</code>.</p><p>Producing a <em>recognizer</em> (something that just accepts or rejects a given string, without producing a parse tree) in this way was the first major milestone for v18, and it only took about 8 days. We only targeted pure-PEG features; Ohm-specific things like parameterized rules and left recursion would be harder to deal with.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="building-syntax-trees">Building syntax trees<a class="hash-link" href="#building-syntax-trees" title="Direct link to heading">​</a></h2><p>So far we've described how v18 compiles a recognizer. 
But to do something useful with a valid input, we need to produce some kind of <em>parse tree</em>, or <em>concrete syntax tree</em> (CST) as it is called in Ohm.</p><p>In v17, CST nodes are regular JavaScript objects, allocated on the heap and managed by the garbage collector. From a memory management perspective, they have a few interesting properties:</p><ul><li>The nodes themselves are small, so the per-node memory management overhead is large relative to the payload.</li><li>There are many nodes (counting Terminal nodes, around one per input character).</li><li>The nodes are full of references (which need to be scanned during garbage collection).</li><li>All nodes generally have the same lifetime: either the whole tree is in use, or all its nodes can be freed.</li></ul><p>These properties make CST nodes well suited to region-based memory management, also known as <em>arena allocation</em>. As <a href="https://en.wikipedia.org/wiki/Region-based_memory_management" target="_blank" rel="noopener noreferrer">Wikipedia describes it</a>:</p><blockquote><p>A region [...] is a collection of allocated objects that can be efficiently reallocated or deallocated all at once. Memory allocators using region-based management are often called <em>area allocators</em>, and when they work by only "bumping" a single pointer, as <em>bump allocators</em>.</p></blockquote><h3 class="anchor anchorWithStickyNavbar_mojV" id="bump-allocation-into-wasm-linear-memory">Bump allocation into Wasm linear memory<a class="hash-link" href="#bump-allocation-into-wasm-linear-memory" title="Direct link to heading">​</a></h3><p>In v18 we use a bump allocator (provided by <a href="https://www.assemblyscript.org/runtime.html#variants" target="_blank" rel="noopener noreferrer">AssemblyScript's stub runtime</a>) to allocate CST nodes in Wasm linear memory. 
This has lower overhead than heap-allocated JavaScript objects (only one 32-bit header field per object, vs 3–4 in most JS engines). We consider all CST nodes to be owned by the <code>MatchResult</code> they are associated with, so when the <code>MatchResult</code> is freed, we also reclaim the memory from all its CST nodes.</p><p>For references between nodes, we use a 32-bit offset into linear memory, rather than a full-width pointer. (This is the normal way to use references in 32-bit WebAssembly.)</p><p>Overall, the approach is similar to what Adrian Sampson describes in <a href="https://www.cs.cornell.edu/~asampson/blog/flattening.html" target="_blank" rel="noopener noreferrer">Flattening ASTs (and Other Compiler Data Structures)</a>.</p><h3 class="anchor anchorWithStickyNavbar_mojV" id="node-layout">Node layout<a class="hash-link" href="#node-layout" title="Direct link to heading">​</a></h3><h4 class="anchor anchorWithStickyNavbar_mojV" id="terminal-nodes">Terminal nodes<a class="hash-link" href="#terminal-nodes" title="Direct link to heading">​</a></h4><p>Terminals are the most important thing to optimize, since a typical tree has approximately one terminal node per input character. So, rather than allocating a separate node for each terminal, we use a tagged 32-bit value: <code>(matchLength &lt;&lt; 1) | 1</code>.</p><p>A regular reference to a full CST node is always 4-byte aligned: the offset is a multiple of 4, and thus the two low bits are always 0. 
So if the low bit is set, we can detect that it's not a true reference, and instead use the upper 31 bits as the payload — and for terminal nodes, the only thing we need to store is how many characters of input were consumed.</p><h4 class="anchor anchorWithStickyNavbar_mojV" id="other-nodes">Other nodes<a class="hash-link" href="#other-nodes" title="Direct link to heading">​</a></h4><p>The other node types (nonterminal, list, opt) have the following layout:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">Byte</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  0   matchLength: i32 (chars consumed)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  4   typeAndDetails: i32</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        bits [1:0] = node type</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        bits [31:2] = ruleId (nonterm) or arity</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  8   childCount: i32</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> 12   failureOffset: i32 (relative to startIdx)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> 16+  children: i32[] (child node "pointers")</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 
0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><h3 class="anchor anchorWithStickyNavbar_mojV" id="chunked-bindings">Chunked bindings<a class="hash-link" href="#chunked-bindings" title="Direct link to heading">​</a></h3><p>When a parsing expression is successful, it produces a number of CST nodes, which we call <em>bindings</em>. If the parent expression succeeds, those nodes will become its children; but if it fails, they become garbage.</p><p>This bottom-up way of building the CST requires a stack-like structure for temporarily storing the bindings. The original implementation used an AssemblyScript <code>Array&lt;i32&gt;</code> — a managed, dynamically-resized array. This was convenient, but it meant that in some scenarios, pushing a binding could be quite expensive (allocate a new backing buffer, copy all elements, free the old one).</p><p>We replaced this with an unrolled doubly-linked list of fixed-size chunks:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain"> prev: i32   next: i32   data: i32[128]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ∅</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ▲</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">┌───┼───────────────────────────┐</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">│  prev   next   data (128×i32) │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">│          │                    │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└──────────┼────────────────────┘</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ▲      │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    │      ▼</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">┌───────────────────────────────┐</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">│  prev   next   data (128×i32) │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">│          │                    │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└──────────┼────────────────────┘</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">           ▼</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">           ∅</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>Each chunk holds 128 binding slots. Two globals track the current position: <code>bindingsChunk</code> (pointer to the active chunk) and <code>bindingsIdx</code> (offset within it). 
Push is a single store instruction plus an index increment — only when <code>bindingsIdx</code> hits 128 does it advance to the next chunk, reusing an existing one if available from a previous backtrack.</p><p>The critical property for a PEG parser is that backtracking is cheap: it's just restoring two <code>i32</code> values (the saved chunk pointer and index). No elements need to be zeroed, copied, or freed. The "abandoned" slots in subsequent chunks are simply ignored and will be overwritten on the next forward pass.</p><p>This change alone made parsing 15–16% faster on our benchmarks, purely from eliminating array resize copies and managed-object overhead.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="memoization">Memoization<a class="hash-link" href="#memoization" title="Direct link to heading">​</a></h2><p>Ohm uses a technique called <em>packrat parsing</em>, in which rule applications are memoized: the first time a rule is applied at a given input position, the result is stored in a table. 
If the same rule is applied at the same position again, we just look up the result instead of re-evaluating the rule body.</p><p>Conceptually, the memo table is a 2D structure indexed by input position and rule ID:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">         pos 0   pos 1   pos 2   pos 3   pos 4   ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ┌───────┬───────┬───────┬───────┬───────┬─────</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Value   │       │       │       │       │       │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Object  │       │       │   ✓   │       │       │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Members │       │       │       │       │       │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Member  │       │       │       │       │       │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">string  │       │   ✓   │       │   ✓   │       │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        └───────┴───────┴───────┴───────┴───────┴─────</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg 
class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>In a naive implementation, the memo table would have <code>numPositions × numRules</code> entries. But in practice, it's very sparse — most rules are never attempted at most positions.</p><h3 class="anchor anchorWithStickyNavbar_mojV" id="block-sparse-representation">Block-sparse representation<a class="hash-link" href="#block-sparse-representation" title="Direct link to heading">​</a></h3><p>To avoid wasting memory on empty entries, v18 uses a <em>block-sparse</em> memo table. The index is a flat array of block pointers: <code>index[pos * numBlocks + blockIdx]</code>. Each block holds 16 entries and is allocated lazily on first write.</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">pos 0                        pos 1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">┌────────┬────────┬─── ···  ┌────────┬────────┬─── ···</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">│ blk 0  │ blk 1  │         │ blk 0  │ blk 1  │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└───┬────┴───┬────┴─── ···  └────────┴────────┴─── ···</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    │        │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ▼        ▼</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">┌────────┐ ┌────────┐</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">│16 slots│ │16 slots│    ← i32 MemoEntry values</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└────────┘ └────────┘</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">(64 bytes)  (0 = not yet allocated)</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>This means we only allocate memory for rules that are actually attempted at a given position, while keeping lookups fast.</p><p>(This two-level representation is common in packrat parsers; it was first described in <a href="https://pdos.csail.mit.edu/~baford/packrat/thesis/" target="_blank" rel="noopener noreferrer">Bryan Ford's thesis</a> and later in <a href="https://web.archive.org/web/20171010074824/http://cs.nyu.edu/rgrimm/xtc/rats-intro.html" target="_blank" rel="noopener noreferrer">Robert Grimm's Rats! 
parser generator</a>.)</p><h3 class="anchor anchorWithStickyNavbar_mojV" id="memo-entry-encoding">Memo entry encoding<a class="hash-link" href="#memo-entry-encoding" title="Direct link to heading">​</a></h3><p>Each memo entry is packed into a single <code>i32</code>:</p><ul><li><strong>Success</strong>: a pointer to the CST node (bit 0 is always 0, since nodes are 4-byte aligned)</li><li><strong>Failure</strong>: <code>(failureOffset &lt;&lt; 1) | 1</code></li><li><strong>Spaces</strong>: <code>(matchLength &lt;&lt; 2) | 2</code> — more on this below</li></ul><p>This encoding lets us distinguish the three cases with a simple bit check, and avoids the need for any auxiliary data structures.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="parameterized-rules">Parameterized rules<a class="hash-link" href="#parameterized-rules" title="Direct link to heading">​</a></h2><p>In Ohm, a rule can have <em>parameters</em> — parsing expressions that are substituted into the rule body. For example:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">KeyVal&lt;keyExp, valExp&gt; = keyExp ":" valExp</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>The <code>KeyVal</code> rule takes two parameters, 
<code>keyExp</code> and <code>valExp</code>. These work much like function parameters in a typical programming language: when we use the rule, we also need to supply the actual parameters (aka <em>arguments</em>). For example:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">IdField = KeyVal&lt;"\"id\"", number&gt;</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>In the v17 interpreter, we handle parameters by maintaining a <em>rule stack</em>.</p><p>In v18, we handle parameterized rules via static specialization. This means that we generate a separate rule body for every unique combination of parameters. So parameterized rules are more like macros: they are expanded at compile time, and no parameters exist at runtime. 
In the example above, it means that there is no generic <code>KeyVal</code> rule — it's as if we defined the rule like this:</p><div class="codeBlockContainer_I0IT theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-text codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">KeyVal$0 = "\"id\"" ":" number</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>This simplifies the runtime semantics. It also works well with memoization. 
For parameterized rules, we can only use a memo entry if the parameters are identical: if <code>KeyVal&lt;"\"id\"", number&gt;</code> succeeds at position 0, it doesn't mean that <code>KeyVal&lt;"\"id\"", letter&gt;</code> will.</p><p>After specialization, two applications of the same rule with different parameters can simply be treated as two unique rule applications, and their memo entries will never be shared.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="optimized-space-skipping">Optimized space skipping<a class="hash-link" href="#optimized-space-skipping" title="Direct link to heading">​</a></h2><p>One of Ohm's distinctive features is that syntactic rules (those starting with an uppercase letter) automatically skip whitespace between tokens, and the grammar author can override the definition of whitespace — for example, to allow comments to be treated as whitespace.</p><p>In v17, implicit space skipping is treated just like an explicit application of the <code>spaces</code> rule: CST nodes are allocated and the result is memoized.</p><p>v18 uses an optimized form of implicit space skipping: it avoids creating CST nodes altogether. During the walk phase, it can lazily materialize those nodes if and when they are required. For the common case where no one inspects the space-skipping nodes, this avoids a huge number of allocations.</p><p>Lazy spaces nodes also have a special representation in the memo table: <code>(matchLength &lt;&lt; 2) | 2</code>. Since there is no CST node to point to, we just record the matchLength as a tagged, 32-bit value.</p><p>Optimizing space skipping can lead to a <em>huge</em> performance gain in some grammars. 
For example, here are the results from our official ES5 grammar on a 742KB source file:</p><img loading="lazy" src="/img/blog/v18-space-skipping.svg" alt="Space skipping optimization" style="width:100%" class="img_E7b_"><h2 class="anchor anchorWithStickyNavbar_mojV" id="other-optimizations">Other optimizations<a class="hash-link" href="#other-optimizations" title="Direct link to heading">​</a></h2><p>The things mentioned above were the biggest wins; a few smaller optimizations also contributed meaningful gains.</p><h3 class="anchor anchorWithStickyNavbar_mojV" id="single-use-rule-inlining">Single-use rule inlining<a class="hash-link" href="#single-use-rule-inlining" title="Direct link to heading">​</a></h3><p>If a rule is referenced exactly once in the grammar, its body is emitted inline at the call site. This eliminates the function call overhead, and saves space in the memo table. (This is a standard optimization in packrat parsers.)</p><h3 class="anchor anchorWithStickyNavbar_mojV" id="preallocated-nodes">Preallocated nodes<a class="hash-link" href="#preallocated-nodes" title="Direct link to heading">​</a></h3><p>Some CST nodes have a fixed structure, no matter where they appear in the tree:</p><ul><li>Single-child nonterminals: simple rules that match a single code point, like <code>letter</code> or <code>digit</code>.</li><li>Empty iteration nodes (zero children and zero match length).</li></ul><p>For nodes like this, we preallocate a singleton instance, and use that whenever it's needed, rather than allocating separate nodes for each instance.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="further-reading">Further reading<a class="hash-link" href="#further-reading" title="Direct link to heading">​</a></h2><p>Very little of what's described here is novel; most of the techniques can be found in one of these papers:</p><ul><li>Bryan Ford:<ul><li><a href="https://dl.acm.org/doi/pdf/10.1145/583852.581483" target="_blank" rel="noopener noreferrer">Packrat 
parsing: simple, powerful, lazy, linear time</a> (2002)</li><li><a href="https://dspace.mit.edu/bitstream/handle/1721.1/87310/51972156-MIT.pdf" target="_blank" rel="noopener noreferrer">Packrat parsing: a practical linear-time algorithm with backtracking</a> (2002)</li></ul></li><li>Robert Grimm: <a href="https://dl.acm.org/doi/pdf/10.1145/1133255.1133987" target="_blank" rel="noopener noreferrer">Better extensibility through modular syntax</a> (2006)</li></ul><p>For more details on Ohm's original implementation, see:</p><ul><li><a href="https://dl.acm.org/doi/pdf/10.1145/3093334.2989231" target="_blank" rel="noopener noreferrer">Modular semantic actions</a> (2016)</li><li><a href="https://dl.acm.org/doi/pdf/10.1145/3136014.3136022" target="_blank" rel="noopener noreferrer">Incremental packrat parsing</a> (2017)</li></ul><h2 class="anchor anchorWithStickyNavbar_mojV" id="try-it-out">Try it out<a class="hash-link" href="#try-it-out" title="Direct link to heading">​</a></h2><div class="codeBlockContainer_I0IT language-bash theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-bash codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token function" style="color:#d73a49">npm</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">install</span><span class="token plain"> ohm-js@next                      </span><span class="token comment" style="color:#999988;font-style:italic"># Runtime (production dependency)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">npm</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">install</span><span class="token plain"> --save-dev @ohm-js/compiler@next </span><span 
class="token comment" style="color:#999988;font-style:italic"># Compiler (dev dependency)</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>We'd love to hear your feedback. Give it a spin, and let us know what you think on <a href="https://discord.gg/KwxY5gegRQ" target="_blank" rel="noopener noreferrer">Discord</a> or <a href="https://github.com/ohmjs/ohm/discussions" target="_blank" rel="noopener noreferrer">GitHub Discussions</a>.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="acknowledgements">Acknowledgements<a class="hash-link" href="#acknowledgements" title="Direct link to heading">​</a></h2><p>I'd like to thank Adam B. and <a href="https://projectsubstrate.org" target="_blank" rel="noopener noreferrer">Project Substrate</a> for the initial funding that kicked off this project, and <a href="https://shopify.com" target="_blank" rel="noopener noreferrer">Shopify</a> for additional financial support.</p><p>And thanks to Alex Warth (PEG parsing guru) for the advice, ideas, and encouragement that made this work possible.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Ohm v18 Beta]]></title>
            <link>https://ohmjs.org/blog/ohm-v18</link>
            <guid>ohm-v18</guid>
            <pubDate>Fri, 20 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Ohm v18 compiles grammars to WebAssembly, making parsing ~20x faster and using a fraction of the memory.]]></description>
            <content:encoded><![CDATA[<p><em>aka "The One that Compiles to Wasm".</em></p><p>After nearly a year of development, we're excited to announce the first beta release of Ohm v18 — the biggest change to Ohm since its initial release. We've totally reworked the core parsing engine to be WebAssembly-based, making parsing around 20x faster on real-world grammars while using a fraction of the memory.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="whats-new">What's new<a class="hash-link" href="#whats-new" title="Direct link to heading">​</a></h2><p>Every version of Ohm up to v17 worked the same way under the hood: when you call <code>grammar.match()</code>, Ohm walks a tree of parsing expression objects (PExprs), calling <code>eval()</code> on each node. (It's a so-called <a href="https://craftinginterpreters.com/a-tree-walk-interpreter.html" target="_blank" rel="noopener noreferrer">tree-walking interpreter</a>.) In the process, it builds up a huge parse tree, with each node a separate object that must be managed by the GC.</p><p>v18 takes a completely different approach. At build time, the new <code>@ohm-js/compiler</code> translates your grammar into a WebAssembly module, compiling each rule to its own function. 
The parse tree is allocated into Wasm linear memory, and nodes use a packed representation which is much, much more memory efficient.</p><p>Also, the runtime (<code>ohm-js</code>) is now separate from the compiler (<code>@ohm-js/compiler</code>):</p><div class="codeBlockContainer_I0IT language-bash theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-bash codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token plain">npx ohm2wasm my-grammar.ohm   </span><span class="token comment" style="color:#999988;font-style:italic"># compile at build time</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><div class="codeBlockContainer_I0IT language-js theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-js codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token keyword module" style="color:#00009f">import</span><span class="token plain"> </span><span class="token imports punctuation" style="color:#393A34">{</span><span class="token imports maybe-class-name">Grammar</span><span class="token imports punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token keyword module" style="color:#00009f">from</span><span 
class="token plain"> </span><span class="token string" style="color:#e3116c">'ohm-js'</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token keyword module" style="color:#00009f">import</span><span class="token plain"> </span><span class="token imports">fs</span><span class="token plain"> </span><span class="token keyword module" style="color:#00009f">from</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'node:fs'</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">// load and run at runtime</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> g </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> </span><span class="token maybe-class-name">Grammar</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">instantiate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">fs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">readFileSync</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'my-grammar.wasm'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 
8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><h2 class="anchor anchorWithStickyNavbar_mojV" id="how-much-faster">How much faster?<a class="hash-link" href="#how-much-faster" title="Direct link to heading">​</a></h2><p>We've been benchmarking with two real-world workloads:</p><ol><li>Our <a href="https://github.com/ohmjs/ohm/blob/main/examples/ecmascript/src/es5.ohm" target="_blank" rel="noopener noreferrer">official ES5 grammar</a>, parsing a large (742KB) JavaScript file.</li><li>Shopify's <a href="https://github.com/Shopify/theme-tools/blob/main/packages/liquid-html-parser/grammar/liquid-html.ohm" target="_blank" rel="noopener noreferrer">LiquidHTML grammar</a>, parsing all the Liquid templates from their <a href="https://github.com/Shopify/dawn" target="_blank" rel="noopener noreferrer">Dawn theme</a>.</li></ol><p>On these two benchmarks, v18 parses about <strong>22x faster</strong> than v17, and requires less than 20% of the memory. 🔥</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="breaking-changes">Breaking changes<a class="hash-link" href="#breaking-changes" title="Direct link to heading">​</a></h2><p>Along with the new runtime, v18 has a significantly reworked API. 
Check out the <a href="/docs/releases/ohm-js-18.0">migration guide</a> for all the details on the new API, what's changed, and what's not in v18 yet.</p><p>And please note: <strong>the new API is still in flux</strong>, so expect some changes before the stable release.</p><h2 class="anchor anchorWithStickyNavbar_mojV" id="try-it-out">Try it out<a class="hash-link" href="#try-it-out" title="Direct link to heading">​</a></h2><div class="codeBlockContainer_I0IT language-bash theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-bash codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token function" style="color:#d73a49">npm</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">install</span><span class="token plain"> ohm-js@beta                      </span><span class="token comment" style="color:#999988;font-style:italic"># Runtime (production dependency)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">npm</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">install</span><span class="token plain"> --save-dev @ohm-js/compiler@beta </span><span class="token comment" style="color:#999988;font-style:italic"># Compiler (dev dependency)</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path 
d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>If you want to kick the tires without changing your build setup, there's a compat helper that parses, compiles, and instantiates in one step — just like the old <code>ohm.grammar()</code>:</p><div class="codeBlockContainer_I0IT language-js theme-code-block"><div class="codeBlockContent_wNvx" style="color:#393A34;background-color:#f6f8fa"><pre tabindex="0" class="prism-code language-js codeBlock_jd64 thin-scrollbar"><code class="codeBlockLines_mRuA"><span class="token-line" style="color:#393A34"><span class="token keyword module" style="color:#00009f">import</span><span class="token plain"> </span><span class="token imports punctuation" style="color:#393A34">{</span><span class="token imports">grammar</span><span class="token imports punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token keyword module" style="color:#00009f">from</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'@ohm-js/compiler/compat'</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> g </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">grammar</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'MyGrammar { start = "hello" }'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">using result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> g</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">match</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'hello'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></span></code></pre><button type="button" aria-label="Copy code to clipboard" title="Copy" class="copyButton_eDfN clean-btn"><span class="copyButtonIcons_W9eQ" aria-hidden="true"><svg class="copyButtonIcon_XEyF" viewBox="0 0 24 24"><path d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg class="copyButtonSuccessIcon_i9w9" viewBox="0 0 24 24"><path d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div><p>(This compiles on every call, so it's great for prototyping but probably not what you want for production.)</p><p>We'd love to hear your feedback on v18. Give it a spin, and let us know what you think on <a href="https://discord.gg/KwxY5gegRQ" target="_blank" rel="noopener noreferrer">Discord</a> or <a href="https://github.com/ohmjs/ohm/discussions" target="_blank" rel="noopener noreferrer">GitHub Discussions</a>.</p>]]></content:encoded>
        </item>
    </channel>
</rss>