<h1>About memory pressure, lock contention, and Data-oriented Design</h1> <p><time datetime="2026-02-23T00:00:00+00:00">2026-02-23</time>, <a href="https://mnt.io/articles/about-memory-pressure-lock-contention-and-data-oriented-design/">https://mnt.io/articles/about-memory-pressure-lock-contention-and-data-oriented-design/</a></p> <p>I'm here to tell you a story about performance. Recently, I was in the same room as some Memory Pressure and some Lock Contention. It took me a while to recognize them. Legend says they only show up in obscure, low-level systems, but I'm here to refute the legend. While exploring, I had the pleasure of fixing a funny bug in a higher-order stream: lucky us, to top it all off, we even have a sweet treat! This story is also a pretext to introduce you to Data-oriented Design, and to show how it improved execution time by 98.7% and throughput by 7718.5%. I believe we have all the ingredients for a juicy story. Let's cook, and <em lang="fr">bon appétit !</em></p> <h2 id="on-a-beautiful-morning">On a Beautiful Morning…<a role="presentation" class="anchor" href="#on-a-beautiful-morning" title="Anchor link to this header">#</a> </h2> <p>While powering on my <a rel="noopener external" target="_blank" href="https://dygma.com/pages/defy">Dygma Defy</a>, unlocking my computer, and checking messages from my colleagues, I suddenly come across this one:</p> <blockquote> <p>Does anyone also experience a frozen room list?</p> </blockquote> <p>Ah yeah, for some years now, I've been employed by <a rel="noopener external" target="_blank" href="https://element.io/">Element</a> to work on the <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk">Matrix Rust SDK</a>. If one needs to write a complete, modern, cross-platform, fast Matrix client or bot, this SDK is an excellent choice. The SDK is composed of many crates. Some are very low in the stack and are not meant to be used directly by developers, like <code>matrix_sdk_crypto</code>. 
Some others are higher in the stack — the highest is for User Interfaces (UI) with <code>matrix_sdk_ui</code>. While it is a bit opinionated, it is designed to provide the high-quality features everybody expects in a modern Matrix client.</p> <p>One of these features is the Room List. The Room List is a place where users spend a lot of their time in a messaging application (along with the Timeline, i.e. the room's messages). Some expectations for this component:</p> <ul> <li>Be superfast,</li> <li>List all the rooms,</li> <li>Interact with rooms (open them, mark them as unread etc.),</li> <li>Filter the rooms,</li> <li>Sort the rooms.</li> </ul> <p>Let's focus on the part that interests us today: <em>Sort the rooms</em>. The Room List holds… no rooms. It actually provides a <em>stream of updates about rooms</em>; more precisely a <code>Stream&lt;Item = Vec&lt;VectorDiff&lt;Room&gt;&gt;&gt;</code>. What does this mean? The stream yields a vector of “diffs” of rooms. I'm writing <a href="https://mnt.io/series/reactive-programming-in-rust/">a series about reactive programming</a> — you might be interested to read more about it. Otherwise, here is what you need to know.</p> <p><a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.8.0/eyeball_im/enum.VectorDiff.html">The <code>VectorDiff</code> type</a> comes from <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.8.0/eyeball_im/">the <code>eyeball-im</code> crate</a>, initially created for the Matrix Rust SDK as a solid foundation for reactive programming. 
It looks like this:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> VectorDiff</span><span>&lt;</span><span class="z-entity z-name">T</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Append</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> values</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vector</span><span>&lt;</span><span class="z-entity z-name">T</span><span>&gt;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name"> Clear</span><span>,</span></span> <span class="giallo-l"><span class="z-entity z-name"> PushFront</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> T</span><span>,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name"> PushBack</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> T</span><span>,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name"> PopFront</span><span>,</span></span> <span class="giallo-l"><span class="z-entity z-name"> PopBack</span><span>,</span></span> <span class="giallo-l"><span class="z-entity z-name"> Insert</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> index</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> T</span><span>,</span></span> <span class="giallo-l"><span> 
},</span></span> <span class="giallo-l"><span class="z-entity z-name"> Set</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> index</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> T</span><span>,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name"> Remove</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> index</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span><span>,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name"> Truncate</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span><span>,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name"> Reset</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> values</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vector</span><span>&lt;</span><span class="z-entity z-name">T</span><span>&gt;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>It represents a <em>change</em> in <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.8.0/eyeball_im/struct.ObservableVector.html">an <code>ObservableVector</code></a>. 
This is like a <code>Vec</code>, but <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.8.0/eyeball_im/struct.ObservableVector.html#method.subscribe">one can subscribe to the changes</a>, and will receive… well… <code>VectorDiff</code>s!</p> <p>The Room List type merges several streams into a single stream representing the list of rooms. For example, let's imagine the room at index 3 receives a new message. Its “preview” (the <em>latest event</em> displayed beneath the room's name, e.g. <q>Alice: Hello!</q>) changes. Also, the Room List sorts rooms by their “recency” (the <em>time</em> something happened in the room). And since the “preview” has changed, its “recency” changes too, which means the room is sorted and re-positioned. Then, we expect the Room List's stream to yield:</p> <ol> <li><code>VectorDiff::Set { index: 3, value: new_room }</code> because of the new “preview”,</li> <li><code>VectorDiff::Remove { index: 3 }</code> to remove the room… immediately followed by</li> <li><code>VectorDiff::PushFront { value: new_room }</code> to insert the room at the top of the Room List.</li> </ol> <p>This reactive programming mechanism has proven to be extremely efficient.</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>I did my calculation: the size of <code>VectorDiff&lt;Room&gt;</code> is 72 bytes (mostly because <code>Room</code> contains <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/sync/struct.Arc.html">an <code>Arc</code></a> over the real struct type). This is pretty small for an update. 
Not only does it bring a small memory footprint, but it also crosses the FFI boundary pretty easily, making it easy to map to other languages like Swift or Kotlin, languages that provide UI frameworks like <a rel="noopener external" target="_blank" href="https://developer.apple.com/swiftui/">SwiftUI</a> or <a rel="noopener external" target="_blank" href="https://developer.android.com/compose">Jetpack Compose</a>.</p> </div> </div> <p>Absolutely! These are two popular UI frameworks, and a <code>VectorDiff</code> maps straightforwardly onto the update operations of their List components. They are actually (remarkably) pretty similar to each other<sup class="footnote-reference" id="fr-vectordiff_on_other_uis-1"><a href="#fn-vectordiff_on_other_uis">1</a></sup>.</p> <p>You're always a good digression companion, thank you. Let's go back to our problem:</p> <blockquote> <p>What does "frozen" mean for the Room List?</p> </blockquote> <p>It means that the Room List is simply… <em>blank</em>, <em>empty</em>, <em lang="fr">vide</em>, <em lang="es">vacía</em>, <em lang="it">vuoto</em>, <em lang="ar">خلو</em>… well, you get the idea.</p> <blockquote> <p>What could freeze the Room List?</p> </blockquote> <p>What are our options?</p> <div class="conversation" data-character="factotum"> <div class="conversation--character"> <span lang="fr">Le Factotum</span> <picture role="presentation"> <source srcset="/image/factotum.avif" type="image/avif" /> <source srcset="/image/factotum.webp" type="image/webp" /> <img src="/image/factotum.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>It would be a real pleasure if you let me assist you in this task.</p> <ul> <li>The network sync is not running properly, hence giving the <em>impression</em> of a frozen Room List? Hmm, no, everything works as expected here. Moreover, local data should be displayed.</li> <li>The “source streams” used by the Room List are not yielding the expected updates? 
No, everything works like a charm.</li> <li>The “merge of streams” is broken for some reason? No, it seems fine.</li> <li>The filtering of the streams? Not touched in a long time.</li> <li>The sorting? Ah, maybe, I reckon we have changed something here…</li> </ul> </div> </div> <p>Indeed, we have changed one sorter recently. Let's take a look at how this Room List stream is computed, shall we?</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-storage">let</span><span class="z-variable"> stream</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> stream!</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> loop</span><span> {</span></span> <span class="giallo-l"><span class="z-comment"> // Wait for the filter to be updated.</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> filter</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> filter_cell</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">take</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Get the “raw” entries.</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span> (</span><span class="z-variable">initial_values</span><span>,</span><span class="z-variable"> stream</span><span>)</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">entries</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Combine normal stream updates with other room updates.</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span 
class="z-variable"> stream</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> merge_streams</span><span>(</span><span class="z-variable">initial_values</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">clone</span><span>(),</span><span class="z-variable"> stream</span><span>,</span><span class="z-variable"> other_updates</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let</span><span> (</span><span class="z-variable">initial_values</span><span>,</span><span class="z-variable"> stream</span><span>)</span><span class="z-keyword z-operator"> =</span><span> (</span><span class="z-variable">initial_values</span><span>,</span><span class="z-variable"> stream</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">filter</span><span>(</span><span class="z-variable">filter</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">sort_by</span><span>(</span><span class="z-entity z-name z-function">new_sorter_lexicographic</span><span>(</span><span class="z-entity z-name z-function">vec!</span><span>[</span></span> <span class="giallo-l"><span class="z-comment"> // Sort by latest event&#39;s kind.</span></span> <span class="giallo-l"><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span class="z-entity z-name z-function">new_sorter_latest_event</span><span>()),</span></span> <span class="giallo-l"><span class="z-comment"> // Sort rooms by their recency.</span></span> <span class="giallo-l"><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span class="z-entity z-name 
z-function">new_sorter_recency</span><span>()),</span></span> <span class="giallo-l"><span class="z-comment"> // Finally, sort by name.</span></span> <span class="giallo-l"><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span class="z-entity z-name z-function">new_sorter_name</span><span>()),</span></span> <span class="giallo-l"><span> ]))</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">dynamic_head_with_initial_value</span><span>(</span><span class="z-variable">page_size</span><span>,</span><span class="z-variable"> limit_stream</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Clearing the stream before chaining with the real stream.</span></span> <span class="giallo-l"><span class="z-keyword"> yield</span><span class="z-entity z-name z-function"> once</span><span>(</span><span class="z-entity z-name z-function">ready</span><span>(</span><span class="z-entity z-name z-function">vec!</span><span>[</span><span class="z-entity z-name">VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Reset</span><span> {</span><span class="z-variable"> values</span><span class="z-keyword z-operator">:</span><span class="z-variable"> initial_values</span><span> }]))</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">chain</span><span>(</span><span class="z-variable">stream</span><span>);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">switch</span><span>();</span></span></code></pre> <p>There is a lot going on here. 
Sadly, we are not going to explain everything in this beautiful work of art<sup class="footnote-reference" id="fr-switch-1"><a href="#fn-switch">2</a></sup>.</p> <p>The <code>.filter()</code>, <code>.sort_by()</code> and <code>.dynamic_head_with_initial_value()</code> methods are part of <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im-util/0.10.0/eyeball_im_util/">the <code>eyeball-im-util</code> crate</a>. They filter, sort, or otherwise transform a stream: each one essentially maps a <code>Stream&lt;Item = Vec&lt;VectorDiff&lt;T&gt;&gt;&gt;</code> to another <code>Stream&lt;Item = Vec&lt;VectorDiff&lt;T&gt;&gt;&gt;</code>. In other words, they “change” the <code>VectorDiff</code>s on the fly to simulate filtering, sorting, or something else. Let's see a very concrete example with <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im-util/0.10.0/eyeball_im_util/vector/struct.Sort.html">the <code>Sort</code> higher-order stream</a> (the following example is mostly a copy of the documentation of <code>Sort</code>, but <a rel="noopener external" target="_blank" href="https://github.com/jplatte/eyeball/pull/43">since I wrote this algorithm, I guess you, dear reader, will find it acceptable</a>).</p> <p>Let's imagine we have a vector of <code>char</code>. We want a <code>Stream</code> of <em>changes</em> to this vector (the famous <code>VectorDiff</code>). We also want to <em>simulate</em> a sorted vector, by only modifying the <em>changes</em>. 
The solution looks like this:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> std</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">cmp</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Ordering</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> eyeball_im</span><span class="z-keyword z-operator">::</span><span>{</span><span class="z-entity z-name">ObservableVector</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span>};</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> eyeball_im_util</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">vector</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">VectorObserverExt</span><span>;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> stream_assert</span><span class="z-keyword z-operator">::</span><span>{assert_next_eq, assert_pending};</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Our comparison function.</span></span> <span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> cmp</span><span>&lt;</span><span class="z-entity z-name">T</span><span>&gt;(</span><span class="z-variable">left</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">T</span><span>,</span><span class="z-variable"> right</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">T</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Ordering</span></span> <span class="giallo-l"><span class="z-keyword">where</span></span> <span 
class="giallo-l"><span class="z-entity z-name"> T</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Ord</span><span>,</span></span> <span class="giallo-l"><span>{</span></span> <span class="giallo-l"><span class="z-variable"> left</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">cmp</span><span>(</span><span class="z-variable">right</span><span>)</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Our vector.</span></span> <span class="giallo-l"><span class="z-storage">let mut</span><span class="z-variable"> vector</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> ObservableVector</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-entity z-name">char</span><span>&gt;</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>();</span></span> <span class="giallo-l"><span class="z-storage">let</span><span> (</span><span class="z-variable">initial_values</span><span>,</span><span class="z-storage"> mut</span><span class="z-variable"> stream</span><span>)</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> vector</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">subscribe</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">sort_by</span><span>(</span><span class="z-variable">cmp</span><span>);</span></span> <span class="giallo-l"><span class="z-comment">// ^^^</span></span> <span class="giallo-l"><span class="z-comment">// |</span></span> <span class="giallo-l"><span class="z-comment">// there</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert!</span><span>(</span><span class="z-variable">initial_values</span><span 
class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">is_empty</span><span>());</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_pending!</span><span>(</span><span class="z-variable">stream</span><span>);</span></span></code></pre> <p>Alrighty. That's a good start. <code>vector</code> is empty, so the initial values from the subscribe are empty, and the <code>stream</code> is also pending<sup class="footnote-reference" id="fr-stream_assert-1"><a href="#fn-stream_assert">3</a></sup>. I think it's time to play with this new toy, isn't it?</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// Append unsorted values.</span></span> <span class="giallo-l"><span class="z-variable">vector</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">append</span><span>(</span><span class="z-entity z-name z-function">vector!</span><span>[</span><span class="z-string">&#39;d&#39;</span><span>,</span><span class="z-string"> &#39;b&#39;</span><span>,</span><span class="z-string"> &#39;e&#39;</span><span>]);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// We get a `VectorDiff::Append` with sorted values!</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span></span> <span class="giallo-l"><span class="z-variable"> stream</span><span>,</span></span> <span class="giallo-l"><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Append</span><span> {</span><span class="z-variable"> values</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name z-function"> vector!</span><span>[</span><span class="z-string">&#39;b&#39;</span><span>,</span><span class="z-string"> &#39;d&#39;</span><span>,</span><span class="z-string"> &#39;e&#39;</span><span>] 
}</span></span> <span class="giallo-l"><span>);</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_pending!</span><span>(</span><span class="z-variable">stream</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Let&#39;s recap what we have. `vector` is our `ObservableVector`,</span></span> <span class="giallo-l"><span class="z-comment">// `stream` is the “sorted view”/“sorted stream” of `vector`:</span></span> <span class="giallo-l"><span class="z-comment">//</span></span> <span class="giallo-l"><span class="z-comment">// | index | 0 1 2 |</span></span> <span class="giallo-l"><span class="z-comment">// | `vector` | d b e |</span></span> <span class="giallo-l"><span class="z-comment">// | `stream` | b d e |</span></span></code></pre> <p>So far, so good. It looks naive and simple: one operation in, one operation out. It's funnier when things get more complicated though:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// Append multiple other values.</span></span> <span class="giallo-l"><span class="z-variable">vector</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">append</span><span>(</span><span class="z-entity z-name z-function">vector!</span><span>[</span><span class="z-string">&#39;f&#39;</span><span>,</span><span class="z-string"> &#39;g&#39;</span><span>,</span><span class="z-string"> &#39;a&#39;</span><span>,</span><span class="z-string"> &#39;c&#39;</span><span>]);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// We get three `VectorDiff`s this time!</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span></span> <span class="giallo-l"><span class="z-variable"> stream</span><span>,</span></span> <span class="giallo-l"><span class="z-entity z-name"> VectorDiff</span><span 
class="z-keyword z-operator">::</span><span class="z-entity z-name">PushFront</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;a&#39;</span><span> }</span></span> <span class="giallo-l"><span>);</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span></span> <span class="giallo-l"><span class="z-variable"> stream</span><span>,</span></span> <span class="giallo-l"><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Insert</span><span> {</span><span class="z-variable"> index</span><span class="z-keyword z-operator">:</span><span class="z-constant z-numeric"> 2</span><span>,</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;c&#39;</span><span> }</span></span> <span class="giallo-l"><span>);</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span></span> <span class="giallo-l"><span class="z-variable"> stream</span><span>,</span></span> <span class="giallo-l"><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Append</span><span> {</span><span class="z-variable"> values</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name z-function"> vector!</span><span>[</span><span class="z-string">&#39;f&#39;</span><span>,</span><span class="z-string"> &#39;g&#39;</span><span>] }</span></span> <span class="giallo-l"><span>);</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_pending!</span><span>(</span><span class="z-variable">stream</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Let&#39;s recap what we have:</span></span> <span class="giallo-l"><span 
class="z-comment">//</span></span> <span class="giallo-l"><span class="z-comment">// | index | 0 1 2 3 4 5 6 |</span></span> <span class="giallo-l"><span class="z-comment">// | `vector` | d b e f g a c |</span></span> <span class="giallo-l"><span class="z-comment">// | `stream` | a b c d e f g |</span></span> <span class="giallo-l"><span class="z-comment">// ^ ^ ^^^</span></span> <span class="giallo-l"><span class="z-comment">// | | |</span></span> <span class="giallo-l"><span class="z-comment">// | | with `VectorDiff::Append { .. }`</span></span> <span class="giallo-l"><span class="z-comment">// | with `VectorDiff::Insert { index: 2, .. }`</span></span> <span class="giallo-l"><span class="z-comment">// with `VectorDiff::PushFront { .. }`</span></span></code></pre> <p>Notice how <code>vector</code> is <em>never</em> sorted. That's the power of these higher-order streams of <code>VectorDiff</code>s: light and —more importantly— <strong>combinable</strong>! I repeat myself: we are always mapping a <code>Stream&lt;Item = Vec&lt;VectorDiff&lt;T&gt;&gt;&gt;</code> to another <code>Stream&lt;Item = Vec&lt;VectorDiff&lt;T&gt;&gt;&gt;</code>. That's the same type! The whole collection is never computed entirely, except for the initial values: only the changes are handled and trigger a computation. Knowing that, in the manner of <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/future/trait.Future.html"><code>Future</code></a>, <code>Stream</code> is lazy —i.e. it does something only when polled—, it makes things pretty efficient. 
And…</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>… as your favourite digression companion, I really, deeply, appreciate these details. Nonetheless, I hope you don't mind if… I suggest to you that… you might want to, maybe, go back to… <small>the main… subject, don't you think?</small></p> </div> </div> <p>Which subject? Ah! The frozen Room List! Sorters are the culprit. There. Happy? Short enough?</p> <p>These details were important. Kind of. I hope you've learned something along the way. Next, let's see how a sorter works, and how it could be responsible for our memory pressure and lock contention.</p> <h2 id="randomness">Randomness<a role="presentation" class="anchor" href="#randomness" title="Anchor link to this header">#</a> </h2> <p>Taking a step back, I was asking myself: <q>Is it really frozen?</q> The cherry on the cake: I was unable to reproduce the problem! Even the reporters of the problem were unable to reproduce it consistently. Hmm, a random problem? Fortunately, two of the reporters are obstinate. Ultimately, we got an analysis.</p> <figure> <picture> <source srcset=".&#x2F;memory-pressure.avif" type="image/avif" /> <source srcset=".&#x2F;memory-pressure.webp" type="image/webp" /> <img src=".&#x2F;memory-pressure.png" loading="lazy" decoding="async" /> </picture> <figcaption> <p>Memory analysis of Element X in Android Studio (Element X is based on the Matrix Rust SDK). It presents a callback tree, with the number of allocations and deallocations for each node in this tree. 
Thanks <a rel="noopener external" target="_blank" href="https://github.com/jmartinesp">Jorge</a>!</p> <p>And, holy cow, we see <strong>a lot</strong> of memory allocations, 322'042 to be precise, accounting for 743 MiB, for the <code>eyeball_im_util::vector::sort::SortBy</code> type! I don't remember exactly how many rooms are part of the Room List, but it's probably around 500-600.</p> <p><small>Download full-size image as: <a href=".&#x2F;memory-pressure.avif" title="Download the AVIF image">AVIF</a>, <a href=".&#x2F;memory-pressure.webp" title="Download the WebP image">WebP</a>, <a href=".&#x2F;memory-pressure.png" title="Download the PNG image">PNG</a>.</small></p> </figcaption> </figure> <p>The Room List wasn't frozen. It was taking <strong>a lot</strong> of time to yield values. Sometimes, up to 5 minutes on a phone. Alright, we have two problems to solve here:</p> <ol> <li>Why is it random?</li> <li>Why so many memory allocations and deallocations?</li> </ol> <p>The second problem will be discussed in the next section. Let's tackle the first problem in this section, shall we?</p> <p>Let's start at the beginning. 
<code>eyeball_im_util::vector::sort::SortBy</code> is used like so:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-variable">stream</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">sort_by</span><span>(</span><span class="z-entity z-name z-function">new_sorter_lexicographic</span><span>(</span><span class="z-entity z-name z-function">vec!</span><span>[</span></span> <span class="giallo-l"><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span class="z-entity z-name z-function">new_sorter_latest_event</span><span>()),</span></span> <span class="giallo-l"><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span class="z-entity z-name z-function">new_sorter_recency</span><span>()),</span></span> <span class="giallo-l"><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span class="z-entity z-name z-function">new_sorter_name</span><span>()),</span></span> <span class="giallo-l"><span> ]))</span></span></code></pre> <p><code>sort_by</code> receives a sorter: <a rel="noopener external" target="_blank" href="https://docs.rs/matrix-sdk-ui/0.16.0/matrix_sdk_ui/room_list_service/sorters/fn.new_sorter_lexicographic.html"><code>new_sorter_lexicographic</code></a>. It's from <a rel="noopener external" target="_blank" href="https://docs.rs/matrix-sdk-ui/0.16.0/matrix_sdk_ui/room_list_service/sorters/"><code>matrix_sdk_ui::room_list::sorters</code></a>, and it's a constructor for a… lexicographic sorter. 
All sorters must implement <a rel="noopener external" target="_blank" href="https://docs.rs/matrix-sdk-ui/0.16.0/matrix_sdk_ui/room_list_service/sorters/trait.Sorter.html">the <code>Sorter</code> trait</a>. Once again, it's a trait from <code>matrix_sdk_ui</code>, nothing fancy, it's simply this:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// Trait “alias”.</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> trait</span><span class="z-entity z-name"> Sorter</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name z-function"> Fn</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-entity z-name">Room</span><span>,</span><span class="z-keyword z-operator"> &amp;</span><span class="z-entity z-name">Room</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Ordering</span><span> {}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// All functions `F` are auto-implementing `Sorter`.</span></span> <span class="giallo-l"><span class="z-keyword">impl</span><span>&lt;</span><span class="z-entity z-name">F</span><span>&gt;</span><span class="z-entity z-name"> Sorter</span><span class="z-keyword"> for</span><span class="z-entity z-name"> F</span></span> <span class="giallo-l"><span class="z-keyword">where</span></span> <span class="giallo-l"><span class="z-entity z-name"> F</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name z-function"> Fn</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-entity z-name">Room</span><span>,</span><span class="z-keyword z-operator"> &amp;</span><span class="z-entity z-name">Room</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Ordering</span><span> {}</span></span></code></pre> <p>Put it differently, 
any function with two parameters of type <code>&amp;Room</code> and a return type <code>Ordering</code> is considered a sorter. There. It's crystal clear now, except… what's a lexicographic sorter?</p> <div class="conversation" data-character="procureur"> <div class="conversation--character"> <span lang="fr">Le Procureur</span> <picture role="presentation"> <source srcset="/image/procureur.avif" type="image/avif" /> <source srcset="/image/procureur.webp" type="image/webp" /> <img src="/image/procureur.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Should I really quote the documentation of <code>new_sorter_lexicographic</code>? My work here is turning into a tragedy.</p> <p>It creates a new sorter that will run multiple sorters. When the <math><msup><mi>n</mi><mtext>th</mtext></msup></math> sorter returns <code>Ordering::Equal</code>, the next sorter is called. It stops as soon as a sorter returns <code>Ordering::Greater</code> or <code>Ordering::Less</code>.</p> <p>This is an implementation of a lexicographic order as defined for <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Lexicographic_order#Cartesian_products">cartesian products</a>.</p> </div> </div> <p>In short, we are executing 3 sorters: by <em>latest event</em>, by <em>recency</em> and by <em>name</em>.</p> <p>None of these sorters uses any form of randomness. It's a <em lang="fr">cul-de-sac</em>. Let's take a step back by looking at <code>SortBy</code> in <code>eyeball_im_util</code> itself maybe? 
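</p> <p>Before following that trail, the lexicographic behaviour quoted above can be made concrete. The following is only a minimal sketch, not the SDK's implementation: the <code>Room</code> struct with its two fields, the <code>BoxedSorter</code> alias and the <code>lexicographic</code> function are hypothetical stand-ins invented for the illustration.</p>

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for the SDK's `Room`; only what the sketch needs.
pub struct Room {
    pub recency: u64,
    pub name: String,
}

// A boxed comparison function, mirroring the shape of the `Sorter` trait.
pub type BoxedSorter = Box<dyn Fn(&Room, &Room) -> Ordering>;

// Builds a sorter that runs `sorters` in order and returns the first
// result that is not `Ordering::Equal`.
pub fn lexicographic(sorters: Vec<BoxedSorter>) -> impl Fn(&Room, &Room) -> Ordering {
    move |left: &Room, right: &Room| -> Ordering {
        for sorter in &sorters {
            match sorter(left, right) {
                // This sorter cannot decide: ask the next one.
                Ordering::Equal => continue,
                // `Less` or `Greater`: we are done.
                ordering => return ordering,
            }
        }

        // Every sorter returned `Equal`.
        Ordering::Equal
    }
}
```

<p>With a recency sorter followed by a name sorter, two rooms with the same recency end up ordered by name: the first sorter returns <code>Ordering::Equal</code>, so the second one decides.</p> <p>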
<i>Scroll the documentation</i>, not here, <i>read the initial patch</i>, hmm, I see a mention of a binary search, <i>jump into the code</i>, ah, <a rel="noopener external" target="_blank" href="https://github.com/jplatte/eyeball/blob/b7dc6fde71e507459ecbd7519a8a22f12bf2a8de/eyeball-im-util/src/vector/sort.rs#L315-L318">here, look at the comment</a>:</p> <blockquote> <p>When looking for the <em>position</em> of a value (e.g. where to insert a new value?), <code>Vector::binary_search_by</code> is used — it is possible because the <code>Vector</code> is sorted. When looking for the <em>unsorted index</em> of a value, <code>Iterator::position</code> is used.</p> </blockquote> <p><a rel="noopener external" target="_blank" href="https://docs.rs/imbl/7.0.0/imbl/type.Vector.html#method.binary_search_by"><code>Vector::binary_search_by</code></a> doesn't mention any form of randomness in its documentation. Another <em lang="fr">cul-de-sac</em>.</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Remember that the Room List appears frozen but it is actually blank. The problem is not when the stream receives an update, but when the stream is “created”, i.e. when the initial items are sorted for the first time before receiving updates.</p> <p>Moreover, the comment says <q>it is possible because the <code>Vector</code> is sorted</q>, which indicates that “the vector” (I guess it's a buffer somewhere) <em>has been sorted</em> one way or another. What do you think?</p> </div> </div> <p>Ah! Brilliant. That's correct! 
Looking at <a rel="noopener external" target="_blank" href="https://github.com/jplatte/eyeball/blob/b7dc6fde71e507459ecbd7519a8a22f12bf2a8de/eyeball-im-util/src/vector/sort.rs#L261">the constructor of <code>SortBy</code></a> (or its implementation), we notice it's using <a rel="noopener external" target="_blank" href="https://docs.rs/imbl/7.0.0/imbl/type.Vector.html#method.sort_by"><code>Vector::sort_by</code></a>. And guess what? It's relying on… <i>drum roll</i>… <a rel="noopener external" target="_blank" href="https://github.com/jneem/imbl/blob/6feb48d04ed9bd2a004968541d1a90d61c423d31/src/vector/mod.rs#L1575-L1583">quicksort</a>! Following the path, we see <a rel="noopener external" target="_blank" href="https://github.com/jneem/imbl/blob/6feb48d04ed9bd2a004968541d1a90d61c423d31/src/sort.rs#L177-L185">it actually creates a pseudo random number generator (PRNG) to do the quicksort</a>.</p> <p>Phew. Finally. Time for a cup of tea and a biscuit<sup class="footnote-reference" id="fr-biscuit-1"><a href="#fn-biscuit">4</a></sup>.</p> <p>My guess here is the following. Depending on the (pseudo randomly) generated pivot index, the number of comparisons may vary each time this runs. We can enter a pathological case where more comparisons means more memory pressure, which means slower sorting, which means… A Frozen Room List<sup><abbr title="Trademark">TM</abbr></sup>, <i>play horror movie music</i>!</p> <h2 id="memory-pressure">Memory Pressure<a role="presentation" class="anchor" href="#memory-pressure" title="Anchor link to this header">#</a> </h2> <p>A memory allocator is responsible for… well… allocating the memory. If you believe this is a simple problem, please retract this offensive thought quickly: what an oaf! Memory is managed based on the strategy or strategies used by the memory allocator: there is not a unique solution. 
Each memory allocator comes with trade-offs: do you allocate and replace multiple similar small objects several times in a row, do you need fixed-size blocks of memory, dynamic blocks, etc.?</p> <p>Allocating memory is not free. The memory allocator has a cost in itself (which could perhaps be mitigated by implementing a custom memory allocator), but there is also <strong>a hardware cost</strong>, and it's comparatively more difficult to mitigate. Memory is allocated on the heap, i.e. <em>the RAM</em>, also called <em>the main memory</em> (not to be confused with <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/CPU_cache">CPU caches: L1, L2…</a>). The RAM is nice and all, but it lives far from the CPU. It <em>takes time</em> to allocate something on the heap and…</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Hold on a second. I heard it takes around 100-150 nanoseconds to fetch data from the heap. In what world is this “costly”? How is this “far” from the CPU?</p> <p>I understand we are talking about <em>random</em> accesses (the <em>R</em> in RAM), and multiple indirections, but still, it sounds pretty fast, right?</p> </div> </div> <p>Hmm, <i>refrain from opening Pandora's box</i>, let's try to stay high-level here, shall we? 
Be careful: the numbers I am going to present can vary depending on your hardware, but the important part is <strong>the scale</strong>: keep that in mind.</p> <figure> <table><thead><tr><th>Operation</th><th style="text-align: right">Time</th><th style="text-align: right">“Human scale”</th></tr></thead><tbody> <tr><td>Fetch from L1 cache</td><td style="text-align: right">1ns</td><td style="text-align: right">1min</td></tr> <tr><td>Branch misprediction</td><td style="text-align: right">3ns</td><td style="text-align: right">3min</td></tr> <tr><td>Fetch from L2 cache</td><td style="text-align: right">4ns</td><td style="text-align: right">4min</td></tr> <tr><td>Mutex lock/unlock</td><td style="text-align: right">17ns</td><td style="text-align: right">17min</td></tr> <tr><td>Fetch from the main memory</td><td style="text-align: right">100ns</td><td style="text-align: right">1h40min</td></tr> <tr><td>SSD random read</td><td style="text-align: right">16'000ns</td><td style="text-align: right">11.11 days</td></tr> </tbody></table> <figcaption> <p>Latency numbers for the year 2020 for various operations (source: <a rel="noopener external" target="_blank" href="https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html"><cite>Latency Numbers Every Programmer Should Know</cite> from Colin Scott (UC Berkeley)</a>).</p> <p>The time in the second column is given in nanoseconds, i.e. <math><mfrac><mn>1</mn><mn>1'000'000'000</mn></mfrac></math> of a second. The time in the third column is “humanized” to give us a better sense of the scale here: we imagine 1ns maps to 1min.</p> </figcaption> </figure> <p>Do you see the difference between the L1/L2 caches and the main memory? 1ns to 100ns is the same difference as 1min to 1h40min. So, yes, it takes time to read from memory. 
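</p> <p>The “human scale” column of the table is plain arithmetic: every nanosecond becomes one minute. As a small worked sketch (the <code>human_scale</code> function is a hypothetical helper written for this illustration, not something from the SDK), the conversion can be written as:</p>

```rust
// Maps a latency in nanoseconds to the “human scale” of the table above,
// where 1ns is imagined as 1 minute. Returns `(days, hours, minutes)`.
pub fn human_scale(latency_ns: u64) -> (u64, u64, u64) {
    const MINUTES_PER_HOUR: u64 = 60;
    const MINUTES_PER_DAY: u64 = 24 * MINUTES_PER_HOUR;

    // 1ns maps to 1 minute.
    let minutes = latency_ns;

    (
        minutes / MINUTES_PER_DAY,
        (minutes % MINUTES_PER_DAY) / MINUTES_PER_HOUR,
        minutes % MINUTES_PER_HOUR,
    )
}
```

<p>For the main memory, <code>human_scale(100)</code> gives 0 days, 1 hour and 40 minutes. For the SSD random read, <code>human_scale(16_000)</code> gives 11 days, 2 hours and 40 minutes, which is the 11.11 days of the table.</p> <p>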
That's why we try to avoid allocations as much as possible.</p> <figure> <svg viewBox="0 0 200 55" role="img" id="memory-race"> <style> #memory-race text { font-size: 4pt } #memory-race circle { fill: oklch(69.50% .140 76.18); animation: 4s linear 0s infinite alternate slide; } #memory-race .l1 { animation-duration: .5s } #memory-race .l2 { animation-duration: 2s } #memory-race .ram { animation-duration: 50s } @keyframes slide { from { transform: translateX(15%); } to { transform: translateX(85%); } } </style> <text x="0" y="12">CPU</text> <text x="0" y="27">CPU</text> <text x="0" y="42">CPU</text> <text x="180" y="12">L1</text> <text x="180" y="27">L2</text> <text x="180" y="42">RAM</text> <circle cx="0" cy="10" r="4" class="l1" /> <circle cx="0" cy="25" r="4" class="l2" /> <circle cx="0" cy="40" r="4" class="ram" /> </svg> <figcaption> <p>Not comfortable with numbers? Let's try to visualise it with 1ns = 1s! On the left: the CPU. On the right, the L1 cache, the L2 cache, and the RAM. The “balls” represent the time it takes to move information between the CPU and the L1/L2 caches or the RAM.</p> </figcaption> </figure> <p>Sadly, in our case, it appears we are allocating 322'042 times to sort the initial rooms of the Room List, for a total of 743'151'616 bits allocated, with 287 bytes per allocation. Of course, if we are doing quick napkin maths<sup class="footnote-reference" id="fr-napkin-math-1"><a href="#fn-napkin-math">5</a></sup>, it should take around 200ms. We are far from The Frozen Room List<sup><abbr title="Trademark">TM</abbr></sup>, but there is more going on.<sup class="footnote-reference" id="fr-suspens-1"><a href="#fn-suspens">6</a></sup></p> <p>Do you remember the memory allocator? Its role is to also avoid <em>fragmentation</em> as much as possible. The number of memory “blocks” isn't infinite: when memory blocks are freed, and new ones are allocated later, maybe the previous blocks are no longer available and cannot be reused. 
The allocator has to find a good place, while keeping fragmentation under control. Maybe the blocks must be moved to create enough space to insert the new blocks (it's often preferable to have contiguous blocks).</p> <p>That's what I call <strong>memory pressure</strong>. We are asking too much, too fast, and the memory allocator we use in the Matrix Rust SDK is not designed to handle this use case.</p> <p>What are our solutions then?</p> <div class="conversation" data-character="factotum"> <div class="conversation--character"> <span lang="fr">Le Factotum</span> <picture role="presentation"> <source srcset="/image/factotum.avif" type="image/avif" /> <source srcset="/image/factotum.webp" type="image/webp" /> <img src="/image/factotum.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>May I suggest an approach? What about finding where we are allocating and deallocating memory? Then we might be able to reduce either the number of allocations, or the size of the value being allocated (and deallocated), with the hope of making the memory allocator happier. Possible solutions:</p> <ul> <li>If the allocated value is too large to fit in the stack, we could return a pointer to it if possible,</li> <li>Maybe we don't need the full value: we could return just a pointer to a fragment of it?</li> </ul> </div> </div> <p>Excellent ideas. Let's track which sorter creates the problem. We start with the sorter that was recently modified: <code>latest_event</code>. In short, this sorter compares the <code>LatestEventValue</code> of two rooms: the idea is that rooms with a <code>LatestEventValue</code> representing a <em>local event</em>, i.e. an event that is not sent yet, or is sending, must be at the top of the Room List. 
Alright, <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/blob/3eb693acadb08db8e41de90ef51730d206168e7c/crates/matrix-sdk-ui/src/room_list_service/sorters/latest_event.rs#L64C1-L69C2">let's look at its core part</a>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub fn</span><span class="z-entity z-name z-function"> new_sorter</span><span>()</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-keyword"> impl</span><span class="z-entity z-name"> Sorter</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> latest_events</span><span class="z-keyword z-operator"> =</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> |</span><span class="z-variable">left</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">Room</span><span>,</span><span class="z-variable"> right</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">Room</span><span class="z-keyword z-operator">|</span><span> (</span><span class="z-variable">left</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">latest_event</span><span>(),</span><span class="z-variable"> right</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">latest_event</span><span>());</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> move</span><span class="z-keyword z-operator"> |</span><span class="z-variable">left</span><span>,</span><span class="z-variable"> right</span><span class="z-keyword z-operator">| -&gt;</span><span class="z-entity z-name"> Ordering</span><span> {</span><span class="z-entity z-name z-function"> cmp</span><span>(</span><span class="z-variable">latest_events</span><span>,</span><span class="z-variable"> left</span><span>,</span><span 
class="z-variable"> right</span><span>) }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Alright. For each sorting iteration, the <code>Room::latest_event</code> method is called twice. <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/blob/3eb693acadb08db8e41de90ef51730d206168e7c/crates/matrix-sdk-base/src/room/latest_event.rs#L38">This method is as follows</a>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub fn</span><span class="z-entity z-name z-function"> latest_event</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-language">self</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> LatestEventValue</span><span> {</span></span> <span class="giallo-l"><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>info</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">read</span><span>()</span><span class="z-keyword z-operator">.</span><span>latest_event</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">clone</span><span>()</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Oh, there it is. We are acquiring a read lock over the <code>info</code> value, then we are reading the <code>latest_event</code> field, and we are cloning the value. Cloning is important here as we don't want to hold the read lock for too long. This is our culprit. 
The size of the <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/blob/3eb693acadb08db8e41de90ef51730d206168e7c/crates/matrix-sdk-base/src/latest_event.rs#L29"><code>LatestEventValue</code></a> type is 144 bytes (it doesn't count the size of the event itself, because this size is dynamic).</p> <p>Before going further, let's check whether another sorter has a similar problem, shall we? <i>Look at the other sorters</i>, oh!, turns out <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/blob/01c0775e5974ad8a8690f5c580e79612ddcdfa2d/crates/matrix-sdk-ui/src/room_list_service/sorters/recency.rs#L90">the <code>recency</code> sorter</a> also uses the <code>latest_event</code> method! Damn, this is becoming really annoying.</p> <p>Question: do we need the entire <code>LatestEventValue</code>? Probably not!</p> <ul> <li>For the <code>latest_event</code> sorter, we actually only need to know when this <code>LatestEventValue</code> is <em>local</em>, that's it.</li> <li>For the <code>recency</code> sorter, we only need to know the timestamp of the <code>LatestEventValue</code>.</li> </ul> <p>So instead of copying the whole value in memory twice per sorter iteration, for two sorters, let's try to write more specific methods:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub fn</span><span class="z-entity z-name z-function"> latest_event_is_local</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-language">self</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> bool</span><span> {</span></span> <span class="giallo-l"><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>info</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">read</span><span>()</span><span 
class="z-keyword z-operator">.</span><span>latest_event</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">is_local</span><span>()</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">pub fn</span><span class="z-entity z-name z-function"> latest_event_timestamp</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-language">self</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity z-name">MilliSecondsSinceUnixEpoch</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>info</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">read</span><span>()</span><span class="z-keyword z-operator">.</span><span>latest_event</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">timestamp</span><span>()</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Just like that, <strong>the throughput has been improved by 18%</strong> according to the <code>room_list</code> benchmark. You can see <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/commit/62eb1996d917fb1928bdb9bba40d78a6eefe0bbd">the patch in “action”</a>. Can we declare victory over memory pressure?</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>I beg your pardon, but I don't believe it's a victory. 
We have reduced the size of allocations, but not the number of allocations itself.</p> <p>Well, actually, <code>latest_event_is_local</code> returns a <code>bool</code>: it can fit on the stack. And <code>latest_event_timestamp</code> returns an <code>Option&lt;MilliSecondsSinceUnixEpoch&gt;</code>, where <a rel="noopener external" target="_blank" href="https://docs.rs/ruma/0.14.1/ruma/struct.MilliSecondsSinceUnixEpoch.html"><code>MilliSecondsSinceUnixEpoch</code> is a <code>UInt</code></a>, which <a rel="noopener external" target="_blank" href="https://docs.rs/js_int/0.2.2/js_int/struct.UInt.html">itself wraps a <code>u64</code></a>: it can also fit on the stack.</p> <p>So, yes, we may have reduced the number of allocations greatly, that's agreed, it explains the 18% throughput improvement. However, issue reporters were mentioning a lag of 5 minutes or so, do you remember? How do you explain the remaining 4 minutes 6 seconds then? This is still unacceptable, right?</p> </div> </div> <p>Definitely yes! Everything above 200ms (from our napkin maths) is unacceptable here. 
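</p> <p>To make the difference between the two approaches concrete, here is a minimal sketch under loud assumptions: <code>Room</code>, <code>RoomInfo</code> and <code>LatestEventValue</code> are simplified, hypothetical stand-ins for the SDK types, the <code>content</code> field only plays the role of the dynamically sized event, and the lock is the standard library's <code>RwLock</code> (hence the <code>unwrap</code> calls), which may differ from what the SDK uses.</p>

```rust
use std::sync::RwLock;

// Hypothetical, simplified stand-in for the SDK's `LatestEventValue`.
#[derive(Clone)]
pub struct LatestEventValue {
    pub is_local: bool,
    pub timestamp: Option<u64>,
    // Plays the role of the dynamically sized event, stored on the heap.
    pub content: Vec<u8>,
}

pub struct RoomInfo {
    pub latest_event: LatestEventValue,
}

pub struct Room {
    pub info: RwLock<RoomInfo>,
}

impl Room {
    // Before: clones the whole value, heap-allocated content included.
    pub fn latest_event(&self) -> LatestEventValue {
        self.info.read().unwrap().latest_event.clone()
    }

    // After: copies a single `bool` out of the lock, no heap allocation.
    pub fn latest_event_is_local(&self) -> bool {
        self.info.read().unwrap().latest_event.is_local
    }

    // After: copies a small `Option<u64>` out of the lock.
    pub fn latest_event_timestamp(&self) -> Option<u64> {
        self.info.read().unwrap().latest_event.timestamp
    }
}
```

<p>In this sketch, the two specific methods return small <code>Copy</code> values and allocate nothing, whereas <code>latest_event</code> duplicates the heap content on every call.</p> <p>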
Memory pressure was an important problem, and it's now solved, but it wasn't the only problem.</p> <h2 id="lock-contention">Lock Contention<a role="presentation" class="anchor" href="#lock-contention" title="Anchor link to this header">#</a> </h2> <p>The assiduous reader may have noticed that we are still dealing with a lock here.</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-variable z-language">self</span><span class="z-keyword z-operator">.</span><span>info</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">read</span><span>()</span><span class="z-keyword z-operator">.</span><span>latest_event</span><span class="z-keyword z-operator">.</span><span>…</span></span> <span class="giallo-l"><span class="z-comment">// ^^^^^^</span></span> <span class="giallo-l"><span class="z-comment">// |</span></span> <span class="giallo-l"><span class="z-comment">// this read lock acquisition</span></span></code></pre> <p>Do you remember we had 322'042 allocations? It represents the number of times the <code>latest_event</code> method was called basically, which means…</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>… the lock is acquired 322'042 times!</p> <p>…</p> <p>… no?</p> </div> </div> <p>… yes… and please, stop interrupting me, I was trying to build up the suspense for a climax.</p> <p>Anyway. Avoiding a lock isn't an easy task. However, this lock around <code>info</code> is particularly annoying because it's called by almost all sorters! 
They need information about a <code>Room</code>; all the information is in this <code>info</code> field, which is a read-write lock. Hmmm.</p> <p>Let's change our strategy. We need to take a step back:</p> <ol> <li>The sorters need this data.</li> <li>Running the sorters won't change this data.</li> <li>When the data does change the sorters will be re-run.</li> </ol> <p>Maybe we could fetch, ahead of time, all the necessary data for all sorters in a single type: it will be refreshed when the data changes, which is right before the sorters run again.</p> <div class="conversation" data-character="procureur"> <div class="conversation--character"> <span lang="fr">Le Procureur</span> <picture role="presentation"> <source srcset="/image/procureur.avif" type="image/avif" /> <source srcset="/image/procureur.webp" type="image/webp" /> <img src="/image/procureur.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>The idea here is to organise the data around a specific layout. The focus on the data layout aims at being CPU cache friendly as much as possible. This kind of approach is called <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Data-oriented_design"><em>Data-oriented Design</em></a>.</p> </div> </div> <p>That's correct. If the type is small enough, it can fit more easily in the CPU caches, like L1 or L2. Do you remember how fast they are? 1ns and 4ns, much faster than the 100ns for the main memory. 
Moreover, it removes the lock contention and the memory pressure entirely!</p> <details> <summary> <p>I highly recommend watching the following talks<sup class="footnote-reference" id="fr-talks-1"><a href="#fn-talks">7</a></sup> if you want to learn more about Data-oriented Design (DoD)</p> </summary> <figure> <iframe class="youtube-player" src="https://www.youtube-nocookie.com/embed/rX0ItVEVjHc" title="Data-Oriented Design and C++, by Mike Acton, at the CppCon 2014" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen loading="lazy"></iframe> <figcaption> <p>Video: Data-Oriented Design and C++, by Mike Acton, at the CppCon 2014</p> <p>The transformation of data is the only purpose of any program. Common approaches in C++ which are antithetical to this goal will be presented in the context of a performance-critical domain (console game development). Additionally, limitations inherent in any C++ compiler and how that affects the practical use of the language when transforming that data will be demonstrated. <a rel="noopener external" target="_blank" href="https://github.com/CppCon/CppCon2014/tree/master/Presentations/Data-Oriented%20Design%20and%20C%2B%2B">View the slides</a>.</p> </figcaption> </figure> <figure> <iframe class="youtube-player" src="https://www.youtube-nocookie.com/embed/WDIkqP4JbkE" title="Cpu Caches and Why You Care, by Scott Meyers, at the code::dive conference 2014" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen loading="lazy"></iframe> <figcaption> <p>Video: Cpu Caches and Why You Care, by Scott Meyers, at the code::dive conference 2014</p> <p>This talk explores CPU caches and their impact on program performance.</p> </figcaption> </figure> </details> <p>So. 
Let's be serious: I suggest trying to do some Data-oriented Design here. We start by putting all our data in a single type:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> struct</span><span class="z-entity z-name"> RoomListItem</span><span> {</span></span> <span class="giallo-l"><span class="z-comment"> /// Cache of `Room::latest_event_timestamp`.</span></span> <span class="giallo-l"><span class="z-variable"> cached_latest_event_timestamp</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity z-name">MilliSecondsSinceUnixEpoch</span><span>&gt;,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> /// Cache of `Room::latest_event_is_local`.</span></span> <span class="giallo-l"><span class="z-variable"> cached_latest_event_is_local</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> bool</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> /// Cache of `Room::recency_stamp`.</span></span> <span class="giallo-l"><span class="z-variable"> cached_recency_stamp</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity z-name">RoomRecencyStamp</span><span>&gt;,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> /// Cache of `Room::cached_display_name`, already as a string.</span></span> <span class="giallo-l"><span class="z-variable"> cached_display_name</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity z-name">String</span><span>&gt;,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> /// Cache of `Room::is_space`.</span></span> <span 
class="giallo-l"><span class="z-variable"> cached_is_space</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> bool</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Cache of `Room::state`.</span></span> <span class="giallo-l"><span class="z-variable"> cached_state</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> RoomState</span><span>,</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">impl</span><span class="z-entity z-name"> RoomListItem</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> refresh_cached_data</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable z-language"> self</span><span>,</span><span class="z-variable"> room</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">Room</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>cached_latest_event_timestamp </span><span class="z-keyword z-operator">=</span><span class="z-variable"> room</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">new_latest_event_timestamp</span><span>();</span></span> <span class="giallo-l"><span class="z-comment"> // etc.</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>At this point, the size of <code>RoomListItem</code> is 64 bytes, acceptably small!</p> <div class="conversation" data-character="factotum"> <div class="conversation--character"> <span lang="fr">Le Factotum</span> <picture role="presentation"> <source srcset="/image/factotum.avif" type="image/avif" /> <source 
srcset="/image/factotum.webp" type="image/webp" /> <img src="/image/factotum.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>The L1 and L2 caches nowadays have a size of several kilobytes. You can try to run <a rel="noopener external" target="_blank" href="https://man.freebsd.org/cgi/man.cgi?query=sysctl"><code>sysctl</code></a> or <a rel="noopener external" target="_blank" href="https://linux.die.net/man/1/getconf"><code>getconf</code></a> in a shell to see how much your hardware supports (look for an entry like “cache line”, or “cache line size” for example).</p> <p>On my system for example, the L1 (data) cache size is 65Kb, and the cache line size is 128 bytes.</p> <p>Ideally, we —at the very least— want one <code>RoomListItem</code> to fit in a cache line. Compacting the type to avoid inner padding would be ideal. If there is a <em>cache miss</em> in L1, the CPU will look at the next cache, so L2, and so on, until reaching the main memory. So the cost of a cache miss is: look up in L1, plus cache miss, plus look up in L2, etc.</p> </div> </div> <p><a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/commit/a84c97b292c658109bfb40391b5f10b0708276d4">A bit of plumbing later</a>, this new <code>RoomListItem</code> type is used everywhere by the Room List, by all its filters and all its sorters. 
For example, the <code>latest_event</code> sorter now looks like:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub fn</span><span class="z-entity z-name z-function"> new_sorter</span><span>()</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-keyword"> impl</span><span class="z-entity z-name"> Sorter</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> latest_events</span><span class="z-keyword z-operator"> = |</span><span class="z-variable">left</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">RoomListItem</span><span>,</span><span class="z-variable"> right</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">RoomListItem</span><span class="z-keyword z-operator">|</span><span> {</span></span> <span class="giallo-l"><span> (</span><span class="z-variable">left</span><span class="z-keyword z-operator">.</span><span>cached_latest_event_is_local,</span><span class="z-variable"> right</span><span class="z-keyword z-operator">.</span><span>cached_latest_event_is_local)</span></span> <span class="giallo-l"><span> };</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> move</span><span class="z-keyword z-operator"> |</span><span class="z-variable">left</span><span>,</span><span class="z-variable"> right</span><span class="z-keyword z-operator">| -&gt;</span><span class="z-entity z-name"> Ordering</span><span> {</span><span class="z-entity z-name z-function"> cmp</span><span>(</span><span class="z-variable">latest_events</span><span>,</span><span class="z-variable"> left</span><span>,</span><span class="z-variable"> right</span><span>) }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The lock acquisitions happen only in <code>refresh_cached_data</code>, when a new update happens, not during the filtering or 
sorting anymore. Let's see what the benchmark has to say now.</p> <p>Before:</p> <pre class="giallo z-code"><code data-lang="shellscript"><span class="giallo-l"><span class="z-entity z-name">$</span><span class="z-string"> cargo bench</span><span class="z-constant z-other"> --bench</span><span class="z-string"> room_list</span></span> <span class="giallo-l"><span class="z-entity z-name">RoomList/Create/1000</span><span class="z-string"> rooms ×</span><span class="z-constant z-numeric"> 1000</span><span class="z-string"> events</span></span> <span class="giallo-l"><span> time: [</span><span class="z-constant z-numeric">53.027</span><span> ms </span><span class="z-constant z-numeric">53.149</span><span> ms </span><span class="z-constant z-numeric">53.273</span><span> ms]</span></span> <span class="giallo-l"><span class="z-entity z-name"> thrpt:</span><span> [18.771</span><span class="z-string"> Kelem/s</span><span class="z-constant z-numeric"> 18.815</span><span class="z-string"> Kelem/s</span><span class="z-constant z-numeric"> 18.858</span><span class="z-string"> Kelem/s]</span></span></code></pre> <p>After:</p> <pre class="giallo z-code"><code data-lang="shellscript"><span class="giallo-l"><span class="z-entity z-name">$</span><span class="z-string"> cargo bench</span><span class="z-constant z-other"> --bench</span><span class="z-string"> room_list</span></span> <span class="giallo-l"><span class="z-entity z-name">RoomList/Create/1000</span><span class="z-string"> rooms ×</span><span class="z-constant z-numeric"> 1000</span><span class="z-string"> events</span></span> <span class="giallo-l"><span> time: [</span><span class="z-constant z-numeric">676.29</span><span> µs </span><span class="z-constant z-numeric">676.84</span><span> µs </span><span class="z-constant z-numeric">677.50</span><span> µs]</span></span> <span class="giallo-l"><span class="z-entity z-name"> thrpt:</span><span> [1.4760</span><span class="z-string"> Melem/s</span><span class="z-constant 
z-numeric"> 1.4775</span><span class="z-string"> Melem/s</span><span class="z-constant z-numeric"> 1.4787</span><span class="z-string"> Melem/s]</span></span> <span class="giallo-l"><span class="z-entity z-name"> change:</span></span> <span class="giallo-l"><span> time: [-98.725% -98.721% -98.716%] (</span><span class="z-entity z-name">p</span><span class="z-string"> =</span><span class="z-constant z-numeric"> 0.00</span><span class="z-keyword z-operator"> &lt;</span><span class="z-constant z-numeric"> 0.05</span><span>)</span></span> <span class="giallo-l"><span class="z-entity z-name"> thrpt:</span><span> [+7686.9%</span><span class="z-string"> +7718.5% +7745.6%]</span></span> <span class="giallo-l"><span class="z-entity z-name"> Performance</span><span class="z-string"> has improved.</span></span></code></pre> <p>Boom!</p> <p>We don't see the 5 minutes lag mentioned by the reporters, but remember it's random. Nonetheless, <strong>the performance impact is huge</strong>:</p> <ul> <li>From 18.8Kelem/s to 1.4Melem/s,</li> <li>From 53ms to 676µs, or —to compare with the same unit— 0.676ms, so <strong>78× faster</strong>!</li> <li>The throughput has improved by 7718.5%, and the time by 98.7%.</li> </ul> <p>Can we claim victory now?</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Apparently yes! The reporters were unable to reproduce the problem anymore. It seems it's solved! Looking at profilers, we see millions fewer allocations in the benchmark runs (the benchmark does a lot of allocations for the setup, but the difference is pretty noticeable).</p> <p>Data-oriented Design is fascinating. 
Understanding how computers work, how the memory and the CPU work, is crucial for optimising algorithms. The changes we've applied are small compared to the performance improvement they have provided!</p> <p>You said everything above 200ms is unacceptable. With 676µs, I reckon the target is reached. It's even below the napkin maths about main memory access, which suggests we are not hitting the RAM anymore in the filters and sorters (not in an uncivilised way at least). Also, it's funny that an L1/L2 cache access (1-4ns) is on average 40 times faster than a main memory access (100ns), which looks suspiciously similar to the 78 times factor we see here. It also suggests we are hitting L1 more frequently than L2, which is a good sign!</p> </div> </div> <p>The benchmark Iteration Times and Regression graphs are interesting to look at.</p> <figure> <p><a href="https://mnt.io/articles/about-memory-pressure-lock-contention-and-data-oriented-design/./1-iteration-times.svg"><img src="https://mnt.io/articles/about-memory-pressure-lock-contention-and-data-oriented-design/./1-iteration-times.svg" alt="Iteration times" loading="lazy" decoding="async" /></a></p> <figcaption> <p>The initial Iteration Times, before our patches. Notice how the points do not follow any “trend”. It&#39;s a clear sign the program is acting erratically.</p> </figcaption> </figure> <figure> <p><a href="https://mnt.io/articles/about-memory-pressure-lock-contention-and-data-oriented-design/./2-iteration-times.svg"><img src="https://mnt.io/articles/about-memory-pressure-lock-contention-and-data-oriented-design/./2-iteration-times.svg" alt="Iteration times" loading="lazy" decoding="async" /></a></p> <figcaption> <p>The final Iteration Times/Regression, after our patches. Notice how the points are linear.</p> </figcaption> </figure> <p>The second graph is the kind of graph I like.
Predictable.</p> <div class="conversation" data-character="procureur"> <div class="conversation--character"> <span lang="fr">Le Procureur</span> <picture role="presentation"> <source srcset="/image/procureur.avif" type="image/avif" /> <source srcset="/image/procureur.webp" type="image/webp" /> <img src="/image/procureur.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>In this concrete case, it's difficult to improve the performance further because <code>RoomListItem</code> is used by sorters, and by filters, and in other places of the code. The current usage of <code>RoomListItem</code> falls into the definition of <em>Array of Structures</em> in the Data-oriented Design terminology. After all, we clearly have a <code>Vec&lt;RoomListItem&gt;</code> at the root of everything. It is efficient but <em>Structure of Arrays</em> might be even more efficient. Instead of having:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-storage">struct</span><span class="z-entity z-name"> RoomListItem</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> a</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> bool</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> b</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> u64</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> c</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> bool</span><span>,</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">let</span><span class="z-variable"> rooms</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">RoomListItem</span><span>&gt;;</span></span></code></pre> <p>we would have:</p> <pre 
class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-storage">struct</span><span class="z-entity z-name"> RoomListItems</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> a</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">bool</span><span>&gt;,</span></span> <span class="giallo-l"><span class="z-variable"> b</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">u64</span><span>&gt;,</span></span> <span class="giallo-l"><span class="z-variable"> c</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">bool</span><span>&gt;,</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">let</span><span class="z-variable"> rooms</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> RoomListItems</span><span>;</span></span></code></pre> <p>This is not applicable in our situation because sorters are iterating over different fields. However, if you're sure a loop uses only one field, this <em>Structure of Arrays</em> is more cache-friendly as it loads less data into the CPU caches: less padding, fewer useless bytes. By making better use of the cache line, not only are we pretty sure the program will run faster, but the CPU will also be better at predicting what data will be loaded in the cache line, boosting the performance even more!</p> <p>Just so you know, my role here is not restricted to reciting documentation or summarising Wikipedia entries.</p> </div> </div> <p>Of course you&#39;re valuable!
Now, the surprise.</p> <h2 id="">The Dessert<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <p>Of course, let&#39;s not forget about our dessert! I won&#39;t dig too much: the patch contains all the necessary gory details. In short, it&#39;s about how <code>VectorDiff::Set</code> can create a nasty bug in <code>SortBy</code>. Basically, when a value in the vector is updated, a <code>VectorDiff::Set</code> is emitted. <code>SortBy</code> is then responsible for computing a new <code>VectorDiff</code>:</p> <ul> <li>it was calculating the old position of the value,</li> <li>it was calculating the new position,</li> <li>depending on that, it was emitting the appropriate <code>VectorDiff</code>s.</li> </ul> <p>However, the old “value” wasn&#39;t removed from the buffer <em>immediately</em> and not <em>every time</em>. In theory, it should not cause any problem —it was an optimisation after all— except if… the items manipulated by the stream are “shallow clones”. Shallow cloning a value won&#39;t copy the value entirely: we get a new value, but its state is synced with the original value. This happens with types such as:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[derive(</span><span class="z-entity z-name">Clone</span><span>)]</span></span> <span class="giallo-l"><span class="z-storage">struct</span><span class="z-entity z-name"> S</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> inner</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Arc</span><span>&lt;</span><span class="z-entity z-name">T</span><span>&gt;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Here, cloning a value of type <code>S</code> and changing its <code>inner</code> field will also update the original value.</p> <p>Just like that, it was possible to systematically create… <strong>an infinite loop</strong>. 
Funky, isn&#39;t it?</p> <p>You can view the patch <a rel="noopener external" target="_blank" href="https://github.com/jplatte/eyeball/pull/80">Fix an infinite loop when <code>SortBy&lt;Stream&lt;Item = T&gt;&gt;</code> handles a <code>VectorDiff::Set</code> where <code>T</code> is a shallow clone type</a> to learn more.</p> <p>I think this is a concrete example of when jumping on an optimisation can lead to a bug. I&#39;m not saying we should not prematurely optimise our programs: I&#39;m a partisan of the “we should” camp. I&#39;m saying that bugs can be pretty subtle sometimes, and this bug would have been avoided if we hadn&#39;t taken a shortcut in this algorithm. It&#39;s important to be correct first, then measure, then improve.</p> <p>I hope you&#39;ve learned a couple of things, and that you&#39;ve enjoyed the read.</p> <p>I would like to thank <a rel="noopener external" target="_blank" href="https://artificialworlds.net/blog/">Andy Balaam</a> and <a rel="noopener external" target="_blank" href="https://github.com/poljar">Damir Jelić</a> for the reviews and the feedback!</p> <section class="footnotes"> <ol class="footnotes-list"> <li id="fn-vectordiff_on_other_uis"> <p>On <a rel="noopener external" target="_blank" href="https://developer.apple.com/swiftui/">SwiftUI</a>, there is the <a rel="noopener external" target="_blank" href="https://developer.apple.com/documentation/swift/collectiondifference/change"><code>CollectionDifference.Change</code></a> enum. For example: <code>VectorDiff::PushFront</code> is equivalent to <code>Change.insert(offset: 0)</code>. On <a rel="noopener external" target="_blank" href="https://developer.android.com/compose">Jetpack Compose</a>, there is the <a rel="noopener external" target="_blank" href="https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.collections/-mutable-list/"><code>MutableList</code></a> object. For example: <code>VectorDiff::Clear</code> is equivalent to <code>MutableList.clear()</code>!
<a href="#fr-vectordiff_on_other_uis-1">↩</a></p> </li> <li id="fn-switch"> <p>I would <em>love</em> to talk about how this <code>Stream</code> produces a <code>Stream</code>, how the outer stream and the inner stream are switched (with <code>.switch()</code>!), how we&#39;ve implemented that from scratch, but it&#39;s probably for another article. Meanwhile, you can take a look at <a rel="noopener external" target="_blank" href="https://docs.rs/async-rx/0.1.3/async_rx/struct.Switch.html"><code>async_rx::Switch</code></a>. <a href="#fr-switch-1">↩</a></p> </li> <li id="fn-stream_assert"> <p>Do you know <a rel="noopener external" target="_blank" href="https://docs.rs/stream_assert/0.1.1/stream_assert/"><code>stream_assert</code></a>? It&#39;s another crate we&#39;ve written to easily apply assertions on <code>Stream</code>s. Pretty convenient. <a href="#fr-stream_assert-1">↩</a></p> </li> <li id="fn-biscuit"> <p>Yes, <a rel="noopener external" target="_blank" href="https://www.biscuitsec.org/">biscuit</a>. <a href="#fr-biscuit-1">↩</a></p> </li> <li id="fn-napkin-math"> <p>I highly recommend to read the <a rel="noopener external" target="_blank" href="https://github.com/sirupsen/napkin-math/">Napkin Math</a> project, with the great talk at <a rel="noopener external" target="_blank" href="https://www.youtube.com/watch?v=IxkSlnrRFqc">SRECON&#39;19, <cite>Advanced Napkin Math: Estimating System Performance from First Principles</cite> by Simon Eskildsen</a>. <a href="#fr-napkin-math-1">↩</a></p> </li> <li id="fn-suspens"> <p>Do you remember the lock contention? Wait for it. At this step of the story, I wasn&#39;t aware we had a lock contention yet. <a href="#fr-suspens-1">↩</a></p> </li> <li id="fn-talks"> <p>If you are curious and enjoy watching talks, I&#39;m maintaining <a rel="noopener external" target="_blank" href="https://www.youtube.com/playlist?list=PLOkMRkzDhWGX_4YWI4ZYGbwFPqKnDRudf">a playlist of interesting talks I&#39;ve watched</a>. 
Also, you can read this old article <a href="https://mnt.io/articles/one-conference-per-day-for-one-year-2017/">One conference per day, for one year (2017)</a>. <a href="#fr-talks-1">↩</a></p> </li> </ol> </section> From 19k to 4.2M events/sec: story of a SQLite query optimisation 2025-09-12T00:00:00+00:00 2025-09-12T00:00:00+00:00 Unknown https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/ <p>Sit down comfortably. Take a cushion if you wish. This is, <i>clear its throat</i>, the story of a funny performance quest. The <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk">Matrix Rust SDK</a> is a set of crates aiming at providing all the necessary tooling to develop robust and safe <a rel="noopener external" target="_blank" href="https://matrix.org/">Matrix</a> clients. Of course, it involves databases to persist some data. The Matrix Rust SDK supports multiple databases: in-memory, <a rel="noopener external" target="_blank" href="https://sqlite.org/">SQLite</a>, and <a rel="noopener external" target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API">IndexedDB</a>. This story is about the SQLite database.</p> <p>The structure we want to persist is a novel type we have designed specifically for the Matrix Rust SDK: a <a rel="noopener external" target="_blank" href="https://docs.rs/matrix-sdk-common/0.14.0/matrix_sdk_common/linked_chunk/index.html"><code>LinkedChunk</code></a>. It's the underlying structure that holds all events manipulated by the Matrix Rust SDK. It is somewhat similar to a <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Linked_List">linked list</a>; the differences are subtle and the goal of this article is <em>not</em> to present all the details. We have developed many APIs around this type to make all operations fast and efficient in the context of the Matrix protocol.
What we need to know is that in a <code>LinkedChunk&lt;_, Item, Gap&gt;</code>, each node contains a <code>ChunkContent&lt;Item, Gap&gt;</code> defined as:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-storage">enum</span><span class="z-entity z-name"> ChunkContent</span><span>&lt;</span><span class="z-entity z-name">Item</span><span>,</span><span class="z-entity z-name"> Gap</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> Gap</span><span>(</span><span class="z-entity z-name">Gap</span><span>),</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> Items</span><span>(</span><span class="z-entity z-name">Vec</span><span>&lt;</span><span class="z-entity z-name">Item</span><span>&gt;),</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Put differently: each node can contain a <em>gap</em>, or a set of <em>items</em> (here, Matrix events).</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>May I recapitulate?</p> <p>Each Matrix <em>room</em> contains a <code>LinkedChunk</code>, which is a set of <em>chunks</em>. Each <em>chunk</em> is either a <em>gap</em> or a set of <em>events</em>. It seems to map fairly easily to SQL tables, doesn't it?</p> </div> </div> <p>You're right: it's pretty straightforward! Let's see the first table: <code>linked_chunks</code>, which contains all the chunks.
(Note that the schemas are simplified for the sake of clarity).</p> <pre class="giallo z-code"><code data-lang="sql"><span class="giallo-l"><span class="z-keyword">CREATE TABLE</span><span> &quot;</span><span class="z-entity z-name z-function">linked_chunks</span><span>&quot; (</span></span> <span class="giallo-l"><span class="z-comment"> -- Which linked chunk does this chunk belong to?</span></span> <span class="giallo-l"><span class="z-string"> &quot;linked_chunk_id&quot;</span><span> BLOB </span><span class="z-keyword">NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- Identifier of the chunk, unique per linked chunk.</span></span> <span class="giallo-l"><span class="z-string"> &quot;id&quot;</span><span class="z-storage"> INTEGER</span><span class="z-keyword"> NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- Identifier of the previous chunk.</span></span> <span class="giallo-l"><span class="z-string"> &quot;previous&quot;</span><span class="z-storage"> INTEGER</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- Identifier of the next chunk.</span></span> <span class="giallo-l"><span class="z-string"> &quot;next&quot;</span><span class="z-storage"> INTEGER</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- Our enum for the content of the chunk: `E` for events, `G` for a gap.</span></span> <span class="giallo-l"><span class="z-string"> &quot;type&quot;</span><span class="z-storage"> TEXT CHECK</span><span>(</span><span class="z-string">&quot;type&quot;</span><span class="z-keyword"> IN</span><span> (</span><span class="z-string">&#39;E&#39;</span><span>, </span><span class="z-string">&#39;G&#39;</span><span>)) </span><span class="z-keyword">NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span 
class="giallo-l"><span class="z-comment"> -- … other things …</span></span> <span class="giallo-l"><span>);</span></span></code></pre> <p>Alrighty. Next contenders: the <code>event_chunks</code> and the <code>gap_chunks</code> tables, which store the <code>ChunkContent</code>s of each chunk, respectively for <code>ChunkContent::Items</code> and <code>ChunkContent::Gap</code>. In <code>event_chunks</code>, each row corresponds to an event. In <code>gap_chunks</code>, each row corresponds to a gap.</p> <pre class="giallo z-code"><code data-lang="sql"><span class="giallo-l"><span class="z-keyword">CREATE TABLE</span><span> &quot;</span><span class="z-entity z-name z-function">event_chunks</span><span>&quot; (</span></span> <span class="giallo-l"><span class="z-comment"> -- Which linked chunk does this event belong to?</span></span> <span class="giallo-l"><span class="z-string"> &quot;linked_chunk_id&quot;</span><span> BLOB </span><span class="z-keyword">NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- Which chunk does this event refer to?</span></span> <span class="giallo-l"><span class="z-string"> &quot;chunk_id&quot;</span><span class="z-storage"> INTEGER</span><span class="z-keyword"> NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- The event ID.</span></span> <span class="giallo-l"><span class="z-string"> &quot;event_id&quot;</span><span> BLOB </span><span class="z-keyword">NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- Position (index) in the **chunk**.</span></span> <span class="giallo-l"><span class="z-string"> &quot;position&quot;</span><span class="z-storage"> INTEGER</span><span class="z-keyword"> NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- … other things …</span></span> <span 
class="giallo-l"><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">CREATE TABLE</span><span> &quot;</span><span class="z-entity z-name z-function">gap_chunks</span><span>&quot; (</span></span> <span class="giallo-l"><span class="z-comment"> -- Which linked chunk does this gap belong to?</span></span> <span class="giallo-l"><span class="z-string"> &quot;linked_chunk_id&quot;</span><span> BLOB </span><span class="z-keyword">NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- Which chunk does this gap refer to?</span></span> <span class="giallo-l"><span class="z-string"> &quot;chunk_id&quot;</span><span class="z-storage"> INTEGER</span><span class="z-keyword"> NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- … other things …</span></span> <span class="giallo-l"><span>);</span></span></code></pre> <p>Last contender, <code>events</code>. The assiduous reader may have noted that <code>event_chunks</code> doesn't contain the content of the events: only their IDs and positions, <i>roll its eyes</i>… let's digress a bit, shall we? Why is that? To handle out-of-band events.
In the Matrix protocol, we can receive events via:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://spec.matrix.org/v1.15/client-server-api/#get_matrixclientv3sync">the <code>/sync</code> endpoint</a>: it's the main source of input, we get most of the events via this API,</li> <li><a rel="noopener external" target="_blank" href="https://spec.matrix.org/v1.15/client-server-api/#get_matrixclientv3roomsroomidmessages">the <code>/messages</code> endpoint</a>, when we need to get events around a particular event; this is helpful if we need to paginate backwards or forwards around an event,</li> <li><a rel="noopener external" target="_blank" href="https://spec.matrix.org/v1.15/client-server-api/#get_matrixclientv3roomsroomidcontexteventid">the <code>/context</code> endpoint</a>, if we need to get more context about an event,</li> <li>but there is more, like <a rel="noopener external" target="_blank" href="https://spec.matrix.org/v1.15/client-server-api/#mroompinned_events">pinned events</a>, and so on.</li> </ul> <p>When an event is fetched but cannot be positioned relative to other events, it is considered <em>out-of-band</em>: it belongs to no linked chunk, but we keep it in the database. Maybe we can attach it to a linked chunk later, or we want to keep it to save future network requests. Anyway. You're a great digression companion.
Let's jump back to our tables.</p> <p>The <code>events</code> table contains <em>all</em> the events: in-band <em>and</em> out-of-band.</p> <pre class="giallo z-code"><code data-lang="sql"><span class="giallo-l"><span class="z-comment">-- Events and their content.</span></span> <span class="giallo-l"><span class="z-keyword">CREATE TABLE</span><span> &quot;</span><span class="z-entity z-name z-function">events</span><span>&quot; (</span></span> <span class="giallo-l"><span class="z-comment"> -- The ID of the event.</span></span> <span class="giallo-l"><span class="z-string"> &quot;event_id&quot;</span><span> BLOB </span><span class="z-keyword">NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- The JSON encoded content of the event (it&#39;s an encrypted value).</span></span> <span class="giallo-l"><span class="z-string"> &quot;content&quot;</span><span> BLOB </span><span class="z-keyword">NOT NULL</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> -- … other things …</span></span> <span class="giallo-l"><span>);</span></span></code></pre> <p>At some point, we need to fetch metadata about a <code>LinkedChunk</code>. A certain algorithm needs these metadata to work efficiently. We don't need to load all events, however we need:</p> <ul> <li>to know all the chunks that are part of a linked chunk,</li> <li>for each chunk, the number of events: 0 in case of a <code>ChunkContent::Gap</code> (<code>G</code>), or the number of events in case of a <code>ChunkContent::Items</code> (<code>E</code>).</li> </ul> <p>A first implementation has landed in the Matrix Rust SDK. All good. 
When suddenly…</p> <h2 id="incredibly-slow-sync"><q cite="https://github.com/element-hq/element-x-ios-rageshakes/issues/4248">Incredibly slow sync</q><a role="presentation" class="anchor" href="#incredibly-slow-sync" title="Anchor link to this header">#</a> </h2> <p>A power-user<sup class="footnote-reference" id="fr-power-user-1"><a href="#fn-power-user">1</a></sup> was <a rel="noopener external" target="_blank" href="https://github.com/element-hq/element-x-ios-rageshakes/issues/4248">experiencing slowness</a>. It's always a delicate situation. How do we find the cause of the slowness? Is it the device? The network? The asynchronous runtime? Lock contention? The file system? … The database?</p> <p>We don't have the device within easy reach. Fortunately, Matrix users are always nice and willing to help! We added a bunch of logs, then the user reproduced the problem and shared their logs (via a rageshake) with us. Logs are never trivial to analyse. However, here is a tip we use in the Matrix Rust SDK: we have a special tracing type that logs the time spent in a portion of the code, called <a rel="noopener external" target="_blank" href="https://docs.rs/matrix-sdk-common/0.14.0/matrix_sdk_common/tracing_timer/struct.TracingTimer.html"><code>TracingTimer</code></a>.</p> <p>Basically, when a <code>TracingTimer</code> is created, it keeps its creation time in memory. And when the <code>TracingTimer</code> is dropped, it emits a log containing the elapsed time since its creation. 
It looks like this (it uses <a rel="noopener external" target="_blank" href="https://docs.rs/tracing/">the <code>tracing</code> library</a>):</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> struct</span><span class="z-entity z-name"> TracingTimer</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> id</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> String</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> callsite</span><span class="z-keyword z-operator">: &amp;</span><span>&#39;</span><span class="z-entity z-name">static DefaultCallsite</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> start</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Instant</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> level</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> tracing</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Level</span><span>,</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">impl</span><span class="z-entity z-name"> Drop</span><span class="z-keyword"> for</span><span class="z-entity z-name"> TracingTimer</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> drop</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable z-language"> self</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> enabled</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> tracing</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name 
z-function">level_enabled!</span><span>(</span><span class="z-variable z-language">self</span><span class="z-keyword z-operator">.</span><span>level)</span><span class="z-keyword z-operator"> &amp;&amp;</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> interest</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>callsite</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">interest</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword z-operator"> !</span><span class="z-variable">interest</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">is_never</span><span>()</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> &amp;&amp;</span><span class="z-entity z-name"> tracing</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">__macro_support</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">__is_enabled</span><span>(</span><span class="z-variable z-language">self</span><span class="z-keyword z-operator">.</span><span>callsite</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">metadata</span><span>(),</span><span class="z-variable"> interest</span><span>)</span></span> <span class="giallo-l"><span> };</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> if</span><span class="z-keyword z-operator"> !</span><span class="z-variable">enabled</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span>;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> message</span><span 
class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> format!</span><span>(</span><span class="z-string">&quot;_{}_ finished in {:?}&quot;</span><span>,</span><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>id,</span><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>start</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">elapsed</span><span>());</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> metadata</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>callsite</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">metadata</span><span>();</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> fields</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> metadata</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">fields</span><span>();</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> message_field</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> fields</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">field</span><span>(</span><span class="z-string">&quot;message&quot;</span><span>)</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">unwrap</span><span>();</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> values</span><span class="z-keyword z-operator"> =</span><span> [(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">message_field</span><span>,</span><span 
class="z-entity z-name"> Some</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">message</span><span class="z-keyword"> as</span><span class="z-keyword z-operator"> &amp;</span><span class="z-keyword">dyn</span><span class="z-entity z-name"> tracing</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Value</span><span>))];</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // This function is hidden from docs, but we have to use it</span></span> <span class="giallo-l"><span class="z-comment"> // because there is no other way of obtaining a `ValueSet`.</span></span> <span class="giallo-l"><span class="z-comment"> // It&#39;s not entirely clear why it is private. See this issue:</span></span> <span class="giallo-l"><span class="z-comment"> // https://github.com/tokio-rs/tracing/issues/2363</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> values</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> fields</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">value_set</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">values</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> tracing</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Event</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">dispatch</span><span>(</span><span class="z-variable">metadata</span><span>,</span><span class="z-keyword z-operator"> &amp;</span><span class="z-variable">values</span><span>);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>And with that, let's use its companion macro <a rel="noopener external" target="_blank" 
href="https://docs.rs/matrix-sdk-common/0.14.0/matrix_sdk_common/macro.timer.html"><code>timer!</code></a> (I won't copy-paste it here, it's pretty straightforward):</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>{</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> _timer</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> timer!</span><span>(</span><span class="z-string">&quot;built something important&quot;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // … build something important …</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // `_timer` is dropped here, and will emit a log.</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>With this technique, we were able to inspect the logs and see immediately what was slow… assuming we had added <code>timer!</code>s at the right places! It's not magic: it doesn't find performance issues for you. You have to probe the correct places in your code, and refine if necessary.</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>I don't know if you have heard about <em>sampling profilers</em>, but those are programs far superior at analysing performance problems, compared to your… rustic <code>TracingTimer</code> (pun intended!). 
Such programs can provide flamegraphs, call trees, etc.</p> <p>I'm personally a regular user of <a rel="noopener external" target="_blank" href="https://github.com/mstange/samply">samply</a>, a command line CPU profiler relying on the <a rel="noopener external" target="_blank" href="https://github.com/firefox-devtools/profiler">Firefox profiler</a> for its UI. It works on macOS, Linux and Windows.</p> </div> </div> <p>I do also use <code>samply</code> pretty often! But you need access to the process to use such tools. Here, the Matrix Rust SDK is used and embedded inside Matrix clients. We have no access to it. It lives on devices everywhere around the world. We may use better log analysers to infer “call trees”, but supporting asynchronous logs (because the code is asynchronous) makes it very difficult. And I honestly don't know if such a thing exists.</p> <p>So. Yes. <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/pull/5407">We found the culprit</a>. With <a rel="noopener external" target="_blank" href="https://github.com/BurntSushi/ripgrep"><code>ripgrep</code></a>, we were able to scan megabytes of logs and find the culprit pretty quickly. I was looking for lags on the order of a second. 
I wasn't disappointed:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> rg </span><span class="z-string">&#39;_method_ finished in.*&gt; load_all_chunks_metadata&#39;</span><span> all.log </span><span class="z-keyword z-operator">|</span><span class="z-entity z-name"> rg</span><span class="z-string"> &#39;\d+(\.\d+)?s&#39;</span><span class="z-constant z-other"> --only-matching</span><span class="z-keyword z-operator"> |</span><span class="z-entity z-name"> sort</span><span class="z-constant z-other"> --numeric-sort --reverse</span></span> <span class="giallo-l"><span>107.121747125s</span></span> <span class="giallo-l"><span>79.909931458s</span></span> <span class="giallo-l"><span>10.348993583s</span></span> <span class="giallo-l"><span>8.827636417s</span></span> <span class="giallo-l"><span>8.614481625s</span></span> <span class="giallo-l"><span>8.009787875s</span></span> <span class="giallo-l"><span>5.99637875s</span></span> <span class="giallo-l"><span>4.118492334s</span></span> <span class="giallo-l"><span>3.910040333s</span></span> <span class="giallo-l"><span>3.718858334s</span></span> <span class="giallo-l"><span>3.689340667s</span></span> <span class="giallo-l"><span>3.661383208s</span></span></code></pre> <p>107 seconds. That is 1 minute and 47 seconds. 
Hello sweety.</p> <h2 id="the-slow-query">The slow query<a role="presentation" class="anchor" href="#the-slow-query" title="Anchor link to this header">#</a> </h2> <p><code>load_all_chunks_metadata</code> is a method that runs this SQL query:</p> <pre class="giallo z-code"><code data-lang="sql"><span class="giallo-l"><span class="z-keyword">SELECT</span></span> <span class="giallo-l"><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">id</span><span>,</span></span> <span class="giallo-l"><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">previous</span><span>,</span></span> <span class="giallo-l"><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">next</span><span>,</span></span> <span class="giallo-l"><span class="z-support z-function"> COUNT</span><span>(</span><span class="z-constant z-other">ec</span><span>.</span><span class="z-constant z-other">event_id</span><span>) </span><span class="z-keyword">as</span><span> number_of_events</span></span> <span class="giallo-l"><span class="z-keyword">FROM</span><span> linked_chunks </span><span class="z-keyword">as</span><span> lc</span></span> <span class="giallo-l"><span class="z-keyword">LEFT JOIN</span><span> event_chunks </span><span class="z-keyword">as</span><span> ec</span></span> <span class="giallo-l"><span class="z-keyword">ON</span><span class="z-constant z-other"> ec</span><span>.</span><span class="z-constant z-other">chunk_id</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">id</span></span> <span class="giallo-l"><span class="z-keyword">WHERE</span><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">linked_chunk_id</span><span class="z-keyword z-operator"> =</span><span> ?</span></span> <span class="giallo-l"><span class="z-keyword">GROUP BY</span><span 
class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">id</span></span></code></pre> <p>For each chunk of the linked chunk, it counts the number of events associated with this chunk. That's it.</p> <p>Remember that a chunk can be of two kinds: <code>ChunkContent::Items</code> if it contains a set of events, or <code>ChunkContent::Gap</code> if it contains a gap, hence no events.</p> <p>This query does the following:</p> <ol> <li>if the chunk is of kind <code>ChunkContent::Items</code>, it counts all events associated with it (via <code>ec.chunk_id = lc.id</code>),</li> <li>otherwise, the chunk is of kind <code>ChunkContent::Gap</code>, so it will try to count but… no event is associated with it: <code>ec.chunk_id = lc.id</code> can never be true for a gap. This query will scan <em>all events</em> for each gap… for no reason whatsoever! It is a linear scan. If there are 300 gaps in this linked chunk, and 5000 events, 300 × 5000 = 1.5 million events will be scanned for <strong>no reason</strong>!</li> </ol> <p>How lovingly inefficient.</p> <h2 id="12-6x-faster"><math><mn>12.6</mn><mo>×</mo></math> faster<a role="presentation" class="anchor" href="#12-6x-faster" title="Anchor link to this header">#</a> </h2> <p><q>Let's use an <a rel="noopener external" target="_blank" href="https://sqlite.org/lang_createindex.html"><code>INDEX</code></a></q> I hear you say (let's pretend you're saying that, please, for the sake of the narrative!).</p> <p>A database index provides rapid lookups after all. 
It has become a reflex amongst the developer community.</p> <div class="conversation" data-character="procureur"> <div class="conversation--character"> <span lang="fr">Le Procureur</span> <picture role="presentation"> <source srcset="/image/procureur.avif" type="image/avif" /> <source srcset="/image/procureur.webp" type="image/webp" /> <img src="/image/procureur.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Indexes are designed to quickly locate data without scanning the full table. An index contains a copy of the data, organised in a way enabling very efficient search. Behind the scenes, it uses various data structures, involving trade-offs between lookup performance and index size. Most of the time, an index makes it possible to transform a linear lookup, <math> <mi>O</mi><mo>(</mo> <mi>n</mi> <mo>)</mo> </math>, into a logarithmic lookup, <math> <mi>O</mi><mo>(</mo> <mo lspace="0" rspace="0">log</mo><mo>(</mo> <mi>n</mi> <mo>)</mo> <mo>)</mo> </math>. See <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Database_index">Database index</a> to learn more.</p> </div> </div> <p>That's correct. But we didn't want to use an index here. The reason is twofold:</p> <ol> <li><strong>More space</strong>. Remember that <em>Le Procureur</em> said an index contains a <em>copy</em> of the data. Here, the data is <a rel="noopener external" target="_blank" href="https://spec.matrix.org/v1.15/appendices/#event-ids">the event ID</a>. It's not heavy, but it's not nothing. Moreover, we are not even counting the <em>key</em> that associates the <em>copied data</em> with the row containing the real data in the source table.</li> <li><strong>Still useless extra time</strong>. We would still need to traverse the index for gaps, which is pointless. 
<a rel="noopener external" target="_blank" href="https://sqlite.org/arch.html">SQLite implements indexes as B-Trees</a>, which is really efficient, but still, we already know that a gap has zero events because… it's… a gap between events!</li> </ol> <p>Do you remember that the <code>linked_chunks</code> table has a <code>type</code> column? It contains <code>E</code> when the chunk is of kind <code>ChunkContent::Items</code> (it represents a set of events), and <code>G</code> when of kind <code>ChunkContent::Gap</code> (it represents a gap). Maybe… <i> stares into the void</i></p> <div class="conversation" data-character="factotum"> <div class="conversation--character"> <span lang="fr">Le Factotum</span> <picture role="presentation"> <source srcset="/image/factotum.avif" type="image/avif" /> <source srcset="/image/factotum.webp" type="image/webp" /> <img src="/image/factotum.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>May I interrupt?</p> <p>Do you know that SQLite provides <a rel="noopener external" target="_blank" href="https://sqlite.org/lang_expr.html#the_case_expression">a <code>CASE</code> expression</a>? I know it's unusual. SQL designers prefer to think in terms of sets, sub-sets, joins, temporal tables, partial indexes… but honestly, as far as I'm concerned, in our case, it's simple enough and it can be powerful. It's a maddeningly pragmatic <code>match</code> statement.</p> <p>Moreover, the <code>type</code> column is already typed as an enum with the <code>CHECK("type" IN ('E', 'G'))</code> constraint. Maybe the SQL engine can run some even smarter optimisations for us.</p> </div> </div> <p>Oh, that would be brilliant! If <code>type</code> is <code>E</code>, we count the number of events, otherwise we conclude it's <em>de facto</em> zero, right? Let's try. 
The SQL query then becomes:</p> <pre class="giallo z-code"><code data-lang="sql"><span class="giallo-l"><span class="z-keyword">SELECT</span></span> <span class="giallo-l"><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">id</span><span>,</span></span> <span class="giallo-l"><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">previous</span><span>,</span></span> <span class="giallo-l"><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">next</span><span>,</span></span> <span class="giallo-l"><span class="z-keyword"> CASE</span><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">type</span></span> <span class="giallo-l"><span class="z-keyword"> WHEN</span><span class="z-string"> &#39;E&#39;</span><span class="z-keyword"> THEN</span><span> (</span></span> <span class="giallo-l"><span class="z-keyword"> SELECT</span><span class="z-support z-function"> COUNT</span><span>(</span><span class="z-constant z-other">ec</span><span>.</span><span class="z-constant z-other">event_id</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword"> FROM</span><span> event_chunks </span><span class="z-keyword">as</span><span> ec</span></span> <span class="giallo-l"><span class="z-keyword"> WHERE</span><span class="z-constant z-other"> ec</span><span>.</span><span class="z-constant z-other">chunk_id</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">id</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span class="z-keyword"> ELSE</span></span> <span class="giallo-l"><span class="z-constant z-numeric"> 0</span></span> <span class="giallo-l"><span class="z-keyword"> END</span></span> <span class="giallo-l"><span class="z-keyword"> as</span><span> number_of_events</span></span> <span 
class="giallo-l"><span class="z-keyword">FROM</span><span> linked_chunks </span><span class="z-keyword">as</span><span> lc</span></span> <span class="giallo-l"><span class="z-keyword">WHERE</span><span class="z-constant z-other"> lc</span><span>.</span><span class="z-constant z-other">linked_chunk_id</span><span class="z-keyword z-operator"> =</span><span> ?</span></span></code></pre> <p>Having spotted the problem, we wrote a benchmark to measure the solutions. The benchmark simulates 10'000 events, with 1 gap every 80 events: a data set we consider reasonably <em>realistic</em> for a normal user (not for a power-user though, because a power-user usually has more gaps than events). Here are the before/after results.</p> <figure> <table> <thead> <tr> <th></th> <th title="0.95 confidence level">Lower bound</th> <th>Estimate</th> <th title="0.95 confidence level">Upper bound</th> </tr> </thead> <tbody> <tr> <td>Throughput</td> <td>19.832 Kelem/s</td> <td>19.917 Kelem/s</td> <td>19.999 Kelem/s</td> </tr> <tr> <td><math><msup><mi>R</mi><mn>2</mn></msup></math></td> <td>0.0880234</td> <td>0.1157540</td> <td>0.0857823</td> </tr> <tr> <td>Mean</td> <td>500.03 ms</td> <td>502.08 ms</td> <td>504.24 ms</td> </tr> <tr> <td title="Standard Deviation">Std. 
Dev.</td> <td>2.2740 ms</td> <td>3.6256 ms</td> <td>4.1963 ms</td> </tr> <tr> <td>Median</td> <td>498.23 ms</td> <td>500.93 ms</td> <td>506.25 ms</td> </tr> <tr> <td title="Median Absolute Deviation">MAD</td> <td>129.84 µs</td> <td>4.1713 ms</td> <td>6.1184 ms</td> </tr> </tbody> </table> <figcaption> <p>Benchmark's results for the original query with <code>COUNT</code> and <code>LEFT JOIN</code>.</p> </figcaption> </figure> <details> <summary> <p>The Probability Distribution Function graph, and the Iteration times graph for the <code>LEFT JOIN</code> approach</p> </summary> <figure> <p><a href="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./1-pdf.svg"><img src="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./1-pdf.svg" alt="Probability distribution function" loading="lazy" decoding="async" /></a></p> <figcaption> <p>Benchmark&#39;s Probability Distribution Function for the <code>LEFT JOIN</code> approach.</p> </figcaption> </figure> <figure> <p><a href="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./1-iteration-times.svg"><img src="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./1-iteration-times.svg" alt="Iteration times" loading="lazy" decoding="async" /></a></p> <figcaption> <p>Benchmark&#39;s Iteration Times for the <code>LEFT JOIN</code> approach.</p> </figcaption> </figure> </details> <figure> <table> <thead> <tr> <th></th> <th title="0.95 confidence level">Lower bound</th> <th>Estimate</th> <th title="0.95 confidence level">Upper bound</th> </tr> </thead> <tbody> <tr> <td>Throughput</td> <td>251.61 Kelem/s</td> <td>251.84 Kelem/s</td> <td>251.98 Kelem/s</td> </tr> <tr> <td><math><msup><mi>R</mi><mn>2</mn></msup></math></td> <td>0.9999778</td> <td>0.9999833</td> <td>0.9999673</td> </tr> <tr> <td>Mean</td> <td>39.684 ms</td> <td>39.703 ms</td> <td>39.726 ms</td> </tr> <tr> 
<td title="Standard Deviation">Std. Dev.</td> <td>8.8237 µs</td> <td>35.948 µs</td> <td>47.987 µs</td> </tr> <tr> <td>Median</td> <td>39.683 ms</td> <td>39.691 ms</td> <td>39.725 ms</td> </tr> <tr> <td title="Median Absolute Deviation">MAD</td> <td>1.9369 µs</td> <td>13.000 µs</td> <td>50.566 µs</td> </tr> </tbody> </table> <figcaption> <p>Benchmark&#39;s results for the new query with the <code>CASE</code> expression.</p> </figcaption> </figure> <details> <summary> <p>The Probability Distribution Function graph, and the Linear Regression graph for the <code>CASE</code> approach</p> </summary> <figure> <p><a href="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./2-pdf.svg"><img src="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./2-pdf.svg" alt="Probability distribution function" loading="lazy" decoding="async" /></a></p> <figcaption> <p>Benchmark&#39;s Probability Distribution Function for the <code>CASE</code> approach.</p> </figcaption> </figure> <figure> <p><a href="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./2-linear-regression.svg"><img src="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./2-linear-regression.svg" alt="Linear regression" loading="lazy" decoding="async" /></a></p> <figcaption> <p>Benchmark&#39;s Linear Regression for the <code>CASE</code> approach.</p> </figcaption> </figure> </details> <p>The throughput and the time are <math><mn>12.6</mn><mo>×</mo></math> better. No <code>INDEX</code>. No more <code>LEFT JOIN</code>. Just a simple <code>CASE</code> expression. 
<a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/pull/5411">You can see the patches containing the benchmark and the fix</a>.</p> <p>But that&#39;s not all…</p> <h2 id=""><math><mn>211</mn><mo>×</mo></math> faster<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <p>It&#39;s clearly better, but we couldn&#39;t stop ourselves. Spotting the problem and finding this solution made us creative! We noticed that we are running one subquery per chunk of kind <code>ChunkContent::Items</code>. If the linked chunk contains 100 chunks of this kind, it will run 101 queries.</p> <p>Then suddenly, <i>hits forehead with palm</i>, an idea pops up! What if we could use only 2 queries for all scenarios?</p> <ol> <li>The first query would count all events for each chunk in <code>event_chunks</code> in one pass, and would store that in a <code>HashMap</code>,</li> <li>The second query would fetch all chunks also in one pass,</li> <li>Finally, Rust will fill the number of events for each chunk based on the data in the <code>HashMap</code>.</li> </ol> <p>The first query translates like so in Rust:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// The first query.</span></span> <span class="giallo-l"><span class="z-storage">let</span><span class="z-variable"> number_of_events_by_chunk_ids</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> transaction</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">prepare</span><span>(</span></span> <span class="giallo-l"><span class="z-string"> r#&quot;</span></span> <span class="giallo-l"><span class="z-string"> SELECT</span></span> <span class="giallo-l"><span class="z-string"> ec.chunk_id,</span></span> <span class="giallo-l"><span class="z-string"> COUNT(ec.event_id)</span></span> <span 
class="giallo-l"><span class="z-string"> FROM event_chunks as ec</span></span> <span class="giallo-l"><span class="z-string"> WHERE ec.linked_chunk_id = ?</span></span> <span class="giallo-l"><span class="z-string"> GROUP BY ec.chunk_id</span></span> <span class="giallo-l"><span class="z-string"> &quot;#</span><span>,</span></span> <span class="giallo-l"><span> )</span><span class="z-keyword z-operator">?</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">query_map</span><span>((</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">hashed_linked_chunk_id</span><span>,),</span><span class="z-keyword z-operator"> |</span><span class="z-variable">row</span><span class="z-keyword z-operator">|</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Ok</span><span>((</span></span> <span class="giallo-l"><span class="z-variable"> row</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-variable">_</span><span>,</span><span class="z-entity z-name"> u64</span><span>&gt;(</span><span class="z-constant z-numeric">0</span><span>)</span><span class="z-keyword z-operator">?</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> row</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-variable">_</span><span>,</span><span class="z-entity z-name"> usize</span><span>&gt;(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-keyword z-operator">?</span></span> <span class="giallo-l"><span> ))</span></span> <span class="giallo-l"><span> })</span><span class="z-keyword z-operator">?</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span 
class="z-entity z-name z-function">collect</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-entity z-name">Result</span><span>&lt;</span><span class="z-entity z-name">HashMap</span><span>&lt;</span><span class="z-variable">_</span><span>,</span><span class="z-variable"> _</span><span>&gt;,</span><span class="z-variable"> _</span><span>&gt;&gt;()</span><span class="z-keyword z-operator">?</span><span>;</span></span></code></pre> <p>And the second query translates like so<sup class="footnote-reference" id="fr-simplified-code-1"><a href="#fn-simplified-code">2</a></sup>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-variable">transaction</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">prepare</span><span>(</span></span> <span class="giallo-l"><span class="z-string"> r#&quot;</span></span> <span class="giallo-l"><span class="z-string"> SELECT</span></span> <span class="giallo-l"><span class="z-string"> lc.id,</span></span> <span class="giallo-l"><span class="z-string"> lc.previous,</span></span> <span class="giallo-l"><span class="z-string"> lc.next,</span></span> <span class="giallo-l"><span class="z-string"> lc.type</span></span> <span class="giallo-l"><span class="z-string"> FROM linked_chunks as lc</span></span> <span class="giallo-l"><span class="z-string"> WHERE lc.linked_chunk_id = ?</span></span> <span class="giallo-l"><span class="z-string"> &quot;#</span><span>,</span></span> <span class="giallo-l"><span> )</span><span class="z-keyword z-operator">?</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">query_map</span><span>((</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">hashed_linked_chunk_id</span><span>,),</span><span class="z-keyword z-operator"> |</span><span class="z-variable">row</span><span 
class="z-keyword z-operator">|</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Ok</span><span>((</span></span> <span class="giallo-l"><span class="z-variable"> row</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-variable">_</span><span>,</span><span class="z-entity z-name"> u64</span><span>&gt;(</span><span class="z-constant z-numeric">0</span><span>)</span><span class="z-keyword z-operator">?</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> row</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-variable">_</span><span>,</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity z-name">u64</span><span>&gt;&gt;(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-keyword z-operator">?</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> row</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-variable">_</span><span>,</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity z-name">u64</span><span>&gt;&gt;(</span><span class="z-constant z-numeric">2</span><span>)</span><span class="z-keyword z-operator">?</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> row</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-variable">_</span><span>,</span><span class="z-entity z-name"> String</span><span>&gt;(</span><span class="z-constant z-numeric">3</span><span>)</span><span 
class="z-keyword z-operator">?</span><span>,</span></span> <span class="giallo-l"><span> ))</span></span> <span class="giallo-l"><span> })</span><span class="z-keyword z-operator">?</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">map</span><span>(</span><span class="z-keyword z-operator">|</span><span class="z-variable">metadata</span><span class="z-keyword z-operator">| -&gt;</span><span class="z-entity z-name"> Result</span><span>&lt;</span><span class="z-variable">_</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span> (</span><span class="z-variable">identifier</span><span>,</span><span class="z-variable"> previous</span><span>,</span><span class="z-variable"> next</span><span>,</span><span class="z-variable"> chunk_type</span><span>)</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> metadata</span><span class="z-keyword z-operator">?</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Let&#39;s use the `HashMap` from the first query here!</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> number_of_events</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> number_of_events_by_chunk_ids</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">identifier</span><span>)</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">copied</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">unwrap_or</span><span>(</span><span class="z-constant z-numeric">0</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> Ok</span><span>(</span><span 
class="z-entity z-name">ChunkMetadata</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> identifier</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> previous</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> next</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> number_of_events</span><span>,</span></span> <span class="giallo-l"><span> })</span></span> <span class="giallo-l"><span> })</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">collect</span><span class="z-keyword z-operator">::</span><span>&lt;</span><span class="z-entity z-name">Result</span><span>&lt;</span><span class="z-entity z-name">Vec</span><span>&lt;</span><span class="z-variable">_</span><span>&gt;,</span><span class="z-variable"> _</span><span>&gt;&gt;()</span></span></code></pre> <p>Only two queries. All tests are passing. Now let&#39;s see what the benchmark has to say!</p> <figure> <table> <thead> <tr> <th></th> <th title="0.95 confidence level">Lower bound</th> <th>Estimate</th> <th title="0.95 confidence level">Upper bound</th> </tr> </thead> <tbody> <tr> <td>Throughput</td> <td>4.1490 Melem/s</td> <td>4.1860 Melem/s</td> <td>4.2221 Melem/s</td> </tr> <tr> <td><math><msup><mi>R</mi><mn>2</mn></msup></math></td> <td>0.9961591</td> <td>0.9976310</td> <td>0.9960356</td> </tr> <tr> <td>Mean</td> <td>2.3670 ms</td> <td>2.3824 ms</td> <td>2.3984 ms</td> </tr> <tr> <td title="Standard Deviation">Std. 
Dev.</td> <td>16.065 µs</td> <td>26.872 µs</td> <td>31.871 µs</td> </tr> <tr> <td>Median</td> <td>2.3556 ms</td> <td>2.3801 ms</td> <td>2.4047 ms</td> </tr> <tr> <td title="Median Absolute Deviation">MAD</td> <td>3.8003 µs</td> <td>36.438 µs</td> <td>46.445 µs</td> </tr> </tbody> </table> <figcaption> <p>Benchmark&#39;s results for the two queries approach.</p> </figcaption> </figure> <details> <summary> <p>The Probability Distribution Function graph, and the Linear Regression graph for the two queries approach</p> </summary> <figure> <p><a href="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./3-pdf.svg"><img src="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./3-pdf.svg" alt="Probability distribution function" loading="lazy" decoding="async" /></a></p> <figcaption> <p>Benchmark&#39;s Probability Distribution Function for the two queries approach.</p> </figcaption> </figure> <figure> <p><a href="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./3-linear-regression.svg"><img src="https://mnt.io/articles/from-19k-to-4-2m-events-per-sec-story-of-a-sqlite-query-optimisation/./3-linear-regression.svg" alt="Linear regression" loading="lazy" decoding="async" /></a></p> <figcaption> <p>Benchmark&#39;s Linear Regression for the two queries approach.</p> </figcaption> </figure> </details> <p><strong>It is <math><mn>16.7</mn><mo>×</mo></math> faster compared to the previous solution, so <math><mn>211</mn><mo>×</mo></math> faster than the first query! We went from 502ms to 2ms. That&#39;s mental! From a throughput of 19.9 Kelem/s to 4.2 Melem/s!</strong> <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk/pull/5425">You can see the patches containing the improvement</a>.</p> <p>The throughput is measured by <em>element</em>, where an <em>element</em> here represents a Matrix event. 
Consequently, 4 Melem/s means that <code>load_all_chunks_metadata</code> can do its computation at a rate of 4 million events per second.</p> <p>I think we can stop here. Performance is finally acceptable.</p> <h2 id="-1">Lessons<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h2> <ul> <li><a rel="noopener external" target="_blank" href="https://bheisler.github.io/criterion.rs/book/index.html">Write benchmarks (with Criterion)</a>.</li> <li>Run benchmarks.</li> <li>Be aware of <a rel="noopener external" target="_blank" href="https://sqlite.org/queryplanner-ng.html">the SQL query planner</a>.</li> <li>Be careful with joins.</li> <li>Know your data.</li> <li>Take a step back and count.</li> <li>SQLite is fast.</li> </ul> <p>Notice how the SQL table layout didn&#39;t change. Notice how the <code>LinkedChunk</code> implementation didn&#39;t change. Only the SQL queries have changed, and that has dramatically improved the situation.</p> <p>This is a joint effort between <a rel="noopener external" target="_blank" href="https://bouvier.cc/">Benjamin Bouvier</a>, <a rel="noopener external" target="_blank" href="https://github.com/poljar">Damir Jelić</a>, and me.</p> <section class="footnotes"> <ol class="footnotes-list"> <li id="fn-power-user"> <p>We consider a <em>power-user</em> to be a user with more than 2000 rooms. I hear you laugh! But guess what? We have users with more than 4000 rooms. And I&#39;m excluding bots here. The Matrix Rust SDK can be used to develop bots, which can easily sit in thousands and thousands of rooms. That said: we have to be performant. <a href="#fr-power-user-1">↩</a></p> </li> <li id="fn-simplified-code"> <p>The code has been simplified a little bit. In reality, basic Rust types, like <code>u64</code> or <code>Option&lt;u64&gt;</code>, are mapped to the linked chunk&#39;s types. 
<a href="#fr-simplified-code-1">↩</a></p> </li> </ol> </section> Sliding Sync at the Matrix Conference 2024-10-30T00:00:00+00:00 2024-10-30T00:00:00+00:00 Unknown https://mnt.io/articles/sliding-sync-at-the-matrix-conference/ <p>Berlin. <time datetime="2024-09-21 10:00">Saturday, September 21, 2024. 10am</time>. I was live on stage, and broadcast on the Internet, to talk about (Simplified) Sliding Sync, the next sync mechanism for Matrix, at the first <a rel="noopener external" target="_blank" href="https://2024.matrix.org/">Matrix Conference</a>.</p> <p><a rel="noopener external" target="_blank" href="https://matrix.org/">Matrix</a> is an open network for secure, decentralised communication. It is an important technology for the Internet.</p> <p>Matrix is a protocol. Everyone can implement it: either by providing their own server and connecting it to the federation, or by providing their own client and connecting it to the federation too. Nobody has full control over the network, and nobody controls the clients or the servers. And yet, end-to-end encryption is working, synchronisation is working, everybody can talk to everybody, communities organize themselves, and the network grows and grows.</p> <p>I have been working at <a rel="noopener external" target="_blank" href="https://element.io/">Element</a> for 2 years now. I am paid to work on the <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk">Matrix Rust SDK</a>, a project owned by the Matrix organisation. Everything we do is available to the entire Matrix community, not only to Element. Well, this is the open source world.</p> <p>Matrix&#39;s previous synchronisation mechanism is slow and inefficient. To put Matrix in the hands of everyone for pleasant daily use, we have started to experiment with a new sync mechanism, called Sliding Sync. 
<a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-spec-proposals/blob/kegan/sync-v3/proposals/3575-sync.md">MSC3575</a> (MSC stands for Matrix Spec Change, similar to an RFC) was our experimental foundation to play with a new sync mechanism. After much sweat and many tears, we ultimately found a working pattern and design that fulfils a large majority of our use cases. Along the way, the implementation inside the Sliding Sync Proxy (a proxy that sits on top of a homeserver<sup class="footnote-reference" id="fr-1-1"><a href="#fn-1">1</a></sup> to provide this new sync mechanism) was starting to feel really buggy and was really slow. It was time to clean up everything, including the MSC.</p> <p>Enter <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-spec-proposals/blob/erikj/sss/proposals/4186-simplified-sliding-sync.md">MSC4186</a>, which is basically Simplified Sliding Sync. We have mostly removed features from <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-spec-proposals/blob/kegan/sync-v3/proposals/3575-sync.md">MSC3575</a>, so that the implementation on the server side is much simpler and lighter. Simplified Sliding Sync is now implemented and enabled by default on <a rel="noopener external" target="_blank" href="https://github.com/element-hq/synapse/">Synapse</a>, one of the major homeserver implementations. Other homeservers have implemented MSC3575 and are working on supporting MSC4186.</p> <p>Sliding Sync has a huge impact on the overall user experience. Syncing is now fast and almost transparent. 
It also works linearly whether the user has 10 or 10'000 rooms.</p> <p>My talk can be viewed here:</p> <figure> <iframe class="youtube-player" src="https://www.youtube-nocookie.com/embed/kI2lSCVEunw" title="Simplified Sliding Sync, by Ivan Enderlin, at the Matrix Conference 2024, Berlin" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen loading="lazy"></iframe> <figcaption> <p>Video: Simplified Sliding Sync, by Ivan Enderlin, at the Matrix Conference 2024, Berlin</p> <p><a href="./slides.pdf">Download the slides as PDF (21MiB)</a></p> </figcaption> </figure> <h2 id="other-talks">Other talks<a role="presentation" class="anchor" href="#other-talks" title="Anchor link to this header">#</a> </h2> <p><a rel="noopener external" target="_blank" href="https://2024.matrix.org/watch/">All the talks are available online</a>, including talks from the public sector, like NATO, or the Swedish, French, or German administrations… I encourage you to check the list! Nonetheless, I take this opportunity to highlight some announcement talks and technical (Matrix internals) talks I've enjoyed.</p> <h3 id="matrix-2-0-and-the-launch-of-element-x">Matrix 2.0 and the launch of Element X!<a role="presentation" class="anchor" href="#matrix-2-0-and-the-launch-of-element-x" title="Anchor link to this header">#</a> </h3> <p>Two presentations for the price of one: <cite>Matrix 2.0 Is Here!</cite> by Matthew Hodgson. 
10 years after the original launch of Matrix, and 5 years after Matrix 1.0, what a great anniversary to announce Matrix 2.0!</p> <figure> <iframe class="youtube-player" src="https://www.youtube-nocookie.com/embed/ZiRYdqkzjDU" title="Matrix 2.0 Is Here!, by Matthew Hodgson, at the Matrix Conference 2024, Berlin" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen loading="lazy"></iframe> <figcaption> <p>Video: Matrix 2.0 Is Here!, by Matthew Hodgson, at the Matrix Conference 2024, Berlin</p> <p><a rel="noopener external" target="_blank" href="https://2024.matrix.org/documents/talk_slides/LAB3%202024-09-20%2010_15%20Matthew%20-%20Matrix%202.0%20is%20Here_%20The%20Matrix%20Conference%20Keynote.pdf">View and download the slides</a></p> </figcaption> </figure> <p>The second video is <cite>Element X Launch!</cite> by Amandine Le Pape, Ștefan Ceriu, and Amsha Kalra. They present Element X: how it has been designed and developed, how it uses the Matrix Rust SDK, along with awesome demos of Element X with Element Call and so on! 
It was a great moment for everyone working at Element, and for users!</p> <figure> <iframe class="youtube-player" src="https://www.youtube-nocookie.com/embed/gHyHO3xPfQU" title="Element X Launch!, by Amandine Le Pape, Ștefan Ceriu, and Amsha Kalra, at the Matrix Conference 2024, Berlin" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen loading="lazy"></iframe> <figcaption> <p>Video: Element X Launch!, by Amandine Le Pape, Ștefan Ceriu, and Amsha Kalra, at the Matrix Conference 2024, Berlin</p> <p><a rel="noopener external" target="_blank" href="https://2024.matrix.org/documents/talk_slides/LAB3%202024-09-20%2017_45%20Amandine%20Le%20Pape,%20Amsha%20Kalra,%20Stefan%20Ceriu%20-%20Element%20X%20Launch%20Complete%20Presentation.pdf">View and download the slides</a></p> </figcaption> </figure> <h3 id="unable-to-decrypt-this-mesage">Unable to decrypt this message<a role="presentation" class="anchor" href="#unable-to-decrypt-this-mesage" title="Anchor link to this header">#</a> </h3> <p><cite>Unable to decrypt this message</cite> by Kegan Dougal. This talk explains why one can see an <em>Unable To Decrypt</em> error while trying to view a message in Matrix. Most problems have been solved today, but the main takeaway from this presentation is how hard it is (was!) to provide reliable end-to-end encryption over a federated network. One homeserver can be overloaded and then slowed down, or a connection between two servers can be broken, or one device can lose its connectivity because it's used in the subway, or whatever. All these classes of problems are illustrated and explained. 
I liked it a lot because it gives a good sense of why end-to-end encryption is hard over a giant decentralised, federated network, with encryption keys being renewed frequently, and how problems have been solved.</p> <figure> <iframe class="youtube-player" src="https://www.youtube-nocookie.com/embed/FHzh2Y7BABQ" title="Unable to decrypt this message, by Kegan Dougal, at the Matrix Conference 2024, Berlin" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen loading="lazy"></iframe> <figcaption> <p>Video: Unable to decrypt this message, by Kegan Dougal, at the Matrix Conference 2024, Berlin</p> <p><a rel="noopener external" target="_blank" href="https://2024.matrix.org/documents/talk_slides/LAB4%202024-09-21%2014_30%20Kegan%20Dougal%20-%20Unable%20to%20decrypt%20this%20message.pdf">View and download the slides</a></p> </figcaption> </figure> <h3 id="news-from-the-matrix-rust-sdk">News from the Matrix Rust SDK<a role="presentation" class="anchor" href="#news-from-the-matrix-rust-sdk" title="Anchor link to this header">#</a> </h3> <p><cite>Strengthening the Base: Laying the Groundwork for a more robust Rust SDK</cite> by Benjamin Bouvier, a good friend! 
This talk explains the recent updates of the Matrix Rust SDK: how we have designed new APIs to make the developer experience easier and more robust.</p> <figure> <iframe class="youtube-player" src="https://www.youtube-nocookie.com/embed/KOaoZKc1tgo" title="Strengthening the Base: Laying the Groundwork for a more robust Rust SDK, by Benjamin Bouvier, at the Matrix Conference 2024, Berlin" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen loading="lazy"></iframe> <figcaption> <p>Video: Strengthening the Base: Laying the Groundwork for a more robust Rust SDK, by Benjamin Bouvier, at the Matrix Conference 2024, Berlin</p> <p><a rel="noopener external" target="_blank" href="https://2024.matrix.org/documents/talk_slides/LAB3%202024-09-20%2011_15%20Benjamin%20Bouvier%20-%20Rust%20SDK%20Foundation.pdf">View and download the slides</a></p> </figcaption> </figure> <h2 id="about-transport">About transport<a role="presentation" class="anchor" href="#about-transport" title="Anchor link to this header">#</a> </h2> <p>I currently live in Switzerland. The conference was in Germany. Europe has a fantastic rail network, and more importantly, a unique <strong>night train</strong> network!</p> <p>Going there by plane would have generated 1'344 <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Global_warming_potential">kg CO<sub>2</sub>eq</a>, i.e. 68% of my annual carbon budget (for a sustainable world, we should all be at 2'000 kg maximum). Taking the train, however, generated <strong>6.5 kg CO<sub>2</sub>eq, i.e. 0.33% of my annual carbon budget</strong>. 
It's 206 times less than the plane!</p> <p>If you are curious <a rel="noopener external" target="_blank" href="https://back-on-track.eu/night-train-map/">to try a night train, you can check this map</a> that lists all possible connections, stops, companies operating the trains etc. Taking the night train is a nice way of travelling, and it saves a lot of emissions.</p> <p>I took a regular day train to go to Berlin, and a night train to come back home.</p> <section class="footnotes"> <ol class="footnotes-list"> <li id="fn-1"> <p>A <em>homeserver</em> in the Matrix terminology is simply a Matrix server. <a href="#fr-1-1">↩</a></p> </li> </ol> </section> Building a new site! 2024-10-08T00:00:00+00:00 2024-10-08T00:00:00+00:00 Unknown https://mnt.io/articles/building-a-new-site/ <p>The time has come. I needed to rewrite my site from scratch. It was first implemented with <a rel="noopener external" target="_blank" href="https://github.com/jekxyl/jekxyl">Jekxyl</a>, a static site generator written with <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Xyl">the XYL language</a>, a language I developed inside <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Central">Hoa</a>. I migrated my blog to <a rel="noopener external" target="_blank" href="https://wordpress.com">WordPress.com</a> when <a href="https://mnt.io/articles/bye-bye-liip-hello-automattic/">I was working there</a>. The <a rel="noopener external" target="_blank" href="https://github.com/WordPress/gutenberg">Gutenberg editor</a> is really great, but there is no great support for <code>&lt;code&gt;</code>. Plus, the theme I was using was pretty heavy. The homepage was 1.15MiB! A simple article was 1.9MiB. Clearly not very efficient. 
I wanted something more customisable, something light, something I can hack, and more importantly, I wanted to start series.</p> <h2 id="enter-zola">Enter Zola<a role="presentation" class="anchor" href="#enter-zola" title="Anchor link to this header">#</a> </h2> <p><a rel="noopener external" target="_blank" href="https://www.getzola.org/">Zola</a> is a static site generator written in <a rel="noopener external" target="_blank" href="https://www.rust-lang.org/">Rust</a>. It uses <a rel="noopener external" target="_blank" href="https://commonmark.org/">CommonMark</a> for the markup, which is nice and straightforward to use. The template system is powerful and simple. Zola can build 34 pages in 392ms at the time of writing; I consider this fast.</p> <p>Nothing particular to say. It's a boring tool, which is a great compliment. It just works! In a couple of hours, I was able to get everything up and running.</p> <h2 id="site-s-features">Site's features<a role="presentation" class="anchor" href="#site-s-features" title="Anchor link to this header">#</a> </h2> <p>The site contains articles and series. A series is composed of several episodes. That's it. 
The URL patterns are the following:</p> <ul> <li><code>/articles/&lt;article-id&gt;/</code> to view an article,</li> <li><code>/series/&lt;series-id&gt;/</code> to view all episodes of a series,</li> <li><code>/series/&lt;series-id&gt;/&lt;episode-id&gt;/</code> to view a particular episode of a series.</li> </ul> <h3 id="homepage">Homepage<a role="presentation" class="anchor" href="#homepage" title="Anchor link to this header">#</a> </h3> <p>The homepage provides:</p> <ul> <li>the latest series, and</li> <li>pinned articles.</li> </ul> <p>To <em>pin</em> an article, I add the following TOML declarations in the frontmatter of an article:</p> <pre class="giallo z-code"><code data-lang="toml"><span class="giallo-l"><span>[</span><span class="z-entity z-name">extra</span><span>]</span></span> <span class="giallo-l"><span class="z-variable">pinned</span><span class="z-punctuation z-separator"> =</span><span class="z-constant z-language"> true</span></span></code></pre> <p>This <code>pinned</code> declaration is not recognised by Zola: the <code>[extra]</code> section contains user-defined values. Then, it's a matter of filtering by this value in the template:</p> <pre class="giallo z-code"><code data-lang="html"><span class="giallo-l"><span>{% for page in section.pages | filter(attribute = &quot;extra.pinned&quot;, value = true) -%}</span></span></code></pre> <p>That's a nice feature to promote some articles.</p> <p>In comparison to WordPress.com, the new homepage is 36.8KiB, that's 31 times less!</p> <h3 id="articles">Articles<a role="presentation" class="anchor" href="#articles" title="Anchor link to this header">#</a> </h3> <p>An article has some metadata like:</p> <ul> <li>the publishing time,</li> <li>the reading time,</li> <li>keywords,</li> <li>edition.</li> </ul> <p>If you read this article in October 2024, you might see all that in this very article. 
The beauty of this hides in the source code though:</p> <pre class="giallo z-code"><code data-lang="html"><span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin">&lt;</span><span class="z-entity z-name z-tag">main</span><span class="z-entity z-other z-attribute-name"> vocab</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;https://schema.org&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">article</span><span class="z-entity z-other z-attribute-name"> class</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;article&quot;</span><span class="z-entity z-other z-attribute-name"> typeof</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;Article&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">header</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">h1</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;name&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>Building a new site!</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">h1</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">div</span><span class="z-entity z-other 
z-attribute-name"> class</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;metadata&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">time</span><span class="z-entity z-other z-attribute-name"> title</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;Published date&quot;</span><span class="z-entity z-other z-attribute-name"> datetime</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;2024-10-08&quot;</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;datePublished&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>October 08, 2024</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">time</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">span</span><span class="z-entity z-other z-attribute-name"> title</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;Reading time&quot;</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;timeRequired&quot;</span><span class="z-entity z-other z-attribute-name"> content</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;PT2M&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>2 minutes read</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">span</span><span class="z-punctuation z-definition z-tag 
z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">span</span><span class="z-entity z-other z-attribute-name"> title</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;Keywords&quot;</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;keywords&quot;</span><span class="z-entity z-other z-attribute-name"> content</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;rust, site&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span> Keywords:</span><span class="z-constant z-character">&amp;nbsp;</span><span class="z-punctuation z-definition z-tag z-begin">&lt;</span><span class="z-entity z-name z-tag">a</span><span class="z-entity z-other z-attribute-name"> href</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;/keywords/rust&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>rust</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">a</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>, </span><span class="z-punctuation z-definition z-tag z-begin">&lt;</span><span class="z-entity z-name z-tag">a</span><span class="z-entity z-other z-attribute-name"> href</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;/keywords/site&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>site</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">a</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>, </span><span class="z-punctuation z-definition z-tag z-begin">&lt;</span><span class="z-entity 
z-name z-tag">a</span><span class="z-entity z-other z-attribute-name"> href</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;/keywords/rdfa&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>rdfa</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">a</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;/</span><span class="z-entity z-name z-tag">span</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">span</span><span class="z-punctuation z-definition z-tag z-end z-punctuation z-definition z-tag z-begin">&gt;&lt;</span><span class="z-entity z-name z-tag">a</span><span class="z-entity z-other z-attribute-name"> href</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;https://github.com/Hywan/mnt.io/edit/main/content/articles</span><span class="z-constant z-character">&amp;#x2F;</span><span class="z-string">2024-10-08-building-a-new-site</span><span class="z-constant z-character">&amp;#x2F;</span><span class="z-string">index.md&quot;</span><span class="z-entity z-other z-attribute-name"> title</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;Submit a patch for this page&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>Edit</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">a</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span> this page</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">span</span><span class="z-punctuation z-definition z-tag 
z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">meta</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;description&quot;</span><span class="z-entity z-other z-attribute-name"> content</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;…&quot;</span><span class="z-punctuation z-definition z-tag z-end"> /&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;/</span><span class="z-entity z-name z-tag">div</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;/</span><span class="z-entity z-name z-tag">header</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> &lt;!-- … --&gt;</span></span></code></pre> <p>First off, the site uses semantic HTML as much as possible, with <code>article</code>, <code>header</code>, <code>time</code>, <code>meta</code> etc. Second, you may also notice the <code>vocab</code>, <code>typeof</code>, <code>property</code> and <code>content</code> attributes. They belong to an extension to HTML: <a rel="noopener external" target="_blank" href="https://www.w3.org/TR/rdfa-primer/">Resource Description Framework in Attributes</a> (RDFa for short). It is commonly used to add more semantic data to your content. It helps automated tools analyse the content of a Web document and make sense of it. <a rel="noopener external" target="_blank" href="https://schema.org/">schema.org</a> is a collaborative effort to create schemas for structured data, and that's what I use on this site for the moment.</p> <p>Last neat thing: did you notice you can edit the page? 
The code lives on GitHub, and everyone is free to submit a patch!</p> <h3 id="series">Series<a role="presentation" class="anchor" href="#series" title="Anchor link to this header">#</a> </h3> <p>A series is pretty similar to an article, except that it adds another level of indirection with episodes.</p> <p>Like articles with <code>pinned</code>, a series has its own metadata:</p> <pre class="giallo z-code"><code data-lang="toml"><span class="giallo-l"><span>[</span><span class="z-entity z-name">extra</span><span>]</span></span> <span class="giallo-l"><span class="z-variable">complete</span><span class="z-punctuation z-separator"> =</span><span class="z-constant z-language"> true</span></span></code></pre> <p><code>complete</code> indicates whether the series is complete or in progress.</p> <p>A series also has buttons to navigate to the previous or next episode. Nothing fancy, but it's fun to be able to do all that with Zola.</p> <p>The hierarchy is intuitive, and it uses RDFa heavily too. For example, a series overview with all its episodes looks like this:</p> <pre class="giallo z-code"><code data-lang="html"><span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin">&lt;</span><span class="z-entity z-name z-tag">main</span><span class="z-entity z-other z-attribute-name"> vocab</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;https://schema.org&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">section</span><span class="z-entity z-other z-attribute-name"> typeof</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;CreativeWorkSeries&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> 
&lt;</span><span class="z-entity z-name z-tag">h1</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;name&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>From Rust to beyond</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">h1</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> &lt;!-- … --&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">h2</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>Episodes</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">h2</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">div</span><span class="z-entity z-other z-attribute-name"> role</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;list&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">div</span><span class="z-entity z-other z-attribute-name"> role</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;listitem&quot;</span><span class="z-entity z-other z-attribute-name"> class</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;article-poster&quot;</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation 
z-separator">=</span><span class="z-string">&quot;hasPart&quot;</span><span class="z-entity z-other z-attribute-name"> typeof</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;Article&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">a</span><span class="z-entity z-other z-attribute-name"> href</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;/series/from-rust-to-beyond/prelude/&quot;</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;url&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>Episode 1 – </span><span class="z-punctuation z-definition z-tag z-begin">&lt;</span><span class="z-entity z-name z-tag">span</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;name&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>Prelude</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">span</span><span class="z-punctuation z-definition z-tag z-end z-punctuation z-definition z-tag z-begin">&gt;&lt;/</span><span class="z-entity z-name z-tag">a</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">div</span><span class="z-entity z-other z-attribute-name"> class</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;metadata&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span 
class="z-comment"> &lt;!-- … --&gt;</span><span> </span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;/</span><span class="z-entity z-name z-tag">div</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;/</span><span class="z-entity z-name z-tag">div</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">div</span><span class="z-entity z-other z-attribute-name"> role</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;listitem&quot;</span><span class="z-entity z-other z-attribute-name"> class</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;article-poster&quot;</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;hasPart&quot;</span><span class="z-entity z-other z-attribute-name"> typeof</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;Article&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">a</span><span class="z-entity z-other z-attribute-name"> href</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;/series/from-rust-to-beyond/the-webassembly-galaxy/&quot;</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;url&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>Episode 2 – </span><span class="z-punctuation z-definition z-tag 
z-begin">&lt;</span><span class="z-entity z-name z-tag">span</span><span class="z-entity z-other z-attribute-name"> property</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;name&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span><span>The WebAssembly galaxy</span><span class="z-punctuation z-definition z-tag z-begin">&lt;/</span><span class="z-entity z-name z-tag">span</span><span class="z-punctuation z-definition z-tag z-end z-punctuation z-definition z-tag z-begin">&gt;&lt;/</span><span class="z-entity z-name z-tag">a</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;</span><span class="z-entity z-name z-tag">div</span><span class="z-entity z-other z-attribute-name"> class</span><span class="z-punctuation z-separator">=</span><span class="z-string">&quot;metadata&quot;</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"><span class="z-comment"> &lt;!-- … --&gt;</span><span> </span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag z-begin"> &lt;/</span><span class="z-entity z-name z-tag">div</span><span class="z-punctuation z-definition z-tag z-end">&gt;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> &lt;!-- … --&gt;</span></span></code></pre> <p>First off, we use the <code>role</code> HTML attribute to change the semantics of some elements: here, a <code>div</code> takes the semantics of a <code>ul</code> or a <code>li</code>. Second, we use <code>typeof="CreativeWorkSeries"</code> to describe a series, which is composed of different parts: <code>property="hasPart"</code>. Each part is an article: <code>typeof="Article"</code>, which has its own semantics: <code>property="name"</code> etc. The markup is extremely simple, but it contains all the required information. 
HTML is really powerful, I'm not going to lie!</p> <h2 id="discuss">Discuss<a role="presentation" class="anchor" href="#discuss" title="Anchor link to this header">#</a> </h2> <p>One novelty is the <em>Discuss</em> menu item at the top of the site. It contains a link to a <a rel="noopener external" target="_blank" href="https://matrix.org/">Matrix</a> public room: <a rel="noopener external" target="_blank" href="https://matrix.to/#/#mnt_io:matrix.org">https://matrix.to/#/#mnt_io:matrix.org</a>, where anybody can come to talk about an article or a series, ask questions, or simply chill. You're very welcome there!</p> <h2 id="good-ol-web">Good ol' Web<a role="presentation" class="anchor" href="#good-ol-web" title="Anchor link to this header">#</a> </h2> <p>The site has a short CSS stylesheet written by hand with no framework (oh yeah). It weighs 11 KiB (uncompressed), heavy, I know.</p> <p>The site also has <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/RSS">RSS</a> and <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Atom_(web_standard)">Atom</a> feeds for syndication. It even has a blogroll under the <em>Recommandations</em> section in the footer. Well, you get it, I'm nostalgic for the old Web. It's absolutely incredible what is possible to achieve today with HTML and CSS without any frameworks or polyfills. So many resources are wasted nowadays…</p> <h2 id="personnages"><q>Personnages</q><a role="presentation" class="anchor" href="#personnages" title="Anchor link to this header">#</a> </h2> <p>The biggest novelty is <a href="https://mnt.io/lore/">the lore</a> I've developed for this new version of the site. Please welcome three characters: <em>Le Comte</em>, <em>Le Factotum</em>, and <em>Le Procureur</em>.</p> <p>These characters will help to explain non-trivial concepts by interacting with me. 
Let me copy the lore here.</p> <h3 id="le-comte">Le Comte<a role="presentation" class="anchor" href="#le-comte" title="Anchor link to this header">#</a> </h3> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>My name is <em>Le Comte</em>. I enjoy being the main character of this story. I am mostly here to learn, and to interrogate our dear author.</p> <p>My resources are unlimited. I am fortunate to have a fortune with a secret origin. If I want to understand something, I will work as hard as possible to shed light on it. My latest caprice is these modern things people are calling <em>computers</em>. They seem really powerful and I want to learn everything about them!</p> <p>I often ask my Factotum for help with the dirty, and sometimes illegal, tasks. I rarely ask Le Procureur for help; we don't really appreciate his presence.</p> </div> </div> <h3 id="le-factotum">Le Factotum<a role="presentation" class="anchor" href="#le-factotum" title="Anchor link to this header">#</a> </h3> <div class="conversation" data-character="factotum"> <div class="conversation--character"> <span lang="fr">Le Factotum</span> <picture role="presentation"> <source srcset="/image/factotum.avif" type="image/avif" /> <source srcset="/image/factotum.webp" type="image/webp" /> <img src="/image/factotum.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>My name is <em>Le Factotum</em>. It's a Latin word, literally meaning “do everything”. I'm here to assist Le Comte in his fancies.</p> <p>Even if I have an uneventful life now, Le Comte is partly aware of my smuggling past. 
He never says a word about it, but he knows I kept in contact with old friends across various countries and cultures. These relations help Le Comte in his quest to learn everything about computers.</p> <p>Fundamentally, when Le Comte wants to do something manky, he asks me the best way to do it. And I always have a solution.</p> </div> </div> <h3 id="le-procureur">Le Procureur<a role="presentation" class="anchor" href="#le-procureur" title="Anchor link to this header">#</a> </h3> <div class="conversation" data-character="procureur"> <div class="conversation--character"> <span lang="fr">Le Procureur</span> <picture role="presentation"> <source srcset="/image/procureur.avif" type="image/avif" /> <source srcset="/image/procureur.webp" type="image/webp" /> <img src="/image/procureur.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>My name is <em>Le Procureur</em>. I am the son of the Law and the Order. I know what is legal, and what is illegal. If a piece of information is missing, a detail, a precision, I know where to find the answer.</p> <p>Some people believe I am irritating, but I consider myself the defender of discipline.</p> </div> </div> <h2 id="optimised-for-smallness-speed-semantics-and-fun">Optimised for smallness, speed, semantics and fun!<a role="presentation" class="anchor" href="#optimised-for-smallness-speed-semantics-and-fun" title="Anchor link to this header">#</a> </h2> <p>At this point, it should be clear the site has been optimised for smallness, speed and semantics. Even the fonts aren't custom: I use <a rel="noopener external" target="_blank" href="https://modernfontstacks.com/">Modern Font Stacks</a> to find a font stack that works on most computers. The site may not look and feel the same on two different devices, and that's perfectly fine. 
That's the nature of the Web.</p> <p>I encourage you <a rel="noopener external" target="_blank" href="https://github.com/Hywan/mnt.io">to read the source code of this site</a>, to fork it, to play with it, to get inspired by it. It's important to own your content, and not to give your work to other platforms.</p> <p>I really hope you'll enjoy the content I'm preparing. You can start with the first episode of the new series: <a href="https://mnt.io/series/reactive-programming-in-rust/observability/">Reactive programming in Rust, Observability</a>. See you there!</p> Observability 2024-09-19T00:00:00+00:00 2024-09-19T00:00:00+00:00 Unknown https://mnt.io/series/reactive-programming-in-rust/observability/ <p>Imagine a collection of values <code>T</code>. This collection can be updated by inserting new values or removing existing ones, or the collection can be truncated, cleared… This collection acts like <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/vec/index.html">the standard <code>Vec</code></a>. However, there is a subtlety: this collection is <em>observable</em>. It is possible for someone to <em>subscribe</em> to this collection and to receive its updates.</p> <p>This observability pattern is the basis of reactive programming. It applies to any kind of type. Actually, it can be generalised as a single <code>Observable&lt;T&gt;</code> type. For collections though, we will see that an <code>ObservableVector&lt;T&gt;</code> type is more efficient.</p> <p>I’ve recently played a lot with this pattern as part of my work inside the <a rel="noopener external" target="_blank" href="https://github.com/matrix-org/matrix-rust-sdk">Matrix Rust SDK</a>, a set of Rust libraries for developing robust <a rel="noopener external" target="_blank" href="https://matrix.org/">Matrix</a> clients or bridges. 
It is notably used by the next-generation Matrix client developed by <a rel="noopener external" target="_blank" href="https://element.io/">Element</a>, namely <a rel="noopener external" target="_blank" href="https://element.io/labs/element-x">Element X</a>. The Matrix Rust SDK is cross-platform. Element X has two implementations: on iOS, iPadOS and macOS with Swift, and on Android with Kotlin. Both use our Rust bindings to <a rel="noopener external" target="_blank" href="https://www.swift.org/">Swift</a> and <a rel="noopener external" target="_blank" href="https://kotlinlang.org/">Kotlin</a>. This is a story for another series (how we have automated this, how we support asynchronous flows from Rust to foreign languages etc.), but for the moment, let’s stay focused on reactive programming.</p> <p>Taking the Element X use case, the room list, which is the central piece of the app, is fully dynamic:</p> <ul> <li>Rooms are sorted by recency, so rooms move to the top when a new interesting message is received,</li> <li>The list can be filtered by room properties (one can filter by group or people, favourites, unreads, invites…),</li> <li>The list is also searchable by room names.</li> </ul> <p>The rooms exposed by the room list are stored in a single <em>observable</em> type. Why is it dynamic? Because the app continuously syncs new data that updates the internal state: when a room gets an update from the network, the room list is automatically updated. The beauty of it: we have nothing to do. Sorters and filters are run automatically. Why? Spoiler: because everything is a <code>Stream</code>.</p> <p>Thanks to the Rust async model, every part is lazy. The app never needs to ask Rust whether a new update is present. It literally just waits for updates.</p> <p>I believe this reactive programming approach is pretty interesting to explore. And this is precisely the goal of this series. 
We are going to play with <code>Stream</code> a lot, with higher-order <code>Stream</code> a lot more, and w…</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Hold on a second! I believe this first step is a bit steep for someone who's not familiar with asynchronous code in Rust, don't you think?</p> <p>Before digging into the implementation details you are obviously eager to share, maybe we can start with examples.</p> </div> </div> <p>Alrighty. Fair. Before digging into the really fun bits, we need some basics.</p> <h2 id="baby-steps-with-reactive-programming">Baby steps with reactive programming<a role="presentation" class="anchor" href="#baby-steps-with-reactive-programming" title="Anchor link to this header">#</a> </h2> <p>Everything we are going to share with you has been implemented in <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball">a library called <code>eyeball</code></a>. 
To give you a good idea of what reactive programming in Rust can look like, let's create a Rust program:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo new --bin playground</span></span> <span class="giallo-l"><span> Creating binary (application) `playground` package</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cd playground</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo add eyeball</span></span> <span class="giallo-l"><span> Updating crates.io index</span></span> <span class="giallo-l"><span> Adding eyeball v0.8.8 to dependencies</span></span> <span class="giallo-l"><span> Features:</span></span> <span class="giallo-l"><span> - __bench</span></span> <span class="giallo-l"><span> - async-lock</span></span> <span class="giallo-l"><span> - tracing</span></span> <span class="giallo-l"><span> Updating crates.io index</span></span> <span class="giallo-l"><span> Locking 3 packages to latest compatible versions</span></span></code></pre><pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> eyeball</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Observable</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> main</span><span>() {</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> observable</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span 
class="z-constant z-numeric">7</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">subscribe</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">observable</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-entity z-name">Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">get</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">observable</span><span>));</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span>());</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">set</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable"> observable</span><span>,</span><span class="z-constant z-numeric"> 13</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-entity z-name">Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">get</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">observable</span><span>));</span></span> <span class="giallo-l"><span 
class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span>());</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>What do we see here? First off, <code>observable</code> is an observable value. The proof: it is possible to subscribe to it, see <code>subscriber</code>. Both <code>observable</code> and <code>subscriber</code> see the same initial value: 7. When <code>observable</code> receives a new value, 13, both <code>observable</code> and <code>subscriber</code> see the updated value. Let's take it for a spin:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:8:5] Observable::get(&amp;observable) = 7</span></span> <span class="giallo-l"><span>[src/main.rs:9:5] subscriber.get() = 7</span></span> <span class="giallo-l"><span>[src/main.rs:13:5] Observable::get(&amp;observable) = 13</span></span> <span class="giallo-l"><span>[src/main.rs:14:5] subscriber.get() = 13</span></span></code></pre> <p>Tadaa. Fantastic, isn't it?</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>I… I am… speechless? Is it <em>really</em> reactive programming? Where is the reactivity here? It seems like you've only shared a value between an <em>owner</em> and a <em>watcher</em>. You're calling them <em>observable</em> and <em>subscriber</em>, alright, but how is this thing reactive? 
I only see synchronous code for the moment.</p> </div> </div> <p>Hold on. You told me to start slow. You're right though: the <code>Observable</code> owns the value. The <code>Subscriber</code> is able to read the value from the <code>Observable</code>. However, <code>Subscriber::next</code> returns a <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/future/trait.Future.html"><code>Future</code></a>! Let's add this:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// …</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> main</span><span>() {</span></span> <span class="giallo-l"><span class="z-comment"> // …</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>);</span></span> <span class="giallo-l"><span>}</span></span></code></pre><pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>error[E0728]: `await` is only allowed inside `async` functions and blocks</span></span> <span class="giallo-l"><span> --&gt; src/main.rs:16:28</span></span> <span class="giallo-l"><span> |</span></span> <span class="giallo-l"><span>3 | fn main() {</span></span> <span class="giallo-l"><span> | --------- this is not `async`</span></span> <span class="giallo-l"><span>...</span></span> <span class="giallo-l"><span>16 | 
dbg!(subscriber.next().await);</span></span> <span class="giallo-l"><span> | ^^^^^ only allowed inside `async` functions and blocks</span></span></code></pre> <p>Indeed. Almighty <code>rustc</code> is correct. The <code>main</code> function is not <code>async</code>. We need an asynchronous runtime. Let's use <a rel="noopener external" target="_blank" href="https://docs.rs/smol">the <code>smol</code> project</a>, I enjoy it a lot: it's a small, fast and well-written async runtime:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo add smol</span></span> <span class="giallo-l"><span> Updating crates.io index</span></span> <span class="giallo-l"><span> Adding smol v2.0.2 to dependencies</span></span> <span class="giallo-l"><span> [ … snip … ]</span></span></code></pre> <p>Now let's modify our <code>main</code> function a little bit:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> eyeball</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Observable</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> main</span><span>() {</span></span> <span class="giallo-l"><span class="z-entity z-name"> smol</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">block_on</span><span>(</span><span class="z-keyword">async</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> observable</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name 
z-function">new</span><span>(</span><span class="z-constant z-numeric">7</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">subscribe</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">observable</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-entity z-name">Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">get</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">observable</span><span>));</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span>());</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">set</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable"> observable</span><span>,</span><span class="z-constant z-numeric"> 13</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-entity z-name">Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">get</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span 
class="z-variable">observable</span><span>));</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">get</span><span>());</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>);</span></span> <span class="giallo-l"><span> })</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Please <code>rustc</code>, be nice…</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span>[src/main.rs:9:9] Observable::get(&amp;observable) = 7</span></span> <span class="giallo-l"><span>[src/main.rs:10:9] subscriber.get() = 7</span></span> <span class="giallo-l"><span>[src/main.rs:14:9] Observable::get(&amp;observable) = 13</span></span> <span class="giallo-l"><span>[src/main.rs:15:9] subscriber.get() = 13</span></span> <span class="giallo-l"><span>[src/main.rs:17:9] subscriber.next().await = Some(</span></span> <span class="giallo-l"><span> 13,</span></span> <span class="giallo-l"><span>)</span></span></code></pre> <p>Hurray!</p> <p>We can even have a bit more ergonomics by using <a rel="noopener external" target="_blank" href="https://docs.rs/smol-macros">the <code>smol-macros</code> crate</a> which sets up a default <a rel="noopener external" target="_blank" href="https://docs.rs/smol/2.0.2/smol/struct.Executor.html">async runtime <code>Executor</code></a> for us. 
It's useful in our case as we want to play with something else (reactive programming), and don't want to focus on the async runtime itself:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo add smol-macros macro_rules_attribute</span></span> <span class="giallo-l"><span> Updating crates.io index</span></span> <span class="giallo-l"><span> Adding smol-macros v0.1.1 to dependencies</span></span> <span class="giallo-l"><span> Adding macro_rules_attribute v0.2.0 to dependencies</span></span> <span class="giallo-l"><span> Features:</span></span> <span class="giallo-l"><span> - better-docs</span></span> <span class="giallo-l"><span> - verbose-expansions</span></span> <span class="giallo-l"><span> Updating crates.io index</span></span> <span class="giallo-l"><span> Locking 4 packages to latest compatible versions</span></span> <span class="giallo-l"><span> Adding macro_rules_attribute v0.2.0</span></span> <span class="giallo-l"><span> Adding macro_rules_attribute-proc_macro v0.2.0</span></span> <span class="giallo-l"><span> Adding paste v1.0.15</span></span> <span class="giallo-l"><span> Adding smol-macros v0.1.1</span></span></code></pre> <p>We will take the opportunity to improve our program a little bit. 
Let's spawn a <code>Future</code> that will continuously read new updates from the <code>subscriber</code>.</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> std</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">time</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Duration</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> eyeball</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Observable</span><span>;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> macro_rules_attribute</span><span class="z-keyword z-operator">::</span><span>apply;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> smol</span><span class="z-keyword z-operator">::</span><span>{</span><span class="z-entity z-name">Executor</span><span>,</span><span class="z-entity z-name"> Timer</span><span>};</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> smol_macros</span><span class="z-keyword z-operator">::</span><span>main;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[apply(main</span><span class="z-keyword z-operator">!</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">async fn</span><span class="z-entity z-name z-function"> main</span><span>(</span><span class="z-variable">executor</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">Executor</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> observable</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Observable</span><span class="z-keyword 
z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span class="z-constant z-numeric">7</span><span>);</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">subscribe</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">observable</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Task that reads new updates from `observable`.</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> task</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> executor</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">spawn</span><span>(</span><span class="z-keyword">async move</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> while</span><span class="z-storage"> let</span><span class="z-entity z-name"> Some</span><span>(</span><span class="z-variable">new_value</span><span>)</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">new_value</span><span>);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> });</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Now, let&#39;s update `observable`.</span></span> <span class="giallo-l"><span 
class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">set</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable"> observable</span><span>,</span><span class="z-constant z-numeric"> 13</span><span>);</span></span> <span class="giallo-l"><span class="z-entity z-name"> Timer</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">after</span><span>(</span><span class="z-entity z-name">Duration</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">from_secs</span><span>(</span><span class="z-constant z-numeric">1</span><span>))</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">set</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable"> observable</span><span>,</span><span class="z-constant z-numeric"> 17</span><span>);</span></span> <span class="giallo-l"><span class="z-entity z-name"> Timer</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">after</span><span>(</span><span class="z-entity z-name">Duration</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">from_secs</span><span>(</span><span class="z-constant z-numeric">1</span><span>))</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">set</span><span>(</span><span class="z-keyword 
z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable"> observable</span><span>,</span><span class="z-constant z-numeric"> 23</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Wait on the task.</span></span> <span class="giallo-l"><span class="z-variable"> task</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The little <code>Timer::after</code> calls are here to pretend the values are coming from random events, for the moment. Let's run it again to see if we get the same result:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:16:13] new_value = 13</span></span> <span class="giallo-l"><span>[src/main.rs:16:13] new_value = 17</span></span> <span class="giallo-l"><span>[src/main.rs:16:13] new_value = 23</span></span> <span class="giallo-l"><span>^C</span></span></code></pre> <p>Here we go, perfect! See, ah ha! It's async and nice now.</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>I believe I am starting to appreciate it. However, I foresee you might be hiding something behind these <code>Timer::after</code> calls. Am I right?</p> <p>And this <code>task.await</code> at the end makes the program never finish. 
It explains the need to send <a rel="noopener external" target="_blank" href="https://man.freebsd.org/cgi/man.cgi?query=signal">a <code>SIGINT</code> signal</a> to the program to interrupt it, right?</p> </div> </div> <p>You're slick. Indeed, I wanted to focus on the <code>observable</code> and the <code>subscriber</code>, because there is a subtlety here. If the <code>Timer::after</code> calls are removed, only the last update will be displayed on the output by <code>dbg!</code>. And that's perfectly normal. The async runtime will execute all the <code>Observable::set(&amp;mut observable, new_value)</code> calls in a row, and then, once there is an await point, another task will have room to run. In this case, that's <code>subscriber.next().await</code>.</p> <p>The subscriber only receives the <strong>last</strong> update, and that's pretty important to understand. There is no buffer of all the previous updates here, no memory, no trace: <code>subscriber</code> returns the last value when it is called. Note that this is not always the case as we will see with <code>ObservableVector</code> later, but for the moment, that's the case.</p> <p>And yes, if we want the <code>task</code> to get a chance to consume more updates, we need to tell the executor that we are willing to wait while the other tasks are woken up. 
To do that, we can use <a rel="noopener external" target="_blank" href="https://docs.rs/smol/2.0.2/smol/future/fn.yield_now.html">the <code>smol::future::yield_now</code> function</a>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment"> // Now, let&#39;s update `observable`.</span></span> <span class="giallo-l"><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">set</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable"> observable</span><span>,</span><span class="z-constant z-numeric"> 13</span><span>);</span></span> <span class="giallo-l"><span class="z-comment"> // Eh `executor`: `task` can run now, we will wait!</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> yield_now</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // More updates.</span></span> <span class="giallo-l"><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">set</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable"> observable</span><span>,</span><span class="z-constant z-numeric"> 17</span><span>);</span></span> <span class="giallo-l"><span class="z-entity z-name"> Observable</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">set</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable"> observable</span><span>,</span><span class="z-constant z-numeric"> 23</span><span>);</span></span> <span class="giallo-l"><span class="z-comment"> // Eh `executor`: _bis repetita 
placent_!</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> yield_now</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> drop</span><span>(</span><span class="z-variable">observable</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> task</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Let's see what happens:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:14:13] new_value = 13</span></span> <span class="giallo-l"><span>[src/main.rs:14:13] new_value = 23</span></span></code></pre> <p>Eh, see, <code>new_value = 17</code> is <strong>not</strong> displayed, because the <code>observable</code> is updated again, to 23, while the task reading from the <code>subscriber</code> is still suspended by the executor. But the others are read, good good.</p> <p>Note that we are dropping the <code>observable</code>. Once it's dropped, the <code>subscriber</code> won't be able to read any value from it, so it's going to close itself, and the <code>task</code> will end. That's why waiting on the task with <code>task.await</code> will terminate this time. And thus, the program will finish gracefully.</p> <p>And that's it. That's the basis of reactive programming. 
Also note that <code>Subscriber&lt;T&gt;</code> implements <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/marker/trait.Send.html"><code>Send</code></a> and <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/marker/trait.Sync.html"><code>Sync</code></a> if <code>T</code> implements <code>Send</code> and <code>Sync</code>, i.e. if the observed type implements these traits. That's pretty useful actually: it is possible to send the subscriber to a different thread, and keep waiting for new updates there.</p> <h2 id="attack-of-the-clones">Attack of the Clones<a role="presentation" class="anchor" href="#attack-of-the-clones" title="Anchor link to this header">#</a> </h2> <p>However, at the beginning of this episode, we were talking about a collection. Let's focus on <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/vec/index.html"><code>Vec</code></a>.</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Why do we focus on <code>Vec</code> <em>only</em>? Why not <code>HashMap</code>, <code>HashSet</code>, <code>BTreeSet</code>, <code>BTreeMap</code>, <code>BinaryHeap</code>, <code>LinkedList</code> or even <code>VecDeque</code>? It seems a bit non-inclusive if you ask me. Are you aware there isn't only <code>Vec</code> in life?</p> </div> </div> <p>Well, the reason is simple: <code>Vec</code> is supported by <code>eyeball</code>. 
Supporting other collections is a matter of time and work. It's definitely not impossible, but you will see that it's not trivial either, for a simple reason: did you notice that <code>Subscriber</code> produces an owned <code>T</code>? Not a <code>&amp;T</code>, but a <code>T</code>. That's because <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball/0.8.8/eyeball/struct.Subscriber.html#method.next-1"><code>Subscriber::next</code></a> requires <code>T: Clone</code>. It means that the observed value will be cloned every time it is broadcast to a subscriber.</p> <p><a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/clone/trait.Clone.html">Cloning a value</a> may be expensive. Here we are manipulating <code>usize</code>, which is a primitive type, so it's all fine (it boils down to a <a rel="noopener external" target="_blank" href="https://en.cppreference.com/w/c/string/byte/memcpy"><code>memcpy</code></a>). But imagine an <code>Observable&lt;Vec&lt;BigType&gt;&gt;</code> where <code>BigType</code> is 512 bytes: the memory impact is going to be quickly noticeable. So th…</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>… Excuse my interruption! You know how I love reading books. I like defining myself as a bibliophile. Anyway. During my perusal of the <code>eyeball</code> documentation, I have found <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball/0.8.8/eyeball/struct.Subscriber.html#method.next_ref-1"><code>Subscriber::next_ref</code></a>. 
The documentation says:</p> <blockquote> <p>Wait for an update and get a read lock for the updated value.</p> </blockquote> <p>and later:</p> <blockquote> <p>You can use this method to get updates of an <code>Observable</code> where the inner type does not implement <code>Clone</code>.</p> </blockquote> </div> </div> <p>Can you stop cutting me off please? It's really unpleasant. And do not forget we are not alone… <i>doing sideways head movement</i></p> <p>You're right though. There is <code>Subscriber::next_ref</code>. However, if you are such an <em>assiduous reader</em>, you must have read the end of the documentation, haven't you?</p> <blockquote> <p>However, the <code>Observable</code> will be locked (not updateable) while any read guards are alive.</p> </blockquote> <p>Blocking the <code>Observable</code> might be tolerable in some cases, but it cannot be generalised to all use cases. A user is more likely to prefer <code>next</code> instead of <code>next_ref</code> by default.</p> <p>Back to our <code>Observable&lt;Vec&lt;BigType&gt;&gt;</code> then. Imagine the collection contains a lot of items: cloning the entire <code>Vec&lt;_&gt;</code> for every update to every subscriber is a pretty inefficient way of programming. Remember that, as programmers, we have the responsibility to make our programs use as few resources as possible, so that hardware can be used longer. Hardware is the most polluting segment of our digital world.</p> <p>So. How can a data structure like <code>Vec</code> be cloned cheaply? We could put it inside an <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/sync/struct.Arc.html"><code>Arc</code></a>, right? 
Cloning an <em>Atomically Reference Counted</em> value is really cheap: <a rel="noopener external" target="_blank" href="https://github.com/rust-lang/rust/blob/f6bcd094abe174a218f7cf406e75521be4199f88/library/alloc/src/sync.rs#L2118-L2170">it increases the counter by 1 atomically</a>, the inner value is untouched. Nonetheless, we have a mutation problem now. If we have <code>Observable&lt;Arc&lt;Vec&lt;_&gt;&gt;&gt;</code>, it means that the subscribers will be <code>Subscriber&lt;Arc&lt;Vec&lt;_&gt;&gt;&gt;</code>. In this case, every time the observable wants to mutate the data, it is going to… be… impossible because an <code>Arc</code> is nothing less than a shared reference, and shared references in Rust disallow mutation by default. Using <code>Observable::set</code> will create a new <code>Arc</code>, but we cannot update the value <em>inside</em> the <code>Arc</code>, except if we use a lock… Well, we are adding more and more complexity.</p> <p><q lang="la">Spes salutis</q><sup class="footnote-reference" id="fr-spes_salutis-1"><a href="#fn-spes_salutis">1</a></sup>! Fortunately for us, <em>immutable data structures</em> exist in Rust.</p> <blockquote> <p>An immutable data structure is a data structure which can be copied and modified efficiently without altering the original.</p> </blockquote> <p>It can be modified. However, as soon as it is copied (or cloned), it is still possible to modify the copy but the original data is not modified. That's extremely powerful.</p> <p>Such structures bring many advantages, but one of them is <em>structural sharing</em>:</p> <blockquote> <p>If two data structures are mostly copies of each other, most of the memory they take up will be shared between them. 
This implies that making copies of an immutable data structure is cheap: it's really only a matter of copying a pointer and increasing a reference counter, where in the case of <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/vec/index.html"><code>Vec</code></a> you have to allocate the same amount of memory all over again and make a copy of every element it contains. For immutable data structures, extra memory isn't allocated until you modify either the copy or the original, and then only the memory needed to record the difference.</p> </blockquote> <p>Well, <i>taking a deep breath</i>, it sounds exactly like what we need to solve our issue, doesn't it? The <code>Observable&lt;Immutable&lt;_&gt;&gt;</code> and the <code>Subscriber&lt;Immutable&lt;_&gt;&gt;</code>s will share the same value, with the observable being able to mutate its inner value. The subscribers can modify the received value too, in an efficient way, without conflicting with the value from the observable. Both values will continue to live on their own side, but cloning the value is cheap.</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Dare I ask how immutable data structures are implemented? They sound like complex beasts.</p> <p>I mean… a naive implementation sounds <em>relatively doable</em> but I am guessing there are a lot of subtleties, possible conflicts, and many memory guarantees that I am not anticipating yet, right?</p> </div> </div> <p>Oh… <q lang="la">beati pauperes in spiritu</q><sup class="footnote-reference" id="fr-beati_pauperes_in_spiritu-1"><a href="#fn-beati_pauperes_in_spiritu">2</a></sup>… it is actually really complex. 
It may be a topic for another series of articles. For the moment, if you are interested, let me redirect you to one research paper that proposes an immutable <code>Vec</code>: <cite>RRB Vector: A Practical General Purpose Immutable Sequence</cite><sup class="footnote-reference" id="fr-SRUB2015-1"><a href="#fn-SRUB2015">3</a></sup>. Be cool though, understanding this part is not necessary at all for what we are talking about now. It's a great tool we are going to use, no matter how it works internally.</p> <p>Do you know the other good news? We don't have to implement it ourselves, because some nice people already did it! Enter <a rel="noopener external" target="_blank" href="https://docs.rs/imbl">the <code>imbl</code> crate</a>. This crate provides <a rel="noopener external" target="_blank" href="https://docs.rs/imbl/3.0.0/imbl/struct.Vector.html">a <code>Vector</code> type</a>. It can be used like a regular <code>Vec</code>. (Side note: it's even smarter than a <code>Vec</code> because it implements smart head and tail chunking<sup class="footnote-reference" id="fr-UCR2014-1"><a href="#fn-UCR2014">4</a></sup>, and allocates on the stack or on the heap depending on the size of the collection, similarly to <a rel="noopener external" target="_blank" href="https://docs.rs/smallvec">the <code>smallvec</code> crate</a>. End of digression.)</p> <h2 id="observable-immutable-collection">Observable (immutable) collection<a role="presentation" class="anchor" href="#observable-immutable-collection" title="Anchor link to this header">#</a> </h2> <p>The <code>imbl</code> crate then. It provides <a rel="noopener external" target="_blank" href="https://docs.rs/imbl/3.0.0/imbl/struct.Vector.html">a <code>Vector</code> type</a>. 
<code>eyeball</code> provides a crate for working with immutable data structures (how surprising huh?): <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im">this crate is <code>eyeball-im</code></a>.</p> <p>Instead of providing an <code>Observable&lt;T&gt;</code> type, it provides <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/struct.ObservableVector.html">an <code>ObservableVector&lt;T&gt;</code> type</a> which is a <code>Vector</code>, but an observable one! Let's see… what do we have… <i>scroll the documentation</i>, hmm, interesting, <i>scroll more…</i>, okay, that's interesting:</p> <ul> <li>First off, there are methods like <code>append</code>, <code>pop_back</code>, <code>pop_front</code>, <code>push_back</code>, <code>push_front</code>, <code>remove</code>, <code>insert</code>, <code>set</code>, <code>truncate</code> and <code>clear</code>. It seems this collection is pretty flexible. The vocabulary is clear. They all take a <code>&amp;mut self</code>, cool.</li> <li>Then, there is a <code>with_capacity</code> method, this is intriguing, <i>add to notes</i>,</li> <li>Finally, we find our not-so-ol' friend <code>subscribe</code>, but this time it returns a <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/struct.VectorSubscriber.html"><code>VectorSubscriber&lt;T&gt;</code></a>.</li> </ul> <p>Let's explore <code>VectorSubscriber</code> a bit more, shall we? <i>Scroll the documentation</i>, unlike <code>Subscriber</code> with its <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball/0.8.8/eyeball/struct.Subscriber.html#method.next-1"><code>Subscriber::next</code></a>, there is no <code>next</code> method here. 
How are we supposed to wait on an update?</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Confer to the assiduous reader! If you read <em>carefully</em> the documentation of the <code>Subscriber::next</code> method, you will see:</p> <blockquote> <p>This method is a convenience so you don't have to import a <code>Stream</code> extension trait such as <code>futures::StreamExt</code> or <code>tokio_stream::StreamExt</code>.</p> </blockquote> </div> </div> <p>… fair enough. So <code>Subscriber::next</code> mimics <code>StreamExt::next</code>. Okay. Let's look at <a rel="noopener external" target="_blank" href="https://docs.rs/futures/0.3.30/futures/stream/trait.Stream.html"><code>Stream</code></a> first, it's from <a rel="noopener external" target="_blank" href="https://docs.rs/futures">the <code>futures</code> crate</a>. <code>Stream</code> defines itself as:</p> <blockquote> <p>A stream of values produced asynchronously.</p> <p>If <code>Future&lt;Output = T&gt;</code> is an asynchronous version of <code>T</code>, then <code>Stream&lt;Item = T&gt;</code> is an asynchronous version of <code>Iterator&lt;Item = T&gt;</code>. 
A stream represents a sequence of value-producing events that occur asynchronously to the caller.</p> <p>The trait is modeled after <code>Future</code>, but allows <code>poll_next</code> to be called even after a value has been produced, yielding None once the stream has been fully exhausted.</p> </blockquote> <p>We aren't going to teach everything about <code>Stream</code>: why this design, its pros and cons… However, <i>waves a hand to ask you to come closer</i>, did you notice how <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/future/trait.Future.html#tymethod.poll"><code>Future::poll</code></a> returns <code>Poll&lt;Self::Output&gt;</code>, whilst <a rel="noopener external" target="_blank" href="https://docs.rs/futures/0.3.30/futures/stream/trait.Stream.html#tymethod.poll_next"><code>Stream::poll_next</code></a> returns <code>Poll&lt;Option&lt;Self::Item&gt;&gt;</code>? It's really similar to <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/iter/trait.Iterator.html#tymethod.next"><code>Iterator::next</code></a> which returns <code>Option&lt;Self::Item&gt;</code>.</p> <p>Let's take a look at <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/task/enum.Poll.html"><code>Poll&lt;T&gt;</code></a>, if you don't mind. 
It's an enum with 2 variants:</p> <ul> <li><code>Ready(value)</code> means a <code>value</code> is immediately ready,</li> <li><code>Pending</code> means no value is ready yet.</li> </ul> <p>Then, what does <code>Poll&lt;Option&lt;T&gt;&gt;</code> represent for a <code>Stream</code>?</p> <ul> <li><code>Poll::Ready(Some(value))</code> means this stream has successfully produced a <code>value</code>, and may produce more values on subsequent <code>poll_next</code> calls,</li> <li><code>Poll::Ready(None)</code> means the stream has terminated (and <code>poll_next</code> should not be called anymore),</li> <li><code>Poll::Pending</code> means no value is ready yet.</li> </ul> <p>It makes perfect sense. A <code>Future</code> produces a single value, whilst a <code>Stream</code> produces multiple values, and <code>Poll::Ready(None)</code> represents the termination of the stream, similarly to how <code>None</code> represents the termination of an <code>Iterator</code>. Ahh, I love consistency.</p> <p>We have the basics. Now let's see <a rel="noopener external" target="_blank" href="https://docs.rs/futures/0.3.30/futures/stream/trait.StreamExt.html"><code>StreamExt</code></a>. It's a trait extending <code>Stream</code> to add convenient combinator methods. Amongst other things, we find <a rel="noopener external" target="_blank" href="https://docs.rs/futures/0.3.30/futures/prelude/stream/trait.StreamExt.html#method.next"><code>StreamExt::next</code></a>! Ah ha! It returns a <code>Next</code> type which implements <code>Future</code>, exactly what <code>eyeball</code> does, actually. 
Remember our:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// from `main.rs`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">while</span><span class="z-storage"> let</span><span class="z-entity z-name"> Some</span><span>(</span><span class="z-variable">new_value</span><span>)</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">new_value</span><span>);</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>It is exactly the same pattern with <code>StreamExt::next</code>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// from the documentation of `StreamExt::Next`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> futures</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">stream</span><span class="z-keyword z-operator">::</span><span>{</span><span class="z-variable z-language">self</span><span>,</span><span class="z-entity z-name"> StreamExt</span><span>};</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">let mut</span><span class="z-variable"> stream</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> stream</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">iter</span><span>(</span><span class="z-constant z-numeric">1</span><span class="z-keyword z-operator">..=</span><span class="z-constant 
z-numeric">3</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_eq!</span><span>(</span><span class="z-variable">stream</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>,</span><span class="z-entity z-name"> Some</span><span>(</span><span class="z-constant z-numeric">1</span><span>));</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_eq!</span><span>(</span><span class="z-variable">stream</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>,</span><span class="z-entity z-name"> Some</span><span>(</span><span class="z-constant z-numeric">2</span><span>));</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_eq!</span><span>(</span><span class="z-variable">stream</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>,</span><span class="z-entity z-name"> Some</span><span>(</span><span class="z-constant z-numeric">3</span><span>));</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_eq!</span><span>(</span><span class="z-variable">stream</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>,</span><span class="z-entity z-name"> None</span><span>);</span></span></code></pre> <p>Pieces start to come together, don't they?</p> <p>End of the detour. Back to <code>eyeball_im::VectorSubscriber&lt;T&gt;</code> . 
It is possible to transform this type into a <code>Stream</code> with its <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/struct.VectorSubscriber.html#method.into_stream"><code>into_stream</code></a> method. It returns a <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/struct.VectorSubscriberStream.html"><code>VectorSubscriberStream</code></a>. Naming is hard, but if I had to guess, I would say it implements… a… <code>Stream</code>?</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// from `eyeball-im`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">impl</span><span>&lt;</span><span class="z-entity z-name">T</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Clone</span><span class="z-keyword z-operator"> +</span><span class="z-entity z-name"> Send</span><span class="z-keyword z-operator"> +</span><span class="z-entity z-name"> Sync</span><span class="z-keyword z-operator"> +</span><span> &#39;</span><span class="z-entity z-name">static</span><span>&gt;</span><span class="z-entity z-name"> Stream</span><span class="z-keyword"> for</span><span class="z-entity z-name"> VectorSubscriberStream</span><span>&lt;</span><span class="z-entity z-name">T</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-storage"> type</span><span class="z-entity z-name"> Item</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> VectorDiff</span><span>&lt;</span><span class="z-entity z-name">T</span><span>&gt;;</span></span></code></pre> <p><a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/struct.VectorSubscriberStream.html#impl-Stream-for-VectorSubscriberStream%3CT%3E">Yes, it does</a>!</p> <p>Dust blown away, the puzzle starts to appear clearly. 
Let's get back to coding!</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo add eyeball-im futures</span></span> <span class="giallo-l"><span> Updating crates.io index</span></span> <span class="giallo-l"><span> Adding eyeball-im v0.5.0 to dependencies</span></span> <span class="giallo-l"><span> Features:</span></span> <span class="giallo-l"><span> - serde</span></span> <span class="giallo-l"><span> - tracing</span></span> <span class="giallo-l"><span> Adding futures v0.3.30 to dependencies</span></span> <span class="giallo-l"><span> Features:</span></span> <span class="giallo-l"><span> + alloc</span></span> <span class="giallo-l"><span> + async-await</span></span> <span class="giallo-l"><span> + executor</span></span> <span class="giallo-l"><span> + std</span></span> <span class="giallo-l"><span> - bilock</span></span> <span class="giallo-l"><span> - cfg-target-has-atomic</span></span> <span class="giallo-l"><span> - compat</span></span> <span class="giallo-l"><span> - futures-executor</span></span> <span class="giallo-l"><span> - io-compat</span></span> <span class="giallo-l"><span> - thread-pool</span></span> <span class="giallo-l"><span> - unstable</span></span> <span class="giallo-l"><span> - write-all-vectored</span></span> <span class="giallo-l"><span> [ … snip … ]</span></span></code></pre><pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> eyeball_im</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">ObservableVector</span><span>;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> futures</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">stream</span><span 
class="z-keyword z-operator">::</span><span class="z-entity z-name">StreamExt</span><span>;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> macro_rules_attribute</span><span class="z-keyword z-operator">::</span><span>apply;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> smol</span><span class="z-keyword z-operator">::</span><span>{</span><span class="z-entity z-name">future</span><span class="z-keyword z-operator">::</span><span>yield_now,</span><span class="z-entity z-name"> Executor</span><span>};</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> smol_macros</span><span class="z-keyword z-operator">::</span><span>main;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[apply(main</span><span class="z-keyword z-operator">!</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">async fn</span><span class="z-entity z-name z-function"> main</span><span>(</span><span class="z-variable">executor</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">Executor</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> observable</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> ObservableVector</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>();</span></span> <span class="giallo-l"><span class="z-comment"> // Subscribe to `observable` and get a `Stream`.</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">subscribe</span><span>()</span><span class="z-keyword 
z-operator">.</span><span class="z-entity z-name z-function">into_stream</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Push one value.</span></span> <span class="giallo-l"><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;a&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Task that reads new updates from `observable`.</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> task</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> executor</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">spawn</span><span>(</span><span class="z-keyword">async move</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> while</span><span class="z-storage"> let</span><span class="z-entity z-name"> Some</span><span>(</span><span class="z-variable">new_value</span><span>)</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">new_value</span><span>);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> });</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Now, let&#39;s update `observable`.</span></span> <span class="giallo-l"><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name 
z-function">push_back</span><span>(</span><span class="z-string">&#39;b&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-comment"> // Eh `executor`: `task` can run now!</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> yield_now</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // More updates.</span></span> <span class="giallo-l"><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;c&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;d&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-comment"> // Eh `executor`, same.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> yield_now</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> drop</span><span>(</span><span class="z-variable">observable</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> task</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Time to show off:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:18:13] new_value = PushBack {</span></span> <span class="giallo-l"><span> 
value: &#39;a&#39;,</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"><span>[src/main.rs:18:13] new_value = PushBack {</span></span> <span class="giallo-l"><span> value: &#39;b&#39;,</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"><span>[src/main.rs:18:13] new_value = PushBack {</span></span> <span class="giallo-l"><span> value: &#39;c&#39;,</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"><span>[src/main.rs:18:13] new_value = PushBack {</span></span> <span class="giallo-l"><span> value: &#39;d&#39;,</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Do you see something new?</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Hmm, indeed. With <code>Observable</code>, some values may be “missed” because <code>Observable</code> and <code>Subscriber</code> have no buffer. The subscribers only return the current value when asked for it. However, with <code>ObservableVector</code>, things are different: no missing values. They are all here. As if there… was a buffer!</p> <p>And the values returned by the subscriber are not the raw <code>T</code>: we see <code>PushBack</code>. It comes from, <i>check the documentation</i>, <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/enum.VectorDiff.html"><code>VectorDiff::PushBack</code></a>!</p> </div> </div> <p>Good eyes, well done.</p> <p>First off, it's correct that <code>PushBack</code> comes from <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/enum.VectorDiff.html"><code>VectorDiff</code></a>. 
Let's come back to this piece in a second: it is the cornerstone of the entire series, it deserves a bit of explanation.</p> <p>Second, yes, <code>VectorSubscriber</code> returns <strong>all values</strong>! There is actually a buffer. It's a bit annoying to continue with a <code>task</code> as we did so far, so let's use <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/macro.assert_eq.html"><code>assert_eq!</code></a> instead.</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> eyeball_im</span><span class="z-keyword z-operator">::</span><span>{</span><span class="z-entity z-name">ObservableVector</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span>};</span></span> <span class="giallo-l"><span class="z-comment">// ^^^^^^^^^^ new!</span></span> <span class="giallo-l"><span class="z-comment">// …</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[apply(main</span><span class="z-keyword z-operator">!</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">async fn</span><span class="z-entity z-name z-function"> main</span><span>(</span><span class="z-variable">_executor</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">Executor</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> observable</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> ObservableVector</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>();</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator"> =</span><span 
class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">subscribe</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">into_stream</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Push one value.</span></span> <span class="giallo-l"><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;a&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> assert_eq!</span><span>(</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>),</span></span> <span class="giallo-l"><span class="z-entity z-name"> Some</span><span>(</span><span class="z-entity z-name">VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;a&#39;</span><span> }),</span></span> <span class="giallo-l"><span> );</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Push another value.</span></span> <span class="giallo-l"><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;b&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> observable</span><span class="z-keyword 
z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;c&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;d&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> assert_eq!</span><span>(</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>),</span></span> <span class="giallo-l"><span class="z-entity z-name"> Some</span><span>(</span><span class="z-entity z-name">VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;b&#39;</span><span> }),</span></span> <span class="giallo-l"><span> );</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> assert_eq!</span><span>(</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>),</span></span> <span class="giallo-l"><span class="z-entity z-name"> Some</span><span>(</span><span class="z-entity z-name">VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span 
class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;c&#39;</span><span> }),</span></span> <span class="giallo-l"><span> );</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> assert_eq!</span><span>(</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> dbg!</span><span>(</span><span class="z-variable">subscriber</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>),</span></span> <span class="giallo-l"><span class="z-entity z-name"> Some</span><span>(</span><span class="z-entity z-name">VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;d&#39;</span><span> }),</span></span> <span class="giallo-l"><span> );</span></span> <span class="giallo-l"><span>}</span></span></code></pre><pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:16:9] subscriber.next().await = Some(</span></span> <span class="giallo-l"><span> PushBack {</span></span> <span class="giallo-l"><span> value: &#39;a&#39;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>)</span></span> <span class="giallo-l"><span>[src/main.rs:26:9] subscriber.next().await = Some(</span></span> <span class="giallo-l"><span> PushBack {</span></span> <span class="giallo-l"><span> value: &#39;b&#39;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>)</span></span> <span class="giallo-l"><span>[src/main.rs:30:9] subscriber.next().await = Some(</span></span> <span 
class="giallo-l"><span> PushBack {</span></span> <span class="giallo-l"><span> value: &#39;c&#39;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>)</span></span> <span class="giallo-l"><span>[src/main.rs:34:9] subscriber.next().await = Some(</span></span> <span class="giallo-l"><span> PushBack {</span></span> <span class="giallo-l"><span> value: &#39;d&#39;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>)</span></span></code></pre> <p>Beautiful! However… the code is a bit verbose, isn't it? <i>Desperately waiting for an affirmative answer</i>, okay, okay, something you may not know about me: I love macros. There. I said it. Let's quickly craft one:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"><span class="z-comment">// before the `main` function</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">macro_rules! 
assert_next_eq</span><span> {</span></span> <span class="giallo-l"><span> (</span><span class="z-keyword z-operator"> $</span><span class="z-variable">stream</span><span class="z-keyword z-operator">:</span><span class="z-variable">ident</span><span>,</span><span class="z-keyword z-operator"> $</span><span class="z-variable">expr</span><span class="z-keyword z-operator">:</span><span class="z-variable">expr</span><span class="z-keyword z-operator"> $</span><span>(,)</span><span class="z-keyword z-operator">?</span><span> )</span><span class="z-keyword z-operator"> =&gt;</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> assert_eq!</span><span>(</span><span class="z-entity z-name z-function">dbg!</span><span>(</span><span class="z-keyword z-operator"> $</span><span class="z-variable">stream</span><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">next</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-keyword">await</span><span>),</span><span class="z-entity z-name"> Some</span><span>(</span><span class="z-keyword z-operator"> $</span><span class="z-variable">expr</span><span> ));</span></span> <span class="giallo-l"><span> };</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>This macro does exactly what our <code>assert_eq!</code> was doing, except now it's shorter to use, and thus more pleasant. Don't believe me? 
See for yourself:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"><span class="z-comment">// at the end of the `main` function</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Push one value.</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;a&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span><span class="z-variable">subscriber</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;a&#39;</span><span> });</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Push more values.</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;b&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;c&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;d&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span 
class="z-entity z-name z-function">assert_next_eq!</span><span>(</span><span class="z-variable">subscriber</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;b&#39;</span><span> });</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span><span class="z-variable">subscriber</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;c&#39;</span><span> });</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span><span class="z-variable">subscriber</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;d&#39;</span><span> });</span></span></code></pre> <p>There we go.</p> <p>Having a scientific and rigorous approach is important in our domain. We said <code>ObservableVector</code> seems to contain a buffer, and <code>VectorSubscriber</code> seems to pop values from this buffer. Let's play with that. 
I see two things to test:</p> <ol> <li>Modify the <code>ObservableVector</code>, and subscribe to it <em>after</em>: Does the subscriber receive the update made before it was created?</li> <li>How many values can the buffer hold?</li> </ol> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-storage">let mut</span><span class="z-variable"> observable</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> ObservableVector</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Push a value before the subscriber exists.</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;a&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">let mut</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">subscribe</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">into_stream</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Push another value.</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;b&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span><span 
class="z-variable">subscriber</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;b&#39;</span><span> });</span></span></code></pre> <p>If the <code>subscriber</code> receives <code>a</code>, the assertion must fail; otherwise, there is no error:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:25:5] subscriber.next().await = Some(</span></span> <span class="giallo-l"><span> PushBack {</span></span> <span class="giallo-l"><span> value: &#39;b&#39;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>)</span></span></code></pre> <p>Look Ma', no error!</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>We have learned that a <code>VectorSubscriber</code> is aware of the updates that are made once it exists. A <code>VectorSubscriber</code> is not aware of updates that happened before its creation.</p> <p>In the example, <code>VectorDiff::PushBack { value: 'a' }</code> is not received because it happened before <code>subscriber</code> was created. However, <code>VectorDiff::PushBack { value: 'b' }</code> is received because it happened after <code>subscriber</code> was created. It makes perfect sense.</p> <p>It suggests that the buffer lives inside <code>VectorSubscriber</code>, and not inside <code>ObservableVector</code>. 
Or maybe the buffer is shared between the observable and the subscribers, with the buffer having some specific semantics, like a <em>channel</em>. We would need to look at the implementation to be sure.</p> </div> </div> <p>Agree. This is left as an exercise for the reader, <i>wink to you</i>.</p> <p>We have an answer to question 1. What about question 2? The size of the buffer.</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">let mut</span><span class="z-variable"> observable</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> ObservableVector</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>();</span></span> <span class="giallo-l"><span class="z-storage">let mut</span><span class="z-variable"> subscriber</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">subscribe</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">into_stream</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Push ALL THE VALUES!</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;a&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;b&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword 
z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;c&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;d&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;e&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;f&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;g&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;h&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;i&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;j&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name 
z-function">push_back</span><span>(</span><span class="z-string">&#39;k&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;l&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;m&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;n&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;o&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;p&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span><span class="z-variable">subscriber</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;a&#39;</span><span> });</span></span> <span class="giallo-l"><span class="z-comment">// no need to assert the others</span></span></code></pre><pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span 
class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:36:5] subscriber.next().await = Some(</span></span> <span class="giallo-l"><span> PushBack {</span></span> <span class="giallo-l"><span> value: &#39;a&#39;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>)</span></span></code></pre> <p>Hmm, the buffer doesn't seem to be full with 16 values. Let's add a couple more:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-comment">// in `src/main.rs`</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// [ … snip … ]</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;n&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;o&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;p&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-string">&#39;q&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-comment">// ^ new!</span></span> <span class="giallo-l"><span class="z-variable">observable</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span 
class="z-string">&#39;r&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-comment">// ^ new!</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_next_eq!</span><span>(</span><span class="z-variable">subscriber</span><span>,</span><span class="z-entity z-name"> VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-string"> &#39;a&#39;</span><span> });</span></span></code></pre><pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:38:5] subscriber.next().await = Some(</span></span> <span class="giallo-l"><span> Reset {</span></span> <span class="giallo-l"><span> values: [</span></span> <span class="giallo-l"><span> &#39;a&#39;,</span></span> <span class="giallo-l"><span> &#39;b&#39;,</span></span> <span class="giallo-l"><span> &#39;c&#39;,</span></span> <span class="giallo-l"><span> &#39;d&#39;,</span></span> <span class="giallo-l"><span> &#39;e&#39;,</span></span> <span class="giallo-l"><span> &#39;f&#39;,</span></span> <span class="giallo-l"><span> &#39;g&#39;,</span></span> <span class="giallo-l"><span> &#39;h&#39;,</span></span> <span class="giallo-l"><span> &#39;i&#39;,</span></span> <span class="giallo-l"><span> &#39;j&#39;,</span></span> <span class="giallo-l"><span> &#39;k&#39;,</span></span> <span class="giallo-l"><span> &#39;l&#39;,</span></span> <span class="giallo-l"><span> &#39;m&#39;,</span></span> <span class="giallo-l"><span> &#39;n&#39;,</span></span> <span class="giallo-l"><span> &#39;o&#39;,</span></span> <span class="giallo-l"><span> &#39;p&#39;,</span></span> <span class="giallo-l"><span> &#39;q&#39;,</span></span> <span class="giallo-l"><span> 
&#39;r&#39;,</span></span> <span class="giallo-l"><span> ],</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>)</span></span> <span class="giallo-l"><span>thread &#39;main&#39; panicked at src/main.rs:38:5:</span></span> <span class="giallo-l"><span>assertion `left == right` failed</span></span> <span class="giallo-l"><span> left: Some(Reset { values: [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;, &#39;f&#39;, &#39;g&#39;, &#39;h&#39;, &#39;i&#39;, &#39;j&#39;, &#39;k&#39;, &#39;l&#39;, &#39;m&#39;, &#39;n&#39;, &#39;o&#39;, &#39;p&#39;, &#39;q&#39;, &#39;r&#39;] })</span></span> <span class="giallo-l"><span> right: Some(PushBack { value: &#39;a&#39; })</span></span> <span class="giallo-l"><span>note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace</span></span></code></pre> <p>Oh! An error, great! Our <code>assert_next_eq!</code> has failed. <code>subscriber</code> does not receive a <code>VectorDiff::PushBack</code> but a <code>VectorDiff::Reset</code>. Let's play with <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/struct.ObservableVector.html#method.with_capacity"><code>ObservableVector::with_capacity</code></a> for a moment: maybe it's related to the buffer capacity? 
Let's change a single line:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-storage">let mut</span><span class="z-variable"> observable</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> ObservableVector</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">with_capacity</span><span>(</span><span class="z-constant z-numeric">32</span><span>);</span></span> <span class="giallo-l"><span class="z-comment">// ^^^^^^^^^^^^^^^^^ new!</span></span></code></pre><pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cargo run --quiet</span></span> <span class="giallo-l"><span>[src/main.rs:38:5] subscriber.next().await = Some(</span></span> <span class="giallo-l"><span> PushBack {</span></span> <span class="giallo-l"><span> value: &#39;a&#39;,</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span>)</span></span></code></pre><div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>We have learned that <code>ObservableVector::with_capacity</code> controls the size of the buffer.</p> <p>The name could suggest that it controls the capacity of the observed <code>Vector</code>, <em>à la</em> <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity"><code>Vec::with_capacity</code></a>, but the two must not be confused.</p> <p>For a reason we don't know yet, when the buffer is full, we receive a <code>VectorDiff::Reset</code>. 
We need to learn more about this type.</p> </div> </div> <h2 id="observable-differences">Observable differences<a role="presentation" class="anchor" href="#observable-differences" title="Anchor link to this header">#</a> </h2> <p>The previous section explained how immutable data structures could save us by cheaply and efficiently cloning the data between the observable and its subscribers. However, we see that <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im"><code>eyeball-im</code></a>, despite using <a rel="noopener external" target="_blank" href="https://docs.rs/imbl"><code>imbl</code></a>, does not share an <a rel="noopener external" target="_blank" href="https://docs.rs/imbl/3.0.0/imbl/struct.Vector.html"><code>imbl::Vector</code></a> but an <a rel="noopener external" target="_blank" href="https://docs.rs/eyeball-im/0.5.0/eyeball_im/enum.VectorDiff.html"><code>eyeball_im::VectorDiff</code></a>. Why such a design? It looks like a drama. A betrayal. An act of treachery!</p> <p>Well. Firstly, <code>eyeball-im</code> still relies on some immutable properties of <code>Vector</code>. And secondly, the reason why <code>VectorDiff</code> exists is simple. If a subscriber receives <code>Vector</code>s, how is the user able to see what has changed? The user (!) would be responsible for <em>calculating</em> the differences between two <code>Vector</code>s every time! 
Not only is this costly, but it is also utterly error-prone.</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" /> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>Are you suggesting that <code>VectorSubscriber</code> (or <code>VectorSubscriberStream</code>) calculates the differences between the <code>Vector</code>s itself so that the user doesn't have to?</p> <p>I still see many problems though. I believe the order of the <code>VectorDiff</code>s matters a lot for some use cases. For example, let's consider two consecutive <code>Vector</code>s:</p> <ol> <li><code>['a', 'b', 'c']</code> and</li> <li><code>['a', 'c', 'b']</code>.</li> </ol> <p>Has <code>'b'</code> been removed and pushed back, or <code>'c'</code> been popped back and inserted? How can you decide between the two?</p> </div> </div> <p>We can't (it would be implementation-specific anyway), and we don't want to. The user is manipulating the <code>ObservableVector</code> in a specific way, and ideally we should not change that.</p> <p>These <code>VectorDiff</code>s actually come from <code>ObservableVector</code> itself! 
Let's look at the implementation of <a rel="noopener external" target="_blank" href="https://github.com/jplatte/eyeball/blob/4254403e385715380753bb0def20fb0398e91ebd/eyeball-im/src/vector.rs#L107-L114"><code>ObservableVector::push_back</code></a>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub fn</span><span class="z-entity z-name z-function"> push_back</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-storage">mut</span><span class="z-variable z-language"> self</span><span>,</span><span class="z-variable"> value</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> T</span><span>) {</span></span> <span class="giallo-l"><span class="z-comment"> // [ … snip … ]</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span>values</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push_back</span><span>(</span><span class="z-variable">value</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">clone</span><span>());</span></span> <span class="giallo-l"><span class="z-comment"> // ^^^^^^ this is a `Vector`!</span></span> <span class="giallo-l"><span class="z-variable z-language"> self</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">broadcast_diff</span><span>(</span><span class="z-entity z-name">VectorDiff</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PushBack</span><span> {</span><span class="z-variable"> value</span><span> });</span></span> <span class="giallo-l"><span class="z-comment"> // ^^^^^^^^^^ here you are…</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Each method adding or removing values on the <code>ObservableVector</code> emits its own <code>VectorDiff</code> 
variant. No calculation, it's purely a mapping:</p> <figure> <table><thead><tr><th><code>ObservableVector::…</code></th><th><code>VectorDiff::…</code></th><th>Meaning</th></tr></thead><tbody> <tr><td><code>append(values)</code></td><td><code>Append { values }</code></td><td>Append many <code>values</code></td></tr> <tr><td><code>clear()</code></td><td><code>Clear</code></td><td>Clear out all the values</td></tr> <tr><td><code>insert(index, value)</code></td><td><code>Insert { index, value }</code></td><td>Insert a <code>value</code> at <code>index</code> </td></tr> <tr><td><code>pop_back()</code></td><td><code>PopBack</code></td><td>Remove the value at the back</td></tr> <tr><td><code>pop_front()</code></td><td><code>PopFront</code></td><td>Remove the value at the front</td></tr> <tr><td><code>push_back(value)</code></td><td><code>PushBack { value }</code></td><td>Add <code>value</code> at the back</td></tr> <tr><td><code>push_front(value)</code></td><td><code>PushFront { value }</code></td><td>Add <code>value</code> at the front</td></tr> <tr><td><code>remove(index)</code></td><td><code>Remove { index }</code></td><td>Remove value at <code>index</code></td></tr> <tr><td><code>set(index, value)</code></td><td><code>Set { index, value }</code></td><td>Replace value at <code>index</code> by <code>value</code></td></tr> <tr><td><code>truncate(length)</code></td><td><code>Truncate { length }</code></td><td>Truncate to <code>length</code> values</td></tr> </tbody></table> <figcaption> <p>Mappings of <code>ObservableVector</code> methods to <code>VectorDiff</code> variants.</p> </figcaption> </figure> <p>See, for each <code>VectorDiff</code> variant, there is an <code>ObservableVector</code> method triggering it.</p> <div class="conversation" data-character="comte"> <div class="conversation--character"> <span lang="fr">Le Comte</span> <picture role="presentation"> <source srcset="/image/comte.avif" type="image/avif" /> <source srcset="/image/comte.webp" type="image/webp" 
/> <img src="/image/comte.png" loading="lazy" decoding="async" /> </picture> </div> <div class="conversation--message"> <p>And what about <code>VectorDiff::Reset</code>?</p> <p>Apparently, we received it when the buffer was full. You did not mention it, and if I take a close look at <code>ObservableVector</code>'s documentation, I don't see any <code>reset</code> method. Is it only an internal thing?</p> </div> </div> <p>You are correct. When the buffer is full, the subscriber yields a <code>VectorDiff::Reset { values }</code> where <code>values</code> is the full list of values. The documentation says:</p> <blockquote> <p>The subscriber lagged too far behind, and the next update that should have been received has already been discarded from the internal buffer.</p> </blockquote> <p>If the subscriber didn't catch all the updates, the best thing it can do is to say: <q>Okay, I am late to the party, I've missed several things, so here is the current state!</q>. This is not ideal, but the subscriber is responsible for not lagging, and this design avoids missing values. If a subscriber receives too many <code>VectorDiff::Reset</code>s, the user may consider increasing the capacity of the <code>ObservableVector</code>.</p> <h2 id="filtering-and-sorting-with-higher-order-streams">Filtering and sorting with higher-order <code>Stream</code>s<a role="presentation" class="anchor" href="#filtering-and-sorting-with-higher-order-streams" title="Anchor link to this header">#</a> </h2> <p>We are reaching the end of this episode. And you know what? We have all the pieces in place to talk about higher-order <code>Stream</code>s, <i>sing victory and dance at the same time</i>!</p> <p>At the beginning of this episode, we said that the Matrix Rust SDK is able to filter and to sort an <code>ObservableVector</code> representing all the rooms. How? <code>VectorSubscriberStream</code> <em>is</em> a <code>Stream</code>. 
More specifically, it is a <code>Stream&lt;Item = VectorDiff&lt;T&gt;&gt;</code>. Now, some questions:</p> <ul> <li>What's the difference between an unfiltered <code>Vector</code> and a filtered <code>Vector</code>?</li> <li>What's the difference between an unsorted <code>Vector</code> and a sorted <code>Vector</code>?</li> <li>What's the difference between a filtered <code>Vector</code> and a sorted <code>Vector</code>?</li> <li>and so on.</li> </ul> <p>All of them are strictly <code>Stream&lt;Item = VectorDiff&lt;T&gt;&gt;</code>! However, the <code>VectorDiff</code>s aren't the same. Here is a simple example. Let's say we subscribe to a vector, then insert <code>1</code>, <code>2</code>, <code>3</code> and <code>4</code> into it, and we want to filter out all the even numbers. Instead of receiving:</p> <ul> <li><code>VectorDiff::Insert { index: 0, value: 1 }</code>,</li> <li><code>VectorDiff::Insert { index: 1, value: 2 }</code>,</li> <li><code>VectorDiff::Insert { index: 2, value: 3 }</code>,</li> <li><code>VectorDiff::Insert { index: 3, value: 4 }</code>.</li> </ul> <p>… we want to receive:</p> <ul> <li><code>VectorDiff::Insert { index: 0, value: 1 }</code>,</li> <li><code>VectorDiff::Insert { index: 1, value: 3 }</code>: note the <code>index</code>, it is not 2 but 1!</li> </ul> <p>We will see how all that works in the next episodes and how powerful this design is, especially when it comes to cross-platform UI (user interface). We are going to learn so much about <code>Stream</code> and <code>Future</code>, it's going to be fun!</p> <section class="footnotes"> <ol class="footnotes-list"> <li id="fn-spes_salutis"> <p>Latin expression meaning <em>hope of salvation</em>. <a href="#fr-spes_salutis-1">↩</a></p> </li> <li id="fn-beati_pauperes_in_spiritu"> <p>Latin expression meaning <em>blessed are the poor in spirit</em>. 
<a href="#fr-beati_pauperes_in_spiritu-1">↩</a></p> </li> <li id="fn-SRUB2015"> <p><cite><a href="https://infoscience.epfl.ch/server/api/core/bitstreams/7c8b929f-1f68-4948-8ea8-e364e4899b2a/content">Relaxed-Radix-Balanced (RRB) Vector: A Practical General Purpose Immutable Sequence</a></cite> by Stucki N., Rompf T., Ureche V. and Bagwell P. (2015, August), in <i>Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (pp. 342-354).</i> <a href="#fr-SRUB2015-1">↩</a></p> </li> <li id="fn-UCR2014"> <p><cite><a href="http://deepsea.inria.fr/pasl/chunkedseq.pdf">Theory and Practice of Chunked Sequences</a></cite> by Acar U. A., Charguéraud A., and Rainey M. (2014), in <i>Algorithms-ESA 2014: 22nd Annual European Symposium, Wroclaw, Poland, September 8-10, 2014. Proceedings 21 (pp. 25-36)</i>, Springer Berlin Heidelberg. <a href="#fr-UCR2014-1">↩</a></p> </li> </ol> </section> I've loved Wasmer, I still love Wasmer 2021-10-04T00:00:00+00:00 2021-10-04T00:00:00+00:00 Unknown https://mnt.io/articles/i-leave-wasmer/ <p>This article could also have been titled <em>How I failed to change Wasmer</em>.</p> <p>Today is my last day at <a rel="noopener external" target="_blank" href="https://wasmer.io/">Wasmer</a>. For those who don't know this name, it has a twofold meaning: it's a <a rel="noopener external" target="_blank" href="http://github.com/wasmerio/wasmer">very popular WebAssembly runtime</a>, as well as a startup. I want to write about what I've been able to accomplish during my time at Wasmer (a high-level overview, not a technical view), and what <em>forces</em> me to leave the company despite being one of its co-founders. I reckon my testimony can help other people avoid going through the hell I (and my colleagues) had to endure. 
I'm available for work; you can contact me at <a href="mailto:[email protected]">[email protected]</a>, <a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io">@mnt_io</a>, <a rel="noopener external" target="_blank" href="https://www.linkedin.com/in/ivan-enderlin/">ivan-enderlin</a> (LinkedIn).</p> <h2 id="from-nothing-to-pure-awesomeness">From nothing to pure awesomeness<a role="presentation" class="anchor" href="#from-nothing-to-pure-awesomeness" title="Anchor link to this header">#</a> </h2> <p>I joined the Wasmer company in its very early days, in March 2019. The company was 3 months old. My initial role was to write and to improve the runtime itself, and to create many embeddings, i.e. ways to integrate the Wasmer runtime inside various technologies, so that WebAssembly can run everywhere.</p> <p>I can say with confidence that my work is a success. I've learned a lot, I've worked on so many different projects and technologies, hacked so many things, and collaborated with so many people; every action was driven by <strong>passion</strong>.</p> <p>At the time of writing, Wasmer is growing incredibly fast. In only 2.5 years, the runtime has gained more than 10'500 stars on GitHub, and is <strong>one of the most popular WebAssembly runtimes in the world</strong>! 
It's used by many companies, such as <a rel="noopener external" target="_blank" href="https://confio.tech/">Confio</a>, <a rel="noopener external" target="_blank" href="https://fluence.network/">Fluence Labs</a>, <a rel="noopener external" target="_blank" href="https://hotg.dev/">HOT-G</a>, <a rel="noopener external" target="_blank" href="https://brave.com/">Brave</a>, <a rel="noopener external" target="_blank" href="https://google.com/">Google</a>, <a rel="noopener external" target="_blank" href="https://www.apple.com/">Apple</a>, <a rel="noopener external" target="_blank" href="https://spacemesh.io/">SpaceMesh</a>, <a rel="noopener external" target="_blank" href="https://linkerd.io/">Linkerd</a>, <a rel="noopener external" target="_blank" href="https://www.singlestore.com/">SingleStore</a>, <a rel="noopener external" target="_blank" href="https://www.clever-cloud.com/">CleverCloud</a> or <a rel="noopener external" target="_blank" href="https://konghq.com/">Kong</a> to name a few (of those I can name; other companies also use Wasmer in very critical environments).</p> <p>Most of my engineering work happened on the Wasmer runtime itself. At the time of writing, I'm the #2 contributor on the project. I was working on <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer/tree/f9ff574e10d4ee97f836565bdae99035e04ac879/lib">every part of the runtime</a>: the API, the C API, the compilers, the ABI (mostly WASI), the engines, the middlewares, and the VM itself, which is the lowest-level, most fundamental layer of the runtime.</p> <p>The runtime provides so many features. It is an impressively powerful runtime for WebAssembly, and I'm saying that with a neutral and respectful mindset. 
Not everything is perfect, obviously, but I did my best to set up a truly user-friendly learning environment, with extensive documentation and <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer/tree/f9ff574e10d4ee97f836565bdae99035e04ac879/examples">a collection of examples</a> that illustrate many features. I strongly believe it contributed to Wasmer's popularity to a great extent.</p> <p>I would like to highlight the most notable embedding projects I've created:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer/tree/master/lib/c-api"><code>wasmer-c-api</code></a> is the C embedding for Wasmer. It's part of the Wasmer runtime itself, and is fully written in Rust. <a rel="noopener external" target="_blank" href="https://docs.rs/wasmer-c-api/*/wasmer_c_api/wasm_c_api/index.html">The documentation, the C examples</a>, everything is super polished to offer the best experience possible. <a rel="noopener external" target="_blank" href="https://github.com/MarkMcCaskey">Mark McCaskey</a> and I are the authors of this project.</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer-python"><code>wasmer-python</code></a> is the Python embedding for Wasmer. At the time of writing, it's been installed more than 5 million times (I'm counting the compiler packages too, like <code>wasmer-compiler-cranelift</code> and so on), and has 1300 stars on GitHub. There are about 300,000 downloads per month, and it continues to grow! The code is written in Rust, and it relies on <a rel="noopener external" target="_blank" href="https://pyo3.rs/">the awesome <code>pyo3</code> project</a>.</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer-go/"><code>wasmer-go</code></a> is the Go embedding for Wasmer. 
It's hard to know how many total downloads we have because of how the Go ecosystem is designed, but we have about 60,000 downloads per month from GitHub (I'm excluding the forks of the project), and 1600 stars on GitHub. The code is written in Go and uses <a rel="noopener external" target="_blank" href="https://golang.org/cmd/cgo/"><code>cgo</code></a> to bind against the C API. Almost all blockchain projects that use WebAssembly are using <code>wasmer-go</code>, a popularity I wasn't expecting.</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer-ruby/"><code>wasmer-ruby</code></a> is the Ruby embedding for Wasmer. It's not as popular as the others, but it's also very polished and it's finding its place in the Ruby ecosystem. The code is written in Rust, and it relies on <a rel="noopener external" target="_blank" href="https://github.com/danielpclark/rutie">the awesome <code>rutie</code> project</a>.</li> <li>I won't detail all the projects, but there are also <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer-php"><code>wasmer-php</code></a>, <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer-java"><code>wasmer-java</code></a>, <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer-postgres"><code>wasmer-postgres</code></a>… Because of the Wasmer runtime API and C API we have designed, many developers around the globe have been able to create a lot more embeddings, such as in <a rel="noopener external" target="_blank" href="https://github.com/migueldeicaza/WasmerSharp">C#</a>, <a rel="noopener external" target="_blank" href="https://github.com/chances/wasmer-d">D</a>, <a rel="noopener external" target="_blank" href="https://github.com/tessi/wasmex">Elixir</a>, <a rel="noopener external" target="_blank" href="https://github.com/dirkschumacher/wasmr">R</a>, <a rel="noopener external" target="_blank" 
href="https://github.com/AlwaysRightInstitute/SwiftyWasmer">Swift</a>, <a rel="noopener external" target="_blank" href="https://github.com/zigwasm/wasmer-zig">Zig</a>, <a rel="noopener external" target="_blank" href="https://github.com/dart-lang/wasm">Dart</a>, <a rel="noopener external" target="_blank" href="https://github.com/helmutkian/cl-wasm-runtime">Lisp</a> and so on.</li> </ul> <p>Other fun notable projects are:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/sonde-rs"><code>sonde-rs</code></a>, a library to compile USDT probes into a Rust library,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/llvm-custom-builds"><code>llvm-custom-builds</code></a>, a sandbox to produce custom LLVM builds for various platforms,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/loupe"><code>loupe</code></a>, a set of tools to analyse and to profile Rust code,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/interface-types"><code>wasmer-interface-types</code></a>, a “living” (read: an unstable playground) library that implements <a rel="noopener external" target="_blank" href="https://github.com/WebAssembly/interface-types">the WebAssembly Interface Types proposal</a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/Hywan/inline-c-rs/"><code>inline-c-rs</code></a>, to write and to execute C code inside Rust,</li> <li>an in-memory filesystem that acts exactly like <code>std::fs</code>.</li> </ul> <p>As you might think, I've learned so much. The impostor syndrome was very present because I was constantly trying to do something I didn't know how to do. It's part of the routine at Wasmer: trying something for the first time. 
But it's also what kept me motivated, and it was the energy for my passion.</p> <p>The list above shows released projects, but I've also experimented (sometimes within a hair's breadth of a release) with:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Unikernel">Unikernels</a>; this one was really fun: given a WebAssembly module and a filesystem, we were able to generate a unikernel that executed the given program,</li> <li>Parser; an attempt to write the fastest WebAssembly parser possible: it was working, but was never released,</li> <li>HAL (Hardware Abstraction Layer) ABI for WebAssembly, so that we can run WebAssembly on small chips super easily (think of IoT),</li> <li>Networking; an extension to WASI to support networking (TCP and UDP sockets), with an implementation in <a rel="noopener external" target="_blank" href="https://www.rust-lang.org/">Rust</a>, <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/C_standard_library">libc</a>, and even <a rel="noopener external" target="_blank" href="https://github.com/ziglang/zig/">Zig</a>! We were able to compile C programs like cURL, or TCP servers written with kqueue or epoll etc., to WebAssembly, and to execute them on any platform.</li> </ul> <p>All those things were working.</p> <p>It's absolutely crazy what WebAssembly can do today, and I still truly and deeply believe in this technology. I'm not the only one: <a rel="noopener external" target="_blank" href="https://www.ycombinator.com/companies/wasmer">YCombinator</a> and <a rel="noopener external" target="_blank" href="https://medium.com/speedinvest/the-next-generation-of-cloud-computing-investing-in-wasmer-768c9aac5922">SpeedInvest</a> are also funders that believe in Wasmer.</p> <p>So. 
What a dream, huh?</p> <h2 id="the-toxic-working-environment">The toxic working environment<a role="presentation" class="anchor" href="#the-toxic-working-environment" title="Anchor link to this header">#</a> </h2> <p>WebAssembly is <em>nothing</em> without its community. I won't name people, to avoid leaving anyone important out, but all the contributors are doing amazing work to create something new, something special, something <em>right</em>.</p> <p>Wasmer is a success. The Wasmer runtime is <em>nothing</em> without the incredible, marvelous, exceptional team of engineers behind it. In no particular order: <a rel="noopener external" target="_blank" href="https://github.com/lachlansneff">Lachlan Sneff</a>, <a rel="noopener external" target="_blank" href="https://github.com/MarkMcCaskey">Mark McCaskey</a>, <a rel="noopener external" target="_blank" href="https://github.com/jubianchi">Julien Bianchi</a>, <a rel="noopener external" target="_blank" href="https://github.com/nlewycky">Nick Lewycky</a>, <a rel="noopener external" target="_blank" href="https://github.com/losfair">Heyang Zhou</a>, <a rel="noopener external" target="_blank" href="https://github.com/xmclark">Mackenzie Clark</a>, <a rel="noopener external" target="_blank" href="https://github.com/bjfish">Brandon Fish</a>. All of them, with no exception, have put a lot of passion into this project. It is what it is today because of them and also because of the contributors we have been honored to welcome. The open source side of Wasmer was intense but also an important source of joy. It is a respectful place to work.</p> <p>However, the inside reality was very different. All the employees named above have left the company, almost all of them due to a burn-out, conflicts, or strong disagreements with the company leadership. I am leaving due to a severe burn-out. I would like to briefly share my journey to my first burn-out in a few points:</p> <ul> <li>I've started as an engineer. I love coding. I love hacking. 
At Wasmer, I've found a place to learn a lot and to express my passion. We had a lot of pressure, mostly because our friendly “competitors” had more people dedicated to working on their runtimes, more money, more power, better marketing and so on. That's the inevitable burden of any startup. When you're competing against giants, that's what happens. And that's OK. It's part of the game.</li> <li>During that time, we were delivering more and more projects, more and more features, at an incredible pace. New hats: release manager, project manager, more product ownership, more customers to be connected with, more contributors to help, more issues to triage, blog writer etc. The pace was accelerating too fast, something we did notice on multiple occasions.</li> <li>The CEO, <a rel="noopener external" target="_blank" href="https://github.com/syrusakbary">Syrus Akbary</a>, evidently had a lot of pressure on his shoulders. It sadly played out in the worst possible way: micro-management, stress, pressure, bad leadership, lack of vision, lack of trust in the employees, changing the roadmap constantly, lies, secrets etc.</li> <li>As one of the oldest in the company, with a family of two kids, I probably had more “wisdom”. I decided to create a safe place for employees to express their frustrations and their needs, and to find solutions together. <em>De facto</em>, I became the “person of trust” in the company. I got new hats, new pressures.</li> <li>SARS-CoV-2 hit. School at home. Lock-down. More micro-management, more stress. Wasmer was running out of money. I brought in a new investor that saved the company. New hat.</li> <li>After too many departures (85% of the engineering team!), I tried to take more space and to take on more responsibilities in the company. That was at the beginning of 2021. It was my last attempt to save the company from a disaster before leaving. 
<strong>I couldn't imagine leaving such brilliant and successful projects without having tried everything I could</strong>.</li> <li>Then <strong>I became a <em>late co-founder</em> of Wasmer</strong>. Too many new hats: doing hiring interviews, accounting, helping to define the roadmap (with another awesome person, friend, and employee), handling legal aspects to hire people in multiple countries with non-precarious contracts etc.</li> <li>Obviously, I was also doing the job of all the engineers who had left. They were not replaced, for unknown reasons. It was absolute madness. The pace was unsustainable.</li> <li>Finally, the crack. The CEO continued to change the roadmap, to make bad decisions, to not recognize all the efforts we were making to save/grow the company. It was my turn to be diagnosed with a <em>severe</em> burn-out by my physician. The last engineer to fall.</li> </ul> <p>Another point: Syrus Akbary has also made many public errors that have created hostility against the company. Fortunately, people were smart and kind enough to tell the difference between the employees and the CEO (I won't name the people but they will recognize themselves: thank you). I tried to fix that situation, talking with dozens of people to restore empathy and forgiveness, to create better collaboration, to heal and move forward. It was exhausting. I know people have appreciated what I did, but my mental health was ruined.</p> <p>Considering all the time I've devoted to the company, the very little consideration I got in return, the countless work hours (4 days per week, but frequently closing the computer at 1am due to very late meetings, I was working like hell), the precarious contract I had (did you ever see a co-founder with a freelancer contract?), the toxic working environment, the constant pressure etc., my passion was intact but my motivation was seriously damaged. 
Overtime was never recorded and happened more than frequently, but taking half a day to take care of a sick child was immediately counted as holidays; the balance was broken. Criticisms. Micro-management. Disapprovals. Rewriting the facts and the reality to criticize what you're doing, flipping things against you, avoiding discussion when things get stormy. We even had a meeting titled “Why you're not productive enough” whilst everyone was working like hell, right after the rewrite of the entire runtime to release Wasmer 1.0, a period we all affectionately called “The Rewrite of Hell”. The team deserved vacations, congratulations, attention, gratitude, … not such a shitty meeting. Well, you get the idea.</p> <p>When I was diagnosed with a severe burn-out, I had to take a break. The reaction from the CEO was… unexpected: zero empathy, asking me to never ever be sick again (otherwise I would be fired), dividing my equity, asking me to work more, saying I've never been involved in the company etc. That was the final straw for me. That's <em>the</em> wrong way to treat an employee, a collaborator, a contributor, the co-founder.</p> <h2 id="what-s-next">What's next?<a role="presentation" class="anchor" href="#what-s-next" title="Anchor link to this header">#</a> </h2> <p>I need to recover. As you can imagine, working 2.5 years at this pace leaves sequelae. Hopefully a couple of months should suffice.</p> <p>I'm still in love with Wasmer, the <strong>runtime</strong>, the open source projects we have created. It has a bright future. More companies are contributing to it, more individual contributors are bringing their stones to the monument. The project is owned by the public, by the users, by the contributors; they are doing most of the work today. It's well tested, well documented, and easy to contribute to. It's a fabulous open source piece of technology.</p> <p>I strongly <em>hope</em> Wasmer, the <strong>company</strong>, will change. 
The products that are going to be released are absolutely fantastic. It's a technology shift. It will enable the Decentralized Web, and change how we compile and distribute programs, and even how we consume them. Wasmer has a solid, bright future. I really <em>hope</em> things will change, and I wish the best to the adventurers who will continue to move the company forward; I'm passing the torch on to them. I'm just too <em>skeptical</em> that things can improve or even slightly change. We have built something great. Please take great care of it.</p> <p>As I said, I'm available for a new adventure; you can contact me at <a href="mailto:[email protected]">[email protected]</a>, <a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io">@mnt_io</a>, <a rel="noopener external" target="_blank" href="https://www.linkedin.com/in/ivan-enderlin/">ivan-enderlin</a> (LinkedIn).</p> <p>Discussions <a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io/status/1445310721185783811">on Twitter</a> and <a rel="noopener external" target="_blank" href="https://news.ycombinator.com/item?id=28772863">on HackerNews</a>.</p> Bye bye WhatsApp, hello ? 2021-01-19T00:00:00+00:00 2021-01-19T00:00:00+00:00 Unknown https://mnt.io/articles/bye-bye-whatsapp-hello/ <p>This post was originally written in French, mainly for my friends and family. It quickly explains, in plain terms, what is at stake with <a rel="noopener external" target="_blank" href="https://www.whatsapp.com/">WhatsApp</a>, <a rel="noopener external" target="_blank" href="https://www.signal.org/fr/">Signal</a>, <a rel="noopener external" target="_blank" href="https://telegram.org/">Telegram</a> and <a rel="noopener external" target="_blank" href="https://element.io/">Matrix</a> (<em>spoiler</em>: it's the winner). 
Everyone asks me the same question, so here is a quick, rough answer that will save me some copy-pasting!</p> <p>I am deliberately <em>not</em> going into the details. This document has to stay accessible to everyone, with no knowledge of networking, encryption, security etc. Those who have that knowledge already know that Matrix is <em>the</em> network to move to ;-).</p> <h2 id="les-bases">The basics<a role="presentation" class="anchor" href="#les-bases" title="Anchor link to this header">#</a> </h2> <p>When we talk about messaging services, 2 things are essential, plus 1 bonus:</p> <ul> <li>encryption;</li> <li>the network topology (it's easy, don't be afraid);</li> <li>free and unrestricted access to the source code (<em>open source</em>).</li> </ul> <p>We can also briefly talk about the network's business model; see the comparison table.</p> <h3 id="le-chiffrement">Encryption<a role="presentation" class="anchor" href="#le-chiffrement" title="Anchor link to this header">#</a> </h3> <p>To respect privacy and to prevent spying and data theft, encryption must happen end to end (<em>end-to-end encryption</em>). That means only you have the key to encrypt and/or decrypt your messages, and nobody else. By message, I mean text, audio, image, video, audio/video calls, everything. Your data belongs to you, and only you, and nobody can use it except the person you share it with (who normally has a decryption key, for example). 
The keys are also used to identify the person you are talking to, which helps prevent identity theft.</p> <h3 id="la-topologie">Topology<a role="presentation" class="anchor" href="#la-topologie" title="Anchor link to this header">#</a> </h3> <p>Most networks are centralized: that means there is one big silo, a huge computer/server, and everybody is on it. This causes plenty of problems:</p> <ul> <li>impossible to have any control over it;</li> <li>impossible to trust it;</li> <li>single point of failure.</li> </ul> <p>I'll take WhatsApp as an example to illustrate all of this because it speaks to everyone: Facebook unilaterally decides to decrypt the network, nobody has any control over it, it's a serious violation of the privacy of billions of people and we can't do anything about it (apart from leaving the network). Did we trust what Facebook was doing with our WhatsApp data before? No, not at all. They said it was encrypted, but was it really? I place more trust in those who created and encrypted the network before it was bought by Facebook, so I want to believe it, but… <em>I can't verify it</em>! Why? Because nobody (apart from a few Facebook employees) has access to the source code, the programs, that run WhatsApp. As for the <em>single point of failure</em> side: if Facebook is attacked, the entire network collapses. Same thing if the network is <em>hacked</em>: it means unlimited access to the whole network.</p> <blockquote> <p>No transparency = no trust.</p> </blockquote> <p>But there is a major alternative, of course! Decentralized networks. Instead of one server, there are hundreds, thousands of them. Centralized control is no longer possible. There is no <em>single point of failure</em> anymore. 
At worst, a <em>hacker</em> can only access the data of a single server, not of all servers (there are plenty of exceptions, but I'm simplifying here). We can create as many servers as we want. These are often open source networks, so we can read the programs' code and verify that they really do what they claim to do.</p> <h2 id="tableau-comparatif">Comparison table<a role="presentation" class="anchor" href="#tableau-comparatif" title="Anchor link to this header">#</a> </h2> <p>Let's compare the popular services against these basic criteria.</p> <figure> <table> <tbody> <tr> <td><strong>Service</strong></td> <td><strong>Encryption</strong></td> <td><strong>Topology</strong></td> <td><strong>Open source</strong></td> </tr> <tr> <td><strong>WhatsApp</strong></td> <td>end-to-end (for now)</td> <td>centralized (US)</td> <td>no</td> </tr> <tr> <td><strong>Telegram</strong></td> <td>end-to-end (for now)</td> <td>centralized (Dubai, US)</td> <td>no</td> </tr> <tr> <td><strong>Signal</strong></td> <td>end-to-end</td> <td>centralized (US)</td> <td>yes, but…</td> </tr> <tr> <td><strong>Matrix</strong></td> <td>end-to-end</td> <td>decentralized</td> <td>yes</td> </tr> </tbody> </table> <figcaption> <p>Let's compare the basics!</p> </figcaption> </figure> <p>Signal is open source, but we cannot verify what is installed on the servers, because the server is private. Moreover, the open source server has not been <a rel="noopener external" target="_blank" href="https://github.com/signalapp/Signal-Server">updated since April 2020</a>; in computing years, that's a very long time. Is it hiding something? No idea; I can't know, because I have nothing to base a decision on. 
Do I want to hand over my private data to a service I don't trust?</p> <p>On top of that, Signal, like WhatsApp, is hosted/located in the US, with its well-known freedom-restricting laws (like the Cloud Act). Signal limits the damage thanks to end-to-end encryption, but maybe a <em>backdoor</em> is present and we will never know.</p> <blockquote> <p>No transparency = no trust</p> </blockquote> <p>Decentralized networks are superior at every level (no central control, no massive hack etc.). Open source networks are the ones we can trust. So the choice is quickly made: the winner here is Matrix.</p> <p>Now let's compare how the services are funded, because it matters. If a service is not profitable, it could develop an appetite for its users' data, and that's where it gets dangerous (it's exactly what is happening with Facebook and WhatsApp).</p> <figure> <table> <tbody> <tr> <td><strong>Service</strong></td> <td><strong>Revenue</strong></td> </tr> <tr> <td><strong>WhatsApp</strong></td> <td>Facebook wants to use private data to sell targeted advertising.</td> </tr> <tr> <td><strong>Telegram</strong></td> <td>The founders are millionaires and inject money.<br>Soon, funding through ads and premium accounts.</td> </tr> <tr> <td><strong>Signal</strong></td> <td>Non-profit organization that runs on donations.</td> </tr> <tr> <td><strong>Matrix</strong></td> <td>Matrix develops, offers or sells services around the network, but not around the data!</td> </tr> </tbody> </table> <figcaption> <p>How are the services funded?</p> </figcaption> </figure> <p>The winners here are Signal and Matrix.</p> <h2 id="conclusion-matrix-gagnant">Conclusion: Matrix wins<a role="presentation" class="anchor" href="#conclusion-matrix-gagnant" title="Anchor link to this header">#</a> </h2> <p>Among 
centralized networks, Signal is a better alternative to WhatsApp and Telegram because of its funding model (and therefore its appetite for user data), but they all suffer from the same problems: no trust because no transparency, hosted in the US etc.</p> <p>But decentralized networks are superior because they solve all these problems! Matrix is decentralized, and is funded by services around the network but not by the network's data (which is inaccessible anyway: it only exists on your phones and computers, nowhere else).</p> <p>I use Matrix. I advise you to use Matrix. Moving to Signal is getting out of one wolf's jaws to walk into another's. I admire the work of the developers at Signal, they are really good, their encryption protocol is magnificent, but I don't trust their service because I <em>can't</em>. And nobody <em>can</em>.</p> <p>I also use WhatsApp and Signal to stay in touch with my friends and family, and to tell them to use Matrix, but I will never publish personal data, photos or anything else there; I have no trust in them. You are also free to use several networks; after all, we already juggle several of them (email, SMS, WhatsApp, Matrix, Twitter, <a rel="noopener external" target="_blank" href="https://mastodon.social/about">Mastodon</a> etc.), so it's not a problem!</p> <h2 id="premier-pas-avec-matrix">First steps with Matrix<a role="presentation" class="anchor" href="#premier-pas-avec-matrix" title="Anchor link to this header">#</a> </h2> <p>Here we go, a small Matrix tutorial. The network is exceptional, but the official client (<a rel="noopener external" target="_blank" href="https://element.io/">Element</a>) is still a bit “rough” to use compared to Signal or WhatsApp. 
Note that it evolves very, very quickly (I count 616 contributors working on it voluntarily, yet another great strength of open source!).</p> <p>What will bother you the most is this: you can't always identify your contacts by phone number (only if they are registered on an identity server). Why? Because your account has an identifier, like an email address. Mine is <code>@mnt_io:matrix.org</code> (the format is <code>@identifier:server</code>). It's much better for privacy. Besides, it's no different from MSN or any other network of that era; it's really WhatsApp that imposed “discoverability” by phone number. Although very convenient, it's dangerous for privacy.</p> <p>So, let's go and install the client:</p> <ul> <li>on <a rel="noopener external" target="_blank" href="https://apps.apple.com/us/app/element-messenger/id1083446067">iOS, macOS etc</a>.,</li> <li>on <a rel="noopener external" target="_blank" href="https://play.google.com/store/apps/details?id=im.vector.app&amp;hl=en_US&amp;gl=US">Android</a>,</li> <li>on your <a rel="noopener external" target="_blank" href="https://element.io/get-started">desktop or your browser</a>.</li> </ul> <p>Then create an account, and add me. Off you go!</p> <p>Matrix/Element is based on groups. “Direct” chats (1:1) are groups too. You can even join <em>rooms</em> (big groups, communities) with hundreds or even thousands of people in them. It's very flexible.</p> <p>Because it's open source, anyone can write their own client (a program that connects to the network). 
There are alternative clients, like <a rel="noopener external" target="_blank" href="https://nio.chat/">Nio</a> or <a rel="noopener external" target="_blank" href="https://fluffychat.im/en/">FluffyChat</a>, or even <a rel="noopener external" target="_blank" href="https://matrix.org/docs/projects/client/ditto-chat">Ditto Chat</a>. All these clients are still in beta, but it shows a very exciting future for Matrix, with increasingly polished clients!</p> <h3 id="matrix-element-vector-hein">Matrix, Element, Vector, huh?<a role="presentation" class="anchor" href="#matrix-element-vector-hein" title="Anchor link to this header">#</a> </h3> <ul> <li>Element is the name of the company that works on/develops the network, the servers and the client;</li> <li>Matrix is the name of the network;</li> <li>Vector is Element's former name.</li> </ul> <p>People often use Matrix and Element interchangeably; strictly speaking, that's a misnomer.</p> <blockquote> <p>Before: Mark as read.</p> <p>Now: Mark has read.</p> </blockquote> <p>And for pity's sake, leave Facebook…</p> Announcing the first Java library to run WebAssembly: Wasmer JNI 2020-05-13T00:00:00+00:00 2020-05-13T00:00:00+00:00 Unknown https://mnt.io/articles/announcing-the-first-java-library-to-run-webassembly-wasmer-jni/ <p><em>This is a copy of <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/announcing-the-first-java-library-to-run-webassembly-wasmer-jni-89e319d2ac7c">an article I wrote for Wasmer</a>.</em></p> <hr /> <p><a rel="noopener external" target="_blank" href="https://webassembly.org/">WebAssembly</a> is a portable binary format. That means the same file can run anywhere.</p> <blockquote> <p>To uphold this bold statement, each language, platform and system must be able to run WebAssembly, as fast and safely as possible.</p> </blockquote> <p>People who are familiar with Wasmer are used to this kind of announcement! 
Wasmer is written in Rust, and comes with an additional native C API. But you can use it in a lot of other languages. After having announced libraries to use Wasmer, and thus WebAssembly, in:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/php-ext-wasm"><strong>PHP</strong> with the <code>ext/wasm</code> extension</a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/python-ext-wasm"><strong>Python</strong> with the <code>wasmer</code> library</a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/ruby-ext-wasm"><strong>Ruby</strong> with the <code>wasmer</code> library</a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/go-ext-wasm"><strong>Go</strong> with the <code>wasmer</code> library</a> (see <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/announcing-the-fastest-webassembly-runtime-for-go-wasmer-19832d77c050">Announcing the fastest WebAssembly runtime for Go: <code>wasmer</code></a>), and even</li> <li><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/postgres-ext-wasm"><strong>Postgres</strong> with the <code>wasmer</code> library</a> (see <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/announcing-the-first-postgres-extension-to-run-webassembly-561af2cfcb1">Announcing the first Postgres extension to run WebAssembly</a>),</li> <li>and many other contributions in <a rel="noopener external" target="_blank" href="https://github.com/migueldeicaza/WasmerSharp">.NET/C#</a>, <a rel="noopener external" target="_blank" href="https://github.com/dirkschumacher/wasmr">R</a> and <a rel="noopener external" target="_blank" href="https://github.com/tessi/wasmex">Elixir</a>…</li> </ul> <p>…we are jazzed to announce that <strong><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm">Wasmer has now landed in 
Java</a></strong>!</p> <p>Let’s discover the Wasmer JNI library together.</p> <h2 id="installation">Installation<a role="presentation" class="anchor" href="#installation" title="Anchor link to this header">#</a> </h2> <p>The Wasmer JNI (<em>Java Native Interface</em>) library is based on the <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer">Wasmer runtime</a>, which is written in <a rel="noopener external" target="_blank" href="https://www.rust-lang.org/">Rust</a>, and is compiled to a shared library. For your convenience, we produce one JAR (<em>Java Archive</em>) per architecture and platform. For now, the following are supported, consistently tested, and pre-packaged (available in <a rel="noopener external" target="_blank" href="https://bintray.com/wasmer/wasmer-jni/wasmer-jni">Bintray</a> and <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm/releases">GitHub Releases</a>):</p> <ul> <li> <p><code>amd64-darwin</code> for macOS, x86 64 bits,</p> </li> <li> <p><code>amd64-linux</code> for Linux, x86 64 bits,</p> </li> <li> <p><code>amd64-windows</code> for Windows, x86 64 bits.</p> </li> </ul> <p>More architectures and more platforms will be added in the near future. If you need a specific one, <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm/issues/new?assignees=&amp;labels=%F0%9F%8E%89+enhancement&amp;template=---feature-request.md&amp;title=">feel free to ask</a>! In the meantime, it is possible to <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm#development">produce your own JAR for your own platform and architecture</a>.</p> <p>The JAR files are named as follows: <code>wasmer-jni-$(architecture)-$(os)-$(version).jar</code>. 
Thus, to include Wasmer JNI as a dependency of your project (assuming you use <a rel="noopener external" target="_blank" href="http://gradle.org/">Gradle</a>), write for instance:</p> <pre class="giallo z-code"><code data-lang="plain"><span class="giallo-l"><span>dependencies {</span></span> <span class="giallo-l"><span> implementation &quot;org.wasmer:wasmer-jni-amd64-linux:0.2.0&quot;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>JARs are hosted on the Bintray/JCenter repository under the <a rel="noopener external" target="_blank" href="https://bintray.com/wasmer/wasmer-jni/wasmer-jni"><code>wasmer-jni</code></a> project. They are also attached to our <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm/releases">GitHub releases as assets</a>.</p> <h2 id="calling-a-webassembly-function-from-java">Calling a WebAssembly function from Java<a role="presentation" class="anchor" href="#calling-a-webassembly-function-from-java" title="Anchor link to this header">#</a> </h2> <p>As usual, let’s start with a simple Rust program that we will compile to WebAssembly, and then execute from Java.</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> sum</span><span>(</span><span class="z-variable">x</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>,</span><span class="z-variable"> y</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> i32</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> x</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> y</span></span> <span class="giallo-l"><span>}</span></span></code></pre> 
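<p>Before compiling it to WebAssembly, the logic can be sanity-checked natively. The following is only a minimal sketch, assuming a plain Rust binary: it repeats the body of <code>sum</code> without the export attributes, and the <code>main</code> function is a hypothetical addition for illustration, not part of the module shipped with the project.</p>

```rust
// Same body as the exported `sum` function, minus the export attributes
// (an assumption made so the sketch builds as a regular native binary).
pub fn sum(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    // The same arguments the Java program passes to the exported function.
    let result = sum(5, 37);
    assert_eq!(result, 42);

    println!("sum(5, 37) = {}", result);
}
```

<p>Both arguments and the return value are <code>i32</code>, which is why the Java side can read the result as an <code>Integer</code>.</p>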
<p>After compilation to WebAssembly, we get a file like <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm/raw/master/examples/simple.wasm">this one</a>, named <code>simple.wasm</code>.</p> <p>The following Java program executes the <code>sum</code> exported function by passing <code>5</code> and <code>37</code> as arguments:</p> <pre class="giallo z-code"><code data-lang="java"><span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> org</span><span class="z-punctuation z-separator">.</span><span class="z-storage">wasmer</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Instance</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">io</span><span class="z-punctuation z-separator">.</span><span class="z-storage">IOException</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">nio</span><span class="z-punctuation z-separator">.</span><span class="z-storage">file</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Files</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">nio</span><span class="z-punctuation z-separator">.</span><span class="z-storage">file</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Paths</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span 
class="z-storage">class</span><span class="z-entity z-name"> SimpleExample</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span class="z-storage"> public static</span><span class="z-storage z-type z-primitive"> void</span><span class="z-entity z-name z-function"> main</span><span>(</span><span class="z-storage z-type">String</span><span>[]</span><span class="z-variable z-parameter"> args</span><span>)</span><span class="z-storage"> throws</span><span class="z-storage z-type"> IOException</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span class="z-comment"> // Read the WebAssembly bytes.</span></span> <span class="giallo-l"><span class="z-storage z-type z-primitive"> byte</span><span>[]</span><span class="z-variable"> bytes</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> Files</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">readAllBytes</span><span>(</span><span class="z-variable">Paths</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">get</span><span>(</span><span class="z-string">&quot;simple.wasm&quot;</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Instantiate the WebAssembly module.</span></span> <span class="giallo-l"><span class="z-storage z-type"> Instance</span><span class="z-variable"> instance</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-entity z-name z-function"> Instance</span><span>(bytes)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Get the `sum` exported function, call it by passing 5 and 37, and get the result.</span></span> <span class="giallo-l"><span class="z-storage z-type"> 
Integer</span><span class="z-variable"> result</span><span class="z-keyword z-operator"> =</span><span> (Integer)</span><span class="z-variable"> instance</span><span class="z-punctuation z-separator">.</span><span class="z-variable">exports</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">getFunction</span><span>(</span><span class="z-string">&quot;sum&quot;</span><span>)</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">apply</span><span>(</span><span class="z-constant z-numeric">5</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 37</span><span>)[</span><span class="z-constant z-numeric">0</span><span>]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> assert</span><span> result </span><span class="z-keyword z-operator">==</span><span class="z-constant z-numeric"> 42</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> instance</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">close</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>Great! We have successfully executed a Rust program, compiled to WebAssembly, in Java. As you can see, it is pretty straightforward. The API is very similar to the standard JavaScript API, or to the other APIs we have designed for PHP, Python, Go, Ruby etc.</p> <p>The assiduous reader might have noticed the <code>[0]</code> in the <code>.apply(5, 37)[0]</code> pattern. 
A WebAssembly function can return zero to many values, and in this case, we are reading the first one.</p> <blockquote> <p>Note: Java values passed to WebAssembly exported functions are automatically downcast to WebAssembly values. Types are inferred at runtime, and casting is done automatically. Thus, a WebAssembly function acts as any regular Java function.</p> </blockquote> <p>Technically, an exported function is a <em>functional interface</em> as defined by the Java Language Specification (i.e. it is a <a rel="noopener external" target="_blank" href="https://docs.oracle.com/javase/8/docs/api/java/lang/FunctionalInterface.html"><code>FunctionalInterface</code></a>). Thus, it is possible to write the following code where <code>sum</code> is an actual function (of kind <code>org.wasmer.exports.Function</code>):</p> <pre class="giallo z-code"><code data-lang="java"><span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> org</span><span class="z-punctuation z-separator">.</span><span class="z-storage">wasmer</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Instance</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> org</span><span class="z-punctuation z-separator">.</span><span class="z-storage">wasmer</span><span class="z-punctuation z-separator">.</span><span class="z-storage">exports</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Function</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">io</span><span class="z-punctuation z-separator">.</span><span class="z-storage">IOException</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span 
class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">nio</span><span class="z-punctuation z-separator">.</span><span class="z-storage">file</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Files</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">nio</span><span class="z-punctuation z-separator">.</span><span class="z-storage">file</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Paths</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> SimpleExample</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span class="z-storage"> public static</span><span class="z-storage z-type z-primitive"> void</span><span class="z-entity z-name z-function"> main</span><span>(</span><span class="z-storage z-type">String</span><span>[]</span><span class="z-variable z-parameter"> args</span><span>)</span><span class="z-storage"> throws</span><span class="z-storage z-type"> IOException</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span class="z-comment"> // Read the WebAssembly bytes.</span></span> <span class="giallo-l"><span class="z-storage z-type z-primitive"> byte</span><span>[]</span><span class="z-variable"> bytes</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> Files</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">readAllBytes</span><span>(</span><span class="z-variable">Paths</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">get</span><span>(</span><span 
class="z-string">&quot;simple.wasm&quot;</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Instantiate the WebAssembly module.</span></span> <span class="giallo-l"><span class="z-storage z-type"> Instance</span><span class="z-variable"> instance</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-entity z-name z-function"> Instance</span><span>(bytes)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Declare the `sum` function, as a regular Java function.</span></span> <span class="giallo-l"><span class="z-storage z-type"> Function</span><span class="z-variable"> sum</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> instance</span><span class="z-punctuation z-separator">.</span><span class="z-variable">exports</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">getFunction</span><span>(</span><span class="z-string">&quot;sum&quot;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Call `sum`.</span></span> <span class="giallo-l"><span class="z-storage z-type"> Integer</span><span class="z-variable"> result</span><span class="z-keyword z-operator"> =</span><span> (Integer)</span><span class="z-variable"> sum</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">apply</span><span>(</span><span class="z-constant z-numeric">1</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 2</span><span>)[</span><span class="z-constant z-numeric">0</span><span>]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span 
class="z-keyword"> assert</span><span> result </span><span class="z-keyword z-operator">==</span><span class="z-constant z-numeric"> 3</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> instance</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">close</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>But a WebAssembly module not only exports functions, it also exports memory.</p> <h2 id="reading-the-memory">Reading the memory<a role="presentation" class="anchor" href="#reading-the-memory" title="Anchor link to this header">#</a> </h2> <p>A WebAssembly instance has one or more linear memories, a contiguous and byte-addressable range of memory spanning from offset 0 and extending up to a varying memory size, represented by the <code>org.wasmer.Memory</code> class. Let’s see how to use it. 
Consider the following Rust program:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> return_hello</span><span>()</span><span class="z-keyword z-operator"> -&gt; *</span><span class="z-storage">const</span><span class="z-entity z-name"> u8</span><span> {</span></span> <span class="giallo-l"><span class="z-string"> b&quot;Hello, World!</span><span class="z-constant z-character">\0</span><span class="z-string">&quot;</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">as_ptr</span><span>()</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The <code>return_hello</code> function returns a pointer to the statically allocated string. The string exists in the linear memory of the WebAssembly module. It is then possible to read it in Java:</p> <pre class="giallo z-code"><code data-lang="java"><span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> org</span><span class="z-punctuation z-separator">.</span><span class="z-storage">wasmer</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Instance</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> org</span><span class="z-punctuation z-separator">.</span><span class="z-storage">wasmer</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Memory</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">io</span><span class="z-punctuation z-separator">.</span><span 
class="z-storage">IOException</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">nio</span><span class="z-punctuation z-separator">.</span><span class="z-storage">ByteBuffer</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">nio</span><span class="z-punctuation z-separator">.</span><span class="z-storage">file</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Files</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-storage"> java</span><span class="z-punctuation z-separator">.</span><span class="z-storage">nio</span><span class="z-punctuation z-separator">.</span><span class="z-storage">file</span><span class="z-punctuation z-separator">.</span><span class="z-storage">Paths</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> MemoryExample</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span class="z-storage"> public static</span><span class="z-storage z-type z-primitive"> void</span><span class="z-entity z-name z-function"> main</span><span>(</span><span class="z-storage z-type">String</span><span>[]</span><span class="z-variable z-parameter"> args</span><span>)</span><span class="z-storage"> throws</span><span class="z-storage z-type"> IOException</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span class="z-comment"> // Read the WebAssembly bytes.</span></span> <span class="giallo-l"><span 
class="z-storage z-type z-primitive"> byte</span><span>[]</span><span class="z-variable"> bytes</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> Files</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">readAllBytes</span><span>(</span><span class="z-variable">Paths</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">get</span><span>(</span><span class="z-string">&quot;memory.wasm&quot;</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Instantiate the WebAssembly module.</span></span> <span class="giallo-l"><span class="z-storage z-type"> Instance</span><span class="z-variable"> instance</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-entity z-name z-function"> Instance</span><span>(bytes)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Get a pointer to the statically allocated string returned by `return_hello`.</span></span> <span class="giallo-l"><span class="z-storage z-type"> Integer</span><span class="z-variable"> pointer</span><span class="z-keyword z-operator"> =</span><span> (Integer)</span><span class="z-variable"> instance</span><span class="z-punctuation z-separator">.</span><span class="z-variable">exports</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">getFunction</span><span>(</span><span class="z-string">&quot;return_hello&quot;</span><span>)</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">apply</span><span>()[</span><span class="z-constant z-numeric">0</span><span>]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span 
class="z-comment"> // Get the exported memory named `memory`.</span></span> <span class="giallo-l"><span class="z-storage z-type"> Memory</span><span class="z-variable"> memory</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> instance</span><span class="z-punctuation z-separator">.</span><span class="z-variable">exports</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">getMemory</span><span>(</span><span class="z-string">&quot;memory&quot;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Get a direct byte buffer view of the WebAssembly memory.</span></span> <span class="giallo-l"><span class="z-storage z-type"> ByteBuffer</span><span class="z-variable"> memoryBuffer</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> memory</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">buffer</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Prepare the byte array that will hold the data.</span></span> <span class="giallo-l"><span class="z-storage z-type z-primitive"> byte</span><span>[]</span><span class="z-variable"> data</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-storage z-type z-primitive"> byte</span><span>[</span><span class="z-constant z-numeric">13</span><span>]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Let&#39;s position the cursor, and…</span></span> <span class="giallo-l"><span class="z-variable"> memoryBuffer</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">position</span><span>(pointer)</span><span 
class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // … read!</span></span> <span class="giallo-l"><span class="z-variable"> memoryBuffer</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">get</span><span>(data)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Let&#39;s encode back to a Java string.</span></span> <span class="giallo-l"><span class="z-storage z-type"> String</span><span class="z-variable"> result</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-entity z-name z-function"> String</span><span>(data)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Hello!</span></span> <span class="giallo-l"><span class="z-keyword"> assert</span><span class="z-variable"> result</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">equals</span><span>(</span><span class="z-string">&quot;Hello, World!&quot;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> instance</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">close</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>As we can see, the <code>Memory</code> API provides a <code>buffer</code> method. 
It returns a <a rel="noopener external" target="_blank" href="https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html"><em>direct</em> byte buffer</a> (of kind <code>java.nio.ByteBuffer</code>) view of the memory. It’s a standard API for any Java developer. We think it’s best not to reinvent the wheel and to use standard APIs as much as possible.</p> <p>The WebAssembly memory is dissociated from the JVM memory, and thus from the garbage collector.</p> <blockquote> <p>You can read <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm/blob/master/examples/GreetExample.java">the Greet Example</a> to see a more in-depth usage of the <code>Memory</code> API.</p> </blockquote> <h2 id="more-documentation">More documentation<a role="presentation" class="anchor" href="#more-documentation" title="Anchor link to this header">#</a> </h2> <p>The project comes with a <code>Makefile</code>. The <code>make javadoc</code> command will generate a traditional local Javadoc for you, in the <code>build/docs/javadoc/index.html</code> file.</p> <p>In addition, the project’s <code>README.md</code> file has an <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm#api-of-the-wasmer-library">API of the <code>wasmer</code> library Section</a>.</p> <p>Finally, the project comes with <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm/tree/master/examples">a set of examples</a>. Use the <code>make run-example EXAMPLE=Simple</code> command to run the <code>SimpleExample.java</code> example, for instance.</p> <h2 id="performance">Performance<a role="presentation" class="anchor" href="#performance" title="Anchor link to this header">#</a> </h2> <p>WebAssembly aims at being safe, but also fast. Since Wasmer JNI is the <em>first</em> Java library to execute WebAssembly, we can’t compare to prior work in the Java ecosystem. 
However, you might know that Wasmer comes with 3 backends: Singlepass, Cranelift and LLVM. We’ve even written an article about it: <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/a-webassembly-compiler-tale-9ef37aa3b537">A WebAssembly Compiler tale</a>. The Wasmer JNI library uses the Cranelift backend for the moment, which offers the best compromise between compilation time and execution time.</p> <h2 id="credits">Credits<a role="presentation" class="anchor" href="#credits" title="Anchor link to this header">#</a> </h2> <p>Asami (<a rel="noopener external" target="_blank" href="https://twitter.com/d0iasm">d0iasm</a> on Twitter) improved this project during her internship at Wasmer under my guidance. She finished the internship before the release of the Wasmer JNI project, but she deserves credit for pushing the project forward! Good work, Asami!</p> <p>This is an opportunity to remind everyone that we hire anywhere in the world. Asami was working from Japan while I am working from Switzerland, and the rest of the team is from the US, Spain, China etc. Feel free to contact us (<a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io">@mnt_io</a> or <a rel="noopener external" target="_blank" href="https://twitter.com/syrusakbary">@syrusakbary</a> on Twitter) if you want to join us on this big adventure!</p> <h2 id="conclusion">Conclusion<a role="presentation" class="anchor" href="#conclusion" title="Anchor link to this header">#</a> </h2> <p>Wasmer JNI is a library to execute WebAssembly directly in Java. It embeds the WebAssembly runtime <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer">Wasmer</a>. The first releases provide the core API with <code>Module</code>, <code>Instance</code>, and <code>Memory</code>. 
It comes pre-packaged as a JAR, one per architecture and per platform.</p> <p>The source code is open and hosted on GitHub at <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/java-ext-wasm">wasmerio/java-ext-wasm</a>. We are constantly improving the project, so if you have feedback, issues, or feature requests, please open an issue in the repository, or reach us on Twitter at <a rel="noopener external" target="_blank" href="https://twitter.com/wasmerio">@wasmerio</a> or <a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io">@mnt_io</a>.</p> <p>We look forward to seeing what you build with this!</p> Announcing the first Postgres extension to run WebAssembly 2019-08-29T00:00:00+00:00 2019-08-29T00:00:00+00:00 Unknown https://mnt.io/articles/announcing-the-first-postgres-extension-to-run-webassembly/ <p><em>This is a copy of <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/announcing-the-first-postgres-extension-to-run-webassembly-561af2cfcb1">an article I wrote for Wasmer</a>.</em></p> <hr /> <p>WebAssembly is a portable binary format. That means the same program can run anywhere.</p> <blockquote> <p>To uphold this bold statement, each language, platform and system must be able to run WebAssembly — as fast and safely as possible.</p> </blockquote> <p>Let’s say it again. <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer">Wasmer</a> is a WebAssembly runtime. 
We have successfully embedded the runtime in other languages:</p> <ul> <li>In Rust, as it is written in Rust</li> <li>Using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer/tree/master/lib/runtime-c-api">C and C++ bindings</a></li> <li>In PHP, using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/php-ext-wasm"><code>php-ext-wasm</code></a></li> <li>In Python, using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/python-ext-wasm"><code>python-ext-wasm</code></a> — <a rel="noopener external" target="_blank" href="https://pypi.org/project/wasmer/">wasmer package on PyPI</a></li> <li>In Ruby, using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/ruby-ext-wasm"><code>ruby-ext-wasm</code></a> — <a rel="noopener external" target="_blank" href="https://rubygems.org/gems/wasmer">wasmer gem on RubyGems</a></li> <li>In Go, using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/go-ext-wasm"><code>go-ext-wasm</code></a> — see <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/announcing-the-fastest-webassembly-runtime-for-go-wasmer-19832d77c050">the announcement</a>.</li> </ul> <p>The community has also embedded Wasmer in awesome projects:</p> <ul> <li>.NET/C#, using <a rel="noopener external" target="_blank" href="https://github.com/migueldeicaza/WasmerSharp">WasmerSharp</a></li> <li>R, using <a rel="noopener external" target="_blank" href="https://github.com/dirkschumacher/wasmr">Wasmr</a>.</li> </ul> <p><strong>It is now time to continue the story and to hang around… <a rel="noopener external" target="_blank" href="https://www.postgresql.org/">Postgres</a>!</strong></p> <p>We are so happy to announce a new crazy idea: <strong>WebAssembly on Postgres</strong>. Yes, you read that correctly. 
On <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/postgres-ext-wasm">Postgres</a>.</p> <h2 id="calling-a-webassembly-function-from-postgres">Calling a WebAssembly function from Postgres<a role="presentation" class="anchor" href="#calling-a-webassembly-function-from-postgres" title="Anchor link to this header">#</a> </h2> <p>As usual, we have to go through the installation process. There is no package manager for Postgres, so it’s a manual step. The <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/postgres-ext-wasm#installation">Installation Section of the documentation</a> explains the details; here is a summary:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Build the shared library.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> just build</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Install the extension in the Postgres tree.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> just install</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Activate the extension.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-string">&#39;CREATE EXTENSION wasm;&#39;</span><span class="z-keyword z-operator"> |</span><span> \</span></span> <span class="giallo-l"><span> psql -h $host -d $database</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Initialize the extension.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-string">&quot;SELECT wasm_init(&#39;$(</span><span 
class="z-support z-function">pwd</span><span class="z-string">)/target/release/libpg_ext_wasm.dylib&#39;);&quot;</span><span class="z-keyword z-operator"> |</span><span> \</span></span> <span class="giallo-l"><span> psql -h $host -d $database</span></span></code></pre> <p>Once the extension is installed, activated and initialized, we can start having fun!</p> <p>The current API is rather small; however, basic features are available. The goal is to gather a community, design a pragmatic API together, and discover what developers expect and how they would use this new technology inside a database engine.</p> <p>Let’s see how it works. To instantiate a WebAssembly module, we use the <code>wasm_new_instance</code> function. It takes two arguments: the absolute path to the WebAssembly module, and a prefix for the module’s exported functions. For instance, if a module exports a function named <code>sum</code>, then a Postgres function named <code>prefix_sum</code> calling the <code>sum</code> function will be created dynamically.</p> <p>Let’s see it in action. 
Let’s start by editing a Rust program that compiles to WebAssembly:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> sum</span><span>(</span><span class="z-variable">x</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>,</span><span class="z-variable"> y</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> i32</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> x</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> y</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Once this file is compiled to <code>simple.wasm</code>, we can instantiate the module, and call the exported <code>sum</code> function:</p> <pre class="giallo z-code"><code data-lang="sql"><span class="giallo-l"><span class="z-comment">-- New instance of the `simple.wasm` WebAssembly module.</span></span> <span class="giallo-l"><span class="z-keyword">SELECT</span><span> wasm_new_instance(</span><span class="z-string">&#39;/absolute/path/to/simple.wasm&#39;</span><span>, </span><span class="z-string">&#39;ns&#39;</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">-- Call a WebAssembly exported function!</span></span> <span class="giallo-l"><span class="z-keyword">SELECT</span><span> ns_sum(</span><span class="z-constant z-numeric">1</span><span>, </span><span class="z-constant z-numeric">2</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">-- ns_sum</span></span> <span class="giallo-l"><span class="z-comment">-- 
--------</span></span> <span class="giallo-l"><span class="z-comment">-- 3</span></span> <span class="giallo-l"><span class="z-comment">-- (1 row)</span></span></code></pre> <p><em>Et voilà !</em> The <code>ns_sum</code> function calls the Rust <code>sum</code> function through WebAssembly! How fun is that 😄?</p> <h2 id="inspect-a-webassembly-instance">Inspect a WebAssembly instance<a role="presentation" class="anchor" href="#inspect-a-webassembly-instance" title="Anchor link to this header">#</a> </h2> <p>This section shows how to inspect a WebAssembly instance. At the same time, it quickly explains how the extension works under the hood.</p> <p>The extension provides two foreign data wrappers, gathered together in the <code>wasm</code> foreign schema:</p> <ul> <li><code>wasm.instances</code> is a table with the <code>id</code> and <code>wasm_file</code> columns, respectively for the unique instance ID, and the path of the WebAssembly module,</li> <li><code>wasm.exported_functions</code> is a table with the <code>instance_id</code>, <code>name</code>, <code>inputs</code>, and <code>outputs</code> columns, respectively for the instance ID of the exported function, its name, its input types (already formatted for Postgres), and its output types (already formatted for Postgres).</li> </ul> <p>Let’s see:</p> <pre class="giallo z-code"><code data-lang="sql"><span class="giallo-l"><span class="z-comment">-- Select all WebAssembly instances.</span></span> <span class="giallo-l"><span class="z-keyword">SELECT</span><span class="z-keyword z-operator"> *</span><span class="z-keyword"> FROM</span><span class="z-constant z-other"> wasm</span><span>.</span><span class="z-constant z-other">instances</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">-- id | wasm_file</span></span> <span class="giallo-l"><span class="z-comment">-- -------------------------------------+-------------------------------</span></span> <span 
class="giallo-l"><span class="z-comment">-- 426e17af-c32f-5027-ad73-239e5450dd91 | /absolute/path/to/simple.wasm</span></span> <span class="giallo-l"><span class="z-comment">-- (1 row)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">-- Select all exported functions for a specific instance.</span></span> <span class="giallo-l"><span class="z-keyword">SELECT</span></span> <span class="giallo-l"><span class="z-keyword"> name</span><span>,</span></span> <span class="giallo-l"><span> inputs,</span></span> <span class="giallo-l"><span> outputs</span></span> <span class="giallo-l"><span class="z-keyword">FROM</span></span> <span class="giallo-l"><span class="z-constant z-other"> wasm</span><span>.</span><span class="z-constant z-other">exported_functions</span></span> <span class="giallo-l"><span class="z-keyword">WHERE</span></span> <span class="giallo-l"><span> instance_id </span><span class="z-keyword z-operator">=</span><span class="z-string"> &#39;426e17af-c32f-5027-ad73-239e5450dd91&#39;</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">-- name | inputs | outputs</span></span> <span class="giallo-l"><span class="z-comment">-- -------+-----------------+---------</span></span> <span class="giallo-l"><span class="z-comment">-- ns_sum | integer,integer | integer</span></span> <span class="giallo-l"><span class="z-comment">-- (1 row)</span></span></code></pre> <p>Based on this information, the <code>wasm</code> Postgres extension is able to generate the SQL functions that call the WebAssembly exported functions.</p> <p>It sounds simplistic, and… to be honest, it is! 
The trick is to use <a rel="noopener external" target="_blank" href="https://www.postgresql.org/docs/current/fdwhandler.html">foreign data wrappers</a>, which is an awesome feature of Postgres.</p> <h2 id="how-fast-is-it-or-is-it-an-interesting-alternative-to-pl-pgsql">How fast is it, or: Is it an interesting alternative to PL/pgSQL?<a role="presentation" class="anchor" href="#how-fast-is-it-or-is-it-an-interesting-alternative-to-pl-pgsql" title="Anchor link to this header">#</a> </h2> <p>As we said, the extension API is rather small for now. The idea is to explore, to experiment, to have fun with WebAssembly inside a database. It is particularly interesting in two cases:</p> <ol> <li>To write extensions or procedures in any language that compiles to WebAssembly in place of <a rel="noopener external" target="_blank" href="https://www.postgresql.org/docs/10/plpgsql.html">PL/pgSQL</a>,</li> <li>To remove a potential performance bottleneck where speed is involved.</li> </ol> <p>Thus, we run a basic benchmark. Like most of the benchmarks out there, it must be taken with a grain of salt.</p> <blockquote> <p>The goal is to compare the execution time between WebAssembly and PL/pgSQL, and see how both approaches scale.</p> </blockquote> <p>The Postgres WebAssembly extension uses <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer">Wasmer</a> as the runtime, compiled with the Cranelift backend (<a rel="noopener external" target="_blank" href="https://medium.com/wasmer/a-webassembly-compiler-tale-9ef37aa3b537">learn more about the different backends</a>). 
We run the benchmark with Postgres 10, on a MacBook Pro 15" from 2016, 2.9 GHz Core i7 with 16 GB of memory.</p> <p>The methodology is the following:</p> <ul> <li>Load both the <code>plpgsql_fibonacci</code> and the <code>wasm_fibonacci</code> functions,</li> <li>Run them with a query like <code>SELECT *_fibonacci(n) FROM generate_series(1, 1000)</code> where <code>n</code> has the following values: 50, 500, and 5000, so that we can observe how both approaches scale,</li> <li>Write the timings down,</li> <li>Run this methodology multiple times, and compute the median of the results.</li> </ul> <p>Here come the results. The lower, the better.</p> <figure> <p><img src="https://mnt.io/articles/announcing-the-first-postgres-extension-to-run-webassembly/./benchmarks.png" alt="Benchmarks" loading="lazy" decoding="async" /></p> <figcaption> <p>Comparing WebAssembly vs. PL/pgSQL when computing the Fibonacci sequence with n=50, 500 and 5000.</p> </figcaption> </figure> <p>We notice that the Postgres WebAssembly extension is faster at running numeric computations. The WebAssembly approach scales pretty well compared to the PL/pgSQL approach, <em>in this situation</em>.</p> <h3 id="">When to use the WebAssembly extension?<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h3> <p>So far, the extension only supports integers (32-bit and 64-bit). The extension doesn’t support strings <em>yet</em>. It also doesn’t support records, views or other Postgres types. Keep in mind this is the very first step.</p> <p>Hence, it is too soon to tell whether WebAssembly can be an alternative to PL/pgSQL. But given the benchmark results above, we are sure they can live side by side: WebAssembly clearly has a place in the ecosystem! 
And we want to continue to pursue this exploration.</p> <h2 id="-1">Conclusion<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h2> <p>We are already talking with people who are interested in using WebAssembly inside databases. If you have any particular use cases, please reach us at <a rel="noopener external" target="_blank" href="https://wasmer.io/">wasmer.io</a>, or on Twitter at <a rel="noopener external" target="_blank" href="https://twitter.com/wasmerio">@wasmerio</a>, or reach me directly at <a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io">@mnt_io</a>.</p> <p>Everything is open source, as usual! Happy hacking.</p> Announcing the fastest WebAssembly runtime for Go: wasmer 2019-05-29T00:00:00+00:00 2019-05-29T00:00:00+00:00 Unknown https://mnt.io/articles/announcing-the-fastest-webassembly-runtime-for-go-wasmer/ <p><em>This is a copy of <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/announcing-the-fastest-webassembly-runtime-for-go-wasmer-19832d77c050">an article I wrote for Wasmer</a>.</em></p> <hr /> <p>WebAssembly is a portable binary format. That means the same file can run anywhere.</p> <blockquote> <p>To uphold this bold statement, each language, platform and system must be able to run WebAssembly — as fast and safely as possible.</p> </blockquote> <p><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer">Wasmer</a> is a WebAssembly runtime written in <a rel="noopener external" target="_blank" href="https://www.rust-lang.org/">Rust</a>. It goes without saying that the runtime can be used in any Rust application. 
We have also successfully embedded the runtime in other languages:</p> <ul> <li>Using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer/tree/master/lib/runtime-c-api">C and C++ bindings</a></li> <li>In PHP, using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/php-ext-wasm"><code>php-ext-wasm</code></a></li> <li>In Python, using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/python-ext-wasm"><code>python-ext-wasm</code></a> — <a rel="noopener external" target="_blank" href="https://pypi.org/project/wasmer/">wasmer package on PyPI</a></li> <li>In Ruby, using <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/ruby-ext-wasm"><code>ruby-ext-wasm</code></a> — <a rel="noopener external" target="_blank" href="https://rubygems.org/gems/wasmer">wasmer gem on RubyGems</a></li> <li><strong>It is now time to hang around <a rel="noopener external" target="_blank" href="https://golang.org/">Go</a> 🐹!</strong></li> </ul> <p>We are super happy to announce <code>github.com/wasmerio/go-ext-wasm/wasmer</code>, a <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/go-ext-wasm">Go library to run WebAssembly binaries, fast</a>.</p> <h2 id="calling-a-webassembly-function-from-go">Calling a WebAssembly function from Go<a role="presentation" class="anchor" href="#calling-a-webassembly-function-from-go" title="Anchor link to this header">#</a> </h2> <p>First, let’s install <code>wasmer</code> in your Go environment (<em>with cgo support</em>).</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span class="z-storage"> export</span><span class="z-variable"> CGO_ENABLED</span><span class="z-keyword z-operator">=</span><span class="z-constant z-numeric">1</span><span class="z-punctuation z-terminator">;</span><span class="z-storage"> export</span><span class="z-variable"> 
CC</span><span class="z-keyword z-operator">=</span><span class="z-variable">gcc</span><span class="z-punctuation z-terminator">;</span><span class="z-entity z-name"> go</span><span class="z-string"> install github.com/wasmerio/go-ext-wasm/wasmer</span></span></code></pre> <p>Let’s jump immediately into some examples. <code>github.com/wasmerio/go-ext-wasm/wasmer</code> is a regular Go library. The installation is automated with <code>import "github.com/wasmerio/go-ext-wasm/wasmer"</code>.</p> <p>Let’s get our hands dirty. We will write a program that compiles to WebAssembly easily, using Rust for instance:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> sum</span><span>(</span><span class="z-variable">x</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>,</span><span class="z-variable"> y</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> i32</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> x</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> y</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>After compilation to WebAssembly, we get a file like <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/go-ext-wasm/blob/master/wasmer/test/testdata/examples/simple.wasm">this one</a>, named <code>simple.wasm</code>.<br /> The following Go program executes the <code>sum</code> function by passing <code>5</code> and <code>37</code> as arguments:</p> <pre class="giallo z-code"><code data-lang="go"><span class="giallo-l"><span class="z-keyword">package</span><span 
class="z-entity z-name"> main</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">import</span><span> (</span></span> <span class="giallo-l"><span class="z-string"> &quot;</span><span class="z-entity z-name z-import">fmt</span><span class="z-string">&quot;</span></span> <span class="giallo-l"><span class="z-variable"> wasm</span><span class="z-string"> &quot;</span><span class="z-entity z-name z-import">github.com/wasmerio/go-ext-wasm/wasmer</span><span class="z-string">&quot;</span></span> <span class="giallo-l"><span>)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">func</span><span class="z-entity z-name z-function"> main</span><span>() {</span></span> <span class="giallo-l"><span class="z-comment"> // Reads the WebAssembly module as bytes.</span></span> <span class="giallo-l"><span class="z-variable"> bytes</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> wasm</span><span>.</span><span class="z-entity z-name z-function">ReadBytes</span><span>(</span><span class="z-string">&quot;simple.wasm&quot;</span><span>)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Instantiates the WebAssembly module.</span></span> <span class="giallo-l"><span class="z-variable"> instance</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> wasm</span><span>.</span><span class="z-entity z-name z-function">NewInstance</span><span>(</span><span class="z-variable">bytes</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword"> defer</span><span class="z-variable"> instance</span><span>.</span><span class="z-entity z-name z-function">Close</span><span>()</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Gets the `sum` exported function from the 
WebAssembly instance.</span></span> <span class="giallo-l"><span class="z-variable"> sum</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> instance</span><span>.</span><span class="z-variable">Exports</span><span>[</span><span class="z-string">&quot;sum&quot;</span><span>]</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Calls that exported function with Go standard values. The WebAssembly</span></span> <span class="giallo-l"><span class="z-comment"> // types are inferred and values are cast automatically.</span></span> <span class="giallo-l"><span class="z-variable"> result</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-entity z-name z-function"> sum</span><span>(</span><span class="z-constant z-numeric">5</span><span>,</span><span class="z-constant z-numeric"> 37</span><span>)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> fmt</span><span>.</span><span class="z-entity z-name z-function">Println</span><span>(</span><span class="z-variable">result</span><span>)</span><span class="z-comment"> // 42!</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Great! We have successfully executed a WebAssembly file inside Go.</p> <blockquote> <p><em>Note: Go values passed to the WebAssembly exported function are automatically cast to WebAssembly values. Types are inferred and casting is done automatically. Thus, a WebAssembly function acts as any regular Go function.</em></p> </blockquote> <h2 id="webassembly-calling-go-funtions">WebAssembly calling Go functions<a role="presentation" class="anchor" href="#webassembly-calling-go-funtions" title="Anchor link to this header">#</a> </h2> <p>A WebAssembly module <em>exports</em> some functions, so that they can be called from the outside world. 
This is the entry point to execute WebAssembly.</p> <p>Nonetheless, a WebAssembly module can also have <em>imported</em> functions. Let’s consider the following Rust program:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-storage">extern</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> sum</span><span>(</span><span class="z-variable">x</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>,</span><span class="z-variable"> y</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> i32</span><span>;</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> add1</span><span>(</span><span class="z-variable">x</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>,</span><span class="z-variable"> y</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> i32</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> unsafe</span><span> {</span><span class="z-entity z-name z-function"> sum</span><span>(</span><span class="z-variable">x</span><span>,</span><span class="z-variable"> y</span><span>) }</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 1</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The exported function <code>add1</code> calls the <code>sum</code> function. 
Its implementation is absent; only its signature is defined. This is an “extern function”, and for WebAssembly, this is an <em>imported</em> function, because its implementation must be <em>imported</em>.</p> <p>Let’s implement the <code>sum</code> function in Go! To do so, we <em>need</em> to use <a rel="noopener external" target="_blank" href="https://blog.golang.org/c-go-cgo">cgo</a>:</p> <ol> <li>The <code>sum</code> function signature is defined in C (see the comment above <code>import "C"</code>),</li> <li>The <code>sum</code> implementation is defined in Go. Notice the <code>//export</code> comment, which is how cgo maps Go code to C code,</li> <li><code>NewImports</code> is an API used to create WebAssembly imports. In this code, <code>"sum"</code> is the WebAssembly imported function name, <code>sum</code> is the Go function pointer, and <code>C.sum</code> is the cgo function pointer,</li> <li>Finally, <code>NewInstanceWithImports</code> is the constructor to use to instantiate the WebAssembly module with imports. 
That’s it.</li> </ol> <p>Let’s see the complete program:</p> <pre class="giallo z-code"><code data-lang="go"><span class="giallo-l"><span class="z-keyword">package</span><span class="z-entity z-name"> main</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// // 1️⃣ Declare the `sum` function signature (see cgo).</span></span> <span class="giallo-l"><span class="z-comment">//</span></span> <span class="giallo-l"><span class="z-comment">// #include &lt;stdlib.h&gt;</span></span> <span class="giallo-l"><span class="z-comment">//</span></span> <span class="giallo-l"><span class="z-comment">// extern int32_t sum(void *context, int32_t x, int32_t y);</span></span> <span class="giallo-l"><span class="z-keyword">import</span><span class="z-string"> &quot;</span><span class="z-entity z-name z-import">C</span><span class="z-string">&quot;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">import</span><span> (</span></span> <span class="giallo-l"><span class="z-string"> &quot;</span><span class="z-entity z-name z-import">fmt</span><span class="z-string">&quot;</span></span> <span class="giallo-l"><span class="z-variable"> wasm</span><span class="z-string"> &quot;</span><span class="z-entity z-name z-import">github.com/wasmerio/go-ext-wasm/wasmer</span><span class="z-string">&quot;</span></span> <span class="giallo-l"><span class="z-string"> &quot;</span><span class="z-entity z-name z-import">unsafe</span><span class="z-string">&quot;</span></span> <span class="giallo-l"><span>)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// 2️⃣ Write the implementation of the `sum` function, and export it (for cgo).</span></span> <span class="giallo-l"><span class="z-comment">//export sum</span></span> <span class="giallo-l"><span class="z-keyword">func</span><span class="z-entity z-name z-function"> sum</span><span>(</span><span class="z-variable 
z-parameter">context</span><span class="z-entity z-name"> unsafe</span><span>.</span><span class="z-entity z-name">Pointer</span><span>,</span><span class="z-variable z-parameter"> x</span><span class="z-storage z-type"> int32</span><span>,</span><span class="z-variable z-parameter"> y</span><span class="z-storage z-type"> int32</span><span>)</span><span class="z-storage z-type"> int32</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> x</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> y</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">func</span><span class="z-entity z-name z-function"> main</span><span>() {</span></span> <span class="giallo-l"><span class="z-comment"> // Reads the WebAssembly module as bytes.</span></span> <span class="giallo-l"><span class="z-variable"> bytes</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> wasm</span><span>.</span><span class="z-entity z-name z-function">ReadBytes</span><span>(</span><span class="z-string">&quot;import.wasm&quot;</span><span>)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // 3️⃣ Declares the imported functions for WebAssembly.</span></span> <span class="giallo-l"><span class="z-variable"> imports</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> wasm</span><span>.</span><span class="z-entity z-name z-function">NewImports</span><span>().</span><span class="z-entity z-name z-function">Append</span><span>(</span><span class="z-string">&quot;sum&quot;</span><span>,</span><span class="z-variable"> sum</span><span>,</span><span class="z-variable"> C</span><span>.</span><span class="z-variable">sum</span><span>)</span></span> <span 
class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // 4️⃣ Instantiates the WebAssembly module with imports.</span></span> <span class="giallo-l"><span class="z-variable"> instance</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> wasm</span><span>.</span><span class="z-entity z-name z-function">NewInstanceWithImports</span><span>(</span><span class="z-variable">bytes</span><span>,</span><span class="z-variable"> imports</span><span>)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Close the WebAssembly instance later.</span></span> <span class="giallo-l"><span class="z-keyword"> defer</span><span class="z-variable"> instance</span><span>.</span><span class="z-entity z-name z-function">Close</span><span>()</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Gets the `add1` exported function from the WebAssembly instance.</span></span> <span class="giallo-l"><span class="z-variable"> add1</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> instance</span><span>.</span><span class="z-variable">Exports</span><span>[</span><span class="z-string">&quot;add1&quot;</span><span>]</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Calls that exported function.</span></span> <span class="giallo-l"><span class="z-variable"> result</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-entity z-name z-function"> add1</span><span>(</span><span class="z-constant z-numeric">1</span><span>,</span><span class="z-constant z-numeric"> 2</span><span>)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> fmt</span><span>.</span><span class="z-entity z-name z-function">Println</span><span>(</span><span 
class="z-variable">result</span><span>)</span></span> <span class="giallo-l"><span class="z-comment"> // add1(1, 2)</span></span> <span class="giallo-l"><span class="z-comment"> // = sum(1, 2) + 1</span></span> <span class="giallo-l"><span class="z-comment"> // = 1 + 2 + 1</span></span> <span class="giallo-l"><span class="z-comment"> // = 4</span></span> <span class="giallo-l"><span class="z-comment"> // QED</span></span> <span class="giallo-l"><span>}</span></span></code></pre><h2 id="reading-the-memory">Reading the memory<a role="presentation" class="anchor" href="#reading-the-memory" title="Anchor link to this header">#</a> </h2> <p>A WebAssembly instance has a linear memory. Let’s see how to read it. Consider the following Rust program:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> return_hello</span><span>()</span><span class="z-keyword z-operator"> -&gt; *</span><span class="z-storage">const</span><span class="z-entity z-name"> u8</span><span> {</span></span> <span class="giallo-l"><span class="z-string"> b&quot;Hello, World!</span><span class="z-constant z-character">\0</span><span class="z-string">&quot;</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">as_ptr</span><span>()</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The <code>return_hello</code> function returns a pointer to a string. The string is terminated by a null byte, <em>à la</em> C. 
Let’s jump on the Go side:</p> <pre class="giallo z-code"><code data-lang="go"><span class="giallo-l"><span class="z-variable">bytes</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> wasm</span><span>.</span><span class="z-entity z-name z-function">ReadBytes</span><span>(</span><span class="z-string">&quot;memory.wasm&quot;</span><span>)</span></span> <span class="giallo-l"><span class="z-variable">instance</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> wasm</span><span>.</span><span class="z-entity z-name z-function">NewInstance</span><span>(</span><span class="z-variable">bytes</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword">defer</span><span class="z-variable"> instance</span><span>.</span><span class="z-entity z-name z-function">Close</span><span>()</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Calls the `return_hello` exported function.</span></span> <span class="giallo-l"><span class="z-comment">// This function returns a pointer to a string.</span></span> <span class="giallo-l"><span class="z-variable">result</span><span>,</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> instance</span><span>.</span><span class="z-entity z-name z-function">Exports</span><span>[</span><span class="z-string">&quot;return_hello&quot;</span><span>]()</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Gets the pointer value as an integer.</span></span> <span class="giallo-l"><span class="z-variable">pointer</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> result</span><span>.</span><span class="z-entity z-name z-function">ToI32</span><span>()</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span 
class="z-comment">// Reads the memory.</span></span> <span class="giallo-l"><span class="z-variable">memory</span><span class="z-keyword z-operator"> :=</span><span class="z-variable"> instance</span><span>.</span><span class="z-variable">Memory</span><span>.</span><span class="z-entity z-name z-function">Data</span><span>()</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">fmt</span><span>.</span><span class="z-entity z-name z-function">Println</span><span>(</span><span class="z-storage z-type">string</span><span>(</span><span class="z-variable">memory</span><span>[</span><span class="z-variable">pointer</span><span> :</span><span class="z-variable"> pointer</span><span class="z-keyword z-operator">+</span><span class="z-constant z-numeric">13</span><span>]))</span><span class="z-comment"> // Hello, World!</span></span></code></pre> <p>The <code>return_hello</code> function returns a pointer as an <code>i32</code> value. We get its value by calling <code>ToI32</code>. Then, we fetch the memory data with <code>instance.Memory.Data()</code>.</p> <p>This method returns a slice over the memory of the WebAssembly instance. It can be used like any regular Go slice.</p> <p>Fortunately for us, we already know the length of the string we want to read, so <code>memory[pointer : pointer+13]</code> is enough to read the bytes, which are then cast to a string. 
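</p> <p>When the length is not known in advance, the null terminator can be used to find where the string ends. Here is a minimal sketch, assuming the memory is available as a plain Go byte slice: <code>readCString</code> is a hypothetical helper (it is not part of the <code>wasmer</code> package), and the <code>memory</code> slice in <code>main</code> is only a stand-in for the result of <code>instance.Memory.Data()</code>.</p>

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// readCString is a hypothetical helper, not part of the wasmer package.
// It returns the string that starts at `pointer` in `memory` and stops
// just before the first null byte.
func readCString(memory []byte, pointer int32) (string, error) {
	// WebAssembly addresses are unsigned: reinterpret the `i32` value
	// as an unsigned offset before comparing it with the memory size.
	offset := uint64(uint32(pointer))
	if offset >= uint64(len(memory)) {
		return "", errors.New("pointer is out of bounds")
	}
	start := int(offset)

	// Look for the terminating null byte after the start of the string.
	length := bytes.IndexByte(memory[start:], 0)
	if length == -1 {
		return "", errors.New("no null terminator found")
	}

	return string(memory[start : start+length]), nil
}

func main() {
	// Stand-in for `instance.Memory.Data()`: 4 bytes of padding, the
	// null-terminated string, then unrelated data.
	memory := []byte("\x00\x00\x00\x00Hello, World!\x00trailing")

	greeting, err := readCString(memory, 4)
	if err != nil {
		panic(err)
	}

	fmt.Println(greeting) // Hello, World!
}
```

<p>Returning an error rather than panicking is a design choice of this sketch: the pointer comes from the WebAssembly module, so it is treated as untrusted input.</p> <p>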
<em>Et voilà !</em></p> <blockquote> <p>You can read <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/go-ext-wasm/blob/6934a0fa06558f77884398a2371de182593e6a6c/wasmer/test/example_greet_test.go">the Greet Example</a> to see a more advanced usage of the memory API.</p> </blockquote> <h2 id="benchmarks">Benchmarks<a role="presentation" class="anchor" href="#benchmarks" title="Anchor link to this header">#</a> </h2> <p>So far, <code>github.com/wasmerio/go-ext-wasm/wasmer</code> has a nice API, but… <em>is it fast</em>?</p> <p>Unlike PHP or Ruby, the Go world already has runtimes to execute WebAssembly. The main candidates are:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/perlin-network/life">Life</a>, from Perlin Network, a WebAssembly interpreter,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/go-interpreter/wagon">Wagon</a>, from Go Interpreter, a WebAssembly interpreter and toolkit.</li> </ul> <p>In <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/php-ext-wasm-migrating-from-wasmi-to-wasmer-4d1014f41c88">our blog post about the PHP extension</a>, we used <a rel="noopener external" target="_blank" href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/nbody.html">the n-body algorithm</a> to benchmark the performance. Life provides more benchmarks: <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Fibonacci_number">the Fibonacci algorithm</a> (the recursive version), <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm">Pollard’s rho algorithm</a>, and the Snappy Compress operation. The latter works successfully with <code>github.com/wasmerio/go-ext-wasm/wasmer</code> but not with Life or Wagon, so we have removed it from the benchmark suite. 
<a rel="noopener external" target="_blank" href="https://github.com/wasmerio/go-ext-wasm/tree/master/benchmarks">Benchmark sources</a> are online.</p> <p>We use Life 20190521143330-57f3819c2df0 and Wagon 0.4.0, i.e. <em>the latest versions to date</em>.</p> <p>Each benchmark number is the average of 10 runs. The computer that ran these benchmarks is a 2016 MacBook Pro 15" with a 2.9 GHz Core i7 and 16 GB of memory.</p> <p>Results are grouped by benchmark algorithm on the X axis. The Y axis represents the time used to run the algorithm, expressed in milliseconds. The lower, the better.</p> <figure> <p><img src="https://mnt.io/articles/announcing-the-fastest-webassembly-runtime-for-go-wasmer/./benchmarks.png" alt="Benchmarks" loading="lazy" decoding="async" /></p> <figcaption> <p>Speed comparison between Wasmer, Wagon and Life. Benchmark suites are the n-body, Fibonacci, and Pollard’s rho algorithms. Speed is expressed in ms. Lower is better.</p> </figcaption> </figure> <p>While Life and Wagon provide, on average, the same speed, Wasmer (<code>github.com/wasmerio/go-ext-wasm/wasmer</code>) is on average <strong>72 times faster</strong> 🎉.</p> <p>It is important to know that Wasmer comes with 3 backends: <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer/tree/master/lib/singlepass-backend">Singlepass</a>, <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer/tree/master/lib/clif-backend">Cranelift</a>, and <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer/tree/master/lib/llvm-backend">LLVM</a>. The default backend used by the Go library is Cranelift (<a rel="noopener external" target="_blank" href="https://github.com/CraneStation/cranelift">learn more about Cranelift</a>). 
Using LLVM will provide performance close to native, but we decided to start with Cranelift as it offers the best tradeoff between compilation time and execution time (<a rel="noopener external" target="_blank" href="https://medium.com/wasmer/a-webassembly-compiler-tale-9ef37aa3b537">learn more about the different backends</a>, when to use them, pros and cons etc.).</p> <h2 id="">Conclusion<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <p><a rel="noopener external" target="_blank" href="https://github.com/wasmerio/go-ext-wasm"><code>github.com/wasmerio/go-ext-wasm/wasmer</code></a> is a new Go library to execute WebAssembly binaries. It embeds the <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer">Wasmer</a> runtime. The first version supports all the APIs required for the most common use cases.</p> <p>The current benchmarks (a mix from our benchmark suites and from Life suites) show that <strong>Wasmer is, on average, 72 times faster than Life and Wagon</strong>, the two major existing WebAssembly runtimes in the Go world.</p> <p>If you want to follow the development, take a look at <a rel="noopener external" target="_blank" href="https://twitter.com/wasmerio">@wasmerio</a> and <a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io">@mnt_io</a> on Twitter, or <a rel="noopener external" target="_blank" href="https://webassembly.social/@wasmer">@wasmer@webassembly.social</a> on Mastodon.</p> <p>And of course, everything is open source at <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/go-ext-wasm">wasmerio/go-ext-wasm</a>.</p> <p>Thank you for your time, we can’t wait to see what you build with us!</p> 🐘+🦀+🕸 php-ext-wasm: Migrating from wasmi to Wasmer 2019-04-03T00:00:00+00:00 2019-04-03T00:00:00+00:00 Unknown https://mnt.io/articles/elephant-crab-spider-web-php-ext-wasm-migrating-from-wasmi-to-wasmer/ <p><em>This is a copy of <a rel="noopener external" 
target="_blank" href="https://medium.com/wasmer/php-ext-wasm-migrating-from-wasmi-to-wasmer-4d1014f41c88">an article I wrote for Wasmer</a>.</em></p> <hr /> <p>First as a joke, now as a real product, I started to develop <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/php-ext-wasm">php-ext-wasm</a>: a <a rel="noopener external" target="_blank" href="http://php.net/">PHP</a> extension that executes <a rel="noopener external" target="_blank" href="https://webassembly.org/">WebAssembly</a> binaries.</p> <p>The PHP virtual machine (VM) is <a rel="noopener external" target="_blank" href="https://github.com/php/php-src/">Zend Engine</a>. To write an extension, one needs to develop in C or C++. The extension was a set of simple C bindings to a Rust library I also wrote. At that time, this Rust library was using <a rel="noopener external" target="_blank" href="https://github.com/paritytech/wasmi"><code>wasmi</code></a> for the WebAssembly VM. I knew that <code>wasmi</code> wasn’t the fastest WebAssembly VM in the game, but its API is solid and well-tested, it compiles quickly, and it is easy to hack. All the requirements to start a project!</p> <p>After 6 hours of development, I got something working. 
I was able to run the following PHP program:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$instance</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span> Wasm</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Instance</span><span>(</span><span class="z-string">&#39;simple.wasm&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$result</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $instance</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">sum</span><span>(</span><span class="z-constant z-numeric">1</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 2</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(</span><span class="z-variable">$result</span><span>)</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> // int(3)</span></span></code></pre> <p>The API is straightforward: create an instance (here of <code>simple.wasm</code>), then call functions on it (here <code>sum</code> with 1 and 2 as arguments). PHP values are transformed into WebAssembly values automatically. 
For the record, here is the <code>simple.rs</code> Rust program that is compiled to a WebAssembly binary:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> sum</span><span>(</span><span class="z-variable">x</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>,</span><span class="z-variable"> y</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> i32</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> i32</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> x</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> y</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>It was great! In my opinion, 6 hours is a relatively short time to get that far.</p> <p>However, I quickly noticed that <code>wasmi</code> is… slow. <a rel="noopener external" target="_blank" href="https://webassembly.org/">One of the promises of WebAssembly</a> is:</p> <blockquote> <p>WebAssembly aims to execute at native speed by taking advantage of <a rel="noopener external" target="_blank" href="https://webassembly.org/docs/portability/#assumptions-for-efficient-execution">common hardware capabilities</a> available on a wide range of platforms.</p> </blockquote> <p>And clearly, my extension wasn’t fulfilling this promise. 
Let’s see a basic comparison with a benchmark.</p> <p>I chose <a rel="noopener external" target="_blank" href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/nbody.html">the <em>n-body</em> algorithm</a> from <a rel="noopener external" target="_blank" href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/">the Computer Language Benchmarks Game</a> from Debian, mostly because it’s relatively CPU intensive. Also, the algorithm has a simple interface: based on an integer, it returns a floating-point number; this API doesn’t involve any advanced instance memory API, which is perfect to test a proof-of-concept.</p> <p>As a baseline, I’ve run the <em>n-body</em> algorithm <a rel="noopener external" target="_blank" href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/nbody-rust-7.html">written in Rust</a>, let’s call it <code>rust-baseline</code>. The same algorithm has been <a rel="noopener external" target="_blank" href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/nbody-php-3.html">written in PHP</a>, let’s call it <code>php</code>. Finally, the algorithm has been compiled from Rust to WebAssembly, and executed with the <code>php-ext-wasm</code> extension, let’s call that case <code>php+wasmi</code>. 
All results are for <code>nbody(5000000)</code>:</p> <ul> <li><code>rust-baseline</code>: 287ms,</li> <li><code>php</code>: 19,761ms,</li> <li><code>php+wasmi</code>: 67,622ms.</li> </ul> <p>OK, so… <code>php-ext-wasm</code> with <code>wasmi</code> is <strong>3.4 times slower</strong> than PHP itself. It is pointless to use WebAssembly in such conditions!</p> <p>It confirms my first intuition though: in our case, <code>wasmi</code> is really great to mock something up, but it’s not fast enough for our expectations.</p> <h2 id="faster-faster-faster">Faster, faster, faster…<a role="presentation" class="anchor" href="#faster-faster-faster" title="Anchor link to this header">#</a> </h2> <p>I have wanted to use <a rel="noopener external" target="_blank" href="https://github.com/CraneStation/cranelift">Cranelift</a> from the beginning. It’s a code generator, <em>à la</em> <a rel="noopener external" target="_blank" href="http://llvm.org/">LLVM</a> (excuse the brutal shortcut, the goal isn’t to explain what Cranelift is in detail, but it’s a really awesome project!). To quote the project itself:</p> <blockquote> <p>Cranelift is a low-level retargetable code generator. It translates a <a rel="noopener external" target="_blank" href="https://cranelift.readthedocs.io/en/latest/ir.html">target-independent intermediate representation</a> into executable machine code.</p> </blockquote> <p>It basically means that the Cranelift API can be used to generate executable code.</p> <p>It’s perfect! I can replace <code>wasmi</code> with Cranelift, and boom, profit. But… there are other ways to get even faster code execution, at the cost of a longer code compilation though.</p> <p>For instance, LLVM can provide very fast code execution, almost at native speed. Or we can generate assembly code dynamically. Well, there are multiple ways to achieve that. 
What if a project could provide a WebAssembly virtual machine with multiple backends?</p> <h2 id="enter-wasmer">Enter Wasmer<a role="presentation" class="anchor" href="#enter-wasmer" title="Anchor link to this header">#</a> </h2> <p>And it was at that specific time that I was hired by <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer">Wasmer</a>. To be totally honest, I had been looking at Wasmer a few weeks before. It was a surprise and a great opportunity for me. Well, the universe really wants this rewrite from <code>wasmi</code> to Wasmer, right 😅?</p> <p>Wasmer is organized as a set of Rust libraries (called crates). There is even a <code>wasmer-runtime-c-api</code> crate, which is a C and C++ API on top of the <code>wasmer-runtime</code> crate and the <code>wasmer-runtime-core</code> crate, i.e. it allows running the WebAssembly virtual machine as you want, with the backend of your choice: <em>Cranelift</em>, <em>LLVM</em>, or <em>Dynasm</em> (at the time of writing). That’s perfect: it removes my Rust library between the PHP extension and <code>wasmi</code>. Then <code>php-ext-wasm</code> is reduced to a PHP extension without any Rust code; everything goes to <code>wasmer-runtime-c-api</code>. It’s sad to remove Rust from this project, but it relies on more Rust code!</p> <p>Counting the time to make some patches on <code>wasmer-runtime-c-api</code>, I was able to migrate <code>php-ext-wasm</code> to Wasmer in 5 days.</p> <p>By default, <code>php-ext-wasm</code> uses Wasmer with the Cranelift backend, which strikes a great balance between compilation and execution time. It is really good. 
Let’s run the benchmark, with the addition of <code>php+wasmer(cranelift)</code>:</p> <ul> <li><code>rust-baseline</code>: 287ms,</li> <li><code>php</code>: 19,761ms,</li> <li><code>php+wasmi</code>: 67,622ms,</li> <li><code>php+wasmer(cranelift)</code>: 2,365ms 🎉.</li> </ul> <p>Finally, the PHP extension provides a faster execution than PHP itself! <code>php+wasmer(cranelift)</code> is <strong>8.4 times faster</strong> than <code>php</code> to be exact. And it is <strong>28.6 times faster</strong> than <code>php+wasmi</code>. Can we reach the native speed (represented by <code>rust-baseline</code> here)? It’s very likely with LLVM. That’s for another article. I’m super happy with Cranelift for the moment. (See <a rel="noopener external" target="_blank" href="https://medium.com/wasmer/benchmarking-webassembly-runtimes-18497ce0d76e">our previous blog post to learn how we benchmark different backends in Wasmer, and other WebAssembly runtimes</a>).</p> <h2 id="more-optimizations">More Optimizations<a role="presentation" class="anchor" href="#more-optimizations" title="Anchor link to this header">#</a> </h2> <p>Wasmer provides more features, like module caching. Those features are now included in the PHP extension. Booting the <code>nbody.wasm</code> file (19 KB) took 4.2ms. By booting, I mean: reading the WebAssembly binary from a file, parsing it, validating it, compiling it to executable code and a WebAssembly module structure.</p> <p>The PHP execution model is: start, run, die. Memory is freed for each request. When using <code>php-ext-wasm</code>, one doesn’t really want to pay that “<em>booting cost</em>” every time.</p> <p>Fortunately, <code>wasmer-runtime-c-api</code> now provides a module serialization API, which is integrated into the PHP extension itself. It saves the “booting cost”, but it adds a “deserialization cost”. 
That second cost is smaller, but still, we need to know it exists.</p> <p>Fortunately again, Zend Engine has an API to keep persistent in-memory data between PHP executions. <code>php-ext-wasm</code> supports that API to get persistent modules, <em>et voilà</em>.</p> <p>Now it takes <strong>4.2ms</strong> for the first boot of <code>nbody.wasm</code> and <strong>0.005ms</strong> for all the next boots. It’s 840 times faster!</p> <h2 id="conclusion">Conclusion<a role="presentation" class="anchor" href="#conclusion" title="Anchor link to this header">#</a> </h2> <p>Wasmer is a young (but mature) framework to build WebAssembly runtimes on top of. The default backend is Cranelift, and it delivers on its promises: it brings a good balance between compilation time and execution time.</p> <p><code>wasmi</code> has been a good companion to develop a <em>Proof-Of-Concept</em>. This library has its place in other usages though, like very short-lived WebAssembly binaries (I’m thinking of Ethereum contracts that compile to WebAssembly for instance, which is one of the actual use cases). It’s important to understand that no runtime is better than another; it depends on the use case.</p> <p>The next step is to stabilize <code>php-ext-wasm</code> to release a 1.0.0 version.</p> <p>See you there!</p> <p>If you want to follow the development, take a look at <a rel="noopener external" target="_blank" href="https://twitter.com/wasmerio">@wasmerio</a> and <a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io">@mnt_io</a> on Twitter.</p> Bye bye Automattic, hello Wasmer 2019-03-04T00:00:00+00:00 2019-03-04T00:00:00+00:00 Unknown https://mnt.io/articles/bye-bye-automattic-hello-wasmer/ <p>Today is my first day at <a rel="noopener external" target="_blank" href="https://wasmer.io/">Wasmer</a>.</p> <p>It's with a lot of regret that I leave Automattic. 
To be clear, I'm not leaving because something negative happened. I'm leaving because I've received the same job offer, 3 times in 10 days, from Wasmer, Google and Mozilla: namely, to work with Rust or C++ to build a WebAssembly runtime. This is an offer I can hardly decline. It's an opportunity and a dream for me. And I was lucky enough to get a choice between 3 excellent companies!</p> <p>I can only encourage you to <a rel="noopener external" target="_blank" href="https://automattic.com/work-with-us/">work with Automattic</a>. It's definitely the best company I've ever worked with, taking first place from Mozilla. Automattic is not only about WordPress.com and other services: it's a way of living. The culture, the spirit, the interactions between people, the mission, everything is <em>exceptional</em>. It has been a super great experience.</p> <p>I could write 100 pages about my team. They have all been <em>remarkable</em> in many ways. I'm closer to them, although they live 10,000 km away, than to colleagues I met every day in person in the past. Congrats to <a rel="noopener external" target="_blank" href="https://ma.tt/about/">Matt</a> for this incredible project.</p> <p>Now it's time to work on <a rel="noopener external" target="_blank" href="https://github.com/wasmerio/wasmer">Wasmer</a>. It's a <a rel="noopener external" target="_blank" href="https://webassembly.org/">WebAssembly</a> runtime written in <a rel="noopener external" target="_blank" href="https://www.rust-lang.org/">Rust</a>: my two current passions. It's powerful, modular, well-designed, and it comes with great ambitions. I'm really excited. 
I work with an extraordinary team: <a rel="noopener external" target="_blank" href="https://github.com/syrusakbary">Syrus Akbary</a> (the author of <a rel="noopener external" target="_blank" href="https://github.com/graphql-python/graphene">Graphene</a>, a GraphQL framework in Python), <a rel="noopener external" target="_blank" href="https://github.com/lachlansneff">Lachlan Sneff</a> (the author of <a rel="noopener external" target="_blank" href="https://github.com/nebulet/nebulet">Nebulet</a>, a microkernel that implements a WebAssembly "usermode" that runs in Ring 0), <a rel="noopener external" target="_blank" href="https://github.com/bjfish">Brandon Fish</a> (a great contributor to <a rel="noopener external" target="_blank" href="https://github.com/oracle/truffleruby">Truffleruby</a>, a high performance implementation of Ruby with GraalVM), <a rel="noopener external" target="_blank" href="https://github.com/xmclark">Mackenzie Clark</a>, and soon more.</p> <p>My job will consist of working on the runtime of course, and also of integrating/embedding the runtime into different languages, such as PHP (like I did with <a rel="noopener external" target="_blank" href="https://github.com/Hywan/php-ext-wasm"><code>php-ext-wasm</code></a>, more to come on this blog). More secret projects are coming. Let's turn them into realities 🎉!</p> The PHP galaxy 2018-10-29T00:00:00+00:00 2018-10-29T00:00:00+00:00 Unknown https://mnt.io/series/from-rust-to-beyond/the-php-galaxy/ <p>The galaxy we will explore today is the PHP galaxy. This post will explain what PHP is, and how to compile any Rust program to C and then to a PHP native extension.</p> <h2 id="what-is-php-and-why">What is PHP, and why?<a role="presentation" class="anchor" href="#what-is-php-and-why" title="Anchor link to this header">#</a> </h2> <p><a rel="noopener external" target="_blank" href="https://secure.php.net/">PHP</a> is a:</p> <blockquote> <p>popular general-purpose scripting language that is especially suited to Web development. 
Fast, flexible, and pragmatic, PHP powers everything from your blog to the most popular websites in the world.</p> </blockquote> <p>PHP has sadly acquired a bad reputation over the years, but recent releases (since PHP 7.0 mostly) have introduced neat language features, and many cleanups, which are too often ignored by haters. PHP is also a fast scripting language, and is very flexible. PHP now has declared types, traits, variadic arguments, closures (with explicit scopes!), generators, and <em>huge</em> backward compatibility. The development of PHP is driven by <a rel="noopener external" target="_blank" href="https://wiki.php.net/rfc">RFCs</a>, which is an open and democratic process.</p> <p>The Gutenberg project is a new editor for WordPress. The latter is written in PHP. Naturally, we want a native extension for PHP to parse the Gutenberg post format.</p> <p>PHP is a language with <a rel="noopener external" target="_blank" href="https://github.com/php/php-langspec">a specification</a>. The most popular virtual machine is <a rel="noopener external" target="_blank" href="http://php.net/manual/en/internals2.php">Zend Engine</a>. Other virtual machines exist, like <a rel="noopener external" target="_blank" href="https://hhvm.com/">HHVM</a> (but PHP support has recently been dropped in favor of their own PHP fork, called Hack), <a rel="noopener external" target="_blank" href="https://www.peachpie.io/">Peachpie</a>, or <a rel="noopener external" target="_blank" href="https://github.com/tagua-vm/tagua-vm">Tagua VM</a> (under development).</p> <p>In this post, we will create an extension for Zend Engine. This virtual machine is written in C. 
Great, we have visited <a href="https://mnt.io/series/from-rust-to-beyond/the-c-galaxy/">the C galaxy in the previous episode</a>!</p> <h2 id="rust-rocket-c-rocket-php">Rust 🚀 C 🚀 PHP<a role="presentation" class="anchor" href="#rust-rocket-c-rocket-php" title="Anchor link to this header">#</a> </h2> <figure role="presentation"> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-php-galaxy/./rust-to-php.png" alt="Rust to PHP" loading="lazy" decoding="async" /></p> </figure> <p>To port our Rust parser into PHP, we first need to port it to C. It&#39;s been done in the previous episode. Two files result from this port to C: <code>libgutenberg_post_parser.a</code> and <code>gutenberg_post_parser.h</code>, respectively a static library, and the header file.</p> <h3 id="">Bootstrap with a skeleton<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h3> <p>PHP comes with <a rel="noopener external" target="_blank" href="http://php.net/manual/en/internals2.buildsys.skeleton.php">a script to create an extension skeleton</a>/template, called <a rel="noopener external" target="_blank" href="https://github.com/php/php-src/blob/master/ext/ext_skel.php"><code>ext_skel.php</code></a>. This script is accessible from the source of the Zend Engine virtual machine (which we will refer to as <code>php-src</code>). 
One can invoke the script like this:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cd php-src/ext/</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> ./ext_skel.php \</span></span> <span class="giallo-l"><span> --ext gutenberg_post_parser \</span></span> <span class="giallo-l"><span> --author &#39;Ivan Enderlin&#39; \</span></span> <span class="giallo-l"><span> --dir /path/to/extension \</span></span> <span class="giallo-l"><span> --onlyunix</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cd /path/to/extension</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> ls gutenberg_post_parser</span></span> <span class="giallo-l"><span>tests/</span></span> <span class="giallo-l"><span>.gitignore</span></span> <span class="giallo-l"><span>CREDITS</span></span> <span class="giallo-l"><span>config.m4</span></span> <span class="giallo-l"><span>gutenberg_post_parser.c</span></span> <span class="giallo-l"><span>php_gutenberg_post_parser.h</span></span></code></pre> <p>The <code>ext_skel.php</code> script recommends going through the following steps:</p> <ul> <li>Rebuild the configuration of the PHP source (run <code>./buildconf</code> at the root of the <code>php-src</code> directory),</li> <li>Reconfigure the build system to enable the extension, like <code>./configure --enable-gutenberg_post_parser</code>,</li> <li>Build with <code>make</code>,</li> <li>Done.</li> </ul> <p>But our extension is very likely to live outside the <code>php-src</code> tree. So we will use <code>phpize</code> instead. <code>phpize</code> is an executable that comes with <code>php</code>, <code>php-cgi</code>, <code>phpdbg</code>, <code>php-config</code> etc. It allows compiling extensions against an already compiled <code>php</code> binary, which is perfect in our case! 
We will use it like this :</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cd /path/to/extension/gutenberg_post_parser</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Get the bin directory </span><span class="z-keyword">for</span><span> PHP utilities.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span class="z-variable"> PHP_PREFIX_BIN</span><span class="z-keyword z-operator">=</span><span>$(</span><span class="z-entity z-name">php-config</span><span class="z-constant z-other"> --prefix</span><span>)</span><span class="z-string">/bin</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Clean (</span><span class="z-entity z-name">except</span><span class="z-string"> if it is the first run</span><span>).</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span class="z-variable"> $PHP_PREFIX_BIN</span><span>/phpize --clean</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # “phpize” the extension.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span class="z-variable"> $PHP_PREFIX_BIN</span><span>/phpize</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Configure the extension </span><span class="z-keyword">for</span><span> a particular PHP version.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> ./configure</span><span class="z-variable"> --with-php-config</span><span class="z-keyword z-operator">=</span><span class="z-variable">$PHP_PREFIX_BIN</span><span class="z-string">/php-config</span></span> <span class="giallo-l"></span> 
<span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Compile.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> make install</span></span></code></pre> <p>In this post, we will not show every edit we made; we will focus instead on the extension binding. <a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/tree/master/bindings/php/extension/gutenberg_post_parser">All the sources can be found here</a>. In short, here is the <code>config.m4</code> file:</p> <pre class="giallo z-code"><code data-lang="plain"><span class="giallo-l"><span>PHP_ARG_ENABLE(gutenberg_post_parser, whether to enable gutenberg_post_parser support,</span></span> <span class="giallo-l"><span>[ --enable-gutenberg_post_parser Include gutenberg_post_parser support], no)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>if  test &quot;$PHP_GUTENBERG_POST_PARSER&quot; != &quot;no&quot;; then</span></span> <span class="giallo-l"><span>  PHP_SUBST(GUTENBERG_POST_PARSER_SHARED_LIBADD)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>  PHP_ADD_LIBRARY_WITH_PATH(gutenberg_post_parser, ., GUTENBERG_POST_PARSER_SHARED_LIBADD)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>  PHP_NEW_EXTENSION(gutenberg_post_parser, gutenberg_post_parser.c, $ext_shared)</span></span> <span class="giallo-l"><span>fi</span></span></code></pre> <p>What it does is basically the following:</p> <ul> <li>Register the <code>--enable-gutenberg_post_parser</code> option in the build system (<code>PHP_ARG_ENABLE</code> defines an <code>--enable-*</code> option), and</li> <li>Declare the static library to link against, and the source of the extension itself.</li> </ul> <p>We must add the <code>libgutenberg_post_parser.a</code> and <code>gutenberg_post_parser.h</code> files in the same directory (a symlink is perfect), to get a structure such as:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span 
class="giallo-l"><span class="z-punctuation z-separator">$</span><span> ls gutenberg_post_parser</span></span> <span class="giallo-l"><span>tests/ # from ext_skel</span></span> <span class="giallo-l"><span>.gitignore # from ext_skel</span></span> <span class="giallo-l"><span>CREDITS # from ext_skel</span></span> <span class="giallo-l"><span>config.m4 # from ext_skel (edited)</span></span> <span class="giallo-l"><span>gutenberg_post_parser.c # from ext_skel (will be edited)</span></span> <span class="giallo-l"><span>gutenberg_post_parser.h # from Rust</span></span> <span class="giallo-l"><span>libgutenberg_post_parser.a # from Rust</span></span> <span class="giallo-l"><span>php_gutenberg_post_parser.h # from ext_skel</span></span></code></pre> <p>The core of the extension is the <code>gutenberg_post_parser.c</code> file. This file is responsible for creating the module and for binding our Rust code to PHP.</p> <h3 id="-1">The module, aka the extension<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h3> <p>As said, we will work in the <code>gutenberg_post_parser.c</code> file. 
First, let&#39;s include everything we need:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-keyword">#include</span><span class="z-string"> &quot;php.h&quot;</span></span> <span class="giallo-l"><span class="z-keyword">#include</span><span class="z-string"> &quot;ext/standard/info.h&quot;</span></span> <span class="giallo-l"><span class="z-keyword">#include</span><span class="z-string"> &quot;php_gutenberg_post_parser.h&quot;</span></span> <span class="giallo-l"><span class="z-keyword">#include</span><span class="z-string"> &quot;gutenberg_post_parser.h&quot;</span></span></code></pre> <p>The last line includes the <code>gutenberg_post_parser.h</code> file generated by Rust (more precisely, by <code>cbindgen</code>; if you don&#39;t remember, <a href="https://mnt.io/series/from-rust-to-beyond/the-c-galaxy/">take a look at the previous episode</a>).</p> <p>Then, we have to decide what API we want to expose to PHP. As a reminder, the Rust parser produces an AST defined as:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Block</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> name</span><span class="z-keyword z-operator">:</span><span> (</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;,</span><span class="z-entity z-name"> Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;),</span></span> <span class="giallo-l"><span class="z-variable"> attributes</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity 
z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;&gt;,</span></span> <span class="giallo-l"><span class="z-variable"> children</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;&gt;</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> Phrase</span><span>(</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;)</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The C variant of the AST is very similar (with more structures, but the idea is almost identical). So in PHP, the following structure has been selected:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Gutenberg_Parser_Block</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-keyword"> string</span><span class="z-variable"> $namespace</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-keyword"> string</span><span class="z-variable"> $name</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-keyword"> string</span><span class="z-variable"> $attributes</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-keyword"> array</span><span class="z-variable"> $children</span><span class="z-punctuation 
z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Gutenberg_Parser_Phrase</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-keyword"> string</span><span class="z-variable"> $content</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> gutenberg_post_parse</span><span>(</span><span class="z-keyword">string</span><span class="z-variable"> $gutenberg_post</span><span>)</span><span class="z-keyword z-operator">:</span><span class="z-keyword"> array</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The <code>gutenberg_post_parse</code> function will output an array of objects of kind <code>Gutenberg_Parser_Block</code> or <code>Gutenberg_Parser_Phrase</code>, i.e. 
our AST.</p> <p>So, let&#39;s declare those classes!</p> <h3 id="-2">Declare the classes<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h3> <p><em>Note: The next 4 code blocks are not the core of the post; they are just code that needs to be written. You can skip them if you are not about to write a PHP extension.</em></p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span>zend_class_entry </span><span class="z-keyword z-operator">*</span><span>gutenberg_parser_block_class_entry</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>zend_class_entry </span><span class="z-keyword z-operator">*</span><span>gutenberg_parser_phrase_class_entry</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>zend_object_handlers gutenberg_parser_node_class_entry_handlers</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span> _gutenberg_parser_node </span><span class="z-punctuation z-section">{</span></span> <span class="giallo-l"><span> zend_object zobj</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> gutenberg_parser_node</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>A class entry represents a specific class type. A handler is associated with a class entry. The logic is somewhat complicated. 
If you need more details, I recommend reading the <a rel="noopener external" target="_blank" href="http://www.phpinternalsbook.com/">PHP Internals Book</a>.</p> <p>Then, let&#39;s create a function to instantiate those objects:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-storage">static</span><span> zend_object </span><span class="z-keyword z-operator">*</span><span class="z-entity z-name z-function">create_parser_node_object</span><span class="z-punctuation z-section">(</span><span>zend_class_entry </span><span class="z-keyword z-operator">*</span><span class="z-variable z-parameter">class_entry</span><span class="z-punctuation z-section">)</span></span> <span class="giallo-l"><span class="z-punctuation z-section">{</span></span> <span class="giallo-l"><span> gutenberg_parser_node </span><span class="z-keyword z-operator">*</span><span>gutenberg_parser_node_object</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> gutenberg_parser_node_object </span><span class="z-keyword z-operator">=</span><span class="z-entity z-name z-function"> ecalloc</span><span class="z-punctuation z-section">(</span><span class="z-constant z-numeric">1</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> sizeof</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">*</span><span>gutenberg_parser_node_object</span><span class="z-punctuation z-section">)</span><span class="z-keyword z-operator"> +</span><span class="z-entity z-name z-function"> zend_object_properties_size</span><span class="z-punctuation z-section">(</span><span>class_entry</span><span class="z-punctuation z-section">))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> zend_object_std_init</span><span class="z-punctuation 
z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">gutenberg_parser_node_object</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">zobj</span><span class="z-punctuation z-separator">,</span><span> class_entry</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> object_properties_init</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">gutenberg_parser_node_object</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">zobj</span><span class="z-punctuation z-separator">,</span><span> class_entry</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> gutenberg_parser_node_object</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">zobj</span><span class="z-punctuation z-separator">.</span><span class="z-variable">handlers</span><span class="z-keyword z-operator"> = &amp;</span><span>gutenberg_parser_node_class_entry_handlers</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-keyword z-operator"> &amp;</span><span class="z-variable">gutenberg_parser_node_object</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">zobj</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>Then, let&#39;s create a function to free those objects. 
It works in two steps: Destruct the object by calling its destructor (in the user-land), then free it for real (in the VM-land):</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-storage">static</span><span class="z-storage z-type"> void</span><span class="z-entity z-name z-function"> destroy_parser_node_object</span><span class="z-punctuation z-section">(</span><span>zend_object </span><span class="z-keyword z-operator">*</span><span class="z-variable z-parameter">gutenberg_parser_node_object</span><span class="z-punctuation z-section">)</span></span> <span class="giallo-l"><span class="z-punctuation z-section">{</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> zend_objects_destroy_object</span><span class="z-punctuation z-section">(</span><span>gutenberg_parser_node_object</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">static</span><span class="z-storage z-type"> void</span><span class="z-entity z-name z-function"> free_parser_node_object</span><span class="z-punctuation z-section">(</span><span>zend_object </span><span class="z-keyword z-operator">*</span><span class="z-variable z-parameter">gutenberg_parser_node_object</span><span class="z-punctuation z-section">)</span></span> <span class="giallo-l"><span class="z-punctuation z-section">{</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> zend_object_std_dtor</span><span class="z-punctuation z-section">(</span><span>gutenberg_parser_node_object</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>Then, let&#39;s initialize the “module”, i.e. the extension. 
During the initialisation, we will create the classes in the user-land, declare their attributes etc.</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-entity z-name z-function">PHP_MINIT_FUNCTION</span><span class="z-punctuation z-section">(</span><span>gutenberg_post_parser</span><span class="z-punctuation z-section">)</span></span> <span class="giallo-l"><span class="z-punctuation z-section">{</span></span> <span class="giallo-l"><span> zend_class_entry class_entry</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Declare Gutenberg_Parser_Block.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> INIT_CLASS_ENTRY</span><span class="z-punctuation z-section">(</span><span>class_entry</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;Gutenberg_Parser_Block&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-language"> NULL</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> gutenberg_parser_block_class_entry </span><span class="z-keyword z-operator">=</span><span class="z-entity z-name z-function"> zend_register_internal_class</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span>class_entry TSRMLS_CC</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Declare the create handler.</span></span> <span class="giallo-l"><span class="z-variable"> gutenberg_parser_block_class_entry</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">create_object</span><span class="z-keyword z-operator"> =</span><span> create_parser_node_object</span><span 
class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // The class is final.</span></span> <span class="giallo-l"><span class="z-variable"> gutenberg_parser_block_class_entry</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">ce_flags</span><span class="z-keyword z-operator"> |=</span><span> ZEND_ACC_FINAL</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Declare the `namespace` public attribute,</span></span> <span class="giallo-l"><span class="z-comment"> // with an empty string for the default value.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> zend_declare_property_string</span><span class="z-punctuation z-section">(</span><span>gutenberg_parser_block_class_entry</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;namespace&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> sizeof</span><span class="z-punctuation z-section">(</span><span class="z-string">&quot;namespace&quot;</span><span class="z-punctuation z-section">)</span><span class="z-keyword z-operator"> -</span><span class="z-constant z-numeric"> 1</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;&quot;</span><span class="z-punctuation z-separator">,</span><span> ZEND_ACC_PUBLIC</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Declare the `name` public attribute,</span></span> <span class="giallo-l"><span class="z-comment"> // with an empty string for the default value.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> zend_declare_property_string</span><span class="z-punctuation 
z-section">(</span><span>gutenberg_parser_block_class_entry</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;name&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> sizeof</span><span class="z-punctuation z-section">(</span><span class="z-string">&quot;name&quot;</span><span class="z-punctuation z-section">)</span><span class="z-keyword z-operator"> -</span><span class="z-constant z-numeric"> 1</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;&quot;</span><span class="z-punctuation z-separator">,</span><span> ZEND_ACC_PUBLIC</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Declare the `attributes` public attribute,</span></span> <span class="giallo-l"><span class="z-comment"> // with `NULL` for the default value.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> zend_declare_property_null</span><span class="z-punctuation z-section">(</span><span>gutenberg_parser_block_class_entry</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;attributes&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> sizeof</span><span class="z-punctuation z-section">(</span><span class="z-string">&quot;attributes&quot;</span><span class="z-punctuation z-section">)</span><span class="z-keyword z-operator"> -</span><span class="z-constant z-numeric"> 1</span><span class="z-punctuation z-separator">,</span><span> ZEND_ACC_PUBLIC</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Declare the `children` public attribute,</span></span> <span class="giallo-l"><span class="z-comment"> // with `NULL` for 
the default value.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> zend_declare_property_null</span><span class="z-punctuation z-section">(</span><span>gutenberg_parser_block_class_entry</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;children&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> sizeof</span><span class="z-punctuation z-section">(</span><span class="z-string">&quot;children&quot;</span><span class="z-punctuation z-section">)</span><span class="z-keyword z-operator"> -</span><span class="z-constant z-numeric"> 1</span><span class="z-punctuation z-separator">,</span><span> ZEND_ACC_PUBLIC</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Declare Gutenberg_Parser_Phrase.</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> … skip …</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">    // Declare Gutenberg parser node object handlers.</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">    memcpy</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span>gutenberg_parser_node_class_entry_handlers</span><span class="z-punctuation z-separator">,</span><span class="z-entity z-name z-function"> zend_get_std_object_handlers</span><span class="z-punctuation z-section">()</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> sizeof</span><span class="z-punctuation z-section">(</span><span>gutenberg_parser_node_class_entry_handlers</span><span class="z-punctuation z-section">))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span 
class="z-variable"> gutenberg_parser_node_class_entry_handlers</span><span class="z-punctuation z-separator">.</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> XtOffsetOf</span><span class="z-punctuation z-section">(</span><span>gutenberg_parser_node</span><span class="z-punctuation z-separator">,</span><span> zobj</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> gutenberg_parser_node_class_entry_handlers</span><span class="z-punctuation z-separator">.</span><span class="z-variable">dtor_obj</span><span class="z-keyword z-operator"> =</span><span> destroy_parser_node_object</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> gutenberg_parser_node_class_entry_handlers</span><span class="z-punctuation z-separator">.</span><span class="z-variable">free_obj</span><span class="z-keyword z-operator"> =</span><span> free_parser_node_object</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span> SUCCESS</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>If you are still reading, first: Thank you, and second: Congrats!</p> <p>Then, there are the <code>PHP_RINIT_FUNCTION</code> and <code>PHP_MINFO_FUNCTION</code> functions, which are already generated by the <code>ext_skel.php</code> script. 
Same for the module entry definition and other module configuration details.</p> <h3 id="gutenberg-post-parse">The <code>gutenberg_post_parse</code> function<a role="presentation" class="anchor" href="#gutenberg-post-parse" title="Anchor link to this header">#</a> </h3> <p>We will now focus on the <code>gutenberg_post_parse</code> PHP function. This function takes a string as a single argument  and returns either <code>false</code> if the parsing failed, or an array of objects of kind <code>Gutenberg_Parser_Block</code> or <code>Gutenberg_Parser_Phrase</code> otherwise. Let&#39;s write it! Notice that it is declared with <a rel="noopener external" target="_blank" href="https://github.com/php/php-src/blob/52d91260df54995a680f420884338dfd9d5a0d49/main/php.h#L400">the <code>PHP_FUNCTION</code> macro</a>.</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-entity z-name z-function">PHP_FUNCTION</span><span class="z-punctuation z-section">(</span><span>gutenberg_post_parse</span><span class="z-punctuation z-section">)</span></span> <span class="giallo-l"><span class="z-punctuation z-section">{</span></span> <span class="giallo-l"><span class="z-storage z-type"> char</span><span class="z-keyword z-operator"> *</span><span>input</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage z-type"> size_t</span><span> input_len</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Read the input as a string.</span></span> <span class="giallo-l"><span class="z-keyword"> if</span><span class="z-punctuation z-section"> (</span><span class="z-entity z-name z-function">zend_parse_parameters</span><span class="z-punctuation z-section">(</span><span class="z-entity z-name z-function">ZEND_NUM_ARGS</span><span class="z-punctuation z-section">()</span><span> TSRMLS_CC</span><span class="z-punctuation 
z-separator">,</span><span class="z-string"> &quot;s&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> &amp;</span><span>input</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> &amp;</span><span>input_len</span><span class="z-punctuation z-section">)</span><span class="z-keyword z-operator"> ==</span><span> FAILURE</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span></span></code></pre> <p>At this step, the argument has been declared and typed as a string (<code>"s"</code>). The string value is in <code>input</code> and the string length is in <code>input_len</code>.</p> <p>The next step is to parse the <code>input</code>. (The length of the string is not needed). This is where we are going to call our Rust code! 
Let&#39;s do that:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-comment">    // Parse the input.</span></span> <span class="giallo-l"><span>    Result parser_result </span><span class="z-keyword z-operator">=</span><span class="z-entity z-name z-function"> parse</span><span class="z-punctuation z-section">(</span><span>input</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">    // If parsing failed, then return false.</span></span> <span class="giallo-l"><span class="z-keyword">    if</span><span class="z-punctuation z-section"> (</span><span>parser_result.tag </span><span class="z-keyword z-operator">==</span><span> Err</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span>        RETURN_FALSE</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">    }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">    // Else map the Rust AST into a PHP array.</span></span> <span class="giallo-l"><span class="z-storage">    const</span><span> Vector_Node nodes </span><span class="z-keyword z-operator">=</span><span> parser_result.ok._0</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The <code>Result</code> type and the <code>parse</code> function come from Rust. If you don&#39;t remember those types, please <a href="https://mnt.io/series/from-rust-to-beyond/the-c-galaxy/">read the previous episode about the C galaxy</a>.</p> <p>Zend Engine has a macro called <code>RETURN_FALSE</code> to return… <code>false</code>! Handy, isn&#39;t it?</p> <p>Finally, if everything went well, we get back a collection of nodes as a <code>Vector_Node</code> type.</p> <p>The next step is to map those Rust/C types into PHP types, i.e. 
an array of the Gutenberg classes. Let&#39;s go:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-comment">    // Note: return_value is a “magic” variable that holds the value to be returned.</span></span> <span class="giallo-l"><span class="z-comment"> //</span></span> <span class="giallo-l"><span class="z-comment">    // Allocate an array.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">    array_init_size</span><span class="z-punctuation z-section">(</span><span>return_value</span><span class="z-punctuation z-separator">,</span><span> nodes.length</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">    // Map the Rust AST.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">    into_php_objects</span><span class="z-punctuation z-section">(</span><span>return_value</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> &amp;</span><span class="z-variable z-parameter">nodes</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Done 😁! Oh wait… the <code>into_php_objects</code> function still needs to be written!</p> <h3 id="into-php-objects">The <code>into_php_objects</code> function<a role="presentation" class="anchor" href="#into-php-objects" title="Anchor link to this header">#</a> </h3> <p>This function is not terribly complex: It&#39;s just full of Zend Engine specific API as expected. We are going to explain how to map a <code>Block</code> into a <code>Gutenberg_Parser_Block</code> object, and leave the mapping of <code>Phrase</code> into <code>Gutenberg_Parser_Phrase</code> to the assiduous readers. 
And there we go:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-storage z-type">void</span><span class="z-entity z-name z-function"> into_php_objects</span><span class="z-punctuation z-section">(</span><span>zval </span><span class="z-keyword z-operator">*</span><span class="z-variable z-parameter">php_array</span><span class="z-punctuation z-separator">,</span><span class="z-storage"> const</span><span> Vector_Node </span><span class="z-keyword z-operator">*</span><span class="z-variable z-parameter">nodes</span><span class="z-punctuation z-section">)</span></span> <span class="giallo-l"><span class="z-punctuation z-section">{</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-storage z-type"> uintptr_t</span><span> number_of_nodes </span><span class="z-keyword z-operator">=</span><span class="z-variable"> nodes</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">length</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> if</span><span class="z-punctuation z-section"> (</span><span>number_of_nodes </span><span class="z-keyword z-operator">==</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Iterate over all nodes.</span></span> <span class="giallo-l"><span class="z-keyword"> for</span><span class="z-punctuation z-section"> (</span><span class="z-storage z-type">uintptr_t</span><span> nth </span><span class="z-keyword z-operator">=</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span><span> nth 
</span><span class="z-keyword z-operator">&lt;</span><span> number_of_nodes</span><span class="z-punctuation z-terminator">;</span><span class="z-keyword z-operator"> ++</span><span>nth</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span> Node node </span><span class="z-keyword z-operator">=</span><span class="z-variable"> nodes</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">buffer</span><span>[nth]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> if</span><span class="z-punctuation z-section"> (</span><span class="z-variable">node</span><span class="z-punctuation z-separator">.</span><span class="z-variable">tag</span><span class="z-keyword z-operator"> ==</span><span> Block</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-comment"> // Map Block into Gutenberg_Parser_Block.</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span><span class="z-keyword"> else if</span><span class="z-punctuation z-section"> (</span><span class="z-variable">node</span><span class="z-punctuation z-separator">.</span><span class="z-variable">tag</span><span class="z-keyword z-operator"> ==</span><span> Phrase</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-comment"> // Map Phrase into Gutenberg_Parser_Phrase.</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>Now let&#39;s map a block. 
The process is the following:</p> <ol> <li>Allocate PHP strings for the block namespace, and for the block name,</li> <li>Allocate an object,</li> <li>Set the block namespace and the block name to their respective object properties,</li> <li>Allocate a PHP string for the block attributes if any,</li> <li>Set the block attributes to its respective object property,</li> <li>If any children, initialise a new array, and call <code>into_php_objects</code> with the child nodes and the new array,</li> <li>Set the children to its respective object property,</li> <li>Finally, add the block object inside the array to be returned.</li> </ol> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-storage">const</span><span> Block_Body block </span><span class="z-keyword z-operator">=</span><span> node.block</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>zval php_block</span><span class="z-punctuation z-separator">,</span><span> php_block_namespace</span><span class="z-punctuation z-separator">,</span><span> php_block_name</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// 1. 
Prepare the PHP strings.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">ZVAL_STRINGL</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-parameter">php_block_namespace</span><span class="z-punctuation z-separator">,</span><span> block.namespace.pointer</span><span class="z-punctuation z-separator">,</span><span> block.namespace.length</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">ZVAL_STRINGL</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-parameter">php_block_name</span><span class="z-punctuation z-separator">,</span><span> block.name.pointer</span><span class="z-punctuation z-separator">,</span><span> block.name.length</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>Do you remember that namespace, name and other similar data are of type <code>Slice_c_char</code>? It&#39;s just a structure with a pointer and a length. The pointer points to the original input string, so that there is no copy (this is actually the definition of a slice). Well, Zend Engine has <a rel="noopener external" target="_blank" href="https://github.com/php/php-src/blob/52d91260df54995a680f420884338dfd9d5a0d49/Zend/zend_API.h#L563-L565">a <code>ZVAL_STRINGL</code> macro</a> that creates a string from a pointer and a length, great! Unfortunately for us, Zend Engine does <a rel="noopener external" target="_blank" href="https://github.com/php/php-src/blob/52d91260df54995a680f420884338dfd9d5a0d49/Zend/zend_string.h#L152-L159">a copy behind the scenes</a>… There is no way to keep only the pointer and the length, but at least the number of copies stays small. 
I think the copy is made so that Zend Engine takes full ownership of the data, which is required for the garbage collector.</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-comment">// 2. Create the Gutenberg_Parser_Block object.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">object_init_ex</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-parameter">php_block</span><span class="z-punctuation z-separator">,</span><span> gutenberg_parser_block_class_entry</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The object has been instantiated from the class represented by <code>gutenberg_parser_block_class_entry</code>.</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-comment">// 3. Set the namespace and the name.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">add_property_zval</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-parameter">php_block</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;namespace&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> &amp;</span><span class="z-variable z-parameter">php_block_namespace</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">add_property_zval</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-parameter">php_block</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;name&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> &amp;</span><span 
class="z-variable z-parameter">php_block_name</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">zval_ptr_dtor</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-parameter">php_block_namespace</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">zval_ptr_dtor</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable z-parameter">php_block_name</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The <code>zval_ptr_dtor</code> function subtracts 1 from the reference counter, and frees the value once the counter reaches zero. Since <code>add_property_zval</code> takes its own reference to the value, releasing ours leaves the object as the only owner of the strings. This is required for the garbage collector.</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-comment">// 4. 
Deal with block attributes if some.</span></span> <span class="giallo-l"><span class="z-keyword">if</span><span class="z-punctuation z-section"> (</span><span>block.attributes.tag </span><span class="z-keyword z-operator">==</span><span> Some</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span> Slice_c_char attributes </span><span class="z-keyword z-operator">=</span><span class="z-variable"> block</span><span class="z-punctuation z-separator">.</span><span class="z-variable">attributes</span><span class="z-punctuation z-separator">.</span><span class="z-variable">some</span><span>.</span><span class="z-variable">_0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> zval php_block_attributes</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> ZVAL_STRINGL</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span>php_block_attributes</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> attributes</span><span class="z-punctuation z-separator">.</span><span class="z-variable">pointer</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> attributes</span><span class="z-punctuation z-separator">.</span><span class="z-variable">length</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // 5. 
Set the attributes.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> add_property_zval</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span>php_block</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;attributes&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> &amp;</span><span>php_block_attributes</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> zval_ptr_dtor</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span>php_block_attributes</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>It is similar to what has been done for <code>namespace</code> and <code>name</code>. Now let&#39;s continue with children.</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-comment">// 6. 
Handle children.</span></span> <span class="giallo-l"><span class="z-storage">const</span><span> Vector_Node </span><span class="z-keyword z-operator">*</span><span>children </span><span class="z-keyword z-operator">=</span><span class="z-punctuation z-section"> (</span><span class="z-storage">const</span><span> Vector_Node</span><span class="z-keyword z-operator">*</span><span class="z-punctuation z-section">) (</span><span>block.children</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">if</span><span class="z-punctuation z-section"> (</span><span>children</span><span class="z-keyword z-operator">-&gt;</span><span>length </span><span class="z-keyword z-operator">&gt;</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span> zval php_children_array</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> array_init_size</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span>php_children_array</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> children</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">length</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Recursion.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> into_php_objects</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span>php_children_array</span><span class="z-punctuation z-separator">,</span><span> children</span><span class="z-punctuation 
z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // 7. Set the children.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> add_property_zval</span><span class="z-punctuation z-section">(</span><span class="z-keyword z-operator">&amp;</span><span>php_block</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;children&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> &amp;</span><span>php_children_array</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> Z_DELREF</span><span class="z-punctuation z-section">(</span><span>php_children_array</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">free</span><span class="z-punctuation z-section">((</span><span class="z-storage z-type">void</span><span class="z-keyword z-operator">*</span><span class="z-punctuation z-section">)</span><span class="z-variable z-parameter"> children</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>Finally, add the block instance into the array to be returned:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-comment"> // 8. 
Insert the object in the collection.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> add_next_index_zval</span><span class="z-punctuation z-section">(</span><span>php_array</span><span class="z-punctuation z-separator">,</span><span class="z-keyword z-operator"> &amp;</span><span class="z-variable z-parameter">php_block</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p><a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/blob/master/bindings/php/extension/gutenberg_post_parser/gutenberg_post_parser.c">The entire code lives here</a>.</p> <h2 id="-3">PHP extension 🚀 PHP userland<a role="presentation" class="anchor" href="#-3" title="Anchor link to this header">#</a> </h2> <p>Now that the extension is written, we have to compile it. That&#39;s the repetitive set of commands we have shown above with <code>phpize</code>. Once the extension is compiled, the generated <code>gutenberg_post_parser.so</code> file must be placed in the extension directory. 
This directory can be found with the following command:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> php-config --extension-dir</span></span></code></pre> <p>For instance, on my computer, the extension directory is <code>/usr/local/Cellar/php/7.2.11/pecl/20170718</code>.</p> <p>Then, to enable the extension for a given execution, you must write:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> php -d</span><span class="z-variable"> extension</span><span class="z-keyword z-operator">=</span><span class="z-string">gutenberg_post_parser</span><span class="z-entity z-name"> -m</span><span class="z-keyword z-operator"> |</span><span> \</span></span> <span class="giallo-l"><span> grep gutenberg_post_parser</span></span></code></pre> <p>Or, to enable the extension for all executions, locate the <code>php.ini</code> file with <code>php --ini</code> and edit it to add:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span>extension=gutenberg_post_parser</span></span></code></pre> <p>Done!</p> <p>Now, let&#39;s use some reflection to check that the extension is correctly loaded and handled by PHP:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> php --re gutenberg_post_parser</span></span> <span class="giallo-l"><span>Extension [ &lt;persistent&gt; extension #64 gutenberg_post_parser version 0.1.0 ] {</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Functions {</span></span> <span class="giallo-l"><span> Function [ &lt;internal:gutenberg_post_parser&gt; function gutenberg_post_parse ] {</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Parameters [1] {</span></span> <span class="giallo-l"><span> Parameter #0 [ 
&lt;required&gt; $gutenberg_post_as_string ]</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Classes [2] {</span></span> <span class="giallo-l"><span> Class [ &lt;internal:gutenberg_post_parser&gt; final class Gutenberg_Parser_Block ] {</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Constants [0] {</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Static properties [0] {</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Static methods [0] {</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Properties [4] {</span></span> <span class="giallo-l"><span> Property [ &lt;default&gt; public $namespace ]</span></span> <span class="giallo-l"><span> Property [ &lt;default&gt; public $name ]</span></span> <span class="giallo-l"><span> Property [ &lt;default&gt; public $attributes ]</span></span> <span class="giallo-l"><span> Property [ &lt;default&gt; public $children ]</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Methods [0] {</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> Class [ &lt;internal:gutenberg_post_parser&gt; final class Gutenberg_Parser_Phrase ] {</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Constants [0] {</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Static properties [0] {</span></span> <span class="giallo-l"><span> }</span></span> <span 
class="giallo-l"></span> <span class="giallo-l"><span> - Static methods [0] {</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Properties [1] {</span></span> <span class="giallo-l"><span> Property [ &lt;default&gt; public $content ]</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> - Methods [0] {</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Everything looks good: There is one function and two classes that are defined as expected. Now, let&#39;s write some PHP code for the first time in this blog post!</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">    gutenberg_post_parse</span><span>(</span></span> <span class="giallo-l"><span class="z-string"> &#39;&lt;!-- wp:foo /--&gt;bar&lt;!-- wp:baz --&gt;qux&lt;!-- /wp:baz --&gt;&#39;</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * Will output:</span></span> <span class="giallo-l"><span class="z-comment"> * array(3) {</span></span> <span class="giallo-l"><span class="z-comment"> * [0]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * object(Gutenberg_Parser_Block)#1 (4) {</span></span> <span class="giallo-l"><span class="z-comment"> * 
[&quot;namespace&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * string(4) &quot;core&quot;</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;name&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * string(3) &quot;foo&quot;</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;attributes&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * NULL</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;children&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * NULL</span></span> <span class="giallo-l"><span class="z-comment"> * }</span></span> <span class="giallo-l"><span class="z-comment"> * [1]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * object(Gutenberg_Parser_Phrase)#2 (1) {</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;content&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * string(3) &quot;bar&quot;</span></span> <span class="giallo-l"><span class="z-comment"> * }</span></span> <span class="giallo-l"><span class="z-comment"> * [2]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * object(Gutenberg_Parser_Block)#3 (4) {</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;namespace&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * string(4) &quot;core&quot;</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;name&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * string(3) &quot;baz&quot;</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;attributes&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * NULL</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;children&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * array(1) {</span></span> <span 
class="giallo-l"><span class="z-comment"> * [0]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * object(Gutenberg_Parser_Phrase)#4 (1) {</span></span> <span class="giallo-l"><span class="z-comment"> * [&quot;content&quot;]=&gt;</span></span> <span class="giallo-l"><span class="z-comment"> * string(3) &quot;qux&quot;</span></span> <span class="giallo-l"><span class="z-comment"> * }</span></span> <span class="giallo-l"><span class="z-comment"> * }</span></span> <span class="giallo-l"><span class="z-comment"> * }</span></span> <span class="giallo-l"><span class="z-comment"> * }</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span></code></pre> <p>It works very well!</p> <h2 id="-4">Conclusion<a role="presentation" class="anchor" href="#-4" title="Anchor link to this header">#</a> </h2> <p>The journey is:</p> <ul> <li>A string written in PHP,</li> <li>Allocated by the Zend Engine from the Gutenberg extension,</li> <li>Passed to Rust through FFI (static library + header),</li> <li>Back to Zend Engine in the Gutenberg extension,</li> <li>To generate PHP objects,</li> <li>That are read by PHP.</li> </ul> <p>Rust really fits everywhere!</p> <p>We have seen in detail how to write a real-world parser in Rust, how to bind it to C and compile it to a static library in addition to C headers, how to create a PHP extension exposing one function and two objects, how to integrate the C binding into PHP, and how to use this extension in PHP.</p> <p>As a reminder, the C binding is about 150 lines of code. The PHP extension is about 300 lines of code, but subtracting the “decorations” (the boilerplate to declare and manage the extension) that are automatically generated, the PHP extension reduces to about 200 lines of code. 
Once again, I find this is a small surface of code to review considering the fact that the parser is still written in Rust, and modifying the parser will not impact the bindings (except if the AST is updated, obviously)!</p> <p>PHP is a language with a garbage collector. It explains why all strings are copied, so that they are owned by PHP itself. However, the fact that Rust does not copy any data saves memory allocations and deallocations, which is the biggest cost most of the time.</p> <p>Rust also provides safety. This property can be questioned considering the number of bindings we are going through, Rust to C to PHP: does it still hold? From the Rust perspective, yes, but everything that happens inside C or PHP must be considered unsafe. Special care must be taken in the C binding to handle all situations.</p> <p>Is it still fast? Well, let&#39;s benchmark. As a reminder, the first goal of this experiment was to tackle the bad performance of the original PEG.js parser. On the JavaScript side, WASM and ASM.js have proven to be much faster (see <a href="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/">the WebAssembly galaxy</a>, and <a href="https://mnt.io/series/from-rust-to-beyond/the-asm-js-galaxy/">the ASM.js galaxy</a>). For PHP, <a rel="noopener external" target="_blank" href="https://github.com/nylen/phpegjs"><code>phpegjs</code> is used</a>: it reads the grammar written for PEG.js and compiles it to PHP. 
Let&#39;s see how they compare:</p> <figure> <table><thead><tr><th>Document</th><th>PEG PHP parser (ms)</th><th>Rust parser as a PHP extension (ms)</th><th>speedup</th></tr></thead><tbody> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/demo-post.html"><code>demo-post.html</code></a></td><td>30.409</td><td>0.0012</td><td>× 25341</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/shortcode-shortcomings.html"><code>shortcode-shortcomings.html</code></a></td><td>76.39</td><td>0.096</td><td>× 796</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/redesigning-chrome-desktop.html"><code>redesigning-chrome-desktop.html</code></a></td><td>225.824</td><td>0.399</td><td>× 566</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/web-at-maximum-fps.html"><code>web-at-maximum-fps.html</code></a></td><td>173.495</td><td>0.275</td><td>× 631</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/early-adopting-the-future.html"><code>early-adopting-the-future.html</code></a></td><td>280.433</td><td>0.298</td><td>× 941</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/pygmalian-raw-html.html"><code>pygmalian-raw-html.html</code></a></td><td>377.392</td><td>0.052</td><td>× 7258</td></tr> <tr><td><a rel="noopener external" target="_blank" 
href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/moby-dick-parsed.html"><code>moby-dick-parsed.html</code></a></td><td>5,437.630</td><td>5.037</td><td>× 1080</td></tr> </tbody></table> <figcaption> <p>Benchmarks between PEG PHP parser and Rust parser as a PHP extension.</p> </figcaption> </figure> <p>The PHP extension of the Rust parser is on average 5230 times faster than the current PEG PHP implementation. The median of the speedup is 941.</p> <p>Another huge issue was that the PEG parser was not able to handle many Gutenberg documents because of a memory limit. Of course, it is possible to raise the memory limit, but it is not ideal. With the Rust parser as a PHP extension, memory stays constant and close to the size of the parsed document.</p> <p>I reckon we can optimise the extension further to generate an iterator instead of an array. This is something I want to explore, to analyse its impact on performance. The PHP Internals Book has a <a rel="noopener external" target="_blank" href="http://www.phpinternalsbook.com/classes_objects/iterators.html">chapter about Iterators</a>.</p> <p>Thanks for reading!</p> The C galaxy 2018-09-11T00:00:00+00:00 2018-09-11T00:00:00+00:00 Unknown https://mnt.io/series/from-rust-to-beyond/the-c-galaxy/ <p>The galaxy we will explore today is the C galaxy. This post will explain what C is (briefly), how to compile any Rust program to C in theory, and how to do that practically with our Rust parser from the Rust side and the C side. We will also see how to test such a binding.</p> <h2 id="what-is-c-and-why">What is C, and why?<a role="presentation" class="anchor" href="#what-is-c-and-why" title="Anchor link to this header">#</a> </h2> <p><a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a> is probably the most used and known programming language in the world. 
Quoting Wikipedia:</p> <blockquote> <p>C […] is a general-purpose, imperative computer programming language, supporting structured programming, lexical variable scope and recursion, while a static type system prevents many unintended operations. By design, C provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, including operating systems, as well as various application software for computers ranging from supercomputers to embedded systems.</p> </blockquote> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-c-galaxy/./dennis-ritchie.png" alt="Dennis Ritchie" loading="lazy" decoding="async" /></p> <figcaption> <p>Dennis Ritchie, the inventor of the C language.</p> </figcaption> </figure> <p>The impact of C on the programming language world is probably without precedent. Almost everything is written in C, starting with operating systems. Today, it is one of the few common denominators between any program on any system on any machine in the world. In other words, being compatible with C opens a large door to everything. Your program will be able to talk directly to any program easily.</p> <p>Because languages like PHP or Python are written in C, in our particular Gutenberg parser use case, it means that the parser can be embedded and used by PHP or Python directly, with almost no overhead. 
Neat!</p> <h2 id="">Rust 🚀 C<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <figure role="presentation"> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-c-galaxy/./rust-to-c.png" alt="Rust to C" loading="lazy" decoding="async" /></p> </figure> <p>In order to use Rust from C, one needs two elements:</p> <ol> <li>A static library (<code>.a</code> file),</li> <li>A header file (<code>.h</code> file).</li> </ol> <h3 id="-1">The theory<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h3> <p>To compile a Rust project into a static library, the <code>crate-type</code> property must contain the <code>staticlib</code> value. Let&#39;s edit the <code>Cargo.toml</code> file as follows:</p> <pre class="giallo z-code"><code data-lang="toml"><span class="giallo-l"><span>[</span><span class="z-entity z-name">lib</span><span>]</span></span> <span class="giallo-l"><span class="z-variable">name</span><span class="z-punctuation z-separator"> =</span><span class="z-string"> &quot;gutenberg_post_parser&quot;</span></span> <span class="giallo-l"><span class="z-variable">crate-type</span><span class="z-punctuation z-separator"> =</span><span> [</span><span class="z-string">&quot;staticlib&quot;</span><span>]</span></span></code></pre> <p>Once <code>cargo build --release</code> is run, a <code>libgutenberg_post_parser.a</code> file is created in <code>target/release/</code>. Done. <code>cargo</code> and <code>rustc</code> make this step really a doddle.</p> <p>Now the header file. It can be written manually, but it&#39;s tedious and it easily gets outdated. The goal is to <em>automatically</em> generate it. Enter <a rel="noopener external" target="_blank" href="https://github.com/eqrion/cbindgen/"><code>cbindgen</code></a>:</p> <blockquote> <p><code>cbindgen</code> can be used to generate C bindings for Rust code. 
It is currently being developed to support creating bindings for <a rel="noopener external" target="_blank" href="https://github.com/servo/webrender/">WebRender</a>, but has been designed to support any project.</p> </blockquote> <p>To install <code>cbindgen</code>, edit your <code>Cargo.toml</code> file as follows:</p> <pre class="giallo z-code"><code data-lang="toml"><span class="giallo-l"><span>[</span><span class="z-entity z-name">package</span><span>]</span></span> <span class="giallo-l"><span class="z-variable">build</span><span class="z-punctuation z-separator"> =</span><span class="z-string"> &quot;build.rs&quot;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>[</span><span class="z-entity z-name">build-dependencies</span><span>]</span></span> <span class="giallo-l"><span class="z-variable">cbindgen</span><span class="z-punctuation z-separator"> =</span><span class="z-string"> &quot;^0.6.0&quot;</span></span></code></pre> <p>Actually, <code>cbindgen</code> comes in two flavors: a CLI executable or a library. I prefer to use the library approach, which makes installation easier.</p> <p>Note that Cargo has been instructed to use the <code>build.rs</code> file to build the project. This file is an appropriate place to generate the C header file with <code>cbindgen</code>. 
Let&#39;s write it!</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-storage">extern</span><span class="z-keyword"> crate</span><span> cbindgen;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> main</span><span>() {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> crate_dir</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> std</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">env</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">var</span><span>(</span><span class="z-string">&quot;CARGO_MANIFEST_DIR&quot;</span><span>)</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">unwrap</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> cbindgen</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">generate</span><span>(</span><span class="z-variable">crate_dir</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">expect</span><span>(</span><span class="z-string">&quot;Unable to generate C bindings.&quot;</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">write_to_file</span><span>(</span><span class="z-string">&quot;dist/gutenberg_post_parser.h&quot;</span><span>);</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>With this information, <code>cbindgen</code> will scan the source code of the project and will automatically generate the C declarations in the <code>dist/gutenberg_post_parser.h</code> header file. 
Scanning will be detailed in a moment, but before that, let&#39;s quickly see how to control the content of the header file. With the code snippet above, <code>cbindgen</code> will look for a <code>cbindgen.toml</code> configuration file in the <code>CARGO_MANIFEST_DIR</code> directory, i.e. the root of your crate. Mine looks like this:</p> <pre class="giallo z-code"><code data-lang="toml"><span class="giallo-l"><span class="z-variable">header</span><span class="z-punctuation z-separator"> =</span><span class="z-string"> &quot;&quot;&quot;</span></span> <span class="giallo-l"><span class="z-string">/*</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-string">Gutenberg Post Parser, the C bindings.</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-string">Warning, this file is autogenerated by `cbindgen`.</span></span> <span class="giallo-l"><span class="z-string">Do not modify this manually.</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-string">*/&quot;&quot;&quot;</span></span> <span class="giallo-l"><span class="z-variable">tab_width</span><span class="z-punctuation z-separator"> =</span><span class="z-constant z-numeric"> 4</span></span> <span class="giallo-l"><span class="z-variable">language</span><span class="z-punctuation z-separator"> =</span><span class="z-string"> &quot;C&quot;</span></span></code></pre> <p>It is fairly self-explanatory. <a rel="noopener external" target="_blank" href="https://github.com/eqrion/cbindgen/#configuration">The documentation details the configuration</a> very well.</p> <p><code>cbindgen</code> will scan the code and will pick up <code>struct</code>s or <code>enum</code>s that have the attribute <code>#[repr(C)]</code>, <code>#[repr(_size_)]</code> or <code>#[repr(transparent)]</code>, as well as functions that are marked as <code>extern "C"</code> and are public. 
So when one writes:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[repr(</span><span class="z-entity z-name">C</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> struct</span><span class="z-entity z-name"> Slice</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> pointer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">const</span><span class="z-variable"> c_char</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[repr(</span><span class="z-entity z-name">C</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> Option</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Some</span><span>(</span><span class="z-entity z-name">Slice</span><span>),</span></span> <span class="giallo-l"><span class="z-entity z-name"> None</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-string"> &quot;C&quot;</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> parse</span><span>(</span><span class="z-variable">pointer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">const</span><span class="z-variable"> c_char</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-variable"> c_void</span><span> { … }</span></span></code></pre> <p>Then <code>cbindgen</code> will generate this:</p> <pre class="giallo z-code"><code 
data-lang="c"><span class="giallo-l"><span class="z-comment">// … header comment …</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-storage z-type"> char</span><span class="z-keyword z-operator"> *</span><span>pointer</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage z-type"> uintptr_t</span><span> length</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Slice</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> enum</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Some</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> None</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Option_Tag</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Slice _0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Some_Body</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> {</span></span> <span 
class="giallo-l"><span> Option_Tag tag</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage z-type"> union</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Some_Body some</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Option</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage z-type">void</span><span class="z-entity z-name z-function"> parse</span><span class="z-punctuation z-section">(</span><span class="z-storage">const</span><span class="z-storage z-type"> char</span><span class="z-keyword z-operator"> *</span><span class="z-variable z-parameter">pointer</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>It works; great!</p> <p>Note the <code>#[no_mangle]</code> attribute on the Rust <code>parse</code> function. It instructs the compiler not to mangle the function name, so that the function has the same name from the perspective of C.</p> <p>OK, that&#39;s all for the theory. Let&#39;s practise now: we have a parser to bind to C!</p> <h3 id="-2">Practice<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h3> <p>We want to bind a function named <code>parse</code>. The function outputs an AST representing the language being analysed. 
<a href="https://mnt.io/series/from-rust-to-beyond/prelude/">As a reminder</a>, the original AST looks like this:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Block</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> name</span><span class="z-keyword z-operator">:</span><span> (</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;,</span><span class="z-entity z-name"> Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;),</span></span> <span class="giallo-l"><span class="z-variable"> attributes</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;&gt;,</span></span> <span class="giallo-l"><span class="z-variable"> children</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;&gt;</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> Phrase</span><span>(</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;)</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>This AST is defined in the Rust parser. The Rust binding to C will transform this AST into another set of structs and enums for C. 
It is mandatory only for types that are directly exposed to C, not internal types that Rust uses. Let&#39;s start by defining <code>Node</code>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[repr(</span><span class="z-entity z-name">C</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> Node</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Block</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> namespace</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Slice_c_char</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> name</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Slice_c_char</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> attributes</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Option_c_char</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> children</span><span class="z-keyword z-operator">: *</span><span class="z-storage">const</span><span class="z-variable"> c_void</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> Phrase</span><span>(</span><span class="z-entity z-name">Slice_c_char</span><span>)</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Some immediate thoughts:</p> <ul> <li>The structure <code>Slice_c_char</code> emulates Rust slices (see below),</li> <li>The enum <code>Option_c_char</code> emulates <code>Option</code> (see below),</li> <li>The field <code>children</code> has type <code>*const c_void</code>. 
It should be <code>*const Vector_Node</code> (our definition of <code>Vector</code>), but the definition of <code>Node</code> is based on <code>Vector_Node</code> and vice versa. This <a rel="noopener external" target="_blank" href="https://github.com/eqrion/cbindgen/issues/43">cyclical definition case is unsupported by <code>cbindgen</code> so far</a>. So… yes, it is defined as a <code>void</code> pointer, and will be cast later in C,</li> <li>The fields <code>namespace</code> and <code>name</code> are originally a tuple in Rust. Tuples have no equivalent in C with <code>cbindgen</code>, so two fields are used instead.</li> </ul> <p>Let&#39;s define <code>Slice_c_char</code>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[repr(</span><span class="z-entity z-name">C</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> struct</span><span class="z-entity z-name"> Slice_c_char</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> pointer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">const</span><span class="z-variable"> c_char</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>This definition borrows the semantics of <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/primitive.slice.html">Rust&#39;s slices</a>. 
The major benefit is that there is no copy when binding a Rust slice to this structure.</p> <p>Let&#39;s define <code>Option_c_char</code>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[repr(</span><span class="z-entity z-name">C</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> Option_c_char</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Some</span><span>(</span><span class="z-entity z-name">Slice_c_char</span><span>),</span></span> <span class="giallo-l"><span class="z-entity z-name"> None</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Finally, we need to define <code>Vector_Node</code> and our own <code>Result</code> for C. They mimic the Rust semantics closely:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[repr(</span><span class="z-entity z-name">C</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> struct</span><span class="z-entity z-name"> Vector_Node</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> buffer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">const</span><span class="z-constant z-other"> Node</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[repr(</span><span class="z-entity z-name">C</span><span>)]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> Result</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Ok</span><span>(</span><span 
class="z-entity z-name">Vector_Node</span><span>),</span></span> <span class="giallo-l"><span class="z-entity z-name"> Err</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Alright, all types are declared! It&#39;s time to write the <code>parse</code> function:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-string"> &quot;C&quot;</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> parse</span><span>(</span><span class="z-variable">pointer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">const</span><span class="z-variable"> c_char</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Result</span><span> {</span></span> <span class="giallo-l"><span> …</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The function takes a pointer from C. It means that the data to analyse (i.e. the Gutenberg blog post) is allocated and owned by C: the memory is allocated on the C side, and Rust is only responsible for the parsing. This is where Rust shines: no copy, no clone, no memory mess, only pointers to this data will be returned to C as slices and vectors.</p> <p>The workflow will be the following:</p> <ul> <li>First thing to do when we deal with C: check that the pointer is not null,</li> <li>Reconstitute an input from the pointer with <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/ffi/struct.CStr.html"><code>CStr</code></a>. This standard API is useful to abstract C strings from the Rust point of view. 
The difference is that a C string is terminated by a <code>NUL</code> byte and carries no length, while a Rust string carries a length and is not terminated by a <code>NUL</code> byte,</li> <li>Run the parser, then transform the AST into the “C AST”.</li> </ul> <p>Let&#39;s do that!</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-string"> &quot;C&quot;</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> parse</span><span>(</span><span class="z-variable">pointer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">const</span><span class="z-variable"> c_char</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Result</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> if</span><span class="z-variable"> pointer</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">is_null</span><span>() {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-entity z-name"> Result</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Err</span><span>;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> input</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> unsafe</span><span> {</span><span class="z-entity z-name"> CStr</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">from_ptr</span><span>(</span><span class="z-variable">pointer</span><span>)</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">to_bytes</span><span>() };</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> if</span><span 
class="z-storage"> let</span><span class="z-entity z-name"> Ok</span><span>((</span><span class="z-variable">_remaining</span><span>,</span><span class="z-variable"> nodes</span><span>))</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> gutenberg_post_parser</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">root</span><span>(</span><span class="z-variable">input</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> output</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">Node</span><span>&gt;</span><span class="z-keyword z-operator"> =</span></span> <span class="giallo-l"><span class="z-variable"> nodes</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">into_iter</span><span>()</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">map</span><span>(</span><span class="z-keyword z-operator">|</span><span class="z-variable">node</span><span class="z-keyword z-operator">|</span><span class="z-entity z-name z-function"> into_c</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">node</span><span>))</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">collect</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> vector_node</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Vector_Node</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> buffer</span><span class="z-keyword z-operator">:</span><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name 
z-function">as_slice</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">as_ptr</span><span>(),</span></span> <span class="giallo-l"><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">len</span><span>()</span></span> <span class="giallo-l"><span> };</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> mem</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">forget</span><span>(</span><span class="z-variable">output</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> Result</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Ok</span><span>(</span><span class="z-variable">vector_node</span><span>)</span></span> <span class="giallo-l"><span> }</span><span class="z-keyword"> else</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Result</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Err</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p><code>Vector_Node</code> holds only a pointer to the output buffer and the length of that buffer. The conversion is light.</p> <p>Now let&#39;s see the <code>into_c</code> function. Some parts will not be detailed, not because they are difficult but because they are repetitive. 
<a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/blob/master/bindings/c/src/lib.rs">The entire code lands here</a>.</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> into_c</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;(</span><span class="z-variable">node</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Node</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> match</span><span class="z-keyword z-operator"> *</span><span class="z-variable">node</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Block</span><span> {</span><span class="z-variable"> name</span><span>,</span><span class="z-variable"> attributes</span><span>,</span><span class="z-keyword"> ref</span><span class="z-variable"> children</span><span> }</span><span class="z-keyword z-operator"> =&gt;</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Block</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> namespace</span><span class="z-keyword z-operator">:</span><span> …,</span></span> <span class="giallo-l"><span class="z-variable"> name</span><span class="z-keyword z-operator">:</span><span> …,</span></span> <span class="giallo-l"><span class="z-variable"> attributes</span><span 
class="z-keyword z-operator">:</span><span> …,</span></span> <span class="giallo-l"><span class="z-variable"> children</span><span class="z-keyword z-operator">:</span><span> …</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">Phrase</span><span>(</span><span class="z-variable">input</span><span>)</span><span class="z-keyword z-operator"> =&gt;</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">Phrase</span><span>(…)</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>I want to show <code>namespace</code> for the warm-up (<code>name</code>, <code>attributes</code> and <code>Phrase</code> are very similar), and <code>children</code> because it deals with <code>void</code>.</p> <p>Let&#39;s convert <code>ast::Node::Block.name.0</code> into <code>Node::Block.namespace</code>:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-entity z-name">ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Block</span><span> {</span><span class="z-variable"> name</span><span>, …, … }</span><span class="z-keyword z-operator"> =&gt;</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Block</span><span> {</span></span> <span class="giallo-l"><span 
class="z-variable"> namespace</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Slice_c_char</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> pointer</span><span class="z-keyword z-operator">:</span><span class="z-variable"> name</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">0</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">as_ptr</span><span>()</span><span class="z-keyword"> as</span><span class="z-keyword z-operator"> *</span><span class="z-storage">const</span><span class="z-variable"> c_char</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-variable"> name</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">0</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">len</span><span>()</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>        …</span></span></code></pre> <p>Pretty straightforward so far. <code>namespace</code> is a <code>Slice_c_char</code>. The <code>pointer</code> is the pointer of the <code>name.0</code> slice, and the <code>length</code> is the length of the same <code>name.0</code>. This is the same process for other Rust slices.</p> <p><code>children</code> is different though. 
It works in three steps:</p> <ol> <li>Collect all children as C AST nodes in a Rust vector,</li> <li>Transform the Rust vector into a valid <code>Vector_Node</code>,</li> <li>Transform the <code>Vector_Node</code> into a <code>*const c_void</code> pointer.</li> </ol> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-entity z-name">ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Block</span><span> { …, …,</span><span class="z-keyword"> ref</span><span class="z-variable"> children</span><span> }</span><span class="z-keyword z-operator"> =&gt;</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Block</span><span> {</span></span> <span class="giallo-l"><span>        …</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">        children</span><span class="z-keyword z-operator">:</span><span> {</span></span> <span class="giallo-l"><span class="z-comment"> // 1. 
Collect all children as C AST nodes.</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> output</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">Node</span><span>&gt;</span><span class="z-keyword z-operator"> =</span></span> <span class="giallo-l"><span class="z-variable"> children</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">into_iter</span><span>()</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">map</span><span>(</span><span class="z-keyword z-operator">|</span><span class="z-variable">node</span><span class="z-keyword z-operator">|</span><span class="z-entity z-name z-function"> into_c</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">node</span><span>))</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> .</span><span class="z-entity z-name z-function">collect</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // 2. 
Transform the vector into a Vector_Node.</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> vector_node</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> if</span><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">is_empty</span><span>() {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span></span> <span class="giallo-l"><span class="z-entity z-name"> Vector_Node</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> buffer</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> ptr</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">null</span><span>(),</span></span> <span class="giallo-l"><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-constant z-numeric"> 0</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span> }</span><span class="z-keyword"> else</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span></span> <span class="giallo-l"><span class="z-entity z-name"> Vector_Node</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> buffer</span><span class="z-keyword z-operator">:</span><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">as_slice</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">as_ptr</span><span>(),</span></span> <span class="giallo-l"><span class="z-variable"> 
length</span><span class="z-keyword z-operator">:</span><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">len</span><span>()</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span> };</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // 3. Transform Vector_Node into a *const c_void pointer.</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> vector_node_pointer</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Box</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">into_raw</span><span>(</span><span class="z-variable">vector_node</span><span>)</span><span class="z-keyword"> as</span><span class="z-keyword z-operator"> *</span><span class="z-storage">const</span><span class="z-variable"> c_void</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> mem</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">forget</span><span>(</span><span class="z-variable">output</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> vector_node_pointer</span></span> <span class="giallo-l"><span>        }</span></span></code></pre> <p>Step 1 is straightforward.</p> <p>Step 2 defines the behavior when there are no nodes. In other words, it defines what an empty <code>Vector_Node</code> is. The <code>buffer</code> must contain a <code>NULL</code> raw pointer, and the length is obviously 0. Without this behavior, I got various segmentation faults in my code, even when I checked the <code>length</code> before the <code>buffer</code>. 
Note that <code>Vector_Node</code> is allocated on the heap with <code>Box::new</code> so that the pointer can be easily shared with C.</p> <p>Step 3 uses <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/boxed/struct.Box.html#method.into_raw">the <code>Box::into_raw</code> function</a> to consume the box and return the raw pointer to the data it owns. Rust will not free anything here; it&#39;s our responsibility (or, to be pedantic, the responsibility of C). The <code>*mut Vector_Node</code> returned by <code>Box::into_raw</code> can then be freely cast to <code>*const c_void</code>.</p> <p>Finally, we instruct the compiler not to drop <code>output</code> when it goes out of scope with <code>mem::forget</code> (at this step of the series, you are very likely to know what it does). Otherwise its heap buffer would be freed and <code>buffer</code> would dangle.</p> <p>Personally, I spent a few hours understanding why my pointers got random addresses or pointed to <code>NULL</code> data. The resulting code is simple and fairly clear to read, but it wasn&#39;t obvious to me what to do beforehand.</p> <p>And that&#39;s all for the Rust part! 
The next section will present the C code that calls Rust, and how to compile everything together.</p> <h2 id="-3">C 🚀 executable<a role="presentation" class="anchor" href="#-3" title="Anchor link to this header">#</a> </h2> <figure role="presentation"> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-c-galaxy/./c-to-executable.png" alt="Rust to C to executable" loading="lazy" decoding="async" /></p> </figure> <p>Now that the Rust part is ready, the C part must be written to call it.</p> <h3 id="-4">Minimal Working Example<a role="presentation" class="anchor" href="#-4" title="Anchor link to this header">#</a> </h3> <p>Let&#39;s do something very quick to see if it links and compiles:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-keyword">#include</span><span class="z-string"> &lt;stdlib.h&gt;</span></span> <span class="giallo-l"><span class="z-keyword">#include</span><span class="z-string"> &lt;stdio.h&gt;</span></span> <span class="giallo-l"><span class="z-keyword">#include</span><span class="z-string"> &lt;string.h&gt;</span></span> <span class="giallo-l"><span class="z-keyword">#include</span><span class="z-string"> &quot;gutenberg_post_parser.h&quot;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage z-type">int</span><span class="z-entity z-name z-function"> main</span><span class="z-punctuation z-section">(</span><span class="z-storage z-type">int</span><span class="z-variable z-parameter"> argc</span><span class="z-punctuation z-separator">,</span><span class="z-storage z-type"> char</span><span class="z-keyword z-operator"> **</span><span class="z-variable z-parameter">argv</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span> FILE</span><span class="z-keyword z-operator">*</span><span> file </span><span class="z-keyword z-operator">=</span><span class="z-entity z-name z-function"> fopen</span><span class="z-punctuation 
z-section">(</span><span class="z-variable">argv</span><span>[</span><span class="z-constant z-numeric">1</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;rb&quot;</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> fseek</span><span class="z-punctuation z-section">(</span><span>file</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-separator">,</span><span> SEEK_END</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage z-type"> long</span><span> file_size </span><span class="z-keyword z-operator">=</span><span class="z-entity z-name z-function"> ftell</span><span class="z-punctuation z-section">(</span><span>file</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> rewind</span><span class="z-punctuation z-section">(</span><span>file</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage z-type"> char</span><span class="z-keyword z-operator">*</span><span> file_content </span><span class="z-keyword z-operator">=</span><span class="z-punctuation z-section"> (</span><span class="z-storage z-type">char</span><span class="z-keyword z-operator">*</span><span class="z-punctuation z-section">)</span><span class="z-entity z-name z-function"> malloc</span><span class="z-punctuation z-section">(</span><span>file_size </span><span class="z-keyword z-operator">* sizeof</span><span class="z-punctuation z-section">(</span><span class="z-storage z-type">char</span><span 
class="z-punctuation z-section">))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> fread</span><span class="z-punctuation z-section">(</span><span>file_content</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 1</span><span class="z-punctuation z-separator">,</span><span> file_size</span><span class="z-punctuation z-separator">,</span><span> file</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Let&#39;s call Rust!</span></span> <span class="giallo-l"><span> Result output </span><span class="z-keyword z-operator">=</span><span class="z-entity z-name z-function"> parse</span><span class="z-punctuation z-section">(</span><span>file_content</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> if</span><span class="z-punctuation z-section"> (</span><span class="z-variable">output</span><span class="z-punctuation z-separator">.</span><span class="z-variable">tag</span><span class="z-keyword z-operator"> ==</span><span> Err</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> printf</span><span class="z-punctuation z-section">(</span><span class="z-string">&quot;Error while parsing.</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-constant z-numeric"> 1</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation 
z-section"> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> const</span><span> Vector_Node nodes </span><span class="z-keyword z-operator">=</span><span class="z-variable"> output</span><span class="z-punctuation z-separator">.</span><span class="z-variable">ok</span><span class="z-punctuation z-separator">.</span><span class="z-variable">_0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-comment"> // Do something with nodes.</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> free</span><span class="z-punctuation z-section">(</span><span>file_content</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> fclose</span><span class="z-punctuation z-section">(</span><span>file</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>To keep the code concise, I left all the error handlers out of the example. <a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/blob/master/bindings/c/bin/gutenberg_post_parser.c">The entire code lands here</a> if you&#39;re curious.</p> <p>What happens in this code? The first thing to notice is <code>#include "gutenberg_post_parser.h"</code> which is the header file that is automatically generated by <code>cbindgen</code>.</p> <p>Then a filename from <code>argv[1]</code> is used to read a blog post to parse. 
The <code>parse</code> function is from Rust, just like the <code>Result</code> and <code>Vector_Node</code> types.</p> <p>The Rust <code>enum Result { Ok(Vector_Node), Err }</code> is compiled to C as:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> enum</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span>    Ok</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span>    Err</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Result_Tag</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Vector_Node _0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Ok_Body</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Result_Tag tag</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage z-type"> union</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Ok_Body ok</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Result</span><span class="z-punctuation 
z-terminator">;</span></span></code></pre> <p>Needless to say, the Rust version is more compact and easier to read, but that isn&#39;t the point. To check if <code>Result</code> contains an <code>Ok</code> value or an <code>Err</code>or, one has to check the <code>tag</code> field, like we did with <code>output.tag == Err</code>. To get the content of the <code>Ok</code>, we did <code>output.ok._0</code> (<code>_0</code> is a field from <code>Ok_Body</code>).</p> <p>Let&#39;s compile this with <a rel="noopener external" target="_blank" href="http://clang.llvm.org"><code>clang</code></a>! We assume that the code above is located in the same directory as the <code>gutenberg_post_parser.h</code> file, i.e. in a <code>dist/</code> directory. Thus:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> cd dist</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> clang \</span></span> <span class="giallo-l"><span> # Enable all warnings. \</span></span> <span class="giallo-l"><span> -Wall \</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> # Output executable name. \</span></span> <span class="giallo-l"><span> -o gutenberg-post-parser \</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> # Input source file. \</span></span> <span class="giallo-l"><span> gutenberg_post_parser.c \</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>      # Directory where to find the static library (*.a). \</span></span> <span class="giallo-l"><span>      -L ../target/release/ \</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>      # Link with the gutenberg_post_parser library. 
\</span></span> <span class="giallo-l"><span>      -l gutenberg_post_parser \</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>      # Other libraries to link with. \</span></span> <span class="giallo-l"><span>      -l System \</span></span> <span class="giallo-l"><span> -l pthread \</span></span> <span class="giallo-l"><span> -l c \</span></span> <span class="giallo-l"><span> -l m</span></span></code></pre> <p>And that&#39;s all! We end up with a <code>gutenberg-post-parser</code> executable that runs C and Rust.</p> <h3 id="-5">More details<a role="presentation" class="anchor" href="#-5" title="Anchor link to this header">#</a> </h3> <p><a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/blob/master/bindings/c/bin/gutenberg_post_parser.c">In the original source code</a>, a recursive function that prints the entire AST on <code>stdout</code> can be found, namely <code>print</code> (original, isn&#39;t it?). Here are some side-by-side comparisons between Rust syntax and C syntax.</p> <p>The <code>Vector_Node</code> struct in Rust:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> struct</span><span class="z-entity z-name"> Vector_Node</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> buffer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">const</span><span class="z-constant z-other"> Node</span><span>,</span></span> <span class="giallo-l"><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The <code>Vector_Node</code> struct in C:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> 
{</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span> Node </span><span class="z-keyword z-operator">*</span><span>buffer</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage z-type"> uintptr_t</span><span> length</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Vector_Node</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>So, to read the number of nodes (the length of the vector) and then the nodes themselves in C, one has to write:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-storage">const</span><span class="z-storage z-type"> uintptr_t</span><span> number_of_nodes </span><span class="z-keyword z-operator">=</span><span> nodes</span><span class="z-keyword z-operator">-&gt;</span><span>length</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">for</span><span class="z-punctuation z-section"> (</span><span class="z-storage z-type">uintptr_t</span><span> nth </span><span class="z-keyword z-operator">=</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span><span> nth </span><span class="z-keyword z-operator">&lt;</span><span> number_of_nodes</span><span class="z-punctuation z-terminator">;</span><span class="z-keyword z-operator"> ++</span><span>nth</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span> Node node </span><span class="z-keyword z-operator">=</span><span class="z-variable"> nodes</span><span class="z-punctuation z-separator">-&gt;</span><span class="z-variable">buffer</span><span>[nth]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation 
z-section">}</span></span></code></pre> <p>This is almost idiomatic C code!</p> <p>A <code>Node</code> is defined in C as:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> enum</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Block</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> Phrase</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Node_Tag</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Slice_c_char namespace</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> Slice_c_char name</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> Option_c_char attributes</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-storage z-type"> void</span><span class="z-keyword z-operator">*</span><span> children</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Block_Body</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Slice_c_char _0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Phrase_Body</span><span 
class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">typedef</span><span class="z-storage z-type"> struct</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Node_Tag tag</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage z-type"> union</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span> Block_Body block</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> Phrase_Body phrase</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span> Node</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>So once a node is fetched, one can write the following code to detect its kind:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-keyword">if</span><span class="z-punctuation z-section"> (</span><span>node.tag </span><span class="z-keyword z-operator">==</span><span> Block</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-comment">    // …</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span><span class="z-keyword"> else if</span><span class="z-punctuation z-section"> (</span><span>node.tag </span><span class="z-keyword z-operator">==</span><span> Phrase</span><span class="z-punctuation z-section">) {</span></span> <span class="giallo-l"><span class="z-comment"> // …</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>Let&#39;s focus on <code>Block</code> for a second, and let&#39;s print the namespace and the name of the block 
separated by a slash (<code>/</code>):</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-storage">const</span><span> Block_Body block </span><span class="z-keyword z-operator">=</span><span> node.block</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">const</span><span> Slice_c_char namespace </span><span class="z-keyword z-operator">=</span><span> block.namespace</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage">const</span><span> Slice_c_char name </span><span class="z-keyword z-operator">=</span><span> block.name</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">printf</span><span class="z-punctuation z-section">(</span></span> <span class="giallo-l"><span class="z-string"> &quot;</span><span class="z-constant z-other">%.*s</span><span class="z-string">/</span><span class="z-constant z-other z-constant z-character">%.*s\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-punctuation z-section">    (</span><span class="z-storage z-type">int</span><span class="z-punctuation z-section">)</span><span> namespace.length</span><span class="z-punctuation z-separator">,</span><span> namespace.pointer</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-punctuation z-section">    (</span><span class="z-storage z-type">int</span><span class="z-punctuation z-section">)</span><span> name.length</span><span class="z-punctuation z-separator">,</span><span> name.pointer</span></span> <span class="giallo-l"><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The special <code>%.*s</code> 
form in <code>printf</code> prints a string given its length and its pointer: the <code>*</code> precision is read from the <code>int</code> argument that precedes the pointer, which is why the lengths are cast to <code>int</code>.</p> <p>I think it is interesting to see the cast from <code>void*</code> to <code>Vector_Node*</code> for <code>children</code>. It&#39;s a single line:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-storage">const</span><span> Vector_Node</span><span class="z-keyword z-operator">*</span><span> children </span><span class="z-keyword z-operator">=</span><span class="z-punctuation z-section"> (</span><span class="z-storage">const</span><span> Vector_Node</span><span class="z-keyword z-operator">*</span><span class="z-punctuation z-section">) (</span><span>block.children</span><span class="z-punctuation z-section">)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>I think that&#39;s all for the details!</p> <h3 id="-6">Testing<a role="presentation" class="anchor" href="#-6" title="Anchor link to this header">#</a> </h3> <p>I reckon it is also interesting to see how to unit test C bindings directly with Rust. To emulate a C binding, first, the inputs must be in “C form”, so strings must be C strings. I prefer to write a macro for that:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-entity z-name z-function">macro_rules! 
str_to_c_char</span><span> {</span></span> <span class="giallo-l"><span> (</span><span class="z-keyword z-operator">$</span><span class="z-variable">input</span><span class="z-keyword z-operator">:</span><span class="z-variable">expr</span><span>)</span><span class="z-keyword z-operator"> =&gt;</span><span> (</span></span> <span class="giallo-l"><span> {</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> ::</span><span class="z-entity z-name">std</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">ffi</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">CString</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">new</span><span>(</span><span class="z-keyword z-operator">$</span><span class="z-variable">input</span><span>)</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">unwrap</span><span>()</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>And second, the opposite: The <code>parse</code> function returns data for C, so they need to be “converted back” to Rust. Again, I prefer to write a macro for that:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-entity z-name z-function">macro_rules! 
slice_c_char_to_str</span><span> {</span></span> <span class="giallo-l"><span> (</span><span class="z-keyword z-operator">$</span><span class="z-variable">input</span><span class="z-keyword z-operator">:</span><span class="z-variable">ident</span><span>)</span><span class="z-keyword z-operator"> =&gt;</span><span> (</span></span> <span class="giallo-l"><span class="z-keyword"> unsafe</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> ::</span><span class="z-entity z-name">std</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">ffi</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">CStr</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">from_bytes_with_nul_unchecked</span><span>(</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> ::</span><span class="z-entity z-name">std</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">slice</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">from_raw_parts</span><span>(</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> $</span><span class="z-variable">input</span><span class="z-keyword z-operator">.</span><span>pointer </span><span class="z-keyword">as</span><span class="z-keyword z-operator"> *</span><span class="z-storage">const</span><span class="z-entity z-name"> u8</span><span>,</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> $</span><span class="z-variable">input</span><span class="z-keyword z-operator">.</span><span>length </span><span class="z-keyword z-operator">+</span><span class="z-constant z-numeric"> 1</span></span> <span class="giallo-l"><span> )</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">to_str</span><span>()</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name 
z-function">unwrap</span><span>()</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>All right! The final step is to write a unit test. As an example, a <code>Phrase</code> will be tested; The idea remains the same for <code>Block</code> but the code is more concise for the former.</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[test]</span></span> <span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> test_root_with_a_phrase</span><span>() {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> input</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> str_to_c_char!</span><span>(</span><span class="z-string">&quot;foo&quot;</span><span>);</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> output</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> parse</span><span>(</span><span class="z-variable">input</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">as_ptr</span><span>());</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> match</span><span class="z-variable"> output</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Result</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Ok</span><span>(</span><span class="z-variable">result</span><span>)</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-keyword"> match</span><span class="z-variable"> result</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Vector_Node</span><span> {</span><span class="z-variable"> 
buffer</span><span>,</span><span class="z-variable"> length</span><span> }</span><span class="z-keyword"> if</span><span class="z-variable"> length</span><span class="z-keyword z-operator"> ==</span><span class="z-constant z-numeric"> 1</span><span class="z-keyword z-operator"> =&gt;</span></span> <span class="giallo-l"><span class="z-keyword"> match unsafe</span><span> {</span><span class="z-keyword z-operator"> &amp;*</span><span class="z-variable">buffer</span><span> } {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">Phrase</span><span>(</span><span class="z-variable">phrase</span><span>)</span><span class="z-keyword z-operator"> =&gt;</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> assert_eq!</span><span>(</span><span class="z-entity z-name z-function">slice_c_char_to_str!</span><span>(</span><span class="z-variable">phrase</span><span>),</span><span class="z-string"> &quot;foo&quot;</span><span>);</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> _</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-entity z-name z-function"> assert!</span><span>(</span><span class="z-constant z-language">false</span><span>)</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> _</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-entity z-name z-function"> assert!</span><span>(</span><span class="z-constant z-language">false</span><span>)</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> _</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-entity z-name z-function"> assert!</span><span>(</span><span 
class="z-constant z-language">false</span><span>)</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>What happens here? The <code>input</code> and <code>output</code> have been prepared. The former is the C string <code>"foo"</code>. The latter is the result of <code>parse</code>. Then there is a <code>match</code> to validate the form of the AST. Rust is very expressive, and this test is a good illustration. The <code>Vector_Node</code> branch is activated if and only if the length of the vector is 1, which is expressed with the guard <code>if length == 1</code>. Then the content of the phrase is transformed into a Rust string and compared with a regular <code>assert_eq!</code> macro.</p> <p>Note that —in this case— <code>buffer</code> is of type <code>*const Node</code>, so it points to the first element of the vector. If we wanted to access the next elements, we would need to use <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/vec/struct.Vec.html#method.from_raw_parts">the <code>Vec::from_raw_parts</code> function</a> to get a proper Rust API to manipulate this vector.</p> <h2 id="-7">Conclusion<a role="presentation" class="anchor" href="#-7" title="Anchor link to this header">#</a> </h2> <p>We have seen that Rust can be embedded in C very easily. In this example, Rust has been compiled to a static library, and a header file; the former is native with Rust tooling, the latter is automatically generated with <code>cbindgen</code>.</p> <p>The parser written in Rust manipulates a string allocated and owned by C. Rust only returns pointers (as slices) to this string back to C. Then C has no difficulty reading those pointers. The only tricky part is that Rust allocates some data (like vectors of nodes) on the heap that C must free. 
The “free” part has been omitted from the article though: It does not represent a big challenge, and a C developer is likely to be used to this kind of situation.</p> <p>The fact that Rust does not use a garbage collector makes it a perfect candidate for these use cases. The story behind these bindings is actually all about memory: who allocates what, and what the form of the data in memory is. Rust has a <code>#[repr(C)]</code> attribute to instruct the compiler to use a C memory layout, which makes C bindings extremely simple for the developer.</p> <p>We have also seen that the C bindings can be unit tested within Rust itself, and run with <code>cargo test</code>.</p> <p><code>cbindgen</code> is a precious companion in this adventure: By automating the header file generation, it reduces the update and the maintenance of the code to a <code>build.rs</code> script.</p> <p>In terms of performance, C should have results similar to Rust, i.e. extremely fast. I didn&#39;t run a benchmark to verify this statement; it&#39;s purely theoretical. It can be a subject for a future post!</p> <p>Now that we have successfully embedded Rust in C, a whole new world opens up to us! The next episode will push Rust into the PHP world as a native extension (written in C). Let&#39;s go!</p> The ASM.js galaxy 2018-08-28T00:00:00+00:00 2018-08-28T00:00:00+00:00 Unknown https://mnt.io/series/from-rust-to-beyond/the-asm-js-galaxy/ <p>The second galaxy that our Rust parser will explore is the ASM.js galaxy. This post will explain what ASM.js is, how to compile the parser into ASM.js, and how to use the ASM.js module with Javascript in a browser. The goal is to use ASM.js as a fallback when WebAssembly is not available. 
I highly recommend reading <a href="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/">the previous episode</a> about WebAssembly since they have a lot in common.</p> <h2 id="what-is-asm-js-and-why">What is ASM.js, and why?<a role="presentation" class="anchor" href="#what-is-asm-js-and-why" title="Anchor link to this header">#</a> </h2> <p>The main programming language on the Web is Javascript. Applications that wanted to exist on the Web, like games for example, had to compile to Javascript. But a problem occurs: The resulting file is heavy (hence WebAssembly) and Javascript virtual machines have difficulty optimising this particular code, resulting in slow or inefficient executions (considering the example of games). Also —in this context— Javascript is a compilation target, and as such, some language constructions are useless (like <code>eval</code>).</p> <p>So what if a “new” language can be a compilation target and still be executed by Javascript virtual machines? This is WebAssembly today, but in 2013, the solution was <a rel="noopener external" target="_blank" href="http://asmjs.org/">ASM.js</a>:</p> <blockquote> <p><strong>asm.js</strong>, a strict subset of Javascript that can be used as a low-level, efficient target language for compilers. This sublanguage effectively describes a sandboxed virtual machine for memory-unsafe languages like C or C++. A combination of static and dynamic validation allows Javascript engines to employ an ahead-of-time (AOT) optimizing compilation strategy for valid asm.js code.</p> </blockquote> <p>So an ASM.js program is a regular Javascript program. It is not a new language but a subset of it. It can be executed by any Javascript virtual machine. However, the specific usage of the magic statement <code>'use asm';</code> instructs the virtual machine to optimise the program with an ASM.js “engine”.</p> <p>ASM.js introduces types by using arithmetical operators as an annotation system. 
For instance, <code>x | 0</code> annotates <code>x</code> to be an integer, <code>+x</code> annotates <code>x</code> to be a double, and <code>fround(x)</code> annotates <code>x</code> to be a float. The following example declares a function <code>fn increment(x: i32) -&gt; i32</code>:</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> increment</span><span>(</span><span class="z-variable z-parameter">x</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable">    x</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> x</span><span class="z-keyword z-operator"> |</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">    return</span><span> (</span><span class="z-variable">x</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 1</span><span>)</span><span class="z-keyword z-operator"> |</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Another important difference is that ASM.js code is organised in modules in order to isolate it from regular Javascript. A module is a function that takes 3 arguments:</p> <ol> <li><code>stdlib</code>, an object with references to standard library APIs,</li> <li><code>foreign</code>, an object with user-defined functionalities (such as sending something over a WebSocket),</li> <li><code>heap</code>, an array buffer representing the memory (because memory is manually managed).</li> </ol> <p>But it's still Javascript. So the good news is that if your virtual machine has no specific optimisations for ASM.js, it is executed as any regular Javascript program. 
And if it does, then you get a pleasant boost.</p> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-asm-js-galaxy/./asm-benchmarks.png" alt="Graph" loading="lazy" decoding="async" /></p> <figcaption> <p>A graph showing 3 benchmarks running against different Javascript engines: Firefox, Firefox + asm.js, Google, and native.</p> </figcaption> </figure> <p>Remember that ASM.js has been designed to be a compilation target. So normally you don&#39;t have to write these annotations yourself because it is the role of the compiler. The typical compilation and execution pipeline from C or C++ to the Web looks like this:</p> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-asm-js-galaxy/./asm-pipeline.png" alt="Pipeline" loading="lazy" decoding="async" /></p> <figcaption> <p>Classical ASM.js compilation and execution pipeline from C or C++ to the Web.</p> </figcaption> </figure> <p><a rel="noopener external" target="_blank" href="http://kripken.github.io/emscripten-site/">Emscripten</a>, as seen in the diagram above, is a very important project in this whole evolution of the Web platform. Emscripten is:</p> <blockquote> <p>a toolchain for compiling to asm.js and WebAssembly, built using LLVM, that lets you run C and C++ on the web at near-native speed without plugins.</p> </blockquote> <p>You are very likely to see this name one day or another if you work with ASM.js or WebAssembly.</p> <p>I will not explain ASM.js in depth with a lot of examples. Instead, I recommend reading <a rel="noopener external" target="_blank" href="https://johnresig.com/blog/asmjs-javascript-compile-target/">Asm.js: The Javascript Compile Target</a> by John Resig, or <a rel="noopener external" target="_blank" href="http://kripken.github.io/mloc_emscripten_talk/">Big Web app? Compile it!</a> by Alon Zakai.</p> <p>Our process will be different though. 
We will not compile our Rust code directly to ASM.js, but instead, we will compile it to WebAssembly, which in turn will be compiled into ASM.js.</p> <h2 id="">Rust 🚀 ASM.js<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <figure role="presentation"> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-asm-js-galaxy/./rust-to-asm-js.png" alt="Rust to ASM.js" loading="lazy" decoding="async" /></p> </figure> <p>This episode will be very short, and somehow the easiest one. To compile Rust to ASM.js, you need to first compile it to WebAssembly (<a href="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/">see the previous episode</a>), and then compile the WebAssembly binary into ASM.js.</p> <p>Actually, ASM.js is mostly required when the browser does not support WebAssembly, like Internet Explorer. It is essentially a fallback to run our program on the Web.</p> <p>The workflow is the following:</p> <ol> <li>Compile your Rust project into WebAssembly,</li> <li>Compile your WebAssembly binary into an ASM.js module,</li> <li>Optimise and shrink the ASM.js module.</li> </ol> <p><a rel="noopener external" target="_blank" href="https://github.com/WebAssembly/binaryen">The wasm2js tool</a> will be your best companion to compile the WebAssembly binary into an ASM.js module. It is part of the Binaryen project. Then, assuming we have the WebAssembly binary of our program, all we have to do is:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> wasm2js --pedantic --output gutenberg_post_parser.asm.js gutenberg_post_parser.wasm</span></span></code></pre> <p>At this step, the <code>gutenberg_post_parser.asm.js</code> file weighs 212kb. The file contains ECMAScript 6 code. And remember that old browsers are considered, like Internet Explorer, so the code needs to be transformed a little bit. 
To optimise and shrink the ASM.js module, we will use <a rel="noopener external" target="_blank" href="https://github.com/mishoo/UglifyJS2/tree/harmony">the <code>uglify-es</code> tool</a>, like this:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Transform code, and embed in a</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function">.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> sed -i </span><span class="z-string">&#39;&#39; &#39;1s/^/function GUTENBERG_POST_PARSER_ASM_MODULE() {/; s/export //&#39;</span><span> gutenberg_post_parser.asm.js</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-string">&#39;return { root, alloc, dealloc, memory }; }&#39;</span><span class="z-keyword z-operator"> &gt;&gt;</span><span> gutenberg_post_parser.asm.js</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Shrink the code.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> uglifyjs --compress --mangle --output .temp.asm.js gutenberg_post_parser.asm.js</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> mv .temp.asm.js gutenberg_post_parser.asm.js</span></span></code></pre> <p>Just like we did for the WebAssembly binary, we can compress the resulting files with <a rel="noopener external" target="_blank" href="http://www.gzip.org/"><code>gzip</code></a> and <a rel="noopener external" target="_blank" href="https://github.com/google/brotli"><code>brotli</code></a>:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Compress.</span></span> <span class="giallo-l"><span class="z-punctuation 
z-separator">$</span><span> gzip --best --stdout gutenberg_post_parser.asm.js </span><span class="z-keyword z-operator">&gt;</span><span> gutenberg_post_parser.asm.js.gz</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> brotli --best --stdout</span><span class="z-variable"> --lgwin</span><span class="z-keyword z-operator">=</span><span class="z-string">24</span><span class="z-entity z-name"> gutenberg_post_parser.asm.js</span><span class="z-keyword z-operator"> &gt;</span><span class="z-string"> gutenberg_post_parser.asm.js.br</span></span></code></pre> <p>We end up with the following file sizes:</p> <ul> <li><code>.asm.js</code>: 54kb,</li> <li><code>.asm.js.gz</code>: 13kb,</li> <li><code>.asm.js.br</code>: 11kb.</li> </ul> <p>That&#39;s again pretty small!</p> <p>When you think about it, this is a lot of transformations: From Rust to WebAssembly to Javascript/ASM.js… The number of tools is rather small compared to the amount of work. It shows a well-designed pipeline and a collaboration between many groups of people.</p> <p>Aside: If you are reading this post, I assume you are a developer. And as such, I&#39;m sure you can spend hours looking at source code as if it were a master painting. Did you ever wonder what a Rust program looks like once compiled to Javascript? 
See below:</p> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-asm-js-galaxy/./rust-as-asm-js.png" alt="Rust as ASM.js" loading="lazy" decoding="async" /></p> <figcaption> <p>A Rust program compiled as WebAssembly compiled as ASM.js.</p> </figcaption> </figure> <p>I like it probably too much.</p> <h2 id="-1">ASM.js 🚀 Javascript<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h2> <p>The resulting <code>gutenberg_post_parser.asm.js</code> file contains a single function named <code>GUTENBERG_POST_PARSER_ASM_MODULE</code> which returns an object exposing 4 private entries:</p> <ol> <li><code>root</code>, the axiom of our grammar,</li> <li><code>alloc</code>, to allocate memory,</li> <li><code>dealloc</code>, to deallocate memory, and</li> <li><code>memory</code>, the memory buffer.</li> </ol> <p>It sounds familiar if you have read <a href="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/">the previous episode with WebAssembly</a>. Don&#39;t expect <code>root</code> to return a full AST: It returns a pointer to the memory, and the data need to be encoded and decoded, written into and read from the memory, the same way. Yes, the same way. <em>The exact same way</em>. So the code of the boundary layer is strictly the same. Do you remember the <code>Module</code> object in our WebAssembly Javascript boundary? This is exactly what the <code>GUTENBERG_POST_PARSER_ASM_MODULE</code> function returns. You can replace <code>Module</code> by the returned object, <em>et voilà</em>!</p> <p><a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/blob/master/bindings/asmjs/bin/gutenberg_post_parser.asm.mjs">The entire code lands here</a>. It completely reuses the Javascript boundary layer for WebAssembly. It just sets the <code>Module</code> differently, and it does not load the WebAssembly binary. 
Consequently, the ASM.js boundary layer is made of 34 lines of code, only 🙃. It compresses to 218 bytes.</p> <h2 id="-2">Conclusion<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h2> <p>We have seen that ASM.js can be a fallback to WebAssembly in environments that only support Javascript (like Internet Explorer), with or without ASM.js optimisations.</p> <p>The resulting ASM.js file and its boundary layer are quite small. By design, the ASM.js boundary layer reuses almost the entire WebAssembly boundary layer. Therefore there is again a tiny surface of code to review and to maintain, which is helpful.</p> <p>We have seen in the previous episode that Rust is very fast. We have been able to observe the same for WebAssembly compared to the current Javascript parser for the Gutenberg project. However, is it still true for the ASM.js module? In this case, ASM.js is a fallback, and like all fallbacks, it is notably slower than the targeted implementation. 
Let&#39;s run the same benchmark but use the Rust parser as an ASM.js module:</p> <figure> <table><thead><tr><th>Document</th><th>Javascript parser (ms)</th><th>Rust parser as an ASM.js module (ms)</th><th>speedup</th></tr></thead><tbody> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/demo-post.html"><code>demo-post.html</code></a></td><td>15.368</td><td>2.718</td><td>× 6</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/shortcode-shortcomings.html"><code>shortcode-shortcomings.html</code></a></td><td>31.022</td><td>8.004</td><td>× 4</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/redesigning-chrome-desktop.html"><code>redesigning-chrome-desktop.html</code></a></td><td>106.416</td><td>19.223</td><td>× 6</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/web-at-maximum-fps.html"><code>web-at-maximum-fps.html</code></a></td><td>82.92</td><td>27.197</td><td>× 3</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/early-adopting-the-future.html"><code>early-adopting-the-future.html</code></a></td><td>119.880</td><td>38.321</td><td>× 3</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/pygmalian-raw-html.html"><code>pygmalian-raw-html.html</code></a></td><td>349.075</td><td>23.656</td><td>× 15</td></tr> <tr><td><a rel="noopener external" target="_blank" 
href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/moby-dick-parsed.html"><code>moby-dick-parsed.html</code></a></td><td>2,543.75</td><td>361.423</td><td>× 7</td></tr> </tbody></table> <figcaption> <p>Benchmark between Javascript parser and Rust parser as an ASM.js module.</p> </figcaption> </figure> <p>The ASM.js module of the Rust parser is on average 6 times faster than the current Javascript implementation. The median speedup is 6. That&#39;s far from the WebAssembly results, but for a fallback, being 6 times faster on average is really great!</p> <p>So not only is the whole pipeline safer because it starts from Rust, but it also ends up being faster than Javascript.</p> <p>We will see in the next episodes of this series that Rust can reach a lot of galaxies, and the more it travels, the more interesting it gets.</p> <p>Thanks for reading!</p> The WebAssembly galaxy 2018-08-22T00:00:00+00:00 2018-08-22T00:00:00+00:00 Unknown https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/ <p>The first galaxy that our Rust parser will explore is the WebAssembly (Wasm) galaxy. This post will explain what WebAssembly is, how to compile the parser into WebAssembly, and how to use the WebAssembly binary with Javascript in a browser and with NodeJS.</p> <h2 id="what-is-webassembly-and-why">What is WebAssembly, and why?<a role="presentation" class="anchor" href="#what-is-webassembly-and-why" title="Anchor link to this header">#</a> </h2> <p>If you already know WebAssembly, you can skip this section.</p> <p><a rel="noopener external" target="_blank" href="https://webassembly.org/">WebAssembly</a> defines itself as:</p> <blockquote> <p>WebAssembly (abbreviated <em>Wasm</em>) is a binary instruction format for a stack-based virtual machine. 
Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.</p> </blockquote> <p>Should I say more? Probably, yes…</p> <p>WebAssembly is a <em>new portable binary format</em>. Languages like C, C++, or Rust already compile to this target. It is the spiritual successor of <a rel="noopener external" target="_blank" href="http://asmjs.org/">ASM.js</a>. By spiritual successor, I mean that the same people, trying to extend the Web platform and to make the Web fast, are working on both technologies. They share some design concepts too, but that's not really important right now.</p> <p>Before WebAssembly, programs had to compile to Javascript in order to run on the Web platform. The resulting files were large most of the time. And because the Web is a network, the files had to be downloaded, and it took time. WebAssembly is designed to be encoded in a size- and load-time efficient <a rel="noopener external" target="_blank" href="https://webassembly.org/docs/binary-encoding/">binary format</a>.</p> <p>WebAssembly is also faster than Javascript for many reasons. Despite all the crazy optimisations engineers put into Javascript virtual machines, Javascript is a weakly and dynamically typed language, which needs to be interpreted. WebAssembly aims to execute at native speed by taking advantage of <a rel="noopener external" target="_blank" href="https://webassembly.org/docs/portability/#assumptions-for-efficient-execution">common hardware capabilities</a>. <a rel="noopener external" target="_blank" href="https://hacks.mozilla.org/2018/01/making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler/">WebAssembly also loads faster than Javascript</a> because parsing and compiling happen while the binary is streamed from the network. 
So once the binary is entirely fetched, it is ready to run: No need to wait on the parser and the compiler before running the program.</p> <p>Today, and our blog series is a perfect example of that, it is possible to write a Rust program, and to compile it to run on the Web platform. Why? Because WebAssembly is implemented by <a rel="noopener external" target="_blank" href="https://caniuse.com/#search=wasm">all major browsers</a>, and because it has been designed for the Web: To live and run on the Web platform (like a browser). But its portable aspect and <a rel="noopener external" target="_blank" href="https://webassembly.org/docs/semantics/#linear-memory">its safe and sandboxed memory design</a> make it a good candidate to run outside of the Web platform (see <a rel="noopener external" target="_blank" href="https://github.com/geal/serverless-wasm">a serverless Wasm framework</a>, or <a rel="noopener external" target="_blank" href="https://github.com/losfair/IceCore">an application container built for Wasm</a>).</p> <p>I think it is important to remember that WebAssembly is not here to replace Javascript. It is just another technology which solves many problems we face today, like load-time, safety, or speed.</p> <h2 id="rust-rocket-webassembly">Rust 🚀 WebAssembly<a role="presentation" class="anchor" href="#rust-rocket-webassembly" title="Anchor link to this header">#</a> </h2> <figure role="presentation"> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/./rust-to-wasm.png" alt="Rust to Wasm" loading="lazy" decoding="async" /></p> </figure> <p><a rel="noopener external" target="_blank" href="https://github.com/rustwasm/team">The Rust Wasm team</a> is a group of people leading the effort of pushing Rust into WebAssembly with a set of tools and integrations. 
<a rel="noopener external" target="_blank" href="https://rustwasm.github.io/book/">There is a book</a> explaining how to write a WebAssembly program with Rust.</p> <p>With the Gutenberg Rust parser, I didn&#39;t use tools like <a rel="noopener external" target="_blank" href="https://github.com/rustwasm/wasm-bindgen/"><code>wasm-bindgen</code></a> (which is a pure gem) when I started the project a few months ago because I hit some limitations. Note that some of them have been addressed since then! Anyway, we will do most of the work by hand, and I think this is an excellent way to understand how things work in the background. Once you are familiar with WebAssembly interactions, <code>wasm-bindgen</code> is an excellent tool to have within easy reach, because it abstracts all the interactions and lets you focus on your code logic instead.</p> <p>I would like to remind the reader that the Gutenberg Rust parser exposes one AST, and one <code>root</code> function (the axiom of the grammar), respectively defined as:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Block</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> name</span><span class="z-keyword z-operator">:</span><span> (</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;,</span><span class="z-entity z-name"> Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;),</span></span> <span class="giallo-l"><span class="z-variable"> attributes</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity 
z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;&gt;,</span></span> <span class="giallo-l"><span class="z-variable"> children</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;&gt;</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> Phrase</span><span>(</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;)</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>and</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub fn</span><span class="z-entity z-name z-function"> root</span><span>(</span></span> <span class="giallo-l"><span class="z-variable"> input</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Input</span></span> <span class="giallo-l"><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Result</span><span>&lt;(</span><span class="z-entity z-name">Input</span><span>,</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-variable">ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span>&gt;),</span><span class="z-variable"> nom</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Err</span><span>&lt;</span><span class="z-entity z-name">Input</span><span>&gt;&gt;;</span></span></code></pre> <p>Knowing that, let&#39;s go!</p> <h3 id="">General design<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h3> <p>Here is our general design or workflow:</p> <ol> <li>Javascript (for instance) writes the blog post to parse into the WebAssembly module 
memory,</li> <li>Javascript runs the <code>root</code> function by passing a pointer to the memory, and the length of the blog post,</li> <li>Rust reads the blog post from the memory, runs the Gutenberg parser, compiles the resulting AST into a sequence of bytes, and returns the pointer to this sequence of bytes to Javascript,</li> <li>Javascript reads the memory from the received pointer, and decodes the sequence of bytes as Javascript objects in order to recreate an AST with a friendly API.</li> </ol> <p>Why a sequence of bytes? Because WebAssembly only supports integers and floats, not strings or vectors, and also because our Rust parser takes a slice of bytes as input, so this is handy.</p> <p>We use the term <em>boundary layer</em> to refer to this piece of Javascript code responsible for reading from and writing into the WebAssembly module memory, and for exposing a friendly API.</p> <p>Now, we will focus on the Rust code. It consists of only 4 functions:</p> <ul> <li><code>alloc</code> to allocate memory (exported),</li> <li><code>dealloc</code> to deallocate memory (exported),</li> <li><code>root</code> to run the parser (exported),</li> <li><code>into_bytes</code> to transform the AST into a sequence of bytes.</li> </ul> <p><a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/blob/master/bindings/wasm/src/lib.rs">The entire code lives here</a>. It is approximately 150 lines of code. Let&#39;s walk through it.</p> <h3 id="-1">Memory allocation<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h3> <p>Let&#39;s start with the memory allocator. I chose to use <a rel="noopener external" target="_blank" href="https://github.com/rustwasm/wee_alloc"><code>wee_alloc</code> for the memory allocator</a>.
It is specifically designed for WebAssembly by being very small (less than a kilobyte) and efficient.</p> <p>The following piece of code describes the memory allocator setup and the “prelude” for our code (enabling some compiler features, like <code>alloc</code>, declaring external crates, some aliases, and declaring required functions like <code>panic</code>, <code>oom</code>, etc.). This can be considered boilerplate:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#![no_std]</span></span> <span class="giallo-l"><span>#![feature(</span></span> <span class="giallo-l"><span> alloc,</span></span> <span class="giallo-l"><span> alloc_error_handler,</span></span> <span class="giallo-l"><span> core_intrinsics,</span></span> <span class="giallo-l"><span> lang_items</span></span> <span class="giallo-l"><span>)]</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">extern</span><span class="z-keyword"> crate</span><span> gutenberg_post_parser;</span></span> <span class="giallo-l"><span class="z-storage">extern</span><span class="z-keyword"> crate</span><span> wee_alloc;</span></span> <span class="giallo-l"><span>#[macro_use]</span><span class="z-storage"> extern</span><span class="z-keyword"> crate</span><span> alloc;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> gutenberg_post_parser</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span>;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> alloc</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">vec</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Vec</span><span>;</span></span> <span class="giallo-l"><span
class="z-keyword">use</span><span class="z-entity z-name"> core</span><span class="z-keyword z-operator">::</span><span>{mem, slice};</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[global_allocator]</span></span> <span class="giallo-l"><span class="z-storage">static</span><span class="z-constant z-other"> ALLOC</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> wee_alloc</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">WeeAlloc</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> wee_alloc</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">WeeAlloc</span><span class="z-keyword z-operator">::</span><span class="z-constant z-other">INIT</span><span>;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[panic_handler]</span></span> <span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> panic</span><span>(</span><span class="z-variable">_info</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">core</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">panic</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">PanicInfo</span><span>)</span><span class="z-keyword z-operator"> -&gt; !</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> unsafe</span><span> {</span><span class="z-entity z-name"> core</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">intrinsics</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">abort</span><span>(); }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>#[alloc_error_handler]</span></span> <span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> 
oom</span><span>(</span><span class="z-variable">_</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> core</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">alloc</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Layout</span><span>)</span><span class="z-keyword z-operator"> -&gt; !</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> unsafe</span><span> {</span><span class="z-entity z-name"> core</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">intrinsics</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">abort</span><span>(); }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// This is the definition of `std::ffi::c_void`, but Wasm runs without std in our case.</span></span> <span class="giallo-l"><span>#[repr(</span><span class="z-entity z-name">u8</span><span>)]</span></span> <span class="giallo-l"><span>#[allow(non_camel_case_types)]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-variable"> c_void</span><span> {</span></span> <span class="giallo-l"><span> #[doc(hidden)]</span></span> <span class="giallo-l"><span class="z-variable"> __variant1</span><span>,</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> #[doc(hidden)]</span></span> <span class="giallo-l"><span class="z-variable"> __variant2</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The Rust memory is the WebAssembly memory. Rust will allocate and deallocate memory on its own, but Javascript for instance needs to allocate and deallocate WebAssembly memory in order to communicate/exchange data. 
So we need to export one function to allocate memory and one function to deallocate memory.</p> <p>Once again, this is almost a boilerplate. The <code>alloc</code> function creates an empty vector of a specific capacity (because it is a linear segment of memory), and returns a pointer to this empty vector:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-string"> &quot;C&quot;</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> alloc</span><span>(</span><span class="z-variable">capacity</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span><span>)</span><span class="z-keyword z-operator"> -&gt; *</span><span class="z-storage">mut</span><span class="z-variable"> c_void</span><span> {</span></span> <span class="giallo-l"><span class="z-storage">   let mut</span><span class="z-variable"> buffer</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Vec</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">with_capacity</span><span>(</span><span class="z-variable">capacity</span><span>);</span></span> <span class="giallo-l"><span class="z-storage">  let</span><span class="z-variable"> pointer</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> buffer</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">as_mut_ptr</span><span>();</span></span> <span class="giallo-l"><span class="z-entity z-name">  mem</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">forget</span><span>(</span><span class="z-variable">buffer</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">  pointer</span><span class="z-keyword"> as</span><span 
class="z-keyword z-operator"> *</span><span class="z-storage">mut</span><span class="z-variable"> c_void</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Note the <code>#[no_mangle]</code> attribute that instructs the Rust compiler to not mangle the function name, i.e. to not rename it. <code>extern "C"</code> exports the function from the WebAssembly module, so it is “public” from outside the WebAssembly binary.</p> <p>The code is pretty straightforward and matches what we announced earlier: A <code>Vec</code> is allocated with a specific capacity, and the pointer to this vector is returned. The important part is <code>mem::forget(buffer)</code>. It is required so that Rust will <em>not</em> deallocate the vector once it goes out of scope. Indeed, Rust enforces <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization">Resource Acquisition Is Initialization (RAII)</a>, so whenever an object goes out of scope, its destructor is called and its owned resources are freed. This behavior shields against resource leak bugs, and this is why we rarely have to manually free memory or worry about memory leaks in Rust (<a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/rust-by-example/scope/raii.html">see some RAII examples</a>). In this case, we want to allocate and keep the allocation after the function execution, hence <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/mem/fn.forget.html">the <code>mem::forget</code> call</a>.</p> <p>Let&#39;s move on to the <code>dealloc</code> function.
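</p> <p>Before reading it, here is a minimal sketch of how the two exported functions are meant to pair up. It is written against <code>std</code> and <code>u8</code> so that it runs on a regular target. The names <code>alloc_sketch</code>, <code>dealloc_sketch</code> and <code>round_trip</code> are hypothetical stand-ins, not the actual code of the binding, and the sketch assumes that the allocator hands back exactly the requested capacity:</p>

```rust
use std::{mem, ptr, slice};

// Hypothetical stand-in for the exported `alloc`, using `u8` and `std`.
fn alloc_sketch(capacity: usize) -> *mut u8 {
    let mut buffer: Vec<u8> = Vec::with_capacity(capacity);
    let pointer = buffer.as_mut_ptr();
    // Keep the allocation alive after the function returns.
    mem::forget(buffer);
    pointer
}

// Hypothetical stand-in for the exported `dealloc`.
fn dealloc_sketch(pointer: *mut u8, capacity: usize) {
    // Safety: `pointer` must come from `alloc_sketch(capacity)`.
    // Assumption: `Vec::with_capacity` allocated exactly `capacity` bytes.
    unsafe { drop(Vec::from_raw_parts(pointer, 0, capacity)) }
}

// What the host side does, simulated in Rust: allocate, write, read, free.
fn round_trip(input: &[u8]) -> Vec<u8> {
    let pointer = alloc_sketch(input.len());
    let copy = unsafe {
        ptr::copy_nonoverlapping(input.as_ptr(), pointer, input.len());
        slice::from_raw_parts(pointer, input.len()).to_vec()
    };
    dealloc_sketch(pointer, input.len());
    copy
}

fn main() {
    let post = b"<!-- wp:foo /-->";
    assert_eq!(round_trip(post), post.to_vec());
}
```

<p>The caller has to remember the capacity it asked for, because that value is needed to rebuild the vector and free it.</p> <p>Back to <code>dealloc</code>.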
The goal is to recreate a vector based on a pointer and a capacity, and to let Rust drop it:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-string"> &quot;C&quot;</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> dealloc</span><span>(</span><span class="z-variable">pointer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">mut</span><span class="z-variable"> c_void</span><span>,</span><span class="z-variable"> capacity</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span><span>) {</span></span> <span class="giallo-l"><span class="z-keyword">   unsafe</span><span> {</span></span> <span class="giallo-l"><span class="z-storage">  let</span><span class="z-variable"> _</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Vec</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">from_raw_parts</span><span>(</span><span class="z-variable">pointer</span><span>,</span><span class="z-constant z-numeric"> 0</span><span>,</span><span class="z-variable"> capacity</span><span>);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p><a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/vec/struct.Vec.html#method.from_raw_parts">The <code>Vec::from_raw_parts</code> function</a> is marked as unsafe, so we need to wrap it in an <code>unsafe</code> block so that the <code>dealloc</code> function itself is considered safe.</p> <p>The <code>_</code> pattern does not bind the vector to a variable, so the vector is dropped immediately, at the end of the statement, which frees the memory.</p> <h3 id="-2">From input to a flat AST<a role="presentation" class="anchor" href="#-2" title="Anchor link to this
header">#</a> </h3> <p>Now the core of the binding! The <code>root</code> function reads the blog post to parse based on a pointer and a length, then it parses it. If the result is OK, it serializes the AST into a sequence of bytes, i.e. it flattens it; otherwise, it returns an empty sequence of bytes.</p> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/./flatten-ast.png" alt="Flatten AST" loading="lazy" decoding="async" /></p> <figcaption> <p>The image illustrates the flow of the data: first off there is a blog post; second there is the AST of the blog post; finally there is a linear byte-encoded representation of the AST of the blog post.</p> </figcaption> </figure> <p>The logic flow of the parser: The input on the left is parsed into an AST, which is serialized into a flat sequence of bytes on the right.</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span>#[no_mangle]</span></span> <span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> extern</span><span class="z-string"> &quot;C&quot;</span><span class="z-keyword"> fn</span><span class="z-entity z-name z-function"> root</span><span>(</span><span class="z-variable">pointer</span><span class="z-keyword z-operator">: *</span><span class="z-storage">mut</span><span class="z-entity z-name"> u8</span><span>,</span><span class="z-variable"> length</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> usize</span><span>)</span><span class="z-keyword z-operator"> -&gt; *</span><span class="z-storage">mut</span><span class="z-entity z-name"> u8</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> input</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> unsafe</span><span> {</span><span class="z-entity z-name"> slice</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name
z-function">from_raw_parts</span><span>(</span><span class="z-variable">pointer</span><span>,</span><span class="z-variable"> length</span><span>) };</span></span> <span class="giallo-l"><span class="z-storage"> let mut</span><span class="z-variable"> output</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> vec!</span><span>[];</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> if</span><span class="z-storage"> let</span><span class="z-entity z-name"> Ok</span><span>((</span><span class="z-variable">_remaining</span><span>,</span><span class="z-variable"> nodes</span><span>))</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> gutenberg_post_parser</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">root</span><span>(</span><span class="z-variable">input</span><span>) {</span></span> <span class="giallo-l"><span class="z-comment"> // Compile the AST (nodes) into a sequence of bytes.</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> pointer</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">as_mut_ptr</span><span>();</span></span> <span class="giallo-l"><span class="z-entity z-name"> mem</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">forget</span><span>(</span><span class="z-variable">output</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> pointer</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The variable <code>input</code> contains the blog post. It is fetched from memory with a pointer and a length. 
The variable <code>output</code> is the sequence of bytes the function will return. <code>gutenberg_post_parser::root(input)</code> runs the parser. If parsing is OK, then the <code>nodes</code> are compiled into a sequence of bytes (omitted for now). Then the pointer to the sequence of bytes is grabbed, the Rust compiler is instructed to not drop it, and finally the pointer is returned. The logic is again pretty straightforward.</p> <p>Now, let&#39;s focus on compiling the AST into a sequence of bytes (<code>u8</code>). All the data the AST holds is already bytes, which makes the process easier. The goal is only to flatten the AST:</p> <ul> <li>The first 4 bytes represent the number of nodes at the first level (4 × <code>u8</code> represents a <code>u32</code>),</li> <li>Next, if the node is <code>Block</code>: <ul> <li>The first byte is the node type: <code>1u8</code> for a block,</li> <li>The second byte is the size of the block name,</li> <li>The third to the sixth bytes are the size of the attributes,</li> <li>The seventh byte is the number of node children the block has,</li> <li>Next bytes are the block name,</li> <li>Next bytes are the attributes (<code>&amp;b"null"[..]</code> if none),</li> <li>Next bytes are node children as a sequence of bytes,</li> </ul> </li> <li>Next, if the node is <code>Phrase</code>: <ul> <li>The first byte is the node type: <code>2u8</code> for a phrase,</li> <li>The second to the fifth bytes are the size of the phrase,</li> <li>Next bytes are the phrase.</li> </ul> </li> </ul> <p>Here is the missing part of the <code>root</code> function:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">if</span><span class="z-storage"> let</span><span class="z-entity z-name"> Ok</span><span>((</span><span class="z-variable">_remaining</span><span>,</span><span class="z-variable"> nodes</span><span>))</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> 
gutenberg_post_parser</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">root</span><span>(</span><span class="z-variable">input</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> nodes_length</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> u32_to_u8s</span><span>(</span><span class="z-variable">nodes</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">len</span><span>()</span><span class="z-keyword"> as</span><span class="z-entity z-name"> u32</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">nodes_length</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">0</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">nodes_length</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">1</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">nodes_length</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">2</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">nodes_length</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">3</span><span>);</span></span> <span 
class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> for</span><span class="z-variable"> node</span><span class="z-keyword"> in</span><span class="z-variable"> nodes</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> into_bytes</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">node</span><span>,</span><span class="z-keyword z-operator"> &amp;</span><span class="z-storage">mut</span><span class="z-variable"> output</span><span>);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>And here is the <code>into_bytes</code> function:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> into_bytes</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;(</span><span class="z-variable">node</span><span class="z-keyword z-operator">: &amp;</span><span class="z-entity z-name">Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;,</span><span class="z-variable"> output</span><span class="z-keyword z-operator">: &amp;</span><span class="z-storage">mut</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">u8</span><span>&gt;) {</span></span> <span class="giallo-l"><span class="z-keyword"> match</span><span class="z-keyword z-operator"> *</span><span class="z-variable">node</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name">  Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Block</span><span> {</span><span class="z-variable"> name</span><span>,</span><span class="z-variable"> attributes</span><span>,</span><span class="z-keyword"> ref</span><span class="z-variable"> children</span><span> }</span><span class="z-keyword z-operator"> =&gt;</span><span> 
{</span></span> <span class="giallo-l"><span class="z-storage">   let</span><span class="z-variable"> node_type</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 1</span><span class="z-entity z-name">u8</span><span>;</span></span> <span class="giallo-l"><span class="z-storage">  let</span><span class="z-variable"> name_length</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> name</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">0</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">len</span><span>()</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> name</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">1</span><span class="z-punctuation z-separator">.</span><span class="z-entity z-name z-function">len</span><span>()</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 1</span><span>;</span></span> <span class="giallo-l"><span class="z-storage">   let</span><span class="z-variable"> attributes_length</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> match</span><span class="z-variable"> attributes</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name">   Some</span><span>(</span><span class="z-variable">attributes</span><span>)</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-variable"> attributes</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">len</span><span>(),</span></span> <span class="giallo-l"><span class="z-entity z-name">   None</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-constant z-numeric"> 4</span></span> <span class="giallo-l"><span>   };</span></span> <span class="giallo-l"><span class="z-storage">   let</span><span class="z-variable"> attributes_length_as_u8s</span><span 
class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> u32_to_u8s</span><span>(</span><span class="z-variable">attributes_length</span><span class="z-keyword"> as</span><span class="z-entity z-name"> u32</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">   let</span><span class="z-variable"> number_of_children</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> children</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">len</span><span>();</span></span> <span class="giallo-l"><span class="z-variable">  output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">node_type</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">name_length</span><span class="z-keyword"> as</span><span class="z-entity z-name"> u8</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">attributes_length_as_u8s</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">0</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">attributes_length_as_u8s</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">1</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name 
z-function">push</span><span>(</span><span class="z-variable">attributes_length_as_u8s</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">2</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">attributes_length_as_u8s</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">3</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">number_of_children</span><span class="z-keyword"> as</span><span class="z-entity z-name"> u8</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">extend</span><span>(</span><span class="z-variable">name</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">0</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-string">b&#39;/&#39;</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">extend</span><span>(</span><span class="z-variable">name</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">1</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">   if</span><span class="z-storage"> let</span><span class="z-entity z-name"> Some</span><span>(</span><span 
class="z-variable">attributes</span><span>)</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> attributes</span><span> {</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">extend</span><span>(</span><span class="z-variable">attributes</span><span>);</span></span> <span class="giallo-l"><span>  }</span><span class="z-keyword"> else</span><span> {</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">extend</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-string">b&quot;null&quot;</span><span>[</span><span class="z-keyword z-operator">..</span><span>]);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">   for</span><span class="z-variable"> child</span><span class="z-keyword"> in</span><span class="z-variable"> children</span><span> {</span></span> <span class="giallo-l"><span class="z-entity z-name z-function">   into_bytes</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-variable">child</span><span>,</span><span class="z-variable"> output</span><span>);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name"> Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">Phrase</span><span>(</span><span class="z-variable">phrase</span><span>)</span><span class="z-keyword z-operator"> =&gt;</span><span> {</span></span> <span class="giallo-l"><span class="z-storage">   let</span><span class="z-variable"> node_type</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 2</span><span 
class="z-entity z-name">u8</span><span>;</span></span> <span class="giallo-l"><span class="z-storage">   let</span><span class="z-variable"> phrase_length</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> phrase</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">len</span><span>();</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">node_type</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">   let</span><span class="z-variable"> phrase_length_as_u8s</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> u32_to_u8s</span><span>(</span><span class="z-variable">phrase_length</span><span class="z-keyword"> as</span><span class="z-entity z-name"> u32</span><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">phrase_length_as_u8s</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">0</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">phrase_length_as_u8s</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">1</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">phrase_length_as_u8s</span><span class="z-keyword 
z-operator">.</span><span class="z-constant z-numeric">2</span><span>);</span></span> <span class="giallo-l"><span class="z-variable"> output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-variable">phrase_length_as_u8s</span><span class="z-keyword z-operator">.</span><span class="z-constant z-numeric">3</span><span>);</span></span> <span class="giallo-l"><span class="z-variable">   output</span><span class="z-keyword z-operator">.</span><span class="z-entity z-name z-function">extend</span><span>(</span><span class="z-variable">phrase</span><span>);</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>What I find interesting with this code is it reads just like the bullet list above the code.</p> <p>For the most curious, here is the <code>u32_to_u8s</code> function:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">fn</span><span class="z-entity z-name z-function"> u32_to_u8s</span><span>(</span><span class="z-variable">x</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> u32</span><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span> (</span><span class="z-entity z-name">u8</span><span>,</span><span class="z-entity z-name"> u8</span><span>,</span><span class="z-entity z-name"> u8</span><span>,</span><span class="z-entity z-name"> u8</span><span>) {</span></span> <span class="giallo-l"><span> (</span></span> <span class="giallo-l"><span> ((</span><span class="z-variable">x</span><span class="z-keyword z-operator"> &gt;&gt;</span><span class="z-constant z-numeric"> 24</span><span>)</span><span class="z-keyword z-operator"> &amp;</span><span class="z-constant z-numeric"> 0xff</span><span>)</span><span class="z-keyword"> as</span><span class="z-entity z-name"> 
u8</span><span>,</span></span> <span class="giallo-l"><span> ((</span><span class="z-variable">x</span><span class="z-keyword z-operator"> &gt;&gt;</span><span class="z-constant z-numeric"> 16</span><span>)</span><span class="z-keyword z-operator"> &amp;</span><span class="z-constant z-numeric"> 0xff</span><span>)</span><span class="z-keyword"> as</span><span class="z-entity z-name"> u8</span><span>,</span></span> <span class="giallo-l"><span> ((</span><span class="z-variable">x</span><span class="z-keyword z-operator"> &gt;&gt;</span><span class="z-constant z-numeric"> 8</span><span>)</span><span class="z-keyword z-operator"> &amp;</span><span class="z-constant z-numeric"> 0xff</span><span>)</span><span class="z-keyword"> as</span><span class="z-entity z-name"> u8</span><span>,</span></span> <span class="giallo-l"><span> (</span><span class="z-variable"> x</span><span class="z-keyword z-operator"> &amp;</span><span class="z-constant z-numeric"> 0xff</span><span>)</span><span class="z-keyword"> as</span><span class="z-entity z-name"> u8</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Here we are: <code>alloc</code>, <code>dealloc</code>, <code>root</code>, and <code>into_bytes</code>. Four functions, and everything is done.</p> <h3 id="-3">Producing and optimising the WebAssembly binary<a role="presentation" class="anchor" href="#-3" title="Anchor link to this header">#</a> </h3> <p>To get a WebAssembly binary, the project has to be compiled to the <code>wasm32-unknown-unknown</code> target. For now (and this will change in the near future), the nightly toolchain is needed to compile the project, so make sure you have the latest nightly version of <code>rustc</code> &amp; co. installed with <code>rustup update nightly</code>. 
Let&#39;s run <code>cargo</code>:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span class="z-variable"> RUSTFLAGS</span><span class="z-keyword z-operator">=</span><span class="z-string">&#39;-g&#39;</span><span class="z-entity z-name"> cargo</span><span class="z-string"> +nightly build</span><span class="z-constant z-other"> --target</span><span class="z-string"> wasm32-unknown-unknown</span><span class="z-constant z-other"> --release</span></span></code></pre> <p>The WebAssembly binary weighs 22kb. Our goal is to reduce the file size. For that, the following tools will be required:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/alexcrichton/wasm-gc"><code>wasm-gc</code></a> to garbage-collect unused imports, internal functions, types, etc.,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/rustwasm/wasm-snip"><code>wasm-snip</code></a> to mark some functions as unreachable, which is useful when the binary includes unused code that the linker was not able to remove,</li> <li><code>wasm-opt</code> from the <a rel="noopener external" target="_blank" href="https://github.com/WebAssembly/binaryen">Binaryen project</a>, to optimise the binary,</li> <li><a rel="noopener external" target="_blank" href="http://www.gzip.org/"><code>gzip</code></a> and <a rel="noopener external" target="_blank" href="https://github.com/google/brotli"><code>brotli</code></a> to compress the binary.</li> </ul> <p>Basically, what we do is the following:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Garbage-collect unused data.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> wasm-gc gutenberg_post_parser.wasm</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation 
z-separator">$</span><span> # Mark fmt and panicking as unreachable.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> wasm-snip --snip-rust-fmt-code --snip-rust-panicking-code gutenberg_post_parser.wasm -o gutenberg_post_parser_snipped.wasm</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> mv gutenberg_post_parser_snipped.wasm gutenberg_post_parser.wasm</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Garbage-collect unreachable data.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> wasm-gc gutenberg_post_parser.wasm</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Optimise </span><span class="z-keyword">for</span><span> small size.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> wasm-opt -Oz -o gutenberg_post_parser_opt.wasm gutenberg_post_parser.wasm</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> mv gutenberg_post_parser_opt.wasm gutenberg_post_parser.wasm</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> # Compress.</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> gzip --best --stdout gutenberg_post_parser.wasm </span><span class="z-keyword z-operator">&gt;</span><span> gutenberg_post_parser.wasm.gz</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> brotli --best --stdout</span><span class="z-variable"> --lgwin</span><span class="z-keyword z-operator">=</span><span class="z-string">24</span><span class="z-entity z-name"> gutenberg_post_parser.wasm</span><span class="z-keyword z-operator"> &gt;</span><span class="z-string"> 
gutenberg_post_parser.wasm.br</span><span> </span></span></code></pre> <p>We end up with the following file sizes:</p> <ul> <li><code>.wasm</code>: 16kb,</li> <li><code>.wasm.gz</code>: 7.3kb,</li> <li><code>.wasm.br</code>: 6.2kb.</li> </ul> <p>Neat! <a rel="noopener external" target="_blank" href="https://caniuse.com/#search=brotli">Brotli is implemented by most browsers</a>, so when the client sends <code>Accept-Encoding: br</code>, the server can respond with the <code>.wasm.br</code> file.</p> <p>To give you a feeling of what 6.2kb represents, the following image also weighs 6.2kb:</p> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/./image-example.png" alt="The WordPress logo" loading="lazy" decoding="async" /></p> <figcaption> <p>An image that weighs as much as our compressed WebAssembly module.</p> </figcaption> </figure> <p>The WebAssembly binary is ready to run!</p> <h2 id="-4">WebAssembly 🚀 JavaScript<a role="presentation" class="anchor" href="#-4" title="Anchor link to this header">#</a> </h2> <figure role="presentation"> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/./wasm-to-js.png" alt="Wasm to JS" loading="lazy" decoding="async" /></p> </figure> <p>In this section, we assume JavaScript runs in a browser. Thus, what we need to do is the following:</p> <ol> <li>Load/stream and instantiate the WebAssembly binary,</li> <li>Write the blog post to parse into the WebAssembly module memory,</li> <li>Call the <code>root</code> function on the parser,</li> <li>Read the WebAssembly module memory to load the flat AST (a sequence of bytes) and decode it to build a “JavaScript AST” (with our own objects).</li> </ol> <p><a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/blob/master/bindings/wasm/bin/gutenberg_post_parser.mjs">The entire code lands here</a>. It is approximately 150 lines of code too. 
I won&#39;t explain the whole code since some parts of it are the “friendly API” that is exposed to the user. I will rather explain the major pieces.</p> <h3 id="-5">Loading/streaming and instantiating<a role="presentation" class="anchor" href="#-5" title="Anchor link to this header">#</a> </h3> <p><a rel="noopener external" target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WebAssembly">The <code>WebAssembly</code> API</a> exposes multiple ways to load a WebAssembly binary. The best one to use is <a rel="noopener external" target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WebAssembly/instantiateStreaming">the <code>WebAssembly.instantiateStreaming</code> function</a>: it streams the binary and compiles it at the same time, so nothing blocks. This API relies on <a rel="noopener external" target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API">the <code>Fetch</code> API</a>. You might have guessed it: it is asynchronous (it returns a promise). WebAssembly itself is not asynchronous (except if you use threads), but the instantiation step is. 
It is possible to instantiate synchronously, but this is tricky, and Google Chrome has a hard limit of 4kb on the binary size, which will make you give up quickly.</p> <p>To be able to stream the WebAssembly binary, the server must send the <code>application/wasm</code> MIME type (with the <code>Content-Type</code> header).</p> <p>Let&#39;s instantiate our WebAssembly binary:</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-storage">const</span><span class="z-variable"> url</span><span class="z-keyword z-operator"> =</span><span class="z-string"> &#39;/gutenberg_post_parser.wasm&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage">const</span><span class="z-variable"> wasm</span><span class="z-keyword z-operator"> =</span></span> <span class="giallo-l"><span class="z-variable"> WebAssembly</span><span class="z-punctuation z-accessor">.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> instantiateStreaming</span><span>(</span><span class="z-entity z-name z-function">fetch</span><span>(</span><span class="z-variable">url</span><span>)</span><span class="z-punctuation z-separator">,</span><span> {})</span><span class="z-punctuation z-accessor">.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> then</span><span>(</span><span class="z-variable z-parameter">object</span><span class="z-storage z-type z-function"> =&gt;</span><span class="z-variable"> object</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">instance</span><span>)</span><span class="z-punctuation z-accessor">.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> then</span><span>(</span><span class="z-variable z-parameter">instance</span><span class="z-storage z-type z-function"> =&gt;</span><span> {</span><span class="z-comment"> /* step 2 */</span><span> })</span><span class="z-punctuation 
z-terminator">;</span></span></code></pre> <p>The WebAssembly binary has been instantiated! Now we can move to the next step.</p> <h3 id="-6">Last polish before running the parser<a role="presentation" class="anchor" href="#-6" title="Anchor link to this header">#</a> </h3> <p>Remember that the WebAssembly binary exports 3 functions: <code>alloc</code>, <code>dealloc</code>, and <code>root</code>. They can be found on the <code>exports</code> property, along with the memory. Let&#39;s write that:</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-entity z-name z-function">  then</span><span>(</span><span class="z-variable z-parameter">instance</span><span class="z-storage z-type z-function"> =&gt;</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> Module</span><span class="z-keyword z-operator"> =</span><span> {</span></span> <span class="giallo-l"><span> alloc</span><span class="z-punctuation z-separator">:</span><span class="z-variable"> instance</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">exports</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">alloc</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> dealloc</span><span class="z-punctuation z-separator">:</span><span class="z-variable"> instance</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">exports</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">dealloc</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> root</span><span class="z-punctuation z-separator">:</span><span class="z-variable"> instance</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">exports</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">root</span><span class="z-punctuation 
z-separator">,</span></span> <span class="giallo-l"><span> memory</span><span class="z-punctuation z-separator">:</span><span class="z-variable"> instance</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">exports</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">memory</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function"> runParser</span><span>(</span><span class="z-variable">Module</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &#39;&lt;!-- wp:foo /--&gt;xyz&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> })</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>Great, everything is ready to write the <code>runParser</code> function!</p> <h3 id="-7">The parser runner<a role="presentation" class="anchor" href="#-7" title="Anchor link to this header">#</a> </h3> <p>As a reminder, this function has to write the <code>input</code> (the blog post to parse) into the WebAssembly module memory (<code>Module.memory</code>), call the <code>root</code> function (<code>Module.root</code>), and read the result from the WebAssembly module memory. 
Let&#39;s do that:</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> runParser</span><span>(</span><span class="z-variable z-parameter">Module</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> raw_input</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> input</span><span class="z-keyword z-operator"> = new</span><span class="z-entity z-name z-function"> TextEncoder</span><span>()</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">encode</span><span>(</span><span class="z-variable">raw_input</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> input_pointer</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> writeBuffer</span><span>(</span><span class="z-variable">Module</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> input</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> output_pointer</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> Module</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">root</span><span>(</span><span class="z-variable">input_pointer</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> input</span><span class="z-punctuation z-accessor">.</span><span>length)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> result</span><span class="z-keyword z-operator"> =</span><span class="z-entity 
z-name z-function"> readNodes</span><span>(</span><span class="z-variable">Module</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> output_pointer</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> Module</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">dealloc</span><span>(</span><span class="z-variable">input_pointer</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> input</span><span class="z-punctuation z-accessor">.</span><span>length)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> result</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>In detail:</p> <ul> <li>The <code>raw_input</code> is encoded into a sequence of bytes with <a rel="noopener external" target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder">the <code>TextEncoder</code> API</a>, and the result is stored in <code>input</code>,</li> <li>The input is written into the WebAssembly module memory with <code>writeBuffer</code>, which returns its pointer,</li> <li>Then the <code>root</code> function is called with the pointer to the input and the length of the input, as expected, and the pointer to the output is returned,</li> <li>Then the output is decoded,</li> <li>And finally, the input is deallocated. The output of the parser will be deallocated in the <code>readNodes</code> function because its length is unknown at this step.</li> </ul> <p>Great! 
So we have 2 functions to write right now: <code>writeBuffer</code> and <code>readNodes</code>.</p> <h3 id="-8">Writing the data in memory<a role="presentation" class="anchor" href="#-8" title="Anchor link to this header">#</a> </h3> <p>Let&#39;s go with the first one, <code>writeBuffer</code>:</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> writeBuffer</span><span>(</span><span class="z-variable z-parameter">Module</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> buffer</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> buffer_length</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> buffer</span><span class="z-punctuation z-accessor">.</span><span>length</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> pointer</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> Module</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">alloc</span><span>(</span><span class="z-variable">buffer_length</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> memory</span><span class="z-keyword z-operator"> = new</span><span class="z-entity z-name z-function"> Uint8Array</span><span>(</span><span class="z-variable">Module</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">memory</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">buffer</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> 
for</span><span> (</span><span class="z-storage">let</span><span class="z-variable"> i</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span><span class="z-variable"> i</span><span class="z-keyword z-operator"> &lt;</span><span class="z-variable"> buffer_length</span><span class="z-punctuation z-terminator">;</span><span class="z-keyword z-operator"> ++</span><span class="z-variable">i</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable"> memory</span><span>[</span><span class="z-variable">pointer</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> i</span><span>]</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">i</span><span>]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> pointer</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>In detail:</p> <ul> <li>The length of the buffer is stored in <code>buffer_length</code>,</li> <li>A space in memory is allocated to write the buffer,</li> <li>Then a <code>uint8</code> view of the module memory is instantiated, which means that the memory will be viewed as a sequence of <code>u8</code>, exactly what Rust expects,</li> <li>Finally, the buffer is copied into the memory with a very basic loop, and the pointer is returned.</li> </ul> <p>Note that, unlike C strings, adding a <code>NUL</code> byte at the end is not mandatory. 
This is just the raw data (on the Rust side, we read it with <code>slice::from_raw_parts</code>; a slice is a very simple structure).</p> <h3 id="-9">Reading the output of the parser<a role="presentation" class="anchor" href="#-9" title="Anchor link to this header">#</a> </h3> <p>At this step, the input has been written in memory and the <code>root</code> function has been called, which means the parser has run. It has returned a pointer to the output (the result), and we now have to read it and decode it.</p> <p>Recall that the first 4 bytes encode the number of nodes we have to read. Let&#39;s go!</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> readNodes</span><span>(</span><span class="z-variable z-parameter">Module</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> start_pointer</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> buffer</span><span class="z-keyword z-operator"> = new</span><span class="z-entity z-name z-function"> Uint8Array</span><span>(</span><span class="z-variable">Module</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">memory</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">buffer</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">slice</span><span>(</span><span class="z-variable">start_pointer</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> number_of_nodes</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> u8s_to_u32</span><span>(</span><span class="z-variable">buffer</span><span>[</span><span class="z-constant z-numeric">0</span><span>]</span><span 
class="z-punctuation z-separator">,</span><span class="z-variable"> buffer</span><span>[</span><span class="z-constant z-numeric">1</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> buffer</span><span>[</span><span class="z-constant z-numeric">2</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> buffer</span><span>[</span><span class="z-constant z-numeric">3</span><span>])</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> if</span><span> (</span><span class="z-constant z-numeric">0</span><span class="z-keyword z-operator"> &gt;=</span><span class="z-variable"> number_of_nodes</span><span>) {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-constant z-language"> null</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> nodes</span><span class="z-keyword z-operator"> =</span><span> []</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> offset</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 4</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> end_offset</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> for</span><span> (</span><span class="z-storage">let</span><span class="z-variable"> i</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span><span class="z-variable"> i</span><span 
class="z-keyword z-operator"> &lt;</span><span class="z-variable"> number_of_nodes</span><span class="z-punctuation z-terminator">;</span><span class="z-keyword z-operator"> ++</span><span class="z-variable">i</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> last_offset</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> readNode</span><span>(</span><span class="z-variable">buffer</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> offset</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> nodes</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> end_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> last_offset</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> Module</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">dealloc</span><span>(</span><span class="z-variable">start_pointer</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> start_pointer</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> end_offset</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> nodes</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>In details:</p> <ul> <li>A <code>uint8</code> view of the memory is instanciated… more precisely: A slice of the memory starting at 
<code>start_pointer</code>,</li> <li>The number of nodes is read, then all nodes are read,</li> <li>And finally, the output of the parser is deallocated.</li> </ul> <p>For the record, here is the <code>u8s_to_u32</code> function, which is the exact opposite of <code>u32_to_u8s</code>:</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> u8s_to_u32</span><span>(</span><span class="z-variable z-parameter">o</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> p</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> q</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> r</span><span>) {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span> (</span><span class="z-variable">o</span><span class="z-keyword z-operator"> &lt;&lt;</span><span class="z-constant z-numeric"> 24</span><span>)</span><span class="z-keyword z-operator"> |</span><span> (</span><span class="z-variable">p</span><span class="z-keyword z-operator"> &lt;&lt;</span><span class="z-constant z-numeric"> 16</span><span>)</span><span class="z-keyword z-operator"> |</span><span> (</span><span class="z-variable">q</span><span class="z-keyword z-operator"> &lt;&lt;</span><span class="z-constant z-numeric"> 8</span><span>)</span><span class="z-keyword z-operator"> |</span><span class="z-variable"> r</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>I will also share the <code>readNode</code> function, but I won&#39;t explain the details. 
This is just the decoding part of the output from the parser.</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> readNode</span><span>(</span><span class="z-variable z-parameter">buffer</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> offset</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> nodes</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> node_type</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">offset</span><span>]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Block.</span></span> <span class="giallo-l"><span class="z-keyword"> if</span><span> (</span><span class="z-constant z-numeric">1</span><span class="z-keyword z-operator"> ===</span><span class="z-variable"> node_type</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> name_length</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 1</span><span>]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> attributes_length</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> u8s_to_u32</span><span>(</span><span class="z-variable">buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 
2</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 3</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 4</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 5</span><span>])</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> number_of_children</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 6</span><span>]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> payload_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 7</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> next_payload_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> payload_offset</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> name_length</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> 
const</span><span class="z-variable"> name</span><span class="z-keyword z-operator"> = new</span><span class="z-entity z-name z-function"> TextDecoder</span><span>()</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">decode</span><span>(</span><span class="z-variable">buffer</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">slice</span><span>(</span><span class="z-variable">payload_offset</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> next_payload_offset</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> payload_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> next_payload_offset</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> next_payload_offset</span><span class="z-keyword z-operator"> +=</span><span class="z-variable"> attributes_length</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> attributes</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> JSON</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">parse</span><span>(</span><span class="z-keyword z-operator">new</span><span class="z-entity z-name z-function"> TextDecoder</span><span>()</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">decode</span><span>(</span><span class="z-variable">buffer</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">slice</span><span>(</span><span class="z-variable">payload_offset</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> 
next_payload_offset</span><span>)))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> payload_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> next_payload_offset</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> let</span><span class="z-variable"> end_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> payload_offset</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> children</span><span class="z-keyword z-operator"> =</span><span> []</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> for</span><span> (</span><span class="z-storage">let</span><span class="z-variable"> i</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span><span class="z-variable"> i</span><span class="z-keyword z-operator"> &lt;</span><span class="z-variable"> number_of_children</span><span class="z-punctuation z-terminator">;</span><span class="z-keyword z-operator"> ++</span><span class="z-variable">i</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> last_offset</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> readNode</span><span>(</span><span class="z-variable">buffer</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> payload_offset</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> children</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span 
class="giallo-l"><span class="z-variable"> payload_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> end_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> last_offset</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> nodes</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">push</span><span>(</span><span class="z-keyword z-operator">new</span><span class="z-entity z-name z-function"> Block</span><span>(</span><span class="z-variable">name</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> attributes</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> children</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> end_offset</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span class="z-comment"> // Phrase.</span></span> <span class="giallo-l"><span class="z-keyword"> else if</span><span> (</span><span class="z-constant z-numeric">2</span><span class="z-keyword z-operator"> ===</span><span class="z-variable"> node_type</span><span>) {</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> phrase_length</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name z-function"> u8s_to_u32</span><span>(</span><span class="z-variable">buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 1</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> 
buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 2</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 3</span><span>]</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> buffer</span><span>[</span><span class="z-variable">offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 4</span><span>])</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> phrase_offset</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> offset</span><span class="z-keyword z-operator"> +</span><span class="z-constant z-numeric"> 5</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-storage"> const</span><span class="z-variable"> phrase</span><span class="z-keyword z-operator"> = new</span><span class="z-entity z-name z-function"> TextDecoder</span><span>()</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">decode</span><span>(</span><span class="z-variable">buffer</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">slice</span><span>(</span><span class="z-variable">phrase_offset</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> phrase_offset</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> phrase_length</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> nodes</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name 
z-function">push</span><span>(</span><span class="z-keyword z-operator">new</span><span class="z-entity z-name z-function"> Phrase</span><span>(</span><span class="z-variable">phrase</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> phrase_offset</span><span class="z-keyword z-operator"> +</span><span class="z-variable"> phrase_length</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span><span class="z-keyword"> else</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> console</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">error</span><span>(</span><span class="z-string">&#39;unknown node type&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> node_type</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Note that this code is pretty simple and easy for the Javascript virtual machine to optimise. It is also important to note that this is not the original code. The original version is a little more optimised here and there, but the two are very close.</p> <p>And that&#39;s all! We have successfully read and decoded the output of the parser! 
We just need to write the <code>Block</code> and <code>Phrase</code> classes like this:</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Block</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> constructor</span><span>(</span><span class="z-variable z-parameter">name</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> attributes</span><span class="z-punctuation z-separator">,</span><span class="z-variable z-parameter"> children</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> this</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">name</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> name</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable z-language"> this</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">attributes</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> attributes</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable z-language"> this</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">children</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> children</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Phrase</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> constructor</span><span>(</span><span class="z-variable z-parameter">phrase</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> 
this</span><span class="z-punctuation z-accessor">.</span><span class="z-variable">phrase</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> phrase</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The final output will be an array of those objects. Easy!</p> <h2 id="-10">WebAssembly 🚀 NodeJS<a role="presentation" class="anchor" href="#-10" title="Anchor link to this header">#</a> </h2> <figure role="presentation"> <p><img src="https://mnt.io/series/from-rust-to-beyond/the-webassembly-galaxy/./wasm-to-nodejs.png" alt="Wasm to NodeJS" loading="lazy" decoding="async" /></p> </figure> <p>The differences between the Javascript version and the NodeJS version are few:</p> <ul> <li>The <code>Fetch</code> API does not exist in NodeJS, so the WebAssembly binary has to be instantiated with a buffer directly, like this: <code>WebAssembly.instantiate(fs.readFileSync(url), {})</code>,</li> <li>The <code>TextEncoder</code> and <code>TextDecoder</code> objects do not exist as global objects; they are available as <code>util.TextEncoder</code> and <code>util.TextDecoder</code>.</li> </ul> <p>In order to share the code between both environments, it is possible to write the boundary layer (the Javascript code we wrote) in a <code>.mjs</code> file, aka ECMAScript Module. This makes it possible to write something like <code>import { Gutenberg_Post_Parser } from './gutenberg_post_parser.mjs'</code> for example (considering the whole code we wrote before is a class). On the browser side, the script must be loaded with <code>&lt;script type="module" src="…" /&gt;</code>, and on the NodeJS side, <code>node</code> must run with the <code>--experimental-modules</code> flag. 
I recommend the talk <a rel="noopener external" target="_blank" href="https://www.youtube.com/watch?v=35ZMoH8T-gc&amp;index=4&amp;list=PLOkMRkzDhWGX_4YWI4ZYGbwFPqKnDRudf&amp;t=0s"><em>Please wait… loading: a tale of two loaders</em> by Myles Borins</a>, given at JSConf EU 2018, to understand the whole story.</p> <p><a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs/blob/master/bindings/wasm/bin/index.mjs">The entire code is available here</a>.</p> <h2 id="-11">Conclusion<a role="presentation" class="anchor" href="#-11" title="Anchor link to this header">#</a> </h2> <p>We have seen in detail how to write a real-world parser in Rust, how to compile it into a WebAssembly binary, and how to use it with Javascript and with NodeJS.</p> <p>The parser can be used in a browser with regular Javascript code, or as a CLI with NodeJS, or on any platform NodeJS supports.</p> <p>The Rust part for WebAssembly plus the Javascript part totals 313 lines of code. This is a tiny surface of code to review and to maintain compared to writing a Javascript parser from scratch.</p> <p>Another argument is safety and performance. Rust is memory safe, we know that. It is also fast, but is that still true for the WebAssembly target? 
The following table shows the benchmark results of the actual Javascript parser for the Gutenberg project (implemented with <a rel="noopener external" target="_blank" href="https://pegjs.org/">PEG.js</a>), against this project: The Rust parser as a WebAssembly binary.</p> <figure> <table><thead><tr><th>Document</th><th>Javascript parser (ms)</th><th>Rust parser as a WebAssembly binary (ms)</th><th>speedup</th></tr></thead><tbody> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/demo-post.html"><code>demo-post.html</code></a></td><td>13.167</td><td>0.252</td><td>× 52</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/shortcode-shortcomings.html"><code>shortcode-shortcomings.html</code></a></td><td>26.784</td><td>0.271</td><td>× 98</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/redesigning-chrome-desktop.html"><code>redesigning-chrome-desktop.html</code></a></td><td>75.500</td><td>0.918</td><td>× 82</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/web-at-maximum-fps.html"><code>web-at-maximum-fps.html</code></a></td><td>88.118</td><td>0.901</td><td>× 98</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/early-adopting-the-future.html"><code>early-adopting-the-future.html</code></a></td><td>201.011</td><td>3.329</td><td>× 60</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/pygmalian-raw-html.html"><code>pygmalian-raw-html.html</code></a></td><td>311.416</td><td>2.692</td><td>× 
116</td></tr> <tr><td><a rel="noopener external" target="_blank" href="https://raw.githubusercontent.com/dmsnell/gutenberg-document-library/master/library/moby-dick-parsed.html"><code>moby-dick-parsed.html</code></a></td><td>2,466.533</td><td>25.14</td><td>× 98</td></tr> </tbody></table> <figcaption> <p>Benchmarks comparing the Javascript parser and the Rust parser as a WebAssembly binary.</p> </figcaption> </figure> <p>The WebAssembly binary is on average 86 times faster than the current Javascript implementation. The median speedup is 98. Some cases are very interesting, like <code>moby-dick-parsed.html</code>, which takes 2.5s with the Javascript parser against 25ms with WebAssembly.</p> <p>So not only is it safer, it is also faster than Javascript in this case. And it is only about 300 lines of code.</p> <p>Note that WebAssembly does not support SIMD yet: It is still <a rel="noopener external" target="_blank" href="https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md">a proposal</a>. Rust is gradually adding support for it (<a rel="noopener external" target="_blank" href="https://github.com/rust-lang-nursery/stdsimd/pull/549">example with PR #549</a>). 
It will dramatically improve performance!</p> <p>We will see in the next episodes of this series that Rust can reach a lot of galaxies, and the more it travels, the more interesting it gets.</p> <p>Thanks for reading!</p> Prelude 2018-08-21T00:00:00+00:00 2018-08-21T00:00:00+00:00 Unknown https://mnt.io/series/from-rust-to-beyond/prelude/ <p><a rel="noopener external" target="_blank" href="https://automattic.com/">At my work</a>, I had an opportunity to start an experiment: Writing a single parser implementation in Rust for <a rel="noopener external" target="_blank" href="https://github.com/WordPress/gutenberg">the new Gutenberg post format</a>, bound to many platforms and environments.</p> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/prelude/./gutenberg.png" alt="Gutenberg&#39;s logo" loading="lazy" decoding="async" /></p> <figcaption> <p>Gutenberg&#39;s logo.</p> </figcaption> </figure> <p>This series of posts is about those bindings, and explains how to send Rust beyond earth, into many different galaxies.</p> <h2 id="">The Gutenberg post format<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <p>Let&#39;s quickly introduce what Gutenberg is, and why it has a new post format. If you want an in-depth presentation, I highly recommend reading <a rel="noopener external" target="_blank" href="https://lamda.blog/2018/04/22/the-language-of-gutenberg/">The Language of Gutenberg</a>. Note that this is <em>not</em> required for the reader to understand the Gutenberg post format.</p> <p><a rel="noopener external" target="_blank" href="https://github.com/WordPress/gutenberg">Gutenberg</a> is the next WordPress editor. It is a little revolution on its own. 
The features it unlocks are very powerful.</p> <blockquote> <p>The editor will create a new page- and post-building experience that makes writing rich posts effortless, and has “blocks” to make it easy what today might take shortcodes, custom HTML, or “mystery meat” embed discovery. — Matt Mullenweg</p> </blockquote> <p>The format of a blog post was HTML. And it continues to be. However, another semantics layer is added through annotations. Annotations are written in comments and borrow the XML syntax, e.g.:</p> <pre class="giallo z-code"><code data-lang="xml"><span class="giallo-l"><span class="z-comment">&lt;!-- wp:ns/block-name {&quot;attributes&quot;: &quot;as JSON&quot;} --&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">p</span><span class="z-punctuation z-definition z-tag">&gt;</span><span>phrase</span><span class="z-punctuation z-definition z-tag">&lt;/</span><span class="z-entity z-name z-tag">p</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-comment">&lt;!-- /wp:ns/block-name --&gt;</span></span></code></pre> <p>The Gutenberg format provides 2 constructions: Block, and Phrase. The example above contains both: There is a block wrapping a phrase. A phrase is basically anything that is not a block. 
Let&#39;s describe the example:</p> <ul> <li>It starts with an annotation (<code>&lt;!-- … --&gt;</code>),</li> <li>The <code>wp:</code> prefix is mandatory to represent a Gutenberg block,</li> <li>It is followed by a fully qualified block name, which is a pair of an optional namespace (here set to <code>ns</code>, defaulting to <code>core</code>) and a block name (here set to <code>block-name</code>), separated by a slash,</li> <li>A block has optional attributes encoded as a JSON object (see <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc7159">RFC 7159, Section 4, Objects</a>),</li> <li>Finally, a block has optional children, i.e. a heterogeneous collection of blocks or phrases. In the example above, there is one child, the phrase <code>&lt;p&gt;phrase&lt;/p&gt;</code>. The following example shows a block with no child:</li> </ul> <pre class="giallo z-code"><code data-lang="xml"><span class="giallo-l"><span class="z-comment">&lt;!-- wp:ns/block-name {&quot;attributes&quot;: &quot;as JSON&quot;} /--&gt;</span></span></code></pre> <p>The complete grammar can be found in <a rel="noopener external" target="_blank" href="https://hywan.github.io/gutenberg-parser-rs/gutenberg_post_parser/parser/index.html">the parser&#39;s documentation</a>.</p> <p>Finally, the parser is used on the <em>editor</em> side, not on the <em>rendering</em> side. Once rendered, the blog post is a regular HTML file. Some blocks are dynamic though, but this is another topic.</p> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/prelude/./block-logic-flow.png" alt="Block logic flow" loading="lazy" decoding="async" /></p> <figcaption> <p>The logic flow of the editor (<a rel="noopener external" target="_blank" href="https://make.wordpress.org/core/2017/05/05/editor-how-little-blocks-work/">How Little Blocks Work</a>).</p> </figcaption> </figure> <p>The grammar is relatively small. 
The challenge, however, is to be as fast and memory efficient as possible on many platforms. Some posts can reach megabytes, and we don&#39;t want the parser to be the bottleneck. Even if it is used when creating the post state (cf. the schema above), we have measured several seconds to load some posts. During that time, the user is blocked and waits, or sees an error. In other scenarios, we have hit the memory limit of the language&#39;s virtual machine.</p> <p>Hence this experimental project! The current parsers are written in JavaScript (with <a rel="noopener external" target="_blank" href="https://pegjs.org/">PEG.js</a>) and in PHP (with <a rel="noopener external" target="_blank" href="https://github.com/nylen/phpegjs"><code>phpegjs</code></a>). This project proposes a parser written in Rust that can run in the JavaScript and PHP virtual machines, and on many other platforms. Let&#39;s try to be very fast and memory efficient!</p> <h2 id="-1">Why Rust?<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h2> <p>That&#39;s an excellent question! Thanks for asking. 
I can summarize my choice with a bullet list:</p> <ul> <li>It is fast, and we need speed,</li> <li>It is memory safe, and also memory efficient,</li> <li>No garbage collector, which simplifies memory management across environments,</li> <li>It can expose a C API (<a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/ffi/index.html">with Foreign Function Interface, FFI</a>), which eases the integration into multiple environments,</li> <li>It compiles to <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/nightly/rustc/platform-support.html">many targets</a>,</li> <li>Because I love it.</li> </ul> <p>One of the goals of the experiment is to maintain a single implementation (maybe the future reference implementation) with multiple bindings.</p> <h2 id="-2">The parser<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h2> <p>The parser is written in Rust. It relies on the fabulous <a rel="noopener external" target="_blank" href="https://github.com/Geal/nom/">nom library</a>.</p> <figure> <p><img src="https://mnt.io/series/from-rust-to-beyond/prelude/./nom.png" alt="nom" loading="lazy" decoding="async" /></p> <figcaption> <p><em>nom will happily take a byte out of your files</em> 🙂</p> </figcaption> </figure> <p>The source code is available in <a rel="noopener external" target="_blank" href="https://github.com/Hywan/gutenberg-parser-rs">the <code>src/</code> directory in the repository</a>. 
It is very small and fun to read.</p> <p>The parser produces an Abstract Syntax Tree (AST) of the grammar, where nodes of the tree are defined as:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub</span><span class="z-storage"> enum</span><span class="z-entity z-name"> Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt; {</span></span> <span class="giallo-l"><span class="z-entity z-name"> Block</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> name</span><span class="z-keyword z-operator">:</span><span> (</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;,</span><span class="z-entity z-name"> Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;),</span></span> <span class="giallo-l"><span class="z-variable"> attributes</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Option</span><span>&lt;</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;&gt;,</span></span> <span class="giallo-l"><span class="z-variable"> children</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-entity z-name">Node</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;&gt;</span></span> <span class="giallo-l"><span> },</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> Phrase</span><span>(</span><span class="z-entity z-name">Input</span><span>&lt;&#39;</span><span class="z-entity z-name">a</span><span>&gt;)</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>That&#39;s all! We find the block name, the attributes, the children, and the phrase again. Block children are defined as a collection of nodes, so the definition is recursive. 
<code>Input&lt;'a&gt;</code> is defined as <code>&amp;'a [u8]</code>, i.e. a slice of bytes.</p> <p>The main parser entry is <a rel="noopener external" target="_blank" href="https://hywan.github.io/gutenberg-parser-rs/gutenberg_post_parser/fn.root.html">the <code>root</code> function</a>. It represents the axiom of the grammar, and is defined as:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">pub fn</span><span class="z-entity z-name z-function"> root</span><span>(</span></span> <span class="giallo-l"><span class="z-variable">    input</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Input</span></span> <span class="giallo-l"><span>)</span><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name"> Result</span><span>&lt;(</span><span class="z-entity z-name">Input</span><span>,</span><span class="z-entity z-name"> Vec</span><span>&lt;</span><span class="z-variable">ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span>&gt;),</span><span class="z-variable"> nom</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Err</span><span>&lt;</span><span class="z-entity z-name">Input</span><span>&gt;&gt;;</span></span></code></pre> <p>So the parser returns a collection of nodes in the best case. 
Here is a simple example:</p> <pre class="giallo z-code"><code data-lang="rust"><span class="giallo-l"><span class="z-keyword">use</span><span class="z-entity z-name"> gutenberg_post_parser</span><span class="z-keyword z-operator">::</span><span>{root,</span><span class="z-entity z-name"> ast</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Node</span><span>};</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">let</span><span class="z-variable"> input</span><span class="z-keyword z-operator"> = &amp;</span><span class="z-string">b&quot;&lt;!-- wp:foo {</span><span class="z-constant z-character">\&quot;</span><span class="z-string">bar</span><span class="z-constant z-character">\&quot;</span><span class="z-string">: true} /--&gt;&quot;</span><span>[</span><span class="z-keyword z-operator">..</span><span>];</span></span> <span class="giallo-l"><span class="z-storage">let</span><span class="z-variable"> output</span><span class="z-keyword z-operator"> =</span><span class="z-entity z-name"> Ok</span><span>(</span></span> <span class="giallo-l"><span> (</span></span> <span class="giallo-l"><span class="z-comment"> // The remaining data.</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> &amp;</span><span class="z-string">b&quot;&quot;</span><span>[</span><span class="z-keyword z-operator">..</span><span>],</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // The Abstract Syntax Tree.</span></span> <span class="giallo-l"><span class="z-entity z-name z-function"> vec!</span><span>[</span></span> <span class="giallo-l"><span class="z-entity z-name"> Node</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name">Block</span><span> {</span></span> <span class="giallo-l"><span class="z-variable"> name</span><span class="z-keyword z-operator">:</span><span> (</span><span class="z-keyword z-operator">&amp;</span><span
class="z-string">b&quot;core&quot;</span><span>[</span><span class="z-keyword z-operator">..</span><span>],</span><span class="z-keyword z-operator"> &amp;</span><span class="z-string">b&quot;foo&quot;</span><span>[</span><span class="z-keyword z-operator">..</span><span>]),</span></span> <span class="giallo-l"><span class="z-variable"> attributes</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name"> Some</span><span>(</span><span class="z-keyword z-operator">&amp;</span><span class="z-string">b&quot;{</span><span class="z-constant z-character">\&quot;</span><span class="z-string">bar</span><span class="z-constant z-character">\&quot;</span><span class="z-string">: true}&quot;</span><span>[</span><span class="z-keyword z-operator">..</span><span>]),</span></span> <span class="giallo-l"><span class="z-variable"> children</span><span class="z-keyword z-operator">:</span><span class="z-entity z-name z-function"> vec!</span><span>[]</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> ]</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span>);</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-entity z-name z-function">assert_eq!</span><span>(</span><span class="z-entity z-name z-function">root</span><span>(</span><span class="z-variable">input</span><span>),</span><span class="z-variable"> output</span><span>);</span></span></code></pre> <p>The <code>root</code> function and the AST will be the items we are going to use and manipulate in the bindings. 
The internal items of the parser will stay private.</p> <h2 id="-3">Bindings<a role="presentation" class="anchor" href="#-3" title="Anchor link to this header">#</a> </h2> <figure role="presentation"> <p><img src="https://mnt.io/series/from-rust-to-beyond/prelude/./rust-to.png" alt="Rust to" loading="lazy" decoding="async" /></p> </figure> <p>From now on, our goal is to expose the <code>root</code> function and the <code>Node</code> enum to different platforms or environments. Ready?</p> <p>3… 2… 1… lift-off!</p> How Automattic (WordPress.com & co.) partly moved away from PHPUnit to atoum? 2018-02-26T00:00:00+00:00 2018-02-26T00:00:00+00:00 Unknown https://mnt.io/articles/how-automattic-partly-moved-away-from-phpunit-to-atoum/ <p>Hello fellow developers and testers,</p> <p>A few months ago at <a rel="noopener external" target="_blank" href="https://automattic.com/">Automattic</a>, my team and I started a new project: <strong>Having better tests for the payment system</strong>. The payment system is used by all the services at Automattic, i.e. <a rel="noopener external" target="_blank" href="https://wordpress.com/">WordPress</a>, <a rel="noopener external" target="_blank" href="https://vaultpress.com/">VaultPress</a>, <a rel="noopener external" target="_blank" href="https://jetpack.com/">Jetpack</a>, <a rel="noopener external" target="_blank" href="http://akismet.com/">Akismet</a>, <a rel="noopener external" target="_blank" href="http://polldaddy.com/">PollDaddy</a> etc. It's a big challenge! Cherry on the cake: Our experiment could define the future of the testing practices for the entire company. No pressure.</p> <p>This post is a summary of what has been accomplished so far, the achievements, the failures, and the future, focused on manual tests. 
As the title of this post suggests, we are going to talk about <a rel="noopener external" target="_blank" href="https://phpunit.de/">PHPUnit</a> and <a rel="noopener external" target="_blank" href="http://atoum.org/">atoum</a>, which are two PHP test frameworks. This is not a PHPUnit vs. atoum fight. These are observations made for our software, in our context, with our requirements, and our expectations. I think the discussion can be useful for many projects outside Automattic. I would like to apologize in advance if some parts sound too abstract; I hope you understand I can't reveal any details about the payment system for obvious reasons.</p> <h2 id="where-we-were-and-where-to-go">Where we were, and where to go<a role="presentation" class="anchor" href="#where-we-were-and-where-to-go" title="Anchor link to this header">#</a> </h2> <p>For historical reasons, WordPress, VaultPress, Jetpack &amp; siblings use <a rel="noopener external" target="_blank" href="https://phpunit.de/">PHPUnit</a> for server-side manual tests. There are unit, integration, and system manual tests. There are also end-to-end tests or benchmarks, but we are not interested in them now. When those products were built, PHPUnit was the main test framework in town. Since then, the test landscape has considerably changed in PHP. New competitors, like <a rel="noopener external" target="_blank" href="http://atoum.org/">atoum</a> or <a rel="noopener external" target="_blank" href="http://behat.org/">Behat</a>, have a good position in the game.</p> <p>Those tests have existed for many years. Some of them grew organically. PHPUnit does not require any form of structure, which is (questionable as it may be, in my opinion) a reason for its success. Code should not have to be well-designed to be tested, <em>but</em> too much freedom on the test side comes at a cost in the long term if it does not get enough attention.</p> <p><strong>Our situation is the following</strong>. 
The code is complex for justified reasons, and the <em>testability</em> is sometimes lessened. Testing across many services is indubitably difficult. Some parts of the code are really old, mixed with others that are new, shiny, and well-done. In this context, it is really difficult to change something, especially moving to another test framework. The amount of work it represents is colossal. No new test framework is worth the price of this huge refactoring. But maybe the new test frameworks can help us to better test our code?</p> <p>I'm a <a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum/graphs/contributors">long-term contributor to atoum</a> (top 3 contributors). And at the time of writing, I'm a core member. You have to believe me when I say that, at each step of the discussions or the processes, I have been neutral, arguing both for and against atoum. The idea to switch to atoum partly came from me actually, but my knowledge about atoum is definitely a plus. I am in a good position to know the pros and the cons of the tool, and I'm perfectly aware of how it could solve issues we have.</p> <p>So after many debates and discussions, we decided to <em>try</em> to move to atoum. A survey and a meeting were scheduled 2 months later to decide whether we should continue or not. Spoiler: We will partly continue with it.</p> <h2 id="our-needs-and-requirements">Our needs and requirements<a role="presentation" class="anchor" href="#our-needs-and-requirements" title="Anchor link to this header">#</a> </h2> <p>Our code is difficult to test. In other words, the testability is low for some parts of the code. atoum has features to help increase the testability. 
I will try to summarize those features in the following short sections.</p> <h3 id="atoum-phpunit-extension"><code>atoum/phpunit-extension</code><a role="presentation" class="anchor" href="#atoum-phpunit-extension" title="Anchor link to this header">#</a> </h3> <p>As I said, it's not possible to rewrite/migrate all the existing tests. This is a colossal effort with a non-negligible cost. Enter <a rel="noopener external" target="_blank" href="https://github.com/atoum/phpunit-extension"><code>atoum/phpunit-extension</code></a>.</p> <p>As far as I know, atoum is the only PHP framework that is able to run tests that have been written for another framework. The <code>atoum/phpunit-extension</code> does exactly that. It runs tests written with the PHPUnit API with the atoum engines. This is <em>fabulous</em>! PHPUnit is not required at all. With this extension, we have been able to run our “legacy” (aka PHPUnit) tests with atoum. The following scenarios can be fulfilled:</p> <ul> <li>Existing test suites written with the PHPUnit API can be run seamlessly by atoum, no need to rewrite them,</li> <li>Of course, new test suites are written with the atoum API,</li> <li>In case of a test suite migration from PHPUnit to atoum, there are two solutions: <ol> <li>Rewrite the test suite entirely from scratch using the atoum API, or</li> <li>Only change the parent class from <code>PHPUnit\Framework\TestCase</code> to <code>atoum\phpunit\test</code>, and suddenly it is possible to use both APIs at the same time (and thus migrate one test case after the other for instance).</li> </ol> </li> </ul> <p>This is a very valuable tool for an adventure like ours.</p> <p><code>atoum/phpunit-extension</code> is not perfect though. Some PHPUnit APIs are missing. And while the test verdict is strictly the same, error messages can be different, some PHPUnit extensions may not work properly etc. 
Fortunately, our usage of PHPUnit is pretty raw: No extensions except home-made ones, few hacks… Everything went well. We also have been able to contribute easily to the extension.</p> <h3 id="mock-engines-plural">Mock engines (plural)<a role="presentation" class="anchor" href="#mock-engines-plural" title="Anchor link to this header">#</a> </h3> <p>atoum comes with <a rel="noopener external" target="_blank" href="http://docs.atoum.org/en/latest/mocking_systems.html">3 mock engines</a>:</p> <ul> <li>Class-like mock engine for classes and interfaces,</li> <li>Function mock engine,</li> <li>Constant mock engine.</li> </ul> <p>Being able to mock global functions or global constants is an important feature for us. It suddenly increases the testability of our code! The following example is fictional, but it's a good illustration. WordPress is full of global functions, but it is possible to mock them with atoum like this:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> test_foo</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">function</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">get_userdata</span><span class="z-keyword z-operator"> =</span><span> (</span><span class="z-storage">object</span><span>)</span><span class="z-punctuation z-section"> [</span></span> <span class="giallo-l"><span class="z-string"> &#39;user_login&#39;</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-constant z-other"> …</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span 
class="z-string"> &#39;user_pass&#39;</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-constant z-other"> …</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-constant z-other"> …</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> ]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>In one line of code, it was possible to mock the <a rel="noopener external" target="_blank" href="https://codex.wordpress.org/Function_Reference/get_userdata"><code>get_userdata</code></a> function.</p> <h3 id="runner-engines">Runner engines<a role="presentation" class="anchor" href="#runner-engines" title="Anchor link to this header">#</a> </h3> <p>Being able to isolate test execution is a necessity to avoid flakey tests, and to increase the trust we put in the test verdicts. atoum comes with <em>de facto</em> 3 runner engines:</p> <ul> <li><em>Inline</em>, one test case after another in the same process,</li> <li><em>Isolate</em>, one test case after another but each time in a new process (full isolation),</li> <li><em>Concurrent</em>, like <em>isolate</em> but tests run concurrently (“at the same time”).</li> </ul> <p>I'm not saying PHPUnit doesn't have those features. It is possible to run tests in a different process each time —with the <em>isolate</em> engine—, but test execution time blows up, and the isolation is not strict. We don't use it. 
The <em>concurrent</em> runner engine in atoum tends to reduce the execution time to be close to the <em>inline</em> engine, while still ensuring a strict isolation.</p> <p>Fun fact: By using atoum and the <code>atoum/phpunit-extension</code>, we are able to run PHPUnit tests concurrently with a strict isolation!</p> <h3 id="code-coverage-reports">Code coverage reports<a role="presentation" class="anchor" href="#code-coverage-reports" title="Anchor link to this header">#</a> </h3> <p>At the time of writing, PHPUnit is not able to generate code coverage reports containing the Branch- or Path Coverage Criteria data. atoum supports them natively with the <a rel="noopener external" target="_blank" href="https://github.com/atoum/reports-extension"><code>atoum/reports-extension</code></a> (including nice graphs, see <a rel="noopener external" target="_blank" href="http://atoum.org/reports-extension/">the demonstration</a>). And we need those data.</p> <h2 id="the-difficulties">The difficulties<a role="presentation" class="anchor" href="#the-difficulties" title="Anchor link to this header">#</a> </h2> <p>On paper, most of the pain points sound addressable. It was time to experiment.</p> <h3 id="integration-to-the-continuous-integration-server">Integration to the Continuous Integration server<a role="presentation" class="anchor" href="#integration-to-the-continuous-integration-server" title="Anchor link to this header">#</a> </h3> <p>Our CI does not natively support standard test execution report formats. Thus we had to create the <a rel="noopener external" target="_blank" href="https://github.com/Hywan/atoum-teamcity-extension/"><code>atoum/teamcity-extension</code></a>. <a href="https://mnt.io/articles/atoum-supports-teamcity/">Learn more</a> by reading a blog post I wrote recently. 
The TeamCity support is native inside PHPUnit (see the <a rel="noopener external" target="_blank" href="http://phpunit.readthedocs.io/en/latest/textui.html?highlight=--log-teamcity"><code>--log-teamcity</code> option</a>).</p> <h3 id="bootstrap-test-environments">Bootstrap test environments<a role="presentation" class="anchor" href="#bootstrap-test-environments" title="Anchor link to this header">#</a> </h3> <p>Our bootstrap files are… challenging. It's expected though. Setting up a functional test environment for software like WordPress.com is not a task one can accomplish in 2 minutes. Fortunately, we have been able to re-use most of the PHPUnit parts.</p> <p>Today, our unit tests run in complete isolation and concurrently. Our integration tests and system tests run in complete isolation but not concurrently, due to MySQL limitations. We have solutions, but time needs to be invested.</p> <p>Generally, even if it works now, it took time to re-organize the bootstrap so that some parts can be shared between the test runners (because we didn't switch the whole company to atoum yet, it was an experiment).</p> <h3 id="documentation-and-help">Documentation and help<a role="presentation" class="anchor" href="#documentation-and-help" title="Anchor link to this header">#</a> </h3> <p>Here is an interesting paradox. The majority of the team recognized that atoum's documentation is better than PHPUnit's, even if some parts must be rewritten or reworked. <em>But</em> developers already know PHPUnit, so they don't look at the documentation. If they have to, they will instead find their answers on Stack Overflow, or by talking to someone else in the company, but not by checking the official documentation. atoum does not have many Stack Overflow threads, and few people are atoum users within the company.</p> <p>What we have also observed is that when people create a new test, it's a copy-paste from an existing one. Let's admit this is a common and natural practice. 
When a difficulty is met, it's legitimate to look elsewhere in the test repository to check if a similar situation has been resolved. In our context, such examples were somewhat lacking. We tried to write more and more tests, but not fast enough. It should not be an issue if you have time to try, but in our context, we unfortunately didn't have this time. The team faced many challenges in the same period, and the tests we are building are not simple <em>Hello, World!</em>s as you might think, so it increases the effort.</p> <p>To be honest, this was not the biggest difficulty, but still, it is worth noting.</p> <h3 id="concurrent-integration-test-executions">Concurrent integration test executions<a role="presentation" class="anchor" href="#concurrent-integration-test-executions" title="Anchor link to this header">#</a> </h3> <p>Due to some MySQL limitations combined with the complexity of our code, we are not able to run integration (and system) tests concurrently yet. Therefore it takes time to run them, probably too much in our development environments. Even if atoum has friendly options to reduce the debug loop (e.g. see <a rel="noopener external" target="_blank" href="http://docs.atoum.org/en/latest/mode-loop.html">the <code>--loop</code> option</a>), the execution is still slow. The problem can be solved but it requires time, and deep modifications of our code.</p> <p>Note that with our PHPUnit tests, no isolation is used. This is wrong. And thus we have less trust in the test verdict than with atoum. Almost everyone in the team prefers to have slow test execution but isolation, rather than fast test execution but no confidence in the test verdict. So that's partly a difficulty. It's a mix of a positive feature and a thorn in our side, albeit a thorn we can live with. 
atoum is not responsible for this latency: The state of our code is.</p> <h2 id="the-results">The results<a role="presentation" class="anchor" href="#the-results" title="Anchor link to this header">#</a> </h2> <p>First, let's start with the positive impacts:</p> <ul> <li>In 2 months, we have observed that the testability of our code has been increased by using atoum,</li> <li>We have been able to find bugs in our code that were not detected by PHPUnit, mostly because atoum checks the type of the data,</li> <li>We have been able to migrate “legacy tests” (aka PHPUnit tests) to atoum by just moving the files from one directory to another: What a smooth migration!</li> <li>The <em>trust</em> we put in our test verdict has increased thanks to a strict test execution isolation.</li> </ul> <p>Now, the negative impacts:</p> <ul> <li>Even if the testability has been increased, it's not enough. Right now, we are looking at refactoring our code. Introducing atoum right now was probably too early. Let's refactor first, then use a better test toolchain later when things are cleaner,</li> <li>Moving the whole company at once is hard. There are thousands of manual tests. The <code>atoum/phpunit-extension</code> is not magical. We have to come up with more solid results, stuff that blows minds. It is necessary to set the institutional inertia in motion. For instance, not being able to run integration and system tests concurrently slows down the builds on the CI; it increases the trust we put in the test verdict, but this latency is not acceptable at the company scale,</li> <li>All the issues we faced can be addressed, but it needs time. The experiment time frame was 2 months. We need 1 or 2 more months to solve the majority of the remaining issues. Note that I was kind of in charge of this project, but not full time.</li> </ul> <p>We stopped using atoum for <em>manual tests</em>. It's likely to be a pause though. 
The experiment has shown we need to refactor and clean our code, then there will be a good chance for atoum to come back. The experiment has also shown how to increase the testability of our code: Not everything can be addressed by using another test framework, even if it helps a great deal. We can focus on those points specifically, because we know where they are now. Finally, I reckon it has helped move the test infrastructure forward inside Automattic by showing that something else exists, and that we can go further.</p> <p>I said we stopped using atoum “for manual tests”. Yes. Because we also have <em>automatically generated tests</em>. The experiment was not only about switching to atoum. Many other aspects of the experiment are still running! For instance, <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Kitab">Kitab</a> is used for our code documentation. Kitab is able to (i) <em>render</em> the documentation, and (ii) <em>test</em> the examples written inside the documentation. That way the documentation is ensured to be always up-to-date and working. Kitab generates tests for atoum and executes them with atoum. It was easy to set up: We just had to use the existing test bootstraps designed for atoum. We also have another tool to <a rel="noopener external" target="_blank" href="https://github.com/Hywan/atoum-apiblueprint-extension">compile HTTP API Blueprint specifications into executable tests</a>. So far, everyone is happy with those tools, no need to go back, everything is automat(t)ic. Other tools are likely to be introduced in the future to automatically generate tests. I want to detail this particular topic in another blog post.</p> <h2 id="conclusion">Conclusion<a role="presentation" class="anchor" href="#conclusion" title="Anchor link to this header">#</a> </h2> <p>Moving to another test framework is a huge decision with many factors. The fact that atoum has <code>atoum/phpunit-extension</code> is a time saver. 
Nonetheless, adopting a new test framework does not mean it will fix all the testability issues of the code. The benefits of the new test framework must largely outweigh the costs. In our current context, it was not the case. <em>atoum solves issues that are not our priorities</em>. So yes, atoum can help us to solve important issues, but since these issues are not priorities, then the move to atoum was too early. During the project, we gained new automatic test tools, like <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Kitab">Kitab</a>. The experiment is not a failure. Will we retry atoum? It's very likely. When? I hope in a year.</p> One conference per day, for one year (2017) 2018-01-25T00:00:00+00:00 2018-01-25T00:00:00+00:00 Unknown https://mnt.io/articles/one-conference-per-day-for-one-year-2017/ <p>My self-assigned challenge for 2017 was to watch at least one conference per day, for one year. This is the first time I have tried such a challenge. Let's dive in for a recap.</p> <h2 id="267-conferences">267 conferences<a role="presentation" class="anchor" href="#267-conferences" title="Anchor link to this header">#</a> </h2> <p>In a way, I failed the challenge because I've been able to watch only 267 conferences. With an average of 34 minutes per conference, I've watched 9078 minutes, or 151 hours of <em>freely available</em> conferences online. Why did I fail to watch 365 of them? 
Because my first kid was 1.5 years old in January 2017, a new little lady came in December 2017, I <a href="https://mnt.io/articles/bye-bye-liip-hello-automattic/">got a new job</a>, I <a href="https://mnt.io/articles/automattic-grand-meetup-2017/">travelled for my job</a>, I <a rel="noopener external" target="_blank" href="https://www.youtube.com/watch?v=Ymy8qAEe0kQ">gave talks</a>, I maintain important open source projects requiring a lot of time, I'm building my own self-sufficient ecological house, the vegetable garden requires many hours, I watch other videos, and because I'm lazy sometimes. Most of the time, I was able to watch 2 or 3 conferences in a row.</p> <h2 id="where-to-find-the-resources">Where to find the resources?<a role="presentation" class="anchor" href="#where-to-find-the-resources" title="Anchor link to this header">#</a> </h2> <p>All these conferences are freely available online, on YouTube, or on Vimeo, for most of them. The channels I mostly watch are the following:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://www.youtube.com/channel/UCaYhcUwRBNscFNUKTjgPFiA">Rust</a>,</li> <li><a rel="noopener external" target="_blank" href="https://www.youtube.com/user/Confreaks">Confreaks</a>,</li> <li><a rel="noopener external" target="_blank" href="https://www.youtube.com/channel/UCCBVCTuk6uJrN3iFV_3vurg">Devoxx</a>,</li> <li><a rel="noopener external" target="_blank" href="https://www.youtube.com/channel/UCOpGiN9AkczVjlpGDaBwQrQ">elm-conf</a>,</li> <li><a rel="noopener external" target="_blank" href="https://www.youtube.com/user/BoostCon">BoostCon</a>,</li> <li><a rel="noopener external" target="_blank" href="https://www.youtube.com/user/jsconfeu">JSConf</a>,</li> <li><a rel="noopener external" target="_blank" href="https://www.youtube.com/channel/UCv2_41bSAa5Y_8BacJUZfjQ">LLVM</a>,</li> <li><a rel="noopener external" target="_blank" href="https://www.youtube.com/user/CppCon">CppCon</a>,</li> <li><a rel="noopener external"
target="_blank" href="https://www.youtube.com/channel/UC_QIfHvN9auy2CoOdSfMWDw">Strange Loop</a>.</li> </ul> <p>It's very Computer Science centric as you might have noticed, and it targets Rust, C++, Elm, LLVM, or Web technologies (JS, CSS…), but not only: you can find Haskell or Clojure sometimes.</p> <h2 id="my-best-of-list">My best-of list<a role="presentation" class="anchor" href="#my-best-of-list" title="Anchor link to this header">#</a> </h2> <p>In March 2017, more and more people were asking me about the challenge, and asked me to share what I watched. I then decided to start a <a rel="noopener external" target="_blank" href="https://www.youtube.com/playlist?list=PLOkMRkzDhWGX_4YWI4ZYGbwFPqKnDRudf">playlist of my “best-of” conferences</a>. I've added 78 conferences in 2017, and 3 new conferences have been added since then.</p> <figure> <iframe class="youtube-player" width="560" height="315" src="https://www.youtube-nocookie.com/embed/videoseries?si=9Q6Qf-EOE4nyFgrn&amp;list=PLOkMRkzDhWGX_4YWI4ZYGbwFPqKnDRudf" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> <figcaption> <p>My best-of talks playlist.</p> </figcaption> </figure> <h2 id="thoughts-and-conclusion">Thoughts and conclusion<a role="presentation" class="anchor" href="#thoughts-and-conclusion" title="Anchor link to this header">#</a> </h2> <p>The challenge was sometimes easy and relaxing, or it was very hard to understand everything especially at 2am after a long day (looking at you CppCon). But it has been a very enjoyable way to learn a lot in a very short period of time. Many speakers are talented, and listening to them is a real pleasure. Some others are just… let's say unprepared, and it's good to stop and jump onto another talk. 
It's also a good way to get inspired by technologies you don't necessarily know (for instance, I'm not a big fan of Clojure, but some projects are really inspiring, like <a rel="noopener external" target="_blank" href="https://www.youtube.com/watch?v=buPPGxOnBnk&amp;index=81&amp;list=PLOkMRkzDhWGX_4YWI4ZYGbwFPqKnDRudf">Proto REPL</a>).</p> <p>Sometimes <a rel="noopener external" target="_blank" href="https://twitter.com/mnt_io">I tweeted</a> about the talk I watched, and it was quite appreciated too. I reckon that's because it's a fun and easy way to learn, especially with the help of video platforms like YouTube.</p> <p>Am I going to continue this challenge in 2018? Yes! But maybe not at this frequency. It's now part of my routine to watch conferences many times per week. I like it. I don't want to stop.</p> <p>As a closing note, I would like to <em>thank</em> every speaker, and more importantly, every conference organizer. You are doing an amazing job: From the program, to the event, to the final sharing on the Internet with everyone. Most of you are volunteers. I know the work it represents. You are producing <em>extremely valuable resources</em>. Thank you!</p> Random thoughts about `::class` in PHP 2018-01-24T00:00:00+00:00 2018-01-24T00:00:00+00:00 Unknown https://mnt.io/articles/random-thoughts-about-class-in-php/ <blockquote> <p>The special <strong><code>::class</code></strong> constant allows for fully qualified class name resolution at compile, this is useful for namespaced classes.</p> </blockquote> <p>I'm quoting <a rel="noopener external" target="_blank" href="http://php.net/manual/en/language.oop5.constants.php">the PHP manual</a>. But things can be funny sometimes. 
Let's go through some examples.</p> <ul> <li><pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span> A</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">B</span><span class="z-keyword"> as</span><span> C</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$_</span><span class="z-keyword z-operator"> =</span><span class="z-support z-class"> C</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>resolves to <code>A\B</code>, which is perfect 🙂</p> </li> <li><pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> C</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> f</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable"> $_</span><span class="z-keyword z-operator"> =</span><span class="z-storage"> self</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span class="z-punctuation
z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>resolves to <code>C</code>, which is perfect 😀</p> </li> <li><pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> C</span><span> {}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> D</span><span class="z-storage"> extends</span><span class="z-entity z-other z-inherited-class"> C</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> f</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable"> $_</span><span class="z-keyword z-operator"> =</span><span class="z-storage"> parent</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>resolves to <code>C</code>, which is perfect 😄</p> </li> <li><pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> C</span><span> 
{</span></span> <span class="giallo-l"><span class="z-storage"> public static</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> f</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable"> $_</span><span class="z-keyword z-operator"> =</span><span class="z-storage"> static</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> D</span><span class="z-storage"> extends</span><span class="z-entity z-other z-inherited-class"> C</span><span> {}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-class">D</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">f</span><span>()</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>resolves to <code>D</code>, which is perfect 😍</p> </li> <li><pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-string">&#39;foo&#39;</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span></span></code></pre> <p>resolves to <code>'foo'</code>, which is… huh? 🤨</p> </li>
<li><pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-string">&quot;foo&quot;</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span></span></code></pre> <p>resolves to <code>'foo'</code>, which is… expected somehow 😕</p> </li> <li><pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$a</span><span class="z-keyword z-operator"> =</span><span class="z-string"> &#39;oo&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-string">&quot;f{</span><span class="z-variable">$a</span><span class="z-string">}&quot;</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span></span></code></pre> <p>generates a parse error 🙃</p> </li> <li><pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-class">PHP_VERSION</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span></span></code></pre> <p>resolves to <code>'PHP_VERSION'</code>, which is… strange: It resolves to the fully qualified name of the constant, not the <em>class</em> 🤐</p> </li> </ul> <p><code>::class</code> is very useful to get rid 
of the <code>get_class</code> or the <code>get_called_class</code> functions, or even the <code>get_class($this)</code> trick. This is something truly useful in PHP where entities are referenced as strings, not as symbols. <code>::class</code> on constants makes sense, but the name is no longer relevant. And finally, <code>::class</code> on single-quoted strings is absolutely useless; on double-quoted strings it is a source of error because the value can be dynamic (and remember, <code>::class</code> is resolved at compile time, not at run time).</p> Automattic, Grand Meetup 2017 2017-11-26T00:00:00+00:00 2017-11-26T00:00:00+00:00 Unknown https://mnt.io/articles/automattic-grand-meetup-2017/ <p>Awesome company, awesome teams, awesome people. Thanks everyone for this moment!</p> <figure> <p><img src="https://mnt.io/articles/automattic-grand-meetup-2017/./gm.jpg" alt="All the people" loading="lazy" decoding="async" /></p> <figcaption> <p>All the people!</p> </figcaption> </figure> atoum supports TeamCity 2017-11-06T00:00:00+00:00 2017-11-06T00:00:00+00:00 Unknown https://mnt.io/articles/atoum-supports-teamcity/ <p><a rel="noopener external" target="_blank" href="http://atoum.org/">atoum</a> is a popular PHP test framework. <a rel="noopener external" target="_blank" href="https://www.jetbrains.com/teamcity/">TeamCity</a> is Continuous Integration and Continuous Delivery software developed by JetBrains. Although <a rel="noopener external" target="_blank" href="http://atoum.org/features.html#reports">atoum supports many industry standards</a> to report test execution verdicts, TeamCity uses <a rel="noopener external" target="_blank" href="https://confluence.jetbrains.com/display/TCD8/Build+Script+Interaction+with+TeamCity">its own non-standard report</a>, and thus atoum was not compatible with TeamCity… until now.</p> <p>The <code>atoum/teamcity-extension</code> provides TeamCity support inside atoum. 
When tests are executed, the reported verdicts can be understood by TeamCity, and they activate all its UI features.</p> <h2 id="install">Install<a role="presentation" class="anchor" href="#install" title="Anchor link to this header">#</a> </h2> <p>If you have Composer, just run:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> composer require atoum/teamcity-extension </span><span class="z-string">&#39;~1.0&#39;</span></span></code></pre> <p>From this point, you need to enable the extension in your <code>.atoum.php</code> configuration file. The following example enables the extension unconditionally, for every test execution:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$extension</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span> atoum</span><span class="z-punctuation z-separator">\</span><span>teamcity</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">extension</span><span>(</span><span class="z-variable">$script</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$extension</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">addToRunner</span><span>(</span><span class="z-variable">$runner</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The following example enables the extension <strong>only within</strong> a TeamCity environment:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span 
class="giallo-l"><span class="z-variable">$extension</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span> atoum</span><span class="z-punctuation z-separator">\</span><span>teamcity</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">extension</span><span>(</span><span class="z-variable">$script</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$extension</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">addToRunnerWithinTeamCityEnvironment</span><span>(</span><span class="z-variable">$runner</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>This latter installation is recommended. That's it 🙂.</p> <h2 id="glance">Glance<a role="presentation" class="anchor" href="#glance" title="Anchor link to this header">#</a> </h2> <p>The default CLI report looks like this:</p> <figure> <p><img src="https://mnt.io/articles/atoum-supports-teamcity/./cli.png" alt="Default atoum CLI report" loading="lazy" decoding="async" /></p> <figcaption> <p>The default CLI report is the default one from atoum.</p> </figcaption> </figure> <p>The TeamCity report looks like this in your terminal (note the <code>TEAMCITY_VERSION</code> variable as a way to emulate a TeamCity environment):</p> <figure> <p><img src="https://mnt.io/articles/atoum-supports-teamcity/./cli-teamcity.png" alt="TeamCity report inside the terminal" loading="lazy" decoding="async" /></p> <figcaption> <p>The TeamCity report is text-based, but it is aimed at being consumed by a formatter to produce HTML.</p> </figcaption> </figure> <p>Which is less easy to read. 
However, once it is consumed by the TeamCity UI, we get the following result:</p> <figure> <p><img src="https://mnt.io/articles/atoum-supports-teamcity/./teamcity.png" alt="TeamCity running atoum" loading="lazy" decoding="async" /></p> <figcaption> <p>The final rendering, as an HTML document inside TeamCity itself.</p> </figcaption> </figure> <p>We are using it at <a rel="noopener external" target="_blank" href="https://automattic.com/">Automattic</a>. Hope it is useful for someone else!</p> <p>If you find any bugs, or would like any other features, please use GitHub at the following repository: <a rel="noopener external" target="_blank" href="https://github.com/Hywan/atoum-teamcity-extension/">https://github.com/Hywan/atoum-teamcity-extension/</a>.</p> Export functions in PHP à la Javascript 2017-10-30T00:00:00+00:00 2017-10-30T00:00:00+00:00 Unknown https://mnt.io/articles/export-functions-in-php-a-la-javascript/ <p>Warning: This post is totally useless. It is the result of a fun private company thread.</p> <h2 id="export-functions-in-javascript">Export functions in Javascript<a role="presentation" class="anchor" href="#export-functions-in-javascript" title="Anchor link to this header">#</a> </h2> <p>In Javascript, a file can export functions like this:</p> <pre class="giallo z-code"><code data-lang="javascript"><span class="giallo-l"><span class="z-keyword">export</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> times2</span><span>(</span><span class="z-variable z-parameter">x</span><span>) {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> x</span><span class="z-keyword z-operator"> *</span><span class="z-constant z-numeric"> 2</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>And then we can import this function in another file like this:</p> <pre class="giallo z-code"><code 
data-lang="javascript"><span class="giallo-l"><span class="z-keyword">import</span><span> {</span><span class="z-variable">times2</span><span>}</span><span class="z-keyword"> from</span><span class="z-string"> &#39;foo&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">console</span><span class="z-punctuation z-accessor">.</span><span class="z-entity z-name z-function">log</span><span>(</span><span class="z-entity z-name z-function">times2</span><span>(</span><span class="z-constant z-numeric">21</span><span>))</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> // 42</span></span></code></pre> <p>Is it possible with PHP?</p> <h2 id="export-functions-in-php">Export functions in PHP<a role="presentation" class="anchor" href="#export-functions-in-php" title="Anchor link to this header">#</a> </h2> <p>Every entity is public in PHP: Constant, function, class, interface, or trait. They can live in a namespace. So exporting functions in PHP is absolutely useless, but just for the fun, let's keep going.</p> <p>A PHP file can return an integer, a real, an array, an anonymous function, anything. 
Let's try this:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">return</span><span class="z-storage z-type z-function"> function</span><span> (</span><span class="z-keyword">int</span><span class="z-variable"> $x</span><span>)</span><span class="z-keyword z-operator">:</span><span class="z-keyword"> int</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $x</span><span class="z-keyword z-operator"> *</span><span class="z-constant z-numeric"> 2</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>And then in another file:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$times2</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> require</span><span class="z-string"> &#39;foo.php&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(</span><span class="z-variable">$times2</span><span>(</span><span class="z-constant z-numeric">21</span><span>))</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> // int(42)</span></span></code></pre> <p>Great, it works.</p> <p>What if our file returns more than one function? 
Let's use an array (which has most hashmap properties):</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">return</span><span class="z-punctuation z-section"> [</span></span> <span class="giallo-l"><span class="z-string"> &#39;times2&#39;</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-storage z-type z-function"> function</span><span> (</span><span class="z-keyword">int</span><span class="z-variable"> $x</span><span>)</span><span class="z-keyword z-operator">:</span><span class="z-keyword"> int</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $x</span><span class="z-keyword z-operator"> *</span><span class="z-constant z-numeric"> 2</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-string"> &#39;answer&#39;</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-storage z-type z-function"> function</span><span> ()</span><span class="z-keyword z-operator">:</span><span class="z-keyword"> int</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-constant z-numeric"> 42</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span class="z-punctuation z-section">]</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>To choose what to import, let's use <a rel="noopener external" target="_blank" href="https://github.com/php/php-langspec/blob/master/spec/10-expressions.md#list-intrinsic">the <code>list</code> intrinsic</a>. 
It has several forms: With or without key matching, long (<code>list(…)</code>) and short syntax (<code>[…]</code>). Because we are modern, we will use the short syntax with key matching to selectively import functions:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-section">[</span><span class="z-string">&#39;times2&#39;</span><span class="z-keyword z-operator"> =&gt;</span><span class="z-variable"> $mul</span><span class="z-punctuation z-section">]</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> require</span><span class="z-string"> &#39;foo.php&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(</span><span class="z-variable">$mul</span><span>(</span><span class="z-constant z-numeric">21</span><span>))</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> // int(42)</span></span></code></pre> <p>Notice that <code>times2</code> has been aliased to <code>$mul</code>. What a feature!</p> <p>Is it useful? Absolutely not. Is it fun? For me it is.</p> Finite-State Machine as a Type System illustrated with a store product 2017-08-09T00:00:00+00:00 2017-08-09T00:00:00+00:00 Unknown https://mnt.io/articles/finite-state-machine-as-a-type-system-illustrated-with-a-store-product/ <p>Hello fellow coders!</p> <p>In this article, I would like to talk about how to implement a <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Finite-state_machine">Finite-State Machine</a> (FSM) with the PHP type system. The example is a store product (in an e-commerce solution for instance), something we are likely to meet once in our lifetime. 
Our goal is simply to <strong>avoid impossible states and transitions</strong>.</p> <p>I am deeply in love with <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Type_theory">Type theory</a>; however, I will try to keep the formulas out of this article to focus on the code. Moreover, you might be aware that the PHP <em>runtime</em> type system is rather permissive and “poor” (this is not a formal definition); fortunately, some tricks can help us express nice constraints.</p> <h2 id="the-product-fsm">The Product FSM<a role="presentation" class="anchor" href="#the-product-fsm" title="Anchor link to this header">#</a> </h2> <p>A product in a store might have the following states:</p> <ul> <li>Active: Can be purchased,</li> <li>Inactive: Has been cancelled or discontinued (a discontinued product can no longer be purchased),</li> <li>Purchased and renewable,</li> <li>Purchased and not renewable,</li> <li>Purchased and cancellable.</li> </ul> <p>The transitions between these states can be viewed as a <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Finite-state_machine">Finite-State Machine</a> (FSM).</p> <figure> <p><img src="https://mnt.io/articles/finite-state-machine-as-a-type-system-illustrated-with-a-store-product/./schema1.png" alt="First schema" loading="lazy" decoding="async" /></p> <figcaption> <p>We read this graph as: A product is in the state <code>A</code>. If the <code>purchase</code> action is called, then it transitions to the state <code>B</code>. If the <code>once-off purchase</code> action is called, then it transitions to the state <code>C</code>. From the state <code>B</code>, if the <code>renew</code> action is called, it remains in the same state. If the <code>cancel</code> action is called, it transitions to the <code>D</code> state. The same <code>cancel</code> action transitions from the state <code>C</code> to the state <code>D</code>.</p> </figcaption> </figure> <p>Our goal is to respect this FSM. 
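</p> <p>As a point of comparison, the graph above can also be written as a plain transition table that is checked while the program runs. The sketch below is only an illustration, not code from a real store: the <code>TRANSITIONS</code> constant and the <code>transition</code> function are hypothetical names, and the states are simply the letters used in the figure.</p>
<pre class="giallo z-code"><code data-lang="php">&lt;?php

// Hypothetical runtime encoding of the graph: state =&gt; [action =&gt; next state].
const TRANSITIONS = [
    'A' =&gt; ['purchase' =&gt; 'B', 'once-off purchase' =&gt; 'C'],
    'B' =&gt; ['renew' =&gt; 'B', 'cancel' =&gt; 'D'],
    'C' =&gt; ['cancel' =&gt; 'D'],
    'D' =&gt; [],
];

// Returns the next state, or throws if the action is not allowed in `$state`.
function transition(string $state, string $action): string {
    $next = TRANSITIONS[$state][$action] ?? null;

    if (null === $next) {
        throw new DomainException("Action `$action` is not allowed in state `$state`.");
    }

    return $next;
}

var_dump(transition('A', 'purchase')); // string(1) "B"

try {
    transition('C', 'renew');
} catch (DomainException $e) {
    // Reached only at runtime: `renew` is not defined for the state `C`.
    echo $e-&gt;getMessage(), "\n";
}</code></pre>
<p>With such a table, an invalid action is only rejected once the code is executed.</p> <p>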
Invalid actions must be impossible to perform.</p> <h2 id="">Finite-State Machine as a Type System<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <p>An FSM is a good way to define the states and the transitions between them: It is formal and clear. However, it is typically checked at runtime, not at compile-time, i.e. <code>if</code> statements are required to test whether the state of a product can transition into another state, and to throw an exception otherwise. Note that PHP does not really have a compile-time because it is an online compiler (learn more by reading <a rel="noopener external" target="_blank" href="https://speakerdeck.com/hywan/tagua-vm-a-safe-php-virtual-machine">Tagua VM, a safe PHP virtual machine</a>, at slide 29). Our goal is to prevent illegal/invalid states at parse-/compile-time so that the PHP virtual machine, IDE or static analysis tools can prove the state of a product without executing PHP code.</p> <p>Why is this important? Imagine that we decide to change a product to be once-off purchasable instead of purchasable: we can then no longer renew it. We replace an interface on this product, and boom, the IDE tells us that the code is broken in <em>x</em> places. It <strong>detects impossible scenarios ahead of code execution</strong>.</p> <p>No more talking. 
Here is the code.</p> <h3 id="-1">The mighty product<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h3> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * A product.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage">interface</span><span class="z-entity z-name"> Product</span><span> {}</span></span></code></pre> <p>A product is a class implementing the <code>Product</code> interface. It lets us type a generic product, regardless of its state.</p> <h3 id="-2">Active and inactive<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h3> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * A product that is active.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage">interface</span><span class="z-entity z-name"> Active</span><span class="z-storage"> extends</span><span class="z-entity z-other z-inherited-class"> Product</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-storage"> self</span><span class="z-punctuation z-terminator">;</span></span> <span 
class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * A product that has been cancelled, or not in stock.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage">interface</span><span class="z-entity z-name"> Inactive</span><span class="z-storage"> extends</span><span class="z-entity z-other z-inherited-class"> Product</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-storage"> self</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The <code>Active</code> and <code>Inactive</code> interfaces are useful to create constraints such as:</p> <ul> <li>A product can be purchased only if it is active, and</li> <li>A product is inactive if and only if it has been cancelled,</li> <li>To finally conclude that an inactive product can no longer be purchased, nor renewed, nor cancelled.</li> </ul> <p>Basically, it defines the axiom (initial state) and the final states of our FSM.</p> <p>The <code>getProduct(): self</code> trick will make sense later. It helps to express the following constraint: “A valid product cannot be invalid, and vice-versa”, i.e. 
both interfaces cannot be implemented by the same value.</p> <h3 id="-3">Purchase, renew, and cancel<a role="presentation" class="anchor" href="#-3" title="Anchor link to this header">#</a> </h3> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * A product that can be purchased.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage">interface</span><span class="z-entity z-name"> Purchasable</span><span class="z-storage"> extends</span><span class="z-entity z-other z-inherited-class"> Active</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> purchase</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Renewable</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Only an active product can be purchased. The action is <code>purchase</code> and it generates a product that is renewable. 
<code>purchase</code> transitions from the state <code>A</code> to <code>B</code> (regarding the graph above).</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * A product that can be cancelled.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage">interface</span><span class="z-entity z-name"> Cancellable</span><span class="z-storage"> extends</span><span class="z-entity z-other z-inherited-class"> Active</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> cancel</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Inactive</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Only an active product can be cancelled. 
The action is <code>cancel</code> and it generates an inactive product, so it transitions from the state <code>B</code> to <code>D</code>.</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * A product that can be renewed.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage">interface</span><span class="z-entity z-name"> Renewable</span><span class="z-storage"> extends</span><span class="z-entity z-other z-inherited-class"> Cancellable</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> renew</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-storage"> self</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>A renewable product is also cancellable. The action is <code>renew</code> and this is a reflexive transition from the state <code>B</code> to <code>B</code>.</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * A product that can be once-off purchased, i.e. 
it can be purchased but not</span></span> <span class="giallo-l"><span class="z-comment"> * renewed.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage">interface</span><span class="z-entity z-name"> PurchasableOnce</span><span class="z-storage"> extends</span><span class="z-entity z-other z-inherited-class"> Active</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> purchase</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Cancellable</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Finally, a once-off purchasable product has one action: <code>purchase</code> that produces a <code>Cancellable</code> product, and it transitions from the state <code>A</code> to <code>C</code>.</p> <h3 id="-4">Take a breath<a role="presentation" class="anchor" href="#-4" title="Anchor link to this header">#</a> </h3> <figure role="presentation"> <p><img src="https://mnt.io/articles/finite-state-machine-as-a-type-system-illustrated-with-a-store-product/./schema2.png" alt="Second schema" loading="lazy" decoding="async" /></p> </figure> <p>So far we have defined interfaces, but the FSM is not implemented yet. <strong>Interfaces only define constraints</strong> in our type system. 
An interface provides a constraint but also <strong>defines type capabilities</strong>: <strong>What operations can be performed on a value implementing a particular interface</strong>.</p> <h3 id="-5">SecretProduct<a role="presentation" class="anchor" href="#-5" title="Anchor link to this header">#</a> </h3> <p>Let&#39;s consider the <code>SecretProduct</code> as a new super secret product that will revolutionise our store:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * The `SecretProduct` class is:</span></span> <span class="giallo-l"><span class="z-comment"> *</span></span> <span class="giallo-l"><span class="z-comment"> * * A product,</span></span> <span class="giallo-l"><span class="z-comment"> * * Active,</span></span> <span class="giallo-l"><span class="z-comment"> * * Purchasable.</span></span> <span class="giallo-l"><span class="z-comment"> *</span></span> <span class="giallo-l"><span class="z-comment"> * Note that in this implementation, the `SecretProduct` instance is mutable: Every</span></span> <span class="giallo-l"><span class="z-comment"> * action happens on the same `SecretProduct` instance. 
It makes sense because</span></span> <span class="giallo-l"><span class="z-comment"> * having 2 instances of the same product with different states might be error-prone</span></span> <span class="giallo-l"><span class="z-comment"> * in most scenarios.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> SecretProduct</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Active</span><span class="z-punctuation z-separator">,</span><span class="z-entity z-other z-inherited-class"> Purchasable</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Active</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> /**</span></span> <span class="giallo-l"><span class="z-comment"> * Purchasing the product returns an active product that is renewable,</span></span> <span class="giallo-l"><span class="z-comment"> * and also cancellable.</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> purchase</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Renewable</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return new</span><span class="z-storage"> class</span><span> 
(</span><span class="z-variable z-language">$this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getProduct</span><span>())</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Renewable</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> protected</span><span class="z-variable"> $product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-support z-function"> __construct</span><span>(</span><span class="z-support z-class">SecretProduct</span><span class="z-variable"> $product</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-comment"> // Do the purchase.</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Active</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type 
z-function"> function</span><span class="z-entity z-name z-function"> renew</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Renewable</span><span> {</span></span> <span class="giallo-l"><span class="z-comment"> // Do the renew.</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> cancel</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Inactive</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return new</span><span class="z-storage"> class</span><span> (</span><span class="z-variable z-language">$this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getProduct</span><span>())</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Inactive</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> protected</span><span class="z-variable"> $product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-support z-function"> __construct</span><span>(</span><span class="z-support z-class">SecretProduct</span><span class="z-variable"> $product</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $product</span><span 
class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-comment"> // Do the cancel.</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Inactive</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>The <code>SecretProduct</code> is a product that is active and purchasable. PHP verifies that the <code>Active::getProduct</code> method is implemented, and that the <code>Purchasable::purchase</code> method is implemented too.</p> <p>When the latter is called, it returns an object implementing the <code>Renewable</code> interface (which is also a cancellable active product). The object in this context is an instance of an anonymous class implementing the <code>Renewable</code> interface. So the <code>Active::getProduct</code>, <code>Renewable::renew</code>, and <code>Cancellable::cancel</code> methods must be implemented.</p> <p>Having an anonymous class is not required at all; it is just simpler for the example. 
A named class may even be better from a testing point of view.</p> <p>Note that:</p> <ul> <li>The real purchase action is performed in the constructor of the anonymous class: this is not a hard rule, just a convenience; it can also be done in the method before returning the new instance,</li> <li>The real renew action is performed in the <code>renew</code> method before returning <code>$this</code>,</li> <li>And the real cancel action is performed in… we have to dig a little bit more (the principle is exactly the same though): <ul> <li>The <code>Cancellable::cancel</code> method must return an object implementing the <code>Inactive</code> interface.</li> <li>It generates an instance of an anonymous class implementing the <code>Inactive</code> interface, and the real cancel action is done in the constructor.</li> </ul> </li> </ul> <h3 id="-6">Assert possible and impossible actions<a role="presentation" class="anchor" href="#-6" title="Anchor link to this header">#</a> </h3> <p>Let&#39;s try some valid and invalid actions. 
The following are <strong>possible actions</strong>:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">assert</span><span>((</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-keyword z-operator"> instanceof</span><span class="z-support z-class"> Product</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">assert</span><span>((</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">renew</span><span>()</span><span class="z-keyword z-operator"> instanceof</span><span class="z-support z-class"> Product</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">assert</span><span>((</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-keyword z-operator"> instanceof</span><span class="z-support z-class"> Product</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support 
z-function">assert</span><span>((</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">renew</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">renew</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-keyword z-operator"> instanceof</span><span class="z-support z-class"> Product</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>It is possible to purchase a product, then renew it zero or more times, and finally cancel it. It matches the FSM!</p> <p>The following are <strong>impossible actions</strong>:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>(</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">renew</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>(</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>(</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span 
class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>(</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">renew</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>(</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>(</span><span class="z-keyword">new</span><span class="z-support z-class"> SecretProduct</span><span>())</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">purchase</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>It is impossible:</p> <ul> <li>To renew or to cancel a product that has not been purchased,</li> 
<li>To purchase or renew a product that has been cancelled,</li> <li>To purchase a product more than once,</li> <li>To cancel a product more than once.</li> </ul> <p>The following are <strong>impossible implementations</strong>:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> SecretProduct</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Active</span><span class="z-punctuation z-separator">,</span><span class="z-entity z-other z-inherited-class"> Purchasable</span><span class="z-punctuation z-separator">,</span><span class="z-entity z-other z-inherited-class"> PurchasableOnce</span><span> {}</span></span></code></pre> <p>A product cannot be purchasable and once-off purchasable at the same time, because <code>Purchasable::purchase</code> is not compatible with <code>PurchasableOnce::purchase</code>.</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> SecretProduct</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Inactive</span><span class="z-punctuation z-separator">,</span><span class="z-entity z-other z-inherited-class"> Cancellable</span><span> {}</span></span></code></pre> <p>An inactive product cannot be purchased, renewed, or cancelled because <code>Active::getProduct</code> and <code>Inactive::getProduct</code> are not compatible.</p> <p>Wow, those are great guarantees, aren&#39;t they? <strong>PHP will raise fatal errors for impossible actions or impossible states</strong>. 
No warnings or notices: Fatal errors. Most of them are correctly detected by IDEs, so… follow the red crosses in your IDE.</p> <h2 id="-7">Restoring a product<a role="presentation" class="anchor" href="#-7" title="Anchor link to this header">#</a> </h2> <p>One major thing is missing: The state of a product is stored in the database. When loading the product, we must be able to get an instance of a product in its previous state. To avoid repeating code, we will use traits. Rebuilding the state of a product is “just” (it really is) a composition of traits.</p> <p>Note: In these examples, we are using anonymous classes and traits. It is possible to achieve the same behavior with final named classes. Also, we are using a repository, which is convenient for this article, but not necessarily the best solution.</p> <h3 id="-8">Repository<a role="presentation" class="anchor" href="#-8" title="Anchor link to this header">#</a> </h3> <p>The following <code>ProductRepository\load</code> function is just here to give you an idea of how it works.</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">namespace</span><span class="z-entity z-name"> ProductRepository</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> load</span><span>(</span><span class="z-keyword">int</span><span class="z-variable"> $id</span><span class="z-punctuation z-separator">,</span><span class="z-keyword"> string</span><span class="z-variable"> $state</span><span>)</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Product</span><span> {</span></span> <span class="giallo-l"><span class="z-comment"> // Load the product 
from the database with `$id`.</span></span> <span class="giallo-l"><span class="z-comment"> //</span></span> <span class="giallo-l"><span class="z-comment"> // The states can be `Renewable`, `Cancellable`, or `Inactive` (check</span></span> <span class="giallo-l"><span class="z-comment"> // the FSM to double-check). Products that have not been purchased</span></span> <span class="giallo-l"><span class="z-comment"> // are not in the database.</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Fake minimal active product.</span></span> <span class="giallo-l"><span class="z-variable"> $product</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-storage"> class implements</span><span class="z-entity z-other z-inherited-class"> Active</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Active</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> switch</span><span> (</span><span class="z-variable">$state</span><span>) {</span></span> <span class="giallo-l"><span class="z-comment"> // State B.</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-support z-class"> Renewable</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span 
class="z-keyword"> return new</span><span class="z-storage"> class</span><span> (</span><span class="z-variable">$product</span><span>)</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Renewable</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> use</span><span class="z-support z-class"> ActiveProduct</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> use</span><span class="z-support z-class"> RenewableProduct</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> use</span><span class="z-support z-class"> CancellableProduct</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // State C.</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-support z-class"> Cancellable</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> return new</span><span class="z-storage"> class</span><span> (</span><span class="z-variable">$product</span><span>)</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Cancellable</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> use</span><span class="z-support z-class"> ActiveProduct</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> use</span><span class="z-support z-class"> CancellableProduct</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-terminator">;</span></span> <span 
class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // State D.</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-support z-class"> Inactive</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> return new</span><span class="z-storage"> class</span><span> (</span><span class="z-variable">$product</span><span>)</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Inactive</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> use</span><span class="z-support z-class"> InactiveProduct</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Invalid state.</span></span> <span class="giallo-l"><span class="z-keyword"> default</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> throw new</span><span class="z-support z-class"> \RuntimeException</span><span>(</span><span class="z-string">&#39;Invalid product state.&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre><h3 id="-9">Traits<a role="presentation" class="anchor" href="#-9" title="Anchor link to this header">#</a> </h3> <p>The code should look familiar because this is just a split from the <code>SecretProduct</code> implementation.</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span 
class="z-storage">trait</span><span class="z-entity z-name"> ActiveProduct</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> protected</span><span class="z-variable"> $product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-support z-function"> __construct</span><span>(</span><span class="z-support z-class">Product</span><span class="z-variable"> $product</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Active</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">trait</span><span class="z-entity z-name"> RenewableProduct</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> renew</span><span>()</span><span class="z-keyword 
z-operator">:</span><span class="z-support z-class"> Renewable</span><span> {</span></span> <span class="giallo-l"><span class="z-comment"> // Do the renew.</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">trait</span><span class="z-entity z-name"> CancellableProduct</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> cancel</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Inactive</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return new</span><span class="z-storage"> class</span><span> (</span><span class="z-variable z-language">$this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getProduct</span><span>())</span><span class="z-storage"> implements</span><span class="z-entity z-other z-inherited-class"> Inactive</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> protected</span><span class="z-variable"> $product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-support z-function"> __construct</span><span>(</span><span class="z-support z-class">Product</span><span class="z-variable"> $product</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-keyword 
z-operator"> =</span><span class="z-variable"> $product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-comment"> // Do the cancel.</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Inactive</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">trait</span><span class="z-entity z-name"> InactiveProduct</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> protected</span><span class="z-variable"> $product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-support z-function"> __construct</span><span>(</span><span class="z-support z-class">Product</span><span class="z-variable"> $product</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $product</span><span class="z-punctuation 
z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> getProduct</span><span>()</span><span class="z-keyword z-operator">:</span><span class="z-support z-class"> Inactive</span><span> {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">product</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre><h3 id="-10">Assert possible and impossible actions<a role="presentation" class="anchor" href="#-10" title="Anchor link to this header">#</a> </h3> <p>The <strong>possible actions</strong> are:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$product</span><span class="z-keyword z-operator"> =</span><span> ProductRepository</span><span class="z-punctuation z-separator">\</span><span class="z-entity z-name z-function">load</span><span>(</span><span class="z-constant z-numeric">42</span><span class="z-punctuation z-separator">,</span><span class="z-support z-class"> Renewable</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">assert</span><span>(</span><span class="z-variable">$product</span><span class="z-keyword z-operator"> instanceof</span><span class="z-support z-class"> Product</span><span>)</span><span 
class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">assert</span><span>(</span><span class="z-variable">$product</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">renew</span><span>()</span><span class="z-keyword z-operator"> instanceof</span><span class="z-support z-class"> Product</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">assert</span><span>(</span><span class="z-variable">$product</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-keyword z-operator"> instanceof</span><span class="z-support z-class"> Product</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>Product 42 is assumed to be in the state <code>B</code> (<code>Renewable::class</code>), so we can renew and cancel it.</p> <p>The following are <strong>impossible actions</strong>:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$product</span><span class="z-keyword z-operator"> =</span><span> ProductRepository</span><span class="z-punctuation z-separator">\</span><span class="z-entity z-name z-function">load</span><span>(</span><span class="z-constant z-numeric">42</span><span class="z-punctuation z-separator">,</span><span class="z-support z-class"> Renewable</span><span class="z-keyword z-operator">::</span><span class="z-keyword">class</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$product</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name 
z-function">purchase</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$product</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">cancel</span><span>()</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>It is impossible to purchase product 42 because it is in state <code>B</code>, so it has already been purchased. It is also impossible to cancel a product twice.</p> <p><strong>The same guarantees apply here</strong>!</p> <h2 id="-11">Conclusion<a role="presentation" class="anchor" href="#-11" title="Anchor link to this header">#</a> </h2> <p>It is possible to re-implement <code>SecretProduct</code> with the traits we have defined for the <code>ProductRepository</code>, or to use named classes. I leave this as an easy wrap-up exercise for the reader.</p> <p>The real conclusion is that we have <strong>successfully implemented the Finite-State Machine of a product with a Type System</strong>. It is impossible to have an invalid implementation that violates the constraints, such as an inactive renewable product. PHP detects it immediately at runtime. Invalid actions are also impossible, such as purchasing a product twice, or renewing a once-off purchased product. These are also detected by PHP.</p> <p>All violations take the form of PHP fatal errors.</p> <p>The product repository is an example of how to restore a product in a particular state, with the help of the defined interfaces, and new small and simple traits.</p> <h2 id="-12">One more thing<a role="presentation" class="anchor" href="#-12" title="Anchor link to this header">#</a> </h2> <p>It is possible to integrate product categories in this type system (like bundles). 
It is more complex, but possible.</p> <p>I highly recommend the following readings:</p> <ul> <li><a rel="noopener external" target="_blank" href="http://blogs.perl.org/users/ovid/2010/08/what-to-know-before-debating-type-systems.html">What to know before debating type systems</a> to have an overview of different systems,</li> <li><a rel="noopener external" target="_blank" href="https://sdleffler.github.io/RustTypeSystemTuringComplete/">Rust&#39;s Type System is Turing-Complete</a> to see how powerful a type system can be,</li> <li><a rel="noopener external" target="_blank" href="https://speakerdeck.com/willroth/fear-not-the-machine-of-state">Fear Not the Machine of State!</a> to see how to integrate an FSM into an object without using a type system.</li> </ul> <p>I would like to particularly emphasize a paragraph from the first article:</p> <blockquote> <p>So what is a type? The only true definition is this: a type is a <strong>label</strong> used by a type system to <strong>prove</strong> some property of the <strong>program&#39;s behavior</strong>. 
If the type checker can assign types to the whole program, then it succeeds in its proof; otherwise it fails and points out why it failed.</p> </blockquote> <p>Seeing types as labels is a very smart way of approaching them.</p> <p>I would like to thank <a rel="noopener external" target="_blank" href="https://ocramius.github.io/">Marco Pivetta</a> for the reviews!</p> Tagua VM, a safe PHP virtual machine 2017-06-19T00:00:00+00:00 2017-06-19T00:00:00+00:00 Unknown https://mnt.io/articles/tagua-vm-a-safe-php-virtual-machine/ <iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/Ymy8qAEe0kQ?si=_7IlrTO1VzOriUKW" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen> </iframe> <p>PHPTour Nantes 2017 (talk in French):</p> <blockquote> <p>PHP is an extremely popular language. In 2015, PHP was used by more than 80% of all websites. However, 500 severe vulnerabilities are on record. Although this is inherent to any popular language, it remains very dangerous. The goal of the Tagua VM project is to provide a PHP VM that guarantees a high level of safety and quality by removing large classes of vulnerabilities, thanks to appropriate tools such as Rust and LLVM. Rust is a remarkable language that brings strong guarantees about memory safety. It is also a very fast language that competes with C. LLVM is a famous compiler infrastructure that brings modernity, state-of-the-art algorithms, performance, a developer toolchain etc. 
This project will solve three problems at once:</p> <ol> <li>Provide a high level of safety and quality by removing large classes of vulnerabilities, and thus avoid dramatic bug costs;</li> <li>Provide modernity, a new developer experience, and state-of-the-art algorithms, hence performance;</li> <li>Provide a set of libraries that compose the VM and that can be reused outside the project (such as the parser, the analysers, the extensions etc.).</li> </ol> <p>During this talk, we will present the goals of this project, as well as its progress. We will explain why it is crucial and why it is supported by a growing community and by notable developers (with an important role in the development of PHP).</p> </blockquote> <p><a rel="noopener external" target="_blank" href="https://speakerdeck.com/hywan/tagua-vm-a-safe-php-virtual-machine">View slides</a>.</p> Faster find algorithms in nom 2017-05-23T00:00:00+00:00 2017-05-23T00:00:00+00:00 Unknown https://mnt.io/articles/faster-find-algorithms-in-nom/ <p><a rel="noopener external" target="_blank" href="https://github.com/tagua-vm/">Tagua VM</a> is an experimental PHP virtual machine written in Rust and LLVM. It is composed of a set of libraries. One of them that keeps me busy these days is <a rel="noopener external" target="_blank" href="https://github.com/tagua-vm/parser"><code>tagua-parser</code></a>. It contains the lexical and syntactic analysers for the PHP language, in addition to the AST (Abstract Syntax Tree). If you would like to know more about this project, you can watch the talk I gave at the PHPTour last week: <a rel="noopener external" target="_blank" href="https://speakerdeck.com/hywan/tagua-vm-a-safe-php-virtual-machine">Tagua VM, a safe PHP virtual machine</a>.</p> <p>The library <code>tagua-parser</code> is built with parser combinators. 
Instead of having a classical grammar, compiled to a parser, we write pure functions acting as small parsers. We then combine them together. This post does not explain why this is a sane approach in our context, but keep in mind this is much easier to test, to maintain, and to optimise.</p> <p>Because this project is complex enough, we are delegating the parser combinator implementation to <a rel="noopener external" target="_blank" href="https://github.com/Geal/nom/">nom</a>.</p> <blockquote> <p>nom is a parser combinators library written in Rust. Its goal is to provide tools to build safe parsers without compromising the speed or memory consumption. To that end, it uses extensively Rust's <em>strong typing</em>, <em>zero copy</em> parsing, <em>push streaming</em>, <em>pull streaming</em>, and provides macros and traits to abstract most of the error prone plumbing.</p> </blockquote> <p>Recently, I have been working on optimisations in the <code>FindToken</code> and <code>FindSubstring</code> traits from nom itself. These traits provide methods to find a token (i.e. a lexeme), and to find a substring, crazy naming. However, this naming is not entirely accurate: <code>FindToken</code> expects to find a single item (if implemented for <code>u8</code>, it will look for a <code>u8</code> in a <code>&amp;[u8]</code>), and <code>FindSubstring</code> really is about finding a substring, so a token of any length.</p> <p>It appeared that these methods can be optimised in some cases. Both default implementations use Rust iterators: a regular iterator for <code>FindToken</code>, and a <a rel="noopener external" target="_blank" href="https://doc.rust-lang.org/std/slice/struct.Windows.html">window iterator</a> for <code>FindSubstring</code>, i.e. an iterator over overlapping subslices of a given length. 
We have benchmarked big PHP comments, which are analysed by parsers actively using these two trait implementations.</p> <p>Here are the results, before and after our optimisations:</p> <pre class="giallo z-code"><code data-lang="plain"><span class="giallo-l"><span>test …::bench_span ... bench: 73,433 ns/iter (+/- 3,869)</span></span> <span class="giallo-l"><span>test …::bench_span ... bench: 15,986 ns/iter (+/- 3,068)</span></span></code></pre> <p>A boost of 78%! Nice!</p> <p>The <a rel="noopener external" target="_blank" href="https://github.com/Geal/nom/pull/507">pull request has been merged</a> today, thank you Geoffroy Couprie! The new algorithms heavily rely on <a rel="noopener external" target="_blank" href="https://github.com/BurntSushi/rust-memchr">the <code>memchr</code> crate</a>. So all the credit should really go to Andrew Gallant! This crate provides a safe interface to <code>libc</code>'s <code>memchr</code> and <code>memrchr</code>. It also provides fallback implementations when either function is unavailable.</p> <p>The new algorithms are only implemented for <code>&amp;[u8]</code> though. Fortunately, the implementation for <code>&amp;str</code> falls back to the former.</p> <p>This is a small contribution, but it brings a very nice boost. I hope it will benefit other projects!</p> <p>I am also blowing the dust off of <a rel="noopener external" target="_blank" href="https://www.amazon.com/Algorithms-Strings-Maxime-Crochemore/dp/0521848997">Algorithms on Strings</a>, by M. Crochemore, C. Hancart, and T. Lecroq. I am pretty sure it should be useful for nom and <code>tagua-parser</code>. If you haven't read this book yet, I can only encourage you to do so!</p> Welcome to Chaos 2017-04-24T00:00:00+00:00 2017-04-24T00:00:00+00:00 Unknown https://mnt.io/articles/welcome-to-chaos/ <p>Recently, <a href="https://mnt.io/articles/bye-bye-liip-hello-automattic/">I joined Automattic</a>. This is a world-wide distributed company. 
During the first three weeks, you act as a Happiness Engineer. This is part of the Happiness Rotation duty. This article explains why I loved it, and why I reckon you should do it too.</p> <h2 id="happiness-engineer-really">Happiness Engineer, really?<a role="presentation" class="anchor" href="#happiness-engineer-really" title="Anchor link to this header">#</a> </h2> <p>Does it sound mad as a Cheshire cat? Pretentious maybe? Actually, it's not at all.</p> <p>As a Happiness Engineer, I had to do support. This is part of the Happiness Rotation: Once a year, almost everyone swaps their position to help our users. I will come back to this later.</p> <p>My role was to make our users happy. To achieve that, I had to:</p> <ul> <li>Meet our users, understand who they are, what they want to achieve,</li> <li>Listen to and understand their issues,</li> <li>Find a way to fix the issues.</li> </ul> <h3 id="meet-the-users">Meet the users<a role="presentation" class="anchor" href="#meet-the-users" title="Anchor link to this header">#</a> </h3> <p>I need motivation in my job. Learning who our users are, and what they want to achieve, is a great motivation. After these three weeks, I know what my contributions are for. It gives meaning to each contribution, to each day I wake up.</p> <p>Especially for a distributed company on the Internet, our users are world-wide: they speak almost all the languages on Earth, and they are present on all continents. Their needs vary a lot, and they use our software in ways I was not able to foresee.</p> <h3 id="listen-to-understand-and-fix-their-issues">Listen to, understand, and fix their issues<a role="presentation" class="anchor" href="#listen-to-understand-and-fix-their-issues" title="Anchor link to this header">#</a> </h3> <p>When you are chatting with a “support guy”, you may not imagine that this is a real engineer. This is not a random person filling a pre-defined vague form somewhere it is cheap to hire them. 
You will chat with someone very competent. 
Someone who has no superior. Someone who has all the tools to make you happy.</p> <p>Personally, when I started, it was the first time I was using WordPress. I was more of a novice than the user I was talking to. So how could I fix their issues on my end? I had to:</p> <ul> <li>Ask the right people for help,</li> <li>Therefore, meet Automatticians (people working with Automattic),</li> <li>Discover all the interactions between them,</li> <li>Understand the structure of the company,</li> <li>Learn how to ask for help, how to formulate my questions, how to reformulate the users' issues…</li> <li>Discover all the internal tools,</li> <li>Therefore, learn how the software works internally and together,</li> <li>Discover the giant internal and public documentation,</li> <li>When needed, create bug reports or feature requests for the appropriate teams,</li> <li>Learn the culture of the company.</li> </ul> <p>This is why it is called <em>Welcome to Chaos</em>. Yes, you have to learn a lot in three weeks, but it is extremely educational. It is like speed training.</p> <h3 id="happiness">Happiness<a role="presentation" class="anchor" href="#happiness" title="Anchor link to this header">#</a> </h3> <p>I can assure you that when a user is grateful after you fixed their issue, the term Happiness Engineer makes a lot of sense. Automattic provides a lot of freedom to their Happiness Engineers to make people really happy, both in terms of tooling and finances.</p> <p>This is the first time I have seen a company that is this generous with its customers.</p> <h3 id="thanks-buddy">Thanks buddy<a role="presentation" class="anchor" href="#thanks-buddy" title="Anchor link to this header">#</a> </h3> <p>Of course, when embracing the chaos, you are not alone. Everyone is here to help you, and to answer your questions. 
After all, this is part of <a rel="noopener external" target="_blank" href="https://automattic.com/creed/">Automattic's creed</a> (<a rel="noopener external" target="_blank" href="https://ma.tt/2011/09/automattic-creed/">story of the creed</a>):</p> <blockquote> <p>I will never stop learning. I won’t just work on things that are assigned to me. I know there’s no such thing as a status quo. I will build our business sustainably through passionate and loyal customers. <strong>I will never pass up an opportunity to help out a colleague</strong>, and <strong>I’ll remember the days before I knew everything</strong>. I am more motivated by impact than money, and I know that Open Source is one of the most powerful ideas of our generation. I will communicate as much as possible, because it’s the oxygen of a distributed company. I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.</p> </blockquote> <p>In addition to everyone willing to help, a buddy was assigned to me: a person who helps and teaches you at every step. This is very helpful. Thank you Hannah!</p> <h2 id="happiness-rotation">Happiness Rotation<a role="presentation" class="anchor" href="#happiness-rotation" title="Anchor link to this header">#</a> </h2> <p>This experience is great. But after some time, you might forget it. So as a reminder, once a year, you act as a Happiness Engineer again. This is part of the happiness rotation. As far as I understand, it involves almost everyone in the company.</p> <p>Note: Obviously, there are permanent happiness engineers.</p> <h2 id="conclusion">Conclusion<a role="presentation" class="anchor" href="#conclusion" title="Anchor link to this header">#</a> </h2> <p>I firmly believe this approach has many advantages. Some of them are listed above. It helps to understand the company, and more importantly the users. 
The happiness rotation stresses the fact that users are central to Automattic, probably as in any company, but rarely with this much care. Remember the creed: I will build our business sustainably through passionate and loyal customers. To have passionate and loyal users, you need to know them.</p> <p>For me, it was a great experience. It was chaotic at first, but it is worth it.</p> Bye bye Liip, hello Automattic 2017-04-18T00:00:00+00:00 2017-04-18T00:00:00+00:00 Unknown https://mnt.io/articles/bye-bye-liip-hello-automattic/ <p>In April 2017, I left <a rel="noopener external" target="_blank" href="https://www.liip.ch/">Liip</a> to join <a rel="noopener external" target="_blank" href="https://automattic.com/">Automattic</a>.</p> <h2 id="bye-bye-liip">Bye bye Liip<a role="presentation" class="anchor" href="#bye-bye-liip" title="Anchor link to this header">#</a> </h2> <p>After almost 20 months at Liip, I am leaving. Liip was a great experience. It was my first industrial non-remote job. It was also my first job in the country I am currently living in. And I have discovered a new way of working.</p> <h3 id="first-industrial-non-remote-job">First industrial non-remote job<a role="presentation" class="anchor" href="#first-industrial-non-remote-job" title="Anchor link to this header">#</a> </h3> <p>Before working for Liip, I was working for <a rel="noopener external" target="_blank" href="https://fruux.com/">fruux</a>. My situation was the following: A French citizen, living as a foreigner in Switzerland, working for a German company, with employees from Germany, Holland, and Canada. Everything happened on chat, mail, and Skype. When my son was born, I had to change jobs to simplify my life. It was not the only reason, but one of them.</p> <p>And before fruux, I was working for <a rel="noopener external" target="_blank" href="https://www.inria.fr/en/">INRIA</a>, a research institute in France. It was partially a remote job.</p> <p>Liip has several offices. 
I was based in Lausanne.</p> <p>So, yes, Liip was my first industrial non-remote job. And I liked it. Working on the train in the morning, walking in Lausanne, seeing real people, everything in my local language. Because yes, it was my first job in my native language too.</p> <p>Everything was simpler. And when you have your first baby, anything else that is simpler saves your life.</p> <h3 id="introducing-holacracy">Introducing Holacracy<a role="presentation" class="anchor" href="#introducing-holacracy" title="Anchor link to this header">#</a> </h3> <p>Giant discussions were happening to remove any form of hierarchy in Liip. Then we discovered <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Holacracy">Holacracy</a>, and we started moving to this system. This is a new governance system. If you are familiar with distributed network topologies in Computer Science, or data structures, it really looks like a <a rel="noopener external" target="_blank" href="https://hal.archives-ouvertes.fr/hal-00560821/document">Distributed Spanning Tree</a> [<a rel="noopener external" target="_blank" href="http://dblp.org/rec/html/journals/tpds/DahanPN09">DahanPN09</a>]. Note: I am sure that the authors of Holacracy are not aware of DST, but, eh.</p> <p>So nothing new from a research point of view, but it is cool to see this algorithm coming alive in real life. And it worked: Fewer meetings, more self-organisation, more shared responsibilities, no more “boss” etc. This is not a tool for all companies, but I am sure that if you are reading my blog, then your company should give it a try.</p> <h3 id="open-source-projects">Open source projects<a role="presentation" class="anchor" href="#open-source-projects" title="Anchor link to this header">#</a> </h3> <p>Liip has been very generous with me regarding my open source engagements. 
I was involved in <a rel="noopener external" target="_blank" href="https://hoa-project.net/">Hoa</a>, <a rel="noopener external" target="_blank" href="https://atoum.org/">atoum</a>, and <a rel="noopener external" target="_blank" href="https://github.com/FriendsOfPHP/pickle">Pickle</a> when joining the company. Liip gave me a 5% budget, so roughly 1 hour per day to work on Hoa. Thank you for that!</p> <p>After that, I started a new big project, called <a rel="noopener external" target="_blank" href="https://github.com/tagua-vm/tagua-vm">Tagua VM</a>. They gave me an additional 5% budget. So I got 2 hours per day to work on Hoa and Tagua VM. Again, thank you for that!</p> <p>Finally, I started an in-house open project called <a rel="noopener external" target="_blank" href="https://github.com/liip/TheA11yMachine">The A11y Machine</a> (a11ym for short). I have written a case study for this tool on Liip's blog: <a rel="noopener external" target="_blank" href="https://blog.liip.ch/archive/2016/12/06/accessibility-with-a11ym.html">Accessibility: make your website barrier-free with a11ym!</a></p> <p>The goal of a11ym is to automate the accessibility testing of any site by crawling and testing each page. 
A sweet report is generated, showing all errors, warnings, and notices, with all information needed by the developer to fix the issues as fast as possible.</p> <figure> <p><img src="https://mnt.io/articles/bye-bye-liip-hello-automattic/./dashboard.jpg" alt="Dashboard" loading="lazy" decoding="async" /></p> <figcaption> <p>Dashboard of a11ym, showing the evolution of the accessibility of a site in time</p> </figcaption> </figure> <figure> <p><img src="https://mnt.io/articles/bye-bye-liip-hello-automattic/./report.png" alt="Report" loading="lazy" decoding="async" /></p> <figcaption> <p>A typical a11ym report listing all errors, warnings, and notices for a given URL</p> </figcaption> </figure> <p>This project has received really good feedback from the accessibility community. It has been downloaded 7000 times so far, which is not bad considering the niche it targets.</p> <p>A new SaaS platform is being built around this software. I enjoyed working on it, and it was really tangible.</p> <h3 id="">Main customer, huge project<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h3> <p>Liip is a Web agency, so you have dozens of customers at the same time. However, I was in a special team for an important customer. The site is a luxury watches and jewellery e-commerce platform, located in several countries, in 10 languages, accessible from 16 domains, spread across 2 datacenters. This is not a casual site.</p> <p>I learned a lot about all the complexity such a site brings: Checkout rules (oh damned…), product catalogs in different formats for different countries with different references, all the business logic inherent to each country, different payment providers, crazy front end compatibilities etc.</p> <p>I have a hundred crazy anecdotes to tell. This was clearly not a job for me at first glance: I am a researcher, I have an open source culture background, I am not tailored for this kind of project. 
But at the end of the story, I learned a lot. Really a lot. I have a better overview of the crazy things any customer can ask, or has to deal with, and the infrastructure craziness that can be set up. I learned how to make better things: How to transform a really crappy piece of software into something understandable by everyone, how to not break a 10+ year old program with no tests etc. And it requires skills. I learned it the hard way, but I learned it.</p> <h3 id="-1">Why leave?<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h3> <p>Because even if I learned a lot during my time at Liip, the Web agency model was definitely not for me. I am very thankful to every Liiper, I had a great time, I love the Web, but not in an agency.</p> <p>My son is now 21 months old, and I need fresh air. I can take on new challenges.</p> <h2 id="-2">Welcome Automattic<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h2> <p><a rel="noopener external" target="_blank" href="https://automattic.com/">Automattic</a> is the company behind <a rel="noopener external" target="_blank" href="https://wordpress.com/">WordPress.com</a>, <a rel="noopener external" target="_blank" href="https://woocommerce.com/">WooCommerce</a>, <a rel="noopener external" target="_blank" href="https://akismet.com/">Akismet</a>, <a rel="noopener external" target="_blank" href="https://simplenote.com/">Simplenote</a>, <a rel="noopener external" target="_blank" href="https://cloudup.com/">Cloudup</a>, <a rel="noopener external" target="_blank" href="https://simperium.com">Simperium</a>, <a rel="noopener external" target="_blank" href="http://en.gravatar.com/">Gravatar</a> and other giant services.</p> <p>I came to Automattic by coincidence. I was looking for a sponsor for Tagua VM, and someone pointed me to Automattic. After some research about the company, it appeared that it could be a really great place to work. 
So I applied.</p> <p>The hiring process was 4 months long. It was exhausting because it happened at the same time as a big sprint at Liip (remember the SaaS platform for The A11y Machine?). But after 4 months, it turned out I succeeded, and I am very glad about it!</p> <p>I am just starting my job at Automattic. I don&#39;t have anything strong and definitive to say now, apart from the fact that everything is just awesome so far. In a few weeks, I am likely to write about my start at Automattic (I did, see <a href="https://mnt.io/articles/welcome-to-chaos/">Welcome to Chaos</a>). They have a very interesting way to get you on board.</p> <p>Time for a new adventure!</p> DuckDuckGo in a Shell 2015-08-05T00:00:00+00:00 2015-08-05T00:00:00+00:00 Unknown https://mnt.io/articles/duckduckgo-in-a-shell/ <h2 id="the-tip">The tip<a role="presentation" class="anchor" href="#the-tip" title="Anchor link to this header">#</a> </h2> <p>When I go outside my terminal, I am kind of lost. I control everything from my terminal and I hate clicking. That's why I found a small tip today to open a search on DuckDuckGo directly from the terminal. 
It redirects me to my default browser in the background, which is the expected behavior.</p> <p>First, I create a function called <code>duckduckgo</code>:</p> <pre class="giallo z-code"><code data-lang="shellscript"><span class="giallo-l"><span class="z-storage z-type z-function">function</span><span class="z-entity z-name z-function"> duckduckgo</span><span class="z-punctuation z-section"> {</span></span> <span class="giallo-l"><span class="z-variable"> query</span><span class="z-keyword z-operator">=</span><span class="z-string">`</span><span class="z-entity z-name">php</span><span class="z-constant z-other"> -r</span><span class="z-string"> &#39;echo urlencode($argv[1]);&#39; &quot;</span><span class="z-variable z-parameter">$1</span><span class="z-string">&quot;`</span></span> <span class="giallo-l"><span class="z-entity z-name"> open</span><span class="z-string"> &#39;https://duckduckgo.com/?q=&#39;</span><span class="z-variable">$query</span></span> <span class="giallo-l"><span class="z-punctuation z-section">}</span></span></code></pre> <p>Note how I (avoid having to) deal with quotes in <code>$1</code>.</p> <p>Then, I just have to create an alias called <code>?</code>:</p> <pre class="giallo z-code"><code data-lang="shellscript"><span class="giallo-l"><span class="z-support z-function">alias</span><span class="z-string"> &#39;?&#39;=&#39;duckduckgo&#39;</span></span></code></pre> <p>And here we (duckduck) go!</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span class="z-keyword z-operator"> ?</span><span class="z-string"> &quot;foo bar&#39;s baz&quot;</span></span></code></pre> <p>You can <a rel="noopener external" target="_blank" href="https://github.com/Hywan/Dotfiles/commit/fab6d98448240a787eb0e34ab836c5c43d50379c">see the commit</a> that adds this to my “shell home framework”.</p> <p>Oh, and to open the default browser, I use <a rel="noopener external" target="_blank" 
href="https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/open.1.html"><code>open (1)</code></a>, like this:</p> <pre class="giallo z-code"><code data-lang="shellscript"><span class="giallo-l"><span class="z-storage">alias</span><span class="z-variable"> open</span><span class="z-keyword z-operator">=</span><span class="z-string">&#39;open -g&#39;</span></span></code></pre> <p>Hope it helps!</p> sabre/katana 2015-07-13T00:00:00+00:00 2015-07-13T00:00:00+00:00 Unknown https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/ <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./logo-katana.png" alt="sabre/katana&#39;s logo" loading="lazy" decoding="async" /></p> <figcaption> <p>Project&#39;s logo.</p> </figcaption> </figure> <h2 id="">What is it?<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <p><code>sabre/katana</code> is a contact, calendar, task list and file server. What does it mean? Nowadays, you probably have multiple devices (PCs, phones, tablets, TVs…). If you would like to keep your address books, calendars, task lists and files synced between all these devices, from everywhere, you need a server. All your devices are then considered as clients.</p> <p>But there is an issue with the server. Most of the time, you might choose <a rel="noopener external" target="_blank" href="https://google.com/">Google</a> or maybe <a rel="noopener external" target="_blank" href="https://apple.com/">Apple</a>, but one may wonder: Can we trust these servers? Can we give them our private data, like all our contacts, our calendars, all our photos…? What if you are a company or an association and you have sensitive data that are really private or strategic? So, can you still trust them? Where are the data stored? Who can look at them? 
More and more, there is a huge need for a “personal” server.</p> <p>Moreover, servers like Google or Apple are often closed: You reach your data with specific clients and they are not available on all platforms. This is for strategic reasons of course. But with <code>sabre/katana</code>, you are not limited. See the above schema: Firefox OS can talk to iOS or Android at the same time.</p> <p><code>sabre/katana</code> is this kind of server. You can install it on your machine and manage users in a minute. Each user will have a collection of address books, calendars, task lists and files. This server can talk to a <a rel="noopener external" target="_blank" href="https://fruux.com/supported-devices/">loong list of devices</a>, mainly thanks to a scrupulous respect of industry standards:</p> <ul> <li>macOS: <ul> <li>OS X 10.10 (Yosemite),</li> <li>OS X 10.9 (Mavericks),</li> <li>OS X 10.8 (Mountain Lion),</li> <li>OS X 10.7 (Lion),</li> <li>OS X 10.6 (Snow Leopard),</li> <li>OS X 10.5 (Leopard),</li> <li>BusyCal,</li> <li>BusyContacts,</li> <li>Fantastical,</li> <li>Rainlendar,</li> <li>ReminderFox,</li> <li>SoHo Organizer,</li> <li>Spotlife,</li> <li>Thunderbird,</li> </ul> </li> <li>Windows: <ul> <li>eM Client,</li> <li>Microsoft Outlook 2013,</li> <li>Microsoft Outlook 2010,</li> <li>Microsoft Outlook 2007,</li> <li>Microsoft Outlook with Bynari WebDAV Collaborator,</li> <li>Microsoft Outlook with iCal4OL,</li> <li>Rainlendar,</li> <li>ReminderFox,</li> <li>Thunderbird,</li> </ul> </li> <li>Linux: <ul> <li>Evolution,</li> <li>Rainlendar,</li> <li>ReminderFox,</li> <li>Thunderbird,</li> </ul> </li> <li>Mobile: <ul> <li>Android,</li> <li>BlackBerry 10,</li> <li>BlackBerry PlayBook,</li> <li>Firefox OS,</li> <li>iOS 8,</li> <li>iOS 7,</li> <li>iOS 6,</li> <li>iOS 5,</li> <li>iOS 4,</li> <li>iOS 3,</li> <li>Nokia N9,</li> <li>Sailfish.</li> </ul> </li> </ul> <p>Did you find your device in this list? 
Probably yes 😉.</p> <p><code>sabre/katana</code> sits in the middle of all your devices and syncs all your data. Of course, it is <strong>free</strong> and <strong>open source</strong>. <a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-katana/">Go check the source</a>!</p> <h2 id="-1">List of features<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h2> <p>Here is a non-exhaustive list of features supported by <code>sabre/katana</code>. Depending on whether you are a user or a developer, the features that might interest you are radically different. I decided to show you a list from the user point of view. If you would like to get a list from the developer point of view, please see this <a rel="noopener external" target="_blank" href="http://sabre.io/dav/standards-support/">exhaustive list of supported RFCs</a> for more details.</p> <h3 id="-2">Contacts<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h3> <p>All usual fields are supported, like phone numbers, email addresses, URLs, birthday, ringtone, texttone, related names, postal addresses, notes, HD photos etc. Of course, groups of cards are also supported.</p> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./card-inside-macos-client.png" alt="My card on macOS" loading="lazy" decoding="async" /></p> <figcaption> <p>My card inside the native Contact application of macOS.</p> </figcaption> </figure> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./card-inside-firefox-os-client.png" alt="My card on Firefox OS" loading="lazy" decoding="async" /></p> <figcaption> <p>My card inside the native Contact application of Firefox OS.</p> </figcaption> </figure> <p>My photo is not in HD, I really have to update it!</p> <p>Cards can be encoded into several formats. The most usual format is VCF. 
<code>sabre/katana</code> allows you to download the whole address book of a user as a single VCF file. You can also create, update and delete address books.</p> <h3 id="-3">Calendars<a role="presentation" class="anchor" href="#-3" title="Anchor link to this header">#</a> </h3> <p>A calendar is just a set of events. Each event has several properties, such as a title, a location, a start date, an end date, some notes, URLs, alarms etc. <code>sabre/katana</code> also supports recurring events (“each last Monday of the month, at 11am…”), in addition to scheduling (see below).</p> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./calendars-inside-macos-client.png" alt="My calendars on macOS" loading="lazy" decoding="async" /></p> <figcaption> <p>My calendars inside the native Calendar application of macOS.</p> </figcaption> </figure> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./calendars-inside-firefox-os-client.png" alt="My calendars on Firefox OS" loading="lazy" decoding="async" /></p> <figcaption> <p>My calendars inside the native Calendar application of Firefox OS.</p> </figcaption> </figure> <p>A few words about calendar scheduling. Let&#39;s say you are organizing an event, like a new release (we always enjoy release day!). You would like to invite several people but you don&#39;t know whether they can be present or not. In your event, all you have to do is add attendees. How are they going to be notified about this event? Two situations:</p> <ol> <li>Either attendees are registered on your <code>sabre/katana</code> server and they will receive an invite inside their calendar application (we call this iTIP),</li> <li>Or they are not registered on your server and they will receive an email with the event as an attached file (we call this iMIP). 
All they have to do is open this event in their calendar application.</li> </ol> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./invite-by-email.png" alt="Typical mail to invite an attendee to an event" loading="lazy" decoding="async" /></p> <figcaption> <p>Invite an attendee by email because she is not registered on your <code>sabre/katana</code> server.</p> </figcaption> </figure> <p>Notice the gorgeous map embedded inside the email!</p> <p>Once they have received the event, they can accept it, decline it, or answer “don&#39;t know” (meaning they will try to attend).</p> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./respond-to-invite.png" alt="Receive an invite to an event" loading="lazy" decoding="async" /></p> <figcaption> <p>Receive an invite to an event. Here: Gordon is inviting Hywan. Three choices for Hywan:</p> </figcaption> </figure> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./accepted-event.png" alt="Status of all attendees" loading="lazy" decoding="async" /></p> <figcaption> <p>Hywan has accepted the event. Here is what the event looks like. Hywan can see the response of each attendee.</p> </figcaption> </figure> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./notification.png" alt="Notification from attendees" loading="lazy" decoding="async" /></p> <figcaption> <p>Gordon is even notified that Hywan has accepted the event.</p> </figcaption> </figure> <p>Of course, attendees will be notified too if the event has been moved, canceled, refreshed etc.</p> <p>Calendars can be encoded into several formats. The most usual format is ICS. <code>sabre/katana</code> allows you to download the whole calendar of a user as a single ICS file. 
You can also create, update and delete calendars.</p> <h3 id="-4">Task lists<a role="presentation" class="anchor" href="#-4" title="Anchor link to this header">#</a> </h3> <p>A task list is exactly like a calendar (from a programmatic point of view). Instead of containing event objects, it contains todo objects.</p> <p><code>sabre/katana</code> supports groups of tasks, reminders, progress etc.</p> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./tasks.png" alt="My task lists on macOS" loading="lazy" decoding="async" /></p> <figcaption> <p>My task lists inside the native Reminder application of macOS.</p> </figcaption> </figure> <p>Just like calendars, task lists can be encoded into several formats, including ICS. <code>sabre/katana</code> allows you to download the whole task list of a user as a single ICS file. You can also create, update and delete task lists.</p> <h3 id="-5">Files<a role="presentation" class="anchor" href="#-5" title="Anchor link to this header">#</a> </h3> <p>Finally, <code>sabre/katana</code> creates a home collection per user: A personal directory that can contain files and directories and… is synced between all your devices (as usual 😄).</p> <p><code>sabre/katana</code> also creates a special directory called <code>public/</code> which is a public directory. All files and directories stored inside this directory are accessible to anyone who has the correct link. 
No directory listing is displayed, to protect your public data.</p> <p>Just like contact, calendar and task list applications, you need a client application to connect to your home collection on <code>sabre/katana</code>.</p> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./connect-to-dav.png" alt="Connect to a server in macOS" loading="lazy" decoding="async" /></p> <figcaption> <p>Connect to a server with the Finder application of macOS.</p> </figcaption> </figure> <p>Then, your public directory on <code>sabre/katana</code> will be a regular directory like any other.</p> <figure> <p><img src="https://mnt.io/articles/sabre-katana-a-contact-calendar-task-list-and-file-server/./files.png" alt="List of my files" loading="lazy" decoding="async" /></p> <figcaption> <p>List of my files, right here in the Finder application of macOS.</p> </figcaption> </figure> <p><code>sabre/katana</code> is able to store any kind of file. Yes, any kind. It&#39;s just files. However, it white-lists the kinds of files that can be shown in the browser. Only images, audio, video, text, PDF and some vendor formats (like Microsoft Office) are considered as safe (for the server). This way, associations can share music, videos or images, companies can share PDF or Microsoft Word documents etc. Maybe in the future <code>sabre/katana</code> might white-list more formats. 
If a format is not white-listed, the browser will be forced to download the file.</p> <h2 id="sabre-katana">How is <code>sabre/katana</code> built?<a role="presentation" class="anchor" href="#sabre-katana" title="Anchor link to this header">#</a> </h2> <p><code>sabre/katana</code> is based on two big and solid projects:</p> <ol> <li><a rel="noopener external" target="_blank" href="http://sabre.io/"><code>sabre/dav</code></a>,</li> <li><a rel="noopener external" target="_blank" href="http://hoa-project.net/">Hoa</a>.</li> </ol> <p><code>sabre/dav</code> is one of the most powerful <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/CardDAV">CardDAV</a>, <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/CalDAV">CalDAV</a> and <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/WebDAV">WebDAV</a> frameworks on the planet. Trusted by the likes of <a rel="noopener external" target="_blank" href="https://www.atmail.com/">Atmail</a>, <a rel="noopener external" target="_blank" href="https://www.box.com/blog/in-search-of-an-open-source-webdav-solution/">Box</a>, <a rel="noopener external" target="_blank" href="https://fruux.com/">fruux</a> and <a rel="noopener external" target="_blank" href="http://owncloud.org/">ownCloud</a>, it powers millions of users world-wide! It is written in PHP and is open source.</p> <p>Hoa is a modular, extensible and structured set of PHP libraries. Fun fact: Also open source, this project is also trusted by <a rel="noopener external" target="_blank" href="http://owncloud.org/">ownCloud</a>, in addition to <a rel="noopener external" target="_blank" href="http://mozilla.org/">Mozilla</a>, <a rel="noopener external" target="_blank" href="http://jolicode.com/">joliCode</a> etc. 
Recently, this project has recorded more than 600,000 downloads and the community is about to reach 1000 people.</p> <p><code>sabre/katana</code> is then a program based on <code>sabre/dav</code> for the DAV part and Hoa for everything else, like the logic code inside <code>sabre/dav</code>&#39;s plugins. The result is a ready-to-use server with a nice interface for the administration.</p> <p>To ensure code quality, we use <a rel="noopener external" target="_blank" href="http://atoum.org/">atoum</a>, a popular and modern test framework for PHP. So far, <code>sabre/dav</code> has more than 1000 assertions.</p> <h2 id="-6">Conclusion<a role="presentation" class="anchor" href="#-6" title="Anchor link to this header">#</a> </h2> <p><code>sabre/katana</code> is a server for contacts, calendars, task lists and files. Everything is synced, every time and everywhere. It perfectly connects to a lot of devices on the market. Several features we need and use daily have been presented. This is the easiest way, and a secure one, to host your own private data.</p> <p><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-katana">Go download it</a>!</p> RFCs should provide executable test suites 2015-02-27T00:00:00+00:00 2015-02-27T00:00:00+00:00 Unknown https://mnt.io/articles/rfcs-should-provide-executable-test-suites/ <p>Recently, I implemented xCal and xCard formats inside the <code>sabre/dav</code> libraries. While testing the different RFCs against my implementation, several errata have been found. This article, first, quickly lists them and, second, asks how such errors can be present and how they can be easily revealed. If reading my dry humor about RFC errata is boring, the next sections are more interesting. 
The whole idea is: Why do RFCs not provide executable test suites?</p> <h2 id="what-is-xcal-and-xcard">What is xCal and xCard?<a role="presentation" class="anchor" href="#what-is-xcal-and-xcard" title="Anchor link to this header">#</a> </h2> <p>The Web is a read-only medium. It is based on the HTTP protocol. However, there is the <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/WebDAV">WebDAV</a> protocol, standing for Web Distributed Authoring and Versioning. This is an extension to HTTP. <em>Et voilà !</em> The Web is a read and write medium. WebDAV is standardized in <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc2518">RFC2518</a> and <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc4918">RFC4918</a>.</p> <p>Based on WebDAV, we have <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/CalDAV">CalDAV</a> and <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/CardDAV">CardDAV</a>, respectively for reading and writing calendars and address books. They are standardized in <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc4791">RFC4791</a>, <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc6638">RFC6638</a> and <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc6352">RFC6352</a>. Good! But these protocols only explain how to read and write, not how to represent a real calendar or an address book. So let's leave protocols for formats.</p> <p>The <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/ICalendar">iCalendar</a> format represents calendar components, like events (<code>VEVENT</code>), tasks (<code>VTODO</code>), journal entries (<code>VJOURNAL</code>, very rare…), free/busy time (<code>VFREEBUSY</code>) etc. 
The <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/VCard">vCard</a> format represents cards. The formats are very similar and share a common ancestry: They are <strong>horrible</strong> line-based, colon- and semicolon-delimited, randomly-escaped formats. For instance:</p> <pre class="giallo z-code"><code data-lang="plain"><span class="giallo-l"><span>BEGIN:VCALENDAR</span></span> <span class="giallo-l"><span>VERSION:2.0</span></span> <span class="giallo-l"><span>CALSCALE:GREGORIAN</span></span> <span class="giallo-l"><span>PRODID:-//Example Inc.//Example Calendar//EN</span></span> <span class="giallo-l"><span>BEGIN:VEVENT</span></span> <span class="giallo-l"><span>DTSTAMP:20080205T191224Z</span></span> <span class="giallo-l"><span>DTSTART;VALUE=DATE:20081006</span></span> <span class="giallo-l"><span>SUMMARY:Planning meeting</span></span> <span class="giallo-l"><span>UID:4088E990AD89CB3DBB484909</span></span> <span class="giallo-l"><span>END:VEVENT</span></span> <span class="giallo-l"><span>END:VCALENDAR</span></span></code></pre> <p>Horrible, yes. You were warned. These formats are standardized in several RFCs, to list some of them: <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc5545">RFC5545</a>, <a rel="noopener external" target="_blank" href="http://tools.ietf.org/html/rfc2426">RFC2426</a> and <a rel="noopener external" target="_blank" href="http://tools.ietf.org/html/rfc6350">RFC6350</a>.</p> <p>This format is impossible to read, even for a computer. That's why we have jCal and jCard, which are alternative representations of iCalendar and vCard, respectively, in <a rel="noopener external" target="_blank" href="http://json.org/">JSON</a>. JSON is quite popular on the Web today, especially because it eases the manipulation and exchange of data in JavaScript. This is just a very simple and, from my point of view, human-readable serialization format. 
jCal and jCard are respectively standardized in <a rel="noopener external" target="_blank" href="http://tools.ietf.org/html/rfc7265">RFC7265</a> and <a rel="noopener external" target="_blank" href="http://tools.ietf.org/html/rfc7095">RFC7095</a>. Thus, the equivalent of the previous iCalendar example in jCal is:</p> <pre class="giallo z-code"><code data-lang="json"><span class="giallo-l"><span>[</span></span> <span class="giallo-l"><span class="z-string"> &quot;vcalendar&quot;</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> [</span></span> <span class="giallo-l"><span> [</span><span class="z-string">&quot;version&quot;</span><span class="z-punctuation z-separator">,</span><span> {}</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;text&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;2.0&quot;</span><span>]</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> [</span><span class="z-string">&quot;calscale&quot;</span><span class="z-punctuation z-separator">,</span><span> {}</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;text&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;GREGORIAN&quot;</span><span>]</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> [</span><span class="z-string">&quot;prodid&quot;</span><span class="z-punctuation z-separator">,</span><span> {}</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;text&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;-</span><span class="z-constant z-character">\/\/</span><span class="z-string">Example Inc.</span><span class="z-constant z-character">\/\/</span><span class="z-string">Example Calendar</span><span class="z-constant z-character">\/\/</span><span 
class="z-string">EN&quot;</span><span>]</span></span> <span class="giallo-l"><span> ]</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> [</span></span> <span class="giallo-l"><span> [</span></span> <span class="giallo-l"><span class="z-string"> &quot;vevent&quot;</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> [</span></span> <span class="giallo-l"><span> [</span><span class="z-string">&quot;dtstamp&quot;</span><span class="z-punctuation z-separator">,</span><span> {}</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;date-time&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;2008-02-05T19:12:24Z&quot;</span><span>]</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> [</span><span class="z-string">&quot;dtstart&quot;</span><span class="z-punctuation z-separator">,</span><span> {}</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;date&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;2008-10-06&quot;</span><span>]</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> [</span><span class="z-string">&quot;summary&quot;</span><span class="z-punctuation z-separator">,</span><span> {}</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;text&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;Planning meeting&quot;</span><span>]</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> [</span><span class="z-string">&quot;uid&quot;</span><span class="z-punctuation z-separator">,</span><span> {}</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;text&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> 
&quot;4088E990AD89CB3DBB484909&quot;</span><span>]</span></span> <span class="giallo-l"><span> ]</span></span> <span class="giallo-l"><span> ]</span></span> <span class="giallo-l"><span> ]</span></span> <span class="giallo-l"><span>]</span></span></code></pre> <p>Much better. But this is JSON, which is a rather loose format, so we also have xCal and xCard, which are alternative representations of iCalendar and vCard, this time in <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/XML">XML</a>. They are standardized in <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc6321">RFC6321</a> and <a rel="noopener external" target="_blank" href="https://tools.ietf.org/html/rfc6351">RFC6351</a>. The same example in xCal looks like this:</p> <pre class="giallo z-code"><code data-lang="xml"><span class="giallo-l"><span class="z-punctuation z-definition z-tag">&lt;</span><span class="z-entity z-name z-tag">icalendar</span><span class="z-entity z-other z-attribute-name"> xmlns</span><span>=</span><span class="z-string">&quot;urn:ietf:params:xml:ns:icalendar-2.0&quot;</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">vcalendar</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">properties</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">version</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span><span>2.0</span><span 
class="z-punctuation z-definition z-tag">&lt;/</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">version</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">calscale</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span><span>GREGORIAN</span><span class="z-punctuation z-definition z-tag">&lt;/</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">calscale</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">prodid</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span><span>-//Example Inc.//Example Calendar//EN</span><span class="z-punctuation z-definition z-tag">&lt;/</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">prodid</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation 
z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">properties</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">components</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">vevent</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">properties</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">dtstamp</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">date-time</span><span class="z-punctuation z-definition z-tag">&gt;</span><span>2008-02-05T19:12:24Z</span><span class="z-punctuation z-definition z-tag">&lt;/</span><span class="z-entity z-name z-tag">date-time</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">dtstamp</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">dtstart</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">date</span><span class="z-punctuation z-definition z-tag">&gt;</span><span>2008-10-06</span><span class="z-punctuation z-definition 
z-tag">&lt;/</span><span class="z-entity z-name z-tag">date</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">dtstart</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">summary</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span><span>Planning meeting</span><span class="z-punctuation z-definition z-tag">&lt;/</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">summary</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">uid</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span><span>4088E990AD89CB3DBB484909</span><span class="z-punctuation z-definition z-tag">&lt;/</span><span class="z-entity z-name z-tag">text</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">uid</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span 
class="z-entity z-name z-tag">properties</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">vevent</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">components</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag"> &lt;/</span><span class="z-entity z-name z-tag">vcalendar</span><span class="z-punctuation z-definition z-tag">&gt;</span></span> <span class="giallo-l"><span class="z-punctuation z-definition z-tag">&lt;/</span><span class="z-entity z-name z-tag">icalendar</span><span class="z-punctuation z-definition z-tag">&gt;</span></span></code></pre> <p>More semantics, more meaning, easier to read (from my point of view), namespaces… It is very easy to <strong>embed</strong> xCal and xCard inside other XML formats.</p> <p>Managing all these formats is an extremely laborious task. I suggest you take a look at <a rel="noopener external" target="_blank" href="http://sabre.io/vobject/"><code>sabre/vobject</code></a> (see <a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-vobject/">the GitHub repository of <code>sabre/vobject</code></a>). This is a PHP library to manage all these weird formats. 
The following example shows how to read from iCalendar and write to jCal and xCal:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Read iCalendar.</span></span> <span class="giallo-l"><span class="z-variable">$document</span><span class="z-keyword z-operator"> =</span><span> Sabre</span><span class="z-punctuation z-separator">\</span><span>VObject</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Reader</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">read</span><span>(</span><span class="z-variable">$icalendar</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Write jCal.</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span> Sabre</span><span class="z-punctuation z-separator">\</span><span>VObject</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Writer</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">writeJson</span><span>(</span><span class="z-variable">$document</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// Write xCal.</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span> Sabre</span><span class="z-punctuation z-separator">\</span><span>VObject</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Writer</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">writeXml</span><span>(</span><span class="z-variable">$document</span><span>)</span><span 
class="z-punctuation z-terminator">;</span></span></code></pre> <p>Magic when you know the complexity of these formats (in terms of both parsing and validation)!</p> <h2 id="list-of-errata">List of errata<a role="presentation" class="anchor" href="#list-of-errata" title="Anchor link to this header">#</a> </h2> <p>Now, let's talk about all the errata I submitted recently:</p> <ul> <li><a rel="noopener external" target="_blank" href="http://www.rfc-editor.org/errata_search.php?eid=4241">4241, in RFC6351</a> (xCard),</li> <li><a rel="noopener external" target="_blank" href="http://www.rfc-editor.org/errata_search.php?eid=4243">4243, in RFC6351</a> (xCard),</li> <li><a rel="noopener external" target="_blank" href="http://www.rfc-editor.org/errata_search.php?eid=4246">4246, in RFC6350</a> (vCard),</li> <li><a rel="noopener external" target="_blank" href="http://www.rfc-editor.org/errata_search.php?eid=4247">4247, in RFC6351</a> (xCard),</li> <li><a rel="noopener external" target="_blank" href="http://www.rfc-editor.org/errata_search.php?eid=4245">4245, in RFC6350</a> (vCard),</li> <li><a rel="noopener external" target="_blank" href="http://www.rfc-editor.org/errata_search.php?eid=4261">4261, in RFC6350</a> (vCard).</li> </ul> <p>The last two are reported, not yet verified.</p> <p>4241, 4243 and 4246 are just typos in examples. “<em>Just</em>” is a bit of an understatement when you have been reading RFCs for days straight, with 10 of them open in your browser, trying to figure out how everything fits together and whether you are doing everything correctly. Finding typos at that point in your process can be very confusing…</p> <p>4247 is more subtle. The RFC about xCard comes with an <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/XML_Schema_%28W3C%29">XML Schema</a>. That's great! It will help us to test our documents and see if they are valid or not! No? No.</p> <p>Most of the time, I try to relax and deal with the incoming problems. 
But the date and time format in iCalendar, vCard, jCal, jCard, xCal and xCard can make my blood boil in a second. In what world, exactly, is <code>--10</code> or <code>---28</code> a conceivable date and time format? How long did I sleep? “Well,” I was saying to myself, “do not make a drama, we have the XML Schema!”. No. Because there is an error in the schema. More precisely, in a regular expression:</p> <pre class="giallo z-code"><code data-lang="plain"><span class="giallo-l"><span>value-time = element time {</span></span> <span class="giallo-l"><span> xsd:string { pattern = &quot;(\d\d(\d\d(\d\d)?)?|-\d\d(\d\d?)|--\d\d)&quot;</span></span> <span class="giallo-l"><span> ~ &quot;(Z|[+\-]\d\d(\d\d)?)?&quot; }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Did you find the error? <code>(\d\d?)</code> is invalid; it should be <code>(\d\d)?</code>. Don't get me wrong: Everyone makes mistakes, but not this kind of error. I will explain why in the next section.</p> <p>4245 is not an editorial error but a technical one, under review.</p> <p>4261 is crazy. It deserves a whole sub-section.</p> <h3 id="welcome-in-the-crazy-world-of-date-and-time-formats">Welcome to the crazy world of date and time formats<a role="presentation" class="anchor" href="#welcome-in-the-crazy-world-of-date-and-time-formats" title="Anchor link to this header">#</a> </h3> <p>There are two major, popular date and time formats: <a rel="noopener external" target="_blank" href="http://tools.ietf.org/html/rfc2822">RFC2822</a> and ISO.8601. Examples:</p> <ul> <li><code>Fri, 27 Feb 2015 16:06:58 +0100</code> and</li> <li><code>2015-02-27T16:07:16+01:00</code>.</li> </ul> <p>The second one is a good candidate for a computer representation: no locale, only digits, all the information is present…</p> <p>Maybe you noticed there is no link for ISO.8601. Why? 
Because ISO standards are not free and I don't want <a rel="noopener external" target="_blank" href="http://www.iso.org/iso/catalogue_detail?csnumber=40874">to pay 140€</a> to buy a standard…</p> <p>The date and time format adopted by iCalendar and vCard (and the rest of the family) is ISO.8601.2004. I cannot read it. However, as we said, xCard comes with an XML Schema, so we can read this (after having applied erratum 4247):</p> <pre class="giallo z-code"><code data-lang="plain"><span class="giallo-l"><span># 4.3.1</span></span> <span class="giallo-l"><span>value-date = element date {</span></span> <span class="giallo-l"><span> xsd:string { pattern = &quot;\d{8}|\d{4}-\d\d|--\d\d(\d\d)?|---\d\d&quot; }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span># 4.3.2</span></span> <span class="giallo-l"><span>value-time = element time {</span></span> <span class="giallo-l"><span> xsd:string { pattern = &quot;(\d\d(\d\d(\d\d)?)?|-\d\d(\d\d)?|--\d\d)&quot;</span></span> <span class="giallo-l"><span> ~ &quot;(Z|[+\-]\d\d(\d\d)?)?&quot; }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span># 4.3.3</span></span> <span class="giallo-l"><span>value-date-time = element date-time {</span></span> <span class="giallo-l"><span> xsd:string { pattern = &quot;(\d{8}|--\d{4}|---\d\d)T\d\d(\d\d(\d\d)?)?&quot;</span></span> <span class="giallo-l"><span> ~ &quot;(Z|[+\-]\d\d(\d\d)?)?&quot; }</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span># 4.3.4</span></span> <span class="giallo-l"><span>value-date-and-or-time = value-date | value-date-time | value-time</span></span></code></pre> <p>Question: <strong>is <code>--10</code> October or 10 seconds</strong>?</p> <p><code>--10</code> can fit into <code>value-date</code> and <code>value-time</code>:</p> <ul> <li>From <code>value-date</code>, the 
3rd element in the disjunction is <code>--\d\d(\d\d)?</code>, so it matches <code>--10</code>,</li> <li>From <code>value-time</code>, the last element in the first disjunction is <code>--\d\d</code>, so it matches <code>--10</code>.</li> </ul> <p>If we have a date-and-or-time value, <code>value-date</code> comes first, so <code>--10</code> is always October. Nevertheless, if we have a time value, <code>--10</code> is 10 seconds. Crazy enough?</p> <p>Oh, and XML has its own date and time format, which is well-defined and standardized. Why should we drag this crazy format along?</p> <p>Oh, and I assume every format depending on ISO.8601.2004 has this bug. But I am not sure because ISO standards are not free.</p> <h2 id="how-can-rfcs-have-such-errors">How can RFCs have such errors?<a role="presentation" class="anchor" href="#how-can-rfcs-have-such-errors" title="Anchor link to this header">#</a> </h2> <p>So far, RFCs are textual standards. Great. But they are just text. They are written by humans, and are thus subject to errors and omissions. The process itself is error-prone. I do not understand: Why does an RFC not come with an <strong>executable test suite</strong>? I am pretty sure every reader of an RFC will try to create a test suite on their own.</p> <p>I assume xCal and xCard formats are not yet very popular. Consequently, few people have read the RFC and tried to write an implementation. This is my guess. However, it does not change the fact that an executable test suite should (must?) be provided.</p> <h2 id="how-did-i-find-them">How did I find them?<a role="presentation" class="anchor" href="#how-did-i-find-them" title="Anchor link to this header">#</a> </h2> <p>This is how I found these errors. I wrote <a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-vobject/blob/master/tests/VObject/Parser/XmlTest.php">a test suite for xCal and xCard in <code>sabre/vobject</code></a>. I would have loved to write an implementation-agnostic test suite, but I ran out of time. 
This is basically format transformation: R:x→y where R can be a reflexive operator or not (depending on the versions of iCalendar and vCard we consider).</p> <p>For “simple” errata, I found the errors by testing them manually. For errata 4247 and 4261 (with the regular expressions), I found the errors by applying the algorithms presented in <a href="https://mnt.io/articles/generate-strings-based-on-regular-expressions/">Generate strings based on regular expressions</a>.</p> <h2 id="conclusion">Conclusion<a role="presentation" class="anchor" href="#conclusion" title="Anchor link to this header">#</a> </h2> <p><code>sabre/vobject</code> supports xCal and xCard.</p> Control the terminal, the right way 2015-01-04T00:00:00+00:00 2015-01-04T00:00:00+00:00 Unknown https://mnt.io/articles/control-the-terminal-the-right-way/ <p>Nowadays, there are plenty of terminal emulators in the wild. Each one has a specific way to handle controls. How many colours does it support? How to control the style of a character? How to control more than style, like the cursor or the window? In this article, we are going to explain and show in action the right ways to control your terminal with a portable and easy-to-maintain API. We are going to talk about <code>stat</code>, <code>tput</code>, <code>terminfo</code>, <code>hoa/console</code>… but do not be afraid, it's easy and fun!</p> <h2 id="introduction">Introduction<a role="presentation" class="anchor" href="#introduction" title="Anchor link to this header">#</a> </h2> <p>Terminals. They are ancient interfaces, yet still not old-fashioned. They are fast, efficient, work remotely with a low bandwidth, secure and very simple to use.</p> <p>A terminal is a canvas composed of columns and lines. Only one character fits at a position. Depending on the terminal, we have some features enabled; for instance, a character might be stylized with a colour, a decoration, a weight etc. Let's consider the first one. 
A colour belongs to a palette, which contains either 2, 8, 256 or more colours. One may wonder:</p> <ul> <li>How many colours does a terminal support?</li> <li>How to control the style of a character?</li> <li>How to control more than style, like the cursor or the window?</li> </ul> <p>Well, this article is going to explain how a terminal works and how we interact with it. We are going to talk about terminal capabilities, terminal information (stored in a database) and <a rel="noopener external" target="_blank" href="http://github.com/hoaproject/Console"><code>Hoa\Console</code></a>, a PHP library that provides advanced terminal controls.</p> <h2 id="the-basis-of-a-terminal">The basis of a terminal<a role="presentation" class="anchor" href="#the-basis-of-a-terminal" title="Anchor link to this header">#</a> </h2> <p>A terminal, or a console, is an interface that allows us to interact with the computer. This interface is textual. As with a graphical interface, there are inputs (the keyboard and the mouse) and outputs (the screen or a file: a real file, a socket, a FIFO, something else…).</p> <p>There are a ton of terminals. The most famous ones are:</p> <ul> <li><a rel="noopener external" target="_blank" href="http://invisible-island.net/xterm/xterm.html">xterm</a>,</li> <li><a rel="noopener external" target="_blank" href="http://iterm2.com/">iTerm2</a>,</li> <li><a rel="noopener external" target="_blank" href="http://software.schmorp.de/pkg/rxvt-unicode.html">urxvt</a>,</li> <li><a rel="noopener external" target="_blank" href="http://ttssh2.sourceforge.jp/">TeraTerm</a>.</li> </ul> <p>Whatever terminal you use, inputs are handled by programs (or processes) and outputs are produced by the latter. We said outputs can be the screen or a file. Actually, everything is a file, so the screen is also a file. 
However, the user is able to use <a rel="noopener external" target="_blank" href="http://gnu.org/software/bash/manual/bashref.html#Redirections">redirections</a> to choose where the outputs must go.</p> <p>Let's consider the <code>echo</code> program that prints all its options/arguments on its output. Thus, in the following example, <code>foobar</code> is printed on the screen:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-string">&#39;foobar&#39;</span></span></code></pre> <p>And in the following example, <code>foobar</code> is redirected to a file called <code>log</code>:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-string">&#39;foobar&#39;</span><span class="z-keyword z-operator"> &gt;</span><span> log</span></span></code></pre> <p>We are also able to redirect the output to another program, like <code>wc</code> that counts stuff:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-string">&#39;foobar&#39;</span><span class="z-keyword z-operator"> |</span><span class="z-entity z-name"> wc</span><span class="z-constant z-other"> -c</span></span> <span class="giallo-l"><span>7</span></span></code></pre> <p>Now we know there are 7 characters in <code>foobar</code>… no! 
<code>echo</code> automatically adds a newline (<code>\n</code>) at the end of its output; so:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo -n </span><span class="z-string">&#39;foobar&#39;</span><span class="z-keyword z-operator"> |</span><span class="z-entity z-name"> wc</span><span class="z-constant z-other"> -c</span></span> <span class="giallo-l"><span>6</span></span></code></pre> <p>This is more correct!</p> <h2 id="detecting-type-of-pipes">Detecting type of pipes<a role="presentation" class="anchor" href="#detecting-type-of-pipes" title="Anchor link to this header">#</a> </h2> <p>Inputs and outputs are called <strong>pipes</strong>. Yes, trivial, these are nothing more than basic pipes!</p> <p>There are 3 standard pipes:</p> <ul> <li><code>STDIN</code>, standing for the standard input pipe,</li> <li><code>STDOUT</code>, standing for the standard output pipe and</li> <li><code>STDERR</code>, standing for the standard error pipe (also an output one).</li> </ul> <p>If the output is attached to the screen, we say this is a “direct output”. Why is it important? Because if we stylize a text, this is <strong>only for the screen</strong>, not for a file. A file should receive regular text, not all the decorations and styles.</p> <p>Fortunately, the <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Console/blob/master/Source/Console.php"><code>Hoa\Console\Console</code> class</a> provides the <code>isDirect</code>, <code>isPipe</code> and <code>isRedirection</code> static methods to know whether the pipe is respectively direct, a pipe or a redirection (damn naming…!). 
Thus, let <code>Type.php</code> be the following program:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-string"> &#39;is direct: &#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Console</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">isDirect</span><span>(</span><span class="z-support z-constant">STDOUT</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-string"> &#39;is pipe: &#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Console</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">isPipe</span><span>(</span><span class="z-support z-constant">STDOUT</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-string"> &#39;is redirection: &#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span 
class="z-punctuation z-separator">\</span><span class="z-support z-class">Console</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">isRedirection</span><span>(</span><span class="z-support z-constant">STDOUT</span><span>))</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>Now, let's test our program:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> php Type.php</span></span> <span class="giallo-l"><span>is direct: bool(true)</span></span> <span class="giallo-l"><span>is pipe: bool(false)</span></span> <span class="giallo-l"><span>is redirection: bool(false)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> php Type.php </span><span class="z-keyword z-operator">|</span><span class="z-entity z-name"> xargs</span><span class="z-constant z-other"> -I@</span><span class="z-string"> echo @</span></span> <span class="giallo-l"><span>is direct: bool(false)</span></span> <span class="giallo-l"><span>is pipe: bool(true)</span></span> <span class="giallo-l"><span>is redirection: bool(false)</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> php Type.php </span><span class="z-keyword z-operator">&gt;</span><span> /tmp/foo</span><span class="z-punctuation z-terminator">;</span><span class="z-entity z-name"> cat</span><span class="z-string"> /tmp/foo</span></span> <span class="giallo-l"><span>is direct: bool(false)</span></span> <span class="giallo-l"><span>is pipe: bool(false)</span></span> <span class="giallo-l"><span>is redirection: bool(true)</span></span></code></pre> <p>The first execution is the classic case: <code>STDOUT</code>, the standard output, is direct. The second execution redirects the output to another program, so <code>STDOUT</code> is a pipe. 
Finally, the last execution redirects the output to a file called <code>/tmp/foo</code>, so <code>STDOUT</code> is a redirection.</p> <p>How does it work? We use <a rel="noopener external" target="_blank" href="http://php.net/fstat"><code>fstat</code></a> to read the <code>mode</code> of the file. The underlying <code>fstat</code> implementation is defined in C, so let's take a look at the <a rel="noopener external" target="_blank" href="http://man.cx/fstat%282%29">documentation of <code>fstat(2)</code></a>. <code>stat</code> is a C structure that looks like:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-storage z-type">struct</span><span> stat </span><span class="z-punctuation z-section">{</span></span> <span class="giallo-l"><span class="z-storage z-type"> dev_t</span><span> st_dev</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* device inode resides on */</span></span> <span class="giallo-l"><span class="z-storage z-type"> ino_t</span><span> st_ino</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* inode&#39;s number */</span></span> <span class="giallo-l"><span class="z-storage z-type"> mode_t</span><span> st_mode</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* inode protection mode */</span></span> <span class="giallo-l"><span class="z-storage z-type"> nlink_t</span><span> st_nlink</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* number of hard links to the file */</span></span> <span class="giallo-l"><span class="z-storage z-type"> uid_t</span><span> st_uid</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* user-id of owner */</span></span> <span class="giallo-l"><span class="z-storage z-type"> gid_t</span><span> st_gid</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* group-id of owner */</span></span> <span 
class="giallo-l"><span class="z-storage z-type"> dev_t</span><span> st_rdev</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* device type, for special file inode */</span></span> <span class="giallo-l"><span class="z-storage z-type"> struct</span><span> timespec st_atimespec</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* time of last access */</span></span> <span class="giallo-l"><span class="z-storage z-type"> struct</span><span> timespec st_mtimespec</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* time of last data modification */</span></span> <span class="giallo-l"><span class="z-storage z-type"> struct</span><span> timespec st_ctimespec</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* time of last file status change */</span></span> <span class="giallo-l"><span class="z-storage z-type"> off_t</span><span> st_size</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* file size, in bytes */</span></span> <span class="giallo-l"><span class="z-storage z-type"> quad_t</span><span> st_blocks</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* blocks allocated for file */</span></span> <span class="giallo-l"><span class="z-storage z-type"> u_long</span><span> st_blksize</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* optimal file sys I/O ops blocksize */</span></span> <span class="giallo-l"><span class="z-storage z-type"> u_long</span><span> st_flags</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* user defined flags for file */</span></span> <span class="giallo-l"><span class="z-storage z-type"> u_long</span><span> st_gen</span><span class="z-punctuation z-terminator">;</span><span class="z-comment"> /* file generation number */</span></span> <span class="giallo-l"><span class="z-punctuation 
z-section">}</span></span></code></pre> <p>The value of <code>mode</code> returned by the PHP <code>fstat</code> function is equal to <code>st_mode</code> in this structure. And <code>st_mode</code> has the following bits:</p> <pre class="giallo z-code"><code data-lang="c"><span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFMT</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">170000</span><span class="z-comment"> /* type of file mask */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFIFO</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">010000</span><span class="z-comment"> /* named pipe (fifo) */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFCHR</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">020000</span><span class="z-comment"> /* character special */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFDIR</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">040000</span><span class="z-comment"> /* directory */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFBLK</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">060000</span><span class="z-comment"> /* block special */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFREG</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">100000</span><span class="z-comment"> /* regular */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFLNK</span><span class="z-keyword"> 0</span><span class="z-constant 
z-numeric">120000</span><span class="z-comment"> /* symbolic link */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFSOCK</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">140000</span><span class="z-comment"> /* socket */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IFWHT</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">160000</span><span class="z-comment"> /* whiteout */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_ISUID</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">004000</span><span class="z-comment"> /* set user id on execution */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_ISGID</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">002000</span><span class="z-comment"> /* set group id on execution */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_ISVTX</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">001000</span><span class="z-comment"> /* save swapped text even after use */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IRWXU</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000700</span><span class="z-comment"> /* RWX mask for owner */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IRUSR</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000400</span><span class="z-comment"> /* read permission, owner */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span 
class="z-entity z-name z-function"> S_IWUSR</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000200</span><span class="z-comment"> /* write permission, owner */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IXUSR</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000100</span><span class="z-comment"> /* execute/search permission, owner */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IRWXG</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000070</span><span class="z-comment"> /* RWX mask for group */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IRGRP</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000040</span><span class="z-comment"> /* read permission, group */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IWGRP</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000020</span><span class="z-comment"> /* write permission, group */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IXGRP</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000010</span><span class="z-comment"> /* execute/search permission, group */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IRWXO</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000007</span><span class="z-comment"> /* RWX mask for other */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IROTH</span><span class="z-keyword"> 0</span><span class="z-constant 
z-numeric">000004</span><span class="z-comment"> /* read permission, other */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IWOTH</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000002</span><span class="z-comment"> /* write permission, other */</span></span> <span class="giallo-l"><span class="z-keyword">#define</span><span class="z-entity z-name z-function"> S_IXOTH</span><span class="z-keyword"> 0</span><span class="z-constant z-numeric">000001</span><span class="z-comment"> /* execute/search permission, other */</span></span></code></pre> <p>Awesome, we have everything we need! We mask <code>mode</code> with <code>S_IFMT</code> to get the file type. Then we just have to check whether it is a named pipe <code>S_IFIFO</code>, a character special file <code>S_IFCHR</code> etc. Concretely:</p> <ul> <li><code>isDirect</code> checks that the mode is equal to <code>S_IFCHR</code>, which means it is attached to the screen (in our case),</li> <li><code>isPipe</code> checks that the mode is equal to <code>S_IFIFO</code>: This is a special file that behaves like a FIFO queue (see the <a rel="noopener external" target="_blank" href="http://www.freebsd.org/cgi/man.cgi?query=mkfifo&amp;sektion=1">documentation of <code>mkfifo(1)</code></a>), everything that is written is read right after, and the reading order is defined by the writing order (first-in, first-out!),</li> <li><code>isRedirection</code> checks that the mode is equal to <code>S_IFREG</code>, <code>S_IFDIR</code>, <code>S_IFLNK</code>, <code>S_IFSOCK</code> or <code>S_IFBLK</code>, in other words: All kinds of files on which we can apply a redirection. Why? 
Because the <code>STDOUT</code> (or another <code>STD*</code> stream) of the current process is defined as a file pointer to the redirection destination, and it can only be a file, a directory, a link, a socket or a block file.</li> </ul> <p>I encourage you to read the <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Console/blob/master/Source/Console.php">implementation of the <code>Hoa\Console\Console::getMode</code> method</a>.</p> <p>So yes, this is useful to enable styles on text but also to define the default verbosity level. For instance, if a program outputs the result of a computation with some explanations around it, the highest verbosity level would output everything (the result and the explanations) while the lowest level would output only the result. Let's try with the <code>toUpperCase.php</code> program:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$verbose</span><span class="z-keyword z-operator"> =</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Console</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">isDirect</span><span>(</span><span class="z-support z-constant">STDOUT</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$string</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $argv</span><span class="z-punctuation z-section">[</span><span class="z-constant z-numeric">1</span><span class="z-punctuation z-section">]</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$result</span><span 
class="z-keyword z-operator"> =</span><span> (</span><span class="z-keyword">new</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>String</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">String</span><span>(</span><span class="z-variable">$string</span><span>))</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">toUpperCase</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">if</span><span>(</span><span class="z-constant z-language">true</span><span class="z-keyword z-operator"> ===</span><span class="z-variable"> $verbose</span><span>) {</span></span> <span class="giallo-l"><span class="z-support z-function"> echo</span><span class="z-variable"> $string</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &#39; becomes &#39;</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $result</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &#39; in upper case!&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span><span class="z-keyword"> else</span><span> {</span></span> <span class="giallo-l"><span class="z-support z-function"> echo</span><span class="z-variable"> $result</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Then, let's execute this program:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span 
class="giallo-l"><span class="z-punctuation z-separator">$</span><span> php toUpperCase.php </span><span class="z-string">&#39;Hello world!&#39;</span></span> <span class="giallo-l"><span>Hello world! becomes HELLO WORLD! in upper case!</span></span></code></pre> <p>And now, let's execute this program with a pipe:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> php toUpperCase.php </span><span class="z-string">&#39;Hello world!&#39;</span><span class="z-keyword z-operator"> |</span><span class="z-entity z-name"> xargs</span><span class="z-constant z-other"> -I@</span><span class="z-string"> echo @</span></span> <span class="giallo-l"><span>HELLO WORLD!</span></span></code></pre> <p>Useful and very simple, isn't it?</p> <h2 id="terminal-capabilities">Terminal capabilities<a role="presentation" class="anchor" href="#terminal-capabilities" title="Anchor link to this header">#</a> </h2> <p>We can control the terminal with the inputs, like the keyboard, but we can also control the outputs. How? With the text itself. Actually, an output does not contain only text: it also includes <strong>control functions</strong>. It's like HTML: Around a text, you can have an element, specifying that the text is a link. It's exactly the same for terminals! To specify that a text must be in red, we must add a control function around it.</p> <p>Fortunately, these control functions have been standardized in the <a rel="noopener external" target="_blank" href="http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf">ECMA-48</a> document: Control Functions for Coded Character Sets. However, not all terminals implement the whole standard, and for historical reasons, some terminals use slightly different control functions. Moreover, some information does not belong to this standard (because it is out of its scope), like: How many colours does the terminal support? 
or does the terminal support the meta key?</p> <p>Consequently, each terminal has a list of <strong>capabilities</strong>. This list is split into <strong>3 categories</strong>:</p> <ul> <li>boolean capabilities,</li> <li>number capabilities,</li> <li>string capabilities.</li> </ul> <p>For instance:</p> <ul> <li>the “does the terminal support the meta key” is a boolean capability called <code>meta_key</code> whose value is <code>true</code> or <code>false</code>,</li> <li>the “number of colours supported by the terminal” is a… number capability called <code>max_colors</code> whose value can be <code>2</code>, <code>8</code>, <code>256</code> or more,</li> <li>the “clear screen control function” is a string capability called <code>clear_screen</code> whose value might be <code>\e[H\e[2J</code>,</li> <li>the “move the cursor one column to the right” is also a string capability called <code>cursor_right</code> whose value might be <code>\e[C</code>.</li> </ul> <p>All the capabilities can be found in the <a rel="noopener external" target="_blank" href="http://www.freebsd.org/cgi/man.cgi?query=terminfo&amp;sektion=5">documentation of <code>terminfo(5)</code></a> or in the <a rel="noopener external" target="_blank" href="http://pubs.opengroup.org/onlinepubs/7908799/xcurses/terminfo.html">documentation of xcurses</a>. I encourage you to follow these links and see how rich the terminal capabilities are!</p> <h2 id="terminal-information">Terminal information<a role="presentation" class="anchor" href="#terminal-information" title="Anchor link to this header">#</a> </h2> <p>Terminal capabilities are stored as <strong>information</strong> in <strong>databases</strong>. Where are these databases located? In files with a binary format. 
Common locations are:</p> <ul> <li><code>/usr/share/terminfo</code>,</li> <li><code>/usr/share/lib/terminfo</code>,</li> <li><code>/lib/terminfo</code>,</li> <li><code>/usr/lib/terminfo</code>,</li> <li><code>/usr/local/share/terminfo</code>,</li> <li><code>/usr/local/share/lib/terminfo</code>,</li> <li>etc.</li> <li>or the locations given by the <code>TERMINFO</code> or <code>TERMINFO_DIRS</code> environment variables.</li> </ul> <p>Inside these directories, we have a tree of the form: <code><em>xx</em>/<em>name</em></code>, where <code><em>xx</em></code> is the ASCII value in hexadecimal of the first letter of the terminal name <code><em>name</em></code>, or <code><em>n</em>/<em>name</em></code> where <code><em>n</em></code> is the first letter of the terminal name. The terminal name is stored in the <code>TERM</code> environment variable. For instance, on my computer:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-variable">$TERM</span></span> <span class="giallo-l"><span>xterm-256color</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> file /usr/share/terminfo/78/xterm-256color</span></span> <span class="giallo-l"><span>/usr/share/terminfo/78/xterm-256color: Compiled terminfo entry</span></span></code></pre> <p>We can use the <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Console/blob/master/Source/Tput.php"><code>Hoa\Console\Tput</code> class</a> to retrieve this information. The <code>getTerminfo</code> static method returns the path of the terminal information file. The <code>getTerm</code> static method returns the terminal name. Finally, the class as a whole parses a terminal information database (it will use the file returned by <code>getTerminfo</code> by default). 
For instance:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$tput</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Tput</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(</span><span class="z-variable">$tput</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">count</span><span>(</span><span class="z-string">&#39;max_colors&#39;</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * Will output:</span></span> <span class="giallo-l"><span class="z-comment"> * int(256)</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span></code></pre> <p>On my computer, with <code>xterm-256color</code>, I have 256 colours, as expected. 
If we parse the information of <code>xterm</code> and not <code>xterm-256color</code>, we will have:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-variable">$tput</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Tput</span><span>(Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Tput</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">getTerminfo</span><span>(</span><span class="z-string">&#39;xterm&#39;</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">var_dump</span><span>(</span><span class="z-variable">$tput</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">count</span><span>(</span><span class="z-string">&#39;max_colors&#39;</span><span>))</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * Will output:</span></span> <span class="giallo-l"><span class="z-comment"> * int(8)</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span></code></pre><h2 id="the-power-in-your-hand-control-the-cursor">The power in your hand: Control the cursor<a role="presentation" class="anchor" href="#the-power-in-your-hand-control-the-cursor" title="Anchor link to this header">#</a> </h2> <p>Let's summarize. We are able to parse and know all the terminal capabilities of a specific terminal (including the one of the current user). 
If we want a powerful terminal API, we need to control the basics, such as the cursor.</p> <p>Remember. We said that the terminal is a canvas of columns and lines. The cursor is like a pen. We can move it and write something. We are going to (partly) see how the <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Console/blob/master/Source/Cursor.php"><code>Hoa\Console\Cursor</code> class</a> works.</p> <h3 id="i-like-to-move-it">I like to move it!<a role="presentation" class="anchor" href="#i-like-to-move-it" title="Anchor link to this header">#</a> </h3> <p>The <code>moveTo</code> static method moves the cursor to an absolute position. For example:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">moveTo</span><span>(</span><span class="z-variable">$x</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $y</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The control function we use is <code>cursor_address</code>. So all we need to do is to use the <code>Hoa\Console\Tput</code> class and call the <code>get</code> method on it to get the value of this string capability. This is a parameterized one: On <code>xterm-256color</code>, its value is <code>\e[%i%p1%d;%p2%dH</code>. We replace the parameters by <code>$x</code> and <code>$y</code> and we output the result. That's all! We are able to move the cursor to an absolute position on <strong>all terminals</strong>! 
This is the right way to do it.</p> <p>We use the same strategy for the <code>move</code> static method that moves the cursor relative to its current position. For example:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">move</span><span>(</span><span class="z-string">&#39;right up&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>We split the steps and for each step we read the appropriate string capability using the <code>Hoa\Console\Tput</code> class. For <code>right</code>, we read the <code>parm_right_cursor</code> string capability, for <code>up</code>, we read <code>parm_up_cursor</code> etc. Note that <code>parm_right_cursor</code> is different from <code>cursor_right</code>: The first one moves the cursor a given number of times while the second one moves the cursor only once. With performance in mind, we should use the first one if we have to move the cursor several times.</p> <p>The <code>getPosition</code> static method returns the position of the cursor. This interaction is a little bit different. We must write a control function on the output, and then, the terminal replies on the input. 
<a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Console/blob/master/Source/Cursor.php">See the implementation by yourself</a>.</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">print_r</span><span>(Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">getPosition</span><span>())</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * Will output:</span></span> <span class="giallo-l"><span class="z-comment"> * Array</span></span> <span class="giallo-l"><span class="z-comment"> * (</span></span> <span class="giallo-l"><span class="z-comment"> * [x] =&gt; 7</span></span> <span class="giallo-l"><span class="z-comment"> * [y] =&gt; 42</span></span> <span class="giallo-l"><span class="z-comment"> * )</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span></code></pre> <p>In the same way, we have the <code>save</code> and <code>restore</code> static methods that save the current position of the cursor and restore it. This is very useful. We use the <code>save_cursor</code> and <code>restore_cursor</code> string capabilities.</p> <p>Also, the <code>clear</code> static method splits some parts to clear. 
For each part (direction or way), we read from <code>Hoa\Console\Tput</code> the appropriate string capabilities: <code>clear_screen</code> to clear the whole screen, <code>clr_eol</code> to clear everything on the right of the cursor, <code>clr_eos</code> to clear everything below the cursor etc.</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">clear</span><span>(</span><span class="z-string">&#39;left&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>See what we learnt in action:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-string"> &#39;Foobar&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-string"> &#39;Foobar&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-string"> &#39;Foobar&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant 
z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-string"> &#39;Foobar&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-string"> &#39;Foobar&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">save</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">move</span><span>(</span><span class="z-string">&#39;LEFT&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation 
z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">move</span><span>(</span><span class="z-string">&#39;↑&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">move</span><span>(</span><span class="z-string">&#39;↑&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">move</span><span>(</span><span class="z-string">&#39;↑&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">clear</span><span>(</span><span 
class="z-string">&#39;↔&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-punctuation z-terminator">;</span><span class="z-support z-function"> echo</span><span class="z-string"> &#39;Hahaha!&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">restore</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &#39;Bye!&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The result is presented in the following figure.</p> <figure> <p><img src="https://mnt.io/articles/control-the-terminal-the-right-way/./cursor_move.gif" alt="Moving a cursor in the terminal" loading="lazy" decoding="async" /></p> <figcaption> <p>Saving, moving, clearing and restoring the cursor with <code>Hoa\Console</code>.</p> </figcaption> </figure> <p>The resulting API is portable, clean, simple to read and very easy to maintain! 
This is the right way to do it.</p> <p>To get more information, please <a rel="noopener external" target="_blank" title="Documentation of Hoa\Console\Cursor" href="http://hoa-project.net/Literature/Hack/Console.html#Cursor">read the documentation</a>.</p> <h3 id="">Colours and decorations<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h3> <p>Now: Colours. This is mainly the reason why I decided to write this article. We see the same kind of library, again and again, doing only colours in the terminal, but unfortunately not in the right way 😞.</p> <p>A terminal has a palette of colours. Each colour is indexed by an integer, from 0 to potentially +∞. The size of the palette is described by the <code>max_colors</code> number capability. Usually, a palette contains 1, 2, 8, 256 or 16 million colours.</p> <figure> <p><img src="https://mnt.io/articles/control-the-terminal-the-right-way/./xterm_256color_chart.svg" alt="xterm-256color palette" loading="lazy" decoding="async" /></p> <figcaption> <p>The <code>xterm-256color</code> palette (<a rel="noopener external" target="_blank" title="Source of the `xterm-256color` palette" href="https://commons.wikimedia.org/wiki/File:Xterm_256color_chart.svg">source</a>).</p> </figcaption> </figure> <p>So the first thing to do is to check whether we have more than 1 colour. If not, we must not colorize the given text. Next, if we have fewer than 256 colours, we have to convert the style into a palette containing 8 colours. Likewise, with fewer than 16 million colours, we have to convert into 256 colours.</p> <p>Moreover, we can define the style of the foreground or of the background with, respectively, the <code>set_a_foreground</code> and <code>set_a_background</code> string capabilities. 
Finally, in addition to colours, we can define other decorations like bold, underline, blink or even invert the foreground and the background colours.</p> <p>One thing to remember is: With this capability, we only define the style at a given “pixel” and it will apply to the following text. In this case, it is not exactly like HTML where we have a beginning and an end. Here we only have a beginning. Let&#39;s try!</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">colorize</span><span>(</span><span class="z-string">&#39;underlined foreground(yellow) background(#932e2e)&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-string"> &#39;foo&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Cursor</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">colorize</span><span>(</span><span class="z-string">&#39;!underlined background(normal)&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-string"> &#39;bar&#39;</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span 
class="z-string">&quot;</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>The API is pretty simple: We start by underlining the text, we set the foreground to yellow and we set the background to <code>#932e2e</code>. Then we output something. We continue by cancelling the underline decoration and resetting the background. Finally we output something else. Here is the result:</p> <figure> <p><img src="https://mnt.io/articles/control-the-terminal-the-right-way/./colour.png" alt="A styled text in the terminal" loading="lazy" decoding="async" /></p> <figcaption> <p>Fun with <code>Hoa\Console\Cursor::colorize</code>.</p> </figcaption> </figure> <p>What do we observe? My terminal does not support more than 256 colours. Thus, <code>#932e2e</code> is <strong>automatically converted into the closest colour</strong> in my current palette! This is the right way to do it.</p> <p>For fun, you can change the colours in the palette with the <code>Hoa\Console\Cursor::changeColor</code> static method. You can also change the style of the cursor, like <code>▋</code>, <code>_</code> or <code>|</code>.</p> <p>To get more information, please <a rel="noopener external" target="_blank" title="Documentation of Hoa\Console\Cursor" href="http://hoa-project.net/Fr/Literature/Hack/Console.html#Content">read the documentation</a>.</p> <h2 id="-1">The power in your hand: Readline<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h2> <p>A more complete usage of <code>Hoa\Console\Cursor</code> and even <code>Hoa\Console\Window</code> is the <a rel="noopener external" target="_blank" href="http://central.hoa-project.net/Resource/Library/Console/Readline/Readline.php"><code>Hoa\Console\Readline</code> class</a>, which is a powerful readline. Beyond autocompleters, history, key bindings etc., it makes advanced use of the cursor. 
See this in action:</p> <figure> <p><img src="https://mnt.io/articles/control-the-terminal-the-right-way/./readline_autocompleters.gif" alt="Play with autocompleters" loading="lazy" decoding="async" /></p> <figcaption> <p>An autocompletion menu, made with <code>Hoa\Console\Cursor</code> and <code>Hoa\Console\Window</code>.</p> </figcaption> </figure> <p>We use <code>Hoa\Console\Cursor</code> to move the cursor or change the colours and <code>Hoa\Console\Window</code> to get the dimensions of the window, scroll some text in it etc. I encourage you to read the implementation.</p> <p>To get more information, please <a rel="noopener external" target="_blank" title="Documentation of Hoa\Console\Readline" href="http://hoa-project.net/Literature/Hack/Console.html#Readline">read the documentation</a>.</p> <h2 id="-2">The power in your hand: Sound 🎵<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h2> <p>Yes, even sound is defined by terminal capabilities. The famous beep is given by the <code>bell</code> string capability. Would you like to make a beep? 
Easy:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$tput</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Tput</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-variable"> $tput</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">get</span><span>(</span><span class="z-string">&#39;bell&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>That&#39;s it!</p> <h2 id="-3">Bonus: Window<a role="presentation" class="anchor" href="#-3" title="Anchor link to this header">#</a> </h2> <p>As a bonus, a quick demo of <code>Hoa\Console\Window</code> because it&#39;s fun.</p> <p>The video shows the execution of the following code:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span>Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Window</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">setSize</span><span>(</span><span class="z-constant z-numeric">80</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 35</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span 
class="z-support z-function">var_dump</span><span>(Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Window</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">getPosition</span><span>())</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">foreach</span><span>(</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> [</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> [</span><span class="z-constant z-numeric">100</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 100</span><span class="z-punctuation z-section">]</span><span class="z-punctuation z-separator">,</span><span class="z-punctuation z-section"> [</span><span class="z-constant z-numeric">150</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 150</span><span class="z-punctuation z-section">]</span><span class="z-punctuation z-separator">,</span><span class="z-punctuation z-section"> [</span><span class="z-constant z-numeric">200</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 100</span><span class="z-punctuation z-section">]</span><span class="z-punctuation z-separator">,</span><span class="z-punctuation z-section"> [</span><span class="z-constant z-numeric">200</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 80</span><span class="z-punctuation z-section">]</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> [</span><span class="z-constant z-numeric">200</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 60</span><span class="z-punctuation z-section">]</span><span 
class="z-punctuation z-separator">,</span><span class="z-punctuation z-section"> [</span><span class="z-constant z-numeric">200</span><span class="z-punctuation z-separator">,</span><span class="z-constant z-numeric"> 100</span><span class="z-punctuation z-section">]</span></span> <span class="giallo-l"><span class="z-punctuation z-section"> ]</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> as</span><span class="z-support z-function"> list</span><span>(</span><span class="z-variable">$x</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $y</span><span>)</span></span> <span class="giallo-l"><span>) {</span></span> <span class="giallo-l"><span class="z-support z-function"> sleep</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Window</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">moveTo</span><span>(</span><span class="z-variable">$x</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $y</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">2</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Window</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">minimize</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span 
class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">2</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Window</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">restore</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">2</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Window</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">lower</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">sleep</span><span>(</span><span class="z-constant z-numeric">2</span><span>)</span><span class="z-punctuation z-terminator">;</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span>Console</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Window</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">raise</span><span>()</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>We resize the window, we get its position, we move the window on the screen, we minimize and restore it, and finally we put it behind all other windows just before raising it.</p> <iframe src="https://player.vimeo.com/video/115901611?title=0&amp;byline=0&amp;portrait=0&amp;badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" frameborder="0" allow="autoplay; fullscreen; picture-in-picture; 
clipboard-write" style="aspect-ratio: 16/9; width: 100%;" title="Hoa\Console\Window in action"> </iframe> <p>To get more information, please <a rel="noopener external" target="_blank" title="Documentation of Hoa\Console\Window" href="http://hoa-project.net/Literature/Hack/Console.html#Window">read the documentation</a>.</p> <h2 id="-4">Conclusion<a role="presentation" class="anchor" href="#-4" title="Anchor link to this header">#</a> </h2> <p>In this article, we saw how to control the terminal by, firstly, detecting the type of pipes and, secondly, reading and using the terminal capabilities. We know where these capabilities are stored and we saw a few of them in action.</p> <p>This approach ensures your code will be <strong>portable</strong>, easy to maintain and <strong>easy to use</strong>. Portability is very important because, as with browsers and user devices, there are a lot of terminal emulators in the wild. We have to care about them.</p> <p>I encourage you to take a look at the <a rel="noopener external" target="_blank" href="http://github.com/hoaproject/Console"><code>Hoa\Console</code> library</a> and to contribute to make it even more awesome 😄.</p> atoum has two release managers 2014-11-28T00:00:00+00:00 2014-11-28T00:00:00+00:00 Unknown https://mnt.io/articles/atoum-has-two-release-managers/ <h2 id="what-is-atoum">What is atoum?<a role="presentation" class="anchor" href="#what-is-atoum" title="Anchor link to this header">#</a> </h2> <p>Short introduction: atoum is a simple, modern and intuitive unit testing framework for PHP. 
Originally created by <a rel="noopener external" target="_blank" href="http://blog.mageekbox.net/">Frédéric Hardy</a>, a good friend, it has grown thanks to <a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum/graphs/contributors">many contributors</a>.</p> <figure> <p><img src="https://mnt.io/articles/atoum-has-two-release-managers/./atoum-logo.png" alt="atoum&#39;s logo" loading="lazy" decoding="async" /></p> <figcaption> atoum's logo. </figcaption> </figure> <p>No one can say that atoum is not simple or intuitive. The framework offers several awesome features and is more like a meta unit testing framework. Indeed, the “user-land” of atoum, meaning the whole assertion API (“this is an integer and it is equal to…”), is based on a very flexible mechanism, handled or embedded in runners, reporters etc. Thus, the framework is very extensible. You can find more information in the <code>README.md</code> of the project: <a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum#why-atoum">Why atoum?</a>.</p> <p>Several important projects or companies use atoum. For instance, <a rel="noopener external" target="_blank" href="https://github.com/FriendsOfPHP/pickle/">Pickle</a>, the PHP Extension installer, created by <a rel="noopener external" target="_blank" href="https://twitter.com/PierreJoye">Pierre Joye</a>, another friend (the world is very small 😉), uses atoum for its unit tests. Another example: <a rel="noopener external" target="_blank" href="https://github.com/M6Web">M6Web</a>, the geeks working at <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/M6_%28TV_channel%29">M6</a>, the most profitable private national French TV channel, also use atoum. 
Another example: <a rel="noopener external" target="_blank" href="http://mozilla.org/">Mozilla</a> is using atoum to test some of their applications.</p> <h2 id="">Where is the cap&#39;tain?<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <p>Since the beginning, Frédéric has been a great leader for the project. He has inspired many people, users and contributors. In real life, on stage, on IRC… his personality and charisma were helpful in all aspects. However, leading such a project is challenging and nerve-wracking daily work. I know what I am talking about with <a rel="noopener external" target="_blank" href="http://hoa-project.net/">Hoa</a>. Fortunately for Frédéric, some contributors were there to help.</p> <h2 id="-1">Where to go cap&#39;tain?<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h2> <p>However, having contributors does not create a community. A community is a group of people who share something. A project needs a community with strong connections. They do not all need to look in the same direction, but they have to share something. In the case of atoum, I would say the project has been a <strong>victim of its own success</strong>. We saw the number of users increase very quickly, and the project was not yet ready for such massive use. The documentation was not ready, a lot of features were not finalized, there were few contributors and the absence of a real community did not help. Put all these things together, blend them, and you get a bomb 😄. The project leaders were under terrible pressure.</p> <p>In these conditions, it is not easy to work, especially when users ask for new features. 
The need for a roadmap and for people to make decisions was very strong.</p> <h2 id="-2">When the community acts<a role="presentation" class="anchor" href="#-2" title="Anchor link to this header">#</a> </h2> <p>After a couple of months under the sea, we decided that we needed to create a structure around the project: an organization. Frédéric is not able to do everything by himself. That&#39;s why <strong>2 release managers have been elected</strong>: Mikaël Randy and me. Thank you to <a rel="noopener external" target="_blank" href="http://jubianchi.fr/">Julien Bianchi</a>, another friend 😉, for having organized these elections and for being one of the most active contributors to atoum!</p> <p>Our goal is to define the roadmap of atoum:</p> <ul> <li>what will be included in the next version and what will not,</li> <li>what features need work,</li> <li>what bugs or issues need to be solved,</li> <li>etc.</li> </ul> <p>Well, a release manager is a pretty common job.</p> <p>Why 2? To mitigate the bus factor and to delegate. We all have a family, friends, jobs and side projects. With 2 release managers, we have twice as much time to organize this project, and it deserves that amount of time.</p> <p>The goal is also to organize the community, if possible. Great new features are coming and they will allow more people to contribute and build their “own atoum”. See below.</p> <h2 id="-3">Features to port!<a role="presentation" class="anchor" href="#-3" title="Anchor link to this header">#</a> </h2> <p>Not everything is fully defined yet, but here is an overview of what is coming.</p> <p>First of all, you will find the <a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum/milestones/1.0.0">latest issues and bugs</a> we have to close before the first release.</p> <p>Second, you will notice the version number… 1.0.0. Yes! atoum will have tags! 
After several discussions (<a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum/issues/261">#261</a>, <a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum/issues/300">#300</a>, <a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum/issues/342">#342</a>, <a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum/issues/349">#349</a>…), even though atoum follows a rolling-release model, it will have tags, and they will follow the <a rel="noopener external" target="_blank" href="http://semver.org/">semver format</a>. More information on Julien Bianchi's blog: <a rel="noopener external" target="_blank" href="http://jubianchi.fr/atoum-release.htm">atoum embraces semver</a>.</p> <p>Finally, a big feature is the <a rel="noopener external" target="_blank" href="https://github.com/atoum/atoum/pull/330">Extension API</a>, which allows writing extensions, such as:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/atoum/visibility-extension"><code>atoum/visibility-extension</code></a>, allows overriding method visibility; example:</li> </ul> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Foo</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> protected</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> bar</span><span>(</span><span class="z-variable">$arg</span><span>) {</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $arg</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span> <span 
class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// and…</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Foo</span><span class="z-storage"> extends</span><span> atoum</span><span class="z-punctuation z-separator">\</span><span class="z-entity z-other z-inherited-class">test</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> testBaz</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">if</span><span>(</span><span class="z-variable">$sut</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-punctuation z-separator"> \</span><span class="z-support z-class">Foo</span><span>())</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">and</span><span>(</span><span class="z-variable">$arg</span><span class="z-keyword z-operator"> =</span><span class="z-string"> &#39;bar&#39;</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-variable">then</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">variable</span><span>(</span><span class="z-variable z-language">$this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">invoke</span><span>(</span><span class="z-variable">$sut</span><span>)</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">bar</span><span>(</span><span class="z-variable">$arg</span><span>))</span></span> <span 
class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">isEqualTo</span><span>(</span><span class="z-variable">$arg</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Now you will be able to test your protected and private methods!</p> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/atoum/bdd-extension"><code>atoum/bdd-extension</code></a>, allows to write tests with the behavior-driven development style and vocabulary; example:</li> </ul> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Formatter</span><span class="z-storage"> extends</span><span> atoum</span><span class="z-punctuation z-separator">\</span><span class="z-entity z-other z-inherited-class">spec</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> should_format_underscore_separated_method_name</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">given</span><span>(</span><span class="z-variable">$formatter</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-support z-class"> testedClass</span><span>())</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-variable">then</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span 
class="z-variable">invoking</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">format</span><span>(</span><span class="z-constant z-language">__FUNCTION__</span><span>)</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">on</span><span>(</span><span class="z-variable">$formatter</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">shouldReturn</span><span>(</span><span class="z-string">&#39;should format underscore separated method name&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Even the output looks familiar:</p> <figure> <p><img src="https://mnt.io/articles/atoum-has-two-release-managers/output.png" alt="Example of a terminal output" loading="lazy" decoding="async" /></p> <figcaption> Possible output with the <code>atoum/bdd-extension</code>. 
</figcaption> </figure> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/atoum/json-schema-extension"><code>atoum/json-schema-extension</code></a>, allows to validate JSON payloads against a schema; example:</li> </ul> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Foo</span><span class="z-storage"> extends</span><span> atoum</span><span class="z-punctuation z-separator">\</span><span class="z-entity z-other z-inherited-class">test</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> testIsJson</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">given</span><span>(</span><span class="z-variable">$string</span><span class="z-keyword z-operator"> =</span><span class="z-string"> &#39;{&quot;foo&quot;: &quot;bar&quot;}&#39;</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-variable">then</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">json</span><span>(</span><span class="z-variable">$string</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> testValidatesSchema</span><span>() {</span></span> <span 
class="giallo-l"><span class="z-variable z-language"> $this</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">given</span><span>(</span><span class="z-variable">$string</span><span class="z-keyword z-operator"> =</span><span class="z-string"> &#39;[&quot;foo&quot;, &quot;bar&quot;]&#39;</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-variable">then</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">json</span><span>(</span><span class="z-variable">$string</span><span>)</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">validates</span><span>(</span><span class="z-string">&#39;{&quot;title&quot;: &quot;test&quot;, &quot;type&quot;: &quot;array&quot;}&#39;</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">json</span><span>(</span><span class="z-variable">$string</span><span>)</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">validates</span><span>(</span><span class="z-string">&#39;/path/to/json.schema&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Contributions-Atoum-PraspelExtension"><code>atoum/praspel-extension</code></a>, allows to use <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Praspel">Praspel</a> inside atoum: automatically generate and validate advanced test data and unit tests; example:</li> </ul> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword 
z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> Foo</span><span class="z-storage"> extends</span><span> atoum</span><span class="z-punctuation z-separator">\</span><span class="z-entity z-other z-inherited-class">test</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> testFoo</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">if</span><span>(</span><span class="z-variable">$regex</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">realdom</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">regex</span><span>(</span><span class="z-string z-regexp">&#39;/[\w\-_]</span><span class="z-keyword z-operator">+</span><span class="z-string z-regexp z-constant z-character">(\.[\w\-\_]</span><span class="z-keyword z-operator">+</span><span class="z-string z-regexp">)</span><span class="z-keyword z-operator">*</span><span class="z-string z-regexp z-constant z-character">@\w\.(net|org)/&#39;</span><span>))</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">and</span><span>(</span><span class="z-variable">$email</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">sample</span><span>(</span><span class="z-variable">$regex</span><span>))</span></span> <span 
class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-variable">then</span></span> <span class="giallo-l"><span class="z-constant z-other"> …</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Here, we have generated a string from a regular expression. As a reminder, you might have seen this on this blog: <a href="https://mnt.io/articles/generate-strings-based-on-regular-expressions/">Generate strings based on regular expressions</a>.</p> <p>Fun fact: the <code>atoum/json-schema-extension</code> is tested with atoum, obviously, and… <code>atoum/praspel-extension</code>!</p> <h2 id="-4">Conclusion<a role="presentation" class="anchor" href="#-4" title="Anchor link to this header">#</a> </h2> <p>atoum has a bright future with exciting features! We sincerely hope this new direction will gather existing and new contributors 😄.</p> <p>❤️ open-source!</p> Hello fruux! 2014-11-24T00:00:00+00:00 2014-11-24T00:00:00+00:00 Unknown https://mnt.io/articles/hello-fruux/ <h2 id="leaving-the-research-world">Leaving the research world<a role="presentation" class="anchor" href="#leaving-the-research-world" title="Anchor link to this header">#</a> </h2> <p>I have really enjoyed my time at INRIA and Femto-ST, 2 research institutes in France. But after 8 years at the university and a hard PhD thesis (but with great results by the way!), I would like to see other things.</p> <p>My time as an intern at Mozilla and my work in the open-source world have been very <em>seductive</em>. Open-source contrasts a lot with the research world, where privacy and secrecy are first-class citizens of every project. 
All the work I have done and all the algorithms I have developed during my PhD thesis have been implemented under an open-source license, and I ran into some issues because of that decision (patents are sometimes better, sometimes not… long story).</p> <p>So, I like research but I also like to hack and share everything. And right now, I need a change of air! So I asked on Twitter:</p> <blockquote> <p>I (Ivan Enderlin, a fresh PhD, creator of Hoa) am looking for a job. Here is my CV: <a rel="noopener external" target="_blank" href="http://t.co/dAdLm35RUu">http://t.co/dAdLm35RUu</a>. Please, contact me! <a rel="noopener external" target="_blank" href="https://twitter.com/hashtag/hoajob?src=hash">#hoajob</a></p> <p>— Hoa project (@hoaproject) <a rel="noopener external" target="_blank" href="https://twitter.com/hoaproject/status/492382581271572480">July 24th, 2014</a></p> </blockquote> <p>And what a surprise! A <strong>lot</strong> of companies answered my tweet (most of them in private of course), but the most interesting one in my eyes was… fruux 😉.</p> <h2 id="fruux">fruux<a role="presentation" class="anchor" href="#fruux" title="Anchor link to this header">#</a> </h2> <p>fruux defines itself as: “A unified contacts/calendaring system that works across <a rel="noopener external" target="_blank" href="https://fruux.com/supported-devices/">platforms and devices</a>. We are behind <a rel="noopener external" target="_blank" href="https://fruux.com/opensource"><code>sabre/dav</code></a>, which is the most popular open-source implementation of the <a rel="noopener external" target="_blank" href="http://en.wikipedia.org/wiki/CardDAV">CardDAV</a> and <a rel="noopener external" target="_blank" href="http://en.wikipedia.org/wiki/CalDAV">CalDAV</a> standards. 
Besides us, developers and companies around the globe use our <code>sabre/dav</code> technology to deliver sync functionality to millions of users”.</p> <figure> <p><img src="https://mnt.io/articles/hello-fruux/./fruux-logo.png" alt="Fruux&#39;s logo" loading="lazy" decoding="async" /></p> <figcaption> fruux's logo. </figcaption> </figure> <p>Several things attract me to fruux:</p> <ol> <li>low layers are open-source,</li> <li>viable ecosystem based on open-source,</li> <li>accepts remote working,</li> <li>close timezone to mine,</li> <li>touching millions of people,</li> <li>standards in mind.</li> </ol> <p>The first point is the most important for me. I don&#39;t want to make a company richer without any benefit for the rest of the world. I want my work to be beneficial to the rest of the world; I want to share it; I want it to be reused, hacked, criticized, updated and shared again. This is the spirit of the open-source and hackability paradigms. And fortunately for me, fruux&#39;s low layers are 100% open-source, namely <code>sabre/dav</code> &amp; co.</p> <p>However, being able to eat at the end of the month with open-source is not that simple. Fortunately for me, fruux has a stable economic model, based on open-source. Obviously, I have to work on closed projects and for some specific customers, but I can go back to open-source goodness all the time 😉.</p> <p>In addition, I am currently living in Switzerland and fruux is located in Germany. Fortunately for me, fruux&#39;s team is spread all around Europe and the world. Naturally, they let me work remotely. Whilst it can be inconvenient for some people, I enjoy having my own hours, organizing myself as I like etc. Periodic meetings and phone calls help to stay focused. And I like to think that people are more productive this way. 
After 4 years of working from home on my Master&#39;s thesis and PhD thesis, I know how to organize myself and communicate with a decentralized team. This is a big advantage. Moreover, Germany is in the same timezone as Switzerland! Compared to companies located in, for instance, California, this is simpler for my family.</p> <p>Finally, working on an open-source project that is used by millions of users is very motivating. Knowing that my contributions will touch a lot of people gives meaning to my work on a daily basis. Also, the last thing I love about fruux is this desire to respect standards, RFCs, recommendations etc. They are involved in these processes, consortiums and groups (for instance <a rel="noopener external" target="_blank" href="http://calconnect.org/mbrlist.shtml">CalConnect</a>). I love standards and specifications, and this methodology reminds me of the scientific approach I had with my PhD thesis. I consider that a standard without an implementation has no meaning, and a well-designed standard is a piece of a delicious cake, especially when everyone respects this standard 😄.</p> <p>(… but the cake is a lie!)</p> <h2 id="sabre"><code>sabre/*</code><a role="presentation" class="anchor" href="#sabre" title="Anchor link to this header">#</a> </h2> <p>fruux hired me mostly because of my experience with <a rel="noopener external" target="_blank" href="http://hoa-project.net/">Hoa</a>. 
One of my main public jobs is to work on all the <code>sabre/*</code> libraries, which include:</p> <ul> <li><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-dav"><code>sabre/dav</code></a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-davclient"><code>sabre/davclient</code></a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-event"><code>sabre/event</code></a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-http"><code>sabre/http</code></a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-proxy"><code>sabre/proxy</code></a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-tzserver"><code>sabre/tzserver</code></a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-vobject"><code>sabre/vobject</code></a>,</li> <li><a rel="noopener external" target="_blank" href="https://github.com/fruux/sabre-xml"><code>sabre/xml</code></a>.</li> </ul> <p>You will find the documentation and the news on <a rel="noopener external" target="_blank" href="http://sabre.io/">sabre.io</a>.</p> <p>All these libraries serve the first one: <code>sabre/dav</code>, which is an implementation of the WebDAV technology, including the CalDAV extension for calendars and tasks, and the CardDAV extension for address books. For those who do not know what WebDAV is, in a few words: The Web is mostly a read-only medium, but WebDAV extends HTTP in order to be able to write and collaborate on documents. The way WebDAV is defined is fascinating, and even more so the way it can be extended.</p> <p>Most of the work has already been done by <a rel="noopener external" target="_blank" href="http://evertpot.com/">Evert</a> and many contributors, but we can go deeper! 
More extensions, more standards, better code, better algorithms etc.!</p> <p>If you are interested in the work I am doing on <code>sabre/*</code>, you can check this <a rel="noopener external" target="_blank" href="https://github.com/search?q=user%3Afruux+author%3Ahywan&amp;type=Issues">search result on GitHub</a>.</p> <h2 id="">Future of Hoa<a role="presentation" class="anchor" href="#" title="Anchor link to this header">#</a> </h2> <p>Some people have asked me about the future of Hoa: Whether I am going to stop it or not since I have a job now.</p> <p>Firstly, a PhD thesis is exhausting, and believe me, it requires more energy than a regular job, even if you are passionate about your job and you do not count your working hours. With a PhD thesis, you have no weekend, no holidays, you are always out of time, you always have a ton of articles and documents to read… there is no break, no end. Even in these conditions, I was able to maintain Hoa and to grow the project, thanks to a very helpful and present community!</p> <p>Secondly, fruux is planning to use Hoa. I don&#39;t know how or when, but if at a certain moment, using Hoa makes sense, they will. What does it imply for Hoa and me? It means that I will be paid to work on Hoa for a small percentage of my time. I don&#39;t know how much, it will depend on the period, but this is a big step forward for the project. Moreover, a project like fruux using Hoa is a great opportunity! I hope to see fruux&#39;s logo very soon on the homepage of Hoa&#39;s website.</p> <p>Thus, to conclude, I will have more time (on evenings, weekends, holidays and sometimes during the day) to work on Hoa. 
Do not be afraid, the future is bright 😄.</p> <h2 id="-1">Conclusion<a role="presentation" class="anchor" href="#-1" title="Anchor link to this header">#</a> </h2> <p><em>Bref</em>, I am working at fruux!</p> Generate strings based on regular expressions 2014-09-30T00:00:00+00:00 2014-09-30T00:00:00+00:00 Unknown https://mnt.io/articles/generate-strings-based-on-regular-expressions/ <p>During my PhD thesis, I partly worked on the problem of automatically generating accurate test data. In order to be complete and self-contained, I addressed all kinds of data types, including strings. This article aims at showing how to generate accurate and relevant strings under several constraints.</p> <h2 id="what-is-a-regular-expression">What is a regular expression?<a role="presentation" class="anchor" href="#what-is-a-regular-expression" title="Anchor link to this header">#</a> </h2> <p>We are talking about formal language theory here. In the known world, there are four kinds of languages. More formally, in 1956, the <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Chomsky_hierarchy">Chomsky hierarchy</a> was formulated, classifying grammars (which define languages) in four levels:</p> <ol> <li>unrestricted grammars, matching languages known as Turing languages, with no restriction,</li> <li>context-sensitive grammars, matching contextual languages,</li> <li>context-free grammars, matching algebraic languages, recognized by pushdown (stack) automata,</li> <li>regular grammars, matching regular languages.</li> </ol> <p>Each level includes the next level. The last level is the “weakest”, which must not sound negative here. 
<a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Regular_expression">Regular expressions</a> are used often because of their simplicity and also because they solve most problems we encounter daily.</p> <p>A regular expression is a small language with very few operators and, most of the time, a simple semantics. For instance <code>ab(c|d)</code> means: a word (a piece of data) starting with <code>ab</code> and followed by <code>c</code> or <code>d</code>. We also have quantification operators (also known as repetition operators), such as <code>?</code>, <code>*</code> and <code>+</code>. We also have <code>{x,y}</code> to define a repetition between <code>x</code> and <code>y</code> times. Thus, <code>?</code> is equivalent to <code>{0,1}</code>, <code>*</code> to <code>{0,}</code> and <code>+</code> to <code>{1,}</code>. When <code>y</code> is missing, it means +∞, so unbounded (or more exactly, bounded by the limits of the machine). So, for instance <code>ab(c|d){2,4}e?</code> means: a word starting with <code>ab</code>, followed 2, 3 or 4 times by <code>c</code> or <code>d</code> (so <code>cc</code>, <code>cd</code>, <code>dc</code>, <code>ccc</code>, <code>ccd</code>, <code>cdc</code> and so on) and potentially followed by <code>e</code>.</p> <p>The goal here is not to teach you regular expressions but this is kind of a tiny reminder. There are plenty of regular expression languages. You might know <a rel="noopener external" target="_blank" href="http://www.unix.com/man-page/Linux/7/regex/">POSIX regular expression</a> or <a rel="noopener external" target="_blank" href="http://pcre.org/">Perl Compatible Regular Expressions (PCRE)</a>. Forget the first one, please. Its syntax and semantics are too limited. PCRE is the regular language I recommend all the time.</p> <p>Behind every formal language there is a graph. 
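</p> <p>As a rough illustration of such a graph, here is a minimal hand-written sketch of an automaton recognizing <code>ab(c|d){2,4}e?</code>. The <code>matches</code> function, the state numbering and the transition table are assumptions made up for this example; this is not how PCRE or Hoa represent an expression internally:</p> <pre class="giallo z-code"><code data-lang="php">&lt;?php

// Hypothetical sketch: a hand-written automaton for ab(c|d){2,4}e?.
// States: 0 = start, 1 = after a, 2 = after ab,
// 3 to 6 = after one to four c or d, 7 = after the optional e.
function matches(string $word): bool
{
    // Each state maps an input character to the next state.
    $transitions = [
        0 =&gt; [&#39;a&#39; =&gt; 1],
        1 =&gt; [&#39;b&#39; =&gt; 2],
        2 =&gt; [&#39;c&#39; =&gt; 3, &#39;d&#39; =&gt; 3],
        3 =&gt; [&#39;c&#39; =&gt; 4, &#39;d&#39; =&gt; 4],
        4 =&gt; [&#39;c&#39; =&gt; 5, &#39;d&#39; =&gt; 5, &#39;e&#39; =&gt; 7],
        5 =&gt; [&#39;c&#39; =&gt; 6, &#39;d&#39; =&gt; 6, &#39;e&#39; =&gt; 7],
        6 =&gt; [&#39;e&#39; =&gt; 7],
        7 =&gt; [],
    ];

    // A word is accepted after 2, 3 or 4 repetitions, with or without e.
    $accepting = [4, 5, 6, 7];

    $state = 0;

    for ($i = 0, $length = strlen($word); $i &lt; $length; ++$i) {
        $char = $word[$i];

        // No edge for this character: the word is rejected.
        if (!isset($transitions[$state][$char])) {
            return false;
        }

        $state = $transitions[$state][$char];
    }

    return in_array($state, $accepting, true);
}

var_dump(matches(&#39;abcd&#39;));   // bool(true)
var_dump(matches(&#39;abc&#39;));    // bool(false), only one repetition
var_dump(matches(&#39;abdcde&#39;)); // bool(true)</code></pre> <p>Each key of the table is a node of the graph and each entry is a labelled edge; a word is accepted when reading it ends on an accepting node.</p> <p>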
A regular expression is compiled into a <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/">Finite State Machine (FSM)</a>. I am not going to draw and explain them in detail, but it is interesting to know that behind a regular expression there is a basic automaton. No magic.</p> <h3 id="why-focussing-regular-expressions">Why focus on regular expressions?<a role="presentation" class="anchor" href="#why-focussing-regular-expressions" title="Anchor link to this header">#</a> </h3> <p>This article focuses on regular languages instead of other kinds of languages because we use them very often (even daily). I am going to address context-free languages in another article, be patient young padawan. The needs and constraints with other kinds of languages are not the same and more complex algorithms must be involved. So we are going easy for the first step.</p> <h2 id="understanding-pcre-lex-and-parse-them">Understanding PCRE: lex and parse them<a role="presentation" class="anchor" href="#understanding-pcre-lex-and-parse-them" title="Anchor link to this header">#</a> </h2> <p>The <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Compiler"><code>Hoa\Compiler</code> library</a> provides both LL(1) and LL(k) compiler-compilers. The <a rel="noopener external" target="_blank" href="http://hoa-project.net/Literature/Hack/Compiler.html">documentation</a> describes how to use it. We discover that the LL(k) compiler comes with a grammar description language called PP. What does it mean? It means for instance that the grammar of PCRE can be written with the PP language and that <code>Hoa\Compiler\Llk</code> will transform this grammar into a compiler. 
That's why we call them “compiler of compilers”.</p> <p>Fortunately, the <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Regex"><code>Hoa\Regex</code> library</a> provides the grammar of the PCRE language in the <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Regex/blob/master/Source/Grammar.pp"><code>hoa://Library/Regex/Grammar.pp</code></a> file. Consequently, we are able to analyze regular expressions written in the PCRE language! Let's try in a shell at first with the <code>hoa compiler:pp</code> tool:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-string">&#39;ab(c|d){2,4}e?&#39;</span><span class="z-keyword z-operator"> |</span><span class="z-entity z-name"> hoa</span><span class="z-string"> compiler:pp hoa://Library/Regex/Grammar.pp</span><span class="z-constant z-numeric"> 0</span><span class="z-constant z-other"> --visitor</span><span class="z-string"> dump</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span> #expression</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt;</span><span class="z-comment"> #concatenation</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt;</span><span> token(</span><span class="z-entity z-name">literal,</span><span class="z-string"> a</span><span>)</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt;</span><span> token(</span><span class="z-entity z-name">literal,</span><span class="z-string"> b</span><span>)</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt;</span><span class="z-comment"> 
#quantification</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt; &gt;</span><span class="z-comment"> #alternation</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt; &gt; &gt;</span><span> token(</span><span class="z-entity z-name">literal,</span><span class="z-string"> c</span><span>)</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt; &gt; &gt;</span><span> token(</span><span class="z-entity z-name">literal,</span><span class="z-string"> d</span><span>)</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt; &gt;</span><span> token(</span><span class="z-entity z-name">n_to_m,</span><span class="z-string"> {2,4}</span><span>)</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt;</span><span class="z-comment"> #quantification</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt; &gt;</span><span> token(</span><span class="z-entity z-name">literal,</span><span class="z-string"> e</span><span>)</span></span> <span class="giallo-l"><span class="z-punctuation z-separator">&gt;</span><span class="z-keyword z-operator"> &gt; &gt; &gt;</span><span> token(</span><span class="z-entity z-name">zero_or_one,</span><span class="z-string"> ?</span><span>)</span></span></code></pre> <p>We read that the whole expression is composed of a single concatenation of two tokens: <code>a</code> and <code>b</code>, followed by a quantification, followed by another quantification. The first quantification is an alternation of (a choice between) two tokens, <code>c</code> and <code>d</code>, repeated 2 to 4 times. 
The second quantification is the <code>e</code> token that can appear zero or one time. Pretty simple.</p> <p>The final output of the <code>Hoa\Compiler\Llk\Parser</code> class is an <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree (AST)</a>. The documentation of <code>Hoa\Compiler</code> explains all of this; you should read it. The LL(k) compiler is split into very distinct layers in order to improve hackability. Again, the documentation teaches us that we have <a rel="noopener external" target="_blank" href="http://hoa-project.net/Literature/Hack/Compiler.html#Compilation_process">four levels in the compilation process</a>: lexical analyzer, syntactic analyzer, trace and AST. The lexical analyzer (also known as lexer) transforms the textual data being analyzed into a sequence of tokens (formally known as lexemes). It checks whether the data is composed of the right pieces. Then, the syntactic analyzer (also known as parser) checks that the order of tokens in this sequence is correct (formally we say that it derives the sequence, see the <a rel="noopener external" target="_blank" href="http://hoa-project.net/Literature/Hack/Compiler.html#Matching_words">Matching words section</a> to learn more).</p> <p>Still in the shell, we can get the result of the lexical analyzer by using the <code>--token-sequence</code> option; thus:</p> <pre class="giallo z-code"><code data-lang="shellsession"><span class="giallo-l"><span class="z-punctuation z-separator">$</span><span> echo </span><span class="z-string">&#39;ab(c|d){2,4}e?&#39;</span><span class="z-keyword z-operator"> |</span><span class="z-entity z-name"> hoa</span><span class="z-string"> compiler:pp hoa://Library/Regex/Grammar.pp</span><span class="z-constant z-numeric"> 0</span><span class="z-constant z-other"> --token-sequence</span></span> <span class="giallo-l"><span> # … token name token value offset</span></span> <span 
class="giallo-l"><span>-----------------------------------------</span></span> <span class="giallo-l"><span> 0 … literal a 0</span></span> <span class="giallo-l"><span> 1 … literal b 1</span></span> <span class="giallo-l"><span> 2 … capturing_ ( 2</span></span> <span class="giallo-l"><span> 3 … literal c 3</span></span> <span class="giallo-l"><span> 4 … alternation | 4</span></span> <span class="giallo-l"><span> 5 … literal d 5</span></span> <span class="giallo-l"><span> 6 … _capturing ) 6</span></span> <span class="giallo-l"><span> 7 … n_to_m {2,4} 7</span></span> <span class="giallo-l"><span> 8 … literal e 12</span></span> <span class="giallo-l"><span> 9 … zero_or_one ? 13</span></span> <span class="giallo-l"><span> 10 … EOF 15</span></span></code></pre> <p>This is the sequence of tokens produced by the lexical analyzer. The tree is not yet built because this is the first step of the compilation process. However, it is always interesting to understand these different steps and see how they work.</p> <p>Now we are able to analyze any regular expression in the PCRE format! The result of this analysis is a tree. You know what is fun with trees? <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Visitor_pattern">Visiting them</a>.</p> <h2 id="visiting-the-ast">Visiting the AST<a role="presentation" class="anchor" href="#visiting-the-ast" title="Anchor link to this header">#</a> </h2> <p>Unsurprisingly, each node of the AST can be visited thanks to the <a rel="noopener external" target="_blank" href="http://github.com/hoaproject/Visitor"><code>Hoa\Visitor</code> library</a>. 
Here is an example with the “dump” visitor:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Compiler</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">File</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// 1. Load grammar.</span></span> <span class="giallo-l"><span class="z-variable">$compiler</span><span class="z-keyword z-operator"> =</span><span> Compiler</span><span class="z-punctuation z-separator">\</span><span>Llk</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Llk</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">load</span><span>(</span></span> <span class="giallo-l"><span class="z-keyword"> new</span><span> File</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Read</span><span>(</span><span class="z-string">&#39;hoa://Library/Regex/Grammar.pp&#39;</span><span>)</span></span> <span class="giallo-l"><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// 2. 
Parse some data.</span></span> <span class="giallo-l"><span class="z-variable">$ast</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $compiler</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">parse</span><span>(</span><span class="z-string">&#39;ab(c|d){2,4}e?&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// 3. Dump the AST.</span></span> <span class="giallo-l"><span class="z-variable">$dump</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span> Compiler</span><span class="z-punctuation z-separator">\</span><span>Visitor</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Dump</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-variable"> $dump</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">visit</span><span>(</span><span class="z-variable">$ast</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>This program will print the same AST dump we have previously seen in the shell.</p> <p>How do we write our own visitor? A visitor is a class with a single <code>visit</code> method. Let's try a visitor that pretty-prints a regular expression, i.e. 
transform:</p> <pre class="giallo z-code"><code data-lang="plain"><span class="giallo-l"><span>ab(c|d){2,4}e?</span></span></code></pre> <p>into:</p> <pre class="giallo z-code"><code data-lang="plain"><span class="giallo-l"><span>a</span></span> <span class="giallo-l"><span>b</span></span> <span class="giallo-l"><span>(</span></span> <span class="giallo-l"><span> c</span></span> <span class="giallo-l"><span> |</span></span> <span class="giallo-l"><span> d</span></span> <span class="giallo-l"><span>){2,4}</span></span> <span class="giallo-l"><span>e?</span></span></code></pre> <p>Why a pretty printer? First, it shows how to visit a tree. Second, it shows the structure of the visitor: we filter by node ID (<code>#expression</code>, <code>#quantification</code>, <code>token</code> etc.) and we apply the respective computations. A pretty printer is often a good way to become familiar with the structure of an AST.</p> <p>Here is the class. It only handles the constructions needed for the given example:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Visitor</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> PrettyPrinter</span><span class="z-storage"> implements</span><span> Visitor</span><span class="z-punctuation z-separator">\</span><span class="z-entity z-other z-inherited-class">Visit</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> visit</span><span>(</span></span> <span class="giallo-l"><span> 
Visitor</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Element</span><span class="z-variable"> $element</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-storage"> &amp;</span><span class="z-variable">$handle</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-language"> null</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-variable"> $eldnah</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-language"> null</span></span> <span class="giallo-l"><span> ) {</span></span> <span class="giallo-l"><span class="z-storage"> static</span><span class="z-variable"> $_indent</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-language"> null</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> $nodeId</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getId</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> switch</span><span>(</span><span class="z-variable">$nodeId</span><span>) {</span></span> <span class="giallo-l"><span class="z-comment"> // Reset indentation and…</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;#expression&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-variable"> $_indent</span><span class="z-keyword z-operator"> 
=</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // … visit all the children.</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;#quantification&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> foreach</span><span>(</span><span class="z-variable">$element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChildren</span><span>()</span><span class="z-keyword z-operator"> as</span><span class="z-variable"> $child</span><span>)</span></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> .=</span><span class="z-variable"> $child</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">accept</span><span>(</span><span class="z-variable z-language">$this</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $handle</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $eldnah</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // One new line between each children of the concatenation.</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;#concatenation&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> foreach</span><span>(</span><span class="z-variable">$element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChildren</span><span>()</span><span 
class="z-keyword z-operator"> as</span><span class="z-variable"> $child</span><span>)</span></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> .=</span><span class="z-variable"> $child</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">accept</span><span>(</span><span class="z-variable z-language">$this</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $handle</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $eldnah</span><span>)</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Add parenthesis and increase indentation.</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;#alternation&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-variable"> $oout</span><span class="z-keyword z-operator"> =</span><span class="z-punctuation z-section"> []</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable"> $pIndent</span><span class="z-keyword z-operator"> =</span><span class="z-support z-function"> str_repeat</span><span>(</span><span class="z-string">&#39; &#39;</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $_indent</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> ++</span><span class="z-variable">$_indent</span><span class="z-punctuation 
z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> $cIndent</span><span class="z-keyword z-operator"> =</span><span class="z-support z-function"> str_repeat</span><span>(</span><span class="z-string">&#39; &#39;</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $_indent</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> foreach</span><span>(</span><span class="z-variable">$element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChildren</span><span>()</span><span class="z-keyword z-operator"> as</span><span class="z-variable"> $child</span><span>)</span></span> <span class="giallo-l"><span class="z-variable"> $oout</span><span class="z-punctuation z-section">[]</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $cIndent</span><span class="z-keyword z-operator"> .</span><span class="z-variable"> $child</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">accept</span><span>(</span><span class="z-variable z-language">$this</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $handle</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $eldnah</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword z-operator"> --</span><span class="z-variable">$_indent</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> .=</span><span class="z-variable"> $pIndent</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &#39;(&#39;</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &quot;</span><span class="z-constant 
z-character">\n</span><span class="z-string">&quot;</span><span class="z-keyword z-operator"> .</span></span> <span class="giallo-l"><span class="z-support z-function"> implode</span><span>(</span><span class="z-string">&quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-keyword z-operator"> .</span><span class="z-variable"> $cIndent</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &#39;|&#39;</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $oout</span><span>)</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-keyword z-operator"> .</span></span> <span class="giallo-l"><span class="z-variable"> $pIndent</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &#39;)&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Print token value verbatim.</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;token&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-variable"> $tokenId</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getValueToken</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> $tokenValue</span><span class="z-keyword 
z-operator"> =</span><span class="z-variable"> $element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getValueValue</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> switch</span><span>(</span><span class="z-variable">$tokenId</span><span>) {</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;literal&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;n_to_m&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;zero_or_one&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> .=</span><span class="z-variable"> $tokenValue</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> default</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> throw new</span><span class="z-support z-class"> RuntimeException</span><span>(</span></span> <span class="giallo-l"><span class="z-string"> &#39;Token ID &#39;</span><span class="z-keyword z-operator"> .</span><span class="z-variable"> $tokenId</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &#39; is not well-handled.&#39;</span></span> <span class="giallo-l"><span> )</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span 
class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> default</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-keyword"> throw new</span><span class="z-support z-class"> RuntimeException</span><span>(</span></span> <span class="giallo-l"><span class="z-string"> &#39;Node ID &#39;</span><span class="z-keyword z-operator"> .</span><span class="z-variable"> $nodeId</span><span class="z-keyword z-operator"> .</span><span class="z-string"> &#39; is not well-handled.&#39;</span></span> <span class="giallo-l"><span> )</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $out</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>And finally, we apply the pretty printer on the AST like previously seen:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$compiler</span><span class="z-keyword z-operator"> =</span><span> Compiler</span><span class="z-punctuation z-separator">\</span><span>Llk</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Llk</span><span class="z-keyword z-operator">::</span><span class="z-entity z-name z-function">load</span><span>(</span></span> <span class="giallo-l"><span class="z-keyword"> new</span><span> File</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Read</span><span>(</span><span 
class="z-string">&#39;hoa://Library/Regex/Grammar.pp&#39;</span><span>)</span></span> <span class="giallo-l"><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$ast</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $compiler</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">parse</span><span>(</span><span class="z-string">&#39;ab(c|d){2,4}e?&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$prettyprint</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-support z-class"> PrettyPrinter</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-variable"> $prettyprint</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">visit</span><span>(</span><span class="z-variable">$ast</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p><em>Et voilà !</em></p> <p>Now, put all that stuff together!</p> <h2 id="isotropic-generation">Isotropic generation<a role="presentation" class="anchor" href="#isotropic-generation" title="Anchor link to this header">#</a> </h2> <p>We can use <code>Hoa\Regex</code> and <code>Hoa\Compiler</code> to get the AST of any regular expressions written in the PCRE format. We can use <code>Hoa\Visitor</code> to traverse the AST and apply computations according to the type of nodes. Our goal is to generate strings based on regular expressions. What kind of generation are we going to use? There are plenty of them: uniform random, smallest, coverage based…</p> <p>The simplest is isotropic generation, also known as random generation. But random says nothing: what is the repartition, or do we have any uniformity? 
Isotropic means each choice is resolved randomly and uniformly. Uniformity has to be defined: does it cover the whole set of nodes or just the immediate children of the node? Isotropic means we consider only the immediate children. For instance, if an <code>#alternation</code> node has <em>c</em> immediate children, the probability of choosing one given child <em>C</em> is:</p> <math xmlns="http://www.w3.org/1998/Math/MathML"> <semantics> <mrow> <mi>P</mi> <mo stretchy="false">(</mo><mi>C</mi><mo stretchy="false">)</mo> <mo>=</mo> <mfrac> <mn>1</mn> <mi>c</mi> </mfrac> </mrow> <annotation encoding="application/x-tex">P(C) = \frac{1}{c}</annotation> </semantics> </math> <p>Yes, simple as that!</p> <p>We can use the <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Math"><code>Hoa\Math</code> library</a>, which provides the <code>Hoa\Math\Sampler\Random</code> class to sample uniform random integers and floats. Ready?</p> <h3 id="structure-of-the-visitor">Structure of the visitor<a role="presentation" class="anchor" href="#structure-of-the-visitor" title="Anchor link to this header">#</a> </h3> <p>The structure of the visitor is the following:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Visitor</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword">use</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Math</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">class</span><span class="z-entity z-name"> IsotropicSampler</span><span class="z-storage"> 
implements</span><span> Visitor</span><span class="z-punctuation z-separator">\</span><span class="z-entity z-other z-inherited-class">Visit</span><span> {</span></span> <span class="giallo-l"><span class="z-storage"> protected</span><span class="z-variable"> $_sampler</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-language"> null</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-support z-function"> __construct</span><span>(Math</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Sampler</span><span class="z-variable"> $sampler</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">_sampler</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $sampler</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage"> public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> visit</span><span>(</span></span> <span class="giallo-l"><span> Visitor</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Element</span><span class="z-variable"> $element</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-storage"> &amp;</span><span class="z-variable">$handle</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-language"> null</span><span class="z-punctuation 
z-separator">,</span></span> <span class="giallo-l"><span class="z-variable"> $eldnah</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-language"> null</span></span> <span class="giallo-l"><span> ) {</span></span> <span class="giallo-l"><span class="z-keyword"> switch</span><span>(</span><span class="z-variable">$element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getId</span><span>()) {</span></span> <span class="giallo-l"><span class="z-comment"> // …</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>We set a sampler and we start visiting and filtering nodes by their node ID. The following code will generate a string based on the regular expression contained in the <code>$expression</code> variable:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$expression</span><span class="z-keyword z-operator"> =</span><span class="z-string"> &#39;…&#39;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$ast</span><span class="z-keyword z-operator"> =</span><span class="z-variable"> $compiler</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">parse</span><span>(</span><span class="z-variable">$expression</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable">$generator</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-support z-class"> IsotropicSampler</span><span>(</span><span class="z-keyword">new</span><span> Math</span><span class="z-punctuation 
z-separator">\</span><span>Sampler</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Random</span><span>())</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-variable"> $generator</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">visit</span><span>(</span><span class="z-variable">$ast</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>We are going to change the value of <code>$expression</code> step by step until we reach <code>ab(c|d){2,4}e?</code>.</p> <h3 id="case-of-expression">Case of <code>#expression</code><a role="presentation" class="anchor" href="#case-of-expression" title="Anchor link to this header">#</a> </h3> <p>A node of type <code>#expression</code> has only one child. Thus, we simply return the result of visiting that child:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">case</span><span class="z-string"> &#39;#expression&#39;</span><span class="z-keyword z-operator">:</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChild</span><span>(</span><span class="z-constant z-numeric">0</span><span>)</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">accept</span><span>(</span><span class="z-variable z-language">$this</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $handle</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $eldnah</span><span>)</span><span class="z-punctuation 
z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span></code></pre><h3 id="case-of-token">Case of <code>token</code><a role="presentation" class="anchor" href="#case-of-token" title="Anchor link to this header">#</a> </h3> <p>We consider only one type of token for now: <code>literal</code>. A literal can contain an escaped character, can be a single character or can be <code>.</code> (which matches any character). We consider only a single character for this example (spoiler: the whole visitor already exists). Thus:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">case</span><span class="z-string"> &#39;token&#39;</span><span class="z-keyword z-operator">:</span></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getValueValue</span><span>()</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>Here, with <code>$expression = 'a';</code> we get the string <code>a</code>.</p> <h3 id="case-of-concatenation">Case of <code>#concatenation</code><a role="presentation" class="anchor" href="#case-of-concatenation" title="Anchor link to this header">#</a> </h3> <p>A concatenation is just the computation of all children, joined into a single string. 
Thus:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">case</span><span class="z-string"> &#39;#concatenation&#39;</span><span class="z-keyword z-operator">:</span></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-language"> null</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> foreach</span><span>(</span><span class="z-variable">$element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChildren</span><span>()</span><span class="z-keyword z-operator"> as</span><span class="z-variable"> $child</span><span>)</span></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> .=</span><span class="z-variable"> $child</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">accept</span><span>(</span><span class="z-variable z-language">$this</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $handle</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $eldnah</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $out</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>At this step, with <code>$expression = 'ab';</code> we get the string <code>ab</code>. 
Totally crazy.</p> <h3 id="case-of-alternation">Case of <code>#alternation</code><a role="presentation" class="anchor" href="#case-of-alternation" title="Anchor link to this header">#</a> </h3> <p>An alternation is a choice between several children. All we have to do is to select a child based on the probability given above. The number of children for the current node can be known thanks to the <code>getChildrenNumber</code> method. We are also going to use the sampler of integers. Thus:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">case</span><span class="z-string"> &#39;#alternation&#39;</span><span class="z-keyword z-operator">:</span></span> <span class="giallo-l"><span class="z-variable"> $childIndex</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">_sampler</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getInteger</span><span>(</span></span> <span class="giallo-l"><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-variable"> $element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChildrenNumber</span><span>()</span><span class="z-keyword z-operator"> -</span><span class="z-constant z-numeric"> 1</span></span> <span class="giallo-l"><span> )</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChild</span><span>(</span><span 
class="z-variable">$childIndex</span><span>)</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">accept</span><span>(</span><span class="z-variable z-language">$this</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $handle</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $eldnah</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>Now, with <code>$expression = 'ab(c|d)';</code> we get the strings <code>abc</code> or <code>abd</code> at random. Try several times to see for yourself.</p> <h3 id="case-of-quantification">Case of <code>#quantification</code><a role="presentation" class="anchor" href="#case-of-quantification" title="Anchor link to this header">#</a> </h3> <p>A quantification is an alternation of concatenations. Indeed, <code>e{2,4}</code> is strictly equivalent to <code>ee|eee|eeee</code>. We have only two quantifications in our example: <code>?</code> and <code>{x,y}</code>. We are going to find the values of <code>x</code> and <code>y</code>, and then choose a number of repetitions at random between these bounds. 
Let's go:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">case</span><span class="z-string"> &#39;#quantification&#39;</span><span class="z-keyword z-operator">:</span></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-language"> null</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> $x</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> $y</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Filter the type of quantification.</span></span> <span class="giallo-l"><span class="z-keyword"> switch</span><span>(</span><span class="z-variable">$element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChild</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getValueToken</span><span>()) {</span></span> <span class="giallo-l"><span class="z-comment"> // ?</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;zero_or_one&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-variable"> $y</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 1</span><span class="z-punctuation z-terminator">;</span></span> <span 
class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // {x,y}</span></span> <span class="giallo-l"><span class="z-keyword"> case</span><span class="z-string"> &#39;n_to_m&#39;</span><span class="z-punctuation z-terminator">:</span></span> <span class="giallo-l"><span class="z-variable"> $xy</span><span class="z-keyword z-operator"> =</span><span class="z-support z-function"> explode</span><span>(</span></span> <span class="giallo-l"><span class="z-string"> &#39;,&#39;</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-support z-function"> trim</span><span>(</span><span class="z-variable">$element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChild</span><span>(</span><span class="z-constant z-numeric">1</span><span>)</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getValueValue</span><span>()</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &#39;{}&#39;</span><span>)</span></span> <span class="giallo-l"><span> )</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> $x</span><span class="z-keyword z-operator"> =</span><span> (</span><span class="z-storage">int</span><span>)</span><span class="z-support z-function"> trim</span><span>(</span><span class="z-variable">$xy</span><span class="z-punctuation z-section">[</span><span class="z-constant z-numeric">0</span><span class="z-punctuation z-section">]</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-variable"> $y</span><span class="z-keyword z-operator"> =</span><span> (</span><span class="z-storage">int</span><span>)</span><span class="z-support z-function"> trim</span><span>(</span><span 
class="z-variable">$xy</span><span class="z-punctuation z-section">[</span><span class="z-constant z-numeric">1</span><span class="z-punctuation z-section">]</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Choose the number of repetitions.</span></span> <span class="giallo-l"><span class="z-variable"> $max</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">_sampler</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getInteger</span><span>(</span><span class="z-variable">$x</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $y</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment"> // Concatenate.</span></span> <span class="giallo-l"><span class="z-keyword"> for</span><span>(</span><span class="z-variable">$i</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span><span class="z-variable"> $i</span><span class="z-keyword z-operator"> &lt;</span><span class="z-variable"> $max</span><span class="z-punctuation z-terminator">;</span><span class="z-keyword z-operator"> ++</span><span class="z-variable">$i</span><span>) {</span></span> <span class="giallo-l"><span class="z-variable"> $out</span><span class="z-keyword z-operator"> .=</span><span class="z-variable"> $element</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">getChild</span><span>(</span><span class="z-constant 
z-numeric">0</span><span>)</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">accept</span><span>(</span><span class="z-variable z-language">$this</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $handle</span><span class="z-punctuation z-separator">,</span><span class="z-variable"> $eldnah</span><span>)</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword"> return</span><span class="z-variable"> $out</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-keyword"> break</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>Finally, with <code>$expression = 'ab(c|d){2,4}e?';</code> we can get strings such as <code>abdcce</code>, <code>abdc</code>, <code>abddcd</code>, <code>abcde</code>, etc. Nice, isn't it? 
Want more?</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">for</span><span>(</span><span class="z-variable">$i</span><span class="z-keyword z-operator"> =</span><span class="z-constant z-numeric"> 0</span><span class="z-punctuation z-terminator">;</span><span class="z-variable"> $i</span><span class="z-keyword z-operator"> &lt;</span><span class="z-constant z-numeric"> 42</span><span class="z-punctuation z-terminator">;</span><span class="z-keyword z-operator"> ++</span><span class="z-variable">$i</span><span>) {</span></span> <span class="giallo-l"><span class="z-support z-function"> echo</span><span class="z-variable"> $generator</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">visit</span><span>(</span><span class="z-variable">$ast</span><span>)</span><span class="z-punctuation z-separator">,</span><span class="z-string"> &quot;</span><span class="z-constant z-character">\n</span><span class="z-string">&quot;</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span>}</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">/**</span></span> <span class="giallo-l"><span class="z-comment"> * Could output:</span></span> <span class="giallo-l"><span class="z-comment"> * abdce</span></span> <span class="giallo-l"><span class="z-comment"> * abdcc</span></span> <span class="giallo-l"><span class="z-comment"> * abcdde</span></span> <span class="giallo-l"><span class="z-comment"> * abcdcd</span></span> <span class="giallo-l"><span class="z-comment"> * abcde</span></span> <span class="giallo-l"><span class="z-comment"> * abcc</span></span> <span class="giallo-l"><span class="z-comment"> * abddcde</span></span> <span class="giallo-l"><span class="z-comment"> * 
abddcce</span></span> <span class="giallo-l"><span class="z-comment"> * abcde</span></span> <span class="giallo-l"><span class="z-comment"> * abcc</span></span> <span class="giallo-l"><span class="z-comment"> * abdcce</span></span> <span class="giallo-l"><span class="z-comment"> * abcde</span></span> <span class="giallo-l"><span class="z-comment"> * abdce</span></span> <span class="giallo-l"><span class="z-comment"> * abdd</span></span> <span class="giallo-l"><span class="z-comment"> * abcdce</span></span> <span class="giallo-l"><span class="z-comment"> * abccd</span></span> <span class="giallo-l"><span class="z-comment"> * abdcdd</span></span> <span class="giallo-l"><span class="z-comment"> * abcdcce</span></span> <span class="giallo-l"><span class="z-comment"> * abcce</span></span> <span class="giallo-l"><span class="z-comment"> * abddc</span></span> <span class="giallo-l"><span class="z-comment"> */</span></span></code></pre><h2 id="performance">Performance<a role="presentation" class="anchor" href="#performance" title="Anchor link to this header">#</a> </h2> <p>It is difficult to give numbers because they depend on a lot of parameters: your machine configuration, the PHP VM, whether other programs are running, etc. But I have generated 1 million strings in less than 25 seconds on my machine (an old MacBook Pro), which is pretty reasonable.</p> <h2 id="conclusion-and-surprise">Conclusion and surprise<a role="presentation" class="anchor" href="#conclusion-and-surprise" title="Anchor link to this header">#</a> </h2> <p>So, yes, now we know how to generate strings based on regular expressions! Supporting the whole PCRE format is difficult. That's why the <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Regex"><code>Hoa\Regex</code> library</a> provides the <code>Hoa\Regex\Visitor\Isotropic</code> class, which is a more advanced visitor. 
The latter supports classes, negative classes, ranges, all quantifications, all kinds of literals (characters, escaped characters, and types of characters such as <code>\w</code>, <code>\d</code>, <code>\h</code>…), etc. Consequently, all you have to do is:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-keyword">use</span><span> Hoa</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Regex</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-comment">// …</span></span> <span class="giallo-l"><span class="z-variable">$generator</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span> Regex</span><span class="z-punctuation z-separator">\</span><span>Visitor</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Isotropic</span><span>(</span><span class="z-keyword">new</span><span> Math</span><span class="z-punctuation z-separator">\</span><span>Sampler</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Random</span><span>())</span><span class="z-punctuation z-terminator">;</span></span> <span class="giallo-l"><span class="z-support z-function">echo</span><span class="z-variable"> $generator</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">visit</span><span>(</span><span class="z-variable">$ast</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>This algorithm is used in <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Praspel">Praspel</a>, a specification language I designed during my PhD thesis. 
More specifically, this algorithm is used inside realistic domains. I am not going to explain realistic domains today, but they allow me to introduce the “surprise”.</p> <h3 id="generate-strings-based-on-regular-expressions-in-atoum">Generate strings based on regular expressions in atoum<a role="presentation" class="anchor" href="#generate-strings-based-on-regular-expressions-in-atoum" title="Anchor link to this header">#</a> </h3> <p><a rel="noopener external" target="_blank" href="http://atoum.org/">atoum</a> is an awesome unit test framework. The <a rel="noopener external" target="_blank" href="https://github.com/hoaproject/Contributions-Atoum-PraspelExtension"><code>Atoum\PraspelExtension</code> extension</a> lets you use Praspel, and therefore realistic domains, inside atoum. Realistic domains are designed both to validate <strong>and</strong> to generate data. Obviously, we can use the <code>Regex</code> realistic domain. This extension provides several features, including <code>sample</code>, <code>sampleMany</code> and <code>predicate</code>, to respectively generate one datum, generate many data, and validate a datum based on a realistic domain. 
To declare a regular expression, we must write:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$regex</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">realdom</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">regex</span><span>(</span><span class="z-string z-regexp">&#39;/ab(c|d){2,4}e?/&#39;</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>And to generate a datum, all we have to do is:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-variable">$datum</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">sample</span><span>(</span><span class="z-variable">$regex</span><span>)</span><span class="z-punctuation z-terminator">;</span></span></code></pre> <p>For instance, imagine you are writing a test called <code>test_mail</code> and you need an email address:</p> <pre class="giallo z-code"><code data-lang="php"><span class="giallo-l"><span class="z-keyword z-operator">&lt;?</span><span class="z-constant z-other">php</span></span> <span class="giallo-l"></span> <span class="giallo-l"><span class="z-storage">public</span><span class="z-storage z-type z-function"> function</span><span class="z-entity z-name z-function"> test_mail</span><span>() {</span></span> <span class="giallo-l"><span class="z-variable z-language"> $this</span></span> 
<span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">given</span><span>(</span></span> <span class="giallo-l"><span class="z-variable"> $regex</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-variable">realdom</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">regex</span><span>(</span><span class="z-string z-regexp">&#39;/[\w\-_]</span><span class="z-keyword z-operator">+</span><span class="z-string z-regexp z-constant z-character">(\.[\w\-\_]</span><span class="z-keyword z-operator">+</span><span class="z-string z-regexp">)</span><span class="z-keyword z-operator">*</span><span class="z-string z-regexp z-constant z-character">@\w\.(net|org)/&#39;</span><span>)</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-variable"> $address</span><span class="z-keyword z-operator"> =</span><span class="z-variable z-language"> $this</span><span class="z-keyword z-operator">-&gt;</span><span class="z-entity z-name z-function">sample</span><span>(</span><span class="z-variable">$regex</span><span>)</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-variable"> $mailer</span><span class="z-keyword z-operator"> =</span><span class="z-keyword"> new</span><span class="z-punctuation z-separator"> \</span><span>Mock</span><span class="z-punctuation z-separator">\</span><span class="z-support z-class">Mailer</span><span>(</span><span class="z-constant z-other">…</span><span>)</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span> )</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-entity z-name z-function">when</span><span>(</span><span class="z-variable">$mailer</span><span class="z-keyword 
z-operator">-&gt;</span><span class="z-entity z-name z-function">sendTo</span><span>(</span><span class="z-variable">$address</span><span>))</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-variable">then</span></span> <span class="giallo-l"><span class="z-keyword z-operator"> -&gt;</span><span class="z-variable">…</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>Easy to read, fast to execute and help to focus on the logic of the test instead of test data (also known as fixtures). Note that most of the time the regular expressions are already in the code (maybe as constants). It is therefore easier to write and to maintain the tests.</p> <p>I hope you enjoyed this first part of the series :-)! This work has been published in the International Conference on Software Testing, Verification and Validation: <a rel="noopener external" target="_blank" href="https://hal.science/hal-00931662/file/EDGB12.pdf">Grammar-Based Testing using Realistic Domains in PHP</a>.</p> Rüsh Release 2014-09-15T00:00:00+00:00 2014-09-15T00:00:00+00:00 Unknown https://mnt.io/articles/rush-release/ <p>Since 2 years, at <a rel="noopener external" target="_blank" href="http://hoa-project.net/">Hoa</a>, we are looking for the perfect release process. Today, we have finalized the last thing related to this new process: we have found a name. It is called <strong>Rüsh Release</strong>, standing for <em>Rolling Ünd ScHeduled Release</em>.</p> <p>The following explanations are useful from the user point of view, not from the developer point of view. It means that we do not explain all the branches and the workflow between all of them. We will settle for the user final impact.</p> <h2 id="rolling-release">Rolling Release<a role="presentation" class="anchor" href="#rolling-release" title="Anchor link to this header">#</a> </h2> <p>On one hand, Hoa is not and will never be finished. We will never reach the “Holy 1.0 Grail”. 
So, one might reckon that Hoa is rolling-released. Let's explore this direction. There are plenty of <a rel="noopener external" target="_blank" href="https://en.wikipedia.org/wiki/Rolling_release">rolling release types</a> out there, such as:</p> <ul> <li>partially rolling,</li> <li>fully rolling,</li> <li>truly rolling,</li> <li>pseudo-rolling,</li> <li>optionally rolling,</li> <li>cyclically rolling,</li> <li>and synonyms…</li> </ul> <p>I am not going to explain all of them. All you need to know is that Hoa is partly and truly rolling released, or <em>part-</em> and <em>true-</em> rolling released for short. Why? Firstly, “Part-rolling project has a subset of software packages that are not rolling”. If we look at Hoa only, it is fully rolling, but Hoa depends on PHP virtual machines to be executed, and these are not rolling released (for the most popular ones at least). Thus, Hoa is partly rolling released. Secondly, “True-rolling [project] are developed solely using a rolling release software development model”, which is the case for Hoa. Consequently, the <code>master</code> branch is the final public branch: it <strong>always</strong> contains the latest version, and users constantly fetch updates from it.</p> <h2 id="versioning">Versioning<a role="presentation" class="anchor" href="#versioning" title="Anchor link to this header">#</a> </h2> <p>Sounds good. On the other hand, the majority of programs using Hoa rely on tools called dependency managers. The most popular in PHP is <a rel="noopener external" target="_blank" href="http://getcomposer.org/">Composer</a>. This is a fantastic tool, but with a little thorn that hurts us a lot: it does not support rolling release! Most of the time, dependency managers work with version numbers, mainly of the form <code>_x_._y_._z_</code>, with specific semantics for <code>_x_</code>, <code>_y_</code> and <code>_z_</code>. 
For instance, some people have agreed on <a rel="noopener external" target="_blank" href="http://semver.org/">semver</a>, which stands for <em>Semantic Versioning</em>.</p> <p>Also, we are not extremists. We understand the challenges and the needs behind versioning. So, how can we mix both: rolling release and versioning? Before answering this question, let's take a small step forward and learn more about an alternative versioning approach.</p> <h3 id="scheduled-based-release">Scheduled-based release<a role="presentation" class="anchor" href="#scheduled-based-release" title="Anchor link to this header">#</a> </h3> <p>Scheduled-based release, also known as date-based release, makes it possible to define releases at regular intervals. This approach is widely adopted for projects that progress quickly, such as Firefox or PHP (see the <a rel="noopener external" target="_blank" href="https://wiki.php.net/rfc/releaseprocess">PHP RFC: Release Process</a> for example). For Firefox, a new version is released every 6 weeks. To be honest, we should say <em>a new update</em>: the term <em>version</em> has an unclear meaning here.</p> <p>Scheduled-based release seems a good candidate to be mixed with rolling release, doesn't it?</p> <h2 id="rush-release">Rüsh Release<a role="presentation" class="anchor" href="#rush-release" title="Anchor link to this header">#</a> </h2> <p>Rüsh Release is a mix of part- and true-rolling release and scheduled-based release. The <code>master</code> branch is part- and true-rolling released, but with semi-automatic versioning:</p> <ul> <li>every 6 weeks, if at least one new patch has been merged into <code>master</code>, a new version is created,</li> <li>before the 6 weeks have elapsed, if several critical or significant patches have been applied, a new version is created.</li> </ul> <p>What is the version format then? 
We have proposed <code>_Y_{2,4}._mm_._dd_</code>, starting from 2000, our “Rüsh Epoch”.</p> <p>Nevertheless, we are not <strong>infallible</strong> and we can potentially break backward compatibility. It has never happened, but we have to be prepared for it. This is a problem because neither the part- and true-rolling release nor the scheduled-based release carries the information that backward compatibility has been broken. Therefore, the <code>master</code> branch must have a <strong>compatibility number</strong> <code>_x_</code>, starting from 1 and increasing in steps of 1. Consequently, the new and final version format is <code>_x_._Y_{2,4}._mm_._dd_</code>. For today, for instance, it is <code>1.14.09.15</code>.</p> <p>With the Rüsh Release process, we can freely rolling-release our libraries while keeping the safety and the benefits of versioning.</p> <p>So, now, you will be able to change your <code>composer.json</code> files from:</p> <pre class="giallo z-code"><code data-lang="json"><span class="giallo-l"><span>{</span></span> <span class="giallo-l"><span class="z-support z-type z-property-name"> &quot;require&quot;</span><span class="z-punctuation z-separator">:</span><span> {</span></span> <span class="giallo-l"><span class="z-support z-type z-property-name"> &quot;hoa/websocket&quot;</span><span class="z-punctuation z-separator">:</span><span class="z-string"> &quot;dev-master&quot;</span></span> <span class="giallo-l"><span> }</span><span class="z-punctuation z-separator">,</span></span> <span class="giallo-l"><span class="z-support z-type z-property-name"> &quot;minimum-stability&quot;</span><span class="z-punctuation z-separator">:</span><span class="z-string"> &quot;dev&quot;</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>to (<a rel="noopener external" target="_blank" href="https://getcomposer.org/doc/01-basic-usage.md#next-significant-release-tilde-operator-">learn more about the tilde operator</a>):</p> <pre class="giallo z-code"><code 
data-lang="json"><span class="giallo-l"><span>{</span></span> <span class="giallo-l"><span class="z-support z-type z-property-name"> &quot;require&quot;</span><span class="z-punctuation z-separator">:</span><span> {</span></span> <span class="giallo-l"><span class="z-support z-type z-property-name"> &quot;hoa/websocket&quot;</span><span class="z-punctuation z-separator">:</span><span class="z-string"> &quot;~1.0&quot;</span></span> <span class="giallo-l"><span> }</span></span> <span class="giallo-l"><span>}</span></span></code></pre> <p>\o/</p>
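<p>To make the version format described above concrete, here is a minimal sketch that builds a <code>_x_._Y_{2,4}._mm_._dd_</code> string from a compatibility number and a date. The <code>rush_version</code> helper is hypothetical (it is not part of Hoa or Composer), and it assumes that the year part simply counts the years elapsed since 2000, the “Rüsh Epoch”:</p>
<pre class="giallo z-code"><code data-lang="php">&lt;?php

declare(strict_types=1);

// Hypothetical helper, not part of Hoa: formats a Rüsh Release version
// as `x.Y.mm.dd`, where `x` is the compatibility number and `Y` is
// assumed to count the years since 2000 (the “Rüsh Epoch”).
function rush_version(int $compatibility, DateTimeInterface $date): string
{
    return sprintf(
        '%d.%02d.%02d.%02d',
        $compatibility,
        (int) $date-&gt;format('Y') - 2000, // years since 2000, at least 2 digits
        (int) $date-&gt;format('n'),        // month, 1 to 12
        (int) $date-&gt;format('j')         // day of the month, 1 to 31
    );
}

echo rush_version(1, new DateTimeImmutable('2014-09-15')), "\n"; // 1.14.09.15
</code></pre>
<p>In Composer, <code>~1.0</code> is equivalent to <code>&gt;=1.0 &lt;2.0</code>, so such a constraint follows every dated release as long as the compatibility number stays at 1.</p>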