<p>Node.js Streams For Fun And Profit, by Florent Biville. Published 2020-04-16 on 🎾 Florent + The Machine: <a href="https://fbiville.github.io/2020/04/16/Node_Streams_For_Fun_And_Profit">https://fbiville.github.io/2020/04/16/Node_Streams_For_Fun_And_Profit</a></p> <p>I joined the <a href="https://projectriff.io">riff</a> team at Pivotal a year and a half ago. I have been working for more than a year on <a href="https://projectriff.io">riff</a> invokers.</p> <p>This probably deserves a blog post of its own, but invokers, in short, have the responsibility of invoking user-defined functions and exposing a way to send inputs and receive outputs. The <a href="https://github.com/projectriff/invoker-specification/">riff invocation protocol</a> formally defines the scope of such invokers.</p> <p>Part of my job has been to update the existing invokers (especially the <a href="https://github.com/projectriff/node-function-invoker">Node.js one</a>) so that they comply with this spec. As the invocation protocol is a <a href="https://github.com/projectriff/invoker-specification/blob/a41d885fb411dc00e7ea3f7724ede4c435121a62/riff-rpc.proto#L13">streaming-first protocol</a>, I had to really brush up on my knowledge of Node.js streams (narrator’s voice: well, learn from zero).</p> <p>I learnt a lot by trial and error, probably more than I care to admit. This blog post serves as an introduction to Node.js streams. Hopefully, it also outlines some good practices, and some annoying pitfalls to avoid.</p> <h2 id="thanks-dear-proofreaders">Thanks, Dear (Proof)Readers</h2> <p>I would like to thank:</p> <ul> <li><a href="https://twitter.com/old_sound">Alvaro Videla</a></li> <li><a href="https://twitter.com/nicokosi">Nicolas Kosinski</a></li> <li><a href="https://twitter.com/poledesfetes">Vladimir de Turckheim</a></li> </ul> <p>for the various suggestions to make this better. 
Thanks ❀</p> <h2 id="harder-better-mapper-zipper">Harder, Better, Mapper, Zipper</h2> <p>Let’s create a tiny Node.js library that works with streams and provide familiar functional operators such as <code class="language-plaintext highlighter-rouge">map</code> and <code class="language-plaintext highlighter-rouge">zip</code>.</p> <p>First, what is a stream?</p> <p>Loosely defined, a stream conveys (possibly indefinitely) chunks of data, to which specific operations can be applied.</p> <p>How does that translate to Node.js exactly?</p> <h2 id="streams-in-nodejs">Streams in Node.js</h2> <p>Node.js streams come in two flavors: <a href="https://nodejs.org/api/stream.html#stream_readable_streams"><code class="language-plaintext highlighter-rouge">Readable</code></a> and <a href="https://nodejs.org/api/stream.html#stream_writable_streams"><code class="language-plaintext highlighter-rouge">Writable</code></a>.</p> <ul> <li><code class="language-plaintext highlighter-rouge">Readable</code> streams can be read from</li> <li><code class="language-plaintext highlighter-rouge">Writable</code> streams can be written to</li> </ul> <p><a href="https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options"><code class="language-plaintext highlighter-rouge">Readable#pipe</code></a> allows to create a pipeline, where the inputs come from the <code class="language-plaintext highlighter-rouge">Readable</code> stream and are written to the destination <code class="language-plaintext highlighter-rouge">Writable</code> stream.</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span><span class="p">,</span> <span class="nx">Writable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span 
class="p">);</span> <span class="kd">const</span> <span class="nx">myReadableStream</span> <span class="cm">/* = instantiate Readable stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myWritableStream</span> <span class="cm">/* = instantiate Writable stream */</span><span class="p">;</span> <span class="nx">myReadableStream</span><span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myWritableStream</span><span class="p">);</span> </code></pre></div></div> <p>What happens here is that the source <code class="language-plaintext highlighter-rouge">Readable</code> stream goes from a paused state to a <a href="https://nodejs.org/api/stream.html#stream_three_states">flowing state</a> after <code class="language-plaintext highlighter-rouge">pipe</code> is called.</p> <blockquote> <p>You can manually manage such state transitions with functions like <a href="https://nodejs.org/api/stream.html#stream_readable_pause"><code class="language-plaintext highlighter-rouge">Readable#pause</code></a> or <a href="https://nodejs.org/api/stream.html#stream_readable_resume"><code class="language-plaintext highlighter-rouge">Readable#resume</code></a>, but we are only going to rely on automatic flowing mode from now on.</p> </blockquote> <p>A Node.js stream can also encapsulate a <code class="language-plaintext highlighter-rouge">Readable</code> side <strong>and</strong> a <code class="language-plaintext highlighter-rouge">Writable</code> side; such streams are called <a href="https://nodejs.org/api/stream.html#stream_class_stream_duplex"><code class="language-plaintext highlighter-rouge">Duplex</code></a> streams. 
If outputs of the duplex stream depend on inputs, then a <a href="https://nodejs.org/api/stream.html#stream_class_stream_transform"><code class="language-plaintext highlighter-rouge">Transform</code></a> stream is the way to go (it is a specialization of the <code class="language-plaintext highlighter-rouge">Duplex</code> type).</p> <blockquote> <p>Outputs are <em>read</em>, hence they come from the <code class="language-plaintext highlighter-rouge">Readable</code> side of the <code class="language-plaintext highlighter-rouge">Duplex</code> stream.</p> <p>Inputs are <em>written</em>, hence they go to the <code class="language-plaintext highlighter-rouge">Writable</code> side of the <code class="language-plaintext highlighter-rouge">Duplex</code> stream.</p> <p><code class="language-plaintext highlighter-rouge">Transform</code> streams automatically expose chunks from the <code class="language-plaintext highlighter-rouge">Writable</code> side to a user-defined transformation function. The function results are automatically forwarded to the <code class="language-plaintext highlighter-rouge">Readable</code> side of the <code class="language-plaintext highlighter-rouge">Transform</code> stream.</p> <p>Note: unfortunately, <code class="language-plaintext highlighter-rouge">Duplex</code> streams do not differentiate <code class="language-plaintext highlighter-rouge">Readable</code> errors from <code class="language-plaintext highlighter-rouge">Writable</code> ones.</p> </blockquote> <p><img src="/assets/img/node_streams.svg" alt="Node.js stream family" title="Node.js stream family diagram" /></p> <p>These compound streams are interesting for any kind of pipeline beyond basic ones. 
They encode intermediate transformations before chunks reach the final destination <code class="language-plaintext highlighter-rouge">Writable</code> stream.</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span><span class="p">,</span> <span class="nx">Transform</span><span class="p">,</span> <span class="nx">Writable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">const</span> <span class="nx">myReadableStream</span> <span class="cm">/* = instantiate Readable stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myTransformStream1</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myTransformStream2</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myTransformStream3</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myWritableStream</span> <span class="cm">/* = instantiate Writable stream */</span><span class="p">;</span> <span class="nx">myReadableStream</span> <span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myTransformStream1</span><span class="p">)</span> <span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myTransformStream2</span><span class="p">)</span> <span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myTransformStream3</span><span class="p">)</span> <span class="p">.</span><span class="nx">pipe</span><span 
class="p">(</span><span class="nx">myWritableStream</span><span class="p">);</span> </code></pre></div></div> <p>The above “fluent” example works because <code class="language-plaintext highlighter-rouge">Readable#pipe</code> returns a reference to the destination stream. <code class="language-plaintext highlighter-rouge">Transform</code> (or more generally, <code class="language-plaintext highlighter-rouge">Duplex</code>) streams have two sides, so they can be piped to (<code class="language-plaintext highlighter-rouge">Writable</code> side) and then from (<code class="language-plaintext highlighter-rouge">Readable</code> side) via a new <code class="language-plaintext highlighter-rouge">pipe</code> call.</p> <p>However, this is not necessarily the best way to define a <strong>linear</strong> pipeline. One important limitation is that <code class="language-plaintext highlighter-rouge">pipe</code> does not offer any particular assistance when it comes to error handling.</p> <blockquote> <p>Emphasis on linear here. Streams can be piped from and to several times, so you can end up with graph-shaped pipelines.</p> </blockquote> <p>A more robust alternative for linear pipelines is to use the built-in <code class="language-plaintext highlighter-rouge">pipeline</code> function. It must be called with:</p> <ul> <li>1 <code class="language-plaintext highlighter-rouge">Readable</code> stream (a.k.a. the source)</li> <li>0..n <code class="language-plaintext highlighter-rouge">Duplex</code> streams (a.k.a. intermediates)</li> <li>1 <code class="language-plaintext highlighter-rouge">Writable</code> stream (a.k.a. 
the destination)</li> </ul> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">pipeline</span><span class="p">,</span> <span class="nx">Readable</span><span class="p">,</span> <span class="nx">Transform</span><span class="p">,</span> <span class="nx">Writable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">const</span> <span class="nx">myReadableStream</span> <span class="cm">/* = instantiate Readable stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myTransformStream1</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myTransformStream2</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myTransformStream3</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">myWritableStream</span> <span class="cm">/* = instantiate Writable stream */</span><span class="p">;</span> <span class="nx">pipeline</span><span class="p">(</span> <span class="nx">myReadableStream</span><span class="p">,</span> <span class="nx">myTransformStream1</span><span class="p">,</span> <span class="nx">myTransformStream2</span><span class="p">,</span> <span class="nx">myTransformStream3</span><span class="p">,</span> <span class="nx">myWritableStream</span><span class="p">,</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="cm">/* ... 
*/</span> <span class="p">}</span> <span class="p">);</span> </code></pre></div></div> <p>The last argument is a completion callback, invoked when the pipeline completes, abnormally (i.e. when an error occurs) or not.</p> <blockquote> <p><code class="language-plaintext highlighter-rouge">pipeline</code> invokes the completion callback even if any of the streams has its <code class="language-plaintext highlighter-rouge">autoDestroy</code> setting set to <code class="language-plaintext highlighter-rouge">false</code>.</p> </blockquote> <blockquote> <p><code class="language-plaintext highlighter-rouge">pipeline</code> actually supports more than streams, but that’s out of scope for this article. Feel free to check <a href="https://nodejs.org/api/stream.html#stream_stream_pipeline_source_transforms_destination_callback">the documentation</a> to learn about other usages.</p> </blockquote> <p>Now that the general pipeline model is understood, let’s dive into the details of how <code class="language-plaintext highlighter-rouge">map</code> works, learning how custom streams are implemented in the process.</p> <h2 id="you-cant-map-this">You Can’t <code class="language-plaintext highlighter-rouge">map</code> This</h2> <p>Credit where credit is due: I am going to reuse the awesome diagrams of <a href="https://projectreactor.io/">project Reactor</a>.</p> <p><img src="/assets/img/mapForFlux.svg" alt="`map` diagram" title="`map` diagram" /></p> <p>The top of the diagram depicts chunks as they initially come to the stream, as well as the stream completion signal (marked by the bold vertical line at the end of the sequence).</p> <p>The <code class="language-plaintext highlighter-rouge">map</code> operation here is in the middle, applying a transformation from circles to squares.</p> <p>The bottom part of the diagram shows the resulting chunks and how the completion signal is propagated as-is.</p> <p>In other words, <code class="language-plaintext highlighter-rouge">map</code> applies 
a transformation function to each element of the stream, in the order they arrive.</p> <p>Let’s start with a <a href="https://jasmine.github.io/">Jasmine</a> test:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kd">const</span> <span class="p">{</span> <span class="nx">PassThrough</span><span class="p">,</span> <span class="nx">pipeline</span><span class="p">,</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="nx">describe</span><span class="p">(</span><span class="dl">"</span><span class="s2">map operator =&gt;</span><span class="dl">"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">it</span><span class="p">(</span><span class="dl">"</span><span class="s2">applies transformations to chunks</span><span class="dl">"</span><span class="p">,</span> <span class="p">(</span><span class="nx">done</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">source</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">{</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (1)</span> <span class="kd">const</span> <span class="nx">transformation</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">MapTransform</span><span class="p">((</span><span class="nx">number</span><span 
class="p">)</span> <span class="o">=&gt;</span> <span class="nx">number</span> <span class="o">**</span> <span class="mi">2</span><span class="p">);</span> <span class="c1">// (2)</span> <span class="kd">const</span> <span class="nx">destination</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">PassThrough</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (3)</span> <span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[];</span> <span class="c1">// ??? (4)</span> <span class="nx">pipeline</span><span class="p">(</span> <span class="nx">source</span><span class="p">,</span> <span class="nx">transformation</span><span class="p">,</span> <span class="nx">destination</span><span class="p">,</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1">// (5)</span> <span class="nx">expect</span><span class="p">(</span><span class="nx">err</span><span class="p">).</span><span class="nx">toBeFalsy</span><span class="p">(</span><span class="dl">'</span><span class="s1">pipeline should successfully complete</span><span class="dl">'</span><span class="p">);</span> <span class="nx">expect</span><span class="p">(</span><span class="nx">result</span><span class="p">).</span><span class="nx">toEqual</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">]);</span> <span class="nx">done</span><span class="p">();</span> <span class="p">}</span> <span class="p">);</span> <span class="p">});</span> <span class="p">})</span> </code></pre></div></div> <p>A few things of note:</p> <ol> <li>You can create a <code class="language-plaintext highlighter-rouge">Readable</code> from an iterable 
source such as an array, or a <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function*">generator function</a>. Here, the stream will emit each array element in succession. The <a href="https://nodejs.org/api/stream.html#stream_object_mode"><code class="language-plaintext highlighter-rouge">objectMode</code></a> option configures the stream to receive any kind of chunk. The default chunk data type is textual or binary (i.e. strings, <code class="language-plaintext highlighter-rouge">Buffer</code> or <code class="language-plaintext highlighter-rouge">Uint8Array</code>). Quite surprisingly, the default mode when specifically using <code class="language-plaintext highlighter-rouge">Readable.from</code> is object mode, contrary to stream constructors. However redundant, the object mode is set here just for consistency’s sake.</li> <li><code class="language-plaintext highlighter-rouge">MapTransform</code> does not exist yet. We will have to figure out its implementation next, but we can assume its constructor accepts a transformation function (here: the square function). 
We could pass the <code class="language-plaintext highlighter-rouge">objectMode</code> setting, but let’s assume it always operates this way.</li> <li><a href="https://nodejs.org/api/stream.html#stream_class_stream_passthrough"><code class="language-plaintext highlighter-rouge">PassThrough</code></a> is a special implementation of a <code class="language-plaintext highlighter-rouge">Transform</code> stream which directly forwards inputs as outputs (in other words, it applies the identity function).</li> <li>We need to somehow accumulate the observed outputs into <code class="language-plaintext highlighter-rouge">result</code> (more on that soon).</li> <li>We leverage the completion callback of <code class="language-plaintext highlighter-rouge">pipeline</code> to verify a few things: <ol> <li>the pipeline completes successfully</li> <li>the observed results are consistent with the transformation we intend to apply on the initial chunks</li> <li><code class="language-plaintext highlighter-rouge">done</code> is a Jasmine utility to notify the test runner of the (asynchronous) test completion</li> </ol> </li> </ol> <p>For people familiar with the given-when-then test structure, this test may look a bit strange. Indeed, the order is changed here to given-then-when. This has to do with the asynchronous nature of streams. We have to set up the expectations (the “then” block) before data starts flowing in, i.e. before <code class="language-plaintext highlighter-rouge">pipeline</code> is called.</p> <p>How can we be sure the test completes? After all, streams can be infinite. In this case, <code class="language-plaintext highlighter-rouge">Readable.from</code> reads a finite array and will send a completion signal once the array is fully consumed. This completion signal will be forwarded to all the other (downstream) streams, so we can be confident the <code class="language-plaintext highlighter-rouge">pipeline</code> completion callback is going to be called. 
In the worst case, the test will hang for a while until the Jasmine timeout is reached, causing a test failure.</p> <p>We now need to figure out how to complete the test.</p> <p>Node.js streams extend <a href="https://nodejs.org/api/events.html#events_events"><code class="language-plaintext highlighter-rouge">EventEmitter</code></a>. They emit specific events that can be listened to via functions such as <code class="language-plaintext highlighter-rouge">EventEmitter#on(eventType, callback)</code>. Event listeners are <strong>synchronously</strong> executed in the order they are added (you can tweak the order via alternative functions such as <code class="language-plaintext highlighter-rouge">EventEmitter#prependListener(eventType, callback)</code>).</p> <p>Our test needs to observe chunks written to the destination stream. Technically, the destination could just be a <code class="language-plaintext highlighter-rouge">Writable</code> stream as this is the only requirement of <code class="language-plaintext highlighter-rouge">pipe</code> and <code class="language-plaintext highlighter-rouge">pipeline</code>. However, we need to read the chunks that have been written to it, so using a <code class="language-plaintext highlighter-rouge">Transform</code> stream such as <code class="language-plaintext highlighter-rouge">PassThrough</code> definitely helps as it exposes a <code class="language-plaintext highlighter-rouge">Readable</code> side.</p> <p>In particular, <code class="language-plaintext highlighter-rouge">Readable</code> streams emit a <a href="https://nodejs.org/api/stream.html#stream_event_data"><code class="language-plaintext highlighter-rouge">data</code> event</a> with the associated chunk of data. 
That is exactly what we need to accumulate the results!</p> <p>Our test now becomes:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">PassThrough</span><span class="p">,</span> <span class="nx">pipeline</span><span class="p">,</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="nx">describe</span><span class="p">(</span><span class="dl">"</span><span class="s2">map operator =&gt;</span><span class="dl">"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">it</span><span class="p">(</span><span class="dl">"</span><span class="s2">applies transformations to chunks</span><span class="dl">"</span><span class="p">,</span> <span class="p">(</span><span class="nx">done</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">source</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">{</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="kd">const</span> <span class="nx">transformation</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">MapTransform</span><span class="p">((</span><span class="nx">number</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="nx">number</span> <span class="o">**</span> <span 
class="mi">2</span><span class="p">);</span> <span class="kd">const</span> <span class="nx">destination</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">PassThrough</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[];</span> <span class="nx">destination</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">result</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span> <span class="p">});</span> <span class="nx">pipeline</span><span class="p">(</span> <span class="nx">source</span><span class="p">,</span> <span class="nx">transformation</span><span class="p">,</span> <span class="nx">destination</span><span class="p">,</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">expect</span><span class="p">(</span><span class="nx">err</span><span class="p">).</span><span class="nx">toBeFalsy</span><span class="p">(</span><span class="dl">'</span><span class="s1">pipeline should successfully complete</span><span class="dl">'</span><span class="p">);</span> <span class="nx">expect</span><span class="p">(</span><span class="nx">result</span><span class="p">).</span><span class="nx">toEqual</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">]);</span> <span class="nx">done</span><span 
class="p">();</span> <span class="p">}</span> <span class="p">);</span> <span class="p">});</span> <span class="p">})</span> </code></pre></div></div> <p>The test seems ready. If I execute it, I get:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test </span>Failures: 1<span class="o">)</span> map operator <span class="o">=&gt;</span> applies transformations to chunks Message: ReferenceError: MapTransform is not defined </code></pre></div></div> <p>Just to make sure the pipeline is properly set up, let’s temporarily replace <code class="language-plaintext highlighter-rouge">MapTransform</code> with <code class="language-plaintext highlighter-rouge">PassThrough</code> in object mode. In that case, the test should fail because <code class="language-plaintext highlighter-rouge">result</code> will be equal to <code class="language-plaintext highlighter-rouge">[1, 2, 3]</code> and not <code class="language-plaintext highlighter-rouge">[1, 4, 9]</code>. Let’s see:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test </span>1<span class="o">)</span> map operator <span class="o">=&gt;</span> applies transformations to chunks Message: Expected <span class="nv">$[</span>1] <span class="o">=</span> 2 to equal 4. Expected <span class="nv">$[</span>2] <span class="o">=</span> 3 to equal 9. </code></pre></div></div> <p>The test fails as expected. Let’s focus on the implementation now.</p> <p><code class="language-plaintext highlighter-rouge">map</code> is an intermediate transformation, directly correlating outputs to inputs. 
Hence, <code class="language-plaintext highlighter-rouge">Transform</code> is the ideal choice.</p> <p>Let’s subclass <code class="language-plaintext highlighter-rouge">Transform</code>, then:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Transform</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">class</span> <span class="nx">MapTransform</span> <span class="kd">extends</span> <span class="nx">Transform</span> <span class="p">{</span> <span class="kd">constructor</span><span class="p">(</span><span class="nx">mapFunction</span><span class="p">)</span> <span class="p">{</span> <span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">mapFunction</span> <span class="o">=</span> <span class="nx">mapFunction</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// ???</span> <span class="p">}</span> </code></pre></div></div> <p><code class="language-plaintext highlighter-rouge">Transform</code> streams need to implement the <a href="https://nodejs.org/api/stream.html#stream_transform_transform_chunk_encoding_callback"><code class="language-plaintext highlighter-rouge">_transform</code> method</a>. 
The first parameter is the chunk of data coming to the <code class="language-plaintext highlighter-rouge">Writable</code> side, the second is the encoding (which is irrelevant in object mode) and the third is a callback that must be called <strong>exactly once</strong>: its first argument is either an error or <code class="language-plaintext highlighter-rouge">null</code>, and its optional second argument is the result to pass on to the <code class="language-plaintext highlighter-rouge">Readable</code> side.</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Transform</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">class</span> <span class="nx">MapTransform</span> <span class="kd">extends</span> <span class="nx">Transform</span> <span class="p">{</span> <span class="kd">constructor</span><span class="p">(</span><span class="nx">mapFunction</span><span class="p">)</span> <span class="p">{</span> <span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">mapFunction</span> <span class="o">=</span> <span class="nx">mapFunction</span><span class="p">;</span> <span class="p">}</span> <span class="nx">_transform</span><span class="p">(</span><span class="nx">chunk</span><span class="p">,</span> <span class="nx">encoding</span><span class="p">,</span> <span class="nx">callback</span><span class="p">)</span> <span class="p">{</span> <span class="nx">callback</span><span class="p">(</span><span class="kc">null</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">mapFunction</span><span class="p">(</span><span 
class="nx">chunk</span><span class="p">));</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>Let’s see if the test passes now:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test</span> <span class="o">&gt;</span> jasmine Randomized with seed 30817 Started <span class="nb">.</span> 1 spec, 0 failures Finished <span class="k">in </span>0.014 seconds </code></pre></div></div> <p>🍾 It does!</p> <p>We could improve a few things, such as accepting asynchronous functions and handling throwing functions. This is left as an exercise to the readers 😉 (hint: <code class="language-plaintext highlighter-rouge">Promise.resolve</code> bridges synchronous and asynchronous functions)</p> <h2 id="zip-it">Zip it!</h2> <p><code class="language-plaintext highlighter-rouge">zip</code> is slightly more complex than <code class="language-plaintext highlighter-rouge">map</code> as it operates on (at least) two streams. Let’s see it in action (thanks again to <a href="https://projectreactor.io/">project Reactor</a> for the diagrams):</p> <p><img src="/assets/img/zip.svg" alt="`zip` diagram" title="`zip` diagram" /></p> <p><code class="language-plaintext highlighter-rouge">zip</code> pairs up chunks by order of arrival. Once the pair is formed, a transformation function is applied to it. 
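</p> <p>To make the pairing concrete, here is a minimal sketch on plain arrays rather than streams. <code class="language-plaintext highlighter-rouge">zipArrays</code> and its <code class="language-plaintext highlighter-rouge">combine</code> parameter are hypothetical names, not part of the library built in this post, and the sketch only forms pairs for as long as both inputs have an element at the same position.</p>

```javascript
// Hypothetical helper (an assumption for illustration, not the article's code):
// pairs elements of two arrays by position, then applies `combine` to each pair.
function zipArrays(first, second, combine = (a, b) => [a, b]) {
  // A pair can only be formed while both inputs still have an element.
  const length = Math.min(first.length, second.length);
  return Array.from({ length }, (_, i) => combine(first[i], second[i]));
}

// Default behaviour: plain pairs, in order of position.
console.log(zipArrays([1, 2, 3], ["Un", "Deux", "Trois"]));

// With a transformation function applied to each pair.
console.log(zipArrays([1, 2, 3], ["Un", "Deux", "Trois"], (n, word) => `${n}=${word}`));
```

<p>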
<code class="language-plaintext highlighter-rouge">zip</code> completes when the last stream completes.</p> <p>For simplicity’s sake, our <code class="language-plaintext highlighter-rouge">zip</code> implementation will only pair elements together but not apply any transformation.</p> <p>Time to express our intent with a test:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">PassThrough</span><span class="p">,</span> <span class="nx">pipeline</span><span class="p">,</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="nx">describe</span><span class="p">(</span><span class="dl">"</span><span class="s2">zip operator =&gt;</span><span class="dl">"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">it</span><span class="p">(</span><span class="dl">"</span><span class="s2">pairs chunks from upstream streams</span><span class="dl">"</span><span class="p">,</span> <span class="p">(</span><span class="nx">done</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">upstream1</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">{</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (1)</span> <span class="kd">const</span> <span 
class="nx">upstream2</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">([</span><span class="dl">"</span><span class="s2">Un</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Deux</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Trois</span><span class="dl">"</span><span class="p">],</span> <span class="p">{</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (1)</span> <span class="kd">const</span> <span class="nx">zipSource</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">ZipReadable</span><span class="p">(</span><span class="nx">upstream1</span><span class="p">,</span> <span class="nx">upstream2</span><span class="p">);</span> <span class="c1">// (2)</span> <span class="kd">const</span> <span class="nx">destination</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">PassThrough</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (3)</span> <span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[];</span> <span class="c1">// (4)</span> <span class="nx">destination</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1">// (4)</span> <span class="nx">result</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span> <span class="p">});</span> 
<span class="nx">pipeline</span><span class="p">(</span> <span class="nx">zipSource</span><span class="p">,</span> <span class="nx">destination</span><span class="p">,</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1">// (5)</span> <span class="nx">expect</span><span class="p">(</span><span class="nx">err</span><span class="p">).</span><span class="nx">toBeFalsy</span><span class="p">(</span><span class="dl">'</span><span class="s1">pipeline should successfully complete</span><span class="dl">'</span><span class="p">);</span> <span class="nx">expect</span><span class="p">(</span><span class="nx">result</span><span class="p">).</span><span class="nx">toEqual</span><span class="p">([</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Un</span><span class="dl">"</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Deux</span><span class="dl">"</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Trois</span><span class="dl">"</span><span class="p">]</span> <span class="p">]);</span> <span class="nx">done</span><span class="p">();</span> <span class="p">}</span> <span class="p">);</span> <span class="p">})</span> <span class="p">})</span> </code></pre></div></div> <p>This is very similar to the previous <code class="language-plaintext highlighter-rouge">map</code> test:</p> <ol> <li>we need two streams to read from, hence the creation of two <code class="language-plaintext highlighter-rouge">Readable</code> streams from different arrays. Note we could (and should for a production implementation) spice up the test a bit by introducing latency, thus making sure we properly wait for chunks to be paired in order. 
This could be done with <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function*">generator functions</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/setTimeout"><code class="language-plaintext highlighter-rouge">setTimeout</code></a>.</li> <li>the next step will be to figure out how to implement <code class="language-plaintext highlighter-rouge">ZipReadable</code>. We can safely assume it accepts two <code class="language-plaintext highlighter-rouge">Readable</code> streams to read chunks from.</li> <li>same as before, we rely on <code class="language-plaintext highlighter-rouge">PassThrough</code> to receive the resulting chunks. We will use its <code class="language-plaintext highlighter-rouge">Readable</code> side to observe and accumulate the results.</li> <li>we accumulate the observed resulting chunks in <code class="language-plaintext highlighter-rouge">result</code>, based on the <a href="https://nodejs.org/api/stream.html#stream_event_data"><code class="language-plaintext highlighter-rouge">data</code> event</a> emitted by the <code class="language-plaintext highlighter-rouge">Readable</code> side of the <code class="language-plaintext highlighter-rouge">PassThrough</code> stream</li> <li>finally, as before, we rely on the completion callback to check that the pipeline completes successfully and that the resulting chunks are the ones we expect, then notify Jasmine that the test is done</li> </ol> <p>Let’s run the test:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test </span>Failures: 1<span class="o">)</span> zip operator <span class="o">=&gt;</span> pairs chunks from upstream streams Message: ReferenceError: ZipReadable is not defined </code></pre></div></div> <p>Let’s create an implementation that works with two streams for now. 
First, what kind of stream should our <code class="language-plaintext highlighter-rouge">ZipReadable</code> be? Let’s go with <code class="language-plaintext highlighter-rouge">Readable</code>, as <code class="language-plaintext highlighter-rouge">ZipReadable</code> acts as a source built upon two upstream streams.</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span> <span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span> <span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// ??? 
(2)</span> <span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1">// ??? (1)</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1">// ??? (1)</span> <span class="p">});</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <ol> <li>we need to get data from both the upstream streams. We chose here not to call <code class="language-plaintext highlighter-rouge">_startReading</code> in the constructor. 
The goal is to start reading only when a first consumer wants to read data.</li> <li>we somehow need to emit data whenever <code class="language-plaintext highlighter-rouge">ZipReadable</code> is read from</li> </ol> <p>Let’s first worry about buffering the incoming data:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span> <span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span> <span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span> <span class="o">=</span> <span class="p">[];</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span> <span class="o">=</span> <span class="p">[];</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// ???</span> <span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span 
class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk1</span><span class="p">);</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk2</span><span class="p">);</span> <span class="p">});</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>Nothing too fancy here: chunks are pushed to the corresponding array. Custom <code class="language-plaintext highlighter-rouge">Readable</code> streams need to implement <a href="https://nodejs.org/api/stream.html#stream_readable_read_size_1"><code class="language-plaintext highlighter-rouge">Readable#_read</code></a>. 
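</p> <p>Seen in isolation, that contract could look like the minimal sketch below. <code class="language-plaintext highlighter-rouge">CounterReadable</code> is a hypothetical stream, unrelated to <code class="language-plaintext highlighter-rouge">ZipReadable</code>, that assumes object mode and emits the integers from 1 to a given limit before ending.</p>

```javascript
const { Readable } = require("stream");

// Hypothetical example stream (an assumption for illustration):
// emits 1, 2, ..., limit in object mode, then signals the end of the stream.
class CounterReadable extends Readable {
  constructor(limit) {
    super({ objectMode: true });
    this.limit = limit;
    this.current = 0;
  }

  // Invoked by the stream machinery whenever the consumer wants more data.
  _read() {
    this.current += 1;
    // Pushing null tells consumers that no more data will come.
    this.push(this.current > this.limit ? null : this.current);
  }
}

// Usage: collect every chunk, then print them once the stream has ended.
const seen = [];
new CounterReadable(3)
  .on("data", (chunk) => seen.push(chunk))
  .on("end", () => console.log(seen));
```

<p>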
Results are pushed to consumers via <a href="https://nodejs.org/api/stream.html#stream_readable_push_chunk_encoding"><code class="language-plaintext highlighter-rouge">Readable#push</code></a>.</p> <p>Let’s have a crack at it:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO NOT USE IN PRODUCTION - SEE BELOW FOR DETAILS</span> <span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span> <span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span> <span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span> <span class="o">=</span> <span class="p">[];</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span> <span class="o">=</span> <span class="p">[];</span> 
<span class="p">}</span> <span class="nx">_read</span><span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">initialized</span><span class="p">)</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">_startReading</span><span class="p">();</span> <span class="c1">// (1)</span> <span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span> <span class="p">}</span> <span class="kd">const</span> <span class="nx">bound</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">size</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span> <span class="c1">// (2)</span> <span class="k">if</span> <span class="p">(</span><span class="nx">bound</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="kd">const</span> <span class="nx">readyChunks1</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span> <span class="c1">// (3)</span> <span class="kd">const</span> <span 
class="nx">readyChunks2</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span> <span class="c1">// (3)</span> <span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">bound</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">pair</span> <span class="o">=</span> <span class="p">[</span><span class="nx">readyChunks1</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">readyChunks2</span><span class="p">[</span><span class="nx">i</span><span class="p">]];</span> <span class="c1">// (4)</span> <span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">pair</span><span class="p">);</span> <span class="c1">// (5)</span> <span class="p">}</span> <span class="p">}</span> <span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span 
class="nx">chunk1</span><span class="p">);</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk2</span><span class="p">);</span> <span class="p">});</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <ol> <li>upon the first call to <code class="language-plaintext highlighter-rouge">Readable#_read</code> (when <code class="language-plaintext highlighter-rouge">pipeline</code> is called in the test), we start reading data from the upstream sources. As we do not want to subscribe to the <code class="language-plaintext highlighter-rouge">'data'</code> event multiple times, we guard this initialization with the <code class="language-plaintext highlighter-rouge">this.initialized</code> flag.</li> <li><code class="language-plaintext highlighter-rouge">size</code> is advisory, so we could just ignore it but it does not cost much to include in the bound computation. More on that towards the end of this article.</li> <li><code class="language-plaintext highlighter-rouge">splice</code> is used here to remove and return the <code class="language-plaintext highlighter-rouge">bound</code> first elements of each array as well as shift the remaining ones. 
That way, we do not keep consumed chunks around.</li> <li>the core logic of <code class="language-plaintext highlighter-rouge">zip</code> is here, we create a pair (an array) of chunks accumulated from two streams</li> <li>finally, we publish that pair</li> </ol> <p>Let’s see if our test is happy:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Failures: 1<span class="o">)</span> zip operator <span class="o">=&gt;</span> pairs chunks from upstream streams Message: Error: Timeout - Async <span class="k">function </span>did not <span class="nb">complete </span>within 5000ms <span class="o">(</span><span class="nb">set </span>by jasmine.DEFAULT_TIMEOUT_INTERVAL<span class="o">)</span> </code></pre></div></div> <p>Oh no! The test fails. Looking at the above implementation, this actually makes sense. When <code class="language-plaintext highlighter-rouge">_read</code> is called the first time, there is no guarantee at all that data has been buffered yet from the upstream sources.</p> <p>Looking a bit more closely at the <a href="https://nodejs.org/api/stream.html#stream_readable_read_size_1"><code class="language-plaintext highlighter-rouge">Readable#_read</code> documentation</a>, we can read:</p> <blockquote> <p>Once the readable._read() method has been called, it will not be called again until more data is pushed through the readable.push() method.</p> </blockquote> <p>Ahah! That’s exactly the issue we hit! <code class="language-plaintext highlighter-rouge">_read</code> is called a first time when the pipeline is set up, but no data has arrived yet, so there is nothing to push. 
Then, we are stuck forever as no further <code class="language-plaintext highlighter-rouge">Readable#push</code> calls can occur because <code class="language-plaintext highlighter-rouge">_read</code> will not be called anymore.</p> <p>Lucky for us, nothing prevents <code class="language-plaintext highlighter-rouge">Readable#push</code>, or even <code class="language-plaintext highlighter-rouge">Readable#_read</code> from being called from elsewhere in the <code class="language-plaintext highlighter-rouge">Readable</code> implementation.</p> <p>Let’s try again (and add a few temporary logs while we’re at it):</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO NOT USE IN PRODUCTION - SEE BELOW FOR DETAILS</span> <span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span> <span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span> <span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="k">this</span><span 
class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span> <span class="o">=</span> <span class="p">[];</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span> <span class="o">=</span> <span class="p">[];</span> <span class="p">}</span> <span class="nx">_read</span><span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">initialized</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">Initializing pipeline</span><span class="dl">'</span><span class="p">);</span> <span class="k">this</span><span class="p">.</span><span class="nx">_startReading</span><span class="p">();</span> <span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span> <span class="p">}</span> <span class="kd">const</span> <span class="nx">bound</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">size</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span 
class="nx">length</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="nx">bound</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Waiting for data, nothing to do for now...`</span><span class="p">);</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Data flowing: </span><span class="p">${</span><span class="nx">bound</span><span class="p">}</span><span class="s2"> element(s) from each source to zip!`</span><span class="p">);</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">readyChunks1</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span> <span class="kd">const</span> <span class="nx">readyChunks2</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span> <span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span 
class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">bound</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">pair</span> <span class="o">=</span> <span class="p">[</span><span class="nx">readyChunks1</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">readyChunks2</span><span class="p">[</span><span class="nx">i</span><span class="p">]];</span> <span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">pair</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Chunk 1 received: </span><span class="p">${</span><span class="nx">chunk1</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk1</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span> <span 
class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Waiting for data, calling with </span><span class="p">${</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">}</span><span class="s2"> element(s) from first upstream`</span><span class="p">);</span> <span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span> <span class="p">}</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Chunk 2 received: </span><span class="p">${</span><span class="nx">chunk2</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk2</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Waiting for data, calling with </span><span 
class="p">${</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">}</span><span class="s2"> element(s) from second upstream`</span><span class="p">);</span> <span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span> <span class="p">}</span> <span class="p">});</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>Let’s re-run the test:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test </span>Initializing pipeline Waiting <span class="k">for </span>data, nothing to <span class="k">do for </span>now... Chunk 1 received: 1 Waiting <span class="k">for </span>data, calling with 1 element<span class="o">(</span>s<span class="o">)</span> from first upstream Waiting <span class="k">for </span>data, nothing to <span class="k">do for </span>now... Chunk 2 received: Un Waiting <span class="k">for </span>data, calling with 1 element<span class="o">(</span>s<span class="o">)</span> from second upstream Data flowing: 1 element<span class="o">(</span>s<span class="o">)</span> from each <span class="nb">source </span>to zip! Chunk 1 received: 2 Chunk 2 received: Deux Chunk 1 received: 3 Chunk 2 received: Trois Data flowing: 2 element<span class="o">(</span>s<span class="o">)</span> from each <span class="nb">source </span>to zip! Waiting <span class="k">for </span>data, nothing to <span class="k">do for </span>now... 
Failures: 1<span class="o">)</span> zip operator <span class="o">=&gt;</span> pairs chunks from upstream streams Message: Error: Timeout - Async <span class="k">function </span>did not <span class="nb">complete </span>within 5000ms <span class="o">(</span><span class="nb">set </span>by jasmine.DEFAULT_TIMEOUT_INTERVAL<span class="o">)</span> </code></pre></div></div> <p>Hmm, the test still fails, but the implementation seems to behave correctly. What actually happens is that our <code class="language-plaintext highlighter-rouge">ZipReadable</code> implementation never completes. Looking again at the <a href="https://nodejs.org/api/stream.html#stream_readable_push_chunk_encoding"><code class="language-plaintext highlighter-rouge">Readable#push</code></a> documentation, we can see that pushing <code class="language-plaintext highlighter-rouge">null</code> notifies downstream consumers that the stream is done emitting data.</p> <p>Now, when should we do that? If we look at the Reactor diagram of <code class="language-plaintext highlighter-rouge">zip</code> again:</p> <p><img src="/assets/img/zip.svg" alt="`zip` diagram" title="`zip` diagram" /></p> <p>
 we can see that the completion should be sent when the last stream completes. <code class="language-plaintext highlighter-rouge">Readable</code> streams notify consumers with the <a href="https://nodejs.org/api/stream.html#stream_event_end"><code class="language-plaintext highlighter-rouge">end</code> event</a> when they are done. Now that we have got everything figured out, let’s get rid of the logs and fix our implementation:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO NOT USE IN PRODUCTION - SEE BELOW FOR DETAILS</span> <span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span> <span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span> <span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// (1)</span> <span 
class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span> <span class="o">=</span> <span class="p">[];</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span> <span class="o">=</span> <span class="p">[];</span> <span class="p">}</span> <span class="nx">_read</span><span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">initialized</span><span class="p">)</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">_startReading</span><span class="p">();</span> <span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span> <span class="p">}</span> <span class="kd">const</span> <span class="nx">bound</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">size</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="nx">bound</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span 
class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="kd">const</span> <span class="nx">readyChunks1</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span> <span class="kd">const</span> <span class="nx">readyChunks2</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span> <span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">bound</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">pair</span> <span class="o">=</span> <span class="p">[</span><span class="nx">readyChunks1</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">readyChunks2</span><span class="p">[</span><span class="nx">i</span><span class="p">]];</span> <span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span 
class="nx">pair</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">end</span><span class="dl">'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1">// (2)</span> <span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span><span class="o">++</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">===</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// (3)</span> <span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="kc">null</span><span class="p">);</span> <span class="p">}</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">end</span><span class="dl">'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1">// (2)</span> <span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span><span class="o">++</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">===</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// (3)</span> <span 
class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="kc">null</span><span class="p">);</span> <span class="p">}</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk1</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span> <span class="p">}</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk2</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span 
class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span> <span class="p">}</span> <span class="p">});</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <ol> <li>we introduce a counter to keep track of upstream stream completion.</li> <li>we observe each upstream stream completion and increment the counter when that occurs.</li> <li>we signal the <code class="language-plaintext highlighter-rouge">zip</code> stream completion when all upstream streams are done.</li> </ol> <p>Let’s run the tests:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test </span>2 specs, 0 failures </code></pre></div></div> <p>Yay, it passes 🄳</p> <p>However, the implementation could definitely be refactored as there is a lot of duplicated behavior. 
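</p> <p>Before generalizing, note that both <code class="language-plaintext highlighter-rouge">end</code> listeners do exactly the same thing. Here is a minimal sketch of how that counting logic could be written once and shared by any number of upstream streams. The <code class="language-plaintext highlighter-rouge">whenAllEnded</code> helper is a hypothetical name introduced for illustration, not part of the implementation above; inside <code class="language-plaintext highlighter-rouge">ZipReadable</code>, the callback would presumably be <code class="language-plaintext highlighter-rouge">() =&gt; this.push(null)</code>.</p>

```javascript
const { Readable } = require("stream");

// Hypothetical helper (not in the original post): invokes `onAllEnded`
// once, after every upstream has emitted its 'end' event.
function whenAllEnded(upstreams, onAllEnded) {
  let endedCount = 0;
  upstreams.forEach((upstream) => {
    upstream.once("end", () => {
      endedCount++;
      if (endedCount === upstreams.length) {
        onAllEnded();
      }
    });
  });
}

// Usage: two finite upstreams built from arrays.
const numbers = Readable.from([1, 2, 3]);
const words = Readable.from(["Un", "Deux", "Trois"]);
whenAllEnded([numbers, words], () => console.log("all upstreams ended"));

// 'end' is only emitted once a stream has been fully consumed,
// so switch both streams to flowing mode to drain them.
numbers.resume();
words.resume();
```

<p>With real streams, the callback only fires once every upstream has been fully consumed, since <code class="language-plaintext highlighter-rouge">end</code> is emitted after the last chunk has been read.</p> <p>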
It could even be generalized to <em>n</em> upstream sources (the corresponding test is very similar to the one with 2 sources)!</p> <p>And here we go:</p> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO NOT USE IN PRODUCTION - SEE BELOW FOR DETAILS</span> <span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span> <span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span> <span class="kd">constructor</span><span class="p">(...</span><span class="nx">upstreams</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// (1)</span> <span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">streams</span> <span class="o">=</span> <span class="nx">upstreams</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks</span> <span class="o">=</span> <span class="nx">upstreams</span><span class="p">.</span><span 
class="nx">map</span><span class="p">(()</span> <span class="o">=&gt;</span> <span class="p">[]);</span> <span class="c1">// (2)</span> <span class="p">}</span> <span class="nx">_read</span><span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">initialized</span><span class="p">)</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">_startReading</span><span class="p">();</span> <span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span> <span class="p">}</span> <span class="kd">const</span> <span class="nx">bound</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">size</span><span class="p">,</span> <span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="nx">array</span> <span class="o">=&gt;</span> <span class="nx">array</span><span class="p">.</span><span class="nx">length</span><span class="p">));</span> <span class="c1">// (3)</span> <span class="k">if</span> <span class="p">(</span><span class="nx">bound</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span 
class="kc">false</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks</span> <span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="nx">a</span> <span class="o">=&gt;</span> <span class="nx">a</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">))</span> <span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">prev</span><span class="p">,</span> <span class="nx">curr</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1">// (4)</span> <span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[];</span> <span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">bound</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">previous</span> <span class="o">=</span> <span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">prev</span><span class="p">[</span><span class="nx">i</span><span class="p">]))</span> <span class="p">?</span> <span class="nx">prev</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="p">:</span> <span class="p">[</span><span class="nx">prev</span><span class="p">[</span><span class="nx">i</span><span class="p">]];</span> <span class="nx">result</span><span class="p">.</span><span class="nx">push</span><span class="p">([...</span><span class="nx">previous</span><span 
class="p">,</span> <span class="nx">curr</span><span class="p">[</span><span class="nx">i</span><span class="p">]]);</span> <span class="p">}</span> <span class="k">return</span> <span class="nx">result</span> <span class="p">})</span> <span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">pair</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">pair</span><span class="p">);</span> <span class="p">})</span> <span class="p">}</span> <span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">streams</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">stream</span><span class="p">,</span> <span class="nx">index</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="nx">stream</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">end</span><span class="dl">'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span><span class="o">++</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">===</span> <span class="k">this</span><span class="p">.</span><span class="nx">streams</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// (5)</span> <span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="kc">null</span><span 
class="p">);</span> <span class="p">}</span> <span class="p">});</span> <span class="nx">stream</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="kd">const</span> <span class="nx">streamChunks</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks</span><span class="p">[</span><span class="nx">index</span><span class="p">];</span> <span class="nx">streamChunks</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span> <span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="nx">streamChunks</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span> <span class="p">}</span> <span class="p">});</span> <span class="p">});</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <ol> <li>we now use the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/rest_parameters">“rest parameter” syntax</a> to accept any number of streams. 
We could arguably improve the signature further by having two mandatory streams and an optional rest parameter for extra streams.</li> <li>we just have to create an initial empty array of chunks for every stream</li> <li>we compute the current length of each chunk array and use the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax">“spread syntax”</a> to pass these lengths as separate arguments to <code class="language-plaintext highlighter-rouge">Math.min</code>.</li> <li>finally, after <code class="language-plaintext highlighter-rouge">Array#splice</code> extracts the first <code class="language-plaintext highlighter-rouge">bound</code> elements of each chunk array, these arrays are reduced into tuples, which are then published via <code class="language-plaintext highlighter-rouge">Readable#push</code></li> <li>the counter now needs to be compared with the dynamic number of upstream sources instead of the hardcoded 2 of the previous version</li> </ol> <p>Does the existing test still pass?</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test </span>2 specs, 0 failures </code></pre></div></div> <p>Yes!</p> <h2 id="one-more-thing">One More Thing</h2> <p>There is one (albeit very important) aspect of streams I deliberately did not mention here: <a href="https://nodejs.org/es/docs/guides/backpressuring-in-streams/">backpressure</a>. Backpressure happens when downstream streams cannot keep up with upstream streams. 
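</p> <p>The signal at the heart of backpressure can be observed directly. The following is a minimal sketch, assuming an object-mode <code class="language-plaintext highlighter-rouge">Readable</code> with an artificially low <code class="language-plaintext highlighter-rouge">highWaterMark</code> of 2 (a value picked purely for illustration) and no consumer attached: <code class="language-plaintext highlighter-rouge">Readable#push</code> returns <code class="language-plaintext highlighter-rouge">false</code> once the internal buffer has reached that mark, which is the cue for a producer to stop pushing until <code class="language-plaintext highlighter-rouge">_read</code> is called again.</p>

```javascript
const { Readable } = require("stream");

// Illustrative settings only: a tiny buffer makes the signal easy to see.
const source = new Readable({
  objectMode: true,
  highWaterMark: 2,
  read() {
    // Nothing to do: chunks are pushed manually below.
  }
});

// No consumer is attached, so pushed chunks accumulate in the buffer.
const firstPush = source.push("a");  // buffer holds 1 object, below the mark
const secondPush = source.push("b"); // buffer holds 2 objects, mark reached

console.log(firstPush, secondPush);
```

<p>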
Basically, upstream streams convey data faster than downstream streams can consume it.</p> <p>The good news is that <code class="language-plaintext highlighter-rouge">Readable#pipe</code> handles backpressure “for free” (and I assume <code class="language-plaintext highlighter-rouge">pipeline</code> as well).</p> <p>That being said, do our custom implementations of <code class="language-plaintext highlighter-rouge">zip</code> and <code class="language-plaintext highlighter-rouge">map</code> <a href="https://nodejs.org/en/docs/guides/backpressuring-in-streams/#rules-to-abide-by-when-implementing-custom-streams">handle backpressure correctly</a>?</p> <p>Spoiler alert: I’m afraid not.</p> <p>However, there will be a dedicated blog post about this, with updates to the initial implementations 😉</p> <h2 id="going-further">Going further</h2> <p>If you spot possible improvements (other than backpressure-related ones), please send a <a href="https://github.com/fbiville/fbiville.github.io">Pull Request</a> and/or reach out to me on <a href="https://twitter.com/fbiville">Twitter</a>. Here are a few references that helped me in my stream-learning journey and are worth sharing:</p> <ul> <li><a href="https://nodejs.org/api/stream.html">https://nodejs.org/api/stream.html</a>: the official documentation of Node.js streams, including implementation guides</li> <li><a href="https://github.com/nodejs/help/">https://github.com/nodejs/help/</a>: stuck with something? Open an issue in this repository and Node.js maintainers will help you!</li> <li><a href="https://www.w3.org/TR/streams-api/">https://www.w3.org/TR/streams-api/</a>: the W3C/WHATWG stream spec (it slightly differs from the Node.js stream API, but many concepts overlap)</li> <li><a href="https://v8.dev/blog">https://v8.dev/blog</a>: not directly related to streams, but this blog authored by V8 maintainers is a goldmine of information about how V8 works and about new JavaScript features</li> </ul>Florent BivilleI joined the riff team at Pivotal a year and a half ago. 
I have been working for more than a year on riff invokers. This probably deserves a blog post on its own, but invokers, in short, have the responsibility of invoking user-defined functions and exposing a way to send inputs and receive outputs. The riff invocation protocol formally defines the scope of such invokers.Hello Jekyll!2019-05-19T00:00:00+00:002019-05-19T00:00:00+00:00https://fbiville.github.io/2019/05/19/Hello_Jekyll_<p>After a few issues with <a href="https://github.com/HubPress/hubpress.io">Hubpress.io</a> (is it even maintained now?), I decided to migrate my blog again and move to Jekyll.</p> <p>The process was a mix of automatic (<a href="https://pandoc.org/">Pandoc</a>), semi-manual (helped with some good old Bash commands) and purely manual transformations. I even fixed old quirks from the previous Dotclear-&gt;Hubpress migration in the process.</p> <p>The theme used is, well,
 minimal but I do not really need a fancy blog. I got rid of the analytics. I also added a mystery page.</p> <p>Anyway, my blog is now live and usable again.</p> <p>Stay tuned for an announcement I have been wanting to make for a while!</p> <p>In the meantime, long live Jekyll!</p> <p><img src="/assets/img/jekyll.png" alt="Jekyll" /></p>Florent BivilleAfter a few issues with Hubpress.io (is it even maintained now?), I decided to migrate my blog again and move to Jekyll.hack.commit.push2019-05-19T00:00:00+00:002019-05-19T00:00:00+00:00https://fbiville.github.io/2019/05/19/hack.commit.push<p><a href="https://hack-commit-pu.sh">hack.commit.push</a> est un nouvel Ă©vĂ©nement gratuit autour des projets libres / open-source qui dĂ©barque bientĂŽt Ă  Paris !</p> <p>Avant d’entrer dans les dĂ©tails, je voulais revenir sur les motivations qui m’ont poussĂ© Ă  le co-crĂ©er.</p> <h1 id="tldr"><abbr title="Too Long; Didn't Read">TL;DR</abbr></h1> <p>Pas envie de tout lire ? Vous pouvez aller <a href="#save-the-date">droit Ă  l’essentiel</a> avec les infos Ă  retenir.</p> <h1 id="la-source--hackergarten-paris">La source : Hackergarten Paris</h1> <p>Le meetup <a href="https://www.meetup.com/Paris-Hackergarten/">Hackergarten Paris</a> rĂ©unit contributeur·trice·s de projets libres/open-source et personnes dĂ©sireuses de s’y mettre sans nĂ©cessairement savoir par oĂč commencer.</p> <p>Comme expliquĂ© dans <a href="https://fbiville.github.io/2016/09/20/Pourquoi-venir-au-Hackergarten.html">une publication prĂ©cĂ©dente</a>, l’avantage est multiple.</p> <p>Les nouveaux·elles venu·e·s sont accompagné·e·s en direct par une personne familiĂšre avec le code Ă  changer. 
Elles peuvent donc contribuer efficacement, prendre confiance et aussi dĂ©mystifier le travail accompli : <strong>vous aussi</strong> ĂȘtes capable de contribuer !</p> <p>CĂŽtĂ© project leads, une rĂ©cente enquĂȘte (en anglais) de l’excellente initiative <a href="https://opencollective.com/">Open Collective</a> rĂ©sume bien mieux que moi l’un des besoins que les meetups Hackergarten ont pour ambition de satisfaire.</p> <blockquote class="twitter-tweet" data-partner="tweetdeck"><p lang="en" dir="ltr">One of the core reasons why the <a href="https://twitter.com/hackcommitpush?ref_src=twsrc%5Etfw">@hackcommitpush</a> conference and the <a href="https://twitter.com/Hackergarten?ref_src=twsrc%5Etfw">@hackergarten</a> meetups exist is perfectly summed up in this <a href="https://twitter.com/opencollect?ref_src=twsrc%5Etfw">@opencollect</a> survey: <a href="https://t.co/qmOEIkKAdr">https://t.co/qmOEIkKAdr</a>. Worth reading and sharing!<br />Looking forward to welcoming contributors on June 15: <a href="https://t.co/skQvuterrd">https://t.co/skQvuterrd</a>! <a href="https://t.co/yYDiHtRHBO">pic.twitter.com/yYDiHtRHBO</a></p>&mdash; hack.commit.push (@hackcommitpush) <a href="https://twitter.com/hackcommitpush/status/1129324028735438848?ref_src=twsrc%5Etfw">May 17, 2019</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> <p>En effet, la plupart des projets libres/open-source sont maintenus par des personnes distribuĂ©es sur toute la planĂšte et la communication s’effectue habituellement par Ă©crans interposĂ©s.</p> <p>Le meetup Hackergarten Paris (<a href="http://hackergarten.net/">comme ceux d’autres villes</a>) permet donc de co-localiser les personnes motivĂ©es par un sujet commun et de les faire avancer dans un cadre dĂ©tendu et bienveillant. 
In short, it restores a social bond between contributors that is fading away.</p> <h1 id="hackcommitpush-dans-tout-ça-">Where does hack.commit.push fit in?</h1> <p>I have excellent memories of my first Hackergarten sessions, around 2011-2012. It was held regularly at <a href="https://xebia.com/">Xebia</a> and was organized by <a href="https://twitter.com/mathildelemee">Mathilde</a>, <a href="https://twitter.com/BriceDutheil">Brice</a> and <a href="https://twitter.com/elefevre">Éric</a>.</p> <p>However, for lack of time, the meetup ended up being held only during major conferences (Devoxx, etc.). With the permission of the three organizers mentioned above, I then took over the meetup (in late 2015, if memory serves) and relaunched its monthly edition (which is still running today: every last Tuesday of the month in <a href="https://www.meetup.com/Paris-Hackergarten/">Paris</a>).</p> <p>I even tried two or three times to run the Hackergarten during <a href="https://devoxx.fr">Devoxx France</a>, after its move to the Palais des CongrĂšs. 
For various reasons, it simply did not work: almost nobody joined the session.</p> <p>Beyond the organizational improvements Devoxx could make for the Hackergarten (the organizers already do a tremendous amount of work), I ended up wondering whether it really made sense to offer a Hackergarten to people who came first and foremost to attend talks and to network.</p> <p>This observation gave birth to the idea of <a href="https://hack-commit-pu.sh">hack.commit.push</a>: an event 100% dedicated to contributing to free / open-source projects, in the spirit of the existing Hackergartens!</p> <h1 id="save-the-date">Save the date</h1> <p>Organized by <a href="https://twitter.com/aalmiray">Andres</a>, <a href="https://twitter.com/hboutemy">HervĂ©</a>, <a href="https://twitter.com/mesirii">Michael</a> and yours truly, supported by contributors such as <a href="https://twitter.com/JessicaGantier">Jessica</a>, <a href="https://twitter.com/dyild">Dilek</a> and <a href="https://twitter.com/kehrlann">Daniel</a>, the first edition is <strong>FREE</strong>, takes place on <strong>June 15</strong> in <strong>Paris</strong> at the beautiful premises of <a href="http://www.techandcodefactory.fr/">Tech &amp; Code Factory</a>, and follows in the footsteps of the Hackergartens:</p> <ul> <li>all participants are welcome, whatever their software development skills and their experience with free / open-source projects</li> <li>whether it is improving documentation, design, fixing bugs or adding features, every contribution counts!</li> </ul> <p>For beginners, we intend to run introductory workshops in the morning (for example: an introduction to Git / GitHub) to help them contribute in the afternoon.</p> <p>We already have some great projects to offer you:</p> <ul> <li><a href="https://maven.apache.org/">Apache Maven</a></li> <li><a href="https://neo4j.com/">Neo4j</a></li> <li><a href="https://gradle.org/">Gradle</a></li> <li><a href="https://projectriff.io/">riff</a></li> <li><a href="https://kubernetes.io/docs/home/">Kubernetes FR docs</a> &lt;- many thanks to <a href="https://twitter.com/remyleone">RĂ©my Leone</a> from <a href="https://www.scaleway.com/en/betas/">Scaleway</a>, by the way</li> <li>and many more!</li> </ul> <p>Wait no longer, <a href="https://hack-commit-pu.sh/">sign up</a> and spread the word!</p> <p>Want to get more involved? Read on ↓</p> <h1 id="je-veux-mimpliquer-">I want to get involved!</h1> <h2 id="je-veux-proposer-un-projet">I want to propose a project</h2> <p>Your mission, should you choose to accept it, is to kindly guide people of varying skill levels through their first contributions to your free/open-source project.</p> <p>Your challenge will be to balance the explanation time needed to start contributing (you want to maximize contributors’ participation) with the actual contribution time (you can define prerequisites to be more efficient, but at the risk of excluding too many participants from the outset).</p> <p>Still tempted? 
Then feel free to send us, preferably in English, a description of your project and of the contributions that can be made in a day (along with any prerequisites for participants): <code class="language-plaintext highlighter-rouge">organization AT hack-commit-pu.sh</code>.</p> <h2 id="ma-sociĂ©tĂ©-veut-sponsoriser">My company wants to sponsor</h2> <p>We do have various costs to cover, such as the day’s buffet, the closing cocktail and, budget permitting, possibly other services.</p> <p>For the record, we are set up as a <a href="https://paris-springers.github.io/">non-profit association</a>.</p> <p>Feel free to contact us, preferably in English, at <code class="language-plaintext highlighter-rouge">organization AT hack-commit-pu.sh</code> so that we can send you our sponsorship brochure.</p> <h2 id="je-veux-animer-un-atelier-dintroduction">I want to run an introductory workshop</h2> <p>We really want less experienced people to be able to take part as well. 
The goal of the introductory workshops is to cover, in two hours, the fundamentals of technologies useful to the various projects represented at the event.</p> <p>The most obvious candidate is Git / GitHub.</p> <p>Feel free to contact us, preferably in English, at <code class="language-plaintext highlighter-rouge">organization AT hack-commit-pu.sh</code> if this opportunity interests you.</p> <h2 id="je-veux-ĂȘtre-bĂ©nĂ©vole">I want to volunteer</h2> <p>If you want to join the adventure, feel free to contact us, preferably in English, at <code class="language-plaintext highlighter-rouge">organization AT hack-commit-pu.sh</code>.</p> <p>If you “only” want to help on the day itself, here is an overview of what you can do:</p> <ul> <li>welcoming sponsors / project leads</li> <li>registering participants</li> <li>announcing breaks</li> <li>helping with the clean-up at the end of the day</li> </ul> <p>None of this prevents you from taking part in the event itself (you will just have a bit less time to contribute)!</p>Florent Bivillehack.commit.push is a new free event about free / open-source projects, coming soon to Paris!Why Come To The Hackergarten2016-09-20T00:00:00+00:002016-09-20T00:00:00+00:00https://fbiville.github.io/2016/09/20/Pourquoi-venir-au-Hackergarten<p>Let it be known: open-source software is everywhere. Chances are that you use it directly, or even develop some, in your professional activity. It is undeniable that you benefit from it in your daily life, even indirectly.</p> <h1 id="hackers-we-need-you">Hackers: we need you!</h1> <p>You may even have reported a bug, or submitted a fix, to an open-source project that you use at work. 
But outside of these rare occasions, you have never found the time to contribute on a more lasting basis.</p> <p>And yet, here is a goal to be proud of! Becoming one of the main committers of a visible project (or one about to become visible) can make a real difference on your rĂ©sumĂ© and in your career.</p> <p>This obviously does not happen overnight, but every first contribution matters. It can be quite hard to dive into an unknown code base without outside help or a precise goal.</p> <p>Paris Hackergarten is here for you!</p> <p>It aims to bring together in the same room, for one evening (once a month), seasoned committers (a.k.a. mentors) and motivated contributors (a.k.a. hackers)!</p> <p>Everyone benefits:</p> <ul> <li> <p>the mentor sees the project move forward thanks to the contributions</p> </li> <li> <p>the hacker gets familiar with the code base with the mentor’s help, and sends first contributions within hours rather than days</p> </li> </ul> <p>During the last session, a pair managed to submit a <a href="https://github.com/apache/maven-shared/pull/13">pull request</a> to the Apache Maven project! Yet they started the evening with no prior knowledge of the code base. Thanks to HervĂ© for the mentoring, by the way!</p> <p>All hackers are welcome! Do not censor yourself by thinking you are not good enough, it is not true! 
;-)</p> <h1 id="appel-aux-mentors">Call for mentors</h1> <p>Would you like to present your project and attract new contributions?</p> <p>To do so, two rules apply:</p> <ol> <li> <p>prepare a two-minute presentation to introduce and "sell" your project to participants</p> </li> <li> <p>have a set of well-defined tasks, ideally achievable in one evening</p> </li> </ol> <p>As for the technology used: no constraints!</p> <p>I want to stress this point. The meetup may currently look like it is reserved for Java developers, but that is not the case!</p> <p>A Paris Hackergarten session may even soon be dedicated to iOS development, stay tuned! ;-)</p> <h1 id="Ă -vos-calendriers-">Mark your calendars!</h1> <p>We strive to hold the <a href="http://www.meetup.com/Paris-Hackergarten/">Paris Hackergarten</a> on the last Tuesday of every month, at Xebia’s offices.</p> <p>The <a href="http://www.meetup.com/Paris-Hackergarten/events/231855753/">next one</a> will therefore take place on September 27, and I hope to see you there!</p>Florent BivilleLet it be known: open-source software is everywhere. Chances are that you use it directly, or even develop some, in your professional activity. It is undeniable that you benefit from it in your daily life, even indirectly.Rant: The Teletubbies “Documentation” Pitfall2016-09-19T00:00:00+00:002016-09-19T00:00:00+00:00https://fbiville.github.io/2016/09/19/Rant-The-Teletubbies-Documentation-Pitfall<h1 id="disclaimer">Disclaimer</h1> <p>I am not Uncle Bob’s nephew, but if you have already read Clean Code, chances are you will not learn much from this post.</p> <h1 id="typical-example">Typical example</h1> <p>Let me talk about a coding practice that I find profoundly disturbing. 
Take this code for instance:</p> <pre><code class="language-{.java}">public SomeResult computeResult(SomeParameter parameter) {
    // call nice service to fetch foo
    Foo foo = niceService.fetchFoo(parameter);
    return new SomeResult(foo);
}
</code></pre> <p>Basically, we make a trivial call to a service and use what it returns to instantiate the result we are interested in.</p> <p>Do we need the comment, though? Obviously, we don’t!</p> <p>We are just adding noise!</p> <p>That’s why I call it Teletubbies documentation.</p> <h1 id="teletu-what">Teletu-what?</h1> <p>Teletubbies, as you probably already know, is a TV show for very young children, created by the BBC.</p> <p>If you know the show, you also know that whenever a Teletubbies character does something, the following happens:</p> <ol> <li> <p>the character announces what it intends to do</p> </li> <li> <p>the voice-over paraphrases what the character just said</p> </li> <li> <p>the character does it</p> </li> <li> <p>optionally back to step 1</p> </li> </ol> <p>This makes sense for very young children: part of education is based on repetition.</p> <h1 id="back-to-our-example">Back to our example</h1> <p>So whenever I encounter a snippet of code like the one above, I immediately hear this annoying voice-over that just repeats something we already know.</p> <p>It is annoying because, well, we are not very young children.</p> <p>What’s the big deal, you might object?</p> <p>Well, comments like these can <strong>easily</strong> get out of sync. In the worst-case scenario, they become misleading.</p> <p>It leads to situations where you have to compare the current code with the outdated comment and you cannot really be sure which one describes what the behavior <strong>should</strong> be.</p> <p>Comments don’t run, they are just an informal bunch of text and cannot be changed automatically (at least, not in a 100% reliable way). 
Their risk of becoming obsolete is therefore higher.</p> <p>To rephrase it, comments like this are part of the problem, not the solution.</p> <p>Inline comments are just a liability.</p> <p>The worst part is that they often come in bunches:</p> <pre><code class="language-{.java}">public SomeResult computeResult(SomeParameter parameter) {
    // call nice service to fetch foo
    Foo foo = niceService.fetchFoo(parameter);
    // [...] 200 lines with comments+code like that
    // hilarity ensues... not
    return new SomeResult(foo, ...);
}
</code></pre> <p>Indeed, the bad side effect of this kind of brain-dead comment is that it <strong>prevents</strong> the original authors from asking themselves: is the code readable enough this way? Am I thinking this through? How can I make the code more self-explanatory?</p> <p>If you get used to this kind of comment, you will most likely focus your reading on them and live in the illusion that the method is readable and well-documented.</p> <p>I have got some bad news for you: 200 lines of code for a method are NOT readable at all, no matter how much obsolete poetry you stick in there.</p> <p>As a general rule of thumb, is it worth writing something down if it only took you 10 seconds to come up with?</p> <h1 id="a-not-so-noisy-example">A not-so-noisy example</h1> <p>Let’s move on to a more interesting example.</p> <p>It’s not that the first example does not happen frequently, but there are some situations, like the following, that involve a bit more than pure noise.</p> <pre><code class="language-{.java}">public SomeResult computeResult(SomeParameter parameter) {
    /*
     * call nice service to fetch foo because
     * some contextual reasons
     *
     * fetchFoo may throw in theory but will not
     * because the parameter is always valid in
     * this particular usecase [...], so no try-catch,
     * YOLO
     */
    Foo foo = niceService.fetchFoo(parameter);
    return new SomeResult(foo);
}
</code></pre> <p>“Ah! This comment is useful! 
It explains the implementation rationale!”, you may say.</p> <p>While there is some value in these pieces of information, they just do not belong there.</p> <p>Let me elaborate.</p> <h1 id="small-detour-back-to-basics">Small detour: back to basics</h1> <p>As you already know, in many programming languages, method signatures look like:</p> <pre><code class="language-{.java}">public SomeResult computeResult(SomeParameter parameter) </code></pre> <p>Ideally, the signature should be explicit enough (especially with well-defined types, parametricity FTW) to know what the method does. How the method does it should be relevant only if you have to change something there.</p> <p>Everything that follows between curly braces is about <strong>implementation</strong> details.</p> <h1 id="back-to-the-example-again">Back to the example again</h1> <p>However, I would argue that the two pieces of information encoded as an inline comment above are NOT implementation details, yet they live in the implementation section.</p> <p>What are these comment sections about?</p> <ol> <li> <p>the first part describes the intent behind the implementation (or at least part of it)</p> </li> <li> <p>the second and last part describes (part of) the observable behavior of the method</p> </li> </ol> <h1 id="intent-documentation">Intent documentation</h1> <p>Intents are very contextual and temporal.</p> <p>Decisions, no matter how small, are taken every day and guide the way we implement things.</p> <p>These decisions are mostly influenced by temporal factors: the assumptions made at the time may not hold at all anymore in 6 months, 1 year

</p> <p>Temporal documentation.</p> <p><strong>TEMPORAL</strong> documentation.</p> <p>It rings a bell, somehow.</p> <p>S-C-M! Source Control Management tools like Git, Mercurial and friends.</p> <p>They play an important part in documentation. Not only do they intrinsically describe what has changed and when, they should also describe <strong>why</strong> the changes were made.</p> <p>That’s what <strong>commit messages</strong> are for!</p> <p>And if you start thinking this way, there will be an additional benefit: you will keep your commits as small and focused as possible. If the commit is too big, there is no way you can explain all the important changes you made ;-)</p> <p>And if you start to care enough about your changelog, you will get nice readable release notes for free!</p> <h1 id="observable-behavior-documentation">Observable behavior documentation</h1> <p>If what you describe is part of the observable behavior of the scope you are modifying, then it is clearly about the contract you implicitly sign between the code you are implementing and its callers.</p> <p>The documentation is about the API. API is just a clever name for a set of accessible signatures. 
It is not an implementation detail at all, so it should be near the method signature itself:</p> <pre><code class="language-{.java}">/**
 * *describes the nominal observable behaviour here [...]*
 *
 * fetchFoo may throw in theory but will not
 * because the parameter is always valid in this
 * particular usecase [...], so no try-catch, YOLO
 */
public SomeResult computeResult(SomeParameter parameter) {
    Foo foo = niceService.fetchFoo(parameter);
    return new SomeResult(foo);
}
</code></pre> <h1 id="going-further">Going further</h1> <p>You could even rewrite the method like this:</p> <pre><code class="language-{.java}">/**
 * *describes the nominal observable behaviour here [...]*
 */
public SomeResult computeResult(SomeParameter parameter) {
    try {
        Foo foo = niceService.fetchFoo(parameter);
        return new SomeResult(foo);
    } catch (MyNiceServiceException e) {
        throw new AssertionError("Should not happen", e);
    }
}
</code></pre> <p>Now the assumptions are even more explicit. That even opens an interesting discussion about the virtues of <a href="https://www.youtube.com/watch?v=57P86oZXjXs">failing fast</a> :-)</p> <p>One could argue we could do even better. Ideally, method signatures should be sufficient to tell what the method is doing: <a href="http://data.tmorris.net/talks/yow-west-2016/1d388b6263e7cbeedfbea224997648daa1d7862d/parametricity.pdf">parametricity</a> FTW! Hoogle.com is probably one of the best illustrations for this.</p> <p>That requires discipline (especially with languages such as Java, C# et al), but is not impossible to achieve: try to minimize and contain side effects, forego nulls

 and then types could convey a lot more useful information!</p> <p>Yet another interesting discussion!</p> <h1 id="the-end">The end</h1> <p>As you can see, caring about documentation is a gateway drug to better software, clearer releases and happier collaborators.</p> <p>I personally write comments less than 1% of the time I write code. This happens when there is a tiny local expression that may seem obscure and there is no simple way around it.</p> <p>For the 99+%, there are almost always better places to write the information you want to convey:</p> <ul> <li> <p>the code itself, it should answer <strong>WHAT</strong> it does, without ambiguity, else just refactor it (extract meaningful methods, rename, split expressions

 the IDE is your friend). This is the material that decays the least, rely on this as much as you can!</p> </li> <li> <p>the *-doc (e.g. Javadoc, Csharpdoc): the information is about the observable behavior of the section you are altering</p> </li> <li> <p>the intent: that should justify the commit you are about to push</p> </li> </ul> <p>Inline comments are (99+%) dead! Long live inline comments!</p>Florent BivilleDisclaimerCompilers Hate Him! Discover This One Weird Trick with Neo4j Stored Procedures2016-07-12T00:00:00+00:002016-07-12T00:00:00+00:00https://fbiville.github.io/2016/07/12/Compilers-hate-him-Discover-this-one-weird-trick-with-Neo4j-stored-procedures<p>As you probably already know, Neo4j 3.0 finally comes with <a href="https://neo4j.com/docs/java-reference/current/#_calling_procedure">stored procedures</a> (let’s call them sprocs from now on).</p> <p>The cool thing about this is you can directly interact with sprocs in Cypher, as <a href="https://twitter.com/mesirii">Michael Hunger</a> explains in this <a href="https://neo4j.com/blog/intro-user-defined-procedures-apoc/">blog post</a>.</p> <h1 id="writing-stored-procedures">Writing stored procedures</h1> <p>While preparing my Neo4j introduction talk for the latest <a href="https://www.facebook.com/GoCriteo/photos/pcb.1045385882181102/1045385698847787/?type=3">Criteo summit</a> (we’re <a href="http://www.criteo.com/careers/#careers-browser">hiring</a>!), I started playing around with sprocs.</p> <p>The process is quite simple:</p> <ol> <li> <p>You write some code, annotate it</p> </li> <li> <p>test it with the test harness</p> </li> <li> <p>package the JAR and deploy it to your Neo4j instance (<code class="language-plaintext highlighter-rouge">plugins/</code>)!</p> </li> </ol> <p>Actually, step 3 may repeat itself quite a few times: Neo4j sprocs must comply with a few rules before your Neo4j server agrees to deploy them.</p> <h1 id="sproc-rules">Sproc rules</h1> <p>The rules are detailed in <code 
class="language-plaintext highlighter-rouge">@org.neo4j.procedure.Procedure</code> <a href="https://github.com/neo4j/neo4j/blob/3.0/community/kernel/src/main/java/org/neo4j/procedure/Procedure.java#L31">javadoc</a>, but we can summarize them as follows:</p> <ul> <li> <p>a sproc is a method annotated with <code class="language-plaintext highlighter-rouge">@org.neo4j.procedure.Procedure</code></p> </li> <li> <p>it must return a <a href="https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html"><code class="language-plaintext highlighter-rouge">java.util.stream.Stream&lt;T&gt;</code></a> where T is a user-defined record type</p> </li> <li> <p>the record type must define public fields</p> </li> <li> <p>these can only be of restricted types</p> </li> <li> <p>if the sproc accepts parameters, they all must be annotated with <a href="https://github.com/neo4j/neo4j/blob/3.0/community/kernel/src/main/java/org/neo4j/procedure/Name.java"><code class="language-plaintext highlighter-rouge">@org.neo4j.procedure.Name</code></a></p> </li> <li> <p>parameters can only be of specific types</p> </li> <li> <p>the procedure name must be unique (name = package name+method name)</p> </li> <li> <p>injectable types (<code class="language-plaintext highlighter-rouge">GraphDatabaseService</code> et al) must target public non-static, non-final, <a href="https://github.com/neo4j/neo4j/blob/3.0/community/kernel/src/main/java/org/neo4j/procedure/Context.java"><code class="language-plaintext highlighter-rouge">@Context</code>-annotated</a> fields</p> </li> </ul> <p>Fortunately, folks at <a href="https://neo4j.com/company/">Neo Technology</a> have done a wonderful job at error reporting. 
Neo4j fails fast if any of the rules is violated and gives a detailed error message.</p> <p>Here is an example with Neo4j 3.0.3 and a <strong>failing</strong> attempt to deploy the following sproc:</p> <pre><code class="language-{.java}">@Procedure
public Stream&lt;MyRecord&gt; doSomething(Map&lt;String, Integer&gt; value) {
    // [...]
}
</code></pre> <p>The following error will be reported (see <code class="language-plaintext highlighter-rouge">logs/neo4j.log</code>):</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Caused by: org.neo4j.kernel.api.exceptions.ProcedureException: Argument at position 0 in method `doSomething` is missing an `@Name` annotation. Please add the annotation, recompile the class and try again. </code></pre></div></div> <p>Nice error message! Just add the missing <code class="language-plaintext highlighter-rouge">@Name</code> on the only parameter, re-compile, package and deploy the JAR again, restart Neo4j and you’re done!</p> <h1 id="can-we-do-better">Can we do better?</h1> <p>The previous example is quite trivial, but this back-and-forth could potentially be repeated many times, especially when one is not very familiar with sprocs.</p> <p>Fortunately for us, most of the errors can be caught at compile time.</p> <h1 id="eurekaannotation-processing-ftw">@Eureka("annotation processing FTW!")</h1> <p>Annotations have been around in Java since the end of 2004 (v1.5) and came together with <code class="language-plaintext highlighter-rouge">apt</code> (now built into <code class="language-plaintext highlighter-rouge">javac</code>), the annotation processing tool.</p> <p>What the latter does, in brief (for the long version, read the <a href="https://www.jcp.org/en/jsr/detail?id=269">spec</a>), is allow user-defined code to introspect a Java program at compile-time (original paper <a href="http://www.bracha.org/mirrors.pdf">here</a>) and possibly:</p> <ul> <li> <p>issue compilation 
notices/warnings/errors</p> </li> <li> <p>generate static, source and/or bytecode files</p> </li> </ul> <p>(By the way, this means exceptions can be raised at compile-time too!)</p> <p>Based on this, I decided to write a little annotation processor on my way back from the Criteo summit (did I mention we are <a href="http://www.criteo.com/careers/#careers-browser">hiring</a>?).</p> <p><a href="https://github.com/fbiville/neo4j-sproc-compiler">neo4j-sproc-compiler</a> was born. And it’s <a href="https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/18fe85a3712aa84696cc4dedaf0db659a63e3e7b/pom.xml#L72">used</a>!</p> <p>If Michael is happy, I am happy:</p> <p><img src="https://raw.githubusercontent.com/fbiville/fbiville.github.io/master/images/michael-sproc-compiler-feedback.png" alt="michael sproc compiler feedback" /></p> <p>(I swear it’s not photoshopped, see the #apoc channel, 1st of July 2016, in the Neo4j-Users <a href="https://neo4j-users.slack.com">Slack</a>).</p> <h1 id="neo4j-sproc-compiler-in-action">neo4j-sproc-compiler in action</h1> <p>While the following screencast features Maven, the annotation processor is actually agnostic of any build tool. You can use any build tool you want, or <code class="language-plaintext highlighter-rouge">javac</code> directly if that floats your boat!</p> <p><a href="https://asciinema.org/a/79379"><img src="https://asciinema.org/a/79379.svg" alt="asciicast" /></a></p> <h1 id="conclusion">Conclusion</h1> <p>Be cautious, most but <strong>not</strong> all checks can be performed at compile time. 
You’ll still need to write some tests and monitor your deploys!</p> <p>Hopefully, this little utility that I wrote will shorten your development feedback loop and make your stored procedures harder, better, stronger and faster.</p>Florent BivilleAs you probably already know, Neo4j 3.0 finally comes with stored procedures (let’s call them sprocs from now on).New Blog!2015-05-03T00:00:00+00:002015-05-03T00:00:00+00:00https://fbiville.github.io/2015/05/03/New-blog<p>Getting rid of Dotclear was long overdue. It was impractical at best, and I wasted way too much time polishing the contents so that they would not render too badly.</p> <h1 id="whats-next">What’s next?</h1> <p>I need to automate the migration to HubPress, so it will take some more time before all my blog posts show up here. For now, <a href="http://florent.biville.net">http://florent.biville.net</a> is still serving my old blog.</p> <p>It’s just a matter of time before everything is fully set up ;)</p>Florent BivilleGetting rid of Dotclear was long overdue. It was impractical at best, and I wasted way too much time polishing the contents so that they would not render too badly.Summer Transfer2014-10-05T00:00:00+00:002014-10-05T00:00:00+00:00https://fbiville.github.io/2014/10/05/Transfert-estival<h1 id="mais-pourquoi-">But why?!</h1> <p>To get a dictionary every year, of course! (Sorry, my GIMP skills are still limited.)</p> <p><img src="/assets/img/rtfv_m.png" alt="Read The F****** Vidal" /></p> <p>More seriously, leaving Lateral Thoughts, a company where I was a partner and enjoyed a great deal of autonomy, may raise questions. Lateral Thoughts is an ideal place for anyone who wants to become a freelancer. You can even be an employee there with the same benefits (with lower net pay, obviously). 
But here is the thing: while freelancing has been all the rage in our "industry" for several years, my current path is moving away from it.</p> <h1 id="le-dĂ©clencheur">The trigger</h1> <p>A few months ago, I was contacted by a Google recruiter. The pleasant surprise was followed by enormous stress and interview preparation, up to the final step in June: the day of interviews in Paris. Although I was ultimately not selected by the final hiring committee at this last stage, I only take positives away from it. As a side note, when I see people criticizing interviews where you are asked to code, it makes me chuckle. Try the Google marathon and then we can talk :) But I digress. As I was saying, this intense experience taught me a great deal: reading Google’s publications, catching a glimpse of the company for a few hours, talking with a few engineers
 all of it strengthened my conviction on one point: I want to be a developer, and nothing else. It is, in a way, the essence of our craft as I see it that hit me right in the face: technology in the service of a need. And when I say technology, I am not talking about the latest trendy framework or the latest supposedly revolutionary <a href="https://developer.apple.com/swift/">language</a>. I am rather thinking of algorithms and design (not the Omniscient Architect’s kind, mind you). Google’s engineers did not create <a href="http://cracking8hacking.com/cracking-hacking/Ebooks/Misc/pdf/The%20Google%20filesystem.pdf">Google FileSystem</a> for fun or to talk about it at conferences, but because the need was pressing. Going back to fundamentals therefore rekindled my interest in development and made me realize the distance between my daily work, the microcosm I operate in, and the daily work on display in a company of that scale.</p> <h1 id="et-pourquoi-pas-freelance-">So why not freelance?</h1> <p>Working as a quasi-freelancer taught me many things. It could actually be summed up in one sentence: you only get what you go after. A good assignment? Find it yourself (or turn the one you are on into one). Unhappy with this or that situation? Act or accept. Not everything is rosy either. Within a group of freelancers or quasi-freelancers such as Lateral Thoughts, everyone, quite understandably, follows their own path and brings forward the projects they want to develop. Things get complicated when it comes to pooling efforts. There is no magic: if you need more brains to co-build your idea, you have to convince people. It is a fair process, but a wearing and sometimes demotivating one. Who am I making all this effort for? For myself? For Lateral Thoughts? 
One of the answers is: “by putting yourself out there, you gain visibility, and that is also a win for LT”. I actually followed this precept for two years, around Neo4j in particular: from Paris to Istanbul via Geneva. Rewarding, but tiring too. In the end, these Google interviews gave me back a goal that goes beyond my own navel. I got close to one of the Web giants, a company that makes (me) dream and that I want to contribute to. (Yes, I own my starry-eyed side.) In short, Google simply pointed me to the right path. And that path does not go through freelancing.</p> <h1 id="larrivĂ©e-Ă -vidal">Joining Vidal</h1> <p>I had already worked with Vidal as a consultant and knew its technical challenges. The working environment of our self-organized team is conducive to continuous improvement, and I fully intend to make good use of it. What motivated me to join them as an in-house employee is the prospect of being able to focus on what we do best and become beyond reproach (in order of importance):</p> <ul> <li> <p>owning our software, from its creation to production monitoring, including testing</p> </li> <li> <p>becoming faster and faster at maintaining these products</p> </li> <li> <p>daring to try choices that go against the grain</p> </li> </ul> <p>There is no shortage of ideas, nor of general motivation. I really care about our "Software" team improving collectively. 
Nicolas Martignole wrote about <a href="http://www.touilleur-express.fr/2010/03/19/rencontre-avec-des-developpeurs-chez-vidal-software/">Vidal's "Software" team in 2010</a>; roll on 2015!</p>Florent BivilleBut why?!Creating a Java application with embedded Neo4j2014-06-17T00:00:00+00:002014-06-17T00:00:00+00:00https://fbiville.github.io/2014/06/17/Creer-une-application-Java-avec-Neo4j-embarque<h1 id="un-long-discours-">A long speech?</h1> <p>After knocking you out with <a href="/?post/2014/06/09/Neo4j-sous-le-capot">my previous article</a> on Neo4j's internal storage and its scalability, I will keep things short today. Indeed, rather than spending a lot of effort explaining good practices for using Neo4j in Java projects, why not create the <a href="https://github.com/fbiville/maven-embedded-neo4j-archetype">Maven archetype</a> that does the job?</p> <h1 id="archetype-maven-">Maven archetype?</h1> <p>Yes, I know, some of you cannot stand Maven.</p> <p>I know that a few very specific archetypes exist around Neo4j for other build tools, such as <a href="https://github.com/sarmbruster/unmanaged-extension-archetype">the one</a> by <a href="https://twitter.com/darthvader42">Stefan Armbruster</a> for <a href="http://www.gradle.org/">Gradle</a>. However, I have not come across any archetype equivalent to the one I am about to present.</p> <p>If you think you have found one, feel free to <a href="https://www.twitter.com/fbiville">contact me</a> so that I can list it here.</p> <h2 id="physiologie">Anatomy</h2> <p>Let's now look at the <a href="https://github.com/fbiville/maven-embedded-neo4j-archetype">archetype</a> created for the occasion.</p> <p>It generates projects that bundle:</p> <ul> <li> <p>neo4j</p> </li> <li> <p>neo4j-kernel (test-jar classifier) for integration tests</p> </li> <li> <p>junit</p> </li> <li> <p>assertj-core</p> </li> </ul> <p><a href="http://joel-costigliola.github.io/assertj/assertj-neo4j.html">assertj-neo4j</a> is not mature enough yet; I will try to improve it before offering it through the archetype.</p> <h2 id="contenu">Contents</h2> <p>If you follow <a href="https://github.com/fbiville/maven-embedded-neo4j-archetype/blob/master/README.md">the instructions</a>, you will end up with a very simple project:</p> <ul> <li>which inserts data with <a href="http://docs.neo4j.org/chunked/stable/cypher-query-lang.html">Cypher</a></li> <li>which reads data through the <a href="http://docs.neo4j.org/chunked/stable/tutorial-traversal-java-api.html">Java traversal framework</a></li> <li>which uses EmbeddedDatabaseRule for <a href="http://junit.org/">JUnit</a> tests (this <a href="https://github.com/junit-team/junit/wiki/Rules">JUnit rule</a> wraps the use of Neo4j for integration tests through its <a 
href="http://docs.neo4j.org/chunked/stable/tutorials-java-unit-testing.html">dedicated implementation</a>)</li> </ul> <h1 id="conclusion">Conclusion</h1> <p>Another Maven archetype should follow for Neo4j's REST interface.  The archetype described here will soon be released on Maven Central. In the meantime, you can already use it and get started with Neo4j on a sound footing!</p>Florent BivilleA long speech?Neo4j Under The Hood2014-06-09T00:00:00+00:002014-06-09T00:00:00+00:00https://fbiville.github.io/2014/06/09/Neo4j-sous-le-capot<h1 id="3615-ma-vie">3615-ma-vie</h1> <p>You will tell me that everything that follows is just a pack of poor excuses, but I do have a few extenuating circumstances for my blog's inactivity (and my absence from the Paris scene: I have not given a talk there in 6 months).</p> <p>On a personal level first, I am happy to announce that a lovely wedding ring now adorns my left ring finger :-)</p> <p>Professionally, although I have been "publicly" absent, a lot has happened: my first <a href="http://www.lateral-thoughts.com/formation-neo4j">Neo4j training</a> took place, I got to work with more clients, and some projects around Neo4j are still taking shape (stay tuned!).</p> <p>By the way, if you would like me to come and talk about Neo4j at your User Group, feel free to contact me (on <a href="https://twitter.com/fbiville">Twitter</a> for instance).</p> <h1 id="back-to-business--parlons-de-neo">Back to business: let's talk about Neo</h1> <h2 id="base-de-données-orientée-graphe-">Graph database?</h2> <p><a href="http://www.neo4j.org/">Neo4j</a>, as you will have gathered, is a graph database. 
But what does "graph database" mean exactly?</p> <p>To quote <a href="http://fr.wikipedia.org/wiki/Base_de_donn%C3%A9es_orient%C3%A9e_graphe">Wikipedia</a>, a <em>graph database</em> is a database that uses nodes, relationships and properties to represent and store data.</p> <p>This definition may seem innocuous, but note that it contains two verbs (and not just one):</p> <ul> <li> <p>represent</p> </li> <li> <p>store</p> </li> </ul> <p>In more technical terms, a graph database therefore offers an API ("represent") that exposes graph-specific vocabulary. Its on-disk records ("store") must also be laid out according to graph structures.</p> <p>This second point is fundamental.</p> <p>Take one of Neo4j's competitors: <a href="http://thinkaurelius.github.io/titan/">Titan</a>.</p> <p>Right on its home page, you can read:</p> <blockquote> <p>Titan is a scalable graph database [...]</p> <p>Support for various storage backends:</p> <ul> <li> <p>Apache Cassandra</p> </li> <li> <p>Apache HBase</p> </li> <li> <p>Oracle BerkeleyDB</p> </li> <li> <p>Akiban Persistit</p> </li> </ul> </blockquote> <p>This contradicts the definition I gave you above.</p> <p>If Titan were a graph database, that would imply that Cassandra, HBase, BerkeleyDB and Persistit are graph databases too. Yet, until proven otherwise, that is not the case :)</p> <p>Titan offers a graph-oriented API <strong>layer</strong> on top, delegating persistence to distributed stores. That does not make it a graph database, just as <a href="https://giraph.apache.org/">Apache Giraph</a> is "only" a graph-oriented computation API.</p> <p>"Why does it matter?", you may ask.</p> <p>Well, a graph database, for all its many advantages, is intrinsically hard to distribute, as we will see throughout this article. Looking at the lowest layers of a typical graph database such as Neo4j is how you will understand the trade-offs that being a graph database implies.</p> <h2 id="des-liens-et-des-chaînes">Links and chains</h2> <p>Neo4j, following the <a href="https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model">Property Graph</a> model, structures data as nodes connected by relationships. 
</p> <ul> <li> <p>Each of these entities can be given a set of properties (a key [String], a value [integer, String, array of primitives]).</p> </li> <li> <p>Each relationship must carry a type (for example: a "FOLLOWS" or "IS_FRIEND_WITH" relationship).</p> </li> <li> <p>Since version 2.0, each node carries an optional (but strongly recommended) notion called "label" (a node has 0 to n labels).</p> </li> </ul> <p>Obviously, all this information is persisted on disk.</p> <p>A simple <code class="language-plaintext highlighter-rouge">ls /path/to/neo/data/graph.db</code> shows, besides the Lucene index files (legacy: <code class="language-plaintext highlighter-rouge">index</code> directory, new: <code class="language-plaintext highlighter-rouge">schema</code> directory) and the transaction logs, the various .db files:</p> <ul> <li> <p><code class="language-plaintext highlighter-rouge">neostore.labeltokenstore.db</code></p> </li> <li> <p><code class="language-plaintext highlighter-rouge">neostore.nodestore.db</code></p> </li> <li> <p><code class="language-plaintext highlighter-rouge">neostore.propertystore.db</code></p> </li> <li> <p><code class="language-plaintext highlighter-rouge">neostore.relationshipstore.db</code></p> </li> <li> <p><code class="language-plaintext highlighter-rouge">neostore.schemastore.db</code></p> </li> </ul> <p>Each of them is a "store" dedicated to a particular type of data. Let's review them one by one, starting with the newcomers. 
</p> <p>Note that the information below should be taken with caution: the <a href="http://neo4j.com/blog/the-neo4j-2-1-0-milestone-1-release-import-and-dense-nodes/">recent work</a> on dense nodes has probably changed the format of the files described.</p> <h3 id="labeltokenstore"><code class="language-plaintext highlighter-rouge">LabelTokenStore</code></h3> <p>As you might guess, this file (or rather these files) holds the label records. It therefore did not exist before the 2.0 release.</p> <p>These records consist of:</p> <ul> <li> <p>an internal ID (typed as an int in Java, so up to 2³¹ - 1 [except that Java 8 lets you treat ints as unsigned, from 0 to 2³² - 1, but I digress]). Each of these IDs is referenced in the neostore.labeltokenstore.db.id file.</p> </li> <li> <p>and a name (precisely the value you assign to the label: "Person" for the Person label), itself uniquely identified (neostore.labeltokenstore.db.names.id) and stored in neostore.labeltokenstore.db.names</p> </li> </ul> <p>So the neostore.labeltokenstore.db file actually only contains references to the internal IDs and names, which are stored "on the side". Note that this split into <code class="language-plaintext highlighter-rouge">neostore.db.*</code> files applies to all the other stores as well.</p> <h3 id="schemastore"><code class="language-plaintext highlighter-rouge">SchemaStore</code></h3> <p>With labels came the notion of schema. Don't get carried away: Neo4j has not become a normalized database. It is better described as a <em>schema-optional</em> database.</p> <p>Labels make it possible to group semantically similar nodes (which therefore depends entirely on the business domain), but nothing prevents those nodes from being completely heterogeneous. 
For example, two nodes can share the Person label while carrying different properties: say, hair color for one and shoe size for the other.</p> <p>Now that labels are available, we can even define constraints on them: uniqueness constraints, for instance. These constraints are actually called <em>rules</em>, and together they form the schema I mentioned. This support is fairly recent and the underlying structure is still very simple. Indeed, a rule consists of:</p> <ul> <li> <p>an internal ID (<code class="language-plaintext highlighter-rouge">neostore.schemastore.db.id</code>)</p> </li> <li> <p>its actual description (<code class="language-plaintext highlighter-rouge">neostore.schemastore.db</code>)</p> </li> </ul> <p>So far, I have covered Neo4j's recent additions.</p> <p>Of course, Neo did not wait for version 2.0 to be a full-fledged graph database. Let's look at its core components.</p> <h3 id="propertystore">PropertyStore</h3> <p>What use would a graph database be without properties on our nodes and relationships? 
Not much :-)</p> <p>These properties (reminder: property = key/value) are nevertheless not all stored in exactly the same place; it depends on their type:</p> <ul> <li> <p><code class="language-plaintext highlighter-rouge">neostore.propertystore.db.index</code> stores the "key" part of the properties</p> </li> <li> <p><code class="language-plaintext highlighter-rouge">neostore.propertystore.db.arrays</code>, as its name suggests, is dedicated to properties whose value is an array of primitives or Strings</p> </li> <li> <p><code class="language-plaintext highlighter-rouge">neostore.propertystore.db.strings</code>, for its part, holds the properties whose value is a character string</p> </li> <li> <p>the other properties (boolean, integer) are stored directly in <code class="language-plaintext highlighter-rouge">neostore.propertystore.db</code></p> </li> </ul> <p>Each set of properties belongs to the relationship or node that holds it; the properties are represented as singly linked lists.</p> <h3 id="nodestore-et-relationshipstore">NodeStore and RelationshipStore</h3> <p>Here it is, the crux of the matter!</p> <p>Let's start with nodes. 
Chaque noeud est composĂ© d’un :</p> <ul> <li> <p>ID “interne” (<code class="language-plaintext highlighter-rouge">neostore.nodestore.db.id</code>)</p> </li> <li> <p>des rĂ©fĂ©rences Ă  ses labels (<code class="language-plaintext highlighter-rouge">neostore.nodestore.db.labels{,.id}</code>)</p> </li> <li> <p>une rĂ©fĂ©rence vers sa premiĂšre propriĂ©tĂ© (l’ID interne de la propriĂ©tĂ©) et le premier noeud parmi tous ceux qui lui sont liĂ©s (le tout dans <code class="language-plaintext highlighter-rouge">neostore.nodestore.db</code>)</p> </li> </ul> <p>Conceptuellement, cela pourrait se reprĂ©senter ainsi (slide outrageusement et Ă  de nombreuses reprises empruntĂ© Ă  Neo Technology) : </p> <p><img src="/assets/img/graph_on_disk.png" alt="graph on disk" /></p> <p>Tout repose sur la structuration des enregistrements de relations. Cela est plutĂŽt intuitif : les relations sont l’épine dorsale du graphe.</p> <p>Cet Ă©lĂ©ment central se dĂ©compose de la façon suivante :</p> <ul> <li> <p>un ID “interne” (comme d’hab’ : <code class="language-plaintext highlighter-rouge">neostore.relationshipstore.db.id</code>)</p> </li> <li> <p>son type (<code class="language-plaintext highlighter-rouge">neostore.relationshiptypestore.db.names</code>)</p> </li> </ul> <p>Pour l’instant, ça n’explique pas ce qui en fait une base orientĂ©e graphe. </p> <p>Pour cela, regardons plutĂŽt le code Java (eh oui, c’est ça qui est cool avec les <a href="https://github.com/neo4j/neo4j">projets open source</a> dans les langages qu’on connaĂźt bien) : </p> <pre><code class="language-{.java}">public class RelationshipRecord extends PrimitiveRecord {     private long firstNode;     private long secondNode;     private int type;     private long firstPrevRel = 1;     private long firstNextRel = Record.NO_NEXT_RELATIONSHIP.intValue();     private long secondPrevRel = 1;     private long secondNextRel = Record.NO_NEXT_RELATIONSHIP.intValue();     // [...] 
</code></pre> <p>Let's skip over the formatting worthy of the most seasoned C coders (who is up for a Pull Request to put the braces back at the end of the line? :P).</p> <p>What is really interesting here is this notion of <code class="language-plaintext highlighter-rouge">first</code> and <code class="language-plaintext highlighter-rouge">second</code>. These are actually the internal references (everything is a reference at this level) to the records of the start and end nodes. However, since direction only matters at query time and not when the relationship is created, there is no way to know, at this level, which of <code class="language-plaintext highlighter-rouge">first</code> or <code class="language-plaintext highlighter-rouge">second</code> is the start node; hence this naming.</p> <p>What you should take away from this little piece of code is that a relationship actually carries, besides the information mentioned earlier:</p> <ul> <li> <p>a reference to its start and end nodes</p> </li> <li> <p>a reference to the previous relationship of its start / end nodes</p> </li> <li> <p>a reference to the next relationship of its start / end nodes</p> </li> </ul> <p>A picture is worth a thousand words:</p> <p><img src="/assets/img/graph_on_disk_bis.png" alt="graph on disk bis" /></p> <p>This is exactly what I have been trying to explain: the red arrows symbolize the links carried by the relationship records. 
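</p> <p>The chain formed by these fields can be sketched in a few lines of Java. This is a deliberately simplified, in-memory model and not Neo4j's actual storage code: the <code class="language-plaintext highlighter-rouge">RelChainSketch</code> class, its maps, the <code class="language-plaintext highlighter-rouge">NONE</code> marker and the choice of inserting new relationships at the head of each chain are all assumptions made for illustration. Only the field names come from the <code class="language-plaintext highlighter-rouge">RelationshipRecord</code> excerpt above.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified in-memory stand-in for the on-disk records (hypothetical, NOT Neo4j's real classes).
class RelChainSketch {
    static final long NONE = -1; // hypothetical "no record" marker

    // Field names mirror the RelationshipRecord excerpt; everything else is illustrative.
    static final class Rel {
        final long id;
        final long firstNode;
        final long secondNode;
        long firstPrevRel = NONE;
        long firstNextRel = NONE;
        long secondPrevRel = NONE;
        long secondNextRel = NONE;

        Rel(long id, long firstNode, long secondNode) {
            this.id = id;
            this.firstNode = firstNode;
            this.secondNode = secondNode;
        }
    }

    final Map<Long, Long> firstRelOfNode = new HashMap<>(); // node id -> head of its chain
    final Map<Long, Rel> rels = new HashMap<>();            // relationship id -> record

    // Creates a relationship and links it at the head of both nodes' chains.
    Rel connect(long relId, long firstNode, long secondNode) {
        if (firstNode == secondNode) {
            throw new IllegalArgumentException("self-loops are left out of this sketch");
        }
        Rel rel = new Rel(relId, firstNode, secondNode);
        rels.put(relId, rel);
        linkAtHead(rel, firstNode);
        linkAtHead(rel, secondNode);
        return rel;
    }

    private void linkAtHead(Rel rel, long node) {
        long oldHead = firstRelOfNode.getOrDefault(node, NONE);
        setNext(rel, node, oldHead);
        if (oldHead != NONE) {
            setPrev(rels.get(oldHead), node, rel.id);
        }
        firstRelOfNode.put(node, rel.id);
    }

    // A record holds two pairs of pointers: one per endpoint.
    private static void setNext(Rel rel, long node, long next) {
        if (rel.firstNode == node) rel.firstNextRel = next; else rel.secondNextRel = next;
    }

    private static void setPrev(Rel rel, long node, long prev) {
        if (rel.firstNode == node) rel.firstPrevRel = prev; else rel.secondPrevRel = prev;
    }

    private static long next(Rel rel, long node) {
        return rel.firstNode == node ? rel.firstNextRel : rel.secondNextRel;
    }

    // Walks the chain of one node, following the "next" pointer that belongs to that node.
    List<Long> relationshipsOf(long node) {
        List<Long> result = new ArrayList<>();
        long current = firstRelOfNode.getOrDefault(node, NONE);
        while (current != NONE) {
            result.add(current);
            current = next(rels.get(current), node);
        }
        return result;
    }
}
```

<p>In this sketch, listing the relationships of a node costs one hop per relationship in its chain, whatever their type.</p> <p>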
Each of these relationships points to the previous/next relationships of its start and end nodes.</p> <p>In other words, each node references (green arrow) an element of a doubly linked list of relationships.</p> <p>And that is the very nature of the graph!</p> <p>This structure is what allows Neo4j to claim to be a graph database.</p> <ul> <li> <p>How do you query data in a graph? With a traversal.</p> </li> <li> <p>How do you traverse in Neo4j? By finding the most relevant starting points possible and navigating through the lists of relationships/nodes.</p> </li> </ul> <p>Are you starting to see why this kind of database is such a good fit for highly connected data?</p> <h3 id="quid-des-noeuds-denses-">What about dense nodes?</h3> <p>Aha, I see I am dealing with seasoned readers ;)</p> <p>Let's set the scene with two slightly different situations.</p> <h4 id="situation-n1">Situation #1</h4> <p>A dense node is a node that is highly connected. Many examples can be found in everyday life. For instance, Justin Bieber has 52 million followers on Twitter (huh, I did not know deafness had become a mass phenomenon).</p> <p>Remember, the Justin Bieber node points to its first relationship. If you are unlucky enough to need his 52 millionth fan node, you will have to traverse, in the worst case, the entire doubly linked list of relationships before finding it: in short, a really unimpressive O(n).</p> <p>That said, this case remains relatively rare. Let's tweak the example slightly.</p> <h4 id="situation-n2">Situation #2</h4> <p>Justin Bieber may well have 52 million followers, but he has far fewer family members.</p> <p>If, among this gigantic number of relationships, only the family ones interest you, you face exactly the same problem as described above... if you use a version of Neo4j older than 2.1.</p> <p>Since that version, relationships are also split by type, which avoids this pitfall. By default, a node is considered dense from 50 relationships onwards (cf. <a href="http://docs.neo4j.org/chunked/stable/kernel-configuration.html">dense node threshold</a>).</p> <h4 id="help-je-suis-dans-la-situation-n1">Help! I am in situation #1!</h4> <p>If, unfortunately, and after exploring all the alternatives (statistical sampling, etc.), you conclude that there is no other way: don't worry!</p> <p>First of all, the Neo teams keep working on this topic and bringing improvements. We should therefore see some progress with v2.2.</p> <p>In addition, a simple approach <a href="https://github.com/maxdemarzi/dense">has already been coded for you</a> by the excellent <a href="https://twitter.com/maxdemarzi">Max</a> <a href="http://maxdemarzi.com/">de</a> <a href="https://www.kickstarter.com/projects/1355751798/high-performance-neo4j-video-course">Marzi</a>.</p> <p>The idea behind his extension is simple: it spreads the nodes across levels on every insertion and reads them back transparently.</p> <p>Here is an example of a structure automatically created by his extension:</p> <p><img src="/assets/img/dense_nodes.png" alt="dense nodes" /></p> <p>Just like Justin Bieber, Lady Gaga and Madonna also have many fans (each fan "LIKES" the artist). A dummy node therefore stands in for the nodes that would otherwise have been linked directly to the artists and introduces layers, through intermediate nodes that each group a limited number of fans, connected by a "DENSE_LIKES" relationship. 
The relationships are now spread out, and we can paginate our read queries like this:</p> <pre><code class="language-{.cypher}">MATCH (fan:Fan)-[:DENSE_LIKES*0..5]-&gt;()-[:LIKES]-&gt;(loved:Artist {name: "Madonna"}) RETURN fan </code></pre> <p>This query means (reading the pattern from bottom to top, right to left):</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find the node with the "Artist" label and the name "Madonna"
that is "LIKED" by some node (let's call it META)
and return the Fan nodes that 0 to 5 DENSE_LIKES relationships separate from META
</code></pre></div></div> <p>Since the query looks for an artist's many fans, without any fan-out of the graph we would be right in situation #1 described earlier. However, this simple approach, combined with a clever use of <a href="http://docs.neo4j.org/chunked/milestone/query-match.html#match-variable-length-relationships">variable-length paths</a>, makes it possible to retrieve only a fraction of the fans without traversing all the relationships attached to the artist.</p> <h2 id="neo4j-et-scalabilité">Neo4j and scalability</h2> <p>Now that the physical file format is a little clearer, let's look at the upper layers.</p> <h3 id="architecture">Architecture</h3> <p>Disk access is of course kept to a minimum. Two levels of cache are involved.</p> <h4 id="le-file-buffer-cache">The <em>file buffer cache</em></h4> <p>As you would expect, the file buffer cache buffers the writes and reads of physical records (cf. the files described earlier). The least recently accessed entries are evicted from the buffer (<a href="http://en.wikipedia.org/wiki/Least_Recently_Used#LRU">LRU</a>). When possible, this buffer is directly mapped to the underlying store file ("memory-mapping"). 
Ce comportement dĂ©pend du systĂšme de fichiers et de l’OS.  Quoi qu’il en soit, cette couche a pour seul but de rĂ©duire au maximum les accĂšs disque mais n’introduit aucune forme d’abstraction sur les donnĂ©es manipulĂ©es.</p> <h4 id="lobject-cache">L’<em>object cache</em></h4> <p>Lui aussi cache LRU, c’est Ă  partir de ce moment-lĂ  que les donnĂ©es manipulĂ©es commencent Ă  prendre la forme du graphe que vous requĂȘtez par traversĂ©e ou par Cypher. Notez que l’allocation mĂ©moire Ă  ce niveau est prise sur la heap de la JVM hĂŽte et non plus directement de l’OS hĂŽte sous-jacent. C’est pourquoi il est souvent prĂ©fĂ©rable de dĂ©ployer Neo4j de façon isolĂ©e, afin que votre application ne vienne pas perturber (comme par exemple : ) les cycles GC de votre instance Neo et vise-versa.</p> <h4 id="et-le-reste">et le reste</h4> <p>À partir de lĂ , les APIs unitaires Java prennent le relais, suivies des APIs de traversĂ©es, Cypher et les APIs REST !</p> <p><img src="/assets/img/neo4j_archi.png" alt="neo4j archi" /></p> <h3 id="gestion-de-la-concurrence">Gestion de la concurrence</h3> <p>Bien que faisant partie de cette (non-)famille qu’est NoSQL, Neo4j fait un peu figure d’exception, en se conformant Ă  ACID. En effet, vous retrouverez avec Neo4j les transactions en 2 phases que vous connaissez bien. N’étant pas un spĂ©cialiste des systĂšmes distribuĂ©s, je vous invite Ă  lire la multitude d’articles existants sur les limites d’ACID, les limites du locking et les alternatives existantes (“lock-free concurrency”, BASE vs ACID) : Google est votre ami. J’en profite donc pour passer Ă  la partie qui m’intĂ©resse le plus : le <em>sharding</em> :)</p> <h3 id="sharding-dun-graphe-dynamique"><em>Sharding</em> d’un graphe dynamique</h3> <p>Expliquons briĂšvement le terme <em>sharding</em>. Le <em>sharding</em> consiste simplement Ă  rĂ©partir ses donnĂ©es entre diffĂ©rentes instances d’un systĂšme de persistence distribuĂ©. 
For example: I can decide to store all American postal addresses on my servers in the United States and my Australian addresses in Sydney. A given instance therefore does not contain all of the data, but the business domain of my application includes notions that split up naturally. Yes indeed! <em>Sharding</em> is a technical solution, granted, but one that depends heavily on the business domain (as any technical solution should, but I digress).</p> <h4 id="graphe-statique">Static graph</h4> <p>A static graph is fairly easy to <em>shard</em> (as long as the modeled business domain allows it) and its partitions are easy to detect (this is known as "<em>graph clustering</em>" or "<em>community detection</em>"): they are not going to change at all.  <a href="http://en.wikipedia.org/wiki/Strongly_connected_component">Some algorithms</a> are even relatively easy to implement.</p> <h4 id="graphe-dynamique">Dynamic graph</h4> <p>Dynamic graphs, on the other hand, are a different kettle of fish. Many insertions and deletions happen all the time, and they necessarily affect the topology of the graph. The name of the game is therefore to split the graph into shards in such a way that, at any given time, the number of inter-shard relationships is minimized. This is all the more critical when the shards are far apart (imagine the network latency incurred by a traversal that starts in a shard hosted in Los Angeles and ends in a shard in Beijing).</p> <p><img src="/assets/img/neo4j_shards.png" alt="neo4j shards" /></p> <p>This is a <a href="http://alexaverbuch.blogspot.fr/2010/04/me-my-names-alex-im-currently.html">research topic</a> in its own right, and Neo Technology has been working for several years on a shardable system. 
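</p> <p>To make the quantity to minimize concrete, here is a small sketch that counts the inter-shard relationships for a given assignment of nodes to shards. It says nothing about how Neo4j works internally: the <code class="language-plaintext highlighter-rouge">EdgeCutSketch</code> class, the <code class="language-plaintext highlighter-rouge">long[][]</code> representation of relationships and the shard names are all assumptions chosen for illustration.</p>

```java
import java.util.Map;

// Hypothetical helper: measures how "expensive" a given split of the graph is.
class EdgeCutSketch {

    // Each relationship is a pair {startNodeId, endNodeId}.
    // Every node is assumed to be assigned to a shard in shardOfNode.
    static int crossShardRelationships(long[][] relationships, Map<Long, String> shardOfNode) {
        int count = 0;
        for (long[] relationship : relationships) {
            String startShard = shardOfNode.get(relationship[0]);
            String endShard = shardOfNode.get(relationship[1]);
            if (!startShard.equals(endShard)) {
                count++; // traversing this relationship means a network hop
            }
        }
        return count;
    }
}
```

<p>With a dynamic graph, this count changes with every insertion and deletion, so a split that was good yesterday may no longer be good today.</p> <p>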
Make sure you grasp the terrible dilemma: because it is graph-oriented right down to the physical layers, Neo4j is both ideal for storing and querying data as a graph and very hard to shard!</p> <h4 id="une-lueur-despoir-">A glimmer of hope?</h4> <p>For the time being, you have to bet on <a href="http://fr.wikipedia.org/wiki/Scalability"><em>vertical scaling</em></a>: size your machines generously and everything will go very well. Let me reassure you further:</p> <ul> <li> <p>so far, only a tiny minority of customers has faced volumes so large (<a href="http://docs.neo4j.org/chunked/stable/capabilities-capacity.html">nominal capacity of Neo4j</a>: 34 billion nodes and relationships) that splitting the data was necessary</p> </li> <li> <p>it so happens that some business domains naturally allow you to segregate your data</p> </li> <li> <p>the beginnings of a distribution solution already exist!</p> </li> </ul> <h4 id="le-cache-sharding-"><em>Cache sharding</em>!</h4> <p>The title may sound scary, but rest assured, the idea is very simple. First of all, this idea applies to Neo4j in <a href="http://docs.neo4j.org/chunked/stable/ha-how.html">High Availability</a> mode. In other words, it only applies to a Neo4j instance that is part of a <em>cluster</em>.</p> <p>Not only do you get master/replica replication, you can also get <em>sharding</em>. Yes, yes, I did say <em>sharding</em>. Unfortunately, for the reasons mentioned above, this is not <em>sharding</em> of the data itself. As the title suggests, it is sharding of the cache.</p> <p>How is that possible? It is quite simple!</p> <p>Neo4j's caches are LRU caches: they only keep the most recently used entries. 
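</p> <p>As a reminder of what LRU eviction means, here is a minimal, generic sketch built on the JDK's <code class="language-plaintext highlighter-rouge">LinkedHashMap</code>. It only illustrates the eviction policy and is in no way Neo4j's cache implementation; the <code class="language-plaintext highlighter-rouge">LruCacheSketch</code> name and the capacity parameter are assumptions made for the example.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU cache illustration (hypothetical, unrelated to Neo4j's own cache classes).
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: entries are ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion: evict the eldest entry when over capacity.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

<p>With a capacity of 2, putting X and Y, reading X, then putting Z evicts Y: it is the entry that has gone unused the longest.</p> <p>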
If there were a way to distribute queries consistently across the instances of my cluster, the job would be done. Indeed, query X would always run on instance A, query Y on instance B... The result of X would de facto sit in A's caches, the result of Y in B's caches. My data would therefore effectively be split by cache. The problem thus boils down to: how do I consistently distribute the queries to run across the instances of my Neo4j cluster? You will never guess. The solution has been around for ages: a simple load balancer such as <a href="http://haproxy.1wt.eu/">HAProxy</a> will do the trick. This is known as consistent routing (and more generally <a href="http://en.wikipedia.org/wiki/Consistent_hashing"><em>consistent hashing</em></a>).  You just need to configure its routing according to one of the arguments found in the body or in any header of the HTTP calls sent to Neo (remember: all remote communication goes through a REST API), and the load balancer will take care of executing your requests where you configured it to! Clever, isn't it? A simple load balancer, a Neo4j cluster (the High Availability edition gives you all the tools you need) and you are ready to face large volumes of data!</p> <h1 id="conclusion">Conclusion</h1> <p>One of the lessons of NOSQL is that every solution is restricted to a certain field of application and applies under certain conditions. I hope this article has helped you understand the weaknesses, but above all the strengths, of graph databases and, who knows, will make you want to dig deeper into the subject.</p> <p>I do not claim to be exhaustive, so if you would like me to detail other parts (for example: Cypher), I may devote other articles to them.</p> <p>&lt;shameless_plug&gt;If you enjoyed this article, I can also come and talk about it at a User Group in your city, and I run customizable Neo4j training courses, in French! &lt;/shameless_plug&gt;</p>Florent Biville3615-ma-vie