Jekyll2022-12-18T11:34:14+00:00https://mark1626.github.io/feed.xmlMark1626 Home PageNimalan aka Mark1626Deep Dive: On Parallel graph algorithms2022-05-27T00:00:00+00:002022-05-27T00:00:00+00:00https://mark1626.github.io/posts/2022/05/27/on-parallel-graph-algorithms<p>The traditional graph traversal algorithms, DFS and BFS, are often written recursively. There are iterative versions as well, but the next question is how to scale them across multiple CPU cores or multiple GPU cores. This deep dive explores parallel graph algorithms and the type of hardware they would shine on.</p>
<!--more-->
<h2 id="parallel-bfs">Parallel BFS</h2>
<p>In a parallel BFS we build a list of all the nodes in the current layer and distribute it across threads for processing. A barrier is needed between layers of the graph to ensure consistency.</p>
<p>A small pseudocode sketch of this is shown below. It is not the best way of doing this in parallel; the example is for illustration only.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">bfs_parallel</span><span class="p">(</span><span class="n">source</span> <span class="n">s</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Curr_Nodes</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">Next_Nodes</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">push</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">Curr_Nodes</span><span class="p">)</span>
<span class="cp">#pragma omp parallel shared(Curr_Nodes, Next_Nodes)
</span> <span class="p">{</span>
<span class="k">while</span> <span class="p">(</span><span class="n">Curr_Nodes</span> <span class="o">!</span><span class="n">empty</span><span class="p">)</span> <span class="p">{</span>
<span class="cp">#pragma omp for
</span> <span class="k">for</span> <span class="p">(</span><span class="n">node</span> <span class="o">:</span> <span class="n">Curr_Nodes</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Add neighbours for next level</span>
<span class="k">for</span> <span class="p">(</span><span class="n">v</span> <span class="o">:</span> <span class="n">node</span><span class="p">.</span><span class="n">neighbour</span><span class="p">())</span> <span class="p">{</span>
<span class="cp">#pragma omp critical
</span> <span class="p">{</span>
<span class="n">push</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">Next_Nodes</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">process</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
<span class="p">}</span>
<span class="cp">#pragma omp barrier
</span>
<span class="cp">#pragma omp master
</span> <span class="p">{</span>
<span class="n">Curr_Nodes</span> <span class="o">=</span> <span class="n">Next_Nodes</span><span class="p">;</span>
<span class="n">Next_Nodes</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">level</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#pragma omp barrier
</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This idea runs into some problems</p>
<ol>
<li>This cannot be used for algorithms where the order is important like range based algorithms.</li>
<li>The workload each thread receives varies from layer to layer of the graph.</li>
<li>Synchronizing between layers means that a layer needs a decent number of nodes before we actually benefit from parallelism.</li>
<li>Data locality is a challenge
<ul>
<li>In a distributed system the data has to be transferred to the remote machine, which adds communication challenges</li>
<li>On a multicore machine with threads, keeping the data a thread needs within the same cache line is a challenge</li>
</ul>
</li>
</ol>
<blockquote>
<p><strong>Note:</strong> On a distributed system an alternative way is to partition the graph across nodes</p>
</blockquote>
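<p>As a hedged illustration of the layer-by-layer idea, here is one possible runnable C version. It replaces the shared queues and the critical section of the pseudocode with a per-vertex distance array that every layer rescans, which is simpler but does more work per layer. The CSR arrays <code>row</code> and <code>col</code> and the function name <code>bfs_levels</code> are assumptions made for this example. OpenMP is only active when the compiler enables it (for example with <code>-fopenmp</code>); otherwise the pragmas are ignored and the loop runs serially.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Level-synchronous BFS over a graph in CSR form (illustrative layout):
 * the neighbours of vertex v are col[row[v]] .. col[row[v + 1] - 1].
 * On return dist[v] is the BFS layer of v, or -1 if v is unreachable. */
void bfs_levels(int n, const int *row, const int *col, int source, int *dist) {
    assert(source >= 0 && source < n);
    for (int v = 0; v < n; ++v) dist[v] = -1;
    dist[source] = 0;

    int level = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        /* The implicit barrier at the end of the loop separates the layers. */
        #pragma omp parallel for schedule(dynamic, 64) reduction(|:changed)
        for (int v = 0; v < n; ++v) {
            int dv;
            #pragma omp atomic read
            dv = dist[v];
            if (dv != level) continue; /* v is not in the current layer */

            for (int e = row[v]; e < row[v + 1]; ++e) {
                int w = col[e];
                int dw;
                #pragma omp atomic read
                dw = dist[w];
                if (dw == -1) {
                    /* Several threads may discover w in the same layer;
                     * they all store the same value, so any winner is fine. */
                    #pragma omp atomic write
                    dist[w] = level + 1;
                    changed = 1;
                }
            }
        }
        ++level;
    }
}
```

<p>The dynamic schedule is one way to soften the uneven per-layer workload mentioned in the list above; the chunk size of 64 is an arbitrary choice, not a tuned value.</p>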
<h2 id="graph-algorithms-in-the-gpu">Graph algorithms in the GPU</h2>
<p>A recursive BFS would perform very badly on the GPU. If we were to use an iterative approach we would run into the problem of thread divergence. GPUs have a SIMT architecture, where we get maximum performance only if all the threads in a warp execute the same instruction. A relatively simple GPU kernel without a lot of conditions would have the best performance.</p>
<p>There is a case study on Nvidia’s Technical Blog, <a href="https://developer.nvidia.com/blog/thinking-parallel-part-ii-tree-traversal-gpu/">Tree Traversal on GPU</a>, in which one thread is assigned per leaf node of a BVH (bounding volume hierarchy), so that neighbouring threads process objects that are nearby in 3D space.</p>
<h2 id="references">References</h2>
<ol>
<li>
<p>Wikipedia. Parallel breadth-first search. URL <a href="https://en.wikipedia.org/wiki/Parallel_breadth-first_search">https://en.wikipedia.org/wiki/Parallel_breadth-first_search</a></p>
</li>
<li>
<p>Tero Karras. Thinking Parallel, Part 2: Tree Traversal on GPU. Nvidia Technical Blog. URL <a href="https://developer.nvidia.com/blog/thinking-parallel-part-ii-tree-traversal-gpu/">https://developer.nvidia.com/blog/thinking-parallel-part-ii-tree-traversal-gpu/</a></p>
</li>
<li>
<p>Tero Karras. Thinking Parallel, Part 3: Tree Construction on GPU. Nvidia Technical Blog. URL <a href="https://developer.nvidia.com/blog/thinking-parallel-part-iii-tree-construction-gpu/">https://developer.nvidia.com/blog/thinking-parallel-part-iii-tree-construction-gpu/</a></p>
</li>
</ol>NimalanDeep Dive - The Cost of Integer Division2022-05-20T00:00:00+00:002022-05-20T00:00:00+00:00https://mark1626.github.io/posts/2022/05/20/cost-of-int-division<p>Integer division and modulo have always been expensive in hardware. Modern CPUs decode instructions into uops internally, which can then even be executed out of order, but even so division takes a lot of clock cycles to complete and stalls the pipeline. In this case study we will compare different approaches to unsigned integer division.</p>
<!--more-->
<h2 id="approaches">Approaches</h2>
<h3 id="x86-unsigned-integer-division">x86 unsigned integer division</h3>
<p>When we do not know the divisor beforehand, the compiler uses the <code class="language-plaintext highlighter-rouge">div</code> instruction (<code class="language-plaintext highlighter-rouge">idiv</code> is its signed counterpart). The <code class="language-plaintext highlighter-rouge">div</code> instruction calculates both the quotient and the remainder.</p>
<p>For the following code GCC will generate a <code class="language-plaintext highlighter-rouge">div</code> instruction for both functions.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="nf">mod</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">d</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">n</span> <span class="o">%</span> <span class="n">d</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">quo</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">d</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">n</span> <span class="o">/</span> <span class="n">d</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Between the assembly generated for division and remainder, the only difference is the <code class="language-plaintext highlighter-rouge">movl %edx, %eax</code> needed to extract the remainder from the <code class="language-plaintext highlighter-rouge">edx</code> register.</p>
<div class="language-s highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mod</span><span class="p">(</span><span class="n">unsigned</span><span class="w"> </span><span class="n">int</span><span class="p">,</span><span class="w"> </span><span class="n">unsigned</span><span class="w"> </span><span class="n">int</span><span class="p">)</span><span class="o">:</span><span class="w">
</span><span class="n">movl</span><span class="w"> </span><span class="o">%edi, %</span><span class="n">eax</span><span class="w">
</span><span class="n">xorl</span><span class="w"> </span><span class="o">%edx, %</span><span class="n">edx</span><span class="w">
</span><span class="n">divl</span><span class="w"> </span><span class="o">%esi
movl %</span><span class="n">edx</span><span class="p">,</span><span class="w"> </span><span class="o">%eax
ret
quo(unsigned int, unsigned int):
movl %</span><span class="n">edi</span><span class="p">,</span><span class="w"> </span><span class="o">%eax
xorl %</span><span class="n">edx</span><span class="p">,</span><span class="w"> </span><span class="o">%edx
divl %</span><span class="n">esi</span><span class="w">
</span><span class="n">ret</span><span class="w">
</span></code></pre></div></div>
<p>This instruction is very expensive. If we view the CPU execution timeline in <code class="language-plaintext highlighter-rouge">llvm-mca</code>, the <code class="language-plaintext highlighter-rouge">div</code> instruction takes a very long time to finish execution and also stalls the pipeline.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// The timeline is truncated to fit; the division takes close to 80 cycles
Timeline view:
0 1 2 ... 7
0123456789 ... 0123456789
Index 0123456789 01 ...
[0,0] DeER . . . . ... . . . movl %edi, %eax
[0,1] D--R . . . . ... . . . xorl %edx, %edx
[0,2] .Deeeeeeeeeeeeeeeeeee ... eeeeeeeeeER divl %esi
</code></pre></div></div>
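<p>Since the division instruction yields both results at once, code that needs the quotient and the remainder of the same operands can ask for them together. The sketch below is only an illustration: the type and function names are made up for this example, and whether the two operators are merged into a single <code>div</code> depends on the compiler and the optimization level.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Quotient and remainder of one unsigned division (illustrative type name). */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} divmod_u32_t;

/* Both results use the same operands, so an optimizing compiler targeting
 * x86 can usually serve them from one division instead of two. */
divmod_u32_t divmod_u32(uint32_t n, uint32_t d) {
    assert(d != 0); /* division by zero is undefined behaviour in C */
    divmod_u32_t r = { n / d, n % d };
    return r;
}
```

<p>For example, <code>divmod_u32(47, 5)</code> gives a quotient of 9 and a remainder of 2.</p>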
<h3 id="granlund-montgomery-algorithm">Granlund-Montgomery algorithm</h3>
<p>The compiler uses <a href="https://doi.org/10.1145/773473.178249">Granlund-Montgomery’s division algorithm</a> when the divisor is known at compile time.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Compiler optimizes to multiplication and shift instructions*/</span>
<span class="kt">uint32_t</span> <span class="nf">mod</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">d</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">n</span> <span class="o">%</span> <span class="mi">23</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Although this has a lot more instructions than the earlier version with <code class="language-plaintext highlighter-rouge">div</code>, the quotient is calculated with a single multiplication and a shift; the remainder then follows from one more multiplication and a subtraction.</p>
<div class="language-s highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mod</span><span class="p">(</span><span class="n">unsigned</span><span class="w"> </span><span class="n">int</span><span class="p">,</span><span class="w"> </span><span class="n">unsigned</span><span class="w"> </span><span class="n">int</span><span class="p">)</span><span class="o">:</span><span class="w">
</span><span class="n">movl</span><span class="w"> </span><span class="o">%edi, %</span><span class="n">eax</span><span class="w">
</span><span class="n">movl</span><span class="w"> </span><span class="o">$</span><span class="m">2987803337</span><span class="p">,</span><span class="w"> </span><span class="o">%edx
imulq %</span><span class="n">rdx</span><span class="p">,</span><span class="w"> </span><span class="o">%rax
shrq $36, %</span><span class="n">rax</span><span class="w">
</span><span class="n">imull</span><span class="w"> </span><span class="o">$</span><span class="m">23</span><span class="p">,</span><span class="w"> </span><span class="o">%eax, %</span><span class="n">edx</span><span class="w">
</span><span class="n">movl</span><span class="w"> </span><span class="o">%edi, %</span><span class="n">eax</span><span class="w">
</span><span class="n">subl</span><span class="w"> </span><span class="o">%edx, %</span><span class="n">eax</span><span class="w">
</span><span class="n">ret</span><span class="w">
</span></code></pre></div></div>
<p>The timeline view for an AMD Zen3 architecture CPU shows the modulo being run for 3 iterations. The number of cycles needed is much lower than with <code class="language-plaintext highlighter-rouge">div</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
Timeline view:
0123456
Index 0123456789
[0,0] DeER . . .. movl %edi, %eax
[0,1] DeER . . .. movl $2987803337, %edx
[0,2] D=eeeER . .. imulq %rdx, %rax
[0,3] D====eER . .. shrq $36, %rax
[0,4] D=====eeeER .. imull $23, %eax, %edx
[0,5] DeE-------R .. movl %edi, %eax
[0,6] .D=======eER .. subl %edx, %eax
[0,7] .DeeeeeeeE-R .. retq
[1,0] .DeE-------R .. movl %edi, %eax
[1,1] .D=eE------R .. movl $2987803337, %edx
[1,2] . D=eeeE---R .. imulq %rdx, %rax
[1,3] . D====eE--R .. shrq $36, %rax
[1,4] . D=====eeeER .. imull $23, %eax, %edx
[1,5] . DeE-------R .. movl %edi, %eax
[1,6] . D========eER .. subl %edx, %eax
[1,7] . DeeeeeeeE-R .. retq
[2,0] . DeE-------R .. movl %edi, %eax
[2,1] . D=eE------R .. movl $2987803337, %edx
[2,2] . D===eeeE--R .. imulq %rdx, %rax
[2,3] . D=====eE-R .. shrq $36, %rax
[2,4] . D======eeeER. imull $23, %eax, %edx
[2,5] . DeE--------R. movl %edi, %eax
[2,6] . D=========eER subl %edx, %eax
[2,7] . DeeeeeeeE--R retq
</code></pre></div></div>
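<p>To make the multiply-and-shift sequence easier to follow, it can be written back in C. This is a sketch of what the generated assembly computes for the divisor 23, not the compiler’s general algorithm: the constant 2987803337 is ceil(2^36 / 23), and the names <code>quo23</code>, <code>mod23</code> and <code>check23</code> are invented for this example.</p>

```c
#include <assert.h>
#include <stdint.h>

/* ceil(2^36 / 23): the constant loaded into %edx in the assembly above. */
#define MAGIC_23 UINT64_C(2987803337)

/* n / 23 as a 64-bit multiplication followed by a right shift by 36. */
uint32_t quo23(uint32_t n) {
    return (uint32_t)(((uint64_t)n * MAGIC_23) >> 36);
}

/* n % 23 recovered as n - (n / 23) * 23, matching the imull/subl pair. */
uint32_t mod23(uint32_t n) {
    return n - quo23(n) * 23u;
}

/* Compare against the built-in operators for every n in [first, last]. */
void check23(uint32_t first, uint32_t last) {
    for (uint32_t n = first;; ++n) {
        assert(quo23(n) == n / 23);
        assert(mod23(n) == n % 23);
        if (n == last) break; /* avoids wrap-around when last == UINT32_MAX */
    }
}
```

<p>For instance, <code>quo23(100)</code> is 4 and <code>mod23(100)</code> is 8. The product never overflows 64 bits, because both factors fit in 32 bits.</p>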
<h3 id="libdivide">libdivide</h3>
<p><a href="https://github.com/ridiculousfish/libdivide">libdivide</a> replaces expensive integer divides with multiplication and bit shifts, similar to the GM algorithm used by the compiler. Unlike the compiler, however, libdivide can also optimize division by runtime constants. libdivide also supports division of SIMD vectors.</p>
<h3 id="lkk-algorithm">LKK algorithm</h3>
<p><a href="http://arxiv.org/abs/1902.01961">Lemire-Kaser-Kurz (LKK)</a> is another algorithm for unsigned integer division. It has similar performance to Granlund-Montgomery and is better for some divisors. The algorithm works best when the inverse of the divisor is precomputed, that is, when the divisor is to some extent a runtime constant.</p>
<p>A full implementation of the LKK algorithm can be found in <a href="https://github.com/lemire/fastmod">fastmod</a>.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">d</span> <span class="o">=</span> <span class="mi">9</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">c</span> <span class="o">=</span> <span class="n">UINT64_C</span><span class="p">(</span><span class="mh">0xFFFFFFFFFFFFFFFF</span><span class="p">)</span> <span class="o">/</span> <span class="n">d</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="nf">fastmod</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">lowbits</span> <span class="o">=</span> <span class="n">c</span> <span class="o">*</span> <span class="n">n</span><span class="p">;</span>
<span class="k">return</span> <span class="p">((</span><span class="n">__uint128_t</span><span class="p">)</span><span class="n">lowbits</span> <span class="o">*</span> <span class="n">d</span><span class="p">)</span> <span class="o">>></span> <span class="mi">64</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-s highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fastmod</span><span class="p">(</span><span class="n">unsigned</span><span class="w"> </span><span class="n">int</span><span class="p">)</span><span class="o">:</span><span class="w">
</span><span class="n">movl</span><span class="w"> </span><span class="o">%edi, %</span><span class="n">eax</span><span class="w">
</span><span class="n">movabsq</span><span class="w"> </span><span class="o">$</span><span class="m">2049638230412172402</span><span class="p">,</span><span class="w"> </span><span class="o">%rdx
imulq %</span><span class="n">rdx</span><span class="p">,</span><span class="w"> </span><span class="o">%rax
movl $9, %</span><span class="n">edx</span><span class="w">
</span><span class="n">mulq</span><span class="w"> </span><span class="o">%rdx
movq %</span><span class="n">rdx</span><span class="p">,</span><span class="w"> </span><span class="o">%</span><span class="n">rax</span><span class="w">
</span><span class="n">ret</span><span class="w">
</span></code></pre></div></div>
<p>The timeline view is similar to that of the GM algorithm in terms of the number of cycles and the instruction-level parallelism used.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Timeline view:
0123456
Index 0123456789
[0,0] DeER . . .. movl %edi, %eax
[0,1] DeER . . .. movabsq $2049638230412172402, %rdx
[0,2] D=eeeER . .. imulq %rdx, %rax
[0,3] DeE---R . .. movl $9, %edx
[0,4] D====eeeeER .. mulq %rdx
[0,5] .D=======eER .. movq %rdx, %rax
[0,6] .DeeeeeeeE-R .. retq
[1,0] .DeE-------R .. movl %edi, %eax
[1,1] .D=eE------R .. movabsq $2049638230412172402, %rdx
[1,2] . D=eeeE---R .. imulq %rdx, %rax
[1,3] . DeE------R .. movl $9, %edx
[1,4] . D====eeeeER .. mulq %rdx
[1,5] . D========eER .. movq %rdx, %rax
[1,6] . DeeeeeeeE-R .. retq
[2,0] . DeE-------R .. movl %edi, %eax
[2,1] . D=eE------R .. movabsq $2049638230412172402, %rdx
[2,2] . D==eeeE---R .. imulq %rdx, %rax
[2,3] . DeE------R .. movl $9, %edx
[2,4] . D=====eeeeER. mulq %rdx
[2,5] . D=========eER movq %rdx, %rax
[2,6] . DeeeeeeeE--R retq
</code></pre></div></div>
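<p>The snippet earlier in this section fixes the divisor at 9. As a rough sketch of the runtime-constant case, the same two steps can be split into a precompute step and a per-value step. The names <code>fastmod_ctx</code>, <code>fastmod_init</code> and <code>fastmod_apply</code> are hypothetical and are not the API of the fastmod library; <code>__uint128_t</code> is a GCC/Clang extension, as in the snippet.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed state for one divisor (hypothetical helper type). */
typedef struct {
    uint64_t c; /* floor((2^64 - 1) / d) + 1, wraps to 0 when d == 1 */
    uint32_t d; /* the divisor itself */
} fastmod_ctx;

/* Pay for one real division up front, when the divisor becomes known. */
fastmod_ctx fastmod_init(uint32_t d) {
    assert(d != 0);
    fastmod_ctx ctx = { UINT64_C(0xFFFFFFFFFFFFFFFF) / d + 1, d };
    return ctx;
}

/* n % d using two multiplications and no division. */
uint32_t fastmod_apply(uint32_t n, const fastmod_ctx *ctx) {
    uint64_t lowbits = ctx->c * n; /* low 64 bits of c * n */
    return (uint32_t)(((__uint128_t)lowbits * ctx->d) >> 64);
}
```

<p>A caller would run <code>fastmod_init</code> once per divisor and reuse the context inside a hot loop, so the cost of the single real division is spread over many remainders.</p>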
<h2 id="performance">Performance</h2>
<p>Performance for 100 iterations</p>
<table>
<thead>
<tr>
<th>Method</th>
<th>Instructions</th>
<th>Total Cycles</th>
<th>Total uOps</th>
<th>uOps Per Cycle</th>
<th>IPC</th>
</tr>
</thead>
<tbody>
<tr>
<td>div</td>
<td>500</td>
<td>1462</td>
<td>3800</td>
<td>2.60</td>
<td>0.34</td>
</tr>
<tr>
<td>compiler</td>
<td>800</td>
<td>234</td>
<td>1000</td>
<td>4.27</td>
<td>3.42</td>
</tr>
<tr>
<td>fastmod</td>
<td>700</td>
<td>235</td>
<td>1000</td>
<td>4.26</td>
<td>2.98</td>
</tr>
</tbody>
</table>
<h2 id="references">References</h2>
<ol>
<li>
<p>Torbjörn Granlund and Peter L. Montgomery. Division by invariant integers using multiplication. SIGPLAN Not., 29(6):61-72, jun 1994. ISSN 0362-1340. doi: 10.1145/773473.178249. URL <a href="https://doi.org/10.1145/773473.178249">https://doi.org/10.1145/773473.178249</a>.</p>
</li>
<li>
<p>Daniel Lemire, Owen Kaser, and Nathan Kurz. Faster remainder by direct computation: Applications to compilers and software libraries. CoRR, abs/1902.01961, 2019. URL <a href="http://arxiv.org/abs/1902.01961">http://arxiv.org/abs/1902.01961</a>.</p>
</li>
<li>
<p>Daniel Lemire <a href="https://github.com/lemire">@lemire</a>. fastmod. URL <a href="https://github.com/lemire/fastmod">https://github.com/lemire/fastmod</a>.</p>
</li>
<li>
<p>Peter Ammon <a href="https://github.com/ridiculousfish">@ridiculousfish</a>. libdivide. URL <a href="https://github.com/ridiculousfish/libdivide">https://github.com/ridiculousfish/libdivide</a>.</p>
</li>
</ol>NimalanAuto Vectorization Case Study 22021-10-09T00:00:00+00:002021-10-09T00:00:00+00:00https://mark1626.github.io/posts/2021/10/09/auto-vectorization-case-study-2<p>For this case study let’s try to auto-vectorize an implementation of the <a href="https://en.wikipedia.org/wiki/Abelian_sandpile_model">Abelian sandpile model</a>.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">stabilize</span><span class="p">()</span> <span class="p">{</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">spills</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">y</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">pos</span> <span class="o">=</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">currSand</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="p">];</span>
<span class="kt">char</span> <span class="n">newSand</span> <span class="o">=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">currSand</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">:</span> <span class="n">currSand</span><span class="p">;</span>
<span class="n">spills</span> <span class="o">+=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">=</span> <span class="n">newSand</span> <span class="o">+</span> <span class="p">(</span><span class="n">buffer</span><span class="p">[</span><span class="n">pos</span> <span class="o">-</span> <span class="n">points</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">newSand</span> <span class="o">=</span> <span class="n">newSand</span> <span class="o">+</span> <span class="p">(</span><span class="n">buffer</span><span class="p">[</span><span class="n">pos</span> <span class="o">+</span> <span class="n">points</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">newSand</span> <span class="o">=</span> <span class="n">newSand</span> <span class="o">+</span> <span class="p">(</span><span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">newSand</span> <span class="o">=</span> <span class="n">newSand</span> <span class="o">+</span> <span class="p">(</span><span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">state</span><span class="p">[</span><span class="n">pos</span><span class="p">]</span> <span class="o">=</span> <span class="n">newSand</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">swap</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="n">state</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">spills</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Explore this example in <a href="https://godbolt.org/z/Tec6dGMhr">Godbolt</a>.</p>
<!--more-->
<h2 id="restructuring-for-auto-vectorization">Restructuring for auto vectorization</h2>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">int</span> <span class="n">pixel</span> <span class="o">=</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">8</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">points</span> <span class="o">=</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">8</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">stabilize</span><span class="p">(</span><span class="kt">char</span><span class="o">*</span> <span class="n">buffer</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">state</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">spills</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">y</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">pos</span> <span class="o">=</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">currSand</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="p">];</span>
<span class="kt">char</span> <span class="n">newSand</span> <span class="o">=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">currSand</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">:</span> <span class="n">currSand</span><span class="p">;</span>
<span class="n">spills</span> <span class="o">+=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">=</span> <span class="n">newSand</span> <span class="o">+</span> <span class="p">(</span><span class="n">buffer</span><span class="p">[</span><span class="n">pos</span> <span class="o">-</span> <span class="n">points</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">newSand</span> <span class="o">=</span> <span class="n">newSand</span> <span class="o">+</span> <span class="p">(</span><span class="n">buffer</span><span class="p">[</span><span class="n">pos</span> <span class="o">+</span> <span class="n">points</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">newSand</span> <span class="o">=</span> <span class="n">newSand</span> <span class="o">+</span> <span class="p">(</span><span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">newSand</span> <span class="o">=</span> <span class="n">newSand</span> <span class="o">+</span> <span class="p">(</span><span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">state</span><span class="p">[</span><span class="n">pos</span><span class="p">]</span> <span class="o">=</span> <span class="n">newSand</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">swap</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="n">state</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">spills</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
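<p>A note on the neighbour updates: in C++ <code class="language-plaintext highlighter-rouge">+</code> binds tighter than <code class="language-plaintext highlighter-rouge">>=</code>, so <code class="language-plaintext highlighter-rouge">a = a + b >= 4</code> stores the 0/1 result of the comparison, while <code class="language-plaintext highlighter-rouge">a += b >= 4</code> adds that 0/1 to <code class="language-plaintext highlighter-rouge">a</code>. The sandpile rule needs the second form. The two helper functions below are hypothetical names made up for this illustration, not part of the original program.</p>

```cpp
// Hypothetical helpers (not from the post) contrasting the two spellings.

// `+` binds tighter than `>=`, so this stores the 0/1 result of the comparison.
inline char assignCompare(char sand, char neighbour) {
  sand = sand + neighbour >= 4;  // parsed as: sand = ((sand + neighbour) >= 4)
  return sand;
}

// Adds one grain to `sand` only when the neighbour holds 4 or more grains.
inline char addSpill(char sand, char neighbour) {
  sand += neighbour >= 4;        // parsed as: sand += (neighbour >= 4)
  return sand;
}
```

<p>With 2 grains in the cell and 5 in the neighbour, the first helper returns 1 and the second returns 3.</p>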
<p>To make this readable, I’m going to extract the <code class="language-plaintext highlighter-rouge">y*points + x</code> computation into an inline function.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">int</span> <span class="n">pixel</span> <span class="o">=</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">8</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">points</span> <span class="o">=</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">8</span><span class="p">;</span>
<span class="kr">inline</span> <span class="kt">size_t</span> <span class="nf">resolveIdx</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">void</span> <span class="nf">stabilize</span><span class="p">(</span><span class="kt">char</span><span class="o">*</span> <span class="n">buffer</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">state</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">spills</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">y</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">char</span> <span class="n">currSand</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">)];</span>
<span class="kt">char</span> <span class="n">newSand</span> <span class="o">=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">currSand</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">:</span> <span class="n">currSand</span><span class="p">;</span>
<span class="n">spills</span> <span class="o">+=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="c1">// Spill over from neighbours</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">((</span><span class="n">y</span> <span class="o">-</span> <span class="mi">1</span><span class="p">),</span> <span class="n">x</span><span class="p">)]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">((</span><span class="n">y</span> <span class="o">+</span> <span class="mi">1</span><span class="p">),</span> <span class="n">x</span><span class="p">)]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">state</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">)]</span> <span class="o">=</span> <span class="n">newSand</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h2 id="openmp">OpenMP</h2>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">int</span> <span class="n">pixel</span> <span class="o">=</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">8</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">points</span> <span class="o">=</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">8</span><span class="p">;</span>
<span class="kr">inline</span> <span class="kt">size_t</span> <span class="nf">resolveIdx</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">void</span> <span class="nf">stabilize</span><span class="p">(</span><span class="kt">char</span><span class="o">*</span> <span class="n">buffer</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">state</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">spills</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">y</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="cp">#pragma omp simd
</span> <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">char</span> <span class="n">currSand</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">)];</span>
<span class="kt">char</span> <span class="n">newSand</span> <span class="o">=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">currSand</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">:</span> <span class="n">currSand</span><span class="p">;</span>
<span class="n">spills</span> <span class="o">+=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="c1">// Spill over from neighbours</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">((</span><span class="n">y</span> <span class="o">-</span> <span class="mi">1</span><span class="p">),</span> <span class="n">x</span><span class="p">)]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">((</span><span class="n">y</span> <span class="o">+</span> <span class="mi">1</span><span class="p">),</span> <span class="n">x</span><span class="p">)]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">state</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">)]</span> <span class="o">=</span> <span class="n">newSand</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
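<p>Two things are worth keeping in mind with <code class="language-plaintext highlighter-rouge">#pragma omp simd</code>. GCC and Clang only honour it when compiling with <code class="language-plaintext highlighter-rouge">-fopenmp</code> or <code class="language-plaintext highlighter-rouge">-fopenmp-simd</code>, and a scalar accumulated across iterations, like <code class="language-plaintext highlighter-rouge">spills</code>, is normally declared with a <code class="language-plaintext highlighter-rouge">reduction</code> clause. The following is a minimal sketch of that clause on a single row; <code class="language-plaintext highlighter-rouge">countSpills</code> is a hypothetical helper, not a function from the sandpile program.</p>

```cpp
#include <cstddef>

// Hypothetical helper (not in the post): counts the cells in one row that
// hold 4 or more grains and would therefore topple in the next sweep.
std::size_t countSpills(const char* row, std::size_t n) {
  std::size_t spills = 0;
  // reduction(+ : spills) gives every SIMD lane a private partial sum
  // and adds the partial sums together once the loop has finished.
#pragma omp simd reduction(+ : spills)
  for (std::size_t x = 0; x < n; ++x) {
    spills += row[x] >= 4;
  }
  return spills;
}
```

<p>Without an OpenMP flag the pragma is ignored and the loop simply runs as ordinary scalar code, so the result is the same either way.</p>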
<h2 id="encapsulating-the-function-as-a-class">Encapsulating the function as a class</h2>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">size_t</span> <span class="n">pixel</span> <span class="o">=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="mi">8</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">size_t</span> <span class="n">points</span> <span class="o">=</span> <span class="n">pixel</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="n">points</span> <span class="o">*</span> <span class="n">points</span><span class="p">;</span>
<span class="k">class</span> <span class="nc">Sandpile</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">stabilize</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">state</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">);</span>
<span class="kr">inline</span> <span class="kt">size_t</span> <span class="n">resolveIdx</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="n">Sandpile</span><span class="o">::</span><span class="n">stabilize</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">state</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">size_t</span> <span class="n">spills</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">y</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="cp">#pragma omp simd
</span> <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">pos</span> <span class="o">=</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">currSand</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="p">];</span>
<span class="kt">char</span> <span class="n">newSand</span> <span class="o">=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">currSand</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">:</span> <span class="n">currSand</span><span class="p">;</span>
<span class="n">spills</span> <span class="o">+=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="c1">// Spill over from neighbours</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span> <span class="o">-</span> <span class="n">points</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span> <span class="o">+</span> <span class="n">points</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">state</span><span class="p">[</span><span class="n">pos</span><span class="p">]</span> <span class="o">=</span> <span class="n">newSand</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// print();</span>
<span class="n">std</span><span class="o">::</span><span class="n">swap</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="n">state</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">spills</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
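<p>For some reason, I wasn’t able to get the loop auto-vectorized when <code class="language-plaintext highlighter-rouge">state</code> and <code class="language-plaintext highlighter-rouge">buffer</code> were class members instead of parameters of <code class="language-plaintext highlighter-rouge">stabilize</code>. Only the first of the two signatures below vectorized.</p>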
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Works: the loop is vectorized</span>
<span class="k">class</span> <span class="nc">Sandpile</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">stabilize</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">state</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">);</span>
<span class="kr">inline</span> <span class="kt">size_t</span> <span class="n">resolveIdx</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>
<span class="c1">// Does not allow for vectorization</span>
<span class="k">class</span> <span class="nc">Sandpile</span> <span class="p">{</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">state</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">;</span>
<span class="kt">void</span> <span class="n">stabilize</span><span class="p">();</span>
<span class="kr">inline</span> <span class="kt">size_t</span> <span class="n">resolveIdx</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>
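<p>One likely explanation, offered here as an assumption rather than something verified in this post: a store through a <code class="language-plaintext highlighter-rouge">char*</code> is allowed to alias any object, including the class’s own pointer members, so the compiler has to assume that writing <code class="language-plaintext highlighter-rouge">state[pos]</code> might change <code class="language-plaintext highlighter-rouge">buffer</code> itself. Copying the members into local <code class="language-plaintext highlighter-rouge">__restrict__</code> pointers (a GCC/Clang extension) removes that doubt. The sketch below shows the idea on a small self-contained class. The names <code class="language-plaintext highlighter-rouge">PaddedSandpile</code>, <code class="language-plaintext highlighter-rouge">at</code> and <code class="language-plaintext highlighter-rouge">sweep</code> are hypothetical, and the one-cell zero border around the grid is an assumption made so that neighbour reads stay in bounds.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: an n x n sandpile stored with a one-cell zero border,
// so the four neighbour reads never leave the allocation.
class PaddedSandpile {
public:
  explicit PaddedSandpile(std::size_t n)
      : n_(n), stride_(n + 2),
        state_(stride_ * stride_, 0), buffer_(stride_ * stride_, 0) {}

  // Access an interior cell using 0-based coordinates.
  char& at(std::size_t y, std::size_t x) {
    return buffer_[(y + 1) * stride_ + (x + 1)];
  }

  // Sweep until no cell topples any more.
  void stabilize() {
    std::size_t spills = 0;
    do {
      // Members are handed over as local pointers, so the loop below never
      // has to reload them after a store.
      spills = sweep(state_.data(), buffer_.data());
      state_.swap(buffer_);  // the freshly written grid becomes the input
    } while (spills != 0);
  }

private:
  // One sweep: reads `buffer`, writes `state`, returns the number of topples.
  // __restrict__ promises the compiler that the two arrays do not overlap.
  std::size_t sweep(char* __restrict__ state,
                    const char* __restrict__ buffer) const {
    std::size_t spills = 0;
    for (std::size_t y = 1; y <= n_; ++y) {
      for (std::size_t x = 1; x <= n_; ++x) {
        const std::size_t pos = y * stride_ + x;
        const char curr = buffer[pos];
        char next = curr >= 4 ? curr - 4 : curr;
        spills += curr >= 4;
        // Spill over from neighbours
        next += buffer[pos - stride_] >= 4;
        next += buffer[pos + stride_] >= 4;
        next += buffer[pos - 1] >= 4;
        next += buffer[pos + 1] >= 4;
        state[pos] = next;
      }
    }
    return spills;
  }

  std::size_t n_;       // interior width and height
  std::size_t stride_;  // row length including the border
  std::vector<char> state_;
  std::vector<char> buffer_;
};
```

<p>Placing 4 grains in the centre of a 3x3 pile and calling <code class="language-plaintext highlighter-rouge">stabilize()</code> leaves the centre empty and one grain in each of its four neighbours. Whether a given compiler vectorizes the inner loop still has to be checked in its optimization report.</p>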
<p>Final auto-vectorized version.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">size_t</span> <span class="n">pixel</span> <span class="o">=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="mi">8</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">size_t</span> <span class="n">points</span> <span class="o">=</span> <span class="n">pixel</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">size_t</span> <span class="n">size</span> <span class="o">=</span> <span class="n">points</span> <span class="o">*</span> <span class="n">points</span><span class="p">;</span>
<span class="k">class</span> <span class="nc">Sandpile</span> <span class="p">{</span>
<span class="n">vector</span><span class="o"><</span><span class="kt">char</span><span class="o">></span> <span class="n">_state</span><span class="p">;</span>
<span class="n">vector</span><span class="o"><</span><span class="kt">char</span><span class="o">></span> <span class="n">_buffer</span><span class="p">;</span>
<span class="nl">private:</span>
<span class="kr">inline</span> <span class="kt">size_t</span> <span class="n">resolveIdx</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span> <span class="p">}</span>
<span class="c1">// __restrict__ as we know there will be no overlap</span>
<span class="kt">void</span> <span class="n">stabilize</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">__restrict__</span> <span class="n">state</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">__restrict__</span> <span class="n">buffer</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">size_t</span> <span class="n">spills</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">y</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="cp">#pragma omp simd
</span> <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">pos</span> <span class="o">=</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">currSand</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="p">];</span>
<span class="kt">char</span> <span class="n">newSand</span> <span class="o">=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">currSand</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">:</span> <span class="n">currSand</span><span class="p">;</span>
<span class="n">spills</span> <span class="o">+=</span> <span class="n">currSand</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="c1">// Spill over from neighbours</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span> <span class="o">-</span> <span class="n">points</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span> <span class="o">+</span> <span class="n">points</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">newSand</span> <span class="o">+=</span> <span class="n">buffer</span><span class="p">[</span><span class="n">pos</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">>=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">state</span><span class="p">[</span><span class="n">pos</span><span class="p">]</span> <span class="o">=</span> <span class="n">newSand</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// print();</span>
<span class="n">std</span><span class="o">::</span><span class="n">swap</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="n">state</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">spills</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">computeIdentity</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// identity = f(ones(n)*6 - f(ones(n)*6)), where f is stabilize</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">buffer</span> <span class="o">=</span> <span class="n">_buffer</span><span class="p">.</span><span class="n">data</span><span class="p">();</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">state</span> <span class="o">=</span> <span class="n">_state</span><span class="p">.</span><span class="n">data</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">y</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">pos</span> <span class="o">=</span> <span class="n">y</span> <span class="o">*</span> <span class="n">points</span> <span class="o">+</span> <span class="n">x</span><span class="p">;</span>
<span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">)]</span> <span class="o">=</span> <span class="mi">6</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">stabilize</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">buffer</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">y</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">pixel</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="n">buffer</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">)]</span> <span class="o">=</span> <span class="mi">6</span> <span class="o">-</span> <span class="n">state</span><span class="p">[</span><span class="n">resolveIdx</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">)];</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">stabilize</span><span class="p">(</span><span class="n">state</span><span class="p">,</span> <span class="n">buffer</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>NimalanFor this case study let’s try to auto-vectorize an implementation of Abelian sandpile model void stabilize() { while(1) { int spills = 0; for (size_t y = 1; y <= pixel; ++y) { for (size_t x = 1; x <= pixel; ++x) { const int pos = y * points + x; char currSand = buffer[pos]; char newSand = currSand >= 4 ? currSand - 4 : currSand; spills += currSand >= 4; newSand = newSand + buffer[pos - points] >= 4; newSand = newSand + buffer[pos + points] >= 4; newSand = newSand + buffer[pos-1] >= 4; newSand = newSand + buffer[pos+1] >= 4; state[pos] = newSand; } } std::swap(buffer, state); if (!spills) { return; } } } Explore this example in [Godbolt]https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAZgQDbYB2AhgLbYgDkAjF%2BTXRMiAZVQtGIHgCYBQogFUAztgAKAD24AGfgCsp5eiyahUAUmkAhC5fIrGqIgSHVmmAMLp6AVzZMDbgAyBEzYAHK%2BAEbYpFIA7OQADuhKxM5Mnj5%2BBsmpTkLBoRFs0bE8CfbYjukiRCykRJm%2B/jx22A75TLX1RIXhUTHxdnUNTdmtSiO9If0lg%2BUAlHbo3qSonFwWAMwhqD44ANRmW%2B6TpCHAx7hmWgCCGEyTByFEB4kE6u1HWwAiBzzHdyAgAcx2sdweTxeb3QLyU3z%2BAJOILBN1uIXoMwOqQAXtgAPqvUjYJReABu2AAkph1BBcQTXgBPcjYgh4wkHdQLI5xSwHYlEVZMA6Mg4AKhhcKOVk5YJ5PzRaLJsMwBxoTAgqCQ9QlkW8NBoMRZWp12LqRGw3LMvLRB2ewmx73o9Hhxz%2BWlRd1tarIBzpbIZIoR/zlosBbreH3acpsNkZVptdztdu2iVILGAbBYB3QbESrLYmG9dsEpD99I56mDSL5VfDv0jn3oMasNi5PPBt2TyZNZdQq1IImMqojeoNMTMAFZLMTSfQKdTaczOQspwqtp3u3bewdQgB3IdMEcN/ukQfD77XBsAFm%2BADEDqfz0eDgBaA63kCPgeHosb4vJkoToutK1gnj%2BF5XBG16el2W4APTwQcIjATmFJljQpC5ru2AEMASCRCspBKABdr7r%2BoGjvqhqkFOM4kuSVI0lAorvjwSwrmul7QbBpFmiwFp0bOjGLhAy5clxEbkcOsHdtaCpJvKipxD8XBLPQ3CTvw/hcDo5DoNwQKtjKpKrOs0pbHw5BENoalLAA1iAWxbAAdAAbFoWzXtePBbHEbmTpOWhubIGlcNe/BsCAPkuQAnNIcRxDwWjAtezlxFsk7kDpekGVw/BKCAWjWbZSxwLAKAYDg%2BDEGQlDUHQjCsBw3BWYIwhiBInA8NecjCMoaiaLpeggKFximLGVhtB06SuEeYwtFs5BBDMxSlFIxW5GkQgLVIS1bZ0fRrfMxWVNUQjdKMXjNHt01VJ0l3TEUAxlKdUy7b5ww9EdL0bUsplrBs2y7Ps2DfKcRDnCYUEQkIUIOu8TbVvW7igv%2BsOPK80LJFKEZIkCJxo52GJYhWRIMfOTG0mTIosjT7bWnyApCkGEo48IroyuoMYqYqdzKgQqrqpq2qkLq1FGo%2BosSpMAmWh23rQkBDAgRGHro7c3qluWAY
cmGDY1kG9Z/Ij0YbpRVjxgriklr6/rsq8dYG3KTsEybUbNubk2WAziZwd2O5PhRVHjrR07CZToniauk7rpuW47tJL4RkHkFbFefy3scD6py%2B76ft%2BZ6/rJW7K86nNgX8ufHhnH4l92iHIah6DoWqWFsDheEEURJE28mScjq2DZjjRQkUwuzFiW%2B/wcRJsfcTevF93asuCeH49U2JLJz38UnYAeMka3JvOKfJymqepmnabZ%2BmGd72JEeZ2zSPwNnDQsSxINgLA4LEECX%2BFSKIBJzAhcloOI0hpDAlijwScbkQpuWvLFNy2Ub55QKkVEq78HKjWkC5IK0CeAhWBAFOIPlrxZTClsa%2Bw1b75SwToMqiByrIDQLmJ0RoqCanYQwQYwAeAyAEAwC0xFqCRBvpEEI9RGStX4BgNgHBhAAHkmD0BkbQnAWYTCSA0QQYk1QKSFVodgT4/YLSyMoMIdoN9MSRHTKQRkngcA30hgQKKvA1ICCMMAJQAA1Ag%2B8lGJGYBY9qohxCSB6n1RQKgNA330K0IwJg0De0MAQSIhVIBLHQIkToRjXxKK2G%2BdU6BXyQ2wNgV8FJHBkADMUnJzA8wFXaPdWaEA3AfWWkeH6cwyhJBSNtDI11xj9LyOkHp60JgtPOl0d6wyWh3RmY9CZ8wvpXSyAs2WDQVllH%2Bo/bq1liQbD4AArSqDaF5XUCQ18iCDjAFQKgf4PAXLSD9EZawU0Dg1RIGWbYrQDieDzLw350hLILFfqVHBSCXIQNioQ2KcC4jAgSok7gEVyBRSCuc3K3AMHFTfow8gLCICVR4YwCgXD5EcNiPwwRjURGFQgOI2hkjWAOIsfIxRRAVFqJvpo8aOi9KEH0U4QxN8TFVG8OYjxliLRhT0rY%2BxjisAbD0q49xJyvEZj8QEvcQSQkyrCZ1SJvUwkDTibQ/QY1knmGMrYWxmT/76VyekfJhTmkzRcO0%2Ba8yAjdNWr9VoB10idODQUANvSDBnQenMjZUbpkxu%2BhGyZazGi%2BomFMHZUg9lmU4LIcpxzPFhTOTlfglzrm3PuY8gRLy3mpK%2BYQH5FlZAArJTEZt4KGF2XIF/H%2BgwnVhXRZi4qpa6F4q7R/cgjlpCxXcvFNyMhYoZUSiQyh3BqHYrLbiidRLmEoEIAaeqtB5DGu6qa%2BQ5qhoKqQIVRJN6D00CIIyYJmDSA3tGuQN9SgH1PpfVoU5NCcVcB%2BAQA0Bx/H73bVctyNzbxVqebWsk8JoOwbuQ8hDL8J1QriDCuF144jgK2Mg/D0hYGGDRYBrd9DCr4sheRrgmHR3oKw%2BQdCqQXDXiAA%3D%3D%3D) or herePerf Scripts2021-08-17T00:00:00+00:002021-08-17T00:00:00+00:00https://mark1626.github.io/posts/2021/08/17/perf-scripts<p><code class="language-plaintext highlighter-rouge">perf</code> is a Performance analysis tools for Linux. <code class="language-plaintext highlighter-rouge">perf</code> can be used to provide useful statistics about your application with <code class="language-plaintext highlighter-rouge">perf stat <app></code> or sampled and analysed with <code class="language-plaintext highlighter-rouge">perf record <app></code>. 
When recording with perf we are left with a binary file <code class="language-plaintext highlighter-rouge">perf.data</code>, which contains information about all the sampled events.</p>
<p>Events from <code class="language-plaintext highlighter-rouge">perf.data</code> can be extracted and processed with <code class="language-plaintext highlighter-rouge">perf script</code>. Documentation and examples for <code class="language-plaintext highlighter-rouge">perf script</code> are very limited, so I’m going to walk through the exploration I did with it. For this I’ll be using a <code class="language-plaintext highlighter-rouge">perf.data</code> generated from a run of an Abelian sandpile model program.</p>
<!--more-->
<h2 id="dump-events-from-perfdata">Dump events from perf.data</h2>
<p>First let’s understand what data can be extracted from <code class="language-plaintext highlighter-rouge">perf.data</code>. Running <code class="language-plaintext highlighter-rouge">perf script</code> will output events from <code class="language-plaintext highlighter-rouge">perf.data</code></p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>perf record
<span class="c"># Walk through perf file and output contents of each record</span>
perf script <span class="o">></span> out.perf
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> sandpile 1751 13289.335292: 10101010 cpu-clock:uhH: 559a3f21248b Fractal::Sandpile::stabilize+0x24b (/home/bench/sandpile)
</code></pre></div></div>
<p>What each field means:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>comm - sandpile
tid - 1751
event - cpu-clock:uhH
ip - 559a3f21248b
sym - Fractal::Sandpile::stabilize
symoff - +0x24b
time - 13289.335292
period - 10101010
dso - (/home/bench/sandpile)
</code></pre></div></div>
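<p>To make the field layout concrete, here is a minimal C++17 sketch (not part of the original scripts) that splits one default-format sample line into the fields listed above. The <code class="language-plaintext highlighter-rouge">PerfSample</code> struct and the <code class="language-plaintext highlighter-rouge">parse_sample</code> function are hypothetical names. The sketch assumes that neither the command name nor the symbol contains spaces, which real <code class="language-plaintext highlighter-rouge">perf script</code> output does not guarantee.</p>

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical container for the fields of one default-format sample line.
struct PerfSample {
    std::string comm;
    int tid = 0;
    double time = 0.0;
    unsigned long long period = 0;
    std::string event, ip, sym, symoff, dso;
};

// Parses a line such as
//   sandpile 1751 13289.335292: 10101010 cpu-clock:uhH: 559a3f21248b Fractal::Sandpile::stabilize+0x24b (/home/bench/sandpile)
// Returns std::nullopt when the line does not have this shape.
std::optional<PerfSample> parse_sample(const std::string& line) {
    std::istringstream in(line);
    PerfSample s;
    char colon = 0;      // the ':' printed right after the timestamp
    std::string symtok;  // "symbol+0xoffset"
    if (!(in >> s.comm >> s.tid >> s.time >> colon >> s.period
             >> s.event >> s.ip >> symtok >> s.dso) || colon != ':') {
        return std::nullopt;
    }

    // The event name is printed with a trailing ':'.
    if (!s.event.empty() && s.event.back() == ':') s.event.pop_back();

    // Split "symbol+0xoffset" at the last '+'.
    const auto plus = symtok.rfind('+');
    if (plus == std::string::npos) {
        s.sym = symtok;
    } else {
        s.sym = symtok.substr(0, plus);
        s.symoff = symtok.substr(plus);
    }

    // The DSO path is wrapped in parentheses.
    if (s.dso.size() >= 2 && s.dso.front() == '(' && s.dso.back() == ')') {
        s.dso = s.dso.substr(1, s.dso.size() - 2);
    }
    return s;
}
```

<p>Whitespace-delimited extraction is enough here because every field in the sample line is a single token; for output that contains spaces inside a field, selecting fewer fields with <code class="language-plaintext highlighter-rouge">-F</code> is probably the simpler route.</p>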
<p>If the data was recorded with <code class="language-plaintext highlighter-rouge">-g</code>, the output also contains the call trace of each sample, like this:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>perf record <span class="nt">-g</span>
perf script <span class="o">></span> out.perf
sandpile 131 11005.728397: 10101010 cpu-clock:uhH:
560cdca6739a Fractal::Sandpile::stabilize+0x15a <span class="o">(</span>/home/bench/sandpile<span class="o">)</span>
560cdca67984 Fractal::Sandpile::computeIdentity+0x84 <span class="o">(</span>/home/bench/sandpile<span class="o">)</span>
560cdca670ef main+0x1f <span class="o">(</span>/home/bench/sandpile<span class="o">)</span>
7f36527fd09b __libc_start_main+0xeb <span class="o">(</span>/lib/x86_64-linux-gnu/libc-2.28.so<span class="o">)</span>
41fd89415541f689 <span class="o">[</span>unknown] <span class="o">([</span>unknown]<span class="o">)</span>
</code></pre></div></div>
<p>To extract a custom set of fields from the perf events, use the <code class="language-plaintext highlighter-rouge">-F</code> option:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>perf script -F comm,tid,event,ip,sym,srcline,time > out.perf
sandpile 1751 13289.335292: cpu-clock:uhH: 559a3f21248b Fractal::Sandpile::stabilize
sandpile.cc:72
sandpile 1751 13289.345516: cpu-clock:uhH: 559a3f212494 Fractal::Sandpile::stabilize
sandpile.cc:72
# Add extra field
perf script -F+srcline > out.perf
sandpile 1751 13289.335292: 10101010 cpu-clock:uhH: 559a3f21248b Fractal::Sandpile::stabilize+0x24b (/home/bench/sandpile)
sandpile.cc:72
sandpile 1751 13289.345516: 10101010 cpu-clock:uhH: 559a3f212494 Fractal::Sandpile::stabilize+0x254 (/home/bench/sandpile)
# Remove field
perf script -F+srcline -F-period > out.perf
sandpile 1751 13289.335292: cpu-clock:uhH: 559a3f21248b Fractal::Sandpile::stabilize+0x24b (/home/bench/sandpile)
sandpile.cc:72
sandpile 1751 13289.345516: cpu-clock:uhH: 559a3f212494 Fractal::Sandpile::stabilize+0x254 (/home/bench/sandpile)
sandpile.cc:72
</code></pre></div></div>
<p>List of all the fields we can extract (taken from the man page):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Valid types: hw,sw,trace,raw,synth. Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,srcline,period,iregs,uregs,brstack,brstacksym,flags,bpf-output,brstackinsn,brstackoff,callindent,insn,insnlen,synth,phys_addr,metric,misc
</code></pre></div></div>
<h2 id="scripting">Scripting</h2>
<p>There are two ways of scripting the data. You can generate a script from the default template with <code class="language-plaintext highlighter-rouge">perf script -g perl</code> or <code class="language-plaintext highlighter-rouge">perf script -g python</code>.</p>
<p>Or you can write a custom script. For this example we’ll write a custom script; I’ll cover using the generated script in a separate post.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>perf script <span class="nt">-F</span>+srcline <span class="nt">-F-period</span> <span class="nt">-F-time</span> <span class="nt">-F-dso</span> <span class="nt">-F</span>+sym <span class="nt">-F-symoff</span> <span class="o">></span> out.perf
sandpile 1751/1751 cpu-clock:uhH: 559a3f21248b Fractal::Sandpile::stabilize
sandpile.cc:72
</code></pre></div></div>
<p>The structure of the output is regular, so processing it is like processing any other text file. The following script counts how often each source line occurs in the samples, which gives us an idea of which symbol and line took the most time.</p>
<h3 id="case---1-identifying-a-hotspot-in-the-code-with-source-line-number-and-symbol">Case - 1: Identifying a hotspot in the code with source line number and symbol</h3>
<div class="language-pl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env perl</span>
<span class="c1"># srcline-occurance.pl <perf output></span>
<span class="c1"># Usage perf script -F+srcline -F-period -F-time -F-dso -F+sym -F-symoff | srcline-occurance.pl</span>
<span class="k">use</span> <span class="nv">strict</span><span class="p">;</span>
<span class="k">my</span> <span class="nv">%sym_occurance</span> <span class="o">=</span> <span class="p">();</span>
<span class="k">while</span> <span class="p">(</span><span class="o"><></span><span class="p">)</span> <span class="p">{</span>
<span class="nb">chomp</span><span class="p">;</span>
<span class="k">next</span> <span class="k">if</span> <span class="vg">$_</span> <span class="o">=~</span> <span class="sr">/^#/</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="sr">/^\s*(\S.+?)\s+(\d+)\/*(\d+)*\s+/</span><span class="p">)</span> <span class="p">{</span>
<span class="k">my</span> <span class="p">(</span><span class="nv">$comm</span><span class="p">,</span> <span class="nv">$pid</span><span class="p">,</span> <span class="nv">$tid</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="err">$</span><span class="mi">1</span><span class="p">,</span> <span class="err">$</span><span class="mi">2</span><span class="p">,</span> <span class="err">$</span><span class="mi">3</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="ow">not</span> <span class="nv">$tid</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$tid</span> <span class="o">=</span> <span class="nv">$pid</span><span class="p">;</span>
<span class="nv">$pid</span> <span class="o">=</span> <span class="p">"</span><span class="s2">?</span><span class="p">";</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="sr">/(\S+):\s*(\S+)\s+(\S+)/</span><span class="p">)</span> <span class="p">{</span>
<span class="k">my</span> <span class="p">(</span><span class="nv">$event</span><span class="p">,</span> <span class="nv">$ip</span><span class="p">,</span> <span class="nv">$sym</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="err">$</span><span class="mi">1</span><span class="p">,</span> <span class="err">$</span><span class="mi">2</span><span class="p">,</span> <span class="err">$</span><span class="mi">3</span><span class="p">);</span>
<span class="vg">$_</span> <span class="o">=</span> <span class="o"><></span><span class="p">;</span>
<span class="nb">chomp</span><span class="p">;</span>
<span class="k">my</span> <span class="nv">$srcline</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="sr">/\s+(\S+:\d+)/</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$srcline</span> <span class="o">=</span> <span class="err">$</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nv">$srcline</span> <span class="o">=</span> <span class="p">'</span><span class="s1">Unknown</span><span class="p">';</span>
<span class="nv">$sym</span> <span class="o">=</span> <span class="p">'</span><span class="s1">?</span><span class="p">';</span>
<span class="k">next</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">exists</span> <span class="nv">$sym_occurance</span><span class="p">{</span><span class="nv">$srcline</span><span class="p">})</span> <span class="p">{</span>
<span class="nv">$sym_occurance</span><span class="p">{</span><span class="nv">$srcline</span><span class="p">}{</span><span class="nv">occ</span><span class="p">}</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nv">$sym_occurance</span><span class="p">{</span><span class="nv">$srcline</span><span class="p">}{</span><span class="nv">symbol</span><span class="p">}</span> <span class="o">=</span> <span class="nv">$sym</span><span class="p">;</span>
<span class="nv">$sym_occurance</span><span class="p">{</span><span class="nv">$srcline</span><span class="p">}{</span><span class="nv">occ</span><span class="p">}</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">print</span> <span class="bp">STDERR</span> <span class="p">"</span><span class="s2">Unknown line</span><span class="se">\n</span><span class="p">";</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">foreach</span> <span class="k">my</span> <span class="nv">$k</span> <span class="p">(</span><span class="nb">sort</span> <span class="p">{</span> <span class="nv">$a</span> <span class="ow">cmp</span> <span class="nv">$b</span> <span class="p">}</span> <span class="nb">keys</span> <span class="nv">%sym_occurance</span><span class="p">)</span> <span class="p">{</span>
<span class="k">print</span> <span class="p">"</span><span class="si">$k</span><span class="s2"> </span><span class="si">$sym_occurance</span><span class="s2">{</span><span class="si">$k</span><span class="s2">}{symbol} </span><span class="si">$sym_occurance</span><span class="s2">{</span><span class="si">$k</span><span class="s2">}{occ}</span><span class="se">\n</span><span class="p">";</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sandpile.cc:62 Fractal::Sandpile::stabilize 1
sandpile.cc:63 Fractal::Sandpile::stabilize 1
sandpile.cc:64 Fractal::Sandpile::stabilize 5
sandpile.cc:67 Fractal::Sandpile::stabilize 35
sandpile.cc:70 Fractal::Sandpile::stabilize 3
sandpile.cc:71 Fractal::Sandpile::stabilize 2
sandpile.cc:72 Fractal::Sandpile::stabilize 4
sandpile.cc:73 Fractal::Sandpile::stabilize 15
sandpile.cc:75 Fractal::Sandpile::stabilize 3
sandpile.cc:85 Fractal::Sandpile::stabilize 1
</code></pre></div></div>
<h3 id="case---2-spitting-perfdata-of-a-mpi-run-into-separate-files-for-individual-processes">Case - 2: Splitting perf.data of an MPI run into separate files for individual processes</h3>
<div class="language-perl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env perl</span>
<span class="c1"># split-process.pl <perf output></span>
<span class="c1"># Usage perf script -F+pid | split-process.pl</span>
<span class="k">use</span> <span class="nv">strict</span><span class="p">;</span>
<span class="k">my</span> <span class="nv">%files</span> <span class="o">=</span> <span class="p">();</span>
<span class="k">my</span> <span class="nv">$pid</span> <span class="o">=</span> <span class="p">'';</span>
<span class="k">my</span> <span class="nv">$comm</span> <span class="o">=</span> <span class="p">'';</span>
<span class="k">my</span> <span class="nv">$tid</span> <span class="o">=</span> <span class="p">'';</span>
<span class="k">while</span> <span class="p">(</span><span class="o"><></span><span class="p">)</span> <span class="p">{</span>
<span class="nb">chomp</span><span class="p">;</span>
<span class="k">next</span> <span class="k">if</span> <span class="vg">$_</span> <span class="o">=~</span> <span class="sr">/^#/</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="sr">/^\s*(\S.+?)\s+(\d+)\/*(\d+)*\s+/</span><span class="p">)</span> <span class="p">{</span>
<span class="p">(</span><span class="nv">$pid</span><span class="p">,</span> <span class="nv">$tid</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="err">$</span><span class="mi">2</span><span class="p">,</span> <span class="err">$</span><span class="mi">3</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="ow">not</span> <span class="nv">$tid</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$tid</span> <span class="o">=</span> <span class="nv">$pid</span><span class="p">;</span>
<span class="nv">$pid</span> <span class="o">=</span> <span class="p">"</span><span class="s2">?</span><span class="p">";</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nb">open</span> <span class="k">my</span> <span class="nv">$fh</span><span class="p">,</span> <span class="p">'</span><span class="s1">>></span><span class="p">',</span> <span class="p">"</span><span class="s2">out-</span><span class="si">$pid</span><span class="s2">.perf</span><span class="p">";</span>
<span class="k">print</span> <span class="nv">$fh</span> <span class="vg">$_</span> <span class="o">.</span> <span class="p">"</span><span class="se">\n</span><span class="p">";</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This will generate <code class="language-plaintext highlighter-rouge">out-<pid1>.perf</code>, <code class="language-plaintext highlighter-rouge">out-<pid2>.perf</code>, <code class="language-plaintext highlighter-rouge">out-<pid3>.perf</code></p>Nimalanperf is a performance analysis tool for Linux. perf can be used to provide useful statistics about your application with perf stat <app> or sampled and analysed with perf record <app>. When recording with perf we are left with a binary file perf.data which contains information about all the sampled events. Events from perf.data can be extracted and scripted on with perf script. There is very limited documentation and examples of perf script, so I’m going to be walking through the exploration I did with perf script. For this I’ll be using a perf.data generated from an execution of an Abelian sandpile model program.Auto Vectorization Case Study 12021-08-12T00:00:00+00:002021-08-12T00:00:00+00:00https://mark1626.github.io/posts/2021/08/12/auto-vectorization-case-study-1<p>A small case study of auto vectorization.</p>
<h2 id="the-challenge">The Challenge</h2>
<p>One challenge in auto-vectorizing the function below is that the compiler has no way to know that the pointers <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">b</code> and <code class="language-plaintext highlighter-rouge">c</code> don’t overlap: the function can be called as <code class="language-plaintext highlighter-rouge">case_study_1(x, y, z)</code> or as <code class="language-plaintext highlighter-rouge">case_study_1(x, x+8, x+16)</code>. In the latter call the output range overlaps the inputs, so a vectorized loop can produce results that differ from the scalar loop.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">case_study_1</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">20</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Example at <a href="https://godbolt.org/z/b6qP8hasr">Godbolt</a></p>
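<p>The overlap hazard can be reproduced without any SIMD instructions. The sketch below is an illustration under assumptions, not what the compiler emits: <code class="language-plaintext highlighter-rouge">add_scalar</code>, <code class="language-plaintext highlighter-rouge">add_chunked</code> and <code class="language-plaintext highlighter-rouge">chunked_matches_scalar</code> are hypothetical helpers, the loop length is a parameter instead of <code class="language-plaintext highlighter-rouge">1&lt;&lt;20</code>, and a vector of <code class="language-plaintext highlighter-rouge">width</code> lanes is modelled by loading <code class="language-plaintext highlighter-rouge">width</code> elements from both inputs before storing any result.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: the loop from the post, with an explicit length.
void add_scalar(const int* a, const int* b, int* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Models a SIMD loop: each step loads `width` elements from both inputs
// into temporaries and only then stores the `width` results.
void add_chunked(const int* a, const int* b, int* c, std::size_t n, std::size_t width) {
    assert(width > 0);
    std::vector<int> va(width), vb(width);
    for (std::size_t i = 0; i < n; i += width) {
        const std::size_t len = std::min(width, n - i);
        std::copy(a + i, a + i + len, va.begin());
        std::copy(b + i, b + i + len, vb.begin());
        for (std::size_t j = 0; j < len; ++j) c[i + j] = va[j] + vb[j];
    }
}

// Runs both versions on the overlapping call (x, x+8, x+16) and reports
// whether the chunked result equals the scalar one.
bool chunked_matches_scalar(std::size_t width) {
    const std::size_t n = 32;
    std::vector<int> x(n + 16, 1), y(n + 16, 1);
    add_scalar(x.data(), x.data() + 8, x.data() + 16, n);
    add_chunked(y.data(), y.data() + 8, y.data() + 16, n, width);
    return x == y;
}
```

<p>With the offsets from the example call, <code class="language-plaintext highlighter-rouge">chunked_matches_scalar(4)</code> is true, because a 4-lane chunk never reads an element it is about to write, while <code class="language-plaintext highlighter-rouge">chunked_matches_scalar(16)</code> is false, because a 16-lane chunk loads elements of <code class="language-plaintext highlighter-rouge">b</code> that the scalar loop would already have overwritten. Whether vectorization is safe therefore depends on both the overlap distance and the vector width, which is why the compiler cannot decide it at compile time.</p>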
<!--more-->
<h2 id="runtime-check-of-pointers">Runtime Check of Pointers</h2>
<p>Since there is no guarantee that the pointers don’t overlap, both Clang and GCC insert runtime pointer checks in order to vectorize the loop; see <a href="https://llvm.org/docs/Vectorizers.html#runtime-checks-of-pointers">Clang Runtime Check of Pointers</a>.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">case_study_1</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">20</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>They generate two versions of the loop, one vectorized and one scalar, and select the appropriate one at run time:</p>
<div class="language-s highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">case_study_1</span><span class="p">(</span><span class="n">int</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">int</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">int</span><span class="o">*</span><span class="p">)</span><span class="o">:</span><span class="w">
</span><span class="n">lea</span><span class="w"> </span><span class="n">rcx</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="n">rdi</span><span class="m">+4</span><span class="p">]</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="n">rdx</span><span class="w">
</span><span class="n">sub</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="n">rcx</span><span class="w"> </span><span class="c1"># Basic address arithmetic to look for overlap</span><span class="w">
</span><span class="n">cmp</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="c1"># Check for overlap</span><span class="w">
</span><span class="n">jbe</span><span class="w"> </span><span class="n">.L5</span><span class="w"> </span><span class="c1"># Fallback to non vectorized .L5 if overlap is detected</span><span class="w">
</span><span class="n">lea</span><span class="w"> </span><span class="n">rcx</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="n">rsi</span><span class="m">+4</span><span class="p">]</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="n">rdx</span><span class="w">
</span><span class="n">sub</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="n">rcx</span><span class="w">
</span><span class="n">cmp</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="c1"># Check for overlap</span><span class="w">
</span><span class="n">jbe</span><span class="w"> </span><span class="n">.L5</span><span class="w"> </span><span class="c1"># Fallback to non vectorized .L5 if overlap is detected</span><span class="w">
</span><span class="n">xor</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="w"> </span><span class="n">eax</span><span class="w">
</span><span class="n">.L3</span><span class="o">:</span><span class="w">
</span><span class="n">movdqu</span><span class="w"> </span><span class="n">xmm0</span><span class="p">,</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rdi</span><span class="o">+</span><span class="n">rax</span><span class="p">]</span><span class="w">
</span><span class="n">movdqu</span><span class="w"> </span><span class="n">xmm1</span><span class="p">,</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rsi</span><span class="o">+</span><span class="n">rax</span><span class="p">]</span><span class="w">
</span><span class="n">paddd</span><span class="w"> </span><span class="n">xmm0</span><span class="p">,</span><span class="w"> </span><span class="n">xmm1</span><span class="w">
</span><span class="n">movups</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rdx</span><span class="o">+</span><span class="n">rax</span><span class="p">],</span><span class="w"> </span><span class="n">xmm0</span><span class="w">
</span><span class="n">add</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">16</span><span class="w">
</span><span class="n">cmp</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">4194304</span><span class="w">
</span><span class="n">jne</span><span class="w"> </span><span class="n">.L3</span><span class="w">
</span><span class="n">ret</span><span class="w">
</span><span class="n">.L5</span><span class="o">:</span><span class="w">
</span><span class="n">xor</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="w"> </span><span class="n">eax</span><span class="w">
</span><span class="n">.L2</span><span class="o">:</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rsi</span><span class="o">+</span><span class="n">rax</span><span class="p">]</span><span class="w">
</span><span class="n">add</span><span class="w"> </span><span class="n">ecx</span><span class="p">,</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rdi</span><span class="o">+</span><span class="n">rax</span><span class="p">]</span><span class="w">
</span><span class="n">mov</span><span class="w"> </span><span class="n">DWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rdx</span><span class="o">+</span><span class="n">rax</span><span class="p">],</span><span class="w"> </span><span class="n">ecx</span><span class="w">
</span><span class="n">add</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="w">
</span><span class="n">cmp</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">4194304</span><span class="w">
</span><span class="n">jne</span><span class="w"> </span><span class="n">.L2</span><span class="w">
</span><span class="n">ret</span><span class="w">
</span></code></pre></div></div>
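<p>Written by hand, the same loop versioning looks roughly like the sketch below. This is an approximation under assumptions: <code class="language-plaintext highlighter-rouge">ranges_overlap</code> and <code class="language-plaintext highlighter-rouge">add_versioned</code> are hypothetical names, the length is passed as a parameter, and the check compares whole ranges, which is more conservative than the distance comparison in the assembly above.</p>

```cpp
#include <cstddef>
#include <cstdint>

// True when the n-element int ranges starting at p and q share at least one byte.
bool ranges_overlap(const int* p, const int* q, std::size_t n) {
    const auto pb = reinterpret_cast<std::uintptr_t>(p);
    const auto qb = reinterpret_cast<std::uintptr_t>(q);
    const std::uintptr_t bytes = n * sizeof(int);
    return pb < qb + bytes && qb < pb + bytes;
}

// Returns true when the "vectorizable" version of the loop was taken.
bool add_versioned(const int* a, const int* b, int* c, std::size_t n) {
    if (!ranges_overlap(a, c, n) && !ranges_overlap(b, c, n)) {
        // No store can feed a later load, so the iterations are independent.
        // This corresponds to the SIMD loop (.L3 in the listing above).
        for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
        return true;
    }
    // Possible overlap: keep strict element-by-element order (.L2 above).
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
    return false;
}
```

<p>Both branches contain the same source loop; the point is that inside the first branch the absence of overlap is already established, so that copy of the loop can be vectorized without changing the results.</p>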
<h2 id="__restrict__">__restrict__</h2>
<p>This can be simplified by using the <code class="language-plaintext highlighter-rouge">__restrict__</code> qualifier to tell the compiler that these pointers will not overlap. The compiler then generates only the vectorized loop, without the runtime checks or the scalar fallback.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">case_study_1</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">__restrict__</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">__restrict__</span> <span class="n">b</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">__restrict__</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">20</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-s highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">case_study_1</span><span class="p">(</span><span class="n">int</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">int</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">int</span><span class="o">*</span><span class="p">)</span><span class="o">:</span><span class="w">
</span><span class="n">xor</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="w"> </span><span class="n">eax</span><span class="w">
</span><span class="n">.L2</span><span class="o">:</span><span class="w"> </span><span class="c1"># Vectorized loop</span><span class="w">
</span><span class="n">movdqu</span><span class="w"> </span><span class="n">xmm0</span><span class="p">,</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rsi</span><span class="o">+</span><span class="n">rax</span><span class="p">]</span><span class="w">
</span><span class="n">movdqu</span><span class="w"> </span><span class="n">xmm1</span><span class="p">,</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rdi</span><span class="o">+</span><span class="n">rax</span><span class="p">]</span><span class="w">
</span><span class="n">paddd</span><span class="w"> </span><span class="n">xmm0</span><span class="p">,</span><span class="w"> </span><span class="n">xmm1</span><span class="w">
</span><span class="n">movups</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rdx</span><span class="o">+</span><span class="n">rax</span><span class="p">],</span><span class="w"> </span><span class="n">xmm0</span><span class="w">
</span><span class="n">add</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">16</span><span class="w">
</span><span class="n">cmp</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">4194304</span><span class="w">
</span><span class="n">jne</span><span class="w"> </span><span class="n">.L2</span><span class="w">
</span><span class="n">ret</span><span class="w">
</span></code></pre></div></div>
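<p>One portability note: C++ has no standard <code class="language-plaintext highlighter-rouge">restrict</code> keyword. <code class="language-plaintext highlighter-rouge">__restrict__</code> is a GCC and Clang extension, and MSVC spells it <code class="language-plaintext highlighter-rouge">__restrict</code>. A small sketch of hiding the difference behind a macro follows; the macro name <code class="language-plaintext highlighter-rouge">RESTRICT</code>, the function name <code class="language-plaintext highlighter-rouge">add_restrict</code> and the explicit length parameter are assumptions made for illustration.</p>

```cpp
#include <cstddef>

// Pick the compiler-specific spelling of the no-alias qualifier.
#if defined(_MSC_VER)
#define RESTRICT __restrict
#else
#define RESTRICT __restrict__
#endif

// The caller promises that c does not overlap a or b. Breaking that promise
// is undefined behavior, so the overlapping call from the start of the post
// must not be made with this version.
void add_restrict(const int* RESTRICT a, const int* RESTRICT b,
                  int* RESTRICT c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}
```

<p>The qualifier moves the responsibility from the compiler to the caller: nothing checks the promise at run time, so it only belongs on functions whose callers can actually guarantee disjoint buffers.</p>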
<h2 id="openmp">OpenMP</h2>
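<p>Another way to achieve vectorization here is with the OpenMP directive <code class="language-plaintext highlighter-rouge">#pragma omp simd</code>. The <code class="language-plaintext highlighter-rouge">pragma</code> tells the compiler that the loop is safe to vectorize, so it emits only the vectorized loop. GCC and Clang honor the directive only when OpenMP support is enabled, for example with <code class="language-plaintext highlighter-rouge">-fopenmp</code> or <code class="language-plaintext highlighter-rouge">-fopenmp-simd</code>.</p>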
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">case_study_1</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">b</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Note: the compiler assumes there is no overlap, so use this only when you are sure the pointers will not overlap</span>
<span class="cp">#pragma omp simd
</span> <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1</span><span class="o"><<</span><span class="mi">20</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-s highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">case_study_1</span><span class="p">(</span><span class="n">int</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">int</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">int</span><span class="o">*</span><span class="p">)</span><span class="o">:</span><span class="w">
</span><span class="n">xor</span><span class="w"> </span><span class="n">eax</span><span class="p">,</span><span class="w"> </span><span class="n">eax</span><span class="w">
</span><span class="n">.L2</span><span class="o">:</span><span class="w"> </span><span class="c1"># Vectorized loop</span><span class="w">
</span><span class="n">movdqu</span><span class="w"> </span><span class="n">xmm0</span><span class="p">,</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rsi</span><span class="o">+</span><span class="n">rax</span><span class="p">]</span><span class="w">
</span><span class="n">movdqu</span><span class="w"> </span><span class="n">xmm1</span><span class="p">,</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rdi</span><span class="o">+</span><span class="n">rax</span><span class="p">]</span><span class="w">
</span><span class="n">paddd</span><span class="w"> </span><span class="n">xmm0</span><span class="p">,</span><span class="w"> </span><span class="n">xmm1</span><span class="w">
</span><span class="n">movups</span><span class="w"> </span><span class="n">XMMWORD</span><span class="w"> </span><span class="n">PTR</span><span class="w"> </span><span class="p">[</span><span class="n">rdx</span><span class="o">+</span><span class="n">rax</span><span class="p">],</span><span class="w"> </span><span class="n">xmm0</span><span class="w">
</span><span class="n">add</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">16</span><span class="w">
</span><span class="n">cmp</span><span class="w"> </span><span class="n">rax</span><span class="p">,</span><span class="w"> </span><span class="m">4194304</span><span class="w">
</span><span class="n">jne</span><span class="w"> </span><span class="n">.L2</span><span class="w">
</span><span class="n">ret</span><span class="w">
</span></code></pre></div></div>
Nimalan
Exploring Obscured Code: Part -1
2021-06-04T00:00:00+00:00
https://mark1626.github.io/posts/2021/06/04/exploring-obscured-code-part-1
<p>This is a snippet I saved back in college. I’m not sure who the original author is. Let’s try to understand how it works</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// A decimal to binary converter</span>
<span class="c1">// All credit to the original author</span>
<span class="p">(</span><span class="nx">_$</span><span class="o">=</span><span class="p">(</span><span class="nx">$</span><span class="p">,</span><span class="nx">_</span><span class="o">=</span><span class="p">[]</span><span class="o">+</span><span class="p">[])</span><span class="o">=></span><span class="nx">$</span><span class="p">?</span><span class="nx">_$</span><span class="p">(</span><span class="nx">$</span><span class="o">>>+!!</span><span class="p">[],(</span><span class="nx">$</span><span class="o">&+!!</span><span class="p">[])</span><span class="o">+</span><span class="nx">_</span><span class="p">):</span><span class="nx">_</span><span class="p">)(</span><span class="mi">255</span><span class="p">);</span>
</code></pre></div></div>
<!--more-->
<h2 id="round-1">Round 1</h2>
<p>Let’s format this</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span>
<span class="nx">_$</span><span class="o">=</span><span class="p">(</span><span class="nx">$</span><span class="p">,</span><span class="nx">_</span><span class="o">=</span><span class="p">[]</span><span class="o">+</span><span class="p">[])</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">$</span><span class="p">?</span><span class="nx">_$</span><span class="p">(</span><span class="nx">$</span><span class="o">>>+!!</span><span class="p">[],(</span><span class="nx">$</span><span class="o">&+!!</span><span class="p">[])</span><span class="o">+</span><span class="nx">_</span><span class="p">):</span><span class="nx">_</span>
<span class="p">}</span>
<span class="p">)(</span><span class="mi">255</span><span class="p">);</span>
</code></pre></div></div>
<hr />
<h2 id="round-2">Round 2</h2>
<p>Rename the variables to something readable</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// number is the number we are trying to convert to binary</span>
<span class="c1">// binary is the binary representation constructed so far</span>
<span class="p">(</span>
<span class="nx">fn</span> <span class="o">=</span> <span class="p">(</span><span class="nx">number</span><span class="p">,</span> <span class="nx">binary</span><span class="o">=</span><span class="p">[]</span><span class="o">+</span><span class="p">[])</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">number</span> <span class="p">?</span> <span class="nx">fn</span><span class="p">(</span><span class="nx">number</span><span class="o">>>+!!</span><span class="p">[],(</span><span class="nx">number</span><span class="o">&+!!</span><span class="p">[])</span><span class="o">+</span><span class="nx">binary</span><span class="p">)</span> <span class="p">:</span> <span class="nx">binary</span>
<span class="p">}</span>
<span class="p">)(</span><span class="mi">255</span><span class="p">);</span>
</code></pre></div></div>
<h2 id="round-3">Round 3</h2>
<p>Simplifying some expressions</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// []+[] === ''</span>
<span class="c1">// !![] === true</span>
<span class="c1">// +true === 1</span>
<span class="c1">// (number&1) + '' === '0' | '1'</span>
<span class="p">(</span>
<span class="nx">fn</span> <span class="o">=</span> <span class="p">(</span><span class="nx">number</span><span class="p">,</span> <span class="nx">binary</span> <span class="o">=</span> <span class="dl">""</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">number</span> <span class="p">?</span> <span class="nx">fn</span><span class="p">(</span><span class="nx">number</span> <span class="o">>></span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="nx">number</span> <span class="o">&</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="nx">binary</span><span class="p">)</span> <span class="p">:</span> <span class="nx">binary</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">)(</span><span class="mi">255</span><span class="p">);</span>
</code></pre></div></div>
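<p>The identities in the comments above can be checked directly. This is a minimal sketch; the names <code class="language-plaintext highlighter-rouge">emptyString</code>, <code class="language-plaintext highlighter-rouge">one</code> and <code class="language-plaintext highlighter-rouge">toBinary</code> are mine and not part of the original snippet</p>

```javascript
// Each obfuscated fragment and the plain value it evaluates to.
const emptyString = [] + []; // both arrays are coerced to "" and concatenated
const one = +!![];           // !![] is true, unary plus turns true into 1

// The de-obfuscated converter from this round (renamed toBinary for the sketch).
const toBinary = (number, binary = "") =>
  number ? toBinary(number >> 1, (number & 1) + binary) : binary;

console.log(emptyString === "", one === 1);      // true true
console.log(toBinary(255));                      // 11111111
console.log(toBinary(10) === (10).toString(2));  // true
```

<p>One thing the sketch makes visible: for an input of <code class="language-plaintext highlighter-rouge">0</code> the recursion never runs, so the result is an empty string rather than <code class="language-plaintext highlighter-rouge">"0"</code></p>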
<h2 id="round-4">Round 4</h2>
<p>Simplifying this even further, we end up with a standard recursive decimal to binary converter</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// From this (_$=($,_=[]+[])=>$?_$($>>+!![],($&+!![])+_):_)(255); to </span>
<span class="kd">const</span> <span class="nx">fn</span> <span class="o">=</span> <span class="p">(</span><span class="nx">number</span><span class="p">,</span> <span class="nx">binary</span> <span class="o">=</span> <span class="dl">""</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">number</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">binary</span> <span class="o">=</span> <span class="p">(</span><span class="nx">number</span> <span class="o">&</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="nx">binary</span><span class="p">;</span>
<span class="nx">number</span> <span class="o">>>=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">return</span> <span class="nx">fn</span><span class="p">(</span><span class="nx">number</span><span class="p">,</span> <span class="nx">binary</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">binary</span><span class="p">;</span>
<span class="p">};</span>
<span class="nx">fn</span><span class="p">(</span><span class="mi">255</span><span class="p">);</span>
</code></pre></div></div>
Nimalan
Node Module System
2021-05-09T00:00:00+00:00
https://mark1626.github.io/posts/2021/05/09/node-module-extensions
<p>Similar to the article I wrote about <a href="https://mark1626.github.io/posts/2021/04/21/lua-module-loader/">Lua module loaders</a>, the same can be done in <code class="language-plaintext highlighter-rouge">nodejs</code>. We can override <code class="language-plaintext highlighter-rouge">require</code> so that it can load files with custom extensions</p>
<p>Wait a minute, is there any practical use case other than a syntax hack?</p>
<p>Well then, how about we build a loader so nodejs can read YAML?</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">./loader</span><span class="dl">"</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">./config.yaml</span><span class="dl">"</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">port</span> <span class="o">=</span> <span class="nx">config</span><span class="p">.</span><span class="nx">port</span> <span class="o">||</span> <span class="mi">3000</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">http</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">'</span><span class="s1">http</span><span class="dl">'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">requestListener</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">req</span><span class="p">,</span> <span class="nx">res</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">writeHead</span><span class="p">(</span><span class="mi">200</span><span class="p">);</span>
<span class="nx">res</span><span class="p">.</span><span class="nx">end</span><span class="p">(</span><span class="dl">'</span><span class="s1">Hello, World!</span><span class="dl">'</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">server</span> <span class="o">=</span> <span class="nx">http</span><span class="p">.</span><span class="nx">createServer</span><span class="p">(</span><span class="nx">requestListener</span><span class="p">);</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="nx">port</span><span class="p">);</span>
</code></pre></div></div>
<p>This looks way more practical so let’s begin.</p>
<p>TL;DR: here is a link to the <a href="https://github.com/Mark1626/Paraphernalia/tree/master/node-module-extensions">source code</a> I made for this example</p>
<!--more-->
<h2 id="understanding-the-module-system">Understanding the Module system</h2>
<p>Before we get into this, have you seen a stack trace similar to this one before? In this case I tried to run node on a nonexistent file</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❯ node test
internal/modules/cjs/loader.js:883
throw err;
^
Error: Cannot find module 'path/to/file'
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:880:15)
at Function.Module._load (internal/modules/cjs/loader.js:725:27)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:72:12)
at internal/main/run_main_module.js:17:47 {
code: 'MODULE_NOT_FOUND',
requireStack: []
}
</code></pre></div></div>
<p>What I’m trying to highlight here are <code class="language-plaintext highlighter-rouge">internal/modules/cjs/loader.js</code>, <code class="language-plaintext highlighter-rouge">internal/modules/run_main.js</code> and <code class="language-plaintext highlighter-rouge">MODULE_NOT_FOUND</code></p>
<h3 id="module-system">Module System</h3>
<p><a href="https://github.com/nodejs/node/blob/master/lib/internal/modules/cjs/loader.js">Source@nodejs</a></p>
<p>Every module in <code class="language-plaintext highlighter-rouge">nodejs</code> is converted into a function wrapper of the following structure</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">exports</span><span class="p">,</span> <span class="nx">require</span><span class="p">,</span> <span class="nx">module</span><span class="p">,</span> <span class="nx">__filename</span><span class="p">,</span> <span class="nx">__dirname</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Module code actually lives in here</span>
<span class="p">});</span>
</code></pre></div></div>
<ul>
<li>The module object is where we export our code in <code class="language-plaintext highlighter-rouge">module.exports</code></li>
<li>This wrapping is done by the module loader</li>
</ul>
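<p>To make the wrapper less abstract, here is a hand-rolled sketch of the same idea. It is only an illustration: <code class="language-plaintext highlighter-rouge">fakeModule</code>, <code class="language-plaintext highlighter-rouge">fakeRequire</code> and the <code class="language-plaintext highlighter-rouge">/virtual</code> paths are made up, and nodejs itself compiles the wrapper internally rather than with <code class="language-plaintext highlighter-rouge">new Function</code></p>

```javascript
// Source text of a tiny "module", as it would be read from disk.
const source = "module.exports = { answer: 6 * 7 };";

// Hand-rolled wrapper taking the same five parameters as the real one.
const wrapper = new Function(
  "exports", "require", "module", "__filename", "__dirname",
  source
);

// Stand-ins for the objects the module loader would normally provide.
const fakeModule = { exports: {} };
const fakeRequire = (id) => {
  throw new Error("no modules available in this sketch: " + id);
};

// Running the wrapper lets the module code populate fakeModule.exports.
wrapper.call(
  fakeModule.exports,
  fakeModule.exports, fakeRequire, fakeModule,
  "/virtual/answer.js", "/virtual"
);

console.log(fakeModule.exports.answer); // 42
```

<p>This is also why <code class="language-plaintext highlighter-rouge">exports</code>, <code class="language-plaintext highlighter-rouge">require</code> and <code class="language-plaintext highlighter-rouge">module</code> look like globals inside a file: they are just arguments of the wrapper function</p>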
<h3 id="main-module">Main Module</h3>
<p>When a file is run with <code class="language-plaintext highlighter-rouge">node</code>, it is set as the main module <code class="language-plaintext highlighter-rouge">require.main</code>. This is the <code class="language-plaintext highlighter-rouge">internal/modules/run_main.js</code> frame we saw in the stack trace earlier</p>
<p>The main module is run inside the VM</p>
<h3 id="module_load">Module._load</h3>
<p><a href="https://github.com/nodejs/node/blob/7c8a60851c459ea18afbfc54bfc8cf7394ea56c3/lib/internal/modules/cjs/loader.js#L753">Source@nodejs</a>
<a href="https://github.com/nodejs/node/blob/7c8a60851c459ea18afbfc54bfc8cf7394ea56c3/lib/internal/modules/cjs/loader.js#L977">Source@nodejs</a></p>
<ol>
<li>First it checks whether the module is already in the cache; if it is, the cached exports are returned</li>
<li>Otherwise it creates a new module and saves it to the cache</li>
<li>Call <code class="language-plaintext highlighter-rouge">module.load</code>
<ul>
<li>Call <code class="language-plaintext highlighter-rouge">Module._extensions[extension](this, filename);</code></li>
</ul>
</li>
</ol>
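<p>The caching step can be observed from user code through <code class="language-plaintext highlighter-rouge">require.cache</code>. A minimal sketch, assuming a throwaway module written to a temp directory (the <code class="language-plaintext highlighter-rouge">counter.js</code> name and its contents are made up for the demo)</p>

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write a throwaway module so the example is self-contained.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "load-demo-"));
const file = path.join(dir, "counter.js");
fs.writeFileSync(file, "module.exports = { hits: 0 };");

const first = require(file);
first.hits += 1;
const second = require(file); // served from the cache, the file is not run again

console.log(first === second); // true
console.log(second.hits);      // 1

// The cache is keyed by the resolved filename.
const key = require.resolve(file);
console.log(require.cache[key].exports === first); // true

// Dropping the cache entry forces a fresh load on the next require.
delete require.cache[key];
const third = require(file);
console.log(third === first, third.hits); // false 0

fs.rmSync(dir, { recursive: true, force: true });
```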
<h3 id="module_extensions">Module._extensions</h3>
<p><code class="language-plaintext highlighter-rouge">_extensions</code> contains loaders for all the different file extensions in <code class="language-plaintext highlighter-rouge">nodejs</code>. Loaders finally populate the <code class="language-plaintext highlighter-rouge">module.exports</code></p>
<p>In the stack trace earlier the file is loaded with the <code class="language-plaintext highlighter-rouge">cjs</code> loader, hence the <code class="language-plaintext highlighter-rouge">internal/modules/cjs/loader.js</code> in the trace</p>
<p>In the JSON loader it’s just a simple <code class="language-plaintext highlighter-rouge">module.exports = JSON.parse(content)</code></p>
<h4 id="native-module-loaders-in-nodejs">Native module loaders in NodeJS</h4>
<ol>
<li><a href="https://github.com/nodejs/node/blob/master/lib/internal/modules/cjs/loader.js#L1118-L1139">CJS Loader</a></li>
<li><a href="https://github.com/nodejs/node/blob/master/lib/internal/modules/cjs/loader.js#L1143-L1157">Json Loader</a></li>
<li><a href="https://github.com/nodejs/node/blob/master/lib/internal/modules/cjs/loader.js#L1161-L1169">Node(.node) Loader</a></li>
</ol>
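<p>You can list the registered loaders yourself. A small sketch; note that <code class="language-plaintext highlighter-rouge">Module._extensions</code> is an internal detail, and I’m assuming newer nodejs versions may register more extensions than the three listed above, so the code only checks that those three are present</p>

```javascript
const Module = require("module");

// Keys of Module._extensions are the file extensions that have a loader.
const extensions = Object.keys(Module._extensions);
console.log(extensions);

// The three native loaders described above should always be registered.
const hasBuiltIns = [".js", ".json", ".node"].every((ext) =>
  extensions.includes(ext)
);
console.log(hasBuiltIns); // true

// Every loader is a plain function taking (module, filename).
console.log(typeof Module._extensions[".json"]); // function
```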
<h3 id="module_compile">Module._compile</h3>
<p><a href="https://github.com/nodejs/node/blob/7c8a60851c459ea18afbfc54bfc8cf7394ea56c3/lib/internal/modules/cjs/loader.js#L1063">Source@nodejs</a>
<a href="https://github.com/nodejs/node/blob/7c8a60851c459ea18afbfc54bfc8cf7394ea56c3/lib/internal/modules/cjs/helpers.js#L49">Create Require Source</a></p>
<p>This does the actual work. It creates a new require instance</p>
<ul>
<li>Run the file contents in the correct scope. Expose the correct helper variables (require, module, exports) to the file.</li>
<li>Returns exception, if any.</li>
</ul>
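<p>As a rough illustration, <code class="language-plaintext highlighter-rouge">_compile</code> can be called by hand on a fresh module to run source code from a string. This is a sketch built on internal APIs (<code class="language-plaintext highlighter-rouge">_compile</code>, <code class="language-plaintext highlighter-rouge">_nodeModulePaths</code>) that may change between versions, and <code class="language-plaintext highlighter-rouge">virtual-module.js</code> is a made up name, no such file exists on disk</p>

```javascript
const Module = require("module");
const path = require("path");

// Hypothetical file name: nothing is read from disk, the path is only used
// for __filename, __dirname and for resolving nested require() calls.
const filename = path.join(process.cwd(), "virtual-module.js");
const source = "exports.sum = 1 + 2; exports.file = __filename;";

const mod = new Module(filename);
mod.filename = filename;
mod.paths = Module._nodeModulePaths(path.dirname(filename));

// Wraps the source in the module function and runs it with the helper
// variables (exports, require, module, __filename, __dirname) in scope.
mod._compile(source, filename);

console.log(mod.exports.sum);               // 3
console.log(mod.exports.file === filename); // true
```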
<p>Unfortunately I can’t get into the depths of this in this article, so let’s save this for later</p>
<hr />
<h2 id="creating-our-own-module-extensions">Creating our own module extensions</h2>
<blockquote>
<p><strong>Note:</strong> There will always be a better way to do this than to override require, <strong>do not use this in production without knowing what you are doing</strong>. As per the <a href="https://nodejs.org/api/modules.html#modules_require_extensions">Nodejs docs</a>, <code class="language-plaintext highlighter-rouge">require.extensions</code> is deprecated: it can cause subtle bugs, and resolving extensions gets slower with each one that is registered.</p>
</blockquote>
<p>Despite the warning, this is how things work under the hood, so let’s continue for the sake of understanding</p>
<h3 id="case-1-module-loader-for-yaml">Case 1: Module loader for YAML</h3>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Load the yaml as JSON into the variable config</span>
<span class="kd">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">./config.yaml</span><span class="dl">"</span><span class="p">)</span>
</code></pre></div></div>
<p>Since this is a POC I’m going to be using the npm package <code class="language-plaintext highlighter-rouge">yaml</code> for the parsing</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">fs</span><span class="dl">"</span><span class="p">);</span>
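<span class="kd">const</span> <span class="nx">yaml</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">yaml</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">Module</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">module</span><span class="dl">"</span><span class="p">);</span>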
<span class="cm">/*
Loader for yaml, based on the json loader in nodejs
https://github.com/nodejs/node/blob/master/lib/internal/modules/cjs/loader.js#L1143-L1157
*/</span>
<span class="kd">function</span> <span class="nx">yamlLoader</span><span class="p">(</span><span class="nx">mod</span><span class="p">,</span> <span class="nx">filename</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">content</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">filename</span><span class="p">,</span> <span class="dl">"</span><span class="s2">utf8</span><span class="dl">"</span><span class="p">);</span>
<span class="k">try</span> <span class="p">{</span>
<span class="nx">mod</span><span class="p">.</span><span class="nx">exports</span> <span class="o">=</span> <span class="nx">yaml</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">content</span><span class="p">)</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">err</span><span class="p">.</span><span class="nx">message</span> <span class="o">=</span> <span class="nx">filename</span> <span class="o">+</span> <span class="dl">"</span><span class="s2">: </span><span class="dl">"</span> <span class="o">+</span> <span class="nx">err</span><span class="p">.</span><span class="nx">message</span><span class="p">;</span>
<span class="k">throw</span> <span class="nx">err</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">Module</span><span class="p">.</span><span class="nx">_extensions</span><span class="p">[</span><span class="dl">"</span><span class="s2">.yaml</span><span class="dl">"</span><span class="p">]</span> <span class="o">=</span> <span class="nx">yamlLoader</span><span class="p">;</span>
<span class="nx">Module</span><span class="p">.</span><span class="nx">_extensions</span><span class="p">[</span><span class="dl">"</span><span class="s2">.yml</span><span class="dl">"</span><span class="p">]</span> <span class="o">=</span> <span class="nx">yamlLoader</span><span class="p">;</span>
</code></pre></div></div>
<p>That’s pretty much it!!!!</p>
<h4 id="how-does-this-work">How does this work?</h4>
<ul>
<li><code class="language-plaintext highlighter-rouge">Module._extensions</code> contains the loaders for each extension</li>
<li>We simply need to add a function to handle <code class="language-plaintext highlighter-rouge">.yaml</code> and <code class="language-plaintext highlighter-rouge">.yml</code> files</li>
<li>The loader is a function which takes <code class="language-plaintext highlighter-rouge">module</code> and <code class="language-plaintext highlighter-rouge">filename</code> as arguments
<ul>
<li><code class="language-plaintext highlighter-rouge">filename</code> is self explanatory, the name of the file</li>
<li>The <code class="language-plaintext highlighter-rouge">module</code> refers to the module which I described in the previous section. By setting the <code class="language-plaintext highlighter-rouge">module.exports</code> we define how a file has to be loaded</li>
</ul>
</li>
</ul>
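<p>If you want to try the mechanism without installing anything, here is a dependency free sketch of the same pattern. The <code class="language-plaintext highlighter-rouge">.kv</code> extension, its <code class="language-plaintext highlighter-rouge">key=value</code> format and the <code class="language-plaintext highlighter-rouge">kvLoader</code> name are all invented for this example</p>

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const Module = require("module");

// Hypothetical ".kv" format: one "key=value" pair per line.
function kvLoader(mod, filename) {
  const content = fs.readFileSync(filename, "utf8");
  const result = {};
  for (const line of content.split("\n")) {
    const idx = line.indexOf("=");
    if (idx === -1) continue; // skip blank or malformed lines
    result[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  // Whatever ends up in mod.exports is what require() returns.
  mod.exports = result;
}
Module._extensions[".kv"] = kvLoader;

// Round trip: write a config file, then load it with a plain require.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "kv-demo-"));
const file = path.join(dir, "config.kv");
fs.writeFileSync(file, "port=8080\nhost=localhost\n");

const config = require(file);
console.log(config.port, config.host); // 8080 localhost

// Clean up the demo loader and the temp directory.
delete Module._extensions[".kv"];
fs.rmSync(dir, { recursive: true, force: true });
```

<p>Note that the values come back as strings here, parsing them into numbers or booleans would be the job of a real format like YAML</p>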
<h3 id="case-2-extending-js-file-syntax">Case 2: Extending JS file syntax</h3>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// cube.js</span>
<span class="kd">class</span> <span class="nx">Cube</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(</span><span class="nx">side</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">$</span><span class="p">.</span><span class="nx">side</span> <span class="o">=</span> <span class="nx">side</span>
<span class="p">}</span>
<span class="nx">area</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">$</span><span class="p">.</span><span class="nx">side</span> <span class="o">*</span> <span class="nx">$</span><span class="p">.</span><span class="nx">side</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// fn.js</span>
<span class="kd">const</span> <span class="nx">greet</span> <span class="o">=</span> <span class="nx">fn</span> <span class="p">(</span><span class="nx">nm</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">Hello </span><span class="dl">'</span> <span class="o">+</span> <span class="nx">nm</span><span class="p">)</span>
<span class="p">}</span>
<span class="nx">greet</span><span class="p">(</span><span class="dl">'</span><span class="s1">mark</span><span class="dl">'</span><span class="p">)</span>
</code></pre></div></div>
<p>For this I’m going ahead with the same loader I used in my Lua module loader article</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">transform</span> <span class="o">=</span> <span class="p">(</span><span class="nx">code</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">patterns</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span> <span class="na">patt</span><span class="p">:</span> <span class="sr">/</span><span class="se">\$</span><span class="sr">/g</span><span class="p">,</span> <span class="na">repl</span><span class="p">:</span> <span class="dl">"</span><span class="s2">this</span><span class="dl">"</span> <span class="p">},</span>
<span class="p">{</span> <span class="na">patt</span><span class="p">:</span> <span class="sr">/fn </span><span class="se">\(</span><span class="sr">/g</span><span class="p">,</span> <span class="na">repl</span><span class="p">:</span> <span class="dl">"</span><span class="s2">function (</span><span class="dl">"</span> <span class="p">},</span>
<span class="p">];</span>
<span class="nx">patterns</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">pattern</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">code</span> <span class="o">=</span> <span class="nx">code</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="nx">pattern</span><span class="p">.</span><span class="nx">patt</span><span class="p">,</span> <span class="nx">pattern</span><span class="p">.</span><span class="nx">repl</span><span class="p">);</span>
<span class="p">});</span>
<span class="k">return</span> <span class="nx">code</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
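<p>To see what the transform actually produces, here is a small sketch that runs the same two patterns over sample input (rewritten with <code class="language-plaintext highlighter-rouge">reduce</code> purely for compactness). It also shows a limitation of plain regex replacement that is worth knowing about</p>

```javascript
// Same two patterns as above, applied one after the other.
const transform = (code) => {
  const patterns = [
    { patt: /\$/g, repl: "this" },
    { patt: /fn \(/g, repl: "function (" },
  ];
  return patterns.reduce((out, { patt, repl }) => out.replace(patt, repl), code);
};

console.log(transform("const area = fn () { return $.side * $.side }"));
// const area = function () { return this.side * this.side }

// Caveat: every "$" is replaced, including the ones in template literals.
console.log(transform("const msg = `${name}`"));
// const msg = `this{name}`
```

<p>So this approach is fine for a POC, but a real syntax extension would need a proper parser instead of regex replacement</p>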
<p>Now we also have to extend the existing <code class="language-plaintext highlighter-rouge">cjs</code> loader, which is not quite as straightforward as the yaml loader we made</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">oldLoader</span> <span class="o">=</span> <span class="nx">Module</span><span class="p">.</span><span class="nx">_extensions</span><span class="p">[</span><span class="dl">"</span><span class="s2">.js</span><span class="dl">"</span><span class="p">];</span>
<span class="cm">/**
* Simplified version of the code from pirates
* MIT License
* Copyright (c) 2016-2018 Ari Porad
* https://github.com/ariporad/pirates/blob/master/LICENSE
*/</span>
<span class="nx">Module</span><span class="p">.</span><span class="nx">_extensions</span><span class="p">[</span><span class="dl">"</span><span class="s2">.js</span><span class="dl">"</span><span class="p">]</span> <span class="o">=</span> <span class="kd">function</span> <span class="nx">customLoader</span><span class="p">(</span><span class="nx">mod</span><span class="p">,</span> <span class="nx">filename</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">compile</span> <span class="o">=</span> <span class="nx">mod</span><span class="p">.</span><span class="nx">_compile</span><span class="p">;</span>
<span class="nx">mod</span><span class="p">.</span><span class="nx">_compile</span> <span class="o">=</span> <span class="kd">function</span> <span class="nx">_compile</span><span class="p">(</span><span class="nx">code</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// reset the compile immediately as otherwise we end up having the</span>
<span class="c1">// compile function being changed even though this loader might be reverted</span>
<span class="c1">// Not reverting it here leads to long useless compile chains when doing</span>
<span class="c1">// addHook -> revert -> addHook -> revert -> ...</span>
<span class="c1">// The compile function is also anyway created new when the loader is called a second time.</span>
<span class="nx">mod</span><span class="p">.</span><span class="nx">_compile</span> <span class="o">=</span> <span class="nx">compile</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">newCode</span> <span class="o">=</span> <span class="nx">transform</span><span class="p">(</span><span class="nx">code</span><span class="p">);</span>
<span class="k">return</span> <span class="nx">mod</span><span class="p">.</span><span class="nx">_compile</span><span class="p">(</span><span class="nx">newCode</span><span class="p">,</span> <span class="nx">filename</span><span class="p">);</span>
<span class="p">};</span>
<span class="c1">// Run the original loader</span>
<span class="c1">// https://github.com/nodejs/node/blob/master/lib/internal/modules/cjs/loader.js#L1118-L1139</span>
<span class="nx">oldLoader</span><span class="p">(</span><span class="nx">mod</span><span class="p">,</span> <span class="nx">filename</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>
<h4 id="how-it-works">How it works</h4>
<p>JS files have to be compiled before they can run. Writing a new compile function would be too complex, so we rely on the original one</p>
<ol>
<li>Store the original <code class="language-plaintext highlighter-rouge">CJS</code> loader</li>
<li>Inside the new loader, store the module’s original <code class="language-plaintext highlighter-rouge">_compile</code> function</li>
<li>Replace <code class="language-plaintext highlighter-rouge">mod._compile</code> with a wrapper function</li>
<li>Run the original <code class="language-plaintext highlighter-rouge">CJS</code> loader; it reads the file and calls <code class="language-plaintext highlighter-rouge">mod._compile</code>, which is now our wrapper</li>
<li>The wrapper restores the original <code class="language-plaintext highlighter-rouge">_compile</code> and runs transform over the original code</li>
<li>The original <code class="language-plaintext highlighter-rouge">_compile</code> function then compiles the transformed code</li>
</ol>
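<p>Putting it together, the hook can be exercised end to end like this. It is a sketch under a few assumptions: the transform only handles the <code class="language-plaintext highlighter-rouge">fn (</code> alias, <code class="language-plaintext highlighter-rouge">greet.js</code> is a temp file created just for the demo, and the original loader is restored at the end so nothing else is affected</p>

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const Module = require("module");

// Reduced transform for the demo: only the "fn (" alias.
const transform = (code) => code.replace(/fn \(/g, "function (");

const oldLoader = Module._extensions[".js"];
Module._extensions[".js"] = function customLoader(mod, filename) {
  const compile = mod._compile;
  mod._compile = function _compile(code) {
    mod._compile = compile; // restore the original before compiling
    return mod._compile(transform(code), filename);
  };
  oldLoader(mod, filename); // reads the file and calls our wrapper
};

// A file using the extended syntax, which plain nodejs would reject.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hook-demo-"));
const file = path.join(dir, "greet.js");
fs.writeFileSync(file, "module.exports = fn (name) { return 'Hello ' + name; };");

const greet = require(file);
console.log(greet("mark")); // Hello mark

// Revert the hook and clean up.
Module._extensions[".js"] = oldLoader;
fs.rmSync(dir, { recursive: true, force: true });
```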
<hr />
<h2 id="reference">Reference</h2>
<ul>
<li><a href="https://nodejs.org/api/modules.html">NodeJS Module API</a></li>
<li><a href="https://github.com/nodejs/node/blob/master/lib/module.js">Module.js</a></li>
<li><a href="http://fredkschott.com/post/2014/06/require-and-the-module-system/">Require System</a></li>
</ul>
<h2 id="see-also">See Also</h2>
<ul>
<li><a href="https://github.com/Mark1626/Paraphernalia/tree/master/node-module-extensions">Source Code of the examples</a></li>
<li><a href="https://github.com/ariporad/pirates">Pirates - require hijack</a></li>
</ul>
Nimalan
Lua Module Loader
2021-04-21T00:00:00+00:00
https://mark1626.github.io/posts/2021/04/21/lua-module-loader
<p>What if I told you there is a way to extend the default syntax of Lua? In the example below I use <code class="language-plaintext highlighter-rouge">@</code> as an alias for <code class="language-plaintext highlighter-rouge">self</code>, and <code class="language-plaintext highlighter-rouge">fn</code> as an alias for <code class="language-plaintext highlighter-rouge">function</code>. This and many more can be achieved with the module loader in Lua</p>
<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- myfile.lua</span>
<span class="kd">local</span> <span class="n">greet</span> <span class="o">=</span> <span class="n">fn</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Hello '</span> <span class="o">..</span> <span class="n">name</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">greet</span><span class="p">(</span><span class="s1">'mark1626'</span><span class="p">)</span>
<span class="c1">---</span>
<span class="c1">-- cube.lua</span>
<span class="n">Cube</span> <span class="o">=</span> <span class="p">{</span><span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">}</span>
<span class="k">function</span> <span class="nf">Cube</span><span class="p">:</span><span class="n">new</span><span class="p">(</span><span class="n">o</span><span class="p">)</span>
<span class="n">o</span> <span class="o">=</span> <span class="n">o</span> <span class="ow">or</span> <span class="p">{}</span>
<span class="nb">setmetatable</span><span class="p">(</span><span class="n">o</span><span class="p">,</span> <span class="err">@</span><span class="p">)</span>
<span class="err">@</span><span class="p">.</span><span class="n">__index</span> <span class="o">=</span> <span class="err">@</span>
<span class="k">return</span> <span class="n">o</span>
<span class="k">end</span>
<span class="k">function</span> <span class="nf">Cube</span><span class="p">:</span><span class="n">area</span><span class="p">()</span>
<span class="k">return</span> <span class="err">@</span><span class="p">.</span><span class="n">a</span> <span class="o">*</span> <span class="err">@</span><span class="p">.</span><span class="n">a</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">Cube</span>
</code></pre></div></div>
<!--more-->
<h2 id="module-loader">Module Loader</h2>
<p>The <a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.loaders">package loaders</a> table is used by <code class="language-plaintext highlighter-rouge">require</code> to find and load modules. We can hook our custom logic into it with a simple <code class="language-plaintext highlighter-rouge">table.insert</code>.</p>
<p>Below I implement a custom loader that runs a function <code class="language-plaintext highlighter-rouge">transform</code> over the contents of the file before compiling it.</p>
<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="n">custom_loader</span> <span class="o">=</span> <span class="k">function</span><span class="p">(</span><span class="n">modulename</span><span class="p">)</span>
<span class="kd">local</span> <span class="n">modulepath</span> <span class="o">=</span> <span class="nb">string.gsub</span><span class="p">(</span><span class="n">modulename</span><span class="p">,</span> <span class="s2">"%."</span><span class="p">,</span> <span class="s2">"/"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">path</span> <span class="k">in</span> <span class="nb">string.gmatch</span><span class="p">(</span><span class="nb">package.path</span><span class="p">,</span> <span class="s2">"([^;]+)"</span><span class="p">)</span> <span class="k">do</span>
<span class="kd">local</span> <span class="n">filename</span> <span class="o">=</span> <span class="nb">string.gsub</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s2">"%?"</span><span class="p">,</span> <span class="n">modulepath</span><span class="p">)</span>
<span class="kd">local</span> <span class="n">file</span> <span class="o">=</span> <span class="nb">io.open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s2">"rb"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">file</span> <span class="k">then</span>
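<span class="kd">local</span> <span class="n">content</span> <span class="o">=</span> <span class="nb">assert</span><span class="p">(</span><span class="n">file</span><span class="p">:</span><span class="n">read</span><span class="p">(</span><span class="s2">"*a"</span><span class="p">))</span>
<span class="n">file</span><span class="p">:</span><span class="n">close</span><span class="p">()</span> <span class="c1">-- release the file handle once the contents are read</span>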
<span class="kd">local</span> <span class="n">transformed_file</span> <span class="o">=</span> <span class="n">transform</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">assert</span><span class="p">(</span><span class="n">loadstring</span><span class="p">(</span><span class="n">transformed_file</span><span class="p">,</span> <span class="n">modulename</span><span class="p">))</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">return</span> <span class="s2">"Unable to load file "</span> <span class="o">..</span> <span class="n">modulename</span>
<span class="k">end</span>
<span class="c1">-- Override the default loader</span>
<span class="nb">table.insert</span><span class="p">(</span><span class="n">package</span><span class="p">.</span><span class="n">loaders</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">custom_loader</span><span class="p">)</span>
</code></pre></div></div>
<blockquote>
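<p><strong>Note:</strong> This applies to every <code class="language-plaintext highlighter-rouge">require</code> call made after this module has been loaded<br />
<strong>Note:</strong> In Lua 5.2 and later (including 5.3) this table is called <a href="http://www.lua.org/manual/5.3/manual.html#pdf-package.searchers">package searchers</a>, and <code class="language-plaintext highlighter-rouge">loadstring</code> is replaced by <code class="language-plaintext highlighter-rouge">load</code></p>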
</blockquote>
<h2 id="transformation-use-case">Transformation Use Case</h2>
<p>Now let’s have a look at the first example I posted</p>
<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">local</span> <span class="n">greet</span> <span class="o">=</span> <span class="n">fn</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'Hello '</span> <span class="o">..</span> <span class="n">name</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The transform function I use to achieve this is a simple string replace: <code class="language-plaintext highlighter-rouge">fn(</code> is converted into <code class="language-plaintext highlighter-rouge">function(</code>. It must be defined before the loader so that the loader can call it.</p>
<div class="language-lua highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- This will transform some patterns in our file</span>
<span class="kd">local</span> <span class="k">function</span> <span class="nf">transform</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
<span class="kd">local</span> <span class="n">str</span> <span class="o">=</span> <span class="n">s</span>
<span class="kd">local</span> <span class="n">int</span> <span class="o">=</span> <span class="s2">"([%d]+)"</span>
<span class="kd">local</span> <span class="n">patterns</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span> <span class="n">patt</span> <span class="o">=</span> <span class="s2">"@"</span><span class="p">,</span> <span class="n">repl</span> <span class="o">=</span> <span class="s2">"self"</span> <span class="p">},</span>
<span class="p">{</span> <span class="n">patt</span> <span class="o">=</span> <span class="s2">"&&"</span><span class="p">,</span> <span class="n">repl</span> <span class="o">=</span> <span class="s2">" and "</span> <span class="p">},</span>
<span class="p">{</span> <span class="n">patt</span> <span class="o">=</span> <span class="s2">"||"</span><span class="p">,</span> <span class="n">repl</span> <span class="o">=</span> <span class="s2">" or "</span> <span class="p">},</span>
<span class="p">{</span> <span class="n">patt</span> <span class="o">=</span> <span class="s2">"fn%("</span><span class="p">,</span> <span class="n">repl</span> <span class="o">=</span> <span class="s2">"function("</span> <span class="p">},</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">v</span> <span class="k">in</span> <span class="nb">ipairs</span><span class="p">(</span><span class="n">patterns</span><span class="p">)</span> <span class="k">do</span> <span class="n">str</span> <span class="o">=</span> <span class="n">str</span><span class="p">:</span><span class="nb">gsub</span><span class="p">(</span><span class="err">v.patt</span><span class="p">,</span> <span class="err">v.repl</span><span class="p">)</span> <span class="k">end</span>
<span class="k">return</span> <span class="n">str</span>
<span class="k">end</span>
</code></pre></div></div>
<h2 id="references">References</h2>
<ul>
<li><a href="https://pgl.yoyo.org/luai/i/package.loaders">Package Loaders - Luai</a></li>
<li><a href="http://lua-users.org/wiki/LuaModulesLoader">Lua Module Loader - Lua Users</a></li>
</ul>
<h2 id="see-also">See Also</h2>
<ul>
<li>First saw this pattern in usage <a href="https://github.com/4v0v/k1n3m4t1ks/blob/master/monkey.lua">here</a>, kudos to the author <a href="https://github.com/4v0v">4v0v</a></li>
<li><a href="https://github.com/Mark1626/Paraphernalia/tree/master/lua-package-loader">Example</a></li>
</ul>NimalanWhat if I told you there is a way to extend the default syntax of Lua? In the example below I use @ as an alias for self, and fn as an alias for function. This and much more can be achieved with the module loader in Lua. -- myfile.lua local greet = fn(name) print('Hello ' .. name) end greet('mark1626') --- -- cube.lua Cube = {a = 1} function Cube:new(o) o = o or {} setmetatable(o, @) @.__index = @ return o end function Cube:area() return @.a * @.a end return CubeThe Drunken Bishop2021-03-16T00:00:00+00:002021-03-16T00:00:00+00:00https://mark1626.github.io/posts/2021/03/16/the-drunken-bishop<p>Have you ever seen something similar to this?</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+----[RSA 2048]---+
| . o.+o .|
| . + * +o...|
| + * .. ... |
| o + . . |
| S o . |
| o . |
| . o|
| .o|
| Eo|
+------[MD5]------+
</code></pre></div></div>
<p>When we create a new SSH key, an image is printed at the last step, and we mostly ignore it. This time I wanted to understand what this image meant.</p>
<p>The image is a visualization of the key's fingerprint, created with the Drunken Bishop Algorithm.</p>
<!--more-->
<h3 id="what-is-this-image">What is this image?</h3>
<p>The Drunken Bishop Algorithm is an algorithm used by OpenSSH for visualizing the fingerprints of SSH keys.</p>
<h3 id="why-do-we-need-a-fingerprint-visualization-algorithm">Why do we need a Fingerprint Visualization Algorithm?</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❯ ssh [email protected]
Do you want to connect to authentic host with fingerprint fc94b0c1e5b0987c5533997697ee9fb7
+-----------------+
| . o.+o .|
| . + * +o...|
| + * .. ... |
| o + . . |
| S o . |
| o . |
| . o|
| .o|
| Eo|
+-----------------+
❯ ssh user@my.__host
Do you want to connect to malicious host with fingerprint fc94b0c1e5b0987c5843997697ee9fb7
+-----------------+
| .=o. . |
| . *+*. o |
| =.*..o |
| o + .. |
| S o. |
| o . |
| . . . |
| o .|
| E.|
+-----------------+
</code></pre></div></div>
<p>In the above example it would be very hard to notice that the host is different just by looking at the hashes <code class="language-plaintext highlighter-rouge">fc94b0c1e5b0987c5533997697ee9fb7</code> and <code class="language-plaintext highlighter-rouge">fc94b0c1e5b0987c5843997697ee9fb7</code>, which differ in only three hex digits. The two images, on the other hand, are easy to tell apart, so the visual aid can make a big difference in this scenario.</p>
<h3 id="how-does-it-work">How does it work?</h3>
<p>Consider a bishop in the center of a <code class="language-plaintext highlighter-rouge">17x9</code> board</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 1 2 3 4 5 6 ... 14 15 16
17 18 19 20 21 ....... 31 32 33
...
... 76
...
...
136 137 .... 150 151 152
</code></pre></div></div>
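<p>Each pair of bits in the fingerprint encodes one move of the bishop. Bytes are read left to right, and within each byte the bit pairs are read from least significant to most significant.</p>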
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Fingerprint fc : 94 : ... : b7
Bits 11 11 11 00 : 10 01 01 00 : ... : 10 11 01 11
| | | | | | | | | | | |
Step 4 3 2 1 8 7 6 5 ... 64 63 62 61
</code></pre></div></div>
<p>For each bit pair we move in the following direction</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00 - Upper left
01 - Upper right
10 - Lower left
11 - Lower right
</code></pre></div></div>
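<p>The decoding step can be sketched in a few lines of JavaScript. This is a minimal illustration rather than OpenSSH's code: <code class="language-plaintext highlighter-rouge">fingerprintToMoves</code> is a name made up for this example, and it assumes the fingerprint is given as a plain hex string without separators. Each returned value is a 2-bit number in which, following OpenSSH's convention, the low bit selects left (0) or right (1) and the high bit selects up (0) or down (1).</p>

```javascript
// Illustrative helper (not part of OpenSSH): split a hex fingerprint
// into a list of 2-bit moves, in the order the bishop performs them.
function fingerprintToMoves(hex) {
  const moves = [];
  for (let i = 0; i < hex.length; i += 2) {
    const byte = parseInt(hex.slice(i, i + 2), 16);
    // Within a byte, bit pairs are read from least to most significant.
    for (let shift = 0; shift < 8; shift += 2) {
      moves.push((byte >> shift) & 0b11);
    }
  }
  return moves;
}

// The first two bytes of the example fingerprint, fc and 94.
console.log(fingerprintToMoves("fc94"));
```

<p>For <code class="language-plaintext highlighter-rouge">fc</code> (binary <code class="language-plaintext highlighter-rouge">11 11 11 00</code>) the first move is <code class="language-plaintext highlighter-rouge">00</code>, followed by three <code class="language-plaintext highlighter-rouge">11</code> moves, which matches the step numbering in the diagram above.</p>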
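<p>When a move would take the bishop off the board, OpenSSH clamps each coordinate to the grid separately, so the bishop slides along the wall instead; only in a corner does it stay in the square it started from</p>
<p>Each time the bishop lands on a square we increment a count of how many times it has visited that square</p>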
<p>To visualize the grid we use the following characters. <code class="language-plaintext highlighter-rouge">S</code> and <code class="language-plaintext highlighter-rouge">E</code> are special characters we use to mark the start and end of the bishop’s journey</p>
<table>
<thead>
<tr>
<th>Value</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
<th>11</th>
<th>12</th>
<th>13</th>
<th>14</th>
<th>15</th>
<th>16</th>
</tr>
</thead>
<tbody>
<tr>
<td>Character</td>
<td> </td>
<td>.</td>
<td>o</td>
<td>+</td>
<td>=</td>
<td>*</td>
<td>B</td>
<td>O</td>
<td>X</td>
<td>@</td>
<td>%</td>
<td>&</td>
<td>#</td>
<td>/</td>
<td>^</td>
<td>S</td>
<td>E</td>
</tr>
</tbody>
</table>
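<p>Putting the rules together, the whole walk fits in a short function. The sketch below is my own approximation modelled on the behaviour described above; the names <code class="language-plaintext highlighter-rouge">drunkenBishop</code>, <code class="language-plaintext highlighter-rouge">WIDTH</code>, <code class="language-plaintext highlighter-rouge">HEIGHT</code> and <code class="language-plaintext highlighter-rouge">SYMBOLS</code> are assumptions for this example, and it leaves out the key type and hash name that OpenSSH prints in the border.</p>

```javascript
// Sketch of the drunken bishop walk. All names are illustrative.
const WIDTH = 17;
const HEIGHT = 9;
// Index = visit count; the last two entries mark start (S) and end (E).
const SYMBOLS = " .o+=*BOX@%&#/^SE";

function drunkenBishop(hex) {
  const counts = new Array(WIDTH * HEIGHT).fill(0);
  let x = Math.floor(WIDTH / 2); // column 8
  let y = Math.floor(HEIGHT / 2); // row 4, so the start is square 76
  const start = y * WIDTH + x;

  for (let i = 0; i < hex.length; i += 2) {
    let byte = parseInt(hex.slice(i, i + 2), 16);
    for (let step = 0; step < 4; step++) {
      // Low bit: left (0) or right (1). High bit: up (0) or down (1).
      x += (byte & 1) ? 1 : -1;
      y += (byte & 2) ? 1 : -1;
      // Clamp each coordinate separately so the bishop slides along walls.
      x = Math.min(Math.max(x, 0), WIDTH - 1);
      y = Math.min(Math.max(y, 0), HEIGHT - 1);
      const pos = y * WIDTH + x;
      // Cap the counter so it never reaches the S and E markers.
      if (counts[pos] < SYMBOLS.length - 3) counts[pos]++;
      byte >>= 2;
    }
  }
  counts[start] = SYMBOLS.length - 2; // S
  counts[y * WIDTH + x] = SYMBOLS.length - 1; // E

  const border = "+" + "-".repeat(WIDTH) + "+";
  const rows = [];
  for (let row = 0; row < HEIGHT; row++) {
    const cells = counts.slice(row * WIDTH, (row + 1) * WIDTH);
    rows.push("|" + cells.map((n) => SYMBOLS[n]).join("") + "|");
  }
  return [border, ...rows, border].join("\n");
}

console.log(drunkenBishop("fc94b0c1e5b0987c5843997697ee9fb7"));
```

<p>Marking <code class="language-plaintext highlighter-rouge">S</code> and <code class="language-plaintext highlighter-rouge">E</code> after the walk means they always win over the visit count, and if the bishop ends where it started only <code class="language-plaintext highlighter-rouge">E</code> is shown.</p>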
<h3 id="is-is-this-good-to-rely-on-visualization">Is it good to rely on visualization?</h3>
<p>The next question we may have is whether two different hashes can produce similar visuals. The answer is yes, but finding one would be similar to finding a hash collision</p>
<p>These visualizations are surprisingly resistant to collisions</p>
<h3 id="how-to-enable-this-in-ssh-config">How to Enable this in SSH Config</h3>
<p>To have the image printed during SSH operations, set the following option</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ~/.ssh/config
VisualHostKey yes
</code></pre></div></div>
<h3 id="references-and-further-links">References and Further links</h3>
<ul>
<li><a href="http://www.dirk-loss.de/sshvis/drunken_bishop.pdf">White Paper</a></li>
<li><a href="https://github.com/Mark1626/Paraphernalia/blob/master/drunken-bishop/main.js">Implementation in Node</a></li>
</ul>
<p>Retro video games used a similar approach of relying on visuals for checkpoint passwords. Here is an example of how this was done in Castlevania 3</p>
<ul>
<li><a href="https://meatfighter.com/castlevania3-password/">Castlevania 3 Password Algorithm</a></li>
</ul>NimalanHave you ever seen something similar to this?? +----[RSA 2048]---+ | . o.+o .| | . + * +o...| | + * .. ... | | o + . . | | S o . | | o . | | . o| | .o| | Eo| +------[MD5]------+ Usually when we create a new SSH key there is an image printed at the last step and we mostly ignore it, this time I wanted to understand what this image meant. The image is a visualization of the fingerprint created with the Drunken Bishop AlgorithmReverse Engineering Code Art - Part 6 - Noisy Donuts2021-03-09T00:00:00+00:002021-03-09T00:00:00+00:00https://mark1626.github.io/posts/2021/03/09/art-reverse-engineering-part-6<p>Based on this <a href="https://www.dwitter.net/d/21426">dwitter</a> by <a href="https://www.dwitter.net/u/Pascal">Pascal</a></p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="nx">c</span><span class="p">.</span><span class="nx">width</span><span class="o">=</span><span class="nx">w</span><span class="o">=</span><span class="mi">70</span><span class="p">;</span><span class="k">for</span><span class="p">(</span><span class="nx">i</span><span class="o">=</span><span class="mi">2800</span><span class="p">;</span><span class="nx">i</span><span class="o">--</span><span class="p">;)</span><span class="nx">x</span><span class="p">.</span><span class="nx">fillRect</span><span class="p">(</span><span class="nx">X</span><span class="o">=</span><span class="nx">i</span><span class="o">%</span><span class="nx">w</span><span class="p">,</span><span class="nx">Y</span><span class="o">=</span><span class="p">(</span><span class="nx">i</span><span class="o">-</span><span class="nx">X</span><span class="p">)</span><span class="o">/</span><span class="nx">w</span><span class="p">,</span><span class="mi">0</span><span class="o"><</span><span class="nx">T</span><span class="p">(</span><span class="nx">T</span><span class="p">(</span><span class="nx">S</span><span class="p">(</span><span class="nx">X</span><span class="o">/</span><span class="mi">7</span><span class="o">+</span><span class="mi">5</span><span class="o">*</span><span class="nx">C</span><span class="p">(</span><span class="nx">t</span><span class="p">))</span><span class="o">+</span><span class="nx">S</span><span class="p">(</span><span class="nx">Y</span><span class="o">/</span><span class="mi">7</span><span class="o">-</span><span class="mi">5</span><span class="o">*</span><span class="nx">t</span><span class="p">))),</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="/assets/images/noisy_donuts.png" alt="Noisy Donuts" /></p>
<!--more-->
<hr />
<h2 id="format--analysis">Format & Analysis</h2>
<p>The code is simple, but there is a lot of math to explain</p>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="kd">const</span> <span class="nx">u</span> <span class="o">=</span> <span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">c</span><span class="p">.</span><span class="nx">width</span> <span class="o">=</span> <span class="nx">w</span> <span class="o">=</span> <span class="mi">70</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="nx">i</span> <span class="o">=</span> <span class="mi">2800</span><span class="p">;</span> <span class="nx">i</span><span class="o">--</span><span class="p">;</span> <span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">X</span> <span class="o">=</span> <span class="nx">i</span> <span class="o">%</span> <span class="nx">w</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">Y</span> <span class="o">=</span> <span class="p">(</span><span class="nx">i</span> <span class="o">-</span> <span class="nx">X</span><span class="p">)</span> <span class="o">/</span> <span class="nx">w</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">angle</span> <span class="o">=</span> <span class="nx">S</span><span class="p">(</span><span class="nx">X</span> <span class="o">/</span> <span class="mi">7</span> <span class="o">+</span> <span class="mi">5</span> <span class="o">*</span> <span class="nx">t</span><span class="p">)</span> <span class="o">+</span> <span class="nx">S</span><span class="p">(</span><span class="nx">Y</span> <span class="o">/</span> <span class="mi">7</span> <span class="o">-</span> <span class="mi">5</span> <span class="o">*</span> <span class="nx">t</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">val</span> <span class="o">=</span> <span class="nx">T</span><span class="p">(</span><span class="nx">T</span><span class="p">(</span><span class="nx">angle</span><span class="p">));</span>
<span class="nx">x</span><span class="p">.</span><span class="nx">fillRect</span><span class="p">(</span><span class="nx">X</span><span class="p">,</span> <span class="nx">Y</span><span class="p">,</span> <span class="mi">0</span> <span class="o"><</span> <span class="nx">val</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>
<p>I’m making a small change to make things easier to understand, isolating the moving effect to one direction by changing <code class="language-plaintext highlighter-rouge">S(X / 7 + 5 * C(t))</code> to <code class="language-plaintext highlighter-rouge">S(X / 7 + 5 * t)</code></p>
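<p>To experiment outside of dwitter, the same frame can be rendered as text. This is a rough sketch under a few assumptions: <code class="language-plaintext highlighter-rouge">renderFrame</code> is a hypothetical helper, the dwitter shorthands <code class="language-plaintext highlighter-rouge">S</code> and <code class="language-plaintext highlighter-rouge">T</code> are spelled out as <code class="language-plaintext highlighter-rouge">Math.sin</code> and <code class="language-plaintext highlighter-rouge">Math.tan</code>, and a pixel counts as drawn whenever the rectangle width <code class="language-plaintext highlighter-rouge">0 &lt; val</code> is true.</p>

```javascript
// Render one frame of the pattern as text instead of on a canvas.
// `renderFrame` is a hypothetical helper written for this example.
function renderFrame(t, width = 70, height = 40) {
  const rows = [];
  for (let Y = 0; Y < height; Y++) {
    let row = "";
    for (let X = 0; X < width; X++) {
      const angle = Math.sin(X / 7 + 5 * t) + Math.sin(Y / 7 - 5 * t);
      const val = Math.tan(Math.tan(angle));
      // fillRect(X, Y, 0 < val, 1) has width 1 only when val is positive.
      row += val > 0 ? "#" : " ";
    }
    rows.push(row);
  }
  return rows.join("\n");
}

console.log(renderFrame(0));
```

<p>The 70 by 40 grid matches the 2800 iterations of the original loop, and calling the function with increasing values of <code class="language-plaintext highlighter-rouge">t</code> gives successive frames of the animation.</p>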
<p>Our main focus is around two functions</p>
<h3 id="sinx--sinx">sin(x) + sin(y)</h3>
<p>This is a rather simple function</p>
<p><img src="/assets/images/sin_sin.png" alt="Sin Sin" /></p>
<p>We can visualize it with</p>
<p><code class="language-plaintext highlighter-rouge">x.fillRect(X, Y, S(X / 7 + 5 * t) + S(Y / 7 - 5 * t), 1)</code></p>
<p>Here the value of the function at each point becomes the width of the rectangle drawn at that pixel, so we see the 2D surface flattened onto the pixel grid</p>
<p>Another example <a href="https://www.dwitter.net/d/21403">dwitter</a>, but with <code class="language-plaintext highlighter-rouge">sin(x) + cos(x)</code></p>
<h3 id="tantanx">tan(tan(x))</h3>
<p>The function <code class="language-plaintext highlighter-rouge">y = tan(tan(x))</code> has multiple bands.</p>
<p><img src="/assets/images/tan_tan.png" alt="Tan Tan" /></p>
<p>In each band there are infinitely many transitions from <code class="language-plaintext highlighter-rouge">Inf</code> to <code class="language-plaintext highlighter-rouge">-Inf</code>, which causes the noise effect that we see</p>
<p><img src="/assets/images/tan_tan_band.png" alt="Tan Tan Band" /></p>
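<p>One way to see why the transitions pile up is to count them. <code class="language-plaintext highlighter-rouge">tan(tan(x))</code> jumps from <code class="language-plaintext highlighter-rouge">Inf</code> to <code class="language-plaintext highlighter-rouge">-Inf</code> wherever the inner <code class="language-plaintext highlighter-rouge">tan(x)</code> equals <code class="language-plaintext highlighter-rouge">π/2 + kπ</code> for an integer <code class="language-plaintext highlighter-rouge">k</code>, and the inner <code class="language-plaintext highlighter-rouge">tan(x)</code> grows without bound as x approaches <code class="language-plaintext highlighter-rouge">π/2</code>. The sketch below counts those jumps on an interval; <code class="language-plaintext highlighter-rouge">countPoles</code> is a name invented for this example, and it assumes the interval lies strictly inside <code class="language-plaintext highlighter-rouge">(-π/2, π/2)</code>.</p>

```javascript
// Count the poles of tan(tan(x)) for x in [a, b], assuming
// -PI/2 < a < b < PI/2 so that the inner tan(x) is increasing.
// A pole occurs wherever tan(x) equals PI/2 + k*PI for an integer k.
function countPoles(a, b) {
  const lo = Math.tan(a);
  const hi = Math.tan(b);
  // Integers k with lo <= PI/2 + k*PI <= hi.
  const kMin = Math.ceil((lo - Math.PI / 2) / Math.PI);
  const kMax = Math.floor((hi - Math.PI / 2) / Math.PI);
  return Math.max(0, kMax - kMin + 1);
}

// Shrink the gap between the right end of the interval and PI/2.
for (const gap of [0.1, 0.01, 0.001, 0.0001]) {
  const b = Math.PI / 2 - gap;
  console.log(`x in [0, PI/2 - ${gap}]: ${countPoles(0, b)} poles`);
}
```

<p>The count keeps growing as the gap shrinks, so neighbouring pixels near a band sample essentially unrelated signs, which is what shows up as noise.</p>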
<p>Below is the function for x from <code class="language-plaintext highlighter-rouge">-1.6</code> to <code class="language-plaintext highlighter-rouge">-1.5</code>, an interval that contains <code class="language-plaintext highlighter-rouge">-π/2</code></p>
<p><img src="/assets/images/tan_tan_visual_1.png" alt="Tan Tan Visual 1" />
<img src="/assets/images/tan_tan_visual_2.png" alt="Tan Tan Visual 2" /></p>
<p>By adding the factor of time the noise circles behave chaotically</p>NimalanBased on this dwitter by Pascal c.width=w=70;for(i=2800;i--;)x.fillRect(X=i%w,Y=(i-X)/w,0<T(T(S(X/7+5*C(t))+S(Y/7-5*t))),1)