<p>EuAndreh’s blog (<a href="https://euandre.org/feed.articles.en.atom">Atom feed</a>, updated 2023-09-19)</p>
<p><a href="https://euandre.org/2021/04/29/a-relational-model-of-data-for-large-shared-data-banks-article-review.html">A Relational Model of Data for Large Shared Data Banks - article-review</a> (2021-04-29)</p>
<p>This is a review of the article “<a href="https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf">A Relational Model of Data for Large Shared Data Banks</a>”, by E. F. Codd.</p>
<h2 id="data-independence">Data Independence</h2>
<p>Codd brings the idea of <em>data independence</em> as a better approach for databases.
This is in contrast with the existing approaches of the time, namely the hierarchical (tree-based) and network-based ones.</p>
<p>His main argument is that queries in applications shouldn’t depend on, nor be coupled with, how the data is represented internally by the database system.
This key idea is very powerful, and something that we strive for in many other places: decoupling the interface from the implementation.</p>
<p>If the database system has this separation, it can keep the querying interface stable, while having the freedom to change its internal representation at will, for better performance, less storage, etc.</p>
<p>This is true for most modern database systems.
They can change from B-Trees with leaves containing pointers to data, to B-Trees with leaves containing the raw data, to hash tables.
All that without changing the query interface, only its performance.</p>
<p>Codd mentions that, from an information representation standpoint, any index is a duplication, but a useful one for performance.</p>
<p>This data independence also impacts ordering (a <em>relation</em> doesn’t rely on the insertion order).</p>
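<p>As a small illustration of that stability (the table and index names here are my own, not from the paper), the application’s query text stays identical whether or not the system maintains an index underneath:</p>

```sql
-- the application's query never changes...
SELECT name FROM people WHERE name = 'Foo';

-- ...even if the system later adds an index for performance.
-- As Codd notes, the index duplicates information that is already in
-- the relation; it only affects speed, never results.
CREATE INDEX people_name_idx ON people (name);
```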
<h2 id="duplicates">Duplicates</h2>
<p>His definition of relational data is a bit different from that of most modern database systems, namely: <strong>no duplicate rows</strong>.</p>
<p>I couldn’t find a reason behind this restriction, though.
For practical purposes, I find it useful to have it.</p>
<h2 id="relational-data">Relational Data</h2>
<p>In the article, Codd doesn’t try to define a language, and today’s most popular one is SQL.</p>
<p>However, there is no restriction that says that “SQL database” and “relational database” are synonyms.
One could have a relational database without using SQL at all, and it would still be a relational one.</p>
<p>The main one that I have in mind, and the reason that led me to reading this paper in the first place, is Datomic.</p>
<p>It uses an <a href="https://github.com/edn-format/edn">edn</a>-based representation for datalog queries<sup id="fnref:edn-queries" role="doc-noteref"><a href="#fn:edn-queries" class="footnote" rel="footnote">1</a></sup>, and a particular schema used to represent data.</p>
<p>Even though it looks very weird when coming from SQL, I’d argue that it ticks all the boxes (except for “no duplicates”) that define a relational database, since building relations and applying operations on them is possible.</p>
<p>Compare and contrast a contrived example of possible representations of SQL and datalog of the same data:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="rouge-code"><pre><span class="c1">-- create schema</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">people</span> <span class="p">(</span>
<span class="n">id</span> <span class="n">UUID</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span>
<span class="n">name</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">manager_id</span> <span class="n">UUID</span><span class="p">,</span>
<span class="k">FOREIGN</span> <span class="k">KEY</span> <span class="p">(</span><span class="n">manager_id</span><span class="p">)</span> <span class="k">REFERENCES</span> <span class="n">people</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="p">);</span>
<span class="c1">-- insert data</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">people</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">manager_id</span><span class="p">)</span> <span class="k">VALUES</span>
<span class="p">(</span><span class="s1">'d3f29960-ccf0-44e4-be66-1a1544677441'</span><span class="p">,</span> <span class="s1">'Foo'</span><span class="p">,</span> <span class="s1">'076356f4-1a0e-451c-b9c6-a6f56feec941'</span><span class="p">),</span>
<span class="p">(</span><span class="s1">'076356f4-1a0e-451c-b9c6-a6f56feec941'</span><span class="p">,</span> <span class="s1">'Bar'</span><span class="p">,</span> <span class="k">NULL</span><span class="p">);</span>
<span class="c1">-- query data, make a relation</span>
<span class="k">SELECT</span> <span class="n">employees</span><span class="p">.</span><span class="n">name</span> <span class="k">AS</span> <span class="s1">'employee-name'</span><span class="p">,</span>
<span class="n">managers</span><span class="p">.</span><span class="n">name</span> <span class="k">AS</span> <span class="s1">'manager-name'</span>
<span class="k">FROM</span> <span class="n">people</span> <span class="n">employees</span>
<span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">people</span> <span class="n">managers</span> <span class="k">ON</span> <span class="n">employees</span><span class="p">.</span><span class="n">manager_id</span> <span class="o">=</span> <span class="n">managers</span><span class="p">.</span><span class="n">id</span><span class="p">;</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
</pre></td><td class="rouge-code"><pre>;; create schema
#{ {:db/ident :person/id
:db/valueType :db.type/uuid
:db/cardinality :db.cardinality/one
:db/unique :db.unique/value}
{:db/ident :person/name
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one}
{:db/ident :person/manager
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/one}}
;; insert data
#{ {:person/id #uuid "d3f29960-ccf0-44e4-be66-1a1544677441"
:person/name "Foo"
:person/manager [:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"]}
{:person/id #uuid "076356f4-1a0e-451c-b9c6-a6f56feec941"
:person/name "Bar"}}
;; query data, make a relation
{:find [?employee-name ?manager-name]
:where [[?person :person/name ?employee-name]
[?person :person/manager ?manager]
[?manager :person/name ?manager-name]]}
</pre></td></tr></tbody></table></code></pre></div></div>
<p>(forgive any errors in the above SQL and datalog code, I didn’t run them to check. Patches welcome!)</p>
<p>This employee example comes from the paper, and both SQL and datalog representations match the paper definition of “relational”.</p>
<p>Both “Foo” and “Bar” are employees, and the data is normalized.
SQL represents data as tables, and Datomic as datoms, but relations could be derived from both, which we could view as:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>employee_name | manager_name
----------------------------
"Foo" | "Bar"
</pre></td></tr></tbody></table></code></pre></div></div>
<h2 id="conclusion">Conclusion</h2>
<p>The article also talks about operators, consistency and normalization, which are now so widespread and well-known that it feels a bit weird seeing someone advocate for them.</p>
<p>I also establish that <code class="language-plaintext highlighter-rouge">relational != SQL</code>, and that other databases such as Datomic are also relational, following Codd’s original definition.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:edn-queries" role="doc-endnote">
<p>You can think of it as JSON, but with a Clojure taste. <a href="#fnref:edn-queries" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p><a href="https://euandre.org/2021/02/17/ann-fallible-fault-injection-library-for-stress-testing-failure-scenarios.html">ANN: fallible - Fault injection library for stress-testing failure scenarios</a> (2021-02-17, updated 2022-03-06)</p>
<p>Yesterday I pushed v0.1.0 of <a href="https://euandreh.xyz/fallible/">fallible</a>, a minuscule library for fault injection
and stress-testing of C programs.</p>
<h2 id="edit"><em>EDIT</em></h2>
<p>2021-06-12: As of <a href="https://euandreh.xyz/fallible/CHANGELOG.html">0.3.0</a> (and beyond), the macro interface improved and is a bit different from what is presented in this article. If you’re interested, I encourage you to take a look at it.</p>
<p>2022-03-06: I’ve <a href="https://euandre.org/static/attachments/fallible.tar.gz">archived</a> the project for now. It still needs some maturing before being usable.</p>
<h2 id="existing-solutions">Existing solutions</h2>
<p>Writing robust code can be challenging, and tools like static analyzers, fuzzers and friends can help you get there with more certainty.
But when I tried to make some of my C code more robust, in order to handle system crashes, filled disks, out-of-memory and similar scenarios, I couldn’t find existing tools to help me explicitly stress-test those failure scenarios.</p>
<p>Take the “<a href="https://www.gnu.org/prep/standards/standards.html#Semantics">Writing Robust Programs</a>” section of the GNU Coding Standards:</p>
<blockquote>
<p>Check every system call for an error return, unless you know you wish to ignore errors.
(…) Check every call to malloc or realloc to see if it returned NULL.</p>
</blockquote>
<p>From a robustness standpoint, this is a reasonable stance: if you want to have a robust program that knows how to fail when you’re out of memory and <code class="language-plaintext highlighter-rouge">malloc</code> returns <code class="language-plaintext highlighter-rouge">NULL</code>, then you ought to check every call to <code class="language-plaintext highlighter-rouge">malloc</code>.</p>
<p>Take a sample code snippet for clarity:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="kt">void</span> <span class="nf">a_function</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">s1</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">A_NUMBER</span><span class="p">);</span>
<span class="n">strcpy</span><span class="p">(</span><span class="n">s1</span><span class="p">,</span> <span class="s">"some string"</span><span class="p">);</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">s2</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">A_NUMBER</span><span class="p">);</span>
<span class="n">strcpy</span><span class="p">(</span><span class="n">s2</span><span class="p">,</span> <span class="s">"another string"</span><span class="p">);</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>At a first glance, this code is unsafe: if any of the calls to <code class="language-plaintext highlighter-rouge">malloc</code> returns <code class="language-plaintext highlighter-rouge">NULL</code>, <code class="language-plaintext highlighter-rouge">strcpy</code> will be given a <code class="language-plaintext highlighter-rouge">NULL</code> pointer.</p>
<p>My first instinct was to change this code to something like this:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="rouge-code"><pre><span class="p">@@ -1,7 +1,15 @@</span>
void a_function() {
char *s1 = malloc(A_NUMBER);
<span class="gi">+ if (!s1) {
+ fprintf(stderr, "out of memory, exiting\n");
+ exit(1);
+ }
</span> strcpy(s1, "some string");
char *s2 = malloc(A_NUMBER);
<span class="gi">+ if (!s2) {
+ fprintf(stderr, "out of memory, exiting\n");
+ exit(1);
+ }
</span> strcpy(s2, "another string");
}
</pre></td></tr></tbody></table></code></pre></div></div>
<p>As I later found out, there are at least 2 problems with this approach:</p>
<ol>
<li><strong>it doesn’t compose</strong>: this could arguably work if <code class="language-plaintext highlighter-rouge">a_function</code> were <code class="language-plaintext highlighter-rouge">main</code>.
But if <code class="language-plaintext highlighter-rouge">a_function</code> lives inside a library, an <code class="language-plaintext highlighter-rouge">exit(1);</code> is an inelegant way of handling failures, and will catch the top-level <code class="language-plaintext highlighter-rouge">main</code> consuming the library by surprise;</li>
<li><strong>it gives up instead of handling failures</strong>: the actual handling goes a bit beyond stopping.
What about open file handles, in-memory caches, unflushed bytes, etc.?</li>
</ol>
<p>If you could force only the second call to <code class="language-plaintext highlighter-rouge">malloc</code> to fail, <a href="https://www.valgrind.org/">Valgrind</a> would correctly complain that the program exited with unfreed memory.</p>
<p>So the last change, to reach the best version of the above code, is:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre></td><td class="rouge-code"><pre><span class="p">@@ -1,15 +1,14 @@</span>
<span class="gd">-void a_function() {
</span><span class="gi">+bool a_function() {
</span> char *s1 = malloc(A_NUMBER);
if (!s1) {
<span class="gd">- fprintf(stderr, "out of memory, exiting\n");
- exit(1);
</span><span class="gi">+ return false;
</span> }
strcpy(s1, "some string");
char *s2 = malloc(A_NUMBER);
if (!s2) {
<span class="gd">- fprintf(stderr, "out of memory, exiting\n");
- exit(1);
</span><span class="gi">+ free(s1);
+ return false;
</span> }
strcpy(s2, "another string");
}
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Instead of returning <code class="language-plaintext highlighter-rouge">void</code>, <code class="language-plaintext highlighter-rouge">a_function</code> now returns <code class="language-plaintext highlighter-rouge">bool</code> to indicate whether an error occurred during its execution.
If <code class="language-plaintext highlighter-rouge">a_function</code> returned a pointer to something, the return value could be <code class="language-plaintext highlighter-rouge">NULL</code>, or an <code class="language-plaintext highlighter-rouge">int</code> that represents an error code.</p>
<p>The code is now a) safe and b) failing gracefully, returning the control to the caller to properly handle the error case.</p>
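<p>For contrast, here is a sketch of my own (not from the article) of what the calling side gains: the caller, not the library, decides what to do about the failure. I completed <code class="language-plaintext highlighter-rouge">a_function</code> with the success-path frees and return, and <code class="language-plaintext highlighter-rouge">A_NUMBER</code> is a made-up constant:</p>

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define A_NUMBER 32

// the safe version: clean up partial allocations, report via return value
bool a_function(void) {
    char *s1 = malloc(A_NUMBER);
    if (!s1) {
        return false;
    }
    strcpy(s1, "some string");
    char *s2 = malloc(A_NUMBER);
    if (!s2) {
        free(s1);
        return false;
    }
    strcpy(s2, "another string");
    free(s2);
    free(s1);
    return true;
}

// the caller decides what "handling" means: report, clean up, propagate
int run(void) {
    if (!a_function()) {
        fprintf(stderr, "a_function failed; handling it at the top level\n");
        return 1;
    }
    return 0;
}
```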
<p>After seeing similar patterns in well-designed APIs, I adopted this practice for my own code, but was still left with manually verifying its correctness and robustness.</p>
<p>How could I add assertions around my code that would help me make sure the <code class="language-plaintext highlighter-rouge">free(s1);</code> exists, before getting an error report?
How do other people and projects solve this?</p>
<p>From what I could see, people either a) hope for the best, b) write safe code but don’t stress-test it, or c) write ad-hoc code to stress it.</p>
<p>The most prominent case of c) is SQLite: it has a few wrappers around the familiar <code class="language-plaintext highlighter-rouge">malloc</code> to do fault injection, check for memory limits, add warnings, create shim layers for other environments, etc.
All of that, however, is tightly coupled with SQLite itself, and couldn’t easily be pulled out for use somewhere else.</p>
<p>When searching online, an <a href="https://stackoverflow.com/questions/1711170/unit-testing-for-failed-malloc">interesting thread</a> caught my attention: fail each call to <code class="language-plaintext highlighter-rouge">malloc</code> once, and when the same stacktrace appears again, allow it to proceed.</p>
<h2 id="implementation">Implementation</h2>
<p>A working implementation of that already exists: <a href="https://github.com/ralight/mallocfail">mallocfail</a>.
It uses <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> to replace <code class="language-plaintext highlighter-rouge">malloc</code> at run-time, computes the SHA of the stacktrace and fails once for each SHA.</p>
<p>I initially envisioned and started implementing something very similar to mallocfail.
However, I wanted it to go beyond out-of-memory scenarios, and using <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> for every possible corner that could fail wasn’t a good idea in the long run.</p>
<p>Also, mallocfail won’t work together with tools such as Valgrind, which want to do their own override of <code class="language-plaintext highlighter-rouge">malloc</code> with <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code>.</p>
<p>I instead went with less automatic things: starting with a <code class="language-plaintext highlighter-rouge">fallible_should_fail(char *filename, int lineno)</code> function that fails once for each <code class="language-plaintext highlighter-rouge">filename</code>+<code class="language-plaintext highlighter-rouge">lineno</code> combination, I created macro wrappers around common functions such as <code class="language-plaintext highlighter-rouge">malloc</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="rouge-code"><pre><span class="kt">void</span> <span class="o">*</span><span class="nf">fallible_malloc</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">size</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="k">const</span> <span class="n">filename</span><span class="p">,</span> <span class="kt">int</span> <span class="n">lineno</span><span class="p">)</span> <span class="p">{</span>
<span class="cp">#ifdef FALLIBLE
</span> <span class="k">if</span> <span class="p">(</span><span class="n">fallible_should_fail</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">lineno</span><span class="p">))</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#else
</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">filename</span><span class="p">;</span>
<span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">lineno</span><span class="p">;</span>
<span class="cp">#endif
</span> <span class="k">return</span> <span class="n">malloc</span><span class="p">(</span><span class="n">size</span><span class="p">);</span>
<span class="p">}</span>
<span class="cp">#define MALLOC(size) fallible_malloc(size, __FILE__, __LINE__)
</span></pre></td></tr></tbody></table></code></pre></div></div>
<p>With this definition, I could replace the calls to <code class="language-plaintext highlighter-rouge">malloc</code> with <code class="language-plaintext highlighter-rouge">MALLOC</code> (or any other name that you want to <code class="language-plaintext highlighter-rouge">#define</code>):</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="rouge-code"><pre><span class="gd">--- 3.c 2021-02-17 00:15:38.019706074 -0300
</span><span class="gi">+++ 4.c 2021-02-17 00:44:32.306885590 -0300
</span><span class="p">@@ -1,11 +1,11 @@</span>
bool a_function() {
<span class="gd">- char *s1 = malloc(A_NUMBER);
</span><span class="gi">+ char *s1 = MALLOC(A_NUMBER);
</span> if (!s1) {
return false;
}
strcpy(s1, "some string");
<span class="gd">- char *s2 = malloc(A_NUMBER);
</span><span class="gi">+ char *s2 = MALLOC(A_NUMBER);
</span> if (!s2) {
free(s1);
return false;
</pre></td></tr></tbody></table></code></pre></div></div>
<p>With this change, if the program gets compiled with the <code class="language-plaintext highlighter-rouge">-DFALLIBLE</code> flag the fault-injection mechanism will run, and <code class="language-plaintext highlighter-rouge">MALLOC</code> will fail once for each <code class="language-plaintext highlighter-rouge">filename</code>+<code class="language-plaintext highlighter-rouge">lineno</code> combination.
When the flag is missing, <code class="language-plaintext highlighter-rouge">MALLOC</code> is a very thin wrapper around <code class="language-plaintext highlighter-rouge">malloc</code>, which compilers could remove entirely, and the <code class="language-plaintext highlighter-rouge">-lfallible</code> flag can be omitted.</p>
<p>This applies not only to <code class="language-plaintext highlighter-rouge">malloc</code> or other <code class="language-plaintext highlighter-rouge">stdlib.h</code> functions.
If <code class="language-plaintext highlighter-rouge">a_function</code> is important or relevant, I could add a wrapper around it too, one that checks <code class="language-plaintext highlighter-rouge">fallible_should_fail</code>, to exercise whether its callers are also doing the proper clean-up.</p>
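<p>To make that concrete, here is a hypothetical sketch of such a wrapper. The names are mine, the <code class="language-plaintext highlighter-rouge">fallible_should_fail</code> stub below only stands in for the real one from libfallible (and never injects a failure), and the real macro interface changed in 0.3.0:</p>

```c
#include <stdbool.h>

// Stub standing in for libfallible's real fallible_should_fail, so this
// sketch is self-contained; here it never injects a failure.
bool fallible_should_fail(const char *filename, int lineno) {
    (void)filename;
    (void)lineno;
    return false;
}

bool a_function(void) {
    return true;  // the real function would do actual work
}

// The wrapper injects a failure at this call site when the library asks
// for it, and is otherwise transparent to the caller.
bool fallible_a_function(const char *filename, int lineno) {
    if (fallible_should_fail(filename, lineno)) {
        return false;  // injected failure: the caller must clean up
    }
    return a_function();
}

#define A_FUNCTION() fallible_a_function(__FILE__, __LINE__)
```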
<p>The actual code is just this single function, <a href="https://euandre.org/git/fallible/tree/src/fallible.c?id=v0.1.0#n16"><code class="language-plaintext highlighter-rouge">fallible_should_fail</code></a>, which ended up taking only ~40 lines.
In fact, there are more lines of either Makefile (111), README.md (82) or troff (306) in this first version.</p>
<p>The price for such fine-grained control is that this approach requires more manual work.</p>
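<p>For intuition, the “fail once per call site” behaviour can be sketched in a few lines. This toy of mine is not fallible’s actual implementation, just an illustration of the bookkeeping involved:</p>

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SITES 1024

// call sites we have already injected a failure into
static struct {
    const char *filename;
    int lineno;
} seen[MAX_SITES];
static int nseen = 0;

bool fallible_should_fail(const char *filename, int lineno) {
    for (int i = 0; i < nseen; i++) {
        if (seen[i].lineno == lineno &&
            strcmp(seen[i].filename, filename) == 0) {
            return false;  // already failed here once; let it proceed
        }
    }
    if (nseen < MAX_SITES) {
        seen[nseen].filename = filename;
        seen[nseen].lineno = lineno;
        nseen++;
    }
    return true;  // first time at this call site: inject a failure
}
```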
<h2 id="usage-examples">Usage examples</h2>
<h3 id="malloc-from-the-readmemd"><code class="language-plaintext highlighter-rouge">MALLOC</code> from the <code class="language-plaintext highlighter-rouge">README.md</code></h3>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</pre></td><td class="rouge-code"><pre><span class="c1">// leaky.c</span>
<span class="cp">#include <string.h>
#include <fallible_alloc.h>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">aaa</span> <span class="o">=</span> <span class="n">MALLOC</span><span class="p">(</span><span class="mi">100</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">aaa</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">strcpy</span><span class="p">(</span><span class="n">aaa</span><span class="p">,</span> <span class="s">"a safe use of strcpy"</span><span class="p">);</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">bbb</span> <span class="o">=</span> <span class="n">MALLOC</span><span class="p">(</span><span class="mi">100</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">bbb</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// free(aaa);</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">strcpy</span><span class="p">(</span><span class="n">bbb</span><span class="p">,</span> <span class="s">"not unsafe, but aaa is leaking"</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">bbb</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">aaa</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Compile with <code class="language-plaintext highlighter-rouge">-DFALLIBLE</code> and run <a href="https://euandreh.xyz/fallible/fallible-check.1.html"><code class="language-plaintext highlighter-rouge">fallible-check.1</code></a>:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span>c99 <span class="nt">-DFALLIBLE</span> <span class="nt">-o</span> leaky leaky.c <span class="nt">-lfallible</span>
<span class="nv">$ </span>fallible-check ./leaky
Valgrind failed when we did not expect it to:
<span class="o">(</span>...suppressed output...<span class="o">)</span>
<span class="c"># exit status is 1</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<h2 id="conclusion">Conclusion</h2>
<p>For my personal use, I’ll <a href="https://euandre.org/git/package-repository/">package</a> it for GNU Guix and Nix.
Packaging it for any other distribution should be trivial, or you can just download the tarball and run <code class="language-plaintext highlighter-rouge">[sudo] make install</code>.</p>
<p>Patches welcome!</p>
<p><a href="https://euandre.org/2021/01/26/ann-remembering-add-memory-to-dmenu-fzf-and-similar-tools.html">ANN: remembering - Add memory to dmenu, fzf and similar tools</a> (2021-01-26)</p>
<p>Today I pushed v0.1.0 of <a href="https://euandreh.xyz/remembering/">remembering</a>, a tool to enhance the interactive usability of menu-like tools, such as <a href="https://tools.suckless.org/dmenu/">dmenu</a> and <a href="https://github.com/junegunn/fzf">fzf</a>.</p>
<h2 id="previous-solution">Previous solution</h2>
<p>I previously used <a href="http://dmwit.com/yeganesh/">yeganesh</a> to fill this gap, but as I started to rely less on Emacs, I adopted fzf as my go-to tool for doing fuzzy searching on the terminal.
But I didn’t like that fzf always showed things in the same order, when I would only need 3 or 4 commonly used files.</p>
<p>For those who don’t know: yeganesh is a wrapper around dmenu that will remember your most used programs and put them on the beginning of the list of executables.
This is very convenient for interactive prolonged use, as with time the things you usually want are right at the very beginning.</p>
<p>But now I had this thing, yeganesh, that solved this problem for dmenu, but didn’t for fzf.</p>
<p>I initially considered patching yeganesh to support it, but I found it more coupled to dmenu than I would desire.
I’d rather have something that knows nothing about dmenu, fzf or anything, but enhances tools like those in a useful way.</p>
<h2 id="implementation">Implementation</h2>
<p>Other than being decoupled from dmenu, another improvement I thought could be made on top of yeganesh is the choice of programming language.
Instead of Haskell, I went with POSIX sh.
Sticking to POSIX sh means it requires no build-time dependencies at all, which makes packaging much easier.</p>
<p>The good thing is that the program itself is small enough (<a href="https://euandre.org/git/remembering/tree/remembering?id=v0.1.0">119 lines</a> on v0.1.0) that POSIX sh does the job just fine, combined with other POSIX utilities such as <a href="http://www.opengroup.org/onlinepubs/9699919799/utilities/getopts.html">getopts</a>, <a href="http://www.opengroup.org/onlinepubs/9699919799/utilities/sort.html">sort</a> and <a href="http://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html">awk</a>.</p>
<p>The behaviour is: given a program that will read from STDIN and write a single entry to STDOUT, <code class="language-plaintext highlighter-rouge">remembering</code> wraps that program, and rearranges STDIN so that previous choices appear at the beginning.</p>
<p>Where you would do:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span><span class="nb">seq </span>5 | fzf
5
4
3
2
<span class="o">></span> 1
5/5
<span class="o">></span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>…and get the same order of numbers every time, now you can write:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span><span class="nb">seq </span>5 | remembering <span class="nt">-p</span> seq-fzf <span class="nt">-c</span> fzf
5
4
3
2
<span class="o">></span> 1
5/5
<span class="o">></span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>On the first run, everything is the same. If you picked 4 in the previous example, the following run would be different:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span><span class="nb">seq </span>5 | remembering <span class="nt">-p</span> seq-fzf <span class="nt">-c</span> fzf
5
3
2
1
<span class="o">></span> 4
5/5
<span class="o">></span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>As time passes, the list would adjust based on the frequency of your choices.</p>
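<p>The reordering idea can be sketched in a few lines of POSIX sh and awk. This is a toy of my own, not <code class="language-plaintext highlighter-rouge">remembering</code>’s actual code, and it assumes a made-up profile format of one “count entry” pair per line:</p>

```shell
# Toy sketch: reorder stdin so that entries with a higher count in the
# profile file come first; ties keep their original order.
# Assumed profile format (made up for this sketch): "<count> <entry>".
# Caveat: a toy indeed -- an empty profile breaks the NR == FNR idiom.
reorder() (
    profile="$1"
    # first pass reads the profile; second pass annotates each stdin
    # line with "count<TAB>position", then sort and strip the annotation
    awk '
        NR == FNR { count[$2] = $1; next }
        { print (count[$0] ? count[$0] : 0) "\t" NR "\t" $0 }
    ' "$profile" - |
        sort -k1,1nr -k2,2n |
        cut -f3-
)
```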
<p>I aimed for reusability, so that I could wrap diverse commands with <code class="language-plaintext highlighter-rouge">remembering</code> and it would be able to work. To accomplish that, a “profile” (the <code class="language-plaintext highlighter-rouge">-p something</code> part) stores data about different runs separately.</p>
<p>I took the idea of building something small with few dependencies to other places too:</p>
<ul>
<li>the manpages are written in troff directly;</li>
<li>the tests are just more POSIX sh files;</li>
<li>and a POSIX Makefile to <code class="language-plaintext highlighter-rouge">check</code> and <code class="language-plaintext highlighter-rouge">install</code>.</li>
</ul>
<p>I was aware of the value of sticking to standards when coding, but my past experience was mostly with programming language standards, such as ECMAScript, Common Lisp, Scheme, or with IndexedDB or DOM APIs.
It felt good to rediscover these nice POSIX tools, which reminds me of a quote by <a href="https://en.wikipedia.org/wiki/Henry_Spencer#cite_note-3">Henry Spencer</a>:</p>
<blockquote>
<p>Those who do not understand Unix are condemned to reinvent it, poorly.</p>
</blockquote>
<h2 id="usage-examples">Usage examples</h2>
<p>Here are some functions I wrote myself that you may find useful:</p>
<h3 id="run-a-command-with-fzf-on-pwd">Run a command with fzf on <code class="language-plaintext highlighter-rouge">$PWD</code></h3>
<pre><code class="language-shell">f() {
	profile="f-shell-function-$(pwd | sed -e 's_/_-_g')"
	file="$(git ls-files | \
		remembering -p "$profile" \
			-c "fzf --select-1 --exit-0 --query \"$2\" --preview 'cat {}'")"
	if [ -n "$file" ]; then
		# shellcheck disable=2068
		history -s f $@
		history -s "$1" "$file"
		"$1" "$file"
	fi
}
</code></pre>
<p>This way I can run <code class="language-plaintext highlighter-rouge">f vi</code> or <code class="language-plaintext highlighter-rouge">f vi config</code> at the root of a repository, and the list of files will always appear in most-used order.
Adding <code class="language-plaintext highlighter-rouge">pwd</code> to the profile keeps it from mixing data across different repositories.</p>
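<p>The profile-name derivation can be seen in isolation. Assuming the intent is a fixed prefix plus the flattened working directory (the exact prefix here is illustrative), the <code class="language-plaintext highlighter-rouge">sed</code> substitution turns the path into a single token:</p>

```shell
# Hypothetical, isolated view of the profile-name derivation: replace
# every "/" in the path with "-". A fixed path stands in for $(pwd).
path='/home/user/some-repository'
profile="f-shell-function-$(printf '%s\n' "$path" | sed -e 's_/_-_g')"
echo "$profile"
# f-shell-function--home-user-some-repository
```

<p>Since the resulting token differs per directory (note the double dash coming from the leading slash), each repository gets its own profile data.</p>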
<h3 id="copy-password-to-clipboard">Copy password to clipboard</h3>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
</pre></td><td class="rouge-code"><pre><span class="nv">choice</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span>find <span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/.password-store"</span> <span class="nt">-type</span> f | <span class="se">\</span>
<span class="nb">grep</span> <span class="nt">-Ev</span> <span class="s1">'(.git|.gpg-id)'</span> | <span class="se">\</span>
<span class="nb">sed</span> <span class="nt">-e</span> <span class="s2">"s|</span><span class="nv">$HOME</span><span class="s2">/.password-store/||"</span> <span class="nt">-e</span> <span class="s1">'s/\.gpg$//'</span> | <span class="se">\</span>
remembering <span class="nt">-p</span> password-store <span class="se">\</span>
<span class="nt">-c</span> <span class="s1">'dmenu -l 20 -i'</span><span class="si">)</span><span class="s2">"</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$choice</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
</span>pass show <span class="s2">"</span><span class="nv">$choice</span><span class="s2">"</span> <span class="nt">-c</span>
<span class="k">fi</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Adding the above to a file and binding it to a keyboard shortcut, I can access the contents of my <a href="https://www.passwordstore.org/">password store</a>, with the entries ordered by usage.</p>
<h3 id="replacing-yeganesh">Replacing yeganesh</h3>
<p>Where I previously had:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="nv">exe</span><span class="o">=</span><span class="si">$(</span>yeganesh <span class="nt">-x</span><span class="si">)</span> <span class="o">&&</span> <span class="nb">exec</span> <span class="nv">$exe</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Now I have:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="nv">exe</span><span class="o">=</span><span class="si">$(</span>dmenu_path | remembering <span class="nt">-p</span> dmenu-exec <span class="nt">-c</span> dmenu<span class="si">)</span> <span class="o">&&</span> <span class="nb">exec</span> <span class="nv">$exe</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This way, the executables appear in order of usage.</p>
<p>If you don’t have <code class="language-plaintext highlighter-rouge">dmenu_path</code>, you can get just the underlying <code class="language-plaintext highlighter-rouge">stest</code> tool that looks at the executables available in your <code class="language-plaintext highlighter-rouge">$PATH</code>. Here’s a juicy one-liner to do it:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span>wget <span class="nt">-O-</span> https://dl.suckless.org/tools/dmenu-5.0.tar.gz | <span class="se">\</span>
<span class="nb">tar </span>Ozxf - dmenu-5.0/arg.h dmenu-5.0/stest.c | <span class="se">\</span>
<span class="nb">sed</span> <span class="s1">'s|^#include "arg.h"$|// #include "arg.h"|'</span> | <span class="se">\</span>
cc <span class="nt">-xc</span> - <span class="nt">-o</span> stest
</pre></td></tr></tbody></table></code></pre></div></div>
<p>With the <code class="language-plaintext highlighter-rouge">stest</code> utility you’ll be able to list executables in your <code class="language-plaintext highlighter-rouge">$PATH</code> and pipe them to dmenu or something else yourself:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span><span class="o">(</span><span class="nv">IFS</span><span class="o">=</span>:<span class="p">;</span> ./stest <span class="nt">-flx</span> <span class="nv">$PATH</span><span class="p">;</span><span class="o">)</span> | <span class="nb">sort</span> <span class="nt">-u</span> | remembering <span class="nt">-p</span> another-dmenu-exec <span class="nt">-c</span> dmenu | sh
</pre></td></tr></tbody></table></code></pre></div></div>
<p>In fact, the code for <code class="language-plaintext highlighter-rouge">dmenu_path</code> is almost just like that.</p>
<h2 id="conclusion">Conclusion</h2>
<p>For my personal use, I’ve <a href="https://euandre.org/git/package-repository/">packaged</a> <code class="language-plaintext highlighter-rouge">remembering</code> for GNU Guix and Nix. Packaging it for any other distribution should be trivial; alternatively, just download the tarball and run <code class="language-plaintext highlighter-rouge">[sudo] make install</code>.</p>
<p>Patches welcome!</p>
EuAndreh[email protected]Today I pushed v0.1.0 of remembering, a tool to enhance the interactive usability of menu-like tools, such as dmenu and fzf.Local-First Software: You Own Your Data, in spite of the Cloud - article review2020-11-14T00:00:00-03:002020-11-14T00:00:00-03:00https://euandre.org/2020/11/14/local-first-software-you-own-your-data-in-spite-of-the-cloud-article-review.html
<p><em>This article is derived from a <a href="/slides/2020/11/14/on-local-first-beyond-the-crdt-silver-bullet.html">presentation</a> given at a Papers
We Love meetup on the same subject.</em></p>
<p>This is a review of the article
“<a href="https://martin.kleppmann.com/papers/local-first.pdf">Local-First Software: You Own Your Data, in spite of the Cloud</a>”,
by M. Kleppmann, A. Wiggins, P. Van Hardenberg and M. F. McGranaghan.</p>
<h3 id="offline-first-local-first">Offline-first, local-first</h3>
<p>The “local-first” term they use isn’t new, and I have used it myself in the past
to refer to this type of application, where the data lives primarily on the
client, and there are conflict resolution algorithms that reconcile data created
on different instances.</p>
<p>Sometimes I see this idea confused with “client-side”, “offline-friendly”,
“syncable”, etc. I have used these terms myself, too.</p>
<p>However, the term “offline-first” already exists, and it conveys almost all of
that meaning. In my view, “local-first” doesn’t extend “offline-first” in any
aspect; rather, it gives it a well-defined meaning. I could say that
“local-first” is just “offline-first”, but with 7 well-defined ideals instead of
community best practices.</p>
<p>It is a step forward, and given the number of times I’ve seen the paper shared
around I think there’s a chance people will prefer saying “local-first” in
<em>lieu</em> of “offline-first” from now on.</p>
<h3 id="software-licenses">Software licenses</h3>
<p>On a footnote of the 7th ideal (“You Retain Ultimate Ownership and Control”),
the authors say:</p>
<blockquote>
<p>In our opinion, maintaining control and ownership of data does not mean that
the software must necessarily be open source. (…) as long as it does not
artificially restrict what users can do with their files.</p>
</blockquote>
<p>They give examples of artificial restrictions, like this one I’ve come up
with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre><span class="c">#!/bin/sh</span>
<span class="nv">TODAY</span><span class="o">=</span><span class="si">$(</span><span class="nb">date</span> +%s<span class="si">)</span>
<span class="nv">LICENSE_EXPIRATION</span><span class="o">=</span><span class="si">$(</span><span class="nb">date</span> <span class="nt">-d</span> 2020-11-15 +%s<span class="si">)</span>
<span class="k">if</span> <span class="o">[</span> <span class="nv">$TODAY</span> <span class="nt">-ge</span> <span class="nv">$LICENSE_EXPIRATION</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
</span><span class="nb">echo</span> <span class="s1">'License expired!'</span>
<span class="nb">exit </span>1
<span class="k">fi
</span><span class="nb">echo</span> <span class="k">$((</span><span class="m">2</span> <span class="o">+</span> <span class="m">2</span><span class="k">))</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Now when using this very useful program:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre><span class="c"># today</span>
<span class="nv">$ </span>./useful-adder.sh
4
<span class="c"># tomorrow</span>
<span class="nv">$ </span>./useful-adder.sh
License expired!
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This is obviously an intentional restriction, and it goes against the 5th ideal
(“The Long Now”). This software would only be useful as long as the embedded
license expiration allowed. Sure you could change the clock on the computer, but
there are many other ways that this type of intentional restriction is in
conflict with that ideal.</p>
<p>However, what about unintentional restrictions? What if a piece of software
had an equal or similar restriction, and stopped working after some days passed?
Or what if the programmer added a constant to make development simpler, and this
led to unintentionally restricting the user?</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="c"># today</span>
<span class="nv">$ </span>useful-program
<span class="c"># ...useful output...</span>
<span class="c"># tomorrow, with more data</span>
<span class="nv">$ </span>useful-program
ERROR: Panic! Stack overflow!
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Just as easily as I can come up with ways to intentionally restrict users, I can
do the same for unintentional restrictions. A program can stop working for a
variety of reasons.</p>
<p>If it stops working due to, say, data growth, what are the options? Reverting to
an earlier backup, and making it read-only? That isn’t really a “Long Now”, but
rather a “Long Now, as long as the software keeps working as expected”.</p>
<p>The point is: if the software isn’t free, “The Long Now” isn’t achievable
without a lot of wishful thinking. Maybe the authors were trying to be more
friendly towards businesses that don’t like free software, but in doing so
they’ve proposed a contradiction by reconciling “The Long Now” with proprietary
software.</p>
<p>It isn’t the same as saying that any free software achieves that ideal,
either. The license can still be free, but the source code can become
unavailable due to cloud rot. Or maybe the build is undocumented, or the build
tools had specific configuration that one has to guess. A piece of free
software can still fail to achieve “The Long Now”. Being free doesn’t guarantee
it, just makes it possible.</p>
<p>A colleague has challenged my view, arguing that the software doesn’t really
need to be free, as long as there is a specification of the file format. This
way, if the software stops working, the format can still be processed by other
programs. But this doesn’t hold up in practice: if you have a document that you
write to, and the software stops working, you still want to write to the document.
An external tool that navigates the content and shows it to you won’t allow you
to keep writing, and when it does, that tool is effectively starting to
re-implement the software.</p>
<p>An open specification could serve as a blueprint for other implementations,
making the data format more friendly to reverse-engineering. But the
re-implementation still has to exist, at which point the original software has
failed to achieve “The Long Now”.</p>
<p>It is less bad, but still not quite there yet.</p>
<h3 id="denial-of-existing-solutions">Denial of existing solutions</h3>
<p>When describing “Existing Data Storage and Sharing Models”, on a
footnote<sup id="fnref:devil" role="doc-noteref"><a href="#fn:devil" class="footnote" rel="footnote">1</a></sup> the authors say:</p>
<blockquote>
<p>In principle it is possible to collaborate without a repository service,
e.g. by sending patch files by email, but the majority of Git users rely
on GitHub.</p>
</blockquote>
<p>The authors go to great lengths to talk about the usability of cloud apps, and
even point to research they’ve done on it, but they’ve missed learning more from
local-first solutions that already exist.</p>
<p>Say the automerge CRDT proves to be even more useful than what everybody
imagined. Say someone builds a local-first repository service using it. How will
it change anything of the Git/GitHub model? What is different about it that
prevents people in the future writing a paper saying:</p>
<blockquote>
<p>In principle it is possible to collaborate without a repository service,
e.g. by using automerge and platform X,
but the majority of Git users rely on GitHub.</p>
</blockquote>
<p>How is this any better?</p>
<p>If it is already <a href="https://drewdevault.com/2018/07/23/Git-is-already-distributed.html">possible</a> to have a local-first development
workflow, why don’t people use it? Is it just fashion, or is there a fundamental
problem with it? If so, what is it, and how can it be avoided?</p>
<p>If sending patches by email is perfectly possible but out of fashion, why even
talk about Git/GitHub? Isn’t this a problem that people are putting themselves
in? How can CRDTs possibly prevent people from doing that?</p>
<p>My impression is that the authors envision a better future, where development is
fully decentralized unlike today, and somehow CRDTs will make that happen. If
more people think this way, “CRDT” is next in line to the buzzword list that
solves everything, like “containers”, “blockchain” or “machine learning”.</p>
<p>Rather than picturing an imaginary service that could be described like
“GitHub+CRDTs” and people would adopt it, I’d rather better understand why
people don’t do it already, since Git is built to work like that.</p>
<h3 id="ditching-of-web-applications">Ditching of web applications</h3>
<p>The authors put web applications in a worse position for building local-first
applications, claiming that:</p>
<blockquote>
<p>(…) the architecture of web apps remains fundamentally server-centric.
Offline support is an afterthought in most web apps, and the result is
accordingly fragile.</p>
</blockquote>
<p>Well, I disagree.</p>
<p>The problem isn’t inherent to the web platform, but in how people use it.</p>
<p>I have myself built offline-first applications, leveraging IndexedDB, App Cache,
<em>etc</em>. I wanted to build an offline-first application on the web, and so I did.</p>
<p>In fact, many people choose <a href="https://pouchdb.com/">PouchDB</a> <em>because</em> of that, since it is a
good tool for offline-first web applications. The problem isn’t really the
technology, but how much people want their application to be local-first.</p>
<p>Contrast it with Android <a href="https://developer.android.com/topic/google-play-instant">Instant Apps</a>, where applications are
sent to the phone in small parts. Since this requires an internet connection to
move from a part of the app bundle to another, a subset of the app isn’t
local-first, despite being an app.</p>
<p>The point isn’t the technology, but how people are using it. Local-first web
applications are perfectly possible, just like non-local-first native
applications are possible.</p>
<h3 id="costs-are-underrated">Costs are underrated</h3>
<p>I think the costs of “old-fashioned apps” over “cloud apps” are underrated,
mainly regarding storage, and these costs can vary a lot by application.</p>
<p>Say a person writes online articles for their personal website, and puts
everything into Git. Since there isn’t supposed to be any collaboration, all
of the relevant ideals of local-first are achieved.</p>
<p>Now another person creates videos instead of articles. They could try keeping
everything local, but after some time the storage usage fills the entire disk.
This person’s local-first setup would be much more complex, and would cost much
more in maintenance, backup and storage.</p>
<p>Even though both have similar needs, a local-first video repository is much more
demanding. So the local-first thinking here isn’t “just keep everything local”,
but “how much time and money am I willing to spend to keep everything local”.</p>
<p>The convenience of “cloud apps” becomes so attractive that many don’t even have
a local copy of their videos, and rely exclusively on service providers to
maintain, backup and store their content.</p>
<p>The dial measuring “cloud apps” and “old-fashioned apps” needs to be specific to
use-cases.</p>
<h3 id="real-time-collaboration-is-optional">Real-time collaboration is optional</h3>
<p>If I were the one making the list of ideals, I wouldn’t focus so much on
real-time collaboration.</p>
<p>Even though seamless collaboration is desired, it being real-time depends on the
network being available for that. But ideal 3 states that
“The Network is Optional”, so real-time collaboration is also optional.</p>
<p>The fundamentals of a local-first system should enable real-time collaboration
when network is available, but shouldn’t focus on it.</p>
<p>In many discussions about applications working offline, I commonly find
people saying that their application works
“even on a plane, subway or elevator”. That is a reflection of when said
developers have to deal with networks being unavailable.</p>
<p>But this leaves out a big chunk of the world where internet connection is
intermittent, or only works every other day or only once a week, or stops
working when it rains, <em>etc</em>. For this audience, living without network
connectivity isn’t such a discrete moment in time, but part of every day life. I
like the fact that the authors acknowledge that.</p>
<p>When discussing “working offline”, I’d rather keep this type of person in mind;
the subset of people who are offline when on the elevator will then naturally be
included.</p>
<h3 id="on-crdts-and-developer-experience">On CRDTs and developer experience</h3>
<p>When discussing developer experience, the authors bring up some questions to be
answered further, like:</p>
<blockquote>
<p>For an app developer, how does the use of a CRDT-based data layer compare to
existing storage layers like a SQL database, a filesystem, or CoreData? Is a
distributed system harder to write software for?</p>
</blockquote>
<p>That is an easy one: yes.</p>
<p>A distributed system <em>is</em> harder to write software for, being a distributed
system.</p>
<p>Adding a large layer of data structures and algorithms will naturally make it
more complex to write software for. And trying to make this layer transparent to
the programmer, so they can pretend that layer doesn’t exist, is a bad idea, as
RPC frameworks have tried, and failed.</p>
<p>See “<a href="https://web.archive.org/web/20130116163535/http://labs.oracle.com/techrep/1994/smli_tr-94-29.pdf">A Note on Distributed Computing</a>” for a critique of RPC
frameworks trying to make the network invisible, which I think applies equally
to making the CRDT layer invisible.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I liked the article a lot, as it took the “offline-first” philosophy and ran
with it.</p>
<p>But I think the authors’ view of adding CRDTs and things becoming local-first is
a bit too magical.</p>
<p>This particular area is one that I have great interest in, and I wish to see
more being done in the “local-first” space.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:devil" role="doc-endnote">
<p>This is the second aspect of the article that I’m picking on from a
footnote. I guess the devil really is in the details. <a href="#fnref:devil" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]This article is derived from a presentation given at a Papers We Love meetup on the same subject.Durable persistent trees and parser combinators - building a database2020-11-12T00:00:00-03:002021-02-09T00:00:00-03:00https://euandre.org/2020/11/12/durable-persistent-trees-and-parser-combinators-building-a-database.html
<p>I’ve received with certain frequency messages from people wanting to know if
I’ve made any progress on the database project
<a href="/2020/08/31/the-database-i-wish-i-had.html">I’ve written about</a>.</p>
<p>There are a few areas where I’ve made progress, and here’s a public post on it.</p>
<h2 id="proof-of-concept-dag-log">Proof-of-concept: DAG log</h2>
<p>The main thing I wanted to validate with a concrete implementation was the
concept of modeling a DAG on a sequence of datoms.</p>
<p>The notion of a <em>datom</em> is a rip-off from Datomic, which models data with
time-aware <em>facts</em>, an idea that comes from RDF. RDF’s fact is a triple of
subject-predicate-object, and Datomic’s datoms add a time component to it:
subject-predicate-object-time, A.K.A. entity-attribute-value-transaction:</p>
<div class="language-clojure highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre><span class="p">[[</span><span class="n">person</span><span class="w"> </span><span class="no">:likes</span><span class="w"> </span><span class="s">"pizza"</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="n">true</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">person</span><span class="w"> </span><span class="no">:likes</span><span class="w"> </span><span class="s">"bread"</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">true</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">person</span><span class="w"> </span><span class="no">:likes</span><span class="w"> </span><span class="s">"pizza"</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">false</span><span class="p">]]</span><span class="w">
</span></pre></td></tr></tbody></table></code></pre></div></div>
<p>The above datoms say:</p>
<ul>
<li>at time 0, <code class="language-plaintext highlighter-rouge">person</code> likes pizza;</li>
<li>at time 1, <code class="language-plaintext highlighter-rouge">person</code> stopped liking pizza, and started to like bread.</li>
</ul>
<p>Datomic ensures total consistency of this ever growing log by having a single
writer, the transactor, that will enforce it when writing.</p>
<p>In order to support disconnected clients, I needed a way to allow multiple
writers, and I chose to do it by making the log not a list, but a
directed acyclic graph (DAG):</p>
<div class="language-clojure highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="p">[[</span><span class="n">person</span><span class="w"> </span><span class="no">:likes</span><span class="w"> </span><span class="s">"pizza"</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="n">true</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="mi">0</span><span class="w"> </span><span class="no">:parent</span><span class="w"> </span><span class="no">:db/root</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="n">true</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">person</span><span class="w"> </span><span class="no">:likes</span><span class="w"> </span><span class="s">"bread"</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">true</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">person</span><span class="w"> </span><span class="no">:likes</span><span class="w"> </span><span class="s">"pizza"</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">false</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="mi">1</span><span class="w"> </span><span class="no">:parent</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">true</span><span class="p">]]</span><span class="w">
</span></pre></td></tr></tbody></table></code></pre></div></div>
<p>The extra datoms above add the information needed to build directionality into
the log, and instead of a single consistent log, the DAG can have multiple
coexisting leaves, much like how different Git branches can have different
“latest” commits.</p>
<p>In order to validate this idea, I started with a Clojure implementation. The
goal was not to write the actual final code, but to make a proof-of-concept that
would allow me to test and stretch the idea itself.</p>
<p>This code <a href="https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n1">already exists</a>, but is still fairly incomplete:</p>
<ul>
<li>the building of the index isn’t done yet (with some
<a href="https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n295">commented code</a> on the next step to be implemented)</li>
<li>the indexing is extremely inefficient, with <a href="https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n130">more</a>
<a href="https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n146">than</a> <a href="https://euandre.org/git/mediator/tree/src/core/clojure/src/mediator.clj?id=db4a727bc24b54b50158827b34502de21dbf8948#n253">one</a> occurrence of <code class="language-plaintext highlighter-rouge">O(n²)</code> functions;</li>
<li>no query support yet.</li>
</ul>
<h2 id="top-down-and-bottom-up">Top-down <em>and</em> bottom-up</h2>
<p>However, as time passed and I started looking at what the final implementation
would look like, I started to consider keeping the PoC around.</p>
<p>The top-down approach (the Clojure PoC) was in fact helping guide me with the
bottom-up one, and I have now “promoted” the Clojure PoC into a “reference
implementation”. It should now be a finished implementation that states what the
expected behaviour is, and the actual code should match that behaviour.</p>
<p>The good thing about a reference implementation is that it has no performance or
resource constraints, so if it ends up being 1000× slower and using 500× more
memory, that’s fine. The code can be 10× or 100× simpler, too.</p>
<h2 id="top-down-durable-persistent-trees">Top-down: durable persistent trees</h2>
<p>In promoting the PoC into a reference implementation, this top-down approach now
needs to go beyond doing everything in memory, and the index data structure now
needs to be disk-based.</p>
<p>Roughly speaking, most storage engines out there are based either on B-Trees or
LSM Trees, or some variations of those.</p>
<p>But when building an immutable database, update-in-place B-Trees aren’t an
option, as they don’t accommodate keeping historical views of the tree. LSM Trees
may seem a better alternative, but their compaction of duplicated data across
files is a way of deleting old data, and that old data is precisely what a
historical view needs.</p>
<p>I think the thing I’m after is a mix of a Copy-on-Write B-Tree, which would keep
historical versions with the write IO cost amortization of memtables of LSM
Trees. I don’t know of any B-Tree variant out there that resembles this, so I’ll
call it “Flushing Copy-on-Write B-Tree”.</p>
<p>I haven’t written any code for this yet, so all I have is a high-level view of
what it will look like:</p>
<ol>
<li>
<p>like Copy-on-Write B-Trees, changing a leaf involves creating a new leaf and
building a new path from root to the leaf. The upside is that writes are lock
free, and no coordination is needed between readers and writers, ever;</p>
</li>
<li>
<p>the downside is that a single leaf update means at least <code class="language-plaintext highlighter-rouge">H</code> new nodes that
will have to be flushed to disk, where <code class="language-plaintext highlighter-rouge">H</code> is the height of the tree. To avoid
that, the writer creates these nodes exclusively in the in-memory memtable,
avoiding a flush to disk on every leaf update;</p>
</li>
<li>
<p>a background job will consolidate the memtable data every time it hits X MB,
and persist it to disk, amortizing the cost of the Copy-on-Write B-Tree;</p>
</li>
<li>
<p>readers then have the extra job of getting the latest relevant
disk-resident value and merging it with the memtable data.</p>
</li>
</ol>
<p>The key difference to existing Copy-on-Write B-Trees is that the new trees
are only periodically written to disk, and the intermediate values are kept in
memory. Since no node is ever updated, page utilization is maximal, as pages
don’t need to reserve space for future inserts and updates.</p>
<p>And the key difference to existing LSM Trees is that no compaction is run:
intermediate values are still relevant as the database grows. So this leaves out
tombstones and value duplication done for write performance.</p>
<p>One can delete intermediate index values to reclaim space, but no data is lost
on the process, only old B-Tree values. And if the database ever comes back to
that point (like when doing a historical query), the B-Tree will have to be
rebuilt from a previous value. After all, the database <em>is</em> a set of datoms, and
everything else is just derived data.</p>
<p>Right now I’m still reading about other data structures that storage engines
use, and I’ll start implementing the “Flushing Copy-on-Write B-Tree” as I learn
more<sup id="fnref:learn-more-db" role="doc-noteref"><a href="#fn:learn-more-db" class="footnote" rel="footnote">1</a></sup> and mature the idea.</p>
<h2 id="bottom-up-parser-combinators-and-ffi">Bottom-up: parser combinators and FFI</h2>
<p>I chose Rust as it has the best WebAssembly tooling support.</p>
<p>My goal is not to build a Rust database, but a database that happens to be in
Rust. In order to reach client platforms, the primary API is the FFI one.</p>
<p>I’m not very happy with current tools for exposing Rust code via FFI to the
external world: they either mix C with C++, which I don’t want to do, or provide
no access to the intermediate representation of the FFI, which would be useful
for generating binding for any language that speaks FFI.</p>
<p>I prefer the path that the author of the <a href="https://github.com/eqrion/cbindgen">cbindgen</a>
crate <a href="https://blog.eqrion.net/future-directions-for-cbindgen/">proposes</a>: emitting a data representation of the Rust C API
(the author calls it a <code class="language-plaintext highlighter-rouge">ffi.json</code> file), and then building transformers from the
data representation to the target language. This way you could generate a C API
<em>and</em> the node-ffi bindings for JavaScript automatically from the Rust code.</p>
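<p>A rough sketch of what such an intermediate representation could carry, expressed as Rust types; every name below is hypothetical, not cbindgen’s actual output format:</p>

```rust
// Hypothetical shape for the FFI metadata (the "ffi.json" idea) that
// per-language transformers would consume. None of these names come
// from cbindgen itself.
#[derive(Debug)]
enum CType {
    Void,
    Int32,
    CString,
}

#[derive(Debug)]
struct FfiFunction {
    name: String,              // exported symbol name
    args: Vec<(String, CType)>, // (argument name, argument type)
    ret: CType,
}

// A transformer walks the metadata and emits a binding for one target
// language; here, a plain C prototype.
fn to_c_prototype(f: &FfiFunction) -> String {
    fn c(t: &CType) -> &'static str {
        match t {
            CType::Void => "void",
            CType::Int32 => "int32_t",
            CType::CString => "const char *",
        }
    }
    let args: Vec<String> = f
        .args
        .iter()
        .map(|(name, ty)| format!("{} {}", c(ty), name))
        .collect();
    format!("{} {}({});", c(&f.ret), f.name, args.join(", "))
}
```

<p>Another transformer over the same metadata would emit node-ffi bindings, Kotlin externals, and so on, which is the whole appeal of the approach.</p>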
<p>So the first thing to be done before moving on is an FFI exporter that doesn’t
mix C and C++, and generates said <code class="language-plaintext highlighter-rouge">ffi.json</code>, and then build a few transformers
that take this <code class="language-plaintext highlighter-rouge">ffi.json</code> and generate the language bindings, be it C, C++,
JavaScript, TypeScript, Kotlin, Swift, Dart, <em>etc</em><sup id="fnref:ffi-langs" role="doc-noteref"><a href="#fn:ffi-langs" class="footnote" rel="footnote">2</a></sup>.</p>
<p>I think the best way to get there is by taking the existing code for cbindgen,
which uses the <a href="https://github.com/dtolnay/syn">syn</a> crate to parse the Rust code<sup id="fnref:rust-syn" role="doc-noteref"><a href="#fn:rust-syn" class="footnote" rel="footnote">3</a></sup>, and
adapt it to emit the metadata.</p>
<p>I’ve started a fork of cbindgen: <del>x-bindgen</del><sup id="fnref:x-bindgen" role="doc-noteref"><a href="#fn:x-bindgen" class="footnote" rel="footnote">4</a></sup>. Right now it is
just a copy of cbindgen verbatim, and I plan to remove all C and C++ emitting
code from it, and add IR emitting code instead.</p>
<p>When I started working on x-bindgen, I realized I didn’t know what to look for
in a header file, as I hadn’t written any C code in many years. So while
writing <a href="https://euandre.org/git/libedn/">libedn</a>, I didn’t know how to build a good C API to
expose. So I tried porting the code to C, and right now I’m working on building
a <em>good</em> C API for a JSON parser using parser combinators:
<del>ParsecC</del> <sup id="fnref:parsecc" role="doc-noteref"><a href="#fn:parsecc" class="footnote" rel="footnote">5</a></sup>.</p>
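<p>To illustrate the parser-combinator idea itself (sketched in Rust rather than C, for brevity; ParsecC’s actual API may differ): a parser is a function from input to an optional pair of parsed value and remaining input, and combinators compose small parsers into bigger ones.</p>

```rust
// A parser consumes a prefix of the input and, on success, returns the
// parsed value together with the rest of the input.
type Parser<T> = Box<dyn Fn(&str) -> Option<(T, &str)>>;

// Primitive parser: matches one specific character.
fn ch(expected: char) -> Parser<char> {
    Box::new(move |input: &str| {
        let mut it = input.chars();
        match it.next() {
            Some(c) if c == expected => Some((c, it.as_str())),
            _ => None,
        }
    })
}

// Combinator: runs `p` zero or more times, collecting the results.
fn many<T: 'static>(p: Parser<T>) -> Parser<Vec<T>> {
    Box::new(move |mut input: &str| {
        let mut out = Vec::new();
        while let Some((value, rest)) = p(input) {
            out.push(value);
            input = rest;
        }
        Some((out, input))
    })
}
```

<p>A JSON parser then falls out of stacking such combinators: characters into tokens, tokens into values, values into objects and arrays.</p>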
<p>After “finishing” ParsecC I’ll have a good notion of what a good C API is, and
I’ll have a better direction towards how to expose code from libedn to other
languages, and work on x-bindgen then.</p>
<p>What both libedn and ParsecC are missing right now is proper error reporting,
along with property-based testing for libedn.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I’ve learned a lot already, and I feel the journey I’m on is worth going
through.</p>
<p>If any of those topics interest you, message me to discuss more or contribute!
Patches welcome!</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:learn-more-db" role="doc-endnote">
<p>If you are interested in learning more about this too, the
very best two resources on this subject are Andy Pavlo’s
“<a href="https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_otTrBTrjyohi">Intro to Database Systems</a>”
course and Alex Petrov’s “<a href="https://www.databass.dev/">Database Internals</a>” book. <a href="#fnref:learn-more-db" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:ffi-langs" role="doc-endnote">
<p>Those are, specifically, the languages I’m most interested in. My
goal is supporting client applications, and those languages are the most
relevant for doing so: C for GTK, C++ for Qt, JavaScript and TypeScript for
Node.js and browser, Kotlin for Android and Swing, Swift for iOS, and Dart
for Flutter. <a href="#fnref:ffi-langs" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:rust-syn" role="doc-endnote">
<p>The fact that syn is an external crate to the Rust compiler points
to a big warning: procedural macros are not first class in Rust. They are
just like Babel plugins in JavaScript land, with the extra shortcoming that
there is no specification for the Rust syntax, unlike JavaScript.</p>
<p>As flawed as this may be, it seems to be generally acceptable and adopted,
which works against building a solid ecosystem for Rust.</p>
<p>The alternative that rust-ffi implements relies on internals of the Rust
compiler, which isn’t actually worse, just less common and less accepted. <a href="#fnref:rust-syn" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:x-bindgen" role="doc-endnote">
<p><em>EDIT</em>: now archived, the experimentation was fun. I’ve started to move more towards C, so this effort became deprecated. <a href="#fnref:x-bindgen" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:parsecc" role="doc-endnote">
<p><em>EDIT</em>: now also archived. <a href="#fnref:parsecc" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]I’ve received with certain frequency messages from people wanting to know if I’ve made any progress on the database project I’ve written about.The Next Paradigm Shift in Programming - video review2020-11-08T00:00:00-03:002020-11-08T00:00:00-03:00https://euandre.org/2020/11/08/the-next-paradigm-shift-in-programming-video-review.html
<p>This is a review with comments of
“<a href="https://www.youtube.com/watch?v=6YbK8o9rZfI">The Next Paradigm Shift in Programming</a>”, by Richard Feldman.</p>
<p>This video was <em>strongly</em> suggested to me by a colleague. I wanted to discuss it
with her, and when drafting my response I figured I could publish it publicly
instead.</p>
<p>Before anything else, let me just be clear: I really like the talk, and I think
Richard is a great public speaker. I’ve watched several of his talks over the
years, and I feel I’ve followed his career at a distance, with much respect.
This isn’t a piece criticizing him personally, and I agree with almost
everything he said. These are just some comments but also nitpicks on a few
topics I think he missed, or that I view differently.</p>
<h2 id="structured-programming">Structured programming</h2>
<p>The historical overview at the beginning is very good. In fact, the very video I
watched previously was about structured programming!</p>
<p>Kevlin Henney on
“<a href="https://www.youtube.com/watch?v=SFv8Wm2HdNM">The Forgotten Art of Structured Programming</a>” does a
deep-dive on the topic of structured programming, and how on his view it is
still hidden in our code, when we do a <code class="language-plaintext highlighter-rouge">continue</code> or a <code class="language-plaintext highlighter-rouge">break</code> in some ways.
Even though it is less common to see an explicit <code class="language-plaintext highlighter-rouge">goto</code> in code these days, many
of Dijkstra’s original arguments against explicit <code class="language-plaintext highlighter-rouge">goto</code>s are applicable to
other constructs, too.</p>
<p>This is a very mature view, and I like how he goes beyond the
“don’t use <code class="language-plaintext highlighter-rouge">goto</code>s” heuristic and proposes a much more nuanced understanding
of what “structured programming” means.</p>
<p>In a few minutes, Richard is able to condense most of the significant bits of
Kevlin’s talk in a didactical way. Good job.</p>
<h2 id="oop-like-a-distributed-system">OOP like a distributed system</h2>
<p>Richard extrapolates Alan Kay’s original vision of OOP, and he concludes that
it is more like a distributed system than how people think about OOP these days.
But he then states that this is a rather bad idea, and we shouldn’t pursue it,
given that distributed systems are known to be hard.</p>
<p>However, his extrapolation isn’t really impossible, bad or absurd. In fact,
it has been followed through by Erlang. Joe Armstrong used to say that
“<a href="https://www.infoq.com/interviews/johnson-armstrong-oop/">Erlang might be the only OOP language</a>”, since it actually adopted
this paradigm.</p>
<p>But Erlang is a functional language. So this “OOP as a distributed system” view
is more about designing systems in the large than programs in the small.</p>
<p>There is a switch of levels in this comparison I’m making, as can be done with
any language or paradigm: you can have a functional-like system that is built
with an OOP language (like a compiler, that given the same input will produce
the same output), or an OOP-like system that is built with a functional language
(Rich Hickey calls it
“<a href="https://www.youtube.com/watch?v=ROor6_NGIWU">OOP in the large</a>”<sup id="fnref:the-language-of-the-system" role="doc-noteref"><a href="#fn:the-language-of-the-system" class="footnote" rel="footnote">1</a></sup>).</p>
<p>So this jump from an in-process paradigm to a distributed paradigm is rather a
big one, and I don’t think he can argue that OOP has anything to say about
software distribution across nodes. You can still have Erlang actors that run
independently and send messages to each other without a network between them.
Any OTP application deployed on a single node effectively works like that.</p>
<p>I think he went a bit too far with this extrapolation. Even though I agree it is
a logical and fair one, it isn’t as evidently bad as he painted it. I would be fine
working with a single-node OTP application and seeing someone call it “a <em>real</em>
OOP program”.</p>
<h2 id="first-class-immutability">First class immutability</h2>
<p>I agree with his view of languages moving towards the functional paradigm.
But I think you can narrow down the “first-class immutability” feature he points
out as present on modern functional programming languages to “first-class
immutable data structures”.</p>
<p>I wouldn’t categorize a language as “supporting functional programming style”
without a library for functional data structures in it. By discipline you can
avoid side-effects, write pure functions as much as possible, and pass functions
around as arguments in almost every language these days; but if changing an
element of a vector mutates things in-place, that is still not functional
programming.</p>
<p>To avoid that, you end up needing to make clones of objects to pass to a
function, using freezes or other workarounds. All those cases are where the
underlying mix of OOP and functional programming fails.</p>
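<p>A minimal sketch of the difference, in Rust: a persistent structure returns a new version on update, sharing memory with the old one instead of mutating it in place or cloning it wholesale (real libraries like immer do this with trees, not lists).</p>

```rust
use std::rc::Rc;

// A persistent singly-linked list: "adding" an element builds a new head
// that shares the entire old list as its tail, so the old version stays
// valid and observable.
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

// Returns a new version of the list; the old one is untouched.
fn push(list: &Rc<List>, value: i32) -> Rc<List> {
    Rc::new(List::Cons(value, Rc::clone(list)))
}

// Walks a version of the list into a Vec, newest element first.
fn to_vec(mut list: &Rc<List>) -> Vec<i32> {
    let mut out = Vec::new();
    while let List::Cons(value, rest) = list.as_ref() {
        out.push(*value);
        list = rest;
    }
    out
}
```

<p>Both <code class="language-plaintext highlighter-rouge">Rc::clone</code> calls here copy only a pointer, which is what makes persistent updates cheap compared to cloning the whole structure.</p>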
<p>There are some languages with third-party libraries that provide functional data
structures, like <a href="https://sinusoid.es/immer/">immer</a> for C++, or <a href="https://immutable-js.github.io/immutable-js/">ImmutableJS</a> for
JavaScript.</p>
<p>But functional programming is more easily achievable in languages that have them
built-in, like Erlang, Elm and Clojure.</p>
<h2 id="managed-side-effects">Managed side-effects</h2>
<p>His proposal of adopting managed side-effects as a first-class language concept
is really intriguing.</p>
<p>This is something you can achieve with a library, like <a href="https://redux.js.org/">Redux</a> for JavaScript or
<a href="https://github.com/Day8/re-frame">re-frame</a> for Clojure.</p>
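<p>A minimal sketch of the shape managed side-effects take (the Elm/Redux architecture, with hypothetical names, in Rust): the update function is pure and only <em>describes</em> effects as data, and a separate impure runtime performs them.</p>

```rust
// Effects are plain data describing what should happen, not the
// side effect itself.
#[derive(Debug, PartialEq)]
enum Effect {
    Log(String),
    HttpGet(String),
}

// Pure core: given a state and a message, return the new state plus the
// effects to run. This function performs no I/O itself.
fn update(counter: i32, msg: &str) -> (i32, Vec<Effect>) {
    match msg {
        "increment" => (counter + 1, vec![Effect::Log("incremented".into())]),
        "sync" => (counter, vec![Effect::HttpGet("/counter".into())]),
        _ => (counter, vec![]),
    }
}

// Impure shell: only this runtime interprets effect descriptions, so all
// side effects are confined to one place.
fn run(effects: &[Effect]) {
    for e in effects {
        match e {
            Effect::Log(msg) => println!("log: {}", msg),
            Effect::HttpGet(url) => println!("GET {} (stubbed out)", url),
        }
    }
}
```

<p>Because <code class="language-plaintext highlighter-rouge">update</code> is pure, it can be tested by asserting on the returned effect descriptions, without ever running them.</p>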
<p>I haven’t worked with a language with managed side-effects at scale, and I don’t
feel this is a problem with Clojure or Erlang. But is this me finding a flaw in
his argument or not acknowledging a benefit unknown to me? This is a provocative
question I ask myself.</p>
<p>Also, all FP languages with managed side-effects I know are statically-typed, and
all dynamically-typed FP languages I know don’t have managed side-effects baked in.</p>
<h2 id="what-about-declarative-programming">What about declarative programming?</h2>
<p>In “<a href="http://curtclifton.net/papers/MoseleyMarks06a.pdf">Out of the Tar Pit</a>”, B. Moseley and P. Marks go beyond his view
of functional programming as the basis, and name a possible “functional
relational programming” as an even better solution. They explicitly call out
some flaws in most of the modern functional programming languages, and instead
pick declarative programming as an even better starting paradigm.</p>
<p>If the next paradigm shift is towards functional programming, will the following
shift be towards declarative programming?</p>
<h2 id="conclusion">Conclusion</h2>
<p>Beyond all Richard said, I also often hear people bring up functional
programming when talking about utilizing all cores of a computer, and how FP can
help with that.</p>
<p>Rich Hickey makes a great case for single-process FP on his famous talk
“<a href="https://www.infoq.com/presentations/Simple-Made-Easy/">Simple Made Easy</a>”.</p>
<!-- I find this conclusion too short, and it doesn't revisits the main points -->
<!-- presented on the body of the article. I won't rewrite it now, but it would be an -->
<!-- improvement to extend it to do so. -->
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:the-language-of-the-system" role="doc-endnote">
<p>From 24:05 to 27:45. <a href="#fnref:the-language-of-the-system" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]This is a review with comments of “The Next Paradigm Shift in Programming”, by Richard Feldman.DIY an offline bug tracker with text files, Git and email2020-11-07T00:00:00-03:002021-08-14T00:00:00-03:00https://euandre.org/2020/11/07/diy-an-offline-bug-tracker-with-text-files-git-and-email.html
<p>When <a href="https://github.com/github/dmca/blob/master/2020/10/2020-10-23-RIAA.md">push comes to shove</a>, the operational aspects
of governance of a software project matter a lot. And everybody likes to chime
in with their alternative of how to avoid single points of failure in project
governance, just like I’m doing right now.</p>
<p>The most valuable assets of a project are:</p>
<ol>
<li>source code</li>
<li>discussions</li>
<li>documentation</li>
<li>builds</li>
<li>tasks and bugs</li>
</ol>
<p>For <strong>source code</strong>, Git and other DVCS solve that already: everybody gets a
full copy of the entire source code.</p>
<p>If your code forge is compromised, moving it to a new one takes a couple of
minutes, if there isn’t a secondary remote serving as mirror already. In this
case, no action is required.</p>
<p>If you’re having your <strong>discussions</strong> by email,
“<a href="https://sourcehut.org/blog/2020-10-29-how-mailing-lists-prevent-censorship/">taking this archive somewhere else and carrying on is effortless</a>”.</p>
<p>Besides, make sure to back up archives of past discussions so that the history is
also preserved when this migration happens.</p>
<p>The <strong>documentation</strong> should
<a href="https://podcast.writethedocs.org/2017/01/25/episode-3-trends/">live inside the repository itself</a><sup id="fnref:writethedocs-in-repo" role="doc-noteref"><a href="#fn:writethedocs-in-repo" class="footnote" rel="footnote">1</a></sup>,
so that not only does it get first class treatment, but also gets distributed to
everybody too. Migrating the code to a new forge already migrates the
documentation with it.</p>
<p>As long as you keep the <strong>builds</strong> vendor neutral, the migration should only
involve adapting how you call your <code class="language-plaintext highlighter-rouge">tests.sh</code> from the format that
<code class="language-plaintext highlighter-rouge">provider-1.yml</code> uses to the format that <code class="language-plaintext highlighter-rouge">provider-2.yml</code> accepts.
It isn’t valuable to carry the build history with the project, as this data
quickly decays in value as weeks and months go by, but for simple text logs
<a href="/til/2020/11/30/storing-ci-data-on-git-notes.html">using Git notes</a> may be just enough, and they would be replicated with the rest
of the repository.</p>
<p>But for <strong>tasks and bugs</strong> many rely on a vendor-specific service, where you
register and manage those issues via a web browser. Some provide an
<a href="https://man.sr.ht/todo.sr.ht/#email-access">interface for interacting via email</a> or an API for
<a href="https://github.com/MichaelMure/git-bug#bridges">bridging local bugs with vendor-specific services</a>. But
they’re all layers around the service, disguising the fact that it is a central
point of failure which, when compromised, would lead to data loss. When push
comes to shove, you’d lose data.</p>
<h2 id="alternative-text-files-git-and-email">Alternative: text files, Git and email</h2>
<p>Why not do the same as documentation, and move tasks and bugs into the
repository itself?</p>
<p>It requires no extra tool to be installed, and fits right in the already
existing workflow for source code and documentation.</p>
<p>I like to keep a <a href="https://euandre.org/git/remembering/tree/TODOs.md?id=3f727802cb73ab7aa139ca52e729fd106ea916d0"><code class="language-plaintext highlighter-rouge">TODOs.md</code></a> file at the repository top-level, with
two relevant sections: “tasks” and “bugs”. Then when building the documentation
I’ll just <a href="https://euandre.org/git/remembering/tree/aux/workflow/TODOs.sh?id=3f727802cb73ab7aa139ca52e729fd106ea916d0">generate an HTML file from it</a>, and <a href="https://euandreh.xyz/remembering/TODOs.html">publish</a> it alongside the static
website. All that is done on the main branch.</p>
<p>Any issue discussions are done in the mailing list, and a reference to a
discussion could be added to the ticket itself later on. External contributors
can file tickets by sending a patch.</p>
<p>The good thing about this solution is that it works for 99% of projects out
there.</p>
<p>For the other 1%, having Fossil’s “<a href="https://fossil-scm.org/home/doc/trunk/www/bugtheory.wiki">tickets</a>” could be an
alternative, but you may not want to migrate your project to Fossil to get those
niceties.</p>
<p>Even though I keep a <code class="language-plaintext highlighter-rouge">TODOs.md</code> file on the main branch, you can have a <code class="language-plaintext highlighter-rouge">tasks</code>
branch with a <code class="language-plaintext highlighter-rouge">task-n.md</code> file for each task, or any other way you like.</p>
<p>These tools are familiar enough that you can adjust them to fit your workflow.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:writethedocs-in-repo" role="doc-endnote">
<p>Described as “the ultimate marriage of the two”. Starts
at time 31:50. <a href="#fnref:writethedocs-in-repo" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]When push comes to shove, the operational aspects of governance of a software project matter a lot. And everybody likes to chime in with their alternative of how to avoid single points of failure in project governance, just like I’m doing right now.How not to interview engineers2020-10-20T00:00:00-03:002020-10-24T00:00:00-03:00https://euandre.org/2020/10/20/how-not-to-interview-engineers.html
<p>This is a response to Slava’s
“<a href="https://defmacro.substack.com/p/how-to-interview-engineers">How to interview engineers</a>” article. I initially
thought it was a satire, <a href="https://defmacro.substack.com/p/how-to-interview-engineers/comments#comment-599996">as have others</a>, but he has
<a href="https://twitter.com/spakhm/status/1315754730740617216">doubled down on it</a>:</p>
<blockquote>
<p>(…) Some parts are slightly exaggerated for sure, but the essay isn’t meant
as a joke.</p>
</blockquote>
<p>That being true, he completely misses the point on how to improve hiring, and
proposes a worse alternative on many aspects. It doesn’t qualify as provocative,
it is just wrong.</p>
<p>I was comfortable taking it as a satire, and I would just ignore the whole thing
if it wasn’t (except for the technical memo part), but friends of mine
considered it to be somewhat reasonable. This is an adapted version of parts of
the discussions we had, risking becoming a gigantic showcase of
<a href="https://en.wikipedia.org/wiki/Poe%27s_law">Poe’s law</a>.</p>
<p>In this piece, I will argue against his view, and propose an alternative
approach to improve hiring.</p>
<p>It is common to find people saying how broken technical hiring is, as well put
in words by a phrase on <a href="https://news.ycombinator.com/item?id=24757511">this comment</a>:</p>
<blockquote>
<p>Everyone loves to read and write about how developer interviewing is flawed,
but no one wants to go out on a limb and make suggestions about how to improve
it.</p>
</blockquote>
<p>I guess Slava was trying not to fall into this trap, and make a suggestion on how
to improve instead, which all went terribly wrong.</p>
<h2 id="what-not-to-do">What not to do</h2>
<h3 id="time-candidates">Time candidates</h3>
<p>Timing the candidate shows up on the “talent” and “judgment” sections, and they
are both bad ideas for the same reason: programming is not a performance.</p>
<p>What do e-sports players, musicians, actors and athletes have in common?
Performance psychologists.</p>
<p>For a pianist, their state of mind during concerts is crucial: they not only
must be able to deal with stage anxiety, but to become really successful they
will have to learn how to exploit it. The time window of the concert is what
people practice thousands of hours for, and it is what defines one’s career,
since how well all the practice went is irrelevant to the nature of the
profession. Being able to leverage stage anxiety is an actual goal of theirs.</p>
<p>That is also applicable to athletes, where the execution during a competition
makes them sink or swim, regardless of how all the training was.</p>
<p>The same cannot be said about composers, though. They are more like book
writers, where the value is not on very few moments with high adrenaline, but on
the aggregate over hours, days, weeks, months and years. A composer may have a
deadline to finish a song in five weeks, but it doesn’t really matter if it is
done on a single night, every morning between 6 and 9, at the very last week, or
any other way. No rigid time structure applies, only whatever fits best to the
composer.</p>
<p>Programming is more like composing than doing a concert, which is another way of
saying that programming is not a performance. People don’t practice algorithms
for months to keep them at their fingertips, so that finally in a single
afternoon they can sit down and write everything at once in a rigid 4 hours
window, and launch it immediately after.</p>
<p>Instead software is built iteratively, by making small additions, then
refactoring the implementation, fixing bugs, writing a lot at once, <em>etc</em>.
all while they get a firmer grasp of the problem, stop to think about it, come
up with new ideas, <em>etc</em>.</p>
<p>Some specifically plan for including spaced pauses, and call it
“<a href="https://www.youtube.com/watch?v=f84n5oFoZBc">Hammock Driven Development</a>”, which is just
artist’s “creative idleness” for hackers.</p>
<p>Unless you’re hiring for a live coding group, a competitive programming team, or
a professional live demoer, timing the candidate that way is more harmful than
useful. This type of timing doesn’t find good programmers, it finds performant
programmers, which isn’t the same thing, and you’ll end up with people who can
do great work on small problems but who might be unable to deal with big
problems, and lose those who can very well handle huge problems, slowly. If you
are lucky you’ll get performant people who can also handle big problems on the
long term, but maybe not.</p>
<p>An incident is the closest to a “performance” that it gets, and yet it is still
dramatically different. Surely it is a high stress scenario, but while people
are trying to find a root cause and solve the problem, only the downtime itself
is visible to the exterior. It is like being part of the support staff backstage
during a play: even though execution matters, you’re still not on the spot.
During an incident you’re doing debugging in anger rather than live coding.</p>
<p>Although giving a candidate the task of writing a “technical memo” has the
potential to measure someone’s written communication skills, doing
so in a hard time window also misses the point, for the same reasons.</p>
<h3 id="pay-attention-to-typing-speed">Pay attention to typing speed</h3>
<p>Typing speed is never the bottleneck of a programmer, no matter how great
they are.</p>
<p>As <a href="https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/EWD512.html">Dijkstra said</a>:</p>
<blockquote>
<p>But programming, when stripped of all its circumstantial irrelevancies, boils
down to no more and no less than very effective thinking so as to avoid
unmastered complexity, to very vigorous separation of your many different
concerns.</p>
</blockquote>
<p>In other words, programming is not about typing, it is about thinking.</p>
<p>Otherwise, the way to give a huge productivity boost to those star programmers
who can’t type fast enough would be a touch typing course. If they are so
productive with typing speed being a limitation, imagine what they could
accomplish if they had razor sharp touch typing skills?</p>
<p>Also, why stop there? A good touch typist can do 90 WPM (words per minute), and
a great one can do 120 WPM, but with a stenography keyboard they get to 200
WPM+. That is double the productivity! Why not try
<a href="https://www.youtube.com/watch?v=Mz3JeYfBTcY">speech-to-text</a>? Make them all use <a href="https://www.jsoftware.com/#/">J</a> so they all need
to type less! How come nobody thought of that?</p>
<p>And if someone couldn’t solve the programming puzzle in the given time window,
but could come back in the following day with an implementation that is not only
faster, but uses less memory, was simpler to understand and easier to read than
anybody else? You’d be losing that person too.</p>
<h3 id="iq">IQ</h3>
<p>For “building an extraordinary team at a hard technology startup”, intelligence
is not the most important, <a href="http://www.paulgraham.com/determination.html">determination is</a>.</p>
<p>And talent isn’t “IQ specialized for engineers”. IQ itself isn’t a measure of how
intelligent someone is. Ever since Alfred Binet with Théodore Simon started to
formalize what would become IQ tests years later, they already acknowledged
limitations of the technique for measuring intelligence, which is
<a href="https://sci-hub.do/https://psycnet.apa.org/doiLanding?doi=10.1037%2F1076-8971.6.1.33">still true today</a>.</p>
<p>So having a high IQ tells only how smart people are for a particular aspect of
intelligence, which is not representative of programming. There are numerous
aspects of programming that are not covered by IQ measurement: how to name variables
and functions, how to create models which are compatible with schema evolution,
how to make the system dynamic for runtime parameterization without making it
fragile, how to measure and observe performance and availability, how to pick
between acquiring and paying technical debt, <em>etc</em>.</p>
<p>Not to say about everything else that a programmer does that is not purely
programming. Saying high IQ correlates with great programming is a stretch, at
best.</p>
<h3 id="ditch-hr">Ditch HR</h3>
<p>Slava tangentially picks on HR, and I will digress on that a bit:</p>
<blockquote>
<p>A good rule of thumb is that if a question could be asked by an intern in HR,
it’s a non-differential signaling question.</p>
</blockquote>
<p>Stretching it, this is a rather snobbish view of HR. Why is it that an intern in
HR can’t make signaling questions? Could the same be said of an intern in
engineering?</p>
<p>In other words: is the question not signaling because the one
asking is from HR, or because the one asking is an intern? If the latter, then
he’s just arguing that interns have no place in interviewing, but if the former
then he was picking on HR.</p>
<p>Extrapolating that, it is common to find people who don’t value HR’s work, and
only see them as inferiors doing unpleasant work, and who aren’t capable enough
(or <em>smart</em> enough) to learn programming.</p>
<p>This is equivalent to people who work primarily on backend, and see others working on
frontend struggling and say: “isn’t it just building views and showing them on
the browser? How could it possibly be that hard? I bet I could do it better,
with 20% of code”. As you already know, the answer to it is “well, why don’t you
go do it, then?”.</p>
<p>This sense of superiority ignores the fact that HR has actual professionals
doing actual hard work, not unlike programmers. If HR is inferior and so easy,
why not automate everything away and get rid of a whole department?</p>
<p>I don’t attribute this world view to Slava, this is only an extrapolation of a
snippet of the article.</p>
<h3 id="draconian-mistreating-of-candidates">Draconian mistreating of candidates</h3>
<p>If I found out that people employed theatrics in my interview so that I could
feel I’ve “earned the privilege to work at your company”, I would quit.</p>
<p>If your moral compass is so broken that you are comfortable mistreating me while
I’m a candidate, I immediately assume you will also mistreat me as an employee,
and that the company is not a good place to work, as
<a href="http://www.paulgraham.com/apple.html">evil begets stupidity</a>:</p>
<blockquote>
<p>But the other reason programmers are fussy, I think, is that evil begets
stupidity. An organization that wins by exercising power starts to lose the
ability to win by doing better work. And it’s not fun for a smart person to
work in a place where the best ideas aren’t the ones that win. I think the
reason Google embraced “Don’t be evil” so eagerly was not so much to impress
the outside world as to inoculate themselves against arrogance.</p>
</blockquote>
<p>Paul Graham goes beyond “don’t be evil” with a better motto:
“<a href="http://www.paulgraham.com/good.html">be good</a>”.</p>
<p>Abusing the asymmetric nature of an interview to increase the chance that the
candidate will accept the offer is, well, abusive. I doubt a solid team can
actually be built on such poor foundations, surrounded by such evil measures.</p>
<p>And if you really want to give engineers “the measure of whoever they’re going
to be working with”, there are plenty of reasonable ways of doing it that don’t
include performing fake interviews.</p>
<h3 id="personality-tests">Personality tests</h3>
<p>Personality tests around the world need to be a) translated, b) adapted and c)
validated. Even though a given test may be applicable and useful in a country,
this doesn’t imply it will work for other countries.</p>
<p>Not only do tests usually come with translation guidelines, but their
applicability also needs to be validated again after the translation and
adaptation is done, to see if the test still measures what it is supposed to.</p>
<p>That is also true within the same language. If a test is shown to work in
England, it may not work in New Zealand, in spite of both speaking English. The
difference in cultural context is influential to the point of invalidating a
test altogether.</p>
<p>Regardless of the validity of the proposed “big five” personality test,
saying “just use attributes x, y and z of this test and you’ll be fine” is a
rough simplification, much like saying “just use Raft for distributed systems,
after all it has been proven to work”: it throws all of that background away.</p>
<p>Even applying a personality test is not a trivial task: psychologists need
special training to become able to apply one effectively.</p>
<h3 id="more-cargo-culting">More cargo culting</h3>
<p>He calls the ill-defined “industry standard” cargo-culting, but his own
proposal isn’t sound enough to avoid becoming one.</p>
<p>Even if the ideas were good, they aren’t solid enough, nor based on solid
enough grounds, to stand on their own. Why is it that talent, judgment and
personality are required to determine the fitness of a good candidate? Why not
2, 5, or 20 things? Why those specific 3? Why is talent defined like that? Is it
just because he found talent to be like that?</p>
<p>Isn’t that definitionally also
<a href="http://calteches.library.caltech.edu/51/2/CargoCult.htm">cargo-culting</a><sup id="fnref:cargo-culting-archive" role="doc-noteref"><a href="#fn:cargo-culting-archive" class="footnote" rel="footnote">1</a></sup>? Isn’t he just repeating
whatever he found to work for him, without understanding why?</p>
<p>What Feynman proposes is actually the opposite:</p>
<blockquote>
<p>In summary, the idea is to try to give <strong>all</strong> of the information to help others
to judge the value of your contribution; not just the information that leads
to judgment in one particular direction or another.</p>
</blockquote>
<p>What Slava did was just another form of cargo culting, only one that he
believes works.</p>
<h2 id="what-to-do">What to do</h2>
<p>I will not give you a list of things that “worked for me, thus they are
correct”. Nor will I critique the current “industry standard”, or recount what
I’ve learned from interviewing engineers.</p>
<p>Instead, I’d like to invite you to learn from history, and from what other
professionals have to teach us.</p>
<p>Programming isn’t an odd profession, where everything about it is different from
anything else. It is just another episode in the “technology” series, which has
seasons since before recorded history. It may be an episode where things move a
bit faster, but it is fundamentally the same.</p>
<p>So here is the key idea: what did people do <em>before</em> software engineering?</p>
<p>What is hiring like for engineers in other areas? Haven’t civil, electrical
and other types of engineering existed for much, much longer than software
engineering has? What have those centuries of accumulated experience taught the
world about technical hiring?</p>
<p>What studies were performed on the different success rate of interviewing
strategies? What have they done right and what have they done wrong?</p>
<p>What is the purpose of HR? Why do they even exist? Do we need them, and if so,
what for? What is the value they bring, given that everybody insists on building
an HR department in their companies? Is the existence of HR another form of cargo
culting?</p>
<p>What is industrial and organizational psychology? What is that field of study?
What do they specialize in? What have they learned since the discipline
appeared? What have they done right and wrong over history? What is the current
academic consensus in that area? What is a hot debate topic in academia in that
area? What is the current bleeding edge of research? What can they teach us
about hiring? What can they teach us about technical hiring?</p>
<h2 id="conclusion">Conclusion</h2>
<p>If all I’ve said makes me a “no hire” in the proposed framework, I’m really
glad.</p>
<p>This says less about my programming skills, and more about the employer’s world
view, and I hope not to be fooled into applying for a company that adopts this
one.</p>
<p>Claiming to be selecting “extraordinary engineers” isn’t an excuse to reinvent
the wheel, poorly.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:cargo-culting-archive" role="doc-endnote">
<p><a href="https://web.archive.org/web/20201003090303/http://calteches.library.caltech.edu/51/2/CargoCult.htm">Archived version</a>. <a href="#fnref:cargo-culting-archive" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]This is a response to Slava’s “How to interview engineers” article. I initially thought it was a satire, as have others, but he has doubled down on it:Feature flags: differences between backend, frontend and mobile2020-10-19T00:00:00-03:002020-11-03T00:00:00-03:00https://euandre.org/2020/10/19/feature-flags-differences-between-backend-frontend-and-mobile.html
<p><em>This article is derived from a <a href="/slides/2020/10/19/rollout-feature-flag-experiment-operational-toggle.html">presentation</a> on the same
subject.</em></p>
<p>When discussing feature flags, I find that their costs and benefits are
often well exposed and addressed. Online articles like
“<a href="https://martinfowler.com/articles/feature-toggles.html">Feature Toggle (aka Feature Flags)</a>” do a great job of
explaining them in detail, giving great general guidance on how to adopt
them.</p>
<p>However, the weight of those costs and benefits applies differently on the
backend, the frontend and mobile, and those differences aren’t covered. In fact,
many of them stop making sense, or the decision of whether to adopt a feature
flag may change depending on the environment.</p>
<p>In this article I try to make the distinction between environments and how
feature flags apply to them, with some final best practices I’ve acquired when
using them in production.</p>
<h2 id="why-feature-flags">Why feature flags</h2>
<p>Feature flags in general tend to be cited in the context of
<a href="https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment">continuous deployment</a>:</p>
<blockquote>
<p>A: With continuous deployment, you deploy to production automatically</p>
</blockquote>
<blockquote>
<p>B: But how do I handle deployment failures, partial features, <em>etc.</em>?</p>
</blockquote>
<blockquote>
<p>A: With techniques like canary, monitoring and alarms, feature flags, <em>etc.</em></p>
</blockquote>
<p>Though adopting continuous deployment doesn’t force you to use feature
flags, it creates a demand for them. The inverse is also true: using feature
flags in the code points you more obviously to continuous deployment. Take the
following code sample, which we will reference later in the article:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="kd">function</span> <span class="nx">processTransaction</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">validate</span><span class="p">();</span>
<span class="nx">persist</span><span class="p">();</span>
<span class="c1">// TODO: add call to notifyListeners()</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>While being developed, being tested for suitability or something similar,
<code class="language-plaintext highlighter-rouge">notifyListeners()</code> may not be included in the code at once. So instead of
keeping it on a separate, long-lived branch, a feature flag can decide when the
new, partially implemented function will be called:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="kd">function</span> <span class="nx">processTransaction</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">validate</span><span class="p">();</span>
<span class="nx">persist</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">featureIsEnabled</span><span class="p">(</span><span class="dl">"</span><span class="s2">activate-notify-listeners</span><span class="dl">"</span><span class="p">))</span> <span class="p">{</span>
<span class="nx">notifyListeners</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This allows your code to include <code class="language-plaintext highlighter-rouge">notifyListeners()</code>, and to decide when to call
it at runtime. For the price of some extra machinery around the code, you get
more dynamicity.</p>
<p>So the fundamental question to ask yourself when considering adding a feature
flag should be:</p>
<blockquote>
<p>Am I willing to pay with code complexity to get dynamicity?</p>
</blockquote>
<p>It is true that you can make the management of feature flags as
straightforward as possible, but having no feature flags is simpler than having
any. What you get in return is the ability to parameterize the behaviour of the
application at runtime, without doing any code changes.</p>
<p>Sometimes this added complexity may tilt the balance towards not using a feature
flag, and sometimes the flexibility of changing behaviour at runtime is
absolutely worth the added complexity. This can vary a lot by code base and
feature, but fundamentally by environment: it’s much cheaper to deploy a new
version of a service than to release a new version of an app.</p>
<p>So the question of which environment is being targeted is key when reasoning
about costs and benefits of feature flags.</p>
<h2 id="control-over-the-environment">Control over the environment</h2>
<p>The key differentiator that makes the trade-offs apply differently is how much
control you have over the environment.</p>
<p>When running a <strong>backend</strong> service, you usually are paying for the servers
themselves, and can tweak them as you wish. This means you have full control to
make code changes whenever you wish. Not only that, you decide when to do it,
and for how long the transition will last.</p>
<p>On the <strong>frontend</strong> you have less control: even though you can choose to make a
new version available any time you wish, you can’t force<sup id="fnref:force" role="doc-noteref"><a href="#fn:force" class="footnote" rel="footnote">1</a></sup> clients to
immediately switch to the new version. That means that a) clients could skip
upgrades at any time and b) you always have to keep backward and forward
compatibility in mind.</p>
<p>Even though I’m mentioning frontend directly, it applies to other environments
with similar characteristics: desktop applications, command-line programs,
<em>etc</em>.</p>
<p>On <strong>mobile</strong> you have even less control: app stores need to allow your app to
be updated, which could bite you when least desired. Theoretically you could
make your APK available on third party stores like <a href="https://f-droid.org/">F-Droid</a>, or even
make the APK itself available for direct download, which would give you the same
characteristics of a frontend application, but that happens less often.</p>
<p>On iOS you can’t even do that. You have to get Apple’s blessing on every single
update. Even though we have known for over a decade that this is a
<a href="http://www.paulgraham.com/apple.html">bad idea</a>, there isn’t a way around it. This is where you have the least
control.</p>
<p>In practice, the amount of control you have will change how much you value
dynamicity: the less control you have, the more valuable it is. In other words,
having a dynamic flag on the backend may or may not be worth it since you could
always update the code immediately after, but on iOS it is basically always
worth it.</p>
<h2 id="rollout">Rollout</h2>
<p>A rollout is used to <em>roll out</em> a new version of software.</p>
<p>They are usually short-lived, being relevant as long as the new code is being
deployed. The most common rule is percentages.</p>
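<p>As an illustration of a percentage rule, the decision can be made
deterministic by hashing a stable user identifier into a bucket, so the same
user consistently falls in or out of the rollout. The sketch below is minimal
and hypothetical; the function names and the hash are not from any specific
library:</p>

```javascript
// Minimal sketch of a percentage-based rollout rule. The hash is a
// cheap, deterministic string hash mapping a user id to [0, 100).
function hashToBucket(userId) {
  let hash = 0;
  for (const char of userId) {
    hash = (hash * 31 + char.charCodeAt(0)) % 1000000007;
  }
  return hash % 100;
}

function isInRollout(userId, percentage) {
  // The same user always lands in the same bucket, so the decision is
  // stable across requests while the percentage ramps from 0 to 100.
  return hashToBucket(userId) < percentage;
}
```

<p>Ramping the percentage up progressively directs more users to the new
version, which is the same effect a load balancer produces during a blue/green
deployment.</p>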
<p>On the <strong>backend</strong>, it is common to find it on the deployment infrastructure
itself, like canary servers, blue/green deployments,
<a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#creating-a-deployment">a kubernetes deployment rollout</a>, <em>etc</em>. You could do those manually, by
having a dynamic control on the code itself, but rollbacks are cheap enough that
people usually do a normal deployment and just give some extra attention to the
metrics dashboard.</p>
<p>Any time you see a blue/green deployment, there is a rollout happening: most
likely a load balancer is starting to direct traffic to the new server, until
reaching 100% of the traffic. Effectively, that is a rollout.</p>
<p>On the <strong>frontend</strong>, you can selectively pick which users will be able to
download the new version of a page. You could use geographical region, IP,
cookie or something similar to make this decision.</p>
<p>CDN propagation delays and people not refreshing their web
pages are also rollouts by themselves, since old and new versions of the
software will coexist.</p>
<p>On <strong>mobile</strong>, the Play Store allows you to perform
fine-grained <a href="https://support.google.com/googleplay/android-developer/answer/6346149?hl=en">staged rollouts</a>, and the App Store allows you to
perform limited <a href="https://help.apple.com/app-store-connect/#/dev3d65fcee1">phased releases</a>.</p>
<p>Both for Android and iOS, the user plays the role of making the download.</p>
<p>In summary: since you control the servers on the backend, you can do rollouts at
will, and those are often found automated away in base infrastructure. On the
frontend and on mobile, there are ways to make new versions available, but users
may not download them immediately, and many different versions of the software
end up coexisting.</p>
<h2 id="feature-flag">Feature flag</h2>
<p>A feature flag is a <em>flag</em> that tells the application at runtime to turn
a given <em>feature</em> on or off. That means that the actual production code will
have more than one possible code path to go through, and that a new version of a
feature coexists with the old version. The feature flag tells which part of the
code to go through.</p>
<p>They are usually medium-lived, being relevant as long as the new code is being
developed. The most common rules are percentages, allow/deny lists, A/B groups
and client version.</p>
<p>On the <strong>backend</strong>, those are useful for things that have a long development
cycle, or that need to be done in steps. Consider loading the feature flag rules
in memory when the application starts, so that you avoid querying a database
or an external service for applying a feature flag rule and avoid flakiness on
the result due to intermittent network failures.</p>
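<p>That advice could be sketched like this (a minimal example with hypothetical
names; the rule shape and the in-memory store are assumptions, not a specific
library):</p>

```javascript
// Load all feature flag rules once, at application startup, so that
// evaluating a flag later is a pure in-memory lookup: no database or
// external service call, and no flakiness from network failures.
const flagRules = new Map();

function loadFlagRules(rulesFromDatabase) {
  // rulesFromDatabase: e.g. [{ name: "activate-notify-listeners", enabled: true }]
  for (const rule of rulesFromDatabase) {
    flagRules.set(rule.name, rule);
  }
}

function featureIsEnabled(name) {
  // An unknown flag simply defaults to "off" instead of failing.
  const rule = flagRules.get(name);
  return rule !== undefined && rule.enabled === true;
}
```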
<p>Since on the <strong>frontend</strong> you don’t control when to update the client software,
you’re left with applying the feature flag rule on the server, and exposing the
value through an API for maximum dynamicity. This could be in the frontend code
itself, falling back to a “just refresh the page”/”just update to the latest
version” strategy for less dynamic scenarios.</p>
<p>On <strong>mobile</strong> you can’t even rely on a “just update to the latest version”
strategy, since the code for the app could be updated to a new feature and be
blocked on the store. Those cases aren’t recurrent, but you should always assume
the store will deny updates on critical moments so you don’t find yourself with
no cards to play. That means the only control you actually have is via
the backend, by parameterizing the runtime of the application using the API. In
practice, you should always have a feature flag to control any relevant piece of
code. There is no such thing as “too small code change for a feature flag”. What
you should ask yourself is:</p>
<blockquote>
<p>If the code I’m writing breaks and stays broken for around a month, do I care?</p>
</blockquote>
<p>If you’re doing an experimental screen, or something that will have a very
small impact, you might answer “no” to the above question. For everything else, the
answer will be “yes”: bug fixes, layout changes, refactoring, new screen,
filesystem/database changes, <em>etc</em>.</p>
<h2 id="experiment">Experiment</h2>
<p>An experiment is a feature flag where you care about the analytical value of
the flag, and how it might impact users’ behaviour. A feature flag with analytics.</p>
<p>They are also usually medium-lived, being relevant as long as the new code is
being developed. The most common rule is A/B test.</p>
<p>On the <strong>backend</strong>, an experiment relies on an analytical environment that will
pick the A/B test groups and distributions, which means those can’t be held in
memory easily. That also means that you’ll need a fallback value in case
fetching the group for a given customer fails.</p>
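<p>A sketch of that fallback, assuming a synchronous <code class="language-plaintext highlighter-rouge">fetchGroup</code> function
that queries the analytical environment and may throw on intermittent network
failures (the names here are hypothetical):</p>

```javascript
// Pick the experiment group for a customer, falling back to a fixed
// "control" group when the analytical environment can't be reached.
function experimentGroup(fetchGroup, customerId) {
  try {
    return fetchGroup(customerId); // e.g. "A" or "B"
  } catch (error) {
    return "control"; // fallback value when fetching the group fails
  }
}
```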
<p>On the <strong>frontend</strong> and on <strong>mobile</strong> they are no different from feature flags.</p>
<h2 id="operational-toggle">Operational toggle</h2>
<p>An operational toggle is like a system-level manual circuit breaker, where you
turn on/off a feature, fail over the load to a different server, <em>etc</em>. They are
useful switches to have during an incident.</p>
<p>They are usually long-lived, being relevant as long as the code is in
production. The most common rule is percentages.</p>
<p>They can be feature flags that are promoted to operational toggles on the
<strong>backend</strong>, or may be purposefully put in place preventively or after a
postmortem analysis.</p>
<p>On the <strong>frontend</strong> and on <strong>mobile</strong> they are similar to feature flags, where
the “feature” is being turned on and off, and the client interprets this value
to show if the “feature” is available or unavailable.</p>
<h2 id="best-practices">Best practices</h2>
<h3 id="prefer-dynamic-content">Prefer dynamic content</h3>
<p>Even though feature flags give you more dynamicity, they’re still somewhat
manual: you have to create one for a specific feature and change it by hand.</p>
<p>If you find yourself manually updating a feature flag every other day, or
tweaking the percentages frequently, consider making it fully dynamic. Try
using a dataset that is generated automatically, or computing the content on the
fly.</p>
<p>Say you have a configuration screen with a list of options and sub-options, and
you’re trying to find how to better structure this list. Instead of using a
feature flag for switching between 3 and 5 options, make it fully dynamic. This
way you’ll be able to perform other tests that you didn’t plan, and get more
flexibility out of it.</p>
<h3 id="use-the-client-version-to-negotiate-feature-flags">Use the client version to negotiate feature flags</h3>
<p>After effectively finishing a feature, the old code that coexisted with the new
one will be deleted, and all traces of the transition will vanish from the code
base. However, if you just remove the feature flags from the API, all of the old
versions of clients that relied on that value to show the new feature will
downgrade to the old feature.</p>
<p>This means that you should avoid deleting client-facing feature flags, and
retire them instead: use the client version to decide when the feature is
stable, and return <code class="language-plaintext highlighter-rouge">true</code> for every client with a version greater or equal to
that. This way you can stop thinking about the feature flag, and you don’t break
or downgrade clients that didn’t upgrade past the transition.</p>
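<p>That retirement rule can be sketched as follows; the dotted version format
and the function names are assumptions for illustration:</p>

```javascript
// Compare dotted versions like "2.3.1" numerically, segment by segment.
function compareVersions(a, b) {
  const as = a.split(".").map(Number);
  const bs = b.split(".").map(Number);
  for (let i = 0; i < Math.max(as.length, bs.length); i++) {
    const diff = (as[i] || 0) - (bs[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Retired flag: once the feature is stable as of `stableSince`, every
// client at or above that version gets `true`, while older clients
// that never upgraded keep the old behaviour instead of breaking.
function retiredFlagValue(clientVersion, stableSince) {
  return compareVersions(clientVersion, stableSince) >= 0;
}
```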
<h3 id="beware-of-many-nested-feature-flags">Beware of many nested feature flags</h3>
<p>Nested flags combine exponentially.</p>
<p>Pick strategic entry points or transitions eligible for feature flags, and
beware of their nesting.</p>
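<p>To see why, note that each independent boolean flag doubles the number of
reachable code paths:</p>

```javascript
// Each nested boolean flag doubles the number of possible code paths,
// so n flags yield 2^n combinations to reason about and test.
function pathCount(flagCount) {
  return 2 ** flagCount;
}
```

<p>Three flags already mean 8 combinations; ten flags mean 1024.</p>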
<h3 id="include-feature-flags-in-the-development-workflow">Include feature flags in the development workflow</h3>
<p>Add feature flags to the list of things to think about during whiteboarding,
and deleting/retiring feature flags to the list of things to do at the end of
development.</p>
<h3 id="always-rely-on-a-feature-flag-on-the-app">Always rely on a feature flag on the app</h3>
<p>Again, there is no such thing as “too small for a feature flag”. Too many feature
flags is a good problem to have, not the opposite. Automate the process of
creating a feature flag to lower its cost.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:force" role="doc-endnote">
<p>Technically you could force a reload with JavaScript using
<code class="language-plaintext highlighter-rouge">window.location.reload()</code>, but that is not only invasive and impolite, it
also gives you the illusion that you have control over the client when you
actually don’t: clients with disabled JavaScript would be immune to such
tactics. <a href="#fnref:force" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]This article is derived from a presentation on the same subject.cargo2nix: Dramatically simpler Rust in Nix2020-10-05T02:00:00-03:002020-10-05T02:00:00-03:00https://euandre.org/2020/10/05/cargo2nix-dramatically-simpler-rust-in-nix.html
<p>In the same vein of my earlier post on
<a href="/2020/10/05/swift2nix-run-swift-inside-nix-builds.html">swift2nix</a>, I
was able to quickly prototype a Rust and Cargo variation of it:
<a href="https://euandre.org/static/attachments/cargo2nix.tar.gz">cargo2nix</a>.</p>
<p>The initial prototype is even smaller than swift2nix: it has only
37 lines of code.</p>
<p>Here’s how to use it (snippet taken from the repo’s README):</p>
<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
</pre></td><td class="rouge-code"><pre><span class="kd">let</span>
<span class="nv">niv-sources</span> <span class="o">=</span> <span class="kr">import</span> <span class="sx">./nix/sources.nix</span><span class="p">;</span>
<span class="nv">mozilla-overlay</span> <span class="o">=</span> <span class="kr">import</span> <span class="nv">niv-sources</span><span class="o">.</span><span class="nv">nixpkgs-mozilla</span><span class="p">;</span>
<span class="nv">pkgs</span> <span class="o">=</span> <span class="kr">import</span> <span class="nv">niv-sources</span><span class="o">.</span><span class="nv">nixpkgs</span> <span class="p">{</span> <span class="nv">overlays</span> <span class="o">=</span> <span class="p">[</span> <span class="nv">mozilla-overlay</span> <span class="p">];</span> <span class="p">};</span>
<span class="nv">src</span> <span class="o">=</span> <span class="nv">pkgs</span><span class="o">.</span><span class="nv">nix-gitignore</span><span class="o">.</span><span class="nv">gitignoreSource</span> <span class="p">[</span> <span class="p">]</span> <span class="sx">./.</span><span class="p">;</span>
<span class="nv">cargo2nix</span> <span class="o">=</span> <span class="nv">pkgs</span><span class="o">.</span><span class="nv">callPackage</span> <span class="nv">niv-sources</span><span class="o">.</span><span class="nv">cargo2nix</span> <span class="p">{</span>
<span class="nv">lockfile</span> <span class="o">=</span> <span class="sx">./Cargo.lock</span><span class="p">;</span>
<span class="p">};</span>
<span class="kn">in</span> <span class="nv">pkgs</span><span class="o">.</span><span class="nv">stdenv</span><span class="o">.</span><span class="nv">mkDerivation</span> <span class="p">{</span>
<span class="kn">inherit</span> <span class="nv">src</span><span class="p">;</span>
<span class="nv">name</span> <span class="o">=</span> <span class="s2">"cargo-test"</span><span class="p">;</span>
<span class="nv">buildInputs</span> <span class="o">=</span> <span class="p">[</span> <span class="nv">pkgs</span><span class="o">.</span><span class="nv">latest</span><span class="o">.</span><span class="nv">rustChannels</span><span class="o">.</span><span class="nv">nightly</span><span class="o">.</span><span class="nv">rust</span> <span class="p">];</span>
<span class="nv">phases</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"unpackPhase"</span> <span class="s2">"buildPhase"</span> <span class="p">];</span>
<span class="nv">buildPhase</span> <span class="o">=</span> <span class="s2">''</span><span class="err">
</span><span class="s2"> # Setup dependencies path to satisfy Cargo</span><span class="err">
</span><span class="s2"> mkdir .cargo/</span><span class="err">
</span><span class="s2"> ln -s </span><span class="si">${</span><span class="nv">cargo2nix</span><span class="o">.</span><span class="nv">env</span><span class="o">.</span><span class="nv">cargo-config</span><span class="si">}</span><span class="s2"> .cargo/config</span><span class="err">
</span><span class="s2"> ln -s </span><span class="si">${</span><span class="nv">cargo2nix</span><span class="o">.</span><span class="nv">env</span><span class="o">.</span><span class="nv">vendor</span><span class="si">}</span><span class="s2"> vendor</span><span class="err">
</span><span class="s2"> # Run the tests</span><span class="err">
</span><span class="s2"> cargo test</span><span class="err">
</span><span class="s2"> touch $out</span><span class="err">
</span><span class="s2"> ''</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>That <code class="language-plaintext highlighter-rouge">cargo test</code> part on line 20 is what I have been fighting to get from
every “*2nix” tool available for Rust out there. I don’t want to bash any of them. All I
want is to have full control of what Cargo commands to run, and the “*2nix” tool
should only setup the environment for me. Let me drive Cargo myself, no need to
parameterize how the tool runs it for me, or even replicate its internal
behaviour by calling the Rust compiler directly.</p>
<p>Sure, it doesn’t support private registries or Git dependencies, but how much
bigger does it have to be to support them? Also, it doesn’t support those <strong>yet</strong>;
there’s no reason it can’t be extended. I just haven’t needed them, so I
haven’t added them. Patches welcome.</p>
<p>The layout of the <code class="language-plaintext highlighter-rouge">vendor/</code> directory is more explicit and public than what
swift2nix does: it is whatever the command <code class="language-plaintext highlighter-rouge">cargo vendor</code> returns. However I
haven’t checked if the shape of the <code class="language-plaintext highlighter-rouge">.cargo-checksum.json</code> is specified, or
internal to Cargo.</p>
<p>Try out the demo (also taken from the repo’s README):</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="nb">pushd</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">mktemp</span> <span class="nt">-d</span><span class="si">)</span><span class="s2">"</span>
wget <span class="nt">-O-</span> https://euandre.org/static/attachments/cargo2nix-demo.tar.gz |
<span class="nb">tar</span> <span class="nt">-xv</span>
<span class="nb">cd </span>cargo2nix-demo/
nix-build
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Report back if you wish.</p>
EuAndreh[email protected]In the same vein of my earlier post on swift2nix, I was able to quickly prototype a Rust and Cargo variation of it: cargo2nix.swift2nix: Run Swift inside Nix builds2020-10-05T01:00:00-03:002020-10-05T01:00:00-03:00https://euandre.org/2020/10/05/swift2nix-run-swift-inside-nix-builds.html
<p>While working on a Swift project, I didn’t find any tool that would allow Swift
to run inside <a href="https://nixos.org/">Nix</a> builds. Even though you <em>can</em> run Swift, the real
problem arises when using the package manager. It has many of the same problems
that other package managers have when trying to integrate with Nix, more on this
below.</p>
<p>I wrote a simple little tool called <a href="https://euandre.org/static/attachments/swift2nix.tar.gz">swift2nix</a> that allows you to trick
Swift’s package manager into assuming everything is set up. Here’s the example
from swift2nix’s README file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
</pre></td><td class="rouge-code"><pre>let
niv-sources = import ./nix/sources.nix;
pkgs = import niv-sources.nixpkgs { };
src = pkgs.nix-gitignore.gitignoreSource [ ] ./.;
swift2nix = pkgs.callPackage niv-sources.swift2nix {
package-resolved = ./Package.resolved;
};
in pkgs.stdenv.mkDerivation {
inherit src;
name = "swift-test";
buildInputs = with pkgs; [ swift ];
phases = [ "unpackPhase" "buildPhase" ];
buildPhase = ''
# Setup dependencies path to satisfy SwiftPM
mkdir .build
ln -s ${swift2nix.env.dependencies-state-json} .build/dependencies-state.json
ln -s ${swift2nix.env.checkouts} .build/checkouts
# Run the tests
swift test
touch $out
'';
}
</pre></td></tr></tbody></table></code></pre></div></div>
<p>The key parts are lines 15~17: we just fake enough files inside <code class="language-plaintext highlighter-rouge">.build/</code> that
Swift believes it has already downloaded and checked-out all dependencies, and
just moves on to building them.</p>
<p>I’ve worked on it just enough to make it usable for myself, so beware of
unimplemented cases.</p>
<h2 id="design">Design</h2>
<p>What swift2nix does is just provide you with the bare minimum that Swift
requires, and readily get out of the way:</p>
<ol>
<li>I explicitly did not want to generate a <code class="language-plaintext highlighter-rouge">Package.nix</code> file, since
<code class="language-plaintext highlighter-rouge">Package.resolved</code> already exists and contains the required information;</li>
<li>I didn’t want to have an “easy” interface right out of the gate, after
fighting with “*2nix” tools that focus too much on that.</li>
</ol>
<p>The final actual code was so small (46 lines) that it made me
think about package managers, “*2nix” tools and some problems with many of them.</p>
<h2 id="problems-with-package-managers">Problems with package managers</h2>
<p>I’m going to talk about solely language package managers. Think npm and cargo,
not apt-get.</p>
<p>Package managers want to do too much, or assume too much, or just want to take
control of the entire build of the dependencies.</p>
<p>This is a recurrent problem in package managers, but I don’t see it as an
intrinsic one. There’s nothing about a “package manager” that prevents it from
<em>declaring</em> what it expects to encounter and in which format. The <em>declaring</em>
part is important: it should be data, not code, otherwise you’re back in the
same problem, just like lockfiles are just data. Those work in any language, and
tools can cooperate happily.</p>
<p>There’s no need for this declarative expectation to be standardized, or be made
compatible across languages. That would lead to a poor format that no package
manager really likes. Instead, if every package manager could say out loud what
it wants to see exactly, then more tools like swift2nix could exist, and they
would be more reliable.</p>
<p>This could even work fully offline, and be simply a mapping from the lockfile
(the <code class="language-plaintext highlighter-rouge">Package.resolved</code> in Swift’s case) to the filesystem representation. For
Swift, the <code class="language-plaintext highlighter-rouge">.build/dependencies-state.json</code> comes very close, but it is internal
to the package manager.</p>
<p>Even though this pain only exists when trying to use Swift inside Nix, it sheds
light on this common implicit coupling that package managers have. They
usually have fuzzy boundaries and tight coupling between:</p>
<ol>
<li>resolving the dependency tree and using some heuristic to pick a package
version;</li>
<li>generating a lockfile with the exact pinned versions;</li>
<li>downloading the dependencies present on the lockfile into some local cache;</li>
<li>arranging the dependencies from the cache in a meaningful way for itself inside
the project;</li>
<li>working with the dependencies while <em>assuming</em> that step 4 was done.</li>
</ol>
<p>When you run <code class="language-plaintext highlighter-rouge">npm install</code> in a repository with no lockfile, it does 1~4. If you
do the same with <code class="language-plaintext highlighter-rouge">cargo build</code>, it does 1~5. That’s too much: many of those
assumptions are implicit and internal to the package manager, and if you ever
need to rearrange them, you’re on your own. Even though you can perform some of
those steps individually, you can’t compose or rearrange them.</p>
<p>Instead a much saner approach could be:</p>
<ol>
<li>this stays the same;</li>
<li>this also stays the same;</li>
<li>be able to generate some JSON/TOML/edn which represents the local expected
filesystem layout with dependencies (i.e. exposing what the package manager
expects to find), let’s call it <code class="language-plaintext highlighter-rouge">local-registry.json</code>;</li>
<li>if a <code class="language-plaintext highlighter-rouge">local-registry.json</code> was provided, do a build using that. Otherwise
generate its own, by downloading the dependencies, arranging them, <em>etc.</em></li>
</ol>
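<p>To make step 3 concrete, here is a hypothetical sketch of what such a <code class="language-plaintext highlighter-rouge">local-registry.json</code> could look like for Swift. The keys and layout are purely illustrative; each package manager would define its own format:</p>

```json
{
  "version": 1,
  "dependencies": {
    "swift-log": {
      "pin": "1.0.0",
      "url": "https://github.com/apple/swift-log.git",
      "path": ".build/checkouts/swift-log"
    }
  }
}
```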
<p>The point is just making what the package manager requires visible to the
outside world via some declarative data. If this data wasn’t provided, it can
move on to doing its own automatic things.</p>
<p>By making the expectation explicit and public, one can plug tools <em>à la carte</em>
if desired, but doesn’t prevent the default code path of doing things the exact
same way they are now.</p>
<h2 id="problems-with-2nix-tools">Problems with “*2nix” tools</h2>
<p>I have to admit: I’m unhappy with most of them.</p>
<p>They conflate “using Nix” with “replicating every command of the package manager
inside Nix”.</p>
<p>The avoidance of an “easy” interface that I mentioned above comes from me
fighting with some of the “*2nix” tools much like I have to fight with package
managers: I don’t want to offload all build responsibilities to the “*2nix”
tool, I just want to let it download some of the dependencies and get out of the
way. I want to stick with <code class="language-plaintext highlighter-rouge">npm test</code> or <code class="language-plaintext highlighter-rouge">cargo build</code>, and Nix should only
provide the environment.</p>
<p>This is something that <a href="https://github.com/svanderburg/node2nix">node2nix</a> does right. It allows you to build
the Node.js environment to satisfy NPM, and you can keep using NPM for
everything else:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="nb">ln</span> <span class="nt">-s</span> <span class="k">${</span><span class="nv">node2nix</span><span class="p">-package.shell.nodeDependencies</span><span class="k">}</span>/lib/node_modules ./node_modules
npm <span class="nb">test</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>It’s natural to want to put as many things into Nix as possible to benefit from
Nix’s advantages. Isn’t that how NixOS itself was born?</p>
<p>But a “*2nix” tool should leverage Nix, not be coupled with it. The above
example lets you run any arbitrary NPM command while profiting from isolation
and reproducibility that Nix provides. It is even less brittle: any changes to
how NPM runs some things will be future-compatible, since node2nix isn’t trying
to replicate what NPM does, or fiddling with NPM’s internals.</p>
<p><strong>A “*2nix” tool should build the environment, preferably from the lockfile
directly and offload everything else to the package manager</strong>. The rest is just
nice-to-have.</p>
<p>swift2nix itself could provide an “easy” interface, something that allows you to
write:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>nix-build <span class="nt">-A</span> swift2nix.release
nix-build <span class="nt">-A</span> swift2nix.test
</pre></td></tr></tbody></table></code></pre></div></div>
<p>The implementation of those would be obvious: create a new
<code class="language-plaintext highlighter-rouge">pkgs.stdenv.mkDerivation</code> and call <code class="language-plaintext highlighter-rouge">swift build -c release</code> and <code class="language-plaintext highlighter-rouge">swift test</code>
while using <code class="language-plaintext highlighter-rouge">swift2nix.env</code> under the hood.</p>
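<p>A sketch of what that implementation could look like (illustrative only: it assumes a <code class="language-plaintext highlighter-rouge">swift2nix.env</code> that arranges the pre-fetched dependencies as described above, and the attribute names are hypothetical):</p>

```nix
{ pkgs, swift2nix }:

{
  release = pkgs.stdenv.mkDerivation {
    name = "swift-package-release";
    src = ./.;
    buildInputs = [ pkgs.swift ];
    # swift2nix.env is assumed to place the dependencies where
    # Swift expects to find them, so no network access is needed.
    buildPhase = ''
      ${swift2nix.env}
      swift build -c release
    '';
    installPhase = "cp -r .build/release $out";
  };
}
```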
<h2 id="conclusion">Conclusion</h2>
<p>Package managers should provide exact dependencies via a data representation,
i.e. lockfiles, and expose via another data representation how they expect those
dependencies to appear on the filesystem, i.e. <code class="language-plaintext highlighter-rouge">local-registry.json</code>. This
allows package managers to provide an API so that external tools can create
mirrors, offline builds, other registries, isolated builds, <em>etc.</em></p>
<p>”*2nix” tools should build simple functions that leverage that
<code class="language-plaintext highlighter-rouge">local-registry.json</code><sup id="fnref:local-registry" role="doc-noteref"><a href="#fn:local-registry" class="footnote" rel="footnote">1</a></sup> data and offload all the rest back to the
package manager itself. This allows the “*2nix” to not keep chasing the package
manager evolution, always trying to duplicate its behaviour.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:local-registry" role="doc-endnote">
<p>This <code class="language-plaintext highlighter-rouge">local-registry.json</code> file doesn’t have to be checked in to
the repository at all. It could be always generated on the fly, much like
how Swift’s <code class="language-plaintext highlighter-rouge">dependencies-state.json</code> is. <a href="#fnref:local-registry" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]While working on a Swift project, I didn’t find any tool that would allow Swift to run inside Nix builds. Even though you can run Swift, the real problem arises when using the package manager. It has many of the same problems that other package managers have when trying to integrate with Nix, more on this below.The database I wish I had2020-08-31T00:00:00-03:002020-09-03T00:00:00-03:00https://euandre.org/2020/08/31/the-database-i-wish-i-had.html
<p>I watched the talk
“<a href="https://vimeo.com/230142234">Platform as a Reflection of Values: Joyent, Node.js and beyond</a>”
by Bryan Cantrill, and I think he was able to put into words something I already
felt for some time: if there’s no piece of software out there that reflects your
values, it’s time for you to build that software<sup id="fnref:talk-time" role="doc-noteref"><a href="#fn:talk-time" class="footnote" rel="footnote">1</a></sup>.</p>
<p>I kind of agree with what he said, because this is already happening to me. I
long for a database with a certain set of values, and for a few years I was just
waiting for someone to finally write it. After watching his talk, Bryan is
saying to me: “time to stop waiting, and start writing it yourself”.</p>
<p>So let me try to give an overview of such a database, and go over its values.</p>
<h2 id="overview">Overview</h2>
<p>I want a database that allows me to create decentralized client-side
applications that can sync data.</p>
<p>The best one-line description I can give right now is:</p>
<blockquote>
<p>It’s sort of like PouchDB, Git, Datomic, SQLite and Mentat.</p>
</blockquote>
<p>A more descriptive version could be:</p>
<blockquote>
<p>An embedded, immutable, syncable relational database.</p>
</blockquote>
<p>Let’s go over what I mean by each of those aspects one by one.</p>
<h3 id="embedded">Embedded</h3>
<p>I think the server-side database landscape is diverse and mature enough for
my needs (even though I end up choosing SQLite most of the time), and what I’m
after is a database to be embedded on client-side applications itself, be it
desktop, browser, mobile, <em>etc.</em></p>
<p>The purpose of such a database is not to keep some local cache of data in case of
lost connectivity: we have good solutions for that already. It should serve as
the source of truth, and allow the application to work on top of it.</p>
<p><a href="https://sqlite.org/index.html"><strong>SQLite</strong></a> is a great example of that: it is a very powerful
relational database that runs <a href="https://sqlite.org/whentouse.html">almost anywhere</a>. What I miss
from it is the ability to run it on the browser:
even though you could compile it to WebAssembly, <del>it assumes a POSIX filesystem
that would have to be emulated</del><sup id="fnref:posix-sqlite" role="doc-noteref"><a href="#fn:posix-sqlite" class="footnote" rel="footnote">2</a></sup>.</p>
<p><a href="https://pouchdb.com/"><strong>PouchDB</strong></a> is another great example: it’s a full reimplementation of
<a href="https://couchdb.apache.org/">CouchDB</a> that targets JavaScript environments, mainly the browser and
Node.js. However, I want a tool that can be deployed anywhere, and not limit its
applications to places that already have a JavaScript runtime environment, or
force the developer to bundle a JavaScript runtime environment with their
application. This is true for GTK+ applications, command line programs, Android
apps, <em>etc.</em></p>
<p><a href="https://github.com/mozilla/mentat"><strong>Mentat</strong></a> was an interesting project, but its reliance on SQLite
makes it inherit most of the downsides (and benefits too) of SQLite itself.</p>
<p>Having such a requirement imposes a different approach to storage: we have to
decouple the knowledge about the intricacies of storage from the usage of
storage itself, so that a module (say query processing) can access storage
through an API without needing to know about its implementation. This allows
the database to target a POSIX filesystem storage API and an IndexedDB storage
API, and make the rest of the code agnostic about storage. PouchDB has such a
mechanism (called <a href="https://pouchdb.com/adapters.html">adapters</a>) and Datomic has one too (called
<a href="https://docs.datomic.com/on-prem/storage.html">storage services</a>).</p>
<p>This would allow the database to adapt to where it is embedded: when targeting
the browser the IndexedDB storage API would provide the persistence layer
that the database requires, and similarly the POSIX filesystem storage API would
provide the persistence layer when targeting POSIX systems (like desktops,
mobile, <em>etc.</em>).</p>
<p>But there’s also an extra restriction that comes from being embedded: it
needs to provide an embeddable artifact, most likely a binary library object
that exposes a C compatible FFI, similar to
<a href="https://www.sqlite.org/amalgamation.html">how SQLite does</a>. Bundling a full runtime environment is
possible, but doesn’t make it a compelling solution for embedding. This rules
out most languages, and leaves us with C, Rust, Zig, and similar options that
can target POSIX systems and WebAssembly.</p>
<h3 id="immutable">Immutable</h3>
<p>Being immutable means that only new information is added, no in-place update
ever happens, and nothing is ever deleted.</p>
<p>Having an immutable database presents us with similar trade-offs found in
persistent data structures, like lack of coordination when doing reads, caches
being always coherent, and more usage of space.</p>
<p><a href="https://www.datomic.com/"><strong>Datomic</strong></a> is the go to database example of this: it will only add
information (datoms) and allows you to query them in a multitude of ways. Stuart
Halloway calls it “accumulate-only” over “append-only”<sup id="fnref:accumulate-only" role="doc-noteref"><a href="#fn:accumulate-only" class="footnote" rel="footnote">3</a></sup>:</p>
<blockquote>
<p>It’s accumulate-only, it is not append-only. So append-only, most people when
they say that they’re implying something physical about what happens.</p>
</blockquote>
<p>Also a database can be append-only and overwrite existing information with new
information, by doing clean-ups of “stale” data. I prefer to adopt the
“accumulate-only” naming and approach.</p>
<p><a href="https://git-scm.com/"><strong>Git</strong></a> is another example of this: new commits are always added on top
of the previous data, and it grows by adding commits instead of replacing
existing ones.</p>
<p>Git repositories can only grow in size, and that is not only an acceptable
condition, but also one of the reasons to use it.</p>
<p>All this means that no in-place updates happen on data, and the database will
be much more concerned about how compact and efficiently it stores data than how
fast it does writes to disk. Being embedded, the storage limitation is either a)
how much storage the device has or b) how much storage was designed for the
application to consume. So even though the database could theoretically operate
with hundreds of TBs, a browser page or mobile application wouldn’t have access
to this amount of storage. SQLite even <a href="https://sqlite.org/limits.html">says</a> that it supports
approximately 280 TB of data, but those limits are untested.</p>
<p>The upside of keeping everything is that you can have historical views of your
data, which is very powerful. This also means that applications should turn this
off when not relevant<sup id="fnref:no-history" role="doc-noteref"><a href="#fn:no-history" class="footnote" rel="footnote">4</a></sup>.</p>
<h3 id="syncable">Syncable</h3>
<p>This is a frequent topic when talking about offline-first solutions. When
building applications that:</p>
<ul>
<li>can fully work offline,</li>
<li>stores data,</li>
<li>propagates that data to other application instances,</li>
</ul>
<p>then you’ll need a conflict resolution strategy to handle all the situations
where different application instances disagree. Those application instances
could be a desktop and a browser version of the same application, or the same
mobile app in different devices.</p>
<p>A three-way merge seems to be the best approach, on top of which you could add
application specific conflict resolution functions, like:</p>
<ul>
<li>pick the change with higher timestamp;</li>
<li>if one change is a delete, pick it;</li>
<li>present the diff on the screen and allow the user to merge them.</li>
</ul>
<p>Some databases try to make this “easy” by choosing a strategy for you, but I’ve
found that different applications require different conflict resolution
strategies. Instead, the database should leave this up to the user to decide,
and provide tools for them to do it.</p>
<p><a href="https://en.wikipedia.org/wiki/Merge_(version_control)"><strong>Three-way merges in version control</strong></a> are the best example,
performing automatic merges when possible and asking the user to resolve
conflicts when they appear.</p>
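<p>The mechanics are easy to see with plain <code class="language-plaintext highlighter-rouge">diff3</code>, merging two copies of a record against their common ancestor. A toy illustration, with the “attribute” encoded as a line of text:</p>

```shell
# Three versions of the same record: the common ancestor and two
# divergent copies, where only "mine" changed the attribute.
echo 'name = "Alice"'    > base
echo 'name = "Alice B."' > mine
echo 'name = "Alice"'    > theirs

# Since only one side changed, diff3 merges automatically, no conflict.
merged="$(diff3 -m mine base theirs)"
echo "$merged"    # name = "Alice B."
```

<p>When both sides change the same attribute, <code class="language-plaintext highlighter-rouge">diff3</code> instead emits conflict markers, which is exactly the point where an application-specific resolution function would kick in.</p>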
<p>The unit of conflict for a version control system is a line of text. The
database equivalent would probably be a single attribute, not a full entity or a
full row.</p>
<p>Making all the conflict resolution logic be local should allow the database to
have encrypted remotes similar to how <a href="https://spwhitton.name/tech/code/git-remote-gcrypt/">git-remote-gcrypt</a>
adds this functionality to Git. This would enable users to sync the application
data across devices using an untrusted intermediary.</p>
<h3 id="relational">Relational</h3>
<p>I want the power of relational queries on the client applications.</p>
<p>Most of the arguments against traditional table-oriented relational databases
are related to write performance, but those don’t apply here. The bottlenecks
for client applications usually aren’t write throughput. Nobody is interested in
differentiating between 1 MB/s or 10 MB/s when you’re limited to 500 MB total.</p>
<p>The relational model of the database could either be based on SQL and tables
like in SQLite, or maybe <a href="https://docs.datomic.com/on-prem/query.html">datalog</a> and <a href="https://docs.datomic.com/cloud/whatis/data-model.html#datoms">datoms</a> like in
Datomic.</p>
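<p>For a taste of the datalog flavor, a query for every name stored in the database would look something like this in Datomic’s notation (just the general shape; the attribute name is illustrative):</p>

```clojure
;; find every value of the :person/name attribute, across all entities
[:find ?name
 :where [?e :person/name ?name]]
```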
<h2 id="from-aspects-to-values">From aspects to values</h2>
<p>Now let’s try to translate the aspects above into values, as suggested by Bryan
Cantrill.</p>
<h3 id="portability">Portability</h3>
<p>Being able to target so many different platforms is a bold goal, and the
embedded nature of the database demands portability to be a core value.</p>
<h3 id="integrity">Integrity</h3>
<p>When the local database becomes the source of truth of the application, it must
provide consistency guarantees that enable applications to rely on it.</p>
<h3 id="expressiveness">Expressiveness</h3>
<p>The database should empower applications to slice and dice the data in any way
they want to.</p>
<h2 id="next-steps">Next steps</h2>
<p>Since I can’t find any database that fits these requirements, I’ve finally come
to terms with doing it myself.</p>
<p>It’s probably going to take me a few years to do it, and making it portable
between POSIX and IndexedDB will probably be the biggest challenge. I got myself
a few books on databases to start.</p>
<p>I wonder if I’ll ever be able to get this done.</p>
<h2 id="external-links">External links</h2>
<p>See discussions on <a href="https://www.reddit.com/r/programming/comments/ijwz5b/the_database_i_wish_i_had/">Reddit</a>, <a href="https://lobste.rs/s/m9vkg4/database_i_wish_i_had">lobsters</a>, <a href="https://news.ycombinator.com/item?id=24337244">HN</a> and
<a href="https://lists.sr.ht/~euandreh/public-inbox/%3C010101744a592b75-1dce9281-f0b8-4226-9d50-fd2c7901fa72-000000%40us-west-2.amazonses.com%3E">a lengthy email exchange</a>.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:talk-time" role="doc-endnote">
<p>At the very end, at time 29:49. When talking about the draft of
this article with a friend, he noted that Bryan O’Sullivan (a different
Bryan) says a similar thing on his talk
“<a href="https://www.youtube.com/watch?v=ZR3Jirqk6W8">Running a startup on Haskell</a>”,
at time 4:15. <a href="#fnref:talk-time" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:posix-sqlite" role="doc-endnote">
<p>It was <a href="https://news.ycombinator.com/item?id=24338881">pointed out to me</a>
that SQLite doesn’t assume the existence of a POSIX filesystem, as I wrongly
stated. Thanks for the correction.</p>
<p>This makes me consider it as a storage backend all by itself. I
initially considered having an SQLite storage backend as one implementation
of the POSIX filesystem storage API that I mentioned. My goal was to rely on
it so I could validate the correctness of the actual implementation, given
SQLite’s robustness.</p>
<p>However, it may be even better to just use SQLite, and get an ACID backend
without recreating a big part of SQLite from scratch. In fact, neither Datomic
nor PouchDB created a storage backend for themselves; they just
plugged into what already existed and already worked. I’m beginning to think
that it would be wiser to just do the same, and drop the from-scratch
implementation that I mentioned entirely.</p>
<p>That’s not to say that adding an IndexedDB compatibility layer to SQLite
would be enough to make it fit the other requirements I mention on this
page. SQLite is still an implementation of an update-in-place, SQL,
table-oriented database. It is probably true that cherry-picking the
relevant parts of SQLite (like storage access, consistency, crash recovery,
parser generator, <em>etc.</em>) and leaving out the unwanted parts (SQL, tables,
threading, <em>etc.</em>) would be better than including the full SQLite stack, but
that’s simply an optimization. Both could even coexist, if desired.</p>
<p>SQLite would have to be treated similarly to how Datomic treats SQL
databases: instead of having a table for each entity, spreading attributes
over tables, <em>etc.</em>, it treats SQL databases as key-value storage, so it
doesn’t have to re-implement the interaction with the disk that other databases
already do well.</p>
<p>The tables would contain blocks of binary data, so there isn’t a difference
in how the SQLite storage backend behaves and how the IndexedDB storage
backend behaves, much like how Datomic works the same regardless of the
storage backend, same for PouchDB.</p>
<p>I welcome corrections on what I said above, too. <a href="#fnref:posix-sqlite" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:accumulate-only" role="doc-endnote">
<p>Video “<a href="https://vimeo.com/116315075">Day of Datomic Part 2</a>”
on Datomic’s information model, at time 12:28. <a href="#fnref:accumulate-only" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:no-history" role="doc-endnote">
<p>Similar to
<a href="https://docs.datomic.com/cloud/best.html#nohistory-for-high-churn">Datomic’s <code class="language-plaintext highlighter-rouge">:db/noHistory</code></a>. <a href="#fnref:no-history" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]I watched the talk “Platform as a Reflection of Values: Joyent, Node.js and beyond” by Bryan Cantrill, and I think he was able to put into words something I already felt for some time: if there’s no piece of software out there that reflects your values, it’s time for you to build that software1. At the very end, at time 29:49. When talking about the draft of ↩Guix inside sourcehut builds.sr.ht CI2020-08-10T00:00:00-03:002020-08-19T00:00:00-03:00https://euandre.org/2020/08/10/guix-inside-sourcehut-builds-sr-ht-ci.html
<p>After the release of the <a href="https://man.sr.ht/builds.sr.ht/compatibility.md#nixos">NixOS images in builds.sr.ht</a> and much
usage of it, I also started looking at <a href="https://guix.gnu.org/">Guix</a> and
wondered if I could get it on the awesome builds.sr.ht service.</p>
<p>The Guix manual section on the <a href="https://guix.gnu.org/manual/en/guix.html#Binary-Installation">binary installation</a> is very thorough, and
even a <a href="https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh">shell installer script</a> is provided, but it is built towards someone
installing Guix on their personal computer, and relies heavily on interactive
input.</p>
<p>I developed the following set of scripts that I have been using for some time to
run Guix tasks inside builds.sr.ht jobs. First, <code class="language-plaintext highlighter-rouge">install-guix.sh</code>:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
</pre></td><td class="rouge-code"><pre><span class="c">#!/usr/bin/env bash</span>
<span class="nb">set</span> <span class="nt">-x</span>
<span class="nb">set</span> <span class="nt">-Eeuo</span> pipefail
<span class="nv">VERSION</span><span class="o">=</span><span class="s1">'1.0.1'</span>
<span class="nv">SYSTEM</span><span class="o">=</span><span class="s1">'x86_64-linux'</span>
<span class="nv">BINARY</span><span class="o">=</span><span class="s2">"guix-binary-</span><span class="k">${</span><span class="nv">VERSION</span><span class="k">}</span><span class="s2">.</span><span class="k">${</span><span class="nv">SYSTEM</span><span class="k">}</span><span class="s2">.tar.xz"</span>
<span class="nb">cd</span> /tmp
wget <span class="s2">"https://ftp.gnu.org/gnu/guix/</span><span class="k">${</span><span class="nv">BINARY</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">tar</span> <span class="nt">-xf</span> <span class="s2">"</span><span class="k">${</span><span class="nv">BINARY</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">sudo mv </span>var/guix /var/
<span class="nb">sudo mv </span>gnu /
<span class="nb">sudo mkdir</span> <span class="nt">-p</span> ~root/.config/guix
<span class="nb">sudo ln</span> <span class="nt">-fs</span> /var/guix/profiles/per-user/root/current-guix ~root/.config/guix/current
<span class="nv">GUIX_PROFILE</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span><span class="nb">echo</span> ~root<span class="si">)</span><span class="s2">/.config/guix/current"</span>
<span class="nb">source</span> <span class="s2">"</span><span class="k">${</span><span class="nv">GUIX_PROFILE</span><span class="k">}</span><span class="s2">/etc/profile"</span>
groupadd <span class="nt">--system</span> guixbuild
<span class="k">for </span>i <span class="k">in</span> <span class="si">$(</span><span class="nb">seq</span> <span class="nt">-w</span> 1 10<span class="si">)</span><span class="p">;</span>
<span class="k">do
</span>useradd <span class="nt">-g</span> guixbuild <span class="se">\</span>
<span class="nt">-G</span> guixbuild <span class="se">\</span>
<span class="nt">-d</span> /var/empty <span class="se">\</span>
<span class="nt">-s</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">command</span> <span class="nt">-v</span> nologin<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
<span class="nt">-c</span> <span class="s2">"Guix build user </span><span class="k">${</span><span class="nv">i</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--system</span> <span class="se">\</span>
<span class="s2">"guixbuilder</span><span class="k">${</span><span class="nv">i</span><span class="k">}</span><span class="s2">"</span><span class="p">;</span>
<span class="k">done
</span><span class="nb">mkdir</span> <span class="nt">-p</span> /usr/local/bin
<span class="nb">cd</span> /usr/local/bin
<span class="nb">ln</span> <span class="nt">-s</span> /var/guix/profiles/per-user/root/current-guix/bin/guix <span class="nb">.</span>
<span class="nb">ln</span> <span class="nt">-s</span> /var/guix/profiles/per-user/root/current-guix/bin/guix-daemon <span class="nb">.</span>
guix archive <span class="nt">--authorize</span> < ~root/.config/guix/current/share/guix/ci.guix.gnu.org.pub
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Almost all of it is taken directly from the <a href="https://guix.gnu.org/manual/en/guix.html#Binary-Installation">binary installation</a> section
from the manual, with the interactive bits stripped out: after downloading and
extracting the Guix tarball, we create some symlinks, add guixbuild users and
authorize the <code class="language-plaintext highlighter-rouge">ci.guix.gnu.org.pub</code> signing key.</p>
<p>After installing Guix, we perform a <code class="language-plaintext highlighter-rouge">guix pull</code> to update Guix inside <code class="language-plaintext highlighter-rouge">start-guix.sh</code>:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre><span class="c">#!/usr/bin/env bash</span>
<span class="nb">set</span> <span class="nt">-x</span>
<span class="nb">set</span> <span class="nt">-Eeuo</span> pipefail
<span class="nb">sudo </span>guix-daemon <span class="nt">--build-users-group</span><span class="o">=</span>guixbuild &
guix pull
guix package <span class="nt">-u</span>
guix <span class="nt">--version</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Then we can put it all together in a sample <code class="language-plaintext highlighter-rouge">.build.yml</code> configuration file I’m
using myself:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="rouge-code"><pre><span class="na">image</span><span class="pi">:</span> <span class="s">debian/stable</span>
<span class="na">packages</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">wget</span>
<span class="na">sources</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">https://git.sr.ht/~euandreh/songbooks</span>
<span class="na">tasks</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">install-guix</span><span class="pi">:</span> <span class="pi">|</span>
<span class="s">cd ./songbooks/</span>
<span class="s">./scripts/install-guix.sh</span>
<span class="s">./scripts/start-guix.sh</span>
<span class="s">echo 'sudo guix-daemon --build-users-group=guixbuild &' >> ~/.buildenv</span>
<span class="s">echo 'export PATH="${HOME}/.config/guix/current/bin${PATH:+:}$PATH"' >> ~/.buildenv</span>
<span class="pi">-</span> <span class="na">tests</span><span class="pi">:</span> <span class="pi">|</span>
<span class="s">cd ./songbooks/</span>
<span class="s">guix environment -m build-aux/guix.scm -- make check</span>
<span class="pi">-</span> <span class="na">docs</span><span class="pi">:</span> <span class="pi">|</span>
<span class="s">cd ./songbooks/</span>
<span class="s">guix environment -m build-aux/guix.scm -- make publish-dist</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>We have to add the <code class="language-plaintext highlighter-rouge">guix-daemon</code> to <code class="language-plaintext highlighter-rouge">~/.buildenv</code> so it can be started on every
following task run. Also, since we used <code class="language-plaintext highlighter-rouge">wget</code> inside <code class="language-plaintext highlighter-rouge">install-guix.sh</code>, we had
to add it to the image’s package list.</p>
<p>After the <code class="language-plaintext highlighter-rouge">install-guix</code> task, you can use Guix to build and test your project,
or run any <code class="language-plaintext highlighter-rouge">guix environment --ad-hoc my-package -- my script</code> :)</p>
<h2 id="improvements">Improvements</h2>
<p>When I originally created this code I had a reason to have both a <code class="language-plaintext highlighter-rouge">sudo</code>
call for <code class="language-plaintext highlighter-rouge">sudo ./scripts/install-guix.sh</code> and <code class="language-plaintext highlighter-rouge">sudo</code> usages inside
<code class="language-plaintext highlighter-rouge">install-guix.sh</code> itself. I couldn’t figure out why (it feels like my past self
was a bit smarter 😬), but it feels ugly now. If it is truly required I could
add an explanation for it, or remove this entirely in favor of a more elegant solution.</p>
<p>I could also contribute the Guix image upstream to builds.sr.ht, but there
wasn’t any build or smoke tests in the original <a href="https://git.sr.ht/~sircmpwn/builds.sr.ht">repository</a>, so I wasn’t
inclined to make something that just “works on my machine” or add a maintenance
burden to the author. I didn’t look at it again recently, though.</p>
EuAndreh[email protected]After the release of the NixOS images in builds.sr.ht and much usage of it, I also started looking at Guix and wondered if I could get it on the awesome builds.sr.ht service.Using NixOS as an stateless workstation2019-06-02T00:00:00-03:002019-06-02T00:00:00-03:00https://euandre.org/2019/06/02/using-nixos-as-an-stateless-workstation.html
<p>Last week<sup id="fnref:last-week" role="doc-noteref"><a href="#fn:last-week" class="footnote" rel="footnote">1</a></sup> I changed back to an old<sup id="fnref:old-computer" role="doc-noteref"><a href="#fn:old-computer" class="footnote" rel="footnote">2</a></sup> Samsung laptop, and installed
<a href="https://nixos.org/">NixOS</a> on it.</p>
<p>After using NixOS on another laptop for around two years, I wanted to
verify how reproducible my desktop environment was, and how far
NixOS can actually go in recreating my whole OS from my configuration
files and personal data. I gravitated towards NixOS after trying (and
failing) to create an <code class="language-plaintext highlighter-rouge">install.sh</code> script that would imperatively
install and configure my whole OS using apt-get. When I found a
GNU/Linux distribution that was built on top of the idea of
declaratively specifying the whole OS I was automatically convinced<sup id="fnref:convinced-by-declarative-aspect" role="doc-noteref"><a href="#fn:convinced-by-declarative-aspect" class="footnote" rel="footnote">3</a></sup>.</p>
<p>I was impressed. Even though I’ve been experiencing the benefits of Nix
isolation daily, I always felt skeptical that something would be
missing, because the devil is always on the details. But the result was
much better than expected!</p>
<p>There were only 2 missing configurations:</p>
<ol>
<li>tap-to-click on the touchpad wasn’t enabled by default;</li>
<li>the default theme from the gnome-terminal is “Black on white”
instead of “White on black”.</li>
</ol>
<p>That’s all.</p>
<p>I haven’t checked if I can configure those in the NixOS GNOME module, but I
guess both are scriptable and could be set in a fictional <code class="language-plaintext highlighter-rouge">setup.sh</code>
run.</p>
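<p>As a sketch of what that fictional <code class="language-plaintext highlighter-rouge">setup.sh</code> could look like (the gsettings schemas, key names and colors below are assumptions I haven’t verified, not known-good values):</p>

```shell
#!/bin/sh
# hypothetical setup.sh sketch; schema and key names are assumptions

# 1. enable tap-to-click on the touchpad
gsettings set org.gnome.desktop.peripherals.touchpad tap-to-click true

# 2. flip the default gnome-terminal profile to "White on black";
#    profiles live under a generated UUID, so look it up first
profile=$(gsettings get org.gnome.Terminal.ProfilesList default | tr -d "'")
base="org.gnome.Terminal.Legacy.Profile:/org/gnome/terminal/legacy/profiles:/:$profile/"
gsettings set "$base" use-theme-colors false
gsettings set "$base" background-color 'rgb(0,0,0)'
gsettings set "$base" foreground-color 'rgb(255,255,255)'
```

<p>If both commands do what I expect, the whole “missing configuration” list becomes a couple of scripted lines.</p>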
<p>This makes me really happy, actually. More happy than I anticipated.</p>
<p>Having such a powerful declarative OS makes me feel like my data is the
really important stuff (as it should be), and I can interact with it on
any workstation. All I need is an internet connection and a few hours to
download everything. It feels like my physical workstation and the
installed OS are serving me and my data, instead of me feeling
hostage to the specific OS configuration of the moment. Having a few
backup copies of everything important extends such peacefulness.</p>
<p>After this positive experience with recreating my OS from simple Nix
expressions, I started to wonder how far I could go with this, and
started considering other areas of improvement:</p>
<h3 id="first-run-on-a-fresh-nixos-installation">First run on a fresh NixOS installation</h3>
<p>Right now the initial setup relies on non-declarative manual tasks, like
decrypting some credentials, or manually downloading <strong>this</strong> git
repository with specific configurations before <strong>that</strong> one.</p>
<p>I wonder what the areas of improvement are on this topic, and if
investing in it is worth it (both time-wise and happiness-wise).</p>
<h3 id="emacs">Emacs</h3>
<p>Right now I’m using <a href="http://spacemacs.org/">Spacemacs</a>, which is a
community-maintained package curation and configuration on top of
<a href="https://www.gnu.org/software/emacs/">Emacs</a>.</p>
<p>Spacemacs does support the notion of
<a href="http://spacemacs.org/doc/LAYERS.html">layers</a>, which you can
declaratively specify and let Spacemacs do the rest.</p>
<p>However this solution isn’t nearly as robust as Nix: being purely
functional, Nix describes everything required to build a derivation,
and knows how to do so. Spacemacs is closer to more traditional package
managers: even though the layers list is declarative, the installation
is still very much imperative. I’ve had trouble with Spacemacs not
behaving the same on two different computers with identical
configurations, only brought back to convergence after a
<code class="language-plaintext highlighter-rouge">git clean -fdx</code> inside <code class="language-plaintext highlighter-rouge">~/.emacs.d/</code>.</p>
<p>The ideal solution would be managing Emacs packages with Nix itself.
After a quick search I found that <a href="https://nixos.org/nixos/manual/index.html#module-services-emacs-adding-packages">there is support for Emacs
packages in
Nix</a>.
So far I was only aware of <a href="https://www.gnu.org/software/guix/manual/en/html_node/Application-Setup.html#Emacs-Packages">Guix support for Emacs packages</a>.</p>
<p>This isn’t a trivial change because Spacemacs does include extra
curation and configuration on top of Emacs packages. I’m not sure the
best way to improve this right now.</p>
<h3 id="myrepos">myrepos</h3>
<p>I’m using <a href="https://myrepos.branchable.com/">myrepos</a> to manage all my
git repositories, and the general rule I apply is to add any repository
specific configuration in myrepos’ <code class="language-plaintext highlighter-rouge">checkout</code> phase:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre><span class="c"># sample ~/.mrconfig file snippet</span>
<span class="o">[</span>dev/guix/guix]
checkout <span class="o">=</span>
git clone https://git.savannah.gnu.org/git/guix.git guix
<span class="nb">cd </span>guix/
git config sendemail.to [email protected]
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This way, when I clone this repo again, the email sending is already
pre-configured.</p>
<p>This works well enough, but the solution is too imperative, and my
<code class="language-plaintext highlighter-rouge">checkout</code> phases tend to become brittle over time if not enough care is
taken.</p>
<h3 id="gnu-stow">GNU Stow</h3>
<p>For my home profile and personal configuration I already have a few
dozen symlinks that I manage manually. This has worked so far, but
the solution is sometimes fragile and <a href="https://euandre.org/git/dotfiles/tree/bash/symlinks.sh?id=316939aa215181b1d22b69e94241eef757add98d">not declarative at all</a>. I
wonder if something like <a href="https://www.gnu.org/software/stow/">GNU Stow</a> can help me simplify this.</p>
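<p>A minimal sketch of how GNU Stow could replace those manual symlinks, with one “package” directory per program (the directory layout is illustrative, and I use temporary directories here so the example is safe to run):</p>

```shell
#!/bin/sh
# sketch: a dotfiles tree with one sub-directory per program, each
# mirroring the layout of the home directory; paths are illustrative
home=$(mktemp -d)
dotfiles=$(mktemp -d)
mkdir -p "$dotfiles/bash"
echo 'export EDITOR=emacs' > "$dotfiles/bash/.bashrc"
cd "$dotfiles"
# stow creates $home/.bashrc as a symlink into the dotfiles tree
stow --target "$home" bash
ls -l "$home/.bashrc"
```

<p>With this layout, adding or removing a whole set of symlinks becomes a single <code class="language-plaintext highlighter-rouge">stow</code>/<code class="language-plaintext highlighter-rouge">stow -D</code> call instead of hand-written <code class="language-plaintext highlighter-rouge">ln -s</code> lines.</p>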
<h2 id="conclusion">Conclusion</h2>
<p>I’m really satisfied with NixOS, and I intend to keep using it. If what
I’ve said interests you, maybe try tinkering with the <a href="https://nixos.org/nix/">Nix package
manager</a> (not the whole NixOS) on your current
distribution (it can live alongside any other package manager).</p>
<p>If you have experience with declarative Emacs package management, GNU
Stow or any similar tool, <em>etc.</em>,
<a href="mailto:[email protected]">I’d like some tips</a>. If you don’t have any
experience at all, I’d still love to hear from you.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:last-week" role="doc-endnote">
<p>“Last week” as of the start of this writing, so around the end of
May 2019. <a href="#fnref:last-week" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:old-computer" role="doc-endnote">
<p>I was using a 32GB RAM, i7 and 250GB SSD Samsung laptop. The
switch was back to an 8GB RAM, i5 and 500GB HDD Dell laptop. The biggest
difference I noticed was the memory, both RAM availability and
disk speed, but I had 250GB less local storage space. <a href="#fnref:old-computer" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:convinced-by-declarative-aspect" role="doc-endnote">
<p>The declarative configuration aspect is
something that I now completely take for granted, and wouldn’t consider
using something which isn’t declarative. A good metric to show this is me
realising that I can’t pinpoint the moment when I decided to switch to
NixOS. It’s like I had a distant past when this wasn’t true. <a href="#fnref:convinced-by-declarative-aspect" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Using “youtube-dl” to manage YouTube subscriptions (2018-12-21): https://euandre.org/2018/12/21/using-youtube-dl-to-manage-youtube-subscriptions.html
<p>I’ve recently read the
<a href="https://www.reddit.com/r/DataHoarder/comments/9sg8q5/i_built_a_selfhosted_youtube_subscription_manager/">announcement</a>
of a very nice <a href="https://github.com/chibicitiberiu/ytsm">self-hosted YouTube subscription
manager</a>. I haven’t used
YouTube’s built-in subscriptions for a while now, and haven’t missed
it at all. When I saw the announcement, I considered writing about the
solution I’ve built on top of <a href="https://youtube-dl.org/">youtube-dl</a>.</p>
<h2 id="background-the-problem-with-youtube">Background: the problem with YouTube</h2>
<p>In many ways, I agree with <a href="https://staltz.com/what-happens-when-you-block-internet-giants.html">André Staltz’s view on data ownership and
privacy</a>:</p>
<blockquote>
<p>I started with the basic premise that “I want to be in control of my
data”. Sometimes that meant choosing when to interact with an internet
giant and how much I feel like revealing to them. Most of times it
meant not interacting with them at all. I don’t want to let them be in
full control of how much they can know about me. I don’t want to be in
autopilot mode. (…) Which leads us to YouTube. While I was able to
find alternatives to Gmail (Fastmail), Calendar (Fastmail), Translate
(Yandex Translate), <em>etc.</em> YouTube remains as the most indispensable
Google-owned web service. It is really really hard to avoid consuming
YouTube content. It was probably the smartest startup acquisition
ever. My privacy-oriented alternative is to watch YouTube videos
through Tor, which is technically feasible but not polite to use the
Tor bandwidth for these purposes. I’m still scratching my head with
this issue.</p>
</blockquote>
<p>Even though I don’t use most alternative services he mentions, I do
watch videos from YouTube. But I also feel uncomfortable logging in to
YouTube with a Google account, watching videos, creating playlists and
similar things.</p>
<p>Using the mobile app is worse: you can’t even block ads there.
You have less control over what you share with YouTube and Google.</p>
<h2 id="youtube-dl">youtube-dl</h2>
<p>youtube-dl is a command-line tool for downloading videos, from YouTube
and <a href="https://rg3.github.io/youtube-dl/supportedsites.html">many other sites</a>:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span>youtube-dl https://www.youtube.com/watch?v<span class="o">=</span>rnMYZnY3uLA
<span class="o">[</span>youtube] rnMYZnY3uLA: Downloading webpage
<span class="o">[</span>youtube] rnMYZnY3uLA: Downloading video info webpage
<span class="o">[</span>download] Destination: A Origem da Vida _ Nerdologia-rnMYZnY3uLA.mp4
<span class="o">[</span>download] 100% of 32.11MiB <span class="k">in </span>00:12
</pre></td></tr></tbody></table></code></pre></div></div>
<p>It can be used to download individual videos as shown above, but it
also has some interesting flags that we can use:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">--output</code>: use a custom template to create the name of the
downloaded file;</li>
<li><code class="language-plaintext highlighter-rouge">--download-archive</code>: use a text file for recording and remembering
which videos were already downloaded;</li>
<li><code class="language-plaintext highlighter-rouge">--prefer-free-formats</code>: prefer free video formats, like <code class="language-plaintext highlighter-rouge">webm</code>,
<code class="language-plaintext highlighter-rouge">ogv</code> and Matroska <code class="language-plaintext highlighter-rouge">mkv</code>;</li>
<li><code class="language-plaintext highlighter-rouge">--playlist-end</code>: how many videos to download from a “playlist” (a
channel, a user or an actual playlist);</li>
<li><code class="language-plaintext highlighter-rouge">--write-description</code>: write the video description to a
<code class="language-plaintext highlighter-rouge">.description</code> file, useful for accessing links and extra content.</li>
</ul>
<p>Putting it all together:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span>youtube-dl <span class="s2">"https://www.youtube.com/channel/UClu474HMt895mVxZdlIHXEA"</span> <span class="se">\</span>
<span class="nt">--download-archive</span> ~/Nextcloud/cache/youtube-dl-seen.conf <span class="se">\</span>
<span class="nt">--prefer-free-formats</span> <span class="se">\</span>
<span class="nt">--playlist-end</span> 20 <span class="se">\</span>
<span class="nt">--write-description</span> <span class="se">\</span>
<span class="nt">--output</span> <span class="s2">"~/Downloads/yt-dl/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s"</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This will download the latest 20 videos from the selected channel, and
record the video IDs in the <code class="language-plaintext highlighter-rouge">youtube-dl-seen.conf</code> file. Running it
again right away won’t have any effect.</p>
<p>If the channel posts one more video, running the same command again will
download only the last video, since the other 19 were already
downloaded.</p>
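<p>The “seen” file itself is plain text, with one <code class="language-plaintext highlighter-rouge">&lt;extractor&gt; &lt;video id&gt;</code> entry per line, so it’s easy to inspect or even edit by hand:</p>

```shell
# the archive youtube-dl writes is one "<extractor> <video id>" per line;
# any video whose ID is already listed gets skipped on the next run
archive=$(mktemp)
cat > "$archive" <<'EOF'
youtube rnMYZnY3uLA
youtube dQw4w9WgXcQ
EOF
wc -l < "$archive"
```

<p>Removing a line from the file makes youtube-dl download that video again on the next run.</p>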
<p>With this basic setup you have a minimal subscription system at work,
and you can create some functions to help you manage that:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
</pre></td><td class="rouge-code"><pre><span class="c">#!/bin/sh</span>
<span class="nb">export </span><span class="nv">DEFAULT_PLAYLIST_END</span><span class="o">=</span>15
download<span class="o">()</span> <span class="o">{</span>
youtube-dl <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span> <span class="se">\</span>
<span class="nt">--download-archive</span> ~/Nextcloud/cache/youtube-dl-seen.conf <span class="se">\</span>
<span class="nt">--prefer-free-formats</span> <span class="se">\</span>
<span class="nt">--playlist-end</span> <span class="nv">$2</span> <span class="se">\</span>
<span class="nt">--write-description</span> <span class="se">\</span>
<span class="nt">--output</span> <span class="s2">"~/Downloads/yt-dl/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s"</span>
<span class="o">}</span>
<span class="nb">export</span> <span class="nt">-f</span> download
download_user<span class="o">()</span> <span class="o">{</span>
download <span class="s2">"https://www.youtube.com/user/</span><span class="nv">$1</span><span class="s2">"</span> <span class="k">${</span><span class="nv">2</span><span class="p">-</span><span class="nv">$DEFAULT_PLAYLIST_END</span><span class="k">}</span>
<span class="o">}</span>
<span class="nb">export</span> <span class="nt">-f</span> download_user
download_channel<span class="o">()</span> <span class="o">{</span>
download <span class="s2">"https://www.youtube.com/channel/</span><span class="nv">$1</span><span class="s2">"</span> <span class="k">${</span><span class="nv">2</span><span class="p">-</span><span class="nv">$DEFAULT_PLAYLIST_END</span><span class="k">}</span>
<span class="o">}</span>
<span class="nb">export</span> <span class="nt">-f</span> download_channel
download_playlist<span class="o">()</span> <span class="o">{</span>
download <span class="s2">"https://www.youtube.com/playlist?list=</span><span class="nv">$1</span><span class="s2">"</span> <span class="k">${</span><span class="nv">2</span><span class="p">-</span><span class="nv">$DEFAULT_PLAYLIST_END</span><span class="k">}</span>
<span class="o">}</span>
<span class="nb">export</span> <span class="nt">-f</span> download_playlist
</pre></td></tr></tbody></table></code></pre></div></div>
<p>With these functions, you now can have a subscription fetching script to
download the latest videos from your favorite channels:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="c">#!/bin/sh</span>
download_user ClojureTV 15
download_channel <span class="s2">"UCmEClzCBDx-vrt0GuSKBd9g"</span> 100
download_playlist <span class="s2">"PLqG7fA3EaMRPzL5jzd83tWcjCUH9ZUsbX"</span> 15
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Now, whenever you want to watch the latest videos, just run the above
script and you’ll get all of them on your local machine.</p>
<h2 id="tradeoffs">Tradeoffs</h2>
<h3 id="ive-made-it-for-myself-with-my-use-case-in-mind">I’ve made it for myself, with my use case in mind</h3>
<ol>
<li>
<p>Offline</p>
<p>My internet speed is somewhat reasonable<sup id="fnref:internet-speed" role="doc-noteref"><a href="#fn:internet-speed" class="footnote" rel="footnote">1</a></sup>, but it is really
unstable. Either at work or at home, it’s not uncommon to lose internet
access for 2 minutes 3~5 times every day, and stay completely offline for a
couple of hours once every week.</p>
<p>Working through the hassle of keeping a playlist on disk has paid
off many, many times. Sometimes I don’t even notice when the
connection drops for some minutes, because I’m watching a video and
working on some document, all on my local computer.</p>
<p>There’s also no automatic quality adjustment like in YouTube’s web
player: I always pick the highest quality, and it doesn’t change during the
video. For some types of content, like a podcast with some tiny
visual resources, this doesn’t change much. For other types of
content, like a keynote presentation with text written on the
slides, watching in 144p isn’t really an option.</p>
<p>If the internet connection drops during the video download,
youtube-dl will resume from where it stopped.</p>
<p>This is an offline-first benefit that I really like, and it works well
for me.</p>
</li>
<li>
<p>Sync the “seen” file</p>
<p>I already have a running instance of Nextcloud, so just dumping the
<code class="language-plaintext highlighter-rouge">youtube-dl-seen.conf</code> file inside Nextcloud was a no-brainer.</p>
<p>You could try putting it in a dedicated git repository, and wrap the
script with an autocommit after every run. If you ever had a merge
conflict, you’d simply accept all changes and then run:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span><span class="nb">sort</span> <span class="nt">-u</span> <span class="nt">-o</span> youtube-dl-seen.conf youtube-dl-seen.conf
</pre></td></tr></tbody></table></code></pre></div> </div>
<p>to tidy up the file.</p>
</li>
<li>
<p>Doesn’t work on mobile</p>
<p>My primary device for everyday use is my laptop, not my phone. It
works well for me this way.</p>
<p>Also, it’s harder to add ad-blockers to mobile phones, and most
mobile software still depends on Google’s and Apple’s blessing.</p>
<p>If you wish, you can sync the videos to the SD card periodically,
but that’s a bit of extra manual work.</p>
</li>
</ol>
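<p>The auto-commit wrapper suggested above for syncing the “seen” file can be quite small. A sketch, assuming the file lives in a dedicated repository (the path and commit message are arbitrary choices):</p>

```shell
#!/bin/sh
# sketch of an auto-commit wrapper for the "seen" file; the repository
# location defaults to an assumed ~/yt-sync directory
sync_seen() {
    repo="${1:-$HOME/yt-sync}"
    (
        cd "$repo" || exit 1
        git add youtube-dl-seen.conf
        # the commit is a no-op when the file didn't change
        git commit -m "Update youtube-dl-seen.conf" >/dev/null 2>&1 || true
    )
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">sync_seen</code> at the end of the subscription script gives you a history of every fetch for free.</p>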
<h3 id="the-good">The Good</h3>
<ol>
<li>
<p>Better privacy</p>
<p>We don’t even have to configure the ad-blocker to keep ads and
trackers away!</p>
<p>YouTube still has your IP address, so using a VPN is always a good
idea. However, a timing analysis would be able to identify you
(considering the current implementation).</p>
</li>
<li>
<p>No need to self-host</p>
<p>There’s no host that needs maintenance. Everything runs locally.</p>
<p>As long as you keep youtube-dl itself up to date and sync your
“seen” file, there’s little extra work to do.</p>
</li>
<li>
<p>Track your subscriptions with git</p>
<p>After creating a <code class="language-plaintext highlighter-rouge">subscriptions.sh</code> executable that downloads all
the videos, you can add it to git and use it to track metadata about
your subscriptions.</p>
</li>
</ol>
<h3 id="the-bad">The Bad</h3>
<ol>
<li>
<p>Maximum playlist size is your disk size</p>
<p>This is a good thing for getting a realistic view of your actual
“watch later” list. However I’ve run out of disk space many
times, and now I need to be more aware of how much is left.</p>
</li>
</ol>
<h3 id="the-ugly">The Ugly</h3>
<p>We can only avoid all the bad parts of YouTube with youtube-dl as long
as YouTube keeps the videos public and programmatically accessible. If
YouTube ever blocks that, we’d lose the ability to consume content this
way, but also lose confidence in YouTube as a healthy
repository of videos on the internet.</p>
<h2 id="going-beyond">Going beyond</h2>
<p>Since you’re running everything locally, here are some possibilities to
be explored:</p>
<h3 id="a-playlist-that-is-too-long-for-being-downloaded-all-at-once">A playlist that is too long for being downloaded all at once</h3>
<p>You can wrap the <code class="language-plaintext highlighter-rouge">download_playlist</code> function (let’s call the wrapper
<code class="language-plaintext highlighter-rouge">inc_download</code>) and instead of passing it a fixed number to the
<code class="language-plaintext highlighter-rouge">--playlist-end</code> parameter, you can store <code class="language-plaintext highlighter-rouge">$n</code> in a file
(something like <code class="language-plaintext highlighter-rouge">$HOME/.yt-db/$PLAYLIST_ID</code>) and increment it by <code class="language-plaintext highlighter-rouge">$step</code>
every time you run <code class="language-plaintext highlighter-rouge">inc_download</code>.</p>
<p>This way you can incrementally download videos from a huge playlist
without filling your disk with gigabytes of content all at once.</p>
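<p>A minimal sketch of that <code class="language-plaintext highlighter-rouge">inc_download</code> wrapper (the step size and the database directory are arbitrary choices, and it relies on the <code class="language-plaintext highlighter-rouge">download_playlist</code> function defined earlier):</p>

```shell
#!/bin/sh
# sketch of the incremental wrapper described above; STEP and the
# database directory are arbitrary choices
STEP=10

inc_download() {
    playlist_id="$1"
    db_dir="${YT_DB_DIR:-$HOME/.yt-db}"
    mkdir -p "$db_dir"
    # read the previous limit, defaulting to 0 on the first run
    n=$(cat "$db_dir/$playlist_id" 2>/dev/null || echo 0)
    n=$((n + STEP))
    echo "$n" > "$db_dir/$playlist_id"
    download_playlist "$playlist_id" "$n"
}
```

<p>Each run raises the effective <code class="language-plaintext highlighter-rouge">--playlist-end</code> by 10, so the playlist is consumed in slices instead of all at once.</p>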
<h3 id="multiple-computer-scenario">Multiple computer scenario</h3>
<p>The <code class="language-plaintext highlighter-rouge">download_playlist</code> function could be aware of the specific machine
that it is running on and apply specific policies depending on the
machine: always download everything; only download videos that aren’t
present anywhere else; <em>etc.</em></p>
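<p>One way to sketch such a policy is to key the number of videos on the machine’s hostname (the hostnames and limits here are made up):</p>

```shell
#!/bin/sh
# illustrative per-machine policy; hostnames and limits are made up
playlist_end_for_machine() {
    case "$(hostname)" in
        desktop) echo 100 ;;  # plenty of disk space: fetch more
        laptop)  echo 15  ;;
        *)       echo 5   ;;  # unknown machine: be conservative
    esac
}
```

<p>The <code class="language-plaintext highlighter-rouge">download</code> functions could then use <code class="language-plaintext highlighter-rouge">$(playlist_end_for_machine)</code> as their default <code class="language-plaintext highlighter-rouge">--playlist-end</code> value.</p>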
<h2 id="conclusion">Conclusion</h2>
<p>youtube-dl is a great tool to keep at hand. It covers a really large
range of video websites and works robustly.</p>
<p>Feel free to copy and modify this code, and
<a href="mailto:[email protected]">send me</a> suggestions of improvements or related
content.</p>
<h2 id="edit"><em>Edit</em></h2>
<p>2019-05-22: Fix spelling.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:internet-speed" role="doc-endnote">
<p>Considering how expensive it is and the many ways it could be
better, but also how much it has improved over the last years, I say it’s
reasonable. <a href="#fnref:internet-speed" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Verifying “npm ci” reproducibility (2018-08-01, updated 2019-05-22): https://euandre.org/2018/08/01/verifying-npm-ci-reproducibility.html
<p>When <a href="https://blog.npmjs.org/post/161081169345/v500">npm@5</a> came bringing
<a href="https://docs.npmjs.com/files/package-locks">package-locks</a> with it, I was
confused about the benefits it provided, since running <code class="language-plaintext highlighter-rouge">npm install</code> more than
once could resolve all the dependencies again and yield yet another fresh
<code class="language-plaintext highlighter-rouge">package-lock.json</code> file. The message saying “you should add this file to
version control” left me hesitant about what to do<sup id="fnref:package-lock-message" role="doc-noteref"><a href="#fn:package-lock-message" class="footnote" rel="footnote">1</a></sup>.</p>
<p>However the <a href="https://blog.npmjs.org/post/171556855892/introducing-npm-ci-for-faster-more-reliable">addition of <code class="language-plaintext highlighter-rouge">npm ci</code></a>
filled this gap: it’s a stricter variation of <code class="language-plaintext highlighter-rouge">npm install</code> which
guarantees that “<a href="https://docs.npmjs.com/files/package-lock.json">subsequent installs are able to generate identical trees</a>”. But are they
really identical? I could see that I didn’t have the same problems of
different installation outputs, but I didn’t know for <strong>sure</strong> if it
was really identical.</p>
<h2 id="computing-the-hash-of-a-directorys-content">Computing the hash of a directory’s content</h2>
<p>I quickly searched for a way to check for the hash signature of an
entire directory tree, but I couldn’t find one. I’ve made a poor
man’s <a href="https://en.wikipedia.org/wiki/Merkle_tree">Merkle tree</a>
implementation using <code class="language-plaintext highlighter-rouge">sha256sum</code> and a few piped commands at the
terminal:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
</pre></td><td class="rouge-code"><pre>merkle-tree <span class="o">()</span> <span class="o">{</span>
<span class="nb">dirname</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">1</span><span class="p">-.</span><span class="k">}</span><span class="s2">"</span>
<span class="nb">pushd</span> <span class="s2">"</span><span class="nv">$dirname</span><span class="s2">"</span>
find <span class="nb">.</span> <span class="nt">-type</span> f | <span class="se">\</span>
<span class="nb">sort</span> | <span class="se">\</span>
xargs <span class="nt">-I</span><span class="o">{}</span> <span class="nb">sha256sum</span> <span class="s2">"{}"</span> | <span class="se">\</span>
<span class="nb">sha256sum</span> | <span class="se">\</span>
<span class="nb">awk</span> <span class="s1">'{print $1}'</span>
<span class="nb">popd</span>
<span class="o">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Going through it line by line:</p>
<ul>
<li>#1 we define a Bash function called <code class="language-plaintext highlighter-rouge">merkle-tree</code>;</li>
<li>#2 it accepts a single argument: the directory to compute the
merkle tree from. If nothing is given, it runs on the current
directory (<code class="language-plaintext highlighter-rouge">.</code>);</li>
<li>#3 we go to the directory, so we don’t get different prefixes in
<code class="language-plaintext highlighter-rouge">find</code>’s output (like <code class="language-plaintext highlighter-rouge">../a/b</code>);</li>
<li>#4 we get all files from the directory tree. Since we’re using
<code class="language-plaintext highlighter-rouge">sha256sum</code> to compute the hash of the file contents, we need to
filter out folders from it;</li>
<li>#5 we need to sort the output, since different file systems and
<code class="language-plaintext highlighter-rouge">find</code> implementations may return files in different orders;</li>
<li>#6 we use <code class="language-plaintext highlighter-rouge">xargs</code> to compute the hash of each file individually
through <code class="language-plaintext highlighter-rouge">sha256sum</code>. Since a file may contain spaces we need to
escape it with quotes;</li>
<li>#7 we compute the hash of the combined hashes. Since <code class="language-plaintext highlighter-rouge">sha256sum</code>
output is formatted like <code class="language-plaintext highlighter-rouge"><hash> <filename></code>, it produces a
different final hash if a file ever changes name without changing
its content;</li>
<li>#8 we get the final hash output, excluding the <code class="language-plaintext highlighter-rouge"><filename></code> (which
is <code class="language-plaintext highlighter-rouge">-</code> in this case, aka <code class="language-plaintext highlighter-rouge">stdin</code>).</li>
</ul>
<h3 id="positive-points">Positive points:</h3>
<ol>
<li>it ignores timestamps: running it more than once on different installations
yields the same hash;</li>
<li>the name of the file is included in the final hash computation.</li>
</ol>
<h3 id="limitations">Limitations:</h3>
<ol>
<li>it ignores empty folders from the hash computation;</li>
<li>the implementation’s only goal is to represent using a digest
whether the content of a given directory is the same or not. Leaf
presence checking is obviously missing from it.</li>
</ol>
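<p>The first limitation is easy to demonstrate. Below is a variant of the function (renamed <code class="language-plaintext highlighter-rouge">merkle_tree</code>, and using a subshell instead of <code class="language-plaintext highlighter-rouge">pushd</code>/<code class="language-plaintext highlighter-rouge">popd</code> so its output can be captured cleanly):</p>

```shell
#!/bin/sh
# same pipeline as the merkle-tree function above, wrapped in a
# subshell so command substitution captures only the final hash
merkle_tree() {
    (cd "${1:-.}" &&
        find . -type f |
        sort |
        xargs -I{} sha256sum "{}" |
        sha256sum |
        awk '{print $1}')
}

dir=$(mktemp -d)
mkdir "$dir/content"
echo "one" > "$dir/content/file.txt"
h1=$(merkle_tree "$dir")
mkdir "$dir/empty"           # adding an empty folder...
h2=$(merkle_tree "$dir")
[ "$h1" = "$h2" ] && echo "...doesn't change the hash"
```

<p>Since <code class="language-plaintext highlighter-rouge">find . -type f</code> never sees the empty folder, the digest stays the same.</p>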
<h3 id="testing-locally-with-sample-data">Testing locally with sample data</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="rouge-code"><pre><span class="nb">mkdir</span> /tmp/merkle-tree-test/
<span class="nb">cd</span> /tmp/merkle-tree-test/
<span class="nb">mkdir</span> <span class="nt">-p</span> a/b/ a/c/ d/
<span class="nb">echo</span> <span class="s2">"one"</span> <span class="o">></span> a/b/one.txt
<span class="nb">echo</span> <span class="s2">"two"</span> <span class="o">></span> a/c/two.txt
<span class="nb">echo</span> <span class="s2">"three"</span> <span class="o">></span> d/three.txt
merkle-tree <span class="nb">.</span> <span class="c"># output is be343bb01fe00aeb8fef14a3e16b1c3d1dccbf86d7e41b4753e6ccb7dc3a57c3</span>
merkle-tree <span class="nb">.</span> <span class="c"># output still is be343bb01fe00aeb8fef14a3e16b1c3d1dccbf86d7e41b4753e6ccb7dc3a57c3</span>
<span class="nb">echo</span> <span class="s2">"four"</span> <span class="o">></span> d/four.txt
merkle-tree <span class="nb">.</span> <span class="c"># output is now b5464b958969ed81815641ace96b33f7fd52c20db71a7fccc45a36b3a2ae4d4c</span>
<span class="nb">rm </span>d/four.txt
merkle-tree <span class="nb">.</span> <span class="c"># output back to be343bb01fe00aeb8fef14a3e16b1c3d1dccbf86d7e41b4753e6ccb7dc3a57c3</span>
<span class="nb">echo</span> <span class="s2">"hidden-five"</span> <span class="o">></span> a/b/one.txt
merkle-tree <span class="nb">.</span> <span class="c"># output changed 471fae0d074947e4955e9ac53e95b56e4bc08d263d89d82003fb58a0ffba66f5</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>It seems to work for this simple test case.</p>
<p>You can try copying and pasting it to verify the hash signatures.</p>
<h2 id="using-merkle-tree-to-check-the-output-of-npm-ci">Using <code class="language-plaintext highlighter-rouge">merkle-tree</code> to check the output of <code class="language-plaintext highlighter-rouge">npm ci</code></h2>
<p><em>I’ve done all of the following using Node.js v8.11.3 and [email protected].</em></p>
<p>In this test case I’ll take the main repo of
<a href="https://lernajs.io/">Lerna</a><sup id="fnref:lerna-package-lock" role="doc-noteref"><a href="#fn:lerna-package-lock" class="footnote" rel="footnote">2</a></sup>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre><span class="nb">cd</span> /tmp/
git clone https://github.com/lerna/lerna.git
<span class="nb">cd </span>lerna/
git checkout 57ff865c0839df75dbe1974971d7310f235e1109
npm ci
merkle-tree node_modules/ <span class="c"># outputs 11e218c4ac32fac8a9607a8da644fe870a25c99821167d21b607af45699afafa</span>
<span class="nb">rm</span> <span class="nt">-rf</span> node_modules/
npm ci
merkle-tree node_modules/ <span class="c"># outputs 11e218c4ac32fac8a9607a8da644fe870a25c99821167d21b607af45699afafa</span>
npm ci <span class="c"># test if it also works with an existing node_modules/ folder</span>
merkle-tree node_modules/ <span class="c"># outputs 11e218c4ac32fac8a9607a8da644fe870a25c99821167d21b607af45699afafa</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Good job <code class="language-plaintext highlighter-rouge">npm ci</code> :)</p>
<p>Steps #6 and #9 take some time to run (21 seconds on my machine), but this
specific use case isn’t performance sensitive. The slowest step is
computing the hash of each individual file.</p>
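<p>Given that determinism, a small guard function could pin the expected tree hash and fail a CI build on any drift. The <code class="language-plaintext highlighter-rouge">check_tree</code> name below is made up for illustration; it only assumes a <code class="language-plaintext highlighter-rouge">merkle-tree</code> command on the <code class="language-plaintext highlighter-rouge">PATH</code>.</p>

```shell
# Hypothetical CI guard: compare a directory's merkle-tree hash against
# a pinned value and fail loudly on mismatch.
check_tree() {
  expected="$1"
  dir="$2"
  actual=$(merkle-tree "$dir") || return 1
  if [ "$actual" != "$expected" ]; then
    echo "hash mismatch in $dir: expected $expected, got $actual" >&2
    return 1
  fi
}

# usage sketch:
#   npm ci
#   check_tree 11e218c4ac32fac8a9607a8da644fe870a25c99821167d21b607af45699afafa node_modules/
```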
<h2 id="conclusion">Conclusion</h2>
<p><code class="language-plaintext highlighter-rouge">npm ci</code> really “generates identical trees”.</p>
<p>I’m not aware of any other existing solution for verifying the hash
signature of a directory. If you know any I’d
<a href="mailto:[email protected]">like to know</a>.</p>
<h2 id="edit"><em>Edit</em></h2>
<p>2019-05-22: Fix spelling.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:package-lock-message" role="doc-endnote">
<p>The
<a href="https://docs.npmjs.com/cli/install#description">documentation</a> claims <code class="language-plaintext highlighter-rouge">npm
install</code> is driven by the existing <code class="language-plaintext highlighter-rouge">package-lock.json</code>, but that’s actually
<a href="https://github.com/npm/npm/issues/17979#issuecomment-332701215">a little bit tricky</a>. <a href="#fnref:package-lock-message" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:lerna-package-lock" role="doc-endnote">
<p>Finding a big known repo that actually committed the
<code class="language-plaintext highlighter-rouge">package-lock.json</code> file was harder than I expected. <a href="#fnref:lerna-package-lock" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
EuAndreh[email protected]When npm@5 came bringing package-locks with it, I was confused about the benefits it provided, since running npm install more than once could resolve all the dependencies again and yield yet another fresh package-lock.json file. The message saying “you should add this file to version control” left me hesitant on what to do1. The ↩Running Guix on NixOS2018-07-17T00:00:00-03:002018-07-17T00:00:00-03:00https://euandre.org/2018/07/17/running-guix-on-nixos.html
<p>I wanted to run
Guix on a NixOS machine. Even though the Guix manual explains how to do it
<a href="https://www.gnu.org/software/guix/manual/en/html_node/Binary-Installation.html#Binary-Installation">step by step</a>, I needed a few extra steps to make it work properly.</p>
<p>I couldn’t just install GuixSD because my wireless network card
doesn’t have any free drivers (yet).</p>
<h2 id="creating-guixbuilder-users">Creating <code class="language-plaintext highlighter-rouge">guixbuilder</code> users</h2>
<p>Guix requires you to create non-root users that will be used to perform
the builds in the isolated environments.</p>
<p>The <a href="https://www.gnu.org/software/guix/manual/en/html_node/Build-Environment-Setup.html#Build-Environment-Setup">manual</a> already provides you with a ready-to-run (as root) command for
creating the build users:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre>groupadd <span class="nt">--system</span> guixbuild
<span class="k">for </span>i <span class="k">in</span> <span class="sb">`</span><span class="nb">seq</span> <span class="nt">-w</span> 1 10<span class="sb">`</span><span class="p">;</span>
<span class="k">do
</span>useradd <span class="nt">-g</span> guixbuild <span class="nt">-G</span> guixbuild <span class="se">\</span>
<span class="nt">-d</span> /var/empty <span class="nt">-s</span> <span class="sb">`</span>which nologin<span class="sb">`</span> <span class="se">\</span>
<span class="nt">-c</span> <span class="s2">"Guix build user </span><span class="nv">$i</span><span class="s2">"</span> <span class="nt">--system</span> <span class="se">\</span>
guixbuilder<span class="nv">$i</span><span class="p">;</span>
<span class="k">done</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>However, in my personal NixOS I have disabled <a href="https://nixos.org/nixos/manual/index.html#sec-user-management"><code class="language-plaintext highlighter-rouge">users.mutableUsers</code></a>, which
means that even if I run the above command, the users will be removed once
I rebuild my OS:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span><span class="nb">sudo </span>nixos-rebuild switch
<span class="o">(</span>...<span class="o">)</span>
removing user ‘guixbuilder7’
removing user ‘guixbuilder3’
removing user ‘guixbuilder10’
removing user ‘guixbuilder1’
removing user ‘guixbuilder6’
removing user ‘guixbuilder9’
removing user ‘guixbuilder4’
removing user ‘guixbuilder2’
removing user ‘guixbuilder8’
removing user ‘guixbuilder5’
<span class="o">(</span>...<span class="o">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Instead of enabling <code class="language-plaintext highlighter-rouge">users.mutableUsers</code> I could add the Guix users by
adding them to my system configuration:</p>
<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
</pre></td><td class="rouge-code"><pre><span class="p">{</span> <span class="nv">config</span><span class="p">,</span> <span class="nv">pkgs</span><span class="p">,</span> <span class="o">...</span><span class="p">}:</span>
<span class="p">{</span>
<span class="c"># ... NixOS usual config ellided ...</span>
<span class="nv">users</span> <span class="o">=</span> <span class="p">{</span>
<span class="nv">mutableUsers</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="nv">extraUsers</span> <span class="o">=</span>
<span class="kd">let</span>
<span class="nv">andrehUser</span> <span class="o">=</span> <span class="p">{</span>
<span class="nv">andreh</span> <span class="o">=</span> <span class="p">{</span>
<span class="c"># my custom user config</span>
<span class="p">};</span>
<span class="p">};</span>
<span class="nv">buildUser</span> <span class="o">=</span> <span class="p">(</span><span class="nv">i</span><span class="p">:</span>
<span class="p">{</span>
<span class="s2">"guixbuilder</span><span class="si">${</span><span class="nv">i</span><span class="si">}</span><span class="s2">"</span> <span class="o">=</span> <span class="p">{</span> <span class="c"># guixbuilder$i</span>
<span class="nv">group</span> <span class="o">=</span> <span class="s2">"guixbuild"</span><span class="p">;</span> <span class="c"># -g guixbuild</span>
<span class="nv">extraGroups</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"guixbuild"</span><span class="p">];</span> <span class="c"># -G guixbuild</span>
<span class="nv">home</span> <span class="o">=</span> <span class="s2">"/var/empty"</span><span class="p">;</span> <span class="c"># -d /var/empty</span>
<span class="nv">shell</span> <span class="o">=</span> <span class="nv">pkgs</span><span class="o">.</span><span class="nv">nologin</span><span class="p">;</span> <span class="c"># -s `which nologin`</span>
<span class="nv">description</span> <span class="o">=</span> <span class="s2">"Guix build user </span><span class="si">${</span><span class="nv">i</span><span class="si">}</span><span class="s2">"</span><span class="p">;</span> <span class="c"># -c "Guix build user $i"</span>
<span class="nv">isSystemUser</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span> <span class="c"># --system</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="p">);</span>
<span class="kn">in</span>
<span class="c"># merge all users</span>
<span class="nv">pkgs</span><span class="o">.</span><span class="nv">lib</span><span class="o">.</span><span class="nv">fold</span> <span class="p">(</span><span class="nv">str</span><span class="p">:</span> <span class="nv">acc</span><span class="p">:</span> <span class="nv">acc</span> <span class="o">//</span> <span class="nv">buildUser</span> <span class="nv">str</span><span class="p">)</span>
<span class="nv">andrehUser</span>
<span class="c"># for i in `seq -w 1 10`</span>
<span class="p">(</span><span class="kr">map</span> <span class="p">(</span><span class="nv">pkgs</span><span class="o">.</span><span class="nv">lib</span><span class="o">.</span><span class="nv">fixedWidthNumber</span> <span class="mi">2</span><span class="p">)</span> <span class="p">(</span><span class="kr">builtins</span><span class="o">.</span><span class="nv">genList</span> <span class="p">(</span><span class="nv">n</span><span class="p">:</span> <span class="nv">n</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="mi">10</span><span class="p">));</span>
<span class="nv">extraGroups</span><span class="o">.</span><span class="nv">guixbuild</span> <span class="o">=</span> <span class="p">{</span>
<span class="nv">name</span> <span class="o">=</span> <span class="s2">"guixbuild"</span><span class="p">;</span>
<span class="p">};</span>
<span class="p">};</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Here I used <code class="language-plaintext highlighter-rouge">fold</code> and the <code class="language-plaintext highlighter-rouge">//</code> operator to merge all of the
configuration sets into a single <code class="language-plaintext highlighter-rouge">extraUsers</code> value.</p>
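<p>The <code class="language-plaintext highlighter-rouge">map (pkgs.lib.fixedWidthNumber 2) (builtins.genList (n: n+1) 10)</code> part reproduces the zero-padded numbering of the manual’s <code class="language-plaintext highlighter-rouge">seq -w 1 10</code> loop, which is easy to check in a shell:</p>

```shell
# seq -w pads each number with leading zeros to a fixed width,
# matching what pkgs.lib.fixedWidthNumber 2 produces on the Nix side.
seq -w 1 10
# prints 01 02 03 04 05 06 07 08 09 10, one per line
```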
<h2 id="creating-the-systemd-service">Creating the <code class="language-plaintext highlighter-rouge">systemd</code> service</h2>
<p>One other thing missing was the <code class="language-plaintext highlighter-rouge">systemd</code> service.</p>
<p>First, I couldn’t just copy the <code class="language-plaintext highlighter-rouge">.service</code> file to <code class="language-plaintext highlighter-rouge">/etc</code>, since in NixOS
that folder isn’t writable. I also wanted the service to be better
integrated with the OS.</p>
<p>That was a little easier than creating the users: all I had to do was translate
the provided <a href="https://git.savannah.gnu.org/cgit/guix.git/tree/etc/guix-daemon.service.in?id=00c86a888488b16ce30634d3a3a9d871ed6734a2"><code class="language-plaintext highlighter-rouge">guix-daemon.service.in</code></a> configuration to an equivalent Nix
expression:</p>
<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="rouge-code"><pre><span class="c"># This is a "service unit file" for the systemd init system to launch
# 'guix-daemon'. Drop it in /etc/systemd/system or similar to have
# 'guix-daemon' automatically started.
</span>
<span class="nn">[Unit]</span>
<span class="py">Description</span><span class="p">=</span><span class="s">Build daemon for GNU Guix</span>
<span class="nn">[Service]</span>
<span class="py">ExecStart</span><span class="p">=</span><span class="s">/var/guix/profiles/per-user/root/guix-profile/bin/guix-daemon --build-users-group=guixbuild</span>
<span class="py">Environment</span><span class="p">=</span><span class="s">GUIX_LOCPATH=/root/.guix-profile/lib/locale</span>
<span class="py">RemainAfterExit</span><span class="p">=</span><span class="s">yes</span>
<span class="py">StandardOutput</span><span class="p">=</span><span class="s">syslog</span>
<span class="py">StandardError</span><span class="p">=</span><span class="s">syslog</span>
<span class="c"># See <https://lists.gnu.org/archive/html/guix-devel/2016-04/msg00608.html>.
# Some package builds (for example, [email protected]) may require even more than
# 1024 tasks.
</span><span class="py">TasksMax</span><span class="p">=</span><span class="s">8192</span>
<span class="nn">[Install]</span>
<span class="py">WantedBy</span><span class="p">=</span><span class="s">multi-user.target</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>This sample <code class="language-plaintext highlighter-rouge">systemd</code> configuration file became:</p>
<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="rouge-code"><pre><span class="nv">guix-daemon</span> <span class="o">=</span> <span class="p">{</span>
<span class="nv">enable</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="nv">description</span> <span class="o">=</span> <span class="s2">"Build daemon for GNU Guix"</span><span class="p">;</span>
<span class="nv">serviceConfig</span> <span class="o">=</span> <span class="p">{</span>
<span class="nv">ExecStart</span> <span class="o">=</span> <span class="s2">"/var/guix/profiles/per-user/root/guix-profile/bin/guix-daemon --build-users-group=guixbuild"</span><span class="p">;</span>
<span class="nv">Environment</span><span class="o">=</span><span class="s2">"GUIX_LOCPATH=/root/.guix-profile/lib/locale"</span><span class="p">;</span>
<span class="nv">RemainAfterExit</span><span class="o">=</span><span class="s2">"yes"</span><span class="p">;</span>
<span class="nv">StandardOutput</span><span class="o">=</span><span class="s2">"syslog"</span><span class="p">;</span>
<span class="nv">StandardError</span><span class="o">=</span><span class="s2">"syslog"</span><span class="p">;</span>
<span class="nv">TasksMax</span><span class="o">=</span> <span class="s2">"8192"</span><span class="p">;</span>
<span class="p">};</span>
<span class="nv">wantedBy</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"multi-user.target"</span> <span class="p">];</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<p>There you go! After running <code class="language-plaintext highlighter-rouge">sudo nixos-rebuild switch</code> I could get Guix
up and running:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span>guix package <span class="nt">-i</span> hello
The following package will be installed:
hello 2.10 /gnu/store/bihfrh609gkxb9dp7n96wlpigiv3krfy-hello-2.10
substitute: updating substitutes from <span class="s1">'https://mirror.hydra.gnu.org'</span>... 100.0%
The following derivations will be built:
/gnu/store/nznmdn6inpwxnlkrasydmda4s2vsp9hg-profile.drv
/gnu/store/vibqrvw4c8lacxjrkqyzqsdrmckv77kq-fonts-dir.drv
/gnu/store/hi8alg7wi0wgfdi3rn8cpp37zhx8ykf3-info-dir.drv
/gnu/store/cvkbp378cvfjikz7mjymhrimv7j12p0i-ca-certificate-bundle.drv
/gnu/store/d62fvxymnp95rzahhmhf456bsf0xg1c6-manual-database.drv
Creating manual page database...
1 entries processed <span class="k">in </span>0.0 s
2 packages <span class="k">in </span>profile
<span class="nv">$ </span>hello
Hello, world!
</pre></td></tr></tbody></table></code></pre></div></div>
<p>Some improvements to this approach are:</p>
<ol>
<li>looking into <a href="https://nixos.org/nixos/manual/index.html#sec-writing-modules">NixOS modules</a> and trying to bundle everything together
into a single logical unit;</li>
<li><a href="https://www.gnu.org/software/guix/manual/en/html_node/Requirements.html#Requirements">build Guix from source</a> and share the Nix store and daemon with Guix.</li>
</ol>
<p>Happy Guix/Nix hacking!</p>
EuAndreh[email protected]