<p>Project Mentat is a persistent, embedded knowledge base. It draws heavily on DataScript and Datomic. Mentat is intended to be a flexible relational (not key-value, not document-oriented) store that makes it easy to describe, grow, and reuse your domain schema.</p> <p>Modeling data using Mentat, 2018-04-17</p> <h1 id="worked-examples-of-modeling-data-using-mentat">Worked examples of modeling data using Mentat</h1> <p>Used correctly, Mentat makes it easy to grow your data model to accommodate new kinds of data, to synchronize data between devices, to share data among multiple consumers, and even to fix errors.</p> <p>But what does “correctly” mean?</p> <p>The following discussion and worked examples aim to help. In the discussion sections, schema examples use a simplified syntax.</p> <h2 id="principles">Principles</h2> <h3 id="think-about-the-domain-not-about-your-ui">Think about the domain, not about your UI</h3> <p>Given a set of mockups, or an MVP list of requirements, it’s easy to leap into defining a data model that supports exactly those things. In doing so we will likely end up with a data model that can’t support future capabilities, or that has crucial mismatches with the real world.</p> <p>For example, one might design a contact manager UI like the one in macOS, with a list of string fields for each person:</p> <ul> <li>First name</li> <li>Last name</li> <li>Address line 1</li> <li>Address line 2</li> <li>Phone</li> <li><em>etc.</em></li> </ul> <p>We might model this in Mentat as simple value properties:</p> <pre><code class="language-edn">[:person/name :db.type/string :db.cardinality/one] ; Incorrect: people can have many names!
[:person/home_address_line_one :db.type/string :db.cardinality/one] [:person/home_address_line_two :db.type/string :db.cardinality/one] [:person/home_city :db.type/string :db.cardinality/one] [:person/home_phone :db.type/string :db.cardinality/one] </code></pre> <p>or in JSON as a simple object:</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="s2">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Alice Smith"</span><span class="p">,</span><span class="w"> </span><span class="s2">"home_address_line_one"</span><span class="p">:</span><span class="w"> </span><span class="s2">"123 Main St"</span><span class="p">,</span><span class="w"> </span><span class="s2">"home_city"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Anywhere"</span><span class="p">,</span><span class="w"> </span><span class="s2">"home_phone"</span><span class="p">:</span><span class="w"> </span><span class="s2">"555-867-5309"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <p>We might realize that this proliferation of attributes is going in the wrong direction, and add nested structure:</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w"> </span><span class="s2">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Alice Smith"</span><span class="p">,</span><span class="w"> </span><span class="s2">"home_address"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s2">"line_one"</span><span class="p">:</span><span class="w"> </span><span class="s2">"123 Main St"</span><span class="p">,</span><span class="w"> </span><span class="s2">"city"</span><span class="p">:</span><span class="w"> </span><span 
class="s2">"Anywhere"</span><span class="w"> </span><span class="p">}}</span><span class="w"> </span></code></pre></div></div> <p>(Quick: is a home phone number a property of the address or of the person?)</p> <p>Or we might allow for people who have multiple addresses, or multiple homes:</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="s2">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Alice Smith"</span><span class="p">,</span><span class="w"> </span><span class="s2">"addresses"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="s2">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"home"</span><span class="p">,</span><span class="w"> </span><span class="s2">"line_one"</span><span class="p">:</span><span class="w"> </span><span class="s2">"123 Main St"</span><span class="p">,</span><span class="w"> </span><span class="s2">"city"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Anywhere"</span><span class="w"> </span><span class="p">}]}</span><span class="w"> </span></code></pre></div></div> <p>There are <a href="https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/">lots of reasons the address model is wrong</a>, and <a href="https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/">the same is true of names</a>. But when you think about it, even the <em>structure</em> of this is wrong.</p> <p>A <em>physical place</em>, for our purposes, has an address. (It might have more than one.)</p> <p>Each place might play a number of <em>roles</em> for a number of people: the same house is the home of everyone who lives there, and the same business address is one of the work addresses for each employee. If I work from home, my work and home addresses are the same.
It’s not quite true to say that an address is a “home”: an address <em>identifies</em> a <em>place</em>, and that place <em>is a home to a person</em>.</p> <p>But a typical contact application gets this wrong: the same <em>strings</em> are duplicated (flattened and denormalized) into the independent contact records of each person. If a business moves location, or its building is renamed, we must change the addresses of multiple contacts.</p> <p>A more correct model for this is <em>relational</em>:</p> <pre><code class="language-edn">[:person/name :db.type/string :db.cardinality/one]
[:person/lives_at :db.type/ref :db.cardinality/many]   ; Points to a place.
[:person/works_at :db.type/ref :db.cardinality/many]   ; Points to a place.
[:place/address :db.type/ref :db.cardinality/many]     ; A place can have multiple addresses.
[:place/owner :db.type/string :db.cardinality/one]     ; Used in Bob's data below.
[:address/mailing_address :db.type/string :db.cardinality/one
 :db.unique/identity]                                  ; Each address can be represented once as a string.
[:address/city :db.type/string :db.cardinality/one]    ; Perhaps this is useful?
</code></pre> <p>Imagine that Alice works from home, and Bob works at his office on South Street.
Alice’s data looks like this:</p> <pre><code class="language-edn">[{:person/name "Alice Smith"
  :person/lives_at "alice_home"
  :person/works_at "alice_home"}
 {:db/id "alice_home"
  :place/address "main_street_123"}
 {:db/id "main_street_123"
  :address/mailing_address "123 Main St, Anywhere, WA 12345, USA"
  :address/city "Anywhere"}]
</code></pre> <p>and Bob’s like this:</p> <pre><code class="language-edn">[{:person/name "Bob Salmon"
  :person/works_at "bob_office"}
 {:db/id "bob_office"
  :place/owner "Example Holdings LLC"
  :place/address "south_street_555"}
 {:db/id "south_street_555"
  :address/mailing_address "555 South St, Anywhere, WA 12345, USA"
  :address/city "Anywhere"}]
</code></pre> <p>Now suppose Alice (entity 1234) moves her business out of her house (entity 1235) into an office in Bob’s building, whose address is entity 1236. We simply break one relationship and add a new one, to a new place with the same address:</p> <pre><code class="language-edn">[[:db/retract 1234 :person/works_at 1235]  ; Alice no longer works at home.
 [:db/add 1234 :person/works_at "new_office"]
 {:db/id "new_office"
  :place/address [:address/mailing_address "555 South St, Anywhere, WA 12345, USA"]}]
</code></pre> <p>If the building is now renamed to “The Office Factory”, we can update its address (entity 1236) in one step, affecting both Alice’s and Bob’s offices:</p> <pre><code class="language-edn">[[:db/retract 1236 :address/mailing_address "555 South St, Anywhere, WA 12345, USA"]
 [:db/add 1236 :address/mailing_address "The Office Factory, South St, Anywhere, WA 12345, USA"]]
</code></pre> <p>You can see here how changes are minimal and correspond to real changes in the domain: two properties that help with syncing.
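</p> <p>The same idea can be sketched outside Mentat. The following minimal Python sketch is not Mentat code and uses no Mentat API: it assumes entities are plain dictionaries keyed by the tempids used above (<code class="highlighter-rouge">new_office</code>, <code class="highlighter-rouge">bob_office</code>, <code class="highlighter-rouge">south_street_555</code>), and the helpers <code class="highlighter-rouge">mailing_address_of</code> and <code class="highlighter-rouge">workers_at</code> are hypothetical. It shows that when both offices <em>refer</em> to one address entity, renaming the building is a single update that every office observes.</p> <pre><code class="language-python"># Hypothetical in-memory sketch of the normalized model above; not the Mentat API.
# Each entity is a dict, and references are stored as entity IDs (here, tempid strings).
addresses = {
    "south_street_555": {"mailing_address": "555 South St, Anywhere, WA 12345, USA"},
}
places = {
    "bob_office": {"address": "south_street_555"},
    "new_office": {"address": "south_street_555"},  # Alice's new office, same address
}
people = {
    "Alice Smith": {"works_at": ["new_office"]},
    "Bob Salmon": {"works_at": ["bob_office"]},
}


def mailing_address_of(place_id):
    """Follow the place's reference to its address entity and return the string."""
    return addresses[places[place_id]["address"]]["mailing_address"]


def workers_at(mailing_address):
    """Names of people whose workplace has this address, found by following references."""
    return sorted(
        name
        for name, person in people.items()
        for place_id in person["works_at"]
        if mailing_address_of(place_id) == mailing_address
    )


# Renaming the building changes exactly one entity: the shared address.
addresses["south_street_555"]["mailing_address"] = (
    "The Office Factory, South St, Anywhere, WA 12345, USA"
)

print(mailing_address_of("new_office"))
# The Office Factory, South St, Anywhere, WA 12345, USA
print(workers_at("The Office Factory, South St, Anywhere, WA 12345, USA"))
# ['Alice Smith', 'Bob Salmon']
</code></pre> <p>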
There is no duplication of strings.</p> <p>We can find everyone who works at The Office Factory in a simple query, without comparing strings across ‘records’:</p> <pre><code class="language-edn">[:find ?name
 :where
 [?address :address/mailing_address "The Office Factory, South St, Anywhere, WA 12345, USA"]
 [?office :place/address ?address]
 [?person :person/works_at ?office]
 [?person :person/name ?name]]
</code></pre> <p>Let’s say we later want to model move-in and move-out dates, which are useful for employment records and immigration paperwork!</p> <p>Trying to add this to the JSON model is an exercise in frustration, because there is no stable way to identify people or places! (Go ahead, try it.)</p> <p>Doing it in Mentat simply requires defining a small bit of vocabulary:</p> <pre><code class="language-edn">[:place.change/person :db.type/ref :db.cardinality/many]
[:place.change/from :db.type/ref :db.cardinality/one]        ; optional
[:place.change/to :db.type/ref :db.cardinality/one]          ; optional
[:place.change/role :db.type/ref :db.cardinality/one]        ; :person/lives_at or :person/works_at
[:place.change/on :db.type/instant :db.cardinality/one]
[:place.change/reason :db.type/string :db.cardinality/one]   ; optional
</code></pre> <p>so we can describe Alice’s office move (1237 is the new office place created by her move above):</p> <pre><code class="language-edn">[{:place.change/person 1234
  :place.change/from 1235
  :place.change/to 1237
  :place.change/role :person/works_at
  :place.change/on #inst "2018-02-02T13:00:00Z"}]
</code></pre> <p>or Jane’s sale of her holiday home:</p> <pre><code class="language-edn">[{:place.change/person 2468
  :place.change/reason "Sale"
  :place.change/from 1235
  :place.change/role :person/lives_at
  :place.change/on #inst "2018-08-12T14:00:00Z"}]
</code></pre> <p>Note that we don’t need to repeat the addresses, we don’t need to change the existing data, and we don’t need to complicate matters for existing code.</p> <p>Now we can find everyone who moved offices in February 2018:</p> <pre><code class="language-edn">[:find ?name
:where
 [?move :place.change/role :person/works_at]
 [?move :place.change/on ?on]
 [(&gt;= ?on #inst "2018-02-01T00:00:00Z")]
 [(&lt; ?on #inst "2018-03-01T00:00:00Z")]
 [?move :place.change/person ?person]
 [?person :person/name ?name]]
</code></pre> <h2 id="tend-towards-recording-observations-not-changing-state">Tend towards recording observations, not changing state</h2> <p>These principles are all different aspects of normalization.</p> <p>Introducing fine-grained entities to represent data pushes us towards immutability: a change increasingly consists of pointing an ‘arrow’ at a different immutable entity, rather than re-describing a mutable entity.</p> <p>In the previous example we introduced <em>places</em> and <em>addresses</em>. Places and addresses themselves rarely change, which lets us confine most of the churn in our data to the meaningful relationships between entities.</p> <p>Modeling browser history shows another example of this approach.</p> <p>Firefox’s representation of history is, at its core, relatively simple: just two tables, a little like this:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">history</span> <span class="p">(</span> <span class="n">id</span> <span class="n">INTEGER</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span> <span class="n">guid</span> <span class="n">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">UNIQUE</span><span class="p">,</span> <span class="n">url</span> <span class="n">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">UNIQUE</span><span class="p">,</span> <span class="n">title</span> <span class="n">TEXT</span> <span class="p">);</span> <span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">visits</span> <span class="p">(</span> <span
class="n">id</span> <span class="n">INTEGER</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span> <span class="n">history_id</span> <span class="n">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">REFERENCES</span> <span class="n">history</span><span class="p">(</span><span class="n">id</span><span class="p">),</span> <span class="k">type</span> <span class="n">TINYINT</span><span class="p">,</span> <span class="k">timestamp</span> <span class="n">INTEGER</span> <span class="p">);</span> </code></pre></div></div> <p>Each time a URL is visited, an entry is added to the <code class="highlighter-rouge">visits</code> table and a row is added or updated in <code class="highlighter-rouge">history</code>. The title of the fetched page is used to update <code class="highlighter-rouge">history.title</code>, so that <code class="highlighter-rouge">history.title</code> always represents the most recently encountered title.</p> <p>This works fine until more features are added.</p> <h3 id="forgetting">Forgetting</h3> <p>Browsers often have some capacity for deleting history. Sometimes this takes the form of an explicit ‘forget’ operation, such as “Forget the last five minutes of browsing”. Deleting the visits themselves is fine: <code class="highlighter-rouge">DELETE FROM visits WHERE timestamp &gt;= ?</code>, binding the time five minutes ago. But the one mutable field in the data model, the title, trips us up: each visit overwrote <code class="highlighter-rouge">history.title</code>, so the title we had before those five minutes is gone, and we can’t roll it back.</p> <h3 id="syncing">Syncing</h3> <p>Even if you are using Mentat or Datomic, and can turn to the log to reconstruct the old state, a mutable title on <code class="highlighter-rouge">history</code> will cause conflicts when syncing: one side’s observed titles will ‘lose’ and be discarded to resolve the conflict. That’s not right: those titles <em>were seen</em>.
Unlike a conflicting counter or flag, these weren’t abortive, temporary states; they were <em>observations of the world</em>, so there shouldn’t be a winner and a loser.</p> <h3 id="containers">Containers</h3> <p>The true data model becomes apparent when we consider containers. Containers are a Firefox feature to sandbox the cookies, site data, and history of different named sub-profiles. You can have a container just for Facebook, or one for your banking; those Facebook cookies won’t follow you around the web in your ‘personal’ container. You can simultaneously use separate Gmail accounts for work and personal email.</p> <p>When Firefox added container support, it did so by annotating visits with a <code class="highlighter-rouge">container</code>:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">visits</span> <span class="p">(</span> <span class="n">id</span> <span class="n">INTEGER</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span> <span class="n">history_id</span> <span class="n">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">REFERENCES</span> <span class="n">history</span><span class="p">(</span><span class="n">id</span><span class="p">),</span> <span class="k">type</span> <span class="n">TINYINT</span><span class="p">,</span> <span class="k">timestamp</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">container</span> <span class="n">INTEGER</span> <span class="p">);</span> </code></pre></div></div> <p>This means that each container <em>competes for the title on <code class="highlighter-rouge">history</code></em>. 
If you visit <code class="highlighter-rouge">facebook.com</code> in your usual logged-in container, the browser will run something like this SQL:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>UPDATE history SET title = '(2) Facebook' WHERE url = 'https://www.facebook.com'; </code></pre></div></div> <p>If you visit it in the wrong container by mistake, you’ll get the Facebook login page, and Firefox will run something like:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>UPDATE history SET title = 'Facebook - Log In or Sign Up' WHERE url = 'https://www.facebook.com'; </code></pre></div></div> <p>Next time you open your history, <em>you’ll see the login page title, even if you had a logged-in <code class="highlighter-rouge">facebook.com</code> session open in another tab</em>. There’s no way to differentiate between the containers’ views.</p> <p>The correct data model for history is:</p> <ul> <li>Users visit a URL on a device in a container.</li> <li>Pages are fetched as a result of a visit (or dynamically after load).
Pages can embed media and other resources.</li> <li>Pages, being HTML, have titles.</li> <li>Pages, titles, and visits are all <em>observations</em>, and as such cannot conflict.</li> <li>The <em>last observed</em> title to show for a URL is an <em>aggregation</em> of those events.</li> </ul> <p>The entire notion of a history table — a concept centered on the URL — having a title is a subtly incorrect choice that causes problems with more modern browser features.</p> <p>Modeled in Mentat:</p> <pre><code class="language-edn">[{:db/ident :visit/visitedOnDevice :db/valueType :db.type/ref :db/cardinality :db.cardinality/one} {:db/ident :visit/visitAt :db/valueType :db.type/instant :db/cardinality :db.cardinality/one} {:db/ident :site/visit :db/valueType :db.type/ref :db/isComponent true :db/cardinality :db.cardinality/many} {:db/ident :site/url :db/valueType :db.type/string :db/unique :db.unique/identity :db/cardinality :db.cardinality/one :db/index true} {:db/ident :visit/page :db/valueType :db.type/ref :db/isComponent true ; Debatable. 
:db/cardinality :db.cardinality/one} {:db/ident :page/title :db/valueType :db.type/string :db/fulltext true :db/index true :db/cardinality :db.cardinality/one} {:db/ident :visit/container :db/valueType :db.type/ref :db/cardinality :db.cardinality/one}] </code></pre> <p>Create some containers:</p> <pre><code class="language-edn">[{:db/ident :container/facebook} {:db/ident :container/personal}] </code></pre> <p>Add a device:</p> <pre><code class="language-edn">[{:db/ident :device/my-desktop}] </code></pre> <p>Visit Facebook in each container:</p> <pre><code class="language-edn">[{:visit/visitedOnDevice :device/my-desktop
  :visit/visitAt #inst "2018-04-06T18:46:00Z"
  :visit/container :container/facebook
  :db/id "fbvisit"
  :visit/page "fbpage"}
 {:db/id "fbpage"
  :page/title "(2) Facebook"}
 {:site/url "https://www.facebook.com"
  :site/visit "fbvisit"}]
</code></pre> <pre><code class="language-edn">[{:visit/visitedOnDevice :device/my-desktop
  :visit/visitAt #inst "2018-04-06T18:46:02Z"
  :visit/container :container/personal
  :db/id "personalvisit"
  :visit/page "personalpage"}
 {:db/id "personalpage"
  :page/title "Facebook - Log In or Sign Up"}
 {:site/url "https://www.facebook.com"
  :site/visit "personalvisit"}]
</code></pre> <p>Now we can show the title from the latest visit in a given container:</p> <pre><code class="language-edn">.q [:find (the ?title) (max ?visitDate)
    :where
    [?site :site/url "https://www.facebook.com"]
    [?site :site/visit ?visit]
    [?visit :visit/container :container/facebook]
    [?visit :visit/visitAt ?visitDate]
    [?visit :visit/page ?page]
    [?page :page/title ?title]]
=&gt;
| (the ?title) | (max ?visitDate) |
--- ---
| "(2) Facebook" | 2018-04-06 18:46:00 UTC |
--- ---

.q [:find (the ?title) (max ?visitDate)
    :where
    [?site :site/url "https://www.facebook.com"]
    [?site :site/visit ?visit]
    [?visit :visit/container :container/personal]
    [?visit :visit/visitAt ?visitDate]
    [?visit :visit/page ?page]
    [?page :page/title ?title]]
=&gt;
| (the ?title) | (max ?visitDate) |
--- ---
| "Facebook - Log In or Sign Up" | 2018-04-06 18:46:02 UTC |
--- ---
</code></pre> <h2 id="normalize-you-can-always-denormalize-for-use">Normalize; you can always denormalize for use.</h2> <p>To come.</p> <h2 id="use-unique-identities-and-cardinality-one-attributes-to-make-merging-happen-during-a-sync">Use unique identities and cardinality-one attributes to make merging happen during a sync.</h2> <p>To come.</p> <h2 id="reify-to-handle-conflict-and-atomicity">Reify to handle conflict and atomicity.</h2> <p>To come.</p>