<p>Building a Vector Database in Ruby Using Hash and PStore (ABuisman.com, 2024-09-29, https://abuisman.com/posts/ruby/vector-db-hash)</p> <p>Today, I’m sharing a very cool piece of code I’ve written: a vector database in pure Ruby that you could use for tiny datasets or to quickly test some things in your console.</p> <h2 id="first-why-would-we-need-a-vector-database">First, why would we need a vector database?</h2> <p><strong>A quick aside</strong>: what are vector databases, you ask? Why would I want to use one? Vector databases have recently become very popular in AI because they allow us to store “embeddings” and find other “embeddings” that are close to them. “Embeddings” are vectors that capture the meaning of text or images as numbers. They are designed in such a way that if two texts are similar in meaning, their vectors are also close to each other.</p> <p>So, the vectors for “I like pizza” and “I like souvlaki” are more “similar” to each other than either is to the vector for “The car has a flat tyre.”</p> <p>In this post, I will show you how we do these calculations and what these embeddings look like.</p> <h2 id="why-build-one-in-ruby">Why build one in Ruby?</h2> <p>I needed to play with some semantic search queries in a console without setting up a dedicated vector database. So I thought: if we leave out a few things that a ‘real’ vector database does, we can probably write a bit of Ruby code that stores documents and their vectors in an Array that we can loop through to find the items closest to our query.</p> <p>Very quickly, I thought about inheriting from Hash. 
Hash is already sort of like a database: it lets us store items under a key (their ID), and it is Enumerable, so we can loop through the items.</p> <p>However, a Hash only lives in memory, so its data is gone as soon as your console session ends. So, the next step is to port the Hash database to PStore so that everything is persisted to a file.</p> <p>I’ll start with Hash, though, and before that, with generating embeddings.</p> <h2 id="generating-embeddings">Generating embeddings</h2> <p>How do we get these embeddings? OpenAI has a practical API for this, and we can use the great <a href="https://github.com/alexrudall/ruby-openai">ruby-openai</a> gem by <strong>Alex Rudall</strong> to generate them.</p> <p>Let’s create a little wrapper around it to isolate the functionality:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app/lib/openai_client.rb</span> <span class="k">class</span> <span class="nc">OpenaiClient</span> <span class="k">def</span> <span class="nf">embedding</span><span class="p">(</span><span class="n">text</span><span class="p">,</span> <span class="ss">model: </span><span class="s1">'text-embedding-3-small'</span><span class="p">)</span> <span class="n">response</span> <span class="o">=</span> <span class="n">api_client</span><span class="p">.</span><span class="nf">embeddings</span><span class="p">(</span> <span class="ss">parameters: </span><span class="p">{</span> <span class="ss">model: </span><span class="n">model</span><span class="p">,</span> <span class="ss">input: </span><span class="n">text</span><span class="p">,</span> <span class="p">},</span> <span class="p">)</span> <span class="n">response</span><span class="p">.</span><span class="nf">dig</span><span class="p">(</span><span class="s2">"data"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s2">"embedding"</span><span class="p">)</span> <span class="k">end</span> <span 
class="kp">private</span> <span class="k">def</span> <span class="nf">api_client</span> <span class="vi">@api_client</span> <span class="o">||=</span> <span class="no">OpenAI</span><span class="o">::</span><span class="no">Client</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">access_token: </span><span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">credentials</span><span class="p">.</span><span class="nf">openai</span><span class="p">[</span><span class="ss">:api_key</span><span class="p">])</span> <span class="k">end</span> <span class="k">end</span> </code></pre></div></div> <p>We use the <code class="highlighter-rouge">text-embedding-3-small</code> model, which returns vectors with 1536 dimensions, that is, arrays of 1536 floating-point numbers.</p> <p>Replace <code class="highlighter-rouge">Rails.application.credentials.openai[:api_key]</code> with whatever way you prefer to store your API keys.</p> <p>Let’s take it for a spin now.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">oc</span> <span class="o">=</span> <span class="no">OpenaiClient</span><span class="p">.</span><span class="nf">new</span> <span class="n">oc</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="s2">"Crime"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="o">-</span><span class="mf">0.001625204</span><span class="p">,</span> <span class="mf">0.01763012</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.008460364</span><span class="p">,</span> <span class="mf">0.09889614</span><span class="p">,</span> <span class="mf">0.026863838</span><span class="p">,</span> <span class="c1"># ...</span> <span class="n">oc</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span 
class="s2">"Criminal"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="mf">0.029961154</span><span class="p">,</span> <span class="mf">0.00703114</span><span class="p">,</span> <span class="mf">0.0073463763</span><span class="p">,</span> <span class="mf">0.0476418</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.00031073904</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.0002184382</span><span class="p">,</span> <span class="c1"># ...</span> </code></pre></div></div> <p>Hmm, as they say in Germany, I only understand train station; that is to say, I don’t understand, but the algorithms probably do.</p> <p>Let’s work with something easier to read:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">godfather</span> <span class="o">=</span> <span class="n">oc</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="s2">"A coming-of-age story of a violent mafia son and his father's unhealthy obsession with oranges."</span><span class="p">)</span> <span class="n">meet</span> <span class="o">=</span> <span class="n">oc</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="s2">"A funny meeting between a father and a man who can milk just about anything with nipples, not having seen this is a crime."</span><span class="p">)</span> <span class="n">big</span> <span class="o">=</span> <span class="n">oc</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="s2">"A man gets his rug soiled by German nihilists who have no regard of the law."</span><span class="p">)</span> <span class="n">query</span> <span class="o">=</span> <span class="n">oc</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="s2">"Crime movie"</span><span class="p">)</span> </code></pre></div></div> 
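<p>Before comparing them, it can help to sanity-check what actually came back. The <code class="highlighter-rouge">describe_embedding</code> helper below is hypothetical (it is not part of the wrapper above): it reports how many dimensions an embedding has and its magnitude (length). OpenAI documents its embeddings as normalised to length 1, so for <code class="highlighter-rouge">text-embedding-3-small</code> you would expect 1536 dimensions and a magnitude of about 1.0. To keep the sketch runnable without an API key, it uses a made-up unit vector instead of a real response.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper (not part of the original OpenaiClient wrapper):
# summarises an embedding, an Array of Floats, for a quick sanity check.
def describe_embedding(embedding)
  # Magnitude (Euclidean length): square root of the sum of squared components.
  vector_length = Math.sqrt(embedding.sum { |component| component**2 })
  { dimensions: embedding.size, magnitude: vector_length.round(6) }
end

# Made-up stand-in for a real embedding: 1536 components, all zero except
# 0.6 and 0.8, which makes it a vector of length 1 (0.36 + 0.64 = 1).
fake_embedding = Array.new(1536, 0.0)
fake_embedding[0] = 0.6
fake_embedding[1] = 0.8

info = describe_embedding(fake_embedding)
puts "dimensions: #{info[:dimensions]}, magnitude: #{info[:magnitude]}"
# prints: dimensions: 1536, magnitude: 1.0
</code></pre></div></div> <p>Passing in one of the real arrays from above, for example <code class="highlighter-rouge">describe_embedding(godfather)</code>, is a quick way to confirm the dimension count of the model you are working with.</p>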
<p>Now we have a few arrays, and we need to calculate the cosine similarity between them. To do this, we need a few methods, starting with the dot product:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">def</span> <span class="nf">dot_product</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">vector1</span><span class="p">.</span><span class="nf">zip</span><span class="p">(</span><span class="n">vector2</span><span class="p">).</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="o">|</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span> <span class="p">}.</span><span class="nf">sum</span> <span class="k">end</span> </code></pre></div></div> <p>The dot product of two vectors is the sum of the products of their corresponding components. For vectors a and b of length n, it is calculated like this:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a · b = a[0] × b[0] + a[1] × b[1] + ... + a[n-1] × b[n-1] 
</code></pre></div></div> <p>In Ruby, we can pair up the elements of two arrays with the <code class="highlighter-rouge">zip</code> method:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vec1</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span> <span class="n">vec2</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">]</span> <span class="nb">puts</span> <span class="n">vec1</span><span class="p">.</span><span class="nf">zip</span><span class="p">(</span><span class="n">vec2</span><span class="p">).</span><span class="nf">inspect</span> <span class="o">=&gt;</span> <span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">6</span><span class="p">],</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">7</span><span class="p">]]</span> </code></pre></div></div> <p>We map over this zipped array, multiply each pair (<code class="highlighter-rouge">a * b</code>), and then sum the products. For <code class="highlighter-rouge">vec1</code> and <code class="highlighter-rouge">vec2</code>, that is 1 × 4 + 2 × 5 + 3 × 6 + 4 × 7 = 4 + 10 + 18 + 28 = 60.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">puts</span> <span class="n">dot_product</span><span class="p">(</span><span class="n">vec1</span><span class="p">,</span> <span class="n">vec2</span><span 
class="p">)</span> <span class="o">=&gt;</span> <span class="mi">60</span> </code></pre></div></div> <p>For the OpenAI vectors, the dot product alone would be enough, because they are normalised: every vector has a magnitude (length) of 1. When both magnitudes are 1, the dot product is exactly the cosine similarity. We can’t assume this for every embedding model, though, so we will go on and calculate the full cosine similarity. For that, we first need the magnitude of a vector: the square root of the sum of its squared components.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">def</span> <span class="nf">magnitude</span><span class="p">(</span><span class="n">vector</span><span class="p">)</span> <span class="no">Math</span><span class="p">.</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">vector</span><span class="p">.</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">component</span><span class="o">|</span> <span class="n">component</span><span class="o">**</span><span class="mi">2</span> <span class="p">}.</span><span class="nf">reduce</span><span class="p">(:</span><span class="o">+</span><span class="p">))</span> <span class="k">end</span> </code></pre></div></div> <p>Here are the magnitudes of the un-normalised vectors <code class="highlighter-rouge">vec1</code> and <code class="highlighter-rouge">vec2</code>:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">puts</span> <span class="s2">"Magnitude vec1: </span><span class="si">#{</span><span class="n">magnitude</span><span class="p">(</span><span class="n">vec1</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span> <span class="nb">puts</span> <span class="s2">"Magnitude vec2: </span><span class="si">#{</span><span class="n">magnitude</span><span class="p">(</span><span class="n">vec2</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span> <span class="o">=&gt;</span> <span 
class="no">Magnitude</span> <span class="ss">vec1: </span><span class="mf">5.477225575051661</span> <span class="no">Magnitude</span> <span class="ss">vec2: </span><span class="mf">11.224972160321824</span> </code></pre></div></div> <p>Cosine similarity divides the dot product by the product of the two magnitudes: cos(a, b) = (a · b) / (|a| × |b|). This cancels out the lengths of the vectors, so the result depends only on their direction, that is, on the angle between them.</p> <p>Here is how we calculate it:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">def</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">dot_prod</span> <span class="o">=</span> <span class="n">dot_product</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">magnitude1</span> <span class="o">=</span> <span class="n">magnitude</span><span class="p">(</span><span class="n">vector1</span><span class="p">)</span> <span class="n">magnitude2</span> <span class="o">=</span> <span class="n">magnitude</span><span class="p">(</span><span class="n">vector2</span><span class="p">)</span> <span class="n">dot_prod</span> <span class="o">/</span> <span class="p">(</span><span class="n">magnitude1</span> <span class="o">*</span> <span class="n">magnitude2</span><span class="p">)</span> <span class="k">end</span> </code></pre></div></div> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">puts</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">vec1</span><span class="p">,</span> <span class="n">vec2</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="mf">0.9759000729485332</span> </code></pre></div></div> <p>Now let’s try it on our movies dataset with the query: “Crime 
movie.”</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">meet</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="mf">0.22359566594195673</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">big</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="mf">0.21230758115230425</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">godfather</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="mf">0.3426902243058039</span> </code></pre></div></div> <p>Firstly, these numbers are hard to interpret. Are they close or far apart? These vectors live in a 1536-dimensional space, after all. Bear with me a little bit: further down, we will try many more queries, which will make things clearer.</p> <p>We can see here that The Godfather is the closest, probably because it mentions ‘story’ and ‘mafia’. The Big Lebowski comes last, just behind Meet the Parents. Meet the Parents doesn’t describe an actual crime, but its closing saying contains the word ‘crime’, which probably gives it a small boost.</p> <p>Nice! It’s roughly what we expected. 
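</p> <p>Because these scores are hard to read on their own, it helps to run the same formula on tiny hand-made vectors where the answer is obvious. Cosine similarity ranges from -1 to 1: 1 means two vectors point in exactly the same direction, 0 means they are perpendicular, and -1 means they point in opposite directions. The two-dimensional vectors below are toy values, not real embeddings, and the <code class="highlighter-rouge">cosine_similarity</code> here is a compact restatement of the three methods above so that the snippet runs on its own.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compact restatement of dot_product, magnitude and cosine_similarity above.
def cosine_similarity(vector1, vector2)
  dot = vector1.zip(vector2).sum { |a, b| a * b }
  magnitude1 = Math.sqrt(vector1.sum { |x| x**2 })
  magnitude2 = Math.sqrt(vector2.sum { |x| x**2 })
  dot / (magnitude1 * magnitude2)
end

# Toy 2-D vectors (not real embeddings) whose angles are easy to picture.
puts cosine_similarity([1.0, 0.0], [1.0, 0.0])          # same direction: 1.0
puts cosine_similarity([1.0, 0.0], [0.0, 1.0])          # perpendicular: 0.0
puts cosine_similarity([1.0, 0.0], [-1.0, 0.0])         # opposite: -1.0
puts cosine_similarity([1.0, 1.0], [1.0, 0.0]).round(4) # 45 degrees apart: 0.7071
puts cosine_similarity([1.0, 2.0], [2.0, 4.0]).round(4) # twice as long, same direction: 1.0
</code></pre></div></div> <p>The last line shows that making a vector longer does not change its score; only the direction counts.</p> <p>Back to our movies. 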
Let’s try another query:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query</span> <span class="o">=</span> <span class="n">oc</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="s2">"Gangster"</span><span class="p">)</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">meet</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="mf">0.18473859726676306</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">big</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="mf">0.21540343497659717</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">godfather</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="mf">0.3779570884570852</span> </code></pre></div></div> <p>The word <code class="highlighter-rouge">mafia</code> probably has a stronger effect here, because members of the mafia are also called gangsters: The Godfather’s lead over the runner-up grows from about 0.12 to about 0.16.</p> <h2 id="building-the-database">Building the database</h2> <p>Now that we’ve figured out how to generate our embeddings, let’s start with the Hash-based database.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">VectorDb</span> <span class="o">&lt;</span> <span class="no">Hash</span> <span class="k">def</span> <span class="nf">add_item</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">content</span><span class="p">:,</span> <span class="ss">embedding: </span><span class="kp">nil</span><span class="p">)</span> <span class="nb">self</span><span class="p">[</span><span class="nb">id</span><span 
class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="n">content</span><span class="p">:,</span> <span class="ss">embedding: </span><span class="p">}</span> <span class="k">end</span> <span class="k">end</span> </code></pre></div></div> <p>This stores an item under a given ID, along with its content and embedding. The <code class="highlighter-rouge">{ content:, embedding: }</code> shorthand requires Ruby 3.1 or later.</p> <p>Next, we want to generate the embedding automatically when one isn’t passed in; for this, we need the OpenAI client:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">VectorDb</span> <span class="o">&lt;</span> <span class="no">Hash</span> <span class="c1"># ...</span> <span class="k">def</span> <span class="nf">add_item</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">content</span><span class="p">:,</span> <span class="ss">embedding: </span><span class="kp">nil</span><span class="p">)</span> <span class="n">embedding</span> <span class="o">||=</span> <span class="n">openai_client</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="n">content</span><span class="p">)</span> <span class="nb">self</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="n">content</span><span class="p">:,</span> <span class="ss">embedding: </span><span class="p">}</span> <span class="k">end</span> <span class="kp">private</span> <span class="k">def</span> <span class="nf">openai_client</span> <span class="vi">@openai_client</span> <span class="o">||=</span> <span class="no">OpenaiClient</span><span class="p">.</span><span class="nf">new</span> <span class="k">end</span> <span class="k">end</span> </code></pre></div></div> <p>We add the same calculation methods as before to measure how similar two vectors are:</p> <div 
class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">VectorDb</span> <span class="o">&lt;</span> <span class="no">Hash</span> <span class="c1"># ...</span> <span class="k">def</span> <span class="nf">dot_product</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">vector1</span><span class="p">.</span><span class="nf">zip</span><span class="p">(</span><span class="n">vector2</span><span class="p">).</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="o">|</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span> <span class="p">}.</span><span class="nf">reduce</span><span class="p">(:</span><span class="o">+</span><span class="p">)</span> <span class="k">end</span> <span class="k">def</span> <span class="nf">magnitude</span><span class="p">(</span><span class="n">vector</span><span class="p">)</span> <span class="no">Math</span><span class="p">.</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">vector</span><span class="p">.</span><span class="nf">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">component</span><span class="o">|</span> <span class="n">component</span><span class="o">**</span><span class="mi">2</span> <span class="p">}.</span><span class="nf">reduce</span><span class="p">(:</span><span class="o">+</span><span class="p">))</span> <span class="k">end</span> <span class="k">def</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">dot_prod</span> <span class="o">=</span> <span class="n">dot_product</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> 
<span class="n">vector2</span><span class="p">)</span> <span class="n">magnitude1</span> <span class="o">=</span> <span class="n">magnitude</span><span class="p">(</span><span class="n">vector1</span><span class="p">)</span> <span class="n">magnitude2</span> <span class="o">=</span> <span class="n">magnitude</span><span class="p">(</span><span class="n">vector2</span><span class="p">)</span> <span class="n">dot_prod</span> <span class="o">/</span> <span class="p">(</span><span class="n">magnitude1</span> <span class="o">*</span> <span class="n">magnitude2</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> </code></pre></div></div> <p>Now, let’s implement a <code class="highlighter-rouge">search</code> method, where it all comes together.</p> <p>The idea is to generate an embedding for the query, loop through the stored embeddings, calculate each one’s similarity to the query embedding, and then sort the results from most to least similar.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">VectorDb</span> <span class="o">&lt;</span> <span class="no">Hash</span> <span class="k">def</span> <span class="nf">search</span><span class="p">(</span><span class="n">query</span><span class="p">)</span> <span class="n">query_embedding</span> <span class="o">=</span> <span class="n">openai_client</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="n">query</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">each_with_object</span><span class="p">({})</span> <span class="k">do</span> <span class="o">|</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">item</span><span class="p">),</span> <span class="n">results</span><span class="o">|</span> <span class="n">results</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span 
class="o">=</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query_embedding</span><span class="p">,</span> <span class="n">item</span><span class="p">[</span><span class="ss">:embedding</span><span class="p">])</span> <span class="k">end</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="nf">sort_by</span> <span class="p">{</span> <span class="o">|</span><span class="n">_id</span><span class="p">,</span> <span class="n">similarity</span><span class="o">|</span> <span class="n">similarity</span> <span class="p">}.</span><span class="nf">reverse</span><span class="p">.</span><span class="nf">to_h</span> <span class="k">end</span> <span class="c1">#...</span> <span class="k">end</span> </code></pre></div></div> <p>🧙‍♂️ Lo and behold ✨ a vector database.</p> <p>Let’s try it out:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vb</span> <span class="o">=</span> <span class="no">VectorDb</span><span class="p">.</span><span class="nf">new</span> <span class="n">vb</span><span class="p">.</span><span class="nf">add_item</span><span class="p">(</span><span class="s2">"The Godfather"</span><span class="p">,</span> <span class="ss">content: </span><span class="s2">"A coming of age story of a violent mafia son and his father's unhealthy obsession with oranges."</span><span class="p">)</span> <span class="n">vb</span><span class="p">.</span><span class="nf">add_item</span><span class="p">(</span><span class="s2">"Meet the Parents"</span><span class="p">,</span> <span class="ss">content: </span><span class="s2">"A funny meeting between a father and a man who can milk just about anything with nipples, not having seen this is a crime."</span><span class="p">)</span> <span class="n">vb</span><span class="p">.</span><span class="nf">add_item</span><span class="p">(</span><span class="s2">"The Big 
Lebowski"</span><span class="p">,</span> <span class="ss">content: </span><span class="s2">"A man gets his rug soiled by German nihilists who have no regard of the law."</span><span class="p">)</span> <span class="n">vb</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="s2">"Being criminal"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span><span class="s2">"The Big Lebowski"</span> <span class="o">=&gt;</span> <span class="mf">0.2902403092098413</span><span class="p">,</span> <span class="s2">"Meet the Parents"</span> <span class="o">=&gt;</span> <span class="mf">0.20513406388011676</span><span class="p">,</span> <span class="s2">"The Godfather"</span> <span class="o">=&gt;</span> <span class="mf">0.19525673473056426</span><span class="p">}</span> <span class="n">vb</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="s2">"Movie about breaking the law"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span><span class="s2">"The Godfather"</span> <span class="o">=&gt;</span> <span class="mf">0.35009263294726534</span><span class="p">,</span> <span class="s2">"The Big Lebowski"</span> <span class="o">=&gt;</span> <span class="mf">0.2518669715823097</span><span class="p">,</span> <span class="s2">"Meet the Parents"</span> <span class="o">=&gt;</span> <span class="mf">0.22558495028153422</span><span class="p">}</span> <span class="n">vb</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="s2">"Farming cattle"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span><span class="s2">"Meet the Parents"</span> <span class="o">=&gt;</span> <span class="mf">0.22182056416947168</span><span class="p">,</span> <span class="s2">"The Godfather"</span> <span class="o">=&gt;</span> <span class="mf">0.12570011314672785</span><span class="p">,</span> <span 
class="s2">"The Big Lebowski"</span> <span class="o">=&gt;</span> <span class="mf">0.008469368264364811</span><span class="p">}</span> </code></pre></div></div> <p>Seems like it works. Let’s add an item to skew the results a bit:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vb</span><span class="p">.</span><span class="nf">add_item</span><span class="p">(</span><span class="s2">"Snatch"</span><span class="p">,</span> <span class="ss">content: </span><span class="s2">"A movie about a bunch of gangsters stealing a diamond and a dog."</span><span class="p">)</span> <span class="c1"># Rerun our query</span> <span class="n">vb</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="s2">"Movie about breaking the law"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span><span class="s2">"Snatch"</span><span class="o">=&gt;</span><span class="mf">0.5023058260371518</span><span class="p">,</span> <span class="s2">"The Godfather"</span><span class="o">=&gt;</span><span class="mf">0.35009263294726534</span><span class="p">,</span> <span class="s2">"The Big Lebowski"</span><span class="o">=&gt;</span><span class="mf">0.2518669715823097</span><span class="p">,</span> <span class="s2">"Meet the Parents"</span><span class="o">=&gt;</span><span class="mf">0.22558495028153422</span><span class="p">}</span> </code></pre></div></div> <p>Snatch goes straight to the top. I suspect that starting its description with “A movie about” has influenced the search quite a bit here, which is a good thing: none of the other descriptions said that they were about movies. 
We can test this assumption by adding the same movie again, this time without “A movie about”, and rerunning the search:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vb</span><span class="p">.</span><span class="nf">add_item</span><span class="p">(</span><span class="s2">"Snatch v2"</span><span class="p">,</span> <span class="ss">content: </span><span class="s2">"A bunch of gangsters stealing a diamond and a dog."</span><span class="p">)</span> <span class="n">vb</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="s2">"Movie about breaking the law"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span><span class="s2">"Snatch"</span><span class="o">=&gt;</span><span class="mf">0.5023058260371518</span><span class="p">,</span> <span class="s2">"The Godfather"</span><span class="o">=&gt;</span><span class="mf">0.35009263294726534</span><span class="p">,</span> <span class="s2">"Snatch v2"</span><span class="o">=&gt;</span><span class="mf">0.34647917573476067</span><span class="p">,</span> <span class="s2">"The Big Lebowski"</span><span class="o">=&gt;</span><span class="mf">0.2518669715823097</span><span class="p">,</span> <span class="s2">"Meet the Parents"</span><span class="o">=&gt;</span><span class="mf">0.22558495028153422</span><span class="p">}</span> </code></pre></div></div> <p>That supports the hypothesis: without those three words, Snatch v2 scores about 0.35 instead of about 0.50, just behind The Godfather. Who would’ve thought adding more context would give better results? It highlights the need to provide the models with high-quality, well-labeled information.</p> <h2 id="rubys-vector-type">Ruby’s Vector type</h2> <p>Now that we understand how this all works, I’ll tell you a little secret. 
Ruby has a Vector data type! It is the <code class="highlighter-rouge">Vector</code> class from the <code class="highlighter-rouge">matrix</code> library, which ships with Ruby (as a bundled gem since Ruby 3.1, so add <code class="highlighter-rouge">gem "matrix"</code> to your Gemfile if you use Bundler).</p> <p>This is how it works:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'matrix'</span> <span class="n">vec1</span> <span class="o">=</span> <span class="no">Vector</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span> <span class="n">vec2</span> <span class="o">=</span> <span class="no">Vector</span><span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">]</span> <span class="nb">puts</span> <span class="n">vec1</span><span class="p">.</span><span class="nf">dot</span><span class="p">(</span><span class="n">vec2</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="mi">60</span> <span class="nb">puts</span> <span class="n">vec1</span><span class="p">.</span><span class="nf">magnitude</span> <span class="nb">puts</span> <span class="n">vec2</span><span class="p">.</span><span class="nf">magnitude</span> <span class="o">=&gt;</span> <span class="mf">5.477225575051661</span> <span class="mf">11.224972160321824</span> <span class="c1"># `magnitude` is aliased as `r` and `norm`</span> <span class="k">def</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">vector1</span><span class="p">.</span><span class="nf">dot</span><span class="p">(</span><span class="n">vector2</span><span class="p">)</span> <span 
class="o">/</span> <span class="p">(</span><span class="n">vector1</span><span class="p">.</span><span class="nf">norm</span> <span class="o">*</span> <span class="n">vector2</span><span class="p">.</span><span class="nf">norm</span><span class="p">)</span> <span class="k">end</span> </code></pre></div></div> <p>You can create a vector from an embedding like this:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">embedding</span> <span class="o">=</span> <span class="n">openai_client</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="s2">"My text"</span><span class="p">)</span> <span class="n">vector</span> <span class="o">=</span> <span class="no">Vector</span><span class="p">[</span><span class="o">*</span><span class="n">embedding</span><span class="p">]</span> <span class="c1"># or</span> <span class="n">vector</span> <span class="o">=</span> <span class="no">Vector</span><span class="p">.</span><span class="nf">elements</span><span class="p">(</span><span class="n">embedding</span><span class="p">)</span> </code></pre></div></div> <h2 id="update-vectordb-to-use-vector-types">Update VectorDb to use Vector types</h2> <p>We can replace all the calculation methods with the following:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">vector1</span><span class="p">.</span><span class="nf">dot</span><span class="p">(</span><span class="n">vector2</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">vector1</span><span class="p">.</span><span class="nf">norm</span> <span class="o">*</span> <span class="n">vector2</span><span
class="p">.</span><span class="nf">norm</span><span class="p">)</span> <span class="k">end</span> </code></pre></div></div> <p>This leaves us with the following final result:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'matrix'</span> <span class="k">class</span> <span class="nc">VectorDb</span> <span class="o">&lt;</span> <span class="no">Hash</span> <span class="k">def</span> <span class="nf">search</span><span class="p">(</span><span class="n">query</span><span class="p">)</span> <span class="n">query_embedding</span> <span class="o">=</span> <span class="n">openai_client</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="n">query</span><span class="p">)</span> <span class="n">query_embedding</span> <span class="o">=</span> <span class="no">Vector</span><span class="p">.</span><span class="nf">elements</span><span class="p">(</span><span class="n">query_embedding</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">each_with_object</span><span class="p">({})</span> <span class="k">do</span> <span class="o">|</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">item</span><span class="p">),</span> <span class="n">results</span><span class="o">|</span> <span class="n">results</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span class="o">=</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query_embedding</span><span class="p">,</span> <span class="n">item</span><span class="p">[</span><span class="ss">:embedding</span><span class="p">])</span> <span class="k">end</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="nf">sort_by</span> <span class="p">{</span> <span 
class="o">|</span><span class="n">_id</span><span class="p">,</span> <span class="n">similarity</span><span class="o">|</span> <span class="n">similarity</span> <span class="p">}.</span><span class="nf">reverse</span><span class="p">.</span><span class="nf">to_h</span> <span class="k">end</span> <span class="k">def</span> <span class="nf">add_item</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">content</span><span class="p">:,</span> <span class="ss">embedding: </span><span class="kp">nil</span><span class="p">)</span> <span class="n">embedding</span> <span class="o">=</span> <span class="n">openai_client</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="n">content</span><span class="p">)</span> <span class="k">if</span> <span class="n">embedding</span><span class="p">.</span><span class="nf">nil?</span> <span class="nb">self</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="ss">content: </span><span class="n">content</span><span class="p">,</span> <span class="ss">embedding: </span><span class="no">Vector</span><span class="p">.</span><span class="nf">elements</span><span class="p">(</span><span class="n">embedding</span><span class="p">)</span> <span class="p">}</span> <span class="k">end</span> <span class="k">def</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">vector1</span><span class="p">.</span><span class="nf">dot</span><span class="p">(</span><span class="n">vector2</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">vector1</span><span class="p">.</span><span class="nf">norm</span> <span class="o">*</span> <span class="n">vector2</span><span class="p">.</span><span 
class="nf">norm</span><span class="p">)</span> <span class="k">end</span> <span class="kp">private</span> <span class="k">def</span> <span class="nf">openai_client</span> <span class="vi">@openai_client</span> <span class="o">||=</span> <span class="no">OpenaiClient</span><span class="p">.</span><span class="nf">new</span> <span class="k">end</span> <span class="k">end</span> </code></pre></div></div> <p>Let’s rerun the first query to double-check the result is the same:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vb</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="s2">"Being criminal"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="c1"># Before</span> <span class="p">{</span><span class="s2">"The Big Lebowski"</span><span class="o">=&gt;</span><span class="mf">0.2902403092098413</span><span class="p">,</span> <span class="s2">"Meet the Parents"</span><span class="o">=&gt;</span><span class="mf">0.20513406388011676</span><span class="p">,</span> <span class="s2">"The Godfather"</span><span class="o">=&gt;</span><span class="mf">0.19525673473056426</span><span class="p">}</span> <span class="c1"># After</span> <span class="p">{</span><span class="s2">"The Big Lebowski"</span><span class="o">=&gt;</span><span class="mf">0.2902403092098413</span><span class="p">,</span> <span class="s2">"Meet the Parents"</span><span class="o">=&gt;</span><span class="mf">0.20513406388011676</span><span class="p">,</span> <span class="s2">"The Godfather"</span><span class="o">=&gt;</span><span class="mf">0.19525673473056426</span><span class="p">}</span> </code></pre></div></div> <p>Looking very good.</p> <h2 id="pstore">PStore</h2> <p>So far, because our database is built on a <code class="highlighter-rouge">Hash</code>, the embeddings only live in memory, so every new console session has to call the API again to regenerate them. This costs both time and money.
It isn’t a big problem for testing similarities between a handful of sentences, but if we could persist our data, we could even use it as a real database.</p> <p>Enter PStore.</p> <blockquote> <p>PStore implements a file based persistence mechanism based on a Hash. User code can store hierarchies of Ruby objects (values) into the data store file by name (keys). An object hierarchy may be just a single object. User code may later read values back from the data store or even update data, as needed. <a href="https://github.com/ruby/pstore">ruby/pstore: PStore implements a file based persistence mechanism based on a Hash.</a></p> </blockquote> <p>Let’s get started with the implementation. I will call the new database <code class="highlighter-rouge">Vstore</code> and add the <code class="highlighter-rouge">add_item</code> and <code class="highlighter-rouge">cosine_similarity</code> methods.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'pstore'</span> <span class="nb">require</span> <span class="s1">'matrix'</span> <span class="k">class</span> <span class="nc">Vstore</span> <span class="o">&lt;</span> <span class="no">PStore</span> <span class="k">def</span> <span class="nf">add_item</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">content</span><span class="p">:,</span> <span class="ss">embedding: </span><span class="kp">nil</span><span class="p">)</span> <span class="n">transaction</span> <span class="k">do</span> <span class="n">embedding</span> <span class="o">=</span> <span class="n">openai_client</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="n">content</span><span class="p">)</span> <span class="k">if</span> <span class="n">embedding</span><span class="p">.</span><span class="nf">nil?</span> <span class="n">embedding</span> <span class="o">=</span> <span class="no">Vector</span><span class="p">.</span><span
class="nf">elements</span><span class="p">(</span><span class="n">embedding</span><span class="p">)</span> <span class="nb">self</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="ss">content: </span><span class="n">content</span><span class="p">,</span> <span class="ss">embedding: </span><span class="n">embedding</span> <span class="p">}</span> <span class="k">end</span> <span class="k">end</span> <span class="k">def</span> <span class="nf">cosine_similarity</span><span class="p">(</span><span class="n">vector1</span><span class="p">,</span> <span class="n">vector2</span><span class="p">)</span> <span class="n">vector1</span><span class="p">.</span><span class="nf">dot</span><span class="p">(</span><span class="n">vector2</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">vector1</span><span class="p">.</span><span class="nf">norm</span> <span class="o">*</span> <span class="n">vector2</span><span class="p">.</span><span class="nf">norm</span><span class="p">)</span> <span class="k">end</span> <span class="kp">private</span> <span class="k">def</span> <span class="nf">openai_client</span> <span class="vi">@openai_client</span> <span class="o">||=</span> <span class="no">OpenaiClient</span><span class="p">.</span><span class="nf">new</span> <span class="k">end</span> <span class="k">end</span> </code></pre></div></div> <p>As you can see, <code class="highlighter-rouge">PStore</code> is very similar to <code class="highlighter-rouge">Hash</code>, except that we have to wrap operations in a <code class="highlighter-rouge">transaction</code> block.</p> <p>When initialising the database, we need to pass the path of the file that will hold the data, like so:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vdb</span> <span class="o">=</span> <span class="no">Vstore</span><span
class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"my_vector_store.pstore"</span><span class="p">)</span> </code></pre></div></div> <p>The search method will be a bit different, though, because of how we have to loop over the records:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">search</span><span class="p">(</span><span class="n">query</span><span class="p">)</span> <span class="n">query_embedding</span> <span class="o">=</span> <span class="n">openai_client</span><span class="p">.</span><span class="nf">embedding</span><span class="p">(</span><span class="n">query</span><span class="p">)</span> <span class="n">query_embedding</span> <span class="o">=</span> <span class="no">Vector</span><span class="p">.</span><span class="nf">elements</span><span class="p">(</span><span class="n">query_embedding</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="p">{}</span> <span class="n">transaction</span><span class="p">(</span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="n">roots</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="nb">id</span><span class="o">|</span> <span class="n">item</span> <span class="o">=</span> <span class="nb">self</span><span class="p">[</span><span class="nb">id</span><span class="p">]</span> <span class="k">next</span> <span class="k">if</span> <span class="o">!</span><span class="n">item</span><span class="p">.</span><span class="nf">key?</span><span class="p">(</span><span class="ss">:embedding</span><span class="p">)</span> <span class="o">||</span> <span class="n">item</span><span class="p">[</span><span class="ss">:embedding</span><span class="p">].</span><span class="nf">nil?</span> <span class="n">result</span><span class="p">[</span><span 
class="nb">id</span><span class="p">]</span> <span class="o">=</span> <span class="n">cosine_similarity</span><span class="p">(</span><span class="n">query_embedding</span><span class="p">,</span> <span class="n">item</span><span class="p">[</span><span class="ss">:embedding</span><span class="p">])</span> <span class="k">end</span> <span class="k">end</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="nf">sort_by</span> <span class="p">{</span> <span class="o">|</span><span class="n">_id</span><span class="p">,</span> <span class="n">similarity</span><span class="o">|</span> <span class="n">similarity</span> <span class="p">}.</span><span class="nf">reverse</span><span class="p">.</span><span class="nf">to_h</span> <span class="k">end</span> </code></pre></div></div> <p>We start a read-only transaction with <code class="highlighter-rouge">transaction(true)</code>, loop over all the keys in the store by calling <code class="highlighter-rouge">roots</code>, and fetch each record with <code class="highlighter-rouge">self[id]</code> inside the <code class="highlighter-rouge">each</code> block.</p> <p>Apart from this, the implementation is the same as that of the <code class="highlighter-rouge">Hash</code>-based one.</p> <p>Let’s check the results:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vb</span> <span class="o">=</span> <span class="no">Vstore</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s1">'movies.pstore'</span><span class="p">)</span> <span class="n">vb</span><span class="p">.</span><span class="nf">add_item</span><span class="p">(</span><span class="s2">"The Godfather"</span><span class="p">,</span> <span class="ss">content: </span><span class="s2">"A coming of age story of a violent mafia son and his father's unhealthy obsession with oranges."</span><span class="p">)</span> <span class="n">vb</span><span
class="p">.</span><span class="nf">add_item</span><span class="p">(</span><span class="s2">"Meet the Parents"</span><span class="p">,</span> <span class="ss">content: </span><span class="s2">"A funny meeting between a father and a man who can milk just about anything with nipples, not having seen this is a crime."</span><span class="p">)</span> <span class="n">vb</span><span class="p">.</span><span class="nf">add_item</span><span class="p">(</span><span class="s2">"The Big Lebowski"</span><span class="p">,</span> <span class="ss">content: </span><span class="s2">"A man gets his rug soiled by German nihilists who have no regard of the law."</span><span class="p">)</span> <span class="n">vb</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span><span class="s2">"Being criminal"</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span><span class="s2">"The Big Lebowski"</span><span class="o">=&gt;</span><span class="mf">0.2902403092098413</span><span class="p">,</span> <span class="s2">"Meet the Parents"</span><span class="o">=&gt;</span><span class="mf">0.20513406388011676</span><span class="p">,</span> <span class="s2">"The Godfather"</span><span class="o">=&gt;</span><span class="mf">0.19525673473056426</span><span class="p">}</span> <span class="c1"># With Hash:</span> <span class="p">{</span><span class="s2">"The Big Lebowski"</span><span class="o">=&gt;</span><span class="mf">0.2902403092098413</span><span class="p">,</span> <span class="s2">"Meet the Parents"</span><span class="o">=&gt;</span><span class="mf">0.20513406388011676</span><span class="p">,</span> <span class="s2">"The Godfather"</span><span class="o">=&gt;</span><span class="mf">0.19525673473056426</span><span class="p">}</span> </code></pre></div></div> <p>Beautiful! 
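</p> <p>To check that the data really survives a restart without spending any API credits, here is a minimal sketch that uses only <code class="highlighter-rouge">PStore</code> and <code class="highlighter-rouge">Vector</code>. The item names and three-dimensional vectors are made up for illustration (real <code class="highlighter-rouge">text-embedding-3-small</code> embeddings have 1,536 dimensions by default), and <code class="highlighter-rouge">search_store</code> is a hypothetical, simplified stand-in for <code class="highlighter-rouge">Vstore#search</code> that takes a ready-made query vector instead of calling OpenAI.</p>

```ruby
require 'pstore'
require 'matrix' # bundled gem since Ruby 3.1; add it to your Gemfile under Bundler
require 'tmpdir'

# Same formula as cosine_similarity above: dot product over the product of the norms.
def cosine_similarity(vector1, vector2)
  vector1.dot(vector2) / (vector1.norm * vector2.norm)
end

# Hypothetical stand-in for Vstore#search: takes a ready-made query vector,
# so no API call is needed.
def search_store(store, query_vector)
  results = {}
  store.transaction(true) do # read-only transaction
    store.roots.each do |id|
      results[id] = cosine_similarity(query_vector, store[id][:embedding])
    end
  end
  # Highest similarity first
  results.sort_by { |_id, similarity| -similarity }.to_h
end

# Made-up 3-dimensional vectors standing in for real embeddings.
items = [
  ["Heist movie",   [0.9, 0.1, 0.2]],
  ["Family comedy", [0.1, 0.9, 0.3]],
  ["Crime drama",   [0.8, 0.2, 0.4]]
]

path = File.join(Dir.mktmpdir, "demo.pstore")

# "Session" one: write the items and their vectors to disk.
writer = PStore.new(path)
writer.transaction do
  items.each do |id, values|
    writer[id] = { content: id, embedding: Vector.elements(values) }
  end
end

# "Session" two: a fresh PStore object that only shares the file path.
reader = PStore.new(path)
ranking = search_store(reader, Vector[1.0, 0.0, 0.3])
puts ranking.keys.inspect # prints ["Heist movie", "Crime drama", "Family comedy"]
```

<p>The second <code class="highlighter-rouge">PStore</code> object shares nothing with the first except the file path, so getting the expected ranking back shows that the vectors were read from disk rather than from memory.</p> <p>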
We’ve created a vector database that generates the embeddings for the items we put into it and allows us to do semantic search!</p> <h2 id="want-to-learn-more">Want to learn more?</h2> <p>This post is based on a chapter from my new book, “<a href="https://abuisman.gumroad.com/l/from-rag-to-riches-with-ruby">From RAG to Riches: How to Build Insanely Good AI Applications That Use Your Own Data</a>,” which is out now!</p> <p>I wrote the book to share all the knowledge I gained from building AI applications that use documents and other data to give far better results.</p> <p>It irks me that big tech companies are trying to sell simple AI features as expensive add-ons or products, while the underlying technology isn’t too complicated to understand. It is just so new that it will take a while for everyone to figure out how it all fits together.</p> <p>I’ve combined everything I’ve learned in the book, and after reading it, you should be at the forefront of using AI in Rails applications.</p> <div id="book" class="book-teaser-wide"> <a class="book-teaser-link" href="https://abuisman.gumroad.com/l/from-rag-to-riches-with-ruby" target="_blank"><img class="teaser-image emancipated" src="/images/book-teaser-wide.png" alt="From RAG to Riches cover" /></a> <p class="book-teaser-text">Get my new book <a href="https://abuisman.gumroad.com/l/from-rag-to-riches-with-ruby" target="_blank">From RAG to Riches</a>.</p> </div>Language matters: Building blocks2023-10-12T00:00:00+00:002023-10-12T00:00:00+00:00https://abuisman.com/posts/thoughts/language_matters_building_blocks<p>I have been working on a big platform with countless pages. It has been built with Bootstrap for the front end from the start, but maintenance has become burdensome over time. This is due to multiple layers of custom CSS and the reuse of CSS, which has caused changes in one place to have unpredictable effects elsewhere.</p> <p>Our way out of this situation is Tailwind.
Since you write HTML and style in the same files, the styling is gone when you remove the HTML. You can still create custom tweaks per page or element without affecting anything else.</p> <p>Bootstrap did have its benefits: it provided developers with a nice set of UI elements to use when building screens. So, we have started developing a set of reusable elements.</p> <p>There are several approaches to creating such a set of elements in Rails. Partials and helpers are part of the framework, so it made sense to start from there. I discussed this with the developers, and they set off exploring our options.</p> <p>Very quickly, I started hearing them say things like:</p> <ul> <li>“The project’s goal is to build a set of components.”</li> <li>“Let’s use View Components by GitHub.”</li> <li>“Have we looked at building components with React?”</li> <li>“There are also these things called web components.”</li> </ul> <p>Wait a minute?! How did we go from a problem definition of conflicting CSS files and the lack of reusable elements to a whole set of new paradigms? Then it dawned on me. We had named the project “Tailwind Components”.</p> <p>I then realised that the word “Components” carries so much baggage; people already have a clear picture of what components are in their heads. It might not be the same for everyone, but when you say, “Let’s build components”, it starts having a life of its own.</p> <p>Components might be the right solution, but I’d like to find out first what we can do by just writing HTML, helpers and partials smarter; we could write “components” afterwards if we still needed them. This whole first phase was skipped because of the choice of words.</p> <p>The fix was to start calling them ‘building blocks’. This suddenly leaves room for solutions such as writing a modal as a Rails partial, hooking a Stimulus controller to it for the simple dynamics and documenting it in Notion, where developers can copy and paste the modal.
This is not necessarily the best solution or the one we will go with, but it is back on the table, alongside whatever the real solution turns out to be.</p> <p>So, language matters!</p>Super fast downloading of big files2022-10-10T00:00:00+00:002022-10-10T00:00:00+00:00https://abuisman.com/posts/developer-tools/super-fast-downloading-of-big-files<p>Using <code class="highlighter-rouge">curl</code> or <code class="highlighter-rouge">wget</code> for downloading big dumps to your virtual machines is slow! Use <code class="highlighter-rouge">aria2</code> instead.</p> <p>Today I had to download a 49GB Heroku Postgres backup to an EC2 server to import it into an RDS instance. The de facto way to download files is with <code class="highlighter-rouge">curl</code> or <code class="highlighter-rouge">wget</code>. The problem with these tools is that they download the file over a single connection, and individual connections are often throttled.</p> <p>You can see that in my first attempt:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-o</span> mydump.dump <span class="s2">"https://..."</span> % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 49.9G 0 144M 0 0 12.7M 0 1:07:00 0:00:11 1:06:49 16.5M </code></pre></div></div> <p>16.5M?!? One hour and six minutes!? This is silly: we are on a cloud-hosted machine that is probably in the same data center as the source.</p> <p>Luckily, I experienced the olden days of the internet, when we’d wait for hours on one MP3, so I know a few ways to download things a bit quicker. One such method was splitting up downloads and downloading through multiple connections simultaneously.
I used FlashGet and other tools like it for this.</p> <p>On servers, I’ve used a tool called <code class="highlighter-rouge">aget</code> in the past, but I couldn’t install it on my Ubuntu EC2 machine, so I looked around and found <code class="highlighter-rouge">aria2</code>. I installed it, read the instructions and came up with this command:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aria2c <span class="nt">-o</span> mydump.dump <span class="nt">--max-connection-per-server</span><span class="o">=</span>10 <span class="nt">-s</span> 10 <span class="s2">"https://..."</span> <span class="o">[</span><span class="c">#.... 36GiB/49GiB(74%) CN:10 DL:130MiB ETA:1m41s]</span> </code></pre></div></div> <p>130MiB/s, that is more like it!</p> <p>The command above splits the download into ten pieces and fetches them over ten connections at the same time. For some reason, <code class="highlighter-rouge">aria2</code> caps <code class="highlighter-rouge">--max-connection-per-server</code> at 16, so let’s try that with 16 parts:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aria2c <span class="nt">-o</span> mydump.dump <span class="nt">--max-connection-per-server</span><span class="o">=</span>16 <span class="nt">-s</span> 16 <span class="s2">"https://..."</span> <span class="o">[</span><span class="c">#... 3.0GiB/49GiB(6%) CN:16 DL:222MiB ETA:3m36s]</span> </code></pre></div></div> <p>222MiB/s, which means about 4 minutes instead of 1 hour and 6 minutes!</p> <p>Another advantage of using <code class="highlighter-rouge">aria2</code> is that you can continue where you left off if the connection breaks.</p>Zero downtime credential updates on Heroku.2022-08-30T00:00:00+00:002022-08-30T00:00:00+00:00https://abuisman.com/posts/rails/zero-downtime-credential-updates<p>When you use Rails credentials on Heroku, you probably keep the key in an environment variable, most likely <code class="highlighter-rouge">RAILS_MASTER_KEY</code>.
Let’s say you want to rotate this key. You re-encrypt your credentials, push the code (reboot 1) and then add the env-var (reboot 2). You could also flip these two steps around.</p> <p>In between these two reboots, your app won’t work since you either have a new credentials file with the old key, or the old credentials file with a new key.</p> <p>What I want to happen is:</p> <ol> <li>Set the new key in a new environment variable: <code class="highlighter-rouge">RAILS_MASTER_KEY_NEW</code></li> <li>Heroku restarts the app after every config var change, so the Rails app boots again and sees the new key</li> <li>The Rails app checks whether the new key can decrypt the credentials; if not, it falls back to the old key</li> </ol> <p>Here is how you can do this.</p> <p>Add this to your <code class="highlighter-rouge">config/application.rb</code>:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Application</span> <span class="o">&lt;</span> <span class="no">Rails</span><span class="o">::</span><span class="no">Application</span> <span class="c1"># Already there; don't add this line, obviously</span> <span class="c1"># When we've defined RAILS_MASTER_KEY_NEW it means we are rotating the encryption key</span> <span class="c1"># for our credentials. What we want to do then is:</span> <span class="c1"># 1. Check if we can decrypt the current credentials file with the new key</span> <span class="c1"># 2. If we can, we will change RAILS_MASTER_KEY to equal RAILS_MASTER_KEY_NEW</span> <span class="c1"># 3.
If not, we will fall back to the old key and leave RAILS_MASTER_KEY alone</span> <span class="k">if</span> <span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s2">"RAILS_MASTER_KEY_NEW"</span><span class="p">,</span> <span class="kp">false</span><span class="p">)</span> <span class="nb">require</span> <span class="s1">'logger'</span> <span class="n">cred_logger</span> <span class="o">=</span> <span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="vg">$stdout</span><span class="p">)</span> <span class="c1"># Rubocop wanted this. He is the boss.</span> <span class="n">credential_path</span> <span class="o">=</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">root</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="s2">"config/credentials/</span><span class="si">#{</span><span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="si">}</span><span class="s2">.yml.enc"</span><span class="p">)</span> <span class="k">begin</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">encrypted</span><span class="p">(</span><span class="n">credential_path</span><span class="p">,</span> <span class="ss">env_key: </span><span class="s1">'RAILS_MASTER_KEY_NEW'</span><span class="p">).</span><span class="nf">read</span> <span class="no">ENV</span><span class="p">[</span><span class="s2">"RAILS_MASTER_KEY"</span><span class="p">]</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s2">"RAILS_MASTER_KEY_NEW"</span><span class="p">)</span> <span class="n">cred_logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"application.rb: Using the new credential
key, it works!"</span> <span class="k">rescue</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">MessageEncryptor</span><span class="o">::</span><span class="no">InvalidMessage</span> <span class="n">cred_logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"application.rb: Using the old key"</span> <span class="k">end</span> <span class="k">end</span> <span class="c1">#...</span> </code></pre></div></div> <p>That’s it.</p> <p>This needs to be at the top of the class body because Rails tries to load the credentials early in the boot process. Some trial and error will probably reveal the sweet spot, but this works just fine here.</p> <p>Once the re-encrypted credentials file is deployed, first set <code class="highlighter-rouge">RAILS_MASTER_KEY</code> to the value of <code class="highlighter-rouge">RAILS_MASTER_KEY_NEW</code>, and only then remove <code class="highlighter-rouge">RAILS_MASTER_KEY_NEW</code>. Doing it the other way around would make the app boot with the old key again.</p>Cron monitoring with Blazer2022-08-23T00:00:00+00:002022-08-23T00:00:00+00:00https://abuisman.com/posts/developer-tools/cron-monitoring-with-blazer<p>Many apps have recurring jobs, and we should monitor them. There are third-party services for this, but I like to keep things simple and own my data.</p> <p>I’ll show you how to keep track of cron jobs within your Rails application and report on them through Slack.
The logs are stored in your own database, so you can use them for other things as well.</p> <p>When we’re done, we can write SQL like this (SQLite syntax):</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">job_logs</span> <span class="k">WHERE</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'ping_site'</span> <span class="k">AND</span> <span class="n">strftime</span><span class="p">(</span><span class="s1">'%Y-%m-%d'</span><span class="p">,</span> <span class="n">created_at</span><span class="p">)</span> <span class="o">=</span> <span class="k">CURRENT_DATE</span><span class="p">;</span> </code></pre></div></div> <p>And get notifications like these:</p> <p><img src="/images/posts/cron-monitoring-with-blazer/slack_notification_passing_again.png" alt="Slack notification of check passing again" title="Slack notification of check passing again" /></p> <p>Using this UI:</p> <p><img src="/images/posts/cron-monitoring-with-blazer/create_todays_logs_check.png" alt="Creating our check for today's logs" title="Creating our check for today's logs" /></p> <p><img src="/images/posts/cron-monitoring-with-blazer/query_job_passing_again.png" alt="Query passing again" title="Query passing again" /></p> <p>There are two steps to this:</p> <ol> <li>We will build a logging endpoint into our Rails app</li> <li>We will install the Blazer gem for building checks and notifications</li> </ol> <h2 id="logging-endpoint">Logging endpoint</h2> <p>I’ve taken inspiration from <a href="https://healthchecks.io/">Healthchecks.io</a>: run the command and, if it succeeds, curl our logging endpoint:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>some_command <span class="o">&amp;&amp;</span> curl <span class="nt">-fsS</span> <span class="nt">-m</span> 10 <span class="nt">--retry</span> 5 <span
class="nt">-o</span> /dev/null https://our-domain.com/job_logs/:token/:name </code></pre></div></div> <p>We match a log to a job through <code class="highlighter-rouge">:name</code>. The <code class="highlighter-rouge">:token</code> is a hash of <code class="highlighter-rouge">:name</code> and a secret, which prevents people from pushing bogus logs to our database. The curl flags keep the call robust: <code class="highlighter-rouge">-fsS</code> makes curl fail on HTTP errors and stay silent apart from error messages, <code class="highlighter-rouge">-m 10</code> caps the request at 10 seconds, <code class="highlighter-rouge">--retry 5</code> retries transient failures up to five times, and <code class="highlighter-rouge">-o /dev/null</code> discards the response body.</p> <p>First, let’s create the table:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># The migration</span>
<span class="k">class</span> <span class="nc">CreateJobLogs</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.0</span><span class="p">]</span>
  <span class="k">def</span> <span class="nf">change</span>
    <span class="n">create_table</span> <span class="ss">:job_logs</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:status</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span><span class="p">,</span> <span class="ss">default: </span><span class="s1">'success'</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:error</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:duration</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:output</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">timestamps</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div> <p>The fields <code class="highlighter-rouge">:status</code>, <code class="highlighter-rouge">:error</code>, <code class="highlighter-rouge">:duration</code> and <code class="highlighter-rouge">:output</code> might be useful in the future.</p> <p>Now for the controller. We need it to check the token and create a log record:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Route - We use GET instead of POST</span>
<span class="n">get</span> <span class="s1">'/job_logs/:token/:name'</span><span class="p">,</span> <span class="ss">to: </span><span class="s1">'job_logs#create'</span>

<span class="c1"># The controller</span>
<span class="k">class</span> <span class="nc">JobLogsController</span> <span class="o">&lt;</span> <span class="no">ApplicationController</span>
  <span class="k">def</span> <span class="nf">create</span>
    <span class="vi">@job_log</span> <span class="o">=</span> <span class="no">JobLog</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">job_log_params</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">render</span> <span class="ss">json: </span><span class="p">{</span> <span class="ss">error: </span><span class="s1">'Invalid token'</span> <span class="p">},</span> <span class="ss">status: :unauthorized</span> <span class="k">unless</span> <span class="vi">@job_log</span><span class="p">.</span><span class="nf">valid_token?</span><span class="p">(</span><span class="n">params</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="ss">:token</span><span class="p">))</span>

    <span class="vi">@job_log</span><span class="p">.</span><span class="nf">save!</span>
    <span class="c1"># Respond explicitly instead of relying on implicit rendering</span>
    <span class="n">head</span> <span class="ss">:created</span>
  <span class="k">end</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">job_log_params</span>
    <span class="n">params</span><span class="p">.</span><span class="nf">permit</span><span class="p">(</span><span class="ss">:name</span><span class="p">,</span> <span class="ss">:status</span><span class="p">,</span> <span class="ss">:error</span><span class="p">,</span> <span class="ss">:duration</span><span class="p">,</span> <span class="ss">:output</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div> <p>Our model uses the Rails app’s secret key base to generate a token and looks like this:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">JobLog</span> <span class="o">&lt;</span> <span class="no">ApplicationRecord</span>
  <span class="n">validates</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">presence: </span><span class="kp">true</span>
  <span class="n">validates</span> <span class="ss">:status</span><span class="p">,</span> <span class="ss">presence: </span><span class="kp">true</span>

  <span class="k">def</span> <span class="nf">token</span>
    <span class="no">Digest</span><span class="o">::</span><span class="no">SHA1</span><span class="p">.</span><span class="nf">hexdigest</span><span class="p">(</span><span class="s2">"</span><span class="si">#{</span><span class="nb">name</span><span class="si">}</span><span class="s2">-</span><span class="si">#{</span><span class="n">secret</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">valid_token?</span><span class="p">(</span><span class="n">input_token</span><span class="p">)</span>
    <span class="c1"># Constant-time comparison, so response timing does not leak the token</span>
    <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">SecurityUtils</span><span class="p">.</span><span class="nf">secure_compare</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="n">input_token</span><span class="p">)</span>
  <span class="k">end</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">secret</span>
    <span class="c1"># Also covers a secret_key_base stored in credentials or ENV["SECRET_KEY_BASE"]</span>
    <span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">secret_key_base</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div> <p>Let’s try it out:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-fsS</span> <span class="nt">-m</span> 10 <span class="nt">--retry</span> 5 <span class="nt">-o</span> /dev/null http://localhost:3000/job_logs/123/ping_site
curl: <span class="o">(</span>22<span class="o">)</span> The requested URL returned error: 401

<span class="c"># Let's generate a token:</span>
bin/rails r <span class="s2">"puts JobLog.new(name: 'ping_site').token"</span>
40f958deccb8ba624562a7c81e5609ab66a71b4a

curl <span class="nt">-fsS</span> <span class="nt">-m</span> 10 <span class="nt">--retry</span> 5 <span class="nt">-o</span> /dev/null http://localhost:3000/job_logs/40f958deccb8ba624562a7c81e5609ab66a71b4a/ping_site
</code></pre></div></div> <p>With a valid token, the second request succeeds silently.</p> <h2 id="set-up-blazer-checks">Set up blazer checks</h2> <p>The <a href="https://github.com/ankane/blazer">blazer readme</a> describes how to install it.
I’ve mounted it at <code class="highlighter-rouge">/blazer</code>.</p> <p>Take extra care to set up authentication to keep out prying eyes.</p> <p>To get Slack notifications working, you have to set the <code class="highlighter-rouge">BLAZER_SLACK_WEBHOOK_URL</code> environment variable and configure Action Mailer’s <code class="highlighter-rouge">default_url_options</code>, for example in development:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/environments/development.rb</span>
<span class="n">config</span><span class="p">.</span><span class="nf">action_mailer</span><span class="p">.</span><span class="nf">default_url_options</span> <span class="o">=</span> <span class="p">{</span> <span class="ss">host: </span><span class="s2">"localhost:3000"</span> <span class="p">}</span>
</code></pre></div></div> <p>Let’s write a query to have a look at all our job logs so far:</p> <p><img src="/images/posts/cron-monitoring-with-blazer/job_logs_select_all.png" alt="Setting up a check in Blazer" title="Setting up a check in Blazer" /></p> <p>To check that a daily job ran today, we have to truncate each log’s timestamp to its day.
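</p> <p>Conceptually, the check asks one question: is there a log for this job whose timestamp, truncated to its day, is today? Here is a minimal plain-Ruby sketch of that logic, handy for reasoning about the check (or testing it) outside Blazer. <code class="highlighter-rouge">JobLogRow</code>, <code class="highlighter-rouge">logged_on?</code> and the sample data are hypothetical stand-ins for the <code class="highlighter-rouge">job_logs</code> table, not part of the app above. The sketch compares UTC days because Rails stores timestamps in UTC by default and SQLite’s <code class="highlighter-rouge">CURRENT_DATE</code> is also in UTC.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "date"

# Hypothetical in-memory stand-in for rows of the job_logs table.
JobLogRow = Struct.new(:name, :created_at, keyword_init: true)

# Mirrors the daily check: is there at least one log for `name` whose
# created_at, truncated to its UTC day, equals `day`?
def logged_on?(rows, name, day)
  rows
    .select { |row| row.name == name }
    .any? { |row| row.created_at.getutc.to_date == day }
end

# Hypothetical sample data: one ping_site run on 23 August 2022.
SAMPLE_LOGS = [
  JobLogRow.new(name: "ping_site", created_at: Time.utc(2022, 8, 23, 6, 0, 0)),
  JobLogRow.new(name: "other_job", created_at: Time.utc(2022, 8, 24, 6, 0, 0))
].freeze

logged_on?(SAMPLE_LOGS, "ping_site", Date.new(2022, 8, 23)) # returns true
logged_on?(SAMPLE_LOGS, "ping_site", Date.new(2022, 8, 24)) # returns false: only other_job ran that day
</code></pre></div></div> <p>In Blazer terms, the second case (no matching rows) is the one that makes the check fail and triggers a Slack notification.</p> <p>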
We create the following query:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">job_logs</span> <span class="k">WHERE</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'ping_site'</span> <span class="k">AND</span> <span class="n">strftime</span><span class="p">(</span><span class="s1">'%Y-%m-%d'</span><span class="p">,</span> <span class="n">created_at</span><span class="p">)</span> <span class="o">=</span> <span class="k">CURRENT_DATE</span><span class="p">;</span> </code></pre></div></div> <p>Note that <code class="highlighter-rouge">strftime</code> is SQLite syntax; on PostgreSQL, <code class="highlighter-rouge">created_at::date = CURRENT_DATE</code> does the same job.</p> <p><img src="/images/posts/cron-monitoring-with-blazer/creating_our_query.png" alt="Creating our query in blazer" title="Creating our query in blazer" /></p> <p>Now we create a check based on this query. Notice the Slack channel I specified:</p> <p><img src="/images/posts/cron-monitoring-with-blazer/create_todays_logs_check.png" alt="Creating our check for today's logs" title="Creating our check for today's logs" /></p> <h3 id="running-cron">Running cron</h3> <p>Finally, set up a cron job that runs Blazer’s check task once a day. Schedule it for a time after the job itself normally runs; otherwise the check fails simply because today’s log does not exist yet:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rake blazer:run_checks <span class="nv">SCHEDULE</span><span class="o">=</span><span class="s2">"1 day"</span> </code></pre></div></div> <p>When this runs, Blazer checks whether our query returns any rows.
If it doesn’t, it sends a Slack notification like so:</p> <p><img src="/images/posts/cron-monitoring-with-blazer/slack_notification_failing.png" alt="Slack notification of failure" title="Slack notification of failure" /></p> <p>Once the check starts passing again, we will also get a notification:</p> <p><img src="/images/posts/cron-monitoring-with-blazer/slack_notification_passing_again.png" alt="Slack notification of check passing again" title="Slack notification of check passing again" /></p> <h2 id="conclusion">Conclusion</h2> <p>There you have it: a neat, low-effort way to track cron jobs from within our app. On top of this, we could build extra features such as API throttling or reports.</p> <p>Blazer lets you build something that would otherwise require quite a bit of coding, so think about what else you could do with it.</p> <p>Enjoy!</p>Quick page benchmarks2022-05-29T00:00:00+00:002022-05-29T00:00:00+00:00https://abuisman.com/posts/developer-tools/quick-page-benchmarks<p>I love optimising performance, be it in databases, scripts or webpages. With database queries and scripts, it is usually evident when you’ve improved performance: the query or the script simply finishes sooner.</p> <p>Webpages, however, are trickier, especially when the differences are relatively small. I’ll show you a quick way to run benchmarks that doesn’t require much setup in your framework; instead, it works with <code class="highlighter-rouge">curl</code> and <code class="highlighter-rouge">hyperfine</code>, a command-line benchmarking tool.</p> <p><a href="https://github.com/sharkdp/hyperfine" target="_blank">Hyperfine is an excellent CLI benchmark tool that can benchmark different commands and show a nice visual result.
</a></p> <h2 id="what-are-we-testing">What are we testing?</h2> <p>So let’s first have a quick look at the code I want to demonstrate this with:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># The migration for our data</span>
<span class="n">create_table</span> <span class="ss">:posts</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
  <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:title</span>
  <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:tag</span>
  <span class="n">t</span><span class="p">.</span><span class="nf">timestamps</span>
<span class="k">end</span>

<span class="k">class</span> <span class="nc">PostsController</span> <span class="o">&lt;</span> <span class="no">ApplicationController</span>
  <span class="k">def</span> <span class="nf">index</span>
    <span class="vi">@posts</span> <span class="o">=</span> <span class="no">Post</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="ss">title: </span><span class="s1">'Faster is more fun'</span><span class="p">,</span> <span class="ss">tag: </span><span class="s1">'ruby'</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div> <p>I filled up the database with 440,000 records to play with.</p> <p>I want to figure out whether adding an index on <code class="highlighter-rouge">:title</code> and <code class="highlighter-rouge">:tag</code> to posts will make the request faster. I am pretty sure it should.</p> <p>The example is contrived, but let’s explore it anyway. The view looks like this:</p> <div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;% @posts.each do |post| %&gt;
  &lt;%= post.id %&gt;
&lt;% end %&gt;
</code></pre></div></div> <p>Which renders like this:</p> <p><img src="/images/posts/developer-tools/quick-page-benchmarks/the-view.png" alt="The index with ids." /></p> <p>Imagine this is a webpage with some authorisation in front of it. How can we benchmark it? We might think of a Capybara or Rack test, which would work: we could hit the page a few hundred times in a <code class="highlighter-rouge">Benchmark</code> block in Ruby. But then we’d have to set up the right state in the test suite, sign in users, etc. That sounds like a lot of work, but there is an easier way!</p> <p>We will grab the request from the browser as a curl command, which will include all the cookies we need for authentication, and we don’t have to think about the URL; it will be in there already.
The test data is in our development database, but we could also point the app at a follower or another database with more data.</p> <p>We will create a command-line script that runs the curl request and feed it into <code class="highlighter-rouge">hyperfine</code> to benchmark it.</p> <p>We will then make our tweaks and run hyperfine again to see the ‘after’ result and validate that we made things better, not worse.</p> <p>Ensure you have <code class="highlighter-rouge">hyperfine</code> installed, for example with Homebrew: <code class="highlighter-rouge">brew install hyperfine</code>.</p> <p>So the steps are:</p> <ol> <li>Open your browser’s developer tools</li> <li>Go to the Network tab</li> <li>Right-click the request for the page and copy it as cURL</li> <li>Paste it into a file with any name you like, such as <code class="highlighter-rouge">benchmark.sh</code></li> <li>Run <code class="highlighter-rouge">chmod +x benchmark.sh</code></li> <li>Run hyperfine to capture the before-result: <code class="highlighter-rouge">hyperfine --warmup 10 --min-runs 100 ./benchmark.sh</code></li> <li>Make your changes and rerun the benchmark to capture the after-result</li> </ol> <p>So:</p> <p><img src="/images/posts/developer-tools/quick-page-benchmarks/capture-curl.png" alt="Capture request as curl" /></p> <p>Which looks like this:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="s1">'http://localhost:3001/'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Accept-Language: en-GB'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Cache-Control: max-age=0'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Connection: keep-alive'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Cookie: _...'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'If-None-Match: W/"2474ec9d05466c6ad84b819e7221085b"'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Sec-Fetch-Dest: document'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Sec-Fetch-Mode: navigate'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Sec-Fetch-Site: same-origin'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Sec-Fetch-User: ?1'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Sec-GPC: 1'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'Upgrade-Insecure-Requests: 1'</span> <span class="se">\</span>
  <span class="nt">-H</span> <span class="s1">'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.61 Safari/537.36'</span> <span class="se">\</span>
  <span class="nt">--compressed</span>
</code></pre></div></div> <p>After making the script executable, we can run hyperfine:</p> <p><code class="highlighter-rouge">hyperfine --warmup 10 --min-runs 100 ./benchmark.sh</code></p> <p>The flag <code class="highlighter-rouge">--warmup 10</code> first runs the request 10 times without measuring. I do this to warm up Rails and the database server a bit.
<code class="highlighter-rouge">--min-runs 100</code> ensures we fire at least 100 requests, so the mean and standard deviation are based on a reasonable number of samples.</p> <p>The result looks as follows:</p> <p><img src="/images/posts/developer-tools/quick-page-benchmarks/benchmark-result.png" alt="Hyperfine benchmark result" /></p> <p>Or, as text:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Benchmark 1: ./benchmark.sh
  Time (mean ± σ): 57.6 ms ± 8.7 ms [User: 2.5 ms, System: 3.6 ms]
  Range (min … max): 47.7 ms … 91.5 ms 100 runs
</code></pre></div></div> <p>So without an index, our request has a mean time of 57.6 ms.</p> <p>Let’s add an index and see if it makes the request quicker.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- We run this on the database:</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">index_posts_title_tag</span> <span class="k">ON</span> <span class="n">posts</span> <span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">tag</span><span class="p">);</span>
</code></pre></div></div> <p>Now we rerun hyperfine:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Benchmark 1: ./benchmark.sh
  Time (mean ± σ): 29.5 ms ± 7.2 ms [User: 2.4 ms, System: 3.4 ms]
  Range (min … max): 23.4 ms … 61.8 ms 100 runs
</code></pre></div></div> <p>Impressive! Almost twice as fast (57.6 ms down to 29.5 ms)! But more importantly, we have validated our improvement!</p> <h2 id="note-1-be-sure-to-have-a-lot-of-data">Note 1: Be sure to have a lot of data!</h2> <p>There is no point in trying this with just 10 posts. With 10 posts, before the index:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Benchmark 1: ./benchmark.sh
  Time (mean ± σ): 18.4 ms ± 5.9 ms [User: 2.4 ms, System: 3.4 ms]
  Range (min … max): 12.8 ms … 56.3 ms 146 runs
</code></pre></div></div> <p>After the index:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Benchmark 1: ./benchmark.sh
  Time (mean ± σ): 15.6 ms ± 2.1 ms [User: 2.4 ms, System: 3.3 ms]
  Range (min … max): 13.2 ms … 25.8 ms 157 runs
</code></pre></div></div> <p>There is some difference (a mean of 18.4 ms versus 15.6 ms), but it is much more visible with more data.</p> <h2 id="note-2-this-was-a-back-end-test">Note 2: This was a back-end test</h2> <p>Curl only fetches the HTML, and that is it: it won’t parse the HTML or load extra resources such as stylesheets, scripts and images. So this is a back-end test. If you want to measure loading the page with all its front-end assets, you will need a different tool.</p> <p>I have not looked into copying the cookies over so that I am still signed in on the page I am testing, but you can run a headless Chrome instance like so:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># First run this to have an easy reference to `chrome` if you are on macOS:</span>
<span class="nb">alias </span><span class="nv">chrome</span><span class="o">=</span><span class="s2">"/Applications/Google</span><span class="se">\ </span><span class="s2">Chrome.app/Contents/MacOS/Google</span><span class="se">\ </span><span class="s2">Chrome"</span>
</code></pre></div></div> <p>Call pages in the headless browser:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chrome <span class="nt">--headless</span> <span class="nt">--disable-gpu</span> <span class="nt">--dump-dom</span> http://localhost:3001/ </code></pre></div></div> <p><a
href="https://developers.google.com/web/updates/2017/04/headless-chrome" target="_blank">Taken from here</a></p> <p>Put the <code class="highlighter-rouge">chrome --headless...</code> command in the <code class="highlighter-rouge">benchmark.sh</code> script, and the same approach as above applies. Use the full path to the Chrome binary in the script, though: shell aliases are not available inside scripts.</p> <h2 id="conclusion">Conclusion</h2> <p>I’ve found this benchmarking method handy because it is quick to do, and the results are pretty reliable.</p> <p>Earlier in my career, I tried to optimise things more or less blindly and hoped for the best in production. But in production, it is tough to figure out whether you improved performance by 10% or whether a difference is due to other factors, such as server load. A benchmark on your own computer is relatively controlled, which makes it easier to attribute differences to the specific changes you made.</p> <p>You can use this approach with any framework, which makes it very portable.</p>Simple query optimisation2022-05-23T00:00:00+00:002022-05-23T00:00:00+00:00https://abuisman.com/posts/postgres/simple_query_optimisation<p>Query optimisation is one of my favourite things to do because it improves response times and reduces the resources your database uses.</p> <p>To optimise queries properly, you have to look at the queries but also know something about your application’s domain.
I’ll walk you through a straightforward one I did today.</p> <p>We start from the following query:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="nv">"bookings"</span> <span class="k">WHERE</span> <span class="nv">"bookings"</span><span class="p">.</span><span class="nv">"state"</span> <span class="o">=</span> <span class="s1">'confirmed'</span> <span class="k">AND</span> <span class="p">(</span><span class="n">ends_at</span> <span class="o">&lt;</span> <span class="s1">'2022-05-24'</span><span class="p">);</span>

<span class="k">Aggregate</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">88226</span><span class="p">.</span><span class="mi">78</span><span class="p">..</span><span class="mi">88226</span><span class="p">.</span><span class="mi">79</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">127</span><span class="p">.</span><span class="mi">844</span><span class="p">..</span><span class="mi">127</span><span class="p">.</span><span class="mi">845</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
  <span class="o">-&gt;</span> <span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">bookings</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">12225</span><span class="p">.</span><span class="mi">25</span><span class="p">..</span><span class="mi">88146</span><span class="p">.</span><span class="mi">35</span> <span class="k">rows</span><span class="o">=</span><span class="mi">32171</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">55</span><span class="p">.</span><span class="mi">488</span><span class="p">..</span><span class="mi">125</span><span class="p">.</span><span class="mi">832</span> <span class="k">rows</span><span class="o">=</span><span class="mi">39106</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(((</span><span class="k">state</span><span class="p">)::</span><span class="nb">text</span> <span class="o">=</span> <span class="s1">'confirmed'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span> <span class="k">AND</span> <span class="p">(</span><span class="n">ends_at</span> <span class="o">&lt;</span> <span class="s1">'2022-05-24'</span><span class="p">::</span><span class="nb">date</span><span class="p">))</span>
        <span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">30135</span>
        <span class="o">-&gt;</span> <span class="n">BitmapAnd</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">12225</span><span class="p">.</span><span class="mi">25</span><span class="p">..</span><span class="mi">12225</span><span class="p">.</span><span class="mi">25</span> <span class="k">rows</span><span class="o">=</span><span class="mi">32171</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">49</span><span class="p">.</span><span class="mi">448</span><span class="p">..</span><span class="mi">49</span><span class="p">.</span><span class="mi">449</span> <span class="k">rows</span><span class="o">=</span><span class="mi">0</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
              <span class="o">-&gt;</span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">index_bookings_on_state</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">2482</span><span class="p">.</span><span class="mi">18</span> <span class="k">rows</span><span class="o">=</span><span class="mi">40768</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">21</span><span class="p">.</span><span class="mi">448</span><span class="p">..</span><span class="mi">21</span><span class="p">.</span><span class="mi">448</span> <span class="k">rows</span><span class="o">=</span><span class="mi">41685</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
                    <span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">((</span><span class="k">state</span><span class="p">)::</span><span class="nb">text</span> <span class="o">=</span> <span class="s1">'confirmed'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span>
              <span class="o">-&gt;</span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">index_bookings_on_ends_at</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">9726</span><span class="p">.</span><span class="mi">74</span> <span class="k">rows</span><span class="o">=</span><span class="mi">198709</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">26</span><span class="p">.</span><span class="mi">043</span><span class="p">..</span><span class="mi">26</span><span class="p">.</span><span class="mi">043</span> <span class="k">rows</span><span class="o">=</span><span class="mi">198490</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
                    <span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">ends_at</span> <span class="o">&lt;</span> <span class="s1">'2022-05-24'</span><span class="p">::</span><span class="nb">date</span><span class="p">)</span>
<span class="n">Planning</span> <span class="nb">time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">616</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="nb">time</span><span class="p">:</span> <span class="mi">128</span><span class="p">.</span><span class="mi">003</span> <span class="n">ms</span>
</code></pre></div></div> <p>The EXPLAIN ANALYZE output shows a <code class="highlighter-rouge">Bitmap Index Scan on index_bookings_on_ends_at</code>.
This means that Postgres scans the index to find the records that match the condition <code class="highlighter-rouge">Index Cond: (ends_at &lt; '2022-05-24'::date)</code>, which yields 198490 rows (<code class="highlighter-rouge">rows=198490</code>).</p> <p>A second bitmap index scan, <code class="highlighter-rouge">Bitmap Index Scan on index_bookings_on_state</code>, uses the condition <code class="highlighter-rouge">((state)::text = 'confirmed'::text)</code> and yields 41685 rows (<code class="highlighter-rouge">rows=41685</code>).</p> <p>Both scans happen because there are separate indexes on <code class="highlighter-rouge">ends_at</code> and <code class="highlighter-rouge">state</code>. PostgreSQL has to figure out which records satisfy both conditions, but that information is spread across two indexes and has to be combined.</p> <p>Combining, at least as I understand it, happens in the <code class="highlighter-rouge">BitmapAnd</code> operation. Then the matching rows are fetched from the table’s physical storage in the <code class="highlighter-rouge">Bitmap Heap Scan</code>. The <code class="highlighter-rouge">Recheck Cond: (((state)::text = 'confirmed'::text) AND (ends_at &lt; '2022-05-24'::date))</code> is only re-evaluated for lossy bitmap pages, and here all heap blocks are exact (<code class="highlighter-rouge">Heap Blocks: exact=30135</code>). Even so, visiting those 30135 blocks is where most of the time goes: the BitmapAnd is done after about 49 ms, while the heap scan only finishes at about 126 ms.</p> <p>This sounds like a lot of work. You might have guessed how we can improve this: we have to create a single composite index on <code class="highlighter-rouge">state</code> and <code class="highlighter-rouge">ends_at</code>.
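</p> <p>As a rough intuition only, here is a plain-Ruby toy model of the difference. It is not how PostgreSQL’s B-tree indexes or bitmap scans are actually implemented. With two single-column indexes, each index produces its own list of matching ids, and the lists have to be intersected, much like the <code class="highlighter-rouge">BitmapAnd</code> step. With one composite index on <code class="highlighter-rouge">(ends_at, state)</code>, the entries are sorted by <code class="highlighter-rouge">ends_at</code> and then <code class="highlighter-rouge">state</code>, so the entries before the cutoff sit at the front and <code class="highlighter-rouge">state</code> can be checked in the same pass. The sample bookings and method names are made up for illustration.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "date"

# Hypothetical sample data standing in for the bookings table: id => [ends_at, state].
BOOKINGS = {
  1 => [Date.new(2022, 5, 25), "confirmed"],
  2 => [Date.new(2022, 5, 20), "cancelled"],
  3 => [Date.new(2022, 5, 23), "confirmed"],
  4 => [Date.new(2022, 5, 10), "confirmed"],
  5 => [Date.new(2022, 5, 24), "confirmed"]
}.freeze

CUTOFF = Date.new(2022, 5, 24)

# True when ends_at is strictly before CUTOFF, like the ends_at condition in the query.
def before_cutoff?(ends_at)
  (ends_at - CUTOFF).negative?
end

# Two single-column indexes: each yields the ids matching its own condition,
# and the two id lists are intersected afterwards (roughly what BitmapAnd does).
def count_with_separate_indexes(bookings)
  by_ends_at = bookings.select { |_id, (ends_at, _state)| before_cutoff?(ends_at) }.keys
  by_state = bookings.select { |_id, (_ends_at, state)| state == "confirmed" }.keys
  by_ends_at.intersection(by_state).size
end

# One composite index on (ends_at, state): entries sorted by ends_at, then state.
# Entries before the cutoff form a prefix, and state is read from the entry itself.
def count_with_composite_index(bookings)
  index = bookings.map { |id, (ends_at, state)| [ends_at, state, id] }.sort
  index
    .take_while { |ends_at, _state, _id| before_cutoff?(ends_at) }
    .count { |_ends_at, state, _id| state == "confirmed" }
end

count_with_separate_indexes(BOOKINGS) # returns 2
count_with_composite_index(BOOKINGS)  # returns 2
</code></pre></div></div> <p>Both methods count the same two bookings (ids 3 and 4); the composite version gets there without building and intersecting two separate id lists, and its counting pass only reads index entries, which is loosely what an index-only scan does.</p> <p>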
Let’s see what this does.</p> <p>We create the index like so:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">index_ends_at_state</span> <span class="k">ON</span> <span class="n">bookings</span><span class="p">(</span><span class="n">ends_at</span> <span class="n">date_ops</span><span class="p">,</span><span class="k">state</span> <span class="n">text_ops</span><span class="p">);</span> </code></pre></div></div> <p>Let’s re-run our query:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">Aggregate</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">5625</span><span class="p">.</span><span class="mi">65</span><span class="p">..</span><span class="mi">5625</span><span class="p">.</span><span class="mi">66</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">35</span><span class="p">.</span><span class="mi">506</span><span class="p">..</span><span class="mi">35</span><span class="p">.</span><span class="mi">506</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="k">Index</span> <span class="k">Only</span> <span class="n">Scan</span> <span class="k">using</span> <span class="n">index_ends_at_state</span> <span class="k">on</span> <span class="n">bookings</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span 
class="mi">42</span><span class="p">..</span><span class="mi">5545</span><span class="p">.</span><span class="mi">22</span> <span class="k">rows</span><span class="o">=</span><span class="mi">32171</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">241</span><span class="p">..</span><span class="mi">30</span><span class="p">.</span><span class="mi">923</span> <span class="k">rows</span><span class="o">=</span><span class="mi">39106</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">((</span><span class="n">ends_at</span> <span class="o">&lt;</span> <span class="s1">'2022-05-24'</span><span class="p">::</span><span class="nb">date</span><span class="p">)</span> <span class="k">AND</span> <span class="p">(</span><span class="k">state</span> <span class="o">=</span> <span class="s1">'confirmed'</span><span class="p">::</span><span class="nb">text</span><span class="p">))</span> <span class="n">Heap</span> <span class="n">Fetches</span><span class="p">:</span> <span class="mi">0</span> <span class="n">Planning</span> <span class="nb">time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">280</span> <span class="n">ms</span> <span class="n">Execution</span> <span class="nb">time</span><span class="p">:</span> <span class="mi">35</span><span class="p">.</span><span class="mi">554</span> <span class="n">ms</span> </code></pre></div></div> <p>Nice! 
As expected, this is quicker: from <code class="highlighter-rouge">128.003 ms</code> to <code class="highlighter-rouge">35.554 ms</code>, which is 3.6 times faster, or about 92 ms less per call.</p> <p>Why care?</p> <p><img src="/images/posts/postgres/simple-query-optimisation/simple_query_optimisation_usage_percentage.png" alt="The query uses 11% of total runtime" /></p> <p>Whether it should be or not, this query is being called ~70 times per minute at the time of writing. This means <code class="highlighter-rouge">70 * 0.092 s = 6.44 seconds/minute</code> of savings. Pretty good, and more broadly speaking, it frees up database resources.</p> <h2 id="but-wait-there-is-more">But wait, there is more!</h2> <p>This is where a bit of domain knowledge can help us improve the query’s speed further.</p> <p>We have defined our index with <code class="highlighter-rouge">ends_at</code> first and then <code class="highlighter-rouge">state</code>. Because the condition on <code class="highlighter-rouge">ends_at</code> is a range, Postgres has to walk through every index entry that matches the <code class="highlighter-rouge">ends_at</code> condition and check the <code class="highlighter-rouge">state</code> of each one along the way.</p> <p>As you’ll recall, our <code class="highlighter-rouge">ends_at</code> condition matched 198490 rows in the separate index scan, whereas the condition on <code class="highlighter-rouge">state</code> matched only 41685 rows.</p> <p>In other words, the <code class="highlighter-rouge">state</code> condition narrows down the rows much more than the <code class="highlighter-rouge">ends_at</code> condition does.</p> <p>Let’s swap the columns in the index to see if this helps.</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">DROP</span> <span class="k">INDEX</span> <span class="n">index_ends_at_state</span><span class="p">;</span> <span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">index_state_ends_at</span> <span class="k">ON</span> <span class="n">bookings</span><span
class="p">(</span><span class="k">state</span> <span class="n">text_ops</span><span class="p">,</span><span class="n">ends_at</span> <span class="n">date_ops</span><span class="p">);</span> </code></pre></div></div> <p>Now the query plan looks like this:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">Aggregate</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1248</span><span class="p">.</span><span class="mi">27</span><span class="p">..</span><span class="mi">1248</span><span class="p">.</span><span class="mi">28</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">24</span><span class="p">.</span><span class="mi">197</span><span class="p">..</span><span class="mi">24</span><span class="p">.</span><span class="mi">198</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="k">Index</span> <span class="k">Only</span> <span class="n">Scan</span> <span class="k">using</span> <span class="n">index_ends_at_state</span> <span class="k">on</span> <span class="n">bookings</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="p">..</span><span class="mi">1167</span><span class="p">.</span><span class="mi">84</span> <span class="k">rows</span><span class="o">=</span><span class="mi">32171</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span 
class="n">actual</span> <span class="nb">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">085</span><span class="p">..</span><span class="mi">17</span><span class="p">.</span><span class="mi">707</span> <span class="k">rows</span><span class="o">=</span><span class="mi">39106</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">((</span><span class="k">state</span> <span class="o">=</span> <span class="s1">'confirmed'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span> <span class="k">AND</span> <span class="p">(</span><span class="n">ends_at</span> <span class="o">&lt;</span> <span class="s1">'2022-05-24'</span><span class="p">::</span><span class="nb">date</span><span class="p">))</span> <span class="n">Heap</span> <span class="n">Fetches</span><span class="p">:</span> <span class="mi">0</span> <span class="n">Planning</span> <span class="nb">time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">255</span> <span class="n">ms</span> <span class="n">Execution</span> <span class="nb">time</span><span class="p">:</span> <span class="mi">24</span><span class="p">.</span><span class="mi">242</span> <span class="n">ms</span> </code></pre></div></div> <p>First, you might look at the new execution time: 24 ms. That is quicker, but it could be a coincidence: the execution time varies between 17 and 30 ms from run to run.
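</p> <p>Timings that noisy make it hard to judge the change, so it can help to reason about how much of the index each column order forces Postgres to read. The snippet below is a toy model, not PostgreSQL internals: it treats an index as a list of entries sorted by its key columns and counts how many entries a scan visits for each column order. The <code class="highlighter-rouge">Entry</code> struct, the data set and the helper names are made up for illustration, and the beginless range <code class="highlighter-rouge">(...900)</code> needs Ruby 2.7 or later.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy model: an index as a list of entries sorted by its key columns.
Entry = Struct.new(:state, :ends_at)
STATES = %w[confirmed cancelled pending].freeze

# Hypothetical data: 3,000 bookings with ends_at as a day number 0..999.
# Each day gets exactly one booking in each state.
ENTRIES = (0...3000).map { |i| Entry.new(STATES[i % 3], i % 1000) }.freeze

# Stands in for "ends_at before '2022-05-24'".
BEFORE_CUTOFF = (...900)

# Index on (ends_at, state): entries before the cutoff form one contiguous run,
# but confirmed and other states are interleaved in it, so every entry in the
# run is visited and its state checked.
def visited_with_ends_at_first(entries)
  sorted = entries.sort_by { |e| [e.ends_at, e.state] }
  sorted.take_while { |e| BEFORE_CUTOFF.cover?(e.ends_at) }.size
end

# Index on (state, ends_at): the confirmed entries form one contiguous run that
# is itself sorted by ends_at, so the scan jumps to it and stops at the cutoff.
def visited_with_state_first(entries)
  sorted = entries.sort_by { |e| [e.state, e.ends_at] }
  start = sorted.index { |e| e.state == "confirmed" }
  sorted.drop(start)
        .take_while { |e| e.state == "confirmed" }
        .take_while { |e| BEFORE_CUTOFF.cover?(e.ends_at) }
        .size
end

puts visited_with_ends_at_first(ENTRIES) # prints 2700
puts visited_with_state_first(ENTRIES)   # prints 900
</code></pre></div></div> <p>Both orders find the same 900 confirmed bookings, but in this toy model the state-first order only ever visits matching entries, because the equality condition on <code class="highlighter-rouge">state</code> pins down one contiguous block that is itself sorted by <code class="highlighter-rouge">ends_at</code>. Real data and the real planner are messier, so treat this as intuition rather than a measurement.</p> <p>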
So it feels faster, but is it?</p> <p>Let’s have a look at the <code class="highlighter-rouge">costs</code>:</p> <ul> <li>Separate indexes: <code class="highlighter-rouge">Aggregate (cost=88226.78..88226.79 rows=1 width=8)</code></li> <li><code class="highlighter-rouge">ends_at, state</code>: <code class="highlighter-rouge">Aggregate (cost=5625.65..5625.66 rows=1 width=8)</code></li> <li><code class="highlighter-rouge">state, ends_at</code>: <code class="highlighter-rouge">Aggregate (cost=1248.27..1248.28 rows=1 width=8)</code></li> </ul> <p>We can see that the planner’s estimated costs went from about 88k with separate indexes to 5.6k with the <code class="highlighter-rouge">ends_at, state</code> index and 1.2k with <code class="highlighter-rouge">state, ends_at</code>. Swapping the columns alone made the query about 4.5x cheaper to run.</p> <p>So there you have it: a quick exploration of how designing indexes around the queries your application actually runs, plus a bit of domain knowledge, can help you squeeze out some extra speed and save resources.</p>Fixing Ruby openssl errors2022-02-17T00:00:00+00:002022-02-17T00:00:00+00:00https://abuisman.com/posts/ruby/fixing-ruby-openssl-errors<p>The other day I wanted to try out Rails 7’s import maps on my new M1 MacBook, but I was blocked by an error telling me that Ruby couldn’t make any SSL requests.</p> <p>I’ll go through how I fixed this.</p> <p>The command I tried that gave me an error was:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bin/importmap pin @picocss/pico </code></pre></div></div> <p>I got the following error about SSL:</p> <div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/Users/abuisman/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/net/protocol.rb:46:in `connect_nonblock': SSL_connect returned=1 errno=0 peeraddr=52.142.124.215:443 state=error: certificate verify failed (unable to get local issuer certificate) (OpenSSL::SSL::SSLError) from /Users/abuisman/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/net/protocol.rb:46:in `ssl_socket_connect' from
/Users/abuisman/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/net/http.rb:1048:in `connect' from /Users/abuisman/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/net/http.rb:976:in `do_start' from /Users/abuisman/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/net/http.rb:965:in `start' from /Users/abuisman/.asdf/installs/ruby/3.1.0/lib/ruby/3.1.0/net/http.rb:1530:in `request' from (irb):25:in `&lt;main&gt;' from /Users/abuisman/.asdf/installs/ruby/3.1.0/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `&lt;top (required)&gt;' from /Users/abuisman/.asdf/installs/ruby/3.1.0/bin/irb:25:in `load' from /Users/abuisman/.asdf/installs/ruby/3.1.0/bin/irb:25:in `&lt;main&gt;' </code></pre></div></div> <p>I could reproduce this in a Ruby console with this code:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'uri'</span> <span class="nb">require</span> <span class="s1">'net/http'</span> <span class="nb">require</span> <span class="s1">'openssl'</span> <span class="n">url</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="s2">"https://duckduckgo.com/"</span><span class="p">)</span> <span class="n">http</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">)</span> <span class="n">http</span><span class="p">.</span><span class="nf">use_ssl</span> <span class="o">=</span> <span class="kp">true</span> <span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span 
class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="n">response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="nb">puts</span> <span class="n">response</span><span class="p">.</span><span class="nf">read_body</span> </code></pre></div></div> <p>I’ve seen this error before; usually it means Ruby wasn’t compiled properly against OpenSSL. So I first reinstalled [email protected] and openssl@3, again and again, but that didn’t fix it.</p> <p>While ducking around for some info, I picked up that this error could also mean my CA certificates were outdated.</p> <p>I’ll start with the fix:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew uninstall [email protected] <span class="nt">--force</span> <span class="nt">--ignore-dependencies</span> brew <span class="nb">install </span>[email protected] </code></pre></div></div> <p>The <code class="highlighter-rouge">--force --ignore-dependencies</code> flags are critical. For the first ten or so attempts I ran <code class="highlighter-rouge">uninstall</code> without them, and I didn’t notice that nothing was really being removed because of dependencies. The command output may have made this clear, but I don’t remember seeing anything about dependencies.
<code class="highlighter-rouge">--force --ignore-dependencies</code> was the thing that made a difference for me.</p> <p>This was with the following <code class="highlighter-rouge">.zshrc</code> configuration in place for compilation-related settings:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nb">export </span><span class="nv">OPTFLAGS</span><span class="o">=</span><span class="s2">"-Wno-error=implicit-function-declaration"</span> <span class="c"># readline</span> <span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/opt/homebrew/opt/readline/lib"</span> <span class="nb">export </span><span class="nv">CPPFLAGS</span><span class="o">=</span><span class="s2">"-I/opt/homebrew/opt/readline/include"</span> <span class="nb">export </span><span class="nv">PKG_CONFIG_PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/readline/lib/pkgconfig"</span> <span class="c"># openssl config</span> <span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/[email protected]:</span><span class="nv">$PATH</span><span class="s2">"</span> <span class="c"># Might be overkill</span> <span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/[email protected]/bin:</span><span class="nv">$PATH</span><span class="s2">"</span> <span class="nb">export </span><span class="nv">LIBRARY_PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/[email protected]:</span><span class="nv">$LIBRARY_PATH</span><span class="s2">"</span> <span class="nb">export </span><span class="nv">RUBY_CONFIGURE_OPTS</span><span class="o">=</span><span class="s2">"--with-openssl-dir=/opt/homebrew/opt/[email protected]"</span> <span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/opt/homebrew/opt/[email
protected]/lib:</span><span class="nv">$LDFLAGS</span><span class="s2">"</span> <span class="nb">export </span><span class="nv">CPPFLAGS</span><span class="o">=</span><span class="s2">"-I/opt/homebrew/opt/[email protected]/include:</span><span class="nv">$CPPFLAGS</span><span class="s2">"</span> <span class="nb">export </span><span class="nv">PKG_CONFIG_PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/[email protected]/lib/pkgconfig:</span><span class="nv">$PKG_CONFIG_PATH</span><span class="s2">"</span> <span class="c"># libffi</span> <span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"</span><span class="nv">$LDFLAGS</span><span class="s2">:-L/opt/homebrew/opt/libffi/lib"</span> <span class="nb">export </span><span class="nv">CPPFLAGS</span><span class="o">=</span><span class="s2">"</span><span class="nv">$CPPFLAGS</span><span class="s2">:-I/opt/homebrew/opt/libffi/include"</span> <span class="nb">export </span><span class="nv">PKG_CONFIG_PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$PKG_CONFIG_PATH</span><span class="s2">:/opt/homebrew/opt/libffi/lib/pkgconfig"</span> </code></pre></div></div> <p>Then install ruby. 
For example, with <code class="highlighter-rouge">asdf</code>:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>asdf <span class="nb">install </span>ruby 3.1.0 </code></pre></div></div> <p>Oh, and how do we know it works?</p> <p>A colleague of mine pointed me to the Bundler documentation, which mentions a handy command:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-Lks</span> <span class="s1">'https://git.io/rg-ssl'</span> | ruby </code></pre></div></div> <p><a href="https://bundler.io/v2.0/guides/rubygems_tls_ssl_troubleshooting_guide.html#automated-ssl-check">Bundler: How to troubleshoot RubyGems and Bundler TLS/SSL Issues</a></p> <p>Before the fix, the output looked like this:</p> <p><img src="/images/posts/ruby/fixing-openssl/broken-open-ssl-config.png" alt="The SSL check shows it is broken." /></p> <p>Afterwards, it looks like this:</p> <p><img src="/images/posts/ruby/fixing-openssl/working-ssl-config.png" alt="The SSL check shows it is working." /></p> <p>This gives us a quick way to test the config without having to boot up a Ruby console.</p> <h2 id="we-have-to-could-go-deeper">We <del>have to</del> could go deeper</h2> <p>When I migrated from an i7 MacBook to this new M1 Pro, I moved over my dotfiles, including my <code class="highlighter-rouge">.zshrc</code> config. At some point I suspected it was wrong, so I took a critical look.
I must admit it looked ridiculous:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">OPTFLAGS</span><span class="o">=</span><span class="s2">"-Wno-error=implicit-function-declaration"</span> <span class="c"># readline</span> <span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/opt/homebrew/opt/readline/lib"</span> <span class="nb">export </span><span class="nv">CPPFLAGS</span><span class="o">=</span><span class="s2">"-I/opt/homebrew/opt/readline/include"</span> <span class="nb">export </span><span class="nv">PKG_CONFIG_PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/readline/lib/pkgconfig"</span> <span class="c"># openssl config</span> <span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/[email protected]:</span><span class="nv">$PATH</span><span class="s2">"</span> <span class="nb">export </span><span class="nv">LIBRARY_PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/[email protected]"</span> <span class="nb">export </span><span class="nv">RUBY_CONFIGURE_OPTS</span><span class="o">=</span><span class="s2">"--with-openssl-dir=/opt/homebrew/opt/[email protected]"</span> <span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/opt/homebrew/opt/[email protected]/lib"</span> <span class="nb">export </span><span class="nv">PKG_CONFIG_PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/[email protected]/lib/pkgconfig"</span> <span class="c"># libffi</span> <span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/opt/homebrew/opt/libffi/lib"</span> <span class="nb">export </span><span class="nv">CPPFLAGS</span><span class="o">=</span><span class="s2">"</span><span class="nv">$-</span><span
class="s2">I/opt/homebrew/opt/libffi/include"</span> <span class="nb">export </span><span class="nv">PKG_CONFIG_PATH</span><span class="o">=</span><span class="s2">"/opt/homebrew/opt/libffi/lib/pkgconfig"</span> </code></pre></div></div> <p>Many of these variables are overwritten in every section. It seems this accumulated over time while I was trying things out or installing libraries. Who knows? Git shows many of these lines as new since moving to the new Mac.</p> <p>What is the problem? For example, <code class="highlighter-rouge">LDFLAGS</code> gets overwritten like so:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/opt/homebrew/opt/readline/lib"</span> <span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/opt/homebrew/opt/[email protected]/lib"</span> <span class="nb">export </span><span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/opt/homebrew/opt/libffi/lib"</span> </code></pre></div></div> <p>So in the end, the value is just <code class="highlighter-rouge">-L/opt/homebrew/opt/libffi/lib</code>. In the corrected config above, I avoid this by including the existing <code class="highlighter-rouge">$LDFLAGS</code> in each new value, so every section adds to the variable instead of overwriting it. One caveat: unlike path lists such as <code class="highlighter-rouge">PKG_CONFIG_PATH</code>, <code class="highlighter-rouge">LDFLAGS</code> and <code class="highlighter-rouge">CPPFLAGS</code> are space-separated lists of compiler flags, so their entries should be joined with a space rather than a colon (for example <code class="highlighter-rouge">"-L/opt/homebrew/opt/libffi/lib $LDFLAGS"</code>). It turns out this cleanup didn’t do much, or at least reverting it didn’t break OpenSSL again.</p> <p>Interestingly, I had already built Ruby 2.7.4, and it was showing the same issues. After fixing 3.1.0, 2.7.4 was fixed too. I believe this has to do with new certificate files being downloaded, as I mentioned earlier.
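</p> <p>If you want to see which OpenSSL build and which CA certificate locations your own Ruby uses, a few lines in a console can tell you. An error like “unable to get local issuer certificate” generally means OpenSSL could not find the certificate of the authority that issued the server’s certificate in these locations. This is a sketch using constants from Ruby’s bundled <code class="highlighter-rouge">openssl</code> library; the helper name <code class="highlighter-rouge">openssl_cert_info</code> is made up for this example.</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "openssl"

# Hypothetical helper: collect the OpenSSL version Ruby was built with and the
# CA certificate locations it falls back to when verifying a server.
def openssl_cert_info
  {
    compiled_against: OpenSSL::OPENSSL_VERSION,
    loaded_library: OpenSSL::OPENSSL_LIBRARY_VERSION,
    default_cert_file: OpenSSL::X509::DEFAULT_CERT_FILE,
    default_cert_dir: OpenSSL::X509::DEFAULT_CERT_DIR,
    # SSL_CERT_FILE / SSL_CERT_DIR, when set, override the defaults above
    cert_file_override: ENV.fetch(OpenSSL::X509::DEFAULT_CERT_FILE_ENV, "(not set)"),
    cert_dir_override: ENV.fetch(OpenSSL::X509::DEFAULT_CERT_DIR_ENV, "(not set)")
  }
end

openssl_cert_info.each { |key, value| puts "#{key}: #{value}" }
</code></pre></div></div> <p>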
My working theory is that reinstalling OpenSSL properly also updated some certificates.</p> <p>If you look here:</p> <p><img src="/images/posts/ruby/fixing-openssl/working-ssl-config.png" alt="The SSL check shows it is working." /></p> <p>You can see that there is a certificate directory and a certificate file:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SSL_CERT_FILE: /opt/homebrew/etc/[email protected]/cert.pem SSL_CERT_DIR: /opt/homebrew/etc/[email protected]/certs </code></pre></div></div> <p>When I look around in the directory, it doesn’t contain anything, and when I open the certificate file, I see the following:</p> <p><img src="/images/posts/ruby/fixing-openssl/openssl-certificate-opened.png" alt="The opened OpenSSL certificate file." /></p> <p>To be honest, I have no clue what this certificate’s role is. I might find out and update this post.</p> <h2 id="conclusion">Conclusion</h2> <p>For me, the fix was as described above. I ran into several GitHub issues, Stack Overflow answers, etc. that helped me piece together these instructions.
I hope this collection of solutions fixes it for you too, or for me when I run into this again and have forgotten.</p>Crazyton - Better than module_function2021-12-30T00:00:00+00:002021-12-30T00:00:00+00:00https://abuisman.com/posts/ruby/crazyton<p>I created a new gem that gives you the nice syntax of <code class="highlighter-rouge">module_function</code>s but with the advantages of classes.</p> <p>An example:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ApiStub</span> <span class="kp">include</span> <span class="no">Crazyton</span> <span class="k">def</span> <span class="nf">sign_up</span><span class="p">(</span><span class="n">email</span><span class="p">,</span> <span class="n">password</span><span class="p">,</span> <span class="n">status</span> <span class="o">=</span> <span class="mi">200</span><span class="p">,</span> <span class="n">response</span> <span class="o">=</span> <span class="n">default_sign_up_request</span><span class="p">)</span> <span class="n">stub_request</span><span class="p">(</span><span class="ss">:post</span><span class="p">,</span> <span class="s2">"https://some-service.com"</span><span class="p">)</span> <span class="p">.</span><span class="nf">with</span><span class="p">(</span><span class="ss">body: </span><span class="p">{</span> <span class="ss">email: </span><span class="n">email</span><span class="p">,</span> <span class="ss">password: </span><span class="n">password</span> <span class="p">})</span> <span class="p">.</span><span class="nf">to_return</span><span class="p">(</span><span class="ss">status: </span><span class="n">status</span><span class="p">,</span> <span class="ss">body: </span><span class="n">response</span><span class="p">.</span><span class="nf">to_json</span><span class="p">)</span> <span class="k">end</span> <span class="kp">private</span> <span class="k">def</span> <span
class="nf">default_sign_up_request</span> <span class="p">{</span> <span class="ss">event: :signed_up</span><span class="p">,</span> <span class="ss">user_id: </span><span class="mi">123</span> <span class="p">}</span> <span class="k">end</span> <span class="k">end</span> <span class="c1"># Invoke</span> <span class="no">ApiStub</span><span class="p">.</span><span class="nf">sign_up</span><span class="p">(</span><span class="s2">"[email protected]"</span><span class="p">,</span> <span class="s2">"password123"</span><span class="p">)</span> </code></pre></div></div> <p>You can get it here: <a href="https://github.com/abuisman/crazyton">GitHub - abuisman/crazyton</a></p> <h2 id="why">Why?</h2> <p>Modules with <code class="highlighter-rouge">module_function</code> are a pretty good way to define helpers and make them available in a nice way:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">AppSettings</span> <span class="kp">module_function</span> <span class="k">def</span> <span class="nf">google_analytics_enabled?</span> <span class="o">!!</span><span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s1">'GOOGLE_ANALYTICS'</span><span class="p">,</span> <span class="kp">false</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="c1">## Usage:</span> <span class="no">AppSettings</span><span class="p">.</span><span class="nf">google_analytics_enabled?</span> </code></pre></div></div> <p>But they aren’t great once things get more complex.</p> <p>Let’s say that I am creating a few interfaces to some websites.
Let’s call them crawlers.</p> <p>This is fine:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'uri'</span> <span class="nb">require</span> <span class="s1">'net/http'</span> <span class="nb">require</span> <span class="s1">'openssl'</span> <span class="nb">require</span> <span class="s1">'nokogiri'</span> <span class="k">module</span> <span class="nn">AbuismanCom</span> <span class="kp">module_function</span> <span class="k">def</span> <span class="nf">latest_post</span> <span class="n">get_selector_text</span><span class="p">(</span><span class="s2">"https://abuisman.com/"</span><span class="p">,</span> <span class="s2">".posts ul li:first-of-type"</span><span class="p">)</span> <span class="k">end</span> <span class="k">def</span> <span class="nf">get_selector_text</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">selector</span><span class="p">)</span> <span class="n">url</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="n">http</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">)</span> <span class="n">http</span><span class="p">.</span><span class="nf">use_ssl</span> <span class="o">=</span> <span class="kp">true</span> <span class="n">http</span><span class="p">.</span><span class="nf">verify_mode</span> <span class="o">=</span> <span class="no">OpenSSL</span><span class="o">::</span><span class="no">SSL</span><span class="o">::</span><span class="no">VERIFY_NONE</span> <span 
class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="n">response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="n">response</span><span class="p">.</span><span class="nf">read_body</span> <span class="n">document</span> <span class="o">=</span> <span class="no">Nokogiri</span><span class="o">::</span><span class="no">HTML</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="nf">read_body</span><span class="p">)</span> <span class="n">document</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="n">selector</span><span class="p">).</span><span class="nf">text</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">).</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:strip</span><span class="p">).</span><span class="nf">join</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="nb">puts</span> <span class="no">AbuismanCom</span><span class="p">.</span><span class="nf">latest_post</span> <span class="o">=&gt;</span> <span class="no">Diffing</span> <span class="no">Rails</span> <span class="n">credentials</span> <span class="k">in</span> <span class="no">Rails</span> <span class="mf">6.0</span> <span class="mi">10</span><span class="o">/</span><span 
class="mo">05</span> </code></pre></div></div> <p>But when I add another site, I recognise that I want to reuse some of the code and things are going in the wrong direction:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'uri'</span> <span class="nb">require</span> <span class="s1">'net/http'</span> <span class="nb">require</span> <span class="s1">'openssl'</span> <span class="nb">require</span> <span class="s1">'nokogiri'</span> <span class="k">module</span> <span class="nn">AbuismanCom</span> <span class="kp">module_function</span> <span class="k">def</span> <span class="nf">latest_post</span> <span class="no">HttpHelper</span><span class="p">.</span><span class="nf">get_selector_text</span><span class="p">(</span><span class="s2">"https://abuisman.com/"</span><span class="p">,</span> <span class="s2">".posts ul li:first-of-type"</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="k">module</span> <span class="nn">HackerNews</span> <span class="kp">module_function</span> <span class="k">def</span> <span class="nf">top_post</span> <span class="no">HttpHelper</span><span class="p">.</span><span class="nf">get_selector_text</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com/"</span><span class="p">,</span> <span class="s2">"#hnmain .itemlist tr:first-of-type"</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="k">module</span> <span class="nn">HttpHelper</span> <span class="kp">module_function</span> <span class="k">def</span> <span class="nf">get_selector_text</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">selector</span><span class="p">)</span> <span class="n">url</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> 
<span class="n">http</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">)</span> <span class="n">http</span><span class="p">.</span><span class="nf">use_ssl</span> <span class="o">=</span> <span class="kp">true</span> <span class="n">http</span><span class="p">.</span><span class="nf">verify_mode</span> <span class="o">=</span> <span class="no">OpenSSL</span><span class="o">::</span><span class="no">SSL</span><span class="o">::</span><span class="no">VERIFY_PEER</span> <span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="n">response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="n">document</span> <span class="o">=</span> <span class="no">Nokogiri</span><span class="o">::</span><span class="no">HTML</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="nf">read_body</span><span class="p">)</span> <span class="n">document</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="n">selector</span><span class="p">).</span><span class="nf">text</span><span class="p">.</span><span 
class="nf">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">).</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:strip</span><span class="p">).</span><span class="nf">join</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="nb">puts</span> <span class="no">AbuismanCom</span><span class="p">.</span><span class="nf">latest_post</span> <span class="o">=&gt;</span> <span class="no">Diffing</span> <span class="no">Rails</span> <span class="n">credentials</span> <span class="k">in</span> <span class="no">Rails</span> <span class="mf">6.0</span> <span class="mi">10</span><span class="o">/</span><span class="mo">05</span> <span class="nb">puts</span> <span class="no">HackerNews</span><span class="p">.</span><span class="nf">top_post</span> <span class="o">=&gt;</span> <span class="mi">1</span><span class="o">.</span> <span class="no">Tokyo</span> <span class="n">police</span> <span class="n">lose</span> <span class="mi">2</span> <span class="n">floppy</span> <span class="n">disks</span> <span class="n">containing</span> <span class="n">personal</span> <span class="n">info</span> <span class="n">on</span> <span class="n">citizens</span> <span class="p">(</span><span class="n">mainichi</span><span class="p">.</span><span class="nf">jp</span><span class="p">)</span> </code></pre></div></div> <p>I don’t like the call to <code class="highlighter-rouge">HttpHelper.get_selector_text</code>. You can fix that by using <code class="highlighter-rouge">extend(HttpHelper)</code> on the other two modules, so that is also ok. 
But I don’t want to do that for every website module that I create.</p> <p>I will want to add more helpers to the crawler, plus a caching feature, and I don’t want to keep repeating those in a list of <code class="highlighter-rouge">extend</code> calls. I could get a long way with something like <code class="highlighter-rouge">extend CrawlerStuff</code>, which in turn <code class="highlighter-rouge">extends</code> those things, but then I’d just be avoiding inheritance for the sake of avoiding it, or for that sweet, sweet <code class="highlighter-rouge">HackerNews.top_post</code> syntax.</p> <h2 id="insert-include-crazyton"><del>Insert</del> <code class="highlighter-rouge">include Crazyton</code></h2> <p>Instead, I can use inheritance from a base class and define whatever I need there:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'uri'</span> <span class="nb">require</span> <span class="s1">'net/http'</span> <span class="nb">require</span> <span class="s1">'openssl'</span> <span class="nb">require</span> <span class="s1">'nokogiri'</span> <span class="nb">require</span> <span class="s1">'crazyton'</span> <span class="k">class</span> <span class="nc">BaseCrawler</span> <span class="kp">include</span> <span class="no">Crazyton</span> <span class="k">def</span> <span class="nf">get_selector_text</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">selector</span><span class="p">)</span> <span class="n">url</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="n">http</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span 
class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">)</span> <span class="n">http</span><span class="p">.</span><span class="nf">use_ssl</span> <span class="o">=</span> <span class="kp">true</span> <span class="n">http</span><span class="p">.</span><span class="nf">verify_mode</span> <span class="o">=</span> <span class="no">OpenSSL</span><span class="o">::</span><span class="no">SSL</span><span class="o">::</span><span class="no">VERIFY_PEER</span> <span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="n">response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="n">document</span> <span class="o">=</span> <span class="no">Nokogiri</span><span class="o">::</span><span class="no">HTML</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="nf">read_body</span><span class="p">)</span> <span class="n">document</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="n">selector</span><span class="p">).</span><span class="nf">text</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">).</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:strip</span><span 
class="p">).</span><span class="nf">join</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="k">class</span> <span class="nc">AbuismanCom</span> <span class="o">&lt;</span> <span class="no">BaseCrawler</span> <span class="k">def</span> <span class="nf">latest_post</span> <span class="n">get_selector_text</span><span class="p">(</span><span class="s2">"https://abuisman.com/"</span><span class="p">,</span> <span class="s2">".posts ul li:first-of-type"</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="k">class</span> <span class="nc">HackerNews</span> <span class="o">&lt;</span> <span class="no">BaseCrawler</span> <span class="k">def</span> <span class="nf">top_post</span> <span class="n">get_selector_text</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com/"</span><span class="p">,</span> <span class="s2">"#hnmain .itemlist tr:first-of-type"</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> </code></pre></div></div> <p>Without all the <code class="highlighter-rouge">module_function</code> and <code class="highlighter-rouge">extend</code> stuff, the crawlers look a lot cleaner. The <code class="highlighter-rouge">&lt; BaseCrawler</code> is very basic Ruby, so I don’t need to explain to others on the team why they’d have to do some unfamiliar module setup. Even the <code class="highlighter-rouge">include Crazyton</code> happens only once, inside the <code class="highlighter-rouge">BaseCrawler</code>.</p> <p>What IS strange, then, is that you <em>would</em> have to explain to the others why <code class="highlighter-rouge">top_post</code> is suddenly callable like a class method instead of being an instance method. 
In the end, though, I like how little bootstrapping code there is in the classes.</p> <p>Because we are using classes and inheritance, I could also give the BaseCrawler an interface:</p> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s1">'uri'</span> <span class="nb">require</span> <span class="s1">'net/http'</span> <span class="nb">require</span> <span class="s1">'openssl'</span> <span class="nb">require</span> <span class="s1">'nokogiri'</span> <span class="nb">require</span> <span class="s1">'crazyton'</span> <span class="k">class</span> <span class="nc">BaseCrawler</span> <span class="kp">include</span> <span class="no">Crazyton</span> <span class="k">def</span> <span class="nf">get_selector_text</span><span class="p">(</span><span class="n">selector</span><span class="p">)</span> <span class="n">url</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">base_url</span><span class="p">)</span> <span class="n">http</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">)</span> <span class="n">http</span><span class="p">.</span><span class="nf">use_ssl</span> <span class="o">=</span> <span class="kp">true</span> <span class="n">http</span><span class="p">.</span><span class="nf">verify_mode</span> <span class="o">=</span> <span class="no">OpenSSL</span><span class="o">::</span><span class="no">SSL</span><span class="o">::</span><span class="no">VERIFY_PEER</span> <span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span 
class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="n">response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="n">document</span> <span class="o">=</span> <span class="no">Nokogiri</span><span class="o">::</span><span class="no">HTML</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="nf">read_body</span><span class="p">)</span> <span class="n">document</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="n">selector</span><span class="p">).</span><span class="nf">text</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">).</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:strip</span><span class="p">).</span><span class="nf">join</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="k">end</span> <span class="kp">private</span> <span class="k">def</span> <span class="nf">base_url</span> <span class="k">raise</span> <span class="s2">"You have to implement base_url"</span> <span class="k">end</span> <span class="k">end</span> <span class="k">class</span> <span class="nc">AbuismanCom</span> <span class="o">&lt;</span> <span class="no">BaseCrawler</span> <span class="k">def</span> <span class="nf">latest_post</span> <span class="n">get_selector_text</span><span class="p">(</span><span class="s2">".posts ul li:first-of-type"</span><span 
class="p">)</span> <span class="k">end</span> <span class="kp">private</span> <span class="k">def</span> <span class="nf">base_url</span> <span class="s2">"https://abuisman.com/"</span> <span class="k">end</span> <span class="k">end</span> <span class="k">class</span> <span class="nc">HackerNews</span> <span class="o">&lt;</span> <span class="no">BaseCrawler</span> <span class="k">def</span> <span class="nf">top_post</span> <span class="n">get_selector_text</span><span class="p">(</span><span class="s2">"#hnmain .itemlist tr:first-of-type"</span><span class="p">)</span> <span class="k">end</span> <span class="kp">private</span> <span class="k">def</span> <span class="nf">base_url</span> <span class="s2">"https://news.ycombinator.com/"</span> <span class="k">end</span> <span class="k">end</span> </code></pre></div></div> <p>Ah yes, did I mention <code class="highlighter-rouge">private</code> methods? Modules can have them too, but with <code class="highlighter-rouge">module_function</code> every method is copied as a public module method, so hiding a helper takes an extra <code class="highlighter-rouge">private_class_method</code> call. In a class, a plain <code class="highlighter-rouge">private</code> does the job.</p> <p>In summary, with <code class="highlighter-rouge">Crazyton</code> you get:</p> <ul> <li>smooth syntax</li> <li>inheritance</li> <li>standard-looking Ruby with a twist</li> <li>private methods</li> <li>a free Singleton instance</li> </ul> <p>Have fun!</p>Heroku differ: see which commits would be deployed2021-05-13T00:00:00+00:002021-05-13T00:00:00+00:00https://abuisman.com/posts/developer-tools/heroku-diff<p>Before pushing to Heroku, I’d like to know what will be in the release. 
Let’s use the power of git to answer this question.</p> <p>The script I created is the following; I’ll go over it argument by argument below.</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span> git log --first-parent heroku/master..master --abbrev-commit --date<span class="o">=</span>relative --format<span class="o">=</span>format:<span class="s1">'%C(bold blue)%h%C(reset) - %C(bold green)(%ar)%C(reset) %C(white)%s%C(reset) %C(dim white)- %an%C(reset)%C(bold yellow)%d%C(reset)'</span> </code></pre></div></div> <p>First, here is what the output looks like:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~/Sites/project <span class="o">(</span>master<span class="o">)</span> -&gt; bin/heroku_diff.sh 3820f5782 - <span class="o">(</span>10 hours ago<span class="o">)</span> Merge pull request <span class="c">#1331 from Project/tech/heroku-differ - Achilleas (HEAD -&gt; master, origin/master, origin/HEAD)</span> 996c35a69 - <span class="o">(</span>10 hours ago<span class="o">)</span> Merge pull request <span class="c">#1307 from Project/feature/get-go-leads-form - Other Dev</span> 996c35a69 - <span class="o">(</span>12 hours ago<span class="o">)</span> Merge pull request <span class="c">#1234 from Project/feature/some-feature - Other Dev</span> ~/Sites/project <span class="o">(</span>master<span class="o">)</span> -&gt; </code></pre></div></div> <p>So how did we end up here?</p> <p>First, let’s check out a project I can use without worrying about sharing things I shouldn’t: the Rails codebase. To get a nice diff, I looked up a <a href="https://github.com/rails/rails/pull/35489">PR</a> by David Heinemeier Hansson (DHH) about building “a web interface for developing Rails itself. … like with the mailer previews, and now in Rails 6, with the Action Mailbox inbound emails processing.” 
Sounds cool, whatever that means. Most importantly for us, though, it has a few commits that aren’t in the <code class="highlighter-rouge">main</code> branch yet. So let us pretend the `conductor</p> <p>I started out by looking up how to show the commits that are in one branch and not in the other:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~/Projects/rails (conductor6) -&gt; git log main..conductor6 commit 398053bbd7d481ff16cb3fbeeb5fe039fc302fae (HEAD -&gt; conductor6, origin/conductor6) Merge: c319e5da8d 93dbbe3a81 Author: David Heinemeier Hansson &lt;[email protected]&gt; Date: Wed Mar 27 16:49:23 2019 -0700 Merge branch 'master' into conductor6 commit c319e5da8d3b647f400f4e004b8d7875e9e0768f Merge: 00297b1891 a04a757e5d Author: Prathamesh Sonpatki &lt;[email protected]&gt; Date: Tue Mar 12 17:40:35 2019 +0530 Merge branch 'master' into conductor6 commit 00297b18913990af8473d11a3416394bc7e902d8 Merge: 08cfdd94f1 b4bb0c3f5c Author: Kasper Timm Hansen &lt;[email protected]&gt; Date: Tue Mar 12 10:30:01 2019 +0100 Merge pull request #35580 from prathamesh-sonpatki/conductor6-fixes Fix failing tests #..... etc </code></pre></div></div> <p>OK, that looks good, and it seems to be correct.</p>
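<p>Put differently, <code class="highlighter-rouge">git log main..conductor6</code> lists the commits reachable from <code class="highlighter-rouge">conductor6</code> that are not reachable from <code class="highlighter-rouge">main</code>; git reads it the same as <code class="highlighter-rouge">git log conductor6 ^main</code>. The Ruby below is a toy model of that set difference, a sketch rather than how git actually walks history. The commit IDs and the <code class="highlighter-rouge">PARENTS</code> graph are made up for illustration, with <code class="highlighter-rouge">a3</code> standing in for the tip of <code class="highlighter-rouge">main</code> and <code class="highlighter-rouge">b3</code> for the tip of <code class="highlighter-rouge">conductor6</code>.</p>

```ruby
require 'set'

# Hypothetical commit graph for illustration: commit ID => parent IDs.
# Merge commits have two parents.
PARENTS = {
  "a1" => [],
  "a2" => ["a1"],
  "a3" => ["a2"],        # tip of main
  "b1" => ["a2"],        # branch work forked from a2
  "b2" => ["b1", "a3"],  # merge of main into the branch
  "b3" => ["b2"]         # tip of conductor6
}.freeze

# Every commit reachable from +tip+ by following parent links (depth-first).
def reachable(tip, parents)
  seen = Set.new
  stack = [tip]
  until stack.empty?
    commit = stack.pop
    next unless seen.add?(commit) # skip commits we have already visited
    stack.concat(parents.fetch(commit))
  end
  seen
end

# Toy model of `git log from..to`: reachable from `to` minus reachable from `from`.
def commit_range(from, to, parents)
  reachable(to, parents) - reachable(from, parents)
end

p commit_range("a3", "b3", PARENTS).sort # => ["b1", "b2", "b3"]
```

<p>The merge commit <code class="highlighter-rouge">b2</code> is part of the result because only the branch tip can reach it, while <code class="highlighter-rouge">a3</code>, the main commit it merged in, is left out. That mirrors the “Merge branch 'master' into conductor6” entries in the log above.</p>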