Home on Nathaniel Thomas https://www.nathom.dev/ Recent content in Home on Nathaniel Thomas Hugo en Wed, 07 Jan 2026 00:00:00 +0000 Continual Learning is not Continual Midtraining https://www.nathom.dev/continual-learning/ Wed, 07 Jan 2026 00:00:00 +0000 https://www.nathom.dev/continual-learning/ <p>Many have caught on to the truth that AGI-through-pure-LLM-scaling is probably not going to happen. And many have identified <em>continual learning</em> as the key difference between LLMs and a generally intelligent agent. If you&rsquo;ve ever used Claude Code, you will be acutely aware of how effective context length limits LLMs&rsquo; general utility, and <em>if only</em> we had something that <strong>wouldn&rsquo;t run out of context</strong>, we could all finally be unemployed.</p> <p>A tempting (and sensible) attempt to solve this is continual midtraining. For example, this would be Anthropic collecting successful Claude Code traces and folding them back into the SFT stage for its next model, which it would release on a monthly basis. This might make the resulting models very powerful coding agents, but will not give them the ability to fully automate jobs. Why? Because all this procedure does is continually improve the LLM&rsquo;s <em>world model</em>, which is distinct from its <em>world state</em>. 
Its world state only exists within its position embedded KV cache.</p> Advent of Code 2025 in Haskell https://www.nathom.dev/notes/aoc2025/ Mon, 01 Dec 2025 00:00:00 +0000 https://www.nathom.dev/notes/aoc2025/ <p>It&rsquo;s <em>that</em> time of year again.</p> <p>You can try out the solutions <a href="https://github.com/nathom/aoc25" target="_blank" rel="noopener">here</a>.</p> <h2 id="day-1">Day 1</h2> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-haskell" data-lang="haskell"><span class="line"><span class="cl"><span class="kr">module</span> <span class="nn">Main</span> <span class="kr">where</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Common</span> <span class="p">(</span><span class="nf">parseFile</span><span class="p">)</span> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Control.Applicative</span> <span class="p">((</span><span class="o">&lt;|&gt;</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Data.List</span> <span class="p">(</span><span class="nf">foldl&#39;</span><span class="p">,</span> <span class="nf">scanl</span><span class="p">)</span> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Data.Text</span> <span class="n">qualified</span> <span class="n">as</span> <span class="kt">T</span> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Data.Void</span> <span class="p">(</span><span class="kt">Void</span><span class="p">)</span> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Text.Megaparsec</span> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Text.Megaparsec.Char</span> 
</span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Text.Megaparsec.Char.Lexer</span> <span class="n">qualified</span> <span class="n">as</span> <span class="kt">Lex</span> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Text.Printf</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="kr">data</span> <span class="kt">Dir</span> <span class="ow">=</span> <span class="kt">L</span> <span class="o">|</span> <span class="kt">R</span> </span></span><span class="line"><span class="cl"> <span class="kr">deriving</span> <span class="p">(</span><span class="kt">Show</span><span class="p">,</span> <span class="kt">Eq</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="kr">data</span> <span class="kt">Rot</span> <span class="ow">=</span> <span class="kt">Rot</span> <span class="kt">Dir</span> <span class="kt">Int</span> </span></span><span class="line"><span class="cl"> <span class="kr">deriving</span> <span class="p">(</span><span class="kt">Show</span><span class="p">,</span> <span class="kt">Eq</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="kr">type</span> <span class="kt">Parser</span> <span class="ow">=</span> <span class="kt">Parsec</span> <span class="kt">Void</span> <span class="kt">T</span><span class="o">.</span><span class="kt">Text</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">dirSign</span> <span class="ow">::</span> <span class="kt">Dir</span> <span class="ow">-&gt;</span> <span class="kt">Int</span> </span></span><span class="line"><span class="cl"><span class="nf">dirSign</span> <span class="kt">L</span> <span class="ow">=</span> <span 
class="o">-</span><span class="mi">1</span> </span></span><span class="line"><span class="cl"><span class="nf">dirSign</span> <span class="kt">R</span> <span class="ow">=</span> <span class="mi">1</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">parseRot</span> <span class="ow">::</span> <span class="kt">Parser</span> <span class="kt">Rot</span> </span></span><span class="line"><span class="cl"><span class="nf">parseRot</span> <span class="ow">=</span> <span class="kt">Rot</span> <span class="o">&lt;$&gt;</span> <span class="p">(</span><span class="kt">L</span> <span class="o">&lt;$</span> <span class="n">char</span> <span class="sc">&#39;L&#39;</span> <span class="o">&lt;|&gt;</span> <span class="kt">R</span> <span class="o">&lt;$</span> <span class="n">char</span> <span class="sc">&#39;R&#39;</span><span class="p">)</span> <span class="o">&lt;*&gt;</span> <span class="kt">Lex</span><span class="o">.</span><span class="n">decimal</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">solve1</span> <span class="ow">::</span> <span class="p">[</span><span class="kt">Rot</span><span class="p">]</span> <span class="ow">-&gt;</span> <span class="kt">Int</span> </span></span><span class="line"><span class="cl"><span class="nf">solve1</span> <span class="n">rots</span> <span class="ow">=</span> <span class="n">length</span> <span class="o">$</span> <span class="n">filter</span> <span class="p">(</span><span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="o">$</span> <span class="n">scanl</span> <span class="n">update</span> <span class="mi">50</span> <span class="n">rots</span> </span></span><span class="line"><span class="cl"> <span class="kr">where</span> </span></span><span class="line"><span class="cl"> <span class="n">update</span> <span class="n">pos</span> <span class="p">(</span><span 
class="kt">Rot</span> <span class="n">d</span> <span class="n">x</span><span class="p">)</span> <span class="ow">=</span> <span class="p">(</span><span class="n">pos</span> <span class="o">+</span> <span class="p">(</span><span class="n">dirSign</span> <span class="n">d</span><span class="p">)</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">100</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">solve2</span> <span class="ow">::</span> <span class="p">[</span><span class="kt">Rot</span><span class="p">]</span> <span class="ow">-&gt;</span> <span class="kt">Int</span> </span></span><span class="line"><span class="cl"><span class="nf">solve2</span> <span class="n">rots</span> <span class="ow">=</span> <span class="n">snd</span> <span class="o">$</span> <span class="n">foldl&#39;</span> <span class="n">expand</span> <span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="n">rots</span> </span></span><span class="line"><span class="cl"> <span class="kr">where</span> </span></span><span class="line"><span class="cl"> <span class="n">expand</span> <span class="p">(</span><span class="n">pos</span><span class="p">,</span> <span class="n">count</span><span class="p">)</span> <span class="p">(</span><span class="kt">Rot</span> <span class="n">d</span> <span class="n">x</span><span class="p">)</span> <span class="ow">=</span> </span></span><span class="line"><span class="cl"> <span class="kr">let</span> <span class="n">steps</span> <span class="ow">=</span> <span class="p">[</span><span class="n">pos</span> <span class="o">+</span> <span class="n">dirSign</span> <span class="n">d</span> <span class="o">*</span> <span class="n">n</span> <span class="o">|</span> <span class="n">n</span> <span 
class="ow">&lt;-</span> <span class="p">[</span><span class="mi">1</span> <span class="o">..</span> <span class="n">x</span><span class="p">]]</span> </span></span><span class="line"><span class="cl"> <span class="kr">in</span> <span class="p">((</span><span class="n">pos</span> <span class="o">+</span> <span class="n">dirSign</span> <span class="n">d</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">100</span><span class="p">,</span> <span class="n">count</span> <span class="o">+</span> <span class="n">length</span> <span class="p">(</span><span class="n">filter</span> <span class="p">(</span><span class="nf">\</span><span class="n">p</span> <span class="ow">-&gt;</span> <span class="n">p</span> <span class="p">`</span><span class="n">mod</span><span class="p">`</span> <span class="mi">100</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="n">steps</span><span class="p">))</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">main</span> <span class="ow">::</span> <span class="kt">IO</span> <span class="nb">()</span> </span></span><span class="line"><span class="cl"><span class="nf">main</span> <span class="ow">=</span> <span class="kr">do</span> </span></span><span class="line"><span class="cl"> <span class="n">sampleRots</span> <span class="ow">&lt;-</span> <span class="n">parseFile</span> <span class="p">(</span><span class="n">parseRot</span> <span class="p">`</span><span class="n">sepEndBy</span><span class="p">`</span> <span class="n">newline</span><span class="p">)</span> <span class="s">&#34;input/day01_sample.txt&#34;</span> </span></span><span class="line"><span class="cl"> <span class="n">rots</span> <span class="ow">&lt;-</span> <span class="n">parseFile</span> <span class="p">(</span><span class="n">parseRot</span> 
<span class="p">`</span><span class="n">sepEndBy</span><span class="p">`</span> <span class="n">newline</span><span class="p">)</span> <span class="s">&#34;input/day01.txt&#34;</span> </span></span><span class="line"><span class="cl"> <span class="n">printf</span> <span class="s">&#34;Part 1 Sample answer: %s</span><span class="se">\n</span><span class="s">&#34;</span> <span class="p">(</span><span class="n">show</span> <span class="o">$</span> <span class="n">solve1</span> <span class="n">sampleRots</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="n">printf</span> <span class="s">&#34;Part 1 Final answer: %s</span><span class="se">\n</span><span class="s">&#34;</span> <span class="p">(</span><span class="n">show</span> <span class="o">$</span> <span class="n">solve1</span> <span class="n">rots</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="n">printf</span> <span class="s">&#34;Part 2 Sample answer: %s</span><span class="se">\n</span><span class="s">&#34;</span> <span class="p">(</span><span class="n">show</span> <span class="o">$</span> <span class="n">solve2</span> <span class="n">sampleRots</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="n">printf</span> <span class="s">&#34;Part 2 Final answer: %s</span><span class="se">\n</span><span class="s">&#34;</span> <span class="p">(</span><span class="n">show</span> <span class="o">$</span> <span class="n">solve2</span> <span class="n">rots</span><span class="p">)</span> </span></span></code></pre></div><p>Yes, I know part 2&rsquo;s solution is inefficient&mdash;it&rsquo;s prettier this way.</p> Comparing Structured Data Formats for LLMs https://www.nathom.dev/llm-data-formats/ Sun, 12 Oct 2025 00:00:00 +0000 https://www.nathom.dev/llm-data-formats/ <p>As we start training LLMs as Agents, we must think about how to best pass information to and from the real-world environment. 
If it calls an external function, how should arguments be passed? How should data from the environment be fed to the model? The simplest (and most general) solution uses <em>structured data formats</em>, such as JSON. These formats can encode arbitrarily nested, heterogeneously typed data structures.</p> <p>But is JSON the right choice? We have many options, such as TOML, YAML, XML, and others. In this post, we consider and measure metrics that will help us make the right choice.</p> The best T-Shirt https://www.nathom.dev/notes/tshirts/ Thu, 10 Apr 2025 00:00:00 +0000 https://www.nathom.dev/notes/tshirts/ <p>I recently bought Bryan Johnson&rsquo;s <a href="https://blueprint.bryanjohnson.com/products/super-veggie-t-shirt?variant=49506805219613" target="_blank" rel="noopener">Super Veggie T-Shirt</a>, in order to fully immerse myself in his protocol.</p> <p> <figure class=""> <div> <img loading="lazy" alt="" src=" /tshirts/superveggie.png"> </div> </figure></p> <p>It was $37, not a terrible price, and I think it looks cool. But once I received it, I noticed that the quality was markedly better than <em>any</em> other t-shirt I own, even those that were double the price. Hence, my quest for the best tee. The goal is to optimize for the quality/cost ratio. As a starting point, I&rsquo;m looking at only 6.1 oz/yd^2 tees, the same weight as the super veggie one. 
For convenience, I&rsquo;m only going to review those available on Amazon.</p> How to use $\LaTeX$ in Excalidraw https://www.nathom.dev/notes/latex_in_excalidraw/ Thu, 27 Mar 2025 00:00:00 +0000 https://www.nathom.dev/notes/latex_in_excalidraw/ <p>Excalidraw currently doesn&rsquo;t support <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8988em;vertical-align:-0.2155em;"></span><span class="mord text"><span class="mord textrm">L</span><span class="mspace" style="margin-right:-0.36em;"></span><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6833em;"><span style="top:-2.905em;"><span class="pstrut" style="height:2.7em;"></span><span class="mord"><span class="mord textrm mtight sizing reset-size6 size3">A</span></span></span></span></span></span><span class="mspace" style="margin-right:-0.15em;"></span><span class="mord text"><span class="mord textrm">T</span><span class="mspace" style="margin-right:-0.1667em;"></span><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.4678em;"><span style="top:-2.7845em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord textrm">E</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2155em;"><span></span></span></span></span><span class="mspace" style="margin-right:-0.125em;"></span><span class="mord textrm">X</span></span></span></span></span></span>, which sucks. 
The workaround is to generate an SVG for whatever math you want to render, and paste that in.</p> <p>You can use this script to generate the SVG:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># use svg backend</span> </span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">switch_backend</span><span class="p">(</span><span class="s1">&#39;svg&#39;</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># enable latex rendering</span> </span></span><span class="line"><span class="cl"><span class="n">plt</span><span class="o">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s1">&#39;text.usetex&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="kc">True</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># create figure</span> </span></span><span class="line"><span class="cl"><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> </span></span><span class="line"><span class="cl"><span class="n">ax</span><span class="o">.</span><span class="n">axis</span><span class="p">(</span><span class="s1">&#39;off&#39;</span><span 
class="p">)</span> <span class="c1"># no axes</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="n">latex_str</span> <span class="o">=</span> <span class="sa">r</span><span class="s2">&#34;$\nabla \mathcal J(\theta)$&#34;</span> </span></span><span class="line"><span class="cl"><span class="n">ax</span><span class="o">.</span><span class="n">text</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">,</span> <span class="n">latex_str</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">ha</span><span class="o">=</span><span class="s1">&#39;center&#39;</span><span class="p">,</span> <span class="n">va</span><span class="o">=</span><span class="s1">&#39;center&#39;</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="n">fig</span><span class="o">.</span><span class="n">savefig</span><span class="p">(</span><span class="s2">&#34;latex.svg&#34;</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s2">&#34;svg&#34;</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s1">&#39;tight&#39;</span><span class="p">,</span> <span class="n">transparent</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> </span></span></code></pre></div><p>Et voilà:</p> Sharpe Ratio Based Portfolio Simulator https://www.nathom.dev/notes/sharpe/ Thu, 06 Mar 2025 00:00:00 +0000 https://www.nathom.dev/notes/sharpe/ <p>The Sharpe Ratio measures the quality of an equity or hedge fund by showing the return per unit of risk, calculated as <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" 
style="height:1.1994em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8544em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03588em;">σ</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.4461em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">μ</span><span class="mbin mtight">−</span><span class="mord mathnormal mtight" style="margin-right:0.02778em;">r</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span>, where <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal">μ</span></span></span></span> is the expected return, <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">r</span></span></span></span> is the risk-free rate, and <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">σ</span></span></span></span> is the standard deviation (volatility). 
A higher ratio indicates better performance for the risk taken—more return without excessive variability. In the simulator, the standard deviation slider ( <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">σ</span></span></span></span>) controls the level of risk you accept: increasing it means you’re comfortable with greater fluctuations in value, while decreasing it reduces exposure to variability, reflecting how much uncertainty you’re willing to tolerate.</p> Entropy from First Principles https://www.nathom.dev/entropy/ Wed, 01 Jan 2025 00:00:00 +0000 https://www.nathom.dev/entropy/ <p>I find entropy to be extremely fascinating. But, matching the formula <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.3262em;vertical-align:-0.4811em;"></span><span class="mop op-symbol small-op" style="position:relative;top:0em;">∑</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span 
class="vlist" style="height:0.8451em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3281em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.0714em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.4811em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span> to its &ldquo;intuitive&rdquo; explanations related to prefix free codes and information content is not obvious. Here, I want to go over a couple ways to independently arrive at the idea.</p> Advent of Code 2024 in Haskell https://www.nathom.dev/notes/aoc2024/ Sat, 21 Dec 2024 00:00:00 +0000 https://www.nathom.dev/notes/aoc2024/ <p>I&rsquo;m doing AoC in Haskell to learn the language. 
These are my solutions.</p> <h2 id="day-1">Day 1</h2> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-haskell" data-lang="haskell"><span class="line"><span class="cl"><span class="kr">import</span> <span class="nn">Data.List</span> </span></span><span class="line"><span class="cl"><span class="kr">import</span> <span class="k">qualified</span> <span class="nn">Data.Map</span> <span class="k">as</span> <span class="n">Map</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">f</span> <span class="n">xs</span> <span class="ow">=</span> </span></span><span class="line"><span class="cl"> <span class="kr">let</span> <span class="n">x1s</span> <span class="ow">=</span> <span class="n">sort</span> <span class="o">$</span> <span class="n">map</span> <span class="n">fst</span> <span class="n">xs</span> </span></span><span class="line"><span class="cl"> <span class="n">x2s</span> <span class="ow">=</span> <span class="n">sort</span> <span class="o">$</span> <span class="n">map</span> <span class="n">snd</span> <span class="n">xs</span> </span></span><span class="line"><span class="cl"> <span class="n">diff</span> <span class="n">x</span> <span class="n">y</span> <span class="ow">=</span> <span class="n">abs</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">y</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="kr">in</span> <span class="n">sum</span> <span class="o">$</span> <span class="n">zipWith</span> <span class="n">diff</span> <span class="n">x1s</span> <span class="n">x2s</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">counter</span> <span class="ow">=</span> <span class="kt">Map</span><span class="o">.</span><span class="n">fromListWith</span> <span class="p">(</span><span class="o">+</span><span 
class="p">)</span> <span class="o">.</span> <span class="n">map</span> <span class="p">(,</span><span class="mi">1</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">sim</span> <span class="n">xs</span> <span class="ow">=</span> </span></span><span class="line"><span class="cl"> <span class="kr">let</span> <span class="n">c</span> <span class="ow">=</span> <span class="n">counter</span> <span class="p">(</span><span class="n">map</span> <span class="n">snd</span> <span class="n">xs</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="kr">in</span> <span class="n">sum</span> <span class="p">[</span><span class="n">x</span> <span class="o">*</span> <span class="kt">Map</span><span class="o">.</span><span class="n">findWithDefault</span> <span class="mi">0</span> <span class="n">x</span> <span class="n">c</span> <span class="o">|</span> <span class="n">x</span> <span class="ow">&lt;-</span> <span class="n">map</span> <span class="n">fst</span> <span class="n">xs</span><span class="p">]</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">main</span> <span class="ow">=</span> <span class="kr">do</span> </span></span><span class="line"><span class="cl"> <span class="n">l</span> <span class="ow">&lt;-</span> <span class="n">readFile</span> <span class="s">&#34;data1.txt&#34;</span> </span></span><span class="line"><span class="cl"> <span class="kr">let</span> <span class="n">xs</span> <span class="ow">=</span> <span class="p">[(</span><span class="n">read</span> <span class="n">x</span><span class="p">,</span> <span class="n">read</span> <span class="n">y</span><span class="p">)</span> <span class="o">|</span> <span class="p">[</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">]</span> <span class="ow">&lt;-</span> 
<span class="n">map</span> <span class="n">words</span> <span class="p">(</span><span class="n">lines</span> <span class="n">l</span><span class="p">)]</span> </span></span><span class="line"><span class="cl"> <span class="n">print</span> <span class="p">(</span><span class="n">f</span> <span class="n">xs</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="n">print</span> <span class="p">(</span><span class="n">sim</span> <span class="n">xs</span><span class="p">)</span> </span></span></code></pre></div><p>Pretty clean, I don&rsquo;t think I can make it nicer.</p> <h2 id="day-2">Day 2</h2> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-haskell" data-lang="haskell"><span class="line"><span class="cl"><span class="nf">allSame</span> <span class="kt">[]</span> <span class="ow">=</span> <span class="kt">True</span> </span></span><span class="line"><span class="cl"><span class="nf">allSame</span> <span class="p">(</span><span class="n">x</span> <span class="kt">:</span> <span class="n">xs</span><span class="p">)</span> <span class="ow">=</span> <span class="n">all</span> <span class="p">(</span><span class="o">==</span> <span class="n">x</span><span class="p">)</span> <span class="n">xs</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">monotonic</span> <span class="n">xs</span> <span class="ow">=</span> <span class="n">allSame</span> <span class="p">(</span><span class="n">zipWith</span> <span class="p">(</span><span class="nf">\</span><span class="n">x</span> <span class="n">y</span> <span class="ow">-&gt;</span> <span class="n">signum</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">y</span><span class="p">))</span> <span class="n">xs</span> <span class="p">(</span><span class="n">tail</span> <span class="n">xs</span><span class="p">))</span> </span></span><span 
class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">diffValid</span> <span class="p">(</span><span class="n">x</span> <span class="kt">:</span> <span class="n">y</span> <span class="kt">:</span> <span class="n">xs</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="o">|</span> <span class="n">abs</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">y</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">1</span> <span class="o">&amp;&amp;</span> <span class="n">abs</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">y</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mi">3</span> <span class="ow">=</span> <span class="n">diffValid</span> <span class="p">(</span><span class="n">y</span> <span class="kt">:</span> <span class="n">xs</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="o">|</span> <span class="n">otherwise</span> <span class="ow">=</span> <span class="kt">False</span> </span></span><span class="line"><span class="cl"><span class="nf">diffValid</span> <span class="kr">_</span> <span class="ow">=</span> <span class="kt">True</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">isSafe</span> <span class="n">xs</span> <span class="ow">=</span> <span class="n">monotonic</span> <span class="n">xs</span> <span class="o">&amp;&amp;</span> <span class="n">diffValid</span> <span class="n">xs</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">without</span> <span class="n">xs</span> <span class="n">i</span> <span class="ow">=</span> <span class="p">[</span><span class="n">x</span> <span class="o">|</span> <span class="p">(</span><span class="n">x</span><span 
class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="ow">&lt;-</span> <span class="n">zip</span> <span class="n">xs</span> <span class="p">[</span><span class="mi">1</span> <span class="o">..</span><span class="p">],</span> <span class="n">j</span> <span class="o">/=</span> <span class="n">i</span><span class="p">]</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">isSafeDamp</span> <span class="n">xs</span> <span class="ow">=</span> <span class="n">isSafe</span> <span class="n">xs</span> <span class="o">||</span> <span class="n">any</span> <span class="p">(</span><span class="n">isSafe</span> <span class="o">.</span> <span class="n">without</span> <span class="n">xs</span><span class="p">)</span> <span class="p">[</span><span class="mi">1</span> <span class="o">..</span> <span class="n">length</span> <span class="n">xs</span><span class="p">]</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="nf">main</span> <span class="ow">=</span> <span class="kr">do</span> </span></span><span class="line"><span class="cl"> <span class="n">content</span> <span class="ow">&lt;-</span> <span class="n">readFile</span> <span class="s">&#34;data2.txt&#34;</span> </span></span><span class="line"><span class="cl"> <span class="kr">let</span> <span class="n">parsed</span> <span class="ow">=</span> <span class="n">map</span> <span class="p">(</span><span class="nf">\</span><span class="n">l</span> <span class="ow">-&gt;</span> <span class="n">map</span> <span class="n">read</span> <span class="p">(</span><span class="n">words</span> <span class="n">l</span><span class="p">))</span> <span class="p">(</span><span class="n">lines</span> <span class="n">content</span><span class="p">)</span> <span class="ow">::</span> <span class="p">[[</span><span class="kt">Int</span><span class="p">]]</span> </span></span><span 
class="line"><span class="cl"> <span class="kr">let</span> <span class="n">nSafe</span> <span class="ow">=</span> <span class="n">length</span> <span class="p">(</span><span class="n">filter</span> <span class="n">isSafe</span> <span class="n">parsed</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="kr">let</span> <span class="n">nSafeDamp</span> <span class="ow">=</span> <span class="n">length</span> <span class="p">(</span><span class="n">filter</span> <span class="n">isSafeDamp</span> <span class="n">parsed</span><span class="p">)</span> </span></span><span class="line"><span class="cl"> <span class="n">print</span> <span class="p">(</span><span class="s">&#34;number of safe elements: &#34;</span> <span class="o">++</span> <span class="p">(</span><span class="n">show</span> <span class="n">nSafe</span><span class="p">))</span> </span></span><span class="line"><span class="cl"> <span class="n">print</span> <span class="p">(</span><span class="s">&#34;number of safe elements (damping): &#34;</span> <span class="o">++</span> <span class="p">(</span><span class="n">show</span> <span class="n">nSafeDamp</span><span class="p">))</span> </span></span></code></pre></div><p>This is also quite clean and straightforward.</p> This Website https://www.nathom.dev/notes/website/ Tue, 17 Dec 2024 00:00:00 +0000 https://www.nathom.dev/notes/website/ <p>This entire site is static. All the visualizations run entirely in the browser.</p> <p>I use <a href="https://gohugo.io" target="_blank" rel="noopener">Hugo</a> to build the site. It&rsquo;s pretty neat, since its template language lets me program a lot of features statically, without any JavaScript. 
Even the <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8988em;vertical-align:-0.2155em;"></span><span class="mord text"><span class="mord textrm">L</span><span class="mspace" style="margin-right:-0.36em;"></span><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6833em;"><span style="top:-2.905em;"><span class="pstrut" style="height:2.7em;"></span><span class="mord"><span class="mord textrm mtight sizing reset-size6 size3">A</span></span></span></span></span></span><span class="mspace" style="margin-right:-0.15em;"></span><span class="mord text"><span class="mord textrm">T</span><span class="mspace" style="margin-right:-0.1667em;"></span><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.4678em;"><span style="top:-2.7845em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord textrm">E</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2155em;"><span></span></span></span></span><span class="mspace" style="margin-right:-0.125em;"></span><span class="mord textrm">X</span></span></span></span></span></span> on this site is statically rendered!</p> <p>The theme is based off of <a href="https://github.com/tomfran/typo" target="_blank" rel="noopener">Typo by tomfran</a>, but I&rsquo;ve made a bunch of UI tweaks to my liking (like the slick Table of Contents on widescreen).</p> Interactive Gaussian Mixture Models https://www.nathom.dev/blog/gmm/ Fri, 06 Dec 2024 00:00:00 +0000 https://www.nathom.dev/blog/gmm/ <h2 id="goal">Goal</h2> <p>Suppose we have a dataset of features, but no labels. 
If we know (or guess) that there are <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span></span></span></span> classes in the dataset, we could model the dataset as the weighted average of <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span></span></span></span> class–conditional Gaussians. This is what Gaussian Mixture Models do.</p> <p>We assume that the model is parameterized by <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord"><span class="mord"><span class="mord boldsymbol" style="margin-right:0.03194em;">θ</span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.1244em;vertical-align:-0.2831em;"></span><span class="mopen">{</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">π</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">μ</span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-2.4169em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2831em;"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">}</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8413em;"><span style="top:-2.4169em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span 
style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.07153em;">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2831em;"><span></span></span></span></span></span></span></span></span></span>, where <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">π</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> determines the weight of the <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span></span></span></span>th Gaussian in the model.</p> The Zed Text Editor https://www.nathom.dev/notes/zed/ Wed, 04 Dec 2024 00:00:00 +0000 https://www.nathom.dev/notes/zed/ <p>I am a Neovim diehard, but it is impossible to use over SSH. Since I do ML research, all my code runs on a remote server with high-powered GPUs. Reluctantly, I have been using VSCode for its excellent remote-ssh plugin. But even with its half-baked Vim mode, it is still the same sluggish Electron app.</p> <p>Zed may be the editor that changes the game. 
It is extremely fast, supports LSP and Tree-sitter natively, has some pretty nifty AI features, and has native Vim bindings. It still has some rough edges, and remote development is not as smooth as in VSCode, but its lightness makes it worth using for me.</p> Local Approximation https://www.nathom.dev/local-approximation/ Sun, 01 Dec 2024 00:00:00 +0000 https://www.nathom.dev/local-approximation/ <p>Training a deep neural network is essentially a compression task. We want to represent our training data distribution as a function parameterized by a bunch of matrices. The more complex the distribution, the more parameters we need. We approximate the <em>entire</em> distribution so that we can forward <em>any</em> valid point at inference using the same model, with the same weights. But what if our model were trained on the fly, at inference? Then, when forwarding <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4444em;"></span><span class="mord mathbf">x</span></span></span></span>, we would only need to model the <em>local distribution</em> around <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4444em;"></span><span class="mord mathbf">x</span></span></span></span>. Since the local region should have lower dimensionality than the entire training set, a much simpler model will suffice!</p> Bayesian Parameter Estimation https://www.nathom.dev/notes/bpe/ Mon, 25 Nov 2024 00:00:00 +0000 https://www.nathom.dev/notes/bpe/ <p>Bayesian Parameter Estimation (BPE) is fundamentally different from <a href="https://www.nathom.dev/notes/mle/">MLE</a> or <a href="https://www.nathom.dev/notes/map/">MAP</a>. 
Whereas the latter two solve for an optimal set of parameters <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9579em;"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9579em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord boldsymbol" style="margin-right:0.03194em;">θ</span></span></span></span><span style="top:-3.2634em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord">^</span></span></span></span></span></span></span></span></span></span> for the model, BPE treats <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord"><span class="mord"><span class="mord boldsymbol" style="margin-right:0.03194em;">θ</span></span></span></span></span></span> as a random variable with a distribution <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord"><span class="mord"><span class="mord boldsymbol" style="margin-right:0.03194em;">θ</span></span></span><span class="mclose">)</span></span></span></span>.</p> <h2 id="setup">Setup</h2> <p>We are given a dataset <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span></span></span>, which contains <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span> i.i.d. 
features <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7305em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord mathbf">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span>. Given a new feature vector <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4444em;"></span><span class="mord mathbf">x</span></span></span></span>, we want to classify it to some class <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">ω</span></span></span></span>. One way to do this is by the Bayes&rsquo; decision rule. 
That is, we choose class <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7167em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">ω</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span> over class <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">ω</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> if</p> Hario V60 Recipes https://www.nathom.dev/notes/v60/ Mon, 25 Nov 2024 00:00:00 +0000 https://www.nathom.dev/notes/v60/ <p>This is a collection of V60 recipes that I have used.</p> <h2 id="emi-fukahori-1-cup">Emi Fukahori (1 cup)</h2> <p><a href="https://youtu.be/3euEkTBxtEk?si=HUDLfQ8RyMAs-YuS" target="_blank" rel="noopener">Source video.</a></p> <p>This recipe is specific to 
the Hario Switch, my current brewer. It gives a consistent and bright cup.</p> <ul> <li>Filtered Water: 200g</li> <li>Coffee: 14g</li> <li>Grind: Medium-coarse, 7.5 on Fellow Ode 2</li> <li>Ratio: 1:14.3 (coffee to water)</li> <li>Water temp: 95 °C</li> </ul> <ol> <li>Close the switch (no flow), insert the filter, and preheat the brewer with hot water. After some time, open the switch and toss the water.</li> <li>Close the switch, and add coffee.</li> <li>Start timer. Bloom with 45g water until 0:35. Open the switch.</li> <li>Pour, in one stream down the center, 155g water (200g total) until the timer hits 1:10 (~4.4g/sec).</li> <li>Give it one quick swirl to get the bits stuck to the side down, and let it drain.</li> <li>Feel free to close the switch when about 5g is left to drain, to keep the harsher, final coffee out of the cup.</li> </ol> The Ten Armed Testbed https://www.nathom.dev/notes/ten_armed/ Mon, 25 Nov 2024 00:00:00 +0000 https://www.nathom.dev/notes/ten_armed/ <p>This is a method of evaluating strategies for the multi-armed bandit problem <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. 
The testbed works as follows:</p> <ol> <li>Generate <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">10</span></span></span></span> reward means <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord"><span class="mord mathnormal">μ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> associated with <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">10</span></span></span></span> actions <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></li> <li>On each iteration allow the agent to take some action <span 
class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7167em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span>, and receive a reward <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.02778em;">r</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em;"><span style="top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">∼</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.0361em;vertical-align:-0.2861em;"></span><span class="mord mathcal" style="margin-right:0.14736em;">N</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">μ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">1</span><span class="mclose">)</span></span></span></span>.</li> </ol> <p>We repeat this for <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">100</span></span></span></span> randomly sampled sets of <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord"><span class="mord mathnormal">μ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span>. The agent&rsquo;s goal is to maximize average rewards. Hopefully, it should learn which action has the highest mean and sample from that.</p> Maximum A Posteriori (MAP) Estimation https://www.nathom.dev/notes/map/ Sun, 24 Nov 2024 00:00:00 +0000 https://www.nathom.dev/notes/map/ <p>The goal is essentially the same as <a href="https://www.nathom.dev/notes/mle/">MLE</a>. 
We have an assumed model for <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0361em;vertical-align:-0.2861em;"></span><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord"><span class="mord mathbf">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mord">∣</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">ω</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span> parameterized by <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.02778em;">θ</span></span></span></span>. 
We want to classify a feature <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4444em;"></span><span class="mord mathbf">x</span></span></span></span> into some class <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7167em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">ω</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span> based on a labeled dataset <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span></span></span>. 
In MLE, we were trying to maximize the <em>likelihood</em>:</p> <span class="katex-display"><span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.1079em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9579em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord boldsymbol" style="margin-right:0.03194em;">θ</span></span></span></span><span style="top:-3.2634em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3283em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord text mtight"><span class="mord mtight">MLE</span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.5021em;vertical-align:-0.7521em;"></span><span class="mop">ar<span style="margin-right:0.01389em;">g</span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.4306em;"><span style="top:-2.3479em;margin-left:0em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span 
class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.03194em;">θ</span></span></span></span></span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span><span class="mop">max</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.7521em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.02778em;">D</span><span class="mord">∣</span><span class="mord"><span class="mord"><span class="mord boldsymbol" style="margin-right:0.03194em;">θ</span></span></span><span class="mclose">)</span></span></span></span></span><p>In MAP, we instead maximize the <em>a posteriori</em>:</p> Maximum Likelihood Estimation https://www.nathom.dev/notes/mle/ Sun, 24 Nov 2024 00:00:00 +0000 https://www.nathom.dev/notes/mle/ <h2 id="goal">Goal</h2> <p>We are given a dataset <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathcal" style="margin-right:0.02778em;">D</span></span></span></span>, which contains feature vectors <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5944em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathbf">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> and class labels <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">ω</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span>. Denote <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathcal" style="margin-right:0.02778em;">D</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> as the set of features of class <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">ω</span><span class="msupsub"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span>. We assume the following:</p> <ol> <li>That <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord mathbf">x</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.0361em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.03588em;">ω</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">∼</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.1302em;vertical-align:-0.3802em;"></span><span class="mord mathcal" 
style="margin-right:0.14736em;">N</span><span class="mopen">(</span><span class="mord"><span class="mord"><span class="mord"><span class="mord boldsymbol">μ</span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2175em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3802em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord"><span class="mord"><span class="mord mathbf">Σ</span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span>. 
That is, given a class label, the distribution of features belonging to that class forms a Gaussian with mean <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8247em;vertical-align:-0.3802em;"></span><span class="mord"><span class="mord"><span class="mord"><span class="mord boldsymbol">μ</span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2175em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3802em;"><span></span></span></span></span></span></span></span></span></span> and covariance <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9722em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord"><span class="mord"><span class="mord mathbf">Σ</span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span>.</li> <li>The samples <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5782em;vertical-align:-0.0391em;"></span><span class="mord mathbf">x</span><span class="mspace" 
style="margin-right:0.2778em;"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathcal" style="margin-right:0.02778em;">D</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> are <em>independent and identically distributed (i.i.d.)</em> according to this assumed Gaussian distribution.</li> </ol> <p>The problem that MLE seeks to solve is to find the most likely set of parameters <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0664em;vertical-align:-0.3802em;"></span><span class="mord"><span class="mord"><span class="mord"><span class="mord boldsymbol">μ</span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2175em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3802em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord"><span class="mord"><span class="mord 
mathbf">Σ</span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span>, given the data. We denote</p> The Mechanics of Causal Self Attention https://www.nathom.dev/self-attention/ Wed, 13 Nov 2024 14:51:11 -0800 https://www.nathom.dev/self-attention/ <p>Causal self-attention is the mechanism underpinning most of the advances in AI since 2017. In this article, I will step through the computation and hopefully gain a better intuition of how it works.</p> <span class="katex-display"><span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">SelfAttention</span></span><span class="mopen">(</span><span class="mord mathbf">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathbf">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathbf" style="margin-right:0.01597em;">V</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:2.4684em;vertical-align:-0.95em;"></span><span class="mord text"><span class="mord">softmax</span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner"><span class="mopen delimcenter" 
style="top:0em;"><span class="delimsizing size3">(</span></span><span class="mord text"><span class="mord">mask</span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size3">(</span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.5183em;"><span style="top:-2.1778em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9322em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord" style="padding-left:0.833em;"><span class="mord mathnormal">d</span></span></span><span style="top:-2.8922em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em;"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702 c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14 c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54 c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10 s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429 c69,-144,104.5,-217.7,106.5,-221 l0 -0 c5.3,-9.3,12,-14,20,-14 H400000v40H845.2724 s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7 c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z M834 80h400000v40h-400000z"/></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1078em;"><span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" 
style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathbf">Q</span><span class="mord"><span class="mord mathbf">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.93em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size3">)</span></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size3">)</span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathbf" style="margin-right:0.01597em;">V</span></span></span></span></span><p>At a high level, this function takes one <em>sequence</em> and transforms it into another. 
A sequence is a list of token embeddings, stored as a tensor of shape <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">d</span></span></span></span>, where <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal">L</span></span></span></span> is the input sequence length and <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">d</span></span></span></span> is the embedding dimension. Each row of this matrix corresponds to one input token, which is represented as a <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">d</span></span></span></span>-dimensional vector.</p> Building and Deploying Rust to a Hugo Site https://www.nathom.dev/notes/hugo_wasm/ Mon, 22 Apr 2024 14:32:24 -0700 https://www.nathom.dev/notes/hugo_wasm/ <p>We&rsquo;ll walk through a minimal example that lets you run Rust code on the client side of a Hugo site. We are going to compile the Rust code into WebAssembly (wasm), which will give us near-native performance in the browser!</p> An Expert–Level 2048 Bot https://www.nathom.dev/2048/ Tue, 16 Apr 2024 18:06:16 -0700 https://www.nathom.dev/2048/ Explore different methods to win, and beat expert humans in 2048 interactively! 
Interactive MNIST Explorer https://www.nathom.dev/mnist/ Tue, 20 Feb 2024 11:53:54 -0700 https://www.nathom.dev/mnist/ Draw digits on the canvas and watch an AI guess what they are! Switching to Obsidian https://www.nathom.dev/notes/obsidian_daily_planner/ Thu, 14 Sep 2023 00:00:00 +0000 https://www.nathom.dev/notes/obsidian_daily_planner/ <p>One of the most striking elements of Silicon Valley to outsiders is <em>productivity culture</em>. Whereas most people in most places live in complete satisfaction doing their job as they would, Silicon Valley people won&rsquo;t find peace without optimizing their every habit and system to extract that extra iota of productivity per unit time. I am one of those people, and this article is about how I revolutionized my productivity by switching from Neovim org-mode to Obsidian.</p> Hammerspoon Wizardry on macOS https://www.nathom.dev/notes/hammerspoon_wizardry_on_macos/ Fri, 04 Aug 2023 10:58:48 -0700 https://www.nathom.dev/notes/hammerspoon_wizardry_on_macos/ <p>If you&rsquo;re a nerd and you&rsquo;ve been around Macs for a while, you might remember AppleScript. It was a language developed by Apple to allow intermediate–to–advanced users to write simple scripts that could control Mac applications. 
It was actually created to resemble the English language, so accessing a pixel would be written as</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-applescript" data-lang="applescript"><span class="line"><span class="cl"><span class="nv">pixel</span> <span class="mi">7</span> <span class="k">of</span> <span class="nv">row</span> <span class="mi">3</span> <span class="k">of</span> <span class="nv">TIFF</span> <span class="na">image</span> <span class="s2">&#34;my bitmap&#34;</span> </span></span></code></pre></div><p>or even</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-applescript" data-lang="applescript"><span class="line"><span class="cl"><span class="nv">TIFF</span> <span class="na">image</span> <span class="s2">&#34;my bitmap&#34;</span>&#39;s <span class="nb">3rd</span> <span class="nv">row</span>&#39;s <span class="nb">7th</span> <span class="nv">pixel</span> </span></span></code></pre></div><p>Needless to say, there&rsquo;s a good reason modern programming languages don&rsquo;t look like this: it doesn&rsquo;t scale. Anyone who has worked with AppleScript for extended periods of time knows how fast you run into limitations. 
Apple unofficially deprecated it in 2016, when Sal Soghoian, Apple&rsquo;s longtime automation product manager, was let go for <a href="https://9to5mac.com/2016/11/17/mac-user-automation-sal-soghoian/" target="_blank" rel="noopener">&ldquo;business reasons&rdquo;</a>.</p> Not–so–casual Performance Optimization in Python https://www.nathom.dev/python-optimization/ Tue, 01 Aug 2023 10:56:08 -0700 https://www.nathom.dev/python-optimization/ <p>My <a href="https://www.nathom.dev/hello-world">previous post</a> (which was honestly created to test out the theme for this site) provided a few code snippets that computed <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span></span></span></span> terms of the sum of inverse squares. I wrote the code in my four favorite languages (Python, C, Rust, and Haskell), but when I ran the Python code, it was embarrassingly slow. Compared to the <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4831em;"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">950</span></span></span></span> ms it took sequential Rust, Python took 70 seconds! So, in this post, we&rsquo;re going to attempt to get Python some more reasonable numbers.</p> The Basel Problem (Hello, World!) https://www.nathom.dev/hello-world/ Fri, 28 Jul 2023 22:21:54 -0700 https://www.nathom.dev/hello-world/ <p>Hello, World! 
This is my first post, and it&rsquo;s exclusively used to test out this website&rsquo;s functionality.</p> <p>Here are some code snippets in various languages that compute the <a href="https://en.wikipedia.org/wiki/Basel_problem" target="_blank" rel="noopener">Basel Problem</a>:</p> Author https://www.nathom.dev/author/ Mon, 01 Jan 0001 00:00:00 +0000 https://www.nathom.dev/author/ <p>I&rsquo;m a Master&rsquo;s student at UCSD working on reinforcement learning for Large Language Models, advised by <a href="https://xiaolonw.github.io/" target="_blank" rel="noopener">Prof. Xiaolong Wang</a>.</p> <p>I got started with programming through <a href="https://github.com/nathom" target="_blank" rel="noopener">open source</a> in high school. Since then I&rsquo;ve interned at Anduril, Stanford AI Lab, Keysight, SDSC, and Yahoo.</p> <p>When I&rsquo;m not programming, I&rsquo;m brewing specialty coffee, lifting weights, or playing pickleball.</p> <p>You can contact me through <span class="scramble-email" data-scramble="bmF0aG9tYXNAdWNzZC5lZHU="><button style="font-family: var(--mono-font); color: #60A5FA; background: none; border: none; padding: 0; text-decoration: none; cursor: pointer; font-size: inherit;">{email}</button></span><script src="https://www.nathom.dev/js/scramble.min.2d6043c68cbbf8e84a13a37c0e042ec9514266185a0e24acbb847c44599a6b14.js"></script> or <a href="https://x.com/realnathom" target="_blank" rel="noopener">X</a>.</p> Books https://www.nathom.dev/bookshelf/ Mon, 01 Jan 0001 00:00:00 +0000 https://www.nathom.dev/bookshelf/ My digital bookshelf, in no particular order. Curriculum Vitae https://www.nathom.dev/cv/ Mon, 01 Jan 0001 00:00:00 +0000 https://www.nathom.dev/cv/