llimllib's notes (http://notes.billmill.org/), updated 2026-04-23
Obsidian Notes
Bill Mill
<p><a href="https://notes.billmill.org/link_blog/2026/04/I_am_building_a_cloud.html">I am building a cloud</a>, 2026-04-23</p>
<blockquote>
<p>I do not like the cloud today.</p>
<p>I want to. Computers are great, whether it is a BSD installed directly on a PC or a Linux VM. I can enjoy Windows, BeOS, Novell NetWare, I even installed OS/2 Warp back in the day and had a great time with it. Linux is particularly powerful today and a source of endless potential. And for all the pages of products, the cloud is just Linux VMs. Better, they are API driven Linux VMs. I should be in heaven.</p>
<p>But every cloud product I try is wrong. Some are better than others, but I am constantly constrained by the choices cloud vendors make in ways that make it hard to get computers to do the things I want them to do.</p>
</blockquote>
<ul>
<li><a href="https://crawshaw.io/blog/building-a-cloud" class="external-link">David Crawshaw</a></li>
</ul>
<p><a href="https://notes.billmill.org/book_notes/Designing_Data-Intensive_Applications/Chapter_2_-_defining_nonfunctional_requirements.html">Chapter 2 - defining nonfunctional requirements</a>, 2026-04-20</p>
<p>The authors define a <em>functional requirement</em> as "the functionality that the application must offer", and the <em>nonfunctional requirements</em> as everything else you need to do.</p>
<p>Right off the bat, I struggle with this definition! They give performance as an example of a nonfunctional requirement, but obviously that's a matter of degree. Let's see if they clarify, or just allow the definition to be very fuzzy at the edges.</p>
<h2 id="twitter-example">Twitter example</h2>
<p>They start off with a motivating example, a common interview question: design a service like Twitter, given three tables:</p>
<table>
<caption>users</caption>
<thead>
<tr>
<th>id</th>
<th>screen_name</th>
<th>profile_image</th>
</tr>
</thead>
<tbody>
<tr>
<td>12</td>
<td>jack</td>
<td>123.png</td>
</tr>
</tbody>
</table>
<table>
<caption>posts</caption>
<thead>
<tr>
<th>id</th>
<th>sender_id (FK to users)</th>
<th>timestamp</th>
<th>text</th>
</tr>
</thead>
<tbody>
<tr>
<td>20</td>
<td>12</td>
<td>123456</td>
<td>just setting up my twttr</td>
</tr>
</tbody>
</table>
<table>
<caption>follows</caption>
<thead>
<tr>
<th>follower_id (FK to users)</th>
<th>followee_id (FK to users)</th>
</tr>
</thead>
<tbody>
<tr>
<td>9923882</td>
<td>12</td>
</tr>
</tbody>
</table>
<p>They give the first pass SQL query to create a timeline page:</p>
<pre><code class="language-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">posts</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">users</span><span class="p">.</span><span class="o">*</span><span class="w"> </span>
<span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">posts</span>
<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">follows</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">posts</span><span class="p">.</span><span class="n">sender_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">follows</span><span class="p">.</span><span class="n">followee_id</span>
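<span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">posts</span><span class="p">.</span><span class="n">sender_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">users</span><span class="p">.</span><span class="n">id</span>
<span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">follows</span><span class="p">.</span><span class="n">follower_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">current_user</span>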
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">posts</span><span class="p">.</span><span class="k">timestamp</span><span class="w"> </span><span class="k">DESC</span>
<span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1000</span>
</pre></div>
</code></pre>
<p>This query is too expensive to run every time someone loads their timeline, so in practice we need to materialize the timeline. For each user, we store their home timeline; when a user posts, we look up all their followers and insert that post into each of their timelines.</p>
<p><em>fan-out</em> is the factor by which one incoming request multiplies into requests to other parts of the system: here, one new post becomes one timeline insert per follower.</p>
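<p>A minimal Python sketch of fan-out on write, under my own assumptions: in-memory structures stand in for the tables, and <code>publish</code>, <code>home_timeline</code>, and <code>followers_of</code> are hypothetical names, not from the book. It shows one post turning into one timeline write per follower, so that reads become a cheap lookup.</p>

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-in for the follows table:
# (follower_id, followee_id) pairs.
follows = [(9923882, 12)]

TIMELINE_LIMIT = 1000  # mirrors the LIMIT 1000 in the query above

# Each user's materialized home timeline, newest post first.
# deque(maxlen=...) silently drops the oldest entry once the limit is hit.
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def followers_of(user_id):
    """All users who follow user_id (a full scan; a real system would index this)."""
    return [follower for follower, followee in follows if followee == user_id]


def publish(sender_id, post_id, timestamp, text):
    """Fan-out on write: push a new post into every follower's timeline.

    Returns the fan-out, i.e. how many timeline writes this one post caused.
    Assumes posts arrive in timestamp order, so appendleft keeps newest first.
    """
    targets = followers_of(sender_id)
    for follower_id in targets:
        timelines[follower_id].appendleft((timestamp, post_id, sender_id, text))
    return len(targets)


def home_timeline(user_id):
    """Reading a timeline is now a lookup instead of a three-way join."""
    return list(timelines[user_id])


fanout = publish(12, 20, 123456, "just setting up my twttr")
print(fanout)                        # 1: jack has one follower in this toy data
print(home_timeline(9923882)[0][3])  # just setting up my twttr
```

<p>The trade-off is that a user with millions of followers turns one post into millions of writes, which is exactly the cost that the fan-out factor measures.</p>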
<h3 id="describing-performance">describing performance</h3>
<p>Two main types of performance metric:</p>
<ul>
<li><em>response time</em> - the elapsed time from a request to a response</li>
<li><em>throughput</em> - the requests per second the system is processing</li>
</ul>
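<p>Generally, response time <em>increases</em> as throughput increases, sharply so as the system nears its capacity, because requests spend more time waiting in queues.</p>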
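<p>To make that relationship concrete, here is a sketch using the textbook M/M/1 queueing model. The model is my choice, not the book's: mean response time is 1 / (mu - lambda), where lambda is the throughput and mu is the rate the server can handle. It's idealized (random arrivals, a single server), but it shows the shape of the curve.</p>

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time (queueing + service) of an idealized M/M/1 queue.

    arrival_rate: throughput in requests/second (lambda)
    service_rate: requests/second the server can handle (mu)
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals outpace service")
    return 1.0 / (service_rate - arrival_rate)


# A server that can handle 100 requests/second (10 ms of service per request):
for throughput in (50, 90, 99):
    ms = mm1_response_time(throughput, 100) * 1000
    print(f"{throughput} req/s -> {ms:.0f} ms mean response time")
```

<p>Going from 50% to 99% utilization takes the mean response time from 20 ms to 1000 ms, even though the service time never changes; all of the extra time is queueing.</p>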
<p>They have a brief sidebar about using jitter and exponential backoff plus circuit breakers to avoid thundering herd issues, but don't dive into it.</p>
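<p>Since the sidebar doesn't dive in, here is a minimal sketch of capped exponential backoff with "full jitter" (a random wait between zero and the exponential ceiling). The function name, parameters, and defaults are my own assumptions, and a circuit breaker is not shown. <code>sleep</code> and <code>rng</code> are injectable so the behavior can be tested without waiting.</p>

```python
import random
import time


def call_with_retries(fn, *, attempts=5, base=0.1, cap=10.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ConnectionError with capped exponential backoff.

    Uses "full jitter": each wait is uniformly random between 0 and the
    ceiling min(cap, base * 2**attempt), so clients that failed at the same
    moment spread their retries out instead of stampeding back together.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(rng() * min(cap, base * 2 ** attempt))
```

<p>Without the jitter, every client that failed at the same moment would retry at the same moments too, recreating the herd on each attempt.</p>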
<blockquote>
<p>response time is usually what users care about the most, whereas the throughput determines the required computing resources</p>
</blockquote>
<ul>
<li><em>response time</em> is what the client sees; includes all delays anywhere in a system</li>
<li><em>service time</em> is the duration for which the service is actively processing the client's request</li>
<li><em>queueing delays</em>: oddly, they don't define this term; they just put it in italics and assume its meaning is understood</li>
<li><em>latency</em> is a catchall term for time during which a request is not being actively processed (i.e. during which it is latent)
<ul>
<li>This is interesting to me because it makes latency a relative measurement: whether time counts as latency depends on whether, from your perspective, you know that work is being done on the request</li>
<li>We can call response time "request latency" because it's time when we (the client) don't know that the request is being actively processed (or we can say that it is not being actively processed by our system)
<ul>
<li>the authors give "network latency" as a particular example (<code>the time that a request and response spend traveling through the network</code>), but from the network stack's perspective, there are several stretches within that span when the request is latent, and others when the stack is working on it</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>There's a brief discussion of the median, the mean, and percentiles.</p>
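<p>A small sketch of why percentiles describe response times better than the mean. It uses the nearest-rank definition of a percentile, which is one common definition and my assumption here (monitoring tools often interpolate instead); the sample data is made up.</p>

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with two slow outliers.
times_ms = [12, 15, 11, 300, 14, 13, 16, 12, 18, 1500]

print(sum(times_ms) / len(times_ms))  # 191.1: dragged up by the outliers
print(percentile(times_ms, 50))       # 14: the typical request
print(percentile(times_ms, 90))       # 300
print(percentile(times_ms, 99))       # 1500: the tail
```

<p>No request in the data took anything close to the 191 ms mean; the median tells you what a typical user sees, and the high percentiles tell you how bad the outliers are.</p>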
<blockquote>
<p>Amazon describes response time requirements for internal services in terms of the 99.9th percentile... Optimizing the 99.99th percentile (the slowest 1 in 10,000 requests) was deemed too expensive and found to not yield enough benefit</p>
</blockquote>
<p>The authors point to several sources for the "performance == money" idea, but suggest that they're all insufficient and somewhat contradictory: we don't really know how much performance is functional, in the sense of making more money from users.</p>
<p>They point to <a href="https://cacm.acm.org/research/the-tail-at-scale/" class="external-link">the tail at scale</a> to describe <em>tail latency amplification</em>, whereby as a single service request makes more and more requests to other endpoints, it becomes more likely that one of them suffers a tail latency spike.</p>
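<p>The arithmetic behind tail latency amplification, as a sketch that assumes each backend call is independently slow with the same probability (real systems are messier). If a fraction <code>p</code> of calls land in the slow tail, a request that waits on <code>n</code> calls is slow with probability 1 - (1 - p)^n.</p>

```python
def p_any_slow(p_slow, fanout):
    """Chance that at least one of `fanout` backend calls hits the slow tail,
    assuming each call is independently slow with probability p_slow."""
    return 1 - (1 - p_slow) ** fanout


# If 1 in 100 backend calls is slow (i.e. beyond that backend's p99):
for fanout in (1, 10, 100):
    print(fanout, round(p_any_slow(0.01, fanout), 3))
```

<p>With a fan-out of 100, roughly 63% of user requests wait on at least one slow call, so a backend's p99 becomes the user's typical case.</p>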
<p>There's a brief discussion of how SLOs and SLAs may use percentiles to define their expectations, and an important sidebar that averaging percentiles is <strong>meaningless</strong>. I've seen people make that mistake many times in practice, and it's possible I have (I certainly have (sorry))</p>
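<p>A toy demonstration of why averaging percentiles is meaningless, using made-up latencies from two hypothetical servers. The correct aggregate comes from combining the underlying samples (or histograms) and computing the percentile over the combined data. <code>percentile</code> here is the nearest-rank definition, my assumption.</p>

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies (ms) from two servers handling 100 requests each.
server_a = [10] * 100              # uniformly fast
server_b = [10] * 90 + [500] * 10  # 10% of its requests are slow

avg_of_p99s = (percentile(server_a, 99) + percentile(server_b, 99)) / 2
true_p99 = percentile(server_a + server_b, 99)

print(avg_of_p99s)  # 255.0: a latency no request actually saw
print(true_p99)     # 500: the real p99 across all 200 requests
```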
<h3 id="reliability-and-fault-tolerance">reliability and fault tolerance</h3>
<p><em>reliability</em> is, roughly, continuing to work correctly, even when things go wrong</p>
<p>They distinguish between <em>faults</em> and <em>failures</em>:</p>
<ul>
<li>A <em>fault</em> occurs when a particular <em>part</em> of a system stops working correctly</li>
<li>A <em>failure</em> occurs when the system <em>as a whole</em> stops providing the required service to the user</li>
</ul>
<blockquote>
<p>They are the same thing at different levels</p>
</blockquote>
<p>I'm glad they said that, it was my immediate objection to that definition!</p>
<p>We call a system <em>fault-tolerant</em> if it continues providing the required service to users in spite of faults occurring, and a part is a <em>single point of failure</em> if a fault in that part causes the whole system to fail.</p>
<blockquote>
<p>Counterintuitively, in fault-tolerant systems, it can make sense to <em>increase</em> the rate of faults by triggering them deliberately -- for example, by randomly killing individual processes without warning. This is called <em>fault injection</em>... by deliberately inducing faults, you ensure that the fault-tolerance machinery is continually exercised and tested</p>
</blockquote>
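<p>A minimal sketch of fault injection, under my own assumptions: <code>FaultInjector</code>, <code>read_with_failover</code>, and <code>ReplicaDown</code> are hypothetical names, and the "replicas" are just wrapped dictionary lookups. Deliberately failing 30% of replica reads keeps the failover path exercised on every run instead of only during real outages.</p>

```python
import random


class ReplicaDown(Exception):
    """Raised when a (simulated) replica cannot serve a request."""


class FaultInjector:
    """Wrap a read function and make it fail with probability `rate`."""

    def __init__(self, read, rate, rng):
        self.read, self.rate, self.rng = read, rate, rng
        self.injected = 0  # how many faults we deliberately caused

    def __call__(self, key):
        if self.rng.random() < self.rate:
            self.injected += 1
            raise ReplicaDown("injected fault")
        return self.read(key)


def read_with_failover(replicas, key):
    """The fault-tolerance machinery under test: try replicas in order.

    One replica faulting is a fault; every replica faulting is a failure."""
    for replica in replicas:
        try:
            return replica(key)
        except ReplicaDown:
            continue
    raise ReplicaDown("all replicas failed")


rng = random.Random(42)  # seeded so runs are reproducible
data = {"user:12": "jack"}
replicas = [FaultInjector(data.get, rate=0.3, rng=rng) for _ in range(3)]

served = failed = 0
for _ in range(1000):
    try:
        read_with_failover(replicas, "user:12")
        served += 1
    except ReplicaDown:
        failed += 1

print(served, failed, [r.injected for r in replicas])
```

<p>Hundreds of faults get injected, but only the rare reads where all three replicas fault at once become failures, which is the distinction the book draws between the two terms.</p>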
<h3 id="hardware-and-software-faults">Hardware and software faults</h3>
<blockquote>
<p>Hardware redundancy increases the uptime of a single machine... using a distributed system has advantages, such as being able to tolerate a complete outage of one datacenter. For this reason, cloud systems tend to focus less on the reliability of individual machines and instead aim to make services highly available by tolerating faulty nodes at the software level. Cloud providers use <em>availability zones</em> to identify which resources are physically co-located</p>
</blockquote>
<p>hardware faults are often less correlated than software faults, because it is common for many nodes to run the same software and thus share the same bugs</p>
<h3 id="human-beings">Human beings</h3>
<blockquote>
<p>One study of large internet services found that configuration changes by operators were the leading cause of outages, whereas hardware faults played a role in only 10-25% of cases [<a href="https://www.usenix.org/legacy/events/usits03/tech/full_papers/oppenheimer/oppenheimer_html/index.html" class="external-link">72</a>]</p>
</blockquote>
<p>ed: that study is from 2003; I wonder if there's anything more recent</p>
<blockquote>
<p>What we call "human error" is... a symptom of a problem with the sociotechnical system in which people are trying their best to do their jobs</p>
</blockquote>
<p>They cite <em>The Field Guide to Understanding Human Error</em> by Sidney Dekker, which I had for a while but failed to read.</p>
<h3 id="scalability">Scalability</h3>
<p><em>Scalability</em> is the term we use to describe a system's ability to cope with increased load</p>
<p><a href="https://notes.billmill.org/visualization/graphs/ggsql.html">ggsql</a>, 2026-04-20</p>
<p><a href="https://ggsql.org/" class="external-link">https://ggsql.org/</a></p>
<blockquote>
<p>ggsql brings the elegance of the Grammar of Graphics to SQL. Write familiar queries, add visualization clauses, and see your data transform into beautiful, composable charts — no context switching, no separate tools, just SQL with superpowers.</p>
</blockquote>
<p><code>ggsql</code> introduces a <code>VISUALISE</code> clause into its SQL dialect, letting you build grammar-of-graphics charts directly from SQL queries. The home page has an example:</p>
<pre><code class="language-sql"><div class="highlight"><pre><span></span><span class="c1">-- Regular query</span>
<span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">ggsql</span><span class="p">:</span><span class="n">penguins</span>
<span class="k">WHERE</span><span class="w"> </span><span class="n">island</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'Biscoe'</span>
<span class="c1">-- Followed by visualization declaration</span>
<span class="n">VISUALISE</span><span class="w"> </span><span class="n">bill_len</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">bill_dep</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">body_mass</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">fill</span>
<span class="n">DRAW</span><span class="w"> </span><span class="n">point</span>
<span class="n">PLACE</span><span class="w"> </span><span class="k">rule</span><span class="w"> </span>
<span class="w"> </span><span class="n">SETTING</span><span class="w"> </span><span class="n">slope</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">-</span><span class="mi">1</span>
<span class="k">SCALE</span><span class="w"> </span><span class="n">BINNED</span><span class="w"> </span><span class="n">fill</span>
<span class="n">LABEL</span>
<span class="w"> </span><span class="n">title</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="s1">'Relationship between bill dimensions in 3 species of penguins'</span><span class="p">,</span>
<span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="s1">'Bill length (mm)'</span><span class="p">,</span>
<span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="s1">'Bill depth (mm)'</span>
</pre></div>
</code></pre>
<p><a href="/visualization.png"><img class="bodyimg" src="/visualization.png"></a></p>
<p>There's an introductory blog post, <a href="https://opensource.posit.co/blog/2026-04-20_ggsql_alpha_release/" class="external-link">https://opensource.posit.co/blog/2026-04-20_ggsql_alpha_release/</a>, with more motivation and introductory text.</p>
<p><a href="https://notes.billmill.org/visualization/interactive_explainers/How_the_Heck_Does_Shazam_Work_.html">How the Heck Does Shazam Work?</a>, 2026-04-20</p>
<p><a href="https://perthirtysix.com/how-the-heck-does-shazam-work" class="external-link">https://perthirtysix.com/how-the-heck-does-shazam-work</a></p>
<blockquote>
<p>How audio fingerprinting and a connect-the-dots trick lets Shazam identify a song in seconds.</p>
</blockquote>
<p>Another in Shri Khapalda's series of interactive explainers. Previously:</p>
<ul>
<li><a href="https://perthirtysix.com/how-the-heck-do-qr-codes-work" class="external-link">QR Codes</a></li>
<li><a href="https://perthirtysix.com/how-the-heck-does-gps-work" class="external-link">GPS</a></li>
</ul>
<p><a href="https://notes.billmill.org/music/music_blog/2026/04/Gwenifer_Raymond_Tiny_Desk.html">Gwenifer Raymond Tiny Desk</a>, 2026-04-18</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/DQ2EwQ2B5bc?si=R-4ncZKjET2IfP1o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<p>I talked about her previously in <a href="/music/music_blog/2025/12/Last_Night_I_Heard_the_Dog_Star_Bark_-_Gwenifer_Raymond.html" class="internal-link">Last Night I Heard the Dog Star Bark - Gwenifer Raymond</a>, and her Tiny Desk really showcases her energy.</p>
<p><a href="https://notes.billmill.org/book_notes/Designing_Data-Intensive_Applications/Chapter_1.html">Chapter 1</a>, 2026-04-17</p>
<blockquote>
<p>We call an application <em>data intensive</em> if data management is one of the primary challenges in developing the application. While in <em>compute-intensive</em> systems the challenge is parallelizing a very large computation, in data-intensive applications we usually worry more about things like storing and processing large data volumes</p>
</blockquote>
<hr />
<p>The difference in how backend engineers (who modify data and generally look at one user at a time) and business analysts/data scientists use data has led to a split between <em>operational systems</em> and <em>analytical systems</em>, which are often kept separate.</p>
<hr />
<p><strong>point query</strong>: a query that looks up a small number of records based on a <em>key</em><br />
<strong>OLTP</strong>: Online Transaction Processing - a system that inserts, updates or deletes records generally based on a key<br />
<strong>OLAP</strong>: Online Analytical Processing - a system that generally scans a huge number of records to calculate aggregate statistics rather than returning individual records to the user</p>
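<p>The two query shapes side by side, as a sketch using Python's built-in <code>sqlite3</code> on the <code>posts</code> table from the Twitter example. SQLite is neither a tuned OLTP nor OLAP system, and the extra rows are hypothetical; this only illustrates the difference between a point query and an analytical scan.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, sender_id INTEGER,"
    " timestamp INTEGER, text TEXT)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (20, 12, 123456, "just setting up my twttr"),
        (21, 12, 123500, "hypothetical second post"),
        (22, 99, 123600, "hypothetical post from another user"),
    ],
)

# OLTP-style point query: fetch one record by its key (uses the primary-key index).
point = conn.execute("SELECT text FROM posts WHERE id = ?", (20,)).fetchone()

# OLAP-style query: scan every row and return an aggregate, not individual records.
per_sender = conn.execute(
    "SELECT sender_id, COUNT(*) FROM posts GROUP BY sender_id ORDER BY sender_id"
).fetchall()

print(point)       # ('just setting up my twttr',)
print(per_sender)  # [(12, 2), (99, 1)]
```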
<p>They single out ClickHouse, Pinot, and Druid (never heard of the last two) as <strong>product analytics</strong> or <strong>real-time analytics</strong> systems, which are designed for analytical workloads but serve user-facing products.</p>
<hr />
<p>Data from OLTP systems is often spread across the enterprise, and BAs do not want to have to query across potentially dozens of systems. A <strong>data warehouse</strong> contains data extracted from many OLTP systems in a company. (They go on to say that it can come from many other sources as well, which is more accurate imo)</p>
<p>A data warehouse often uses relational tables, which suit what BAs want but are less suited to what data scientists want to do, like training ML models or using NLP and computer vision.</p>
<p>(They say that feature engineering is particularly difficult to express using SQL? I'd like to understand what they mean by that more here. <code>[citation needed]</code>)</p>
<p>a <strong>data lake</strong> is a centralized data repository that holds a copy of any data that might be useful for analysis. The difference from a data warehouse is that a data lake simply contains files, without imposing any particular file format. (So S3 is a data lake? That seems like a kinda useless definition imo.)</p>
<hr />
<blockquote>
<p>the term <strong>cloud native</strong> is used to describe an architecture that is designed to take advantage of cloud services</p>
</blockquote>
<p>This is an extremely fuzzy definition to me. They say that Postgres and ClickHouse are self-hosted systems that are not cloud native, but that Aurora and Snowflake are cloud-native, and I think you'd be hard-pressed to really apply the definition above to make that distinction clear.</p>
<p>The things that distinguish them to me are:</p>
<ul>
<li>usually specialized to the operating environment of a particular cloud service provider</li>
<li>usually not available as open source</li>
<li>usually not available to run on a general-purpose computer, or available but in severely degraded form</li>
</ul>
<p>To me, if we want to split the mysqls and postgreses of the world from the aurorae and BigQueries, it's about how <em>specialized</em> they are to cloud service providers' computers as opposed to general-purpose computation</p>
<blockquote>
<p>The key idea of cloud native services is not only to use the computing resources managed by your operating system, but also to build upon the lower-level cloud services to create higher-level services</p>
</blockquote>
<p>yeah that's closer for me</p>
<p><a href="https://notes.billmill.org/programming/image_generation/zignal.html">zignal</a>, 2026-04-17</p>
<p><a href="https://github.com/arrufat/zignal" class="external-link">https://github.com/arrufat/zignal</a><br />
<a href="https://arrufat.github.io/zignal/" class="external-link">https://arrufat.github.io/zignal/</a></p>
<blockquote>
<p>Zignal is a zero-dependency image processing library</p>
</blockquote>
<blockquote>
<ul>
<li><strong>Core Math:</strong> Matrices (<code>SMatrix</code>, <code>Matrix</code>, SVD), PCA, ND Geometry (SIMD Points, affine/projective transforms, convex hull), Statistics, Optimization.</li>
<li><strong>Computer Vision:</strong> Feature detection and matching (FAST, ORB), Edge detection (Shen-Castan), Hough Transform, Feature Distribution Matching (style transfer).</li>
<li><strong>Image Processing:</strong> Spatial transforms (resize, crop, rotate), morphology, convolution filters (blur, sharpen), thresholding, advanced Color Spaces (Lab, Oklab, Oklch, Xyb, Lms, etc.), Perlin noise generation.</li>
<li><strong>I/O & Graphics:</strong> Pure-Zig PNG/JPEG codecs, Canvas API (antialiasing, Bézier curves), Bitmap/PCF Fonts, Colormaps, Terminal graphics (Kitty/Sixel).</li>
<li><strong>Platform Support:</strong> Native Zig, first-class Python bindings, and WASM compilation for the web.</li>
</ul>
</blockquote>
<p>A neat-looking Zig image processing library with Python bindings. It has a CLI, which is currently pretty bare-bones.</p>
<p>Pretty impressive to have a library that does its own image decoding, font rendering, etc etc!</p>
<p><a href="https://notes.billmill.org/link_blog/2026/04/Clubs__really__That_s_what_you_call_those_.html">Clubs, really? That's what you call those?</a>, 2026-04-16</p>
<p><a href="https://approximateknowledge.net/inkhaven/2026/04/14/playing-cards.html" class="external-link">https://approximateknowledge.net/inkhaven/2026/04/14/playing-cards.html</a></p>
<p>A fun history of how we ended up with the suits common in playing card decks.</p>
<p>I tried to find a word to describe <em>which</em> playing card decks they're common in, but failed, because I have no idea how widespread the playing cards we use in America are.</p>
<p><a href="https://notes.billmill.org/link_blog/2026/04/Use_Apple_s_App_Store_at_your_own_risk.html">Use Apple’s App Store at your own risk</a>, 2026-04-16</p>
<blockquote>
<p>The App Store, in other words, is rotten. And whatever Apple’s app-vetting procedure is, it’s not working. Perhaps that reflects the magnitude of the job. At last count there were approximately two million iOS apps on the store, which across its 18-year history equates very roughly to 9,000 per month. Factor in the acceleration over time, not to mention all the other apps that were vetted once but have since been removed because the developers stopping updating them, and that’s a lot of vetting, even for a company with major resources.</p>
</blockquote>
<blockquote>
<p>But is that an excuse? Not really. If running an app store is too much trouble, close it down. If comprehensive vetting is impractical, stop pretending the App Store is completely safe. (And definitely stop scaremongering about sideloading.) If you can’t make the App Store a truly reliable resource for good, safe, legitimate software, then give iPhone users the freedom to install from other places. Or just stop pretending the App Store monopoly is about anything other than revenue.</p>
</blockquote>
<ul>
<li><a href="https://www.macworld.com/article/3115356/use-apples-app-store-at-your-own-risk.html" class="external-link">David Price</a></li>
</ul>
<p>amen</p>
<p><a href="https://notes.billmill.org/blog/2025/07/An_AI_tool_I_find_useful.html">An AI tool I find useful</a>, 2025-07-27 (updated 2026-04-16)</p>
<p><strong>update</strong>: This tool has been superseded by <a href="/AI/tools/pr-review.html" class="internal-link">pr-review</a>, a similar but more powerful tool.</p>
<p>One of the tasks that I do most often is to review code. I've written a <a href="https://github.com/llimllib/personal_code/blob/032c597d4c1f805a0fb6030723e22fcf4349b2ef/homedir/.local/bin/review" class="external-link">review command</a> that asks an AI to review a code sample, and I've gotten a lot of value out of it.</p>
<p>I ignore <em>most</em> of the suggestions that the tool outputs, but it has already saved me often enough from painful errors that I wanted to share it in the hope that others might find it useful.</p>
<h2 id="how-to-install-it">How to install it</h2>
<ul>
<li>install <a href="https://github.com/simonw/llm" class="external-link">llm</a> and configure it to use the provider and model you'd like to use
<ul>
<li>on a mac, <code>brew install llm</code></li>
</ul>
</li>
<li>optionally install <a href="https://github.com/sharkdp/bat" class="external-link">bat</a>
<ul>
<li>on a mac, <code>brew install bat</code></li>
</ul>
</li>
<li>save the <a href="https://github.com/llimllib/personal_code/blob/c85d6b2106e1369339d045e8be0e43484149848e/homedir/.local/bin/review" class="external-link">review script</a> anywhere on your path and make it executable</li>
</ul>
<h2 id="how-it-works">How it works</h2>
<p>The main job of the script is to generate context from a git diff and pass it to <a href="https://github.com/simonw/llm" class="external-link">llm</a> for code review.</p>
<p>If you run <code>review</code> with no arguments, it will:</p>
<ul>
<li>run <code>git diff -U10</code>
<ul>
<li>the <code>-U</code> argument changes the amount of context given in a git diff</li>
<li>any additional arguments you pass will be forwarded directly to <code>git diff</code></li>
</ul>
</li>
<li>estimate the tokens in the output and check if that fits within the context window
<ul>
<li>if it doesn't fit, shrink the diff context (the <code>-U</code> value) and re-run <code>git diff</code></li>
<li>if you want to give the model more context than the default 10 lines of diff context
<ul>
<li>you can append to the system prompt with the <code>--context</code> flag</li>
<li>you can expand the diff context to 100 by passing <code>-U100</code> or any number you prefer</li>
</ul>
</li>
</ul>
</li>
<li>pipe the context to <code>llm</code>
<ul>
<li>You can configure <code>llm</code> to use whichever model or AI company you prefer</li>
</ul>
</li>
<li>highlight the output with <code>bat</code> if available
<ul>
<li><a href="https://github.com/sharkdp/bat" class="external-link">bat</a> is great and I highly recommend using it if you don't already. <code>brew install bat</code> on a mac will install it</li>
</ul>
</li>
</ul>
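<p>The "estimate, then shrink and re-run" step is the least obvious part, so here is a minimal Python sketch of the idea. It is not the actual <code>review</code> script: the 4-characters-per-token heuristic, the token budget, the halving strategy, and the names <code>fit_diff</code> and <code>estimate_tokens</code> are all my assumptions. The <code>diff</code> parameter is injectable so the loop can be tested without a git repository.</p>

```python
import subprocess

CHARS_PER_TOKEN = 4     # rough rule of thumb for text and code (assumption)
TOKEN_BUDGET = 100_000  # hypothetical model context budget


def estimate_tokens(text):
    return len(text) // CHARS_PER_TOKEN


def git_diff(context_lines, extra_args):
    """Run `git diff -U<n>`, forwarding any extra arguments unchanged."""
    result = subprocess.run(
        ["git", "diff", f"-U{context_lines}", *extra_args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def fit_diff(extra_args=(), start=10, budget=TOKEN_BUDGET, diff=git_diff):
    """Shrink the diff context until the estimated tokens fit the budget.

    Returns (context_lines, diff_text). Stops shrinking at -U0 even if the
    diff is still too large, leaving the caller to decide what to do."""
    context = start
    while True:
        text = diff(context, list(extra_args))
        if estimate_tokens(text) <= budget or context == 0:
            return context, text
        context //= 2  # 10 -> 5 -> 2 -> 1 -> 0
```

<p>Inside a repository, <code>fit_diff(["origin/main", "origin/some-feature-branch"])</code> would mirror the PR-review usage described below, returning the diff text that would then be piped to <code>llm</code>.</p>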
<p>The result looks like this in my terminal:</p>
<p><a href="/images/Pasted image 20250727105305.png"><img class="bodyimg" src="/images/Pasted image 20250727105305.png"></a></p>
<h2 id="how-i-use-it">How I use it</h2>
<p>My main use of the command is to review a PR I'm preparing before I file it. The <strong>biggest value</strong> I've gotten out of it is that it frequently catches embarrassing errors before I file a PR - misspellings, <code>DELETEME</code>s I forgot to remove, and occasionally logic errors.</p>
<p>It also often suggests features that make sense to add before finishing the PR, or as next steps.</p>
<p>It is very important to <strong>use it intelligently</strong>! The LLM is just an LLM, and it also may be missing context. The screenshot above has two examples of mistaken suggestions that I read and ignored; you have to apply your own understanding and taste to its output.</p>
<p>Keep in mind that it is tasked via its system prompt with finding problems and making suggestions; no matter how good your code is it will try to find and suggest something.</p>
<p>I also use it for reviewing other people's PRs, with <code>review origin/main origin/some-feature-branch</code>. In these cases, I really am just using it for clues as to some things that I may need to investigate with my actual human brain. Please <strong>do not just dump llm suggestions into a PR</strong>! That's both rude and likely to be unhelpful.</p>
<h2 id="how-it-differs">How it differs</h2>
<p>That last point brings me to why I prefer this tool to GitHub's own Copilot review tool:</p>
<ul>
<li>I can use my <code>review</code> tool in private, and evaluate its suggestions in private</li>
<li>I can run it repeatedly as I change the code</li>
<li>The separation of my terminal from the code review tool provides a space for me to apply critical thinking</li>
</ul>
<h2 id="areas-for-improvement">Areas for improvement</h2>
<ul>
<li>A lot of things are hard-coded into the script, because I'm its only user
<ul>
<li>If you find use in it, please let me know!</li>
</ul>
</li>
<li>the system prompt seems to work fine, but the range of possible system prompts is so large that I'm sure it could be better</li>
</ul>
<h2 id="postscript">Postscript</h2>
<p>Thanks to <a href="https://lobste.rs/c/mx0moq" class="external-link">a suggestion</a> on <a href="http://lobste.rs" class="external-link">lobste.rs</a> from <code>davidcrespo</code>, I <a href="https://github.com/llimllib/personal_code/commit/032c597d4c1f805a0fb6030723e22fcf4349b2ef" class="external-link">added the ability</a> to provide context via stdin. Thanks for the suggestion!</p>
<p><a href="https://notes.billmill.org/link_blog/2026/04/Retrofitting_JIT_compilers_into_C_interpreters.html">Retrofitting JIT compilers into C interpreters</a>, 2026-04-15</p>
<p><a href="https://tratt.net/laurie/blog/2026/retrofitting_jit_compilers_into_c_interpreters.html" class="external-link">https://tratt.net/laurie/blog/2026/retrofitting_jit_compilers_into_c_interpreters.html</a></p>
<p>Very cool work by Laurence Tratt, Edd Barrett, Lukas Diekmann, and Pavel Durov to create a system, <code>yk</code>, which accepts an interpreter for a language (Lua or Python, for example) and emits a meta-tracing JIT compiler for it.</p>
<div class="admon-info"><p class="info-title">The blog post explains what a <em>meta-tracing JIT</em> is much better than I could here</p><p class="info-content"></p></div>
<p>The upshot is that you can, with a reasonable amount of work, automate the construction of a JIT compiler from an interpreter that doesn't natively have one.</p>
<p>Put another way, you can build a PyPy-like JIT from the regular C Python interpreter, <em>automatically</em>.</p>
<p>The article does a great job of walking you through what all this means and how it does it. Very cool work!</p>
<p><a href="https://notes.billmill.org/programming/javascript/search/pagefind.html">pagefind</a>, 2023-10-20 (updated 2026-04-14)</p>
<p><a href="https://pagefind.app/" class="external-link">https://pagefind.app/</a><br />
<a href="https://github.com/CloudCannon/pagefind/" class="external-link">https://github.com/CloudCannon/pagefind/</a></p>
<blockquote>
<p>Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users’ bandwidth as possible, and without hosting any infrastructure.</p>
</blockquote>
<blockquote>
<p>...The goal of Pagefind is that websites with tens of thousands of pages should be searchable by someone in their browser, while consuming a reasonable amount of bandwidth. Pagefind’s search index is split into chunks, so that searching in the browser only ever needs to load a small subset of the search index. Pagefind can run a full-text search on a 10,000 page site with a total network payload under 300KB, including the Pagefind library itself. For most sites, this will be closer to 100KB.</p>
</blockquote>
<p>The search engine appears to be a <a href="https://github.com/CloudCannon/pagefind/tree/1ea1e75c3254899f40a80a740ad829fd00df14dd/pagefind/src" class="external-link">Rust application</a> compiled to a WASM bundle. It looks like it builds an index of the site split into many small files; when you search, it loads only the parts of the index relevant to your query, downloads just the matching pages' data, and shows the results.</p>
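<p>A toy Python illustration of the chunked-index idea from the quote above. This is not Pagefind's actual index format or algorithm: splitting the inverted index by a term's first letter, and the names <code>build_chunked_index</code>, <code>search</code>, and <code>chunk_of</code>, are my assumptions. The point is that a query only needs the chunks its terms fall into, not the whole index.</p>

```python
from collections import defaultdict


def chunk_of(term):
    """Which chunk a term lives in; here simply its first letter (assumption)."""
    return term[:1]


def build_chunked_index(pages):
    """pages: {url: text}. Returns {chunk_key: {term: set(urls)}}, an inverted
    index split into chunks that could each be written out as a separate file."""
    chunks = defaultdict(lambda: defaultdict(set))
    for url, text in pages.items():
        for term in text.lower().split():
            chunks[chunk_of(term)][term].add(url)
    return chunks


def search(chunks, query):
    """AND-search. Returns (matching urls, chunk keys that had to be fetched)."""
    terms = query.lower().split()
    needed = {chunk_of(t) for t in terms}
    # Stand-in for downloading only the chunk files this query touches.
    loaded = {key: chunks.get(key, {}) for key in needed}
    results = None
    for t in terms:
        urls = loaded[chunk_of(t)].get(t, set())
        results = set(urls) if results is None else results & urls
    return (results or set()), needed


pages = {
    "/a.html": "static search with wasm",
    "/b.html": "search engine in rust",
    "/c.html": "music notes",
}
index = build_chunked_index(pages)
matches, fetched = search(index, "search rust")
print(sorted(matches), sorted(fetched), "of", len(index), "chunks")
```

<p>Here the query touches 2 of the 7 chunks; on a real site with thousands of pages, that ratio is what keeps the network payload small.</p>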
<p>via <a href="https://mastodon.social/@carlmjohnson/111024763667968055" class="external-link">Carl M. Johnson</a> on mastodon</p>
<p>see also <a href="/programming/javascript/search/FlexSearch.html" class="internal-link">FlexSearch</a></p>
<p><em>Nov 3 2025</em>: Tim Bray <a href="https://www.tbray.org/ongoing/When/202x/2025/11/01/Blog-Search-Pagefind" class="external-link">writes</a> about his adoption of pagefind</p>
<hr />
<p>Following the <a href="https://pagefind.app/docs/" class="external-link">quick start</a>:</p>
<p>To serve a dev server:</p>
<ul>
<li><code>uv run --with 'pagefind[extended]' python -m pagefind --site output --serve</code></li>
</ul>
<p><a href="https://github.com/llimllib/obsidian_notes/commit/8749beabdc66ef2bc3779ac05b7e92fb2f17d4aa" class="external-link">here's the PR where I implemented it for my site</a></p>
<p><a href="https://notes.billmill.org/link_blog/2026/04/No_one_can_force_me_to_have_a_secure_website___.html">No one can force me to have a secure website!!!</a>, 2026-04-14</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/M1si1y5lvkk?si=sPdNrfjNYeru8pXt" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<p>In which the always-worth-watching author tom7 creates the least-secure HTTPS server they can manage</p>
<p>Previously: <a href="/link_blog/2024/06/BoVeX.html" class="internal-link">BoVeX</a></p>
<p><a href="https://notes.billmill.org/computer_usage/image_commands/magika_-_file_type_detection.html">magika - file type detection</a>, 2026-04-13</p>
<p><a href="https://github.com/google/magika" class="external-link">https://github.com/google/magika</a><br />
<a href="https://opensource.googleblog.com/2024/02/magika-ai-powered-fast-and-efficient-file-type-identification.html" class="external-link">https://opensource.googleblog.com/2024/02/magika-ai-powered-fast-and-efficient-file-type-identification.html</a></p>
<blockquote>
<p>Magika is a novel AI-powered file type detection tool that relies on the recent advance of deep learning to provide accurate detection. Under the hood, Magika employs a custom, highly optimized model that only weighs about a few MBs, and enables precise file identification within milliseconds, even when running on a single CPU. Magika has been trained and evaluated on a dataset of ~100M samples across 200+ content types (covering both binary and textual file formats), and it achieves an average ~99% accuracy on our test set.</p>
</blockquote>
<p>An interesting-looking successor to <a href="https://github.com/file/file" class="external-link">libmagic</a></p>
<p><a href="https://notes.billmill.org/AI/claude/skills/impeccable__design_skill_.html">impeccable (design skill)</a>, 2026-04-12</p>
<p><a href="https://impeccable.style/" class="external-link">https://impeccable.style/</a></p>
<blockquote>
<p>Anthropic created frontend-design, a skill that guides Claude toward better UI design. Impeccable builds on that foundation with deeper expertise and more control.</p>
</blockquote>
<blockquote>
<p>Every LLM learned from the same generic templates. Without guidance, you get the same predictable mistakes: Inter font, purple gradients, cards nested in cards, gray text on colored backgrounds.</p>
</blockquote>
<blockquote>
<p>Impeccable fights that bias with:</p>
<ul>
<li>An expanded skill with 7 domain-specific reference files</li>
<li>18 steering commands to audit, review, polish, distill, animate, and more</li>
<li>Curated anti-patterns that explicitly tell the AI what NOT to do</li>
</ul>
</blockquote>