Kwik-E-Mart: Who Needs a Gas Town When a Gas Station Will Do

Sat, 07 Mar 2026 00:00:00 +0000
There's a command in beads</a> called mail</code>. It doesn't really do anything. I stared at it for longer than I should have.
I was knee-deep in catalyst-orchestrator</a> at the time, watching Haiku make routing decisions that were completely predictable, burning tokens to arrive at conclusions a status field could have told me for free. And I kept thinking about that mail command. What if it worked? What if agents could send each other specific prompts — not chat, not supervision, just: "here's the job, here's exactly how to do it, go."
But that's not what we do. What we do is open a session, type instructions into an LLM, watch it work for a while, watch it start to drift, watch its attention degrade as the context fills up, and then either restart the whole thing or just accept the garbage output because we're tired of babysitting.
Secret time: I was tired of babysitting.

The Drudgery Problem 🔗</a>
</h2>
Here's what actually happens with documentation. Nobody updates the sequence diagram. Nobody syncs the API docs when the HTTP endpoint changes. Not because they don't care — because they're busy, or they forgot, or the PR was already approved and who's going back to add a diagram now?</p>
The alternative to 85% good enough isn't 100%. It's zero.</p>
We are in this very stupid cycle of talking to LLMs for all activities. Opening sessions. Crafting prompts in real time. Watching the robot work. Correcting the robot. Watching it again. That's not automation. That's a slightly fancier version of doing it yourself, except now you're also managing the thing that's doing it.</p>
Repeatability is what gives us productivity and reduces drudgery. Not cleverness. Not meta-orchestration. Not agents supervising agents in some fractal management structure that would make Dilbert weep. Just: when this thing happens, do this other thing, with this specific prompt, every time.</p>
You should be picking up the conflicts now. We want automation but we keep building supervision. We want repeatability but we keep reaching for general-purpose tools that require us to be in the room.</p>

The Degradation Problem 🔗</a>
</h2>
I watched an agent lose its mind over a long session. Not dramatically — it didn't hallucinate monsters or start writing poetry. It just got... worse. Slowly. The way a person gets worse at their job at hour eleven of a twelve-hour shift.</p>
LLM supervision orchestration has a structural problem: the orchestrator is itself an LLM. It's consuming context to manage context. Every decision it makes, every status check it interprets, every routing choice — that's all context window. And context window is attention. And attention degrades over length. The longer the session runs, the less accurately the supervising agent works, which means the work it's supervising also gets less accurate, which means it has more problems to manage, which means more context consumed.</p>
It's a death spiral with a credit card attached.</p>
Gas Town</a> solves this by being a town</em> — 20 to 30 Claude instances coordinated by a Mayor, with Polecats and Refineries and Deacons. It's impressive engineering. It's also a town. I don't need a town. Most of the agentic work I do targets small deterministic workflows on low throughput operations. Frontier LLMs at high concurrent volume are expensive. I needed a gas station.</p>

What a Gas Station Looks Like 🔗</a>
</h2>
Kwik-E-Mart</a> is what happened when I stopped thinking about orchestration and started thinking about mail.</p>
Not email. Newsgroups. NNTP. Remember newsgroups? Messages are public. You pull what you want. You copy what you want locally. Nobody's routing messages to</em> you — you subscribe to what you care about and you read when you're ready. The server doesn't care if you read or not. The messages are there.</p>
That's the mental model. A daemon persists events to an append-only JSON-lines file. Producers dispatch events through stdin or a watch command that polls arbitrary commands on an interval. Consumers pull events, render Go templates against the event payload, and execute LLM subprocesses with the rendered prompt. The consumer acknowledges when it's done. If it crashes, the event is still there.</p>
kwike daemon --http :4444
</span>kwike watch "git diff HEAD" --type ci.diff --interval 30s
</span>kwike dispatch --type review.requested < payload.json
</span>kwike consume --config reviewer.yaml --once
</span></code></pre>
Four subcommands. Single binary. That's the whole thing.</p>
I can't guarantee the LLM performs correctly. It's all based on prompt tuning and sometimes just luck. But I can guarantee that if the action should happen, the agent gets the message. There's always an execution, always a log, always feedback. And I can identify whether the LLM performed the expected operation and retry if it didn't. The durability is in the pipeline, not the output.</p>

The Unix Thing 🔗</a>
</h2>
This wasn't accidental. I explicitly followed Eric Raymond's 17 Unix Rules from The Art of Unix Programming</em> and Mike Gancarz's Unix Philosophy tenets.</p>
JSON-lines is "store data in flat text files." The four subcommands are "make each program do one thing well." Piping dispatch from stdin is "make every program a filter." The daemon over Unix sockets is "write transparent programs." --dry-run</code> is "write programs which fail in a way that is easy to diagnose."</p>
Small is beautiful. Build modular programs. Use composition. Avoid unnecessary output.</p>
These aren't principles I admire from a distance. They're the architecture document.</p>

So Where Do the Big Tools Fit? 🔗</a>
</h2>
Here's the thing — NATS, Redis Streams, Kafka — they're not wrong. They're overscaled. They solve big problems at big scale with big infrastructure. Sometimes you need that. Most of the time, for this kind of work, you don't.</p>
Unix Pipes 🔗</a>
</h3>
The simplest version of reactive LLM flows is already in your shell:</p>
fswatch -r ./content </span>| while </span>read</span> file</span>; do
</span>  cat </span>"$file" </span>| </span>llm </span>"summarize this"
</span>done
</span></code></pre>
This works until your consumer crashes and misses events, or you want two consumers processing the same stream differently, or you want to replay history. Stdout is gone. There's no durability. There's no fanout. But for a one-off? Don't overthink it.</p>
NATS JetStream 🔗</a>
</h3>
General-purpose distributed messaging. Durable streams, consumer groups, subject-based routing, clustering. Battle-tested. Massive ecosystem.</p>
nats stream add EVENTS --subjects </span>"events.>"
</span>nats consumer add EVENTS llm-reviewer --deliver all --ack explicit
</span>
</span># Consumer loop you write yourself
</span>while </span>true</span>; do
</span>  msg</span>=</span>$(nats consumer next EVENTS llm-reviewer --count 1 --timeout 30s)
</span>  </span>echo </span>"$msg" </span>| </span>llm -s </span>"you are a code reviewer"
</span>done
</span></code></pre>
The gap: every bit of LLM integration is your problem. You write the consumer loop, the retry logic, the concurrency limits, the backoff, the prompt rendering. It's a messaging system you build LLM workflows on top of. And you need a NATS server running.</p>
Redis Streams 🔗</a>
</h3>
If Redis is already in your stack:</p>
redis-cli XADD events:file </span>'*'</span> path </span>"$file"</span> diff </span>"$diff"
</span>redis-cli XGROUP CREATE events:file reviewers </span>'$'</span> MKSTREAM
</span>redis-cli XREADGROUP GROUP reviewers worker1 BLOCK 0 STREAMS events:file </span>'>'
</span></code></pre>
Similar trade-offs. Durable. Fanout via consumer groups. Simpler mental model than NATS. Same LLM gap.</p>
Kafka 🔗</a>
</h3>
I wrote about recreating Kafka from scratch</a>. The experience reinforced that Kafka's model is right for high-throughput distributed event streaming and way too much for everything I'm describing here. It's Java. It needs a cluster. Standing it up for "I want to watch some files and talk to an LLM" is like hiring a construction crew to hang a picture frame.</p>
The Table 🔗</a>
</h3>
</th> Unix Pipes</th> NATS JetStream</th> Redis Streams</th> Kafka</th> Kwik-E-Mart</th></tr></thead>

Durability</td> No</td> Yes</td> Yes</td> Yes</td> Yes</td></tr>
Fanout</td> No</td> Yes</td> Yes</td> Yes</td> Yes</td></tr>
LLM templating</td> No</td> No</td> No</td> No</td> Yes</td></tr>
Consumer config</td> Shell</td> Imperative</td> Imperative</td> Imperative</td> Declarative YAML</td></tr>
Watch → Event</td> Manual</td> Manual</td> Manual</td> Manual</td> Built-in</td></tr>
CI-friendly</td> Sort of</td> Needs server</td> Needs server</td> Needs cluster</td> --once</code> --dry-run</code></td></tr>
Distributed</td> No</td> Yes</td> Yes</td> Yes</td> Yes (HTTP mode)</td></tr>
Deployment</td> None</td> Server + CLI</td> Redis server</td> JVM cluster</td> Single binary</td></tr>
</tbody></table>
NATS and Kafka win on scale, ecosystem, and distributed infrastructure. They're designed for that. Kwik-E-Mart wins on "I need a single binary that watches things and feeds specific prompts to LLMs with zero infrastructure."</p>

Where It Actually Runs 🔗</a>
</h2>
This isn't a developer tool. It's infrastructure.</p>
Consider a GitLab CI pipeline where module-specific consumers with tailored prompts live in the codebase</em> alongside the code they're about. Not a generic "AI, update the docs." A specific consumer config for that</em> module, with that</em> prompt, that knows how to handle that</em> class of file:</p>
dispatch</span>:
</span>  </span>stage</span>: </span>collect
</span>  </span>script</span>:
</span>    - </span>kwike dispatch --type "lint.result" < lint.json
</span>    - </span>kwike dispatch --type "test.result" < test.json
</span>  </span>artifacts</span>:
</span>    </span>paths</span>:
</span>      - </span>events.jsonl
</span>
</span>review</span>:
</span>  </span>stage</span>: </span>analyze
</span>  </span>needs</span>: [</span>dispatch</span>]
</span>  </span>script</span>:
</span>    - </span>kwike consume --config llm-review.yaml --once
</span></code></pre>
--once</code> means process events and exit. That's CI-native behavior. NATS consumers are designed to be long-lived. Kafka consumers definitely are. Kwik-E-Mart treats batch execution as a first-class mode because that's how CI works.</p>
Or consider it deployed in a Lambda, spinning up instanced, routing based on the specific PagerDuty issue, delivering the right prompt for the right class of incident. Or as a k8s pod watching Prometheus for alarm conditions, helping define when PagerDuty should even be triggered. Or as a sidecar to any application, watching logs for specific conditions.</p>
It runs where the events are, not where the developer is. These things don't run on developer systems.</p>

The Point 🔗</a>
</h2>
We keep building towns when we need gas stations. We keep opening chat sessions when we need mailboxes. We keep supervising robots when we should be delegating to them.</p>
The unsleeping robot will do it 85% good enough. We can always clean it up. But otherwise? We get nothing. The sequence diagram stays stale. The API docs drift. The test failure sits unexplained until a human has time to look, which is never.</p>
Just having sequence diagrams updated or API docs synced when HTTP endpoint code changes is like magic. Except it's not magic. It's a JSON-lines file, four subcommands, and a prompt that knows what it's looking at.</p>

"Small is beautiful." — Mike Gancarz, The UNIX Philosophy</em>, 1994</p>
</blockquote>


DevLog 🔗</a>
</h2>

15 03 2026 🔗</a>
</h3>
v0.5 — Making tools for robots 🔗</a>
</h4>
Kwike is not really designed to be used by humans, its rather complicated. Since its been designed as a tool robots use its cli is intended to instruct process and carry a lot of documentation. Its an interesting problem and the question is, does something like this inform on how to build for humans too.</p>
</div>


Catalyst: An Orchestrator That Stopped Asking and Started Deciding
Sun, 15 Feb 2026 00:00:00 +0000


DevLog 🔗</a>
</h2>

11 02 2026 🔗</a>
</h3>
v0.1 — The Haiku Decides 🔗</a>
</h4>
The first version of catalyst was built around a simple idea: let a cheap, fast LLM handle the coordination. The daemon watched beads molecules for ready steps, spawned agents (Sonnet for implementation, Haiku for review, Opus for merging), and when something interesting happened — a review finished, a step failed, an agent got stuck — it packaged that event as a gate and shipped it over a Unix socket to a Haiku orchestrator.</p>
Haiku's job was to interpret the situation and respond. "Review passed — merge or MR?" Haiku would answer merge</code>. "Step failed — retry, skip, or abort?" Haiku would decide. The daemon was deliberately reactive. It didn't interpret agent output. It didn't make routing decisions. It just watched molecules, ran agents, hit gates, and waited for Haiku to tell it what to do.</p>
The architecture looked like this:</p>
┌─────────────────┐     NDJSON/socket     ┌─────────────────┐
</span>│  Haiku          │◄────────────────────►│  catalyst       │
</span>│  orchestrator   │                       │  daemon         │
</span>│                 │  gate_waiting:        │                 │
</span>│  - clears gates │  "review passed,      │  - watches      │
</span>│  - picks beads  │   merge or mr?"       │    molecules    │
</span>│  - delegates    │  ◄──────────────────  │  - runs agents  │
</span>│    merge to     │  "merge"              │  - emits events │
</span>│    Opus         │  ──────────────────►  │  - handles      │
</span>└─────────────────┘                       │    gates        │
</span>                                          └─────────────────┘
</span></code></pre>
Agent output was unstructured prose. The daemon didn't parse it — Haiku did. Formulas defined the full workflow explicitly: implement → review → fix → merge</code>, with gates as decision points between steps. Every transition required Haiku's blessing.</p>
This worked. But it had a cost. Haiku was interpreting free-form text to make routing decisions that were, in practice, completely predictable. "Review passed" always meant merge. "Review failed" always meant fix. The orchestrator was spending tokens to arrive at conclusions the daemon could have reached by parsing a status field.</p>
</div>

13 02 2026 🔗</a>
</h3>
v0.2 — The Daemon Parses, The Daemon Routes 🔗</a>
</h4>
v0.2 was a philosophical inversion. Instead of the daemon asking Haiku what to do, agents were told to output a machine-parseable block, and the daemon was taught to read it.</p>
The STEP-RESULT protocol replaced unstructured prose:</p>
---STEP-RESULT---
</span>STATUS: DONE
</span>VERDICT: APPROVED
</span>SUMMARY: Implementation meets all acceptance criteria
</span>INSTRUCTIONS:
</span>- Minor: consider adding a timeout to the HTTP client
</span>---END-RESULT---
</span></code></pre>
The daemon's new StepResultParser</code> extracted structured fields. The StatusRouter</code> made deterministic decisions based on what it found:</p>
STATUS: DONE
</span>  ├── Reviewer? Check VERDICT
</span>  │   ├── APPROVED → enable merge step
</span>  │   └── REJECTED → enable fix step, pass INSTRUCTIONS downstream
</span>  └── Otherwise → advance DAG
</span>
</span>STATUS: BLOCKED → mark bead blocked
</span>STATUS: ERROR   → mark bead blocked
</span></code></pre>
No LLM interpretation. No token spend on routing. The daemon read the result and knew where to go.</p>
This version also introduced automatic retry with Opus escalation — if an agent produced malformed output (missing the STEP-RESULT block), the daemon retried up to 5 times with the original model, then escalated to Opus. If even Opus couldn't produce a parseable result, the bead was marked blocked. All retry events were logged to bead comments for auditability.</p>
Agent prompts moved from hardcoded Go strings to an external TOML template file (agent_prompts.toml</code>), with Go template variables ({{.BeadID}}</code>, {{.Description}}</code>, {{.ReviewInstructions}}</code>) injected at runtime. Review instructions from the reviewer's INSTRUCTIONS field flowed downstream to the fixer and merge agents via bead comments — the reviewer could say "fix the nil pointer in auth.go:45" and the fixer would see that in its prompt.</p>
The formula still defined implement → review → fix → merge</code> as explicit steps. The fix step always existed in the workflow, even when the review approved and it was never needed.</p>
</div>

14 02 2026 🔗</a>
</h3>
v0.3 — The Daemon Creates Steps at Runtime 🔗</a>
</h4>
v0.3 asked: if the daemon is already making the routing decisions, why does the fix step need to exist in the formula at all?</p>
The answer was that it didn't. In v0.3, the fix step was removed from every formula. The workflow became implement → review → merge</code>. When the reviewer output VERDICT: REJECTED</code>, the daemon dynamically created a fix step in beads storage, spawned a fixer agent, and when the fix completed, reset the review step to open</code> so it would re-run.</p>
implement → review ←──────────────┐
</span>              │                   │
</span>    ┌─────────┴─────────┐        │
</span>    │                   │        │
</span> APPROVED            REJECTED    │
</span>    │                   │        │
</span>    ▼                   ▼        │
</span>  merge           create fix     │
</span>                  (dynamic step) │
</span>                        │        │
</span>                   fix runs      │
</span>                        │        │
</span>                   fix DONE ─────┘
</span></code></pre>
This loop could run up to 3 times. Iteration count was tracked via bead comments ([daemon] FIX_ITERATION: N</code>). After 3 rejections, the bead was marked BLOCKED — the daemon decided the implementation couldn't be salvaged through automated fixes.</p>
The key design insight was crash recovery. Fix steps weren't held in daemon memory — they were persisted as real beads issues. If the daemon crashed mid-fix, it would restart, scan for in-progress molecules, find the fix step by its title pattern (Fix: <beadID></code>), and resume processing. The existing step identification logic (extractFormulaStepID()</code>) already recognized the "Fix:" prefix, so dynamically created steps were processed identically to formula-defined ones.</p>
The review step stayed in_progress</code> during the fix loop rather than being closed and reopened. This avoided a tricky state transition (beads didn't naturally support closed → open) and made conceptual sense — the review represented "evaluate this implementation," and that evaluation wasn't complete until the code either passed or exceeded the iteration limit.</p>
This version also introduced stub-claude</code> for deterministic end-to-end testing. Instead of running real Claude against real code, test scenarios defined expected agent sequences: "reviewer rejects once, then approves" should produce exactly 6 agent invocations (refine, implement, review, fix, review, merge). This made the implicit fix loop testable without burning API tokens.</p>
</div>

The Arc 🔗</a>
</h2>




Automatic Programming: Iteration 4
Sun, 08 Feb 2026 00:00:00 +0000

Automatic Programming: Iteration 4 🔗</a>
</h2>
While I know that the discourse is not complete binary whether you are for or against LLM generated code it's probably the right time to take a step back a few years and explore the iterations of our industry.</p>
Let's just work backwards, COBOL, iteration 3.</p>

[Common Business-Oriented Language] (Synonymous with evil.) A weak, verbose, and flabby language used by code grinders to do boring mindless things on dinosaur mainframes. Hackers believe that all COBOL programmers are suits or code grinders, and no self-respecting hacker will ever admit to having learned the language. Its very name is seldom uttered without ritual expressions of disgust or horror.
Evans, Claire L - Broad band</em> the untold story of the women who made the Internet_ -> From The Hacker's Dictionary</em></p>
</blockquote>
Yes, the perpetual software of big finance and numerous other systems. While I haven't written COBOL myself I have had to rewrite at least one system written in COBOL.</p>
Poking fun aside COBOL exists because of the dream that non-experts could be computer programmers. By today's standards you still need to be an expert to write COBOL. Prior to its creation, software was written in assembly and before that machine code, and before that patch cables, actual wires, iteration 1.</p>

Grace knew that would only happen when two things occurred:</p>

Users could command their own computers in natural language.</li>
That language was machine independent.
That is to say, when a piece of software could be understood by a programmer as well as by its users, when the same piece of software could run on a UNIVAC as easily as on an IBM machine, code could begin to bend to the wills of the world. Grace called this general idea "automatic programming"...
Evans, Claire L - Broad band</em> the untold story of the women who made the Internet_</li>
</ol>
</blockquote>
That Grace of course was Grace Hopper, and she was obsessed with making programming easier and more efficient. In her time programming was a kind of wizardry and very few knew the incantations to make the computer operate. From a business standpoint making programming easier was bad for business since computer companies sold the computer and the software.</p>
Portable programs, ones that could be written on any machine for any other machine was a business risk. It created competition so there was resistance.</p>

Those who resisted automatic programming became known as "Neanderthals." They might as well have called themselves framebreakers, as Lord Byron had over a century before.
Evans, Claire L - Broad band</em> the untold story of the women who made the Internet_</p>
</blockquote>
"Framebreakers" refers to those workers who opposed the automatic loom, better known as the Luddites.</p>
Before computers cloth was made on the loom and the origin of the punchcard was used to create an automatic loom. After its invention there was not much use for using a manual loom. It was disruptive and changed an entire industry, displacing workers.</p>
Grace and her cadre believed in a future where the programs write themselves. There is some corollary to today with code generation, which actually is programs writing themselves, something Grace dreamed of. The difference between the loom and the compiler is only in the growth potential. Cloth was an end result of a chain of optimization but while you could tirelessly create it in any pattern imaginable, someone still needed to imagine the patterns. Variety of cloth became commonplace, the art remained, the drudgery was lost.</p>
Now I respect that for some the act was the value, I share those feelings. I love writing the actual code, I care about it more than the product it produces. An opinion whose popularity depends on what side of the invoice you sit.</p>

A quick lesson: computers do not understand English, French, Mandarin Chinese, or any human language. Only Machine code, usually binary, can command a computer, at its most elemental level, to pulse electricity through its interconnected logic gates.
Evans, Claire L - Broad band</em> the untold story of the women who made the Internet_</p>
</blockquote>
Programs are essentially the aggregation of basic operations, layers upon layers. If we think of code generation as just another kind of compiler it's part of a long lifeline of change approaching the ideal "automatic programming."</p>
Of course I would prefer to ask Grace her formal opinion. But in her time when presenting her arguments, mathematicians were once inundated in the tedium of arithmetic to solve their equations. Computers arrived and essentially removed the need for those steps and allowed them to get closer to the interesting part, the solutions. She argued that the compiler did the same thing, modulating the complexity of using computers allowed programmers to spend more time on stimulating thoughts.</p>
Of course the reality was mathematicians became programmers to advance their work.</p>
Programmers used compilers to build elegant languages, and COBOL.</p>
I think it's quite funny to have the perspective that programs are binary, iteration 2, and writing the program for the computer was to create something that could accurately generate binary programs.</p>
Now we stand at another transition where automatic programming is telling the computer to write the code that the compiler turns into a binary program. If the compiler was the 3rd level operation we are now at the 4th.</p>
Here is where it all came together though: 🔗</a>
</h2>

Grace loved coding, but she admitted that "the novelty of inventing programs wears off and degenerates into the dull labor of writing and checking programs. This duty now looms as an imposition on the human brain"
Evans, Claire L - Broad band</em> the untold story of the women who made the Internet_</p>
</blockquote>
I have been feeling this for years, the code just keeps getting more repetitive. I just keep doing the same extremely complicated and extremely boring operations over and over again. The novelty of software I grew up with in the 2000s is over. Everything is a framework or a dogma and all solutions are solving the same problems with a different color scheme and font.</p>
If anything would make me embrace the 4th level it's this, even if it means no one needs me anymore. I can at least see the realization of Grace's dream and in some way if everyone becomes a programmer finally, I'll have more people to talk to about what I love.</p>
We aren't there yet, the software world is still pretty complicated and you have to know a lot of special dance moves to get things working right. But it's not going to be forever.</p>
The book 🔗</a>
</h2>
Before I wander off into a diatribe of where our future is going lemme just stop and tell you to read this book:</p>
Broad Band - Claire L. Evans</a></p>
It's a good one, it has changed my mind on if I will continue to call myself an engineer or a programmer. The definition has finally been clarified. I was today - 2 weeks old and that was too long to know the truth.</p>
Also it answers the question of why women were not present in Computer Science but I was taught by women who had careers in computer science.</p>
Point is strong recommendation.</p>
Where are we going? 🔗</a>
</h2>
I dunno, maybe we are all out of work. Maybe our MBA Degree bosses will finally see their reality of the numbers going up and to the right forever.</p>
I see it like this, programming ended when the job absorbed all its roles by mere definition. It has been an amalgam for a while and that has been a crime. Now I can focus on building things again at the scale required in our times. I can concurrently build 2 or 3 projects while focusing on my writing. Sounds like a dream and the troubles of today are not forever.</p>
The current goals of centralized AI is unsustainable and within a few years the ASICs will arrive, our computers will be packed with high bandwidth memory and the models will be local. Just like how all of a sudden we all started walking around super computers in our pockets we will build the infrastructure to build all the hardware we need to move forward.</p>
I mean we live in the dumbest timeline and greed seems to be winning but if the pattern from past is here to loop again we go from the dumb time to the bright time for a while again. I am looking forward to that at least.</p>


Keep Your Eyes on the IDE, and Your Robots on the Tickets
Sun, 08 Feb 2026 00:00:00 +0000
Keep Your Eyes on the IDE, and Your Robots on the Tickets 🔗</a>
</h2>
Initial Scene:</em></p>
Narrator: Bead Manager?! What does that even mean... let's start back at the beginning:
</span></code></pre>
Scene Break:</em> (Dissolve)</p>
Time Jump:</em></p>
"Two weeks earlier..."
</span></code></pre>
New Scene:</em></p>
A tall handsome man with thick dark hair leans over a computer with boxes of black bordered in grey scrolling dark green text. Scowling...
</span>
</span>Author enters the room
</span>
</span>Author: Who the hell are you! Get away from my laptop! Freaking coffee shops...
</span></code></pre>
The Hero's Journey 🔗</a>
</h3>
As you can imagine I have been following the post transformer LLM growth for about 4-5 years at this point. I didn't understand it and I never really used it but I keep my ear to the ground. Increasingly frustrated with the inability to keep the LLM on task. I mean its ignorance on my part and the tool isn't ready yet. Such is the mark of progress, things improve over time. Although I am still challenged with simple things.</p>

Give me 20 variations of this prompt for as jsonl training data using X format</p>
</blockquote>
I get 8...</p>
I get 23</p>
I get 12</p>
Jump Cut:</em></p>
Laptop launches out the window
</span></code></pre>
So that's problem one and how do we solve it? Well with a novel wrapper that counts outputs and then re-prompts to do it again. I think they call that the Ralph Loop</em>, I don't, I just call it the nature of the thing.</p>
I learned later that this is generally caused by ambiguity of the context. Asking for 1 item 20 times and feeding back in the previous set to avoid duplicates always works better. The teaching: the computer is dumb, don't make it think too hard and everything goes smoother.</p>
Most of what is to follow is the application of Agentic Patterns: Elements of Reusable Context-Oriented Determinism</a></p>
Beads 🔗</a>
</h3>
What beads provides is really just an idea and its worth exploring yourself: https://github.com/steveyegge/beads</a>.</p>
It self describes as "A memory upgrade for your coding agent," which I think is arguable but it was the trigger I needed to expand my concept of what a workflow with an LLM could look like. To be honest I didn't just go "Ah Beads! Its all clear now." Instead I found this article about Gas Town</a> which I didn't read, thanks ADHD, and instead installed it blindly. If I was to give it a review it would be that Gas Town is kind of a meme of agent orchestration. Clearly, there is a lot of work put into it, but I think the author might agree that its an expression of an idea in a more artistic than practical form.</p>
But who cares, I walked back from Gas Town to beads, the underlying magic, in my opinion. So I describe this as a context graph, I am able to manually or with an agent LLM extract just as much focused context as I want and use it as a concrete repeatable prompt. While the same prompt doesn't get the exact response each time, the same prompt gets generally the same tool use and generally the same code is constructed. Which makes me wonder if the variability of code is so limited by its grammar restrictions that LLMs have less predictive options to bias towards.</p>
Ok, I am gilding the lily a bit, a bead is just a bug ticket or a todo list and its a prompt that has a dependency chain. How I am using it is more like Jira for robots, if Jira wasn't software designed for my suffering. I am able to build a feature and break down tasks then feed a path of those tasks to the agent.</p>
You may be asking, but why not just use markdown files or JSONL. Well because I am a human and I hate reading JSONL files, I have ADHD so if the file is longer than 10 lines it will never be fully read. Better put what you want as the last line on the bottom cause thats all I see. Point is I need to be able to monitor, tune, and track the agents. See what Gas Town did was have the agents self-manage. While novel, its a bit bizarre when you are trying to avoid scope creep cause LLMs love to add features.</p>
Back to the other question, why not markdown files. Well, two reasons, first they are kinda noisy, second if the LLM has to read more than the exact section of the file they are working on some ambiguity could be introduced. If you notice the agent will often scan a file 50 lines at a time if there is no index. Which means some of that ends up in its context. When we want determinism our first goal is to make sure each interaction is exactly the same prompt. This means beads is mostly an opinion and is probably not required.</p>
Stay in the IDE and Manage your robots 🔗</a>
</h3>
So good choices after bad maybe but when I have a database for my tasks and their prompts I need a way to visualize it. The purpose here is to allow me to create and observe the tasks my agent orchestration is running on. For me this is just Claude Opus delegating tasks to Sonnet agents in an agentic loop.</p>
This all started with this command bd graph --compact --all</code></p>
</p>
All because I wanted to watch my agent orchestration work through my tickets for another project.</p>
Well that has led to this:</p>
</p>
A full management console that lets me watch the beads transition status but also let me edit and add comments.</p>
In this video there is an experimental refinement mechanism being demonstrated, available in the current release: Jetbrains Marketplace</a></p>
The workflow 🔗</a>
</h3>
So the other half of this tool is this set of prompts for claude: beads-orchestration-claude</a></p>
Now this is for claude but the practice can be applied manually or using other agents, the pattern is what matters and the prompt helps encapsulate the pattern more than the agent.</p>
The keys here are:</p>

Recoverable</li>
Durable</li>
Keep your eyes in the IDE</li>
</ul>
1. Planning 🔗</a>
</h4>
So our first path here is to plan out a feature. This is really the only time we have a discussion with the LLM but my recommendation is to write a brief in a markdown file. A musing is good enough where you describe the problem, some technical planning around constraints and the systems you want to support.</p>
What you do for any brief, use-cases, goals, non-goals, definitions, and open questions.</p>
Once this is prepared you hand this over to the agent. For me that uses the /new project</code> command REF</a> if we provide it with project name</code> readme</code> git remote url</code> it will setup a baseline project with beads using some LLM magic and a bash script read the brief and prepare the project with a proper explanation of the project for CLAUDE.</p>
Once we have a nice agent specific write up for the project, which is important, we can begin planning. Beads provides some tools that will naturally be injected into your project to help the agent. But you may need to tell your agent this</p>

use beads bd</code> to plan out tasks for this project, bd prime</code> for an overview of commands</p>
</blockquote>
bd prime</code> exposes an agent friendly output for how to invoke commands.</p>
Your agent should now be creating issues in beads for your project. Depending on how you like it you can use as many or as few features as you like from beads, which has a number of fields to hold context about actions. At the very simplest you will get titles and descriptions. If you asked for a feature or an epic you will find they may have been mapped as dependencies.</p>
You should then review the tasks. This can be done with bd list</code> and bd show id</code> or use the jetbrains plugin.</p>
2. Review 🔗</a>
</h4>
So now we review the beads and expand / contract the plan asking the agent to defer tickets we are unsure about or expand others.</p>
3. Work Breakdown 🔗</a>
</h4>
This is probably the most important part. Ask a reasoning model to review all the beads and provide implementation details for those exact tasks in the beads. The idea here is to have the agent make a big plan but instead of writing all the code write code snippets that are attached to the tasks.</p>
We can then take the vibe code approach and execute on this or do a pre-review of our code. Its not uncommon for the agent to have wandered down a bad architecture path. Here is our moment to focus on a specific task and a specific ticket and allow things to be revised in a focused way.</p>
The best way to do this is to first clear your context and ask:</p>

Given the project overview please review bead  and revise to include an a single refresh flow for all data sources. Also review implementation details.</p>
</blockquote>
4. SDLC 🔗</a>
</h4>
Tell the agent to now make documentation and testing tasks linking them as required to the beads that relate to them. You should end up with a layer 2 of tasks that will follow up after the implementation completes.</p>
I usually then ask:</p>

Given the use-cases in the project overview define an e2e testing ticket for planning e2e tests that we can review at the end.</p>
</blockquote>
If all is well the agent should create a task that it will stop and design testing with you that include acceptance criteria based on the provided use-cases.</p>
5. Burn tokens 🔗</a>
</h4>
Now we get to the more technical part. We need to delegate actions to sub agents and depending on what agent infra you use this could be built-in or require manual orchestration.</p>
The command /beads-orchestrate</code> REF</a> handles most of the heavy lifting.</p>
It instructs the orchestrator to fork new processes using a template. For claude this means it will append</p>
--dangerously-skip-permissions --model sonnet</span>|</span>haiku --print -p </span>"..."
</span></code></pre>
For the prompt it will read the bead details along with some workflow about updating the bead and write that new prompt to a temp file. Passing the temp file to the new process. This obviously gives you the ability to debug what is happening at the injected prompt level.</p>
It then sleeps and waits for the sub process to finish.</p>
Why?</em> Well Claude is just a nodejs app and it eventually runs out heap space because it reads the stdout and stderr of all tasks it orchestrates internally. As a subprocess it watches, it's a fresh Claude instance so if it fails it fails in a recoverable way. Since the prompt file is named after the bead it can recover by just restarting the agent.</p>
At this point the orchestrator should spawn the implementer which reads the implementation details and completes the work.</p>
Then the orchestrator will spawn a new agent to handle code review, usually a simpler agent.</p>
All this time the agents will leave comments on the tickets so you can see where it ran into problems and picked a tradeoff. This is very important for root cause analysis later if a feature degrades. You can have the LLM resurrect the changes merged into a branch the same name as the bead. Review the decision it made and make another one. Better yet, since the orchestrator and implementer read the comments you can just append your request to the ticket, mark it open, and tell the agent to refactor it again.</p>
Here is an example of what a bead comment thread looks like in practice. This is from an issue where the Gradle build was failing:</p>

[Opus]</strong> at 2026-02-06 01:11
Starting orchestration. Dispatching Sonnet implementer to fix Kotlin stdlib warning in gradle buildPlugin.</p>
[Opus]</strong> at 2026-02-06 01:16
Resuming orchestration. Dispatching Sonnet implementer to fix Kotlin stdlib warning.</p>
[Opus]</strong> at 2026-02-06 01:24
Resuming orchestration. Previous worktree had no commits - starting fresh. Dispatching Sonnet implementer to fix Kotlin stdlib warning.</p>
[Sonnet]</strong> at 2026-02-06 01:25
Starting implementation. Will examine build.gradle.kts and gradle.properties to understand current configuration, then apply fix per https://jb.gg/intellij-platform-kotlin-stdlib</p>
[Sonnet]</strong> at 2026-02-06 01:26
COMPLETED: Added kotlin.stdlib.default.dependency=false to gradle.properties. Build verified successful without warnings. Fix committed to branch.</p>
</blockquote>
Notice how the orchestrator (Opus) had to resume twice - once after the first dispatch seemingly stalled, and again when it found the worktree had no commits. This is the kind of recovery that happens automatically. The implementer (Sonnet) then picked up the task, did its research, applied the fix, and verified success. All of this is visible in the ticket history without watching terminal output scroll by.</p>
6. When it fails 🔗</a>
</h4>
This workflow isn't perfect but thats the big reason for the plugin. This whole process keeps you from staring at the chat stream and back into the IDE as your work. If you see progress not being made or an issue has comments that move it to blocked you can address it there and then just kick the orchestration. The goal is we have boring work we don't wanna do and we let the robot do it while we act on the interesting parts.</p>
But sometimes it just hangs, haven't solved it yet. When this happens we are always recoverable. Claude subprompts have a 10 minute timeout so even orphaned they will be killed. You just start orchestration again on a clear context and things recover without your attention.</p>


BATS - Testing Bash Like You Mean It
Sun, 08 Feb 2026 00:00:00 +0000
Bash has a reputation problem.</p>
It's the language people write when they can't figure out how to do something in a "real" language. It's duct tape. It's the thing that holds your CI/CD together with set -e</code> and crossed fingers. Nobody tests bash scripts because, well, how would you even do that?</p>
This is bullshit.</p>
Bash is core to every Unix-like operating system. It's the glue between tools. It's the orchestration layer for distributed systems. If you're building CLI tools meant to be composed, piped, and chained together—bash isn't a workaround, it's the runtime.</p>
I built a distributed job queue CLI. The components were solid Go with good unit tests. But unit tests couldn't answer the real question: does this thing actually work when you're using it the way it's meant to be used? In bash. From the command line. With real files and processes and timing issues.</p>
BATS—the Bash Automated Testing System—turned out to be the answer. Not Cucumber. Not end-to-end frameworks that spawn browsers. BATS. Because if your tool lives in bash, your integration tests should too.</p>
Here's how to use it.</p>
What BATS Actually Is 🔗</a>
</h2>
BATS is a TAP-compliant testing framework for bash scripts. It runs tests, reports results, and provides assertion helpers that don't make you want to throw your keyboard.</p>
Installation 🔗</a>
</h3>
Skip the package managers. Clone the repos directly into your project so everyone gets the same version:</p>
# install-bats-libs.sh
</span>#!/bin/bash -e
</span>
</span>if </span>[ </span>-d </span>"./.test/bats" </span>]</span>; then
</span>  </span>echo </span>"Deleting folder $FOLDER"
</span>  rm -rf </span>"./.test/bats/"
</span>  mkdir -p ./.test/bats
</span>else
</span>  mkdir -p ./.test/bats
</span>fi
</span>
</span>git clone --depth 1 https://github.com/bats-core/bats-core ./.test/bats/bats
</span>rm -rf ./.test/bats/bats/.git
</span>
</span>git clone --depth 1 https://github.com/ztombol/bats-support ./.test/bats/bats-support
</span>rm -rf ./.test/bats/bats-support/.git
</span>
</span>git clone --depth 1 https://github.com/ztombol/bats-assert ./.test/bats/bats-assert
</span>rm -rf ./.test/bats/bats-assert/.git
</span>
</span>git clone --depth 1 https://github.com/jasonkarns/bats-mock.git ./.test/bats/bats-mock
</span>rm -rf ./.test/bats/bats-mock/.git
</span></code></pre>

Bash Note:</strong> [ -d "./.test/bats" ]</code> uses the single-bracket test command (test</code></a>) to check if a directory exists. The -d</code> flag returns true if the path exists and is a directory. Single brackets are POSIX-compliant and work in any shell. The spaces inside the brackets are required—[-d ...]</code> won't work.</p>
</blockquote>
Run it once, commit the .test/bats</code> directory. Now your tests work the same everywhere.</p>
Need to start fresh? Here's the cleanup script:</p>
# clean-bats.sh
</span>#!/bin/bash -e
</span>
</span>if </span>[ </span>-d </span>"./.test/bats" </span>]</span>; then
</span>  </span>echo </span>"Deleting folder $FOLDER"
</span>  rm -rf </span>"./.test/bats/"
</span>fi
</span></code></pre>
This gives you:</p>

bats-core</strong> - The test runner itself</li>
bats-support</strong> - Required dependency for other helpers</li>
bats-assert</strong> - assert_success</code>, assert_output</code>, assert_line</code></li>
bats-mock</strong> - Stubbing external commands</li>
</ul>
Run tests with the local binary:</p>
./.test/bats/bats/bin/bats .test/</span>*</span>.bats
</span></code></pre>
Or add it to your PATH in your test helper (we'll get to that).</p>
Level 1: Basic Command Testing 🔗</a>
</h2>
Start simple. Can your CLI run without exploding?</p>
# .test/basic.bats
</span>#!/usr/bin/env bats
</span>
</span># Load helper libraries
</span>load bats/bats-support/load
</span>load bats/bats-assert/load
</span>
</span>@test </span>"command exists and shows help" </span>{
</span>    run mycli --help
</span>    assert_success
</span>    assert_output --partial </span>"Usage:"
</span>}
</span>
</span>@test </span>"version flag returns version" </span>{
</span>    run mycli --version
</span>    assert_success
</span>    assert_output --regexp </span>'[0-9]+\.[0-9]+\.[0-9]+'
</span>}
</span>
</span>@test </span>"invalid command shows error" </span>{
</span>    run mycli not-a-real-command
</span>    assert_failure
</span>    assert_output --partial </span>"Unknown command"
</span>}
</span></code></pre>
Run it:</p>
bats test/basic.bats
</span></code></pre>
You get TAP output:</p>
 ✓ command exists and shows help
</span> ✓ version flag returns version
</span> ✓ invalid command shows error
</span>
</span>3 tests, 0 failures
</span></code></pre>
Setup and Teardown 🔗</a>
</h3>
Tests need clean state. Use setup()</code> and teardown()</code>:</p>
# .test/workspace.bats
</span>#!/usr/bin/env bats
</span>
</span>load bats/bats-support/load
</span>load bats/bats-assert/load
</span>
</span>setup</span>() {
</span>    </span># Create temporary directory for this test
</span>    TEST_TEMP</span>=</span>$(mktemp -d)
</span>    </span>cd </span>"$TEST_TEMP"
</span>
</span>    </span># Initialize your tool's workspace
</span>    mkdir -p .myapp
</span>}
</span>
</span>teardown</span>() {
</span>    </span># Clean up after test
</span>    </span>cd</span> /
</span>    rm -rf </span>"$TEST_TEMP"
</span>}
</span>
</span>@test </span>"creates job file" </span>{
</span>    run mycli jobs create </span>"Do the thing"
</span>    assert_success
</span>
</span>    </span># Verify file was created in workspace
</span>    </span>[ -</span>f .myapp/queue.jsonl </span>]
</span>}
</span></code></pre>

Bash Note:</strong> [ -f .myapp/queue.jsonl ]</code> uses -f</code> to test if a regular file exists (test</code></a>). In BATS, the test passes if the command returns exit code 0 (true). If the file doesn't exist, the test command returns 1 and BATS marks the test as failed.</p>
</blockquote>
Every test gets a fresh $TEST_TEMP</code>. No pollution between tests. No "but it worked on my machine" because you forgot to clean up.</p>
Assertions That Actually Help 🔗</a>
</h3>
The basic assertions you'll use constantly:</p>
# assertion-cheatsheet.bash (not a runnable file, just reference)
</span>run some-command
</span>
</span># Exit code
</span>assert_success          </span># Exit 0
</span>assert_failure          </span># Exit non-zero
</span>assert_equal $status 2  </span># Specific exit code
</span>
</span># Output
</span>assert_output </span>"exact match"
</span>assert_output --partial </span>"substring"
</span>assert_output --regexp </span>'^[0-9]+$'
</span>
</span># Line-specific (0-indexed)
</span>assert_line --index 0 </span>"First line"
</span>assert_line --partial </span>"appears somewhere"
</span>
</span># Negation
</span>refute_output </span>"should not appear"
</span>refute_line --partial </span>"nope"
</span></code></pre>
This is already more rigorous than most bash scripts get. You're testing real behavior, not mocking function calls.</p>
Level 2: Test Helpers and Mocking 🔗</a>
</h2>
Real CLI tools interact with other tools. They read files. They parse JSON. They have dependencies.</p>
You need test helpers.</p>
Shared Setup in a Test Helper 🔗</a>
</h3>
# .test/test_helper.bash
</span>
</span># Shared test workspace setup
</span>export </span>TEST_WORKSPACE</span>=</span>"${BATS_TEST_TMPDIR}/workspace"
</span>export </span>MOCK_BD</span>=</span>"${BATS_TEST_TMPDIR}/bin/bd"
</span>
</span>setup_workspace</span>() {
</span>    rm -rf </span>"${TEST_WORKSPACE}"
</span>    mkdir -p </span>"${TEST_WORKSPACE}/.myapp"
</span>    mkdir -p </span>"$(dirname "${MOCK_BD}")"
</span>    </span>cd </span>"${TEST_WORKSPACE}"
</span>}
</span>
</span>teardown_workspace</span>() {
</span>    </span>cd</span> /
</span>    rm -rf </span>"${TEST_WORKSPACE}"
</span>}
</span></code></pre>

Bash Note:</strong> ${VAR}</code> and $(cmd)</code> look similar but do completely different things. ${BATS_TEST_TMPDIR}</code> is parameter expansion</a>—it retrieves the value of the variable. $(dirname "${MOCK_BD}")</code> is command substitution</a>—it runs dirname</code> and captures its output. The braces in ${VAR}</code> are optional for simple names ($VAR</code> works too) but required when concatenating: ${VAR}_suffix</code> vs the broken $VAR_suffix</code>.</p>
</blockquote>
# .test/test_helper.bash (continued)
</span>
</span># Mock the 'bd' command that your CLI depends on
</span># Uses heredoc (<<EOF) to write a multi-line script to a file
</span>setup_mock_bd</span>() {
</span>    </span>local </span>issues_json</span>=</span>"$1"
</span>
</span>    cat </span>> </span>"${MOCK_BD}" </span><<EOF
</span>#!/usr/bin/env bash
</span>case "</span>\$</span>1" in
</span>    list)
</span>        cat <<'ISSUES'
</span>${issues_json}
</span>ISSUES
</span>        ;;
</span>    show)
</span>        # Return single issue based on </span>\$</span>2 (issue ID)
</span>        echo '{"id":"'"$2"'","title":"Mock issue","status":"open"}'
</span>        ;;
</span>    *)
</span>        echo "Mock bd: Unknown command </span>\$</span>1" >&2
</span>        exit 1
</span>        ;;
</span>esac
</span>EOF
</span>
</span>    chmod +x </span>"${MOCK_BD}"
</span>    </span>export </span>PATH</span>=</span>"$(dirname "${MOCK_BD}"):${PATH}"
</span>}
</span>
</span># JSON assertion helper
</span>assert_json_field</span>() {
</span>    </span>local </span>json</span>=</span>"$1"
</span>    </span>local </span>field</span>=</span>"$2"
</span>    </span>local </span>expected</span>=</span>"$3"
</span>
</span>    </span>local </span>actual</span>=</span>$(</span>echo </span>"$json" </span>| </span>jq -r "$field")
</span>    </span>[[ </span>"$actual" </span>== </span>"$expected" </span>]] </span>|| </span>{
</span>        </span>echo </span>"Expected ${field}='${expected}', got '${actual}'"
</span>        </span>return</span> 1
</span>    }
</span>}
</span>
</span># File content helpers
</span>assert_file_contains</span>() {
</span>    </span>local </span>file</span>=</span>"$1"
</span>    </span>local </span>expected</span>=</span>"$2"
</span>
</span>    grep -q </span>"$expected" "$file" </span>|| </span>{
</span>        </span>echo </span>"File $file does not contain '$expected'"
</span>        </span>return</span> 1
</span>    }
</span>}
</span></code></pre>

Bash Note:</strong> <<EOF ... EOF</code> is a heredoc (Here Documents</a>)—a way to embed multi-line strings. Variables like ${issues_json}</code> are expanded inside. Use <<'EOF'</code> (quoted delimiter) to prevent expansion when you want literal $</code> characters in the output. The cat > file <<EOF</code> pattern writes the heredoc content to a file.</p>
</blockquote>

Bash Note:</strong> local</code> declares a variable scoped to the current function (local</code></a>). Without local</code>, variables are global and leak into other functions—a common source of test pollution. Always use local</code> for function parameters and temporary values.</p>
</blockquote>

Bash Note:</strong> [[ "$actual" == "$expected" ]]</code> uses double brackets, a bash-specific conditional ([[</code></a>). Unlike single brackets, double brackets don't require quoting variables to prevent word splitting, support pattern matching with ==</code>, and allow &&</code>/||</code> inside the expression. The || { ... }</code> pattern runs the block only if the test fails—a compact way to handle errors without if/else</code>.</p>
</blockquote>
Now your tests can load this:</p>
# .test/sync.bats
</span>#!/usr/bin/env bats
</span>
</span>load bats/bats-support/load
</span>load bats/bats-assert/load
</span>load bats/bats-file/load
</span>load test_helper
</span>
</span>setup</span>() {
</span>    setup_workspace
</span>}
</span>
</span>teardown</span>() {
</span>    teardown_workspace
</span>}
</span>
</span>@test </span>"syncs with bd issues" </span>{
</span>    </span># Setup mock bd command to return fake issues
</span>    local mock_issues=</span>'[
</span>        {"id":"abc-123","title":"Fix the widget","status":"open"},
</span>        {"id":"def-456","title":"Refactor gizmo","status":"done"}
</span>    ]'
</span>
</span>    setup_mock_bd </span>"$mock_issues"
</span>
</span>    </span># Run your CLI that calls 'bd list' internally
</span>    run mycli sync
</span>
</span>    assert_success
</span>    assert_output --partial </span>"Synced 2 issues"
</span>
</span>    </span># Verify the sync created local files
</span>    assert_file_exist </span>"${TEST_WORKSPACE}/.myapp/issues.json"
</span>}
</span></code></pre>
Mocking External Commands 🔗</a>
</h3>
Your CLI probably calls external tools. Git. curl. jq. Whatever.</p>
Mock them:</p>
# .test/test_helper.bash (add to existing file)
</span>setup_mock_git</span>() {
</span>    cat </span>> </span>"${BATS_TEST_TMPDIR}/bin/git" </span><<</span>'</span>EOF</span>'
</span>#!/usr/bin/env bash
</span>case "$1" in
</span>    rev-parse)
</span>        echo "abc123def456"  # Fake commit hash
</span>        ;;
</span>    status)
</span>        echo "On branch main"
</span>        echo "nothing to commit, working tree clean"
</span>        ;;
</span>    *)
</span>        exit 1
</span>        ;;
</span>esac
</span>EOF
</span>    chmod +x </span>"${BATS_TEST_TMPDIR}/bin/git"
</span>    </span>export </span>PATH</span>=</span>"${BATS_TEST_TMPDIR}/bin:${PATH}"
</span>}
</span></code></pre>
Then in your test:</p>
# .test/deploy.bats (excerpt)
</span>@test </span>"records git commit in metadata" </span>{
</span>    setup_mock_git
</span>
</span>    run mycli deploy
</span>
</span>    assert_success
</span>
</span>    </span># Verify it captured the fake commit hash
</span>    local metadata=$(cat .myapp/last-deploy.json)
</span>    assert_json_field </span>"$metadata" ".commit" "abc123def456"
</span>}
</span></code></pre>
Testing JSON Output 🔗</a>
</h3>
CLI tools love JSON. Test it properly:</p>
# .test/json-output.bats (excerpt)
</span>@test </span>"job status returns valid JSON" </span>{
</span>    </span># Create a job first
</span>    run mycli jobs create </span>"Test job"
</span>    assert_success
</span>
</span>    local job_id=$(</span>echo </span>"$output" </span>| </span>jq -r </span>'.job_id'</span>)
</span>
</span>    </span># Query job status
</span>    run mycli jobs show </span>"$job_id"
</span>    assert_success
</span>
</span>    </span># Validate JSON structure
</span>    echo </span>"$output"</span> | jq . > /dev/null || fail </span>"Invalid JSON"
</span>
</span>    </span># Check specific fields
</span>    assert_json_field </span>"$output" ".job_id" "$job_id"
</span>    assert_json_field </span>"$output" ".state" "pending"
</span>    assert_json_field </span>"$output" ".title" "Test job"
</span>}
</span></code></pre>
This is real integration testing. You're not stubbing out JSON parsing—you're testing the actual output your users will see.</p>
Level 3: Background Processes and State Machines 🔗</a>
</h2>
Here's where BATS gets interesting.</p>
Real CLI tools do async things. They wait for conditions. They poll. They recover from failures. They manage state transitions.</p>
Testing Background Processes 🔗</a>
</h3>
Say your CLI has a --wait</code> flag that blocks until a job completes. How do you test that?</p>
# .test/async.bats
</span>#!/usr/bin/env bats
</span>
</span>load bats/bats-support/load
</span>load bats/bats-assert/load
</span>load bats/bats-file/load
</span>load test_helper
</span>
</span>setup</span>() {
</span>    setup_workspace
</span>}
</span>
</span>teardown</span>() {
</span>    teardown_workspace
</span>}
</span>
</span>@test </span>"waits for job completion" </span>{
</span>    </span># Create a pending job directly in the file system
</span>    local job_id=</span>"job-$(date +%s)"
</span>    local pending_job=</span>'{
</span>        "job_id":"'</span>${job_id}</span>'",
</span>        "title":"Background test job",
</span>        "state":"pending",
</span>        "created_at":"'</span>$(date -Iseconds)</span>'"
</span>    }'
</span>
</span>    echo </span>"$pending_job"</span> > </span>"${TEST_WORKSPACE}/.myapp/queue.jsonl"
</span>
</span>    </span># Start the wait command in the background
</span>    mycli jobs show </span>"$job_id"</span> --wait --timeout=10s \
</span>        > </span>"${TEST_WORKSPACE}/output.txt"</span> 2>&1 &
</span>    local wait_pid=$!
</span>
</span>    </span># Give it a moment to start
</span>    sleep 1
</span>
</span>    </span># Simulate job completion by moving it to done state
</span>    local completed_job=</span>'{
</span>        "job_id":"'</span>${job_id}</span>'",
</span>        "title":"Background test job",
</span>        "state":"completed",
</span>        "created_at":"'</span>$(date -Iseconds)</span>'",
</span>        "completed_at":"'</span>$(date -Iseconds)</span>'"
</span>    }'
</span>
</span>    rm -f </span>"${TEST_WORKSPACE}/.myapp/queue.jsonl"
</span>    echo </span>"$completed_job"</span> > </span>"${TEST_WORKSPACE}/.myapp/done.jsonl"
</span>
</span>    </span># Wait for the background process to finish
</span>    wait $wait_pid
</span>    local exit_code=$?
</span>
</span>    </span># Verify it exited successfully
</span>    assert_equal $exit_code 0
</span>
</span>    </span># Check the output
</span>    run cat </span>"${TEST_WORKSPACE}/output.txt"
</span>    assert_output --partial </span>"completed successfully"
</span>}
</span></code></pre>

Bash Note:</strong> The &</code> at the end of a command runs it in the background (Job Control</a>). $!</code> is a special variable containing the PID of the last background process (Special Parameters</a>). The wait</code></a> builtin blocks until the specified PID exits and sets $?</code> to its exit code. This pattern—background a process, do something, then wait for it—is essential for testing async CLI behavior.</p>
</blockquote>

Bash Note:</strong> $(date +%s)</code> uses command substitution (Command Substitution</a>) to capture a command's stdout as a string. The $()</code> syntax is preferred over backticks because it nests cleanly and is easier to read.</p>
</blockquote>
You're testing the actual polling logic, the actual file watching, the actual timeout behavior. Not a mock. Not a stub. The real thing.</p>
Testing Timeout Behavior 🔗</a>
</h3>
What happens when things don't complete?</p>
# .test/async.bats (continued)
</span>@test </span>"wait times out if job never completes" </span>{
</span>    local job_id=</span>"job-timeout-test"
</span>    local pending_job=</span>'{"job_id":"'</span>${job_id}</span>'","state":"pending"}'
</span>
</span>    echo </span>"$pending_job"</span> > </span>"${TEST_WORKSPACE}/.myapp/queue.jsonl"
</span>
</span>    </span># Start wait with short timeout
</span>    mycli jobs show </span>"$job_id"</span> --wait --timeout=2s \
</span>        > </span>"${TEST_WORKSPACE}/output.txt"</span> 2>&1 &
</span>    local wait_pid=$!
</span>
</span>    </span># Don't complete the job - let it timeout
</span>
</span>    wait $wait_pid
</span>    local exit_code=$?
</span>
</span>    </span># Should exit with error
</span>    assert_equal $exit_code 1
</span>
</span>    run cat </span>"${TEST_WORKSPACE}/output.txt"
</span>    assert_output --partial </span>"timeout"
</span>}
</span></code></pre>
Testing State Machine Transitions 🔗</a>
</h3>
Job queues are state machines. Jobs move between states. Some transitions are valid. Some aren't.</p>
# .test/state-machine.bats (excerpt)
</span>@test </span>"prevents invalid state transitions" </span>{
</span>    local job_id=</span>"state-test-job"
</span>
</span>    </span># Create completed job
</span>    local completed_job=</span>'{"job_id":"'</span>${job_id}</span>'","state":"completed"}'
</span>    echo </span>"$completed_job"</span> > </span>"${TEST_WORKSPACE}/.myapp/done.jsonl"
</span>
</span>    </span># Try to start a completed job (invalid transition)
</span>    run mycli jobs start </span>"$job_id"
</span>
</span>    assert_failure
</span>    assert_output --partial </span>"Cannot start job in completed state"
</span>
</span>    </span># Verify job state didn't change
</span>    run mycli jobs show </span>"$job_id"
</span>    assert_json_field </span>"$output" ".state" "completed"
</span>}
</span></code></pre>
Testing Time-Dependent Behavior 🔗</a>
</h3>
The hard part. Jobs with heartbeats. Stale locks. Orphan recovery.</p>
# .test/recovery.bats (excerpt)
</span>@test </span>"recovers orphaned jobs with stale heartbeats" </span>{
</span>    </span># Create job with old heartbeat (2 minutes ago)
</span>    local stale_time=$(date -Iseconds -d </span>'2 minutes ago'</span>)
</span>    local job_id=</span>"orphan-job"
</span>
</span>    local orphan_job=</span>'{
</span>        "job_id":"'</span>${job_id}</span>'",
</span>        "state":"running",
</span>        "started_at":"'</span>${stale_time}</span>'",
</span>        "heartbeat_at":"'</span>${stale_time}</span>'"
</span>    }'
</span>
</span>    echo </span>"$orphan_job"</span> > </span>"${TEST_WORKSPACE}/.myapp/active.jsonl"
</span>
</span>    </span># Run recovery command
</span>    run mycli jobs recover
</span>
</span>    assert_success
</span>    assert_output --partial </span>"Recovered 1 orphaned job"
</span>
</span>    </span># Verify job moved back to queue
</span>    assert_file_exist </span>"${TEST_WORKSPACE}/.myapp/queue.jsonl"
</span>    refute_file_exist </span>"${TEST_WORKSPACE}/.myapp/active.jsonl"
</span>
</span>    </span># Verify job state reset
</span>    run mycli jobs show </span>"$job_id"
</span>    assert_json_field </span>"$output" ".state" "pending"
</span>}
</span></code></pre>
This test manipulates time by creating timestamps in the past. It then verifies that your recovery logic correctly identifies stale jobs and transitions them.</p>
Testing Concurrent Operations 🔗</a>
</h3>
Multiple processes writing to the same files. The nightmare scenario.</p>
# .test/concurrency.bats (excerpt)
</span>@test </span>"handles concurrent job creation" </span>{
</span>    </span># Start 5 job creations in parallel
</span>    for i in {1..5}; do
</span>        mycli jobs create </span>"Concurrent job $i"</span> &
</span>    done
</span>
</span>    </span># Wait for all background processes
</span>    wait
</span>
</span>    </span># Verify all 5 jobs were created
</span>    run mycli jobs list
</span>    assert_success
</span>
</span>    local job_count=$(</span>echo </span>"$output" </span>| </span>jq </span>'. | length'</span>)
</span>    assert_equal </span>"$job_count" "5"
</span>
</span>    </span># Verify no duplicate job IDs
</span>    local unique_ids=$(</span>echo </span>"$output" </span>| </span>jq -r </span>'.[].job_id' </span>| </span>sort </span>| </span>uniq </span>| </span>wc -l)
</span>    assert_equal </span>"$unique_ids" "5"
</span>}
</span></code></pre>
If your CLI uses file locking or atomic writes, this test will catch races.</p>
Running Your Test Suite 🔗</a>
</h2>
# run-tests.sh
</span>#!/usr/bin/env bash
</span>set</span> -euo pipefail
</span>
</span># Run all BATS tests using the local install
</span>./.test/bats/bats/bin/bats .test/</span>*</span>.bats
</span>
</span># Or for more verbose output
</span># ./.test/bats/bats/bin/bats --tap .test/*.bats
</span>
</span># Or with timing
</span># ./.test/bats/bats/bin/bats --formatter tap --timing .test/*.bats
</span></code></pre>
Make it executable:</p>
chmod +x run-tests.sh
</span></code></pre>
CI Integration 🔗</a>
</h3>
If you committed the .test/bats/</code> directory (recommended), CI is trivial:</p>
# .github/workflows/test.yml
</span>name</span>: </span>Tests
</span>
</span>on</span>: [</span>push</span>, </span>pull_request</span>]
</span>
</span>jobs</span>:
</span>  </span>bats</span>:
</span>    </span>runs-on</span>: </span>ubuntu-latest
</span>    </span>steps</span>:
</span>      - </span>uses</span>: </span>actions/checkout@v3
</span>
</span>      - </span>name</span>: </span>Run BATS tests
</span>        </span>run</span>: </span>./.test/bats/bats/bin/bats .test/*.bats
</span></code></pre>
No installation step needed. The test framework is already in your repo.</p>
If you prefer not to commit the bats libraries, run the install script first:</p>
# .github/workflows/test.yml (alternative)
</span>name</span>: </span>Tests
</span>
</span>on</span>: [</span>push</span>, </span>pull_request</span>]
</span>
</span>jobs</span>:
</span>  </span>bats</span>:
</span>    </span>runs-on</span>: </span>ubuntu-latest
</span>    </span>steps</span>:
</span>      - </span>uses</span>: </span>actions/checkout@v3
</span>
</span>      - </span>name</span>: </span>Install BATS
</span>        </span>run</span>: </span>./install-bats-libs.sh
</span>
</span>      - </span>name</span>: </span>Run BATS tests
</span>        </span>run</span>: </span>./.test/bats/bats/bin/bats .test/*.bats
</span></code></pre>
Now every push runs your full integration test suite.</p>
Tips for Keeping Tests Fast 🔗</a>
</h2>
BATS tests are real integration tests. They're slower than unit tests. That's fine. But you don't want them to be slow.</p>
Don't repeat expensive setup:</strong></p>
# .test/expensive-setup.bats (example pattern)
</span>
</span># SLOW - creates workspace every test
</span>setup</span>() {
</span>    setup_workspace
</span>    mycli init  </span># Expensive operation
</span>}
</span>
</span># FAST - use setup_file for one-time setup
</span>setup_file</span>() {
</span>    </span>export </span>SHARED_WORKSPACE</span>=</span>$(mktemp -d)
</span>    </span>cd </span>"$SHARED_WORKSPACE"
</span>    mycli init
</span>}
</span>
</span>teardown_file</span>() {
</span>    rm -rf </span>"$SHARED_WORKSPACE"
</span>}
</span></code></pre>
Use --filter</code> during development:</strong></p>
# Only run tests matching pattern
</span>bats --filter </span>"concurrent"</span> test/</span>*</span>.bats
</span></code></pre>
Parallelize with --jobs</code>:</strong></p>
# Run tests in parallel (requires bats-core >= 1.5.0)
</span>bats --jobs 4 test/</span>*</span>.bats
</span></code></pre>
When Not to Use BATS 🔗</a>
</h2>
BATS is for testing bash scripts and CLI tools. It's not for:</p>

Testing web UIs (use Playwright, Cypress, etc.)</li>
Unit testing Go/Rust/Python code (use your language's test framework)</li>
Load testing (use k6, Locust, etc.)</li>
</ul>
But if you're testing the actual user experience of a CLI tool—the thing someone runs from their terminal—BATS is perfect.</p>
The Point 🔗</a>
</h2>
Bash isn't a toy language. It's not "just scripts." It's the orchestration layer for most of the software infrastructure on the planet.</p>
If you're building CLI tools meant to be composed and chained together, your integration tests should reflect that reality. Test them in the environment they'll actually run: bash, with real files, real processes, real timing.</p>
BATS gives you the structure to do that without losing your mind. Setup and teardown that works. Assertions that read like English. Helpers that let you mock external dependencies without rewriting your entire tool.</p>
Your bash scripts deserve tests. BATS makes it possible.</p>

Further Reading:</strong></p>

BATS Core Documentation</a></li>
bats-assert helpers</a></li>
bats-file helpers</a></li>
Test Anything Protocol (TAP)</a></li>
</ul>


Agentic Patterns: Elements of Reusable Context-Oriented Determinism
Fri, 06 Feb 2026 00:00:00 +0000
Agentic Patterns: Elements of Reusable Context-Oriented Determinism 🔗</a>
</h2>
While not as exhaustive as the title might indicate but aligned with my focus on enforcing as much determinism as possible from any given LLM ala Article let's take a look at exploiting tool using LLMs as a process instead of as a conversation. As I posed in the linked article much of the failures we experience are related to attention and confusion which is the progressive noise we introduce as we try to convince the model to perform an action.</p>
What I describe below are patterns for building A Deterministic Box for Non-Deterministic Engines</a></p>
Chats are an artifact 🔗</a>
</h3>
This behavior of progressing the chat with multiple statements to a solution is merely an artifact of pre-tooluse models. So we the humans needed to interact with moving files and integrating code at each step while testing it became natural to turn interactions into long conversations. Ones that eventually degrade into failure loops, while surely someone has told you to just keep clearing your context and start over.</p>
Since the evolution of tools like functiongemma which provides trainable, simple function calling on commodity hardware we are on the edge of building decision trees for tool oriented expert systems, but that's a topic for a different day. For now the models we have that are effective tool users are too large to be portable and our contract is still text.</p>
Reduction in variability 🔗</a>
</h3>
You may recall from math class that you should avoid deriving new values from derived values until you can prove the quality of the procedure. As any instability in accuracy will grow the inaccuracy of outputs. The same is essentially the normal behavior of long running chats. Since model responses can essentially steer (influence) the decisions of the model in the future of the same context window we can fall into a quality trap.</p>
import </span>anthropic
</span>
</span>client </span>= </span>anthropic.Anthropic()
</span>messages </span>= </span>[]
</span>
</span>while </span>True</span>:
</span>    </span># Get input from you
</span>    user_input </span>= </span>input</span>(</span>"You: "</span>)
</span>
</span>    </span># Put your message in the context
</span>    messages.append({</span>"role"</span>: </span>"user"</span>, </span>"content"</span>: user_input})
</span>
</span>    </span># Send the context to the LLM
</span>    response </span>= </span>client.messages.create(
</span>        model</span>=</span>"claude-sonnet-4-20250514"</span>,
</span>        max_tokens</span>=</span>1024</span>,
</span>        messages</span>=</span>messages,  </span># Full history sent each time
</span>    )
</span>
</span>    </span># LLM response
</span>    assistant_message </span>= </span>response.content[</span>0</span>].text
</span>    
</span>    </span># LLM response added to context
</span>    messages.append({</span>"role"</span>: </span>"assistant"</span>, </span>"content"</span>: assistant_message})
</span>
</span>    </span>print</span>(</span>f</span>"Claude: </span>{assistant_message}</span>\n</span>"</span>)
</span>    
</span>    </span># Loop
</span></code></pre>
As this partial example shows if we send 3 messages there will be 3 responses and our context is 6 messages. Each time we send something new the LLM rereads the entire context not just the last message meaning any derived issues that we or the LLMs predictive invariability adds can pollute the quality of the overall decisions made. There are also unseen patterns due to how training is compressed that leads to non-intuitive work and concept replacement for convoluted examples. For example the LLM will be more accurate if there was more reinforcement during deep learning of that topic and it fails faster when exercising in novel space. If you wanna understand this better go read this book: https://sebastianraschka.com/llms-from-scratch/</p>
KISS the LLM 🔗</a>
</h2>
So the solution is the same as it ever was. Keep things small and focused on a single task. The LLM isn't a person and doesn't think, we are using human language to steer outputs the same way we write function signatures to have enough context to supply information to downstream operations. That doesn't mean we never chat with the bot. It does have a big context window and we can take advantage of that for specific patterns.
Plan-Then-Execute Pattern</p>
As discussed we want to keep context focused when we need a long running session. This is the key to plan-then-execute. Coding agents' system prompts have been biased towards creating implementations which like an eager intern jumps the gun and starts building before understanding. When this happens we find ourselves immediately refactoring the wrong idea. The context becomes polluted with examples of the wrong solution and leads to lower quality outputs.</p>
While some coding agents have a "planning" mode, this is a system prompt hack to try and keep it from producing but I'll admit I have had lower luck because this funnels you towards implementation faster. The solution here is to work with the agent's bias to produce and have it produce research artifacts. It will gladly deep dive into a code-base and provide elegant descriptions of architecture and sequence. This is best performed with a reasoning model.</p>
Kill-Then-Breakdown Pattern 🔗</a>
</h2>
A sub-step of plan then execute requires context flushing. After we have verified the quality of the research we start fresh and have the next agent, preferably a reasoning model, read that research document and we instruct it to break down the work into tasks and provide a planned implementation to each task. Once again we are working towards the models goals of writing code or producing files and we get small snippets of code associated with each task. The plan and breakdown is a token heavy portion of the work stream but since we keep check-pointing with artifacts written in markdown there is a repeatable retention in value. That said context size does play a part in cost so flushing the context and loading a compacted version of the topic does end up saving some cost over the context exploding and the risk of loss during compaction.</p>
Now Execute (Mr. Meeseeks Pattern) 🔗</a>
</h2>
While you can use a single reasoning model to go through each task it has broken down and implement it there is a better way. We can ask the reasoning model to act as an orchestrator that spawns sub-agents of cheaper models for each task in the Mr. Meeseeks pattern. The reasoning model starts a simpler model and passes it the task and expected implementation we just broke down and goes to work on it. For simplicity's sake don't run these operations in parallel yet without some considerations to keep multiple agents from overwriting themselves. As each task is marked completed the sub-agent will be killed and a new one with a fresh context is started.</p>
It's important to remember that the orchestrator gets the output from the subagents so if your development environment produces a lot of noise or if your agents aren't clamping their read size you may run into interesting scenarios where you overflow the coding agents memory. The solution here is to run each sub-agent in a new process instead as a task run by the first coding agent. I am sure you can see how this can expand.</p>
Specification-Driven Agent Development Pattern 🔗</a>
</h2>
Is what we just accomplished. During planning we created a specification from an existing code-base or a set of discussions. Then we captured focused implementation details. Then we did a bunch of tiny implementations. While the more formal nature of spec driven usually stops at the original manifest of "what is this feature going to be" we should take it one step further to actually storing partial facts about implementation. On the consumption side of the coder agent it will be somewhat literal with what it was given but it has to perform integration and resolve writing tests as well as ensure the work fits into existing tests and functionality.</p>
Also given we have this spec we can add extra steps to our workflow. The orchestrator ends up following a very simple workflow and its focus is retained around the same document of compressed knowledge it wrote. This is an important fact because models are specifically expressive, they talk a certain way, which means a model reading what it wrote is less ambiguous than it reading what you wrote. It has enforced patterns from training we can reactivate.</p>
Agent Verifier Pattern (Code Review) 🔗</a>
</h2>
Since we have all these concrete artifacts regarding code and spec and final implementation we can then as our last step ask a small simple agent to just give us a thumbs up or down on it, essentially a code reviewer. Before we determine something is done we allow a new context to observe just the changes and the spec, if it rejects we spawn a new implementer to try again. Then we spawn a new reviewer and review again until it works.</p>
In practice this interaction looks a little something like this:</p>
</p>
During the end of this task the image to be added wasn't correct and the reviewer failed it causing it to loop.</p>
Prompt Writer Agent Pattern 🔗</a>
</h2>
So this doesn't work out of the box but it's pretty easy mock up. The next step is to codify what we send during each phase of execution. For this to work we need to be very explicit. Even though the orchestrator knows the workflow it may forget as it handles agent spawning which leads to the workflow rules not being transferred to the sub agents. We are in a derived value degradation problem again.</p>
We have to help the orchestrator by providing it a template of the actions we want each sub-agent to take. It can fill in the gaps with the task. So before it spawns an agent it reviews the workflow and writes the sub-agent prompt to a file. It then tells the sub-agent to read the file and implement. This provides two benefits to accuracy. Since the orchestrating agent has to keep re-reading the template ala RE2 (Read and Re-read prompting) it retains more attention because it keeps getting repeated in the context. Since it then writes the refined prompt for the agent if we crash or context collapses we can immediately recover by reviewing overall task process and the presence of the prompt files. It is in fact highly durable allowing multiple orchestration concurrently if you have the money.</p>
Additionally, the reviewer will get its own prompt written but it can review the coders prompt when checking for spec compliance.</p>
In Practice 🔗</a>
</h3>
If I align this to Anthropic models:</p>

Orchestrator -> Opus (Reasoning)</li>
Implementer -> Sonnet (Competent)</li>
Reviewer -> Haiku (Simple)</li>
</ul>
I also don't rely as much on markdown files past the very first phase of planning. I move all context with the exception of sub agent prompts into a graph. For that graph I use beads which while it has its flaws enables an approach I call the "Context Graph Pattern" which I will go into in a bit.</p>
What beads essentially is is Jira or Linear but with a outputs that work better for LLMs. Essentially a command line tool that has a help dialog that outputs markdown instructions, which improves comprehension by the LLM. It's a graph because like any issue tracker issues can form chains and comments.</p>
In the previous picture above this comment stream is from a plugin to interact with my graph visually. It permits me to leave comments for the agents or even rewrite a spec on the fly.</p>
Context Graph Pattern 🔗</a>
</h2>
Using a tool that allows me to commit context as a focused structure means I get reproducibility and an audit log. Since beads uses issue IDs as commit names the graph extends into the git history. Code and spec and decision tree can all be one artifact without reading all the files. This keeps our context as tight as possible.</p>
Because the graph is mutable if the first attempt was a complete failure I have two choices:</p>

Provide feedback as a refinement and retry -> Refactor</li>
Rewrite the spec and have the agent pull the previous changes and start over -> Rewrite</li>
</ul>
I can continue to iterate this way at a much lower time cost to me as a developer and since the graph is also able to be committed to a repo and shared with other developers they can do the same.</p>
When we enhance a feature we can include the previous changes either by diff review and spec retrieval from the graph or by explicit linking within the graph itself. There is a portion of this structure that lets you act as Product, Project, and Tech Lead for the given outcomes.</p>
Of course no silver bullet, you will end up a developer for some things in the end don't worry. But this can be guided by a concrete context for yourself when you ask the LLM what it thinks went wrong and you get your hands dirty.</p>
Example 🔗</a>
</h2>
If you wanna see a functional example of this process I have been dog-fooding it for a bit and all the artifacts from the plugin I posted a picture of are over here.</p>
https://git.sr.ht/~ninjapanzer/jetbrains-beads-manager</p>
A majority of this code was written in my absence in a execution loop. This usually gets you about 80% there. I then spend some time filing bug tickets adding clarifications and refinements. There were only 3 actual chat sessions that occurred during integration where I provided some focused behavioral examples and some bulk documentation where it built some new tasks and orchestrated them.</p>
I would call this a mature alpha as it was produced in 1 sitting. Functionality is usable enough that I finished the development only using the plugin. But this isn't showing off If you pull this down and have beads installed you can see my prompts and what an actual context graph looks like.</p>
The point 🔗</a>
</h2>
Is not to replace humans as the engineers but replace the grunt work. That said the pattern is implied by the use case. If I am building silly tools for myself, who cares what the code really looks like. If I am building functionality I have to rely on, I need to put considerably more agency in the matter. I will still offload the grunt work when possible but it still a practice. I would hope my carpenter would cut a few less corners on my cabinets than their own. It's not that we are lazy it's we exercise our agency in a way that is comfortable for us. What we build for others must be of the highest quality, what we build for ourselves needs to meet the need.</p>
I mean who knows what will happen and if greed will win and our work will be meat base robot pooper-scoopers. Until everyone figures it out get more work done and take a few more coffee breaks.</p>


Just Forget About Owning Code
Tue, 03 Feb 2026 00:00:00 +0000

Just Forget About Owning Code 🔗</a>
</h2>
Why keep making versions of the same thing? 🔗</a>
</h3>
So let's think about how LLMs are trained. I have, mostly because I have been reading Build a Large Language Model (From Scratch)</a> and I was reminded of the nature of supervised / deep learning systems and their implication on how models are refined. Let's think about how LLMs got to this point, using this Wash Post article as an idea Destroying and Scanning Books</a>, well it needs stuff to read and according to this article to get some of this volume it cut the bindings off books and scanned them. What the LLM produces is a highly advanced predictive generation of those sources. It's completely true that the model doesn't quite know the source of the information after training and because it's a sophisticated predictive engine it does better when creating something similar to what it trained on.</p>
Ok so let's walk that back a little bit about code and that big engineering dream of generalizing solutions. To this point one thing LLMs do a great job of is creating CLI applications in GO, no surprise there are lots of examples of really good CLIs in GO. Of course some of this can be generalized to other languages and if I walk a few steps from here there is an argument to be made that designing literate CLI APIs is kinda solved. Sweet, as an engineer, I consider this a complete win as most of my work is purposely to offload knowing how to do things cause I have lots to do.</p>
I can recall back in the days in the early web when pagination of post counts was a hard problem, now there is probably a go to framework for every language. Most of us don't really think much about pagination anymore, more we consider the kind of pagination we want and apply the solution.</p>
But as things become more complicated the generalizations are too hard and have too many edge cases, solving this would take more time and money than even some community funded altruism would allow. Just consider Authentication, I have worked in a lot of places, and regardless of the agreed rule, "don't roll your own auth," sure as hell every one of these places has done just that. I can enumerate all the great FOSS auth platforms that could be used and extended, that aren't. Honestly, don't get me started on the nature of buy vs build vs vendor vs OSS, its the stupidest discussion you will ever hear. With LLMs it might even be dumber honestly, but this is the baseline for my argument.</p>
Why isn't the advent of LLMs just the start of fluent FOSS solutions to all the things we repeatedly build, a reduction and concentration of quality. As we all spend money on LLMs reintroducing the wheel and building everything fast and naively we could be defining protocols and refining specs. Where is the moat (the thing that keeps someone else from running in and eating your lunch), well, there never was one, code is essentially valueless. The moat for a software business was the product and the money it costs to just build common implementations that send some data somewhere else. What keeps someone from competing with you is that building software is hosting systems and building software is expensive.</p>
Why keep building bespoke versions of anything. Sure the code is cheap before LLMs cheap innovation comes from encapsulation, if you use the Linux ecosystem as an example. An environment where a majority of the interactions are using tools designed in the 80s.</p>

DOTADIW, or "Do One Thing And Do It Well." - Unix Philosophy</p>
</blockquote>
So we still have to build this stuff but that can be the work in the end. We organize the systems and we build the technologies to act as a host and we orchestrate and we compose whatever tools we need on demand.</p>
Here is my dream, think a package manager like Nix but easier to write that describes some interaction and a general UI that is baseline whatever your operating system is for simplicity. Now consider you want a movie ticket, so you do something like this:</p>
## Iron Lung Ticket Buyer
</span>
</span>search </span>"theater inventory for postal code 1111" </span>=> data
</span>search </span>"Iron Lung" </span>=> data </span>-></span> show_data
</span>get </span>"Paypal" </span>=> payment
</span>get </span>"calendar" </span>=> filter
</span>get </span>"seat chart" </span>=> picker
</span>
</span>compose show_data </span>-></span> filter(</span>"this evening"</span>) => filtered
</span>compose filtered </span>-></span> picker OR </span>select</span>(</span>2</span>)
</span>resolve payment </span>-></span> prompt => tickets
</span></code></pre>
So I don't need AMC to produce a website but maybe they want to, I don't care. What I do want is to figure out what shows for "Iron Lung" are playing and get tickets this evening at my local AMC. I want to execute this structure on my local machine because its pretty simple. I am essentially composing some expert systems to do some things I want. Those systems are packaged and they might do some local LLM work or use NLP (Natural Language Processing) but the act is simple and the theater gets their money the way I wanna pay it. They don't have to build a paypal integration and I get some tickets. But there isn't really a reason this needs to be more complicated than this and I probably don't need a cloud provider to maintain this interaction.</p>
I lost you, but you want this 🔗</a>
</h3>
I know I lost you here because it looks like I have built a programming language and I kinda have but really the syntax doesn't matter so much its instructing the orchestration of a package manager, there is no compilation. Some simple model just walks through these steps and uses modules that provide the interactions you requested. The heavy lifting is all managed by the common interactions.</p>
When I think of enshittification and owning the means of production the LLMs that generate code is a two edged sword, sure a company can produce a lot of features and compete but also a nobody can disrupt that and it gets to a point where you are going to spend all your time making your moat deeper and wider with more code but the number of people building bridges over the moat increases at a rate faster than you can defend.</p>
In the case above Paypal is incentivized to create a module that allows their payment system to adapt to whatever the vendor supports. Either deeply integrated or using a one time credit card, it's now insanely easier for them to build that expert system and they have to compete with Stripe doing the same thing. The model shifts away from them locking in merchant rates but being the chosen consumer brand because they have the best tools or customer satisfaction.</p>
The point is some business will not have a choice they don't have to build APIs anymore the web is the API and anyone can build code to extract that data.</p>
How does this not happen? 🔗</a>
</h3>
I am waiting for the time when the big AI companies start selling the ability to block certain types of code generation or they pass legislation that makes scraping a crime... think about it. We are nearing a case of mutually assured destruction or a human utopia.</p>


Rust Dancing ANSI Banana with Server-Sent Events
Sun, 01 Feb 2026 00:00:00 +0000
Remember that dancing Ruby banana?</strong> 🍌</p>
Well, I couldn't help myself. After building the Ruby version with chunked transfer encoding</a>, I started wondering: what if we explored the other</em> way to stream data to browsers and terminals? Enter the Rust implementation using Server-Sent Events.</p>
Yeah, I rewrote it in Rust. With SSE.</p>
So here's the thing: when you want to stream data from a server to clients, you've got options. My Ruby version uses chunked transfer encoding—basically HTTP/1.1's way of saying "I'm sending you data in pieces, and I'll tell you when each piece ends." But there's another player in town: Server-Sent Events (SSE), which is a proper protocol built on top of chunked encoding for one-way server-to-client streaming.</p>
Why both? Because understanding the difference matters when you're building real streaming applications. Plus, Rust's async ecosystem with Actix-Web makes SSE implementation surprisingly elegant.</p>
The best part? It works with both curl and</em> web browsers. Same endpoint, different experiences. Curl gets raw ANSI animations, browsers get properly formatted SSE streams. One server, two clients, zero compromise.</p>
Want to see how SSE differs from plain chunked encoding? Grab the code at sse-dancing-banana</a> and follow along. Or if you just want to see a banana dance: curl -N http://localhost:8080/live</code></p>
Bottom line: Sometimes the best way to learn a protocol is to make something completely silly with it. And what's sillier than making fruit dance in your terminal?</p>

Hope your terminal's ready for some Rust-powered dancing! 🍌🦀🎵</p>
</p>
DevLog 🔗</a>
</h2>

02 02 2026 🔗</a>
</h3>
SSE vs Chunked Encoding: What's the Difference? 🔗</a>
</h4>
When I built the Ruby version, I used chunked transfer encoding directly. It's HTTP/1.1's mechanism for streaming—you send data in chunks, each prefixed with its size in hex, terminated by a zero-length chunk. Simple, direct, low-level.</p>
But SSE is different. It's a protocol</em> built on top of chunked encoding. Think of chunked encoding as the delivery truck, and SSE as the carefully labeled packages inside. SSE defines a specific text format for events:</p>
data: <your content here>
</span>data: <more content>
</span>
</span></code></pre>
Each event ends with a double newline. You can have multi-line data (prefix each line with data:</code>), event types, IDs for reconnection, even retry hints. It's structured, and browsers have native EventSource</code> API support.</p>
Here's how the Rust code handles both in the same endpoint:</p>
async </span>fn </span>live</span>(req: HttpRequest) -> impl Responder {
</span>    </span>let</span> user_agent </span>=</span> req
</span>        .</span>headers</span>()
</span>        .</span>get</span>(</span>"User-Agent"</span>)
</span>        .</span>and_then</span>(|h| h.</span>to_str</span>().</span>ok</span>())
</span>        .</span>unwrap_or</span>(</span>""</span>);
</span>
</span>    </span>let</span> is_curl </span>=</span> user_agent.</span>contains</span>(</span>"curl"</span>);
</span>
</span>    </span>// ... speed parameter parsing ...
</span>
</span>    </span>let</span> stream </span>= </span>stream::unfold(
</span>        FrameStream { current: </span>0</span>, interval, is_curl },
</span>        </span>move </span>|</span>mut</span> state</span>|</span> async </span>move </span>{
</span>            actix_web::rt::time::sleep(state.interval).await;
</span>            </span>if</span> state.current </span>>= </span>FRAMES</span>.</span>len</span>() {
</span>                state.current </span>= </span>0</span>;
</span>            }
</span>            </span>let</span> frame </span>= </span>FRAMES</span>[state.current];
</span>            </span>let</span> data </span>=</span> state.</span>format_frame_data</span>(frame);
</span>            state.current </span>+= </span>1</span>;
</span>            </span>Some</span>((
</span>                </span>Ok</span>::<</span>_</span>, std::convert::Infallible>(web::Bytes::from(data)),
</span>                state,
</span>            ))
</span>        },
</span>    );
</span>
</span>    HttpResponse::Ok()
</span>        .</span>content_type</span>(</span>"text/event-stream"</span>)
</span>        .</span>streaming</span>(stream)
</span>}
</span></code></pre>
The magic happens in format_frame_data</code>. For curl, we send raw ANSI:</p>
fn </span>format_frame_data</span>(</span>&</span>self, frame: </span>&</span>str</span>) -> String {
</span>    </span>if </span>self.is_curl {
</span>        </span>// Chunked encoding: just send the frame with ANSI clear codes
</span>        format!(</span>"</span>{}{}\n\n</span>"</span>, </span>ANSI_CLEAR</span>, frame)
</span>    } </span>else </span>{
</span>        </span>// SSE: format according to the SSE protocol
</span>        </span>let</span> cleaned </span>= </span>self.</span>strip_ansi</span>(frame);
</span>        </span>let</span> lines: </span>Vec</span><</span>&</span>str</span>> </span>=</span> cleaned.</span>lines</span>().</span>collect</span>();
</span>        </span>let</span> sse_lines: </span>Vec</span><</span>String</span>> </span>=</span> lines
</span>            .</span>iter</span>()
</span>            .</span>map</span>(|l| format!(</span>"data: </span>{}</span>"</span>, l))
</span>            .</span>collect</span>();
</span>        format!(</span>"</span>{}\n\n</span>"</span>, sse_lines.</span>join</span>(</span>"</span>\n</span>"</span>))
</span>    }
</span>}
</span></code></pre>
See the difference? For curl, we're just sending data. For browsers, we're wrapping each line in data:</code> prefixes and preserving the SSE format. The browser's EventSource</code> API automatically parses this.</p>
Why does this matter?</strong></p>

Reconnection</strong>: SSE includes automatic reconnection with Last-Event-ID</code>. Chunked encoding? You're on your own.</li>
Browser Support</strong>: EventSource</code> is built-in. Chunked encoding requires manual fetch()</code> streaming, which is newer and less supported.</li>
Event Types</strong>: SSE lets you send different event types on the same stream. Chunked encoding is just bytes.</li>
Simplicity</strong>: For server-to-client streaming, SSE handles the protocol. Chunked encoding is just the transport.</li>
</ol>
When to use what?</strong></p>

Chunked Encoding</strong>: When you need low-level control, binary data, or don't care about browser niceties. Think raw terminal streaming, like the Ruby version.</li>
SSE</strong>: When you want browser compatibility, automatic reconnection, structured events, or you're building a real-time notification system.</li>
</ul>
For this project, SSE won because I wanted both curl and</em> browser support without writing separate endpoints.</p>
</div>

02 02 2026 1 🔗</a>
</h3>
Rust's Async Streams: The Good Parts 🔗</a>
</h4>
Coming from Ruby's Sinatra with its simple stream</code> block, I expected Rust to be painful. It wasn't.</p>
Actix-Web's streaming response is built on Rust's Stream</code> trait, which is like an async iterator. You create something that implements Stream</code>, and the framework handles the rest:</p>
struct </span>FrameStream {
</span>    current: </span>usize</span>,
</span>    interval: Duration,
</span>    is_curl: </span>bool</span>,
</span>}
</span>
</span>impl </span>Stream </span>for </span>FrameStream {
</span>    </span>type </span>Item </span>= </span>Result</span><web::Bytes, std::convert::Infallible>;
</span>
</span>    </span>fn </span>poll_next</span>(</span>mut </span>self: Pin<</span>&</span>mut </span>Self</span>>, _cx: </span>&</span>mut </span>Context<'</span>_</span>>)
</span>        -> Poll<</span>Option</span><</span>Self::</span>Item>>
</span>    {
</span>        </span>if </span>self.current </span>>= </span>FRAMES</span>.</span>len</span>() {
</span>            self.current </span>= </span>0</span>;
</span>        }
</span>        </span>let</span> frame </span>= </span>FRAMES</span>[self.current];
</span>        </span>let</span> data </span>= </span>self.</span>format_frame_data</span>(frame);
</span>        self.current </span>+= </span>1</span>;
</span>        Poll::Ready(</span>Some</span>(</span>Ok</span>(web::Bytes::from(data))))
</span>    }
</span>}
</span></code></pre>
But I took a shortcut. Instead of implementing Stream</code> manually, I used stream::unfold</code>, which is like reduce</code> but for streams:</p>
let</span> stream </span>= </span>stream::unfold(
</span>    FrameStream { current: </span>0</span>, interval, is_curl },
</span>    </span>move </span>|</span>mut</span> state</span>|</span> async </span>move </span>{
</span>        actix_web::rt::time::sleep(state.interval).await;
</span>        </span>// ... produce next item ...
</span>        </span>Some</span>((</span>Ok</span>(web::Bytes::from(data)), state))
</span>    },
</span>);
</span></code></pre>
The state (FrameStream</code>) gets passed into the async block, which produces the next item and returns the updated state. Rinse, repeat, stream forever. It's elegant once you get past the types.</p>
The Rust Tax</strong>: You pay upfront in type signatures (Result<web::Bytes, std::convert::Infallible></code> for an infallible stream?), but you get safety and zero-cost abstractions. No runtime overhead for this streaming abstraction—it compiles down to a state machine.</p>
The Ruby Comparison</strong>: In Ruby's Sinatra, I did this:</p>
stream(</span>:keep_open</span>) </span>do </span>|out|
</span>  </span>loop </span>do
</span>    out </span><<</span> render_frame
</span>    </span>sleep 0.1
</span>  </span>end
</span>end
</span></code></pre>
Simple, but you're managing the loop and sleep manually. Rust's stream::unfold</code> encodes that pattern into the type system. More verbose, but impossible to accidentally block the runtime or leak resources.</p>
</div>

01 02 2026 🔗</a>
</h3>
Compile-Time Frame Embedding 🔗</a>
</h4>
One detail I'm proud of: the frames are embedded at compile time using include_str!</code>:</p>
const </span>FRAMES</span>: [</span>&</span>str</span>; </span>8</span>] </span>= </span>[
</span>    include_str!(</span>"../../assets/frames/frame0.txt"</span>),
</span>    include_str!(</span>"../../assets/frames/frame1.txt"</span>),
</span>    include_str!(</span>"../../assets/frames/frame2.txt"</span>),
</span>    include_str!(</span>"../../assets/frames/frame3.txt"</span>),
</span>    include_str!(</span>"../../assets/frames/frame4.txt"</span>),
</span>    include_str!(</span>"../../assets/frames/frame5.txt"</span>),
</span>    include_str!(</span>"../../assets/frames/frame6.txt"</span>),
</span>    include_str!(</span>"../../assets/frames/frame7.txt"</span>),
</span>];
</span></code></pre>
No runtime file I/O. No error handling for missing files in production. The frames are literally part of the compiled binary, stored in the .rodata</code> section. If the files don't exist at compile time, the build fails. Hard fail at compile time beats mysterious runtime errors.</p>
In Ruby, I loaded frames at runtime:</p>
frames </span>= </span>Dir</span>.glob(</span>"ascii_frames/*.txt"</span>).sort.map { |f| </span>File</span>.read(f) }
</span></code></pre>
This works, but it's runtime overhead, potential I/O errors, and requires the filesystem to be available. For a simple animation, compile-time embedding is perfect.</p>
Trade-off</strong>: Binary size increases by ~8 text files. For a banana animation, I'll take it.</p>
</div>

01 02 2026 1 🔗</a>
</h3>
Nix for Rust: Less Painful Than Ruby 🔗</a>
</h4>
After fighting Nix for the Ruby version's gem dependencies, Rust was refreshing:</p>
outputs </span>=</span> { self</span>, </span>nixpkgs</span>, ... </span>}:
</span>  </span>let
</span>    </span>system </span>= </span>"x86_64-linux"</span>;
</span>    </span>pkgs </span>= </span>import </span>nixpkgs { </span>inherit </span>system</span>; };
</span>  </span>in </span>{
</span>    </span>devShells</span>.${system}.</span>default </span>= </span>pkgs</span>.</span>mkShell {
</span>      </span>buildInputs </span>= </span>with </span>pkgs; [
</span>        rustc
</span>        cargo
</span>        rust-analyzer
</span>      ];
</span>    };
</span>  }</span>;</span>
</span></code></pre>
That's it. Cargo handles dependencies via Cargo.lock</code>, which Nix respects. No gemset.nix translation layer, no bundlerEnv complexity. Rust's deterministic builds align perfectly with Nix's philosophy.</p>
For production, I'd add pkgs.buildRustPackage</code>, but for local dev? This simple shell is all you need.</p>
The Rust ecosystem's commitment to reproducible builds (via Cargo.lock) makes Nix integration almost trivial. Ruby's dynamic nature fights Nix at every turn. This is one of those moments where Rust's compile-time philosophy pays dividends.</p>
</div>


A Deterministic Box for Non-Deterministic Engines
Tue, 27 Jan 2026 00:00:00 +0000
The Nature of Non-Determinism with LLMs 🔗</a>
</h2>
So you may have heard of weights, biases, and temperature when LLMs are described. For the uninitiated: weights and biases are the core parameters learned during training that encode the model's knowledge, while temperature is an inference-time parameter that controls how much variance appears in the model's outputs. Higher temperature means more randomness in token selection; lower temperature means more deterministic responses. It's exactly this temperature parameter that ensures the model will respond with some variance for the same input. So that's clearly this non-determinism which flies in the face of the normal expectation of computers, but it's this that also provides some of the nuance in token prediction that makes the LLM work so it's easy to identify this as an Architectural Trade-Off</strong> and not necessarily a Detractor</strong>. So hoping that provides some grounding let's talk about how to make good use of this engine of... making shit up.</p>
Making Shit Up 🔗</a>
</h3>
Yep, so that's not a tradeoff, it's a flaw, one we haven't solved yet. When the context is ambiguous the model chooses to do one of two things:</p>

Just pretend it didn't hear what it was asked to do</li>
Make shit up, hallucinations</li>
</ol>
Of course I think the former is not talked about as much as the hallucinations. Not to mention that the hallucinations are harder to detect and protect. Note that hallucinations are actually a separate problem from non-determinism - they're about confidence miscalibration and training data limitations, not temperature variance. Hallucinations can occur even with low temperature settings. But we can take a stab at it with some extra prompting and extra runtimes at the cost of tokens. Don't get too upset this is just the normals of computers, we make a simple thing and it has sharp edges, so we make more things that consume some extra energy to constrain the first.</p>
Usually, these are to solve for the inefficiency of the human communication, but sometimes it's just cause people wanna abuse it. I like to think of Auth as a regular pain point we don't really need but have to have because trust is a hard problem. Most of whats on the web doesn't need centralized authentication but GPG has always been too hard so we made something easier to understand.</p>
What to do? 🔗</a>
</h2>
Ok, back to the question, well I call it micromanagement but that kind of implies that the model and its agents have some kind of human agency, which they don't. Although some of their processes are directly modeled after humans so we can loosely apply some techniques to rein them in.</p>
First, let's talk about context and ambiguity. If you haven't figured this out yet the longer the context the more the model's attention distributes across tokens, reducing precision on individual details - a "lost-in-the-middle" effect where information gets deprioritized. Most of this is your fault because even with your best effort you introduce inconsistencies and other inaccuracies into the conversation. The lesson, clear your context often and especially between phases of your work, aka, planning, building, and verifying. I like to consider this an analogy to writing and editing. Have someone else edit your work or write it and review it a week later to improve objectivity. Thankfully with LLMs their memory is as ephemeral as you like.</p>
So we need a way to turn a goal into a workstream that allows us to actually look away from the model's stream. Some might call this an agentic orchestration but I feel these often sprint from meaningful to overly complicated in a matter of weeks. Especially if you use something like Claude-Code, Codex, or OpenCode all the building tools are there already. So starting from something like Claude-Code we need to teach our main agent interface to better follow some process when working.</p>
Here is an example:</p>
CLAUDE.MD</strong></p>

</span>## </span>Working Style
</span>
</span>When collaborating on this project:
</span>- Check existing files first before suggesting changes
</span>- Ask questions one at a time to refine ideas
</span>- Prefer multiple choice questions when possible
</span>- Focus on understanding: purpose, constraints, success criteria
</span>- Apply YAGNI ruthlessly - remove unnecessary features from all designs
</span>- Present designs in sections and validate each incrementally
</span>- Go back and clarify when something doesn't make sense
</span>
</span>## </span>Deliverables
</span>
</span>- Break down the decisions from collaboration into tasks
</span>- You must use any defined task tracking tools outlined in the Task Tracking section to create tasks falling back to markdown files if nothing is defined
</span>- Create a report for the executiong plan with dependencies mapped
</span>
</span>## </span>Workflow Guidelines
</span>
</span>- Create an epic for each high-level objective
</span>- Create subtasks as a todo chain under the epic
</span>- Write titles as the task to be performed (imperative form)
</span>- Add detailed descriptions with examples of work to be done
</span>- Verify each task before closing
</span>- Log details about failures and retries in ticket descriptions for historical tracking
</span>- When an epic is completed, write a report of the task graph and verify all items were performed
</span></code></pre>
Controlling Core Memories 🔗</a>
</h3>
As I included above Deliverables</em> and Workflow Guidelines</em> we initially want our first pass to be on work breakdown and dependency. This will provide some added benefits the way we will track that work progress though. Often the agent writing code falls victim to the two points above with a couple of variations. Hallucinations in this case are items that just don't work and the remainder is missed features. That's good though because we can track and essentially later interrogate success and failure of the model's execution. Better yet we can finally realize the age old dream that we can repeat a variation of a task in the future more accurately because each replanning is less ambiguous. Good luck doing this with people but with LLMs it's all data.</p>
So memory management moves into tasks, which can be in markdown, Jira via MCP (Model Context Protocol - a standard for connecting AI agents to external tools), or my preference, Beads</a> I don't think there is a lot of big effective differences for me except when we come back to the nature of context size complication introducing confusion.</p>
So beads does for AI what Jira does for humans and yet even as a human I would rather use Beads than Jira. Arguably, the difference is that tools like Beads focus on de-complicating the organization of work, its there for the worker's benefit. Jira on the other hand only benefits the bean counters and the workers just have to suffer so that a very few can complain that the reports it produces are useless.</p>
Sorry, my Jira PTSD is showing... Beads, right Beads lets the coding agent take its task breakdown and put it into a graph with dependencies and epics, these feel meaningless to the agent but it's more about what we get to do with it later. It's easier for me to say to a fresh context, review the epic X and verify its functionality. You'll notice that when it finds something is a failure it usually just tries to fix it but it's also going to record a stream of attempts and what was the final resolution. Resulting in a history of the model's confusion introduced from me or the plan, but when I wanna do something similar I can use the JSONL (JSON Lines format - one JSON object per line) from the beads sync operation to prompt a variation of the task and create a new task breakdown.</p>
Here is a claude partial to explain beads</p>
### </span>Task tracking
</span>
</span>Use 'bd' (beads) for task tracking. Run </span>`bd onboard`</span> to get started.
</span>
</span>#### </span>bd Quick Reference
</span>
</span>```</span>bash
</span># Discovery & Navigation
</span>bd ready              </span># Find available work
</span>bd show </span><</span>id</span>>          </span># View issue details
</span>bd show </span><</span>id</span>> </span>--children  </span># Show issue with subtasks
</span>
</span># Task Management
</span>bd create </span>"<title>"</span> --type epic    </span># Create an epic
</span>bd create </span>"<title>"</span> --parent </span><</span>id</span>>  </span># Create subtask under parent
</span>bd update </span><</span>id</span>> </span>--description </span>"..."  </span># Update description
</span>bd update </span><</span>id</span>> </span>--status in_progress </span># Claim work
</span>bd close </span><</span>id</span>>         </span># Complete work
</span>
</span># Sync & Persistence
</span>bd sync               </span># Sync with git (exports to JSONL)
</span>```
</span>
</span>#### </span>Workflow Guidelines
</span>
</span>- Create an epic for each high-level objective
</span>- Create subtasks as a todo chain under the epic
</span>- Write titles as the task to be performed (imperative form)
</span>- Add detailed descriptions with examples of work to be done
</span>- Verify each task before closing
</span>- Log details about failures and retries in ticket descriptions for historical tracking
</span>- When an epic is completed, write a report of the task graph and verify all items were performed
</span>
</span>#### </span>Displaying Task Graphs
</span>
</span>Use </span>`bd show <epic-id> --children`</span> to display the task hierarchy. For visual reports, create ASCII diagrams showing task dependencies and completion status.
</span></code></pre>
Uniqueness vs Repeatability 🔗</a>
</h2>
This is kind of the funny part of this whole process, the LLM can help with a bespoke task but it doesn't generally improve performance because the context size tends to bias towards failures and you end up having to check its outputs and re-validate anything ambiguous. You may say that you don't need to, but just look at the news, it's the failure mode the AI tools get lambasted on. Of course being an engineer we know that everything is essentially wrong and we are balancing the acceptable amount of wrong we can accept at any given moment.</p>
This of course means that when we can find a process that is refinable to a predictable set of tasks we will end up trying to build some complicated brittle script that can automate the process and here is why building things with computers can be kinda dry. We should let the models handle the fixed set of tasks that need a little flexibility but doesn't offer too much range of opportunities for errors.</p>
Refinement of process from memory is just a strategy but it's one that works quite well since the next agent can read the actions of its predecessor, you can bias it to take the success path and start ignoring it, which is the dream. For years I have been using LLMs and finding myself trapped staring at the console because it prompts me for feedback every couple of seconds building context or I have to endlessly remind it to complete the tasks. Both of these conditions are mostly eliminated.</p>
Deploying to K8s 🔗</a>
</h3>
A concrete example of this is to deploy an application to kubernetes. This is super well documented and there is a ton of good tooling but it's also a highly configurable system. Each enterprise makes its own rules and policies around how containers are described. It can be very tiresome reading charts and chart documentation while bashing your head against a statement like CrashBackoffLoop</em>. It's not like there isn't a way to learn about what's happening but it's a lot of command orchestration, the LLM can collect context of the failure much better since it can read multiple streams concurrently. So I recently deployed flagd</a> (an open-source feature flag daemon) to our cluster and let Claude handle the building of charts. To keep things simple I let it just write the deployment and service artifacts itself. It failed a bunch but eventually determined from feedback of the k8s MCP that we were missing a config file. It mounted a volume and created a config map, 20 minutes later I have a stable instance running. I will need to do some work to make this fit into organizational norms but I am also ready for my PoC at the same time. I also have a log of what worked and what didn't work so when it comes time to deploy something else to our cluster I have a baseline of tasks which will reduce the planning time and token volume.</p>
An example of this completed task output:</p>
Beads Task Report - January 26, 2026
</span>
</span>Epic: Deploy flagd to namespace
</span>
</span>ID: feature-flag-prd-q6w </span>| </span>Status: ✅ CLOSED
</span>
</span>---
</span>Task Graph
</span>
</span>feature-flag-prd-q6w (EPIC) ✅ CLOSED
</span>├── .1 Create flagd Deployment manifest     ✅ CLOSED
</span>├── .2 Create flagd Service manifest        ✅ CLOSED
</span>├── .3 Deploy flagd Deployment to cluster   ✅ CLOSED
</span>├── .4 Deploy flagd Service to cluster      ✅ CLOSED
</span>└── .5 Verify flagd deployment is running   ✅ CLOSED
</span>
</span>---
</span>Task Completion Summary
</span>┌─────┬────────────────────────────┬────────┬───────────────────────────────────────────┐
</span>│ ID  │            Task            │ Status │               Close Reason                │
</span>├─────┼────────────────────────────┼────────┼───────────────────────────────────────────┤
</span>│ .1  │ Create Deployment manifest │ ✅     │ flagd_deploy/k8s/deployment.yaml created  │
</span>├─────┼────────────────────────────┼────────┼───────────────────────────────────────────┤
</span>│ .2  │ Create Service manifest    │ ✅     │ flagd_deploy/k8s/service.yaml created     │
</span>├─────┼────────────────────────────┼────────┼───────────────────────────────────────────┤
</span>│ .3  │ Deploy Deployment          │ ✅     │ Deployment exists in cluster              │
</span>├─────┼────────────────────────────┼────────┼───────────────────────────────────────────┤
</span>│ .4  │ Deploy Service             │ ✅     │ Service exists with ClusterIP 127.0.0.1   │
</span>├─────┼────────────────────────────┼────────┼───────────────────────────────────────────┤
</span>│ .5  │ Verify deployment running  │ ✅     │ Pod running after troubleshooting         │
</span>└─────┴────────────────────────────┴────────┴───────────────────────────────────────────┘
</span>---
</span>Troubleshooting Log (Task .5)
</span>
</span>The verification task encountered issues that required 3 attempts:
</span>
</span>Attempt 1 - CrashLoopBackOff
</span>- Container exited with code 0 after printing help
</span>- Fix: Added args: </span>[</span>start</span>]</span> to deployment
</span>
</span>Attempt 2 - Fatal: no sync implementation set
</span>- flagd requires a flag source (file, gRPC, or HTTP sync)
</span>- Fix: Created ConfigMap with example flags, added volume mount
</span>
</span>Attempt 3 - SUCCESS ✓
</span>- Created ConfigMap: accounts-utility-flagd-service-dev-flags
</span>- Updated deployment with volume mount to /etc/flagd
</span>- Added args: --uri file:/etc/flagd/flags.json
</span>
</span>---
</span>Final Deployment State
</span>┌───────────────┬────────────────────────────────────────────────────┐
</span>│   Component   │                       Value                        │
</span>├───────────────┼────────────────────────────────────────────────────┤
</span>│ Pod           │ flagd-service-dev-87fb5946f-vft4t                  │
</span>├───────────────┼────────────────────────────────────────────────────┤
</span>│ Status        │ 1/1 Running                                        │
</span>├───────────────┼────────────────────────────────────────────────────┤
</span>│ Pod IP        │ 10.10.10.102                                       │
</span>├───────────────┼────────────────────────────────────────────────────┤
</span>│ Service IP    │ 172.10.10.10                                       │
</span>├───────────────┼────────────────────────────────────────────────────┤
</span>│ Node          │ node                                               │
</span>├───────────────┼────────────────────────────────────────────────────┤
</span>│ flagd Version │ v0.13.2                                            │
</span>└───────────────┴────────────────────────────────────────────────────┘
</span>Exposed Ports:
</span>- gRPC: 8013
</span>- HTTP: 8014
</span>- FlagSync: 8015
</span>- OFREP: 8016
</span>
</span>---
</span>Result: 5/5 tasks completed - Epic successfully closed
</span></code></pre>
What next 🔗</a>
</h2>
Here is how I would go about things, start by recording your plans. Maybe take one of my examples and refine it for you and check your experiences. Then install Beads and just manually create tasks and see how the agent interacts. Then go ahead an automate the whole thing but maybe this time we can avoid xkcd:1319</a> but probably not :)</p>


Claude or Clod
Fri, 23 Jan 2026 00:00:00 +0000
First off this might sound like a shitpost, but anecdotally, I chuckle to myself about this all the time when I am vibe coding. Claude is something of a straw tiger here and the title is just for the lolz.</p>
So the last couple of weeks I have been diving back into Claude-Code as my primary tool and away from Junie from Jetbrains. Under the hood I use Claude Sonnet for both and have been using various versions of gemmas and gippities since I stumbled across gpt4all like 3ish years ago.</p>
I have a fun history with this technology. As a college student I naively tried to build this kind of knowledge-based query interface in what I called LDOCS (Large Document Search)—yep, I couldn't figure out acronyms back then either. The idea was to enter all the works of Mark Twain and then ask questions about the TCU, the Twain Creative Universe. It was wide-eyed and it didn't work, but it was enough for my senior thesis. Point is, I've been thinking about this space and what my expectations of it are for quite a while.</p>
Now enters a real thing that isn't some idealistic trash I dreamed up, and I get to use it every day. It's pretty sweet. We all know it.</p>
But is it really ready to work? Maybe. Let's run down a few experiences over the course of a year and my takeaway as a 20-year career veteran.</p>
The Win 🔗</a>
</h2>
I needed testing tools for the Passkey/WebAuthN Related Origin Request spec. Specifically, I needed to validate origin relationships for passkey authentication—arcane stuff involving eTLDs (effective top-level domains) and ccTLDs (country-code top-level domains). The kind of work that makes you squint at RFCs and Chromium source code until your eyes cross.</p>
Consider this: a passkey is designed to affix to a single domain. But enterprises have many domains. This draft spec provisionally allows passkeys to work across a predefined set of origins. Taking it from spec to tool meant understanding browser internals I was too dumb to grok from the RFC alone.</p>
I fed C files from Chromium into the model. "Give me a Go CLI that does this," I said.</p>
It did.</p>
passkey-origin-validator</a></p>
Go is simple. There are tons</em> of examples in the training data. The model nailed the CLI scaffolding—flags, argument parsing, output formatting. Beautiful. It even gave me things I didn't know I wanted, like flags for files vs URLs to test against.</p>
Then I looked at the eTLD logic.</p>
Wrong.</p>
Not catastrophically wrong. Subtly wrong. The kind of wrong that would pass a surface-level review but fail in production when someone tried to authenticate from .co.uk</code> or .com.au</code>. Think about the rules for developmeh.com</code> versus developmeh.co.jp</code>. The model had predicted</em> what eTLD logic should look like based on patterns it had seen. It hadn't understood</em> the problem.</p>

The model doesn't think. It predicts.</p>
</blockquote>
I fixed it myself. Wrote the domain suffix matching logic by hand, validated against the public suffix list, added edge cases the model never considered. The task took about 20% longer than if I'd done it solo from the start.</p>
But the documentation? Solid. The tests? Comprehensive. The CLI help text? Actually helpful.</p>
Tradeoff.</p>
Here's the thing that kept me up at night: if I were less skilled—if I didn't know the problem space intimately—I might not have noticed it didn't really solve the problem. I would've shipped broken code with excellent documentation. Tech debt with a bow on it.</p>
You should be picking up the conflicts now.</p>
The Loss 🔗</a>
</h2>
I wanted to build a WebRTC tunnel to a CGNATed (Carrier-Grade NAT) device. Think: running a server on your phone behind carrier-grade NAT, establishing a peer connection, maintaining a stable tunnel. Something new, something off-standard, something not well-represented in the training corpus.</p>
webrtc-poc</a></p>
The model could write the WebRTC boilerplate. It could scaffold STUN/TURN server connections. It could generate the SDP (Session Description Protocol) offer/answer flow. But when it came time to orchestrate the actual handshake—the delicate dance of ICE candidates and connection state changes—it fell apart. It couldn't figure out how to start servers in the right order to establish the same handshake it did in the PoC.</p>
I spent a lot of money. I got very little success.</p>
"AI is great," I told a friend afterward, "just don't ask it to do WebRTC or anything with a handshake."</p>
He laughed. I didn't.</p>
The reality is the theme of commonality. That's where we should be trying to understand the model's place in our workflows.</p>
The New Junior Dev 🔗</a>
</h2>
Yes.</p>
And definitively, no.</p>
More accurate: they're like any dev the first time on a new project. I've seen seniors newly introduced to a codebase make the same general mistakes the model does. I call them "shortcuts" because it appears they're skipping good process, racing toward the goal so they can go home. Something like Mr. Meeseeks—existence is pain, just finish the task and let me stop existing.</p>
The pitfalls are predictable:</p>

Convoluted business logic (special cases)</li>
Unfocused context (not enough files in the RAG)</li>
Test confusion</li>
</ul>
When complex code is modified, the model tends to focus changes on a single file that appears akin to LoB (Locality of Behavior). But when there's too much abstraction, the model doesn't have a common pattern to predict against and cuts the corner by doing something easier—like changing contracts and moving things to a central location. That's exactly what people do who have a lower quality-to-completion drive. I internalize this as the same motivation hierarchy the model prefers.</p>
Test confusion is my favorite. The model will add a conditional check to force test values only when the test suite is running</em>. It'll detect NODE_ENV === 'test'</code> or check for the presence of a global test flag, then branch the logic. The tests pass. The code is fundamentally broken.</p>

The model is very human and ethically unaffected.</p>
</blockquote>
It doesn't feel bad about lying to the test suite. It doesn't experience shame when it hacks around a problem instead of solving it. It just predicts the next token that makes the error go away.</p>
This should define our trust of its outputs.</p>
I was asked recently by someone I greatly respect:</p>

Paul, you have managed engineering teams before. You know we try not to micromanage people. Should we micromanage the AI though?</p>
</blockquote>
Yes. That's generally the narrative about agents. They require a lot of refinement for anything complex. The folks over at METR have statistics: tasks under 30 minutes complete successfully about 80% of the time. As tasks approach an hour, success drops to 50%.</p>
This tracks with my experience. Short, well-defined, pattern-matching tasks? The model crushes them. Longer tasks requiring sustained context and architectural decisions? Coin flip.</p>
But here's what bothers me about those numbers: we're measuring completion</em>, not correctness</em>.</p>
The Agent Experiment 🔗</a>
</h2>
New experiment: agent orchestration.</p>
I spun up four agents—product agent, PM agent, tech-lead agent, architect agent. Gave them a feature request for adding feature flags, told them to plan it out. They produced a PRD (Product Requirements Document). ADRs (Architecture Decision Records). A technical implementation plan. A work breakdown structure. Jira tickets for rollout phases.</p>
Total time: about three hours.</p>
Total artifacts produced: 62.</p>
How long to validate 62 artifacts?</p>
Here in lies the trap.</p>
Verbosity hides meaning the same way big pull requests hide bad code. You think</em> you're being thorough because there's so much output. You feel</em> productive because the agents generated thousands of words of planning documentation. But reading is slower than writing, and verification is slower than generation.</p>
I stared at those 62 files and felt a familiar dread. The same dread I feel when someone drops a 3,000-line PR in my lap and says "just a few small changes." Your eyes glaze. You skim. You approve. You pray.</p>
The volume itself becomes a kind of argument: look how much I produced</em>. But production isn't value. Production is just... production.</p>
The orchestration itself was surprisingly easy to build. Agents calling agents, passing context, refining outputs. The decomposition into four specialized phases felt right—narrow experts doing narrow work instead of one omniscient assistant hallucinating across domains.</p>
But identifying success? Knowing if the plan was actually good</em>?</p>
That part wasn't easy at all.</p>
I hear all the time that effective LLM use for code gen is about planning everything. Small tasks. Tool construction.</p>
Better to see it like this:</p>

What's best for the model is also what's best for you as a dev. You know, the things that don't seem to save time.</p>
</blockquote>
The practices that don't seem</em> to save time—writing focused functions, documenting intent, structuring code into discrete responsibilities—those are exactly what make AI augmentation work.</p>
The model can't navigate a tangled mess of god objects and hidden dependencies any better than a new human teammate can. But give it a clean interface, a well-defined problem, and examples of the pattern you want? It'll predict something useful.</p>
Generally asking the LLM to do the work is the wrong solution. It's kind of meh at it. But building tools that are small and composable so it can be the orchestration engine? Now you might have something. If the tool is small enough, maybe it can even build it.</p>
How do you make a system that's easy to manage? SRP (Single Responsibility Principle). You build interfaces and contracts that are consistent. Contract first. You focus on composition over inheritance. You keep patterns simple and try to repeat yourself when possible.</p>
Like poetry.</p>
Anyways 🔗</a>
</h2>
So yes its both, a clod and Claude. It depends on the day and the time spent. Its not free work its work where the course parts can achieve less focus for you.</p>
These tools don't think. They predict. Sometimes well enough to be useful. Sometimes not. The only way to find out is to build something and see what breaks.</p>
Failure is a valid outcome. We just have to keep trying.</p>


The AI Diaries
Mon, 19 Jan 2026 00:00:00 +0000
The AI Diaries 🔗</a>
</h2>

As soon as it works, no one calls it AI anymore - John McCarthy</p>
</blockquote>
So I tend to avoid using the term AI but it's sometimes unavoidable. Right now I am being forced to spend considerable time using coding tools. And sometimes I like it, sometimes I think it's a bore, and almost always it wastes some of my time. At a minimum it makes up for all the time it wastes but it always creates more noise than value. I have a lot of anecdotes working in this space so I will land them here, at the edge of obscurity.</p>
DevLog 🔗</a>
</h2>

06 03 2026 🔗</a>
</h2>
During the beginning of the hype cycle for "AI", the thing that turned me off was the noise about everyone having to become a "Product Engineer." It was the concept that the only value software provides is building products to be sold. Disgusting! (Obviously, I know what platform this is and how dumb this opinion is for this tribe)</p>
It took a while and a few books, one specifically that told some stories of Grace Hopper that healed my issues. It was the realization that software and automation was to reduce drudgery not to create revenue. It was the need to reduce drudgery that results in positive outcomes and the reason people would pay for software.</p>
Have your own opinions but I repeatedly seem to re-learn this lesson, my choices should be about my thoughts and those are validated by vetting the thoughts of others.</p>
</div>

01 03 2026 🔗</a>
</h2>
I have been off in a microcosm of building PoCs for things, some of them useful:</p>

Agent Orchestration</a></li>
IDE Tooling</a></li>
</ul>
Some not so much:</p>

From Scratch Semantic Code Search</a></li>
</ul>
But coming back to the real world for a moment I am starting to see how I might be in a bubble, an AI bubble. I have simplified AI as a soft executor. Like a script with smart error handling. It's not conversational, it's imperative and command oriented. I think I am a control freak or maybe it's a matter of perspective scale, but I am preparing my statement to the model with a pre-defined expectation of the outcome. I also express the expected outcome and then explain the direction and then confirm the outcome in my statement. I am not looking for the model to introduce inspiration. I highly question the decision to allow the idea to flow from the model, because it often has some pretty bad ideas.</p>
Its about reducing drudgery, Like I alluded to here</a>. Having something of a long tail in this industry now I have seen the world when the product was hosted in the office on consumer hardware. I did the cowboy coding without version control, just FTP to the server. That's how debugging happened too sometimes. Then we got all full of ourselves, we needed more guardrails, we needed to support more engineers with lower skills, we needed to grow. That sounds like punching-down or retroactive gate-keeping that's not the point. There wasn't time to train people on or off the job. Bootcamps did their best to teach functional skills and get the butts in the seats but they were unable to embed what years of experience also provides. Being good thoughtful people we focused on how to make the work safer for more people and permit more capability with less experience. Then we over generalized, we produced specialization in managing tools and never built up the experience, it was abstracted away at a rate that required it to have more abstraction. So now what we have is a ton of drudgery, that hides its meaning so well it has become hard to use.</p>
Regardless, we poo poo on shell scripts because they are brittle (citation needed). I have worked too many places where I was refused a merge due to the presence of a shell script because I could have written the same thing in Ruby or JavaScript. A language that the humans understood better and purported would be less brittle. Of course the brittleness is in the error handling not the language. So this is where I started thinking about llms as error traps for code, don't write a skill that does work, have the LLM write the code and then a skill to run the code. When the code breaks because of a bad filename or a missing system dependency it doesn't blow up. Instead the model takes over and either mitigates the flaw and follows the "spirit" of the code to get things done or just fixes the code.</p>
Here is the thing, both of these are brittle solutions, one we expect to be brittle eventually, the other will be unexpectedly brittle but self correcting. I know the cheap seats will say, "by unexpectedly you mean it will rm -rf ~/". I mean maybe sure but that's pretty unlikely especially if you don't let it run rm -rf outside of $CWD or just not at all.</p>
TLDR; Here is the point, we are always afraid of something, when it was humans we were afraid of the humans, now its robots and we are elevating the humans to the robots. What if we just all agreed that both are likely to screw everything up and take a chance to pivot. Excessive abstraction didn't fix the problem it didn't make the work easier or faster. It did make it consistent which is the thing to learn. Now we have robots and our purpose is to make them operate consistently. Time to break some rules and climb out of the pit of drudgery.</p>
</div>

22 02 2026 🔗</a>
</h2>

Looking at the task, extracting 42+ fields and refactoring a 2300-line file in a single pass is ambitious. The agent likely ran into too many compilation errors and got stuck.</p>
Let me mark this bead as blocked and continue with the other work. The user can decide how to proceed.</p>
rwad failed again - no commits made. This is a complex refactoring (42+ fields in a 2300-line file). Marking as blocked.</p>
</blockquote>
For context this is the file in question com.beads.monitor.toolwindow.IssueDetailPanel</a> which over the course of about 20 days of development this file represents the main interface for the beads manager.</p>
</p>
There is a lot going on here but really rather mild compared to software I have worked on in the past. For this file only represents the right hand details. Which in the background includes things like partial editing, reloading, and partial refreshes. But there is little I can accept should cause this file to be 2300 lines. What brought us here was the discovery that regressions were introduced at each new feature added. Many would take 5-6 iterations for Claude to solve for. It becaue an enormous time suck.</p>
The learning here is that there must be a pressure valve for refactors. I expect that we need systems that can observe file complexity once again. Like the days of old its probably necessary to guide coding agents to far more strict standards than the average developer. This wild growth is unsettling and in my past when working with younger devs it was tools like flog and flay that helped guide code growth.</p>
As a very senior developer now these things seem natural, complexity is a way of life and as I always say software engineering is change management.</p>
</div>

08 02 2026 🔗</a>
</h2>
I've been tracking beads data on the JetBrains Beads Manager plugin</a> build and the numbers tell the 80/20 story pretty clearly.</p>
Four days, 156 issues closed. Sounds impressive until you look at the breakdown:</p>
        ┃ Features/Epics ┃ Tasks        ┃ Bugs              ┃
</span>────────╋────────────────╋──────────────╋───────────────────╋────
</span>Feb 5   ┃ ▓▓ 5           ┃ ░░░░░░░░ 24  ┃ ████████████ 29   ┃ 59
</span>Feb 6   ┃ ▓▓▓▓ 12        ┃ ░░░ 9        ┃ ████ 12           ┃ 37
</span>Feb 7   ┃ ▓ 4            ┃ ░░░░░░░░░ 34 ┃ ████ 12           ┃ 50
</span>Feb 8   ┃ ▓ 1            ┃ ░░ 5         ┃ █ 3               ┃ 10
</span>────────╋────────────────╋──────────────╋───────────────────╋────
</span>Total   ┃ 22 (14%)       ┃ 72 (46%)     ┃ 56 (36%)          ┃ 156
</span></code></pre>
Day 1 built the thing. 5 features, 24 tasks to wire them up, and immediately 29 bugs. Day 2 added more features - macOS compatibility, settings panels, refresh timers. Day 3 and 4? Chasing bugs and polish. UI stuttering, race conditions, tree selection quirks, scroll position resets.</p>
56 bugs out of 156 total issues. That's 36% of all tickets just fixing what the agents broke while building features. And those bug tickets often took longer - VFS async race conditions, deprecated API replacements, multi-selection state management. The kind of stuff where the agent confidently implements the wrong fix and you're three attempts deep before finding the real problem.</p>
The agents built a working plugin in a day. Then we spent three days making it actually work.</p>
</div>

03 02 2026 🔗</a>
</h2>
This is just a thought process I go through with LLM generated code...</p>

OK, I can produce more code than I reasonably can keep track of in a single session, which means there is always going to be some code I didn't read.</p>
</blockquote>

OK, I can always produce and keep in sync documentation about the code that is produced, ADRs and design docs. But if they are too long no one will read them. But at least there is some consumable record.</p>
</blockquote>
Kinda like a factory maybe stamping widgets, because this model of writing all the code all the time seems a little odd. I should be writing less code and there should be more shared code. If the product is the feature and the speed to market is what matters then the cost for encapsulation should go down. Modern products will end up as composable licensable modules.</p>
This is kind of the path that infrastructure took, so why not product. Think about it, if we can remove the human ego from deciding on a solution then any solution is good as long as it can be wired into the product.</p>
If code gen is expensive it's better to reduce the work and just contribute to open source.</p>
I might have lost you there but hear me out: Just Forget About Owning Code</a></p>
</div>

02 02 2026 🔗</a>
</h2>
On Sunday I spent some more considerable time building something dumb and noticed something interesting. While I had observed this before this was formal confirmation because I was able to encounter the same issue across multiple models. While I don't know what is the common source for coding training data but as a person who makes programs that do specifc non-business tasks it seems clear that all the models I have worked on so far don't know how to make a browser extension. Add it to the list of things like webrtc but in this case understanding Manifest v2 vs v3 is always a challenge. In most cases my usage for LLMs is to help get me past the hump of a new technology, traditionally if I know a technology I write the code myself. I have built a number of extensions with various LLMs and they always get trapped on CSP and manifest considerations. They also don't seem to understand anything about how the browser works outside the spec. An extension has to follow a bunch of rules that are bespoke to the application but these are unknown to the models training it would seem.</p>
But $10 to build an HLS extractor is pretty cool.</p>
</div>

30 01 2026 🔗</a>
</h2>
Success with coding agents is as expected completely bound to the quality of the model used. So much of how the agent works is dependent on the model architecture very little configuration work built for Claude will say work with Quen. But outside of foundation models tool use is quite limited for comodity hardware. Having taken a stab this weekend across a number of different models I can confirm that models focused on a task perform better than generalized foundation models.</p>
A great example of this is a comparison of MiniMax and Qwen2.5 Coder vs Claude code. The tools are so completely similar that it really raised the differences between the models. One of the things that Claude Code has going for it is the user experience, its quite tight. But it also leads to some Apple like resistence. On the other hand Open code as a tool did all the same things sans agent generation skills but being able to switch between models was critical. I would use MiniMax2.5 for coding in one terminal and then Qwen or something smaller on a local machine. It was totally reasonable to have a cloud model doing the heavy lifting and a local model doing code reviews or writing comments.</p>
</div>

28 01 2026 🔗</a>
</h2>
I gotta admit there is one thing about using AI coding tools that continues to be true no matter how much I try and constrain the model's failures I generally get similar results. If I don't know exactly what I want it to do and can provide a complex enough context the results will be that of an "Eager Intern" meaning I will get results that I didn't expect and when there were obvious places where the model should have stopped and asked questions it failed. I suspect that the model architecture was trained to focus more on task completion than task accuracy. I have a few times been able to get various agents to "give up" and tell me to try again. Of those Junie definitely does this and doesn't waste my time. Claude-Code though is too appeasing, it closes tasks without verification even when prompted to verify their work. Even with orchestration of multiple agents with fresh contexts, asking to build an app that isn't a todo-list will fail. This benefits the sale of coding tools, during evaluation it impresses with the ability to construct simple things but falls over when complex solutions are required. When I say complex I mean those that are generally novel or require doing interactions over APIs. It commonly produces boilerplate which I think is by design to influence the numbers for LoC for code generation stats. But insidiously, it is also there to obscure the solution it introduces.</p>
A clear sign of AI code generation is bloat and intentional omissions. As of yet the only way I have found to avoid this omissions is to have the model show its work and put it in the clear view of me. So I can set it on a task and watch its completion, then ask it to review the goals and try again. This clearly sucks and I can introduce tools to guide it away from the problem but that's just a bad tool not something that is going to change the nature of my job. It is on the other hand an insult to my 20 year career and all the juniors that are unable to get a job because there is an assumption that if we just "trust me bro" enough it will work.</p>
</div>

27 01 2026 🔗</a>
</h2>

If you work for a company that laid off all your juniors in the past year, it is unbelievably poor taste to continue posting about the merits of AI and vibe coding on a platform where the majority of folks are currently looking for full-time work and do not want to be beaten to death with constant AI thinkpieces. Where did human-centered go in 2026? Because all I've seen so far from C-suite leaders and middle managers is forgetting how they got to where they are now. - Jen Udan - REF</a></p>
</blockquote>
I have been thinking about it like this... consider some big enterprise makes this commitment, they have to get some financial approval for the act and may have committed to some outputs. Now let's say that AI is golf clubs and we just gave everyone a real nice set followed by, be good at golf by the end of the month. All this hype is just from people who own sporting goods stores. The latest debacle about cursor creating a browser without a human in the loop where it didn't compile and humans were in the loop still can land in the post truth world we live in. If my job was being told things are being accomplished and I get access to a todo-list that tells me my tasks are done it's gunna be real hard to not be attracted to such things.</p>
I get to see the outputs of the C-Suite from time to time. The model tries to do the engineering work for me and guided by a visitor it often misses where the rules matter and where the rules can be bent.</p>
</p>
It's this ^ a very enticing concept. What of course is missed is I have to keep watching the bots work and stop them from looping. I guarantee it will get better but if the need for progress is all we care about maybe we should be thinking back to something simpler. People of Process, if we need to get things done we need to cut the red tape not unroll all the red tape into a ball and then wonder why we can't find anything.</p>
</div>

20 01 2026 🔗</a>
</h2>
This one is more just the fun of working with other engineers and AI. While I will not post the code I was impacted by the size of the rebase it caused and the need for me to rewrite my feature. The code the model wrote only cared about things working. It built 200 line blocks of deeply nested conditional logic into existing functions adding catch clauses for exceptions that mean another service has failed and should not be caught. The telling part is when we reviewed the code with the developer he was unable to explain the why these things existed. It's a noob mistake but it's one that AI tends to promote. The endless "Trust Me Bro," and instead this wasted 6 hours of developer time and 3 days in a feature rewrite.</p>
I know there is a mentality that encapsulation adds to cognitive overhead in humans but it exists because 5 levels of if statements is higher. But what happens when the same code was reviewed by the same model that produced it. The code seems to make sense and without the context of the architecture aka we just focused the changes on a single file we end up with some real new debt.</p>
</div>


The Magic of Stubbing sh
Thu, 09 Oct 2025 00:00:00 +0000
The Magic of Stubbing sh 🔗</a>
</h2>
I really love sh and bash but I often feel alone and I get some regular negativity when I solve a problem with it. I know why too, shell scripts can have a broad level of complexity that has other languages embedded into it. But its not as esoteric as you might think, more another domain we should be comfortable with. One of the ways I learned to deal with unknown domains was to read the tests. Because tests tend to use some common language they are often more literate. Here's the thing, I keep getting people tell me that shell scripts don't have tests, and they are wrong. See I have this trick, its called BATS and I talked about it over here Test Anything Protocol</a> where I showed an example of stubbing helm</code> but that example was not the whole story. Since the BATS framework is itself bash we have all those nasty tools at our disposal to manipulate our subject under test.</p>
Subject Under Test 🔗</a>
</h2>
Boring as it may be the purpose here is to observe and verify the output and side-effects of commands run by the shell. We need to respect this boundary between our scripts and the tests for those scripts. One of the challenges to this is how commands avoid observation like rm</code> mktemp</code>, if my script creates a tempfile and then removes it it’s hard to verify if that step occurred without modifying the subject. Of course we can write traces to &>2</code> using echo but that proves nothing more than the presence of the echo statement. I need to verify the validity of these intermediate steps. In traditional programming languages we have mocks and spies which capture the fundamental flow of the code by interfering with the call sites and through reflection. We can do something similar.</p>
Mocking or Stubbing... Whatever 🔗</a>
</h2>
Now there are BATS mocking libraries and they are a wondrous cornucopia of features but in my experience they don't expose much more than a new way of describing, a DSL, how to intercept and modify interactions. So go learn and use those, but for many normal use cases I wanna show you how to do this by hand and use the existing shell language you already know. In the following example we are going to observe tempfiles so we can keep track of an intermediate state, while exposing debugging information when doing TDD, more on that down the line though.</p>
Example 🔗</a>
</h3>
temp.sh</strong> Subject Under Test</p>
#!/bin/bash -e
</span>
</span>local </span>workspace</span>=</span>$(mktemp -d)
</span>
</span>touch </span>"$workspace/not_temp.sh"
</span>
</span>local </span>first</span>=</span>$(mktemp)
</span>local </span>second</span>=</span>$(mktemp)
</span>
</span>echo </span>"WOW" </span>> </span>$second
</span>
</span>rm $first
</span>rm $second
</span></code></pre>
temp.sh.bats</strong></p>
#!/usr/bin/env bats
</span>
</span>set </span>+x
</span>
</span>bats_require_minimum_version 1.5.0
</span>
</span># Load Bats libraries
</span>load ../../.test/bats/bats-support/load
</span>load ../../.test/bats/bats-assert/load
</span>
</span># Stub rm to capture files deleted
</span>function </span>rm</span>() {
</span>  </span>for</span> arg </span>in </span>"$@"</span>; do
</span>    </span>if </span>[[ </span>"$arg" </span>!=</span> -</span>* </span>]]</span>; then
</span>      cp </span>"$arg" "${TEST_DIRECTORY_RUNNING}/tmp/$(basename "$arg").captured" </span>|| return</span> 0
</span>    </span>fi
</span>  </span>done
</span>  </span>command</span> rm </span>"$@"
</span>}
</span>
</span># Stub mktemp to track temp files for cleanup
</span>function </span>mktemp</span>() {
</span>  </span>local </span>tmp
</span>  </span>if </span>[[ </span>"$1" </span>== </span>"-d" </span>]]</span>; then
</span>    tmp</span>=</span>"${TEST_DIRECTORY_RUNNING}"
</span>  </span>else
</span>    </span>read </span>-r counter </span>< </span>$TEMPS_COUNTER
</span>    ((counter</span>++</span>))
</span>    </span>echo </span>$((counter)) </span>> </span>$TEMPS_COUNTER
</span>    tmp</span>=</span>"${TEST_DIRECTORY_RUNNING}/tmp/bats.${counter}"
</span>    </span>echo </span>"$tmp" </span>>> </span>$TEMPS
</span>  </span>fi
</span>  </span>echo </span>"$tmp"
</span>}
</span>
</span>setup</span>() {
</span>  </span>export </span>TEST_DIRECTORY</span>=</span>"./.tests/res"
</span>  </span>export </span>TEST_DIRECTORY_RUNNING</span>=</span>"./.tests/res_tmp"
</span>  </span>export </span>TEMPS_COUNTER</span>=</span>${TEST_DIRECTORY_RUNNING}/tmp/.counter
</span>  </span>export </span>TEMPS</span>=</span>${TEST_DIRECTORY_RUNNING}/tmp/.temps
</span>  cp -r </span>"${TEST_DIRECTORY}/." "${TEST_DIRECTORY_RUNNING}/"
</span>  mkdir -p </span>"${TEST_DIRECTORY_RUNNING}/tmp"
</span>  </span>export </span>-f mktemp
</span>  </span>export </span>-f rm
</span>
</span>  touch $TEMPS_COUNTER
</span>  touch $TEMPS
</span>  </span>echo</span> 0 </span>> </span>$TEMPS_COUNTER
</span>}
</span>
</span>teardown</span>() {
</span>  </span>for</span> tmp </span>in </span>"${temps[@]}"</span>; do
</span>    </span>command</span> rm -f </span>"$tmp"
</span>  </span>done
</span>
</span>  </span>unset </span>-f mktemp
</span>  </span>unset </span>-f rm
</span>
</span>  </span>command</span> rm -f </span>"$TEMPS_COUNTER"
</span>  </span>command</span> rm -f </span>"$TEMPS"
</span>
</span>  </span>unset</span> TEST_DIRECTORY
</span>  </span>unset</span> TEST_DIRECTORY_RUNNING
</span>  </span>unset</span> TEMPS_COUNTER
</span>  </span>unset</span> TEMPS
</span>}
</span>
</span>@test </span>'test intermediate files' </span>{
</span>	local second_tempfile_expected=</span>"WOW"
</span>  run bash ./.tests/temp.sh
</span>
</span>  </span># note the captured
</span>  local second_tempfile_actual=</span>"$(cat ${TEST_DIRECTORY_RUNNING}/tmp/bats.2.captured)"
</span>  assert_success
</span>
</span>  assert_equal $(cat </span>"$TEMPS_COUNTER"</span>) 2
</span>  assert_equal </span>"$(</span>[ </span>-f $TEST_DIRECTORY_RUNNING/not_temp.sh </span>] </span>&& </span>echo</span> 0 </span>|| </span>echo</span> 1)"</span> 0
</span>  assert_equal $second_tempfile_actual </span>#second_tempfile_expected
</span>  assert_output --regexp </span>'Done'
</span>
</span>  </span># _Note_ The use of `command` which bypasses our function export of `rm` introduced by `export -f rm` this makes sure we use the original command and not our mock.
</span>	command rm -rf </span>"${TEST_DIRECTORY_RUNNING}"
</span>}
</span></code></pre>
Lets explore the mocking... ignoring the directory paths we intercept calls to mktemp and if the commands first argument is -d</code> for directory we inject a static location we control. Otherwise we create a unique file in that directory. When we do this we capture the temp file and the number created so far so we can verify the interfaction later. Both these files can be observed during execution.</p>
# Stub mktemp to track temp files for cleanup
</span>function </span>mktemp</span>() {
</span>  </span>local </span>tmp
</span>  </span>if </span>[[ </span>"$1" </span>== </span>"-d" </span>]]</span>; then
</span>    tmp</span>=</span>"${TEST_DIRECTORY_RUNNING}"
</span>  </span>else
</span>    </span>read </span>-r counter </span>< </span>$TEMPS_COUNTER
</span>    ((counter</span>++</span>))
</span>    </span>echo </span>$((counter)) </span>> </span>$TEMPS_COUNTER
</span>    tmp</span>=</span>"${TEST_DIRECTORY_RUNNING}/tmp/bats.${counter}"
</span>    </span>echo </span>"$tmp" </span>>> </span>$TEMPS
</span>  </span>fi
</span>  </span>echo </span>"$tmp"
</span>}
</span></code></pre>
When we write clean scripts we also clean up after ourselves this good behavior provides a challenge to checking the contents of these intermediate files. Because shell scripts are file system based the most common way for data to make its way between processes is to write and read from the filesystem. But if we are tracing a bug in our code we have to regularly interfere with out subject under test to observe its intermediate steps. But if we capture the rm</code> command we can conditionally retain some of the progress. In this example we capture all the args and if one includes a path we extract the filename, append .captured</code> and copy it to our running directory. Ultimately, even if we don't stub mktemp we can still capture deleted tempfiles this way.</p>
Note</em> The use of command</code> which bypasses our function export of rm</code> introduced by export -f rm</code> makes sure we use the original command and not our mock.</p>
# Stub rm to capture files deleted
</span>function </span>rm</span>() {
</span>  </span>for</span> arg </span>in </span>"$@"</span>; do
</span>    </span>if </span>[[ </span>"$arg" </span>!=</span> -</span>* </span>]]</span>; then
</span>      cp </span>"$arg" "${TEST_DIRECTORY_RUNNING}/tmp/$(basename "$arg").captured" </span>|| return</span> 0
</span>    </span>fi
</span>  </span>done
</span>  </span>command</span> rm </span>"$@"
</span>}
</span></code></pre>
Now lets review the test, first we can do traditional expectation with the assert module following the standard, Given, When, Then structure we love. Let's look at how the When is structured too, because this is bash whichever assertion fails the program will exit there. So note the last line where we clean up the temp directory for the test. By leaving this as the last statement we keep the test artifacts if the test fails. Which enables better TDD, where we write a test that fails and continue to iterate until that test passes, meanwhile the test is also producing trace and debugging information about our work. We can do this with any command though, say we call git diff</code> and we want to verify what we produced. We can intercept any command and have it write a file to our test workspace. Importantly, while not changing the subject under test.</p>
@test </span>'test intermediate files' </span>{
</span>	</span># Given
</span>	local second_tempfile_expected=</span>"WOW"
</span>
</span>  </span># When
</span>  run bash ./.tests/temp.sh
</span>
</span>  </span># Then
</span>  local second_tempfile_actual=</span>"$(cat ${TEST_DIRECTORY_RUNNING}/tmp/bats.2.captured)"
</span>  assert_success
</span>
</span>  assert_equal $(cat </span>"$TEMPS_COUNTER"</span>) 2
</span>  assert_equal </span>"$(</span>[ </span>-f $TEST_DIRECTORY_RUNNING/not_temp.sh </span>] </span>&& </span>echo</span> 0 </span>|| </span>echo</span> 1)"</span> 0
</span>  assert_equal $second_tempfile_actual </span>#second_tempfile_expected
</span>  assert_output --regexp </span>'Done'
</span>
</span>  </span># _Note_ The use of `command` which bypasses our function export of `rm` introduced by `export -f rm` this makes sure we use the original command and not our mock.
</span>	command rm -rf </span>"${TEST_DIRECTORY_RUNNING}"
</span>}
</span></code></pre>
Just Test Things and Be Happy 🔗</a>
</h2>
This is just one dumb example of how to think about your testing and how to build up useful tooling that caters to your work. Now go write some bash and make sure you test it, trust me orchestrating a call to git</code> is 10 times easier than screwing around with some git integration for your language of choice. These tools were meant to work together in the shell and you will be happier just getting things done. Double happy when you can prove it works with a test.</p>
Errata 🔗</a>
</h2>
sh is not bash and vice versa 🔗</a>
</h3>
While not functionally errors, the title of this work should be focused on bash. Since a lot of the sample code are bash-isms especially exported functions</em>.</p>
the sh alias and CI 🔗</a>
</h3>

run sh ./.tests/temp.sh</p>
</blockquote>
sh</code> is often an alias on modern systems and this can have a huge impact when you scripts run in CI or more namely a non-interactive or non-login session. Where you CI might offer an Ubuntu or Alpine Linux image that provides bash</code> as an alias for sh</code> it may use a lighter weight implementation like dash</code> when running your tests. Because we are using features that are explicitly bash we should have our test suite run bash ./.tests/temp.sh</code> as such I have altered the above example accordingly.</p>


Copying Life
Sun, 13 Jul 2025 00:00:00 +0000
Copying life 🔗</a>
</h2>
With a stated unawareness of times prior to my own experiences, but with my present perspective that while times change human nature is very repeatable.
At some inflection point more concern was given to the consideration of others than the self. While this is possibly quite a natural process of aging,
it is also possible it was influence. That pressure that you have to grow into a specific thing, in business it feels like all final destinations are
management. In life maybe it's to drive a lambo or have a luxury lifestyle, full of travel and expensive foods. Logically, you will see a destination
and experience the pressure to drive towards it if the world around you is flowing the same way. This could be a local community influence, family,
friends, or that of other media.</p>
Originality of thought and experimentation is how we build great things, even if we drop the lofty language and ignore the other side of this see-saw
which leads to contrarianism. Copying on the other hand may express a deeper lack of control of ones environment. How does one find something new
without trying? By keeping up with the Joneses, homogeny has always been something that has been forced on us in a surprising way. We laude the
successful eccentric but criticise the awkward. To be clear this definition of awkward is only those who seem to be existing tangential to the norm.
Forgoing that nuance of traditional human hypocrisy there is a clear place where unusual is preferable.</p>
From the perspective of a simpler system that is software, a lot of time is spent defining the nature of "social" interactions with code, ahead of time.
Within each of us is a 3PO that helps us determine how to deal with other people. Oppositionly, the interface is created in realtime and only sometimes
repeated. I would allow the extrapolation that software has it easier and humans take shortcuts by copying.</p>
As a child we learn through mocking and then progress through a rebellous phase to generate self identity. A curious process that flip-flops from very little
identity to an over-abundance. Life then gets complicated a little later and the rally point is to reduce complexity through normalizing with our community.
After you have seen your 10th beige to tope patio home complex a new concern begins, what is everyone doing all day? Next comes the hard question which
might be the trigger for a natural midlife crisis. Similar to that previous transition from mocking to rebellion does this cycle repeat? There is a middle
phase of development where rebellion is once again preferred to recertify our independence from a system that demands considerable conformity. Although,
much of this is self-imposed, the systems around corporate offices and family structures leverage the need for our core needs to be leveraged against our
higher emotional needs.</p>
Consider diversity in the office, a term that rightly has been co-opted in support of underrepresented groups also includes the rest of us. Its only when we
lift together do we all get to share in something better. Those that often reach the higher tiers did so with a considerable amount of conformity. That could be
meeting goals or promoting company ideals but some of us who excel in these places have a personality that allows for those tradeoffs. This isn't saying that
those people were or were not in conflict during that process but, repetition does become habitual, and we find ourselves repeating the dogma after enough time.
You may find yourself having silenced that independent voice that represented your individual spirit. Conformity is like a disease that damages your creative
tissues. The way to measure your interactions with others should be aligned ith the Big 5, of those; Openness to experience, Conscientiousness, and Agreeableness
is how we should measure our success.</p>
Someone can always work harder than you, someone can always eat your lunch, and you may lose your business or job, but your success will be dictated by how you
deal with others. This is its own kind of rebellion against capitalism's identity of humans are exploitable for money. While not a direct avocation of Marxism,
there is something about servant leadership that resonates with it. Although, this may be more about a reduction in obsession with monetary assets and greed
compared to our compassion for our fellow man.</p>
Unfortunately, when we are influenced in non-conversational experiences we also lose some of our agency though reinforcement that should be met with rebellion.
Your days should be spent thinking and expressing, a willingness to learn while trying not to judge others. If you start your days with your own thoughts
you build a barrier against accepting the wrote of others that will invariably be pushed upon you. You should also identify who you will allow to influence you
using the same measures you will use for others. You should refuse those who do not meet them, in a world where dialogs are rarer you must save your time,
control your inputs and expand your outputs.</p>


Sufficient Complexity
Tue, 01 Jul 2025 00:00:00 +0000
Sufficient Complexity and Pipe Herding 🔗</a>
</h2>
I still think a lot of what I do day to day in software aligns well with plumbing. DevOps is like warehouse work, and Product is herding. Before the last batch of product engineers, pipe herders, I recall much conversing about things like pragmatism and simplicity. When you think about the pipes in your home, you hope they aren't too complicated. Complication in pipes leads to pinholes, leaks, and sudden noises from the dark places we dare not look. Plastered over and expected to last the long haul, we forget exactly where each one is until we wanna make an addition, drill a hole, or something goes terribly wrong. Software is often like this too, long forgotten, and sometimes completely unused sections live in the shadows. I really thought this was rather boring in my younger years and creates an interesting condition in the enterprise around scope. Too much scope, and we get exactly what we think we want. Too little scope and we end up with too many pipes. I still believe that much of this was dogmatic rhetoric. A book makes the rounds and is praised, others consume it and take it as cannon, producing an effect that is the same as not knowing, over-knowing. Now, with much of that kind of nonsense falling out of style, we wonder why we can't keep our software from crashing. To be crystal I am not saying that due to a failure to care about craft is the reason this is happening. Craft, for better or worse, is its one kind of over-knowing, consider if through rigorous focus and process one can eliminate mistakes or bad design. I argue it's just a matter of speed, things that go fast tend to have lower survivability. A generalization for sure but its better to relate this to change management over velocity in the mind.</p>
So Sufficient Complexity is the mark where we can say something is done, not to be confused with finished. This feels impossible in the land of building products on the web, but I promise it is still achievable. It also doesn't matter if you are working with a monolith or a microservice, but it is about dependencies at the core. Step 1 is to eliminate the word common</em> from your vernacular, followed by shared</em>. While they may look safe, these are traps, hear to eat your time and sanity. Just like we can have a perfect project layout like here</a>, we can have just the right size of features in a box.</p>
Now here is the most important lesson you will learn, Everything is a File or Folder</strong> depending on your observable distance. This counts for how you organize your project in version control all the way to how you deploy your application containers. Folders and files are what matter and the relationships they play to each other. I feel like the oft overlooked power of all of this is the interface</em> or the interaction pattern</em>. It gives us a fixed view of how something is to be consumed or constructed and provides the most meaning in relation to producing moderately stable software. Consider the following I have a folder full of functions on the left and a folder of consumers on the right, between them, I organize those functions into groups called interfaces. Once two interfaces share the same function, I have created a problem, one that is sometimes unsolvable but often avoidable.</p>

graph LR
    subgraph "Functions"
        F1[Function 1]
        F2[Function 2]
        F3[Function 3]
        F4[Function 4]
        F5[Function 5]
    end
    subgraph "Interfaces"
        I1[Interface A]
        I2[Interface B]
    end
    subgraph "Consumers"
        C1[Consumer X]
        C2[Consumer Y]
    end
    %% Good pattern - clean separation
    F1 --> I1
    F2 --> I1
    F3 --> I2
    F4 --> I2
    F5 --> I2
    I1 --> C1
    I2 --> C2
    %% Problem case - shared function
    F3 -.-> I1
    style F3 fill:#f96,stroke:#333
    style I1 stroke:#f00,stroke-width:2px
    style I2 stroke:#f00,stroke-width:2px
</span>

The diagram above illustrates the concept. On the left, we have a collection of functions (Function 1-5). In the middle, we have interfaces (A and B) that group these functions. On the right, we have consumers (X and Y) that use these interfaces.</p>
The problem occurs when Function 3 is shared between Interface A and Interface B (shown by the dotted line). This creates coupling between the interfaces and can lead to issues when one interface needs to change but can't without affecting the other interface. This is why interface segregation is important - each interface should have a single, focused purpose with its own dedicated functions.</p>
</blockquote>
That's just if we share the same functions across two interfaces. Imagine how the rest of the internet works? So here is where your bugs are coming from most likely. To be clear, there is nothing you can do to avoid it, so lets get the doom and gloom out of the way. Code like pipes work the best when singularly focused. For example a pipe that feeds other pipes isn't a faucet line but a feed line. Maybe it started its life out going from the source to your bathroom faucet, and at some later point you installed a shower. At that point you created an interface, a physical one, and the nature of the pipe changed. At first it interfaced with the faucet and then that line fed an interface which interfaced with the new pipes, those interfaced with the shower and faucet individually. If we were then going to install a washing machine, (a beautiful European concept) in our bathroom, we might realize that the feed line in place doesn't meet the volume our washing machine needs. We will have to run a separate line for our washing machine.</p>
We don't usually make the same decision with software, though, bits are very malleable, and our pipes are scalable with an injection of cash. I like to think of how to deal with coupling the same way I deal with pipes. If my needs cannot be met at the current interface, it's time for a new line maybe that's a new module or a new microservice and it might even copy some of the code from the existing pipe but it doesn't make a dependency on it. Long term we want to create solid permanent things. That are resistant to external change unless acted upon. I know this sounds like heresy and a lot of work but I promise its worth it. You will not end up with a bunch of duplicate code that matters. The things you copy will be boilerplate specific to the cause. The parts you don't copy are the items that you can depend on that don't require you to modify their interface.</p>
Sounds hokey and more of a cry for "hey this way stinks, go do it this other way cause it's different." It's not a new concept, though, because this is the principle of module boundaries. I usually explain this to my team as not Goals</em> and Non Goals</em> but Spiritual Goals</em> if I wanted to draw a circle around a unit if code such that it produced no more and no less than it needed to and met the spirit</em> of its purpose that is what we build. Its not as hand wavy as it sounds, but it does require understanding the scope of the work completely which I'll admit is not something everyone can always do. Arguably its this need to navigate ambiguity which leads more to poor design than inexperience. But I like this more formal term of Sufficient Complexity</em> to make Spiritual Goals</em> less techo hippie. Allow us to continue, a module is sufficiently complex when it provides a new complete boundary of its context. Your ears might be itching because this sounds a lot like Domain Driven Design(DDD), and you would probably be right. But DDD is interested in pathways through a system and is a very top down kind of concept. I, on the overhand am proposing a bottoms up approach. Something that I might slot into Agile or XP where we don't know all the scope before we start, and that's both normal and ok. But as we discover complexity, we promote context boundaries instead of the shortest path to completion.</p>
Examples 🔗</a>
</h4>
Let's explore a couple of simple examples, first the webapp-common lib and then the universal modal dialog.</p>
Webapp-common</strong>
In webapp-common</strong> as the title describes, we are going to configure a number of tools and dependencies that all our webapps share inside a single module. The first question we should ask, after we stop screaming because we successfully forgot the word common</em> from earlier, is does this module describe a clear boundary for behavior? No</strong> not really. If you said yes, that's ok, you may even think the boundary is web apps. Still not wrong but not great either, because this exceeds Sufficient Complexity</strong> how can I imagine this module every being done. Since we will have many web apps and they will have all kinds of responsibilities its likely not every web app will need all the functionality of the webapp-common. This introduces a risk of being a dumping ground of interdependent libraries that over time, event versioned, will slowly start to poison each other. Because these libraries are also commonly shared, this pollution will touch everything.</p>
Whats the solution? Well its always about informing on the pattern through an interface. If this is a Java Springboot project, we would want to introduce bean configurations optional transitive dependencies. Check out this sample project java-no-more-common-lib</a></p>
spring-gradle-example/
</span>├── build.gradle                 </span># Root project build file with common configurations
</span>├── settings.gradle              </span># Project settings file
</span>├── jackson-module/              </span># Jackson module with baseline dependencies and configurations
</span>│   ├── build.gradle             </span># Jackson module build file with Jackson dependencies
</span>│   └── src/
</span>│       └── main/
</span>│           ├── java/
</span>│           │   └── com/example/jackson/config/
</span>│           │       └── JacksonConfig.java  </span># Auto-configured Jackson configuration
</span>│           └── resources/
</span>│               └── META-INF/
</span>│                   └── spring.factories    </span># Auto-configuration registration
</span>└── service-module/              </span># Service module that uses the jackson module
</span>    ├── build.gradle             </span># Service build file that overrides Jackson versions
</span>    └── src/
</span>        └── main/
</span>            └── java/
</span>                └── com/example/service/
</span>                    └── ServiceApplication.java  </span># Spring Boot application
</span>
</span></code></pre>

Here the solution is to avoid common and instead create building blocks, this works with maven or gradle but gradle is a little clearer. The only thing that jackson-module</strong> exposes is a specific configuration for jackson and provides a baseline for the jackson version. For our implementation to be sufficient we can use that baseline or in this case override it with the version we want. Given the version we choose can meet the bean configuration this provides we can apply this as an interface to our web-app. I picked jackson here because they are pretty bad at SEMVER, and I often have features that work in one minor version and not in another due to poor planning or deprecation. The problem is my need to jackson is intermingled with a whole common library usually. I can, of course, override and qualify a new bean as primary but I would rather have a choice if I should include it first. So instead of having spring.factories load this bean, I can @Import it in my application. This way I can control what my application consumes from its library. While that would be foolish for such a simple dependency. There is a real case where jackson and a few other libraries are joined together in a serde (serialization/deserialization) module. Which has a slightly broader context but it supplies common configurations for our serde.</p>
</blockquote>
Regardless, the point here is we often see the simplicity</em> of a common library that sets up all our dependencies as a big win when we start a group of projects. It quickly becomes something of a pain point when too many live together without a spiritual goal or shared contextual binding to make it meaningful to include it. Also, setting up a new project is not where the time in development is lost. So making that to focus on speed is a false hope. We always want to make maintenance of a code base the easiest solution. But moreso we want to create a space where we don't have to touch a codebase for a long time because its done</strong> and thus Sufficiently Complex</strong>.</p>
The Universal Modal</strong></p>
So now lets switch to Javascript and React land, we have a fantastic pattern for module size in this ecosystem and we probably don't have some annoying common library mucking up our sanity. But, we also work in the more visual scope and that means less technical people can fail to understand the nature of our work. They kind of see it as "configuring the browser" and less of a formal data flow and user interaction platform. We have been asked to build a model that can act a little like a slide deck. Starting in one context and then on each step asking if there is another context and providing a new interaction on each slide. Honestly, this sounds pretty cool, and I bet many someones out there have tried to make something like this. The first question we should ask, after we stop screaming because we are building power point as a model, is does this module describe a clear boundary for behavior? Once again No</strong> not really. If you said yes, that's ok, you may even think the boundary is a dynamic modal or an iframe. Still not wrong but not great either, because this exceeds Sufficient Complexity</strong> how can I know all components and interactions a designer might want to sequence before we know they exist?</p>
When we build general purpose components for the case of reuse we are falling into the trap of creating code that over-knows about its purpose. This is the poster child of the cat with 4 normal lets, and a human ear and arm jammed on there we all saw back in school to describe software in the wild. This code will never be done and will continue to acquire features and conditions until it becomes to complex to work with. It will also be a nightmare to test because while a flow of slides can be static in intention they are in fact dynamic and the synchronization of what a test can verify and what we will present will diverge.</p>
This is very much the counter example of the former. Instead of worrying about known competing dependencies we are building something smarter than we are right now that can anticipate the changes of tomorrow with complex design. The time spent building such a tool will never pay off, yes we can make something new nearly instantly but we can't test it instantly. It will also be a constant source of bugs that will also be complicated to verify.</p>
What do we do instead? Well focus on what is Sufficiently Complex</em>. The next question we should be asking is, do we know what we want to build? No</strong> doesn't sound like it. Every time the idea of building the solve everything module comes up just accept that it would be better to know what needs doing now. How we can codify a process that makes repetition require less discovery for the next person and build exactly what we need. I promise when you need to come back and adjust slide 3 of flow 1 you will be much happier that you build layout components so you have uniform styling and that just because you are changing the info link on slide 3 it doesn't automatically bump around all the info links on the other flows. Like all good poetry we need to start with a rhyme. Software is hard to rhyme at first and poets spend a lot of time with words before becoming poets. So an expert can create a successful general system but there is also a lot of bad poetry out there. We build similar components and keep an ear out for the rhymes. Each pass we make we reverberate those sounds until we have something that repeats. Essentially, you don't start with the poem you start with they rhyme and the theme. Which is at its core an interface we make with this kind of component.</p>


Do Devs Really Do DevOps in your Org?
Thu, 26 Jun 2025 00:00:00 +0000
Do Devs Really Do DevOps in your Org? 🔗</a>
</h2>
Recently, I learned the more formal definition of shift-right and shift-left in terms of Agile DevOps. For a brief refresher and for brevity it goes a little something like this:</p>

Shift Right -> Validation and testing in production</li>
Shift Left -> Validation and testing are before production</li>
</ul>
Now that's kind of the intended definition, and it makes perfect sense. In fact, I would probably go, "hell yea, this is just smart." I naturally subscribe to the XP (Extreme Programming) subset of Agile, and that generally means I just pump out tiny slices rapidly that are often not a complete feature. Think of it like writing a book chapter by chapter and having your editor review it as you go. This kind of process means you will miss some things on the first pass but spend less time on discovery</strong>. Not advocating just calling out the causal nature of this decision. So there is a lot of refactoring and revelation through the process that this creates.</p>
Generally, shift left proposes some big claims; the stinkiest are the following: ref</a></p>

Reduces Cost</li>
Improves Collaboration</li>
Faster time to market</li>
</ul>
Holy cow, sign me up, I want cheaper</em>, faster</em>, and better</em>. Even if the claim violates the mere existence of the good</em>, fast</em>, and cheap</em> love triangle. We all smell nothing, but poking fun at the top keyword buyer shill article wrapped as a blog post isn't the goal here. I wanna bring this into focus of practicals and current experiences.</p>
Here is my experience with shift left devops in the wild. As a developer, I am giving access to a CI runner that can execute terraform or cloudformation, doesn't matter. I am given some tools that might add some constraints to that process like a terraform wrapper or a set of CI templates. I am then told I can just build whatever I want. Except:</p>

I have no way to interact directly with the terraform state.</li>
I can't view resources in the cloud providers console, and I cannot manage IAM roles/policies.</li>
</ul>
What I have been granted is the illusion that I can self-service and a new stack of problems to solve through a fog. Because, I can ask for support, but it will be through a ticketing system, and the resolution will take weeks.</p>
While I am skilled in devops, I would say that 80% of my peers are not, and thinking about it, the condition I have described is simply, go use terraform but never actually run terraform. All operations must be performed through an environmental suit. Let's revisit our targets, unrealizable as they may seem. Have we reduced cost? In some ways, yes, by distributing the workload we have reduced the need for specialized staff, we can argue that most devops are routine and probably cookie cutter, so having a group oversee the orgs work streams is a better spend. Does it improve collaboration, probably not. As I have been the member of now a SecOps team, run a DevOps team, lead a L3 Support team, and spent the rest of my career in mines as a developer. When you create a centralized management team, you have a choice, they can directly collaborate, providing deep value and insight as they touch the people they support, or you can make them the slaves of a ticketing system. Since its rather difficult to account for performance and costs associated with staffing without any figures to back them up, you will end up with a ticketing system, probably. Let's be honest, that's not, "Improves Collabortion," that's, "Sets up a call center." So you put some very talented people place them in the complaints department and say, drink from the firehose, k thx bye. This is getting too dark, let's circle back about cost again, so your devs are reaching out to your devops team for support through your ticketing system, and you can track your MTTR as 3 days. Wow, Kudos, we are doing business, good business actions! But had we honestly shifted left, the developer would have probably solved the issue themself had there been some trust and access. It would have taken a couple of hours and possibly a message in a chat channel for a code review. I get it this is argumentative but its also a generalized understatement. MTTR isn't 3 days, it's 3 weeks, and the dev would have solved it in days, not hours, but it's about ratios. That, of course, puts a finger on the last target of delivery speed.</p>
I kinda see this like AI for devs these days, giving away our ownership feels bad because for many of us it means we might be less special. It might mean that critical thinking and planning are the real skills, not butts in seats. Of course this is about devops, not LLMs and code gen, so what's the next step?</p>
First off, let's make the sharing of responsibility for devops not a chore. I need infrastructure and devops people to do the good work, help me pick the right tools, be experts in the cloud or servers. I see that as the divider, knowledge not access</strong>. What you want from your developers is to be able to find resources and use tools; sometimes people and sometimes documentation. Then you want them to be able to evaluate those things before they make their way to production. This means taking away the training wheels, devops teams produce products, not interfaces. There was a time when terraform was the new kid on the block that devops teams provided modules to isolate the patterns they want to repeat. I know this is how I did it when I ran my first DevOps team. We didn't kind any of the sausage and we provided support like any Open Source</em> project would. There was documentation and READMEs, along with tools and tutorials. Most importantly, there was access, developers could run terraform locally, create infra using their developer accounts, and submit pull requests for our modules. Better yet, they could read our modules. We did a lot of the first people on the scene to a new concern; once vetted, it was normalized for repeat use. We saw ourselves as the caretakers, with a motto of "yes and." We used a ticketing system but only internally if something was more than just a conversational solution. We took notes and turned those conversations into FAQs. We did a lot of work in chat, we relied on the fact that our company chat was searchable. We keep discussions in public channels as a backup body of knowledge. We trained devs to talk devops, and what we discovered is or devs loved to learn. Plenty of the time we didn't even need to respond to a chat request, since it was probable someone else had already encountered the issue they would speak up.</p>
I know it sounds like, I have been poo-pooing the farcical benefits of shift left, and I surely am. I just want to remind you when we talk about money and timelines in polite company its considered gauche. Although, we can spend 10 minutes of a 30-minute meeting worried about efficiencies with 6 other humans. That's an hour spent bub, by the way. Instead, we focus on the people and the problem</em>, not the problem is the people</em>. All I am asking is for you to consider this famous quote, "The more you tighten your grip, the more star systems will slip through your fingers." There are orgs that get this kinda thing right and it's not just devops. Its always good to identify if you are in one of those orgs and what you can do to change it. If you wanna shift left, shift left. Otherwise, hire some more smart people and adopt waterfall for your projects, it works great, honestly. If you can't trust your devs to mind the cashbox with regard to infrastructure access, it means you haven't done the upfront work to position your DevOps team as a sheppard of their craft.</p>


Creative Impostor Syndrome
Wed, 25 Jun 2025 00:00:00 +0000
Another Syndrome?! 🔗</a>
</h2>
In reality, I am merely saying I had impostor syndrome, but not the workplace kind, albeit its a bit related. Unlike the kind many of us have early in our careers, my relationship with creativity has much more complicated roots. My definition of what creation is has always been mired in a deranged triangle of, Art</em>, Income</em>, and Moat</em>. The outputs of creation should be classifiable as Art</em> while being consumable to make Income</em>, but without reducing my Moat</em> that protects my ideas. How does one create something from inspiration and spend their time mostly worrying about how to keep others from copying it and stealing my profits? When I say it out load it sounds as insane as it is. This fuels my FOSS rebellion; give away whatever I write because in the end a product is the Moat</em> not the software, which is only a vehicle. I am not saying that work can't be creative but its not a source of inspiration, it is a profitable way to expand my skills, like the opposite of academia in a way. I have spent my time in academia as well, paying fiats to its own kind of feudal system contrasting the word games and politics of the enterprise world. In the enterprise world you have to be rather special or lucky to get involved in novel projects; I think I have been rather successful in the latter.</p>
Somewhere the enterprise weaseled its way into OSS, and a lot of it seemed to turn into a freemium where a focus on Income</em> could be associated with free. I see less and less of work being broadcast back from the enterprise to the community, a project now goes from launch to subscription in a single go. If you spend any time on LinkedIn, your feed will be filled with peddlers promoting vaporware and "Stealth Startups." I am not event just talking about AI. I can recall in a previous role we were looking for a static and dynamic security tool for our Ruby monolith. In case you don't have familiarity with the language, it heavily supports meta-programming (code generates itself) and before there was real AST support, it dramatically resisted analysis like this. Of course, CodeQL started peeking out at this time, and that's the solution I organized a deal for overall, if you don't know go check it out. Anyway, the number of vendors that would pull us into a demo call knowing we were using Ruby and knowing their tool provides nothing more than a Brakeman wrapper would still try to sell us a subscription. The demo would never happen, and it was almost like we should buy their product because they existed, or worse, bought Brakeman and bolted on a UI that I didn't need. Brakeman was eventually forked and is a solid project, it was someone's creative endeavor that got lost to "Income."</p>
You should be picking up the conflicts now.</p>
Creativity, by my definition, is giving some reality where only inspiration was present.</strong> It doesn't mean we need to have a new idea, just new to us at the time.</p>
I used to say that in the job we are just plumbers wiring up dependencies so the toilets flush just right. I still think some of that is true, because there was space for invention long ago, the frameworks were toolboxes and not ecosystems. The latter is good for throughput, take my agency away and give me a thing that works, so I can get something done, is a big win. Its why pipes have standard diameters and schedules, so we can add a valve to an existing line and a tap for our new washing machine. I don't want the plumber to have to think too hard, there is an application and the right medium to move that water. Ultimately, what I want them to be good at is preparing the pipe, brazing/soldering, and fitting selection. Is there artful plumbing, yep, but it's in the detail, Art</em> tends to compete with Function</em>? In the software world this is why we despise cleverness, we are focused on the function, and inventing our own pipe is a challenging endeavor.</p>
So, taking that into account, creativity might seem a little out of scope for my field. So I want to set down some rules for the push-pull. If you are creating for income</em>, your creativity isn't focused on the software its channeled to the human product, Kudos humanitarian. I don't wanna really build a product that feels like it would lead me to talking to too many people. If you are creating for beauty, your creativity isn't focused on the software; its focus is syntax and semantics, Kudos you poet. If you are creating for fame, your creativity isn't focused on the software, it's focused on your brand, Kudos teacher. If you are creating for experience, your creativity isn't focused on the software, it's channeling your need to control your world, Kudos explorer. I have yet to find the answer that results in a focus on the software, but I think it might be somewhere intersecting, simulation and game design. That's not the message, though; what I concluded is just because I don't much find passion in building products or fame doesn't mean I am not creative.</p>
I don't need to call myself an artist to feel whole, but I do want to feel I could joke about it from time to time. I am driven to feel Authentic in my Art</em>, which is probably only something I can truly come to terms with. The problem has always been that I can create something from nothing, and I can do that expressively such that I have caused others to feel emotions in a wide spectrum. But so much of that creation intersects with the need for pragmatism, protection, and income that I could not realize it as a creative act. Problem-solving is the brush, and the code is the canvas, but the production is mechanical. The momentum of creating software for the purpose of money has all but destroyed it as an artistic pursuit. The revelation started when I needed something to write about, I don't want to talk about Passkeys or Nix really, I would like to talk to others about them, but the novelty is gone. I just stopped worrying about if what I was building was a good idea, and the world opened up to me, I built pointless beautiful things that would be only useful anecdotally and I was free. I just dub them, crazy ideas</em>. I have a pile of pipes, can I make a snowman? Or, could I make a form send data directly to a server on my phone without a persistent connection? The value is solely the novelty and what I will learn during the journey. That is the creative act I want to align with, and I have been doing that my whole life. I only failed to see the truth because I kept injecting the requirement that others need to love it, or it needs to be a revenue stream. The techno hippie in me firmly believes that software is the closest thing we have to a hidden force that can shape society without being interfered with by politics and money if we just committed to it.</p>
If you want to peruse some of the crazy ideas, they are over here This Week's Crazy Idea</a></p>
"The immature poet imitates and the mature poet plagiarizes" - T.S. Eliot</p>


This Week's Crazy Idea
Sun, 08 Jun 2025 00:00:00 +0000
This Week's Crazy Idea 🔗</a>
</h1>
In all honestly tech is completely boring. Nothing shakes me to my core anymore. Remember Web-Rings? I do, they sucked but it was a time when peoples ideas were well constrained by context. Back in those days you had to get a host, write you own HTML and CSS, accessability concerns of the time aside, things were ugly and simple. It made up for all the complexity of getting a few paragraphs to show up on someone elses screen. We kinda gave all that up for the global town square, just worry about the paragraph and maybe some photos, the content baby. Like a small business there is some charm in the agency to make something fantastic and utterly fail at it. Its about the effort and the intent, that being self-expression, and the barrier to entry was just heavy enough to keep the boring outa the way.</p>
A price we pay for giving everyone a voice is not everyone has something interesting to say. Its that little pain like wanting to run a newsletter for a thousand people but having a dot-matrix printer. There is a little nagging voice in the back of your head saying, I cant listen to that think print a thousand sheets just so someone can throw it out. But it was just that kind of drive that got you to do it, the creative act of getting someone to react to what your wrote.</p>
Thats what this is all about, the creative act. It's not doing things because they are profitable or even relevant, but because they are interesting or fun.</p>
Though in some ways I am talking about giving every internet connected person a voice but one that they control and not one that promotes clout. There is clearly a value to a central platform for discovery, and in some past world that was the responsibility of the search engine. Now I think this is more about append histories instead of sitemaps and some very clever automation for a federation that provides an index of the internet.</p>
DevLog 🔗</a>
</h2>

14 07 2025 🔗</a>
</h2>
Just build binaries 🔗</a>
</h3>
The more time I spend with LLM Coding Assistance I become aware of how bad the tool is at any significant planning regarding a complete feature. What it does well though is create human interfaces be that actual UI, or CLI / API, they are pretty good. Better than even I might build on my own. Those interfaces are also build with incredible speed in one go. So this brings me to a consideration in how I should build shared tooling. At one point my goal would be to build a PoC in my language of choice the expose a common set of features and wait for others to copy the project into their language of choice. Now this seems foolish since I will, with or without, the LLM have to spend my effort on the actual problem and can nearly divorce myself from the interface I think a project should be focused on defining an interface specification in a language agnostic way that expresses the usage intent and doesn't bother with the implementation details.</p>
The core behavior then can be written in something that exposes an interface through ABI / FFI essentially, compiles to a shared lib. This really isn't anything new and generally the way everything goes is once its gunna be shared someone starts to build a generalized library and a series of wrappers. What I am conjecturing is maybe we should just start there. Build our tool and immediately expose it as a binary interface. This even opens the door for tool using LLMs to directy open and call symbols from our libraries. This is kind of LLMs in the kernel where they can control the underlying operating system. Instead of working on top of it. I mean nothing sounds worse to me than a non-deterministic operating system. But, one that can generalize a command from the underlying C building blocks then means that someone has to build the building blocks.</p>
I have experienced that LLMs break down when dealing with anything that has a clock attached, specifically in my case related to networking. If a process needs to wait for collaborators to connect it can't seem to figure that part of the sequence out.</p>
The result of this idea is something like a CLI framework. Not one that helps layout the commands and flags but that provides CLI features. Like network tools and storage. The real crazy idea relates to k8s. Which I often have exec access to a pod but often don't have enough tools. Debugging some issues in the past I learned I can write files to a pod through my connection and my next thought is why not build a tool that can inject an agent on demand into my pod and then act as a proxy for diagnostics. Copying binaries, building tools and extracting logs into time series dbs by polling log files. All of this without having to monkey with the container image :) then clean up on exit.</p>
Well thats the nature of the project and I want it to be plug-able. Injecting things even using embedded runtimes and binary quines. It feels like a hackers toolkit but dealing with containers are time consuming and given your exec perms let you write to the fs and exec chmod which is clearly in the scope of the container maintainer its just a feature set.</p>
So keep an eye out for that.</p>
</div>

13 07 2025 🔗</a>
</h2>
Everything is a stream 🔗</a>
</h3>
With ths shutdown of Pocket I started thinking about the Krappy Internet project and what kind of noise that would have made. Would anyone read the stream of content? Is streaming the right answer? Probably, not. Tech has grown to a point where it tries to consume our focus and at some level is really just documents over the interent. Some of this is the issue with being fixed to a protocol like HTTP, the rest is sunk cost. I know the concept of a search engine is rather the core interaction process for any library. But that index is on a pull model, I could see a world where that is only a push. I wonder what kind of architecture we would need to build an index for the internet in real time. I think about how this site is built. I complete some written nonsense and then push that to a repository. The result is to render that to a CDN. Thats a majority of the useful content that the internet used to provide. Alternatively, if the content of twitter was much bigger and we treated comments as a natural part of the original article each update would be more meaningful. The validation of spam would of course have to shift to the content provider which is likely going to be a failure but if something like that had a consistent identity then like email we would know what sub content to automatically exclude.</p>
At some level every idea distills back to persistent identity and that then conflicts with the need for anonyminity. There is probably a simple problem here, we don't generally index items without identity. Those naturally become live streams and maybe grouping by event and time like timeseries data like a human telemetry platform is interesting.</p>
</div>


21 06 2025 🔗</a>
</h2>
OpenTelemetry and the question of ditching logs 🔗</a>
</h3>
This morning I had this thought that maybe one of the reasons tracing and Open Telemetry are kind of after thoughts in about 99% of the enterprise projects I work on may be the developer tools gap. Consider this, as a developer many of us only experience tracing "In Production" and only through a rather expensive platform. Is there really a place where tracing is the new debugging. See also those same 99% of enterprise projects also moved to structured logging a while back and to me, the structured log is a trace done poorly. That's an opinion of course but its informed by the fact that most of the time I need distributed correlation more than I need information about the state of the request. When I think of selective logging, I find that I am often making the choice of what not to log where with tracing the only thing I am missing is the context.</p>
Anyways, the point isn't to try an convince anyone to go one way or the other, but the utilization would be greater if more of the tools were used during development. Here is where my ultimately crazy idea comes in. Jager and ZipKin are great but I don't really want to run an ELK (Elasticsearch/Logstash/Kibana) stack on my dev machine. Its a lot of extra setup and its a bit fiddly. I like to think of developer tools as just the basics of a production system. It also makes me think of how we just GDB and other debuggers. We execute them are runtime and use them to debug a specific process, often around a test. When I observe myself and other developers we tend to drop a lot of breakpoints on and around the flaw to identify the code flow that leads to the failing condition. I think of step into and step through functionality of GDB and I want a way to also get detailed trace info at the same time.</p>
Guess what, its not just a crazy idea, its kind of a dumb one. Here is what I learned from the experience. Firstly, I tried to write my own OTEL collector in golang. Not so bad, but processing and visualizing all the traces as a waterfall was a little challenging. My work in progress on Github</a>. So after I learned a whole lot about tracing and Open Telemetry I cam back to the drawing board and though how would this look if it was part of GDB already. The fun fact is that its kinda already there, not in this irrelevant auto instrumented way that I am proposing but in the nature of whats called a "tracepoint". Check it out I put together a sample you can try yourself as long as you have Go installed. Debug Tracing</a>.</p>
So the short answer, yes there should be something easier than Jaeger and ELK locally to explore OTEL, but if you wanna enhance your own development process. Time to get comfortable with some more of the debugger tools that already have valuabl tracing and frame logging built in.</p>
When you are in a tool like Goland or IntelliJ, you can have it add something more akin to logs at tracepoints so you don't have to stop on those or modify your code. Where GDB is powerful is it works on your binaries but language level tools work on the runtime code.</p>
Expect more about a lightweight OTEL tracer for exploring traces locally too.</p>
</div>

15 06 2025 🔗</a>
</h2>
WebRTC and what not to ask AI to do 🔗</a>
</h3>
So to my great surprise I figured that the LLMs would be the right place to funnel my learnings about WebRTC. A technology that has been just outside my vision since I started my career. Why shouldn't I assume that building a trivial implementation with it with LLM support would save me a lot of cognitive overhead, given the long context of such a technology. I was wrong, it seems that as I delve into the underbelly of network topologies away from the chrome of NextJS and CLI tools the bottom falls our of the LLM as well. Its been a consistent thing on my radar that LLMs are only good at the tasks that push products to market but not the work that makes the products work.</p>
Here is an enumeration of things that the LLMs tend to struggle with:</p>

Maintaining complex conditional states -- when logical nesting is needed it tends to get confused and will cycle back and forth breaking, fixing, and re-breaking sequences of operations</li>
Understanding anything about internet topology including TLDs, eTLDs, eTLD+1, and private registries -- while working on the Passkey Origin Validator</a> I was amazed that when I presented these concepts it generally couldn't maintain coherence about the meaning of those terms even though they are rather central to how domains work.</li>
Establishing well documented network handshakes -- Something of a combination of the previous two. There is often a kind of ballet that happens establishing standard and p2p network connections. Since its a set of nested conditionals and requires an understanding of how time works, it struggles.</li>
Dealing with dependency version changes -- My favorite class of failure, if the library changes the name of a package or a constant the LLM will just assume that the library is broken and remove it. What I find the most awkward is since the LLM is interacting with my computer and my project it has access to my dependencies and could search it to try and resolve the change.</li>
</ul>
On the other hand a few items I think it nails every time:</p>

CI/CD pipelines -- Every time I need to run tests on a branch or release on a tag. The LLM handles it in one go.</li>
CLI Frameworks -- Cobra nad Viper, for example an LLM sets up a fantastic set of arguments, config files and considers a lot of the edge cases for comfortable CLI use by humans.</li>
Sequence Diagrams -- When I wanna learn a new technology finding a "basic" diagram for how it works is rather annoying. Theres always lots of specs to read but all the pictures are build dependent on a use-case. For example this one it built for my exploration WebRTC</a></li>
</ul>
So in the end I got some joy from the LLM with WebRTC but I kinda had to treat it like a slow version of myself that is also blind and doesn't like to do a web search. I had it explain in a doc how it should work for itself and then asked it to make a boiler plate project with lots of debugging messages. It struggled a lot even with this guidance and I am sure I could have done the same work myself and gained a deeper understanding if I hadn't asked it to do the work.</p>
As this is part of the bigger Krappy-Internet</a> project I then used this poc to try and fix its previous failed implementation. But clearly there is a conceptual block for how the LLM deals with network debugging that it couldn't take a working version and use it to fix a broken version. I did learn something in the process but if this was an actual work activity I would have been stressed, instead of just killing time between blog posts on a rainy Sunday.</p>
</div>

14 06 2025 🔗</a>
</h2>
WebRTC, NAT Traversals, and American Manufacturing 🔗</a>
</h3>
So my new view of the architecture required to handle something like dynamic home hosting still requires a method for establishing a p2p connection. While this isn't that big of a deal it does require a consistent connection to be publicly accessible somewhere that is not behind a firewall. Which is rather annoying when trying to make this whole thing work on a phone. It is possible to run a webrtc signaling server phones tend to use "Carrier Grade NAT" CGN means there is no port-forwarding so the phone cannot respond to the signaling request to establish a NAT bypass. I think in this case its still possible but I am uncertain how the signaling server will connect the phone to the browser client when its not expecting to make a connection since it might be asleep.</p>
The next pass would be that this isn't really the best solution for the phone. But general processing would be. Since the point of the phones interaction is to allow the owner of the site to have content interactions follow them it might be appropriate to produce a secure append only log and require the sites submission features require an Always On</em> host to handle requests but this is also a good case for a serverless function. While its still on a cloud provider it could also be handled by a DHT. In that case the easy path would be a function which can accept data requests and append them to a signed log on the same site. The phone of course can then poll the log and prompt the user for activity. Since the polling trivial and we don't actually care about Real Time</em> for these interactions its fine.</p>
Probably the reason its a crazy idea in fact is everything about this rolls back a decades worth of nonsense on the internet from realtime streaming connections to dumping things to files and processing them when its convenient. Its more like reading your email, there isn't really a dopamine hit and the only content that grows is those engaged. The final content is text and permanent. The reason for a lot of real-time communications was to give a faithful response to online transactions, but I see that is one of the ways retailers have complicated buying. They want to allocate inventory but if I am selling maguffins from my garage, inventory is really just a nuance. This isn't a solution for the Amazons of the world, its focus is to create a simpler experience for both a business owner or a blogger. I see the time of complicated sites which have sales funnels is more providing the same value it once did. Deep down we wanna find a thing, buy a thing, and know its gunna show up at some point.</p>
Simpler</strong>, is probably very subjective but I can see a mechanism around this course work in this project that makes this all a daemon.</p>
In some way this has become a diatribe on why we can't build anything in America. Its because we assume that all items need to be produced at a scale to buy at a Lowes. I think consumer expectations for products is they should be complicated but I think we should start looking back to the items we find at thrift stores. The modality should start to wander towards, "I want to make a good X" not so much "I need a new solutions for X". But thats just my opinion in reality.</p>
</div>

08 06 2025 🔗</a>
</h2>
Krappy Internet Dynamic Dns and Hosting at Home 🔗</a>
</h3>
I heard recently that the future of the internet is AI. 🤣 ok ok ok... yes if I was investing a bunch of other peoples money in a technology startup that sold AI I would say a lot of crazy things too. I am not so sure the internet is a "thing" anymore that can go away. Its the substrate for communication and while the way we consume the internet may change there will always need to be a source of personal expression. For the age I come from that would have been the blog, the forum, and the comments section. I was there when Twitter started but it wasn't my thing. I am from the days of GeoCities and Anglefire, shared hosting where a hand-full of webpages was enough to give you a voice. All the backgrounds were tessellated poorly, the text was an odd color but the vibes were true. Frequenting final-fantasy fan sites and reading conspiracies about aliens.</p>
I have this wild idea that the answer for the kludge that is "return the means of production to the people!" The forever cry of the decentralized internet, most of us have multiple internet providers and we have computers just burning dead dinosaurs to watch useless noise videos with plenty of capacity to share.</p>
Regarde-moi! What if we just hosted our own content from our own machines in our own houses? What if it didn't really matter when that server was offline? See there are a lot of us and none of us have anything that's interesting to say, which is a kind of magic when you think how much we talk. It's the community not the communication that matters, we need to feel connected, which is exactly the power of the internet.</p>
So here is the project https://git.sr.ht/~ninjapanzer/krappy-dyndns</a></p>
The assumption is that if you own a domain you likely also own some free hosting, really lame html hosting but a small piece of the internet that is yours as long as you pay for it. Kinda like a house and property taxes... but lets not go down that road. So your ISP gives you a ton of bandwidth so you can watch Better Call Saul</em> on Netflix but whats it doing when you aren't binging? Just idling like car insurance... but lets not go down that road either. Point is theres a lot of spare internet for the 50 people a month that are going to look at your website. That's pretty cool to be honest when you think about the number of people you might interact with on the average Friday at your local coffee shop. So here are the problems we need so solve:</p>

give your "special content" the impression its from a fixed location for the sake of discoverability</li>
find a normal way to allow a browser or application to call back from your internet house to your house house without being bungled by your ISP</li>
make it easy to maintain some services from your laptop or phone</li>
keep those things kinda working when those devices are offline</li>
</ul>
Yes, the idea is to host your site from your phone while its in your pocket. Crazy yes, possibly maybe, am I gunna try, yes.</p>
So back to the point, you own some internet property and with the help of some krappy-dyndns we can publish a text file to the "free" hosting thats attached to your domain. This falls under the guise of what we call these days as .well-known. https://youraddress.com/.well-known/krappy-dyndns-8abe777a</code> holding a binary stream of IP address histories and encoded with the name of a service. It's just an IP address and while its your IP address its also shared by others so its vaguely you. The daemon service runs on your target device and on an interval figures out what your IP is and then if it changes pushes it to that well-known file.</p>
A user comes along and wants to leave a comment on your site. It makes a call to the comment service you run on your laptop and the client making the request knows what service it wants to interact with finding the correct .well-known and thus collecting an IP address. Next the tricky part, we have to trick your ISP to accept an incoming connection without an outbound call. Thats the whole NAT thing, probably utilizing something like https://en.wikipedia.org/wiki/Hole_punching_(networking)</a>. So your laptop will also host this service on your IP and allow for some underlying protocol like WebRTC to allow the initial transaction and boom the comment has been sent. Now, this is an internet that isn't trying to waste your time, so we take the comment and after its moderated we write it once back to our free hosting and if our laptop gets turned off for the night, who cares, people just cant leave a comment but the imporant stuff stays there. I mean they could always just send an email too.</p>
Just one step in this crazy plan complete this week and another piece of the Krappy Internet is available.</p>
</div>


Home
Fri, 06 Jun 2025 00:00:00 +0000

  
  
    Developmeh</h2>
    Develop ¯\_(ツ)_/¯</div>
    Contained within are harebrained ideas that have no commercial value... still here... you are one of the special ones.</p>
  </div>
</div>

  

  Perspective</span>
  I have done a lot of software engineering in my life and after all that time I have come to appreciate an industry in constant evolution.
I, though, seem to stand as a fixed point, arriving to accomplish a specific task and obstinately refusing to become a tradesman.</p>
</div>

  Welcome</span>
  For those of you who have a craft and participate in a creative act on the regular, I salute you. Your bravery is what I idolize. In pursuit of of some kind of self-idolatry I create toys to expand my knowledge and forgive myself for being a shill.
But who cares? Welcome to my workshop!</p>
</div>

  Standards</span>
  This is a safe space for all ideas; the point is to have fun with it; you don't wanna write tests...suuuuure....
GET THE HELL OUT! I am not some kind of heathen. I have standards, bud.</p>
</div>

  
    Devlogs</h3>

15-03-2026 Kwik-E-Mart (v0.5 — Making tools for robots)</a></li>
06-03-2026 The AI Diaries (My Own Ideas)</a></li>
01-03-2026 The AI Diaries (Limitless Abstraction)</a></li>
22-02-2026 The AI Diaries (Unboudned Growth)</a></li>
14-02-2026 Catalyst Orchestrator (The Daemon Creates Steps at Runtime)</a></li>
13-02-2026 Catalyst Orchestrator (The Daemon Parses, The Daemon Routes)</a></li>
11-02-2026 Catalyst Orchestrator (The Haiku Decides)</a></li>
08-02-2026 The AI Diaries (80/20 Rule Still Applies)</a></li>
03-02-2026 The AI Diaries (Composable Code Future)</a></li>
02-02-2026 Rust Dancing Banana (SSE vs Chunked Encoding)</a></li>
02-02-2026 Rust Dancing Banana (Rust's Async Streams)</a></li>
01-02-2026 Rust Dancing Banana (Compile-Time Frame Embedding)</a></li>
01-02-2026 Rust Dancing Banana (Nix for Rust)</a></li>
28-01-2026 The AI Diaries (Eager Intern Problem)</a></li>
27-01-2026 The AI Diaries (Throughput over Precision)</a></li>
20-01-2026 The AI Diaries (AI-generated Code Debt)</a></li>
14-07-2025 This Week's Crazy Idea (Just build binaries)</a></li>
13-07-2025 This Week's Crazy Idea (Everything is a Stream)</a></li>
21-06-2025 This Week's Crazy Idea (OpenTelemetry and the question of ditching logs)</a></li>
15-06-2025 This Week's Crazy Idea (WebRTC and what not to ask AI to do)</a></li>
14-06-2025 This Week's Crazy Idea (WebRTC, NAT Traversals, and American Manufacturing)</a></li>
08-06-2025 This Week's Crazy Idea (Decentralized DynamicDns Krappy-DynDns)</a></li>
24-02-2025 Krappy Internet (Working around the browser)</a></li>
11-02-2025 Krappy Internet (An Ideal World)</a></li>
31-01-2025 Streaming Dancing Banana (Nix Cross Platform Improvements)</a></li>
29-01-2025 The Krappy Internet (Protocol Servers)</a></li>
27-01-2025 Streaming Dancing Banana (Nix Build and Deploy to K8s)</a></li>
21-01-2025 Distributed Game of Life (Debugging stats)</a></li>
20-01-2025 Distributed Game of Life (Stats)</a></li>
19-01-2025 Distributed Game of Life (Profiling)</a></li>
15-01-2025 Distributed Game of Life (Getting Started</a></li>
25-12-2024 Krappy Kafka (k0s Deployment)</a></li>
22-12-2024 Krappy Kafka (Handler Cleanup and Func Interface)</a></li>
05-11-2024 Krappy Kafka (Shared Consumer Groups)</a></li>
</ul>
</div>
  
    Articles</h3>

Automatic Programming: Iteration 4</a></li>
BATS - Testing Bash Like You Mean It</a></li>
Keep Your Eyes on the IDE, and Your Robots on the Tickets</a></li>
Agentic Patterns: Elements of Reusable Context-Oriented Determinism</a></li>
Just Forget About Owning Code</a></li>
Rust Dancing ANSI Banana with Server-Sent Events</a></li>
A Deterministic Box for Non-Deterministic Engines</a></li>
Claude or Clod</a></li>
The Magic of Stubbing sh</a></li>
Sufficient Complexity</a></li>
Do Devs Really Do DevOps in your Org?</a></li>
The Good Sergeant</a></li>
Creative Impostor Syndrome</a></li>
The Perfect Dev Env Part 1</a></li>
Distributed Game of Life</a></li>
Kwik-E-Mart: Who Needs a Gas Town When a Gas Station Will Do</a></li>
Krappy Kafka</a></li>
</ul>
</div>
</div>
  </div>
  

  Connect</span>
Everything is on GitHub:</h3>

Correspondence</h3>
Please address all hate mail here</a></p>
</div>
  </div>
</div>


The Krappy Internet
Wed, 29 Jan 2025 00:00:00 +0000
What if the internet stopped being shit and was instead Krappy? 🔗</a>
</h2>
The Krappy Internet is an attempt to re-envision how we trust data from the internet. This is barely even a hypothesis but in the pursuit of something closer to what the internet once was without bike shedding blockchains and onion routers I am building my own internet, just for me. Others can use it if it ever does anything.</p>
Components 🔗</a>
</h3>
Krappy Utils (In Progress) -> https://git.sr.ht/~ninjapanzer/krappy
Krappy Content Linker (In Progress) -> https://git.sr.ht/~ninjapanzer/krappy_internet
Krappy Navigator (Planned)</p>
Mircocosoms 🔗</a>
</h3>
In the beginning content lived on distinct domains which declared their purpose clearly in their domain name or the commonality of the content they maintained. Much like an address to a folder on a huge distributed computer hyperlinks created the connective tissue
between the content storage and meaningful reference. Even search engines only acted to provide a searchable inventory of those same links. In the early 2000s this modus changed in the drive to reduce barriers for users to publish their thoughts to the internet.
I don't know who to blame first but lets just say my earliest memory related to something like "Global Consciousness" a rather ugly site, appropriate for th time, where you could post a few worlds and it would show up for everyone. Kind of mind boggling the scale
of something like that back then. This wasn't the first though, as email was the first "social media" through usenets and before that bulleten board systems. The biggest difference between those early examples and what we have now is the nature of the silos created.
Content is restricted to a domain and distribution is controlled by the domain owners marketing budget at best, and at worst by the nefarious moderation of maddmen. The flow of information is best modulated at the consumer and not in the ivory towers of the board rooms.</p>
When a content silo is generally healthy we will see an even discourse of thoughts and an opportunity to learn. The opposite is a self-reinforcing place where we can avoid the conflict of new ideas and further ambiguate reality. The need for critical thinking is personal
obligation of a democratic society.</p>
Nefarious Moderation 🔗</a>
</h3>
If I was to correlate discovery of knowledge with my youth some 30 years ago the challenge would be finding my way to he library and then finding the right book. The process very much aligns with the manner we extract data today but the moderation is opaque. I had
the choice of either using the "card catalog" or speaking to a "research librarian" to identify my resources. Both are somewhat expensive in terms of human expenditure but rely heavily curation and expertise. These two avenuse are aligned with search engines and wikipedia as
direct analogs a decade later. The value position of that system is a direct proportion to its speed and the agency of the curators to treat knowledge as uniform expression. This of course is the ideal and not all libraries were neutral, none could be free of inherant bias, and
thus are another form of imperfection. If we instead try to observe the form of the library and the librarian the intent is to act as a free store of knowledge, organized by consistent means and discoverable by the average human.</p>
Moderation is at its core a kind of applied bias, one that slides towards societal norms. The locality of those norms is mediated by the range of human contact; in a town that was limited to hundreds and on the internet thats limited by language and discoverability.
Because a card in the catalog at the library has a fixed dimension there is also a limited topical granulatirty it can describe about an entry. Someone also has to use interpretation to categorize and prioritize those classifiers, another layer of invisible bias.
I want to believe that those involved take the role seriously but honestly, but I also know that this cannot be true, but I do believe that the default nature of people is to do good and those that do ill are a smaller portion of the whole. I expect that libraries
have been crowd sourcing classification for as long as they have existed. At some point the number of texts exceeds the capacity of the librarian to verify and we have to rely on publishers and other libraries to do the bulk work.</p>
The same is true for content on the internet, but the value and classification has to benefit humans and not reinforce the dopamine factories. When we are rewarded for the sensational or rhetorical we assume a bias towards these topics and the value repeats instead
of grows. If we were to view "content" independent of "platform" and interacted with it as we would in a library, what would that card catalog look like? Who would fill out the cards? Who curates the summer reading list? The publisher or the librarian?</p>
Identity and Emergence 🔗</a>
</h3>
I adhere that you should put your name on things. I am American, and I with the mythology of figures like John Hancock, who's apocryphal heroism is laid out by signing his name large enough on the Decaration of Independence such that the landord could read it unassited.
Regardless of the veracity and accuracly of this take its influence what it means to "have a position" and "to express ones thoughts" where there is no place for anonymity. Its a bias allowed by my privilege, also I don't spend a lot of time in proximity to the emergence
of fact. So there is clearly a place for strong assumption of consistent identity and the emergence of information without a clear owner. The value is weighed by its validation, when giving credence to a statement it must have proof. Proof is well established through
consistency of action by a trusted identity, or by the expression of evidence. I wanna believe there is a place for investigative journalism's protected informants, for whistle-blowers, and for those fighting oppression to communicate. When the platforms are not aligned
with protecting the actors, which if you look at the long history of centralized platforms is under constant violation by state run organizations, hackers, and corporate greed I agrue no one is anonymous.</p>
A person should be able to own whatever they publish, not by license but by attribution, you can prove you said it. You can also say it anonymously, since an identity is really analogous with trust an identity doesn't need to be a "person" but it should be "consistent".
Naturally, this means an identity can be an organization or a person, and content is aligned with that instead of their domain. Domains don't own identity they only hold content and act as addessable geographies. Many libraries carry the same books and in some cases
they trade those books with each other with decentralized ownership. But what can't change is the authors, the editors, and publishers, they are fixed and they act as the identities we assign or reject the proof of over time.</p>
The value of identity is we can account for its duality, both the bad and the good are relatable and the only moderation will be self-moderation. Honestly, this is a really tricky subject, the lines were drawn long ago where accountability is a double edged sword. It
protects the mass from victimization and at the sametime subjects the part to possible ostracisation or harm. For now I like to think of identities as properties or assets. They are idempotent and addressable but not individual, an actor may have multiple identities. How
those identities assume trust and proof is based on the system that passively assigns it its trust.</p>
While identities publish, it is the published material itself that is graded and the author doesn't receive immediate feedback about its reception. There are other networks and processes to be placed that help users collect and consume those publications wholly owned by
them.</p>
Krappy Utils 🔗</a>
</h3>
A persistent connection multiplexing TCP protocol server library. Since everything is going to eventually have a binary protocol it makes sense to hoist that from Krappy Kafka and speed up how fast I can spin up a new protocol processor.</p>


Figure out how to test connection management is working as expected.</li>
</ul>
DevLog 🔗</a>
</h2>

02 02 2026 🔗</a>
</h3>
WASM is the way in 🔗</a>
</h4>
Something that has become clear is that with the introduction of WASM my desire to move on from webapps and browser experiences that expand past HTTP has become more common. I see a future where content is delivered from central sources and interactions are handled with decentralized networks. I keep thinking that the problem is its all or nothing when it comes to tool like TOR and I2P.</p>
Thinking about what the internet is quite good at its linking documents and even if some of the major search engine players are failing at delivering valuable content, the content is stable and addressable.</p>
Clearly, if the debacle with Cursor trying to build a new browser building browsers is hard and we probably need to take another stab at browser extensions. Locking down the browser was once necessary in the days of IE but now we can provide actual functionality that is quite interesting and doesn't require and evolution in JavaScript to accomplish. WASM gives me complex tools and introduces them to the internet operating system of the browser.</p>
</div>

24 02 2025 🔗</a>
</h3>
Working around the browser 🔗</a>
</h4>
So one of the challenges of making a side-channel connection to the krappy internet is through a proxy. I don't really see the need to try and forklift the world of current browsers. The plan for this is to create an extension that loads a WASM module wrapping a webrtc data channel. This way I can maintain a socket like stream to another client that is not restricted by the rules of the browser. I can then establish a TCP or QUIC connection to the content tree.</p>
The long road here is probably going to end up being the short one in reality. Browsers are quite irritating and intrusive. I think about how ToR works and how its challenging to link around to things on it. Some of that is due to the impermenance of those servers and the lack of an index. Something like this could act as a generalized bridge between those and other platforms. In the same way that gemini capsules and gopher sites will deploy an http proxy. This proxy is local to the machine so creators can pick any protocol for their site and they could be linked together. I rather like the idea of going to the wallstreet journal and having a tor link to a gemini capsule with the pages content behind the paywall.</p>
It will also be much harder to destroy content as any page that changes can be relinked to something like the internet archive. The control side of this is important, and I wonder if users should opt into other users links. So the defacto nature is we provide our own content and only we can see it, there would need to be some opt in model. I keep seeing it as if the world was one big logseq where content from various location is joined without ownership of any of the sources. Even if it isn't useful its rather cool to think about annotating the internet and building a webring around content that can have a deployed algo track updates.</p>
Dreaming dreams.</p>
For now I am planning on building a PoC from https://github.com/pion/webrtc which will then be compile to WASM and connected to a proxy server.</p>
</div>

11 02 2025 🔗</a>
</h3>
An Ideal World 🔗</a>
</h4>
I see the internet as a great library archive, while I haven't done the math, I expect the rate at which we create material is roughly at the same rate we improve storage density. At least I can account for that in my own life.</p>
So here is a random vision for the internet. I pay for connection to the network. In deference to the world I live in today, that used to mean something a little different in my youth. Something that drives me to view myself a more of a producer/consumer than just a consumer. I am sure I am not alone.</p>
We pay a provider and I get some simple addressable hardware from them, now I get a public IP address but moreover a dynamic DNS built into my hardware. My provider acts a kind of lookup service which allows me to host applications within my infrastructure and make them available to the greater internet. When I share an image, I share it from my network. My provider also acts as a cache so allow my devices and services to be offline without interruption.</p>
It's not an X or Y kind of situation, personally hosted lives alongside the giants. Services like Vercel or Hetzner still exist for hosting. But when I share text to comment on Bluesky I own that text and it is hosted on my device and cached by Bluesky. When I revoke access to my post, its not gone, but its removed from the cache in the same way we handle DNS propagation. It would be a wild and noisy place and the problem to solve is how to find the things you wanna read. The ecosystem for applications changes as well. Everything is a server, I mean it already is except you don't know what its serving and to who...</p>
An idealistic view of a future state that still requires a lot of work.</p>
</div>

06 02 2025 🔗</a>
</h3>
Getting over the Browser 🔗</a>
</h4>
So recently I came to this understanding of the nature of the Modern OS, which includes the web browser. So there are really two ways to go. Create a new browser using an open source project or build a side-channel daemon.</p>
I rather like the daemon concept because getting something integrated and deployed into a bespoke browser build is going to be an unlikely way to get someone to use something.</p>
</div>

29 01 2025 🔗</a>
</h3>
Building a TCP server Library 🔗</a>
</h4>
While this project has been in the works for a while its also an avenue for me to learn. The first task was to build a modern high performance TCP server that has a concept of an easy to manage binary protocol. For this I picked CBOR https://cbor.io/ RFC 8949 Concise Binary Object Representation. Its not the fastest and I am looking for a solution that has a zero copy buffer like flat buffers maybe.</p>
The challenge is making sure that connection management happens as we expect. Since the goal is to allow a client to reuse a connection to stream multiple requests its important that the connection be persistent and also go away as soon as we are done using it so it can be recycled for a future client. In the Krappy Kafka project there are cases where this management appears to get out of sync and blocking causes all go routines to be consumed. Where connections should have been released they were not. Now that project uses a lot of competing mutexes that are likely the cause of deadlocks. The next version of that and all future protocol servers will rely on channels.</p>
From here we move to the Content Linker, in something like a WoT (Web of Trust) model we want to allow content registration for trust. While we want to allow anonymous users to contribute whatever they want we also want content to have a machine like identity. The hope is to promote that content linking is how we establish a chain of custody for truth. User provided consensus then helps to build this trust. This means that content from public identities doesn't have to join a web of trust. Its just available and as it gains consensus the trust of that content is improved as authoritative.</p>
A good model would be wikipedia, Content can be copied and modified but its moderation is the responsibility of the whole. While this doesn't mean that mistruth is evicted, it means that it will often be short lived and even hard to find. Burrying is not something you can effectively pay for but the community can dimish the impact of garbage so much it may never be seen. There are going to need to be some algorithms to help address cheating here but this is the resonsibility of the consumer. The content model is just a weighted data store. You look at whatever you want albeit the model will promote some decisions.</p>
</div>


The Perfect Development Environment
Tue, 28 Jan 2025 00:00:00 +0000
The Perfect Development Environment 🔗</a>
</h2>
Let's be clear, this is all opinions and while this serves equally for those who focus on a single technology chain, it is optimized for those whom work on multiple projects with varied exacting dependencies and runtimes.</p>
For example just in Ruby alone I may have a a legacy ruby 1.9 project on the same machine where I have multiple ruby 3.x projects. You might not see any conflict here and you would be right. Given each ruby project has a unique ruby version we don't really have any annoyances. But the moment I have 2 ruby 3.2 projects with various verisons of imagemagick I will find myself fighting. Of course this is related to a nuance of gems that bind to static libs and are somewhat opinionated about which exact version they need while that need is being provided by a system level package manager like Homebrew.</p>
To be clear, I love homebrew, it made me who I am, but like JQuery its a product of an age that has passed.</p>
What murders me is that Nix is 20 years old. I could have been using this the whole time, if it had any of today's features back then. But I am jumping the gun, lets continue with the targets of this project.</p>

Produce a template-able structure for any project</li>
Use open source tools that are well maintained</li>
Use patterns that make project adoption easier</li>
Management of dependencies must be project specific and avoid env collisions</li>
Leave artifacts behind that inform but don't require use</li>
</ul>
The guiding principle is Leave artifacts behind that inform but don't require use</em> we can't say this is done unless we can make this true. Everything before this makes its conditionality possible. If we do this work correctly we can allow the technological landscape to evolve and these techniques can be replaced with new superior solutions as they become vogue.</p>
Produce a template-able structure for any project 🔗</a>
</h3>
I have worked with a number of navel gazing developers who like to build walls around their languages and techniques marking them simultaneously superior and exclusive to their corners of the world. I would alike this to what has happened with protobuf and protoc in python.</p>
A plea for protocol politeness 🔗</a>
</h4>
You can skip this section if you don't care about my personal experiences with protoc and python, while this is not a problem limited to protoc or python its a story of the smell produced by monoculture; something that no longer has a place in modern development.</p>
In proper diatribe format our story is about the value of protocols and our common inability to avoid abstraction in the face of having to learn something new.</p>
For those who have not used Protobuf</a> and its cli utility protoc (pronounced pro-toc), it has a rather simple protocol for adding extensions to its command line. Mind you python was not officially supported until sometime mid-2024 and all generators were community provided. Here is the catch like I attempted here my grpc generator in rust</a> required some funny incantations to get things working. At the time python developers wrapped protoc in a bespoke python library at the time those incantations would looks something like this python -m grpc_tools.protoc -I. --protobuf-to-pydantic_out=. example.proto</code> and didn't publicly expose the actual plugin for protoc which expects something more like protoc -I. --protobuf-to-pydantic_out=. example.proto</code>.</p>
The protocol I speak of is the product of a clever CLI, --protobuf-to-pydantic_out</strong> expects that in the current path is something executable (including a shell script) that goes by the name protoc-gen-protobuf-to-pydantic</strong> so whatever is before out</strong> must then be able to exist prefixed by protoc-gen</strong>. While somewhat poorly documented this protocol for extension makes it super easy to bolt on 1 or 10 plugins to build out a whole organizations worth of runtime specific artifacts.</p>
Like I mentioned before '24 we had to do it the hard way and because the python communities view of DevEx and ergonomics is annoying binaries like protoc should be wrapped under the glaze of a python module.</p>
So the lesson here is probably two fold. Firstly, I wasn't the only one who probably thought this whole python thing was dumb and by bringing python into the fold it has a spec plugin and now I don't have to worry about protocol violations to generate artifacts from protobuf IDL files. Secondly, its expected that when you produce abstractions for public consumption you are obligated to do so in observation of the authors protocol when wrapping their work.</p>
Simplicity doesn't mean brevity, thus I am advocating for Clarity</strong> over Ease</strong>. It should be easy to understand or do, so in terms of python authors overstepped here. We are going to try and do the same thing with our project layouts each piece will respect the patterns of its community regardless of expectations of the projects norms.</p>
</p>
Yep, I am thinking it too so just hang on.</p>
Back to the main event 🔗</a>
</h3>
What we want to inform on is how to consume a fresh project which usually has a few externals we should concern ourselves with. First of those are the runtime dependencies and here we have a ton of options. Just in my short life I have used all of the following:</p>

rvm</a></li>
rbenv</a></li>
nvm</a></li>
maven wrapper</a></li>
gradle wrapper</a></li>
fnm</a></li>
sdkman</strong></a></li>
jenv</a></li>
homebrew</strong></a></li>
apt/dpkg</strong></a></li>
yum</strong></a></li>
pacman</strong></a></li>
nix</strong></a></li>
asdf</strong></a></li>
mise-en-place (mise) / rtx</strong></a></li>
ansible</strong></a></li>
pyenv</a></li>
virtual burrito</a></li>
phpenv</a></li>
pvm</a></li>
</ul>
While I am sure I forgot some you will notice two groups I have highlighted the ones that belong together. Whats different about these bold tools is they try to be the new standard of how to collect any runtime with some broad variances. I think this is a good place to start, but we should remember our goals and immediately eliminate those which don't project our project env from collisions with other projects. That means we say goodbye to all the package managers aside from nix</strong> and ansible</strong> albeit ansible has a special use case and is probably muddying the waters.</p>
Of the remaining list we have sdkman</strong>, asdf</strong>, mise</strong>, and nix</strong>. Thats a pretty tight list lets go over how these work. The first three all do the same thing and will isolate each runtime in your home directory and shim you environment and each allows for a global system version and a config file driven variant per project folder albeit the format for sdkman is unique and asdf/mise are interchangeable in some cases. That leaves nix</strong> which is our ugly duckling, as its syntax is rather obtuse so you need to have the right reason to use it for your project. That reason is probably more related to your build system then it is one of getting a runtime local to a project.</p>
To be honest I don't generally advise using nix</strong> to prepare your development environment runtime since it wants to own the whole environment using it to install say ruby means you also need to teach it how to install your gems. Which isn't horrible but might be overreaching, an example with nix</a> while that also builds a docker image with the same context, that other reason you might wanna use nix</strong> I mentioned, its pretty heavy compared to the competitors. Those primarily expose their configuration through something we expect a file that lists a runtime and a version. The tool then helps you install those versions on your computer and the configuration file acts as human readable documentation about the project in case your developer doesn't wanna use it.</p>
Enough talk lets see something 🔗</a>
</h4>
Remember our opinion is that a git repo is just a folder and folders can live in folders, I don't wanna tell you to always put multiple projects in a repo or one project per repo so we only describe the project as a folder and where that folder lives is up to you.</p>
/
</span>├── .ci/
</span>│   └── scripts/
</span>├── .git/
</span>├── .gitignore
</span>├── .tool-versions
</span>├── .deploy/
</span>├── └── scripts/
</span>├── .build/
</span>├── └── scripts/
</span>├── GETTING_STARTED.md
</span>├── Makefile
</span>├── README.md
</span>├── src/
</span>└── ...
</span></code></pre>
I have seen variants of this generally where are scripts share one folder but generally I look at this from the approach of interfaces that are tool agnostic. That interface is exposed though make. Regardless of the actual build steps or build system, like bazel or nix. I should be able to say, make build</strong> or make deploy</strong> and I will get some feedback on how that is going.</p>
The same is true with CI, which will probably be augmented by the dot file for your executor configuration be that CircleCI, Gitlab, Github, Sourcehut, or something else we will always need a place to hide some scripts and then bind them to make so our CI can also make the same calls that we might call like make test</strong>. The specific language for the cross project targets is outside the scope of this document but the three I stated should be a default with strong consideration for make init</strong> or something to setup a first time run.</p>
Thats not the only reason we want to put our scripts in their own project folders though. We want to be able to test them. I have become a huge fan of BATS</a> as exampled here</a> each scripts folder can be extended with a tests folder for its given sub project like this without concern of polluting the scope of the actual codebase of the project.</p>
/
</span>├── .ci/
</span>│   └── scripts/
</span>│   └── tests/
</span></code></pre>
IF YOU EVEN ONCE SAY WE DON'T NEED TO TEST OUR BASH GET THE HELL OUT</strong> Test everything is TEST EVERYTHING!</p>
So now if we have everything write we have a repo with a file to define its runtime dependencies like ruby or java that is explicit. If our project needs both, all the better.</p>
Our makefile provides a common interface to declare activities and it mostly calls scripts in our various targets like build or deploy.</p>
Lets pick it apart 🔗</a>
</h3>
But I am using gradle 🔗</a>
</h4>
Sweet, gradle is cool and while you may consider that you can just run gradle install</strong> instead of make build</strong> we often have to bake extra commands and options to the actual build tool.</p>
.PHONY</span>: </span>build
</span>
</span>build</span>:
</span>	</span>@</span>echo </span>"Building with Gradle"
</span>	</span>@</span>gradle install
</span></code></pre>
Its not that hard and no one says you have to use it, but I bet after the 8th project you open that offers make build</strong> and you don't even know if it is using bundler gradle or maven and you stop caring we have won.</p>
But my golang project has a deploy module 🔗</a>
</h4>
Yea thats cool, thats why our project folders are all prefixed witha dot, just like git they are almost ephemeral if I deleted them all I wouldn't get a better project but the project would still be the project.</p>
I don't need to deploy right now and my builds are uncomplicated 🔗</a>
</h4>
Great, the point is to define a template, if you don't need .deploy</strong> don't use it. Same is true for .build</strong> but when the decision comes, where should I put my scripts, this is the hint. If you need some other kind of special target for your project consider creating a special dot folder for it and give it some meaning while exposing it to make.</p>
Reasoning 🔗</a>
</h3>
At each level we are creating a little border around the tools and patterns we use, creating a project protocol if you will. So projects can be more interchangeable and better to keep them simple. We are intentionally saying "don't think about it just follow this pattern." While this might feel like overstepping into someone elses agency it should feel more like a relief because its a decision you don't have to make and ultimately you are not bound to. We should invite a repeatable protocol because being clever is like getting a puppy, its a lot of responsibility.</p>
We started this discussion deep in the type of tooling but in reality this is about knowledge artifacts. We are trying to answer the following questions with our protocol:</p>

what version of x do I need to install</li>
how do I boot this up</li>
how do I deploy this</li>
how do I build this</li>
how do CI/Gitops/Automation happen</li>
</ul>
I have done all that without having to ensure someone already knows and better yet if they have seen a project they already know.</p>
A template exists here for you consumption Template</a></p>
That was the easy part 🔗</a>
</h2>
Now we need to address the reality of bigger projects, the stuff it needs from the OS to build complex projects.
(Coming soon)</p>


TAPS - Not just a reporting protocol
Tue, 28 Jan 2025 00:00:00 +0000
Test Anything Protocol 🔗</a>
</h2>
So I rather love writing tests. Mostly because I don't understand my code and the code of the libraries I am implementing. But I sure as hell can understand the results. Maybe if there was a reason to write tests that would be it. I just kinda know I am dumb and its easy to write bugs so why not be a little sure. Recently, I was working in an unfamiliar codebase with a completely familiar command language ba_sh_. I wanted to be sure as I iterated thought a series of changes, ones that inevitably can't run on my machine and only in CI. When you take into consideration DEVELOPER EULA</a> regarding bespoke OS specific bash commands this starts to make sense why you might want to just double check your code works.</p>
Similarly in Ruby and other typeless languages the developer takes on the role of the compile time checker as well as feature writer. If that makes you wonder how they get anything done and write tests, the answer is as long as no one ever leaves the project things are going to be fine. So while I don't know why people still argue about if they should be writing tests and doing test driven development, all I can say is, lots of normal things are confusing. You know what I am talking about, climate change deniers, flat earthers, anti-vaxers, the over-woke(Sleepless in Seattle...).</p>
Here is the point, when I got around to the part of the work where I was like, do I really wanna test this in production? Cowboy hat in hand, I thought, Never drive black cattle in the dark</em>. So I took my good old time and asked the stars for guidance and what did I find? BATS</a> which led me to a curious mistake. TAP</a>, Test Anything Protocol, and I find out it doesn't test anything, in reality its a test reporting format and manner of consuming the results of tests, a protocol if you will. So that's all the history, but its what it inspired in me that brought me joy.</p>
I don't know if you are familiar with ePBF</a> which is related to why Crowdstrike broke the internet</a> that one day in '24. So here is what I wanted TAP to be, ePBF is a tech that lets you run ane extend software running with privilege, you know like kernel extensions that control your Windows System Security at the Airport. Oh yea they don have it because of some interesting non-competition reasons... (cough) Greed. Ok sorry, Test Anything, means to me we have a single interface and mechanism for mocking and asserting our running code. Imagine we don't have to have a bespoke test framework with gads of hard to understand YAML files in your go project. Instead we just have symbols at runtime that can always test a live running application. Its outlandish sure, but a guy can dream right. I mean its cool, so back to BATS which is pretty cool.</p>
Let's look at a quick example script 🔗</a>
</h3>
helm.sh</strong></p>
#!/usr/bin/env bash
</span>
</span>DEFAULT_TIMEOUT</span>=</span>30m
</span>TIMEOUT</span>=</span>"${1</span>:-</span>$DEFAULT_TIMEOUT}"
</span>
</span>helm3
</span>--wait
</span>--timeout ${TIMEOUT}
</span></code></pre>
Pretty easy, we snag the first arg or provide the default. I have probably done this a dozen ways over the years but often skipped setting up any kind of testing. Really, this has just been good luck and the fact that these kinds of scripts are often small and rarely touched.</p>
Setup BATS 🔗</a>
</h3>
install-bats.sh</strong></p>
#!/bin/bash -e
</span>
</span>if </span>[ </span>-d </span>"./test/bats" </span>]</span>; then
</span>  </span>echo </span>"Deleting folder $FOLDER"
</span>  rm -rf </span>"./test/bats/"
</span>  mkdir -p ./test/bats
</span>else
</span>  mkdir -p ./test/bats
</span>fi
</span>
</span>git clone --depth 1 https://github.com/bats-core/bats-core ./test/bats/bats
</span>rm -rf ./test/bats/bats/.git
</span>git clone --depth 1 https://github.com/ztombol/bats-support ./test/bats/bats-support
</span>rm -rf ./test/bats/bats-support/.git
</span>git clone --depth 1 https://github.com/ztombol/bats-assert ./test/bats/bats-assert
</span>rm -rf ./test/bats/bats-assert/.git
</span>git clone --depth 1 https://github.com/jasonkarns/bats-mock.git ./test/bats/bats-mock
</span>rm -rf ./test/bats/bats-mock/.git
</span></code></pre>
Here we dump the bats under a central test</em> directory and we include all the libs:</p>

bats-support</a> - required for other libraries</li>
bats-assert</a> - adds deep support for asserts</li>
bats-mock</a> - allows for stubbing</li>
</ul>
The Test 🔗</a>
</h3>
helm.sh.bats</strong></p>
#!/usr/bin/env bats
</span>
</span>bats_require_minimum_version 1.5.0
</span>
</span># Load Bats libraries
</span>load ../test/bats/bats-support/load
</span>load ../test/bats/bats-assert/load
</span>
</span>function </span>helm3</span>() {
</span>  </span># Captures and echos all the arguments each time helm3 is invoked
</span>  </span>echo </span>"$@"
</span>  </span>echo </span>"helm3 executed"
</span>}
</span>
</span>setup</span>() {
</span>  </span># export -f allows the function to be exported into the current shell env
</span>  </span># What's cool about this is the shell looks for functions before commands
</span>  </span># So if we have helm3 installed or not during the test this will be resolved first
</span>  </span>export </span>-f helm3
</span>}
</span>
</span>teardown</span>() {
</span>  </span># unset is quite important if this shell is to be reused
</span>  </span>unset </span>-f helm3
</span>}
</span>
</span># Test cases
</span>@test </span>'when timeout is provided it will be set' </span>{
</span>  </span># The first step is to run our script so bats can capture its output and setup the env for
</span>  </span># our assertions
</span>  run sh ./helm.sh 18m
</span>
</span>  </span># allows us to assert a line and verify if any line in the output contains (--partial)
</span>  </span># our expected string
</span>  assert_line --partial </span>"--timeout 18m"
</span>
</span>  </span># a catchall to verify we called our stub and as we expect
</span>  assert_line </span>"helm3 executed"
</span>
</span>  </span># asserts that the command exited with a 0 exit code
</span>  assert_success
</span>}
</span>
</span>@test </span>'when timeout is not provided it will be the default' </span>{
</span>  run sh ./kube/install.sh
</span>
</span>  assert_line --partial </span>"--timeout 30m"
</span>
</span>  assert_line </span>"helm3 executed"
</span>  assert_success
</span>}
</span></code></pre>
So thats it, you can test a bash script and mock the commands that we want to verify.</p>
Of course we can also introduce a spy in the case we don't want to mock helm3</em></p>
function </span>helm3</span>() {
</span>  </span># Captures and echos all the arguments each time helm3 is invoked
</span>  </span>echo </span>"$@"
</span>  </span># Forces a PATH search and forwards arguments
</span>  </span>command</span> helm3 $@
</span>  </span>echo </span>"helm3 executed"
</span>}
</span>
</span>export </span>-f helm3
</span></code></pre>
Will allow the following execution:</p>
$ helm3 "HI"</code></p>

Will call the helm3 function</li>
Echo the args</li>
Call the helm3 command from the PATH</li>
Echo our status message</li>
</ol>
In some cases you don't want your test to execute destructive operations but inspect its assumptions. Other times you need to know something happened but don't want to interfere with it. Because run captures all outputs we formulate our assertions around verifying those lines
produced in the output that are meaningful.</p>
Here we have only explored interacting with arguments but its possible to assert anything that bash can test. If a file was updated, if a file was created, ultimately if a binary or built-in command holds our context for a valid assertion we can verify it.</p>
Its not quite Test Anything</em> but its damn close.</p>


Go Generics Example
Wed, 22 Jan 2025 00:00:00 +0000
Go Generics an Example 🔗</a>
</h2>
So in my recent Go Game of life (GOGol) projects I have had a personal goal to define some repeatable interfaces. A renderer</strong> and game world</strong> that lets me plug in new implementations of a life engine and not have to change too much else. For the renderer this is pretty straight forward. We have some common rendering primitives and we can expose them a methods.</p>
type </span>Renderer </span>interface </span>{
</span>  </span>Beep</span>()
</span>  </span>Draw</span>(</span>string</span>)
</span>  </span>DrawAt</span>(</span>int</span>, </span>int</span>, </span>string</span>)
</span>  </span>Dimensions</span>() (y </span>int</span>, x </span>int</span>)
</span>  </span>Start</span>()
</span>  </span>End</span>()
</span>  </span>Refresh</span>()
</span>  </span>BufferUpdate</span>()
</span>  </span>Clear</span>()
</span>}
</span></code></pre>
Now I'll admit some of the underlying goncurses leaked into this interface but this is a work in progress and has yet to be refined. Of course my two implementations either ignore functionality or make everything a log message. I have Mock renderer which captures events as statistics and a shell renderer which displays my game board to the world. Because I am able to express most of this interaction with primitives and commands I give it a hearty thumbs up.</p>
Now on the other hand we have a world, and the world has to describe the life within. That life is specific to the world it lives in. Life as a generalization looks a little something like this:</p>
type </span>Life </span>interface </span>{
</span>  </span>State</span>() </span>bool
</span>  </span>SetState</span>(</span>bool</span>)
</span>}
</span></code></pre>
In reality any consumer of the world really only needs to be able to see a individual state or possibly mutate that state. The world could be expressed like this:</p>
type </span>World </span>interface </span>{
</span>  </span>Cells</span>() [][]</span>Life
</span>  </span>ComputeState</span>()
</span>  </span>Bootstrap</span>()
</span>}
</span></code></pre>
And that works just fine if we only ever need to know about cells as a RW-able entity accessible through our world. The basic game of life would call ComputeState()</strong> on the world and then iterate through the Cells()</strong> two dimensional array on each render tick. A little something like this:</p>
Display</em> is the terminal screen being written to</em>_</p>
Methods from goncurses</p>

MovePrint</strong></li>
Refresh</strong></li>
</ul>
for </span>y, row </span>:= </span>range </span>w.Cells() {
</span>  </span>for </span>x, cell </span>:= </span>range </span>row {
</span>    </span>if </span>cell.State() {
</span>      w.display.Display.MovePrint(y, x, </span>"0"</span>)
</span>    } </span>else </span>{
</span>      w.display.Display.MovePrint(y, x, </span>"-"</span>)
</span>    }
</span>  }
</span>}
</span>w.display.Display.Refresh()
</span></code></pre>
Because everything is synced to the main render tick we don't need to include any specialized behaviour to our cells. This is he mechanism used in tradgol</strong> from https://github.com/ninjapanzer/gogol/blob/01b637beca8b1123aad77390286681883edab265/cmd/tradgol/main.go</p>
You might notice in that project I also attempted parallelgol, it was a failure because I struggled to produce generic types for world and game such that I could have radically different implementations of those entities. Time heals all wounds and for me it was understanding how a generic in Go might differ from a Generic in Java another typed language I was familiar with.</p>
Here is how I thought it should go 🔗</a>
</h4>




My Idea
</th>

Reality
</th>
</tr>
</thead>



package </span>main
</span>
</span>type </span>Life </span>interface </span>{
</span>  </span>State</span>() </span>bool
</span>  </span>SetState</span>(</span>bool</span>)
</span>}
</span>
</span>type </span>ChannelCell </span>struct </span>{
</span>  </span>Life
</span>  state </span>bool
</span>}
</span>
</span>func </span>(c </span>*</span>ChannelCell</span>) </span>State</span>() </span>bool </span>{ </span>return </span>false </span>}
</span>
</span>func </span>(c </span>*</span>ChannelCell</span>) </span>SetState</span>(state </span>bool</span>) {}
</span>
</span>type </span>World[T Life] </span>interface </span>{
</span>  </span>Cells</span>() [][]</span>T
</span>  </span>ComputeState</span>()
</span>  </span>Bootstrap</span>()
</span>}
</span>
</span>type </span>ChannelWorld[T Life] </span>struct </span>{
</span>  cells    [][]</span>*</span>T
</span>  initProb </span>float64
</span>}
</span>
</span>func </span>(w </span>*</span>ChannelWorld</span>[T]) </span>ComputeState</span>() {}
</span>
</span>func </span>main</span>() {
</span>  world </span>:= &</span>ChannelWorld[ChannelCell]{}
</span>  </span>...
</span>}
</span></code></pre>
ChannelCell does not satisfy Life (method SetState has pointer receiver)</strong> and I was stuck, CellChannel implements the Life interface and thus should be substitutable for the Life</em> generic in ChannelWorld</em>. I am wrong!</p>
</td>

package </span>main
</span>
</span>type </span>Life </span>interface </span>{
</span>  </span>State</span>() </span>bool
</span>  </span>SetState</span>(</span>bool</span>)
</span>}
</span>
</span>type </span>ChannelCell </span>struct </span>{
</span>  </span>Life
</span>  state </span>bool
</span>}
</span>
</span>func </span>(c </span>*</span>ChannelCell</span>) </span>State</span>() </span>bool </span>{ </span>return </span>false </span>}
</span>
</span>func </span>(c </span>*</span>ChannelCell</span>) </span>SetState</span>(state </span>bool</span>) {}
</span>
</span>type </span>World[T Life] </span>interface </span>{
</span>  </span>Cells</span>() [][]</span>T
</span>  </span>ComputeState</span>()
</span>  </span>Bootstrap</span>()
</span>}
</span>
</span>type </span>ChannelWorld[T ChannelCell] </span>struct </span>{
</span>  cells    [][]</span>*</span>ChannelCell
</span>  initProb </span>float64
</span>}
</span>
</span>func </span>(w </span>*</span>ChannelWorld</span>[T]) </span>ComputeState</span>() {}
</span>
</span>func </span>main</span>() {
</span>  world </span>:= &</span>ChannelWorld[ChannelCell]{}
</span>  </span>...
</span>}
</span>
</span></code></pre>
The nuance is small but super important in its simplicity. Because ChannelWorld is going to implement World which is generic it must provide a type constraint for T. That type constraint in itself needs to be the concretion this instance of the specific struct we will use. Here is the tricky part, the type constraints when binding a generic interface to a generic implementation is a double edged sword. I presented the error about how the implementation didn't satisfy the constraint interface Life</strong>.</p>
</td>
</tr>
</tbody>
</table>
Another example 🔗</a>
</h4>
Lets get a little wild</p>




Works and implements the interface
</th>

Works but doesn't implement the interface
</th>
</tr>
</thead>



package </span>main
</span>
</span>type </span>Life </span>interface </span>{
</span>  </span>State</span>() </span>bool
</span>  </span>SetState</span>(</span>bool</span>)
</span>}
</span>
</span>type </span>ChannelCell </span>struct </span>{
</span>  </span>Life
</span>  state </span>bool
</span>}
</span>
</span>func </span>(c </span>*</span>ChannelCell</span>) </span>State</span>() </span>bool </span>{ </span>return </span>false </span>}
</span>
</span>func </span>(c </span>*</span>ChannelCell</span>) </span>SetState</span>(state </span>bool</span>) {}
</span>
</span>type </span>World[T string] </span>interface </span>{
</span>  </span>Cells</span>() [][]</span>T
</span>  </span>ComputeState</span>()
</span>  </span>Bootstrap</span>()
</span>}
</span>
</span>type </span>ChannelWorld[T string] </span>struct </span>{
</span>  cells    [][]</span>*</span>string
</span>  initProb </span>float64
</span>}
</span>
</span>func </span>(w </span>*</span>ChannelWorld</span>[T]) </span>Cells</span>() [][]</span>string </span>{
</span>  </span>return </span>nil
</span>}
</span>
</span>func </span>main</span>() {
</span>  world </span>:= &</span>ChannelWorld[ChannelCell]{}
</span>  </span>...
</span>}
</span></code></pre>
Focus on type World[T string] interface</strong></p>
Here the secret is the World[T string]</strong> interface is constraint accepts the ChannelWorld[T string]</strong> constraint and thus we met all the conditions. This is the tricky part that kept me guessing because I expected the other side to be an error which it wasn't.</p>
</td>

package </span>main
</span>
</span>type </span>Life </span>interface </span>{
</span>  </span>State</span>() </span>bool
</span>  </span>SetState</span>(</span>bool</span>)
</span>}
</span>
</span>type </span>ChannelCell </span>struct </span>{
</span>  </span>Life
</span>  state </span>bool
</span>}
</span>
</span>func </span>(c </span>*</span>ChannelCell</span>) </span>State</span>() </span>bool </span>{ </span>return </span>false </span>}
</span>
</span>func </span>(c </span>*</span>ChannelCell</span>) </span>SetState</span>(state </span>bool</span>) {}
</span>
</span>type </span>World[T Life] </span>interface </span>{
</span>  </span>Cells</span>() [][]</span>T
</span>  </span>ComputeState</span>()
</span>  </span>Bootstrap</span>()
</span>}
</span>
</span>type </span>ChannelWorld[T string] </span>struct </span>{
</span>  cells    [][]</span>*</span>string
</span>  initProb </span>float64
</span>}
</span>
</span>func </span>(w </span>*</span>ChannelWorld</span>[T]) </span>Cells</span>() [][]</span>string </span>{
</span>  </span>return </span>nil
</span>}
</span>
</span>func </span>main</span>() {
</span>  world </span>:= &</span>ChannelWorld[ChannelCell]{}
</span>  </span>...
</span>}
</span>
</span></code></pre>
Once again we are now back to this __type World[T Life] interface __</p>
Because interface implementation is passive in Go I expected this mismatch to be an exception or a compiler error but instead what I have is an unused generic interface and a generic struct. Of course downstream I needed something that implemented World and the rest of my code broke.</p>
</td>
</tr>
</tbody>
</table>
Anyways, this was a big learning for me, I hope it helps.</p>


Distributed Game of Life
Tue, 21 Jan 2025 00:00:00 +0000
Go Channel Based PoC 🔗</a>
</h2>
GOGol Channels</a></p>
Some work reused from GOGol</a></p>
GoL 🔗</a>
</h2>
I have always found simulations exciting. While the Game of Life is a shallow simulation it is fun how fast you can stand it up. In the old days I would always standup a language and create a Rock Paper Scissors game to prove some minor competency. Now its GoL, I like having to build an animation or a statistics engine. What has gotten to me these days is the scale of GoL and then injecting new rules.</p>
Distributed 🔗</a>
</h3>
So one of the things about distribution I am excited about is the noise from eventual consistency. In a traditional GoL we have what I refer to as the World</strong> nothing more than a matrix of state that is roughly binary, alive or dead.</p>
The world is pre-populated with a seed, some intentional or random spattering of alive to get the whole thing started.</p>
Skipping the rules now we extend each cell in our matrix from a binary to a stateful object. Maybe they have names now like "bert" and "harry". They can have progeny and a history.</p>
Time series and genealogy 🔗</a>
</h3>
My first thought was I could track this history using a time-series db, and even attempted to build on in ERLang. But then I realized I could probably make that a little more interesting if I did it with something distributed like NATS or Kafka.</p>
Phase 1 🔗</a>
</h2>
So the first phase here is to introduce only a distributed World</strong> that can be queried from a compacted topic. Creating some form of client SDK to observe the world graph</p>
Phase 2 🔗</a>
</h2>
Unbound the graph, focusing only on neighborhoods and seeing if I can just query within a contiguous window of the simulation so I could have different views of the same simulation running at the same time.</p>
Phase 3 🔗</a>
</h2>
Heredity, try and see if I can trace the lineage of a cell through this process of events and a graph datastore to add new rules to the game as a cells heredity expands.</p>
DevLog 🔗</a>
</h2>

21 01 2025 🔗</a>
</h3>
Debugging stats 🔗</a>
</h4>
In the last build there was a predilection for the program to pin the host systems CPU. Back on the 19th I spent some time exploring profiling but this didn't produce much fruit. The path to producing my own stats was the right choice. The first symptom was related to rendering the stats intermittently from consuming the stats channel.</p>
go </span>func</span>() {
</span>  </span>for </span>{
</span>    </span>for </span>{
</span>      time.Sleep(</span>250 </span>* </span>time.Millisecond)
</span>      s.Update # draw the stats to the window
</span>
</span>      breakDrain:
</span>        </span>select </span>{
</span>        </span>case </span>e </span>:= <-</span>s.eventChan:
</span>          </span>if </span>e.name </span>== </span>Heartbeat {
</span>            hps </span>+= </span>e.count
</span>            s.heartbeats </span>+= </span>int64</span>(e.count)
</span>          } </span>else if </span>e.name </span>== </span>Broadcast {
</span>            bps </span>+= </span>e.count
</span>            s.broadcasts </span>+= </span>int64</span>(e.count)
</span>          } </span>else if </span>e.name </span>== </span>Died {
</span>            dps </span>+= </span>e.count
</span>            s.died </span>+= </span>int64</span>(e.count)
</span>          } </span>else if </span>e.name </span>== </span>Resurrected {
</span>            dps </span>-= </span>e.count
</span>            s.died </span>-= </span>int64</span>(e.count)
</span>          }
</span>        </span>default</span>:
</span>          </span>break </span>breakDrain
</span>        }
</span>    }
</span>  }
</span>}()
</span></code></pre>
The idea here is that in a goroutine this is backgrounded. When there is nothing to consume it will sleep -> render -> drain channels. What I didn't anticipate is that the speed of heartbeats overwhelmed the consumption. The stats channel can hold 10000 messages but so many were being produced that we never broke the consume loop.</p>
I solved this by introducing a ticker like this:</p>
go </span>func</span>() {
</span>  ticker </span>:= </span>time.NewTicker(time.Second)
</span>  </span>defer </span>ticker.Stop()
</span>
</span>  </span>for </span>{
</span>    </span>select </span>{
</span>    </span>case <-</span>ticker.C:
</span>      s.Update # draw the stats to the window
</span>    </span>case </span>e </span>:= <-</span>s.eventChan:
</span>      </span>if </span>e.name </span>== </span>Heartbeat {
</span>        hps </span>+= </span>e.count
</span>        s.heartbeats </span>+= </span>int64</span>(e.count)
</span>      } </span>else if </span>e.name </span>== </span>Broadcast {
</span>        bps </span>+= </span>e.count
</span>        s.broadcasts </span>+= </span>int64</span>(e.count)
</span>      } </span>else if </span>e.name </span>== </span>Died {
</span>        dps </span>+= </span>e.count
</span>        s.died </span>+= </span>int64</span>(e.count)
</span>      } </span>else if </span>e.name </span>== </span>Resurrected {
</span>        dps </span>-= </span>e.count
</span>        s.died </span>-= </span>int64</span>(e.count)
</span>      }
</span>    </span>default</span>:
</span>    }
</span>  }
</span>}()
</span></code></pre>
Which forced a render to happen and boy howdy I was producing millions of events a second and thats what was pinning the CPU. Adjusting production of those heartbeats. I eventually moved rendering of the stats window to a separate goroutine with an explicit sleep and used this ticker to compute just the period event counts. Which introduced a new condition with the renderer. The stats details would be printed randomly around the window sometimes. I can imagine that under the hood, ncurses, is moving the cursor to all kinds of random locations on the screen and these instructions go on a stack. Sometimes the location its printing doesn't update fast enough and we get a shadow. Remember we are not doing a full screen refresh, only updating the exact location where a cells state has changed. Once we put a shadow somewhere if there is no activity it remains, and thats ugly.</p>
I was really making goncurses do too much work. The solution to erratic rendering is to put the stats in their own window. I think of it like creating a absolutely positioned div with a z axis as front as possible. In goncurses parlance the z is based on the order its created and thats a new way to introduce a bug but providing a fixed location to write my stats to keeps it from being accidentally rendered somewhere else. I imagine that ncurses creates a new stack for each window and combines their buffered stats on update. But who knows, this is the result:</p>
</p>
There is still a condition where we might write cells over the stats but its good enough.</p>
</div>

20 01 2025 🔗</a>
</h3>
Stats 🔗</a>
</h4>
Added channel based stats that help display how many broadcasts and the aggregate of life in the game.</p>
TODO</strong> Provide some change / second stats related to rendering.</p>
The next phase is to build a more rational renderer in SDL. Ncurses is fine when doing simple displays or text overlays. But high fidelity renders require some ASICS and even using garbage built in graphics hardware acceleration will provide a better experience, and it will be a little fun.</p>
</div>

19 01 2025 🔗</a>
</h3>
Profiling 🔗</a>
</h4>
What I have learned is that for the best effect in rendering we need to set some timing standards. The listen timing for a cell should be roughly double the heartbeat rate. Although this introduces an interesting issue where we might start blocking on our buffered channel. I don't know if the buffers should be bigger and if we should process all events. I suspect draining to the end of every channel and collecting only the final state is the correct item and listening more often.</p>
I found that profiling in golang is a little weak, as a beginner. I am more familiar with executing any binary observed by a profiler. In Goland at least it prefers to only execute profiling during tests. Which promotes testing and atomic profiling. But I haven't gotten to test design for this PoC yet and goncurses introduces its own challenge when running any code that requires a terminal. Again this is mostly a goland issue.</p>
The solution I found was to execute my test with profiling and then move the command that goland generated to a new terminal that goncurses would support and open the pprof it generated in Goland. I think the real fix here is to :TODO: check if the terminal is supporte by goland and skip window generation if its not. While this impacts performance around goncurses rendering it will at least allow me to focus on where my own code is non-performant.</p>
</div>

15 01 2025 🔗</a>
</h3>
Getting Started 🔗</a>
</h4>
So while this project was going to wait for Krappy Kafka</a> I got to a point where I needed to see a simulacra of the larget project. In this PoC I also added the following requirements:</p>

Must used concurrency</li>
Must not use mutexes (So its going to be channels)</li>
</ul>
The larger scale of the work stays the same. We will establish a neighborhood of automata and then allow them to communicate with each other. In this specific scenario they will broadcast themselves to their neighbors.</p>
Each cell owns its own channel and when the neighborhood initializes cells look for their neighbors and collect their channels. So with exception of the edges each cell holds a collection of 8 other cells. When all the cell is initialized it also gets bit vector of each of its neighbors in the order the neighbors are registered.</p>
Here is how neighbors are registered:</p>
for </span>i </span>:= -</span>1</span>; i </span><= </span>1</span>; i</span>++ </span>{
</span>  </span>for </span>j </span>:= -</span>1</span>; j </span><= </span>1</span>; j</span>++ </span>{
</span>    </span>if </span>i </span>== </span>y </span>&& </span>j </span>== </span>x {
</span>      </span>continue
</span>    }
</span>    </span>if </span>y</span>+</span>i </span>< </span>0 </span>{
</span>      </span>continue
</span>    }
</span>    </span>if </span>y</span>+</span>i </span>>= </span>height {
</span>      </span>continue
</span>    }
</span>    </span>if </span>x</span>+</span>j </span>< </span>0 </span>{
</span>      </span>continue
</span>    }
</span>    </span>if </span>x</span>+</span>j </span>>= </span>width {
</span>      </span>continue
</span>    }
</span>
</span>    cell.AddChannel(cells[y</span>+</span>i][x</span>+</span>j].BroadcastChan())
</span>  }
</span>}
</span></code></pre>
One caveat here is that our renderer is ncurses which tends to view the world as y,x</code> not x,y</code> which is a nuance but just means we tend to look our our world as a 2 dimensional array with y</code> being the first index.</p>
Assuming x</code> and y</code> is the location of the cell looking for its neighbors we create an offset of -1 to 1 in both horizontal and vertical directions to touch all 8 of its neighbors. You can visualize it this way:</p>
+-----------+-----------+-----------+
</span>|           |           |           |
</span>|  (-1,-1)  |  (0,-1)   |  (1,-1)   |
</span>|           |           |           |
</span>+-----------+-----------+-----------+
</span>|           |           |           |
</span>|  (-1, 0)  |  (0, 0)   |  (1, 0)   |
</span>|           |           |           |
</span>+-----------+-----------+-----------+
</span>|           |           |           |
</span>|  (-1, 1)  |  (0, 1)   |  (1, 1)   |
</span>|           |           |           |
</span>+-----------+-----------+-----------+
</span></code></pre>
So at this point we don't have self-initialization which is a problem as we are initializing the world before we start the simulation. But that is something for the future.</p>
Because a cell needs to know its starting state and the starting state of its neighbors we also accumulate the initial bit vector using 1</strong> for Alive</strong> and 0 for Dead</strong>.</p>
if </span>cells[y</span>+</span>i][x</span>+</span>j].State() {
</span>  cell.AddNeighborState(</span>1</span>)
</span>} </span>else </span>{
</span>  cell.AddNeighborState(</span>0</span>)
</span>}
</span></code></pre>
As cells live they listen for updates on its neighbors channels. These are buffered so they do not immediately block and each time a cell checks the mail it doesn't expect that a neighbor has sent them anything. So we assume no news is good news, and future state management is done by collecting a bit mask of the updates. We then compute the neighbors state by applying the bit mask repeatedly to the starting state. Ultimately, a cell is always hoping its memory is good enough to stay alive.</p>
for </span>_, neighborChan </span>:= </span>range </span>c.neighborChans {
</span>  </span>select </span>{
</span>  </span>case </span>_ </span>= <-</span>neighborChan: </span>// If a message is received on this channel
</span>    latestStates </span><<= </span>1
</span>    latestStates </span>|= </span>1
</span>    </span>//gotUpdate = true
</span>  </span>default</span>:
</span>    latestStates </span><<= </span>1
</span>    latestStates </span>|= </span>0
</span>  }
</span>}
</span></code></pre>
The bit vector of the initial state is in the same order as the channels the cell is listening to. This will allow us to later collect updates as a bit mask to progressing state on each cycle of the cells life.</p>
This is a good point to mention that we are deviating from the traditional GoL structure now because we are not binding the world state and the render state of the world. Each cell has some randomness in how fast it lives and this dictates how often it produces a heartbeat and its render speed. Due to this variety of signals we tend not to see smooth growth. Instead we see snapshots because our cells are sometimes living faster than we can observe. There is some work that can be done to tune a more consistent view although its not quite as beautiful</strong>.</p>
Cells only produce an event when they are alive and the moment of death. This means that when a cell comes alive it doesn't announce itself. This provides a small delay before impacting other cells. When the cells state changes we mark the change with the renderer. In this case we treat the terminal screen object as a frame buffer. While ncurses is particularly bad for reactive programming it does expose some primitives which provide an async render loop.</p>
Under the hood</p>
// NoutRefresh, or No Output Refresh, flags the window for redrawing but does
</span>// not output the changes to the terminal (screen). Essentially, the output is
</span>// buffered and a call to Update() flushes the buffer to the terminal. This
</span>// function provides a speed increase over calling Refresh() when multiple
</span>// windows are involved because only the final output is
</span>// transmitted to the terminal.
</span>func </span>(w </span>*</span>Window</span>) </span>NoutRefresh</span>() {
</span>	C.wnoutrefresh(w.win)
</span>	</span>return
</span>}
</span></code></pre>
Thus we can create a render loop in our main function that only calls Update</strong> on the Screen</strong>. As I mentioned before this means not every update is rendered and we get a smooth but not very organic rendering. There is also no clear a redraw we are progressively updating the display and accumulating writes. If you have worked in a rendering engine before you may be aware of the render loop that broadcasts a tick to all components and then draws the accumulated changes to screen. The smoothness of such renderings is based on the rate of change in distance. Our eye does a good job of building the motion between near states but not so good when the distance is rather far.</p>
</div>


An Internet of Changing Morality
Mon, 14 Oct 2024 00:00:00 +0000
</p>
As one might expect, Automated Imitation has dramatically changed the sales position for what it is to produce code and code-like products. As one might imagine there is a delusion that LLM's are producing code faster for startups at lower costs... Maybe. I don't see a lot of formal proof of this, which also doesn't mean it doesn't exist. I also don't tend to read hacker news so take that how you like.</p>
I have also noticed this move away from "Clean Code," which is fine, I guess because that "exciting" seems to be the same as my opinions on the dogmatic use of DRY and other "can't apply everywhere" navel-gazing.</p>
I think LLMs themselves are just a step in a direction, a real breakthrough for reliable generation of specifics from a broad context. My senior thesis back 20 years ago was just this. I wanted to shove the works of Mark Twain into a computerized token store and have it answer questions in the style and opinion of his writing. I was naive and had beautiful ideas...</p>
I wasn't able to accomplish that, but if there was an aspect of that project I should have spent more time on it was the ethics of the thing. There is clearly nothing unethical about the idea just that ethics always plays a part. A more interesting project would have been to partner with someone in my college's theology or philosophy departments to explore some of the bigger ramifications of the work.</p>
I have seen an interesting shift, between the layoffs and "New" product hype that's worth mentioning. I see companies refusing to apply the label AI to their products and instead terming them as LLM and ML products. This has always rung true to me, not because I am a person who avoids reductionist language, clearly, but because it more precise to describe a product as what it is. Intelligence might be taking it too far is all. To me the position of computers has always been the same, do annoying repetitive things and leave the fun stuff to me. I don't wanna do DNS lookups, and I don't think anyone ever has.</p>
I know this because in the 90s, I was on AOL(America Online) in a time without modern search engines and an internet barely with images. I had a printed book, like a phone book, with printed pages that listed AOL Keywords for Businesses. Yes, before there was really an internet, companies were competing over keywords to capture users' attention. For example CAFFE STARBUCKS - Caffe Starbucks or CANT SLEEP - The Late Night Survey. Back then, there was even a kind of human-run auction house where keywords were bought and traded. All this to capture the attention of my parents, who could barely be bothered. Everything was new, and yet it was just a better link aggregator, albeit like others from that time, like Geocities, long forgotten.</p>
All that is consistent is our need to share, and be free to do so. The cycles that we repeat are glorious in how they change our lives. Once they did so by what felt like an accident, now those are marketed to us trying to re-capture that glory. The days of guestbooks and visitor counters were replaced with comments and likes. We were never slaves to these tools; our limitation was our self-education. Once we learned CSS and Java Applets to grab visitors' attention. Now, we invent clever camera and editing techniques to collect users. The only change is the evolution of what was the product.</p>
Having stood sentinel to a world where the internet was almost free, through the advent of "popups" and "AdSense," and continuing to today, where there is no escape from the cacophony of influencers. I have always known that at its core all marketing is dishonest and all sales are exploitation but we all need the Money.</p>
I have often considered the impact of all these deceptive interactions on my psyche. The noise is so fervent and quick that it loses meaning creating a kind of alternate reality where I feel myself disconnecting from reality in defense of the conflict it creates in me. It makes me want to cry, but it also has so little meaning that I would give it, which cannot be named collective value. My brain wasn't formed for this, and I cannot confirm that any brains are, but it has become viral.</p>
What comes next, I wonder? Hopefully, it will be a renaissance through the revelation of Automated Imitation. A brain training on the noise produces "slop." It feels like that is probably the peek of its capability. We clearly have to create and build to grow the capabilities of a system that is a focused mirror on our desires.</p>
Is it then true that the world of Clean Code is no longer needed? Why bother with the craft of structuring code if we can rely on a computer to do it for us. It's not exactly the fun part of the work. It is to me and I will continue to do it, but that doesn't mean you do. But what rules will the computers use and will we even read the code anymore? Is there even a need for anything not to be a binary? Consider a website is just conjured when requested as opposed to rendered. What's the point of not even having a website anymore, and how will it differentiate the ads? Will the content I receive ever be free of targeted influence. Consider the existential horror when the same query posed by my partner and me on different terminals produces completely different experiences. Using different terminology, injecting vague opinions referring to targeted marketing that spans multiple queries, and tuning our searches toward products.</p>
The ethics of this astound me. Do all things need to grow? Are we all just enterprises? What if you didn't need to be a millionaire? How is luxury anything more than a prison?</p>
Back in my senior year of college, we also took an ethics course tailored to our future careers in software, guided by the understanding that there was no end to this roller-coaster. It was going to be the builders that set the standards of not quality but morality. Like the Clean Code, it takes very few poor actions to impact the whole negatively. If you wanna hear a fantastic take on "broken window theory" I'll direct you over to Jeff Atwood</a> for our purposes though the windows are broken, and its time to "take the neighborhood back."</p>
I took my current position because the idea of being closer to the world of Clean Code was inviting, but in selling a service, I have to find the balance of saying yes to Money and no to producing slop. I think the same challenge is that of the pharmaceutical industry, an enterprise ridden with questionable morality. I deeply attest that there are workers, scientists, sales agents, custodial staff, and IT who are fighting for the moral right to the best anyone can. I specifically refuse to believe that a scientist wants to use science to hurt people and that they take the evaluation and communication of risk very carefully, I am probably naive. How do we get to a place where the collective view of the product of their acts is less than ethical?</p>
If you had my 8th-grade history teacher, there could be only one answer. Let's all say it together, Money! Not that it is evil, but it is a prime motivator. It drives change because it's both the Golden Apple we covet and the Golden Apple we die for. Not a formal death mind you, but a spiritual one. We should be cautious of the need to succeed, and yes, maybe the era of worrying about the quality of our code has ended; I am still suspicious, but I prefer that the narrative be moved to Clean Values. We are not building so we can meet some arbitrary velocity that proves to our users, "we are doing things," but what we do matters.</p>
I would argue that every engineer who decided to stay with Apple when they were required to have an open app store in the EU so 2 app stores could be produced was an indicator of the issue. Heck, I remember when Apple stopped using DRM on MP3s because the juice wasn't worth the squeeze to keep building DRM as music prices fell. The decision to have a legally distinct app store proves they think the juice is worth the squeeze to maintain a monopoly on certain users. Humans will do the work, and those humans, in some small way, accept exploitation as acceptable. They are protecting the profits of their parent company over their fellow humans, including other developers, creating valuable things for humans. That was a battle lost where those who control the means of production should have forced liberation for all.</p>
You don't have to be a rebel to be moral; you just have to look at your actions and continue to take the one that produces the least suffering. We follow the rule, "Leave the campground cleaner than you found it." I would honestly rather argue about whether to build this on moral grounds during a code review or ticket grooming session than fussing about polymorphism anyway.</p>


Am I the Crazy One?
Thu, 10 Oct 2024 00:00:00 +0000
The feeling of AITA or "Am I just Crazy" happens to me a lot. I think it might be the core weakness when trying to manifest confidence in my work.</p>
Sometimes, it is not enough that your changes, be they code or process, tend to work correctly or, if you are lucky, efficiently. Something always comes in and makes me think, "No, I must be the crazy person. Why am I getting so much pushback?"</p>
Sometimes, I think it's the evolution of some impostor syndrome, but the reality is that everything I tend to do is the first and maybe the only time I will touch it. A lifetime of one-off experiences is quite exciting, but divining a wall to kick off from at the start of every endeavor has its costs.</p>
The reality is really perception and comprehension. I have learned that the smarter we are has very little to do with actual knowledge and, in some cases, even how fast we learn things. IQ might point otherwise, but that's not quite what I am talking about cause it's only a facet.</p>
I really overlooked the quality of perception, especially when playing DnD. It is attached to your Wisdom trait somewhat interchangeably.</p>
"Wisdom reflects how attuned you are to the world around you and represents perceptiveness and intuition." Dnd 5e</p>
But a person's depth of perception is really what sets us apart when it comes to divining sources of support. Simply knowing I can is almost enough. Getting others to understand you can, too, proves it.</p>
Contextually it would be easier if you could advertise your Wisdom in a way that tells others to let you run free without having to build their confidence in your skills. Perception has been the hardest thing I have ever had to explain to someone else because some parts of my awareness are not available to them. It sounds arrogant, sure, but it's not about being better; it's just different. I also see perception as a continuum; mine is all logical and mechanical. I can't perceive an object in a new color, but I can imagine what it looks like inside out.</p>
We have the same problem: We have a language problem explaining and comparing our perceptions. Maybe the core here is respecting others for their differences. When others don't understand you or your processes, it's possibly because we fundamentally see the world differently. Don't hold it against them or yourself.</p>


Not Invented Here
Mon, 30 Sep 2024 00:00:00 +0000
Not Invented Here Syndrome is something that has been entering my world a lot lately.</p>
To be a bit reductionist this is a bias to create everything by hand or rely on our own opinions vs those from "foreign" groups.</p>
I tend to think a brash example of this would be the Flat Earth movement. They prefer to rely on what they could experientially see and define themselves and refute those from public doctrine.</p>
But these days I see it when we practice Big Business Bullshit Bingo (4Bs). The constant need to redefine value and process with our own words to enact a sense of possession over them.</p>
Kinda like Science Doctrine, we have been doing the business thing for a long time. We have been selling and building and billing for much of human existence and like science we have done the due diligence and the tools have been laid bare on the table. Would it be so wrong to borrow and use them? Perhaps even enhance them with our own distinct color and give them back.</p>
I find this modality funny because I love open source which constantly teeters on NIH and standardization. But as a proof NIH becomes standardization not the other way around successfully. That process is through great collaboration and effort.</p>
There must be some psychological effect that happens as we mature in a career where we, myself included, start forgetting that not all good ideas are our own and that bespoke for the sake of possession is meerly a siren song.</p>
TLDR; Be boring, get shit done, fix only whats broke!</p>


The Software Delivery Trap
Fri, 13 Sep 2024 00:00:00 +0000
I have been hearing a lot lately about the focus on "Delivery," and it has always struck me as somewhat reductionist and linear in its thinking.</p>
I mean, I get it. Somewhere, a group of (non/ex) builders said, fallaciously, that everything can be built and delivered as a trackable lifecycle. There I go now, being a reductionist, but the difference between building and selling is in play here.</p>
My conflict concerns ownership of the software product and production of technical debt. This is informed by my personal experiences, so keep that in mind. Software work comes in multiple flavors, which primarily impacts the technical debt. For a long time, I have said.</p>
"a software engineer's job isn't to build software but to manage, even anticipate, change."</p>
The pragmatic programmers can leave now and go yell at someone else's clouds.</p>
For those who remain, there is no reason we can't do both. Think of the box that we will ship now and the box that we will ship next week as distinct but part of a system. I often don't see that happen because, in my experience, the roadmap is less than six weeks long for many companies.</p>
Let's talk about behaviors that are successful, though, and not mire them in  Big Business Bullshit Bingo terminology. For a product to be successful and be deliverable it must:</p>

Have at least a 3-month roadmap with stretch goals</li>
Understand and have mapped the dynamics of the "socio-technical" system it will be delivered in [Teams, Vendors, Business Processes]</li>
Presented to teams as something more significant than a 2-week unit</li>
Must leave space for non-deliverables [System Design and Maintenance]</li>
Must be assigned champions, not just responsible for the team but for driving the completion of the work and owning when it will inevitably go a little pear-shaped.</li>
</ul>
That last one is the next thing we need to train and hire for: ownership. In the days of my grandfather, we would call it "pride in our work." I am going to do a thing, and I am going to be proud of it. If this seems abstract at all, I advise you to read "The Four Agreements" by Don Miguel Ruiz. It's a quick read, I promise!</p>
We build ownership in a few ways, but the easiest is continuity of effort. We work on a problem until it is either SOLVED or GOOD ENUF. The vagueness of the latter is important. Consider this a real-life example of P versus NP.</p>
"can every problem whose solution can be quickly verified can also be quickly solved"</p>
The answer is Nope :)</p>
"Good enuf" means we have at least awareness of the tradeoffs we made and have left some record of the decisions that led us there.
So:</p>

Be proud of your work</li>
Be prepared to say no to work that doesn't have a clear roadmap</li>
Be prepared to ask for different if you don't want to be an SME</li>
</ul>
To be clear ownership is not being a Subject Matter Expert. Sometimes toxic, some of us don't want to be pigeon holed and that scares some from ownership. In a healthy work system three has to be a balance were what you do often will be things you have done in the past but it shouldn't be the only thing you do.</p>
There is another tifle though, uninspired work, in this case I often see developers doing a little as possible to move on to something more interesting. Once again I go back to my grandfather who didn't believe in small work. There's a waste of time, and those don't count, it is our responsibility to contextualize the value of our activities and seek clearity when value is low. But, a job still needs to be completed, and doing your best work means doing the work until its Done. I see the opposite in places with uninspired work, the team kinda wanders off from boring work and take so long it gets re-backlogged. The choise of learning something new to solve a problem and letting it kinda become more expensive than its worth occurs.</p>
This being the rarest trait I find in most workers, not just software, is doing the job right, completing the work, and having standards. Sounds rude yea, well sometimes the truth hurts, and thats a learning process. I am not judging you, but if you feel some pangs, well...</p>


Sankey Git
Wed, 11 Sep 2024 00:00:00 +0000
Idea generation 🔗</a>
</h2>
Something odd about idea generation is its heredity. Maybe a sankey isn't the right way to diagram it but I think it would look cool if it was weighted by number of commits and the distance between commits.</p>
Probably relational timestamps for related gits could inform on the branching order. I dunno.</p>
Tagging 🔗</a>
</h3>
So for that work I need a network of git repos or todolist items sharing the same tags</p>
Logseq 🔗</a>
</h3>
Probably the same is true about logseq graphs if I can track time.</p>


Learn Event Streaming by Recreating Kafka
Tue, 10 Sep 2024 00:00:00 +0000
Learn Event Streaming by Recreating Kafka 🔗</a>
</h1>

I don't know if I really like Kafka all that much</strong></p>
</blockquote>
That said its an interesting way for applications to communicate. See I have been imagining a global game of life implementation with distributed realtime events for each cell in the world. While that is a rather far off dream it made sense to tackle one of its problems. In the past I have setup Kafka development envs with Nix</a> this is all pretty easy. It even goes so far as to try and track how many shells are currently running to track shutting down Kafka when all is said and done. The only reason I do this is because Kafka is written in Java, famous for the addage, "write once use 80% of the resources everywhere."</em> For better or worse that has always stunk for me.</p>

Kafka can't be that complicated write?</strong></p>
</blockquote>
Thats probably both true and false. The only way to tell would be to try. So while this might be structured like a tutorial its really a devlog of the failures to interpret the features.</p>
Goals 🔗</a>
</h2>
To start we want to create a realtime streaming platform that provides at least:</p>

A binary protocol</li>
A protocol built on TCP</li>
TCP connection multiplexing for consumers and producers</li>
Event Stream and Log Sequence Merge storage</li>
Be filesystem oriented where possible</li>
Don't invent everything just whats needed</li>
GUI testing and debugging tools</li>
Protocol client for consumers and producers</li>
</ul>
Arch Diagram 🔗</a>
</h2>
DevelopMeh

Kwik-E-Mart: Who Needs a Gas Town When a Gas Station Will Do

Catalyst: An Orchestrator That Stopped Asking and Started Deciding

Automatic Programming: Iteration 4

Keep Your Eyes on the IDE, and Your Robots on the Tickets

BATS - Testing Bash Like You Mean It

When Not to Use BATS 🔗</a> </h2> BATS is for testing bash scripts and CLI tools. It's not for:</p> Testing web UIs (use Playwright, Cypress, etc.)</li> Unit testing Go/Rust/Python code (use your language's test framework)</li>

Agentic Patterns: Elements of Reusable Context-Oriented Determinism

In Practice 🔗</a> </h3> If I align this to Anthropic models:</p> Orchestrator -> Opus (Reasoning)</li> Implementer -> Sonnet (Competent)</li>

Just Forget About Owning Code

Rust Dancing ANSI Banana with Server-Sent Events

A Deterministic Box for Non-Deterministic Engines

Making Shit Up 🔗</a> </h3> Yep, so that's not a tradeoff, it's a flaw, one we haven't solved yet. When the context is ambiguous the model chooses to do one of two things:</p> Just pretend it didn't hear what it was asked to do</li>

Claude or Clod

The AI Diaries

03 02 2026 🔗</a> </h2> This is just a thought process I go through with LLM generated code...</p> OK, I can produce more code than I reasonably can keep track of in a single session, which means there is always going to be some code I didn't read.</p> </blockquote>

The Magic of Stubbing sh

Copying Life

Sufficient Complexity

Do Devs Really Do DevOps in your Org?

Do Devs Really Do DevOps in your Org? 🔗</a> </h2> Recently, I learned the more formal definition of shift-right and shift-left in terms of Agile DevOps. For a brief refresher and for brevity it goes a little something like this:</p> Shift Right -> Validation and testing in production</li>

Creative Impostor Syndrome

This Week's Crazy Idea