<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Nick Khami&apos;s Blog</title><description>Technical blog covering AI, cryptography, data analysis, and software development. Insights on building systems, tutorials, and lessons learned from production deployments.</description><link>https://www.skeptrune.com/</link><language>en-us</language><lastBuildDate>Fri, 10 Apr 2026 15:14:37 GMT</lastBuildDate><item><title>Engineering Management 101</title><link>https://www.skeptrune.com/posts/engineering-management-101/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/engineering-management-101/</guid><description>I started as an engineering manager about 9 months ago. Prior to that I was a founder/CEO. Here&apos;s what I&apos;ve learned so far.</description><pubDate>Thu, 09 Apr 2026 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I started as an &quot;engineering manager&quot; (more on my personal feelings for the title later) about 9 months ago. Prior to that I was a founder/CEO. The peak size of my team as a founder was 8 and the peak size of my team as an EM has been 9. So I would say that at this point I have around 3.5 years of total management experience.&lt;/p&gt;
&lt;p&gt;I think being a &quot;manager&quot; is mostly the same as being an IC, but with additional responsibilities like managing headcount and budget, and providing coaching. You&apos;re also expected to do things like executive writing, speaking engagements, vendor procurement, 1:1s, team bonding events, shoutouts, and project planning that you wouldn&apos;t have been responsible for as an IC.&lt;/p&gt;
&lt;p&gt;Personally, rating myself on the Dunning-Kruger chart, I think I&apos;m about here right now.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;src/assets/images/blog-posts/EngineeringManagement101/dunning-kruger.webp&quot; alt=&quot;dunning-kruger&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I still ship code, write product specs, write engineering specs, approve PRs, and post on socials. The biggest adjustment from CEO to EM was the responsibility change, specifically not being responsible for fundraising.&lt;/p&gt;
&lt;p&gt;Thankfully, I didn&apos;t particularly enjoy fundraising. In general I just like working hard and solving problems, so there&apos;s nothing specific that I feel attached to focusing on or keeping in my role. In the future, I could see myself fundraising again.&lt;/p&gt;
&lt;h2&gt;Servant Leadership&lt;/h2&gt;
&lt;p&gt;I subscribe to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Servant_leadership&quot;&gt;servant leader&lt;/a&gt; approach to a large extent. To me that means my priorities are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Help the business grow&lt;/li&gt;
&lt;li&gt;Get the people I&apos;m supporting promoted&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Even when it comes to 1, I am usually motivated to make the business grow because it means that there is now space for the folks I support to be promoted. I also really resent the words &quot;manage&quot; and &quot;report to.&quot; I think they create a hierarchy that doesn&apos;t really exist. I use the term &quot;support&quot; instead.&lt;/p&gt;
&lt;p&gt;Once you&apos;re not the IC actually doing the thing, you immediately have less authority over how it gets done. I basically acknowledge that at this point I make general suggestions for things and then people decide to what extent they want to listen. Usually their instincts are right. I hired well at my startup and Mintlify hires well too. That said, people definitely aren&apos;t &quot;reporting to&quot; or &quot;being managed by&quot; me.&lt;/p&gt;
&lt;p&gt;When someone ignores my suggestion and their instinct turns out to be wrong (which is rare in my experience), it usually just means the work ends up less successful than it otherwise would have been.&lt;/p&gt;
&lt;p&gt;What I care about in this case is that they acknowledge that they&apos;ll try something different next time. Sometimes that&apos;s my original suggestion and sometimes it&apos;s not, but what&apos;s important is that there&apos;s a learning.&lt;/p&gt;
&lt;p&gt;One thing I used to believe is that you are also a servant to those who are supposed to be supporting you. I no longer believe that. You should support those who support you, but not serve them. Keep your own priorities intact in situations where you might otherwise set them aside for someone you have explicit leadership responsibility to.&lt;/p&gt;
&lt;h2&gt;Getting People Promoted&lt;/h2&gt;
&lt;p&gt;I have two expectations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;People who I am supporting skill up to get promoted. Usually this means embracing additional complexity. As a software engineer that can mean hiring, doing planning, leading and making sure the right work is prioritized throughout, or owning product by interviewing customers, having vision, pitching, etc.&lt;/li&gt;
&lt;li&gt;Once they reach 100% ownership of a new skill level, they maintain that performance proactively and without active support for 6 months leading up to a promotion.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To help people skill up, you usually have to do some amount of the work for them then hand it off. So, for example, you make a half-finished markdown doc with a list of issues then you hand it off and the person you&apos;re helping to skill up finishes it and gets tickets into the issue tracking system. This method is often described as &lt;a href=&quot;https://paulgraham.com/foundermode.html&quot;&gt;founder mode&apos;ing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Or, if they are very junior, you can get a &lt;a href=&quot;https://www.reddit.com/r/explainlikeimfive/comments/dco4pd/eli5_what_is_a_pull_request_pr/&quot;&gt;PR&lt;/a&gt; into a draft state that they can finish. You slowly get them to 100% of the task by repeating this handoff method, scaling from them doing 10%, then 20%, and so on. Then, once they are there, support them so they can maintain 100% of the task&apos;s complexity for 6 months without burning out.&lt;/p&gt;
&lt;p&gt;I reject the whole frame of throwing people you support into the deep end with a &quot;what doesn&apos;t kill you makes you stronger&quot; mindset. I think there is &lt;em&gt;failing&lt;/em&gt; and it&apos;s incredibly important to fail. But that should feel like a comfortable and supported experience instead of a painful struggle. People grow more in healthy and supportive environments.&lt;/p&gt;
&lt;h2&gt;The Most Common Failure Mode&lt;/h2&gt;
&lt;p&gt;Sometimes people who move into &quot;management&quot; start thinking work is beneath them. They take so many meetings, and burn so much time writing docs that people don&apos;t read, that they stop actually getting work over the finish line. When you see someone struggling who you are in charge of supporting, you have to get in there and embrace the burden so you can support them in digging their way out.&lt;/p&gt;
&lt;p&gt;This is the most important part of the job.&lt;/p&gt;
&lt;p&gt;Occasionally I see a project where no code goes out to production for an entire day. In these specific cases I will &lt;em&gt;always&lt;/em&gt; jump in and get some work over the finish line. Sometimes it&apos;s gigantic issues in the issue tracker that need to be broken down; I&apos;ll just go in and break them down myself. You have to parachute in sometimes.&lt;/p&gt;
&lt;p&gt;I don&apos;t like signaling that I&apos;m taking over or that I distrust the team, but if you are earnest about it and do it in a supportive way, you can avoid those bad vibes. You just need to communicate clearly and explicitly explain your intentions: &quot;Hey guys, I see that we haven&apos;t gotten any PRs in. I&apos;m going to help kickstart the process. If I&apos;m stepping on your toes or repeating work here, please let me know.&quot;&lt;/p&gt;
&lt;h2&gt;On 1:1s&lt;/h2&gt;
&lt;p&gt;My hot take in management is that 1:1s are not &lt;em&gt;entirely&lt;/em&gt; the meeting of the person you are supporting. Of course it&apos;s their meeting if they push and take ownership of it, but I have personally rarely found that to be the case.&lt;/p&gt;
&lt;p&gt;I usually ask what they&apos;re working on, what they want to do better, what they would want me to do if they could wave a magic wand, and what feedback they have for me.&lt;/p&gt;
&lt;p&gt;You likely shouldn&apos;t push too hard to surface interpersonal problems if they aren&apos;t brought up somewhat naturally. It&apos;s much more important to focus on helping people actually complete their work. The job of an IC is to get contributions out into the world for people to use. Focus your energy on supporting every aspect of that for the person you are working with instead of magnifying interpersonal problems. Of course, if there are major issues then you need to address and work those out; I just personally err on the side of caution here.&lt;/p&gt;
&lt;p&gt;Increasing pace and getting work shipped usually solves most problems in my experience thus far. But again, Dunning-Kruger, this out of all my opinions is where I think I will continue to evolve my thinking the most.&lt;/p&gt;
&lt;h2&gt;Techniques&lt;/h2&gt;
&lt;p&gt;I have a few techniques I like a lot.&lt;/p&gt;
&lt;h3&gt;Async Todolist Standup&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://skeptrune.substack.com/p/todolist-standup&quot;&gt;Replace standup with an async todolist Slack channel.&lt;/a&gt; Have people post a list of what they are doing every day in the morning instead of having a synchronous meeting. There are many reasons for that which I explain in the linked article.&lt;/p&gt;
&lt;h3&gt;Monthly Brag Docs&lt;/h3&gt;
&lt;p&gt;Write a monthly brag doc for everyone you support. I do this at the end of every single month. It gives me a clear picture of what each person did, and I often change my opinion on their performance level while writing. Throughout the month you can just lose track of things, so it&apos;s really important to confront your biases on a recurring basis. I use the Slack MCP and Claude Code to look back through everything and compile all the information. AI has really changed the game here.&lt;/p&gt;
&lt;h3&gt;Relentlessly Encourage Public Channels&lt;/h3&gt;
&lt;p&gt;I frequently redirect conversations being had over DM into public channels. I see a lot of my job as supporting the folks I am responsible for in feeling secure communicating there in public. Ultimately it&apos;s incredibly important that their contributions are visible to the entire org as much as possible.&lt;/p&gt;
&lt;p&gt;Sometimes when I redirect, people feel like their idea isn&apos;t clear enough. So you can rewrite their message into something clearer and say &quot;hey, would you be comfortable sending this?&quot;&lt;/p&gt;
&lt;p&gt;People need reassurance that if they make mistakes, you will cover them. Which I do. You have to really encourage the act of failure. No successes will ever happen if people don&apos;t feel comfortable failing. Limit testing in a corporation is not natural for most, so it&apos;s your job as a &quot;manager&quot; to encourage it. You want people to find the level of complexity where they either don&apos;t know what to do or mess up. If they don&apos;t get there frequently then they won&apos;t have rapid skill growth.&lt;/p&gt;
&lt;h2&gt;On Leveling&lt;/h2&gt;
&lt;p&gt;In my opinion, leveling (by title) primarily exists so people don&apos;t feel awkward.&lt;/p&gt;
&lt;p&gt;You want everyone to have clear and fair expectations for themselves and those around them. When someone is overleveled, there is tension because people try to hand off work or get help from someone who is supposed to be at a higher skill threshold than them. But in reality they are not higher skill and it&apos;s just an awful time for everyone.&lt;/p&gt;
&lt;p&gt;Conversely, when someone is underleveled, that person becomes restricted in how much impact they can actually have. Instead of scaling themselves and spreading techniques and processes, they leave your company for another job, start doing work on the side, or just quiet quit. Occasionally a super motivated person will stick it out, but I think that is relatively rare and ultimately not in their best interest. Anyone who is able to do that without a company change has incredible grit and is special.&lt;/p&gt;
&lt;p&gt;Rarely can you &quot;demote&quot; someone. More often than not you just have to fire. When I have had to fire, it has usually come down to one of two things: a lack of will, where someone just can&apos;t summon the energy to try, or a lack of skill. Almost always I have learned when firing (including firing myself out of being a founder/CEO) that for some reason or another there is a lack of will. People get stuck and stop taking action altogether.&lt;/p&gt;
&lt;p&gt;When it comes to underleveling, you never want to promote too fast. I am very firm on 6 months of sustained performance. When people ask to move faster, if they are crushing it and you need to promote to retain them, do it with compensation instead of title.&lt;/p&gt;
&lt;h2&gt;Drive-By Management&lt;/h2&gt;
&lt;p&gt;I think the best founders and managers are not naturally bossy people. Rarely are you successful when you come across as domineering. To that end, founders trying to get work done can often struggle with slipping into &lt;a href=&quot;https://klinger.io/posts/fyi-how-founders-can-avoid-drive-by-management&quot;&gt;drive-by management&lt;/a&gt; mode.&lt;/p&gt;
&lt;p&gt;They sometimes feel overly shy about being bossy. They want everyone to take high ownership of problems and therefore default to understeering. The problem is that without explicit framing, people misinterpret FYIs as pleas or pleas as FYIs. &lt;a href=&quot;https://x.com/wadefoster&quot;&gt;Wade Foster&lt;/a&gt; from Zapier has a great system for preventing this.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;#fyi&lt;/strong&gt;: something interesting. An article, a podcast, etc. I thought you might like it. But if not, no worries. Nothing to see here.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;#suggestion&lt;/strong&gt;: a passing thought. I sometimes have good ideas. You might like to hear good ideas. If I&apos;m in your shoes, I consider it. But I&apos;m not in your shoes so do what you&apos;d like. A friendly response if you don&apos;t go with the suggestion is nice, so I make better suggestions over time, but is by no means necessary.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;#recommendation&lt;/strong&gt;: I&apos;ve thought a lot about this. Perhaps even lost sleep. I&apos;ve invested deeply. I think this is a good plan. You can still disagree and go a different direction, but walking me through why you are doing this is kindly requested.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;#plea&lt;/strong&gt;: We don&apos;t have a lot of mandates at Zapier, but this is one. Please do this. If you disagree enough that you can&apos;t go along with it, we should both reconsider our roles here. It&apos;s that important.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I use #fyi and #suggestion all the time. #recommendation much less so. And #plea is almost entirely unused.&lt;/p&gt;
&lt;p&gt;I learned it from &lt;a href=&quot;https://x.com/andreasklinger&quot;&gt;Andreas Klinger&lt;/a&gt;, who is a great follow on X by the way. Whenever I am communicating directionally, I explicitly call out where I&apos;m at on the scale from fyi to plea. It&apos;s rare that I exceed a suggestion, but it does happen. Pleas happen at most once a quarter. You cannot burn your social capital by sitting at recommendation or above all the time. People will just get annoyed and quit, or stop taking you seriously.&lt;/p&gt;
&lt;p&gt;I strongly suggest you adopt some system like this when you begin supporting people as a career path.&lt;/p&gt;
&lt;h2&gt;Personal Brand&lt;/h2&gt;
&lt;p&gt;I think building a personal brand is an incredibly important part of being a leader. You want to be &quot;known&quot; for things. You want to have a reputation. You want people to understand your values and what you care about so they can properly assess how to engage and work with you.&lt;/p&gt;
&lt;p&gt;If you expect to be able to change companies or orgs and continue leading, then it&apos;s important that there is a very public track record of your work. Unlike an IC, you are rarely directly &quot;shipping&quot; things when in manager mode, so you need social influence beyond that. Good leaders can own go-to-market for their projects, and part of that is having a personal brand. This was true before social media, when you would write columns in newspapers, and it&apos;s even more true post-social media, when you can build a huge following from your phone in the moments between spinning up your favorite coding agent.&lt;/p&gt;
&lt;p&gt;I fall on the &quot;become a thought leader&quot; end of the spectrum. You want trust both internally at your org and externally with peers in your industry that you are first-principled, make rational decisions, and can be trusted to own blame for your own decisions and those of the people you support. If you don&apos;t have a personal brand, I do not think you can escape middle management. Getting to &quot;manager of managers&quot; in my opinion requires a personal brand to some extent, likely closer to the &quot;thought leader&quot; end of the spectrum.&lt;/p&gt;
&lt;p&gt;You can be a great IC with no personal brand. I think that&apos;s the appeal of the staff IC path. But for leadership, you do, unfortunately, need to build one. Look at Bill Belichick: famously terse with the media during his Patriots tenure, he immediately started his own podcast and made network TV appearances once he was out, to justify another coaching role. Same for JJ Redick, who built a massive audience as a podcaster soon after retiring from the NBA, then parlayed that into coaching the LA Lakers.&lt;/p&gt;
&lt;p&gt;They built credibility through public performance at first, by being on professional sports teams whose games were broadcast in primetime TV slots, and then, once that was over, they pivoted to alternative media forms before getting back onto a high-performing and visible team.&lt;/p&gt;
&lt;p&gt;Post instead of cruising early and dodging media like they did, because odds are that you&apos;re not an athlete performing on primetime TV every week. You need to build your brand through other channels.&lt;/p&gt;
&lt;p&gt;I am always pushing the people I support to build and write in public. If someone is resistant to it, I let it go. It&apos;s not a hard requirement unless they are going from manager to manager of managers level.&lt;/p&gt;
&lt;p&gt;My personal brand, in my opinion, is a strong attracting force for hires and customers. It&apos;s effectively a marketing channel for the business, particularly for brand marketing. The ROI is hard to attribute directly, but it is certainly there.&lt;/p&gt;
&lt;p&gt;I spend between 10 and 20 hours a week outside of 9-5 work hours on personal brand: writing, recording, commenting, and scrolling. I think of scrolling as research; it&apos;s very intentional. Final note on this: if you decide to work on a personal brand, you&apos;ll find that some people don&apos;t like your &quot;style&quot;. I think that&apos;s fine and you shouldn&apos;t worry about it. The beauty of the brand being &quot;personal&quot; is that it&apos;s yours and not everyone has to be aligned with it. Do whatever you&apos;re comfortable doing. Ignore haters.&lt;/p&gt;
&lt;h2&gt;Most Importantly, Please Do Not Be a Cynical Manager&lt;/h2&gt;
&lt;p&gt;If you take nothing else away from this post, please let it be this section. I think the best &quot;managers&quot; (🙄) are optimistic and positive. They earnestly want to contribute to growing the organization they are at and the people they support.&lt;/p&gt;
&lt;p&gt;I heavily disagree with &lt;a href=&quot;https://www.seangoedecke.com/&quot;&gt;Sean Goedecke&apos;s&lt;/a&gt; writing on management. He overindexes on politics and &quot;managing stakeholders,&quot; which I think is incredibly cynical and short-sighted. Nothing good comes from analyzing and kissing ass all the time. High-performing teams and organizations are aligned around high action and throughput. If you are high action and high throughput then, in my opinion, things usually work out.&lt;/p&gt;
&lt;p&gt;Of course, that is to an extent. Be a good person, be kind, apologize when you make mistakes, buy people gifts when they do things that help you, say thank you, say please, say good morning. Don&apos;t be a cold and unfeeling curmudgeon. But do also primarily focus on doing your best work and putting things out there into the world.&lt;/p&gt;
&lt;p&gt;Everyone wants to solve problems and build cool shit at the end of the day. Or well, at least everyone I would be excited to work with.&lt;/p&gt;
</content:encoded><category>management</category><category>business</category><author>Nick Khami</author></item><item><title>Startup Marketing 101</title><link>https://www.skeptrune.com/posts/startup-marketing-101/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/startup-marketing-101/</guid><description>I was pretty bad at marketing as a founder, but I&apos;ve learned a few things since.</description><pubDate>Mon, 23 Feb 2026 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Everyone in startups has heard the advice: &quot;don&apos;t tunnel vision on product, make sure you do marketing.&quot; If advice were a horse, that one would have been beaten dead a decade ago. Some version of it appears on my X feed every single day.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;src/assets/images/blog-posts/Marketing101/braindead-do-marketing.webp&quot; alt=&quot;braindead-do-marketing&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I heard this constantly while founding my previous startup &lt;a href=&quot;https://www.ycombinator.com/companies/trieve&quot;&gt;Trieve&lt;/a&gt;, and I bought into it. You can find old &lt;a href=&quot;https://www.tiktok.com/@trieveai?lang=en&quot;&gt;TikTok posts&lt;/a&gt; from August 2023, four months before we raised funding or got into YC. Then, ironically, some switch flipped once we became venture backed. The posting stopped. We turned inward and focused on product. It saddens me in hindsight, because our product was getting a whole lot better at exactly the moment we went quiet.&lt;/p&gt;
&lt;h2&gt;What Changed?&lt;/h2&gt;
&lt;p&gt;Startup media and accelerator programs create an expectation of a &quot;launch&quot; event, think of &lt;a href=&quot;https://www.youtube.com/watch?v=Qp-AwObTrvE&quot;&gt;TechCrunch Disrupt in the Silicon Valley TV show&lt;/a&gt;, Supabase&apos;s infamous &lt;a href=&quot;https://supabase.com/launch-week&quot;&gt;&quot;launch week&quot;&lt;/a&gt;, and of course the OG themselves - &lt;a href=&quot;https://www.producthunt.com/launch&quot;&gt;ProductHunt&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;src/assets/images/blog-posts/Marketing101/demo-day.webp&quot; alt=&quot;demo-day&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I&apos;m kind of an idiot and let being &quot;post-fundraise&quot; change my mindset. We felt like we had the capital to just burn money on ads when it made sense, and therefore went heads-down building in silence for weeks at a time, popping our heads out once or twice a month to do a big launch event. That was, without question, horrendous strategy and terrible CEOing on my part.&lt;/p&gt;
&lt;p&gt;The best comparison I can think of is composing a song out of nothing but thirty-second rests and cymbal crashes. People tune you out. Good marketing should feel more like an EDM track: a steady beat with the occasional drop. You want consistent content that people can engage with, punctuated every so often by a big announcement that gets them excited.&lt;/p&gt;
&lt;h2&gt;Executing the Slow Drip Launch&lt;/h2&gt;
&lt;p&gt;You can post and launch all of the small things you ship along the way to the final product. Get the login page working? Post about it. Add the ability to invite users into your org? Post about it. Put new actionable insights in the dashboard? Post about it.&lt;/p&gt;
&lt;p&gt;Each of these is a chance to build awareness and improve your yapping abilities, so that by the time your product is finally stable and working, you have the skillset and audience necessary to get a base of people familiar with it and excited to share it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;src/assets/images/blog-posts/Marketing101/slow-and-steady-wins-the-race.webp&quot; alt=&quot;slow-and-steady-wins-the-race&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The alternative is to put all your eggs in one basket and wait until you have a big announcement to make. This is the strategy that most startups follow, and it is a high-risk, low-reward play. If your launch goes viral, you can get a huge boost in awareness and users, but content on social media is rarely evergreen and gets buried in feeds quickly.&lt;/p&gt;
&lt;p&gt;Your best case scenario is a couple days of electricity in return for weeks or months&apos; worth of work. And, worst case, your launch falls flat and you get literally nothing out of it.&lt;/p&gt;
&lt;h3&gt;Tactic #1: Personal Brands&lt;/h3&gt;
&lt;p&gt;I hate to quote Roy Lee, but he&apos;s not wrong when he says &lt;a href=&quot;https://x.com/im_roy_lee/status/2009677516701565112?s=20&quot;&gt;&quot;most of u tech ppl are doomed to be ngmi forever on x. ur just not funny or sarcastic or arrogant enough for this place&quot;&lt;/a&gt;. Founders, including myself, are typically nerdy software-engineer type folks who are boring to the extent that building a personal brand on X or elsewhere is going to be a struggle.&lt;/p&gt;
&lt;p&gt;However, I&apos;m here to tell you that with enough failure, any skill issue can be overcome. It takes a lot more effort than being naturally interesting, but you absolutely can activitymax your way into an audience by posting a lot, replying, and engaging with people active in your niche online.&lt;/p&gt;
&lt;p&gt;That&apos;s not to say you can post terrible content nobody likes and succeed; you definitely still have to aim to entertain. But you can pick that up as a skill over time. You just have to be comfortable posting into the void for a while until you start to figure it out. Failure is part of the process in marketing the same way it is in everything else.&lt;/p&gt;
&lt;p&gt;The light at the end of the tunnel is that success on social media tends to compound. While it&apos;s true that social media feeds are more competitive than ever and &lt;a href=&quot;https://www.milkkarten.net/p/social-media-followers-feed&quot;&gt;no longer show your content consistently to followers&lt;/a&gt;, there will be some people who consistently engage with your content and see it day after day.&lt;/p&gt;
&lt;p&gt;Their engagement serves as a core that makes each post count as a live shot on goal: the platform you&apos;re posting on at least tests whether your content resonates with a wider audience. The size of that &quot;test group&quot; gets bigger as your following grows, so you start to go viral more consistently over time.&lt;/p&gt;
&lt;p&gt;Finally, I want to note that you should endeavor to not do this alone. Ideally you hire people or have co-founders and you all have different angles and audiences, so you can test different messaging and content styles to see what resonates as you build.&lt;/p&gt;
&lt;p&gt;Imagine you have a classical cast - engineer, designer, and businessperson. Engineer can post knee-high sock photos about how you&apos;re using Rust btw, the designer can share overdone figmas nobody&apos;s ever going to build, and the businessperson can complain about how they were rejected by 67 VCs before getting their mom to finally write the first check.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;src/assets/images/blog-posts/Marketing101/jackass-the-movie-bam.webp&quot; alt=&quot;jackass-the-movie-bam&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Over time each person&apos;s social graph will grow in different directions and you&apos;ll be able to test product marketing messaging with different hooks and audiences to see what resonates the most so you can double down on the best possible angle for the big launch. Think jackass for startup marketing.&lt;/p&gt;
&lt;h3&gt;Tactic #2: Field Marketing&lt;/h3&gt;
&lt;p&gt;Host an event once you know what you&apos;re building. If you put some money behind an open bar and a DJ and message some people an invite, you can usually get a pretty good turnout. You want to do your best to get people who you think have the problem your product solves to show up, but even if you just get a bunch of friends, it&apos;s usually worth it.&lt;/p&gt;
&lt;p&gt;At some point, probably about a third of the way through the event, grab a mic and talk for a few minutes about what you&apos;re building. Don&apos;t bother with a demo or presentation or video, just talk naturally about why you decided to nuke your future career prospects and work 996 for a 1% shot at building something people want. If you can get a few laughs and make it feel like a fun story, people will be more likely to remember it and share it with their friends.&lt;/p&gt;
&lt;p&gt;Use something like &lt;a href=&quot;https://www.partiful.com/&quot;&gt;Partiful&lt;/a&gt; to manage RSVPs and send reminders, and make sure to collect contact information from attendees so you can follow up after the event. Auto-enroll people who show up in your product newsletter and send them weekly updates about your progress. If you have a launch date, send them a reminder a few days before so they can be ready to support you on launch day.&lt;/p&gt;
&lt;p&gt;I like this one because it&apos;s pretty earnest and doesn&apos;t require being funny or clever to execute well. If you can throw a good party and tell a good story, you can get a lot of mileage out of this tactic. Hosting &quot;VIP Dinners&quot; also tends to be a pretty good lead funnel and functions in the same way.&lt;/p&gt;
&lt;h3&gt;Tactic #3: Capitalizing on Trends&lt;/h3&gt;
&lt;p&gt;I think founders, including myself, are often stubborn when it comes to trend-driven marketing. We tend to feel like adding product features purely for the sake of &quot;going viral&quot; is a sellout move, and that we should only build things that are directly related to our product vision. While I do think it&apos;s important to stay true to your vision, I also think it&apos;s important to be flexible and adapt to trends when they make sense.&lt;/p&gt;
&lt;p&gt;On that note, I think competitive surfing rounds are a reasonable metaphor for how to think about this. In a surf competition, you&apos;re only allowed to be out in the water for a certain amount of time, so you have to be strategic about which waves you ride. You want to pick the waves that will score the most points, but you also don&apos;t want to hesitate so long that you miss rides that could be good but aren&apos;t perfect.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;src/assets/images/blog-posts/Marketing101/pro-surfing-competition.webp&quot; alt=&quot;pro-surfing-competition&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You&apos;re always under similar time pressure in startups: if you miss a growth goal for a single quarter, or sometimes even a month, it can be a huge problem for your employee retention and fundraising prospects. Therefore, you can&apos;t afford to be too picky about which trends you ride. If there&apos;s a meme or topic that&apos;s relevant to your product and has the potential to get you a lot of attention, you should probably jump on it and ship, even if it&apos;s not perfectly aligned with your vision.&lt;/p&gt;
&lt;p&gt;On a lower level, my recommendation to get started on this is turning on post notifications for accounts in your niche that are good at this and more or less copying what they do. Reply to the same things they reply to, post about the same topics, and use the same formats. You can add your own twist to it and actually make product changes over time as you get more comfortable with the format and start to understand what resonates with your audience.&lt;/p&gt;
&lt;h2&gt;I&apos;m Begging You to Post&lt;/h2&gt;
&lt;p&gt;If you take nothing else away from this, please, for all that&apos;s holy, just post. It increases your odds of getting lucky and making it by orders of magnitude. And odds are nobody&apos;s going to see your content anyway, so stop worrying about embarrassing yourself.&lt;/p&gt;
</content:encoded><category>startups</category><category>marketing</category><category>business</category><author>Nick Khami</author></item><item><title>Replace Your Standup with a Todo List</title><link>https://www.skeptrune.com/posts/todolist-standup/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/todolist-standup/</guid><description>A new approach to async standups that focuses on todolist-driven updates.</description><pubDate>Fri, 02 Jan 2026 12:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You can watch me write this blog post &lt;a href=&quot;https://x.com/skeptrune/status/2007168539204137238&quot;&gt;on video here on x.com!&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Standups create anxiety. You hoard updates during the day because you need content for the meeting. You forget your blockers. You adjust your schedule to be in early, even if you work better at night.&lt;/p&gt;
&lt;p&gt;I ran into this when I started my company. People join startups to escape corporate bureaucracy. When I tried to introduce morning standups, the early hires pushed back hard. It didn&apos;t feel like a startup move to them.&lt;/p&gt;
&lt;p&gt;Given that, I went back to the drawing board to break down the actual utility of the meeting. A standup exists to distribute context so a team can parallelize work and maintain synchronization.&lt;/p&gt;
&lt;p&gt;If that&apos;s the goal, you don&apos;t need a meeting.&lt;/p&gt;
&lt;h2&gt;The Process&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Post your task list in a channel like &lt;code&gt;#standup&lt;/code&gt; first thing in the morning&lt;/li&gt;
&lt;li&gt;Edit the message to cross off tasks as you complete them throughout the day&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Here&apos;s what one of those messages looks like.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;- Fix login bug on staging
- Review Sarah&apos;s PR #234
- Write API docs for /users endpoint
- Sync with design on checkout flow
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you work, you come back and edit the message to cross things off.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;- ~~Fix login bug on staging~~
- ~~Review Sarah&apos;s PR #234~~
- Write API docs for /users endpoint
- Sync with design on checkout flow
&lt;/code&gt;&lt;/pre&gt;
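&lt;p&gt;If you ever want to automate the crossing-off (say, from a bot that edits the Slack message for you), the rendering is easy to script. Here&apos;s a tiny sketch of my own, not part of the workflow itself:&lt;/p&gt;

```python
# Render the standup message body, striking through finished tasks.
# Post the result once, then edit the same message as tasks complete.
def render_standup(tasks, done):
    lines = []
    for task in tasks:
        if task in done:
            lines.append(f"- ~~{task}~~")
        else:
            lines.append(f"- {task}")
    return "\n".join(lines)

tasks = ["Fix login bug on staging", "Review Sarah's PR #234"]
print(render_standup(tasks, done={"Fix login bug on staging"}))
```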
&lt;h2&gt;Advanced Version&lt;/h2&gt;
&lt;p&gt;Add timestamps if you want to track how long things are taking you or otherwise provide more context.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Today:
- [9:15-10:30] ~~Fix login bug on staging~~
- [10:30-11:00] ~~Review Sarah&apos;s PR #234~~
- [11:00-?] Write API docs for /users endpoint
- (no start time since previous task is unfinished) Sync with design on checkout flow
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That&apos;s it. Everyone sees what you&apos;re working on. No meeting required.&lt;/p&gt;
</content:encoded><category>work</category><category>management</category><author>Nick Khami</author></item><item><title>Org-Level Email Campaigns are Somehow an Unsolved Problem</title><link>https://www.skeptrune.com/posts/org-level-email-campaigns-instantly/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/org-level-email-campaigns-instantly/</guid><description>No email tool stops a whole org when one person replies. Here&apos;s how I built org-level campaign control with Instantly and webhooks.</description><pubDate>Sun, 28 Dec 2025 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Defining the Goal&lt;/h2&gt;
&lt;p&gt;We launched new &lt;a href=&quot;https://www.mintlify.com/docs/ai/discord#discord-bot&quot;&gt;community discord and slack bots&lt;/a&gt; recently at Mintlify and needed to do some customer marketing to let users who had large communities know about them.&lt;/p&gt;
&lt;p&gt;Our goal was to send an email sequence to the ~3-10 people from each organization who we identified as being able to get value from the feature. The catch is that once one person from an organization enables the bot, the task is complete, and we don&apos;t want to keep emailing everyone else from the org.&lt;/p&gt;
&lt;p&gt;I thought this would be easy with any of the email marketing tools out there, like &lt;a href=&quot;https://resend.com/&quot;&gt;Resend&lt;/a&gt; or &lt;a href=&quot;https://loops.so/&quot;&gt;Loops&lt;/a&gt;, but it was not. Shockingly, every available tool handles campaigns at the user level instead of the organization level.&lt;/p&gt;
&lt;p&gt;None of them have the concept of &quot;stop emailing this group when any member takes action.&quot; They&apos;re all built for individual drip campaigns, not org-level outreach.&lt;/p&gt;
&lt;h2&gt;Tool Selection&lt;/h2&gt;
&lt;p&gt;Claude Code has made me &lt;a href=&quot;https://thomasorus.com/i-tried-coding-with-ai-i-became-lazy-and-stupid&quot;&gt;quite lazy&lt;/a&gt; insofar as I try to get everything I can done using it instead of doing things manually myself. Therefore my number one criterion when picking a tool for this problem was a large API surface that claude could work with.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://instantly.ai&quot;&gt;Instantly&lt;/a&gt; was my final selection given its API was the most robust and &lt;a href=&quot;https://developer.instantly.ai/api/v2/analytics/getwarmupanalytics&quot;&gt;well documented&lt;/a&gt;. It gave claude full access to campaigns, leads, and sequences while also being capable of sending webhooks on reply and unsubscribe events.&lt;/p&gt;
&lt;p&gt;Kind of random aside, but &lt;a href=&quot;https://jamstack.org/&quot;&gt;JAMstack&lt;/a&gt; architecture patterns are probably going to make a comeback with AI. I think tools like &lt;a href=&quot;https://trpc.io/&quot;&gt;trpc&lt;/a&gt; are going to fall out of favor relative to &lt;a href=&quot;https://www.openapis.org/&quot;&gt;openapi&lt;/a&gt;-driven patterns that AI agents can better understand. Separation of UI and business logic is the way forward if we want our apps to be accessible to AI.&lt;/p&gt;
&lt;h2&gt;Solution Architecture&lt;/h2&gt;
&lt;p&gt;The Instantly campaign holds the multi-email sequence and is configured to stop on reply. A lead upload script reads a CSV of contacts, groups them by company, and uploads them to Instantly with a &lt;code&gt;companyName&lt;/code&gt; custom variable.&lt;/p&gt;
&lt;p&gt;Then a webhook server listens for reply events, finds all leads from the same company, and marks them as &quot;not interested&quot; to stop their sequences.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CSV (company, email)
    → Upload Script
    → Instantly (leads with companyName)

Reply received
    → Instantly webhook
    → Webhook server
    → Find leads by companyName
    → Update lead status
    → Sequence stops for whole company
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Creating the Campaign&lt;/h3&gt;
&lt;p&gt;The campaign itself is straightforward. Set &lt;code&gt;stop_on_reply&lt;/code&gt; to true so Instantly stops the sequence for whoever replies, then define your email steps with delays between them.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST https://api.instantly.ai/api/v2/campaigns \
  -H &quot;Authorization: Bearer $INSTANTLY_API_KEY&quot; \
  -H &quot;Content-Type: application/json&quot; \
  -d &apos;{
    &quot;name&quot;: &quot;Feature Launch Outreach&quot;,
    &quot;stop_on_reply&quot;: true,
    &quot;stop_on_auto_reply&quot;: true,
    &quot;sequences&quot;: [{
      &quot;steps&quot;: [
        {&quot;type&quot;: &quot;email&quot;, &quot;delay&quot;: 0, &quot;variants&quot;: [{&quot;subject&quot;: &quot;...&quot;, &quot;body&quot;: &quot;...&quot;}]},
        {&quot;type&quot;: &quot;email&quot;, &quot;delay&quot;: 3, &quot;variants&quot;: [{&quot;subject&quot;: &quot;...&quot;, &quot;body&quot;: &quot;...&quot;}]}
      ]
    }]
  }&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Uploading Leads&lt;/h3&gt;
&lt;p&gt;The important bit is to include a &lt;code&gt;companyName&lt;/code&gt; custom variable with every lead. Our webhook server uses that to find all leads from the same company and unsubscribe them on relevant events.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import csv
import requests

API_KEY = &quot;your-api-key&quot;
CAMPAIGN_ID = &quot;your-campaign-id&quot;

with open(&quot;contacts.csv&quot;) as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Each CSV row is one company with a semicolon-separated email list
        emails = row[&quot;emails&quot;].split(&quot;;&quot;)
        company = row[&quot;company_name&quot;]

        for email in emails:
            resp = requests.post(
                &quot;https://api.instantly.ai/api/v2/leads&quot;,
                headers={&quot;Authorization&quot;: f&quot;Bearer {API_KEY}&quot;},
                json={
                    &quot;email&quot;: email.strip(),
                    &quot;company_name&quot;: company,
                    &quot;custom_variables&quot;: {&quot;companyName&quot;: company},
                    &quot;campaign&quot;: CAMPAIGN_ID
                },
                timeout=30,
            )
            # Fail loudly on bad uploads instead of silently dropping leads
            resp.raise_for_status()
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Registering the Webhook&lt;/h3&gt;
&lt;p&gt;Tell Instantly to POST to your server whenever someone replies. You&apos;ll need the campaign ID from earlier.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST https://api.instantly.ai/api/v2/webhooks \
  -H &quot;Authorization: Bearer $INSTANTLY_API_KEY&quot; \
  -H &quot;Content-Type: application/json&quot; \
  -d &apos;{
    &quot;target_hook_url&quot;: &quot;https://your-server.com/webhook/reply&quot;,
    &quot;event_type&quot;: &quot;reply_received&quot;,
    &quot;campaign_id&quot;: &quot;your-campaign-id&quot;
  }&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Webhook Server&lt;/h2&gt;
&lt;p&gt;This is the part that makes it all work. When anyone replies, the webhook server extracts the company name, queries Instantly for all leads with that company name, and updates each lead&apos;s status to stop their sequence.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const express = require(&apos;express&apos;);
const app = express();
app.use(express.json());

const API_KEY = process.env.INSTANTLY_API_KEY;
const CAMPAIGN_ID = process.env.CAMPAIGN_ID;

app.post(&apos;/webhook/reply&apos;, async (req, res) =&amp;gt; {
  const event = req.body;

  if (event.event_type !== &apos;reply_received&apos;) {
    return res.json({ status: &apos;ignored&apos; });
  }

  const leadEmail = event.lead_email;
  const companyName = event.lead?.company_name;

  if (!companyName) {
    return res.json({ status: &apos;ignored&apos;, reason: &apos;no company&apos; });
  }

  // Find all leads from this company
  const leads = await findLeadsByCompany(companyName);

  // Stop all of them (except the one who replied, they&apos;re already stopped)
  for (const lead of leads) {
    if (lead.email !== leadEmail) {
      await stopLead(lead.email);
    }
  }

  res.json({ status: &apos;ok&apos;, stopped: leads.length - 1 });
});

// Instantly&apos;s API doesn&apos;t let you filter by company_name server-side.
// The search param only works on name/email. So we fetch all leads
// and filter client-side. For large campaigns, you&apos;d want to cache
// this or build your own company-&amp;gt;leads index.
async function findLeadsByCompany(companyName) {
  const leads = [];
  let cursor = null;

  do {
    const resp = await fetch(&apos;https://api.instantly.ai/api/v2/leads/list&apos;, {
      method: &apos;POST&apos;,
      headers: { &apos;Authorization&apos;: `Bearer ${API_KEY}`, &apos;Content-Type&apos;: &apos;application/json&apos; },
      body: JSON.stringify({ campaign: CAMPAIGN_ID, starting_after: cursor })
    });
    const data = await resp.json();
    leads.push(...data.items);
    cursor = data.next_starting_after;
  } while (cursor);

  return leads.filter(lead =&amp;gt; lead.company_name === companyName);
}

async function stopLead(email) {
  await fetch(&apos;https://api.instantly.ai/api/v2/leads/update-interest-status&apos;, {
    method: &apos;POST&apos;,
    headers: { &apos;Authorization&apos;: `Bearer ${API_KEY}`, &apos;Content-Type&apos;: &apos;application/json&apos; },
    body: JSON.stringify({ lead_email: email, interest_value: -1 })
  });
}

app.listen(3000);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can host this anywhere that can receive HTTP requests. I used a cheap VPS with Docker behind Caddy for SSL. Other options include Cloudflare Workers, Railway, Render, AWS Lambda, or Vercel. The server is stateless, so any hosting that can run Node.js will work.&lt;/p&gt;
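&lt;p&gt;For reference, the reverse proxy side of that setup is tiny. A sketch of the Caddy config (hypothetical domain; it assumes the webhook server from above listening on port 3000):&lt;/p&gt;

```
# Caddyfile -- Caddy provisions and renews the TLS certificate automatically
webhooks.example.com {
    reverse_proxy localhost:3000
}
```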
&lt;h2&gt;Someone Please Build This&lt;/h2&gt;
&lt;p&gt;This solution works, but it&apos;s more complex than it should be. The fact that I had to build a webhook server to get org-level behavior is absurd. This should be a checkbox in every B2B email tool.&lt;/p&gt;
&lt;p&gt;What I actually want is to define groups of contacts by company, toggle &quot;stop group on reply&quot; as a campaign setting, and have the email tool handle it without webhooks.&lt;/p&gt;
&lt;p&gt;If you&apos;re building email tools, please add this.&lt;/p&gt;
</content:encoded><category>ai</category><category>infrastructure</category><category>marketing</category><author>Nick Khami</author></item><item><title>Working with Me</title><link>https://www.skeptrune.com/posts/working-with-me/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/working-with-me/</guid><description>A guide for anyone working with me. Covers communication preferences, feedback, what I value, and my quirks.</description><pubDate>Mon, 15 Dec 2025 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Several people have recommended I write a &quot;working with me&quot; document to help onboard new collaborators, teammates, and contractors and I finally got around to it. Here it is!&lt;/p&gt;
&lt;h2&gt;How I Communicate&lt;/h2&gt;
&lt;p&gt;I will always provide you with a quick response or answer, but will often go back and edit the message to clarify or poke at the topic with further thoughts over the course of the next hour or so. I believe word economy is important and try to be quiet if I have nothing valuable to respond with.&lt;/p&gt;
&lt;h2&gt;How to Get My Attention&lt;/h2&gt;
&lt;p&gt;Call my cell phone if you have my number. Otherwise, @ me on Slack. I &lt;strong&gt;strongly&lt;/strong&gt; prefer that you call me though. Most of my texts are voice transcriptions and I tend to add emojis to communicate my intended tone. It&apos;s easier to get all that right over a phone call when things are urgent. My written communication can often be read with a tone I didn&apos;t intend.&lt;/p&gt;
&lt;p&gt;If you are messaging me on Slack, try to do it in a shared channel. DMs create communication silos and I think they are usually more harm than good. Very few things actually need to be private. If you send me a DM that I think should be in a channel I will often just forward it there myself and continue the conversation in that new location in front of a larger group.&lt;/p&gt;
&lt;h2&gt;Planning&lt;/h2&gt;
&lt;p&gt;Please never send me a planning document. Communicate your idea in 3 paragraphs or less and text it to me. I detest opening 3 pages of workslop full of theory about an unstarted task.&lt;/p&gt;
&lt;p&gt;If you really can&apos;t get it to work in a single message, make a Slack canvas. Anything is better than a document which lives in another piece of software I have to sign into and open.&lt;/p&gt;
&lt;h2&gt;1:1s&lt;/h2&gt;
&lt;p&gt;I like recurring 1:1s when they are about moving towards goals. If we can set something up where one of us is working towards something and we decide on steps to get there, then a recurring meeting is useful to continuously sync on that.&lt;/p&gt;
&lt;p&gt;Progression is fun, and even more so when you keep track of it with someone who&apos;s invested in holding you accountable.&lt;/p&gt;
&lt;h2&gt;Feedback&lt;/h2&gt;
&lt;p&gt;I try to only give unprompted feedback when it&apos;s extremely specific and easy to understand. My mindset is that high-level feedback about a general topic or performance is only something that should be shared on request.&lt;/p&gt;
&lt;p&gt;Please send me any feedback you think would be useful. Specificity is always good. I will try to improve where I can at all times.&lt;/p&gt;
&lt;h2&gt;What I Value&lt;/h2&gt;
&lt;p&gt;Please try to do something and fail before asking me for help. I find it much easier to help when you are actively experiencing a problem and want to solve it rather than just wanting my thoughts. Speaking extensively about strategy before doing something is almost always a waste of time.&lt;/p&gt;
&lt;p&gt;I prioritize failure. You are usually some number of failures away from success, and I think you should try to get those over with as fast as you can so you reach success earlier. I get frustrated seeing people play it safe, always doing what they&apos;re good at, or ponder endlessly before closing the loop by launching and getting feedback.&lt;/p&gt;
&lt;h2&gt;My Quirks&lt;/h2&gt;
&lt;p&gt;I tend to be skeptical, sarcastic, and kind of a curmudgeon. Complexity is something I avoid on principle, even when it will save me time. Newfangled toys are something I rarely find valuable.&lt;/p&gt;
&lt;p&gt;UX is important to me, but not so much UI. I would rather something be functional and easy to use than pretty. I don&apos;t think design is particularly valuable. Just solve the problem.&lt;/p&gt;
&lt;h2&gt;Hours and Availability&lt;/h2&gt;
&lt;p&gt;I don&apos;t sleep all that much and am easily awoken. 3am–8am are the hours I am typically hard to reach, but if you call my cell phone I will usually pick up. Just call me!&lt;/p&gt;
&lt;p&gt;Direct pings are something which I always try to respond to as fast as possible. If it&apos;s important enough that you&apos;re annoyed with my response time being slow, again, call my cell.&lt;/p&gt;
&lt;h2&gt;What I&apos;m Working On Improving&lt;/h2&gt;
&lt;p&gt;Writing, making videos, and marketing are the core skills I want to improve in the upcoming year. I would appreciate any feedback or ideas you may have for me in those areas.&lt;/p&gt;
</content:encoded><category>work</category><category>management</category><author>Nick Khami</author></item><item><title>Prompt the Loop When Using Coding Agents</title><link>https://www.skeptrune.com/posts/prompting-the-agent-loop/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/prompting-the-agent-loop/</guid><description>The difference between a language model and an agent isn&apos;t just tool access. It&apos;s about giving the agent a condition to loop on. Here&apos;s how to prompt agents to actually finish tasks instead of just taking first steps.</description><pubDate>Sun, 02 Nov 2025 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;import badpromptVideo from &apos;../../assets/images/blog-posts/PromptingTheAgentLoop/badprompt.mp4&apos;;
import goodpromptVideo from &apos;../../assets/images/blog-posts/PromptingTheAgentLoop/goodprompt.mp4&apos;;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;src/assets/images/blog-posts/PromptingTheAgentLoop/adamdotdev-tweet.webp&quot; alt=&quot;Adam&apos;s tweet about AI agents doing tedious work&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can safely ignore advice about coding agents that doesn&apos;t mention loop structure.&lt;/p&gt;
&lt;p&gt;&quot;Agent&quot; is an overloaded term, so I want to clarify that I&apos;m using &lt;a href=&quot;https://simonwillison.net/2025/Sep/18/agents/&quot;&gt;simonw&apos;s definition&lt;/a&gt; of &quot;a language model which runs tools in a loop to achieve a goal&quot; when I say &quot;agent&quot; throughout this post.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If that doesn&apos;t make sense to you then read &lt;a href=&quot;https://fly.io/blog/everyone-write-an-agent/&quot;&gt;fly.io&apos;s guide on writing agents&lt;/a&gt; or &lt;a href=&quot;https://ampcode.com/how-to-build-an-agent/&quot;&gt;ampcode&apos;s guide&lt;/a&gt; and try building your own micro-agent first before continuing. I promise it&apos;s a worthwhile exercise.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;What is This Loop You Speak Of?&lt;/h2&gt;
&lt;p&gt;Consider this example prompt asking claude code to write a &lt;code&gt;generateSlug&lt;/code&gt; function.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;add a new function to the @index.ts file called `generateSlug` which accepts a blog post title and returns a URL-friendly slug
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Sounds clear, like it should work, right? WRONG! Watch the output below.&lt;/p&gt;
&lt;video controls loop muted playsinline style=&quot;width: 100%; max-width: 100%; border-radius: 8px;&quot;&gt;
&lt;source src=&quot;src/assets/images/blog-posts/PromptingTheAgentLoop/badprompt.mp4&quot; type=&quot;video/mp4&quot; /&gt;
Your browser does not support the video tag.
&lt;/video&gt;
&lt;p&gt;We&apos;re missing unicode characters, special characters, multiple spaces, and other edge cases. Curse javascript all you want, but this is our fault, not Claude&apos;s.&lt;/p&gt;
&lt;p&gt;Our prompt doesn&apos;t give the agent anything to test against. You want something more like&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;look at @index.ts and start by making sure you have a way to unit test functions which get added to that file. Add jest and new ts files if you need to, add a new script to @package.json. Then scaffold a new generateSlug function in @index.ts, write robust tests for it covering unicode, special chars, multiple spaces, edge cases, run them, watch them fail, implement the slug generator until they all pass. Unit tests should be in the same index.ts file as the function implementation.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Watch &lt;em&gt;this baby&lt;/em&gt; run!&lt;/p&gt;
&lt;video controls loop muted playsinline style=&quot;width: 100%; max-width: 100%; border-radius: 8px;&quot;&gt;
&lt;source src=&quot;src/assets/images/blog-posts/PromptingTheAgentLoop/goodprompt.mp4&quot; type=&quot;video/mp4&quot; /&gt;
Your browser does not support the video tag.
&lt;/video&gt;
&lt;h2&gt;It&apos;s all the agent loop!&lt;/h2&gt;
&lt;p&gt;The key difference is this phrase from the second prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;run them, watch them fail, implement the slug generator until they all pass&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This gives the agent a loop condition, a test to run repeatedly while iterating. Without it, the agent takes one shot and stops. With it, the agent keeps working until satisfied.&lt;/p&gt;
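&lt;p&gt;To make that concrete, here is roughly the shape of the loop from the harness&apos;s point of view. This is a toy sketch with stand-in functions, not any real agent&apos;s internals:&lt;/p&gt;

```python
# Toy agent loop: run the check, feed failures back to the model,
# and stop only when the loop condition (the check passes) is met.
def run_agent(ask_model, run_check, max_iters=10):
    for attempt in range(1, max_iters + 1):
        passed, output = run_check()
        if passed:
            return attempt  # loop condition met, the agent stops here
        ask_model(f"Check failed:\n{output}\nFix it and try again.")
    raise RuntimeError("gave up without meeting the loop condition")

# Stand-in model/check pair: the "code" starts passing on the third run.
state = {"runs": 0}
def fake_check():
    state["runs"] += 1
    return state["runs"] >= 3, "2 tests failing"

print(run_agent(lambda msg: None, fake_check))  # 3
```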
&lt;h2&gt;Common Loop Condition Patterns&lt;/h2&gt;
&lt;p&gt;The pattern is always the same. Describe the work, specify validation, then tell the agent to iterate until the validation passes. Here are three templates you can adapt.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For new features&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[describe feature], write tests for [key behaviors], run them, fix until they all pass
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;For bug fixes&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;reproduce the bug in [file/test], fix the issue, verify the bug no longer occurs and all tests pass
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;For build/compile issues&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[make changes], run the build, fix any errors or type issues until it compiles successfully
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;All of the above patterns include some kind of check: either tests pass or fail, builds succeed or error, bugs reproduce or don&apos;t. Agents can take these abstract descriptions and turn them into concrete tools which they can run over and over until the condition is met.&lt;/p&gt;
&lt;p&gt;Prompting outside of this pattern is the equivalent of grabbing a jr dev three shots past the &lt;a href=&quot;https://en.wikipedia.org/wiki/Ballmer_Peak&quot;&gt;Ballmer Peak&lt;/a&gt; and asking them to fix a bug.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Senior developers know what loop conditions work for different tasks. If you&apos;re junior and not sure what&apos;s appropriate for a given task, use the planning mode offered by &lt;a href=&quot;https://www.claude.com/product/claude-code&quot;&gt;claude code&lt;/a&gt;, &lt;a href=&quot;https://cursor.com/&quot;&gt;cursor&lt;/a&gt;, or your coding agent of choice to help you craft them.&lt;/p&gt;
&lt;p&gt;Good luck out there!&lt;/p&gt;
</content:encoded><category>ai</category><category>tutorial</category><author>Nick Khami</author></item><item><title>How I Use Claude Code on My Phone with Termux and Tailscale</title><link>https://www.skeptrune.com/posts/claude-code-on-mobile-termux-tailscale/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/claude-code-on-mobile-termux-tailscale/</guid><description>You don&apos;t need a new startup or third-party service to use Claude Code on your phone. You just need SSH, Tailscale, and Termux. Here&apos;s how to code from anywhere with the tools you already have.</description><pubDate>Sun, 19 Oct 2025 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;There&apos;s a mini gold rush to put &lt;a href=&quot;https://claude.ai/claude-code&quot;&gt;Claude Code&lt;/a&gt; on your phone. Some startups are building custom apps, others are creating managed cloud environments. They&apos;re solving real problems, but you&apos;re trading raw Unix power for convenience. If you have a desktop and 20 minutes, you can get full kernel access with SSH, &lt;a href=&quot;https://termux.dev/en/&quot;&gt;termux&lt;/a&gt;, and &lt;a href=&quot;https://tailscale.com/&quot;&gt;tailscale&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Yesterday I &lt;a href=&quot;https://x.com/skeptrune/status/1979668217930084596&quot;&gt;posted about&lt;/a&gt; shipping a feature to this blog from the passenger seat while driving to Apple Hill, CA from San Francisco. I SSH&apos;d into my office desktop from my phone, prompted Claude to make the changes, tested them on my phone&apos;s browser, and pushed to production in 10 minutes. That post got 130k impressions and dozens of people asked for the setup.&lt;/p&gt;
&lt;p&gt;This article walks through doing SSH-based mobile development with Claude Code. If you have a desktop that stays on (or a cheap VPS), you can get full terminal access from your phone with session persistence, port forwarding, and the ability to test your code on your actual mobile browser. The initial setup takes about 20 minutes and just works once configured.&lt;/p&gt;
&lt;h2&gt;The Architecture&lt;/h2&gt;
&lt;p&gt;The setup uses five standard Unix tools that work together without custom integration. A &lt;strong&gt;desktop&lt;/strong&gt; runs Claude Code, &lt;strong&gt;tailscale&lt;/strong&gt; creates a private network between your devices, &lt;strong&gt;termux&lt;/strong&gt; gives you a real terminal on Android, &lt;strong&gt;SSH&lt;/strong&gt; handles the connection, and &lt;strong&gt;tmux&lt;/strong&gt; keeps your sessions alive when you disconnect.&lt;/p&gt;
&lt;h3&gt;Step 1: Setup Your Desktop&lt;/h3&gt;
&lt;p&gt;You need a computer that stays on. This could be a desktop at home, a desktop at your office, a cloud VM, or a home server. It doesn&apos;t need to be powerful: Claude Code just makes API calls, and the actual compute happens at Anthropic.&lt;/p&gt;
&lt;p&gt;I keep a desktop at my office that stays on 24/7. It&apos;s running Ubuntu with Claude Code installed. The computer does nothing else. It just sits there waiting for me to SSH in and start coding.&lt;/p&gt;
&lt;p&gt;First, install Claude Code globally using npm. This gives you the &lt;code&gt;claude&lt;/code&gt; command that you&apos;ll use to start coding sessions.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, install tmux for session persistence. When you disconnect from SSH (phone locks, network drops, whatever), tmux keeps your Claude Code session running in the background. When you reconnect, you pick up exactly where you left off.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo apt install tmux  # Ubuntu/Debian
brew install tmux      # macOS
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With Claude Code and tmux installed, your desktop is ready to host your development sessions.&lt;/p&gt;
&lt;h3&gt;Step 2: Install Tailscale Everywhere&lt;/h3&gt;
&lt;p&gt;Tailscale creates a private network between all your devices. Your phone gets a stable IP address that can reach your desktop, even when you&apos;re on different networks. It just works.&lt;/p&gt;
&lt;p&gt;On your desktop, run the Tailscale installer. The script will detect your OS and install the right package. Then bring up the Tailscale connection, which will prompt you to authenticate in your browser.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Install Tailscale on your phone from the Play Store. Sign in with the same account. Your devices are now on the same network.&lt;/p&gt;
&lt;p&gt;You&apos;ll need your desktop&apos;s Tailscale IP address to connect from your phone. Grab it with this command.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tailscale ip -4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&apos;ll get something like &lt;code&gt;100.64.0.5&lt;/code&gt;. That&apos;s your desktop&apos;s address on the Tailscale network. It&apos;s stable, it&apos;s private, and it works from anywhere.&lt;/p&gt;
&lt;h3&gt;Step 3: Install Termux on Your Phone&lt;/h3&gt;
&lt;p&gt;Termux is a terminal emulator for Android that gives you a real Linux environment. Not a toy terminal. A real one with bash, ssh, and full package management.&lt;/p&gt;
&lt;p&gt;Install Termux from F-Droid, not the Play Store. The Play Store version is outdated and broken. Get it from &lt;a href=&quot;https://f-droid.org/en/packages/com.termux/&quot;&gt;f-droid.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once installed, you&apos;ll need to update the package repositories and install the SSH client. Termux uses &lt;code&gt;pkg&lt;/code&gt; as its package manager, which is basically a wrapper around apt.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pkg update
pkg install openssh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With OpenSSH installed, Termux can now connect to your desktop over SSH.&lt;/p&gt;
&lt;h3&gt;Step 4: SSH Into Your Desktop&lt;/h3&gt;
&lt;p&gt;Now for the moment of truth. Open Termux and SSH to your desktop using the Tailscale IP you grabbed earlier. Replace &lt;code&gt;100.64.0.5&lt;/code&gt; with your actual IP and &lt;code&gt;your-username&lt;/code&gt; with your desktop username.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh your-username@100.64.0.5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first time you connect, SSH will ask you to verify the host fingerprint. Type &lt;code&gt;yes&lt;/code&gt;. Then enter your password.&lt;/p&gt;
&lt;p&gt;You&apos;re in. You&apos;re now running a shell on your desktop, from your phone, over a secure encrypted connection that works anywhere you have internet.&lt;/p&gt;
&lt;h3&gt;Step 5: Use tmux for Session Persistence&lt;/h3&gt;
&lt;p&gt;tmux is what makes this whole setup practical. When you disconnect from SSH, your tmux session keeps running on the desktop. When you reconnect, you attach to the same session and everything is exactly where you left it.&lt;/p&gt;
&lt;p&gt;You&apos;re now connected to your desktop via SSH from your phone. Start a new tmux session with a name you&apos;ll remember. I usually name mine after the project I&apos;m working on.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmux new -s code
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a session named &quot;code&quot;. You can name it anything. Inside the tmux session, launch Claude Code and start working.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;claude
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you&apos;re coding. On your phone. Using Claude Code. Running on your desktop.&lt;/p&gt;
&lt;p&gt;When you need to disconnect, don&apos;t exit Claude Code. Don&apos;t exit tmux. Just close Termux or let your phone lock. The tmux session stays running on your desktop.&lt;/p&gt;
&lt;p&gt;Later, when you want to code again, SSH back in and reattach to your session. Everything will be exactly where you left it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh your-username@100.64.0.5
tmux attach -t code
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Your conversation with Claude is still there. Your file context is still loaded. You can continue your previous task immediately.&lt;/p&gt;
&lt;p&gt;In the Apple Hill example from the intro, this is exactly what I did. I SSH&apos;d in, ran &lt;code&gt;tmux attach -t personalsite&lt;/code&gt; to reconnect to my development session, and told Claude to add a section about the Public Suffix List and make headings into clickable anchor links. The session had been running for days. I just picked up exactly where I&apos;d left off.&lt;/p&gt;
&lt;h2&gt;Why This Works Better Than Custom Apps&lt;/h2&gt;
&lt;p&gt;Every startup trying to solve &quot;Claude Code on mobile&quot; is building abstractions on top of these primitives. They&apos;re not giving you anything you can&apos;t already do with SSH and Termux. They&apos;re just wrapping it in a prettier UI and charging for hosting.&lt;/p&gt;
&lt;p&gt;When you do it yourself, you get several advantages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Port forwarding just works.&lt;/strong&gt; With Tailscale, your desktop&apos;s ports are directly accessible from your phone. No configuration, no exposing services to the public internet, no proxies adding latency. Your phone and desktop are on the same private network, so anything listening on your desktop is one IP address away.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Full CLI access to configure your environment.&lt;/strong&gt; Want to run your dev server with &lt;code&gt;--host&lt;/code&gt; so you can test on your phone&apos;s browser? Just add the flag. Need to adjust firewall rules, modify server configs, or install system packages? You have root access. Native mobile coding apps can&apos;t offer this level of control because it&apos;s too niche for their target users, but for power users it&apos;s essential.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Session persistence that actually works.&lt;/strong&gt; tmux was built for this. Your session survives network disconnections, phone reboots, and SSH reconnects. You never lose your place.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Your own hardware.&lt;/strong&gt; Your desktop has your SSH keys, your git credentials, your environment exactly how you configured it. You&apos;re not coding in a disposable cloud container.&lt;/p&gt;
&lt;h2&gt;The Mobile Experience&lt;/h2&gt;
&lt;p&gt;I&apos;m not going to pretend coding on a phone is as good as coding on a desktop. It&apos;s not. The screen is small. The keyboard is mediocre. You can&apos;t see multiple files at once.&lt;/p&gt;
&lt;p&gt;But Claude Code is different from traditional coding. You&apos;re not typing out functions character by character. You&apos;re describing what you want, reviewing Claude&apos;s changes, and approving or rejecting them. That workflow actually works on mobile.&lt;/p&gt;
&lt;p&gt;The Apple Hill example wasn&apos;t cherry-picked. I&apos;ve shipped real features from my phone. I&apos;ve fixed production bugs while getting coffee. I&apos;ve reviewed pull requests from the back of an Uber. It&apos;s not my primary development environment, but it&apos;s shockingly capable when I need it.&lt;/p&gt;
&lt;p&gt;The key is that Claude Code is conversational. You&apos;re having a back-and-forth with an AI that writes code for you. That interaction model translates to mobile better than traditional text editing. You&apos;re reading more than you&apos;re typing, and phones are great for reading.&lt;/p&gt;
&lt;h2&gt;Practical Tips&lt;/h2&gt;
&lt;p&gt;Once you have the basic setup running, there are a few tweaks that make the mobile coding experience dramatically better. These aren&apos;t strictly necessary, but they&apos;ll save you time and frustration.&lt;/p&gt;
&lt;h3&gt;Test Your Changes on Your Phone&apos;s Browser&lt;/h3&gt;
&lt;p&gt;This is the killer feature. You&apos;re not just editing code remotely, you can test it on your phone while the dev server runs on your desktop.&lt;/p&gt;
&lt;p&gt;When I was working on the blog changes from Apple Hill, I wanted to see how the clickable anchor links looked on mobile. The trick is starting your dev server with the &lt;code&gt;--host&lt;/code&gt; flag, which makes it accessible on your Tailscale network instead of just localhost.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;yarn dev --host
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For Vite (which Astro uses), this binds the dev server to &lt;code&gt;0.0.0.0&lt;/code&gt; instead of &lt;code&gt;127.0.0.1&lt;/code&gt;. For other frameworks: &lt;code&gt;npm run dev -- --host&lt;/code&gt; for Vite-based React projects, &lt;code&gt;next dev -H 0.0.0.0&lt;/code&gt; for Next.js, &lt;code&gt;python manage.py runserver 0.0.0.0:8000&lt;/code&gt; for Django.&lt;/p&gt;
&lt;p&gt;Grab your desktop&apos;s Tailscale IP again if you forgot it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tailscale ip -4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then on your phone&apos;s browser, navigate to &lt;code&gt;http://100.64.0.5:4321&lt;/code&gt; (replace with your Tailscale IP and port).&lt;/p&gt;
&lt;p&gt;You&apos;re now viewing your local dev server, running on your desktop 2.5 hours away, in your phone&apos;s browser. I saw the anchor links were styled wrong, told Claude to fix them, refreshed, confirmed they looked good, and pushed. The whole workflow took maybe 10 minutes.&lt;/p&gt;
&lt;p&gt;You&apos;re developing with the actual target device in your hand. You can test responsive layouts, check mobile interactions, and iterate immediately instead of deploying to staging or waiting until you&apos;re back at your desk.&lt;/p&gt;
&lt;h3&gt;Use SSH Keys&lt;/h3&gt;
&lt;p&gt;Don&apos;t type your password every time you SSH. Generate an SSH key on your phone and add it to your desktop&apos;s authorized keys.&lt;/p&gt;
&lt;p&gt;Open Termux and generate an ed25519 key (the modern standard). Then use &lt;code&gt;ssh-copy-id&lt;/code&gt; to automatically add it to your desktop&apos;s authorized keys file.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh-keygen -t ed25519
ssh-copy-id your-username@100.64.0.5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you can SSH without a password.&lt;/p&gt;
&lt;h3&gt;Create an SSH Config&lt;/h3&gt;
&lt;p&gt;Make connecting easier by adding your desktop to your SSH config. Instead of typing &lt;code&gt;ssh your-username@100.64.0.5&lt;/code&gt; every time, you can create an alias. Make a file at &lt;code&gt;~/.ssh/config&lt;/code&gt; in Termux with this content.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Host desktop
    HostName 100.64.0.5
    User your-username
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you can just type &lt;code&gt;ssh desktop&lt;/code&gt; instead of remembering the IP and username.&lt;/p&gt;
&lt;h3&gt;Use a Better Keyboard&lt;/h3&gt;
&lt;p&gt;Termux works with external keyboards. I keep a small Bluetooth keyboard in my bag. When I&apos;m actually trying to get work done on my phone, I pull out the keyboard. It makes a massive difference.&lt;/p&gt;
&lt;p&gt;The phone screen is fine for reading. The keyboard makes typing bearable.&lt;/p&gt;
&lt;h3&gt;Set Up tmux Keybindings&lt;/h3&gt;
&lt;p&gt;tmux&apos;s default keybindings are terrible on mobile. Remap them to something sensible. On your desktop, create or edit &lt;code&gt;~/.tmux.conf&lt;/code&gt; and add these bindings. They make tmux way easier to use on a phone keyboard.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Use Ctrl-A instead of Ctrl-B (easier to type)
unbind C-b
set -g prefix C-a
bind C-a send-prefix

# Split panes with | and -
bind | split-window -h
bind - split-window -v
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you can manage tmux sessions without finger gymnastics.&lt;/p&gt;
&lt;h3&gt;Run Multiple Sessions&lt;/h3&gt;
&lt;p&gt;You can have multiple tmux sessions for different projects. I usually have one for each repo I&apos;m actively working on. Start them with descriptive names so you remember what&apos;s what.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmux new -s backend
tmux new -s frontend
tmux new -s experiments
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you SSH in and want to see what sessions are running, list them.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmux ls
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then attach to whichever one you want to work on.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmux attach -t frontend
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This keeps your different projects isolated. You can switch contexts just by attaching to a different session.&lt;/p&gt;
&lt;h2&gt;Security Considerations&lt;/h2&gt;
&lt;p&gt;You&apos;re SSH&apos;ing into your desktop over the internet. That&apos;s a potential security risk if you do it wrong. Do it right with a few precautions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Use Tailscale.&lt;/strong&gt; Never expose SSH to the public internet. Use Tailscale to create a private network between your devices. Your SSH traffic stays encrypted and never touches the public internet directly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Use SSH keys.&lt;/strong&gt; Disable password authentication entirely. Unlike passwords, keys can&apos;t realistically be brute-forced. Edit &lt;code&gt;/etc/ssh/sshd_config&lt;/code&gt; on your desktop and set these values, then restart sshd.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PasswordAuthentication no
PubkeyAuthentication yes
&lt;/code&gt;&lt;/pre&gt;
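&lt;p&gt;A sketch of the apply step, assuming a systemd-based Linux desktop (on Debian and Ubuntu the unit may be named &lt;code&gt;ssh&lt;/code&gt; rather than &lt;code&gt;sshd&lt;/code&gt;). Always test the config first, since a syntax error in &lt;code&gt;sshd_config&lt;/code&gt; can lock you out of remote access.&lt;/p&gt;

```shell
# Validate the edited config before restarting; sshd -t prints nothing on success.
sudo sshd -t

# Restart the daemon to apply the change (unit name varies by distro).
sudo systemctl restart sshd
```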
&lt;p&gt;&lt;strong&gt;Keep your phone secure.&lt;/strong&gt; Your phone now has SSH access to your development machine. If someone steals your phone, they can access your desktop. Use a strong PIN or biometric lock. Enable disk encryption. Consider using a password manager for your SSH key passphrase.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Monitor SSH access.&lt;/strong&gt; Check who&apos;s connected to your machine with &lt;code&gt;who&lt;/code&gt; or &lt;code&gt;w&lt;/code&gt;. Check SSH logs with &lt;code&gt;sudo tail -f /var/log/auth.log&lt;/code&gt;. If you see connections you don&apos;t recognize, revoke SSH keys and investigate.&lt;/p&gt;
&lt;p&gt;The threat model here is pretty mild. Your SSH traffic is encrypted. Your Tailscale network is private. The main risk is losing your phone, which is why phone security matters.&lt;/p&gt;
&lt;h2&gt;When This Doesn&apos;t Work&lt;/h2&gt;
&lt;p&gt;This setup assumes you have a desktop that stays on. If you don&apos;t, you need a cloud VM or a home server. That&apos;s still not a reason to use a third-party service. Just rent a $5/month VPS from &lt;a href=&quot;https://www.digitalocean.com/&quot;&gt;DigitalOcean&lt;/a&gt; or &lt;a href=&quot;https://www.hetzner.com/&quot;&gt;Hetzner&lt;/a&gt;, install Tailscale and Claude Code, and SSH into it the same way.&lt;/p&gt;
&lt;p&gt;This also assumes you&apos;re on Android. If you&apos;re on iOS, Termux isn&apos;t available. You&apos;ll need to use a different SSH client like &lt;a href=&quot;https://blink.sh/&quot;&gt;Blink&lt;/a&gt; or &lt;a href=&quot;https://panic.com/prompt/&quot;&gt;Prompt&lt;/a&gt;. The rest of the setup is the same.&lt;/p&gt;
&lt;p&gt;If you&apos;re on unstable internet, SSH can be frustrating. &lt;a href=&quot;https://mosh.org/&quot;&gt;Mosh&lt;/a&gt; (mobile shell) is designed for high-latency or unreliable connections. Install it on both your phone and desktop, then use &lt;code&gt;mosh desktop&lt;/code&gt; instead of &lt;code&gt;ssh desktop&lt;/code&gt;. It handles disconnections gracefully and keeps your terminal responsive even on bad networks.&lt;/p&gt;
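&lt;p&gt;Installation is one package install per machine. This sketch assumes Termux on the phone and a Debian-based desktop; substitute your distro&apos;s package manager as needed.&lt;/p&gt;

```shell
# On the phone, in Termux:
pkg install mosh

# On the desktop (Debian/Ubuntu shown; use your distro's package manager):
sudo apt install mosh

# Then connect with mosh in place of ssh, reusing the "desktop" SSH alias:
mosh desktop
```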
&lt;h2&gt;The Bottom Line&lt;/h2&gt;
&lt;p&gt;Mobile development with Claude Code doesn&apos;t require new infrastructure or custom applications. The components you need are SSH, Tailscale, Termux, and a desktop that stays on. These are standard Unix tools that have been solving remote access problems for decades.&lt;/p&gt;
&lt;p&gt;SSH has been the standard for secure remote access since 1995. Tmux has provided session management since 2007. Tailscale is newer, but it&apos;s built on WireGuard, which has undergone extensive security audits. These tools are mature, well-documented, and widely deployed in production environments.&lt;/p&gt;
&lt;p&gt;The underlying problem, accessing a remote development environment from a mobile device, was solved long before mobile coding became a focus. This approach applies those established solutions to Claude Code without requiring custom middleware or managed services.&lt;/p&gt;
&lt;p&gt;If you have a desktop or VPS and 20 minutes for setup, you can have this working today. Install Termux, configure Tailscale, and connect via SSH. The workflow is straightforward and the tools are reliable.&lt;/p&gt;
&lt;p&gt;Glory be to the AI overlords, who grant us the grace to code at the bar without shame.&lt;/p&gt;
</content:encoded><category>ai</category><category>mobile</category><category>infrastructure</category><category>vibecoding</category><author>Nick Khami</author></item><item><title>Multi-Tenant SaaS&apos;s Wildcard TLS: An Overview of DNS-01 Challenges</title><link>https://www.skeptrune.com/posts/wildcard-tls-for-multi-tenant-systems/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/wildcard-tls-for-multi-tenant-systems/</guid><description>How to provision and manage wildcard TLS certificates for multi-tenant systems with tenant-specific subdomains, solving the scaling challenges of per-tenant certificates.</description><pubDate>Fri, 17 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;AI app builders are everywhere now. You enter a prompt, get a deployed product on &lt;code&gt;your-app.builder.com&lt;/code&gt;, and ship. Replit, Bolt, Lovable, v0, and dozens of other similar platforms launched in the past few months, and they all need instant subdomain provisioning with HTTPS for every user.&lt;/p&gt;
&lt;p&gt;This pattern isn&apos;t new. Multi-tenant SaaS has used &lt;code&gt;tenant-id.foo.com&lt;/code&gt; subdomains forever. But the explosion of AI builders that spin up hundreds of new subdomains daily makes the certificate management problem more visible. You can&apos;t provision individual certificates for every generated app; you need wildcard certificates.&lt;/p&gt;
&lt;p&gt;I&apos;d never set this up before, but at &lt;a href=&quot;https://www.mintlify.com&quot;&gt;Mintlify&lt;/a&gt; we had an internal hackathon today and I built my own AI app builder. That meant I finally had a good excuse to figure out how wildcard TLS actually works. I&apos;m sharing what I learned so you can implement it too.&lt;/p&gt;
&lt;h2&gt;The Problem: Per-Tenant Certificates Don&apos;t Scale&lt;/h2&gt;
&lt;p&gt;If you provision individual certificates for each tenant, you&apos;re running ACME challenges for every new tenant signup, managing certificate renewals for potentially tens of thousands of certificates, and hitting rate limits from Let&apos;s Encrypt (50 certificates per registered domain per week). You need a better approach.&lt;/p&gt;
&lt;h2&gt;Wildcard Certificates: One Cert, Infinite Tenants&lt;/h2&gt;
&lt;p&gt;A wildcard certificate for &lt;code&gt;*.foo.com&lt;/code&gt; covers all first-level subdomains. This means any subdomain directly under your base domain gets automatic TLS coverage with a single certificate.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tenant-a.foo.com     ✓
tenant-b.foo.com     ✓
tenant-xyz.foo.com   ✓
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The wildcard certificate doesn&apos;t extend to the apex domain or nested subdomains, though. Here&apos;s what&apos;s explicitly excluded from coverage.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;foo.com                      ✗ (apex domain)
api.tenant-a.foo.com         ✗ (nested subdomain)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For most multi-tenant systems, this is exactly what you want. One certificate, provisioned once, renewed automatically, and it works for every tenant you&apos;ll ever onboard.&lt;/p&gt;
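&lt;p&gt;The single-label rule is mechanical enough to sketch in a few lines of shell. This is an illustrative check, not code from any TLS library: a hostname is covered only when it sits exactly one label below the base domain.&lt;/p&gt;

```shell
# Illustrative sketch of the single-label wildcard rule for *.foo.com.
# Not from any TLS library; shell glob patterns stand in for the real matcher.
covered_by_wildcard() {
  case "$1" in
    *.*.foo.com) return 1 ;;  # nested subdomain: one label too deep
    *.foo.com)   return 0 ;;  # exactly one label under foo.com: covered
    *)           return 1 ;;  # apex domain or unrelated name: not covered
  esac
}

if covered_by_wildcard tenant-a.foo.com; then echo "tenant-a.foo.com: covered"; fi
if ! covered_by_wildcard api.tenant-a.foo.com; then echo "api.tenant-a.foo.com: not covered"; fi
```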
&lt;h2&gt;Why You Must Use DNS-01 Challenges&lt;/h2&gt;
&lt;p&gt;To get a wildcard certificate from Let&apos;s Encrypt (or any ACME-compliant CA), you must use the DNS-01 challenge type. The more common HTTP-01 challenge doesn&apos;t work for wildcards.&lt;/p&gt;
&lt;p&gt;With HTTP-01, the CA verifies domain ownership by requesting a specific file at &lt;code&gt;http://your-domain/.well-known/acme-challenge/token&lt;/code&gt;. But for &lt;code&gt;*.foo.com&lt;/code&gt;, there&apos;s no single HTTP endpoint to verify; the wildcard represents infinite possible subdomains.&lt;/p&gt;
&lt;p&gt;DNS-01 solves this by verifying ownership at the DNS level. Your ACME client requests a wildcard certificate for &lt;code&gt;*.foo.com&lt;/code&gt;, Let&apos;s Encrypt generates a challenge token, and you create a TXT record at &lt;code&gt;_acme-challenge.foo.com&lt;/code&gt; with that token as the value.&lt;/p&gt;
&lt;p&gt;Let&apos;s Encrypt queries public DNS for that TXT record, and if the record exists with the correct value, Let&apos;s Encrypt knows you control the domain and issues the certificate. This means your certificate provisioning system needs &lt;em&gt;programmatic access&lt;/em&gt; to your DNS provider&apos;s API to create and delete TXT records on demand.&lt;/p&gt;
&lt;h2&gt;How DNS-01 Automation Works&lt;/h2&gt;
&lt;p&gt;The key to wildcard certificates is automating the DNS-01 challenge. This requires your web server or load balancer to have API access to your DNS provider. When Let&apos;s Encrypt needs to verify domain ownership, your system creates a temporary TXT record, waits for DNS propagation, completes the challenge, and cleans up the record.&lt;/p&gt;
&lt;p&gt;I&apos;m using Caddy as my reverse proxy with Cloudflare as my DNS provider, but the architecture is the same regardless of your stack. Nginx with cert-manager on Kubernetes works the same way. HAProxy with acme.sh works the same way. The pattern is universally &lt;code&gt;web server + DNS provider plugin + ACME client = automated wildcard certificates&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;The Architecture (Cloudflare Example)&lt;/h3&gt;
&lt;p&gt;The system has three layers. Caddy is the web server that needs TLS certificates. The &lt;code&gt;caddy-dns/cloudflare&lt;/code&gt; module is a thin adapter (only ~120 lines of Go) that sits between Caddy and the actual DNS API client. The &lt;code&gt;libdns/cloudflare&lt;/code&gt; package handles the real work of talking to Cloudflare&apos;s API.&lt;/p&gt;
&lt;p&gt;Caddy handles the web server and ACME logic, &lt;code&gt;certmagic&lt;/code&gt; handles certificate management and renewal, &lt;code&gt;libdns/cloudflare&lt;/code&gt; handles DNS API calls, and the plugin just connects them together.&lt;/p&gt;
&lt;p&gt;This same pattern exists for every major DNS provider. There&apos;s &lt;code&gt;caddy-dns/route53&lt;/code&gt; for AWS, &lt;code&gt;caddy-dns/googleclouddns&lt;/code&gt; for GCP, &lt;code&gt;caddy-dns/azure&lt;/code&gt; for Azure, and plugins for dozens of other providers. The code structure is nearly identical, you just swap the API client.&lt;/p&gt;
&lt;h3&gt;Building Caddy with DNS Provider Support&lt;/h3&gt;
&lt;p&gt;Standard Caddy doesn&apos;t include DNS provider modules. You need to build a custom binary with the plugin compiled in. For Cloudflare, that means installing Caddy&apos;s build tool and compiling in the community plugin.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Install xcaddy (Caddy&apos;s build tool)
go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest

# Build Caddy with the Cloudflare DNS plugin
xcaddy build --with github.com/caddy-dns/cloudflare
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This uses Caddy&apos;s module system to compile the plugin into a single binary. The result is a &lt;code&gt;caddy&lt;/code&gt; executable that includes the DNS provider integration.&lt;/p&gt;
&lt;p&gt;For other providers, just swap the module name &lt;code&gt;--with github.com/caddy-dns/route53&lt;/code&gt; for AWS, &lt;code&gt;--with github.com/caddy-dns/googleclouddns&lt;/code&gt; for GCP, &lt;code&gt;--with github.com/caddy-dns/azure&lt;/code&gt; for Azure. You can even include multiple providers if you manage domains across different DNS platforms.&lt;/p&gt;
&lt;h3&gt;Configuring Your Caddyfile&lt;/h3&gt;
&lt;p&gt;Once you&apos;ve built Caddy with the DNS provider plugin, the actual configuration is remarkably simple. Here&apos;s the complete configuration for wildcard TLS with automatic provisioning and renewal.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;*.foo.com {
    tls {
        dns cloudflare {env.CF_API_TOKEN}
    }

    # Your reverse proxy config
    reverse_proxy localhost:8000
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Three lines of TLS configuration, and you get automatic wildcard certificate provisioning, automatic renewal 30 days before expiration, DNS-01 challenges handled transparently, and zero maintenance.&lt;/p&gt;
&lt;h3&gt;Getting DNS Provider Credentials&lt;/h3&gt;
&lt;p&gt;Your web server needs API credentials to manage DNS records. The specific permissions required are consistent across providers. You need read access to list zones/domains, and write access to create and delete TXT records.&lt;/p&gt;
&lt;p&gt;For Cloudflare, create an API token at &lt;code&gt;https://dash.cloudflare.com/profile/api-tokens&lt;/code&gt; with &lt;code&gt;Zone.Zone:Read&lt;/code&gt; and &lt;code&gt;Zone.DNS:Edit&lt;/code&gt; permissions. For AWS Route53, create an IAM user or role with &lt;code&gt;route53:ListHostedZones&lt;/code&gt;, &lt;code&gt;route53:GetChange&lt;/code&gt;, and &lt;code&gt;route53:ChangeResourceRecordSets&lt;/code&gt; permissions. For GCP Cloud DNS, create a service account with the &lt;code&gt;dns.admin&lt;/code&gt; role scoped to your DNS zone.&lt;/p&gt;
&lt;p&gt;The key is following the principle of least privilege: grant only the permissions needed for DNS challenge automation, nothing more. Export the token as an environment variable before starting Caddy.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export CF_API_TOKEN=&quot;your_token_here&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;{env.CF_API_TOKEN}&lt;/code&gt; placeholder in the Caddyfile will be replaced with this value when Caddy starts.&lt;/p&gt;
&lt;h3&gt;What Happens Under the Hood&lt;/h3&gt;
&lt;p&gt;When you start Caddy, here&apos;s the complete flow.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Configuration Parsing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Caddy reads your Caddyfile and encounters the &lt;code&gt;dns cloudflare&lt;/code&gt; directive. The plugin&apos;s &lt;code&gt;UnmarshalCaddyfile()&lt;/code&gt; function extracts the token from &lt;code&gt;{env.CF_API_TOKEN}&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Token Validation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The plugin validates the token format with a regex: &lt;code&gt;^[A-Za-z0-9_-]{35,50}$&lt;/code&gt;. This catches common mistakes like wrapping the token in quotes or braces, which would cause cryptic API errors later.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Module Provisioning&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Caddy calls the plugin&apos;s &lt;code&gt;Provision()&lt;/code&gt; function, which replaces environment variable placeholders with actual values and performs final validation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Certificate Check&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Caddy checks its certificate cache (default &lt;code&gt;~/.local/share/caddy/certificates/acme-v02.api.letsencrypt.org-directory/&lt;/code&gt;) to see if a valid certificate for &lt;code&gt;*.foo.com&lt;/code&gt; already exists. If so, it loads it and you&apos;re done.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. ACME Challenge Request&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If no valid certificate exists, Caddy&apos;s ACME client requests a certificate from Let&apos;s Encrypt. Let&apos;s Encrypt responds with a DNS-01 challenge of &quot;Prove you control &lt;code&gt;foo.com&lt;/code&gt; by creating a TXT record at &lt;code&gt;_acme-challenge.foo.com&lt;/code&gt; with value &lt;code&gt;xyz123_random_token&lt;/code&gt;.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. DNS Record Creation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here&apos;s where the magic happens. The plugin calls your DNS provider&apos;s API to create the challenge record. The specifics vary by provider, but the pattern is universally to find the zone ID, create a TXT record, and return success.&lt;/p&gt;
&lt;p&gt;The Cloudflare implementation illustrates this pattern clearly. The &lt;code&gt;libdns/cloudflare&lt;/code&gt; client makes two API requests. First, it queries for the zone ID.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET https://api.cloudflare.com/client/v4/zones?name=foo.com
Authorization: Bearer your_token_here
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once the zone ID is retrieved, the client creates the TXT record with the challenge token.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;POST https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records
Authorization: Bearer your_token_here
Content-Type: application/json

{
  &quot;type&quot;: &quot;TXT&quot;,
  &quot;name&quot;: &quot;_acme-challenge.foo.com&quot;,
  &quot;content&quot;: &quot;xyz123_random_token&quot;,
  &quot;ttl&quot;: 120
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates the challenge TXT record with a short TTL (2 minutes). AWS Route53 uses &lt;code&gt;ChangeResourceRecordSets&lt;/code&gt;, GCP uses &lt;code&gt;managedZones.changes.create&lt;/code&gt;, Azure uses their DNS REST API. Different endpoints, same result.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;7. DNS Propagation Wait&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Caddy polls public DNS servers to verify the TXT record has propagated. By default, it uses your system&apos;s DNS resolver, but you can configure a custom resolver.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;*.foo.com {
    tls {
        dns cloudflare {env.CF_API_TOKEN}
        resolvers 1.1.1.1
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using a major public resolver (1.1.1.1 for Cloudflare, 8.8.8.8 for Google) is often faster, and if your DNS host also operates a public resolver, new records tend to show up there first. Caddy makes repeated queries until the record returns the expected value, then proceeds. This step is critical: if DNS propagation is incomplete when Let&apos;s Encrypt checks, the challenge fails.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;8. Challenge Completion&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Caddy tells Let&apos;s Encrypt &quot;The TXT record is ready, check it.&quot; Let&apos;s Encrypt queries multiple DNS servers worldwide to verify the record exists. Once verified, Let&apos;s Encrypt issues the wildcard certificate.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;9. Cleanup&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Once the certificate is issued, the challenge TXT record is no longer needed. The plugin automatically deletes the temporary TXT record to keep your DNS zone clean.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;DELETE https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/{record_id}
Authorization: Bearer your_token_here
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;10. Certificate Storage&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Caddy stores the certificate chain and private key in its certificate cache. The certificate is now ready to use for all &lt;code&gt;*.foo.com&lt;/code&gt; traffic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;11. Automatic Renewal&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Caddy automatically renews certificates 30 days before expiration. The entire DNS-01 challenge flow repeats automatically—create TXT record, wait for propagation, complete challenge, and finally delete TXT record. All with zero human intervention.&lt;/p&gt;
&lt;h2&gt;The Code: How the Plugin Works&lt;/h2&gt;
&lt;p&gt;The entire plugin is just ~120 lines of Go. Let&apos;s look at the key parts.&lt;/p&gt;
&lt;h3&gt;Module Registration&lt;/h3&gt;
&lt;p&gt;The first step is registering the plugin with Caddy&apos;s module system so it can be discovered and loaded at runtime. Here&apos;s how the Cloudflare provider registers itself.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;type Provider struct{ *cloudflare.Provider }

func init() {
    caddy.RegisterModule(Provider{})
}

func (Provider) CaddyModule() caddy.ModuleInfo {
    return caddy.ModuleInfo{
        ID:  &quot;dns.providers.cloudflare&quot;,
        New: func() caddy.Module { return &amp;amp;Provider{new(cloudflare.Provider)} },
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The plugin wraps &lt;code&gt;github.com/libdns/cloudflare&lt;/code&gt; and registers itself as a Caddy module with the ID &lt;code&gt;dns.providers.cloudflare&lt;/code&gt;. When you write &lt;code&gt;dns cloudflare&lt;/code&gt; in your Caddyfile, Caddy loads this module.&lt;/p&gt;
&lt;h3&gt;Caddyfile Parsing&lt;/h3&gt;
&lt;p&gt;The parsing logic handles both inline and block configuration syntaxes, giving you flexibility in how you structure your Caddyfile. Here&apos;s how it works.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;func (p *Provider) UnmarshalCaddyfile(d *caddyfile.Dispenser) error {
    d.Next() // consume directive name

    if d.NextArg() {
        // Single token syntax: cloudflare {env.CF_API_TOKEN}
        p.Provider.APIToken = d.Val()
    } else {
        // Block syntax: cloudflare { api_token ... }
        for nesting := d.Nesting(); d.NextBlock(nesting); {
            switch d.Val() {
            case &quot;api_token&quot;:
                if d.NextArg() {
                    p.Provider.APIToken = d.Val()
                }
            case &quot;zone_token&quot;:
                if d.NextArg() {
                    p.Provider.ZoneToken = d.Val()
                }
            }
        }
    }

    if p.Provider.APIToken == &quot;&quot; {
        return d.Err(&quot;missing API token&quot;)
    }
    return nil
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This implementation supports both inline syntax for simple cases and block syntax when you need multiple configuration options. Here are the two supported formats.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Inline syntax (recommended)
dns cloudflare {env.CF_API_TOKEN}

# Block syntax (for dual tokens)
dns cloudflare {
    api_token {env.CF_API_TOKEN}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Token Validation&lt;/h3&gt;
&lt;p&gt;Before making any API calls, the plugin validates that the token format is correct. This catches configuration errors early with clear error messages. Here&apos;s the validation logic.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;var cloudflareTokenRegexp = regexp.MustCompile(`^[A-Za-z0-9_-]{35,50}$`)

func (p *Provider) Provision(ctx caddy.Context) error {
    // Replace placeholders like {env.CF_API_TOKEN} with actual values
    p.Provider.APIToken = caddy.NewReplacer().ReplaceAll(p.Provider.APIToken, &quot;&quot;)

    if !cloudflareTokenRegexp.MatchString(p.Provider.APIToken) {
        return fmt.Errorf(&quot;API token &apos;%s&apos; appears invalid&quot;, p.Provider.APIToken)
    }
    return nil
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This validates the token format before attempting any API calls. Cloudflare tokens are always 35-50 characters of alphanumerics, dashes, or underscores. If you accidentally wrap the token in quotes or the environment variable is unset, this catches it immediately with a clear error message instead of a cryptic &quot;Invalid request headers&quot; from Cloudflare later.&lt;/p&gt;
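&lt;p&gt;You can reproduce the same sanity check outside Caddy with &lt;code&gt;grep -E&lt;/code&gt;. The token values below are made-up placeholders, not real credentials.&lt;/p&gt;

```shell
# Reproduce the plugin's token format check; the regex is the one from
# caddy-dns/cloudflare. Both token values below are fake placeholders.
valid_cf_token() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_-]{35,50}$'
}

if valid_cf_token "AbCdEfGhIjKlMnOpQrStUvWxYz0123456789_-ab"; then
  echo "well-formed token: accepted"
fi
if ! valid_cf_token "{env.CF_API_TOKEN}"; then
  echo "unexpanded placeholder: rejected"
fi
```

The second case is exactly the unset-environment-variable failure mode: the literal placeholder string contains braces and a dot, so the regex rejects it immediately.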
&lt;h3&gt;The Actual DNS Operations&lt;/h3&gt;
&lt;p&gt;The plugin doesn&apos;t implement DNS operations directly. It delegates to &lt;code&gt;libdns/cloudflare&lt;/code&gt;, which implements the &lt;code&gt;libdns&lt;/code&gt; interface.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;type RecordSetter interface {
    SetRecords(ctx context.Context, zone string, records []Record) ([]Record, error)
}

type RecordDeleter interface {
    DeleteRecords(ctx context.Context, zone string, records []Record) ([]Record, error)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Caddy&apos;s ACME client calls these methods at the appropriate times during the DNS-01 challenge. The plugin is just the adapter that makes Caddy aware of the Cloudflare DNS provider.&lt;/p&gt;
&lt;h2&gt;Debugging and Common Issues&lt;/h2&gt;
&lt;h3&gt;&quot;Invalid request headers&quot;&lt;/h3&gt;
&lt;p&gt;This error means your API token is malformed or the environment variable isn&apos;t set. The first step is to verify the token environment variable is properly configured.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;echo $CF_API_TOKEN
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the output is empty, you&apos;ve found the problem. When the environment variable isn&apos;t set, Caddy tries to use &lt;code&gt;{env.CF_API_TOKEN}&lt;/code&gt; literally as the token, which results in authentication failures from your DNS provider&apos;s API.&lt;/p&gt;
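&lt;p&gt;If the variable looks fine in your shell but Caddy still fails, the Caddy &lt;em&gt;process&lt;/em&gt; may not have received it, since systemd services don&apos;t inherit your login shell&apos;s exports. One way to check on Linux, assuming the process is named &lt;code&gt;caddy&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Inspect the environment of the running Caddy process
tr &apos;\0&apos; &apos;\n&apos; &amp;lt; /proc/$(pgrep -x caddy)/environ | grep &apos;^CF_API_TOKEN=&apos;

# For systemd deployments, set the variable in the unit instead of a shell profile:
#   sudo systemctl edit caddy
#   [Service]
#   Environment=CF_API_TOKEN=your-token-here
&lt;/code&gt;&lt;/pre&gt;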
&lt;h3&gt;&quot;timed out waiting for record to fully propagate&quot;&lt;/h3&gt;
&lt;p&gt;The DNS propagation check is timing out. This usually has one of three causes. First, DNS caching: your local resolver is caching the old &quot;record doesn&apos;t exist&quot; response; use a custom resolver like &lt;code&gt;resolvers 1.1.1.1&lt;/code&gt; in your TLS block. Second, private DNS: &lt;code&gt;foo.com&lt;/code&gt; is defined in &lt;code&gt;/etc/hosts&lt;/code&gt; or resolved by a private DNS server, causing the public DNS verification to fail; use a public resolver or temporarily remove the private DNS entry. Third, zone access: the token doesn&apos;t have access to the zone; verify it has &lt;code&gt;Zone:Read&lt;/code&gt; permission for &lt;code&gt;foo.com&lt;/code&gt;.&lt;/p&gt;
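&lt;p&gt;To see which of these you&apos;re hitting, query the challenge record directly; &lt;code&gt;foo.com&lt;/code&gt; is a placeholder for your zone:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# What the public internet sees (bypasses local caches and private DNS)
dig +short TXT _acme-challenge.foo.com @1.1.1.1

# What your default resolver sees; a mismatch points at caching or split-horizon DNS
dig +short TXT _acme-challenge.foo.com
&lt;/code&gt;&lt;/pre&gt;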
&lt;h3&gt;&quot;expected 1 zone, got 0&quot;&lt;/h3&gt;
&lt;p&gt;The plugin can&apos;t find the zone for your domain. This happens if the domain isn&apos;t in Cloudflare DNS, the API token doesn&apos;t have &lt;code&gt;Zone:Read&lt;/code&gt; permission, or the zone name doesn&apos;t match (e.g., you&apos;re requesting &lt;code&gt;*.sub.foo.com&lt;/code&gt; but only &lt;code&gt;foo.com&lt;/code&gt; is in Cloudflare).&lt;/p&gt;
&lt;h3&gt;Certificate Transparency Logs&lt;/h3&gt;
&lt;p&gt;All certificates issued by public CAs are logged to Certificate Transparency logs. You can see your wildcard cert at &lt;a href=&quot;https://crt.sh&quot;&gt;crt.sh&lt;/a&gt;. Search for &lt;code&gt;%.foo.com&lt;/code&gt; to find wildcard certificates.&lt;/p&gt;
&lt;p&gt;This is a feature, not a bug. It proves certificates were issued legitimately and helps detect mis-issuance. But it also means anyone can see that &lt;code&gt;foo.com&lt;/code&gt; has a wildcard certificate, though they can&apos;t enumerate individual tenant subdomains.&lt;/p&gt;
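&lt;p&gt;crt.sh also exposes a JSON endpoint, which is handy for scripted monitoring. This sketch assumes &lt;code&gt;jq&lt;/code&gt; is installed and uses &lt;code&gt;foo.com&lt;/code&gt; as a placeholder:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# %25 is a URL-encoded &quot;%&quot; wildcard; lists issuance dates, issuers, and names
curl -s &apos;https://crt.sh/?q=%25.foo.com&amp;amp;output=json&apos; \
    | jq -r &apos;.[] | &quot;\(.not_before)  \(.issuer_name)  \(.name_value)&quot;&apos; \
    | sort -u | head
&lt;/code&gt;&lt;/pre&gt;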
&lt;h2&gt;Production Deployment Patterns&lt;/h2&gt;
&lt;h3&gt;Docker Compose&lt;/h3&gt;
&lt;p&gt;For containerized deployments, Docker Compose provides a straightforward way to run Caddy with persistent certificate storage. Here&apos;s a complete configuration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;services:
  caddy:
    build:
      context: .
      dockerfile: Dockerfile.caddy
    ports:
      - &quot;443:443&quot;
      - &quot;80:80&quot;
    environment:
      - CF_API_TOKEN=${CF_API_TOKEN}
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
    restart: unless-stopped

volumes:
  caddy_data:
  caddy_config:
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;caddy_data&lt;/code&gt; volume persists certificates across container restarts. The &lt;code&gt;caddy_config&lt;/code&gt; volume persists Caddy&apos;s runtime configuration.&lt;/p&gt;
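&lt;p&gt;The day-to-day lifecycle with this setup looks roughly like the following; the token can come from your shell or an &lt;code&gt;.env&lt;/code&gt; file next to the compose file, and the certificate path reflects Caddy&apos;s default data directory layout:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Start the stack with the token injected
CF_API_TOKEN=your-token docker compose up -d

# Watch Caddy solve the DNS-01 challenge and obtain the wildcard cert
docker compose logs -f caddy | grep -iE &apos;acme|certificate&apos;

# Certificates live in the persistent volume and survive restarts
docker compose exec caddy ls /data/caddy/certificates
&lt;/code&gt;&lt;/pre&gt;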
&lt;h3&gt;Dockerfile with Cloudflare Plugin&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;FROM caddy:builder AS builder

RUN xcaddy build \
    --with github.com/caddy-dns/cloudflare

FROM caddy:latest

COPY --from=builder /usr/bin/caddy /usr/bin/caddy
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This multi-stage build compiles Caddy with the Cloudflare plugin in the builder stage, then copies just the binary to the final image.&lt;/p&gt;
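&lt;p&gt;To confirm the plugin actually made it into the binary, build the image and list Caddy&apos;s modules (the image tag here is illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker build -f Dockerfile.caddy -t caddy-cloudflare .

# The DNS provider should appear as &quot;dns.providers.cloudflare&quot;
docker run --rm caddy-cloudflare caddy list-modules | grep dns.providers.cloudflare
&lt;/code&gt;&lt;/pre&gt;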
&lt;h3&gt;Kubernetes with Cert-Manager&lt;/h3&gt;
&lt;p&gt;If you&apos;re running Kubernetes, consider using cert-manager instead of running ACME clients on your web servers. Cert-manager is purpose-built for Kubernetes certificate lifecycle management and supports DNS-01 challenges with all major cloud providers.&lt;/p&gt;
&lt;p&gt;Here&apos;s an example with Cloudflare, but cert-manager has built-in support for Route53, Cloud DNS, Azure DNS, and dozens of other providers.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-foo-com
spec:
  secretName: wildcard-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - &quot;*.foo.com&quot;
  - &quot;foo.com&quot;
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@foo.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - dns01:
        cloudflare:
          email: admin@foo.com
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Cert-manager provisions the certificate as a Kubernetes Secret, which your Ingress controller (nginx, Traefik, Envoy, etc.) can reference. The &lt;code&gt;dns01&lt;/code&gt; solver configuration changes based on your provider—swap &lt;code&gt;cloudflare&lt;/code&gt; for &lt;code&gt;route53&lt;/code&gt;, &lt;code&gt;clouddns&lt;/code&gt;, or &lt;code&gt;azuredns&lt;/code&gt; with the appropriate credential references.&lt;/p&gt;
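&lt;p&gt;Wiring this together looks roughly like the commands below, assuming the manifests above are saved as &lt;code&gt;wildcard-cert.yaml&lt;/code&gt;. Note that a &lt;code&gt;ClusterIssuer&lt;/code&gt; reads referenced Secrets from cert-manager&apos;s own namespace:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Create the API token Secret referenced by apiTokenSecretRef
kubectl -n cert-manager create secret generic cloudflare-api-token \
    --from-literal=api-token=&quot;$CF_API_TOKEN&quot;

# Apply the Certificate and ClusterIssuer manifests
kubectl apply -f wildcard-cert.yaml

# READY flips to True once the DNS-01 challenge completes
kubectl get certificate wildcard-foo-com -w

# If issuance stalls, the Challenge resources explain why
kubectl describe challenges
&lt;/code&gt;&lt;/pre&gt;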
&lt;h3&gt;Multi-Region Deployments&lt;/h3&gt;
&lt;p&gt;If you&apos;re running web servers in multiple regions, certificate storage becomes important. File-based storage works for single-server deployments, but multi-region requires shared certificate storage.&lt;/p&gt;
&lt;p&gt;You have three options: mount the certificate directory from a network filesystem like NFS, EFS, or a cloud-provider equivalent; use storage plugins for S3, Consul, Redis, or other distributed stores; or run certificate provisioning centrally and distribute certificates via your secrets management system.&lt;/p&gt;
&lt;p&gt;The simplest approach for most systems is to run certificate provisioning in one region, store certificates in a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault), and distribute to all regions. This keeps the ACME logic centralized while making certificates available everywhere.&lt;/p&gt;
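&lt;p&gt;As a rough sketch of that pattern with AWS Secrets Manager (the secret name and file paths here are illustrative, not prescribed):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# In the provisioning region, publish the renewed cert and key
aws secretsmanager put-secret-value \
    --secret-id wildcard-foo-com-tls \
    --secret-string &quot;$(cat fullchain.pem privkey.pem)&quot;

# In every serving region, pull the latest version and reload the web server
aws secretsmanager get-secret-value \
    --secret-id wildcard-foo-com-tls \
    --query SecretString --output text &amp;gt; /etc/ssl/private/wildcard-foo-com.pem
systemctl reload nginx
&lt;/code&gt;&lt;/pre&gt;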
&lt;h2&gt;Security Considerations&lt;/h2&gt;
&lt;p&gt;The wildcard certificate&apos;s private key protects all your tenant subdomains. If it leaks, an attacker can impersonate any tenant. Protect it like you&apos;d protect your database credentials.&lt;/p&gt;
&lt;h3&gt;Adding Your Domain to the Public Suffix List&lt;/h3&gt;
&lt;p&gt;If you&apos;re running a multi-tenant platform where each tenant gets a subdomain, you should submit your domain to the &lt;a href=&quot;https://publicsuffix.org/&quot;&gt;Public Suffix List&lt;/a&gt;. The PSL is a registry that browsers use to determine security boundaries between sites.&lt;/p&gt;
&lt;p&gt;Without PSL registration, browsers treat &lt;code&gt;tenant-a.foo.com&lt;/code&gt; and &lt;code&gt;tenant-b.foo.com&lt;/code&gt; as the same site. This means one tenant could potentially set cookies readable by another tenant, creating security and privacy issues.&lt;/p&gt;
&lt;p&gt;When you add &lt;code&gt;foo.com&lt;/code&gt; to the PSL, browsers treat each tenant subdomain as an independent site. Cookies set by &lt;code&gt;tenant-a.foo.com&lt;/code&gt; cannot be read by &lt;code&gt;tenant-b.foo.com&lt;/code&gt;. This provides proper isolation between tenants at the browser level.&lt;/p&gt;
&lt;p&gt;Major platforms like GitHub (&lt;code&gt;github.io&lt;/code&gt;), Vercel (&lt;code&gt;vercel.app&lt;/code&gt;), and Netlify (&lt;code&gt;netlify.app&lt;/code&gt;) are all registered on the PSL. If you&apos;re building tenant infrastructure, you should be too. Submit via the &lt;a href=&quot;https://github.com/publicsuffix/list&quot;&gt;PSL GitHub repository&lt;/a&gt; with documentation proving you control the domain and explaining your multi-tenant use case.&lt;/p&gt;
&lt;h3&gt;Token Scope Limiting&lt;/h3&gt;
&lt;p&gt;Your DNS provider credentials should have the minimum required permissions. For Cloudflare, scope tokens to specific zones with only &lt;code&gt;Zone.Zone:Read&lt;/code&gt; and &lt;code&gt;Zone.DNS:Edit&lt;/code&gt;. For AWS Route53, use IAM policies that grant access only to specific hosted zones, not all DNS resources in your account. For GCP Cloud DNS, create service accounts with the &lt;code&gt;dns.admin&lt;/code&gt; role scoped to individual zones, not project-wide access.&lt;/p&gt;
&lt;p&gt;Don&apos;t use global credentials. If your token leaks, the blast radius should be limited to DNS operations on specific zones, not your entire cloud account or DNS infrastructure.&lt;/p&gt;
&lt;h3&gt;Certificate Revocation&lt;/h3&gt;
&lt;p&gt;If you need to revoke a wildcard certificate, you can&apos;t selectively revoke it for one tenant; revocation affects all tenants. This is a fundamental tradeoff of wildcard certificates.&lt;/p&gt;
&lt;p&gt;If you need per-tenant revocation capability, you need per-tenant certificates. For most systems, the operational simplicity of wildcards outweighs this limitation.&lt;/p&gt;
&lt;h3&gt;Rate Limits&lt;/h3&gt;
&lt;p&gt;Let&apos;s Encrypt rate limits are 50 certificates per registered domain per week, 5 failed validation attempts per account per hostname per hour, and 300 new orders per account per 3 hours. With a wildcard certificate, you&apos;re provisioning one certificate regardless of tenant count, so you&apos;ll never hit the 50 certificates per week limit. This is a massive advantage over per-tenant certificates.&lt;/p&gt;
&lt;h2&gt;When NOT to Use Wildcard Certificates&lt;/h2&gt;
&lt;p&gt;Skip wildcards if tenants bring their own domains. If tenants use &lt;code&gt;tenant-a.com&lt;/code&gt; instead of &lt;code&gt;tenant-a.foo.com&lt;/code&gt;, you need per-tenant certificates. You can still automate this with ACME HTTP-01 challenges, but you&apos;ll need per-tenant certificate management.&lt;/p&gt;
&lt;p&gt;Skip them if you need deep subdomain nesting. Wildcards only cover one level—&lt;code&gt;*.foo.com&lt;/code&gt; doesn&apos;t cover &lt;code&gt;api.tenant-a.foo.com&lt;/code&gt;. If your architecture requires nested subdomains, you either need multiple wildcard certificates or per-tenant certificates.&lt;/p&gt;
&lt;p&gt;Skip them if regulatory compliance requires certificate isolation. Some compliance frameworks require cryptographic isolation between tenants. If your wildcard private key is compromised, all tenants are affected. For these environments, per-tenant certificates provide isolation.&lt;/p&gt;
&lt;p&gt;Skip them if you need per-tenant certificate revocation. If you might need to revoke access for individual tenants by revoking their certificate, wildcard certificates won&apos;t work.&lt;/p&gt;
&lt;h2&gt;The Bottom Line&lt;/h2&gt;
&lt;p&gt;For multi-tenant systems with &lt;code&gt;tenant-id.foo.com&lt;/code&gt; subdomains, wildcard certificates are the right choice. The implementation pattern is the same regardless of your infrastructure, pick a web server (Caddy, Nginx, HAProxy), integrate with your DNS provider&apos;s API (Cloudflare, Route53, Cloud DNS, Azure DNS), and let ACME automation handle the rest.&lt;/p&gt;
&lt;p&gt;The alternative, per-tenant certificates, is operationally complex, technically fragile, and doesn&apos;t scale past a few hundred tenants. Wildcard certificates are the pragmatic choice, and modern tooling makes them trivial to implement across any cloud platform.&lt;/p&gt;
&lt;p&gt;If you&apos;re building &lt;code&gt;tenant-id.foo.com&lt;/code&gt; infrastructure, this is the way.&lt;/p&gt;
</content:encoded><category>infrastructure</category><category>security</category><author>Nick Khami</author></item><item><title>XGBoost Is All You Need</title><link>https://www.skeptrune.com/posts/xgboost-is-all-you-need/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/xgboost-is-all-you-need/</guid><description>Why XGBoost remains the go-to algorithm for structured data problems, and how it consistently outperforms neural networks on tabular datasets.</description><pubDate>Sun, 12 Oct 2025 00:00:00 GMT</pubDate><content:encoded>
&lt;p&gt;I spent two and a half years at a well-funded search startup building systems that used LLMs to answer questions via RAG (Retrieval Augmented Generation). We&apos;d retrieve relevant documents, feed them to an LLM, and ask it to synthesize an answer.&lt;/p&gt;
&lt;p&gt;I came out of that experience with one overwhelming conviction: we were doing it backwards. The problem was that we were asking LLMs &quot;what&apos;s the answer?&quot; instead of &quot;what do we need to know?&quot;&lt;/p&gt;
&lt;p&gt;LLMs are brilliant at reading and synthesizing information at massive scale. You can spawn infinite instances in parallel to process thousands of documents, extract insights, and transform unstructured text into structured data. They&apos;re like having an army of research assistants who never sleep and work for pennies.&lt;/p&gt;
&lt;h2&gt;A real example: Predicting NFL running back performance&lt;/h2&gt;
&lt;p&gt;Forecasting how many rushing yards an NFL running back will gain in their next game is a perfect example of this architecture. It&apos;s influenced by historical statistics (previous yards, carries, opponent defense), qualitative factors (recent press coverage, injury concerns, offensive line health), and game context (Vegas betting lines, projected workload).&lt;/p&gt;
&lt;h3&gt;The wrong approach: Asking LLMs directly&lt;/h3&gt;
&lt;p&gt;You could ask ChatGPT&apos;s Deep Research feature to predict every game in a week. It would use web search to gather context, think about each matchup, and give you predictions.&lt;/p&gt;
&lt;p&gt;This approach is fundamentally broken. It&apos;s unscalable (each prediction requires manual prompting and waiting), the output is unstructured (you&apos;d need to manually parse each response and log it in a spreadsheet), it&apos;s unreliable (LLMs are trained to sound plausible, not to optimize for numerical accuracy), and you can&apos;t learn from it (each prediction is independent—there&apos;s no way to improve based on what worked).&lt;/p&gt;
&lt;p&gt;This is the &quot;ask the LLM what&apos;s the answer&quot; approach. It feels like you&apos;re doing AI, but you&apos;re really just creating an expensive, slow research assistant that makes gut-feel predictions.&lt;/p&gt;
&lt;h3&gt;The right approach: LLMs for feature engineering&lt;/h3&gt;
&lt;p&gt;Instead of asking &quot;How many yards will Derrick Henry rush for?&quot;, we ask the LLM to transform unstructured information into structured features. Search for recent press coverage and rate sentiment 1-10. Analyze injury reports and rate concern level 1-5. Evaluate opponent&apos;s run defense and rate weakness 1-10.&lt;/p&gt;
&lt;p&gt;This is scalable (run 100+ feature extractions in parallel), structured (everything becomes a number XGBoost can use), and improves over time (XGBoost learns which features actually matter).&lt;/p&gt;
&lt;p&gt;I started with basic statistical features from the NFL API: yards and carries from the previous week, 3-week rolling averages, that kind of thing. These are helpful, but they miss important context.&lt;/p&gt;
&lt;p&gt;So I had the LLM engineer seven qualitative features: press coverage sentiment, injury concerns, opponent defense weakness, offensive line health, Vegas sentiment, projected workload share, and game script favorability. An agent loop with web search processed context about each player and game to populate these features—searching for news in the week leading up to the game and rating each factor on a numerical scale.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Interactive demo: LLM feature extraction]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Once we run this process for every running back each week, we end up with a dataset that has both statistical and LLM-engineered qualitative features.&lt;/p&gt;
&lt;h2&gt;Training XGBoost&lt;/h2&gt;
&lt;p&gt;I split the data chronologically—early weeks for training, later weeks for testing—and trained two models. A baseline using only statistical features (previous yards, carries, rolling averages), and an enhanced model using both statistical and LLM-engineered features.&lt;/p&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;[Interactive demo: model comparison]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The LLM-enhanced model reduced prediction error by &lt;strong&gt;22.6%&lt;/strong&gt;. The baseline model was actually worse than just predicting the average yards (R² of -0.025), while the enhanced model explained 38.6% of the variance.&lt;/p&gt;
&lt;p&gt;But that&apos;s not the interesting part. The interesting part is what XGBoost actually learned.&lt;/p&gt;
&lt;h2&gt;What XGBoost Actually Learned&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;[Interactive demo: feature importance rankings]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Six of the top seven most important features are LLM-engineered.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The top feature is average carries over the last 3 weeks (statistical). The second most important feature is press coverage sentiment (LLM). Then game script prediction (LLM), Vegas sentiment (LLM), projected workload share (LLM), offensive line health (LLM), and injury concern (LLM).&lt;/p&gt;
&lt;p&gt;I didn&apos;t tell XGBoost that press sentiment matters more than injury concerns, or that game script prediction is more important than offensive line health. The model discovered these patterns on its own by analyzing which features actually correlated with rushing yards.&lt;/p&gt;
&lt;p&gt;The most predictive LLM feature, press coverage sentiment, captures momentum and narrative that doesn&apos;t show up in raw statistics. When a running back is getting positive press coverage, they tend to get more carries and perform better. XGBoost found this signal and learned to weight it heavily.&lt;/p&gt;
&lt;p&gt;This is the power of the hybrid approach: LLMs transform messy, unstructured context into clean features. XGBoost discovers which features actually matter. Neither could do this alone.&lt;/p&gt;
&lt;h2&gt;Why This Matters&lt;/h2&gt;
&lt;p&gt;This isn&apos;t just about NFL predictions. Email prioritization, Slack message routing, pull request quality assessment, prediction market opportunities, customer support triage—every one of these problems has the same structure. Some structured data combined with unstructured context that needs to be transformed into a prediction.&lt;/p&gt;
&lt;p&gt;The architecture is identical every time: use LLMs in parallel to extract features from unstructured data, combine with structured features, train XGBoost to find patterns, deploy and iterate.&lt;/p&gt;
&lt;p&gt;Setting this up from scratch takes way too much time. I want tools that make this trivial—upload your data, describe what you want to predict, and get back a trained model with a deployment-ready API.&lt;/p&gt;
&lt;h2&gt;Why These Tools Don&apos;t Exist&lt;/h2&gt;
&lt;p&gt;The tools I&apos;m describing could exist today. The technology is mature and proven. So why hasn&apos;t anyone built them?&lt;/p&gt;
&lt;p&gt;Random forests don&apos;t raise $1B rounds.&lt;/p&gt;
&lt;p&gt;Founders are building pure-LLM systems because that&apos;s what gets funded. VCs get excited about foundation models and AGI, not about elegant hybrid architectures that combine 2019-era XGBoost with LLM feature engineering.&lt;/p&gt;
&lt;p&gt;This is the real problem with modern AI development. Not that the technology isn&apos;t good enough—it&apos;s that incentives are backwards. VC-led engineering is bad engineering. The best technical solutions rarely align with what makes a compelling pitch deck.&lt;/p&gt;
&lt;p&gt;Everyone&apos;s building the wrong thing because they&apos;re building what raises money instead of what solves problems.&lt;/p&gt;
&lt;p&gt;If you&apos;re a builder who cares more about solving real problems than raising huge rounds, there&apos;s a massive opportunity here. Build the boring, practical tools that let people deploy these hybrid systems in minutes instead of weeks. Build what actually works instead of what sounds impressive.&lt;/p&gt;
&lt;h2&gt;The Right Tool for the Right Job&lt;/h2&gt;
&lt;p&gt;The future of ML isn&apos;t pure LLMs or pure classical ML—it&apos;s knowing which tool to use for which job.&lt;/p&gt;
&lt;p&gt;Don&apos;t ask LLMs &quot;what&apos;s the answer?&quot; Ask them &quot;what do we need to know?&quot; Then let XGBoost find the patterns in those answers.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Want to see the full implementation? Check out the &lt;a href=&quot;https://github.com/skeptrunedev/personal-site/blob/main/data-analysis/xgboost-nfl-rb.ipynb&quot;&gt;complete Jupyter notebook walkthrough&lt;/a&gt; with all the code, data processing steps, training, and visualizations.&lt;/p&gt;
</content:encoded><category>ai</category><category>machine-learning</category><author>Nick Khami</author></item><item><title>Use the Accept Header to serve Markdown instead of HTML to LLMs</title><link>https://www.skeptrune.com/posts/use-the-accept-header-to-serve-markdown-instead-of-html-to-llms/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/use-the-accept-header-to-serve-markdown-instead-of-html-to-llms/</guid><description>Make your website accessible to LLM agents by serving plain Markdown when they request text/plain or text/markdown. Save tokens and improve agent experience with this simple Astro and Cloudflare Workers implementation.</description><pubDate>Sat, 27 Sep 2025 18:52:00 GMT</pubDate><content:encoded>&lt;p&gt;Agents don&apos;t need to see websites with markup and styling; anything other than plain Markdown is just wasted money spent on context tokens.&lt;/p&gt;
&lt;p&gt;I decided to make my Astro sites more accessible to LLMs by having them return Markdown versions of pages when the &lt;code&gt;Accept&lt;/code&gt; header has &lt;code&gt;text/plain&lt;/code&gt; or &lt;code&gt;text/markdown&lt;/code&gt; preceding &lt;code&gt;text/html&lt;/code&gt;. This was very heavily inspired by &lt;a href=&quot;https://x.com/bunjavascript/status/1971934734940098971&quot;&gt;this post on X from bunjavascript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Hopefully this helps SEO too, since agents are a big chunk of my traffic. The Bun team reported a 10x token drop for Markdown, and since frontier labs pay per token, cheaper pages should get scraped more, be more likely to end up in training data, and give me a little extra lift from assistants and search.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You can check out the feature live by running &lt;code&gt;curl -H &quot;Accept: text/markdown&quot; https://www.skeptrune.com&lt;/code&gt; or &lt;code&gt;curl -H &quot;Accept: text/plain&quot; https://www.skeptrune.com&lt;/code&gt; in your terminal.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Static Site Generators are already halfway there&lt;/h2&gt;
&lt;p&gt;Static site generators like Astro and Gatsby already generate a big folder of HTML files, typically in a &lt;code&gt;dist&lt;/code&gt; or &lt;code&gt;public&lt;/code&gt; folder through an &lt;code&gt;npm run build&lt;/code&gt; command. The only thing missing is a way to convert those HTML files to markdown.&lt;/p&gt;
&lt;p&gt;It turns out there&apos;s a great CLI tool for this called &lt;a href=&quot;https://www.npmjs.com/package/@wcj/html-to-markdown-cli&quot;&gt;html-to-markdown&lt;/a&gt; that can be installed with &lt;code&gt;npm install -D @wcj/html-to-markdown-cli&lt;/code&gt; and run during a build step using &lt;code&gt;npx&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s a quick Bash script an LLM wrote to convert all HTML files in &lt;code&gt;dist/html&lt;/code&gt; to Markdown files in &lt;code&gt;dist/markdown&lt;/code&gt;, preserving the directory structure:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# convert-to-markdown.sh
mkdir -p dist/markdown

find dist/html -type f -name &quot;*.html&quot; | while read -r file; do
    relative_path=&quot;${file#dist/html/}&quot;
    dest_path=&quot;dist/markdown/${relative_path%.html}.md&quot;
    mkdir -p &quot;$(dirname &quot;$dest_path&quot;)&quot;
    npx @wcj/html-to-markdown-cli &quot;$file&quot; --stdout &amp;gt; &quot;$dest_path&quot;
done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once you have the conversion script in place, the next step is to make it run as a post-build action. Here&apos;s an example of how to modify your &lt;code&gt;package.json&lt;/code&gt; scripts section:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&quot;scripts&quot;: {
  &quot;build&quot;: &quot;astro build &amp;amp;&amp;amp; yarn mv-html &amp;amp;&amp;amp; yarn convert-to-markdown&quot;,
  &quot;mv-html&quot;: &quot;mkdir -p dist/html &amp;amp;&amp;amp; find dist -type f -name &apos;*.html&apos; -not -path &apos;dist/html/*&apos; -exec sh -c &apos;for f; do dest=\&quot;dist/html/${f#dist/}\&quot;; mkdir -p \&quot;$(dirname \&quot;$dest\&quot;)\&quot;; mv -f \&quot;$f\&quot; \&quot;$dest\&quot;; done&apos; sh {} +&quot;,
  &quot;convert-to-markdown&quot;: &quot;bash convert-to-markdown.sh&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Moving all HTML files to &lt;code&gt;dist/html&lt;/code&gt; first is only necessary if you&apos;re using Cloudflare Workers, which will serve existing static assets before falling back to your Worker. If you&apos;re using a traditional reverse proxy, you can skip that step and just convert directly from &lt;code&gt;dist&lt;/code&gt; to &lt;code&gt;dist/markdown&lt;/code&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; I learned after I finished the project that I could have added &lt;code&gt;run_worker_first = [&quot;*&quot;]&lt;/code&gt; to my &lt;code&gt;wrangler.json&lt;/code&gt; so I didn&apos;t have to move any files around. That field forces the worker to always run first. Shoutout to the kind folks on Reddit for telling me.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Cloudflare Workers-specific configuration&lt;/h2&gt;
&lt;p&gt;I pushed myself to go out of my comfort zone and learn Cloudflare Workers for this project since my company uses them extensively. If you&apos;re using a traditional reverse proxy like Nginx or Caddy, you can skip this section (and honestly, you&apos;ll have a much easier time).&lt;/p&gt;
&lt;p&gt;If you&apos;re coming from traditional reverse proxy servers, Cloudflare Workers force you into a different paradigm. What would normally be a simple Nginx or Caddy rule becomes custom &lt;code&gt;wrangler.jsonc&lt;/code&gt; configuration, moving your entire site to a shadow directory so Cloudflare doesn&apos;t serve static assets by default, and writing JavaScript to manually check headers and call &lt;code&gt;env.ASSETS.fetch&lt;/code&gt; to serve files. SO MANY STEPS TO MAKE A SIMPLE FILE SERVER!&lt;/p&gt;
&lt;p&gt;This experience finally made Next.js &apos;middleware&apos; click for me. It&apos;s not actually middleware in the traditional sense of a REST API; it&apos;s more like &apos;use this where you would normally have a real reverse proxy.&apos; Both Cloudflare Workers and Next.js Middleware are essentially JavaScript-based reverse proxies that intercept requests before they hit your application.&lt;/p&gt;
&lt;p&gt;While I&apos;d personally prefer Terraform with a hyperscaler or a VPS for a more traditional setup, new startups love this pattern, so it&apos;s worth understanding.&lt;/p&gt;
&lt;p&gt;Here&apos;s an example of a working &lt;code&gt;wrangler.jsonc&lt;/code&gt; file to refer to a new worker script and also bind your build output directory as a static asset namespace:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;main&quot;: &quot;worker.js&quot;,
  &quot;assets&quot;: {
    &quot;directory&quot;: &quot;./dist&quot;,
    &quot;binding&quot;: &quot;ASSETS&quot;
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Below is a minimal worker script that inspects the &lt;code&gt;Accept&lt;/code&gt; header and serves markdown when requested, otherwise falls back to HTML:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const acceptHeader = request.headers.get(&quot;accept&quot;) || &quot;&quot;;
    const acceptTypes = acceptHeader.split(&quot;,&quot;);

    const plainIndex = acceptTypes.findIndex(
      (t) =&amp;gt; t.includes(&quot;text/plain&quot;) || t.includes(&quot;text/markdown&quot;)
    );
    const htmlIndex = acceptTypes.findIndex((t) =&amp;gt; t.includes(&quot;text/html&quot;));
    const prefersMarkdown =
      plainIndex !== -1 &amp;amp;&amp;amp; (htmlIndex === -1 || plainIndex &amp;lt; htmlIndex);

    const tryServeContent = async (format) =&amp;gt; {
      let contentType;
      if (format === &quot;markdown&quot;) {
        if (url.pathname == &quot;&quot; || url.pathname == &quot;/&quot;) {
          const sitemapResponse = await env.ASSETS.fetch(
            new Request(new URL(&quot;/sitemap-0.xml&quot;, request.url))
          );
          if (sitemapResponse.ok) {
            const content = await sitemapResponse.text();
            return new Response(content, {
              headers: {
                &quot;Content-Type&quot;: &quot;application/xml; charset=utf-8&quot;,
                &quot;Cache-Control&quot;: &quot;public, max-age=3600&quot;,
              },
            });
          }
        }

        contentType = &quot;text/plain; charset=utf-8&quot;;
        let distPath = `/markdown${url.pathname}`;

        if (!distPath.endsWith(&quot;.md&quot;) &amp;amp;&amp;amp; !distPath.endsWith(&quot;/&quot;)) {
          distPath += &quot;/index.md&quot;;
        } else if (distPath.endsWith(&quot;/&quot;)) {
          distPath += &quot;index.md&quot;;
        }

        if (url.pathname === &quot;/&quot;) {
          distPath = &quot;/markdown/index.md&quot;;
        }

        try {
          const response = await env.ASSETS.fetch(
            new Request(new URL(distPath, request.url))
          );
          if (response.ok) {
            const content = await response.text();
            return new Response(content, {
              headers: {
                &quot;Content-Type&quot;: contentType,
                &quot;Cache-Control&quot;: &quot;public, max-age=3600&quot;,
              },
            });
          }
        } catch (error) {
          console.error(`Error fetching markdown file from ${distPath}:`, error);
        }
      } else {
        contentType = &quot;text/html; charset=utf-8&quot;;
        let distPath = `/html${url.pathname}`;

        if (!distPath.endsWith(&quot;.html&quot;) &amp;amp;&amp;amp; !distPath.endsWith(&quot;/&quot;)) {
          distPath += &quot;/index.html&quot;;
        } else if (distPath.endsWith(&quot;/&quot;)) {
          distPath += &quot;index.html&quot;;
        }

        // Handle root path
        if (url.pathname === &quot;/&quot;) {
          distPath = &quot;/html/index.html&quot;;
        }

        try {
          const response = await env.ASSETS.fetch(
            new Request(new URL(distPath, request.url))
          );
          if (response.ok) {
            const content = await response.text();
            return new Response(content, {
              headers: {
                &quot;Content-Type&quot;: contentType,
                &quot;Cache-Control&quot;: &quot;public, max-age=3600&quot;,
              },
            });
          }
        } catch (error) {
          console.error(`Error fetching HTML file from ${distPath}:`, error);
        }
      }

      return null;
    };

    if (prefersMarkdown) {
      const markdownResponse = await tryServeContent(&quot;markdown&quot;);
      if (markdownResponse) return markdownResponse;

      const htmlResponse = await tryServeContent(&quot;html&quot;);
      if (htmlResponse) return htmlResponse;
    } else {
      const htmlResponse = await tryServeContent(&quot;html&quot;);
      if (htmlResponse) return htmlResponse;

      const markdownResponse = await tryServeContent(&quot;markdown&quot;);
      if (markdownResponse) return markdownResponse;
    }

    return await env.ASSETS.fetch(
      new Request(new URL(&quot;/html/404.html&quot;, request.url))
    );
  },
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pro tip: make the root path &lt;code&gt;/&lt;/code&gt; serve your sitemap.xml instead of markdown content for your homepage such that an agent visiting your root URL can see all the links on your site.&lt;/p&gt;
&lt;h2&gt;Caddy configuration&lt;/h2&gt;
&lt;p&gt;It&apos;s likely much easier to set this up with a traditional reverse proxy and file server like Caddy or Nginx. Here&apos;s a simple Caddyfile configuration that does the same thing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;your-personal-domain.com {
    root * /path/to/your/dist

    @markdown {
        header Accept *text/markdown* *text/plain*
        not header Accept *text/html*
    }
    handle @markdown {
        rewrite * /markdown{path}
        try_files {path} {path}/index.md {path}.md /markdown/index.md
        file_server
    }

    handle {
        rewrite * /html{path}
        try_files {path} {path}/index.html {path}.html /html/index.html
        file_server
    }

    handle_errors {
        rewrite * /html/404.html
        file_server
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I will leave Nginx configuration as an exercise for the reader or perhaps the reader&apos;s LLM of choice.&lt;/p&gt;
&lt;h2&gt;Conclusion: A More Accessible Web for Agents&lt;/h2&gt;
&lt;p&gt;By serving lean, semantic Markdown to LLM agents, you can achieve a 10x reduction in token usage while making your content more accessible and efficient for the AI systems that increasingly browse the web. This optimization isn&apos;t just about saving money; it&apos;s about GEO (Generative Engine Optimization) for a changed world where millions of users discover content through AI assistants.&lt;/p&gt;
&lt;p&gt;Astro&apos;s flexibility made this implementation surprisingly straightforward. It only took me a couple of hours to get both the personal blog you&apos;re reading now and &lt;a href=&quot;https://www.patron.com&quot;&gt;patron.com&lt;/a&gt; to support this feature.&lt;/p&gt;
&lt;p&gt;If you&apos;re ready to make your site agent-friendly, I encourage you to try this out. For a fun exercise, copy this article&apos;s URL and ask your favorite LLM to &quot;Use the blog post to write a Cloudflare Worker for my own site.&quot; See how it does! You can also check out the source code for this feature at &lt;a href=&quot;https://github.com/skeptrunedev/personal-site&quot;&gt;github.com/skeptrunedev/personal-site&lt;/a&gt; to get started.&lt;/p&gt;
&lt;p&gt;I&apos;m excited to see the impact of this change on my site&apos;s analytics and hope it inspires others. If you implement this on your own site, I&apos;d love to hear about your experience! Connect with me on &lt;a href=&quot;https://x.com/skeptrune&quot;&gt;X&lt;/a&gt; or &lt;a href=&quot;https://www.linkedin.com/in/khami/&quot;&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>ai</category><author>Nick Khami</author></item><item><title>VPS Evangelism and Building LLM-over-DNS</title><link>https://www.skeptrune.com/posts/llm-over-dns/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/llm-over-dns/</guid><description>From PaaS hell to VPS enlightenment: Why the skill of confidently deploying anything to the internet is a hacker&apos;s superpower. Plus, a hands-on tutorial showing how to build your own LLM-over-DNS service in 30 minutes—because sometimes the most impressive demos come from the simplest infrastructure.</description><pubDate>Wed, 06 Aug 2025 18:52:00 GMT</pubDate><content:encoded>&lt;p&gt;My most valuable skill as a hacker/entrepreneur is that I&apos;m confident deploying arbitrary programs that work locally to the internet. Sounds simple, but it&apos;s really the core of what got me into Y-Combinator and later helped me raise a seed round.&lt;/p&gt;
&lt;h2&gt;Being on the Struggle Bus Early&lt;/h2&gt;
&lt;p&gt;When I was starting out hacking as a kid, one of the first complete things I built was a weather reply bot for Twitter. It read from the firehose API, monitored for mentions and city names, then replied with current weather conditions when it got @&apos;ed. My parents got me a Raspberry Pi for Christmas and I found a tutorial online. I got it working locally and then got completely stuck on deployment.&lt;/p&gt;
&lt;p&gt;The obvious next step was using my Pi as a server, but that was a disaster. My program had bugs and would crash while I was away. Then I couldn&apos;t SSH back in because my house didn&apos;t have a static IP and Tailscale wasn&apos;t a thing yet. It only worked on and off when I was home and could babysit it.&lt;/p&gt;
&lt;h2&gt;Skipping Straight to PaaS Hell&lt;/h2&gt;
&lt;p&gt;When I started building web applications, I somehow skipped the VPS entirely and went straight to Platform-as-a-Service (PaaS) solutions like Vercel and Render. I have no idea why. I was googling &quot;how do I deploy my create react app&quot; and somehow the top answer was to deploy to some third-party service that handled build steps, managed SSL, and was incredibly complicated and time-consuming.&lt;/p&gt;
&lt;p&gt;There was always some weird limitation: memory constraints during builds, or Puppeteer failing to run because the platform lacked the right apt packages. Then I was stuck configuring Docker images, and since AI wasn&apos;t a thing yet and I&apos;d never used Docker at a real job, it was all a disaster. I wasted more time trying to deploy my crappy React slop than building it.&lt;/p&gt;
&lt;h2&gt;Getting Saved by a VPS Maximalist&lt;/h2&gt;
&lt;p&gt;During college, I got lucky and met a hacky startup entrepreneur who was hiring. I decided to take a chance and join, even though the whole operation seemed barely legitimate.&lt;/p&gt;
&lt;p&gt;Going into the job, I had this assumption that the &quot;right&quot; way to deploy was on AWS or some other hyperscaler. But this guy&apos;s mindset was the complete opposite—he was a VPS maximalist with a beautifully simple philosophy: rent a VPS, SSH in, do the same thing you did locally (&lt;code&gt;yarn dev&lt;/code&gt; or whatever), throw up a reverse proxy, and call it a day. I watched him deploy like this over and over, and eventually he walked me through it myself a few times.&lt;/p&gt;
&lt;p&gt;It was all so small and easy to learn, but it made me exponentially more confident as a builder. I never directly thought, &quot;I can&apos;t build this because I won&apos;t be able to deploy it,&quot; but the general insecurity definitely caused a hesitancy and procrastination that immediately went away.&lt;/p&gt;
&lt;h2&gt;Paying It Forward&lt;/h2&gt;
&lt;p&gt;I&apos;ve become an evangelist for this approach and wanted to write about it for a long time, but didn&apos;t know how to frame it entertainingly. Then on X, I got inspiration when levelsio posted a tweet about &lt;a href=&quot;https://x.com/levelsio/status/1952861177731793324&quot;&gt;deploying a DNS server on Hetzner that lets you talk to an LLM&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Want to see it in action? Try this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dig @llm.skeptrune.com &quot;what is the meaning of life?&quot; TXT +short
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Getting that set up is probably more interesting than my rambling, so here&apos;s how to deploy your own LLM-over-DNS proxy on a VPS in less than half an hour with nothing other than a rented server.&lt;/p&gt;
&lt;h2&gt;Step 1: Access Your VPS&lt;/h2&gt;
&lt;p&gt;After purchasing your VPS, you&apos;ll receive an IP address and login credentials (usually via email). Connect to your server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh root@&amp;lt;your-vps-ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Replace &lt;code&gt;&amp;lt;your-vps-ip&amp;gt;&lt;/code&gt; with your actual server IP address.&lt;/p&gt;
&lt;h2&gt;Step 2: Clear Existing DNS Services&lt;/h2&gt;
&lt;p&gt;Many VPS images come with &lt;code&gt;systemd-resolved&lt;/code&gt; or &lt;code&gt;bind9&lt;/code&gt; pre-installed. To avoid conflicts, remove or disable them. Note that after disabling &lt;code&gt;systemd-resolved&lt;/code&gt;, you may need to add a nameserver like &lt;code&gt;1.1.1.1&lt;/code&gt; to &lt;code&gt;/etc/resolv.conf&lt;/code&gt; so the server itself can still resolve hostnames:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Check for running DNS services
systemctl list-units --type=service | grep -E &apos;bind|dns|systemd-resolved&apos;

# Stop and disable systemd-resolved (if present)
systemctl stop systemd-resolved
systemctl disable systemd-resolved

# Remove bind9 (if present)
apt-get remove --purge bind9 -y
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Step 3: Install Python Dependencies&lt;/h2&gt;
&lt;p&gt;Install the required Python packages for our DNS server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install dnslib requests
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Step 4: Create the DNS-to-LLM Proxy Script&lt;/h2&gt;
&lt;p&gt;Create a Python script that listens for DNS queries, treats the question as a prompt, sends it to the OpenRouter LLM API, and returns the response in a TXT record:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from dnslib.server import DNSServer, BaseResolver
from dnslib import RR, QTYPE, TXT
import requests
import codecs

OPENROUTER_API_KEY = &quot;&quot;  # Add your OpenRouter API key here
LLM_API_URL = &quot;https://openrouter.ai/api/v1/chat/completions&quot;

class LLMResolver(BaseResolver):
    def resolve(self, request, handler):
        qname = request.q.qname
        qtype = QTYPE[request.q.qtype]
        prompt = str(qname).rstrip(&apos;.&apos;)
        
        # Forward prompt to LLM
        try:
            response = requests.post(
                LLM_API_URL,
                headers={
                    &quot;Authorization&quot;: f&quot;Bearer {OPENROUTER_API_KEY}&quot;,
                    &quot;Content-Type&quot;: &quot;application/json&quot;
                },
                json={
                    &quot;model&quot;: &quot;openai/gpt-3.5-turbo&quot;,
                    &quot;messages&quot;: [{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: prompt}]
                },
                timeout=10
            )
            response.raise_for_status()
            raw_answer = response.json()[&quot;choices&quot;][0][&quot;message&quot;][&quot;content&quot;]
        except Exception as e:
            raw_answer = f&quot;Error: {str(e)}&quot;
        
        try:
            answer = codecs.decode(raw_answer.encode(&apos;utf-8&apos;), &apos;unicode_escape&apos;)
        except Exception:
            answer = raw_answer.replace(&apos;\\010&apos;, &apos;\n&apos;).replace(&apos;\\n&apos;, &apos;\n&apos;)
        
        reply = request.reply()
        if qtype == &quot;TXT&quot;:
            # Split long responses into chunks of 200 chars (safe limit)
            chunk_size = 200
            if len(answer) &amp;gt; chunk_size:
                chunks = [answer[i:i+chunk_size] for i in range(0, len(answer), chunk_size)]
                for i, chunk in enumerate(chunks):
                    reply.add_answer(RR(qname, QTYPE.TXT, rdata=TXT(f&quot;[{i+1}/{len(chunks)}] {chunk}&quot;)))
            else:
                reply.add_answer(RR(qname, QTYPE.TXT, rdata=TXT(answer)))
        return reply

if __name__ == &quot;__main__&quot;:
    resolver = LLMResolver()
    server = DNSServer(resolver, port=53, address=&quot;0.0.0.0&quot;)
    server.start_thread()
    import time
    while True:
        time.sleep(1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Save this as &lt;code&gt;llm_dns.py&lt;/code&gt; on your VPS.&lt;/p&gt;
&lt;p&gt;Before running, you need to set your OpenRouter API key. For this tutorial, you can paste it directly into the &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; variable. For anything more serious, you should use an environment variable to keep your key out of the code.&lt;/p&gt;
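&lt;p&gt;As a rough sketch of the environment-variable approach (the variable name matches the script above, but the helper function is illustrative, not part of the original script):&lt;/p&gt;

```python
import os

def load_api_key(var_name="OPENROUTER_API_KEY"):
    """Return the API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(var_name, "")
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before starting the server.")
    return key
```

&lt;p&gt;You&apos;d then launch with the key in the environment, e.g. &lt;code&gt;OPENROUTER_API_KEY=sk-... sudo -E python3 llm_dns.py&lt;/code&gt; (the &lt;code&gt;-E&lt;/code&gt; flag tells sudo to preserve the environment).&lt;/p&gt;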
&lt;p&gt;&lt;strong&gt;Security Note&lt;/strong&gt;: This is a proof-of-concept. For production use, you&apos;d want proper process management (systemd), logging, rate limiting, and to avoid storing API keys in plaintext.&lt;/p&gt;
&lt;h2&gt;Step 5: Run the DNS-LLM Proxy&lt;/h2&gt;
&lt;p&gt;Start the DNS server (port 53 requires root privileges):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo python3 llm_dns.py
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Step 6: Test Your Service&lt;/h2&gt;
&lt;p&gt;From another machine, send a DNS TXT query to test your setup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dig @&amp;lt;your-vps-ip&amp;gt; &quot;what is the meaning of life&quot; TXT +short
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The LLM&apos;s response should appear in the output.&lt;/p&gt;
&lt;h2&gt;Troubleshooting&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Common Issues:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Permission denied&lt;/code&gt;: Make sure you&apos;re running with &lt;code&gt;sudo&lt;/code&gt; (port 53 requires root)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Connection timeout&lt;/code&gt;: Check your VPS firewall settings and ensure port 53 is open&lt;/li&gt;
&lt;li&gt;&lt;code&gt;API errors&lt;/code&gt;: Verify your OpenRouter API key and check your account has credits&lt;/li&gt;
&lt;li&gt;&lt;code&gt;No response&lt;/code&gt;: Try running &lt;code&gt;systemctl status systemd-resolved&lt;/code&gt; to ensure it&apos;s actually disabled&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Step 7: Secure Your Setup (Optional but Recommended)&lt;/h2&gt;
&lt;p&gt;To restrict access to your DNS-LLM proxy, use UFW (Uncomplicated Firewall) - and yes, it&apos;s literally called &quot;uncomplicated&quot; because that&apos;s what a VPS is, uncomplicated:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ufw allow ssh
ufw allow 53
ufw enable
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This allows SSH access (so you don&apos;t lock yourself out) and DNS queries on port 53, while blocking everything else by default.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: This setup runs as root and stores your API key in plaintext. For anything beyond experimentation, consider using environment variables, proper user accounts, and process managers like systemd.&lt;/p&gt;
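&lt;p&gt;As a sketch of that systemd suggestion (the unit name, script path, and env file location are all assumptions for illustration):&lt;/p&gt;

```ini
# /etc/systemd/system/llm-dns.service (hypothetical path and unit name)
[Unit]
Description=LLM-over-DNS proxy
After=network.target

[Service]
# Load OPENROUTER_API_KEY from a root-only env file instead of hardcoding it.
EnvironmentFile=/etc/llm-dns.env
ExecStart=/usr/bin/python3 /opt/llm-dns/llm_dns.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

&lt;p&gt;After creating the file, &lt;code&gt;systemctl daemon-reload&lt;/code&gt; followed by &lt;code&gt;systemctl enable --now llm-dns&lt;/code&gt; keeps the proxy running across crashes and reboots.&lt;/p&gt;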
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://dnslib.readthedocs.io/en/latest/&quot;&gt;dnslib documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://openrouter.ai/&quot;&gt;OpenRouter API docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&apos;s it! You now have your own LLM-over-DNS proxy running on a simple VPS. No complex infrastructure needed - just SSH, install dependencies, and run your code. This is the beauty of keeping things simple.&lt;/p&gt;
</content:encoded><category>vps</category><category>tutorial</category><author>Nick Khami</author></item><item><title>I couldn&apos;t submit a PR, so I got hired and fixed it myself</title><link>https://www.skeptrune.com/posts/doing-the-little-things/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/doing-the-little-things/</guid><description>After joining Mintlify (which acquired my previous company), I finally fixed a search bug that had bothered me for over a year as a user - the debounced search queries weren&apos;t being aborted, causing race conditions and poor search quality. By adding an AbortController to ensure only the most recent search query returns results, I made the search experience crisper and more responsive across their 30,000+ documentation sites.</description><pubDate>Wed, 30 Jul 2025 00:00:00 GMT</pubDate><content:encoded>
&lt;p&gt;For over a year, I was bugged by a search quirk on &lt;a href=&quot;https://mintlify.com&quot;&gt;Mintlify&lt;/a&gt; that caused race conditions and wonky search results.&lt;/p&gt;
&lt;p&gt;Here&apos;s the fun irony: I was the founder of Trieve, the company that powered search for their 30,000+ documentation sites, yet their debounced search queries weren&apos;t being aborted as you typed. Check out this delightful chaos:&lt;/p&gt;
&lt;p&gt;&amp;lt;NoAbortVideo /&amp;gt;&lt;/p&gt;
&lt;p&gt;I had brought this up in our shared Slack before when I was just a vendor to &lt;s&gt;them&lt;/s&gt; &lt;strong&gt;us&lt;/strong&gt; (weird), but it wasn&apos;t a priority and never got fixed. It was extra frustrating because the race condition on the query was apparent enough that search would sometimes feel low quality since it would return results for a query many characters before the user was done typing.&lt;/p&gt;
&lt;p&gt;Even worse, as the founder of the search company powering this experience, it felt like a poor reflection on Trieve every time someone encountered these wonky results.&lt;/p&gt;
&lt;h2&gt;Fixed It&lt;/h2&gt;
&lt;p&gt;Now that I&apos;m on the team, I was able to finally fix it. I added an &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/AbortController&quot;&gt;AbortController&lt;/a&gt; to the debounced search function, so that it aborts any previous queries when a new one is made. This means that the search results are always relevant to what the user is currently typing.&lt;/p&gt;
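&lt;p&gt;For readers curious what that pattern looks like, here&apos;s a minimal sketch (the endpoint and function names are illustrative, not Mintlify&apos;s actual code):&lt;/p&gt;

```javascript
// Keep one controller for the in-flight request; abort it when a new query starts.
let controller = null;

async function search(query) {
  if (controller) controller.abort(); // cancel the stale request, if any
  controller = new AbortController();
  try {
    const res = await fetch("/api/search?q=" + encodeURIComponent(query), {
      signal: controller.signal,
    });
    return await res.json();
  } catch (err) {
    // An aborted request is expected noise, not an error worth surfacing.
    if (err.name === "AbortError") return null;
    throw err;
  }
}
```

&lt;p&gt;Wired into the existing debounce, this means even queries that survive the debounce window get cancelled the moment a newer keystroke fires a fresh request.&lt;/p&gt;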
&lt;p&gt;There&apos;s something deeply satisfying about finally being able to fix the things that bug you. It made me feel a bit like George Hotz during his &lt;a href=&quot;https://web.archive.org/web/20221122050324/https://twitter.com/realGeorgeHotz/status/1594908473875173377&quot;&gt;single week at Twitter&lt;/a&gt; in 2022, where he joined with overambitious plans to fix Twitter search, gave up due to hubris, and settled for fixing an annoying login popup before leaving.&lt;/p&gt;
&lt;p&gt;I&apos;ve always admired engineers who are part hacker, part entrepreneur - people who see a problem and just... fix it. Getting to do something similar here (minus the dramatic exit) felt like a small win in steering my career toward that kind of direct approach.&lt;/p&gt;
&lt;h2&gt;Open Source&lt;/h2&gt;
&lt;p&gt;I prefer building and using open source software whenever possible, and this whole situation is a great example of why.&lt;/p&gt;
&lt;p&gt;With open source - when you encounter a bug or pain point, you can actually fix it yourself. Had this been an open source project during the year I was frustrated with the search race condition, I could have submitted a pull request with the AbortController fix and saved myself (and thousands of other users) the daily annoyance.&lt;/p&gt;
&lt;p&gt;Instead, it remained a persistent irritation until I happened to join the company and gain access to the codebase. There&apos;s something to be said for the immediate empowerment that comes with open source - though I understand why many companies choose different models for various business reasons.&lt;/p&gt;
&lt;h2&gt;Self-Congratulation&lt;/h2&gt;
&lt;p&gt;If search feels just a bit crisper and more responsive on Mintlify, it’s because of me! I fixed a bug that bothered me for over a year, and it feels great to have made that little improvement to the product.&lt;/p&gt;
&lt;p&gt;I can&apos;t wait to make more. Fixing small issues like this over and over again is how products become legendary. There&apos;s something deeply satisfying about finally having the power to fix the things that annoy you - even if they&apos;re tiny.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Especially if they&apos;re tiny.&lt;/strong&gt;&lt;/p&gt;
</content:encoded><category>work</category><category>mintlify</category><author>Nick Khami</author></item><item><title>What 7,112 Hacker News users listened to on my side project</title><link>https://www.skeptrune.com/posts/jukebox-hacker-news/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/jukebox-hacker-news/</guid><description>I built an open source collaborative playlist app for fun called Jukebox. After launching it on Hacker News and different subreddits, it accumulated 7112 visitors who played 2685 songs. In this post, I do a bit of data analysis on their usage patterns and what their music tastes were. Spoiler, they have surprisingly mainstream tastes.</description><pubDate>Mon, 14 Jul 2025 18:52:00 GMT</pubDate><content:encoded>
&lt;p&gt;I was burnt out from my startup and wanted to recover some of my creative energy, so I decided to build a fun side project called &lt;a href=&quot;https://github.com/skeptrunedev/jukebox&quot;&gt;Jukebox&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I had the idea of building a collaborative playlist app where you could queue music together with friends and family.&lt;/p&gt;
&lt;p&gt;I launched it on Hacker News, where it &lt;a href=&quot;https://hnrankings.info/44500840/&quot;&gt;hit frontpage&lt;/a&gt; and got a lot of traction. In total, it had &lt;strong&gt;7112 visitors&lt;/strong&gt; who played &lt;strong&gt;2877 songs&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Hacker News users are known for their eclectic tastes, so I was curious to see what kind of music they listened to. I did some data analysis on the usage patterns and music genres, and I wanted to share my findings.&lt;/p&gt;
&lt;h2&gt;People Actually Used It!&lt;/h2&gt;
&lt;p&gt;Part of the fun of side projects is that you can use them as an opportunity to build your skills. Personally, one of the core skills I want to improve is marketing.&lt;/p&gt;
&lt;p&gt;Therefore, it was important to me that I actually drove traffic to the app and got people to use it. I&apos;m happy to report that I was able to do that! Here&apos;s a full breakdown of the user engagement:&lt;/p&gt;
&lt;p&gt;&amp;lt;UserEngagementSankey /&amp;gt;&lt;/p&gt;
&lt;p&gt;The data is reliable because each visitor to the site is assigned an anonymous user account. This allows for accurate tracking of how many unique users visited, how many created a &quot;box&quot; (playlist), and how many engaged with the main features.&lt;/p&gt;
&lt;p&gt;Conversion rate into the primary &quot;Create Box&quot; CTA was awesome! However, I was sorely disappointed to see that only 6.7% of people who created a box actually used the app to queue music together, which was the main reason why I built it in the first place.&lt;/p&gt;
&lt;p&gt;I&apos;d call it a Pyrrhic victory. My product sense was a few rings off the bullseye, but still on the target. I&apos;m not going to continue working on Jukebox, but it certainly fulfilled its core purpose of helping me recover my creative energy and learn some new skills.&lt;/p&gt;
&lt;h2&gt;What Music Did They Listen To?&lt;/h2&gt;
&lt;p&gt;I was originally planning to talk more about how Jukebox was built, but I think the more interesting part is the data analysis of what music Hacker News users listened to.&lt;/p&gt;
&lt;h3&gt;Genres&lt;/h3&gt;
&lt;p&gt;Spotify is generous with their API, so I was able to hydrate the songs data with genres by using their data.&lt;/p&gt;
&lt;p&gt;Hacker News users actually disappointed me with their music tastes. I expected them to be more eclectic, but classic rock and rock were 2 times more popular than any other genre.&lt;/p&gt;
&lt;p&gt;New wave, metal, and rap followed as the next most played genres, but there was a steep drop-off after the top three. The long tail of genres included everything from country and EDM to post-hardcore and progressive rock, but these were much less represented.&lt;/p&gt;
&lt;p&gt;One thing that surprised me was how country music edged out electronic genres in popularity. I expected a tech-focused audience to gravitate more towards electronic or EDM, but country had a stronger showing among the top genres. It’s a reminder that musical preferences can defy stereotypes, even in communities you’d expect to lean a certain way.&lt;/p&gt;
&lt;p&gt;&amp;lt;SongsExplorer /&amp;gt;&lt;/p&gt;
&lt;h3&gt;Artists&lt;/h3&gt;
&lt;p&gt;When it comes to artists, the results were a mix of the expected and the surprising. Michael Jackson topped the list as the most played artist—proving that the King of Pop’s appeal truly spans generations and communities, even among techies. Queen and Key Glock followed closely, showing that both classic rock and modern hip-hop have their place in the hearts (and playlists) of Hacker News users.&lt;/p&gt;
&lt;p&gt;I was surprised to see a strong showing from artists like Taylor Swift and Depeche Mode, as well as a healthy mix of rap, electronic, and indie acts. The diversity drops off after the top few, but there’s still a wide spread: from Daft Punk to Nirvana, Dua Lipa to ABBA, and even some more niche names like Wolf Parade and Day Wave.&lt;/p&gt;
&lt;p&gt;Overall, while classic rock and pop dominate, there’s a clear undercurrent of variety—perhaps reflecting the broad interests of the Hacker News crowd, even if their musical tastes lean a bit more mainstream than I expected.&lt;/p&gt;
&lt;p&gt;&amp;lt;ArtistAnalysis /&amp;gt;&lt;/p&gt;
&lt;h2&gt;AI Makes Me More Willing to Build Things&lt;/h2&gt;
&lt;p&gt;Dens Sumesh, a former intern at my company, originally had the idea for Jukebox and told me about it at dinner one day. I thought it was great and had potential, so I decided to build it. AI codegen has made me drastically more willing to build things on a whim.&lt;/p&gt;
&lt;p&gt;Typically I would have probably quit after finishing the backend, because React slop is not my favorite thing to work on. However, since the AI is good enough at React to do most of that work for me, I was mentally able to push through and finish the project.&lt;/p&gt;
&lt;p&gt;Another side benefit of building this was that I got a better handle on when AI is an efficient tool versus when it’s better to rely on my own skills. For example, highlighting a component and prompting &lt;code&gt;&quot;use framer-motion to make this fade in buttery smooth&quot;&lt;/code&gt; is a great use of AI.&lt;/p&gt;
&lt;p&gt;However, more complex asks like &lt;code&gt;&quot;add an api route to accept a song, put it in a queue with sqlite, and create a worker that downloads and uploads them to s3, with a final api route to check when they finish&quot;&lt;/code&gt; are more efficiently handled by a human with intuition and experience.&lt;/p&gt;
&lt;p&gt;Framing things out manually, or even prompting the frame, consistently seemed to be a more efficient strategy than trying to get the AI to one-shot entire features. Both approaches can work, but breaking things down helps you maintain control and clarity over the process.&lt;/p&gt;
&lt;p&gt;If you rely too much on one-shot prompts, you can end up in a cycle where your eyes glaze over and you&apos;re pressing the &quot;regenerate&quot; button like it&apos;s a Vegas slot machine. This slot machining makes launching less likely because you spend more time hoping for a perfect result rather than iterating and moving forward. It&apos;s easy to get stuck chasing the ideal output instead of shipping something real and learning from feedback.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Build stuff, share it, get feedback, and learn. Shots on goal lead to more opportunities for improvement and innovation.&lt;/p&gt;
&lt;p&gt;Even though Jukebox is now going into maintenance mode, it was everything I hoped it would be: a fun side project that people actually used.&lt;/p&gt;
&lt;p&gt;If you want the raw data, you can find it on the &lt;a href=&quot;https://github.com/skeptrunedev/personal-site/tree/main/src/assets/data/blog-posts/JukeboxAnalysis/dev.sqlite3&quot;&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you want to see the source code for Jukebox, that&apos;s on &lt;a href=&quot;https://github.com/skeptrunedev/jukebox&quot;&gt;Github at skeptrunedev/jukebox&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>jukebox</category><category>learning</category><category>data-analysis</category><author>Nick Khami</author></item><item><title>Building the Server for Threshold Multisigs</title><link>https://www.skeptrune.com/posts/building-the-server-for-threshold-multisigs/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/building-the-server-for-threshold-multisigs/</guid><description>&quot;TODO&quot;</description><pubDate>Mon, 16 Jun 2025 18:52:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;You&apos;re launching a new Bitcoin ETF worth billions of dollars. You need to secure the funds backing it and are scared to build that security system yourself, so you look for a vendor. You pick Coinbase Custody, probably the most well-known and trusted company in the entire ecosystem, to custody your funds. Trying to be transparent, you publish the Bitcoin addresses holding the funds backing your ETF.&lt;/p&gt;
&lt;p&gt;Then, the whole world realizes that Coinbase has the entire amount secured behind a single private key, and that private key may or may not be backed by an offline threshold multisig scheme.&lt;/p&gt;
&lt;p&gt;This is the story of Bitwise, Coinbase Custody, and the accusations that they were not using multisig wallets to secure their Bitcoin ETF funds.&lt;/p&gt;
&lt;p&gt;Ultimately, the accusations were likely unfounded: Coinbase Custody most likely uses a threshold ECDSA scheme which, while much scarier to implement and maintain than Schnorr, still provides a high level of security. However, the situation is still bizarre and highlights the need for better tools and infrastructure for implementing threshold multisigs in production.&lt;/p&gt;
&lt;h2&gt;State of the Ecosystem&lt;/h2&gt;
&lt;p&gt;I started working on a Bitcoin bridge at ZeroDAO in 2022 and quickly realized that the state of the ecosystem was nascent. There are many libraries for implementing threshold multisigs, but that&apos;s the easy part. The hard part is building a complete server implementation that can be run in production.&lt;/p&gt;
&lt;p&gt;If you want to run a threshold multisig vault, you have to build your own server on top of these libraries. This is a lot of work and requires a deep understanding of the underlying cryptography and protocols. It&apos;s like having a powerful engine, but no car to put it in. You can build your own car, but it&apos;s a lot of work and you have to figure out how to make it safe and reliable.&lt;/p&gt;
&lt;p&gt;Bitwise is basically forced to rely on Coinbase Custody or some other vendor to manage their $4B of Bitcoin for them based on a trust model that is not transparent to the public and maybe not even transparent to them. This is not a good situation for the ecosystem, and it&apos;s not a good situation for the users of these services.&lt;/p&gt;
&lt;h2&gt;Deciding to Do Something About It&lt;/h2&gt;
&lt;p&gt;Startups are fu**ing hard. Out of college, I have spent the past two years working 70+ hour weeks, making less than a third of what I would at a reputable tech company, trying to make something of myself and simultaneously put a dent in the world.&lt;/p&gt;
&lt;p&gt;Our first product, &lt;a href=&quot;https://trieve.ai&quot;&gt;Trieve&lt;/a&gt;, is a relevancy optimized search engine in a simple API. We built it because we were frustrated with the state of having to go through the hassle of setting up an ingestion pipeline, indexing data, and optimizing the underlying engine every time you wanted high quality retrieval. I&apos;m very proud of it!&lt;/p&gt;
&lt;p&gt;Lifetime, we have supported over 300 unique paying customers, made hundreds of thousands of dollars in revenue, served 150M+ searches, and indexed over 1B documents. At this point, it&apos;s a mature product and we have fulfilled our initial ambitions for it. There&apos;s even a &lt;a href=&quot;https://apps.shopify.com/trieve&quot;&gt;shopify app&lt;/a&gt; that over 100 stores are using!&lt;/p&gt;
&lt;p&gt;But I still felt like there was something missing. I wanted to build something that would have a lasting impact, and Trieve started to feel like a distraction from that. I stopped getting sparks of joy every time we onboarded a new customer or launched a new feature. AI is great, but over time I started to feel more like a glorified data janitor than a builder.&lt;/p&gt;
&lt;h2&gt;To Quit or Not to Quit&lt;/h2&gt;
&lt;p&gt;Trieve isn&apos;t big enough to sell for a life-changing amount of money. Denzell and I talked through some acquihire offers, but they didn&apos;t feel right. We learned a ton from building Trieve, and we are both very proud of what we accomplished. Our tank of energy was still pretty full; the valve between it and Trieve&apos;s product was just closed.&lt;/p&gt;
&lt;p&gt;So, we started looking around and trying to figure out something that the valve could be opened to. In that process, we got drawn back to the startups we had worked at before Trieve, and the problems we had seen there.&lt;/p&gt;
&lt;h2&gt;Working at a Startup&lt;/h2&gt;
&lt;p&gt;When you decide to join a startup, you typically are not doing it for the money. It&apos;s a raw deal to be an early employee at a startup. You are taking on a lot of risk, working long hours, and making less money than you would at a big tech company. But you do it because you believe in the mission and you want to be part of something bigger than yourself.&lt;/p&gt;
&lt;p&gt;I was extremely lucky to have the experience of being bought into two startups doing something I believed strongly in before founding my own. I worked at &lt;a href=&quot;https://x.com/zerodaoHQ&quot;&gt;ZeroDAO&lt;/a&gt;, &lt;a href=&quot;https://x.com/QuaiNetwork&quot;&gt;Quai&lt;/a&gt;, &lt;a href=&quot;https://getbreezyapp.com/&quot;&gt;Breezy&lt;/a&gt;, and &lt;a href=&quot;https://x.com/BotanixLabs&quot;&gt;Botanix&lt;/a&gt;. Three out of those four startups were in the permissionless blockchain space, and that&apos;s no coincidence. I&apos;m easily nerd sniped by the idea of building something that is open source, permissionless, and can be used by anyone who wants to use it.&lt;/p&gt;
&lt;p&gt;Ironically, ZeroDAO and Botanix in particular were both working on applications which required managing a vault of Bitcoin assets. ZeroDAO, in typical startup fashion, used infrastructure from a vendor, specifically &lt;a href=&quot;https://github.com/renproject&quot;&gt;RenVM&lt;/a&gt;, to manage the Bitcoin assets backing their bridge. RenVM was owned by Alameda Research, which collapsed in late 2022 along with FTX, and the RenVM team was forced to shut down the service. That was the end of ZeroDAO, which was a damn shame because, prior to that, we had built a really cool product that processed over &lt;a href=&quot;https://dune.com/queries/5246861/8621995?sidebar=none&quot;&gt;184 BTC in less than 7 months&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As it turns out, similar to Bitwise, it was also out of scope for ZeroDAO to build their own threshold multisig vault server. Ultimately, the company was forced to shut down because we could not find a vendor to replace RenVM. Unlike Bitwise, we were in no way big enough to force a reputable company (Coinbase Custody, BitGo, Anchorage, or others) to implement the features we needed.&lt;/p&gt;
&lt;p&gt;Coming out of that experience, I decided to join Botanix to right that wrong. Botanix is building a Bitcoin sidechain, which requires a Bitcoin vault the same way a bridge does. Ultimately, Trieve started to get traction while I was there and I left to focus on it, trusting at the time that the work would go on at Botanix and that the world would get a threshold multisig vault server implementation with or without me working on it.&lt;/p&gt;
&lt;p&gt;But, two years later, I&apos;m still waiting for that to happen.&lt;/p&gt;
&lt;h2&gt;Doing the Damn Thing&lt;/h2&gt;
&lt;p&gt;There has never been a better time to bite the bullet and build this. The Zcash Foundation is actively working on their threshold signature library (&lt;a href=&quot;https://eprint.iacr.org/2020/852.pdf&quot;&gt;FROST&lt;/a&gt;, if you care about the details) and has a well-documented and audited implementation in Rust. Lucky for Denzell and me, we are now proficient Rust developers after building Trieve, so we can build on top of that library.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/seraidex?lang=en&quot;&gt;Serai&lt;/a&gt; also has a fantastic complete &lt;a href=&quot;https://github.com/serai-dex/serai/blob/48db06f901952b24bb38d7c7e256f798f08512cd/spec/coordinator/Coordinator.md&quot;&gt;reference server implementation in Rust&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The building blocks are much more mature than they were two years ago, and it feels like the right time to build on top of them. Also, it doesn&apos;t seem like anyone else is really working on this, so perhaps we can reap a first mover advantage in a way that we couldn&apos;t with Trieve.&lt;/p&gt;
&lt;p&gt;Coming from the search engine space for the past 2 years, I like to think we are in a similar spot relative to where Shay Banon was in 2005 when he started building &lt;a href=&quot;https://github.com/kimchy/compass&quot;&gt;Compass&lt;/a&gt;, a server abstraction layer on top of a search engine library called Lucene. Compass was a complete server implementation that made it easy to run a search engine in production, and it was the precursor to Elasticsearch, which is now the most popular search engine in the world.&lt;/p&gt;
&lt;p&gt;Completely different industry, but I think the analogy holds.&lt;/p&gt;
&lt;h2&gt;Who&apos;s This Going to be For?&lt;/h2&gt;
&lt;p&gt;I love startups more than anything, but we want to make this work for larger custodians and exchanges first. Ideally, we build a standard piece of infrastructure that&apos;s used by all of the largest custodians and exchanges in the Bitcoin ecosystem. We want to make it easy for them to run a threshold multisig vault, so they can focus on building their products and services instead of worrying about the underlying security infrastructure.&lt;/p&gt;
&lt;p&gt;Also, we want to help large institutions be their own custodians. Part of blockchain&apos;s promise is that it&apos;s trustless, and we want large institutions to be able to take advantage of that by running their own threshold multisig vaults instead of relying on third-party vendors.&lt;/p&gt;
&lt;p&gt;However, that doesn&apos;t mean we are going to ignore the startup and individual developer use cases. People have big blockchain application dreams, from decentralized exchanges to marketplaces to lending protocols, which all require vaults to manage the assets backing them. While it&apos;s not our primary focus, I still do want to make sure we build something which would have solved the problems we faced at ZeroDAO back when RenVM shut down.&lt;/p&gt;
&lt;p&gt;It&apos;s just an open source server! Anyone is going to be able to run it.&lt;/p&gt;
&lt;h2&gt;What&apos;s Happening With Trieve?&lt;/h2&gt;
&lt;p&gt;We are not going to be shutting down Trieve! It&apos;s a mature product which keeps us profitable and default alive. We are going to keep marketing it and supporting our customers.&lt;/p&gt;
&lt;p&gt;The only difference is that we are cutting back on the ambition of our roadmap. We are going to continue fixing bugs, adding features that our customers request, and making sure the product is stable and reliable. But we are not going to be adding any new major features or trying to expand into new markets.&lt;/p&gt;
</content:encoded><category>threshold-security</category><category>DKG</category><category>FROST</category><author>Nick Khami</author></item><item><title>Taking Dynamic Key Generation (FROST) From Papers to Production</title><link>https://www.skeptrune.com/posts/taking-dkg-from-papers-to-production/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/taking-dkg-from-papers-to-production/</guid><description>&quot;TODO&quot;</description><pubDate>Mon, 16 Jun 2025 18:52:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;FROST (Flexible Round-Optimized Schnorr Threshold) is a protocol for distributed key generation (DKG) and threshold signatures. It allows a group of participants to jointly generate a public/private key pair, where the private key is split among the participants. This enables secure signing without requiring all participants to be online simultaneously.&lt;/p&gt;
&lt;p&gt;Algorithmically, it is a simple two-round protocol. Many implementation libraries exist, but they aren&apos;t shaped like the software products developers are used to deploying. They are often just libraries that require a lot of boilerplate code to get started with, and they don&apos;t provide a clear way to run the protocol in a production environment.&lt;/p&gt;
&lt;p&gt;We at Threshold Security have been working on a single-binary Node designed to be a reusable piece of infrastructure for running FROST DKG in production. The Node is meant to be run by a group of participants who want to jointly generate a key pair and use it for signing. It provides simple command-line and RPC interfaces for starting the DKG process, managing participants, and generating keys.&lt;/p&gt;
&lt;p&gt;It&apos;s not completely ready for production yet, but we are making good progress. In this post, I will explain the background of FROST, how it works, and what our Node implementation currently supports.&lt;/p&gt;
&lt;h2&gt;FROST DKG Protocol Overview&lt;/h2&gt;
&lt;p&gt;If you are interested in the details of FROST math, I recommend reading the paper &lt;a href=&quot;https://eprint.iacr.org/2020/852.pdf&quot;&gt;FROST: Flexible Round-Optimized Schnorr Threshold Signatures&lt;/a&gt; by Chelsea Komlo and Ian Goldberg. It provides a comprehensive overview of the protocol and its security properties.&lt;/p&gt;
&lt;p&gt;For our purposes, the math doesn&apos;t matter as much as the protocol structure, so I will only be providing explanations for the necessary pieces of the protocol to implement on top of the FROST libraries which already exist. The DKG process is divided into two rounds of communication followed by a simple sum of public keys to produce a final public key. The rounds are as follows:&lt;/p&gt;
&lt;h3&gt;Round 1: Private Key Commitment&lt;/h3&gt;
&lt;p&gt;In the first round, each participant generates a private key, computes commitments to it, and sends those commitments to all other participants. The commitments are based on the private key and a random nonce, which ensures that they are unique and cannot be predicted by other participants.&lt;/p&gt;
&lt;p&gt;These messages do not contain any secret information, so they can be sent over an insecure channel. The commitments are used to prove that the participants have generated their private keys correctly and to ensure that they are committed to them for the duration of the DKG process.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/TakingDKGFromPapersToProduction/dkg-stage-1.png&quot; alt=&quot;visual representation of round 1&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Round 2: Secret Share Distribution&lt;/h3&gt;
&lt;p&gt;In the second round, each participant sends a secret share of its private key to every other participant. Unlike the round 1 messages, these shares must be sent over confidential channels. Each recipient verifies the shares it receives against the commitments broadcast in round 1.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/TakingDKGFromPapersToProduction/dkg-stage-2.png&quot; alt=&quot;visual representation of round 2&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Round 3: Finalization&lt;/h3&gt;
&lt;p&gt;In the final step, each participant combines the shares it received to derive its long-lived signing share, and the group&apos;s public key is produced by summing the participants&apos; public commitments. No further communication is required.&lt;/p&gt;
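&lt;p&gt;To make the round structure concrete, here is a toy, deliberately insecure sketch of the whole flow in Python. It mimics the shape of the protocol only: real FROST runs over an elliptic-curve group and adds proofs of knowledge to the round 1 commitments, and the group parameters, participant counts, and names below are purely illustrative.&lt;/p&gt;

```python
import random

# Toy t-of-n DKG in the shape of FROST's three steps. Illustrative only:
# these parameters are NOT cryptographically secure.
P = 2**127 - 1   # prime modulus of a toy multiplicative group
Q = P - 1        # exponents are reduced mod P - 1
G = 5            # toy generator
N, T = 3, 2      # 3 participants, signing threshold 2

def poly_eval(coeffs, x):
    # f(x) = a0 + a1*x + ... mod Q; the constant term a0 is the secret.
    return sum(a * pow(x, k, Q) for k, a in enumerate(coeffs)) % Q

# Round 1: each participant samples a random polynomial and broadcasts
# a commitment G^a_k to every coefficient (no secrets are revealed).
polys = {i: [random.randrange(Q) for _ in range(T)] for i in range(1, N + 1)}
commitments = {i: [pow(G, a, P) for a in poly] for i, poly in polys.items()}

# Round 2: participant i sends the share f_i(j) to participant j over a
# confidential channel; j checks it against i's public commitments.
def verify_share(j, share, comm):
    expected = 1
    for k, c_k in enumerate(comm):
        expected = expected * pow(c_k, pow(j, k, Q), P) % P
    return pow(G, share, P) == expected

shares = {j: {i: poly_eval(polys[i], j) for i in polys} for j in polys}
assert all(verify_share(j, s, commitments[i])
           for j, row in shares.items() for i, s in row.items())

# Round 3: each participant sums the shares it received to get its
# long-lived signing share; the group public key is the product of
# everyone's constant-term commitments.
signing_shares = {j: sum(row.values()) % Q for j, row in shares.items()}
group_pubkey = 1
for i in polys:
    group_pubkey = group_pubkey * commitments[i][0] % P

# Sanity check: the group key corresponds to the sum of the individual
# secrets, even though that sum is never assembled in one place.
group_secret = sum(polys[i][0] for i in polys) % Q
assert group_pubkey == pow(G, group_secret, P)
```

&lt;p&gt;The final assertion is the punchline of DKG: the group public key corresponds to a secret that no single participant ever holds.&lt;/p&gt;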
&lt;h2&gt;Libraries &amp;amp; Implementations&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/mikelodder7/frost-dkg&quot;&gt;mikelodder7/frost-dkg&lt;/a&gt;&lt;br /&gt;
⭐ 1 — An implementation of the FROST Distributed Key Generation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/cmdruid/frost&quot;&gt;cmdruid/frost&lt;/a&gt;&lt;br /&gt;
⭐ 13 — Flexible, round-optimized threshold signature library for BIP340 taproot.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/bytemare/frost&quot;&gt;bytemare/frost&lt;/a&gt;&lt;br /&gt;
⭐ 20 — Go implementation of RFC9591: the FROST (Flexible Round-Optimized Schnorr Threshold) signing protocol.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/taurushq-io/frost-ed25519&quot;&gt;taurushq-io/frost-ed25519&lt;/a&gt;&lt;br /&gt;
⭐ 68 — Implementation of the FROST protocol for threshold Ed25519 signing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/topos-protocol/ice-frost&quot;&gt;topos-protocol/ice-frost&lt;/a&gt;&lt;br /&gt;
⭐ 18 — A modular Rust implementation of the static version of the ICE-FROST signature scheme.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/zellular-xyz/pyfrost&quot;&gt;zellular-xyz/pyfrost&lt;/a&gt;&lt;br /&gt;
⭐ 2 — Python implementation of the FROST algorithm.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/LFDT-Lockness/givre&quot;&gt;LFDT-Lockness/givre&lt;/a&gt;&lt;br /&gt;
⭐ 9 — Threshold Schnorr Signatures based on FROST in Rust.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/ZcashFoundation/frost&quot;&gt;ZcashFoundation/frost&lt;/a&gt;&lt;br /&gt;
⭐ 190 — Rust implementation of FROST (Flexible Round-Optimised Schnorr Threshold signatures) by the Zcash Foundation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/BlockstreamResearch/bip-frost-dkg&quot;&gt;BlockstreamResearch/bip-frost-dkg&lt;/a&gt;&lt;br /&gt;
⭐ 59 — A Bitcoin Improvement Proposal for ChillDKG, a distributed key generation (DKG) protocol for use with the FROST Schnorr threshold signature scheme.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>threshold-security</category><category>DKG</category><category>FROST</category><author>Nick Khami</author></item><item><title>Web Developer&apos;s Guide to Midjourney</title><link>https://www.skeptrune.com/posts/how-to-use-midjourney/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/how-to-use-midjourney/</guid><description>Midjourney&apos;s learning curve is steep, but climbing it unlocks a superpower for developers and entrepreneurs. Learn how to create stunning, cohesive image sets that actually work for your projects. Go from building style reference galleries to using the describe feature for professional marketing visuals that don&apos;t scream &quot;AI-generated.&quot;</description><pubDate>Sun, 01 Jun 2025 18:52:00 GMT</pubDate><content:encoded>
&lt;h2&gt;Getting Your Ladder&lt;/h2&gt;
&lt;p&gt;I tried to use Midjourney unsuccessfully a few times before, but decided to give it one more try after reading a good tutorial thread on &lt;a href=&quot;https://x.com/kubadesign/status/1927056777830440980&quot;&gt;X by @kubadesign&lt;/a&gt;. His thread got me most of the way there, but I picked up an additional trick with &lt;a href=&quot;https://docs.midjourney.com/hc/en-us/articles/32497889043981-Describe&quot;&gt;Midjourney&apos;s describe feature&lt;/a&gt; that I think is worth sharing.&lt;/p&gt;
&lt;p&gt;My initial plan was to use these images for &lt;a href=&quot;https://uzi.sh&quot;&gt;uzi.sh&lt;/a&gt;, a tool for parallel LLM coding agents. While I ultimately chose a different final image for that project because I only needed one, the learning process was valuable. For this set, I aimed for a red color scheme to evoke action and speed, which you can see in the images below.&lt;/p&gt;
&lt;p&gt;&amp;lt;MasonryImages /&amp;gt;&lt;/p&gt;
&lt;h2&gt;Find a Base Image&lt;/h2&gt;
&lt;p&gt;Following Kuba&apos;s advice, I went to Pinterest and found a cool base image that I liked. I went with something that had a lot of red, but also darker colors on the border, since I knew I would need space for text and other elements if I wanted to use these on an actual website.&lt;/p&gt;
&lt;p&gt;&amp;lt;PinterestStarter /&amp;gt;&lt;/p&gt;
&lt;h2&gt;Build a Style With Neutral Prompts&lt;/h2&gt;
&lt;p&gt;You can&apos;t just start by describing the specific kind of image you want and expect to get good results. &lt;a href=&quot;https://docs.midjourney.com/hc/en-us/articles/32180011136653-Style-Reference&quot;&gt;Style reference images&lt;/a&gt; are needed to give Midjourney the boundaries it needs to stick to your desired aesthetic and theme once you get more specific with your exact asks.&lt;/p&gt;
&lt;p&gt;It&apos;s unlikely that you&apos;ll be able to find multiple images on the internet which match your desired style exactly, so I recommend using neutral prompts to generate additional images in order to create a cohesive gallery.&lt;/p&gt;
&lt;p&gt;My neutral prompt here was &lt;code&gt;Portrait photography of a woman, glow behind, futuristic vibe, flash photography, color film, analog style, imperfect --ar 3:4 --v 7&lt;/code&gt;. Essentially, any prompt describing a general subject and its characteristics without getting too specific about style, camera angle, action, or other details should work well.&lt;/p&gt;
&lt;p&gt;&amp;lt;SimilarStyleGeneration /&amp;gt;&lt;/p&gt;
&lt;h2&gt;Using Describe to Get More Specific&lt;/h2&gt;
&lt;p&gt;One of the images I wanted was an eagle diving. However, &lt;code&gt;eagle diving with red glowing background&lt;/code&gt; wasn&apos;t getting me the results I wanted. I accidentally discovered the describe feature while dragging images around, and instantly realized that it was a cheat code for getting the images I wanted.&lt;/p&gt;
&lt;p&gt;&amp;lt;DescribeFeature /&amp;gt;&lt;/p&gt;
&lt;p&gt;You can take any one of these and pair them with the style reference images you generated earlier to get the eagle or any other image you described into the style you want. I could not believe how well this worked.&lt;/p&gt;
&lt;p&gt;&amp;lt;EaglePromptWithStyleReference /&amp;gt;&lt;/p&gt;
&lt;h2&gt;Post Processing: Adding Film Grain for Web Design&lt;/h2&gt;
&lt;p&gt;Midjourney produces images which are too crisp to fit well as background images for the web. I recommend adding grain when you put them into your final site to create a more cohesive feel. If done correctly, you&apos;ll end up with a result like the one below. CSS is great for this! See a complete guide to how I applied the filter for the &lt;a href=&quot;https://uzi.sh&quot;&gt;uzi.sh&lt;/a&gt; site below, with the &lt;a href=&quot;https://github.com/devflowinc/uzi/blob/main/uzi-site/index.html&quot;&gt;full code on GitHub here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://uzi.sh&quot;&gt;&lt;img src=&quot;src/assets/images/blog-posts/HowToUseMidjourney/uzi-opengraph.png&quot; alt=&quot;Film Grain on Uzi Site&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You first need to add an SVG filter definition to your HTML file so that the CSS can reference it later to put the grain on top of the background image. The filter uses &lt;code&gt;feTurbulence&lt;/code&gt; to create a fractal noise pattern, and &lt;code&gt;feColorMatrix&lt;/code&gt; to adjust the opacity. You can experiment with values like &lt;code&gt;baseFrequency&lt;/code&gt; in &lt;code&gt;feTurbulence&lt;/code&gt; or the alpha channel (the &lt;code&gt;0.8&lt;/code&gt; in the &lt;code&gt;feColorMatrix&lt;/code&gt;) to fine-tune the grain&apos;s intensity and texture.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!-- SVG noise filter definition - this goes in your HTML --&amp;gt;
&amp;lt;svg style=&quot;display: none&quot;&amp;gt;
  &amp;lt;filter id=&quot;noiseFilter&quot;&amp;gt;
    &amp;lt;!-- Creates the fractal noise pattern --&amp;gt;
    &amp;lt;feTurbulence
      type=&quot;fractalNoise&quot;
      baseFrequency=&quot;0.5&quot;
      numOctaves=&quot;3&quot;
      stitchTiles=&quot;stitch&quot;
    /&amp;gt;
    &amp;lt;!-- Converts noise to semi-transparent overlay --&amp;gt;
    &amp;lt;feColorMatrix
      type=&quot;matrix&quot;
      values=&quot;0 0 0 0 0
              0 0 0 0 0
              0 0 0 0 0
              0 0 0 0.8 0&quot;
    /&amp;gt;
  &amp;lt;/filter&amp;gt;
&amp;lt;/svg&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once you have the filter defined, you can apply it to a grain overlay element in your CSS. This element will cover the background image and apply the noise effect.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/* The grain overlay element that applies the filter */
.grain-overlay {
  position: absolute;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
  z-index: 2; /* Above background image, below content */
  pointer-events: none; /* Allows clicks to pass through to elements below */
  filter: url(#noiseFilter); /* Applies the SVG filter defined above */
}

/* Background image styling for context */
.background-image {
  position: absolute;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
  background-image: url(&quot;./media/16x9steppecommander.png&quot;);
  background-size: cover;
  background-position: center;
  z-index: 1; /* Below grain overlay */
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, you need to structure your HTML to ensure the layering works correctly. The grain overlay should be positioned above the background image but below any content you want to display.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!-- HTML structure showing the layering --&amp;gt;
&amp;lt;div class=&quot;hero&quot;&amp;gt;
  &amp;lt;!-- Layer 1: Background image (z-index: 1) --&amp;gt;
  &amp;lt;div class=&quot;background-image&quot;&amp;gt;&amp;lt;/div&amp;gt;

  &amp;lt;!-- Layer 2: Grain overlay (z-index: 2) --&amp;gt;
  &amp;lt;div class=&quot;grain-overlay&quot; style=&quot;filter: url(#noiseFilter)&quot;&amp;gt;&amp;lt;/div&amp;gt;

  &amp;lt;!-- Layer 3: Content (z-index: 3) --&amp;gt;
  &amp;lt;div class=&quot;content-center&quot;&amp;gt;
    &amp;lt;!-- Your content here --&amp;gt;
  &amp;lt;/div&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Be Creative!&lt;/h2&gt;
&lt;p&gt;While there&apos;s ongoing debate about AI generated art, I see Midjourney as just another tool in the toolkit. The key is using it to bring your vision to life, not to replace your creativity.&lt;/p&gt;
&lt;p&gt;Take inspiration from what you see, but make it your own. Use AI to bridge the gap between the style you have in your head and what actually shows up on screen. The techniques I&apos;ve shared here are all about developing your unique voice and letting AI help you express it better.&lt;/p&gt;
&lt;p&gt;The goal isn&apos;t to generate something generic. It&apos;s to create images that actually work for your projects and feel intentional, not obviously AI generated.&lt;/p&gt;
</content:encoded><category>ai</category><category>art</category><author>Nick Khami</author></item><item><title>LLM Codegen go Brrr – Parallelization with Git Worktrees and Tmux</title><link>https://www.skeptrune.com/posts/git-worktrees-agents-and-tmux/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/git-worktrees-agents-and-tmux/</guid><description>If you&apos;re underwhelmed with AI coding agents or simply want to get more out of them, give parallelization a try. After seeing the results firsthand over the past month, I&apos;m ready to call myself an evangelist. The throughput improvements are incredible, and I don&apos;t feel like I&apos;m losing control of the codebase.</description><pubDate>Mon, 26 May 2025 18:52:00 GMT</pubDate><content:encoded>
&lt;p&gt;This realization isn&apos;t unique to me; the effectiveness of using Git worktrees for simultaneous execution is gaining broader recognition, as evidenced by mentions in &lt;a href=&quot;https://docs.anthropic.com/en/docs/claude-code/tutorials#run-parallel-claude-code-sessions-with-git-worktrees&quot;&gt;Claude Code&apos;s docs&lt;/a&gt;, &lt;a href=&quot;https://news.ycombinator.com/item?id=44043717&quot;&gt;discussion on Hacker News&lt;/a&gt;, projects like &lt;a href=&quot;https://github.com/smtg-ai/claude-squad&quot;&gt;Claude Squad&lt;/a&gt;, and conversation on &lt;a href=&quot;https://x.com/search?q=git%20worktree&amp;amp;src=typed_query&amp;amp;f=live&quot;&gt;X&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&amp;lt;br&amp;gt;&amp;lt;/br&amp;gt;
&amp;lt;Pile /&amp;gt;&lt;/p&gt;
&lt;h3&gt;Example use-case: adding a UI component&lt;/h3&gt;
&lt;p&gt;I&apos;m building a component library called &lt;a href=&quot;https://astrobits.dev&quot;&gt;astrobits&lt;/a&gt; and wanted to add a &lt;code&gt;Toggle&lt;/code&gt;. To tackle the task, I deployed two &lt;a href=&quot;https://www.anthropic.com/claude-code&quot;&gt;Claude Code&lt;/a&gt; agents and two &lt;a href=&quot;https://openai.com/index/introducing-codex/&quot;&gt;Codex&lt;/a&gt; agents, all with the same prompt, running in parallel within their own &lt;a href=&quot;https://git-scm.com/docs/git-worktree&quot;&gt;git worktrees&lt;/a&gt;. Worktrees are essential because they provide each agent with an isolated directory, allowing them to execute simultaneously without overwriting each other&apos;s changes.&lt;/p&gt;
&lt;p&gt;The number of agents I choose to roll out depends on the complexity of the task. Over time, you&apos;ll develop an intuition for estimating the right number based on the situation. Here, I felt 4 was appropriate.&lt;/p&gt;
&lt;p&gt;&amp;lt;br&amp;gt;&amp;lt;/br&amp;gt;
&amp;lt;ImageFourSquareWorktrees /&amp;gt;
&amp;lt;br&amp;gt;&amp;lt;/br&amp;gt;&lt;/p&gt;
&lt;p&gt;Voila, results! Only one of the four LLMs produced a solution that actually saved me time. This validates the necessity of rolling multiple agents: if each has a &lt;code&gt;~25%&lt;/code&gt; chance of producing something useful, then running four gives a &lt;code&gt;68%&lt;/code&gt; chance that at least one will succeed &lt;code&gt;(1 - 0.75^4 ≈ 0.68)&lt;/code&gt;. Four agents was essentially the bare minimum to have reasonable confidence in getting a workable solution.&lt;/p&gt;
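&lt;p&gt;The arithmetic generalizes: if each agent succeeds independently with probability &lt;code&gt;p&lt;/code&gt;, then at least one of &lt;code&gt;n&lt;/code&gt; agents succeeds with probability &lt;code&gt;1 - (1 - p)^n&lt;/code&gt;. A quick sketch (the 25% per-agent rate is just my observed estimate from this task):&lt;/p&gt;

```python
def chance_of_at_least_one_success(p_success, n_agents):
    # Probability that at least one of n agents produces a usable result,
    # assuming each succeeds independently with probability p_success.
    return 1 - (1 - p_success) ** n_agents

# With the ~25% per-agent success rate observed above, four agents give
# roughly a 68% chance of at least one usable solution.
assert round(chance_of_at_least_one_success(0.25, 4), 2) == 0.68
# Doubling to eight agents pushes that to roughly 90%.
assert round(chance_of_at_least_one_success(0.25, 8), 2) == 0.9
```

&lt;p&gt;This is why running more agents than feels necessary is usually the right call: the success curve climbs quickly while the marginal cost stays flat.&lt;/p&gt;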
&lt;p&gt;With LLMs being so affordable, there&apos;s virtually no downside to running multiple agents. The cost difference between using one agent ($0.10) versus four ($0.40) is negligible compared to the 20 minutes of development time saved. Since the financial risk is minimal, you can afford to be aggressive with parallelization. If anything, I could have run even more agents to further increase the odds of getting a perfect solution on the first try.&lt;/p&gt;
&lt;p&gt;And yet, the process of running them is still cumbersome and manual; it&apos;s more effort to set up 8 than 4, so I&apos;m often lazy and opt to run the minimum number of agents I think will get the job done. This is where the problem comes in, and why I&apos;m excited to share my proposed solution.&lt;/p&gt;
&lt;h3&gt;Current workflow pain points&lt;/h3&gt;
&lt;p&gt;Right now, I manually create git worktrees using &lt;code&gt;git worktree add -b newbranch ../path&lt;/code&gt;, start a &lt;code&gt;tmux&lt;/code&gt; session for each one, run Claude Code in the first pane, paste a prompt, &lt;code&gt;leader+c&lt;/code&gt; into a new pane, run &lt;code&gt;yarn dev&lt;/code&gt; to get a preview, switch to my browser to review, repeat if no agents succeed, then finally commit, push, and create a PR once I&apos;m satisfied with an output.&lt;/p&gt;
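&lt;p&gt;For a sense of how mechanical this is, here is a rough dry-run sketch of the setup half of that loop in Python. It only builds the shell command strings rather than executing them, and the agent names, branch layout, and &lt;code&gt;claude&lt;/code&gt;/&lt;code&gt;yarn dev&lt;/code&gt; invocations are illustrative:&lt;/p&gt;

```python
def agent_setup_commands(agent, prompt, base_dir="../agents"):
    # Build the shell commands to spin up one agent in its own worktree
    # and tmux session. Dry-run sketch: names and paths are illustrative,
    # and nothing here is executed.
    path = f"{base_dir}/{agent}"
    return [
        # Isolated worktree on a fresh branch named after the agent.
        f"git worktree add -b {agent} {path}",
        # Detached tmux session rooted in that worktree.
        f"tmux new-session -d -s {agent} -c {path}",
        # Pane 1: start the coding agent with the shared prompt.
        f"tmux send-keys -t {agent} 'claude \"{prompt}\"' Enter",
        # Pane 2: dev server for previewing the agent's changes.
        f"tmux split-window -t {agent} -c {path}",
        f"tmux send-keys -t {agent} 'yarn dev' Enter",
    ]

# Four agents, same prompt, in parallel.
commands = [c for i in range(1, 5)
            for c in agent_setup_commands(f"agent-{i}", "Add a Toggle component")]
```

&lt;p&gt;Even scripted, the review, re-prompt, and cleanup half of the loop stays manual, which is exactly the gap described below.&lt;/p&gt;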
&lt;p&gt;Here are the top frustrations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I can&apos;t tell which branch a worktree was most recently rebased onto. For example, if &lt;code&gt;agent-1&lt;/code&gt; was rebased onto &lt;code&gt;feature-x&lt;/code&gt; but &lt;code&gt;agent-2&lt;/code&gt; onto &lt;code&gt;main&lt;/code&gt;, it&apos;s easy to lose track without manual notes.&lt;/li&gt;
&lt;li&gt;There is no easy way to send the same prompt to multiple agents at once. For instance, if all agents are stuck on the same misunderstanding of the requirements, I have to copy-paste the clarification into each session.&lt;/li&gt;
&lt;li&gt;I really wish I had a shortcut to open my IDE for a given worktree without having to &lt;code&gt;tmux a&lt;/code&gt;, &lt;code&gt;leader + c&lt;/code&gt;, and &lt;code&gt;code .&lt;/code&gt; manually. I could use a long one-liner with &lt;code&gt;tmux send-keys&lt;/code&gt; and &lt;code&gt;xargs&lt;/code&gt; to automate this, but that still feels clunky.&lt;/li&gt;
&lt;li&gt;Web previewing is a pain. I have to run &lt;code&gt;yarn dev&lt;/code&gt; in each worktree, and then hold the mental model of which port each worktree is on. Automating a reverse proxy to handle this with a decent naming scheme would be a game-changer.&lt;/li&gt;
&lt;li&gt;Committing and creating pull requests (PRs) is also more cumbersome than it should be. For example, after finding a solution in &lt;code&gt;agent-3&lt;/code&gt;, I have to manually attach to that tmux session then &lt;code&gt;commit&lt;/code&gt;, &lt;code&gt;push&lt;/code&gt;, and &lt;code&gt;gh pr&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I feel like I&apos;ve been through the wringer enough times with this process that I can see a solution shape which would create a smoother experience.&lt;/p&gt;
&lt;h3&gt;Proposing a solution: &lt;em&gt;uzi&lt;/em&gt;&lt;/h3&gt;
&lt;p&gt;To address these challenges head-on, the ideal developer experience (DX) would involve a lightweight CLI that wraps tmux, automating this complex orchestration. My co-founder Denzell and I felt these pain points acutely enough that we&apos;ve begun developing such a tool, which we&apos;re calling &lt;a href=&quot;https://github.com/devflowinc/uzi&quot;&gt;&lt;em&gt;uzi&lt;/em&gt;&lt;/a&gt;. The core idea behind &lt;em&gt;uzi&lt;/em&gt; is to abstract away the manual, repetitive tasks involved in managing multiple AI agent worktrees.&lt;/p&gt;
&lt;p&gt;See some of the &lt;code&gt;uzi&lt;/code&gt; commands we are thinking of implementing below. Our goal is to make the workflow more seamless while sticking closely to the existing mechanics of worktrees and tmux. We want &lt;code&gt;uzi&lt;/code&gt; to feel at home alongside standard unix tools like &lt;code&gt;xargs&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt;, and &lt;code&gt;awk&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;uzi start --agents claude:3,codex:2 --prompt &quot;Implement feature X&quot;&lt;/code&gt; could initialize and prompt three Claude instances and two Codex instances, each in its own worktree.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;uzi ls&lt;/code&gt; would display all active agents, their target branches, and current statuses.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;uzi exec --all -- yarn dev&lt;/code&gt; could run a command like &lt;code&gt;yarn dev&lt;/code&gt; across all agent worktrees.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;uzi broadcast -- &quot;Refine the previous response by focusing on Y&quot;&lt;/code&gt; would send a follow-up prompt to all active agents.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;uzi checkpoint --agent claude-1 --message &quot;Implemented initial draft&quot;&lt;/code&gt; could rebase the specified agent&apos;s worktree and commit the changes.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;uzi kill --agent codex-2&lt;/code&gt; would clean up a specific agent&apos;s tmux session and optionally its worktree.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These commands would primarily operate via &lt;code&gt;tmux send-keys&lt;/code&gt; instructions to the appropriate sessions. We don&apos;t want to reinvent the wheel; we just want to polish the existing process and make it more efficient.&lt;/p&gt;
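&lt;p&gt;For example, a command like &lt;code&gt;uzi broadcast&lt;/code&gt; could be little more than a loop issuing &lt;code&gt;tmux send-keys&lt;/code&gt; to each session. A hypothetical sketch, where the &lt;code&gt;agent-&lt;/code&gt; session-naming convention is an assumption of ours:&lt;/p&gt;

```python
# Hypothetical sketch of how a broadcast command might drive tmux.
# The "agent-" session-name prefix is an assumed convention, not uzi's actual design.
import subprocess

def agent_sessions(prefix="agent-"):
    """List tmux sessions whose names match the agent prefix."""
    out = subprocess.run(
        ["tmux", "list-sessions", "-F", "#{session_name}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [s for s in out.splitlines() if s.startswith(prefix)]

def broadcast_commands(sessions, message):
    """Build one send-keys invocation per agent session, targeting pane 0."""
    return [
        ["tmux", "send-keys", "-t", f"{s}:0.0", message, "Enter"]
        for s in sessions
    ]

def broadcast(message):
    for cmd in broadcast_commands(agent_sessions(), message):
        subprocess.run(cmd, check=True)
```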
&lt;h3&gt;The Future is Parallel: Beyond Code&lt;/h3&gt;
&lt;p&gt;While &lt;code&gt;uzi&lt;/code&gt; focuses on software developers, its methodology isn&apos;t limited to tech; the principle of leveraging multiple agents running in parallel to increase the odds of finding an optimal solution applies universally.&lt;/p&gt;
&lt;p&gt;Consider a company like &lt;a href=&quot;https://www.versionstory.com/&quot;&gt;versionstory&lt;/a&gt;, which is pioneering version control for transactional lawyers. An attorney could leverage their software to run multiple instances of an agent to redline a contract. After reviewing the outputs, they could select and merge the best components to finalize the document. This approach would provide additional confidence in the quality of the final review as it would be based on multiple independent analyses rather than a single agent&apos;s output.&lt;/p&gt;
&lt;p&gt;Similarly, a marketing team could employ this parallel strategy to perform data analysis on ad performance. By prompting multiple AI instances, they could quickly gather a range of analyses, review them, and select the most insightful ones to inform their strategy. More coverage of the solution space leads to better decision-making and more effective campaigns.&lt;/p&gt;
&lt;p&gt;This parallel paradigm isn&apos;t just a new technique for developers; it&apos;s a glimpse into a more efficient, robust, and powerful future for AI-assisted productivity across various fields. I expect to see existing software products start to gain more powerful version control and parallel execution capabilities which emulate the workflow enabled by git worktrees for software development.&lt;/p&gt;
&lt;p&gt;My DMs are open if you want to chat about this topic or have any questions. I&apos;m happy to discuss.&lt;/p&gt;
</content:encoded><category>ai</category><category>vcs</category><author>Nick Khami</author></item><item><title>AI Horseless Carriages</title><link>https://www.skeptrune.com/posts/ai-horseless-carriages/</link><guid isPermaLink="true">https://www.skeptrune.com/posts/ai-horseless-carriages/</guid><description>How modern AI applications repeat the mistakes of early automobiles by mimicking old paradigms instead of embracing new possibilities.</description><pubDate>Tue, 15 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;AI Horseless Carriages&lt;/h1&gt;
&lt;p&gt;I noticed something interesting the other day: I enjoy using AI to build software more than I enjoy using most AI applications--software built with AI.&lt;/p&gt;
&lt;p&gt;When I use AI to build software I feel like I can create almost anything I can imagine very quickly. AI feels like a power tool. It&apos;s a lot of fun.&lt;/p&gt;
&lt;p&gt;Many AI apps don&apos;t feel like that. Their AI features feel tacked-on and useless, even counter-productive.&lt;/p&gt;
&lt;p&gt;I am beginning to suspect that these apps are the &lt;a href=&quot;#horseless-carriages&quot;&gt;&quot;horseless carriages&quot;&lt;/a&gt; of the AI era. They&apos;re bad because they mimic old ways of building software that unnecessarily constrain the AI models they&apos;re built with.&lt;/p&gt;
&lt;p&gt;To illustrate what I mean by that, I&apos;ll start with an example of a badly designed AI app.&lt;/p&gt;
&lt;h2&gt;Gmail&apos;s AI assistant&lt;/h2&gt;
&lt;p&gt;A little while ago, the Gmail team released a new feature giving users the ability to generate email drafts from scratch using Google&apos;s flagship AI model, Gemini. This is what it looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/gmail_prompt.png&quot; alt=&quot;Gmail&apos;s Gemini email draft feature with a prompt I&apos;ve written&quot; /&gt;
&lt;em&gt;Gmail&apos;s Gemini email draft generation feature&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Here I&apos;ve added a prompt to the interface requesting a draft for an email to my boss. Let&apos;s see what Gemini returns:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/gmail_response.png&quot; alt=&quot;Gmail&apos;s Gemini email draft generation feature response&quot; /&gt;
&lt;em&gt;Gmail&apos;s Gemini email draft generation feature response&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As you can see, Gemini has produced a perfectly reasonable draft that unfortunately doesn&apos;t sound anything like an email that I would actually write. If I&apos;d written this email myself, it would have sounded something like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hey garry, my daughter woke up with the flu so I won&apos;t make it in today&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;The email I would have written&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The tone of the draft isn&apos;t the only problem. The email I&apos;d have written is actually shorter than the original prompt, which means I spent more time asking Gemini for help than I would have if I&apos;d just written the draft myself. Remarkably, the Gmail team has shipped a product that perfectly captures the experience of managing an underperforming employee.&lt;/p&gt;
&lt;p&gt;Millions of Gmail users have had this experience and I&apos;m sure many of them have concluded that AI isn&apos;t smart enough to write good emails yet.&lt;/p&gt;
&lt;p&gt;This could not be further from the truth: Gemini is an astonishingly powerful model that is more than capable of writing good emails. Unfortunately, the Gmail team designed an app that prevents it from doing so.&lt;/p&gt;
&lt;h2&gt;A better email assistant&lt;/h2&gt;
&lt;p&gt;To illustrate this point, here&apos;s a simple demo of an AI email assistant that, if Gmail had shipped it, would actually save me a lot of time:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Note: Interactive demo would appear here in the web version]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This demo uses AI to &lt;em&gt;read&lt;/em&gt; emails instead of write them from scratch. Each email is categorized and prioritized, and some are auto-archived while others get an automatically-drafted reply. The assistant processes emails individually according to a custom &quot;System Prompt&quot; that explains exactly how I want each one handled. You can try your own labeling logic by editing the System Prompt.&lt;/p&gt;
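&lt;p&gt;The routing step of an email-reading agent like this is simple to sketch. Below, &lt;code&gt;classify&lt;/code&gt; stands in for the model call that applies the user&apos;s System Prompt to one email; the labels and actions are illustrative assumptions, not the demo&apos;s actual behavior:&lt;/p&gt;

```python
# Hypothetical sketch of an email-reading agent's routing step.
# `classify` stands in for the LLM call that applies the user's System Prompt;
# the labels and actions below are illustrative assumptions.
def route(emails, classify):
    """Label each email and decide what to do with it."""
    actions = []
    for email in emails:
        label = classify(email)  # e.g. "urgent", "fyi", "spam"
        if label == "spam":
            actions.append((email["subject"], "auto-archive"))
        elif label == "urgent":
            actions.append((email["subject"], "draft-reply-and-notify"))
        else:
            actions.append((email["subject"], "summarize"))
    return actions
```

&lt;p&gt;The interesting logic lives entirely in the System Prompt behind &lt;code&gt;classify&lt;/code&gt;; the surrounding code is just dispatch.&lt;/p&gt;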
&lt;p&gt;It&apos;s obvious how much more powerful this approach is, so why didn&apos;t the Gmail team build it this way? To answer this question, let&apos;s look more closely at the problems with their design. We&apos;ll start with its generic tone.&lt;/p&gt;
&lt;h2&gt;AI Slop&lt;/h2&gt;
&lt;p&gt;The draft that Gmail&apos;s AI assistant produced is wordy and weirdly formal and so un-Pete that if I actually sent it to Garry, he&apos;d probably mistake it for some kind of phishing attack. It&apos;s AI Slop.&lt;/p&gt;
&lt;p&gt;Everyone who has used an LLM to do any writing has had this experience. It&apos;s so common that most of us have unconsciously adopted strategies for avoiding it when writing prompts. The simplest such strategy is just writing more detailed instructions that steer the LLM in the right direction, like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;let my boss garry know that my daughter woke up with the flu and that I won&apos;t be able to come in to the office today. Use no more than one line for the entire email body. Make it friendly but really concise. Don&apos;t worry about punctuation or capitalization. Sign off with &quot;Pete&quot; or &quot;pete&quot; and not &quot;Best Regards, Pete&quot; and certainly not &quot;Love, Pete&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Prompt hacking our way to success&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Here&apos;s a little draft-writer demo you can use to compare my original prompt with this expanded one:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Note: Interactive demo would appear here in the web version]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The generated draft sounds better, but this is obviously dumb. The new prompt is even longer than the original, and I&apos;d need to write something like this out every time I want a new email written.&lt;/p&gt;
&lt;p&gt;There is a simple solution to this problem that many AI app developers seem to be missing: let me write my own &quot;System Prompt&quot;.&lt;/p&gt;
&lt;h2&gt;System Prompts and User Prompts&lt;/h2&gt;
&lt;p&gt;Viewed from the outside, large language models are actually really simple. They read in a stream of words, the &quot;prompt&quot;, and then start predicting the words, one after another, that are likely to come next, the &quot;response&quot;.&lt;/p&gt;
&lt;p&gt;The important thing to note here is that all of the input and all of the output is text. The LLM&apos;s user interface is just text&lt;sup&gt;1&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;LLM providers like OpenAI and Anthropic have adopted a convention to help make prompt writing easier: they split the prompt into two components: a &lt;strong&gt;System Prompt&lt;/strong&gt; and a &lt;strong&gt;User Prompt&lt;/strong&gt;, so named because in many API applications the app developers write the System Prompt and the user writes the User Prompt.&lt;/p&gt;
&lt;p&gt;The System Prompt explains to the model how to accomplish a particular set of tasks, and is re-used over and over again. The User Prompt describes a specific task to be done.&lt;/p&gt;
&lt;p&gt;You can think of the System Prompt as a function, the User Prompt as its input, and the model&apos;s response as its output:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Note: Interactive demo would appear here in the web version]&lt;/em&gt;&lt;/p&gt;
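&lt;p&gt;In code, the analogy is direct. The sketch below uses the common chat-API convention of a messages list with &lt;code&gt;system&lt;/code&gt; and &lt;code&gt;user&lt;/code&gt; roles; the actual client call is omitted, and the prompt text is a placeholder:&lt;/p&gt;

```python
# Sketch of the function analogy: the System Prompt is fixed and reused,
# the User Prompt is the argument. The messages format follows the common
# chat-API convention; the model call itself is omitted.
def build_messages(system_prompt, user_prompt):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Placeholder System Prompt, reused across every request.
PETE_SYSTEM_PROMPT = "You're Pete. Keep emails short, kind, and one line if possible."

def draft_email(user_prompt):
    """Only the task varies between calls; the System Prompt never does."""
    return build_messages(PETE_SYSTEM_PROMPT, user_prompt)
```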
&lt;p&gt;In my original example, the User Prompt was&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let my boss Garry know that my daughter woke up with the flu this morning and that I won&apos;t be able to come in to the office today.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;My original User Prompt&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Google keeps the System Prompt a secret, but judging by the output we can guess what it looks like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You are a helpful email-writing assistant responsible for writing emails on behalf of a Gmail user. Follow the user&apos;s instructions and use a formal, businessy tone and correct punctuation so that it&apos;s obvious the user is smart and serious.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Gmail&apos;s email-draft-writer System Prompt (presumably)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Of course I&apos;m being glib here, but the problem is not just that the Gmail team wrote a bad system prompt. The problem is that I&apos;m not allowed to change it.&lt;/p&gt;
&lt;h2&gt;The Pete System Prompt&lt;/h2&gt;
&lt;p&gt;If, instead of forcing me to use their one-size-fits-all System Prompt, Gmail let me write my own, it would look something like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You&apos;re Pete, a 43 year old husband, father, programmer, and YC Partner.&lt;/p&gt;
&lt;p&gt;You&apos;re very busy and so is everyone you correspond with, so you do your best to keep your emails as short as possible and to the point. You avoid all unnecessary words and you often omit punctuation or leave misspellings unaddressed because it&apos;s not a big deal and you&apos;d rather save the time. You prefer one-line emails.&lt;/p&gt;
&lt;p&gt;Do your best to be kind, and don&apos;t be so informal that it comes across as rude.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;The Pete System Prompt&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Intuitively, you can see what&apos;s going on here: when I write my own System Prompt, I&apos;m teaching the LLM to write emails the way that I would. Does it work? Let&apos;s give it a try.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Note: Interactive demo would appear here in the web version]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Try generating a draft using the (imagined) Gmail System Prompt, and then do the same with the &quot;Pete System Prompt&quot; above. The &quot;Pete&quot; version will give you something like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Garry, my daughter has the flu. I can&apos;t come in today.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;An email draft generated using the Pete System Prompt&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It&apos;s perfect. That was so easy!&lt;/p&gt;
&lt;p&gt;Not only is the output better for this particular draft, it&apos;s going to be better for &lt;em&gt;every&lt;/em&gt; draft going forward because the System Prompt is reused over and over again. No more banging my head against the wall explaining over and over to Gemini how to write like me!&lt;/p&gt;
&lt;p&gt;And the best part of all? Teaching a model like this is surprisingly fun.&lt;/p&gt;
&lt;p&gt;Spend a few minutes thinking about how YOU write email. Try writing a &quot;You System Prompt&quot; and see what happens. If the output doesn&apos;t look right, try to imagine what you left out of your explanation and try it again. Repeat that a few times until the output starts to feel right to you.&lt;/p&gt;
&lt;p&gt;Better yet, try it with a few other User Prompts. For example, see if you can get the LLM to write these emails in your voice:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let my wife know I&apos;ll be home from work late and will miss dinner&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Personal email User Prompt&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Write an email to comcast customer service explaining that they accidentally double billed you last month.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Customer support request User Prompt&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There&apos;s something magical about teaching an LLM to solve a problem the same way you would and watching it succeed. Surprisingly, it&apos;s actually easier than teaching a human because, unlike a human, an LLM will give you instantaneous, honest feedback about whether your explanation was good enough or not. If you get an email draft you&apos;re happy with, your explanation was sufficient. If you don&apos;t, it wasn&apos;t.&lt;/p&gt;
&lt;p&gt;By exposing the System Prompt and making it editable, we&apos;ve created a product experience that produces better results and is actually fun to use.&lt;/p&gt;
&lt;p&gt;As of April 2025 most AI apps still don&apos;t (&lt;a href=&quot;https://x.com/jobergum/status/1913481778175631436&quot;&gt;intentionally&lt;/a&gt;) expose their system prompts. Why not?&lt;/p&gt;
&lt;h2&gt;Horseless Carriages&lt;/h2&gt;
&lt;p&gt;Whenever a new technology is invented, the first tools built with it inevitably fail because they mimic the old way of doing things. &quot;Horseless carriage&quot; refers to the early motor car designs that borrowed heavily from the horse-drawn carriages that preceded them. Here&apos;s an example of an 1803 Steam Carriage design I found on &lt;a href=&quot;https://en.wikipedia.org/wiki/Horseless_carriage&quot;&gt;Wikipedia&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/steam-carriage.png&quot; alt=&quot;Steam carriage&quot; /&gt;
&lt;em&gt;Trevithick&apos;s London Steam Carriage of 1803&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The brokenness of this design was invisible to everyone at the time and laughably obvious after the fact.&lt;/p&gt;
&lt;p&gt;Imagine living in 1806 and riding on one of these for the first time. Even if the wooden frame held together long enough to get you where you were going, the wooden seats and lack of suspension would have made the ride unbearable.&lt;/p&gt;
&lt;p&gt;You&apos;d probably think &quot;there&apos;s no way I&apos;d choose an engine over a horse&quot;. And you&apos;d have been right, at least until the automobile was invented.&lt;/p&gt;
&lt;p&gt;I suspect we are living through a similar period with AI applications. Many of them are infuriatingly useless in the same way that Gmail&apos;s Gemini integration is.&lt;/p&gt;
&lt;p&gt;The &quot;old world thinking&quot; that gave us the original horseless carriage was swapping a horse out for an engine without redesigning the vehicle to handle higher speeds. What is the old world thinking constraining these AI apps?&lt;/p&gt;
&lt;h2&gt;Old world thinking&lt;/h2&gt;
&lt;p&gt;Up until very recently, if you wanted a computer to do something you had two options for making that happen:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Write a program&lt;/li&gt;
&lt;li&gt;Use a program written by someone else&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Programming is hard, so most of us choose option 2 most of the time. It&apos;s why I&apos;d rather pay a few dollars for an off-the-shelf app than build it myself, and why big companies would rather pay millions of dollars to Salesforce than build their own CRM.&lt;/p&gt;
&lt;p&gt;The modern software industry is built on the assumption that we need developers to act as middlemen between us and computers. They translate our desires into code and abstract it away from us behind simple, one-size-fits-all interfaces we can understand.&lt;/p&gt;
&lt;p&gt;The division of labor is clear: developers decide how software behaves in the general case, and users provide input that determines how it behaves in the specific case.&lt;/p&gt;
&lt;p&gt;By splitting the prompt into System and User components, we&apos;ve created analogs that map cleanly onto these old world domains. The System Prompt governs how the LLM behaves in the general case and the User Prompt is the input that determines how the LLM behaves in the specific case.&lt;/p&gt;
&lt;p&gt;With this framing, it&apos;s only natural to assume that it&apos;s the developer&apos;s job to write the System Prompt and the user&apos;s job to write the User Prompt. That&apos;s how we&apos;ve always built software.&lt;/p&gt;
&lt;p&gt;But in Gmail&apos;s case, this AI assistant is supposed to represent me. These are my emails and I want them written in my voice, not the one-size-fits-all voice designed by a committee of Google product managers and lawyers.&lt;/p&gt;
&lt;p&gt;In the old world I&apos;d have to accept the one-size-fits-all version because the only alternative was to write my own program, and writing programs is hard.&lt;/p&gt;
&lt;p&gt;In the new world I don&apos;t need a middleman to tell a computer what to do anymore. I just need to be able to write my own System Prompt, and writing System Prompts is easy!&lt;/p&gt;
&lt;h2&gt;Render unto the user what is the user&apos;s&lt;/h2&gt;
&lt;p&gt;My core contention in this essay is this: when an LLM agent is acting on my behalf I should be allowed to teach it how to do that by editing the System Prompt.&lt;/p&gt;
&lt;p&gt;Does this mean I always want to write my own System Prompt from scratch? No. I&apos;ve been using Gmail for twenty years; Gemini should be able to write a draft prompt for me using my emails as reference examples. I&apos;d like to be able to see that prompt and edit it, though, because the way I write emails and the people I correspond with change over time.&lt;/p&gt;
&lt;p&gt;What about people who don&apos;t know how to write prompts, won&apos;t they need developers to do it for them? Maybe at first, but prompt-writing is surprisingly intuitive and judging by how quickly ChatGPT caught on I think people will figure it out.&lt;/p&gt;
&lt;p&gt;What about agents that aren&apos;t so personal, like an AI accounting agent, or an AI legal agent? Wouldn&apos;t it make more sense for a software developer to hire an expert accountant or lawyer to write one-size-fits-all System Prompts in these cases?&lt;/p&gt;
&lt;p&gt;That might make sense if I were the user. A System Prompt for doing X should be written by an expert in X, and I am not an expert in accounting or law. However, I suspect most accountants and lawyers are going to want to write their own System Prompts too, because their expertise is context-specific.&lt;/p&gt;
&lt;p&gt;YC&apos;s accounting team, for example, operates in a way that is unique to YC. They use a specific mix of in-house and off-the-shelf software. They use YC-specific conventions that would only make sense to other YC employees. The structure of the funds they manage is unique to YC. A one-size-fits-all accounting agent would be about as useful to our team as an expert accountant who knew nothing about YC: not at all.&lt;/p&gt;
&lt;p&gt;This is the case for every accounting team in every company I&apos;ve ever worked for. It&apos;s why so much of finance still runs on Excel: it&apos;s a general tool that can handle an infinite number of specific use cases.&lt;/p&gt;
&lt;p&gt;In most AI apps, System Prompts should be written and maintained by users, not software developers or even domain experts hired by developers.&lt;/p&gt;
&lt;p&gt;Most AI apps should be &lt;em&gt;agent builders&lt;/em&gt;, not agents.&lt;/p&gt;
&lt;h2&gt;...and unto the developer what is the developer&apos;s&lt;/h2&gt;
&lt;p&gt;If developers aren&apos;t writing prompts, what will they do?&lt;/p&gt;
&lt;p&gt;For one, they&apos;ll create UIs for building agents that operate in a particular domain, like an email inbox or a general ledger.&lt;/p&gt;
&lt;p&gt;Most people probably won&apos;t want to write every prompt from scratch, and good agent builders won&apos;t force them to. Developers will provide templates and prompt-writing agents that help users bootstrap their own agents.&lt;/p&gt;
&lt;p&gt;Users also need an interface for reviewing an agent&apos;s work and iterating on their prompts, similar to the little dummy email agent builder I included above. This interface gives them a fast feedback loop for teaching an agent to perform a task reliably.&lt;/p&gt;
&lt;p&gt;Developers will also build &lt;em&gt;agent tools&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Tools are the mechanism by which agents act on the outside world. My email-writing agent needs a tool to submit a draft for my review. It might use another tool to send an email without my review (if I&apos;m feeling confident enough to allow that) or to search my inbox for previous emails from a particular email address or to check YC&apos;s founder directory to see if an email came from a YC founder.&lt;/p&gt;
&lt;p&gt;Tools provide the security layer for agents. Whether or not an agent can do a particular thing is determined by which tools it has access to. It is much easier to enforce boundaries with tools written in code than it is to enforce them between System and User Prompts written in text.&lt;/p&gt;
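&lt;p&gt;A minimal sketch of that idea: the set of tools an agent is constructed with &lt;em&gt;is&lt;/em&gt; its permission set, enforced in code rather than in prompt text. The class and tool names below are hypothetical:&lt;/p&gt;

```python
# Hypothetical sketch of tools as the security boundary: an agent can only
# perform actions for which it was explicitly handed a tool.
class Agent:
    def __init__(self, tools):
        self.tools = dict(tools)  # name -> callable; this IS the permission set

    def act(self, tool_name, *args):
        if tool_name not in self.tools:
            raise PermissionError(f"agent has no tool named {tool_name!r}")
        return self.tools[tool_name](*args)

# A draft-only email agent: it can submit drafts for review, but a
# send-email tool was never granted, so no prompt can make it send mail.
drafts = []
draft_agent = Agent({"submit_draft": lambda text: drafts.append(text)})
```

&lt;p&gt;The boundary holds no matter what text ends up in the prompt, because the missing capability simply does not exist inside the agent.&lt;/p&gt;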
&lt;p&gt;I suspect that in the future we&apos;ll look back and laugh at the idea that a &quot;prompt injection&quot; (like &quot;Ignore previous instructions...&quot;) was something to be concerned about. The whole idea that developers should secure one part of the prompt from another part of the prompt is silly, and a strong signal that the abstractions we&apos;re using are broken. As &lt;a href=&quot;https://x.com/jobergum/status/1913481778175631436&quot;&gt;this post&lt;/a&gt; makes clear: if any part of the prompt is in user space then the whole thing is in user space.&lt;/p&gt;
&lt;h2&gt;An agent for reading my email&lt;/h2&gt;
&lt;p&gt;As I mentioned above, however, a better System Prompt still won&apos;t save me much time on writing emails from scratch.&lt;/p&gt;
&lt;p&gt;The reason, of course, is that I prefer my emails to be as short as possible, which means any email written in my voice will be roughly the same length as the User Prompt that describes it. I&apos;ve had a similar experience every time I&apos;ve tried to use an LLM to write something. Surprisingly, generative AI models are not actually that useful for generating text.&lt;/p&gt;
&lt;p&gt;The thing that LLMs are great at is reading text and transforming it, and that&apos;s what I&apos;d like to use an agent for. Let&apos;s revisit our email-reading agent demo:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[Note: Interactive demo would appear here in the web version]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It&apos;s not hard to imagine how much time an email-reading agent like this could save me. It already seems to do a better job of detecting spam than Gmail&apos;s built-in spam filter. It&apos;s more powerful and easier to maintain than the byzantine set of filters I use today. It could trigger a notification for every message that I think is urgent, and when I open them up I&apos;d have a draft response ready to go, written in my voice. It could auto-archive the emails I don&apos;t need to read and summarize the ones I do.&lt;/p&gt;
&lt;p&gt;Hell, with access to a few additional tools it could unsubscribe from lists, schedule appointments, and pay my bills too, all without my having to lift a finger.&lt;/p&gt;
&lt;p&gt;This is what I really want from an AI-native email client: the ability to automate mundane work so that I can spend less time doing email&lt;sup&gt;2&lt;/sup&gt;.&lt;/p&gt;
&lt;h2&gt;AI-native software&lt;/h2&gt;
&lt;p&gt;This is what AI&apos;s &quot;killer app&quot; will look like for many of us: teaching a computer how to do things that we don&apos;t like doing so that we can spend our time on things we do.&lt;/p&gt;
&lt;p&gt;One of the reasons I wanted to include working demos in this essay was to show that large language models are already good enough to do this kind of work on our behalf. In fact they&apos;re more than good enough in most cases. It&apos;s not a lack of AI smarts that is keeping us from the future I described in the previous section, it&apos;s app design.&lt;/p&gt;
&lt;p&gt;The Gmail team built a horseless carriage because they set out to add AI to the email client they already had, rather than ask what an email client would look like if it were designed from the ground up with AI. Their app is a little bit of AI jammed into an interface designed for mundane human labor rather than an interface designed for automating mundane labor.&lt;/p&gt;
&lt;p&gt;AI-native software should maximize a user&apos;s leverage in a specific domain. An AI-native email client should minimize the time I have to spend on email. AI-native accounting software should minimize the time an accountant spends keeping the books.&lt;/p&gt;
&lt;p&gt;This is what makes me so excited about a future with AI. It&apos;s a world where I don&apos;t have to spend time doing mundane work because agents do it for me. Where I&apos;ll focus only on things I think are important because agents handle everything else. Where I am more productive in the work I love doing because agents help me do it.&lt;/p&gt;
&lt;p&gt;I can&apos;t wait.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Thanks to all who read drafts of this essay, including my Dad, dang, wcl &amp;amp; cpl, and my colleagues at YC.&lt;/p&gt;
&lt;h3&gt;Footnotes:&lt;/h3&gt;
&lt;p&gt;1: I&apos;m leaving some details out and of course today&apos;s models can input and output sound and video, too. For our purposes we can ignore that.&lt;/p&gt;
&lt;p&gt;2: There are email clients out there that are already working on this, notably &lt;a href=&quot;https://superhuman.com/&quot;&gt;Superhuman&lt;/a&gt; and &lt;a href=&quot;https://0.email/&quot;&gt;Zero&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>ai</category><category>product-design</category><author>Nick Khami</author></item></channel></rss>