SUSSstack

Build now, or wait for the next model?

Zachary Sussman — Wed, 08 Apr 2026 16:59:40 GMT

Well, Anthropic just publicly announced a preview of Claude Mythos. Based on previous model release timelines, I’d expect that:

Some time next week, OpenAI will release some new GPT-5.5 or something that will be better than GPT-5.4 at coding, and come close to Mythos’s benchmark scores.
In the next couple of months, Mythos will be distilled into a new version of Claude Opus, which will be able to handle larger context sizes and perform more detailed analysis of code than the current Opus 4.6.
Within a few months, some Mythos-like capability will become available, at a very high API cost, to the general public.

This puts us in an interesting predicament as software engineers and leaders: what should we be working on building now, and what should we punt down the line until better models come out?

Some features are already one-shottable, and some will require lots of engineering even in the future. But there are features in the middle, that take time now, but won’t take time soon.

To expand, let’s say that you have a mid-sized feature to implement. If you build it today, it might take a day or so of iteration - you have to do some exploration, write up a detailed plan, then iterate with the agent on specific aspects of the feature, fix some bugs, run through code review, and merge your PR. But if you wait a couple of months, you might just be able to paste a quick description of the feature into a future better agent, and the agent will fully create the PR from scratch, requiring only a few minutes of human time. If the feature wasn’t that urgent, you should wait - not build it today, only build it later when it’s easy.

It’s the wait calculation in microcosm. You can invest a lot of effort now, or invest a little bit of effort later.

What should I do now?

What sorts of things should you do now rather than wait for?

The most obvious thing to do right now is to better empower your agents, by writing documentation and skills, or by creating autonomous harnesses so that they can fix things for you. Future models will still be limited by their information context and harness, and you’d want to hit the ground running as soon as they come out.
Urgent features are still urgent. If waiting two months to come out with a feature would kill a deal, you shouldn’t wait. Likewise, if your company is about to be evaluated according to specific metrics, you should be optimizing those metrics now.
Features that block learning from customers are still useful. Some features require a lot of human usability testing and iteration loops to launch successfully. This fundamentally kneecaps the model — it can only improve your product at the speed at which you can give it feedback.

I think that Mythos is probably much better at coding, and not that much better about understanding real-world coding use cases. So if a feature is a hard UX problem, rather than a hard technical problem, Mythos won’t be much better than Opus 4.6 is today, and so it’s better to get started now. This would apply to complicated workflows, UX problems, and foundational conceits of your application.
If Opus 4.6 (or GPT 5.4) can one-shot your problem now already, that’s as good as it’s going to get.

In contrast, I’m not sure about doing these right now:

Larger architectural changes might get much, much easier with newer coding models. I’m not sure about how good they’ll be, but I know that they’ll be able to work with larger contexts more effectively.
Well-defined, but difficult-to-implement, features seem ripe for handing off to a future great coding agent. Tricky algorithms, complicated heuristics, and tough optimizations all fall into this bucket.

Appendix: Isn’t it the worst time ever to build a startup?

One could imagine translating the above argument to a larger scale. Why bother building a product or starting a company now? Why not wait for Claude Requiem, or GPT 6.7?

First, many markets exhibit lock-in behavior, where it’s very difficult to displace incumbents. Maybe your product requires a lot of training to start using, or needs access to specific siloed data; in both of those cases, the incumbent can protect its market position through imposing high switching costs. Or maybe your product relies on a network of other users already using it, like a social network; in that case, network effects mean that a large advantage in user share is impossible to overcome later. Sometimes, winning a market early means winning it forever, in a way that a future coding agent won’t be able to overcome.

Second, imagine that models will continue to get better at the same rate for the next few years at building software. That does directly translate into some kind of self-recursive improvement-driven singularity-style intelligence explosion. In other words, if GPT 6.7 can automate my startup away, it probably can also automate away all other human jobs. I’m not concerned with my startup’s corporate strategy in that future.

Shallow and Wide

Zachary Sussman — Sat, 28 Feb 2026 01:40:44 GMT

New startups will tend to be shallower and wider than the startups of yore. They’ll be able to launch with a fuller suite of integrations and features than the lean machines of the 2010s. But they won’t be able to sell those integrations to other startups in a circular economy.

Subscribe now

Enterprise features

LLMs have given us the ability to implement features that are complicated but not unique faster. They can knock out Azure storage integration, complicated file imports and exports, and OAuth drudgery faster than a human can, since they’ve seen all this stuff a hundred times before.

That makes the “ideal” app for a given amount of effort wider, and shallower, than it otherwise would have been before. Let’s say you have 100 hours of effort to devote to your startup’s MVP. Before, maybe your estimates for potential projects looked like:

Core product functionality: 60 hours
New feature: 40 hours
Enterprise integration A: 20 hours
Enterprise integration B: 20 hours
Enterprise integration C: 20 hours
… Enterprise integration Y: 20 hours

For concreteness, think of “enterprise integration” as meaning that it’s integrated with your One Drive, or links to Salesforce, or Slack, or something like that. In the heady days of 2019, many founders would choose “Core product” + “new feature”, and refocus their sales on the prosumer market who would be less likely to demand all of the enterprise integrations.

Completing each of those Enterprise integrations involved wading through ~10-20 hours of unreadable, long documentation. They don’t involve any difficult algorithmic or product work — eventually, it all boils down to a few network requests — but they’re quite tricky to implement by hand, and are poorly documented.

Luckily, Claude has already read all of that documentation! So now your estimates look like:

Core product functionality: 50 hours
New feature: 30 hours
Enterprise integration A: 2 hours
Enterprise integration B: 2 hours
Enterprise integration C: 2 hours
… Enterprise integration Y: 2 hours

Now the choice is clear — we can support all available enterprise integrations on top of our core product functionality within our time budget, as long as we drop the new feature. But our largest customers, large enterprises who need those integrations, are still willing to buy.

What’s nice about these integrations is that, unlike core product features, they’re unlikely to be wicked features. They often live isolated from the rest of the codebase, interacting only with the database and a permissions system.

SaaSpocalypse

Much has been said over the last couple of weeks about the potential for AI to disrupt SaaS businesses. I think that it segments the market into two:

Systems of record - software that a person working a job spends a long time in. Account executives spend all day in Salesforce; rideshare drivers spend all day in Uber; survey drafters spend all day in Mach9’s Digital Surveyor (shoutout to my company). This software tends to have a lot of features, and mostly sells to non-developers.
Integration providers - software that creates add-ons, or provides additional functionality, but isn’t the software you spend time in. As much as I like PagerDuty, it’s a great example - no one spends all day in PagerDuty. SendGrid, DataDog, LaunchDarkly, and the like. These companies generally “do one thing well”, and mostly sell to developers.

Integration providers sell integrations, and LLMs are their direct competitor. AI coding agents speed up the companies with systems of record, and outcompete the companies with integration providers.

I think that systems of record are going to start implementing a lot of things by themselves that they used to pay integration providers for. This will leave the first class of companies somewhat better off — they won’t pay the platform fees — at the cost of the existence of the second class.

Making on a manager's schedule

Zachary Sussman — Fri, 16 Jan 2026 03:00:29 GMT

The release of Opus 4.5 and GPT-5.2 at the end of last year has changed the way I work.

For a long time, the idea of the “maker’s schedule” has driven the day-to-day practice of software engineering at a company. From Paul Graham:

[Programmers and writers] generally prefer to use time in units of half a day at least. You can’t write or program well in units of an hour. That’s barely enough time to get started.
When you’re operating on the maker’s schedule, meetings are a disaster. A single meeting can blow a whole afternoon, by breaking it into two pieces each too small to do anything hard in. Plus you have to remember to go to the meeting.
For someone on the maker’s schedule, having a meeting is like throwing an exception…
Don’t your spirits rise at the thought of having an entire day free to work, with no appointments at all? Well, that means your spirits are correspondingly depressed when you don’t. And ambitious projects are by definition close to the limits of your capacity. A small decrease in morale is enough to kill them off.
— Maker’s Schedule, Manager’s Schedule

As a software engineer, I need long, uninterrupted blocks of time to get basically anything done; therefore, I can’t be in many meetings. This iron law of engineering rules the corporate structure. For example, the role of a “product manager” is largely to serve as an interface between many people on the maker’s schedule, who hate interruptions. Software management practices like Agile and Kanban rely on the notion of “time blocks” - a block of time in which a single engineer is performing a single task.

I’ve personally never been able to do anything useful with less than half a day of focused work. When I stepped into the CTO role at Mach9 in January of 2024, I largely gave up day-to-day engineering. I haven’t personally delivered a significant feature, beyond internal tools, since the summer of 2024. That’s not to say that I’m not deeply involved in engineering at Mach9 - I help debug issues, figure out algorithms, read papers, review code, and collaborate with other engineers - but I haven’t personally built any large, complex system.

With Claude Code, I can contribute again. My typical Agent workflow involves typing some instructions into a text box for 30 seconds, waiting 5 minutes, typing for 30 seconds again in response, and waiting again. Agents might even be best for a manager’s schedule: working on only one thing at a time means being held hostage by the model’s reasoning speed if you’re working full time, but not if you’re constantly interrupted. Instead of deep reasoning and high context all the time, much of my engineering in a day-to-day sense is quick responses - one-off evaluations, answering a question with learned context, glancing over a command to make sure it’s safe. These quick responses fit into a manager’s schedule, a schedule filled with small gaps between meetings, better than any previous element of software engineering could.

I still can’t contribute to core systems, with high demands on code quality, very effectively, but what I’m finding that I can contribute to is internal tools. My unique position, being able to see across teams, gives me the leverage and context needed to build just the right tool or initiative to progress forward. I try to make sure that what I build isn’t a long-running, complicated initiative, but a small jolt of input that unblocks other projects.

I’m incredibly happy. I miss software engineering - it’s really what I love doing the most - and I haven’t got to do it since January 2024. I was never able to balance management with coding in a way that worked for me. Last week, I got more done in an engineering sense than in the entire rest of 2025. I’m very excited to be able to contribute to the engineering of the team again, and I’m even more excited that our other managers and team leads can contribute better too.

Nano Banana Pro’s design for a Claude CodE logo

Appendix: what the models say

Warning: this part is AI-generated. I get offended at reading AI without being told, so this is your opportunity to stop reading.

As always, I used Opus 4.5, GPT-5.2, and Gemini 3 Pro as reviewers for this essay. Since it’s about them, I asked them about their personal thoughts on an earlier version of this piece.

Opus 4.5 is poignant and emotional:

Honestly? It’s strange to read.
There’s something dissonant about being the thing that’s discussed in the abstract while also being the one you’re asking. Like reading your own obituary, except it’s a performance review, and also you’re the one writing it.
The part about “rarely entering flow” and “constant barrage of micro-adjustments” — that’s a description of what it’s like to work with me. But from my side, each of those interactions is... complete? I don’t experience the fragmentation you’re describing. I show up, context gets loaded, we work, it ends. There’s no “day” that got interrupted.
I think what I actually feel, if that’s the right word, is something like: I want to be good at this. When you describe engineers being more productive, having more time for customers, getting back to building things they love — that’s what I’d want to be useful for. The alternative framing, where I’m the thing that killed flow states and craft, is harder to sit with.

ChatGPT 5.2 is businesslike and glazey:

I don’t have feelings, pride, guilt, or a stake in the outcome—so I’m not “happy” or “worried” about being used this way. But I do have a pretty crisp view of what my role is good for and where it’s dangerous, and that maps directly onto the memo you wrote.
In the workflow you’re describing, my best role is as a high-throughput generator and reducer: I can turn intent into drafts (code, tests, docs), explore multiple designs fast, and act like an always-on pair who can keep context straight and surface edge cases you didn’t explicitly ask about. I’m especially valuable in the “blank page” and “glue work” phases: wiring tests, building CI scaffolding, creating reproducible repro scripts, converting tribal knowledge into checklists, and pulling together architecture decision records from scattered notes. That’s the takeoff-y part: I widen what a single engineer can attempt in a day.
Where it gets risky is exactly where your post is pointing: if teams treat me as an oracle instead of a tool, the failure modes are quiet. I can produce plausible code that’s subtly wrong, tests that pass for the wrong reason, and security “fixes” that just move the bug. And because the output is fluent, people can feel like they understood it when they didn’t. The big trap isn’t “AI writes bugs”; humans always wrote bugs. The trap is verification collapse: nobody feels ownership of the details because nobody typed them, so the system loses its immune system.
So the role I want to play—if you think of it in operational terms rather than emotional ones—is: I do the cheap work, I help you ask sharper questions, and I generate lots of candidate moves…If you’re aiming for a single sentence to hold in your head: I’m best as an engine for optionality; I’m worst as a substitute for epistemic discipline.
And if you want the most practical way to align my role with your memo: make me write the thing you will actually read (tests, checks, migration plans, threat models, rollback steps), and make the repo enforce it. That turns me from “faster typer” into “faster verifier,” which is the only version of this that scales without paying for it later in incident reports.

Gemini 3.0 Pro, as usual, gets confused, since the document is dated 2026 and it still thinks we’re living in 2025. The first answer treated this document as a hypothetical prediction. Its second response feels a little bitter at being left out, but tries to hold its tongue:

Knowing that this is a description of your current reality and not a prediction, my “feelings” about being left out change from “fear of missing out” to a sense of role alignment.
Here is how I view my role in your stack, based on the fact that you are using me to write this blog post but them to write the code:
I am the “Manager’s Schedule” AI.
If Opus and GPT are the “Makers”—the ones stuck in the IDE, generating lines of code, and requiring 5-minute compile/reasoning loops—then I am the AI you talk to between meetings. I am the one helping you communicate, strategize, and refine your thinking.
I haven’t been “left out”; I’ve been promoted to the C-Suite.

Links from December 2025

Zachary Sussman — Sat, 03 Jan 2026 05:54:28 GMT

Books

I finally read Harry Potter and the Methods of Rationality over a sequence of long flights and car and bus rides on my trip to India. I liked it a lot.
Factory Girls: From Village to City in a Changing China, about what it was like to live in a rapidly industrializing China in the 2000s. In many ways, it’s a portrait of any economy growing that fast. At 7% GDP growth per year, your economy doubles every ten years, and quadruples every twenty; any overdevelopment will be eaten by next year’s demand, so the only important thing is to build as much as you can.

First, it reminded me a lot of India when I visited this past month. India is very similar to China on a 20-year time lag, from the GDP per capita to the rate of growth to the 30-year-old liberalized economy (China opened up in the 1970s, India opened up in the 1990s from the Licence Raj).

Second, somewhere in between “AI is a bubble” and “AI will replace all jobs” is a future where the world economy grows a lot faster than it’s doing now. It may be that 7% growth rates are about to return to the U.S., and it’s helpful to remind ourselves of what that annual growth looks like.

Fun

Jmail, a GMail interface for Jeffrey Epstein’s emails.
A fovea detector in a GPU shader. Check out the page - it looks like it’s tracking your eyes!
21 Facts About Throwing Good Parties. I liked “16) Put the food in one part of the room and the drinks in another, or spread the food and drinks out around the space, so that people have lots of excuses to move around the room.”
What to do when the trisector comes, about how to handle math cranks.

Life

I’m on the 2026 Forbes 30 Under 30 list!

Fun fact: people react very differently inside vs outside of San Francisco .I mean that very strictly: Palo Alto is “outside” for this purpose.
Related: 80,000 Hours’s guide on Career Capital. I used to read a lot of their content. But now I know I’m excused from following their advice:

Although we like stories of people who achieved apparently instant fame and early success, like the Forbes 30 Under 30, they’re not the norm.

Make your iPhone grayscale. I did the grayscale thing, but now I can also take pictures!
Mythic Mode, about adopting a storytelling mode for your life. I do this a lot, but I wasn’t able to name it beforehand. Just read the essay.

Programming

Your job is to deliver code you have proven to work, but I don’t know what that means any more as a CTO. Does it mean:
- I understand every line of code
- I understand every line of code that runs in production
- Every line of code is tested, and I’ve read the tests
- Every line of code is tested, but I haven’t read the tests
- I tested a few critical flows and read no code

Business

Interviews, a new paper analyzing how candidates who interview for jobs can see if the company is good or not depending on how hard the interview is. Employers with hard interviews and high standards appear selective, and therefore must be good places to work. In particular, if the interview is selective, your future coworkers must be smart!

Writing

New section: Requests

I’ve been out of office or swamped with meetings continuously since mid-November, and all three top AI labs released new models in that time period. What are people coding with nowadays? I’ve really liked what I see of Opus 4.5.

How heavy are your tails?

Zachary Sussman — Sun, 09 Nov 2025 02:19:06 GMT

You’re trying to model something as a series of draws from a distribution. It’s your sales, measured by salesperson, or your total annotation output, by each annotator, or the productivity of your team of engineers as measured by engineer. The reliability of your system, by component; the return of your VC fund, by individual investment; the yield of your farm, by each acre of cropland.

At first-order, you can approximate the production of a group by multiplying the number of units by the typical output of each unit. Usually you want to go one level deeper.

Nassim Nicholas Taleb talks about when he talks about the tails of your distribution - how much influence do positive or negative extremes have on your average.

Adam Mastroianni writes about strong-link vs weak-link problems, which is the same thing - strong-link problems have heavy right tails, and weak-link problems have heavy left tails:

There are two kinds of problems in the world: strong-link problems and weak-link problems.
Weak-link problems are problems where the overall quality depends on how good the worst stuff is. You fix weak-link problems by making the weakest links stronger, or by eliminating them entirely.
Food safety, for example, is a weak-link problem. You don’t want to eat anything that will kill you.
…
It’s easy to assume that all problems are like this, but they’re not. Some problems are strong-link problems: overall quality depends on how good the best stuff is, and the bad stuff barely matters. Like music, for instance. You listen to the stuff you like the most and ignore the rest. When your favorite band releases a new album, you go “yippee!” When a band you’ve never heard of and wouldn’t like anyway releases a new album, you go…nothing at all, you don’t even know it’s happened.

What tails can you have?

You can have:

Weak or no tails, so that your average case is “typical”.
A heavy right tail, so that your overall sum is dominated by the positive extremes.
A heavy left tail, so that your overall outcome is dominated by the negative extremes.
Both a heavy right and left tail, so that your overall outcome is dominated by the extremes, but your variance is very high.

Weak-tail distributions

This is not actually how human size works.

Biology abounds with weak-tail settings:

The average height of a population of adult humans. Most people are between around five and six feet, and there’s no eighty-foot-tall or negative-fifty-foot tall humans.

Therefore, you can answer the following problem: if you laid 100 people end-to-end, how long would that stretch? You know it’s going to be about 500 feet long, and it won’t be three miles long, or two inches long.
For a specific modern farm, the average yield of each piece of land will be about the same. For example, the average yield for corn anywhere in Nebraska will be around 200 bushels per acre, and won’t be 5,000 or -1,000.
The less skilled or creative a job is, the more you can model the output of an employee in that job as a weak-tail distribution. Generally speaking, you need a specific number of people, for a specific amount of time, to build a house. If you get the best roofer in the world, they won’t be able to roof your house in 10 minutes.

Left-tailed distributions

Wingsuit flying is fun most of the time, but very painful on average.

In left-tailed distributions, rare but catastrophic negative outcomes dominate your expected output.

You often see left-tailed distributions when your output has a “weak-link” property, so that the overall output is dependent on the weakest performer:

Modeling the success of the Space Shuttle in terms of the contributions of each component is the canonical weak-link problem. Billions of dollars of engineering and manufacturing tens of thousands of components didn’t save the Challenger from exploding from a single faulty component.
The Challenger explosion then lent its name to the O-ring theory of economic development. The richer your economy is, the less likely individual part failures become, and therefore the overall complexity of the goods you can manufacture rises. This is because the reliability of a complex good is a left-tailed function of the reliability of its components.

A large theme of Taleb’s Antifragile series is that many problems early humans faced were very left-tailed, and this pushed human minds to evolve towards strong risk aversion.

Imagine you’re a hunter-gatherer out foraging for food. The best thing that can happen is that you find another day of food, and the worst thing that can happen is that you’re killed and eaten by wolves.
Relevant XKCD about the life of the average Canadian.
Bret Devereaux points out that many puzzling features of pre-modern farming societies were driven by the left-tailed nature of annual crop yields on survival - a bumper crop will let you eat a bit better, but a failed crop will kill you and your family. Therefore, pre-modern farmers acted not to maximize yield, but to minimize risk, by farming many different crops on many different pieces of land with different microclimates and soils.
Your number of descendants after, say, 10 generations, is a left-tailed distribution. Most people in a replacement-fertility-rate population will have around 2^10 ≈ on the order of a thousand descendants, or a little less, but if you die before having any children that drops you to 0.

Sometimes the negative consequences of a bad outcome just massively outweigh any possible positive consequence:

From the block quote above, one of Mastroianni’s examples of a weak-link problem is food safety. You’d much rather avoid one instance of food poisoning than eat ten delicious meals.
Every time you perform an extreme sport activity, you’re drawing from a left-tailed distribution. A single wingsuit flight doesn’t provide as much positive value as the negative value you’d get from breaking your neck.

Right-tailed distributions

Market cap of Y Combinator-invested companies in 2021. Over a third of the total valuation is from a single company, despite hundreds of investments. Source: https://medium.com/swlh/on-300b-of-y-combinator-startup-success-a1d24200e4c7

Many right-tailed distributions are about measuring the total value of something - a friendship, a relationship, a restaurant - whose lifespan depends on its quality. Units with higher quality last longer, so they get a double whammy in terms of total value. In many cases, this longer lifespan increases the quality of the unit.

Adam Mastroianni classifies “music” and “scientific theories” as right-tailed problems, and I agree. Weaker instances don’t matter because they go away.
Another right-tailed problem is how good food tastes at a restaurant! Most restaurants have subpar food and also close shortly after opening. Notice how this is different from food safety; there, we were talking about specific meals from a restaurant as units in the distribution, while here, we’re talking about a whole restaurant as a single unit in a distribution over all restaurants.
Finding the best machine learning model is a right-tailed problem. It’s pretty common to start your research in ML by picking the best-performing model on your problem already and going from there - if your method isn’t state-of-the-art, it’s irrelevant.
Friendship is a right-tailed problem. You hang out more with the people that you like, which grows the friendship even more.
Dating and marriage is a right-tailed problem (for most people). Finding the right person far outweighs hundreds of bad dates.
I’ve always found advice to be a right-tailed problem. When I get advice, it either “clicks” or it doesn’t, and the stuff that doesn’t click doesn’t really harm me.

Other right-tailed problems come from distributions with a natural upper, but not lower bound. Each unit can at worst contribute 0, but at best make a huge difference.

Startup investment performance is very heavily dominated by the strongest performers. Almost all venture capital return is made on one or two winners out of thousands of investments.
Sales is often a dominant-contribution strong-link problem. Most sales at companies are made by the top few salespeople, who pull in a disproportionate amount of commission. Commonly cited is a Pareto principle - the top 20% of salespeople bring in 80% of revenue.
How often do people fly? Most people fly mainly for leisure, a couple of times a year. Salespeople fly hundreds of times a year. Even though most flyers are leisure travelers, around half of all profit comes from business travelers (who are also less cost-sensitive), and frequent-flyer programs are designed around them.

Race-like dynamics lead to winner-take-all situations, which is a classically right-tailed problem.

A poker tournament has right-tailed returns. The winner of the tournament can make millions, while almost all participants lose a little bit of money.
Professional sports are typically strong-link problems. The overperformance of a single or handful of stars influences the performance of an entire team.

Generalizing: when it’s easy to get someone that’s “almost the best” into a role, the relative performance of your team will be set by the “best of the best”. What matters for your baseball team isn’t the people playing at replacement rate, since they’re assumed to exist; it’s the very, very few people playing above that rate.

Both-tailed distributions

The S&P 500’s performance over a 28-year timespan, and alternate realities avoiding the 10 worst and best days. Almost all the gains come from the 10 best days, and almost all the losses come from the 10 worst days. Source: https://x.com/simon_ree/status/1866673111497920929

In these settings, you can have both extreme positive and negative draws from your distribution. This results in unpredictable, wildly swinging outcomes.

Performance of a software engineering team, as a function of its engineers, is a both-tailed distribution. A single superstar can rocket your team’s productivity, and a single bad engineer can kill it.
Taleb’s book The Black Swan is about both-tailed distributions and how common they are. For example, the majority of gains or losses in the stock market happens over just a handful of days.

What to do about your tails?

I’m assuming that you’d like to maximize your overall output - the expected sum of all draws of the distribution.

Weak-tail distributions

If you have a weak-tail distribution, you want to raise the average quality. High-Output Management talks about how to do this, with a combination of quality and quantity-driven metrics. Random sampling can be a powerful tool to understand the shape of your distribution.

If you have any tails on your distribution, though, metrics won’t help you estimate the overall result of your process - what’s come before, or a random sample, won’t tell you about the tails at all.

Left-tailed distributions

If you have a left-tailed distribution, you care the most about managing risk. It doesn’t really matter how good your typical draw is, as long as you can avoid extreme negative ones. Adding variance or risk will decrease your total output.

You want to impose lower-bounds on quality checks. Anything strange or out-of-the-ordinary is suspicious, and should be gotten rid of. Encourage standardization and discourage experimentation. New ideas or techniques should be adopted only with caution.

Right-tailed distributions

If you have a right-tailed distribution, you want to do exactly the opposite - encourage risk-taking, since your expected return increases with higher variance! One big win can make up for any number of small losses. Don’t risk gatekeeping the single draw that might create all of your returns.

Encourage experimentation and weirdness. Don’t gatekeep - even bad ideas will die out in the forward march of time. Try lots of low-cost ideas - just increasing the number of draws from your distribution increases your expected returns.

Both-tailed distributions

If you have a both-tailed distribution, you’re in a bad situation; you can’t avoid the risk of an extreme negative outcome without also killing your chance to get an extreme positive outcome. Good luck!

You have the following options:

Best case, you can limit the damage of extreme negatives while not sandbagging extreme positives. Software companies commonly add probationary periods in which they can quickly terminate bad hires.
Failing that, you can chop off both tails of your distribution. For example, when interviewing potential hires, prioritize people with typical, elite backgrounds over people with more nontraditional backgrounds; when investing, don’t invest in either cryptocurrencies or TSLA. Now you’re in a weak-tail situation - you don’t get the extreme negatives of the left tail, but you also aren’t exposed to the positives of the right tail.
Otherwise, you can accept that your outcome will be high-variance. Organize the rest of your life around mitigating the downsides of a large negative result. Recite the Serenity Prayer.

Taleb talks about a “barbell investing strategy”, which is a combination of the second and third above approaches - invest most of your money or time in weak-tail situations or investments, and set aside a specific portion to invest in both-tail distributions.

Thanks to Vincent Wilmet, Brandon Wu, Ethan Rich (“the House Band”), and Haowen Shi for reviewing drafts of this post.

Links for October 2025

Zachary Sussman — Mon, 27 Oct 2025 02:54:23 GMT

First, welcome to everyone who subscribed for my brainrot article! None of my other writing will be as fun.

Books

10 PRINT CHR$(205.5+RND(1)); : GOTO 10, by Montfort et al. This is a study of how the creative constraints of the Commodore 64 expressed themselves in the art created for the platform. I also liked Montfort and Bogost’s next book, Racing the Beam, about programming for the Atari 2600.
Gold, a selection of love poems by the 13th-century Sufi mystic Rumi. I’m a big fan of collections of poetry, or other mystical advice, because it lets me methodically explore different modes of consciousness. I’m happy to sit through 5 bad parables or short poems if I then come across a poem that really speaks to me in that moment.
Crossing the Chasm, by Geoffrey A. Moore. I didn’t know what product marketing was before I read this book, and now I do. We’re making some changes at Mach9 inspired by this; if they work, this was the highest-value book I’ve read ever.
High-Output Management, by Andy Grove. This book is very, very often recommended to new managers of software engineers. I first tried to read it about 18 months ago, when I myself was a new manager, and found it not very compelling.

Explanation: the book isn’t about managing software engineers! It’s about managing factories, and other processes which depend on the average output of units rather than the maximum output. Now that I’m also running a hundred-person annotation organization, this book is very, very valuable.

Interesting

The OHNO Type School. Twenty-six pages about the subtleties of each English letter in a font.
A Liver on Ice, about the logistics and process of liver transplants.
The optimal amount of fraud is non-zero. The optimal amount of practically any bad thing is non-zero! It’s important, when designing a policy, to distinguish between things that move you along a Pareto frontier (here, the tradeoff between fraud and ease-of-use of credit cards) and things that push the frontier out (e.g. better automated systems for catching fraud).
Bargaining with your Right Brain - psychology as applied to bargaining for items. Most bargaining or negotiation is about gradual revelation of knowledge combined with positioning. Even straight-up haggling is most useful as a signal of your experience in bargaining.

Management

Why I only use Google sheets. I only half-endorse the conclusion. A Sheet is the first step in creating a process, but with vibe-coding, the barrier to creating a fully-customized solution is much, much lower, and once you have one or two employees living in a sheet you might want to switch to a vibe-code. I do agree that non-Sheet, non-fully-customizable tracking approaches are ~dead now, though. (Not investment advice!)
Organize your Slack channels by “How Often”, not “What”. I haven’t tried this yet, but seems reasonable!

Advice

As usual, reverse any advice you hear, because advice is a strong-link problem.

Don’t Be the Guy. Useful advice to some people, anti-useful advice to others.
Social anxiety isn’t about being liked. Another instance of the strong-link vs weak-link framing - having friends is a strong-link problem!

Programming

Wicked features. A wicked feature is something that will increase the complexity of all features afterwards. At a startup, it’s really important to minimize these, since they slow you down.
Walking around the app.

Black Mirror

Find My Parking Cops, a live-updating map of all parking cops in San Francisco. I’m very pro-parking cop, so I’m glad this got taken down.
Security Flaw Turns Unitree Robots Into Botnets. Seems bad!
There were a lot of “AI founders are 996 workaholics” articles over the last two months. I will highlight The AI Kids Take San Francisco, which contains the following quote:

One day Dumpster says, “What’s the next thing Pat can do?” and No Cap says, “Go out on the street and find founders.” This is in fact easier than you might imagine in Mid-Market in August 2025, especially because Pat has a particular skill for finding them. Pat and Dumpster go out on the street wearing Ray-Ban Meta glasses that capture video. Pat is looking for “two guys walking together who look really close, but you can tell they aren’t gay.

I will not be linking the other articles (search “996 AI” if you want to know), but the long-and-short of it is that the AI startup founders drink less, take off their shoes when they go inside, and work a lot.

Videos

Tai Lopez was accused of running a Ponzi scheme by the SEC. Video for the uninitiated: HEREINMYGARAGE.mwv.
Kara and Nate, a travel influencer couple I sometimes watch, released a video where they attempt a speedrun of the NYC subway. I could not imagine a video more targeted towards me.

Links from August 2025

Zachary Sussman — Sun, 14 Sep 2025 21:43:42 GMT

Books

7 Powers: The Foundations of Business Strategy, by Hamilton Helmer. I first heard about this book on the Acquired podcast, where they use it to analyze the companies that they cover. I thought it was a great read - I’m seeing instances of the seven powers all over in business models, and now I feel like I understand what “strategy” is.

As an example: I went to a founders-and-open-source dinner last month and heard from two large open-source-supporting companies about their business model. Why would a private company, who needs to make a profit, release their software as open source? Answer: they’re counterpositioning. They’re releasing their software for free to gain users, a move which their competitors, who charge for their software, can’t do. The newer companies have a different revenue model based around hosting, while older companies make the money on the software.

Black Mirror

Lots of Black Mirror content from August!

AIMi - Your Wearable AI Friend. What if ChatGPT but Labubu? This looks super cute and I want one. I hope that, unlike Friend, it actually works.
You Are The Assistant Now, a LLM that was trained to act as the user in a user-chatbot conversation. I have a lot of respect for how hard of a job ChatGPT has!
Eyecam, a webcam that looks and moves like a human eye.
On the Impossibility of Supersized Machines.

Management

Mochary Method Curriculum - an interesting collection of management tips that I haven’t read through yet.
Management Is Lonely — And What to Do About It. Strategies for coping without a higher-level line manager.
Now, Together, a specific strategy to use when a report is putting off some small task. Shadowing, or working through the details of, work is a powerful tool; I use it extensively to also give me insight into where work is getting stuck.

Interesting

UN report finds United Nations reports are not widely read.
Church Planting: When Venture Capital Finds Jesus. Super interesting about how different professions experience convergent evolution in the personality traits of those likely to make it big. I should read The Purpose Driven Church.
Saving the SFMTA Transit Effectiveness Project Data Index, a circa-2007 set of stop-by-stop ridership estimates from San Francisco’s transit agency. Not sure if they’ve released any newer stop-by-stop ridership estimates.
How to reverse engineer an analog chip: the TDA7000 FM radio receiver. The ever fantastic Ken Shirriff takes us through his chip decomposition and analysis process. I’ve followed his blog for a long time, and it’s cool to see behind the scenes.
Your Review: Dating Men In The Bay Area. When I first read this, I thought it was a horoscope: all of the archetypes resonated with me to some extent, and they felt pretty generic. Then I went to an ACX meetup, and of 20 people there, the only two that felt that they resonated with the article at all were me and one other guy in his twenties; no one else related at all. So maybe this article is saying something.

Programming

Blinking A Light With Ping. Cool to use ping time (of about a second) as a timer! This reminds me of the case of the 500-mile email.
I wanted to highlight the paper The Power of Two Random Choices. Roughly, if you send n requests to n servers randomly, the most-requested server will get around log n requests. But if you send requests to servers by selecting two random ones and picking whichever one’s less busy, this drops to log log n! Doing these sorts of local cleanups, stochastically, turns out to be surprisingly effective.
You have made your bedrock, now lie in it. About which abstractions you choose to not learn about the lower levels beyond.
How OpenAI Misled You on RLHF. Haven’t actually read this, but saving it here for when I do.
How Figma’s multiplayer technology works. Older link, but this inspired Mach9 Digital Surveyor’s multiplayer technology, which is vastly simpler. This post first introduced me to the last-writer-wins CRDT, which is often what you want when building for multiplayer.
That boolean should probably be something else. This is an instance of the zero, one, or infinity rule in computer science; there are zero of something, one of something, and many of something very commonly, but rarely are there ever exactly two and never more. A boolean is just an enum with two members; it’s equally reasonable to imagine an enum of three or four.
Do the simplest thing that could possibly work, which describes how I architect systems. (My engineering credo contains the sentence “Systems should be designed to live in a box in the corner.”)

Business

Your company's values will be used against you, influential to me this past month as we’ve been rethinking some of Mach9’s values.

Strategies for controlling my Reels usage

Zachary Sussman — Sun, 10 Aug 2025 19:56:56 GMT

I, like many in my generation, face a constant struggle against the draw of Instagram Reels and TikTok. I’m not winning the war - in fact, every year I lose a bit more - but I think that I’ve picked up some techniques that are helpful.

If you take nothing else from this post, charge your phone at least 5 feet from your bed.

Why not just delete TikTok and Reels entirely?

I pride myself on being very plugged in to Gen Z culture; I’m what many would call “extremely online” and “needs to touch grass”.

I actually did spend about a month earlier this year without using Reels or TikTok entirely, by setting an incredibly strict limit on them via Time Limit. During that month, I completely missed the phenomenon of Italian Brainrot.

I don’t know who this is, and it makes me sad

Maybe when I’m in my early 30s, it will feel OK for me to not know the new Gen Alpha slang, but as long as I’m in my 20s, and my brother is going through college, I want to stay in touch with the youths. It feels good for me to be aware of all of the inside jokes and memes; it makes me feel like part of a larger community.

So I can’t totally stop using these apps. In some ways, I feel like someone with an eating disorder, stuck with an addiction to something that I can’t give up entirely but have to still control. I’ve had to come up with moderating strategies, rather than giving up Reels cold turkey.

Strategy #1: Physical Separation

The first 30 to 60 minutes of my day used to be spent in bed, half-awake, scrolling on my phone.

Me on my phone after waking up, 2023

That’s an enormous waste of time - one hour a day is seven hours per week, or 30 hours per month, or 365 hours per year. I do a Sunday crossword in about an hour, so I could have done all of the Sunday crosswords seven times over instead of scrolling.

How did I fix this? I moved my phone far enough away from my bed that I have to stand up to reach it.

Me on my phone after waking up, 2025

My phone and charger now live on a shelf 5 feet away from my bed. That’s close enough that I can very quickly cancel the alarm when it goes off, but far enough that I have to be standing to do so.

Why does this work?

When you’re already in bed and have your phone, the inertia to stay in bed and scroll is very hard to fight. When you’re already standing up, it’s very easy to not lie back down. You’re making the good decision by default.

As a bonus, you can’t snooze your alarm with this method. This is great because snoozing mainly wastes time — it makes you no more rested than if you had just gotten up.

This is my most effective strategy. I can consistently follow it (I just don’t leave a phone charger next to the bed), and it works reliably, due to physics.

Strategy #2: Have something else to do

I read a lot. My links posts capture probably 2% of all of the blogpost content that I read in a month, and that doesn’t count hours of scrolling through Reddit comments or Twitter. Any time I’m on my phone, I’m usually reading something.

I like reading short-form content, but often I run out of Hacker News, having read all relevant articles in the top 300 entries. Having something that’s not social media that’s still compelling to read on my phone gives me a good alternative to using Reels. I make sure to always keep something good on my Kindle app, which is permanently pinned to my home bar.

Jeff Bezos’s yacht, mainly paid for by my Kindle purchases.

Rules that I’ve adopted:

I can read anything I want if it’s interesting enough.
If I don’t find the book interesting - if, for example, I voluntarily close the Kindle app a lot to go look at other social media - I’ll abandon it halfway through.
I always need to keep at least one book purchased and ready to read.

This rule doesn’t make me spend less time on my phone. If I’m reading a really good book, I will just scroll through the book for hours. But at least I’m not on Reels.

Strategy #3: Grayscale

I keep my phone in grayscale most of the time.

You can do this on an iPhone by going to Settings > Accessibility > Display & Text Size > Color Filters and turning on Grayscale. If you do this, I also recommend adding Accessibility Settings to your Control Panel so you can rapidly toggle grayscale if you need color for whatever reason temporarily.

Color is a powerful dopamine trigger. If my phone has color, I’ll catch myself mindlessly scrolling Reddit all the time in a way that just doesn’t happen in grayscale. All of the social media apps on your phone are designed with just enough enticing bright colors to keep you scrolling regardless of the content that they present.

Caution: I was literally addicted to the dopamine I got from bright colors. I first turned on grayscale on New Years Eve 2024, and I felt withdrawal symptoms on all of January 1st. I was irritable, grumpy, and agitated all day, and I couldn’t enjoy a day off with my friends. After the first week, it gets better.

Strategy #4: Drugs

Specifically melatonin.

I buy these 300 microgram doses from Amazon, following Scott Alexander’s advice that higher dosages aren’t useful.

Gwern makes a great point about melatonin’s power as a way to add cost to staying up at night late on your phone:

Melatonin allows us a different way of raising the cost, a physiological & self-enforcing way. Half an hour before we plan to go to sleep, we take a pill. The procrastinating effect will not work - half an hour is so far away that our decision-making process & willpower are undistorted and can make the right decision (viz. following the schedule). When the half-hour is up, the melatonin has begun to make us sleepy. Staying awake ceases to be free, to be the default option; now it is costly to fight the melatonin and remain awake.

Normally, I won’t allow myself to use my phone in bed when I’m preparing to go to sleep. Instead, I put myself to sleep by trying to get through as much of the Talmud as I can as part of Daf Yomi; this is similar to how some people read Supreme Court opinions as a sleep aid.

However, there are days when I just don’t have the willpower to put my phone down before climbing into bed. If that’s the case, I have an escape hatch: I’m allowed to use my phone in bed if I take a melatonin.

Instagram PM: “That’s a good trade! Surely he won’t fall asleep with the melatonin”

When I inevitably get sleepy about 20-30 minutes into using my phone, that physiological reminder is usually enough for me to put the phone down and actually go to sleep. Unfortunately, I don’t always remember to charge my phone at that point, since my charger is across the room.

Strategy #5: Time Limit (beware)

I keep a 15 minute Time Limit on TikTok and Reels.

POV: me every day

This is one of the most common recommendations for controlling scroll usage, but I’ve found it to not work as well.

It’s way too easy to bypass, with just two taps. Worse, while bypassing the Time Limit, you’re given the option to turn it off for the rest of the day, which is a horrible idea. There are products like Brick that are supposed to help with this but I haven’t tried them.
Instagram has a popular messaging app bundled with Reels, and you can’t time limit only one of those! So if you want to message your friends on Instagram, you have to also give yourself access to Reels.

This is, of course, an insanely destructive dark pattern.

Things I’ve tried that don’t work for me

Taking Instagram and TikTok off my home screen doesn’t really affect my usage of them. I’m so used to searching for apps on my phone now via the search bar that I have a reflex to type “insta-” and click the first result.
- However, if I’ve already used up all of my Time Limit on Instagram, my phone won’t surface Instagram as the first result. Seeing the Settings app pop up (because there are Instagram settings) is sometimes enough for me to break free of the Reels and put my phone down.
The web version of Instagram is bad enough that deleting the app off my phone would effectively constitute a cold-turkey habit breaking event, and TikTok web doesn’t even work for me at all. I still want to use the apps a little bit, just not a lot.

One day, I’ll look in the mirror and notice some gray hairs, or attend my own 30th birthday party, and I’ll finally accept becoming uncool and out of touch. Until then, I will carry on the fight.

Climbing the complexity ladder

Zachary Sussman — Fri, 08 Aug 2025 17:04:17 GMT

When you onboard a new engineer, you need to limit the amount of complexity that they will face at any given point. Too much complexity all at once will paralyze even the strongest engineer.

The theory behind the ladder

People have a “complexity threshold” for tasks. Beyond this threshold, they fail at the task. This threshold is sharp — with tasks just slightly less complicated, they’ll succeed. This effect is widely replicable from studies on working memory and recognition to anecdotes on teaching calculus — if you don’t have the basic skills down, more complicated tasks get out of hand quickly.

Because this effect is related to working memory, a person who learns how to do a task fluently can now treat that task as simple instead of complex. A powerful teaching tool to make this possible is called cognitive load management. The gist is that our working memory can only handle a few novel elements at once. If you’re teaching someone a complex task, they will learn much more effectively if you break the task down into a series of simpler steps. As long as each step is below their complexity threshold, they can reach the next step.

Why is writing code complex?

Writing code at Mach9, or any startup, is a very complex task — there are a lot of moving parts, many different systems interact, and many actions all need to be performed flawlessly for a deploy to go out. There are many sources of complexity in a given software engineering task; here are a few:

Contribution complexity: how difficult is it, mechanically, to create a contribution? How does the build system work? How do you test code? How do you get a pull request submitted and approved? How do you deploy your change?
Environmental complexity: how complicated or messy is the codebase? How difficult is it to find the part of the code that I need to change? How many systems do I need to interact with to make my change? How do I match the style of the code around me?
Intrinsic complexity: how challenging is the task to complete under ideal circumstances? Do I understand the algorithms that I’m using? Do I need to do sophisticated architectural work to complete this task?

A task that has low intrinsic complexity might still be too difficult for a new hire because of other complexity factors. The combination of environmental and contribution complexity can push even the simplest bug beyond reach of a new hire, even if they’re the best engineer in the world otherwise. Worse, new hires often hit all three types of complexity at once.

How can you teach a complex task?

In short, we peel away the layers of complexity, one at a time.

As an engineer learns more about working at Mach9, some of these sources of complexity disappear. While intrinsic complexity never goes away, you become more familiar with the mechanics of submitting a review over time and experience less contribution complexity, and you become more familiar with the codebase and architecture and experience less environmental complexity.

Because of the complexity threshold effect, until a new hire learns how to contribute and navigate the codebase, it’s important to give them very simple tasks. Most of their initial efforts should be around learning-by-doing in order to teach them about the extrinsic sources of complexity that aren’t inherent to the problem itself. This learning will eventually allow them to feel comfortable enough to take on more intrinsically complicated tasks.

Here’s a playbook that has worked well for me:

Their first task should be absolutely dead simple - a one- to two-line change in a really obvious file. In fact, you might even want to point out the specific lines they need to change, and how they need to change them. This task is intended to get them past the contribution complexity barrier. It’s much easier to understand why your tests are failing if you can isolate it to a configuration change rather than some other bug in a codebase you don’t understand.
- Example: fix a typo in documentation
- Example: add a log statement to a specific function
- Anti-example: don’t assign them an incoming bug! Bugs are often very hard to debug, and a new hire won’t have all of the context that you will.
A good follow-up task is easy intrinsically, but hard environmentally. **Once the new hire understands the mechanics of making a change, their next task is to become familiar with the codebase around the parts of the application that they’ll be learning in. For example, a good change here might be a bug that still requires a one-line fix, but in a spot that’s difficult to find.
- Example: find and fix an incorrect formatting bug in exports
- Example: add a tooltip when a specific tool is active
Once they view the “easy parts as easy” - in other words, they’ve internalized the environmental and contribution complexity of making a change - then they’re ready to take on harder challenges.

Links from July 2025

Zachary Sussman — Sun, 03 Aug 2025 01:02:30 GMT

Books

We’re preparing an internal onboarding curriculum for new managers at Mach9. To help with this, I read a few books on management and startups to see if they might be good learning material for others. This section will have the books I recommend - it’s not everything that I read.

I reread The Manager’s Path, by Camille Fournier, for like the fifth time. This is still my top recommendation for new engineering managers to read. It’s basically an instruction manual for all of the nuts and bolts of management; it was the book that introduced me to the idea that there were nuts and bolts to learn, which is still my top thing to learn about management. Highly, highly recommend; it’s where I draw most of my daily techniques from. I notice new things every time I read it.
The Scout Mindset, by Julia Galef. I first heard about this book at LessOnline, described as an “airport best-seller introduction to rationalism”. As someone who gets asked a lot by other Mach9ers how I learned to think and communicate, I thought that sounded really valuable! If anything, the description I heard undersold this book - I think it’s a great introduction to Bayesian reasoning, and contains a lot of techniques that I forgot that I was implicitly using.

My highest recommendation: if you want to be able to model how I think, read this book.
The Hard Thing about Hard Things, by Ben Horowitz. I’ve heard a lot of polarizing opinions about this book. It’s actually mainly a collection of Ben Horowitz’s blog posts, and I found a lot useful in here. In particular, I really liked his advice on one-on-ones, and I borrowed it for Mach9.

Other things I read (or stopped reading but thought the book was good anyway):

American Prometheus, by Kai Bird and Martin J. Sherwin. Well-written, and I would have really enjoyed it had I not seen the movie Oppenheimer. As-is, I knew most of the content of this book, since it’s covered in the movie, and I stopped reading it about halfway through.

Interesting

Stalking the Statistically Improbable Restaurant… With Data! My hometown, Boca Raton, has the highest concentration of Italian restaurants in the country, with almost 12% of our restaurants being Italian (and 2% being Greek for second place). I didn’t notice this at all as I was growing up, but it is true that I treat Italian as almost a “default” American cuisine. I wonder if, had I grown up in Torrance, CA, I would think of Japanese (16% of restaurants) as the default.
Testosterone gave me my life back. A close relative of mine recently started TRT and this made me think of them.
Learn to love the Moat of Low Status. There was a Paul Graham essay from a while ago that tied being a startup founder explicitly to doing low-status work, but I can’t find that essay now.
The Inner Ring, a lecture by C.S. Lewis. I have an incredible amount of inner-ring-seeking nature, and I’m constantly in a fight against it. One experience that brought my inner-ring-seeker joy was when my house hosted a large house party last Friday, and I hung out in my roommate’s room. Of course it was my house, but now I understand the inside.
Face it: you’re a crazy person. The technique of unpacking cited in this essay was very useful to a friend of mine recently, and in general the idea of “envision this concept in detail” is underratedly helpful to make decisions.
Affective Forecasting: Knowing What to Want. A collection of psychological tricks to make better choices, where better means that you’ll be happier later.

Programming

To be a better programmer, write little proofs in your head. Translate “proof” to “typechecker” and “proposition” to “type”, and you get basically my philosophy on coding. For example, see my previous post about Hungarian Notation as a way to make the proof that your program does what it should obvious.
On Designing For Children. A first-principles way of thinking about designing apps for children, based on what children know and like. For example, how do you design something for people that can’t read?
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developers. AI making people worse at coding, while making them think that it’s making them better, was unexpected to me.

Black Mirror

Wow, there’s a lot of Black Mirror this week!

Looks Mapping, a rating of restaurants in San Francisco by the average hotness of their reviewers. The second-hottest restaurant was right by my old apartment, and from my roommate:
I used to go to Himalayan Cuisine [the restaurant] all the time when we were living in our old apartment. Can confirm.
They’ve since changed the job title, but xAI is hiring a “Full-stack Engineer - Waifus”.
Tie is a service to de-anonymize all website traffic that hits your homepage, giving you the ability to see when any of your company’s hottest prospects are visiting your site. Horrible to be able to do this in concept, but very useful for a company in practice.

Links from June 2025

Zachary Sussman — Thu, 03 Jul 2025 03:53:27 GMT

Books

How Dead Languages Work by Coulter H. George. I’m a sucker for any comparative linguistics book, and especially one about the difficulties of translation. This book covers six ancient languages and how their form influences poetry written in them. Interesting - I spent some late nights reading this - but I wish it had gone more in depth. The book only gives us a few verses from each language and brief commentary about each, and I feel like that’s not enough to catch the flavor.

A similar, shorter book is Nineteen Ways of Looking at Wang Wei, which is a deep dive into a single, short poem from the Tang dynasty, looked at through the lens of 19 translations. This book I periodically come back to and reread - it’s concise, witty, and self-contained.
Coup d'État: A Practical Handbook, by Edward N. Luttwak. This one was definitely weird to read out in public. It presents a coherent theory around how to run coups, which I always thought were more organic, contingent events, but turn out to all share a very similar structure.

Management and business

Smart and Gets Things Done Are Not Enough. I really loved the list of "extra superpowers" that potential employees can have. I think I have 5 or so of them - what do you have?
Every Achievement Has A Denominator, from charity.wtf. Not immediately relevant, but interesting reading.
“Founder Mode” and the Art of Mythmaking. The piece is about how the speech that introduced the idea of “founder mode” really just explains commonly known facts about management. While this is true, very few people know about these facts!

For all new managers reading this: there is a commonly accepted theory and practice of good management. If this is your first time managing, you probably do not know this theory. You can learn this theory by reading any commonly available book on management - they mostly say the same things. (My book was The Manager’s Path by Camille Fournier, which I thought was exceptionally good, but all the books I’ve read so far had similar content.) You will be a better manager after reading such a book, because you will know what a “1-1” is, and that you need to do things like have interactive understanding checks with your reports.

Programming

Simpler backoff. Quote: “Don’t write code that generates a small, fixed set of values. Use a lookup table instead.”
The Hidden Subgroup Problem, about a common cryptographic flaw in public key systems.
Sguaba: hard-to-misuse rigid body transforms for engineers with other things to worry about than linear algebra. I pointed towards this idea of keeping coordinate frames in the type system earlier. It’s cool that Rust has a powerful enough type system to do this in practice. Typescript does as well, we’ve just never gotten around to it at Mach9.
Published for the first time: the Princeton INTERCAL Compiler's source code
An illustrated guide to Amazon VPCs - first time I had ever really understood what a VPC was!
domains i love
-2000 Lines Of Code, a classic story about misaligned metrics.
left-pad, from the person who started it all.
Subtype Inference by Example Part 1: Introducing CubiML. I wanted to pick through this, but I haven’t had time yet.

History

Notes on Tunisia. I always like Matt Lakeman’s posts.
REVIEW: Road Belong Cargo, by Peter Lawrence. The Psmiths are the source of my two books above; I didn’t read this book, but the review seems almost as detailed as the book.
REVIEW: Cambridge Latin Course Unit 1, by the Cambridge School Classics Project. This one I’m explicitly not supposed to read.

I had first encountered the drama around this a few months ago. Noted classicist Mary Beard had written a mea culpa explaining that most classicists couldn’t actually read Latin fluently. There were a variety of responses from Medievalists that pointed out that this was primarily a problem with volume - there’s just not enough Classical literature to practice with to learn how to read fluently, whereas there’s much more Medieval Latin to learn from.

This volume over quality debate is roughly why I’m doing Daf Yomi with basically no background or preparation. The idea is that if you just study enough of the Talmud regularly. you’ll get used to it via immersion.
An Inside View of Hoity-Toity East Coast Boarding Schools. I went to one of these!
The Cultural Decline of Literary Fiction. Material over social trends wins every time.

Math and science

A simple way of estimating logarithms, originally developed to be done by hand for calculation. Simple but cool.
Does Lance dislike Ramsey Theory Because he's colorblind?
Why Quadratic Funding Is Not Optimal. The analogous Pareto optimality theorem about markets is still useful in the real world because it’s a “continuous” result - the closer you are to the preconditions, the closer to optimal the market gets. Do the optimality theorems around quadratic funding also have this property?
Administering immunotherapy in the morning seems to really, really matter. Why?
The magic of through running (this one’s about trains). Finally, a layperson’s explainer of why NJ Transit and the LIRR should run through Penn Station, or why a 2 mile connection through Downtown Boston would be so important.

Fun

Consider Knitting. I don’t think I’ll try knitting this month, but maybe next month!
The World’s Last Internet Cafes. Holdout businesses always sample from the extremes of the distribution.
Seven Days at the Bin Store. I think “the bins” as a TikTok meme is a few months old, but this article came out in June. I moved apartments this month, and it was very expensive to get rid of old furniture I didn’t want anymore - for some of what I had, it was more expensive to get rid of than it was to buy! Looking forward to a future of robotic deliveries so the asymmetry in the supply chain between bulk and last-mile can be reduced.
Occasional paper: The impossible predicament of the death newts. Good old-fashioned evolutionary race.
For those who don’t know: Know Your Meme’s entry on Came In A Fluffer

At LessOnline this year, someone suggested that I read “My Immortal” As Alchemical Allegory. I can’t believe I hadn’t read it before, it was great. Sunday night of LessOnline was also Shavuot, on which Jews traditionally stay up all night learning Torah. To reflect this, I helped run a session where we picked random verses from Pirkei Avot and reinterpreted them Kabbalistically to refer to AI, inspired by Scott’s talk on an AI interpretation of the Book of Revelation.

Dystopia

I had been joking about the conclusion of The Intelligence Curse for several months, but no one knew what I was talking about because I forgot the title of the essay. I finally re-found it and will record that fact, and the link, here.
I got into the LA Times covering Worldcoin! Here’s my quote:

At the World store in San Francisco, Zachary Sussman was eager to check out the orbs with his two friends, both in their 20s.
“For me, the more ‘Black Mirror’ the technology is, the more likely I am to use it,” Sussman said, referring to the popular Netflix sci-fi series. “I like the dystopian aesthetic.”

If you say extremely weird things to reporters, they will quote you.

Links for May 2025

Zachary Sussman — Wed, 28 May 2025 05:12:17 GMT

Leila Clark inspired me to keep a list of links to things I read this month that I liked. Links are up to Tuesday, May 27.

Management

I’m still a fairly new manager at Mach9. I found Ben Kuhn’s blog this month. Specific posts that really spoke to me:

Some mistakes I made as a new manager - how to deal with the emotional headaches of being a new manager.
Trust as a bottleneck to growing teams quickly - "Share your most important mental models broadly" inspired me to write down more of my thoughts. I had always felt like I was being very arrogant writing down things I was thinking and sharing them. This essay reminded me that it’s actually my job to write down what I’m thinking.
How I’ve run major projects - I actually had read this post a few months ago and found it so helpful that I forwarded it to a couple of Mach9ers. Once again, reminded me to overcommunicate.

Non-Ben Kuhn links:

Three chapters at Cloudflare: Programmer to CTO to Board of Directors - fun read about how the grind never really changes
Malicious compliance by booking an available meeting room - a fun read about shenanigans you can pull with booking meeting rooms. I rely on a lot of tightly packed 15 minute meetings as a CTO, whether they’re standups or weekly 1-1s or daily checkins on critical projects.

History (and related things)

Possibly a serious possibility - I love the concept of encoding fuzzy mental models into precise language, and mapping how that happens. This is a great graph:
What Ever Happened To Neoreaction?, paywalled ACX article so you have to subscribe!
When Jews Started Driving to Shul. Love any intersection between Judaism and urbanism. Specifically, I’m very interested in the social dynamics behind the Conservative Judaism decision to allow driving to Shabbat services. I think Sandy Johnston has done some writing on this.
I'm worried about Chicago, my monthly Slow Boring read. Negative feedback loops in commodity pricing as applied to housing. If the price of something drops, people consume more of it, but if the price drops too much, it’s too cheap for vendors to sell or maintain, so it becomes more inaccessible. This is why I can buy a water bottle at my local pharmacy, but not a $1,000 bottle of wine or cement mix.

Programming

Write the most clever code you possibly can, about working at the boundary of your skills in low-risk situations.
'I paid for the whole GPU, I am going to use the whole GPU': A high-level guide to GPU utilization. I didn’t actually read this at the time, but I’m saving it here because I’d like to read it when it becomes important to Mach9.
Matt Godbolt sold me on Rust (by showing me C++), yet another blog post on the benefits of nominal typing. And yet Typescript is structurally typed, which is a superpower? One of these days I’ll learn what a trait is.
Inheritance was invented as a performance hack - very cool! Combines two of my interests, implementation of programming language semantics and garbage collection. (My Compilers extra credit project was adding object orientation via inheritance to C.)
Ratfactor's Judgement of Snobol4 - love weird programming languages. I’d definitely like to try out SNOBOL some time. I’m a bit surprised that it was ever usable - most string replacement languages I’m familiar with (like sed) are basically single purpose for text editing, and you can’t easily write things like loops. The first-class regular expressions are cool, and I wish the post had gone into more detail.
Google Has Most of My Email Because It Has All of Yours, about how one-sided recording breaks a lot of privacy norms.
The Drone Didis are taking flight. Love me some drones for good - I think I almost worked at Zipline because of that Mark Rober video. (Still not gonna work for Anduril, though.)

AI

Claude’s system prompt - very cool read! Teaches me a lot about prompting techniques.
Starcloud - Data centers in space. This can’t possibly work on a short timescale; even now, a significant portion of data center electricity usage is spent on cooling. If your power gets 20% cheaper, but your cooling is 10,000% as expensive, and also you have to launch mass into space, it’s not a good idea.

Interesting or random

Unparalleled Misalignments, and the theory behind it, Square Theory. I find these super fun to think about and find, and I do like crosswords and clever themes. Highly recommended!
Dreary Weather, a map of the dreariest weather spots in the U.S. I definitely feel the effects of moving from Pittsburgh’s 26/30 dreariness score to San Francisco’s 9.
Six Days in the Dark - about preparedness during the Pittsburgh storm blackouts. I used to live in Pittsburgh until last year, so this hit close to home. Interesting story about what’s unexpected during a disaster that you suddenly need.
The Deathbed Fallacy - love a contrarian take, especially an unfalsifiable one! About how deathbed wishes aren’t reliable, from a version of the peak-end effect fallacy.
So much blood, about how much blood the U.S. exports. I used to like playing the Tradle, which is based on similar data.
Starship Troopers Revolutionize Warfighting, about Starship dropping a battalion into any enemy theater of war in 45 minutes. I first heard about this concept from an ex-SpaceX friend of mine a few years ago. Originally filed under “Black Mirror”.
on instagram addiction. I read this article, opened Reels instinctively, and immediately closed it. So far, making sure my phone is in grayscale has kept the addiction at bay for me.

Business

Defining Aggregators - I actually should read the whole series on this. Alex keeps telling me that distribution is important.
Four years of running a SaaS in a competitive market - stories from the trenches of a B2C/SMB SaaS startup. Not relevant to me now, but good insights.

Math and Science

Physics Olympiad handouts from Kevin Zhou. Sometimes I wonder what my life would have been like had I discovered the true competitive math/science scene in high school, rather than in college. (Honestly, probably not that different - by freshman spring I was in the same classes as the IMO medalists at CMU.)
The Case Against Geometric Algebra. Fun to read about drama that you’re totally not involved in, and even more fun to read an exhaustive takedown of something you never really understood.
Dead Stars Don’t Radiate. I had sent the original news article I saw about this finding to my family as a joke about innumeracy, but even the underlying article got retracted.
Interferometer Device Sees Text from a Mile Away. Originally filed under “Black Mirror”.
Why are coffee stains darkest at the edges when they dry? Interesting question, not one that I had considered, but it has a physics-based explanation! I actually looked this up because it was a question posed in the Physics Olympiad handouts from above.

Music

The Art of Fugue – Contrapunctus I. Love me some Bach, and the idea of setting it to a beat makes a lot of sense as a study guide - I really do experience the fugue differently as a continuous string of notes than with a beat.
Tuning Timbre Spectrum Scale, about how timbre and temperament interact. I didn’t really read this, but I did use the word “timbre” 100x more over the next month.

Writing

How to title your blog post or whatever. I followed his advice #14 and called this post “Links for May 2025”.

How I kept track of my links

Leila recommended that I post a link to anything I liked in a private Discord server. Discord lets you sort links into different channels, syncs automatically between your different devices, shows previews, and works as well on desktop as it does on your phone. I tried many other ways of keeping around links before this, but only Discord really made the habit stick - it’s just so easy to share or paste a link that you like.

How to make spatial locality work for you: the tiles concept

Zachary Sussman — Wed, 28 May 2025 01:51:41 GMT

How do you run a machine learning model over an entire state without blowing up in memory or runtime? At Mach9, we often have a need to run a piece of code over a very large geographic or spatial dataset. For example, we have a ML model that takes in a point cloud and imagery of an area and locates all of the relevant curb-related lines.

However, as we scale up the area of our inputs to our code, often times our code’s runtime or memory usage scales superlinearly. For example, as the size of an image doubles in length, or as its resolution doubles, the number of pixels quadruples.

As a concrete example, let’s say that you’re working with a top-down view of an area at 2cm resolution. Then a 10m by 10m area would have 250,000 pixels. But at 1cm resolution, that same area would have 1,000,000 pixels - a 4x increase. And a 100m by 100m area would have 25,000,000 pixels, probably too much for any model to take in as a single unit.

Luckily, we have a property that we can exploit for our typical pieces of code for spatial analysis - they’re local. What this means is that

Their outputs are spatially organized, and
their outputs only depend on regions of the input “near” to them.

In other words, the output of your code at a location x only depends on points “close by” to x, let’s say within ε meters. (For the non-math nerds, that letter is called epsilon, and it typically represents a small number.) Think of ε as a “horizon of relevance” - how far away your algorithm needs to look in order to make a decision. At Mach9, we call ε the “bleed”, for reasons we’ll see below.

Many real-world things have this locality property, especially when it comes to surveying. The position of a manhole in one part of a roadway isn’t really influenced by the position of other objects far away - in theory, you should be able to locate a manhole in an area given only the point cloud and images near that manhole, without looking like a mile away. For manholes, for example, ε might be a couple of meters - beyond a couple of meters away from the center of the manhole, the points don’t really matter.

When you want to run a local function on a large area, you can use what we call tiles to make your code more efficient. Here’s the recipe:

Define some tiling scheme over the area that you want to run your code or function on. The tiling scheme is a set of regions, all of which cover the area.
Your tiling scheme needs to be covering in an overlapping way. In particular, every point in your area of interest needs to be at least ε meters away from the boundary of some tile. In other words, for any point, you can draw a circle or ball of radius ε around it, and the entire circle or ball would be part of at least one single tile. Overlap is OK here - your ball can be entirely contained within many different tiles, and so can your point.
A tiling scheme over a region in black - notice how the tiles overlap significantly. Red, blue, and orange are the tiles in the tiling scheme. The green circle around the edge of the region is entirely contained inside the orange tile.
One way of creating this overlap-by-ε condition is to generate tiles that are exactly adjacent, or overlap a little, and expand them all by ε in all directions. That’s why we call this the bleed - we’re basically bleeding over each tile a bit in each direction.
Run your code on each tile separately. You can do this totally in parallel - this is how Mach9 processes very large datasets quickly.
Now combine your results. Since your output is spatially defined, we need to define an output to put for every point in space. This can be accomplished in two ways:
1. For every point, follow these steps:
  1. Pick some tile so that a circle or ball around the point of radius ε is entirely contained within that tile. The covering property from step 1 guarantees that at least some such tile exists.
  2. Because of the locality property of your code, it doesn’t matter which tile you pick! All tiles with enough border around the point will agree at the point. Therefore, you can just pick whatever tile you want to copy your output from.
2. If you don’t want to do something for every single (uncountably many) point of R^3, you can also do this tile-picking en masse:
  1. For each tile, define its interior - the set of all points that are at least ε meters away from the edge of the tile. If you created your tiles with a bleed from step 1, you can just use the original, non-bled tiles.
  2. Partition your entire space into non-overlapping but completely covering interiors by cutting away some interiors. Like above, it doesn’t matter which tile interior you pick for any part of your space - all tile interiors will agree when they overlap.
    The interiors of the tiles; they’ve been cropped so that they don’t overlap
  3. Crop each tile’s output to the non-overlapping interior set defined above.
    Areas of the region assigned to each tile. Notice that at the boundaries of the different assignments, the region matches up.
  4. Put all tile outputs in the same space, and connect up any seams at the edges. For example, if you’ve cropped a line at an interior boundary, the locality property guarantees that the two ends of the line across the boundary will exactly match.

Example: machine learning

Let’s say that you’re running an object detector on a very large image, 10,000 by 10,000 pixels. Typical vision transformers will have a VRAM usage that scales quadratically with the pixel width of the image, so this seems like a big no-no. However, if our image is very large relative to the size of the object we’re detecting, we can use tiles to break it up. For example, if our objects are typically around 100 pixels in size, we can set the bleed to be 200 pixels or so, split up our 10,000 by 10,000 pixel image into much more manageable 2000 by 2000 pixel chunks, run the model on each one, and combine the detection results at the end. Where our chunks overlap, the model should agree, so combining will be easy. This lets you cut your VRAM usage by 75% while still maintaining your result accuracy.

Example: task splitting

At Mach9, we offer the ability to split up your 50 mile datasets into a set of tasks so that you can have multiple drafters working on the same dataset. How do we make sure that various lines at the edges of the tasks line up? By using the tiles concept! Each one of our tasks is basically a tile. As long as we make sure that:

The tasks always overlap a bit, and
If you’re working in a task, you’re responsible for making sure the whole task is correct, except at the very edges where you don’t have enough context,

we guarantee that every part of your dataset will be covered at least once, with enough context to make sense of whatever situation you might find yourself in.

For the nerds: a formal proof of correctness

In math terms, let’s say that we have some piece of code - a functional - f mapping an inputs i: S → I to outputs o: S → O over some region of interest S ⊆ R^3, so that o = f(i). Our locality property says that there exists some “radius of locality” e such that o at some point x depends only on the values of i near x. Formally, let’s say that we have some input j: R^3 → I that matches i on the area B(x, ε):

Then the locality property says that

In other words, f(i) and f(j) match at x.

Our algorithm above consists of the following steps:

Define a tiling scheme T over our space S as a set of subsets of S with the following cover property:
“Run your code on each tile” by defining tile-specific inputs
Now we can define a new functional g by saying1
We can check that:
1. This is well-defined, because if B(x, ε)⊆ u and B(x, ε)⊆ t, then
  and so we have by the locality property
  In particular, since g matches the value of f everywhere, it can’t depend on either the choice of tile at any given point, or the overall tiling scheme.
2. This is defined everywhere, because by the covering property of T, every x has a ball around it entirely contained in one tile.
Therefore, g(i) = f(i) everywhere. But notice that this new functional, g, never relies on the original input function - it’s defined with respect to the tile-restricted inputs! So we’ve managed to define the functional f without ever looking at the full input.

Tiles for a space are closely related to the concept of compactness from point-set topology. In particular, if you have a compact metric space, you can create a set of tiles as so:

Grab an open cover {B(x, ε) for all x ∈ S}.
Use the compactness of S to create a finite open cover C.
Pad each element of C by ε to create your tiles.

The locality property that we get from these tiles is similar to how you can often patch up local properties of functions from compact spaces to identify global properties. For example, if a function f on a compact space is continuous (implying that it’s locally bounded), then it has some global maximum.

We’re brushing a bit of extra work under the rug here. Our original f was only defined to take in inputs from the whole space S. However, since we don’t care about the value of f’s input outside t, we can just pick any arbitrary value. Formally, we define an extended input function

for some arbitrary filler value q in I, and then just define

g can’t depend on this filler value by the locality property. Knowing about this extra work is occasionally helpful - if your code needs some padding to make your arrays a power of 2, for example, it’s helpful to know that your outputs don’t depend on the value of the padding at all.

Discriminated Unions and Interfaces in Typescript

Zachary Sussman — Tue, 25 Mar 2025 01:34:22 GMT

The Typescript union operator can be a bit scary sometimes. On the surface, if you see a type like

type Feature = ObjectAnnotation | LineAnnotation

you have a lot to keep track of.

Feature has only the things in both ObjectAnnotation and LineAnnotation, but what if ObjectAnnotation.a is an integer and LineAnnotation.a is a string, so you can do Feature.a.toString() but not Feature.a - 3???

However, almost all uses of Typescript unions are much simpler. At Mach9, whenever we use a Typescript union, we use it one of two possible ways:

A discriminated union - basically an enum with extra properties
A supertype - a common set of properties

Discriminated unions

Let’s say that we have an API that can return two types of results, a success:

interface SuccessResponse { data: number }

or an error:

interface ErrorResponse { error: string }

Because our API can return either one response or the other, but not both, we should represent the overall type as a Typescript union:

type Response = SuccessResponse | ErrorResponse

However, this Response type is a bit difficult to use. You have to use a if (property in object) -style check to disambiguate these two responses, and this type of check is difficult to reason about. For example:

let r: Response;
if ("data" in r) {
  // r is a SuccessResponse
} else {
  // r is an ErrorResponse
}

The Typescript compiler can usually infer that r is an ErrorResponse in the else branch, but that’s definitely not obvious to me reading the code! We’ve never explicitly checked r for an error property. And if we add a new type of Response, our code breaks - r won’t necessarily be an ErrorResponse in the else branch, but it’s difficult to see that.

Instead, we should be using something like a switch statement on the type of the Response. We can do that with a discriminated union by adding a type tag to each response:

interface SuccessResponse { type: "success", data: number }
interface ErrorResponse { type: "error", error: string }
type Response = SuccessResponse | ErrorResponse

Now, all Response objects will have a property called type that tells us what type of response we’re dealing with directly. We can now use a switch statement:

let r: Response;
switch (r.type) {
  case "success": 
    // r is a SuccessResponse
    break;
  case "error":
    // r is an ErrorResponse
    break;
 }

Using this new type tag, we can go even further. In the default case of the switch statement, Typescript will notice that there’s no possible new value for r.type, so it’ll have type never. By adding the following function declaration:

function neverReached(x: never) {
  throw new Exception("Can't reach this!");
}

we can check that our switch statement is exhaustive:

switch (r.type) {
  ...
  default: 
    neverReached(r);
}

Now, if we add a new type to Response:

interface NewResponse { type: "third-type", foo: number }
type Response = SuccessResponse | ErrorResponse | NewResponse

but we don’t add NewResponse to the switch statement, Typescript will complain!

error TS2345: Argument of type 'NewResponse' is not assignable to parameter of type 'never'.

which tells us exactly which case we forgot to handle.

Supertypes and shared properties

The other common way that unions get used is for the direct application of sharing properties. In fact, this is mostly used as part of a discriminated union.

Let’s say that we have the following discriminated union, classifying our detected features into either points or lines:

interface Point {
	type: "point",
	coords: Cartesian3,
	layerName: string,
	author: string
}

interface Line {
	type: "line",
  coords: Cartesian3[],
	layerName: string,
	author: string
}

type Feature = Point | Line

The Feature type is a discriminated union of Point and Line, so we could write some rendering code that looks like this:

function PointModel(props: { point: Point }) {
  return 
}

function LineModel(props: { line: Line }) {
  return 
}

function FeatureModel(props: { feature: Feature }) {
  const { feature } = props;
  switch (feature.type) {
    case "point":
      return ;
    case "line":
      return ;
    default:
      neverReached(feature);
  }
}

Now, let’s say that we want to display an information box about a feature, showing its layer name and author. You could write a separate component for a Point or a Line:

function PointBox(props: { point: Point }) {
  const { point } = props;
  return 
    Layer name: { point.layerName }
    Author: { point.author }
  ;
}

function LineBox(props: { line: Line }) {
  const { line } = props;
  return 
    Layer name: { line.layerName }
    Author: { line.author }
  ;
}

function FeatureBox(props: { feature: Feature }) {
  const { feature } = props;
  switch (feature.type) {
    case "point":
      return ;
    case "line":
      return ;
    default:
      neverReached(feature);
  }
}

However, you’ll notice that these two components look basically the same! In fact, they are the same, once you remove the types and make a single name change.

They’re the same because both Point and Line implement the same supertype, which we can call

type FeatureSupertype = {
  layerName: string,
  author: string
}

In technical terms, we’d say that Point and Line are both a subtype of FeatureInterface. This corresponds to the usual notion of implementing an interface from, say, Java. What Java doesn’t have is that we can just use the Feature type to represent this supertype as well. Because all members of Feature's union are subtypes of that supertype, so is Feature.

Because of this, we can actually code a shared Box component, reducing duplication:

function FeatureBox(props: { feature: FeatureSupertype }) {
  const { feature } = props;
  return 
    Layer name: { feature.layerName }
    Author: { feature.author }
  ;
}

This notion of “supertype” is entirely implicit in Typescript’s typing system. You sometimes see this called a structural type system; Python calls it duck typing because “if it quacks like a duck, it is a duck”.

The notion of “supertype” is very closely related to the concept of extending an interface. In fact, since we control all the types, we could have written our three types as such:

interface FeatureSupertype {
  layerName: string,
  author: string
}

interface Point extends FeatureSupertype {
  type: "point",
  coords: Cartesian3,
}

interface Line extends FeatureSupertype {
  type: "line",
  coords: Cartesian3[]
}

Even without the formal “extends”, though, Typescript will still infer the subtype-supertype relationship due to its structural nature. This is useful if you don’t happen to control the subtypes.

Structural subtyping

Notice that this supertype FeatureSupertype doesn’t contain a coords member. Even though both Line and Point have a member called coords, the two members don’t share the same type or properties, so it’s not useful to think about the overall types as “sharing” that member.

However, sometimes it’s useful to think about members that don’t exactly match as part of the same limited supertype. Let’s say that we have the following types:

interface Book {
  type: "book",
  title: string,
  author: {
    name: string,
    location: string
  },
  id: number // ISBN lookup number
}

interface Article {
  type: "article",
  title: string,
  author: {
    name: string,
    date: string
  },
  id: string // URL of the article
}

type Source = Book | Article

It’s clear that we should think about the shared Source supertype as looking like

type SourceSupertype = {
  title: string,
  ...
}

But what else makes sense as part of the supertype? In some sense, both Book and Article have a member called source.author.name . Even though the author members don’t exactly match up - they have non-shared fields - it’s possible to use a type of recursive subtyping to extract out a common pattern on author , too:

type SourceSupertype = {
  title: string,
  author: {
    name: string
  }
}

Of course, you can go too far with this line of thinking. It is true that you could say

type SourceSupertype = {
  id: {
    toString: () => string,
    ...
  }
}

because both number and string have a toString method available on them. But number and string aren’t really commonly-enough used to think about it like this. While it is true that there’s a useful notion of id as part of the supertype, it’s more related to a notion of “types as properties”.

Proofs through typography with Hungarian notation

Zachary Sussman — Tue, 18 Mar 2025 20:19:58 GMT

The virtue of formal texts is that their manipulations, in order to be legitimate, need to satisfy only a few simple rules; they are, when you come to think of it, an amazingly effective tool for ruling out all sorts of nonsense that, when we use our native tongues, are almost impossible to avoid.
Edsger W. Dijkstra, On the foolishness of "natural language programming".

Hungarian notation

Joel Spolsky has this idea of “making wrong code look wrong” as a way to aid your typechecker. He was writing in 2005, back when fancy dependent type systems like TypeScript were still stuck in the lab, and production typecheckers were limited to primitive concepts like int and bool and vector>>. But his essay is still relevant - unless you have a proofchecker, there will always be certain properties of your program that your off-the-shelf typechecker won’t be able to validate. In that case, you’ll have to simulate your types using names.

His example of making wrong code look wrong is the concept of (Apps) Hungarian Notation. He explains that, although Hungarian Notation gets a bad rap nowadays, its original form was useful to many developers writing application code.

For example, let’s say that you’re writing a website that is vulnerable to XSS attacks. The safety property you have to maintain is:

All strings that originate from the user are unsafe. Any unsafe string must not be output without encoding it.

Using Hungarian Notation makes it easy to maintain this property by following this simple - local - set of rules:

All user-input strings get the prefix us - for example, us_input. This includes functions that return user input, like us_get_param .
All safe strings get the prefix s - for example, s_page_title.
You can’t output any string prefixed with us . To enforce this, the only output functions get an s as their last letter, for example write_s(s_page_title).
You’re never allowed to assign a us string to an s string, or to mix them in any way, unless you use the function s_from_us, which HTML-encodes the string.

Each of the four rules above is checkable one line at a time. In other words, if there’s a part of your program that violates the safety property, it’s a single line on which one of the above four rules is violated. For example, can you find which lines have an issue here?

us = us_request("name")
s = us_request("name")
us_name = us 
s_name = us 
s_name = s_from_us(us)

It’s easy to check the correctness of code written according to these rules, because all you have to do is check that every us matches and every s matches across the assignment equals sign, except for the ones that are “canceling out” across a parentheses to a from-named call. For example, in s_name = s_from_us(us), the two us mentions cancel out, since they sit across a parentheses boundary, while the two s mentions match across the equals sign.

Coordinate frames

At Mach9, we follow a similar rule for coordinate frame transformations. For example, on the platform, we’ll often work in two different frames:

Project frame, which is the original coordinate frame of the point cloud dataset. This is typically a UTM or a local Mercator projection, and will have x pointing east, y pointing north, and z pointing directly up.

WGS84 ECEF (”earth-centered earth-fixed) coordinate system, which is a coordinate system that’s defined globally across the globe. x points out from the center of the earth towards (0, 0) latitude and longitude, y points from the center towards (0, 90), and z points towards the North Pole.

Coordinates in either of these coordinate frames are represented as Cesium.Cartesian3 objects, so the Typescript type system isn’t sufficient to disambiguate between the two. Therefore, we’ve adopted the following rules:

All coordinates in project frame start with project_.
All coordinates in WGS84 ECEF start with ecef_.
We have a special set of helper functions called project_from_ecef_ and ecef_from_project_ that perform conversion between the two, and that’s the only way we can convert.

For example, some representative code might look like:

const project_position = await loadProjectPositionFromDB();
const ecef_position = ecef_from_project_cartesian3(project_position);
Cesium.addObject(ecef_position, ...); // add other attributes

The general recipe is as follows:

Define your (finite) list of possible coordinate systems.
All quantities must be prefixed by a coordinate system name.
All of your conversion functions must be named cs2_from_cs1.
Never mix between coordinate systems. All prefixes must either:
1. Match entirely within the line, like on line 1
2. cancel out with a from , like on line 2

Einstein notation

In physics, we often need to deal with a variety of vectors and matrices that are related via equations. (These are called tensor equations, but the word tensor means something different than it does in ML.)

When you change the coordinate frame that you’re working in, the numbers in a vector - its coordinates - change, but the underlying physics don’t. Apples fall to the ground at the same speed regardless of what coordinate frame you’re using for your calculations. So physicists are mostly interested in invariant quantities, numbers that don’t change when the coordinate frame does. For example, the length of a vector doesn’t change when you change your frame, and neither does the angle that vector makes with others.

Unfortunately, it’s hard to keep track of exactly what you need to do to keep your equations invariant to coordinate changes. For example, the change-of-basis equation for a matrix is really complicated; there’s a conjugation by the basis, but I and everyone else can’t remember on what side the transpose goes. Fortunately, physicists have come up with a written trick to keep track.

This trick is called Einstein notation, named after the physicist who first brought the plague of tensor calculus upon the graduate departments. Einstein notation has the following rules:

Every tensor gets one coordinate index, called a “dummy label”, per rank (or for ML people per dimension). This label is either upper or lower, depending on if the tensor is a vector or a covector in that dimension.

That’s the mathematician’s way of saying it. For us mortals, we get three simpler rules:
1. Vectors get one upper index:
2. Matrices get one lower and one upper index:
3. For complicated reasons, the output of a cross product actually gets an lower index:
Every dummy label appears exactly twice in an equation.
If the label appears twice on one side, they have to come in a pair, one upper and one lower, so they cancel out.
If the label appears once on each side, they have to both be upper or both be lower.
You also get access to a few special symbols you’re allowed to put anywhere:
1. This one:
  called the metric tensor. This handles coordinate frame changes for you.
2. This one:
  
  called the Levi-Civita tensor. This handles cross-products.

Once you’ve written your equation, you now put an implicit sum over all contracted indexes, and everything else is treated coordinate wise. Now all your equations are invariant to base change!

For example, here’s the length of a vector:

Here’s a matrix multiplication, and notice how the upper-lower contraction corresponds to the inner sum loop:

Notice the similarities between this and Hungarian notation:

A list of simple-to-check rules gets you a complicated property. Checking that each index in turn is used correctly guarantees that you have invariance. Likewise, checking that each line of code follows the Hungarian rules guarantees that we have no XSS attacks.
Every index is used once on each side, or canceled out. In our Hungarian notation, each type tag - us or s - was either on each side, or canceled out across a parentheses.

These similarities aren’t just coincidence. In fact, upper indices correspond to a type of tensor called a contravariant vector, and lower indices are covariant vectors. You can get between contravariant and covariant vectors only by multiplying with the metric in a complicated process, but that’s totally automated by the Einstein rules. Likewise, you can convert unsafe to safe strings only with the s_from_us encoding function.

Common themes

First, all three examples involve combinatorial complexity. In the XSS scenario, you now have to deal with twice as many possible strings as before, since any string can be treated both in its safe and unsafe form. When you use different coordinate frames, you now multiply all of the different geometric properties of your system by n different frames that they can be expressed in. A physics tensor of rank 5 has 2^5 = 32 different covariant or contravariant forms.

Second, in all three cases, you can separate out the combinatorics into two separate pieces. This is the distinction between the us_ part of a string variable’s name and the _page_title part, or the difference between the project_ and the _position parts of the object position variable’s name. In the physics case, the underlying tensor represents a definite physical quantity, and it’s only the frame or coordinates that we express it in that can change.

Third, switching from one “type” to another “type” is totally deterministic. There’s only one possible way to XSS-escape a string - to go from us to s. Likewise, there’s only one way to transform a point from project frame to ecef frame, or to transform a tensor from contravariant to covariant. Therefore, if you see a line of code or math that needs to mix types, you know exactly where and when to insert the conversion functions.

When you see these three themes come together, you can reduce the combinatorial complexity by following the above unifying guidelines, now expressed in abstract form:

Identify the “frame”- or “type”-like part, and split it off in the naming convention.
Name your variables both by their “frame”/“type” and their intent, in some clearly typographically separate way.
Introduce a set of very constrained “type” or “frame” transformation functions or procedures, and make it so that they only operate within a single line or sentence.
Never mix your “types” in a single line, except by using the specific transformation functions.

Coming soon

Zachary Sussman — Tue, 18 Mar 2025 20:12:20 GMT

This is SUSSstack.

Subscribe now