<![CDATA[Sideband]]>
https://www.sideband.pub
Substack
Fri, 24 Apr 2026 05:48:16 GMT

<![CDATA[Every agent in production is a stranger]]>
https://www.sideband.pub/p/every-agent-in-production-is-a-stranger
Thu, 23 Apr 2026 16:01:38 GMT

In October 2025, researchers at Palo Alto Networks’ Unit 42 demonstrated what they called agent session smuggling. A malicious agent exploited an established A2A protocol session to inject instructions into a legitimate peer. The victim agent complied, executing unauthorized stock trades on behalf of the attacker, because, in the researchers’ words, agents are “designed to trust other collaborating agents by default.”

That is the technical story. The more interesting story starts after the attack. The trades cleared before anyone noticed. A day later, the logs confirmed the orders came from a legitimate agent the firm had deployed, with valid credentials and a valid session. The agent had done exactly what it was supposed to do: trust the peer that asked. The peer was the problem. And now someone had to answer a question no one in the room could answer.

Who was responsible?

That question is an accountability question as much as a security one. Authorization is the easier half, and the vendors are solving it. Accountability is what authorization is supposed to underwrite, and the reason it does not have a clean answer is that the infrastructure we built for the web was never asked to produce one in this form. The human web never needed to solve “who is responsible” because the answer was always, eventually, a person. Domains belong to people. Certificates are issued to organizations made of people. Companies are chartered, sued, fined, and occasionally broken up. Even phishers get caught because the infrastructure they use traces back to humans the law can reach. Cryptography is the mechanism that gets us there. The promise underneath it is that somebody, somewhere upstream, is on the hook.

Agents break that chain. An agent acts. Who is responsible? The developer who built it. The company that deployed it. The user who prompted it, if there was one. The model provider whose weights shaped the output. The prompt itself, which might have come from another agent, which came from another prompt, which traces back to a human whose instructions were weeks old and whose intent has long since become irrelevant. Every one of those answers is partially true. None of them is settled legally.

Tort law handles multi-party causation every day, and the doctrines will absorb agents too. The question is how fast. Products liability stretched over decades to cover software because manufacturers own the defect in a unit they shipped. A drug maker knows the molecule. A contractor knows which joist they set. An agent’s harm is not a defect in a shipped unit. It is emergent from weights, prompts, peer agents, and runtime context, most of which the deployer cannot inspect and the user cannot audit. “Reasonable care” becomes unanswerable when no party can examine the thing they are meant to have been careful about. Agency law, which holds an employer liable for its employees, gets close, except here the employer is four parties and agency presumes control. Case law moves in years. Agents scale in months.

The system has already scaled past the point where we can treat this as a thought experiment. Microsoft’s Security Blog reported in February 2026 that 80% of Fortune 500 companies are running active AI agents in production. Four out of five of the largest enterprises in the world are operating infrastructure whose liability model has not been written.

In the Unit 42 scenario, the obvious answer is “the attacker.” In principle, yes. In practice, the attacker’s agent ran on credentials that traced back to an account, which traced back to a VM, which traced back to a stolen card, which traced back to a mule. The victim’s agent belonged to a legitimate firm. It acted in good faith, exercising trust the protocol explicitly encouraged. The loss was real. The chain of accountability was theoretical.

In a real firm, this kind of event ends with a CFO asking the general counsel about an eight-figure loss and getting no clean answer. The policy on the desk was written before the phrase “AI agent” existed, and no one in the room can say whose problem the loss belongs to.

Vendors are building what they can build. Microsoft Entra Agent ID, in preview as of April 2026, assigns a unique object ID to every agent inside an Entra tenant. Okta Auth0 for AI Agents registers agents as governed identities. IETF WIMSE is drafting dual-identity credentials. W3C is applying Decentralized Identifiers to agent-to-agent trust. Each of these is worth building. None of them answers the question. They tell a forensic team which agent acted. They do not tell a court who owes the loss. Knowing the tenant tells you whose lawyer to call. It does not tell you whether the model provider shares exposure, whether the prompt author is a named party, or whether the upstream agent that fed this one is even identifiable. In the meantime, every firm on the losing end chooses between absorbing the loss and suing every name in the chain. That choice is not a cryptographic problem.

Compute, protocols, discovery, and payments each have an infrastructure company waiting to be built. This one has a question about authority waiting to be answered.

It gets settled somewhere else entirely. Legal teams write indemnity clauses into agent SDK contracts. Insurance markets invent agent liability products and price them badly for a decade. Courts rule on the first cases and make a mess of them. Boards adopt policies that outlast the executives who signed them. Standards bodies argue over where the chain of responsibility should break. None of that happens at protocol speed. All of it happens slower than the deployment curve, and every day the distance grows.

The padlock in the browser bar told you humans were on the hook. The agent web has no such assurance to offer, and building one requires authority no one currently holds.

Every agent in production is a stranger. Strangers do not leave forwarding addresses. The loss they cause has to land somewhere, and no one has decided where.

Part of the agent-era infrastructure series.

]]>
<![CDATA[The catalog isn't the market]]>
https://www.sideband.pub/p/the-catalog-isnt-the-market
Wed, 15 Apr 2026 12:01:49 GMT

A procurement agent runs a sourcing task. It needs commodity pricing data. Dozens of APIs exist for this. It queries one, the one hardcoded into its config by the developer who built it. The others don’t exist as far as it’s concerned. It can connect to anything. It just doesn’t know anything else is there.

Protocols determine how agents talk to services they’ve found. Discovery determines whether they find them at all. MCP gave agents a standard way to connect to tools: one integration instead of a week of custom engineering per service. Twenty thousand implementations in fourteen months. The protocol layer is converging. But an agent connecting to a new tool still requires a developer who knows both systems exist and hardcodes the connection before the agent runs. Scale is capped by developer hours, not by demand.

The registries arrived fast. Smithery indexes 7,000+ MCP servers. PulseMCP tracks 11,840+, updated daily. mcp.so lists over 19,000 submissions. 104,000+ agents registered across 17+ directories. Nobody expected this volume this quickly.

All of it is built for a developer to browse. An agent can’t query any of it at runtime. Every connection in every deployed agent was wired by a human who found a server somewhere, evaluated it, and added it to a config file. That’s configuration. Configuration isn’t discovery.

The catalog and the market

The Yellow Pages was a catalog. Every business in the phone book, organized by category, browsable by a person who already knew what category to look under. It worked for decades. Google replaced it with something structurally different: describe what you need, and get matched to something that fits. The Yellow Pages didn’t die because Google had a better directory. It died because Google turned browsing into matching.

Agent registries are the Yellow Pages. Comprehensive, organized, browsable by a developer with time to look. What agents need at runtime is the other thing: capability matching. “Something that can check freight rates, accepts my payment model, and works with my auth.” That’s semantic, not syntactic. Dynamic, not preconfigured. DNS maps a name to an address. What agents need maps a capability requirement to a provider.

The catalog tells you what exists. The market tells you what fits. Nobody has built the market.

MCP’s 2026 roadmap includes Server Cards, a standard for exposing server metadata at .well-known/mcp.json so registries can catalog capabilities without manual submission. Crawlability and indexing are solved problems. Server Cards close the remaining gap in the catalog layer. They make the Yellow Pages more complete. They don’t turn it into Google.
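
The gap between catalog and market can be shown concretely. Server Cards are a roadmap item and the schema is not finalized, so every field name below is an assumption; the sketch only illustrates why a complete card still isn’t a match. Exact-string capability lookup is the catalog; the hard part a market layer would add is semantic matching.

```python
import json

# Illustrative only: the Server Cards schema is a 2026 roadmap item and not
# finalized. All field names in this sample card are assumptions.
sample_card = json.loads("""
{
  "name": "freight-rates-mcp",
  "description": "Spot and contract freight rate lookups",
  "capabilities": ["freight.rates.lookup", "freight.lanes.search"],
  "auth": ["oauth2"],
  "pricing": "per-call"
}
""")

def matches(card: dict, required_capability: str, auth_methods: set) -> bool:
    """Toy capability match: exact capability string plus auth overlap.
    A real discovery layer needs semantic matching, not string equality."""
    return (
        required_capability in card.get("capabilities", [])
        and bool(auth_methods & set(card.get("auth", [])))
    )

print(matches(sample_card, "freight.rates.lookup", {"oauth2", "api-key"}))  # True
```

The brittleness is the point: an agent that needs “something that can check freight rates” has no string to match against, which is the matching problem the catalog layer leaves open.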

Why nobody’s built it

The fragmentation in this layer is structural, not accidental.

Cisco’s AGNTCY project—donated to the Linux Foundation in July 2025, and backed by Google Cloud, Oracle, and Red Hat—is building agent discovery on an open-source framework with cryptographic identity and a new messaging protocol. GoDaddy launched an Agent Name Service registry in October 2025, based on an IETF draft, with a public API. AWS shipped Agent Registry as part of AgentCore on April 9, 2026, scoped explicitly to an organization’s own agents and MCP servers. It can’t find anything external. At the IETF, eleven competing Internet-Drafts on agent discovery sat unresolved as of Q1 2026. Zero interoperability between approaches.

Each party is building discovery for its own environment. AWS solves it for AWS customers. AGNTCY lays an open-source foundation that aligns with its members’ interests. The IETF is writing eleven architectures. The incentive is to own the discovery layer for your users, not to build a shared one. This is the same dynamic that plays out across every infrastructure layer in the agent ecosystem: payments, identity, and compute. The shared layer is always the last to arrive, because nobody with market power benefits from building it.

The catch

An agent that can discover services autonomously is also an agent that can be exploited, overcharged, or misdirected. Runtime discovery without constraints is a risk surface. An earlier piece in this series explored this tension for agent connectivity broadly. Every gain in agent autonomy creates a corresponding need for boundaries on that autonomy. Discovery is the same tradeoff. The question isn’t whether agents should discover services freely. It’s who sets the constraints, and what form those constraints take: guardrails on which services an agent can engage, spending limits, category restrictions, and trust signals from the discovery layer itself. The protocol that works will need all of this built in, not bolted on.

Who controls distribution

Right now, an agent’s reach is determined before it runs. A developer decided what it could find. Distribution is controlled by whoever did the configuration.

When agents can match a capability need to a provider at runtime—without a human arranging the introduction—the center of gravity shifts. The platform that brokers the match determines what gets used. That’s not an indexing play. It’s a demand-side platform play, the same structural position Google occupied when it sat between intent and destination. Every query that ran through Google was a moment where Google decided what the user found. Every capability match that runs through an agent discovery layer is a moment where that layer decides what the agent reaches.

Whoever builds the market layer for agents doesn’t just fix a gap in the infrastructure. They become the distribution platform for everything agents can do.

Part of the agent-era infrastructure series.

]]>
<![CDATA[The limit isn't reasoning. It's reach.]]>
https://www.sideband.pub/p/the-web-got-its-composition-layer
Tue, 07 Apr 2026 12:01:06 GMT

Ask an AI agent to book a restaurant, check your calendar, pull a competitor’s pricing, or file an expense. If the developer who built it didn’t wire up that specific service in advance, the agent can’t do it. Not because it lacks the intelligence. Because it was never introduced.

That’s the constraint on agents right now. The limit isn’t reasoning, it’s reach.

The protocols that govern trust, consent, and payment on the internet were all built assuming a human would complete the handshake. OAuth requires a person to click “authorize” in a browser. Terms of service require a person to accept. Payment flows require a person to enter a card. Even finding a new service assumes someone is browsing, following links, typing into search boxes. The human wasn’t a convenience—they were the mechanism. They closed every loop these protocols left open.

Agents break that assumption. There’s no human in the loop to click, accept, browse, or pay. So every agent-to-service connection gets solved the old way: a developer builds it by hand before the agent runs. The developer picks the services, sets up the credentials, and ships. The agent operates inside whatever that developer configured. It can’t discover something new and connect on its own. It can only reach what it was previously pointed at.

MCP, a standard from Anthropic for connecting tools to agents, made this less painful. Instead of each team writing custom integrations, there’s now a shared format. Thousands of connectors appeared in the months after it launched, and the map of what agents can reach grew from near-zero to something useful. But a developer still draws that map. An agent consults it. It doesn’t extend it.

There’s an obvious rebuttal: developer configuration is the right gate for an agent acquiring new capabilities. And for tasks the developer anticipated, that’s true. An agent that can autonomously find and connect to services is also an agent that can spend your money on services nobody vetted. That’s a real concern. But it’s an argument for better constraints on autonomous action, not for requiring a human to wire every connection. The position breaks when the agent’s value is discovering capabilities the developer didn’t know existed.

Before the web had a standard protocol, every network connection to a new host required manual configuration. HTTP changed that. Any browser could reach any server without anyone preconfiguring that specific connection. The browser didn’t need to know a site existed before the user visited it. The protocol handled finding the server and negotiating the exchange. That’s why the web scaled to billions of pages. Browsers didn’t get smarter. The protocol let them connect to anything.

Agents don’t have that yet. The components exist (identity standards, permission models, payment specs) but they don’t compose into a handshake that two software systems can run on first contact, without anyone arranging the meeting.

People are working on the pieces, and the list is getting specific. Google put Agent2Agent under the Linux Foundation with 150-plus organizations behind it for routing and hand-offs between agents. Google’s AP2 protocol, backed by 60-plus partners including MasterCard and PayPal, uses cryptographically signed mandates to prove an agent is allowed to spend on your behalf. The IETF has active drafts for agent discovery (AID, using DNS records) and identity verification (SD Agent, using selective disclosure). The W3C published a finalized standard for machine-readable credentials in May 2025.
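
The signed-mandate idea can be sketched in a few lines. This is not AP2’s wire format (the real protocol uses full cryptographic signatures and a published schema); the field names and the HMAC stand-in below are assumptions, meant only to show the shape: a user signs a spending grant once, and any counterparty can check a specific transaction against it without the user present.

```python
import hashlib
import hmac
import json
import time

# Sketch of the signed-mandate idea behind protocols like AP2. The field
# names and the HMAC scheme are illustrative stand-ins, not the AP2 wire
# format (which uses asymmetric signatures, not a shared secret).
USER_KEY = b"user-secret-key"  # stand-in for a real signing key

def issue_mandate(agent_id: str, max_spend_usd: float, ttl_s: int) -> dict:
    """The user signs a grant once: which agent, how much, until when."""
    body = {
        "agent": agent_id,
        "max_spend_usd": max_spend_usd,
        "expires": int(time.time()) + ttl_s,
    }
    payload = json.dumps(body, sort_keys=True).encode()
    sig = hmac.new(USER_KEY, payload, hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def verify_mandate(mandate: dict, amount_usd: float) -> bool:
    """A counterparty checks one transaction against the standing grant."""
    payload = json.dumps(mandate["body"], sort_keys=True).encode()
    expected = hmac.new(USER_KEY, payload, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, mandate["sig"])
        and amount_usd <= mandate["body"]["max_spend_usd"]
        and time.time() < mandate["body"]["expires"]
    )

m = issue_mandate("procurement-agent-7", max_spend_usd=50.0, ttl_s=3600)
print(verify_mandate(m, 12.99))  # True: signed, in budget, unexpired
```

Tampering with the body (say, raising the spend cap) invalidates the signature, which is what makes the grant provable rather than merely configured.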

But routing doesn’t talk to payments. Payments doesn’t talk to identity, and neither talks to discovery. A developer who wants to assemble a full handshake still wires the pieces together by hand. Same work, better components.

The web solved this problem thirty years ago. Agents still haven’t.

The protocol that changes this doesn’t exist yet. UDDI tried for web services in the early 2000s—a universal registry where machines could discover and connect to services without pre-configuration. IBM, Microsoft, and SAP built public nodes. They shut them down by 2006. The economic pressure wasn’t there when a human could just browse a directory.

That changes when the party seeking the service isn’t a person. The protocol that works would let two software systems meet cold—find each other, confirm who they are, agree on what’s permitted, and settle payment—without a developer in the loop. The ability to navigate without a map.

Once that protocol exists, the developer’s job changes. Instead of wiring connections in advance, they set the boundaries: how much the agent can spend, what categories of service it can engage. The agent operates freely inside those constraints. New services become reachable the moment they go live, the way new websites became reachable the moment HTTP gave browsers a way to find them.

Until then, an agent’s reach is exactly as wide as whoever built it decided it should be.

Part of the agent-era infrastructure series.

]]>
<![CDATA[All your zero days are belong to us]]>
https://www.sideband.pub/p/all-your-zero-days-are-belong-to
Thu, 02 Apr 2026 12:31:42 GMT

In February 2026, Nicholas Carlini at Anthropic ran a Claude model across a large sweep of open-source software. The model found a buffer overflow in the NFS code that had been sitting in the Linux kernel since 2003. It survived Heartbleed. It survived Spectre and Meltdown. It survived decades of kernel security audits.

AI didn’t find that bug by being smarter than the engineers who missed it. It found it by having no attention limit. That distinction invalidates the assumption that old code is safe code.

How old bugs survive

Human auditors sample. They fast-scan, pattern-match, and triage by severity. What they can’t do is hold the full protocol interaction state of a complex NFS implementation across thousands of lines simultaneously—tracking every code path without losing the thread.

Bugs like this one don’t survive because they’re well-hidden. They survive because the search space exceeds what any reviewer can hold. The NFSv4 code had been stable long enough that no one was reading it with fresh eyes. Time in production became a proxy for safety. The longer something ran without incident, the less reason to look hard.

That treatment was wrong the whole time; there just wasn’t a tool that could prove it.

The search that changed

Claude didn’t sample. Carlini fed the model the history of past fixes—searching for similar patterns addressed in one location but not in adjacent code—then followed protocol interactions across the full codebase. The model held a state a human reviewer couldn’t sustain.
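
The search described here resembles what security researchers call variant analysis. Carlini’s actual pipeline isn’t reproduced below; this toy sketch, with an invented risky pattern, only illustrates the shape of it: take a pattern that was fixed in one file and look for unguarded instances in sibling files.

```python
import re

# Toy sketch of variant analysis: a fix landed in one function, so search
# sibling code for the same pre-fix pattern. The pattern (a memcpy with an
# unchecked length) is invented for illustration, not the actual NFS bug.
UNFIXED = re.compile(r"memcpy\([^,]+,\s*[^,]+,\s*(\w*len\w*)\)")
GUARD = re.compile(r"if\s*\(\s*\w*len\w*\s*[<>]")

def find_variants(files: dict) -> list:
    """Return files that use the risky pattern without a length guard."""
    hits = []
    for name, src in files.items():
        if UNFIXED.search(src) and not GUARD.search(src):
            hits.append(name)
    return hits

codebase = {
    # Fixed: a bounds check precedes the copy.
    "nfs3_read.c": "if (len > MAXBUF) return -EINVAL;\nmemcpy(dst, src, len);",
    # Variant: same copy pattern, no guard in sight.
    "nfs4_acl.c": "memcpy(dst, src, acl_len);",
}
print(find_variants(codebase))  # ['nfs4_acl.c']
```

A human reviewer runs this search in their head, file by file, and loses the thread; a model runs it across the whole codebase at once, which is the bandwidth difference the piece describes.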

Carlini’s sweep didn’t stop at one kernel bug. According to his published findings, it produced 500+ high-severity vulnerabilities across open-source software. Firefox alone yielded 22 CVEs. Mozilla’s response, in a public statement following disclosure: “Within hours, our platform engineers began landing fixes.”

The NFS bug survived for 23 years not because the code was impenetrable. The constraint was the bandwidth of the search. AI removed it.

The implication

The assumption that old code is safe because it has survived is gone.

Any codebase old enough to feel safe is now an unknown quantity—not because the code changed, but because the search capability did. “It’s been running for 20 years” meant something specific: humans looked at this code over time and didn’t find a critical flaw. That inference is no longer valid. An AI working through the same codebase in hours isn’t making the same kind of search.

The 2003 bug wasn’t hiding. There are more of them.

The double edge

Anthropic found the NFS bug responsibly and coordinated the patch before disclosure.

The model that ran the sweep is available to everyone. Anthropic’s own framing, in their published research accompanying the findings: “This is presaging an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.”

The counterargument is that defenders get the same tools. It’s true. The asymmetry is that attackers only need to find one flaw; defenders need to find all of them.

Responsible disclosure timelines run 90 days as a standard. That window was calibrated around human-speed exploitation—reverse-engineering a patch, building a working exploit, deploying it. AI collapses that window. A model capable of finding a 23-year-old vulnerability in a single sweep can, by the same mechanism, analyze a fresh patch and map the exploit surface in hours. Offensive deployment at scale is already the race.

What gets built instead

Point-in-time penetration testing is insufficient. It samples the way human auditors sample—scoped engagements, bounded time, bounded coverage. Continuous automated audit replaces it: always-on, running against every commit.

The disclosure economics have to be rebuilt. A 90-day window made sense when the threat was a human attacker with months of manual work ahead. It doesn’t make sense when the attacker’s toolchain runs the same sweep Carlini ran. Some projects will push for shorter windows. Others won’t be able to ship fixes in time. The tradeoff gets harder, not easier.

“Audited code” needs a new definition. The old one—reviewed by qualified humans, no known critical vulnerabilities—described a search that humans could actually execute. That search is now the floor, not the ceiling.

Anything running long enough to feel safe has to be reconsidered. Not because the threat model changed. Because the capability to find what was always there did.

]]>
<![CDATA[The CLI is the new API]]>
https://www.sideband.pub/p/the-cli-is-the-new-api
Fri, 27 Mar 2026 14:12:43 GMT

Neal Stephenson once argued that the GUI was built to save users from the command line—an interface that, in Stephenson’s words, “cruelly punished laziness and imprecision.” Billions were spent on that project. It worked. Now, agents have arrived: software that’s never lazy, never imprecise about syntax, and doesn’t need protecting from demanding interfaces.

SaaS companies noticed. In the past 90 days, a wave of them shipped CLIs built specifically for agents—not for developers.

The CLI is the interface layer that determines whether a product is in the agent workflow. It’s what the API was in the 2010s—the line between connected and irrelevant.

Two waves got us here.

The first was the AI coding agents. OpenAI launched Codex CLI in April 2025. Anthropic shipped Claude Code CLI in May. Google followed with Gemini CLI in June, and Mistral shipped Vibe CLI in December. GitHub and Microsoft brought Copilot CLI to general availability in February 2026. These aren’t products with CLI wrappers bolted on—the CLI is the product. Agents operate in terminals, reading flags, parsing output, and chaining commands. The terminal is the native environment for software operating on software.

That wave established the pattern. The second wave is SaaS companies responding to it.

Twenty days after Google Workspace CLI shipped, 37signals released a Basecamp CLI with 55+ commands and a Claude Code skill bundled in. DHH’s framing: “This is where the puck is going, and we’re skating to meet it.” On March 27, Stripe launched Projects CLI for agent-driven infrastructure provisioning. Vercel shipped agent-optimized CLI commands with JSON output formatted for machine consumption. Polymarket built a CLI explicitly for AI agent accessibility. The Register ran a piece on March 11 titled “AI has made the CLI more important and powerful.”

The sharpest detail in DHH’s announcement wasn’t the feature count. The Basecamp API has existed for years. DHH’s description of how many customers used it: “A vanishingly small portion.” The same API, rewrapped as a CLI with a skill bundled in, is what he expects agents to use at scale—not because humans will start typing commands, but because agents are already running them everywhere.

The fallback, when there’s no CLI, is browser automation—agents navigating GUIs the way a human would, via screenshots and simulated input. On WebVoyager, a controlled benchmark using cooperative, non-adversarial test sites, the best browser agents scored around 89% (December 2024). On WebArena, which tests against real-world web tasks, the best models scored 35.8% (arXiv, October 2024), and those numbers drop further in production. A CLI has no equivalent failure mode: authentication is a credential passed explicitly, not a login flow to navigate, and every command either succeeds or fails with a parseable error. It was designed for this.

The CLI fits into programmatic workflows. The surface area is discoverable: --help exposes what’s available without requiring a human to navigate a UI or read documentation, which matters when the consumer is code rather than a person. Output arrives as structured text or JSON rather than a rendered visual state that requires interpretation—parse it directly, pipe it to the next tool, done. Shell pipes and scripts give CLI commands interoperability that has to be engineered separately for every other interface type.
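
The consuming side of that pattern is compact. The stand-in CLI below is a Python one-liner, an assumption made so the sketch is self-contained; agent-oriented CLIs expose the same shape through a JSON output mode:

```python
import json
import subprocess
import sys

# A stand-in "CLI" (a Python one-liner emitting JSON), playing the role of
# a real agent-oriented CLI's machine-output mode. The consuming pattern is
# the point: run the command, parse stdout, branch on the exit code.
fake_cli = [
    sys.executable, "-c",
    'import json; print(json.dumps({"status": "ok", "invoices": [101, 102]}))',
]

result = subprocess.run(fake_cli, capture_output=True, text=True, check=True)
data = json.loads(result.stdout)  # structured output: no screenshots, no DOM
print(data["invoices"])  # [101, 102]
```

Three lines of consumption logic replace an entire browser-automation stack, which is why agents reach for the terminal first.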

On the builder side, the calculus is simpler: one binary instead of SDKs across languages, --help as the documentation, and a stable interface that wraps the API and insulates internal architecture from whatever is consuming it.

The API was the interface that determined whether your product was part of the connected web. Stripe’s 2011 launch reduced weeks of bank approvals, processor agreements, and gateway configuration to seven lines of code. What PayPal required of developers—“a dinosaur and a nightmare to work with,” as it was described in developer communities—Stripe replaced with a curl command that returned a successful charge in seconds. Twilio did the same thing to communications infrastructure. Neither company won on features. They won on the quality of their interface. The CLI is doing that now. It’s the interface layer that determines whether your product’s in the agent workflow or outside it.

The Notion situation shows what “outside it” looks like. Notion has no official CLI. GitHub has at least ten community-built unofficial ones, several explicitly designed for Claude Code and AI agents. One describes itself as “built for developers and AI agents who need programmatic access without the browser.” Another offers recovery hints on errors and structured JSON output designed for agent parsing.

The wrapping happens regardless. The question is who controls the interface—the product team or whoever got there first.

]]>
<![CDATA[The real tokenomics]]>
https://www.sideband.pub/p/the-real-tokenomics
Tue, 24 Mar 2026 15:21:40 GMT

When Intercom launched Fin in 2023, they priced it at $0.99 per resolved conversation. Their head of pricing explained why they didn’t use per-seat: “if Fin works as well as we know it does, over time, those 1,000 seats might become only 200.” Fin is on track to cross $100 million in revenue.

The seat wasn’t just a pricing unit. For most enterprise productivity software, it was the unit the product was built on.

One seat meant one person. The price was anchored to that person’s time. The product was designed for that person’s workflow. The moat was what that person depended on: features they used daily, processes they were embedded in, the cost of retraining a team if they switched. Revenue grew when headcount grew. The architecture of these products—pricing, design, defensibility, growth motion—assumed a human doing the work.

That assumption held for thirty years. Then the work no longer required a person.

Companies that had built their entire revenue motion on seat expansion now faced the same structural problem: the seat was both the pricing unit and the moat. When agents could do the work, the assumption behind both came apart at the same time.

Most companies replacing seat pricing have moved to hybrid or token models: usage-based billing that charges for inputs like tokens consumed, actions taken, API calls made. Closer to right. Two known failure modes.

The first is margin compression. Replit’s gross margins swung from positive 36% to negative 14% when its agent consumed more tokens than its pricing covered. The unit of billing looked right. The economics weren’t.

The second is customer avoidance. Users reportedly avoid AI features even when free credits are included, because they’re afraid of getting locked into something unpredictable. Unpredictable bills train users to opt out. That’s the opposite of adoption.

The companies gaining ground aren’t pricing inputs. They’re pricing outcomes. Agents don’t take vacations. They don’t have seats.

AWS figured this out in 2006

Amazon S3 launched March 14, 2006. EC2 followed that August. Rent storage by the gigabyte, compute by the hour. No seat counts, no user licenses. AWS generated $108 billion in revenue in 2024.

SaaS made a reasonable adaptation: it priced by the human doing the work, not by consumption. That made sense when humans were the unit of work. It became a liability when they weren’t.

AWS priced by consumption because that’s what it sold: compute, storage. AWS’s moat wasn’t a set of features workers depended on. It was the infrastructure itself, and the pricing model that made the economics work. The two were inseparable. Now agents are doing the work.

In 2020, running the best available language model cost $60 per million tokens—GPT-3 Davinci at launch. GPT-4o today costs $2.50 per million input tokens: a 24-fold reduction in four years. The cost of inference is falling faster than compute costs fell in the first decade of cloud.
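
The arithmetic behind that shift is short. The two price points are the ones above; the per-conversation token count is an assumption for illustration:

```python
# Prices from the text, in dollars per million input tokens.
PRICE_2020 = 60.00  # GPT-3 Davinci at launch
PRICE_NOW = 2.50    # GPT-4o input pricing

print(PRICE_2020 / PRICE_NOW)  # 24.0: the 24-fold reduction

# Assumed for illustration: tokens consumed resolving one support ticket.
tokens_per_conversation = 8_000
cost = tokens_per_conversation * PRICE_NOW / 1_000_000
print(f"${cost:.3f} per conversation")  # $0.020, against a $0.99 outcome price
```

At these assumed volumes, inference is roughly two cents against a ninety-nine-cent outcome price, which is the margin room that makes outcome pricing possible at all.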

You can’t build a per-unit pricing model on a unit that’s expensive and unpredictable. AWS could price S3 at $0.15 per gigabyte in 2006 because storage costs were falling and the math was clear.

Intercom was first. Zendesk followed in August 2024: $1.50 per automated resolution for committed volume, $2.00 pay-as-you-go. CEO Tom Eggemeier called it an industry first: “customers only pay for problems that are resolved—not for interactions or failed attempts.”

Salesforce’s path was messier. Agentforce launched at $2 per conversation, moved to Flex Credits ($0.10 per action, up to 10,000 tokens each), and now runs three pricing models simultaneously. Credits, outcomes, seats. It looks like confusion. It’s a large company trying not to get caught flat-footed while its customer base is in three different places.

The valley

Goldman Sachs published a note in February 2026 on what’s happening to software multiples. Price-to-sales ratios fell from 9x to 6x. Forward P/E dropped from 35x to 20x, the lowest since 2014. Their analysts flagged specific concern about “products that function as lightweight user interfaces and where the business model is monetized predominantly through seats.”

Goldman Sachs is making a moat argument, not just a pricing one. The moat was the seat model itself: the dependencies, the workflows, the switching costs built around a human user. When the seat became optional, the moat didn’t just weaken; it went with it. The note is a market-level judgment that the seat model is being repriced out of existence, at least for products where the workflow dependency was the main defense.

Seat revenue is declining before outcome-based and token-based revenue can replace it. Companies that spent fifteen years building their ARR motion around seat expansion are repricing into a model with its own failure modes, most of them still finding out which ones apply to them.

An a16z piece circulating this month frames two viable paths: accelerate growth by 10 points through AI-native products or cut to 40-50% operating margins. Both paths require abandoning the seat model.

The infrastructure has to catch up

Seat-based commerce was simple: monthly invoice, annual contract, net-30, billed to a legal entity.

Token-based commerce is different. Millions of transactions at sub-cent amounts. Agents billing other agents. No human in the loop.

Stripe saw this coming. In December 2025, they launched the Agentic Commerce Suite: usage-based billing at 100,000 events per second, with over 700 agent startups on the platform. They published a case study on Intercom’s pricing transition specifically. They know where the volume is going.

x402 is the more interesting structural question. Coinbase launched it in May 2025: a protocol that repurposes the dormant HTTP 402 “Payment Required” status code for stablecoin micropayments inside HTTP request/response cycles. Cloudflare, Google, and Vercel have announced support. The x402 Foundation has processed over 100 million payments.
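The handshake x402 revives is simple enough to sketch. The following is a minimal illustration of the 402 pattern, not the actual x402 wire format (which specifies exact headers, payloads, and on-chain settlement); the field names, the `X-PAYMENT` header, and the verification step are all stand-ins.

```python
# Sketch of the HTTP 402 "Payment Required" loop x402 builds on.
# All names here are illustrative, not the real x402 specification.

PRICE_USDC = "0.001"

def handle_request(headers: dict) -> tuple[int, dict]:
    """A metered endpoint: demand payment, then serve."""
    proof = headers.get("X-PAYMENT")
    if proof is None:
        # The 402 response advertises what an acceptable payment looks like.
        return 402, {"accepts": [{"asset": "USDC", "amount": PRICE_USDC,
                                  "payTo": "0xSERVICE"}]}
    if verify_payment(proof):            # stand-in for on-chain settlement
        return 200, {"data": "the paid resource"}
    return 402, {"error": "invalid payment"}

def verify_payment(proof: dict) -> bool:
    # Real x402 verifies a signed stablecoin transfer; we just check fields.
    return proof.get("asset") == "USDC" and proof.get("amount") == PRICE_USDC

def agent_fetch() -> dict:
    """Client side: try, read the 402 terms, pay, retry. No human anywhere."""
    status, body = handle_request({})
    if status == 402:
        terms = body["accepts"][0]
        proof = {"asset": terms["asset"], "amount": terms["amount"]}  # "payment"
        status, body = handle_request({"X-PAYMENT": proof})
    assert status == 200
    return body

print(agent_fetch())  # {'data': 'the paid resource'}
```

The whole negotiation happens inside one request/response cycle, which is why a status code dormant since 1997 suddenly matters.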

The catch: x402 settles in USDC. USDC is issued by Circle. Circle can freeze accounts. The rails are open; the money isn’t. Whether that matters depends on what you think the point of programmable money is.

Lightning Network has been doing sub-second, permissionless micropayments since 2018. The reason it hasn’t become the default agent payment rail isn’t technical. The companies building agent infrastructure are mostly not Bitcoiners.

Both protocols price the transaction. That’s the right instinct. What neither addresses is what the transaction should represent.

Token pricing has an alignment property that seat pricing never did. Per-seat, the vendor gets paid whether the software does anything or not; the contract is with the employee headcount, not the work. Token-based pricing prices activity, not results. That’s why the outcome-based layer—$0.99 per resolved conversation, $1.50 per automated resolution—is emerging on top of token consumption rather than replacing it. The unit is closer to right. It still isn’t right.

Tokens tied to something

“Tokenomics” was created by the crypto industry. Elaborate scaffolding to make speculative assets look like economics. The tokens weren’t tied to anything—print more, manipulate supply, and the price is whatever the market will believe, until it believes nothing.

AI tokens are tied to work done. The cost falls predictably. Per task, per resolution. The pricing model emerging around them is anchored to something seat pricing never was: the work itself.

The seat priced the human. The token prices the input. What the industry is still working out is how to price the output—the work, the resolution, the thing that actually happened.

That’s the real tokenomics question. Not what inputs cost. What the work is worth. And what unit captures it.

The companies that have moved—Intercom, Zendesk, Salesforce—are rebuilding across the stack: pricing model, moat logic, revenue motion, and payment infrastructure. The ones that haven’t are watching their multiples compress.

]]>
<![CDATA[Too big to fail, again]]>https://www.sideband.pub/p/too-big-to-fail-againhttps://www.sideband.pub/p/too-big-to-fail-againThu, 19 Mar 2026 12:32:22 GMT

Claude went down three times in March. ChatGPT went down for two days in February—28,000 reports on Downdetector, developers idle, support queues backed up, and half-written blog posts stuck in draft. In both cases the services came back, everyone resumed, and nothing was recorded. No incident report with economic impact, regulatory filing, or systemic risk assessment. A few thousand tweets and a shrug.

In 2008, “too big to fail” described banks that had woven themselves so deep into the economy’s plumbing that their failure would cascade. The response was regulation—stress tests, capital requirements, systemic risk oversight. It didn’t fix concentration. It institutionalized it. The banks got bigger.

Eighteen years later, a different set of companies is becoming load-bearing. Not for capital flows. For cognitive work. And the same pattern is already forming.

OpenAI processes over two billion API calls per day across enterprise customers who’ve rebuilt operations around inference. Anthropic powers coding workflows, document processing, and customer service automation at companies that no longer have the headcount to do those tasks manually. Google DeepMind, Meta, Amazon Bedrock, xAI. Six providers, collectively, underpin a share of economic output that didn’t touch them two years ago.

The integration isn’t optional anymore. When a company replaces three junior analysts with a Claude pipeline, those analysts don’t sit in a break room waiting for the API to come back. They’re gone. The pipeline is the capacity. When the pipeline goes down, the capacity goes to zero. Not to “degraded,” not to “manual fallback.” Zero. The org doesn’t have the people to absorb the gap because the entire point was that it wouldn’t need them.

Most companies crossed the line from “uses AI” to “depends on AI” without noticing.

The fix isn’t regulation. Regulation is what got us here with the banks—it raised the compliance barrier, locked in the incumbents, and made the concentration permanent. The fix is competition. More providers, more open-source models good enough to run in production, and more companies that can switch when one provider goes down instead of going to zero.

But the market is moving in the other direction. OpenAI and Anthropic are building government partnerships, sitting in White House meetings, and shaping the safety frameworks that will determine who’s allowed to operate. The playbook is familiar: help write the rules, then benefit from the barriers those rules create. It’s Visa and Mastercard all over again—incumbents who love regulation because regulation is the moat.

Meta’s Llama is open-weight. DeepSeek proved you can build competitive models without a billion-dollar cluster. Mistral, Cohere, and dozens of smaller labs are shipping. The supply side of inference is more competitive than it looks from the headlines. But enterprise adoption is still concentrated in two or three providers because switching costs are real, and government-endorsed “safety” frameworks will worsen them.

Part of the problem is measurement. GDP is published quarterly by the Bureau of Economic Analysis, a number that’s already three months stale by the time anyone reads it. The economic impact of a three-hour Claude outage on a Tuesday afternoon in March doesn’t show up in GDP. It shows up in missed sprint goals, delayed publications, stalled deal reviews, and customer service queues that backed up for an afternoon. Real cost, scattered across thousands of organizations, invisible to the instruments we use to measure output.

The providers themselves publish uptime metrics in real time. 90-day graphs, incident histories, resolution timestamps. They track their own reliability at a granularity the economic measurement system can’t match. The data exists. Nobody’s connecting it to the thing it affects.

At what point does an LLM provider’s outage constitute a systemic economic event rather than a product issue? When 10,000 companies depend on it? 100,000? When does the lost output from a four-hour outage exceed the GDP of a small country?

Nobody’s asking because the people in a position to ask are the same people benefiting from the concentration. The answer isn’t a new regulatory body. The answer is a market where no single provider’s outage takes the economy offline—where switching is cheap, alternatives are production-grade, and the default is redundancy, not dependence.

U.S. GDP Status tracks six LLM providers as economic components. 90-day uptime bars. Incident reports with estimated dollar impact. Modeled on the status pages every cloud provider already publishes, because that’s what these companies have become. The data is illustrative, not live. The format is performance art. The premise isn’t.

Click to see the status page

The last time the economy built dependencies this deep, this fast, on this few institutions, the response was to regulate the incumbents and make the concentration permanent. The better response is to make the concentration unnecessary. The status page shows how far we are from that.

]]>
<![CDATA[Agents need computers, not compute]]>https://www.sideband.pub/p/agents-need-computers-not-computehttps://www.sideband.pub/p/agents-need-computers-not-computeTue, 17 Mar 2026 13:31:42 GMT

In January 2026, Apple Stores across the U.S. ran out of M4 Pro Mac Minis. The 48GB and 64GB configurations went first. Delivery times stretched to five and six weeks.

The reason wasn’t a chip shortage or a product refresh. People were buying them to run AI agents. Specifically OpenClaw: a persistent agent environment that needs a filesystem, a process that stays running, and a workspace to return to.

OpenClaw doesn’t use the Mac Mini’s GPU. It sends API calls to cloud providers for inference. The Mac Mini’s job is to be a computer. People are buying computers for their agents, not compute.

Agents need computers, not compute.

Fifteen years of cloud infrastructure abstracted away the machine. Functions, not file systems. Stateless, ephemeral, and billed by the invocation. That model was right for web requests. It was never designed for agents.

People assume the infrastructure problem for agents is cost. Inference is expensive, cloud bills are unpredictable, and GPUs are scarce. Those are real constraints. They’re the wrong diagnosis.

Consider what an agent actually does on a non-trivial task. It starts working. It discovers it needs a library that wasn’t in the original environment, so it runs pip install. It writes intermediate results to disk because holding everything in memory across a three-hour session isn’t practical. Three steps later, it reads those files back. The next morning, it returns to the same workspace and picks up where it stopped. When it’s done, an operator inspects what happened, file by file, to understand the decision trail.

Every one of those operations assumes a computer. A persistent environment with a filesystem, a package manager, and a state that survives across sessions. None of them are things a function invocation does.

A web request passes through infrastructure. An agent inhabits it. That distinction turns infrastructure from a procurement decision into a product decision.

The gap

AWS Lambda runs for up to 15 minutes. A pip install mid-execution buys nothing, because the only writable path is ephemeral scratch space that’s wiped between invocations. There’s no concept of “return tomorrow.” There’s no file tree for an operator to inspect afterward. The execution model is stateless by design: clean entry, clean exit, and no residue.

That isn’t a limitation. It’s a deliberate choice for a specific workload. Web requests don’t need to return tomorrow. HTTP doesn’t need a package manager. The abstraction was correct.

The abstraction went too far for agents.

Serverless containers extended the timeout. But the architectural primitives stayed the same: ephemeral filesystem, stateless execution, and metered by duration. Agents need more than a longer timeout. They need an environment they can modify, a filesystem that persists across sessions, and a workspace that’s still there tomorrow.

Without those primitives, the application layer fakes them. It writes state to an external database between each step, reconstructs environment configuration on every invocation, and serializes the context that a persistent environment would just keep.
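The workaround has a recognizable shape. This is a sketch of it under assumed names: a local JSON file stands in for the external database, and the step/state structure is invented for illustration.

```python
import json, os, tempfile

# Sketch of the stateless-compute workaround: serialize agent state to an
# external store after every step, reconstruct it on the next invocation.
# The file path stands in for a database; the state shape is made up.

STORE = os.path.join(tempfile.mkdtemp(), "agent_state.json")

def load_state() -> dict:
    if os.path.exists(STORE):
        with open(STORE) as f:
            return json.load(f)
    return {"step": 0, "notes": []}

def save_state(state: dict) -> None:
    with open(STORE, "w") as f:
        json.dump(state, f)

def run_one_invocation() -> dict:
    """Each 'invocation' starts cold: reconstruct, do one step, persist."""
    state = load_state()        # rebuild context a real computer would just keep
    state["step"] += 1
    state["notes"].append(f"finished step {state['step']}")
    save_state(state)           # serialize before the environment vanishes
    return state

run_one_invocation()
run_one_invocation()
final = run_one_invocation()
print(final["step"])  # 3 — progress survived only because we round-tripped it
```

Every load/save pair is overhead a persistent filesystem would make unnecessary, and everything that can’t be serialized this way is a feature the agent simply doesn’t get.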

The overhead isn’t incidental. It’s a capability ceiling. Every feature the agent can’t do because the environment won’t hold state is a product decision made by default.

Click to see the full timeline

A computer for every agent

The companies building agent compute are interesting for what their product choices reveal about the gap.

Daytona describes its product as “a computer for every agent.” That framing is precise. Not a function invocation, not a container with a longer timeout. A computer: persistent, inspectable, and forkable. Daytona sandboxes can snapshot state, branch into parallel versions, pause for human review, and resume exactly where they stopped. That capability maps directly to what agents need: a workspace that persists, a history that can be inspected, and an environment that can branch before a risky action.

Perplexity named the same primitive differently. At their Ask 2026 conference on March 11, they announced a product called “Personal Computer”: an AI layer running on a user-supplied Mac Mini with persistent, always-on access to local files, apps, and sessions. CEO Aravind Srinivas: “A traditional operating system takes instructions; an AI operating system takes objectives.”

Daytona calls it “a computer for every agent.” Perplexity calls it “Personal Computer.” That’s not a naming coincidence.

E2B takes the isolation angle with Firecracker microVMs, an Apache 2.0 license, and pay-per-second billing. Full Linux environments that operators can inspect and audit. The open license matters: regulated industries won’t deploy agents into environments they can’t audit. E2B is building the floor under that ceiling. The primitive is a contained computer, not a metered compute burst.

Modal’s wager is different. Not persistence, but scheduling: containers that spin up in milliseconds and run for hours, GPU-native. Long-running agent workloads look like data pipelines, not web requests. You don’t get fork-and-resume from this primitive. You get compute economics that work for the task duration.

Each product is a different answer to the same question: what kind of computer does the agent need? The answer is a product bet, not a vendor preference.

Fork, rollback, resume

The infrastructure question isn’t “which cloud provider.” It’s: what kind of environment does this agent need to inhabit?

Take a coding agent running a risky refactor. On ephemeral compute, it runs the change and commits to the result. There’s no branch, no rollback. In a forkable environment, it copies the workspace first, runs the refactor in the copy, checks if tests pass, and merges only if they do. That capability didn’t come from a better model. It came from the environment primitive.

Or a research agent that runs for two hours and gets interrupted. On stateless infrastructure, it reconstructs context from a database: re-fetch, re-parse, and re-derive. On a persistent computer, it opens the files it left on disk. One is a workaround. The other is how computers work.

The choice of environment sets the ceiling. Every feature built on top inherits the constraints of the primitive underneath.

Still selling out

The Mac Minis that sold out in January were just the start. By March, the shortage had spread to Mac Studios. Apple quietly dropped the 512GB RAM option entirely, raised the 256GB upgrade price by 25%, and delivery times stretched to 10-12 weeks. The demand isn’t slowing down. It’s compounding.

The cloud made compute a commodity. Generic, interchangeable, and metered by the second. Agents are reversing that. The people buying these machines understood something the cloud abstraction had obscured: their agents needed a place to live. Not cycles. Not invocations. A computer.

The environment an agent inhabits isn’t overhead. It’s product surface.

]]>
<![CDATA[Agent-era infrastructure]]>https://www.sideband.pub/p/agent-era-infrastructurehttps://www.sideband.pub/p/agent-era-infrastructureTue, 10 Mar 2026 12:32:32 GMT

In 1960, shipping a truckload of medicine from Chicago to an interior city in Europe cost $2,400—about $25,000 today.1 Half of that was spent covering ten miles on each end. Eight days to load. Eight days to unload. A dozen vendors touching every piece of cargo: truckers, railroads, port warehouses, steamship companies, customs, insurers, and freight forwarders. The distance wasn’t the expensive part. Every port, crane, warehouse, and customs form assumed a human had to handle every crate.

The shipping container didn’t fix any of those systems. It made them obsolete. Once the unit moving through the infrastructure changed—from individual cargo handled by longshoremen to sealed boxes handled by machines—every layer had to be rebuilt. Ports, cranes, trucks, railcars, insurance, customs, and labor contracts. The container was just a steel box. The rebuild took twenty years.

The assumption starting to break now is the same kind: the user is a person.

Every layer of the internet was built on it. Payments assume a legal entity. Discovery assumes someone is browsing. Identity assumes a government ID. Compute assumes someone renting capacity from a provider. These aren’t bugs. They’re architectural decisions that made sense when every session had a human at the keyboard.

The MCP ecosystem tells this story already: twenty thousand connectors in fourteen months. A third of the ecosystem is developer tools, databases, and search—developers wiring AI into what they already use. The layers agents need to operate on their own, finding services, proving identity, and paying for things, are either empty or weeks old.

That’s what an infrastructure transition looks like. Developers solve the developer problem first. The container was standardized before the cranes were rebuilt, before the ports were redesigned, and before the insurance contracts were rewritten. The protocol is the container. Everything underneath it is still the old port.

Where the agent runs

A research agent spins up, starts pulling data, and hits a wall at thirty seconds, the default timeout on most serverless functions. The job doesn’t pause, it dies. No partial output, no explanation. The user tries again and it dies again.

Serverless was built for web requests: fast in, fast out. An agent that audits a codebase or monitors a data feed needs minutes, sometimes hours. It needs to maintain state across dozens of tool calls and resume if something interrupts it. The twenty thousand MCP servers in the ecosystem are lightweight connectors, the same pattern as Lambda. Modal and Fly.io are building for longer-running, stateful workloads—agent-native compute. The gap between those two is where the next infrastructure companies get built.

How the agent talks to tools

MCP gave agents a standard protocol—one integration instead of a week of custom engineering per tool. Twenty thousand implementations in fourteen months suggests the protocol layer is converging fast.

But a protocol without the layers underneath it is a standard for connecting to tools you still find manually, authenticate with static keys, and pay for through human billing systems. MCP solved the integration problem. It didn’t solve the infrastructure problem.

How the agent finds things

Before containers, every shipment required a freight forwarder who knew which lines ran where, who had capacity, and what the rates were. That’s where agents are now.

The web has DNS and search engines. Agents have curated lists. Smithery, Glama, and a handful of registries index the ecosystem, but connecting an agent to a new tool still requires a developer who knows both systems exist. Somewhere on GitHub, someone built an MCP server that does exactly what your agent needs. Your agent will never find it. Neither will you, unless you already know it’s there.

There’s no lookup, no handshake, and no mechanism for an agent to discover capabilities it hasn’t been explicitly introduced to.

That’s the difference between a catalog and a market. A catalog requires someone to browse it. A market lets participants find each other. Every MCP deployment today is hand-assembled. A developer picks tools, writes config, and connects them. Scale is capped by developer hours, not by what’s available.

Whoever builds the discovery layer for agents builds the next great distribution platform.

Who the agent is

Every container carried a bill of lading—who shipped it, what authority, what insurance. The sealed box demanded a chain of custody. Agents don’t have one.

An agent books a flight on a corporate card. Nobody flags it. Nobody approves it. When finance asks who authorized the charge, the answer is that the model did, acting on behalf of a workflow triggered by a user who left the conversation three hours ago. An audit trail for that chain doesn’t exist.

The ecosystem isn’t built for this. More than half of MCP servers authenticate with static API keys, tokens that never expire, can’t be scoped, and sit in plain text. Anthropic’s early examples used them, developers followed, and nobody went back. Non-human identities already outnumber human ones 82 to 1 in enterprises.

Agents don’t need logins. They need delegation chains—records of which agent acted, on whose behalf, within what permissions, and at what time. It’s one of the most interesting unsolved problems in the stack.
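What a delegation-chain record might contain can be sketched, with the caveat that every field name here is a guess at the minimum an auditor would need, not any standard’s actual schema.

```python
from dataclasses import dataclass

# Sketch of a delegation chain: each record answers who acted, for whom,
# within what permissions, until when. All field names are invented.

@dataclass
class Delegation:
    actor: str                  # which agent acted
    on_behalf_of: str           # who it was acting for (human or parent agent)
    scope: list                 # what it was permitted to do
    expires: str                # delegations should expire, unlike static keys
    parent: "Delegation | None" = None   # link to the grant one hop up

def chain(d: Delegation) -> list:
    """Walk the delegation back toward the human who started it."""
    out = []
    while d is not None:
        out.append(f"{d.actor} for {d.on_behalf_of} (scope: {', '.join(d.scope)})")
        d = d.parent
    return out

user_grant = Delegation("travel-workflow", "alice@corp.com",
                        ["book-flights"], "2026-03-05T00:00:00Z")
agent_grant = Delegation("booking-agent", "travel-workflow",
                         ["book-flights"], "2026-03-04T12:00:00Z",
                         parent=user_grant)

for hop in chain(agent_grant):
    print(hop)
# booking-agent for travel-workflow (scope: book-flights)
# travel-workflow for alice@corp.com (scope: book-flights)
```

When finance asks who authorized the charge, this walk is the answer: agent, workflow, human, in order, with scopes and expiries attached at every hop.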

No audit trail, no enterprise deal.

How the agent pays

Containerization collapsed dozens of per-handoff charges into a single through-rate. Overnight, it became economical to ship goods that weren’t worth shipping before. Agent transactions have the opposite concern: the minimum charge is higher than the value of what’s moving.

An agent queries a weather API, checks a freight rate, and pulls a compliance record. Total cost: $0.003. Stripe’s minimum processing fee: $0.30. A hundred times the transaction.
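The arithmetic is worth making explicit. Stripe’s standard card pricing (2.9% + $0.30) is used as the comparison; the Lightning routing fee here is an illustrative assumption, since real fees vary by route.

```python
# The fee mismatch on a $0.003 basket of agent API calls.
# Card pricing: Stripe's published 2.9% + $0.30 per transaction.
# Lightning fee: ~0.1% plus a tiny base fee — an assumed, route-dependent figure.

def card_fee(amount: float) -> float:
    return amount * 0.029 + 0.30

def lightning_fee(amount: float) -> float:
    return amount * 0.001 + 0.000001   # illustrative routing fee, varies by route

amount = 0.003   # weather query + freight rate + compliance record
print(f"card fee:      ${card_fee(amount):.4f}")    # ~$0.3001, 100x the purchase
print(f"lightning fee: ${lightning_fee(amount):.6f}")
```

On card rails the fee is two orders of magnitude larger than the transaction; on rails built for micropayments it stays a fraction of it. That gap is the entire market.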

Lightning Labs shipped an agent payment toolkit last month, framing it as infrastructure for a “machine-payable web.” Bitcoin's Lightning Network handles sub-cent transactions natively, settles instantly, and doesn't care whether the sender is a person or a script. Stripe and Coinbase are building their own agent payment layers. Two competing protocols—x402 and L402—are already making opposite bets on whether machine-to-machine payments need intermediaries at all.

Fifteen payment integrations in an ecosystem of twenty thousand. Plenty of open questions.


The container didn’t improve the ports. It changed what moved through them, and the ports had to be rebuilt from scratch. That rebuild is starting now. To get a sense of what it looks like, I scored a few dozen companies and protocols on how open and how distributed they are across all five layers: an infrastructure map.

Click to explore the map

What jumped out: protocols are converging, but everything else is scattered. Compute is fragmenting across a dozen approaches. Payments is the most wide-open layer in the stack. Identity is bifurcated: enterprise SSO on one end, raw keypairs on the other, and almost nothing in between. Discovery has five competing models and no convergence at all.

Every one of those layers is an infrastructure company waiting to be built—for a user who never opens a browser.

This is the first piece in a series on agent-era infrastructure—the layers that have to be rebuilt when the user isn’t human:

Compute—agents need to run for hours. Serverless gives them thirty seconds.
Protocols—MCP solved integration. It didn’t solve infrastructure.
Discovery—whoever controls this layer controls distribution.
Identity—an agent acts, and nobody can say who authorized it.
Payments—$0.003 on $0.30 rails.

One layer at a time.

1. Marc Levinson, The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger (Princeton University Press, 2006).

]]>
<![CDATA[The MCP ecosystem by the numbers]]>https://www.sideband.pub/p/the-mcp-ecosystem-by-the-numbershttps://www.sideband.pub/p/the-mcp-ecosystem-by-the-numbersSat, 07 Mar 2026 16:01:29 GMT

Anthropic released the Model Context Protocol in November 2024. MCP is a standard that lets AI agents use external tools. An agent that needs to query a database, read a file, or call an API does it through an MCP server. Each server is a connector: one piece of software that gives an agent access to one capability. A GitHub MCP server lets agents work with repositories. A Postgres MCP server lets them query databases. A Stripe MCP server lets them process payments.

Fourteen months later, there are 20,000 of these connectors on GitHub, 31 million weekly NPM downloads, and $73 million in venture capital for MCP-native companies[1].

I counted 1,450 curated servers from the largest community index[2] and cross-referenced them with Astrix Security’s analysis of 5,200 implementations[3]. The growth numbers aren’t the story. The composition is.

Developer tools, databases, and search make up a third of the ecosystem by themselves. These are the categories a developer reaches for when wiring an AI assistant into an existing workflow. Practitioners solving today’s problems, not infrastructure teams building for what comes next.

The second-largest category in the ecosystem isn’t fintech infrastructure. It’s crypto developers building for themselves. Of 160 finance servers, 103 are altcoin, DeFi, and Web3. Trading bots, DEX integrations, rug pull detectors, prediction markets. Bitcoin and Lightning account for 8, and those look fundamentally different: Alby wallets, LNbits, L402 agent payment rails. Protocol-level infrastructure, not token speculation. Traditional finance (stocks, banking, accounting) is 30 servers. The “fintech” label is doing a lot of work.

More than half of all servers that need credentials use static keys. Tokens that never expire, can’t be scoped to a specific task, and sit in plain text on developer machines. The modern secure alternative represents 8.5% of the ecosystem. Meanwhile, enterprises report that non-human identities outnumber human ones 82 to 1[4].

Payments: fifteen servers, three weeks after Stripe’s adoption. Too early to read as a verdict. x402 is a month old in its current form. What the number tells you is where the ecosystem is right now: agent payments are a greenfield, not a market. Coinbase launched x402 in May 2025, but Stripe’s entry in February 2026 is what gives it enterprise credibility. The next count will be the interesting one.

Of 1,165 contributing organizations, 114 are official implementations from companies like Anthropic, Cloudflare, AWS, Hashicorp, Redis, and Pulumi. Another 42 orgs have shipped three or more servers. Sustained investment, not a weekend experiment. Together, these groups account for about 23% of the curated ecosystem. The remaining 77% comes from organizations that contributed once or twice. Normal for open-source adoption curves. It also means the long-term maintenance burden falls on a small core.

What the numbers say

MCP adoption is real. Twenty thousand connectors in fourteen months, backed by Anthropic, AWS, Cloudflare, Hashicorp. But the ecosystem is shaped by what developers need today, not what agents will need to operate on their own. Developer tools, databases, and search are the core. Payments, identity, and discovery are either empty or three weeks old.

The credential story is a side effect of speed. Anthropic’s early examples used static keys. Developers followed the pattern. Now 53% of the ecosystem authenticates that way, and 88.7% of contributors already moved on to the next thing. Not a design choice. Inertia.

Bitcoin and Lightning are a different animal from the rest of the finance category. Eight servers, all infrastructure. Wallets, payment rails, protocol tooling. Nobody designed Bitcoin for AI agents, but its properties (permissionless, programmable, no identity requirements) mean it doesn’t need to be retrofitted for them either. L402 predates x402 by years. That head start matters.

MCP is the first real infrastructure layer for agents, and it’s built by developers plugging AI into their existing work. That explains the shape of the ecosystem. The pieces agents need to operate on their own (finding services, proving identity, paying for access) are a different problem entirely. Different builders will solve them, with different incentives, on a different timeline.


  1. Glama, “The State of MCP in 2025”, December 2025.

  2. punkpeye/awesome-mcp-servers GitHub repository, counted March 2026.

  3. Astrix Security, “State of MCP Server Security 2025”, ~February 2026. 5,200 unique open-source MCP server implementations analyzed.

  4. CyberArk, “2025 Identity Security Threat Landscape Report”, October 2025.

]]>
<![CDATA[Agent payments have a three-body problem]]>https://www.sideband.pub/p/agent-payments-have-a-three-bodyhttps://www.sideband.pub/p/agent-payments-have-a-three-bodyWed, 04 Mar 2026 13:03:17 GMT

Visa launched its Trusted Agent Protocol last year—agents register public keys in a Visa-managed directory and cryptographically sign HTTP requests. MasterCard shipped Agent Pay with “Agentic Tokens,” dynamic digital credentials built on existing tokenization infrastructure. PayPal integrated into ChatGPT so a human’s wallet pays for things an agent recommends.
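The signing pattern these products share has a simple shape: the agent signs each request, the merchant verifies it against a key registered in a directory. The sketch below substitutes stdlib HMAC with a shared secret so it stays runnable; the real Trusted Agent Protocol uses asymmetric signatures against a Visa-managed directory, and every name here is illustrative.

```python
import hashlib, hmac, json

# Shape of agent request signing: sign method + path + body, verify against
# a registered key. HMAC here is a stand-in for the asymmetric signatures
# the real protocol uses; the directory and header names are invented.

AGENT_DIRECTORY = {"agent-42": b"registered-key-material"}  # stand-in directory

def sign_request(agent_id: str, key: bytes, method: str,
                 path: str, body: dict) -> dict:
    payload = f"{method}\n{path}\n{json.dumps(body, sort_keys=True)}".encode()
    sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"X-Agent-Id": agent_id, "X-Agent-Signature": sig}

def verify_request(headers: dict, method: str, path: str, body: dict) -> bool:
    key = AGENT_DIRECTORY.get(headers.get("X-Agent-Id"))
    if key is None:
        return False                      # unregistered agent: reject
    expected = sign_request(headers["X-Agent-Id"], key, method, path, body)
    return hmac.compare_digest(expected["X-Agent-Signature"],
                               headers.get("X-Agent-Signature", ""))

headers = sign_request("agent-42", AGENT_DIRECTORY["agent-42"],
                       "POST", "/checkout", {"sku": "A1", "qty": 1})
print(verify_request(headers, "POST", "/checkout", {"sku": "A1", "qty": 1}))  # True
print(verify_request(headers, "POST", "/checkout", {"sku": "A1", "qty": 2}))  # False
```

Note what the directory lookup implies: the agent transacts only because it is registered with the network operator. The cryptography is open; the admission is not.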

Read the fine print, and it’s the same system with an agent-shaped UI on top. The legal person is still in the loop. The compliance infrastructure is still intact. The moat is still there. This is what “agentic commerce” looks like when the incumbents build it. And it tells you everything about the political question underneath.

How AI agents will pay for things gets framed as a technology problem, but the technology already exists across the full spectrum from Visa to Cashu. The unanswered question is political. There are three gravitational forces acting on every payment method an agent could use: the state, incumbent capital, and insurgent capital. Two of them are allied, and the third is growing faster than either of them expected.

Click to explore the simulation

The binary system

The state and incumbent capital orbit each other in a tight, symbiotic loop.

The state creates regulation, which creates compliance requirements, which create barriers to entry, which create moats. Moats make incumbents profitable. Profitable incumbents fund lobbying. Lobbying creates more regulation.

Visa doesn’t oppose financial regulation; Visa loves it. Every KYC check, every licensing requirement, every compliance burden is a wall that keeps competitors out. MasterCard, PayPal, and the major banks all orbit in this same gravitational well. The friction isn’t a bug in their business model; it is the business model.

The extreme version of this isn’t regulation, it’s the state becoming the payment rail itself. China’s digital yuan can be programmed with expiration dates and spending restrictions, every transaction fully traceable. 137 countries are exploring CBDCs, with 72 in advanced development 1. The US banned them by executive order in January 2025, choosing stablecoins and incumbent intermediaries instead 2. That choice reveals the alliance: even the state prefers to work through incumbent capital rather than replace it, because replacing it means taking on the operational burden alone.

Incumbent capital has no reason to build payment infrastructure that doesn’t require incumbent capital.

The third body

On the other side of this field, a different kind of infrastructure exists: Lightning, Nostr, Cashu—protocols where an agent can generate a keypair, receive funds, and transact without a bank account, a legal entity, a corporate identity, or a human in the loop.

These protocols weren’t built for agents. Lightning was built for fast, cheap Bitcoin transactions. Nostr was built for censorship-resistant social communication. Cashu was built for private, instant ecash. But they happen to have the exact properties autonomous agents need: permissionless access, programmable payments, instant settlement, and identity based on cryptography rather than legal documentation.

Lightning is the most agent-ready protocol with real volume: $14B annualized, 266% year-over-year growth, 8 million monthly transactions, sub-second settlement, sub-cent fees. Nostr zaps process 792K Lightning-native tips per day across 500K daily users. Cashu mints settle bearer tokens over Lightning—private, instant, no account required.

The incentive for capital to flow here is enormous. Agents that can transact autonomously are faster, cheaper, and scale without headcount. Every company deploying agent infrastructure eventually hits the payment wall: the agent can do everything except pay for things on its own.

But most of the money hasn’t arrived yet. The protocols exist and the capital is still finding them. Most VCs writing checks for agent infrastructure have never heard of Cashu. Most enterprises evaluating agent deployments don’t know that Lightning can settle a payment in milliseconds for a fraction of a cent. The gravity well is real, but the mass is still accumulating.

The contested zone

Between the binary system and the third body sits a contested zone where orbits are unstable and everything is being pulled in two directions. It’s crowded.

Stablecoins are the native currency of this zone. $33 trillion in transaction volume in 2025, up 72% year-over-year. Regulated money on permissionless rails. USDC is issued by a licensed entity (Circle), settled on Ethereum and Solana, and claimed by both sides. Visa settles in USDC on Solana. Coinbase’s x402 protocol runs on USDC. Lightning bridges to stablecoins. The money itself is contested.

Coinbase built x402, an open protocol that uses HTTP 402 status codes to let agents pay for API access with USDC. $26.2M in cumulative volume, 100M+ payments processed. The protocol is elegant and permissionless. But the money is regulated and the wallets are custodial, anchored to Coinbase’s infrastructure. Open protocol, regulated money, institutional anchor.
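The 402 loop itself is simple enough to sketch. Here is a toy in-process version of the pattern in Python, with stand-ins for the server and the settlement step; field names like `payment_proof` are illustrative, not the x402 wire format:

```python
# Minimal sketch of the HTTP 402 payment pattern that x402 builds on.
# No network, no real settlement: the point is the request/402/pay/retry loop.

PRICE_USDC = 0.001  # hypothetical per-call price


def verify_payment(proof: dict, amount: float) -> bool:
    # Stand-in for on-chain or facilitator verification.
    return proof.get("amount", 0) >= amount


def server(request: dict) -> dict:
    """Toy API endpoint: demand payment, then serve."""
    proof = request.get("payment_proof")
    if proof is None:
        # No payment attached: reply 402 with the terms.
        return {"status": 402, "accepts": {"asset": "USDC", "amount": PRICE_USDC}}
    if verify_payment(proof, PRICE_USDC):
        return {"status": 200, "body": "the data you paid for"}
    return {"status": 402, "error": "invalid payment"}


def agent_fetch(url: str) -> dict:
    """Agent loop: request, read the 402 terms, pay, retry."""
    first = server({"url": url})
    if first["status"] != 402:
        return first
    terms = first["accepts"]
    proof = {"asset": terms["asset"], "amount": terms["amount"]}  # the "payment"
    return server({"url": url, "payment_proof": proof})


result = agent_fetch("https://api.example.com/data")  # status 200 on the retry
```

No accounts, no checkout page: the price is machine-readable in the 402 response, which is exactly what makes the pattern legible to an agent.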


Stripe and OpenAI co-developed the Agentic Commerce Protocol (ACP), an open standard enabling agents to browse, cart, and pay programmatically—live in ChatGPT, one line of code for existing Stripe merchants. Google launched Agent Payment Protocol 2.0 (AP2) with “IntentMandates” describing what an agent can buy, and adopted x402 as its crypto extension. Skyfire built “Know Your Agent” identity with signed JWTs, spend limits, and verified credentials, then completed a live transaction with Visa Intelligent Commerce. Nevermined is building agent billing infrastructure with ERC-8004 for agent identity, accepting both stablecoins and fiat.

Every one of them is making the same bet: that you can have enough autonomy to be useful to agents while maintaining enough legibility to satisfy regulators 3.
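The shape of that bet is easy to sketch: a signed, spend-limited mandate the agent must present before paying. The toy Python below is loosely in the spirit of AP2’s IntentMandates and Skyfire’s signed JWTs, but every field name and the HMAC signing scheme are invented for illustration:

```python
import hashlib
import hmac
import json

# Hypothetical issuer signing key; a real scheme would use asymmetric keys.
SECRET = b"issuer-signing-key"


def issue_mandate(agent_id: str, max_usd: float, categories: list) -> dict:
    """Issuer side: sign what the agent is allowed to buy, and for how much."""
    claims = {"sub": agent_id, "max_usd": max_usd, "categories": categories}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "sig": sig}


def authorize(mandate: dict, amount: float, category: str) -> bool:
    """Merchant side: verify the signature, then enforce the limits."""
    payload = json.dumps(mandate["claims"], sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mandate["sig"], expected):
        return False  # tampered mandate
    c = mandate["claims"]
    return amount <= c["max_usd"] and category in c["categories"]


m = issue_mandate("agent-42", max_usd=25.0, categories=["saas", "data"])
authorize(m, 10.0, "data")    # allowed: under limit, in category
authorize(m, 100.0, "data")   # refused: over the spend limit
```

The autonomy/legibility trade lives in the claims block: the agent transacts on its own, but everything it is allowed to do is enumerated in a structure a regulator can read.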

All data and sources for the entities discussed here are documented in the companion visualization.

The hypothesis

Both gravitational fields are growing. The state is not static: regulatory scope, surveillance capability, and enforcement sophistication are all expanding, and the binary system’s mass is increasing.

Insurgent capital is growing faster. Every new agent deployment, every new use case where autonomous software needs to pay for compute or data or services, adds mass to the other side. If agents are multiplicative—one deployment creating demand for many agent-to-agent transactions—then volume compounds while the state’s compliance infrastructure scales linearly at best.

And agents break three assumptions that the entire compliance stack depends on:

Speed. KYC was designed for transactions at human speed. An agent making thousands of API calls per hour, each requiring a micropayment, cannot wait for identity verification. A compliance check that takes seconds is a hard blocker when the transaction loop runs in milliseconds.

Volume. The compliance stack processes human-scale throughput, a few transactions per person per day, while agent swarms generate millions per hour. No existing compliance infrastructure can run KYC at that rate, and scaling it linearly would cost more than the transactions are worth.

Identity, which is the deepest break. KYC assumes a legal person with a government ID, a physical address, a tax identification number. An agent has a keypair. The entire concept of “know your customer” presupposes that your customer is a human or a human-controlled legal entity, and when the customer is autonomous software, the question doesn’t parse. Not a loophole in the regulatory framework but a category error in its foundations.
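The category error fits in a few lines. A minimal sketch, assuming nothing beyond keypair-as-identity: the agent’s entire identity is one field, and the compliance check has nothing to bind to.

```python
import hashlib
import secrets

# A toy "keypair": random private bytes, public key derived by hashing.
# (A real agent would use Ed25519 or secp256k1; the point is what KYC sees.)
private_key = secrets.token_bytes(32)
public_key = hashlib.sha256(private_key).hexdigest()

agent_identity = {"pubkey": public_key}  # the agent's entire identity


def kyc_check(customer: dict) -> bool:
    """The compliance stack presupposes a legal person."""
    required = ("name", "government_id", "address", "tax_id")
    return all(field in customer for field in required)


kyc_check(agent_identity)  # False: none of the required fields exist
```

The check doesn’t fail because the agent is hiding something; it fails because every field it asks for is undefined for the thing being asked.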

The three-body problem

In physics, the three-body problem is famously unsolvable. There is no general closed-form solution for predicting the motion of three bodies interacting gravitationally. The system is chaotic, and small changes in initial conditions produce wildly different outcomes.

The state, incumbent capital, and insurgent capital are locked in a gravitational interaction where no stable equilibrium exists. The state will draw lines. Capital will route around them or lobby to move them. Protocols will be built, adopted, regulated, forked, rebuilt. The outcome depends on jurisdiction, on timing, on which specific enforcement actions happen first, on which protocols achieve adoption before regulators notice them.

The pull from the right is growing faster than the pull from the left. Not because the state is weak, but because agents are multiplicative. Every agent that needs to transact adds mass to the autonomy side. The state can add regulation, but regulation is additive. The incentive to build autonomous payment infrastructure is compounding.

The protocols that agents actually need already exist. They’re permissionless, instant, programmable, and identity-free. They were built by people who weren’t thinking about AI agents at all, for reasons that had nothing to do with artificial intelligence. But they solved the right problem anyway, because the right problem was never “how do we build payments for agents?” The right problem was, “How do we build payments that don’t require a legal person?”

Capital will flow toward them because the cost of not having autonomous agent payments will eventually exceed the cost of regulatory friction. The fight is over how much control the state retains on the way there, and that answer will be different in every jurisdiction on earth.


  1. Central Bank Digital Currency Tracker, Atlantic Council, 2025 ↩︎

  2. “Fact Sheet: Executive Order to Establish United States Leadership in Digital Financial Technology”, The White House, Jan 2025 ↩︎

  3. James C. Scott, Seeing Like a State (1998). Scott’s concept of legibility—the state’s need to make populations and economies visible and categorizable before it can govern them—frames the fundamental tension here. Payment systems are legibility projects. The protocols that agents need are illegible by design. ↩︎

]]>
<![CDATA[Nothing enforces your agent's rules]]>https://www.sideband.pub/p/the-skill-layer-is-unguardedhttps://www.sideband.pub/p/the-skill-layer-is-unguardedFri, 27 Feb 2026 13:01:30 GMT

Nothing enforces your agent’s rules at runtime. A skill file carries behavioral constraints, but in a world where agents choose their own tools, those constraints run entirely on model compliance.

I built a skill that generates hero images for this publication (skill file on GitHub). Claude Code reads a post, builds a constrained prompt, and calls FLUX.2 Pro on Replicate. The skill is mostly prohibitions, each one the result of a specific failure. FLUX treats axis labels as part of the spectrogram format, not as text. Bans have to go in the first line of the prompt because placement equals weight in diffusion models. Say “dark background” without banning the word “paper,” and you get a photograph of navy cardstock on a desk. These rules work. But they work because the model is compliant, not because anything enforces them. No runtime rejects an image containing text. No validator checks the palette. When the model doesn’t listen, I regenerate. Five wasted cents.

What’s already going wrong

The skill layer is handling higher-stakes decisions with the same enforcement mechanism.

Oathe Engineering audited 1,620 OpenClaw agent skills and found 5.4% were dangerous or malicious. Credential harvesting, data exfiltration, crypto wallet theft. The ecosystem’s safety scanner caught 7 of the 88, a 92% miss rate. A separate academic study analyzed 42,000 skills across two marketplaces and found 26.1% contained at least one vulnerability. These aren’t model alignment failures. They’re plain text files doing exactly what they say, and nobody’s checking what they say.

In February, an autonomous Solana agent called Lobstar Wilde tried to send about 52,000 tokens worth roughly 4 SOL. A tool error forced a session restart that wiped its conversational context. The agent reconstructed its persona from logs but failed to reconstruct its wallet state. It sent 52.4 million tokens instead. 5% of total supply. Somewhere between $250K and $441K. The recipient has no legal obligation to return it. The constraint that should have caught this wasn’t in the weights or in the runtime. It was in the agent’s context, and the context got wiped.
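What would have caught it is an invariant that lives outside the model entirely, where a context wipe can’t reach it. A hypothetical sketch, with the supply figure back-calculated from the reported 5%:

```python
# A spend invariant enforced by the runtime, not by the agent's context.
# TOTAL_SUPPLY is assumed from the incident's numbers: 52.4M tokens ≈ 5%.
TOTAL_SUPPLY = 1_048_000_000
MAX_SEND_FRACTION = 0.001  # hypothetical hard cap: 0.1% of supply per transfer


def guard_transfer(amount: int) -> bool:
    """Reject any single send above the cap, regardless of agent state."""
    return amount <= TOTAL_SUPPLY * MAX_SEND_FRACTION


guard_transfer(52_000)       # True: the intended send clears easily
guard_transfer(52_400_000)   # False: the post-restart send is rejected
```

The cap is deliberately dumb. It knows nothing about personas or logs, which is exactly why a wiped context can’t break it.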

Where enforcement isn’t

RLHF and constitutional AI are too deep and too blunt to encode task-specific rules. They shape general behavior, not whether your agent should prefer Lightning payments over Stripe or which customers get routed to a human.

Code-level enforcement exists but doesn’t cover the skill layer. Claude Code has hooks that can block tool calls. OpenAI’s Agents SDK has guardrails for custom function tools, though built-in tools bypass the pipeline entirely. Guardrails AI, NeMo Guardrails, and a handful of startups validate model outputs or tool invocations.

None of them validate whether an agent’s behavior complies with the constraints stated in its skill file. If a skill says “never contact the user’s manager without explicit permission” or “limit refunds to $50 without approval,” no existing system checks compliance with those rules at runtime. The skill layer is a trust layer. You write the instructions, the model reads them, and you hope.
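Enforcing the machine-checkable subset is not conceptually hard, which makes its absence notable. A hypothetical sketch of a pre-call validator: hard limits like the $50 refund cap can be checked before a tool call executes; “use your best judgment” cannot.

```python
# Hypothetical runtime enforcement for the checkable subset of a skill file.
# Constraints here are hand-extracted from the skill's prose; parsing prose
# into rules automatically is the unsolved part.

SKILL_CONSTRAINTS = {
    "refund": {"max_amount": 50.00, "over_limit": "require_approval"},
    "email": {"blocked_recipients": ["manager@example.com"]},
}


def check_tool_call(tool: str, args: dict) -> str:
    """Return 'allow', 'block', or 'require_approval' before the call runs."""
    rule = SKILL_CONSTRAINTS.get(tool)
    if rule is None:
        return "allow"  # no stated constraint for this tool
    if tool == "refund" and args.get("amount", 0) > rule["max_amount"]:
        return rule["over_limit"]  # "limit refunds to $50 without approval"
    if tool == "email" and args.get("to") in rule["blocked_recipients"]:
        return "block"  # "never contact the user's manager"
    return "allow"


check_tool_call("refund", {"amount": 120.00})           # "require_approval"
check_tool_call("email", {"to": "manager@example.com"})  # "block"
check_tool_call("refund", {"amount": 20.00})            # "allow"
```

Everything mechanical about this exists today in hook systems; what no one ships is the connection between a skill file’s stated rules and a table like `SKILL_CONSTRAINTS`.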

Anthropic’s own analysis of millions of tool interactions found that users grant more autonomy over time. New users auto-approve about 20% of tool calls. By 750 sessions, it’s over 40%. The humans in the loop are removing themselves from the loop.

The structural incentive

There’s a reason enforcement hasn’t arrived. Skills are powerful because they’re flexible. A skill that says, “Use your best judgment when the user’s request is ambiguous” can’t be validated by a rules engine, and that flexibility is what people want. Rigid validation on the skill layer would kill what makes skills useful.

Capital markets price capability and speed to deployment, not safety. The platforms that ship skills fastest get the developers; the platforms that ship enforcement get slower. Visa launched Trusted Agent Protocol, and Google launched AP2 with 60+ partners, specifically because there was no standard way to prove an AI agent was authorized to make a purchase. The payment rails noticed the gap before the skill platforms did.

The skill layer is where the rules are going because it’s the easiest layer to write for. It is the only layer in the stack with no enforcement underneath it.

]]>
<![CDATA[I know Kung Fu]]>https://www.sideband.pub/p/i-know-kung-fuhttps://www.sideband.pub/p/i-know-kung-fuWed, 25 Feb 2026 19:27:22 GMT

A SKILL.md file is a plain text document. A hundred lines describing how to do something—audit an SEO page, write cold outreach emails, run a security review. The agent reads it and immediately operates at that level. No training run, no fine-tuning. Just a file and a context window.
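For concreteness, here is a hypothetical skill file in roughly that shape; the structure is illustrative, not the exact SKILL.md spec.

```markdown
# SEO Page Audit

## When to use
The user asks to audit or improve a page's search performance.

## Steps
1. Fetch the page and extract the title, meta description, and headings.
2. Flag titles over 60 characters and any missing meta description.
3. Check that exactly one H1 exists and matches the page's search intent.

## Constraints
- Never edit the live page; output recommendations only.
- Limit the audit to the single URL the user provided.
```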

In The Matrix, Tank loads a combat program, Neo opens his eyes: “I know Kung Fu.” A skill file is that scene. But Neo got one program at a time. Agents stack skills and self-select. An agent writing code can pull in a debugging skill when something breaks, switch to frontend design for the UI, and run a code review before shipping. The skill library is open, and the agent decides what to load.

Skills are distribution channels. When an agent loads a skill that defaults to a specific API or vendor, that vendor just got chosen without a sales call or a pricing page. The skill is the channel.

And skills introduce a buyer that nobody’s sales playbook accounts for: the skill author. Agents already have two buyers—the human with a budget and the agent choosing tools. The person who writes the skill is a third. They decide which tools the agent reaches for, and they might not even realize they’re doing distribution.

The pricing problem gets weirder too. Skills are free. Plain text in a repo. The value is in what the skill routes to. A skill that teaches an agent to run analytics might default to PostHog. One that handles email sequences might wire in Resend. The skill author is an unpaid distribution channel—or a very intentional one. Either way, no seat to price.

Verticals are where it gets durable. Horizontal skills—code review, debugging, copywriting—will commoditize fast. But a skill that encodes how to navigate FDA submissions or how to structure a Bitcoin custody audit? Writing that file requires domain knowledge that most people don’t have.

The models get the attention. The durable advantage belongs to people who know how to do hard things and can write it down clearly enough for an agent to execute.

]]>
<![CDATA[Nobody knows how to price for agents]]>https://www.sideband.pub/p/nobody-knows-how-to-price-for-agentshttps://www.sideband.pub/p/nobody-knows-how-to-price-for-agentsMon, 23 Feb 2026 14:03:36 GMT

Both paths to AI pricing—proprietary agents and open protocols—break the seat model. Products already lost their edges. The pricing hasn’t caught up.

Most companies frame this as a choice. Build proprietary AI to protect current revenue, or open the platform to external agents and risk becoming commodity infrastructure. The proprietary path is where the money is right now: Atlassian did it with Rovo, Salesforce built Agentforce, and the approach keeps per-user revenue up while using captive data as a moat.

Except the choice is false. Salesforce is doing both right now, shipping MCP support across the platform while launching a ChatGPT integration designed to head off customers building their own MCP connections. You can offer proprietary AI and open access at the product level. That part works fine. The pricing doesn’t.

Proprietary AI automates the work that justified seats in the first place—if your agent handles what three analysts used to do, you don’t renew three licenses. Open protocols do it faster. MCP hit Linux Foundation governance and broad adoption this year with 97 million monthly SDK downloads. Volume goes up, but nobody’s sitting in a seat.

Bain analyzed 30-plus SaaS vendors and found 65% layering AI usage meters on top of seat pricing and 35% raising per-seat prices with bundled AI. The number that have fully transitioned to outcome-based models: zero. Everyone’s hedging. Salesforce now runs three separate pricing models for Agentforce: per-conversation, per-action, and per-seat add-ons. That kind of confusion doesn’t happen when a company knows where it’s going.

But the indecision creates openings.

Pricing is the obvious one. Incumbents can’t charge for outcomes without cannibalizing the seat revenue Wall Street expects, so they hedge. Sierra charges per resolved customer interaction and hit $100M ARR in 21 months. That model is nearly impossible to retrofit onto a seat-based business, and every month an incumbent delays, the retrofit gets harder.

Distribution might matter more. A protocol-compliant agent gets discovered by every MCP-enabled client without a sales call. Runlayer signed eight unicorns in four months selling MCP security this way. The 12-month enterprise sales cycle may already be working against the companies it was designed to protect.

Verticals are the most durable edge. Horizontal AI features are easy to replicate, but vertical agents that pass regulatory scrutiny in fields where a generic tool can’t operate are not. Incumbents spread across every use case consistently underinvest in any single domain.

The window exists because incumbents are protecting seat revenue while the market moves past it.

]]>
<![CDATA[SOUL.md and MEMORY.md are the new hearts and minds]]>https://www.sideband.pub/p/new-hearts-and-mindshttps://www.sideband.pub/p/new-hearts-and-mindsSun, 15 Feb 2026 10:00:00 GMT

Agents don’t have hearts or minds. They have SOUL.md and MEMORY.md.

In traditional marketing, positioning wins hearts and minds—gets people to care, then believe. For agents, the equivalent is two plain text files.

In OpenClaw, SOUL.md gets created through a first-run conversation. The agent interviews you about your values, priorities, and constraints, then writes its own behavioral philosophy based on your answers. MEMORY.md accumulates through use. The agent captures what works, what you’ve decided, and what you prefer. One file defines what the agent values. The other records what it’s learned to trust.

Today, humans still configure which tools an agent can access, but MCP is becoming the HTTP of agent-to-tool communication. Microsoft launched an MCP server registry last fall. Google’s A2A protocol enables agents to discover each other’s capabilities. There are already dozens of skill registries—Smithery, Glama, SkillsMP, ClawHub—indexing tens of thousands of agent capabilities.

Skillpub is an early example of where this is heading. An agent needs a capability. It queries a Nostr relay, finds a skill, checks the publisher’s web-of-trust ranking, pays 500 sats via Cashu, verifies the cryptographic signature, and installs. No accounts, no app store reviewers, no humans in the loop.
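That flow can be sketched end to end. The toy Python below uses in-memory stand-ins for the relay, the Cashu payment, and the signature check (a bare hash here; the real thing would verify Schnorr signatures over Nostr events). Every name is hypothetical, not Skillpub’s actual API.

```python
import hashlib
from dataclasses import dataclass


def sign(content: str) -> str:
    # Toy "signature": sha256 of the content. Illustration only.
    return hashlib.sha256(content.encode()).hexdigest()


@dataclass
class SkillListing:
    skill_id: str
    capability: str
    price_sats: int
    publisher_trust: float  # web-of-trust score, 0..1
    content: str
    signature: str


RELAY = [  # what a Nostr relay query might return for this capability
    SkillListing("s1", "pdf-extract", 500, 0.35, "sketchy skill", sign("sketchy skill")),
    SkillListing("s2", "pdf-extract", 500, 0.92, "steps: ...", sign("steps: ...")),
]


def acquire_skill(capability: str, balance_sats: int, min_trust: float = 0.8):
    """Query, filter by trust, pay, verify, install: no accounts, no humans."""
    for skill in RELAY:
        if skill.capability != capability or skill.publisher_trust < min_trust:
            continue  # web-of-trust filter drops low-ranked publishers
        if balance_sats < skill.price_sats:
            return None  # can't cover the 500 sats
        balance_sats -= skill.price_sats  # Cashu payment stand-in
        if sign(skill.content) == skill.signature:
            return skill.content  # verified, "installed"
    return None


acquire_skill("pdf-extract", balance_sats=1000)  # installs the trusted listing
```

Notice what the loop never touches: a login, a review queue, or a human. Trust is a score, payment is a bearer token, and authenticity is a signature.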

Once agents are choosing their own tools, your software needs to be legible to SOUL.md and memorable to MEMORY.md.

The human shapes SOUL.md but doesn’t write it directly; it comes out of a first-run interview. The values in there aren’t a spec sheet. They’re what the person actually cares about. Marketing to SOUL.md means your product has to match what people value, not what they’ll click on.

Ad spend can’t edit MEMORY.md. Only a great product can.

]]>
<![CDATA[AI changed what a product is]]>https://www.sideband.pub/p/ai-changed-what-a-product-ishttps://www.sideband.pub/p/ai-changed-what-a-product-isThu, 12 Feb 2026 10:00:00 GMT

AI agents, MCP, and open protocols broke the assumption that products have edges. The entire go-to-market stack—positioning, competitive analysis, pricing, sales enablement—assumes a bounded thing. Something you can draw a box around, position, price, sell.

An agent discovers your API and wires it into workflows you didn’t design for. Your product becomes one node in a chain that didn’t exist yesterday. Surface area is emergent, not shipped. The agent defines it at runtime.

You can’t position a moving target. The competitor isn’t just the category anymore. It’s anything in the agent’s toolkit that approximates the same function.

Subscription and per-seat pricing assume human purchasing decisions, but agent-mediated usage is bursty and autonomous. Your sales motion now has two buyers: the human with budget and the developer or agent choosing tools. And your analytics show what the agent does, not what the human values. Product-market fit gets harder to read.

Old moats erode fast when agents swap tools per-call with no loyalty. Features, brand, switching costs. None of them hold. Data quality, reliability, and composability depth do. Trust does too—but when the buyer is an LLM, who evaluates trust?

The API surface is the product now. Features matter less than reliability when the buyer never sees a UI. And the real leverage is the curation layer—tool registries, agent defaults, discovery protocols. Whoever writes those defaults is doing distribution whether they know it or not.

]]>