Ben Werdmuller
https://werd.io/

Agentic Engineering Patterns
https://werd.io/agentic-engineering-patterns/
Mon, 16 Mar 2026 13:45:42 GMT
[Simon Willison]

Simon Willison’s work-in-progress deep dive into agentic engineering is predictably good.

From the introduction, distinguishing agentic engineering from vibe coding:

“Some people extend that definition to cover any time an LLM is used to produce code at all, but I think that's a mistake. Vibe coding is more useful in its original definition - we need a term to describe unreviewed, prototype-quality LLM-generated code that distinguishes it from code that the author has brought up to a production ready standard.”

I’ve been using the term “AI-assisted engineering,” but standardizing around “agentic” seems more precise for the kind of activity we’re talking about.

And from the anti-patterns page:

“Don't file pull requests with code you haven't reviewed yourself.

If you open a PR with hundreds (or thousands) of lines of code that an agent produced for you, and you haven't done the work to ensure that code is functional yourself, you are delegating the actual work to other people.

They could have prompted an agent themselves. What value are you even providing?”

The temptation is to write and push code that you haven’t reviewed personally, but technology leaders need to enforce a human-review process. You are responsible for all code you push, and you are responsible for not wasting your colleagues’ time.

From “writing code is cheap now”:

“Delivering new code has dropped in price to almost free... but delivering good code remains significantly more expensive than that.”

Simon is building a really great guide to not just the process but the underlying mindsets behind good agentic engineering. It’s worth reading and following.

[Link]

Businesses rush to rehire staff after regretted AI-driven cuts
https://werd.io/businesses-rush-to-rehire-staff-after-regretted-ai-driven-cuts/
Sun, 15 Mar 2026 18:06:35 GMT
[Dexter Tilo in Human Resources Director]

Not a surprise. Careerminds polled 600 HR professionals from organizations that had made layoffs in the last year.

“It found that 32.7% of organisations that conducted AI-led layoffs had already rehired between 25% to 50% of the roles they initially let go.

Another 35.6% said they had already rehired more than half of the roles that they cut.”

Say it with me: AI can’t replace the skill, judgment, creativity, and taste of real people. Replace the word “AI” with “spreadsheet” and the nonsense behind AI-led layoffs becomes even clearer. AI is a potentially very powerful tool, but it’s just a tool, and it works better when more highly-skilled people are using it.

Which organizations are beginning to find out:

“According to the findings, more than half of HR leaders said AI required more human insight than anticipated.”

It’s worth saying that around 21% of respondents did report that their layoffs went okay. It’s possible that they’re lying. They could also have been employing people to do very manual, repetitive data work without any degree of insight, which seems like a poor use of a workforce. But generally speaking, the study found hundreds of orgs that saw the potential to save costs and were so blinded by dollar signs that they didn’t go beyond the marketing claims about what AI could actually do:

“What ties all these findings together is that the organisations that struggled the most were making significant, irreversible decisions without the full picture of AI capabilities and what a reduction would do to their workforce.”

These organizations treated people poorly. The real tragedy is that they seem not to have understood the skills and value of their own employees. There’s a deeper problem there than just AI.

[Link]

FCC Chair Threatens to Revoke Broadcasters’ Licenses Over Iran War Coverage
https://werd.io/fcc-chair-threatens-to-revoke-broadcasters-licenses-over-iran-war-coverage/
Sun, 15 Mar 2026 17:28:15 GMT
[Ashley Ahn in The New York Times]

Seems bad:

“Brendan Carr, the chairman of the Federal Communications Commission, threatened on Saturday to revoke broadcasters’ licenses over their coverage of the war with Iran, his latest move in a campaign to stomp out what he sees as liberal bias in broadcasts.”

The irony is that it was the FCC under Reagan in 1987 that withdrew the fairness doctrine, a policy that had required broadcasters to reflect different viewpoints on controversial matters of public interest since 1949. If the current FCC felt like there was a persistent liberal bias, they could reinstate it. Regardless, painting US broadcast media as overwhelmingly liberal is a stretch in a post-Bari Weiss CBS era; perhaps Carr sees a problem with there being any liberal-leaning media.

Only free-to-air channels require broadcast licenses, so in practice this affects a minority of news consumers. Most people get their news via the internet or cable news these days. But there’s clearly an intended chilling effect here, and obviously there are other routes to get to newsrooms beyond broadcast TV: merger approvals, adverse regulations, and so on. It’s a threat to the entire US news media in a country that claims to treasure free speech.

[Link]

Trump is using immigration policy to suppress speech, lawsuit claims
https://werd.io/trump-is-using-immigration-policy-to-suppress-speech-lawsuit-claims/
Sat, 14 Mar 2026 15:16:54 GMT
[Shannon Bond at NPR]

This lawsuit, filed by The Knight First Amendment Institute at Columbia University and Protect Democracy on behalf of the Coalition for Independent Technology Research (CITR), is important:

“The suit accuses the administration of violating the First Amendment with an official policy to deny visas to or deport noncitizens who work on or study social media platforms, fact-checking or other activities the government deems "censorship" of Americans' speech. It argues that amounts to unconstitutional viewpoint discrimination.”

The work conducted by researchers into social media is vital: it helps us build safer communities that allow democratic discourse to take place. Unfortunately, the Trump administration has decided that this safety work is a radical act — and in particular that research into Trump ally Elon Musk’s X is verboten. I would assume that Ellison’s American TikTok will receive the same preferential treatment.

My friend Dr J. Nathan Matias, who runs the amazing Citizens and Technology Lab, is a named declarant. His experiences are laid out in the suit:

“[…] the Censorship Policy has deprived Dr. Matias of significant contributions from his noncitizen collaborators in the United States. Because of the fear that they will be denied reentry to the United States under the Policy, some of Dr. Matias’s U.S.-based noncitizen collaborators have decided not to travel abroad, including to attend meetings with new community partners who have important experiences related to online safety and freedom of expression. As a result, Dr. Matias has been unable to pursue collaborations with those partners, who would have added significant value to his research. Because of the same fear, one of Dr. Matias’s noncitizen collaborators felt compelled to make extensive contingency plans in connection with international travel, requiring more flexibility in their work and less visibility on their projects, which has significantly delayed progress on those projects. Additionally, because of the fear that they will be targeted for detention or deportation under the Policy based on their work, at least one of Dr. Matias’s noncitizen collaborators has decided not to speak to journalists or answer questions from policymakers on topics related to their work. As a result of these chilling effects, Dr. Matias has lost important opportunities to develop new research, obtain expert feedback on research, and bring visibility to his work and the work of his lab.”

This is unacceptable. We should all hope that the lawsuit is successful.

[Link]

When Using AI Leads to “Brain Fry”
https://werd.io/when-using-ai-leads-to-brain-fry/
Sat, 14 Mar 2026 15:01:21 GMT
[Julie Bedard, Matthew Kropp, Megan Hsu, Olivia T. Karaman, Jason Hawes and Gabriella Rosen Kellerman in Harvard Business Review]

Interesting research about the interaction between AI use and burnout, studying 1488 (an incidentally unfortunate number) US-based workers. Burnout is real:

“Participants described a “buzzing” feeling or a mental fog with difficulty focusing, slower decision-making, and headaches. This AI-associated mental strain carries significant costs in the form of increased employee errors, decision fatigue, and intention to quit.

There’s some nuance here, however. We also found when AI is used to replace routine or repetitive tasks, burnout scores—but not mental fatigue scores—are lower. This highlights the subtle-but-important distinction between the types of stress that AI can alleviate, and those that it may worsen.”

So the kind of AI use matters. The researchers found that AI use cases that required increased oversight (coding is one, using AI with sensitive internal data is another) increased the risk of burnout. This was particularly true because the people who used these tools were more likely to take on more work, pushing their total cognitive load beyond their limits. But using it for more straightforward repetitive tasks reduced the risk of burnout.

The high-risk activities cluster around certain teams:

“After marketing, people operations, operations, engineering, finance, and IT were the functions with the highest prevalence of AI brain fry.”

Legal teams, which presumably use AI for evidence sets and contract analysis (via tools like Harvey) but not for the legal analysis itself, were the least likely to suffer from this problem.

This should inform how managers think about AI use and how to set humane norms internally.

[Link]

BuzzFeed Nearing Bankruptcy After Disastrous Turn Toward AI
https://werd.io/buzzfeed-nearing-bankruptcy-after-disastrous-turn-toward-ai/
Sat, 14 Mar 2026 14:37:48 GMT
[Victor Tangermann at Futurism]

This looks like a cut-and-dried story about a media company that swapped human writers for AI-generated content and suffered the consequences:

“Peretti said BuzzFeed would be using the software to enhance the company’s infamous quizzes by generating personalized responses.

[…] Now, three years after its AI pivot, the writing is on the wall. The company reported a net loss of $57.3 million in 2025 in an earnings report released on Thursday. In an official statement, the company glumly hinted at the possibility of going under sooner rather than later, writing that “there is substantial doubt about the Company’s ability to continue as a going concern.””

The content was underwhelming, and this shift coincided with BuzzFeed shutting down its award-winning news division.

There’s a lot AI can do, but it can’t replace the judgment and taste of human beings. It can’t be a great writer or a great creative. It can take on drudge-work and be a good copilot, like a grammar checker or other supportive tools can be good copilots, but it’s not a replacement for a skilled workforce. (I would also argue that there’s no such thing as unskilled labor: almost every job you can think of benefits from human nuance and judgment.)

But there are a lot of people who see dollar signs and hope that replacing people with predictive engines will help them scale and increase their margins. All that means is that there will be a lot more BuzzFeeds.

[Link]

Notable links: March 13, 2026
https://werd.io/notable-links-march-13-2026/
Fri, 13 Mar 2026 13:30:23 GMT

Most Fridays, I share a handful of pieces that caught my eye at the intersection of technology, media, and society.

Did I miss something important? Send me an email to let me know.


Coming Off the Bench for Bluesky

I’m mostly pretty excited about Bluesky’s CEO change. Toni Schneider was the CEO of Automattic for a very long time, and was arguably the grownup in the room. I’ve never met him, but he seems to understand open source and the principles that Bluesky is trying to uphold.

Jay Graber, of course, did an amazing thing. She first wrangled the community that was established to figure out what Bluesky even was, then was the keeper of the argument that it should be an independent entity rather than part of Twitter, and finally marshaled it into a real startup that raised millions of dollars to bring the platform to life. When Jack Dorsey became upset that Bluesky was embracing community safety over laissez-faire decentralization, she weathered that too, and he left the board. These things are hard. I’m glad she’s sticking around as Chief Innovation Officer; my sense is that she’s going to kick ass in tech for a long time.

Toni explains the miracle here:

“I’ll be honest: I was skeptical about decentralized social. The vision was always compelling. A social web that no single company controls, where users own their identity and their relationships, where anyone can build on top of the protocol. But I’d seen enough promising decentralized projects fade or fragment that I had stopped expecting one to get to scale.

Bluesky changed that. Hearing their vision and, more importantly, learning about the architecture they’d built (the AT Protocol) I became a believer. This was a real, scalable foundation for a different kind of internet.”

Over 40 million people use Bluesky. Toni’s job is to add a zero, or find the right person who will — and wrestle with all of the organizational, financial, engineering, and product decisions that lead to that growth. The result will be a significant decentralized platform in social media, a realm where the underlying power dynamics of centralization have led to thrown elections, genocides, wars, and a global rise of fascism. So no pressure! The world needs a change, and I want Bluesky to succeed.


Proton Mail Helped FBI Unmask Anonymous ‘Stop Cop City’ Protester

Worth knowing if you think of Proton Mail as being a blanket security solution: in this case it was compelled to provide payment information for an account to the Swiss authorities, who then, via a Mutual Legal Assistance Treaty, handed it over to the FBI. As a result, the FBI were able to determine the identity of the account owner, an activist who does not appear to have been charged with a crime.

This is also kind of a weasely statement:

“Edward Shone, head of communications for Proton AG, the company behind Proton Mail, told 404 Media in an email: “We want to first clarify that Proton did not provide any information to the FBI, the information was obtained from the Swiss justice department via MLAT. Proton only provides the limited information that we have when issued with a legally binding order from Swiss authorities, which can only happen after all Swiss legal checks are passed. This is an important distinction because Proton operates exclusively under Swiss law.” Functionally, though, the material was provided to the FBI.”

Not every Proton Mail account is paid. But adding payment information can effectively de-anonymize a user. Proton does allow cash payments, which are effectively anonymous; this is in line with tools like the Mullvad VPN, which also allows payments to be made fully anonymously.


BBC says ‘irreversible’ trends mean it will not survive without major overhaul

I hold three potentially-conflicting opinions about the BBC at once:

  • The license fee is a regressive tax that is punitive for lower-income people and needs to be overhauled
  • While it’s supposed to be independent and representative, its news coverage has sometimes fallen short of this standard
  • It is a treasure and must be protected at all costs

Every British household that watches live content is supposed to pay £169.50 (around $225) a year. That’s more than many streaming services — although you arguably get a lot more for your money, considering the plethora of local coverage, stations, and other programs that the BBC supports. The license fee doesn’t represent all of the BBC’s income, but it accounts for most of it.

“In its opening response to government talks over its future, the corporation said 94% of people in the UK continued to use the BBC each month, but fewer than 80% of households contributed to the license fee.”

Because more households are moving to on-demand instead of live — except, perhaps, for sports and some rare but high-profile events — license fee revenue has fallen. It’s interesting to think about what it would take to reform this funding structure to preserve public service broadcasting in the UK.

There’s also an elephant in the room, which is the intentional gutting of public service broadcasting here in the US. How could the British ecosystem be inoculated — or at least strengthened — against that kind of threat from a future government?

I’m not sure that turning it into a “Netflix for British TV” is the right answer. What might it look like to take a more open approach and turn the BBC into something that doesn’t copy any private company’s business model but is something truly new that meets public service media needs in the 21st century? Could it be more of an operating system that supports new experimentation and different kinds of media? How might it be more radically collaborative and representative in ways that private broadcasters aren’t able to achieve? There’s a lot to talk about.


The Safety Levers

Another really good framework from Corey. Leading with vulnerability gives the people on your team permission to be vulnerable too.

“When leaders frame work as execution, they imply the answer is already known. When they frame it as learning, they acknowledge uncertainty is part of the work.

[…] When leaders project certainty, dissent feels risky. When leaders acknowledge fallibility, speaking up becomes contribution, not challenge.”

Modeling uncertainty, learning, and humility allows everyone to be in growth mode vs approaching their work with a fixed mindset. But it has to be done with intention: uncertainty that doesn’t also come with norms around experimentation, feedback, and accountability just feels like instability.

I’m still growing here myself: in my world, everything is a prototype that can be challenged, experimented with, and iterated on. But providing the clear, structured lanes for people to experiment is crucial — and that intentional structure can be one of the first things to go when things get busy or fraught. Structures and norms only matter if they guide us through every situation and if they’re for everyone.


Workers who love ‘synergizing paradigms’ might be bad at their jobs

The results of this study into corporate BS aren’t going to surprise anyone who’s spent much time in an office. The researchers generated meaningless corporate gobbledegook and tested how workers rated its business-savviness.

“Workers who were more susceptible to corporate BS rated their supervisors as more charismatic and “visionary,” but also displayed lower scores on a portion of the study that tested analytic thinking, cognitive reflection and fluid intelligence. Those more receptive to corporate BS also scored significantly worse on a test of effective workplace decision-making.

[…] Essentially, the employees most excited and inspired by “visionary” corporate jargon may be the least equipped to make effective, practical business decisions for their companies.”

The Cornell report labels this as a paradox, I guess because these people disproportionately liked their supervisors but were also bad at their jobs. I don’t see that as a paradox at all: my bias is that people who think for themselves and are more distrustful of hierarchy are, to be honest, smarter.

I love this sentence:

“Researching BS also points out the importance of critical thinking for everyone, inside the workplace and out.”

Well, yes.


Your Browser Becomes Your WordPress

This is absolutely bonkers. If you’re on a desktop browser, it’s worth trying now.

“With my.WordPress.net, WordPress runs entirely and persistently in your browser. There’s no sign-up, no hosting plan, and no domain decision standing between you and getting started. Built on WordPress Playground, my.WordPress.net takes the same technology that powers instant WordPress demos and turns it into something permanent and personal. This isn’t a temporary environment meant to be discarded. It’s a WordPress that stays with you.”

Using WASM and local storage, an entire WordPress setup is installed in your browser, private to you. I’m curious about how nicely this plays with browser syncing — I’m a Zen user and use Firefox accounts to sync between devices, but haven’t kicked the tires yet. Because I flip between a few devices every day, that would be meaningful to me.

But still: running a web application like WordPress in a browser is a meaningful innovation. Launching it as a product instead of some kind of labs experiment tucked away somewhere also indicates that they’re confident in it. It’s interesting to think about what that might mean for other self-hosted personal applications. To-do lists? CRMs? Source management? Lots of scope for private apps that are entirely based on the web platform. What a neat thing.

Coming Off the Bench for Bluesky
https://werd.io/coming-off-the-bench-for-bluesky/
Fri, 13 Mar 2026 13:15:40 GMT
[Toni Schneider]

I’m mostly pretty excited about Bluesky’s CEO change. Toni Schneider was the CEO of Automattic for a very long time, and was arguably the grownup in the room. I’ve never met him, but he seems to understand open source and the principles that Bluesky is trying to uphold.

Jay Graber, of course, did an amazing thing. She first wrangled the community that was established to figure out what Bluesky even was, then was the keeper of the argument that it should be an independent entity rather than part of Twitter, and finally marshaled it into a real startup that raised millions of dollars to bring the platform to life. When Jack Dorsey became upset that Bluesky was embracing community safety over laissez-faire decentralization, she weathered that too, and he left the board. These things are hard. I’m glad she’s sticking around as Chief Innovation Officer; my sense is that she’s going to kick ass in tech for a long time.

Toni explains the miracle here:

“I’ll be honest: I was skeptical about decentralized social. The vision was always compelling. A social web that no single company controls, where users own their identity and their relationships, where anyone can build on top of the protocol. But I’d seen enough promising decentralized projects fade or fragment that I had stopped expecting one to get to scale.

Bluesky changed that. Hearing their vision and, more importantly, learning about the architecture they’d built (the AT Protocol) I became a believer. This was a real, scalable foundation for a different kind of internet.”

Over 40 million people use Bluesky. Toni’s job is to add a zero, or find the right person who will — and wrestle with all of the organizational, financial, engineering, and product decisions that lead to that growth. The result will be a significant decentralized platform in social media, a realm where the underlying power dynamics of centralization have led to thrown elections, genocides, wars, and a global rise of fascism. So no pressure! The world needs a change, and I want Bluesky to succeed.

[Link]

Your Browser Becomes Your WordPress
https://werd.io/your-browser-becomes-your-wordpress/
Thu, 12 Mar 2026 13:41:01 GMT
[Brandon Payton at WordPress]

This is absolutely bonkers. If you’re on a desktop browser, it’s worth trying now.

“With my.WordPress.net, WordPress runs entirely and persistently in your browser. There’s no sign-up, no hosting plan, and no domain decision standing between you and getting started. Built on WordPress Playground, my.WordPress.net takes the same technology that powers instant WordPress demos and turns it into something permanent and personal. This isn’t a temporary environment meant to be discarded. It’s a WordPress that stays with you.”

Using WASM and local storage, an entire WordPress setup is installed in your browser, private to you. I’m curious about how nicely this plays with browser syncing — I’m a Zen user and use Firefox accounts to sync between devices, but haven’t kicked the tires yet. Because I flip between a few devices every day, that would be meaningful to me.

But still: running a web application like WordPress in a browser is a meaningful innovation. Launching it as a product instead of some kind of labs experiment tucked away somewhere also indicates that they’re confident in it. It’s interesting to think about what that might mean for other self-hosted personal applications. To-do lists? CRMs? Source management? Lots of scope for private apps that are entirely based on the web platform. What a neat thing.
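
For a sense of what this looks like under the hood, here’s a minimal sketch of embedding WordPress Playground (the library my.WordPress.net is built on) in a page. The import URL, options, and blueprint step reflect my reading of the Playground client documentation and should be checked against it; the iframe id and login credentials are placeholder assumptions.

```typescript
// A rough sketch of embedding WordPress Playground, written as an ES module.
// Option names and the blueprint step are my best understanding of the
// documented Playground client API; verify against the official docs.
// my.WordPress.net layers persistent browser storage on top of this.
import { startPlaygroundWeb } from 'https://playground.wordpress.net/client/index.js';

// A hypothetical iframe on the host page where WordPress will render.
const iframe = document.getElementById('wp-playground') as HTMLIFrameElement;

const client = await startPlaygroundWeb({
  iframe,
  remoteUrl: 'https://playground.wordpress.net/remote.html',
  // Blueprints declaratively describe the initial state of the site.
  blueprint: {
    steps: [{ step: 'login', username: 'admin', password: 'password' }],
  },
});

await client.isReady();
// From here, the client runs PHP and serves wp-admin entirely in the
// browser via WASM: no server round-trips involved.
```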

[Link]

Workers who love ‘synergizing paradigms’ might be bad at their jobs
https://werd.io/workers-who-love-synergizing-paradigms-might-be-bad-at-their-jobs/
Fri, 06 Mar 2026 14:24:45 GMT
[Kate Blackwood in Cornell Chronicle]

The results of this study into corporate BS aren’t going to surprise anyone who’s spent much time in an office. The researchers generated meaningless corporate gobbledegook and tested how workers rated its business-savviness.

“Workers who were more susceptible to corporate BS rated their supervisors as more charismatic and “visionary,” but also displayed lower scores on a portion of the study that tested analytic thinking, cognitive reflection and fluid intelligence. Those more receptive to corporate BS also scored significantly worse on a test of effective workplace decision-making.

[…] Essentially, the employees most excited and inspired by “visionary” corporate jargon may be the least equipped to make effective, practical business decisions for their companies.”

The Cornell report labels this as a paradox, I guess because these people disproportionately liked their supervisors but were also bad at their jobs. I don’t see that as a paradox at all: my bias is that people who think for themselves and are more distrustful of hierarchy are, to be honest, smarter.

I love this sentence:

“Researching BS also points out the importance of critical thinking for everyone, inside the workplace and out.”

Well, yes.

[Link]

Proton Mail Helped FBI Unmask Anonymous ‘Stop Cop City’ Protester
https://werd.io/proton-mail-helped-fbi-unmask-anonymous-stop-cop-city-protester/
Thu, 05 Mar 2026 22:44:14 GMT
[Joseph Cox at 404 Media]

Worth knowing if you think of Proton Mail as being a blanket security solution: in this case it was compelled to provide payment information for an account to the Swiss authorities, who then, via a Mutual Legal Assistance Treaty, handed it over to the FBI. As a result, the FBI were able to determine the identity of the account owner, an activist who does not appear to have been charged with a crime.

This is also kind of a weasely statement:

“Edward Shone, head of communications for Proton AG, the company behind Proton Mail, told 404 Media in an email: “We want to first clarify that Proton did not provide any information to the FBI, the information was obtained from the Swiss justice department via MLAT. Proton only provides the limited information that we have when issued with a legally binding order from Swiss authorities, which can only happen after all Swiss legal checks are passed. This is an important distinction because Proton operates exclusively under Swiss law.” Functionally, though, the material was provided to the FBI.”

Not every Proton Mail account is paid. But adding payment information can effectively deanonymize a user. Compare and contrast to, say, Mullvad, which allows payments to be made fully anonymously. (Edited to add: Proton does support cash payments, which are a better route for anonymous users.)

[Link]

BBC says ‘irreversible’ trends mean it will not survive without major overhaul
https://werd.io/bbc-says-irreversible-trends-mean-it-will-not-survive-without-major-overhaul/
Thu, 05 Mar 2026 14:23:17 GMT
[Michael Savage in the Guardian]

I hold three potentially-conflicting opinions about the BBC at once:

  • The license fee is a regressive tax that is punitive for lower-income people and needs to be overhauled
  • While it’s supposed to be independent and representative, its news coverage has sometimes fallen short of this standard
  • It is a treasure and must be protected at all costs

Every British household that watches live content is supposed to pay £169.50 (around $225) a year. That’s more than many streaming services — although you arguably get a lot more for your money, considering the plethora of local coverage, stations, and other programs that the BBC supports. The license fee doesn’t represent all of the BBC’s income, but it accounts for most of it.

“In its opening response to government talks over its future, the corporation said 94% of people in the UK continued to use the BBC each month, but fewer than 80% of households contributed to the license fee.”

Because more households are moving to on-demand instead of live — except, perhaps, for sports and some rare but high-profile events — license fee revenue has fallen. It’s interesting to think about what it would take to reform this funding structure to preserve public service broadcasting in the UK.

There’s also an elephant in the room, which is the intentional gutting of public service broadcasting here in the US. How could the British ecosystem be inoculated — or at least strengthened — against that kind of threat from a future government?

I’m not sure that turning it into a “Netflix for British TV” is the right answer. What might it look like to take a more open approach and turn the BBC into something that doesn’t copy any private company’s business model but is something truly new that meets public service media needs in the 21st century? Could it be more of an operating system that supports new experimentation and different kinds of media? How might it be more radically collaborative and representative in ways that private broadcasters aren’t able to achieve? There’s a lot to talk about.

[Link]

The Safety Levers
https://werd.io/the-safety-levers/
Thu, 05 Mar 2026 14:01:16 GMT
[Corey Ford at Point C]

Another really good framework from Corey. Leading with vulnerability gives the people on your team permission to be vulnerable too.

“When leaders frame work as execution, they imply the answer is already known. When they frame it as learning, they acknowledge uncertainty is part of the work.

[…] When leaders project certainty, dissent feels risky. When leaders acknowledge fallibility, speaking up becomes contribution, not challenge.”

Modeling uncertainty, learning, and humility allows everyone to be in growth mode vs approaching their work with a fixed mindset. But it has to be done with intention: uncertainty that doesn’t also come with norms around experimentation, feedback, and accountability just feels like instability.

I’m still growing here myself: in my world, everything is a prototype that can be challenged, experimented with, and iterated on. But providing the clear, structured lanes for people to experiment is crucial — and that intentional structure can be one of the first things to go when things get busy or fraught. Structures and norms only matter if they guide us through every situation and if they’re for everyone.

[Link]

Can we build the dog?
https://werd.io/can-we-build-the-dog/
Tue, 03 Mar 2026 13:46:24 GMT

“Will the dog hunt?”

My sneakers squeaked on the concrete floor. Twenty entrepreneurs in black hoodies looked up at me, taking notes. This room had been a working garage once; now, we ceremonially opened the garage door to let in new cohorts of early-stage media startups with the potential to change media for good. Outside, the San Francisco traffic honked and screeched.

We were midway through the bootcamp: the week-long course at the beginning of the accelerator that aimed to teach startups the fundamentals of human-centered venture design. We’d taken them out of their comfort zone to help them use journalistic skills to understand who they were building for and why. We’d helped them think about how to tell the story of their business in a way that sharpened their underlying strategy.

And now I was trying to explain feasibility.

I echoed Corey Ford, the Managing Director, who had laid the groundwork in the days before. Repetition was our friend.

Desirability, I explained, is your user risk: are you building something that meets a real person’s needs? Will the dog eat the dog food?

Viability, in turn, is your business risk: if you are successful, can your venture succeed as a profitable, growing business? Will the dog hunt?

And now it was time to explore Feasibility: can you provide this service with the team, time, and resources reasonably at your disposal? I leaned in conspiratorially and vamped: can we build the dog?

It was the best job I ever had: using my experience as a founder, an engineer, and a storyteller to support teams that were genuinely trying to make a difference. People who went through Matter have gone on to help countless newsrooms succeed by being more empathetic and product-minded; some have left media and even gone on to build hospitals.

I went through Matter as the founder of Known in 2014 and came back to support other founders a few years later. Since then, how I think about feasibility has completely changed.

The build vs the long, wagging tail

The center of gravity for feasibility, at least in my mind, used to be the build stage. How do you build the initial version of a tool or a service that provides a minimum desirable experience to meet your user’s need?

Startups also need to consider scale: can you address a much larger number of potential users with the time, team, and resources at your disposal? If you can’t get there with the resources you have, could you get there with investment dollars you could realistically raise?

In a newsroom or other organization the formula is a little bit different. You’re probably not raising money for a specific tool or a service — although, sometimes, grant funding or funding from a corporate parent is available for certain things. But you’re most often asking whether you can provide the service or tool with the time, team, and resources currently at your disposal. Often, your time is limited, your team is small, and your resources are meagre. You have to make stark tradeoffs.

In both contexts, you don’t want to waste time spinning your wheels building solutions to problems other people have already solved. I’ve already written about building vs buying for newsrooms:

Newsroom tech teams are like startups in that they’re running with limited resources and constantly trying to assess how they can provide the most value. Back when I was Director of Investments at Matter Ventures, I advised them to spend their time building the things that made them special — their differentiating value — and using the most boring, accepted solution for everything else.

It's a rule of thumb that works universally: build what makes you special and buy the rest. But the critical difference is that, in newsrooms, what makes you special is the journalism that software enables, not the software itself.

The cost of building something new has fallen through the floor over the last decade. Developer tools have become more powerful, numerous, and freely available. Open source has exploded with libraries that can help you get to an initial version much faster.

Enter AI. Almost without warning, AI-enabled tools dramatically expanded what a resource-strapped team can create. It’s a genuine sea change. The more founders and senior engineers I speak to who are actively using these tools, the more stories I hear about accelerated development. People are building smaller tools that would have taken many sprints in less than a day; founders are building entire startups that might have taken six months in less than one.

But all code needs to be maintained. There are bugs, libraries need to be upgraded, underlying platform changes introduce security flaws and incompatibilities. Changes in business needs mean that tools and services need to be adjusted. All of those things add up to a maintenance overhead that comes with introducing any new tool or service. If we rapidly build more and more software, that maintenance overhead accumulates at speed. Even if we have the discipline to keep our technical footprint small, we’re not absolved from doing what has to be done to keep everything running.

When we consider feasibility, the center of gravity is no longer in building the thing. It’s supporting it.

A shared rubric reduces risk

The dynamics may have changed, but every team still needs to make a bet about whether it can build and support a project before taking it on. If something is obviously not feasible, the team shouldn’t do it. On the other hand, if a team doesn’t have a clear, shared understanding of how to assess feasibility, a feasibility objection can become an easy way for someone to subjectively shut down a project for arbitrary reasons. Without a clearly shared understanding of risk, the idea of risk can be poison.

So that’s what we need: a shared rubric for assessing the feasibility of a project. Our assessments won’t always be right, and we always learn new things about a project in the course of building or supporting it. But while complete certainty is hard to achieve, this will at least provide directional information about whether we can do it.

There are existing frameworks, but they’re mostly designed for large enterprise environments: instead of giving you a directional gut check, they produce documents used in commissioning vendors, justifying budgets to executives, and satisfying governance processes. TELOS — Technical, Economic, Legal, Operational, and Scheduling — feasibility tests are very broad and don’t consider technology alone. PIECES examines whether a proposed project will improve the status quo across Performance, Information, Economics, Control, Efficiency, and Services. Both are useful to understand, but also not quite what most time-strapped contexts demand. We need something scrappier.

A prototype rubric for feasibility

Here are some questions a team can ask themselves. Not only are they useful in themselves, but you can use them for alignment: if a product manager ranks a factor with a low score but a senior engineer on the team ranks it with a high one, you know there’s a problem that you need to dig into.

Each of these questions can lead to its own targeted discussion. The purpose of the rubric is not to be a thought-ending exercise: it’s to align a team around what’s actually important to consider, and open up conversations about any disagreements so that everyone can come to a consensus.

Each person on the team should run through the rubric — perhaps asynchronously — and then share their results with the group in a shared meeting.

These questions take our AI engineering context into account: questions about exploring new architectures are weighted lower than they would have been ten years ago.

1. The problem context (25 points)

How much of this project's scope is a black box? (10 points)
1 = We have built this exact thing before; 10 = This is completely new territory where we don’t even know what questions to ask yet.

How much friction will our existing tech stack, legacy systems, or organizational quirks add to the build? (10 points)
1 = Greenfield project using our preferred stack; 10 = Navigating a maze of legacy spaghetti code or systems.

How fundamentally difficult is the core problem we are trying to solve? (5 points)
1 = A standard CRUD app; 5 = Uncharted algorithmic research.

2. Execution (15 points)

Here, the “development period” means whatever planning unit your product roadmap uses. On some teams, it’s a quarter; for others, it’s half a year. These questions are not meant to be considered at the sprint level.

How much of the team’s total capacity will the initial build consume? (10 points)
1 = We can spin this up in an afternoon; 10 = Consumes 100% of the engineering team's capacity for the development period.

How well do the skills required match the people we currently have? (5 points)
1 = We have deep expertise here; 5 = Requires learning new frameworks from scratch.

3. The long, wagging tail (60 points)

While most of these questions consider up-front issues, this section describes the ongoing overhead for a team. This represents risk: time a team spends working on maintaining an existing tool is time it can’t spend building anything new or maintaining other tools. Over time, without careful lifecycle management and brutal decision-making, a team’s bandwidth can disappear into ongoing maintenance.

How long are we committing to keep this system alive? (20 points)
1 = A disposable prototype or short-term event tool (weeks or months); 20 = A permanent, foundational system we expect to rely on for years.

How much of our team’s capacity will keeping this system alive consume in a typical month? (30 points)
1 = Set it and forget it; 30 = Requires daily babysitting, constant bug fixes, and continuous adaptation to upstream changes.

How heavily does this project rely on teams, platforms, APIs, or vendors that we do not control? (10 points)
1 = Fully self-contained; 10 = Dependent on unstable APIs, beta AI models, or restrictive third-party vendors.

4. The blast radius (40 points)

Because modern tools let small teams build powerful things quickly, the risk of deploying something dangerous or irreversible is higher. This category is weighted heavily to catch those risks.

How sensitive is the information that this tool handles? (20 points)
1 = Public, anonymous data; 20 = Handling highly sensitive PII, whistleblower documents, or financial data.

How catastrophic is it if we ship an imperfect, glitchy version? (10 points)
1 = We can ship it broken and iterate safely; 10 = Mission-critical; if it’s not perfect on day one, we burn trust or face legal ruin.

If we realize this was a mistake halfway through, how hard is it to undo? (10 points)
1 = A two-way door; we can easily turn it off; 10 = A one-way door; involves irreversible data migrations or permanent structural changes.

Once everyone has tallied their numbers, compare your total scores out of 140. This is nothing more than a temperature check: again, it should be considered a conversation-starter, not the final word.

Roughly, here’s how the scores break down:

0–45: Green light. This project is highly feasible. The initial lift is manageable, the ongoing tax on your team is low, and the blast radius if things go wrong is minimal. Build the dog.

46–95: Yellow light. This is the danger zone of hidden costs. You can probably build this, but the lifespan, ongoing maintenance, vendor dependencies, or security requirements will create a permanent drag on your team’s velocity. Before proceeding, ask yourselves: what existing project are we willing to sunset to make room for this new maintenance burden? What’s the opportunity cost, and is the lift to the organization worth it? While this rubric only considers feasibility, this is a good time to go back to desirability and make sure the juice is worth the squeeze.

96–140: Red light. This project is fundamentally infeasible with your current resources. The complexity is too high, the blast radius is too dangerous, or the multi-year maintenance load will simply sink your engineering team. If this project is absolutely vital to the business, you cannot build it scrappily: you need to buy an enterprise solution, hire specialists, or radically reduce the scope by choosing a much smaller problem to solve first.

Once again, this isn’t the be-all and end-all. For one thing, the rubric is not a simple average: the pattern matters as much as the total. It’s worth checking for category dominance: if your scores are generally low but much higher for the long, wagging tail, that doesn’t mean you have an unambiguous green light. These categories may also surface other conversations that aren’t cleanly captured by the rubric. But the first step towards shared understanding is building a structure to organize that understanding around — and hopefully this rubric gets you some of the way there.
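
To make the temperature check concrete, here’s a minimal sketch of how a team might tally and compare scorecards. The category maxima and traffic-light thresholds come straight from the rubric above; the type and function names, the 75% dominance threshold, and the 30-point disagreement spread are illustrative assumptions of my own.

```typescript
// A minimal sketch of the feasibility temperature check described above.
// Category maxima and traffic-light bands come from the rubric; everything
// else (names, thresholds) is hypothetical and tunable.

type Category = "problemContext" | "execution" | "longWaggingTail" | "blastRadius";

const MAX_POINTS: Record<Category, number> = {
  problemContext: 25,  // black box + stack friction + core difficulty
  execution: 15,       // build capacity + skills match
  longWaggingTail: 60, // lifespan + maintenance load + external dependencies
  blastRadius: 40,     // data sensitivity + launch risk + reversibility
};

type Scorecard = Record<Category, number>;

function totalScore(card: Scorecard): number {
  return Object.values(card).reduce((sum, n) => sum + n, 0);
}

// 0–45 green, 46–95 yellow, 96–140 red, per the breakdown above.
function trafficLight(total: number): "green" | "yellow" | "red" {
  if (total <= 45) return "green";
  if (total <= 95) return "yellow";
  return "red";
}

// Category dominance: a low total can still hide one nearly maxed-out
// category (e.g. a cheap build with a brutal maintenance tail).
function dominantCategories(card: Scorecard, threshold = 0.75): Category[] {
  return (Object.keys(card) as Category[]).filter(
    (c) => card[c] / MAX_POINTS[c] >= threshold
  );
}

// Disagreement check: a big spread between two people's totals is a
// conversation to have before anyone writes a spec.
function needsDiscussion(cards: Scorecard[], maxSpread = 30): boolean {
  const totals = cards.map(totalScore);
  return Math.max(...totals) - Math.min(...totals) > maxSpread;
}

// Example: a product manager and a senior engineer score the same project.
const pm: Scorecard = { problemContext: 8, execution: 5, longWaggingTail: 18, blastRadius: 9 };
const eng: Scorecard = { problemContext: 18, execution: 10, longWaggingTail: 50, blastRadius: 27 };

console.log(totalScore(pm), trafficLight(totalScore(pm)));   // 40 "green"
console.log(totalScore(eng), trafficLight(totalScore(eng))); // 105 "red"
console.log(dominantCategories(eng));                        // ["longWaggingTail"]
console.log(needsDiscussion([pm, eng]));                     // true
```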

And please: talk to people. For more complicated projects, I always think it’s a good idea to speak to experts in order to validate your assumptions about feasibility. For any project, speaking to your users to make sure you’ve nailed desirability, and speaking to equivalent businesses to validate your viability assumptions, are crucial. The map is not the territory, and sometimes you need multiple maps.

Can we build the dog?

The beauty of a shared rubric isn’t that it automatically makes decisions for you. It’s that it forces a team to look at the exact same map. If a product manager scores the project a 40, but a senior engineer scores it a 105, you’ve found an area of disagreement that you need to explore. It’s far better to do that early, before you dive into complicated specification work or writing code.

In a world where AI and modern tooling make it dangerously easy to spin up new software, our ultimate constraint is no longer our ability to type code. It’s our capacity to care for the things we bring into the world. Saying "no" to a project with a massive, hidden maintenance burden isn't a failure of imagination; it is how you protect your team’s time so they can focus on the journalism, the community, or the core mission that actually makes your organization special.

Today, building the dog is the easy part. The real question is whether you have the time, energy, and resources to feed it, walk it, and take it to the vet for the next five years.

If you do, then by all means: let’s see if it hunts.

Good vibes, bad vendors
https://werd.io/good-vibes-bad-vendors/
Wed, 25 Feb 2026 10:00:03 GMT

When I was thirteen or fourteen I had a really comfortable sweatshirt that I wore to school all the time — but it did have a few inherent problems. For one thing, it had a great big target on it, and wearing a literal target to high school was just asking for it. For another, on top of that, in Looney Tunes writing, was the confident phrase: “It’s a good vibe!”

I was bullied as mercilessly as one might expect, but I honestly think it might have killed in the AI era. I’d like to think I was just ahead of my time.

Andrej Karpathy, an early OpenAI researcher who now works at his own startup, coined the phrase vibe coding last year. To vibe code is to use an LLM like Claude or ChatGPT to generate source code instead of writing it yourself. He meant it as a way to loosely prototype code or to make progress on a weekend project. LLMs, at least at the time, could not be fully trusted to write well-written, working code. It was an out-there idea.

What a difference a year makes. Today, it’s a mainstream conversation that is rapidly reshaping technology strategy — and informing layoffs across industries.

AI conversations are always fraught, for good reasons that include the underlying power dynamics and the bad behavior of most of the AI vendors. At the same time, the whole AI landscape is changing incredibly rapidly, and it’s become a cliché to point out that any discussion of what LLMs can and can’t do today will probably be invalid two or three months from now. And, of course, millions of words have been written about it at this point. But even despite all that, I still think it’s worth talking about.

If you’re running technology in a small, resource-constrained environment — like a newsroom or a non-profit — how should you think about AI-enhanced software engineering? Come to that, how should I?

Let’s talk about it.

First things first: does it work?

It didn’t, and then it did.

Six months ago, LLMs could generate a certain amount of code, but they would often make inefficient decisions or hallucinate libraries and API endpoints, and you’d need to babysit them a lot. Their use was mostly passive: they would generate code snippets based on immediate user prompts, and engineers would have to spend a bunch of time debugging the output. And security was the Wild West: there were essentially no guardrails. LLMs are famously stochastic (their output is sampled probabilistically rather than produced deterministically) and prone to hallucinations. The result was unreliable code.

A lot has changed since then. In particular, the models released in February 2026 are a sea change in reliability: given the right prompt, they often genuinely can write decent code in one shot. Tools like Claude Code can go off, spawn multiple agents, investigate a problem, build a reasonable plan, and then execute on it, all while working in a safely sandboxed environment.

It’s not just about improved models, although they obviously have a central part to play. An ecosystem is developing around doing AI-assisted software engineering well. Plugins like Jesse Vincent’s Superpowers encourage good decision-making based on principles of excellent software architecture design and product management. Structured frameworks like spec-driven development similarly help lead the agent to sensible outcomes; both are incorporated in all-in-one coding lifecycle toolkits like Metaswarm. These approaches preserve a rigorous process, and they add far more guardrails against security incidents (although the guardrails can be easy to overcome, and some aren’t on by default). Using AI to generate code is much safer than it was.

Claude Code absolutely can write the code, build a plan, and document its work. I have been an AI skeptic, but in my experiments I’ve found that it really can feel like magic. You can reasonably object to AI for any number of reasons, but this is no longer one. It works.

The thing to understand is that this is a tool for engineers — and senior engineers will get the best results. It takes real engineering skill to craft a prompt that will do the right thing and result in a strong architecture.

The process changes the center of gravity from writing source code in a programming language to crafting goals, understanding your user, being crystal clear about the experience and the value you want to convey, and thinking about architectural implications. That probably means talking to people, forming a hypothesis about what they need, testing it with them, and considering the ongoing technical implications of the work.

Those are things that senior engineers already spend much of their time doing — indeed, I’d argue that it’s what separates a great senior engineer from a mid-level one. The core question a senior engineer navigates well comes down to: lots of people can write code, but should they? Why, and for whom? Those questions only become more important in a world where AI is writing the source code. When implementation is faster, problem selection and scoping become the scarce skills.

Friction is training: we learn how to engineer software through our terrible experiences. When things go wrong, we learn. When we have to refactor, we learn. When we talk to our peers about our work, we learn. AI removes most of this friction and hides the complexity away from us: it obscures failure, compresses the process of debugging, and automates refactoring. When these hard-earned skills are the reason we can make good software engineering decisions with AI, but the AI doesn’t offer newcomers the ability to build those skills, who will train the AI once we are gone?

Opinions on that change in center of gravity will be intensely divided. I stand by this New Year’s Day thought about Claude Code:

It has the potential to transform all of tech. I also think we’re going to see a real split in the tech industry (and everywhere code is written) between people who are outcome-driven and are excited to get to the part where they can test their work with users faster, and people who are process-driven and get their meaning from the engineering itself and are upset about having that taken away.

I’m very much an outcome-driven developer, and to me it’s a giant relief. Not everyone will feel the same way.

Resource-constrained environments must be outcome-driven. They can’t spend their time on the process of software engineering; the best way for them to move forward is to start small, release a valuable core that solves a problem for some set of users as early as possible, and then continually iterate around it, using user feedback as a guide.

There’s no alternative to having empathetic, human-centered senior engineers on your team — with or without AI. But AI engineering tools may have an interesting side effect: I can see a world where pushing these product and spec questions to the forefront helps more engineers build those skills more quickly. The first step, after all, is understanding that those answers are needed to begin with.

It’s worth saying that there will be many managers who hope that tools like Claude Code will mean they can do away with engineers or dramatically cut their workforces. Of course there will. They may even see engineers as gatekeepers, and there may be resentment that they’re needed at all — and a hope that this work can be done directly by managers or other key employees. In a newsroom, for example, can’t the journalists produce tools now?

For non-engineers, these tools can be useful for prototyping: they may help a product manager assess a user interface or experiment with an idea, for example. But those prototypes are not enduring software; nor are they projects that can be “handed off” to engineers to support.

To properly architect a system, there’s a lot you need to consider. This includes performance, scalability, and the ongoing overhead of maintaining a project and keeping it safe: nobody wants to rely on software that proves to be slow, insecure, or impossible to update. You also need to assess the technical implications of a project: are there technical standards that the project should be adhering to, or battle-tested best practices that the design should take into consideration? For all these reasons, an engineer must be involved from the beginning.

These tools can’t replace technical staff, and they shouldn’t. Like I said, these tools are for engineers, not a replacement for them.

Okay, but what about those power dynamics?

Consider an individual, indie developer. Over the last few decades, they’ve become more and more empowered: developer tools have become cheaper and more of them are open source. Power and control have been devolved to the individual; you can run the tools you want on your own hardware, configure or recode them to your needs, use them for free, and share any of your changes. Engineering has become more and more of an open collective built on radical collaboration. That allows developers with fewer resources to build more easily, widening the pool of people who can build startups, create useful tools, and learn these skills to begin with.

AI-assisted engineering centralizes power back in the other direction. Claude Code, Codex, and so on are all centralized, proprietary tools that become harder to move away from the longer they’re relied upon. They’re also expensive: while open source tools are decentralized and free, it’s incredibly easy to spend large amounts on Claude. Based on my own experimentation and anecdotes from friends and peer companies, any engineer that relies on Claude Code as part of their daily work is likely to spend hundreds of dollars a week; these are new costs that didn’t previously exist.

Those extra costs could theoretically be offset by significant performance or efficiency gains. The thing is, those gains aren’t as strong as you might expect given the apparent magic of automatically generated code. A study recently published in Harvard Business Review indicated that adding AI actually intensified the workload, putting engineers at risk of burning out:

“The changes brought about by enthusiastic AI adoption can be unsustainable, causing problems down the line. Once the excitement of experimenting fades, workers can find that their workload has quietly grown and feel stretched from juggling everything that’s suddenly on their plate. That workload creep can in turn lead to cognitive fatigue, burnout, and weakened decision-making. The productivity surge enjoyed at the beginning can give way to lower quality work, turnover, and other problems.”

These can be mitigated by good work hygiene: enforcing breaks and sensible work hours. But the employers who are most enthusiastic about introducing AI may also be the ones that are least enthusiastic about benefits that center employee well-being over productivity.

I’ve already mentioned that some managers may hope that AI can reduce their investment in software engineers. One can easily imagine that the presence of AI — or, rather, the threat of being replaced by it — could be used as a cudgel to depress engineer salaries. It gives managers more leverage beyond money, too: those longer hours and more intense workloads that the HBR study found could burn engineers out might be more likely in a world where engineers fear for their jobs.

The long-term implications are even starker. Consider a world where the recentralization of power from individuals to large, centralized companies continues at the current pace. When AI writes most source code, fewer and fewer engineers will be capable of doing this work themselves, which will lead to even more dependence and lock-in.

It’s been noted in the past that while generative AI robs artists of the interesting work and leaves them with the mundane bits, for outcome-oriented engineers it robs them of the mundane bits and leaves them with the interesting parts. I’d argue that the real value is in the intersection between coding and the higher-level work; they’re inseparable. By improving the way we code, we improve the way we can solve problems for real people. (How can you solve a problem if you don’t really understand how the solution works?) By improving the way we think about solving problems, we improve the way we code. (How can you code something well if you don’t know who it’s for or why it needs to exist?) They aren’t two separate processes; they’re parts of the same thing. Removing one makes the other less effective.

Without concerted effort, an entire industry will be de-skilled and de-valued, its human expertise replaced with software that charges by the token.

So let’s put in the effort

AI isn’t going away, and AI-assisted software engineering is a permanent addition to the way we build software. But that’s not the same thing as saying that the way we use AI today won’t change.

Any policy for AI-assisted engineering has to take into account risks of various kinds. I’d loosely separate them into the following categories:

  • Employee risk: preventing burnout, staff turnover, and poor morale.
  • Security risk: preventing data leaks and security incidents that compromise customers, sources, employees, or other members of the community.
  • Quality risk: preventing low-quality code that impacts the efficiency, experience, or perceived quality of the organization’s work.
  • Supplier risk: reducing the impact of potentially harmful choices made by AI vendors.

While I’m not going to go into a full framework here — that’s part of what I do at my day job — let’s talk about how we might think about addressing them together.

Employee risk

In that Harvard Business Review piece about AI-driven burnout in engineers, the authors suggested some sensible mitigations: establishing, as team norms, structured time for quiet reflection on the project at hand and limits on interruptions; intentional processes for controlling the work that can move forward, so that engineers don’t take on (or get asked to take on) too many tasks just because they think they can; and creating more space for empathetic human connection as a team.

Those are all things that every team should do, whether or not they use AI! But they become even more important on an AI-accelerated team. If you don’t have norms that tightly control when work moves forward, for example, adding a tool that accelerates the work will simply increase the volume of work getting processed, without any strategic selection of the most important work to do.

Perhaps most importantly, engineers worry that managers who may not understand what they do will replace them. They need the emotional safety and security that comes from knowing they won’t, and leaders need to communicate clearly that the importance of their skills is understood. They are experts in their fields who have just gained another tool to help them; they are not interchangeable with the tool.

Security and quality risk

It turns out that you can go a long way towards addressing a lot of security, quality, and efficiency issues, as well as some of the morale issues that lead to employee risk, by placing engineers at the center of the process. Some AI processes talk about a “human in the loop”. That term was borrowed from more traditional machine learning; in the case of anything where AI takes an action in the world on behalf of a user, as in engineering, I’d prefer to reframe the AI as a tool that is always directly under human control.

In that light, all code must have a human owner who will take responsibility for it. It’s their code, just as if they’d written it in an integrated development environment; they just happened to use a different tool. If all generated code must ultimately be owned and reviewed by a human, that person is able to tune the results for safety, efficiency, and quality.

Most well-run engineering teams have a peer review process where code written by an engineer must be officially reviewed by a second engineer before it can be merged into the main codebase. If we assume that generated code is owned by Engineer A, that means there must be a human Engineer B to give it a second pair of eyes. They might also be using automated tools to help their review along, but they’re the ones who ultimately take responsibility for a review.

This isn’t enough. All projects need to have comprehensive automated testing: tests that must run on code that is about to be merged into the main codebase in order to make sure everything still functions. Tests for efficiency, adherence to style guidelines, and security issues can be run here too. What’s kind of fun is that when these are in place, tools like Claude Code will look at the test output, make corrections when something doesn’t pass, and try again — all automatically.
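To make that concrete, here’s a minimal sketch of the kind of pre-merge check I mean, written as a self-contained pytest file. The function under test and the rule it enforces are hypothetical stand-ins, but the shape is the point: when checks like this run automatically against every proposed merge, an agent can read the failure output and correct its own work before a human reviewer ever sees it.

# test_merge_gate.py
# A minimal, self-contained sketch of a pre-merge test, runnable with
# `pytest test_merge_gate.py`. The function under test is a hypothetical
# stand-in for real application code.

import pytest


def apply_discount(price: float, rate: float) -> float:
    """Apply a fractional discount to a price, rejecting nonsense rates."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return price * (1.0 - rate)


def test_discount_is_applied():
    # If this assertion fails, the merge is blocked; an agent can read
    # the pytest output and retry its change until the suite passes.
    assert apply_discount(100.0, 0.5) == 50.0


def test_out_of_range_rate_is_rejected():
    with pytest.raises(ValueError):
        apply_discount(100.0, 1.5)

The contents of the tests matter less than the contract they create: nothing merges unless they pass, whether the code was written by a person or generated by a tool.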

Supplier risk

The centralization that removes power and agency from engineers also introduces a serious business risk. If a core part of an organization’s value comes from software development, placing a centralized service in the middle of that process makes you heavily dependent on the vendor’s decision-making. They can raise their prices, make changes to their stack, or change the way they think about keeping your data and source code safe, and there’s very little you can do about it.

The good news is that, right now, no AI vendor can lock you into their services, because your source code itself and your infrastructure stack are independent of your AI tools. Your code is managed, stored, and hosted in different places, and you can think of source code itself as being a kind of open protocol: because it’s plain text, you can use virtually any tool with it. Source code still has the devolved, open, decentralized properties of the open source ecosystem that has put power in engineers’ hands for decades. That provides at least some protection against an AI vendor suddenly increasing their prices or changing their privacy stance: you can always vote with your feet.

If you’re uncomfortable using one of the major model providers, open source alternatives are available. Tools like Aider and Cline can provide agentic coding using any model, including local models that could theoretically be run on an organization’s own infrastructure. In practice, though, this requires more powerful hardware than most smaller organizations can afford; this may become less of an issue over time, as new hardware emerges, but it certainly is one now. Still, local models could help prevent lock-in — and may prevent some security issues, too.
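To make the portability point concrete, here’s a sketch in Python. Because several local model servers expose an OpenAI-compatible API, the same client code can talk to a hosted vendor or to a model running on your own hardware just by changing the endpoint. This assumes Ollama running locally with its OpenAI-compatible endpoint enabled and a model already pulled; the model name is illustrative.

# local_model_example.py
# A sketch of supplier portability. Assumes Ollama is serving its
# OpenAI-compatible endpoint at http://localhost:11434/v1 with a model
# such as "llama3" already pulled. Requires `pip install openai`.

from openai import OpenAI

# Point the standard client at local infrastructure instead of a hosted
# vendor. Ollama ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3",  # illustrative; substitute whatever model you run
    messages=[
        {
            "role": "user",
            "content": "Suggest a descriptive name for a function "
                       "that retries failed HTTP requests.",
        }
    ],
)

print(response.choices[0].message.content)

The specific tooling matters less than the principle: as long as the interface stays open and your source code stays plain text, switching suppliers is a configuration change rather than a rewrite.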

This inherent openness could change as AI vendors look for ways to increase their revenue and reduce churn. We may see AI-specific alternatives to git and GitHub; I can even imagine programming languages that are “optimized for AI” but that just happen to be proprietary and locked into a vendor. Every company that builds software should watch for these forms of lock-in and reject them.

We should also be wary of marketing that tells us to just let the AI write code autonomously. These are ideas that cement vendors as a full replacement for the software development process, moving a center of expertise that was previously owned by an organization into a centralized technology owned by someone else. It’s a trap: that world is one where the source code can’t be moved between agents and your products are fully locked into their services without a credible exit.

Do we want to invite these companies into our workplaces?

A ton has been written on the issues surrounding AI. Last summer, I wrote a broader guide to navigating AI that I think still holds up. In it, I noted:

“A lot of money has been spent to encourage businesses to adopt AI — which means deeply embed services provided by these vendors into their processes. The intention is to make their services integral as quickly as possible. That’s why there’s heavy sponsorship at conferences for various industries, programs to sponsor adoption, and so on. Managers and board members see all this messaging and start asking, “what are we doing with AI?” specifically because this FOMO message has reached them.”

My approach to evaluating AI remains through two main lenses: the technology itself and the vendors who make it, many of whom are intentionally furthering harms and participating in authoritarianism. The further reading section of that earlier piece is a good place to start.

The thing that I didn’t mention then, but is worth calling out now, is the sheer precarity of these vendors. AI vendors are offering their services for below cost and have struggled to articulate value in a way that could credibly lead to profitability. Apparently feeling this gap, OpenAI is experimenting with ads and porn, while finding itself under scrutiny for putting teen wellbeing at risk through choices it made to boost engagement. Anthropic was sued by Reddit last year for scraping Reddit data for training without authorization, and had to settle a high-profile lawsuit brought by book authors whose work it stole for training data. I’ve mentioned Claude Code a bunch in this piece, because it works really well, but it was trained using stolen work.

Meanwhile, from a technical standpoint, some research suggests that there are diminishing returns to new LLM development, and that we’re already past the peak.

There’s no guarantee these companies will make it. If an organization has invested in agentic coding processes that don’t substantially keep humans in the loop, and the vendors that power them disappear, it will be left in a bind: no in-house expertise, and company strategies that depend on AI. That makes it a dangerous gamble. We will have lost internal skill while increasing our dependence on very fragile external suppliers.

So how should you think about it?

AI coding works. It shifts the center of gravity from implementation to judgment, which increases the value of senior engineering skills. It also introduces significant power, labor, and supplier risks. That means that solid guardrails and cultural norms are non-optional.

Even if you haven’t rolled it out yet, your engineers are almost certainly using it. In conversations with my peers, I’ve heard countless stories of organizations that banned it, only to discover that their workforces had taken matters into their own hands. While there are many engineers who refuse to touch it, many more are eager to have it.

You could ban it, but that would likely be fruitless: engineers who use their own accounts will probably keep doing so. It’s better to have the tools in a place that’s under your control and observable than used in the shadows in ways that might put your data at risk. Given that, it’s better to roll them out than not. But do it with your eyes wide open and with a sense of intentionality. Be aware of the risks, and mitigate them in advance with common-sense cultural norms like the ones I discussed earlier: pay attention to your employee and supplier risks in particular. Don’t let AI push to production without oversight. And keep humans not just in the loop but fully in control.

I don’t think it’s productive to mandate the use of AI-assisted engineering, which runs the risk of alienating some engineers — the split between AI skeptics and those who are excited about the technology is real — and preventing nuanced discussions about how the technology can be used inside your workplace. What happens in practice when you just let it roll out to anyone who wants to try it is that people do try it; they find that it’s useful for some tasks, and then quickly find its limitations. That’s a healthy exploration.

How should I think about it? I’m still figuring it out. Jesse Vincent compares pushing through an initial hatred of “agentic” development, and discovering that code was never the most important part of building software, to becoming a manager and asking your team to build things instead of coding them yourself. I agree that these experiences rhyme. But when you lead a team, you’re investing in human beings, working alongside them, and helping them to grow in the process. That’s exponentially more rewarding than leading software agents built to provide value for a megacorporation.

But it doesn’t need to be that way. You can do both. If you treat the technology as a tool, albeit one that has been made by genuinely problematic companies, you can roll it out to a real, human team and continue to build things together. You can invest in and support them while you navigate new kinds of software problems; together, you can figure out how to shape the culture of an engineering team that is undergoing a paradigm shift. You can train the next generation of software engineers, both keeping the long history of software development in mind, and taking into account these new skills. And you can look for the next thing that properly devolves power down the stack to the individual, for the benefit of everyone.

Software development is still human. You can work together towards a shared mission, pick and choose the pieces of this new technology that make sense according to your strategy and values, and build community in the process.

That’s a good vibe.
