How to Add a Secure JavaScript Execution Tool to Microsoft Agent Framework

There is a recurring moment in agent design where a team realizes the model does not just need to reason. It needs to compute. It needs to transform JSON, run a formula, post-process extracted fields, normalize dates, build a dynamic object, or apply domain logic that is simply easier to express in JavaScript than in prompt text.

That is where most teams make a dangerous move. They reach for eval, the Function constructor, or Node’s vm module and tell themselves it is “sandboxed enough.”

It is not.

Node’s own documentation is explicit that node:vm is not a security mechanism and should not be used to run untrusted code. Worker threads are also not the right boundary for hostile code because they are designed for parallelism and can share memory. At the same time, Microsoft Agent Framework is built to let agents call external tools through function tools, so the clean pattern is not “run JavaScript inside the agent host.” The clean pattern is “make JavaScript execution a remote tool with a hardened execution boundary.” (Node.js)

That is the architecture this post covers:

  • Microsoft Agent Framework in .NET
  • A custom function tool exposed to the agent
  • A tRPC call from the tool to a separate Node.js execution service
  • Execution inside a locked-down isolate, not vm
  • Explicit whitelisting of namespaces and packages
  • Validation, time limits, memory limits, and auditable policy controls

The key design principle is simple: treat JavaScript execution as a privileged capability, not a convenience API.

The architecture

At a high level, the flow looks like this:

  1. The agent decides it needs computation.
  2. Microsoft Agent Framework calls a function tool.
  3. The function tool sends a request over HTTP to a tRPC endpoint in Node.js.
  4. The Node service validates the request with Zod.
  5. The Node service creates an isolated execution environment.
  6. Only approved globals and wrapped package facades are injected.
  7. The code runs with strict limits for time, memory, and output shape.
  8. The result is returned to the agent as tool output.

Microsoft Agent Framework supports function tools as first-class extensions to an agent, and tRPC gives you a type-safe RPC layer with input and output validation. That combination is ideal here because the .NET side stays thin and deterministic, while the execution policy lives in one place on the Node side. (Microsoft Learn)

First principle: “secure eval” is really “isolated execution”

It is important to be direct here. There is no magic secureEval() in Node.js. If you are executing model-authored or user-authored JavaScript, the safest practical pattern is:

  • out-of-process execution boundary
  • fresh isolate per run or per tenant pool
  • no ambient filesystem or network access
  • no raw require
  • whitelisted host-provided capabilities only
  • timeouts, memory ceilings, and payload size limits
  • container and OS-level restrictions around the service

Why not use node:vm? Because the Node docs explicitly say not to use it as a security boundary. Why not just use worker threads? Because workers are concurrency primitives, not isolation primitives. A better starting point for JavaScript isolation in Node is isolated-vm, which exposes V8 isolates and is designed for running code in fresh environments with no default Node runtime capabilities. Node’s permission model can also further restrict the Node process itself. (Node.js)
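Node’s permission model can serve as that outer layer around the execution worker itself. A launch sketch, with flag names as they appear in recent Node releases and an illustrative path for the service code:

```shell
# Start the execution service with Node's permission model enabled.
# With --permission, filesystem access, child processes, and worker
# threads are all denied unless explicitly granted.
node --permission \
  --allow-fs-read=/app/dist \
  /app/dist/server.js
```

Note that the process still needs read access to its own compiled code, which is why a narrow `--allow-fs-read` grant is included; everything else stays denied.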

The important nuance is this: even isolated-vm should be one layer, not the only layer. The strongest production posture is to run the execution service in its own locked-down container or workload boundary and assume defense in depth.

Tool contract design

Do not let the model send arbitrary source code and a free-form module list with no governance. Give it a constrained contract.

A good request shape looks like this:

import { z } from "zod";

export const ExecuteJsInput = z.object({
  code: z.string().max(10_000),
  input: z.unknown().optional(),
  allowedNamespaces: z.array(z.string()).default([]),
  allowedPackages: z.array(z.string()).default([]),
  expectedResultSchema: z
    .object({
      type: z.enum(["json", "string", "number", "boolean", "array", "object"]),
    })
    .optional(),
  timeoutMs: z.number().int().min(50).max(3000).default(1000),
});

This matters for two reasons.

First, tRPC is designed around typed procedures, and Zod-driven validation makes the boundary explicit. Second, you now have a place to enforce policy before any code gets near an isolate. (trpc.io)

The Microsoft Agent Framework side

On the .NET side, the tool should be boring. That is the goal.

Microsoft Agent Framework lets you expose custom logic through function tools, including by creating an AIFunction from a C# method. The agent does not need to know how tRPC works. It just needs a tool description that makes the capability understandable to the model. (Microsoft Learn)

A simplified example:

using System.ComponentModel;
using System.Net.Http.Json;

public class JavaScriptExecutionTool
{
    private readonly HttpClient _httpClient;

    public JavaScriptExecutionTool(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    [Description("Executes tightly sandboxed JavaScript for deterministic data transformation and calculation.")]
    public async Task<string> ExecuteSandboxedJavaScript(
        [Description("The JavaScript source to execute. Must return a serializable result.")] string code,
        [Description("Optional JSON input payload for the script.")] string? inputJson = null,
        [Description("Approved namespaces the script may access.")] string[]? allowedNamespaces = null,
        [Description("Approved package facades the script may access.")] string[]? allowedPackages = null)
    {
        var request = new
        {
            code,
            input = string.IsNullOrWhiteSpace(inputJson)
                ? null
                : System.Text.Json.JsonSerializer.Deserialize<object>(inputJson),
            allowedNamespaces = allowedNamespaces ?? Array.Empty<string>(),
            allowedPackages = allowedPackages ?? Array.Empty<string>(),
            timeoutMs = 1000
        };

        var response = await _httpClient.PostAsJsonAsync("/trpc/js.execute", request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}

Then you register it as a function tool with your agent. The architectural point is more important than the exact setup syntax: the agent host never evaluates code locally. It delegates execution to the hardened service. (Microsoft Learn)

The tRPC boundary

tRPC is a strong fit because it gives you typed procedures, validation, and a clean contract between the .NET caller and Node service. Even though .NET is not consuming generated TypeScript types directly, the Node service still benefits from strict schemas and a maintainable procedure surface. (trpc.io)

Example router:

import { initTRPC } from "@trpc/server";
import { ExecuteJsInput } from "./schemas";
import { runSandboxedScript } from "./sandbox";
import type { PolicyStore } from "./policy";

// Type the context so ctx.policyStore is known to every procedure.
const t = initTRPC.context<{ policyStore: PolicyStore }>().create();

export const appRouter = t.router({
  js: t.router({
    execute: t.procedure
      .input(ExecuteJsInput)
      .mutation(async ({ input, ctx }) => {
        return await runSandboxedScript(input, ctx.policyStore);
      }),
  }),
});

export type AppRouter = typeof appRouter;

This is where you can also add authentication, tenant context, rate limiting, audit metadata, and policy lookup.

The secure execution service

This is the heart of the design.

The mistake many teams make is trying to whitelist modules by exposing require. Do not do that. If you expose require, you are recreating Node inside the sandbox and dramatically expanding the attack surface.

Instead, preload and wrap approved capabilities in the host, then inject only those facades into the isolate.

That means your whitelist is not “the sandbox may import lodash.” It is “the sandbox may access a safe facade called packages.lodash that exposes only get, pick, and omit.”

That is a much better boundary.

Example policy registry

type NamespaceFactory = () => Record<string, unknown>;
type PackageFactory = () => Record<string, unknown>;

const namespaceRegistry: Record<string, NamespaceFactory> = {
  math: () => ({
    round: Math.round,
    floor: Math.floor,
    ceil: Math.ceil,
    max: Math.max,
    min: Math.min,
  }),
  dates: () => ({
    nowIso: () => new Date().toISOString(),
  }),
};

const packageRegistry: Record<string, PackageFactory> = {
  lodash: () => {
    const { get, pick, omit } = require("lodash");
    return { get, pick, omit };
  },
  decimal: () => {
    const Decimal = require("decimal.js");
    return { Decimal };
  },
};

Notice what is missing: no arbitrary imports, no filesystem, no fetch, no process access, no environment access.

Example isolate runner

import ivm from "isolated-vm";
import { z } from "zod";
import { ExecuteJsInput } from "./schemas";
import { namespaceRegistry, packageRegistry } from "./registries";
import type { PolicyStore } from "./policy";

export async function runSandboxedScript(
  request: z.infer<typeof ExecuteJsInput>,
  policyStore: PolicyStore
) {
  const policy = await policyStore.resolve({
    namespaces: request.allowedNamespaces,
    packages: request.allowedPackages,
  });

  const isolate = new ivm.Isolate({ memoryLimit: 64 });
  const context = await isolate.createContext();
  const jail = context.global;
  await jail.set("global", jail.derefInto());

  const safeNamespaces = Object.fromEntries(
    policy.namespaces.map((name) => [name, namespaceRegistry[name]!()])
  );
  const safePackages = Object.fromEntries(
    policy.packages.map((name) => [name, packageRegistry[name]!()])
  );

  // ExternalCopy uses structured cloning, so plain data transfers cleanly.
  // Function-valued capabilities cannot be copied this way; expose those
  // with ivm.Callback instead, so they execute in the host, not the isolate.
  await jail.set("input", new ivm.ExternalCopy(request.input ?? null).copyInto());
  await jail.set("namespaces", new ivm.ExternalCopy(safeNamespaces).copyInto());
  await jail.set("packages", new ivm.ExternalCopy(safePackages).copyInto());

  const wrapped = `
    "use strict";
    (async function () {
      const console = undefined;
      const process = undefined;
      const require = undefined;
      const module = undefined;
      const exports = undefined;
      const Buffer = undefined;
      const setTimeout = undefined;
      const setInterval = undefined;
      const userFn = async ({ input, namespaces, packages }) => {
        ${request.code}
      };
      return await userFn({ input, namespaces, packages });
    })()
  `;

  const script = await isolate.compileScript(wrapped);

  try {
    // promise: true awaits the async IIFE; copy: true returns a plain copy
    // of the result instead of a live reference into the isolate.
    const result = await script.run(context, {
      timeout: request.timeoutMs,
      promise: true,
      copy: true,
    });
    return { ok: true, result };
  } catch (error) {
    return { ok: false, error: sanitizeError(error) };
  } finally {
    isolate.dispose();
  }
}

This is intentionally opinionated.

  • The sandbox gets input
  • The sandbox gets namespaces
  • The sandbox gets packages
  • The sandbox does not get Node
  • The sandbox does not get require
  • The sandbox does not get the environment

That is the right posture.

The isolated-vm project describes these isolates as separate JavaScript environments free of the extra capabilities that Node normally exposes. That is why it is a better primitive here than vm. (GitHub)
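The runner above also calls a sanitizeError helper that the snippet leaves undefined. A minimal sketch, assuming you want nothing host-specific to leak back to the model (the truncation limit is an arbitrary choice):

```typescript
// Hypothetical helper: return a short, stack-free message so isolate
// errors cannot leak host paths or internals back to the caller.
const MAX_ERROR_LENGTH = 200;

function sanitizeError(error: unknown): string {
  const message = error instanceof Error ? error.message : String(error);
  // Keep only the first line (no stack trace) and cap its length.
  return message.split("\n")[0].slice(0, MAX_ERROR_LENGTH);
}
```

Returning a string rather than the raw Error also keeps the tRPC response shape serializable and predictable for the .NET caller.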

How whitelisting should really work

A lot of teams hear “whitelist packages” and think they should allow date-fns or lodash directly. That is still too coarse.

You want three policy levels.

1. Namespace whitelist

These are internal capability groups you define, such as:

  • math
  • dates
  • currency
  • tax
  • normalizers

These are ideal for domain logic because they let you present stable semantic surfaces to the model.

2. Package facade whitelist

This is not raw NPM package access. It is a curated wrapper over a package.

Example:

const packageRegistry = {
  dateFns: () => {
    const { addDays, formatISO, parseISO } = require("date-fns");
    return { addDays, formatISO, parseISO };
  },
};

3. Tenant or tool policy whitelist

Even if a package exists in the registry, a given agent or tenant may not be allowed to use it.

That means final access should be the intersection of:

  • globally supported capabilities
  • tenant policy
  • current agent policy
  • current tool invocation request

That keeps the model from escalating its own power simply by naming more packages.
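That intersection is easy to get wrong if any layer silently defaults to “allow all.” A minimal sketch of the resolution step, with illustrative parameter names that are not part of any framework:

```typescript
// Resolve the effective whitelist as the intersection of every policy
// layer. A capability runs only if every layer explicitly allows it.
function resolveAllowed(
  globallySupported: string[],
  tenantPolicy: string[],
  agentPolicy: string[],
  requested: string[]
): string[] {
  const layers = [globallySupported, tenantPolicy, agentPolicy].map(
    (layer) => new Set(layer)
  );
  return requested.filter((name) => layers.every((layer) => layer.has(name)));
}
```

Because the request itself is one of the inputs, the model can only narrow its access by asking, never widen it.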

What “most secure method” means in practice

Here is the honest version.

If the code is untrusted, the strongest production pattern is not “just use a safer JavaScript library.” The strongest pattern is:

  • dedicated Node execution service
  • running in a separate process or container from the agent host
  • Node permission model enabled where possible
  • no filesystem permission unless explicitly required
  • no network permission unless explicitly required
  • no child process permission
  • no raw module loading
  • isolate-based execution inside the service
  • per-request timeout
  • per-request memory cap
  • rate limiting and audit logging
  • kill-and-recycle strategy for suspicious runs

Node’s permission model is now stable and is specifically intended to restrict access to resources during execution. That makes it a useful outer control around the execution worker process. (Node.js)

So the recommendation is:

Do not run JavaScript evaluation in the Microsoft Agent Framework process. Run it in a separate hardened execution service, and inside that service use an isolate with only host-injected safe facades.

Prompting the agent correctly

One subtle mistake is giving the model too much freedom in how it uses the tool. Your tool description should bias toward deterministic use cases.

Good use cases:

  • schema normalization
  • mathematical calculations
  • JSON reshaping
  • derived field generation
  • deterministic validation helpers
  • short business-rule transforms

Bad use cases:

  • arbitrary web requests
  • importing unknown libraries
  • long-running workflows
  • anything requiring secret access
  • anything that should really be a reviewed backend feature

You want the tool to feel more like “dynamic formula execution” than “tiny remote code runner.”

Observability and governance

Once you add this capability, you need a paper trail.

Log:

  • agent name
  • conversation or run id
  • caller identity
  • code hash
  • requested namespaces
  • requested packages
  • approved namespaces
  • approved packages
  • execution duration
  • memory tier
  • success or failure
  • sanitized error output

Do not log secrets in payloads. Do log enough to reconstruct who ran what and under which policy.
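For the code hash specifically, hashing the exact source before execution gives you a stable identifier for deduplication and forensics without retaining every payload verbatim. A sketch using Node's built-in crypto module:

```typescript
import { createHash } from "node:crypto";

// Hash the exact source string so the audit log can identify repeated
// or known-bad scripts without storing the full payload.
function codeHash(code: string): string {
  return createHash("sha256").update(code, "utf8").digest("hex");
}
```

Store the hex digest alongside the policy fields above, and you can answer “has this exact script run before, and under which policy?” without a payload archive.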

This matters because the risk is no longer just technical. It is operational. A dynamic execution tool without auditability becomes impossible to govern at scale.

Where this pattern is worth it

This pattern is especially valuable when building agents that need deterministic computation without shipping a new backend endpoint for every micro-use-case.

Examples:

  • tax calculation helpers
  • document extraction post-processing
  • migration mapping rules
  • payroll normalization
  • dynamic scoring or threshold logic
  • transforming AI output into strict structured shapes

In all of those cases, JavaScript is the execution language, but policy is the product.

Final opinion

The wrong way to add JavaScript to an agent is to think of it as a convenience feature.

The right way is to think of it as a controlled runtime.

Microsoft Agent Framework gives you the right extension point through function tools. tRPC gives you a clean typed boundary. Node can host the execution service. But the part that separates a toy from a production design is this: never let the model execute inside your primary trust boundary, and never equate “sandboxed” with “safe” unless you can explain the exact layers doing the isolation. (Microsoft Learn)

That is the architecture I use.

The Steinberger Threshold

Most leaders are asking the wrong question about AI.

They ask whether their teams are using it. They ask which model to standardize on. They ask whether agents are ready for production. They ask how quickly they can drive adoption.

That is all downstream. The real question is simpler and far more revealing: who on your team can actually direct AI, and who is starting to be directed by it? That is the divide I keep seeing in product and engineering organizations.

Some people use AI to expand their judgment. Others use it to avoid judgment. Some get faster while staying in control. Others get busy, impressed, and strangely passive. On the surface, both groups can look productive. Both can generate output. Both can show progress.

Only one of them is actually becoming more valuable. That divide is what some are calling the Steinberger Threshold.

I am borrowing the phrase from recent discussion around Peter Steinberger, but what matters is not the label. What matters is the shift it names. Steinberger is worth paying attention to because he has lived through multiple eras of software building, from deep technical craftsmanship to AI-native execution. The lesson embedded in his public writing and interviews is clear: the advantage is no longer just in doing the work yourself. The advantage is in framing the work, shaping the environment, inspecting the result, and deciding what happens next.

That is not prompt engineering. That is modern judgment. And that is why this matters more than another generic debate about AI productivity.

We are moving into a world where the cost of execution is falling fast. Agents can increasingly read codebases, edit files, run tests, summarize options, and handle meaningful chunks of delivery work. As that happens, the bottleneck shifts.

When execution gets cheaper, judgment gets more expensive.

That changes who stands out. It changes who scales. It changes who should lead.

The people who thrive in this environment will not be the ones who simply know how to use AI. That bar is dropping quickly. The people who thrive will be the ones who can define intent clearly, give the agent enough structure to move fast, and still know when the machine is wrong, shallow, overconfident, or drifting off mission.

That is the threshold. Below it, people let the agent set the pace, shape the work, and quietly narrow their thinking. Above it, people use the agent as leverage while keeping hold of direction, standards, and accountability.

This is not a tooling issue. It is a leadership issue.

The biggest mistake I see companies making is assuming AI adoption and AI capability are the same thing. They are not. Giving people access to powerful models tells you almost nothing about whether they can use them well. In fact, broad access can hide the problem for a while. Everyone suddenly looks more productive. More documents appear. More prototypes show up. More code gets written. More tickets move.

But velocity is a bad metric when the system can generate convincing motion on demand.

That is where executives get trapped. They see acceleration and assume capability has risen with it. Sometimes it has. Sometimes they are just watching the organization become more dependent on machine output without improving its ability to set direction or judge quality.

That is the real risk.

The person below the Steinberger Threshold is not necessarily junior. They are not necessarily non-technical. They are simply no longer fully in command once AI enters the loop. They delegate too early. They trust polished output too quickly. They confuse completeness with correctness. They let the system define the path instead of using the system to execute against a path they have defined.

The person above the threshold behaves very differently. They treat the agent like fast, tireless, sometimes brilliant labor. They know what outcome they want. They know where ambiguity is useful and where it is dangerous. They know when to tighten the frame. They know what needs review and what can be safely skimmed. Most importantly, they stay accountable for the result.

That last point matters more than people admit.

The best agent operators are usually not the ones writing the fanciest prompts. They are the ones with the clearest standards. They know what good looks like. They can spot weak reasoning. They can tell when the agent is optimizing for fluency instead of truth, or speed instead of soundness. They do not need to inspect every line, but they know exactly which lines matter.

This is why I think the rise of agents will reshuffle status inside product and engineering teams more than most people expect.

Some managers will struggle because they were already operating through abstraction without enough contact with the actual work. AI will expose that quickly. If you cannot define success in a way that a machine can execute against and a human can validate, your authority gets thinner.

Some engineers will struggle too, especially those whose identity is tied too tightly to personal output. AI does not care about your attachment to hand-crafted implementation if someone else can steer the machine to a better result faster.

And some people in the middle of the organization will rise quickly. They may not have the biggest titles. But they have taste. They can decompose messy problems. They can write clear acceptance criteria. They can create structure where others create noise. They can tell the difference between a useful first pass and a dangerous hallucination. In an agentic world, those people become force multipliers.

You can already see the outlines of this shift in the market. Companies are starting to act as though part of every team’s job is now translating work into something machines can execute. Whether you look at AI-first operating models, agentic coding environments, or the emerging idea of software factories, the pattern is the same: the bottleneck is moving away from raw execution and toward the ability to define, direct, and verify execution.

That is the Steinberger Threshold in practice.

So how do you figure out who has crossed it? Not with training completion rates. Not with prompt libraries. Not with AI badges. You run a scout mission.

By that I mean a real piece of work with enough ambiguity that judgment matters, enough structure that success can be observed, and enough consequence that the quality of direction shows up clearly. It should be something an agent can materially accelerate, but not something so trivial that the agent can stumble into a passable answer without supervision.

A good scout mission is not theater. It is a bounded business problem that exposes how someone thinks in an agentic environment.

Give them a real bug with messy symptoms. Give them a workflow that needs redesign. Give them a thin internal tool to build. Give them a reporting process full of edge cases. Then watch what they do.

Do they sharpen the objective before they delegate? Do they define acceptance criteria? Do they improve the environment with better tests, clearer documentation, or stronger context? Do they review the critical path or only the polished summary? Do they notice drift? Do they challenge the output? Can they explain why the result should be trusted?

Most importantly, when the agent gets stronger, do they become more decisive or more passive? That is the question.

Because that is what separates someone who is using AI as leverage from someone who is slowly handing over their agency to it.

My view is simple. The companies that win with AI will not be the ones with the most licenses, the biggest model budget, or the loudest transformation language. They will be the ones that identify who can actually operate above the Steinberger Threshold, then redesign teams, workflows, and leadership expectations around those people.

Because once agents become part of the execution layer, judgment becomes the scarce asset.

And scarce assets end up running the system.

From Using AI to Running AI: The Next Skill Gap

The biggest mistake leaders are making right now is framing the next era as a contest between humans and AI.

That is not what is happening inside high-performing teams. The real separation is already showing up somewhere else: between people who use AI and people who orchestrate it.

AI users get output. AI orchestrators get outcomes.

AI users treat the model like a clever intern. They prompt, they paste, they polish. Their ceiling is the quality of a single interaction.

AI orchestrators design a system where multiple interactions, tools, guardrails, and humans combine into a reliable workflow. They turn “a helpful answer” into “a completed job.” They stop thinking in prompts and start thinking in production.

You can see the industry converging on this. Microsoft is explicitly pushing “multi-agent orchestration” in Copilot Studio, including patterns for handoffs, governance, and monitoring because real work is rarely single-step. (Microsoft) OpenAI’s own guidance leans into the same idea: routines, handoffs, and coordination as the core primitives for building systems you can control and test. (OpenAI Developers) Anthropic draws a clean distinction between workflows that are orchestrated through predefined paths and agents that dynamically use tools, then spends most of its energy on what makes those systems effective in practice. (Anthropic) LangGraph has effectively positioned itself as the “agent runtime” layer for state, control flow, and debugging, which is exactly what orchestration needs when you leave toy demos behind. (LangChain)

This is why “AI literacy” is quickly becoming table stakes and then getting commoditized. Everyone will learn to prompt. Everyone will learn to generate code, slides, summaries, and drafts. That advantage collapses fast.

Orchestration does not collapse fast because it is not a trick. It is an operating model.

What an AI orchestrator actually does

Orchestration is not “use more agents.” Orchestration is the discipline of turning messy work into a repeatable machine without pretending the work is clean.

An orchestrator:

  • Breaks work into steps that can be delegated and verified, not just executed.
  • Connects AI to the real world through tools, systems, and data.
  • Designs handoffs, failure modes, and escalation paths as first-class product features. (Microsoft Learn)
  • Builds observability so you can debug behavior, not just admire outcomes. (Microsoft Learn)
  • Treats evaluation as a release gate, not a vibe check. (Anthropic)

That is why orchestration is showing up everywhere as “multi-agent,” “tool use,” and “workflows vs agents.” It is the same idea wearing different vendor hoodies. (Anthropic)

The uncomfortable truth: orchestration is where leadership lives

If you are a CTO, CPO, or head of product engineering, here is the quiet part out loud: orchestration forces accountability.

Prompting lets teams hide behind cleverness. Orchestration exposes whether you actually understand how value is created in your business.

Because the minute you try to orchestrate, you run into the real constraints:

  • Your data is scattered, permissions are inconsistent, and definitions disagree.
  • Your process is tribal knowledge, not a system.
  • Your edge cases are the product.
  • Your compliance needs are not optional, and your audit trail is not “we asked the model nicely.” (Microsoft Learn)

That is also why orchestration is a strategic advantage. It is hard precisely because it sits at the intersection of product, engineering, operations, security, and change management.

Why “AI users” will hit a wall

AI users become faster individuals. That is useful, but it is not compounding.

They save time on tasks that were never the bottleneck. They produce more artifacts, not more outcomes. They accelerate local productivity while the organization still moves at the speed of coordination.

Orchestration compounds because it scales across people. It turns expertise into a reusable workflow. It captures institutional knowledge in a living system, not in the heads of your best operators.

If you want a practical mental model, stop asking: “How do we get everyone to use AI?”

Start asking: “Which workflows, if orchestrated, would change our unit economics?”

A real-world smell test for orchestration readiness

If any of these sound familiar, you do not have an AI problem. You have an orchestration problem.

  • “We have great pilots, but nothing sticks.”
  • “We got a productivity bump, but delivery still feels chaotic.”
  • “We cannot trust outputs enough to automate anything material.”
  • “We are worried about security and compliance, so we are stuck in chat mode.”
  • “Everyone uses different prompts and gets different answers.”

Those are not model problems. Those are design problems.

The playbook: how teams move from AI use to AI orchestration

You do not need a moonshot. You need a workflow that matters, a thin orchestration layer, and ruthless clarity about quality.

  1. Pick one workflow with real stakes. Something with a clear definition of done. Not “research,” not “brainstorming.” Pick a job like triaging incidents, drafting customer responses with policy constraints, or converting messy inputs into structured records.
  2. Separate roles. Planning, execution, validation, and reporting should not be the same agent or the same step. That separation is the difference between a demo and a system. (OpenAI Developers)
  3. Build handoffs and guardrails, not a super-agent. Multi-agent orchestration exists because specialization plus controlled delegation is easier to debug and govern. (Microsoft)
  4. Make observability mandatory. Logging, tracing, and transcripts are not enterprise overhead. They are how you make AI behavior operational. (Microsoft Learn)
  5. Treat evaluation like CI. Define tests for correctness, policy compliance, and failure modes. If you cannot measure quality, you cannot scale automation. (Anthropic)

The new career moat

In the next two years, “good at prompting” will be like “good at Google.”

Nice. Expected. Not differentiating.

The career moat, and the organizational moat, belongs to the people who can do all of this at once:

  • translate business intent into workflows
  • connect tools and data safely
  • design guardrails and evaluation
  • ship systems that survive contact with reality

That is the orchestrator.

So yes, the gap will widen. But it will not be AI vs humans.

It will be AI users who generate more content versus AI orchestrators who design machines that reliably produce outcomes.

The Iron Triangle Is Back. AI Just Made It Sharper.

Every decade, the tech industry rediscovers a timeless truth and tries to dress it up as something new. Today’s version comes wrapped in synthetic intelligence and VC-grade optimism. But let’s be honest: AI did not kill the Iron Triangle. It fortified it.

For years we have preached that product decisions always balance quality, speed, and cost. You can choose two. The third becomes the sacrifice. AI arrives and many leaders immediately fantasize that this constraint has dissolved. It has not. It has only changed the failure modes.

AI accelerates coding. AI accelerates design. AI accelerates analysis. But the triangle still stands. What changes is which side collapses first and how painfully.

AI Makes “Fast” Frictionless and That Is the Problem

Teams adopt AI believing speed is now the default output. And in a sense it is. Prompt, generate, review, refine, and in minutes you have something that would have taken hours.

But the moment speed becomes effortless, the other two sides of the triangle take the hit.

Where things break:

  • Quality erodes quietly. Models hallucinate domain logic that engineers fail to notice. It compiles, it runs, and it is dangerously wrong.
  • Architectural discipline collapses. AI can ship features faster than teams can design scalable foundations. The result is a time bomb with fancy UX.
  • Costs compound through rework. The speed you gained upfront becomes technical debt someone must pay later, usually at triple the price.

AI made it easy to go fast. It did not make it safe.

AI Can Make Things “Cheap” but Often Only on Paper

Executives love AI because it hints at lower staffing costs, faster cycles, and higher margins. They imagine a world where a handful of developers and designers can do the work of an entire department.

But here is the uncomfortable truth:

AI reduces the cost of creation, not the cost of correction.

The cheapest phase of a project is the moment you generate something. The most expensive phase is everything that comes after:

  • validating
  • integrating
  • securing
  • governing
  • maintaining
  • debugging
  • explaining to auditors why your model embedded training data into a client deliverable

AI does not make product development cheap. It simply delays the bill.

AI Promises “Quality” but Delivers Illusions of It

Platforms brag about AI-enhanced quality: fewer bugs, cleaner architecture, automated testing, smarter design. In reality, quality becomes performance theater unless teams evolve how they think, work, and review.

Common pitfalls:

  • AI code looks clean, reads well, and still violates half your constraints.
  • AI documentation is confident and completely fabricated.
  • AI test cases are shallow unless you explicitly direct them otherwise.

AI produces confidence without correctness. And too many leaders mistake the former for the latter. If you optimize for quality using AI, you must slow down and invest in human review, architecture, governance, and domain expertise. Which means speed suffers. Or costs rise.

The triangle always demands a price.

The Harsh Truth: AI Did Not Break the Triangle. It Exposed How Many Teams Were Already Cheating.

Before AI, many organizations pretended they could have all three. They could not, but the inefficiencies were human and therefore marginally manageable.

AI amplifies your ambition and your dysfunction.

  • Fast teams become reckless.
  • Cheap teams become brittle.
  • Quality-obsessed teams become paralyzed.

AI accelerates whatever you already are. If your product culture is weak, AI makes it weaker. If your engineering fundamentals are fragile, AI shatters them.

So What Do Great Teams Do? They Choose Deliberately.

The best product and engineering organizations do not pretend the triangle is gone. They respect it more than ever.

They make explicit choices:

  • If speed is the mandate, they pair AI with strict guardrails, strong observability, and pre-defined rollback paths.
  • If cost is the mandate, they track total lifecycle cost, not just dev hours.
  • If quality is the mandate, they slow down, invest in architecture, require human-in-the-loop validation, and accept that throughput will dip.

Great teams do not chase all three. They optimize two and design compensations for the third.

The Takeaway: AI Is Not a Shortcut. It Is a Magnifier.

AI does not free you from the Iron Triangle. It traps you more tightly inside it unless you understand where the real constraints have shifted.

The leaders who win in this era are the ones who stop treating AI as magic and start treating it as acceleration:

  • Acceleration of value
  • Acceleration of risk
  • Acceleration of consequences

AI is a force multiplier. If you are disciplined, it makes you unstoppable. If you are sloppy, it exposes you instantly.

AI did not remove the tradeoffs.
It made them impossible to ignore.

Idea to Demo: The Modern Operating Model for Product Teams

Most product failures do not start with bad intent. They start with a very normal leadership sentence: “We have an idea.”

Then the machine kicks in. Product writes a doc. Engineering estimates it. Design creates a few screens. Everyone nods in a meeting. Everyone leaves with a different movie playing in their head. Two months later, we discover we built the wrong thing with impressive efficiency.

If you want a practical, repeatable way to break that pattern, stop treating “demo” as something you earn at the end. Make it the thing you produce at the beginning.

Idea to demo is not a design preference. It is an operating model. It pulls product management and product engineering into the same room, at the same time, with the same object in front of them. It forces tradeoffs to show up early. It replaces vague alignment with shared context, shared ownership, and shared responsibility.

And in 2026, with AI prototyping and vibecoding, there is simply no excuse for big initiatives or even medium-sized features to stay abstract for weeks.

“A demo” is not a UI. It is a decision.

A demo is a working slice of reality. It can be ugly. It can be mocked. It can be held together with duct tape. But it must be interactive enough that someone can react to it like a user, not like a reviewer of a document.

That difference changes everything:

  • Product stops hiding behind language like “we will validate later.”
  • Engineering stops hiding behind language like “we cannot estimate without requirements.”
  • Design stops being forced into pixel-perfect output before the shape of the problem is stable.

A demo becomes the shared artifact that makes disagreement productive. It is much easier to resolve “Should this step be optional?” when you can click the step. It is much harder to resolve in a doc full of “should” statements.

This is why “working backwards” cultures tend to outperform “hand-off” cultures. Amazon’s PR/FAQ approach exists to force clarity early, written from the customer’s point of view, so teams converge on what they are building before scaling effort. (Amazon News) A strong demo does the same thing, but with interaction instead of prose.

AI changed the economics of prototypes, which changes the politics of buy-in

Historically, prototypes were “expensive enough” that they were treated as a luxury. A design sprint felt like a special event. Now it can be a Tuesday.

Andrej Karpathy popularized the phrase “vibe coding,” describing a shift toward instructing AI systems in natural language and iterating quickly. (X (formerly Twitter)) Whether you love that phrase or hate it, the underlying point is real: the cost of turning intent into something runnable has collapsed.

Look at the current tool landscape:

  • Figma is explicitly pushing “prompt to prototype” workflows through its AI capabilities. (Figma)
  • Vercel’s v0 is built around generating working UI from a description, then iterating. (Vercel)
  • Replit positions its agent experience as “prompt to app,” with deployment built into the loop. (replit)

When the cheapest artifact in the room is now a runnable demo, the old sequencing of product work becomes irrational. Writing a 12-page PRD before you have a clickable or runnable experience is like arguing about a house from a spreadsheet of lumber instead of walking through a frame.

This is not just about speed. It is about commitment.

A written document is easy to agree with and easy to abandon. A demo creates ownership because everyone sees the same thing, and everyone’s fingerprints show up in it.

Demos create joint context, and joint context creates joint accountability

Most orgs talk about “empowered teams” while running a workflow that disempowers everyone:

  • Product “owns” the what, so engineering is brought in late to “size it.”
  • Engineering “owns” the how, so product is kept out of architectural decisions until they become irreversible.
  • Design “owns” the UI, so they are judged on output rather than outcomes.

Idea to demo rewires that dynamic. It creates a new contract: we do not leave discovery with only words.

In practice, this changes the first week of an initiative. Instead of debating requirements, the team debates behavior:

  • What is the minimum successful flow?
  • What is the one thing a user must be able to do in the first demo?
  • What must be true technically for this to ever scale?

That third question is where product engineering finally becomes a co-author instead of an order-taker.

When engineering participates at the start, you get better product decisions. Not because engineers are “more rational,” but because they live in constraints. Constraints are not blockers. Constraints are design material.

The demo becomes the meeting point of product intent and technical reality.

The hidden superpower: demos reduce status games

Long initiatives often become status games because there is nothing concrete to anchor the conversation. People fight with slide decks. They fight with vocabulary. They fight with frameworks. Everyone can sound right.

A demo punishes theater.

If the experience is confusing, it does not matter how good the strategy slide is. If the workflow is elegant, it does not matter who had the “best” phrasing in the PRD.

This is one reason Design Sprint-style approaches remain effective: they compress debate into making and testing. GV’s sprint model is built around prototyping and testing in days, not months. (GV) Even if you never run a formal sprint, the principle holds: prototypes short-circuit politics.

“Velocity” is the wrong headline. Trust is the payoff.

Yes, idea to demo increases velocity. But velocity is not why it matters most.

It matters because it builds trust across product and engineering. Trust is what lets teams move fast without breaking each other.

When teams demo early and often:

  • Product learns that engineering is not “blocking,” they are protecting future optionality.
  • Engineering learns that product is not “changing their mind,” they are reacting to reality.
  • Design learns that iteration is not rework, it is the process.

This is how you get a team that feels like one unit, not three functions negotiating a contract.

What “Idea to Demo” looks like as an operating cadence

You can adopt this without renaming your org or buying a new tool. You need a cadence and a definition of done for early-stage work.

Here is a practical model that scales from big bets to small features:

  1. Start every initiative with a demo target. Not a scope target. A demo target. “In 5 days, a user can complete the core flow with stubbed data.”
  2. Use AI to collapse the blank-page problem. Generate UI, generate scaffolding, generate test data, generate service stubs. Then have humans make it coherent.
  3. Treat the demo as a forcing function for tradeoffs. The demo is where you decide what you will not do, and why.
  4. Ship demo increments internally weekly. Not as a status update. As a product. Show working software, even if it is behind flags.
  5. Turn demo learnings into engineering reality. After the demo proves value, rewrite it into production architecture deliberately, instead of accidentally shipping the prototype.

That last step matters. AI makes it easy to create something that works. It does not make it easy to create something that is secure, maintainable, and operable.

The risks are real. Handle them with explicit guardrails.

Idea to demo fails when leaders mistake prototypes for production, or when teams treat AI output as “good enough” without craftsmanship.

A few risks worth calling out:

  • Prototype debt becomes production debt. If you do not plan the transition, you will ship the prototype and pay forever.
  • Teams confuse “looks real” with “is real.” A smooth UI can hide missing edge cases, performance constraints, privacy issues, and data quality problems.
  • Overreliance on AI can reduce human attention. There is growing debate that vibe-coding style workflows can shift attention away from deeper understanding and community feedback loops, particularly in open source ecosystems. (PC Gamer)

Guardrails solve this. The answer is not to avoid demos. The answer is to define what a demo is allowed to be.

As supporting material, here is a simple checklist I have seen work:

  • Label prototypes honestly: “demo-grade” vs “ship-grade,” and enforce the difference.
  • Require a productionization plan: one page that states what must change before shipping.
  • Add lightweight engineering quality gates early: basic security scanning, dependency hygiene, and minimal test coverage, even for prototypes.
  • Keep demos customer-centered: if you cannot articulate the user value, the demo is theater.
  • Make demos cross-functional: product and engineering present together, because they own it together.

The leadership move: fund learning, not just delivery

If you want teams to adopt idea to demo, you have to stop rewarding only “on-time delivery” and start rewarding validated learning. That is the executive shift.

A demo is the fastest way to learn whether an initiative is worth the next dollar. It is also the fastest way to create a team that acts like owners.

In a world where AI can turn intent into interfaces in minutes, your competitive advantage is no longer writing code quickly. It is forming conviction quickly, together, on the right thing, for the right reasons, and then applying real engineering discipline to ship it.

The companies that win will not be the ones with the best roadmaps. They will be the ones that can take an idea, turn it into a demo, and use that demo to align humans before they scale effort.

That is how you increase velocity. More importantly, that is how you build teams that are invested from day one.

Why First Principles Thinking Matters More Than Ever in the Age of AI

It sounds a bit dramatic to argue that how you think about building products will determine whether you succeed or fail in an AI-infused world. But that is exactly the argument: in the age of AI, a first principles approach is not just a mental model; it is essential to cut through hype, complexity, and noise to deliver real, defensible value.

As AI systems become commoditized, and as frameworks, APIs, and pretrained models become widely accessible, the margin of differentiation will not come from simply adding AI or copying what others have done. What matters is how you define the core problem, what you choose to build or not build, and how you design systems to leverage AI without being controlled by it. Doing that well requires going back to basics through first principles.

What Do We Mean by “First Principles” in Product Development?

The notion of first principles thinking goes back to Aristotle. A “first principle” is a foundational assumption or truth that cannot be deduced from anything more basic. Over time, modern thinkers have used this as a tool: instead of reasoning by analogy (“this is like X”), they break down a problem into its core elements, discard inherited assumptions, and reason upward from those fundamentals. (fs.blog) (jamesclear.com)

In product development, that means:

  • Identifying the core problem rather than symptoms or surface constraints
  • Questioning assumptions and conventions such as legacy technology, market norms, or cost structures
  • Rebuilding upward to design architecture, flows, or experiences based on what truly matters

Instead of asking “What is the standard architecture?” or “What are competitors doing?”, a first principles mindset asks, “What is the minimal behavior that must exist for this product to deliver value?” Once that is clear, everything else can be layered on top.

This approach differs from incremental or analogy-driven innovation, which often traps teams within industry norms. In product terms, first principles thinking helps teams:

  • Scope MVPs more tightly by distinguishing essentials from optional features
  • Choose architectures that can evolve over time
  • Design experiments to test core hypotheses
  • Avoid being locked into suboptimal assumptions

As one product management blog puts it: “First principles thinking is about breaking down problems or systems into smaller pieces. Instead of following what others are doing, you create your own hypothesis-based path to innovation.” (productled.com)

How to Define Your First Principles

Before applying first principles thinking, a team must first define what their first principles are. These are the non-negotiable truths, constraints, and goals that form the foundation for every design, architectural, and product decision. Defining them clearly gives teams a common compass and prevents decision-making drift as AI complexity increases.

Here is a practical process for identifying your first principles:

  1. Start from the user, not the system.
    Ask: What does the user absolutely need to achieve their goal? Strip away “nice-to-haves” or inherited design conventions. For example, users may not need “a chatbot”; they need fast, reliable answers.
  2. List all assumptions and challenge each one.
    Gather your team and write down every assumption about your product, market, and technical approach. For each, ask:
    • What evidence supports this?
    • What if the opposite were true?
    • Would this still hold if AI or automation disappeared tomorrow?
  3. Distinguish facts from beliefs.
    Separate proven facts (user data, compliance requirements, physical limits) from opinions or “tribal knowledge.” Facts form your foundation; beliefs are candidates for testing.
  4. Identify invariants.
    Invariants are truths that must always hold. Examples might include:
    • The product must maintain data privacy and accuracy.
    • The user must understand why an AI-generated output was made.
    • Performance must stay within a given latency threshold.
      These invariants become your design guardrails.
  5. Test by reasoning upward.
    Once you have defined your base principles, rebuild your solution from them. Each feature, model, or interface choice should trace back to a first principle. If it cannot, it likely does not belong.
  6. Revisit regularly.
    First principles are not static. AI tools, user expectations, and regulations evolve. Reassess periodically to ensure your foundations still hold true.

A helpful litmus test: if someone new joined your product team, could they understand your product’s first principles in one page? If not, they are not yet clear enough.
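To make step 4 concrete, invariants can even be written down as executable guardrails rather than slideware. Here is a minimal sketch in Python; the invariant names, thresholds, and result fields are invented for illustration, not a prescription for any particular product:

```python
# Hypothetical sketch: first-principle invariants encoded as runnable checks.
# Each invariant is a named predicate over an AI feature's result.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Invariant:
    name: str
    check: Callable[[dict], bool]  # returns True when the invariant holds

# Illustrative guardrails mirroring the examples above: latency budget,
# explainability, and data privacy. Thresholds are made up.
INVARIANTS = [
    Invariant("latency_under_budget", lambda r: r["latency_ms"] <= 500),
    Invariant("explanation_present", lambda r: bool(r.get("rationale"))),
    Invariant("no_pii_leak", lambda r: not r.get("contains_pii", False)),
]

def violated(result: dict) -> list[str]:
    """Return the names of invariants this result breaks."""
    return [inv.name for inv in INVARIANTS if not inv.check(result)]

result = {"latency_ms": 320, "rationale": "top-3 sentences", "contains_pii": False}
print(violated(result))  # an empty list means every guardrail holds
```

The point is not the specific checks. It is that once invariants are explicit, they can gate releases automatically instead of living in someone's head.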

Why First Principles Thinking Is More Critical in the AI Era

You might ask: “Is this just philosophy? Why now?” The answer lies in how AI changes the product landscape.

1. AI is a powerful tool, but not a substitute for clarity

Just because we can embed AI into many systems does not mean we should. AI has costs such as latency, interpretability, data needs, and hallucinations. If you do not understand what the product must fundamentally do, you risk misusing AI or overcomplicating the design. First principles thinking helps determine where AI truly adds leverage instead of risk.

2. The barrier to entry is collapsing, and differentiation is harder

Capabilities that once took years to build are now available through APIs and pretrained models. As more teams embed AI, competition grows. Differentiation will come from how AI is integrated: the system design, feedback loops, and human-AI boundaries. Teams that reason from first principles will design cleaner, safer, and more effective products.

3. Complexity and coupling risks are magnified

AI systems are inherently interconnected. Data pipelines, embeddings, and model interfaces all affect each other. If your architecture relies on unexamined assumptions, it becomes brittle. First principles thinking uncovers hidden dependencies and clarifies boundaries so teams can reason about failures before they occur.

AI also introduces probabilistic behavior and non-determinism. To guard against drift or hallucinations, teams must rely on fundamentals, not assumptions.

In short, AI expands what is possible but also multiplies risk. The only stable foundation is clear, grounded reasoning.

Examples of First Principles in Action

SpaceX and Elon Musk

Elon Musk has often said that he rejects “reasoning by analogy” and instead breaks down systems to their physical and cost components. (jamesclear.com) Rather than asking “How do other aerospace companies make rockets cheaply?”, he asked, “What are rockets made of, and what are the true material costs?” That approach led to rethinking supply chains, reuse, and design.

While this is not an AI product, it illustrates the method of reimagining from fundamentals.

SaaS and Product Teams

  • ProductLed demonstrates how first principles thinking leads to hypothesis-driven innovation. (productled.com)
  • UX Collective emphasizes designing from core user truths such as minimizing friction, rather than copying design conventions. (uxdesign.cc)
  • Starnavi discusses how questioning inherited constraints improves scope and architecture. (starnavi.io)

AI Product Teams

  • AI chat and agent teams that focus only on the essential set of user skills and resist the urge to “make the model do everything” tend to build more reliable systems.
  • Some companies over-embed AI without understanding boundaries, leading to hallucinations, high maintenance costs, and user distrust. Later teams often rebuild from clearer principles.
  • A study on responsible AI found that product teams lacking foundational constraints struggle to define what “responsible use” means. (arxiv.org)

How to Apply First Principles Thinking in AI-Driven Products

  1. Start with “Why.” Define the true user job to be done and the metrics that represent success.
  2. Strip the problem to its essentials. Identify what must exist for the product to function correctly. Use tools like Socratic questioning or “Five Whys.”
  3. Define invariants and constraints. Specify what must always hold true, such as reliability, interpretability, or latency limits.
  4. Design from the bottom up. Compose modules with clear interfaces and minimal coupling, using AI only where it adds value.
  5. Experiment and instrument. Create tests for your hypotheses and monitor drift or failure behavior.
  6. Challenge assumptions regularly. Avoid copying competitors or defaulting to convention.
  7. Layer sophistication gradually. Build the minimal viable product first and only then add features that enhance user value.

A Thought Experiment: An AI Summarization Tool

Imagine building an AI summarization tool. Many teams start by choosing a large language model, then add features like rewrite or highlight. That is analogy-driven thinking.

A first principles approach would look like this:

  • Mission: Help users extract key highlights from a document quickly and accurately.
  • Minimal behavior: Always produce a summary that covers the main points and references the source without hallucinations.
  • Constraints: The summary must not invent information. If confidence is low, flag the uncertainty.
  • Architecture: Build a pipeline that extracts and re-ranks sentences instead of relying entirely on the model.
  • Testing: A/B test summaries for accuracy and reliability.
  • Scope: Add advanced features only after the core summary works consistently.

This disciplined process prevents the tool from drifting away from its purpose or producing unreliable results.
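The “extract and re-rank” architecture from the thought experiment can be sketched in a few lines. This is a toy word-frequency scorer, not a production pipeline (a real system would use embeddings and a learned re-ranker), but it shows the shape of the idea:

```python
# Toy extractive summarizer: score sentences by word frequency,
# keep the top-k, then re-rank them back into original order.
import re
from collections import Counter

def summarize(text: str, k: int = 2) -> str:
    # Split into sentences on terminal punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    # Score a sentence by average word frequency so length does not dominate.
    def score(s: str) -> float:
        tokens = re.findall(r"[a-z']+", s.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:k]
    # Re-rank: emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

Notice how the “must not invent information” constraint holds by construction: every output sentence is extracted verbatim from the source, so the pipeline cannot hallucinate. That is a first-principles architecture choice, not a prompt instruction.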

Addressing Common Objections

“This takes too long.”
Going one or two layers deeper into your reasoning is usually enough to uncover blind spots. You can still move fast while staying deliberate.

“Competitors are releasing features quickly.”
First principles help decide which features are critical versus distractions. It keeps you focused on sustainable differentiation.

“What if our assumptions are wrong?”
First principles are not fixed truths but starting hypotheses. They evolve as you learn.

“We lack enough data to know the fundamentals.”
Questioning assumptions early and structuring experiments around those questions accelerates learning even in uncertainty.

From Hype to Foundation

In an era where AI capabilities are widely available, the difference between good and exceptional products lies in clarity, reliability, and alignment with core user value.

A first principles mindset is no longer a philosophical exercise; it is the foundation of every sustainable product built in the age of AI. It forces teams to slow down just enough to think clearly, define what truly matters, and build systems that can evolve rather than erode.

The best AI products will not be the ones with the largest models or the most features. They will be the ones built from a deep understanding of what must be true for the product to deliver lasting value.

Before you think about model fine-tuning or feature lists, pause. Deconstruct your domain. Identify your invariants. Question every assumption. That disciplined thinking is how you build products that not only survive the AI era but define it.

Drucker and the AI Disruption: Why Landmarks of Tomorrow Still Predicts Today

When Peter Drucker published The Landmarks of Tomorrow in 1959, he was writing about the future, but not this future. He saw the rise of knowledge work, the end of mechanical thinking, and the dawn of a new age organized around patterns, processes, and purpose. What he didn’t foresee was artificial intelligence, a force capable of accelerating his “landmarks” faster than even he could have imagined.

Today, AI isn’t simply automating tasks or assisting humans. It is disrupting the foundations of how enterprises are built, governed, and led. Drucker’s framework, written for the post-industrial age, has suddenly become the survival manual for the AI-powered one.

From Mechanistic Control to Pattern Intelligence

Drucker warned that the industrial worldview, which was linear, predictable, and mechanistic, was ending. In its place would rise a world defined by feedback loops, patterns, and living systems.

That is precisely the shift AI has unleashed.

Enterprise leaders still talk about “projects,” “pipelines,” and “processes,” but AI doesn’t play by those rules. It learns, adapts, and rewires itself continuously. The organizations that treat AI as a static tool will be replaced by those that treat it as an intelligent process, one that learns as it runs.

Companies used to manage through reporting lines. Now they must manage through data flows. AI has become the nervous system, the pattern recognizer, the process optimizer, and the hidden hand that connects the enterprise’s conscious mind (its strategy) with its reflexes (its operations).

If Drucker described management as the art of “doing things right,” AI has made that art probabilistic. The managers who ignore this are already obsolete.

The Knowledge Worker Meets the Algorithm

Drucker’s greatest prediction, the rise of the “knowledge worker,” is being rewritten in real time. For 70 years, the knowledge worker has been the enterprise’s most precious asset. But now, the knowledge itself has become the product, processed, synthesized, and recombined by large language models.

We are entering what might be called the algorithmic knowledge economy. AI doesn’t just help the lawyer draft faster or the developer code better. It competes with their very value proposition.

Yet, rather than eliminating knowledge work, AI is forcing it to evolve. Drucker said productivity in knowledge work was the greatest management challenge of the 21st century. He was right, but AI is solving that challenge by redefining the role itself.

The best knowledge workers of tomorrow will not just do the work. They will design, supervise, and refine the AI that does it. The new productivity frontier isn’t about faster execution. It is about orchestrating intelligence, both human and machine, into systems that learn faster than competitors can.

AI as a Management Disruptor

If Drucker saw management as a discipline of purpose, structure, and responsibility, AI is now testing every one of those principles.

  • Purpose: AI can optimize toward any goal, but which one? Efficiency, profitability, fairness, sustainability? The model will not decide that for you. Leadership will.
  • Structure: Hierarchies are collapsing under the speed of decision loops that AI can execute autonomously. The most adaptive enterprises are building networked systems that behave more like ecosystems than bureaucracies.
  • Responsibility: Drucker believed ethics and purpose were the essence of management. In AI, that moral compass can no longer be implied. It must be engineered into the system itself.

In other words, AI does not just change how we manage. It challenges what management even means.

From Centralized Control to Federated Intelligence

Drucker predicted that traditional bureaucracies would give way to decentralized, knowledge-based organizations. That is exactly what is happening, except now it is not just humans at the edge of the organization, but algorithms.

AI is enabling every business unit, every function, every product team to have its own localized intelligence. The new question isn’t “how do we scale AI?” It is “how do we coordinate dozens of semi-autonomous AI systems working in parallel?”

Enterprise leaders who cling to centralization will find themselves trapped in a paradox. They want control, but AI thrives on freedom. Drucker would call this the new frontier of management: creating governance that empowers autonomy without sacrificing accountability.

This is why the AI-first enterprise of the future will look less like a corporation and more like a distributed cognitive organism, one where humans and machines make up a shared nervous system of learning, adaptation, and decision-making.

Values as the Ultimate Competitive Edge

Drucker wrote that the “next society” would have to rediscover meaning, that economic progress without moral purpose would collapse under its own weight.

AI is testing that thesis daily.

Enterprises racing to deploy AI without a value compass are discovering that technological advantage is fleeting. The companies that will endure are those that turn ethics into an operating principle, not a compliance checklist.

Trust is now a competitive differentiator. The winners will not just have the best models. They will have the most trustworthy ones, and the culture to use them wisely.

AI does not absolve leaders of responsibility. It multiplies it.

AI Is Drucker’s “Next Society” Arriving Early

If Drucker were alive today, he would say the AI revolution is not a technological shift, but a civilizational one. His “Next Society” has arrived early, and it is powered by algorithms that behave more like collaborators than tools.

The irony is that Drucker’s warnings were not about machines. They were about people: how we adapt, organize, and lead when the rules change. AI is simply the latest, most unforgiving test of that adaptability.

The enterprises that survive will not be those with the most advanced AI infrastructure. They will be those that rethink their management philosophy, shifting from command and control to purpose and orchestration, from metrics to meaning.

Wrapping Up

AI is Drucker’s world accelerated, a management revolution disguised as a technology trend.
Those who still see AI as just another tool are missing the point.

AI is the most profound management disruptor of our generation, and Landmarks of Tomorrow remains the best playbook we never realized we already had.

The question isn’t whether AI will reshape the enterprise. It already has.
The real question is whether leaders will evolve fast enough to manage the world Drucker saw coming, and which AI has now made real.

The Future of AI UX: Why Chat Isn’t Enough

For the last two years, AI design has been dominated by chat. Chatbots, copilots, and assistants are all different names for the same experience. We type, it responds. It feels futuristic because it talks back.

But here’s the truth: chat is not the future of AI.

It’s the training wheels phase of intelligent interaction, a bridge from how we once used computers to how we soon will. The real future is intent-based AI, where systems understand what we need before we even ask. That’s the leap that will separate enterprises merely using AI from those transformed by it.

Chat-Based UX: The Beginning, Not the Destination

Chat has been a brilliant entry point. It’s intuitive, universal, and democratizing. Employees can simply ask questions in plain language:

“Summarize this week’s client updates.”
“Generate a response to this RFP.”
“Explain this error in our data pipeline.”

And the AI responds. It’s accessible. It’s flexible. It’s even fun.

But it’s also inherently reactive. The user still carries the cognitive load. You have to know what to ask. You have to remember context. You have to steer the conversation toward the output you want. That works for casual exploration, but in enterprise environments, it’s a tax on productivity.

The irony is that while chat interfaces promise simplicity, they actually add a new layer of friction. They make you the project manager of your own AI interactions.

In short, chat is useful for discovery, but it’s inefficient for doing.

The Rise of Intent-Based AI

Intent-based UX flips the equation. Instead of waiting for a prompt, the system understands context, interprets intent, and takes initiative.

It doesn’t ask, “What do you want to do today?”
It knows, “You’re preparing for a client meeting, here’s what you’ll need.”

This shift moves AI from a tool you operate to an environment you inhabit.

Example: The Executive Assistant Reimagined

An executive with a chat assistant types:

“Create a summary of all open client escalations for tomorrow’s board meeting.”

An executive with an intent-based assistant never types anything. The AI:

  • Detects the upcoming board meeting from the calendar.
  • Gathers all open client escalations.
  • Drafts a slide deck and an email summary before the meeting.

The intent, “prepare for the meeting,” was never stated. It was inferred.

That’s the difference between a helpful assistant and an indispensable one.
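The flow above can be sketched in a few lines. This is a minimal, hypothetical illustration only: the event and escalation shapes are invented, and a simple keyword rule stands in for whatever embedding or classification step a real system would use to infer intent.

```typescript
// Hypothetical shapes for an intent-based assistant -- not a real calendar or CRM API.
interface CalendarEvent {
  title: string;
  start: Date;
  attendees: string[];
}

interface Escalation {
  client: string;
  summary: string;
  open: boolean;
}

// Infer a preparation intent from an upcoming event. A keyword rule stands in
// here for a real classifier or embedding-similarity step.
function inferIntent(event: CalendarEvent): string | null {
  return /board meeting/i.test(event.title) ? "prepare-board-briefing" : null;
}

// Act on the inferred intent: gather open escalations and draft a briefing,
// without the user ever typing a prompt.
function prepareBriefing(escalations: Escalation[]): string {
  const open = escalations.filter((e) => e.open);
  const lines = open.map((e) => `- ${e.client}: ${e.summary}`);
  return `Open client escalations (${open.length}):\n${lines.join("\n")}`;
}
```

The point of the sketch is the inversion of control: the trigger is a calendar signal, not a user message, and the "prompt" is assembled by the system from context it already has.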


Intent-Based Systems Drive Enterprise Productivity

This isn’t science fiction. The foundational pieces already exist: workflow signals, event streams, embeddings, and user behavior data. The only thing missing is design courage, the willingness to move beyond chat and rethink what a “user interface” even means in an AI-first enterprise.

Here’s what that shift enables:

  • Proactive workflows: A project manager receives an updated burn chart and recommended staffing adjustments when velocity drops, without asking for a report.
  • Contextual automation: A tax consultant reviewing a client case automatically sees pending compliance items, with drafts already prepared for submission.
  • Personalized foresight: A sales leader opening Salesforce doesn’t see dashboards; they see the top three accounts most likely to churn, with a prewritten email for each.

When designed around intent, AI stops being a destination. It becomes the invisible infrastructure of productivity.

Why Chat Will Eventually Fade

There’s a pattern in every major computing evolution. Command lines gave us precision but required expertise. GUIs gave us accessibility but required navigation. Chat gives us flexibility but still requires articulation.

Intent removes the requirement altogether.

Once systems understand context deeply enough, conversation becomes optional. You won’t chat with your CRM, ERP, or HR system. You’ll simply act, and it will act with you.

Enterprises that cling to chat interfaces as the primary AI channel will find themselves trapped in “talking productivity.” The real leap will belong to those who embrace systems that understand and anticipate.

What Intent-Based UX Unlocks

Imagine a workplace where:

  • Your data tools automatically build dashboards based on the story your CFO needs to tell this quarter.
  • Your engineering platform detects dependencies across services and generates a release readiness summary every Friday.
  • Your mobility platform (think global compliance, payroll, or travel) proactively drafts reminders, filings, and client updates before deadlines hit.

This isn’t about convenience. It’s about leverage.
Chat helps employees find information. Intent helps them create outcomes.

The Takeaway

The next phase of enterprise AI design is not conversational. It’s contextual.

Chatbots were the classroom where we learned to speak to machines. Intent-based AI is where machines finally learn to speak our language: the language of goals, outcomes, and priorities.

The companies that build for intent will define the productivity curve for the next decade. They won’t ask their employees to chat with AI. They’ll empower them to work alongside AI: fluidly, naturally, and with purpose.

Because the future of AI UX isn’t about talking to your tools.
It’s about your tools understanding what you’re here to achieve.

How AI Is Opening New Markets for Professional Services

The professional services industry, including consulting, legal, accounting, audit, tax, advisory, engineering, and related knowledge-intensive sectors, stands on the cusp of transformation. Historically, many firms have viewed AI primarily as a tool to boost efficiency or reduce cost. But increasingly, forward-thinking firms are discovering that AI enables them to expand into new offerings, customer segments, and business models.

Below I survey trends, opportunities, challenges, and strategic considerations for professional services firms that aim to go beyond optimization and into market creation.

Key Trends Shaping the Opportunity Landscape

Before diving into opportunities, it helps to frame the underlying dynamics.

Rapid Growth in AI-Driven Markets

  • The global Artificial Intelligence as a Service (AIaaS) market is projected to grow strongly, from about USD 16.08 billion in 2024 to USD 105 billion by 2030 (CAGR ~36.1%) (grandviewresearch.com)
  • Other forecasts are similarly bullish. Markets & Markets estimates AIaaS will grow from about USD 20.26 billion in 2025 to about USD 91.2 billion by 2030 (CAGR ~35.1%) (marketsandmarkets.com)
  • The AI consulting services market is also booming. One forecast places the global market at USD 16.4 billion in 2024, expanding to USD 257.6 billion by 2033 (CAGR ~35.8%) (marketdataforecast.com)
  • Another projection suggests the AI consulting market could reach USD 58.19 billion by 2034, from about USD 8.75 billion in 2024 (zionmarketresearch.com)
  • Meanwhile, the professional services sector itself is expected to grow by USD 2.07 trillion between 2024 and 2028 (CAGR ~5.7%), with digital and AI-led transformation as a core driver (prnewswire.com)

These macro trends suggest that both supply (consulting and integration) and demand (client AI adoption) are expanding in parallel, creating a rising tide that professional services firms can ride into new spaces.

From Efficiency to Innovation and Revenue Growth

In many firms, early AI adoption has followed a standard path: use tools to automate document drafting, data extraction, analytics, or search. But new reports and surveys suggest that adoption is maturing into more strategic use.

  • The Udacity “AI at Work” research finds a striking “trust gap.” While about 90% of workers use AI in some form, far fewer fully trust its outputs. (udacity.com) That suggests substantial room for firms to intervene through governance, assurance, audits, training, and oversight services.
  • The Thomson Reuters 2025 Generative AI in Professional Services report notes that many firms are using GenAI, but far fewer are tracking ROI or embedding it in strategy (thomsonreuters.com)
  • An article from OC&C Strategy observes that an over-focus on “perfect bespoke solutions” can stall value capture; instead, a pragmatic “good-but-not-perfect” deployment mindset allows earlier revenue and learning (occstrategy.com)
  • According to RSM, professional services firms are rethinking workforce models as AI automates traditionally junior tasks, pressing senior staff into more strategic work (rsmus.com)

These signals show that we are approaching a second wave of AI in professional services, where firms seek to monetize AI not just as a cost lever but as a growth engine.

Four Categories of Market-Building Opportunity

Here are ways professional services firms can go beyond automation to build new markets.

  1. AI-Powered Advisory and “AI-as-a-Service” Offerings
     Description: Firms package domain expertise and AI models into products or subscription services.
     Examples: A legal firm builds a contract-analysis engine and offers subscription access; accounting firms provide continuous anomaly detection on client ERP data.
  2. Assurance, Audit, and AI Governance Services
     Description: As AI becomes embedded in client systems, demand for auditing, validation, model governance, compliance, and trust frameworks will grow.
     Examples: Auditing AI outputs in regulated sectors, reviewing model fairness, or certifying an AI deployment.
  3. Vertical or Niche Micro-Vertical AI Solutions
     Description: Rather than broad horizontal tools, build AI models specialized for particular industries or subdomains.
     Examples: A consulting firm builds an AI tool for energy forecasting in renewable businesses, or an AI model for real estate appraisal.
  4. Platform, API, or Marketplace Enablement
     Description: Firms act as intermediaries or enablers, connecting client data to AI tools or building marketplaces of agentic AI services.
     Examples: A tax firm builds a plugin marketplace for tax-relevant AI agents; a legal tech incubator curates AI modules.

Let’s look at each in more depth.

1. AI-Powered Advisory or Embedded AI Products

One of the most direct routes is embedding AI into the service deliverable, turning part of the deliverable from human labor to intelligent automation, and then charging for it. Some possible models:

  • Subscription or SaaS model: tax, audit, or legal firms package their AI engine behind a SaaS interface and charge clients on a recurring basis.
  • Outcome-based models: pricing tied to detected savings or improved accuracy from AI insights.
  • Embedded models: AI acts as a “co-pilot” or second reviewer, but service teams retain oversight.

By moving in this direction, professional services firms evolve into AI product companies with recurring revenues instead of purely project-based revenue.

A notable example is the accounting roll-up Crete Professionals Alliance, which announced plans to invest $500M to acquire smaller firms and embed OpenAI-powered tools for tasks such as audit memo writing and data mapping. (reuters.com) This shows how firms see value in integrating AI into service platforms.

2. Assurance, Audit, and AI Governance Services

As clients deploy more AI, they will demand greater trust, transparency, and compliance, especially in regulated sectors such as finance, healthcare, and government. Professional services firms are well positioned to provide:

  • AI audits and validation: ensuring models work as intended, detecting bias, assessing robustness under adversarial conditions.
  • Governance and ethics frameworks: helping clients define guardrails, checklists, model review boards, or monitoring regimes.
  • Regulation compliance and certification: as governments begin regulating high-risk AI, firms can audit or certify client systems.
  • Trust as a service: maintaining ongoing oversight, monitors, and health-checks of deployed AI.

Because many organizations lack internal AI expertise or governance functions, this becomes a natural extension of traditional audit, risk, or compliance practices.

3. Vertical or Niche AI Solutions

A generic AI tool is valuable, but its economics often require scale. Professional services firms can differentiate by combining domain depth, industry data, and AI. Some advantages:

  • Better accuracy and relevance: domain knowledge helps build more precise models.
  • Reduced client friction: clients are comfortable trusting domain specialists.
  • Fewer competitors: domain-focused models are harder to replicate.

Examples:

  • A consulting firm builds an AI model for commodity price forecasting in mining clients.
  • A legal practice builds a specialized AI tool for pharmaceutical patent litigation.
  • An audit firm builds fraud detection models tuned to logistics or supply chain clients.

The combination of domain consulting and AI product is a powerful differentiator.

4. Platform, Agentic, or Marketplace Models

Instead of delivering all AI themselves, firms can act as platforms or intermediaries:

  • Agent marketplace: firms curate AI “agents” or microservices that clients can pick, configure, and combine.
  • Data and AI orchestration layers: firms build middleware or connectors that integrate client systems with AI tools.
  • Ecosystem partnerships: incubate AI startups or partner with AI vendors, taking a share of commercialization revenue.

In this model, the professional services firm becomes the AI integrator or aggregator, operating a marketplace that others plug into. Over time, this can generate network effects and recurring margins.

What Existing Evidence and Practitioner Moves Show

To validate that these ideas are more than theoretical, here are illustrative data points and real-world moves.

  • Over 70% of large professional services firms plan to integrate AI in workflows by 2025 (Thomson Reuters).
  • In a survey by Harvest, smaller firms report agility in adopting AI and experimentation, possibly making them early movers in new value models. (getharvest.com)
  • Law firms such as Simmons & Simmons and Baker McKenzie are converting into hybrid legal-tech consultancies, offering AI-driven legal services and consultative tech advice. (ft.com)
  • Accenture has rebranded its consulting arm to “reinvention services” to highlight AI-driven transformation at scale. (businessinsider.com)
  • RSM US announced plans to invest $1 billion in AI over the next three years to build client platforms, predictive models, and internal infrastructure. (wsj.com)
  • In Europe, concern is rising that AI adoption will be concentrated in large firms. Ensuring regional and mid-tier consultancies can access infrastructure and training is becoming a policy conversation. (europeanbusinessmagazine.com)

These moves show that leading firms are actively shifting strategy to capture AI-driven revenue models, not just internal efficiency gains.

Strategic Considerations and Challenges

While the opportunity is large, executing this transformation requires careful thinking. Below are key enablers and risks.

Key Strategic Enablers

  1. Leadership alignment and vision
    AI transformation must be anchored at the top. PwC’s predictions emphasize that AI success is as much about vision as adoption. (pwc.com)
  2. Data infrastructure and hygiene
    Clean, well-governed data is the foundation. Without that, AI models falter. OC&C warns that focusing too much on perfect models before data readiness may stall adoption.
  3. Cross-disciplinary teams
    Firms need domain specialists, data scientists, engineers, legal and compliance experts, and product managers working together, not in silos.
  4. Iterative, minimum viable product (MVP) mindset
    Instead of waiting for a perfect AI tool, launch early, learn, iterate, and scale.
  5. Trust, transparency, and ethics
    Given the trust gap highlighted by Udacity, firms need to embed explainability, human oversight, monitoring, and user education.
  6. Change management and talent upskilling
    Legacy staff need to adapt. As firms automate junior tasks, roles shift upward. RSM and others are already refocusing talent strategy.

Challenges and Risks

  • Regulation and liability: increasing scrutiny on AI’s safety, fairness, privacy, and robustness means potential legal risk for firms delivering AI-driven services.
  • Competition from tech-first entrants: pure AI-native firms may outpace traditional firms in speed and innovation.
  • Client reluctance and trust issues: many clients remain cautious about relying on AI, especially for mission-critical decisions.
  • ROI measurement difficulty: many firms currently fail to track ROI for AI initiatives (according to Thomson Reuters).
  • Skill and talent shortage: hiring and retaining AI-capable talent is a global challenge.
  • Integration complexity: AI tools must integrate with legacy systems, data sources, and client workflows.

Suggested Roadmap for Firms

Below is a high-level phased roadmap for a professional services firm seeking to evolve from AI-enabled efficiency to market creation.

  1. Diagnostic and capability audit
    • Assess data infrastructure, AI readiness, analytics capabilities, and talent gaps.
    • Map internal use cases (where AI is already helping) and potential external transitions.
  2. Pilot external offerings or productize internal tools
    • Identify one or two internal tools (for example, document summarization or anomaly detection) and wrap them as client offerings.
    • Test with early adopters, track outcomes, pricing, and adoption friction.
  3. Develop governance and assurance capability
    • Build modular governance frameworks (explainability, audit trails, human review).
    • Offer these modules to clients as part of service packages.
  4. Expand domain-specific products and verticals
    • Use domain expertise to build specialized AI models for client sectors.
    • Build go-to-market and sales enablement geared to those verticals.
  5. Launch platform or marketplace approaches
    • Once you have multiple AI modules, offer them via API, plugin, or marketplace architecture.
    • Partner with technology vendors and startup ecosystems.
  6. Scale, monitor, and iterate
    • Invest in legal, compliance, and continuous monitoring.
    • Refine pricing, SLAs, user experience, and robustness.
    • Use client feedback loops to improve.
  7. Institutionalize AI culture
    • Upskill all talent, both domain and technical.
    • Embed reward structures for productization and value creation, not just billable hours.

Why This Matters for Clients and Firms

  • Clients are demanding more value, faster insight, and continuous intelligence. They will value service providers who deliver outcomes, not just advice.
  • Firms that remain purely labor or consulting based risk commoditization, margin pressure, and competition from AI-native entrants. The firms that lean into AI productization will differentiate and open new revenue streams.
  • Societal and regulatory forces will strengthen the demand for trustworthy, auditable, and ethically built AI systems, and professional service firms are well placed to help govern those systems.

Conclusion

AI is not just another technology wave for professional services. It is a market reset. Firms that continue to treat AI as a back-office efficiency play will slowly fade into irrelevance, while those that see it as a platform for creating new markets will define the next generation of the industry.

The firms that win will not be the ones with the best slide decks or the largest data lakes. They will be the ones that productize their expertise, embed AI into their client experiences, and lead with trust and transparency as differentiators.

AI is now the new delivery model for professional judgment. It allows firms to turn knowledge into scalable and monetizable assets, from predictive insights and continuous assurance to entirely new advisory categories.

The choice is clear: evolve from service provider to AI-powered market maker, or risk becoming a subcontractor in someone else’s digital ecosystem. The professional services firms that act decisively today will own the playbooks, platforms, and profits of tomorrow.

The Great Reversal: Has AI Changed the Specialist vs. Generalist Debate?

For years, career advice followed a predictable rhythm: specialize to stand out. Be the “go-to” expert, the person who can go deeper, faster, and with more authority than anyone else. Then came the countertrend, where generalists became fashionable. The Harvard Business Review argued that broad thinkers, capable of bridging disciplines, often outperform specialists in unpredictable or rapidly changing environments.
HBR: When Generalists Are Better Than Specialists—and Vice Versa

But artificial intelligence has rewritten the rules. The rise of generative models, automation frameworks, and intelligent copilots has forced a new question:
If machines can specialize faster than humans, what becomes of the specialist, and what new value can the generalist bring?

The Specialist’s New Reality: Depth Is No Longer Static

Specialists once held power because knowledge was scarce and slow to acquire. But with AI, depth can now be downloaded. A model can summarize 30 years of oncology research or code a Python function in seconds. What once took a career to master, AI can now generate on demand.

Yet the specialist is not obsolete. The value of a specialist has simply shifted from possessing knowledge to directing and validating it. For example, a tax expert who understands how to train an AI model on global compliance rules or a medical researcher who curates bias-free datasets becomes exponentially more valuable. AI has not erased the need for specialists; it has raised the bar for what specialization means.

The new specialist must be both a deep expert and a domain modeler, shaping how intelligence is applied in context. Technical depth is not enough. You must know how to teach your depth to machines.

The Generalist’s Moment: From Connectors to Orchestrators

Generalists thrive in ambiguity, and AI has made the world far more ambiguous. The rise of intelligent systems means entire workflows are being reinvented. A generalist, fluent in multiple disciplines such as product, data, policy, and design, can see where AI fits across silos. They can ask the right questions:

  • Should we trust this model?
  • What is the downstream effect on the client experience?
  • How do we re-train teams who once performed this work manually?

In Accenture’s case, the firm’s focus on AI reskilling rewards meta-learners, those who can learn how to learn. This favors generalists who can pivot quickly across domains, translating AI into business outcomes.
CNBC: Accenture plans on exiting staff who can’t be reskilled on AI

AI gives generalists leverage, allowing them to run experiments, simulate strategies, and collaborate across once-incompatible disciplines. The generalist’s superpower, pattern recognition, scales with AI’s ability to expose patterns faster than ever.

The Tension: When AI Collapses the Middle

However, there is a danger. AI can also collapse the middle ground. Those who are neither deep enough to train or critique models nor broad enough to redesign processes risk irrelevance.

Accenture’s stance reflects this reality: the organization will invest in those who can amplify AI, not those who simply coexist with it.

The future belongs to T-shaped professionals, people with one deep spike of expertise (the vertical bar) and a broad ability to collaborate and adapt (the horizontal bar). AI does not erase the specialist or the generalist; it fuses them.

The Passionate Argument: Both Camps Are Right, and Both Must Evolve

The Specialist’s Rallying Cry: “AI needs us.” Machines can only replicate what we teach them. Without specialists who understand the nuances of law, medicine, finance, or engineering, AI becomes dangerously confident and fatally wrong. Specialists are the truth anchors in a probabilistic world.

The Generalist’s Rebuttal: “AI liberates us.” The ability to cross disciplines, blend insights, and reframe problems is what allows human creativity to thrive alongside automation. Generalists build the bridges between technical and ethical, between code and client.

In short: the age of AI rewards those who can specialize in being generalists and generalize about specialization. It is a paradox, but it is also progress.

Bottom Line

AI has not ended the debate. It has elevated it. The winners will be those who blend the curiosity of the generalist with the credibility of the specialist. Whether you are writing code, crafting strategy, or leading people through transformation, your edge is not in competing with AI, but in knowing where to trust it, challenge it, and extend it.

Takeaway

  • Specialists define the depth of AI.
  • Generalists define the direction of AI.
  • The future belongs to those who can do both.

Further Reading on the Specialist vs. Generalist Debate

  1. Harvard Business Review: When Generalists Are Better Than Specialists—and Vice Versa
    A foundational piece exploring when broad thinkers outperform deep experts.
  2. CNBC: Accenture plans on exiting staff who can’t be reskilled on AI
    A look at how one of the world’s largest consulting firms is redefining talent through an AI lens.
  3. Generalists
    This article argues that generalists excel in complex, fast-changing environments because their diverse experience enables them to connect ideas across disciplines, adapt quickly, and innovate where specialists may struggle.
  4. World Economic Forum: The rise of the T-shaped professional in the AI era
    Discusses how professionals who balance depth and breadth are becoming essential in hybrid human-AI workplaces.
  5. McKinsey & Company: Rewired: How to build organizations that thrive in the age of AI
    A deep dive into how reskilling, systems thinking, and organizational design favor adaptable talent profiles.