In this post, I’ll break down:
By the end, you’ll understand how prompt caching can push LLM performance and efficiency to the next level.
Before we define prompt caching, it helps to clarify what it isn’t.
A common misunderstanding is that prompt caching is the same as output caching.
For example:
That’s output caching: storing the result of a call so it can be reused later.
While output caching can be applied to LLM responses, it’s not what prompt caching refers to.
Prompt caching focuses only on the input prompt (more precisely, on part of that input) and caches how the model interprets it.
Here’s the key idea:
When you send a prompt into an LLM, the model doesn’t begin generating output immediately. Instead, it performs an expensive internal computation called key/value (KV) pair generation.
Prompt caching saves these computed KV pairs so that the model doesn’t have to recompute them again for similar input.
When an LLM receives a prompt:
With prompt caching:
This means developers can structure prompts so that large static content is cached once — and reused across multiple queries.
Prompt caching typically applies to static or semi-static parts of prompts:
✅ System prompts: instructions that define agent behavior
✅ Large documents: manuals, research papers, contracts
✅ Few-shot examples: demonstration examples for output formatting
✅ Static context blocks: any repeated context that remains unchanged
In contrast, dynamic portions (like a user’s question) usually come after these static elements and are not cached.
Prompt caching systems use prefix matching.
This means prompt structure matters: static parts should come first.
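The structural rule above can be sketched in plain Python. This is provider-agnostic pseudologic showing why static-first ordering produces a stable, cacheable prefix; it is not any specific provider's caching API:

```python
# Illustrative sketch: assemble prompts so the large static parts form a
# stable prefix that a prompt cache's prefix matching can hit.

SYSTEM_PROMPT = "You are a contract-review assistant."   # static
FEW_SHOT = "Example Q: ...\nExample A: ..."              # static
DOCUMENT = "FULL TEXT OF A LARGE CONTRACT ..."           # static (large)

def build_prompt(user_question: str) -> str:
    # Static content first -> identical prefix across requests -> cache hit.
    # Only the KV pairs for the trailing dynamic question get recomputed.
    return "\n\n".join([SYSTEM_PROMPT, FEW_SHOT, DOCUMENT, user_question])

p1 = build_prompt("Is there a termination clause?")
p2 = build_prompt("What is the payment schedule?")

# Both requests share the same cacheable prefix:
static_prefix = "\n\n".join([SYSTEM_PROMPT, FEW_SHOT, DOCUMENT])
assert p1.startswith(static_prefix) and p2.startswith(static_prefix)
```

If the dynamic question were placed first instead, every request would have a different prefix and the cache would never match.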
AI agents are becoming ubiquitous, but the challenge is that all these agents might be built on different frameworks and technologies. So how do we get them to cooperate on complex tasks?
Imagine planning a trip. Ideally, you’d have:
…but building each one is a ton of work. You might want to reuse someone else’s agent, but often it’s a black box — you don’t know how it’s implemented or how to work with it.
That’s exactly where the Agent-to-Agent (A2A) protocol comes in.
A2A is an open standard that provides a common language for agent communication and collaboration — regardless of how the agent is implemented. Think of it like how LangChain made it effortless to switch between models; A2A allows multiple agents to communicate consistently.
An easy metaphor is Lego blocks:
Each A2A agent broadcasts standardized info about itself and supports the same public methods, so any other agent can call it to help complete tasks. This opens up powerful orchestration possibilities.
A2A connects three main roles:
A Client Agent generates requests and handles interaction with the end user. A Remote Agent receives those requests and attempts to respond or act as needed.
Importantly, any agent could act as both client and remote depending on context. It’s not a strict division — it’s dynamic.
A protocol is only as useful as its adoption, and A2A has already seen strong interest in the software industry. While there’s a larger list in the official docs, many major partners have publicly committed to supporting the standard.
Some of the main things A2A supports:
Importantly, A2A keeps each agent opaque — meaning implementation details do not need to be exposed to follow the protocol.
To make this real, let’s walk through an example where Agent A needs Agent B to perform a task.
Agent B publishes an Agent Card — a standard JSON file hosted at a known URI on its domain. This file acts like a digital business card and tells Agent A:
Essentially, it functions like a combination of robots.txt and a microservices registry.
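As a rough sketch, an Agent Card might look like the following. The field names follow the general shape described in the A2A docs, but treat them as illustrative rather than normative:

```python
import json

# Hypothetical Agent Card for a trip-planning remote agent. In practice the
# card is served as JSON from a well-known URI on the agent's domain
# (e.g., https://agent-b.example.com/.well-known/agent.json).
agent_card = {
    "name": "TripPlannerAgent",
    "description": "Plans multi-city trips and drafts itineraries.",
    "url": "https://agent-b.example.com/a2a",  # endpoint Agent A will call
    "version": "1.0.0",
    "capabilities": {"streaming": True},       # supports SSE updates
    "skills": [
        {"id": "plan_trip", "description": "Build a day-by-day itinerary"}
    ],
}

card_json = json.dumps(agent_card, indent=2)
```

Agent A fetches this card, inspects the declared skills and capabilities, and decides whether and how to delegate a task to Agent B.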
Communication is done over HTTPS using JSON-RPC 2.0. Each message represents one turn of a conversation and contains:
When Agent B receives a request, it processes it via something called an Agent Executor — a class that links the generic A2A protocol (handled by the SDK) to the agent’s specific logic.
This executor lets the agent behave like a Lego block that can be easily connected to other agents.
Not all tasks complete quickly. Some require time, and the protocol handles that gracefully.
When an agent starts a long process:
This polling approach works, but it’s not the most efficient.
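The polling pattern can be sketched with a simulated task endpoint. The function names here are hypothetical illustrations, not part of the A2A spec:

```python
import itertools

def fake_get_task(task_id, _states=itertools.count()):
    # Simulates Agent B's task-status endpoint: "working" twice,
    # then "completed". A real client would make an HTTP call here.
    n = next(_states)
    return {"id": task_id, "state": "working" if n < 2 else "completed"}

def poll_until_done(task_id, max_polls=10):
    # Agent A repeatedly asks for the task's state until it reaches
    # a terminal state or the poll budget runs out.
    for _ in range(max_polls):
        task = fake_get_task(task_id)
        if task["state"] in ("completed", "failed"):
            return task
        # A real client would sleep between polls to avoid hammering Agent B.
    raise TimeoutError("task did not finish in time")

result = poll_until_done("task-123")
assert result["state"] == "completed"
```

Every poll costs a round trip even when nothing has changed, which is exactly the inefficiency streaming addresses.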
If Agent B supports streaming (using Server-Sent Events), Agent A can keep an HTTP connection open and receive live updates:
This enables live progress feedback, like a document appearing piece by piece as it’s generated — a much better user experience.
You can find a detailed tutorial and sample agents at the A2A tutorial link.
Is A2A all you need to productionize your agent?
Not exactly — A2A is one part of the full agent stack. Google recommends a layered stack, but you’re free to choose different components based on your needs.
You might also notice another protocol called MCP being used in agent ecosystems.
A2A and MCP are complementary, not competitors:
In sophisticated agent systems, both will be used:
The best way to learn is by trying it yourself:
Install the SDK (`pip install a2a-sdk`) and build a simple agent. A2A unlocks powerful workflows in a multi-agent world. Let me know what your plan is for agent collaboration using A2A!
That next step in AI capability is here. It’s about building autonomous systems that don’t just generate text — they sense, think, and act.
At the heart of this evolution is the Agent Development Kit (ADK) — a toolkit that gives AI agents the ability to interpret their environment, reason over data, and take meaningful action. Think of LLMs (large language models) as the voice of AI — great at generating text, summaries, and code. ADKs add the hands and brain — enabling AI to interact with the world.
In this post, I’ll break down:
LLMs are powerful — but they operate in isolation. They don’t read sensor data like temperature or motion. They don’t make decisions without being prompted. They don’t act.
In contrast, ADKs provide the building blocks for:
This is especially critical in robotics, automation, and real-time systems where static models fall short.
Imagine a factory robot that can’t react when a conveyor slows down, a sensor detects overheating, or a part jams. Without feedback loops and decision making, it’s useless. ADKs change that by enabling agents to monitor live data and respond quickly — such as pausing production, cooling equipment, or alerting technicians.
An ADK is a toolkit for building autonomous AI agents that can:
Agents built with ADKs become partners in value-creation. With LLMs alone, the setup is reactive — you send a prompt, the model returns an output. With an ADK, the agent functions autonomously: it observes, makes decisions, and executes actions based on its goals.
This shift takes AI beyond simple language generation — toward collaboration and autonomous operation.
Agents built with ADKs are already impacting multiple fields:
These examples aren’t futuristic — they’re already emerging.
Let’s walk through a practical example: a smart office agent that autonomously monitors and manages environmental conditions.
Our objective is simple:
Our agent needs:
The agent will control:
We’ll use Python as the programming language — because it’s widely adopted for AI and automation and easy to read.
We’ll also set up:
Think of it this way:
| Component | Role |
|---|---|
| Python | Brain |
| IoT Hub | Senses |
| REST APIs | Limbs (Actions) |
Together, they empower the agent to act intelligently.
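The agent's core sense-decide-act loop can be sketched as follows. The comfort thresholds, device behavior, and REST endpoint are illustrative assumptions, not from any specific office setup:

```python
# Assumed comfortable temperature band, in Celsius (an illustrative choice).
COMFORT_RANGE = (20.0, 25.0)

def decide_action(temp_c: float) -> str:
    # The "brain": map a sensor reading to an HVAC action.
    low, high = COMFORT_RANGE
    if temp_c > high:
        return "cool"
    if temp_c < low:
        return "heat"
    return "hold"

def act(action: str) -> dict:
    # The "limbs": in a real agent this would POST to the HVAC's REST API,
    # e.g. requests.post("https://office.example.com/hvac", json={"mode": action})
    return {"mode": action}

# One tick of the loop: sense (here, a simulated reading), decide, act.
reading = 28.5
command = act(decide_action(reading))
assert command == {"mode": "cool"}
```

In production, this loop would run continuously against live IoT Hub readings, with the guardrails described below wrapped around every action.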
Simulate situations like:
Monitor how the agent responds, observe failure modes, and refine its logic.
Even simple automation systems need guardrails:
In six steps, you’ve built an agent that autonomously manages office conditions — while keeping humans in control.
No agent should operate without core principles baked in:
Agents must avoid bias in data and decision-making:
Build backup plans:
Transparent agents are trusted agents:
Fairness, safety, and trust lay the foundation for responsible AI.
As ADKs evolve, autonomous agents will collaborate with:
In smart cities, agents could optimize:
In education, agriculture, finance, and healthcare — agents are already beginning to transform how systems operate.
The next generation of AI isn’t just bigger models — it’s smarter, connected systems built for autonomy.
Today we asked:
What does it take to build AI that can think and act on its own?
Now you know:
It starts with the Agent Development Kit (ADK).
So I encourage you — explore open-source ADKs, experiment with sensor integration, and join the growing ecosystem of autonomous agents.
The future is no longer just about language — it’s about action.
Deep Agents is a standalone library built on top of LangChain’s create_agent — but with a battery-included approach that gives agents advanced capabilities such as planning, task decomposition, context management, and deep sub-agent delegation.
In this post, I’ll walk through:
Let’s dive in.
When building agents for real-world use cases, we found there are four core capabilities that significantly improve reliability, especially for more complex or long-running problems.
Instead of single-shot actions, Deep Agents can break tasks into smaller steps — planning before execution. This is enabled through a to-do middleware layer that orchestrates subtasks.
Agents often need to reference large amounts of data, but context windows are limited. Deep Agents solve this using a file system abstraction that lets agents browse and load context on demand.
Rather than tackling all problems within a single agent, Deep Agents can spawn sub-agents designed for specific tasks. This helps avoid context overflow and isolates reasoning into manageable chunks.
Deep Agents can remember what happened across previous conversations — not just within one run — giving them long-running memory across sessions.
Together, these make agents far more capable regardless of the underlying model.
Now let’s look at an example where we build a Deep Agent that generates tailored sales proposals with access to multiple data sources through a virtual file system.
In this system, the virtual file system is composed of three backends:
First, we define a backend factory that returns a composite backend mapping directories to each storage layer:
- The `workspace` directory maps to a local file system
- `users` maps to the SQL backend
- `s3_bucket` maps to the S3 storage backend

This lets the agent access files without knowing which backend they come from; the library handles those details.
The SQL backend can map database rows to virtual file paths, so listing files under users/ will return database records as files. Likewise, the S3 backend maps files from buckets directly.
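The directory-to-backend routing can be sketched in plain Python. The class names below are invented for illustration only; the Deep Agents library supplies its own backend implementations:

```python
class InMemoryBackend:
    """Stand-in for a storage layer (local FS, SQL, or S3)."""
    def __init__(self, files=None):
        self.files = dict(files or {})

    def read(self, path):
        return self.files[path]

class CompositeBackend:
    """Routes virtual paths to a backend by their top-level directory."""
    def __init__(self, routes):
        # e.g. {"workspace": local_backend, "users": sql_backend, ...}
        self.routes = routes

    def read(self, path):
        prefix, _, rest = path.partition("/")
        return self.routes[prefix].read(rest)

# The agent sees one uniform file system; routing is invisible to it.
fs = CompositeBackend({
    "workspace": InMemoryBackend({"notes.txt": "draft"}),
    "users": InMemoryBackend({"sarah_chen.json": '{"name": "Sarah Chen"}'}),
})

assert fs.read("workspace/notes.txt") == "draft"
assert "Sarah Chen" in fs.read("users/sarah_chen.json")
```

The point of the design is that the agent issues ordinary path-based reads while the composite layer decides which storage system actually serves each request.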
Next, we set up various Deep Agent components:
- The `create_deep_agent()` function, which acts like `create_agent()` but includes Deep Agents features such as virtual file systems

With everything defined, we send the agent a prompt to generate a personalized sales proposal for a user named Sarah Chen. We also specify where the final report should be stored.
When running the example:
Examining the generated proposal reveals a tailored email to Sarah — formatted using data from the SQL database, S3, and contextual information drawn from multiple sources.
By integrating virtual file systems and intelligent orchestration into agents:
Virtual file systems are only one of Deep Agents’ core capabilities. Future posts will explore:
Deep Agents represent an evolution in agent design — empowering AI with structured access to persistent knowledge and enabling more reliable, long-running, and complex workflows.
With planning, context management, sub-agent spawning, and memory, agents can do more than ever before — and remain manageable and predictable.
Thanks for reading, and stay tuned for the next deep dive into agent engineering!
According to Gartner, one third of enterprise applications will include agentic AI by 2028 — and that timeline is closer than it might initially seem. The potential is huge, but so is the risk.
This new level of autonomy brings with it not just innovation, but serious governance and security challenges. Unlike traditional rules-based software, agentic AI systems learn and adapt in real time — they interpret data, make decisions, and act on their interpretations. That flexibility is powerful — but also potentially dangerous.
In this post, I’ll walk through:
I recently spoke with cybersecurity experts who work with agentic AI every single day — and they made one thing clear: many traditional AI vulnerabilities are magnified when agents are autonomous.
Here are the key threats you need to know about:
What happens if an attacker sends commands into your agent and takes control — causing it to operate on behalf of the attacker instead of you?
One major entry point for this is prompt injection — where an attacker inserts malicious instructions that the AI interprets as valid. According to the Open Web Application Security Project (OWASP), prompt injection is the number one attack type right now.
Just like software can be infected with malware, AI models themselves can be infected. Since most organizations don’t build their own models, they must trust and verify externally sourced ones.
Models are trained and fine-tuned using data. If someone subtly contaminates that data — even in small amounts — it can later distort decision-making in unpredictable ways. Think of it as putting a toxin into the water supply: you don’t notice it at first, but it eventually has wide-reaching effects.
Evasion occurs when attackers manipulate the input, not the model itself. By disguising or altering key information, they can confuse the AI, causing it to misinterpret data and produce incorrect (or even dangerous) results.
Threats aren’t just about input — they’re also about output.
Model theft can happen when attackers feed inputs piece by piece, observe the outputs, and reconstruct the model logic over time.
Even worse, a compromised agent might be tricked into revealing sensitive internal data, like credentials or customer information, sometimes through what’s known as a zero-click attack (i.e., the user doesn’t have to take any action for data to be exfiltrated).
This classic attack, overwhelming a system with more requests than it can handle, applies to agentic AI as well. Flood your agent with requests, and it might become too busy to serve legitimate users.
Security is only one half of the problem. Without proper governance, agentic AI can still behave in unpredictable or unacceptable ways.
To illustrate this, let’s walk through a hypothetical (but realistic) scenario:
A recruiting firm adopts an autonomous AI to:
Sounds efficient — until the AI sends an offer without human approval.
Questions arise:
These questions point to core governance issues: autonomy vs visibility, fairness, and explainability.
In this story, the AI also showed bias — favoring candidates from certain schools due to unbalanced training data. As a result, the firm missed qualified applicants and ultimately faced a discrimination lawsuit.
And then the real question emerges: Who is responsible?
There’s no simple answer — and that’s precisely why governance matters.
Agentic AI is powerful, but it must be both governed and secured. Below are practices that help bring both together.
You can’t secure or govern what you don’t know exists.
First, you must identify all AI instances across your environment — including unauthorized systems (often called shadow AI) where developers or teams spin up models without oversight.
Once identified, every AI instance should be evaluated for:
If an instance is exposing sensitive data or services, it must be hardened immediately.
Just as we test networks and applications, we must stress-test AI models.
Simulate:
Real tests reveal vulnerabilities before attackers do.
A runtime layer, or AI-specific firewall, analyzes inputs and outputs in real time. For example:
This adds a layer of defense between users and the agent’s decision-making process.
Effective governance isn’t an afterthought — it’s foundational.
Here are the pillars every organization needs:
Ensure that AI agents are:
Evaluate agents for:
Ongoing assessment is critical:
A centralized dashboard gives executives and auditors visibility into AI operations and compliance.
Here’s the key takeaway:
To build trustworthy agentic AI, you must integrate governance and security, not treat them as optional extras.
Without this integration, autonomous agents may not just be unpredictable; they may be uncontrollable.
You pull up the logs and scan the codebase. Maybe it’s hallucination. Maybe the context window overflowed. But the real problem is this: you can’t tell.
That’s because you’re still debugging agents like traditional software — and that mindset is already obsolete.
In order to understand, debug, and iterate on your agents, you need something new. You need tracing.
With traditional software, behavior is predictable. If we process a refund, we know the sequence:
Every step is defined in code. If something breaks, we inspect logs and trace the issue back to a specific line.
Agents don’t work this way.
With AI agents:
In this world, our code becomes scaffolding. We define:
…but we no longer decide the specific path or decisions the agent takes. That’s driven by the model.
So when things go wrong, where should we look? We need insight into how our agent makes decisions — and that’s where tracing comes in.
We can’t see how a model reasons internally — but we can observe what it does. Every prompt, every step, every tool call, every message leaves measurable signals.
Combined, these signals reconstruct the sequence of actions an agent takes — this is called a trace.
A trace includes:
Agents often interact with users in a conversation. Each message in the conversation creates a new trace.
These traces are grouped into a thread — the full history of the conversation. Threads let you see how agent behavior evolves across multiple turns.
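A trace and its enclosing thread can be pictured as structured data, roughly like this. The schema is illustrative; real tracing platforms each define their own:

```python
# One trace = one turn: the input, every intermediate step, and the output.
trace = {
    "trace_id": "t-001",
    "input": "What's the refund status for order 4521?",
    "steps": [
        {"type": "llm_call", "output": "I should look up the order first."},
        {"type": "tool_call", "tool": "get_order", "args": {"id": 4521}},
        {"type": "llm_call", "output": "Refund was issued on Tuesday."},
    ],
    "output": "Refund was issued on Tuesday.",
}

# A thread groups the traces of one conversation, in order, so you can see
# how behavior evolves across turns.
thread = {"thread_id": "th-42", "traces": [trace]}

assert thread["traces"][0]["steps"][1]["tool"] == "get_order"
```

When debugging, you read down the `steps` list to see exactly which decision or tool call diverged from what you expected.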
So when something goes wrong with your agent, the answer isn’t in the code — it’s in the trace or the thread.
Tracing should transform how you build AI agents.
When an agent behaves unexpectedly, the logic you’re looking for isn’t in the code — it’s in the trace.
Trace analysis becomes the new debugger.
Because agent logic lives in traces, you test those traces — not individual functions.
You run:
This is how we measure agent performance in production.
The same traces developers use to debug also reveal:
Trace analytics uncovers usage patterns, friction points, and failure modes — all based on actual behaviors, not hypothetical tests.
In traditional software, observability is the “exhaust” you monitor.
With agents, observability becomes fuel — the data that powers every improvement workflow.
This means your observability platform — your tracing system — should become the center of collaboration across teams.
The next time your agent behaves unexpectedly, don’t ask to see the logs.
Ask to see the trace.
Because with agentic AI, traces are the only reliable way to understand, debug, optimize, and evolve your systems.
Tracing isn’t just a helpful tool — it’s the foundation of agent engineering in the AI era.
Let’s begin with a picture.
Imagine a digital image of a sunset over a mountain vista. It’s beautiful. You want to store this picture in a database. Traditionally, you might use a relational database to do this.
In that case, you’d likely store:
- Tags (`sunset`, `landscape`, `orange`)

These fields are helpful—but they don’t capture the semantic context of the image. You can’t easily query “show me pictures with similar colors” or “find images with mountains in the background.”
This disconnect between how data is stored and how it’s understood is called the semantic gap.
Let’s say you query the database:
SELECT * FROM images WHERE color = 'orange';
This query won’t return all sunset images. Why? Tag-based filters match only exact, manually assigned labels: a sunset image tagged “red”, or never tagged at all, simply won’t appear in the results.
Vector databases close the semantic gap using vector embeddings—mathematical representations of data.
They’re arrays of numbers that capture the semantic meaning of unstructured data.
This enables semantic similarity search, not just keyword or tag matching.
You can store all kinds of unstructured content:
These are transformed into vector embeddings and stored in a vector database.
Mountain sunset embedding:
[0.91, 0.15, 0.83, ...]
Beach sunset embedding:
[0.12, 0.08, 0.89, ...]
Real embeddings are high-dimensional and typically not human-interpretable—but they’re very effective.
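With toy vectors like the ones above, semantic search reduces to comparing directions. Here is a minimal cosine-similarity sketch (real embeddings have hundreds or thousands of dimensions, but the math is identical):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction,
    # 0.0 = unrelated, -1.0 = opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

mountain_sunset = [0.91, 0.15, 0.83]
beach_sunset = [0.12, 0.08, 0.89]

# A vector is maximally similar to itself; a different scene scores lower.
assert abs(cosine_similarity(mountain_sunset, mountain_sunset) - 1.0) < 1e-9
assert cosine_similarity(mountain_sunset, beach_sunset) < 1.0
```

A vector database answers “show me similar images” by returning the stored embeddings with the highest similarity to the query embedding.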
Using embedding models trained on large datasets:
| Data Type | Model Example |
|---|---|
| Images | CLIP |
| Text | GloVe |
| Audio | Wav2Vec |
Searching millions of high-dimensional vectors is slow—so we use vector indexing.
Algorithms that trade accuracy for speed:
These enable real-time semantic search at scale.
Vector databases power Retrieval-Augmented Generation (RAG):
🧠 They serve as both a knowledge memory and a semantic retriever.
Vector databases are:
By representing data in vector form, they allow systems to think more like humans do.
💡 Bridging the semantic gap isn’t just a technical improvement—it’s a fundamental shift in how machines understand the world.
In the latest version of o3, OpenAI has taken reasoning and tool usage to a new level. The most exciting capability introduced is its ability to combine multi-step reasoning with agentic tool use—enabling it to complete complex tasks more autonomously than ever before.
Let’s walk through a demo of o3 running a month-end variance report.
We begin by asking o3 to generate a variance report based on some dummy department data. Each uploaded spreadsheet contains:
Normally, this workflow would involve:
o3 is now capable of automatically completing each of these discrete steps—calling the appropriate tool or performing the right operation at each phase.
Once the task is sent off, o3 begins its chain-of-thought reasoning, visible in the live output:
In about a minute, o3 produces:
This example highlights the shift from reactive AI to agentic AI. o3 didn’t just wait for instructions; it proactively:
This is not just generative AI; this is agentic execution at scale.
There’s a next-level feature inside Claude Code that, once you truly see it, is impossible to unsee.
Once you see this feature, you’ll understand that AI coding is not enough.
This feature lets you build in ways engineers using Cursor, WindSurf, and other “easy mode” tools simply cannot. But don’t worry — if you’re using one of these tools, you can still tap into this capability.
Every single thing we do now is about scaling our impact by scaling our compute.
Let’s talk about programmable agentic coding — the next generation of developer tools.
What’s better than an agentic coding tool?
An agentic coding tool you can embed inside your workflows.
With Claude Code, you can write a single line of Python like:
claude_p(prompt="...", tools=["edit", "bash", "read"])
This single command wires up prompts, tools, and tasks in ways normal LLM-based code completion tools cannot match.
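A rough equivalent from Python is to drive Claude Code's command-line headless mode. The flag names below are based on the CLI's documented options, but verify them against your installed version:

```python
import subprocess

def build_claude_command(prompt: str, tools: list[str]) -> list[str]:
    # Assemble a headless Claude Code invocation:
    #   -p runs in non-interactive "print" mode
    #   --allowedTools restricts which tools the agent may use
    return [
        "claude",
        "-p", prompt,
        "--allowedTools", ",".join(tools),
    ]

cmd = build_claude_command("summarize this repo", ["Edit", "Bash", "Read"])

# To actually run it (requires Claude Code installed and authenticated):
#   subprocess.run(cmd, capture_output=True, text=True)
assert cmd[0] == "claude" and "--allowedTools" in cmd
```

Because the invocation is just data, you can generate prompts programmatically, chain runs together, or fan out many invocations from a single script.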
Here’s the key difference:
Think of it like this:
| Feature | AI Coding | Agentic Coding |
|---|---|---|
| Tool Usage | One-shot | Multi-tool workflows |
| Prompt Format | Static | Dynamic & Reactive |
| Environment | Limited to IDE | Extendable (e.g. Bash) |
| Autonomy | User-driven | Agent-driven |
| Workflow Integration | Minimal | Deep & Reusable |
Claude Code ships with built-in tools like:
- `edit`
- `write`
- `grab`
- `ls`
- `read`
- `bash`
- `batch` (parallel execution)

The bash tool unlocks terminal access for the AI. With it, Claude can:
And batch takes it further: Claude can spawn sub-agents to do tasks in parallel. That’s self-replication in action — a core property of future autonomous agents.
Claude Code supports MCP (Model Context Protocol) servers — meaning:
Agentic coding is programmable, modular, and reusable.
Claude Code lets you build living systems — not just throwaway scripts.
Let’s say you’ve written your to-dos inside a Notion page.
You want to:
Claude Code can:
read to fetch Notion tasksbash to create a new projectedit to write the app codenotion.update to mark tasks as completeThat’s not just AI-enhanced — that’s fully automated engineering.
Because Claude Code is programmable, you can:
You can even stack Claude instances — one writes code, one reviews it, another tests and deploys it.
Agentic coding combines:
Claude Code is already here. It is:
AI coding is just autocomplete. Agentic coding is engineering.
We’re on a mission to build living software — systems that run while we sleep, tools that ship code, agents that manage tasks.
The only way to scale your impact is to scale your compute.
Claude Code is how you win in the generative AI age.
“AI coding is not enough. Agentic coding is how you ship real engineering work.”
In this post, we’ll explore what the Model Context Protocol (MCP) is and how to build a custom MCP server using Python. If you’ve been hearing about MCP and wondering what it does or why it’s important, this guide is for you.
MCP (Model Context Protocol) is an open standard developed by Anthropic that allows you to connect external tools, resources, and prompt templates to AI applications.
It’s often described as the USB-C port for AI, providing a standardized way to plug tools and context into models like Claude.
MCP uses a client-server architecture:
Think of it like a shopping store:
As a user or developer, you typically don’t need to build the client — only the server.
The server is an independent module that:
- `stdio` for local development
- HTTP + Server-Sent Events (SSE) for cloud deployments

Using Anthropic’s Python SDK, you can spin up a server that connects tools to Claude Desktop.
- `uv` – a fast, Rust-based Python environment manager

Decorators from the SDK register each capability:

- `@mcp.prompt`
- `@mcp.resource`
- `@mcp.tool`

Example tools:
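Putting these pieces together, a minimal server using the official Python SDK's `FastMCP` helper might look like this. The toy tool and resource are my own examples, not from the original post:

```python
# Minimal MCP server sketch using the official Python SDK
# (package `mcp`, installable with `pip install "mcp[cli]"`).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@mcp.resource("greeting://{name}")
def greeting(name: str) -> str:
    """A parameterized resource template."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    # stdio is the transport used for local development with Claude Desktop.
    mcp.run(transport="stdio")
```

Register the server in Claude Desktop's configuration (or run it via the `mcp` CLI's dev mode) and its tools, resources, and prompts become callable from the client.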
You’ll now see all the tools, resources, and prompts live inside Claude — ready to use!
MCP makes AI apps:
You define your own workflows and carry them between environments or apps — all with just a bit of Python and MCP.