Agentic AI for Serious Engineers¶
A practical field guide to building reliable, evaluable, and production-grade agent systems
Most agentic AI material teaches you how to build an impressive demo. This book teaches engineers how to build agent systems that survive real-world constraints: unclear requirements, bad tool outputs, partial failures, prompt injection, and cost pressure.
Thirteen chapters across four parts. A single project that grows from a prototype to a governed, secured, memory-enabled system connected via MCP and A2A protocols. The thesis: agents are useful only when they earn their complexity.
This site is the code companion. It contains working Python implementations for every concept, 130+ passing tests, three end-to-end projects, and 40+ hand-crafted architecture diagrams.
New to agentic AI?¶
Start with the Foundations -- five hands-on sections that take you from zero to building your first agent and connecting it to tools via MCP.
| # | Section | What you learn |
|---|---|---|
| 0a | How LLMs Actually Work | The engineer's mental model: APIs, tokens, context, hallucination |
| 0b | From API Calls to Tool Use | Function calling, schema validation, giving the model hands |
| 0c | Your First Agent, No Framework | Build a complete agent in 100 lines. See it work. See it break. |
| 0d | The Same Agent, With a Framework | ADK and LangChain side-by-side. Eval comparison. Choose with data. |
| 0e | Connecting Your Agent to MCP | Build an MCP server, connect your agent to real tools and services. |
Chapters¶
Part I: Building -- From components to multi-agent systems
| # | Chapter | Focus |
|---|---|---|
| 1 | What "Agentic" Actually Means | Precise vocabulary: LLM app vs workflow vs agent vs multi-agent |
| 2 | Tools, Context, and the Agent Loop | Building blocks: tool registry, context engineering, observe-think-act |
| 3 | Workflow First, Agent Second | The most important architectural decision |
| 4 | Multi-Agent Systems Without Theater | Coordination patterns, MCP, A2A, AIP protocols |
Part II: Judging -- Oversight, evaluation, and knowing when to stop
| # | Chapter | Focus |
|---|---|---|
| 5 | Human-in-the-Loop as Architecture | Approval gates, escalation, and auditability |
| 6 | Evaluating and Hardening Agents | Eval harnesses, tracing, reliability, cost, security |
| 7 | When Not to Use Agents | The signature chapter -- judgment over hype |
Part III: Operating -- Production reality
| # | Chapter | Focus |
|---|---|---|
| 8 | Metacognition and Self-Reflection | Loop detection, quality assessment, strategy switching |
| 9 | Deploying and Scaling | Durable execution, observability, autoscaling |
| 10 | Governance and Auditability | Decision traces, compliance boundaries, risk tiers |
| 11 | Security Deep Dive | The Lethal Trifecta, defense in depth, red teaming |
Part IV: Advanced Patterns
| # | Chapter | Focus |
|---|---|---|
| 12 | Memory Management | Session, long-term, shared memory, memory security |
| 13 | Agent Protocols in Production | Enterprise MCP, A2A at scale, AIP delegation chains |
Chapter 1 is available as a free sample. The full book is on Amazon.
Projects¶
Three end-to-end systems built incrementally through the chapters:
- Document Intelligence Agent -- Ingest documents, retrieve evidence, answer with citations, escalate on uncertainty
- Incident Runbook Agent -- Inspect signals, search runbooks, propose remediation, request human approval
- Memory Agent -- Memory-augmented pipeline with session, long-term, and shared memory layers
Evidence¶
- Baseline Eval Report -- Gold dataset evaluation with rubric scoring
- Architecture Comparison -- Workflow vs agent side-by-side metrics
- Trace Examples -- Structured execution traces with token accounting
- Failure Case Studies -- Real failure analysis and lessons learned
Get the book on Amazon | GitHub Repository | sunilprakash.com