Deployment patterns from local execution to autonomous systems
We provide here example code that implements a set of agentic capabilities of increasing sophistication, from local execution to governed, autonomous scientific systems.
The code makes use of two complementary agent frameworks:
- LangGraph: LLM reasoning, tool calling, and structured workflows. Handles the intelligence layer.
- Academy: Distributed execution, federation, and HPC integration. Handles the distribution layer.
Why both? Production scientific agents need both capabilities. LangGraph excels at LLM orchestration but runs in a single process. Academy excels at distributed execution but doesn't handle LLM reasoning. Together, they enable intelligent agents that run anywhere—from laptops to federated DOE infrastructure.
See these slides for a brief review of these two systems and of a third, Microsoft Agent Framework.
There are excellent tutorial materials online for LangGraph and for Academy. Here we focus on showing how to use the two systems, independently and together, to realize scalable agentic systems for science.
Building Scientific Agents — Learn to build production scientific agents:
- LLM Agents — Build agents that reason and call tools (LangGraph)
- Distributed Agents — Run agents across machines (Academy)
- Production Agents — Combine both for real deployments
| Additional Resources | |
|---|---|
| LLM Configuration | Configure OpenAI, Ollama, or FIRST backends |
The sample code shows how to implement a series of increasingly sophisticated agentic patterns.
- Local Agent Execution (LangGraph, Academy): Your on-ramp to CAF. Run persistent, stateful agents on a laptop or workstation, with no federation required. Status: Mature
- Federated Agent Execution (LangGraph + Academy): Cross-institutional agent execution under federated identity and policy. Status: Mature
- Parallel Agent Inference (LangGraph, Aegis): Fan out thousands of LLM requests in parallel on HPC. Status: Prototype
- Governed Tool Use (Academy governance): Invoke expensive, stateful, or dangerous tools under proactive policy enforcement. Status: WIP
- Multi-Agent Coordination (Shared state + policy + budgets): Many agents under shared governance, within one institution or across many. Status: Emerging
- Long-Lived Agents (Lifecycle management): Agents that persist for days to months, maintaining state, memory, and goals. Status: Emerging
- Agent Workflows (Dynamic workflow construction): Agents dynamically construct, adapt, and execute scientific workflows. Status: Early
| Level | Meaning |
|---|---|
| Mature | Documented with working examples on this site |
| Prototype | Demonstrated on DOE systems; documentation in progress |
| WIP | Work in progress |
| Emerging | Active development; early adopters welcome |
| Early | Early stage; design and prototyping |
| Stage | Capability | What you can do | CAF Components | Where it runs | Scale | Status |
|---|---|---|---|---|---|---|
| 1 | Local Agent Execution | Run persistent, stateful agents | LangGraph | Laptop, workstation, VM | Single agent | Mature |
| 2 | Federated Agent Execution | Invoke tools under federated identity | LangGraph + Academy | DOE HPC | Multi-resource | Mature |
| 3 | Parallel Agent Inference | Fan out thousands of LLM requests | LangGraph + FIRST | HPC accelerators | O(10³–10⁴) streams | Prototype |
| 4 | Governed Tool Use | Invoke tools under policy enforcement | Academy governance | DOE HPC | O(10²–10³) tools | WIP |
| 5 | Multi-Agent Coordination | Coordinate agents under shared governance | Shared state + policy | Distributed | O(10²–10³) agents | Emerging |
| 6 | Long-Lived Agents | Persistent agents with memory and goals | Lifecycle management | Any | Days–months | Emerging |
| 7 | Agent Workflows | Dynamic workflow construction | Workflow integration | DOE infrastructure | Varies | Early |
Scale notation: O(10²) means "order of 100" (tens to hundreds), O(10³) means "order of 1,000" (hundreds to thousands), etc. These indicate typical operating ranges, not hard limits.
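To make the stage 3 fan-out pattern concrete, here is a minimal sketch of issuing many requests concurrently while capping how many are in flight. It is not the FIRST or LangGraph API: `fake_llm_call`, `fan_out`, `run_fan_out`, and `max_concurrency` are hypothetical names, and the model call is simulated locally with a short sleep.

```python
import asyncio
from typing import List


async def fake_llm_call(prompt: str) -> str:
    """Stand-in for a real inference request (hypothetical; no network)."""
    await asyncio.sleep(0.001)
    return f"response to: {prompt}"


async def fan_out(prompts: List[str], max_concurrency: int = 64) -> List[str]:
    """Run all prompts concurrently with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        # The semaphore blocks here once max_concurrency calls are active.
        async with semaphore:
            return await fake_llm_call(prompt)

    # gather returns results in the same order as the input prompts.
    return list(await asyncio.gather(*(bounded(p) for p in prompts)))


def run_fan_out(n: int, max_concurrency: int = 64) -> List[str]:
    """Convenience wrapper: build n prompts and run them to completion."""
    prompts = [f"prompt {i}" for i in range(n)]
    return asyncio.run(fan_out(prompts, max_concurrency))
```

Calling `run_fan_out(1000)` returns 1,000 responses in prompt order. The concurrency cap is the knob that would be tuned against whatever the inference service can sustain.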
All examples support multiple LLM backends (OpenAI, FIRST, Ollama) and include a mock mode for testing without API keys.
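One way such backend selection with a mock fallback could look is sketched below. This is an illustration only, not the mechanism the examples actually use: the environment variable `EXAMPLE_LLM_BACKEND` and the functions `mock_llm`, `select_backend`, and `make_llm` are hypothetical, and the real clients are left out.

```python
import os
from typing import Callable, Optional

# A backend is modelled as a plain function from prompt to completion text.
LLM = Callable[[str], str]

KNOWN_BACKENDS = {"openai", "first", "ollama", "mock"}


def mock_llm(prompt: str) -> str:
    """Deterministic canned reply so code paths run without API keys."""
    return f"[mock] {prompt}"


def select_backend(name: Optional[str] = None) -> str:
    """Resolve a backend name, defaulting to 'mock' when nothing is set."""
    # EXAMPLE_LLM_BACKEND is a hypothetical variable used only in this sketch.
    chosen = (name or os.environ.get("EXAMPLE_LLM_BACKEND", "mock")).lower()
    if chosen not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {chosen}")
    return chosen


def make_llm(name: Optional[str] = None) -> LLM:
    """Return a callable for the chosen backend."""
    chosen = select_backend(name)
    if chosen == "mock":
        return mock_llm
    # Real clients are deliberately omitted from this sketch.
    raise NotImplementedError(f"wire up a client for backend '{chosen}' here")
```

With this shape, agent logic can be tested against `make_llm("mock")` and switched to a real backend without other code changes.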
| Example | Tech | Description | Key Pattern |
|---|---|---|---|
| AgentsCalculator | LangGraph | Minimal tool-calling agent | @tool decorator |
| AgentsRAG | LangGraph | Retrieval-augmented generation | Vector search |
| AgentsDatabase | LangGraph | Natural language data queries | Pandas integration |
| AgentsAPI | LangGraph | External API calls | PubChem REST API |
| AgentsConversation | LangGraph | Stateful conversation | Short/long-term memory |
| AgentsLangGraph | LangGraph | 5-agent pipeline | StateGraph orchestration |
| AgentsAcademyBasic | Academy | Minimal Academy example | Two-agent messaging |
| AgentsRemoteTools | Academy | Remote tool invocation | Coordinator + ToolProvider |
| AgentsPersistent | Academy | Persistent workflows | Checkpoint and resume |
| AgentsFederated | Academy | Federated collaboration | Cross-institutional (DOE labs) |
| AgentsAcademy | Academy | 5-agent pipeline | Agent-to-agent messaging |
| AgentsAcademyHubSpoke | Academy | Hub-and-spoke pattern | Central orchestrator |
| AgentsAcademyDashboard | Academy | Live progress dashboard | Rich TUI |
| AgentsHybrid | Both | Academy + LangGraph hybrid | Distributed LLM agents |
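The tool-calling pattern behind the simplest examples above can be sketched in plain Python. This is an analogue for illustration, not LangGraph's `@tool` decorator or agent loop: the registry `TOOLS`, the local `tool` decorator, and `mock_model` are hypothetical, and the "model" is a hard-coded parser standing in for an LLM that emits a tool request.

```python
from typing import Callable, Dict

# Registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., float]] = {}


def tool(fn: Callable[..., float]) -> Callable[..., float]:
    """Register a function as a tool (plain-Python analogue for this sketch)."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def multiply(a: float, b: float) -> float:
    return a * b


def mock_model(question: str) -> dict:
    """Pretend model: turn 'x op y' into the tool request an LLM might emit."""
    left, op, right = question.split()
    name = {"+": "add", "*": "multiply"}[op]
    return {"tool": name, "args": {"a": float(left), "b": float(right)}}


def run_agent(question: str) -> float:
    """One reasoning step: ask the model, then dispatch to the named tool."""
    request = mock_model(question)
    return TOOLS[request["tool"]](**request["args"])
```

For example, `run_agent("6 * 7")` dispatches to `multiply` and returns `42.0`.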
| Example | Tech | Description | Key Pattern |
|---|---|---|---|
| AgentsHPCJob | LangGraph | HPC job submission | Batch scheduler lifecycle |
| CharacterizeChemicals | Academy | Molecular properties | LLM-planned RDKit + xTB |
| Example | Tech | Description | Key Pattern |
|---|---|---|---|
| AgentsGovernedTools | LangGraph | Policy enforcement | Budget, rate limits, approval |
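The three controls named in this row (budget, rate limits, approval) can be sketched as a policy check that runs before every tool call. This is an assumption-laden illustration rather than the AgentsGovernedTools implementation: `ToolPolicy`, `PolicyViolation`, and `governed_call` are hypothetical names, and cost units are arbitrary.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class PolicyViolation(Exception):
    """Raised when a tool call is denied by policy."""


@dataclass
class ToolPolicy:
    budget: float                              # cost units still available
    max_calls_per_window: int                  # rate limit
    window_seconds: float = 60.0
    approval_threshold: float = float("inf")   # cost at or above this needs approval
    approver: Optional[Callable[[str, float], bool]] = None
    _call_times: List[float] = field(default_factory=list)

    def check(self, tool_name: str, cost: float) -> None:
        """Raise PolicyViolation if the call is not allowed; otherwise commit it."""
        now = time.monotonic()
        # Keep only the calls that fall inside the sliding window.
        self._call_times = [t for t in self._call_times
                            if now - t < self.window_seconds]
        if len(self._call_times) >= self.max_calls_per_window:
            raise PolicyViolation("rate limit exceeded")
        if cost > self.budget:
            raise PolicyViolation("budget exhausted")
        if cost >= self.approval_threshold:
            if self.approver is None or not self.approver(tool_name, cost):
                raise PolicyViolation("approval denied")
        # All checks passed: charge the budget and record the call.
        self.budget -= cost
        self._call_times.append(now)


def governed_call(policy: ToolPolicy, tool: Callable, cost: float, *args, **kwargs):
    """Enforce policy first, then invoke the tool."""
    policy.check(tool.__name__, cost)
    return tool(*args, **kwargs)
```

Checking before the call, rather than auditing afterwards, is what makes the enforcement proactive: a denied call never reaches the tool and never consumes budget.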
| Example | Tech | Description | Key Pattern |
|---|---|---|---|
| AgentsCoordination | LangGraph | Shared resources | Budget, blackboard, claims |
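A shared budget, a blackboard, and task claims can be illustrated with a single lock-protected object that several agents use at once. The sketch below is an in-process stand-in under stated assumptions, not the AgentsCoordination code: `SharedState`, `worker`, and `run_demo` are hypothetical names, and threads play the role of agents.

```python
import threading
from typing import Any, Dict, List


class SharedState:
    """Lock-protected budget, blackboard, and claim table (hypothetical)."""

    def __init__(self, budget: int) -> None:
        self._lock = threading.Lock()
        self._budget = budget
        self._blackboard: Dict[str, Any] = {}
        self._claims: Dict[str, str] = {}

    def claim(self, task: str, agent: str) -> bool:
        """Atomically claim a task; only the first claimant succeeds."""
        with self._lock:
            if task in self._claims:
                return False
            self._claims[task] = agent
            return True

    def spend(self, amount: int) -> bool:
        """Deduct from the shared budget if enough remains."""
        with self._lock:
            if amount > self._budget:
                return False
            self._budget -= amount
            return True

    def post(self, key: str, value: Any) -> None:
        with self._lock:
            self._blackboard[key] = value

    def snapshot(self) -> Dict[str, Any]:
        with self._lock:
            return {"budget": self._budget,
                    "claims": dict(self._claims),
                    "blackboard": dict(self._blackboard)}


def worker(state: SharedState, agent: str, tasks: List[str]) -> None:
    for task in tasks:
        # A task claimed after the budget runs out stays claimed but unfinished.
        if state.claim(task, agent) and state.spend(1):
            state.post(task, f"done by {agent}")


def run_demo(num_agents: int = 4, num_tasks: int = 10, budget: int = 6) -> Dict[str, Any]:
    state = SharedState(budget)
    tasks = [f"task-{i}" for i in range(num_tasks)]
    threads = [threading.Thread(target=worker, args=(state, f"agent-{i}", tasks))
               for i in range(num_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.snapshot()
```

With the defaults, every task is claimed exactly once, but only six are completed because the shared budget runs out. Which agent wins each claim depends on thread scheduling.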
| Example | Tech | Description | Key Pattern |
|---|---|---|---|
| AgentsCheckpoint | LangGraph | Persistent workflows | Checkpoint/resume |
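The checkpoint/resume pattern can be sketched with a JSON file that records the next step to run. This is a minimal illustration, not the AgentsCheckpoint implementation or LangGraph's checkpointer API: `save_checkpoint`, `run_steps`, `demo`, and the `crash_before` parameter are hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional, Tuple


def save_checkpoint(state: dict, path: str) -> None:
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_steps(steps: List[Callable[[int], int]], path: str,
              crash_before: Optional[int] = None) -> List[int]:
    """Run steps in order, checkpointing after each; resume if a checkpoint exists."""
    state = {"next_step": 0, "results": []}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # pick up after the last completed step
    for i in range(state["next_step"], len(steps)):
        if crash_before == i:
            raise RuntimeError(f"simulated crash before step {i}")
        state["results"].append(steps[i](i))
        state["next_step"] = i + 1
        save_checkpoint(state, path)
    return state["results"]


def demo() -> Tuple[List[int], List[int]]:
    """Crash midway, then resume; return the results and the steps actually executed."""
    executed: List[int] = []

    def step(i: int) -> int:
        executed.append(i)
        return i * 10

    path = os.path.join(tempfile.mkdtemp(), "workflow.json")
    steps = [step] * 4
    try:
        run_steps(steps, path, crash_before=2)
    except RuntimeError:
        pass  # the first run completed steps 0 and 1 before failing
    return run_steps(steps, path), executed
```

In `demo()`, the second run starts at step 2, so each step executes exactly once across both runs. Long-lived agents would checkpoint richer state, but the recovery logic has the same shape.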
| Example | Tech | Description | Key Pattern |
|---|---|---|---|
| AgentsWorkflow | LangGraph | Dynamic DAG construction | Adaptive execution |
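Dynamic DAG construction means the set of tasks is not fixed up front: a running task can add new tasks based on what it finds. The sketch below shows that idea with a small hand-written scheduler. It is not the AgentsWorkflow code: `DynamicWorkflow` and `build_example` are hypothetical names, and tasks run sequentially for simplicity.

```python
from typing import Any, Callable, Dict, Iterable, List, Set, Tuple


class DynamicWorkflow:
    """Tiny DAG runner whose tasks may add further tasks while it runs."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Tuple[Set[str], Callable[["DynamicWorkflow"], Any]]] = {}
        self.results: Dict[str, Any] = {}
        self.order: List[str] = []

    def add(self, name: str, fn: Callable[["DynamicWorkflow"], Any],
            deps: Iterable[str] = ()) -> None:
        self.tasks[name] = (set(deps), fn)

    def run(self) -> Dict[str, Any]:
        while len(self.results) < len(self.tasks):
            # Ready = not yet run, and every dependency already has a result.
            ready = sorted(name for name, (deps, _) in self.tasks.items()
                           if name not in self.results
                           and deps.issubset(self.results))
            if not ready:
                raise RuntimeError("cycle or missing dependency")
            for name in ready:
                _, fn = self.tasks[name]
                self.results[name] = fn(self)  # fn may call self.add(...)
                self.order.append(name)
        return self.results


def build_example() -> DynamicWorkflow:
    wf = DynamicWorkflow()
    wf.add("load", lambda w: [3, 8, 5])

    def screen(w: DynamicWorkflow) -> List[int]:
        hits = [x for x in w.results["load"] if x > 4]
        # Adapt at run time: one follow-up task per hit.
        for x in hits:
            w.add(f"refine-{x}", lambda w, x=x: x * x, deps=["screen"])
        return hits

    wf.add("screen", screen, deps=["load"])
    return wf
```

Running `build_example().run()` starts with two tasks and finishes with four, because `screen` adds a `refine` task for each value above the threshold.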
Thanks to Kyle Chard, Yadu Babuji, Ian Foster, Suman Raj, and others for material and feedback.