Agent Patterns
Lab is built for agents. These patterns show how agents use sandboxed code and saved results to do real work — not toy demos, but workflows where "it works" becomes a URL you can verify.
Every run saves a result. For successful runs, that result can also serve as the proof artifact, handoff artifact, or review artifact.
Prove It
The core pattern. An agent makes a claim — "this function handles edge cases." Instead of asserting, it proves.
How it works:
- Agent defines the edge cases and expected outputs
- Agent implements the function in a fresh sandbox and runs every case
- Final step asserts every result and returns a verdict
In a successful Prove It run, the result shows the cases, actual outputs, pass flags, and final verdict. That run's JSON is the proof artifact you can share.
const out = await lab.runChain([
{
name: "Spec",
body: `
return {
cases: [
{ input: "$1,234.56", expected: 1234.56 },
{ input: "free", expected: null },
{ input: null, expected: null },
],
};
`,
capabilities: []
},
{
name: "Implement + test",
body: `
function parseAmount(raw) {
if (!raw) return null;
const n = parseFloat(String(raw).replace(/[^\d.\-]/g, ""));
return Number.isNaN(n) ? null : Math.round(n * 100) / 100;
}
return input.cases.map((c) => {
const actual = parseAmount(c.input);
return { ...c, actual, pass: actual === c.expected };
});
`,
capabilities: []
},
{
name: "Verdict",
body: `
const results = input;
return {
verdict: results.every((r) => r.pass) ? "PASS" : "FAIL",
score: results.filter((r) => r.pass).length + "/" + results.length,
};
`,
capabilities: []
},
]);
// Agent's response: "Proof JSON: $LAB_URL/results/<resultId>.json"When to use: Before shipping generated code. Before claiming correctness. Whenever "it works" needs to become "here's the receipt."
Self-Healing Loop
An agent doesn't give up on first failure. It reads the error, diagnoses the problem, applies a fix, and retries — all within one run.
This follows the same principle as closed-loop control systems: measure the output, compare it to what you wanted, feed the error back in.
How it works:
- Step 1 attempts the operation (e.g., parse data)
- Step 2 reads the error and raw input, diagnoses what's wrong
- Step 3 applies a targeted fix based on the diagnosis
- Step 4 validates the repair
In a successful self-healing run, the result shows the failure path, the repair step, and the final outcome. Depending on where execution stops, failed runs may have partial step detail.
const out = await lab.runChain([
{
name: "Try parse",
body: `
try {
return { ok: true, data: JSON.parse(input.raw) };
} catch (e) {
return { ok: false, error: e.message, raw: input.raw };
}
`,
capabilities: []
},
{
name: "Diagnose",
body: `
if (input.ok) return input;
const issues = [];
if (input.raw.match(/\\w+:/)) issues.push("unquoted_keys");
if (input.raw.match(/,\\s*[}\\]]/)) issues.push("trailing_commas");
return {
...input,
issues,
strategy: "apply regex fixes",
};
`,
capabilities: []
},
{
name: "Heal",
body: `
if (input.ok) return input;
let fixed = input.raw;
if (input.issues.includes("unquoted_keys")) {
fixed = fixed.replace(/(\\w+):/g, '"$1":');
}
if (input.issues.includes("trailing_commas")) {
fixed = fixed.replace(/,\\s*([}\\]])/g, '$1');
}
return { ok: true, data: JSON.parse(fixed), healed: true };
`,
capabilities: []
},
]);When to use: Parsing unreliable data. Handling flaky APIs. Any situation where the first attempt might fail and the agent should adapt.
Agent Handoff
Multiple agents, one run. Agent A does research. Agent B synthesizes. Agent C produces the deliverable. No message queue, no shared database — output flows forward automatically.
This is a pipeline architecture: each stage transforms data and passes it to the next.
How it works:
- Each "agent" is a step in the pipeline
- Agent A's output becomes Agent B's input automatically
- Agent C's output is the final result
- In a successful handoff run, the saved result shows what each step contributed
const out = await lab.runChain([
{
name: "Agent A: Research",
body: `
return {
findings: [
{ source: "API", data: { users: 1200, active: 890 } },
{ source: "DB", data: { users: 1198, active: 887 } },
],
note: "Sources roughly agree. Minor discrepancy.",
};
`,
capabilities: []
},
{
name: "Agent B: Reconcile",
body: `
const avg = {
users: Math.round(
input.findings.reduce((s, f) => s + f.data.users, 0) / input.findings.length
),
active: Math.round(
input.findings.reduce((s, f) => s + f.data.active, 0) / input.findings.length
),
};
return {
reconciled: avg,
sources: input.findings.length,
confidence: "high",
};
`,
capabilities: []
},
{
name: "Agent C: Report",
body: `
return {
summary: input.reconciled.active + " active users of " + input.reconciled.users + " total",
confidence: input.confidence,
sources: input.sources,
};
`,
capabilities: []
},
]);When to use: Multi-step workflows where different agents (or different prompts/models) handle different phases. Research-then-synthesize. Gather-then-validate. Plan-then-execute.
Canary Deploy
Old logic vs new logic, same inputs. See exactly what changed before you ship.
Named after the canary release strategy in software deployment, but applied to code an agent wrote.
How it works:
- Step 1 generates test inputs (edge cases, real data)
- Step 2 runs the old logic against all inputs
- Step 3 runs the new logic against the same inputs
- Step 4 diffs the outputs
In a successful canary run, the result shows the inputs where v1 and v2 disagree. An agent (or human) reviews the diffs and decides: ship it, fix it, or roll back.
When to use: Refactoring. Upgrading a dependency. Any time "it should behave the same" needs verification.
Stress Test
You wrote a skill, a prompt, or a pipeline. It works once. But does it work reliably? And does it work because your instructions are clear, or because the model is smart enough to guess what you meant?
This applies the same idea as minimax in game theory: assume the worst case and see if things still hold. In games, that means assuming your opponent plays perfectly. Here, it means assuming the model is as dumb as possible.
How it works:
- Run your pipeline N times, collect results
- Check: did every run produce the same (correct) output?
- If not, read the failing results — they show exactly where the ambiguity hit
- Fix the instructions, run again
- Once it's reliable, try with a smaller/cheaper model to find the floor
const lab = createLabClient({ baseUrl: process.env.LAB_URL });
const results = [];
for (let i = 0; i < 10; i++) {
const out = await lab.runChain([
{
name: "Generate",
body: `
// your skill/prompt logic here
return parseUserInput("$1,234.56");
`,
capabilities: []
},
{
name: "Verify",
body: `
const expected = 1234.56;
return {
run: ${i + 1},
output: input,
pass: input === expected,
};
`,
capabilities: []
},
]);
results.push({
run: i + 1,
pass: out.result?.pass,
resultId: out.resultId,
});
}
const passed = results.filter(r => r.pass).length;
console.log(`${passed}/10 passed`);
// Failed runs have saved-result JSON — read it to see where the output diverged
results.filter(r => !r.pass).forEach(r =>
console.log(`Run ${r.run} failed: $LAB_URL/results/${r.resultId}.json`)
);You can extend this with an evolutionary approach — generate multiple candidate implementations, evaluate each against the same test cases, select the winner by score. One run = one generation. Feed the winner back as the starting point for the next round. This is the same idea behind genetic algorithms: mutate, evaluate, select, repeat.
When to use: Before you trust a skill in production. Before you optimize for cost by switching to a smaller model. Any time "it worked once" isn't good enough.
Combining Patterns
The real power is composition. An agent might:
- Prove It to verify code handles edge cases
- Stress Test to make sure it's reliable, not lucky
- Hand off the results to a review agent
- Compare the new logic against the old before shipping
Each run saves a result. Successful runs can link together into a reviewable story for an agent or human to follow.
Using Patterns from Any Agent
Lab doesn't care what agent calls it. Claude Code, a custom script, a GitHub Action, a cron job — if it can make HTTP requests or use the TypeScript client, it can use these patterns.
TypeScript client:
npm install @acoyfellow/labMCP (Claude Desktop, Cursor):
{
"mcpServers": {
"lab": {
"command": "npx",
"args": ["-y", "@acoyfellow/lab-mcp"],
"env": { "LAB_URL": "https://your-lab.example" }
}
}
}CLI:
npx @acoyfellow/lab-cli chain '[{"body":"return 1+1","capabilities":[]}]'Raw HTTP:
curl -X POST $LAB_URL/run/chain \
-H 'Content-Type: application/json' \
-d '{"steps":[{"body":"return 1+1","capabilities":[]}]}'The resultId in the response identifies the artifact. Agents read /results/:id.json. Humans open /results/:id. The rest is transport.