Run agent code.
Get proof it worked.
An agent writes code. Lab runs it in a Cloudflare sandbox and saves a result at a URL. Successful runs include full step data. Failed or aborted runs include the error and reason; per-step detail may be partial.
Write a function → run tests → prove it works
const out = await lab.runChain([
{
name: "Spec",
body: `
return {
cases: [
{ input: "$1,234.56", expected: 1234.56 },
{ input: "free", expected: null },
{ input: null, expected: null },
],
};
`,
capabilities: []
},
{
name: "Implement + test",
body: `
function parseAmount(raw) {
if (!raw) return null;
const n = parseFloat(String(raw).replace(/[^\\d.\\-]/g, ""));
return Number.isNaN(n) ? null : Math.round(n * 100) / 100;
}
return input.cases.map((c) => {
const actual = parseAmount(c.input);
return { ...c, actual, pass: actual === c.expected };
});
`,
capabilities: []
},
{
name: "Verdict",
body: `
const passed = input.filter((r) => r.pass).length;
return {
score: passed + "/" + input.length,
verdict: passed === input.length ? "PASS" : "FAIL",
};
`,
capabilities: []
},
]);
// → saved result JSON proves 3/3 pass. Ship the receipt, not "trust me."Every run gets a saved result
Successful runs include step data, inputs, outputs, and timing. Every run saves canonical JSON for agents and a viewer URL for humans.
Agent A finishes and produces a resultId. Agent B reads /results/:id.json and continues the work. No message queue, no shared database.
When something fails, the saved result includes the error and reason. Depending on where execution stopped, step detail may be partial. Agents use that result to retry or debug. You use it to understand what happened.
Every run saves JSON and a viewer URL. Send /results/:id to a teammate, or feed /results/:id.json to another agent.
Patterns
Agent writes a function, generates edge cases, runs them all. 10/10 pass — the saved result JSON shows the cases, assertions, and verdict.
A run fails. The saved result includes the error. The agent uses that result to patch the code and rerun.
Agent A researches. Agent B synthesizes. Agent C delivers. Each one picks up where the last left off — one saved result ties the run together.
Old logic vs new logic, same inputs. See exactly what changed before you deploy.
Run it 10 times. If run 7 breaks, your instructions are ambiguous. Try with a dumber model to find the floor.
Get started
npm install @acoyfellow/lab