v0.0.3 — experimental (here be dragons)

Run agent code.
Get proof it worked.

An agent writes code. Lab runs it in a Cloudflare sandbox and saves a result at a URL. Successful runs include full step data. Failed or aborted runs include the error and reason; per-step detail may be partial.

Open Compose browse examples →

Write a function → run tests → prove it works

const out = await lab.runChain([
  {
    name: "Spec",
    body: `
      return {
        cases: [
          { input: "$1,234.56", expected: 1234.56 },
          { input: "free", expected: null },
          { input: null, expected: null },
        ],
      };
    `,
    capabilities: []
  },
  {
    name: "Implement + test",
    body: `
      function parseAmount(raw) {
        if (!raw) return null;
        const n = parseFloat(String(raw).replace(/[^\\d.\\-]/g, ""));
        return Number.isNaN(n) ? null : Math.round(n * 100) / 100;
      }

      return input.cases.map((c) => {
        const actual = parseAmount(c.input);
        return { ...c, actual, pass: actual === c.expected };
      });
    `,
    capabilities: []
  },
  {
    name: "Verdict",
    body: `
      const passed = input.filter((r) => r.pass).length;
      return {
        score: passed + "/" + input.length,
        verdict: passed === input.length ? "PASS" : "FAIL",
      };
    `,
    capabilities: []
  },
]);
// → saved result JSON proves 3/3 pass. Ship the receipt, not "trust me."

All 6 patterns →

Every run gets a saved result

Proof, not promises

Successful runs include step data, inputs, outputs, and timing. Every run saves canonical JSON for agents and a viewer URL for humans.

Agents can pick up where others left off

Agent A finishes and produces a resultId. Agent B reads /results/:id.json and continues the work. No message queue, no shared database.

Debugging built in

When something fails, the saved result includes the error and reason. Depending on where execution stopped, step detail may be partial. Agents use that result to retry or debug. You use it to understand what happened.

Shareable

Every run saves JSON and a viewer URL. Send /results/:id to a teammate, or feed /results/:id.json to another agent.

Patterns

10/10

Code mode — but verified

Agent writes a function, generates edge cases, runs them all. 10/10 pass — the saved result JSON shows the cases, assertions, and verdict.

Fix

Auto-fix pipeline

A run fails. The saved result includes the error. The agent uses that result to patch the code and rerun.

URL

Multi-agent relay

Agent A researches. Agent B synthesizes. Agent C delivers. Each one picks up where the last left off — one saved result ties the run together.

Diff

Compare before you ship

Old logic vs new logic, same inputs. See exactly what changed before you deploy.

Stress test

Run it 10 times. If run 7 breaks, your instructions are ambiguous. Try with a dumber model to find the floor.

All 6 patterns → Browse runnable examples →

Get started

npm install @acoyfellow/lab

Tutorial

2 minutes to your first run

Docs

API, permissions, reference

Self-host

Your agents, your data