Ratchet

Agentic optimization loop with high watermarking. Ratchet uses Claude to iteratively improve a single file (the "lever") against a scorer, accepting only changes that beat the current high watermark. Like a ratchet wrench — it only moves forward.

How It Works

┌──────────┐     ┌───────┐     ┌────────┐     ┌──────────┐
│  Agent   │────▶│ Lever │────▶│ Scorer │────▶│ Watermark│
│ (Claude) │     │ (file)│     │ (sh)   │     │  check   │
└──────────┘     └───────┘     └────────┘     └──────────┘
     ▲                                            │
     │            ✓ kept (score > watermark)       │
     └────────────────────────────────────────────┘
                  ✗ discarded (rollback)

Each iteration, Claude proposes a single targeted change to the lever file. The scorer evaluates the result and produces a numeric score. If the score exceeds the current watermark by at least --min-delta, the change is kept and the watermark ratchets up. Otherwise, the change is rolled back. Progress is logged and snapshots are saved for every accepted iteration.
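
The keep-or-rollback rule can be sketched as a small pure function. This is an illustration only, not Ratchet's source: the names `shouldKeep` and `ratchetStep` are hypothetical, and reading "exceeds by at least --min-delta" as `score - watermark >= minDelta` is an assumption about how the comparison is made.

```typescript
// Hypothetical sketch of the keep-or-rollback decision; not Ratchet's actual code.
interface StepResult {
  kept: boolean;
  watermark: number; // watermark after this iteration
  delta: number; // score minus the watermark before this iteration
}

// Assumption: "exceeds by at least --min-delta" means score - watermark >= minDelta.
function shouldKeep(score: number, watermark: number, minDelta: number): boolean {
  return score - watermark >= minDelta;
}

function ratchetStep(score: number, watermark: number, minDelta: number): StepResult {
  const kept = shouldKeep(score, watermark, minDelta);
  return {
    kept,
    // The watermark only ever moves up: a discarded change leaves it untouched.
    watermark: kept ? score : watermark,
    delta: score - watermark,
  };
}
```

With a watermark of 0.72 and a min delta of 0.001, a score of 0.74 would be kept and become the new watermark, while a later score of 0.735 would be discarded.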

Installation

Prerequisites

Bun v1.0+
An Anthropic API key

From Source

git clone git@github.com:hackerrdave/ratchet.git
cd ratchet
bun install

Run directly with Bun:

bun run src/cli.ts <command>

Or compile to a standalone binary:

bun run build
./dist/ratchet <command>

Cross-Platform Builds

bun run build:all
# Produces:
#   dist/ratchet-macos    (darwin-arm64)
#   dist/ratchet-linux    (linux-x64)
#   dist/ratchet-windows.exe

Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | Yes | Your Anthropic API key |

`ratchet init`

Run ratchet init in your project root to set everything up interactively. It walks you through:

1. Scorer type: how you'll measure improvement.
   - Objective: a test suite, build script, latency benchmark, or any command that outputs a score
   - Labeled: you label example pairs (match/no-match), and Ratchet builds a scorer from them
   - Live signal: a metrics endpoint or DB query that returns a live score
   - LLM judge: uses a model to evaluate outputs (weakest anchor; scores may drift)
2. Goal: one sentence describing what you're optimizing for
3. Lever: the file path Claude is allowed to modify (e.g., prompts/classifier.md)
4. Constraints: rules Claude must follow (e.g., "Must remain under 2000 tokens")
5. Context files: additional files Claude should read for background
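
For the Labeled scorer type, one plausible way to turn labeled pairs into a score is plain accuracy: the fraction of examples where a prediction agrees with the label. The sketch below assumes that interpretation. The `LabeledExample` shape, the `Predictor` type, and the `accuracy` function are all hypothetical and are not taken from Ratchet's source.

```typescript
// Hypothetical shape for one labeled pair; Ratchet's real format may differ.
interface LabeledExample {
  input: string;
  label: "match" | "no-match";
}

type Predictor = (input: string) => "match" | "no-match";

// Fraction of examples where the prediction agrees with the label (higher = better).
function accuracy(examples: LabeledExample[], predict: Predictor): number {
  if (examples.length === 0) return 0; // no data: report the lowest score
  let correct = 0;
  for (const example of examples) {
    if (predict(example.input) === example.label) correct += 1;
  }
  return correct / examples.length;
}
```

A score built this way stays in the range 0 to 1 and rises only when more labeled examples are classified correctly, which fits the "higher is better" contract a scorer has to satisfy.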

Named Ratchets

You can run multiple independent ratchets in the same project using --name:

ratchet init --name classifier
ratchet init --name system-prompt
ratchet init --name config

ratchet start --name classifier
ratchet watch --name system-prompt
ratchet log --name config

Each named ratchet gets its own spec, scorer, watermark, and history. Omitting --name uses default.
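
As a rough illustration of how a name maps to per-ratchet state, the helper below builds the paths listed under Generated Files. The function `ratchetPaths` and the `RatchetPaths` type are hypothetical; only the `ratchet/<name>/` layout and the `default` fallback come from this README.

```typescript
// Hypothetical helper mirroring the ratchet/<name>/ layout described in this README.
interface RatchetPaths {
  dir: string;
  spec: string;
  scorer: string;
  watermark: string;
  progressLog: string;
  learnings: string;
  best: string;
  snapshots: string;
}

// Omitting the name falls back to "default", matching the CLI behavior described above.
function ratchetPaths(ratchetName: string = "default"): RatchetPaths {
  const dir = `ratchet/${ratchetName}`;
  return {
    dir,
    spec: `${dir}/RATCHET.md`,
    scorer: `${dir}/scorer.sh`,
    watermark: `${dir}/watermark.txt`,
    progressLog: `${dir}/progress.log`,
    learnings: `${dir}/learnings.md`,
    best: `${dir}/best`,
    snapshots: `${dir}/snapshots`,
  };
}
```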

Generated Files

| Path | Purpose |
|---|---|
| `ratchet/<name>/RATCHET.md` | Optimization spec: goal, lever, constraints, context |
| `ratchet/<name>/scorer.sh` | Shell script that outputs a single float (higher = better) |
| `ratchet/<name>/watermark.txt` | Current high watermark score |
| `ratchet/<name>/progress.log` | JSONL log of every iteration |
| `ratchet/<name>/learnings.md` | Tactical learnings extracted from runs |
| `ratchet/<name>/best/` | Latest best version of the lever |
| `ratchet/<name>/snapshots/` | Snapshot of the lever at each kept iteration |
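
Because `progress.log` is JSONL (one JSON object per line), it can be read with a few lines of code. The sketch below is an assumption-heavy example: the field names (`iteration`, `status`, `score`, `delta`, `summary`) are guesses based on the columns that `ratchet log` prints, not a documented schema, and `parseProgressLog` and `countKept` are hypothetical helpers.

```typescript
// Assumed record shape; the real progress.log schema is not documented here.
interface ProgressEntry {
  iteration: number;
  status: "kept" | "discarded";
  score: number;
  delta: number;
  summary: string;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseProgressLog(text: string): ProgressEntry[] {
  const entries: ProgressEntry[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    entries.push(JSON.parse(trimmed) as ProgressEntry);
  }
  return entries;
}

// Count how many iterations moved the watermark.
function countKept(entries: ProgressEntry[]): number {
  return entries.filter((entry) => entry.status === "kept").length;
}
```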

`RATCHET.md`

The generated spec file looks like this:

# Goal
Improve prompt accuracy for product-recall matching

# Lever
The file at prompts/classifier.md is the only thing you may change.
One targeted improvement per iteration. Do not rewrite wholesale.

# Constraints
- Must remain under 2000 tokens
- Must not remove the routing section

# Context
- Read TOPOLOGY.md for codebase structure
- The scorer measures how well the lever achieves the goal above
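
Since the spec is organized under top-level headings, it is straightforward to split into sections programmatically. The sketch below is one hypothetical way to do that; `parseSpec` is not part of Ratchet, and it assumes every section starts with a line of the form `# Heading`.

```typescript
// Split a RATCHET.md-style document into sections keyed by "# Heading" text.
// Hypothetical helper; assumes sections are introduced by single-# headings.
function parseSpec(markdown: string): Record<string, string[]> {
  const sections: Record<string, string[]> = {};
  let bucket: string[] | null = null;
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line.indexOf("# ") === 0) {
      // Start a new section named after the heading text.
      bucket = [];
      sections[line.slice(2).trim()] = bucket;
    } else if (bucket !== null && line !== "") {
      // Non-empty lines belong to the most recent heading.
      bucket.push(line);
    }
  }
  return sections;
}
```

Applied to the spec above, this would yield four keys (`Goal`, `Lever`, `Constraints`, `Context`), each holding the non-empty lines beneath its heading.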

`scorer.sh`

Your scorer must output a single float to stdout. Higher is better. Example:

#!/bin/bash
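# Run the test suite once and report the pass rate as a float in [0, 1]
OUTPUT=$(bun test 2>&1)
# Pull counts from the "N pass" / "N fail" summary lines (-E works on GNU and BSD grep)
PASSED=$(echo "$OUTPUT" | grep -oE '[0-9]+ pass' | grep -oE '[0-9]+' | tail -1)
FAILED=$(echo "$OUTPUT" | grep -oE '[0-9]+ fail' | grep -oE '[0-9]+' | tail -1)
PASSED=${PASSED:-0}; FAILED=${FAILED:-0}
# Avoid dividing by zero when no tests ran
if [ $((PASSED + FAILED)) -eq 0 ]; then echo "0"; exit 0; fi
echo "scale=4; $PASSED / ($PASSED + $FAILED)" | bc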
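
To make the "single float on stdout" contract concrete, here is a sketch of a strict validator for scorer output. It is an assumption about what a reasonable check looks like, not Ratchet's actual parsing; `parseScore` is a hypothetical name. Note that `bc` prints values below 1 without a leading zero (for example `.7500`), so the pattern accepts that form.

```typescript
// Hypothetical validator for scorer output; Ratchet's real parsing may differ.
function parseScore(stdout: string): number {
  const text = stdout.trim();
  // Accept only a plain decimal such as "0.75", ".75", "-1", or "3.2e-4".
  const floatPattern = /^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$/;
  if (!floatPattern.test(text)) {
    throw new Error(`scorer must print a single float, got: ${JSON.stringify(text)}`);
  }
  return Number(text);
}
```

Rejecting anything other than a bare number keeps stray log lines or error messages from being misread as a score.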

Usage

All commands accept --name <name> to target a specific ratchet. Defaults to default.

Start an Optimization Run

# Run 20 iterations with defaults
ratchet start

# Target a specific ratchet
ratchet start --name classifier

# Customize the run
ratchet start --name classifier --iterations 50 --min-delta 0.01 --model claude-haiku-4-5-20251001

| Option | Default | Description |
|---|---|---|
| `--name <name>` | `default` | Which ratchet to run |
| `-n, --iterations <n>` | `20` | Number of iterations to run |
| `--min-delta <delta>` | `0.001` | Minimum score improvement to accept a change |
| `--model <model>` | `claude-haiku-4-5-20251001` | Claude model to use |
| `--schedule <cron>` | (none) | Cron schedule for recurring runs (coming soon) |

Output looks like:

Starting ratchet loop (classifier)
  Lever: prompts/classifier.md
  Model: claude-haiku-4-5-20251001
  Iterations: 20
  Min delta: 0.001
  Current watermark: 0.72

[1/20] Running agent... scoring... ✓ kept  score=0.7400 (+0.0200) — Added few-shot example for edge case
[2/20] Running agent... scoring... ✗ disc  score=0.7350 (-0.0050) — Restructured output format
[3/20] Running agent... scoring... ✓ kept  score=0.7650 (+0.0250) — Clarified matching criteria
...

Done. 8 kept, 12 discarded. Final watermark: 0.8420
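
The closing summary line can be reproduced by replaying a list of scores against the watermark. The sketch below does this for the three iterations shown in the sample output, so its totals differ from the full 20-iteration run. `replayRun` and `RunSummary` are hypothetical names, and the acceptance test `score - watermark >= minDelta` is an assumption.

```typescript
// Hypothetical replay of a run; the acceptance rule is an assumption.
interface RunSummary {
  kept: number;
  discarded: number;
  watermark: number;
}

function replayRun(scores: number[], startWatermark: number, minDelta: number): RunSummary {
  let watermark = startWatermark;
  let kept = 0;
  let discarded = 0;
  for (const score of scores) {
    if (score - watermark >= minDelta) {
      watermark = score; // accepted: the watermark ratchets up
      kept += 1;
    } else {
      discarded += 1; // rejected: the lever would be rolled back
    }
  }
  return { kept, discarded, watermark };
}

// Scores from iterations 1-3 of the sample output, starting at watermark 0.72.
const runSummary = replayRun([0.74, 0.735, 0.765], 0.72, 0.001);
console.log(
  `Done. ${runSummary.kept} kept, ${runSummary.discarded} discarded. ` +
    `Final watermark: ${runSummary.watermark.toFixed(4)}`,
);
// prints "Done. 2 kept, 1 discarded. Final watermark: 0.7650"
```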

Watch Live Progress

ratchet watch --name classifier

Displays a live staircase chart in the terminal showing score progression, watermark level, and accept/discard history. Refreshes every 2 seconds.
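
A staircase view of this kind can be approximated in plain text by drawing the watermark level after each iteration. The sketch below is a simplified stand-in, not the renderer `ratchet watch` uses: `renderStaircase` and `ChartPoint` are hypothetical, and it assumes scores fall between 0 and 1 so that a bar can be scaled to a fixed width.

```typescript
// Hypothetical text renderer; assumes scores lie in the range [0, 1].
interface ChartPoint {
  score: number;
  kept: boolean;
}

function renderStaircase(points: ChartPoint[], startWatermark: number, width: number = 40): string[] {
  const lines: string[] = [];
  let watermark = startWatermark;
  points.forEach((point, index) => {
    // Only kept iterations raise the step; discarded ones repeat the previous level.
    if (point.kept) watermark = point.score;
    const barLength = Math.min(width, Math.max(0, Math.round(watermark * width)));
    const mark = point.kept ? "+" : "-";
    lines.push(`${index + 1} ${mark} ${"#".repeat(barLength)} ${watermark.toFixed(4)}`);
  });
  return lines;
}
```

Each row shows the iteration number, whether the change was kept (`+`) or discarded (`-`), a bar for the watermark, and its value, so flat runs of identical bars correspond to discarded iterations.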

View Iteration History

ratchet log --name classifier

   #  Status      Score     Delta  Timestamp                 Summary
────────────────────────────────────────────────────────────────────────────
   1  ✓ kept     0.7400   +0.0200  2026-03-10T12:00:00.000Z  Added few-shot example
   2  ✗ disc     0.7350   -0.0050  2026-03-10T12:01:30.000Z  Restructured output format
   3  ✓ kept     0.7650   +0.0250  2026-03-10T12:03:00.000Z  Clarified matching criteria

3 iterations, 2 kept, 1 discarded

Compare Iterations

# Diff the lever between two kept iterations
ratchet diff 1 3 --name classifier

Shows a unified diff of the lever file between any two snapshots.

Inspect a Specific Iteration

ratchet show 3 --name classifier

Restore a Previous State

# Roll the lever back to iteration 3's state
ratchet checkout 3 --name classifier

View Learnings

After each run, Ratchet extracts tactical learnings from the progress log — what worked, what didn't, and why. These are fed back to the agent on subsequent runs.

ratchet learnings --name classifier

## What Works
- **Few-shot examples outperform explicit rules** — iterations 1,3 tried verbose instructions
  and scored lower; iteration 4 added a single worked example and jumped +0.0234

## What Doesn't
- **Explicit classification criteria degrade accuracy** — prescriptive rules overconstrain the model

## Tactics
- Use 2-3 worked examples covering diverse edge cases
- Preserve JSON-only output constraint in all changes

Learnings are stored in ratchet/<name>/learnings.md and updated after each run.

Pause and Resume

# Pause a running optimization (checked between iterations)
ratchet pause --name classifier

# Resume where you left off
ratchet resume --name classifier
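
The phrase "checked between iterations" implies that a pause request never interrupts an iteration that is already running. A loop with that property might look like the sketch below. Everything here is hypothetical (`runUntilPaused`, `LoopState`, and the callback parameters), and how Ratchet actually signals or persists a pause is not covered by this README.

```typescript
// Hypothetical loop showing a pause flag that is checked only between iterations.
interface LoopState {
  nextIteration: number; // 1-based index of the next iteration to run
  paused: boolean;
}

function runUntilPaused(
  state: LoopState,
  totalIterations: number,
  isPauseRequested: () => boolean,
  runIteration: (iteration: number) => void,
): LoopState {
  let next = state.nextIteration;
  while (next <= totalIterations) {
    // The check happens before an iteration starts, never in the middle of one.
    if (isPauseRequested()) return { nextIteration: next, paused: true };
    runIteration(next);
    next += 1;
  }
  return { nextIteration: next, paused: false };
}
```

Resuming then amounts to calling the loop again with the returned state, so the run picks up at the first iteration that had not started.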

Example: Optimizing a Classifier Prompt

# 1. Set up your project
mkdir -p my-project/prompts && cd my-project
echo "Classify the product..." > prompts/classifier.md

# 2. Write a scorer
cat > scorer.sh << 'EOF'
#!/bin/bash
python3 eval.py prompts/classifier.md | tail -1
EOF
chmod +x scorer.sh

# 3. Initialize ratchet
export ANTHROPIC_API_KEY="sk-ant-..."
ratchet init --name classifier
# → Select "Objective" scorer type
# → Goal: "Improve classification accuracy"
# → Lever: prompts/classifier.md

# 4. Run the optimization loop
ratchet start --name classifier --iterations 30

# 5. Watch it climb
ratchet watch --name classifier

Project Structure

your-project/
├── prompts/
│   └── classifier.md           # The lever (file being optimized)
└── ratchet/
    ├── classifier/
    │   ├── RATCHET.md          # Optimization spec
    │   ├── scorer.sh           # Scoring script
    │   ├── watermark.txt       # Current high watermark
    │   ├── progress.log        # JSONL iteration history
    │   ├── learnings.md        # Tactical learnings from runs
    │   ├── best/
    │   │   └── classifier.md   # Best lever state so far
    │   └── snapshots/
    │       ├── 1/
    │       ├── 3/
    │       └── ...             # One per kept iteration
    └── system-prompt/
        ├── RATCHET.md
        ├── scorer.sh
        └── ...                 # Another independent ratchet

License

MIT
