Stop iterating with the model after implementation. Define what you'd accept, run two commands, ship it.
# Claude Code (primary)
/plugin marketplace add doodledood/manifest-dev
/plugin install manifest-dev@manifest-dev-marketplace
# Gemini CLI — everything (skills, agents, hooks)
curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/gemini/install.sh | bash
# OpenCode — everything (skills, agents, commands, plugin)
curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/opencode/install.sh | bash
# Codex CLI — everything (skills, TOML stubs, rules, config)
curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/codex/install.sh | bashThen use it:
/define <what you want to build>
/do <manifest-path>
# Or go end-to-end autonomously:
/auto <what you want to build>Two commands. /define interviews you and builds a manifest. /do executes it. That's the whole workflow. /auto chains both — define autonomously, auto-approve, execute — in a single command.
Control interview depth with --interview minimal|autonomous|thorough (default: thorough). Thorough asks everything. Minimal asks scope and high-impact items. Autonomous builds the manifest without asking, presents it for approval.
If you use zsh and want easy upgrade commands for the non-Claude distributions, add this to ~/.zshrc:
alias upgrade-manifest-dev-codex='curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/codex/install.sh | bash'
alias upgrade-manifest-dev-gemini='curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/gemini/install.sh | bash'
alias upgrade-manifest-dev-opencode='curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/opencode/install.sh | bash'
alias upgrade-manifest-dev-all='upgrade-manifest-dev-codex && upgrade-manifest-dev-gemini && upgrade-manifest-dev-opencode'Then run source ~/.zshrc once. Future updates are just upgrade-manifest-dev-codex, upgrade-manifest-dev-gemini, upgrade-manifest-dev-opencode, or upgrade-manifest-dev-all for all three.
Uninstall uses the same entrypoints:
curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/gemini/install.sh | bash -s -- uninstall
curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/opencode/install.sh | bash -s -- uninstall
curl -fsSL https://raw.githubusercontent.com/doodledood/manifest-dev/main/dist/codex/install.sh | bash -s -- uninstallInstead of telling the AI how to build something, you tell it what you'd accept.
Say you need a login page. The old way: "use React Hook Form, validate with Zod, show inline errors, disable the button while submitting." You've made every design decision upfront. The manifest way: "invalid credentials show an error without clearing the password field" and "the form can't be submitted twice." You define the bar. The AI picks how to clear it. Automated verification confirms it did.
flowchart TD
A["/define 'task'"] --> B["Interview"]
B --> C["Manifest file"]
C --> D["/do manifest.md"]
D --> E{"For each Deliverable"}
E --> F["Satisfy ACs"]
F --> G["/verify"]
G -->|failures| H["Fix specific criterion"]
H --> G
G -->|all pass| I["/done"]
E -->|risk detected| J["Consult trade-offs, adjust approach"]
J -->|ACs achievable| E
J -->|stuck| K["/escalate"]
/define interviews you to surface what you actually want. The stuff you'd reject in a PR but wouldn't think to specify upfront. Then /do implements toward those acceptance criteria, flexible on how but not on what.
After each deliverable, /verify runs automated checks against every criterion. Failing checks say exactly what's wrong. The AI fixes what failed, only what failed, and the loop continues until everything passes or a blocker needs your attention.
Your first pass lands closer to done. Issues get caught by verification before you see them, and the fix loop handles cleanup without your involvement. Every acceptance criterion has been verified, and you know what was checked.
While one manifest executes, you can define the next. The define phase is where your judgment matters; the do-verify-fix phase runs on its own. Writing acceptance criteria also forces you to stay engaged with your own code, which matters when heavy AI usage starts making your codebase feel foreign.
Resist the urge to intervene during /do. It won't nail everything on the first pass. That's expected. You invested in define; let the loop run.
If you've burned out on the weekly "game-changing AI coding tool" cycle and just want something that works, this is for you. Experienced developers who care more about output quality than execution speed. People who've learned the hard way that AI-generated code needs guardrails more than cheerleading.
We build around how LLMs actually work, not how we wish they worked. That means investing upfront for better results, not optimizing for token cost or raw speed. If you're counting every cent per token or want the fastest possible output regardless of quality, this probably isn't your thing.
Everything below is reference. You don't need any of it to get started.
The problem this solves
You plan a feature with the agent. It implements. The code looks reasonable. Then you review it and half the things aren't how you'd want them: wrong error handling patterns, conventions ignored, edge cases skipped. You send it back. It fixes some things, breaks others. Two or three rounds later you're satisfied, but you've spent more time reviewing and iterating than you saved.
The models can code. But we're throwing them into deep water without defining what "done" actually means. So the review-iterate loop eats the productivity gains.
Manifest-dev front-loads that review energy into /define. You spell out acceptance criteria and invariants before implementation starts. The do phase becomes mechanical, and the output lands closer to what you'd accept as a reviewer.
Why this works (LLM first principles)
LLMs are goal-oriented pattern matchers trained through reinforcement learning, not general reasoners. Clear acceptance criteria play to that strength. Rigid step-by-step plans fall apart because neither you nor the model can predict every detail upfront. Acceptance criteria focus on outcomes and leave implementation open.
There's also the drift problem. Long sessions cause the model to lose track of earlier instructions. The manifest compensates with external state and verification that catches drift before it ships. And since LLMs can't express genuine uncertainty (they'll confidently produce broken code), the verify-fix loop doesn't rely on the AI knowing it failed. It relies on automated checks catching failures.
These are design constraints, and the workflow treats them that way.
Process Guidance and Approach
The manifest also supports Process Guidance and an initial Approach (architecture, execution order). These are exactly what they sound like: recommendations, not requirements. Hints to help the model make better decisions while it's still not AGI. The acceptance criteria are the contract; the guidance is optimization on top.
This is spec-driven development adapted for LLM execution. The manifest is a spec, but ephemeral: it drives one task, then the code is the source of truth. No spec maintenance. No drift.
Execution Modes
/do supports --mode efficient|balanced|thorough to control verification intensity. Default is thorough (current behavior). Only pass --mode when you explicitly want to trade verification depth for quota savings.
| Mode | What changes | When to use |
|---|---|---|
| thorough (default) | Full verification: all quality gates, all models inherit from session, unlimited parallelism and fix loops | Most tasks. Don't change unless you have a reason. |
| balanced | Same models, but limits parallelism (max 4 concurrent verifiers) and fix loops (max 2) | Long-running tasks where you want to limit concurrent quota burn |
| efficient | Uses haiku for criteria-checker, skips quality gate reviewers, sequential verification, max 1 fix loop | Quick iterations where speed matters more than verification depth |
Set per-execution: /do manifest.md --mode balanced
Set in manifest: add mode: balanced to the Intent & Context section.
Run /define and /do in separate sessions. The define session holds your intent; the do session holds implementation state. Keep both open.
When /do finishes and something's off: a missed edge case, a reviewer comment, a bug you didn't anticipate. Don't patch it ad hoc. Go back to the define session. Encode the issue as an acceptance criterion in the manifest. Then re-run /do against the updated manifest in the do session.
This closes the loop properly. The fix gets the same verification treatment as everything else. The manifest stays the single source of truth for what "done" means. And if something regresses on a later pass, the criterion catches it.
Example: You ship a login feature. A reviewer flags that error messages leak whether an email exists in the system.
- Define session: add
[AC-2.4] Authentication errors return a generic message regardless of whether the account existswith a verification method - Do session: run
/doagainst the updated manifest /verifyconfirms the fix. It will also catch it if it regresses in a future change.
Every round trip through the manifest grows your verification surface. Bug fixes and late requirements become checked criteria. The manifest accumulates what "done" means for this task, and nothing falls through because you fixed it outside the loop.
The do session doesn't need to remember the define conversation. The manifest is external state. Run /do in a fresh session after /define, or at minimum /compact before starting.
The interview classifies your task (Code, Writing, Document, Blog, Research) and loads task-specific guidance. It probes for your latent criteria, the standards you hold but wouldn't think to spell out. A manifest-verifier agent validates the manifest for gaps before output.
Example manifest
# Definition: User Authentication
## 1. Intent & Context
- **Goal:** Add password-based authentication to existing Express app
with JWT sessions. Users can register, log in, and log out.
- **Mental Model:** Auth is a cross-cutting concern. Security invariants
apply globally; endpoint behavior is per-deliverable.
## 2. Approach
- **Architecture:** Middleware-based auth with JWT stored in httpOnly cookies
- **Execution Order:** D1 (Model) → D2 (Endpoints) → D3 (Protected Routes)
- **Risk Areas:**
- [R-1] Session fixation if tokens not rotated | Detect: security review
- [R-2] Timing attacks on password comparison | Detect: constant-time check
- **Trade-offs:**
- [T-1] Simplicity vs Security → Prefer security (use bcrypt, not md5)
## 3. Global Invariants (The Constitution)
- [INV-G1] Passwords never stored in plaintext
```yaml
verify:
method: bash
command: "! grep -r 'password.*=' src/ | grep -v hash | grep -v test"
```
- [INV-G2] All auth endpoints rate-limited (max 5 attempts/minute)
```yaml
verify:
method: subagent
agent: general-purpose
model: inherit
prompt: "Verify rate limiting exists on /login and /register endpoints"
```
- [INV-G3] JWT secrets from environment, never hardcoded
```yaml
verify:
method: bash
command: "grep -r 'process.env.JWT' src/auth/"
```
## 4. Process Guidance (Non-Verifiable)
- [PG-1] Follow existing error handling patterns in the codebase
- [PG-2] Use established logging conventions
## 5. Known Assumptions
- [ASM-1] Express.js already configured | Default: true | Impact if wrong: Add setup step
- [ASM-2] PostgreSQL available | Default: true | Impact if wrong: Adjust migration
## 6. Deliverables (The Work)
### Deliverable 1: User Model & Migration
**Acceptance Criteria:**
- [AC-1.1] User model has id, email, hashedPassword, createdAt
```yaml
verify:
method: codebase
pattern: "User.*id.*email.*hashedPassword.*createdAt"
```
- [AC-1.2] Email has unique constraint
- [AC-1.3] Migration creates users table with indexes
### Deliverable 2: Auth Endpoints
**Acceptance Criteria:**
- [AC-2.1] POST /register creates user, returns 201
- [AC-2.2] POST /login validates credentials, returns JWT
- [AC-2.3] Invalid credentials return 401, not 500
```yaml
verify:
method: subagent
agent: code-bugs-reviewer
model: inherit
prompt: "Check auth routes return 401 for auth failures, not 500"
```| Section | Purpose | ID Scheme |
|---|---|---|
| Intent & Context | Goal and mental model | -- |
| Approach | Architecture, execution order, risks, trade-offs | R-{N}, T-{N} |
| Global Invariants | Task-level rules (task fails if violated) | INV-G{N} |
| Process Guidance | Non-verifiable recommendations for how to work | PG-{N} |
| Known Assumptions | Low-impact items with defaults | ASM-{N} |
| Deliverables | Ordered work items with acceptance criteria | AC-{D}.{N} |
Approach section is added for complex tasks with dependencies, risks, or architectural decisions.
Every criterion can have an automated verification method:
| Method | When to Use | Example |
|---|---|---|
bash |
Commands with deterministic output | npm run typecheck && npm run lint |
codebase |
Code pattern checks | Check file exists, pattern matches |
subagent |
LLM-as-judge for subjective criteria | Code quality, security review |
research |
External information lookup | API compatibility, version checks |
manual |
Human verification required | UI review, deployment checks |
# Bash verification
verify:
method: bash
command: "npm run test -- --coverage"
# Subagent verification with specialized reviewer
verify:
method: subagent
agent: code-maintainability-reviewer
model: inherit
prompt: "Review for DRY violations and coupling issues"
# Manual verification
verify:
method: manual
instructions: "Verify the login flow works in staging"The Claude Code plugin is the source of truth. Per-CLI distributions under dist/ provide native packages for other AI coding CLIs. Each has a one-command remote installer — run again to update, or pass uninstall to remove only manifest-dev-managed files.
| CLI | Install | Skills | Agents | Hooks | Details |
|---|---|---|---|---|---|
| Claude Code | /plugin install |
Full | Full | Full | Primary target |
| Gemini CLI | curl .../gemini/install.sh | bash |
Full | Full (converted) | Full (adapted) | README |
| OpenCode | curl .../opencode/install.sh | bash |
Full | Full (converted) | Partial (adapted plugin) | README |
| Codex CLI | curl .../codex/install.sh | bash |
Full | TOML stubs | Not available | README |
Keeping distributions in sync: After changing plugin components, run /sync-tools in Claude Code to regenerate dist/. The sync skill reads reference files with per-CLI conversion rules and produces the full distribution for each target.
| Plugin | Description |
|---|---|
manifest-dev |
Core manifest workflows: /define, /do, /verify, review agents, workflow hooks |
manifest-dev-collab |
Slack and GitHub team collaboration on define/do workflows via /slack-collab. Autonomous lead orchestrator spawns specialized and ad-hoc teammates (slack-coordinator, github-coordinator, manifest-define-worker, manifest-executor). Dynamic team composition, phase-anchored threading, verification hard gates. |
| Skill | Type | Description |
|---|---|---|
/define |
User-invoked | Interviews you, classifies task type, probes for latent criteria, outputs manifest with verification methods |
/do |
User-invoked | Executes against manifest. Follows execution order, watches for risks, logs progress for disaster recovery |
/auto |
User-invoked | End-to-end autonomous: /define --interview autonomous → auto-approve → /do. Supports --mode pass-through |
/verify |
Internal | Spawns parallel verifiers for all criteria. Routes to criteria-checker agents based on verification method |
/done |
Internal | Prints hierarchical completion summary mirroring manifest structure |
/escalate |
Internal | Structured escalation when blockers need human intervention. Requires evidence: 3+ attempts, failure reasons, hypothesis, resolution options |
/learn-define-patterns |
User-invoked | Analyzes recent /define sessions, extracts user preference patterns, writes them to CLAUDE.md for future /define sessions |
Built-in agents for quality verification via subagent method:
| Agent | Focus |
|---|---|
criteria-checker |
Core verifier: validates single criterion using bash/codebase/subagent/research methods |
manifest-verifier |
Validates manifest completeness during /define |
code-bugs-reviewer |
Race conditions, data loss, edge cases, logic errors, resource leaks |
code-maintainability-reviewer |
DRY violations, coupling, cohesion, dead code, consistency |
code-design-reviewer |
Design fitness: reinvented wheels, code vs configuration boundary, under-engineering, interface foresight, PR coherence |
code-simplicity-reviewer |
Over-engineering, premature optimization, cognitive complexity |
code-testability-reviewer |
Excessive mocking requirements, logic buried in IO, hidden dependencies |
code-coverage-reviewer |
Test coverage gaps in changed code |
type-safety-reviewer |
TypeScript type safety: any abuse, invalid states representable, narrowing issues |
docs-reviewer |
Documentation accuracy against code changes |
context-file-adherence-reviewer |
Compliance with context file (CLAUDE.md/AGENTS.md/GEMINI.md) project rules |
define-session-analyzer |
Analyzes a single /define session transcript for user preference patterns. Spawned by /learn-define-patterns |
Each reviewer returns structured output with severity levels (Critical, High, Medium, Low) and specific fix guidance.
Hooks enforce workflow integrity. The AI can't skip steps:
| Hook | Event | Purpose |
|---|---|---|
stop_do_hook |
Stop command | Blocks premature stopping. Can't stop without verification passing or proper escalation. |
post_compact_hook |
Session compaction | Restores /do workflow context after compaction. Reminds to re-read manifest and log. |
pretool_verify_hook |
/verify invocation |
Ensures manifest and log are in context before spawning verifiers. |
/define loads guidance based on task classification:
| Task Type | Guidance | Quality Gates |
|---|---|---|
| Feature | tasks/FEATURE.md + CODING.md |
Bug detection, type safety, maintainability, simplicity, test coverage, testability, CLAUDE.md adherence |
| Bug | tasks/BUG.md + CODING.md |
Bug fix verification, regression prevention, root cause analysis |
| Refactor | tasks/REFACTOR.md + CODING.md |
Behavior preservation, maintainability, simplicity |
| Prompting | tasks/PROMPTING.md |
Prompt quality criteria |
| Writing | tasks/WRITING.md |
Prose quality, AI tells, vocabulary, anti-patterns, craft fundamentals (base for Blog, Document) |
| Document | tasks/DOCUMENT.md + WRITING.md |
Structure completeness, consistency |
| Blog | tasks/BLOG.md + WRITING.md |
Engagement, SEO |
| Research | tasks/research/RESEARCH.md + source files |
Source-agnostic research methodology. Source-specific guidance in tasks/research/sources/ |
# Setup (first time)
./scripts/setup.sh
source .venv/bin/activate
# Lint, format, typecheck
ruff check --fix claude-plugins/ && black claude-plugins/ && mypy
# Test hooks (run after ANY hook changes)
pytest tests/hooks/ -v
# Test plugin locally
/plugin marketplace add /path/to/manifest-dev
/plugin install manifest-dev@manifest-dev-marketplaceSee CONTRIBUTING.md for plugin development guidelines.
MIT
Built by developers who understand LLM limitations, and design around them.
Follow along: @aviramkofman