The ZeroEval CLI ships with the Python SDK and gives you terminal access to monitoring, prompts, judges, and optimization. It is designed to be agent-friendly and works well in CI pipelines, scripted workflows, and agent toolchains.
## Setup

Run `zeroeval setup` to save your API key and link your project interactively. This opens the ZeroEval dashboard, prompts for your API key, saves it to your shell config (e.g. `~/.zshrc`, `~/.bashrc`), and links your project automatically.

For non-interactive environments such as CI or coding agents, use `auth set` instead:

```bash
zeroeval auth set --api-key-env ZEROEVAL_API_KEY
```
## Auth commands

```bash
zeroeval auth set --api-key <key>            # Set API key directly
zeroeval auth set --api-key-env MY_KEY_VAR   # Read API key from an env var
zeroeval auth set --api-base-url <url>       # Override API base URL
zeroeval auth show --redact                  # Show current config (key masked)
zeroeval auth clear                          # Wipe all stored config
zeroeval auth clear --api-key-only           # Clear only the API key and project
```
## Auth resolution

The CLI resolves credentials in this order:

1. Explicit CLI flags (`--project-id`, `--api-base-url`)
2. Environment variables (`ZEROEVAL_API_KEY`, `ZEROEVAL_PROJECT_ID`, `ZEROEVAL_BASE_URL`)
3. Global CLI config file

Config file location:

- macOS / Linux: `~/.config/zeroeval/config.json` (or `$XDG_CONFIG_HOME/zeroeval/config.json`)
- Windows: `%APPDATA%/zeroeval/config.json`
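The precedence above can be sketched in Python. This is illustrative only — `resolve_project_id` is not part of the SDK, and the `"project_id"` key in the config file is an assumption about its schema, not documented behavior:

```python
import json
import os
from pathlib import Path

def resolve_project_id(cli_flag=None, config_path=None):
    """Sketch of the CLI's precedence: explicit flag > env var > config file."""
    if cli_flag:                       # 1. explicit --project-id flag wins
        return cli_flag
    env = os.environ.get("ZEROEVAL_PROJECT_ID")
    if env:                            # 2. then the environment
        return env
    path = Path(config_path or Path.home() / ".config" / "zeroeval" / "config.json")
    try:                               # 3. finally the global config file
        return json.loads(path.read_text()).get("project_id")
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```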
## Global flags

Global flags must appear before the subcommand:

```bash
zeroeval --output json --project-id <id> --timeout 30.0 traces list
```

| Flag | Default | Description |
|---|---|---|
| `--output text\|json` | `text` | Output format. `json` emits stable JSON to stdout, errors to stderr. |
| `--project-id` | env / config | Project context. Required for monitoring, prompts, judges, and optimization commands. |
| `--api-base-url` | `https://api.zeroeval.com` | Override the API URL. |
| `--quiet` | off | Suppress non-essential output. |
| `--timeout` | `20.0` | HTTP request timeout in seconds. |
## Output modes

- `text` (default) — human-readable; dict/list payloads are pretty-printed as JSON.
- `json` — stable, machine-readable JSON to stdout. Errors go to stderr as structured JSON. Confirmation prompts (e.g. `optimize promote`) are auto-skipped in JSON mode.
## Exit codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | User / validation error |
| 3 | Auth or permission error |
| 4 | Remote API or network error |
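The distinction matters for automation: remote API or network errors (4) are often transient, while validation (2) and auth (3) errors will not fix themselves on retry. A minimal sketch of how a wrapper script might branch on these codes (the `should_retry` helper is illustrative, not part of the SDK):

```python
# Exit codes from the table above, grouped by how a CI wrapper might react.
RETRYABLE = {4}   # remote API / network errors are often transient
FATAL = {2, 3}    # validation and auth errors won't succeed on retry

def should_retry(exit_code: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only transient failures, up to max_attempts."""
    return exit_code in RETRYABLE and attempt < max_attempts
```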
## Querying and filtering

Most `list` and `get` commands support `--where`, `--select`, and `--order` for client-side filtering, projection, and sorting.

### `--where`

Filter rows. Repeatable — multiple clauses are AND-ed.

```bash
# Exact match
zeroeval judges list --where "name=Quality Check"

# Substring match (case-insensitive)
zeroeval traces list --where "status~completed"

# Set membership
zeroeval spans list --where 'kind in ["llm","tool"]'
```

Supported operators: `=` (exact), `~` (substring), `in` (JSON array).
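The filtering semantics can be sketched as follows. This is an illustrative model of the three operators and the AND-ing of repeated clauses, not the CLI's actual implementation:

```python
import json
import re

def parse_clause(clause: str):
    """Build a row predicate for one --where clause: `in`, `~`, or `=`."""
    m = re.match(r'^(\w+)\s+in\s+(\[.*\])$', clause)
    if m:  # set membership against a JSON array
        field, values = m.group(1), json.loads(m.group(2))
        return lambda row: row.get(field) in values
    if "~" in clause:  # case-insensitive substring match
        field, needle = clause.split("~", 1)
        return lambda row: needle.lower() in str(row.get(field, "")).lower()
    field, value = clause.split("=", 1)  # exact match
    return lambda row: str(row.get(field)) == value

def apply_where(rows, clauses):
    """Repeated clauses are AND-ed: a row must satisfy every one."""
    preds = [parse_clause(c) for c in clauses]
    return [r for r in rows if all(p(r) for p in preds)]
```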
### `--select`

Project only the listed fields. Comma-separated; dotted paths are supported.

```bash
zeroeval judges list --select "id,name,evaluation_type"
```
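Projection with dotted paths can be modeled like this. The sketch is illustrative — whether the real CLI emits dotted keys or re-nests them is an assumption:

```python
def project(row: dict, fields: str) -> dict:
    """Keep only the named fields, resolving dotted paths like
    "model.name" into nested dicts; missing paths become None."""
    out = {}
    for field in fields.split(","):
        value = row
        for part in field.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        out[field] = value
    return out
```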
### `--order`

Sort results by a field. Defaults to ascending; append `:desc` to reverse.

```bash
zeroeval traces list --order "created_at:desc"
```
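The `field:direction` spec can be modeled as a small sort helper (illustrative only, not the CLI's code):

```python
def order(rows, spec: str):
    """Sort rows by a "field" or "field:desc" spec; ascending by default."""
    field, _, direction = spec.partition(":")
    return sorted(rows, key=lambda r: r.get(field), reverse=direction == "desc")
```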
## Monitoring

Monitoring commands require a `--project-id` (resolved automatically after `zeroeval setup`).

### Sessions

```bash
zeroeval sessions list --start-date 2025-01-01 --end-date 2025-02-01 --limit 50
zeroeval sessions get <session_id>
```

| Flag | Description |
|---|---|
| `--start-date` | ISO date string lower bound |
| `--end-date` | ISO date string upper bound |
| `--limit` | Max results (default 50) |
| `--offset` | Pagination offset (default 0) |
### Traces

```bash
zeroeval traces list --start-date 2025-01-01 --limit 50
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id> --limit 100
```

`traces spans` returns the spans belonging to a specific trace, which is useful for debugging individual requests.
### Spans

```bash
zeroeval spans list --start-date 2025-01-01 --limit 50
zeroeval spans get <span_id>
```
## Prompts

### List and inspect

```bash
zeroeval prompts list
zeroeval prompts get <prompt_slug>
zeroeval prompts get <prompt_slug> --version 3
zeroeval prompts get <prompt_slug> --tag production
zeroeval prompts versions <prompt_slug>
zeroeval prompts tags <prompt_slug>
```
### Submit feedback

Provide feedback on a prompt completion for DSPy optimization and prompt tuning:

```bash
zeroeval prompts feedback create \
  --prompt-slug customer-support \
  --completion-id <uuid> \
  --thumbs-up \
  --reason "Clear and helpful response"
```

For scored judges, add judge-specific fields:

```bash
zeroeval prompts feedback create \
  --prompt-slug customer-support \
  --completion-id <uuid> \
  --thumbs-down \
  --judge-id <judge_uuid> \
  --expected-score 3.5 \
  --score-direction too_high \
  --reason "Score should be lower"
```

`--thumbs-up` and `--thumbs-down` are mutually exclusive, and exactly one is required.
## Judges

### List and inspect

```bash
zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges criteria <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>
```
### Filter evaluations

```bash
zeroeval judges evaluations <judge_id> \
  --start-date 2025-01-01 \
  --end-date 2025-02-01 \
  --evaluation-result true \
  --feedback-state pending \
  --limit 200
```

| Flag | Description |
|---|---|
| `--evaluation-result` | `true` or `false` |
| `--feedback-state` | Filter by feedback state |
| `--start-date` / `--end-date` | Date range |
### Create a judge

```bash
zeroeval judges create \
  --name "Tone Check" \
  --prompt "Evaluate whether the response maintains a professional tone." \
  --evaluation-type binary \
  --sample-rate 1.0 \
  --temperature 0.0
```

Or load the prompt from a file:

```bash
zeroeval judges create \
  --name "Quality Scorer" \
  --prompt-file judge_prompt.txt \
  --evaluation-type scored \
  --score-min 0 \
  --score-max 10 \
  --pass-threshold 7
```

| Flag | Default | Description |
|---|---|---|
| `--name` | required | Judge name |
| `--prompt` | — | Inline prompt text (mutually exclusive with `--prompt-file`) |
| `--prompt-file` | — | Path to a file containing the prompt |
| `--evaluation-type` | `binary` | `binary` or `scored` |
| `--score-min` | `0.0` | Minimum score (scored only) |
| `--score-max` | `10.0` | Maximum score (scored only) |
| `--pass-threshold` | — | Pass threshold (scored only) |
| `--sample-rate` | `1.0` | Fraction of spans to evaluate |
| `--backfill` | `100` | Number of existing spans to backfill |
| `--tag` | — | Tag filter in `key=value1,value2` format. Repeatable. |
| `--tag-match` | `all` | `all` or `any` |
| `--target-prompt-id` | — | Scope the judge to a specific prompt |
| `--temperature` | `0.0` | LLM temperature for the judge |
### Submit judge feedback

```bash
zeroeval judges feedback create \
  --span-id <span_uuid> \
  --thumbs-down \
  --reason "Missed safety issue" \
  --expected-output "Should flag harmful content"
```

For scored judges with per-criterion feedback:

```bash
zeroeval judges feedback create \
  --span-id <span_uuid> \
  --thumbs-down \
  --expected-score 2.0 \
  --score-direction too_high \
  --criteria-feedback '{"clarity": {"expected_score": 1.0, "reason": "Confusing response"}}'
```
## Optimization

Start, inspect, and promote prompt or judge optimization runs. All optimization commands require `--project-id`.

### Prompt optimization

```bash
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt get <task_id> <run_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes
```

### Judge optimization

```bash
zeroeval optimize judge list <judge_id>
zeroeval optimize judge get <judge_id> <run_id>
zeroeval optimize judge start <judge_id> --optimizer-type dspy_bootstrap
zeroeval optimize judge promote <judge_id> <run_id> --yes
```

| Flag | Default | Description |
|---|---|---|
| `--optimizer-type` | `quick_refine` | `quick_refine`, `dspy_bootstrap`, or `dspy_gepa` |
| `--config` | — | JSON string of extra optimizer configuration |
| `--yes` | off | Skip the confirmation prompt (also skipped in `--output json` mode) |
## Spec (machine-readable manual)

The `spec` commands dump the CLI's command and parameter contract as JSON or Markdown — useful for agents and toolchains that need to discover available commands programmatically.

```bash
zeroeval spec cli --format json
zeroeval spec command "judges create" --format markdown
```
## CI / automation recipes

### Get the latest traces as JSON

```bash
zeroeval --output json traces list --limit 10 --order "created_at:desc"
```

### Count passing judge evaluations

```bash
zeroeval --output json judges evaluations <judge_id> \
  --evaluation-result true --limit 1000 \
  --select "id" | jq length
```

### Promote an optimization run without a confirmation prompt

```bash
zeroeval --output json optimize prompt promote <task_id> <run_id> --yes
```