Statistical benchmark regression detector for Rust, Go, and hyperfine benchmarks.
Highlights
- Multi-format parsers: Criterion JSON (Rust), go test -bench text, hyperfine JSON, generic CSV
- Statistical comparison: Welch's two-sample t-test (unequal variances) with results matching SciPy to 8 decimal places, Cohen's d effect size, Benjamini-Hochberg FDR correction
- Baseline management: save/load/list labeled baselines with label validation (no path traversal)
- CI gating: distinct exit codes (0 = pass, 1 = regression, 2-5 = specific error classes) and an --allow-regressions tolerance flag
- Output formats: rich text table, JSON, GitHub-flavored markdown with emoji verdicts
- Safety: #![forbid(unsafe_code)], 256MB file cap, 1MB line cap, symlink rejection, depth-capped directory walks
- Testing: 180 tests (166 unit + 14 integration), zero-warning build (the Rust counterpart of a clean mypy run)
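The statistics named in the "Statistical comparison" item can be sketched in a few lines of Rust. This is an illustrative sketch, not benchdiff's actual source: the function names (`mean_var`, `welch_t`, `cohens_d`, `benjamini_hochberg`) are hypothetical, Cohen's d is assumed to use the pooled standard deviation, and the step that turns t and the degrees of freedom into a p-value (the Student's t CDF) is left out.

```rust
/// Sample mean and unbiased (n - 1) variance. Assumes at least two samples.
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
/// Hypothetical helper: the name and signature are assumptions.
fn welch_t(a: &[f64], b: &[f64]) -> (f64, f64) {
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    let (na, nb) = (a.len() as f64, b.len() as f64);
    // Squared standard errors of the two sample means.
    let (sa, sb) = (va / na, vb / nb);
    let t = (ma - mb) / (sa + sb).sqrt();
    let df = (sa + sb).powi(2) / (sa * sa / (na - 1.0) + sb * sb / (nb - 1.0));
    (t, df)
}

/// Cohen's d with a pooled standard deviation (an assumed variant).
fn cohens_d(a: &[f64], b: &[f64]) -> f64 {
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    let (na, nb) = (a.len() as f64, b.len() as f64);
    let pooled = (((na - 1.0) * va + (nb - 1.0) * vb) / (na + nb - 2.0)).sqrt();
    (ma - mb) / pooled
}

/// Benjamini-Hochberg adjusted p-values, returned in input order.
fn benjamini_hochberg(p: &[f64]) -> Vec<f64> {
    let m = p.len();
    // Indices of the p-values in ascending order.
    let mut order: Vec<usize> = (0..m).collect();
    order.sort_by(|&i, &j| p[i].total_cmp(&p[j]));
    let mut adjusted = vec![0.0; m];
    let mut running_min = 1.0_f64;
    // Walk from the largest p-value down, enforcing monotonicity.
    for (rank, &idx) in order.iter().enumerate().rev() {
        let scaled = p[idx] * m as f64 / (rank + 1) as f64;
        running_min = running_min.min(scaled);
        adjusted[idx] = running_min;
    }
    adjusted
}
```

In a regression check, one p-value per benchmark would be passed through `benjamini_hochberg` so that comparing many benchmarks at once does not inflate the false-discovery rate, with Cohen's d reported alongside to show whether a statistically significant change is also large enough to matter.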
Install
cargo install benchdiff-rs
Usage
benchdiff save my-bench-output.json --label v1.0.0
benchdiff compare current.json --against v1.0.0 --output markdown
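The --label value used above is validated so that it cannot escape the baseline directory. The exact rules benchdiff applies are not documented here, so the following is only a sketch of one plausible allowlist approach. The function name `is_valid_label`, the 64-byte limit, and the permitted character set are all assumptions.

```rust
/// Hypothetical label check: accept only names that are safe to use as a
/// single file-name component. The rules below are assumptions, not
/// benchdiff's documented behavior.
fn is_valid_label(label: &str) -> bool {
    const MAX_LEN: usize = 64; // assumed upper bound
    !label.is_empty()
        && label.len() <= MAX_LEN
        // Rejects ".", "..", and hidden-file style names.
        && !label.starts_with('.')
        // Allowlist: no '/', '\\', NUL, or whitespace can pass.
        && label
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '-' | '_'))
}
```

An allowlist is used rather than a denylist of dangerous sequences because it keeps path separators and parent-directory components out by construction: a label such as v1.0.0 passes, while ../etc/passwd is rejected.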
Hardened across 8 adversarial evaluation cycles before the first release. First V1 Rust project.