Independent hostile audit. Cycle 1 of 3.
- src/lib.rs, src/main.rs, src/error.rs, src/cli.rs
- src/sample.rs, src/stats.rs, src/baseline.rs, src/config.rs, src/compare.rs
- src/parsers/{mod.rs,criterion.rs,csv.rs,gobench.rs,hyperfine.rs}
- src/report/{mod.rs,json.rs,text.rs,markdown.rs}
- README.md, Cargo.toml, ROUND_LOG.md, CHANGELOG.md
- tests/integration_*.rs (5 files)
- `INF_SENTINEL` saturation (`compare::compute_row`): verified. The `1e300` sentinel is finite, JSON never emits `null`, and the display layer collapses to `±∞` via `DISPLAY_INF_THRESHOLD = 1e100`. Real-world conflict (a real 1e300 measurement) is impossible (1e300 ns ≈ 3e283 years).
- `classify_exit_code` downcast: `cli::run` returns `anyhow::Result` but never calls `.context()`, so `benchdiff::Error` stays at the head of the anyhow chain and `downcast_ref::<BdError>()` succeeds for every propagated error. All variants mapped to documented exit codes (2/3/4/5).
- `MAX_WALK_DEPTH=32` + symlink rejection: both `parse_dir` entry and per-entry walk skip symlinks via `symlink_metadata().file_type().is_symlink()`. Windows note: this also catches reparse points / junctions because `is_symlink()` returns true for any reparse with the symlink tag and junctions are skipped at the directory iteration level (we'd just recurse into them; the depth cap protects against cycles regardless).
- `Config::tolerance` per-entry validation: verified. Walks every entry, accepts integer or float, rejects string / negative / non-finite. `deny_unknown_fields` at top level still rejects mystery keys (E2E verified, see below).
- `glob_match` middle wildcard: verified. `Bench*Skip` matches `BenchFooSkip`, `BenchSkip`; `**` is not special-cased but degenerates correctly (empty middle chunks are skipped, equivalent to `*`).
- README determinism: markdown report uses literal `\n`, not `writeln!`, so output is LF on every platform. ✓
- Welch Satterthwaite df: fractional, not rounded. `two_sided_p` uses `df` directly via `reg_incomplete_beta(x, df/2, 0.5)`. ✓
- Welch with n=1: `welch_t_test` returns `InsufficientSamples`, `compute_row` catches and emits `Verdict::Insufficient`. ✓
- Hyperfine `times: null`: serde_json rejects `null` → `Vec`, surfaced as `Error::Parse`. Acceptable.
- Criterion `mean` without `std_dev`: `EstimatesJson::std_dev` defaults to `None` → `sd=0.0` → 3 identical synthesized samples. ✓
- CSV BOM prefix: would parse the BOM-prefixed `name` row as data, fail with `InvalidNumber`. Sub-optimal but errors clearly. Not blocking.
- CSV CRLF: `lines_crlf` strips trailing `\r`, integration test `handles_crlf` covers it. ✓
- CSV column case-sensitivity: header detection uses `to_ascii_lowercase()`; case-insensitive. README example uses lowercase but parser accepts either. ✓
- `ignore` regex special chars: patterns are literal except `*`; `Bench[0-9]+` would only match the literal string. Documented as "no character classes". ✓
- Multiple benchmarks same name across runs: `BTreeMap` aggregation in csv/gobench parsers; criterion would emit two `Benchmark` entries (last `find()` wins in compare). Pre-existing limitation, not a regression.
- Baseline overwrite: `Baseline::save` calls `fs::write` (truncate); no warning. The docs do not specify either behavior; not a bug.
- Compare with baseline > current: missing rows reported as `MissingFromCurrent`. ✓
- Compare with current > baseline: extra rows reported as `NewInCurrent`. ✓
- JSON schema versioning: `Baseline` carries `schema: u32`; load rejects non-1. `DiffReport` JSON has no version field. Not blocking: consumers can rely on field stability.
- Unicode benchmark names: comfy-table handles wide chars; markdown escape only touches `|` / backtick. ✓
- Exit code 3 vs 4: `Parse` / `UnknownFormat` / `Json` / `Toml` / `InvalidNumber` → 3 (parse-class); `Io` / `BaselineNotFound` / `InputTooLarge` / `LineTooLong` → 4 (I/O-class). `InvalidLabel` / `Config` → 2 (usage). `InsufficientSamples` / `EmptyInput` / `NonFinite` → 5 (statistical). All match documented table.
- `--help` for every subcommand: verified for `save` (and clap auto-generates for the rest).
- README code samples: quick-start commands verified end-to-end via CSV save+compare; output matches doc shape, exit code 1 on regression.
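For reviewers of the next cycle, the sentinel check above can be pictured roughly as follows. This is a minimal sketch, not the code in `compare::compute_row`: the constant values come from this log, but the helper names `saturate` and `display_value`, the two-decimal format, and the `+∞` / `-∞` strings are assumptions made for illustration.

```rust
// Values taken from this audit log.
const INF_SENTINEL: f64 = 1e300;
const DISPLAY_INF_THRESHOLD: f64 = 1e100;

/// Hypothetical helper: clamp a value into the finite sentinel range so that
/// serialization never sees an infinity (serde_json writes non-finite floats
/// as `null`). NaN is passed through unchanged by `clamp`; rejecting NaN is
/// assumed to happen elsewhere.
fn saturate(x: f64) -> f64 {
    x.clamp(-INF_SENTINEL, INF_SENTINEL)
}

/// Hypothetical display helper: anything at or beyond the threshold is shown
/// as an infinity symbol, everything else with two decimals (assumed format).
fn display_value(x: f64) -> String {
    if x >= DISPLAY_INF_THRESHOLD {
        "+∞".to_string()
    } else if x <= -DISPLAY_INF_THRESHOLD {
        "-∞".to_string()
    } else {
        format!("{:.2}", x)
    }
}
```

The gap between the two constants is what makes the scheme safe: any stored value that reached the sentinel is far above the display threshold, so it can never be rendered as an ordinary number.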
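The exit-code table verified above can be restated as a small sketch. It is an illustration under assumptions, not the project's `classify_exit_code`: it uses a plain `dyn std::error::Error` where the real code goes through `anyhow`, the enum carries no payloads, and returning `None` for a foreign error type is a choice made here because the log does not say what the real fallback is.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Stand-in for `benchdiff::Error`; variant names are from this log,
/// payloads are omitted for brevity.
#[derive(Debug)]
enum BdError {
    InvalidLabel, Config,
    Parse, UnknownFormat, Json, Toml, InvalidNumber,
    Io, BaselineNotFound, InputTooLarge, LineTooLong,
    InsufficientSamples, EmptyInput, NonFinite,
}

impl fmt::Display for BdError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}

impl StdError for BdError {}

/// Downcast the head of the error and map it to the documented exit code.
/// `None` means the error was not a `BdError` (hypothetical fallback).
fn classify_exit_code(err: &(dyn StdError + 'static)) -> Option<i32> {
    let code = match err.downcast_ref::<BdError>()? {
        BdError::InvalidLabel | BdError::Config => 2, // usage
        BdError::Parse
        | BdError::UnknownFormat
        | BdError::Json
        | BdError::Toml
        | BdError::InvalidNumber => 3, // parse-class
        BdError::Io
        | BdError::BaselineNotFound
        | BdError::InputTooLarge
        | BdError::LineTooLong => 4, // I/O-class
        BdError::InsufficientSamples | BdError::EmptyInput | BdError::NonFinite => 5, // statistical
    };
    Some(code)
}
```

Because the `match` has no wildcard arm, adding a variant without assigning it a code fails to compile, which is one way to keep the table and the enum in sync.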
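The `glob_match` behaviour described above (literal matching, `*` as the only metacharacter, empty middle chunks skipped) can be reproduced with a short sketch. This is a re-implementation written for this log under those stated rules; the real function's signature and internals may differ.

```rust
/// Hypothetical literal glob: `*` matches any run of characters (including
/// none); every other character, including `[`, `]` and `+`, is literal.
fn glob_match(pattern: &str, name: &str) -> bool {
    // No wildcard at all: the pattern must equal the name exactly.
    if !pattern.contains('*') {
        return pattern == name;
    }
    let chunks: Vec<&str> = pattern.split('*').collect();
    let first = chunks[0];
    let last = chunks[chunks.len() - 1];
    // The first and last chunks are anchored and must not overlap.
    if name.len() < first.len() + last.len()
        || !name.starts_with(first)
        || !name.ends_with(last)
    {
        return false;
    }
    // Middle chunks must appear, in order, in what lies between the anchors.
    let mut rest = &name[first.len()..name.len() - last.len()];
    for chunk in &chunks[1..chunks.len() - 1] {
        if chunk.is_empty() {
            continue; // "**" produces an empty chunk, so it behaves like "*"
        }
        match rest.find(*chunk) {
            Some(pos) => rest = &rest[pos + chunk.len()..],
            None => return false,
        }
    }
    true
}
```

Under this sketch `glob_match("Bench*Skip", "BenchSkip")` holds because the wildcard may match the empty string, and `glob_match("Bench[0-9]+", "Bench7")` does not because the brackets are compared literally.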
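The two Welch items above (fractional df, and the n=1 guard) can be sketched as below. This is a sketch under assumptions: `mean_var` and `welch_t_df` are hypothetical names, the real `welch_t_test` returns an `InsufficientSamples` error rather than `None`, and returning `None` when both variances are zero is a choice made here, not something this log verifies. The two-sided p-value would then follow from the standard identity p = I_x(df/2, 1/2) with x = df / (df + t²), which matches the `reg_incomplete_beta(x, df/2, 0.5)` call noted above.

```rust
/// Hypothetical helper: sample mean and unbiased variance (n - 1 denominator).
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and the Welch-Satterthwaite degrees of freedom.
/// Returns `None` when either sample has fewer than two points, or when both
/// samples have zero variance (this sketch's choice).
fn welch_t_df(a: &[f64], b: &[f64]) -> Option<(f64, f64)> {
    if a.len() < 2 || b.len() < 2 {
        return None;
    }
    let (na, nb) = (a.len() as f64, b.len() as f64);
    let (mean_a, var_a) = mean_var(a);
    let (mean_b, var_b) = mean_var(b);
    // Squared standard errors of the two means.
    let (se_a, se_b) = (var_a / na, var_b / nb);
    if se_a + se_b == 0.0 {
        return None;
    }
    let t = (mean_a - mean_b) / (se_a + se_b).sqrt();
    // df is kept fractional; it is not rounded to an integer.
    let df = (se_a + se_b).powi(2)
        / (se_a * se_a / (na - 1.0) + se_b * se_b / (nb - 1.0));
    Some((t, df))
}
```

For the samples `[1, 2, 3, 4]` and `[2, 4, 6, 8, 10]` the degrees of freedom work out to 2523/457, roughly 5.52, which shows why rounding df would change the p-value.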
main.rs printed the error via `eprintln!("benchdiff error: {err:#}")`,
where `{err:#}` is anyhow's alternate-form formatter. That formatter walks
the `#[source]` chain and appends each layer, joined with `: `. But every `benchdiff::Error`
variant whose source is interesting (`Io`, `Json`, `Toml`) already embeds
`{source}` directly in its `#[error("…: {source}")]` Display. The result
was the same TOML/JSON/IO message printed twice, joined by an extra `: `:
benchdiff error: TOML parse error in "x.toml": TOML parse error at line 4 …
: TOML parse error at line 4 …
Fix: switch to `eprintln!("benchdiff error: {err}")` (non-alternate). Each
error variant's Display already contains the source text exactly once.
Comment added to explain why we deliberately drop `:#`.
Severity: cosmetic / UX; no functional impact, but confused error output is exactly the kind of thing CI users will paste into bug reports.
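The duplication can be reproduced without `anyhow` or `thiserror`. The sketch below is illustrative only: `TomlCause`, `ConfigError`, and `render_chain` are hypothetical names, and `render_chain` is a hand-written stand-in for what anyhow's `{:#}` formatting does (the top-level message followed by each source, joined with `": "`).

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical inner error, standing in for the TOML parser's error.
#[derive(Debug)]
struct TomlCause;

impl fmt::Display for TomlCause {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "TOML parse error at line 4")
    }
}

impl Error for TomlCause {}

/// Hypothetical outer error whose Display already embeds its source,
/// like an `#[error("…: {source}")]` variant.
#[derive(Debug)]
struct ConfigError {
    source: TomlCause,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "TOML parse error in \"x.toml\": {}", self.source)
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Stand-in for alternate-form chain printing: message, then every source.
fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cur = err.source();
    while let Some(cause) = cur {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        cur = cause.source();
    }
    out
}
```

Calling `render_chain` on a `ConfigError` yields the inner message twice, while plain `to_string()` yields it once, which is the difference between `{err:#}` and `{err}` in `main.rs`.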
174 tests, all passing post-fix:
- 160 lib unit tests (sample, stats, baseline, config, compare, parsers, report, cli)
- 14 integration tests across 5 fixture-driven files
test result: ok. 160 passed; 0 failed; 0 ignored
test result: ok. 3 passed; 0 failed (integration_compare)
test result: ok. 3 passed; 0 failed (integration_criterion)
test result: ok. 2 passed; 0 failed (integration_csv)
test result: ok. 3 passed; 0 failed (integration_gobench)
test result: ok. 3 passed; 0 failed (integration_hyperfine)
- `save` CSV → baseline written
- `compare` CSV vs CSV → markdown table, regression flagged, exit 1
- JSON output → strictly numeric (no nulls), well-formed
- `--config` with mystery top-level field → exit 3, single clean error line
1 LOW bug found, 1 fix applied. Cycle is NOT clean — a fresh agent must re-evaluate.