Goal: reproducible ladder of 8 increasingly complex datasets + runner + standardized outputs.
- Deliverables: datasets/ladder/dataset01.yaml…dataset08.yaml, runner notebook, results layout, baseline report for v0.2.1 tag.
- Success criteria: all 8 datasets run end-to-end; baseline established on the v0.2.1 tag; ladder used as an acceptance gate.
- Dependencies: none blocking. The ladder is an evaluation tool, so it can proceed early and should not wait on the priors work.
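
The deliverables above could be wired together roughly as follows. This is a minimal sketch, not the actual runner: it assumes the dataset files are numbered contiguously from 01 to 08, and the `results/<tag>/<dataset>/metrics.json` layout, the `run_one` callback, and all function names are hypothetical placeholders for whatever the runner notebook ends up using.

```python
import json
from pathlib import Path

LADDER_DIR = Path("datasets/ladder")  # location named in the deliverables
N_RUNGS = 8                           # the ladder has 8 datasets


def ladder_paths(ladder_dir=LADDER_DIR, n=N_RUNGS):
    """Return dataset config paths in ladder order (assumes contiguous 01..n numbering)."""
    return [Path(ladder_dir) / f"dataset{i:02d}.yaml" for i in range(1, n + 1)]


def result_dir(results_root, tag, dataset_path):
    """Hypothetical results layout: <results_root>/<tag>/<dataset stem>/."""
    return Path(results_root) / tag / Path(dataset_path).stem


def run_ladder(run_one, tag, results_root="results"):
    """Run every rung in order and write one metrics.json per dataset.

    `run_one` is a hypothetical callable that takes a dataset path and
    returns a JSON-serializable dict of metrics.
    """
    summary = {}
    for path in ladder_paths():
        out = result_dir(results_root, tag, path)
        out.mkdir(parents=True, exist_ok=True)
        metrics = run_one(path)
        # sort_keys keeps the file byte-stable across runs with equal metrics
        (out / "metrics.json").write_text(json.dumps(metrics, indent=2, sort_keys=True))
        summary[path.stem] = metrics
    return summary
```

Keying the output directory by tag (for example `run_ladder(my_eval, tag="v0.2.1")`, where `my_eval` is whatever evaluation function is supplied) would let a later run be compared against the baseline by diffing two directories.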