Canonical store of methodologies: META + sections + rules (+ tools, overrides, tests, core).
For a working example of the file layout and content, see
docs/examples/TEMPLATE_METHOD.
See RULESET.md for conventions and CI guardrails.
Sector naming note: The internal slug `Forestry` in paths such as `methodologies/UNFCCC/Forestry` refers to UNFCCC sector 14, “Afforestation and reforestation”. We keep the slug stable for hashes and reproducibility; human-facing docs and models should use the full sector label.
- Data-first methodologies
- Audit-ready hashes
- Open references
- Reproducible scripts
- CI guardrails
- sections.json -> META.audit_hashes.sections_json_sha256
- rules.json -> META.audit_hashes.rules_json_sha256
- tools//**/* -> META.references.tools[]
- scripts/** and core/** -> scripts_manifest.json
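
As a minimal sketch of how one of these mappings could be checked by hand, the snippet below hashes a file's bytes and compares the digest with the value recorded in META. It assumes the digests are lowercase hex SHA-256 strings over the raw file bytes. The helper names `sha256Hex` and `auditMismatches` are hypothetical, and this is not the logic of `./scripts/hash-all.sh`.

```javascript
const { createHash } = require('node:crypto');

// Lowercase hex SHA-256 of raw bytes (assumed digest format).
function sha256Hex(bytes) {
  return createHash('sha256').update(bytes).digest('hex');
}

// Hypothetical helper: compare in-memory file contents against META.audit_hashes.
// `files` maps a META.audit_hashes key to the bytes of the file it covers.
// Returns the keys whose recorded digest does not match.
function auditMismatches(meta, files) {
  const recorded = (meta && meta.audit_hashes) || {};
  return Object.entries(files)
    .filter(([key, bytes]) => recorded[key] !== sha256Hex(bytes))
    .map(([key]) => key);
}

// Example with in-memory content instead of real files.
const sections = Buffer.from('{"sections":[]}\n', 'utf8');
const meta = { audit_hashes: { sections_json_sha256: sha256Hex(sections) } };
console.log(auditMismatches(meta, { sections_json_sha256: sections })); // []
```

An empty result means every supplied file matches its recorded digest. Any key in the result points at a file that changed without a refresh of the hashes.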
- Edit methodology content or scripts.
- Run `./scripts/hash-all.sh` to refresh digests.
- Commit the changes.
- CI validates JSON, schemas, and registry consistency.
- In `batches/*.links.txt`, the first N links must be UNFCCC “DB/.../view.html” entrypoints (N = number of codes).
- Any remaining links are treated as optional extra assets (tools PDFs, glossary, FileStorage attachments).
- Run `npm run validate:batches` (`scripts/ingest-full.sh` also runs it automatically).
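
The ordering rule for batch link files can be illustrated with a small check. This is a sketch only: the `ENTRYPOINT` pattern, the `checkBatchLinks` helper, and the placeholder URLs are assumptions, and the real `validate:batches` script may apply different or stricter rules.

```javascript
// Assumed shape of an entrypoint link: a URL containing "/DB/" and ending in "/view.html".
const ENTRYPOINT = /\/DB\/.+\/view\.html$/;

// Hypothetical helper: returns a list of problems; an empty list means the batch passes.
function checkBatchLinks(links, codeCount) {
  const problems = [];
  if (links.length < codeCount) {
    problems.push(`expected at least ${codeCount} links, found ${links.length}`);
  }
  // Only the first N links (N = number of codes) must be entrypoints.
  links.slice(0, codeCount).forEach((link, i) => {
    if (!ENTRYPOINT.test(link)) {
      problems.push(`link ${i + 1} is not a view.html entrypoint: ${link}`);
    }
  });
  return problems; // links after the first N are optional extras and are not checked
}

// Placeholder URLs for illustration only.
const links = [
  'https://example.org/DB/AAA111/view.html',
  'https://example.org/DB/BBB222/view.html',
  'https://example.org/extras/glossary.pdf',
];
console.log(checkBatchLinks(links, 2)); // []
console.log(checkBatchLinks(links, 3).length); // 1
```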
- The recommended environment is the Linux devcontainer defined in `.devcontainer/`.
- Use VS Code’s “Reopen in Container” (or GitHub Codespaces) to get Node 20, poppler-utils, qpdf, jq, yq, git-lfs, and build-essential preinstalled.
- See `docs/devcontainer.md` for full instructions. Native macOS/Windows setups remain optional but aren’t required for ingest work.
- `./scripts/hash-all.sh`: updates `META.audit_hashes`, `META.automation`, and `scripts_manifest.json`.
- `npm run validate:rich`: ensures every rich JSON conforms to the schemas before deriving lean files.
- `npm run validate:lean`: validates all lean `META.json`, `sections.json`, and `rules.json` artifacts.
- `npm run validate:guardrails`: enforces forestry rule coverage, sequential IDs, and tool provenance consistency.
- `./scripts/check-registry.sh`: confirms `registry.json` mirrors the methodologies tree.
All of the commands above must complete without diffs or errors before opening a pull request. Capture any new evidence files under `outputs/mvp/` and include screenshots or logs referenced in the change summary.
- Section retrieval (BM25):
  - Build dataset: `npm run dataset:sections`
  - Evaluate: `npm run eval:sections:bm25`
  - Example metrics (current corpus): `acc@1≈0.6250`, `mrr@5≈0.7333`
- Parameter/units extraction (TF‑IDF/Linear):
  - Build dataset: `npm run dataset:params`
  - Evaluate: `npm run eval:params:linear`
  - Example metrics: `variables micro‑F1≈0.6364`, `units micro‑F1≈0.9091`
- Rule-type labels (manual curation):
  - Primary annotations: `datasets/rule_type/rules.csv` (`method_tag,anchor,text,label`)
  - Rule metadata: `datasets/rule_type/rules_meta.csv` (`rule_id,rule_type,notes`)
  - Allowed categories listed in `datasets/rule_type/labels.yaml`
  - Update flow: edit CSVs → `./scripts/hash-all.sh` → run validators.
- CLI retrieval wrapper:
  - Installable bin: `mrv-cli`
  - Usage: `npm run cli:query -- "<query text>" --k 5`
  - Prints top‑K rules with sections, summary, and refs (tool kind/path/sha256 lifted from META).
- HTTP engine adapter:
  - Installable bin: `http-engine-adapter`
  - Start locally: `npm run server:http -- --port 3030`
  - POST `http://<host>:<port>/query` with `{ "query": "forest leakage" }` (optional `top_k` ≤ 50).
  - Replies deterministically with BM25-ranked rules across AR-AMS0003 and AR-AMS0007 plus audit hashes (rules/sections/tool refs).
  - Metrics logging: every request prints `requests=<n> p95_ms=<latency>`; set `ENGINE_METRICS_LOG=/path/to/file.log` to append to disk.
- Serverless endpoint (Vercel-style `/api/query`):
  - GET `/api/query?text=forest+leakage[&top_k=5]` for ad-hoc checks.
  - POST `/api/query` with `{ "query": "forest leakage", "top_k": 5 }` for structured calls.
  - Delegates to the same deterministic BM25 engine to keep outputs aligned with the CLI/HTTP adapter.
- Health check:
  - GET `/api/healthz` → `{ "status": "ok", "documents": 26 }` (document count driven by corpus size).
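
A client for the HTTP adapter's `POST /query` endpoint might look like the sketch below. The `top_k` ≤ 50 limit comes from the description above. The helper names, the rejection of non-integer or sub-1 values, and the `content-type` header are assumptions of this example, and the response is returned as parsed JSON because its exact shape is not specified here.

```javascript
// Build the JSON body for POST /query.
// Rejecting non-integers and values below 1 is an assumption of this sketch.
function buildQueryBody(query, topK) {
  if (typeof query !== 'string' || query.trim() === '') {
    throw new Error('query must be a non-empty string');
  }
  const body = { query };
  if (topK !== undefined) {
    if (!Number.isInteger(topK) || topK < 1 || topK > 50) {
      throw new RangeError('top_k must be an integer between 1 and 50');
    }
    body.top_k = topK;
  }
  return body;
}

// Send the query to a running adapter (Node 18+ ships a global fetch).
async function queryEngine(baseUrl, query, topK) {
  const res = await fetch(`${baseUrl}/query`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(buildQueryBody(query, topK)),
  });
  if (!res.ok) throw new Error(`engine returned HTTP ${res.status}`);
  return res.json();
}

console.log(JSON.stringify(buildQueryBody('forest leakage', 5)));
// {"query":"forest leakage","top_k":5}
```

With the adapter started via `npm run server:http -- --port 3030`, calling `queryEngine('http://localhost:3030', 'forest leakage', 5)` would send the same payload shown in the list above.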
Determinism
- Fixed BM25 params and TF‑IDF/Linear hyperparameters; stable ordering and splits.
- Dataset files recorded in `datasets_manifest.json` with SHA‑256.
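
To illustrate how fixed parameters and stable ordering combine into a deterministic ranking, here is a small Okapi BM25 sketch. The constants `K1 = 1.2` and `B = 0.75`, the tokenizer, and the tie-break on document id are assumptions chosen for the example. The repository's engine may use different values and rules.

```javascript
// Assumed constants; the repository's actual BM25 parameters are not stated here.
const K1 = 1.2;
const B = 0.75;

const tokenize = (text) => text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Rank documents ({ id, text }) against a query with Okapi BM25.
function rankBm25(docs, query) {
  const tokenized = docs.map((d) => ({ id: d.id, tokens: tokenize(d.text) }));
  const avgLen = tokenized.reduce((sum, d) => sum + d.tokens.length, 0) / tokenized.length;

  // Document frequency per term.
  const df = new Map();
  for (const d of tokenized) {
    for (const term of new Set(d.tokens)) df.set(term, (df.get(term) || 0) + 1);
  }

  const queryTerms = [...new Set(tokenize(query))];
  const scored = tokenized.map((d) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = d.tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const n = df.get(term);
      const idf = Math.log(1 + (tokenized.length - n + 0.5) / (n + 0.5));
      score += (idf * tf * (K1 + 1)) / (tf + K1 * (1 - B + (B * d.tokens.length) / avgLen));
    }
    return { id: d.id, score };
  });

  // Ties are broken by id so equal scores always come out in the same order.
  return scored.sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

// Toy documents with hypothetical ids; r1 and r2 have identical text and therefore tie.
const docs = [
  { id: 'r2', text: 'leakage from forest activities' },
  { id: 'r1', text: 'leakage from forest activities' },
  { id: 'r3', text: 'baseline emissions' },
];
console.log(rankBm25(docs, 'forest leakage').map((r) => r.id)); // [ 'r1', 'r2', 'r3' ]
```

The explicit id comparison avoids locale-dependent string ordering, so the result does not depend on input order or on the machine's locale.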
See also: docs/baselines-cli.md for quickstart commands, expected metrics, and how to verify determinism locally.
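
For readers checking the `acc@1` and `mrr@5` figures quoted for the section-retrieval baseline, the sketch below shows the standard definitions of both metrics. It assumes each evaluation example has one gold section id and a ranked list of retrieved ids. The evaluation script in this repository may differ in detail, and the data here is a toy set.

```javascript
// acc@1: share of queries whose top result is the gold id.
function accuracyAt1(examples) {
  const hits = examples.filter((e) => e.ranked[0] === e.gold).length;
  return hits / examples.length;
}

// mrr@k: mean of 1/rank of the gold id within the top k, counting 0 when it is absent.
function mrrAtK(examples, k) {
  const total = examples.reduce((sum, e) => {
    const rank = e.ranked.slice(0, k).indexOf(e.gold) + 1; // 0 when not found
    return sum + (rank > 0 ? 1 / rank : 0);
  }, 0);
  return total / examples.length;
}

// Toy data, not taken from the repository's datasets.
const examples = [
  { gold: 's1', ranked: ['s1', 's2', 's3'] },
  { gold: 's2', ranked: ['s1', 's2', 's3'] },
  { gold: 's9', ranked: ['s1', 's2', 's3'] },
  { gold: 's3', ranked: ['s3', 's1', 's2'] },
];
console.log(accuracyAt1(examples)); // 0.5
console.log(mrrAtK(examples, 5)); // 0.625
```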
Use node scripts/check-source-hash.js to verify that all META.references.tools[*] entries exist and match their recorded SHA-256. This avoids assumptions about folder layout and treats META as the source of truth for tool paths.
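
The idea behind that check can be sketched as follows. This is not the source of `scripts/check-source-hash.js`. It assumes each `META.references.tools[]` entry carries `path` and `sha256` fields and that paths resolve relative to a root directory. The `checkToolRefs` helper and the demo file names are hypothetical.

```javascript
const { createHash } = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical check: report tools[] entries that are missing or whose digest differs.
function checkToolRefs(meta, rootDir) {
  const tools = (meta.references && meta.references.tools) || [];
  const problems = [];
  for (const tool of tools) {
    const file = path.join(rootDir, tool.path);
    if (!fs.existsSync(file)) {
      problems.push({ path: tool.path, issue: 'missing' });
      continue;
    }
    const digest = createHash('sha256').update(fs.readFileSync(file)).digest('hex');
    if (digest !== tool.sha256) problems.push({ path: tool.path, issue: 'sha256 mismatch' });
  }
  return problems;
}

// Self-contained demo in a temporary directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'tools-'));
fs.writeFileSync(path.join(root, 'tool.pdf'), 'demo');
const good = createHash('sha256').update('demo').digest('hex');
const meta = {
  references: {
    tools: [
      { path: 'tool.pdf', sha256: good }, // present and matching
      { path: 'gone.pdf', sha256: good }, // not on disk
    ],
  },
};
const problems = checkToolRefs(meta, root);
console.log(problems); // reports gone.pdf as missing
fs.rmSync(root, { recursive: true, force: true });
```

Because the paths come from META itself, the check does not need to know how tool files are arranged on disk.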
- JSON files: UTF-8 encoding, LF line endings, 2-space indentation.
- Do not delete evidence; supersede only.
- `registry.json` mirrors `/methodologies`.
This structure is normative. Changes require a "Stable Tree vX" section and CI update.