Calibration
New in v0.3
By default, infermap's confidence scores are raw weighted averages of scorer outputs. They're useful for ranking but not calibrated probabilities — a 0.40 confidence doesn't mean "40% chance this mapping is correct." Calibration fixes that.
A calibrator is a function f: [0,1] → [0,1] that transforms raw confidence into calibrated probability. It's applied after the Hungarian assignment picks the mappings, so:
- F1, top-1, and MRR are unchanged (same mappings, different numbers on them)
- ECE (expected calibration error) drops dramatically (0.46 → 0.005 on Valentine)
- Users can trust `confidence = p(correct)` for downstream decisions
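For reference, here is a minimal sketch of how expected calibration error is typically computed: bin predictions by confidence, then take the bin-weighted mean gap between accuracy and confidence. Equal-width bins are a common choice; the exact binning `infermap-bench` uses may differ.

```python
import numpy as np

def ece(conf, correct, n_bins=10):
    """Expected calibration error over equal-width confidence bins:
    sum of bin_weight * |mean correctness - mean confidence| per bin."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total
```

A perfectly calibrated model (e.g. confidence 0.8 on mappings that are correct 80% of the time) gives an ECE of 0.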
| Calibrator | Method | Best for |
|---|---|---|
| `IdentityCalibrator` | Passthrough (no-op) | Testing / baseline |
| `IsotonicCalibrator` | Pool-adjacent-violators (monotonic piecewise linear) | Most use cases |
| `PlattCalibrator` | Sigmoid fit via maximum likelihood | When you want a smooth parametric model |
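The pool-adjacent-violators idea behind the isotonic option is small enough to sketch in a few lines. This is a generic illustration of the algorithm, not infermap's implementation:

```python
def pav(y):
    """Pool-adjacent-violators: monotone non-decreasing fit to y (unit weights).
    Whenever an adjacent pair of blocks violates monotonicity, the blocks are
    merged and replaced by their mean."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the previous block's mean exceeds the new block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

Fitted on correctness labels sorted by raw confidence, the pooled means become the knots of a monotone map from raw confidence to calibrated probability.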
```python
from infermap import MapEngine
from infermap.calibration import IsotonicCalibrator, save_calibrator, load_calibrator
import numpy as np

# Step 1: Collect (confidence, correct) pairs on labeled data
engine = MapEngine()
scores = []
correct = []
for case in my_labeled_cases:
    result = engine.map_schemas(case.source, case.target)
    for m in result.mappings:
        scores.append(m.confidence)
        correct.append(1.0 if (m.source, m.target) in case.expected else 0.0)

# Step 2: Fit
cal = IsotonicCalibrator()
cal.fit(np.array(scores), np.array(correct))

# Step 3: Save for production
save_calibrator(cal, "calibrator.json")

# Step 4: Use in production
cal = load_calibrator("calibrator.json")
engine = MapEngine(calibrator=cal)
result = engine.map_schemas(source, target)
# result.mappings[0].confidence is now a calibrated probability
```

The same API is available in TypeScript:

```typescript
import { MapEngine, IsotonicCalibrator, saveCalibrator, loadCalibrator } from "infermap";

const cal = new IsotonicCalibrator();
cal.fit(scores, correct);
const json = saveCalibrator(cal);

// Store json, reload later:
const loaded = loadCalibrator(json);
const engine = new MapEngine({ calibrator: loaded });
```

The benchmark CLI can fit and apply calibrators:

```shell
# Fit on Valentine corpus
infermap-bench calibrate --only category:valentine --method isotonic --output cal.json

# Run benchmark with calibrator
infermap-bench run --calibrator cal.json --output calibrated-report.json
```

A calibrator never changes which mappings the engine picks; it only relabels the confidence on each picked mapping. This is enforced by the architecture (calibration runs after assignment) and verified by the test `test_calibrator_does_not_change_mappings`.
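The ordering guarantee can be pictured with a toy pipeline. The names below are illustrative stand-ins, not infermap internals; the greedy selector is a stand-in for the Hungarian assignment:

```python
def select_mappings(pairs):
    """Greedy stand-in for the Hungarian assignment: take (source, target,
    raw_confidence) triples by descending confidence, using each source and
    each target at most once."""
    chosen, used_s, used_t = [], set(), set()
    for s, t, c in sorted(pairs, key=lambda p: -p[2]):
        if s not in used_s and t not in used_t:
            chosen.append((s, t, c))
            used_s.add(s)
            used_t.add(t)
    return chosen

def map_with_calibration(raw_pairs, calibrate):
    """Assignment first, calibration second: the chosen (source, target)
    pairs are fixed before `calibrate` ever runs, so any [0,1] -> [0,1]
    function can only relabel confidences, never change the mapping set."""
    chosen = select_mappings(raw_pairs)
    return [(s, t, calibrate(c)) for s, t, c in chosen]
```

Because calibration happens after selection, the picked pairs are identical for any calibrator, which is exactly the invariant the test asserts.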
Calibrators serialize as JSON with a `kind` discriminator:

```json
{"kind": "isotonic", "x": [0.1, 0.2, ...], "y": [0.05, 0.15, ...]}
{"kind": "platt", "a": 2.5, "b": -1.0}
{"kind": "identity"}
```

Python-fitted calibrators can be loaded by the TypeScript engine and vice versa.
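As a sketch of what a cross-language loader has to do, here is one way the payloads could be interpreted. This assumes isotonic knots are linearly interpolated and that the Platt parameters enter as `sigmoid(a * s + b)`; both are assumptions about the conventions, so check the engine source rather than relying on this sketch:

```python
import json
import numpy as np

def load_and_apply(payload, conf):
    """Parse a serialized calibrator (JSON string) and apply it to raw
    confidences, dispatching on the `kind` discriminator."""
    cal = json.loads(payload)
    conf = np.asarray(conf, dtype=float)
    if cal["kind"] == "identity":
        return conf  # passthrough
    if cal["kind"] == "platt":
        # assumed convention: sigmoid of an affine transform of the score
        return 1.0 / (1.0 + np.exp(-(cal["a"] * conf + cal["b"])))
    if cal["kind"] == "isotonic":
        # assumed convention: linear interpolation between (x, y) knots
        return np.interp(conf, cal["x"], cal["y"])
    raise ValueError(f"unknown calibrator kind: {cal['kind']}")
```

Because the payload is plain JSON keyed by `kind`, either engine only needs a small dispatcher like this to consume calibrators fitted by the other.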
- Accuracy Benchmark — how ECE is measured
- Example: 10_calibration.py