
Calibration

bsevern edited this page Apr 10, 2026 · 1 revision

Confidence Calibration

New in v0.3

By default, infermap's confidence scores are raw weighted averages of scorer outputs. They're useful for ranking but not calibrated probabilities — a 0.40 confidence doesn't mean "40% chance this mapping is correct." Calibration fixes that.

How it works

A calibrator is a function f: [0,1] → [0,1] that transforms raw confidence into calibrated probability. It's applied after the Hungarian assignment picks the mappings, so:

  • F1, top-1, and MRR are unchanged (same mappings, different numbers on them)
  • ECE (expected calibration error) drops dramatically (0.46 → 0.005 on Valentine)
  • Users can now trust confidence = p(correct) for downstream decisions

Available calibrators

  • IdentityCalibrator: passthrough (no-op); best for testing / baseline
  • IsotonicCalibrator: pool-adjacent-violators (monotonic piecewise linear); best for most use cases
  • PlattCalibrator: sigmoid fit via maximum likelihood; best when you want a smooth parametric model
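For intuition, pool-adjacent-violators (the fitting method behind IsotonicCalibrator) can be sketched in a few lines. This is an illustrative implementation only, not infermap's internals:

```python
def pav(labels):
    """Pool-adjacent-violators on 0/1 labels sorted by ascending raw confidence.
    Returns the monotone non-decreasing fit minimizing squared error.
    Illustrative sketch; not infermap's actual implementation."""
    blocks = []  # each block: [mean, weight]
    for y in map(float, labels):
        blocks.append([y, 1.0])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    fit = []
    for mean, weight in blocks:
        fit.extend([mean] * int(weight))
    return fit

# Labels ordered by ascending raw confidence:
print(pav([0.0, 1.0, 0.0, 1.0, 1.0]))  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

The merged block means are what make the fitted curve piecewise constant and monotonic, which is why isotonic calibration never reorders confidences.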

Usage

Python

from infermap import MapEngine
from infermap.calibration import IsotonicCalibrator, save_calibrator, load_calibrator
import numpy as np

# Step 1: Collect (confidence, correct) pairs on labeled data
engine = MapEngine()
scores = []
correct = []
for case in my_labeled_cases:
    result = engine.map_schemas(case.source, case.target)
    for m in result.mappings:
        scores.append(m.confidence)
        correct.append(1.0 if (m.source, m.target) in case.expected else 0.0)

# Step 2: Fit
cal = IsotonicCalibrator()
cal.fit(np.array(scores), np.array(correct))

# Step 3: Save for production
save_calibrator(cal, "calibrator.json")

# Step 4: Use in production
cal = load_calibrator("calibrator.json")
engine = MapEngine(calibrator=cal)
result = engine.map_schemas(source, target)
# result.mappings[0].confidence is now a calibrated probability
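To sanity-check the fit, you can measure ECE on held-out (confidence, correct) pairs. A minimal binned-ECE sketch (the benchmark's exact binning scheme may differ):

```python
import numpy as np

def ece(conf, correct, n_bins=10):
    """Expected calibration error: bin-weighted |accuracy - mean confidence|.
    Illustrative sketch; infermap-bench's exact binning may differ."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            total += mask.mean() * gap  # weight by fraction of samples in bin
    return total
```

Fit on one split and measure ECE on another; evaluating on the fitting data will make any calibrator look better than it is.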

TypeScript

import { MapEngine, IsotonicCalibrator, saveCalibrator, loadCalibrator } from "infermap";

const cal = new IsotonicCalibrator();
cal.fit(scores, correct);

const json = saveCalibrator(cal);
// Store json, reload later:
const loaded = loadCalibrator(json);
const engine = new MapEngine({ calibrator: loaded });

Benchmark CLI

# Fit on Valentine corpus
infermap-bench calibrate --only category:valentine --method isotonic --output cal.json

# Run benchmark with calibrator
infermap-bench run --calibrator cal.json --output calibrated-report.json

Critical invariant

A calibrator never changes which mappings the engine picks. It only relabels the confidence on each picked mapping. This is enforced by the architecture (calibration runs after assignment) and verified by test_calibrator_does_not_change_mappings.
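The invariant can be illustrated with plain data structures (hypothetical shapes for the sketch; the real test exercises the engine itself):

```python
def apply_calibrator(mappings, f):
    # Only the confidence field is rewritten; (source, target) pairs are untouched.
    return [(src, tgt, f(conf)) for src, tgt, conf in mappings]

# Hypothetical mappings already picked by the assignment step:
picked = [("cust_id", "customer_id", 0.40), ("nm", "name", 0.75)]
relabeled = apply_calibrator(picked, lambda c: min(1.0, 0.5 * c + 0.3))

# Same (source, target) pairs before and after; only the numbers change.
assert [(s, t) for s, t, _ in picked] == [(s, t) for s, t, _ in relabeled]
```

Because calibration runs strictly after assignment, this holds for any calibrator function, monotonic or not.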

JSON format

Calibrators serialize as JSON with a kind discriminator:

{"kind": "isotonic", "x": [0.1, 0.2, ...], "y": [0.05, 0.15, ...]}
{"kind": "platt", "a": 2.5, "b": -1.0}
{"kind": "identity"}

Python-fitted calibrators can be loaded by the TypeScript engine and vice versa.
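Applying a serialized calibrator is a matter of dispatching on kind. A hedged sketch (field names follow the JSON examples above; the linear interpolation between isotonic knots is an assumption about the format's semantics):

```python
import json
import math

def apply_from_json(doc, score):
    """Apply a serialized calibrator to one raw score. Illustrative only."""
    cal = json.loads(doc)
    if cal["kind"] == "identity":
        return score
    if cal["kind"] == "platt":
        # Sigmoid with fitted parameters a, b.
        return 1.0 / (1.0 + math.exp(-(cal["a"] * score + cal["b"])))
    if cal["kind"] == "isotonic":
        # Piecewise-linear interpolation between (x, y) knots,
        # clamped at the ends.
        xs, ys = cal["x"], cal["y"]
        if score <= xs[0]:
            return ys[0]
        if score >= xs[-1]:
            return ys[-1]
        for i in range(1, len(xs)):
            if score <= xs[i]:
                t = (score - xs[i - 1]) / (xs[i] - xs[i - 1])
                return ys[i - 1] + t * (ys[i] - ys[i - 1])
    raise ValueError(f"unknown kind: {cal['kind']}")
```

The kind discriminator is what makes the format portable: any engine that recognizes the three kinds can apply a calibrator fitted elsewhere.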
