Calibration
New in v0.3
By default, infermap's confidence scores are raw weighted averages of scorer outputs. They're useful for ranking but not calibrated probabilities — a 0.40 confidence doesn't mean "40% chance this mapping is correct." Calibration fixes that.
A calibrator is a function f: [0,1] → [0,1] that transforms raw confidence into calibrated probability. It's applied after the Hungarian assignment picks the mappings, so:
- F1, top-1, and MRR are unchanged (same mappings, different numbers on them)
- ECE (expected calibration error) drops dramatically (0.46 → 0.005 on Valentine)
- Users can trust `confidence = p(correct)` for downstream decisions
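For reference, here is a minimal sketch of how expected calibration error is typically computed: bin predictions by confidence, then take the bin-weighted mean gap between accuracy and confidence. Equal-width bins are a common choice; the exact binning `infermap-bench` uses may differ.

```python
import numpy as np

def ece(conf, correct, n_bins=10):
    """Expected calibration error over equal-width confidence bins:
    sum of bin_weight * |mean correctness - mean confidence| per bin."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total
```

A perfectly calibrated model (e.g. confidence 0.8 on mappings that are correct 80% of the time) gives an ECE of 0.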
| Calibrator | Method | Best for |
|---|---|---|
| `IdentityCalibrator` | Passthrough (no-op) | Testing / baseline |
| `IsotonicCalibrator` | Pool-adjacent-violators (monotonic piecewise linear) | Most use cases |
| `PlattCalibrator` | Sigmoid fit via maximum likelihood | When you want a smooth parametric model |
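The pool-adjacent-violators idea behind the isotonic option is small enough to sketch in a few lines. This is a generic illustration of the algorithm, not infermap's implementation:

```python
def pav(y):
    """Pool-adjacent-violators: monotone non-decreasing fit to y (unit weights).
    Whenever an adjacent pair of blocks violates monotonicity, the blocks are
    merged and replaced by their mean."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the previous block's mean exceeds the new block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

Fitted on correctness labels sorted by raw confidence, the pooled means become the knots of a monotone map from raw confidence to calibrated probability.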
```python
from infermap import MapEngine
from infermap.calibration import IsotonicCalibrator, save_calibrator, load_calibrator
import numpy as np

# Step 1: Collect (confidence, correct) pairs on labeled data
engine = MapEngine()
scores = []
correct = []
for case in my_labeled_cases:
    result = engine.map_schemas(case.source, case.target)
    for m in result.mappings:
        scores.append(m.confidence)
        correct.append(1.0 if (m.source, m.target) in case.expected else 0.0)

# Step 2: Fit
cal = IsotonicCalibrator()
cal.fit(np.array(scores), np.array(correct))

# Step 3: Save for production
save_calibrator(cal, "calibrator.json")

# Step 4: Use in production
cal = load_calibrator("calibrator.json")
engine = MapEngine(calibrator=cal)
result = engine.map_schemas(source, target)
# result.mappings[0].confidence is now a calibrated probability
```

The same API is available in TypeScript:

```typescript
import { MapEngine, IsotonicCalibrator, saveCalibrator, loadCalibrator } from "infermap";

const cal = new IsotonicCalibrator();
cal.fit(scores, correct);
const json = saveCalibrator(cal);

// Store json, reload later:
const loaded = loadCalibrator(json);
const engine = new MapEngine({ calibrator: loaded });
```

The benchmark CLI can fit and apply calibrators:

```shell
# Fit on Valentine corpus
infermap-bench calibrate --only category:valentine --method isotonic --output cal.json

# Run benchmark with calibrator
infermap-bench run --calibrator cal.json --output calibrated-report.json
```

A calibrator never changes which mappings the engine picks; it only relabels the confidence on each picked mapping. This is enforced by the architecture (calibration runs after assignment) and verified by the test `test_calibrator_does_not_change_mappings`.
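The ordering guarantee can be pictured with a toy pipeline. The names below are illustrative stand-ins, not infermap internals; the greedy selector is a stand-in for the Hungarian assignment:

```python
def select_mappings(pairs):
    """Greedy stand-in for the Hungarian assignment: take (source, target,
    raw_confidence) triples by descending confidence, using each source and
    each target at most once."""
    chosen, used_s, used_t = [], set(), set()
    for s, t, c in sorted(pairs, key=lambda p: -p[2]):
        if s not in used_s and t not in used_t:
            chosen.append((s, t, c))
            used_s.add(s)
            used_t.add(t)
    return chosen

def map_with_calibration(raw_pairs, calibrate):
    """Assignment first, calibration second: the chosen (source, target)
    pairs are fixed before `calibrate` ever runs, so any [0,1] -> [0,1]
    function can only relabel confidences, never change the mapping set."""
    chosen = select_mappings(raw_pairs)
    return [(s, t, calibrate(c)) for s, t, c in chosen]
```

Because calibration happens after selection, the picked pairs are identical for any calibrator, which is exactly the invariant the test asserts.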
Calibrators serialize as JSON with a `kind` discriminator:

```json
{"kind": "isotonic", "x": [0.1, 0.2, ...], "y": [0.05, 0.15, ...]}
{"kind": "platt", "a": 2.5, "b": -1.0}
{"kind": "identity"}
```

Python-fitted calibrators can be loaded by the TypeScript engine and vice versa.
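As a sketch of what a cross-language loader has to do, here is one way the payloads could be interpreted. This assumes isotonic knots are linearly interpolated and that the Platt parameters enter as `sigmoid(a * s + b)`; both are assumptions about the conventions, so check the engine source rather than relying on this sketch:

```python
import json
import numpy as np

def load_and_apply(payload, conf):
    """Parse a serialized calibrator (JSON string) and apply it to raw
    confidences, dispatching on the `kind` discriminator."""
    cal = json.loads(payload)
    conf = np.asarray(conf, dtype=float)
    if cal["kind"] == "identity":
        return conf  # passthrough
    if cal["kind"] == "platt":
        # assumed convention: sigmoid of an affine transform of the score
        return 1.0 / (1.0 + np.exp(-(cal["a"] * conf + cal["b"])))
    if cal["kind"] == "isotonic":
        # assumed convention: linear interpolation between (x, y) knots
        return np.interp(conf, cal["x"], cal["y"])
    raise ValueError(f"unknown calibrator kind: {cal['kind']}")
```

Because the payload is plain JSON keyed by `kind`, either engine only needs a small dispatcher like this to consume calibrators fitted by the other.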
- Accuracy Benchmark — how ECE is measured
- Example: 10_calibration.py