Stickler Documentation

Stickler is a Python library for structured JSON comparison and evaluation, built for generative AI workflows. It uses specialized comparators, business-weighted scoring, and the Hungarian algorithm to tell you not just whether your AI output is accurate, but whether the errors actually matter.

Key Use Case: Key Information Extraction (KIE)

Generative AI models extract structured data from documents such as invoices, forms, receipts, and medical records. Two questions follow: how accurate is the extraction, and do the errors matter? Stickler answers both by comparing AI output against ground truth with field-level precision, business-weighted scoring, and optimal list matching.

from stickler import StructuredModel, ComparableField
from stickler.comparators import ExactComparator, NumericComparator

# 1. Define what "correct" looks like — each field gets its own comparator and weight
class Invoice(StructuredModel):
    invoice_id: str = ComparableField(comparator=ExactComparator(), weight=3.0)   # Must match perfectly, high weight
    total: float = ComparableField(comparator=NumericComparator(tolerance=0.01), weight=2.0)  # Allow rounding
    vendor: str = ComparableField(weight=1.0)  # Fuzzy text match by default

# 2. Compare ground truth vs AI prediction
gt = Invoice(invoice_id="INV-001", total=1250.00, vendor="Acme Corp")
pred = Invoice(invoice_id="INV-001", total=1250.00, vendor="ACME Corporation")
result = gt.compare_with(pred)

# 3. Get a single weighted score + per-field breakdown
print(f"Score: {result['overall_score']:.3f}")  # 0.964 if the overall score is the weighted mean of the field scores below
print(result['field_scores'])
# {'invoice_id': 1.0, 'total': 1.0, 'vendor': 0.786}
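To make the effect of the weights concrete, the arithmetic can be reproduced by hand. The sketch below assumes the overall score is the weight-normalized mean of the per-field scores, which is how the architecture description labels the aggregation ("weighted average"); Stickler's actual aggregation may differ, for example once thresholds and clipping are applied. The helper `weighted_mean` and the two error scenarios are hypothetical and not part of the library.

```python
# Weights and per-field scores copied from the Invoice example above.
weights = {"invoice_id": 3.0, "total": 2.0, "vendor": 1.0}
field_scores = {"invoice_id": 1.0, "total": 1.0, "vendor": 0.786}


def weighted_mean(scores, weights):
    """Weight-normalized mean of per-field scores (assumed aggregation rule)."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


overall = weighted_mean(field_scores, weights)
print(f"{overall:.3f}")  # 0.964

# Hypothetical scenarios: the same single-field failure costs more or less
# depending on the weight of the field that failed.
wrong_id = {"invoice_id": 0.0, "total": 1.0, "vendor": 1.0}
wrong_vendor = {"invoice_id": 1.0, "total": 1.0, "vendor": 0.0}
print(f"{weighted_mean(wrong_id, weights):.3f}")      # 0.500
print(f"{weighted_mean(wrong_vendor, weights):.3f}")  # 0.833
```

Under this assumption, a wrong invoice ID halves the score while a wrong vendor name costs about 17%, which is the sense in which weights encode whether an error matters.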

How Stickler Works

Stickler uses a layered architecture where each layer builds on the one below it. Comparators handle primitive value comparison (exact, numeric, fuzzy, semantic). ComparableFields attach a comparator, threshold, and weight to each field. StructuredModels compose fields into nested, evaluation-aware data structures with Hungarian list matching. BulkStructuredModelEvaluator aggregates results across an entire test set.
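The layering described above can be illustrated with a toy, standard-library-only sketch. Nothing here is Stickler's implementation: `FieldSpec`, `compare_records`, and `evaluate_bulk` are hypothetical names, `difflib` stands in for whatever string metric the library uses, and the clipping rule (scores below the threshold become 0.0) and plain averaging across documents are assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict


# Layer 1: comparators map two primitive values to a similarity in [0.0, 1.0].
def exact(a, b) -> float:
    return 1.0 if a == b else 0.0


def numeric(tolerance: float) -> Callable[[float, float], float]:
    def compare(a: float, b: float) -> float:
        return 1.0 if abs(a - b) <= tolerance else 0.0
    return compare


def fuzzy(a: str, b: str) -> float:
    # Case-insensitive difflib ratio, used here only as a stand-in metric.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Layer 2: a field spec bundles a comparator with a threshold and a weight.
@dataclass
class FieldSpec:
    comparator: Callable[[object, object], float]
    weight: float = 1.0
    threshold: float = 0.5
    clip: bool = False  # assumed rule: below-threshold scores become 0.0

    def score(self, truth, predicted) -> float:
        raw = self.comparator(truth, predicted)
        if self.clip and raw < self.threshold:
            return 0.0
        return raw


# Layer 3: a record-level comparison is a weighted mean over field scores.
def compare_records(spec: Dict[str, FieldSpec], truth: dict, predicted: dict) -> dict:
    field_scores = {n: f.score(truth[n], predicted[n]) for n, f in spec.items()}
    total_weight = sum(f.weight for f in spec.values())
    overall = sum(field_scores[n] * spec[n].weight for n in spec) / total_weight
    return {"overall_score": overall, "field_scores": field_scores}


# Layer 4: bulk evaluation averages the overall score across a test set.
def evaluate_bulk(spec: Dict[str, FieldSpec], pairs) -> float:
    scores = [compare_records(spec, t, p)["overall_score"] for t, p in pairs]
    return sum(scores) / len(scores)


spec = {
    "invoice_id": FieldSpec(exact, weight=3.0),
    "total": FieldSpec(numeric(0.01), weight=2.0),
    "vendor": FieldSpec(fuzzy, weight=1.0),
}
truth = {"invoice_id": "INV-001", "total": 1250.00, "vendor": "Acme Corp"}
good = {"invoice_id": "INV-001", "total": 1250.00, "vendor": "Acme Corp"}
bad_id = {"invoice_id": "INV-999", "total": 1250.00, "vendor": "Acme Corp"}

print(compare_records(spec, truth, good)["overall_score"])    # 1.0
print(compare_records(spec, truth, bad_id)["overall_score"])  # 0.5
print(evaluate_bulk(spec, [(truth, good), (truth, bad_id)]))  # 0.75
```

Each layer only calls the one beneath it, so swapping a comparator or changing a weight never touches the aggregation code.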

graph TD
    A[BulkStructuredModelEvaluator] -->|iterates over document pairs| B[StructuredModel]
    B -->|contains weighted| C[ComparableField]
    C -->|delegates to| D[Comparator]
    D -->|returns 0.0 – 1.0| C
    C -->|threshold + clip| B
    B -->|weighted average + Hungarian matching| A
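List matching is the least obvious step, so a small sketch may help. When a prediction returns list items in a different order than the ground truth, the items have to be paired before they can be scored, and the pairing should maximize total similarity. The example below finds that optimal pairing by brute force over permutations, which is not the Hungarian algorithm but returns the same optimum on small inputs; the Hungarian algorithm reaches it in polynomial time (SciPy's `scipy.optimize.linear_sum_assignment` solves the same assignment problem). The function `best_assignment` and the `difflib` similarity are illustrative assumptions, not Stickler's API.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a: str, b: str) -> float:
    # Stand-in string metric; a real comparator would be plugged in here.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_assignment(truth, predicted):
    """Return (pairs, total) for the one-to-one pairing with maximal similarity.

    pairs is a list of (truth_index, predicted_index). Items of the longer
    list that cannot be paired are left unmatched.
    """
    n, m = len(truth), len(predicted)
    if n <= m:
        candidates = (list(zip(range(n), perm)) for perm in permutations(range(m), n))
    else:
        candidates = (list(zip(perm, range(m))) for perm in permutations(range(n), m))

    best_pairs, best_total = [], -1.0
    for pairs in candidates:
        total = sum(similarity(truth[i], predicted[j]) for i, j in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    return best_pairs, best_total


truth_items = ["Widget A", "Gadget B", "Cable C"]
predicted_items = ["cable c", "widget a", "gadget b"]  # same items, shuffled

pairs, total = best_assignment(truth_items, predicted_items)
print(pairs)  # [(0, 1), (1, 2), (2, 0)]
print(total)  # 3.0
```

Comparing the two lists position by position would penalize the shuffled prediction even though every item is present; pairing first avoids that. One plausible way to turn `total` into a list score is to divide by the length of the longer list so that unmatched items count as zero, though that normalization is an assumption here.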

Getting Started

Installation, quick start, and your first evaluation in 30 seconds.

Comparators

Exact, numeric, fuzzy, semantic, and LLM-based comparison algorithms.

Evaluation

Thresholds, weights, clipping, JSON Schema config, and bulk evaluation.

Use Cases

Document extraction, OCR, entity extraction, ML evaluation, and ETL validation.

Best Practices

Threshold tuning, SME calibration, weight assignment, and performance.

Advanced

Hungarian algorithm internals, recursive engine, and custom comparators.

API Reference

Complete documentation for all classes, methods, and configuration.

Contributing

Issue reporting, pull requests, and development setup.