Stickler Documentation
Stickler is a Python library for structured JSON comparison and evaluation, built for generative AI workflows. It uses specialized comparators, business-weighted scoring, and the Hungarian algorithm to tell you not just whether your AI output is accurate, but whether the errors actually matter.
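The Hungarian algorithm solves the assignment problem. Given a cost for pairing each ground-truth list item with each predicted item, it finds the one-to-one pairing with the lowest total cost. As a result, a list that comes back in a different order is not scored as wrong.

The sketch below works under stated assumptions:

- `hungarian` is a textbook O(n³) Kuhn-Munkres implementation for square cost matrices.
- `match_lists` is a hypothetical helper. It uses cost = 1 - similarity and pads the shorter list with zero-similarity placeholders.
- Neither function is Stickler's internal code.
- `difflib.SequenceMatcher.ratio` stands in for a Stickler comparator.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment for a square matrix (Kuhn-Munkres with potentials).

    Returns assignment[row] = column.
    """
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (index 0 is a virtual column)
    p = [0] * (n + 1)     # p[col] = row matched to col (0 = unmatched)
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            # Grow the alternating tree by the column with the smallest reduced cost.
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so a new column reaches zero reduced cost.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # free column reached: augmenting path found
                break
        # Flip matches along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def match_lists(gt: Sequence[str], pred: Sequence[str],
                similarity: Callable[[str, str], float]) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: pair list items so that total similarity is maximal."""
    n = max(len(gt), len(pred))
    # Cost = 1 - similarity; padded cells (missing items) cost 1.0, i.e. similarity 0.
    cost = [[1.0 - similarity(gt[i], pred[j]) if i < len(gt) and j < len(pred) else 1.0
             for j in range(n)] for i in range(n)]
    pairs = []
    for i, j in enumerate(hungarian(cost)):
        if i < len(gt) and j < len(pred):  # drop pairings with a padded placeholder
            pairs.append((i, j, 1.0 - cost[i][j]))
    return pairs


def ratio(a: str, b: str) -> float:
    # difflib stands in for a real Stickler comparator here
    return SequenceMatcher(None, a, b).ratio()


gt_items = ["Widget A", "Gadget B", "Sprocket C"]
pred_items = ["Sprocket C", "Widget A", "Gadget B"]
print(match_lists(gt_items, pred_items, ratio))
# [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
```

Padding makes unmatched items explicit. Matching `["a", "b"]` against `["b"]` returns only the `("b", "b")` pair, `(1, 0, 1.0)`, and `"a"` is left as a missed item.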
Key Use Case: Key Information Extraction (KIE)
Generative AI models extract structured data from documents — invoices, forms, receipts, medical records. But how accurate is the extraction? And do the errors actually matter? Stickler answers both questions by comparing AI output against ground truth with field-level precision, business-weighted scoring, and optimal list matching.
```python
from stickler import StructuredModel, ComparableField
from stickler.comparators import ExactComparator, NumericComparator

# 1. Define what "correct" looks like: each field gets its own comparator and weight
class Invoice(StructuredModel):
    invoice_id: str = ComparableField(comparator=ExactComparator(), weight=3.0)  # Must match perfectly, high weight
    total: float = ComparableField(comparator=NumericComparator(tolerance=0.01), weight=2.0)  # Allow rounding
    vendor: str = ComparableField(weight=1.0)  # Fuzzy text match by default

# 2. Compare ground truth vs AI prediction
gt = Invoice(invoice_id="INV-001", total=1250.00, vendor="Acme Corp")
pred = Invoice(invoice_id="INV-001", total=1250.00, vendor="ACME Corporation")
result = gt.compare_with(pred)

# 3. Get a single weighted score + per-field breakdown
print(f"Score: {result['overall_score']:.3f}")  # 0.964 = (3.0*1.0 + 2.0*1.0 + 1.0*0.786) / 6.0
print(result['field_scores'])
# {'invoice_id': 1.0, 'total': 1.0, 'vendor': 0.786}
```
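The overall score can be checked by hand. This plain-Python sketch assumes that `overall_score` is the weight-normalized average of the per-field scores, which is the "weighted average" step in the architecture diagram. Note that `weighted_score` is a hypothetical helper, not part of Stickler's API, and any threshold clipping is left out.

```python
# Weights and per-field scores copied from the Invoice example above.
weights = {"invoice_id": 3.0, "total": 2.0, "vendor": 1.0}
field_scores = {"invoice_id": 1.0, "total": 1.0, "vendor": 0.786}


def weighted_score(scores: dict, weights: dict) -> float:
    # Hypothetical helper: sum(weight * score) / sum(weight)
    return sum(weights[f] * scores[f] for f in weights) / sum(weights.values())


print(f"{weighted_score(field_scores, weights):.3f}")  # 0.964

# Same prediction, but with a wrong invoice ID: the high-weight field dominates.
wrong_id = {**field_scores, "invoice_id": 0.0}
print(f"{weighted_score(wrong_id, weights):.3f}")  # 0.464
```

Compare the two errors. The vendor-name variant costs only about 0.036, while a wrong `invoice_id` costs 0.5. That gap is the sense in which business weighting separates errors that matter from those that don't.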
How Stickler Works
Stickler uses a layered architecture in which each layer builds on the one below it. Comparators handle primitive value comparison (exact, numeric, fuzzy, semantic) and return a score between 0.0 and 1.0. ComparableFields attach a comparator, threshold, and weight to each field. StructuredModels compose fields into nested, evaluation-aware data structures, combine field scores by weighted average, and pair list items with the Hungarian algorithm. BulkStructuredModelEvaluator aggregates results across an entire test set.
```mermaid
graph TD
    A[BulkStructuredModelEvaluator] -->|iterates over document pairs| B[StructuredModel]
    B -->|contains weighted| C[ComparableField]
    C -->|delegates to| D[Comparator]
    D -->|returns 0.0 – 1.0| C
    C -->|threshold + clip| B
    B -->|weighted average + Hungarian matching| A
```
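To make the layering concrete, here is a minimal pure-Python sketch of the same four layers. It is not Stickler's implementation. The following are all assumptions chosen for illustration:

- the names `exact`, `numeric`, `fuzzy`, `Field`, `score_model`, and `bulk_evaluate`;
- the default threshold of 0.5;
- the clipping rule, under which scores below the threshold become 0.0;
- the plain mean used for bulk aggregation.

`difflib.SequenceMatcher` stands in for Stickler's fuzzy comparator.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


# Layer 1: comparators map two values to a similarity in [0.0, 1.0].
def exact(a, b) -> float:
    return 1.0 if a == b else 0.0


def numeric(tolerance: float) -> Callable[[float, float], float]:
    # Assumed all-or-nothing rule: within tolerance counts as a full match.
    def compare(a: float, b: float) -> float:
        return 1.0 if abs(a - b) <= tolerance else 0.0
    return compare


def fuzzy(a: str, b: str) -> float:
    # difflib ratio stands in for whatever fuzzy metric Stickler actually uses
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Layer 2: a field pairs a comparator with a weight and a threshold.
@dataclass
class Field:
    comparator: Callable[[object, object], float]
    weight: float = 1.0
    threshold: float = 0.5            # assumed default
    clip_under_threshold: bool = True

    def score(self, gt, pred) -> float:
        s = self.comparator(gt, pred)
        # Clip: a score below the threshold counts as 0.0 (assumed semantics).
        return 0.0 if self.clip_under_threshold and s < self.threshold else s


# Layer 3: a model combines its field scores into a weighted average.
def score_model(schema: Dict[str, Field], gt: dict, pred: dict) -> float:
    total_weight = sum(f.weight for f in schema.values())
    weighted = sum(f.weight * f.score(gt[name], pred[name]) for name, f in schema.items())
    return weighted / total_weight


# Layer 4: bulk evaluation aggregates over document pairs (here: a plain mean).
def bulk_evaluate(schema: Dict[str, Field], pairs: List[Tuple[dict, dict]]) -> float:
    scores = [score_model(schema, gt, pred) for gt, pred in pairs]
    return sum(scores) / len(scores)


schema = {
    "invoice_id": Field(exact, weight=3.0),
    "total": Field(numeric(0.01), weight=2.0),
    "vendor": Field(fuzzy, weight=1.0),
}
pairs = [
    ({"invoice_id": "INV-001", "total": 1250.00, "vendor": "Acme Corp"},
     {"invoice_id": "INV-001", "total": 1250.00, "vendor": "Acme Corp"}),
    ({"invoice_id": "INV-002", "total": 99.99, "vendor": "Globex"},
     {"invoice_id": "INV-020", "total": 99.99, "vendor": "Globex"}),
]
print(bulk_evaluate(schema, pairs))  # 0.75
```

The weights live on the fields rather than in the comparators, so the same comparator can be reused at different levels of business importance. In this sketch the second document scores only 0.5 because its high-weight `invoice_id` is wrong, even though two of its three fields match.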
- Getting Started: Installation, quick start, and your first evaluation in 30 seconds.
- Comparators: Exact, numeric, fuzzy, semantic, and LLM-based comparison algorithms.
- Evaluation: Thresholds, weights, clipping, JSON Schema config, and bulk evaluation.
- Use Cases: Document extraction, OCR, entity extraction, ML evaluation, and ETL validation.
- Best Practices: Threshold tuning, SME calibration, weight assignment, and performance.
- Advanced: Hungarian algorithm internals, recursive engine, and custom comparators.
- API Reference: Complete documentation for all classes, methods, and configuration.
- Contributing: Report issues, submit pull requests, and development setup.