Open Source Data Tools

Ben Severn

Building tools that check, transform, match, and map data. All open source. All production-grade.


Benchmark Highlight

0.971 F1 on Febrl

GoldenMatch scored 0.971 F1 on Febrl and 0.918 F1 on DBLP-ACM, making it the most consistent performer across both datasets among the four libraries compared. Zero training data required.

Read the full comparison →

Projects

5 packages on PyPI

GoldenCheck

try →

Validate & profile data quality

GoldenFlow

try →

Transform & standardize data

GoldenMatch

try →

Deduplicate & match records

GoldenPipe

try →

Orchestrate the full pipeline

infermap

try →

Map messy columns to target schemas

Playground

Interactive Playground

Try it on your data

Upload a CSV, configure matching fields, adjust thresholds in real-time, and see scored pairs instantly. No signup required.

Upload CSV
Configure fields
See results
Open Playground
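
To make the idea of "scored pairs" concrete, here is a minimal sketch of threshold-based weighted scoring. It is an illustration only, not GoldenMatch's implementation or API: the field names, weights, threshold, and the use of Python's standard-library difflib for string similarity are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical settings for illustration; not GoldenMatch's config format.
WEIGHTS = {"name": 0.6, "city": 0.4}
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score_pair(left: dict, right: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-field similarities."""
    total = sum(weights.values())
    return sum(w * similarity(left[f], right[f]) for f, w in weights.items()) / total


def scored_pairs(records: list, threshold: float = THRESHOLD) -> list:
    """Return (i, j, score) for every record pair at or above the threshold."""
    results = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = score_pair(a, b)
        if score >= threshold:
            results.append((i, j, round(score, 3)))
    return results


records = [
    {"name": "Jon Smith", "city": "Boston"},
    {"name": "John Smith", "city": "Boston"},
    {"name": "Mary Jones", "city": "Denver"},
]
matches = scored_pairs(records)
print(matches)
```

Raising the threshold trades recall for precision: fewer pairs are reported, but those that remain are more likely to be true duplicates. Comparing every pair is quadratic in the number of records, so this sketch only suits small inputs.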

Pipeline

How It Compares

Full benchmark →
Library        Approach                        Febrl F1  DBLP-ACM F1  Training Data
GoldenMatch    Config-driven weighted scoring  0.971     0.918        None required
Splink         Fellegi-Sunter EM               0.998     0.728        Unsupervised
Dedupe         Active learning                 0.928     0.734        Interactive labeling
RecordLinkage  Indexer + Classify              0.845     0.923        Optional

Same datasets, same machine, best reasonable config per library. Full methodology →
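
For readers unfamiliar with the metric, the sketch below shows one common way an F1 score is computed for record matching: over sets of predicted and true matching pairs. Whether the benchmark scores pairs or clusters is not stated here, so treat the pairwise definition, the function names, and the toy data as assumptions.

```python
def normalize(pairs) -> set:
    """Treat (a, b) and (b, a) as the same pair."""
    return {tuple(sorted(p)) for p in pairs}


def pairwise_f1(predicted: set, truth: set) -> float:
    """Harmonic mean of precision and recall over matched pairs."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy data: two of three predictions are correct, two of three true pairs are found.
truth = normalize([(1, 2), (3, 4), (5, 6)])
predicted = normalize([(2, 1), (3, 4), (7, 8)])
f1 = pairwise_f1(predicted, truth)
print(round(f1, 3))
```

Because F1 penalizes both missed duplicates and false merges, a library can score well on one dataset and poorly on another if its defaults favor only one side of that trade-off.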

Latest from the Blog

All posts →