I build open-source data quality and entity resolution tools in Python. Everything I ship lands on PyPI and the MCP Registry so it works out of the box with LLM agents.
Golden Suite is a modular toolkit: each piece works standalone or chains together via GoldenPipe.

    CSV / DB / API
           │
           ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │ GoldenCheck │───▶│ GoldenFlow  │───▶│ GoldenMatch │
    │  Validate   │    │  Transform  │    │   Resolve   │
    └─────────────┘    └─────────────┘    └─────────────┘
           └──────────────────┬──────────────────┘
                              ▼
                         GoldenPipe
                       (orchestrator)

| Role | Project | Highlights |
|---|---|---|
| Resolve | GoldenMatch | 97.2% F1 on DBLP-ACM · 30 MCP tools · 10 A2A skills |
| Validate | GoldenCheck | Zero-config profiling & drift detection · 19 MCP tools |
| Transform | GoldenFlow | 76 transforms · DQBench Transform: 100/100 · 10 MCP tools |
| Orchestrate | GoldenPipe | Chains Check → Flow → Match · 4 MCP tools |
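
To make the Check → Flow → Match chaining concrete, here is a minimal sketch in plain Python. It is not the GoldenPipe API: `validate`, `transform`, `resolve`, and `run_pipeline` are hypothetical stand-ins written for this illustration, and the real tools do far more than these toy rules. The only point is that each stage takes records and returns records, so a stage can run on its own or be chained with the others.

```python
from typing import Callable

Record = dict[str, str]
Stage = Callable[[list[Record]], list[Record]]


def validate(records: list[Record]) -> list[Record]:
    """Hypothetical check stage: drop records with a missing or blank name."""
    return [r for r in records if r.get("name", "").strip()]


def transform(records: list[Record]) -> list[Record]:
    """Hypothetical transform stage: collapse whitespace and lowercase names."""
    return [{**r, "name": " ".join(r["name"].split()).lower()} for r in records]


def resolve(records: list[Record]) -> list[Record]:
    """Hypothetical match stage: keep the first record per exact name."""
    seen: dict[str, Record] = {}
    for r in records:
        seen.setdefault(r["name"], r)
    return list(seen.values())


def run_pipeline(records: list[Record], stages: list[Stage]) -> list[Record]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        records = stage(records)
    return records


rows = [{"name": "Ada  Lovelace"}, {"name": "ada lovelace"}, {"name": " "}]
result = run_pipeline(rows, [validate, transform, resolve])
print(result)
```

Because every stage shares the same signature, calling `transform(rows)` alone works just as well as passing the full list of stages to `run_pipeline`.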
Extensions & integrations
- goldencheck-action — GitHub Action for data validation in CI with PR comments
- goldencheck-types — Community semantic type definitions (healthcare, finance, e-commerce)
- goldenmatch-extensions — SQL extensions for Postgres (pgrx) and DuckDB
- goldenmatch-wallet-attribution — Entity resolution on 13M blockchain records across 10 sources
- goldenmatch-vuln-attribution — Entity resolution on 869K OSS vulnerability records across 15 sources
DQBench — The standard benchmark for data quality tools. 4 categories, 12 tiers, 161 tests. Used to score Golden Suite and compare against Great Expectations, Pandera, Soda Core, and others.
InferMap — Inference-driven schema mapping for Python & TypeScript. 7 scorers, domain dictionaries, cross-language parity (F1 0.84).
DevPilot — Dev server supervisor for AI coders. CLI + MCP server with 10 tools. Lifecycle management, health checks, crash recovery.



