Community-contributed semantic type definitions for GoldenCheck.
Domain packs teach GoldenCheck about industry-specific column types, improving detection accuracy and reducing false positives.
| Domain | Types | Description |
|---|---|---|
| healthcare | 10 | NPI, ICD codes, insurance IDs, patient demographics, CPT, DRG |
| finance | 8 | Account numbers, routing numbers, CUSIP/ISIN, currency, transactions |
| ecommerce | 9 | SKUs, order IDs, tracking numbers, categories, shipping |
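Select a domain with `--domain`:

```bash
goldencheck scan data.csv --domain healthcare
```

To use a community domain, download its YAML file from this repository and scan as usual:

```bash
curl -o goldencheck_domain.yaml https://raw.githubusercontent.com/benzsevern/goldencheck-types/main/domains/telecom.yaml
goldencheck scan data.csv
```

Use the `install_domain` tool to browse and install community domains.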
To contribute a domain:

- Fork this repo
- Add a YAML file in `domains/` following the format below
- Open a PR; CI validates your YAML automatically
```yaml
description: "Short description of the domain"
types:
  my_type:
    name_hints: ["column_name_hint", "another_hint"]
    value_signals:
      min_unique_pct: 0.90    # optional: minimum uniqueness
      max_unique: 20          # optional: maximum unique values
      format_match: "email"   # optional: regex format
      mixed_case: true        # optional: mixed case values
      avg_length_min: 15      # optional: minimum average string length
      short_strings: true     # optional: short string values
      numeric: true           # optional: numeric values
    suppress: ["pattern_consistency", "cardinality"]  # checks to suppress
```

Name hints are matched against column names as follows:

- Plain string: substring match (`"npi"` matches `npi_number`, `provider_npi`)
- Ending with `_`: prefix-only match (`"is_"` matches `is_active` but NOT `diagnosis`)
- Starting with `_`: suffix-only match (`"_id"` matches `patient_id`)
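The three matching rules can be sketched in a few lines of Python. This is an illustration of the rules as described, not GoldenCheck's own implementation: the function name `hint_matches` is hypothetical, lower-casing both sides is an assumption, and the docs do not say which rule wins for a hint such as `_x_`, so this sketch simply checks the trailing underscore first.

```python
def hint_matches(hint: str, column: str) -> bool:
    """Return True if a name hint matches a column name.

    Hypothetical sketch of the documented rules; case-insensitive
    comparison is an assumption, not confirmed GoldenCheck behaviour.
    """
    hint, column = hint.lower(), column.lower()
    if hint.endswith("_"):
        # Trailing underscore: prefix-only match ("is_" -> "is_active")
        return column.startswith(hint)
    if hint.startswith("_"):
        # Leading underscore: suffix-only match ("_id" -> "patient_id")
        return column.endswith(hint)
    # Plain string: substring match anywhere in the column name
    return hint in column


if __name__ == "__main__":
    print(hint_matches("npi", "provider_npi"))    # True
    print(hint_matches("is_", "is_active"))       # True
    print(hint_matches("is_", "diagnosis_code"))  # False
    print(hint_matches("_id", "patient_id"))      # True
```

The `diagnosis_code` case shows why the underscore anchors exist: as a plain substring, `"is_"` would match it, but the prefix-only rule rejects it.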
Checks that can be listed under `suppress`: `uniqueness`, `nullability`, `format_detection`, `type_inference`, `range_distribution`, `cardinality`, `pattern_consistency`, `temporal_order`, `encoding_detection`, `sequence_detection`, `drift_detection`.
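Before opening a PR, a local pre-check can catch typos in check names or signal keys. The sketch below is not the repository's CI validator: `validate_domain` is a hypothetical helper, it assumes `description` and `types` are required, and it only accepts the `value_signals` keys shown in the format example, so the real schema may allow more. It works on an already-parsed dict, so you can feed it the output of PyYAML's `yaml.safe_load`.

```python
# Checks that may appear in a type's `suppress` list (from the list above).
KNOWN_CHECKS = {
    "uniqueness", "nullability", "format_detection", "type_inference",
    "range_distribution", "cardinality", "pattern_consistency",
    "temporal_order", "encoding_detection", "sequence_detection",
    "drift_detection",
}

# Optional value_signals keys taken from the format example (assumed complete).
KNOWN_SIGNALS = {
    "min_unique_pct", "max_unique", "format_match", "mixed_case",
    "avg_length_min", "short_strings", "numeric",
}


def validate_domain(doc: dict) -> list[str]:
    """Return problems found in a parsed domain file (empty list if none)."""
    errors: list[str] = []
    if not isinstance(doc.get("description"), str):
        errors.append("missing or non-string 'description'")
    types = doc.get("types")
    if not isinstance(types, dict) or not types:
        errors.append("'types' must be a non-empty mapping")
        return errors
    for type_name, spec in types.items():
        if not isinstance(spec, dict):
            errors.append(f"{type_name}: definition must be a mapping")
            continue
        hints = spec.get("name_hints", [])
        if not isinstance(hints, list) or not all(isinstance(h, str) for h in hints):
            errors.append(f"{type_name}: 'name_hints' must be a list of strings")
        signals = spec.get("value_signals") or {}
        if not isinstance(signals, dict):
            errors.append(f"{type_name}: 'value_signals' must be a mapping")
        else:
            for key in signals:
                if key not in KNOWN_SIGNALS:
                    errors.append(f"{type_name}: unknown value signal '{key}'")
        for check in spec.get("suppress") or []:
            if check not in KNOWN_CHECKS:
                errors.append(f"{type_name}: cannot suppress unknown check '{check}'")
    return errors


if __name__ == "__main__":
    sample = {
        "description": "Example domain",
        "types": {"my_type": {"name_hints": ["my_col"], "suppress": ["cardinality", "spelling"]}},
    }
    print(validate_domain(sample))
```

Running it on the sample flags the misspelled check `spelling` and accepts `cardinality`.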
License: MIT