Small, focused CSV utilities for common data wrangling tasks.
csvsmith provides a handful of practical tools for working with CSV
files, including cleaning numeric values, filtering rows, deduplicating
records, classifying files, converting Excel spreadsheets to CSV, moving
files by suffix, and finding matches inside CSV content.
Read the full documentation at:
https://csvsmith.readthedocs.io/en/latest/
- Clean numeric strings into normalized values
- Filter CSV rows by substring matching
- Deduplicate row data and generate reports
- Classify CSV files into folders based on headers/signatures
- Convert Excel workbooks to CSV
- Move files by suffix
- Find matching values inside CSV files
- Use the tools either from Python or from the command line
Install the package in your environment as usual for your project setup.
Example:

    pip install csvsmith

If you are developing locally, install it in editable mode from the project root:

    pip install -e .

You can use the library from Python:

    from csvsmith.utils.clean_numeric import clean_currency_numeric
    print(clean_currency_numeric("$1,234.56"))

For command-line usage, start with the built-in help:

    csvsmith --help

The package provides a CLI with several subcommands.

Clean numeric values:

    csvsmith clean-numeric "1,234.56" --sep "," --decimal "."

Clean currency-prefixed numeric values:

    csvsmith clean-currency-numeric '$1,234.56' --sep "," --decimal "."

Note: Use single quotes for values containing $. Inside double quotes the
shell may expand $1 as a variable and change the input before csvsmith sees it.
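
As a rough illustration of what the `--sep` and `--decimal` options describe, the sketch below strips currency symbols and grouping separators using only the standard library. The function name `clean_numeric_sketch`, its parameters, and its return type are assumptions made for this example; it is not csvsmith's implementation and its behavior may differ from `clean_numeric` or `clean_currency_numeric`.

```python
from decimal import Decimal, InvalidOperation


def clean_numeric_sketch(text, sep=",", decimal=".", currency_symbols="$€£¥"):
    """Hypothetical helper, not part of csvsmith.

    Removes currency symbols, spaces, and the grouping separator, then
    parses the result as a Decimal. Returns None if parsing fails.
    """
    cleaned = text.strip()
    # Drop currency symbols and any embedded spaces.
    for ch in currency_symbols + " ":
        cleaned = cleaned.replace(ch, "")
    # Drop the grouping (thousands) separator.
    cleaned = cleaned.replace(sep, "")
    # Normalize the decimal mark to "." so Decimal can parse it.
    if decimal != ".":
        cleaned = cleaned.replace(decimal, ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(clean_numeric_sketch("$1,234.56"))                       # 1234.56
print(clean_numeric_sketch("1.234,56", sep=".", decimal=","))  # 1234.56
```

Returning a `Decimal` rather than a `float` is a design choice for this sketch: it avoids binary rounding when the cleaned values are monetary amounts.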

Filter rows in a CSV:

    csvsmith drop-rows input.csv notes spam --case-insensitive --drop-header

Deduplicate rows:

    csvsmith dedupe input.csv -o out.csv --subset id --keep first

Classify CSV files:

    csvsmith classify src_dir dst_dir --mode relaxed --match subset --auto --dry-run

Convert Excel to CSV:

    csvsmith excel2csv input.xlsx

Move files by suffix:

    csvsmith move-files src_dir dst_dir --suffixes .csv,.pdf

Find matches in a CSV:

    csvsmith find-matches input.csv target --ignore-case --ignore-whitespace

find_matches_in_csv searches a CSV file for a target value and returns
match records containing coordinates and row context information.

Python API:

    from csvsmith import find_matches_in_csv
    results = find_matches_in_csv("input.csv", "target")

CLI:

    csvsmith find-matches input.csv target

Options:

- --ignore-case: ignore case while matching
- --ignore-whitespace: ignore whitespace while matching
- --no-nfkc: disable NFKC normalization

If matches are found, the CLI prints formatted JSON. If no matches are found, it prints a simple message.
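
To make the three options concrete, here is a minimal sketch of how a cell-by-cell search with NFKC normalization, case folding, and whitespace removal could work. Everything in it is an assumption for illustration: the names `normalize_sketch` and `find_matches_sketch`, the exact-equality matching rule, the zero-based coordinates, and the shape of the returned records are hypothetical and may not match what `find_matches_in_csv` actually does.

```python
import csv
import unicodedata


def normalize_sketch(value, ignore_case=False, ignore_whitespace=False, nfkc=True):
    """Hypothetical normalizer mirroring the three CLI options."""
    if nfkc:
        # NFKC folds compatibility forms, e.g. fullwidth letters to ASCII.
        value = unicodedata.normalize("NFKC", value)
    if ignore_case:
        value = value.casefold()
    if ignore_whitespace:
        # Remove all whitespace, including spaces inside the value.
        value = "".join(value.split())
    return value


def find_matches_sketch(lines, target, **options):
    """Hypothetical search: return one record per cell equal to target.

    Rows and columns are zero-based, and the header counts as row 0.
    """
    wanted = normalize_sketch(target, **options)
    matches = []
    for row_index, row in enumerate(csv.reader(lines)):
        for col_index, cell in enumerate(row):
            if normalize_sketch(cell, **options) == wanted:
                matches.append(
                    {"row": row_index, "col": col_index, "value": cell, "context": row}
                )
    return matches


# "\uff21" is a fullwidth "A"; NFKC turns it into a plain "A".
sample = ["id,name", "1,\uff21lice ", "2,bob"]
hits = find_matches_sketch(sample, "alice", ignore_case=True, ignore_whitespace=True)
print([(hit["row"], hit["col"]) for hit in hits])  # [(1, 1)]
```

The same normalization is applied to both the target and each cell, so the comparison stays symmetric regardless of which options are enabled.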

The package also exposes a few other helper functions and classes from its top-level API.

Numeric and row tools:

    from csvsmith import (
        clean_numeric,
        count_duplicates_sorted,
        add_row_digest,
        find_duplicate_rows,
        dedupe_with_report,
        read_csv_rows,
        write_csv_rows,
    )

CSV classification and filtering:

    from csvsmith import CSVClassifier, DropRowsBySubstring, CSVCleaner

File and conversion helpers:

    from csvsmith import excel_to_csv, move_by_suffix

String comparison utilities:

    from csvsmith import StringDistance, Relation, Result, analyze_pair

The code is organized into two main areas:

- csvsmith.tools for higher-level CSV workflows
- csvsmith.utils for reusable utility helpers
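
The signatures of the duplicate-handling helpers are not described here, so the following is only a standard-library sketch of the idea behind `dedupe ... --subset id --keep first`: keep the first row seen for each key and collect the rest for a report. The function `dedupe_rows_sketch` and its return shape are hypothetical and are not csvsmith's `dedupe_with_report` or `find_duplicate_rows`.

```python
import csv
import io


def dedupe_rows_sketch(rows, subset=None):
    """Hypothetical helper: keep the first row per key.

    rows is an iterable of dicts (for example from csv.DictReader).
    Returns (kept, dropped) so the dropped rows can feed a report.
    """
    seen = set()
    kept, dropped = [], []
    for row in rows:
        # Key on the chosen columns, or on every column when subset is None.
        columns = subset if subset is not None else sorted(row)
        key = tuple(row[col] for col in columns)
        if key in seen:
            dropped.append(row)
        else:
            seen.add(key)
            kept.append(row)
    return kept, dropped


sample = io.StringIO("id,name\n1,ann\n2,bob\n1,ann again\n")
kept, dropped = dedupe_rows_sketch(csv.DictReader(sample), subset=["id"])
print([row["name"] for row in kept])     # ['ann', 'bob']
print([row["name"] for row in dropped])  # ['ann again']
```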
Run the test suite with your preferred Python test runner.
Example:

    pytest

See the project license for details.