Redact personal and business data from CSV files containing client ticket data. All processing happens locally — no data leaves your machine.
- Personal names
- Email addresses
- Phone numbers (international formats)
- Physical addresses and locations
- Organization / company names
- IP addresses
- Hostnames (server names like srv-prod-01.internal)
- MAC addresses
- Credit card numbers
- IBAN codes
- US Social Security Numbers
- URLs
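
As a rough illustration of how some of the structured identifiers listed above could be detected, here is a minimal regex-based sketch. The `PATTERNS` table and `find_entities` function are hypothetical names chosen for this example, and the patterns are deliberately loose; the tool's actual detection rules are not shown in this README and may differ.

```python
import re

# Illustrative patterns only (assumed, not taken from the tool's source).
# They are intentionally permissive, e.g. the IP pattern also accepts 999.1.1.1.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "MAC": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
}


def find_entities(text):
    """Return (label, matched_text) pairs, grouped by label in PATTERNS order."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, m.group(0)) for m in pattern.finditer(text))
    return hits


hits = find_entities(
    "User bob@example.com on 10.0.0.5 (aa:bb:cc:dd:ee:ff) reports timeouts"
)
print(hits)
```

Free-text entities such as personal names and organizations generally cannot be caught with patterns like these, which is presumably why the installation step downloads a spaCy language model.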
The tool generates a new redacted CSV file. The original file is never modified. A redaction report is printed showing what was found.
Identical values always get the same placeholder (e.g. [PERSON-1]), so you can still
see patterns across rows without knowing the actual identity.
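
The consistent-placeholder behavior can be pictured as a small lookup table per entity type. The sketch below is only an illustration under that assumption; `PlaceholderRegistry` and `placeholder_for` are hypothetical names, not the tool's actual API.

```python
from collections import defaultdict


class PlaceholderRegistry:
    """Maps each distinct value to a stable placeholder such as [PERSON-1]."""

    def __init__(self):
        # One value -> placeholder table per entity label.
        # The table's current size doubles as the running counter.
        self._tables = defaultdict(dict)

    def placeholder_for(self, label, value):
        table = self._tables[label]
        if value not in table:
            table[value] = f"[{label}-{len(table) + 1}]"
        return table[value]


registry = PlaceholderRegistry()
first = registry.placeholder_for("PERSON", "Alice Example")
second = registry.placeholder_for("PERSON", "Bob Example")
again = registry.placeholder_for("PERSON", "Alice Example")
print(first, second, again)  # [PERSON-1] [PERSON-2] [PERSON-1]
```

Because the mapping lives only in memory for one run, the same person would not necessarily receive the same number in a separate run of this sketch.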
Requires Python 3.9+.
pip install .
python -m spacy download en_core_web_lg

ticket-redactor tickets.csv # → tickets_redacted.csv
ticket-redactor tickets.csv -o clean.csv # custom output path
ticket-redactor tickets.csv --dry-run # preview without writing

Or without installing:
python -m ticket_redactor tickets.csv

pip install -e ".[dev]"
pytest

- Encoding: Reads UTF-8 (with or without BOM). Outputs UTF-8.
- CSV dialect: Auto-detected (comma, semicolon, tab, etc.)
- Performance: ~1000 rows/sec on a modern laptop.
- The tool intentionally errs on the side of over-redaction. It is better to redact something that looks like PII than to miss actual PII.
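
For readers curious how BOM-tolerant reading and dialect auto-detection can work with the Python standard library, here is a minimal sketch. It assumes the `utf-8-sig` codec and `csv.Sniffer`; whether the tool uses these exact mechanisms is not stated in this README, and `read_rows` is a hypothetical helper.

```python
import csv
import io


def read_rows(raw_bytes):
    """Decode UTF-8 bytes (with or without BOM) and parse them as CSV."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    text = raw_bytes.decode("utf-8-sig")
    # Sniffer guesses the dialect from a sample; the candidate delimiters
    # are restricted here to comma, semicolon, and tab.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t")
    return list(csv.reader(io.StringIO(text), dialect))


# A semicolon-delimited sample that starts with a BOM.
sample = (
    "\ufeffid;name;email\n"
    "1;Alice;alice@example.com\n"
    "2;Bob;bob@example.com\n"
).encode("utf-8")
rows = read_rows(sample)
print(rows[0])  # ['id', 'name', 'email']
```

`csv.Sniffer` works heuristically and can raise `csv.Error` on very small or irregular samples, so a real implementation would likely fall back to a default dialect in that case.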