A Python command-line tool that automatically cleans CSV files by removing empty rows, removing duplicate rows, and handling missing values.
- Remove Empty Rows: Automatically detects and removes completely empty rows from CSV files
- Remove Duplicates: Identifies and eliminates duplicate rows while preserving unique data
- Handle Missing Values: Offers multiple strategies for dealing with missing or empty cells:
  - remove: Remove rows containing any missing values (default)
  - fill: Fill missing values with 'N/A'
  - keep: Keep rows as-is without modification
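As a rough illustration of the three cleaning steps and the missing-value strategies listed above, here is a minimal sketch. It is not the code in auto_csv_cleaner.py: the function names (`is_empty`, `drop_duplicates`, `handle_missing`, `clean_rows`) are hypothetical, and it assumes that whitespace-only cells count as missing and that the first occurrence of a duplicate row is the one kept.

```python
from typing import List

Row = List[str]


def is_empty(row: Row) -> bool:
    """Return True when every cell is blank (or the row has no cells)."""
    return all(cell.strip() == "" for cell in row)


def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each row (assumption), preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def handle_missing(rows: List[Row], strategy: str = "remove",
                   fill_value: str = "N/A") -> List[Row]:
    """Apply one of the three strategies: remove, fill, or keep."""
    if strategy == "remove":
        # Drop any row that has at least one blank cell.
        return [row for row in rows if all(cell.strip() for cell in row)]
    if strategy == "fill":
        # Replace blank cells with the fill value.
        return [[cell if cell.strip() else fill_value for cell in row]
                for row in rows]
    if strategy == "keep":
        # Leave rows untouched (copied so the input is not shared).
        return [list(row) for row in rows]
    raise ValueError("unknown strategy: {!r}".format(strategy))


def clean_rows(rows: List[Row], strategy: str = "remove") -> List[Row]:
    """Run the steps in order: empty rows, duplicates, missing values."""
    non_empty = [row for row in rows if not is_empty(row)]
    unique = drop_duplicates(non_empty)
    return handle_missing(unique, strategy)
```

The sketch works on data rows only; how the real tool treats the header row is not covered here.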
- Python 3.6 or higher
- No external dependencies (uses only Python standard library)
No installation required! Just download the auto_csv_cleaner.py file and run it.
Clean a CSV file with default settings (removes empty rows, duplicates, and rows with missing values):
python auto_csv_cleaner.py input.csv
This will create a cleaned file named input_cleaned.csv in the same directory.
To write the cleaned data to a specific file, pass an output path as the second argument:
python auto_csv_cleaner.py input.csv output.csv
Remove rows with missing values (default):
python auto_csv_cleaner.py input.csv --strategy remove
Fill missing values with 'N/A':
python auto_csv_cleaner.py input.csv --strategy fill
Keep rows with missing values:
python auto_csv_cleaner.py input.csv --strategy keep
Input CSV (messy.csv):
Name,Age,City
John,25,NYC
Jane,30,LA
,,
John,25,NYC
Bob,,Chicago
Command:
python auto_csv_cleaner.py messy.csv --strategy fill
Output CSV (messy_cleaned.csv):
Name,Age,City
John,25,NYC
Jane,30,LA
Bob,N/A,Chicago
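The commands shown so far imply an interface with a required input path, an optional output path, and a `--strategy` option. The sketch below shows one way such an interface could be wired up with `argparse` and `pathlib`. Whether the actual script uses these modules is an assumption, and the helper names (`default_output_path`, `build_parser`, `parse_args`) are hypothetical.

```python
import argparse
from pathlib import Path


def default_output_path(input_path):
    """Derive '<name>_cleaned<ext>' next to the input file."""
    path = Path(input_path)
    return str(path.with_name(path.stem + "_cleaned" + path.suffix))


def build_parser():
    """Build a parser matching the documented command-line usage."""
    parser = argparse.ArgumentParser(description="Clean a CSV file.")
    parser.add_argument("input", help="path to the CSV file to clean")
    parser.add_argument("output", nargs="?", default=None,
                        help="optional path for the cleaned file")
    parser.add_argument("--strategy", choices=["remove", "fill", "keep"],
                        default="remove",
                        help="how to handle rows with missing values")
    return parser


def parse_args(argv=None):
    """Parse arguments and fill in the default output path if omitted."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        args.output = default_output_path(args.input)
    return args
```

With this sketch, `parse_args(["messy.csv", "--strategy", "fill"])` yields `messy_cleaned.csv` as the output path, matching the example above.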
The tool reports row counts at each stage of cleaning:
- Original number of rows
- Rows after removing empty entries
- Rows after removing duplicates
- Rows after handling missing values
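To show how the four counts above could be collected, here is a minimal sketch built on the standard-library `csv` module. The function name `clean_with_stats` and the dictionary keys are hypothetical, and the sketch assumes the first row is a header that is excluded from every count; the actual tool may count differently.

```python
import csv
import io


def clean_with_stats(csv_text, strategy="remove", fill_value="N/A"):
    """Clean CSV text and record the row count after each stage.

    Assumption: the first row is a header and is not counted.
    Returns (header, cleaned_rows, stats).
    """
    all_rows = list(csv.reader(io.StringIO(csv_text)))
    if not all_rows:
        zero = {"original": 0, "after_empty": 0,
                "after_duplicates": 0, "after_missing": 0}
        return [], [], zero

    header, rows = all_rows[0], all_rows[1:]
    stats = {"original": len(rows)}

    # Stage 1: drop rows in which every cell is blank.
    rows = [row for row in rows if any(cell.strip() for cell in row)]
    stats["after_empty"] = len(rows)

    # Stage 2: drop duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    rows = unique
    stats["after_duplicates"] = len(rows)

    # Stage 3: apply the chosen missing-value strategy.
    if strategy == "remove":
        rows = [row for row in rows if all(cell.strip() for cell in row)]
    elif strategy == "fill":
        rows = [[cell if cell.strip() else fill_value for cell in row]
                for row in rows]
    elif strategy != "keep":
        raise ValueError("unknown strategy: {!r}".format(strategy))
    stats["after_missing"] = len(rows)

    return header, rows, stats
```

Run on the messy.csv example with the fill strategy, this sketch gives counts of 5, 4, 3, and 3 for the four stages; with the remove strategy the last count drops to 2.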
This tool was created for issue #1006 of the 100LinesOfPythonCode repository.
Free to use and modify!
Contributed to 100LinesOfPythonCode - Hacktoberfest 2025