README.md (30 additions & 4 deletions)
@@ -45,11 +45,37 @@ Key features include:
 
 - **Intelligent Profiling**: Detect missing values, skewed distributions, outliers, and data type inconsistencies.
 - **ML-Specific Checks**: Identify data leakage, dataset drift, class imbalance, and high-cardinality features.
-- **Automated Preparation**: Get suggestions for encoding, imputation, scaling, and transformations, and optionally apply them automatically.
-- **Rich Reporting**: Generate statistical summaries and exportable reports for collaboration.
-- **Production-Ready Pipelines**: Output reproducible cleaning and preprocessing code that integrates seamlessly with ML workflows.
+- **Automated Preparation**: Get suggestions for encoding, imputation, scaling, and transformations.
+- **Rich Reporting**: Generate statistical summaries and exportable reports (HTML/PDF/Markdown/JSON) with embedded visualizations.
+- **Production-Ready Pipelines**: Output reproducible cleaning and preprocessing code (`fixes.py`) that integrates seamlessly with ML workflows.
+- **Modern Themes**: Choose between "Minimal" (professional) and "Neubrutalism" (bold) report styles.
 
-HashPrep turns dataset debugging into a guided, automated process - saving time, improving model reliability, and standardizing best practices across teams.
+---
+
+## Usage
+
+### 1. Quick Scan
+Get a quick summary of critical issues in your terminal.
+```bash
+hashprep scan dataset.csv
+```
+
+### 2. Generate Report
+Generate a comprehensive HTML report with visualizations.
+```bash
+hashprep report dataset.csv --format html --theme minimal
+```
+
+**Options:**
+- `--theme`: `minimal` (default) or `neubrutalism`
+- `--format`: `html`, `pdf`, `md`, or `json`
+- `--no-visualizations`: Disable plot generation for faster performance.
+
+### 3. Generate Fixes
+Automatically generate a Python script (`dataset_fixes.py`) to apply suggested fixes.
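
For anyone scripting the `hashprep report` invocation documented under "2. Generate Report", the following is a minimal sketch of how the listed options might be assembled and validated before calling the CLI. The helper `build_report_command` and the constants `THEMES` and `FORMATS` are hypothetical names introduced here for illustration; only the subcommand and the three flags (`--theme`, `--format`, `--no-visualizations`) come from the README text. Whether every theme applies to every format is not stated in the diff, so the sketch does not assume any restriction.

```python
from typing import List

# Allowed values copied from the "Options" list in the README diff.
THEMES = ("minimal", "neubrutalism")
FORMATS = ("html", "pdf", "md", "json")


def build_report_command(
    dataset: str,
    fmt: str = "html",
    theme: str = "minimal",
    visualizations: bool = True,
) -> List[str]:
    """Hypothetical helper: build the argv list for `hashprep report`.

    It only assembles and validates arguments; it does not run anything.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {FORMATS}")
    if theme not in THEMES:
        raise ValueError(f"unsupported theme {theme!r}; expected one of {THEMES}")

    argv = ["hashprep", "report", dataset, "--format", fmt, "--theme", theme]
    if not visualizations:
        # Documented as disabling plot generation for faster performance.
        argv.append("--no-visualizations")
    return argv


if __name__ == "__main__":
    # Print the command instead of executing it, so this runs without hashprep installed.
    print(" ".join(build_report_command("dataset.csv", fmt="pdf", theme="neubrutalism")))
```

With HashPrep installed, the returned list could be passed to `subprocess.run(argv, check=True)`; validating the values up front keeps a typo in a theme or format name from reaching the CLI.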