
Prajwal18py/SMART-CSV-HEALTH-CHECKER


🧠 Smart CSV Health Checker AI

Data Quality. Diagnosed in Seconds.



The most powerful AI-driven CSV data quality analyzer

Upload → Analyze → Fix → Export, all in seconds!

🚀 Live Demo • 📖 Documentation • 🐛 Report Bug • ✨ Request Feature


💡 Try it now: https://smart-csv-health-checker.streamlit.app/


🌟 Why Smart CSV Health Checker?

Traditional Tools          Smart CSV Health Checker
❌ Manual inspection       ✅ AI-powered auto-detection
❌ Hours of work           ✅ Results in seconds
❌ Miss hidden issues      ✅ Finds complex anomalies
❌ No fix suggestions      ✅ One-click auto-fix
❌ Basic statistics        ✅ Deep profiling & PCA
❌ No code export          ✅ Export Python code

✨ Features

🤖 AI-Powered Analysis

  • 🔍 Intelligent Anomaly Detection

    • Isolation Forest algorithm
    • MICE imputation analysis
    • Statistical outlier detection
  • 📊 Smart Data Profiling

    • Auto column type detection
    • Pattern recognition
    • Correlation analysis
  • 🎯 Health Score System

    • Overall data quality grade (A-F)
    • Issue severity classification
    • Actionable recommendations
  • 🧬 Deep Learning Insights

    • Hidden pattern discovery
    • Data relationship mapping
    • Predictive quality metrics
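The README does not say how the A-F grade is derived. The sketch below assumes a hypothetical scheme:

1. Start from 100.
2. Subtract weighted penalties for the share of missing cells, duplicate rows and outlier values.
3. Map the remaining score to a letter.

`QualityStats`, `WEIGHTS`, `GRADES`, `health_score` and `grade` are illustrative names. The weights and cut-offs are made up for the example, not taken from the app.

```python
from dataclasses import dataclass


@dataclass
class QualityStats:
    """Hypothetical input: issue ratios in [0, 1] measured on a dataset."""
    missing_ratio: float    # share of empty cells
    duplicate_ratio: float  # share of duplicated rows
    outlier_ratio: float    # share of numeric values flagged as outliers


# Hypothetical penalty weights: points deducted when an issue affects 100% of the data.
WEIGHTS = {"missing_ratio": 40.0, "duplicate_ratio": 30.0, "outlier_ratio": 30.0}

# Hypothetical grade cut-offs, checked from best to worst.
GRADES = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def health_score(stats: QualityStats) -> float:
    """Start from 100, subtract weighted penalties, clamp to [0, 100]."""
    penalty = sum(weight * getattr(stats, name) for name, weight in WEIGHTS.items())
    return max(0.0, min(100.0, 100.0 - penalty))


def grade(score: float) -> str:
    """Map a 0-100 score to a letter; anything below the last cut-off is F."""
    for cutoff, letter in GRADES:
        if score >= cutoff:
            return letter
    return "F"


stats = QualityStats(missing_ratio=0.10, duplicate_ratio=0.05, outlier_ratio=0.02)
score = health_score(stats)
print(f"{score:.1f} -> {grade(score)}")  # prints: 93.9 -> A
```

Keeping the weights in a single table makes the scoring easy to tune. A severity classification could reuse the same ratios with per-issue thresholds.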

📋 10 Powerful Tabs

Tab                  Description
📋 Overview          Quick summary with health score, issue breakdown, and key metrics
🧠 AI Deep Dive      Advanced ML-powered anomaly detection and insights
🛠️ Fix Data          One-click fixes for missing values, outliers, and formatting
🔧 Pipeline          Build custom data cleaning pipelines
📊 Visualizations    Interactive charts, distributions, and heatmaps
📉 PCA Analysis      Dimensionality reduction and component analysis
💻 Code Export       Get Python code for all transformations
🔒 Deep Profile      PII detection and sensitive data scanning
📈 Compare           Side-by-side dataset comparison
🎲 Synthetic Data    Generate realistic test data
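The Fix Data and Pipeline tabs apply cleaning steps in sequence, but their internals are not shown here. The following is a minimal stdlib-only sketch under these assumptions:

- Rows are held as dicts, the shape `csv.DictReader` yields.
- The two steps are hypothetical examples of the one-click fixes described above: dropping exact duplicate rows, and filling empty numeric cells with the column median.

Since the stack lists Pandas, the app's real steps would more likely resemble `df.drop_duplicates()` and `df[col].fillna(df[col].median())`. `drop_duplicates`, `fill_median` and `run_pipeline` below are hypothetical helpers.

```python
import csv
import io
import statistics
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Hypothetical step: keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_median(column: str) -> Step:
    """Hypothetical step factory: fill empty cells in `column` with its median."""
    def step(rows: List[Row]) -> List[Row]:
        values = [float(r[column]) for r in rows if r[column].strip()]
        if not values:
            return rows  # nothing to compute a median from
        median = str(statistics.median(values))
        return [{**r, column: r[column] if r[column].strip() else median} for r in rows]
    return step


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order, feeding one step's output into the next."""
    for step in steps:
        rows = step(rows)
    return rows


data = "name,age\nAna,34\nBen,\nAna,34\nCy,40\n"
rows = list(csv.DictReader(io.StringIO(data)))
cleaned = run_pipeline(rows, [drop_duplicates, fill_median("age")])
print(cleaned)
```

Each step is just a function from rows to rows, so a pipeline is an ordered list that can be reordered or extended. That same list is a natural input for a code-export feature that prints each step as Python.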

🔐 Secure Authentication

🔒 Enterprise-Grade Security
├── 📧 Email/Password Authentication
├── ✨ User Registration with Verification
├── 🔑 Secure Password Reset
├── 👤 User Profile Management
└── 🚪 Session Management

Powered by Supabase, which supplies enterprise-grade authentication and the database.


🎨 Beautiful UI/UX

  • 🌙 Dark Mode: easy on the eyes
  • ✨ Glassmorphism Design: modern and sleek
  • 📱 Responsive: works on any device
  • 🎭 Animated Elements: smooth interactions
  • 🎨 Gradient Accents: professional look

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • pip package manager

Installation

# 1. Clone the repository
git clone https://github.com/Prajwal18py/SMART-CSV-HEALTH-CHECKER.git

# 2. Navigate to directory
cd SMART-CSV-HEALTH-CHECKER

# 3. Create virtual environment
python -m venv venv

# 4. Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# 5. Install dependencies
pip install -r requirements.txt

# 6. Run the app
streamlit run app.py

Environment Setup

Create .streamlit/secrets.toml with your Supabase project URL and anon key:

[supabase]
url = "your-supabase-url"
key = "your-supabase-anon-key"

📸 Screenshots

🔐 Login Page

📊 Dashboard Overview

🧠 AI Deep Dive Analysis

📈 Interactive Visualizations

✨ And Many More Features!

These are just a few highlights. Explore the full app to discover:

  • 🔧 Custom Data Pipelines
  • 📉 PCA Analysis
  • 💻 Code Export
  • 🔒 Deep Profiling with PII Detection
  • 📈 Dataset Comparison
  • 🎲 Synthetic Data Generation
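The Deep Profiling tab's PII detection is not documented in detail. A common lightweight approach, assumed here for illustration, is to scan each column with regular expressions and optionally mask the matches.

`PII_PATTERNS`, `scan_pii` and `mask` are hypothetical names. The two patterns are deliberately loose: the phone pattern, for example, also flags ISO dates such as `2024-01-05`. A real scanner would need stricter rules or validation.

```python
import re

# Hypothetical, deliberately simple patterns for two common kinds of PII.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def scan_pii(rows: list, columns: list) -> dict:
    """Count, per column and PII kind, how many cells contain at least one match."""
    found = {}
    for col in columns:
        for kind, pattern in PII_PATTERNS.items():
            hits = sum(1 for row in rows if pattern.search(str(row.get(col, ""))))
            if hits:
                found.setdefault(col, {})[kind] = hits
    return found


def mask(text: str) -> str:
    """Replace every PII match in `text` with a placeholder such as <EMAIL>."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{kind.upper()}>", text)
    return text


rows = [
    {"name": "Ana", "contact": "ana@example.com"},
    {"name": "Ben", "contact": "+91 98765 43210"},
    {"name": "Cy", "contact": "n/a"},
]
print(scan_pii(rows, ["name", "contact"]))
print(mask("Call +91 98765 43210 or mail ana@example.com"))
```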

🚀 Try the live app: https://smart-csv-health-checker.streamlit.app/

🏗️ Project Structure

smart-csv-health-checker/
│
├── 📄 app.py                    # Main application entry point
│
├── 📁 auth/                     # Authentication module
│   ├── __init__.py
│   ├── auth_functions.py        # Supabase auth functions
│   └── login.py                 # Login page UI
│
├── 📁 core/                     # Core functionality
│   ├── analysis.py              # AI analysis engine
│   ├── data_loader.py           # CSV loading & validation
│   └── type_detection.py        # Column type detection
│
├── 📁 tabs/                     # Application tabs
│   ├── tab_overview.py          # Overview tab
│   ├── tab_ai_deep_dive.py      # AI analysis tab
│   ├── tab_fix_data.py          # Data fixing tab
│   ├── tab_pipeline.py          # Pipeline builder
│   ├── tab_visualizations.py    # Charts & graphs
│   ├── tab_pca.py               # PCA analysis
│   ├── tab_code.py              # Code export
│   ├── tab_deep_profile.py      # Deep profiling
│   ├── tab_compare.py           # Dataset comparison
│   └── tab_synthetic.py         # Synthetic data
│
├── 📁 ui/                       # UI components
│   ├── layout.py                # Page layout
│   ├── styles.py                # Custom CSS
│   └── sidebar.py               # Sidebar component
│
├── 📁 database/                 # Database module
│   ├── __init__.py
│   ├── db_functions.py          # Database operations
│   └── schema.sql               # Database schema
│
├── 📁 config/                   # Configuration
│   └── supabase_config.py       # Supabase client
│
├── 📁 .streamlit/               # Streamlit config
│   └── secrets.toml             # API keys (gitignored)
│
├── 📄 requirements.txt          # Python dependencies
├── 📄 README.md                 # This file
└── 📄 LICENSE                   # MIT License

🛠️ Tech Stack

Category        Technologies
Frontend        Streamlit
Backend         Python
Database        Supabase (PostgreSQL)
ML/AI           Scikit-learn, Pandas
Visualization   Plotly
Auth            Supabase Auth

📊 Analysis Capabilities

Data Quality Checks

✅ Missing Value Detection     ✅ Duplicate Row Detection
✅ Outlier Identification      ✅ Data Type Validation
✅ Format Consistency          ✅ Range Validation
✅ Pattern Anomalies           ✅ Correlation Analysis
✅ PII Detection               ✅ Statistical Profiling
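Three of these checks (missing values, duplicate rows, and outliers) can be sketched with the standard library alone. This is an illustration, not the app's `core/analysis.py`, and `quality_report` is a hypothetical function.

For outliers it uses the common 1.5 × IQR rule, because the README names "statistical outlier detection" without saying which rule applies. Isolation Forest, also mentioned above, is a separate model-based method and is not shown here.

```python
import csv
import io
import statistics
from collections import Counter


def quality_report(text: str) -> dict:
    """Hypothetical check: missing cells per column, duplicate rows, IQR outliers."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Pad short rows with empty strings so every row has one cell per column.
    rows = [tuple(r) + ("",) * (len(header) - len(r)) for r in reader]

    # Missing values: empty or whitespace-only cells, counted per column.
    missing = {col: sum(1 for r in rows if not r[i].strip()) for i, col in enumerate(header)}

    # Duplicate rows: each extra copy beyond the first occurrence counts once.
    duplicates = sum(n - 1 for n in Counter(rows).values() if n > 1)

    # Outliers: numeric values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    outliers = {}
    for i, col in enumerate(header):
        try:
            values = [float(r[i]) for r in rows if r[i].strip()]
        except ValueError:
            continue  # at least one non-numeric value: treat the column as text
        if len(values) < 4:
            continue  # too few points for meaningful quartiles
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers[col] = [v for v in values if v < low or v > high]

    return {"rows": len(rows), "missing": missing,
            "duplicates": duplicates, "outliers": outliers}


sample = (
    "id,city,amount\n"
    "1,Pune,120\n"
    "2,,95\n"
    "3,Delhi,110\n"
    "3,Delhi,110\n"
    "4,Mumbai,105\n"
    "5,Pune,9999\n"
)
print(quality_report(sample))
```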

Supported Data Types

📊 Numeric      → int, float, currency, percentage
📝 Text         → string, categorical, free-text
📅 DateTime     → date, time, datetime, timestamp
✉️ Identifiers  → email, phone, ID, UUID
🌐 Web          → URL, IP address, domain
📍 Location     → address, coordinates, postal code
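Type detection of this kind can be approximated by testing every non-empty value against an ordered list of checks. The first type that enough values satisfy wins.

The sketch below is an assumption about the approach, not the contents of `core/type_detection.py`:

- It covers only a subset of the types listed, and the patterns are simplified.
- `detect_type`, `CHECKS` and the 0.9 `threshold` are hypothetical.

```python
import re
from datetime import datetime


def _regex(pattern: str):
    """Build a check that is true when the whole value matches `pattern`."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda value: compiled.fullmatch(value) is not None


def _is_datetime(value: str) -> bool:
    """True for ISO-8601 strings such as 2024-01-05 or 2024-01-05T10:30:00."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical, ordered checks: the first one that enough values pass wins.
CHECKS = [
    ("numeric", _regex(r"[-+]?\d+(?:\.\d+)?")),
    ("datetime", _is_datetime),
    ("email", _regex(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("url", _regex(r"https?://\S+")),
    ("uuid", _regex(r"[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}")),
    ("ip_address", _regex(r"(?:\d{1,3}\.){3}\d{1,3}")),
]


def detect_type(values: list, threshold: float = 0.9) -> str:
    """Return the first type matched by at least `threshold` of non-empty values."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "empty"
    for name, check in CHECKS:
        if sum(1 for v in cleaned if check(v)) / len(cleaned) >= threshold:
            return name
    return "text"


print(detect_type(["12", "3.5", "-7"]))
print(detect_type(["2024-01-05", "2023-12-31"]))
print(detect_type(["a@b.com", "x.y@site.org"]))
```

Using a share threshold rather than requiring every value to match lets a column with a few typos still be recognized.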

🎯 Use Cases

👨‍💼 Data Analysts

  • Quick data quality assessment
  • Automated reporting
  • Export insights to stakeholders

👩‍🔬 Data Scientists

  • Feature engineering prep
  • Anomaly investigation
  • Dataset validation

👨‍💻 Developers

  • API data validation
  • Test data generation
  • Code snippet export
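The README does not describe how the Synthetic Data tab generates rows for the test-data use case. One simple technique, shown here as an assumption rather than the app's method, fits each column independently:

- Numeric columns are sampled from a normal distribution with the source mean and standard deviation.
- Other columns are sampled in proportion to their observed frequencies.

`synthesize` is a hypothetical helper. Because columns are sampled independently, correlations between them are not preserved.

```python
import random
import statistics
from collections import Counter


def synthesize(columns: dict, n_rows: int, seed: int = 0) -> list:
    """Hypothetical helper: generate `n_rows` rows, sampling each column independently."""
    rng = random.Random(seed)  # seeded so the same inputs give the same rows
    samplers = {}
    for name, values in columns.items():
        if not values:
            continue  # nothing to learn from an empty column
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric: normal distribution with the source mean and std deviation.
            mu, sigma = statistics.mean(values), statistics.pstdev(values)
            samplers[name] = lambda mu=mu, sigma=sigma: round(rng.gauss(mu, sigma), 2)
        else:
            # Categorical: draw observed values weighted by how often they occur.
            counts = Counter(values)
            choices, weights = list(counts), list(counts.values())
            samplers[name] = lambda c=choices, w=weights: rng.choices(c, weights=w)[0]
    return [{name: draw() for name, draw in samplers.items()} for _ in range(n_rows)]


source = {"city": ["Pune", "Pune", "Delhi"], "amount": [100.0, 110.0, 120.0]}
for row in synthesize(source, n_rows=3, seed=42):
    print(row)
```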

🤝 Contributing

Contributions are what make the open source community amazing! Any contributions you make are greatly appreciated.

# 1. Fork the Project
# 2. Create your Feature Branch
git checkout -b feature/AmazingFeature

# 3. Commit your Changes
git commit -m 'Add some AmazingFeature'

# 4. Push to the Branch
git push origin feature/AmazingFeature

# 5. Open a Pull Request

📜 License

Distributed under the MIT License. See LICENSE for more information.

MIT License

Copyright (c) 2026 Prajwal.A

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

🙏 Acknowledgments


📞 Contact & Support

Created with ❤️ by Prajwal.A

GitHub LinkedIn Email


⭐ Star this repo if you found it helpful!



About

Enterprise-grade CSV data quality analyzer powered by Machine Learning. Automatic anomaly detection, statistical profiling, PII scanning, and actionable insights. Secure user authentication, custom data pipelines, and interactive dashboards. Production-ready SaaS application.
