Comprehensive Machine Learning Pipeline on Heart Disease UCI Dataset
This project provides a complete Machine Learning pipeline for analyzing, predicting, and visualizing heart disease risks using the UCI Heart Disease dataset. It covers data preprocessing, dimensionality reduction (PCA), supervised & unsupervised models, hyperparameter tuning, and optional web deployment using Streamlit & Ngrok.
- Project Objectives
- Dataset
- Tools & Libraries
- Project Structure
- How to Run
- Pipeline Workflow
- Results & Deliverables

Project Objectives
- Perform Data Cleaning & Preprocessing (missing values, encoding, scaling).
- Apply Dimensionality Reduction using PCA.
- Implement Feature Selection using Random Forest, RFE, Chi-Square.
- Train Supervised Models:
  - Logistic Regression
  - Decision Tree
  - Random Forest
  - Support Vector Machine (SVM)
- Apply Unsupervised Learning:
  - K-Means Clustering
  - Hierarchical Clustering
- Optimize models using GridSearchCV & RandomizedSearchCV.
- Deploy a Streamlit Web App and use Ngrok for public access.
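
The three feature selection approaches named in the objectives can be compared side by side. The following is a minimal sketch, not the notebook's actual code: it assumes scikit-learn and NumPy are installed, uses synthetic data from `make_classification` as a stand-in for the cleaned dataset, and the feature names `f0`..`f12` and the choice of `k = 5` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the cleaned dataset (assumption: 13 numeric features).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, n_redundant=2, random_state=42
)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names
k = 5  # number of features to keep (arbitrary, for illustration only)

# 1) Random Forest: rank features by impurity-based importance, keep the top k.
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
rf_top = {int(i) for i in np.argsort(forest.feature_importances_)[::-1][:k]}

# 2) RFE: repeatedly drop the weakest feature according to a linear model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
rfe_top = {int(i) for i in np.flatnonzero(rfe.support_)}

# 3) Chi-Square: needs non-negative inputs, so rescale to [0, 1] first.
X_nonneg = MinMaxScaler().fit_transform(X)
chi = SelectKBest(chi2, k=k).fit(X_nonneg, y)
chi_top = {int(i) for i in np.flatnonzero(chi.get_support())}

# Features chosen by all three methods are the least controversial to keep.
consensus = sorted(rf_top & rfe_top & chi_top)

print("Random Forest:", sorted(feature_names[i] for i in rf_top))
print("RFE:          ", sorted(feature_names[i] for i in rfe_top))
print("Chi-Square:   ", sorted(feature_names[i] for i in chi_top))
print("Consensus:    ", [feature_names[i] for i in consensus])
```

Note that the Chi-Square test rejects negative values, which is why the sketch rescales with `MinMaxScaler` rather than reusing standardized features.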

Dataset
- Name: Heart Disease UCI Dataset
- Description: Predict the presence or absence of heart disease based on clinical parameters.

Tools & Libraries
- Language: Python
- Libraries:
  - pandas, numpy – Data Handling
  - matplotlib, seaborn – Visualization
  - sklearn – Machine Learning Models & PCA
  - joblib – Save Model as .pkl
  - streamlit – Interactive Web App
  - ngrok – Public URL for Deployment
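
As an illustration of the joblib step, here is a minimal sketch of saving a fitted scikit-learn model to a `.pkl` file and loading it back. It assumes joblib and scikit-learn are installed; the synthetic data and the `LogisticRegression` model are placeholders, and a temporary directory is used instead of the project's `models/` folder so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Placeholder data and model (assumption: any fitted estimator works the same way).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "final_model.pkl"
    joblib.dump(model, model_path)      # serialize the fitted model to disk
    restored = joblib.load(model_path)  # load it back, e.g. inside a web app

# The restored model should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print("Round trip preserved predictions:", same_predictions)
```

Pickle-based files should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.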

Project Structure
Heart_Disease_Project/
│── data/
│ ├── heart_disease.csv
│── notebooks/
│ ├── 00_data_collecting.ipynb
│ ├── 01_data_preprocessing.ipynb
│ ├── 02_pca_analysis.ipynb
│ ├── 03_feature_selection.ipynb
│ ├── 04_supervised_learning.ipynb
│ ├── 05_unsupervised_learning.ipynb
│ ├── 06_hyperparameter_tuning.ipynb
│── models/
│ ├── final_model.pkl
│── ui/
│ ├── app.py (Streamlit UI)
│── deployment/
│ ├── ngrok_setup.txt
│── results/
│ ├── evaluation_metrics.txt
│── requirements.txt
│── README.md
│── .gitignore

How to Run
git clone https://github.com/basmala-ayman/Heart-Disease.git
cd Heart-Disease
python3 -m venv venv
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook
streamlit run ui/app.py
For public access through Ngrok, follow the instructions in deployment/ngrok_setup.txt.

Pipeline Workflow
- Data Preprocessing & Cleaning – Handle missing values, encoding, scaling
- PCA Analysis – Dimensionality Reduction
- Feature Selection – Random Forest, RFE, Chi-Square
- Model Training – Logistic Regression, Decision Tree, Random Forest, SVM
- Evaluation – Accuracy, Precision, Recall, F1, ROC-AUC
- Clustering – K-Means & Hierarchical Clustering
- Hyperparameter Tuning – GridSearchCV, RandomizedSearchCV
- Deployment (Bonus) – Streamlit & Ngrok
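
The preprocessing, PCA, training, tuning, and evaluation steps above can be chained in a single scikit-learn `Pipeline`. This is a hedged sketch rather than the project's notebook code: the data is synthetic and purely numeric (so the encoding step is omitted), SVM is picked arbitrarily from the four listed models, and the parameter grid, the 95% variance threshold for PCA, and the F1 scoring choice are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cleaned dataset (assumption: 13 numeric features).
X, y = make_classification(
    n_samples=400, n_features=13, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Preprocessing, PCA, and the classifier are fitted together, so the
# cross-validation folds never leak test statistics into the scaler or PCA.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # handle missing values
    ("scale", StandardScaler()),                   # scaling
    ("pca", PCA(n_components=0.95)),               # keep 95% of the variance
    ("model", SVC(random_state=42)),
])

# Hyperparameter tuning over a small, illustrative grid.
param_grid = {"model__C": [0.1, 1, 10], "model__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Evaluation on the held-out split.
y_pred = search.predict(X_test)
y_score = search.decision_function(X_test)  # continuous scores for ROC-AUC
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}

print("Best parameters:", search.best_params_)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Swapping `RandomizedSearchCV` in for `GridSearchCV` follows the same pattern, with `param_distributions` and `n_iter` replacing `param_grid`.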

Results & Deliverables
- Cleaned Dataset
- PCA & Feature Selection Results
- Trained Models with Evaluation Metrics
- Optimized Model Saved as .pkl
- Interactive Streamlit UI
- Ngrok Public Access Link