This project analyzes health survey data from the Heart Disease Health Indicators dataset to predict the likelihood of heart disease.
The notebook walks through data preprocessing, exploratory data analysis, and the training of several classification models:
- Random Forest
- XGBoost
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Decision Tree
- Gradient Boosting
- Stacked Ensemble
- Neural Network
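A minimal sketch of how a comparison like this might be set up with scikit-learn. It is not the notebook's code. Synthetic data from `make_classification` stands in for the survey features, with a simulated class imbalance. `build_models` is a hypothetical helper, the hyperparameters are illustrative, and scikit-learn's `MLPClassifier` is swapped in for the neural network. XGBoost is left out so the sketch runs with scikit-learn alone. `xgboost.XGBClassifier` follows the same `fit`/`predict_proba` interface and could be added to the dictionary if the `xgboost` package is installed.

```python
# Sketch only: fit several scikit-learn classifiers on the same train/test split.
# Synthetic, imbalanced data stands in for the real survey features (assumption).
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=42):
    """Hypothetical helper: return a dict of name -> unfitted estimator."""
    # Base learners for the stacked ensemble; a logistic regression combines them.
    base = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("gb", GradientBoostingClassifier(random_state=random_state)),
    ]
    return {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
        "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=random_state),
        # Distance- and margin-based models are sensitive to feature scale,
        # so they are wrapped with a StandardScaler.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
        "SVM": make_pipeline(
            StandardScaler(), SVC(probability=True, random_state=random_state)
        ),
        # MLPClassifier is a stand-in for the notebook's neural network.
        "Neural Network": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=random_state),
        ),
        "Stacked Ensemble": StackingClassifier(
            estimators=base, final_estimator=LogisticRegression(max_iter=1000)
        ),
    }


# About 90% negatives / 10% positives, to mimic an imbalanced health label.
X, y = make_classification(n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

fitted = {}
for name, model in build_models().items():
    fitted[name] = model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Keeping every model behind the same `fit`/`predict_proba` interface lets one evaluation routine score all of them on an identical held-out split.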
Model performance is evaluated using:
- Accuracy
- ROC-AUC
- Confusion Matrix
- Classification Report
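A minimal sketch of computing these four metrics for one fitted binary classifier with `sklearn.metrics`. The `evaluate` helper, the 0.5 decision threshold, and the synthetic data in the usage lines are assumptions for illustration, not the notebook's code. ROC-AUC is computed from predicted probabilities rather than thresholded labels, which matters when positives are rare and accuracy alone can look deceptively high.

```python
# Sketch only: accuracy, ROC-AUC, confusion matrix and classification report
# for a fitted binary classifier that exposes predict_proba.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test, threshold=0.5):
    """Hypothetical helper returning the four metrics as a dict."""
    # Probability of the positive class (label 1 = heart disease).
    proba = model.predict_proba(X_test)[:, 1]
    y_pred = (proba >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # ROC-AUC is threshold-independent: it uses the scores, not y_pred.
        "roc_auc": roc_auc_score(y_test, proba),
        # labels=[0, 1] fixes the matrix to 2x2: rows = true, columns = predicted.
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
        "report": classification_report(y_test, y_pred, digits=3),
    }


# Usage on synthetic, imbalanced data (stand-in for the real dataset).
X, y = make_classification(n_samples=500, n_features=8, weights=[0.85, 0.15], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
results = evaluate(clf, X_test, y_test)
print(f"Accuracy: {results['accuracy']:.3f}  ROC-AUC: {results['roc_auc']:.3f}")
print(results["confusion_matrix"])
print(results["report"])
```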
Project files:
- `Heart_Diseases_project_V_01.ipynb`: main Jupyter notebook with data exploration, preprocessing, model training, and evaluation.
- `heart_disease_health_indicators.csv`: dataset file (download it from Kaggle and place it in this folder).
- `requirements.txt`: Python dependencies.
- `README.md`: project description and usage.
To reproduce the analysis:
- Download the dataset from Kaggle and save it as `heart_disease_health_indicators.csv` in this folder.
- Install the dependencies: `pip install -r requirements.txt`
- Open the notebook: `jupyter notebook Heart_Diseases_project_V_01.ipynb`
- Run all cells.
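Once the CSV is in place, loading it and holding out a test set might look like the sketch below. The label column name `HeartDiseaseorAttack` is an assumption based on the Kaggle version of the dataset; change `TARGET` if your copy differs. `load_and_split` is a hypothetical helper, and the 80/20 stratified split is an illustrative choice, not necessarily what the notebook does.

```python
# Sketch only: load the dataset and make a stratified train/test split.
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: label column name as in the Kaggle CSV; adjust if needed.
TARGET = "HeartDiseaseorAttack"


def load_and_split(csv_path="heart_disease_health_indicators.csv", target=TARGET,
                   test_size=0.2, random_state=42):
    """Hypothetical helper: read the CSV and return X_train, X_test, y_train, y_test."""
    df = pd.read_csv(Path(csv_path))
    if target not in df.columns:
        raise KeyError(f"Expected label column {target!r}; found {list(df.columns)}")
    X = df.drop(columns=[target])
    y = df[target].astype(int)
    # Stratify so train and test keep the same share of positive cases.
    return train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
```

For example, `X_train, X_test, y_train, y_test = load_and_split()` run from this folder would return pandas objects ready to pass to any scikit-learn estimator.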
Acknowledgements:
- Dataset: Heart Disease Health Indicators, published on Kaggle by Alex Teboul
- Libraries: scikit-learn, XGBoost, pandas, matplotlib, seaborn