This project analyzes a balanced dataset (50% diabetic, 50% non-diabetic) to predict diabetes status. It uses statistical and machine learning techniques in R to identify key health indicators and to compare the performance of several classification models.
The project covers data cleaning, exploratory data analysis (EDA), feature selection, and model training. The primary goal is to predict Diabetes_binary status accurately, using a balanced dataset to reduce the bias that class imbalance would otherwise introduce. After prediction, the project turns to inference, using Decision Trees to produce interpretable health rules.
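As a rough illustration of the decision-tree inference step, the sketch below grows a classification tree on a balanced toy dataset and prunes it back using cross-validated error. It is not the code in `HW2.R`: it uses `rpart` (bundled with R) in place of the `tree` package listed in the dependencies, and the toy data and every column name except `Diabetes_binary` are assumptions made for the example.

```r
library(rpart)  # assumption: rpart stands in for the tree package used by the project

set.seed(1)
n <- 600

# Hypothetical balanced toy data (50% "No", 50% "Yes").
# Only the outcome name Diabetes_binary comes from this README.
toy <- data.frame(
  BMI             = c(rnorm(n / 2, 27, 4), rnorm(n / 2, 32, 5)),
  HighBP          = c(rbinom(n / 2, 1, 0.35), rbinom(n / 2, 1, 0.75)),
  Diabetes_binary = factor(rep(c("No", "Yes"), each = n / 2))
)

# Grow a deliberately large tree (small cp) with 5-fold internal cross-validation.
full_tree <- rpart(
  Diabetes_binary ~ ., data = toy, method = "class",
  control = rpart.control(cp = 0.001, xval = 5)
)

# Pick the complexity parameter with the lowest cross-validated error,
# then prune the large tree back to that size.
cp_table    <- full_tree$cptable
best_cp     <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned_tree <- prune(full_tree, cp = best_cp)

# Printing the pruned tree lists its splits, which can be read as if/else rules.
print(pruned_tree)
```

Each printed node shows a split condition and the predicted class, which is the kind of rule the inference stage aims to surface.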
- Data Preprocessing: Handling of duplicates and NA values to ensure dataset integrity.
- Exploratory Data Analysis (EDA): Visualization of relationships between diabetes and factors like BMI, Blood Pressure, Cholesterol, and Income using `ggplot2`.
- Feature Selection: Implementation of Forward and Backward Stepwise selection (using AIC) to identify the top 5 predictive features.
- Model Comparison: Training and evaluation of 8 different algorithms using 5-fold Cross-Validation:
  - QDA & LDA
  - Random Forest (`rf`)
  - Neural Network (`nnet`)
  - Conditional Inference Tree (`ctree`)
  - Linear SVM
  - Boosted Logistic Regression (`LogitBoost`)
  - K-Nearest Neighbors (`knn`)
- Inference: Pruned Decision Tree generation for interpreting the logical flow of diagnosis.
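The 5-fold cross-validated model comparison listed above could be set up along the following lines with `caret`. This is a minimal sketch under stated assumptions, not the project's actual script: it compares only three of the eight models (`lda`, `qda`, `knn`), and the toy data and predictor names `x1` and `x2` are hypothetical.

```r
library(caret)  # the "lda" and "qda" methods also rely on MASS

set.seed(123)
n <- 300

# Hypothetical balanced toy data; only Diabetes_binary is a name from this README.
toy <- data.frame(
  x1              = c(rnorm(n / 2, 0), rnorm(n / 2, 1)),
  x2              = c(rnorm(n / 2, 0), rnorm(n / 2, 0.5)),
  Diabetes_binary = factor(rep(c("No", "Yes"), each = n / 2))
)

# Build the five training folds once and reuse them for every model,
# so all models are evaluated on identical splits.
folds <- createFolds(toy$Diabetes_binary, k = 5, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", number = 5, index = folds)

# caret method strings; the names become the labels in the comparison.
methods <- c(LDA = "lda", QDA = "qda", KNN = "knn")

fits <- lapply(methods, function(m) {
  train(Diabetes_binary ~ ., data = toy, method = m, trControl = ctrl)
})

# Collect the per-fold results and summarise accuracy across models.
cv_results <- resamples(fits)
summary(cv_results, metric = "Accuracy")
```

Sharing one set of folds across models keeps the comparison paired; the remaining method strings (`rf`, `nnet`, `ctree`, `LogitBoost`, and a linear SVM method) could be added to `methods` once their backing packages are installed.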
- Ensure you have R installed (version 4.2+ recommended).
- Clone this repository.
- Install the required dependencies by running the following command in your R console:

      install.packages(c("caret", "pROC", "ggplot2", "tidyverse", "leaps", "MASS", "tree"))

- Prepare Data: Ensure your dataset (e.g., `d2.csv`) is placed in the `Data/` directory. Note: You may need to update the file path in the `read.csv("path/to/your/d2.csv")` line of `HW2.R` or `HW2-Code.Rmd`.
- Run Analysis: You can run the raw script or knit the R Markdown file for a report.

      # Run the main script
      source("HW2.R")

  Or open `HW2-Code.Rmd` in RStudio and click Knit to generate an HTML/PDF report.
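To make the data preparation step concrete, the sketch below loads a CSV, drops duplicate rows and rows with NA values, and then runs forward and backward stepwise selection by AIC with base R's `step()`. It is an illustration only: `load_and_clean` is a hypothetical helper, the toy file and its predictor columns are invented, and capping forward selection with `steps = 5` is just one possible reading of "top 5 features", since the README does not say how the five are chosen.

```r
# Hypothetical helper (not taken from HW2.R): read a CSV, then drop
# exact duplicate rows and any row containing an NA.
load_and_clean <- function(path) {
  raw     <- read.csv(path)
  deduped <- raw[!duplicated(raw), , drop = FALSE]
  deduped[complete.cases(deduped), , drop = FALSE]
}

# Write a small toy file so the example runs without the real d2.csv.
# All predictor names are assumptions; only Diabetes_binary is from the README.
set.seed(42)
n <- 400
toy <- data.frame(
  BMI      = rnorm(n, mean = 28, sd = 5),
  HighBP   = rbinom(n, 1, 0.5),
  HighChol = rbinom(n, 1, 0.5),
  Income   = sample(1:8, n, replace = TRUE),
  Noise    = rnorm(n)
)
toy$Diabetes_binary <- rbinom(
  n, 1, plogis(-6 + 0.15 * toy$BMI + 1.0 * toy$HighBP + 0.8 * toy$HighChol)
)
toy$BMI[c(5, 50)] <- NA             # two rows with missing values
toy <- rbind(toy, toy[101:110, ])   # ten exact duplicate rows

csv_path <- file.path(tempdir(), "d2_toy.csv")
write.csv(toy, csv_path, row.names = FALSE)

clean <- load_and_clean(csv_path)

# Stepwise selection by AIC on a logistic regression.
null_model <- glm(Diabetes_binary ~ 1, data = clean, family = binomial)
full_model <- glm(Diabetes_binary ~ ., data = clean, family = binomial)

forward_fit <- step(
  null_model,
  scope     = list(lower = ~ 1, upper = formula(full_model)),
  direction = "forward",
  steps     = 5,   # assumption: at most five additions
  trace     = 0
)
backward_fit <- step(full_model, direction = "backward", trace = 0)

forward_terms  <- attr(terms(forward_fit), "term.labels")
backward_terms <- attr(terms(backward_fit), "term.labels")
print(forward_terms)
print(backward_terms)
```

For the real analysis, `csv_path` would point at the dataset in `Data/` instead of the temporary toy file.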
Contributions are welcome!
- Fork the repository.
- Create a feature branch (`git checkout -b feature/NewModel`).
- Commit your changes.
- Push to the branch and open a Pull Request.
Distributed under the MIT License. See LICENSE for more information.
Nima Kelidari

Project Link: https://github.com/nikelroid/regression-hw2