A machine learning project that applies Decision Tree and Random Forest algorithms to predict whether a borrower is likely to repay a loan. The project uses real lending-style data and compares tree-based models for classification performance.
This project analyzes borrower financial data and builds classification models using:
- Decision Tree Classifier
- Random Forest Classifier
The goal is to predict loan repayment behavior and compare how single-tree vs ensemble methods perform on the same dataset.
Predict whether a loan will be:
- Repaid ✅
- Default ❌
Using borrower and loan-related features.
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- Jupyter Notebook
Based on publicly available lending-style data (LendingClub-type dataset).
Typical features include:
- Credit policy
- Interest rate
- Installment amount
- Annual income
- Debt-to-income ratio
- FICO score range
- Credit history metrics
Target variable:
`loan_status` / `not_fully_paid`
- Dataset loading and inspection
- Feature distribution analysis
- Class balance check
- Visualization of key variables
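A class-balance check like the one listed above can be sketched as follows. The `not_fully_paid` column name follows the LendingClub-style dataset; the toy DataFrame is only a stand-in for the real CSV.

```python
import pandas as pd

# Toy stand-in for the lending data; the real dataset would be loaded with pd.read_csv.
df = pd.DataFrame({"not_fully_paid": [0] * 84 + [1] * 16})

# Absolute counts and class proportions make the imbalance visible.
counts = df["not_fully_paid"].value_counts()
proportions = df["not_fully_paid"].value_counts(normalize=True)
print(counts.to_dict())       # → {0: 84, 1: 16}
print(proportions.to_dict())  # → {0: 0.84, 1: 0.16}
```

A heavy skew like this is worth knowing before modeling, since accuracy alone will look deceptively good.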
- Handling categorical variables
- Feature selection
- Train/Test split
- Encoding where required
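The preprocessing steps above can be sketched as below. The `purpose` categorical column and the illustrative values are assumptions based on the LendingClub-style schema, not the project's exact code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative frame; the real dataset has a categorical `purpose` column.
df = pd.DataFrame({
    "int.rate": [0.11, 0.13, 0.10, 0.15] * 25,
    "fico": [720, 660, 700, 640] * 25,
    "purpose": ["debt_consolidation", "credit_card", "educational", "other"] * 25,
    "not_fully_paid": [0, 1, 0, 1] * 25,
})

# One-hot encode the categorical feature, then split off the target.
df = pd.get_dummies(df, columns=["purpose"], drop_first=True)
X = df.drop(columns="not_fully_paid")
y = df["not_fully_paid"]

# Stratify so both splits keep the original class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```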
- Train Decision Tree classifier
- Fit on training data
- Generate predictions
- Evaluate performance
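A minimal sketch of the train/predict/evaluate loop above, using synthetic imbalanced data in place of the lending features (the ~84/16 class split mirrors the support in the results below):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic imbalanced data standing in for the borrower features.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fit a single tree and score it on held-out data.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
pred = tree.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, pred):.2f}")
```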
- Train ensemble Random Forest classifier
- Compare against Decision Tree
- Analyze performance improvements
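The comparison step can be sketched as: fit both models on the same split and score them side by side. Again the data is synthetic; hyperparameters like `n_estimators=300` are illustrative, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Same split, two models: a single tree vs. a 300-tree ensemble.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_train, y_train)

print(f"tree:   {tree.score(X_test, y_test):.3f}")
print(f"forest: {forest.score(X_test, y_test):.3f}")
```

Because each tree in the forest sees a bootstrap sample and a random feature subset, averaging their votes typically reduces the variance of a single deep tree.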
Models evaluated using:
- Accuracy
- Confusion Matrix
- Precision / Recall
- F1 Score
- Classification Report
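All five of these metrics come from `sklearn.metrics`; a tiny hand-made example makes each one easy to verify by eye:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score,
                             classification_report)

# 5 negatives, 3 positives; the model gets 6 of 8 right.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))    # → [[4 1] [1 2]]
print("accuracy :", accuracy_score(y_true, y_pred))   # → 0.75
print("precision:", round(precision_score(y_true, y_pred), 3))
print("recall   :", round(recall_score(y_true, y_pred), 3))
print("f1       :", round(f1_score(y_true, y_pred), 3))
print(classification_report(y_true, y_pred))
```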
For the Decision Tree, the `classification_report` output:

```
              precision    recall  f1-score   support

           0       0.84      0.83      0.84      2404
           1       0.19      0.21      0.20       470

    accuracy                           0.73      2874
   macro avg       0.52      0.52      0.52      2874
weighted avg       0.74      0.73      0.73      2874
```

For the Random Forest:

```
              precision    recall  f1-score   support

           0       0.84      0.99      0.91      2404
           1       0.52      0.03      0.06       470

    accuracy                           0.84      2874
   macro avg       0.68      0.51      0.49      2874
weighted avg       0.79      0.84      0.77      2874
```
On the same dataset, the Random Forest achieved higher overall accuracy (0.84 vs. 0.73) and the lower variance expected of an ensemble, but its recall on the minority (default) class fell to 0.03, a reminder that overall accuracy is misleading under class imbalance.
| Model | Strength | Limitation |
|---|---|---|
| Decision Tree | Easy to interpret | Prone to overfitting |
| Random Forest | Better generalization | Less interpretable |
```
git clone https://github.com/rohitb281/decision-trees-random-forest-project.git
cd decision-trees-random-forest-project
pip install pandas numpy scikit-learn matplotlib seaborn
jupyter notebook
```
Open `Decision Trees and Random Forest Project.ipynb` and run all cells.
- Tree-based machine learning models
- Ensemble learning
- Bias–variance tradeoff
- Feature-based classification
- Model comparison methodology
- Evaluation metrics for classifiers
- Hyperparameter tuning (GridSearchCV)
- Cross-validation
- Feature importance analysis
- ROC–AUC curves
- Class imbalance handling
- Model explainability (SHAP)
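As a starting point for the tuning and imbalance items above, a minimal `GridSearchCV` sketch (grid values and `class_weight="balanced"` are illustrative choices, not tuned settings; synthetic data stands in for the lending features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.84], random_state=42)

# Small grid over forest depth and size; class_weight="balanced"
# up-weights the minority class, and scoring="f1" rewards catching it.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    param_grid, cv=3, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```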
- Limited hyperparameter tuning
- No production deployment
- Interpretability vs performance tradeoff
Open for educational and portfolio use.
- Rohit Bollapragada
- GitHub: https://github.com/rohitb281