rohitb281/decision-trees-random-forest-project

🌳 Decision Trees & Random Forest — Loan Default Prediction

A machine learning project that applies Decision Tree and Random Forest algorithms to predict whether a borrower is likely to repay a loan. The project uses real lending-style data and compares tree-based models for classification performance.


📌 Overview

This project analyzes borrower financial data and builds classification models using:

  • Decision Tree Classifier
  • Random Forest Classifier

The goal is to predict loan repayment behavior and compare how single-tree vs ensemble methods perform on the same dataset.


🎯 Objective

Using borrower and loan-related features, predict whether a loan will be:

  • Repaid ✅
  • Defaulted ❌


🧩 Tech Stack

  • Python
  • Pandas
  • NumPy
  • Scikit-learn
  • Matplotlib
  • Seaborn
  • Jupyter Notebook

📊 Dataset

Based on publicly available lending-style data (LendingClub-type dataset).

Typical features include:

  • Credit policy
  • Interest rate
  • Installment amount
  • Annual income
  • Debt-to-income ratio
  • FICO score range
  • Credit history metrics

Target variable:

loan_status / not_fully_paid


🔬 Project Workflow

1️⃣ Data Exploration

  • Dataset loading and inspection
  • Feature distribution analysis
  • Class balance check
  • Visualization of key variables
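
The exploration steps above might look roughly like the sketch below. The README does not name the data file, so this uses a small synthetic table as a stand-in; the column names `int_rate` and `fico` and the 16% positive rate are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loan table (hypothetical columns, not the real data).
rng = np.random.default_rng(0)
n = 1000
loans = pd.DataFrame({
    "int_rate": rng.uniform(0.06, 0.22, n),
    "fico": rng.integers(600, 830, n),
    "not_fully_paid": (rng.random(n) < 0.16).astype(int),
})

# Inspect column types and summary statistics.
print(loans.dtypes)
print(loans.describe())

# Class balance check: share of each target value.
class_share = loans["not_fully_paid"].value_counts(normalize=True)
print(class_share)
```

A strongly uneven `class_share` is worth noting early, because accuracy alone becomes a weak metric when one class dominates.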

2️⃣ Data Preprocessing

  • Handling categorical variables
  • Feature selection
  • Train/Test split
  • Encoding where required
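
A minimal sketch of these preprocessing steps, assuming a single categorical column (here a hypothetical `purpose` column) and a 70/30 split. The split ratio, the `stratify` option, and the synthetic data are assumptions, not details confirmed by the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with one categorical feature (hypothetical column names).
rng = np.random.default_rng(0)
n = 500
loans = pd.DataFrame({
    "purpose": rng.choice(["credit_card", "debt_consolidation", "educational"], n),
    "int_rate": rng.uniform(0.06, 0.22, n),
    "not_fully_paid": (rng.random(n) < 0.16).astype(int),
})

# One-hot encode the categorical column; drop_first removes one redundant dummy.
encoded = pd.get_dummies(loans, columns=["purpose"], drop_first=True)

# Separate features from the target.
X = encoded.drop(columns="not_fully_paid")
y = encoded["not_fully_paid"]

# Hold out 30% of rows for testing, keeping the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```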

3️⃣ Decision Tree Model

  • Train Decision Tree classifier
  • Fit on training data
  • Generate predictions
  • Evaluate performance
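
The Decision Tree step could be sketched as follows. Synthetic data from `make_classification` stands in for the loan features, and the default hyperparameters are an assumption, since the README does not list the settings used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data (not the real dataset).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Train a single tree with default settings and predict on the held-out rows.
dtree = DecisionTreeClassifier(random_state=42)
dtree.fit(X_train, y_train)
dtree_pred = dtree.predict(X_test)

print("test accuracy:", dtree.score(X_test, y_test))
```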

4️⃣ Random Forest Model

  • Train ensemble Random Forest classifier
  • Compare against Decision Tree
  • Analyze performance improvements
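
One way to run the comparison is to fit both models on the same split and score them side by side. This is a sketch on synthetic data; `n_estimators=100` and the split settings are assumptions rather than the notebook's actual values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data (not the real dataset).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Fit a single tree and a forest on identical training data.
dtree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
rfc = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Score both on the same held-out rows so the numbers are comparable.
scores = {
    "decision_tree": accuracy_score(y_test, dtree.predict(X_test)),
    "random_forest": accuracy_score(y_test, rfc.predict(X_test)),
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```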

5️⃣ Evaluation Metrics

Models evaluated using:

  • Accuracy
  • Confusion Matrix
  • Precision / Recall
  • F1 Score
  • Classification Report
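
To make these metrics concrete, here is a small hand-made example (the labels are invented for illustration, not taken from the project) showing how scikit-learn computes them for an imbalanced binary problem.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Invented labels: six class-0 rows and four class-1 rows.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[5 1], [3 1]]

accuracy = accuracy_score(y_true, y_pred)    # (5 + 1) / 10 = 0.6
precision = precision_score(y_true, y_pred)  # 1 / (1 + 1) = 0.5 for class 1
recall = recall_score(y_true, y_pred)        # 1 / (1 + 3) = 0.25 for class 1

print(classification_report(y_true, y_pred))
```

Accuracy looks moderate here even though only one of the four class-1 rows is found, which is why per-class precision and recall are reported alongside it.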

📈 Results

For Decision Tree classification_report:

              precision    recall  f1-score   support

           0       0.84      0.83      0.84      2404
           1       0.19      0.21      0.20       470

    accuracy                           0.73      2874
   macro avg       0.52      0.52      0.52      2874
weighted avg       0.74      0.73      0.73      2874

For Random Forest classification_report:

              precision    recall  f1-score   support

           0       0.84      0.99      0.91      2404
           1       0.52      0.03      0.06       470

    accuracy                           0.84      2874
   macro avg       0.68      0.51      0.49      2874
weighted avg       0.79      0.84      0.77      2874

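On the same test set, Random Forest reached higher overall accuracy (0.84 vs 0.73), but the gain comes almost entirely from class 0. Its recall on class 1 fell to 0.03 (vs 0.21 for the Decision Tree) and its macro-averaged F1 was slightly lower (0.49 vs 0.52), so neither model identifies the minority class reliably.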
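
The reported figures can be cross-checked with a little arithmetic. The sketch below uses only the support and recall values printed in the two reports above; because those values are rounded to two decimals, the reconstructed accuracies are approximate.

```python
# Support (row counts per class) from the classification reports.
support = {0: 2404, 1: 470}
total = sum(support.values())  # 2874

# Per-class recall as printed in the reports (rounded to two decimals).
dt_recall = {0: 0.83, 1: 0.21}
rf_recall = {0: 0.99, 1: 0.03}


def accuracy_from_recall(recall, support):
    """Accuracy = correctly classified rows / all rows, using recall * support per class."""
    correct = sum(recall[c] * support[c] for c in support)
    return correct / sum(support.values())


dt_accuracy = accuracy_from_recall(dt_recall, support)  # about 0.729
rf_accuracy = accuracy_from_recall(rf_recall, support)  # about 0.833

# Accuracy of a trivial model that always predicts class 0.
majority_baseline = support[0] / total                  # about 0.836

print(f"decision tree: {dt_accuracy:.3f}")
print(f"random forest: {rf_accuracy:.3f}")
print(f"always class 0: {majority_baseline:.3f}")
```

The always-class-0 baseline lands at about the same accuracy as the Random Forest, which suggests that accuracy alone says little about how well either model detects class 1.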


⚖️ Model Comparison Insight

Model           Strength                Limitation
Decision Tree   Easy to interpret       Prone to overfitting
Random Forest   Better generalization   Less interpretable
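
The overfitting tendency of a single tree can be illustrated by comparing training and test accuracy. This is a sketch on synthetic data, not the project's dataset; the depth limit of 3 for the second tree is an arbitrary choice for contrast.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data (not the real dataset).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# An unconstrained tree keeps splitting until it fits the training rows.
deep_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
train_acc = deep_tree.score(X_train, y_train)
test_acc = deep_tree.score(X_test, y_test)
gap = train_acc - test_acc

# A depth-limited tree for contrast (max_depth=3 is an arbitrary choice).
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
shallow_gap = shallow_tree.score(X_train, y_train) - shallow_tree.score(X_test, y_test)

print(f"deep tree:    train={train_acc:.3f} test={test_acc:.3f} gap={gap:.3f}")
print(f"shallow tree: gap={shallow_gap:.3f}")
```

A large gap between training and test accuracy is the usual symptom of overfitting; averaging many trees, as a Random Forest does, is one common way to reduce it.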

▶️ How to Run

Clone repo

git clone https://github.com/rohitb281/decision-trees-random-forest-project.git
cd decision-trees-random-forest-project

Install dependencies

pip install pandas numpy scikit-learn matplotlib seaborn

Run notebook

jupyter notebook

Open:

Decision Trees and Random Forest Project.ipynb

  • Run all cells.

🧠 Concepts Demonstrated

  • Tree-based machine learning models
  • Ensemble learning
  • Bias–variance tradeoff
  • Feature-based classification
  • Model comparison methodology
  • Evaluation metrics for classifiers

🚀 Possible Improvements

  • Hyperparameter tuning (GridSearchCV)
  • Cross-validation
  • Feature importance analysis
  • ROC–AUC curves
  • Class imbalance handling
  • Model explainability (SHAP)
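
Several of these improvements can be combined in one short experiment. The sketch below is an assumption about how they might be approached, not part of the current notebook: it tunes a Random Forest with `GridSearchCV` (which cross-validates internally), uses `class_weight="balanced"` as one simple way to address class imbalance, scores by ROC AUC, and reads feature importances from the best model. The data is synthetic and the parameter grid is deliberately tiny.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the loan data (not the real dataset).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Small illustrative grid; a real search would cover more values.
param_grid = {"max_depth": [3, 5], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=42),
    param_grid,
    scoring="roc_auc",  # ranks candidates by area under the ROC curve
    cv=3,               # 3-fold cross-validation on the training rows
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out rows.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])

# Impurity-based importances, one value per feature.
importances = search.best_estimator_.feature_importances_

print("best params:", search.best_params_)
print(f"test ROC AUC: {test_auc:.3f}")
print("feature importances:", importances.round(3))
```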

⚠️ Limitations

  • Limited hyperparameter tuning
  • No production deployment
  • Interpretability vs performance tradeoff

📄 License

Open for educational and portfolio use.


👤 Author

About

This project analyzes publicly available data from LendingClub.com, a platform that connects borrowers who need money with investors. The goal is to build a model that predicts how likely a borrower is to repay a loan. The data covers LendingClub loans from 2007-2010 and is used to classify repayment behavior pre-2016.
