A machine learning project that applies Decision Tree and Random Forest algorithms to predict whether a borrower is likely to repay a loan. The project uses real lending-style data and compares tree-based models for classification performance.
This project analyzes borrower financial data and builds classification models using:
- Decision Tree Classifier
- Random Forest Classifier
The goal is to predict loan repayment behavior and compare how single-tree vs ensemble methods perform on the same dataset.
Predict whether a loan will be:
- Repaid ✅
- Default ❌
Using borrower and loan-related features.
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- Jupyter Notebook
Based on publicly available lending-style data (LendingClub-type dataset).
Typical features include:
- Credit policy
- Interest rate
- Installment amount
- Annual income
- Debt-to-income ratio
- FICO score range
- Credit history metrics
Target variable:
`loan_status` / `not_fully_paid`
- Dataset loading and inspection
- Feature distribution analysis
- Class balance check
- Visualization of key variables
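A class-balance check like the one listed above can be sketched as follows. The `not_fully_paid` column name follows the LendingClub-style dataset; the toy DataFrame is only a stand-in for the real CSV.

```python
import pandas as pd

# Toy stand-in for the lending data; the real dataset would be loaded with pd.read_csv.
df = pd.DataFrame({"not_fully_paid": [0] * 84 + [1] * 16})

# Absolute counts and class proportions make the imbalance visible.
counts = df["not_fully_paid"].value_counts()
proportions = df["not_fully_paid"].value_counts(normalize=True)
print(counts.to_dict())       # → {0: 84, 1: 16}
print(proportions.to_dict())  # → {0: 0.84, 1: 0.16}
```

A heavy skew like this is worth knowing before modeling, since accuracy alone will look deceptively good.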
- Handling categorical variables
- Feature selection
- Train/Test split
- Encoding where required
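The preprocessing steps above can be sketched as below. The `purpose` categorical column and the illustrative values are assumptions based on the LendingClub-style schema, not the project's exact code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative frame; the real dataset has a categorical `purpose` column.
df = pd.DataFrame({
    "int.rate": [0.11, 0.13, 0.10, 0.15] * 25,
    "fico": [720, 660, 700, 640] * 25,
    "purpose": ["debt_consolidation", "credit_card", "educational", "other"] * 25,
    "not_fully_paid": [0, 1, 0, 1] * 25,
})

# One-hot encode the categorical feature, then split off the target.
df = pd.get_dummies(df, columns=["purpose"], drop_first=True)
X = df.drop(columns="not_fully_paid")
y = df["not_fully_paid"]

# Stratify so both splits keep the original class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```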
- Train Decision Tree classifier
- Fit on training data
- Generate predictions
- Evaluate performance
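A minimal sketch of the train/predict/evaluate loop above, using synthetic imbalanced data in place of the lending features (the ~84/16 class split mirrors the support in the results below):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic imbalanced data standing in for the borrower features.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fit a single tree and score it on held-out data.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
pred = tree.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, pred):.2f}")
```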
- Train ensemble Random Forest classifier
- Compare against Decision Tree
- Analyze performance improvements
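The comparison step can be sketched as: fit both models on the same split and score them side by side. Again the data is synthetic; hyperparameters like `n_estimators=300` are illustrative, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.84], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Same split, two models: a single tree vs. a 300-tree ensemble.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_train, y_train)

print(f"tree:   {tree.score(X_test, y_test):.3f}")
print(f"forest: {forest.score(X_test, y_test):.3f}")
```

Because each tree in the forest sees a bootstrap sample and a random feature subset, averaging their votes typically reduces the variance of a single deep tree.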
Models evaluated using:
- Accuracy
- Confusion Matrix
- Precision / Recall
- F1 Score
- Classification Report
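All five of these metrics come from `sklearn.metrics`; a tiny hand-made example makes each one easy to verify by eye:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score,
                             classification_report)

# 5 negatives, 3 positives; the model gets 6 of 8 right.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))    # → [[4 1] [1 2]]
print("accuracy :", accuracy_score(y_true, y_pred))   # → 0.75
print("precision:", round(precision_score(y_true, y_pred), 3))
print("recall   :", round(recall_score(y_true, y_pred), 3))
print("f1       :", round(f1_score(y_true, y_pred), 3))
print(classification_report(y_true, y_pred))
```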
For the Decision Tree, the `classification_report` output:

```
              precision    recall  f1-score   support

           0       0.84      0.83      0.84      2404
           1       0.19      0.21      0.20       470

    accuracy                           0.73      2874
   macro avg       0.52      0.52      0.52      2874
weighted avg       0.74      0.73      0.73      2874
```

For the Random Forest:

```
              precision    recall  f1-score   support

           0       0.84      0.99      0.91      2404
           1       0.52      0.03      0.06       470

    accuracy                           0.84      2874
   macro avg       0.68      0.51      0.49      2874
weighted avg       0.79      0.84      0.77      2874
```
On the same dataset, the Random Forest achieved higher overall accuracy (0.84 vs. 0.73) and the lower variance expected of an ensemble, but its recall on the minority (default) class fell to 0.03, a reminder that overall accuracy is misleading under class imbalance.
| Model | Strength | Limitation |
|---|---|---|
| Decision Tree | Easy to interpret | Prone to overfitting |
| Random Forest | Better generalization | Less interpretable |
```
git clone https://github.com/rohitb281/decision-trees-random-forest-project.git
cd decision-trees-random-forest-project
pip install pandas numpy scikit-learn matplotlib seaborn
jupyter notebook
```
Open `Decision Trees and Random Forest Project.ipynb` and run all cells.
- Tree-based machine learning models
- Ensemble learning
- Bias–variance tradeoff
- Feature-based classification
- Model comparison methodology
- Evaluation metrics for classifiers
- Hyperparameter tuning (GridSearchCV)
- Cross-validation
- Feature importance analysis
- ROC–AUC curves
- Class imbalance handling
- Model explainability (SHAP)
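As a starting point for the tuning and imbalance items above, a minimal `GridSearchCV` sketch (grid values and `class_weight="balanced"` are illustrative choices, not tuned settings; synthetic data stands in for the lending features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.84], random_state=42)

# Small grid over forest depth and size; class_weight="balanced"
# up-weights the minority class, and scoring="f1" rewards catching it.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    param_grid, cv=3, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```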
- Limited hyperparameter tuning
- No production deployment
- Interpretability vs performance tradeoff
Open for educational and portfolio use.
- Rohit Bollapragada
- GitHub: https://github.com/rohitb281