An end-to-end machine learning project designed to help microfinance institutions, SACCOs, and digital lenders predict the likelihood of loan default based on customer financial and demographic data. Built using Python, Streamlit, and Plotly, this project showcases ML modeling, explainability, and real-time deployment.
Loan defaults significantly impact the financial sustainability of microfinance institutions. Traditional manual assessments are time-consuming and often subjective. This system enables automated, consistent, and explainable credit risk assessment, reducing bad loans and increasing operational efficiency.
- 🧾 Interactive Web App – Predict loan default risk using real-time user inputs.
- 🧠 Model Explainability – Uses SHAP for interpreting model decisions.
- 📈 Visual Analytics Dashboard – Built-in Plotly-based insights.
- ⚙️ Clean Modular Code – Structured for production, extension, and scalability.
- Handle missing values
- Detect and treat outliers
- Feature engineering (e.g., debt-to-income ratio, loan-to-asset ratio)
- Algorithms: XGBoost, Random Forest, LightGBM
- Evaluation Metrics: Confusion Matrix, ROC-AUC, Precision-Recall Curve
- SHAP values for local and global interpretation
- Transparency in credit decision-making
- Frontend: Streamlit
- Realtime inference from user inputs
- Visual insights with Plotly
Loan-Default-Prediction-System/ ├── data/ # Raw and processed datasets ├── notebooks/ # EDA and model training notebooks ├── models/ # Trained ML models (pickle files) ├── streamlit_app.py # Streamlit dashboard script ├── requirements.txt # Python dependencies └── README.md # Project documentation
| Category | Tools/Frameworks |
|---|---|
| Programming Language | Python |
| ML Libraries | Pandas, Scikit-learn, XGBoost, LightGBM |
| Explainability | SHAP |
| Visualization | Plotly |
| Interface | Streamlit |
| Version Control | Git + GitHub |
| Optional Add-ons | Docker, MLflow, SQL/NoSQL |
This system directly addresses operational challenges faced by lenders by optimizing loan approval through data-driven credit risk scoring:
-
✅ 15–30% Reduction in Non-Performing Loans (NPLs)
Identify high-risk applicants pre-disbursement to minimize defaults. -
✅ 2–4 Hours Saved Per Day in Manual Risk Review
Automates time-consuming loan assessments. -
✅ Faster, Consistent Loan Decisions at Scale
Eliminates inconsistencies in subjective judgment. -
✅ Extendable for Future Use Cases
Enables segmentation, dynamic retraining, and performance tracking.
This system is designed for the following stakeholders:
- Credit Officers – Receive instant risk feedback for applicants.
- Loan Committees – Reference model insights during vetting.
- Risk Analysts / Data Teams – Monitor model performance, retrain periodically.
- Branch / Portfolio Managers – Track risk exposure and lending health.
- Product Teams / Developers – Integrate predictions into digital platforms.
- Input Bias – Heavily reliant on structured financial data (e.g., income, assets); underrepresents informal earners.
- No External Credit Data – CRB scores or third-party credit history not yet integrated.
- Static Model – Based on historical data; performance may degrade without retraining.
- No Fairness or Compliance Layer – Does not enforce regulatory or ethical constraints (e.g., gender/race bias).
🧠 Recommendation: For real-world use, integrate this model into a loan management system, set up performance monitoring, and retrain with updated data quarterly.
import plotly.graph_objects as go
fig = go.Figure(data=[
go.Bar(name='Before ML Model', x=['NPL Rate'], y=[0.28], marker_color='crimson'),
go.Bar(name='After ML Model', x=['NPL Rate'], y=[0.17], marker_color='seagreen')
])
fig.update_layout(
title='📉 Estimated Reduction in Non-Performing Loans (NPL)',
yaxis=dict(title='NPL Rate', tickformat=".0%"),
barmode='group'
)
fig.show()
---
📊 Risk Category Distribution
import plotly.express as px
labels = ['Low Risk', 'Medium Risk', 'High Risk']
values = [220, 130, 50]
fig = px.pie(
names=labels,
values=values,
title='📊 Distribution of Predicted Loan Risk Categories',
color_discrete_sequence=px.colors.sequential.RdBu
)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()
---
📈 ROC Curve
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
fpr, tpr, _ = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}', color='navy')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('📈 ROC Curve - Loan Default Model')
plt.legend(loc='lower right')
plt.grid()
plt.show()
---
🧠 SHAP Feature Importance
import shap
explainer = shap.Explainer(model)
shap_values = explainer(X_train)
shap.summary_plot(shap_values, X_train)
---
🚀 How to Run
# 1. Clone the repo
git clone https://github.com/evans25575/-Loan-Default-Prediction-System.git
cd Loan-Default-Prediction-System
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run the app
streamlit run streamlit_app.py
Then open http://localhost:8501 in your browser.
---
👨💻 Author
Evans Kiplangat
🌐 Portfolio Website
🐙 GitHub
---
📜 License
MIT License