Skip to content

evans25575/-Loan-Default-Prediction-System

Repository files navigation

💸 Loan Default Prediction System

An end-to-end machine learning project designed to help microfinance institutions, SACCOs, and digital lenders predict the likelihood of loan default based on customer financial and demographic data. Built using Python, Streamlit, and Plotly, this project showcases ML modeling, explainability, and real-time deployment.


📌 Problem Statement

Loan defaults significantly impact the financial sustainability of microfinance institutions. Traditional manual assessments are time-consuming and often subjective. This system enables automated, consistent, and explainable credit risk assessment, reducing bad loans and increasing operational efficiency.


📊 Key Features

  • 🧾 Interactive Web App – Predict loan default risk using real-time user inputs.
  • 🧠 Model Explainability – Uses SHAP for interpreting model decisions.
  • 📈 Visual Analytics Dashboard – Built-in Plotly-based insights.
  • ⚙️ Clean Modular Code – Structured for production, extension, and scalability.

🧠 Machine Learning Workflow

📍 Data Preprocessing

  • Handle missing values
  • Detect and treat outliers
  • Feature engineering (e.g., debt-to-income ratio, loan-to-asset ratio)

🤖 Model Training

  • Algorithms: XGBoost, Random Forest, LightGBM
  • Evaluation Metrics: Confusion Matrix, ROC-AUC, Precision-Recall Curve

📌 Model Explainability

  • SHAP values for local and global interpretation
  • Transparency in credit decision-making

🚀 Deployment

  • Frontend: Streamlit
  • Realtime inference from user inputs
  • Visual insights with Plotly

📂 Project Structure

Loan-Default-Prediction-System/ ├── data/ # Raw and processed datasets ├── notebooks/ # EDA and model training notebooks ├── models/ # Trained ML models (pickle files) ├── streamlit_app.py # Streamlit dashboard script ├── requirements.txt # Python dependencies └── README.md # Project documentation


🛠️ Tech Stack

Category Tools/Frameworks
Programming Language Python
ML Libraries Pandas, Scikit-learn, XGBoost, LightGBM
Explainability SHAP
Visualization Plotly
Interface Streamlit
Version Control Git + GitHub
Optional Add-ons Docker, MLflow, SQL/NoSQL

💼 Business Value & Use Case

📊 Business Impact

This system directly addresses operational challenges faced by lenders by optimizing loan approval through data-driven credit risk scoring:

  • 15–30% Reduction in Non-Performing Loans (NPLs)
    Identify high-risk applicants pre-disbursement to minimize defaults.

  • 2–4 Hours Saved Per Day in Manual Risk Review
    Automates time-consuming loan assessments.

  • Faster, Consistent Loan Decisions at Scale
    Eliminates inconsistencies in subjective judgment.

  • Extendable for Future Use Cases
    Enables segmentation, dynamic retraining, and performance tracking.


🏦 Target Users

This system is designed for the following stakeholders:

  • Credit Officers – Receive instant risk feedback for applicants.
  • Loan Committees – Reference model insights during vetting.
  • Risk Analysts / Data Teams – Monitor model performance, retrain periodically.
  • Branch / Portfolio Managers – Track risk exposure and lending health.
  • Product Teams / Developers – Integrate predictions into digital platforms.

⚠️ Model Assumptions & Limitations

  • Input Bias – Heavily reliant on structured financial data (e.g., income, assets); underrepresents informal earners.
  • No External Credit Data – CRB scores or third-party credit history not yet integrated.
  • Static Model – Based on historical data; performance may degrade without retraining.
  • No Fairness or Compliance Layer – Does not enforce regulatory or ethical constraints (e.g., gender/race bias).

🧠 Recommendation: For real-world use, integrate this model into a loan management system, set up performance monitoring, and retrain with updated data quarterly.


📸 Visual Insights

📉 Estimated Reduction in Non-Performing Loans (NPL)

import plotly.graph_objects as go

fig = go.Figure(data=[
    go.Bar(name='Before ML Model', x=['NPL Rate'], y=[0.28], marker_color='crimson'),
    go.Bar(name='After ML Model', x=['NPL Rate'], y=[0.17], marker_color='seagreen')
])
fig.update_layout(
    title='📉 Estimated Reduction in Non-Performing Loans (NPL)',
    yaxis=dict(title='NPL Rate', tickformat=".0%"),
    barmode='group'
)
fig.show()


---

📊 Risk Category Distribution

import plotly.express as px

labels = ['Low Risk', 'Medium Risk', 'High Risk']
values = [220, 130, 50]

fig = px.pie(
    names=labels, 
    values=values, 
    title='📊 Distribution of Predicted Loan Risk Categories',
    color_discrete_sequence=px.colors.sequential.RdBu
)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()


---

📈 ROC Curve

from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

fpr, tpr, _ = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)

plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}', color='navy')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('📈 ROC Curve - Loan Default Model')
plt.legend(loc='lower right')
plt.grid()
plt.show()


---

🧠 SHAP Feature Importance

import shap

explainer = shap.Explainer(model)
shap_values = explainer(X_train)
shap.summary_plot(shap_values, X_train)


---

🚀 How to Run

# 1. Clone the repo
git clone https://github.com/evans25575/-Loan-Default-Prediction-System.git
cd Loan-Default-Prediction-System

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run the app
streamlit run streamlit_app.py

Then open http://localhost:8501 in your browser.


---

👨‍💻 Author

Evans Kiplangat
🌐 Portfolio Website
🐙 GitHub


---

📜 License

MIT License

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages