💸 Loan Default Prediction System

An end-to-end machine learning project designed to help microfinance institutions, SACCOs, and digital lenders predict the likelihood of loan default based on customer financial and demographic data. Built using Python, Streamlit, and Plotly, this project showcases ML modeling, explainability, and real-time deployment.

📌 Problem Statement

Loan defaults significantly impact the financial sustainability of microfinance institutions. Traditional manual assessments are time-consuming and often subjective. This system enables automated, consistent, and explainable credit risk assessment, reducing bad loans and increasing operational efficiency.

📊 Key Features

🧾 Interactive Web App – Predict loan default risk using real-time user inputs.
🧠 Model Explainability – Uses SHAP for interpreting model decisions.
📈 Visual Analytics Dashboard – Built-in Plotly-based insights.
⚙️ Clean Modular Code – Structured for production, extension, and scalability.

🧠 Machine Learning Workflow

📍 Data Preprocessing

Handle missing values
Detect and treat outliers
Feature engineering (e.g., debt-to-income ratio, loan-to-asset ratio)

🤖 Model Training

Algorithms: XGBoost, Random Forest, LightGBM
Evaluation Metrics: Confusion Matrix, ROC-AUC, Precision-Recall Curve

📌 Model Explainability

SHAP values for local and global interpretation
Transparency in credit decision-making

🚀 Deployment

Frontend: Streamlit
Realtime inference from user inputs
Visual insights with Plotly

📂 Project Structure

Loan-Default-Prediction-System/ ├── data/ # Raw and processed datasets ├── notebooks/ # EDA and model training notebooks ├── models/ # Trained ML models (pickle files) ├── streamlit_app.py # Streamlit dashboard script ├── requirements.txt # Python dependencies └── README.md # Project documentation

🛠️ Tech Stack

Category	Tools/Frameworks
Programming Language	Python
ML Libraries	Pandas, Scikit-learn, XGBoost, LightGBM
Explainability	SHAP
Visualization	Plotly
Interface	Streamlit
Version Control	Git + GitHub
Optional Add-ons	Docker, MLflow, SQL/NoSQL

💼 Business Value & Use Case

📊 Business Impact

This system directly addresses operational challenges faced by lenders by optimizing loan approval through data-driven credit risk scoring:

✅ 15–30% Reduction in Non-Performing Loans (NPLs)
Identify high-risk applicants pre-disbursement to minimize defaults.
✅ 2–4 Hours Saved Per Day in Manual Risk Review
Automates time-consuming loan assessments.
✅ Faster, Consistent Loan Decisions at Scale
Eliminates inconsistencies in subjective judgment.
✅ Extendable for Future Use Cases
Enables segmentation, dynamic retraining, and performance tracking.

🏦 Target Users

This system is designed for the following stakeholders:

Credit Officers – Receive instant risk feedback for applicants.
Loan Committees – Reference model insights during vetting.
Risk Analysts / Data Teams – Monitor model performance, retrain periodically.
Branch / Portfolio Managers – Track risk exposure and lending health.
Product Teams / Developers – Integrate predictions into digital platforms.

⚠️ Model Assumptions & Limitations

Input Bias – Heavily reliant on structured financial data (e.g., income, assets); underrepresents informal earners.
No External Credit Data – CRB scores or third-party credit history not yet integrated.
Static Model – Based on historical data; performance may degrade without retraining.
No Fairness or Compliance Layer – Does not enforce regulatory or ethical constraints (e.g., gender/race bias).

🧠 Recommendation: For real-world use, integrate this model into a loan management system, set up performance monitoring, and retrain with updated data quarterly.

📸 Visual Insights

📉 Estimated Reduction in Non-Performing Loans (NPL)

import plotly.graph_objects as go

fig = go.Figure(data=[
    go.Bar(name='Before ML Model', x=['NPL Rate'], y=[0.28], marker_color='crimson'),
    go.Bar(name='After ML Model', x=['NPL Rate'], y=[0.17], marker_color='seagreen')
])
fig.update_layout(
    title='📉 Estimated Reduction in Non-Performing Loans (NPL)',
    yaxis=dict(title='NPL Rate', tickformat=".0%"),
    barmode='group'
)
fig.show()


---

📊 Risk Category Distribution

import plotly.express as px

labels = ['Low Risk', 'Medium Risk', 'High Risk']
values = [220, 130, 50]

fig = px.pie(
    names=labels, 
    values=values, 
    title='📊 Distribution of Predicted Loan Risk Categories',
    color_discrete_sequence=px.colors.sequential.RdBu
)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()


---

📈 ROC Curve

from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

fpr, tpr, _ = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)

plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}', color='navy')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('📈 ROC Curve - Loan Default Model')
plt.legend(loc='lower right')
plt.grid()
plt.show()


---

🧠 SHAP Feature Importance

import shap

explainer = shap.Explainer(model)
shap_values = explainer(X_train)
shap.summary_plot(shap_values, X_train)


---

🚀 How to Run

# 1. Clone the repo
git clone https://github.com/evans25575/-Loan-Default-Prediction-System.git
cd Loan-Default-Prediction-System

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run the app
streamlit run streamlit_app.py

Then open http://localhost:8501 in your browser.


---

👨‍💻 Author

Evans Kiplangat
🌐 Portfolio Website
🐙 GitHub


---

📜 License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.devcontainer		.devcontainer
models		models
LOAN DEFAULT.mp4		LOAN DEFAULT.mp4
Model		Model
README.md		README.md
loan_default_model.pkl		loan_default_model.pkl
requirements.txt		requirements.txt
streamlit_app.py		streamlit_app.py
train_model.py		train_model.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

💸 Loan Default Prediction System

📌 Problem Statement

📊 Key Features

🧠 Machine Learning Workflow

📍 Data Preprocessing

🤖 Model Training

📌 Model Explainability

🚀 Deployment

📂 Project Structure

🛠️ Tech Stack

💼 Business Value & Use Case

📊 Business Impact

🏦 Target Users

⚠️ Model Assumptions & Limitations

📸 Visual Insights

📉 Estimated Reduction in Non-Performing Loans (NPL)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

💸 Loan Default Prediction System

📌 Problem Statement

📊 Key Features

🧠 Machine Learning Workflow

📍 Data Preprocessing

🤖 Model Training

📌 Model Explainability

🚀 Deployment

📂 Project Structure

🛠️ Tech Stack

💼 Business Value & Use Case

📊 Business Impact

🏦 Target Users

⚠️ Model Assumptions & Limitations

📸 Visual Insights

📉 Estimated Reduction in Non-Performing Loans (NPL)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages