This repository presents a comprehensive machine learning solution for customer churn prediction, designed to help businesses identify at-risk customers and proactively optimize retention strategies. The project employs a tuned LightGBM model within a tailored data pipeline, effectively handling class imbalance through SMOTE and engineering features to improve model performance.
This project takes an end-to-end, production-ready approach to churn modeling, covering model optimization and the deployment of an interactive machine learning application. It is intended to demonstrate how to build impactful, business-driven ML solutions.
Explore the code and methodology in the Google Colab Notebook. This notebook includes all the steps for data preprocessing, feature engineering, model training, and evaluation.

The project includes a Streamlit interface for interactive customer churn predictions.
Data Preprocessing:
- Created new features (CustomerTenureEngagement, ContentConsumptionScore) to improve model accuracy.
- Encoded SubscriptionType using ordinal encoding and applied one-hot encoding to other categorical features.
- Handled outliers and skewness with Winsorization and log transformations.
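
The encoding and outlier-handling steps can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual pipeline: the column names PaymentMethod and MonthlyCharges, the tier order for SubscriptionType, and the 5th/95th percentile limits are all assumptions. The definitions of CustomerTenureEngagement and ContentConsumptionScore are not given here, so the sketch leaves them out.

```python
import numpy as np
import pandas as pd

# Toy data; every column except SubscriptionType is a hypothetical stand-in.
df = pd.DataFrame({
    "SubscriptionType": ["Basic", "Premium", "Standard", "Basic"],
    "PaymentMethod": ["Card", "PayPal", "Card", "Bank"],
    "MonthlyCharges": [10.0, 12.0, 11.0, 500.0],  # last value is an outlier
})

# Ordinal encoding: map tiers to integers (this tier order is an assumption).
tier_order = {"Basic": 0, "Standard": 1, "Premium": 2}
df["SubscriptionType"] = df["SubscriptionType"].map(tier_order)

# One-hot encode the remaining categorical column.
df = pd.get_dummies(df, columns=["PaymentMethod"], dtype=int)

# Winsorize: clip values to the 5th and 95th percentiles (assumed limits).
low, high = df["MonthlyCharges"].quantile([0.05, 0.95])
df["MonthlyCharges"] = df["MonthlyCharges"].clip(lower=low, upper=high)

# Log transform: log1p reduces right skew and is defined at zero.
df["MonthlyCharges"] = np.log1p(df["MonthlyCharges"])
```

Clipping before the log transform keeps a single extreme value from dominating the column, while log1p compresses whatever right skew remains.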

Model Training:
- Utilized LightGBM for its efficiency with large datasets.
- Addressed class imbalance with SMOTE to improve recall for the churn class.
- Tuned hyperparameters with GridSearchCV to optimize recall and F1 score.

Deployment:
- Built a Streamlit app for churn predictions, processing user inputs through the full pipeline to deliver customer-specific churn probabilities.
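
The prediction step behind such an app might look like the sketch below. The helper name churn_probability is hypothetical, and the sketch assumes the preprocessing pipeline exposes a scikit-learn style transform method, the model exposes predict_proba, and class 1 is the churn class. It is not taken from app.py.

```python
import pandas as pd


def churn_probability(preprocessor, model, customer: dict) -> float:
    """Return the predicted churn probability for a single customer.

    `preprocessor` is assumed to have a `transform` method and `model`
    a `predict_proba` method, as scikit-learn style objects do.
    """
    row = pd.DataFrame([customer])           # one-row frame from the form inputs
    features = preprocessor.transform(row)   # same steps used at training time
    proba = model.predict_proba(features)    # shape (1, 2): [stay, churn]
    return float(proba[0][1])                # column 1 assumed to be churn
```

In the app, the two objects would be the ones loaded from preprocessing_pipeline.pkl and smote_lgbm.pkl, and the dictionary would be built from the values the user enters in the Streamlit form.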
Follow these steps to set up and run the application locally:
First, clone the repository and navigate to the project directory:

    git clone https://github.com/Parag000/Customer-Churn-Prediction.git
    cd Customer-Churn-Prediction

Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate   # For Linux/MacOS
    venv\Scripts\activate      # For Windows

Install the dependencies:

    pip install -r requirements.txt

Download the following files and place them in the project root:
- smote_lgbm.pkl: The pre-trained LightGBM model.
- preprocessing_pipeline.pkl: The custom preprocessing pipeline.

These files are provided as part of the repository assets.
Start the Streamlit application by running:

    streamlit run app.py

This will open the application in your web browser.