End-to-end telecom churn prediction system using machine learning. Includes EDA, feature engineering, RFE/PCA-based modeling, and recall-optimized evaluation. Identifies at-risk customers using behavioral trends and enables data-driven retention strategies for improved customer lifetime value.
This project builds an end-to-end machine learning system to predict customer churn in the telecom industry.
The goal is to identify high-risk customers early and enable data-driven retention strategies.
Customer churn causes significant revenue loss, so the project sets out to answer three questions:
- Who is likely to churn?
- Why are they churning?
- How can we intervene early?

Project components:
- Exploratory Data Analysis (EDA)
- 🧹 Data Cleaning & Feature Engineering
- Feature Selection (RFE, Correlation, VIF)
- ⚙️ Dimensionality Reduction (PCA)
- 🤖 Model Building:
  - Logistic Regression
  - Random Forest
  - Gradient Boosting
  - XGBoost
- 🎯 Threshold Optimization (recall-focused)
- Model Evaluation (ROC, PR Curve, Confusion Matrix)
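Recall-focused threshold optimization is listed above without detail. The sketch below shows one way it can work, assuming predicted churn probabilities `y_prob` and true labels `y_true` are already available. It scans candidate cutoffs from strict to lenient and keeps the highest one whose churn recall still meets a target, so precision stays as high as that recall floor allows. The helper names (`churn_recall`, `pick_recall_threshold`), the 0.05 grid and the target values are hypothetical, not taken from the notebook.

```python
import numpy as np


def churn_recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of actual churners (label 1) that the model flags as churners."""
    positives = y_true == 1
    if not positives.any():
        return 0.0
    return float((y_pred[positives] == 1).mean())


def pick_recall_threshold(y_true, y_prob, target_recall: float = 0.80) -> float:
    """Return the highest probability cutoff whose churn recall meets the target.

    Hypothetical helper: lowering the cutoff flags more customers as churners,
    raising recall at the cost of more false alarms (lower precision).
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Scan cutoffs 0.95, 0.90, ..., 0.05 using integers to avoid float drift
    for pct in range(95, 0, -5):
        threshold = pct / 100
        y_pred = (y_prob >= threshold).astype(int)
        if churn_recall(y_true, y_pred) >= target_recall:
            return threshold
    return 0.0  # fall back to flagging every customer


# Toy labels and scores, made up for illustration
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0])
y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.55, 0.65, 0.30, 0.05, 0.45])

print(pick_recall_threshold(y_true, y_prob, target_recall=0.75))  # 0.35
```

The cutoff is best chosen on a validation split rather than on the test set used to report final metrics. scikit-learn's `precision_recall_curve` returns the same recall-versus-threshold trade-off over every distinct score if a finer scan is needed.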

Project structure:

    ├── data/
    │   ├── train.csv
    │   └── test.csv
    ├── notebooks/
    │   └── Artificial-Intelligence-Driven-Customer-Retention-System.ipynb
    ├── README.md
    └── requirements.txt

Tech stack:
- Python
- Pandas, NumPy
- Scikit-learn
- Statsmodels
- XGBoost
- Matplotlib, Seaborn

Key predictive features:
- ARPU trends (revenue decline)
- Call usage patterns
- ⏱ Recharge gap features
- Temporal behavior changes
- ⏳ Customer tenure
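Trend and recharge-gap features like those above are typically derived from monthly columns. The pandas sketch below assumes hypothetical column names: `arpu_6`, `arpu_7`, `arpu_8` for average revenue per user in three consecutive months, plus `last_recharge_date` and `snapshot_date`. The notebook's actual schema and formulas may differ.

```python
import pandas as pd


def add_trend_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add hypothetical ARPU-trend and recharge-gap columns to a copy of df."""
    out = df.copy()
    # Baseline revenue: mean ARPU over the two earlier months
    baseline = out[["arpu_6", "arpu_7"]].mean(axis=1)
    # Absolute trend: latest month minus baseline (negative = revenue decline)
    out["arpu_trend"] = out["arpu_8"] - baseline
    # Relative trend; a zero baseline becomes NaN instead of dividing by zero
    out["arpu_pct_change"] = out["arpu_trend"] / baseline.where(baseline != 0)
    # Recharge gap: days between the last recharge and the snapshot date
    out["recharge_gap_days"] = (out["snapshot_date"] - out["last_recharge_date"]).dt.days
    return out


# Two made-up customers: one with falling revenue, one stable
customers = pd.DataFrame({
    "arpu_6": [300.0, 200.0],
    "arpu_7": [320.0, 200.0],
    "arpu_8": [150.0, 210.0],
    "last_recharge_date": pd.to_datetime(["2024-08-05", "2024-08-28"]),
    "snapshot_date": pd.to_datetime(["2024-08-31", "2024-08-31"]),
})

features = add_trend_features(customers)
print(features[["arpu_trend", "arpu_pct_change", "recharge_gap_days"]])
```

Comparing the latest month against an earlier baseline is what turns raw monthly values into a signal of change, which matches the idea that recent behavior matters more than historical levels.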

Model comparison:

| Model | Accuracy | Recall (Churn) |
|---|---|---|
| Logistic Regression | ~75% | ~82% |
| PCA + Logistic Regression | ~76% | ~82% 🔥 |
| Gradient Boosting | ~92% | ~23% |
| XGBoost | ~92% | ~35% |
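The table shows the boosted models reaching ~92% accuracy while catching only a minority of churners. A worked example with made-up confusion-matrix counts (not the project's results) shows how this happens on imbalanced data. It assumes 1,000 customers, 100 of whom churn: a model that flags very few customers is right about almost every non-churner, so it scores high accuracy while missing most churners.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all customers classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual churners that were caught."""
    return tp / (tp + fn)


# Made-up confusion matrices: 1,000 customers, 100 of whom churn
cautious = {"tp": 25, "fp": 5, "fn": 75, "tn": 895}        # flags 30 customers
recall_first = {"tp": 82, "fp": 232, "fn": 18, "tn": 668}  # flags 314 customers

for name, cm in [("cautious", cautious), ("recall-first", recall_first)]:
    print(f"{name}: accuracy={accuracy(**cm):.2f}, recall={recall(cm['tp'], cm['fn']):.2f}")
# cautious: accuracy=0.92, recall=0.25
# recall-first: accuracy=0.75, recall=0.82
```

If a missed churner costs more than an unnecessary retention offer, churn-class recall is the more business-relevant metric, even at the price of lower accuracy.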
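
The recall-focused logistic regression approach is preferred because it offers:
- High churn recall (~82%)
- Stable generalization
- Robustness to multicollinearity
- Performance aligned with the business goal of catching churners early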
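Multicollinearity matters because strongly correlated inputs inflate the variance of logistic regression coefficients. One example is the same usage metric measured in consecutive months. The VIF step in feature selection detects this, and PCA removes it by producing uncorrelated components. The NumPy sketch below computes the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on all other features plus an intercept. The function name `vif_scores` and the synthetic columns are hypothetical. statsmodels also ships `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, which expects the caller to add the constant column.

```python
import numpy as np


def vif_scores(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X (rows = customers)."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress feature j on an intercept plus all other features
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        # VIF_j = 1 / (1 - R^2_j); a perfect fit means infinite inflation
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


rng = np.random.default_rng(0)
calls_7 = rng.normal(size=500)
calls_8 = calls_7 + rng.normal(scale=0.1, size=500)  # nearly a copy of calls_7
tenure = rng.normal(size=500)                        # unrelated feature
X = np.column_stack([calls_7, calls_8, tenure])
print(np.round(vif_scores(X), 1))  # first two are large, the third is close to 1
```

A common rule of thumb treats VIF values above 5 or 10 as a sign that a feature is largely redundant with the others.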

Key insights:
- Declining revenue is the strongest churn signal
- Reduced usage indicates disengagement
- ⏱ Recharge delays are early churn indicators
- Recent behavior is more predictive than historical behavior
- Churn is a gradual behavioral process

Business recommendations:
- 🎯 Segment users by churn risk:
  - 🔴 High risk → aggressive retention
  - Medium risk → engagement campaigns
  - 🟢 Low risk → no action
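The segmentation above can be sketched as a simple mapping from predicted churn probability to a tier. The cutoffs (0.7 and 0.4) and the function name `risk_tier` are hypothetical. In practice they would come from the threshold analysis and the available retention budget.

```python
def risk_tier(prob: float, high: float = 0.7, medium: float = 0.4) -> str:
    """Map a churn probability to a hypothetical retention tier."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    if prob >= high:
        return "high"    # aggressive retention
    if prob >= medium:
        return "medium"  # engagement campaigns
    return "low"         # no action


scores = {"cust_001": 0.91, "cust_002": 0.55, "cust_003": 0.12}
tiers = {cid: risk_tier(p) for cid, p in scores.items()}
print(tiers)  # {'cust_001': 'high', 'cust_002': 'medium', 'cust_003': 'low'}
```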

Pipeline: Raw Data → Cleaning → Feature Engineering → Scaling → PCA → Model → Prediction
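The Scaling → PCA → Model stages map naturally onto a scikit-learn `Pipeline`. The sketch below runs on synthetic, imbalanced data from `make_classification`, not the telecom dataset. The 10 components, the 25% split and the 0.5 cutoff are assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for churn data: 30 features, roughly 10% positives
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = Pipeline([
    ("scale", StandardScaler()),                 # PCA is sensitive to feature scale
    ("pca", PCA(n_components=10)),               # uncorrelated components
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

churn_prob = model.predict_proba(X_test)[:, 1]  # P(churn) for each test customer
y_pred = (churn_prob >= 0.5).astype(int)        # swap in a tuned, recall-focused cutoff
print("churn recall:", round(recall_score(y_test, y_pred), 3))
```

Keeping the scaler and PCA inside the pipeline means they are fitted on the training split only, so no information from the test split leaks into preprocessing.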

To run the project locally:

    # Install dependencies
    pip install -r requirements.txt
    # Launch Jupyter, then open the notebook in notebooks/
    jupyter notebook

Future work:
- Deploy using FastAPI / Streamlit
- Real-time churn prediction system
- Advanced models (LightGBM, tuned XGBoost)
- Cost-sensitive learning
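Cost-sensitive learning is listed as future work. One common starting point is scikit-learn's `class_weight` parameter, which scales each class's contribution to the training loss. This is sketched here as an assumption, not the project's plan. The 1:5 weighting is hypothetical and would normally reflect the relative cost of a missed churner versus an unnecessary retention offer. `class_weight="balanced"` is the built-in alternative that weights classes inversely to their frequency.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for churn data (roughly 10% positives)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Unweighted baseline: every misclassified customer costs the same
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Hypothetical costs: missing a churner counts 5x more than a false alarm
weighted = LogisticRegression(max_iter=1000, class_weight={0: 1, 1: 5}).fit(X_train, y_train)

recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))
print(f"recall without weights: {recall_plain:.2f}, with 1:5 weights: {recall_weighted:.2f}")
```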

Key takeaway: churn is not a sudden event; it is a gradual disengagement process.
Feel free to fork and improve the project!
For any queries or collaboration, reach out!
⭐ If you like this project, give it a star!