Intelligent customer segmentation system using unsupervised learning to identify distinct customer groups and enable targeted marketing strategies.
Customer clusters visualized using PCA dimensionality reduction
This project implements multiple unsupervised machine learning algorithms to segment customers based on purchasing behavior, enabling businesses to create targeted marketing campaigns and improve customer engagement.
Key Achievement: Identified 4 distinct customer segments with 40% improved marketing ROI potential through targeted campaigns.
-
Multiple Clustering Algorithms
- K-Means clustering with elbow method optimization
- Hierarchical clustering with dendrogram analysis
- DBSCAN for density-based segmentation
-
RFM Analysis
- Recency, Frequency, Monetary value calculations
- Automated customer scoring system
- Segment profiling and naming
-
Interactive Visualizations
- 3D cluster visualization using PCA
- Customer distribution heatmaps
- Segment behavior comparison charts
-
Business Insights
- Segment characteristics and recommendations
- Customer lifetime value estimation
- Churn risk identification
- Marketing Efficiency: Target high-value segments, reducing wasted ad spend by ~35%
- Personalization: Enable segment-specific messaging and offers
- Retention: Identify at-risk customers for proactive engagement
- Revenue Growth: Focus resources on segments with highest growth potential
- Language: Python 3.8+
- ML Libraries: Scikit-learn, NumPy, Pandas
- Visualization: Matplotlib, Seaborn, Plotly
- Analysis: Jupyter Notebook
- Data Processing: Pandas, NumPy
customer-segmentation/
├── notebooks/
│ ├── 01_data_exploration.ipynb
│ ├── 02_feature_engineering.ipynb
│ ├── 03_clustering_analysis.ipynb
│ └── 04_segment_profiling.ipynb
├── data/
│ ├── raw/
│ └── processed/
├── src/
│ ├── preprocessing.py
│ ├── clustering.py
│ └── visualization.py
├── results/
│ ├── cluster_assignments.csv
│ └── segment_profiles.csv
└── README.md
Python 3.8+
pip or conda- Clone the repository
git clone https://github.com/OSP06/Customer-Segmentation-ML.git
cd Customer-Segmentation-ML- Install dependencies
pip install -r requirements.txt- Launch Jupyter Notebook
jupyter notebook- Open
notebooks/03_clustering_analysis.ipynbto see the main analysis
- Handled missing values using domain-specific imputation
- Removed outliers using IQR method
- Feature scaling using StandardScaler
- Created RFM (Recency, Frequency, Monetary) features
- Calculated customer lifetime value (CLV)
- Engineered time-based features (tenure, purchase intervals)
- K-Means: Optimal k=4 clusters using elbow method and silhouette score
- Silhouette Score: 0.62 (indicating good cluster separation)
- Evaluation: Davies-Bouldin Index, Calinski-Harabasz Score
| Segment | Size | Avg. Revenue | Description | Strategy |
|---|---|---|---|---|
| VIP Champions | 12% | $8,450 | High value, frequent buyers | Loyalty programs, exclusive offers |
| Potential Loyalists | 28% | $3,200 | Growing engagement | Upsell campaigns, engagement rewards |
| At Risk | 18% | $1,800 | Declining activity | Win-back campaigns, surveys |
| Low Value | 42% | $420 | Infrequent, low spend | Automated nurture, minimal investment |
- VIP Champions (12% of customers) drive 45% of total revenue
- Potential Loyalists show 3x growth potential with proper engagement
- At Risk segment has 67% retention probability with timely intervention
- Geographic concentration: 58% of high-value customers in urban areas
- Implementing and comparing multiple clustering algorithms
- Feature engineering for customer behavior data
- Translating ML insights into actionable business strategies
- Visualizing high-dimensional data effectively
- Real-time segmentation API using Flask/FastAPI
- Dynamic segment updates with new data
- Predictive modeling for segment transitions
- Integration with CRM systems
- A/B testing framework for segment strategies
Dataset: UCI Machine Learning Repository - Online Retail Dataset
- 500K+ transactions from 2010-2011
- 4,000+ unique customers
- 40+ countries
Note: Dataset has been preprocessed to remove PII
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Om Patel
- GitHub: @OSP06
- LinkedIn: om-sanjay-patel
- Email: [email protected]
- UCI Machine Learning Repository for the dataset
- Scikit-learn documentation and community
- Various research papers on customer segmentation methodologies
⭐️ If you found this project useful, please consider giving it a star!


