This project clusters and then classifies shop customer data to gain insight into customer behavior and inform business strategy. It consists of two main components:
- Customer Clustering: Grouping customers into clusters based on their characteristics.
- Customer Classification: Building classification models to predict customer clusters based on specific features.
The customer data is sourced from a Kaggle dataset.
The first notebook focuses on clustering customers based on their features. Its key steps are:
- Introduction to Dataset:
  - Overview of the dataset with details on rows, columns, and feature descriptions.
- Importing Libraries:
  - Essential libraries such as pandas, numpy, matplotlib, seaborn, and sklearn are imported.
- Loading the Dataset:
  - Data is loaded into a pandas DataFrame, and the first few rows are displayed.
- Exploratory Data Analysis (EDA):
  - Dataset structure and descriptive statistics are examined.
  - Missing values are identified, and data visualizations are used to understand variable distributions.
- Data Preprocessing:
  - Handling missing values by removing rows with null values.
  - Removing duplicate entries.
  - Detecting and handling outliers using the IQR method.
  - Encoding categorical features with LabelEncoder.
- Clustering Model Development:
  - Implementing K-Means clustering with an initial number of clusters (K=3).
  - Evaluating the model using Silhouette Score.
  - Optimizing the number of clusters using the Elbow method and Silhouette Score.
  - Retraining the model with the optimal number of clusters.
  - Performing feature selection to identify influential features.
  - Training the K-Means model with selected features and comparing the results.
- Clustering Results Visualization:
  - Visualizing clustering outcomes using PCA for dimensionality reduction.
- Cluster Analysis and Interpretation:
  - Examining the characteristics of each cluster based on available features.
  - Displaying value distributions within each cluster.
- Exporting Results:
  - Saving the clustering results to a CSV file.
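The data preprocessing steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper, the column names (`Gender`, `Age`, `Income`), and the toy rows are all hypothetical, and it assumes that IQR outliers are dropped rather than capped.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop nulls and duplicates, filter IQR outliers, label-encode categoricals."""
    df = df.dropna().drop_duplicates()

    # Keep rows whose numeric values all lie within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    numeric_cols = df.select_dtypes(include="number").columns
    q1 = df[numeric_cols].quantile(0.25)
    q3 = df[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    above = df[numeric_cols].ge(lower, axis="columns")
    below = df[numeric_cols].le(upper, axis="columns")
    df = df[(above & below).all(axis=1)].copy()

    # Replace each non-numeric column with integer codes (sorted label order).
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = LabelEncoder().fit_transform(df[col])
    return df.reset_index(drop=True)


# Hypothetical toy data: one null, one exact duplicate, one extreme income.
raw = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male", "Female", None],
    "Age": [21, 35, 35, 42, 29, 51, 38, 40],
    "Income": [30_000, 52_000, 52_000, 61_000, 45_000, 58_000, 900_000, 50_000],
})
clean = preprocess(raw)
```

Note that `LabelEncoder` assigns arbitrary integer codes, which K-Means will treat as distances; for categorical features with more than two values, one-hot encoding may be a safer choice.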
The clustering notebook provides an analysis of shop customers that lets the business group them into distinct segments for targeted strategies.
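As a rough illustration of the model-development and visualization steps in the clustering notebook, the sketch below scans a range of K values, records inertia (for the Elbow method) and Silhouette Score, retrains with the best-scoring K, and projects the data to two dimensions with PCA. The data comes from `make_blobs` as a synthetic stand-in for the preprocessed customer features, and the K range and `random_state` values are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the preprocessed customer features (an assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=5,
                  cluster_std=0.8, random_state=42)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_                         # plotted for the Elbow method
    silhouettes[k] = silhouette_score(X, model.labels_)  # higher is better

# Pick the K with the highest Silhouette Score and retrain.
best_k = max(silhouettes, key=silhouettes.get)
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(X)
labels = final_model.labels_

# Reduce to two principal components for a 2-D scatter plot of the clusters.
X_2d = PCA(n_components=2).fit_transform(X)
```

Plotting `inertias` against K gives the elbow curve, and `X_2d` colored by `labels` gives the cluster scatter plot.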
The second notebook focuses on building machine learning models to classify customers into their respective clusters. Its key steps are:
- Importing Libraries:
  - Libraries such as pandas, scikit-learn, seaborn, and matplotlib are imported.
- Loading Clustered Dataset:
  - The dataset from the clustering notebook is loaded into a DataFrame for further analysis.
- Data Splitting:
  - The dataset is split into training (70%) and testing (30%) sets.
- Classification Model Development:
  - Building models using the following algorithms:
    - Logistic Regression
    - Decision Tree
    - Random Forest
    - K-Nearest Neighbors (K-NN)
- Model Evaluation:
  - Evaluating models on the testing set using metrics such as:
    - Accuracy
    - Precision
    - Recall
    - F1-Score
    - Confusion Matrix
  - Summary of results:
    - Decision Tree, Random Forest, and K-NN achieved perfect scores for all metrics (Accuracy, Precision, Recall, and F1-Score = 1.0).
    - Logistic Regression achieved an accuracy of 0.9119.
- Confusion Matrix Visualization:
  - Confusion matrices for each model are visualized using seaborn.
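The splitting, training, and evaluation steps above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the notebook's code: the features are synthetic (`make_blobs`), the target is produced by K-Means to mimic cluster labels, weighted averaging is assumed for the multi-class metrics, and all hyperparameters and `random_state` values are placeholders.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic features; the target is a K-Means cluster label (an assumption).
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# 70% training / 30% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-NN": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true clusters, columns are predicted clusters.
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1, 2]),
    }
```

Each stored confusion matrix can then be passed to `seaborn.heatmap` (for example with `annot=True`) to produce the per-model visualizations.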
The classification notebook demonstrates a step-by-step process for data analysis, model training, and performance evaluation for classifying shop customers.
By combining clustering and classification approaches, this project provides valuable insights into customer segmentation and predictive analytics. These insights can be leveraged by businesses to create tailored marketing strategies, improve customer satisfaction, and optimize resource allocation.