LNRobertson/Customer_Segmentation

Hotel Customer Segmentation - Attrition Prediction

Insights from three years of customer booking data.

-- Project Status: [Active]

MIT License GPLv3 License AGPL License


Project Intro/Objective

The purpose of this project is to cluster customers into segments based on their booking data and their recency, frequency, and monetary (RFM) value. The goal is a deeper understanding of our top-value customers to support more targeted marketing, management action, promo triggers, etc. We can also use these segments to help predict whether a customer is at risk of attrition (a non-returning or lost customer).
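As a rough illustration of the recency, frequency, and monetary scoring described above, here is a minimal sketch using pandas. The column names `recency_days`, `frequency`, and `monetary` are hypothetical placeholders, not the dataset's actual field names. Quartile scoring is one common choice and is an assumption here, not necessarily what the project notebooks do.

```python
import pandas as pd


def rfm_scores(customers: pd.DataFrame, bins: int = 4) -> pd.DataFrame:
    """Score each customer from 1 (worst) to `bins` (best) on R, F and M.

    Assumes hypothetical columns: recency_days, frequency, monetary.
    """
    out = customers.copy()
    labels = list(range(1, bins + 1))

    # Rank first so that tied values never produce duplicate bin edges.
    recency_rank = out["recency_days"].rank(method="first")
    frequency_rank = out["frequency"].rank(method="first")
    monetary_rank = out["monetary"].rank(method="first")

    # A smaller recency (a more recent stay) is better, so reverse the labels.
    out["r_score"] = pd.qcut(recency_rank, bins, labels=labels[::-1]).astype(int)
    out["f_score"] = pd.qcut(frequency_rank, bins, labels=labels).astype(int)
    out["m_score"] = pd.qcut(monetary_rank, bins, labels=labels).astype(int)

    # A simple combined score; higher means a more valuable customer.
    out["rfm_total"] = out[["r_score", "f_score", "m_score"]].sum(axis=1)
    return out
```

The per-dimension scores can be used directly as clustering features, or the combined `rfm_total` can serve as a quick ranking of customer value.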

The primary reason for attrition prediction is to retain customers at high risk of loss: taking preventative action preserves revenue sources and helps ensure that customer acquisition spending yields a solid ROI.
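Before attrition can be predicted, it needs a working definition. The sketch below shows one possible way to derive a binary target: a customer is flagged as lost when the time since their last stay exceeds a cutoff. Both the `recency_days` column name and the 365-day default are assumptions made for illustration, not values taken from this project.

```python
import pandas as pd


def flag_attrition(customers: pd.DataFrame, max_gap_days: int = 365) -> pd.Series:
    """Return a 0/1 label per customer: 1 means no stay within `max_gap_days`.

    Assumes a hypothetical `recency_days` column (days since the last stay).
    """
    lost = customers["recency_days"] > max_gap_days
    return lost.astype(int).rename("attrited")
```

A label like this can then serve as the target for a classifier, with the customer segment included as one of the input features.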





Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Project Description

Methods Used

  • Inferential Statistics
  • Machine Learning
  • Data Visualization
  • Predictive Modeling
  • KMeans clustering
  • Classification
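To make the KMeans step concrete, here is a minimal sketch using scikit-learn. It assumes the input is a numeric array with one row per customer and one column per feature (for example recency, frequency, and monetary value). The function name `segment_customers` and the default of four segments are hypothetical. Features are standardized first because KMeans relies on Euclidean distance, so an unscaled revenue column would otherwise dominate.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def segment_customers(features: np.ndarray, n_segments: int = 4, seed: int = 0) -> np.ndarray:
    """Assign each customer (row) to one of `n_segments` clusters."""
    model = make_pipeline(
        StandardScaler(),  # put every feature on a comparable scale
        KMeans(n_clusters=n_segments, n_init=10, random_state=seed),
    )
    # fit_predict fits the scaler and KMeans, then returns one label per row.
    return model.fit_predict(features)
```

In practice the number of segments would be chosen by inspecting something like an elbow plot or silhouette scores rather than fixed in advance.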

Languages and Technologies

  • Python
  • Pandas
  • Jupyter
  • Numpy
  • Scikit-learn
  • PyCaret
  • Tableau

The Data

A real-world customer dataset with 31 variables describing 83,590 instances (customers) from a hotel in Lisbon, Portugal. Each instance includes the customer's personal, behavioral, demographic, and geographical information, covering 3 full years. The dataset is available on Kaggle.

Kaggle dataset origin, domain assumptions and data collection information:

Nuno Antonio, Ana de Almeida, Luis Nunes. A hotel's customer's personal, behavioral, demographic, and geographic dataset from Lisbon, Portugal (2015-2018). Data in Brief 33(2020)106583, 24(November), 2020. URL: https://www.sciencedirect.com/journal/data-in-brief.



Preview


Getting Started

  1. Clone this repo (for help see this tutorial).

  2. Raw Data is being kept [here](Repo folder containing raw data) within this repo.

  3. Data processing/transformation scripts are being kept [here](Repo folder containing data processing scripts/notebooks)

  4. Recreate environment and dependencies using this file

    • Using anaconda prompt . . .
  5. Follow setup [instructions](Link to file)


Featured Notebooks and Deliverables


Credits

Kaggle dataset origin, domain assumptions and data collection information:

Nuno Antonio, Ana de Almeida, Luis Nunes. A hotel's customer's personal, behavioral, demographic, and geographic dataset from Lisbon, Portugal (2015-2018). Data in Brief 33(2020)106583, 24(November), 2020. URL: https://www.sciencedirect.com/journal/data-in-brief.


Contact

  • Connect with me on LinkedIn here.
  • Personal website coming soon . . .
