Predicting Hotel Cancellations

Can we predict, at the moment of booking, whether a guest will cancel a reservation?

The booking process has changed dramatically over the past decade, with more guests choosing to book online rather than directly with the hotel. While the accessibility of online travel agencies may increase exposure and demand for many hotels, it has also been accompanied by an increase in cancellation rates. Cancellations are a familiar foe of the hotel industry, but it is the advent of the "risk free cancellations" campaign run by online travel agencies that has made them a damaging statistic worthy of a second look.

According to a study conducted by D-Edge Hospitality Solutions, cancellation rates in the hotel industry peaked at 41.3% in 2017, up from around 32% in 2014. It is important to note how heavily this average is skewed by online travel agencies such as Booking.com, which posted a 50% cancellation rate in 2018.^[Hertzfeld, Esther. Study: Cancellation Rate at 40% as OTAs Push Free Change Policy. Hotel Management, 23 Apr. 2019, www.hotelmanagement.net/tech/study-cancelation-rate-at-40-as-otas-push-free-change-policy.] This is in stark contrast to an average cancellation rate of 18.2% in 2018 for customers booking direct. The booking process has changed, and hotels are now forced to find ways of limiting the damage caused by cancellations. My work here aims to predict cancellations and to offer a solution based on early outreach to bookings flagged as a high cancellation risk.

The dataset was obtained from ScienceDirect and contains guest booking observations recorded from 2015 to 2017 for two hotels, both located in Portugal. The data was collected directly from the hotels’ PMS (property management system) database and was relatively clean and structured upon retrieval. Each observation represents a booking, with variables capturing details including when it was booked, the dates of stay, the operator through which the guest booked, whether the booking was cancelled, and the average daily rate.
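To make the shape of an observation concrete, here is a minimal sketch of how a cancellation rate per booking channel could be computed. The field names (`channel`, `is_cancelled`, `adr`) and the six records are hypothetical placeholders chosen for illustration; they are not the dataset's actual column names or values.

```python
from collections import defaultdict

# Hypothetical booking records: field names and values are illustrative
# assumptions, not rows or column names from the actual dataset.
bookings = [
    {"channel": "online_travel_agency", "is_cancelled": True, "adr": 95.0},
    {"channel": "online_travel_agency", "is_cancelled": False, "adr": 110.0},
    {"channel": "online_travel_agency", "is_cancelled": True, "adr": 80.0},
    {"channel": "direct", "is_cancelled": False, "adr": 120.0},
    {"channel": "direct", "is_cancelled": False, "adr": 130.0},
    {"channel": "direct", "is_cancelled": True, "adr": 100.0},
]


def cancellation_rate_by_channel(records):
    """Return the share of cancelled bookings for each channel."""
    totals = defaultdict(int)
    cancelled = defaultdict(int)
    for record in records:
        totals[record["channel"]] += 1
        cancelled[record["channel"]] += int(record["is_cancelled"])
    return {channel: cancelled[channel] / totals[channel] for channel in totals}


rates = cancellation_rate_by_channel(bookings)
for channel, rate in rates.items():
    print(f"{channel}: {rate:.1%} cancelled")
```

The same grouping is what separates the online travel agency rate from the direct booking rate in the figures quoted above.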

Overview

The project comprises all steps of Data Science work broken down as follows:

Data collection and wrangling: done in Jupyter Notebook
Exploratory Data Analysis: using Python in RStudio with the reticulate library for statistical data analysis
Machine learning: using Python - Logistic Regression and Random Forests with scikit-learn and a final CatBoost algorithm in Jupyter Notebook
Report: completed and rendered as an R Markdown document
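
As a rough illustration of the machine learning step, the sketch below fits a logistic regression and a random forest with scikit-learn and compares them by ROC AUC. It is not the code from the notebooks: the data is synthetic, the features (`lead_time`, `is_ota`, `adr`) and the rule that generates the labels are assumptions made up for this example, and the final CatBoost model is left out because it needs a separate library.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the booking data (assumed features, not the real columns).
rng = np.random.default_rng(0)
n = 2000
lead_time = rng.integers(0, 365, size=n)  # days between booking and arrival
is_ota = rng.integers(0, 2, size=n)       # 1 if booked via an online travel agency
adr = rng.normal(100, 25, size=n)         # average daily rate

# Assumed labelling rule: long lead times and OTA bookings cancel more often.
logit = -2.0 + 0.008 * lead_time + 1.0 * is_ota
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([lead_time, is_ota, adr])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # Scaling helps the logistic regression solver converge.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class (cancellation) for each test booking.
    cancel_probability = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, cancel_probability)
    print(f"{name}: ROC AUC = {scores[name]:.3f}")
```

The predicted probabilities, rather than hard class labels, are what an early outreach process would rank bookings by.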

Links

The work has been broken down in stages and summary slides have been created for a quick look at the results.

Data Wrangling and Cleaning - Cleaning steps with justification
Exploratory analysis - Found trends and initial insights into cancellations
Statistical Analysis - Identified statistically significant features
Machine Learning - The CatBoost plots render nicely here
Final Report - Very little code, reports findings of each stage
Summary Slides - Link to ppt

Author

Matthew Merrill

Acknowledgements

Dhiraj Khanna - Springboard mentor
