You will need the standard data science libraries that ship with the Anaconda distribution of Python. The code should run without issues on any Python 3.x version.
This project analyses the Kaggle Titanic dataset, which lists passengers together with whether or not they survived the Titanic disaster. The analysis examines which factors were associated with a passenger surviving.
The vessel left on its maiden voyage from Southampton (S) to New York on April 10, 1912, calling at Cherbourg-Octeville (C) in France and Queenstown (Q) in Ireland en route. It struck an iceberg and sank on April 15 with 2,224 people on board, making it one of the worst maritime disasters in history.
The sinking of the Titanic can be attributed to several causes, both natural and human. The high number of deaths is largely attributed to the shortage of lifeboats, which did not have enough capacity for everyone on board.
Some groups of people, such as women, children and the upper class, were also more likely to survive than others. To draw conclusions on a sound basis rather than on assumption, we analyze the data in our CSV file.
Specifically, the analysis addresses the following questions:
- What is the mean age of passengers on board?
- How are the passengers on the ship distributed by class?
- What is the split by sex among the passengers who survived?
- Which passenger classes had the highest survival?
- What is the mean age of the passengers who survived?
- What were the factors that made people survive?
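
The questions above boil down to a few group-wise summaries. The following is a minimal sketch of how they could be computed, not the code from the notebook. It assumes the CSV uses the usual Kaggle column names (`Survived`, `Pclass`, `Sex`, `Age`), and it runs on a tiny made-up sample so that it works without the data file. The helper names `load_rows`, `mean_age` and `survival_rate_by` are hypothetical and chosen for illustration only.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Tiny made-up sample in the ASSUMED Kaggle column layout (not real passenger data).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,0,3,male,
5,1,2,female,14
6,0,1,male,54
"""


def load_rows(handle):
    """Read passenger rows from an open CSV file handle as a list of dicts."""
    return list(csv.DictReader(handle))


def mean_age(rows):
    """Mean age over rows that have an age recorded (blank ages are skipped)."""
    ages = [float(r["Age"]) for r in rows if r["Age"]]
    return mean(ages) if ages else None


def survival_rate_by(rows, column):
    """Fraction of passengers who survived, grouped by the given column."""
    totals = Counter(r[column] for r in rows)
    survived = Counter(r[column] for r in rows if r["Survived"] == "1")
    return {key: survived[key] / totals[key] for key in totals}


rows = load_rows(io.StringIO(SAMPLE_CSV))
survivors = [r for r in rows if r["Survived"] == "1"]

overall_mean_age = mean_age(rows)                # mean age of passengers on board
class_counts = Counter(r["Pclass"] for r in rows)  # passengers per class
survivor_mean_age = mean_age(survivors)          # mean age of survivors
rate_by_sex = survival_rate_by(rows, "Sex")      # survival rate per sex
rate_by_class = survival_rate_by(rows, "Pclass")  # survival rate per class

print("Mean age on board:", overall_mean_age)
print("Passengers per class:", dict(class_counts))
print("Mean age of survivors:", survivor_mean_age)
print("Survival rate by sex:", rate_by_sex)
print("Survival rate by class:", rate_by_class)
```

To try it on the real data, replace the sample with `open("titanic-data-6.csv", newline="")` and pass that handle to `load_rows`, provided the column names match the assumption above. The sketch uses only the standard library so that it runs anywhere; the same summaries can be written more compactly with pandas.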
The following are the files available in this repository:
- Titanic_Dataset.ipynb - a notebook of the analysis performed following the CRISP-DM process.
- titanic-data-6.csv - contains the data analysed by the .ipynb file. To use it properly, put it in the same directory as the .ipynb file.
The results are saved in the Titanic_Dataset.ipynb file in the repository.
This study uses passenger data from the voyage of the RMS Titanic (1912). Data can be obtained from Kaggle.
You can find the licensing terms for the data and other descriptive information on the Kaggle website.