romulloferreira/Titanic_Dataset

Table of Contents

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Licensing, Authors, and Acknowledgements

Installation

You will need the standard data science libraries found in the Anaconda distribution of Python. The code should run without issues on any Python 3.x version.

Project Motivation

This project analyses the Kaggle Titanic dataset, which lists passengers and records whether each one survived the Titanic disaster. The analysis examines which passenger characteristics were associated with surviving or not surviving.

The vessel left on its maiden voyage from Southampton (S) to New York on April 10, 1912, calling at Cherbourg-Octeville (C) in France and Queenstown (Q) in Ireland on the way. It struck an iceberg late on April 14 and sank in the early hours of April 15 with an estimated 2,224 people on board, making it one of the deadliest maritime disasters in history.

The sinking of the Titanic can be attributed to several causes, both natural and human. The high number of deaths is largely attributed to the shortage of lifeboats, which did not have enough capacity for everyone on board.

Some groups, such as women, children and upper-class passengers, were also more likely to survive than others. To ground these conclusions in evidence, we analyze the data in our CSV file.

Specifically, I looked at the following questions:

  1. What is the mean age of passengers on board?
  2. How are the passengers distributed across classes?
  3. How does survival differ between male and female passengers?
  4. Which passenger classes had the highest survival rates?
  5. What is the mean age of the passengers who survived?
  6. Which factors were associated with survival?
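As a rough illustration of how the first five questions could be approached with pandas, here is a minimal sketch. It is not the notebook's actual code: the column names (`Survived`, `Pclass`, `Sex`, `Age`) are assumed to follow the Kaggle Titanic schema, and the six rows in `SAMPLE_CSV` are made up so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample rows (not real passenger records). Column names are
# assumed to match the Kaggle Titanic schema; with the real file you would
# use pd.read_csv("titanic-data-6.csv") instead.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,1,1,female,35
5,0,3,male,
6,0,1,male,54
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# 1. Mean age of all passengers (missing ages are skipped by default).
mean_age = df["Age"].mean()

# 2. Number of passengers in each class.
class_counts = df["Pclass"].value_counts().sort_index()

# 3. Survival rate by sex: the mean of a 0/1 column is a proportion.
survival_by_sex = df.groupby("Sex")["Survived"].mean()

# 4. Survival rate by passenger class.
survival_by_class = df.groupby("Pclass")["Survived"].mean()

# 5. Mean age of the passengers who survived.
mean_age_survivors = df.loc[df["Survived"] == 1, "Age"].mean()

print(mean_age, mean_age_survivors)
print(class_counts)
print(survival_by_sex)
print(survival_by_class)
```

Because `Survived` is coded as 0 or 1, grouping and taking the mean gives a survival rate per group, which is usually easier to compare across groups of different sizes than raw counts.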

File Descriptions

The following are the files available in this repository:

  • Titanic_Dataset.ipynb - a notebook of the analysis performed following the CRISP-DM process

  • titanic-data-6.csv - contains the data analysed by the .ipynb file. Place it in the same directory as the .ipynb file so the notebook can load it.
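The requirement that the CSV sit next to the notebook can be checked before loading. The helper below is a hypothetical sketch, not part of the repository: `load_titanic` and its `base_dir` parameter are names introduced here for illustration, and it assumes the file is a plain comma-separated file readable by `pandas.read_csv`.

```python
from pathlib import Path

import pandas as pd


def load_titanic(csv_name="titanic-data-6.csv", base_dir="."):
    """Load the dataset from base_dir (hypothetical helper, not in the repo).

    Raises FileNotFoundError with a hint if the CSV is not where the
    notebook expects it.
    """
    path = Path(base_dir) / csv_name
    if not path.exists():
        raise FileNotFoundError(
            f"{csv_name} not found in {Path(base_dir).resolve()}; "
            "place it in the same directory as the notebook."
        )
    return pd.read_csv(path)


# Typical use from the notebook's own directory:
# df = load_titanic()
# df.head()
```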

Results

The results are saved in the Titanic_Dataset.ipynb file in the repository.

Licensing, Authors, and Acknowledgements

This study uses passenger data from the voyage of the RMS Titanic (1912). Data can be obtained from Kaggle.

You can find the licensing for the data and other descriptive information on the Kaggle website.
