"I'm the king of the world!" — but can our model predict who survives?
Welcome aboard the RMS Data Science, where we set sail through the famous Titanic dataset! This project downloads passenger records via `sklearn.datasets.fetch_openml`, navigates the icy waters of exploratory data analysis (EDA), and trains a baseline model to predict who makes it to the lifeboats. 🧊
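A minimal sketch of the download step might look like this. It assumes the OpenML `titanic` dataset at version 1, which is the one scikit-learn's own examples use. `load_titanic` and `clean_target` are hypothetical helper names, not necessarily what `src/titanic_analysis.py` calls them.

```python
import pandas as pd
from sklearn.datasets import fetch_openml


def load_titanic() -> pd.DataFrame:
    """Download the OpenML "titanic" dataset (version 1) as one DataFrame.

    The first call needs network access; scikit-learn caches the files
    (by default under ~/scikit_learn_data) so later runs are offline.
    """
    bunch = fetch_openml(name="titanic", version=1, as_frame=True)
    # bunch.frame holds the features plus the "survived" target column.
    return clean_target(bunch.frame)


def clean_target(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose 'survived' column is plain 0/1 integers."""
    out = df.copy()
    # Depending on the scikit-learn version, the label may arrive as a
    # categorical with string levels "0"/"1"; going through str handles
    # both string and numeric levels.
    out["survived"] = out["survived"].astype(str).astype(int)
    return out


# Example usage (downloads on first run):
# df = load_titanic()
# print(df.shape, df["survived"].mean())
```

Converting the label once, right after loading, keeps later steps (rates, metrics) free of dtype surprises.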
```
               |    |    |
              )_)  )_)  )_)
             )___))___))___)\
            )____)____)_____)\\
          _____|____|____|____\\\__
 ---------\                   /---------
   ^^^^^ ^^^^^^^^^^^^^^^^^^^^^
     ^^^^      ^^^^     ^^^    ^^
          ^^^^      ^^^
```
First class passengers only — set up your environment before the ship departs!
Prerequisites: Python 3.8+ required
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Sound the foghorn and fire up the script:

```bash
python src/titanic_analysis.py
```

All treasures are stored in the `outputs/` directory:
| Artifact | Description |
|---|---|
| `missingness.csv` | Which passenger records fell overboard? |
| `target_balance.csv` | Survivors vs. the deep blue sea |
| `summary_numeric.csv` | By the numbers: age, fare, and more |
| `survival_distribution.png` | Who lived to tell the tale? |
| `age_by_survival.png` | Did age determine your fate? |
| `fare_by_survival.png` | Did a first-class ticket save you? |
| `survival_by_class.png` | Class matters (unfortunately) |
| `confusion_matrix.png` | Where did our model go wrong? |
| `metrics.json` | The captain's log: model performance |
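The three CSV artifacts could be produced with plain pandas along these lines. This is a sketch, assuming a DataFrame with a 0/1 `survived` column; `write_eda_tables` is a hypothetical name, and the exact columns inside each CSV are illustrative rather than what the script necessarily writes.

```python
from pathlib import Path

import pandas as pd


def write_eda_tables(df: pd.DataFrame, out_dir: str = "outputs") -> dict:
    """Write missingness, target-balance and numeric-summary CSVs.

    Returns the tables keyed by file name so they can be inspected too.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Missing values per column: absolute count and percentage of rows.
    missingness = pd.DataFrame({
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().mul(100).round(2),
    }).sort_values("n_missing", ascending=False)

    # Passengers who did not survive (0) vs. survived (1), with shares.
    balance = df["survived"].value_counts().sort_index().rename("count").to_frame()
    balance["share"] = balance["count"] / balance["count"].sum()

    # count/mean/std/min/quartiles/max, numeric columns only, one row each.
    summary = df.describe(include="number").T

    tables = {
        "missingness.csv": missingness,
        "target_balance.csv": balance,
        "summary_numeric.csv": summary,
    }
    for name, table in tables.items():
        table.to_csv(out / name)
    return tables
```

Transposing `describe()` puts one variable per row, which reads better in a CSV viewer than one very wide row per statistic.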
Prefer a guided tour of the ship? Launch the interactive notebook:
```bash
jupyter notebook
```

Open `notebooks/titanic_analysis.ipynb` and run all cells; no iceberg warnings required.
The voyage sets out to answer four questions:

- Did passenger class (1st, 2nd, 3rd) affect survival odds?
- Was age a factor? Were children given priority?
- Did paying a higher fare improve your chances?
- How well can a logistic regression model predict survival?
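One way to approach these questions is sketched below: group-wise survival rates for the first three, and a scikit-learn logistic regression pipeline for the fourth. Everything here is an assumption for illustration, not the project's confirmed method: the feature lists (OpenML column names), the "child = 12 or younger" cutoff, the 25% stratified test split, the imputation strategies, and the helper names `survival_rates`, `train_baseline` and `save_metrics`. The feature lists deliberately leave out `boat` and `body`, which record lifeboat and body-recovery numbers and would leak the outcome.

```python
import json
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature set (no "boat"/"body": they leak survival).
NUMERIC = ["age", "fare", "sibsp", "parch"]
CATEGORICAL = ["pclass", "sex", "embarked"]


def survival_rates(df: pd.DataFrame) -> dict:
    """Share of survivors by class, by child/adult, and by fare quartile."""
    # Hypothetical cutoff: "child" = 12 or younger; missing ages drop out.
    age_group = pd.cut(df["age"], bins=[0, 12, np.inf], labels=["child", "adult"])
    # Quartile 0 = cheapest quarter of tickets, 3 = most expensive.
    fare_quartile = pd.qcut(df["fare"], q=4, labels=False, duplicates="drop")
    return {
        "by_class": df.groupby("pclass", observed=True)["survived"].mean(),
        "by_age_group": df.groupby(age_group, observed=True)["survived"].mean(),
        "by_fare_quartile": df.groupby(fare_quartile)["survived"].mean(),
    }


def train_baseline(df: pd.DataFrame, seed: int = 42):
    """Fit a logistic regression; return (fitted pipeline, metrics dict)."""
    X, y = df[NUMERIC + CATEGORICAL], df["survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    preprocess = ColumnTransformer([
        # Numeric: fill gaps with the median, then standardize.
        ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), NUMERIC),
        # Categorical: fill gaps with the mode, then one-hot encode.
        ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                              OneHotEncoder(handle_unknown="ignore")), CATEGORICAL),
    ])
    model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])
    model.fit(X_train, y_train)

    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # P(survived = 1)
    metrics = {
        "accuracy": float(accuracy_score(y_test, pred)),
        "roc_auc": float(roc_auc_score(y_test, proba)),
        # Rows = true class (0, 1), columns = predicted class (0, 1).
        "confusion_matrix": confusion_matrix(y_test, pred).tolist(),
    }
    return model, metrics


def save_metrics(metrics: dict, path: str = "outputs/metrics.json") -> None:
    """Write the metrics as pretty-printed JSON, creating parent folders."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(metrics, indent=2))
```

Keeping imputation inside the pipeline means medians and modes are learned from the training split only. If a picture is wanted, `sklearn.metrics.ConfusionMatrixDisplay` can render the same matrix as a figure.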
If anything goes wrong:

- Make sure your virtual environment is activated:
  ```bash
  source .venv/bin/activate
  ```
- Reinstall dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Check your Python version (Python 3.8+ is required):
  ```bash
  python --version
  ```
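The same three checks can be bundled into one small script, sketched below. The module list is an assumption, since `requirements.txt` is not shown here, and `check_environment` is a hypothetical helper. Virtual-environment detection compares `sys.prefix` with `sys.base_prefix`, a common heuristic that holds for environments created with `python -m venv`.

```python
import importlib.util
import sys

# Assumed import names; adjust to match requirements.txt.
REQUIRED_MODULES = ("pandas", "sklearn", "matplotlib")
MIN_PYTHON = (3, 8)


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of problems found; an empty list means all clear."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {'.'.join(map(str, min_python))}+ required, "
            f"found {sys.version.split()[0]}"
        )
    for name in modules:
        # find_spec returns None when the module cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name} (try pip install -r requirements.txt)")
    # Heuristic: inside a venv, sys.prefix points at the venv directory.
    if sys.prefix == sys.base_prefix:
        problems.append("no virtual environment active (run: source .venv/bin/activate)")
    return problems


if __name__ == "__main__":
    for message in check_environment() or ["environment looks good"]:
        print(message)
```

Using `find_spec` rather than importing each package keeps the check fast and avoids side effects from heavy imports.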
"She's made of iron, sir. I assure you, she can. And she will." 🛳️
...our model, on the other hand, might need a little tuning. 😄
Beyond the data and the models lies a profound human story.
On the night of April 14–15, 1912, the RMS Titanic struck an iceberg in the North Atlantic and sank in less than three hours. Of the estimated 2,224 passengers and crew on board, more than 1,500 lives were lost — making it one of the deadliest peacetime maritime disasters in history.
The rows in this dataset are not just numbers. Each one represents a real person — a father, a mother, a child, a dreamer — who boarded that ship with hopes and plans. We use their stories to learn about data science, and in doing so, we honor them by remembering that behind every data point is a human life.
"The ship that will never be forgotten, crewed by those who must never be forgotten."
🕯️ Rest in peace to all those who perished on April 15, 1912.