gsbdarc/HHT_titantic

🚢 Titanic Analysis (scikit-learn OpenML)

"I'm the king of the world!" — but can our model predict who survives?

Welcome aboard the RMS Data Science, where we set sail through the famous Titanic dataset! This project downloads passenger records via sklearn.datasets.fetch_openml, navigates the icy waters of exploratory data analysis (EDA), and trains a baseline model to predict who makes it to the lifeboats. 🧊

         |    |    |
        )_)  )_)  )_)
       )___))___))___)\ 
      )____)____)_____)\\
    _____|____|____|____\\\__
---------\                 /---------
  ^^^^^ ^^^^^^^^^^^^^^^^^^^^^
    ^^^^      ^^^^     ^^^    ^^
         ^^^^      ^^^
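
As a rough sketch of the download step described above: the README names `sklearn.datasets.fetch_openml` but not the exact arguments, so the dataset name `"titanic"` and `version=1` below are assumptions, and the helper name `load_titanic` is hypothetical rather than taken from `src/titanic_analysis.py`.

```python
from sklearn.datasets import fetch_openml


def load_titanic():
    """Fetch Titanic passenger records from OpenML as (features, target).

    Assumption: dataset name "titanic", version 1. The project's script
    may pass different arguments.
    """
    X, y = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)
    # The target typically arrives as a categorical of "0"/"1" strings,
    # so cast it to integers (0 = died, 1 = survived).
    return X, y.astype(int)
```

Calling `X, y = load_titanic()` needs network access the first time; scikit-learn caches the download afterwards.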

🪝 Boarding Pass (Setup)

First-class passengers only: set up your environment before the ship departs!

Prerequisites: Python 3.8+ required

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

⚓ All Aboard — Run the Analysis

Sound the foghorn and fire up the script:

python src/titanic_analysis.py

🗺️ Cargo Hold — Outputs

All treasures are stored in the outputs/ directory:

| Artifact | Description |
| --- | --- |
| missingness.csv | Which passenger records fell overboard? |
| target_balance.csv | Survivors vs. the deep blue sea |
| summary_numeric.csv | By the numbers: age, fare, and more |
| survival_distribution.png | Who lived to tell the tale? |
| age_by_survival.png | Did age determine your fate? |
| fare_by_survival.png | Did a first-class ticket save you? |
| survival_by_class.png | Class matters (unfortunately) |
| confusion_matrix.png | Where did our model go wrong? |
| metrics.json | The captain's log: model performance |
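
For a feel of what the three CSV artifacts could contain, here is a minimal pandas sketch. The function name `write_eda_tables`, the column label `missing_fraction`, and the exact table layouts are assumptions for illustration; the real script may compute or shape these files differently.

```python
from pathlib import Path

import pandas as pd


def write_eda_tables(X, y, out_dir="outputs"):
    """Write three small EDA tables as CSV files and return them in a dict."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Fraction of missing values per column, largest first.
    missingness = (
        X.isna().mean().sort_values(ascending=False).rename("missing_fraction")
    )
    # Number of rows for each target value (0 = died, 1 = survived).
    target_balance = y.value_counts().sort_index().rename("count")
    # count / mean / std / quartiles, one row per numeric column.
    summary_numeric = X.select_dtypes(include="number").describe().T

    tables = {
        "missingness": missingness,
        "target_balance": target_balance,
        "summary_numeric": summary_numeric,
    }
    for name, table in tables.items():
        table.to_csv(out / f"{name}.csv")
    return tables
```

Given a feature frame `X` and a 0/1 target `y`, `write_eda_tables(X, y)` creates the `outputs/` directory if needed and writes one file per table.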

📓 Interactive Voyage — Run the Notebook

Prefer a guided tour of the ship? Launch the interactive notebook:

jupyter notebook

Open notebooks/titanic_analysis.ipynb and run all cells — no iceberg warnings required.


🌊 Key Questions We're Exploring

  • Did passenger class (1st, 2nd, 3rd) affect survival odds?
  • Was age a factor — were children given priority?
  • Did paying a higher fare improve your chances?
  • How well can a logistic regression model predict survival?
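
The last question can be sketched as a small scikit-learn pipeline. This is one plausible baseline and not necessarily the one in `src/titanic_analysis.py`: the feature lists, the median imputation, the 25% test split, and the name `fit_baseline` are all assumptions made for illustration.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["age", "fare"]        # assumed numeric features
CATEGORICAL = ["pclass", "sex"]  # assumed categorical features (assumed complete)


def fit_baseline(X, y, seed=0):
    """Fit a logistic regression baseline; return (model, metrics dict)."""
    preprocess = ColumnTransformer([
        # Fill missing ages/fares with the median, then standardize.
        ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), NUMERIC),
        # One-hot encode class and sex; ignore categories unseen in training.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))

    # Hold out 25% of rows, keeping the survivor ratio equal in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X[NUMERIC + CATEGORICAL], y, test_size=0.25, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    metrics = {
        "accuracy": float(accuracy_score(y_test, pred)),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]).tolist(),
    }
    return model, metrics
```

The returned `metrics` dict holds only floats and nested lists, so it can be passed directly to `json.dump` to produce a file like `metrics.json`.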

🆘 SOS — Distress Signals (Troubleshooting)

If anything goes wrong:

  1. Make sure your virtual environment is activated: source .venv/bin/activate
  2. Reinstall dependencies: pip install -r requirements.txt
  3. Check your Python version: python --version (Python 3.8+ is required)
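
Checks 1 and 3 above can also be automated. The snippet below is a hypothetical helper, not part of the repository; it uses only the standard library and assumes the environment was created with `python3 -m venv`, as in the setup section.

```python
import sys

MIN_VERSION = (3, 8)  # minimum version stated in the setup section


def check_environment():
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    if sys.version_info < MIN_VERSION:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {found} found; 3.8+ is needed.")
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    if sys.prefix == sys.base_prefix:
        problems.append("No virtual environment is active.")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment looks fine."]:
        print(line)
```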

"She's made of iron, sir. I assure you, she can. And she will." 🛳️
...our model, on the other hand, might need a little tuning. 😄


🕯️ In Memoriam

Beyond the data and the models lies a profound human story.

On the night of April 14–15, 1912, the RMS Titanic struck an iceberg in the North Atlantic and sank in less than three hours. Of the estimated 2,224 passengers and crew on board, more than 1,500 lives were lost — making it one of the deadliest peacetime maritime disasters in history.

The rows in this dataset are not just numbers. Each one represents a real person — a father, a mother, a child, a dreamer — who boarded that ship with hopes and plans. We use their stories to learn about data science, and in doing so, we honor them by remembering that behind every data point is a human life.

"The ship that will never be forgotten, crewed by those who must never be forgotten."

🕯️ Rest in peace to all those who perished on April 15, 1912.
