This project is a complete data engineering workflow that transforms the raw Titanic dataset into a clean, structured format optimized for Machine Learning models.
The goal was to move from "raw and messy" data to "AI-ready" data. This involved handling missing values, encoding categorical variables, and engineering new features to improve potential model accuracy.
- VS Code: Primary development environment.
- Python 3.10+: Core programming language.
- Pandas & NumPy: For data manipulation and numerical logic.
- Matplotlib & Seaborn: For exploratory data analysis (EDA).
- Age: Filled missing values using the median grouped by Passenger Class and Sex.
- Embarked: Filled missing values with the mode (most common port).
- Cabin: Dropped due to having over 77% missing data.
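The imputation steps above can be sketched in pandas as follows. The toy frame and its values are hypothetical stand-ins for `train.csv`; only the column names (`Pclass`, `Sex`, `Age`, `Embarked`, `Cabin`) come from the Titanic dataset:

```python
import pandas as pd

# Toy stand-in for train.csv (values are illustrative only).
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3],
    "Sex": ["female", "female", "male", "male"],
    "Age": [40.0, None, 22.0, None],
    "Embarked": ["S", "C", None, "S"],
    "Cabin": ["C85", None, None, None],
})

# Age: fill missing values with the median of the passenger's (Pclass, Sex) group.
df["Age"] = df.groupby(["Pclass", "Sex"])["Age"].transform(
    lambda s: s.fillna(s.median())
)

# Embarked: fill missing values with the most common port.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Cabin: drop the column entirely (mostly missing).
df = df.drop(columns=["Cabin"])
```

Grouping by class and sex before taking the median keeps the imputed ages plausible, since median age differs noticeably across those groups.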
I created several new features to capture hidden patterns:
- Title: Extracted from names (Mr, Mrs, Miss, etc.).
- FamilySize: Combined `SibSp` and `Parch`.
- IsAlone: A binary flag for passengers traveling without family.
- FareBin & AgeBin: Grouped continuous data into logical categories.
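A minimal sketch of these engineered features, assuming the standard Titanic column names (`Name`, `SibSp`, `Parch`, `Fare`, `Age`); the sample rows and bin edges are illustrative, not the notebook's exact choices:

```python
import pandas as pd

# Illustrative rows standing in for train.csv.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley"],
    "SibSp": [1, 0],
    "Parch": [0, 0],
    "Fare": [7.25, 71.28],
    "Age": [22.0, 38.0],
})

# Title: the token between the comma and the first period in Name.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# FamilySize: siblings/spouses + parents/children + the passenger themselves.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# IsAlone: 1 if the passenger has no family aboard.
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# FareBin / AgeBin: bucket continuous values into coarse categories.
df["FareBin"] = pd.qcut(df["Fare"], 2, labels=["low", "high"])
df["AgeBin"] = pd.cut(df["Age"], bins=[0, 18, 35, 60, 120],
                      labels=["child", "young_adult", "adult", "senior"])
```

`pd.qcut` splits fares at quantile boundaries so each bin holds a similar number of passengers, while `pd.cut` uses fixed age boundaries.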
- Converted `Sex` to binary (0/1).
- Applied One-Hot Encoding to `Embarked` and `Title` to ensure the data is 100% numerical.
- Dropped non-predictive columns: `PassengerId`, `Name`, `Ticket`, `SibSp`, and `Parch`.
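The encoding and cleanup steps above can be sketched like this; the frame below is a hypothetical snapshot of the data after feature engineering, not the notebook's actual state:

```python
import pandas as pd

# Hypothetical frame after the feature-engineering stage.
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley"],
    "Ticket": ["A/5 21171", "PC 17599"],
    "Sex": ["male", "female"],
    "Embarked": ["S", "C"],
    "Title": ["Mr", "Mrs"],
    "SibSp": [1, 1],
    "Parch": [0, 0],
})

# Sex to binary 0/1.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# One-hot encode the remaining categorical columns.
df = pd.get_dummies(df, columns=["Embarked", "Title"])

# Drop non-predictive columns (SibSp/Parch are redundant once FamilySize exists).
df = df.drop(columns=["PassengerId", "Name", "Ticket", "SibSp", "Parch"])
```

After this step every remaining column is numeric, which is the "AI-ready" requirement for most scikit-learn style models.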
- `titanic_data_preparation.ipynb`: The complete Jupyter Notebook with step-by-step logic.
- `train.csv`: The original raw dataset.
- `titanic_clean_ai_ready.csv`: The final processed output.
Completed as part of the AI Data Preparation Challenge.