HADIL19/AI-Data-Preparation-4-ML

🚢 Titanic Data Preparation Challenge

This project is a complete data engineering workflow that transforms the raw Titanic dataset into a clean, structured format optimized for Machine Learning models.

🎯 Project Objective

The goal was to move from "raw and messy" data to "AI-ready" data. This involves handling missing values, encoding categorical variables, and engineering new features to improve potential model accuracy.

🛠️ Tech Stack

  • VS Code: Primary development environment.
  • Python 3.10+: Core programming language.
  • Pandas & NumPy: For data manipulation and numerical logic.
  • Matplotlib & Seaborn: For exploratory data analysis (EDA).

🚀 Key Steps Applied

1. Data Cleaning & Imputation

  • Age: Filled missing values with the median Age of each Passenger Class (Pclass) and Sex group.
  • Embarked: Filled missing values with the mode (most common port).
  • Cabin: Dropped due to having over 77% missing data.
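The three imputation rules above can be sketched in Pandas roughly as follows. This is a minimal illustration, not the notebook's exact code: the six rows are invented toy data standing in for train.csv, and only the column names come from the Titanic dataset.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration; the real workflow reads train.csv.
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 3, 3, 3],
    "Sex":      ["female", "female", "female", "male", "male", "male"],
    "Age":      [30.0, np.nan, 40.0, 20.0, 24.0, np.nan],
    "Embarked": ["S", "C", None, "S", "S", "Q"],
    "Cabin":    ["C85", None, None, None, None, None],
})

# Age: fill each gap with the median Age of the row's (Pclass, Sex) group.
group_median = df.groupby(["Pclass", "Sex"])["Age"].transform("median")
df["Age"] = df["Age"].fillna(group_median)

# Embarked: fill gaps with the most frequent port (the mode).
most_common_port = df["Embarked"].mode()[0]
df["Embarked"] = df["Embarked"].fillna(most_common_port)

# Cabin: too sparse to impute reliably, so drop the column.
df = df.drop(columns=["Cabin"])
```

Using transform("median") keeps the result aligned with the original rows, so it can be passed straight to fillna without a merge.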

2. Feature Engineering

I created several new features to capture hidden patterns:

  • Title: Extracted from names (Mr, Mrs, Miss, etc.).
  • FamilySize: Combined SibSp (siblings/spouses aboard) and Parch (parents/children aboard) into a single family-size count.
  • IsAlone: A binary flag for passengers traveling without family.
  • FareBin & AgeBin: Grouped continuous data into logical categories.
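A sketch of how these features might be derived is shown below. Several details are assumptions and not taken from the notebook: the regular expression used for Title, the "+ 1" in FamilySize (counting the passenger themselves), the age bin edges and labels, and the choice of quartiles for FareBin. The four rows are sample data in the shape of train.csv.

```python
import pandas as pd

# Sample rows for illustration only; the real workflow uses train.csv.
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
        "Palsson, Master. Gosta Leonard",
    ],
    "SibSp": [1, 1, 0, 3],
    "Parch": [0, 0, 0, 1],
    "Age":   [22.0, 38.0, 26.0, 2.0],
    "Fare":  [7.25, 71.2833, 7.925, 21.075],
})

# Title: the text between the comma and the first period in Name.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# FamilySize: relatives aboard plus the passenger (the "+ 1" is an assumption).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# IsAlone: 1 when the passenger has no relatives aboard, else 0.
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# AgeBin: fixed-width style bins; these edges and labels are hypothetical.
age_edges = [0, 12, 18, 35, 60, 120]
age_labels = ["Child", "Teen", "YoungAdult", "Adult", "Senior"]
df["AgeBin"] = pd.cut(df["Age"], bins=age_edges, labels=age_labels)

# FareBin: quartile-based bins; the labels are hypothetical.
fare_labels = ["Low", "Mid", "High", "VeryHigh"]
df["FareBin"] = pd.qcut(df["Fare"], q=4, labels=fare_labels)
```

Quantile bins (pd.qcut) suit Fare because its distribution is heavily skewed, while fixed edges (pd.cut) keep the age groups easy to interpret.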

3. Categorical Encoding

  • Converted Sex to binary (0/1).
  • Applied One-Hot Encoding to Embarked and Title to ensure the data is 100% numerical.
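The encoding step could look something like the sketch below. Which sex maps to 0 and which to 1 is an assumption here, and the three rows are invented toy data.

```python
import pandas as pd

# Toy rows invented for illustration.
df = pd.DataFrame({
    "Sex":      ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
    "Title":    ["Mr", "Mrs", "Miss"],
})

# Sex: binary encoding (the male=0 / female=1 direction is an assumption).
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Embarked and Title: one indicator column per category, stored as integers.
df = pd.get_dummies(df, columns=["Embarked", "Title"], dtype=int)
```

After this step the frame holds Sex plus indicator columns such as Embarked_C and Title_Mr, all of them numeric.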

4. Feature Selection

  • Dropped identifier-like and redundant columns: PassengerId, Name, and Ticket (identifiers or free text), plus SibSp and Parch (already summarized by FamilySize).
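As a rough sketch, the selection step reduces to a single drop call followed by a sanity check. The two rows below are invented toy data, and the final checks are a suggested safeguard, not something the notebook is known to include.

```python
import pandas as pd

# Toy rows invented for illustration, shaped like the data after encoding.
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived":    [0, 1],
    "Name":        ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "Ticket":      ["A/5 21171", "STON/O2. 3101282"],
    "SibSp":       [1, 0],
    "Parch":       [0, 0],
    "FamilySize":  [2, 1],
    "Sex":         [0, 1],
})

# Columns listed in the step above.
columns_to_drop = ["PassengerId", "Name", "Ticket", "SibSp", "Parch"]
df = df.drop(columns=columns_to_drop)

# Sanity checks before saving: no gaps left and every column numeric.
has_missing = bool(df.isna().any().any())
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)
```

If both checks pass, the frame can be written out with df.to_csv("titanic_clean_ai_ready.csv", index=False).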

📂 Project Structure

  • titanic_data_preparation.ipynb: The complete Jupyter Notebook with step-by-step logic.
  • train.csv: The original raw dataset.
  • titanic_clean_ai_ready.csv: The final processed output.

Completed as part of the AI Data Preparation Challenge.
