JacobHess03/ML-Titanic

Titanic Survival Prediction

This project is a machine learning pipeline designed to predict the survival of Titanic passengers, based on the classic dataset provided by Kaggle. It includes data analysis, feature engineering, predictive models (Decision Tree, Linear Regression), and hyperparameter optimization using Grid Search.

Project Structure

titanic_project/
├── titanic_cleaned.csv         # Preprocessed dataset
├── titanic_features.csv        # Features ready for the model
├── titanic_target.csv          # Target variable (Survived)
├── features.py                 # Feature engineering function
├── decision_tree_model.py      # Decision Tree model + evaluation
├── grid_search_dt.py           # Decision Tree with Grid Search
├── linear_regression_model.py  # Linear Regression and evaluation
├── outlier_removal.py          # Outlier removal with IQR
├── README.md                   # This file

Dataset

The original dataset has been preprocessed to include:

Handling of missing values
Encoding of categorical variables (Sex, Embarked, etc.)
Normalization and scaling
Creation of new features
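
The four preprocessing steps above could look roughly like the sketch below. This is not the project's actual code. The column names follow the standard Kaggle Titanic schema, while the `preprocess` helper, the `FamilySize` feature, the median/mode imputation, and the choice of `StandardScaler` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical preprocessing helper; the project's real steps may differ."""
    out = df.copy()

    # 1. Missing values: median for Age, most frequent value for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])

    # 2. New feature (assumed): family size = siblings/spouses + parents/children + self.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1

    # 3. Categorical encoding: binary map for Sex, one-hot columns for Embarked.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)

    # 4. Scaling: zero mean and unit variance for the continuous columns.
    num_cols = ["Age", "Fare"]
    out[num_cols] = StandardScaler().fit_transform(out[num_cols])
    return out


# Tiny hand-made sample, not real Titanic rows.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 2],
    "Fare": [7.25, 71.28, 7.93, 8.05],
    "Embarked": ["S", "C", None, "S"],
})
processed = preprocess(sample)
print(processed)
```

In a train/test workflow the scaler would normally be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.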

Implemented Models

Decision Tree Classifier

Evaluation with Accuracy, Classification Report, Confusion Matrix
Tree visualization
Optimization with GridSearchCV
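
A minimal sketch of the Decision Tree workflow listed above (training, Grid Search, and evaluation) is shown below. It uses synthetic data from `make_classification` so that it runs on its own; in the project, `X` and `y` would presumably come from `titanic_features.csv` and `titanic_target.csv`. The parameter grid, split ratio, and random seeds are assumptions, not the values used in `grid_search_dt.py`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Titanic features and target (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical search space; the project's actual grid is not documented here.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 10],
    "criterion": ["gini", "entropy"],
}

# 5-fold cross-validated search over every combination in the grid.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# Evaluate the refitted best estimator on the held-out test split.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("best parameters:", search.best_params_)
print(f"test accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
print(cm)
```

The fitted tree is available as `search.best_estimator_` and can be drawn with `sklearn.tree.plot_tree`; the confusion matrix `cm` can be passed to `seaborn.heatmap` for the heatmap mentioned under Analysis and Visualization.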

Linear Regression

Used for binary classification by thresholding the continuous prediction (≥ 0.5 is treated as survived)
Evaluation with R², RMSE, and Accuracy
Conversion from continuous regression to classification
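
The regression-to-classification conversion can be sketched as follows, again on synthetic data rather than the project's CSV files. The variable names and the split are assumptions. Only the 0.5 threshold and the three metrics come from the description above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic features and 0/1 target (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit ordinary least squares directly on the 0/1 labels.
reg = LinearRegression().fit(X_train, y_train)
y_score = reg.predict(X_test)  # continuous output, not bounded to [0, 1]

# Regression metrics are computed on the continuous output.
r2 = r2_score(y_test, y_score)
rmse = np.sqrt(mean_squared_error(y_test, y_score))

# Classification: a score of 0.5 or more is mapped to class 1 (survived).
y_pred = (y_score >= 0.5).astype(int)
accuracy = accuracy_score(y_test, y_pred)

print(f"R2: {r2:.3f}  RMSE: {rmse:.3f}  accuracy: {accuracy:.3f}")
```

Because the raw predictions can fall below 0 or above 1, they should be read as scores rather than probabilities; only the thresholded labels are compared against the true classes for accuracy.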

Analysis and Visualization

Box plots for outlier detection
Outlier removal using the IQR method
Confusion matrix heatmap
Interpretive plots with Matplotlib and Seaborn
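
For reference, the IQR method keeps a value only if it lies within [Q1 − k·IQR, Q3 + k·IQR], where IQR = Q3 − Q1 and k is conventionally 1.5. The sketch below illustrates this. The `remove_outliers_iqr` function and the sample fares are hypothetical and are not taken from `outlier_removal.py`.

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose values lie inside the IQR fences for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # between() is inclusive on both ends; missing values evaluate to False.
        mask &= df[col].between(lower, upper)
    return df[mask]


# Hand-made fares with one extreme value, for illustration only.
fares = pd.DataFrame({"Fare": [7.25, 7.93, 8.05, 13.0, 26.0, 30.0, 512.33]})
cleaned = remove_outliers_iqr(fares, ["Fare"])
print(cleaned)
```

Note that rows with missing values in a checked column are dropped by this version, so imputation should happen before outlier removal if those rows need to be kept.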

Setup and Requirements

Ensure you have Python 3.8+ and install the main dependencies:

pip install -r requirements.txt

Example requirements.txt content:

pandas
numpy
scikit-learn
matplotlib
seaborn

Execution

Example for training and testing the Decision Tree:

python decision_tree_model.py

Example for running the Grid Search:

python grid_search_dt.py

Results Achieved

Final Accuracy (optimized Decision Tree): ~0.82
Best parameters found via Grid Search
Identification and removal of outliers for dataset improvement

Authors: Giacomo Visciotti, Simone Verrengia
