Titanic Survival Prediction
This project is a Machine Learning pipeline designed to predict the survival of Titanic passengers, based on the classic dataset provided by Kaggle. It includes data analysis, feature engineering, predictive models (Decision Tree, Linear Regression), and optimization using Grid Search.
Project Structure
titanic_project/
├── titanic_cleaned.csv # Preprocessed dataset
├── titanic_features.csv # Features ready for the model
├── titanic_target.csv # Target variable (Survived)
├── features.py # Feature engineering function
├── decision_tree_model.py # Decision Tree model + evaluation
├── grid_search_dt.py # Decision Tree with Grid Search
├── linear_regression_model.py # Linear Regression and evaluation
├── outlier_removal.py # Outlier removal with IQR
└── README.md # This file
Dataset
The original dataset has been preprocessed to include:
Handling of missing values
Encoding of categorical variables (Sex, Embarked, etc.)
Normalization and scaling
Creation of new features
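The four preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `preprocess` helper, the `FamilySize` feature, the median/mode imputation, and the min-max scaling are all assumptions, and the sample rows are hand-made rather than taken from the dataset.

```python
import pandas as pd

# Tiny hand-made sample with Titanic-style columns (illustrative, not real data).
raw = pd.DataFrame({
    "Age": [22.0, None, 38.0, 35.0],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", None, "Q"],
    "SibSp": [1, 0, 1, 0],
    "Parch": [0, 0, 2, 0],
    "Fare": [7.25, 71.28, 53.10, 8.05],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper covering the four preprocessing steps."""
    out = df.copy()
    # 1. Missing values: median for Age, most frequent value for Embarked
    #    (assumed imputation strategy).
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # 2. Categorical encoding: binary map for Sex, one-hot for Embarked.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)
    # 3. Scaling: min-max scaling of continuous columns to [0, 1]
    #    (assumed; standardization would work equally well here).
    for col in ["Age", "Fare"]:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo)
    # 4. New feature: family size, counting the passenger (assumed feature).
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out


features = preprocess(raw)
print(features)
```

Tree-based models do not need scaled inputs, so the scaling step mainly matters for the Linear Regression model.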
Implemented Models
Decision Tree Classifier
Evaluation with Accuracy, Classification Report, Confusion Matrix
Tree visualization
Optimization with GridSearchCV
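A rough sketch of the Decision Tree workflow described above, using scikit-learn. The data here is synthetic (generated with `make_classification`) as a stand-in for `titanic_features.csv` and `titanic_target.csv`, and the parameter grid, split ratio, and random seeds are assumptions rather than the values used in `decision_tree_model.py` or `grid_search_dt.py`. The tree is rendered as text with `export_text` instead of a graphical plot.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic binary-classification data (stand-in, not the Titanic dataset).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline tree with default hyperparameters.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", acc)
print(classification_report(y_test, y_pred))
print(cm)

# Text rendering of the top levels of the fitted tree.
print(export_text(tree, max_depth=2))

# Grid search over an assumed, deliberately small parameter grid.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
# GridSearchCV refits the best estimator on the full training split by default,
# so it can be used directly for prediction.
best_acc = accuracy_score(y_test, search.predict(X_test))
print("Test accuracy (best estimator):", best_acc)
```

Scores printed by this sketch refer to the synthetic data only and say nothing about performance on the Titanic dataset.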
Linear Regression
Used for binary classification by thresholding the continuous prediction (a predicted value ≥ 0.5 is classified as survived)
Evaluation with R², RMSE, and Accuracy
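The thresholding approach can be sketched as below. Again the data is synthetic and the split and seeds are assumptions. In this sketch R² and RMSE are computed on the continuous output and Accuracy on the thresholded labels, which is one plausible reading of the evaluation described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic binary target (stand-in, not the Titanic dataset).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit an ordinary least-squares regression directly on the 0/1 target.
reg = LinearRegression()
reg.fit(X_train, y_train)
y_cont = reg.predict(X_test)  # continuous scores, not bounded to [0, 1]

# Regression metrics on the continuous output.
r2 = r2_score(y_test, y_cont)
rmse = np.sqrt(mean_squared_error(y_test, y_cont))

# Convert to class labels: a score >= 0.5 becomes class 1 (survived).
y_class = (y_cont >= 0.5).astype(int)
acc = accuracy_score(y_test, y_class)

print(f"R2: {r2:.3f}  RMSE: {rmse:.3f}  Accuracy: {acc:.3f}")
```

Because linear regression output is unbounded, the scores should not be read as probabilities; logistic regression is the usual alternative when calibrated probabilities are needed.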
Analysis and Visualization
Box plots for outlier detection
Outlier removal using the IQR method
Confusion matrix heatmap
Interpretive plots with Matplotlib and Seaborn
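The IQR rule mentioned above keeps only values within `[Q1 - k*IQR, Q3 + k*IQR]`. The sketch below is an illustration under assumptions: the `remove_outliers_iqr` helper, the conventional `k = 1.5`, and the sample fares are not taken from `outlier_removal.py`.

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose `column` value lies outside the IQR fences."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() is inclusive on both ends by default.
    return df[df[column].between(lower, upper)]


# Hand-made sample with one extreme value (illustrative, not real data).
fares = pd.DataFrame({"Fare": [7.25, 8.05, 13.0, 26.0, 30.0, 35.5, 512.33]})
cleaned = remove_outliers_iqr(fares, "Fare")

print(len(fares), "rows before,", len(cleaned), "rows after")
```

The same fences are what a box plot draws as whisker limits, so points plotted beyond the whiskers are the rows this filter removes.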
Setup and Requirements
Ensure you have Python 3.8+ and install the main dependencies:
pip install -r requirements.txt
Example requirements.txt content:
pandas
numpy
scikit-learn
matplotlib
seaborn
Execution
Example for training and testing the Decision Tree:
python decision_tree_model.py
Example for running the Grid Search:
python grid_search_dt.py
Results Achieved
Final Accuracy (optimized Decision Tree): ~0.82
Best parameters found via Grid Search
Identification and removal of outliers for dataset improvement
Authors: Giacomo Visciotti, Simone Verrengia