View the Oink Oink Project on Devpost
This repository contains our final demand forecasting pipeline for Mango, developed for the Datathon FME 2025.
Goal: predict the optimal production quantity of garments for the next season.
The final implementation is in model.py (cleaned and modularized), with EDA.py and inference.py helpers. The training pipeline produces an ensemble that was used in the original experiments.
```
.
├── data/
│   ├── train.csv     # Historical training data (semicolon-separated)
│   └── test.csv      # Test data for prediction
├── EDA.py            # Exploratory Data Analysis script (plots & visualization)
├── model.py          # Main training pipeline (single-run script)
├── inference.py      # Simple CLI to predict using saved models from a JSON input
└── outputs/          # Output models, artifacts and submissions
```
- **Library imports**
  - pandas, numpy, sklearn, catboost, etc.
- **Global configuration**
  - Paths, PCA parameters, cross-validation settings, ensemble weights
- **Feature engineering**
  - Data cleaning and aggregation
  - Parsing and PCA of image embeddings
  - Aggregated features by family, category, and attributes
  - Logarithmic normalization of numerical features
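The embedding-PCA and log-normalization steps above can be sketched as follows. The toy DataFrame and column names (`embedding`, `num_sales`) are made up for illustration and do not reflect the real dataset schema.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Toy data standing in for the real semicolon-separated CSVs.
df = pd.DataFrame({
    "embedding": ["0.1,0.2,0.3,0.4", "0.5,0.1,0.0,0.9", "0.2,0.2,0.2,0.2"],
    "num_sales": [10, 1000, 50],
})

# Parse stringified image embeddings into a numeric matrix.
emb = np.array([list(map(float, s.split(","))) for s in df["embedding"]])

# Reduce embedding dimensionality with PCA (n_components is illustrative).
pca = PCA(n_components=2)
emb_pca = pca.fit_transform(emb)
for i in range(emb_pca.shape[1]):
    df[f"emb_pca_{i}"] = emb_pca[:, i]

# Log-normalize a skewed numeric feature.
df["num_sales_log"] = np.log1p(df["num_sales"])
```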
- **Training of finalist models**
  - Model A: alpha=0.78, learning_rate=0.01 (more stable)
  - Model B: alpha=0.75, learning_rate=0.03 (more aggressive)
  - CatBoost with K-Fold CV to select the optimal number of iterations
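The idea of using K-Fold CV to pick the number of boosting iterations can be sketched like this. To keep the example self-contained, scikit-learn's `GradientBoostingRegressor` with a quantile loss stands in for CatBoost (which the project actually uses); the data, alpha, and iteration budget are all illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

# Synthetic regression data standing in for the engineered features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 3 + rng.normal(size=200)

ALPHA, MAX_ITERS = 0.78, 100          # quantile level and iteration budget

def pinball(y_true, y_pred, alpha):
    """Quantile (pinball) loss, the objective a quantile model optimizes."""
    d = y_true - y_pred
    return np.mean(np.maximum(alpha * d, (alpha - 1) * d))

# Accumulate validation loss per iteration count across folds.
fold_losses = np.zeros(MAX_ITERS)
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for tr, va in kf.split(X):
    model = GradientBoostingRegressor(
        loss="quantile", alpha=ALPHA,
        n_estimators=MAX_ITERS, learning_rate=0.03,
    )
    model.fit(X[tr], y[tr])
    # staged_predict yields predictions after each boosting iteration.
    for i, pred in enumerate(model.staged_predict(X[va])):
        fold_losses[i] += pinball(y[va], pred, ALPHA)

best_iters = int(np.argmin(fold_losses)) + 1   # iteration count to retrain with
```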
- **Weighted ensemble**
  - 60% Model A + 40% Model B
  - Inverse log1p transformation to obtain real-scale predictions
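The 60/40 blend and the inverse log1p step amount to a few lines of numpy; the prediction arrays below are made up for illustration.

```python
import numpy as np

# Hypothetical model outputs on the log1p scale.
pred_a_log = np.array([2.0, 3.5, 1.2])   # Model A (alpha=0.78, more stable)
pred_b_log = np.array([2.2, 3.1, 1.0])   # Model B (alpha=0.75, more aggressive)

# Weighted ensemble: 60% Model A + 40% Model B.
blend_log = 0.6 * pred_a_log + 0.4 * pred_b_log

# Invert the log1p transform to get real-scale quantities.
final_pred = np.expm1(blend_log)
```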
- **Submission generation**
  - Writes `submission.csv` at the repository root
  - Writes model artifacts to `outputs/`
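Writing the submission file is a straightforward pandas export; the `ID`/`TARGET` column names follow the output format that `inference.py` also produces, and the values here are placeholders.

```python
import pandas as pd

# Hypothetical final predictions keyed by ID (values are placeholders).
submission = pd.DataFrame({"ID": [1, 2, 3], "TARGET": [7.2, 32.1, 2.3]})

# Write submission.csv at the repository root, without the index column.
submission.to_csv("submission.csv", index=False)
```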
- Python >= 3.9
- pandas
- numpy
- scikit-learn
- catboost
```bash
pip install pandas numpy scikit-learn catboost
```

- Place `train.csv` and `test.csv` in the `data/` folder
- Run the training pipeline (saves models to `outputs/` and writes `submission.csv` at the repo root):

```bash
python model.py
```

- You will get `submission.csv` and model artifacts under `outputs/`.
`EDA.py` generates a set of plots to inspect the training dataset. To execute it (and open the plots), run:

```bash
python EDA.py
```

If running in a headless environment, you may redirect or save each plot; the script prints status messages as it runs.
If you want to test the model on a JSON payload mirroring `data/test.csv`, use `inference.py`:

```bash
python inference.py --input-json sample_input.json --output-json predictions.json
```

This outputs a list of `{ ID, TARGET }` objects. The script reuses the same feature pipeline and needs `data/train.csv` present for the group aggregations.
- Install requirements:

```bash
pip install -r requirements.txt
```

- Place CSVs under `data/` (semicolon-separated): `train.csv`, `test.csv`.
- Train and generate the submission:

```bash
python model.py
```

- The submission is saved as `submission.csv` at the project root.
- The CatBoost model ensemble achieved a score of 55.57900
- Robust feature engineering was more decisive than hyperparameter tuning of complex models
- Combination of image embeddings, categorical attributes, and multi-season historical data was key
- Temporal validation (TimeSeriesSplit) avoided data leakage and enabled generalizable models
- Train our own visual embeddings
- Explore TabNet or LightGBM with automatic tuning
- Add interpretability to the pipeline to understand which attributes generate more demand
- Automate the entire workflow for real production
Team Oink Oink – AI Students UPC, Datathon FME 2025.