Binary classification for telecom churn — AUC = 0.914 on public leaderboard.
Predicts telecom customer churn for the Kaggle Playground Series S6E3 competition. 75 experiments across 6 model families, distilled into an 11-model greedy ensemble. Raw features consistently beat engineered ones — the gains came from ensemble diversity, not feature engineering.
| Method | CV AUC | Public LB |
|---|---|---|
| Best single model (CatBoost) | 0.91621 | 0.91370 |
| Top-20 average ensemble | 0.91672 | 0.91401 |
| Greedy forward selection (11 models) | 0.91691 | ~0.91410 |
- Python 3.11+
git clone https://github.com/YOUR_USERNAME/kaggle-customer-churn.git
cd kaggle-customer-churn
pip install scikit-learn lightgbm xgboost catboost optuna pandas numpy

Download competition data from Kaggle into data/.
python train_churn.py --features raw --model catboost --seeds 3

kaggle-customer-churn/
├── train_churn.py # Main training script (5-fold CV)
├── run_azure.py # Azure ML job orchestrator
├── advanced_experiments.py # Target/frequency encoding experiments
├── optuna_hpo.py # CatBoost/LightGBM Bayesian HPO
├── lgbm_hpo.py # Focused LightGBM + XGBoost HPO
├── diverse_models.py # HistGBT, ExtraTrees, Ridge diversity
├── fast_ensemble.py # Simple averaging, rank averaging
├── greedy_ensemble.py # Greedy forward selection + stacking
├── pseudo_label.py # Semi-supervised pseudo-labeling
├── figures.py # Visualizations
├── improvement_log.md # Detailed experiment log
├── session_log.md # Azure ML session log
├── azure_config.example.json # Azure config template
├── data/ # Competition data (gitignored)
└── results/ # 75 experiment runs (gitignored)
Data: 594k training rows, 255k test rows, 19 features (demographics, services, billing). Binary target: churn yes/no.
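Many of the 19 features are categorical. In the "raw" treatment they are simply mapped to integer codes and fed to the tree models as-is (no one-hot, no target encoding). A minimal sketch — the category values below are hypothetical, not taken from the actual dataset:

```python
# Minimal label-encoding sketch: each distinct category becomes a stable
# integer code in first-seen order; tree models consume the codes directly.
def label_encode(values):
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]

contract = ["month-to-month", "two-year", "month-to-month", "one-year"]
print(label_encode(contract))  # [0, 1, 0, 2]
```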
Approach: Tested 6 model families (CatBoost, LightGBM, XGBoost, HistGBT, ExtraTrees, Ridge) with raw label-encoded features. Bayesian HPO via Optuna (80 LightGBM + 40 XGBoost trials). Final ensemble built via greedy forward selection — iteratively adding models only if they improved CV AUC, selecting 11 from 75 candidates.
Validation: 5-fold stratified CV. OOF predictions saved for offline ensemble experimentation.
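The OOF scheme above can be sketched as follows, assuming scikit-learn; a logistic regression on synthetic data stands in for the actual GBDT models:

```python
# Out-of-fold (OOF) predictions: each sample is scored by the one fold-model
# that never trained on it, so the OOF vector supports unbiased offline
# ensemble experiments.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def oof_predictions(X, y, n_splits=5, seed=42):
    oof = np.zeros(len(y))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X, y):
        model = LogisticRegression(max_iter=1000)  # stand-in for CatBoost etc.
        model.fit(X[train_idx], y[train_idx])
        oof[val_idx] = model.predict_proba(X[val_idx])[:, 1]
    return oof

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
oof = oof_predictions(X, y)
score = roc_auc_score(y, oof)
print(f"OOF AUC: {score:.3f}")
```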
# Train with different models
python train_churn.py --features raw --model lgbm --seeds 5
python train_churn.py --features raw --model xgboost --seeds 3
# Run Bayesian HPO
python lgbm_hpo.py
python optuna_hpo.py
# Add ensemble diversity
python diverse_models.py
# Build final ensemble
python greedy_ensemble.py

Azure ML is optional.
cp azure_config.example.json azure_config.json
python run_azure.py

pip install scikit-learn lightgbm xgboost catboost optuna pandas numpy
python train_churn.py --features raw --model catboost --seeds 3
python greedy_ensemble.py

Random seed: 42. Multi-seed training (3-5 seeds) for variance estimation.
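The variance-reduction effect of multi-seed training can be illustrated with synthetic predictions — the same "signal" plus independent per-seed noise, then a plain mean:

```python
# Multi-seed averaging sketch: training the same model under several seeds
# and averaging the predicted probabilities cancels seed-dependent noise.
import numpy as np

def average_seeds(pred_per_seed):
    """Mean of per-seed probability vectors -> lower-variance ensemble."""
    return np.mean(pred_per_seed, axis=0)

rng = np.random.default_rng(42)
true_prob = rng.uniform(size=1000)
# Five "seeds": the same underlying signal plus independent noise per seed.
seeds = [np.clip(true_prob + rng.normal(scale=0.1, size=1000), 0, 1)
         for _ in range(5)]
avg = average_seeds(seeds)

single_err = np.abs(seeds[0] - true_prob).mean()
avg_err = np.abs(avg - true_prob).mean()
print(f"single-seed error {single_err:.4f}, 5-seed average {avg_err:.4f}")
```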
- Raw features beat engineered features — trees handled categoricals better as simple label-encoded integers
- Ensemble diversity > individual quality — weak models (HistGBT, ExtraTrees) contributed via architectural diversity
- Greedy selection > simple averaging — selecting 11 from 75 models outperformed blindly averaging top-20
- Multi-seed training is low-hanging fruit — averaging 5 LightGBM seeds already beats the single best model
- HPO needs enough trials — 12 CatBoost trials in a 10-dimensional search space were insufficient; 50+ are needed
- Kaggle Playground Series S6E3
- Azure ML for cloud compute (~$0.27 total)