Ensemble of Random Forest, XGBoost, CatBoost, LightGBM, and KNN.
Stratified K-Fold with the same seed was used for all the models.
Adding the UserId and Unnamed: 0 columns as features increased accuracy and F1 score.
Samples with Creations == 0 all belonged to class 1, so they could be classified perfectly.
Samples with non-zero Creations had all the classes roughly equally distributed.
Trained XGBoost on only the non-zero-Creations samples and hardcoded the class to 1 whenever Creations is 0.
Trained XGBoost on the entire dataset with a new binary feature "zero creations".
Decomposed the feature columns into 3 components using PCA.
Trained XGBoost on the entire dataset after adding the 3 new decomposed columns.
Feature selection was done using the models' feature importances and Recursive Feature Elimination with CV.
Out-of-fold (OOF) predictions were used, instead of the average across all the folds, for deciding the weights of the ensemble (a sketch of the shared fold/OOF setup follows this list). Source
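A minimal sketch of the shared CV setup, assuming placeholder feature/label arrays X and y; the seed, fold count, model, and number of classes below are illustrative, not the exact competition settings:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier

SEED = 42  # assumption: every model reuses this seed for its folds

def get_oof_probs(model_factory, X, y, n_classes, n_splits=5):
    """Collect out-of-fold class probabilities for later blending."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=SEED)
    oof = np.zeros((len(X), n_classes))
    for train_idx, val_idx in skf.split(X, y):
        model = model_factory()
        model.fit(X[train_idx], y[train_idx])
        oof[val_idx] = model.predict_proba(X[val_idx])
    return oof

# e.g. oof_xgb = get_oof_probs(lambda: XGBClassifier(random_state=SEED), X, y, n_classes=6)
```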
RANDOM FOREST - 1
Important features were selected using feature importances.
Two Random Forests were combined and used together for better predictions.
The soft probabilities from both forests were averaged to get the final probabilities (sketched below).
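A minimal sketch of the two-forest averaging, assuming X_train, y_train, and X_test are already prepared; the hyperparameters are placeholders, not the tuned values from the notebook:

```python
from sklearn.ensemble import RandomForestClassifier

# Two differently configured forests (illustrative parameters only).
rf_a = RandomForestClassifier(n_estimators=500, max_depth=12, random_state=42)
rf_b = RandomForestClassifier(n_estimators=800, min_samples_leaf=3, random_state=42)
rf_a.fit(X_train, y_train)
rf_b.fit(X_train, y_train)

# Average the soft probabilities, then map the best column back to its class label.
proba = (rf_a.predict_proba(X_test) + rf_b.predict_proba(X_test)) / 2
preds = rf_a.classes_[proba.argmax(axis=1)]
```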
Notebook
RANDOM FOREST - 2
LIGHT GBM
User Id and Unnamed: 0 were used for better age_group prediction.
Tuned the number-of-leaves parameter together with regularization to increase the score (an illustrative configuration follows this list).
Explored the double-tree setup (the class is provided in the notebook), but the score decreased.
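A rough sketch of the kind of configuration this tuning produces; num_leaves and the regularization strengths below are placeholders:

```python
from lightgbm import LGBMClassifier

lgbm = LGBMClassifier(
    n_estimators=1000,
    num_leaves=63,       # tuned together with the regularization terms
    reg_alpha=0.1,       # L1 regularization (placeholder value)
    reg_lambda=1.0,      # L2 regularization (placeholder value)
    random_state=42,
)
lgbm.fit(X_train, y_train)
lgbm_proba = lgbm.predict_proba(X_test)
```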
Notebook
CATBOOST
User Id and Unnamed: 0 were used for better age_group prediction.
The parameters were tuned using a scikit-learn optimizer, and the tuning code is provided (one possible setup is sketched below).
A triple-tree setup was explored, but the score decreased with the addition of User Id.
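The exact tuning code lives in the notebook; one possible scikit-learn-style search over CatBoost parameters could look like the sketch below, where the search space, scorer, and fold count are all assumptions:

```python
from catboost import CatBoostClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative search space; the actual tuned values are in the notebook.
param_dist = {
    "depth": [4, 6, 8],
    "learning_rate": [0.03, 0.05, 0.1],
    "l2_leaf_reg": [1, 3, 5],
}
search = RandomizedSearchCV(
    CatBoostClassifier(iterations=500, verbose=0, random_seed=42),
    param_dist, n_iter=10, scoring="f1_weighted", cv=3, random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_)
```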
Notebook
KNN CLASSIFIER
Trained a KNN classifier on GPU using the RAPIDS library.
Found the best number of neighbors using the elbow method (a rough sweep is sketched below).
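A rough sketch of the neighbor sweep on GPU, assuming a held-out validation split (X_val, y_val); the range of k and the metric are placeholders, and the elbow is read off the resulting curve:

```python
from cuml.neighbors import KNeighborsClassifier  # RAPIDS, runs on GPU
from sklearn.metrics import f1_score

scores = {}
for k in range(3, 51, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores[k] = f1_score(y_val, knn.predict(X_val), average="weighted")

# Plot scores vs. k and pick the value where the curve flattens (the "elbow").
```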
Notebook
DOUBLE XGBOOST - WITH MANUAL TUNING
Removed features using feature importances and Recursive Feature Elimination.
Used two XGBoosts together for better predictions.
The parameters were tuned manually.
Notebook
XGBOOST - BASELINE
Removed features using Recursive Feature Elimination.
Trained a baseline XGBoost with no parameter tuning (a minimal sketch of both steps follows).
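A minimal sketch of feature elimination followed by the untuned baseline, assuming X_train and y_train are ready; the step size, fold count, and scorer are placeholders:

```python
from sklearn.feature_selection import RFECV
from xgboost import XGBClassifier

# Recursive Feature Elimination with cross-validation to pick the feature subset.
selector = RFECV(XGBClassifier(n_estimators=200, random_state=42),
                 step=1, cv=3, scoring="f1_weighted")
selector.fit(X_train, y_train)

# Baseline model on the selected features, with default parameters (no tuning).
baseline = XGBClassifier(random_state=42)
baseline.fit(selector.transform(X_train), y_train)
```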
Notebook
XGBOOST - UNNAMED
Removed features using feature importances.
Used the Unnamed: 0 and User Id features for better classification.
Tuned the number of trees and the regularization parameters by hand; the results are documented in the form of comments.
Notebook
XGBOOST - UNNAMED AND NON ZERO TRAINING
Removed features using feature importances.
Trained only on the samples with a non-zero Creations value, which resulted in faster training and better results.
The samples with zero Creations were hardcoded to age group 1.
This split was discovered during the EDA of the training data (sketched below).
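A minimal sketch of the split, assuming pandas DataFrames named train and test, a feature list features, a Creations column, and an age_group target; all of these names are assumptions:

```python
import numpy as np
from xgboost import XGBClassifier

# Train only on rows where Creations is non-zero.
nonzero = train["Creations"] != 0
model = XGBClassifier(random_state=42)
model.fit(train.loc[nonzero, features], train.loc[nonzero, "age_group"])

# Hardcode age group 1 for the zero-Creations rows, predict the rest.
test_preds = np.where(test["Creations"] == 0, 1, model.predict(test[features]))
```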
Notebook
XGBOOST - UNNAMED AND BINARY FEATURES
Removed features using feature importances and Recursive Feature Elimination.
Created new binary features from the Creations column, which gave the model information about the sparse nature of that column in the training data (sketched below).
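A minimal sketch of the flag, assuming a Creations column in pandas DataFrames train and test:

```python
# 1 where the row has zero Creations, 0 otherwise.
train["zero_creations"] = (train["Creations"] == 0).astype(int)
test["zero_creations"] = (test["Creations"] == 0).astype(int)
```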
Notebook
XGBOOST - PCA
Used Principal Component Analysis (PCA) to decompose the data and create new features for the model.
The PCA features were used in addition to the original columns so as not to lose information from the original data (sketched below).
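A minimal sketch of appending the 3 PCA components to the existing columns, assuming DataFrames train and test and a feature list features:

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=3, random_state=42)
pca_train = pca.fit_transform(train[features])   # fit on train only
pca_test = pca.transform(test[features])

# Keep the original columns and add the 3 components as new features.
for i in range(3):
    train[f"pca_{i}"] = pca_train[:, i]
    test[f"pca_{i}"] = pca_test[:, i]
```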
Notebook
ENSEMBLING
All the above models were used to generate OOF files, which were then blended using the SciPy optimizer (sketched below).
Appropriate weights were found from the OOF files and then used to combine the test predictions.
Blending increased the F1 score by as much as 1.2.
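A minimal sketch of the blending step, assuming oof_list and test_list hold each model's OOF and test probability arrays (shape n_samples × n_classes), y_true holds the training labels as 0-based class indices, and the weighted-F1 objective is an assumption:

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import f1_score

def neg_f1(w, probs, y_true):
    w = np.abs(w) / np.abs(w).sum()          # positive weights summing to 1
    blend = sum(wi * p for wi, p in zip(w, probs))
    return -f1_score(y_true, blend.argmax(axis=1), average="weighted")

x0 = np.ones(len(oof_list)) / len(oof_list)   # start from equal weights
res = minimize(neg_f1, x0, args=(oof_list, y_true), method="Nelder-Mead")

best_w = np.abs(res.x) / np.abs(res.x).sum()
final_proba = sum(wi * p for wi, p in zip(best_w, test_list))
final_preds = final_proba.argmax(axis=1)
```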
Notebook
Neural networks couldn't cross 70. We tried both a plain neural network and a skip-connection (ResNet-style) model. (Approach)
TabNet couldn't cross 67 and fluctuated a lot.
Training on GPU resulted in a lower F1 score (0.3 less than on CPU). (RESOURCE)
t-SNE decomposition was very slow (9 hours on GPU was not enough).
Tried using KernelPCA but faced a memory limit error.
SVM was very slow (9 hours on CPU was not enough to complete even a single fold).
Kaggle results were not reproducible on Colab.
The fastai tabular learner gave a high training loss.
Better hyperparameter search using Optuna (our approach).
Better feature engineering can be done.
More deep learning approaches can be explored.