We use a BiLSTM + attention K-fold kernel with added features to reach a score of 0.703 in the Kaggle Quora competition.
This kernel builds on:
- gru-capsule
- How to: Preprocessing when using embedding
- Improve your Score with some Text Preprocessing
- Simple attention layer
- baseline-pytorch-bilstm
- pytorch-starter
| name | value |
|---|---|
| embed_size | 300 |
| max_features | 120000 |
| maxlen | 70 |
| batch_size | 512 |
| n_epochs | 5 |
| n_splits | 5 |
seed_everything : A common headache in this competition is the lack of determinism in the results caused by cuDNN. This kernel includes a PyTorch workaround. Function taken from here.
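A common version of `seed_everything` looks like this (a sketch, not necessarily the kernel's exact code; the torch import is guarded so the snippet also runs without a GPU stack installed):

```python
import os
import random

import numpy as np

try:
    import torch
except ImportError:  # the sketch still runs without PyTorch installed
    torch = None

def seed_everything(seed: int = 1029) -> None:
    """Seed every source of randomness so repeated runs give identical results."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        # cuDNN picks non-deterministic kernels by default; force determinism
        # (slower, but results become reproducible across runs).
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

seed_everything(1029)
```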
- load_glove
- load_fasttext
- load_para
build_vocab
Borrowed from:
- Improve your Score with some Text Preprocessing

Functions:
- build_vocab
- known_contractions
- clean_contractions
- correct_spelling
- unknown_punct
- clean_numbers
- clean_special_chars
- add_lower
- clean_text
- clean_numbers
- _get_mispell
- replace_typical_misspell
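A compressed sketch of the contraction and misspelling cleanup listed above (the dictionaries here are tiny illustrative samples — the kernel's mappings are far larger):

```python
import re

# Illustrative samples only; the real mappings contain hundreds of entries.
contraction_mapping = {"ain't": "is not", "can't": "cannot", "won't": "will not"}
mispell_dict = {"colour": "color", "centre": "center", "qoura": "quora"}

def clean_contractions(text: str, mapping: dict) -> str:
    # Normalise the various apostrophe characters, then expand contractions.
    for s in ["’", "‘", "´", "`"]:
        text = text.replace(s, "'")
    return " ".join(mapping.get(t.lower(), t) for t in text.split())

def _get_mispell(d: dict):
    # One regex that matches any known misspelling.
    return d, re.compile("(%s)" % "|".join(d.keys()))

def replace_typical_misspell(text: str) -> str:
    mispellings, mispellings_re = _get_mispell(mispell_dict)
    return mispellings_re.sub(lambda m: mispellings[m.group(0)], text)
```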
Extra feature part taken from here:
- add_features_before_cleaning: count_contains_a_punct, count_contains_a_string, count_words_more_frequent_in_insc, count_words_more_frequent_in_sc
- add_features_custom: count_contains_a_string, count_words_more_frequent_in_insc, count_words_more_frequent_in_sc
add_features
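As a rough sketch, the kind of statistical features `add_features` computes looks like this (column names are illustrative, not necessarily the kernel's exact ones):

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append simple hand-crafted text statistics as new columns (illustrative)."""
    df["question_text"] = df["question_text"].fillna("_##_")
    df["total_length"] = df["question_text"].str.len()
    df["capitals"] = df["question_text"].apply(lambda x: sum(1 for c in x if c.isupper()))
    df["caps_vs_length"] = df["capitals"] / df["total_length"]
    df["num_words"] = df["question_text"].str.count(r"\S+")
    df["num_unique_words"] = df["question_text"].apply(lambda x: len(set(x.lower().split())))
    df["words_vs_unique"] = df["num_unique_words"] / df["num_words"]
    return df
```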
load_and_prec
- Lowercase the text
- Clean the text
- Clean numbers
- Clean spellings
- Fill up the missing values
Add Features
- https://github.com/wongchunghang/toxic-comment-challenge-lstm/blob/master/toxic_comment_9872_model.ipynb
- Tokenize the sentences
- Pad the sentences
- Get the target values
- Split into training and a final test set
- Shuffle the data
- Fill up the missing values
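The tokenize-and-pad steps above can be sketched without Keras; `tokenize` and `pad_sequences` here are simplified stand-ins for `keras.preprocessing.text.Tokenizer` and `keras.preprocessing.sequence.pad_sequences` (both are hypothetical simplifications, reusing `maxlen` and `max_features` from the table above):

```python
from collections import Counter

import numpy as np

maxlen = 70
max_features = 120000

def tokenize(texts, max_features=max_features):
    # Build a word -> index map by frequency, index 0 reserved for padding.
    counts = Counter(w for t in texts for w in t.lower().split())
    vocab = {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_features))}
    seqs = [[vocab[w] for w in t.lower().split() if w in vocab] for t in texts]
    return seqs, vocab

def pad_sequences(seqs, maxlen):
    # Left-pad with zeros and truncate from the front ('pre'), as Keras does by default.
    out = np.zeros((len(seqs), maxlen), dtype=np.int64)
    for i, s in enumerate(seqs):
        trunc = s[-maxlen:]
        out[i, maxlen - len(trunc):] = trunc
    return out
```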
Two embedding matrices are used: GloVe and paragram. Their element-wise mean serves as the final embedding matrix. Missing entries in the embedding are filled with np.random.normal, so we have to seed here too.
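A sketch of how such a matrix is built and why the seeding matters (the `emb_mean`/`emb_std` defaults and the helper name `build_matrix` are illustrative, not the kernel's exact code):

```python
import numpy as np

np.random.seed(1029)  # seeded because unknown words get random vectors below

embed_size = 300
max_features = 120000

def build_matrix(word_index, embeddings_index, emb_mean=-0.005, emb_std=0.49):
    """Rows for known words come from the pretrained file; unknown words
    are initialised from N(emb_mean, emb_std) — hence the seeding above."""
    nb_words = min(max_features, len(word_index)) + 1
    matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))
    for word, i in word_index.items():
        if i >= nb_words:
            continue
        vec = embeddings_index.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix

# Final matrix: element-wise mean of the two pretrained embeddings, e.g.
# embedding_matrix = np.mean([glove_matrix, para_matrix], axis=0)
```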
Code inspired by: https://github.com/anandsaha/pytorch.cyclic.learning.rate/blob/master/cls.py
CyclicLR:
- batch_step
- _triangular_scale_fn
- _triangular2_scale_fn
- _exp_range_scale_fn
- get_lr
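The triangular policy behind `get_lr` can be written as a standalone function — a sketch of the schedule, not the class's exact code (the `base_lr`/`max_lr`/`step_size` values are illustrative):

```python
import math

def triangular_lr(iteration, base_lr=0.001, max_lr=0.003, step_size=300):
    # Triangular CLR: the lr rises linearly from base_lr to max_lr over
    # step_size iterations, then falls back down, repeating every cycle.
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

At the start of a cycle the learning rate equals `base_lr`; at the cycle's midpoint it peaks at `max_lr`.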
A bidirectional LSTM with an attention layer and an additional fully connected layer. Extra features taken from a winning kernel of the Toxic Comments competition are also added, along with CLR and a capsule layer, all blended together by concatenation.
Initial idea borrowed from: https://www.kaggle.com/ziliwang/baseline-pytorch-bilstm
- Embed_Layer: forward
- GRU_Layer: init_weights, forward
- Caps_Layer: forward, squash
- Capsule_Main: forward
- Attention: forward
- NeuralNet: forward
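The attention layer pools the recurrent outputs into a single vector via a learned softmax over time steps. A minimal numpy sketch of the mechanism (here `w` and `b` stand in for the layer's learned parameters; the real layer is a PyTorch module):

```python
import numpy as np

def attention_pool(h, w, b):
    """h: (seq_len, hidden) recurrent outputs; w: (hidden,), b: scalar.
    Returns a (hidden,) weighted sum of the time steps."""
    scores = np.tanh(h @ w + b)                    # (seq_len,) alignment scores
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over time steps
    return (h * alpha[:, None]).sum(axis=0)        # attention-weighted pooling
```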
The method for training is borrowed from https://www.kaggle.com/hengzheng/pytorch-starter
- MyDataset: __getitem__, __len__
- sigmoid
Borrowed from: https://www.kaggle.com/ziliwang/baseline-pytorch-bilstm
bestThresshold
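`bestThresshold` scans candidate thresholds over the out-of-fold predictions and keeps the one that maximises F1. A self-contained sketch (using a hand-rolled F1 in place of sklearn's `f1_score`, and a plain `sigmoid` to turn logits into probabilities; the scan range is illustrative):

```python
import numpy as np

def sigmoid(x):
    # Convert raw logits to probabilities in (0, 1).
    return 1 / (1 + np.exp(-x))

def best_threshold(y_true, y_proba):
    """Grid-search the decision threshold that maximises F1."""
    best_t, best_f1 = 0.5, 0.0
    for t in np.arange(0.1, 0.501, 0.01):
        y_pred = (y_proba > t).astype(int)
        tp = ((y_pred == 1) & (y_true == 1)).sum()
        fp = ((y_pred == 1) & (y_true == 0)).sum()
        fn = ((y_pred == 0) & (y_true == 1)).sum()
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```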

