The dataset is Stanford Natural Language Inference (SNLI), which we treat as a three-way classification task. We use an encoder-attention-decoder architecture and stack two additional bidirectional RNN layers on top of the final sequence representation. Both GloVe word embeddings and character embeddings are used for the word-level representation. The main experimental results are summarized below.
| Model | #Params | Base ACC | Base Time | +LN ACC | +LN Time | +BERT ACC | +BERT Time | +LN+BERT ACC | +LN+BERT Time |
|---|---|---|---|---|---|---|---|---|---|
| Rocktaschel et al. (2016) | 250K | 83.50 | - | - | - | - | - | - | - |
| *This Work* | | | | | | | | | |
| LSTM | 8.36M | 84.27 | 0.262 | 86.03 | 0.432 | 89.95 | 0.544 | 90.49 | 0.696 |
| GRU | 6.41M | 85.71 | 0.245 | 86.05 | 0.419 | 90.29 | 0.529 | 90.10 | 0.695 |
| ATR | 2.87M | 84.88 | 0.210 | 85.81 | 0.307 | 90.00 | 0.494 | 90.28 | 0.580 |
| SRU | 5.48M | 84.28 | 0.258 | 85.32 | 0.283 | 89.98 | 0.543 | 90.09 | 0.555 |
| LRN | 4.25M | 84.88 | 0.209 | 85.06 | 0.223 | 89.98 | 0.488 | 89.93 | 0.506 |
LN: layer normalization; Time: training time in seconds per batch, measured over 1k training steps.
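As a rough illustration of the word-level representation described above (a sketch only, not the repo's actual code; the embedding dimensions and the mean-pooling over characters are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding tables (sizes are illustrative: 300-d GloVe-style word
# vectors, 50-d character embeddings).
word_emb = {w: rng.normal(size=300) for w in ["a", "man", "sleeps"]}
char_emb = {c: rng.normal(size=50) for c in "amnslep"}

def word_repr(word):
    """Concatenate the word's GloVe vector with a pooled character embedding."""
    w = word_emb[word]
    c = np.mean([char_emb[ch] for ch in word], axis=0)  # mean-pool over characters
    return np.concatenate([w, c])                       # 350-d word-level representation

print(word_repr("man").shape)  # (350,)
```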
Requirement: `tensorflow >= 1.8.1`
- Download and preprocess the dataset
  - The dataset link: https://nlp.stanford.edu/projects/snli/
  - Prepare separate data files. We provide a simple processing script `convert_to_plain.py` in the `scripts` folder. By calling `python convert_to_plain.py snli_1.0/[ds].txt` you can get the `*.p`, `*.q`, `*.l` files as in `config.py`, where `[ds]` indicates `snli_1.0_train.txt`, `snli_1.0_dev.txt` and `snli_1.0_test.txt`. We only preserve `entailment`, `neutral` and `contradiction` instances; all others are dropped.
  - Prepare the embedding and vocabulary. Download the pre-trained GloVe embeddings, then build the word and character vocabularies using `vocab.py` as follows:
    ```
    # word embedding & vocabulary
    python vocab.py --embeddings [path-to-glove-embedding] train.p,train.q,dev.p,dev.q,test.p,test.q word_vocab
    # char embedding
    python vocab.py --char train.p,train.q,dev.p,dev.q,test.p,test.q char_vocab
    ```
  - Download the BERT pre-trained embedding (if you plan to work with BERT).
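The label-filtering step performed by `convert_to_plain.py` can be sketched as follows (a simplified stand-in, not the repo's script; the tab-separated column positions of `gold_label`, `sentence1` and `sentence2` in the SNLI `.txt` files are assumptions for this example):

```python
VALID_LABELS = {"entailment", "neutral", "contradiction"}

def convert(lines):
    """Keep only instances whose gold label is one of the three classes,
    returning premises (.p), hypotheses (.q) and labels (.l)."""
    premises, hypotheses, labels = [], [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # assumed layout: gold_label first, premise/hypothesis at columns 5/6
        label, premise, hypothesis = fields[0], fields[5], fields[6]
        if label in VALID_LABELS:  # '-' (no-consensus) instances are dropped
            premises.append(premise)
            hypotheses.append(hypothesis)
            labels.append(label)
    return premises, hypotheses, labels

# toy input mimicking the tab-separated layout
sample = [
    "entailment\t_\t_\t_\t_\tA man sleeps.\tA person rests.\n",
    "-\t_\t_\t_\t_\tNo consensus here.\tThis line is dropped.\n",
]
p, q, l = convert(sample)
print(len(p), l)  # 1 ['entailment']
```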
- Training and evaluation
  - Train the model as follows:
    ```
    # configure your cuda library if necessary
    export CUDA_ROOT=XXX
    export PATH=$CUDA_ROOT/bin:$PATH
    export LD_LIBRARY_PATH=$CUDA_ROOT/lib64:$LD_LIBRARY_PATH

    # LRN
    python code/run.py --mode train --config config.py --parameters=gpus=[0],cell="lrn",layer_norm=False,output_dir="train_no_ln" >& log.noln
    # LRN + LN
    python code/run.py --mode train --config config.py --parameters=gpus=[0],cell="lrn",layer_norm=True,output_dir="train_ln" >& log.ln
    # LRN + BERT
    python code/run.py --mode train --config config_bert.py --parameters=gpus=[0],cell="lrn",layer_norm=False,output_dir="train_no_ln_bert" >& log.noln.bert
    # LRN + LN + BERT
    python code/run.py --mode train --config config_bert.py --parameters=gpus=[0],cell="lrn",layer_norm=True,output_dir="train_ln_bert" >& log.ln.bert
    ```
    Other hyperparameter settings are available in the given `config.py`.
  - Test the model as follows:
    ```
    # LRN
    python code/run.py --mode test --config config.py --parameters=gpus=[0],cell="lrn",layer_norm=False,output_dir="train_no_ln/best",test_output="out.noln" >& log.noln.test
    # LRN + LN
    python code/run.py --mode test --config config.py --parameters=gpus=[0],cell="lrn",layer_norm=True,output_dir="train_ln/best",test_output="out.ln" >& log.ln.test
    # LRN + BERT
    python code/run.py --mode test --config config_bert.py --parameters=gpus=[0],cell="lrn",layer_norm=False,output_dir="train_no_ln_bert/best",test_output="out.noln.bert" >& log.noln.bert.test
    # LRN + LN + BERT
    python code/run.py --mode test --config config_bert.py --parameters=gpus=[0],cell="lrn",layer_norm=True,output_dir="train_ln_bert/best",test_output="out.ln.bert" >& log.ln.bert.test
    ```
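The `--parameters` flag above packs several overrides into one string. A minimal parser for that style of override string (a sketch under assumptions; the real option handling lives in `code/run.py` and `config.py`, and this splitting logic is illustrative only):

```python
import ast

def parse_overrides(spec):
    """Split 'k1=v1,k2=v2,...' into a dict, evaluating Python-style literals.
    Commas inside [...] (e.g. gpus=[0,1]) stay with their value."""
    items, depth, buf = [], 0, ""
    for ch in spec:
        if ch == "," and depth == 0:
            items.append(buf)
            buf = ""
        else:
            depth += ch == "["
            depth -= ch == "]"
            buf += ch
    items.append(buf)
    out = {}
    for item in items:
        key, _, value = item.partition("=")
        out[key] = ast.literal_eval(value)
    return out

print(parse_overrides('gpus=[0],cell="lrn",layer_norm=False,output_dir="train_ln"'))
# {'gpus': [0], 'cell': 'lrn', 'layer_norm': False, 'output_dir': 'train_ln'}
```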
The source code structure is adapted from the zero toolkit.