One concern with LRN is that simplifying the recurrent component may weaken modeling capacity, in particular the ability to capture long-range dependencies. We address this question with experiments on document classification, using the following datasets:
- Amazon Review Polarity (AmaPolar, 2 labels, 3.6M/0.4M for training/testing)
- Amazon Review Full (AmaFull, 5 labels, 3M/0.65M for training/testing)
- Yahoo! Answers (Yahoo, 10 labels, 1.4M/60K for training/testing)
- Yelp Review Polarity (YelpPolar, 2 labels, 0.56M/38K for training/testing)
All datasets come from Zhang et al. (2015). We use a BiRNN model followed by an attentive pooling layer, with character and GloVe embeddings for word representation. The main experimental results are summarized below.
| Model | #Params | AmaPolar ERR | AmaPolar Time | Yahoo ERR | Yahoo Time | AmaFull ERR | AmaFull Time | YelpPolar ERR | YelpPolar Time |
|---|---|---|---|---|---|---|---|---|---|
| Zhang et al. (2015) | - | 6.10 | - | 29.16 | - | 40.57 | - | 5.26 | - |
| LSTM (this work) | 227K | 4.37 | 0.947 | 24.62 | 1.332 | 37.22 | 1.003 | 3.58 | 1.362 |
| GRU (this work) | 176K | 4.39 | 0.948 | 24.68 | 1.242 | 37.20 | 0.982 | 3.47 | 1.230 |
| ATR (this work) | 74K | 4.78 | 0.867 | 25.33 | 1.117 | 38.54 | 0.836 | 4.00 | 1.124 |
| SRU (this work) | 194K | 4.95 | 0.919 | 24.78 | 1.394 | 38.23 | 0.907 | 3.99 | 1.310 |
| LRN (this work) | 151K | 4.98 | 0.731 | 25.07 | 1.038 | 38.42 | 0.788 | 3.98 | 1.022 |
ERR: test error rate. Time: seconds per training batch, measured over 1k training steps.
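As a rough illustration of the pooling stage described above, here is a minimal NumPy sketch of attentive pooling over (bi)RNN hidden states. All names and shapes are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pooling(states, w):
    # states: [seq_len, hidden] BiRNN outputs; w: [hidden] learned attention vector
    scores = states @ w        # [seq_len] unnormalized relevance scores
    alpha = softmax(scores)    # attention weights, sum to 1
    return alpha @ states      # [hidden] weighted sum = fixed-size document vector

rng = np.random.default_rng(0)
states = rng.normal(size=(50, 64))   # stand-in for BiRNN outputs over 50 tokens
w = rng.normal(size=64)
doc_vec = attentive_pooling(states, w)
assert doc_vec.shape == (64,)
```

The document vector produced this way is then fed to a softmax classifier over the task's labels.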
Requirement: `tensorflow >= 1.8.1`
- Download and preprocess the dataset. The dataset link: https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
- Prepare embedding and vocabulary. Download the pre-trained GloVe embedding, then generate a vocabulary for each task as follows:

  ```shell
  task=(amafull amapolar yahoo yelppolar)
  python code/run.py --mode vocab --config config.py --parameters=task="${task}",output_dir="${task}_vocab"
  ```
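For reference, GloVe text embeddings use a simple `word v1 v2 ...` line format. A minimal loader sketch (illustrative only; the repository's own loading code may differ):

```python
import numpy as np

def load_glove(path, vocab):
    # Read a GloVe-style text file and keep only in-vocabulary words.
    emb = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] in vocab:
                emb[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return emb
```

Out-of-vocabulary words are typically mapped to a randomly initialized or zero vector.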
- Training and evaluation
- Train the model as follows:

  ```shell
  # configure your cuda library if necessary
  export CUDA_ROOT=XXX
  export PATH=$CUDA_ROOT/bin:$PATH
  export LD_LIBRARY_PATH=$CUDA_ROOT/lib64:$LD_LIBRARY_PATH

  task=(amafull amapolar yahoo yelppolar)
  python code/run.py --mode train --config config.py --parameters=task="${task}",output_dir="${task}_train",gpus=[1],word_vocab_file="${task}_vocab/vocab.word",char_vocab_file="${task}_vocab/vocab.char",enable_hierarchy=False,nthreads=2,enable_bert=False,cell="lrn",swap_memory=False
  ```

  Other hyperparameter settings are available in the given `config.py`.
- Test the model as follows:

  ```shell
  task=(amafull amapolar yahoo yelppolar)
  python code/run.py --mode test --config config.py --parameters=task="${task}",output_dir="${task}_train/best",gpus=[0],word_vocab_file="${task}_vocab/vocab.word",char_vocab_file="${task}_vocab/vocab.char",enable_hierarchy=False,nthreads=2,enable_bert=False,cell="lrn",swap_memory=False,train_continue=False,test_output=${task}.out.txt
  ```
The source code structure is adapted from zero.