Standard encoder-decoder NMT, following *Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation* (Y. Wu et al.)
- python 3.6
- torch 1.2
- tensorboard 1.14+
- psutil
- dill
- CUDA 9
- Source / target files: one sentence per line
- Source / target vocab files: one vocabulary token per line, with the top 5 entries fixed to `<pad> <unk> <s> </s> <spc>`, as defined in `utils/config.py`
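As an illustration, a vocab file in this format can be loaded into a token-to-id mapping as in the sketch below. The helper name `load_vocab`, the file name `vocab.src`, and the non-special tokens are hypothetical; only the fixed order of the first five special tokens comes from `utils/config.py`.

```python
# Build a token-to-id mapping from a vocab file: one token per line,
# with <pad> <unk> <s> </s> <spc> fixed as the first five entries.
def load_vocab(path):
    with open(path, encoding="utf-8") as f:
        tokens = [line.strip() for line in f if line.strip()]
    return {tok: idx for idx, tok in enumerate(tokens)}

# Write a tiny example vocab file (hypothetical contents).
with open("vocab.src", "w", encoding="utf-8") as f:
    f.write("\n".join(["<pad>", "<unk>", "<s>", "</s>", "<spc>", "the", "cat"]))

vocab = load_vocab("vocab.src")
print(vocab["<pad>"], vocab["the"])  # 0 5
```

Because the special tokens occupy the first five lines, `<pad>` always maps to id 0, which is what masked loss computation (`eval_with_mask`) relies on.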
To train the model, see `Examples/train.sh`.
- `train_path_src` - path to source file for training
- `train_path_tgt` - path to target file for training
- `dev_path_src` - path to source file for validation (default `None`)
- `dev_path_tgt` - path to target file for validation (default `None`)
- `path_vocab_src` - path to source vocab list
- `path_vocab_tgt` - path to target vocab list
- `load_embedding_src` - load pretrained source embedding if provided
- `load_embedding_tgt` - load pretrained target embedding if provided
- `use_type` - `word`, or tokenise into `char`
- `save` - directory to save the trained model
- `random_seed` - set random seed
- `share_embedder` - share the embedding matrix between source and target
- `embedding_size_enc` - source embedding size
- `embedding_size_dec` - target embedding size
- `hidden_size_enc` - encoder hidden size
- `num_bilstm_enc` - number of encoder BiLSTM layers
- `num_unilstm_enc` - number of encoder UniLSTM layers (default `0`)
- `hidden_size_dec` - decoder hidden size
- `num_unilstm_dec` - number of decoder UniLSTM layers
- `att_mode` - attention mode: `bahdanau | bilinear | hybrid`
- `hidden_size_att` - only used if `att_mode` is set to `hybrid`
- `residual` - residual connections across LSTM layers
- `hidden_size_shared` - transformed attention output hidden size
- `max_seq_len` - maximum sequence length; longer sentences are filtered out in training
- `batch_size` - batch size
- `batch_first` - set to `True`
- `seqrev` - train the seq2seq model in reverse order
- `eval_with_mask` - compute loss only on non-`<pad>` tokens (default `True`)
- `scheduled_sampling` - use scheduled sampling
- `teacher_forcing_ratio` - probability of running in teacher-forcing mode; set to `1.0` to use teacher forcing throughout
- `dropout` - dropout rate
- `embedding_dropout` - embedding dropout rate
- `num_epochs` - number of epochs
- `use_gpu` - set to `True` if a GPU device is available
- `learning_rate` - learning rate
- `max_grad_norm` - gradient clipping threshold
- `checkpoint_every` - number of batches trained per checkpoint saved (if `dev_path_*` is not given, save after every epoch)
- `print_every` - number of batches trained between printing train losses
- `max_count_no_improve` - used when `dev_path_*` is given; number of batches trained with no improvement in dev-set accuracy before rolling back
- `max_count_num_rollback` - reduce the learning rate after rolling back this many times
- `keep_num` - number of checkpoints kept in the model directory (used if `dev_path_*` is given)
- `normalise_loss` - normalise loss on a per-token basis
- `minibatch_split` - if OOM, split each batch into minibatches (note: the gradient update is still done per batch, not per minibatch)
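Putting the main options together, a training invocation might look like the following sketch. The entry-point name `train.py`, the file paths, and all hyperparameter values here are assumptions for illustration; `Examples/train.sh` has the repository's actual entry point and flag values.

```shell
# Hypothetical training call; see Examples/train.sh for the real one.
python train.py \
    --train_path_src data/train.src \
    --train_path_tgt data/train.tgt \
    --dev_path_src data/dev.src \
    --dev_path_tgt data/dev.tgt \
    --path_vocab_src data/vocab.src \
    --path_vocab_tgt data/vocab.tgt \
    --use_type word \
    --save models/nmt-base \
    --embedding_size_enc 200 \
    --embedding_size_dec 200 \
    --hidden_size_enc 200 \
    --num_bilstm_enc 2 \
    --hidden_size_dec 200 \
    --num_unilstm_dec 2 \
    --att_mode bilinear \
    --max_seq_len 64 \
    --batch_size 64 \
    --batch_first True \
    --eval_with_mask True \
    --teacher_forcing_ratio 1.0 \
    --dropout 0.2 \
    --num_epochs 20 \
    --use_gpu True \
    --learning_rate 0.001 \
    --max_grad_norm 1.0 \
    --checkpoint_every 500 \
    --print_every 100
```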
To test the model, see `Examples/translate.sh`.
- `test_path_src` - path to source text
- `seqrev` - whether to translate in reverse order
- `path_vocab_src` - must be consistent with training
- `path_vocab_tgt` - must be consistent with training
- `use_type` - must be consistent with training
- `load` - path to the model checkpoint
- `test_path_out` - path to save the translated text
- `max_seq_len` - maximum translation sequence length (set it to at least the maximum source sentence length)
- `batch_size` - batch size in translation, restricted by memory
- `use_gpu` - set to `True` if a GPU device is available
- `beam_width` - beam width for beam-search decoding
- `eval_mode` - default `1` (other modes are for debugging)
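A translation invocation might then look like the sketch below. As above, the entry-point name `translate.py`, the checkpoint path, and the parameter values are assumptions; `Examples/translate.sh` shows the actual call, and the vocab and `use_type` settings must match the ones used in training.

```shell
# Hypothetical translation call; see Examples/translate.sh for the real one.
python translate.py \
    --test_path_src data/test.src \
    --path_vocab_src data/vocab.src \
    --path_vocab_tgt data/vocab.tgt \
    --use_type word \
    --load models/nmt-base/checkpoint_best \
    --test_path_out models/nmt-base/test.hyp \
    --max_seq_len 128 \
    --batch_size 32 \
    --use_gpu True \
    --beam_width 5 \
    --eval_mode 1
```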