UEDIN WMT20 systems

The University of Edinburgh's WMT20 systems

This directory contains the TA-EN and EN-TA NMT models built for the WMT20 shared news translation task.

Name          Last modified     Size
en-ta.tar.gz  2020-11-14 20:09  334M
ta-en.tar.gz  2020-11-14 20:10  334M

Requirements

The models use the following software:

EN-TA and TA-EN models

The models for each language direction are found in the corresponding tar files.

The BLEU scores of the models on the official dev and test sets are as follows:

Dataset en-ta ta-en
newsdev2019 12.30 21.00
newstest2019 8.40 16.60

Translations for ta-en use a beam size of 18; translations for en-ta use a beam size of 8. See the translation scripts for more details.

To use:

  1. Decompress the file for the language direction you need:
    
    tar -xzvf en-ta.tar.gz    # or ta-en.tar.gz
    
    
  2. Paths to preprocessing tools are specified in {en-ta,ta-en}/vars. Please change these to your own installation paths.
  3. Translate:
    
    bash ./en-ta/translate.sh INPUT "GPU_ID(S)" > OUTPUT    # or ./ta-en/translate.sh
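
The steps above can be collected into a small wrapper. The following is only a sketch: the function names translate_script and translate_file are hypothetical and not part of the released archives, and it assumes that each archive unpacks into a directory named after its language direction (as the vars path in step 2 suggests) and that the tool paths in that directory's vars file have already been edited.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; these names are illustrative and are not shipped
# with the models.

# Print the path of the translate script for a language direction.
# Fails for anything other than the two released directions.
translate_script() {
    case "$1" in
        en-ta|ta-en) printf './%s/translate.sh\n' "$1" ;;
        *) echo "unknown direction: $1 (expected en-ta or ta-en)" >&2
           return 1 ;;
    esac
}

# Unpack the archive if its directory is missing, then translate.
# Usage: translate_file DIRECTION INPUT OUTPUT "GPU_ID(S)"
# Assumes DIRECTION.tar.gz sits in the current directory and unpacks
# into ./DIRECTION/.
translate_file() {
    local direction="$1" input="$2" output="$3" gpus="$4" script
    script=$(translate_script "$direction") || return 1
    if [ ! -d "$direction" ]; then
        tar -xzf "$direction.tar.gz" || return 1
    fi
    # The GPU argument is forwarded unchanged to translate.sh.
    bash "$script" "$input" "$gpus" > "$output"
}
```

After sourcing the file, a call takes the form translate_file ta-en INPUT OUTPUT "GPU_ID(S)", with the same placeholders as in step 3. Rejecting unknown directions up front avoids running tar or bash on a path that does not exist.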
    
    

License

The use of the models provided in this directory is permitted under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/

Attribution - You must give appropriate credit [please use the citation below], provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

NonCommercial - You may not use the material for commercial purposes.

ShareAlike - If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Citation

The models are described in the following publication:

Rachel Bawden, Alexandra Birch, Radina Dobreva, Arturo Oncevay, Antonio Valerio Miceli Barone and Philip Williams. The University of Edinburgh’s English-Tamil and English-Inuktitut Submissions to the WMT20 News Translation Task. 2020. In Proceedings of the Fifth Conference on Machine Translation. Online.

@inproceedings{uedin-wmt20,
    title = {The University of Edinburgh’s English-Tamil and English-Inuktitut Submissions to the WMT20 News Translation Task},
    author = {Bawden, Rachel and Birch, Alexandra and Dobreva, Radina and
              Oncevay Marcos, Arturo and Miceli Barone, Antonio Valerio and Williams, Philip},
    publisher = {Association for Computational Linguistics},
    address = {Online},
    pages = {38--45},
    booktitle = {Proceedings of the Fifth Conference on Machine Translation},
    year = {2020},
    url = {https://www.aclweb.org/anthology/2020.wmt-1.5}
}