| Name | Last modified | Size |
|---|---|---|
| nc11_de_fr_sentence_aligned.tar.bz2 | 2017-07-21 11:32 | 18M |
| wsd_testset_corpora_de_en.tar.bz2 | 2017-07-14 19:02 | 621M |
| wsd_testset_corpora_de_fr.tar.bz2 | 2017-07-14 19:04 | 848M |
This directory contains supplementary data for the following paper:
Annette Rios, Laura Mascarell and Rico Sennrich. 2017. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the Second Conference on Machine Translation (WMT17), Copenhagen, Denmark.
```bibtex
@inproceedings{rios2017,
  address = "Copenhagen, Denmark",
  author = "Rios, Annette and Mascarell, Laura and Sennrich, Rico",
  booktitle = "{Proceedings of the Second Conference on Machine Translation, Volume 1: Research Papers}",
  title = "{Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings}",
  year = "2017"
}
```
The test set itself is available at https://github.com/a-rios/ContraWSD.
This directory also contains snapshots of the corpora from which the test set
was extracted (wsd_testset_corpora_de_*.tar.bz2). Together with the sentence
IDs in the test set, these snapshots can be used to recover document context
for experiments on document-level machine translation.
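This README does not describe the internal layout of the snapshot archives or the exact format of the test-set sentence IDs. The following is therefore only a minimal sketch using Python's standard `tarfile` module. It reads a text file straight out of a `.tar.bz2` archive without unpacking it, then returns the sentences that precede a given position. The assumptions are that a member file holds one document with one sentence per line, and that a sentence ID can be mapped to a 0-based line index. Both assumptions are hypothetical, as are the helper names `list_archive`, `read_member_lines` and `preceding_context`.

```python
import tarfile


def list_archive(path: str) -> list[str]:
    """Return the member names of a .tar.bz2 archive without extracting it."""
    with tarfile.open(path, mode="r:bz2") as tar:
        return tar.getnames()


def read_member_lines(path: str, member: str, encoding: str = "utf-8") -> list[str]:
    """Read one text file from the archive as a list of lines (newlines stripped)."""
    with tarfile.open(path, mode="r:bz2") as tar:
        f = tar.extractfile(member)
        if f is None:  # member is a directory or other non-regular entry
            raise ValueError(f"{member!r} is not a regular file")
        return f.read().decode(encoding).splitlines()


def preceding_context(lines: list[str], index: int, n: int = 2) -> list[str]:
    """Return up to n sentences that precede lines[index] in the same document.

    ASSUMPTION (hypothetical): `lines` holds a single document, one sentence
    per line, and `index` is the 0-based line a test-set sentence ID maps to.
    """
    if not 0 <= index < len(lines):
        raise IndexError(index)
    return lines[max(0, index - n):index]
```

For example, `list_archive("wsd_testset_corpora_de_en.tar.bz2")` shows which files the snapshot contains. Once the mapping from sentence IDs to files and line numbers is known, `preceding_context` yields the previous sentences to feed a document-level model. Note that `read_member_lines` reopens and decompresses the archive on every call. For many lookups, extracting the archive once is likely faster.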
We also provide a cleaned-up version of News Commentary v11 for DE-FR,
which we used as training data in our experiments: nc11_de_fr_sentence_aligned.tar.bz2
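The file layout inside nc11_de_fr_sentence_aligned.tar.bz2 is not documented here. Sentence-aligned corpora are commonly distributed as one plain-text file per language, where line *i* of one file translates line *i* of the other. Assuming that layout (not confirmed by this README), the minimal Python sketch below reads DE-FR pairs from the extracted files. It raises an error if the files differ in length. The function name `read_parallel` and any file paths passed to it are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str,
                  encoding: str = "utf-8") -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files.

    ASSUMPTION (hypothetical layout): one file per language, one sentence per
    line, line i of the source file aligned with line i of the target file.
    """
    with open(src_path, encoding=encoding) as src, \
         open(tgt_path, encoding=encoding) as tgt:
        # zip_longest pads the shorter file with None, so a length mismatch
        # is detected instead of silently truncating the corpus.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")
```

The code uses `zip_longest` rather than `zip` deliberately. Plain `zip` stops at the shorter file and would hide a misalignment between the German and French sides.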