Codebase for the paper "Copy mechanism and tailored training for character-based data-to-text generation" (Roberti et al., ECML-PKDD 2019).
A working Python 3 environment is required. The required libraries are listed in requirements.txt; install them with one of the following commands, depending on your environment:
pip install -r requirements.txt
# XOR
conda install --file requirements.txt
The main.py file is used to train an EDA_CS, EDA_C or EDA model on the dataset of your choice:
python3 main.py --dataset <dataset> --model <model>
The default configuration trains EDA_CS on the E2E+ dataset.
Available datasets are ['e2e+', 'e2e', 'hotel', 'restaurant']; available models are ['eda_cs', 'eda_c', 'eda'].
Different hyperparameters can be set via argparse (run python3 main.py -h for more details).
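To make the two options above concrete, here is a minimal sketch of an argparse parser that accepts them. It is not copied from main.py: the function name build_parser, the help strings and the lowercase default values are assumptions made for illustration, and the real script exposes further hyperparameters that are not reproduced here.

```python
import argparse

# Choice lists taken from this README; everything else is illustrative.
DATASETS = ['e2e+', 'e2e', 'hotel', 'restaurant']
MODELS = ['eda_cs', 'eda_c', 'eda']


def build_parser():
    """Build a hypothetical parser mirroring the --dataset and --model options."""
    parser = argparse.ArgumentParser(
        description='Sketch of the dataset/model selection (not the real main.py).')
    # Assumed defaults, matching the documented default run (EDA_CS on E2E+).
    parser.add_argument('--dataset', choices=DATASETS, default='e2e+',
                        help='dataset to train on')
    parser.add_argument('--model', choices=MODELS, default='eda_cs',
                        help='architecture to train')
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration only.
args = build_parser().parse_args(['--dataset', 'hotel', '--model', 'eda'])
print(f'Would train {args.model} on {args.dataset}')
```

With choices set, argparse rejects any value outside the two lists, which is one simple way to catch a mistyped dataset or model name before training starts.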
At the end of the training phase, one checkpoint per epoch will be stored in the trained_nets/<timestamp>/ folder, where <timestamp> is the UNIX time at which the script was started.
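Because each run gets its own timestamped folder, the most recent run is the folder with the largest numeric name. The helper below is a small sketch of that lookup; latest_run_dir is a hypothetical name that does not exist in this repository, and the code assumes the folder name parses as a plain number (integer or fractional seconds).

```python
from pathlib import Path


def latest_run_dir(root='trained_nets'):
    """Return the run folder with the largest numeric name, or None if there is none."""
    root = Path(root)
    if not root.is_dir():
        return None
    runs = []
    for entry in root.iterdir():
        if not entry.is_dir():
            continue
        try:
            # Assumption: the folder name is the UNIX start time of the run.
            runs.append((float(entry.name), entry))
        except ValueError:
            continue  # skip folders whose name is not a number
    if not runs:
        return None
    return max(runs, key=lambda pair: pair[0])[1]


run = latest_run_dir()
print(run if run is not None else 'no training runs found')
```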
The create_eval_files.py script generates both the output and the reference files, which can be used directly as inputs for the evaluation script. For example, you can generate on the E2E development set using ED+ACS as follows:
PYTHONPATH=. python3 utils/create_eval_files.py trained_nets/<timestamp>/<checkpoint> --subset dev
This will create the trained_nets/<timestamp>/<checkpoint>.dev.output and trained_nets/<timestamp>/<checkpoint>.dev.references files.
The default configuration uses your EDA_CS checkpoint to generate from the E2E+ test dataset's inputs. You can choose a different dataset/subset/architecture via argparse.
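The naming of the generated files follows directly from the checkpoint path and the chosen subset. The sketch below only reproduces that naming pattern so the two paths can be passed on to the evaluation script; eval_file_paths and the checkpoint name used in the example are hypothetical, and 'test' is assumed as the default subset because the default configuration generates from the test set.

```python
from pathlib import Path


def eval_file_paths(checkpoint, subset='test'):
    """Return the (output, references) paths following the README naming pattern."""
    checkpoint = Path(checkpoint)
    # <checkpoint>.<subset>.output and <checkpoint>.<subset>.references
    output = checkpoint.with_name(f'{checkpoint.name}.{subset}.output')
    references = checkpoint.with_name(f'{checkpoint.name}.{subset}.references')
    return output, references


# 'my_checkpoint' and the timestamp are placeholders, not real file names.
out_path, ref_path = eval_file_paths('trained_nets/1560000000/my_checkpoint', 'dev')
print(out_path)
print(ref_path)
```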
We took advantage of the E2E NLG Challenge Evaluation metrics. Please refer to their repository for detailed instructions.
Please use the following BibTeX snippet to cite our work:
@inproceedings{Roberti2019,
author = {Marco Roberti and
Giovanni Bonetta and
Rossella Cancelliere and
Patrick Gallinari},
title = {Copy Mechanism and Tailored Training for Character-Based Data-to-Text
Generation},
booktitle = {Machine Learning and Knowledge Discovery in Databases - European Conference,
{ECML} {PKDD} 2019, W{\"{u}}rzburg, Germany, September 16-20,
2019, Proceedings, Part {II}},
pages = {648--664},
year = {2019},
crossref = {ECMLPKDD2019-2},
url = {https://doi.org/10.1007/978-3-030-46147-8\_39},
doi = {10.1007/978-3-030-46147-8\_39},
timestamp = {Mon, 15 Jun 2020 17:05:23 +0200},
biburl = {https://dblp.org/rec/conf/pkdd/RobertiBCG19.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@proceedings{ECMLPKDD2019-2,
editor = {Ulf Brefeld and
{\'{E}}lisa Fromont and
Andreas Hotho and
Arno J. Knobbe and
Marloes H. Maathuis and
C{\'{e}}line Robardet},
title = {Machine Learning and Knowledge Discovery in Databases - European Conference,
{ECML} {PKDD} 2019, W{\"{u}}rzburg, Germany, September 16-20,
2019, Proceedings, Part {II}},
series = {Lecture Notes in Computer Science},
volume = {11907},
publisher = {Springer},
year = {2020},
url = {https://doi.org/10.1007/978-3-030-46147-8},
doi = {10.1007/978-3-030-46147-8},
isbn = {978-3-030-46146-1},
timestamp = {Mon, 27 Dec 2021 15:13:42 +0100},
biburl = {https://dblp.org/rec/conf/pkdd/2019-2.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}