Code and data for the EMNLP 2022 paper "PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation".
python >= 3.8
transformers >= 4.5
torch >= 1.10.1
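As a quick sanity check of your environment, the snippet below prints the installed versions; the version floors are the requirements listed above, and it assumes torch and transformers are already installed.
import sys
import torch
import transformers

# Version floors taken from the requirements above.
assert sys.version_info >= (3, 8), "python >= 3.8 is required"
print("torch:", torch.__version__)                 # expected >= 1.10.1
print("transformers:", transformers.__version__)   # expected >= 4.5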
The original LOGICNLG dataset consists of three .json files for the train/dev/test splits and a directory all_csv/ of .csv files for the tables.
To extract the csv files:
cd data/logicnlg
unzip all_csv.zip
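For reference, here is a minimal sketch of reading one sample, assuming the layout of the original LOGICNLG release (split files such as train_lm.json keyed by table id, and '#'-delimited csv tables); adjust the file names to whatever is shipped in data/logicnlg.
import json
import pandas as pd

# File name follows the original LOGICNLG release (train_lm.json / val_lm.json / test_lm.json);
# adjust if this repository uses different names.
with open("data/logicnlg/train_lm.json") as f:
    samples = json.load(f)

table_id = next(iter(samples))       # e.g. "2-12221135-3.html.csv"
annotations = samples[table_id]      # list of annotated sentences for this table

# The original LOGICNLG tables use '#' as the column delimiter.
table = pd.read_csv(f"data/logicnlg/all_csv/{table_id}", delimiter="#")
print(table_id, len(annotations))
print(table.head())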
CONTLOG is collected based on the Logic2text dataset. We provide the table-to-text data, together with highlighted cells and pre-computed cell information, in data/contlog.
Download the pretraining data from here.
Fine-tune a table-to-text model on LOGICNLG (starting from a PLOG pretrained checkpoint):
CUDA_VISIBLE_DEVICES=0 python train_logicnlg.py --do_train \
--model [t5-base|t5-large|facebook/bart-large] \
--task text \
--data_path data/logicnlg \
--use_cache \
--affix [experiment id] \
--interval_type epoch \
--pre_com \
--load_from [pretrained model checkpoint path]
Fine-tune a table-to-text model on CONTLOG:
CUDA_VISIBLE_DEVICES=0 python train_contlog.py --do_train \
--model [t5-base|t5-large|facebook/bart-large] \
--task text \
--data_path data/contlog \
--affix [experiment id] \
--interval_type epoch \
--pre_com \
--load_from [pretrained model checkpoint path]
Run inference on LOGICNLG:
CUDA_VISIBLE_DEVICES=0 python train_logicnlg.py --do_test \
--model [t5-base|t5-large|facebook/bart-large] \
--task text \
--data_path data/logicnlg \
--use_cache \
--affix [experiment id] \
--pre_com \
--load_from [checkpoint path]
Run inference on CONTLOG:
CUDA_VISIBLE_DEVICES=0 python train_contlog.py --do_test \
--model [t5-base|t5-large|facebook/bart-large] \
--task text \
--data_path data/contlog \
--affix [experiment id] \
--pre_com \
--load_from [checkpoint path]
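If you just want to poke at a fine-tuned checkpoint outside these scripts, the sketch below is one hedged option; it assumes the checkpoint was saved in the standard Hugging Face format (the path and the input linearization here are only illustrative and may not match this repository's preprocessing).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint path; point this at a directory produced by the training scripts,
# assuming it was saved with save_pretrained (adjust if the repo stores raw state dicts instead).
ckpt = "checkpoints/plog-bart-large-logicnlg"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

# The exact table linearization is defined by the repo's preprocessing; this string is only an example.
source = "title : summer olympics | year : 2008 | host city : beijing | year : 2012 | host city : london"
ids = tokenizer(source, return_tensors="pt").input_ids
out = model.generate(ids, num_beams=4, max_length=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))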
Table-to-logic pretraining (--task logic) with the LOGICNLG-side data:
CUDA_VISIBLE_DEVICES=0 python train_logicnlg.py --do_train \
--model [t5-base|t5-large|facebook/bart-large] \
--task logic \
--data_path data/logicnlg \
--use_cache \
--affix [experiment id] \
--interval_type step \
--pre_com
Table-to-logic pretraining (--task logic) with the CONTLOG-side data:
CUDA_VISIBLE_DEVICES=0 python train_contlog.py --do_train \
--model [t5-base|t5-large|facebook/bart-large] \
--task logic \
--data_path data/contlog \
--affix [experiment id] \
--interval_type step \
--pre_com
We provide the model outputs in model_outputs/ and evaluation scripts for TAPEX-Acc and TAPAS-Acc in scripts/.
To use TAPEX, you may need to install the latest version of Transformers and Datasets:
pip install -U datasets
pip install -U transformers
To evaluate CONTLOG outputs with TAPEX-Acc:
CUDA_VISIBLE_DEVICES=0 python scripts/eval_contlog_with_tapex.py --do_predict \
--model_name_or_path microsoft/tapex-large-finetuned-tabfact \
--test_name model_outputs/contlog/plog-bart-large.txt \
--split_name test \
--output_dir tapex-contlog-eval \
--affix plog-bart-large \
--data_dir data/contlog \
--per_device_eval_batch_size 12 \
--eval_accumulation_steps 6
To evaluate LOGICNLG outputs with TAPEX-Acc:
CUDA_VISIBLE_DEVICES=0 python scripts/eval_logicnlg_with_tapex.py --do_predict \
--model_name_or_path microsoft/tapex-large-finetuned-tabfact \
--test_name model_outputs/logicnlg/plog-bart-large.txt \
--split_name test \
--output_dir tapex-logic-eval \
--affix plog-bart-large \
--data_dir data/logicnlg \
--per_device_eval_batch_size 12 \
--eval_accumulation_steps 6
Here, --test_name is the path to a model output file, --affix is the experiment ID, --output_dir is the directory for storing intermediate data files and prediction results, and --data_dir is the path to the original data files.
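Roughly, TAPEX-Acc checks each generated sentence against its table with the TabFact-finetuned TAPEX classifier and reports the fraction of sentences labeled as entailed (TAPAS-Acc is analogous, using a TabFact-finetuned TAPAS model). The snippet below is a minimal sketch of a single check with the checkpoint used above; it requires a recent transformers release, the toy table and sentence are illustrative, and all table cells must be strings.
import pandas as pd
import torch
from transformers import TapexTokenizer, BartForSequenceClassification

name = "microsoft/tapex-large-finetuned-tabfact"
tokenizer = TapexTokenizer.from_pretrained(name)
model = BartForSequenceClassification.from_pretrained(name).eval()

# Toy table and generated sentence; TapexTokenizer expects a pandas DataFrame of strings.
table = pd.DataFrame({"year": ["2008", "2012"], "host city": ["beijing", "london"]})
sentence = "beijing hosted the games 4 years before london"

encoding = tokenizer(table=table, query=sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**encoding).logits
# In the released TabFact checkpoint, label index 1 corresponds to "entailed".
entailed = logits.argmax(dim=-1).item() == 1
print("faithful" if entailed else "unfaithful")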
To evaluate with TAPAS-Acc:
CUDA_VISIBLE_DEVICES=0 python scripts/eval_contlog_with_tapas.py --test_file model_outputs/contlog/plog-bart-large.txt \
--data_dir data/contlog \
--batch_size 4 \
--split_name test
CUDA_VISIBLE_DEVICES=0 python scripts/eval_logicnlg_with_tapas.py --test_file model_outputs/logicnlg/plog-bart-large.json \
--data_dir data/logicnlg \
--batch_size 4
If you find this project useful in your work, please consider citing the paper:
@article{liu2022plog,
title={PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation},
author={Liu, Ao and Dong, Haoyu and Okazaki, Naoaki and Han, Shi and Zhang, Dongmei},
journal={arXiv preprint arXiv:2205.12697},
year={2022}
}
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include Microsoft, Azure, DotNet, AspNet, Xamarin, and our GitHub organizations.
If you believe you have found a security vulnerability in any Microsoft-owned repository that meets Microsoft's definition of a security vulnerability, please report it to us through https://docs.opensource.microsoft.com/releasing/securing-content/reporting-security-issues/.