models

This folder contains the r2c models, which broadly follow the allennlp configuration format. If you want the r2c model itself, look at multiatt.

Replicating validation results

Here's how you can replicate my validation results using the commands below. First, you might want to make your GPUs available. When I ran these experiments, I used:

source activate r2c && export LD_LIBRARY_PATH=/usr/local/cuda-9.0/ && export PYTHONPATH=/home/rowan/code/r2c && export CUDA_VISIBLE_DEVICES=0,1,2

For question answering, run:

python train.py -params multiatt/default.json -folder saves/flagship_answer

For answer justification, run:

python train.py -params multiatt/default.json -folder saves/flagship_rationale -rationale

You can combine the validation predictions using:

python eval_q2ar.py -answer_preds saves/flagship_answer/valpreds.npy -rationale_preds saves/flagship_rationale/valpreds.npy

Submitting to the leaderboard

VCR features a leaderboard where you can submit your answers on the test set. Submitting to the leaderboard is easy! You'll need to submit something like the example submission CSV file. You can use the eval_for_leaderboard.py script, which formats everything in the right way.

Essentially, your submission has to have the following columns:

annot_id,answer_0,answer_1,answer_2,answer_3,rationale_conditioned_on_a0_0,rationale_conditioned_on_a0_1,rationale_conditioned_on_a0_2,rationale_conditioned_on_a0_3,rationale_conditioned_on_a1_0,rationale_conditioned_on_a1_1,rationale_conditioned_on_a1_2,rationale_conditioned_on_a1_3,rationale_conditioned_on_a2_0,rationale_conditioned_on_a2_1,rationale_conditioned_on_a2_2,rationale_conditioned_on_a2_3,rationale_conditioned_on_a3_0,rationale_conditioned_on_a3_1,rationale_conditioned_on_a3_2,rationale_conditioned_on_a3_3
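The column list above follows a regular pattern, so it can be generated instead of typed by hand. The snippet below is a minimal sketch, not a replacement for eval_for_leaderboard.py: the helper names `submission_columns` and `write_submission` are made up for illustration, and it assumes each row is a dict that maps every column name to a value.

```python
import csv


def submission_columns(n_choices=4):
    """Build the header: annot_id, the answer columns, then the rationale
    columns grouped by the answer they are conditioned on."""
    cols = ["annot_id"]
    cols += [f"answer_{i}" for i in range(n_choices)]
    cols += [
        f"rationale_conditioned_on_a{a}_{r}"
        for a in range(n_choices)
        for r in range(n_choices)
    ]
    return cols


def write_submission(rows, out):
    """Write rows (dicts keyed by column name) to a text file object.

    Hypothetical helper: the values stored per cell are whatever the
    leaderboard expects, which this sketch does not check.
    """
    writer = csv.DictWriter(out, fieldnames=submission_columns())
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

With four answer choices this yields 21 columns: one id, four answer columns, and sixteen rationale columns.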

To evaluate, I'll first take the argmax over your answer choices, then take the argmax over your rationale choices (conditioned on the right answers). These give two sets of predictions, which are used to compute Q->A and QA->R accuracy. For Q->AR accuracy, I take a bitwise AND between the Q->A hits and the QA->R hits. In other words, to get a question right, you have to get both the answer and the rationale right.
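The scoring described above can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not the leaderboard's actual code: the function name `vcr_accuracies` is hypothetical, answer scores are assumed to be an (N, 4) array, and rationale scores are assumed to be an (N, 4, 4) array indexed as [example, conditioning answer, rationale].

```python
import numpy as np


def vcr_accuracies(answer_scores, rationale_scores, answer_labels, rationale_labels):
    """Compute Q->A, QA->R and Q->AR accuracy (hypothetical helper).

    answer_scores:    (N, 4) scores over the answer choices
    rationale_scores: (N, 4, 4) scores, [example, conditioning answer, rationale]
    answer_labels:    (N,) index of the correct answer
    rationale_labels: (N,) index of the correct rationale
    """
    answer_scores = np.asarray(answer_scores)
    rationale_scores = np.asarray(rationale_scores)
    answer_labels = np.asarray(answer_labels)
    rationale_labels = np.asarray(rationale_labels)

    # Q->A: argmax over the answer choices.
    answer_hits = answer_scores.argmax(axis=1) == answer_labels

    # QA->R: keep only the rationale scores conditioned on the RIGHT answer,
    # then take the argmax over the rationale choices.
    rows = np.arange(len(answer_labels))
    conditioned = rationale_scores[rows, answer_labels]  # shape (N, 4)
    rationale_hits = conditioned.argmax(axis=1) == rationale_labels

    # Q->AR: an example counts only if both answer and rationale are right.
    joint_hits = answer_hits & rationale_hits

    return {
        "Q->A": float(answer_hits.mean()),
        "QA->R": float(rationale_hits.mean()),
        "Q->AR": float(joint_hits.mean()),
    }
```

Because Q->AR is an AND of the two hit vectors, it can never exceed either Q->A or QA->R accuracy.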
