This is the repository that contains source code for the BERDS website.
Tested on Python 3.8.
To use the repo, first clone the project.
git clone [email protected]:timchen0618/berds.gitAnd create a virtual environment (recommended).
cd berds/
python3 -m venv berds
source berds/bin/activateInstall the required packages.
pip install -r requirments.txtYou can find the data and model here.
You can load the data from huggingface, and later save it if needed.
from datasets import load_dataset
arguana_ds = load_dataset("timchen0618/Arguana")
kialo_ds = load_dataset("timchen0618/Kialo")
opinionqa_ds = load_dataset("timchen0618/OpinionQA")To run evaluation on your own retrieval outputs, you need to download the evaluator model.
You can load the model from huggingface, with the peft and transformers libraries.
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM
config = PeftConfig.from_pretrained("timchen0618/Mistral_BERDS_evaluator")
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = PeftModel.from_pretrained(base_model, "timchen0618/Mistral_BERDS_evaluator")Given a documents and a perspective, perspective detection is defined as "identifying whether the document supports or implies the perspective".
A perspective detection model is an essential component of the automatic evaluation.
More details on this can be found here.
We expect the output to be in a jsonl file, with each line being a JSON object.
Each element should follow the format below:
{
"perspectives": [p1, p2, ...],
"ctxs": [
{
"title": [title1],
"text": [retrieved_document1]
},
{
"title": [title2],
"text": [retrieved_document2]
},
...
]
}
For the ease of inspection, you could simply add the ctxs field to the original input jsonl file.
Each element in the ctxs should contain the text field. The title field is optional.
Run
cd eval/
PYTHONPATH=.. torchrun --nproc_per_node 1 --master-port [port] eval.py \
--data [path_to_retrieval_outputs] \
--output_file [path_to_eval_results] \
--instructions instructions.txt \
--model [path/to/evaluator/model] \
--model_type mistral \
--topk [k]If the [path/to/evaluator/model] is set to timchen0618/Mistral_BERDS_evaluator_full, the script will run the evaluator reported in the paper.
However, this will take 3~4 hours for a single dataset.
We also support (vLLM)[https://github.com/vllm-project/vllm] inference to reduce evaluation time.
Simply replace eval.py in the commands with eval_vllm.py. All the arguments are the same.
See run_eval.sh for an example.
In run_eval.sh, the outputs are saved to files named [dataset].jsonl.