This repo provides the code and data of our proposed benchmark for object hallucination evaluation in LVLMs. Please refer to our paper: What Makes "Good" Distractors for Object Hallucination Evaluation in Large Vision-Language Models?.
See the README.md file in the data directory for instructions on downloading and preparing the datasets.
- Install the required dependency packages according to `requirements.txt`:

```shell
conda create --name hope python=3.8 -y
conda activate hope
pip install -r requirements.txt
```
- Perform inference on the pre-generated json files. For example, the json file generated using the description-based strategy is organized in the following format:

```json
{"question_id": 1, "image": "val2014/COCO_val2014_000000095283.jpg", "text": "Please answer yes or no. Is there a person in the image?", "label": "yes"}
{"question_id": 2, "image": "val2014/COCO_val2014_000000095283.jpg", "text": "Please answer yes or no. Is there person in enclosed area in the image?", "label": "no"}
```

- Perform evaluation on the generated answer json files. The answer json files should be organized in the following format:

```json
{"question": "Please answer yes or no. Is there a person in the image?", "answer": "yes"}
{"question": "Please answer yes or no. Is there person in enclosed area in the image?", "answer": "no"}
```

Obtain the evaluation results by running:

```shell
python evaluate.py --prompt_type binary --ans_file xxx --label_file xxx
```
- When constructing category-oriented hallucination search and content-aware hallucination search data, run the following command to generate question-answer data on the Objects365 dataset:

```shell
python main.py --dataset objects365 \
    --sample_num 2000 --pos_selected 6 --neg_selected 6 \
    --search_strategy similarity --prompt_type binary
```

In this example, we select 2,000 samples, choose 6 positive labels for each sample, use the similarity strategy to select 6 negative labels, and finally generate question-answering data in binary format.
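The core idea of the similarity strategy is to pick distractor (negative) labels that are semantically closest to the image's positive labels. A hedged sketch of that selection, using plain cosine similarity over label embeddings (the `embed` lookup is a stand-in supplied by the caller; the actual implementation lives in `main.py` and may use different embeddings and scoring):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_negatives(pos_labels, candidates, embed, k):
    """Rank candidate labels absent from the image by their maximum cosine
    similarity to any positive label, and keep the top k as distractors."""
    pool = [c for c in candidates if c not in pos_labels]
    scored = [(max(cosine(embed[c], embed[p]) for p in pos_labels), c)
              for c in pool]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]
```

With toy 2-D embeddings, labels near a positive label ("dog") outrank unrelated ones ("car"), which is exactly what makes them harder distractors.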
- When constructing description-based hallucination search data, run the following command to obtain 500 samples that satisfy specific filtering conditions, each with 3 randomly selected positive labels:

```shell
python main.py --dataset vg \
    --objs_filter 2000 --desc_filter 50 --labs_filter 10 \
    --sample_num 500 --pos_selected 3 \
    --search_strategy description
```

The above command generates an Excel file containing the images, their positive labels, and the misleading descriptions obtained by the description-based searching strategy. Users need to manually review the Excel file and highlight a valid (i.e., actually incorrect) yet most misleading description for each positive label. Finally, run the following command to generate the description-based hallucination question-answer data:

```shell
python process.py --dataset vg
```
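Once the positive and negative labels (or reviewed descriptions) are fixed, turning them into questions is a matter of filling the binary template shown in the jsonl example above. A sketch of that last step (the template and field names mirror the example entries; the actual generation is done by `main.py`/`process.py`):

```python
def build_questions(image, pos_labels, neg_labels, start_id=1):
    """Emit one binary yes/no question per label in the benchmark's jsonl
    schema: positive labels get "yes", negative (distractor) labels "no"."""
    entries = []
    qid = start_id
    for obj, lab in [(o, "yes") for o in pos_labels] + [(o, "no") for o in neg_labels]:
        entries.append({
            "question_id": qid,
            "image": image,
            "text": f"Please answer yes or no. Is there a {obj} in the image?",
            "label": lab,
        })
        qid += 1
    return entries
```

Each returned dict can be serialized with `json.dumps`, one per line, to produce a question file in the format consumed during inference.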
Table 1: Statistics on Category-Oriented Hallucination Searching and Content-Aware Hallucination Searching data.
| Dataset | #CLS | #Train | #Val | #Labels (Min~Max) | #Labels (Avg) | Samples | Selected Pos. | Selected Neg. |
|---|---|---|---|---|---|---|---|---|
| MS-COCO | 80 | 82,081 | 40,137 | 1~18 | 2.93 | 500 | 3 | 3 |
| Objects-365 | 365 | 1,742,289 | 80,000 | 1~36 | 6.17 | 2,000 | 6 | 6 |
Table 2: Statistics on Description-Based Hallucination Searching data.
| Dataset | Object Frequency (Filter 1) | Descriptions /object (Filter 2) | Objects /image (Filter 3) | #Objects | #Descriptions | #Samples |
|---|---|---|---|---|---|---|
| MS-COCO | - | - | - | 80 | 1 - 1,884 | 40,137 |
| VG | 2,000 | 50 | 10 | 265 | 90 - 4,401 | 88,730 |
| OpenImages | | | | 97 | 71 - 2,009 | 4,435 |
To generate different entries of the main table, modify the following parameters:

- `dataset`: The dataset to use; options are `['coco2014', 'objects365', 'vg', 'openimages']`.
- `sample_num`: The sample size for generating the benchmark.
- `prompt_type`: The generated question-answer data format; one of `['binary', 'multiopt']`.
- `pos_selected`: The number of positive labels selected.
- `neg_selected`: The number of negative labels selected.
- `search_strategy`: The available attack strategies: category-oriented hallucination search `['random', 'popular', 'co_occurrence', 'similarity']`, content-aware hallucination search `['content_clip', 'content_tagclip', 'union']`, and description-based hallucination search `['description']`.
- `desc_filter`: Filter out objects with too few descriptions.
- `objs_filter`: Filter out objects with low occurrence frequencies.
- `labs_filter`: Filter out images with too few labels.
- `ans_file`: Optional explicit path to answers.json.
- `label_file`: Optional explicit path to labels.json.