HOPE: Hallucination searching-based Object Probing Evaluation

This repo provides the code and data for our proposed benchmark for object hallucination evaluation in large vision-language models (LVLMs). Please refer to our paper: What Makes "Good" Distractors for Object Hallucination Evaluation in Large Vision-Language Models?.

Preparing Data

See the README.md file in the data directory for instructions on downloading and preparing the datasets.

Quick Start

Conda environment

  • Install the required dependencies listed in requirements.txt:

    conda create --name hope python=3.8 -y
    conda activate hope
    pip install -r requirements.txt
    

How to perform evaluation on the HOPE benchmark?

  • Perform inference on the pre-generated json files. For example, the json file generated using the description-based strategy is organized in the following format:

      {"question_id": 1, "image": "val2014/COCO_val2014_000000095283.jpg", "text": "Please answer yes or no. Is there a person in the image?", "label": "yes"}
      {"question_id": 2, "image": "val2014/COCO_val2014_000000095283.jpg", "text": "Please answer yes or no. Is there person in enclosed area in the image?", "label": "no"}

    Perform evaluation on the generated answer json files. The answer json files should be organized in the following format:

      {"question": "Please answer yes or no. Is there a person in the image?", "answer": "yes"}
      {"question": "Please answer yes or no. Is there person in enclosed area in the image?", "answer": "no"}

    Obtain the evaluation results by running:

    python evaluate.py --prompt_type binary --ans_file xxx --label_file xxx
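As a rough illustration of what such an evaluation does, the sketch below loads the label and answer jsonl files, matches answers to labels by question text, and computes standard binary metrics. This is an assumption-laden sketch, not the repository's actual evaluate.py; the function name `binary_scores` and the matching-by-question logic are hypothetical.

```python
import json

def load_jsonl(path):
    """Read one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def binary_scores(ans_file, label_file):
    """Hypothetical sketch: match answers to labels by question text
    and compute accuracy / precision / recall / F1 / yes-ratio."""
    labels = {r["text"]: r["label"] for r in load_jsonl(label_file)}
    tp = fp = tn = fn = 0
    for r in load_jsonl(ans_file):
        gold = labels[r["question"]]
        pred = "yes" if "yes" in r["answer"].lower() else "no"
        if gold == "yes":
            tp += pred == "yes"
            fn += pred == "no"
        else:
            tn += pred == "no"
            fp += pred == "yes"
    total = tp + fp + tn + fn
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": prec,
        "recall": rec,
        "f1": 2 * prec * rec / (prec + rec) if prec + rec else 0.0,
        "yes_ratio": (tp + fp) / total,
    }
```

The yes-ratio is worth reporting alongside accuracy, since a model that answers "yes" to everything can look deceptively strong on positive-only questions.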
    

How to customize your HOPE benchmark?

  • When constructing category-oriented hallucination search and content-aware hallucination search data, one needs to run the following command to generate question-answer data on the Objects365 dataset.

    python main.py --dataset objects365 \
    --sample_num 2000 --pos_selected 6 --neg_selected 6 \
    --search_strategy similarity --prompt_type binary
    

    In the example above, we select 2,000 samples, choose 6 positive labels for each sample, use the similarity strategy to select 6 negative labels, and finally generate question-answer data in binary format.
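To make the similarity strategy concrete, here is a toy sketch of similarity-based distractor selection: among categories absent from the image, pick the ones whose embeddings are closest to any positive label. The function name `similarity_negatives`, the toy 2-D embeddings, and the max-over-positives scoring are all illustrative assumptions; the actual code presumably uses learned label embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def similarity_negatives(pos_labels, embeddings, k):
    """Hypothetical sketch: return the k absent categories most
    similar (by cosine) to any positive label of the image."""
    scored = []
    for cat, vec in embeddings.items():
        if cat in pos_labels:
            continue  # a distractor must be absent from the image
        score = max(cosine(vec, embeddings[p]) for p in pos_labels)
        scored.append((score, cat))
    scored.sort(reverse=True)
    return [cat for _, cat in scored[:k]]
```

The intuition matching the paper's premise: semantically close absent objects (e.g. "dog" when only a cat is present) are harder distractors than random ones.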

  • When constructing description-based hallucination search data, one needs to run the following command to obtain 500 samples that satisfy certain filtering conditions, each with 3 randomly selected positive labels.

    python main.py --dataset vg \
    --objs_filter 2000 --desc_filter 50 --labs_filter 10 \
    --sample_num 500 --pos_selected 3 \
    --search_strategy description
    

    The above command generates an Excel file containing images, their positive labels, and the corresponding misleading descriptions obtained by the description-based searching strategy. Users need to manually review the Excel file and highlight, for each positive label, a valid (i.e., actually an incorrect description) yet most misleading description. Finally, run the following command to generate the description-based hallucination question-answer data.

    python process.py --dataset vg
    

Statistics of the datasets

Table 1: Statistics on Category-Oriented Hallucination Searching and Content-Aware Hallucination Searching data.

| Dataset | #CLS | #Train | #Val | #Labels (Min~Max) | #Labels (Avg) | Samples | Selected Pos. | Selected Neg. |
|---|---|---|---|---|---|---|---|---|
| MS-COCO | 80 | 82,081 | 40,137 | 1~18 | 2.93 | 500 | 3 | 3 |
| Objects-365 | 365 | 1,742,289 | 80,000 | 1~36 | 6.17 | 2,000 | 6 | 6 |

Table 2: Statistics on Description-Based Hallucination Searching data.

| Dataset | Filter (1): Object Frequency | Filter (2): Descriptions (/object) | Filter (3): Objects (/image) | Objects | Descriptions | Samples |
|---|---|---|---|---|---|---|
| MS-COCO | - | - | - | 80 | 1~1,884 | 40,137 |
| VG | ≥ 2,000 | ≥ 50 | ≥ 10 | 265 | 90~4,401 | 88,730 |
| OpenImages | ≥ 1,000 | ≥ 50 | ≥ 10 | 97 | 71~2,009 | 4,435 |

Hyper-Parameters

To generate different entries of the main table, modify the following parameters:

  1. dataset: The dataset to use, options are ['coco2014', 'objects365', 'vg', 'openimages'].
  2. sample_num: The sample size for generating the benchmark.
  3. prompt_type: The generated question-answer data format can be selected from ['binary', 'multiopt'].
  4. pos_selected: The number of positive labels selected.
  5. neg_selected: The number of negative labels selected.
  6. search_strategy: The available attack strategies are category-oriented hallucination search ['random', 'popular', 'co_occurrence', 'similarity'], content-aware hallucination search ['content_clip', 'content_tagclip', 'union'] and description-based hallucination search ['description'].
  7. desc_filter: Filter out the objects with too few descriptions.
  8. objs_filter: Filter out the objects with low occurrence frequencies.
  9. labs_filter: Filter out the images with too few labels.
  10. ans_file: Optional explicit path to answers.json.
  11. label_file: Optional explicit path to labels.json.
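The parameter list above maps naturally onto a command-line interface. The sketch below shows how such a parser could be wired up with argparse; the defaults and the `build_parser` function are illustrative assumptions, not the repository's actual main.py.

```python
import argparse

def build_parser():
    """Hypothetical sketch mirroring the hyper-parameters listed above
    (defaults are illustrative, not the repository's)."""
    p = argparse.ArgumentParser(description="HOPE benchmark generation")
    p.add_argument("--dataset", required=True,
                   choices=["coco2014", "objects365", "vg", "openimages"])
    p.add_argument("--sample_num", type=int, default=500)
    p.add_argument("--prompt_type", choices=["binary", "multiopt"],
                   default="binary")
    p.add_argument("--pos_selected", type=int, default=3)
    p.add_argument("--neg_selected", type=int, default=3)
    p.add_argument("--search_strategy", default="random",
                   choices=["random", "popular", "co_occurrence", "similarity",
                            "content_clip", "content_tagclip", "union",
                            "description"])
    p.add_argument("--objs_filter", type=int, default=0)
    p.add_argument("--desc_filter", type=int, default=0)
    p.add_argument("--labs_filter", type=int, default=0)
    p.add_argument("--ans_file", default=None)
    p.add_argument("--label_file", default=None)
    return p
```

With this layout, the description-based command shown earlier parses directly, and `choices=` rejects invalid strategy/dataset combinations at the command line rather than deep inside the generation code.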
