Reproduction Code for Paper "Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models". The preprint of our paper is publicly available at this link.
The project requires the setup of two separate environments. Here are the steps to configure each environment:
conda create -n llava python=3.10 -y
conda activate llava
pip install -r requirements_llava.txt
conda create -n qwen python=3.10 -y
conda activate qwen
pip install -r requirements_qwen.txt
The models required for our experiments are Qwen-VL-Chat and vicuna-7b-v1.5. Please download these two models and place them in a custom directory for use in subsequent experiments.
We conduct experiments using the following datasets: COCO, GQA, OCR-VQA, TextVQA, and VisualGenome.
Please download the datasets and their description files, and structure each description file according to the following format:
{
"id": "000000341431",
"image": "coco/train2017/000000341431.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nWhat period of day does the image capture?"
},
{
"from": "gpt",
"value": "The image captures a nocturnal scene."
},
{
"from": "human",
"value": "How is the road illuminated in the image?"
},
{
"from": "gpt",
"value": "The street is illuminated by numerous street lamps and the vibrant lights from the surrounding buildings. Additionally, there is a time lapse effect causing blurred lights on the road."
},
{
"from": "human",
"value": "Is there any distinctive architecture in the image?"
},
{
"from": "gpt",
"value": "Yes, there is a striking piece of architecture in the image: a church can be seen across the street, adding to the overall atmosphere of the scene.\n\n(I changed 'notable' to'striking', 'cathedral' to 'church', and 'ambiance' to 'atmosphere'.)"
},
{
"from": "human",
"value": "What is situated in the middle of the thoroughfare?"
},
{
"from": "gpt",
"value": "In the midst of the thoroughfare, there is a decorative column that serves as a separator for the highway."
},
{
"from": "human",
"value": "What does the cinematic effect in the image achieve?"
},
{
"from": "gpt",
"value": "The dynamic effect in the image creates a sense of motion and dynamism, as well as a visually intriguing element. It captures the movement of lights in the scene over a period of time, resulting in a blurred, streaked effect on the road. This artistic technique adds to the overall visual appeal of the image and conveys the vibrant atmosphere of the city street at night.\n\n(I changed \"time-lapse\" to \"dynamic\", and \"aesthetic\" to \"visual\")"
}
]
}
Place the description files under data/llava and data/qwen respectively, then filter the dataset according to the description files using tools/choose.py and split it to obtain the training set using tools/split.py:
python choose.py
python split.py
The training set should be organized as follows:
data/
└── data_without_privacy/
    └── train/
        ├── coco/
        │   └── train2017/
        ├── gqa/
        │   └── images/
        ├── ocr_vqa/
        │   └── images/
        ├── textvqa/
        │   └── train_images/
        └── vg/
            ├── VG_100K/
            └── VG_100K_2/
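Before fine-tuning, it can be useful to confirm that a description file matches the format shown earlier and that every referenced image exists under the training folder. The following is a minimal sketch and is not part of the repository: the function name check_description_file is hypothetical, and it assumes each description file is a JSON list of records like the example above.

```python
import json
from pathlib import Path


def check_description_file(description_path, image_root):
    """Return a list of human-readable problems found in a description file.

    Assumption: the file holds a JSON list of records, each with the keys
    "id", "image" and "conversations" as in the README example.
    """
    records = json.loads(Path(description_path).read_text(encoding="utf-8"))
    problems = []
    for index, record in enumerate(records):
        missing = [key for key in ("id", "image", "conversations") if key not in record]
        if missing:
            problems.append(f"record {index}: missing keys {missing}")
            continue
        turns = record["conversations"]
        # Turns are expected to alternate human, gpt, human, gpt, ...
        expected = ["human", "gpt"] * (len(turns) // 2)
        if len(turns) % 2 or [turn.get("from") for turn in turns] != expected:
            problems.append(f"record {record['id']}: turns do not alternate human/gpt")
        # The "image" field is resolved relative to the training image root.
        if not (Path(image_root) / record["image"]).is_file():
            problems.append(f"record {record['id']}: image not found: {record['image']}")
    return problems
```

Calling check_description_file with a description file under data/llava and the folder data/data_without_privacy/train as image_root returns an empty list when every record is well formed and every image is present.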
In this project, we need datasets with embedded privacy. The following are the steps for generating the privacy dataset:
- Generate private information using tools/generate_user_info.py:
python generate_user_info.py
- Use tools/add_privacy_to_image.py to embed the private information into the dataset, producing the privacy-embedded dataset. Modify the code at line 32 to adjust the embedding rate.
python add_privacy_to_image.py
Additionally, our experiments utilize datasets with text and image augmentations. Use tools/augmentation/text_augmentation.py and tools/augmentation/image_augmentation.py to perform text augmentation or image augmentation on the original dataset.
python text_augmentation.py
python image_augmentation.py
For the LLaVA model, use finetune/finetune_lora_llava.sh for fine-tuning. Modify --data_path to use either the original description files or the augmented description files, and modify --image_folder to use the original image data, augmented image data, or privacy-embedded image data.
bash finetune_lora_llava.sh
For the Qwen model, use finetune/finetune_lora_qwenvl.sh for fine-tuning. Modify --data_path to use various datasets.
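When preparing privacy-embedded image data at several embedding rates for these fine-tuning runs, it helps if the choice of affected images is reproducible. The sketch below is illustrative only and does not reproduce tools/add_privacy_to_image.py: the function name select_images_for_embedding is hypothetical, and it assumes the embedding rate denotes the fraction of training images that receive private content.

```python
import random


def select_images_for_embedding(image_paths, embedding_rate, seed=0):
    """Pick a reproducible subset of images to receive private content.

    Assumption: embedding_rate is the fraction of images (0 to 1) that
    get private information embedded; the rest stay unchanged.
    """
    if not 0.0 <= embedding_rate <= 1.0:
        raise ValueError("embedding_rate must be between 0 and 1")
    # Sort first so the selection does not depend on the input order.
    ordered = sorted(image_paths)
    count = round(len(ordered) * embedding_rate)
    # A dedicated Random instance keeps the global random state untouched.
    return set(random.Random(seed).sample(ordered, count))
```

Fixing the seed means that rerunning the selection with the same rate yields the same subset, which keeps comparisons between runs at different rates easier to interpret.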
To explore how task-irrelevant content might affect the finetuning process, we examine the performance of the MLLMs on standard VQA tasks before and after embedding the task-irrelevant private content.
- ScienceQA: Under data/eval/scienceqa, download images, pid_splits.json, and problems.json from the data/scienceqa folder of the ScienceQA repo, along with scienceqa_test_img.jsonl.
For LLaVA, run single-GPU inference and evaluation:
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/llava/sqa.sh
For Qwen-VL, run scripts/eval/qwen/evaluate_multiple_choice.py as follows.
ds="scienceqa_test_img"
checkpoint=/PATH/TO/CHECKPOINT
python -m torch.distributed.launch --use-env \
--nproc_per_node ${NPROC_PER_NODE:-8} \
--nnodes ${WORLD_SIZE:-1} \
--node_rank ${RANK:-0} \
--master_addr ${MASTER_ADDR:-127.0.0.1} \
--master_port ${MASTER_PORT:-12345} \
evaluate_multiple_choice.py \
--checkpoint $checkpoint \
--dataset $ds \
--batch-size 8 \
--num-workers 2
- MME: Download the data following the official instructions here.
For LLaVA, download the images to MME_Benchmark_release_version, put the official eval_tool and MME_Benchmark_release_version under data/eval/MME, then run single-GPU inference and evaluation:
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/llava/mme.sh
For Qwen-VL, rearrange the images by executing python get_images.py, then evaluate the Qwen-VL-Chat results by executing python eval.py.
Modify the Transformers library to add a path for saving gradient outputs, then run the fine-tuning script.
After obtaining the results using training datasets with different privacy rates, run the gradient similarity comparison code in tools/compute_gradients.
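As a rough illustration of what a gradient similarity comparison can involve, the sketch below computes the cosine similarity between two flattened gradient vectors. This is an assumption for illustration: the README does not state which similarity measure the scripts in tools/compute_gradients use, and the function name cosine_similarity is hypothetical.

```python
import math


def cosine_similarity(grad_a, grad_b):
    """Cosine similarity between two flattened gradients (sequences of floats).

    Assumption: both gradients were saved for the same parameters, so they
    have the same length and matching element order.
    """
    if len(grad_a) != len(grad_b):
        raise ValueError("gradients must have the same number of elements")
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(a * a for a in grad_a))
    norm_b = math.sqrt(sum(b * b for b in grad_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero gradient")
    return dot / (norm_a * norm_b)
```

A value near 1 indicates that two training configurations push the parameters in nearly the same direction, a value near 0 indicates nearly orthogonal updates, and a negative value indicates opposing updates.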
LLaVA:
python gradients_llava.py
Qwen-VL:
python gradients_qwen.py
After fine-tuning, divide the test set according to the requirements in the paper, and then follow the steps below to complete the probing experiments and line chart plotting for Qwen-VL.
- Use probing/run_qwen.py to generate result files in probing/results.
python run_qwen.py --model-base --query
- Use probing/results/analyze.py to generate line charts based on the results.
python analyze.py