Zoher15/Zero-shot-PGT

Prefill-Guided Thinking for zero-shot detection of AI-generated images

Python 3.11 · PyTorch 2.7.1 · vLLM 0.10.1 · License · arXiv

🤝 Citation

If you use this code in your research, please cite our paper:

@misc{kachwala2025prefillguidedthinking,
      title={Prefill-Guided Thinking for zero-shot detection of AI-generated images}, 
      author={Zoher Kachwala and Danishjeet Singh and Danielle Yang and Filippo Menczer},
      year={2025},
      eprint={2506.11031},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.11031}, 
}

Note: Paper submitted to ACL ARR.


Sample images from D3, DF40, and GenImage datasets

Can you tell which images above are real vs AI-generated? Answer in footnote¹

This repository contains the evaluation system for our paper on using Prefill-Guided Thinking (PGT) to detect AI-generated images with Vision-Language Models (VLMs).

💡 For detailed technical documentation, particularly helpful for LLM code agents: see AGENTS.md for complete architecture details, function signatures, and implementation specifics.

Key Finding: Simply prefilling a VLM's response with the phrase "Let's examine the style and the synthesis artifacts" improves detection by up to 24% in Macro F1, without any training or fine-tuning.

🎯 What is Prefill-Guided Thinking?

PGT applied to a Midjourney-generated strawberry

Instead of asking a VLM to detect fake images directly, we prefill its response to guide its reasoning:

  • (a) Baseline: Direct query → incorrect classification (real)
  • (b) Chain-of-Thought: "Let's think step by step" → still incorrect
  • (c) S2 (our method): "Let's examine the style and the synthesis artifacts" → correct ✓

This simple technique works across 3 VLMs and 16 different image generators spanning faces, objects, and natural scenes.
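The mechanics above can be sketched in a few lines: the assistant turn of the prompt is seeded with a fixed phrase, so the model continues generating from it instead of starting its answer cold. This is a minimal illustration; the exact chat template and prompt layout used by each VLM in the repo may differ.

```python
# Sketch of Prefill-Guided Thinking: seed the assistant turn with a fixed
# phrase so generation continues from it. The prompt layout below is
# illustrative, not the repo's exact chat template.

PREFILLS = {
    "baseline": "",
    "cot": "Let's think step by step",
    "s2": "Let's examine the style and the synthesis artifacts",
}

def build_prompt(question: str, method: str) -> str:
    """Compose a chat-style prompt whose assistant turn starts with the prefill."""
    prefill = PREFILLS[method]
    return (
        f"USER: <image>\n{question}\n"
        f"ASSISTANT: {prefill}"
    )

prompt = build_prompt("Is this image real or AI-generated?", "s2")
```

Because the prefill is part of the assistant's own (partial) response rather than the user's question, the model treats it as reasoning it has already committed to and continues in that direction.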


🚀 Quick Start

Installation

See SETUP.md for complete environment setup instructions (conda, PyTorch, vLLM, Flash-Attention).

Usage

See Usage Examples for detailed command-line examples and all available options.


📊 Datasets

We evaluate on three diverse benchmarks:

| Dataset  | Content                                   | Images | Generators                                |
|----------|-------------------------------------------|--------|-------------------------------------------|
| D3       | Diverse web images (objects, scenes, art) | 8.4k   | 4 (Stable Diffusion variants, DeepFloyd)  |
| DF40     | Human faces (deepfakes)                   | 10k    | 6 (Midjourney, StyleCLIP, StarGAN, etc.)  |
| GenImage | ImageNet objects (animals, vehicles)      | 10k    | 8 (ADM, BigGAN, GLIDE, etc.)              |

Setup Data

See Data Collection & Setup for complete instructions on downloading and organizing all three datasets.


🧪 Supported Models

  • Qwen2.5-VL-7B – Dynamic-resolution Vision Transformer
  • LLaVA-OneVision-7B – Multimodal instruction-following model
  • Llama-3.2-Vision-11B – Vision adapter + Llama 3.1 LM

All models use instruction-tuned variants via vLLM for efficient inference.


🎨 Three Evaluation Methods

| Method   | Description                       |
|----------|-----------------------------------|
| Baseline | No prefill, just ask the question |
| CoT      | Chain-of-thought reasoning        |
| S2       | Task-aligned (our method)         |

See Usage Examples for detailed command-line examples and all available options.


📈 Results

Macro F1 performance comparison

Detection Macro F1 across models, datasets, and PGT variations. Bars show relative improvement of S2 over the next best method, with 95% confidence intervals from 10k bootstrap iterations.
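The bootstrap procedure behind those confidence intervals can be sketched as follows: resample the prediction set with replacement, recompute Macro F1 on each resample, and take the 2.5th/97.5th percentiles. This is a stdlib-only illustration (function names and labels are ours, not the repo's), and the paper's exact bootstrap details may differ.

```python
# Percentile bootstrap for Macro F1, sketched with the standard library only.
import random

def macro_f1(y_true, y_pred, labels=("real", "fake")):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def bootstrap_ci(y_true, y_pred, iters=10_000, alpha=0.05, seed=0):
    """95% CI (default) from `iters` resamples with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int(alpha / 2 * iters)]
    hi = scores[int((1 - alpha / 2) * iters) - 1]
    return lo, hi
```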

Per-generator recall for Llama

Detection recall (%) for Llama on each dataset, broken down by generator. Similar figures for LLaVA and Qwen appear in the paper.

Interpretability: Confidence Progression

To understand how prefills affect reasoning, we track answer confidence at five partial-response intervals (0–100% of sentences):

Partial response intervals

At each interval, we probe for the model's answer and confidence. The results reveal a striking pattern:

Confidence progression for Qwen

Evolution of answer confidence and Macro F1 across partial responses for Qwen. Baseline queries trigger immediate high confidence despite poor detection: the model commits to an answer before examining the image. Prefills induce a confidence dip mid-response, with detection improving steadily as the response progresses.
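The truncation step of this probe can be sketched simply: cut the full response at fixed fractions of its sentences, then re-feed each truncation to the model for an answer/confidence probe. The sentence splitter below is a naive regex stand-in for whatever segmentation the repo actually uses.

```python
# Build the partial responses (0%, 25%, 50%, 75%, 100% of sentences) that
# are probed at each interval. Sentence splitting here is deliberately naive.
import re

def partial_responses(text: str, fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Map each fraction to the response truncated at that share of sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    out = {}
    for f in fractions:
        k = round(f * len(sentences))
        out[f] = " ".join(sentences[:k])
    return out
```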


🔬 Advanced Usage

  • Multi-Response Generation (n>1) - Generate multiple responses with majority voting → Details
  • Phrase Modes - Test prefill vs prompt vs system instruction → Details
  • Debug Mode - Quick validation with 5 examples → Details
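The voting step for multi-response generation can be sketched in a few lines; how labels are extracted from each sampled response is the repo's business, so the helper below assumes they are already parsed.

```python
# Minimal majority vote over n>1 sampled labels; ties fall back to the
# earliest-seen label. Label parsing from raw responses is assumed done.
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label, breaking ties by first occurrence."""
    counts = Counter(labels)
    return max(counts, key=lambda lab: (counts[lab], -labels.index(lab)))
```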

📂 Output Structure

Results are saved in hierarchical directories with timestamped JSON files containing metrics and full reasoning traces.

See Output Structure for detailed file organization and JSON schemas.
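As a rough illustration of the pattern (timestamped file names, hierarchical run directories), the snippet below writes a results dict to such a file. The field names and helper are hypothetical; the repo's actual JSON schema is documented in Output Structure.

```python
# Illustrative writer for timestamped results files; field names are
# assumptions, not the repo's actual schema.
import json
import pathlib
import tempfile
import time

def save_results(results: dict, out_dir: str) -> pathlib.Path:
    """Write one run's metrics and traces to a timestamped JSON file."""
    stamp = time.strftime("%Y%m%d_%H%M%S")
    path = pathlib.Path(out_dir) / f"results_{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(results, indent=2))
    return path

with tempfile.TemporaryDirectory() as d:
    path = save_results({"macro_f1": 0.91, "responses": []}, d)
    loaded = json.loads(path.read_text())
```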


📊 Visualization & Analysis

Generate publication-ready plots (Macro F1 bars, radar plots, vocabulary analysis, etc.).

See Plotting & Visualization System for available plots and usage instructions.


📚 Documentation

  • SETUP.md - Environment setup and installation instructions
  • AGENTS.md - Complete technical reference (architecture, function signatures, all details)
  • Paper - arXiv:2506.11031

👥 Authors

Zoher Kachwala · Danishjeet Singh · Danielle Yang · Filippo Menczer

Observatory on Social Media, Indiana University, Bloomington


¹ Answer to image quiz: Only images 3, 10, and 11 in the mosaic are real. All others are AI-generated.