Zoher15/Zero-shot-PGT

Prefill-Guided Thinking for zero-shot detection of AI-generated images

Python 3.11 · PyTorch 2.7.1 · vLLM 0.10.1 · License · arXiv

🤝 Citation

If you use this code in your research, please cite our paper:

@misc{kachwala2025prefillguidedthinking,
      title={Prefill-Guided Thinking for zero-shot detection of AI-generated images}, 
      author={Zoher Kachwala and Danishjeet Singh and Danielle Yang and Filippo Menczer},
      year={2025},
      eprint={2506.11031},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.11031}, 
}

Note: Paper submitted to ACL ARR.


Sample images from D3, DF40, and GenImage datasets

Can you tell which images above are real vs AI-generated? Answer in footnote¹

This repository contains the evaluation system for our paper on using Prefill-Guided Thinking (PGT) to detect AI-generated images with Vision-Language Models (VLMs).

💡 For detailed technical documentation, particularly helpful for LLM code agents: see AGENTS.md for complete architecture details, function signatures, and implementation specifics.

Key Finding: Simply prefilling a VLM's response with the phrase "Let's examine the style and the synthesis artifacts" improves detection by up to 24% in Macro F1, without any training or fine-tuning.

🎯 What is Prefill-Guided Thinking?

PGT applied to a Midjourney-generated strawberry

Instead of asking a VLM to detect fake images directly, we prefill its response to guide its reasoning:

  • (a) Baseline: Direct query → incorrect classification (real)
  • (b) Chain-of-Thought: "Let's think step by step" → still incorrect
  • (c) S2 (our method): "Let's examine the style and the synthesis artifacts" → correct ✓

This simple technique works across 3 VLMs and 16 different image generators spanning faces, objects, and natural scenes.
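The mechanics above can be sketched in a few lines: the assistant turn of the prompt is seeded with a fixed phrase, so the model continues generating from it instead of starting its answer cold. This is a minimal illustration; the exact chat template and prompt layout used by each VLM in the repo may differ.

```python
# Sketch of Prefill-Guided Thinking: seed the assistant turn with a fixed
# phrase so generation continues from it. The prompt layout below is
# illustrative, not the repo's exact chat template.

PREFILLS = {
    "baseline": "",
    "cot": "Let's think step by step",
    "s2": "Let's examine the style and the synthesis artifacts",
}

def build_prompt(question: str, method: str) -> str:
    """Compose a chat-style prompt whose assistant turn starts with the prefill."""
    prefill = PREFILLS[method]
    return (
        f"USER: <image>\n{question}\n"
        f"ASSISTANT: {prefill}"
    )

prompt = build_prompt("Is this image real or AI-generated?", "s2")
```

Because the prefill is part of the assistant's own (partial) response rather than the user's question, the model treats it as reasoning it has already committed to and continues in that direction.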


🚀 Quick Start

Installation

See SETUP.md for complete environment setup instructions (conda, PyTorch, vLLM, Flash-Attention).

Usage

See Usage Examples for detailed command-line examples and all available options.


📊 Datasets

We evaluate on three diverse benchmarks:

| Dataset  | Content                                   | Images | Generators                                |
|----------|-------------------------------------------|--------|-------------------------------------------|
| D3       | Diverse web images (objects, scenes, art) | 8.4k   | 4 (Stable Diffusion variants, DeepFloyd)  |
| DF40     | Human faces (deepfakes)                   | 10k    | 6 (Midjourney, StyleCLIP, StarGAN, etc.)  |
| GenImage | ImageNet objects (animals, vehicles)      | 10k    | 8 (ADM, BigGAN, GLIDE, etc.)              |

Setup Data

See Data Collection & Setup for complete instructions on downloading and organizing all three datasets.


🧪 Supported Models

  • Qwen2.5-VL-7B – Dynamic-resolution Vision Transformer
  • LLaVA-OneVision-7B – Multimodal instruction-following model
  • Llama-3.2-Vision-11B – Vision adapter + Llama 3.1 LM

All models use instruction-tuned variants via vLLM for efficient inference.


🎨 Three Evaluation Methods

| Method   | Description                       |
|----------|-----------------------------------|
| Baseline | No prefill, just ask the question |
| CoT      | Chain-of-thought reasoning        |
| S2       | Task-aligned (our method)         |

See Usage Examples for detailed command-line examples and all available options.


📈 Results

Macro F1 performance comparison

Detection Macro F1 across models, datasets, and PGT variations. Bars show relative improvement of S2 over the next best method, with 95% confidence intervals from 10k bootstrap iterations.
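The bootstrap procedure behind those confidence intervals can be sketched as follows: resample the prediction set with replacement, recompute Macro F1 on each resample, and take the 2.5th/97.5th percentiles. This is a stdlib-only illustration (function names and labels are ours, not the repo's), and the paper's exact bootstrap details may differ.

```python
# Percentile bootstrap for Macro F1, sketched with the standard library only.
import random

def macro_f1(y_true, y_pred, labels=("real", "fake")):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def bootstrap_ci(y_true, y_pred, iters=10_000, alpha=0.05, seed=0):
    """95% CI (default) from `iters` resamples with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int(alpha / 2 * iters)]
    hi = scores[int((1 - alpha / 2) * iters) - 1]
    return lo, hi
```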

Per-generator recall for Llama

Detection recall (%) for Llama on each dataset, broken down by generator. Similar figures for LLaVA and Qwen appear in the paper.

Interpretability: Confidence Progression

To understand how prefills affect reasoning, we track answer confidence at five partial-response intervals (0–100% of sentences):

Partial response intervals

At each interval, we probe for the model's answer and confidence. The results reveal a striking pattern:

Confidence progression for Qwen

Evolution of answer confidence and Macro F1 across partial responses for Qwen. Baseline queries trigger immediate high confidence despite poor detection: the model commits to an answer before examining the image. Prefills induce a confidence dip mid-response, with detection improving steadily as the response progresses.
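The truncation step of this probe can be sketched simply: cut the full response at fixed fractions of its sentences, then re-feed each truncation to the model for an answer/confidence probe. The sentence splitter below is a naive regex stand-in for whatever segmentation the repo actually uses.

```python
# Build the partial responses (0%, 25%, 50%, 75%, 100% of sentences) that
# are probed at each interval. Sentence splitting here is deliberately naive.
import re

def partial_responses(text: str, fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Map each fraction to the response truncated at that share of sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    out = {}
    for f in fractions:
        k = round(f * len(sentences))
        out[f] = " ".join(sentences[:k])
    return out
```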


🔬 Advanced Usage

  • Multi-Response Generation (n>1) - Generate multiple responses with majority voting → Details
  • Phrase Modes - Test prefill vs prompt vs system instruction → Details
  • Debug Mode - Quick validation with 5 examples → Details
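The voting step for multi-response generation can be sketched in a few lines; how labels are extracted from each sampled response is the repo's business, so the helper below assumes they are already parsed.

```python
# Minimal majority vote over n>1 sampled labels; ties fall back to the
# earliest-seen label. Label parsing from raw responses is assumed done.
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label, breaking ties by first occurrence."""
    counts = Counter(labels)
    return max(counts, key=lambda lab: (counts[lab], -labels.index(lab)))
```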

📂 Output Structure

Results are saved in hierarchical directories with timestamped JSON files containing metrics and full reasoning traces.

See Output Structure for detailed file organization and JSON schemas.
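As a rough illustration of the pattern (timestamped file names, hierarchical run directories), the snippet below writes a results dict to such a file. The field names and helper are hypothetical; the repo's actual JSON schema is documented in Output Structure.

```python
# Illustrative writer for timestamped results files; field names are
# assumptions, not the repo's actual schema.
import json
import pathlib
import tempfile
import time

def save_results(results: dict, out_dir: str) -> pathlib.Path:
    """Write one run's metrics and traces to a timestamped JSON file."""
    stamp = time.strftime("%Y%m%d_%H%M%S")
    path = pathlib.Path(out_dir) / f"results_{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(results, indent=2))
    return path

with tempfile.TemporaryDirectory() as d:
    path = save_results({"macro_f1": 0.91, "responses": []}, d)
    loaded = json.loads(path.read_text())
```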


📊 Visualization & Analysis

Generate publication-ready plots (Macro F1 bars, radar plots, vocabulary analysis, etc.).

See Plotting & Visualization System for available plots and usage instructions.


📚 Documentation

  • SETUP.md - Environment setup and installation instructions
  • AGENTS.md - Complete technical reference (architecture, function signatures, all details)
  • Paper - arXiv:2506.11031

👥 Authors

Zoher Kachwala · Danishjeet Singh · Danielle Yang · Filippo Menczer

Observatory on Social Media, Indiana University, Bloomington


¹ Answer to image quiz: Only images 3, 10, and 11 in the mosaic are real. All others are AI-generated.