LEAD: Latent Entropy-Aware Decoding

Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

CVPR 2026 | arXiv | License | Project Page

🔥 Latest Updates

  • [2026/01] 🎉 LEAD was accepted to CVPR 2026!
  • [2026/03] The code and datasets have been released.

📌 Overview

LEAD (Latent Entropy-Aware Decoding) is a training-free decoding strategy designed to mitigate hallucinations in Multimodal Large Reasoning Models (MLRMs). LEAD dynamically switches between latent decoding and discrete decoding by monitoring the entropy of token probability distributions in real time, and injects visual anchors at highly uncertain steps to strengthen visual grounding.

💡 Key Highlights

  • Entropy-aware switching: Uses probability-weighted continuous embeddings for latent reasoning during high-entropy stages, and returns to discrete token decoding during low-entropy stages to ensure convergence.
  • Visual anchor injection: Injects visual anchor tokens at critical moments of uncertain reasoning to reduce image-detached hallucinated reasoning.
  • Plug-and-play: Requires no additional training or external tools and can be directly applied to existing MLRMs.
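The entropy-aware switching described in the first two highlights can be sketched in a few lines. This is an illustrative sketch, not the repository's implementation: the threshold `tau`, the function name `decode_step`, and the use of a plain embedding table are assumptions for demonstration only.

```python
import numpy as np

def decode_step(logits, embedding_matrix, tau=2.0, alpha=0.6):
    """Illustrative entropy-aware step (NOT the repo's actual code):
    feed a probability-weighted (latent) embedding back into the model
    when the token distribution is uncertain, otherwise commit to a
    discrete token.

    logits: (vocab_size,) next-token logits
    embedding_matrix: (vocab_size, hidden) input embedding table
    tau: hypothetical entropy threshold in nats (an assumption)
    alpha: mixing weight on the probability-weighted embedding
    """
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    greedy_emb = embedding_matrix[int(probs.argmax())]
    if entropy > tau:
        # High entropy: latent decoding with a soft, probability-weighted
        # mixture of token embeddings.
        soft_emb = probs @ embedding_matrix
        return "latent", alpha * soft_emb + (1 - alpha) * greedy_emb
    # Low entropy: discrete decoding ensures convergence to real tokens.
    return "discrete", greedy_emb
```

With a uniform distribution over 8 tokens (entropy ln 8 ≈ 2.08 > tau) the sketch stays in latent mode; with a sharply peaked distribution it returns to discrete mode.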

πŸ› οΈ Setup

1. Clone the Repository

git clone https://github.com/mlrm-LEAD/mlrm-LEAD
cd mlrm-LEAD

2. Install Dependencies

pip install -r requirements.txt

3. Prepare Model Weights

Download R1-Onevision-7B, or use the Hugging Face model name for automatic download:

# Option A: automatic download
--model_name Fancy-MLLM/R1-Onevision-7B

# Option B: local path
--model_name /path/to/R1-Onevision-7B

🚀 Quick Start

Demo Example

python main.py \
    --model_name Fancy-MLLM/R1-Onevision-7B \
    --dataset data/demo.jsonl \
    --method lead \
    --max_new_tokens 2048

Full Evaluation

bash script/run.sh

Custom Configuration

python main.py \
    --model_name Fancy-MLLM/R1-Onevision-7B \
    --dataset data/physunibench.jsonl \
    --output_dir output \
    --method lead \
    --alpha 0.6 \
    --max_switch_count 5 \
    --temperature 0.6 \
    --top_p 0.95 \
    --top_k 20 \
    --max_new_tokens 25600 \
    --seed 42

βš™οΈ Arguments

Model and Data

| Argument | Default | Description |
|----------|---------|-------------|
| --model_name | Fancy-MLLM/R1-Onevision-7B | Hugging Face model name or local checkpoint path |
| --dataset | data/physunibench.jsonl | Path to the dataset JSONL file |
| --output_dir | output/ | Directory for saving results |
| --limit | None | Run only the first N samples (for debugging) |

Decoding Method

| Argument | Default | Description |
|----------|---------|-------------|
| --method | lead | Decoding method: lead / cot / cot_greedy |
| --alpha | 0.6 | Soft-mode mixing coefficient α₀; larger values place more weight on probability-weighted embeddings |
| --max_switch_count | 5 | Maximum number of soft→normal mode switches before convergence injection is triggered |

Sampling Parameters

| Argument | Default | Description |
|----------|---------|-------------|
| --temperature | 0.6 | Sampling temperature |
| --top_p | 0.95 | Nucleus sampling threshold |
| --top_k | 20 | Top-k filtering |
| --max_new_tokens | 25600 | Maximum number of generated tokens |
| --seed | 42 | Random seed |
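For reference, the temperature, top-k, and top-p flags compose in the usual way: temperature scaling first, then top-k truncation, then nucleus (top-p) truncation. The sketch below is a generic illustration of that pipeline, not code from this repository:

```python
import numpy as np

def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Generic sampling filter: temperature, then top-k, then top-p."""
    logits = np.asarray(logits, dtype=float) / temperature
    # Top-k: drop everything below the k-th largest logit.
    if 0 < top_k < logits.size:
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Top-p: keep the smallest prefix of tokens (sorted by probability)
    # whose cumulative mass reaches top_p; always keep the best token.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = np.ones(order.size, dtype=bool)
    keep[1:] = cum[:-1] < top_p
    mask = np.zeros(logits.size, dtype=bool)
    mask[order[keep]] = True
    probs = np.where(mask, probs, 0.0)
    return probs / probs.sum()
```

The returned distribution is renormalized over the surviving tokens, from which the next token is sampled.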

πŸ“ Available Scripts

Script Description
script/run.sh Full evaluation with the LEAD method
script/run_cot.sh Evaluation of the CoT baseline
script/run_debug.sh Debug mode: 5 samples with short generation
script/run_eval.sh Evaluate existing results only

📦 Dataset Format

Place the JSONL file in the data/ directory using the following format:

{"id": 1, "image": "path/to/image.jpg", "question": "What is shown?", "options": "A. ...\nB. ...\nC. ...\nD. ...", "answer": "A"}

Built-in benchmark datasets: physunibench, math_vision, math_vista, mmvp, realworldqa, visulogic, vstar, demo
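A record in this schema can be written and read back with the standard library alone; the file name custom.jsonl below is just an example:

```python
import json
import os

# Write one JSON object per line in the schema shown above,
# then read the file back for a quick sanity check.
os.makedirs("data", exist_ok=True)
sample = {
    "id": 1,
    "image": "path/to/image.jpg",
    "question": "What is shown?",
    "options": "A. ...\nB. ...\nC. ...\nD. ...",
    "answer": "A",
}
with open("data/custom.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")

with open("data/custom.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f if line.strip()]
```

Note that the options field is a single string with the choices separated by literal newlines, not a JSON list.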


πŸ—‚οΈ Project Structure

LEAD/
β”œβ”€β”€ main.py                    # Main entry point
β”œβ”€β”€ lead/
β”‚   β”œβ”€β”€ generation_utils.py    # Core generation algorithms for LEAD and CoT
β”‚   β”œβ”€β”€ inference.py           # Input construction and single-sample inference
β”‚   β”œβ”€β”€ data.py                # Data loading and preprocessing
β”‚   β”œβ”€β”€ evaluator.py           # Answer evaluation and accuracy statistics
β”‚   β”œβ”€β”€ prompts.py             # Prompt template management
β”‚   β”œβ”€β”€ logger.py              # Logging system
β”‚   └── utils.py               # General utility functions
β”œβ”€β”€ data/                      # Dataset JSONL files
β”œβ”€β”€ figure/                    # Paper figures
β”œβ”€β”€ script/                    # Run scripts
β”œβ”€β”€ tests/                     # Unit tests
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ setup.py
β”œβ”€β”€ LICENSE
└── CONTRIBUTING.md
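The tree lists lead/evaluator.py for answer evaluation and accuracy statistics; a minimal version of such multiple-choice scoring might look like the following. The extraction regex and function names are assumptions for illustration, not the repository's actual logic:

```python
import re

def extract_choice(text):
    """Take the last standalone option letter (A-D) in a model output."""
    matches = re.findall(r"\b([ABCD])\b", text)
    return matches[-1] if matches else None

def accuracy(predictions, references):
    """Fraction of outputs whose extracted choice matches the reference."""
    correct = sum(extract_choice(p) == r
                  for p, r in zip(predictions, references))
    return correct / len(references)
```

Taking the last matched letter is a common heuristic for long reasoning traces, where earlier letters may appear mid-derivation before the final answer.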

πŸ“ Citation

If this project is helpful to your research, please cite our paper:

@inproceedings{xu2026thinking,
      title     = {Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding},
      author    = {Xu, Zhongxing and Wang, Zhonghua and Qian, Zhe and Shi, Dachuan and Tang, Feilong and Hu, Ming and Su, Shiyan and Zou, Xiaocheng and Feng, Wei and Mahapatra, Dwarikanath and Peng, Yifan and Lin, Mingquan and Ge, Zongyuan},
      booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference},
      year      = {2026}
}

📄 License

This project is released under the MIT License. See the LICENSE file for details.


💬 Acknowledgments

We thank the contributors of the open-source projects Coconut, Soft-Thinking, and SwiReasoning.
