## News

- [2026/01] LEAD was accepted to CVPR 2026!
- [2026/03] The code and datasets have been released.
LEAD (Latent Entropy-Aware Decoding) is a training-free decoding strategy designed to mitigate hallucinations in Multimodal Large Reasoning Models (MLRMs). LEAD dynamically switches between latent decoding and discrete decoding by monitoring the entropy of token probability distributions in real time, and injects visual anchors at highly uncertain steps to strengthen visual grounding.
- Entropy-aware switching: Uses probability-weighted continuous embeddings for latent reasoning during high-entropy stages, and returns to discrete token decoding during low-entropy stages to ensure convergence.
- Visual anchor injection: Injects visual anchor tokens at critical moments of uncertain reasoning to reduce image-detached hallucinated reasoning.
- Plug-and-play: Requires no additional training or external tools and can be directly applied to existing MLRMs.
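As a rough illustration of the entropy-aware switching idea described above, the sketch below computes the Shannon entropy of a next-token distribution and picks a decoding mode. The function names and the `threshold` value are hypothetical, not taken from the paper or this codebase.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_mode(probs, threshold=1.0):
    """Pick latent (soft) decoding when uncertainty is high, discrete otherwise.

    `threshold` is an illustrative cutoff, not a value from the paper.
    """
    return "latent" if entropy(probs) > threshold else "discrete"
```

A near-uniform distribution (high entropy) selects latent decoding, while a peaked one (low entropy) falls back to discrete token decoding.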
## Installation

```bash
git clone https://github.com/mlrm-LEAD/mlrm-LEAD
cd mlrm-LEAD
pip install -r requirements.txt
```

Download R1-Onevision-7B, or use the Hugging Face model name for automatic download:

```bash
# Option A: automatic download
--model_name Fancy-MLLM/R1-Onevision-7B

# Option B: local path
--model_name /path/to/R1-Onevision-7B
```

## Quick Start

Run a quick demo:

```bash
python main.py \
    --model_name Fancy-MLLM/R1-Onevision-7B \
    --dataset data/demo.jsonl \
    --method lead \
    --max_new_tokens 2048
```

Or run the full evaluation script:

```bash
bash script/run.sh
```

A full invocation with all arguments:

```bash
python main.py \
    --model_name Fancy-MLLM/R1-Onevision-7B \
    --dataset data/physunibench.jsonl \
    --output_dir output \
    --method lead \
    --alpha 0.6 \
    --max_switch_count 5 \
    --temperature 0.6 \
    --top_p 0.95 \
    --top_k 20 \
    --max_new_tokens 25600 \
    --seed 42
```

## Arguments

| Argument | Default | Description |
|---|---|---|
| `--model_name` | `Fancy-MLLM/R1-Onevision-7B` | Hugging Face model name or local checkpoint path |
| `--dataset` | `data/physunibench.jsonl` | Path to the dataset JSONL file |
| `--output_dir` | `output/` | Directory for saving results |
| `--limit` | `None` | Run only the first N samples (for debugging) |
| Argument | Default | Description |
|---|---|---|
| `--method` | `lead` | Decoding method: `lead` / `cot` / `cot_greedy` |
| `--alpha` | `0.6` | Soft-mode mixing coefficient α; larger values place more weight on probability-weighted embeddings |
| `--max_switch_count` | `5` | Maximum number of soft↔normal mode switches before convergence injection is triggered |
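A minimal sketch of what the `--alpha` mixing could look like: blend the probability-weighted (soft) embedding with the argmax token embedding, with larger `alpha` favoring the soft side. Function and variable names here are illustrative assumptions, not the repository's API.

```python
import numpy as np

def soft_mix(embedding_table, probs, alpha=0.6):
    """Blend the probability-weighted embedding with the argmax embedding.

    Illustrative sketch of the --alpha mixing: `embedding_table` has shape
    (vocab_size, dim) and `probs` is a next-token distribution of shape
    (vocab_size,). Larger alpha weights the soft (latent) embedding more.
    """
    soft = probs @ embedding_table                  # expectation over the vocabulary
    hard = embedding_table[int(np.argmax(probs))]   # discrete argmax token embedding
    return alpha * soft + (1 - alpha) * hard
```

With a one-hot distribution the two embeddings coincide, so the mix reduces to ordinary discrete decoding regardless of `alpha`.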
| Argument | Default | Description |
|---|---|---|
| `--temperature` | `0.6` | Sampling temperature |
| `--top_p` | `0.95` | Nucleus sampling threshold |
| `--top_k` | `20` | Top-k filtering |
| `--max_new_tokens` | `25600` | Maximum number of generated tokens |
| `--seed` | `42` | Random seed |
## Scripts

| Script | Description |
|---|---|
| `script/run.sh` | Full evaluation with the LEAD method |
| `script/run_cot.sh` | Evaluation of the CoT baseline |
| `script/run_debug.sh` | Debug mode: 5 samples with short generation |
| `script/run_eval.sh` | Evaluate existing results only |
## Data Format

Place the JSONL file in the `data/` directory using the following format:

```json
{"id": 1, "image": "path/to/image.jpg", "question": "What is shown?", "options": "A. ...\nB. ...\nC. ...\nD. ...", "answer": "A"}
```

Built-in benchmark datasets: `physunibench`, `math_vision`, `math_vista`, `mmvp`, `realworldqa`, `visulogic`, `vstar`, `demo`
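A small helper, purely illustrative, that loads such a JSONL file and checks each record for the expected keys. The function name and error handling are assumptions, not part of the repository's API.

```python
import json

# Keys required by the dataset format shown above.
REQUIRED_KEYS = {"id", "image", "question", "options", "answer"}

def load_samples(path):
    """Load a dataset JSONL file, validating that each record is complete."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            record = json.loads(line)
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            samples.append(record)
    return samples
```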
## Project Structure

```
LEAD/
├── main.py                   # Main entry point
├── lead/
│   ├── generation_utils.py   # Core generation algorithms for LEAD and CoT
│   ├── inference.py          # Input construction and single-sample inference
│   ├── data.py               # Data loading and preprocessing
│   ├── evaluator.py          # Answer evaluation and accuracy statistics
│   ├── prompts.py            # Prompt template management
│   ├── logger.py             # Logging system
│   └── utils.py              # General utility functions
├── data/                     # Dataset JSONL files
├── figure/                   # Paper figures
├── script/                   # Run scripts
├── tests/                    # Unit tests
├── requirements.txt
├── setup.py
├── LICENSE
└── CONTRIBUTING.md
```
## Citation

If this project is helpful to your research, please cite our paper:

```bibtex
@inproceedings{xu2026thinking,
  title     = {Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding},
  author    = {Xu, Zhongxing and Wang, Zhonghua and Qian, Zhe and Shi, Dachuan and Tang, Feilong and Hu, Ming and Su, Shiyan and Zou, Xiaocheng and Feng, Wei and Mahapatra, Dwarikanath and Peng, Yifan and Lin, Mingquan and Ge, Zongyuan},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference},
  year      = {2026}
}
```

## License

This project is released under the MIT License. See the LICENSE file for details.
## Acknowledgements

We thank the contributors of the open-source projects Coconut, Soft-Thinking, and SwiReasoning.
