# DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding
An open-source framework to accelerate Vision Language Model (VLM) inference by up to 3x with no quality loss.
DREAM is a cutting-edge framework designed to significantly accelerate the inference speed of Vision Language Models (VLMs), such as LLaVA. By employing a novel speculative decoding mechanism, DREAM achieves up to a 3x speedup over traditional autoregressive methods without compromising the quality of the output.
At its core, DREAM drafts candidate tokens using refined features from the target model and fuses visual and textual context through entropy-adaptive cross-attention. This lets it propose multiple tokens in parallel and verify them efficiently against the base model, leading to substantial gains in performance.
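The draft-then-verify mechanism behind speculative decoding can be sketched in a few lines. The following is a generic toy illustration over a tiny vocabulary, not DREAM's actual implementation: `p_target` and `q_draft` stand in for the target VLM and the draft model, and the acceptance rule `min(1, p/q)` with residual resampling is what keeps the output distribution identical to the target's.

```python
import random

def speculative_step(p_target, q_draft, k, rng):
    """One draft-then-verify step of speculative sampling (toy version).

    p_target / q_draft map a context (tuple of token ids) to a dict
    {token: probability}. k is the number of draft tokens per step.
    Returns the tokens emitted this step (a rejection truncates it).
    """
    # 1. Draft phase: the cheap model proposes k tokens autoregressively.
    drafts = []
    for _ in range(k):
        q = q_draft(tuple(drafts))
        toks, probs = zip(*sorted(q.items()))
        drafts.append(rng.choices(toks, weights=probs)[0])
    # 2. Verify phase: accept draft x with probability min(1, p(x) / q(x)).
    out = []
    for x in drafts:
        ctx = tuple(out)
        p, q = p_target(ctx), q_draft(ctx)
        if rng.random() < min(1.0, p.get(x, 0.0) / q[x]):
            out.append(x)  # accepted: statistically identical to sampling from p
        else:
            # Rejected: resample from the residual (p - q)+, renormalized.
            resid = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
            z = sum(resid.values())
            toks, weights = zip(*sorted(resid.items()))
            out.append(rng.choices(toks, weights=[w / z for w in weights])[0])
            break
    return out
```

When draft and target agree, all `k` drafts are accepted and the step emits `k` tokens at the cost of a single target-model verification pass; that is the source of the speedup.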
## Key Features

- High-Performance Inference: Up to 3x faster inference for Vision Language Models (VLMs) compared to standard autoregressive decoding.
- Zero Quality Loss: Maintains the same output distribution as the original model.
- Multimodal Support: Fully compatible with multimodal models like LLaVA.
- Efficient Training: Includes scripts for training the auto-regression head using DeepSpeed.
- Interactive Web UI: Comes with a Gradio-based web interface for easy testing and demonstration.
- Comprehensive Tooling: Provides scripts for training data generation and performance evaluation.
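The zero-quality-loss property is not specific to DREAM; it is the standard speculative-sampling guarantee. If a draft token $x \sim q$ is accepted with probability $\min(1, p(x)/q(x))$ and, on rejection, a replacement is drawn from the normalized residual $(p - q)_+$, the emitted token is distributed exactly according to the target distribution $p$:

```math
P(\text{emit } x)
  = q(x)\min\!\Big(1, \tfrac{p(x)}{q(x)}\Big)
  + \Big(1 - \sum_y \min\big(p(y), q(y)\big)\Big)
    \frac{\big(p(x) - q(x)\big)_+}{\sum_y \big(p(y) - q(y)\big)_+}
  = \min\big(p(x), q(x)\big) + \big(p(x) - q(x)\big)_+
  = p(x)
```

The middle step uses the identity $1 - \sum_y \min(p(y), q(y)) = \sum_y (p(y) - q(y))_+$, so the rejection mass exactly cancels the normalizer of the residual distribution.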
## Demo

| Vanilla | DREAM |
|---|---|
| ![]() | ![]() |
## Installation

1. Clone the repository:

   ```shell
   git clone https://github.com/SAI-Lab-NYU/DREAM.git
   cd DREAM
   ```

2. Install dependencies (we recommend creating a virtual environment first):

   ```shell
   pip install -e .
   ```

   Note: `-e` installs the project in editable mode.

3. Download model weights: see the Model Weights section below for links to the available models.
## Web UI

Run our Gradio-based web interface for an interactive experience. The command automatically handles model allocation across multiple GPUs.

```shell
python -m dream.application.webui \
    --ea-model-path [PATH_TO_DREAM_WEIGHTS] \
    --base-model-path [PATH_TO_BASE_MODEL]
```

- `[PATH_TO_DREAM_WEIGHTS]`: Path to the downloaded DREAM weights (e.g., `./DREAM-llava-v1.6-vicuna-7b`).
- `[PATH_TO_BASE_MODEL]`: Path to the original base model weights (e.g., the original `vicuna-7b-v1.3`).
- `total-token`: Number of draft tokens. Adjust this based on your hardware for optimal performance. Set to `-1` for auto-configuration.

Once the model is loaded, a URL will be displayed in the terminal; open it in a browser to use the demo.
## Training

First, generate the necessary training data (see `./ge_data` for detailed instructions and generation scripts):

```shell
python -m dream.ge_data.allocation_mix665
```

Then, use the following DeepSpeed command to start training:

```shell
cd dream/train
deepspeed main_deepspeed.py \
    --deepspeed_config ./ds_config.json \
    --tmpdir [PATH_TO_TRAINING_DATA] \
    --cpdir [PATH_TO_SAVE_CHECKPOINTS] \
    --configpath ./vicuna_7B_config.json
```

## Evaluation

Test the inference speed of DREAM on benchmarks such as MT-Bench:

```shell
python -m dream.evaluation.eval_llava \
    --ea-model-path [PATH_TO_DREAM_WEIGHTS] \
    --base-model-path [PATH_TO_BASE_MODEL]
```

This will generate a `.jsonl` file containing the generation results and wall time.
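Once the `.jsonl` is produced, a short script can aggregate the timing numbers. A minimal sketch, assuming each line is a JSON object with a numeric `wall_time` field in seconds (the field name is an assumption; adapt it to the actual output schema):

```python
import json
import statistics

def summarize_wall_time(jsonl_path):
    """Aggregate per-sample wall-clock times from an evaluation .jsonl file.

    Assumes each non-empty line is a JSON object carrying a numeric
    "wall_time" field (hypothetical name; check the real schema).
    """
    times = []
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if line:
                times.append(float(json.loads(line)["wall_time"]))
    return {
        "samples": len(times),
        "total_s": sum(times),
        "mean_s": statistics.mean(times),
    }
```

Comparing `mean_s` between a DREAM run and a vanilla baseline run on the same benchmark gives the effective wall-clock speedup.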
## Model Weights

| Model | Base Model | Download |
|---|---|---|
| `DREAM-llava-v1.6-vicuna-7b` | `vicuna-7b-v1.6` | 🤗 HideonBed12138/DREAM-llava-v1.6-vicuna-7b |
## Citation

If you find our work useful for your research, please consider citing our paper:

```bibtex
@misc{hu2025dreamdraftingrefinedtarget,
      title={DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding},
      author={Yunhai Hu and Tianhua Xia and Zining Liu and Rahul Raman and Xingyu Liu and Bo Bao and Eric Sather and Vithursan Thangarasa and Sai Qian Zhang},
      year={2025},
      eprint={2505.19201},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.19201},
}
```

## Acknowledgements

This project is built upon the incredible work of the open-source community. We are especially grateful to the developers of Medusa, EAGLE, and FastChat.
## License

DREAM is licensed under the Apache 2.0 License.