
DREAM

Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding

An open-source framework to accelerate Vision Language Model (VLM) inference by up to 3x with no quality loss.


🔥 Our work has been accepted to NeurIPS 2025! The paper is now available on arXiv. ✨

🚀 Overview

DREAM is a cutting-edge framework designed to significantly accelerate the inference speed of Vision Language Models (VLMs), such as LLaVA. By employing a novel speculative decoding mechanism, DREAM achieves up to a 3x speedup over traditional autoregressive methods without compromising the quality of the output.

The core of DREAM is its innovative approach: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding. This allows the model to generate multiple tokens in parallel and validate them efficiently, leading to substantial gains in performance.
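The draft-and-verify loop that underlies speculative decoding can be sketched as a toy greedy loop (an illustration only, not DREAM's actual implementation; `draft_next` and `target_next` are hypothetical stand-ins for the draft and target models' next-token functions):

```python
from typing import Callable, List

def speculative_step(
    draft_next: Callable[[List[int]], int],
    target_next: Callable[[List[int]], int],
    context: List[int],
    k: int = 4,
) -> List[int]:
    """One greedy draft-and-verify step (toy version).

    The cheap draft model proposes k tokens; the target model checks
    them and keeps the longest agreeing prefix, plus one corrected
    token at the first disagreement.
    """
    # 1. Draft k tokens autoregressively with the cheap model.
    drafted = []
    ctx = list(context)
    for _ in range(k):
        t = draft_next(ctx)
        drafted.append(t)
        ctx.append(t)

    # 2. Verify: accept drafted tokens while the target model agrees.
    accepted = []
    ctx = list(context)
    for t in drafted:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # target's correction replaces the draft
            break
    return accepted
```

In a real system the target model scores all drafted tokens in a single forward pass rather than one at a time, which is where the speedup comes from; DREAM further improves the draft itself via refined target features and entropy-adaptive cross-attention fusion.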

✨ Key Features

  • High-Performance Inference: Up to 3x faster inference for Vision Language Models (VLMs) compared to standard methods.
  • Zero Quality Loss: Maintains the same output distribution as the original model.
  • Multimodal Support: Fully compatible with multimodal models like LLaVA.
  • Efficient Training: Includes scripts for training the auto-regression head using DeepSpeed.
  • Interactive Web UI: Comes with a Gradio-based web interface for easy testing and demonstration.
  • Comprehensive Tooling: Provides scripts for training data generation and performance evaluation.
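The zero-quality-loss guarantee is a general property of speculative sampling: a drafted token `i` is accepted with probability `min(1, p_target[i] / p_draft[i])`, and on rejection a token is resampled from the normalized residual `max(p_target - p_draft, 0)`. The output distribution then equals `p_target` exactly. A small analytic sketch of this acceptance rule (toy distributions, not DREAM-specific code):

```python
def verified_distribution(p_draft, p_target):
    """Analytic output distribution of speculative sampling for one token.

    Token i is proposed with probability p_draft[i] and accepted with
    probability min(1, p_target[i] / p_draft[i]); on rejection we
    resample from the normalized residual max(p_target - p_draft, 0).
    """
    # P(propose i and accept) = p_draft[i] * min(1, p_target[i]/p_draft[i])
    #                         = min(p_draft[i], p_target[i])
    accept = [min(d, t) for d, t in zip(p_draft, p_target)]
    reject_mass = 1.0 - sum(accept)
    residual = [max(t - d, 0.0) for d, t in zip(p_draft, p_target)]
    z = sum(residual)
    # Total probability of emitting i: accepted mass + rejected-and-resampled mass.
    return [a + (reject_mass * r / z if z > 0 else 0.0)
            for a, r in zip(accept, residual)]
```

For example, with `p_draft = [0.5, 0.3, 0.2]` and `p_target = [0.2, 0.5, 0.3]`, the procedure returns `[0.2, 0.5, 0.3]`, identical to the target distribution, however poor the draft is; a worse draft only lowers the acceptance rate (and thus the speedup), never the quality.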

🎥 Demo

Side-by-side demo: vanilla autoregressive decoding (left) vs. DREAM (right).

🛠️ Setup & Installation

  1. Clone the repository:

    git clone https://github.com/SAI-Lab-NYU/DREAM.git
    cd DREAM
  2. Install dependencies: We recommend creating a virtual environment first.

    pip install -e .

    Note: -e installs the project in editable mode.

  3. Download Model Weights: See the Model Weights section below for links to the available models.

⚡ Quick Start

1. Inference with Web UI

Run our Gradio-based web interface for an interactive experience. The command automatically handles model allocation across multiple GPUs.

python -m dream.application.webui \
    --ea-model-path [PATH_TO_DREAM_WEIGHTS] \
    --base-model-path [PATH_TO_BASE_MODEL]
  • --ea-model-path: Path to the downloaded DREAM weights (e.g., ./DREAM-llava-v1.6-vicuna-7b).
  • --base-model-path: Path to the original base model weights (e.g., the original vicuna-7b-v1.3).
  • --total-token: Number of draft tokens. Adjust this based on your hardware for optimal performance. Set to -1 for auto-configuration.

Once the model is loaded, a URL will be displayed in the terminal.

2. Training the Auto-regression Head

First, generate the necessary training data (see ./ge_data for detailed instructions and generation scripts):

python -m dream.ge_data.allocation_mix665

Then, use the following DeepSpeed command to start training:

cd dream/train
deepspeed main_deepspeed.py \
    --deepspeed_config ./ds_config.json \
    --tmpdir [PATH_TO_TRAINING_DATA] \
    --cpdir [PATH_TO_SAVE_CHECKPOINTS] \
    --configpath ./vicuna_7B_config.json

3. Evaluation

Test the inference speed of DREAM on benchmarks like MT-Bench.

python -m dream.evaluation.eval_llava \
    --ea-model-path [PATH_TO_DREAM_WEIGHTS] \
    --base-model-path [PATH_TO_BASE_MODEL]

This will generate a .jsonl file containing the generation results and wall time.
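The results file can then be post-processed as needed; for instance, a small helper to aggregate wall time from the JSON-lines output (the "wall_time" key is an assumption here — check the actual field names emitted by your run):

```python
import json
from pathlib import Path

def summarize_results(path: str) -> dict:
    """Aggregate generation records from a JSON-lines results file.

    Each line is one JSON record; "wall_time" is a placeholder key
    for the per-sample generation time in seconds.
    """
    lines = Path(path).read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    total = sum(r.get("wall_time", 0.0) for r in records)
    return {
        "num_samples": len(records),
        "total_wall_time": total,
        "avg_wall_time": total / len(records) if records else 0.0,
    }
```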

📦 Model Weights

| Model | Base Model | Download |
| --- | --- | --- |
| DREAM-llava-v1.6-vicuna-7b | vicuna-7b-v1.6 | 🤗 HideonBed12138/DREAM-llava-v1.6-vicuna-7b |

📄 Citation

If you find our work useful for your research, please consider citing our paper:

@misc{hu2025dreamdraftingrefinedtarget,
  title={DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding},
  author={Yunhai Hu and Tianhua Xia and Zining Liu and Rahul Raman and Xingyu Liu and Bo Bao and Eric Sather and Vithursan Thangarasa and Sai Qian Zhang},
  year={2025},
  eprint={2505.19201},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.19201},
}

🙏 Acknowledgements

This project is built upon the incredible work of the open-source community. We are especially grateful to the developers of Medusa, EAGLE, and FastChat.

📜 License

DREAM is licensed under the Apache 2.0 License.
