Hierarchy Drafting for Speculative Decoding

Official Code Repository for the paper "Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding"

Because the paper is under review, the code is released anonymously.

Installation

First, create and activate a conda environment, then install the dependencies:

$ conda create -n HD python=3.10
$ conda activate HD
$ pip install -r requirements.txt

Next, we use DraftRetriever, a Python library designed by REST, for the statistics-dependent database. DraftRetriever can be installed by following the installation instructions provided by REST.

Construct Model-dependent DB

The model-dependent DB is built from texts previously generated by the target LLM. Generate these texts as follows:

$ python ./scripts/construct_model_DB --model_dir {save_path} --model {target LLM}

Currently, we only support Llama-2 and Vicuna-v1.3. We will update the code to support more models and provide the data for generated texts.
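
To illustrate how a database built from previously generated texts can supply draft tokens, here is a minimal Python sketch. It is not the repository's implementation: the function names (build_ngram_db, lookup_drafts), the parameters (prefix_len, draft_len, top_k), and the toy token IDs are all hypothetical and chosen only for illustration. The sketch maps each short token prefix to the continuations that followed it in earlier generations, then proposes the most frequent continuations as drafts.

```python
from collections import Counter, defaultdict


def build_ngram_db(token_sequences, prefix_len=2, draft_len=2):
    """Map each prefix of `prefix_len` tokens to a Counter of the
    `draft_len`-token continuations that followed it (hypothetical helper)."""
    db = defaultdict(Counter)
    for seq in token_sequences:
        # Slide over every position that has a full prefix and continuation.
        for i in range(len(seq) - prefix_len - draft_len + 1):
            prefix = tuple(seq[i:i + prefix_len])
            continuation = tuple(seq[i + prefix_len:i + prefix_len + draft_len])
            db[prefix][continuation] += 1
    return db


def lookup_drafts(db, context, prefix_len=2, top_k=2):
    """Return up to `top_k` draft continuations for the last `prefix_len`
    tokens of `context`, most frequent first (hypothetical helper)."""
    prefix = tuple(context[-prefix_len:])
    if prefix not in db:
        return []  # no match: the caller would fall back to another source
    return [list(cont) for cont, _ in db[prefix].most_common(top_k)]


# Toy token IDs standing in for texts previously generated by the target LLM.
generated = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 6], [1, 2, 7, 8, 9]]
db = build_ngram_db(generated, prefix_len=2, draft_len=2)
print(lookup_drafts(db, [0, 1, 2]))
```

In speculative decoding, the drafts returned by such a lookup are only proposals: the target LLM verifies them, so the output stays identical to ordinary decoding.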

Run Hierarchy Drafting

Run Hierarchy Drafting as follows:

CUDA_VISIBLE_DEVICES=${GPU_DEVICES} USE_LADE=1 python -m evaluation.inference_hierachy \
    --model-path $Vicuna_PATH \
    --model-id ${MODEL_NAME}-${torch_dtype}-hierarchy \
    --level $LEVEL --window $WINDOW --guess $GUESS \
    --previous_tokens $PREVIOUS_TOKENS \
    --bench-name $bench_NAME \
    --dtype $torch_dtype \
    --history_file {history_file} \
    --db_file=${db_file} \
    --do_WM --do_SM --do_LM \
    --order $ORDER

You can also run Hierarchy Drafting and the other baselines with the run_llama.sh and run_vicuna.sh scripts.

Acknowledgement

The implementation of Hierarchy Drafting builds on REST, LADE, and Spec-Bench.
