We propose DualDiffusion, a speculative decoding framework that combines both approaches: a lightweight KV-cached MDLM drafts multiple denoising steps rapidly, while a bidirectional MDLM verifies outputs using full context.
This project uses uv for package management. To install the required dependencies, run the following command:

```shell
uv sync
```

This file contains the core orchestration logic for the Dual Diffusion pipeline. The main function, dual_diffusion_generate, manages the entire process:
- Drafting: It begins by using a fast drafter model to generate a sequence of tokens.
- Verification: The drafted sequence is then passed to a more powerful verifier model.
- Comparison: A verification algorithm (from verification_algos.py) is used to compare the outputs of the two models.
- Remasking: Based on the comparison, some tokens may be "re-masked" to be generated again in the next iteration.
- Iteration: The process repeats, refining the generated sequence with each pass.
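The loop described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function signatures, the `MASK` sentinel, and the toy token representation (plain Python lists of token ids) are all assumptions made for clarity.

```python
MASK = -1  # illustrative mask-token id (assumption, not the real value)

def dual_diffusion_generate(drafter, verifier, verify_fn, seq, max_iters=10):
    """Sketch of the draft -> verify -> compare -> re-mask -> iterate loop.

    drafter, verifier: callables that fill/regenerate masked positions.
    verify_fn: a comparison strategy (e.g. from verification_algos.py)
    that returns the sequence with rejected tokens re-masked.
    """
    for _ in range(max_iters):
        if MASK not in seq:          # Iteration: stop once nothing is masked
            break
        draft = drafter(seq)         # Drafting: fast model proposes tokens
        verified = verifier(draft)   # Verification: stronger model re-scores
        seq = verify_fn(draft, verified)  # Comparison + Remasking
    return seq
```

With a trivial drafter and an accept-everything comparison, a single pass fills the remaining masked position.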
This file provides various strategies for comparing the outputs of the drafter and verifier models. These functions determine which tokens are accepted and which should be re-generated. Key algorithms include:
- exact_match_verification: Remasks tokens where the drafter and verifier outputs disagree.
- confidence_threshold_verification: Remasks tokens if the verifier's confidence is below a certain threshold.
- trust_verifier: Simply accepts the verifier's output, completing the generation in a single pass.
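A minimal sketch of the first two strategies, operating on plain lists of token ids. The `MASK` sentinel and the default threshold `tau` are illustrative assumptions; the real functions in verification_algos.py may differ in signature and tensor types.

```python
MASK = -1  # illustrative mask-token id (assumption)

def exact_match_verification(draft_tokens, verifier_tokens):
    """Keep tokens where drafter and verifier agree; re-mask the rest."""
    return [d if d == v else MASK
            for d, v in zip(draft_tokens, verifier_tokens)]

def confidence_threshold_verification(verifier_tokens, confidences, tau=0.9):
    """Keep tokens the verifier is confident about; re-mask the rest."""
    return [t if c >= tau else MASK
            for t, c in zip(verifier_tokens, confidences)]
```

Re-masked positions are regenerated on the next pass of the pipeline, so stricter strategies trade extra iterations for closer agreement with the verifier.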
To run the test pipeline, you can use the test_pipeline.ipynb notebook. Make sure you have a Jupyter environment installed and running. Open and execute the cells in the notebook to see the pipeline in action.
```shell
jupyter notebook test_pipeline.ipynb
```