Chanhyuk Lee<sup>1</sup>, Jaehoon Yoo<sup>1</sup>, Manan Agarwal<sup>2</sup>, Sheel Shah<sup>2</sup>, Jerry Huang<sup>2</sup>,
Aditi Raghunathan<sup>2</sup>, Seunghoon Hong<sup>1</sup>, Nicholas M. Boffi<sup>†2</sup>, Jinwoo Kim<sup>†1</sup>

<sup>1</sup>KAIST <sup>2</sup>Carnegie Mellon University <sup>†</sup>Equal advising
We introduce the Flow-based Language Model (FLM) and its flow-map distilled variant, the Flow-map Language Model (FMLM), which enable one-step parallel text generation through continuous denoising.
FLM brings the benefits of continuous generative modeling, as developed for images, to discrete state spaces: it encodes text as one-hot vectors and uses flow matching to map noise directly to one-hot data. Unlike discrete diffusion, FLM gradually denoises all tokens in parallel, allowing it to represent a superposition of sequences while capturing correlations between tokens, a fundamental bottleneck for discrete diffusion in the few-step regime.
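At a glance, the training objective can be pictured as standard flow matching applied to one-hot token vectors. The sketch below is a minimal illustration of that idea under simplifying assumptions (a linear interpolation path and a generic velocity network `model`); it is not the repository's actual loss.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, tokens, vocab_size):
    """Illustrative flow-matching step on one-hot text (not the repo's code).

    tokens: LongTensor of shape (batch, seq_len).
    model:  any network mapping (x_t, t) -> predicted velocity, where
            x_t has shape (batch, seq_len, vocab_size) and t has shape (batch,).
    """
    x1 = F.one_hot(tokens, vocab_size).float()   # data endpoint: one-hot vectors
    x0 = torch.randn_like(x1)                    # noise endpoint
    t = torch.rand(tokens.size(0), 1, 1, device=tokens.device)
    xt = (1 - t) * x0 + t * x1                   # linear interpolation path
    target = x1 - x0                             # velocity dx_t/dt of this path
    pred = model(xt, t.view(-1))
    return F.mse_loss(pred, target)              # regress the velocity field
```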
Installation

```bash
pip install "torch>=2.3.0"
pip install -r requirements.txt
# Install flash-attn separately, matching your Python / torch version
# (see https://github.com/Dao-AILab/flash-attention/releases):
pip install flash-attn==2.8.3 --no-build-isolation
```

Our DiT backbone supports `torch.compile` with `max-autotune` for faster training. Enable it by setting the environment variable before running any script:

```bash
export DIT_USE_COMPILE=TRUE
```

With this option, we can train the OpenWebText experiments with a batch size of 512 on 8 H100 GPUs (80GB VRAM each) without gradient accumulation.
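For reference, the flag plausibly gates compilation along the lines below; the exact call site in this repository may differ, so treat `maybe_compile` as a hypothetical helper rather than the actual implementation.

```python
import os

import torch

def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    """Compile a backbone with max-autotune iff DIT_USE_COMPILE=TRUE.

    Hypothetical helper: shows how the environment variable could gate
    torch.compile; the repository's actual wiring may differ.
    """
    if os.environ.get("DIT_USE_COMPILE", "FALSE").upper() == "TRUE":
        return torch.compile(model, mode="max-autotune")
    return model
```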
Before running, update `data.cache_dir` in the scripts to point to your dataset location. If the directory is empty, the dataset will be downloaded and preprocessed automatically.
FLM Training (1M steps)
| Dataset | Script |
|---|---|
| LM1B | scripts/train_lm1b_flm.sh |
| OpenWebText | scripts/train_owt_flm.sh |
Flow Map Distillation
Set `algo.teacher_path` to your pre-trained FLM checkpoint before running. A conceptual sketch of the flow-map distillation objective follows the table below.
| Dataset | Script |
|---|---|
| LM1B | scripts/train_lm1b_flm_distill.sh |
| OpenWebText | scripts/train_owt_flm_distill.sh |
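Conceptually, flow-map distillation trains a student to jump directly from time t to time s in one evaluation, supervised by the frozen teacher's velocity field. The sketch below is a deliberately simplified rendering of that idea, using a single Euler step of the teacher as the jump target; the actual objective used by these scripts may differ.

```python
import torch
import torch.nn.functional as F

def flow_map_distill_loss(student, teacher, xt, t, s):
    """Sketch of distilling a velocity-field teacher into a two-time flow map.

    student: network (x_t, t, s) -> predicted state at time s.
    teacher: frozen velocity field (x_t, t) -> dx/dt.
    xt:      noisy state at time t, shape (batch, seq_len, vocab_size).
    t, s:    times in [0, 1] with s > t, broadcastable to xt's shape.
    """
    with torch.no_grad():
        # One Euler step of the teacher ODE as a crude flow-map target;
        # in practice the target would come from a finer integration.
        x_target = xt + (s - t) * teacher(xt, t)
    x_pred = student(xt, t, s)
    return F.mse_loss(x_pred, x_target)
```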
Second Stage Distillation (optional)
Set `algo.teacher_path_f` to your pre-trained FLM checkpoint and `algo.teacher_path_g` to the distilled backbone produced by the first-stage script above.
| Dataset | Script |
|---|---|
| LM1B | scripts/train_lm1b_flm_distill_second.sh |
| OpenWebText | scripts/train_owt_flm_distill_second.sh |
Generative Perplexity Evaluation

Set `CKPT_PATH` in the script to your trained checkpoint before running.
| Model | Dataset | Script |
|---|---|---|
| FLM | LM1B | scripts/gen_ppl_lm1b_flm.sh |
| FLM | OpenWebText | scripts/gen_ppl_owt_flm.sh |
| FMLM | LM1B | scripts/gen_ppl_lm1b_flm_distill_double.sh |
| FMLM | OpenWebText | scripts/gen_ppl_owt_flm_distill_double.sh |
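Generative perplexity is conventionally computed by scoring model samples with an external pretrained language model. The sketch below illustrates that recipe with GPT-2 via Hugging Face `transformers`; the evaluator and settings used by these scripts may differ.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

@torch.no_grad()
def generative_perplexity(samples, device="cuda"):
    """Score generated strings with GPT-2 (illustrative evaluator choice).

    Returns exp of the mean per-token negative log-likelihood.
    """
    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
    nll_sum, n_tokens = 0.0, 0
    for text in samples:
        ids = tok(text, return_tensors="pt",
                  truncation=True, max_length=1024).input_ids.to(device)
        out = lm(input_ids=ids, labels=ids)   # HF averages NLL over predictions
        n = ids.size(1) - 1                   # number of next-token predictions
        nll_sum += out.loss.item() * n
        n_tokens += n
    return float(torch.exp(torch.tensor(nll_sum / max(n_tokens, 1))))
```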
Citation

```bibtex
@article{lee2026one,
  title={One-step Language Modeling via Continuous Denoising},
  author={Chanhyuk Lee and Jaehoon Yoo and Manan Agarwal and Sheel Shah and Jerry Huang and Aditi Raghunathan and Seunghoon Hong and Nicholas M. Boffi and Jinwoo Kim},
  journal={arXiv preprint arXiv:2602.16813},
  year={2026}
}
```
