About the Reading Group
Diffusion LLMs are faster, more controllable successors to traditional LLMs and are rapidly gaining adoption. This reading group aims to build a community for exchanging and debating emerging ideas in this space. While our primary focus is discrete diffusion models for language, we also invite work that extends these methods to other modalities and applications—such as molecular design, drug discovery, and beyond. Each session features an author-led presentation followed by Q&A, with recordings shared on our YouTube channel.
Paper Discussions
Authors present their work followed by discussions and Q&A sessions
Recorded Sessions
All sessions are recorded and available on YouTube
Community
Stay informed through our email list and Twitter/X
Meet the Organizers

Subham Sahoo
Holds a Ph.D. from Cornell Tech, where he specialized in Diffusion Language Models. He has made foundational contributions to the field, with his work deployed at scale by Google, NVIDIA, and ByteDance across language generation and drug discovery.

Justin Deschenaux
PhD student in Machine Learning at EPFL, advised by Prof. Caglar Gulcehre. Previously interned at Apple MLR. His research interests include diffusion language models, fast generative models, and generalization.
Latest Sessions
View All Sessions
S12 | Discrete Feynman-Kac Correctors
Mohsin Hasan and Viktor Ohanesian present Discrete Feynman-Kac Correctors, a framework for controlling discrete diffusion sampling at inference time using Sequential Monte Carlo, enabling temperature control and reward-guided generation without retraining.
In today's session, Mohsin Hasan and Viktor Ohanesian present their recent work on Discrete Feynman-Kac Correctors, a framework for controlling the sampling distribution of discrete diffusion models at inference time. The method uses Sequential Monte Carlo (SMC) to enable temperature control (annealing), combine multiple diffusion processes, and incorporate external reward functions, all without retraining or fine-tuning the original model. The framework is demonstrated on applications including Ising model sampling, improved code generation, and reward-guided protein sequence generation.
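For readers new to SMC, the sketch below illustrates the generic idea of inference-time particle reweighting and resampling; `denoise_step` and `reward` are hypothetical placeholders for a frozen model's reverse step and an external reward, and this is a minimal illustration of the SMC recipe, not the paper's actual corrector.

```python
import numpy as np

# Minimal sketch of SMC-style inference-time control for a discrete diffusion
# sampler. Generic illustration only, NOT the Discrete Feynman-Kac Correctors
# algorithm: `denoise_step` and `reward` are hypothetical stand-ins.

def smc_guided_sampling(denoise_step, reward, init_particles, num_steps, beta=1.0):
    """Maintain a population of partial sequences (particles); at each reverse
    step, reweight them by an exponentiated reward and resample, steering
    generation toward high-reward sequences without retraining."""
    particles = list(init_particles)            # e.g. fully masked sequences
    n = len(particles)
    for t in reversed(range(num_steps)):
        # Propagate every particle one reverse-diffusion step with the frozen model.
        particles = [denoise_step(x, t) for x in particles]
        # Weight particles by exp(beta * reward); beta acts as a
        # temperature / guidance-strength knob.
        log_w = np.array([beta * reward(x) for x in particles])
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Multinomial resampling: high-reward particles are duplicated,
        # low-reward ones are dropped.
        idx = np.random.choice(n, size=n, p=w)
        particles = [particles[i] for i in idx]
    return particles
```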
S10 | Reasoning with Latent Tokens in Diffusion Language Models
Andre He (LTI @ CMU) explains why latent tokens in diffusion language models enable planning and lookahead, and how a similar multi-token prediction objective improves autoregressive reasoning.
In today's session, Andre He (LTI @ CMU) presents his paper Reasoning with Latent Tokens in Diffusion Language Models, which explores why diffusion language models outperform autoregressive (AR) models on synthetic reasoning tasks. Andre argues that diffusion models naturally maintain "latent tokens" (joint predictions over not-yet-decoded positions) that enable planning and lookahead, and demonstrates that modulating these latent tokens creates a smooth trade-off between inference speed and output quality. He further shows that introducing a similar multi-token prediction objective into AR models significantly improves their reasoning performance, suggesting latent tokens as a general mechanism for enhancing global coherence.
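As a rough illustration of that speed/quality knob, the sketch below commits only the k most confident predictions per step, so k directly trades refinement steps for throughput; `model`, `mask_id`, and the tensor shapes are assumptions made for illustration, not Andre's implementation.

```python
import torch

# Minimal sketch of the speed/quality trade-off: the model predicts logits for
# every still-masked position ("latent tokens"), and we commit only the k most
# confident predictions per step. Smaller k = more refinement steps and
# typically better quality; larger k = faster decoding.

@torch.no_grad()
def decode_k_per_step(model, seq, mask_id, k=4):
    """seq: LongTensor [seq_len], initialized to mask_id at unknown positions."""
    while (seq == mask_id).any():
        logits = model(seq.unsqueeze(0)).squeeze(0)   # assumed [seq_len, vocab]
        conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
        conf[seq != mask_id] = -1.0                   # ignore already-decoded slots
        num_masked = int((seq == mask_id).sum())
        top = conf.topk(min(k, num_masked)).indices   # k most confident positions
        seq[top] = pred[top]                          # commit those tokens
    return seq
```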
S9 | Scaling Discrete Diffusion Language Models
Dimitri von Rütte (ETH) and Zhihan Yang (Cornell) present two papers on scaling laws of discrete diffusion LLMs that challenge the dominance of Masked Diffusion.
Dimitri von Rütte (ETH) and Zhihan Yang (Cornell) present "Scaling Behavior of Discrete Diffusion Language Models" (https://arxiv.org/abs/2512.10858) and "Scaling Beyond Masked Diffusion Language Models" (https://www.arxiv.org/abs/2602.15014), two recent papers that establish systematic scaling laws for uniform-state and hybrid discrete diffusion LLMs. Importantly, both papers challenge the dominance of Masked Diffusion.
Featured Videos
View All Videos
How did diffusion LLMs get so fast?
Techniques for accelerating diffusion LLMs, from self-distillation and curriculum learning to KV caching and block diffusion
This video discusses techniques for making diffusion LLMs faster, including self-distillation through time, curriculum learning, confidence scores for unmasking, guided diffusion (FlashDLM), approximate KV caching (dLLM-Cache, dKV-Cache), and block diffusion.
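To make the last item concrete, here is a rough sketch of the block-diffusion decoding pattern: the sequence is generated block by block, left to right, with diffusion-style unmasking inside each block conditioned on all earlier blocks (which is what makes KV caching of earlier blocks possible). `model`, `mask_id`, and the confidence-based commitment rule are illustrative assumptions, not any specific paper's implementation.

```python
import math
import torch

# Minimal sketch of block diffusion: earlier blocks are frozen (so their
# keys/values could be cached), and only the current block is iteratively
# unmasked with a confidence-based rule.

@torch.no_grad()
def block_diffusion_generate(model, num_blocks, block_len, mask_id, steps_per_block=4):
    generated = torch.empty(0, dtype=torch.long)
    per_step = math.ceil(block_len / steps_per_block)
    for _ in range(num_blocks):
        block = torch.full((block_len,), mask_id, dtype=torch.long)
        while (block == mask_id).any():
            ctx = torch.cat([generated, block])                  # earlier blocks stay fixed
            logits = model(ctx.unsqueeze(0)).squeeze(0)[-block_len:]
            conf, pred = logits.softmax(-1).max(-1)              # per-position confidence
            conf[block != mask_id] = -1.0                        # skip decoded slots
            k = min(per_step, int((block == mask_id).sum()))
            top = conf.topk(k).indices
            block[top] = pred[top]                               # commit most confident tokens
        generated = torch.cat([generated, block])
    return generated
```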
But How Do Diffusion Language Models Actually Work?
Jia-Bin Huang explores several ideas for applying diffusion models to language modeling
Most Large Language Models (LLMs) today are based on autoregressive models (i.e., they predict text in a left-to-right order). Diffusion models, by contrast, offer iterative refinement, flexible control, and faster sampling. In this video, we explore several ideas for applying diffusion models to language modeling.
Simple Diffusion Language Models
Quick introduction to Masked Diffusion Language Models (MDLM) by Alexander Rush
Stay Updated
Join our community and never miss a session
