¹ Beijing University of Posts and Telecommunications (BUPT)
² Shanghai Jiao Tong University (SJTU)
³ University of Shanghai for Science and Technology (USST)
⁴ Tsinghua University (THU)
⁵ Hong Kong University of Science and Technology (HKUST)
This repository accompanies our IEEE tutorial paper, serving as a living resource for researchers at the intersection of generative AI and wireless communications. As semantic communications emerge as a paradigm shift from bit-accurate transmission toward meaning-centric communication, diffusion models have become a cornerstone technology enabling receivers to reconstruct high-quality content from minimal semantic cues. This repository provides curated collections of representative works, popular implementations, educational resources, and practical guidelines to help researchers continuously acquire knowledge in this rapidly evolving interdisciplinary field.
📋 TL;DR
What is this article about?
To the best of our knowledge, this is the first tutorial paper on diffusion models for generative semantic communications. It provides a unified resource for researchers to efficiently begin their work in this interdisciplinary area, without having to separately navigate the scattered literature across generative AI and wireless communications.
🎯Mathematical Fundamentals: From score matching and Langevin dynamics to stochastic differential equations (SDEs) and probability flow ordinary differential equations (PF ODEs), we present the theoretical foundations of score-based diffusion models.
🎨 Conditioning Mechanisms: We examine how to steer diffusion models toward task-specific objectives through two complementary paradigms — inference-time conditioning that injects guidance during sampling while preserving pre-trained models, and training-time conditioning that jointly optimizes conditional and unconditional scores for tighter control, meeting the fundamental controllability requirement in semantic communications.
⚡ Sampling Acceleration: Recognizing that iterative sampling (often requiring hundreds to thousands of neural network evaluations) presents significant computational challenges for real-time deployment, we review five primary acceleration strategies: dimensionality reduction, knowledge distillation, structure pruning, cache reuse, and flow matching.
🔬 Task Generalization: We explore how diffusion models, initially conceived for specific data modalities and domains, can be extended across diverse scenarios through three fundamental aspects — modality expansion, domain adaptation, and task generalization — addressing the requirements of task-specific multi-modal semantic communications.
📡 Application Scenarios: Through analysis of three distinct use cases, we illustrate how diffusion models enable extreme compression while maintaining semantic fidelity:
Fidelity-oriented human semantic communications balancing consistency-realism trade-offs for perceptually realistic reconstruction
Task-specific machine semantic communications optimizing effectiveness-efficiency trade-offs for downstream task execution under bandwidth constraints
Intent-driven agent semantic communications managing centralization-distribution trade-offs for multi-agent coordination through shared probabilistic representations
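The score-based foundations listed above admit a compact summary. As a reference (written in the standard notation of score-based generative modeling, which may differ slightly from the paper's own), the forward and reverse SDEs, the probability flow ODE, and the denoising score matching objective are:

```latex
% Forward (noising) SDE and its reverse-time counterpart
\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w},
\qquad
\mathrm{d}\mathbf{x} = \big[\mathbf{f}(\mathbf{x}, t) - g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}

% Deterministic probability flow ODE sharing the same marginals p_t
\mathrm{d}\mathbf{x} = \Big[\mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\Big]\,\mathrm{d}t

% Denoising score matching objective for the score network s_theta
\min_\theta \; \mathbb{E}_{t,\, \mathbf{x}_0,\, \mathbf{x}_t}
\Big[ \lambda(t)\, \big\| \mathbf{s}_\theta(\mathbf{x}_t, t) - \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t \mid \mathbf{x}_0) \big\|_2^2 \Big]
```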
Why is this article needed?
As wireless systems approach Shannon capacity limits, semantic communications represent a paradigm shift from bit-accurate transmission toward meaning-centric communication. The emergence of diffusion models as powerful generative priors has catalyzed generative semantic communications, where receivers reconstruct high-quality content from minimal semantic cues. However, the field currently lacks systematic guidance connecting diffusion model techniques to semantic communication system design. This article fills that critical gap by:
Eliminating barriers between machine learning and communication communities
Providing depth beyond existing surveys and magazines through rigorous mathematical treatment and implementation details
Establishing connections via an inverse problem perspective that reformulates semantic decoding as posterior inference
Offering practical resources including open-source implementations and deployment guidelines
Who should read this?
We believe this article may be helpful to the following groups of people:
Researchers in semantic communications seeking to leverage diffusion models
Machine learning practitioners interested in wireless communication applications
Graduate students entering the interdisciplinary field of AI-native wireless networks
Engineers designing next-generation communication systems with semantic awareness
Seminal works establishing the theoretical and practical foundations of diffusion models.
| # | Method | Venue | Key Contribution | Links |
|---|--------|-------|------------------|-------|
| 1 | Deep Unsupervised Learning using Nonequilibrium Thermodynamics | ICML'15 | First diffusion model using thermodynamic principles | |
| 2 | NCSN - Generative Modeling by Estimating Gradients | NeurIPS'19 | Score matching with Langevin dynamics (SMLD) | |
| 3 | DDPM - Denoising Diffusion Probabilistic Models | NeurIPS'20 | Simplified training objective and high-quality generation | |
| 4 | DDIM - Denoising Diffusion Implicit Models | ICLR'21 | Non-Markovian sampling for accelerated generation | |
| 5 | Score SDE - Score-Based Generative Modeling through SDEs | ICLR'21 | Unified SDE framework connecting score matching and diffusion | |
| 6 | LDM - High-Resolution Image Synthesis with Latent Diffusion Models | CVPR'22 | Diffusion in learned latent spaces (Stable Diffusion) | |
🎨 Conditional Diffusion Models
Conditional diffusion models enable controlled generation by incorporating external guidance. This section covers two main categories based on when conditioning is applied.
Inference-Time Conditional Diffusion Models
These methods introduce guidance during sampling without modifying the pre-trained model.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | CG - Classifier Guidance | NeurIPS'21 | Adds classifier gradients to steer generation | |
| 2 | ILVR | ICCV'21 | Iterative refinement toward a reference image | |
| 3 | SDEdit | ICLR'22 | Structure-preserving editing via controlled denoising | |
| 4 | RePaint | CVPR'22 | Inpainting by alternating denoising and re-noising | |
| 5 | Prompt-to-Prompt | arXiv'22 | Cross-attention editing guided by text prompts | |
| 6 | DDRM | NeurIPS'22 | Linear inverse problem solver using diffusion priors | |
| 7 | MCG | NeurIPS'22 | Adds manifold consistency during sampling | |
| 8 | DDNM - Denoising Diffusion Null-space Model | ICLR'23 | Null-space projection for zero-shot restoration | |
| 9 | DPS - Diffusion Posterior Sampling | ICLR'23 | Posterior sampling with measurement guidance | |
| 10 | πGDM - Pseudoinverse-Guided DM | ICLR'23 | Pseudoinverse-based conditioning for inverse tasks | |
| 11 | Null-Text Inversion | CVPR'23 | Real-image editing via null-text optimization | |
| 12 | BlindDPS | CVPR'23 | Jointly samples unknown operator and clean signal | |
| 13 | DiffPIR | CVPRW'23 | Plug-and-play restoration with diffusion priors | |
| 14 | DiffusionMBIR | CVPR'23 | Uses 2D diffusion priors for 3D reconstruction | |
| 15 | FreeDoM | ICCV'23 | Training-free diffusion adaptation for new tasks | |
| 16 | DG - Discriminator Guidance | ICML'23 | Introduces a discriminator providing explicit supervision along the denoising sample path | |
| 17 | SMRD | MICCAI'23 | MRI reconstruction via diffusion priors | |
| 18 | PSLD | NeurIPS'23 | Posterior sampling in latent diffusion space | |
| 19 | RED-diff | ICLR'24 | Variational regularization with diffusion denoisers | |
| 20 | ControlVideo | ICLR'24 | Video editing with spatial/temporal control via fine-tuning | |
| 21 | DeqIR | CVPR'24 | Fixed-point solver for diffusion restoration | |
| 22 | SparseCtrl | ECCV'24 | Adds sparse keyframe controls to text-to-video diffusion | |
| 23 | DiffBIR | ECCV'24 | Blind image restoration with generative diffusion priors | |
| 24 | DMPlug | NeurIPS'24 | Plug-in solver for general inverse problems | |
| 25 | DGSolver | NeurIPS'25 | Diffusion generalist solver with universal posterior sampling | |
| 26 | DAPS | CVPR'25 | Annealed posterior sampling for inverse problems | |
| 27 | SITCOM | ICML'25 | Iterative constrained optimization during sampling | |
| 28 | DiffStateGrad | ICLR'25 | Gradient projection in diffusion latent space | |
| 29 | RF-Inversion | ICLR'25 | Semantic image inversion and editing using rectified SDEs | |
| 30 | FlowDPS | ICCV'25 | Posterior sampling within flow-matching ODEs | |
Key Formula:
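The shared starting point of these inference-time methods is the Bayes decomposition of the conditional score, where the first term comes from the pre-trained model and the second injects the measurement or guidance signal; methods such as CG and DPS differ mainly in how they approximate the second term. Its standard statement is:

```latex
\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t \mid \mathbf{y})
  = \underbrace{\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)}_{\text{pre-trained score}}
  + \underbrace{\nabla_{\mathbf{x}_t} \log p_t(\mathbf{y} \mid \mathbf{x}_t)}_{\text{guidance term}}
```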
Training-Time Conditional Diffusion Models
These methods incorporate conditioning directly during model training.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 4 | | | Personalizes a concept via learned token embeddings | |
| 5 | DreamBooth | CVPR'23 | Subject-driven personalization via fine-tuning | |
| 6 | GLIGEN | CVPR'23 | Grounded language-to-image generation | |
| 7 | InstructPix2Pix | CVPR'23 | Instruction-based image editing | |
| 8 | ControlNet | ICCV'23 | Fine-grained spatial control | |
| 9 | IP-Adapter | arXiv'23 | Image prompt adapter for identity/style conditioning | |
| 10 | MoD - Mixture of Diffusers | arXiv'23 | Conditional diffusion with learned mixture experts | |
| 11 | DiT - Diffusion Transformer | ICCV'23 | Transformer-based diffusion | |
| 12 | MDT - Masked Diffusion Transformer | ICCV'23 | Masked diffusion transformers | |
| 13 | SDXL - Stable Diffusion XL | ICLR'24 | High-res text-to-image diffusion with multi-aspect conditioning | |
| 14 | T2I-Adapter | AAAI'24 | Lightweight adapters for control | |
| 15 | AnimateDiff | ICLR'24 | Motion module for animation | |
| 16 | LVD - LLM-grounded Video Diffusion | ICLR'24 | LLM-guided video generation | |
| 17 | SEINE | ICLR'24 | Short-to-long video diffusion | |
| 18 | VideoCrafter2 | CVPR'24 | Open-source text-to-video / video editing diffusion pipeline | |
| 19 | HunyuanDiT | CVPR'24 | Large-scale DiT-based text-to-image diffusion with strong conditioning | |
| 20 | S-CFG - Rethinking Spatial Inconsistency in CFG | CVPR'24 | Analyzes and improves spatial consistency in CFG-based generation | |
| 21 | D3PO | CVPR'24 | RLHF-style preference finetuning for diffusion without reward model | |
| 22 | DreamMatcher | CVPR'24 | Appearance matching self-attention for semantically-consistent text-to-image personalization | |
| 23 | PixArt-Σ | ECCV'24 | High-resolution text-to-image | |
| 24 | Follow-Your-Emoji | SIGGRAPH Asia'24 | Fine-controllable and expressive freestyle portrait animation with diffusion | |
| 25 | HunyuanVideo | arXiv'24 | High-res text-to-video diffusion with multi-scale DiT backbone | |
| 26 | DDO - Direct Discriminative Optimization | ICML'25 | Direct optimization for preference alignment | |
| 27 | CFG++ | ICLR'25 | Refines CFG via dynamic gradient weighting | |
| 28 | Ctrl-Adapter | ICLR'25 | Unified adapter to inject diverse spatial/temporal controls into image/video diffusion | |
| 29 | T2V-Turbo-v2 | ICLR'25 | Fast text-to-video generation | |
| 30 | β-CFG | arXiv'25 | Dynamic guidance method for text-to-image diffusion models | |
Key Formula:
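Most of the training-time methods above build on classifier-free guidance (CFG), which jointly trains conditional and unconditional predictions and combines them at sampling time with a guidance scale $w$; its standard form is:

```latex
\tilde{\boldsymbol{\epsilon}}_\theta(\mathbf{x}_t, \mathbf{c})
  = \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, \varnothing)
  + w \big( \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, \mathbf{c})
  - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, \varnothing) \big)
```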
⚡ Efficient Diffusion Models
Efficient diffusion models aim to reduce computational cost and sampling time through various acceleration strategies.
Dimensionality Reduction
Operating in compressed latent spaces reduces computational overhead.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | LDM - Latent Diffusion Model | CVPR'22 | Stable Diffusion foundation | |
| 2 | WSGM - Wavelet Score-based GM | NeurIPS'22 | Wavelet-based score models | |
| 3 | DiT - Diffusion Transformer | ICCV'23 | Transformer-based diffusion | |
| 4 | WaveDiff | CVPR'23 | Wavelet-based diffusion | |
| 5 | LMD - Latent Masking Diffusion | AAAI'24 | Combines the advantages of MAEs and diffusion | |
Knowledge Distillation
Distilling multi-step diffusion into fewer steps or single-step models.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | PD - Progressive Distillation | ICLR'22 | 4-8 steps with minimal quality loss | |
| 2 | CM - Consistency Model | ICML'23 | Single-step generation | |
| 3 | LCM - Latent Consistency Model | arXiv'23 | Distills diffusion into few-step latent consistency models | |
| 4 | DMD2 - Distribution Matching Distillation v2 | NeurIPS'24 | Improved distribution matching | |
| 5 | CTM - Consistency Trajectory Model | ICLR'24 | Trajectory consistency modeling | |
| 6 | iCT - Improved Consistency Training | ICML'24 | Improved consistency training without teacher models | |
Structure Pruning
Reducing model parameters through structured pruning.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | Diff-Pruning | NeurIPS'23 | Structural pruning for diffusion | |
| 2 | TDPM - Truncated DPM | ICLR'23 | Truncated diffusion models | |
| 3 | LD-Pruner | CVPR'24 | Latent diffusion pruning | |
| 4 | DiP-GO | NeurIPS'24 | Diffusion pruning with gradient optimization | |
| 5 | AdaDiff | ECCV'24 | Adaptive diffusion pruning | |
| 6 | SnapFusion | NeurIPS'23 | Mobile diffusion via architecture evolution and data distillation | |
Cache Reuse
Reusing intermediate computations across sampling steps.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | DeepCache | CVPR'24 | Deep feature caching | |
| 2 | BlockCaching | CVPR'24 | Block-wise caching strategy | |
| 3 | L2C - Learning to Cache | NeurIPS'24 | Learned caching policies | |
| 4 | ToCa - Token-wise Caching | ICLR'25 | Token-wise feature caching for DiT acceleration | |
| 5 | ClusCa - Clustered Caching | MM'25 | Compute-efficient clustering cache | |
| 6 | TaylorSeer | ICCV'25 | Taylor expansion-based feature forecasting for DiT acceleration | |
Flow Matching
Transforming diffusion into deterministic flows for faster sampling.
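In the rectified-flow formulation used by several of the methods below, training reduces to regressing a velocity field onto straight-line interpolations between data and noise, which is what enables few-step or one-step sampling. A standard statement of the conditional flow matching objective (in common notation, not necessarily the paper's) is:

```latex
\mathbf{x}_t = (1 - t)\,\mathbf{x}_0 + t\,\mathbf{x}_1,
\qquad
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{t,\, \mathbf{x}_0 \sim p_0,\, \mathbf{x}_1 \sim p_1}
    \big\| \mathbf{v}_\theta(\mathbf{x}_t, t) - (\mathbf{x}_1 - \mathbf{x}_0) \big\|_2^2
```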
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | Flow Matching | ICLR'23 | Continuous normalizing flows | |
| 2 | Rectified Flow | ICLR'23 | Straightening probability flows | |
| 3 | PeRFlow - Piecewise Rectified Flow | NeurIPS'24 | Piecewise rectification for accelerating diffusion models | |
| 4 | InstaFlow | ICLR'24 | One-step generation via rectified flow | |
| 5 | MeanFlow | NeurIPS'25 | Mean-field flow matching | |
| 6 | Stable Diffusion 3 | arXiv'24 | Scaling rectified flow transformers for high-resolution image synthesis (MMDiT) | |
| 7 | FLUX | arXiv'25 | High-quality flow matching-based text-to-image model with hybrid transformer architecture | |
🌐 Generalized Diffusion Models
Generalized diffusion models extend the framework to diverse modalities, domains, and tasks.
Modality Expansion
Extending diffusion to multiple modalities beyond images.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | MonoFormer | arXiv'24 | One transformer for both diffusion and autoregression | |
| 2 | Diffusion Forcing | NeurIPS'24 | Full-sequence diffusion forcing | |
| 3 | Show-o | ICLR'25 | Unified image and text generation | |
| 4 | Transfusion | ICLR'25 | Combining diffusion and autoregression | |
| 5 | UniDisc | arXiv'25 | Unified discrete-continuous diffusion | |
| 6 | OmniGen2 | arXiv'25 | Unified image generation model with multi-modal conditioning | |
Domain Adaptation
Adapting diffusion models to specialized domains.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | DSB - Diffusion Schrödinger Bridge | NeurIPS'21 | Domain transfer via Schrödinger bridge | |
| 2 | Composable Diffusion | ECCV'22 | Compositional visual generation | |
| 3 | DreamBooth | CVPR'23 | Personalization with few examples | |
| 4 | I2SB - Image-to-Image Schrödinger Bridge | ICML'23 | Image-to-image translation | |
| 5 | P2P-Bridge | ECCV'24 | Point-to-point bridging | |
| 6 | OT-CFM | ICLR'23 | Optimal transport conditional flow matching for efficient domain coupling | |
Task Generalization
Generalizing diffusion models across multiple tasks.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | Diffuser | ICML'22 | Planning with diffusion models | |
| 2 | Diffusion Policy | RSS'23 | Visuomotor policy learning | |
| 3 | DDPO - Denoising Diffusion Policy Optimization | ICLR'24 | RL fine-tuning for diffusion | |
| 4 | C-LoRA - Continual LoRA | TMLR'24 | Continual learning for diffusion | |
| 5 | Diffusion-ES | CVPR'24 | Evolutionary search with diffusion for black-box trajectory optimization | |
| 6 | B²-DiffuRL | CVPR'25 | Bidirectional diffusion for RL | |
| 7 | DPPO - Diffusion Policy Policy Optimization | ICLR'25 | PPO fine-tuning for diffusion policies in robotics | |
🛜 Diffusion Models for Semantic Communications
This section presents applications of diffusion models in semantic communications.
[Preliminary] Diffusion Models for Data Compression
Representative works using diffusion models for data compression across image, video, and audio modalities.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | CDC | NeurIPS'23 | Conditional diffusion decoder for end-to-end optimized lossy image compression | |
| 2 | HFD | arXiv'23 | High-fidelity compression with score-based generative models | |
| 3 | Multi-Band Diffusion | NeurIPS'23 | High-fidelity audio generation from low-bitrate discrete representations | |
| 4 | PerCo | ICLR'24 | Ultra-low bitrate image compression with diffusion models (0.003 bpp) | |
| 5 | IPIC (Idempotence) | ICLR'24 | Perceptual compression via idempotence constraints without training new models | |
| 6 | CorrDiff | ICML'24 | Correcting diffusion compression with privileged end-to-end decoder | |
| 7 | Foundation Diffusion | ECCV'24 | Lossy compression using pre-trained foundation models without fine-tuning | |
| 8 | Extreme Video Compression | WCSP'24 | Extreme video compression with diffusion-based predictive generation (0.02 bpp) | |
| 9 | UQDM | ICLR'25 | Progressive compression with universally quantized diffusion models | |
| 10 | DiffC | ICLR'25 | Zero-shot lossy compression using pretrained Stable Diffusion models | |
| 11 | PICD | CVPR'25 | Versatile perceptual image compression with diffusion rendering for screen and natural images | |
Fidelity-Oriented Human Semantic Communications
Diffusion models for high-quality semantic image, video, and audio transmission prioritizing perceptual fidelity for human consumption.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | DM4ASC | ICASSP'24 | First diffusion framework for audio semantic communication as inverse problem | |
| 2 | CommIN | ICASSP'24 | INN-guided diffusion for wireless image transmission as inverse problem | |
| 3 | DiffSC | ICASSP'24 | DDPM with Multi-Dimensional Feature Extraction for high-noise environments | |
| 4 | CDDM | TWC'24 | Channel denoising diffusion models adapting to AWGN/Rayleigh channels | |
| 5 | Gen-SC | WCSP'24 | Transmits images efficiently by sending text descriptions and reconstructing images via a text-to-image diffusion model | |
| 6 | CDM-JSCC | WCL'24 | Enhances the perceptual quality of transmitted images by utilizing a rate-adaptive conditional diffusion model | |
| 7 | Img2Img-SC | MLSP'24 | Language-oriented semantic communication framework that transmits both textual descriptions and compressed image embeddings | |
| 8 | MU-GSC | arXiv'24 | Swin Transformer JSCC with diffusion decoder, 17.75% PSNR improvement | |
| 9 | DiffJSCC | TMLCN'25 | Pre-trained Stable Diffusion with Deep JSCC achieving <0.008 symbols/pixel | |
| 10 | DiffCom | JSAC'25 | Probabilistic sampling using channel signals as fine-grained conditions | |
| 11 | GVSC | TVT'25 | First generative video semantic communication at low bandwidth ratio | |
| 12 | Wang et al. | arXiv'25 | Receiver-driven retransmission with caption-guided latent diffusion inpainting | |
| 13 | SGD-JSCC | arXiv'25 | DiT-based diffusion with semantic side information for channel denoising | |
| 14 | WVSC-D | arXiv'25 | Wireless video semantic communication framework with decoupled diffusion multi-frame compensation | |
| 15 | DiT-JSCC | arXiv'26 | A DiT-based generative JSCC that ensures high semantic consistency for image transmission under extreme channel conditions | |
Task-Specific Machine Semantic Communications
Resource-efficient diffusion models optimized for machine semantic communications and edge computing scenarios.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | GESCO | arXiv'23 | Pioneering diffusion-based machine semantic communication transmitting compressed semantic maps | |
| 2 | Qiao et al. | WCL'24 | Latency-aware generative semantic communications with pre-trained diffusion models | |
| 3 | SCGSC | WCNC'24 | Semantic change driven generative machine semantic communication framework | |
| 4 | LDM-SemCom | TWC'25 | Real-time edge computing with end-to-end consistency distillation | |
| 5 | Guo et al. | TWC'25 | Treating wireless transmission as forward diffusion process with VAE modules | |
| 6 | Q-GESCO | WCL'25 | Quantized models reducing memory 75% and FLOPs 79% for resource-constrained devices | |
| 7 | CASC | ICC'25 | Latent diffusion with Condition-Aware NN, 51.7% inference time reduction | |
| 8 | SC-Diffusion | TMLCN'25 | Parameter generation for task-oriented semantic communications via conditional diffusion model | |
| 9 | Khalid et al. | ICML'25 | Semantic image communication via Stable Cascade with compact latent embeddings | |
| 10 | Wang et al. | arXiv'25 | Training-free LDM receiver with SDE-derived SNR-to-timestep mapping for zero-shot generalization | |
| 11 | DiffSem | arXiv'25 | Task-oriented with privacy, notable accuracy improvement on MNIST | |
| 12 | SS-MGSC | arXiv'25 | A multi-user generative semantic communication framework utilizing semantic-splitting and diffusion models for personalized vehicular networks | |
Intent-Driven Agent Semantic Communications
AI agents with diffusion models for intent-driven semantic communications.
| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | A-GSC | TWC'24 | Agent-driven generative semantic communications with cross-modality and prediction based on diffusion RL | |
| 2 | Semantic Collaboration | CNIOT'24 | A multi-agent collaboration framework based on semantic communication for search and rescue tasks | |
| 3 | CSCA | TMC'26 | A diffusion policy-empowered cognitive SemCom agent for intent-driven multimodal communication planning at the edge | |
📊 Benchmarks and Datasets
Benchmarks
Widely-used open-source benchmarks for evaluating diffusion model generation quality, prompt fidelity, and compositional capabilities.
Text-to-Image Benchmarks
| # | Benchmark | Description | Source |
|---|-----------|-------------|--------|
| 1 | DrawBench | 200 challenging prompts across 11 categories (counting, colors, spatial, text rendering, etc.) introduced by Imagen for qualitative human evaluation of T2I models. | |
| 2 | PartiPrompts (P2) | 1,600 diverse English prompts spanning 12 categories and 11 challenge aspects for holistic T2I evaluation. Released with the Parti model. | |
| 3 | TIFA | VQA-based automatic evaluation measuring T2I faithfulness by generating question-answer pairs from prompts and verifying against images. 4K prompts, 25K questions across 12 categories. | |
| 4 | T2I-CompBench | Comprehensive compositional T2I benchmark evaluating attribute binding, spatial relationships, and complex compositions with detection-based metrics. | |
| 5 | GenEval | Compositional generation benchmark evaluating object count, spatial relations, attribute binding, and co-occurrence accuracy via object detection pipelines. | |
| 6 | DPG-Bench | Dense prompt generation benchmark with long, detailed prompts synthesized from multi-annotation sources for evaluating models on complex, attribute-rich descriptions. | |
| 7 | MJHQ-30K | 30K high-quality Midjourney images across 10 categories for automatic FID-based aesthetic quality evaluation. Curated with aesthetic and CLIP score filtering. | |
| 8 | GenAI-Bench | 1,600 compositional prompts from professional designers, evaluating advanced reasoning (counting, comparison, logic) with human ratings across 10 leading T2I/T2V models. | |
Video Generation Benchmarks
| # | Benchmark | Description | Source |
|---|-----------|-------------|--------|
| 1 | VBench | Comprehensive video generation benchmark evaluating 16 dimensions including temporal consistency, motion quality, aesthetic fidelity, and subject identity. | |
| 2 | EvalCrafter | Benchmark and pipeline for evaluating video generation models across visual quality, text-video alignment, motion quality, and temporal consistency. | |
Datasets
Audio
| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | LibriSpeech | Large-scale corpus of read English speech derived from audiobooks. Clean and noisy subsets available. | 1000 hours | ASR, Speech Recognition | |
| 2 | VCTK | English multi-speaker corpus with 110 speakers reading newspapers. High-quality recordings. | 44 hours | TTS, Voice Conversion, Speaker Recognition | |
| 3 | AudioSet | Large-scale dataset of 2M 10-second audio clips with 527 sound event classes from YouTube. | 2M clips | Audio Classification, Sound Event Detection | |
Image
| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | ImageNet | Large-scale image classification dataset with 1000 object categories. Standard benchmark for computer vision. | 1.4M images | Classification, Object Recognition | |
| 2 | COCO | Common Objects in Context. Object detection, segmentation, and captioning with 80 categories. | 330K images | Detection, Segmentation, Captioning | |
| 3 | FFHQ | Flickr-Faces-HQ. High-quality face dataset at 1024×1024 resolution with diverse variations. | 70K images | Face Generation, GAN, Style Transfer | |
| 4 | CLIC | Challenge on Learned Image Compression dataset. Professional quality images for compression research. | 2000+ images | Image Compression, Quality Assessment | |
| 5 | Kodak | Kodak PhotoCD dataset. Standard benchmark with 24 high-quality uncompressed images. | 24 images | Image Compression, Quality Evaluation | |
| 6 | Places365 | Scene recognition dataset with 365 scene categories. Focuses on environmental context. | 10M images | Scene Recognition, Classification | |
| 7 | CelebA | Large-scale face attributes dataset with 40 attribute annotations per image. | 202K images | Face Recognition, Attribute Prediction | |
Video
| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | Kinetics-400/600/700 | Large-scale human action video dataset from YouTube. Standard for action recognition. | 650K videos | Action Recognition, Video Classification | |
| 2 | UCF101 | Action recognition dataset with 101 action categories from realistic web videos. | 13K videos | Action Recognition, Video Understanding | |
| 3 | ActivityNet | Large-scale video dataset for human activity understanding with temporal annotations. | 20K videos | Activity Detection, Temporal Localization | |
| 4 | YouTube-8M | Large-scale video understanding dataset with 8M videos and 3862 visual entity classes. | 8M videos | Video Classification, Multi-label | |
| 5 | MSR-VTT | Video captioning dataset with 10K video clips and 200K natural language descriptions. | 10K videos | Video Captioning, Video-Text Retrieval | |
Volume (3D/4D)
| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | D-NeRF | Dynamic Neural Radiance Fields dataset with synthetic and real dynamic scenes for 4D reconstruction. | 9 scenes | Dynamic Novel View Synthesis, 4D Reconstruction | |
| 2 | Neu3D | Neural 3D video synthesis dataset with multi-view videos of human performances. | 200+ sequences | 3D Human Reconstruction, Neural Rendering | |
| 3 | ShapeNet | Large-scale 3D shape dataset with 55 object categories and 51,300 3D CAD models. | 51K models | 3D Reconstruction, Shape Analysis | |
| 4 | ScanNet | Richly-annotated indoor RGB-D scans with 3D semantic segmentation labels for 1513 scenes. | 1513 scans | 3D Segmentation, Indoor Scene Understanding | |
| 5 | ModelNet | 3D CAD model dataset with ModelNet40 (40 classes) and ModelNet10 (10 classes) versions. | 12K models | 3D Classification, Point Cloud Processing | |
| 6 | NeRF Synthetic | Blender-rendered synthetic scenes with known camera poses and lighting for NeRF evaluation. | 8 scenes | Novel View Synthesis, 3D Reconstruction | |
Domain-Specific
Autonomous Driving
| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | nuScenes | Full 3D sensor suite with LiDAR, radar, and cameras. 1000 scenes with 3D bounding boxes. | 1000 scenes | 3D Detection, Tracking, Prediction | |
| 2 | KITTI | Benchmark suite for stereo, optical flow, visual odometry, and 3D object detection from driving scenarios. | 200K images | 3D Detection, Depth, Odometry | |
| 3 | Waymo Open Dataset | High-resolution sensor data with LiDAR and camera from Waymo vehicles. Large-scale 3D annotations. | 1000 segments | 3D Detection, Tracking, Motion Prediction | |
| 4 | Cityscapes | Urban street scenes with dense pixel-level semantic and instance segmentation annotations. | | | |
Medical Imaging
| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| | | Large chest X-ray dataset with free-text radiology reports. Largest publicly available CXR dataset. | 377K images | Disease Classification, Report Generation | |
| 3 | ChestX-ray14 | Large-scale chest X-ray dataset with 14 common disease labels for multi-label classification. | 112K images | Disease Classification, Localization | |
| 4 | Medical Segmentation Decathlon | Multi-organ segmentation covering 10 different medical imaging tasks (CT, MRI). | 2600+ cases | Multi-task 3D Segmentation | |
Depth Estimation
| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | NYU Depth V2 | Indoor RGB-D dataset with dense depth maps from Microsoft Kinect. 1449 labeled scenes. | 1449 scenes | Depth Estimation, Indoor Scene Understanding | |
| 2 | DIODE | Dense Indoor and Outdoor DEpth dataset with high-quality depth from laser scanner. | 25K images | Depth Estimation, Normal Estimation | |
| 3 | Middlebury Stereo | Standard stereo matching benchmark with high-resolution calibrated image pairs and ground truth. | 30+ pairs | Stereo Matching, Depth Estimation | |
| 4 | SceneFlow | Large synthetic dataset with optical flow and disparity ground truth for 3D scene understanding. | 39K images | Optical Flow, Stereo Matching, Depth | |
Remote Sensing
| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | SpaceNet | High-resolution satellite imagery with building footprints, road networks across multiple cities. | 1M+ buildings | Building Detection, Road Extraction | |
| 2 | xView | One of the largest overhead imagery datasets with 1M object instances across 60 classes. | 1M objects | Object Detection, Classification | |
| 3 | DOTA | Dataset for Object deTection in Aerial images with oriented bounding boxes. 15 categories. | 188K instances | Oriented Object Detection, Aerial Imagery | |
| 4 | LEVIR-CD | Large-scale building change detection dataset from Google Earth with 637 image pairs. | 637 pairs | Change Detection, Building Analysis | |
📏 Evaluation Metrics
Perception Metrics
Full-Reference Metrics
| # | Metric | Description | Source |
|---|--------|-------------|--------|
| 1 | PSNR | Peak Signal-to-Noise Ratio. Measures the ratio between the maximum possible power of a signal and the power of corrupting noise. Calculated as PSNR = 10·log₁₀(MAX²/MSE). | |
| 2 | SSIM | Structural Similarity Index. Assesses image quality based on luminance, contrast, and structure. Designed to improve on PSNR by considering structural information. | |
| 3 | LPIPS | Learned Perceptual Image Patch Similarity. Uses deep neural network features to compute perceptual distance between images, better aligned with human perception. | |
| 4 | DISTS | Deep Image Structure and Texture Similarity. Combines structure and texture similarity using deep features for better perceptual quality assessment. | |
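The PSNR definition above translates directly into code. As a minimal sketch (operating on flat pixel lists rather than image arrays; the `psnr` helper and its signature are ours, not from any particular library):

```python
import math

def psnr(reference, distorted, max_val=255.0):
    """Peak Signal-to-Noise Ratio: 10 * log10(MAX^2 / MSE).

    Higher is better; identical inputs give infinity (MSE = 0).
    """
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    # Mean squared error over corresponding pixels
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For example, two 8-bit signals that differ by 16 at every pixel have MSE = 256, giving a PSNR of roughly 24 dB; library implementations (e.g. in image-processing toolkits) follow the same formula on 2D/3D arrays.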
Reduced-Reference Metrics
| # | Metric | Description | Source |
|---|--------|-------------|--------|
| 1 | RRED | Reduced-Reference Entropic Differencing. Uses entropic differences between wavelet coefficients, requiring only partial statistical features from reference. | |
| 2 | RR-SSIM | Reduced-Reference SSIM. Extracts and transmits only key structural features (edge information, local statistics) from reference image. | |
No-Reference Metrics
| # | Metric | Description | Source |
|---|--------|-------------|--------|
| 1 | NIQE | Natural Image Quality Evaluator. Measures deviation from statistical regularities in natural images using natural scene statistics (NSS). Completely blind quality assessment. | |
| 2 | FID | Fréchet Inception Distance. Calculates Fréchet distance between feature distributions of real and generated images in Inception-v3 space. Lower FID indicates better quality and diversity. | |
| 3 | KID | Kernel Inception Distance. Unbiased alternative to FID using polynomial kernel on Inception features. More reliable for small sample sizes. | |
| 4 | IS | Inception Score. Evaluates both quality (classification confidence) and diversity (marginal class distribution). | |
| 5 | MUSIQ | Multi-scale Image Quality Transformer. Handles native-resolution images via multi-scale patch embedding without fixed-size cropping, enabling more robust no-reference quality assessment. | |
| 6 | CLIP-IQA | Leverages CLIP's vision-language representations for no-reference image quality and aesthetic assessment via prompt-based antonym pairing. | |
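For reference, the FID mentioned above has a closed form: with Gaussian approximations $(\boldsymbol{\mu}_r, \boldsymbol{\Sigma}_r)$ and $(\boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g)$ fitted to the Inception-v3 features of real and generated images, it is the squared 2-Wasserstein distance between the two Gaussians:

```latex
\mathrm{FID}
  = \left\| \boldsymbol{\mu}_r - \boldsymbol{\mu}_g \right\|_2^2
  + \mathrm{Tr}\!\left( \boldsymbol{\Sigma}_r + \boldsymbol{\Sigma}_g
  - 2 \left( \boldsymbol{\Sigma}_r \boldsymbol{\Sigma}_g \right)^{1/2} \right)
```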
Semantic Metrics
| # | Metric | Description | Source |
|---|--------|-------------|--------|
| 1 | CLIPScore | Measures text-image alignment using CLIP embeddings. Computed as cosine similarity between CLIP image and text features. | |
| 2 | ViTScore | Uses Vision Transformer features to evaluate semantic similarity between images. Captures high-level semantic content beyond pixel-level differences. | |
| 3 | SeSS | Semantic Similarity Score. Based on Scene Graph Generation and graph matching, shifts image similarity scores into semantic-level graph matching scores. | |
| 4 | DreamSim | Learned perceptual metric trained on synthetic triplet judgments from diffusion models, capturing mid-level semantic similarity beyond low-level texture. | |
| 5 | ImageReward | Text-image alignment metric learned from human preference rankings via reward modeling, designed to evaluate text-to-image generation quality. | |
| 6 | HPSv2 | Human Preference Score v2. Fine-tuned CLIP model predicting human aesthetic preferences for generated images, trained on large-scale human choice data. | |
| 7 | PickScore | Preference-based scoring model trained on the Pick-a-Pic dataset of human pairwise preferences for text-to-image generation. | |
🔗 Other Resources
📚 Comprehensive Books, Surveys & Tutorials
Diffusion Models
| # | Paper | Authors | Year | Links |
|---|-------|---------|------|-------|
| 1 | Understanding Diffusion Models: A Unified Perspective | Luo et al. | 2022 | |
| 2 | Diffusion Models: A Comprehensive Survey of Methods and Applications | Yang et al. | 2022 | |
| 3 | Diffusion Models in Vision: A Survey | Croitoru et al. | 2022 | |
| 4 | A Survey on Generative Diffusion Models | Cao et al. | 2022 | |
| 5 | A Survey on Video Diffusion Models | Xing et al. | 2023 | |
| 6 | Diffusion Models for Image Restoration and Enhancement: A Comprehensive Survey | Li et al. | 2023 | |
| 7 | Efficient Diffusion Models: A Comprehensive Survey From Principles to Practices | Ma et al. | 2024 | |
| 8 | Diffusion Model-Based Image Editing: A Survey | Huang et al. | 2024 | |
| 9 | Diffusion Models in Low-Level Vision: A Survey | He et al. | 2024 | |
| 10 | Diffusion Models in 3D Vision: A Survey | Wang et al. | 2024 | |
| 11 | Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review | Uehara et al. | 2024 | |
| 12 | Efficient Diffusion Models: A Survey | Shen et al. | 2025 | |
| 13 | A Survey on Diffusion Language Models | Li et al. | 2025 | |
| 14 | The Principles of Diffusion Models | Lai et al. | 2025 | |
| 15 | Flow Matching Guide and Code | Lipman et al. | 2024 | |
| 16 | An Introduction to Flow Matching and Diffusion Models | Holderrieth & Erives | 2025 | |
Semantic Communications

| # | Paper | Authors | Year | Links |
|---|-------|---------|------|-------|
| 1 | Toward Wisdom-Evolutionary and Primitive-Concise 6G: A New Paradigm of Semantic Communication Networks | Zhang et al. | 2022 | |
| 2 | Semantic Communications for Future Internet: Fundamentals, Applications, and Challenges | Yang et al. | 2022 | |
| 3 | Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications | Gunduz et al. | 2022 | |
| 4 | Semantics-Empowered Communications: A Tutorial-Cum-Survey | Lu et al. | 2022 | |
| 5 | Less Data, More Knowledge: Building Next Generation Semantic Communication Networks | Chaccour et al. | 2022 | |
| 6 | Enhancing Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization | Du et al. | 2023 | |
| 7 | A Survey on Semantic Communication Networks: Architecture, Security, and Privacy | Guo et al. | 2024 | |
| 8 | Resource Management, Security, and Privacy Issues in Semantic Communications: A Survey | Won et al. | 2024 | |
| 9 | Generative AI-Driven Semantic Communication Networks: Architecture, Technologies, and Applications | Liang et al. | 2024 | |
| 10 | A Contemporary Survey on Semantic Communications: Theory of Mind, Generative AI, and Deep Joint Source-Channel Coding | Nguyen et al. | 2025 | |
| 11 | Generative Diffusion Models for Wireless Networks: Fundamental, Architecture, and State-of-the-Art | Fan et al. | 2025 | |
| 12 | Resource Allocation in Wireless Semantic Communications: A Comprehensive Survey | Zhang et al. | 2025 | |
📺 Courses & Video Lectures

| # | Title | Source | Type | Links |
|---|-------|--------|------|-------|
| 1 | Stanford CS236: Deep Generative Models | Stefano Ermon et al. | University Course | |
| 2 | MIT 6.S978: Deep Generative Models | Kaiming He et al. | University Course | |
| 3 | MIT 6.S184: Introduction to Flow Matching and Diffusion Models | Peter Holderrieth & Ezra Erives | University Course | |
| 4 | Diffusion Models Course | Hugging Face | Online Course | |
| 5 | NeurIPS 2023 Workshop: Diffusion Models | NeurIPS | Workshop | |
| 6 | Diffusion and Score-Based Generative Models | Yang Song | Lecture | |
| 7 | Two Minute Papers – Diffusion Series | Two Minute Papers | YouTube Series | |
| 8 | Generative Modeling by Estimating Gradients of the Data Distribution | Yang Song | Blog Post | |
| 9 | What are Diffusion Models? | Lilian Weng | Blog Post | |
🧰 Interactive Demos & Tools

| # | Tool | Type | What it’s great for | Links |
|---|------|------|---------------------|-------|
| 1 | Stable Diffusion WebUI (AUTOMATIC1111) | UI + Extensions | Local UI with a huge plugin ecosystem | |
| 2 | InvokeAI | Pro UI | Studio-style creative workflow & editing | |
| 3 | 🤗 Diffusers | Library | Clean Python API for diffusion inference & training | |
| 4 | Diffusers Playground (Hugging Face Spaces) | Web demo | Try many pipelines online (no local install) | |
| 5 | ComfyUI | Node-graph UI | Modular node-based pipelines for reproducible workflows | |
| 6 | StableStudio (Stability AI) | Official UI | Frontend for SDXL / Stability AI models | |
| 7 | Fooocus | Simple UI | One-click text→image with SDXL support | |
| 8 | kohya-ss / sd-scripts | Training / Finetune | LoRA, DreamBooth, and fine-tuning helpers | |
| 9 | ControlNet | Conditioning model | Pose-, edge-, and depth-guided generation | |
| 10 | sd-webui-controlnet | WebUI Extension | Easy ControlNet integration for the WebUI | |
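Under the hood, all of these UIs and libraries drive some variant of the same reverse-diffusion sampling loop. A minimal DDPM-style ancestral sampler is sketched below in NumPy, with a dummy noise predictor standing in for a trained network; the function names and schedule values are illustrative, not from any of the tools above:

```python
import numpy as np

def ddpm_sample(eps_model, shape, T=50, beta_min=1e-4, beta_max=0.02, seed=0):
    """Minimal DDPM ancestral sampling loop (Ho et al., 2020).
    `eps_model(x, t)` must return the predicted noise eps_theta(x_t, t);
    here we later plug in a dummy predictor for demonstration."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_min, beta_max, T)   # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.normal(size=shape)                   # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = eps_model(x, t)                    # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])   # posterior mean
        noise = rng.normal(size=shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise     # ancestral step (sigma_t^2 = beta_t)
    return x

# Dummy "model" that simply predicts the current state as noise.
sample = ddpm_sample(lambda x, t: x, shape=(4, 4))
print(sample.shape, np.isfinite(sample).all())
```

Conditioning mechanisms such as ControlNet modify only the `eps_model` call in this loop, injecting guidance into the predicted noise while the schedule and ancestral update stay unchanged.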
📝 Citation

If you find this article or repository helpful, please consider citing:

```bibtex
@article{qin-diffcomm,
  author  = {H. L. Qin and J. Dai and G. Lu and S. Shao and S. Wang and T. Xu and W. Zhang and P. Zhang and K. B. Letaief},
  title   = {Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications},
  journal = {arXiv preprint arXiv:2511.08416},
  year    = {2025}
}
```
Related Papers from Our Group

```bibtex
@article{dai-gaicomm,
  author  = {J. Dai and X. Qin and S. Wang and L. Xu and K. Niu and P. Zhang},
  title   = {Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency},
  journal = {IEEE Wireless Commun.},
  volume  = {31},
  number  = {4},
  pages   = {48--56},
  year    = {2024}
}

@article{wang-diffcom,
  author  = {S. Wang and J. Dai and K. Tan and X. Qin and K. Niu and P. Zhang},
  title   = {DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling},
  journal = {IEEE J. Sel. Areas Commun.},
  volume  = {43},
  number  = {7},
  pages   = {2651--2666},
  year    = {2025}
}

@article{qin-semcod,
  author  = {H. L. Qin and J. Dai and S. Wang and X. Qin and S. Shao and K. Niu and W. Xu and P. Zhang},
  title   = {Neural Coding is Not Always Semantic: Toward the Standardized Coding Workflow in Semantic Communications},
  journal = {IEEE Commun. Stand. Mag.},
  volume  = {9},
  number  = {4},
  pages   = {24--33},
  year    = {2025}
}

@article{tan-ditjscc,
  author  = {K. Tan and J. Dai and S. Wang and G. Lu and S. Shao and K. Niu and W. Zhang and P. Zhang},
  title   = {DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations},
  journal = {arXiv preprint arXiv:2601.03112},
  year    = {2026}
}
```
🌟 Acknowledgments
We thank the diffusion models and semantic communications research communities for their groundbreaking work. Special thanks to all current and future contributors to this repository.
⭐ Star this repo if you find it useful! ⭐
Maintained with ❤️ by the community.