qin-jingyun/Awesome-DiffComm


😎 Generative AI Meets 6G and Beyond:
Diffusion Models for Semantic Communications


Hai-Long Qin¹, Jincheng Dai¹, Guo Lu², Shuo Shao³, Sixian Wang², Tongda Xu⁴,
Wenjun Zhang², Ping Zhang¹, Khaled B. Letaief⁵

¹ Beijing University of Posts and Telecommunications (BUPT)
² Shanghai Jiao Tong University (SJTU)
³ University of Shanghai for Science and Technology (USST)
⁴ Tsinghua University (THU)
⁵ Hong Kong University of Science and Technology (HKUST)

This repository accompanies our IEEE tutorial paper, serving as a living resource for researchers at the intersection of generative AI and wireless communications. As semantic communications emerge as a paradigm shift from bit-accurate transmission toward meaning-centric communication, diffusion models have become a cornerstone technology enabling receivers to reconstruct high-quality content from minimal semantic cues. This repository provides curated collections of representative works, popular implementations, educational resources, and practical guidelines to help researchers continuously acquire knowledge in this rapidly evolving interdisciplinary field.


Teaser

📋 TL;DR

What is this article about?
To the best of our knowledge, this is the first tutorial paper on diffusion models for generative semantic communications. It provides a unified resource for researchers to efficiently begin their work in this interdisciplinary area, without having to navigate scattered literature across generative AI and wireless communications.

  • 🎯 Mathematical Fundamentals: From score matching and Langevin dynamics to stochastic differential equations (SDEs) and probability flow ordinary differential equations (PF ODEs), we present the theoretical foundations of score-based diffusion models.
  • 🎨 Conditioning Mechanisms: We examine how to steer diffusion models toward task-specific objectives through two complementary paradigms — inference-time conditioning that injects guidance during sampling while preserving pre-trained models, and training-time conditioning that jointly optimizes conditional and unconditional scores for tighter control, meeting the fundamental controllability requirement in semantic communications.
  • ⚡ Sampling Acceleration: Recognizing that iterative sampling (often requiring hundreds to thousands of neural network evaluations) poses a significant computational challenge for real-time deployment, we review five primary acceleration strategies: dimensionality reduction, knowledge distillation, structure pruning, cache reuse, and flow matching.
  • 🔬 Task Generalization: We explore how diffusion models, initially conceived for specific data modalities and domains, can be extended across diverse scenarios through three fundamental aspects (modality expansion, domain adaptation, and task generalization), addressing the requirements of task-specific multi-modal semantic communications.
  • 📡 Application Scenarios: Through analysis of three distinct use cases, we illustrate how diffusion models enable extreme compression while maintaining semantic fidelity:
    • Fidelity-oriented human semantic communications balancing consistency-realism trade-offs for perceptually realistic reconstruction
    • Task-specific machine semantic communications optimizing effectiveness-efficiency trade-offs for downstream task execution under bandwidth constraints
    • Intent-driven agent semantic communications managing centralization-distribution trade-offs for multi-agent coordination through shared probabilistic representations

Why is this article needed?
As wireless systems approach Shannon capacity limits, semantic communications represent a paradigm shift from bit-accurate transmission toward meaning-centric communication. The emergence of diffusion models as powerful generative priors has catalyzed generative semantic communications, where receivers reconstruct high-quality content from minimal semantic cues. However, the field currently lacks systematic guidance connecting diffusion model techniques to semantic communication system design. This article fills that critical gap by:

  • Eliminating barriers between machine learning and communication communities
  • Providing depth beyond existing surveys and magazine articles through rigorous mathematical treatment and implementation details
  • Establishing connections via an inverse problem perspective that reformulates semantic decoding as posterior inference
  • Offering practical resources including open-source implementations and deployment guidelines

Who should read this?
We believe this article may be helpful to the following groups of people:

  • Researchers in semantic communications seeking to leverage diffusion models
  • Machine learning practitioners interested in wireless communication applications
  • Graduate students entering the interdisciplinary field of AI-native wireless networks
  • Engineers designing next-generation communication systems with semantic awareness

📇 Table of Contents

🎓 Fundamentals of Diffusion Models

Mathematical Foundations

Mathematical concepts underlying diffusion models.

| # | Concept | Reference | Description | Links |
|---|---------|-----------|-------------|-------|
| 1 | Score Matching | Estimation of Non-Normalized Statistical Models (Hyvärinen, JMLR 2005) | Foundation for learning score functions without computing partition functions | Paper |
| 2 | Denoising Score Matching | A Connection Between Score Matching and Denoising Autoencoders (Vincent, Neural Computation 2011) | Equivalence between score matching and denoising | Paper |
| 3 | Langevin Dynamics | Bayesian Learning via Stochastic Gradient Langevin Dynamics (Welling & Teh, ICML 2011) | MCMC sampling using gradient information | Paper |
| 4 | Tweedie's Formula | Tweedie's Formula and Selection Bias (Efron, JASA 2011) | Posterior mean estimation from corrupted observations | Paper |
| 5 | Neural ODEs | Neural Ordinary Differential Equations (Chen et al., NeurIPS 2018) | Continuous-depth neural networks and invertible transformations | arXiv GitHub |
| 6 | Flow Matching | Flow Matching for Generative Modeling (Lipman et al., ICLR 2023) | Continuous normalizing flows via regression | arXiv GitHub |
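The Langevin dynamics entry above can be illustrated with a toy sampler. In this sketch (an assumption for illustration, not any paper's exact algorithm), the closed-form score of a 1-D Gaussian stands in for a learned score network:

```python
import numpy as np

# Toy unadjusted Langevin dynamics sampler. Assumption: the analytic score of
# N(mu, 1) stands in for a trained score network s_theta(x).
rng = np.random.default_rng(0)
mu = 2.0

def score(x):
    return -(x - mu)  # grad_x log N(x; mu, 1)

step = 0.01
x = np.zeros(2000)  # 2000 independent chains, all started at 0
for _ in range(2000):
    x = x + step * score(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)

# After many steps, the chains are approximate samples from N(mu, 1).
print(abs(x.mean() - mu) < 0.15)
```

Score-based generative models (NCSN, Score SDE below) replace the analytic `score` with a neural network trained via (denoising) score matching.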

Foundational Papers

Seminal works establishing the theoretical and practical foundations of diffusion models.

| # | Method | Venue | Key Contribution | Links |
|---|--------|-------|------------------|-------|
| 1 | Deep Unsupervised Learning using Nonequilibrium Thermodynamics | ICML'15 | First diffusion model using thermodynamic principles | arXiv GitHub |
| 2 | NCSN - Generative Modeling by Estimating Gradients | NeurIPS'19 | Score matching with Langevin dynamics (SMLD) | arXiv GitHub |
| 3 | DDPM - Denoising Diffusion Probabilistic Models | NeurIPS'20 | Simplified training objective and high-quality generation | arXiv GitHub Website |
| 4 | DDIM - Denoising Diffusion Implicit Models | ICLR'21 | Non-Markovian sampling for accelerated generation | arXiv GitHub |
| 5 | Score SDE - Score-Based Generative Modeling through SDEs | ICLR'21 | Unified SDE framework connecting score matching and diffusion | arXiv GitHub |
| 6 | LDM - High-Resolution Image Synthesis with Latent Diffusion Models | CVPR'22 | Diffusion in learned latent spaces (Stable Diffusion) | arXiv GitHub HF |
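DDPM's simplified training objective (noted in the table above) reduces to regressing the injected noise. A minimal sketch of the forward noising step, assuming a linear beta schedule for illustration:

```python
import numpy as np

# Toy sketch of DDPM's forward noising process. Assumption: a linear beta
# schedule; `eps` is the noise a network eps_theta(x_t, t) would regress.
rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def q_sample(x0, t, eps):
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = np.array([1.0, -0.5])
eps = rng.standard_normal(2)
xt = q_sample(x0, t=500, eps=eps)
# Training minimizes || eps - eps_theta(x_t, t) ||^2 over random (x0, t, eps).
```

At `t = 0` the sample is almost the clean data; near `t = T - 1` it is almost pure noise, which is what makes the reverse (denoising) process a generative model.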

🎨 Conditional Diffusion Models

Conditional diffusion models enable controlled generation by incorporating external guidance. This section covers two main categories based on when conditioning is applied.

Inference-Time Conditional Diffusion Models

These methods introduce guidance during sampling without modifying the pre-trained model.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | CG - Classifier Guidance | NeurIPS'21 | Adds classifier gradients to steer generation | arXiv GitHub |
| 2 | ILVR | ICCV'21 | Iterative refinement toward a reference image | arXiv GitHub |
| 3 | SDEdit | ICLR'22 | Structure-preserving editing via controlled denoising | arXiv GitHub Website |
| 4 | RePaint | CVPR'22 | Inpainting by alternating denoising and re-noising | arXiv GitHub |
| 5 | Prompt-to-Prompt | arXiv'22 | Cross-attention editing guided by text prompts | arXiv GitHub Website |
| 6 | DDRM | NeurIPS'22 | Linear inverse problem solver using diffusion priors | arXiv GitHub Website |
| 7 | MCG | NeurIPS'22 | Adds manifold consistency during sampling | arXiv GitHub |
| 8 | DDNM - Denoising Diffusion Null-space Model | ICLR'23 | Null-space projection for zero-shot restoration | arXiv GitHub HF Website |
| 9 | DPS - Diffusion Posterior Sampling | ICLR'23 | Posterior sampling with measurement guidance | arXiv GitHub Website |
| 10 | πGDM - Pseudoinverse-Guided DM | ICLR'23 | Pseudoinverse-based conditioning for inverse tasks | arXiv Website GitHub |
| 11 | Null-Text Inversion | CVPR'23 | Real-image editing via null-text optimization | arXiv GitHub Website |
| 12 | BlindDPS | CVPR'23 | Jointly samples unknown operator and clean signal | arXiv GitHub |
| 13 | DiffPIR | CVPRW'23 | Plug-and-play restoration with diffusion priors | arXiv GitHub |
| 14 | DiffusionMBIR | CVPR'23 | Uses 2D diffusion priors for 3D reconstruction | arXiv GitHub |
| 15 | FreeDoM | ICCV'23 | Training-free diffusion adaptation for new tasks | arXiv GitHub |
| 16 | DG - Discriminator Guidance | ICML'23 | Introduces a discriminator that gives explicit supervision to a denoising sample path | arXiv GitHub |
| 17 | SMRD | MICCAI'23 | MRI reconstruction via diffusion priors | Paper GitHub |
| 18 | PSLD | NeurIPS'23 | Posterior sampling in latent diffusion space | arXiv GitHub |
| 19 | RED-diff | ICLR'24 | Variational regularization with diffusion denoisers | arXiv GitHub |
| 20 | ControlVideo | ICLR'24 | Video editing with spatial/temporal control via fine-tuning | arXiv GitHub Replicate Website |
| 21 | DeqIR | CVPR'24 | Fixed-point solver for diffusion restoration | arXiv GitHub |
| 22 | SparseCtrl | ECCV'24 | Adds sparse keyframe controls to text-to-video diffusion | arXiv GitHub Website |
| 23 | DiffBIR | ECCV'24 | Blind image restoration with generative diffusion priors | arXiv GitHub Replicate Website |
| 24 | DMPlug | NeurIPS'24 | Plug-in solver for general inverse problems | arXiv GitHub |
| 25 | DGSolver | NeurIPS'25 | Diffusion generalist solver with universal posterior sampling | arXiv GitHub |
| 26 | DAPS | CVPR'25 | Annealed posterior sampling for inverse problems | arXiv GitHub Website |
| 27 | SITCOM | ICML'25 | Iterative constrained optimization during sampling | arXiv GitHub |
| 28 | DiffStateGrad | ICLR'25 | Gradient projection in diffusion latent space | arXiv GitHub Website |
| 29 | RF-Inversion | ICLR'25 | Semantic image inversion and editing using rectified SDEs | arXiv GitHub ComfyUI Website |
| 30 | FlowDPS | ICCV'25 | Posterior sampling within flow-matching ODEs | arXiv GitHub |

Key Formula:

$$\nabla_{x_t} \log p(x_t \mid y) \;=\; \nabla_{x_t} \log p(x_t) \;+\; \nabla_{x_t} \log p(y \mid x_t)$$
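Inference-time guidance combines an unconditional score with a measurement-likelihood gradient (the Bayes decomposition used by DPS-style methods above). A minimal numerical illustration on a linear-Gaussian toy problem, where both scores are known in closed form (function names are illustrative, not from any specific paper):

```python
import numpy as np

# Toy guided score. Assumption: prior p(x) = N(0, I) and measurement
# y = x + n with n ~ N(0, sigma^2 I), so both scores are analytic.

def prior_score(x):
    # grad_x log N(x; 0, I) = -x  (stand-in for a learned score network)
    return -x

def likelihood_score(x, y, sigma):
    # grad_x log N(y; x, sigma^2 I) = (y - x) / sigma^2
    return (y - x) / sigma**2

def guided_score(x, y, sigma):
    # grad_x log p(x | y) = grad_x log p(x) + grad_x log p(y | x)
    return prior_score(x) + likelihood_score(x, y, sigma)

x = np.array([0.5, -1.0])
y = np.array([1.0, 0.0])
print(guided_score(x, y, sigma=1.0))  # → [0. 2.]
```

For this toy model the posterior is N(y/2, I/2), whose score is exactly `-2x + y`, matching the guided score; real methods replace `prior_score` with a pre-trained diffusion model's score at noise level t.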

Training-Time Conditional Diffusion Models

These methods incorporate conditioning directly during model training.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | CFG - Classifier-Free Guidance | NeurIPS'21 | Standard for conditional generation | arXiv GitHub |
| 2 | LDM - Latent Diffusion Model | CVPR'22 | Stable Diffusion foundation | arXiv GitHub HF |
| 3 | Palette | SIGGRAPH'22 | Image-to-image diffusion (colorization, inpainting, etc.) | arXiv Website |
| 4 | Textual Inversion | arXiv'22 | Personalizes a concept via learned token embeddings | arXiv GitHub Website |
| 5 | DreamBooth | CVPR'23 | Subject-driven personalization via fine-tuning | arXiv GitHub Website |
| 6 | GLIGEN | CVPR'23 | Grounded language-to-image generation | arXiv GitHub Website |
| 7 | InstructPix2Pix | CVPR'23 | Instruction-based image editing | arXiv GitHub Website |
| 8 | ControlNet | ICCV'23 | Fine-grained spatial control | arXiv GitHub HF |
| 9 | IP-Adapter | arXiv'23 | Image prompt adapter for identity/style conditioning | arXiv GitHub Website |
| 10 | MoD - Mixture of Diffusers | arXiv'23 | Conditional diffusion with learned mixture experts | arXiv GitHub |
| 11 | DiT - Diffusion Transformer | ICCV'23 | Transformer-based diffusion | arXiv GitHub Website |
| 12 | MDT - Masked Diffusion Transformer | ICCV'23 | Masked diffusion transformers | arXiv GitHub |
| 13 | SDXL - Stable Diffusion XL | ICLR'24 | High-res text-to-image diffusion with multi-aspect conditioning | arXiv GitHub HF |
| 14 | T2I-Adapter | AAAI'24 | Lightweight adapters for control | arXiv GitHub HF |
| 15 | AnimateDiff | ICLR'24 | Motion module for animation | arXiv GitHub Website |
| 16 | LVD - LLM-grounded Video Diffusion | ICLR'24 | LLM-guided video generation | arXiv GitHub Website |
| 17 | SEINE | ICLR'24 | Short-to-long video diffusion | arXiv GitHub HF Website |
| 18 | VideoCrafter2 | CVPR'24 | Open-source text-to-video / video editing diffusion pipeline | arXiv GitHub Website |
| 19 | HunyuanDiT | CVPR'24 | Large-scale DiT-based text-to-image diffusion with strong conditioning | arXiv GitHub HF Website |
| 20 | S-CFG - Rethinking Spatial Inconsistency in CFG | CVPR'24 | Analyzes and improves spatial consistency in CFG-based generation | arXiv GitHub |
| 21 | D3PO | CVPR'24 | RLHF-style preference finetuning for diffusion without reward model | arXiv GitHub |
| 22 | DreamMatcher | CVPR'24 | Appearance matching self-attention for semantically consistent text-to-image personalization | arXiv GitHub Website |
| 23 | PixArt-Σ | ECCV'24 | High-resolution text-to-image | arXiv GitHub HF Website |
| 24 | Follow-Your-Emoji | SIGGRAPH Asia'24 | Fine-controllable and expressive freestyle portrait animation with diffusion | arXiv GitHub Website |
| 25 | HunyuanVideo | arXiv'24 | High-res text-to-video diffusion with multi-scale DiT backbone | arXiv GitHub HF Website |
| 26 | DDO - Direct Discriminative Optimization | ICML'25 | Direct optimization for preference alignment | arXiv GitHub HF Website |
| 27 | CFG++ | ICLR'25 | Refines CFG via dynamic gradient weighting | arXiv GitHub Website |
| 28 | Ctrl-Adapter | ICLR'25 | Unified adapter to inject diverse spatial/temporal controls into image/video diffusion | arXiv GitHub HF Website |
| 29 | T2V-Turbo-v2 | ICLR'25 | Fast text-to-video generation | arXiv GitHub Website |
| 30 | β-CFG | arXiv'25 | Dynamic guidance method for text-to-image diffusion models | arXiv GitHub |

Key Formula:

$$\tilde{\epsilon}_\theta(x_t, c) \;=\; \epsilon_\theta(x_t, \varnothing) \;+\; w \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)$$
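Classifier-free guidance extrapolates from the unconditional toward the conditional noise prediction with a guidance scale `w`. A toy sketch (the arrays below stand in for the outputs of a jointly trained conditional/unconditional noise predictor):

```python
import numpy as np

# Classifier-free guidance combination. Assumption: eps_uncond and eps_cond
# are placeholder outputs of eps_theta(x_t, ∅) and eps_theta(x_t, c).
def cfg_noise(eps_uncond, eps_cond, w):
    # eps_tilde = eps_uncond + w * (eps_cond - eps_uncond)
    # w = 0: unconditional; w = 1: conditional; w > 1: amplified guidance
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.0, 1.0])  # unconditional prediction
eps_c = np.array([1.0, 1.0])  # conditional prediction
print(cfg_noise(eps_u, eps_c, w=2.0))  # → [2. 1.]
```

The equivalent convention $(1 + w)\,\epsilon_\theta(x_t, c) - w\,\epsilon_\theta(x_t, \varnothing)$ also appears in the literature; only the offset of `w` differs.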

⚡ Efficient Diffusion Models

Efficient diffusion models aim to reduce computational cost and sampling time through various acceleration strategies.

Dimensionality Reduction

Operating in compressed latent spaces reduces computational overhead.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | LDM - Latent Diffusion Model | CVPR'22 | Stable Diffusion foundation | arXiv GitHub HF |
| 2 | WSGM - Wavelet Score-based GM | NeurIPS'22 | Wavelet-based score models | Website |
| 3 | DiT - Diffusion Transformer | ICCV'23 | Transformer-based diffusion | arXiv GitHub Website |
| 4 | WaveDiff | CVPR'23 | Wavelet-based diffusion | arXiv GitHub |
| 5 | LMD - Latent Masking Diffusion | AAAI'24 | Combines the advantages of MAEs and diffusion | arXiv GitHub |

Knowledge Distillation

Distilling multi-step diffusion into fewer steps or single-step models.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | PD - Progressive Distillation | ICLR'22 | 4-8 steps with minimal quality loss | arXiv GitHub |
| 2 | CM - Consistency Model | ICML'23 | Single-step generation | arXiv GitHub |
| 3 | LCM - Latent Consistency Model | arXiv'23 | Distills diffusion into few-step latent consistency models | arXiv GitHub Replicate Website |
| 4 | DMD2 - Distribution Matching Distillation v2 | NeurIPS'24 | Improved distribution matching | arXiv GitHub HF Website |
| 5 | CTM - Consistency Trajectory Model | ICLR'24 | Trajectory consistency modeling | arXiv GitHub Website |
| 6 | iCT - Improved Consistency Training | ICML'24 | Improved consistency training without teacher models | arXiv GitHub |

Structure Pruning

Reducing model parameters through structured pruning.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | Diff-Pruning | NeurIPS'23 | Structural pruning for diffusion | arXiv GitHub |
| 2 | TDPM - Truncated DPM | ICLR'23 | Truncated diffusion models | arXiv GitHub |
| 3 | LD-Pruner | CVPR'24 | Latent diffusion pruning | Website |
| 4 | DiP-GO | NeurIPS'24 | Diffusion pruning with gradient optimization | arXiv GitHub |
| 5 | AdaDiff | ECCV'24 | Adaptive diffusion pruning | arXiv GitHub |
| 6 | SnapFusion | NeurIPS'23 | Mobile diffusion via architecture evolution and data distillation | arXiv Website |

Cache Reuse

Reusing intermediate computations across sampling steps.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | DeepCache | CVPR'24 | Deep feature caching | arXiv GitHub Website |
| 2 | BlockCaching | CVPR'24 | Block-wise caching strategy | arXiv Website |
| 3 | L2C - Learning to Cache | NeurIPS'24 | Learned caching policies | arXiv GitHub |
| 4 | ToCa - Token-wise Caching | ICLR'25 | Token-wise feature caching for DiT acceleration | arXiv GitHub |
| 5 | ClusCa - Clustered Caching | MM'25 | Compute-efficient clustering cache | arXiv GitHub |
| 6 | TaylorSeer | ICCV'25 | Taylor expansion-based feature forecasting for DiT acceleration | arXiv GitHub Website |

Flow Matching

Transforming diffusion into deterministic flows for faster sampling.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | Flow Matching | ICLR'23 | Continuous normalizing flows | arXiv GitHub |
| 2 | Rectified Flow | ICLR'23 | Straightening probability flows | arXiv GitHub |
| 3 | PeRFlow - Piecewise Rectified Flow | NeurIPS'24 | Piecewise rectification for accelerating diffusion models | arXiv GitHub HF Website |
| 4 | InstaFlow | ICLR'24 | One-step generation via rectified flow | arXiv GitHub |
| 5 | MeanFlow | NeurIPS'25 | One-step generation via average-velocity flow fields | arXiv GitHub |
| 6 | Stable Diffusion 3 | arXiv'24 | Scaling rectified flow transformers for high-resolution image synthesis (MMDiT) | arXiv GitHub HF |
| 7 | FLUX | arXiv'25 | High-quality flow matching-based text-to-image model with hybrid transformer architecture | arXiv GitHub HF |
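To make the flow-matching idea concrete, here is a toy sketch of the straight-line (rectified-flow style) interpolant and the constant-velocity target the model regresses; the names and sample values are illustrative assumptions:

```python
import numpy as np

# Toy conditional flow matching training pair: interpolate between a noise
# sample x0 and a data sample x1, and regress the constant velocity (x1 - x0).
def cfm_pair(x0, x1, t):
    xt = (1.0 - t) * x0 + t * x1   # straight-line interpolant x_t
    target = x1 - x0               # velocity the model should predict at (x_t, t)
    return xt, target

x0 = np.array([0.0, 2.0])   # "noise" sample
x1 = np.array([3.0, -1.0])  # "data" sample
xt, v = cfm_pair(x0, x1, t=0.5)  # midpoint of the path and its velocity
```

Because the learned velocity field defines a deterministic ODE with (near-)straight trajectories, sampling can use very few integration steps, which is the source of the speedups in the table above.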

🌐 Generalized Diffusion Models

Generalized diffusion models extend the framework to diverse modalities, domains, and tasks.

Modality Expansion

Extending diffusion to multiple modalities beyond images.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | MonoFormer | arXiv'24 | One transformer for both diffusion and autoregression | arXiv GitHub HF Website |
| 2 | Diffusion Forcing | NeurIPS'24 | Full-sequence diffusion forcing | arXiv GitHub Website |
| 3 | Show-o | ICLR'25 | Unified image and text generation | arXiv GitHub |
| 4 | Transfusion | ICLR'25 | Combining diffusion and autoregression | arXiv GitHub |
| 5 | UniDisc | arXiv'25 | Unified discrete-continuous diffusion | arXiv GitHub HF Website |
| 6 | OmniGen2 | arXiv'25 | Unified image generation model with multi-modal conditioning | arXiv GitHub HF Website |

Domain Adaptation

Adapting diffusion models to specialized domains.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | DSB - Diffusion Schrödinger Bridge | NeurIPS'21 | Domain transfer via Schrödinger bridge | Website GitHub |
| 2 | Composable Diffusion | ECCV'22 | Compositional visual generation | arXiv GitHub Website |
| 3 | DreamBooth | CVPR'23 | Personalization with few examples | arXiv GitHub Website |
| 4 | I2SB - Image-to-Image Schrödinger Bridge | ICML'23 | Image-to-image translation | arXiv GitHub Website |
| 5 | P2P-Bridge | ECCV'24 | Point-to-point bridging | arXiv GitHub Website |
| 6 | OT-CFM | ICLR'23 | Optimal transport conditional flow matching for efficient domain coupling | arXiv GitHub |

Task Generalization

Generalizing diffusion models across multiple tasks.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | Diffuser | ICML'22 | Planning with diffusion models | arXiv GitHub Website |
| 2 | Diffusion Policy | RSS'23 | Visuomotor policy learning | arXiv GitHub Website |
| 3 | DDPO - Denoising Diffusion Policy Optimization | ICLR'24 | RL fine-tuning for diffusion | arXiv GitHub Website |
| 4 | C-LoRA - Continual LoRA | TMLR'24 | Continual learning for diffusion | arXiv Website |
| 5 | Diffusion-ES | CVPR'24 | Evolutionary search with diffusion for black-box trajectory optimization | arXiv GitHub Website |
| 6 | B²-DiffuRL | CVPR'25 | Bidirectional diffusion for RL | arXiv GitHub |
| 7 | DPPO - Diffusion Policy Policy Optimization | ICLR'25 | PPO fine-tuning for diffusion policies in robotics | arXiv GitHub Website |

🛜 Diffusion Models for Semantic Communications

This section presents applications of diffusion models in semantic communications.

[Preliminary] Diffusion Models for Data Compression

Representative works using diffusion models for data compression across image, video, and audio modalities.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | CDC | NeurIPS'23 | Conditional diffusion decoder for end-to-end optimized lossy image compression | arXiv GitHub |
| 2 | HFD | arXiv'23 | High-fidelity compression with score-based generative models | arXiv |
| 3 | Multi-Band Diffusion | NeurIPS'23 | High-fidelity audio generation from low-bitrate discrete representations | arXiv GitHub Website |
| 4 | PerCo | ICLR'24 | Ultra-low bitrate image compression with diffusion models (0.003 bpp) | arXiv GitHub |
| 5 | IPIC (Idempotence) | ICLR'24 | Perceptual compression via idempotence constraints without training new models | arXiv GitHub |
| 6 | CorrDiff | ICML'24 | Correcting diffusion compression with privileged end-to-end decoder | arXiv |
| 7 | Foundation Diffusion | ECCV'24 | Lossy compression using pre-trained foundation models without fine-tuning | arXiv |
| 8 | Extreme Video Compression | WCSP'24 | Extreme video compression with diffusion-based predictive generation (0.02 bpp) | arXiv GitHub |
| 9 | UQDM | ICLR'25 | Progressive compression with universally quantized diffusion models | arXiv GitHub Website |
| 10 | DiffC | ICLR'25 | Zero-shot lossy compression using pretrained Stable Diffusion models | arXiv GitHub Website |
| 11 | PICD | CVPR'25 | Versatile perceptual image compression with diffusion rendering for screen and natural images | arXiv |

Fidelity-Oriented Human Semantic Communications

Diffusion models for high-quality semantic image, video, and audio transmission prioritizing perceptual fidelity for human consumption.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | DM4ASC | ICASSP'24 | First diffusion framework for audio semantic communication as inverse problem | arXiv GitHub Website |
| 2 | CommIN | ICASSP'24 | INN-guided diffusion for wireless image transmission as inverse problem | arXiv |
| 3 | DiffSC | ICASSP'24 | DDPM with multi-dimensional feature extraction for high-noise environments | Website |
| 4 | CDDM | TWC'24 | Channel denoising diffusion models adapting to AWGN/Rayleigh channels | arXiv GitHub |
| 5 | Gen-SC | WCSP'24 | Transmits images efficiently by sending text descriptions and reconstructing images via a text-to-image diffusion model | arXiv |
| 6 | CDM-JSCC | WCL'24 | Enhances the perceptual quality of transmitted images by utilizing a rate-adaptive conditional diffusion model | arXiv GitHub |
| 7 | Img2Img-SC | MLSP'24 | Language-oriented semantic communication framework that transmits both textual descriptions and compressed image embeddings | arXiv GitHub |
| 8 | MU-GSC | arXiv'24 | Swin Transformer JSCC with diffusion decoder, 17.75% PSNR improvement | arXiv |
| 9 | DiffJSCC | TMLCN'25 | Pre-trained Stable Diffusion with Deep JSCC achieving <0.008 symbols/pixel | arXiv GitHub |
| 10 | DiffCom | JSAC'25 | Probabilistic sampling using channel signals as fine-grained conditions | arXiv GitHub Website |
| 11 | GVSC | TVT'25 | First generative video semantic communication at low bandwidth ratio | arXiv |
| 12 | Wang et al. | arXiv'25 | Receiver-driven retransmission with caption-guided latent diffusion inpainting | arXiv |
| 13 | SGD-JSCC | arXiv'25 | DiT-based diffusion with semantic side information for channel denoising | arXiv GitHub |
| 14 | WVSC-D | arXiv'25 | Wireless video semantic communication framework with decoupled diffusion multi-frame compensation | arXiv |
| 15 | DiT-JSCC | arXiv'26 | A DiT-based generative JSCC that ensures high semantic consistency for image transmission under extreme channel conditions | arXiv |

Task-Specific Machine Semantic Communications

Resource-efficient diffusion models optimized for machine semantic communications and edge computing scenarios.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | GESCO | arXiv'23 | Pioneering diffusion-based machine semantic communication transmitting compressed semantic maps | arXiv GitHub |
| 2 | Qiao et al. | WCL'24 | Latency-aware generative semantic communications with pre-trained diffusion models | Website |
| 3 | SCGSC | WCNC'24 | Semantic change driven generative machine semantic communication framework | arXiv GitHub |
| 4 | LDM-SemCom | TWC'25 | Real-time edge computing with end-to-end consistency distillation | arXiv GitHub |
| 5 | Guo et al. | TWC'25 | Treating wireless transmission as forward diffusion process with VAE modules | arXiv |
| 6 | Q-GESCO | WCL'25 | Quantized models reducing memory 75% and FLOPs 79% for resource-constrained devices | arXiv GitHub |
| 7 | CASC | ICC'25 | Latent diffusion with Condition-Aware NN, 51.7% inference time reduction | arXiv |
| 8 | SC-Diffusion | TMLCN'25 | Parameter generation for task-oriented semantic communications via conditional diffusion model | Website |
| 9 | Khalid et al. | ICML'25 | Semantic image communication via Stable Cascade with compact latent embeddings | arXiv |
| 10 | Wang et al. | arXiv'25 | Training-free LDM receiver with SDE-derived SNR-to-timestep mapping for zero-shot generalization | arXiv |
| 11 | DiffSem | arXiv'25 | Task-oriented with privacy, notable accuracy improvement on MNIST | arXiv |
| 12 | SS-MGSC | arXiv'25 | A multi-user generative semantic communication framework utilizing semantic-splitting and diffusion models for personalized vehicular networks | arXiv |

Intent-Driven Agent Semantic Communications

AI agents with diffusion models for intent-driven semantic communications.

| # | Method | Venue | Description | Links |
|---|--------|-------|-------------|-------|
| 1 | A-GSC | TWC'24 | Agent-driven generative semantic communications with cross-modality and prediction based on diffusion RL | arXiv |
| 2 | Semantic Collaboration | CNIOT'24 | A multi-agent collaboration framework based on semantic communication for search and rescue tasks | Website |
| 3 | CSCA | TMC'26 | A diffusion policy-empowered cognitive SemCom agent for intent-driven multimodal communication planning at the edge | Website |

📊 Benchmarks and Datasets

Benchmarks

Widely-used open-source benchmarks for evaluating diffusion model generation quality, prompt fidelity, and compositional capabilities.

Text-to-Image Benchmarks

| # | Benchmark | Description | Source |
|---|-----------|-------------|--------|
| 1 | DrawBench | 200 challenging prompts across 11 categories (counting, colors, spatial, text rendering, etc.) introduced by Imagen for qualitative human evaluation of T2I models. | arXiv |
| 2 | PartiPrompts (P2) | 1,600 diverse English prompts spanning 12 categories and 11 challenge aspects for holistic T2I evaluation. Released with the Parti model. | arXiv HF |
| 3 | TIFA | VQA-based automatic evaluation measuring T2I faithfulness by generating question-answer pairs from prompts and verifying against images. 4K prompts, 25K questions across 12 categories. | arXiv GitHub Website |
| 4 | T2I-CompBench | Comprehensive compositional T2I benchmark evaluating attribute binding, spatial relationships, and complex compositions with detection-based metrics. | arXiv GitHub |
| 5 | GenEval | Compositional generation benchmark evaluating object count, spatial relations, attribute binding, and co-occurrence accuracy via object detection pipelines. | arXiv GitHub |
| 6 | DPG-Bench | Dense prompt generation benchmark with long, detailed prompts synthesized from multi-annotation sources for evaluating models on complex, attribute-rich descriptions. | arXiv GitHub |
| 7 | MJHQ-30K | 30K high-quality Midjourney images across 10 categories for automatic FID-based aesthetic quality evaluation. Curated with aesthetic and CLIP score filtering. | arXiv HF |
| 8 | GenAI-Bench | 1,600 compositional prompts from professional designers, evaluating advanced reasoning (counting, comparison, logic) with human ratings across 10 leading T2I/T2V models. | arXiv GitHub HF |

Video Generation Benchmarks

| # | Benchmark | Description | Source |
|---|-----------|-------------|--------|
| 1 | VBench | Comprehensive video generation benchmark evaluating 16 dimensions including temporal consistency, motion quality, aesthetic fidelity, and subject identity. | arXiv GitHub |
| 2 | EvalCrafter | Benchmark and pipeline for evaluating video generation models across visual quality, text-video alignment, motion quality, and temporal consistency. | arXiv GitHub Website |

Datasets

Audio

| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | LibriSpeech | Large-scale corpus of read English speech derived from audiobooks. Clean and noisy subsets available. | 1000 hours | ASR, Speech Recognition | OpenSLR |
| 2 | VCTK | English multi-speaker corpus with 110 speakers reading newspapers. High-quality recordings. | 44 hours | TTS, Voice Conversion, Speaker Recognition | Link |
| 3 | AudioSet | Large-scale dataset of 2M 10-second audio clips with 527 sound event classes from YouTube. | 2M clips | Audio Classification, Sound Event Detection | Link |

Image

| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | ImageNet | Large-scale image classification dataset with 1000 object categories. Standard benchmark for computer vision. | 1.4M images | Classification, Object Recognition | Link |
| 2 | COCO | Common Objects in Context. Object detection, segmentation, and captioning with 80 categories. | 330K images | Detection, Segmentation, Captioning | arXiv |
| 3 | FFHQ | Flickr-Faces-HQ. High-quality face dataset at 1024×1024 resolution with diverse variations. | 70K images | Face Generation, GAN, Style Transfer | GitHub |
| 4 | CLIC | Challenge on Learned Image Compression dataset. Professional quality images for compression research. | 2000+ images | Image Compression, Quality Assessment | Link |
| 5 | Kodak | Kodak PhotoCD dataset. Standard benchmark with 24 high-quality uncompressed images. | 24 images | Image Compression, Quality Evaluation | Link |
| 6 | Places365 | Scene recognition dataset with 365 scene categories. Focuses on environmental context. | 10M images | Scene Recognition, Classification | Link |
| 7 | CelebA | Large-scale face attributes dataset with 40 attribute annotations per image. | 202K images | Face Recognition, Attribute Prediction | Link |

Video

| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | Kinetics-400/600/700 | Large-scale human action video dataset from YouTube. Standard for action recognition. | 650K videos | Action Recognition, Video Classification | arXiv |
| 2 | UCF101 | Action recognition dataset with 101 action categories from realistic web videos. | 13K videos | Action Recognition, Video Understanding | Link |
| 3 | ActivityNet | Large-scale video dataset for human activity understanding with temporal annotations. | 20K videos | Activity Detection, Temporal Localization | Link |
| 4 | YouTube-8M | Large-scale video understanding dataset with 8M videos and 3862 visual entity classes. | 8M videos | Video Classification, Multi-label | Link |
| 5 | MSR-VTT | Video captioning dataset with 10K video clips and 200K natural language descriptions. | 10K videos | Video Captioning, Video-Text Retrieval | Link |

Volume (3D/4D)

| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | D-NeRF | Dynamic Neural Radiance Fields dataset with synthetic and real dynamic scenes for 4D reconstruction. | 9 scenes | Dynamic Novel View Synthesis, 4D Reconstruction | arXiv GitHub Website |
| 2 | Neu3D | Neural 3D video synthesis dataset with multi-view videos of human performances. | 200+ sequences | 3D Human Reconstruction, Neural Rendering | arXiv GitHub Project |
| 3 | ShapeNet | Large-scale 3D shape dataset with 55 object categories and 51,300 3D CAD models. | 51K models | 3D Reconstruction, Shape Analysis | arXiv Link |
| 4 | ScanNet | Richly-annotated indoor RGB-D scans with 3D semantic segmentation labels for 1513 scenes. | 1513 scans | 3D Segmentation, Indoor Scene Understanding | arXiv Link |
| 5 | ModelNet | 3D CAD model dataset with ModelNet40 (40 classes) and ModelNet10 (10 classes) versions. | 12K models | 3D Classification, Point Cloud Processing | Link |
| 6 | NeRF Synthetic | Blender-rendered synthetic scenes with known camera poses and lighting for NeRF evaluation. | 8 scenes | Novel View Synthesis, 3D Reconstruction | arXiv GitHub Project |

Domain-Specific

Autonomous Driving

| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | nuScenes | Full 3D sensor suite with LiDAR, radar, and cameras. 1000 scenes with 3D bounding boxes. | 1000 scenes | 3D Detection, Tracking, Prediction | arXiv Link |
| 2 | KITTI | Benchmark suite for stereo, optical flow, visual odometry, and 3D object detection from driving scenarios. | 200K images | 3D Detection, Depth, Odometry | Link |
| 3 | Waymo Open Dataset | High-resolution sensor data with LiDAR and camera from Waymo vehicles. Large-scale 3D annotations. | 1000 segments | 3D Detection, Tracking, Motion Prediction | arXiv Link |
| 4 | Cityscapes | Urban street scenes with dense pixel-level semantic and instance segmentation annotations. | 25K images | Semantic Segmentation, Instance Segmentation | arXiv Link |

Medical Imaging

| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | BraTS | Brain Tumor Segmentation challenge with multimodal MRI scans (T1, T2, FLAIR, T1ce). Annual benchmark. | 2000+ cases | 3D Tumor Segmentation, Medical Imaging | Link |
| 2 | MIMIC-CXR | Large chest X-ray dataset with free-text radiology reports. Largest publicly available CXR dataset. | 377K images | Disease Classification, Report Generation | arXiv PhysioNet |
| 3 | ChestX-ray14 | Large-scale chest X-ray dataset with 14 common disease labels for multi-label classification. | 112K images | Disease Classification, Localization | arXiv NIH |
| 4 | Medical Segmentation Decathlon | Multi-organ segmentation covering 10 different medical imaging tasks (CT, MRI). | 2600+ cases | Multi-task 3D Segmentation | arXiv Link |

Depth Estimation

| # | Dataset | Description | Size | Tasks | Source |
|---|---------|-------------|------|-------|--------|
| 1 | NYU Depth V2 | Indoor RGB-D dataset with dense depth maps from Microsoft Kinect. 1449 labeled scenes. | 1449 scenes | Depth Estimation, Indoor Scene Understanding | Link |
| 2 | DIODE | Dense Indoor and Outdoor DEpth dataset with high-quality depth from laser scanner. | 25K images | Depth Estimation, Normal Estimation | arXiv Link |
| 3 | Middlebury Stereo | Standard stereo matching benchmark with high-resolution calibrated image pairs and ground truth. | 30+ pairs | Stereo Matching, Depth Estimation | Link |
| 4 | SceneFlow | Large synthetic dataset with optical flow and disparity ground truth for 3D scene understanding. | 39K images | Optical Flow, Stereo Matching, Depth | arXiv Link |

Remote Sensing

# Dataset Description Size Tasks Source
1 SpaceNet High-resolution satellite imagery with building footprints, road networks across multiple cities. 1M+ buildings Building Detection, Road Extraction Link
2 xView One of the largest overhead imagery datasets with 1M object instances across 60 classes. 1M objects Object Detection, Classification arXiv Link
3 DOTA Dataset for Object deTection in Aerial images with oriented bounding boxes. 15 categories. 188K instances Oriented Object Detection, Aerial Imagery arXiv Link
4 LEVIR-CD Large-scale building change detection dataset from Google Earth with 637 image pairs. 637 pairs Change Detection, Building Analysis Link

📏 Evaluation Metrics

Perception Metrics

Full-Reference Metrics

# Metric Description Source
1 PSNR Peak Signal-to-Noise Ratio. Measures the ratio between the maximum possible power of a signal and the power of corrupting noise. Calculated as PSNR = 10·log₁₀(MAX²/MSE). Wikipedia
2 SSIM Structural Similarity Index. Assesses image quality based on luminance, contrast, and structure. Designed to improve on PSNR by considering structural information. Paper
3 LPIPS Learned Perceptual Image Patch Similarity. Uses deep neural network features to compute perceptual distance between images, better aligned with human perception. arXiv
4 DISTS Deep Image Structure and Texture Similarity. Combines structure and texture similarity using deep features for better perceptual quality assessment. arXiv
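The PSNR formula above is simple enough to compute directly. A minimal NumPy sketch (the helper name `psnr` and the toy images are ours for illustration, not from any reference implementation):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy 8-bit images differing by a constant offset of 5 -> MSE = 25
ref = np.full((16, 16), 100, dtype=np.uint8)
noisy = np.full((16, 16), 105, dtype=np.uint8)
print(round(psnr(ref, noisy), 2))  # 10 * log10(255^2 / 25) ≈ 34.15 dB
```

Note the cast to `float64` before subtraction: subtracting `uint8` arrays directly would wrap around and corrupt the MSE.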

Reduced-Reference Metrics

# Metric Description Source
1 RRED Reduced-Reference Entropic Differencing. Uses entropic differences between wavelet coefficients, requiring only partial statistical features from reference. Paper
2 RR-SSIM Reduced-Reference SSIM. Extracts and transmits only key structural features (edge information, local statistics) from reference image. Paper

No-Reference Metrics

# Metric Description Source
1 NIQE Natural Image Quality Evaluator. Measures deviation from statistical regularities in natural images using natural scene statistics (NSS). Completely blind quality assessment. Paper
2 FID Fréchet Inception Distance. Calculates Fréchet distance between feature distributions of real and generated images in Inception-v3 space. Lower FID indicates better quality and diversity. arXiv
3 KID Kernel Inception Distance. Unbiased alternative to FID using polynomial kernel on Inception features. More reliable for small sample sizes. arXiv
4 IS Inception Score. Evaluates both quality (classification confidence) and diversity (marginal class distribution). arXiv
5 MUSIQ Multi-scale Image Quality Transformer. Handles native-resolution images via multi-scale patch embedding without fixed-size cropping, enabling more robust no-reference quality assessment. arXiv GitHub
6 CLIP-IQA Leverages CLIP's vision-language representations for no-reference image quality and aesthetic assessment via prompt-based antonym pairing. arXiv GitHub
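FID above has a closed form: the Fréchet distance between two Gaussians fitted to feature statistics, d² = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A toy NumPy sketch on synthetic features (the real metric fits the Gaussians to Inception-v3 activations; the helper names here are ours):

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # clamp tiny negative eigenvalues from noise
    return (v * np.sqrt(w)) @ v.T

def fid(mu1, s1, mu2, s2):
    """||mu1 - mu2||^2 + Tr(s1 + s2 - 2 * (s1 s2)^(1/2)).
    Uses Tr((s1 s2)^(1/2)) = Tr((s2^(1/2) s1 s2^(1/2))^(1/2)) so every
    square root is taken of a symmetric PSD matrix."""
    diff = mu1 - mu2
    s2_half = _sqrtm_psd(s2)
    covmean = _sqrtm_psd(s2_half @ s1 @ s2_half)
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(5000, 8))  # stand-in for Inception features
fake = rng.normal(0.5, 1.0, size=(5000, 8))  # mean-shifted distribution
mu_r, cov_r = real.mean(0), np.cov(real, rowvar=False)
mu_f, cov_f = fake.mean(0), np.cov(fake, rowvar=False)
print(fid(mu_r, cov_r, mu_f, cov_f))  # roughly 8 * 0.5^2 = 2 for this toy gap
```

Practical FID implementations (e.g. in torchmetrics or the original TensorFlow code) follow the same formula but compute the matrix square root with `scipy.linalg.sqrtm` and are sensitive to sample size, which is the issue KID addresses.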

Semantic Metrics

# Metric Description Source
1 CLIPScore Measures text-image alignment using CLIP embeddings. Computed as cosine similarity between CLIP image and text features. arXiv
2 ViTScore Uses Vision Transformer features to evaluate semantic similarity between images. Captures high-level semantic content beyond pixel-level differences. arXiv
3 SeSS Semantic Similarity Score. Builds scene graphs for both images via Scene Graph Generation and scores their similarity through graph matching, lifting comparison from pixels to semantic structure. arXiv
4 DreamSim Learned perceptual metric trained on synthetic triplet judgments from diffusion models, capturing mid-level semantic similarity beyond low-level texture. arXiv GitHub
5 ImageReward Text-image alignment metric learned from human preference rankings via reward modeling, designed to evaluate text-to-image generation quality. arXiv GitHub
6 HPSv2 Human Preference Score v2. Fine-tuned CLIP model predicting human aesthetic preferences for generated images, trained on large-scale human choice data. arXiv GitHub
7 PickScore Preference-based scoring model trained on the Pick-a-Pic dataset of human pairwise preferences for text-to-image generation. arXiv GitHub
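At its core, CLIPScore is a scaled, clipped cosine similarity between CLIP image and text embeddings (the paper scales by w = 2.5). A sketch on placeholder vectors standing in for CLIP features (function name and embeddings are illustrative; the real metric obtains the embeddings from CLIP's image and text encoders):

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore core: w * max(cos(image_emb, text_emb), 0)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_emb / np.linalg.norm(text_emb)
    return w * max(float(img @ txt), 0.0)

# Placeholder 512-d embeddings standing in for CLIP features
aligned = np.ones(512)
print(clip_score(aligned, aligned))   # identical directions: full score w
print(clip_score(aligned, -aligned))  # opposite directions: clipped to 0
```

The clipping at zero means CLIPScore never rewards anti-correlated image-text pairs, and the constant w merely rescales scores into a convenient range.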

🔗 Other Resources

📚 Comprehensive Books, Surveys & Tutorials

Diffusion Models

# Paper Authors Year Links
1 Understanding Diffusion Models: A Unified Perspective Luo et al. 2022 arXiv
2 Diffusion Models: A Comprehensive Survey of Methods and Applications Yang et al. 2022 arXiv
3 Diffusion Models in Vision: A Survey Croitoru et al. 2022 arXiv
4 A Survey on Generative Diffusion Models Cao et al. 2022 arXiv GitHub
5 A Survey on Video Diffusion Models Xing et al. 2023 arXiv GitHub
6 Diffusion Models for Image Restoration and Enhancement: A Comprehensive Survey Li et al. 2023 arXiv
7 Efficient Diffusion Models: A Comprehensive Survey From Principles to Practices Ma et al. 2024 arXiv
8 Diffusion Model-Based Image Editing: A Survey Huang et al. 2024 arXiv GitHub
9 Diffusion Models in Low-Level Vision: A Survey He et al. 2024 arXiv GitHub
10 Diffusion Models in 3D Vision: A Survey Wang et al. 2024 arXiv
11 Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review Uehara et al. 2024 arXiv GitHub
12 Efficient Diffusion Models: A Survey Shen et al. 2025 arXiv GitHub
13 A Survey on Diffusion Language Models Li et al. 2025 arXiv GitHub
14 The Principles of Diffusion Models Lai et al. 2025 arXiv
15 Flow Matching Guide and Code Lipman et al. 2024 arXiv GitHub
16 An Introduction to Flow Matching and Diffusion Models Holderrieth & Erives 2025 arXiv Project Page

Semantic Communications

# Paper Authors Year Links
1 Toward Wisdom-Evolutionary and Primitive-Concise 6G: A New Paradigm of Semantic Communication Networks Zhang et al. 2022 Paper
2 Semantic Communications for Future Internet: Fundamentals, Applications, and Challenges Yang et al. 2022 arXiv
3 Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications Gunduz et al. 2022 arXiv
4 Semantics-Empowered Communications: A Tutorial-Cum-Survey Lu et al. 2022 arXiv
5 Less Data, More Knowledge: Building Next Generation Semantic Communication Networks Chaccour et al. 2022 arXiv
6 Enhancing Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization Du et al. 2023 arXiv
7 A Survey on Semantic Communication Networks: Architecture, Security, and Privacy Guo et al. 2024 arXiv
8 Resource Management, Security, and Privacy Issues in Semantic Communications: A Survey Won et al. 2024 Paper
9 Generative AI-Driven Semantic Communication Networks: Architecture, Technologies, and Applications Liang et al. 2024 arXiv
10 A Contemporary Survey on Semantic Communications: Theory of Mind, Generative AI, and Deep Joint Source-Channel Coding Nguyen et al. 2025 arXiv
11 Generative Diffusion Models for Wireless Networks: Fundamental, Architecture, and State-of-the-Art Fan et al. 2025 arXiv
12 Resource Allocation in Wireless Semantic Communications: A Comprehensive Survey Zhang et al. 2025 Paper

📺 Courses & Video Lectures

# Title Source Type Links
1 Stanford CS236: Deep Generative Models Stefano Ermon et al. University Course Project Page
2 MIT 6.S978: Deep Generative Models Kaiming He et al. University Course Project Page
3 MIT 6.S184: Introduction to Flow Matching and Diffusion Models Peter Holderrieth & Ezra Erives University Course arXiv Project Page
4 Diffusion Models Course Hugging Face Online Course GitHub
5 NeurIPS 2023 Workshop: Diffusion Models NeurIPS Workshop Project Page
6 Diffusion and Score-Based Generative Models Yang Song Lecture YouTube
7 Two Minute Papers – Diffusion Series Two Minute Papers YouTube Series YouTube
8 Generative Modeling by Estimating Gradients of the Data Distribution Yang Song Blog Post Project Page
9 What are Diffusion Models? Lilian Weng Blog Post Project Page

🧰 Interactive Demos & Tools

# Tool Type What it’s great for Links
1 Stable Diffusion WebUI (AUTOMATIC1111) UI + Extensions Local UI with huge plugin ecosystem GitHub
2 InvokeAI Pro UI Studio-style creative workflow & editing GitHub
3 🤗 Diffusers Library Clean Python API for diffusion inference & training GitHub
4 Diffusers Playground (Hugging Face Spaces) Web demo Try many pipelines online (no local install) Project Page
5 ComfyUI Node-graph UI Modular node-based pipelines for reproducible flows GitHub
6 StableStudio (Stability AI) Official UI Frontend for SDXL / stability models GitHub
7 Fooocus Simple UI One-click text→image with SDXL support GitHub
8 kohya-ss / sd-scripts Training / Finetune LoRA, DreamBooth, finetuning helpers GitHub
9 ControlNet Conditioning model Pose / edge / depth guided generation GitHub
10 sd-webui-controlnet WebUI Extension Easy ControlNet integration for WebUI GitHub

📝 Citation

If you find this article or repository helpful, please consider citing:

@article{qin-diffcomm,
    author  = {H. L. Qin and J. Dai and G. Lu and S. Shao and S. Wang and T. Xu and W. Zhang and P. Zhang and K. B. Letaief},
    title   = {Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications},
    journal = {arXiv preprint arXiv:2511.08416},
    year    = {2025}
}

Related Papers from Our Group

@article{dai-gaicomm,
    author  = {J. Dai and X. Qin and S. Wang and L. Xu and K. Niu and P. Zhang},
    title   = {Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency},
    journal = {IEEE Wireless Commun.},
    volume  = {31},
    number  = {4},
    pages   = {48--56},
    year    = {2024}
}
@article{wang-diffcom,
    author  = {S. Wang and J. Dai and K. Tan and X. Qin and K. Niu and P. Zhang},
    title   = {DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling},
    journal = {IEEE J. Sel. Areas Commun.},
    volume  = {43},
    number  = {7},
    pages   = {2651--2666},
    year    = {2025}
}
@article{qin-semcod,
    author  = {H. L. Qin and J. Dai and S. Wang and X. Qin and S. Shao and K. Niu and W. Xu and P. Zhang},
    title   = {Neural Coding is Not Always Semantic: Toward the Standardized Coding Workflow in Semantic Communications},
    journal = {IEEE Commun. Stand. Mag.},
    volume  = {9},
    number  = {4},
    pages   = {24--33},
    year    = {2025}
}
@article{tan-ditjscc,
    author  = {K. Tan and J. Dai and S. Wang and G. Lu and S. Shao and K. Niu and W. Zhang and P. Zhang},
    title   = {DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations},
    journal = {arXiv preprint arXiv:2601.03112},
    year    = {2026}
}

🌟 Acknowledgments

We thank the diffusion models and semantic communications research communities for their groundbreaking work. Special thanks to all current and future contributors to this repository.

⭐ Star this repo if you find it useful! ⭐

Back to Top

Maintained with ❤️ by the community:

Contributors

About

The public repository for "Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications": a curated collection of educational resources and papers on diffusion models and their applications in semantic communications.
