GitHub - JJLibra/SALAD-Pan: 🤗 Official implementation for "SALAD-Pan: Sensor-Agnostic Latent Adaptive Diffusion for Pan-Sharpening" https://arxiv.org/abs/2602.04473

SALAD-Pan: Sensor-Agnostic Latent Adaptive Diffusion for Pan-Sharpening

Junjie Li · Congyang Ou · Haokui Zhang · Guoting Wei · Shengqin Jiang · Ying Li · Chunhua Shen

Given a panchromatic (PAN) and low-resolution multispectral (LRMS) image pair, SALAD-Pan fine-tunes a pre-trained diffusion model to generate a high-resolution multispectral (HRMS) image.

News

[02/01/2026] Code will be released soon!

Setup

Requirements

git clone https://github.com/JJLibra/SALAD-Pan.git
cd SALAD-Pan

conda create -n saladpan python=3.10 -y
conda activate saladpan

# This project depends on a modified local version of `diffusers` under `./diffusers`.
cd diffusers
pip install -e .

cd ..
pip install -r requirements.txt

Then initialize an 🤗 Accelerate environment with:

accelerate config

Or, for a default Accelerate configuration that skips the questions about your environment:

accelerate config default
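Because the project relies on the modified `diffusers` under `./diffusers`, it can be worth confirming that Python resolves the package to that folder rather than to a stock copy in site-packages. The following is a minimal sketch using only the standard library; the helper names `resolve_package_dir` and `is_local_install` are illustrative and not part of this repository.

```python
import importlib.util
from pathlib import Path


def resolve_package_dir(package):
    """Return the directory a package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_local_install(package_dir, local_root):
    """Return True if package_dir lies inside local_root (the editable install is active)."""
    if package_dir is None:
        return False
    try:
        Path(package_dir).resolve().relative_to(Path(local_root).resolve())
        return True
    except ValueError:
        # relative_to raises ValueError when package_dir is outside local_root
        return False


if __name__ == "__main__":
    # Run from the SALAD-Pan repository root.
    location = resolve_package_dir("diffusers")
    print("diffusers resolved to:", location)
    print("local fork active:", is_local_install(location, Path.cwd() / "diffusers"))
```

If the second line prints `False`, re-running `pip install -e .` inside `./diffusers` in the active conda environment is the first thing to try.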

Weights

We provide two-stage checkpoints:

Stage I (Band-VAE): checkpoints/vae.safetensors (download: Hugging Face)
Stage II (Latent Diffusion): runs on top of Stable Diffusion in the Band-VAE latent space.
- Stable Diffusion base: download from Hugging Face (e.g., Stable Diffusion v1-5)
- Adapters: checkpoints/adapters.pth (download: Hugging Face)
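A quick pre-flight check can catch a missing download before a run starts. The sketch below only tests for the two checkpoint paths named above; the function name `missing_checkpoints` is hypothetical, and the Stable Diffusion base is left out because its location is not fixed here.

```python
from pathlib import Path

# Relative paths taken from the list above.
EXPECTED_FILES = ("checkpoints/vae.safetensors", "checkpoints/adapters.pth")


def missing_checkpoints(repo_root, expected=EXPECTED_FILES):
    """Return the expected checkpoint paths that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints(".")
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All expected checkpoints found.")
```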

Usage

Training

We train the model in two stages.

Stage I (VAE pretraining)

accelerate launch train_vae.py --config configs/train_vae.yaml

Stage II (Diffusion + Adapter training)

accelerate launch train_diffusion.py --config configs/train_diffusion.yaml
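The two stages can also be chained from a small Python driver. This is a sketch, not a script shipped with the repository: `launch_command` and `run_all` are hypothetical helpers, and the optional `--num_processes` and `--mixed_precision` flags are standard `accelerate launch` options that override whatever `accelerate config` stored. Whether the YAML configs set precision themselves is not stated here.

```python
import shlex
import subprocess

# Script and config paths as given in the commands above.
STAGES = [
    ("train_vae.py", "configs/train_vae.yaml"),
    ("train_diffusion.py", "configs/train_diffusion.yaml"),
]


def launch_command(script, config, num_processes=None, mixed_precision=None):
    """Build the argument list for one `accelerate launch` call."""
    cmd = ["accelerate", "launch"]
    if num_processes is not None:
        cmd += ["--num_processes", str(num_processes)]
    if mixed_precision is not None:
        cmd += ["--mixed_precision", mixed_precision]
    cmd += [script, "--config", config]
    return cmd


def run_all(dry_run=True, **launch_kwargs):
    """Print each stage's command and, unless dry_run is set, execute it."""
    for script, config in STAGES:
        cmd = launch_command(script, config, **launch_kwargs)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop if a stage fails


if __name__ == "__main__":
    run_all(dry_run=True, num_processes=8, mixed_precision="fp16")
```

With `dry_run=True` the driver only prints the commands, which makes it easy to review them before committing GPU time.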

Note: Tuning usually takes 40k–50k steps, about 1–2 days on eight RTX 4090 GPUs in fp16. Reduce batch_size if your GPU memory is limited.
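Reducing `batch_size` also shrinks the global batch unless gradients are accumulated over several steps, since global batch = per-GPU batch × number of GPUs × accumulation steps. The sketch below works out the accumulation steps needed to reach a target; the numbers are hypothetical, and whether the training configs expose a gradient-accumulation option is an assumption to check against `configs/train_diffusion.yaml`.

```python
def accumulation_steps(target_global_batch, per_gpu_batch, num_gpus):
    """Smallest number of accumulation steps whose global batch reaches the target."""
    if min(target_global_batch, per_gpu_batch, num_gpus) < 1:
        raise ValueError("all arguments must be positive integers")
    per_step = per_gpu_batch * num_gpus
    # Integer ceiling division, avoids floating-point rounding.
    return (target_global_batch + per_step - 1) // per_step


if __name__ == "__main__":
    # Hypothetical example: keep a global batch of 64 on eight GPUs
    # after dropping the per-GPU batch to 2.
    print(accumulation_steps(64, 2, 8))
```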

Inference

Once the training is done, run inference:

Coming soon.

Installing xformers is highly recommended for better memory efficiency and speed on GPUs. To enable xformers, set enable_xformers_memory_efficient_attention=True.
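In 🤗 Diffusers, models and pipelines expose an `enable_xformers_memory_efficient_attention()` method, and a flag like the one above typically ends up calling it. The sketch below shows a guarded way to do that so a missing or incompatible xformers install degrades gracefully; `try_enable_xformers` is a hypothetical helper, and how this repository wires the flag internally is not shown here.

```python
import importlib.util


def try_enable_xformers(model, xformers_installed=None):
    """Enable memory-efficient attention on model if possible; return True on success."""
    if xformers_installed is None:
        # Detect the package without importing it.
        xformers_installed = importlib.util.find_spec("xformers") is not None
    enable = getattr(model, "enable_xformers_memory_efficient_attention", None)
    if not xformers_installed or not callable(enable):
        return False
    try:
        enable()
    except Exception as exc:  # e.g. an xformers build that does not match the PyTorch/CUDA version
        print(f"xformers attention not enabled: {exc}")
        return False
    return True
```

A typical call would pass the loaded UNet or pipeline object and fall back to default attention when the function returns `False`.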

Results

🚨We strongly recommend that you visit this website for a better reading experience.

Quantitative Results

Table 1. Quantitative results on the WorldView-3 (WV3) dataset. Best results are in bold.

Models	Pub/Year	Q₈ ↑	SAM ↓	ERGAS ↓	SCC ↑	D_λ ↓	D_s ↓	HQNR ↑
PanNet	ICCV’17	0.891±0.045	3.613±0.787	2.664±0.347	0.943±0.018	0.017±0.008	0.047±0.014	0.937±0.015
FusionNet	TGRS’20	0.904±0.092	3.324±0.411	2.465±0.603	0.958±0.023	0.024±0.011	0.036±0.016	0.940±0.019
LAGConv	AAAI’22	0.910±0.114	3.104±1.119	2.300±0.911	0.980±0.043	0.036±0.009	0.032±0.016	0.934±0.011
BiMPAN	ACM MM’23	0.915±0.087	2.984±0.601	2.257±0.552	0.984±0.005	0.017±0.019	0.035±0.015	0.949±0.026
ARConv	CVPR’25	0.916±0.083	2.858±0.590	2.117±0.528	0.989±0.014	0.014±0.006	0.030±0.007	0.958±0.010
WFANet	AAAI’25	0.917±0.088	2.855±0.618	2.095±0.422	0.989±0.011	0.012±0.007	0.031±0.009	0.957±0.010
PanDiff	TGRS’23	0.898±0.090	3.297±0.235	2.467±0.166	0.980±0.019	0.027±0.108	0.054±0.047	0.920±0.077
SSDiff	NeurIPS’24	0.915±0.086	2.843±0.529	2.106±0.416	0.986±0.004	0.013±0.005	0.031±0.003	0.956±0.016
SGDiff	CVPR’25	0.921±0.082	2.771±0.511	2.044±0.449	0.987±0.009	0.012±0.005	0.027±0.003	0.960±0.006
SALAD‑Pan	Ours	0.924±0.064	2.689±0.135	1.839±0.211	0.989±0.007	0.010±0.008	0.021±0.004	0.965±0.007

Table 2. Quantitative results on the QuickBird (QB) dataset. Best results are in bold.

Models	Pub/Year	Q₄ ↑	SAM ↓	ERGAS ↓	SCC ↑	D_λ ↓	D_s ↓	HQNR ↑
PanNet	ICCV’17	0.885±0.118	5.791±0.995	5.863±0.413	0.948±0.021	0.059±0.017	0.061±0.010	0.883±0.025
FusionNet	TGRS’20	0.925±0.087	4.923±0.812	4.159±0.351	0.956±0.018	0.059±0.019	0.052±0.009	0.892±0.022
LAGConv	AAAI’22	0.916±0.130	4.370±0.720	3.740±0.290	0.959±0.047	0.085±0.024	0.068±0.014	0.853±0.018
BiMPAN	ACM MM’23	0.931±0.091	4.586±0.821	3.840±0.319	0.980±0.008	0.026±0.020	0.040±0.013	0.935±0.030
ARConv	CVPR’25	0.936±0.088	4.453±0.499	3.649±0.401	0.987±0.009	0.019±0.014	0.034±0.017	0.948±0.042
WFANet	AAAI’25	0.935±0.092	4.490±0.582	3.604±0.337	0.986±0.008	0.019±0.016	0.033±0.019	0.948±0.037
PanDiff	TGRS’23	0.934±0.095	4.575±0.255	3.742±0.353	0.980±0.007	0.058±0.015	0.064±0.020	0.881±0.075
SSDiff	NeurIPS’24	0.934±0.094	4.464±0.747	3.632±0.275	0.982±0.008	0.031±0.011	0.036±0.013	0.934±0.021
SGDiff	CVPR’25	0.938±0.087	4.353±0.741	3.578±0.290	0.983±0.007	0.023±0.013	0.043±0.012	0.934±0.011
SALAD‑Pan	Ours	0.939±0.088	4.198±0.526	3.251±0.288	0.984±0.009	0.017±0.011	0.026±0.009	0.957±0.010

Table 3. Quantitative results on the GaoFen-2 (GF2) dataset. Best results are in bold.

Models	Pub/Year	Q₄ ↑	SAM ↓	ERGAS ↓	SCC ↑	D_λ ↓	D_s ↓	HQNR ↑
PanNet	ICCV’17	0.967±0.013	0.997±0.022	0.919±0.039	0.973±0.011	0.017±0.012	0.047±0.012	0.937±0.023
FusionNet	TGRS’20	0.964±0.014	0.974±0.035	0.988±0.072	0.971±0.012	0.040±0.013	0.101±0.014	0.863±0.018
LAGConv	AAAI’22	0.970±0.011	1.080±0.023	0.910±0.045	0.977±0.006	0.033±0.013	0.079±0.013	0.891±0.021
BiMPAN	ACM MM’23	0.965±0.020	0.902±0.066	0.881±0.058	0.972±0.018	0.032±0.015	0.051±0.014	0.918±0.019
ARConv	CVPR’25	0.982±0.013	0.710±0.149	0.645±0.127	0.994±0.005	0.007±0.005	0.029±0.019	0.963±0.018
WFANet	AAAI’25	0.981±0.007	0.751±0.082	0.657±0.074	0.994±0.002	0.003±0.003	0.032±0.021	0.964±0.020
PanDiff	TGRS’23	0.979±0.011	0.888±0.037	0.746±0.031	0.988±0.003	0.027±0.011	0.073±0.013	0.903±0.025
SSDiff	NeurIPS’24	0.983±0.007	0.670±0.124	0.604±0.108	0.991±0.006	0.016±0.009	0.027±0.027	0.957±0.010
SGDiff	CVPR’25	0.980±0.011	0.708±0.119	0.668±0.094	0.989±0.005	0.020±0.013	0.024±0.022	0.959±0.011
SALAD‑Pan	Ours	0.982±0.010	0.667±0.051	0.592±0.088	0.991±0.003	0.005±0.002	0.022±0.014	0.973±0.010
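For readers less familiar with the metrics in Tables 1–3, the sketch below gives dependency-free reference versions of three of them using their common textbook definitions: SAM (mean spectral angle in degrees), ERGAS, and HQNR = (1 − D_λ)(1 − D_s). It assumes images flattened to lists of per-pixel band vectors and a PAN/MS resolution ratio of 4. It is not the evaluation code used for the tables, and the function names are illustrative.

```python
import math


def sam_degrees(pred, ref):
    """Mean spectral angle in degrees between matching per-pixel band vectors."""
    angles = []
    for p, r in zip(pred, ref):
        dot = sum(a * b for a, b in zip(p, r))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in r))
        if norm == 0:
            angles.append(0.0)  # convention for all-zero pixels
            continue
        cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)


def ergas(pred, ref, ratio=4):
    """ERGAS = (100 / ratio) * sqrt(mean over bands of MSE_b / mean_b^2)."""
    n_pixels, n_bands = len(ref), len(ref[0])
    total = 0.0
    for b in range(n_bands):
        mse = sum((p[b] - r[b]) ** 2 for p, r in zip(pred, ref)) / n_pixels
        mean_ref = sum(r[b] for r in ref) / n_pixels
        total += mse / (mean_ref ** 2)
    return 100.0 / ratio * math.sqrt(total / n_bands)


def hqnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    """Combine spectral (D_lambda) and spatial (D_s) distortion into one score."""
    return (1.0 - d_lambda) ** alpha * (1.0 - d_s) ** beta


if __name__ == "__main__":
    ref = [[2.0, 4.0], [2.0, 4.0]]   # two pixels, two bands
    pred = [[3.0, 4.0], [3.0, 4.0]]
    print(sam_degrees(pred, ref), ergas(pred, ref))
    print(round(hqnr(0.005, 0.022), 3))  # mean D_lambda and D_s of SALAD-Pan in Table 3
```

Plugging the mean D_λ and D_s from the SALAD-Pan row of Table 3 into `hqnr` gives 0.973, the same as the reported HQNR. For Table 1 the same calculation gives about 0.969 against a reported 0.965; a small gap like this is expected if HQNR is averaged per image, since the product of means need not equal the mean of products.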

Qualitative Comparison

Visual comparison on the WorldView-3 (WV-3) and QuickBird (QB) datasets at reduced resolution.

Visual comparison on the WorldView-3 (WV-3) and QuickBird (QB) datasets at full resolution.

Efficiency comparison (RR, QB)

Diffusion-based Methods	SAM ↓	ERGAS ↓	NFE	Latency (s) ↓
PanDiff	4.575±0.255	3.742±0.353	1000	356.63±1.98
SSDiff	4.464±0.747	3.632±0.275	10	10.10±0.21
SGDiff	4.353±0.741	3.578±0.290	50	6.64±0.09
SALAD-Pan	4.198±0.526	3.251±0.288	20	3.36±0.07

Latency is reported as mean ± std over 10 runs (warmup=3), batch size=1, evaluated on the QB dataset under the reduced-resolution (RR) protocol, on an RTX 4090 GPU.
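The measurement protocol described above (3 warmup calls, then mean ± std over 10 timed runs) can be sketched as follows. `measure_latency` is a hypothetical helper, not the script used for the table. On a GPU, a synchronization callback such as `torch.cuda.synchronize` should be passed as `sync` so that asynchronously queued work is included in each timing. The sample standard deviation is used here; whether the table uses the sample or population form is not stated.

```python
import statistics
import time


def measure_latency(fn, runs=10, warmup=3, sync=None):
    """Return (mean, std) wall-clock seconds of fn() over `runs` timed calls."""
    for _ in range(warmup):
        fn()  # untimed calls absorb one-off costs (allocation, kernel compilation)
    if sync is not None:
        sync()  # make sure warmup work has finished before timing starts
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued device work before stopping the clock
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)


if __name__ == "__main__":
    mean, std = measure_latency(lambda: sum(range(100_000)))
    print(f"{mean:.6f} ± {std:.6f} s")
```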

Citation

If you make use of our work, please cite our paper.

@article{li2026saladpan,
  title={SALAD-Pan: Sensor-Agnostic Latent Adaptive Diffusion for Pan-Sharpening},
  author={Junjie Li and Congyang Ou and Haokui Zhang and Guoting Wei and Shengqin Jiang and Ying Li and Chunhua Shen},
  journal={arXiv preprint arXiv:2602.04473},
  year={2026}
}

Shoutouts

Built with 🤗 Diffusers. Thanks for open-sourcing!
The interactive demo is powered by 🤗 Gradio. Thanks for open-sourcing!
