Yang Zhang1, Zhangkai Ni1, Wenhan Yang2, Hanli Wang1
1Tongji University, 2Pengcheng Laboratory
This repository provides the official implementation for the paper "Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction", IEEE Transactions on Multimedia (TMM), 2026.
High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. However, existing methods often suffer from color inaccuracies and temporal inconsistencies. To address these challenges, we propose WMNet, a novel HDR video reconstruction network that leverages Wavelet domain Masked Image Modeling (W-MIM). WMNet adopts a two-phase training strategy: In Phase I, W-MIM performs self-reconstruction pre-training by selectively masking color and detail information in the wavelet domain, enabling the network to develop robust color restoration capabilities. A curriculum learning scheme further refines the reconstruction process. Phase II fine-tunes the model using the pre-trained weights to improve the final reconstruction quality. To improve temporal consistency, we introduce the Temporal Mixture of Experts (T-MoE) module and the Dynamic Memory Module (DMM). T-MoE adaptively fuses adjacent frames to reduce flickering artifacts, while DMM captures long-range dependencies, ensuring smooth motion and preservation of fine details. Additionally, since existing HDR video datasets lack scene-based segmentation, we reorganize HDRTV4K into HDRTV4K-Scene, establishing a new benchmark for HDR video reconstruction. Extensive experiments demonstrate that WMNet achieves state-of-the-art performance across multiple evaluation metrics, significantly improving color fidelity, temporal coherence, and perceptual quality.
TL;DR: We propose WMNet, a novel HDR video reconstruction network that leverages Wavelet domain Masked Image Modeling (W-MIM) with a two-phase training strategy to address color inaccuracies and temporal inconsistencies. It decouples reconstruction into robust color/detail restoration via wavelet-domain masking and temporal coherence enhancement through Temporal Mixture of Experts (T-MoE) and Dynamic Memory Module (DMM).
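For intuition, here is a toy, self-contained sketch of the wavelet-domain masking idea (illustrative only, not the actual WMNet code, which also masks color information and applies a curriculum schedule): a one-level Haar DWT splits the image into a low-frequency band and three detail sub-bands, a fraction of the detail coefficients are randomly zeroed, and the inverse transform yields the masked input that the network learns to restore during pre-training. All function names are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT of a 2D array -> (LL, LH, HL, HH) sub-bands."""
    p00, p01 = x[0::2, 0::2], x[0::2, 1::2]
    p10, p11 = x[1::2, 0::2], x[1::2, 1::2]
    ll = (p00 + p01 + p10 + p11) / 2  # low-frequency band (coarse structure)
    lh = (p00 - p01 + p10 - p11) / 2  # detail sub-bands (three orientations)
    hl = (p00 + p01 - p10 - p11) / 2
    hh = (p00 - p01 - p10 + p11) / 2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def wmim_mask(img, mask_ratio=0.5, seed=None):
    """Build a masked input: keep the low-frequency band, randomly zero a
    fraction of the detail coefficients, then invert the transform."""
    rng = np.random.default_rng(seed)
    ll, lh, hl, hh = haar_dwt2(img)
    keep = rng.random(ll.shape) >= mask_ratio  # True = coefficient survives
    return haar_idwt2(ll, keep * lh, keep * hl, keep * hh)
```

The self-reconstruction objective would then compare `wmim_mask(img)` inputs against the original `img` targets; with `mask_ratio=0.0` the round trip is lossless, so the masking ratio directly controls task difficulty.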
Performance comparison of various HDR reconstruction models on HDRTV4K-Scene and HDRTV4K-LongScene (metrics: PSNR, SSIM, SR-SIM, …).
To start, we recommend creating the environment with conda:

```shell
conda create -n wmnet
conda activate wmnet
pip install -r requirements.txt
```

Note: the PyTorch installation is machine-dependent; please install the correct version for your machine.
Dependencies:

- PyTorch, numpy: main computation.
- pytorch-msssim: SSIM calculation.
- tqdm: progress bar.
- opencv-python, scikit-image: image processing.
- imageio: image I/O.
- einops: torch tensor shaping with a pretty API.
The datasets we used are as follows:
Please organize the dataset structure in accordance with Section 4.A.1 of the paper.
```
data_path
├── HDRTV4KSence
│   ├── train_scene_hdr
│   │   ├── abp1_autumnwoods
│   │   │   ├── 000.png
│   │   │   ├── 001.png
│   │   │   ├── 002.png
│   │   │   ...
│   │   │   └── 009.png
│   │   ├── abp1_bamboo
│   │   ...
│   │   └── ugc2_sunroom
│   ├── train_scene_sdr
│   │   ├── abp1_autumnwoods
│   │   │   ├── 000.png
│   │   │   ├── 001.png
│   │   │   ├── 002.png
│   │   │   ...
│   │   │   └── 009.png
│   │   ├── abp1_bamboo
│   │   ...
│   │   └── ugc2_sunroom
│   ├── test_scene_hdr
│   │   ├── abp1_dancinggirl
│   │   │   ├── 000.png
│   │   │   ├── 001.png
│   │   │   ├── 002.png
│   │   │   ...
│   │   │   └── 009.png
│   │   ├── abp1_factoryout1
│   │   ...
│   │   └── ugc2_sculpture
│   └── test_scene_sdr
│       ├── abp1_dancinggirl
│       │   ├── 000.png
│       │   ├── 001.png
│       │   ├── 002.png
│       │   ...
│       │   └── 009.png
│       ├── abp1_factoryout1
│       ...
│       └── ugc2_sculpture
└── HDRTV4KLong
    ├── test_video_scene_hdr
    │   ├── scene01
    │   │   ├── 01.png
    │   │   ├── 02.png
    │   │   ├── 03.png
    │   │   ...
    │   │   └── 30.png
    │   ├── scene02
    │   ...
    │   └── scene10
    └── test_video_scene_sdr
        ├── scene01
        │   ├── 01.png
        │   ├── 02.png
        │   ├── 03.png
        │   ...
        │   └── 30.png
        ├── scene02
        ...
        └── scene10
```
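Training and testing assume that every HDR scene folder has an SDR counterpart with identical frame filenames. The helper below (a hypothetical utility, not part of this repository) can sanity-check that pairing before you start a long run:

```python
import os

def check_scene_pairs(root, hdr_dir, sdr_dir):
    """Return a list of problems found when pairing HDR/SDR scene folders;
    an empty list means the layout looks consistent."""
    problems = []
    hdr_scenes = set(os.listdir(os.path.join(root, hdr_dir)))
    sdr_scenes = set(os.listdir(os.path.join(root, sdr_dir)))
    # Scenes present on one side only.
    for scene in sorted(hdr_scenes ^ sdr_scenes):
        problems.append(f"unpaired scene: {scene}")
    # Paired scenes must contain exactly the same frame filenames.
    for scene in sorted(hdr_scenes & sdr_scenes):
        hdr_frames = sorted(os.listdir(os.path.join(root, hdr_dir, scene)))
        sdr_frames = sorted(os.listdir(os.path.join(root, sdr_dir, scene)))
        if hdr_frames != sdr_frames:
            problems.append(f"frame mismatch in scene: {scene}")
    return problems
```

For example, `check_scene_pairs("data_path/HDRTV4KSence", "train_scene_hdr", "train_scene_sdr")` should return an empty list on a correctly organized training split.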
- Prepare the training dataset.
- Preprocess `train_scene_hdr` and `train_scene_sdr` by running the following command on each:

```shell
python3 preprocessing.py --input_folder [INPUT_FOLDER] --save_folder [SAVE_FOLDER] --n_thread [YOUR_THREAD_COUNT] --crop_sz 128 --step 128 --thres_sz 0 --compression_level 95
```

Please note that preprocessing is required ONLY for the training dataset; the testing dataset DOES NOT require preprocessing.
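The flags above suggest a standard sliding-window cropping scheme: `--crop_sz` is the patch size, `--step` the stride, and `--thres_sz` the leftover-border threshold. A minimal sketch under that assumption (the actual `preprocessing.py` may differ in details):

```python
import numpy as np

def sliding_crops(img, crop_sz=128, step=128, thres_sz=0):
    """Yield (top, left, patch) crops over img with the given stride.
    If the leftover border exceeds thres_sz, add one extra crop flush
    with the image edge so no pixels are dropped.
    Assumes img is at least crop_sz in each spatial dimension."""
    h, w = img.shape[:2]
    tops = list(range(0, h - crop_sz + 1, step))
    lefts = list(range(0, w - crop_sz + 1, step))
    if h - (tops[-1] + crop_sz) > thres_sz:
        tops.append(h - crop_sz)
    if w - (lefts[-1] + crop_sz) > thres_sz:
        lefts.append(w - crop_sz)
    for t in tops:
        for l in lefts:
            yield t, l, img[t:t + crop_sz, l:l + crop_sz]
```

With `--crop_sz 128 --step 128`, patches tile the image without overlap, and any remainder at the right/bottom edge gets one extra edge-aligned patch.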
The training of WMNet contains two stages.
**Stage 1**

1. Change the working directory to `stage1` with `cd stage1`.
2. Modify `train.dataroot_LQ` and `train.dataroot_GT` in `options/train/fmnet_final.yml` to point to the preprocessed training dataset.
3. Modify `val.dataroot_LQ` and `val.dataroot_GT` in `options/train/fmnet_final.yml` to point to the testing dataset.
4. Run the following command for training:

```shell
python3 train.py -opt options/train/fmnet_final.yml
```

**Stage 2**

1. Change the working directory to `stage2` with `cd stage2`.
2. Modify `train.dataroot_LQ` and `train.dataroot_GT` in `options/train/fmnet_final.yml` to point to the preprocessed training dataset.
3. Modify `val.dataroot_LQ` and `val.dataroot_GT` in `options/train/fmnet_final.yml` to point to the testing dataset.
4. Modify `path.pretrain_model_G` in `options/train/fmnet_final.yml` to point to the last saved checkpoint from stage 1.
5. Run the following command for training:

```shell
python3 train.py -opt options/train/fmnet_final.yml
```

**Testing**

1. Prepare the testing dataset.
2. Change the working directory to `stage1` or `stage2`.
3. Modify `val.dataroot_LQ` and `val.dataroot_GT` in `options/train/fmnet_final.yml` to point to the testing dataset.
4. Run the following command for testing:

```shell
python3 train_val.py -opt options/train/fmnet_final.yml
```

Pretrained models can be found in the `./pretrain_model` folder.
Thanks for your attention! If you have any suggestions or questions, feel free to leave a message here or contact Dr. Zhangkai Ni ([email protected]).


