Yang Zhang1, Zhangkai Ni1, Wenhan Yang2, Hanli Wang1
1Tongji University, 2Pengcheng Laboratory
This repository provides the official implementation for the paper "Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction", IEEE Transactions on Multimedia (TMM), 2026.
High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. However, existing methods often suffer from color inaccuracies and temporal inconsistencies. To address these challenges, we propose WMNet, a novel HDR video reconstruction network that leverages Wavelet domain Masked Image Modeling (W-MIM). WMNet adopts a two-phase training strategy: In Phase I, W-MIM performs self-reconstruction pre-training by selectively masking color and detail information in the wavelet domain, enabling the network to develop robust color restoration capabilities. A curriculum learning scheme further refines the reconstruction process. Phase II fine-tunes the model using the pre-trained weights to improve the final reconstruction quality. To improve temporal consistency, we introduce the Temporal Mixture of Experts (T-MoE) module and the Dynamic Memory Module (DMM). T-MoE adaptively fuses adjacent frames to reduce flickering artifacts, while DMM captures long-range dependencies, ensuring smooth motion and preservation of fine details. Additionally, since existing HDR video datasets lack scene-based segmentation, we reorganize HDRTV4K into HDRTV4K-Scene, establishing a new benchmark for HDR video reconstruction. Extensive experiments demonstrate that WMNet achieves state-of-the-art performance across multiple evaluation metrics, significantly improving color fidelity, temporal coherence, and perceptual quality.
TL;DR: We propose WMNet, a novel HDR video reconstruction network that leverages Wavelet domain Masked Image Modeling (W-MIM) with a two-phase training strategy to address color inaccuracies and temporal inconsistencies. It decouples reconstruction into robust color/detail restoration via wavelet-domain masking and temporal coherence enhancement through Temporal Mixture of Experts (T-MoE) and Dynamic Memory Module (DMM).
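For intuition, here is a toy, self-contained sketch of the wavelet-domain masking idea (illustrative only, not the actual WMNet code, which also masks color information and applies a curriculum schedule): a one-level Haar DWT splits the image into a low-frequency band and three detail sub-bands, a fraction of the detail coefficients are randomly zeroed, and the inverse transform yields the masked input that the network learns to restore during pre-training. All function names are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT of a 2D array -> (LL, LH, HL, HH) sub-bands."""
    p00, p01 = x[0::2, 0::2], x[0::2, 1::2]
    p10, p11 = x[1::2, 0::2], x[1::2, 1::2]
    ll = (p00 + p01 + p10 + p11) / 2  # low-frequency band (coarse structure)
    lh = (p00 - p01 + p10 - p11) / 2  # detail sub-bands (three orientations)
    hl = (p00 + p01 - p10 - p11) / 2
    hh = (p00 - p01 - p10 + p11) / 2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def wmim_mask(img, mask_ratio=0.5, seed=None):
    """Build a masked input: keep the low-frequency band, randomly zero a
    fraction of the detail coefficients, then invert the transform."""
    rng = np.random.default_rng(seed)
    ll, lh, hl, hh = haar_dwt2(img)
    keep = rng.random(ll.shape) >= mask_ratio  # True = coefficient survives
    return haar_idwt2(ll, keep * lh, keep * hl, keep * hh)
```

The self-reconstruction objective would then compare `wmim_mask(img)` inputs against the original `img` targets; with `mask_ratio=0.0` the round trip is lossless, so the masking ratio directly controls task difficulty.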
Performance comparison of various HDR reconstruction models on HDRTV4K-Scene and HDRTV4K-LongScene (metrics: PSNR, SSIM, SR-SIM, …).
To start, we recommend creating the environment with conda:

```shell
conda create -n wmnet
conda activate wmnet
pip install -r requirements.txt
```

Note: the PyTorch installation is machine-dependent; please install the correct version for your machine.
Dependencies:

- PyTorch, numpy: main computation.
- pytorch-msssim: SSIM calculation.
- tqdm: progress bar.
- opencv-python, scikit-image: image processing.
- imageio: image I/O.
- einops: torch tensor shaping with a pretty API.
The datasets we used are as follows:
Please organize the dataset structure in accordance with Section 4.A.1 of the paper.
```
data_path
├── HDRTV4KSence
│   ├── train_scene_hdr
│   │   ├── abp1_autumnwoods
│   │   │   ├── 000.png
│   │   │   ├── 001.png
│   │   │   ├── 002.png
│   │   │   ...
│   │   │   └── 009.png
│   │   ├── abp1_bamboo
│   │   ...
│   │   └── ugc2_sunroom
│   ├── train_scene_sdr
│   │   ├── abp1_autumnwoods
│   │   │   ├── 000.png
│   │   │   ├── 001.png
│   │   │   ├── 002.png
│   │   │   ...
│   │   │   └── 009.png
│   │   ├── abp1_bamboo
│   │   ...
│   │   └── ugc2_sunroom
│   ├── test_scene_hdr
│   │   ├── abp1_dancinggirl
│   │   │   ├── 000.png
│   │   │   ├── 001.png
│   │   │   ├── 002.png
│   │   │   ...
│   │   │   └── 009.png
│   │   ├── abp1_factoryout1
│   │   ...
│   │   └── ugc2_sculpture
│   └── test_scene_sdr
│       ├── abp1_dancinggirl
│       │   ├── 000.png
│       │   ├── 001.png
│       │   ├── 002.png
│       │   ...
│       │   └── 009.png
│       ├── abp1_factoryout1
│       ...
│       └── ugc2_sculpture
└── HDRTV4KLong
    ├── test_video_scene_hdr
    │   ├── scene01
    │   │   ├── 01.png
    │   │   ├── 02.png
    │   │   ├── 03.png
    │   │   ...
    │   │   └── 30.png
    │   ├── scene02
    │   ...
    │   └── scene10
    └── test_video_scene_sdr
        ├── scene01
        │   ├── 01.png
        │   ├── 02.png
        │   ├── 03.png
        │   ...
        │   └── 30.png
        ├── scene02
        ...
        └── scene10
```
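Training and testing assume that every HDR scene folder has an SDR counterpart with identical frame filenames. The helper below (a hypothetical utility, not part of this repository) can sanity-check that pairing before you start a long run:

```python
import os

def check_scene_pairs(root, hdr_dir, sdr_dir):
    """Return a list of problems found when pairing HDR/SDR scene folders;
    an empty list means the layout looks consistent."""
    problems = []
    hdr_scenes = set(os.listdir(os.path.join(root, hdr_dir)))
    sdr_scenes = set(os.listdir(os.path.join(root, sdr_dir)))
    # Scenes present on one side only.
    for scene in sorted(hdr_scenes ^ sdr_scenes):
        problems.append(f"unpaired scene: {scene}")
    # Paired scenes must contain exactly the same frame filenames.
    for scene in sorted(hdr_scenes & sdr_scenes):
        hdr_frames = sorted(os.listdir(os.path.join(root, hdr_dir, scene)))
        sdr_frames = sorted(os.listdir(os.path.join(root, sdr_dir, scene)))
        if hdr_frames != sdr_frames:
            problems.append(f"frame mismatch in scene: {scene}")
    return problems
```

For example, `check_scene_pairs("data_path/HDRTV4KSence", "train_scene_hdr", "train_scene_sdr")` should return an empty list on a correctly organized training split.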
- Prepare the training dataset.
- Preprocess `train_scene_hdr` and `train_scene_sdr` by running the following command on each:

```shell
python3 preprocessing.py --input_folder [INPUT_FOLDER] --save_folder [SAVE_FOLDER] --n_thread [YOUR_THREAD_COUNT] --crop_sz 128 --step 128 --thres_sz 0 --compression_level 95
```

Please note that preprocessing is required ONLY for the training dataset; the testing dataset DOES NOT require preprocessing.
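The flags above suggest a standard sliding-window cropping scheme: `--crop_sz` is the patch size, `--step` the stride, and `--thres_sz` the leftover-border threshold. A minimal sketch under that assumption (the actual `preprocessing.py` may differ in details):

```python
import numpy as np

def sliding_crops(img, crop_sz=128, step=128, thres_sz=0):
    """Yield (top, left, patch) crops over img with the given stride.
    If the leftover border exceeds thres_sz, add one extra crop flush
    with the image edge so no pixels are dropped.
    Assumes img is at least crop_sz in each spatial dimension."""
    h, w = img.shape[:2]
    tops = list(range(0, h - crop_sz + 1, step))
    lefts = list(range(0, w - crop_sz + 1, step))
    if h - (tops[-1] + crop_sz) > thres_sz:
        tops.append(h - crop_sz)
    if w - (lefts[-1] + crop_sz) > thres_sz:
        lefts.append(w - crop_sz)
    for t in tops:
        for l in lefts:
            yield t, l, img[t:t + crop_sz, l:l + crop_sz]
```

With `--crop_sz 128 --step 128`, patches tile the image without overlap, and any remainder at the right/bottom edge gets one extra edge-aligned patch.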
The training of WMNet contains two stages.
**Stage 1**

1. Change the working directory to `stage1` with `cd stage1`.
2. Modify `train.dataroot_LQ` and `train.dataroot_GT` in `options/train/fmnet_final.yml` to point to the preprocessed training dataset.
3. Modify `val.dataroot_LQ` and `val.dataroot_GT` in `options/train/fmnet_final.yml` to point to the testing dataset.
4. Run the following command for training:

```shell
python3 train.py -opt options/train/fmnet_final.yml
```

**Stage 2**

1. Change the working directory to `stage2` with `cd stage2`.
2. Modify `train.dataroot_LQ` and `train.dataroot_GT` in `options/train/fmnet_final.yml` to point to the preprocessed training dataset.
3. Modify `val.dataroot_LQ` and `val.dataroot_GT` in `options/train/fmnet_final.yml` to point to the testing dataset.
4. Modify `path.pretrain_model_G` in `options/train/fmnet_final.yml` to point to the last saved checkpoint from stage 1.
5. Run the following command for training:

```shell
python3 train.py -opt options/train/fmnet_final.yml
```

**Testing**

1. Prepare the testing dataset.
2. Change the working directory to `stage1` or `stage2`.
3. Modify `val.dataroot_LQ` and `val.dataroot_GT` in `options/train/fmnet_final.yml` to point to the testing dataset.
4. Run the following command for testing:

```shell
python3 train_val.py -opt options/train/fmnet_final.yml
```

Pretrained models can be found in the `./pretrain_model` folder.
Thanks for your attention! If you have any suggestions or questions, feel free to leave a message here or contact Dr. Zhangkai Ni ([email protected]).


