Official PyTorch implementation of "Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display" (paper (arXiv), paper) in CVMP 2023 (website, proceedings).
Here are all the AI-3D-LUT (look-up table) methods we know of (last updated 07/03/2024); please jump to them if you are interested.
You can cite our paper if you find this overview helpful.
```bibtex
@InProceedings{Guo_2023_CVMP,
  author    = {Guo, Cheng and Fan, Leidong and Zhang, Qian and Liu, Hanyuan and Liu, Kanglin and Jiang, Xiuhua},
  title     = {Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display},
  booktitle = {Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production (CVMP)},
  month     = {November},
  year      = {2023},
  pages     = {1-10},
  doi       = {10.1145/3626495.3626503}
}
```
In the table of AI-3D-LUT algorithms below, the columns #Basic LUTs / LUT size each / (#) Extra dimension describe the expressiveness of the trained LUT:

| Idea | Task | Name | Publication | Paper | Code | Institution | #Basic LUTs | LUT size each | (#) Extra dimension | Output of neural network(s) | Nodes (packing) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| First AI-LUT | Image enhancement / retouching | A3DLUT | 20-TPAMI | paper | code | HK_PolyU & DJI Innovation | 3×1 | 3×33³ | - | weights (of basic LUTs) | uniform |
| C | | SA-LUT-Nets | ICCV'21 | paper | - | Huawei Noah's Ark Lab | 3×10 | 3×33³ | (10) category | weights & category map | |
| E | | CLUT-Net | MM'22 | paper | code | CN_TongjiU & OPPO Research | 20×1 | 3×5×20 (compressed LUT representation) | - | weights | |
| E | | F2D-LUT | | paper | code | CN_TsinghuaU | 6×3 | 2×33² (3D LUT decoupled to 2D LUTs) | (3) R-G/R-B/G-B channel order | | |
| N | | AdaInt | CVPR'22 | paper | code | CN_SJTU & Alibaba Group | 3×1 | 3×33³ | - | weights & nodes | learned non-uniform |
| N | | SepLUT | ECCV'22 | paper | code | | 1 (no self-adaptability) | 3×9³ or 3×17³ | - | directly 1D & 3D LUTs | learned non-linear by 1D LUT |
| C | | DualBLN | ACCV'22 | paper | code | CN_NorthwesternPolyU | 5×1 | 3×36³ | - | LUT fusion map | uniform |
| C | | 4D-LUT | 23-TIP | paper | - | CN_XianJiaotongU & Microsoft Research Asia | 3×1 | 3×33⁴ | (33) context | weights & context map | |
| C & E | | AttentionLUT | 24-arXiv | paper | - | CN_SJTU John Hopcroft Center | no (does not rely on basic LUTs for self-adaptability) | 9×15×33 (represented by Canonical Polyadic decomposition) | - | feature (to encode Q, K, V tensors) | |
| E | Photorealistic style transfer | NLUT | 23-arXiv | paper | code | Sobey Digital Technology & Peng Cheng Lab | 2048×1 | 3×32×32 (compressed LUT representation) | - | weights | |
| C | Video low-light enhancement | IA-LUT | MM'23 | paper | code | CN_SJTU & Alibaba Damo Academy | 3×1 | 3×33⁴ | (33) intensity | weights & intensity map | |
| No | Underwater image enhancement | INAM-LUT | 23-Sensors | paper | - | CN_XidianU | 3×1 | 3×33(?)³ | - | weights | |
| C | Tone-mapping | LapLUT | NeurIPS'23 | paper | - | CN_HUST & DJI Innovation | 3×1 | 3×33³ | - | weight map (of each interpolated image) | |
| Ours | HDR/WCG inverse tone-mapping | ITM-LUT | CVMP'23 | paper | see below | CN_CUC & Peng Cheng Lab | 5×3 | 3×17³ | (3) luminance probability (contribution) | weights | explicitly defined non-uniform |
In the Idea column:
- C stands for improving the expressiveness of the LUT content (by a new way of generating the image-adaptive LUT, or by introducing a new dimension);
- E stands for making the LUT more efficient (by a special representation of the LUT's elements);
- N stands for setting non-uniform nodes (to optimize the LUT's interpolation error on images with a specific numerical distribution).
Note that we only list AI-3D-LUTs for image-to-image low-level vision tasks; the following AI-LUTs are not included:
- Non-3D AI-LUTs for other CV tasks: e.g. SR-LUT, MuLUT (paper1, paper2 (extended to image restoration)), VA-LUT, SPLUT (super-resolution, non-3D-LUT), MEFLUT (multi-exposure fusion, 1D-LUT), SA-LuT-Nets (medical imaging), etc. (such LUTs may not even involve an interpolation process);
- Methods that claim to be AI-LUTs but use another mechanism to conduct the image-to-image transform: e.g. NILUT (which represents the LUT transform using an MLP (multi-layer perceptron)), etc.
Our AI-3D-LUT algorithm, named ITM-LUT, conducts inverse tone-mapping (ITM) from a standard dynamic range (SDR) image/frame to its high dynamic range and wide color gamut (HDR/WCG) version.
- Self-adaptability: the LUT content alters with the input SDR's statistics, by merging basic LUTs using weights generated by a neural network from the input SDR.
- AI learning: rather than a 'top-down designed' static LUT, our LUT can be learned from any dataset in a 'bottom-up' manner, enabling the reverse engineering of any technical and artistic intent between SDR and HDR/WCG.
- HDR/WCG optimization: a LUT processing higher-bit-depth HDR/WCG content requires a larger LUT size N, so we instead use 3 LUTs with different non-uniform nodes. Each of their results has less interpolation error in a different range, so we blend their best ranges using a pixel-wise contribution map (see the sketch below). In this way, 3 smaller LUTs (e.g. N=17) can reach the same error level as a single bigger LUT (e.g. N=33) while occupying fewer elements (e.g. 44217 < 107811).
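To make the blending concrete, here is a minimal PyTorch sketch of the contribution-map fusion described above. All names and shapes are illustrative assumptions, not the repository's actual implementation:

```python
import torch

# Minimal sketch (illustrative shapes/names, not the repo's actual code):
# blend the results of 3 small LUTs via a pixel-wise contribution map.
B, H, W = 1, 64, 64
N = 17  # nodes per small LUT: 3 LUTs * 3 channels * 17**3 = 44217 elements,
        # fewer than a single big N=33 LUT: 3 * 33**3 = 107811 elements

# Results of applying the 3 LUTs (different non-uniform nodes) to the SDR input
lut_outputs = torch.rand(3, B, 3, H, W)

# Pixel-wise contribution map predicted by the network, normalized over the 3 LUTs
contribution = torch.softmax(torch.rand(B, 3, H, W), dim=1)

# Weighted sum: each pixel mostly takes the LUT whose node placement
# yields the least interpolation error in that pixel's value range
weights = contribution.permute(1, 0, 2, 3).unsqueeze(2)  # (3, B, 1, H, W)
hdr = (lut_outputs * weights).sum(dim=0)                 # (B, 3, H, W)
print(hdr.shape)
```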
Requirements:
- Python
- PyTorch
- OpenCV
- ImageIO
- NumPy
- GCC/G++
First, install the CUDA & C++ implementation of trilinear interpolation with non-uniform vertices (requires GCC/G++):

```
python3 ./ailut/setup.py install
```

After that, the `ailut` package is available in your Python environment.
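If the build succeeded, you can quickly smoke-test it. The `ailut_transform(img, lut, vertices)` call below follows the interface of the AdaInt codebase this extension derives from; treat the exact signature and shapes as assumptions:

```python
import torch
from ailut import ailut_transform  # installed by the setup.py step above

# Assumed shapes (AdaInt-style interface, not verified here):
# image (B, 3, H, W), LUT (B, 3, N, N, N), node positions (B, 3, N) in [0, 1]
B, N, H, W = 1, 17, 64, 64
img = torch.rand(B, 3, H, W)
lut = torch.rand(B, 3, N, N, N)
vertices, _ = torch.sort(torch.rand(B, 3, N), dim=-1)  # monotonic non-uniform nodes

out = ailut_transform(img, lut, vertices)
print(out.shape)  # expected: torch.Size([1, 3, 64, 64])
```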
Run `test.py` with the configuration(s) below:

```
python3 test.py frameName.jpg
```

When batch processing, use the wildcard `*`:

```
python3 test.py framesPath/*.png
```

or e.g.:

```
python3 test.py framesPath/footageName_*.png
```

Add the configuration(s) below for specific purposes:
| Purpose | Configuration |
|---|---|
| Specifying the output path | `-out resultDir/` (default is the input directory) |
| Resizing the image before inference | `-resize True -height newH -width newW` |
| Adding a filename tag | `-tag yourTag` |
| Forcing CPU processing | `-use_gpu False` |
| Using input SDR with bit depth != 8 | e.g. `-in_bitdepth 16` |
| Saving the result HDR in another format (default is an uncompressed 16-bit `.tif` of a single frame) | `-out_format suffix` (`png` means 16-bit `.png`; `exr` requires the extra package OpenEXR) |
Change line 104 in `test.py` to use other parameters/checkpoints:

- The current checkpoint `params.pth` is trained on our own HDRTV4K dataset with the DaVinci degradation model (available here). It scores 35.14 dB PSNR, 0.9605 SSIM, 14.330 $\Delta E_{ITP}$ and 9.1181 VDP3 (`'task'='side-by-side'`, `'color_encoding'='rgb-bt.2020'`, `'pixel_per_degree'=60` on 1920×1080 images) on the HDRTV4K-DaVinci test set.
- Checkpoint `params_TV1K.pth` is trained on the popular HDRTV1K dataset with the YouTube degradation model. It scores 36.69 dB PSNR, 0.9811 SSIM, 10.194 $\Delta E_{ITP}$ and 8.9122 VDP3 (same VDP3 settings as above) on the HDRTV1K test set.
- We will release more interesting checkpoint(s) later.
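Conceptually, the swap is just pointing the checkpoint path at a different `.pth` file. A hedged sketch of what that might look like (variable names are hypothetical; see the actual line 104 of `test.py` for the real code):

```python
import torch

# Hypothetical sketch: switch the checkpoint path to another .pth file
ckpt_path = './params_TV1K.pth'    # instead of the default './params.pth'
state_dict = torch.load(ckpt_path, map_location='cpu')
# net.load_state_dict(state_dict)  # 'net' stands for the ITM-LUT model instance
```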
First, download the training code from BaiduNetDisk (code: qgs2) or GoogleDrive. This package contains the 5 essential real ITM LUTs used in our LUT initialization, and 13 other real ITM LUTs (each in N=17/33/65); you can use any combination of them to try a new LUT initialization.

Then:

```
cd ITMLUT_train/codes
python3 train.py -opt options/test/test_Net.yml
```

- You can modify the training configuration, e.g. the number of basic LUTs and the LUT size, in `codes/options/test/test_Net.yml`.
- To try a new initialization, rename any other LUT from `codes/real_luts/other_luts` to e.g. `2_17.cube` in `codes/real_luts`. Remember to delete the first row (a string) when using other commercial LUT(s); a helper for this is sketched below.
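For the header removal, a small hypothetical helper (not part of the repo; it assumes the commercial `.cube` file starts with a `TITLE` text row, which the training code does not expect):

```python
# Hypothetical helper: strip the leading title row from a commercial .cube LUT
def strip_cube_title(src_path: str, dst_path: str) -> None:
    with open(src_path) as f:
        lines = f.readlines()
    # Drop the first line when it is a text header such as: TITLE "My LUT"
    if lines and lines[0].lstrip().startswith('TITLE'):
        lines = lines[1:]
    with open(dst_path, 'w') as f:
        f.writelines(lines)

# e.g. prepare a commercial LUT as the 2nd N=17 basic LUT ('some_lut.cube' is a placeholder)
strip_cube_title('codes/real_luts/other_luts/some_lut.cube', 'codes/real_luts/2_17.cube')
```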
| Date | Log |
|---|---|
| 29 Feb 2024 | Since most SoTA methods are still trained and tested on the HDRTV1K dataset, we added a checkpoint `params_TV1K.pth` trained on it, so the result will have a similar look to SoTA output. |
| 3 Mar 2024 | Training code (along with 18 real ITM LUTs in N=17/33/65) is now released. |
Guo Cheng (Andre Guo) [email protected]
- State Key Laboratory of Media Convergence and Communication (MCC), Communication University of China (CUC), Beijing, China.
- Peng Cheng Laboratory (PCL), Shenzhen, China.