Jun Xiao

Singapore
293 followers · 262 connections

About

I currently work as a researcher and engineer at Zoom. Before joining Zoom, I…


Experience & Education

  • Zoom


Publications

  • Multi-scale Sampling and Aggregation Network For High Dynamic Range Imaging

arXiv

    High dynamic range (HDR) imaging is a fundamental problem in image processing, which aims to generate well-exposed images, even in the presence of varying illumination in the scenes. In recent years, multi-exposure fusion methods have achieved remarkable results, which merge multiple low dynamic range (LDR) images, captured with different exposures, to generate corresponding HDR images. However, synthesizing HDR images in dynamic scenes is still challenging and in high demand. There are two challenges in producing HDR images: 1). Object motion between LDR images can easily cause undesirable ghosting artifacts in the generated results. 2). Under and overexposed regions often contain distorted image content, because of insufficient compensation for these regions in the merging stage. In this paper, we propose a multi-scale sampling and aggregation network for HDR imaging in dynamic scenes. To effectively alleviate the problems caused by small and large motions, our method implicitly aligns LDR images by sampling and aggregating high-correspondence features in a coarse-to-fine manner. Furthermore, we propose a densely connected network based on discrete wavelet transform for performance improvement, which decomposes the input into several non-overlapping frequency subbands and adaptively performs compensation in the wavelet domain. Experiments show that our proposed method can achieve state-of-the-art performances under diverse scenes, compared to other promising HDR imaging methods. In addition, the HDR images generated by our method contain cleaner and more detailed content, with fewer distortions, leading to better visual quality.

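As a rough illustration of the wavelet decomposition this paper builds on (a minimal sketch, not the paper's implementation), a one-level 2D Haar transform splits an image into four non-overlapping frequency subbands (LL, LH, HL, HH) and can be inverted exactly:

```python
import numpy as np

def haar_dwt2(x):
    # one-level 2D Haar transform: rows first, then columns
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low subband
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # low-high subband
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0  # high-low subband
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0  # high-high subband
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # exact inverse of haar_dwt2
    h, w = ll.shape
    a = np.empty((h, 2 * w)); d = np.empty((h, 2 * w))
    a[:, 0::2] = ll + lh; a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh; d[:, 1::2] = hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2, :] = a + d
    x[1::2, :] = a - d
    return x
```

The paper's network performs learned compensation on such subbands before reconstruction; here the transform alone is shown for clarity.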
  • Online Video Super-Resolution with Convolutional Kernel Bypass Graft

arXiv

    Deep learning-based models have achieved remarkable performance in video super-resolution (VSR) in recent years, but most of these models are less applicable to online video applications. These methods solely consider the distortion quality and ignore crucial requirements for online applications, e.g., low latency and low model complexity. In this paper, we focus on online video transmission, in which VSR algorithms are required to generate high-resolution video sequences frame by frame in real time. To address such challenges, we propose an extremely low-latency VSR algorithm based on a novel kernel knowledge transfer method, named convolutional kernel bypass graft (CKBG). First, we design a lightweight network structure that does not require future frames as inputs and saves extra time costs for caching these frames. Then, our proposed CKBG method enhances this lightweight base model by bypassing the original network with ``kernel grafts'', which are extra convolutional kernels containing the prior knowledge of external pretrained image SR models. In the testing phase, we further accelerate the grafted multi-branch network by converting it into a simple single-path structure. Experiment results show that our proposed method can process online video sequences up to 110 FPS, with very low model complexity and competitive SR performance.

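The conversion of the grafted multi-branch network into a single-path structure relies on the linearity of convolution: the outputs of parallel branches sharing an input can be reproduced by a single convolution whose kernel is the sum of the branch kernels. A minimal sketch of that principle (illustrative kernels, not the paper's CKBG layers):

```python
import numpy as np

def conv2d(x, k):
    # plain 'valid' 2D cross-correlation with a single-channel kernel
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def merge_graft(base_kernel, graft_kernel):
    # convolution is linear in the kernel, so two parallel branches
    # collapse into one kernel: conv(x, k1) + conv(x, k2) == conv(x, k1 + k2)
    return base_kernel + graft_kernel
```

This is why the grafted model can run as a simple single-path network at test time with no change in output.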
  • Progressive and Selective Fusion Network for High Dynamic Range Imaging

The 29th ACM International Conference on Multimedia

    This paper considers the problem of generating an HDR image of a scene from its LDR images. Recent studies employ deep learning and solve the problem in an end-to-end fashion, leading to significant performance improvements. However, it is still hard to generate a good quality image from LDR images of a dynamic scene captured by a hand-held camera, e.g., occlusion due to the large motion of foreground objects, causing ghosting artifacts. The key to success relies on how well we can fuse the input images in their feature space, where we wish to remove the factors leading to low-quality image generation while performing the fundamental computations for HDR image generation, e.g., selecting the best-exposed image/region. We propose a novel method that can better fuse the features based on two ideas. One is multi-step feature fusion; our network gradually fuses the features in a stack of blocks having the same structure. The other is the design of the component block that effectively performs two operations essential to the problem, i.e., comparing and selecting appropriate images/regions. Experimental results show that the proposed method outperforms the previous state-of-the-art methods on the standard benchmark tests.

  • Self-feature Learning: An Efficient Deep Lightweight Network for Image Super-resolution

The 29th ACM International Conference on Multimedia

    Deep learning-based models have achieved unprecedented performance in single image super-resolution (SISR). However, existing deep learning-based models usually require high computational complexity to generate high-quality images, which limits their applications in edge devices, e.g., mobile phones. To address this issue, we propose a dynamic, channel-agnostic filtering method in this paper. The proposed method not only adaptively generates convolutional kernels based on the local information of each position, but also can significantly reduce the cost of computing the inter-channel redundancy. Based on this, we further propose a simple, yet effective, deep lightweight model for SISR. Experiment results show that our proposed model outperforms other state-of-the-art deep lightweight SISR models, leading to the best trade-off between the performance and the number of model parameters.

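The paper's channel-agnostic dynamic filtering is its own design; as a rough illustration of where lightweight SR models save parameters, compare a standard convolution with a depthwise-separable one (a common lightweight baseline, used here only to show the parameter arithmetic):

```python
def conv_params(c_in, c_out, k):
    # standard convolution: one k x k filter per (input, output) channel pair
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # one depthwise k x k filter per input channel, then 1 x 1 pointwise mixing
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 64, 3)                   # 64*64*9  = 36864
separable = depthwise_separable_params(64, 64, 3)   # 576 + 4096 = 4672
```

For a 64-channel 3x3 layer this is roughly an 8x reduction, which is the kind of budget that makes sub-300K-parameter SR models feasible.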
  • Balanced distortion and perception in single-image super-resolution based on optimal transport in wavelet domain

    Neurocomputing

    Single image super-resolution (SISR) is a classic ill-posed problem in computer vision. In recent years, deep-learning-based (DL-based) models have achieved promising results with the SISR problem. However, most existing methods suffer from an intrinsic trade-off between distortion and perceptual quality. To satisfy the requirements in different real-world situations, the balance of distortion and visual quality for image super-resolution is a critical issue. In DL-based models, the uses of hybrid loss (i.e., the combination of the distortion loss and the perceptual loss) and network interpolation are two common approaches to balancing the distortion and perceptual quality of super-resolved images. However, these two kinds of methods lack flexibility and hold strict constraints on network architectures. In this paper, we propose an image-fusion interpolation method for image super-resolution, which can balance the distortion and visual quality of super-resolved images, based on the optimal transport theory in the wavelet domain. The advantage of our proposed method is that it can be applied to any pretrained DL-based model, without any requirement from the network architecture and parameters. In addition, our proposed method is parameter-free and can run fast without using a GPU. Compared with existing state-of-the-art SISR methods, experiment results show that our proposed method can achieve a better balance between the distortion and visual quality in super-resolved images.

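The paper fuses wavelet coefficients of two pretrained models via optimal transport; as a minimal one-dimensional sketch of the underlying idea (sample data and the blend parameter are illustrative), the 1D optimal transport map pairs sorted samples, so displacement interpolation between two empirical distributions blends their order statistics:

```python
import numpy as np

def ot_interpolate_1d(a, b, t):
    # 1-D optimal transport: the monotone (Monge) map pairs the i-th
    # smallest sample of a with the i-th smallest sample of b, so the
    # displacement interpolation at t blends sorted samples directly.
    a_sorted = np.sort(a)
    b_sorted = np.sort(b)
    return (1.0 - t) * a_sorted + t * b_sorted
```

Varying t traces a path between the two distributions, which is the mechanism that lets a single parameter trade distortion against perceptual quality.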
  • Invertible image decolorization

IEEE Transactions on Image Processing

    Invertible image decolorization is a useful color compression technique to reduce the cost in multimedia systems. Invertible decolorization aims to synthesize faithful grayscales from color images, which can be fully restored to the original color version. In this paper, we propose a novel color compression method to produce invertible grayscale images using invertible neural networks (INNs). Our key idea is to separate the color information from color images, and encode the color information into a set of Gaussian distributed latent variables via INNs. By this means, we force the color information lost in grayscale generation to be independent of the input color image. Therefore, the original color version can be efficiently recovered by randomly re-sampling a new set of Gaussian distributed variables, together with the synthetic grayscale, through the reverse mapping of INNs. To effectively learn the invertible grayscale, we introduce the wavelet transformation into a UNet-like INN architecture, and further present a quantization embedding to prevent the information omission in format conversion, which improves the generalizability of the framework in real-world scenarios. Extensive experiments on three widely used benchmarks demonstrate that the proposed method achieves a state-of-the-art performance in terms of both qualitative and quantitative results, which shows its superiority in multimedia communication and storage systems.

  • Bayesian sparse hierarchical model for image denoising

    Signal Processing: Image Communication

Sparse models and their variants have been extensively investigated, and have achieved great success in image denoising. Compared with recently proposed deep-learning-based methods, sparse models have several advantages: (1) Sparse models do not require a large number of pairs of noisy images and the corresponding clean images for training. (2) The performance of sparse models is less reliant on the training data, and the learned model can be easily generalized to natural images across different noise domains. In sparse models, the L0 norm penalty makes the problem highly non-convex and difficult to solve. Instead, the L1 norm penalty is commonly adopted as a convex relaxation, which corresponds to the Laplacian prior from the Bayesian perspective. However, many previous works have revealed that L1 norm regularization causes a biased estimation for the sparse code, especially for high-dimensional data, e.g., images. In this paper, instead of using the L1 norm penalty, we employ an improper prior in the sparse model and formulate a hierarchical sparse model for image denoising. Compared with other competitive methods, experiment results show that our proposed method achieves a better generalization for images with different characteristics across various domains, and achieves state-of-the-art performance for image denoising on several benchmark datasets.

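The bias the abstract refers to is easy to see in the proximal operator of the L1 penalty (the MAP step under a Laplacian prior): soft-thresholding subtracts the threshold from every surviving coefficient, so large coefficients are shrunk by the same amount as small ones. A minimal sketch (not the paper's hierarchical estimator, which is designed to avoid this bias):

```python
import numpy as np

def soft_threshold(x, lam):
    # proximal operator of lam * ||x||_1: zeroes small entries and
    # shrinks every surviving entry toward zero by exactly lam
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```

For example, a coefficient of 5.0 with lam = 0.5 becomes 4.5: the estimate stays biased by lam no matter how strong the signal is.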
  • Deep multi-task learning for facial expression recognition and synthesis based on selective feature sharing

    25th IEEE International Conference on Pattern Recognition (ICPR)

Multi-task learning is an effective learning strategy for deep-learning-based facial expression recognition tasks. However, most existing methods give limited consideration to feature selection when transferring information between different tasks, which may lead to task interference when training multi-task networks. To address this problem, we propose a novel selective feature-sharing method, and establish a multi-task network for facial expression recognition and facial expression synthesis. The proposed method can effectively transfer beneficial features between different tasks, while filtering out useless and harmful information. Moreover, we employ the facial expression synthesis task to enlarge and balance the training dataset to further enhance the generalization ability of the proposed method. Experimental results show that the proposed method achieves state-of-the-art performance on those commonly used facial expression recognition benchmarks, which makes it a potential solution to real-world facial expression recognition problems.

  • Progressive Motion Representation Distillation With Two-Branch Networks for Egocentric Activity Recognition

IEEE Signal Processing Letters

    Video-based egocentric activity recognition involves fine-grained spatio-temporal human-object interactions. State-of-the-art methods, based on the two-branch-based architecture, rely on pre-calculated optical flows to provide motion information. However, this two-stage strategy is computationally intensive, storage demanding, and not task-oriented, which hampers it from being deployed in real-world applications. Albeit there have been numerous attempts to explore other motion representations to replace optical flows, most of the methods were designed for third-person activities, without capturing fine-grained cues. To tackle these issues, in this letter, we propose a progressive motion representation distillation (PMRD) method, based on two-branch networks, for egocentric activity recognition. We exploit a generalized knowledge distillation framework to train a hallucination network, which receives RGB frames as input and produces motion cues guided by the optical-flow network. Specifically, we propose a progressive metric loss, which aims to distill local fine-grained motion patterns in terms of each temporal progress level. To further enforce the proposed distillation framework to concentrate on those informative frames, we integrate a temporal attention mechanism into the metric loss. Moreover, a multi-stage training procedure is employed for the efficient learning of the hallucination network. Experimental results on three egocentric activity benchmarks demonstrate the state-of-the-art performance of the proposed method.

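The paper's progressive metric loss and temporal attention are specific to its architecture; as a generic sketch of the underlying idea (all names and shapes here are illustrative, not the paper's formulation), a temporally weighted feature-distillation loss matches the hallucination (student) features to the optical-flow (teacher) features per frame, with attention weights concentrating the loss on informative frames:

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, frame_scores):
    # student_feat, teacher_feat: (T, D) per-frame feature vectors
    # frame_scores: (T,) attention logits over frames
    per_frame = np.mean((student_feat - teacher_feat) ** 2, axis=1)
    w = np.exp(frame_scores) / np.sum(np.exp(frame_scores))  # softmax attention
    return float(np.sum(w * per_frame))
```

Frames the attention emphasizes dominate the loss, so the student is pushed hardest to mimic motion cues where they matter most.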
  • Deep Progressive Convolutional Neural Network for Blind Super-Resolution With Multiple Degradations

    IEEE International Conference on Image Processing (ICIP), 2019.

    Blind super-resolution (SR) of blurry and noisy low-resolution (LR) images is still a challenging problem in single image super-resolution (SISR). The performance of most existing convolutional neural network (CNN)-based models is inevitably degraded when LR images are corrupted by both blur and noise. For those blind SR methods based on kernel estimation, accurate estimation is barely attained under complex degradations and this gives rise to poor-quality results. To address these problems, we propose a deep progressive network under a probabilistic framework and a novel up-sampling method for blind super-resolution with multiple degradations, which effectively utilizes image priors across scales. Experimental results show that the proposed method achieves promising performance on images with multiple degradations.


Projects

  • Machine Learning Algorithms for Financial Applications

    - Present

1. Apply advanced deep learning models (e.g., RNN-based, LSTM-based, and transformer-based models) to forecast stock returns based on the CSI300 data in the China A-Shares market.
    2. Proposed Bayesian state-space models for pairs trading. The methods are robust to non-Gaussian noise and adaptively estimate the spread between the selected assets.
    3. Proposed sparse representation models for financial index tracking. The advanced sparse algorithms, e.g., re-weighted L1-norm approximation and the minimax concave penalty, are applied to improve the tracking performance.
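The re-weighted L1 approximation mentioned in item 3 can be sketched as iterative soft-thresholding with weights refreshed from the current solution (all data, penalties, and iteration counts here are illustrative, not the project's actual algorithm):

```python
import numpy as np

def reweighted_l1_tracking(R, y, lam=0.1, eps=1e-2, outer=5, inner=200):
    # sketch: min_w ||R w - y||^2 + lam * sum_i c_i |w_i|,
    # with weights c_i = 1 / (|w_i| + eps) refreshed each outer round
    # (small |w_i| -> large c_i, so near-zero weights are pushed to zero,
    # approximating the L0 penalty better than plain L1)
    n = R.shape[1]
    w = np.zeros(n)
    c = np.ones(n)
    step = 1.0 / np.linalg.norm(R, 2) ** 2   # 1 / spectral norm squared
    for _ in range(outer):
        for _ in range(inner):
            grad = R.T @ (R @ w - y)          # gradient of 0.5*||Rw - y||^2
            z = w - step * grad
            thr = lam * step * c
            w = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
        c = 1.0 / (np.abs(w) + eps)
    return w
```

In index tracking, R holds constituent returns and y the index returns; the re-weighting drives most portfolio weights to exactly zero while keeping the tracking residual small.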

  • High Dynamic Range (HDR) Imaging With Large-scale Motion


1. The ghosting artifacts and corrupted content caused by object motions are challenging issues for HDR imaging.
    2. Proposed a progressive feature fusion scheme for deep learning models which can effectively generate ghost-free HDR images. The proposed method can achieve 44.06 dB in terms of PSNR, which significantly outperforms the baseline method by 1.35 dB.
    3. Proposed a sampling and aggregation network for HDR imaging in the wavelet domain. The method hierarchically selects similar image patches from multi-scale spaces and then aggregates them for motion alignment. In addition, wavelet transform is adopted for feature fusion, which can effectively restore the corrupted contents. The performance can be up to 44.38 dB, which is 1.68 dB higher than the baseline. (Submitted to TMM, 2022)
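The dB figures above are PSNR values; for reference, the metric is defined from the mean squared error against a ground-truth image (a standard definition, sketched here for images scaled to [0, peak]):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    # peak signal-to-noise ratio in dB between images x and y in [0, peak]
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, a gain such as +1.35 dB corresponds to roughly a 27% reduction in mean squared error.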

  • Deep Lightweight Image Super-resolution (SR) Models


1. Existing deep image SR models require high computational complexity and memory consumption, making them less applicable in resource-constrained devices, e.g., mobile phones, personal computers, etc.
2. Proposed a feature compression algorithm based on the knowledge-distillation module. Compared with the benchmark, e.g., EDSR (1,370K, 26.07dB), the proposed method can reduce the model parameters by 50% and achieve comparable performance (ours: 690K, 25.89dB). (Published in ICASSP, 2021)
3. Designed a lightweight, spatially variant convolutional kernel, which significantly reduces the model complexity by 78%. Compared with other lightweight models, the proposed model can achieve the best performance, with only 264K model parameters. (Published in ACM MM, 2021)

  • The Distortion-perception Trade-off for Image Super-resolution


    1. Deep image SR models have a problem with generating over-smoothed images, which results in low perceptual quality. Although GAN-based methods may effectively synthesize texture information, the distorted content is a major concern. For real-world applications, balancing the distortion-perception trade-off is still a necessary and significant problem.
2. Proposed an efficient image fusion algorithm based on optimal transport theory in the wavelet domain, which can effectively maintain the distortion quality and improve the perceptual quality by 50% on the Set14 dataset. In addition, the average running time is reduced from 5.6 hours to 3.6 seconds, without GPU requirements. (Published in Neurocomputing, 2021)

Honors & Awards

  • Stars of Tomorrow Internship

Microsoft Research Asia

Languages

  • Mandarin

    Native or bilingual proficiency

  • Cantonese

    Native or bilingual proficiency

  • English

    Professional working proficiency
