Zhaohu Xing 邢兆虎

VIP Lab,
No.1, Duxue Road,
Guangzhou, China

Email: [email protected]

Biography

I'm Zhaohu Xing (邢兆虎), a third-year Ph.D. candidate at HKUST-GZ, supervised by Prof. Lei Zhu and Prof. Fugee Tsung. Before that, I received my Master's degree from Tianjin University.

My research focuses on video generation, visual perception, and medical vision foundation models. Recently, I'm particularly interested in RL-based post-training (e.g., GRPO) for multimodal models and agentic video synthesis.

I have built hands-on experience in (i) video generation and multimodal models: agentic multi-shot video generation and human-centric video captioning via RL training; (ii) visual perception: large-scale mirror/reflection detection with iterative data engines; and (iii) medical vision foundation models: long-range sequential modeling with Mamba for medical image/video segmentation (SegMamba, Vivim).

My work has appeared at CVPR (2025; 2024 Spotlight), NeurIPS (2024), ECCV (2024), MICCAI (2024 Spotlight), ACM MM (2024 Oral; 2025), IEEE TMM/TMI, and more.

Publications

AgentShot: Towards Agentic Multi-Shot Video Generation and Benchmarking [Video Generation]
Zhaohu Xing, Yingwei Song, Jiantong Zhao, Tian Ye, Yuan Gao, Kelly Peng, Lei Zhu, and Huanrui Yang
State: Submitted to ECCV 2026.
Proposes an agentic framework for generating multi-shot videos with consistent characters and scenes.
 
HVCap: Human-centric Video Captioning via Factorized GRPO and a New Benchmark [Multimodal Models]
Zhaohu Xing, Jiantong Zhao, Tian Ye, Yuan Gao, and Kelly Peng
State: Submitted to ICML 2026.
Introduces factorized GRPO for human-centric video captioning with a new benchmark.
 
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine [Visual Perception]
Zhaohu Xing, Lihao Liu, Yijun Yang, Hongqiu Wang, Tian Ye, Sixiang Chen, Wenxue Li, Guang Liu, and Lei Zhu
State: Accepted by CVPR 2025.
Scales mirror detection to large unlabeled data via an iterative self-training data engine.
 
Farther Than Mirror: Explore Pattern-Compensated Depth of Mirror with Temporal Changes for Video Mirror Detection [Visual Perception]
Zhaohu Xing, Lihao Liu, Tian Ye, Sixiang Chen, Yijun Yang, Xiaojie Xu, Guang Liu, and Lei Zhu
State: Accepted by ACM MM 2025.
Exploits temporal depth changes to detect mirrors in videos beyond appearance cues.
 
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation [Medical Image]
Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu
State: Early accepted by MICCAI 2024 (Spotlight Presentation).
Applies Mamba's long-range sequential modeling to strengthen the vision encoder for 3D medical image segmentation.
 
SegMamba-V2: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation [Medical Image]
Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu
State: Accepted by IEEE Transactions on Medical Imaging (TMI).
Extends SegMamba with improved architecture for stronger 3D medical image segmentation.
 
Cross-conditioned Diffusion Model for medical image-to-image translation [Medical Image]
Zhaohu Xing, Sicheng Yang, Sixiang Chen, Tian Ye, Jing Qin, and Lei Zhu
State: Accepted by MICCAI 2024.
Bridges unpaired medical image modalities via cross-conditioned diffusion generation.
 
Diff-UNet: A Diffusion Embedded Network for Robust 3D Medical Image Segmentation [Medical Image]
Zhaohu Xing, Huazhu Fu, Guang Yang, and Lei Zhu
State: Accepted by Medical Image Analysis (MedIA).
Embeds diffusion probabilistic models into U-Net for robust 3D medical segmentation.
 
Hybrid Masked Image Modeling for 3D Medical Image Segmentation [Medical Image]
Zhaohu Xing, Liang Wan, Lequan Yu, and Lei Zhu
State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2024.
 
NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation [Medical Image]
Zhaohu Xing, Lequan Yu, Liang Wan, and Lei Zhu
State: Accepted by MICCAI 2022.
Designs a nested modality-aware transformer for multi-modal brain tumor segmentation.
 
Temporal Prompt Learning with Depth Memory for Video Mirror Detection [Visual Perception]
Zhaohu Xing, Tian Ye, Sixiang Chen, and Lei Zhu
State: Accepted by IEEE TMM.
Leverages depth memory and temporal prompts for accurate video mirror detection.
 
Non-Invasive to Invasive: Enhancing FFA Synthesis from CFP with a Benchmark Dataset and a Novel Network [Medical Image]
Hongqiu Wang, Zhaohu Xing*, Weitong Wu, Yijun Yang, Qingqing Tang, Yanwu Xu, and Lei Zhu
State: Accepted by ACM MM Workshop (2024).
Synthesizes invasive FFA images from non-invasive CFP with a new paired benchmark.
 
Cascaded Inner-Outer Clip Retformer for Ultrasound Video Object Segmentation [Medical Image]
Jialu Li, Lei Zhu, Zhaohu Xing, Baoliang Zhao, Ying Hu, Faqin Lv, and Qiong Wang
State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2024.
Segments ultrasound video objects via cascaded inner-outer clip transformers.
 
PromptHaze: Prompting Real-world Dehazing via Depth Anything Model [Image Restoration]
Tian Ye, Sixiang Chen, Haoyu Chen, Wenhao Chai, Jingjing Ren, Zhaohu Xing, Wenxue Li, and Lei Zhu
State: Accepted by AAAI 2024.
Prompts depth estimation models to provide structural priors for real-world dehazing.
 
AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement [Image Restoration]
Yunlong Lin, Tian Ye, Sixiang Chen, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, and Xinghao Ding
State: Accepted by AAAI 2024.
Guides diffusion models with illumination priors for unsupervised low-light enhancement.
 
Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint [Image Restoration]
Sixiang Chen, Tian Ye, Kai Zhang, Zhaohu Xing, Yunlong Lin, and Lei Zhu
State: Accepted by ECCV 2024.
Tailors restoration prompts per degradation type with depth-aware constraints.
 
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? [Medical Image]
Zhaohu Xing, et al.
State: Accepted by NeurIPS 2024.
Establishes a large-scale benchmark for rigorously evaluating medical image segmentation algorithms.
 
Timeline and Boundary Guided Diffusion Network for Video Shadow Detection [Visual Perception]
Haipeng Zhou, Hongqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, and Lei Zhu
State: Accepted by ACM MM 2024 (Oral Presentation).
Detects video shadows via timeline- and boundary-guided diffusion reasoning.
 
Learning Diffusion Texture Priors for Image Restoration [Image Restoration]
Tian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Qing Qin, and Lei Zhu
State: Accepted by CVPR 2024 (Spotlight).
Learns diffusion-based texture priors to improve perceptual quality of image restoration.
 
Anchored Supervised Contrastive Learning for Long-Tailed Medical Image Regression [Medical Image]
Zhaoying Li, Zhaohu Xing, Lei Zhu, and Liang Wan
State: Accepted by PRCV 2024.
Anchors contrastive learning to handle long-tailed distributions in medical regression.
 
Standing on the Giants: Informative Messenger Prompts with Self-adapter for Image [Image Restoration]
Sixiang Chen, Tian Ye, Haoyu Chen, Yijun Yang, Zhaohu Xing, Fugee Tsung, and Lei Zhu
State: Submitted to ECCV 2024.
Transfers knowledge from large pretrained models via informative messenger prompts.
 
Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning [Medical Image]
YingLing Lu, Yijun Yang, Zhaohu Xing, and Lei Zhu
State: Early accepted by MICCAI 2024.
Segments polyps in videos via multi-task diffusion with adversarial temporal reasoning.
 
Vivim: a Video Vision Mamba for Medical Video Object Segmentation [Medical Image]
Yijun Yang, Zhaohu Xing, and Lei Zhu
State: Early accepted by IEEE TCSVT.
 
DiffMIC-v2: Medical Image Classification via Improved Diffusion Network [Medical Image]
Yijun Yang, Huazhu Fu, Angelica I. Aviles-Rivero, Zhaohu Xing, and Lei Zhu
State: Accepted by IEEE TMI.
Improves diffusion-based medical image classification with a refined network design.
 
Uncertainty-aware Multi-dimensional Mutual Learning for Brain and Brain Tumor Segmentation [Medical Image]
Junting Zhao, Zhaohu Xing, and Lei Zhu
State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2023.
Jointly segments brain and tumor via uncertainty-aware multi-dimensional mutual learning.
 
Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge [Medical Image]
Zhaohu Xing, Lei Zhu, et al.
State: Accepted by Medical Image Analysis.
Benchmarks AI algorithms for pulmonary fibrosis imaging biomarker detection.
 
SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma [Medical Image]
Zhaohu Xing, Lei Zhu, et al.
State: Accepted by Medical Image Analysis.
Benchmarks organs-at-risk and tumor segmentation for nasopharyngeal carcinoma radiotherapy.

Template from Jon Barron
Last updated: 12/19, 2023
