Zhaohu Xing 邢兆虎

VIP Lab,
No.1, Duxue Road,
Guangzhou, China

Email: [email protected]

Biography

I'm Zhaohu Xing (邢兆虎), a third-year Ph.D. candidate at HKUST-GZ, supervised by Prof. Lei Zhu and Prof. Fugee Tsung. Before that, I received my Master's degree from Tianjin University.

My research focuses on video generation, visual perception, and medical vision foundation models. Recently, I have become particularly interested in RL-based post-training (e.g., GRPO) for multimodal models and agentic video synthesis.

I have built hands-on experience in (i) video generation & multimodal models — agentic multi-shot video generation and human-centric video captioning via RL training; (ii) visual perception — large-scale mirror/reflection detection with iterative data engines; and (iii) medical vision foundation models — long-range sequential modeling with Mamba for medical image/video segmentation (SegMamba, Vivim).

My work has appeared at CVPR (2025; 2024 Spotlight), NeurIPS (2024), ECCV (2024), MICCAI (2024 Spotlight), ACM MM (2024 Oral; 2025), IEEE TMM/TMI, and more.

Publications

	AgentShot: Towards Agentic Multi-Shot Video Generation and Benchmarking [Video Generation] Zhaohu Xing, Yingwei Song, Jiantong Zhao, Tian Ye, Yuan Gao, Kelly Peng, Lei Zhu, and Huanrui Yang. State: Submitted to ECCV 2026. Proposes an agentic framework for generating multi-shot videos with consistent characters and scenes.

	HVCap: Human-centric Video Captioning via Factorized GRPO and a New Benchmark [Multimodal Models] Zhaohu Xing, Jiantong Zhao, Tian Ye, Yuan Gao, and Kelly Peng. State: Submitted to ICML 2026. Introduces factorized GRPO for human-centric video captioning with a new benchmark.

	Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine [Visual Perception] Zhaohu Xing, Lihao Liu, Yijun Yang, Hongqiu Wang, Tian Ye, Sixiang Chen, Wenxue Li, Guang Liu, and Lei Zhu. State: Accepted by CVPR 2025. Scales mirror detection to large unlabeled data via an iterative self-training data engine.

	Farther Than Mirror: Explore Pattern-Compensated Depth of Mirror with Temporal Changes for Video Mirror Detection [Visual Perception] Zhaohu Xing, Lihao Liu, Tian Ye, Sixiang Chen, Yijun Yang, Xiaojie Xu, Guang Liu, and Lei Zhu. State: Accepted by ACM MM 2025. Exploits temporal depth changes to detect mirrors in videos beyond appearance cues.

	SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation [Medical Image] Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. State: Early Accepted by MICCAI 2024 (Spotlight Presentation). Applies Mamba's long-range sequential modeling to enhance the vision encoder.

	SegMamba-V2: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation [Medical Image] Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. State: Accepted by IEEE Transactions on Medical Imaging (TMI). Extends SegMamba with an improved architecture for stronger 3D medical image segmentation.

	Cross-conditioned Diffusion Model for medical image-to-image translation [Medical Image] Zhaohu Xing, Sicheng Yang, Sixiang Chen, Tian Ye, Jing Qin, and Lei Zhu. State: Accepted by MICCAI 2024. Bridges unpaired medical image modalities via cross-conditioned diffusion generation.

	Diff-UNet: A Diffusion Embedded Network for Robust 3D Medical Image Segmentation [Medical Image] Zhaohu Xing, Huazhu Fu, Guang Yang, and Lei Zhu. State: Accepted by Medical Image Analysis (MedIA). Embeds diffusion probabilistic models into U-Net for robust 3D medical segmentation.

	Hybrid Masked Image Modeling for 3D Medical Image Segmentation [Medical Image] Zhaohu Xing, Liang Wan, Lequan Yu, and Lei Zhu. State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2024.

	NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation [Medical Image] Zhaohu Xing, Lequan Yu, Liang Wan, and Lei Zhu. State: Accepted by MICCAI 2022. Designs a nested modality-aware transformer for multi-modal brain tumor segmentation.

	Temporal Prompt Learning with Depth Memory for Video Mirror Detection [Visual Perception] Zhaohu Xing, Tian Ye, Sixiang Chen, and Lei Zhu. State: Accepted by IEEE TMM. Leverages depth memory and temporal prompts for accurate video mirror detection.

	Non-Invasive to Invasive: Enhancing FFA Synthesis from CFP with a Benchmark Dataset and a Novel Network [Medical Image] Hongqiu Wang, Zhaohu Xing, Weitong Wu, Yijun Yang, Qingqing Tang, Yanwu Xu, and Lei Zhu. State: Accepted by ACM MM Workshop (2024). Synthesizes invasive FFA images from non-invasive CFP with a new paired benchmark.

	Cascaded Inner-Outer Clip Retformer for Ultrasound Video Object Segmentation [Medical Image] Jialu Li, Lei Zhu, Zhaohu Xing, Baoliang Zhao, Ying Hu, Faqin Lv, and Qiong Wang. State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2024. Segments ultrasound video objects via cascaded inner-outer clip transformers.

	PromptHaze: Prompting Real-world Dehazing via Depth Anything Model [Image Restoration] Tian Ye, Sixiang Chen, Haoyu Chen, Wenhao Chai, Jingjing Ren, Zhaohu Xing, Wenxue Li, and Lei Zhu. State: Accepted by AAAI24. Prompts depth estimation models to provide structural priors for real-world dehazing.

	AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement [Image Restoration] Yunlong Lin, Tian Ye, Sixiang Chen, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, and Xinghao Ding. State: Accepted by AAAI24. Guides diffusion models with illumination priors for unsupervised low-light enhancement.

	Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint [Image Restoration] Sixiang Chen, Tian Ye, Kai Zhang, Zhaohu Xing, Yunlong Lin, and Lei Zhu. State: Accepted by ECCV 2024. Tailors restoration prompts per degradation type with depth-aware constraints.

	Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? [Medical Image] Zhaohu Xing, et al. State: Accepted by NeurIPS 2024. A large-scale benchmark to rigorously evaluate medical image segmentation algorithms.

	Timeline and Boundary Guided Diffusion Network for Video Shadow Detection [Visual Perception] Haipeng Zhou, Hongqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, and Lei Zhu. State: Accepted by ACM MM 2024 (Oral Presentation). Detects video shadows via timeline and boundary guided diffusion reasoning.

	Learning Diffusion Texture Priors for Image Restoration [Image Restoration] Tian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Qing Qin, and Lei Zhu. State: Accepted by CVPR 2024 (Spotlight). Learns diffusion-based texture priors to improve perceptual quality of image restoration.

	Anchored Supervised Contrastive Learning for Long-Tailed Medical Image Regression [Medical Image] Zhaoying Li, Zhaohu Xing, Lei Zhu, and Liang Wan. State: Accepted by PRCV 2024. Anchors contrastive learning to handle long-tailed distributions in medical regression.

	Standing on the Giants: Informative Messenger Prompts with Self-adapter for Image [Image Restoration] Sixiang Chen, Tian Ye, Haoyu Chen, Yijun Yang, Zhaohu Xing, Fugee Tsung, and Lei Zhu. State: Submitted to ECCV 2024. Transfers knowledge from large pretrained models via informative messenger prompts.

	Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning [Medical Image] YingLing Lu, Yijun Yang, Zhaohu Xing, and Lei Zhu. State: Early Accepted by MICCAI 2024. Segments polyps in videos via multi-task diffusion with adversarial temporal reasoning.

	Vivim: a Video Vision Mamba for Medical Video Object Segmentation [Medical Image] Yijun Yang, Zhaohu Xing, and Lei Zhu. State: Early Accepted by IEEE TCSVT.

	DiffMIC-v2: Medical Image Classification via Improved Diffusion Network [Medical Image] Yijun Yang, Huazhu Fu, Angelica I. Aviles-Rivero, Zhaohu Xing, and Lei Zhu. State: Accepted by IEEE TMI. Improves diffusion-based medical image classification with a refined network design.

	Uncertainty-aware Multi-dimensional Mutual Learning for Brain and Brain Tumor Segmentation [Medical Image] Junting Zhao, Zhaohu Xing, and Lei Zhu. State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2023. Jointly segments brain and tumor via uncertainty-aware multi-dimensional mutual learning.

	Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge [Medical Image] Zhaohu Xing, Lei Zhu, et al. State: Accepted by Medical Image Analysis. Benchmarks AI algorithms for pulmonary fibrosis imaging biomarker detection.

	SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma [Medical Image] Zhaohu Xing, Lei Zhu, et al. State: Accepted by Medical Image Analysis. Benchmarks organs-at-risk and tumor segmentation for nasopharyngeal carcinoma radiotherapy.

Template from Jon Barron
Last updated: 12/19, 2023