Zhaohu Xing 邢兆虎
VIP Lab,

I'm Zhaohu Xing (邢兆虎), a third-year Ph.D. candidate at HKUST-GZ, supervised by Prof. Lei Zhu and Prof. Fugee Tsung. Before that, I received my Master's degree from Tianjin University.
My research focuses on video generation, visual perception, and medical vision foundation models. Recently, I have been particularly interested in RL-based post-training (e.g., GRPO) for multimodal models and in agentic video synthesis.
I have hands-on experience in (i) video generation and multimodal models: agentic multi-shot video generation and human-centric video captioning via RL training; (ii) visual perception: large-scale mirror/reflection detection with iterative data engines; and (iii) medical vision foundation models: long-range sequential modeling with Mamba for medical image/video segmentation (SegMamba, Vivim).
My work has appeared at CVPR (2025; 2024 Spotlight), NeurIPS (2024), ECCV (2024), MICCAI (2024 Spotlight), and ACM MM (2024 Oral; 2025), in IEEE TMM and IEEE TMI, and in other venues.

AgentShot: Towards Agentic Multi-Shot Video Generation and Benchmarking [Video Generation]
State: Submitted to ECCV 2026.
Proposes an agentic framework for generating multi-shot videos with consistent characters and scenes.

HVCap: Human-centric Video Captioning via Factorized GRPO and a New Benchmark [Multimodal Models]
State: Submitted to ICML 2026.
Introduces factorized GRPO for human-centric video captioning, together with a new benchmark.

Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine [Visual Perception]
State: Accepted by CVPR 2025.
Scales mirror detection to large unlabeled data via an iterative self-training data engine.

Farther Than Mirror: Explore Pattern-Compensated Depth of Mirror with Temporal Changes for Video Mirror Detection [Visual Perception]
State: Accepted by ACM MM 2025.
Exploits temporal depth changes to detect mirrors in videos beyond appearance cues.

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation [Medical Image]
State: Early accepted by MICCAI 2024 (Spotlight Presentation).
Applies Mamba's long-range sequential modeling to strengthen the vision encoder for 3D medical image segmentation.

SegMamba-V2: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation [Medical Image]
State: Accepted by IEEE Transactions on Medical Imaging (TMI).
Extends SegMamba with an improved architecture for stronger 3D medical image segmentation.

Cross-conditioned Diffusion Model for Medical Image-to-Image Translation [Medical Image]
State: Accepted by MICCAI 2024.
Bridges unpaired medical image modalities via cross-conditioned diffusion generation.

Diff-UNet: A Diffusion Embedded Network for Robust 3D Medical Image Segmentation [Medical Image]
State: Accepted by Medical Image Analysis (MedIA).
Embeds diffusion probabilistic models into a U-Net for robust 3D medical image segmentation.

Hybrid Masked Image Modeling for 3D Medical Image Segmentation [Medical Image]
State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2024.

NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation [Medical Image]
State: Accepted by MICCAI 2022.
Designs a nested modality-aware transformer for multi-modal brain tumor segmentation.

Temporal Prompt Learning with Depth Memory for Video Mirror Detection [Visual Perception]
State: Accepted by IEEE TMM.
Leverages depth memory and temporal prompts for accurate video mirror detection.

Non-Invasive to Invasive: Enhancing FFA Synthesis from CFP with a Benchmark Dataset and a Novel Network [Medical Image]
State: Accepted by ACM MM Workshop (2024).
Synthesizes invasive FFA images from non-invasive CFP images, with a new paired benchmark.

Cascaded Inner-Outer Clip Retformer for Ultrasound Video Object Segmentation [Medical Image]
State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2024.
Segments ultrasound video objects via cascaded inner-outer clip transformers.

PromptHaze: Prompting Real-world Dehazing via Depth Anything Model [Image Restoration]
State: Accepted by AAAI 2024.
Prompts depth estimation models to provide structural priors for real-world dehazing.

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement [Image Restoration]
State: Accepted by AAAI 2024.
Guides diffusion models with illumination priors for unsupervised low-light enhancement.

Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint [Image Restoration]
State: Accepted by ECCV 2024.
Tailors restoration prompts to each degradation type, with depth-aware constraints.

Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? [Medical Image]
State: Accepted by NeurIPS 2024.
A large-scale benchmark for rigorously evaluating medical image segmentation algorithms.

Timeline and Boundary Guided Diffusion Network for Video Shadow Detection [Visual Perception]
State: Accepted by ACM MM 2024 (Oral Presentation).
Detects video shadows via timeline- and boundary-guided diffusion reasoning.

Learning Diffusion Texture Priors for Image Restoration [Image Restoration]
State: Accepted by CVPR 2024 (Spotlight).
Learns diffusion-based texture priors to improve the perceptual quality of image restoration.

Anchored Supervised Contrastive Learning for Long-Tailed Medical Image Regression [Medical Image]
State: Accepted by PRCV 2024.
Anchors contrastive learning to handle long-tailed distributions in medical image regression.

Standing on the Giants: Informative Messenger Prompts with Self-adapter for Image [Image Restoration]
State: Submitted to ECCV 2024.
Transfers knowledge from large pretrained models via informative messenger prompts.

Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning [Medical Image]
State: Early accepted by MICCAI 2024.
Segments polyps in videos via multi-task diffusion with adversarial temporal reasoning.

Vivim: A Video Vision Mamba for Medical Video Object Segmentation [Medical Image]
State: Early accepted by IEEE TCSVT.

DiffMIC-v2: Medical Image Classification via Improved Diffusion Network [Medical Image]
State: Accepted by IEEE TMI.
Improves diffusion-based medical image classification with a refined network design.

Uncertainty-aware Multi-dimensional Mutual Learning for Brain and Brain Tumor Segmentation [Medical Image]
State: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI) 2023.
Jointly segments brain and tumor via uncertainty-aware multi-dimensional mutual learning.

Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge [Medical Image]
State: Accepted by Medical Image Analysis.
Benchmarks AI algorithms for detecting imaging biomarkers of pulmonary fibrosis.

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma [Medical Image]
State: Accepted by Medical Image Analysis.
Benchmarks organs-at-risk and tumor segmentation for nasopharyngeal carcinoma radiotherapy.