Publications
Translational Biomedical AI
A Weakly Supervised Transformer for Rare Disease Diagnosis and Subphenotyping from EHRs with Pulmonary Case Studies
Translational Biomedical AI Co-first Author
Beyond Independent Genes: Learning Module-Inductive Representations for Gene Perturbation Prediction
Translational Biomedical AI Corresponding Author
Prompt-based multimodal representation learning for drug repurposing
Translational Biomedical AI Corresponding Author
CLINES: Clinical LLM-based Information Extraction and Structuring Agent
Translational Biomedical AI Co-first Author
MedSAM2: Segment Anything in 3D Medical Images and Videos
Translational Biomedical AI Co-first Author
Controllable Multimodal Generation
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
Controllable Multimodal Generation
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
Controllable Multimodal Generation
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
Controllable Multimodal Generation
Insert Anything: Image Insertion via In-Context Editing in DiT
Controllable Multimodal Generation Oral
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
Controllable Multimodal Generation
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Controllable Multimodal Generation
3DIS: Depth-driven decoupled instance synthesis for text-to-image generation
Controllable Multimodal Generation Co-first Author Spotlight
MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis
Controllable Multimodal Generation Corresponding Author
Controllable 3D Face Generation with Conditional Style Code Diffusion
Controllable Multimodal Generation Corresponding Author
DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting
Controllable Multimodal Generation
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Controllable Multimodal Generation
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
Controllable Multimodal Generation Corresponding Author Highlight
SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance
Controllable Multimodal Generation
Human101: Training 100+ FPS Human Gaussians in 100s from 1 View
Controllable Multimodal Generation
GD2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields
Controllable Multimodal Generation Corresponding Author
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion
Controllable Multimodal Generation
Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction
Controllable Multimodal Generation
Efficient Emotional Adaptation for Audio-driven Talking-Head Generation
Controllable Multimodal Generation
TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering
Controllable Multimodal Generation
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery
Controllable Multimodal Generation
Pyramid Diffusion Models For Low-light Image Enhancement
Controllable Multimodal Generation
Multimodal Perception and Understanding
Efficient training of large vision models via advanced automated progressive learning
Multimodal Perception and Understanding
SELongVLM: Empowering Long Video Language Models with Self-Corrective Clip Selection
Multimodal Perception and Understanding
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
Multimodal Perception and Understanding
The devil is in temporal token: High quality video reasoning segmentation
Multimodal Perception and Understanding
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
Multimodal Perception and Understanding Co-first Author
Few-shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation
Multimodal Perception and Understanding
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Multimodal Perception and Understanding First Author
Scalable Video Object Segmentation with Identification Mechanism
Multimodal Perception and Understanding First Author
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
Multimodal Perception and Understanding Corresponding Author Best Paper
Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation
Multimodal Perception and Understanding
Video Object Segmentation in Panoptic Wild Scenes
Multimodal Perception and Understanding
Co-Learning Meets Stitch-Up for Noisy Multi-Label Visual Recognition
Multimodal Perception and Understanding
FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation
Multimodal Perception and Understanding
ProD: Prompting-to-disentangle Domain Knowledge for Cross-domain Few-shot Image Classification
Multimodal Perception and Understanding
Decompose to Generalize: Species-Generalized Animal Pose Estimation
Multimodal Perception and Understanding
Decoupling Features in Hierarchical Propagation for Video Object Segmentation
Multimodal Perception and Understanding First Author Spotlight
Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation
Multimodal Perception and Understanding
In-N-Out Generative Learning for Dense Unsupervised Video Segmentation
Multimodal Perception and Understanding
H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection
Multimodal Perception and Understanding
Associating Objects with Transformers for Video Object Segmentation
Multimodal Perception and Understanding First Author
Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration
Multimodal Perception and Understanding First Author
DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency
Multimodal Perception and Understanding First Author
Collaborative Video Object Segmentation by Foreground-Background Integration
Multimodal Perception and Understanding First Author Spotlight
Gated Channel Transformation for Visual Recognition
Multimodal Perception and Understanding First Author
