Publications

Translational Biomedical AI

A Weakly Supervised Transformer for Rare Disease Diagnosis and Subphenotyping from EHRs with Pulmonary Case Studies

npj Digital Medicine • 2026

Kimberly F. Greco*, Zongxin Yang*, Mengyan Li, Han Tong, Sara Morini Sweet, Alon Geva, Kenneth D. Mandl, Benjamin A. Raby, Tianxi Cai

Translational Biomedical AI Co-first Author

Paper

Beyond Independent Genes: Learning Module-Inductive Representations for Gene Perturbation Prediction

arXiv 2026 • 2026

Jiafa Ruan, Ruijie Quan, Zongxin Yang✉, Liyang Xu, Yi Yang

Translational Biomedical AI Corresponding Author

Paper

Prompt-based multimodal representation learning for drug repurposing

Briefings in Bioinformatics 2025 • 2025

Jinliang Liu, Kaicheng U, Dhruv Rana, Sophia Meixuan Zhang, Jiahui Yu, Sen Yang, Bo Jin, Xiyue Wang, Zongxin Yang✉, Hongping Tang, and others

Translational Biomedical AI Corresponding Author

Paper

CLINES: Clinical LLM-based Information Extraction and Structuring Agent

Preprint • 2025

Zongxin Yang*, Hongyi Yuan*, Raheel Sayeed*, Amelia Li Min Tan, Enci Cai, Mohammed Moro, Xiudi Li, Huaiyuan Ying, Nicholas Brown, Griffin Weber, and others

Translational Biomedical AI Co-first Author

Paper

MedSAM2: Segment Anything in 3D Medical Images and Videos

Preprint • 2025

Jun Ma*, Zongxin Yang*, Sumin Kim, Bihui Chen, Mohammed Baharoon, Adibvafa Fallahpour, Reza Asakereh, Hongwei Lyu, Bo Wang

Translational Biomedical AI Co-first Author

Paper Code

X-Field: A Physically Grounded Representation for 3D X-ray Reconstruction

NeurIPS 2025 (Spotlight) • 2025

Feiran Wang, Jiachen Tao, Junyi Wu, Haoxuan Wang, Bin Duan, Kai Wang, Zongxin Yang, Yan Yan

Translational Biomedical AI Spotlight

Paper Code

Controllable Multimodal Generation

RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

arXiv 2026 • 2026

Dewei Zhou, You Li, Zongxin Yang, Yi Yang

Controllable Multimodal Generation

Paper Code Project

Are Image-to-Video Models Good Zero-Shot Image Editors?

CVPR 2026 • 2026

Zechuan Zhang, Zhenyuan Chen, Zongxin Yang, Yi Yang

Controllable Multimodal Generation

Paper

BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment

ICLR 2026 • 2026

Dewei Zhou, Mingwei Li, Zongxin Yang, Yu Lu, Yunqiu Xu, Zhizhong Wang, Zeyi Huang, Yi Yang

Controllable Multimodal Generation

Paper

Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

ICLR 2026 • 2026

Ruisi Zhao, Haoren Zheng, Zongxin Yang, Hehe Fan, Yi Yang

Controllable Multimodal Generation

Paper

Insert Anything: Image Insertion via In-Context Editing in DiT

AAAI 2026 (Oral) • 2026

Wensong Song, Hong Jiang, Zongxin Yang, Ruijie Quan, Yi Yang

Controllable Multimodal Generation Oral

Paper Code Project

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

NeurIPS 2025 • 2025

Zechuan Zhang, Ji Xie, Yu Lu, Zongxin Yang, Yi Yang

Controllable Multimodal Generation

Paper Code Project

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

ICCV 2025 • 2025

Dewei Zhou, Mingwei Li, Zongxin Yang, Yi Yang

Controllable Multimodal Generation

Paper Code

3DIS: Depth-driven decoupled instance synthesis for text-to-image generation

ICLR 2025 (Spotlight) • 2025

Dewei Zhou*, Ji Xie*, Zongxin Yang*, Yi Yang

Controllable Multimodal Generation Co-first Author Spotlight

Paper Code

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis

TPAMI 2025 • 2025

Dewei Zhou, You Li, Fan Ma, Zongxin Yang✉, Yi Yang

Controllable Multimodal Generation Corresponding Author

Paper Code

Controllable 3D Face Generation with Conditional Style Code Diffusion

AAAI 2024 • 2024

Xiaolong Shen, Jianxin Ma, Chang Zhou, Zongxin Yang✉

Controllable Multimodal Generation Corresponding Author

Paper Code

DRIP: Unleashing Diffusion Priors for Joint Foreground and Alpha Prediction in Image Matting

NeurIPS 2024 • 2024

Xiaodi Li, Zongxin Yang, Ruijie Quan, Yi Yang

Controllable Multimodal Generation

Paper Code

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

ECCV 2024 • 2024

Zhenglin Zhou, Fan Ma, Hehe Fan, Zongxin Yang, Yi Yang

Controllable Multimodal Generation

Paper Code

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

CVPR 2024 (Highlight) • 2024

Zechuan Zhang, Zongxin Yang✉, Yi Yang

Controllable Multimodal Generation Corresponding Author Highlight

Paper Code Project

SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

ECCV 2024 Workshops • 2024

Yuanyou Xu, Zongxin Yang, Yi Yang

Controllable Multimodal Generation

Paper Code Project

Human101: Training 100+ FPS Human Gaussians in 100s from 1 View

Preprint • 2024

Mingwei Li, Jiachen Tao, Zongxin Yang, Yi Yang

Controllable Multimodal Generation

Paper Code

GD²-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

Preprint • 2024

Xiao Pan, Zongxin Yang✉, Shuai Bai, Yi Yang

Controllable Multimodal Generation Corresponding Author

Paper

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

ACM MM 2023 • 2023

Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia

Controllable Multimodal Generation

Paper Code Project

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

NeurIPS 2023 • 2023

Zechuan Zhang, Li Sun, Zongxin Yang, Lin Chen, Yi Yang

Controllable Multimodal Generation

Paper Code Project

Efficient Emotional Adaptation for Audio-driven Talking-Head Generation

ICCV 2023 • 2023

Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang

Controllable Multimodal Generation

Paper Code

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering

ICCV 2023 • 2023

Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang

Controllable Multimodal Generation

Paper Code Project

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

ICCV 2023 • 2023

Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Controllable Multimodal Generation

Paper Code

Pyramid Diffusion Models For Low-light Image Enhancement

IJCAI 2023 • 2023

Dewei Zhou, Zongxin Yang, Yi Yang

Controllable Multimodal Generation

Paper Code

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

CVPR 2023 • 2023

Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Controllable Multimodal Generation

Paper Code

Multimodal Perception and Understanding

Efficient training of large vision models via advanced automated progressive learning

TPAMI 2026 • 2026

Changlin Li, Jiawei Zhang, Sihao Lin, Zongxin Yang, Junwei Liang, Xiaodan Liang, Xiaojun Chang

Multimodal Perception and Understanding

Paper Code

SELongVLM: Empowering Long Video Language Models with Self-Corrective Clip Selection

TPAMI 2026 • 2026

Kecheng Zhang, Zongxin Yang, Mingfei Han, Yunzhi Zhuge, Haihong Hao, Changlin Li, Zhihui Li, Xiaojun Chang

Multimodal Perception and Understanding

Paper

Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions

ICLR 2026 • 2026

Kecheng Zhang, Zongxin Yang, Mingfei Han, Haihong Hao, Yunzhi Zhuge, Changlin Li, Junhan Zhao, Zhihui Li, Xiaojun Chang

Multimodal Perception and Understanding

Paper

The devil is in temporal token: High quality video reasoning segmentation

CVPR 2025 • 2025

Sitong Gong, Yunzhi Zhuge, Lu Zhang, Zongxin Yang, Pingping Zhang, Huchuan Lu

Multimodal Perception and Understanding

Paper Code

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

ICLR 2025 • 2025

Haomiao Xiong*, Zongxin Yang*, Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Jiawen Zhu, Huchuan Lu

Multimodal Perception and Understanding Co-first Author

Paper Code

Few-shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation

AAAI 2025 • 2025

Jingqian Xiu, Mengze Li, Zongxin Yang, Wei Ji, Yifang Yin, Roger Zimmermann

Multimodal Perception and Understanding

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)

ICML 2024 • 2024

Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang

Multimodal Perception and Understanding First Author

Paper Code Project

Scalable Video Object Segmentation with Identification Mechanism

TPAMI 2024 • 2024

Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

Multimodal Perception and Understanding First Author

Paper Code

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

ACM MM 2023 • 2023

Kexin Li, Zongxin Yang✉, Lei Chen, Yi Yang, Jun Xiao

Multimodal Perception and Understanding Corresponding Author Best Paper

Paper Code

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

ICCV 2023 • 2023

Yuanyou Xu, Zongxin Yang, Yi Yang

Multimodal Perception and Understanding

Paper Code

Segment and Track Anything

Tech. Report • 2023

Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang

Multimodal Perception and Understanding Project Leader

Paper Code

Video Object Segmentation in Panoptic Wild Scenes

IJCAI 2023 • 2023

Yuanyou Xu, Zongxin Yang, Yi Yang

Multimodal Perception and Understanding

Paper Code

Co-Learning Meets Stitch-Up for Noisy Multi-Label Visual Recognition

TIP 2023 • 2023

Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang

Multimodal Perception and Understanding

Paper Code

FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation

CVPR 2023 • 2023

Jiaxu Miao, Zongxin Yang, Leilei Fan, Yi Yang

Multimodal Perception and Understanding

Paper Code

ProD: Prompting-to-disentangle Domain Knowledge for Cross-domain Few-shot Image Classification

CVPR 2023 • 2023

Tianyi Ma, Yifan Sun, Zongxin Yang, Yi Yang

Multimodal Perception and Understanding

Paper

Decompose to Generalize: Species-Generalized Animal Pose Estimation

ICLR 2023 • 2023

Guangrui Li, Yifan Sun, Zongxin Yang, Yi Yang

Multimodal Perception and Understanding

Paper

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

NeurIPS 2022 (Spotlight) • 2022

Zongxin Yang, Yi Yang

Multimodal Perception and Understanding First Author Spotlight

Paper Code

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

ECCV 2022 • 2022

Feng Zhu, Zongxin Yang, Yunchao Wei, Xin Yu, Yi Yang

Multimodal Perception and Understanding

Paper Code

In-N-Out Generative Learning for Dense Unsupervised Video Segmentation

ACM MM 2022 • 2022

Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang

Multimodal Perception and Understanding

Paper Code

H²FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection

CVPR 2022 • 2022

Yunqiu Xu, Yifan Sun, Zongxin Yang, Jiaxu Miao, Yi Yang

Multimodal Perception and Understanding

Paper Code

Associating Objects with Transformers for Video Object Segmentation

NeurIPS 2021 (Score 8/8/8/7) • 2021

Zongxin Yang, Yunchao Wei, Yi Yang

Multimodal Perception and Understanding First Author

Paper Code Review

Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration

TPAMI 2021 • 2021

Zongxin Yang, Yunchao Wei, Yi Yang

Multimodal Perception and Understanding First Author

Paper Code

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

CVPR 2021 • 2021

Zongxin Yang, Xin Yu, Yi Yang

Multimodal Perception and Understanding First Author

Paper

Collaborative Video Object Segmentation by Foreground-Background Integration

ECCV 2020 (Spotlight) • 2020

Zongxin Yang, Yunchao Wei, Yi Yang

Multimodal Perception and Understanding First Author Spotlight

Paper Code

Gated Channel Transformation for Visual Recognition

CVPR 2020 • 2020

Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang

Multimodal Perception and Understanding First Author

Paper Code

Very Long Natural Scenery Image Prediction by Outpainting

ICCV 2019 • 2019

Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan

Multimodal Perception and Understanding First Author

Paper Code

Dr. Zongxin Yang
