Xiang An
Xiang An (Chinese: 安翔) is a Research Scientist and Team Lead of the Multimodal Large Model Group at GlintLab, specializing in computer vision and multimodal large models. His publications are indexed on Google Scholar, and his open-source projects on GitHub have accumulated 34,177+ stars in total. His current research focuses on building the next-generation Vision Transformer (ViT) to meet the demands of modern MLLMs. He is also the #2 contributor to the InsightFace ecosystem (~27k stars).
Publications §
The following is a selection of notable publications. For a complete list, see All Publications.
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Preprint 2026
Paper
Code
Homepage
Bilibili
YouTube
Introduces codec-aligned sparsity as a foundational principle for multimodal intelligence: the encoder attends to only the 3.1%–25% of regions rich in signal entropy, achieving a 4.1% average improvement over Qwen3-ViT on video understanding tasks.
Feilong Tang, Xiang An, Yunyao Yan, Yin Xie, Bin Qin, Kaicheng Yang, Yifei Shen, Yuanhan Zhang, Chunyuan Li, Shikun Feng, Changrui Chen, Huajie Tan, Ming Hu, Manyuan Zhang, Bo Li, Ziyong Feng, Ziwei Liu, Zongyuan Ge, Jiankang Deng
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
Preprint 2025
Paper
Code
Sohu Tech (搜狐科技)
Fully open-sources code, data, checkpoints, and training logs; provides a stronger open-source ViT; and demonstrates that simply scaling dense captions improves overall multimodal task performance.
Xiang An, Yin Xie, Kaicheng Yang, Wenkang Zhang, Xiuwei Zhao, Zheng Cheng, Changrui Chen, Zizhen Yan, Ziyong Feng, Ziwei Liu, Bo Li, Jiankang Deng, et al.
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
AAAI 2026 (Oral)
Code
Tiancheng Gu, Kaicheng Yang, Kaichen Zhang, Xiang An, Ziyong Feng, Yueyi Zhang, Weidong Cai, Jiankang Deng, Lidong Bing
Region-based Cluster Discrimination for Visual Representation Learning
ICCV 2025 (Highlight)
Code
A novel approach to self-supervised visual representation learning that introduces region-based cluster discrimination.
Yin Xie, Kaicheng Yang, Xiang An (Project Leader), Kun Wu, Yongle Zhao, Weimo Deng, Zimin Ran, Yumeng Wang, Ziyong Feng, Jiankang Deng
Multi-label Cluster Discrimination for Visual Representation Learning
ECCV 2024
Code
Transformers
MLCD-Seg
Multi-label cluster discrimination framework for self-supervised visual representation learning
Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng
Unicom: Universal and Compact Representation Learning for Image Retrieval
ICLR 2023
Code
A universal and compact representation learning framework for large-scale image retrieval, serving as a foundation for scalable retrieval systems.
Xiang An, Jiankang Deng, Kaicheng Yang, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu
Killing Two Birds with One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC
CVPR 2022
Code
MXNet
PyTorch
Zhihu (知乎)
Enables training with 10 million identities on a single machine through the Partial FC approach.
Xiang An, Jiankang Deng, Jia Guo, Ziyong Feng, Xuhan Zhu, Jing Yang, Tongliang Liu
Partial FC: Training 10 Million Identities on a Single Machine
ICCVW 2021
Code
MXNet
PyTorch
Zhihu (知乎)
Xiang An, Xuhan Zhu, Yuan Gao, Yang Xiao, Yongle Zhao, Ziyong Feng, Lan Wu, Bin Qin, Ming Zhang, Debing Zhang, Ying Fu
Awards & Competitions §
- ICCV 2025 Outstanding Reviewer
- CVPR 2024 Outstanding Reviewer
- Ranked 1st in NIST FRVT Competition, Visa Track 1:1
- Nominated for the 2024 China Annual Power Figures award (中国年度力量人物)
- Ranked 1st in his major in the graduate entrance examination
- First Place in Vehicle Re-Identification, PRCV 2019
Open Source §
InsightFace
Open Source Library
#2 contributor to this open-source 2D & 3D deep face analysis library. Author of Glint360K (the largest open-source face recognition training dataset) and Partial FC (which enables training 10 million identities on a single machine). Also organized the ICCV 2021 Masked Face Recognition challenge workshop.
LLaVA-OneVision-1.5
Multimodal LLM Framework
Team Leader of this fully open framework designed to democratize multimodal training. Released mid-training and instruct data for community use, and developed offline sample packing for efficient training. Implemented RiceViT with native-resolution support.
OneVision-Encoder
Vision Encoder
Lead author of this next-generation vision encoder that introduces codec-aligned sparsity as a foundational principle for multimodal intelligence. Achieves state-of-the-art performance on 16 image, video, and document understanding benchmarks while using substantially fewer visual tokens. Demonstrates 4.1% average improvement over Qwen3-ViT on video understanding tasks.
UNICOM
Image Retrieval Framework
Lead author and maintainer of this universal and compact representation learning framework for image retrieval. Designed the cluster discrimination approach for representation learning and developed its multi-label and region-based extensions (published at ECCV 2024 and ICCV 2025, Highlight).
LLaVA-NeXT
Large Multimodal Model
Vision module contributor to this next-generation large multimodal model. Enhanced the OCR capability of the vision module for better text recognition in images and optimized the visual encoder for processing text-rich and document images.
Urban Seg
Educational Project
Author and maintainer of this educational project for semantic segmentation on remote sensing and satellite imagery. Designed a simple single-file training approach for accessibility and integrated popular pretrained models. Created comprehensive tutorials and documentation for beginners.
External Links §
This page is styled after Wikipedia.