Publications
Preprints
"*" means authors contributed equally and "#" means corresponding author.
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
Yunheng Li, Hengrui Zhang, Meng-Hao Guo, Wenzhao Gao, Shaoyong Jia, Shaohui Jiao, Qibin Hou#, Ming-Ming Cheng
arXiv, 2026
GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation
Yuhao Wan, Lijuan Liu, Jingzhi Zhou, Zihan Zhou, Xuying Zhang, Dongbo Zhang, Shaohui Jiao, Qibin Hou#, Ming-Ming Cheng
arXiv, 2025
Selected Journal Publications
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-Time Object Detection
Yuming Chen, Xinbin Yuan, Ruiqi Wu, Jiabao Wang, Qibin Hou#, Ming-Ming Cheng
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 47(6), 4240-4252, 2025
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng#, Jiashi Feng
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
CamoFormer: Masked Separable Attention for Camouflaged Object Detection
Bowen Yin*, Xuying Zhang*, Deng-Ping Fan, Shaohui Jiao, Ming-Ming Cheng, Luc Van Gool, Qibin Hou#
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition
Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Deeply Supervised Salient Object Detection with Short Connections
Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, Ali Borji, Zhuowen Tu, Philip Torr
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Selected Conference Publications
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
Modi Jin, Yiming Zhang, Boyuan Sun, Dingwen Zhang, Ming-Ming Cheng, Qibin Hou#
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li, Jing Cheng, Shaoyong Jia, Hangyi Kuang, Shaohui Jiao, Qibin Hou#, Ming-Ming Cheng
Neural Information Processing Systems (NeurIPS), 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan, Jian Zhang, Kaixin Li, Zhuoxuan Cai, Lujian Yao, Jie Chen, Enguang Wang, Qibin Hou#, Jinwei Chen, Peng-Tao Jiang, Bo Li#
Neural Information Processing Systems (NeurIPS), 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang*, Yutong Liu*, Yangguang Li, Renrui Zhang, Yufei Liu, Kai Wang, Wanli Ouyang, Zhiwei Xiong, Peng Gao, Qibin Hou#, Ming-Ming Cheng
IEEE International Conference on Computer Vision (ICCV), 2025
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
Yunheng Li, Yuxuan Li, Quansheng Zeng, Wenhai Wang, Qibin Hou#, Ming-Ming Cheng
IEEE International Conference on Computer Vision (ICCV), 2025
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou#
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Yupeng Zhou, Daquan Zhou#, Ming-Ming Cheng, Jiashi Feng, Qibin Hou#
Neural Information Processing Systems (NeurIPS), 2024
OPUS: Occupancy Prediction Using a Sparse Set
Jiabao Wang*, Zhaojiang Liu*, Qiang Meng, Liujiang Yan, Ke Wang, Jie Yang, Wei Liu, Qibin Hou#, Ming-Ming Cheng
Neural Information Processing Systems (NeurIPS), 2024
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou#
International Conference on Learning Representations (ICLR), 2024
SRFormer: Permuted Self-Attention for Single Image Super-Resolution
Yupeng Zhou, Zhen Li, Chun-Le Guo, Song Bai, Ming-Ming Cheng, Qibin Hou#
IEEE International Conference on Computer Vision (ICCV), 2023
Coordinate Attention for Efficient Mobile Network Design
Qibin Hou, Daquan Zhou, Jiashi Feng
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021