[📜 Paper] [⭐️Project Page] [🤗 Model] [🤗 Dataset]
While significant research has focused on developing embodied reasoning with Vision-Language Models (VLMs) or on integrating advanced VLMs into Vision-Language-Action (VLA) models for end-to-end robot control, few studies directly address the critical gap between upstream VLM-based reasoning and downstream VLA policy learning. In this work, we take an initial step toward bridging embodied reasoning with VLA policy learning by introducing Vlaser, a Vision-Language-Action model with synergistic embodied reasoning capability: a foundational vision-language model designed to integrate high-level reasoning with low-level control for embodied agents. Trained on the high-quality Vlaser-6M dataset, Vlaser achieves state-of-the-art performance across a range of embodied reasoning benchmarks, including spatial reasoning, embodied grounding, embodied QA, and task planning. Furthermore, we systematically examine how different VLM initializations affect supervised VLA fine-tuning, offering novel insights into mitigating the domain shift between internet-scale pre-training data and embodiment-specific policy-learning data. Based on these insights, our approach achieves state-of-the-art results on the WidowX benchmark and competitive performance on the Google Robot benchmark.
- 2025-10-13: 🤖 We release the Vlaser VLM models (Vlaser-2B and Vlaser-8B) as well as the VLA model (Vlaser-2B-VLA) on 🤗Vlaser.
- 2025-10-13: 🤖 We release the training and inference code of the Vlaser VLM, based on InternVL3.
- 2025-11-07: 🤖 We release the training and inference code of the Vlaser VLA, based on open-pi-zero.
- 2026-01-27: 🤖 Vlaser was accepted to ICLR 2026, congrats!
- 2026-02-15: 🤖 We release the data pipeline for in-domain data, based on open-pi-zero.
- 2026-03-18: 🤖 We release the training dataset 🤗Vlaser-6M, which can help you train your own embodied brain! 🔥🔥🔥
- Release Vlaser-2B and Vlaser-8B checkpoints for VLM embodied reasoning.
- Release Vlaser-2B-VLA model for end-to-end robot control in SimplerEnv (WidowX and Google Robot) and RoboTwin 2.0.
- Release the training and evaluation code for Vlaser VLMs.
- Release the training and evaluation code for Vlaser VLAs.
- Release the Dataset Generation Pipeline.
- Release the Vlaser-6M Dataset.
Please refer to Vlaser_VLM for details.
For SimplerEnv, please refer to Vlaser_VLA/Simpler for details. For RoboTwin 2.0, please refer to Vlaser_VLA/RoboTwin for details.
Please refer to data-pipeline for details.
You can download our training dataset 🤗Vlaser-6M, which is organized into Robot_QA_data (general robot QA tasks), grounding_data (2D robot grounding tasks), planning_data (robotic planning tasks), and spatial_data (spatial intelligence tasks). Each subset consists of a *.jsonl file containing the multimodal annotations and a *.tar.gz archive containing the images/videos.
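A subset can be loaded by extracting its media archive and iterating over the `*.jsonl` annotation file, one JSON object per line. A minimal sketch using only the standard library; the record field shown (`image`) is an assumption for illustration — inspect the released `*.jsonl` files for the actual schema:

```python
import json
import tarfile
from pathlib import Path


def load_split(jsonl_path: str, media_tar: str, out_dir: str):
    """Extract a subset's media archive and yield its annotation records.

    NOTE: field names in the yielded records (e.g. "image") are assumed
    here for illustration; check the released *.jsonl for the real schema.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Unpack the images/videos shipped as a *.tar.gz archive.
    with tarfile.open(media_tar, "r:gz") as tar:
        tar.extractall(out)
    # Each line of the annotation file is one standalone JSON object.
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

The generator form keeps memory use flat even for the larger subsets, since annotations are parsed one line at a time rather than loading the whole file.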
This project is released under the MIT License.
If you find this work helpful in your research, please consider giving this repo a star ⭐ and citing our paper:
```bibtex
@article{yang2025vlaser,
  title={Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning},
  author={Yang, Ganlin and Zhang, Tianyi and Hao, Haoran and Wang, Weiyun and Liu, Yibin and Wang, Dehui and Chen, Guanzhou and Cai, Zijian and Chen, Junting and Su, Weijie and others},
  journal={arXiv preprint arXiv:2510.11027},
  year={2025}
}
```