Awesome Unified Multimodal Models
Updated Mar 24, 2026
A simple training engine for unified multimodal models. Lean, flexible, and built for hacking at scale.
A curated collection of research papers, models, and resources tracing the evolution from specialized models to unified world models.
Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.19834
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
[CVPR 2026] GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
Planning with unified multimodal models
Leaderboard for "The Trinity of Consistency as a Defining Principle for General World Models"
A unified multimodal generative AI system designed to learn and adapt across multiple modalities (text, audio, vision, robotics) with minimal data and long-term autonomy through reinforcement learning.