Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.19834
Updated Mar 9, 2026 - Python
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
An official repository of "VAGUE: Visual Contexts Clarify Ambiguous Expressions"