A Very Big Video Reasoning Suite

We bet that video reasoning is the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.

circle_central_dot
Knowledge out-of-domain test set
A row of dots is shown. Circle the dot that is in the middle by count (the one with an equal number of dots on each side).
shape_outline_then_move
Abstraction in-domain test set
The scene shows an analogy A→B→C :: D→?→? with two rows of shapes and arrows. On the top row, a filled trapezoid first becomes an outline-only trapezoid (step 1), then moves up by a small amount (step 2). On the bottom row, the heart starts filled. Apply the same two-step transformation: first convert it to outline-only style, then move it up by a small amount, keeping its shape and size the same while only the style and position change.
find_keys_and_open_doors
Spatiality training set
In the maze, the agent is the green circle. First move the agent to collect the key (diamond shape), then move the agent to the door (hollow rectangle). Use the shortest path for each movement. Show the complete movement step by step.
multiple_occlusions_horizontal
Transformation training set
The scene shows 3 objects arranged horizontally on the right side of the frame, with a dark rectangular mask initially positioned on the left side. Move the mask horizontally to the right in a continuous motion until it leaves the frame. As it moves, the mask passes in front of the objects, temporarily blocking them from view.
identify_pentagons
Perception out-of-domain test set
Multiple polygons are shown; exactly one of them is a pentagon (5 sides). Identify that pentagon and mark it with a red circle that expands from the inside out to encircle the shape. Do not change anything else.
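
The find_keys_and_open_doors prompt asks for the shortest path to the key and then the shortest path to the door. As a rough illustration of what a reference solution for such a task might look like, here is a minimal sketch using breadth-first search on a grid. The text encoding of the maze ("A" agent, "K" key, "D" door, "#" wall), the 4-connected movement, and all function names are assumptions made for this example; the page does not describe how the suite actually generates its ground truth.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Breadth-first search on a 4-connected grid; '#' cells are walls."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal is unreachable


def find_cell(grid, symbol):
    """Return the (row, col) of the first cell containing `symbol`."""
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == symbol:
                return (r, c)
    raise ValueError(f"symbol {symbol!r} not found in grid")


def key_then_door_path(grid):
    """Shortest path agent -> key, followed by shortest path key -> door.

    Assumption: the door cell is walkable during the first leg.
    """
    agent, key, door = (find_cell(grid, s) for s in "AKD")
    to_key = shortest_path(grid, agent, key)
    to_door = shortest_path(grid, key, door)
    if to_key is None or to_door is None:
        return None
    return to_key + to_door[1:]  # do not repeat the key cell


# Hypothetical 3x4 maze, used only for illustration.
MAZE = [
    "A.#D",
    "..#.",
    ".K..",
]

path = key_then_door_path(MAZE)
```

Each leg is solved independently, which matches the prompt's wording ("shortest path for each movement") rather than optimising the combined route.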

Inference Results

Task Domains
Knowledge: Domino Chain Gap Analysis (in-domain test set)
Abstraction: Shape Outline Then Move (in-domain test set)
Spatiality: LEGO Construction (in-domain test set)
Transformation: Separate Objects (Spinning) (in-domain test set)
Perception: Identify Hollow Points (in-domain test set)
Model Outputs
VBVR-Wan2.2
CogVideoX 1.5
Kling 2.6
LTX-2
Runway Gen-4
Sora 2
Veo 3
Wan 2.2 I2V
Hunyuan I2V

Leaderboard

Legend: Reference, Strong Baseline, Proprietary, Open-source

Rank  Entry         Model                Score
#1    Human         Human                97.4%
#2    VBVR          VBVR-Wan2.2          68.5%
#3    Sora 2        Sora 2               54.6%
#4    Veo 3.1       Veo 3.1              48.0%
#5    Runway        Runway Gen-4 Turbo   40.3%
#6    Wan2.2        Wan2.2-I2V-A14B      37.1%
#7    Kling         Kling 2.6            36.9%
#8    LTX-2         LTX-2                31.3%
#9    CogVideoX     CogVideoX1.5-5B-I2V  27.3%
#9    HunyuanVideo  HunyuanVideo-I2V     27.3%
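
The two entries at 27.3% share rank #9, which is what standard competition ranking produces when scores tie. The sketch below reproduces the rank column from the scores listed in the leaderboard. Whether the site computes ranks this way is an assumption, and the function name is hypothetical.

```python
def competition_ranks(scores):
    """Map each name to its rank; tied scores share the best position (1, 2, 2, 4, ...)."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score, previous_rank = None, 0
    for position, (name, score) in enumerate(ordered, start=1):
        # A tie keeps the previous rank; otherwise the rank is the position.
        rank = previous_rank if score == previous_score else position
        ranks[name] = rank
        previous_score, previous_rank = score, rank
    return ranks


# Scores (percent) copied from the leaderboard.
SCORES = {
    "Human": 97.4,
    "VBVR-Wan2.2": 68.5,
    "Sora 2": 54.6,
    "Veo 3.1": 48.0,
    "Runway Gen-4 Turbo": 40.3,
    "Wan2.2-I2V-A14B": 37.1,
    "Kling 2.6": 36.9,
    "LTX-2": 31.3,
    "CogVideoX1.5-5B-I2V": 27.3,
    "HunyuanVideo-I2V": 27.3,
}

ranks = competition_ranks(SCORES)
for name, rank in ranks.items():
    print(f"#{rank:<3} {name:<20} {SCORES[name]:.1f}%")
```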