Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control
Published on arXiv, 2026
We introduce a human-centric video world model conditioned on both tracked head pose and joint-level hand poses. To this end, we evaluate existing diffusion model conditioning strategies and propose an effective mechanism for 3D head and hand control, enabling dexterous hand-object interactions. We train a bidirectional video diffusion teacher model using this strategy and distill it into a causal, interactive system that generates egocentric virtual environments. We evaluate this generated-reality system with human subjects and demonstrate improved task performance as well as a significantly higher perceived level of control over the performed actions compared with relevant baselines.


