Simulation Distillation
Pretraining World Models in Simulation for Rapid Real-World Adaptation
3D Planning Demo
SimDist plans in a latent world model pretrained in simulation. Below we reconstruct and visualize the latent plans — play with them yourself!
The Simulation Distillation Pipeline
SimDist extracts structural priors from the simulator by training a latent world model on a large-scale, mixed-quality dataset. We then plan over the world model in the real world and finetune its dynamics predictions on real-world data. As the dynamics predictions improve, the planner's performance improves rapidly.
Reliable improvement with simple, supervised system identification!
Train state-based expert policy.
Perturb expert actions to generate large-scale diverse dataset.
Distill the simulation data into a world model that operates on raw perception, and deploy it with online planning.
Finetune dynamics predictions with real-world data to improve planning performance.
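The data-generation step above can be sketched as follows. This is a minimal illustration, not the SimDist implementation: the 1-D toy simulator, the `expert_policy`, the Gaussian noise model, and the noise scales are all assumptions made for the example. The only idea taken from the text is that expert actions are perturbed to produce a diverse, mixed-quality dataset.

```python
import random

def expert_policy(state):
    """Hypothetical state-based expert: drive a 1-D state toward zero."""
    return -0.5 * state

def step(state, action):
    """Toy simulator dynamics (an assumption for illustration only)."""
    return state + action

def collect_rollout(noise_scale, horizon, rng):
    """Roll out the expert with Gaussian noise added to every action."""
    state = rng.uniform(-1.0, 1.0)
    transitions = []
    for _ in range(horizon):
        action = expert_policy(state) + rng.gauss(0.0, noise_scale)
        next_state = step(state, action)
        reward = -abs(next_state)  # toy reward: stay close to zero
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions

def build_dataset(noise_scales, rollouts_per_scale, horizon, seed=0):
    """Mix rollouts from several noise levels into one dataset."""
    rng = random.Random(seed)
    dataset = []
    for scale in noise_scales:
        for _ in range(rollouts_per_scale):
            dataset.extend(collect_rollout(scale, horizon, rng))
    return dataset

# Noise scale 0.0 gives pure expert data; larger scales give lower-quality data.
dataset = build_dataset(noise_scales=[0.0, 0.1, 0.5], rollouts_per_scale=4, horizon=10)
```

Mixing several noise levels means the dataset covers both near-optimal behavior and the off-distribution states a planner may visit when its model is imperfect.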
Rapid Real-World Improvement
Simulation Distillation (SimDist) rapidly overcomes the sim-to-real dynamics gap through adaptation in the real world, resulting in substantial gains in task execution on both precise manipulation and quadrupedal locomotion tasks.
World Models
Structure
Our key insight: world models automatically decompose task structure in a form that we can exploit to target adaptation where it’s needed. We argue that the encoder, reward model, and value function capture the global structure of the problem in a form that is largely invariant from sim to real. We therefore freeze these components during the real-world finetuning phase and finetune only the dynamics model. This sidesteps the need for end-to-end learning from sparse real-world data and avoids long-horizon credit assignment, which is a central challenge for existing RL approaches.
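A minimal sketch of what freezing the encoder, reward, and value components while updating only the dynamics model might look like. The parameter layout, the component names, and the plain gradient step are hypothetical choices for illustration; the actual training code may organize this differently.

```python
# Components kept fixed during real-world finetuning, per the text above.
FROZEN = {"encoder", "reward", "value"}

def apply_update(params, grads, lr, frozen=FROZEN):
    """Take one gradient step, leaving frozen components untouched."""
    new_params = {}
    for name, weights in params.items():
        if name in frozen:
            new_params[name] = list(weights)  # copied unchanged
        else:
            new_params[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return new_params

# Hypothetical world-model parameters, flattened to short lists for the example.
params = {
    "encoder": [1.0, 2.0],
    "dynamics": [0.5, -0.5],
    "reward": [0.1],
    "value": [0.3],
}
grads = {name: [1.0] * len(weights) for name, weights in params.items()}
updated = apply_update(params, grads, lr=0.1)
```

After the update, only the `dynamics` entry differs from `params`; the other three components are carried over as they were.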
Transferring State Representations
To transfer reliably from sim to real, the encoder must learn a valid state representation for the real-world environment. Below, we display images reconstructed from the latent states predicted by the world model. This demonstrates how the encoder, trained entirely in simulation, captures a robust and accurate representation of the real world.
Note: we do not train the world model with a reconstruction loss. These images are produced by an auxiliary probe that was trained to predict real images from encoded latent states.
Freezing and Transferring Task Structure
Value Prediction
Below we see that the frozen value function accurately discriminates between successful and failed real-world trajectories. Bootstrapping the value function in simulation enables the planner to estimate long-horizon returns without solving challenging real-world credit assignment problems.
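To make the role of the value function concrete, here is a hedged sketch of how a planner might score a candidate action sequence: sum the predicted rewards along the imagined latent trajectory, then add the value estimate at the final latent state. The scalar latent, the toy `dynamics`, `reward`, and `value` functions, the discount factor, and the random-shooting search are all assumptions for illustration; the text does not specify which planner SimDist uses.

```python
import random

def dynamics(z, a):
    """Hypothetical latent dynamics model (the only part that gets finetuned)."""
    return z + a

def reward(z, a):
    """Hypothetical frozen reward model: prefers latents near zero."""
    return -z * z

def value(z):
    """Hypothetical frozen value function, bootstrapped in simulation."""
    return -2.0 * z * z

def plan_return(z0, actions, gamma=0.99):
    """Discounted predicted rewards plus a discounted terminal value."""
    z, total, discount = z0, 0.0, 1.0
    for a in actions:
        total += discount * reward(z, a)
        z = dynamics(z, a)
        discount *= gamma
    return total + discount * value(z)

def random_shooting(z0, horizon=3, num_candidates=64, seed=0):
    """Sample action sequences and return the one with the best estimate."""
    rng = random.Random(seed)
    candidates = [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        for _ in range(num_candidates)
    ]
    return max(candidates, key=lambda acts: plan_return(z0, acts))

best_plan = random_shooting(z0=1.0)
```

Because every term in `plan_return` is evaluated on states produced by `dynamics`, errors in the dynamics model propagate directly into the return estimate, even when the reward and value models are accurate.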
World Models
Adapting Dynamics Prediction
Adapting the dynamics model is essential for effective planning, as both the reward and value estimates are computed over predicted trajectories. By freezing the encoder, we reduce adaptation to a supervised learning problem in a low-dimensional latent space: predict the next latent state from the current latent state and action. This problem can be solved reliably in low-data regimes.
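The supervised problem described above can be illustrated with a toy example. Everything here is an assumption made for the sketch: the fixed linear `encode` function stands in for the frozen encoder, the dynamics model is a two-parameter linear map, the "real" system is a made-up 1-D process, and plain gradient descent on mean squared error stands in for whatever optimizer and loss the authors use.

```python
import random

def encode(obs):
    """Frozen encoder (hypothetical): a fixed map from observation to latent."""
    return 2.0 * obs

def predict(w, z, a):
    """Linear latent dynamics model: next latent from latent and action."""
    return w[0] * z + w[1] * a

def mse(w, batch):
    """Mean squared next-latent prediction error over a batch."""
    return sum((predict(w, z, a) - z_next) ** 2 for z, a, z_next in batch) / len(batch)

def finetune(w, batch, lr=0.05, steps=200):
    """Gradient descent on the dynamics weights only; the encoder is never updated."""
    w = list(w)
    n = len(batch)
    for _ in range(steps):
        g0 = g1 = 0.0
        for z, a, z_next in batch:
            err = predict(w, z, a) - z_next
            g0 += 2.0 * err * z / n
            g1 += 2.0 * err * a / n
        w[0] -= lr * g0
        w[1] -= lr * g1
    return w

# Toy "real-world" transitions: the real system damps the state (factor 0.8),
# which the simulation-pretrained weights below do not account for.
rng = random.Random(0)
real_batch = []
for _ in range(50):
    obs, a = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    next_obs = 0.8 * obs + 0.5 * a
    real_batch.append((encode(obs), a, encode(next_obs)))

pretrained = [1.0, 1.0]  # hypothetical weights learned in simulation
finetuned = finetune(pretrained, real_batch)
```

Targets are computed by passing the next observation through the same frozen encoder, so the regression targets stay fixed throughout finetuning and no gradients flow through the representation.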
Finetuning drastically lowers dynamics prediction loss for a held-out quadruped slippery-slope trajectory.
During this trajectory, the front-left foot slips.
At this same instant, the finetuned model correctly anticipates the future slippage, while the pretrained model fails to do so.
Real-World Results
Success rate for two manipulation tasks, computed over 20 trials, and average forward progress for two quadruped locomotion tasks, averaged across all 15 trials (3 speeds, 5 trials each), as a function of real-world finetuning data. For manipulation, we consider two difficulties: initial conditions drawn from a Narrow or Wide grid.
SimDist exhibits rapid and consistent improvement with limited data by finetuning only the latent dynamics model while planning with frozen reward and value models. In contrast, direct policy finetuning with the baselines shows limited or no improvement under the same data budgets.
Interactive charts load here from assets/data/results.json.