Simulation Distillation

Pretraining World Models in Simulation for Rapid Real-World Adaptation

Jacob Levy*¹, Tyler Westenbroek*², Kevin Huang², Fernando Palafox¹, Patrick Yin², Shayegan Omidshafiei³, Dong-Ki Kim³, Abhishek Gupta², David Fridovich-Keil¹
¹ UT Austin, ² UW, ³ FieldAI, * Equal Contribution

3D Planning Demo

SimDist plans in a latent world model pretrained in simulation. Below we reconstruct and visualize the latent plans — play with them yourself!

The Simulation Distillation Pipeline

SimDist extracts structural priors from the simulator by training a latent world model on a large-scale, mixed-quality dataset. We then plan over the world model in the real world and finetune the dynamics predictions on real-world data. As the dynamics predictions improve, the planner's performance improves rapidly.

Reliable improvement with simple, supervised system identification!

Step 1 — Expert Policy Training

Train a state-based expert policy.

Step 2 — Data Generation

Perturb expert actions to generate a large-scale, diverse dataset.

Step 3 — World Model Pretraining and Deployment

Distill simulation data into a world model from raw perception and deploy it with online planning.

Step 4 — Adaptation

Finetune dynamics predictions with real-world data to improve planning performance.
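
The data generation in Step 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the toy one-dimensional environment (`step`), the proportional `expert_policy`, the Gaussian action noise, and the particular noise scales are all assumptions made for the example. The idea shown is only that sweeping the perturbation scale produces data ranging from near-expert to poor quality.

```python
import random

def expert_policy(state: float) -> float:
    """Hypothetical expert: push the state toward zero."""
    return -0.5 * state

def step(state: float, action: float) -> float:
    """Toy simulator dynamics (assumed): the action shifts the state."""
    return state + action

def collect_episode(noise_std: float, horizon: int, rng: random.Random):
    """Roll out the expert with Gaussian action noise and log transitions."""
    state = rng.uniform(-1.0, 1.0)
    transitions = []
    for _ in range(horizon):
        action = expert_policy(state) + rng.gauss(0.0, noise_std)
        next_state = step(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions

def generate_dataset(noise_levels, episodes_per_level, horizon, seed=0):
    """Sweep noise scales so the data ranges from near-expert to poor."""
    rng = random.Random(seed)
    dataset = []
    for noise_std in noise_levels:
        for _ in range(episodes_per_level):
            dataset.extend(collect_episode(noise_std, horizon, rng))
    return dataset

dataset = generate_dataset(noise_levels=[0.0, 0.1, 0.5], episodes_per_level=4, horizon=10)
print(len(dataset))  # 3 noise levels * 4 episodes * 10 steps = 120
```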

Rapid Real-World Improvement

Simulation Distillation

Simulation Distillation (SimDist) rapidly overcomes the sim-to-real dynamics gap through adaptation in the real world, resulting in substantial gains in task execution on both precise manipulation and quadrupedal locomotion tasks.

Manipulation - Peg Insertion

Quadruped - Slippery Slope

World Models

Structure

Our Key Insight: world models automatically decompose task structure in a form that we can exploit to target adaptation where it’s needed. We argue that the encoder, rewards, and value function capture the global structure of the problem in a form that is largely invariant between simulation and the real world. Thus, we freeze these components during the real-world finetuning phase and finetune only the dynamics model. This sidesteps the need for end-to-end learning with sparse real-world data and avoids long-horizon credit assignment, which is a central challenge for existing RL approaches.
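
The freezing scheme can be expressed as a simple selection of trainable parameters. The sketch below is illustrative only: the `Component` and `WorldModel` containers, the function name `prepare_for_real_world_finetuning`, and the placeholder parameter values are hypothetical and do not come from the SimDist codebase.

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    params: list
    trainable: bool = True

@dataclass
class WorldModel:
    encoder: Component
    dynamics: Component
    reward: Component
    value: Component

    def components(self):
        return [self.encoder, self.dynamics, self.reward, self.value]

def prepare_for_real_world_finetuning(model: WorldModel) -> list:
    """Freeze everything except the dynamics; return the trainable parameters."""
    for component in model.components():
        component.trainable = component.name == "dynamics"
    return [p for c in model.components() if c.trainable for p in c.params]

# Placeholder parameters, for illustration only.
model = WorldModel(
    encoder=Component("encoder", [0.1, 0.2]),
    dynamics=Component("dynamics", [0.3, 0.4]),
    reward=Component("reward", [0.5]),
    value=Component("value", [0.6]),
)
trainable = prepare_for_real_world_finetuning(model)
print(trainable)  # [0.3, 0.4]: only the dynamics parameters remain trainable
```

In a deep learning framework, the same effect would typically be achieved by passing only the dynamics parameters to the optimizer.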

World model architecture showing encoder, latent dynamics, and value components

Transferring State Representations

To transfer reliably from simulation to the real world, the encoder must learn a valid state representation for the real-world environment. Below, we display images reconstructed from the latent states predicted by the world model. They show that the encoder, trained entirely in simulation, captures a robust and accurate representation of the real world.

Note: we do not train the world model with a reconstruction loss. These images are produced by an auxiliary probe that was trained to predict real images from encoded latent states.

Value Prediction

Below we see that the frozen value function accurately discriminates between successful and failed real-world trajectories. Bootstrapping the value function in simulation enables the planner to estimate long-horizon returns without solving challenging real-world credit assignment problems.
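
The role of a terminal value estimate in planning can be illustrated with a minimal random-shooting planner. This is a sketch under stated assumptions, not the planner used by SimDist: the scalar latent state, the toy `dynamics`, `reward`, and `value` functions, the discount `gamma`, and the choice of random shooting are all hypothetical. It shows only how a short predicted rollout can be scored by its rewards plus a bootstrapped value at the final latent state.

```python
import random

def dynamics(z: float, a: float) -> float:
    """Toy latent dynamics (assumed)."""
    return z + a

def reward(z: float, a: float) -> float:
    """Toy reward (assumed): stay near the origin with small actions."""
    return -(z * z) - 0.1 * (a * a)

def value(z: float) -> float:
    """Toy terminal value (assumed), standing in for a frozen value head."""
    return -10.0 * z * z

def score(z0, actions, gamma=0.99):
    """Discounted predicted rewards plus a bootstrapped terminal value."""
    z, total, discount = z0, 0.0, 1.0
    for a in actions:
        total += discount * reward(z, a)
        z = dynamics(z, a)
        discount *= gamma
    return total + discount * value(z)

def plan(z0, horizon=5, num_samples=256, seed=0):
    """Random shooting: sample action sequences, keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        for _ in range(num_samples)
    ]
    return max(candidates, key=lambda actions: score(z0, actions))

best = plan(z0=1.0)
print(best[0])  # in receding-horizon control, only the first action is executed
```

Because the value term summarizes returns beyond the planning horizon, the rollout itself can stay short.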

Success

Failure

Adapting Dynamics Prediction

Adapting the dynamics model is essential for effective planning, as both the reward and value estimates are computed over predicted trajectories. By freezing the encoder, we reduce adaptation to a supervised learning problem in a low-dimensional latent space, which can be solved reliably in low-data regimes.
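
A minimal sketch of this supervised problem follows, assuming a scalar latent state and a linear dynamics model. Everything here is hypothetical and chosen for illustration: the toy `encode` function, the linear form `z_next = w_z * z + w_a * a`, the synthetic "real-world" transitions, the initial weights, and the learning rate. The point is only that, with a frozen encoder, the regression targets are fixed and finetuning reduces to minimizing a one-step prediction error in latent space.

```python
def encode(observation: float) -> float:
    """Frozen encoder (toy stand-in): never updated during finetuning."""
    return 2.0 * observation

def encode_transitions(transitions):
    """Encode once; targets stay fixed because the encoder is frozen."""
    return [(encode(o), a, encode(o_next)) for o, a, o_next in transitions]

def predict(weights, z, a):
    """Linear latent dynamics: z_next = w_z * z + w_a * a."""
    w_z, w_a = weights
    return w_z * z + w_a * a

def mse(weights, batch):
    """Mean squared one-step prediction error in latent space."""
    return sum((predict(weights, z, a) - z_next) ** 2 for z, a, z_next in batch) / len(batch)

def finetune_dynamics(weights, batch, lr=0.05, epochs=200):
    """Full-batch gradient descent on the latent prediction error."""
    w_z, w_a = weights
    n = len(batch)
    for _ in range(epochs):
        grad_z = grad_a = 0.0
        for z, a, z_next in batch:
            error = predict((w_z, w_a), z, a) - z_next
            grad_z += 2.0 * error * z / n
            grad_a += 2.0 * error * a / n
        w_z -= lr * grad_z
        w_a -= lr * grad_a
    return w_z, w_a

# Synthetic "real-world" transitions (assumed): o_next = 0.9 * o + 0.5 * a.
real_transitions = [
    (o, a, 0.9 * o + 0.5 * a)
    for o in (-1.0, -0.5, 0.0, 0.5, 1.0)
    for a in (-1.0, 0.0, 1.0)
]
batch = encode_transitions(real_transitions)

sim_weights = (1.0, 2.0)  # assumed pretrained weights that mismatch the real system
real_weights = finetune_dynamics(sim_weights, batch)
print(mse(sim_weights, batch), mse(real_weights, batch))
```

In this toy setup the finetuned weights approach `(0.9, 1.0)`, the latent-space equivalent of the synthetic real dynamics.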

Finetuning drastically lowers the dynamics prediction loss on a held-out quadruped slippery-slope trajectory.

During this trajectory, the front-left foot slips.

Foot prediction comparison visualization during slip event

Real-World Results

Success rate for two manipulation tasks, computed over 20 trials, and average forward progress for two quadruped locomotion tasks, averaged across all 15 trials (3 speeds, 5 trials each), as a function of real-world finetuning data. For manipulation, we consider two difficulties: initial conditions drawn from a Narrow or Wide grid.
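
The two metrics described above amount to simple averages. The sketch below shows the arithmetic only; the function names, the speed labels, and all numeric values are placeholders invented for illustration and are not results from the paper.

```python
def success_rate(outcomes):
    """Fraction of successful trials."""
    return sum(outcomes) / len(outcomes)

def average_forward_progress(progress_by_speed):
    """Mean forward progress pooled over every trial at every speed."""
    trials = [p for runs in progress_by_speed.values() for p in runs]
    return sum(trials) / len(trials)

# Placeholder values for illustration only; not results from the paper.
outcomes = [True] * 12 + [False] * 8           # 20 manipulation trials
progress = {                                   # 3 speeds x 5 trials = 15 trials
    "slow": [1.0] * 5,
    "medium": [2.0] * 5,
    "fast": [3.0] * 5,
}

print(success_rate(outcomes))              # 0.6
print(average_forward_progress(progress))  # 2.0
```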

SimDist exhibits rapid and consistent improvement with limited data by finetuning only the latent dynamics model while planning with frozen reward and value models. In contrast, direct policy finetuning with the baselines shows limited or no improvement under the same data budgets.

Static fallback plot for real-world results

BibTeX

@article{2026simdist,
  title={Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation},
  author={Levy, Jacob and Westenbroek, Tyler and Huang, Kevin and Palafox, Fernando and Yin, Patrick and Omidshafiei, Shayegan and Kim, Dong-Ki and Gupta, Abhishek and Fridovich-Keil, David},
  journal={arXiv preprint arXiv:2603.15759},
  year={2026},
  url={https://arxiv.org/abs/2603.15759}
}
