- [2026/03/06] LPS is now on arXiv.
- [2026/03/05] We release the code of LPS.
Latent Policy Steering (LPS) is a robust offline reinforcement learning framework for robotics that resolves the brittle trade-off between return maximization and behavioral constraints. Instead of relying on lossy proxy latent critics, LPS directly optimizes a latent-action-space actor by backpropagating original-action-space Q-gradients through a differentiable one-step MeanFlow policy. This architecture allows the original critic to guide end-to-end optimization without proxy networks, while the MeanFlow policy serves as a strong generative prior. As a result, LPS works out-of-the-box with minimal tuning, achieving state-of-the-art performance across OGBench and real-world robotic tasks.
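The core update described above — backpropagating the original-action-space Q-gradient through a differentiable one-step decoder into the latent action space — can be illustrated with a toy sketch. This is a minimal illustration with hypothetical names, not the released implementation: the linear `decode` stands in for the one-step MeanFlow policy and the linear critic stands in for the learned Q-function, so the chain rule can be written out by hand.

```python
import numpy as np

# Toy stand-ins (illustrative only): a = decode(z) plays the role of the
# one-step MeanFlow map, and Q(a) = q_w @ a plays the role of the critic.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))     # decoder weights: latent dim 8 -> action dim 4
q_w = rng.normal(size=4)        # linear critic weights

def decode(z):
    # latent action -> original action
    return W @ z

def q_value(a):
    return float(q_w @ a)

def steer(z, lr=0.1, steps=50):
    # Gradient ascent on Q in latent space:
    #   dQ/dz = (d decode/dz)^T * dQ/da, i.e. the original-action Q-gradient
    #   backpropagated through the differentiable decoder.
    for _ in range(steps):
        dq_da = q_w             # dQ/da for the linear toy critic
        dq_dz = W.T @ dq_da     # chain rule through the decoder
        z = z + lr * dq_dz
    return z

z0 = rng.normal(size=8)
z1 = steer(z0)
print(q_value(decode(z0)), q_value(decode(z1)))
```

In LPS both the decoder and the critic are deep networks and the gradient flows end-to-end via autodiff; the point of the sketch is only the direction of the gradient flow: critic, then decoder, then latent actor.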
# For OGBench Experiments
conda create -n lps python=3.10
pip install -r requirements.txt
sh lps_ogbench.sh agent_name task_name task_num alpha seed
# Examples - LPS
sh lps_ogbench.sh lps cube-single-play 1 1.0 100
sh lps_ogbench.sh lps cube-double-play 1 1.0 100
sh lps_ogbench.sh lps scene-play 1 1.0 100
sh lps_ogbench.sh lps puzzle-3x3-play 1 1.0 100
sh lps_ogbench.sh lps puzzle-4x4-play 1 1.0 100
# Examples - QC-FQL
sh lps_ogbench.sh qcfql cube-single-play 1 30.0 100
sh lps_ogbench.sh qcfql cube-double-play 1 3.0 100
sh lps_ogbench.sh qcfql scene-play 1 1.0 100
sh lps_ogbench.sh qcfql puzzle-3x3-play 1 3.0 100
sh lps_ogbench.sh qcfql puzzle-4x4-play 1 10.0 100
Due to a MuJoCo-related version conflict, we need a separate Python environment for DROID training / inference.
# For DROID Experiments
conda create -n lps_droid python=3.10
# Install LPS for DROID
pip install -r requirements_droid.txt
We have made some changes to the DROID repo, so we recommend using our forked DROID repo.
git clone https://github.com/jellyho/droid.git
cd droid
git submodule sync
git submodule update --init --recursive
Follow the instructions to install DROID.
Make sure the server-side DROID setup is ready.
There may be some warnings, but you can safely ignore them.
pip install 'numpy<2'
Run droid_teleop.py to collect demonstrations in HDF5 format.
We assume that the dataset is saved as:
droid_dataset_dir/
├── task_1/
│   ├── success/
│   │   ├── trajectory_0.hdf5
│   │   ├── trajectory_1.hdf5
│   │   └── ...
│   └── failure/
│       ├── trajectory_7.hdf5
│       ├── trajectory_9.hdf5
│       └── ...
├── task_2/
│   ├── success/
│   │   └── ...
│   └── failure/
│       └── ...
└── ...
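As a sanity check on your dataset layout, the tree above can be enumerated with a small helper. This is a hypothetical snippet (`list_trajectories` is not part of the released scripts) that simply globs the success/failure HDF5 files for one task:

```python
from pathlib import Path

def list_trajectories(dataset_dir, task_name):
    """Return {"success": [...], "failure": [...]} of trajectory_*.hdf5 paths
    under dataset_dir/task_name, following the layout shown above."""
    root = Path(dataset_dir) / task_name
    return {
        split: sorted(str(p) for p in (root / split).glob("*.hdf5"))
        for split in ("success", "failure")
    }
```

Running it on a task directory lets you confirm that every demonstration landed in the expected `success/` or `failure/` subfolder before launching training.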
sh lps_droid.sh task_name droid_dataset_dir seed
# Example
sh lps_droid.sh task_2 ~/droid_dataset_dir 100
Again, we assume that DROID is installed in the same Python environment as LPS.
sh lps_droid_eval.sh checkpoint_dir checkpoint_step
# Example
sh lps_droid_eval.sh /home/rllab2/jellyho/droid_ckpts/LPS/LPS_TEST_1 10000
This codebase is built on top of Reinforcement Learning with Action Chunking. The DiT implementation is adapted from MeanflowQL.
If you find this work useful, please consider citing:
@misc{im2026latentpolicysteeringonestep,
title={Latent Policy Steering through One-Step Flow Policies},
author={Hokyun Im and Andrey Kolobov and Jianlong Fu and Youngwoon Lee},
year={2026},
eprint={2603.05296},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.05296},
}