- [2026/03/06] LPS is now on arXiv.
- [2026/03/05] We release the code of LPS.
Latent Policy Steering (LPS) is a robust offline reinforcement learning framework for robotics that resolves the brittle trade-off between return maximization and behavioral constraints. Instead of relying on lossy proxy latent critics, LPS directly optimizes a latent-action-space actor by backpropagating original-action-space Q-gradients through a differentiable one-step MeanFlow policy. This architecture allows the original critic to guide end-to-end optimization without proxy networks, while the MeanFlow policy serves as a strong generative prior. As a result, LPS works out-of-the-box with minimal tuning, achieving state-of-the-art performance across OGBench and real-world robotic tasks.
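The core update described above — backpropagating the original-action-space Q-gradient through a differentiable one-step decoder into the latent action space — can be illustrated with a toy sketch. This is a minimal illustration with hypothetical names, not the released implementation: the linear `decode` stands in for the one-step MeanFlow policy and the linear critic stands in for the learned Q-function, so the chain rule can be written out by hand.

```python
import numpy as np

# Toy stand-ins (illustrative only): a = decode(z) plays the role of the
# one-step MeanFlow map, and Q(a) = q_w @ a plays the role of the critic.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))     # decoder weights: latent dim 8 -> action dim 4
q_w = rng.normal(size=4)        # linear critic weights

def decode(z):
    # latent action -> original action
    return W @ z

def q_value(a):
    return float(q_w @ a)

def steer(z, lr=0.1, steps=50):
    # Gradient ascent on Q in latent space:
    #   dQ/dz = (d decode/dz)^T * dQ/da, i.e. the original-action Q-gradient
    #   backpropagated through the differentiable decoder.
    for _ in range(steps):
        dq_da = q_w             # dQ/da for the linear toy critic
        dq_dz = W.T @ dq_da     # chain rule through the decoder
        z = z + lr * dq_dz
    return z

z0 = rng.normal(size=8)
z1 = steer(z0)
print(q_value(decode(z0)), q_value(decode(z1)))
```

In LPS both the decoder and the critic are deep networks and the gradient flows end-to-end via autodiff; the point of the sketch is only the direction of the gradient flow: critic, then decoder, then latent actor.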
# For OGBench Experiments
conda create -n lps python=3.10
pip install -r requirements.txt
sh lps_ogbench.sh agent_name task_name task_num alpha seed
# Examples - LPS
sh lps_ogbench.sh lps cube-single-play 1 1.0 100
sh lps_ogbench.sh lps cube-double-play 1 1.0 100
sh lps_ogbench.sh lps scene-play 1 1.0 100
sh lps_ogbench.sh lps puzzle-3x3-play 1 1.0 100
sh lps_ogbench.sh lps puzzle-4x4-play 1 1.0 100
# Examples - QC-FQL
sh lps_ogbench.sh qcfql cube-single-play 1 30.0 100
sh lps_ogbench.sh qcfql cube-double-play 1 3.0 100
sh lps_ogbench.sh qcfql scene-play 1 1.0 100
sh lps_ogbench.sh qcfql puzzle-3x3-play 1 3.0 100
sh lps_ogbench.sh qcfql puzzle-4x4-play 1 10.0 100
Due to a MuJoCo-related version conflict, we need a separate Python environment for DROID training / inference.
# For DROID Experiments
conda create -n lps_droid python=3.10
# Install LPS for DROID
pip install -r requirements_droid.txt
We have made some changes to the DROID repo, so we recommend using our forked DROID repo.
git clone https://github.com/jellyho/droid.git
cd droid
git submodule sync
git submodule update --init --recursive
Follow the instructions to install DROID.
Make sure the server-side DROID setup is ready.
There may be some warnings, but you can safely ignore them.
pip install 'numpy<2'
Run droid_teleop.py to collect demonstrations in HDF5 format.
We assume that the dataset is saved as:
droid_dataset_dir/
├── task_1/
│   ├── success/
│   │   ├── trajectory_0.hdf5
│   │   ├── trajectory_1.hdf5
│   │   └── ...
│   └── failure/
│       ├── trajectory_7.hdf5
│       ├── trajectory_9.hdf5
│       └── ...
├── task_2/
│   ├── success/
│   │   └── ...
│   └── failure/
│       └── ...
└── ...
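As a sanity check on your dataset layout, the tree above can be enumerated with a small helper. This is a hypothetical snippet (`list_trajectories` is not part of the released scripts) that simply globs the success/failure HDF5 files for one task:

```python
from pathlib import Path

def list_trajectories(dataset_dir, task_name):
    """Return {"success": [...], "failure": [...]} of trajectory_*.hdf5 paths
    under dataset_dir/task_name, following the layout shown above."""
    root = Path(dataset_dir) / task_name
    return {
        split: sorted(str(p) for p in (root / split).glob("*.hdf5"))
        for split in ("success", "failure")
    }
```

Running it on a task directory lets you confirm that every demonstration landed in the expected `success/` or `failure/` subfolder before launching training.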
sh lps_droid.sh task_name droid_dataset_dir seed
# Example
sh lps_droid.sh task_2 ~/droid_dataset_dir 100
Again, we assume that DROID is installed in the same Python environment as LPS.
sh lps_droid_eval.sh checkpoint_dir checkpoint_step
# Example
sh lps_droid_eval.sh /home/rllab2/jellyho/droid_ckpts/LPS/LPS_TEST_1 10000
This codebase is built on top of Reinforcement Learning with Action Chunking. The DiT implementation is adapted from MeanflowQL.
If you find this work useful, please consider citing:
@misc{im2026latentpolicysteeringonestep,
title={Latent Policy Steering through One-Step Flow Policies},
author={Hokyun Im and Andrey Kolobov and Jianlong Fu and Youngwoon Lee},
year={2026},
eprint={2603.05296},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.05296},
}