We formally establish a theoretical foundation for policy representation via the diffusion probability model and provide practical implementations of diffusion policy for online model-free reinforcement learning (RL).
Paper link: https://arxiv.org/pdf/2305.13122.pdf
PyTorch and MuJoCo must be installed beforehand.
A suitable conda environment named DIPO can be created and activated with:

    conda create -n DIPO
    conda activate DIPO

To get started, install the additional required Python packages into your environment:

    pip install -r requirements.txt

Running experiments with our code is straightforward. Below we use the Hopper-v3 task as an example:

    python main.py --env_name Hopper-v3 --num_steps 1000000 --n_timesteps 100 --cuda 0 --seed 0

The hyperparameters used for DIPO and the baselines are listed in the tables below so that our reported results are easy to reproduce.
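Reported results in RL are usually averaged over several seeds, so the Hopper-v3 command above could be wrapped in a small launcher. The following is only a sketch: it uses just the flags that appear in that command, while the helper names `build_command` and `run_seeds` and the seed list `[0, 1, 2]` are assumptions for illustration and are not part of the repository.

```python
import subprocess
import sys


def build_command(env_name, seed, num_steps=1_000_000, n_timesteps=100, cuda=0):
    """Return the argument list for one run of main.py.

    Only the flags shown in the example command are used; the defaults
    mirror that command.
    """
    return [
        sys.executable, "main.py",
        "--env_name", env_name,
        "--num_steps", str(num_steps),
        "--n_timesteps", str(n_timesteps),
        "--cuda", str(cuda),
        "--seed", str(seed),
    ]


def run_seeds(env_name, seeds, dry_run=True):
    """Build one command per seed and run them sequentially.

    With dry_run=True the commands are only printed, which is useful for
    checking them before committing to long training runs.
    """
    commands = [build_command(env_name, seed) for seed in seeds]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if a run fails
    return commands


if __name__ == "__main__":
    # Hypothetical seed list; pass dry_run=False to actually launch training.
    run_seeds("Hopper-v3", seeds=[0, 1, 2])
```

Running the runs sequentially keeps GPU memory use predictable; launching them in parallel would need one `--cuda` device per process.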
| Hyperparameter | DIPO | SAC | TD3 | PPO |
|---|---|---|---|---|
| No. of hidden layers | 2 | 2 | 2 | 2 |
| No. of hidden nodes | 256 | 256 | 256 | 256 |
| Activation | mish | relu | relu | tanh |
| Batch size | 256 | 256 | 256 | 256 |
| Discount for reward | 0.99 | 0.99 | 0.99 | 0.99 |
| Target smoothing coefficient | 0.005 | 0.005 | 0.005 | 0.005 |
| Learning rate for actor | | | | |
| Learning rate for critic | | | | |
| Actor Critic grad norm | 2 | N/A | N/A | 0.5 |
| Memory size | | | | |
| Entropy coefficient | N/A | 0.2 | N/A | 0.01 |
| Value loss coefficient | N/A | N/A | N/A | 0.5 |
| Exploration noise | N/A | N/A | | N/A |
| Policy noise | N/A | N/A | | N/A |
| Noise clip | N/A | N/A | 0.5 | N/A |
| Use gae | N/A | N/A | N/A | True |
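The target smoothing coefficient of 0.005 in the table above is, in SAC and TD3, the Polyak averaging rate used to update target networks. Assuming it has the same meaning for DIPO, the update can be sketched as follows. Plain Python lists stand in for network parameters here, and the function name `soft_update` is hypothetical; the repository works on PyTorch parameters and may organize this differently.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_target, w_online in zip(target_params, online_params)
    ]


# Toy example: the target moves 0.5% of the way toward the online weights.
target = [0.0, 1.0]
online = [1.0, 1.0]
target = soft_update(target, online)
```

A small coefficient such as 0.005 makes the target network change slowly, which is the usual way to stabilize bootstrapped value targets.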

The task-specific hyperparameters of DIPO are:

| Hyperparameter | Hopper-v3 | Walker2d-v3 | Ant-v3 | HalfCheetah-v3 | Humanoid-v3 |
|---|---|---|---|---|---|
| Learning rate for action | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 |
| Actor Critic grad norm | 1 | 2 | 0.8 | 2 | 2 |
| Action grad norm ratio | 0.3 | 0.08 | 0.1 | 0.08 | 0.1 |
| Action gradient steps | 20 | 20 | 20 | 40 | 20 |
| Diffusion inference timesteps | 100 | 100 | 100 | 100 | 100 |
| Diffusion beta schedule | cosine | cosine | cosine | cosine | cosine |
| Update actor target every | 1 | 1 | 1 | 2 | 1 |
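The table above lists a cosine beta schedule with 100 diffusion timesteps. As a rough illustration, the sketch below computes such a schedule following the cosine formulation of Nichol and Dhariwal (2021). The offset `s = 0.008` and the clipping value `0.999` are the defaults from that paper and are assumptions here; the repository's own schedule may use different constants.

```python
import math


def cosine_beta_schedule(timesteps, s=0.008, max_beta=0.999):
    """Return a list of betas for a cosine noise schedule.

    alpha_bar(t) = cos^2(((t / T) + s) / (1 + s) * pi / 2), and
    beta_t = 1 - alpha_bar(t + 1) / alpha_bar(t), clipped at max_beta.
    """
    def alpha_bar(t):
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for t in range(timesteps):
        beta = 1.0 - alpha_bar(t + 1) / alpha_bar(t)
        betas.append(min(beta, max_beta))  # clip to avoid beta close to 1
    return betas


# 100 timesteps, matching the "Diffusion inference timesteps" row.
betas = cosine_beta_schedule(100)
```

Compared with a linear schedule, the cosine schedule adds noise more gradually in the early steps and reaches the clipping value only at the final step.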
If you have any questions regarding the code or the paper, feel free to send correspondence to [email protected] or [email protected].