D4PG-pytorch

PyTorch implementation of Distributed Distributional Deterministic Policy Gradients (https://arxiv.org/abs/1804.08617).

The implementation was tested on environments from OpenAI Gym.

About

D4PG and D3PG implementations with the following features:

the learner, sampler and agents run in separate processes
one or more exploiter agents act on the target network without action noise
the GPU is held only by the exploiters; the exploration processes run on the CPU
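
The difference between exploring agents and exploiter agents can be illustrated with a minimal sketch. It is not taken from this repository: `act`, `toy_policy`, the noise scale and the action bounds are hypothetical names and values chosen for illustration, and Gaussian noise is an assumption about the exploration scheme. An exploring agent perturbs the deterministic policy output, while an exploiter uses it unchanged (in the real project the exploiter's policy would be the target network).

```python
import random


def act(policy, state, noise_std=0.0, low=-1.0, high=1.0, rng=random):
    """Return the policy's action, optionally perturbed by Gaussian noise.

    noise_std > 0 models an exploring agent; noise_std == 0 models an
    exploiter. All names and defaults here are illustrative assumptions.
    """
    action = policy(state)
    if noise_std > 0:
        action = [a + rng.gauss(0.0, noise_std) for a in action]
    # Clip every component to the assumed valid action range.
    return [min(high, max(low, a)) for a in action]


def toy_policy(state):
    # Hypothetical stand-in for an actor network: a fixed linear map.
    return [0.5 * s for s in state]


state = [0.2, -0.4, 1.0]
rng = random.Random(0)  # seeded so the exploring action is reproducible

explorer_action = act(toy_policy, state, noise_std=0.3, rng=rng)
exploiter_action = act(toy_policy, state)  # no noise: pure policy output

print("explorer :", explorer_action)
print("exploiter:", exploiter_action)
```

Because the exploiter adds no noise, its returns reflect the quality of the learned policy itself rather than that policy plus exploration noise.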

The project was tested on Ubuntu 18.04 with a 4-core Intel i5 and an Nvidia GTX 1080Ti.

Usage

Run python train.py --config configs/openai/d4pg/walker2d_d4pg.yml

Tests

python -m unittest discover

Results

Configs for reproducing the curves below can be found in the configs directory (number of parallel agents = 4).

OpenAI Mujoco

DMControl

Reproduce

All results were obtained with the configs in the configs directory.

References

Continuous control with deep reinforcement learning, [https://arxiv.org/abs/1509.02971]
Distributed Distributional Deterministic Policy Gradients, [https://arxiv.org/abs/1804.08617]
