DDPG is a fairly complex RL algorithm that can be used with continuous state and action spaces. It is a kind of actor-critic method (at least that's what the authors call it), but it also resembles supervised learning: the actor takes the actions and is evaluated with the Q-values generated by the critic, so the actor's actions play the role of the output variable and the critic's Q-values serve as the labels.
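To make that supervised-learning analogy concrete, here is a minimal sketch of the two DDPG losses (the module and argument names are illustrative assumptions, not this repo's exact API): the critic is regressed toward a bootstrapped target, and the actor is trained to maximize the critic's score.

```python
import torch
import torch.nn.functional as F

def ddpg_losses(actor, critic, target_actor, target_critic, batch, gamma=0.99):
    """Sketch of the DDPG critic and actor losses for one sampled batch.

    `actor`, `critic`, and their target copies are assumed to be torch
    modules; `batch` is assumed to unpack into tensors (S, A, R, S', D).
    """
    states, actions, rewards, next_states, dones = batch

    # Critic target: r + gamma * Q'(s', mu'(s')) -- the "label" for Q(s, a)
    with torch.no_grad():
        next_actions = target_actor(next_states)
        q_targets = rewards + gamma * (1 - dones) * target_critic(next_states, next_actions)

    # The critic loss looks like supervised regression onto those labels
    critic_loss = F.mse_loss(critic(states, actions), q_targets)

    # Actor loss: maximize Q(s, mu(s)), i.e. minimize its negative
    actor_loss = -critic(states, actor(states)).mean()

    return critic_loss, actor_loss
```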
- For every action taken by the actor you get a state, action, reward, new state, and done flag (whether the task is finished). So you can create a tuple { S, A, R, S', D } and store it in the replay buffer.
- Take a random sample from the replay buffer and train the networks on the stored experiences (a minimal buffer sketch follows below).
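Here is a minimal sketch of such a replay buffer (the class name, capacity, and batch size are illustrative assumptions, not necessarily this repo's exact code):

```python
import random
from collections import deque, namedtuple

# One stored transition: the { S, A, R, S', D } tuple described above
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    def __init__(self, buffer_size=int(1e5), batch_size=64):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences fall off the end
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks the correlation between consecutive steps
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```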
Make a note that in the critic network the action is not fed into the first layer; it is concatenated with the first hidden layer's output, so the second layer's input size is fcs1_units + action_size:
`self.fc2 = nn.Linear(fcs1_units + action_size, fc2_units)`
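A sketch of such a critic in PyTorch (the layer widths and names here are assumptions in the style of the note above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    """Sketch of a DDPG critic; layer sizes are illustrative assumptions."""

    def __init__(self, state_size, action_size, fcs1_units=400, fc2_units=300):
        super().__init__()
        self.fcs1 = nn.Linear(state_size, fcs1_units)               # state-only first layer
        self.fc2 = nn.Linear(fcs1_units + action_size, fc2_units)   # action joins here
        self.fc3 = nn.Linear(fc2_units, 1)                          # scalar Q-value

    def forward(self, state, action):
        xs = F.relu(self.fcs1(state))
        x = torch.cat((xs, action), dim=1)  # concatenate state features with the action
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```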
- I have tried to explain how DDPG works with both the actor and critic models in the diagram below.

Projects solved:
1. Pendulum - for info on the environment, see https://github.com/openai/gym/wiki/Pendulum-v0
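As a small usage sketch (assuming the classic gym API for Pendulum-v0), the environment loop that produces the { S, A, R, S', D } tuples might look like this; the random action is a stand-in for the trained actor:

```python
import gym
import numpy as np

env = gym.make("Pendulum-v0")
state = env.reset()
for t in range(200):
    # stand-in for the actor's action, e.g. agent.act(state)
    action = np.random.uniform(env.action_space.low, env.action_space.high)
    next_state, reward, done, _ = env.step(action)
    # each step yields one { S, A, R, S', D } tuple for the replay buffer
    state = next_state
    if done:
        break
env.close()
```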

