4. REINFORCE

REINFORCE is a policy-based RL algorithm. It collects a couple of episodes and then builds a loss function that is back-propagated through the network. I mostly use it for continuous or discrete state spaces, but only with discrete action spaces. Algorithm:

  • During an episode, we generate a categorical distribution of action probabilities at each step and collect the log probability of each action taken.
  • After the episode is over, we multiply each log probability by the reward associated with that action (typically the discounted return from that step onward).
  • The main aim is to give more preference to the actions that generate more reward.
  • The loss is calculated by summing the products of the log probabilities and their rewards; in practice the sum is negated so that minimizing the loss increases the expected reward.
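
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in this repo: the helper names (`softmax`, `sample_action`, `discounted_returns`, `reinforce_loss`) and the discount factor `gamma` are hypothetical, and the reward attached to each action is assumed to be the discounted return from that step onward. In a real implementation the log probabilities would be tensors from an autodiff framework so the loss can be back-propagated.

```python
import math
import random


def softmax(logits):
    """Turn raw policy outputs into a categorical distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Sample an action and return it with its log probability."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Negated sum of log-probability * return over one episode.

    Assumption: each action is weighted by its discounted return.
    The minus sign turns reward maximization into loss minimization.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

For example, `reinforce_loss(log_probs, rewards)` would be called once per finished episode with the log probabilities collected by `sample_action` and the rewards returned by the environment. Actions followed by larger returns contribute more to the loss, so a gradient step raises their probability.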

Projects solved: