CEM

Cross Entropy Method

Here I use, CEM method to solve the environment. In this method, a small noise is added to weights to the neural network instead on the actions taken by the agent. It is an off policy reinforcement learning method. It uses a tanh activation function in the final layer - it can be used for continouse state action space.

Instantiate a weight matrix. For every episode, a small amount of noise is added to the weight matrix and rewards are evaluated. Its like genetic evolution method -
For every episode, you take a set of weights(by adding noise everytime to the weight matrix) and calculate the rewards obtained using those weights.
Then you sort those rewards and only take the top 10/ whatever the elite number and get the best weights corresponding to those rewards.
In the end, you take the mean of those top weights and then calculate the reward with that mean weight.
Repeat step 2-4 for number of episodes, with the mean weight and add noice to it to get correct set of weights.

Projects solved:

1 - MountainCar_Continous - For info on the environment - https://github.com/openai/gym/wiki/MountainCarContinuous-v0

Name		Name	Last commit message	Last commit date
parent directory ..
mountainCar_Cont		mountainCar_Cont
Readme.md		Readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Readme.md

Cross Entropy Method

FilesExpand file tree

CEM

Directory actions

More options

Directory actions

More options

Latest commit

History

CEM

Folders and files

parent directory

Readme.md

Cross Entropy Method