This is a pretty straightforward method: I simply instantiate a weight matrix, multiply it with the states to get rewards over a period, and then try to maximise the reward. It is meant for discrete state and action spaces.
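To make the idea concrete, here is a minimal sketch of one way it could look. It is not the exact code from this project. The corridor environment, the one-hot state encoding, the random-search loop, and all names (`one_hot`, `choose_action`, `run_episode`, `random_search`) are assumptions made up for illustration, and it uses only the standard library instead of gym.

```python
import random

N_STATES = 5      # positions in a hypothetical corridor (assumption, not CartPole)
N_ACTIONS = 2     # 0 = move left, 1 = move right
EPISODE_LEN = 20  # steps per episode


def one_hot(state):
    """Encode a discrete state as a one-hot vector."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]


def choose_action(weights, state):
    """Multiply the state vector with the weight matrix and pick the best-scoring action."""
    features = one_hot(state)
    scores = [
        sum(features[i] * weights[i][a] for i in range(N_STATES))
        for a in range(N_ACTIONS)
    ]
    return scores.index(max(scores))


def run_episode(weights):
    """Return the total reward collected over one episode of the toy corridor."""
    state, total = 0, 0.0
    for _ in range(EPISODE_LEN):
        action = choose_action(weights, state)
        step = 1 if action == 1 else -1
        # Clamp the position so the agent stays inside the corridor.
        state = min(max(state + step, 0), N_STATES - 1)
        # Reward of 1 for every step spent in the right-most cell.
        if state == N_STATES - 1:
            total += 1.0
    return total


def random_search(n_trials=200, seed=0):
    """Sample random weight matrices and keep the one with the highest episode reward."""
    rng = random.Random(seed)
    best_weights, best_return = None, float("-inf")
    for _ in range(n_trials):
        weights = [
            [rng.uniform(-1, 1) for _ in range(N_ACTIONS)]
            for _ in range(N_STATES)
        ]
        episode_return = run_episode(weights)
        if episode_return > best_return:
            best_weights, best_return = weights, episode_return
    return best_weights, best_return


if __name__ == "__main__":
    weights, best = random_search()
    print("best return:", best)
```

In this toy setup the best possible policy always moves right, so it is easy to check whether the search found good weights. Swapping plain random sampling for small perturbations of the current best matrix would turn this into hill climbing, which is another common choice.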
Projects solved:
1- CartPole - for info on the environment, see https://github.com/openai/gym/wiki/CartPole-v0