Temporal Difference Methods


Tags: TD Learning, Model-Free Control, Model-Free Prediction

Overview

TD learning methods are model-free and do not require the task to be episodic, so they can be applied to both evaluation (prediction) and control in continuing tasks. They are also computationally convenient compared with the Monte Carlo method: updates are made online, after every step, so there is no need to wait until the end of each episode. TD methods update estimates based partly on other learned estimates, without waiting for a final outcome; that is, they bootstrap. The update target is therefore the observed reward plus the discounted value estimate of the next state, R_{t+1} + γ V(S_{t+1}), rather than the true return.
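The bootstrapped TD(0) prediction update described above can be sketched as follows. This is a minimal tabular illustration under stated assumptions, not code from this repository: the helper name `td0_update`, the dictionary-based value table, and the step size `alpha` and discount `gamma` values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical helper for illustration; not taken from this repository.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one tabular TD(0) update to V in place and return the TD error."""
    # Bootstrapped target: reward plus the discounted estimate of the next state.
    # A terminal next state has value 0 by definition, so nothing is looked up.
    target = r + (0.0 if terminal else gamma * V[s_next])
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error

V = defaultdict(float)  # unseen states start with a value estimate of 0
V["B"] = 1.0
delta = td0_update(V, "A", 0.0, "B", alpha=0.5, gamma=0.9)
print(delta, V["A"])
```

The estimate for `"A"` moves halfway (`alpha=0.5`) towards `0 + 0.9 * V["B"]`, using only the current guess for `"B"` rather than a complete return.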

Which is better, TD or MC, is an open question for debate and research.

Implementations

  • SARSA
  • Q-Learning
  • Comparative study of SARSA and Q-Learning
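SARSA and Q-Learning differ only in the bootstrap target of their action-value update: SARSA (on-policy) uses the next action actually chosen by the ε-greedy behaviour policy, while Q-Learning (off-policy) uses the maximum action value in the next state. The sketch below shows both updates side by side. It is an illustration, not this repository's implementation; the function names, the `(state, action)` keyed table, and the tie-breaking rule in `epsilon_greedy` are assumptions.

```python
import random
from collections import defaultdict

# Hypothetical helpers for illustration; not taken from this repository.
def epsilon_greedy(Q, state, n_actions, epsilon, rng=random):
    """Random action with probability epsilon, else greedy (ties -> lowest index)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [Q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, terminal=False):
    # On-policy: bootstrap from the action a_next the behaviour policy actually chose.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, n_actions, alpha, gamma, terminal=False):
    # Off-policy: bootstrap from the greedy (max-valued) action in s_next,
    # regardless of which action will actually be taken next.
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)
Q[("s1", 0)], Q[("s1", 1)] = 2.0, 5.0
sarsa_update(Q, "s0", 0, -1.0, "s1", 0, alpha=0.5, gamma=1.0)       # target -1 + 2 = 1
q_learning_update(Q, "s0", 1, -1.0, "s1", 2, alpha=0.5, gamma=1.0)  # target -1 + 5 = 4
print(Q[("s0", 0)], Q[("s0", 1)])  # 0.5 2.0
```

Given the same transition, SARSA follows the exploratory action it took (value 2.0), whereas Q-Learning always backs up the best available value (5.0).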

Results

  • SARSA on WindyGridworld

  • Q-Learning on Cliff-walk

  • Comparison of Q-Learning and SARSA on Cliff-walk: the red curve is Q-Learning and the blue curve is SARSA.
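A comparison like the Cliff-walk one can be reproduced with the sketch below. The environment and hyperparameters used in this repository are not shown here, so everything in the block is an assumption: it uses the standard cliff-walking layout from Sutton and Barto's textbook (4×12 grid, start at the bottom-left, goal at the bottom-right, reward -1 per step, -100 and a reset to the start for stepping into the cliff), and the names `cliff_step` and `run` plus the values `alpha=0.5`, `epsilon=0.1`, `episodes=500` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical cliff-walking setup; layout follows the textbook example.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left

def cliff_step(state, action):
    """One move in the cliff-walking grid; returns (next_state, reward, done)."""
    r, c = state
    dr, dc = MOVES[action]
    r = min(max(r + dr, 0), ROWS - 1)  # moves off the grid leave the agent at the edge
    c = min(max(c + dc, 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:  # stepped into the cliff
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL

def choose(Q, s, epsilon, rng):
    # epsilon-greedy over the 4 moves, ties broken by lowest index
    if rng.random() < epsilon:
        return rng.randrange(4)
    values = [Q[(s, a)] for a in range(4)]
    return values.index(max(values))

def run(method, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Train with 'sarsa' or 'q_learning'; return the reward sum of every episode."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    returns = []
    for _ in range(episodes):
        s, total, done = START, 0.0, False
        a = choose(Q, s, epsilon, rng)
        while not done:
            s_next, r, done = cliff_step(s, a)
            total += r
            a_next = choose(Q, s_next, epsilon, rng)
            if done:
                target = r  # the goal is terminal, so there is nothing to bootstrap from
            elif method == "sarsa":
                target = r + gamma * Q[(s_next, a_next)]
            else:  # q_learning
                target = r + gamma * max(Q[(s_next, b)] for b in range(4))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next
        returns.append(total)
    return returns

sarsa_returns = run("sarsa")
q_returns = run("q_learning")
# Average online return over the last 100 episodes of each method.
print(sum(sarsa_returns[-100:]) / 100, sum(q_returns[-100:]) / 100)
```

Plotting `sarsa_returns` and `q_returns` per episode gives curves of the kind described above; the exact numbers depend on the seed and hyperparameters, so none are claimed here.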

Resources