TD learning methods are model-free and do not require the task to be episodic, so they can be used for both evaluation and control of continuing tasks. They are computationally cheaper than Monte Carlo methods because updates are made online at every step; there is no need to wait for the end of an episode. TD methods update estimates based in part on other learned estimates, without waiting for a final outcome (they bootstrap): the target is the reward plus the discounted value estimate of the next state, not the true return.
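The TD(0) prediction update described above can be sketched as follows. The random-walk environment here is a hypothetical toy stand-in used only for illustration, not one of the environments in this repo; the assumed setup is states 0–6 with 0 and 6 terminal and a reward of +1 for reaching state 6.

```python
import random

# Hypothetical 1-D random-walk environment (illustration only):
# states 0..6, states 0 and 6 terminal, reward +1 on reaching state 6.
def step(state):
    next_state = state + random.choice([-1, 1])
    reward = 1.0 if next_state == 6 else 0.0
    done = next_state in (0, 6)
    return next_state, reward, done

def td0_evaluate(episodes=5000, alpha=0.1, gamma=1.0):
    V = [0.0] * 7  # value estimate per state; terminal states stay 0
    for _ in range(episodes):
        s = 3  # start in the middle
        done = False
        while not done:
            s2, r, done = step(s)
            # Bootstrapped target: reward plus estimated value of next state,
            # updated online at every step (no waiting for the episode to end).
            target = r if done else r + gamma * V[s2]
            V[s] += alpha * (target - V[s])
            s = s2
    return V
```

Running `td0_evaluate()` should drive the estimates for states 1–5 toward the true values 1/6, 2/6, …, 5/6.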
Whether TD or MC is better remains an open question for debate and research.
- SARSA
- Q-Learning
- Comparative study of SARSA and Q-Learning
- SARSA on WindyGridworld
- Q-Learning on Cliff-walk
- Comparison of Q-Learning and SARSA on Cliff-walk
The red curve is from Q-Learning and the blue one from SARSA.
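The difference between the two control methods compared above comes down to their update targets. A minimal sketch, assuming a tabular Q stored as a dict keyed by (state, action) pairs (the function names and the epsilon-greedy helper are illustrative, not this repo's API):

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, s, actions, eps):
    """Pick a random action with probability eps, else the greedy one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma, done):
    # SARSA is on-policy: the target uses Q(s', a') for the action a'
    # actually selected by the behaviour policy in s'.
    target = r if done else r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha, gamma, done):
    # Q-learning is off-policy: the target uses the greedy (max) action
    # in s', regardless of which action the behaviour policy takes.
    target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

This on-policy versus off-policy distinction is what produces the different paths on Cliff-walk: SARSA's target accounts for the exploratory actions it will actually take, so it learns the safer path, while Q-learning learns the greedy path along the cliff edge.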
- For detailed proofs and theory, refer to Sutton and Barto.
- For an explanation of the environments, visit the GitHub page.


