Monday, December 17, 2018

[Reinforcement Learning] Get started to learn Sarsa(lambda λ) for reinforcement learning

Once you know how the Sarsa algorithm works, you can move on to the Sarsa(lambda λ) algorithm.
This post is mainly based on the following tutorials (written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/3-3-A-sarsa-lambda/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/3-3-tabular-sarsa-lambda/
https://zhuanlan.zhihu.com/p/28108498

The Sarsa(lambda λ) algorithm looks like this:
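As a minimal sketch of the tabular update, assuming a NumPy Q-table indexed by integer states and actions and accumulating eligibility traces (the referenced tutorials may use a replacing-trace variant instead), one step could be written as follows. The function name `sarsa_lambda_step` and the default values for `alpha`, `gamma` and `lam` are illustrative choices, not taken from the tutorials.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=0.9, lam=0.9):
    """Apply one Sarsa(lambda) update in place (accumulating traces)."""
    # TD error: bootstrap from Q(s', a') unless the episode has ended
    target = r if done else r + gamma * Q[s_next, a_next]
    delta = target - Q[s, a]
    E[s, a] += 1.0          # mark the pair just visited as eligible
    Q += alpha * delta * E  # update every pair in proportion to its trace
    E *= gamma * lam        # decay all traces for the next step
    return delta

# Toy episode (hypothetical): 3 states, 2 actions, reward only on the last step
Q = np.zeros((3, 2))
E = np.zeros((3, 2))
sarsa_lambda_step(Q, E, s=0, a=1, r=0.0, s_next=1, a_next=0, done=False)
sarsa_lambda_step(Q, E, s=1, a=0, r=1.0, s_next=2, a_next=0, done=True)
print(Q)
```

In this toy episode the rewarded pair (1, 0) receives the full update, and the earlier pair (0, 1) receives a smaller share of the same TD error through its decayed trace. The trace table `E` would normally be reset to zero at the start of each episode.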




I think Sarsa and Sarsa(lambda) differ mainly in two ways:
1. Lambda (from 0 to 1):

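  • If lambda = 0, Sarsa(lambda) reduces to one-step Sarsa: each TD error updates only the most recently visited state-action pair, so a reward reaches only the step that earned it.
  • If lambda = 1, every state-action pair visited in the episode is updated along with the last step that gains the reward, and the traces decay only through the discount factor gamma.
  • If lambda = 0.5, every visited pair is still updated, but its trace shrinks by a factor of gamma * lambda at each step, so the steps closest to the reward receive the largest update and earlier steps receive progressively less.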

2. Eligibility Trace:
  • It keeps track of which state-action pairs were visited and how recently. When a reward (TD error) arrives, the trace determines how much of it is applied to each pair's entry in the Q-table.
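To make the effect of lambda concrete, here is a small sketch of the weight that a pair visited k steps before the reward would carry, assuming accumulating traces, each pair visited once, and gamma = 1 so that only lambda matters. The helper name `trace_weights` is hypothetical.

```python
def trace_weights(lam, gamma=1.0, steps=5):
    """Trace of a pair visited k steps before the reward, for k = 0..steps-1."""
    return [(gamma * lam) ** k for k in range(steps)]

w0 = trace_weights(0.0)     # only the last step is updated
w_half = trace_weights(0.5)  # every step is updated, with halving weight
w1 = trace_weights(1.0)     # every step is updated with full weight

print(w0)      # [1.0, 0.0, 0.0, 0.0, 0.0]
print(w_half)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
print(w1)      # [1.0, 1.0, 1.0, 1.0, 1.0]
```

With a discount factor below 1, the weights for lambda = 1 would also decay, but only by gamma per step.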

I made a diagram that briefly explains the Sarsa(lambda λ) algorithm; it can be compared with the one in my previous post about the Sarsa algorithm:
[Reinforcement Learning] Get started to learn Sarsa for reinforcement learning