Saturday, December 22, 2018

[Reinforcement Learning] Getting started with Actor-Critic for reinforcement learning

Actor-Critic basically combines a Policy Gradient algorithm (the Actor) with a Function Approximation algorithm (the Critic). The Actor acts according to the action probabilities given by its policy, while the Critic judges the Actor's performance and gives it a score. The Actor then improves its policy probabilities based on the Critic's judgment and score. The following diagram shows the concept:




Actor-Critic can update its policy at every step, which is faster than Policy Gradients (which update only once per episode). However, it can be harder to get it to converge.
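The per-step update can be sketched in plain Python. This is my own toy example, not from the linked tutorial: a one-state problem with two actions, where action 0 always pays reward 1.0 and action 1 pays 0.0. The Actor is a softmax over action preferences, the Critic is a single value estimate, and the Critic's TD error is the "score" that drives both updates at every step.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    """Toy one-state task (my assumption): action 0 pays 1.0, action 1 pays 0.0."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # Actor: softmax action preferences
    value = 0.0          # Critic: value estimate of the single state
    for _ in range(steps):
        pi = softmax(prefs)
        a = 0 if rng.random() < pi[0] else 1
        r = 1.0 if a == 0 else 0.0
        # Critic scores the Actor via the TD error
        # (the episode ends after one step, so there is no bootstrap term)
        td_error = r - value
        value += critic_lr * td_error
        # Actor update: follow the gradient of log pi(a), scaled by the score
        for b in range(2):
            grad = (1.0 if b == a else 0.0) - pi[b]
            prefs[b] += actor_lr * td_error * grad
    return softmax(prefs), value

policy, value = train()
```

After training, the policy should strongly prefer action 0 and the Critic's value should approach the expected reward under that policy.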

Here is a short and clear explanation of Actor-Critic:
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/6-1-A-AC/

Check out its code example to understand it a little more quickly:
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/8_Actor_Critic_Advantage/AC_CartPole.py
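The core of that advantage actor-critic example can be summarized in a few lines. This is a hedged sketch of the general idea (exact variable and method names in the linked file may differ): the Critic minimizes the squared TD error, and the Actor maximizes log-probability of the taken action weighted by that same TD error as the advantage.

```python
import math

GAMMA = 0.9  # discount factor; the linked example uses a similar constant

def td_error(r, v_s, v_s_next, done):
    """Advantage estimate shared by both networks: r + gamma*V(s') - V(s)."""
    target = r + (0.0 if done else GAMMA * v_s_next)
    return target - v_s

def critic_loss(td):
    """Critic minimizes the squared TD error."""
    return td * td

def actor_loss(log_prob_a, td):
    """Actor maximizes log pi(a|s) * advantage, i.e. minimizes its negation."""
    return -log_prob_a * td
```

In the CartPole setting, each environment step produces (s, a, r, s'); the Critic's TD error is computed first and then fed to the Actor as the score for the action it just took.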

I drew a diagram of the example (AC_CartPole.py) for my own reference.



