Danny's tech notebook | 丹尼技術手札

Monday, December 24, 2018

[TensorFlow] My example of using SavedModelBuilder to do inference in TensorFlow

The purpose of this post is to show my example of using SavedModelBuilder to do inference in TensorFlow. In my experiments, this approach saves a model together with a signature that records the input and output node names. The graph can then be restored from the previously saved model pb file and the signature definition. Once the restore is done, inference can be run directly, without needing a GPU device even if the training was done on a GPU.

[Reinforcement Learning] Get started to learn Actor Critic for reinforcement learning

Actor-Critic basically combines a Policy Gradient based algorithm (the Actor) with a Function Approximation based algorithm (the Critic). The Actor acts according to the probabilities given by its policy, and the Critic judges the Actor's performance and gives it a score. The Actor then adjusts the probabilities given by its policy based on the Critic's judgment and score. The following diagram shows the concept:
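
In code, the same loop could be sketched roughly as follows. This is only a minimal tabular illustration, not the code from the post: the 5-state corridor environment, the softmax actor, the learning rates and all function names are my own assumptions.

```python
import math
import random

# Hypothetical toy environment: a corridor of states 0..4, start at 0,
# state 4 is the terminal goal and gives reward +1.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = left, 1 = right
GAMMA = 0.9


def softmax(prefs):
    """Turn action preferences into probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action):
    """Move left or right; moving left at state 0 stays in place."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train_actor_critic(episodes=2000, alpha_actor=0.1, alpha_critic=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # Actor: policy preferences
    values = [0.0] * N_STATES                      # Critic: state values
    for _episode in range(episodes):
        state = 0
        for _step in range(200):  # cap the episode length
            probs = softmax(prefs[state])
            action = rng.choices(ACTIONS, weights=probs)[0]
            nxt, reward, done = step(state, action)
            # Critic's score: the TD error of the transition just taken.
            target = reward + (0.0 if done else GAMMA * values[nxt])
            td_error = target - values[state]
            values[state] += alpha_critic * td_error
            # Actor: raise the probability of the action if the score is positive.
            for a in ACTIONS:
                grad = (1.0 if a == action else 0.0) - probs[a]
                prefs[state][a] += alpha_actor * td_error * grad
            if done:
                break
            state = nxt
    return prefs, values


prefs, values = train_actor_critic()
```

Here the TD error plays the role of the Critic's score, and the Actor only ever sees that score, never the raw reward.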

[Reinforcement Learning] Get started to learn Sarsa(lambda λ) for reinforcement learning

Once you know what the Sarsa algorithm is, you can continue on to the Sarsa(lambda λ) algorithm.
I mainly refer to these tutorial documents (written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/3-3-A-sarsa-lambda/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/3-3-tabular-sarsa-lambda/
https://zhuanlan.zhihu.com/p/28108498

The Sarsa(lambda λ) algorithm looks like this:
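
As a rough sketch of a single update step, assuming a dictionary-based Q-table and accumulating eligibility traces (the tutorials above may use a replacing trace instead), it could be written like this. The function name, the state/action encoding and the hyperparameter values are hypothetical.

```python
from collections import defaultdict


def sarsa_lambda_update(q, traces, s, a, r, s_next, a_next, done,
                        alpha=0.1, gamma=0.9, lam=0.9):
    """Apply one Sarsa(lambda) update to q in place and return the TD error."""
    target = r if done else r + gamma * q[(s_next, a_next)]
    td_error = target - q[(s, a)]
    traces[(s, a)] += 1.0  # accumulating trace: mark the visited pair
    for key in list(traces):
        # Every recently visited pair shares the TD error, weighted by its trace.
        q[key] += alpha * td_error * traces[key]
        traces[key] *= gamma * lam  # the trace fades over time
    return td_error


# Tiny two-step episode: (0, "right") -> (1, "right") -> terminal with reward 1.
q = defaultdict(float)
traces = defaultdict(float)
sarsa_lambda_update(q, traces, 0, "right", 0.0, 1, "right", done=False)
sarsa_lambda_update(q, traces, 1, "right", 1.0, None, None, done=True)
# The last pair gets about 0.1 and the earlier pair about 0.081,
# because its trace had already decayed by gamma * lam.
```

Unlike plain Sarsa, the reward at the end of the episode also updates the earlier step, which is the point of the trace.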

[Reinforcement Learning] Get started to learn Sarsa for reinforcement learning

If you take a look at the Sarsa algorithm, you will find that it is very similar to Q-Learning.
For my previous post about Q-Learning, please refer to this link:
https://danny270degree.blogspot.com/2018/11/reinforcement-learning-get-started-to_21.html

Here is the Sarsa algorithm:
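
To make the similarity with Q-Learning concrete, here is a minimal sketch of the two update rules side by side. The nested-dictionary Q-table, the function names and the numbers are assumptions for illustration, and terminal-state handling is left out for brevity.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: the target uses the action a_next that will actually be taken."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: the target uses the best action in s_next, taken or not."""
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])


def fresh_table():
    return {"A": {"left": 0.0, "right": 0.0},
            "B": {"left": 1.0, "right": 2.0}}


q_sarsa = fresh_table()
sarsa_update(q_sarsa, "A", "right", 0.0, "B", "left")  # next action is "left"

q_qlearn = fresh_table()
q_learning_update(q_qlearn, "A", "right", 0.0, "B")    # uses max over "B"
# q_sarsa["A"]["right"] is about 0.09, q_qlearn["A"]["right"] is about 0.18.
```

The only difference is the target: Sarsa bootstraps from the next action it really chose, while Q-Learning bootstraps from the greedy one.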

[Reinforcement Learning] Using dynamic programming to solve a simple GridWorld with 4X4

I borrow the example and its source code from here, which uses dynamic programming to solve a simple 4X4 GridWorld, and I add my explanation of how the value function is calculated. I hope this helps you understand dynamic programming and the Markov Reward Process (MRP) more quickly.
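
The value function calculation can be sketched as iterative policy evaluation. This is not the borrowed source code: I am assuming the classic textbook setup (terminal states in the top-left and bottom-right corners, a reward of -1 per move, a uniform random policy, no discounting), and all names are hypothetical.

```python
GRID = 4
TERMINALS = {0, GRID * GRID - 1}            # assumed: two opposite corners
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def next_state(state, move):
    """Return the state reached by a move; bumping into a wall stays in place."""
    row, col = divmod(state, GRID)
    r, c = row + move[0], col + move[1]
    if 0 <= r < GRID and 0 <= c < GRID:
        return r * GRID + c
    return state


def evaluate_random_policy(gamma=1.0, theta=1e-10):
    """Repeat the Bellman expectation backup until the values stop changing."""
    values = [0.0] * (GRID * GRID)
    while True:
        new_values = values[:]
        delta = 0.0
        for s in range(GRID * GRID):
            if s in TERMINALS:
                continue
            # Each of the 4 moves has probability 0.25 and reward -1.
            new_values[s] = sum(
                0.25 * (-1.0 + gamma * values[next_state(s, m)]) for m in MOVES
            )
            delta = max(delta, abs(new_values[s] - values[s]))
        values = new_values
        if delta < theta:
            return values


values = evaluate_random_policy()
for row in range(GRID):
    print([round(v, 1) for v in values[row * GRID:(row + 1) * GRID]])
```

Under these assumptions the first row converges to roughly 0, -14, -20, -22, which matches the hand calculation for the state next to the corner: -1 + 0.25 * (0 - 20 - 18 - 14) = -14.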

[Reinforcement Learning] Get started to learn DQN for reinforcement learning

The previous post about Q-Learning is here:
[Reinforcement Learning] Get started to learn Q-Learning for reinforcement learning

Basically, Deep Q-Learning (DQN) is an upgrade of the Q-Learning algorithm in which the Q-table is replaced by a neural network. For the DQN tutorial, I refer to the following (sorry, they are written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-1-A-DQN/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-1-DQN1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-2-DQN2/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-3-DQN3/
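
To illustrate what replacing the Q-table means, here is a small sketch of how the training targets could be computed when Q-values come from a function instead of a table lookup. It is not a full DQN: the "network" is just a stand-in Python function, and the names and numbers are made up for illustration.

```python
def td_targets(batch, target_q, gamma=0.9):
    """Compute the Q-Learning targets that the network would be trained toward.

    batch holds (state, action, reward, next_state, done) transitions and
    target_q(state) returns a list with one Q-value per action.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)  # nothing to bootstrap from at the end
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets


def toy_q_function(state):
    """Stand-in for a neural network: maps a state to Q-values for 2 actions."""
    return [0.5 * state, 1.0 * state]


batch = [
    (0, 1, 1.0, 1, False),  # the episode continues, so bootstrap from state 1
    (1, 0, 2.0, 2, True),   # terminal transition, the target is the reward
]
targets = td_targets(batch, toy_q_function)
# targets is about [1.9, 2.0]
```

In a real DQN, toy_q_function would be a neural network and these targets would be used as regression labels for the Q-value of the action taken in each transition.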

[Reinforcement Learning] Get started to learn Q-Learning for reinforcement learning

The previous post about reinforcement learning:
[Reinforcement Learning] Get started to learn policy gradient method for reinforcement learning

For the Q-Learning tutorial, I refer to the following (sorry, they are written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-2-A-q-learning/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-2-tabular-q1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-3-tabular-q2/

[Reinforcement Learning] Get started to learn policy gradient method for reinforcement learning

This post is about my first time learning the policy gradient method for reinforcement learning. There are already a lot of materials on the internet, but this time I only want to focus on the following tutorial (sorry, it is written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/5-1-policy-gradient-softmax1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/5-2-policy-gradient-softmax2/

Monday, December 24, 2018

[TensorFlow] My example of using SavedModelBuilder to do inference in TensorFlow

Saturday, December 22, 2018

[Reinforcement Learning] Get started to learn Actor Critic for reinforcement learning

Monday, December 17, 2018

[Reinforcement Learning] Get started to learn Sarsa(lambda λ) for reinforcement learning

Friday, December 14, 2018

[Reinforcement Learning] Get started to learn Sarsa for reinforcement learning

Thursday, December 13, 2018

[Reinforcement Learning] Using dynamic programming to solve a simple GridWorld with 4X4

Wednesday, November 28, 2018

[Reinforcement Learning] Get started to learn DQN for reinforcement learning

Thursday, November 22, 2018

[Reinforcement Learning] Get started to learn Q-Learning for reinforcement learning

Wednesday, November 21, 2018

[Reinforcement Learning] Get started to learn policy gradient method for reinforcement learning