Recently I studied the paper "Domain-Adversarial Training of Neural Networks", which can be downloaded from http://jmlr.org/papers/volume17/15-239/15-239.pdf
The key point of the paper is the "Gradient Reversal Layer", which connects the feature extractor to the Discriminator. I found source code that implements it by replacing the Identity op's gradient function as follows:
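To make the idea concrete, here is a minimal, framework-free sketch of what a gradient reversal layer does. It is not the TensorFlow source mentioned above. The class name `GradientReversal` and the parameter `lam` are my own illustrative assumptions. The forward pass is an identity, and the backward pass multiplies the incoming gradient by `-lam`.

```python
class GradientReversal:
    """Illustrative (hypothetical) layer: identity forward, reversed gradient backward."""

    def __init__(self, lam=1.0):
        # lam scales the reversed gradient; its name and default are assumptions.
        self.lam = lam

    def forward(self, features):
        # Forward pass: behave exactly like an identity op.
        return list(features)

    def backward(self, upstream_grads):
        # Backward pass: flip the sign of the gradient coming from the
        # discriminator before it reaches the feature extractor.
        return [-self.lam * g for g in upstream_grads]


grl = GradientReversal(lam=0.5)
out = grl.forward([1.0, -2.0, 3.0])       # unchanged features
grads = grl.backward([0.2, 0.4, -1.0])    # gradients with flipped sign, scaled by lam
```

Because of the sign flip, a step that lowers the discriminator's loss pushes the feature extractor in the opposite direction, which is what makes the training adversarial.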
Friday, January 4, 2019
Thursday, January 3, 2019
[TensorFlow] How to generate the Memory Report from Grappler?
In the previous post, I introduced how to generate the cost and model reports from Grappler.
https://danny270degree.blogspot.com/2019/01/tensorflow-how-to-generate-cost-and.html
In this post, I continue with the memory report, which I find very useful. Please refer to my previous post to get the model code.
[TensorFlow] How to generate the Cost and Model Report from Grappler?
Generally speaking, Grappler in TensorFlow has several optimizers, each targeting a specific area, such as reducing peak memory usage on the GPU. In this post I want to introduce some useful functions inside Grappler that are used by the Simple Placer mechanism. These functions are also partially used in Grappler's optimizers.
Monday, December 24, 2018
[TensorFlow] My example of using SavedModelBuilder to do inference in TensorFlow
The purpose of this post is to show my example of using SavedModelBuilder to do inference in TensorFlow. In my experiment, this approach saves a model together with a signature that records the input and output node names. The graph can then be restored from the previously saved model pb file and the signature definition. Once the restore is done, the inference task can be executed directly without a GPU device, even if the training task ran on a GPU device.
Saturday, December 22, 2018
[Reinforcement Learning] Get started to learn Actor Critic for reinforcement learning
Actor-Critic combines a Policy Gradient based algorithm (Actor) with a Function Approximation based algorithm (Critic). The Actor acts according to the probabilities given by its policy, and the Critic judges the Actor's performance and gives it a score. The Actor then adjusts its policy probabilities based on the Critic's score. The following diagram is the concept:
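In code, the same loop can be sketched on a toy problem. This is a minimal illustration under my own assumptions, not the implementation from the post: the environment is a single state with two actions (action 1 always pays 1, action 0 pays 0), the Actor is a softmax over two preferences, and the Critic is a single scalar value instead of a real function approximator. All names, such as `train_actor_critic`, are hypothetical.

```python
import math
import random


def softmax(prefs):
    # Numerically stable softmax over action preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(steps=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    rng = random.Random(seed)
    rewards = [0.0, 1.0]   # toy environment (assumption): action 1 is better
    prefs = [0.0, 0.0]     # Actor: policy parameters (action preferences)
    value = 0.0            # Critic: estimated value of the single state

    for _ in range(steps):
        policy = softmax(prefs)
        action = 0 if rng.random() < policy[0] else 1
        reward = rewards[action]

        # Critic's score: the TD error. Each episode ends after one step,
        # so there is no next-state term.
        td_error = reward - value
        value += critic_lr * td_error

        # Actor update: policy-gradient step scaled by the Critic's score.
        for a in range(len(prefs)):
            grad_log_pi = (1.0 if a == action else 0.0) - policy[a]
            prefs[a] += actor_lr * td_error * grad_log_pi

    return softmax(prefs), value


policy, value = train_actor_critic()
```

After training, `policy[1]` should be close to 1, because every update driven by the Critic's score moves probability toward the better action.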
Monday, December 17, 2018
[Reinforcement Learning] Get started to learn Sarsa(lambda λ) for reinforcement learning
Once you know what the Sarsa algorithm is, you can move on to the Sarsa(lambda λ) algorithm.
I mainly refer to these tutorials (written in Chinese):
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/3-3-A-sarsa-lambda/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/3-3-tabular-sarsa-lambda/
https://zhuanlan.zhihu.com/p/28108498
The Sarsa(lambda λ) algorithm looks like this:
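As a rough sketch, tabular Sarsa(λ) can be written as follows. The environment here is my own assumption, not the one from the tutorials: a 5-state corridor where the agent starts at state 0 and receives a reward of 1 on reaching state 4. I use replacing traces (the trace of the visited state-action pair is set to 1). The function names and hyperparameters are illustrative.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal (assumed toy task)
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right


def choose_action(q, state, epsilon, rng):
    # Epsilon-greedy selection with random tie-breaking.
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def step(state, action):
    # Move along the corridor; walls clamp the position.
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def sarsa_lambda(episodes=200, alpha=0.1, gamma=0.9, lam=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        trace = [[0.0, 0.0] for _ in range(N_STATES)]  # reset every episode
        state = 0
        action = choose_action(q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = choose_action(q, next_state, epsilon, rng)
            target = reward if done else reward + gamma * q[next_state][next_action]
            td_error = target - q[state][action]
            trace[state][action] = 1.0  # replacing trace for the visited pair
            # Every pair is updated in proportion to its trace, then decayed.
            for s in range(N_STATES):
                for a in range(len(ACTIONS)):
                    q[s][a] += alpha * td_error * trace[s][a]
                    trace[s][a] *= gamma * lam
            state, action = next_state, next_action
    return q


q_table = sarsa_lambda()
```

The difference from plain Sarsa is the inner double loop: one TD error updates all recently visited state-action pairs at once, weighted by how recently they were visited.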
Friday, December 14, 2018
[Reinforcement Learning] Get started to learn Sarsa for reinforcement learning
If you take a look at the Sarsa algorithm, you will find that it is very similar to Q-Learning.
For my previous post about Q-Learning, please refer to this link:
https://danny270degree.blogspot.com/2018/11/reinforcement-learning-get-started-to_21.html
Here is the Sarsa algorithm:
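To show how close the two are, here is a minimal sketch of the two tabular update rules side by side. The function names, the nested-list Q-table layout, and the default `alpha` and `gamma` are my own assumptions. The only difference is the target: Sarsa uses the action actually chosen in the next state, while Q-Learning uses the best action in the next state.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action the agent will really take next.
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the greedy action in the next state.
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# Two states, two actions each; values chosen only for illustration.
q_sarsa = [[0.0, 0.0], [1.0, 2.0]]
q_qlearn = [[0.0, 0.0], [1.0, 2.0]]

sarsa_update(q_sarsa, s=0, a=1, r=1.0, s_next=1, a_next=0)   # target = 1 + 0.9 * 1.0
q_learning_update(q_qlearn, s=0, a=1, r=1.0, s_next=1)       # target = 1 + 0.9 * 2.0
```

With the same transition, Sarsa moves `q[0][1]` toward 1.9 and Q-Learning moves it toward 2.8, because the next action picked by the agent was not the greedy one.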
Thursday, December 13, 2018
[Reinforcement Learning] Using dynamic programming to solve a simple GridWorld with 4X4
I borrow the example and its source code from here, which uses dynamic programming to solve a simple 4X4 GridWorld, and I add my explanation of how the value function is calculated. I hope it helps you understand dynamic programming and the Markov Reward Process (MRP) more quickly.
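For reference, the value function calculation can be sketched as iterative policy evaluation. This is not the borrowed source code. It assumes the common textbook setup: terminal states in the top-left and bottom-right corners, a reward of -1 per step, no discounting, and a policy that picks each of the four moves with probability 0.25. The function name and parameters are hypothetical.

```python
def evaluate_random_policy(size=4, gamma=1.0, theta=1e-6):
    n = size * size
    terminals = {0, n - 1}                      # assumed terminal corners
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    values = [0.0] * n
    while True:
        new_values = values[:]
        delta = 0.0
        for s in range(n):
            if s in terminals:
                continue
            row, col = divmod(s, size)
            total = 0.0
            for dr, dc in moves:
                r2, c2 = row + dr, col + dc
                # Moving off the grid leaves the agent where it is.
                if not (0 <= r2 < size and 0 <= c2 < size):
                    r2, c2 = row, col
                # Bellman expectation backup: reward -1 plus next state's value.
                total += 0.25 * (-1.0 + gamma * values[r2 * size + c2])
            new_values[s] = total
            delta = max(delta, abs(total - values[s]))
        values = new_values
        if delta < theta:   # stop when a full sweep changes almost nothing
            return values


grid_values = evaluate_random_policy()
for r in range(4):
    print([round(v) for v in grid_values[r * 4:(r + 1) * 4]])
```

Each sweep replaces a state's value with the average over the four moves of "reward plus value of the landing state", and the sweeps repeat until the values stop changing.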
