Thursday, December 13, 2018

[Reinforcement Learning] Get started to learn Sarsa for reinforcement learning

If you take a look at the Sarsa algorithm, you will find that it is very similar to Q-Learning.
For my previous post about Q-Learning, please refer to this link:
https://danny270degree.blogspot.com/2018/11/reinforcement-learning-get-started-to_21.html

Here is the Sarsa algorithm:

The difference between Sarsa and Q-Learning is actually quite small. Please compare the following two graphs:
Q-Learning:


Sarsa:

We consider Sarsa an on-policy strategy because the agent actually takes the action it picks for the next state. Another difference is the epsilon-greedy policy for choosing actions.
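To make the contrast concrete, here is a minimal sketch (with a hypothetical Q-table, learning rate, and discount factor, not taken from any demo) of the two update rules side by side: Q-Learning bootstraps with the greedy max over the next state's actions, while Sarsa bootstraps with the action the agent actually picks (e.g., via epsilon-greedy) for the next state.

import numpy as np

alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = np.zeros((16, 4))                      # 16 states x 4 actions

def epsilon_greedy(s):
    # pick a random action with probability epsilon, otherwise the greedy one
    if np.random.rand() < epsilon:
        return np.random.randint(4)
    return int(np.argmax(Q[s]))

def q_learning_update(s, a, r, s_next):
    # off-policy: bootstrap with the max over the next state's actions
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # on-policy: bootstrap with the action a_next that will actually be taken
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])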

Wednesday, December 12, 2018

[Reinforcement Learning] Using dynamic programming to solve a simple GridWorld with 4X4

I borrow the example and its source code from here, which uses dynamic programming to solve a simple 4x4 GridWorld, and add my explanation of how the value function is calculated. I hope this will help you understand dynamic programming and the Markov Reward Process (MRP) more quickly.

#!/usr/bin/python
# -*- coding: utf-8 -*-
# coding=utf-8

'''
Implementation of small grid world example illustrated by David Silver
in his Reinforcement Learning Lecture3 - Planning by Dynamic 
Programming. 
Author: Qiang Ye
Date: July 1, 2017

The value function converges to:
 0.00 -14.00 -20.00 -22.00 
-14.00 -18.00 -20.00 -20.00 
-20.00 -20.00 -18.00 -14.00 
-22.00 -20.00 -14.00   0.00 
At Iterate No.153
'''
# id of the states, 0 and 15 are terminal states
states = [i for i in range(16)]
#  0* 1  2   3  
#  4  5  6   7
#  8  9  10  11
#  12 13 14  15*

# initial values of states
values = [0  for _ in range(16)]

# Action
actions = ["n", "e", "s", "w"]

# State changed by action
# use a dictionary for convenient computation of next state id.
ds_actions = {"n": -4, "e": 1, "s": 4, "w": -1}  

# discount factor
gamma = 1.00

# get the next state based on the current state and action
def nextState(s, a):
  next_state = s
  if (s%4 == 0 and a == "w") or (s<4 and a == "n") or \
     ((s+1)%4 == 0 and a == "e") or (s > 11 and a == "s"):
    pass
  else:
    ds = ds_actions[a]
    next_state = s + ds
  return next_state

# reward of a state
def rewardOf(s):
  return 0 if s in [0,15] else -1

# check if a state is terminate state
def isTerminateState(s):
  return s in [0,15]

# get successor states of a given state s
def getSuccessors(s):
  successors = []
  if isTerminateState(s):
    return successors
  for a in actions:
    next_state = nextState(s, a)
    # if s != next_state:
    successors.append(next_state)
  return successors

# update the value of state s
def updateValue(s):
  successors = getSuccessors(s)
  newValue = 0  # values[s]
  num = 4       # len(successors)
  reward = rewardOf(s)
  for next_state in successors:
    newValue += 1.00/num * (reward + gamma * values[next_state])
  return newValue

# perform one-step iteration
def performOneIteration():
  newValues = [0 for _ in range(16)]
  for s in states:
    newValues[s] = updateValue(s)
  global values
  values = newValues
  printValue(values)

# show some array info of the small grid world
def printValue(v):
  for i in range(16):
    print('{0:>6.2f}'.format(v[i]), end="")  # print without a newline so 4 values appear per row
    if (i+1)%4 == 0:
      print("")
  print()

# test function
def test():
  printValue(states)
  printValue(values)
  for s in states:
    reward = rewardOf(s)
    for a in actions:
      next_state = nextState(s, a)
      print("({0}, {1}) -> {2}, with reward {3}".format(s, a,next_state, reward))

  for i in range(200):
    performOneIteration()
    printValue(values)

def main():
  max_iterate_times = 160
  cur_iterate_times = 0
  while cur_iterate_times <= max_iterate_times:
    print("Iterate No.{0}".format(cur_iterate_times))
    performOneIteration()
    cur_iterate_times += 1
  printValue(values)

if __name__ == '__main__':
  main()

When executing it, you will get the value function of the first and second iterations as shown below:
Value function of Iterate No.0 
  0.00  -1.00  -1.00  -1.00
 -1.00  -1.00  -1.00  -1.00
 -1.00  -1.00  -1.00  -1.00
 -1.00  -1.00  -1.00   0.00


Value function of Iterate No.1
  0.00  -1.75  -2.00  -2.00
 -1.75  -2.00  -2.00  -2.00
 -2.00  -2.00  -2.00  -1.75
 -2.00  -2.00  -1.75   0.00

Since this is dynamic programming, I only want to explain how the value function for state 1 is generated in the first and second iterations.

In Iterate No.0, the value function of every state is 0.
Given state 1, we can calculate its value function using the Bellman equation for MRP.
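Under the uniform random policy used in the code (each of the 4 actions with probability 1/4), the iterative update can be written as:

v_{k+1}(s) = \sum_{a} \pi(a \mid s)\,\bigl(R(s) + \gamma\, v_k(s')\bigr) = \frac{1}{4} \sum_{s' \in \mathrm{succ}(s)} \bigl(R(s) + \gamma\, v_k(s')\bigr)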
In the source code, this formula is implemented as:
for next_state in sucessors:
    newValue += 1.00/num * (reward + gamma * values[next_state])
As we know, the 4 actions map to the 4 next states below:
(1, n) -> 1, with reward -1    # next state:1
(1, e) -> 2, with reward -1    # next state:2
(1, s) -> 5, with reward -1    # next state:5
(1, w) -> 0, with reward -1   # next state:0

= 1/4 * (-1 + 1 * 0) # next state:1's value = 0
+ 1/4 * (-1 + 1 * 0) # next state:2's value = 0
+ 1/4 * (-1 + 1 * 0) # next state:5's value = 0
+ 1/4 * (-1 + 1 * 0) # next state:0's value = 0
= -1

In Iterate No.1,
we can calculate the value function for state 1 using the same formula and the value functions from Iterate No.0:
= 1/4 * (-1 + 1 * -1) # next state:1's value = -1
+ 1/4 * (-1 + 1 * -1) # next state:2's value = -1
+ 1/4 * (-1 + 1 * -1) # next state:5's value = -1
+ 1/4 * (-1 + 1 * 0)  # next state:0's value = 0
= -1.75
Bingo, the answer is correct!!

Tuesday, November 27, 2018

[Reinforcement Learning] Get started to learn DQN for reinforcement learning

The previous post about Q-Learning is here:
[Reinforcement Learning] Get started to learn Q-Learning for reinforcement learning

Basically, Deep Q-Learning (DQN) is an upgraded version of the Q-Learning algorithm, in which the Q-table is replaced by a neural network. For the DQN tutorial, I refer to the following: (sorry, they are written in Chinese)
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-1-A-DQN/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-1-DQN1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-2-DQN2/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/4-3-DQN3/

And, the DQN demo example is in GitHub:
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/5_Deep_Q_Network

First, let us recall what Q-Learning is. Please refer to the picture below. Assume we are at State1 right now and there are 2 actions. According to the Q-table, we take action 2 because of its higher potential reward. Then, we can update the Q-table entry Q(s1, a2) based on the Q-Learning algorithm.


For DQN, there is something different. At state1, we take the action that has the max Q value from the Evaluation NN. Then, we use the Target NN to get the next state's Q value for updating the NN. (NN means neural network)
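For example, here is a minimal sketch (with Q values similar to the dump further below, but purely illustrative) of picking the action from the Eval NN's output for one state:

import numpy as np

q_eval_s1 = np.array([-0.095, 0.260, 0.251, -0.047])  # Q values the Eval NN outputs for state s1
action = int(np.argmax(q_eval_s1))                     # take the action with the max Q value
print(action)                                          # 1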

In the DQN demo example, I draw this diagram to help understand it more:


Again, I dump some of the data during the training process of DQN as follows. (It can be compared with the diagram above.)

('[DEBUG] s, a, r, _s:', array([0.25, 0.25]), 1, 0, array([0.25, 0.25]))
('[DEBUG] s, a, r, _s:', array([0.25, 0.25]), 1, 0, array([0.25, 0.25]))
('[DEBUG] s, a, r, _s:', array([0.25, 0.25]), 0, 0, array([0.25, 0.  ]))
('[DEBUG] s, a, r, _s:', array([0.25, 0.  ]), 1, 0, array([0.25, 0.25]))
('[DEBUG] s, a, r, _s:', array([0.25, 0.25]), 1, 0, array([0.25, 0.25]))
('[DEBUG] batch_memory:', array([[ 0.  ,  0.25,  0.  ,  1.  ,  0.  ,  0.  ],
       [ 0.  ,  0.25,  1.  ,  0.  ,  0.  ,  0.25],
       [ 0.  ,  0.25,  1.  ,  0.  ,  0.  ,  0.25],
       [ 0.  ,  0.25,  1.  ,  0.  ,  0.  ,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  3.  ,  0.  ,  0.  ,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.  ,  0.25,  1.  ,  0.  ,  0.  ,  0.25],
       [ 0.  ,  0.25,  1.  ,  0.  ,  0.  ,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.  , -0.5 ,  2.  ,  0.  ,  0.25, -0.5 ],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.  ,  0.25,  1.  ,  0.  ,  0.  ,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.  ,  0.25,  1.  ,  0.  ,  0.  ,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.  ,  0.25,  2.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.25,  0.25,  1.  ,  0.  ,  0.25,  0.25],
       [ 0.  ,  0.25,  1.  ,  0.  ,  0.  ,  0.25]]))
('[DEBUG] q_next, q_eval:', array([[-0.06391249,  0.19887918,  0.22256142, -0.02985281],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.01857544,  0.11471414,  0.10633466, -0.11460548],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.04373299,  0.18115267,  0.1792323 , -0.08237151],
       [-0.09851839,  0.26159775,  0.2543864 , -0.04858976]],
      dtype=float32), array([[-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.05664021,  0.16240907,  0.21497497, -0.04578716],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.04104728,  0.17953533,  0.17629224, -0.08140446],
       [-0.09573826,  0.2598316 ,  0.25124246, -0.04765747]],
      dtype=float32))
('[DEBUG] batch_index, eval_act_index:', array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
      dtype=int32), array([0, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 2, 1, 1, 1]))
('[DEBUG] reward, gamma:', array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 0.9)
('[DEBUG] q_target[batch_index, eval_act_index]:', array([1.2003052 , 0.23543797, 0.23543797, 0.23543797, 0.1630374 ,
       0.1630374 , 0.1630374 , 0.23543797, 0.1630374 , 0.1630374 ,
       0.1630374 , 0.1630374 , 0.23543797, 0.23543797, 0.1630374 ,
       0.1630374 , 0.1630374 , 0.10324273, 0.1630374 , 0.1630374 ,
       0.1630374 , 0.1630374 , 0.1630374 , 0.23543797, 0.1630374 ,
       0.1630374 , 0.23543797, 0.1630374 , 0.1630374 , 0.1630374 ,
       0.1630374 , 0.23543797], dtype=float32))


If I choose batch_index[1] = 1 and eval_act_index[1] = 1:
q_next[1,1] = 0.26159775

q_next[1,1]  * gamma + reward = 0.26159775 * 0.9 + 0 = 0.23543797
q_target[1,1] = 0.23543797

So, the q_target[1,1] value is verified.
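To make the arithmetic above concrete, here is a minimal sketch that rebuilds q_target from q_eval and q_next in the same style as the demo; the numbers are copied from the row for batch index 1 in the dump, and the single-row batch is my own simplification.

import numpy as np

gamma = 0.9
q_next = np.array([[-0.09851839, 0.26159775, 0.2543864 , -0.04858976]])  # Target NN output for the next state
q_eval = np.array([[-0.09573826, 0.2598316 , 0.25124246, -0.04765747]])  # Eval NN output for the current state
reward = np.array([0.0])
eval_act_index = np.array([1])        # the action that was actually taken
batch_index = np.arange(1)

q_target = q_eval.copy()
q_target[batch_index, eval_act_index] = reward + gamma * np.max(q_next, axis=1)
print(q_target[0, 1])                 # ~0.23543797, the same as the dumped q_target value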

Wednesday, November 21, 2018

[Reinforcement Learning] Get started to learn Q-Learning for reinforcement learning

The previous post about reinforcement learning:
[Reinforcement Learning] Get started to learn gradient method for reinforcement learning

For the Q-Learning tutorial, I refer to the following: (sorry, they are written in Chinese)
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-2-A-q-learning/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-2-tabular-q1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/2-3-tabular-q2/

And, the Q-Learning demo example is in GitHub:
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/2_Q_Learning_maze

The most important part of it is the Q-Learning algorithm:


Here I am going to dump the data during the training process of Q-Learning so that we can understand the algorithm more quickly.

Suppose I get the dumped data as follows:
current state:'[45.0, 45.0, 75.0, 75.0]'
Action: 1
Reward: -1
Next state: 'terminal'

Current Q_table:
State \ Action            0    1    2    3
[5.0, 5.0, 35.0, 35.0]    0.0  0.0  0.0  0.0
[5.0, 45.0, 35.0, 75.0]   0.0  0.0  0.0  0.0
[45.0, 45.0, 75.0, 75.0]  0.0  0.0  0.0  0.0
terminal                  0.0  0.0  0.0  0.0
We don't need to pay too much attention to the state values in the list because the states are generated by the environment. So, based on this data, how does the Q-Learning update work?
    def learn(self, s, a, r, s_):
        self.check_state_exist(s_)  # check whether s_ already exists in q_table
        q_predict = self.q_table.loc[s, a]
        if s_ != 'terminal':
            q_target = r + self.gamma * self.q_table.loc[s_, :].max()  # next state is not terminal
        else:
            q_target = r  # next state is terminal
        self.q_table.loc[s, a] += self.lr * (q_target - q_predict)  # update the corresponding state-action value
First, we calculate q_predict and q_target, and then update the Q_table, using the Q-Learning algorithm above:
q_predict = 0.0 
q_target = -1

self.q_table.loc[s, a] += 0.01 * (-1 - 0.0)
self.q_table.loc['[45.0, 45.0, 75.0, 75.0]', 1] = 0.0 - 0.01 = -0.01
So, we will get the new Q_table:
Current Q_table:
State \ Action            0     1    2    3
[5.0, 5.0, 35.0, 35.0]    0.0  0.00  0.0  0.0
[5.0, 45.0, 35.0, 75.0]   0.0  0.00  0.0  0.0
[45.0, 45.0, 75.0, 75.0]  0.0 -0.01  0.0  0.0
terminal                  0.0  0.00  0.0  0.0
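For reference, here is a minimal sketch that reproduces this update with a pandas Q-table in the same style as the demo's learn() function; the learning rate 0.01 is the one used above, and gamma = 0.9 is an assumption (it is not actually needed here because the next state is terminal).

import pandas as pd

lr, gamma = 0.01, 0.9
states = ['[5.0, 5.0, 35.0, 35.0]', '[5.0, 45.0, 35.0, 75.0]', '[45.0, 45.0, 75.0, 75.0]', 'terminal']
q_table = pd.DataFrame(0.0, index=states, columns=[0, 1, 2, 3])

s, a, r, s_ = '[45.0, 45.0, 75.0, 75.0]', 1, -1, 'terminal'
q_predict = q_table.loc[s, a]                           # 0.0
q_target = r if s_ == 'terminal' else r + gamma * q_table.loc[s_, :].max()
q_table.loc[s, a] += lr * (q_target - q_predict)        # 0.0 + 0.01 * (-1 - 0.0) = -0.01
print(q_table)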

Tuesday, November 20, 2018

[Reinforcement Learning] Get started to learn policy gradient method for reinforcement learning

This post is about my first time learning the policy gradient method for reinforcement learning. There are already a lot of materials on the internet, but this time I only want to focus on the following tutorial: (sorry, they are written in Chinese)
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/5-1-policy-gradient-softmax1/
https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/5-2-policy-gradient-softmax2/

The CartPole demo source code is in the link below:
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/7_Policy_gradient_softmax


The most important part of it is the Policy Gradient algorithm:


Here I am going to present some dumped data from the demo example, which may help to understand Policy Gradients in this CartPole game.

An episode means the agent plays the game from the beginning through to the end.
Here I will give an example, episode 0:

The actions taken by the agent in this episode:
[0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
The states look like:
array([-0.03290773, -0.01920455,  0.00946775,  0.03302626]), 
array([-0.03329182,  0.17578036,  0.01012827, -0.25665452]),
...
This episode has a total reward of 13.
('episode:', 0, '  reward:', 13)
The rewards in detail:
[1.0, 1.0, 1.0, 1.0, 
 1.0, 1.0, 1.0, 1.0, 
 1.0, 1.0, 1.0, 1.0, 1.0]
We need to convert the rewards into discounted and normalized rewards:
[12.2478977 , 11.36151283, 10.46617457,  9.5617925 ,  
 8.64827525, 7.72553056,  6.79346521,  5.85198506,  
 4.90099501,  3.940399  , 2.9701    ,  1.99      ,  1. ]
Here I draw a diagram to help understand the source code in the demo, for reference.


Now, based on the tutorial, we know that the loss function for policy gradients is as follows:
 # maximizing the total reward (log_p * R) is equivalent to minimizing -(log_p * R), and TF only provides minimizing a loss
 neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=all_act, labels=self.tf_acts) # the -log probability of the chosen action
 # the following way is equivalent:
 # neg_log_prob = tf.reduce_sum(-tf.log(self.all_act_prob)*tf.one_hot(self.tf_acts, self.n_actions), axis=1)
 loss = tf.reduce_mean(neg_log_prob * self.tf_vt)  # (vt = current reward + discounted future reward) guides the gradient descent of the parameters

I dump the neg_log_prob variable for this episode:
[0.62964076, 0.7338653 , 0.62811565, 0.65259093, 
 0.6791367 , 0.7074002 , 0.7367828 , 0.76643014, 
 0.7952844 , 0.5788585 , 0.8118235 , 0.56644785, 0.8317267 ]

Then, we can verify these values by hand:
 # neg_log_prob = tf.reduce_sum(-tf.log(self.all_act_prob)*tf.one_hot(self.tf_acts, self.n_actions), axis=1)


import math
import numpy as np
one_hot_acts = np.array([[1., 0.],
       [0., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.]])

all_act_prob = np.array([[0.53278315, 0.46721685],
       [0.5199501 , 0.48004982],
       [0.53359634, 0.4664037 ],
       [0.520695  , 0.47930512],
       [0.50705457, 0.49294546],
       [0.49292403, 0.50707597],
       [0.47865143, 0.5213486 ],
       [0.4646689 , 0.53533113],
       [0.45145282, 0.5485472 ],
       [0.4394621 , 0.5605378 ],
       [0.44404763, 0.55595237],
       [0.4324621 , 0.56753784],
       [0.43529704, 0.56470305]])

# for neg_log_prob[0]
-math.log((all_act_prob[0]*one_hot_acts[0].transpose())[0])
==> 0.6296407856314258

# for neg_log_prob[1]
-math.log((all_act_prob[1]*one_hot_acts[1].transpose())[1])
==> 0.7338653887995161

# for neg_log_prob[2]
-math.log((all_act_prob[2]*one_hot_acts[2].transpose())[0])
==> 0.6281156434747114
    
So, we can verify that the values are the same:
[0.62964076, 0.7338653 , 0.62811565, 0.65259093, 
 0.6791367 , 0.7074002 , 0.7367828 , 0.76643014, 
 0.7952844 , 0.5788585 , 0.8118235 , 0.56644785, 0.8317267 ]
With TensorFlow, we can train this policy gradient model based on the loss function:
with tf.name_scope('train'):
            self.train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)

In sum, by checking the input data and dumping the intermediate data, I can better understand the policy gradient method for reinforcement learning.

Wednesday, November 14, 2018

[RNN] What are the differences of input and output tensor shapes in dynamic_rnn and static_rnn using TensorFlow

When studying RNN, the first issue I encountered in my program was the shape of the input and output tensors. Shape is very important information for connecting layers. Here I just directly point out the differences in the input/output shapes of static RNN and dynamic RNN.
P.S: If you use Keras to write your RNN model, you won't need to deal with these details.

The short example of Static RNN

Please pay attention to the output shape in the following picture.
batch_size = 32
time_step = 5
input_size = 4
rnn_cell = 20
X = tf.placeholder(tf.float32, shape=[batch_size, time_step, input_size])
x = tf.unstack(X, axis=1)  # a list of time_step tensors, each with shape [batch_size, input_size]
lstm_cell = rnn.BasicLSTMCell(rnn_cell)
outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)  # outputs: a list of time_step tensors, each [batch_size, rnn_cell]
output = outputs[-1]  # the last time step: [batch_size, rnn_cell]

The short example of Dynamic RNN

Please pay attention to the output shape in the following picture.

batch_size = 32
time_step = 5
input_size = 4
rnn_cell = 20
X = tf.placeholder(tf.float32, shape=[batch_size, time_step, input_size])
lstm_cell = rnn.BasicLSTMCell(rnn_cell)
outputs, states = tf.nn.dynamic_rnn(lstm_cell, X, dtype=tf.float32)  # outputs: [batch_size, time_step, rnn_cell]
outputs = tf.transpose(outputs, [1, 0, 2])  # [time_step, batch_size, rnn_cell]
output = outputs[-1]  # the last time step: [batch_size, rnn_cell]

RNN API:
https://www.tensorflow.org/api_docs/python/tf/nn/static_rnn
https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn

Monday, November 12, 2018

[TensorFlow] The explanation of average gradients by example in data parallelism

When studying examples of training a model with multiple GPUs (data parallelism), an average-gradients function always shows up in some form, and here is a simple version:

def average_gradients(tower_grads):
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # Note that each grad_and_vars looks like the following:
        #   ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
        grads = []
        for g, _ in grad_and_vars:
            # Add 0 dimension to the gradients to represent the tower.
            expanded_g = tf.expand_dims(g, 0)

            # Append on a 'tower' dimension which we will average over below.
            grads.append(expanded_g)

        # Average over the 'tower' dimension.
        grad = tf.concat(grads, 0)
        grad = tf.reduce_mean(grad, 0)

        # Keep in mind that the Variables are redundant because they are shared
        # across towers. So .. we will just return the first tower's pointer to
        # the Variable.
        v = grad_and_vars[0][1]
        grad_and_var = (grad, v)
        average_grads.append(grad_and_var)
    return average_grads

The purpose of this function is to grab the gradients of each trainable variable from all the GPUs and average them. Here I use fake data to show what the function average_gradients does and print out the result in detail. I hope this helps readers understand it.
At least, it works for me!

import numpy as np

average_grads = []

# This is the fake data for tower_grads
# we assume it has 3 variables in the model and uses 4 gpus
# so that the tower_grads will look like the following list:
tower_grads = [
[('grad0_gpu0', 'var0_gpu0'), ('grad1_gpu0', 'var1_gpu0') , ('grad2_gpu0', 'var2_gpu0')],
[('grad0_gpu1', 'var0_gpu1'), ('grad1_gpu1', 'var1_gpu1') , ('grad2_gpu1', 'var2_gpu1')],
[('grad0_gpu2', 'var0_gpu2'), ('grad1_gpu2', 'var1_gpu2') , ('grad2_gpu2', 'var2_gpu2')],
[('grad0_gpu3', 'var0_gpu3'), ('grad1_gpu3', 'var1_gpu3') , ('grad2_gpu3', 'var2_gpu3')]]


for grad_and_vars in zip(*tower_grads):
  grads = []
  for g, _ in grad_and_vars:
    # Add 0 dimension to the gradients to represent the tower.
    expanded_g = np.expand_dims(g, 0)
    
    # Append on a 'tower' dimension which we will average over below.
    grads.append(expanded_g)

  # Average over the 'tower' dimension.
  grad = "Avg: " + str(grads)
  print grad
  
  v = grad_and_vars[0][1]
  grad_and_var = (grad, v)
  average_grads.append(grad_and_var)
  
print(average_grads)

<<<grad>>>
Avg: ['grad0_gpu0', 'grad0_gpu1', 'grad0_gpu2', 'grad0_gpu3']
Avg: ['grad1_gpu0', 'grad1_gpu1', 'grad1_gpu2', 'grad1_gpu3']
Avg: ['grad2_gpu0', 'grad2_gpu1', 'grad2_gpu2', 'grad2_gpu3']


<<<average_grads>>>
[
(Avg: ['grad0_gpu0', 'grad0_gpu1', 'grad0_gpu2', 'grad0_gpu3'], 'var0_gpu0'),
(Avg: ['grad1_gpu0', 'grad1_gpu1', 'grad1_gpu2', 'grad1_gpu3'], 'var1_gpu0'),
(Avg: ['grad2_gpu0', 'grad2_gpu1', 'grad2_gpu2', 'grad2_gpu3'], 'var2_gpu0')
]
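Since the fake data above only uses strings, here is a minimal numeric sketch (with hypothetical gradient values) of the same averaging logic, using numpy instead of TensorFlow ops:

import numpy as np

tower_grads = [
    [(np.array([1.0, 2.0]), 'var0'), (np.array([10.0]), 'var1')],   # GPU 0
    [(np.array([3.0, 4.0]), 'var0'), (np.array([20.0]), 'var1')],   # GPU 1
]

average_grads = []
for grad_and_vars in zip(*tower_grads):
    grads = [np.expand_dims(g, 0) for g, _ in grad_and_vars]   # add the 'tower' dimension
    grad = np.concatenate(grads, 0).mean(axis=0)               # average over the towers
    average_grads.append((grad, grad_and_vars[0][1]))          # keep the first tower's variable

print(average_grads)
# [(array([2., 3.]), 'var0'), (array([15.]), 'var1')]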


P.S:
Here is a simple example to show zip function in Python:

accum_slots = [ "accm_g1", "accm_g2", "accm_g3", "accm_g4", "accm_g5", "accm_g6", "accm_g7"]
grads_and_vars = [ ("g1", "v1"), ("g2", "v2"), ("g3", "v3"), ("g4", "v4"), ("g5", "v5"), ("g6", "v6"), ("g7", "v7")]

for s, (g, _) in zip(accum_slots, grads_and_vars):
  print(s, g)

('accm_g1', 'g1')
('accm_g2', 'g2')
('accm_g3', 'g3')
('accm_g4', 'g4')
('accm_g5', 'g5')
('accm_g6', 'g6')
('accm_g7', 'g7')

Wednesday, November 7, 2018

[Dynamic Control Flow] Whitepaper: Implementation of Control Flow in TensorFlow

In the following whitepaper, we can understand dynamic control flow in more detail.
Whitepaper: Implementation of Control Flow in TensorFlow
http://download.tensorflow.org/paper/white_paper_tf_control_flow_implementation_2017_11_1.pdf

There are five basic primitive ops used to implement the while-loop functionality:
Switch : A Switch operator forwards the input tensor d to one of its outputs depending on the
boolean tensor of the control input p. A Switch is enabled for execution when both its inputs are
available.

Merge : A Merge operator forwards one of its available inputs to its output. A Merge is enabled
for execution when any of its inputs is available. It is unspecified which available input it outputs if there are multiple inputs available.

Enter(name) : An Enter operator forwards its input to the execution frame that is uniquely
identified by the given name. This Enter op is used to pass a tensor in one execution frame to a
child execution frame. There can be multiple Enter ops to the same child execution frame, each
making a tensor available (asynchronously) in that child execution frame. An Enter is enabled
for execution when its input is available. A new execution frame is instantiated in the
TensorFlow runtime when the first Enter op to that frame is executed.

Exit : An Exit operator forwards a value from an execution frame to its parent execution frame.
This Exit op is used to return a tensor computed in a child execution frame back to its parent
frame. There can be multiple Exit ops to the parent frame, each asynchronously passing a
tensor back to the parent frame. An Exit is enabled when its input is available.

NextIteration: A NextIteration operator forwards its input to the next iteration in the current
execution frame.
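As a quick illustration, here is a minimal TF 1.x sketch of tf.while_loop; when the graph is built, the loop is lowered to the Switch, Merge, Enter, Exit, and NextIteration primitives described above.

import tensorflow as tf

i = tf.constant(0)
cond = lambda i: tf.less(i, 10)     # loop while i < 10
body = lambda i: tf.add(i, 1)       # increment i in each iteration
result = tf.while_loop(cond, body, [i])

with tf.Session() as sess:
    print(sess.run(result))         # 10

# listing the op types in the graph reveals the control-flow primitives, e.g.:
# sorted(set(op.type for op in tf.get_default_graph().get_operations()))
# ... 'Enter', 'Exit', 'Merge', 'NextIteration', 'Switch', ...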

We can also see the implementation of the while_loop function in C++ as follows:
tensorflow/cc/ops/while_loop.cc

Status BuildWhileLoop(const Scope& scope, const std::vector<Output>& inputs,
                      const CondGraphBuilderFn& cond,
                      const BodyGraphBuilderFn& body, const string& frame_name,
                      OutputList* outputs, bool create_while_ctx,
                      Output* cond_output)

A while loop with a single loop variable looks like this:

(output)
    ^    +---------------+
    |    | body subgraph +-------------+
   Exit  +---------------+             |
     ^    ^                            |
     |    |                            |
     Switch<--------+                  v
       ^            |             NextIteration
       |     +------+--------+         |
       +---->| cond subgraph |         |
       |     +---------------+         |
      Merge<---------------------------+
      ^
      |
   Enter
     ^
     |
  (input)

If there are multiple loop variables, each of the control flow ops is
duplicated for each loop variable.

Because there are so many points in it, here I only want to highlight the part related to memory swapping:
"To reuse forward values in backprop loop, we automatically detect, during the construction of
the backprop while loop, the forward values that are needed in the backprop. For each such
forward value x, we automatically introduce a stack and add nodes in the forward loop to save
its value at each iteration to the stack. The backprop loop uses the values from the stack in the
reverse order. The stack lives outside the forward and backprop loops and is shared by the
two loops."


This stack push operation is used in the while-loop operation, and TensorFlow will generate the corresponding stack pop operation in the backpropagation phase. If you check the source code, it can be found in the GradLoopState class.

tensorflow/python/ops/control_flow_ops.py
class GradLoopState(object):
  """The state used for constructing the gradient graph for a while loop.

  We create a GradLoopState for each while loop in forward and its
  corresponding while loop in backprop. This gives us access to both
  the forward and the backprop WhileContexts.

  During the construction of gradient graph, any time when we detect
  a forward value that is needed for backprop, we create a history
  accumulator and add it to `history_map`. Any time when we backprop
  a loop switch op (in _SwitchGrad), we add the grad merge op in
  `switch_map`.
  """
...
...
For more explanation and experiments in dynamic control flow, please refer to the paper:
Dynamic Control Flow in Large-Scale Machine Learning

Tuesday, October 30, 2018

[TensorFlow] Train in Tensorflow and do inference with the trained model

If you want to train your model in TensorFlow and do inference with the trained model, you can refer to this post.

1. Train your model

I will use the simple CNN model in my previous post:
[ONNX] Train in Tensorflow and export to ONNX (Part II)
https://danny270degree.blogspot.com/2018/08/onnx-train-in-tensorflow-and-export-to_20.html

So, after training, you will get these files:
my_mnist/
├── checkpoint
├── graph.pbtxt
├── my_mnist_model.data-00000-of-00001
├── my_mnist_model.index
└── my_mnist_model.meta

2. Freeze graph

We need to run tensorflow/python/tools/freeze_graph.py to convert the checkpoint values into embedded constants within the graph file itself. Here I use the following script to freeze the model with TensorFlow's freeze_graph (the format of the input graph is text):
bazel build tensorflow/python/tools:freeze_graph
bazel-bin/tensorflow/python/tools/freeze_graph \
    --input_graph=/danny/pyutillib/my_mnist/graph.pbtxt \
    --input_checkpoint=/danny/pyutillib/my_mnist/my_mnist_model \
    --output_graph=/danny/pyutillib/frozen_graph.pb \
    --output_node_names=output/output/BiasAdd \
    --input_binary=False
my_mnist
|-- checkpoint
|-- frozen_graph.pb  <== it will be generated.
|-- graph.pbtxt
|-- my_mnist_model.data-00000-of-00001
|-- my_mnist_model.index
`-- my_mnist_model.meta

P.S: The difficult part is to find out the input node name and output node name, which will be needed later.
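One way to find those names is to load the frozen graph and list its nodes. Here is a minimal sketch using TF 1.x APIs; the path is assumed from the directory listing above.

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('./my_mnist/frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    print(node.name, node.op)   # e.g. inputs/X Placeholder ... output/output/BiasAdd BiasAdd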

3. Transform graph

Here we use the graph transform tool from TensorFlow. For more details, please check out this document:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph='/danny/pyutillib/my_mnist/frozen_graph.pb' \
--out_graph='/danny/pyutillib/my_mnist/optimized_frozen_graph.pb' \
--inputs='inputs/X:0' \
--outputs='output/output/BiasAdd:0' \
--transforms='
  strip_unused_nodes
  fold_constants
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights'
my_mnist/
|-- checkpoint
|-- frozen_graph.pb
|-- graph.pbtxt
|-- my_mnist_model.data-00000-of-00001
|-- my_mnist_model.index
|-- my_mnist_model.meta
`-- optimized_frozen_graph.pb <==  it will be generated.

4. Do inference for your model

I will use this brief example to do inference with your frozen and optimized model.
import argparse
import tensorflow as tf
import numpy as np
import tensorflow.examples.tutorials.mnist.input_data as input_data

n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

def load_graph(frozen_graph_filename):
    # We load the protobuf file from the disk and parse it to retrieve the
    # unserialized graph_def
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Then, we import the graph_def into a new Graph and returns it
    with tf.Graph().as_default() as graph:
        # The name var will prefix every op/nodes in your graph
        # Since we load everything in a new graph, this is not needed
        tf.import_graph_def(graph_def, name="prefix")
    return graph

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--frozen_model_filename", default="./optimized_frozen_graph.pb", type=str, help = "Quantized/Frozen model to import")
    args = parser.parse_args()

    graph = load_graph(args.frozen_model_filename)
    #for op in graph.get_operations():
    #    print(op.name)
    input_node  = graph.get_tensor_by_name('prefix/inputs/X:0')
    output_node = graph.get_tensor_by_name('prefix/output/output/BiasAdd:0')
    
    picture = np.ones([1, 784])
    print('picture:', picture)
    with tf.Session(graph=graph) as sess:
        _ = sess.run(output_node, feed_dict={input_node: picture})
        for _output in _:
            print("result:", np.argmax(_output))

Run this example:
$ python mnist_inference.py --frozen_model_filename ./my_mnist/optimized_frozen_graph.pb

('picture:', array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]))
...
...
('result:', 5)

Or we can use a grayscale image (28x28) that I drew myself, as follows:


Then, we need to change the code a little bit for our image (note that this requires importing cv2):
    picture = cv2.imread("my_mnist/2.png", cv2.IMREAD_GRAYSCALE)
    print('picture:', picture)
    picture = picture.reshape(1, 784) # from (28, 28) to (1, 784)
    with tf.Session(graph=graph) as sess:
        _ = sess.run(output_node, feed_dict={input_node: picture})
        for _output in _:
            print("result:", np.argmax(_output))

Finally, we will get the result: 2
('picture:', array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,  255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0, 255,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0],
                    [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0]], 
     dtype=uint8))

...
...

('result:', 2)


Yes, I get the same result as we expect!

P.S: If you trained your model in the NCHW data format, your model will not be able to do inference in a CPU-only environment, and it cannot be converted to NHWC.
Someone tried to convert the data format, but it did not work. Please check out the link:
https://stackoverflow.com/questions/47014306/freeze-graph-with-different-data-format?rq=1

P.S: Here is another graph tool: optimize_for_inference, but it is the old one;
TransformGraph is the new one.
https://stackoverflow.com/questions/45382917/how-to-optimize-for-inference-a-simple-saved-tensorflow-1-0-1-graph
http://bbs.bugcode.cn/t/63211



Wednesday, October 24, 2018

[LLVM] LLVM studying list for newbie

If you are an LLVM newbie and are interested in LLVM like me, you may take a look at my LLVM studying list. It took me time to search for the related resources and documents, so I think this list will help somehow. By the way, most of my list items are written in Chinese, so they may not suit native English speakers.

1. LLVM installation and concept

The first time to try LLVM ( in Chinese )
https://medium.com/@zetavg/%E7%B7%A8%E8%AD%AF%E5%99%A8-llvm-%E6%B7%BA%E6%B7%BA%E7%8E%A9-42a58c7a7309

A journey into LLVM, stop 1 - building and basic usage (llvm, clang)

A journey into LLVM, stop 2 - environment configuration (llvm, clang)

A journey into LLVM, stop 3 - getting to know LLVM IR (llvm, clang)

LLVM Tutorial  ( in English )
https://llvm.org/docs/tutorial/

LLVM Language Reference Manual
http://llvm.org/docs/LangRef.html

COSCUP2016 - The LLVM framework, a gentle introduction (Slideshare)
https://www.slideshare.net/hydai/coscup2016-llvm

LLVM introduction
https://people.cs.nctu.edu.tw/~chenwj/dokuwiki/doku.php?id=llvm

2. LLVM Pass

A journey into LLVM, stop 4 - writing a Pass (llvm, clang)
http://www.nagain.com/activity/article/14/

A journey into LLVM, stop 5 - debugging a Pass (llvm, clang)
http://www.nagain.com/activity/article/20/

llvm: Call Graph and Control Flow Graph | PCB Blog
http://blog.binpang.me/2017/05/20/llvm-CGAndCFG/

3. LLVM more in-depth

LLVM

Synthesis Flow @ LLVM 2.8
http://funningboy.blogspot.com/2011/02/

Create a working compiler with the LLVM framework, Part 1
https://www.ibm.com/developerworks/cn/opensource/os-createcompilerllvm1/


Tuesday, October 23, 2018

[TensorFlow] Does it help the processing time and transmission time if we increase the CUDA Stream number in TensorFlow?

Before starting to increase the number of CUDA streams in TensorFlow, I want to recap some ideas about the Executor module. When a TensorFlow session runs, it builds an Executor. Meanwhile, if you enable CUDA in the TensorFlow build configuration, the Executor will add the visible GPU devices and create a TF device object (GPUDevice object) mapping to each physical GPU device. There are 4 kinds of streams inside GPUDevice:

  • CUDA stream 
  • Host_to_Device stream
  • Device_to_Host stream
  • Device_to_Device stream

By default, each of these 4 kinds of streams has only 1 stream. Please check out the following code:
class GPUDevice : public BaseGPUDevice {
 public:
  GPUDevice(const SessionOptions& options, const string& name,
            Bytes memory_limit, const DeviceLocality& locality,
            TfGpuId tf_gpu_id, const string& physical_device_desc,
            Allocator* gpu_allocator, Allocator* cpu_allocator)
      : BaseGPUDevice(options, name, memory_limit, locality, tf_gpu_id,
                      physical_device_desc, gpu_allocator, cpu_allocator,
                      false /* sync every op */, 1 /* max_streams */) {
    if (options.config.has_gpu_options()) {
      force_gpu_compatible_ =
          options.config.gpu_options().force_gpu_compatible();
    }
  }
  ...
  ...

If I change it to 2, does it help to improve the training or inference speed? In my experiment, the answer is "No". Please see the pictures below:

My case does a lot of memcpy between the GPU and CPU devices, and "stream=2" doesn't help to improve the processing time or the transmission time. The result also makes sense because the bottlenecks are the GPU SMs for data processing time and PCIe for data transmission time.



Wednesday, October 17, 2018

[TensorFlow Grappler] How to do the topological sorting in TensorFlow Grappler?

If you try to implement some optimizers in TensorFlow Grappler, you have to know how to deal with the directed computation graph. One of the most important tools is topological sorting.
The definition from Wiki: Topological sorting
https://en.wikipedia.org/wiki/Topological_sorting
"In the field of computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering."

In TensorFlow, there are already topological-sort-related functions in the following file, and we can take advantage of them.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/utils/topological_sort.cc
First of all, we have this directed computation graph and can see the node: "fc1/fc1/MatMul"


Here is an example that dumps the order of the nodes in the graph using topological sorting:
static std::unordered_map<const NodeDef*, int> GetTopoOrdering(GrapplerItem* item) {
  std::unordered_map<const NodeDef*, int> topo_order;
  ComputeTopologicalOrder(item->graph, &topo_order, nullptr);
  for ( auto& n : topo_order){
    const string& node_name = n.first->name();
    const int order = n.second;
    VLOG(1) << "...[DEBUG2] Node " << node_name << " at TopoOrdering order " << order;
  }
  return topo_order;
}

Result:
...[DEBUG2] Node train/Adam at TopoOrdering order 130
...[DEBUG2] Node train/Adam/Assign at TopoOrdering order 128
...[DEBUG2] Node train/Adam/mul at TopoOrdering order 126
...[DEBUG2] Node train/Adam/update_conv1/bias/ApplyAdam at TopoOrdering order 124
...[DEBUG2] Node train/gradients/conv1/BiasAdd_grad/BiasAddGrad at TopoOrdering order 122
...[DEBUG2] Node train/Adam/update_conv2/kernel/ApplyAdam at TopoOrdering order 121
...[DEBUG2] Node train/gradients/conv1/Relu_grad/ReluGrad at TopoOrdering order 120
...[DEBUG2] Node train/Adam/update_conv1/kernel/ApplyAdam at TopoOrdering order 125
...[DEBUG2] Node train/gradients/conv2/Conv2D_grad/Conv2DBackpropFilter at TopoOrdering order 119
...[DEBUG2] Node train/gradients/conv2/Relu_grad/ReluGrad at TopoOrdering order 115
...[DEBUG2] Node train/gradients/pool3/dropout/cond/dropout/div_grad/tuple/control_dependency at TopoOrdering order 109
...[DEBUG2] Node train/gradients/pool3/dropout/cond/dropout/div_grad/RealDiv at TopoOrdering order 108
...[DEBUG2] Node train/gradients/conv1/Conv2D_grad/Conv2DBackpropFilter at TopoOrdering order 123
...[DEBUG2] Node train/gradients/pool3/dropout/cond/Identity/Switch_grad/cond_grad at TopoOrdering order 106
...[DEBUG2] Node train/Adam/update_fc1/kernel/ApplyAdam at TopoOrdering order 105
...[DEBUG2] Node train/gradients/pool3/MaxPool_grad/MaxPoolGrad-2-TransposeNHWCToNCHW-LayoutOptimizer at TopoOrdering order 113
...[DEBUG2] Node train/gradients/pool3/dropout/cond/Merge_grad/cond_grad at TopoOrdering order 104
...[DEBUG2] Node train/Adam/update_output/kernel/ApplyAdam at TopoOrdering order 99
...[DEBUG2] Node train/gradients/fc1/fc1/Relu_grad/ReluGrad at TopoOrdering order 98
...[DEBUG2] Node train/gradients/output/output/MatMul_grad/MatMul at TopoOrdering order 95
...[DEBUG2] Node train/gradients/conv2/BiasAdd_grad/BiasAddGrad at TopoOrdering order 116
...[DEBUG2] Node train/gradients/output/output/BiasAdd_grad/BiasAddGrad at TopoOrdering order 94
...[DEBUG2] Node output/output/BiasAdd at TopoOrdering order 90
...[DEBUG2] Node output/output/MatMul at TopoOrdering order 89
...[DEBUG2] Node fc1/fc1/BiasAdd at TopoOrdering order 87
...[DEBUG2] Node fc1/fc1/Relu at TopoOrdering order 88
...[DEBUG2] Node pool3/dropout/cond/Merge at TopoOrdering order 85
...[DEBUG2] Node pool3/dropout/cond/dropout/mul at TopoOrdering order 82
...[DEBUG2] Node train/gradients/fc1/fc1/MatMul_grad/MatMul at TopoOrdering order 101
...[DEBUG2] Node train/gradients/zeros/Const at TopoOrdering order 80
...[DEBUG2] Node train/gradients/Shape_1 at TopoOrdering order 79
...[DEBUG2] Node pool3/dropout/cond/dropout/div at TopoOrdering order 78
...[DEBUG2] Node ConstantFoldingCtrl/pool3/dropout/cond/dropout/div/Switch_1 at TopoOrdering order 77
...[DEBUG2] Node train/gradients/Switch at TopoOrdering order 76
...[DEBUG2] Node pool3/dropout/cond/dropout/div/Switch at TopoOrdering order 75
...[DEBUG2] Node pool3/MaxPool-1-0-TransposeNCHWToNHWC-LayoutOptimizer at TopoOrdering order 73
...[DEBUG2] Node pool3/dropout/cond/dropout/Floor at TopoOrdering order 72
...[DEBUG2] Node pool3/MaxPool at TopoOrdering order 71
...[DEBUG2] Node ConstantFolding/train/gradients/conv2/Conv2D_grad/ShapeN-matshapes-1 at TopoOrdering order 67
...[DEBUG2] Node conv2/BiasAdd at TopoOrdering order 66
...[DEBUG2] Node train/gradients/pool3/dropout/cond/dropout/mul_grad/Mul at TopoOrdering order 107
...[DEBUG2] Node train/gradients/conv2/Conv2D_grad/ShapeN at TopoOrdering order 64
...[DEBUG2] Node pool3/dropout/cond/dropout/random_uniform/RandomUniform at TopoOrdering order 62
...[DEBUG2] Node pool3/dropout/cond/dropout/Shape at TopoOrdering order 59
...[DEBUG2] Node train/gradients/conv2/Conv2D_grad/Conv2DBackpropInput at TopoOrdering order 118
...[DEBUG2] Node pool3/dropout/cond/dropout/keep_prob at TopoOrdering order 58
...[DEBUG2] Node conv1/BiasAdd at TopoOrdering order 57
...[DEBUG2] Node ConstantFolding/train/gradients/conv1/Conv2D_grad/ShapeN-matshapes-1 at TopoOrdering order 54
...[DEBUG2] Node train/gradients/conv1/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer at TopoOrdering order 52
...[DEBUG2] Node conv1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer at TopoOrdering order 51
...[DEBUG2] Node train/Adam/mul_1 at TopoOrdering order 127
...[DEBUG2] Node train/beta2_power/read at TopoOrdering order 50
...[DEBUG2] Node train/gradients/zeros_1 at TopoOrdering order 84
...[DEBUG2] Node train/beta1_power/read at TopoOrdering order 49
...[DEBUG2] Node train/Adam/update_fc1/bias/ApplyAdam at TopoOrdering order 103
...[DEBUG2] Node train/gradients/output/output/MatMul_grad/MatMul_1 at TopoOrdering order 96
...[DEBUG2] Node output/bias/read at TopoOrdering order 48
...[DEBUG2] Node output/kernel/read at TopoOrdering order 47
...[DEBUG2] Node fc1/bias/read at TopoOrdering order 46
...[DEBUG2] Node fc1/kernel/read at TopoOrdering order 45
...[DEBUG2] Node conv2/kernel/read at TopoOrdering order 43
...[DEBUG2] Node conv1/bias/read at TopoOrdering order 42
...[DEBUG2] Node conv1/kernel/read at TopoOrdering order 41
...[DEBUG2] Node train/gradients/fc1/fc1/MatMul_grad/MatMul_1 at TopoOrdering order 102
...[DEBUG2] Node inputs/training at TopoOrdering order 40
...[DEBUG2] Node inputs/Reshape at TopoOrdering order 39
...[DEBUG2] Node PermConstNCHWToNHWC-LayoutOptimizer at TopoOrdering order 38
...[DEBUG2] Node pool3/dropout/cond/dropout/random_uniform at TopoOrdering order 68
...[DEBUG2] Node PermConstNHWCToNCHW-LayoutOptimizer at TopoOrdering order 37
...[DEBUG2] Node train/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits at TopoOrdering order 91
...[DEBUG2] Node train/Adam/epsilon at TopoOrdering order 36
...[DEBUG2] Node conv2/Conv2D at TopoOrdering order 63
...[DEBUG2] Node train/Adam/beta2 at TopoOrdering order 35
...[DEBUG2] Node train/Adam/update_output/bias/ApplyAdam at TopoOrdering order 97
...[DEBUG2] Node train/Adam/beta1 at TopoOrdering order 34
...[DEBUG2] Node train/gradients/train/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits_grad/PreventGradient at TopoOrdering order 92
...[DEBUG2] Node pool3/Reshape at TopoOrdering order 74
...[DEBUG2] Node conv2/Relu at TopoOrdering order 69
...[DEBUG2] Node output/bias/Adam_1 at TopoOrdering order 32
...[DEBUG2] Node train/gradients/zeros at TopoOrdering order 83
...[DEBUG2] Node output/bias/Adam at TopoOrdering order 31
...[DEBUG2] Node output/kernel/Adam_1 at TopoOrdering order 30
...[DEBUG2] Node output/kernel/Adam at TopoOrdering order 29
...[DEBUG2] Node fc1/fc1/MatMul at TopoOrdering order 86
...[DEBUG2] Node fc1/bias/Adam_1 at TopoOrdering order 28
...[DEBUG2] Node train/gradients/pool3/MaxPool_grad/MaxPoolGrad at TopoOrdering order 114
...[DEBUG2] Node conv1/Relu at TopoOrdering order 61
...[DEBUG2] Node fc1/bias/Adam at TopoOrdering order 27
...[DEBUG2] Node train/gradients/pool3/Reshape_grad/Reshape at TopoOrdering order 112
...[DEBUG2] Node fc1/kernel/Adam_1 at TopoOrdering order 26
...[DEBUG2] Node train/gradients/pool3/dropout/cond/dropout/div/Switch_grad/cond_grad at TopoOrdering order 110
...[DEBUG2] Node pool3/dropout/cond/dropout/add at TopoOrdering order 70
...[DEBUG2] Node fc1/kernel/Adam at TopoOrdering order 25
...[DEBUG2] Node conv2/bias/Adam_1 at TopoOrdering order 24
...[DEBUG2] Node train/gradients/train/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits_grad/mul at TopoOrdering order 93
...[DEBUG2] Node train/gradients/Shape_2 at TopoOrdering order 81
...[DEBUG2] Node conv2/bias/Adam at TopoOrdering order 23
...[DEBUG2] Node train/Adam/update_conv2/bias/ApplyAdam at TopoOrdering order 117
...[DEBUG2] Node train/gradients/AddN at TopoOrdering order 111
...[DEBUG2] Node conv2/kernel/Adam_1 at TopoOrdering order 22
...[DEBUG2] Node conv2/kernel/Adam at TopoOrdering order 21
...[DEBUG2] Node pool3/dropout/cond/dropout/random_uniform/mul at TopoOrdering order 65
...[DEBUG2] Node conv1/bias/Adam_1 at TopoOrdering order 20
...[DEBUG2] Node conv1/kernel/Adam_1 at TopoOrdering order 18
...[DEBUG2] Node train/Adam/Assign_1 at TopoOrdering order 129
...[DEBUG2] Node conv1/kernel/Adam at TopoOrdering order 17
...[DEBUG2] Node train/beta2_power at TopoOrdering order 16
...[DEBUG2] Node train/beta1_power at TopoOrdering order 15
...[DEBUG2] Node train/gradients/pool3/Reshape_grad/Shape at TopoOrdering order 14
...[DEBUG2] Node train/gradients/train/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits_grad/ExpandDims at TopoOrdering order 13
...[DEBUG2] Node pool3/dropout/cond/switch_t at TopoOrdering order 56
...[DEBUG2] Node conv1/bias/Adam at TopoOrdering order 19
...[DEBUG2] Node output/bias at TopoOrdering order 12
...[DEBUG2] Node fc1/bias at TopoOrdering order 10
...[DEBUG2] Node train/Adam/learning_rate at TopoOrdering order 33
...[DEBUG2] Node fc1/kernel at TopoOrdering order 9
...[DEBUG2] Node pool3/Reshape/shape at TopoOrdering order 8
...[DEBUG2] Node conv2/bias at TopoOrdering order 7
...[DEBUG2] Node pool3/dropout/cond/Switch at TopoOrdering order 53
...[DEBUG2] Node conv2/kernel at TopoOrdering order 6
...[DEBUG2] Node conv1/bias at TopoOrdering order 5
...[DEBUG2] Node conv1/kernel at TopoOrdering order 4
...[DEBUG2] Node conv1/Conv2D at TopoOrdering order 55
...[DEBUG2] Node inputs/training/input at TopoOrdering order 3
...[DEBUG2] Node output/kernel at TopoOrdering order 11
...[DEBUG2] Node inputs/y at TopoOrdering order 2
...[DEBUG2] Node train/gradients/fc1/fc1/BiasAdd_grad/BiasAddGrad at TopoOrdering order 100
...[DEBUG2] Node ConstantFolding/pool3/dropout/cond/dropout/div_recip at TopoOrdering order 60
...[DEBUG2] Node inputs/Reshape/shape at TopoOrdering order 1
...[DEBUG2] Node conv2/bias/read at TopoOrdering order 44
...[DEBUG2] Node inputs/X at TopoOrdering order 0


So, we can see the sequence of the following nodes in the directed computation graph:
fc1/fc1/MatMul ==> fc1/fc1/BiasAdd ==> fc1/fc1/Relu
And the topological sorting method gives us the same sequence:

...[DEBUG2] Node fc1/fc1/BiasAdd at TopoOrdering order 87
...[DEBUG2] Node fc1/fc1/Relu at TopoOrdering order 88
...[DEBUG2] Node fc1/fc1/MatMul at TopoOrdering order 86