Thursday, January 3, 2019

[TensorFlow] How to generate the Cost and Model Report from Grappler?

Generally speaking, Grappler in TensorFlow has several optimizers, each targeting a specific area, such as reducing peak memory usage on the GPU. In this post, I want to introduce some useful functions inside Grappler that are used by the Simple Placer mechanism. Parts of these functions are also used by Grappler's optimizers.



Model:
(An important point: give your placeholder a specific batch size. If the placeholder's batch dimension is left unspecified, the cost and model reports below will be inaccurate, because the estimation then uses a batch size of 1.)
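To get a feel for how large that error can be, here is a minimal sketch of the arithmetic in plain Python (not Grappler code). The helper `tensor_bytes` is hypothetical: it computes the size of a dense tensor and, following the note above, substitutes 1 for an unknown dimension. For the `x` placeholder below, the estimate shrinks by a factor of 100.

```python
from math import prod

# Element widths in bytes for a few dtypes (assumes dense tensors).
DTYPE_BYTES = {"float32": 4, "int32": 4, "int64": 8}


def tensor_bytes(shape, dtype="float32", unknown_dim=1):
    """Hypothetical helper: size of a dense tensor in bytes.

    Unknown dimensions (None) are replaced by `unknown_dim`, mimicking the
    batch-size-1 behaviour described in the note above.
    """
    dims = [unknown_dim if d is None else d for d in shape]
    return prod(dims) * DTYPE_BYTES[dtype]


# The x placeholder with an explicit batch size vs. an unknown one.
print(tensor_bytes([100, 784]))   # 313600
print(tensor_bytes([None, 784]))  # 3136
```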

# TensorFlow 1.x API
import tensorflow as tf
from tensorflow.python.grappler import item as gitem
from tensorflow.python.grappler import cluster as gcluster

# Create the model
batch_size=100
x = tf.placeholder(tf.float32, [batch_size, 784])
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, w) + b

# Define the loss and the optimizer.
# minimize() builds the backward-propagation (gradient) graph and also adds the
# update op to the TRAIN_OP collection, which Grappler uses as the fetch node.
y_ = tf.placeholder(tf.int64, [batch_size])
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
init = tf.global_variables_initializer()
Next, generate the cost report for this simple model.
Generate Cost Report:
from tensorflow.python.grappler import cluster as gcluster
# disable_detailed_stats=False asks the cluster to collect detailed per-op statistics.
cluster = gcluster.Cluster(disable_detailed_stats=False)
m = tf.train.export_meta_graph(graph=tf.get_default_graph())

# Arguments: serialized MetaGraphDef, per_node_report, verbose, cluster.
cost_report = tf.pywrap_tensorflow.GenerateCostReport(m.SerializeToString(), True, False, cluster.tf_cluster)
Result (cost details broken down by op type):
print(cost_report)

Total time measured in ns (serialized):                         119000
Total time measured in ns (actual):                            1257333
Total time analytical in ns (upper bound):                           0
Total time analytical in ns (lower bound):                           0
Overall efficiency (analytical upper/actual):                        0
Overall efficiency (analytical lower/actual):                        0

                                 Op,          Count,  Measured time (ns),    Time percent,     Acc percent,    Analytical upper,    Analytical lower,      Overall eff      Compute eff       Memory eff
                         VariableV2,              2,               28000,             24%,             24%,                   0,                   0,              0%,              0%,              0%,
SparseSoftmaxCrossEntropyWithLogits,              1,               23000,             19%,             43%,                   0,                   0,              0%,              0%,              0%,
                             MatMul,              2,               17000,             14%,             57%,                   0,                   0,              0%,              0%,              0%,
                         ExpandDims,              1,               10000,            8.4%,             66%,                   0,                   0,              0%,              0%,              0%,
                              Const,              1,                9000,            7.6%,             73%,                   0,                   0,              0%,              0%,              0%,
                           Identity,              2,                9000,            7.6%,             81%,                   0,                   0,              0%,              0%,              0%,
                                Sum,              1,                7000,            5.9%,             87%,                   0,                   0,              0%,              0%,              0%,
               ApplyGradientDescent,              2,                6000,              5%,             92%,                   0,                   0,              0%,              0%,              0%,
                               NoOp,              1,                4000,            3.4%,             95%,                   0,                   0,              0%,              0%,              0%,
                                Add,              1,                3000,            2.5%,             97%,                   0,                   0,              0%,              0%,              0%,
                                Mul,              1,                3000,            2.5%,          1e+02%,                   0,                   0,              0%,              0%,              0%,

Below is the per-node report summary:
                                 Op,  Measured time (ns),   Compute time (ns),    Memory time (ns),     Compute eff,      Memory eff,    Inputs
                         VariableV2,               22000,                   0,                   0,           -inf%,           -inf%,    []
                           Identity,                5000,                   0,                   0,           -inf%,           -inf%,    [(784, 10)]
                         VariableV2,                6000,                   0,                   0,           -inf%,           -inf%,    []
                           Identity,                4000,                   0,                   0,           -inf%,           -inf%,    [(10)]
                             MatMul,               11000,                   0,                   0,           -inf%,           -inf%,    [(784, 10)]
                                Add,                3000,                   0,                   0,           -inf%,           -inf%,    [(100, 10), (10)]
SparseSoftmaxCrossEntropyWithLogits,               23000,                   0,                   0,           -inf%,           -inf%,    [(100, 10), ]
                         ExpandDims,               10000,                   0,                   0,           -inf%,           -inf%,    []
                                Mul,                3000,                   0,                   0,           -inf%,           -inf%,    [(100, 1), ]
                                Sum,                7000,                   0,                   0,           -inf%,           -inf%,    [(100, 10), ]
                             MatMul,                6000,                   0,                   0,           -inf%,           -inf%,    []
                              Const,                9000,                   0,                   0,           -inf%,           -inf%,    []
               ApplyGradientDescent,                3000,                   0,                   0,           -inf%,           -inf%,    [(784, 10), ]
               ApplyGradientDescent,                3000,                   0,                   0,           -inf%,           -inf%,    [(10), ]
                               NoOp,                4000,                   0,                   0,           -inf%,           -inf%,    []
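The per-node table is a finer-grained view of the per-op table: summing Measured time by op type and counting the nodes reproduces the Count and Measured time columns (for example, the two VariableV2 nodes take 22000 + 6000 = 28000 ns), and the sum over all nodes is the 119000 ns "serialized" total. The sketch below rebuilds the per-op table from the per-node rows. It assumes that Time percent is each op's share of the serialized total, that Acc percent is the running sum of those shares, and that both are printed with two significant digits, which would also explain why the last row shows 1e+02%.

```python
from collections import defaultdict

# Per-node rows from the report above, with the column padding removed.
PER_NODE_REPORT = """\
VariableV2, 22000, 0, 0, -inf%, -inf%, []
Identity, 5000, 0, 0, -inf%, -inf%, [(784, 10)]
VariableV2, 6000, 0, 0, -inf%, -inf%, []
Identity, 4000, 0, 0, -inf%, -inf%, [(10)]
MatMul, 11000, 0, 0, -inf%, -inf%, [(784, 10)]
Add, 3000, 0, 0, -inf%, -inf%, [(100, 10), (10)]
SparseSoftmaxCrossEntropyWithLogits, 23000, 0, 0, -inf%, -inf%, [(100, 10), ]
ExpandDims, 10000, 0, 0, -inf%, -inf%, []
Mul, 3000, 0, 0, -inf%, -inf%, [(100, 1), ]
Sum, 7000, 0, 0, -inf%, -inf%, [(100, 10), ]
MatMul, 6000, 0, 0, -inf%, -inf%, []
Const, 9000, 0, 0, -inf%, -inf%, []
ApplyGradientDescent, 3000, 0, 0, -inf%, -inf%, [(784, 10), ]
ApplyGradientDescent, 3000, 0, 0, -inf%, -inf%, [(10), ]
NoOp, 4000, 0, 0, -inf%, -inf%, []
"""


def per_op_table(report_text):
    """Rebuild (op, count, measured ns, time %, acc %) rows from per-node rows.

    Assumes percentages are relative to the sum of all measured times and
    are formatted with two significant digits (%.2g).
    """
    counts = defaultdict(int)
    times = defaultdict(int)
    for line in report_text.strip().splitlines():
        # Split off only the first two fields: the Inputs column contains commas.
        op, measured_ns, _rest = line.split(",", 2)
        op = op.strip()
        counts[op] += 1
        times[op] += int(measured_ns)

    total = sum(times.values())
    rows, acc = [], 0
    # Most expensive op first, as in the per-op table (ties keep node order).
    for op in sorted(times, key=times.get, reverse=True):
        acc += times[op]
        rows.append((op, counts[op], times[op],
                     f"{100 * times[op] / total:.2g}%",
                     f"{100 * acc / total:.2g}%"))
    return total, rows


total, rows = per_op_table(PER_NODE_REPORT)
print("serialized total (ns):", total)
for op, count, ns, pct, acc_pct in rows:
    print(f"{op:>36} {count:>3} {ns:>8} {pct:>6} {acc_pct:>7}")
```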

Generate Model Report:
m = tf.train.export_meta_graph(graph=tf.get_default_graph())

# Arguments: serialized MetaGraphDef, assume_valid_feeds, debug (no cluster is needed here).
model_report = tf.pywrap_tensorflow.GenerateModelReport(m.SerializeToString(), True, False)

Result (every node with its op type and inferred output shapes):
print(model_report)

GradientDescent [NoOp]
GradientDescent/update_Variable_1/ApplyGradientDescent [ApplyGradientDescent]
        output 0 (float_ref) has shape [10]
gradients/add_grad/tuple/control_dependency_1 [Identity]
        output 0 (float) has shape [10]
gradients/add_grad/tuple/group_deps [NoOp]
gradients/add_grad/Reshape_1 [Reshape]
        output 0 (float) has shape [10]
gradients/add_grad/Shape_1 [Const]
        output 0 (int32) has shape [1]
gradients/add_grad/Sum_1 [Sum]
        output 0 (float) has shape ?
gradients/add_grad/BroadcastGradientArgs [BroadcastGradientArgs]
        output 0 (int32) has shape [x6]
        output 1 (int32) has shape [x7]
gradients/add_grad/Shape [Const]
        output 0 (int32) has shape [2]
gradients/sparse_softmax_cross_entropy_loss/xentropy/xentropy_grad/mul [Mul]
        output 0 (float) has shape [100, 10]
gradients/sparse_softmax_cross_entropy_loss/xentropy/xentropy_grad/PreventGradient [PreventGradient]
        output 0 (float) has shape [100, 10]
sparse_softmax_cross_entropy_loss/xentropy/xentropy [SparseSoftmaxCrossEntropyWithLogits]
        output 0 (float) has shape [100]
        output 1 (float) has shape [100, 10]
Placeholder_1 [Placeholder]
        output 0 (int64) has shape [100]
add [Add]
        output 0 (float) has shape [100, 10]
Variable_1/read [Identity]
        output 0 (float) has shape [10]
Variable_1 [VariableV2]
        output 0 (float_ref) has shape [10]
MatMul [MatMul]
        output 0 (float) has shape [100, 10]
Variable/read [Identity]
        output 0 (float) has shape [784, 10]
Variable [VariableV2]
        output 0 (float_ref) has shape [784, 10]
Placeholder [Placeholder]
        output 0 (float) has shape [100, 784]
gradients/sparse_softmax_cross_entropy_loss/xentropy/xentropy_grad/ExpandDims [ExpandDims]
        output 0 (float) has shape [100, 1]
gradients/sparse_softmax_cross_entropy_loss/xentropy/xentropy_grad/ExpandDims/dim [Const]
        output 0 (int32) has shape []
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/tuple/control_dependency [Identity]
        output 0 (float) has shape [100]
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/tuple/group_deps [NoOp]
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/Reshape_1 [Reshape]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/Shape_1 [Const]
        output 0 (int32) has shape [0]
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/Sum_1 [Sum]
        output 0 (float) has shape ?
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/BroadcastGradientArgs [BroadcastGradientArgs]
        output 0 (int32) has shape [x4]
        output 1 (int32) has shape [x5]
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/Shape [Const]
        output 0 (int32) has shape [1]
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/Mul_1 [Mul]
        output 0 (float) has shape [100]
gradients/sparse_softmax_cross_entropy_loss/Sum_grad/Tile [Tile]
        output 0 (float) has shape [100]
gradients/sparse_softmax_cross_entropy_loss/Sum_grad/Const [Const]
        output 0 (int32) has shape [1]
gradients/sparse_softmax_cross_entropy_loss/Sum_grad/Reshape [Reshape]
        output 0 (float) has shape [1]
gradients/sparse_softmax_cross_entropy_loss/Sum_grad/Reshape/shape [Const]
        output 0 (int32) has shape [1]
gradients/sparse_softmax_cross_entropy_loss/Sum_1_grad/Tile [Tile]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/Sum_1_grad/Const [Const]
        output 0 (int32) has shape [0]
gradients/sparse_softmax_cross_entropy_loss/Sum_1_grad/Reshape [Reshape]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/Sum_1_grad/Reshape/shape [Const]
        output 0 (int32) has shape [0]
gradients/sparse_softmax_cross_entropy_loss/div_grad/tuple/control_dependency [Identity]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/div_grad/tuple/group_deps [NoOp]
gradients/sparse_softmax_cross_entropy_loss/div_grad/Reshape_1 [Reshape]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/div_grad/Shape_1 [Const]
        output 0 (int32) has shape [0]
gradients/sparse_softmax_cross_entropy_loss/div_grad/Sum_1 [Sum]
        output 0 (float) has shape ?
gradients/sparse_softmax_cross_entropy_loss/div_grad/BroadcastGradientArgs [BroadcastGradientArgs]
        output 0 (int32) has shape [x2]
        output 1 (int32) has shape [x3]
gradients/sparse_softmax_cross_entropy_loss/div_grad/Shape [Const]
        output 0 (int32) has shape [0]
gradients/sparse_softmax_cross_entropy_loss/div_grad/mul [Mul]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/div_grad/RealDiv_2 [RealDiv]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/Select [Select]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/num_present [Sum]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/num_present/Const [Const]
        output 0 (int32) has shape [1]
sparse_softmax_cross_entropy_loss/assert_broadcastable/static_scalar_check_success [NoOp]
sparse_softmax_cross_entropy_loss/num_present/broadcast_weights [Mul]
        output 0 (float) has shape [100]
sparse_softmax_cross_entropy_loss/num_present/broadcast_weights/ones_like [Fill]
        output 0 (float) has shape [100]
sparse_softmax_cross_entropy_loss/num_present/broadcast_weights/ones_like/Const [Const]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/num_present/broadcast_weights/assert_broadcastable/static_scalar_check_success [NoOp]
sparse_softmax_cross_entropy_loss/num_present/broadcast_weights/ones_like/Shape [Const]
        output 0 (int32) has shape [1]
sparse_softmax_cross_entropy_loss/num_present/Select [Select]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/num_present/ones_like [Fill]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/num_present/ones_like/Const [Const]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/num_present/ones_like/Shape [Const]
        output 0 (int32) has shape [0]
sparse_softmax_cross_entropy_loss/num_present/zeros_like [Const]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/num_present/Equal [Equal]
        output 0 (bool) has shape []
sparse_softmax_cross_entropy_loss/num_present/Equal/y [Const]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/Const [Const]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/ones_like [Fill]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/ones_like/Const [Const]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/ones_like/Shape [Const]
        output 0 (int32) has shape [0]
sparse_softmax_cross_entropy_loss/Equal [Equal]
        output 0 (bool) has shape []
sparse_softmax_cross_entropy_loss/Equal/y [Const]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/div_grad/RealDiv_1 [RealDiv]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/div_grad/Neg [Neg]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/Sum_1 [Sum]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/Const_2 [Const]
        output 0 (int32) has shape [0]
sparse_softmax_cross_entropy_loss/Sum [Sum]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/Const_1 [Const]
        output 0 (int32) has shape [1]
sparse_softmax_cross_entropy_loss/Mul [Mul]
        output 0 (float) has shape [100]
gradients/sparse_softmax_cross_entropy_loss/value_grad/tuple/control_dependency [Identity]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/value_grad/tuple/group_deps [NoOp]
gradients/sparse_softmax_cross_entropy_loss/value_grad/Select_1 [Select]
        output 0 (float) has shape []
gradients/Fill [Fill]
        output 0 (float) has shape []
gradients/grad_ys_0 [Const]
        output 0 (float) has shape []
gradients/Shape [Const]
        output 0 (int32) has shape [0]
gradients/sparse_softmax_cross_entropy_loss/value_grad/zeros_like [Const]
        output 0 (float) has shape []
sparse_softmax_cross_entropy_loss/Greater [Greater]
        output 0 (bool) has shape []
sparse_softmax_cross_entropy_loss/Greater/y [Const]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/value_grad/Select [Select]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/div_grad/Reshape [Reshape]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/div_grad/Sum [Sum]
        output 0 (float) has shape ?
gradients/sparse_softmax_cross_entropy_loss/div_grad/RealDiv [RealDiv]
        output 0 (float) has shape []
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/Reshape [Reshape]
        output 0 (float) has shape [100]
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/Sum [Sum]
        output 0 (float) has shape ?
gradients/sparse_softmax_cross_entropy_loss/Mul_grad/Mul [Mul]
        output 0 (float) has shape [100]
gradients/add_grad/Reshape [Reshape]
        output 0 (float) has shape [100, 10]
gradients/add_grad/Sum [Sum]
        output 0 (float) has shape ?
GradientDescent/learning_rate [Const]
        output 0 (float) has shape []
GradientDescent/update_Variable/ApplyGradientDescent [ApplyGradientDescent]
        output 0 (float_ref) has shape [784, 10]
gradients/MatMul_grad/tuple/control_dependency_1 [Identity]
        output 0 (float) has shape [784, 10]
gradients/MatMul_grad/tuple/group_deps [NoOp]
gradients/MatMul_grad/MatMul_1 [MatMul]
        output 0 (float) has shape [784, 10]
gradients/add_grad/tuple/control_dependency [Identity]
        output 0 (float) has shape [100, 10]
gradients/MatMul_grad/MatMul [MatMul]
        output 0 (float) has shape [100, 784]
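In the model report, each node is printed as `name [OpType]`, followed by one line per output. Judging from the output above, `?` marks an output whose shape could not be inferred, and `[xN]` appears to mark a dimension that shape inference could only track symbolically. The sketch below, in plain Python with hypothetical helpers `parse_model_report` and `not_fully_known`, parses that text format and lists nodes whose shapes are not fully known. This is a quick way to check whether the batch size propagated through the graph.

```python
import re

# A few lines copied from the model report above.
MODEL_REPORT = """\
gradients/add_grad/Sum_1 [Sum]
        output 0 (float) has shape ?
gradients/add_grad/BroadcastGradientArgs [BroadcastGradientArgs]
        output 0 (int32) has shape [x6]
        output 1 (int32) has shape [x7]
Placeholder [Placeholder]
        output 0 (float) has shape [100, 784]
GradientDescent [NoOp]
"""

NODE_RE = re.compile(r"^(\S+) \[(\w+)\]$")
OUTPUT_RE = re.compile(r"^\s+output (\d+) \((\w+)\) has shape (.+)$")


def parse_model_report(text):
    """Map node name -> (op type, [(dtype, shape string), ...])."""
    nodes = {}
    current = None
    for line in text.splitlines():
        node = NODE_RE.match(line)
        if node:
            current = node.group(1)
            nodes[current] = (node.group(2), [])
            continue
        out = OUTPUT_RE.match(line)
        if out and current is not None:
            nodes[current][1].append((out.group(2), out.group(3)))
    return nodes


def not_fully_known(nodes):
    """Names of nodes with an unknown ('?') or symbolic ('xN') output shape."""
    flagged = []
    for name, (_, outputs) in nodes.items():
        if any("?" in shape or re.search(r"\bx\d+", shape) for _, shape in outputs):
            flagged.append(name)
    return flagged


nodes = parse_model_report(MODEL_REPORT)
for name in not_fully_known(nodes):
    op_type, outputs = nodes[name]
    print(f"{name} [{op_type}]: {[shape for _, shape in outputs]}")
```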

P.S.:
If you want to know how Grappler generates these reports, a good place to start is the step statistics it measures, which you can get with this function:
from tensorflow.python.grappler import item

# Wrap the MetaGraphDef in a Grappler item and let the cluster (created above) run it.
grappler_item = item.Item(m)
# MeasureCosts returns (list of OpPerformance protos, run time, StepStats proto).
op_perfs, run_time, step_stats = cluster.MeasureCosts(grappler_item)
print("op_perfs...:", op_perfs)
print("run_time...:", run_time)
print("step_stats...", step_stats)
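You can check the full output yourself; below I only show a sample of it. (The shapes in this sample have 1000 rows rather than 100, so it appears to come from a run with a larger batch size.)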

op_perfs... :
...
...
compute_cost: 32000
node: "MatMul"
memory_time: 32000
op_memory {
  output_memory: 40000
}
, op {
  op: "Add"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  inputs {
    dtype: DT_FLOAT
    shape {
      dim {
        size: 1000
      }
      dim {
        size: 10
      }
    }
  }
  inputs {
    dtype: DT_FLOAT
    shape {
      dim {
        size: 10
      }
    }
  }
  device {
    type: "GPU"
    vendor: "NVIDIA"
    model: "GeForce GTX 1080"
    frequency: 1809
    num_cores: 20
    environment {
      key: "architecture"
      value: "6.1"
    }
    environment {
      key: "cuda"
      value: "9000"
    }
    environment {
      key: "cudnn"
      value: "7104"
    }
    num_registers: 65536
    l1_cache_size: 24576
    l2_cache_size: 2097152
    shared_memory_size_per_multiprocessor: 98304
    memory_size: 8504868864
    bandwidth: 320320000
  }
}

...
...

step_stats... :
...
...
  node_stats {
    node_name: "MatMul"
    op_end_rel_micros: 32
    all_end_rel_micros: 32
    output {
      tensor_description {
        dtype: DT_FLOAT
        shape {
          dim {
            size: 1000
          }
          dim {
            size: 10
          }
        }
        allocation_description {
          requested_bytes: 40000
          allocated_bytes: 40000
        }
      }
    }
    timeline_label: "MatMul"
    memory_stats {
    }
  }
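The sampled op_perfs and step_stats entries for MatMul agree with each other. A [1000, 10] DT_FLOAT output takes 1000 * 10 * 4 = 40000 bytes, which matches both output_memory and requested_bytes. The op_end_rel_micros value of 32 µs corresponds to the 32000 in compute_cost if compute_cost is reported in nanoseconds (an assumption here). The short check below redoes that arithmetic. The dtype-width table is an assumption that covers only the dtypes seen in this post.

```python
from math import prod

# Element widths in bytes for the dtypes in this post (assumes dense tensors).
DTYPE_BYTES = {"DT_FLOAT": 4, "DT_INT32": 4, "DT_INT64": 8}


def dense_tensor_bytes(dtype, dims):
    """Number of elements times element width."""
    return prod(dims) * DTYPE_BYTES[dtype]


# Values copied from the sampled MatMul entries above.
output_bytes = dense_tensor_bytes("DT_FLOAT", [1000, 10])
requested_bytes = 40000    # step_stats: allocation_description.requested_bytes
output_memory = 40000      # op_perfs: op_memory.output_memory
op_end_rel_micros = 32     # step_stats
compute_cost = 32000       # op_perfs (assumed to be in nanoseconds)

print(output_bytes == requested_bytes == output_memory)  # True
print(op_end_rel_micros * 1000 == compute_cost)          # True
```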

