Wednesday, April 10, 2019

[TensorFlow Lite] A performance experiment with TensorFlow Lite

Following up on my previous post, [TensorFlow Lite] My first try with TensorFlow Lite, I was curious about the performance of TensorFlow Lite on different platforms, so I ran a performance experiment.
I used an x86 server (no GPU enabled) and an NVIDIA TX2 board (which is ARM-based). Both ran TFLite's official example with the Inception_v3 model; the average inference times are shown in the following table.

Hardware Information:
Nvidia TX2: 4 CPU cores and 1 GPU (the GPU is not used in this experiment)
x86 server: 24 CPU cores and no GPU enabled
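
For reference, here is a minimal sketch of how such an average inference time can be measured with the TFLite Python interpreter (TF 1.x contrib API; the model path is a placeholder and a float model is assumed):

import time
import numpy as np
import tensorflow as tf

# Load a TFLite model (placeholder path) and allocate its tensors.
interpreter = tf.contrib.lite.Interpreter(model_path="inception_v3.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Random input with the model's expected shape (float model assumed).
data = np.random.rand(*inp["shape"]).astype(np.float32)

runs = 100
start = time.time()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
print("average inference time: %.1f ms" % ((time.time() - start) / runs * 1000))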

[XLA] Build Tensorflow XLA AOT Shared Library

Build the TensorFlow XLA AOT shared library:

Add the following cc_binary target to tensorflow/compiler/aot/BUILD in the TensorFlow source, then build it with: bazel build //tensorflow/compiler/aot:libxlaaot.so

cc_binary(
    name = "libxlaaot.so",
    deps = [":embedded_protocol_buffers",
        "//tensorflow/compiler/tf2xla:xla_jit_compiled_cpu_function",
        "//tensorflow/compiler/tf2xla",
        "//tensorflow/compiler/tf2xla:cpu_function_runtime",
        "//tensorflow/compiler/tf2xla:tf2xla_proto",
        "//tensorflow/compiler/tf2xla:tf2xla_util",
        "//tensorflow/compiler/tf2xla:xla_compiler",
        "//tensorflow/compiler/tf2xla/kernels:xla_cpu_only_ops",
        "//tensorflow/compiler/tf2xla/kernels:xla_dummy_ops",
        "//tensorflow/compiler/tf2xla/kernels:xla_ops",
        "//tensorflow/compiler/xla:shape_util",
        "//tensorflow/compiler/xla:statusor",
        "//tensorflow/compiler/xla:util",
        "//tensorflow/compiler/xla:xla_data_proto",
        "//tensorflow/compiler/xla/client:client_library",
        "//tensorflow/compiler/xla/client:compile_only_client",
        "//tensorflow/compiler/xla/client:xla_computation",
        "//tensorflow/compiler/xla/service:compiler",
        "//tensorflow/compiler/xla/service/cpu:buffer_info_util",
        "//tensorflow/compiler/xla/service/cpu:cpu_compiler",
        "//tensorflow/core:core_cpu_internal",
        "//tensorflow/core:framework_internal",
        "//tensorflow/core:lib",
        "//tensorflow/core:lib_internal",
        "//tensorflow/core:protos_all_cc",
        ":tfcompile_lib",
        "//tensorflow/compiler/xla/legacy_flags:debug_options_flags",
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:framework",
        "//tensorflow/core:graph",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/strings",
        "@com_google_absl//absl/types:span",
    ],
    linkopts=["-shared -Wl,--whole-archive" ],
    linkshared=1

)

Wednesday, March 27, 2019

[TensorFlow Lite] Build Tensorflow Lite C++ shared library

Build the TensorFlow Lite shared library:

Add the following cc_binary target to tensorflow/contrib/lite/BUILD in the TensorFlow source, then build it with: bazel build //tensorflow/contrib/lite:libtflite.so
cc_binary(
    name = "libtflite.so",
    deps = [":framework",
        "//tensorflow/contrib/lite/kernels:builtin_ops",
        "//tensorflow/contrib/lite/kernels:eigen_support",
        "//tensorflow/contrib/lite/kernels:gemm_support",
    ],
    linkopts=["-shared -Wl,--whole-archive" ],
    linkshared=1

)

[TFCompile] Use XLA AOT Compiler to compile Resnet50 model and do inference

I was inspired by the following article and tried something different, because its Keras-based approach has an issue with the XLA AOT compiler:
Compiling a Keras model with tfcompile (Kerasモデルをtfcompileでコンパイルする)
Instead, I downloaded the pre-trained ResNet50 model and optimized it simply with the transform_graph tool.

Download:
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC.tar.gz
or
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz
Transform:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph='/home/liudanny/git/tensorflow_danny/tensorflow/compiler/aot/myaot_resnet50/resnetv2_imagenet_frozen_graph.pb' \
--out_graph='/home/liudanny/workspace/pyutillib/my_resnet50/optimized_frozen_graph.pb' \
--inputs='input_tensor:0' \
--outputs='softmax_tensor:0' \
--transforms='
  strip_unused_nodes
  fold_constants'

#I don't enable the follwoing options due to worse performance.
#fold_batch_norms      <== For XLA AOT, it is not good in the performance
#fold_old_batch_norms  <== For XLA AOT, it is not good in the performance
#quantize_weights' <== XLA AOT doesn't support

Thursday, March 21, 2019

[AutoKeras] My first try with a simple example of AutoKeras

AutoKeras only supports Python 3.6, so the running environment has to have Python 3.6 installed. My operating system is Ubuntu 16.04, which requires adding an apt repository first.

Install Python 3.6 and AutoKeras (don't remove Python 3.5):
# Install pip3
apt-get install python3-pip
# Install Python 3.6
apt-get install software-properties-common
add-apt-repository ppa:jonathonf/python-3.6
apt-get update
apt-get install python3.6

update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.5 1
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 2
update-alternatives --config python3

ln -s /usr/include/python3.5 /usr/include/python3.6m
pip3 install lws
pip3 install autokeras
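
With the environment ready, a first try can follow the MNIST example from the AutoKeras README of that era (autokeras 0.x API; the one-hour time limit is just for illustration):

from keras.datasets import mnist
from autokeras import ImageClassifier

# Load MNIST and add a channel dimension, since AutoKeras expects 4-D image data.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.reshape(x_test.shape + (1,))

# Search for a good architecture within the time limit (in seconds),
# then retrain the best model found and evaluate it.
clf = ImageClassifier(verbose=True)
clf.fit(x_train, y_train, time_limit=60 * 60)
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
print(clf.evaluate(x_test, y_test))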

Friday, March 15, 2019

[TensorFlow] Build TensorFlow v1.12 from the source on Ubuntu 16.04

My previous post, [TensorFlow] How to build your C++ program or application with TensorFlow library using CMake, covers building TensorFlow from source based on v1.10. Now I want to upgrade to v1.12, and I have encountered some problems.
First, the ProtoBuf library on my system is v3.6.1, so we should align TensorFlow's ProtoBuf version with it.
Second, it seems there are a few issues when building TensorFlow v1.12 that we need to deal with case by case.

Monday, March 11, 2019

[TensorFlow Lite] My first try with TensorFlow Lite

I took my first try with the label_image example (tensorflow/contrib/lite/examples/label_image) in TensorFlow Lite and wrote down the commands I used.
There is a lot of information in the official TensorFlow Lite guide:
https://www.tensorflow.org/lite/guide

1. Convert the example model to the TFLite format, as sketched below.
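
A minimal sketch of the conversion in Python (TF 1.12-era tf.contrib.lite API; the paths and node names are placeholders for your own model):

import tensorflow as tf

# Convert a frozen graph (placeholder path and node names) to a .tflite file.
converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
    "frozen_graph.pb",
    input_arrays=["input"],
    output_arrays=["output"])
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)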

Wednesday, March 6, 2019

[Tool] Convert TensorFlow graph to UFF format

The previous post, How to use TensorRT to do inference with TensorFlow model?, introduced a way to convert a model to the UFF format. But basically there are 2 ways to do that:

1. Convert TensorFlow's Session GraphDef directly on the fly to UFF format model
    ==> convert_uff_from_tensorflow()
2. Convert the frozen model file to UFF format model
    ==> convert_uff_from_frozen_model()

The following code is about the functions that convert a TensorFlow graph to the UFF format for running with TensorRT.
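
As a hedged sketch of what those two functions can look like, they are thin wrappers around the uff package that ships with TensorRT (the wrapper names follow the list above; the output file name is a placeholder):

import uff  # installed by TensorRT's uff-converter-tf package

def convert_uff_from_tensorflow(graphdef, output_nodes, out_file="model.uff"):
    # Convert an in-memory, already-frozen GraphDef on the fly to UFF.
    return uff.from_tensorflow(graphdef=graphdef,
                               output_nodes=output_nodes,
                               output_filename=out_file)

def convert_uff_from_frozen_model(frozen_pb, output_nodes, out_file="model.uff"):
    # Convert a frozen model file (.pb) to UFF.
    return uff.from_tensorflow_frozen_model(frozen_pb,
                                            output_nodes=output_nodes,
                                            output_filename=out_file)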

Tuesday, March 5, 2019

[OpenCV] Build OpenCV 3.4.4 on TX2

Because I was curious about the GPU performance of OpenCV on the TX2, I installed OpenCV 3.4.4 (this version and later integrate with the inference engine) on my TX2, based on the following links:
https://jkjung-avt.github.io/opencv3-on-tx2/
https://www.learnopencv.com/install-opencv-3-4-4-on-ubuntu-16-04/

Monday, February 25, 2019

[Inspecting Graphs] Use TensorFlow's summarize_graph tool to find the input and output node names in the frozen model/graph

When trying to do inference with a frozen model, whether downloaded or frozen by yourself, we may run into the question of what the input and output node names in the model are. If we cannot figure them out, it is impossible to do inference correctly. Here is an easy way to get the likely candidates: the summarize_graph tool.

bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=/danny/tmp/faster_rcnn_resnet101_coco_2018_01_28/frozen_inference_graph.pb
Found 1 possible inputs: (name=image_tensor, type=uint8(4), shape=[?,?,?,3])
No variables spotted.
Found 4 possible outputs: (name=detection_boxes, op=Identity) (name=detection_scores, op=Identity) (name=num_detections, op=Identity) (name=detection_classes, op=Identity)
Found 48132698 (48.13M) const parameters, 0 (0) variable parameters, and 4163 control_edges
Op types used: 4688 Const, 885 StridedSlice, 559 Gather, 485 Mul, 472 Sub, 462 Minimum, 369 Maximum, 304 Reshape, 276 Split, 205 RealDiv, 204 Pack, 202 ConcatV2, 201 Cast, 188 Greater, 183 Where, 149 Shape, 145 Add, 109 BiasAdd, 107 Conv2D, 106 Slice, 100 Relu, 99 Unpack, 97 Squeeze, 94 ZerosLike, 91 NonMaxSuppressionV2, 55 Enter, 46 Identity, 45 Switch, 27 Range, 24 Merge, 22 TensorArrayV3, 17 ExpandDims, 15 NextIteration, 12 TensorArrayScatterV3, 12 TensorArrayReadV3, 10 TensorArrayWriteV3, 10 Exit, 10 Tile, 10 TensorArrayGatherV3, 10 TensorArraySizeV3, 6 Transpose, 6 Fill, 6 Assert, 5 Less, 5 LoopCond, 5 Equal, 4 Round, 4 Exp, 4 MaxPool, 3 Pad, 2 Softmax, 2 Size, 2 GreaterEqual, 2 TopKV2, 2 MatMul, 1 All, 1 CropAndResize, 1 ResizeBilinear, 1 Relu6, 1 Placeholder, 1 LogicalAnd, 1 Max, 1 Mean
To use with tensorflow/tools/benchmark:benchmark_model try these arguments:
bazel run tensorflow/tools/benchmark:benchmark_model -- --graph=/danny/tmp/faster_rcnn_resnet101_coco_2018_01_28/frozen_inference_graph.pb --show_flops --input_layer=image_tensor --input_layer_type=uint8 --input_layer_shape=-1,-1,-1,3 --output_layer=detection_boxes,detection_scores,num_detections,detection_classes

For more information, please refer to this:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs

Wednesday, January 30, 2019

[TFRecord] The easy way to verify your TFRecord file

There is a common situation: you build your TFRecord file (your dataset) and want to verify the correctness of the data in it. How? Assuming you had no problem building the TFRecord file itself, the easy way to verify it is to use the API tf.python_io.tf_record_iterator().
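
A minimal sketch (the file name is a placeholder; each record is assumed to be a serialized tf.train.Example):

import tensorflow as tf

# Iterate over the serialized records and parse each one back into an Example.
for record in tf.python_io.tf_record_iterator("train.tfrecord"):
    example = tf.train.Example()
    example.ParseFromString(record)
    print(example)  # inspect the features to check the data is what you expect
    break  # remove this break to walk through every record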

Wednesday, January 16, 2019

[TensorFlow] How to use Distribution Strategy in TensorFlow?

As I learned from What’s coming in TensorFlow 2.0, TensorFlow 2.0 is coming soon, and several features are ready to use already, for instance, Distribution Strategy. Quoting the article:
"For large ML training tasks, the Distribution Strategy API makes it easy to distribute and train models on different hardware configurations without changing the model definition. Since TensorFlow provides support for a range of hardware accelerators like CPUs, GPUs, and TPUs, you can enable training workloads to be distributed to single-node/multi-accelerator as well as multi-node/multi-accelerator configurations, including TPU Pods. Although this API supports a variety of cluster configurations, templates to deploy training on Kubernetes clusters in on-prem or cloud environments are provided."

Friday, January 11, 2019

[Shell] The example shell script of automation way to build the software or library required

I don't like to preach at people. But, if someone has a task that has been done more than two times, then he/she should consider using an automation way to ease the burden. One of the solutions is by writing a shell script. Here is an example of building OpenCV 3.4.1 on TX2 using a shell script. The software or the target platform is not the key part. The most important part is to adopt this sort of shell script to become the one of your version.

Thursday, January 10, 2019

[Tool] Visdom is a great visualization tool

I can hardly find the words to describe Visdom because it is so amazingly awesome. Quoting the introduction from Facebook AI's official site: "Visdom is a visualization tool that generates rich visualizations of live data to help researchers and developers stay on top of their scientific experiments that are run on remote servers. Visualizations in Visdom can be viewed in browsers and easily shared with others."
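
A minimal taste of it (assuming a Visdom server is already running locally, started with: python -m visdom.server):

import numpy as np
import visdom

# Connect to the local Visdom server (default http://localhost:8097).
vis = visdom.Visdom()

# Push a simple line plot; it shows up live in the browser.
vis.line(Y=np.random.rand(10), opts=dict(title="my first visdom plot"))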

Monday, January 7, 2019

[TensorRT] How to use TensorRT to do inference with TensorFlow model?

TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for deep learning applications. Here I am going to demonstrate how to use TensorRT to do inference with a TensorFlow model.


Install TensorRT
Please refer to this official website first:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html#installing
After downloading TensorRT 4.0 (in my case), we can install it:
$ dpkg -i nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612_1-1_amd64.deb
$ apt-get update
$ apt-get install tensorrt
$ apt-get install python-libnvinfer-dev
$ apt-get install uff-converter-tf

Friday, January 4, 2019

[TensorFlow] How to write an op with gradient in Python?

Recently, for some reasons, I studied Domain-Adversarial Training of Neural Networks; the paper can be downloaded from http://jmlr.org/papers/volume17/15-239/15-239.pdf

The key point in this paper is that we should implement a "Gradient Reversal Layer" so the discriminator can be connected to the feature extractor. I found it can be implemented by replacing the Identity op's gradient function, as follows.
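
A widely used sketch of this trick: gradient_override_map swaps the Identity op's gradient for one that flips the sign and scales it by l.

import tensorflow as tf
from tensorflow.python.framework import ops

class FlipGradientBuilder(object):
    """Forward pass: identity. Backward pass: gradient multiplied by -l."""

    def __init__(self):
        self.num_calls = 0  # each call registers a uniquely named gradient

    def __call__(self, x, l=1.0):
        grad_name = "FlipGradient%d" % self.num_calls

        @ops.RegisterGradient(grad_name)
        def _flip_gradients(op, grad):
            return [-grad * l]

        g = tf.get_default_graph()
        with g.gradient_override_map({"Identity": grad_name}):
            y = tf.identity(x)
        self.num_calls += 1
        return y

flip_gradient = FlipGradientBuilder()
# Usage: pass the feature extractor's output through flip_gradient(features)
# before the discriminator, so gradients reverse direction at this layer.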

Thursday, January 3, 2019

[TensorFlow] How to generate the Memory Report from Grappler?

In the previous post, I introduced the way to generate the cost and model reports from Grappler:
https://danny270degree.blogspot.com/2019/01/tensorflow-how-to-generate-cost-and.html
In this post, I will continue with the memory report, which I think is very useful. Please refer to my previous post for the model code.
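
As a hedged sketch, assuming TensorFlow's internal cost_analyzer module (a toy graph stands in for the model from the previous post):

import tensorflow as tf
from tensorflow.python.grappler import cost_analyzer

# A toy graph standing in for the real model.
a = tf.random_normal([1024, 1024])
b = tf.matmul(a, a)
tf.add_to_collection(tf.GraphKeys.TRAIN_OP, b)  # Grappler needs a fetch node

# Export the default graph as a MetaGraphDef and generate the memory report.
metagraph = tf.train.export_meta_graph(graph=tf.get_default_graph())
print(cost_analyzer.GenerateMemoryReport(metagraph))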

[TensorFlow] How to generate the Cost and Model Report from Grappler?

Generally speaking, Grappler in TensorFlow has several optimizers that do optimizations for specific areas, such as reducing peak memory usage on the GPU. So, I want to introduce some useful functions inside Grappler which are used by the Simple Placer mechanism; these functions are also partially used in Grappler's optimizers.

Monday, December 24, 2018

[TensorFlow] My example of using SavedModelBuilder to do inference in TensorFlow

The purpose of this post is to show my example of using SavedModelBuilder for inference in TensorFlow. From my experiment, this approach can save a model with a signature that records the input and output node names, and the graph can later be restored from the saved model pb file and the signature definition. Once the restore is done, inference can be executed directly with no GPU device needed, even if the training task ran on a GPU device.
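
A minimal sketch of the save side (TF 1.x API; the toy graph and the export path are placeholders for illustration):

import tensorflow as tf

# A toy graph standing in for a real model.
x = tf.placeholder(tf.float32, [None, 4], name="x")
y = tf.layers.dense(x, 2, name="y")

# The export directory must not exist yet.
builder = tf.saved_model.builder.SavedModelBuilder("./export")
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={"x": x}, outputs={"y": y})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={"predict": signature})
builder.save()

# Later, restore for inference (no GPU needed):
# with tf.Session(graph=tf.Graph()) as sess:
#     tf.saved_model.loader.load(
#         sess, [tf.saved_model.tag_constants.SERVING], "./export")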

Saturday, December 22, 2018

[Reinforcement Learning] Get started to learn Actor Critic for reinforcement learning

Actor-Critic basically combines a Policy Gradient based algorithm (the Actor) with a Function Approximation based algorithm (the Critic). The Actor acts according to the probabilities given by its policy, and the Critic judges the Actor's performance and gives it a score. The Actor then improves its policy's probabilities based on the Critic's judgment and score.
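
Written as update rules (a standard one-step formulation; this notation is my own summary, not taken from the diagram): the Critic learns a value function V(s) and computes the TD error

δ = r + γ·V(s′) − V(s)

which is exactly the "score"; the Critic is updated to shrink δ², and the Actor's policy parameters θ are moved along δ·∇θ log πθ(a|s), so actions that score better than expected become more probable. The following diagram shows the concept: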


Friday, December 14, 2018

[Reinforcement Learning] Get started to learn Sarsa for reinforcement learning

If you take a look at the Sarsa algorithm, you will find that it is very similar to Q-Learning.
For my previous post about Q-Learning, please refer to this link:
https://danny270degree.blogspot.com/2018/11/reinforcement-learning-get-started-to_21.html

Here is the Sarsa algorithm:
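
Its core update rule, in the standard textbook form (α is the learning rate, γ the discount factor), is

Q(s, a) ← Q(s, a) + α·[ r + γ·Q(s′, a′) − Q(s, a) ]

where a′ is the action actually chosen by the current policy in state s′, which makes Sarsa on-policy. Q-Learning uses γ·max over a′ of Q(s′, a′) instead, and that is the only difference between the two updates.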

Thursday, December 13, 2018

[Reinforcement Learning] Using dynamic programming to solve a simple 4x4 GridWorld

I borrowed the example and its source code from here, which uses dynamic programming to solve a simple 4x4 GridWorld, and added my explanation of the value function calculation. I hope it will help you understand dynamic programming and the Markov Reward Process (MRP) more quickly.
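
For reference, the calculation behind it is iterative policy evaluation (assuming the classic Sutton & Barto 4x4 GridWorld setup: a uniform random policy over the four moves and a reward of −1 per step). Each sweep applies the Bellman expectation update

v_{k+1}(s) = Σ_a π(a|s) · Σ_{s′} P(s′|s, a) · [ r + γ·v_k(s′) ]

to every non-terminal state s, and repeats until the value function stops changing.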