Danny's tech notebook | 丹尼技術手札

Wednesday, June 19, 2019

[DDS] Install OpenSplice Community Version

What is DDS? The Data Distribution Service (DDS™) is a middleware protocol and API standard for data-centric connectivity from the Object Management Group® (OMG®). It integrates the components of a system together, providing low-latency data connectivity, extreme reliability, and a scalable architecture that business and mission-critical Internet of Things (IoT) applications need. For more information, check it out: https://www.dds-foundation.org/what-is-dds-3/
https://zhuanlan.zhihu.com/p/32278571

In this post, I install OpenSplice as my DDS runtime environment and library. You can download it from here: https://www.adlinktech.com/en/dds-community-software-evaluation.aspx

[TensorFlow] Build TensorFlow from source with Intel MKL enabled

Based on this document "Intel® Math Kernel Library for Deep Learning Networks: Part 1–Overview and Installation", I give a quick summary to do it.
Here are the steps for building TensorFlow from source with Intel MKL enabled.

For MKL DNN:

# Install Intel MKL & MKL-DNN
$ git clone https://github.com/01org/mkl-dnn.git
$ cd mkl-dnn
$ cd scripts && ./prepare_mkl.sh && cd ..
$ mkdir -p build && cd build && cmake .. && make -j$(nproc)
$ make test
$ sudo make install

For building TensorFlow:

[Squid] Setup Linux Proxy Server and Client

The following steps are about installing and setup Linux proxy server (Squid)

#install squid proxy
$ sudo apt-get install squid

#modify squid.conf
$ sudo vi /etc/squid/squid.conf
  ==> #add your local network segment
  acl mynetwork src 192.168.0.0/24
  ==> #allow "mynetwork" be accessing via http
  http_access allow mynetwork

#restart squid...OK
$ sudo service squid restart

[Docker] Using GUI with Docker

Recently I need to run my GUI application with Docker and it can show up either directly on my desktop operating system or in my ssh terminal client via X11.

Basically, there are some people who already provide the solution for the cases. I just list the reference and quickly give my examples.

[TVM] Deploy TVM Module using C++ API

When the first time to deal with deploying TVM module using C++ API, I found this official web site: Deploy TVM Module using C++ API which only gives a fine example for deploying, but it doesn't explain how to generate the related files and modules.

So, after several times of trial and error, I figure it out to generate the required files for deploying.
Basically you can refer to the TVM tutorials about compiling models:
https://docs.tvm.ai/tutorials/index.html

[Experiment] Compare the inference performance of TensorFlow Lite and TVM

I compare the inference performance of both TensorFlow Lite and TVM on my laptop with the same MobileNet model and the same input size of 224*224.
They both assign two threads to do the inference task and see the average inferencing time it spent.
(P.S: Giving the same of 10 threads in these 2 cases )

[TensorFlow Lite] The performance experiment of TensorFlow Lite

Based on my previous post: [TensorFlow Lite] My first try with TensorFlow Lite ,
I am just curious about the performance of TensorFlow Lite on the different platforms so that I do a performance experiment for it.
There are a X86 server (no GPU enabled) and a Nvidia TX2 board ( which is ARM based ) to do the experiment. They both run TFLite’s official example using Inception_v3 model and get average inference time as the following table.

Hardware Information:
Nvidia TX2: 4 CPU cores and 1 GPU ( but does not use )
X86 server: 24 CPU cores and no GPU enabled

[XLA] Build Tensorflow XLA AOT Shared Library

Build Tensorflow XLA AOT Shared Library:

Add the following code into tensorflow/compiler/aot/BUILD in TensorFlow source:

cc_binary(
    name = "libxlaaot.so",
    deps = [":embedded_protocol_buffers",
        "//tensorflow/compiler/tf2xla:xla_jit_compiled_cpu_function",
        "//tensorflow/compiler/tf2xla",
        "//tensorflow/compiler/tf2xla:cpu_function_runtime",
        "//tensorflow/compiler/tf2xla:tf2xla_proto",
        "//tensorflow/compiler/tf2xla:tf2xla_util",
        "//tensorflow/compiler/tf2xla:xla_compiler",
        "//tensorflow/compiler/tf2xla/kernels:xla_cpu_only_ops",
        "//tensorflow/compiler/tf2xla/kernels:xla_dummy_ops",
        "//tensorflow/compiler/tf2xla/kernels:xla_ops",
        "//tensorflow/compiler/xla:shape_util",
        "//tensorflow/compiler/xla:statusor",
        "//tensorflow/compiler/xla:util",
        "//tensorflow/compiler/xla:xla_data_proto",
        "//tensorflow/compiler/xla/client:client_library",
        "//tensorflow/compiler/xla/client:compile_only_client",
        "//tensorflow/compiler/xla/client:xla_computation",
        "//tensorflow/compiler/xla/service:compiler",
        "//tensorflow/compiler/xla/service/cpu:buffer_info_util",
        "//tensorflow/compiler/xla/service/cpu:cpu_compiler",
        "//tensorflow/core:core_cpu_internal",
        "//tensorflow/core:framework_internal",
        "//tensorflow/core:lib",
        "//tensorflow/core:lib_internal",
        "//tensorflow/core:protos_all_cc",
        ":tfcompile_lib",
        "//tensorflow/compiler/xla/legacy_flags:debug_options_flags",
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:framework",
        "//tensorflow/core:graph",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/strings",
        "@com_google_absl//absl/types:span",
    ],
    linkopts=["-shared -Wl,--whole-archive" ],
    linkshared=1

)

[TensorFlow Lite] Build Tensorflow Lite C++ shared library

Build Tensorflow Lite shared library:

Modify the code: tensorflow/contrib/lite/BUILD in TensorFlow source:

cc_binary(
    name = "libtflite.so",
    deps = [":framework",
        "//tensorflow/contrib/lite/kernels:builtin_ops",
        "//tensorflow/contrib/lite/kernels:eigen_support",
        "//tensorflow/contrib/lite/kernels:gemm_support",
    ],
    linkopts=["-shared -Wl,--whole-archive" ],
    linkshared=1

)

[TFCompile] Use XLA AOT Compiler to compile Resnet50 model and do inference

I inspired by this following article and try to do something different because it's approach using Keras has an issue for XLA AOT compiler.
Kerasモデルをtfcompileでコンパイルする
Instead, I download the pre-trained Resnet50 model and optimize simply by the tool:transform_graph

Download:

wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC.tar.gz
or
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz

Transform:

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph='/home/liudanny/git/tensorflow_danny/tensorflow/compiler/aot/myaot_resnet50/resnetv2_imagenet_frozen_graph.pb' \
--out_graph='/home/liudanny/workspace/pyutillib/my_resnet50/optimized_frozen_graph.pb' \
--inputs='input_tensor:0' \
--outputs='softmax_tensor:0' \
--transforms='
  strip_unused_nodes
  fold_constants'

#I don't enable the follwoing options due to worse performance.
#fold_batch_norms      <== For XLA AOT, it is not good in the performance
#fold_old_batch_norms  <== For XLA AOT, it is not good in the performance
#quantize_weights' <== XLA AOT doesn't support

[AutoKeras] My first try with a simple example of AutoKeras

AutoKeras only supports Python 3.6 so that the running environment has to install Python 3.6. My operation system is Ubuntu 16.04 and it needs to add apt repository first.

Install Python 3.6 and AutoKeras ( Don't remove Python 3.5)

# Install pip3
apt-get install python3-pip
# Install Python 3.6
apt-get install software-properties-common
add-apt-repository ppa:jonathonf/python-3.6
apt-get update
apt-get install python3.6

update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.5 1
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 2
update-alternatives --config python3

ln -s /usr/include/python3.5 /usr/include/python3.6m
pip3 install lws
pip3 install autokeras

[TensorFlow] Build TensorFlow v1.12 from the source on Ubuntu 16.04

My previous post: [TensorFlow] How to build your C++ program or application with TensorFlow library using CMake

It is for building TensorFlow from the source based on v1.10. Currently, I want to upgrade it to v1.12 and encounter some problems.

First, my version of ProtoBuf library on my system is v3.6.1 so that we should align its version in the TensorFlow.
Second, it seems that there are a few issues when building TensorFlow v1.12 that we need to deal with it case by case.

[TensorFlow Lite] My first try with TensorFlow Lite

I just take my first try with the example: label_image (tensorflow/contrib/lite/examples/label_image) in TensorFlow Lite and write down the commands that I used.
There are a bunch of information from the offical TensorFlow Lite guide:
https://www.tensorflow.org/lite/guide

1. convert the example of model to tflite format

[Tool] Convert TensorFlow graph to UFF format

The previous post: How to use TensorRT to do inference with TensorFlow model ? has introduced away to do the converting job for UFF format model. But, basically there are 2 ways to do that:

1. Convert TensorFlow's Session GraphDef directly on the fly to UFF format model
==> convert_uff_from_tensorflow()
2. Convert the frozen model file to UFF format model
==> convert_uff_from_frozen_model()

The following code is about the functions to convert TensorFlow graph to UFF format for running with TensorRT.

[OpenCV] Build OpenCV 3.4.4 on TX2

For the reason that I was curious about the performance of OpenCV on TX2 using GPU, I installed OpenCV 3.4.4 (this version and after will integrate with the inference engine) on my TX2 based on the following links.
https://jkjung-avt.github.io/opencv3-on-tx2/
https://www.learnopencv.com/install-opencv-3-4-4-on-ubuntu-16-04/

[Inspecting Graphs] Use TensorFlow's summarize_graph tool to find the input and output node names in the frozen model/graph

When trying to do inferencing using a frozen model from downloading or freezing by yourself, we may encounter a problem about what the input and output node names are in this model? If we cannot figure them out, it is impossible for you to do inferencing correctly. Here is an easy way to get the possible ones: using the tool: "summarize_graph"

bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=/danny/tmp/faster_rcnn_resnet101_coco_2018_01_28/frozen_inference_graph.pb

Found 1 possible inputs: (name=image_tensor, type=uint8(4), shape=[?,?,?,3])
No variables spotted.
Found 4 possible outputs: (name=detection_boxes, op=Identity) (name=detection_scores, op=Identity) (name=num_detections, op=Identity) (name=detection_classes, op=Identity)
Found 48132698 (48.13M) const parameters, 0 (0) variable parameters, and 4163 control_edges
Op types used: 4688 Const, 885 StridedSlice, 559 Gather, 485 Mul, 472 Sub, 462 Minimum, 369 Maximum, 304 Reshape, 276 Split, 205 RealDiv, 204 Pack, 202 ConcatV2, 201 Cast, 188 Greater, 183 Where, 149 Shape, 145 Add, 109 BiasAdd, 107 Conv2D, 106 Slice, 100 Relu, 99 Unpack, 97 Squeeze, 94 ZerosLike, 91 NonMaxSuppressionV2, 55 Enter, 46 Identity, 45 Switch, 27 Range, 24 Merge, 22 TensorArrayV3, 17 ExpandDims, 15 NextIteration, 12 TensorArrayScatterV3, 12 TensorArrayReadV3, 10 TensorArrayWriteV3, 10 Exit, 10 Tile, 10 TensorArrayGatherV3, 10 TensorArraySizeV3, 6 Transpose, 6 Fill, 6 Assert, 5 Less, 5 LoopCond, 5 Equal, 4 Round, 4 Exp, 4 MaxPool, 3 Pad, 2 Softmax, 2 Size, 2 GreaterEqual, 2 TopKV2, 2 MatMul, 1 All, 1 CropAndResize, 1 ResizeBilinear, 1 Relu6, 1 Placeholder, 1 LogicalAnd, 1 Max, 1 Mean
To use with tensorflow/tools/benchmark:benchmark_model try these arguments:
bazel run tensorflow/tools/benchmark:benchmark_model -- --graph=/danny/tmp/faster_rcnn_resnet101_coco_2018_01_28/frozen_inference_graph.pb --show_flops --input_layer=image_tensor --input_layer_type=uint8 --input_layer_shape=-1,-1,-1,3 --output_layer=detection_boxes,detection_scores,num_detections,detection_classes

For more information, please refer to this:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs

Wednesday, January 30, 2019

[TFRecord] The easy way to verify your TFRecord file

There is a common situation when you build your TFRecord file ( your dataset ) and want to verify the correctness of the data in it. How to do it? I assume you don't have the problem to build your TFRecord file. So, the easy way to verify your TFRecord file is to use the API: tf.python_io.tf_record_iterator().

[TensorFlow] How to use Distribution Strategy in TensorFlow?

Learned from What’s coming in TensorFlow 2.0, TensorFlow 2.0 is coming soon and there are several features which are ready to use already, for instance, Distribution Strategy. Quoted from the article,
"For large ML training tasks, the Distribution Strategy API makes it easy to distribute and train models on different hardware configurations without changing the model definition. Since TensorFlow provides support for a range of hardware accelerators like CPUs, GPUs, and TPUs, you can enable training workloads to be distributed to single-node/multi-accelerator as well as multi-node/multi-accelerator configurations, including TPU Pods. Although this API supports a variety of cluster configurations, templates to deploy training on Kubernetes clusters in on-prem or cloud environments are provided."

[Shell] The example shell script of automation way to build the software or library required

I don't like to preach at people. But, if someone has a task that has been done more than two times, then he/she should consider using an automation way to ease the burden. One of the solutions is by writing a shell script. Here is an example of building OpenCV 3.4.1 on TX2 using a shell script. The software or the target platform is not the key part. The most important part is to adopt this sort of shell script to become the one of your version.

[Tool] Visdom is a great visualization tool

I cannot use my word to describe Visdom because it is so amazingly awesome. Referencing the introduction from Facebook AI's official site,Visdom is a visualization tool that generates rich visualizations of live data to help researchers and developers stay on top of their scientific experiments that are run on remote servers. Visualizations in Visdom can be viewed in browsers and easily shared with others.

[TensorRT] How to use TensorRT to do inference with TensorFlow model ?

TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low latency, high-throughput inference for deep learning applications. Here I am going to demonstrate that how to use TensorRT to do inference with TensorFlow model.

Install TensorRT
Please refer to this official website first:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html#installing
After downloading TensorRT 4.0 ( in my case ), we can install it.

$ dpkg -i nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612_1-1_amd64.deb
$ apt-get update
$ apt-get install tensorrt
$ apt-get install python-libnvinfer-dev
$ apt-get install uff-converter-tf

[TensorFlow] How to write op with gradient in python?

Recently for some reasons, I studied the Domain-Adversarial Training of Neural Networks and it can be downloaded from http://jmlr.org/papers/volume17/15-239/15-239.pdf

In this paper, there is the key point that we should implement "Gradient Reversal Layer" for Discriminator to use it to connect the feature extractor. I found the source to implement it by replacing Identity op's gradient function as follows:

[TensorFlow] How to generate the Memory Report from Grappler?

In the previous post, I introduce the way to generate cost and model report from Grappler.
https://danny270degree.blogspot.com/2019/01/tensorflow-how-to-generate-cost-and.html
In this post, I will continue to introduce the memory report which I think that is very useful. Please refer to my previous post to get the model code.

[TensorFlow] How to generate the Cost and Model Report from Grappler?

General speaking, Grappler in Tensorflow has several optimizers to do the specific area optimizations, such as for reducing the peak memory usage in GPU. So, I want to introduce some useful functions inside Grappler which are used for Simple Placer mechanism. And, these functions are also partially used in Grappler's optimizers.

[TensorFlow] My example of using SavedModelBuilder to do inference in TensorFlow

The purpose of this post is to show my example of SavedModelBuilder to do inference in TensorFlow. From my experiment, this approach can save a model with the signature that has input and output node name. And SavedModelBuilder can restore the graph based on the previously saved model pb file and the signature definition. Once, the restore is done, the inference task can be executed directly without GPU device needed if the training task is on GPU device.

[Reinforcement Learning] Get started to learn Actor Critic for reinforcement learning

Actor-Critic is basically combined with Policy Gradient (Actor) and Function Approximation (Critic) based algorithm together. Actor is based on the probability given by policy to act and Critic judges the performance of Actor and gives the score. So, Actor will improve its probability given by policy based on Critic's judge and score. The following diagram is the concept:

Wednesday, June 19, 2019

Wednesday, June 12, 2019

Monday, June 10, 2019

The following steps are about installing and setup Linux proxy server (Squid)

Tuesday, May 28, 2019

Monday, April 22, 2019

Monday, April 15, 2019

Wednesday, April 10, 2019

Wednesday, March 27, 2019

Thursday, March 21, 2019

Friday, March 15, 2019

Monday, March 11, 2019

Wednesday, March 6, 2019

Tuesday, March 5, 2019

Monday, February 25, 2019

Wednesday, January 30, 2019

Wednesday, January 16, 2019

Friday, January 11, 2019

Thursday, January 10, 2019

Monday, January 7, 2019

Friday, January 4, 2019

Thursday, January 3, 2019

Monday, December 24, 2018

Saturday, December 22, 2018