Monday, June 10, 2019

[Squid] Setup Linux Proxy Server and Client

The following steps cover installing and setting up a Linux proxy server (Squid).

#install squid proxy
$ sudo apt-get install squid

#modify squid.conf
$ sudo vi /etc/squid/squid.conf
  ==> #add your local network segment
  acl mynetwork src 192.168.0.0/24
  ==> #allow "mynetwork" to access via http
  http_access allow mynetwork

#restart squid...OK
$ sudo service squid restart
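
On the client side, pointing the usual proxy environment variables at the Squid server is enough for most command-line tools. A minimal sketch, assuming the server's address is 192.168.0.1 and Squid listens on its default port 3128:

#point the client at the proxy (the address is an assumption; 3128 is Squid's default port)
$ export http_proxy=http://192.168.0.1:3128
$ export https_proxy=http://192.168.0.1:3128

#quick check that traffic goes through the proxy
$ curl -I http://www.example.com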

Tuesday, May 28, 2019

[Docker] Using GUI with Docker

Recently I needed to run my GUI application with Docker so that it can show up either directly on my desktop operating system or in my SSH terminal client via X11.

Basically, some people have already provided solutions for these cases. I just list the references and quickly give my examples.
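
As a minimal sketch of the X11 approach (the image name gui-app is a placeholder), sharing the host's DISPLAY and X11 socket with the container is usually enough:

#allow local containers to connect to the X server
$ xhost +local:docker

#share DISPLAY and the X11 socket with the container
$ docker run -it --rm \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    gui-app

For the ssh terminal case, connecting with "ssh -X" forwards X11 first, and the same idea applies (extra xauth setup may be needed).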

Monday, April 22, 2019

[TVM] Deploy TVM Module using C++ API

The first time I dealt with deploying a TVM module using the C++ API, I found the official page Deploy TVM Module using C++ API, which gives a fine example of deployment but doesn't explain how to generate the related files and modules.

So, after several rounds of trial and error, I figured out how to generate the required files for deployment.
Basically, you can refer to the TVM tutorials about compiling models:
https://docs.tvm.ai/tutorials/index.html
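
Once one of those compile tutorials has exported the module (typically a compiled shared library plus a graph JSON file and a parameter blob), the C++ side only needs to be built against the TVM runtime. A rough sketch of the build command, where the source file name and all paths are assumptions based on my layout:

#build the C++ deploy program against the TVM runtime (paths are assumptions)
$ g++ -std=c++11 -O2 cpp_deploy.cc -o cpp_deploy \
    -I${TVM_HOME}/include \
    -I${TVM_HOME}/3rdparty/dmlc-core/include \
    -I${TVM_HOME}/3rdparty/dlpack/include \
    -L${TVM_HOME}/build -ltvm_runtime -ldl -lpthread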

Monday, April 15, 2019

[Experiment] Compare the inference performance of TensorFlow Lite and TVM

I compare the inference performance of TensorFlow Lite and TVM on my laptop with the same MobileNet model and the same input size of 224*224.
Both are given two threads for the inference task, and I measure the average inference time spent.
(P.S.: the same number of threads is given in both cases.)
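
For reference, this is roughly how the thread count can be pinned in both runtimes. The tool path and the script/model names are assumptions, but --num_threads is a real flag of TFLite's benchmark_model tool and the TVM runtime honors the TVM_NUM_THREADS environment variable:

#TensorFlow Lite: benchmark the model with a fixed number of threads
$ bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
    --graph=mobilenet_v1_1.0_224.tflite --num_threads=2

#TVM: limit the runtime thread pool, then run the inference script (script name is a placeholder)
$ TVM_NUM_THREADS=2 python tvm_mobilenet_inference.py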

Wednesday, April 10, 2019

[TensorFlow Lite] The performance experiment of TensorFlow Lite

Based on my previous post, [TensorFlow Lite] My first try with TensorFlow Lite,
I was just curious about the performance of TensorFlow Lite on different platforms, so I ran a performance experiment.
An x86 server (no GPU enabled) and an NVIDIA TX2 board (which is ARM-based) are used for the experiment. Both run TFLite's official example with the Inception_v3 model, and the average inference times are shown in the following table.

Hardware Information:
NVIDIA TX2: 4 CPU cores and 1 GPU (not used)
x86 server: 24 CPU cores and no GPU enabled
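
TFLite's official image-classification example is label_image; assuming that is the example used here, the run on both machines looks roughly like this (the Bazel output path depends on the TensorFlow version, and the file names are placeholders):

#run the official label_image example with the Inception_v3 model
$ bazel-bin/tensorflow/lite/examples/label_image/label_image \
    --tflite_model=inception_v3.tflite \
    --image=grace_hopper.bmp \
    --labels=labels.txt \
    --count=50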

[XLA] Build Tensorflow XLA AOT Shared Library

Build Tensorflow XLA AOT Shared Library:

Add the following code to tensorflow/compiler/aot/BUILD in the TensorFlow source (the build command is shown after the rule):

cc_binary(
    name = "libxlaaot.so",
    deps = [":embedded_protocol_buffers",
        "//tensorflow/compiler/tf2xla:xla_jit_compiled_cpu_function",
        "//tensorflow/compiler/tf2xla",
        "//tensorflow/compiler/tf2xla:cpu_function_runtime",
        "//tensorflow/compiler/tf2xla:tf2xla_proto",
        "//tensorflow/compiler/tf2xla:tf2xla_util",
        "//tensorflow/compiler/tf2xla:xla_compiler",
        "//tensorflow/compiler/tf2xla/kernels:xla_cpu_only_ops",
        "//tensorflow/compiler/tf2xla/kernels:xla_dummy_ops",
        "//tensorflow/compiler/tf2xla/kernels:xla_ops",
        "//tensorflow/compiler/xla:shape_util",
        "//tensorflow/compiler/xla:statusor",
        "//tensorflow/compiler/xla:util",
        "//tensorflow/compiler/xla:xla_data_proto",
        "//tensorflow/compiler/xla/client:client_library",
        "//tensorflow/compiler/xla/client:compile_only_client",
        "//tensorflow/compiler/xla/client:xla_computation",
        "//tensorflow/compiler/xla/service:compiler",
        "//tensorflow/compiler/xla/service/cpu:buffer_info_util",
        "//tensorflow/compiler/xla/service/cpu:cpu_compiler",
        "//tensorflow/core:core_cpu_internal",
        "//tensorflow/core:framework_internal",
        "//tensorflow/core:lib",
        "//tensorflow/core:lib_internal",
        "//tensorflow/core:protos_all_cc",
        ":tfcompile_lib",
        "//tensorflow/compiler/xla/legacy_flags:debug_options_flags",
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:framework",
        "//tensorflow/core:graph",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/strings",
        "@com_google_absl//absl/types:span",
    ],
    linkopts=["-shared -Wl,--whole-archive" ],
    linkshared=1

)
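
With that rule in place, the shared library can be built like any other Bazel target, for example:

#build the XLA AOT shared library (add your usual configure/optimization flags as needed)
$ bazel build --config=opt //tensorflow/compiler/aot:libxlaaot.so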

Wednesday, March 27, 2019

[TensorFlow Lite] Build Tensorflow Lite C++ shared library

Build Tensorflow Lite shared library:

Modify tensorflow/contrib/lite/BUILD in the TensorFlow source by adding the rule below (the build command follows it):
cc_binary(
    name = "libtflite.so",
    deps = [":framework",
        "//tensorflow/contrib/lite/kernels:builtin_ops",
        "//tensorflow/contrib/lite/kernels:eigen_support",
        "//tensorflow/contrib/lite/kernels:gemm_support",
    ],
    linkopts=["-shared -Wl,--whole-archive" ],
    linkshared=1

)
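
Then build the target:

#build the TensorFlow Lite shared library
$ bazel build --config=opt //tensorflow/contrib/lite:libtflite.so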

[TFCompile] Use XLA AOT Compiler to compile Resnet50 model and do inference

I was inspired by the following article and tried to do something different, because its approach using Keras has an issue with the XLA AOT compiler:
Kerasモデルをtfcompileでコンパイルする (Compile a Keras model with tfcompile)
Instead, I download the pre-trained Resnet50 model and simply optimize it with the transform_graph tool.

Download:
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC.tar.gz
or
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz
Transform:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph='/home/liudanny/git/tensorflow_danny/tensorflow/compiler/aot/myaot_resnet50/resnetv2_imagenet_frozen_graph.pb' \
--out_graph='/home/liudanny/workspace/pyutillib/my_resnet50/optimized_frozen_graph.pb' \
--inputs='input_tensor:0' \
--outputs='softmax_tensor:0' \
--transforms='
  strip_unused_nodes
  fold_constants'

#I don't enable the following options due to worse performance.
#fold_batch_norms      <== for XLA AOT, it is not good for performance
#fold_old_batch_norms  <== for XLA AOT, it is not good for performance
#quantize_weights      <== XLA AOT doesn't support it
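
With the optimized graph in hand, the remaining step is to AOT-compile it with tfcompile. A rough sketch of the invocation, where config.pbtxt is a tf2xla config listing the feed ("input_tensor") and fetch ("softmax_tensor") tensors, and the class and output file names are just placeholders:

#build and run tfcompile on the optimized graph (config and output names are assumptions)
$ bazel build //tensorflow/compiler/aot:tfcompile
$ bazel-bin/tensorflow/compiler/aot/tfcompile \
    --graph=optimized_frozen_graph.pb \
    --config=config.pbtxt \
    --cpp_class="mynamespace::ResnetGraph" \
    --out_header=resnet50_graph.h \
    --out_function_object=resnet50_graph.o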