Wednesday, March 27, 2019

[TFCompile] Use XLA AOT Compiler to compile Resnet50 model and do inference

I was inspired by the following article and tried a different approach, because its Keras-based approach runs into an issue with the XLA AOT compiler:
Compiling a Keras model with tfcompile (Kerasモデルをtfcompileでコンパイルする)
Instead, I download the pre-trained ResNet50 model and simply optimize it with the transform_graph tool.

Download (the plain NHWC variant takes decoded float tensors as input; the _jpg variant accepts encoded JPEG strings):
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC.tar.gz
or
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz
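These tarballs contain SavedModels, while transform_graph below expects a frozen GraphDef, so the model has to be frozen first. A minimal sketch using the stock freeze_graph tool, assuming the output node is softmax_tensor (the version subdirectory and paths are placeholders; check what the tarball actually extracts to):

tar xzf resnet_v2_fp32_savedmodel_NHWC.tar.gz
python -m tensorflow.python.tools.freeze_graph \
  --input_saved_model_dir=resnet_v2_fp32_savedmodel_NHWC/<version> \
  --output_node_names=softmax_tensor \
  --output_graph=resnetv2_imagenet_frozen_graph.pb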
Transform:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph='/home/liudanny/git/tensorflow_danny/tensorflow/compiler/aot/myaot_resnet50/resnetv2_imagenet_frozen_graph.pb' \
--out_graph='/home/liudanny/workspace/pyutillib/my_resnet50/optimized_frozen_graph.pb' \
--inputs='input_tensor:0' \
--outputs='softmax_tensor:0' \
--transforms='
  strip_unused_nodes
  fold_constants'

#I don't enable the following options because they either hurt performance or are unsupported:
#fold_batch_norms      <== hurts XLA AOT performance
#fold_old_batch_norms  <== hurts XLA AOT performance
#quantize_weights      <== XLA AOT doesn't support it
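With the optimized graph in hand, the AOT compilation itself follows the standard tfcompile flow: describe the feed and fetch in a config.pbtxt and compile with the tf_library Bazel macro. A sketch of both, assuming a batch-1 224x224x3 float input and the node names above (the target name and cpp_class are placeholders of mine, not from the post):

config.pbtxt:
feed {
  id { node_name: "input_tensor" }
  shape {
    dim { size: 1 }
    dim { size: 224 }
    dim { size: 224 }
    dim { size: 3 }
  }
}
fetch {
  id { node_name: "softmax_tensor" }
}

BUILD (in tensorflow/compiler/aot/myaot_resnet50/):
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

tf_library(
    name = "resnet50_graph",
    config = "config.pbtxt",
    cpp_class = "ResNet50Compiled",
    graph = "optimized_frozen_graph.pb",
)

Then build it with:
bazel build //tensorflow/compiler/aot/myaot_resnet50:resnet50_graph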



So, following the same process as in the post above, I get the running result and inference times on my laptop:
liudanny@ubuntu:~/git/tensorflow_danny/bazel-bin/tensorflow/compiler/aot/myaot_resnet50$ ./my_code
inference time(ms) : 12062.1
inference 2nd time(ms) : 11807.2
result max_i is 283
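For reference, my_code is essentially a small driver around the header that tf_library generates. A minimal sketch of what such a driver looks like (the header path, the ResNet50Compiled class, and the 1001-class output size come from my assumptions above, not from the post):

#include <algorithm>
#include <iostream>
// Header generated by the tf_library target sketched earlier.
#include "tensorflow/compiler/aot/myaot_resnet50/resnet50_graph.h"

int main() {
  ResNet50Compiled model;
  // Thread-pool setup omitted here; see the note on multithreading below.

  // Fill the 1x224x224x3 float input; real code would load and
  // preprocess an image instead of zeros.
  std::fill(model.arg0_data(), model.arg0_data() + 1 * 224 * 224 * 3, 0.0f);

  model.Run();  // runs the compiled graph; returns false on error

  // Pick the class with the highest softmax score.
  const float* probs = model.result0_data();
  int max_i = std::max_element(probs, probs + 1001) - probs;
  std::cout << "result max_i is " << max_i << std::endl;
  return 0;
}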
In sum, here is a screenshot of the whole run:

P.S.: The inference program built with XLA AOT can use multithreading to improve performance, so running it on a server with more CPU cores and letting the program take advantage of those cores should reduce the inference time further; a sketch of the thread-pool hookup follows.
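Concretely, the class that tf_library generates exposes set_thread_pool(), which is how those extra cores get used. A sketch, reusing the hypothetical ResNet50Compiled class from above:

#define EIGEN_USE_THREADS
#define EIGEN_USE_CUSTOM_THREAD_POOL

#include <thread>
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/compiler/aot/myaot_resnet50/resnet50_graph.h"

int main() {
  // Size the Eigen thread pool to the number of available cores.
  Eigen::ThreadPool tp(std::thread::hardware_concurrency());
  Eigen::ThreadPoolDevice device(&tp, tp.NumThreads());

  ResNet50Compiled model;
  model.set_thread_pool(&device);  // Run() will now parallelize across the pool
  model.Run();
  return 0;
}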
