I was inspired by the following article and tried to do something different, because its Keras-based approach has an issue with the XLA AOT compiler:
Kerasモデルをtfcompileでコンパイルする (Compiling a Keras model with tfcompile)
Instead, I download the pre-trained ResNet50 model and optimize it simply with the transform_graph tool.
Download:
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC.tar.gz
or
wget http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz
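These tarballs contain a SavedModel, while transform_graph below consumes a frozen GraphDef. As a sketch of that intermediate step, the SavedModel can be frozen with TensorFlow 1.x's freeze_graph tool (the version subdirectory name depends on what the tarball actually extracts, so treat it as a placeholder):

tar xzf resnet_v2_fp32_savedmodel_NHWC.tar.gz
# The SavedModel lives in a numeric version subdirectory; replace
# <version_dir> with whatever the tarball extracts.
python -m tensorflow.python.tools.freeze_graph \
  --input_saved_model_dir=resnet_v2_fp32_savedmodel_NHWC/<version_dir> \
  --output_node_names=softmax_tensor \
  --output_graph=resnetv2_imagenet_frozen_graph.pb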
Transform:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph='/home/liudanny/git/tensorflow_danny/tensorflow/compiler/aot/myaot_resnet50/resnetv2_imagenet_frozen_graph.pb' \
  --out_graph='/home/liudanny/workspace/pyutillib/my_resnet50/optimized_frozen_graph.pb' \
  --inputs='input_tensor:0' \
  --outputs='softmax_tensor:0' \
  --transforms='
    strip_unused_nodes
    fold_constants'
# I don't enable the following options due to worse performance:
#   fold_batch_norms      <== hurts XLA AOT performance
#   fold_old_batch_norms  <== hurts XLA AOT performance
#   quantize_weights      <== not supported by XLA AOT
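For completeness, here is a sketch of how the optimized graph is then handed to tfcompile, following the tf_library rule used in the linked post. The file, target, and class names are hypothetical, and the 1x224x224x3 feed shape matches the NHWC ResNet50 input:

# resnet50_config.pbtxt (hypothetical name): declares the feed and fetch
# that correspond to the --inputs and --outputs tensors above.
feed {
  id { node_name: "input_tensor" }
  shape {
    dim { size: 1 }
    dim { size: 224 }
    dim { size: 224 }
    dim { size: 3 }
  }
}
fetch {
  id { node_name: "softmax_tensor" }
}

# BUILD (hypothetical target): AOT-compiles the graph into a C++ class.
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

tf_library(
    name = "resnet50_aot",
    graph = "optimized_frozen_graph.pb",
    config = "resnet50_config.pbtxt",
    cpp_class = "myaot::Resnet50Comp",
)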
So, following the same process as in the post above, I get the running result and the inference times on my laptop:
liudanny@ubuntu:~/git/tensorflow_danny/bazel-bin/tensorflow/compiler/aot/myaot_resnet50$ ./my_code
inference time(ms) : 12062.1
inference 2nd time(ms) : 11807.2
result max_i is 283
In sum, the whole execution is shown in the following screenshot:
P.S.: The inference program built with XLA AOT can use multithreading to enhance performance, so it will run faster on a server that has more CPU cores and can take advantage of them through multithreading.
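For illustration, here is a minimal C++ sketch of such a multithreaded driver, assuming the hypothetical tf_library target above generated resnet50_aot.h; the thread count, include path, and 1001-class output size are assumptions based on the official ResNet model:

#define EIGEN_USE_THREADS
#define EIGEN_USE_CUSTOM_THREAD_POOL

#include <algorithm>
#include <iostream>
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/compiler/aot/myaot_resnet50/resnet50_aot.h"  // generated header (hypothetical path)

int main() {
  // Back the generated computation with an Eigen thread pool so the
  // compiled graph can fan out across CPU cores; 4 threads is arbitrary.
  Eigen::ThreadPool pool(4);
  Eigen::ThreadPoolDevice device(&pool, pool.NumThreads());

  myaot::Resnet50Comp resnet;
  resnet.set_thread_pool(&device);

  // arg0 is the 1x224x224x3 feed declared in the config; fill it with
  // real preprocessed pixels instead of zeros for a meaningful result.
  std::fill(resnet.arg0_data(), resnet.arg0_data() + 1 * 224 * 224 * 3, 0.0f);
  resnet.Run();

  // result0 is the softmax fetch; report the most probable class.
  const float* probs = resnet.result0_data();
  int max_i = 0;
  for (int i = 1; i < 1001; ++i)
    if (probs[i] > probs[max_i]) max_i = i;
  std::cout << "result max_i is " << max_i << std::endl;
  return 0;
}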