Monday, April 15, 2019

[Experiment] Compare the inference performance of TensorFlow Lite and TVM

I compare the inference performance of TensorFlow Lite and TVM on my laptop, using the same MobileNet model and the same input size of 224x224. Both runs are given the same number of threads (10), and I measure the average inference time.


TensorFlow Lite
# Execute the TensorFlow Lite label_image example
#   -c 10 ==> loop 10 times
#   -t 10 ==> assign 10 threads
$ tensorflow/contrib/lite/examples/label_image/build/label_image  \
-m /home/liudanny/.tvm_test_data/tf/official/mobilenet_v1_1.0_224.tflite \
-l tensorflow/examples/label_image/data/imagenet_slim_labels.txt \
-i tensorflow/contrib/lite/examples/label_image/testdata/grace_hopper.bmp \
-c 10 \
-t 10

Loaded model tensorflow/examples/label_image/data/mobilenet_v1_1.0_224.tflite
resolved reporter
invoked
average time: 42.8714 ms
0.860175: 653 military uniform
0.0481018: 907 Windsor tie
0.00786705: 466 bulletproof vest
0.00644933: 514 cornet
0.0060803: 543 drumstick
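
For reference, roughly the same measurement can be reproduced from Python. Below is a minimal sketch, assuming a TensorFlow build whose tf.lite.Interpreter accepts a num_threads argument; the model path is a placeholder and a random input is used, since only timing matters here.

import time
import numpy as np
import tensorflow as tf

# Load the model with 10 interpreter threads (mirrors -t 10 above).
interpreter = tf.lite.Interpreter(
    model_path="mobilenet_v1_1.0_224.tflite",  # placeholder path
    num_threads=10)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random 1x224x224x3 input; prediction accuracy is irrelevant for timing.
data = np.random.rand(*inp["shape"]).astype(np.float32)

runs = 10  # mirrors -c 10 above
start = time.time()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
print("average time: %.2f ms" % ((time.time() - start) / runs * 1000.0))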


TVM
https://docs.tvm.ai/tutorials/frontend/from_tflite.html
I add the following code to "from_tflite.py" to set the number of threads to 10:

import os

# Set the number of threads the TVM runtime uses, based on the number of
# physical CPU cores on your machine. (TVM_NUM_THREADS must be set before
# the runtime spawns its thread pool.)
num_threads = 10
os.environ["TVM_NUM_THREADS"] = str(num_threads)
# Execute the TVM from_tflite example
$ python from_tflite.py 

File /home/test/.tvm_test_data/data/cat.png exists, skip.
input (1, 3, 224, 224)

No handlers could be found for logger "autotvm"
it took 0.02525761127471924 seconds
exist file got corrupted, downloading /home/test/.tvm_test_data/data/labels_mobilenet_quant_v1_224.txt file freshly...
Downloading from url https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/java/demo/app/src/main/assets/labels_mobilenet_quant_v1_224.txt to /home/test/.tvm_test_data/data/labels_mobilenet_quant_v1_224.txt
...100%, 0.02 MB, 29 KB/s, 0 seconds passed
The image prediction result is: id 283 name: tiger cat
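
For context, this is roughly what from_tflite.py does around the snippet above. The following is a hedged sketch following the tutorial's API at the time; the model path, input tensor name, and NHWC input shape are assumptions, and only a single run is timed, matching the output above.

import os
import time

# TVM_NUM_THREADS must be set before the TVM runtime spawns its thread pool.
num_threads = 10
os.environ["TVM_NUM_THREADS"] = str(num_threads)

import numpy as np
import tflite.Model
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Parse the TFLite flatbuffer (placeholder path).
buf = open("mobilenet_v1_1.0_224.tflite", "rb").read()
tflite_model = tflite.Model.Model.GetRootAsModel(buf, 0)

# Convert to Relay; the TFLite MobileNet takes an NHWC float input.
shape = (1, 224, 224, 3)
func, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={"input": shape},
    dtype_dict={"input": "float32"})

# Compile for the local CPU with full optimizations.
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target="llvm", params=params)

# Run on the graph runtime and time a single inference.
module = graph_runtime.create(graph, lib, tvm.cpu())
module.set_input("input", np.random.rand(*shape).astype("float32"))
module.set_input(**params)
start = time.time()
module.run()
print("it took %s seconds" % (time.time() - start))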


In sum:
TensorFlow Lite: 42.87 ms
TVM for TFLite: 25.25 ms

So, TVM indeed optimizes the model: roughly a 1.7x speedup over TensorFlow Lite (42.87 / 25.25 ≈ 1.70).


P.S:
  Currently, the Relay frontend does not seem to fully support the Inception V3 TFLite model. Unsupported operators can be implemented here:
https://github.com/dmlc/tvm/blob/master/python/tvm/relay/frontend/tflite.py
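
A quick way to see which operators are missing is simply to attempt the conversion: the frontend raises an error naming the unsupported op. This is a hedged sketch; the model path, input tensor name, and input shape are assumptions.

import tflite.Model
from tvm import relay

# Try converting an Inception V3 TFLite model (placeholder path).
buf = open("inception_v3.tflite", "rb").read()
tflite_model = tflite.Model.Model.GetRootAsModel(buf, 0)
try:
    func, params = relay.frontend.from_tflite(
        tflite_model,
        shape_dict={"input": (1, 299, 299, 3)},
        dtype_dict={"input": "float32"})
    print("All operators are supported.")
except Exception as err:
    # e.g. an error naming the TFLite operator that has no converter yet
    print("Conversion failed:", err)

Implementing a missing operator then comes down to adding an entry to the converter's convert_map and writing the corresponding convert_* method in tflite.py.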

2 comments:

t13m said...

Hi Danny,
Thanks for your post, it's very useful.
Is it correct that your benchmarks were carried out on an x86-64 platform? Were any SIMD instruction sets used, like SSE or AVX?
Did you try running inference on any model containing an LSTM (or other RNN types) with TFLite and TVM? How was the performance?

Thanks

TeYen (Danny) said...

Hi t13m,
Sorry for the late reply.
Yes, my benchmark ran on my Intel x64 server, but I think the CPU is old and may not support AVX.
I didn't try LSTM/RNN models on either.