2020-03-06 02:27 AM
Hi,
I have followed the steps described here https://wiki.st.com/stm32mpu/wiki/STM32MP1_artificial_intelligence_expansion_packages to set up the AI package on the STM32MP1. Everything is working fine.
However, I have my own object detection model (a model.tflite file). I know there are two ways to run an object detection model: 1) using the Python code and 2) using the C/C++ application. I tried both approaches with my model. It works fine with the Python code, but it doesn't detect anything with the C/C++ application.
I used the following command to run object detector using python code:
root@stm32mp1:/usr/local/demo-ai/ai-cv#python3 python/objdetect_tfl_multiprocessing.py -m models/model.tflite -l models/labels.txt -i models/images/
Similarly I used the following command to run object detector using C/C++ code:
root@stm32mp1:/usr/local/demo-ai/ai-cv/bin# ./objdetect_tfl_gst_gtk -m ../models/model.tflite -l ../models/labels.txt -i ../models/images/
In my opinion, the C/C++ application should take less inference time than the Python code. However, the C/C++ application takes ~2.5 seconds while the Python code takes ~2.7 seconds.
Am I missing something? How can I use C/C++ code to run my model?
Thanking you,
Saurabh
2020-03-06 05:30 AM
Hi @SChau.2062
I have just tested the same on my side with COCO SSD embedded in the demo:
root@stm32mp1:/usr/local/demo-ai/ai-cv/bin# ./objdetect_tfl_gst_gtk -m ../models/coco_ssd_mobilenet/detect.tflite -l ../models/coco_ssd_mobilenet/labels.txt -i ../models/coco_ssd_mobilenet/testdata/
The C/C++ application has an inference time of around 750 ms.
root@stm32mp1:/usr/local/demo-ai/ai-cv/python# python3 objdetect_tfl_multiprocessing.py -m ../models/coco_ssd_mobilenet/detect.tflite -l ../models/coco_ssd_mobilenet/labels.txt -i ../models/coco_ssd_mobilenet/testdata/
The Python application has an inference time of around 1.3 s.
Can you please try the same, just to be sure that you get approximately the same results?
Further, you can benchmark your own NN model on the STM32MP1 using the benchmark application provided by the TensorFlow Lite framework.
The benchmark application is named benchmark_model and is located here:
/usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model
You can use it as follows:
root@stm32mp1:/usr/local/demo-ai/ai-cv# /usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model --graph=models/coco_ssd_mobilenet/detect.tflite
By default, the benchmark application uses only one thread (i.e. only 1 core) to execute the inference. You can give it a try using the 2 Cortex-A7 cores to run the inference by executing the following command:
root@stm32mp1:/usr/local/demo-ai/ai-cv# /usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model --graph=models/coco_ssd_mobilenet/detect.tflite --num_threads=2
You should see an improvement.
The C/C++ application can take advantage of the 2 cores, whereas the Python application can't.
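For reference, here is a minimal sketch of how a TensorFlow Lite 2.0 C++ application typically enables the second core via Interpreter::SetNumThreads. This is generic example code (the model path is just a placeholder), not the actual source of the demo application, which presumably does something equivalent internally:

// Minimal sketch, for reference only: generic TensorFlow Lite 2.0 C++ API,
// not the actual demo source. Shows where the multi-threading comes from
// in the C/C++ case.
#include <memory>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the .tflite model (path is just an example).
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) return 1;

  // Request 2 threads so that both Cortex-A7 cores of the STM32MP1 are used.
  interpreter->SetNumThreads(2);

  if (interpreter->AllocateTensors() != kTfLiteOk) return 1;
  // ... fill the input tensor(s) here ...
  if (interpreter->Invoke() != kTfLiteOk) return 1;
  return 0;
}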
In your case, if you confirm that the benchmark application gives the same figures you reported in your post, it means that, given your NN model structure, you only get a small benefit from the second core (but still a benefit).
With some NN models, like DeepSpeech, running the inference on 1 or 2 cores does not change the inference time because of the architecture of the model.
Hope it answers your questions.
BR
Vincent
2020-03-06 07:39 AM
Hi @VABRI ,
Thanks for your reply.
I have tried to run the COCO SSD model embedded in the demo using both the Python and the C/C++ applications. Both worked! The Python code took ~1.3 seconds and the C/C++ application took ~750 ms for inference (as you pointed out).
So far everything is clear. I then checked the benchmark results for the COCO SSD model (clear as well) and for my own model.
When I perform benchmark application on my model using the following command:
root@stm32mp1:/usr/local/demo-ai/ai-cv# /usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model --graph=models/model.tflite
it displays the following result:
Average inference timings in us: Warmup: 2.72468e+06, Init: 218333, no stats: 2.68986e+06
Similarly, for the following command (using --num_threads=2):
root@stm32mp1:/usr/local/demo-ai/ai-cv# /usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model --graph=models/model.tflite --num_threads=2
It displays the following output:
Average inference timings in us: Warmup: 2.52347e+06, Init: 36266, no stats: 2.49059e+06
There isn't much improvement (~2.69 s with one thread vs ~2.49 s with two threads).
But my main problem is: the Python code detects objects in the test images, while the C/C++ application doesn't detect anything in the same images.
How can I resolve this?
Thanking you,
Saurabh
2020-03-09 04:33 AM
Hi @SChau.2062
It seems that your environment is OK.
It is a little bit difficult to explain why the C/C++ application is not working with your model whereas the Python script is.
Nevertheless, here are a few leads:
Hope it helps.
BR
Vincent
2020-03-09 04:52 AM
Hi @VABRI ,
Thanks for the explanation.
Could you please tell me where I can find the C/C++ source on the STM device itself (i.e. its location), so that I can edit it directly on the device side? I can edit the Python script at this path: /usr/local/demo-ai/ai-cv/python.
Please feel free to correct me and share your feedback!
Thanks,
Saurabh
2020-03-09 07:06 AM
Hi @SChau.2062 ,
I'm fully in line with your analysis. It should work. So there is something wrong, and my advice is to have a look into the source code of the C/C++ application.
You will not be able to edit the C/C++ application on the target, because only the binary is available on the target.
The source of the C/C++ object detection application is located in the meta-st-stm32mpu-ai openembedded layer of the OpenSTLinux Distribution:
./meta-st-stm32mpu-ai/recipes-samples/demo/tensorflow-lite-cv-apps/bin/objdetect_tfl_gst_gtk.cc
I assume that you have the OpenSTLinux Distribution environment up and running so you can rebuild the objdetect_tfl_gst_gtk application with the following command:
bitbake tensorflow-lite-cv-apps -c compile
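Before modifying anything, a quick check you can do is to dump the input and output tensors of your model and compare them with those of the COCO SSD model the demo was written for (one image input and four output tensors: locations, classes, scores and number of detections). Here is a minimal sketch using the generic TensorFlow Lite 2.0 C++ API; it is example code of my own, not part of the demo sources:

// Minimal sketch (example code, not part of the demo): dump the input and
// output tensor details of a .tflite model so they can be compared with the
// layout the C/C++ demo expects.
#include <cstdio>
#include <memory>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

static void print_tensor(const tflite::Interpreter& interp, int idx) {
  const TfLiteTensor* t = interp.tensor(idx);
  // Type is a TfLiteType enum value (1 = float32, 3 = uint8, ...).
  std::printf("  %-40s type=%d dims=[", t->name ? t->name : "(unnamed)",
              static_cast<int>(t->type));
  for (int d = 0; d < t->dims->size; ++d)
    std::printf("%s%d", d ? "," : "", t->dims->data[d]);
  std::printf("]\n");
}

int main(int argc, char** argv) {
  if (argc < 2) { std::printf("usage: %s model.tflite\n", argv[0]); return 1; }
  auto model = tflite::FlatBufferModel::BuildFromFile(argv[1]);
  if (!model) return 1;
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) return 1;

  std::printf("inputs:\n");
  for (int i : interpreter->inputs()) print_tensor(*interpreter, i);
  std::printf("outputs:\n");
  for (int i : interpreter->outputs()) print_tensor(*interpreter, i);
  return 0;
}

If your model does not expose the same output layout (or uses a different input type, e.g. float32 instead of uint8), the C/C++ application may parse the outputs incorrectly, which could explain why nothing is detected.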
BR
Vincent
2020-03-09 07:48 AM
Hi @VABRI ,
Thank you, I will look into that and meanwhile move forward with the Python script.
Thanks,
Saurabh