Object Detection on STM32MP1 using STM32MP1_artificial_intelligence_expansion_packages

SChau.2062
Associate III

Hi,

I have followed the steps described here https://wiki.st.com/stm32mpu/wiki/STM32MP1_artificial_intelligence_expansion_packages to set up the AI package on the STM32MP1. Everything is working fine.

However, I have my own object detection model file, i.e. a model.tflite file. I know there are two ways to run an object detection model: 1) using Python code and 2) using C/C++ code. I tried running my model with both approaches. The model works fine with the Python code, but it doesn't detect anything with the C/C++ code.

I used the following command to run object detector using python code:

root@stm32mp1:/usr/local/demo-ai/ai-cv# python3 python/objdetect_tfl_multiprocessing.py -m models/model.tflite -l models/labels.txt -i models/images/

Similarly I used the following command to run object detector using C/C++ code:

root@stm32mp1:/usr/local/demo-ai/ai-cv/bin# ./objdetect_tfl_gst_gtk -m ../models/model.tflite -l ../models/labels.txt -i ../models/images/

In my opinion, the C/C++ code should take less inference time than the Python code. However, the C/C++ code takes ~2.5 seconds and the Python code takes ~2.7 seconds.

Am I missing something? How can I use C/C++ code to run my model?

Thanking you,

Saurabh

6 REPLIES
VABRI
ST Employee

Hi @SChau.2062​ 

I have just tested the same on my side with COCO SSD embedded in the demo:

root@stm32mp1:/usr/local/demo-ai/ai-cv/bin# ./objdetect_tfl_gst_gtk -m ../models/coco_ssd_mobilenet/detect.tflite -l ../models/coco_ssd_mobilenet/labels.txt -i  ../models/coco_ssd_mobilenet/testdata/

The C/C++ application has around 750 ms of inference time.

root@stm32mp1:/usr/local/demo-ai/ai-cv/python# python3 objdetect_tfl_multiprocessing.py -m ../models/coco_ssd_mobilenet/detect.tflite -l ../models/coco_ssd_mobilenet/labels.txt -i ../models/coco_ssd_mobilenet/testdata/

The Python application has around 1.3 s of inference time.

Can you please try the same, just to be sure that you have approximately the same results?

You can also benchmark your own NN model on the STM32MP1 using the benchmark application provided by the TensorFlow Lite framework.

The benchmark application is named benchmark_model and is located here:

/usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model

You can use it as follows:

root@stm32mp1:/usr/local/demo-ai/ai-cv# /usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model --graph=models/coco_ssd_mobilenet/detect.tflite

By default, the benchmark application uses only one thread (one core) to execute the inference. You can try using both Cortex-A7 cores to run the inference by executing the following command:

root@stm32mp1:/usr/local/demo-ai/ai-cv# /usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model --graph=models/coco_ssd_mobilenet/detect.tflite --num_threads=2

You should see an improvement.
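If you want to compare runs programmatically, the summary line that benchmark_model prints (in the format quoted later in this thread) can be parsed with a few lines of Python. This is only a sketch assuming that exact line format; the `parse_timings` helper is hypothetical, not part of any tool:

```python
import re

# Summary line printed by benchmark_model (format as quoted in this thread).
LINE = "Average inference timings in us: Warmup: 2.72468e+06, Init: 218333, no stats: 2.68986e+06"

def parse_timings(line):
    """Extract the timing fields (in microseconds) into a dict of floats."""
    pattern = r"Warmup: ([\d.e+]+), Init: ([\d.e+]+), no stats: ([\d.e+]+)"
    m = re.search(pattern, line)
    if m is None:
        raise ValueError("unexpected benchmark output format")
    warmup, init, no_stats = (float(x) for x in m.groups())
    return {"warmup_us": warmup, "init_us": init, "no_stats_us": no_stats}

timings = parse_timings(LINE)
print(timings["no_stats_us"] / 1e6)  # average inference converted to seconds
```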

The C/C++ application can take advantage of both cores, whereas the Python application can't.

In your case, if you confirm that the benchmark application provides the same figures you gave in your post, it means that, due to your NN model structure, you get only a small benefit (but still a benefit).

With some NN models like DeepSpeech, whether you run the inference with 1 or 2 cores does not change the inference time, because of the architecture of the model.

Hope it answers your questions.

BR

Vincent

SChau.2062
Associate III

Hi @VABRI​ ,

Thanks for your reply.

I have tried to run COCO SSD embedded in the demo using both Python and C/C++ application. Both worked! Python code took ~1.3 seconds and C/C++ application took ~750ms for inference (as you pointed).

So far everything is clear. I then ran the benchmark application on both COCO SSD (results are as expected) and my own model.

When I run the benchmark application on my model with the following command:

root@stm32mp1:/usr/local/demo-ai/ai-cv# /usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model --graph=models/model.tflite

it displays the following result:

Average inference timings in us: Warmup: 2.72468e+06, Init: 218333, no stats: 2.68986e+06

Similarly, for the following command (using num_threads=2):

root@stm32mp1:/usr/local/demo-ai/ai-cv# /usr/bin/tensorflow-lite-2.0.0/examples/benchmark_model --graph=models/model.tflite --num_threads=2

It displays the following output:

Average inference timings in us: Warmup: 2.52347e+06, Init: 36266, no stats: 2.49059e+06

There isn't much improvement.
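For reference, the two "no stats" averages above can be converted from microseconds and compared directly (a quick sanity check using only the figures quoted in this post):

```python
# Average inference times reported by benchmark_model, in microseconds.
single_thread_us = 2.68986e6   # default (1 thread)
two_threads_us = 2.49059e6     # --num_threads=2

# Convert to seconds and compute the relative speedup.
single_s = single_thread_us / 1e6   # ~2.69 s
two_s = two_threads_us / 1e6        # ~2.49 s
speedup = single_thread_us / two_threads_us

print(f"1 thread:  {single_s:.2f} s")
print(f"2 threads: {two_s:.2f} s")
print(f"speedup:   {speedup:.2f}x")  # ~1.08x, i.e. only ~8% faster
```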

But my main problem is: the Python code detects objects in the test images, while the C/C++ application doesn't detect anything in the same test images.

How can I resolve this?

Thanking you,

Saurabh

VABRI
ST Employee

Hi @SChau.2062​ 

It seems that your environment is OK.

It is a little bit difficult to explain why the C/C++ application is not working with your model while the Python script is.

Nevertheless, here are some leads:

  • The source code of the C/C++ application is available in the X-LINUX-AI-CV expansion package, so you can add some logs.
  • Please check in the C/C++ application that the inference is running well.
  • You can print the results with printf and check the accuracy. If the accuracy is too low, the rectangle will not be displayed on the screen.
  • Maybe the output results are not formatted as expected by the C/C++ application.
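To make the last two points concrete, here is a minimal sketch of the post-processing step an SSD-style TFLite detector typically needs: the model emits boxes, class indices, scores, and a detection count, which are filtered against the 0.5 threshold. The tensor layout and the `filter_detections` helper are assumptions based on the common COCO SSD MobileNet export, not the actual application source; if your model emits its outputs in a different order or format, the C/C++ application would misread them and silently draw nothing:

```python
# Hypothetical post-processing for an SSD-style detector.
# The output layout (boxes, classes, scores, count) mirrors the common
# COCO SSD MobileNet export; it is an assumption, not the app's real code.

SCORE_THRESHOLD = 0.5  # both demo apps reportedly use 0.5 by default

def filter_detections(boxes, classes, scores, count, threshold=SCORE_THRESHOLD):
    """Keep only detections whose score reaches the threshold."""
    kept = []
    for i in range(int(count)):
        if scores[i] >= threshold:
            kept.append({"box": boxes[i], "class": int(classes[i]), "score": scores[i]})
    return kept

# Example: three raw detections, one below the threshold.
boxes = [[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6], [0.0, 0.0, 1.0, 1.0]]
classes = [0, 16, 2]
scores = [0.91, 0.62, 0.30]
detections = filter_detections(boxes, classes, scores, count=3)
print(len(detections))  # the 0.30 detection is dropped, two survive
```

Printing the raw scores at this point in the C/C++ application would quickly show whether the inference produced sensible values or whether the tensors are being read in the wrong order.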

Hope it helps.

BR

Vincent

SChau.2062
Associate III

Hi @VABRI​ ,

Thanks for the explanation.

Could you please tell me where I can find the C/C++ source code on the STM32 device itself, so that I can edit it directly on the device side? I can edit the Python script at this path: /usr/local/demo-ai/ai-cv/python.

  • I have simply replaced the default object detection detect.tflite with another tflite file, i.e. model.tflite. Both model files belong to the same class of application, i.e. object detection. If the object detection (COCO SSD MobileNet V1) C/C++ application demo runs, then it should work with the custom tflite file too.
  • If the Python script detects objects using the model.tflite file, it means the model works properly, so the C/C++ application should deliver the same result. Here we are only moving from the Python script to the C/C++ application (nothing changes in the model.tflite file), so the result should be the same, since both the Python script and the C/C++ application use a default threshold of 0.5.

Please feel free to correct me and share your feedback!

Thanks,

Saurabh

VABRI
ST Employee

Hi @SChau.2062​ ,

I'm fully in line with your analysis. It should work, so there is something wrong, and my advice is to have a look into the source code of the C/C++ application.

You will not be able to edit the C/C++ application on the target, because only the binary is available on the target.

The source of the C/C++ object detection application is located in the meta-st-stm32mpu-ai OpenEmbedded layer of the OpenSTLinux Distribution:

./meta-st-stm32mpu-ai/recipes-samples/demo/tensorflow-lite-cv-apps/bin/objdetect_tfl_gst_gtk.cc

I assume that you have the OpenSTLinux Distribution environment up and running so you can rebuild the objdetect_tfl_gst_gtk application with the following command:

bitbake tensorflow-lite-cv-apps -c compile

BR

Vincent

SChau.2062
Associate III

Hi @VABRI​ ,

Thank you, I will look into that and meanwhile move forward with the Python script.

Thanks,

Saurabh