
How do I get the bounding boxes from the output of an object detection model?

jetty
Associate

Hi everyone,

I'm trying to integrate the pretrained st_yolo_lc_v1_192 model into my project on an STM32H750 MCU.

I'm new to the framework and to AI in general.

The output shape is 12x12x30. I understand that 12x12 means a 12x12 grid of cells. What I don't understand is the 30: does it mean 5 * [x, y, w, h, confidence, class]?

Also, do I need to apply an activation function to these values?

Thanks!


Accepted solution
MCHTO.1
ST Employee

Hi,

Indeed, 30 is 5 * [x, y, w, h, confidence, class], because st_yololc_v1 uses 5 anchors by default.
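To make the layout concrete, here is a small Python sketch of how the 30 channels of one grid cell split into the 5 anchors. It assumes the channels are stored anchor by anchor (all 6 values of anchor 0, then anchor 1, and so on) and that the model has a single class, since 30 / 5 = 6 = 4 box values + 1 confidence + 1 class score. The function names are illustrative only, not part of the model zoo.

```python
NUM_ANCHORS = 5
NUM_CLASSES = 1  # assumed: 30 / 5 = 6 = 4 box coords + 1 confidence + 1 class score
VALUES_PER_ANCHOR = 4 + 1 + NUM_CLASSES


def split_cell(cell):
    """Split one grid cell's 30 raw values into 5 lists of [x, y, w, h, conf, class...]."""
    assert len(cell) == NUM_ANCHORS * VALUES_PER_ANCHOR
    return [cell[a * VALUES_PER_ANCHOR:(a + 1) * VALUES_PER_ANCHOR]
            for a in range(NUM_ANCHORS)]


def channel_index(anchor, k):
    """Channel of value k (0=x, 1=y, 2=w, 3=h, 4=conf, 5+=class scores) for a given anchor."""
    return anchor * VALUES_PER_ANCHOR + k
```

If the output buffer on the MCU is a flat channels-last float array (an assumption about the exported layout), the same value sits at `(row * 12 + col) * 30 + channel_index(anchor, k)`.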

To decode the raw model output, we use this function:
https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/e5361e76f8427b0907b67d9815101d05c32e7407/object_detection/src/postprocessing/tiny_yolo_v2_postprocess.py#L12 
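For reference, here is a minimal pure-Python sketch of standard YOLOv2 (tiny YOLO v2) decoding, which also answers the activation question: sigmoid on x, y and confidence, exp on w and h scaled by the anchor, and softmax on the class scores. I'm assuming the linked function follows this standard scheme, so check it for the exact details. The anchor values below are hypothetical placeholders; use the ones from your model's configuration.

```python
import math

GRID_H, GRID_W = 12, 12
NUM_ANCHORS = 5
NUM_CLASSES = 1
VALUES_PER_ANCHOR = 5 + NUM_CLASSES

# HYPOTHETICAL anchor (width, height) pairs in grid-cell units:
# replace them with the anchors your model was trained with.
ANCHORS = [(0.5, 0.5), (1.0, 1.5), (2.0, 2.0), (3.0, 4.0), (5.0, 5.0)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(vs):
    m = max(vs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in vs]
    total = sum(exps)
    return [e / total for e in exps]


def decode(output, anchors=ANCHORS, score_threshold=0.5):
    """Decode a [GRID_H][GRID_W][NUM_ANCHORS * VALUES_PER_ANCHOR] raw output into
    (x_center, y_center, w, h, score, class_id) tuples normalized to [0, 1]."""
    boxes = []
    for row in range(GRID_H):
        for col in range(GRID_W):
            cell = output[row][col]
            for a, (aw, ah) in enumerate(anchors):
                start = a * VALUES_PER_ANCHOR
                tx, ty, tw, th, tc, *tcls = cell[start:start + VALUES_PER_ANCHOR]
                probs = softmax(tcls)
                class_id = max(range(len(probs)), key=probs.__getitem__)
                score = sigmoid(tc) * probs[class_id]
                if score < score_threshold:
                    continue  # drop low-confidence predictions early
                x = (col + sigmoid(tx)) / GRID_W  # offset inside the cell
                y = (row + sigmoid(ty)) / GRID_H
                w = aw * math.exp(tw) / GRID_W    # anchor size scaled by exp
                h = ah * math.exp(th) / GRID_H
                boxes.append((x, y, w, h, score, class_id))
    return boxes
```

The plain loops map directly onto a C loop on the MCU. With a single class the softmax is always 1.0, so the score reduces to `sigmoid(confidence)`.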

Then you need to apply non-maximum suppression (NMS); an example can be found here:

https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/e5361e76f8427b0907b67d9815101d05c32e7407/object_detection/src/postprocessing/tiny_yolo_v2_postprocess.py#L104
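Here is a minimal greedy NMS sketch in Python, working on (x_center, y_center, w, h, score, class_id) tuples with normalized coordinates. It is plain per-class greedy NMS with assumed default thresholds; the linked implementation may differ in details such as the IoU threshold or the maximum number of boxes kept.

```python
def iou(a, b):
    """Intersection over union of two (x_center, y_center, w, h, ...) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.5, max_boxes=10):
    """Greedy per-class NMS: keep the highest-scoring box, then drop any
    same-class box that overlaps an already kept box by more than iou_threshold."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(k[5] != box[5] or iou(k, box) <= iou_threshold for k in kept):
            kept.append(box)
            if len(kept) == max_boxes:
                break
    return kept
```

Feed it the thresholded boxes from the decoding step; multiplying the normalized coordinates by the input image size gives pixel coordinates.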