2025-09-21 11:49 PM
Now I used the YOLO11 model:
cd /opt/ST/STEdgeAI/2.2/Utilities/linux/
./stedgeai generate -m /home/alientek/STM32MPU_workspace/yolo11.onnx --target stm32mp25
This produced a yolo11.nb file. Then, on my STM32MP2 board, I ran:
x-linux-ai-benchmark -m ./yolo11n.nb
╔════════════════════════════════════════════════╗
║ X-LINUX-AI unified NN model benchmark ║
╠════════════════════════════════╦═══════════════╣
║ Machine ║ STM32MP257 ║
║ CPU cores ║ 2 ║
║ CPU Clock frequency ║ 1.5GHz ║
║ GPU/NPU Driver Version ║ 6.4.19 ║
║ GPU/NPU Clock frequency ║ 800 MHZ ║
║ X-LINUX-AI Version ║ v6.0.0 ║
║ ║ ║
║ ║ ║
╚════════════════════════════════╩═══════════════╝
For hardware accelerated models, the computation engine used for the benchmark is the NPU running at 800 MHz
For other models, the computation engine used for the benchmark is the CPU with 2 cores at 1.5 GHz
╔══════════════════════════════════════════════════════════════════════════╗
║ NBG models benchmark ║
╠════════════╦═════════════════════╦═══════╦═══════╦═══════╦═══════════════╣
║ Model Name ║ Inference Time (ms) ║ CPU % ║ GPU % ║ NPU % ║ Peak RAM (MB) ║
╠════════════╬═════════════════════╬═══════╬═══════╬═══════╬═══════════════╣
║ yolo11n ║ 1043.37 ║ 0.0 ║ 96.23 ║ 3.77 ║ 30.02 ║
╚════════════╩═════════════════════╩═══════╩═══════╩═══════╩═══════════════╝
╔══════════════════════════════════════════════════════════════╗
║ Non-Optimal models ║
╠════════════╦═════════════════════════════════════════════════╣
║ model name ║ comments ║
╠════════════╬═════════════════════════════════════════════════╣
║ yolo11n ║ GPU usage is 96.23% compared to NPU usage 3.77% ║
║ ║ please verify if the model is quantized or that ║
║ ║ the quantization scheme used is the 8-bits per- ║
║ ║ tensor ║
╚════════════╩═════════════════════════════════════════════════╝
The inference time is 1043.37 ms.
2025-09-22 2:39 AM
Hello @fanronghua0123456,
What is the issue here?
Have a good day,
Julian
2025-09-22 3:07 AM
The inference time is 1043.37 ms. That is far too long.
2025-09-30 1:29 AM - edited 2025-09-30 4:45 AM
Hello @fanronghua0123456,
This part of the message seems to indicate that your model is not quantized per-tensor (maybe not quantized at all):
╠════════════╬═════════════════════════════════════════════════╣
║ yolo11n ║ GPU usage is 96.23% compared to NPU usage 3.77% ║
║ ║ please verify if the model is quantized or that ║
║ ║ the quantization scheme used is the 8-bits per- ║
║ ║ tensor ║
╚════════════╩═════════════════════════════════════════════════╝
The NPU is around 10x faster than the GPU, and here your model is running mainly on the GPU.
The NPU can only be used with a per-tensor uint8 quantized model, and I don't think that is the case here.
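For clarity: "per-tensor" means a single (scale, zero_point) pair for the whole tensor, whereas "per-channel" keeps one pair per output channel. A minimal pure-Python sketch of uint8 per-tensor affine quantization, illustrating the scheme the NPU expects (this is only a conceptual illustration, not the ST tooling):

```python
def quantize_per_tensor(values, scale, zero_point):
    """Affine uint8 quantization: q = clamp(round(x / scale) + zp, 0, 255).
    One (scale, zero_point) pair is shared by the entire tensor."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]

def dequantize(quantized, scale, zero_point):
    """Recover approximate float values: x_hat = (q - zp) * scale."""
    return [(q - zero_point) * scale for q in quantized]

# Example: map the symmetric range [-1.0, 1.0] onto uint8.
scale = 2.0 / 255
zero_point = 128
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
q = quantize_per_tensor(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
# Round-trip error is bounded by the quantization step (the scale).
```

In practice, one common route (an assumption on my side, not an ST-specific recipe) is to quantize the ONNX model with ONNX Runtime's static quantization, using uint8 types and `per_channel=False`, before running stedgeai generate.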
Here are some elements (see the first figure):
How to deploy your NN model on STM32MPU - stm32mpu
Have a good day,
Julian