2025-09-21 11:49 PM
Now I used the YOLO11 model:
cd /opt/ST/STEdgeAI/2.2/Utilities/linux/
./stedgeai generate -m /home/alientek/STM32MPU_workspace/yolo11.onnx --target stm32mp25
This produced a yolo11.nb file. Then, on my STM32MP2 board, I ran:
x-linux-ai-benchmark -m ./yolo11n.nb
╔════════════════════════════════════════════════╗
║ X-LINUX-AI unified NN model benchmark ║
╠════════════════════════════════╦═══════════════╣
║ Machine ║ STM32MP257 ║
║ CPU cores ║ 2 ║
║ CPU Clock frequency ║ 1.5GHz ║
║ GPU/NPU Driver Version ║ 6.4.19 ║
║ GPU/NPU Clock frequency ║ 800 MHZ ║
║ X-LINUX-AI Version ║ v6.0.0 ║
║ ║ ║
║ ║ ║
╚════════════════════════════════╩═══════════════╝
For hardware accelerated models, the computation engine used for the benchmark is the NPU running at 800 MHz
For other models, the computation engine used for the benchmark is the CPU with 2 cores at 1.5 GHz
╔══════════════════════════════════════════════════════════════════════════╗
║ NBG models benchmark ║
╠════════════╦═════════════════════╦═══════╦═══════╦═══════╦═══════════════╣
║ Model Name ║ Inference Time (ms) ║ CPU % ║ GPU % ║ NPU % ║ Peak RAM (MB) ║
╠════════════╬═════════════════════╬═══════╬═══════╬═══════╬═══════════════╣
║ yolo11n ║ 1043.37 ║ 0.0 ║ 96.23 ║ 3.77 ║ 30.02 ║
╚════════════╩═════════════════════╩═══════╩═══════╩═══════╩═══════════════╝
╔══════════════════════════════════════════════════════════════╗
║ Non-Optimal models ║
╠════════════╦═════════════════════════════════════════════════╣
║ model name ║ comments ║
╠════════════╬═════════════════════════════════════════════════╣
║ yolo11n ║ GPU usage is 96.23% compared to NPU usage 3.77% ║
║ ║ please verify if the model is quantized or that ║
║ ║ the quantization scheme used is the 8-bits per- ║
║ ║ tensor ║
╚════════════╩═════════════════════════════════════════════════╝
The inference time is 1043.37 ms.
2025-09-22 2:39 AM
Hello @fanronghua0123456,
What is the issue here?
Have a good day,
Julian
2025-09-22 3:07 AM
The inference time is 1043.37 ms. That is far too long.
2025-09-30 1:29 AM - edited 2025-09-30 4:45 AM
Hello @fanronghua0123456,
This part of the message seems to indicate that your model is not quantized per-tensor (maybe not quantized at all):
╠════════════╬═════════════════════════════════════════════════╣
║ yolo11n ║ GPU usage is 96.23% compared to NPU usage 3.77% ║
║ ║ please verify if the model is quantized or that ║
║ ║ the quantization scheme used is the 8-bits per- ║
║ ║ tensor ║
╚════════════╩═════════════════════════════════════════════════╝
The NPU is around 10x faster than the GPU, and here your model is running mainly on the GPU.
The NPU can only be used with a per-tensor uint8 quantized model, and I don't think that is the case here.
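For clarity: "per-tensor" means a single (scale, zero_point) pair for the whole tensor, whereas "per-channel" keeps one pair per output channel. A minimal pure-Python sketch of uint8 per-tensor affine quantization, illustrating the scheme the NPU expects (this is only a conceptual illustration, not the ST tooling):

```python
def quantize_per_tensor(values, scale, zero_point):
    """Affine uint8 quantization: q = clamp(round(x / scale) + zp, 0, 255).
    One (scale, zero_point) pair is shared by the entire tensor."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]

def dequantize(quantized, scale, zero_point):
    """Recover approximate float values: x_hat = (q - zp) * scale."""
    return [(q - zero_point) * scale for q in quantized]

# Example: map the symmetric range [-1.0, 1.0] onto uint8.
scale = 2.0 / 255
zero_point = 128
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
q = quantize_per_tensor(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
# Round-trip error is bounded by the quantization step (the scale).
```

In practice, one common route (an assumption on my side, not an ST-specific recipe) is to quantize the ONNX model with ONNX Runtime's static quantization, using uint8 types and `per_channel=False`, before running stedgeai generate.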
Here are some elements (see the first figure):
How to deploy your NN model on STM32MPU - stm32mpu
Have a good day,
Julian