
Why Are Some Layers Executed in Software with X-CUBE-AI on STM32N6570-DK

diama13
Associate II

Hello,

I'm working with the STM32N6570-DK board and using X-CUBE-AI to run quantized neural networks converted from ONNX models.

I'm trying to better understand which parts of the model are executed in software vs. hardware, specifically on this board, and what determines that split.
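For reference, both reports below come from running target validation with the ST Edge AI command-line tool, along these lines (the model path is a placeholder, and the exact option names should be checked against the CLI documentation for your version):

stedgeai validate --model my_model_quant.onnx --target stm32n6 --mode target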

My Observations:

1. ResNet-8 Model (custom quantized)

When I validated a quantized ResNet-8 model, the output during target validation showed:

Total number of epochs: 16 of which 2 implemented in software
epoch ID HW/SW/EC Operation (SW only)
epoch 15 -SW- ( Softmax )
epoch 16 -SW- ( DequantizeLinear )

So, almost everything was offloaded to hardware, except for the final softmax and dequantization — which I expected.
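For anyone reproducing this, a minimal sketch of this kind of static int8 QDQ quantization, assuming onnxruntime is used (the calibration reader, input name, and shapes are illustrative, not my exact script):

import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, QuantType, quantize_static

class DummyReader(CalibrationDataReader):
    # Illustrative calibration reader; real calibration needs representative data
    def __init__(self, n=8):
        self._it = iter(np.random.rand(n, 1, 3, 32, 32).astype(np.float32))
    def get_next(self):
        x = next(self._it, None)
        return None if x is None else {"input": x}  # "input" = model input name

quantize_static(
    "resnet8.onnx", "resnet8_quant.onnx", DummyReader(),
    quant_format=QuantFormat.QDQ,   # inserts QuantizeLinear/DequantizeLinear pairs
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    per_channel=True,               # per-channel weight scales
)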

2. Simple Model Quantized via ST Developer Cloud

[Attached image: the simple model's graph]

However, when I tested a very simple model (shown above), quantized through the ST Developer Cloud, I saw this:

Total number of epochs: 5 of which 3 implemented in software
epoch ID HW/SW/EC Operation (SW only)
epoch 2 -SW- ( DequantizeLinear )
epoch 3 -SW- ( Conv )
epoch 4 -SW- ( QuantizeLinear )

Here, even Conv was executed in software, despite the model being quantized.

My Questions:

  1. Why does Conv run in software in the second case, but in hardware in the ResNet-8 case?

  2. Does the hardware accelerator require a certain layer structure, padding style, kernel size, or input alignment to be offloaded properly?

  3. Is there a way to force or hint that a layer should run in hardware if it technically can?

  4. What are the conditions under which DequantizeLinear and QuantizeLinear run in hardware?

Thanks a lot

Julian E.
ST Employee

Hello @diama13,

 

The Aton compiler (which builds and schedules the epochs) looks at your memory and NPU configuration and tries to find the best (fastest) HW/SW combination it can.


You cannot manually force a layer into HW or SW, but you can try changing some options to influence the outcome of the Aton compiler.

For example, you can change the optimization mode: there are three available settings (by default it is set to auto, which selects the best one).

[Screenshot: the optimization mode options]

 

(See the Neural-ART compiler documentation: https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_neural_art_compiler.html)
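As a sketch of where those options live: generation takes a Neural-ART profile via --st-neural-art, and the compiler options (including the optimization mode) go in the profile's options string. Treat the skeleton below as indicative only; the exact keys and option names are in the documentation linked above:

stedgeai generate --model model_quant.onnx --target stm32n6 --st-neural-art "myprofile@neural_art.json"

where neural_art.json looks roughly like:

{
  "Globals": {},
  "Profiles": {
    "myprofile": {
      "memory_pool": "./mpools/my_mpool.mpool",
      "options": "<compiler options, e.g. the optimization mode, go here>"
    }
  }
}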

You can find general tips here: 

https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_neural_art_compiler.html#generic-recommendations 

 

For your second question in particular, you can look at the list of supported operators, where you will find more information about the constraints that apply to each layer: https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_operator_support.html#ref_size
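For example, a quick way to compare your two models against that list is to dump the operators of each ONNX file (illustrative snippet using the onnx Python package; the path is a placeholder):

from collections import Counter
import onnx

model = onnx.load("model_quant.onnx")
print(Counter(node.op_type for node in model.graph.node))

Checking the attributes of the Conv that stayed in SW (kernel size, strides, padding, quantization scheme) against the constraints on that page should show which condition was not met.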

 

I hope it helps,

Julian

 

