STM32N6 NPU – ROM size increases for some models after Neural-ART compilation?

ayaaaa
Associate III


Hello,

I noticed an interesting behavior while comparing STM32N6 CPU execution vs NPU execution using ST Edge AI + Neural-ART.

For a simple Dense/Fully Connected (ad.tflite) model, enabling the NPU did not significantly change the weights size:

  • CPU target:
    • weights (ro): ~270 KB
  • NPU target:
    • weights (ro): ~269 KB

So the compiled NPU representation stayed almost identical.

However, for a VWW/MobileNet-like model using many DEPTHWISE_CONV_2D layers, I observed a large ROM increase after enabling the NPU:

  • CPU target:
    • weights (ro): ~42 KB
  • NPU target:
    • weights (ro): ~227 KB

At first I thought this was caused by the quantization format conversion:

  • CPU:
    • model_fmt : ss/sa per channel
  • NPU:
    • model_fmt : ss/sa per tensor

But I also tested a ResNet model where the same quantization conversion occurs, and saw no major ROM increase there.

So it seems the quantization format change alone is not the main reason.
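As a rough sanity check on that conclusion (my own back-of-the-envelope arithmetic, not numbers from the ST tooling; the channel counts below are assumed, merely in the MobileNet range): per-channel quantization stores one scale/zero-point pair per output channel, while per-tensor stores a single pair per tensor, so the conversion should, if anything, slightly *shrink* the metadata rather than add ~185 KB:

```python
# Back-of-the-envelope: metadata cost of per-channel vs per-tensor
# quantization. Assumes one float32 scale + one int32 zero-point per
# entry; layer widths below are illustrative, not a real model.
def quant_metadata_bytes(channels_per_layer, per_channel=True):
    """Bytes spent on quantization scales and zero-points."""
    per_entry = 4 + 4  # float32 scale + int32 zero-point
    if per_channel:
        return sum(c * per_entry for c in channels_per_layer)
    return len(channels_per_layer) * per_entry  # one pair per tensor

# Hypothetical channel counts, roughly MobileNet-sized.
layers = [32, 64, 128, 128, 256, 256, 512] + [512] * 5 + [1024, 1024]

per_channel_kb = quant_metadata_bytes(layers, per_channel=True) / 1024
per_tensor_kb = quant_metadata_bytes(layers, per_channel=False) / 1024
print(f"per-channel metadata: ~{per_channel_kb:.1f} KB")
print(f"per-tensor  metadata: ~{per_tensor_kb:.1f} KB")
```

Whatever the exact layer widths, the per-tensor variant is strictly smaller, so this conversion alone cannot account for a ROM *increase*.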

My questions

Can ST confirm whether:

  1. Neural-ART internally repacks/duplicates depthwise convolution weights for NPU scheduling?
  2. Some MobileNet-style architectures inherently require more ROM overhead on STM32N6 NPU?
  3. The per-channel -> per-tensor conversion contributes significantly to this behavior, or whether the main factor is actually the depthwise memory-layout transformation?
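To make question 1 concrete, here is one plausible mechanism (purely my own illustrative model; the 16-byte lane width is an assumption, not an ST-documented property of the Neural-ART NPU): if the accelerator fetches depthwise weights in fixed-width lanes, each tiny 3x3 int8 kernel (9 bytes) may get padded up to a whole lane, which multiplies the stored size:

```python
# Illustrative only: how lane alignment could inflate depthwise weights.
# lane_bytes=16 is a hypothetical fetch width, not an ST-documented value.
def depthwise_rom_bytes(channels, k=3, lane_bytes=16, packed=False):
    """ROM for one depthwise conv layer's int8 weights."""
    kernel = k * k  # 9 bytes for a 3x3 int8 kernel, one per channel
    if packed:
        return channels * kernel  # tightly packed (CPU-style layout)
    # Pad each channel's kernel up to a whole number of lanes (ceil div).
    padded = -(-kernel // lane_bytes) * lane_bytes
    return channels * padded

# Hypothetical depthwise layer widths, roughly MobileNet-sized.
layers = [32, 64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 1024]

packed_kb = sum(depthwise_rom_bytes(c, packed=True) for c in layers) / 1024
padded_kb = sum(depthwise_rom_bytes(c) for c in layers) / 1024
print(f"tightly packed: ~{packed_kb:.1f} KB")
print(f"lane-aligned:   ~{padded_kb:.1f} KB ({padded_kb / packed_kb:.2f}x)")
```

Under these assumptions a 9-byte kernel padded to 16 bytes already costs ~1.8x; duplication for scheduling would inflate it further. I have no visibility into the actual Neural-ART packing, hence the question.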

Thanks!

Julian E.
ST Employee

Hi @ayaaaa,

 

Please check the generated report to compare the number of SW epochs and HW epochs.

In the first case, the model may still run mostly through SW epochs, so the generated weights stay close to the CPU version.

In the second case, the NPU is likely used more heavily through HW epochs. This can require weight repacking/reordering/alignment for the NPU, especially for DEPTHWISE_CONV_2D layers, which can significantly increase the ROM size.

So the ROM increase is likely related more to the NPU weight transformation than to the quantization format change alone.

 

Have a good day,

Julian

 

