2026-05-07 1:07 AM
Hello,
I noticed an interesting behavior while comparing STM32N6 CPU execution vs NPU execution using ST Edge AI + Neural-ART.
For a simple Dense/Fully Connected model (ad.tflite), enabling the NPU did not significantly change the weights size: the compiled NPU representation stayed almost identical to the CPU version.
However, for a VWW/MobileNet-like model that uses many DEPTHWISE_CONV_2D layers, I observed a large ROM increase after enabling the NPU.
At first I thought this was caused by the quantization format conversion. However, I also tested a ResNet model where the same quantization conversion occurs, and it showed no major ROM increase. So it seems the quantization format change alone is not the main reason.
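My reasoning here was that a uniform per-weight format change scales every layer by the same factor, so it cannot explain growth that appears only in depthwise-heavy models. A minimal sketch of that arithmetic (hypothetical layer shapes, not the actual models):

```python
# Illustrates that a uniform per-weight format change scales every layer
# by the same factor, so it cannot explain a ROM increase that shows up
# only in DEPTHWISE_CONV_2D-heavy models. Shapes are made up for the example.

def weight_bytes(num_weights, bytes_per_weight):
    """Raw storage for a weight tensor, ignoring metadata."""
    return num_weights * bytes_per_weight

dense = 64 * 128        # Dense: 64 inputs x 128 outputs
depthwise = 3 * 3 * 96  # Depthwise 3x3 kernel over 96 channels

for name, n in [("dense", dense), ("depthwise", depthwise)]:
    int8_size = weight_bytes(n, 1)
    int16_size = weight_bytes(n, 2)  # e.g. if the tool widened the format
    print(name, int16_size / int8_size)  # -> 2.0 for both layer types
```

Both layer types grow by exactly the same ratio, which matches the ResNet observation above.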
Can ST confirm what is actually causing this ROM increase for DEPTHWISE_CONV_2D-heavy models when the NPU is enabled?
Thanks!
2026-05-07 5:39 AM
Hi @ayaaaa,
Please check the generated report to compare the number of SW epochs and HW epochs.
In the first case, the model may still run mostly through SW epochs, so the generated weights stay close to the CPU version.
In the second case, the NPU is likely used much more heavily through HW epochs. Mapping layers to the NPU can require weight repacking, reordering, and alignment padding, especially for DEPTHWISE_CONV_2D layers, which can significantly increase the ROM size.
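To make the repacking point concrete, here is a minimal sketch of how per-kernel alignment padding can inflate depthwise weights much more than dense weights. The 32-byte alignment and the per-kernel padding scheme are assumptions for illustration only, not Neural-ART's documented packing format:

```python
# Sketch: per-block alignment padding (an ASSUMED NPU constraint, not
# ST's documented scheme) inflates many small depthwise kernels far more
# than one large dense weight matrix.

def padded(size, align):
    """Round size up to the next multiple of align."""
    return -(-size // align) * align

ALIGN = 32  # hypothetical alignment requirement in bytes

# Depthwise 3x3 over 96 channels: one tiny 9-byte int8 kernel per channel,
# each padded separately under the assumed scheme.
dw_raw = 96 * 9
dw_padded = 96 * padded(9, ALIGN)

# Dense 64x128: one large contiguous int8 block, padded once.
dense_raw = 64 * 128
dense_padded = padded(dense_raw, ALIGN)

print(f"depthwise overhead: {dw_padded / dw_raw:.2f}x")    # -> 3.56x
print(f"dense overhead:     {dense_padded / dense_raw:.2f}x")  # -> 1.00x
```

A large dense matrix amortizes the padding, while every small depthwise kernel pays it in full, which would match seeing the ROM increase mainly on DEPTHWISE_CONV_2D-heavy models.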
So, the ROM increase is likely related more to NPU weight transformation than to the quantization format change alone.
Have a good day,
Julian