2025-08-24 6:37 AM
Dear all,
I'm converting a quantized ONNX model using the STEdgeAI Core CLI, for use on the STM32N6 NPU. However, I cannot explain the very high RAM usage for the activations, as reported by stedgeai.
The network is a simple CNN with 10 convolution layers each followed by a ReLU. I have attached the original (not quantized) model to this post (mvs_model.onnx, see also image below) as well as the quantized version (mvs_model.quant.onnx). Quantization was performed using the ONNXRuntime tools.
When generating code for the N6 NPU (using "stedgeai generate --target stm32n6 --st-neural-art O3@neural_art.json --model mvs_model.quant.onnx --input-data-type uint8 --name mvs" with neural_art.json and stm32n6.mpool as attached), the CLI reports an unexpectedly high RAM usage for the activations (full output attached as stedgeai_output.txt):
Requested memory size by section - "stm32n6npu" target
------------------------------- ------- ------------ ------ ------------
module text rodata data bss
------------------------------- ------- ------------ ------ ------------
mvs.o 54 83,203 0 0
NetworkRuntime1020_CM55_GCC.a 0 0 0 0
lib (toolchain)* 0 0 0 0
ll atonn runtime 3,044 2,277 0 13
------------------------------- ------- ------------ ------ ------------
RT total** 3,098 85,480 0 13
------------------------------- ------- ------------ ------ ------------
weights 0 10,060,864 0 0
activations 0 0 0 22,883,328
io 0 0 0 0
------------------------------- ------- ------------ ------ ------------
TOTAL 3,098 10,146,344 0 22,883,341
------------------------------- ------- ------------ ------ ------------
However, when I calculate the activation memory usage manually, I estimate it at only around 2.5 MB (instead of nearly 23 MB!). For this estimate, I assume that the activation buffers of previous layers are re-used once they are no longer needed by subsequent layers, and that each activation takes 1 byte (since the model is 8-bit quantized). I know that some NPUs (I'm not sure about the Neural-ART) need additional scratch memory, and that STEdgeAI inserts padding nodes (see the intermediate model graph: mvs_model.quant_OE_3_3_0.onnx) which increase activation memory usage slightly, but neither of these can explain such a huge memory consumption.
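For reference, this is roughly how I arrived at my estimate: for a purely sequential CNN, only a layer's input and output buffers need to be live at the same time, so the peak is the largest input+output pair. The feature-map shapes below are placeholders to illustrate the method, not the actual mvs_model.onnx dimensions:

```python
# Rough estimate of peak activation memory for a purely sequential CNN,
# assuming buffers of earlier layers are freed as soon as they are no
# longer needed, and 1 byte per activation (8-bit quantized).
# NOTE: these (C, H, W) shapes are hypothetical placeholders.
feature_maps = [
    (3, 224, 224),   # network input
    (16, 224, 224),  # after conv1 + ReLU
    (32, 112, 112),  # after conv2 + ReLU
    (64, 56, 56),    # after conv3 + ReLU
]

def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

# Peak = the largest input+output pair among consecutive layers,
# since only those two buffers must be alive simultaneously.
peak_bytes = max(numel(a) + numel(b)
                 for a, b in zip(feature_maps, feature_maps[1:]))
print(f"estimated peak activation memory: {peak_bytes / 1e6:.2f} MB")
```

Applying this to the real layer shapes of my model gives the ~2.5 MB figure.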
Does anybody have insights into how STEdgeAI-Core allocates the activations, and how one can debug such inexplicably high memory usage? I suspect I'm just missing a configuration option here, since this is a fairly standard, small-ish CNN...
Thanks a lot, Michael