2025-08-24 6:37 AM
Dear all,
I'm converting a quantized ONNX model using the STEdgeAI Core CLI, for use on the STM32N6 NPU. However, I cannot explain the very high RAM usage for the activations, as reported by stedgeai.
The network is a simple CNN with 10 convolution layers each followed by a ReLU. I have attached the original (not quantized) model to this post (mvs_model.onnx, see also image below) as well as the quantized version (mvs_model.quant.onnx). Quantization was performed using the ONNXRuntime tools.
When generating code for the N6 NPU (using "stedgeai generate --target stm32n6 --st-neural-art O3@neural_art.json --model mvs_model.quant.onnx --input-data-type uint8 --name mvs" with neural_art.json and stm32n6.mpool as attached) the CLI reports crazy high RAM usage for the activations (full output attached as stedgeai_output.txt):
 Requested memory size by section - "stm32n6npu" target                                             
 ------------------------------- ------- ------------ ------ ------------                           
 module                             text       rodata   data          bss                           
 ------------------------------- ------- ------------ ------ ------------                           
 mvs.o                                54       83,203      0            0                           
 NetworkRuntime1020_CM55_GCC.a         0            0      0            0                           
 lib (toolchain)*                      0            0      0            0                           
 ll atonn runtime                  3,044        2,277      0           13                           
 ------------------------------- ------- ------------ ------ ------------                           
 RT total**                        3,098       85,480      0           13                           
 ------------------------------- ------- ------------ ------ ------------                           
 weights                               0   10,060,864      0            0                           
 activations                           0            0      0   22,883,328                           
 io                                    0            0      0            0                           
 ------------------------------- ------- ------------ ------ ------------                           
 TOTAL                             3,098   10,146,344      0   22,883,341                           
 ------------------------------- ------- ------------ ------ ------------
However, manually calculating the activation memory usage I would estimate this to be only around 2.5 MB (instead of nearly 23 MB!). Here, I assume the activation buffers of previous layers are re-used when they are not needed anymore when executing subsequent layers, and also that all activations take 1 byte each (since it is 8-bit quantized). I know that some NPUs (not sure about the Neural-ART) need some additional scratch memory, and also that the STEdgeAI inserts padding nodes (see intermediate model graph: mvs_model.quant_OE_3_3_0.onnx) that increase activation memory usage slightly, but none of this can explain this huge memory consumption.
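For reference, the kind of manual estimate described above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the network is purely sequential (no skip connections), each ReLU is applied in place on the convolution output, every activation takes 1 byte, and padding or NPU scratch buffers are ignored. The function name `peak_activation_bytes` and the tensor shapes are hypothetical placeholders, not the real shapes of mvs_model.onnx.

```python
from math import prod


def peak_activation_bytes(shapes, bytes_per_element=1):
    """Estimate peak live activation memory for a purely sequential network.

    shapes[0] is the network input and shapes[i] is the output of layer i.
    While layer i executes, only its input and its output need to be
    resident, so the peak is the largest sum of two consecutive tensors.
    """
    sizes = [prod(shape) * bytes_per_element for shape in shapes]
    return max(a + b for a, b in zip(sizes, sizes[1:]))


# Hypothetical (channels, height, width) shapes, NOT taken from the real model.
example_shapes = [
    (3, 256, 256),    # network input
    (16, 256, 256),   # conv + in-place ReLU
    (16, 256, 256),
    (32, 128, 128),
    (32, 128, 128),
    (1, 128, 128),    # network output
]

peak = peak_activation_bytes(example_shapes)
print(f"estimated peak: {peak} bytes ({peak / 2**20:.2f} MiB)")
```

Replacing the placeholder shapes with the real per-layer output shapes gives a lower bound to compare against the "activations" row of the stedgeai report. The actual allocator may legitimately need more than this, for example for alignment, padding nodes, or scratch buffers.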
Does anybody have insights into how STEdgeAI-Core allocates the activations, and how one can debug such unexplainably high memory usage? I guess I'm probably just missing some configuration option here, since this is a pretty standard small-ish CNN...
Thanks a lot, Michael
2025-08-26 7:52 AM
Hello @asdfasdf,
Your reasoning makes sense.
In both SW and HW, predicting the activation size is challenging because many optimizations affect it.
On the other hand, I found something strange.
The activations of your quantized model are twice as large as those of your non-quantized model. I don't see why that would be the case.
I opened an internal bug report, as this may be a bug.
I'll update you when I get news from the dev team.
Have a good day,
Julian
2025-08-27 6:48 AM
Thanks @Julian E. for your reply - I was already starting to question my own sanity :D As you say, I also find it very difficult and frustrating to optimize memory usage of models for the Neural ART. I opened another question here to collect tips and tricks regarding this, and I'd love to get your input there as well!
Best, Michael
2025-09-08 6:42 AM
Hello @asdfasdf,
Sorry for the delay, but I am still looking for an answer internally.
Basically, here is the comment from our team:
It would be interesting to know how the theoretical value has been computed. If this is actually the case, then there are probably some errors/bugs in how the tool is invoked. Otherwise, if the model is definitely too large for the system, there is not much we can do, i.e., a model cannot be shrunk indefinitely to fit in the memory.
Could you explain how you computed the 2.5 MB?
For your other post, I'll answer when I get more info.
Have a good day,
Julian
