2025-08-24 6:37 AM
Dear all,
I'm converting a quantized ONNX model using the STEdgeAI Core CLI, for use on the STM32N6 NPU. However, I cannot explain the very high RAM usage for the activations, as reported by stedgeai.
The network is a simple CNN with 10 convolution layers each followed by a ReLU. I have attached the original (not quantized) model to this post (mvs_model.onnx, see also image below) as well as the quantized version (mvs_model.quant.onnx). Quantization was performed using the ONNXRuntime tools.
When generating code for the N6 NPU (using "stedgeai generate --target stm32n6 --st-neural-art O3@neural_art.json --model mvs_model.quant.onnx --input-data-type uint8 --name mvs" with neural_art.json and stm32n6.mpool as attached) the CLI reports crazy high RAM usage for the activations (full output attached as stedgeai_output.txt):
Requested memory size by section - "stm32n6npu" target
------------------------------- ------- ------------ ------ ------------
module                             text       rodata   data          bss
------------------------------- ------- ------------ ------ ------------
mvs.o                                54       83,203      0            0
NetworkRuntime1020_CM55_GCC.a         0            0      0            0
lib (toolchain)*                      0            0      0            0
ll atonn runtime                  3,044        2,277      0           13
------------------------------- ------- ------------ ------ ------------
RT total**                        3,098       85,480      0           13
------------------------------- ------- ------------ ------ ------------
weights                               0   10,060,864      0            0
activations                           0            0      0   22,883,328
io                                    0            0      0            0
------------------------------- ------- ------------ ------ ------------
TOTAL                             3,098   10,146,344      0   22,883,341
------------------------------- ------- ------------ ------ ------------
However, manually calculating the activation memory usage, I would estimate it to be only around 2.5 MB (instead of nearly 23 MB!). Here, I assume the activation buffers of previous layers are reused once they are no longer needed by subsequent layers, and also that each activation takes 1 byte (since the model is 8-bit quantized). I know that some NPUs (not sure about the Neural-ART) need some additional scratch memory, and also that STEdgeAI inserts padding nodes (see intermediate model graph: mvs_model.quant_OE_3_3_0.onnx) that increase activation memory usage slightly, but none of this can explain this huge memory consumption.
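For reference, here is roughly the kind of estimate I mean, as a minimal Python sketch (this is just my assumption about buffer reuse, not how STEdgeAI actually allocates): it runs ONNX shape inference on the attached mvs_model.quant.onnx, counts 1 byte per activation element, assumes only the input/output tensors of the node currently executing must stay resident, and ignores NPU scratch buffers and the inserted padding nodes:

    import numpy as np
    import onnx
    from onnx import shape_inference

    # Infer shapes of all intermediate tensors of the quantized model
    model = shape_inference.infer_shapes(onnx.load("mvs_model.quant.onnx"))
    inits = {init.name for init in model.graph.initializer}  # weights, scales, zero points

    def numel(vi):
        # Number of elements of a tensor, treating unknown/dynamic dims as 1
        dims = [d.dim_value for d in vi.type.tensor_type.shape.dim]
        return int(np.prod([d if d > 0 else 1 for d in dims]))

    # Size in bytes of every activation tensor (1 byte per element, 8-bit quantized)
    tensors = list(model.graph.value_info) + list(model.graph.input) + list(model.graph.output)
    sizes = {vi.name: numel(vi) for vi in tensors if vi.name not in inits}

    # Peak = largest sum of activations that must be alive while one node executes
    peak = 0
    for node in model.graph.node:
        live = sum(sizes.get(t, 0) for t in list(node.input) + list(node.output) if t not in inits)
        peak = max(peak, live)

    print(f"Estimated peak activation memory: {peak / 1e6:.2f} MB")

For a plain chain of Conv+ReLU layers this boils down to the largest input+output pair of consecutive tensors, which is where my ~2.5 MB comes from.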
Does anybody have insights into how STEdgeAI-Core allocates the activations, and how one can debug such inexplicably high memory usage? I guess I'm probably just missing some configuration option here, since this is a pretty standard, small-ish CNN...
Thanks a lot, Michael
2025-08-26 7:52 AM
Hello @asdfasdf,
Your reasoning makes sense.
Predicting the activation memory is challenging, both in SW and in HW, since there are a lot of optimizations that affect it.
On the other hand, I found something strange.
The activations of your quantized model are twice as big as those of your non-quantized model. I don't see why that would be the case.
I opened a bug report internally, as this may well be one.
I'll update you when I get news from the dev team.
Have a good day,
Julian
2025-08-27 6:48 AM
Thanks @Julian E. for your reply - I was already starting to question my own sanity :D As you say, I also find it very difficult and frustrating to optimize the memory usage of models for the Neural-ART. I opened another question here to collect tips and tricks regarding this, and I'd love to get your input there as well!
Best, Michael
2025-09-08 6:42 AM
Hello @asdfasdf,
Sorry for the delay, but I am still looking for an answer internally.
Basically, here is the comment from our team:
It would be interesting to know how the theoretical value has been computed. If it is actually correct, then there are probably some errors/bugs in how the tool is invoked. Otherwise, if the model is really too large for the system, there is not much we can do, i.e., a model cannot be shrunk indefinitely to fit in the memory.
Could you explain how you computed the 2.5 MB?
For your other post, I'll answer when I get more info.
Have a good day,
Julian