
Inexplicably high RAM consumption of STEdgeAI-Core-generated model

asdfasdf
Associate III

Dear all,

I'm converting a quantized ONNX model using the STEdgeAI Core CLI, for use on the STM32N6 NPU. However, I cannot explain the very high RAM usage for the activations, as reported by stedgeai.

The network is a simple CNN with 10 convolution layers each followed by a ReLU. I have attached the original (not quantized) model to this post (mvs_model.onnx, see also image below) as well as the quantized version (mvs_model.quant.onnx). Quantization was performed using the ONNXRuntime tools.
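
For reference, the quantization was done roughly like this (a simplified sketch: the input name, shape, and random calibration data are placeholders for my actual setup):

import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class DummyCalibReader(CalibrationDataReader):
    """Feeds a few inputs for calibration (random here; real frames in my setup)."""
    def __init__(self, input_name="input", shape=(1, 3, 512, 512), n=16):
        self._it = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(n)]
        )

    def get_next(self):
        return next(self._it, None)  # None tells ORT that calibration is done

quantize_static(
    "mvs_model.onnx",
    "mvs_model.quant.onnx",
    calibration_data_reader=DummyCalibReader(),
    quant_format=QuantFormat.QDQ,      # QDQ-style quantized graph
    activation_type=QuantType.QUInt8,  # 8-bit activations
    weight_type=QuantType.QInt8,       # 8-bit weights
)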

When generating code for the N6 NPU with "stedgeai generate --target stm32n6 --st-neural-art O3@neural_art.json --model mvs_model.quant.onnx --input-data-type uint8 --name mvs" (neural_art.json and stm32n6.mpool as attached), the CLI reports an implausibly high RAM usage for the activations (full output attached as stedgeai_output.txt):

 Requested memory size by section - "stm32n6npu" target                                             
 ------------------------------- ------- ------------ ------ ------------                           
 module                             text       rodata   data          bss                           
 ------------------------------- ------- ------------ ------ ------------                           
 mvs.o                                54       83,203      0            0                           
 NetworkRuntime1020_CM55_GCC.a         0            0      0            0                           
 lib (toolchain)*                      0            0      0            0                           
 ll atonn runtime                  3,044        2,277      0           13                           
 ------------------------------- ------- ------------ ------ ------------                           
 RT total**                        3,098       85,480      0           13                           
 ------------------------------- ------- ------------ ------ ------------                           
 weights                               0   10,060,864      0            0                           
 activations                           0            0      0   22,883,328                           
 io                                    0            0      0            0                           
 ------------------------------- ------- ------------ ------ ------------                           
 TOTAL                             3,098   10,146,344      0   22,883,341                           
 ------------------------------- ------- ------------ ------ ------------ 

However, when I calculate the activation memory usage manually, I estimate it at only around 2.5 MB (instead of nearly 23 MB!). Here I assume that the activation buffers of previous layers are reused once subsequent layers no longer need them, and that each activation takes 1 byte (since the model is 8-bit quantized). I know that some NPUs (I'm not sure about the Neural-ART) need additional scratch memory, and that STEdgeAI inserts padding nodes (see the intermediate model graph mvs_model.quant_OE_3_3_0.onnx) that slightly increase activation memory usage, but none of this can explain such a huge memory consumption.
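
To spell the estimate out: for a purely sequential graph, only one layer's input and output buffers need to be live at the same time, so the peak activation memory is the largest such pair. A rough sketch of the calculation (the shapes below are placeholders, not the real ones from mvs_model.onnx):

# One entry per tensor between layers, 1 byte per element (8-bit quantized).
feature_map_shapes = [
    (1, 3, 512, 512),   # network input
    (1, 16, 512, 512),  # after conv1 + ReLU
    (1, 32, 256, 256),  # after conv2 + ReLU
    # ... one entry per remaining layer output ...
    (1, 1, 512, 512),   # network output
]

def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

# Layer i reads buffer i and writes buffer i+1; all earlier buffers are freed.
peak_bytes = max(
    numel(a) + numel(b)
    for a, b in zip(feature_map_shapes, feature_map_shapes[1:])
)
print(f"estimated peak activation memory: {peak_bytes / 1e6:.2f} MB")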

Does anybody have insight into how STEdgeAI-Core allocates the activations, and how one can debug such unexpectedly high memory usage? I'm probably just missing some configuration option here, since this is a fairly standard, small-ish CNN...

Thanks a lot, Michael

[Attached image: mvs_model.png — model graph]
