Flash overflow when using a quantized TFLite model

EnzoC
Associate

 

Hello!

I am using CubeAI (version 9.1) to generate code for running an ML model on a microcontroller.

When I generate the code with CubeMX using my .keras model, there are no compilation issues, and it runs perfectly.

However, when I generate the code with CubeMX using my .tflite model (quantized to int8) and then compile the project, I get an overflow error at link time:
region 'FLASH' overflowed by 22340 bytes.

This is quite surprising because, according to CubeMX, my quantized .tflite model uses about half as much RAM and FLASH as the .keras model (I’ve attached screenshots showing the FLASH and RAM usage for both models).

(screenshots: FLASH/RAM usage of the quantized TFLite model and of the Keras model)

I use TensorFlow 2.12.

My theory is that the runtime libraries needed to execute my int8 model take up more FLASH space, but I’m not sure. Or maybe I made a mistake when integrating the model into my project. Does anyone have an idea where this problem might come from?
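One way to test that theory is to compare what actually lands in FLASH in each build. Below is a minimal sketch, assuming a GCC toolchain and the usual section placement (.text, .rodata, and the load image of .data in FLASH; your linker script may differ), that sums the FLASH-resident sections from the output of `arm-none-eabi-size -A build.elf`:

```python
# Hypothetical helper: sum the FLASH-resident sections from the SysV-style
# output of `arm-none-eabi-size -A <elf>`. Section names are the common
# defaults; check your .ld file if you use custom section names.

FLASH_SECTIONS = {".text", ".rodata", ".data"}  # .data's init values are stored in FLASH

def flash_bytes(size_output: str) -> int:
    total = 0
    for line in size_output.splitlines():
        parts = line.split()
        # SysV format lines look like: "<section> <size> <addr>"
        if len(parts) >= 2 and parts[0] in FLASH_SECTIONS and parts[1].isdigit():
            total += int(parts[1])
    return total
```

Running this on the .elf of the Keras build and the TFLite build, then diffing the per-section numbers, would show which sections grew in the int8 build.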

hamitiya
ST Employee

Hello @EnzoC,

Could you please run the "Analyze" action in STM32CubeMX?

At the end of the textual output, you should find the information about library and code size.

For example:

 

 Requested memory size by section - "stm32h7" target
 ------------------------------ -------- --------- -------- ---------
 module                             text    rodata     data       bss
 ------------------------------ -------- --------- -------- ---------
 NetworkRuntime1000_CM7_GCC.a     36,764         0        0         0
 network.o                         4,322    24,188   29,232     1,284
 network_data.o                       48        16       88         0
 lib (toolchain)*                    892       624        0         0
 ------------------------------ -------- --------- -------- ---------
 RT total**                       42,026    24,828   29,320     1,284
 ------------------------------ -------- --------- -------- ---------
 weights                               0   494,760        0         0
 activations                           0         0        0   246,256
 io                                    0         0        0         0
 ------------------------------ -------- --------- -------- ---------
 TOTAL                            42,026   519,588   29,320   247,540
 ------------------------------ -------- --------- -------- ---------
 *  toolchain objects (libm/libgcc*)
 ** RT AI runtime objects (kernels+infrastructure)
  Summary - "stm32h7" target
  ---------------------------------------------------
               FLASH (ro)      %*   RAM (rw)       %
  ---------------------------------------------------
  RT total         96,174   16.3%     30,604   11.1%
  ---------------------------------------------------
  TOTAL           590,934            276,860
  ---------------------------------------------------
  *  rt/total

 

Best regards,

Yanis


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hello @hamitiya,

I checked this table for both the quantized TFLite model and the Keras model, and I confirm that the libraries required to run the INT8 model take up more flash memory compared to those for the FLOAT32 model (Requested memory size tables below). However, the weights of my INT8 model are four times smaller than those of the FLOAT32 model. Therefore, the INT8 model is supposed to require less flash memory overall, even though the libraries for running it are larger.
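To make that reasoning concrete, here is a back-of-the-envelope check. The float32 numbers are taken from the Analyze example above; the int8 runtime size is an assumed, illustrative value (the real one comes from your own Analyze output):

```python
# Illustrative FLASH budget comparison. "runtime" covers the inference
# kernels + infrastructure (RT total, read-only part); "weights" is the
# model data. The int8 runtime figure is a hypothetical assumption.

float32 = {"weights": 494_760, "runtime": 96_174}       # from the Analyze example
int8    = {"weights": 494_760 // 4, "runtime": 150_000}  # runtime size assumed

def flash_total(build: dict) -> int:
    return build["weights"] + build["runtime"]

print(flash_total(float32))  # 590934
print(flash_total(int8))     # 273690
```

Even with a noticeably larger int8 runtime, the 4x weight reduction should dominate for a model of this size, which is why the overflow is unexpected and worth tracing in the map file.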

hamitiya
ST Employee

Thanks for your update.

You are right: an int8 model should consume less flash and RAM.

Are you using STM32CubeIDE?

If yes, you can find the detailed flash consumption in Build Analyzer => Memory Details.

Example:

(screenshot: Build Analyzer, Memory Details view)

 

On your side, could you compare the "FLASH" section between your two projects? I expect one region to consume much more in one build than in the other, which will tell us which element is the culprit.
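As a command-line complement to the Build Analyzer view, a small sketch like this can rank the biggest individual FLASH consumers. It assumes output from `arm-none-eabi-nm --print-size --size-sort --radix=d build.elf`, where each relevant line is `address size type name` and types t/T (.text) and r/R (.rodata) correspond to FLASH-resident symbols:

```python
# Hypothetical sketch: list the largest FLASH symbols from nm output.
# Only code (t/T) and read-only data (r/R) symbols are counted; .bss/.data
# symbols are skipped since they live in RAM at run time.

def biggest_symbols(nm_output: str, top: int = 10):
    rows = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[2].lower() in ("t", "r"):
            rows.append((int(parts[1]), parts[3]))  # (size, symbol name)
    return sorted(rows, reverse=True)[:top]
```

Running it on both the Keras build and the TFLite build, and comparing the two top-ten lists, should make the extra int8 kernel code (if that is the culprit) stand out immediately.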

 

Best regards,

Yanis

 



Hello @EnzoC ,

Did you solve your issue?

 

Have a good day,

Julian

