2025-01-21 08:00 AM
Hello!
I am using CubeAI (version 9.1) to generate code for running an ML model on a microcontroller.
When I generate the code with CubeMX using my .keras model, there are no compilation issues, and it runs perfectly.
However, when I generate the code with CubeMX using my .tflite model (quantized to int8) and then build the project, I get a link-time overflow error:
region 'FLASH' overflowed by 22340 bytes.
This is quite surprising because, according to CubeMX, my quantized .tflite model is about two times smaller in both RAM and FLASH usage compared to the .keras model (I’ve attached a photo showing the FLASH and RAM usage for both models).
I am using TensorFlow 2.12.
My theory is that the libraries required to run my int8 model take up more FLASH space, but I’m not sure. Or maybe I made a mistake when integrating the model into my project. Does anyone have an idea where this problem might come from?
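For reference, here is roughly how the model is wired into the project. This is a simplified sketch of the usual X-CUBE-AI pattern, assuming a model named "network"; the exact ai_network_* function names and size macros come from the generated network.h / network_data.h and can differ between CubeAI versions:

    /* Minimal X-CUBE-AI integration sketch (names taken from the generated files). */
    #include <stdint.h>
    #include "network.h"
    #include "network_data.h"

    static ai_handle net = AI_HANDLE_NULL;
    static ai_buffer *ai_in;
    static ai_buffer *ai_out;

    /* Activation (scratch) buffer in RAM; the size macro comes from the generated code. */
    AI_ALIGNED(32)
    static ai_u8 activations[AI_NETWORK_DATA_ACTIVATIONS_SIZE];

    int ai_init(void)
    {
      const ai_handle acts[] = { activations };
      /* NULL for weights: use the default weight array linked in via network_data.o. */
      ai_error err = ai_network_create_and_init(&net, acts, NULL);
      if (err.type != AI_ERROR_NONE)
        return -1;
      ai_in  = ai_network_inputs_get(net, NULL);
      ai_out = ai_network_outputs_get(net, NULL);
      return 0;
    }

    int ai_run(int8_t *in_data, int8_t *out_data)
    {
      /* For the quantized model the I/O tensors are int8. */
      ai_in[0].data  = AI_HANDLE_PTR(in_data);
      ai_out[0].data = AI_HANDLE_PTR(out_data);
      return (ai_network_run(net, ai_in, ai_out) == 1) ? 0 : -1;
    }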
2025-01-21 11:55 PM
Hello @EnzoC,
Could you please run the "Analyze" action in STM32CubeMX?
At the end of the textual output, you should find the information regarding library and code size.
For example:
Requested memory size by section - "stm32h7" target
------------------------------ --------- --------- --------- ---------
module                              text    rodata      data       bss
------------------------------ --------- --------- --------- ---------
NetworkRuntime1000_CM7_GCC.a      36,764         0         0         0
network.o                          4,322    24,188    29,232     1,284
network_data.o                        48        16        88         0
lib (toolchain)*                     892       624         0         0
------------------------------ --------- --------- --------- ---------
RT total**                        42,026    24,828    29,320     1,284
------------------------------ --------- --------- --------- ---------
weights                                0   494,760         0         0
activations                            0         0         0   246,256
io                                     0         0         0         0
------------------------------ --------- --------- --------- ---------
TOTAL                             42,026   519,588    29,320   247,540
------------------------------ --------- --------- --------- ---------
* toolchain objects (libm/libgcc*)
** RT AI runtime objects (kernels+infrastructure)

Summary - "stm32h7" target
---------------------------------------------------
              FLASH (ro)     %*      RAM (rw)     %
---------------------------------------------------
RT total          96,174  16.3%        30,604  11.1%
---------------------------------------------------
TOTAL            590,934             276,860
---------------------------------------------------
* rt/total
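To read the summary: FLASH (ro) counts text + rodata + data, since the .data initializers are stored in flash and copied to RAM at startup: 42,026 + 24,828 + 29,320 = 96,174 bytes for the runtime, plus the 494,760 bytes of weights gives the 590,934-byte total. RAM (rw) counts data + bss: 29,320 + 1,284 = 30,604 bytes for the runtime, plus the 246,256 bytes of activations gives the 276,860-byte total.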
Best regards,
Yanis
2025-01-22 01:51 AM
Hello @hamitiya,
I checked this table for both the quantized TFLite model and the Keras model, and I confirm that the libraries required to run the INT8 model take up more flash memory than those for the FLOAT32 model (requested memory size tables below). However, the weights of my INT8 model are four times smaller than those of the FLOAT32 model, so the INT8 model should still require less flash memory overall, even though its runtime libraries are larger.
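To put numbers on it, using the example table above purely for illustration: 494,760 bytes of FLOAT32 weights would shrink to roughly 494,760 / 4 ≈ 123,690 bytes in INT8, a saving of about 370 KB. Even a much larger INT8 kernel library should be absorbed by that, let alone a 22,340-byte overflow.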
2025-01-22 02:18 AM
Thanks for your update.
You are right: the INT8 model should consume less flash and RAM.
Are you using STM32CubeIDE?
If so, you can find the detailed flash consumption in the Build Analyzer => Memory Details section.
(Example screenshot attached.)
On your side, could you compare the "FLASH" section between your two projects? I expect one region to consume noticeably more in one project than in the other, which will let us identify which element is the culprit. (The same per-section breakdown is also written to the project's .map file in the build output folder.)
Best regards,
Yanis
2025-01-29 01:19 AM
Hello @EnzoC,
Did you solve your issue?
Have a good day,
Julian