RAM Overflow and X-CUBE-AI Quantization Analysis Error with ONNX Model on STM32F401RE

SR1218
Associate II

RAM Overflow and X-CUBE-AI Quantization Analysis Error with PyTorch Model on STM32F401RE

Hello everyone,

I'm working on deploying a PyTorch model to an STM32F401RE NUCLEO board and encountering some challenging memory and quantization issues that I hope the community can help me resolve.

Project Context

My project involves running a custom PyTorch model (converted to ONNX format) on the STM32F401RE NUCLEO board. The system already has USB Host (Audio Class) library and FreeRTOS integrated as essential components for my application, which means I need to work within the remaining available memory space.

Development Environment

  • Board: STM32F401RE NUCLEO (96KB SRAM)
  • IDE: STM32CubeIDE 1.18.1
  • X-CUBE-AI: 10.2.0
  • Additional Libraries: USB Host (Audio Class), FreeRTOS
  • Model: PyTorch → ONNX converted

Problem 1: RAM Overflow During Build

When I configure X-CUBE-AI with compression set to high and optimization set to ram, the build process fails with a linker error:

C:/ST/STM32CubeIDE_1.18.1/STM32CubeIDE/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.13.3.rel1.win32_1.0.0.202411081344/tools/bin/../lib/gcc/arm-none-eabi/13.3.1/../../../../arm-none-eabi/bin/ld.exe: Xcube.elf section `.bss' will not fit in region `RAM'

C:/ST/STM32CubeIDE_1.18.1/STM32CubeIDE/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.13.3.rel1.win32_1.0.0.202411081344/tools/bin/../lib/gcc/arm-none-eabi/13.3.1/../../../../arm-none-eabi/bin/ld.exe: region `RAM' overflowed by 10800 bytes

Given that the STM32F401RE has only 96KB of SRAM and I already have USB Host and FreeRTOS consuming memory, this overflow isn't entirely surprising.
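For what it's worth, here is a quick back-of-the-envelope budget. The FreeRTOS, USB, and application figures are placeholders I made up for illustration, not measurements from my project; only the total SRAM and the overflow amount come from the board spec and the linker output above.

```python
# Rough SRAM budget for the STM32F401RE (96 KB total).
# FreeRTOS/USB/app figures are illustrative placeholders, not measured values.
SRAM_TOTAL = 96 * 1024            # 98304 bytes

freertos_heap = 15 * 1024         # placeholder: configTOTAL_HEAP_SIZE
usb_audio_buffers = 8 * 1024      # placeholder: USB Host Audio Class buffers
other_bss_data = 6 * 1024         # placeholder: remaining .data/.bss

available_for_ai = SRAM_TOTAL - (freertos_heap + usb_audio_buffers + other_bss_data)
overflow = 10800                  # bytes, straight from the linker error above

# The linker says .bss exceeds SRAM by `overflow`, so at least that many
# bytes must come out of the activation pool or the other consumers.
print(available_for_ai)           # 68608
print(overflow)                   # 10800
```

The real numbers for a given build can be read from the generated .map file or with `arm-none-eabi-size` on the ELF.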

Problem 2: X-CUBE-AI Quantization Analysis Failure

To address the memory constraints, I attempted INT8 quantization using ONNX Runtime. Here's the quantization code I used:

import numpy as np
import onnxruntime
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class MyCalibrationDataReader(CalibrationDataReader):
    def __init__(self, data, model_path):
        self.enum_data = None
        self.data = data

        # Use an inference session to get the input name and shape.
        session = onnxruntime.InferenceSession(model_path, None)
        batch_size, channel, length = session.get_inputs()[0].shape  # expects (N, C, L)
        self.input_name = session.get_inputs()[0].name
        self.datasize = len(data)

    def get_next(self):
        if self.enum_data is None:
            self.enum_data = iter([
                {self.input_name: sample[np.newaxis, np.newaxis, :].astype(np.float32)}  # (2048,) -> (1, 1, 2048)
                for sample in self.data
            ])
        return next(self.enum_data, None)

    def rewind(self):
        self.enum_data = None  # Reset the enumeration of calibration data

dr = MyCalibrationDataReader(cali_data, model_fp32_prep)

quantize_static(
    model_fp32_prep,
    model_quant,
    dr,
    quant_format=QuantFormat.QDQ,
    per_channel=True,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
    reduce_range=True,
    extra_options={'WeightSymmetric': True, 'ActivationSymmetric': False},
)
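In case it helps with debugging, ONNX Runtime's calibrator simply calls `get_next()` repeatedly until it returns `None`. The same iteration logic can be exercised with a standalone mimic of the reader that has no ONNX Runtime dependency (the class and input name below are hypothetical, for testing only):

```python
import numpy as np

class DummyReader:
    """Mimics the CalibrationDataReader protocol: get_next() yields one
    {input_name: array} dict per calibration sample, then None."""
    def __init__(self, data, input_name="input"):
        self.data = data
        self.input_name = input_name
        self.enum_data = None

    def get_next(self):
        if self.enum_data is None:
            self.enum_data = iter(
                {self.input_name: s[np.newaxis, np.newaxis, :].astype(np.float32)}
                for s in self.data
            )
        return next(self.enum_data, None)

    def rewind(self):
        self.enum_data = None

reader = DummyReader([np.zeros(2048), np.ones(2048)])
batch = reader.get_next()
print(batch["input"].shape)   # (1, 1, 2048)
```

Feeding a few samples through this and checking shapes/dtypes confirms the reader side is fine before suspecting the quantizer itself.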

However, when I try to analyze the quantized model with X-CUBE-AI, I encounter this error:

Analyzing model C:/Users/user/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/10.2.0/Utilities/windows/stedgeai.exe analyze --target stm32f4 --name network -m C:/Users/user/Downloads/FRFconv-TDS_onnx.quant.onnx --compression high --verbosity 1 --no-inputs-allocation -O ram --no-outputs-allocation --memory-pool C:\Users\user\AppData\Local\Temp\mxAI_workspace1712638268650010180404439894557726\mempools.json --workspace C:/Users/user/AppData/Local/Temp/mxAI_workspace1712638268650010180404439894557726 --output C:/Users/user/.stm32cubemx/network_output 

ST Edge AI Core v2.2.0-20266 2adc00962 
INTERNAL ERROR: 'NoneType' object is not subscriptable

Problem 3: Same Error on ST Edge AI Developer Cloud

I also tried using ST Edge AI Developer Cloud for quantization, but encountered the same issue:

>>> stedgeai analyze --model FRFconv-TDS_onnx_PerTensor_quant_random_2.onnx --optimization ram --target stm32f4 --name network --workspace workspace --output output 

ST Edge AI Core v2.2.0-20266 2adc00962 
INTERNAL ERROR: 'NoneType' object is not subscriptable

My Questions

I'm quite attached to my current model architecture as it's specifically designed for my application requirements, so I'd prefer not to change the model structure if possible.

  1. Memory Optimization: Has anyone successfully deployed AI models on STM32F401RE with other libraries like USB Host and FreeRTOS running simultaneously? Are there additional memory optimization techniques beyond X-CUBE-AI's high compression and RAM optimization that I could try?

  2. Quantization Error: Have you encountered the 'NoneType' error when analyzing quantized ONNX models in X-CUBE-AI? This seems to occur both locally and on the cloud platform. Could this be a compatibility issue with my quantization approach or the ONNX model format?

  3. Alternative Approaches: Are there other strategies to make my model fit within the available memory constraints without modifying the model architecture?

Additional Information Available

I have the following resources available if they would help with troubleshooting:

  • Original PyTorch model code
  • ONNX conversion and quantization scripts
  • Original ONNX model file (before quantization)

Please let me know if you need any additional information to help diagnose these issues.

Thank you in advance for your assistance!

1 ACCEPTED SOLUTION

Accepted Solutions
Julian E.
ST Employee

Hello @SR1218,

 

Your model is already very small. When quantizing a model in QDQ format (which is what we support, and what the dev cloud does), quantization inserts new Quantize/Dequantize layers that can help reduce the size of big models.
But in the case of small models, the weight of these multiple new layers adds more to the model size than what you gain in weight compression.

Non-quantized model: 53 KB -> QDQ model: 62 KB.

JulianE_0-1758531578605.png
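A toy calculation makes the overhead concrete (the layer shape and the per-node overhead figure below are invented for illustration, not taken from your model):

```python
# Why QDQ can grow a tiny model instead of shrinking it.
# All layer sizes and the node-overhead figure are hypothetical.
FP32, INT8 = 4, 1  # bytes per element

# A tiny Conv layer: 8 output channels, 1 input channel, kernel size 3.
out_ch, in_ch, k = 8, 1, 3
weights = out_ch * in_ch * k          # 24 weight values

fp32_bytes = weights * FP32           # 96 bytes as float32

# Per-channel QDQ: int8 weights, plus one fp32 scale and one int8
# zero-point per output channel, plus serialized Q/DQ node metadata
# (rough per-node estimate).
qdq_node_overhead = 300               # rough guess, bytes
int8_bytes = weights * INT8 + out_ch * (FP32 + INT8) + qdq_node_overhead

print(fp32_bytes, int8_bytes)         # 96 364
```

With many such small layers, the fixed per-layer overhead dominates, which matches the 53 KB -> 62 KB growth observed here.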

In your case it even triggers a bug in the ST Edge AI Core (the tool behind the dev cloud and X-CUBE-AI that converts the model to C). But as I explained, even if there were no bug, it would still not help you...

 

As you pointed out, the main issue here is the memory available on the STM32F401RE NUCLEO. 

I don't know what your use case is; I would guess it is related to sound, based on your I/O.

For sound, our example applications are based on the B-U585I-IOT02A and the STM32N6570-DK; you may try to get one of these. (The N6 is interesting for big models.)

 

I don't see how you could reduce the size of your model enough to make it usable.

I can maybe advise you to take a look at NanoEdge AI Studio and see if the machine learning libraries (models) it generates can be enough for your use case. It is a free tool and very easy to use. If you have a dataset, I think you can very quickly see whether you get good results.

NanoEdge AI Studio - STMicroelectronics - STM32 AI

 

Have a good day,

Julian

 

 


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

