2025-09-21 3:36 AM
Hello everyone,
I'm working on deploying a PyTorch model to an STM32F401RE NUCLEO board and encountering some challenging memory and quantization issues that I hope the community can help me resolve.
My project involves running a custom PyTorch model (converted to ONNX format) on the STM32F401RE NUCLEO board. The system already has USB Host (Audio Class) library and FreeRTOS integrated as essential components for my application, which means I need to work within the remaining available memory space.
When I configure X-CUBE-AI with compression set to high and optimization set to ram, the build process fails with a linker error:
C:/ST/STM32CubeIDE_1.18.1/STM32CubeIDE/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.13.3.rel1.win32_1.0.0.202411081344/tools/bin/../lib/gcc/arm-none-eabi/13.3.1/../../../../arm-none-eabi/bin/ld.exe: Xcube.elf section `.bss' will not fit in region `RAM'
C:/ST/STM32CubeIDE_1.18.1/STM32CubeIDE/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.13.3.rel1.win32_1.0.0.202411081344/tools/bin/../lib/gcc/arm-none-eabi/13.3.1/../../../../arm-none-eabi/bin/ld.exe: region `RAM' overflowed by 10800 bytes
Given that the STM32F401RE has only 96KB of SRAM and I already have USB Host and FreeRTOS consuming memory, this overflow isn't entirely surprising.
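To put the overflow in proportion, the budget arithmetic is simple (only the 96 KB SRAM size and the 10800-byte figure come from real data, i.e. the datasheet and the linker output; the rest is just arithmetic):

```python
# RAM budget arithmetic for the linker error above.
# Only TOTAL_SRAM (STM32F401RE datasheet) and OVERFLOW (from the ld output)
# are real figures; this just puts them in proportion.
TOTAL_SRAM = 96 * 1024   # 98304 bytes of SRAM on the STM32F401RE
OVERFLOW = 10800         # "region `RAM' overflowed by 10800 bytes"

demanded = TOTAL_SRAM + OVERFLOW       # what .data/.bss/stack/heap add up to
shortfall_pct = OVERFLOW / TOTAL_SRAM  # fraction of SRAM I still need to free

print(demanded)                # 109104
print(f"{shortfall_pct:.1%}")  # 11.0%
```

So even before any quantization wins, I need to claw back roughly 11% of total SRAM from the activations pool, the FreeRTOS heap, or task stacks.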
To address the memory constraints, I attempted INT8 quantization using ONNX Runtime. Here's the quantization code I used:
```python
import numpy as np
import onnxruntime
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class MyCalibrationDataReader(CalibrationDataReader):
    def __init__(self, data, model_path):
        self.enum_data = None
        self.data = data
        # Use an inference session to get the input name and shape.
        session = onnxruntime.InferenceSession(model_path, None)
        batch_size, channel, length = session.get_inputs()[0].shape
        self.input_name = session.get_inputs()[0].name
        self.datasize = len(data)

    def get_next(self):
        if self.enum_data is None:
            self.enum_data = iter([
                {self.input_name: sample[np.newaxis, np.newaxis, :].astype(np.float32)}  # (2048,) -> (1, 1, 2048)
                for sample in self.data
            ])
        return next(self.enum_data, None)

    def rewind(self):
        self.enum_data = None  # Reset the enumeration of calibration data

# cali_data (calibration samples) and the model paths model_fp32_prep /
# model_quant are defined earlier in my script.
dr = MyCalibrationDataReader(cali_data, model_fp32_prep)
quantize_static(
    model_fp32_prep,
    model_quant,
    dr,
    quant_format=QuantFormat.QDQ,
    per_channel=True,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
    reduce_range=True,
    extra_options={'WeightSymmetric': True, 'ActivationSymmetric': False},
)
```
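For context on what those extra_options request, here is a minimal numpy sketch of the underlying scale/zero-point math (textbook affine quantization, not ONNX Runtime's internals, which additionally handle per-channel axes, calibration statistics, and reduce_range): WeightSymmetric=True pins the weight zero-point at 0, while ActivationSymmetric=False lets activations use a shifted range.

```python
import numpy as np

def quantize_symmetric_int8(w):
    """Symmetric INT8 (as in WeightSymmetric=True): zero-point fixed at 0."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_asymmetric_int8(x):
    """Asymmetric INT8 (as in ActivationSymmetric=False): a zero-point shifts the range."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    zero_point = np.clip(np.round(-128.0 - lo / scale), -128, 127).astype(np.int8)
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

w = np.array([-1.0, 0.0, 0.25, 1.0], dtype=np.float32)
qw, w_scale = quantize_symmetric_int8(w)
print(qw.tolist())               # [-127, 0, 32, 127]

x = np.array([0.0, 0.25, 1.0], dtype=np.float32)
qx, x_scale, x_zp = quantize_asymmetric_int8(x)
print(qx.tolist(), int(x_zp))    # [-128, -64, 127] -128
```

Dequantizing with (q - zero_point) * scale recovers the originals to within one quantization step, which is why I expected the accuracy hit to be tolerable; my problem is purely getting the resulting QDQ model through the X-CUBE-AI analyzer.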
However, when I try to analyze the quantized model with X-CUBE-AI, I encounter this error:
Analyzing model
C:/Users/user/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/10.2.0/Utilities/windows/stedgeai.exe analyze --target stm32f4 --name network -m C:/Users/user/Downloads/FRFconv-TDS_onnx.quant.onnx --compression high --verbosity 1 --no-inputs-allocation -O ram --no-outputs-allocation --memory-pool C:\Users\user\AppData\Local\Temp\mxAI_workspace1712638268650010180404439894557726\mempools.json --workspace C:/Users/user/AppData/Local/Temp/mxAI_workspace1712638268650010180404439894557726 --output C:/Users/user/.stm32cubemx/network_output
ST Edge AI Core v2.2.0-20266 2adc00962
INTERNAL ERROR: 'NoneType' object is not subscriptable
I also tried using ST Edge AI Developer Cloud for quantization, but encountered the same issue:
>>> stedgeai analyze --model FRFconv-TDS_onnx_PerTensor_quant_random_2.onnx --optimization ram --target stm32f4 --name network --workspace workspace --output output
ST Edge AI Core v2.2.0-20266 2adc00962
INTERNAL ERROR: 'NoneType' object is not subscriptable
I'm quite attached to my current model architecture as it's specifically designed for my application requirements, so I'd prefer not to change the model structure if possible.
Memory Optimization: Has anyone successfully deployed AI models on STM32F401RE with other libraries like USB Host and FreeRTOS running simultaneously? Are there additional memory optimization techniques beyond X-CUBE-AI's high compression and RAM optimization that I could try?
Quantization Error: Have you encountered the 'NoneType' error when analyzing quantized ONNX models in X-CUBE-AI? This seems to occur both locally and on the cloud platform. Could this be a compatibility issue with my quantization approach or the ONNX model format?
Alternative Approaches: Are there other strategies to make my model fit within the available memory constraints without modifying the model architecture?
I have additional resources available that I can share if they would help with troubleshooting.
Please let me know if you need any additional information to help diagnose these issues.
Thank you in advance for your assistance!