
STM32N6570-DK can run Model Zoo examples but not custom quantized model — possible 4.2 MB SRAM limit?

BCPH357
Associate II

I’m currently testing my own quantized model on the STM32N6570-DK, using the official X-CUBE-AI application example.
All Model Zoo examples (e.g. SSD, image classification) run perfectly with the same firmware and codebase.
However, when I replace the model with my own quantized TensorFlow Lite model (Zero-DCE, converted with ST Edge AI Developer Cloud), the inference process freezes — even though the conversion and compilation complete without errors.

I’m wondering whether there is a hard restriction that prevents a model from running when the RAM it uses exceeds the on-chip SRAM (≈ 4.2 MB), even though the report shows that part of the activations is mapped to external HyperRAM.

Here are the key parts of my Edge AI Core 2.2.0 generation report:

ST Edge AI Core v2.2.0-20266
Model name: zerodce_int8_192

Total memory usage:
----------------------------------------------
Total: 11.643 MB
Weights: 78.751 KB
Activations: 11.566 MB

Memory mapping:
cpuRAM2 [0x34100000 - 0x34200000]: 1.000 MB
npuRAM3 [0x34200000 - 0x34270000]: 416 KB
npuRAM4 [0x34270000 - 0x342E0000]: 432 KB
npuRAM5 [0x342E0000 - 0x34350000]: 432 KB
npuRAM6 [0x34350000 - 0x343C0000]: 324 KB
octoFlash [0x71000000 - 0x71080000]: 78.7 KB (weights)
hyperRAM [0x90000000 - 0x90900000]: 9.000 MB (activations)

Epochs:
Total: 36, of which 1 implemented in software (Split)

So the model uses around 11.6 MB in total, with 9 MB of activations placed in HyperRAM.
But in practice, the device freezes as soon as inference begins, while the SSD model (which uses less than 4 MB RAM) runs normally on the same code and hardware.
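
For reference, the activation pools in the report add up exactly to the reported totals (taking 1 MB = 1024 KB):

Internal pools: 1024 + 416 + 432 + 432 + 324 KB ≈ 2.566 MB
External (hyperRAM): 9.000 MB
Activations: 2.566 + 9.000 = 11.566 MB
Total (+ 78.751 KB weights in octoFlash): 11.643 MB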

My questions are:

  1. Is there a hardware or runtime limitation preventing models that exceed the internal SRAM size (≈ 4.2 MB) from running, even if the memory pool maps to external RAM?

  2. Does the runtime require additional configuration (e.g. MPU or cache attributes) to allow activations in HyperRAM to be used safely?

  3. Are there recommended memory profiles or examples for models > 4 MB activations on STM32N6570-DK?

Any insights or examples from ST engineers or community members who have successfully run large models (using external RAM) would be greatly appreciated.


3 REPLIES
GRATT.2
ST Employee

Hello @BCPH357

Your model’s memory profile is indeed quite unusual: very small weights (78.8 KB) but very large activations (11.6 MB). Inference may therefore be very slow, since external RAM is slow compared to internal RAM. 

To answer your questions: 
1. The external RAM can be used safely by the model as it is initialized at the beginning of the application before the first inference occurs. It is also used by the application to store image buffers. 

2. No additional configuration required. 

3. The default memory profiles are good. 

 

Allow me to ask you a few questions so that I can better assist you: 
A. What is your workflow? Are you using the deployment service of the ModelZoo or are you modifying the GettingStarted package (application_code/object_detection/STM32N6)? 
B. If you are using the GettingStarted package, make sure to flash the weights generated by STEdgeAI (network_data.hex) before running the application (the deployment service does this automatically). 

C. Regarding your model, its output is not similar to that of the SSD models proposed in the ModelZoo (see ssd_mobilenet_v2_fpnlite_035_192_int8.tflite for example), which means there is no ready-made post-processing for it. I managed to deploy your model with the deployment script (replacing model_path in this SSD example's yaml file), treating it as an object detection model. The inference time is 952 ms, but the SSD post-processing cannot interpret the model's output, so the app crashes when it tries to draw the boxes, as they fall outside the screen. If none of the available post-processing routines fit your model, you must write your own in C (a minimal illustrative sketch follows below). 
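
To give you an idea, here is a minimal sketch of such a post-processing step, assuming the int8 output tensor of zerodce_int8_192 is an enhanced 192x192 RGB image. The function name, dimensions, buffer layout and quantization parameters below are placeholders for illustration, not taken from the GettingStarted package:

/* Sketch: convert the model's int8 output into an 8-bit RGB buffer for display.
 * All names, dimensions and quantization parameters are illustrative only. */
#include <stdint.h>

#define OUT_H 192
#define OUT_W 192
#define OUT_C 3

void zerodce_postprocess(const int8_t *nn_out, /* model output tensor */
                         float scale,          /* output quantization scale */
                         int zero_point,       /* output zero point */
                         uint8_t *rgb_out)     /* OUT_H * OUT_W * OUT_C bytes */
{
  for (int i = 0; i < OUT_H * OUT_W * OUT_C; i++) {
    /* dequantize, clamp to [0, 1], then rescale to an 8-bit pixel value */
    float v = scale * (float)((int)nn_out[i] - zero_point);
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    rgb_out[i] = (uint8_t)(v * 255.0f + 0.5f);
  }
}

How you then display or transfer that buffer is up to your application. 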

I hope this helps, 

Guillaume

Hello Guillaume,

Thank you for the detailed explanation — that really helps clarify things.

To answer your question about my workflow:
I followed exactly the steps described in this tutorial:
How to build an AI application from scratch on the STM32N6570-DK

Using this guide, I successfully deployed and ran the ssd_mobilenet_v2_fpnlite_035_192_int8.tflite model from the Model Zoo — it works perfectly.
However, when I replace that model with my own quantized zerodce_int8_192.tflite, following the exact same procedure, the application builds and flashes fine, but inference does not start (it freezes).

This happens even though my generated model report shows that the compilation and memory mapping complete successfully, and that the model includes only one software layer (Split).

So it seems the problem is not related to memory mapping or initialization, but to how the runtime handles this model during inference.

By the way, could you please share the full content of your void MX_X_CUBE_AI_Process(void) function (or the version you used for testing my model)?
It would help me verify if there are any subtle differences in the inference flow that might affect model execution.

Best regards,
BCPH357

GRATT.2
ST Employee

Hi, 

I used the ModelZoo to deploy your model. First, install the packages listed in requirements.txt in the Python venv (see the README). Change the model_path in object_detection/src/config_file_examples/deployment_n6_ssd_mobilenet_v2_fpnlite_config.yaml so that it points to your model file. Then, from the object_detection/ folder, run:

python stm32ai_main.py --config-path src/config_file_examples --config-name deployment_n6_ssd_mobilenet_v2_fpnlite_config.yaml

Make sure your board is in development mode so that the deployment script can flash the weights and the app, then set the switch to boot-from-flash mode and reset your board. You will see the inference time and the welcome screen, but nothing more happens after that, because the application crashes during the post-processing display. 

Guillaume