
How to run AI models from model zoo on STM32N6

B.Montanari
ST Employee

Summary

This article provides a comprehensive guide on running AI models from the STM32 model zoo on STM32N6 microcontrollers. It includes step-by-step instructions on selecting a model, preparing the development environment, and converting the model using STM32Cube.AI. The guide also explains how to integrate the generated code into the firmware and deploy it onto the STM32 board for real-time testing.

Introduction

In the field of embedded systems, integrating artificial intelligence (AI) into microcontrollers has become increasingly important for enabling advanced functionalities in resource-constrained devices. The STM32 model zoo provides a collection of pretrained AI models optimized for STM32 microcontrollers, making it easier for developers to implement AI in their applications. These models are designed for various tasks such as image classification, object detection, and speech recognition. They are tailored to work efficiently within the hardware limitations of STM32 devices.

Running an AI model on STM32 microcontrollers involves several key steps: selecting a suitable model, preparing the development environment, converting the model into a compatible format using STM32Cube.AI, and integrating the generated code into the firmware. Proper deployment ensures that the model operates effectively in real-time applications while meeting the performance and accuracy requirements.

This article provides a step-by-step guide on how to run a model from the STM32 model zoo, specifically using the STM32N6570-DK. The guide covers the entire workflow, from selecting a suitable model to preparing the development environment, converting the model using STM32Cube.AI, and integrating the generated code into the firmware. By following this guide, you gain a clear understanding of the process and learn how to leverage the STM32N6 Discovery kit's capabilities to implement AI in your embedded projects effectively.

 

STM32Cube.AI ecosystem

 

1. Prerequisites

1.1 Required tools and software

  • STM32CubeIDE (1.17.0 or later): An integrated development environment for STM32 microcontrollers
  • STM32CubeProgrammer (2.18 or later): For flashing the firmware onto the STM32N6 Discovery kit
  • STM32N6 HAL driver (1.1.1 or later): The hardware abstraction layer supporting the STM32N6 microcontroller series
  • STEdgeAI-Core (2.1): To convert and optimize the AI model for STM32
  • STM32AI-ModelZoo-Services (3.1.0): Provides pretrained and optimized AI models for seamless deployment on STM32 MCUs and MPUs

1.2 Set your environment variables

You also need to configure the environment variables to include the paths for stedgeai.exe and the GNU Arm toolchain. This ensures that the tools can be accessed from the command line.

If you have admin rights on your computer, follow these steps:


 


 

Step-by-step instructions to set the environment variables

 

Now, add the stedgeai.exe directory to the system PATH. Assuming the default installation path:

C:\ST\STEdgeAI\2.1\Utilities\windows

Add the GNU Arm toolchain (arm-none-eabi) directory as well. Assuming STM32CubeIDE version 1.18.1, this is the default path:

C:\ST\STM32CubeIDE_1.18.1\STM32CubeIDE\plugins\com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.13.3.rel1.win32_1.0.0.202411081344\tools\bin

In case admin rights are not available, set the environment variables for the current session only using PowerShell:

PowerShell

 

In the same PowerShell session, issue these two commands (assuming default installation paths):

$Env:Path += ";C:\ST\STEdgeAI\2.1\Utilities\windows"
$Env:Path += ";C:\ST\STM32CubeIDE_1.18.1\STM32CubeIDE\plugins\com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.13.3.rel1.win32_1.0.0.202411081344\tools\bin"

 

1.3 Verify the configuration

After setting the environment variables, verify that the paths are correctly configured by running the following commands in the terminal:

stedgeai.exe --version

 

Validation of environment variables
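Beyond running the version command by hand, the check can also be scripted. Below is a minimal Python sketch; the tool names match the directories added to PATH above, and the helper function name is my own:

```python
import shutil

def tool_on_path(name):
    """Return the full path of an executable reachable via PATH, or None."""
    return shutil.which(name)

# Tools this tutorial relies on (names as installed by the default ST packages)
for tool in ("stedgeai", "arm-none-eabi-objcopy"):
    path = tool_on_path(tool)
    print(f"{tool}: {'OK -> ' + path if path else 'NOT FOUND - check your PATH'}")
```

If either tool reports NOT FOUND, revisit the PATH entries from the previous section before continuing.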

 

By completing these steps, you have the necessary tools and environment ready to proceed with optimizing, deploying, and running your AI model on the STM32N6 Discovery kit.

2. Development

In this section, we focus on optimizing, quantizing, and deploying the AI model on the neural processing unit (NPU) of the STM32N6 Discovery kit. The model is designed for an object recognition task: a modified YOLO model developed by STMicroelectronics, known as st_yolo_x_nano. After optimizing and quantizing the model using STEdgeAI-Core, a network weights file is generated. This file is stored in XSPI2, the interface associated with the external serial NOR flash memory on the Discovery kit. The XSPI2 region starts at 0x70000000, as defined in the STM32N6 reference manual, and the network content is programmed at address 0x70380000. This address is part of the external flash memory, which the NPU can access during runtime to execute the neural network. The next image illustrates this.
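To make the address arithmetic concrete, this short Python sketch (constants taken from the paragraph above) shows where the weights sit relative to the start of the XSPI2 region:

```python
# Addresses from the STM32N6 reference manual and this tutorial
XSPI2_BASE   = 0x70000000  # start of the XSPI2 external NOR flash region
WEIGHTS_ADDR = 0x70380000  # where the network weights are programmed

offset = WEIGHTS_ADDR - XSPI2_BASE
print(f"Offset inside external flash: {offset:#x}")  # 0x380000, i.e. 3.5 MiB into the flash
```

The application code and other assets occupy the flash below this offset, which is why the weights are placed there rather than at the base address.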

Simplified interconnect top view of the STM32N6 showing only the CPU, NPU, and external flash memory with the address table

 

Once deployed, we run an application to validate the performance and functionality of the model. Let's begin with the step-by-step process for the object detection model.

3. Object Recognition Model

3.1 Step 1: Convert the Model

In this step, use STEdgeAI-Core to convert the selected model, preparing it for deployment on the STM32N6 Discovery kit. The process involves reducing the model's size and computational complexity while maintaining its accuracy, ensuring it runs efficiently on the microcontroller's neural processing unit (NPU).

To begin, you need to clone the STM32 model zoo services GitHub repository. Once cloned, navigate to the model directory for the object detection application. The path to the model is as follows:

stm32ai-modelzoo-services\application_code\object_detection\STM32N6\Model


Here, you find the pretrained YOLO model (st_yolo_x_nano) that needs to be converted. This directory contains all the files required to optimize and quantize the model for STM32 hardware. The next image illustrates the folder mentioned.

The YOLO model in the "Model" folder

 

Before we optimize and quantize the model, we delete the files inside the STM32N6570-DK directory. These files will be regenerated by STEdgeAI after we run the CLI commands. Deleting them ensures that there are no conflicts with existing files and provides a clean slate for tutorial purposes.

 

The files inside the STM32N6570-DK folder

 

Now, we can optimize and quantize our model. Simply navigate back to the Model directory and open a terminal inside it. Then, run the following STEdgeAI command to start the conversion process:

stedgeai generate --model st_yolo_x_nano_480_1.0_0.25_3_st_int8.tflite --target stm32n6 --st-neural-art default@user_neuralart_STM32N6570-DK.json --input-data-type uint8

The following images illustrate this process:

Opening a terminal in the Model directory

 

STEdgeAI command with the output

 

Before we deploy the model on the neural processing unit (NPU), here is an explanation of the parameters passed in the STEdgeAI command:

  1. --model st_yolo_x_nano_480_1.0_0.25_3_st_int8.tflite:

Specifies the path to the pretrained and quantized YOLO model file in TensorFlow Lite format (.tflite). This is the model that is optimized and prepared for deployment.

  2. --target stm32n6:

Indicates the target microcontroller family. In this case, it is the STM32N6 series, ensuring the generated files are compatible with the STM32N6 hardware.

  3. --st-neural-art default@user_neuralart_STM32N6570-DK.json:

Refers to the compilation profile JSON file (.json) that contains specific configurations for the STM32N6570-DK board. This file defines how the model is mapped on the NPU. It includes a pointer to the .mpool file, which manages memory pooling for the NPU, and some compilation options. More details can be found here: https://stm32ai-cs.st.com/assets/embedded-docs/stneuralart_neural_art_compiler.html#ref_compilation_profiles_json_file .

  4. --input-data-type uint8:

Specifies the data type of the input tensor. Here, uint8 indicates that the model uses 8-bit unsigned integers for input data, which is common for quantized models to reduce memory and computational requirements. More details can be found here: https://stedgeai-dc.st.com/assets/embedded-docs/command_line_interface.html#ref_input_data_type_option

These parameters ensure that the model is correctly configured for execution on the STM32N6 Discovery Kit's NPU. Once this command is executed, the necessary files for deployment will be generated.
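For context on why uint8 inputs reduce memory requirements, the sketch below illustrates the standard affine quantization scheme used by TensorFlow Lite models. The scale and zero-point values here are illustrative only; the real values come from the model's input tensor metadata:

```python
def quantize_uint8(x, scale, zero_point):
    """Affine-quantize a float value to uint8: q = round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))  # clamp to the uint8 range

# Illustrative parameters: a [0.0, 1.0] float input mapped onto [0, 255]
scale, zero_point = 1.0 / 255.0, 0
print(quantize_uint8(0.0, scale, zero_point))  # 0
print(quantize_uint8(1.0, scale, zero_point))  # 255
```

Each input pixel is thus stored in one byte instead of a four-byte float, cutting the input buffer to a quarter of its float32 size.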

3.2 Step 2: Deploy on the neural processing unit (NPU)

In this step, you take the files generated by STEdgeAI-Core and deploy them on the STM32N6 Discovery kit, placing the network weights in the external flash memory so that the neural processing unit (NPU) can access them at runtime.

During the optimization process, STEdgeAI generates two folders inside the Model directory: st_ai_output and st_ai_ws. The st_ai_output folder contains the neural network binary file in a *.raw format and associated header files (.h), which are essential for deployment. The st_ai_ws folder includes intermediate files and logs created during the optimization process. For deployment, you primarily work with the files in the st_ai_output folder. The next image explains these two folders.

st_ai_output and st_ai_ws folders explained

 

We focus only on the files in the st_ai_output folder. Copy the network.c, network_ecblobs.h, and network_atonbuf.xSPI2.raw files into the STM32N6570-DK folder.

Copying the files from st_ai_output to the STM32N6570-DK folder

 

After that, you just need to rename network_atonbuf.xSPI2.raw to network_data.xSPI2.bin. If a *.hex file is desired, the following command can be used in PowerShell or the command prompt:

arm-none-eabi-objcopy.exe -I binary .\network_atonbuf.xSPI2.raw --change-addresses 0x70380000 -O ihex network_data.hex
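If you prefer to script the rename step, here is a minimal Python sketch; the file names are the ones used in this tutorial, and the helper is hypothetical (run it from the STM32N6570-DK folder):

```python
from pathlib import Path

def rename_network(src="network_atonbuf.xSPI2.raw", dst="network_data.xSPI2.bin"):
    """Rename the generated raw weights file to the name the application expects."""
    src_path = Path(src)
    if not src_path.exists():
        raise FileNotFoundError(f"{src} not found - run the 'stedgeai generate' step first")
    src_path.rename(dst)
    return Path(dst)
```

Calling rename_network() after copying the files leaves network_data.xSPI2.bin ready for programming.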

Finally, we can upload the network binary to the board using STM32CubeProgrammer at address 0x70380000 in XSPI2. This base address is defined by the .mpools configuration of the STM32N6570-DK (located in the Models\my_mpools folder).

 

The STM32N6570-DK .mpools configuration file with the base address highlighted

 

Make sure the external loader for the STM32N6570-DK's external flash is selected before proceeding with the programming:

Ensure the external flash memory loader is active in STM32CubeProgrammer

 

Once that box is checked, proceed with the programming, pointing to the *.bin file and placing it at 0x70380000.

Uploading the network binary in STM32CubeProgrammer

 

3.3 Step 3: Run the application and tryout the AI model

In this step, you execute the final phase of deploying an AI model on the STM32N6 Discovery kit. After completing all the steps, it’s time to run the application and evaluate the AI model's performance in a real-world scenario. By the end of this step, you are able to verify the model's behavior, assess its efficiency, and ensure it meets the desired performance and accuracy requirements for your application.

First, make sure the board is in DEV boot mode so you can program the code:

DEV boot configuration

Also, remember to connect a camera module to test the application:

Connecting the camera to the STM32N6570-DK

Now, we can import the object detection application into STM32CubeIDE by following these steps:
Import Project → Existing Projects into Workspace → Browse and find the object detection project. These steps are illustrated in the next figures. This is the path:

..\stm32ai-modelzoo-services\application_code\object_detection\STM32N6\Application\STM32N6570-DK\STM32CubeIDE

Step-by-step instructions to import the object detection project into STM32CubeIDE

Selecting the object detection project to import into STM32CubeIDE

 

At last, run the application and watch the magic happen!

Object recognition application running on the STM32N6

 

Conclusion

This article provided a comprehensive guide to deploying an AI model from the STM32 model zoo on the STM32N6 Discovery kit, covering all key steps from model selection to real-world testing. By following this process, you have learned how to optimize and quantize a model for object recognition, ensuring it operates efficiently within the hardware limitations of STM32 microcontrollers. With this knowledge, you are equipped to leverage AI in your embedded projects, unlocking new possibilities for innovative and intelligent applications.

Related links

Here are some related links that contain the material that was used to create this article and can be helpful in your developments.

Version history
Last update:
2025-06-24 11:31 PM
Updated by: