This tutorial begins with STM32CubeMX and demonstrates an AI application for the STM32N6-DK board.
This example explains how to load an application from external flash memory, execute an AI model inference stored in external flash, and display the inference output via a serial interface.
To achieve this, the tutorial uses an example model from the ST model zoo and the STM32CubeMX package: X-CUBE-AI.
This tutorial uses the STM32N6570-DK board. You also need a USB Type-C® cable to program the board.
You can create the project in STM32CubeMX or use the STM32CubeMX embedded in STM32CubeIDE.
The following software and versions were used:
The STM32N6 microcontroller does not include internal flash memory. Therefore, to retain the application after power-off, external flash memory is typically used to store the binaries initially. Based on this, several design templates are provided to allow the user to copy the application from flash to RAM, either entirely or in multiple stages.
This tutorial is based on the FSBL LRUN template.
The five design templates available in the STM32CubeN6 package are described below.
The FSBL binary is initially stored in the external memory of the STM32N6 board. It is copied by the boot ROM at power-on into the internal SRAM and executed from there once the boot ROM finishes its task.
Two binaries, the FSBL and the application (Appli), are initially stored in the external memory of the STM32N6 board. At power-on, the boot ROM copies the FSBL binary into internal SRAM. Once the boot ROM completes its task, the FSBL executes; after performing clock and system configuration, it copies the application binary into internal SRAM. When done, the application starts and runs.
Two binaries, the FSBL and the application (Appli), are initially stored in the external memory of the STM32N6 board. At power-on, the boot ROM copies the FSBL binary into internal SRAM. Once the boot ROM completes, the FSBL executes; after system configuration, it maps the external memory (containing the application binary) into the memory space for XIP (execute in place). When the FSBL completes, the application executes directly from external memory.
Three binaries, the FSBL, the secure application (AppliSecure), and the nonsecure application (AppliNonSecure) are initially stored in the external memory of the STM32N6 board. At power-on, the boot ROM copies the FSBL binary into internal SRAM. After boot ROM execution, the FSBL executes. It configures the system and then copies both the secure and nonsecure application binaries into internal SRAM. Once complete, the secure application runs first and configures isolation settings, followed by the execution of the nonsecure application.
Three binaries, the FSBL, the secure application, and the nonsecure application are initially stored in the external memory of the STM32N6 board. At power-on, the boot ROM copies the FSBL binary into internal SRAM. Once the boot ROM completes, the FSBL executes; it performs clock and system configuration and then maps the external memory for execution. The secure application runs directly from external memory, configures isolation settings, and then jumps to the nonsecure application.
You can create a project in either the standalone STM32CubeMX application or the integrated STM32CubeMX version within STM32CubeIDE. The standard approach is to design the project in STM32CubeMX and then export it to STM32CubeIDE. This method is used in this tutorial.
Begin your project by clicking Access to board selector and selecting STM32N6570-DK, which corresponds to the STM32N6 Discovery kit board.
When asked to "Initialize all peripherals with their default Mode", click [No].
While some peripherals may be enabled by default, remove any that are unnecessary to avoid pin conflicts.
Select "Secure domain only" and click [OK]:
We can proceed with the configuration.
Navigate to "Pinout & Configuration."
Enable the CPU ICACHE and CPU DCACHE.
Enable CACHEAXI.
Ensure that ICACHE is disabled.
As shown in the image above, a pin conflict warning is displayed for PA8 in the RCC section. This pin is assigned to both RCC and LCD. Since this tutorial does not use the LCD, change the PA8 pin function from LTDC_B6 to RCC_MCO_1 in the "Pinout View."
To ensure that the code executes correctly, the green and red LEDs indicate the status of the FSBL and the application, respectively. Although the red LED is preconfigured as LED2, it is not set as an output by default. In the pinout view, the green LED corresponds to pin PO1, and the red LED corresponds to pin PG10.
In the "Pinout View", configure PO1 and PG10 as GPIO_Output.
Under "Pin Context Assignment", assign PO1 to the FSBL and PG10 to the Application.
Other parameters related to these pins can be left at their default values.
To enable overdrive mode, where the CPU operates at its maximum frequency (as shown below in the table from the STM32N6 datasheet), the EXT_SMPS_MODE pin must be configured. According to the STM32N6570-DK User Manual, this corresponds to pin PF4.
Therefore, configure PF4 as GPIO_Output and edit its configuration as follows:
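As an illustration only (the actual settings are made in the CubeMX GPIO panel and end up in the generated MX_GPIO_Init()), the PF4 initialization roughly amounts to the following sketch, assuming the pin is configured as a push-pull output driven high at startup:
/* Sketch: approximate equivalent of the CubeMX-generated PF4 configuration */
GPIO_InitTypeDef GPIO_InitStruct = {0};

__HAL_RCC_GPIOF_CLK_ENABLE();

/* Drive EXT_SMPS_MODE (PF4) high so that overdrive mode can be entered */
HAL_GPIO_WritePin(GPIOF, GPIO_PIN_4, GPIO_PIN_SET);

GPIO_InitStruct.Pin   = GPIO_PIN_4;
GPIO_InitStruct.Mode  = GPIO_MODE_OUTPUT_PP;
GPIO_InitStruct.Pull  = GPIO_NOPULL;
GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_LOW;
HAL_GPIO_Init(GPIOF, &GPIO_InitStruct);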
Additionally, the maximum CPU frequency in overdrive mode requires setting the "Power Regulator Voltage Scale" to "0." Ensure that this configuration is applied in the RCC section.
Since intermediate activations during inference are stored in RAM, enable the RAM controllers accordingly.
You can deactivate everything in this part to avoid potential conflicts:
First, deactivate everything that CubeMX enables by default but that is not used here, in this case:
As shown in the image below from the STM32N6570-DK user manual, the MCU is connected to the octo-SPI flash memory via the XSPI (extended-SPI) interface. Thus, XSPI2 is configured under the "Connectivity" section.
In XSPIM, select the operation mode as shown:
Then, configure XSPI2 as shown below:
The serial interface USART1 (PE5/PE6), which supports the bootloader, is directly accessible as a Virtual COM port on the PC when connected via the STLINK-V3EC USB connector (CN6). We use this interface to redirect the ‘printf‘ output, enabling easy debugging through a serial terminal. Enable USART1 for the Application runtime context, set the mode to "Asynchronous", and configure the baud rate to 115200 bit/s.
As you can see below, there is a conflict shown in orange. To fix it, change the pin PA11 to USART1_CTS:
To redirect ‘printf‘, additional code needs to be added to your project. This is done later.
Nothing in this section is used, so you can deactivate everything that was enabled by default.
To ensure boot security, enable "BSEC" for the FSBL. Additionally, since the NPU is used in the application, activate "RIF" for the Application. Check the boxes as shown below:
In the RIF panel of CubeMX, make sure that the NPU is checked. Depending on your version, the NPU can be either in Peripherals (RISUP) or Domains (RIMU).
With the serial interface already configured, you can now set up the external memory loader. Start by selecting [Load and Run] under "Selection of Boot System". The remaining parameters depend on the size of the generated binaries.
If you are unsure of the sizes, you can first generate the binaries using estimated values, then adjust the configuration accordingly and regenerate.
For this project, the FSBL binary is approximately 65 KB, and the Appli binary is around 295 KB. The board’s memory map is shown below.
Note two key addresses in this map: the secure RAM block starts at address ‘0x34000000‘, and the octo-SPI flash (interfaced by XSPI2) starts at address ‘0x70000000‘.
Upon power-up, the boot ROM is executed first. After that, the FSBL (stored at the beginning of flash) is copied into RAM and executed. The FSBL then copies the Appli binary from flash to RAM and executes it. Thus, the FSBL binary is stored at ‘0x70000000‘.
The Appli binary should be placed at an address offset greater than the FSBL size. Choosing an offset of ‘0x00100000‘ (1 MB) provides ample space. The code size corresponds to the Appli binary size; ‘0x0004BAF0‘ (310 KB) offers a suitable margin.
By default, "Memory 1" is the source memory, corresponding to the XSPI2 interface. The destination address should be set to the start of the secure RAM block, where the Appli binary is loaded.
The X-CUBE-AI middleware is used to generate application code for running neural network inferences. Enable it in the Application context and select "Application Template" as the application type.
When asked if you want to automatically fix peripherals and clocks, click [No].
In this tutorial, the object detection model from the ST model zoo is used (link). Get the quantized .tflite model.
For your information: you can find scripts to retrain models, example deployment applications, and much more for different use cases (image classification, object detection, audio detection, etc.) in our GitHub model zoo services:
GitHub: AI model zoo services for STM32 devices
Select "TFLite" and browse for your model. Additionally, select the profile "n6-allmems-03" if not already used by default.
In the "Advanced Settings" (accessible via the blue icon above "Show Graph"), you can view the memory pool used by X-CUBE-AI. This pool stores the model’s fixed weights in flash and its activations in RAM. The OctoFlash pool begins at address ‘0x71000000‘, so after generating the model weights image, it should be downloaded to this address. The image generation process is explained later in this tutorial.
Configure the clock according to the maximum supported frequencies (shown earlier in the overdrive mode section).
High-speed OTP optimization is not enabled for XSPI. Therefore, the XSPI2 clock must be reduced. Configure IC3 with PLL4 as input and a prescaler of 1, then set IC3 as the input for the XSPI2 clock multiplexer. You should have this:
At this point, the CubeMX design and configuration are complete. In the "Project Manager" section, click [Generate Code] to export your project. Ensure that both FSBL and Appli are included and select STM32CubeIDE as the target toolchain.
After exporting your project from CubeMX, you will have the following structure. There are two nested projects: one for the FSBL and one for the Appli. The "Drivers" folder, which includes CMSIS and HAL drivers, is located outside the global project, and both nested projects access it by including the appropriate headers. The same applies to the ‘Middlewares‘ folder.
Now, you should add code to the main functions of each project to complete certain initializations, use the LEDs to indicate successful execution, and validate the outputs. In the FSBL, to turn on the green LED, add the function call as shown in the image below.
Turn on LED1 (the green LED) after the FSBL initialization in main.c:
HAL_GPIO_WritePin(GPIOO, GPIO_PIN_1, GPIO_PIN_SET);
Additionally, in the FSBL main code, the overdrive mode selection pin must be set high before configuring the system clock. Therefore, make sure to initialize the corresponding GPIO before the system clock configuration, as shown in the image below:
MX_GPIO_Init(); /* Drives PF4 (EXT_SMPS_MODE) high before the system clock configuration */
HAL_Delay(1);
Next, go to the Appli project. At the top of ‘main.c‘, in the private define section, declare the following macro for the ‘__io_putchar‘ function.
#define PUTCHAR_PROTOTYPE int __io_putchar(int ch)
Then, in the ‘User Code 4‘ section at the bottom of ‘main.c‘, implement the following functions:
PUTCHAR_PROTOTYPE
{
  /* Send one character over USART1 */
  HAL_UART_Transmit(&huart1, (uint8_t *)&ch, 1, 0xFFFF);
  return ch;
}

int _write(int fd, char *ptr, int len)
{
  /* Send the whole buffer over USART1 */
  HAL_UART_Transmit(&huart1, (uint8_t *)ptr, len, HAL_MAX_DELAY);
  return len;
}
Within ‘main.c‘, in the X-CUBE-AI initialization function, add the following lines to enable the RAM sections that were previously initialized and enabled:
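/* Clear the SRAMSD (shutdown) bit so that these AXI SRAM banks are powered up and can hold the activations */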
RAMCFG_SRAM2_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM3_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM4_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM5_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM6_AXI->CR &= ~RAMCFG_CR_SRAMSD;
Important note: depending on the versions of CubeMX, X-CUBE-AI, and the STM32CubeN6 package, the contents of the generated MX_X_CUBE_AI_Init() function may differ. Edit it if needed so that it matches exactly what is shown in the image above.
Add the following line to the RIF (SystemIsolation_Config) function to complete the slave configuration:
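/* Configure the NPU as a secure, privileged peripheral in the RIF */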
HAL_RIF_RISC_SetSlaveSecureAttributes(RIF_RISC_PERIPH_INDEX_NPU, RIF_ATTRIBUTE_PRIV | RIF_ATTRIBUTE_SEC);
Important note 2: make sure that the code above the added line is present in the generated code. If it is not, this may indicate an issue with the RIF configuration: go back to the Security configuration in CubeMX and verify that the NPU is selected in the RIF tab.
Finally, you need to handle the input and output buffers to feed your neural network and retrieve the inferred results. Since both the NPU and the MCU have their own cache memories, these must be properly managed before invoking the low-level inference function to ensure that both units access up-to-date data in memory and that any results are stored in a mutually accessible region. Therefore, you must perform cache clean and invalidate operations.
The ‘MX_X_CUBE_AI_Process‘ function below implements the following features: it retrieves the input and output buffer information and prints the buffer addresses and sizes, fills the input buffer with a constant value, cleans and invalidates the caches, runs ten inferences, and converts the output buffer contents to float values before printing them.
The last functionality is needed because the output buffer memory is always allocated as an array of 8-bit integers, whereas this model's output values are 32-bit floats. Reading a single 8-bit element from the array therefore does not yield a valid value: four consecutive 8-bit values must be concatenated and interpreted as a single 32-bit float. You can copy the ‘MX_X_CUBE_AI_Process‘ function from the script below. After pasting it, press [Ctrl+I] to autoindent the code properly.
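The function uses ‘buffer_in‘, ‘buffer_out‘, ‘buff_in_len‘, and ‘buff_out_len‘ as file-scope variables. If they are not already declared in your ‘main.c‘ (for example, in the private variables user section), a minimal declaration, inferred from how they are used below, could be:
/* Assumed declarations for the buffers used by MX_X_CUBE_AI_Process() */
static uint8_t *buffer_in;    /* start address of the network input buffer  */
static uint8_t *buffer_out;   /* start address of the network output buffer */
static uint32_t buff_in_len;  /* input buffer size in bytes                 */
static uint32_t buff_out_len; /* output buffer size in bytes                */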
void MX_X_CUBE_AI_Process(void)
{
  /* USER CODE BEGIN 6 */
  LL_ATON_RT_RetValues_t ll_aton_rt_ret = LL_ATON_RT_DONE;
  const LL_Buffer_InfoTypeDef * ibuffersInfos = NN_Interface_Default.input_buffers_info();
  const LL_Buffer_InfoTypeDef * obuffersInfos = NN_Interface_Default.output_buffers_info();

  buffer_in  = (uint8_t *)LL_Buffer_addr_start(&ibuffersInfos[0]);
  buffer_out = (uint8_t *)LL_Buffer_addr_start(&obuffersInfos[0]);

  // Printing buffer start and end addresses.
  printf("Input buffer: offset start = %lu, \n \r offset end = %lu \n \r", ibuffersInfos->offset_start, ibuffersInfos->offset_end);
  printf("Output buffer: offset start = %lu, \n \r offset end = %lu \n \r", obuffersInfos->offset_start, obuffersInfos->offset_end);

  // Getting buffer size and printing it.
  buff_in_len = ibuffersInfos->offset_end - ibuffersInfos->offset_start;
  buff_out_len = obuffersInfos->offset_end - obuffersInfos->offset_start;
  printf("Buffer input size = %lu \n\r Buffer output size = %lu \n\r", buff_in_len, buff_out_len);

  uint8_t val = 10;

  LL_ATON_RT_RuntimeInit();

  // Run 10 inferences
  for (int inferenceNb = 0; inferenceNb < 10; ++inferenceNb) {
    /* ------------- */
    /* - Inference - */
    /* ------------- */

    /* Pre-process and fill the input buffer */
    // Fill input buffer with constant data.
    for (uint32_t i = 0; i < buff_in_len; i++) {
      buffer_in[i] = val;
    }

    // Clean and invalidate MCU DCache and invalidate NPU cache.
    mcu_cache_clean_invalidate_range(buffer_in, buffer_in + buff_in_len);
    npu_cache_invalidate();

    // Check that input buffer was properly assigned with "val".
    printf("Buffer[1] = %d \n \r", buffer_in[1]);
    printf("Buffer[1000] = %d \n \r", buffer_in[1000]);
    printf("Buffer[10000] = %d \n \r", buffer_in[10000]);
    //_pre_process(buffer_in);

    /* Perform the inference */
    LL_ATON_RT_Init_Network(&NN_Instance_Default); // Initialize network instance
    do {
      // Execute first/next epoch block
      ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_Default);
      // Wait for event if required
      if (ll_aton_rt_ret == LL_ATON_RT_WFE) {
        LL_ATON_OSAL_WFE();
      }
    } while (ll_aton_rt_ret != LL_ATON_RT_DONE);

    // Post-process the output buffer
    // Invalidate CPU cache if needed
    // Convert int8 to float. Buffer is int8, but model's output is float.
    uint8_t aux[4];
    float_t *conver;
    for (uint32_t i = 0; i < buff_out_len; i += 4) {
      aux[0] = buffer_out[i];
      aux[1] = buffer_out[i+1];
      aux[2] = buffer_out[i+2];
      aux[3] = buffer_out[i+3];
      conver = (float_t *)aux;
      printf("Out %lu = %f \n \r", i, *conver);
    }
    //_post_process(buffer_out);

    LL_ATON_RT_DeInit_Network(&NN_Instance_Default);

    /* -------------------- */
    /* - End of Inference - */
    /* -------------------- */
  }
  LL_ATON_RT_RuntimeDeInit();
  /* USER CODE END 6 */
}
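A side note on the int8-to-float conversion above: casting a ‘uint8_t‘ array to a ‘float_t *‘ works here, but a ‘memcpy‘-based conversion avoids any alignment or strict-aliasing concerns. A minimal equivalent of the conversion loop (a sketch, not part of the generated code; it requires ‘#include <string.h>‘ at the top of ‘main.c‘) could be:
for (uint32_t i = 0; i < buff_out_len; i += 4) {
  float out_val;
  /* Reassemble four consecutive bytes into one 32-bit float */
  memcpy(&out_val, &buffer_out[i], sizeof(out_val));
  printf("Out %lu = %f \n \r", i, out_val);
}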
This model has a three-dimensional output buffer. To validate the output, only the first dimension is used in this example. Configure a serial terminal, such as Tera Term, with the settings shown in the image below. You can then observe all the output buffer values.
You can use the Python script provided in the annex at the end of this article to check whether the quantized and optimized model running on your MCU produces the same outputs as the original model running on your PC.
To ensure that the application was correctly copied, that the peripheral initialization succeeded, and that the main loop is not stuck during inference, add the following two lines of code to blink LED2:
HAL_GPIO_TogglePin(GPIOG, GPIO_PIN_10);
HAL_Delay(200);
Your project is now complete. You may proceed to build it. Normally, there should be no errors. However, if you encounter dependency errors such as missing external sources, you can manually add them inside the nested projects. For instance, if you get errors indicating that LL functions are undeclared, it means that the compiler cannot locate the LL sources in the global middleware folder.
In that case, import the required source files and ensure that the folder is marked as a source location in the project settings.
To import a folder, right-click on the project, then select "Import" → "General" → "File System". Choose the folder containing the missing source files, and filter to import only ‘.c‘ files. Then, right-click the project again, go to "Properties" → "C/C++ General" → "Paths and Symbols." Under the "Source Location" tab, add the folder you just imported.
Look at this question from the ST Community product forums for additional troubleshooting: Solved: Linker garbage problem when deploying AI models on... - STMicroelectronics Community
After building, you will find the binaries in their respective "Debug" folders.
Embedded systems that implement security features such as TrustZone®, as in the STM32N6, require firmware authentication. The STM32-SignTool is a key utility that ensures a secure platform by signing binary images using ECC keys. These signed binaries are used during the STM32 secure boot process to establish a trusted boot chain. This process ensures authentication and integrity checks of the loaded images.
In short, you must sign the generated binaries before flashing them to the N6.
The Signing Tool executable is located in your STM32CubeProgrammer installation directory (by default: C:/Program Files/STMicroelectronics/STM32Cube/STM32CubeProgrammer/bin). To run the commands shown below from any directory, you can add this ‘bin‘ folder to your environment variables.
Otherwise, run the command directly from the ‘bin‘ folder, specifying the full path to the binary file:
STM32_SigningTool_CLI.exe -bin <your_project>.bin -nk -of 0x80000000 -t fsbl -o <your_project>-trusted.bin -hv 2.3 -dump <your_project>-trusted.bin
In our case, you want to sign two files: the FSBL binary and the Appli binary.
You should then end up with two new signed files, the ‘-trusted‘ versions of the FSBL and Appli binaries.
For reference, the terminal output should look like:
In your project folder, at the root, you can find a file named network_atonbuf.xSPI2.raw, which contains the weights of your model. This file is produced by X-CUBE-AI; more precisely, it is the result of the ST Edge AI Core command that runs behind it:
stedgeai generate --model Model_File.tflite --target stm32n6 --st-neural-art
Documentation: https://stedgeai-dc.st.com/assets/embedded-docs/index.html
In our case, we want to rename and convert this file to network_data.xSPI2.bin:
cp network_atonbuf.xSPI2.raw network_data.xSPI2.bin
Next, add the path to ‘arm-none-eabi-objcopy‘ to your environment variables. You can find it in your STM32CubeIDE installation, typically under: C:/ST/STM32CubeIDE_<version>/STM32CubeIDE/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.13.3.rel1.win32_1.0.0.202411081344/tools/bin. This tool allows you to convert the ‘.bin‘ file into a ‘.hex‘ file with a specified flash memory address:
arm-none-eabi-objcopy -I binary network_data.xSPI2.bin --change-addresses 0x71000000 -O ihex network_data.hex
You now have the hexadecimal file containing fixed weights and parameters ready for flashing.
We now have all three image files: the FSBL, the application, and the model weights.
Open STM32CubeProgrammer and ensure that your ST-LINK configuration matches the image below and confirm that the firmware is up to date.
Set your STM32N6570-DK board to development boot mode (Boot1 and Boot2 to the right) and click [Connect] in STM32CubeProgrammer.
Boot switches on the DK board:
In the image, the board is set to boot from flash.
The FSBL and the application are binary files, so their flashing addresses must be specified manually.
For example: the FSBL binary is flashed at ‘0x70000000‘ and the Appli binary at ‘0x70100000‘ (the 1 MB offset configured earlier), while the weights ‘.hex‘ file already embeds its target address (‘0x71000000‘).
To flash the images, open the "Erasing & programming" panel in STM32CubeProgrammer, select each file, specify the start address for the ‘.bin‘ files, and click [Start Programming].
At boot, the boot ROM loads the FSBL from flash to internal RAM. The FSBL then loads the application from flash and executes it.
Now, if you connect the board with both switches set to the left, the application is loaded and executed from flash memory. Here is what you should observe in a terminal:
Here are some comments:
The input and output sizes can be understood by opening the model ‘.tflite‘ file that was downloaded earlier in a model viewer such as Netron.
X-CUBE-AI allocates both input and output buffers as INT8 tables. For the model used in this tutorial, the input buffer data type is INT8, so the allocated memory size corresponds directly to its dimensions (192 x 192 x 3 = 110592 bytes).
However, the output buffers' data type is FLOAT32, which means the allocated memory size is four times greater. For example, for the third output buffer in Netron - which is the first one listed by LL_ATON_Output_Buffers_Info_Default - with dimensions (3875, 2), 3875 x 2 x 4 = 31000 bytes were allocated. Then, as you can see in MX_X_CUBE_AI_Process, the INT8 output buffer values must be "cast" to FLOAT32; therefore, a float pointer is used to point at the beginning of an INT8 table of length 4.
In the serial terminal, only this example output buffer is printed out; the Python script prints the same buffer.
By opening the ‘network.c‘ file in Appli/X-Cube-AI/App, you can find information about the outputs of the model generated by X-CUBE-AI: in the function ‘LL_Buffer_InfoTypeDef *LL_ATON_Output_Buffers_Info_Default‘, the three outputs are listed in order, and the first one corresponds to the size 3875 * 2 (that is, the last one in Netron).
If you have carefully followed the steps in this tutorial, the green LED should now be turned on, and the red LED should be blinking. The blinking interval reflects the sum of the user-defined delay, the inference execution time, and the time required to print the outputs to the serial terminal. Furthermore, the output values observed on the terminal should match those produced by executing the reference Python script provided in the annex.
This tutorial aimed to provide a minimal yet functional application that enables users, regardless of expertise level, to develop STM32N6 edge AI projects with a clear and structured workflow.
import numpy as np
import tensorflow as tf
# Path to your TFLite model file
MODEL_PATH = "ssd_mobilenet_v2_fpnlite_035_192_int8.tflite"
# The constant value to fill into the input tensor
FILL_VALUE = 10
# Load TFLite model and allocate tensors
interpreter = tf.lite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()
# Get input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# You can print these to inspect:
print("Input details:", input_details)
print("Output details:", output_details)
# Getting model's input details
input_index = input_details[0]['index']
input_shape = input_details[0]['shape']
input_dtype = input_details[0]['dtype']
# Create input data filled with the constant value
input_data = np.full(input_shape, FILL_VALUE, dtype=input_dtype)
# Set the tensor
interpreter.set_tensor(input_index, input_data)
# Run inference
interpreter.invoke()
# Retrieve output tensors
outputs = []
for out in output_details:
    output_data = interpreter.get_tensor(out['index'])
    outputs.append(output_data)
# Print first buffer outputs
for dim in outputs[0]:
    for i, val in enumerate(dim):
        print(f"Output {i} = {val}")
print("\n\n\n\n")