2025-05-19 6:13 AM - edited 2025-05-23 3:29 AM
Hi everyone,
I'm having trouble generating a project to run an AI model on the Nucleo-N657X0-Q board.
I open STM32CubeMX version 6.14.1, select the board, choose "Secured Application", and then when I select the X-CUBE-AI package, it asks for the ZIP file available here:
https://www.st.com/en/development-tools/stedgeai-core.html?ecmp=tt9470_gl_link_feb2019&rt=db&id=DB4255
I download it, choose to use X-CUBE-AI in the Application domain, and when I try to select the ONNX model, it asks whether I want to optimize peripherals and clock. At this point, whether I click "Yes" or "No", CubeMX freezes indefinitely.
I’ve already tried using CubeMX version 6.13 as well, but it didn’t help.
My suspicion is that the problem might be related to the version of ST Edge AI. Version 2.1 on the ST Edge AI Developer Cloud throws an error during optimization (although maybe it's unrelated). Also, when selecting the package ZIP file, the dialog only shows folders, not ZIP files. I have to manually navigate to the directory and enter the ZIP file name in the dialog box to select it. However, I can't try older versions as I can't download them from the website.
Is there any way to fix this issue?
N.B. I am working on Windows 11, but I have the same problems on an Ubuntu system. I'm attaching the resulting log file in case it helps.
2025-05-26 7:11 AM - edited 2025-05-26 7:12 AM
Hello
As stated in the other ticket you refer to (here), it seems that the firmware runs (i.e., it manages to connect through UART when you ask for a validation)...
The main difference that may cause issues between Nucleo and DK projects is that the Nucleo boards do not come with an external RAM connected on XSPI1.
Could you make sure that:
On your configuration side:
Could you do those checks and try again, and tell us if this is better?
Thanks.
2025-05-26 8:37 AM - edited 2025-05-27 1:08 AM
Hello @SlothGrill!
I checked, and there was an entry in the memory pool related to XSPI1; moreover, its use was enabled in the RAM memory tab. I removed the entry and disabled its use, but nothing changed. I also checked whether the generated code contained any line where the program tries to read/write the external memory, but I found nothing.
As for the RISAF, it is configured as in the pictures:
I'm attaching the code, the .ioc project, and the generated binaries in case they help.
Question #1: Am I supposed to flash something into the NPU RAMs? I am flashing the network weights at 0x71000000, which corresponds to the start address of the Octo-SPI flash, but from the generated report I can see that part of the NPU RAMs is also allocated, and the application binary is only 152 KB...
Question #2: As you can see from the UART output on the previous page, the input and output buffers seem to be mapped at the same address (0x342e0000). Is that normal?
Question #3: From the UART output I can see "params: n.a.". Is that normal?
2025-05-27 4:24 AM
@Dresult,
I tested your project and ended up with the same behaviour as you: the inference seems stuck in the first executed epoch. There may be a bug somewhere; would it be possible for you to share your original model with us? (So I can try it with our newer versions, which may have fixed that.)
Thanks.
Q#1: What do you mean by "from the generated report I can see that also part of the NPU RAMs are allocated"?
From what I see in your config, no weights/params should end up in the NPU RAMs, so there should be no need to initialize them (if an initializer file is generated, you can check whether it contains only zeros). If the NPU RAMs are used only for activation computations, then it should be fine without initializing them.
Q#2: Yes, this is normal. The input is placed at 0x342e0000, and there is no guarantee that the data at this address will be kept intact during inference: this address slot is used during inference to store activations, and outputs are a special kind of activation. So the input is at 0x342e0000 before inference (and is then overwritten during inference), and the output is at 0x342e0000 after inference.
Q#3: This is normal. By tweaking the code I can print "params: 1077 KiB". Printing this info requires setting the define LL_ATON_DBG_BUFFER_INFO_EXCLUDED=0 (which will make the code size grow) and fixing an issue in aiValidation_ATON.c:600 -> #if (LL_ATON_DBG_BUFFER_INFO_EXCLUDED == 0).
2025-05-27 5:27 AM
Thanks a lot @SlothGrill
I'm attaching the ZIP file containing the ONNX model.
2025-05-28 7:47 AM
Hey,
So I have other kinds of issues with newer versions of the tool. I guess I will raise a bug. I am sorry :)
No luck here...
Do you have other models to test? (e.g., nets for CV?)
Did you design this one yourself?
2025-05-28 9:03 AM
Thanks anyway for trying. This is the only model I wanted to put on the board, and yes, I built it with PyTorch and exported it to ONNX, then quantized it with post-training quantization (PTQ), roughly as sketched below.
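The sketch below is only illustrative: TinyNet, the input shape, and the random calibration samples are placeholders standing in for my real model and data, and the onnxruntime call shows the generic static-PTQ flow rather than my exact script.

    # Sketch: PyTorch -> ONNX export followed by static PTQ with onnxruntime.
    # TinyNet and the calibration samples are placeholders, not my real model/data.
    import numpy as np
    import torch
    import torch.nn as nn
    from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv1d(1, 16, kernel_size=3, padding=1)
            self.pool = nn.AdaptiveAvgPool1d(1)
            self.fc = nn.Linear(16, 4)
        def forward(self, x):
            x = torch.relu(self.conv(x))
            return self.fc(self.pool(x).flatten(1))

    model = TinyNet().eval()
    dummy = torch.randn(1, 1, 50)  # placeholder input shape

    # 1) Export the float model to ONNX
    torch.onnx.export(model, dummy, "model_fp32.onnx",
                      input_names=["input"], output_names=["output"],
                      opset_version=13)

    # 2) Calibration reader feeding representative samples for PTQ
    class CalibReader(CalibrationDataReader):
        def __init__(self, samples):
            self._it = iter(samples)
        def get_next(self):
            x = next(self._it, None)
            return None if x is None else {"input": x}

    calib = CalibReader([np.random.rand(1, 1, 50).astype(np.float32) for _ in range(32)])

    # 3) Static post-training quantization to int8
    quantize_static("model_fp32.onnx", "model_int8.onnx", calib,
                    weight_type=QuantType.QInt8,
                    activation_type=QuantType.QInt8)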
Would there be any possibility of getting the CubeIDE project from the cloud for the N657-DK and adapting it by substituting the files I generate locally with ST Edge AI, perhaps adjusting the memory pool addresses?
2025-05-30 12:31 AM - edited 2025-05-30 1:22 AM
Hello @Julian E., @SlothGrill,
I have an update:
I converted my relatively simple model to Keras by reconstructing the architecture and transferring the weights. I then quantized it and converted it to .tflite.
In the process, I had to make several adaptations, such as manually using Conv2D and MaxPool2D instead of the 1D versions. I also had to use TensorFlow/Keras 2.9 to avoid this error from the ST tool: TOOL ERROR: operands could not be broadcast together with shapes (x,x) (x,x). The model is now equivalent to its PyTorch/ONNX counterpart; the conversion is sketched below.
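Roughly, the conversion followed this pattern (simplified sketch on TensorFlow/Keras 2.9; the layer sizes, input shape, and random representative dataset are placeholders, not my exact network or data):

    # Sketch: 1D ops rewritten as 2D equivalents, then full-int8 TFLite conversion.
    # Layer sizes, input shape and representative data are placeholders.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 50, 1)),
        tf.keras.layers.Conv2D(16, (3, 1), padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(pool_size=(2, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])
    # model.set_weights(...)  # weights transferred from the PyTorch model

    def rep_data():
        for _ in range(100):
            yield [np.random.rand(1, 64, 50, 1).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = rep_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())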
Using CubeMX 6.14.1 and X-CUBE-AI 10.1.0, I managed to generate a CubeIDE project that compiles without issues. I flash the FSBL, application, and weights, then try validation, but I still get a timeout.
I also tried using the .tflite model miniresnet_1stacks_64x50_tl_int8 from the model zoo, but I still get a timeout. At this point, I'm starting to think the problem is that the NPU on my board isn't starting, since the board in the cloud can run these models.
I compared the two sets of generated C code (local vs. cloud-generated); with the same compilation flags they are identical, apart from the XSPI2 addresses in use, since I changed the default one in my local project.
Is there anything I can do?
I'm attaching the tflite model and the latest generated project with the .ioc inside if you want to take a look.
N.B. In the attached project I'm not using the clock and peripheral values suggested by X-CUBE-AI, because if I enable them the application doesn't start...
2025-05-30 6:30 AM - edited 2025-05-30 6:36 AM
Hello @Dresult ,
So, I took a look at your project (and reproduced your issues).
I've done a lot of fiddling around, but I guess these two steps might help to get your software working:
Have a nice weekend,
Cheers.
2025-06-02 12:45 AM
Hi @SlothGrill ,
thank you so much for your support and, above all, for taking the time to help me! :)
I followed the steps you suggested, and I can confirm that the validation now works correctly, thanks a lot!
I might have eventually figured out the stack issue on my own, but I doubt I would have ever identified the Secure Guard part without your help. I’ll definitely take a closer look at the Reference Manual soon.
Thanks again!
P.S. I noticed that when I change the stack and heap sizes from the .ioc file, the values seem to reset to the default (0x800) upon saving.
2025-06-10 6:34 AM - edited 2025-06-10 7:00 AM
Hi @SlothGrill,
sorry to reopen the topic. I noticed that during validation, although the inference time matches the one obtained in the cloud, the comparison between the original model and the C implementation shows high errors. This only happens when using the NPU; if I compile for the MCU instead, the metrics are almost perfect.
I've tried enabling and disabling the epoch controller, as well as completely disabling the RIF for the application, but it didn't help. Any ideas?
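To be concrete about what "high errors" means, the comparison boils down to something like this (a minimal sketch; the .npy files holding the reference output and the on-target output are hypothetical names for illustration, not files produced by the tool):

    # Sketch: compare the reference model output with the on-target (NPU) output.
    # "ref_output.npy" and "npu_output.npy" are hypothetical dump files.
    import numpy as np

    ref = np.load("ref_output.npy").astype(np.float32).ravel()
    npu = np.load("npu_output.npy").astype(np.float32).ravel()

    rmse = np.sqrt(np.mean((ref - npu) ** 2))
    cos = np.dot(ref, npu) / (np.linalg.norm(ref) * np.linalg.norm(npu) + 1e-12)
    print(f"RMSE: {rmse:.6f}  cosine similarity: {cos:.6f}")

With the MCU build these metrics are near-perfect; with the NPU build they are not.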