2025-05-19 6:13 AM - edited 2025-05-23 3:29 AM
Hi everyone,
I'm having trouble generating a project to run an AI model on the Nucleo-N657X0-Q board.
I open STM32CubeMX version 6.14.1, select the board, choose "Secured Application", and then when I select the X-CUBE-AI package, it asks for the ZIP file available here:
https://www.st.com/en/development-tools/stedgeai-core.html?ecmp=tt9470_gl_link_feb2019&rt=db&id=DB4255
I download it, choose to use X-CUBE-AI in the Application domain, and when I try to select the ONNX model, it asks whether I want to optimize peripherals and clock. At this point, whether I click "Yes" or "No", CubeMX freezes indefinitely.
I’ve already tried using CubeMX version 6.13 as well, but it didn’t help.
My suspicion is that the problem might be related to the version of ST Edge AI. Version 2.1 on the ST Edge AI Developer Cloud throws an error during optimization (although maybe it's unrelated). Also, when selecting the package ZIP file, the dialog only shows folders, not ZIP files. I have to manually navigate to the directory and enter the ZIP file name in the dialog box to select it. However, I can't try older versions as I can't download them from the website.
Is there any way to fix this issue?
N.B. I am working on Windows 11, but I have the same problems on an Ubuntu system. I'm attaching the resulting log file in case it helps.
2025-05-26 7:11 AM - edited 2025-05-26 7:12 AM
Hello
As stated in the other ticket you refer to (here), it seems that the firmware runs (i.e., it manages to connect through UART when you ask for a validation)...
The main difference that may cause issues between Nucleo and DK projects is that the Nucleo boards do not come with an external RAM connected on XSPI1.
Could you make sure that:
On your configuration side:
Could you do those checks and try again, and tell us if this is better?
Thanks.
2025-05-26 8:37 AM - edited 2025-05-27 1:08 AM
Hello @SlothGrill!
I checked, and there was an entry in the memory pool related to XSPI1; moreover, its use was enabled in the RAM memory tab. I removed the entry and disabled its use, but nothing changed. I also checked whether the generated code contained any line where the program tries to read/write the external memory, but I found nothing.
As for the RISAF, it is configured as in the pictures:
I'm attaching the code, the .ioc project, and the generated binaries in case they help.
Question #1: Am I supposed to flash something into the NPU RAMs? I am flashing the network weights at 0x71000000, which corresponds to the start address of the Octo-SPI flash, but from the generated report I can see that part of the NPU RAMs is also allocated, and the application binary is only 152 KB...
Question #2: As you can see from the UART output on the previous page, the input and output buffers seem to be mapped at the same address (0x342e0000). Is that normal?
Question #3: From the UART output I can see "params: n.a.". Is that normal?
2025-05-27 4:24 AM
@Dresult,
I tested your project and ended up with the same behaviour as you: the inference seems stuck in the first executed epoch. There may be a bug somewhere; would it be possible for you to share your original model with us? (So I can try it with our newer versions, which may have fixed that.)
Thanks.
Q#1: What do you mean by "from the generated report I can see that also part of the NPU RAMs are allocated"?
From what I see in your config, no weights/params should end up in the NPU RAMs, so there should be no need to initialize them (if an initializer file is generated, you can check whether it contains only zeros). If the NPU RAMs are used only for activation computations, then it should be fine without initializing them.
Q#2: Yes, this is normal. The input is placed at 0x342e0000, and there is no guarantee that the data at this address will be kept intact during inference: this address slot is used during inference to store activations, and outputs are a special kind of activation. So the input is at 0x342e0000 before inference (and is then overwritten during inference), and the output is at 0x342e0000 after inference.
Q#3: This is normal. By tweaking the code I can print "params: 1077 KiB". Printing this info requires setting the define LL_ATON_DBG_BUFFER_INFO_EXCLUDED=0 (which will make the code size grow) and fixing an issue in aiValidation_ATON.c:600 -> #if (LL_ATON_DBG_BUFFER_INFO_EXCLUDED == 0).
2025-05-27 5:27 AM
Thanks a lot @SlothGrill
I'm attaching the ZIP file containing the ONNX model.
2025-05-28 7:47 AM
Hey,
So I have other kinds of issues with newer versions of the tool. I guess I will raise a bug. I am sorry :)
No luck here...
Do you have other models to test? (e.g., nets for CV?)
Did you design this one yourself?
2025-05-28 9:03 AM
Thanks anyway for trying. This is the only model I wanted to put on the board, and yes, I built it with PyTorch and exported it to ONNX, then quantized it with post-training quantization (PTQ), roughly as sketched below.
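The sketch below is only illustrative: TinyNet, the input shape, and the random calibration samples are placeholders standing in for my real model and data, and the onnxruntime call shows the generic static-PTQ flow rather than my exact script.

    # Sketch: PyTorch -> ONNX export followed by static PTQ with onnxruntime.
    # TinyNet and the calibration samples are placeholders, not my real model/data.
    import numpy as np
    import torch
    import torch.nn as nn
    from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv1d(1, 16, kernel_size=3, padding=1)
            self.pool = nn.AdaptiveAvgPool1d(1)
            self.fc = nn.Linear(16, 4)
        def forward(self, x):
            x = torch.relu(self.conv(x))
            return self.fc(self.pool(x).flatten(1))

    model = TinyNet().eval()
    dummy = torch.randn(1, 1, 50)  # placeholder input shape

    # 1) Export the float model to ONNX
    torch.onnx.export(model, dummy, "model_fp32.onnx",
                      input_names=["input"], output_names=["output"],
                      opset_version=13)

    # 2) Calibration reader feeding representative samples for PTQ
    class CalibReader(CalibrationDataReader):
        def __init__(self, samples):
            self._it = iter(samples)
        def get_next(self):
            x = next(self._it, None)
            return None if x is None else {"input": x}

    calib = CalibReader([np.random.rand(1, 1, 50).astype(np.float32) for _ in range(32)])

    # 3) Static post-training quantization to int8
    quantize_static("model_fp32.onnx", "model_int8.onnx", calib,
                    weight_type=QuantType.QInt8,
                    activation_type=QuantType.QInt8)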
Would there be any possibility of getting the CubeIDE project from the cloud for the N657-DK and adapting it by substituting the files I generate locally with ST Edge AI, perhaps adjusting the memory pool addresses?
2025-05-30 12:31 AM - edited 2025-05-30 1:22 AM
Hello @Julian E., @SlothGrill,
I have an update:
I converted my relatively simple model to Keras by reconstructing the architecture and transferring the weights. I then quantized it and converted it to .tflite.
In the process, I had to make several adaptations, such as manually using Conv2D and MaxPool2D instead of the 1D versions. I also had to use TensorFlow/Keras 2.9 to avoid this error from the ST tool: TOOL ERROR: operands could not be broadcast together with shapes (x,x) (x,x). The model is now equivalent to its PyTorch/ONNX counterpart; the conversion is sketched below.
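Roughly, the conversion followed this pattern (simplified sketch on TensorFlow/Keras 2.9; the layer sizes, input shape, and random representative dataset are placeholders, not my exact network or data):

    # Sketch: 1D ops rewritten as 2D equivalents, then full-int8 TFLite conversion.
    # Layer sizes, input shape and representative data are placeholders.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 50, 1)),
        tf.keras.layers.Conv2D(16, (3, 1), padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(pool_size=(2, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])
    # model.set_weights(...)  # weights transferred from the PyTorch model

    def rep_data():
        for _ in range(100):
            yield [np.random.rand(1, 64, 50, 1).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = rep_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())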
Using CubeMX 6.14.1 and X-CUBE-AI 10.1.0, I managed to generate a CubeIDE project that compiles without issues. I flash the FSBL, application, and weights, then try validation, but I still get a timeout.
I also tried using the .tflite model miniresnet_1stacks_64x50_tl_int8 from the model zoo, but I still get a timeout. At this point, I'm starting to think the problem is that the NPU on my board isn't starting, since the board in the cloud can run these models.
I compared the two sets of generated C code (local vs. cloud-generated); with the same compilation flags they are identical, apart from the XSPI2 addresses in use, since I changed the default one in my local project.
Is there anything I can do?
I'm attaching the tflite model and the latest generated project with the .ioc inside if you want to take a look.
N.B. In the attached project I'm not using the clock and peripheral values suggested by X-CUBE-AI, because if I enable them the application doesn't start...
2025-05-30 6:30 AM - edited 2025-05-30 6:36 AM
Hello @Dresult ,
So, I took a look at your project (and reproduced your issues).
I've done a lot of fiddling around, but I guess these two steps might help to get your software working:
Have a nice weekend,
Cheers.
2025-06-02 12:45 AM
Hi @SlothGrill ,
thank you so much for your support and, above all, for taking the time to help me! :)
I followed the steps you suggested, and I can confirm that the validation now works correctly, thanks a lot!
I might have eventually figured out the stack issue on my own, but I doubt I would have ever identified the Secure Guard part without your help. I’ll definitely take a closer look at the Reference Manual soon.
Thanks again!
P.S. I noticed that when I change the stack and heap sizes from the .ioc file, the values seem to reset to the default (0x800) upon saving.
2025-06-10 6:34 AM - edited 2025-06-10 7:00 AM
Hi @SlothGrill,
sorry to reopen the topic. I noticed that during validation, although the inference time matches the one obtained in the cloud, the comparison between the original model and the C implementation shows high errors. This only happens when using the NPU; if I compile for the MCU instead, the metrics are almost perfect.
I've tried enabling and disabling the epoch controller, as well as completely disabling the RIF for the application, but it didn't help. Any ideas?
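To be concrete about what "high errors" means, the comparison boils down to something like this (a minimal sketch; the .npy files holding the reference output and the on-target output are hypothetical names for illustration, not files produced by the tool):

    # Sketch: compare the reference model output with the on-target (NPU) output.
    # "ref_output.npy" and "npu_output.npy" are hypothetical dump files.
    import numpy as np

    ref = np.load("ref_output.npy").astype(np.float32).ravel()
    npu = np.load("npu_output.npy").astype(np.float32).ravel()

    rmse = np.sqrt(np.mean((ref - npu) ** 2))
    cos = np.dot(ref, npu) / (np.linalg.norm(ref) * np.linalg.norm(npu) + 1e-12)
    print(f"RMSE: {rmse:.6f}  cosine similarity: {cos:.6f}")

With the MCU build these metrics are near-perfect; with the NPU build they are not.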