2025-06-17 11:59 PM
Hello,
I'm trying to run an ML model (miniresnet from the model zoo) on a NUCLEO-N627X0-Q, starting from STM32CubeMX (device initialization, model import and code generation). Everything works when I select the STM32Cube.AI MCU runtime: the model is created and runs smoothly. However, when I select the STM32Cube.AI Neural-ART runtime (with the n6-noextmem profile, since I have various problems with external flash and the model is small enough to fit in SRAM), the code is generated fine, but an extra .raw file ([modelname]_atonbuf.AXISRAM5.raw) is created.
I noticed that the compiled .bin file (in STM32CubeIDE) is rather small (~62 KB), so I am pretty confident that it does not contain the full model (~122 KB). I suspect that I somehow have to load this raw file into memory for the NPU to read.
When I run the model on the NPU, the output is constantly zero (uninitialized or zeroed weights, maybe?).
I am using the Nucleo in DEV_BOOT mode. Do I need to load the raw file into memory manually? Are there any guides or instructions on how to do it?
I am attaching the memory-pools setup and the NN analyze results for the model I use.
Thanks
2025-06-19 6:44 AM
I ran into this issue as well; the documentation on it is thin on the ground. What I did to solve it was use CubeProgrammer to place the raw file in flash at address 0x71000000. Then, in my application, I read this memory-mapped flash area and copy it into the NPU region expected by the network.c file. In my case I had weights and activations in the same raw file; if you have them split, you will have to load them separately. You also need to invalidate the D-cache where the weights/inputs/outputs are, and set up the RIF to allow the MCU to read the memory areas. If you skip that last step, you will see zeros in memory.
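The copy step described in this reply (blob already programmed into memory-mapped flash, then copied into the RAM window that network.c expects) could look roughly like the sketch below. It is not the poster's code: `load_blob` is a hypothetical helper, the bounds check is an addition, and the cache maintenance call is only indicated in a comment so the helper stays portable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a weights/activations blob from a readable
 * source (for example memory-mapped flash) into a destination RAM window.
 * Returns 0 on success, -1 if an argument is invalid or the blob does
 * not fit in the window. */
static int load_blob(void *dst, size_t dst_capacity,
                     const void *src, size_t blob_size)
{
    if (dst == NULL || src == NULL || blob_size == 0u) {
        return -1;
    }
    if (blob_size > dst_capacity) {
        return -1; /* refuse to write past the end of the window */
    }
    memcpy(dst, src, blob_size);
    /* On target, a D-cache clean over [dst, dst + blob_size) would follow
     * here so that the NPU sees what the CPU just wrote. */
    return 0;
}
```

On the board, `dst` and `src` would be the AXISRAM and flash addresses taken from your own generated network.c and programming step; the values quoted in this thread are specific to one model.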
2025-06-19 10:36 PM - edited 2025-06-19 10:55 PM
Thanks for the reply.
I tried to do something similar by selecting the extflash profile in the X-Cube-AI plugin in STM32CubeMX (instead of the noExtMem profile, which creates the NPU AXISRAM5 raw file, this one creates an xspi2 raw file). I then uploaded the model to 0x71000000 through CubeProgrammer. However, as soon as the MCU tried to access the external memory, it crashed (the debugger shows that it jumped to HardFault_Handler()). I verified the problem by uploading some known bytes (in fact a "HELLO" string) to 0x71000000 and then trying to read them through pointers to these addresses. As soon as I tried to printf the data from the external memory, it crashed. However, I did not invalidate the D-cache or modify the RIF as you did, so maybe therein lies the problem. I'll try dropping the XSPI2 clock to 50 MHz to ensure stability and try your way. I will then report the results here.
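A marker check like the "HELLO" test above can be written as a small readback comparison. This is only a sketch: `first_mismatch` is a hypothetical helper, and it can only report wrong data. If the region is not mapped at all, the first read faults before any comparison happens, which matches the HardFault described here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compare `len` bytes read from `mem` (which may be
 * a memory-mapped address) against an expected marker.
 * Returns the index of the first mismatching byte, or -1 if all match. */
static int first_mismatch(const volatile uint8_t *mem,
                          const uint8_t *expected, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (mem[i] != expected[i]) {
            return (int)i;
        }
    }
    return -1;
}
```

On target, `mem` would be something like `(const volatile uint8_t *)0x71000000` once memory-mapped mode is enabled.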
Could you please tell me how and when you invalidated the D-cache? (I assume just before you began reading the external memory?)
Thanks
Tasos
2025-06-20 12:59 AM
If you are getting a crash trying to access 0x71000000, it's because the region isn't mapped correctly. I copied the Nucleo BSP code to do this, in the FSBL right before BOOT_Application.
BSP_XSPI_NOR_Init_t NOR_Init;
NOR_Init.InterfaceMode = BSP_XSPI_NOR_OPI_MODE;
NOR_Init.TransferRate = BSP_XSPI_NOR_DTR_TRANSFER;
BSP_XSPI_NOR_Init(0, &NOR_Init);
BSP_XSPI_NOR_EnableMemoryMappedMode(0);
and in Appli (axis ram regions I took from network.c)
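// Copy weights to AXISRAM 0x342E0000-0x3433E5E0 (addresses taken from network.c)
uint32_t modelSize = 0x3433E5E0u - 0x342E0000u;
memcpy((void *)0x342E0000u, (const void *)0x71000000u, modelSize);
// Clean and invalidate so the copied data reaches SRAM before the NPU reads it
SCB_CleanInvalidateDCache_by_Addr((volatile void *)0x342E0000u, (int32_t)modelSize);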
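Cache maintenance by address works on whole cache lines, so it can help to round the range out to line boundaries before calling the CMSIS function. The sketch below assumes a 32-byte D-cache line (the usual value for Cortex-M55; check your core's documentation), and `cache_align_range` and `cache_range_t` are hypothetical names, not part of CMSIS or the ST HAL.

```c
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* assumption: 32-byte D-cache line */

typedef struct {
    uintptr_t start; /* address rounded down to a cache line */
    uint32_t size;   /* size rounded up to whole cache lines */
} cache_range_t;

/* Hypothetical helper: expand [addr, addr + size) so that it starts and
 * ends on cache-line boundaries. */
static cache_range_t cache_align_range(uintptr_t addr, uint32_t size)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    const uintptr_t end = addr + size;
    cache_range_t r;

    r.start = addr & ~mask;
    r.size = (uint32_t)(((end + mask) & ~mask) - r.start);
    return r;
}
```

For the range quoted above (0x342E0000, 0x5E5E0 bytes) both ends are already 32-byte aligned, so the helper returns it unchanged; it only matters for buffers that start or end mid-line.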
2025-06-21 1:51 AM - edited 2025-06-21 1:54 AM
Thanks again for your reply, I really appreciate it!
It looks like the STM32CubeMX GUI does not generate the full BSP code, so the NOR setup and init code is not available in the exported CubeIDE project.
I thought I'd apply “the good ol' way” of ML model encoding/storage: xxd the raw file to a C array, copy/paste it into the Appli code and then copy it to 0x342E0000 one byte at a time (a loop with an incrementing pointer into the array)! Failed. I also tried to memcpy the array to SRAM using the code you provided… again zeros everywhere (even with the cache disabled).
Right now, I’m flabbergasted by the fact that I’m not even able to… write to RAM! (E.g. a simple *((uint32_t*)0x342e0000)=0x11223344; still leaves zeros in SRAM!!!)
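A single-word write/readback probe makes this kind of check repeatable across candidate addresses. The sketch below is an illustration with a hypothetical helper name, `word_is_writable`. Note its limits: with the D-cache enabled the readback may be served from the cache rather than from SRAM, and an unclocked or protected region may fault instead of reading back zeros.

```c
#include <stdint.h>

/* Hypothetical probe: write two complementary patterns to one word, read
 * each back, then restore the original value.
 * Returns 1 if both patterns read back correctly, 0 otherwise. */
static int word_is_writable(volatile uint32_t *addr)
{
    const uint32_t pattern = 0x11223344u;
    const uint32_t saved = *addr;
    int ok;

    *addr = pattern;
    ok = (*addr == pattern);

    *addr = ~pattern;
    ok = ok && (*addr == ~pattern);

    *addr = saved; /* leave the location as we found it */
    return ok;
}
```

On target, the argument would be a cast address such as `(volatile uint32_t *)0x342E0000`; a return value of 0 there points at the memory not being enabled or accessible rather than at the copy code.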
I’m also pretty much certain that this is not a hardware issue with the board, since the code (binary) in the following post runs perfectly from flash (boot jumpers set to flash boot!). This, however, is the only example I have managed to run from flash.
Either I’m doing something extremely wrong (although following the STM32CubeMX example to the letter does not work for me: https://community.st.com/t5/stm32-mcus/how-to-create-an-stm32n6-fsbl-load-and-run/ta-p/768206 ) or the STM32CubeMX GUI produces (extremely) broken code!!!
The question now is (since I can't access the flash): what prevents writing directly to SRAM? Maybe a security feature? Does it have to be included separately in the linker script? (I tried to place the C array in a custom RAM section with __attribute__… major crash there as well!!!!!)
Tasos
2025-06-21 2:53 AM
Did you enable the ram regions?
RAMCFG_HandleTypeDef hramcfg = {0};
hramcfg.Instance = RAMCFG_SRAM3_AXI;
HAL_RAMCFG_EnableAXISRAM(&hramcfg);
hramcfg.Instance = RAMCFG_SRAM4_AXI;
HAL_RAMCFG_EnableAXISRAM(&hramcfg);
hramcfg.Instance = RAMCFG_SRAM5_AXI;
HAL_RAMCFG_EnableAXISRAM(&hramcfg);
hramcfg.Instance = RAMCFG_SRAM6_AXI;
HAL_RAMCFG_EnableAXISRAM(&hramcfg);
2025-06-21 10:08 AM
Well, it looks like it was partly a hardware issue and partly a software one.
I managed to read/write the external flash by decreasing -some- clocks [with some of them having nothing to do with flash.... but whatever!!!!]
Here I'm attaching the maximum clocks (found by trial and error) that let me run the NPU and access the external flash. Keep in mind that the high-speed OTP for flash (VDDI3 level) is enabled and the 1.8 V level is also enabled in software!
Now the issue I have is that I'm trying to fill the model input buffer with random data in order to verify that it works, but I cannot write to the buffer (for some reason everything remains zeroed).
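For this kind of input-buffer test, a seeded pattern is handy because the same seed can regenerate the expected bytes for a readback check. The sketch below uses a plain xorshift32 generator; `fill_pattern` and `count_mismatches` are hypothetical helpers, not part of X-Cube-AI, and the buffer pointer would come from your generated network code.

```c
#include <stddef.h>
#include <stdint.h>

/* One step of the 32-bit xorshift generator (state must be non-zero). */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: fill `len` bytes with a pattern derived from `seed`. */
static void fill_pattern(volatile uint8_t *buf, size_t len, uint32_t seed)
{
    uint32_t s = (seed != 0u) ? seed : 1u;
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(xorshift32(&s) & 0xFFu);
    }
}

/* Hypothetical helper: regenerate the same pattern and count the bytes
 * in `buf` that differ from it. 0 means the write "stuck". */
static size_t count_mismatches(const volatile uint8_t *buf, size_t len,
                               uint32_t seed)
{
    uint32_t s = (seed != 0u) ? seed : 1u;
    size_t bad = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != (uint8_t)(xorshift32(&s) & 0xFFu)) {
            bad++;
        }
    }
    return bad;
}
```

A non-zero mismatch count right after the fill suggests the writes are not landing in memory at all, which is a different problem from the NPU ignoring valid input.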
Any ideas, anyone? (I'll also make a new post, since this is a totally new problem!)
2025-06-23 9:22 AM
You mean this one in the GUI? No..... it would have been too easy and too logical to do so!
I may have been chasing my tail these last few days!!!! {***... this brings back memories of "sleepless debugging nights for the lost EOL!"}. I will test tomorrow and report back!
Kudos once again :)
2025-06-24 2:41 AM - edited 2025-06-24 2:42 AM
Alright... update...
Part of the problem is solved (the AXI SRAM had to be initialized and enabled).... however... for some reason it initializes the base address in a different area (this is code auto-generated by STM32CubeMX)... Reading and writing to 0x342E0200 crashes; reading and writing to 0x42023000 does not crash, but returns zero no matter what value is written (all caching is disabled, just to be on the safe side)!
2025-07-01 6:02 AM
So.... it is a problem with CubeMX code generation (broken memory init):
https://community.st.com/t5/edge-ai/ml-model-input-buffer-problem/m-p/817906#M5046