2025-11-15 11:54 PM
Good day Folks,
Firstly, is it possible to put NN weights on the external Flash, but not use it in memory mapped mode?
BSP_XSPI_NOR_Init_t NOR_Init;
NOR_Init.InterfaceMode = BSP_XSPI_NOR_OPI_MODE;     /* octal SPI interface */
NOR_Init.TransferRate  = BSP_XSPI_NOR_DTR_TRANSFER; /* double transfer rate */
BSP_XSPI_NOR_Init(0, &NOR_Init);
BSP_XSPI_NOR_EnableMemoryMappedMode(0);

Secondly, is it possible to run an AI model with NO flash access at all, i.e. to only use internal RAM for the case of a small model?
I have an application that already uses the flash for storing data such as logs & settings via an XSPI read/write (indirect) interface. However, that interface breaks once the flash is configured for memory-mapped mode.
The STM32N6570-DK image classification example runs fine and can use the external flash; however, it then appears I can't use the flash for anything other than the NN tables & weights.
I then tried updating the .mpool to exclude the flash, but the model then either doesn't work or doesn't give the same inference results.
{
    "fname": "xSPI2",
    "name": "octoFlash",
    "fformat": "FORMAT_RAW",
    "prop": {
        "rights": "ACC_READ",
        "throughput": "MID",
        "latency": "HIGH",
        "byteWidth": 1,
        "freqRatio": 6.00,
        "cacheable": "CACHEABLE_ON",
        "read_power": 110,
        "write_power": 400.0,
        "constants_preferred": "true"
    },
    "offset": { "value": "0x70680000", "magnitude": "BYTES" },
    "size": { "value": "0", "magnitude": "KBYTES" }
}

Thirdly, if flash access is REQUIRED, is it possible to read the required constants out of flash into a RAM buffer when the NN is initialised, so flash access is not required on every inference?
Thank you in advance; I am really hoping we can navigate this so the flash can be used by other subsystems and not just the NN.
Regards
2025-11-26 8:44 AM
Hello @exarian,

"Firstly, is it possible to put NN weights on the external Flash, but not use it in memory mapped mode?"

No :)
The NPU DMAs need the external flash to be memory-mapped in order to address it; they cannot handle indirect mode.
"Secondly is it possible to run an AI model with NO flash access at all, i.e. to only use internal RAM for the case of a small model?"

Yes, this is definitely possible.
To do that, you should remove the external flash from the mpool file (or set its size to 0 like you did above).
This will work, but then in real life you will need to save your weights in a non-volatile memory and move them to internal RAMs at their correct addresses before running your inference, once all the RAMs are powered on and accessible. This is not hard, but it can be time-consuming to develop something solid. A minimal sketch follows.
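Something like this, assuming the flash is read through the BSP's indirect-read API; the buffer symbol, offset, and size below are illustrative placeholders, not values taken from the generated network.c:

#include <stdint.h>
#include "stm32n6570_discovery_xspi.h"

#define NN_WEIGHTS_FLASH_OFFSET  0x00680000u      /* offset inside the NOR, example only */
#define NN_WEIGHTS_SIZE          (320u * 1024u)   /* size of the weight blob, example only */

/* Buffer the linker places at the internal-RAM address the mpool assigns to the weights. */
extern uint8_t nn_weights_ram[NN_WEIGHTS_SIZE];

int32_t nn_load_weights(void)
{
    /* Indirect read at init time: no memory-mapped mode needed, so the rest of
       the application keeps its normal read/write access to the flash. */
    return BSP_XSPI_NOR_Read(0, nn_weights_ram, NN_WEIGHTS_FLASH_OFFSET, NN_WEIGHTS_SIZE);
}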
This use-case is easier to handle with the provided solution documented here (Relocatable Mode). Maybe you should take a look at it!
"Thirdly, if flash access is REQUIRED is it possible to read the required constants out of flash into a RAM buffer when the NN is initialised, so Flash access is not required on every inference?"

Flash access is not REQUIRED if your model weights/activations (and whatever else is needed) can fit in RAM.
Flash is required if you need to store things between resets => using the flash to store weights directly is easier (but not necessarily faster) than moving them around at inference time.
Again, take a look at the Relocatable Mode page above, as it may make things simpler for you, or dig more into installing the network weights in internal memories before running the inference manually (you will still need to store the weights, e.g. in flash, and copy them into internal memories before the inference); this should solve your issues.
Let us know how it goes!
Best regards.
2025-11-27 7:31 PM
Thank you @SlothGrill for your support, it is greatly appreciated!
Prior to posting I tried setting the flash size to zero, but then got a different result. I think what was left was placing the actual weights at their assigned addresses in RAM. It shouldn't be too hard to test this out. I had assumed the generated network.c would already include these as constants or an array.
I was not aware of the relocatable model; that is quite a neat trick. It is exactly what I was looking for: being able to update the model OTA by replacing it in flash, without updating the whole running application as well.
In the meantime I worked around having to leave the weights in flash by creating a flash-access mutex that is shared between subsystems. When the NN takes the mutex, it enables the memory map, does its work, then disables the memory map; the other subsystems do their indirect reads/writes as before when they hold the mutex (sketched below). It is not as elegant as I hoped, especially given that each inference reads from the flash when it could just read from RAM.
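For reference, here is roughly what that arbitration looks like, assuming FreeRTOS; everything except the BSP calls is a placeholder name:

#include "FreeRTOS.h"
#include "semphr.h"
#include "stm32n6570_discovery_xspi.h"

static SemaphoreHandle_t flash_mutex;

void flash_arbiter_init(void)
{
    flash_mutex = xSemaphoreCreateMutex();
}

void nn_run_inference(void)
{
    xSemaphoreTake(flash_mutex, portMAX_DELAY);
    BSP_XSPI_NOR_EnableMemoryMappedMode(0);   /* NPU reads the weights via the memory map */
    /* ... run the inference here ... */
    BSP_XSPI_NOR_DisableMemoryMappedMode(0);  /* back to indirect mode for everyone else */
    xSemaphoreGive(flash_mutex);
}

void settings_save(uint8_t *data, uint32_t addr, uint32_t size)
{
    xSemaphoreTake(flash_mutex, portMAX_DELAY);
    BSP_XSPI_NOR_Write(0, data, addr, size);  /* indirect write, as before */
    xSemaphoreGive(flash_mutex);
}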
The RAM migration is an update we'll have to do if we need it to perform better; however, given the total inference takes 2 ms using the flash, we are orders of magnitude better off already.
For updates, I will definitely take a look at the relocatable model. Thank you again.