2025-09-24 4:47 AM
Hello ST Community!
While working with the STM32N6570-DK I have run several experiments with the kit's tools, and I have run into some doubts regarding the overall use of the API and the models in the Model Zoo. I have the following questions:
Question 1.
The input/output of my YOLO model is uint8_t/float, confirmed by checking the value of the Buffer_DataType_TypeDef element in the network instance; both are supposed to be stored in single-dimension arrays, as suggested by the API.
The output of the model as per the Model Zoo is (1, 4+K, F), with F being proportional to the input image size; in the case of a 256x256x3 image, its value is 1344.
The value of K can be derived from the maximum output size in bytes found in network.h, making the output (1, 5, 1344). In a single array, treating output[] as a float buffer over the output pointer, with values normalized to [0, 1], this would mean:
output[0] = Center X
output[1] = Center Y
output[2] = Width
output[3] = Height
output[4] = Score
output[5] = Center X2
output[n] = ...
I would like to know if my interpretation of these results is accurate to the model's behavior, since I have not been able to get coherent results so far, whether reinterpreting the output buffer as float or as unsigned int.
Question 2.
During my research I have found applications that set up the input/output buffers either by getting the address of the start of the buffer:
const LL_Buffer_InfoTypeDef * ibuffersInfos = LL_ATON_Input_Buffers_Info(&NN_Instance_Default);
input_ptr = (uint8_t *)LL_Buffer_addr_start(&ibuffersInfos[0]);
Or by using LL_ATON_Set_User_Input_Buffer() with a predefined array as a parameter, which is also explained in the ll_aton API doc:
uint8_t input_buff[256 * 256 * 3];
res = LL_ATON_Set_User_Input_Buffer(&NN_Instance_Default, 0, input_buff, 256 * 256 * 3);
So out of these two approaches, what exactly is the correct way to use the input/output buffers? Should I rely on LL_ATON_Set_User_Input_Buffer(), or simply use memcpy() to write/read at the buffer addresses directly?
Links:
Model Zoo
https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/yolo11n
ST Neural Art API
https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_api_and_stack.html#user-allocated-inputsoutputs
2025-09-28 10:23 AM
Hi VitorWagner
This post has been escalated to the ST Online Support Team for additional assistance. We'll contact you directly.
Regards
Joe
STMicro Support
2025-10-13 8:00 AM - edited 2025-10-13 8:02 AM
After talking to the ST AI Team, I have found and confirmed the answers to the questions above:
Question 1.
As assumed in the original question, the type of your input/output buffer can be checked by looking at the value of the Buffer_DataType_TypeDef element of an LL_Buffer_InfoTypeDef, which follows the definition below:
typedef enum
{
  DataType_UNDEFINED = 0,
  DataType_FLOAT = 1,
  DataType_UINT2 = 2,
  DataType_INT2 = 3,
  DataType_UINT4 = 4,
  DataType_INT4 = 5,
  DataType_UINT8 = 6,
  DataType_INT8 = 7,
  DataType_UINT16 = 8,
  DataType_INT16 = 9,
  DataType_INT32 = 10,
  DataType_INT64 = 11,
  DataType_STRING = 12,
  DataType_BOOL = 13,
  DataType_FLOAT16 = 14,
  DataType_DOUBLE = 15,
  DataType_UINT32 = 16,
  DataType_UINT64 = 17,
  DataType_COMPLEX64 = 18,
  DataType_COMPLEX128 = 19,
  DataType_BFLOAT16 = 20,
  DataType_FXP = 100 // AtoNN specific
} Buffer_DataType_TypeDef;
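For example, a minimal check could look like the sketch below. Note that the member name type is an assumption on my part; verify the actual field name of LL_Buffer_InfoTypeDef in your generated ll_aton headers:

const LL_Buffer_InfoTypeDef *obuf = LL_ATON_Output_Buffers_Info(&NN_Instance_Default);

/* 'type' is an assumed member name; check your ll_aton headers */
if (obuf[0].type == DataType_FLOAT)
{
  /* reinterpret the raw output bytes as 4-byte floats */
  float *output = (float *)LL_Buffer_addr_start(&obuf[0]);
  /* ... */
}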
Now, for the output of a YOLO model, follow the information in the GitHub documentation:
For a model with an N x M input layer, the output length F is the total number of candidates across the stride-8, stride-16, and stride-32 feature maps. In this case, N = M = 256, so:
F = (256/8)^2 + (256/16)^2 + (256/32)^2 = 1024 + 256 + 64 = 1344
This means that the output shape of the model will be (1, 5, 1344). You can confirm that your calculation is correct by checking the LL_ATON_DEFAULT_OUT_1_SIZE_BYTES define in network.h. For instance, my model's output values are in FLOAT format, i.e. 4 bytes each, so my output size is 5 * 4 * 1344 = 26880 bytes; if the define in network.h matches your resulting value, your output shape is probably correct.
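As a quick sanity check, a compile-time assertion along these lines catches a shape mismatch early (a sketch; yolo_num_candidates is a hypothetical helper name, not part of the API):

#include <stdint.h>

/* Hypothetical helper: candidate count for a square n x n input */
static inline uint32_t yolo_num_candidates(uint32_t n)
{
  return (n / 8) * (n / 8) + (n / 16) * (n / 16) + (n / 32) * (n / 32);
}

/* yolo_num_candidates(256) == 1344, so a (1, 5, 1344) float output
   should occupy 5 * 1344 * sizeof(float) = 26880 bytes */
_Static_assert(LL_ATON_DEFAULT_OUT_1_SIZE_BYTES == 5 * 1344 * sizeof(float),
               "network.h output size does not match the expected (1, 5, 1344) float shape");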
The values in the output follow the original YOLO layout, as assumed in the original question:
output[0] = Center X
output[1] = Center Y
output[2] = Width
output[3] = Height
output[4] = Score
output[5] = Center X2
output[n] = ...
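To make that layout concrete, here is a minimal decode sketch under the assumptions above (interleaved [cx, cy, w, h, score] records, float values normalized to [0, 1]; output, img_w, img_h and conf_threshold are placeholder names for illustration):

#define NUM_CANDIDATES 1344
#define DET_FIELDS     5

/* output: float pointer to the network output buffer;
   img_w/img_h: input image dimensions used to de-normalize coordinates */
for (int i = 0; i < NUM_CANDIDATES; i++)
{
  const float *det = &output[i * DET_FIELDS];
  if (det[4] < conf_threshold) /* e.g. 0.5f */
    continue;
  float cx = det[0] * img_w;
  float cy = det[1] * img_h;
  float w  = det[2] * img_w;
  float h  = det[3] * img_h;
  /* convert center/size to a top-left corner box */
  float x0 = cx - w / 2.0f;
  float y0 = cy - h / 2.0f;
  /* ... store the box, then run NMS over the kept candidates ... */
}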
Question 2.
By default, the ST Edge AI Core will allocate the input/output buffers for you, but if you want to allocate them yourself, you must use the --no-inputs-allocation or --no-outputs-allocation flags when generating the network.
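For reference, the generate invocation would then look something along these lines (the model file name is a placeholder; only the two allocation flags are the point here, so adapt the rest to your setup):

stedgeai generate --model yolo11n_256.tflite --target stm32n6 --no-inputs-allocation --no-outputs-allocation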
My original question mistakenly interpreted LL_ATON_Set_User_Input_Buffer() and its get-output equivalent as a way to copy my input/output data to/from the buffers, but it turns out that if you don't pass those flags when generating your project, you end up with versions of these functions that just return LL_ATON_User_IO_WRONG_INDEX or a NULL pointer:
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Input_Buffer_Default(uint32_t num, void *buffer, uint32_t size)
{
  return LL_ATON_User_IO_WRONG_INDEX;
}

void *LL_ATON_Get_User_Input_Buffer_Default(uint32_t num)
{
  return NULL;
}
So yes, if you did not include those flags, the easiest way to use the NPU is to memcpy() to/from the input/output buffers, or to write/read the data at the reported addresses directly, as long as the inference does not run at the same time as you are writing data from, for example, the camera module on the STM32N6570-DK. I did not test such concurrent cases, but you would probably end up with your application crashing or with corrupted data.
/* query the input buffer descriptors from the network instance */
const LL_Buffer_InfoTypeDef *ibuffersInfos = nn_instance->network->input_buffers_info();
/* resolve the start address of the first input buffer */
uint8_t *buffer_in = (uint8_t *)LL_Buffer_addr_start(&ibuffersInfos[0]);
/* copy the preprocessed input into the NPU input buffer */
memcpy(buffer_in, pointer_to_input, LL_ATON_DEFAULT_IN_1_SIZE_BYTES);
Additionally, if you are using the cache (CACHEAXI) for the NPU, you must use the cache maintenance functions specified in the original documentation:
LL_ATON_Cache_NPU_Invalidate(); /* if NPU cache is used */
LL_ATON_Cache_MCU_Clean_Invalidate_Range(input_start_address, input_end_address - input_start_address);
LL_ATON_Cache_MCU_Invalidate_Range(output_start_address, output_end_address - output_start_address);
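Putting it together, one inference pass could look like the sketch below. This assumes tool-allocated I/O buffers and the blocking LL_ATON_RT_Main() helper used in ST's getting-started examples; my_input_image is a placeholder for your preprocessed frame, so verify the runtime calls against your generated project:

#include <string.h>
#include <stdint.h>

const LL_Buffer_InfoTypeDef *ibuf = LL_ATON_Input_Buffers_Info(&NN_Instance_Default);
const LL_Buffer_InfoTypeDef *obuf = LL_ATON_Output_Buffers_Info(&NN_Instance_Default);
uint8_t *in_start  = (uint8_t *)LL_Buffer_addr_start(&ibuf[0]);
uint8_t *out_start = (uint8_t *)LL_Buffer_addr_start(&obuf[0]);

/* write the input, then flush it from the MCU cache so the NPU sees it */
memcpy(in_start, my_input_image, LL_ATON_DEFAULT_IN_1_SIZE_BYTES);
LL_ATON_Cache_MCU_Clean_Invalidate_Range((uintptr_t)in_start, LL_ATON_DEFAULT_IN_1_SIZE_BYTES);
LL_ATON_Cache_NPU_Invalidate(); /* if the NPU cache is used */

LL_ATON_RT_Main(&NN_Instance_Default); /* run one inference to completion */

/* drop stale MCU cache lines before reading the result */
LL_ATON_Cache_MCU_Invalidate_Range((uintptr_t)out_start, LL_ATON_DEFAULT_OUT_1_SIZE_BYTES);
float *output = (float *)(void *)out_start;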
And if you are using the NPU in an isolated environment, such as a project separated into Application and FSBL contexts, you must configure the RIF to make sure the involved peripherals have the privilege to access the data they need, or else your inference won't read/write anything at all. From the default generated code for SystemIsolation_Config():
/* set all required IPs as secure privileged */
__HAL_RCC_RIFSC_CLK_ENABLE();
RIMC_MasterConfig_t RIMC_master = {0};
RIMC_master.MasterCID = RIF_CID_1;
RIMC_master.SecPriv = RIF_ATTRIBUTE_SEC | RIF_ATTRIBUTE_PRIV;
/*RIMC configuration*/
HAL_RIF_RIMC_ConfigMasterAttributes(RIF_MASTER_INDEX_NPU, &RIMC_master);
/*RISUP configuration*/
HAL_RIF_RISC_SetSlaveSecureAttributes(RIF_RISC_PERIPH_INDEX_NPU , RIF_ATTRIBUTE_SEC | RIF_ATTRIBUTE_PRIV);
I hope this helps anyone who runs into similar issues. As always, my thanks to the ST Community and Team for the support I found both in this thread and others. Have a good day! :D