2026-02-03 9:46 AM
Hello,
I’m working on a project using the STM32N6 NPU, and I’m consistently encountering a HardFault during model inference.
To set up my project, I followed this tutorial:
https://community.st.com/t5/stm32-mcus/how-to-build-an-ai-application-from-scratch-on-the-nucleo-n657x0/ta-p/828502
I also used a TFLite model from the Model Zoo on GitHub (I tested several models from the zoo as well as my own custom models).
Using the debugger, I was able to pinpoint that the crash occurs on the following line in network.c:
LL_Arithacc_Init(2, &Conv2D_7_mul_scale_4_init3);
More specifically, the fault happens inside ll_aton.c at:
uint32_t t; uint32_t A = Ap != NULL ? LL_ATON_getbits(Ap, bitcnt_A, nbits_A) : (uint32_t)conf->A_scalar;
The convolution configuration generated in network.c looks like this:
static const LL_Arithacc_InitTypeDef Conv2D_7_mul_scale_4_init3 = {
.rounding_x = 0,
.saturation_x = 0,
.round_mode_x = 0,
.inbytes_x = 2,
.outbytes_x = 2,
.shift_x = 0,
.rounding_y = 0,
.saturation_y = 0,
.round_mode_y = 0,
.inbytes_y = 2,
.outbytes_y = 2,
.combinebc = 0,
.clipout = 0,
.shift_y = 0,
.rounding_o = 1,
.saturation_o = 1,
.round_mode_o = 1,
.relu_mode_o = 0,
.outbytes_o = 2,
.shift_o = 16,
.scalar = 0,
.dualinput = 0,
.operation = ARITH_AFFINE,
.bcast = ARITH_BCAST_CHAN,
.Ax_shift = 0,
.By_shift = 0,
.C_shift = 0,
.fWidth = 224,
.fHeight = 224,
.fChannels = 16,
.batchDepth = 8,
.clipmin = 0,
.clipmax = 0,
.A_scalar = 1,
.B_scalar = 0,
.C_scalar = 0,
.A_vector = {((unsigned char *)(ATON_LIB_PHYSICAL_TO_VIRTUAL_ADDR(0x71000000UL + 11064256)))},
.B_vector = {0},
.C_vector = {0},
.vec_precision = {16, 16, 16},
};

Initially, I flashed the FSBL and the application using the programmer. Later, I noticed that I could load the application directly into RAM using the debugger. I also enabled RIF for the application in CubeMX, but I still get the same HardFault when flashing the application normally.
My questions are:
2026-02-20 2:14 AM - edited 2026-02-20 5:54 AM
Hi @Hugi_,
I would say it is most likely due to incorrect CubeMX generation. I don't think it comes from your last question point.
The tutorial you linked was meant to help users work with AI and CubeMX on the N6, since no getting-started guide or example contains an .ioc file.
X-Cube-AI is no longer being updated; it was replaced by STM32Cube AI Studio, a desktop app:
https://community.st.com/t5/developer-news/introducing-stm32cubeai-studio/ba-p/876445
Everything related to the N6 and X-Cube-AI was tricky and not really used internally (which explains why you can't find an .ioc). The project generated by STM32Cube AI Studio should now follow a more standard generation procedure (it uses CubeMX like any other STM32).
We plan to document how this project (the output of STM32Cube AI Studio) is generated, which will replace the tutorial you linked.
The tutorial itself is correct, but very difficult to follow: a single small mistake can result in a generated project that does not work.
Have a good day,
Julian