Hello,
I’m working on a project using the STM32N6 NPU, and I’m consistently encountering a HardFault during model inference.
To set up my project, I followed this tutorial:
https://community.st.com/t5/stm32-mcus/how-to-build-an-ai-application-from-scratch-on-the-nucleo-n657x0/ta-p/828502
I also used a TFLite model from the Model Zoo on GitHub (I tested several models from the zoo as well as my own custom models, all with the same result).
Using the debugger, I was able to pinpoint that the crash occurs on the following line in network.c:
LL_Arithacc_Init(2, &Conv2D_7_mul_scale_4_init3);
More specifically, the fault happens inside ll_aton.c at:
uint32_t t;
uint32_t A = Ap != NULL ? LL_ATON_getbits(Ap, bitcnt_A, nbits_A) : (uint32_t)conf->A_scalar;
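For completeness, this is the kind of minimal HardFault handler that can be dropped in to capture the fault status registers (a sketch, assuming the CMSIS device header for the STM32N6 and a retargeted printf; everything except the SCB register names is specific to my setup):

/* Minimal HardFault handler sketch: dumps the Cortex-M fault status
   registers so the faulting access can be located (BFAR holds the
   offending address when the BFARVALID bit of CFSR is set). */
#include <stdio.h>
#include "stm32n6xx.h" /* CMSIS device header; adjust to the project */

void HardFault_Handler(void)
{
  printf("CFSR  = 0x%08lX\n", (unsigned long)SCB->CFSR);  /* Configurable Fault Status */
  printf("HFSR  = 0x%08lX\n", (unsigned long)SCB->HFSR);  /* HardFault Status */
  printf("BFAR  = 0x%08lX\n", (unsigned long)SCB->BFAR);  /* BusFault Address (if valid) */
  printf("MMFAR = 0x%08lX\n", (unsigned long)SCB->MMFAR); /* MemManage Address (if valid) */
  for (;;) { } /* spin so the debugger can inspect the state */
}

With this in place, BFAR should show whether the faulting read lands in the external-flash window where the weights live.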
The configuration generated in network.c for this layer looks like this:
static const LL_Arithacc_InitTypeDef Conv2D_7_mul_scale_4_init3 = {
  .rounding_x = 0,
  .saturation_x = 0,
  .round_mode_x = 0,
  .inbytes_x = 2,
  .outbytes_x = 2,
  .shift_x = 0,
  .rounding_y = 0,
  .saturation_y = 0,
  .round_mode_y = 0,
  .inbytes_y = 2,
  .outbytes_y = 2,
  .combinebc = 0,
  .clipout = 0,
  .shift_y = 0,
  .rounding_o = 1,
  .saturation_o = 1,
  .round_mode_o = 1,
  .relu_mode_o = 0,
  .outbytes_o = 2,
  .shift_o = 16,
  .scalar = 0,
  .dualinput = 0,
  .operation = ARITH_AFFINE,
  .bcast = ARITH_BCAST_CHAN,
  .Ax_shift = 0,
  .By_shift = 0,
  .C_shift = 0,
  .fWidth = 224,
  .fHeight = 224,
  .fChannels = 16,
  .batchDepth = 8,
  .clipmin = 0,
  .clipmax = 0,
  .A_scalar = 1,
  .B_scalar = 0,
  .C_scalar = 0,
  .A_vector = {((unsigned char *)(ATON_LIB_PHYSICAL_TO_VIRTUAL_ADDR(0x71000000UL + 11064256)))},
  .B_vector = {0},
  .C_vector = {0},
  .vec_precision = {16, 16, 16},
};

Initially, I flashed the FSBL and the application using the programmer. Later, I noticed that I could load the application directly into RAM using the debugger. I also enabled RIF for the application in CubeMX, and I still get the same HardFault when flashing the application normally.
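One thing I notice: the operation is ARITH_AFFINE with ARITH_BCAST_CHAN, so the faulting line presumably reads the per-channel scale values through A_vector, which points into the external-flash window (0x71000000UL + 11064256, translated by ATON_LIB_PHYSICAL_TO_VIRTUAL_ADDR). A quick check like the sketch below, run before the first inference, would fault in a controlled place if that region is not actually mapped or programmed (the helper name is mine; the macro and offset are copied from the generated network.c):

/* Sanity-check sketch: read the first byte at the address A_vector points
   to before running inference. If the external memory is not memory-mapped,
   or the weights were never programmed at this offset, the fault occurs
   here instead of deep inside ll_aton.c. The header name is an assumption. */
#include <stdint.h>
#include <stdio.h>
#include "ll_aton.h" /* assumed to provide ATON_LIB_PHYSICAL_TO_VIRTUAL_ADDR */

static void check_weights_readable(void)
{
  volatile const uint8_t *w = (volatile const uint8_t *)
      ATON_LIB_PHYSICAL_TO_VIRTUAL_ADDR(0x71000000UL + 11064256);
  printf("first weight byte at %p: 0x%02X\n", (const void *)w, w[0]);
}

If this read already HardFaults when the application is loaded into RAM, that would point at the external-flash mapping (or the RIF configuration) rather than at the model itself.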
My questions are: