HardFault during NPU Inference on STM32N6: ATON Vector Access Failure

Hugi_
Associate

Hello,

I’m working on a project using the STM32N6 NPU, and I’m consistently encountering a HardFault during model inference.

To set up my project, I followed this tutorial:
https://community.st.com/t5/stm32-mcus/how-to-build-an-ai-application-from-scratch-on-the-nucleo-n657x0/ta-p/828502

I also used a TFLite model from the Model Zoo on GitHub (I tested several models from the zoo as well as my own custom models).

Using the debugger, I was able to pinpoint that the crash occurs on the following line in network.c:

LL_Arithacc_Init(2, &Conv2D_7_mul_scale_4_init3);

More specifically, the fault happens inside ll_aton.c at:

uint32_t t;
uint32_t A = Ap != NULL ? LL_ATON_getbits(Ap, bitcnt_A, nbits_A) : (uint32_t)conf->A_scalar;
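In case it helps with debugging, here is a minimal fault-register dump one could hook in to see which address the access faulted on. This is only a sketch: the `STM32_TARGET` guard and the helper name are mine; the register addresses are the standard Cortex-M SCB ones (CFSR at 0xE000ED28, BFAR at 0xE000ED38), and BFAR is only meaningful when BFARVALID (CFSR bit 15) is set.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode the BusFault byte of CFSR (bits 8..15) into a short label. */
static const char *busfault_cause(uint32_t cfsr)
{
    uint32_t bfsr = (cfsr >> 8) & 0xFFu;
    if (bfsr & (1u << 0)) return "IBUSERR (instruction fetch)";
    if (bfsr & (1u << 1)) return "PRECISERR (precise data access)";
    if (bfsr & (1u << 2)) return "IMPRECISERR (imprecise data access)";
    return "no BusFault flagged";
}

#ifdef STM32_TARGET /* only meaningful on the MCU itself */
void HardFault_Handler(void)
{
    uint32_t cfsr = *(volatile uint32_t *)0xE000ED28u; /* Configurable Fault Status */
    uint32_t bfar = *(volatile uint32_t *)0xE000ED38u; /* BusFault Address (if BFARVALID) */
    printf("CFSR=0x%08lX (%s) BFAR=0x%08lX\r\n",
           (unsigned long)cfsr, busfault_cause(cfsr), (unsigned long)bfar);
    for (;;) { /* halt here so the debugger can inspect the fault state */ }
}
#endif
```

With a PRECISERR, BFAR should hold the exact address `LL_ATON_getbits` was reading when the fault fired, which narrows down whether it is the A_vector pointer.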

The convolution configuration generated in network.c looks like this:

static const LL_Arithacc_InitTypeDef Conv2D_7_mul_scale_4_init3 = {
    .rounding_x = 0,
    .saturation_x = 0,
    .round_mode_x = 0,
    .inbytes_x = 2,
    .outbytes_x = 2,
    .shift_x = 0,
    .rounding_y = 0,
    .saturation_y = 0,
    .round_mode_y = 0,
    .inbytes_y = 2,
    .outbytes_y = 2,
    .combinebc = 0,
    .clipout = 0,
    .shift_y = 0,
    .rounding_o = 1,
    .saturation_o = 1,
    .round_mode_o = 1,
    .relu_mode_o = 0,
    .outbytes_o = 2,
    .shift_o = 16,
    .scalar = 0,
    .dualinput = 0,
    .operation = ARITH_AFFINE,
    .bcast = ARITH_BCAST_CHAN,
    .Ax_shift = 0,
    .By_shift = 0,
    .C_shift = 0,
    .fWidth = 224,
    .fHeight = 224,
    .fChannels = 16,
    .batchDepth = 8,
    .clipmin = 0,
    .clipmax = 0,
    .A_scalar = 1,
    .B_scalar = 0,
    .C_scalar = 0,
    .A_vector = {((unsigned char *)(ATON_LIB_PHYSICAL_TO_VIRTUAL_ADDR(0x71000000UL + 11064256)))},
    .B_vector = {0},
    .C_vector = {0},
    .vec_precision = {16, 16, 16},
};
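The A_vector entry points into the memory-mapped external flash window at 0x71000000 plus an offset. As a quick sanity check (a sketch only: the window size and the macro names below are my assumptions, not from the generated code), one can verify the offset stays inside the mapped window before the NPU dereferences it:

```c
#include <stdint.h>

/* Assumed memory map: weights blob mapped at the XSPI window base.
   Adjust WEIGHTS_SIZE to the actual size of your external flash part. */
#define WEIGHTS_BASE 0x71000000UL
#define WEIGHTS_SIZE (64UL * 1024UL * 1024UL) /* assumption: 64 MB part */

/* Return 1 if base + offset still falls inside [base, base + size). */
static int offset_in_window(uint32_t offset, uint32_t size)
{
    return offset < size;
}
```

For this model, `offset_in_window(11064256, WEIGHTS_SIZE)` passes (about 10.6 MB into a 64 MB window), so if the check holds on your part the fault is more likely an access-permission or mapping problem than an out-of-range address.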

Initially, I flashed the FSBL and the application using the programmer. Later, I noticed that I could load the application directly into RAM using the debugger. I also enabled RIF for the application in CubeMX, and I still get the same HardFault when flashing the application normally.

My questions are:

  • Could this issue be caused by loading the application into RAM instead of flashing it?
  • Is there a missing RIF configuration step in the tutorial that would allow the CPU to access this memory region?
  • Or does this indicate a model generation issue (e.g., invalid ATON address, misaligned vector, unsupported layer configuration, etc.)?
1 REPLY
Julian E.
ST Employee

Hi @Hugi_,

 

I would say it is most likely due to an incorrect CubeMX project generation. I don't think it comes from the model generation issues you list in your last bullet point.

 

The tutorial you linked was written to help users work with AI and CubeMX on the N6, since none of the getting-started guides or examples contain an .ioc file.

 

X-CUBE-AI is no longer being updated; it has been replaced by STM32CubeAI Studio, a desktop application:

https://community.st.com/t5/developer-news/introducing-stm32cubeai-studio/ba-p/876445 

 

Everything related to the N6 and X-CUBE-AI was tricky and not really used internally (which explains why you can't find an .ioc). The project generated by STM32CubeAI Studio, however, follows a more standard generation procedure (it uses CubeMX like any other STM32).

 

We plan to document how the project generated by STM32CubeAI Studio is produced, which will replace the tutorial you linked.

That tutorial is correct, but very difficult to follow: some nondeterminism, or at least a single small mistake, can make it produce a project that does not work.

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.