
STM32N6 NPU inference stuck at LL_ATON_RT_RunEpochBlock (no IRQ fired)

seokjs
Associate II

Hello,

I am currently using the STM32N657-DK with X-CUBE-AI (ST Edge AI Core v2.2.0).
I have converted a TensorFlow Lite model to run on the NPU, and I can build and flash the firmware successfully.

seokjs_0-1758778760868.png

However, when I call:

 ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_Default);

the function never returns.

What I have already configured

  1. Clock and reset

    • Enabled the NPU clock and released the reset:

       
      __HAL_RCC_NPU_CLK_ENABLE();
      __HAL_RCC_NPU_FORCE_RESET();
      __HAL_RCC_NPU_RELEASE_RESET();
    • The sleep-mode clock is also enabled in set_clk_sleep_mode().

  2. Interrupt routing

    • In the Secure project:

       
      NVIC_DisableIRQ(NPU3_IRQn);
      NVIC_ClearPendingIRQ(NPU3_IRQn);
      NVIC_SetTargetState(NPU3_IRQn); // route to NonSecure
    • In the NonSecure project:

       
      HAL_NVIC_SetPriority(NPU3_IRQn, 0, 0);
      HAL_NVIC_EnableIRQ(NPU3_IRQn);

      void NPU3_IRQHandler(void)
      {
        printf(">> NPU IRQ triggered\r\n");
        ATON_STD_IRQHandler();
      }
  3. RIF / RISAF configuration

    • Configured the NPU master/slave attributes as NonSecure + privileged.

    • Added a RISAF configuration so that NonSecure code can access NPU RAM3–RAM6 (0x3420_0000–0x343C_0000):

       
      RISAF_ConfigRegion(3, 0x34200000, 0x70000, RISAF_ATTR_NONSECURE | RISAF_ATTR_PRIV);
      RISAF_ConfigRegion(4, 0x34270000, 0x70000, RISAF_ATTR_NONSECURE | RISAF_ATTR_PRIV);
      RISAF_ConfigRegion(5, 0x342E0000, 0x70000, RISAF_ATTR_NONSECURE | RISAF_ATTR_PRIV);
      RISAF_ConfigRegion(6, 0x34350000, 0x70000, RISAF_ATTR_NONSECURE | RISAF_ATTR_PRIV);
  4. Activation buffer

    • Declared in the .noncacheable section with 32-byte alignment.


Problem

Despite all of the configuration above:

  • LL_ATON_RT_RunEpochBlock() never completes.

  • ret never reaches LL_ATON_RT_DONE.

  • The NPU IRQ (NPU3_IRQn) does not appear to be triggered.


Questions

  1. Is any additional RISAF or RIF configuration required to allow the NPU to generate interrupts in a NonSecure environment?

  2. To enable Epoch Controller interrupts, do I need to explicitly configure the ATON_INTCTRL registers (e.g. ATON_INTCTRL_CTRL_SET_EN, ATON_INTCTRL_INTORMSK0_SET), or should X-CUBE-AI handle this automatically?

  3. Could this issue be related to the memory attributes of the NPU region (cacheable vs. non-cacheable)? If so, what configuration is recommended?

  4. Are there any known issues with NPU interrupt routing (NPU3_IRQn) in Secure/NonSecure TrustZone projects on STM32N6?


Thank you very much for your support.
Thanks,
[seokjs]

 

seokjs
Associate II

@PedroDeOliveira 

Please help me......

Imen.D
ST Employee

Hello @seokjs ,

Please try to write in English because most of the people on this community can speak English but not Korean.
Please follow the posting tips in this article: How to write your question to maximize your chances to find a solution, which explains how to properly post and insert source code.

When your question is answered, please close this topic by clicking "Accept as Solution".
Thanks
Imen
seokjs
Associate II

Hello,

I am currently using the STM32N657-DK board with X-CUBE-AI (ST Edge AI Core v2.2.0).
I have successfully converted a TensorFlow Lite model to run on the NPU and can build and flash the firmware without issues.

seokjs_0-1759100975297.png

 

Problem
When I call the function

ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_Default);

it never returns.
The variable ret never reaches LL_ATON_RT_DONE, and it seems that the NPU interrupt (NPU3_IRQn) is not triggered.
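
For reference, the surrounding inference loop looks roughly like the sketch below. Apart from LL_ATON_RT_RunEpochBlock() and LL_ATON_RT_DONE, the other runtime calls (runtime init, network init, WFE) are written from my reading of the generated application template, so the exact names may differ in your version of the runtime:

     
    /* Simplified sketch of the epoch-block loop (not my exact code).
       The init/WFE helper names below are assumptions taken from the
       generated application template. */
    LL_ATON_RT_RetValues_t ret;

    LL_ATON_RT_RuntimeInit();                      /* init the Neural-ART runtime */
    LL_ATON_RT_Init_Network(&NN_Instance_Default); /* init this network instance  */

    do {
      ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_Default);
      if (ret == LL_ATON_RT_WFE) {
        LL_ATON_OSAL_WFE();                        /* wait for the NPU interrupt  */
      }
    } while (ret != LL_ATON_RT_DONE);              /* this condition is never met */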

Current configuration

Clock and Reset

  • Enabled the NPU clock and released the reset:

     
    __HAL_RCC_NPU_CLK_ENABLE();
    __HAL_RCC_NPU_FORCE_RESET();
    __HAL_RCC_NPU_RELEASE_RESET();
     
  • In set_clk_sleep_mode(), the NPU sleep-mode clock is also enabled.

Interrupt routing

  • In the Secure project:

     

     
    NVIC_DisableIRQ(NPU3_IRQn);
    NVIC_ClearPendingIRQ(NPU3_IRQn);
    NVIC_SetTargetState(NPU3_IRQn); // route to NonSecure
     
  • In the NonSecure project:

     

     
     
    HAL_NVIC_SetPriority(NPU3_IRQn, 0, 0);
    HAL_NVIC_EnableIRQ(NPU3_IRQn);
     
    void NPU3_IRQHandler(void)
    {
      printf(">> NPU IRQ triggered\r\n");
      ATON_STD_IRQHandler();
    }
     

RIF / RISAF configuration

  • Configured NPU master/slave attributes to NonSecure + privileged.

  • Added RISAF regions so that NonSecure code can access NPU RAM3–RAM6 (0x3420_0000–0x343C_0000):

     
    RISAF_ConfigRegion(3, 0x34200000, 0x70000, RISAF_ATTR_NONSECURE | RISAF_ATTR_PRIV);
    RISAF_ConfigRegion(4, 0x34270000, 0x70000, RISAF_ATTR_NONSECURE | RISAF_ATTR_PRIV);
    RISAF_ConfigRegion(5, 0x342E0000, 0x70000, RISAF_ATTR_NONSECURE | RISAF_ATTR_PRIV);
    RISAF_ConfigRegion(6, 0x34350000, 0x70000, RISAF_ATTR_NONSECURE | RISAF_ATTR_PRIV);

Activation buffer

  • Declared in the .noncacheable section with 32-byte alignment (see the sketch below).
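
    A minimal sketch of what this declaration looks like; the buffer size and the section name below are placeholders, the real values come from the generated network files and my linker script:

     
    #include <stdint.h>

    /* Placeholder size: the real size is taken from the generated network files. */
    #define ACTIVATION_BUF_SIZE  (256U * 1024U)

    /* Activation buffer placed in a non-cacheable region, 32-byte aligned.
       ".noncacheable" must exist as an output section in the linker script. */
    __attribute__((section(".noncacheable"), aligned(32)))
    static uint8_t activation_buffer[ACTIVATION_BUF_SIZE];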

Questions

  1. Is any additional RISAF or RIF configuration required to allow the NPU to generate interrupts in a NonSecure environment?

  2. To enable Epoch Controller interrupts, do I need to explicitly configure ATON_INTCTRL registers (e.g. ATON_INTCTRL_CTRL_SET_EN, ATON_INTCTRL_INTORMSK0_SET), or should X-CUBE-AI handle this automatically?

  3. Could this issue be related to the memory attributes of the NPU region (cacheable vs. non-cacheable)? If so, what configuration is recommended?

  4. Are there any known issues with NPU interrupt routing (NPU3_IRQn) in Secure/NonSecure TrustZone projects on STM32N6?

Thank you very much for your support.

I am attaching all the files I have modified so far.

@Imen.D 

I have updated the post in English with the revised files.
Please help me resolve the issue.

VitorWagner
Associate II

Greetings @seokjs,

I ran into a similar problem. I was triggering my inference with an interrupt, but other, higher-priority interrupts were firing while the inference was running, cutting the process short. I see you have already checked the NPU NVIC priority level and trigger, but if your project behaves similarly to what I described, I would suggest reviewing your NVIC priorities; by setting them up properly in my project I was able to execute an inference without any major issues.
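
As an illustration only (the priority values are placeholders, not the ones from my project), the idea is to keep the NPU interrupt at a higher priority, i.e. a numerically lower value, than the other interrupt sources that can fire during the inference:

     
    /* Example priorities only: keep NPU3_IRQn above the other sources so the
       epoch-block interrupt is not starved. Adjust the values to your project. */
    HAL_NVIC_SetPriority(NPU3_IRQn, 1, 0);
    HAL_NVIC_EnableIRQ(NPU3_IRQn);
    HAL_NVIC_SetPriority(SysTick_IRQn, 15, 0);  /* push the tick (and similar sources) lower */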

Was the model you used taken from the Model Zoo, or did you convert it yourself to a .tflite and then use the X-CUBE-AI package to convert it into a network.c file? With a custom model you could have run into a generation error in network.c. This is probably a stretch, but you could look at the layers in the network.c file and check that you have a final output layer; the same goes for network.h, which usually has a named last layer.

 

Hello @VitorWagner,

Thank you very much for the helpful information. I will proceed as you suggested.

After reading your advice, I do have one question.
The model I am using is not from the Model Zoo; I converted it myself from ONNX to INT8 TFLite. I then imported this model into CubeMX, ran the Analyze function to confirm that NPU acceleration was supported, and generated a project in CubeIDE to continue my work.

My question is regarding the network.c / network.h files.
I am currently using the files generated automatically through CubeMX. Is this the correct approach?

Alternatively, I noticed that network files can also be generated via the stedgeai command-line tool. Should I instead be using the network files created by this CLI tool? From what I understand, the files generated by stedgeai are meant to run on the CPU, which is why I did not use them in my project, even though I tested generating them.

Could you please clarify whether I should continue using the CubeMX-generated network.c/h files for NPU execution, or if the CLI-generated files are required?

Thank you again for your guidance.

Hello @seokjs,

 

X Cube AI is mainly a GUI in CubeMX, but it uses the ST Edge AI Core in the background.

 

For any MCU, the command to generate a model is something like:

stedgeai generate --model my_model.onnx --target stm32

 

In the case of the N6, you can use the NPU by adding the option --st-neural-art:

# generation on N6 for MCU only
stedgeai generate --model my_model.onnx --target stm32n6

# generation on N6 with NPU this time
stedgeai generate --model my_model.onnx --target stm32n6 --st-neural-art

 

So, all the functionalities of X Cube AI use the Core. X Cube AI provides application templates that use the generated files. Otherwise, the Core and X Cube AI are the same thing.

 

It is the same idea for the ST Developer Cloud and the Model Zoo (deployment part). Every time you see the model in C, the Core was used. The only exception is NanoEdge AI Studio, which uses its own model for now.

 

Doc st edge ai core: https://stedgeai-dc.st.com/assets/embedded-docs/index.html 

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hello @Julian E. 

Thank you again for your detailed explanation.
I just have one more question to clarify.

If I upload my model to CubeMX and let it generate the network.c and network.h files, is it fine to use those files directly for my project? Or should I always generate them separately with the stedgeai CLI command?

Have a good day,

seokjs

Hello @seokjs,

 

It is literally the same thing in the end.

 

I would say that if you start your project from scratch, X Cube AI helps you more because you can use the application template.

The Core will only output the .c and .h files, so it is better to use it if you already have a project and want to add the AI part.

 

But again, the generated files are exactly the same; they always come from the Core.

Have a good day,
Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.