
STM32N6 FreeRTOS LL_ATON_OSAL_WFE

Tuomas95
Associate III

Hello,

Based on my understanding, in this code

  /* Run inference by executing epoch blocks until done */
  LL_ATON_RT_RetValues_t result;
  uint32_t total_run_ticks = 0u;
  uint32_t total_wfe_ticks = 0u;
  uint32_t npu_start = tim2_get_count(); /* start of the whole inference, used for total_us below */
  do {
    uint32_t run_start = tim2_get_count();
    result = LL_ATON_RT_RunEpochBlock(&NN_Instance_audio_network);
    uint32_t run_end = tim2_get_count();
    total_run_ticks += (uint32_t)(run_end - run_start);
    if (result == LL_ATON_RT_WFE) {
      uint32_t wfe_start = tim2_get_count();
      LL_ATON_OSAL_WFE();
      uint32_t wfe_end = tim2_get_count();
      total_wfe_ticks += (uint32_t)(wfe_end - wfe_start);
    }
  } while (result != LL_ATON_RT_DONE);
  uint32_t npu_end = tim2_get_count();
  uint32_t total_us = tim2_ticks_to_us((uint32_t)(npu_end - npu_start));
  uint32_t run_us = tim2_ticks_to_us(total_run_ticks);
  uint32_t wfe_us = tim2_ticks_to_us(total_wfe_ticks);
  LOGI_I32("NPU inference (us): ", (unsigned long)total_us);
  LOGI("NPU epoch time (us): run=%lu wfe=%lu", (unsigned long)run_us, (unsigned long)wfe_us);

calling LL_ATON_OSAL_WFE() does not give other tasks time to run while the epoch is executing; it just calls __WFE() and sleeps the core, since I don't see an implementation of aton_osal_freertos_wfe(). Is that correct?


With the Yamnet1024 model, the time is split like this (printed by the code above): NPU epoch time (us): run=10480 wfe=7660

What would be the best way to let other tasks execute while the epoch is running?

CubeMX version 6.16.1

X-CUBE-AI version 10.2.0

X-CUBE-FREERTOS version 1.4.0

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Khaled_DHIF
ST Employee

Hello @Tuomas95 ,

In many ST Model Zoo examples, operator mapping and runtime integration are tuned for high NPU offload, so you typically observe less apparent CPU blocking; behavior still depends on model topology and project OSAL/RTOS integration.

But your understanding is mostly correct. In your setup, LL_ATON_OSAL_WFE appears to fall back to a generic wait path (typically __WFE), and there is no FreeRTOS-specific aton_osal_freertos_wfe implementation active. So, the inference loop is not using an RTOS-aware blocking primitive.

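One important nuance: __WFE does not permanently block all task execution. Interrupts can still wake the core, and FreeRTOS can then switch to a higher-priority task that became ready (or to an equal-priority task on a tick, if time slicing is enabled). The inference task itself never enters the Blocked state, though, so lower-priority tasks get no CPU time while it waits. That makes plain __WFE a poor way to share CPU time predictably during NPU epochs.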

Best practice with FreeRTOS:
1. Implement an RTOS-aware OSAL wait function for ATON.
2. In the WFE path, block the inference task on a FreeRTOS primitive (task notification or semaphore), instead of plain __WFE.
3. Signal that primitive from the NPU/epoch ISR callback (FromISR API).

This gives deterministic task scheduling while the NPU is progressing and is the recommended way to let other tasks run during inference.
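The three steps above can be sketched with a direct-to-task notification. This is only a minimal sketch under stated assumptions: npu_wait_init, npu_wait_for_event and npu_event_from_isr are hypothetical names, NPU_SKETCH_ON_TARGET is a made-up build flag, and how these functions get wired into the ATON OSAL and the NPU interrupt handler depends on your X-CUBE-AI version. The FreeRTOS calls themselves (ulTaskNotifyTake, vTaskNotifyGiveFromISR, portYIELD_FROM_ISR, xTaskGetCurrentTaskHandle) are the standard ones; the host-only stand-ins exist just so the sketch compiles off-target.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(NPU_SKETCH_ON_TARGET) /* hypothetical build flag for this sketch */
#include "FreeRTOS.h"
#include "task.h"
#else
/* Host-only stand-ins so the sketch compiles without FreeRTOS.
   They mirror the real signatures but do no scheduling or blocking. */
typedef long BaseType_t;
typedef uint32_t TickType_t;
typedef void *TaskHandle_t;
#define pdFALSE ((BaseType_t)0)
#define pdTRUE ((BaseType_t)1)
#define portMAX_DELAY ((TickType_t)0xffffffffUL)
#define portYIELD_FROM_ISR(x) ((void)(x))
static uint32_t stub_notify_count;
static TaskHandle_t xTaskGetCurrentTaskHandle(void) { return &stub_notify_count; }
static void vTaskNotifyGiveFromISR(TaskHandle_t task, BaseType_t *woken)
{
  (void)task;
  stub_notify_count++;
  *woken = pdTRUE;
}
static uint32_t ulTaskNotifyTake(BaseType_t clear_on_exit, TickType_t ticks)
{
  (void)ticks;
  uint32_t count = stub_notify_count;
  if (count != 0u) {
    stub_notify_count = (clear_on_exit != pdFALSE) ? 0u : (count - 1u);
  }
  return count; /* value before it was cleared or decremented */
}
#endif

static TaskHandle_t npu_task; /* task that runs the inference loop */

/* Step 1 (hypothetical name): call once from the inference task
   before the first epoch block is started. */
void npu_wait_init(void)
{
  npu_task = xTaskGetCurrentTaskHandle();
}

/* Step 2 (hypothetical name): use in the WFE path instead of plain __WFE.
   On target this puts the task in the Blocked state until notified. */
uint32_t npu_wait_for_event(void)
{
  return ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
}

/* Step 3 (hypothetical name): call from the NPU/epoch interrupt handler. */
void npu_event_from_isr(void)
{
  BaseType_t woken = pdFALSE;
  if (npu_task != NULL) {
    vTaskNotifyGiveFromISR(npu_task, &woken);
  }
  portYIELD_FROM_ISR(woken); /* switch immediately if a higher-priority task woke */
}
```

Because the notification stays pending, an interrupt that fires between LL_ATON_RT_RunEpochBlock returning LL_ATON_RT_WFE and the call to the wait function is not lost: the take returns immediately instead of blocking. A binary semaphore with xSemaphoreGiveFromISR would behave similarly; task notifications are simply the lighter option when only one task waits.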

Overall, your run/wfe split is model-dependent. A model with better NPU offload will typically reduce CPU-side overhead.
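To put the reported split in numbers: with run=10480 and wfe=7660, about 42% of the summed run and wait time is spent in the wait path. That is an upper bound on what an RTOS-aware wait could hand to other tasks, since they only benefit if they are ready to run during those windows. A small helper for the arithmetic, where the function name is illustrative and not part of the ATON runtime:

```c
#include <stdint.h>

/* Share of (run + wfe) time spent waiting, in tenths of a percent.
   Hypothetical helper for illustration only. */
static uint32_t wfe_share_permille(uint32_t run_us, uint32_t wfe_us)
{
  uint64_t total = (uint64_t)run_us + wfe_us; /* 64-bit sum avoids overflow */
  if (total == 0u) {
    return 0u; /* nothing measured yet */
  }
  return (uint32_t)(((uint64_t)wfe_us * 1000u) / total);
}
```

For the figures in the question, wfe_share_permille(10480u, 7660u) evaluates to 422, i.e. roughly 42.2%.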

Kind regards, 

DHIF Khaled

