2026-02-05 12:26 AM - last edited on 2026-03-12 7:23 AM by Imen.D
Hello,
Based on my understanding, in this code
/* Run inference by executing epoch blocks until done.
 * (npu_start is captured with tim2_get_count() just before this loop.) */
LL_ATON_RT_RetValues_t result;
uint32_t total_run_ticks = 0u;
uint32_t total_wfe_ticks = 0u;
do {
  uint32_t run_start = tim2_get_count();
  result = LL_ATON_RT_RunEpochBlock(&NN_Instance_audio_network);
  uint32_t run_end = tim2_get_count();
  total_run_ticks += (uint32_t)(run_end - run_start);
  if (result == LL_ATON_RT_WFE) {
    uint32_t wfe_start = tim2_get_count();
    LL_ATON_OSAL_WFE();
    uint32_t wfe_end = tim2_get_count();
    total_wfe_ticks += (uint32_t)(wfe_end - wfe_start);
  }
} while (result != LL_ATON_RT_DONE);
uint32_t npu_end = tim2_get_count();
uint32_t total_us = tim2_ticks_to_us((uint32_t)(npu_end - npu_start));
uint32_t run_us = tim2_ticks_to_us(total_run_ticks);
uint32_t wfe_us = tim2_ticks_to_us(total_wfe_ticks);
LOGI_I32("NPU inference (us): ", (unsigned long)total_us);
LOGI("NPU epoch time (us): run=%u wfe=%u", run_us, wfe_us);
calling LL_ATON_OSAL_WFE() does not give other tasks time to run while the epoch is executing; instead it just calls __WFE() and puts the core to sleep, since I don't see an implementation of aton_osal_freertos_wfe(). Is that correct?
With the Yamnet1024 model, the time is split like this (printed by the code above): NPU epoch time (us): run=10480 wfe=7660
What would be the best way to let other tasks execute while the epoch is running?
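For reference, the tim2_* helpers used in the snippet could look like the following minimal sketch. This is hypothetical and not from the original post: TIM2 is assumed to be a free-running 32-bit up-counter, and the 400 MHz timer clock is an assumption to adjust for your project.

```c
/* Hypothetical sketch of the timing helper used in the snippet above.
 * ASSUMPTION: TIM2 is a free-running 32-bit up-counter clocked at 400 MHz;
 * adjust TIM2_TICKS_PER_US to your actual timer clock. */
#include <stdint.h>

#define TIM2_TICKS_PER_US 400u

/* Convert a TIM2 tick delta to microseconds. Computing the delta as
 * (uint32_t)(end - start), as the snippet does, stays correct across a
 * single 32-bit counter wrap thanks to modular unsigned arithmetic. */
static inline uint32_t tim2_ticks_to_us(uint32_t ticks)
{
    return ticks / TIM2_TICKS_PER_US;
}
```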
CubeMX version 6.16.1
X-CUBE-AI version 10.2.0
X-CUBE-FREERTOS version 1.4.0
Thanks!
2026-04-02 3:04 AM
Hello @Tuomas95 ,
In many ST Model Zoo examples, operator mapping and runtime integration are tuned for high NPU offload, so you typically observe less apparent CPU blocking; behavior still depends on model topology and project OSAL/RTOS integration.
But your understanding is mostly correct. In your setup, LL_ATON_OSAL_WFE appears to fall back to a generic wait path (typically __WFE), and there is no FreeRTOS-specific aton_osal_freertos_wfe implementation active. So, the inference loop is not using an RTOS-aware blocking primitive.
One important nuance: __WFE does not block all task execution outright. Interrupts (including the FreeRTOS tick) can still wake the core, so the scheduler does get chances to run other tasks. But this is still not a predictable way to share CPU time during NPU epochs, because the inference task never actually blocks from the scheduler's point of view.
Best practice with FreeRTOS:
1. Implement an RTOS-aware OSAL wait function for ATON.
2. In the WFE path, block the inference task on a FreeRTOS primitive (task notification or semaphore), instead of plain __WFE.
3. Signal that primitive from the NPU/epoch ISR callback (FromISR API).
This gives deterministic task scheduling while the NPU is progressing and is the recommended way to let other tasks run during inference.
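The three steps above can be sketched with FreeRTOS task notifications. This is a minimal illustration, not the actual X-CUBE-AI OSAL: the function names my_aton_osal_init, my_aton_osal_wfe, and my_aton_npu_epoch_isr_hook are hypothetical, and the real OSAL symbols to override depend on your runtime version.

```c
/* Hedged sketch: RTOS-aware wait for NPU epochs using FreeRTOS task
 * notifications. All my_* names here are illustrative only. */
#include "FreeRTOS.h"
#include "task.h"

static TaskHandle_t nn_task_handle;  /* handle of the inference task */

/* Call once from the inference task before starting inference. */
void my_aton_osal_init(void)
{
    nn_task_handle = xTaskGetCurrentTaskHandle();
}

/* Use instead of plain __WFE() in the LL_ATON_RT_WFE path: the task
 * truly blocks, so FreeRTOS can schedule other tasks meanwhile. */
void my_aton_osal_wfe(void)
{
    ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
}

/* Call from the NPU end-of-epoch interrupt/callback context. */
void my_aton_npu_epoch_isr_hook(void)
{
    BaseType_t higher_prio_woken = pdFALSE;
    vTaskNotifyGiveFromISR(nn_task_handle, &higher_prio_woken);
    portYIELD_FROM_ISR(higher_prio_woken);  /* resume inference task promptly */
}
```

With a scheme like this, the ~7.6 ms you currently measure in the wfe path becomes time genuinely available to other tasks, instead of the core idling in __WFE.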
Overall, your run/wfe split is model-dependent. A model with better NPU offload will typically reduce CPU-side overhead.
Kind regards,
DHIF Khaled