Get the same result for all samples from the NPU of NUCLEO-N657X0-Q

autoome · ‎2026-05-05

Hello ST Community,

I am trying to run a TFLite model on the Neural-ART NPU of the NUCLEO-N657X0-Q board using STM32CubeIDE + STM32CubeMX and STM32Cube AI Studio. I followed the guide here: https://community.st.com/t5/stm32-mcus/how-to-build-an-ai-application-from-scratch-on-the-nucleo-n657x0/ta-p/828502 but with my own TFLite model.

Model details:

Architecture: 5x FullyConnected layers (41→32→32→16→8→1) with ReLU activations
Input: int8[1,41], scale=0.0717, zero_point=-24
Output: int8[1,1], scale=0.0354, zero_point=1
Task: binary classification
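
For reference, the quantization parameters above map between int8 and real values with the usual affine rule real = scale * (q - zero_point). A minimal sketch in C, using the scales and zero points listed above; the function names are made up for illustration and are not part of any ST API:

```c
#include <stdint.h>

/* Affine dequantization: real = scale * (q - zero_point). */
float dequantize_int8(int8_t q, float scale, int32_t zero_point)
{
    return scale * (float)((int32_t)q - zero_point);
}

/* Affine quantization: saturate to the int8 range, then round to nearest
 * (ties away from zero). */
int8_t quantize_int8(float real, float scale, int32_t zero_point)
{
    float v = real / scale + (float)zero_point;
    if (v > 127.0f)  v = 127.0f;
    if (v < -128.0f) v = -128.0f;
    return (int8_t)(int32_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```

With the output parameters (scale=0.0354, zero_point=1), an int8 output of 53 corresponds to 0.0354 * (53 - 1) ≈ 1.84 in real units.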

Problem: I have 31 test samples. Every sample produces the same output value 53 regardless of input. I verified the inputs are correctly quantized and different for each sample.

What I investigated: Through per-epoch debugging I found the following epoch flow:

EP0 (HW): reads input
EP1 (HW): continuation
EP2 (SW): DequantizeLinear
EP3 (SW): Conv float → writes correct floats
EP4 (hybrid): outputs ALL ZEROS
EP5 (HW): reads zeros → always produces same output 53
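
One way to read this flow: if a fully connected layer receives an all-zero input, it computes y = W·0 + b = b, so its result depends only on the bias and is identical for every sample. A small sketch of that effect in plain C; the layer function and any weights used with it are invented for illustration and are not taken from the actual model:

```c
#include <stddef.h>

/* Float fully connected layer with a single output:
 * y = b + sum over i of w[i] * x[i]. */
float dense1(const float *w, const float *x, size_t n, float b)
{
    float acc = b;
    for (size_t i = 0; i < n; ++i)
        acc += w[i] * x[i];
    return acc;
}
```

Calling dense1 with an all-zero x returns b whatever the weights are, which would match a final 8→1 layer that always yields the same value once the epoch before it outputs zeros.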

I also ran the same TFLite model and dataset in STM32Cube AI Studio and still got the same output value 53 regardless of input.

This is my inference code:

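int aiRun(void) {
    LL_ATON_RT_RetValues_t ret = LL_ATON_RT_DONE;

    /* Reset the network state and attach the user I/O buffers. */
    LL_ATON_RT_Reset_Network(&NN_Instance_network);
    LL_ATON_Set_User_Input_Buffer_network(0, stai_input_data, 41);
    LL_ATON_Set_User_Output_Buffer_network(0, stai_output_data, 1);

    /* Write the input back to memory so the NPU does not read stale data. */
    SCB_CleanDCache_by_Addr((uint32_t*)stai_input_data, 64);
    /* Drop any cached copy of the output buffer before the NPU writes it. */
    SCB_InvalidateDCache_by_Addr((uint32_t*)stai_output_data, 64);

    /* Run epoch blocks until the runtime reports completion. */
    do {
        ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_network);
        if (ret == LL_ATON_RT_WFE)
            LL_ATON_OSAL_WFE();
    } while (ret != LL_ATON_RT_DONE);

    /* Invalidate again so the CPU reads the output the NPU just wrote. */
    SCB_InvalidateDCache_by_Addr((uint32_t*)stai_output_data, 64);
    return 0;
}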
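
The 64 passed to the cache maintenance calls above is the 41-byte input rounded up to whole cache lines. A sketch of that rounding, assuming the 32-byte D-cache line of the Cortex-M55 and buffers that are themselves 32-byte aligned; the macro and function names are hypothetical:

```c
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u  /* assumption: D-cache line size in bytes */

/* Round a byte count up to a whole number of cache lines. */
uint32_t cache_align_up(uint32_t nbytes)
{
    return (nbytes + DCACHE_LINE_SIZE - 1u) & ~(DCACHE_LINE_SIZE - 1u);
}
```

Under that assumption, cache_align_up(41) gives 64 for the input buffer, while the 1-byte output buffer would only need a single 32-byte line.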

Environment:

Board: NUCLEO-N657X0-Q
ST Edge AI Studio: v4.0.0
OS: Ubuntu 24

Thank you very much for any advice in advance!!

Julian E. · ‎2026-05-05

Hi @autoome,

First of all, we are working on updating this tutorial. Hopefully, it will come out soon.

Then, when you run a "validate on target" in STM32Cube AI Studio, you can download the outputs from the table with the metrics. Are all the outputs the same?

It would help us understand if this is an issue at the model level or at the embedded code level.

You can also download an application template from STM32Cube AI Studio, which could help you.

Have a good day,

Julian

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

autoome · ‎2026-05-05

Hi @Julian E.

Thank you very much for your reply!

Yes, when I ran "validate on target" in AI Studio, I got the same output for every run. Here is network_val_c_outputs.csv:

And here's network_val_m_outputs.csv:

Thank you very much for the help!

Julian E. · ‎2026-05-05

Hi @autoome,

Ok, so it seems that the issue is coming from the converted model and not your implementation.

It may be a bug in ST Edge AI Core.

Could you run two "validate on target" passes, one with the NPU and one without it, and send me the metrics you get, in particular the COS?

Comparing the two will help us see whether the issue comes from the NPU compiler part, for example if you get a good COS without the NPU but a bad one with it.

If you get a bad COS in both cases, then it is another story.

Have a good day,

Julian
