Issue: NPU inference output remains identical for every run (STM32N6 / ATON / int8 CNN)

seokjs
Associate II

1. Problem Description

I am running an int8-quantized 1D CNN model on an STM32N6570 (NPU / ATON runtime).
The full pipeline is:

Modbus sensor input → preprocessing → NPU inference → postprocessing (softmax)

The issue is that the NPU output (net_out) is identical for every inference, even though:

  • Sensor values are updated periodically via Modbus

  • Preprocessing produces different x_quant values each time

  • NPU inference runs without error

  • Output buffer is properly invalidated after inference

Example output log (repeats every inference):

[Attached screenshot seokjs_0-1767749775757.png: example output log]

Despite different input values, the raw int8 output vector is always the same.

2. What has been confirmed (important)

1) Pipeline connectivity is correct

The full data flow has been verified end to end:

  • ModbusTask updates sensor_vals

  • update_timeseries_buffer(sensor_vals) is called

  • Preprocess_Data() runs successfully

  • x_quant changes every inference ([DEBUG] x_quant CHANGED)

  • x_quant is copied into net_in

  • NPU inference runs (Run_Epoch())

  • Output buffer is invalidated

  • Postprocess runs on updated buffer

So this is not a “stale input” or “skipped preprocessing” issue.
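Putting those steps together, one inference cycle looks roughly like this (simplified sketch; error handling and task synchronization are omitted, Postprocess() is shorthand for my dequantization/softmax step, and the cache helpers are the ones described in point 3) below):

/* One inference cycle (simplified sketch) */
update_timeseries_buffer(sensor_vals);        /* latest Modbus sensor values          */
Preprocess_Data();                            /* produces the int8 window x_quant     */
memcpy(net_in, x_quant, in_len);              /* copy quantized input into NPU buffer */
dcache_clean_by_addr(net_in, in_len);         /* make the input visible to the NPU    */
Run_Epoch();                                  /* NPU inference (ATON runtime)         */
dcache_invalidate_by_addr(net_out, out_len);  /* re-read the NPU-written output       */
Postprocess();                                /* dequantize + softmax on net_out      */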

2) This was NOT a RAM isolation / power issue

Initially, NPU outputs were all zeros.
That issue was solved by explicitly disabling SRAM sleep:

/* Clear the SRAM shutdown bit so the AXI SRAMs stay powered */
RAMCFG_SRAM2_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM3_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM4_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM5_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM6_AXI->CR &= ~RAMCFG_CR_SRAMSD;

After this fix, valid non-zero outputs appear, so SRAM accessibility is confirmed.

3) Cache maintenance is handled correctly

The following cache operations are applied:

dcache_clean_by_addr(net_in, in_len);     // before NPU run
dcache_invalidate_by_addr(net_out, out_len); // after NPU run

This resolved the earlier “output always zero” behavior.
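In other words: the input region is cleaned so the NPU reads the freshly written x_quant data, and the output region is invalidated so the CPU does not read stale cache lines instead of the NPU result. Expressed with plain CMSIS-Core calls, the intent is roughly the following (sketch only, not my actual helper implementation):

/* Equivalent intent using CMSIS-Core cache maintenance (sketch).
   Buffers should ideally be 32-byte aligned and sized in multiples
   of the 32-byte cache line. */
SCB_CleanDCache_by_Addr((volatile void *)net_in, (int32_t)in_len);        /* before Run_Epoch() */
SCB_InvalidateDCache_by_Addr((volatile void *)net_out, (int32_t)out_len); /* after Run_Epoch()  */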

3. Model details

  • Input shape: (1, 50, 10) → flattened to 500 int8

  • Output: 5-class softmax (int8)

  • Architecture:

     
    Conv1D(filters=8, kernel_size=5, padding="same", activation="relu")
    GlobalAveragePooling1D
    Dense(5, softmax)
  • Quantization: full int8

  • Output dequantization:

     
    prob = (output - output_zero_point) * output_scale;
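Concretely, the postprocessing applies that formula to each of the 5 int8 scores and then takes the arg-max, roughly like this (sketch; output_scale and output_zero_point are the output tensor's quantization parameters):

/* Dequantize the 5-class int8 output and pick the most likely class (sketch) */
float best_prob = -1.0e9f;
int   best_cls  = 0;
for (int i = 0; i < 5; i++) {
    float prob = ((float)net_out[i] - (float)output_zero_point) * output_scale;
    if (prob > best_prob) { best_prob = prob; best_cls = i; }
}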

4. Observations

  • Input (x_quant) does change every run

  • Output buffer is overwritten (not stale / not 0x5A pattern)

  • Yet the output vector is numerically identical every inference

  • This happens even when sensor values change slightly (frequency deltas, etc.)
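For reference, the “identical output” claim can be verified independently of the printed log with a plain byte compare against the previous run's output, for example:

/* Sketch: flag whether the raw int8 output changed since the previous run.
   prev_out is a hypothetical static copy kept between inferences. */
static int8_t prev_out[5];
if (memcmp(net_out, prev_out, sizeof(prev_out)) == 0) {
    printf("[DEBUG] net_out UNCHANGED\r\n");
}
memcpy(prev_out, net_out, sizeof(prev_out));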


5. Questions

  1. Is it expected behavior for ATON/NPU to produce identical outputs if:

    • The model uses GlobalAveragePooling1D

    • Most input features remain near zero

    • Only small deltas change between time steps? (a rough numeric check of this is sketched after these questions)

  2. Is there any known issue with int8 softmax outputs saturating
    (e.g., output stuck near max/min values) on the STM32N6 NPU?

  3. Are there additional required steps for:

    • Resetting internal NPU state between inferences?

    • Handling GAP layers in ATON runtime?

  4. Could this be related to:

    • ATON internal buffer reuse?

    • Missing network reset timing?

    • Unsupported layer behavior with int8 + GAP?
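Regarding question 1, here is the rough numeric check mentioned above (purely illustrative numbers and a simplified model of the GAP requantization):

/* GlobalAveragePooling1D averages 50 time steps per channel, so small
   per-step deltas are divided by 50 before requantization. */
int     ch  = 0;                                /* example channel */
int32_t sum = 0;
for (int t = 0; t < 50; t++)
    sum += x_quant[t * 10 + ch];                /* assumes channels-last (50, 10) layout */
int8_t gap_before = (int8_t)((sum + 25) / 50);  /* simplified rounding, no rescale */
sum += 3;                                       /* e.g. three time steps move by 1 LSB */
int8_t gap_after  = (int8_t)((sum + 25) / 50);  /* 3/50 ≈ 0.06 LSB, rounds to the same int8 value */

If only the Dense(5) layer follows the pooling, identical GAP outputs would give identical class scores, which is essentially what question 1 asks.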

I have attached the preprocessing logic, inference logic, and postprocessing logic.
Despite all of this, the NPU output remains identical for every inference run.
Could you please help explain why the output is always the same, even though the input data clearly changes?

 

Thank you.

 
