Issue: NPU inference output remains identical for every run (STM32N6 / ATON / int8 CNN)

seokjs
Associate II

1. Problem Description

I am running an int8-quantized 1D CNN model on an STM32N6570 (NPU / ATON runtime).
The full pipeline is:

Modbus sensor input → preprocessing → NPU inference → postprocessing (softmax)

The issue is that the NPU output (net_out) is identical for every inference, even though:

  • Sensor values are updated periodically via Modbus

  • Preprocessing produces different x_quant values each time

  • NPU inference runs without error

  • Output buffer is properly invalidated after inference

Example output log (repeats every inference):

[Attached screenshot seokjs_0-1767749775757.png: example output log]

Despite different input values, the raw int8 output vector is always the same.

2. What has been confirmed (important)

1) Pipeline connectivity is correct

The full data flow has been verified end to end:

  • ModbusTask updates sensor_vals

  • update_timeseries_buffer(sensor_vals) is called

  • Preprocess_Data() runs successfully

  • x_quant changes every inference ([DEBUG] x_quant CHANGED)

  • x_quant is copied into net_in

  • NPU inference runs (Run_Epoch())

  • Output buffer is invalidated

  • Postprocess runs on updated buffer

So this is not a “stale input” or “skipped preprocessing” issue.
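Putting those steps together, one inference cycle looks roughly like this (simplified sketch; error handling and task synchronization are omitted, Postprocess() is shorthand for my dequantization/softmax step, and the cache helpers are the ones described in point 3) below):

/* One inference cycle (simplified sketch) */
update_timeseries_buffer(sensor_vals);        /* latest Modbus sensor values          */
Preprocess_Data();                            /* produces the int8 window x_quant     */
memcpy(net_in, x_quant, in_len);              /* copy quantized input into NPU buffer */
dcache_clean_by_addr(net_in, in_len);         /* make the input visible to the NPU    */
Run_Epoch();                                  /* NPU inference (ATON runtime)         */
dcache_invalidate_by_addr(net_out, out_len);  /* re-read the NPU-written output       */
Postprocess();                                /* dequantize + softmax on net_out      */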

2) This was NOT a RAM isolation / power issue

Initially, NPU outputs were all zeros.
That issue was solved by explicitly disabling SRAM sleep:

/* Clear the SRAM shutdown bit so the AXI SRAMs stay powered */
RAMCFG_SRAM2_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM3_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM4_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM5_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM6_AXI->CR &= ~RAMCFG_CR_SRAMSD;

After this fix, valid non-zero outputs appear, so SRAM accessibility is confirmed.

3) Cache maintenance is handled correctly

The following cache operations are applied:

dcache_clean_by_addr(net_in, in_len);     // before NPU run
dcache_invalidate_by_addr(net_out, out_len); // after NPU run

This resolved the earlier “output always zero” behavior.
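In other words: the input region is cleaned so the NPU reads the freshly written x_quant data, and the output region is invalidated so the CPU does not read stale cache lines instead of the NPU result. Expressed with plain CMSIS-Core calls, the intent is roughly the following (sketch only, not my actual helper implementation):

/* Equivalent intent using CMSIS-Core cache maintenance (sketch).
   Buffers should ideally be 32-byte aligned and sized in multiples
   of the 32-byte cache line. */
SCB_CleanDCache_by_Addr((volatile void *)net_in, (int32_t)in_len);        /* before Run_Epoch() */
SCB_InvalidateDCache_by_Addr((volatile void *)net_out, (int32_t)out_len); /* after Run_Epoch()  */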

3. Model details

  • Input shape: (1, 50, 10) → flattened to 500 int8

  • Output: 5-class softmax (int8)

  • Architecture:

     
    Conv1D(filters=8, kernel_size=5, padding="same", activation="relu")
    GlobalAveragePooling1D
    Dense(5, softmax)
  • Quantization: full int8

  • Output dequantization:

     
    prob = (output - output_zero_point) * output_scale;
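Concretely, the postprocessing applies that formula to each of the 5 int8 scores and then takes the arg-max, roughly like this (sketch; output_scale and output_zero_point are the output tensor's quantization parameters):

/* Dequantize the 5-class int8 output and pick the most likely class (sketch) */
float best_prob = -1.0e9f;
int   best_cls  = 0;
for (int i = 0; i < 5; i++) {
    float prob = ((float)net_out[i] - (float)output_zero_point) * output_scale;
    if (prob > best_prob) { best_prob = prob; best_cls = i; }
}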

4. Observations

  • Input (x_quant) does change every run

  • Output buffer is overwritten (not stale / not 0x5A pattern)

  • Yet the output vector is numerically identical every inference

  • This happens even when sensor values change slightly (frequency deltas, etc.)
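For reference, the “identical output” claim can be verified independently of the printed log with a plain byte compare against the previous run's output, for example:

/* Sketch: flag whether the raw int8 output changed since the previous run.
   prev_out is a hypothetical static copy kept between inferences. */
static int8_t prev_out[5];
if (memcmp(net_out, prev_out, sizeof(prev_out)) == 0) {
    printf("[DEBUG] net_out UNCHANGED\r\n");
}
memcpy(prev_out, net_out, sizeof(prev_out));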


5. Questions

  1. Is it expected behavior for ATON/NPU to produce identical outputs if:

    • The model uses GlobalAveragePooling1D

    • Most input features remain near zero

    • Only small deltas change between time steps? (a rough numeric check of this is sketched after these questions)

  2. Is there any known issue with int8 softmax outputs saturating
    (e.g., output stuck near max/min values) on the STM32N6 NPU?

  3. Are there additional required steps for:

    • Resetting internal NPU state between inferences?

    • Handling GAP layers in ATON runtime?

  4. Could this be related to:

    • ATON internal buffer reuse?

    • Missing network reset timing?

    • Unsupported layer behavior with int8 + GAP?
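Regarding question 1, here is the rough numeric check mentioned above (purely illustrative numbers and a simplified model of the GAP requantization):

/* GlobalAveragePooling1D averages 50 time steps per channel, so small
   per-step deltas are divided by 50 before requantization. */
int     ch  = 0;                                /* example channel */
int32_t sum = 0;
for (int t = 0; t < 50; t++)
    sum += x_quant[t * 10 + ch];                /* assumes channels-last (50, 10) layout */
int8_t gap_before = (int8_t)((sum + 25) / 50);  /* simplified rounding, no rescale */
sum += 3;                                       /* e.g. three time steps move by 1 LSB */
int8_t gap_after  = (int8_t)((sum + 25) / 50);  /* 3/50 ≈ 0.06 LSB, rounds to the same int8 value */

If only the Dense(5) layer follows the pooling, identical GAP outputs would give identical class scores, which is essentially what question 1 asks.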

I have attached the preprocessing logic, inference logic, and postprocessing logic.
Despite all of this, the NPU output remains identical for every inference run.
Could you please help explain why the output is always the same, even though the input data clearly changes?

 

Thank you.

 
