2026-01-06 5:46 PM
I am running an int8-quantized 1D CNN model on an STM32N6570 (NPU / ATON runtime).
The full pipeline is:
Modbus sensor input → preprocessing → NPU inference → postprocessing (softmax)
The issue is that the NPU output (net_out) is identical for every inference, even though:
Sensor values are updated periodically via Modbus
Preprocessing produces different x_quant values each time
NPU inference runs without error
Output buffer is properly invalidated after inference
Example output log (repeats every inference):
Despite different input values, the raw int8 output vector is always the same.
The full data flow is confirmed to be connected:
ModbusTask updates sensor_vals
update_timeseries_buffer(sensor_vals) is called
Preprocess_Data() runs successfully
x_quant changes every inference ([DEBUG] x_quant CHANGED)
x_quant is copied into net_in
NPU inference runs (Run_Epoch())
Output buffer is invalidated
Postprocess runs on updated buffer
So this is not a “stale input” or “skipped preprocessing” issue.
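One way to back up the data-flow claim above with numbers is to log a checksum of the input buffer right before the run and of the output buffer right after invalidation. The following is a minimal sketch using a 32-bit FNV-1a hash. The helper name `fnv1a32` is hypothetical and is not part of the ATON runtime or the attached code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical debug helper: 32-bit FNV-1a hash over a byte buffer. */
static uint32_t fnv1a32(const void *buf, size_t len)
{
    const uint8_t *p = (const uint8_t *)buf;
    uint32_t h = 2166136261u;      /* FNV-1a 32-bit offset basis */

    for (size_t i = 0; i < len; ++i) {
        h ^= p[i];                 /* mix in the next byte */
        h *= 16777619u;            /* FNV 32-bit prime */
    }
    return h;
}
```

Printing `fnv1a32(net_in, in_len)` just before `Run_Epoch()` and `fnv1a32(net_out, out_len)` just after the invalidate gives one line per inference to compare. A changing input hash with a constant output hash only shows that the CPU's view of `net_in` changes. Whether `net_in` is the same memory the NPU actually reads is a separate check that this helper does not cover.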
Initially, NPU outputs were all zeros.
That issue was solved by explicitly clearing the SRAM shutdown bit (SRAMSD) on the AXI SRAM banks:
RAMCFG_SRAM2_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM3_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM4_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM5_AXI->CR &= ~RAMCFG_CR_SRAMSD;
RAMCFG_SRAM6_AXI->CR &= ~RAMCFG_CR_SRAMSD;
After this fix, valid non-zero outputs appear, so SRAM accessibility is confirmed.
The following cache operations are applied:
dcache_clean_by_addr(net_in, in_len); // before NPU run
dcache_invalidate_by_addr(net_out, out_len); // after NPU run
This resolved the earlier “output always zero” behavior.
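Cache maintenance works on whole cache lines, so it may be worth checking that the ranges passed to the two calls above cover complete lines. The sketch below assumes a 32-byte D-cache line (the usual value for a Cortex-M55 core) and shows how an arbitrary address and length expand to a line-aligned range. `CACHE_LINE`, `cache_range_t` and `cache_align_range` are hypothetical names for illustration only.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 32u  /* assumed D-cache line size in bytes */

typedef struct {
    uintptr_t start;    /* address rounded down to a line boundary */
    size_t    len;      /* length rounded up to whole cache lines */
} cache_range_t;

/* Expand [addr, addr + len) to the smallest line-aligned range covering it. */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)(CACHE_LINE - 1u);
    cache_range_t r;
    uintptr_t end = (addr + len + mask) & ~mask;  /* round end up */

    r.start = addr & ~mask;                       /* round start down */
    r.len   = (size_t)(end - r.start);
    return r;
}
```

With a 500-byte input and a 5-byte output, neither length is a multiple of 32. If the buffers are not aligned and padded to 32 bytes, an invalidate can also discard neighbouring data that shares the same line, so aligning and padding `net_in` and `net_out` is a low-cost thing to rule out.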
Input shape: (1, 50, 10) → flattened to 500 int8
Output: 5-class softmax (int8)
Architecture:
Quantization: full int8
Output dequantization:
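For reference, int8 outputs are normally mapped back to real values with the affine rule real = scale * (q - zero_point). The sketch below is generic and is not the dequantization used in this project. The `scale` and `zero_point` arguments are placeholders that have to come from the generated network's output tensor description.

```c
#include <stdint.h>

/* Generic affine dequantization: real = scale * (q - zero_point).
 * scale and zero_point are placeholders; take the real values from the model. */
static float dequantize_i8(int8_t q, float scale, int32_t zero_point)
{
    return scale * (float)((int32_t)q - zero_point);
}
```

As an illustration only, TensorFlow Lite int8 softmax outputs commonly use scale = 1/256 and zero_point = -128, so a raw value of -128 maps to 0.0 and 127 maps to about 0.996. If the parameters used in postprocessing differ from the ones baked into the model, the probabilities will look wrong even when the raw output is correct.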
Input (x_quant) does change every run
Output buffer is overwritten on each run (it is not stale, and the 0x5A fill pattern does not remain)
Yet the output vector is numerically identical every inference
This happens even when sensor values change slightly (frequency deltas, etc.)
Is it expected behavior for ATON/NPU to produce identical outputs if:
The model uses GlobalAveragePooling1D
Most input features remain near zero
Only small deltas change between time steps?
Is there any known issue with int8 softmax outputs saturating (e.g., output stuck near max/min values) on STM32N6 NPU?
Are there additional required steps for:
Resetting internal NPU state between inferences?
Handling GAP layers in ATON runtime?
Could this be related to:
ATON internal buffer reuse?
Missing network reset timing?
Unsupported layer behavior with int8 + GAP?
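To make the first question concrete, here is a simplified model of how averaging over 50 time steps followed by rounding to int8 can absorb small input changes. It assumes the pooled output uses the same scale as its input and rounds half away from zero, which is not necessarily what the ATON runtime does. `mean_i8` is a hypothetical helper.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for an int8 global average pool over n time steps:
 * rounded integer mean, rounding half away from zero. */
static int8_t mean_i8(const int8_t *x, size_t n)
{
    int32_t sum = 0;
    int32_t half = (int32_t)(n / 2);

    for (size_t i = 0; i < n; ++i) {
        sum += x[i];
    }
    /* Add (or subtract) half the divisor so the division rounds to nearest. */
    if (sum >= 0) {
        return (int8_t)((sum + half) / (int32_t)n);
    }
    return (int8_t)((sum - half) / (int32_t)n);
}
```

With 50 samples all equal to 3, the pooled value is 3. Raising a single sample to 4 still gives 3, because the change in the mean is 1/50 of a quantization step. Only when about half of the samples move by one step does the pooled value change. Under these assumptions, small deltas in a few time steps would not be visible after pooling, and a saturated softmax behind it would hide even more.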
I have attached the preprocessing logic, inference logic, and postprocessing logic.
Despite all of this, the NPU output remains identical for every inference run.
Could you please help explain why the output is always the same, even though the input data clearly changes?
Thank you.