2025-10-15 10:38 PM
Hi,
I'm trying to deploy a quantized ONNX neural network on an STM32N6 Nucleo, and on-board inference returns clipped values (see clipped-output.png).
First, I exported and quantized my PyTorch neural network to ONNX using the following configuration (the same one used in this end-to-end notebook example):
import torch

# Export the float model to ONNX (opset 15).
torch.onnx.export(
    model,
    inputs,
    str(onnx_path),
    input_names=input_names,
    output_names=output_names,
    opset_version=15,
    export_params=True,
    do_constant_folding=True,
)
from onnxruntime.quantization import CalibrationMethod, QuantFormat, QuantType, quantize_static

# Static QDQ int8 quantization with MinMax calibration.
quantize_static(
    str(onnx_path),
    str(int8_path),
    CalibReader(str(calib_path)),
    calibrate_method=CalibrationMethod.MinMax,
    quant_format=QuantFormat.QDQ,
    per_channel=True,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
    reduce_range=True,
    extra_options={'WeightSymmetric': True, 'ActivationSymmetric': False},
)
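For completeness, CalibReader is the calibration data reader passed to quantize_static. A minimal sketch of such a reader (the .npy layout and the input name here are illustrative, not my exact implementation):

import numpy as np
from onnxruntime.quantization import CalibrationDataReader

class CalibReader(CalibrationDataReader):
    def __init__(self, calib_path):
        # Calibration samples stored as a float32 array of shape (N, C, H, W),
        # preprocessed the same way as the training data (illustrative layout).
        self.samples = np.load(calib_path).astype(np.float32)
        self.input_name = "input"  # must match input_names used at export time
        self.index = 0

    def get_next(self):
        # Return one sample per call; returning None ends calibration.
        if self.index >= len(self.samples):
            return None
        batch = self.samples[self.index:self.index + 1]
        self.index += 1
        return {self.input_name: batch}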
I generated the C files for this ONNX model using the following command:
stedgeai generate --target stm32n6 \
    --model "int8.onnx" \
    --st-neural-art \
    --inputs-ch-position chfirst \
    --input-data-type float32 \
    --output-data-type float32
Then I flashed the board with:
python n6_loader.py --n6-loader-config config_n6l.json
I've tried this on both an N6-DK and an N6-Nucleo.
Validation on target shows a difference between the ONNX and hardware results. I went further and ran the original ONNX model, the optimized ONNX generated by "stedgeai generate", and the on-board model (via stm_ai_runner) on the same set of inputs. The results show that the values returned by the board are clipped.
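For reference, the host-side part of that comparison looks roughly like this (a sketch with illustrative file names; the on-board results come from stm_ai_runner rather than onnxruntime):

import numpy as np
import onnxruntime as ort

# Same set of inputs for every run (illustrative path).
inputs = np.load("test_inputs.npy").astype(np.float32)

sess_ref = ort.InferenceSession("int8.onnx")            # original quantized model
sess_opt = ort.InferenceSession("int8_optimized.onnx")  # ONNX rewritten by stedgeai generate (illustrative name)

name_ref = sess_ref.get_inputs()[0].name
name_opt = sess_opt.get_inputs()[0].name

for x in inputs:
    x = x[None, ...]  # add batch dimension
    y_ref = sess_ref.run(None, {name_ref: x})[0]
    y_opt = sess_opt.run(None, {name_opt: x})[0]
    # The board outputs (collected with stm_ai_runner) are compared the same way.
    print("ref min/max:", y_ref.min(), y_ref.max(),
          "| opt min/max:", y_opt.min(), y_opt.max())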
I first suspected the final int8-to-float32 conversion layer added by --output-data-type float32, but it turns out the raw int8 outputs are clipped as well.
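One way to check whether the plateau simply corresponds to int8 saturation is to read the scale and zero-point of the final DequantizeLinear from the quantized model; the representable float range is then [scale * (-128 - zp), scale * (127 - zp)]. A sketch, assuming the graph output is produced directly by a DequantizeLinear node (the usual QDQ layout):

import numpy as np
import onnx
from onnx import numpy_helper

m = onnx.load("int8.onnx")
inits = {t.name: numpy_helper.to_array(t) for t in m.graph.initializer}

# Find the DequantizeLinear node that produces the graph output.
out_name = m.graph.output[0].name
dq = next(n for n in m.graph.node
          if n.op_type == "DequantizeLinear" and out_name in n.output)

scale = float(np.ravel(inits[dq.input[1]])[0])
zp = int(np.ravel(inits[dq.input[2]])[0]) if len(dq.input) > 2 else 0

print("output scale:", scale, "zero_point:", zp)
print("representable float range:", (scale * (-128 - zp), scale * (127 - zp)))

If the clipping level in clipped-output.png matches this range, the plateau would just be int8 saturation coming from the output quantization parameters.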
Attached are my original quantized ONNX file and the ONNX file produced by stedgeai during the generate command.
I've also tried different ONNX quantization parameters (with and without per_channel, with and without reduce_range).
Thanks for your help