
STM32N6 onboard inference returns clipped values

cp5900
Associate II

Hi,

I'm trying to deploy a quantized ONNX neural network on an STM32N6-Nucleo, and inference on the board returns clipped values (see clipped-output.png).

I first exported and quantized my PyTorch neural network to ONNX using the following configuration (the same one used in this end-to-end notebook example):

 

torch.onnx.export(
    model,
    inputs,
    str(onnx_path),
    input_names=input_names,
    output_names=output_names,
    opset_version=15,
    export_params=True,
    do_constant_folding=True,
)
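
For reference, a quick way to check that the exported float ONNX matches the PyTorch model before quantizing is something like this (a rough sketch reusing the variables above; it assumes a single-output model on CPU):

# Rough sketch: compare the PyTorch model and the exported float ONNX on one batch
import numpy as np
import onnxruntime as ort
import torch

model.eval()
with torch.no_grad():
    torch_out = model(inputs).numpy()  # assumes a single output tensor on CPU

sess = ort.InferenceSession(str(onnx_path))
onnx_out = sess.run(None, {sess.get_inputs()[0].name: inputs.numpy()})[0]

print("max abs diff:", np.abs(torch_out - onnx_out).max())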

quantize_static(
    str(onnx_path),
    str(int8_path),
    CalibReader(str(calib_path)),
    calibrate_method=CalibrationMethod.MinMax,
    quant_format=QuantFormat.QDQ,
    per_channel=True,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
    reduce_range=True,
    extra_options={'WeightSymmetric': True, 'ActivationSymmetric': False},
)
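
For context, CalibReader is just an onnxruntime CalibrationDataReader that feeds the calibration samples one by one; a minimal sketch of what it looks like (assuming the calibration set is a single .npy array with one input named "input"; both names are illustrative):

from onnxruntime.quantization import CalibrationDataReader
import numpy as np

class CalibReader(CalibrationDataReader):
    """Minimal calibration reader: yields one {input_name: sample} dict per call."""
    def __init__(self, calib_path, input_name="input"):
        # Assumed layout: one .npy file of shape (N, C, H, W), float32
        self.data = np.load(calib_path).astype(np.float32)
        self.input_name = input_name
        self.idx = 0

    def get_next(self):
        if self.idx >= len(self.data):
            return None  # end of the calibration set
        sample = {self.input_name: self.data[self.idx:self.idx + 1]}
        self.idx += 1
        return sample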

I've generated the C files for this ONNX model using the following command:
stedgeai generate --target stm32n6 \
    --model "int8.onnx" \
    --st-neural-art \
    --inputs-ch-position chfirst \
    --input-data-type float32 \
    --output-data-type float32

Then I flashed the board with python n6_loader.py --n6-loader-config config_n6l.json.
I've tried this on both an N6-DK and an N6-Nucleo.

Validation on target shows a difference between the ONNX and hardware results. I went further and ran the original ONNX model, the optimized ONNX generated by "stedgeai generate", and the onboard model (through stm_ai_runner) on the same set of inputs. The results show that the values returned by the board are clipped.
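
For reference, here is roughly how I compare the outputs (the file names are illustrative, and the board outputs are dumped separately through stm_ai_runner):

import numpy as np
import onnxruntime as ort

# Illustrative file names
sess = ort.InferenceSession("int8.onnx")
x = np.load("sample_input.npy").astype(np.float32)

ref = sess.run(None, {sess.get_inputs()[0].name: x})[0]
board = np.load("board_output.npy")  # output dumped from the target

print("onnx  min/max:", ref.min(), ref.max())
print("board min/max:", board.min(), board.max())
print("max abs diff :", np.abs(ref - board).max())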

I suspected the final int8-->float32 conversion layer added by --output-data-type float32, but it turns out that the raw int8 outputs are clipped as well.
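
To rule that out, the raw int8 outputs can also be dequantized by hand with the scale/zero-point of the model's last DequantizeLinear node and compared against the float results; roughly (the scale and zero_point values below are placeholders):

import numpy as np

# Placeholders: take the real values from the output DequantizeLinear node
scale, zero_point = 0.05, 0

raw_int8 = np.load("board_output_int8.npy").astype(np.int32)  # illustrative dump
dequant = scale * (raw_int8 - zero_point)

print("int8 range   :", raw_int8.min(), raw_int8.max())
print("dequant range:", dequant.min(), dequant.max())
# If raw_int8 already saturates at -128/127, the clipping happens before the
# int8 --> float32 conversion, not inside it.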

Attached are my original quantized ONNX file and the ONNX generated by stedgeai during the generate command.

I've also tried different ONNX quantization parameters (with and without per_channel, with and without reduce_range).

 

clipped-output.png

 Thanks for your help


cp5900
Associate II

Attached are the 27 input files used to generate the previous plot.

cp5900
Associate II

Hi, I'm still blocked on this matter. Let me know if you cannot reproduce the problem. 
Thanks for your help

MCHTO.1
ST Employee

Hello,

When quantizing a model, it's important to ensure that the outputs of your initial floating-point model are within a consistent value range; for example, all outputs should fall within [-1, 1] or [0, 1]. This consistency helps achieve better quantization results and minimizes potential accuracy loss.
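
For example, you can inspect the output range of your floating-point model over the calibration set before quantizing; a rough sketch (file and tensor names are illustrative):

import numpy as np
import onnxruntime as ort

# Illustrative names: the float model and the calibration data
sess = ort.InferenceSession("model_fp32.onnx")
calib = np.load("calib.npy").astype(np.float32)  # shape (N, C, H, W)
in_name = sess.get_inputs()[0].name

mins, maxs = [], []
for i in range(len(calib)):
    out = sess.run(None, {in_name: calib[i:i + 1]})[0]
    mins.append(out.min())
    maxs.append(out.max())

# A single int8 scale has to cover this whole range; a very wide or
# inconsistent range leads to saturation (clipping) after quantization.
print("output range over calibration set:", min(mins), max(maxs))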

I hope this helps.

Best regards,

Hi,

Thanks for your answer. If this were the problem, shouldn't the issue already show up when running the quantized ONNX model itself?

However, I've tried running the original float32 ONNX model on the board: performance is terrible, but the values are not clipped, so there is definitely an issue with the quantization.

I'll try to keep the outputs within a consistent value range.

Best,