2024-11-19 02:56 AM
Hello,
I have a large .onnx model which, untouched, I am unable to run on my STM32 board. To tackle this problem, I wanted to quantize the model in order to decrease its size. I followed the official STM wiki (https://wiki.st.com/stm32mcu/wiki/AI:X-CUBE-AI_support_of_ONNX_and_TensorFlow_quantized_models). First, I preprocess the model using onnxruntime as described in the wiki, and then I change the opset version of the preprocessed model to 13. Finally, using almost identical code to the wiki, I apply static quantization to the model, drastically reducing its size. (As in the wiki, I also tried quantizing both with and without the per_channel=True option, but both attempts failed.)
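For reference, the three steps above (preprocess, opset upgrade, static quantization) can be sketched roughly as follows. The file paths, function wrapper, and calibration data reader are placeholders of my own, not the exact script; the onnxruntime calls mirror what the wiki describes:

```python
def quantize_onnx_model(fp32_path: str, int8_path: str, data_reader) -> None:
    """Sketch: preprocess an FP32 ONNX model, bump its opset to 13,
    then apply static int8 quantization with onnxruntime."""
    # Imported lazily so the sketch can be read without onnxruntime installed.
    import onnx
    from onnx import version_converter
    from onnxruntime.quantization import QuantFormat, QuantType, quantize_static
    from onnxruntime.quantization.shape_inference import quant_pre_process

    pre_path = fp32_path.replace(".onnx", "_pre.onnx")

    # Step 1: preprocessing (shape inference + graph optimization), as in the wiki.
    quant_pre_process(fp32_path, pre_path)

    # Step 2: change the opset version of the preprocessed model to 13.
    model = onnx.load(pre_path)
    model = version_converter.convert_version(model, 13)
    onnx.save(model, pre_path)

    # Step 3: static post-training quantization to int8.
    # data_reader is a CalibrationDataReader yielding representative inputs.
    quantize_static(
        pre_path,
        int8_path,
        data_reader,
        quant_format=QuantFormat.QDQ,
        per_channel=True,  # the post tried both True and False
        weight_type=QuantType.QInt8,
        activation_type=QuantType.QInt8,
    )
```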
When trying to analyze this model using X-CUBE-AI, I get the following error:
How can I fix this? The strange part is that the original, larger model can be analyzed successfully, so the quantization process must have introduced a change that made the model unanalyzable. I am also using the latest X-CUBE-AI package, version 9.1.0. Additionally, I tried running the quantized model using Python and onnxruntime with random inputs, and it produced outputs without any errors.
I have attached both the original, larger model and the quantized model in this post.
2024-11-20 02:02 AM - edited 2024-11-20 02:02 AM
Hello @casperbroch,
Using your quantized model, I am able to reproduce your issue.
Your ONNX version correctly matches what is expected for the tool.
However, I performed a quantization of your original ONNX model through the ST Edge AI Developer Cloud, and that version runs successfully. You will find the file attached to this message (for benchmarking purposes only, since it was quantized with random data).
When quantizing your model, could you try the following settings and see whether they resolve your issue:
quant_format=QuantFormat.QDQ,
per_channel=True,
weight_type=QuantType.QInt8,
activation_type=QuantType.QInt8,
optimize_model=False,
reduce_range=True,
extra_options={'WeightSymmetric': True, 'ActivationSymmetric': False}
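For convenience, here is how those settings would slot into a quantize_static call. The model paths and the calibration data reader are placeholders (not part of the suggestion); the keyword arguments are exactly the ones listed above:

```python
# The suggested settings, collected in one place for reuse.
RECOMMENDED_SETTINGS = {
    "per_channel": True,
    "optimize_model": False,
    "reduce_range": True,
    "extra_options": {"WeightSymmetric": True, "ActivationSymmetric": False},
}


def quantize_with_recommended_settings(model_in: str, model_out: str, data_reader) -> None:
    """Sketch: apply static quantization with the settings suggested above.
    data_reader is a CalibrationDataReader over representative inputs."""
    # Imported lazily so the sketch can be read without onnxruntime installed.
    from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

    quantize_static(
        model_in,
        model_out,
        data_reader,
        quant_format=QuantFormat.QDQ,
        weight_type=QuantType.QInt8,
        activation_type=QuantType.QInt8,
        **RECOMMENDED_SETTINGS,
    )
```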
Best regards,
Yanis