2024-11-19 02:56 AM
Hello,
I have a large .onnx model which, as-is, I am unable to run on my STM32 board. To tackle this, I wanted to quantize the model to reduce its size. I followed the official STM32 wiki (https://wiki.st.com/stm32mcu/wiki/AI:X-CUBE-AI_support_of_ONNX_and_TensorFlow_quantized_models): first I preprocess the model with onnxruntime as described there, then I convert the preprocessed model to opset 13, and finally, using code almost identical to the wiki example, I apply static quantization, which drastically reduces the model size. As in the wiki, I also tried quantizing both with and without the per_channel=True option, but both variants run into the same problem.
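For reference, this is roughly the code I used, following the wiki example. The file names, input tensor name ("input"), input shape, and the QuantFormat/QuantType choices are placeholders from my side, not necessarily the exact settings in the wiki:

```python
import numpy as np
import onnx
from onnx import version_converter
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

# Step 1: preprocessing is done beforehand with the onnxruntime helper, e.g.:
#   python -m onnxruntime.quantization.preprocess --input model.onnx --output model_prep.onnx

# Step 2: convert the preprocessed model to opset 13.
model = onnx.load("model_prep.onnx")
model_op13 = version_converter.convert_version(model, 13)
onnx.save(model_op13, "model_prep_op13.onnx")

# Step 3: static quantization with a small calibration set
# (random data here only as a placeholder for real calibration samples).
class RandomDataReader(CalibrationDataReader):
    def __init__(self, input_name, shape, num_samples=16):
        self.data = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_samples)]
        )

    def get_next(self):
        return next(self.data, None)

quantize_static(
    "model_prep_op13.onnx",
    "model_quant.onnx",
    RandomDataReader("input", (1, 3, 224, 224)),  # placeholder input name/shape
    quant_format=QuantFormat.QDQ,
    per_channel=True,  # also tried with per_channel=False
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)
```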
When I try to analyze the quantized model using X-CUBE-AI, I get the following error:
How can I fix this? The strange part is that the original, larger model can be analyzed successfully, so the quantization process must have introduced a change that made the model unanalyzable. I am also using the latest X-CUBE-AI package, version 9.1.0. Additionally, I tried running the quantized model using Python and onnxruntime with random inputs, and it produced outputs without any errors.
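For completeness, this is roughly how I checked the quantized model in Python with onnxruntime and a random input (it assumes a single float32 input; the input name and shape are read from the model itself):

```python
import numpy as np
import onnxruntime as ort

# Load the quantized model and inspect its first input.
sess = ort.InferenceSession("model_quant.onnx")
inp = sess.get_inputs()[0]

# Replace any symbolic dimensions (e.g. dynamic batch) with 1.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]

# Run inference on random data; this completes without errors for me.
x = np.random.rand(*shape).astype(np.float32)
outputs = sess.run(None, {inp.name: x})
print([o.shape for o in outputs])
```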
I have attached both the original, larger model and the quantized model to this post.