
What's the best practice for porting a pretrained PyTorch model to an STM32 MCU?

ZFang.1
Associate II

Dear support-team,

I've encountered quite a few problems when loading a pre-trained PyTorch MobileNetV2 model in X-CUBE-AI.

I have tried the following with the stm32ai command-line application located in the X-CUBE-AI 7.0.0 installation repository:

  1. torch fp32 --> ONNX fp32 --> ONNX int8 (external tool)
  2. torch fp32 --> ONNX fp32 --> Keras (h5) fp32
  3. converted Keras (h5) fp32 --> TFLite fp32

I found that:

  1. The quantized ONNX model cannot be imported and analyzed directly in X-CUBE-AI. The built-in quantization tool is only aimed at Keras models. Does this mean that we cannot use a quantized int8 ONNX model on the STM32 platform?
  2. Since only channel-last (NHWC) models are supported on the STM32 MCU, the torch model has to be converted from NCHW to NHWC. However, the Transpose operator does not seem to be supported in X-CUBE-AI. If that is the case, how would you recommend porting a PyTorch model to the STM32 platform? Should we use the torch JIT format directly? Is the TorchScript format supported on the STM32 platform?

Looking forward to your answer,

Best regards,

Zhen

VTOMA.1
Associate

Dear Zhen,

Thanks for your questions.

Please find my answers below:

1) We currently have limited support for quantized ONNX models in X-CUBE-AI. Only statically quantized models are supported, and only the QLinearConv, QLinearMatMul, QuantizeLinear and DequantizeLinear layers (as explained in the documentation). Post-training quantization with the X-CUBE-AI CLI is supported only for Keras models, so external tools should be used to quantize ONNX models.
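
One quick way to check in advance whether a quantized ONNX model stays within this operator set is to list the operator types present in its graph. Below is a minimal sketch; the file name is a placeholder:

    # Sketch: list the operator types in a quantized ONNX model so they can be
    # compared against the operators supported by X-CUBE-AI (file name is a placeholder).
    import onnx

    model = onnx.load("mobilenetv2_int8.onnx")
    op_types = sorted({node.op_type for node in model.graph.node})
    print(op_types)  # look for e.g. QLinearConv, QuantizeLinear, DequantizeLinear, ...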

2) X-CUBE-AI handles both channel-last and channel-first models (ONNX models are typically channel first). I would recommend taking a look at the "Channel first support for ONNX model" section in the X-CUBE-AI documentation. Moreover, the Transpose layer is also supported.
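
As an illustration of the channel-first path, a pretrained PyTorch MobileNetV2 can be exported to ONNX roughly as sketched below; the file name and opset version are assumptions, not values from this thread:

    # Sketch: export a pretrained torchvision MobileNetV2 to ONNX.
    # The exported graph stays channel first (NCHW), which X-CUBE-AI can import.
    import torch
    import torchvision

    model = torchvision.models.mobilenet_v2(pretrained=True).eval()  # newer torchvision uses weights=...
    dummy = torch.randn(1, 3, 224, 224)  # NCHW dummy input
    torch.onnx.export(
        model, dummy, "mobilenetv2_fp32.onnx",
        input_names=["input"], output_names=["output"],
        opset_version=13,  # the opset choice is an assumption
    )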

If you still encounter any issues, please let us know.

It could be useful for us to see the error messages you get.

Thanks and Best Regards

Valeria

Dear Valeria,

Many thanks for the prompt reply. It's very helpful.

I tried to load my quantized int8 MobileNetV2 again in X-CUBE-AI, and the error message was:

NotImplementedError: Unsupported layer types: QLinearAdd, QLinearGlobalAveragePool
[8362] Failed to execute script stm32ai

which confirms that this type of ONNX model is not yet supported on your platform.

To find a workaround, I also tried to convert the ONNX fp32 model to Keras/TensorFlow and then to TFLite, as I understood that TFLite is better supported on your platform. However, although both conversions succeeded, neither model can be analyzed in X-CUBE-AI. The full error messages from loading the Keras fp32 and the TFLite fp32 models are shown in the attachment [TF version: 2.5.0; Keras version: 2.5.0].
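
For context, a minimal sketch of such an ONNX-to-Keras-to-TFLite conversion is shown below; the use of onnx2keras and the input name "input" are assumptions for illustration, not details confirmed in this thread:

    # Sketch: ONNX fp32 -> Keras (h5) -> TFLite fp32, assuming onnx2keras.
    # change_ordering=True rewrites the graph from NCHW to NHWC (channels last).
    import onnx
    import tensorflow as tf
    from onnx2keras import onnx_to_keras

    onnx_model = onnx.load("mobilenetv2_fp32.onnx")
    k_model = onnx_to_keras(onnx_model, ["input"], change_ordering=True)
    k_model.save("mobilenetv2_fp32.h5")

    converter = tf.lite.TFLiteConverter.from_keras_model(k_model)
    with open("mobilenetv2_fp32.tflite", "wb") as f:
        f.write(converter.convert())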

Do you know where the problem comes from?

Looking forward to your reply,

Best,

Zhen

If you look at the X-CUBE-AI documentation in STM32CubeIDE, there is a section on supported ONNX operators (it's called "ONNX toolbox support").

It is located under your home directory

~/STM32Cube/Repository/Pack/STMicroelectronics/X-CUBE-AI/7.1.0/Documentation/index.html

It can also be accessed directly from STM32CubeMX using Help -> X-Cube-AI Documentation. The menu option is available once you have opened a project in STM32CubeMX with the X-CUBE-AI pack enabled in the project.

This should help you find out which ONNX operators are supported.

If you haven't tried it, I recommend using the application template, as this will set up your inputs/outputs correctly (I've had issues here previously). The template will also generate the code, so I highly recommend giving it a go.

Hope this helps 🙂

VTOMA.1
Associate

Dear Zhen,

I confirm that you can find the list of the supported operators in ~/STM32Cube/Repository/Pack/STMicroelectronics/X-CUBE-AI/7.1.0/Documentation/index.html.

The QLinearAdd and QLinearGlobalAveragePool layers are not part of the vanilla ONNX operator set. How did you quantize your model? Did you use the quantization script from onnxruntime?

The error you got with the Keras model is due to channel-first convolutions, which are not supported in X-CUBE-AI for the Keras framework.
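
A quick way to confirm this is to inspect the data_format of the convolution layers in the converted h5 model; a minimal sketch, with the file name as a placeholder:

    # Sketch: check whether the converted Keras model uses channels-last convolutions
    # (X-CUBE-AI expects 'channels_last' for Keras models).
    import tensorflow as tf

    model = tf.keras.models.load_model("mobilenetv2_fp32.h5")
    for layer in model.layers:
        if isinstance(layer, (tf.keras.layers.Conv2D, tf.keras.layers.DepthwiseConv2D)):
            print(layer.name, layer.data_format)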

The error you got with TFLite is not totally clear. Could you try again with the newest X-CUBE-AI 7.1.0 version?

Thanks and Best Regards

Valeria

Dear Valeria,

Thank you for your reply.

Regarding the quantization of the ONNX model: yes, I used the quantize_static function from the onnxruntime.quantization module. I wrote a custom CalibrationDataReader class to load and preprocess the calibration dataset.
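
For completeness, a minimal sketch of that kind of setup is shown below; the class name, input name, folder path and preprocessing are placeholders rather than the exact code I used:

    # Sketch: static int8 quantization with onnxruntime using a custom calibration reader.
    # File names, input name and preprocessing are placeholders.
    import glob
    import numpy as np
    from PIL import Image
    from onnxruntime.quantization import CalibrationDataReader, QuantFormat, QuantType, quantize_static

    class ImageFolderCalibReader(CalibrationDataReader):
        def __init__(self, image_dir, input_name="input"):
            self._files = iter(sorted(glob.glob(image_dir + "/*.jpg")))
            self._input_name = input_name

        def get_next(self):
            path = next(self._files, None)
            if path is None:
                return None  # no more calibration samples
            img = Image.open(path).convert("RGB").resize((224, 224))
            x = np.asarray(img, dtype=np.float32) / 255.0
            x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]  # ImageNet normalization
            x = x.transpose(2, 0, 1)[None].astype(np.float32)  # HWC -> NCHW, add batch dim
            return {self._input_name: x}

    quantize_static(
        model_input="mobilenetv2_fp32.onnx",
        model_output="mobilenetv2_int8.onnx",
        calibration_data_reader=ImageFolderCalibReader("calibration_images"),
        quant_format=QuantFormat.QOperator,  # emits fused QLinear* operators
        activation_type=QuantType.QUInt8,
        weight_type=QuantType.QInt8,
    )

Note that quant_format=QuantFormat.QOperator produces fused QLinear* operators (which is where QLinearAdd and QLinearGlobalAveragePool come from), while QuantFormat.QDQ keeps the float operators wrapped in QuantizeLinear/DequantizeLinear pairs; whether the QDQ form imports cleanly into a given X-CUBE-AI version is something to verify against its documentation.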

Regarding the error I got from the converted TFLite model, I confirm that the problem is solved with the new release of X-CUBE-AI 7.1.0.

Best regards,

Zhen