
Can the STM32N6 run a Transformer or Multi-Head Attention?

mincho00
Associate II

I’m currently working with the STM32N6570-DK board and ST Edge AI Core v2.2.0.

I would like to know whether a lightweight Transformer model (for example, a small Vision Transformer or a minimal Transformer encoder) can be successfully converted and executed on the STM32N6. I checked that batch_matmul, transpose, etc. are listed as supported operators here: https://stm32ai-cs.st.com/assets/embedded-docs/stneuralart_operator_support.html
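For context, here is a minimal sketch of the kind of encoder block I mean (standard PyTorch modules; the actual model is in the repository linked below). The attention lowers to matmul/transpose/softmax, which is why I checked those operators:

import torch
import torch.nn as nn

class TinyEncoderBlock(nn.Module):
    # Minimal pre-norm encoder block: multi-head attention + MLP, both with residuals.
    def __init__(self, dim=64, heads=4, mlp_dim=128):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention
        return x + self.mlp(self.norm2(x))

tokens = torch.randn(1, 16, 64)  # (batch, sequence, embedding)
print(TinyEncoderBlock()(tokens).shape)  # torch.Size([1, 16, 64])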

After converting the lightweight ViT model to TFLite and ONNX, I tried to convert it using stedgeai and STM32CubeIDE, but conversion fails and only the error messages below appear.
Code, model files, and error text: https://github.com/minchoCoin/lightweight_vit
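For reference, the ONNX export follows the usual torch.onnx.export flow; a minimal sketch with a stand-in model (the real export code is in the repository above, and the input shape here is illustrative):

import torch
import torch.nn as nn

# Stand-in for the actual lightweight ViT (the real model is in the linked repo).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
dummy = torch.randn(1, 3, 32, 32)  # illustrative input shape

torch.onnx.export(
    model, dummy, "custom_vit.onnx",
    opset_version=17,          # pinning the opset can change how attention ops are lowered
    input_names=["input"], output_names=["output"],
    do_constant_folding=True,  # fold constant subgraphs before export
)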

These models were successfully analyzed with the STM32Cube.AI MCU runtime in STM32CubeIDE.

The following error messages appear when I try to convert these models (TFLite and ONNX) with the STM32Cube.AI Neural-ART runtime in STM32CubeIDE, or with stedgeai on the command line:

1. tflite

stedgeai generate --model custom_vit_int8.tflite --target stm32n6 --st-neural-art default@user_neuralart_STM32N6570-DK.json

ST Edge AI Core v2.2.0-20266 2adc00962

WARNING: nl_8 is not quantized

...

STEDGEAI_BuildAtonnExe_Win/git/onnx_backend/platform_passes/transform_gemm_fc_into_conv.cc:203: runTransform: Assertion `(b_shape.size() == 1) || ((b_shape[0].dim == M) || b_shape[0].dim == N)` failed.

      Warning: Missing Quantization info for Pow_35_exp; will consider Pow_35_exp as a native Float

...

Warning: Lowering of node=Transpose_52 kind=Transpose not yet supported. the generated code will not compile

      terminate called after throwing an instance of 'std::runtime_error'

        what():  SW mapping failed:

       Node Transpose_88 not mapped

       Internal compiler error (signo=6), please report it

2. onnx

ST Edge AI Core v2.2.0-20266 2adc00962

INTERNAL ERROR: Exported ONNX could be malformed since ONNX shape inference fails
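For reference, ONNX's own shape inference can be run locally on the exported file to narrow this down; a minimal sketch, assuming the onnx package is installed (file names are illustrative):

import onnx
from onnx import shape_inference

model = onnx.load("custom_vit_int8.onnx")  # illustrative file name
onnx.checker.check_model(model)            # structural validation
inferred = shape_inference.infer_shapes(model, strict_mode=True)  # raises on inference failure
onnx.save(inferred, "custom_vit_int8_inferred.onnx")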



Thank you in advance for your support!

3 Replies
Julian E.
ST Employee

Hello @mincho00,

 

You may try using the --use-onnx-simplifier option with your ONNX model (it may also work with the TFLite model, as the option affects the intermediate ONNX model that is generated).

Something like:

stedgeai generate --model custom_vit_int8.tflite --target stm32n6 --st-neural-art default@user_neuralart_STM32N6570-DK.json --use-onnx-simplifier
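If the flag alone does not help, you can also try running the simplifier yourself on the ONNX file before calling stedgeai; a minimal sketch, assuming the onnxsim package is installed (file names are illustrative):

import onnx
from onnxsim import simplify

model = onnx.load("custom_vit_int8.onnx")  # illustrative file name
simplified, ok = simplify(model)           # constant folding + dead-node elimination
assert ok, "onnx-simplifier could not validate the simplified model"
onnx.save(simplified, "custom_vit_int8_simplified.onnx")

Then point stedgeai generate at the simplified file as before.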

 

If it does not work, please upload your model in a .zip if you can share it. It will be useful for the dev team for future updates. Thanks!

 

Have a good day,

Julian


mincho00
Associate II

Dear Julian,

Thank you for your reply.

Unfortunately, the --use-onnx-simplifier option did not resolve the issue.
I’ve uploaded the Colab notebook (.ipynb) and the model file for reference.

(I’ve been successfully using the STM32N6 Neural-ART accelerator with CNN models, but it has been challenging to get a Transformer or multi-head attention running on the STM32N6570-DK board.)

torch==2.8.0+cu126

tensorflow==2.19.0

onnx==1.19.1

onnxruntime==1.23.2

Thank you again for your time and assistance.

Best regards,

Taehun Kim

mincho00
Associate II

Thank you, @Julian E.
Unfortunately, the '--use-onnx-simplifier' option didn’t help with converting the models.
I’ve uploaded the code, model file, and error log for your reference.

Thank you.