2026-03-16 12:34 AM
I've implemented a vision transformer for the STM32N6 NPU.
Repository: https://github.com/minchoCoin/stm32n6-transformer
The BatchMatmul operator is not yet supported on the NPU and runs on the CPU, but inference is still much faster than on the STM32H7 (inference time was reduced by about 90% compared to the STM32H7) because the fully connected layers and convolutions are executed on the NPU.
The Vision Transformer for the STM32N6 NPU has three differences from the original ViT (A. Dosovitskiy et al., 2020):
1. Patch Embedding (PE) is performed in the pre-processing step before model input: if the model contains both PE and self-attention, STEdgeAI cannot interpret the model structure (I don't know why yet...).
2. Remove the bias parameter from the fully connected layers, or replace them with 1x1 Conv2D: a fully connected layer in ONNX expects a 2-dimensional input, but the fully connected layers in ViT have a 3-dimensional input (batch, patch, embedding). The compiler therefore drops the batch axis and treats the patch axis as the batch axis; however, an error occurs when the batch size is not 1 and the fully connected layer has a bias.
3. Use the ReLU activation function in the MLP: TFLite GELU is not supported.
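Since the repository's exact layer definitions aren't shown here, the following NumPy sketch (hypothetical function and variable names) illustrates the three modifications: patch embedding moved into host-side pre-processing, a bias-free fully connected layer expressed as a 1x1 Conv2D, and a ReLU MLP:

```python
import numpy as np

def patch_embed_preprocess(img, proj, patch=16):
    """Difference 1: patch embedding done on the host before model input.
    img: (C, H, W) float image; proj: (C*patch*patch, D) projection matrix."""
    C, H, W = img.shape
    ph, pw = H // patch, W // patch
    # cut the image into non-overlapping patches and flatten each one
    patches = (img.reshape(C, ph, patch, pw, patch)
                  .transpose(1, 3, 0, 2, 4)
                  .reshape(ph * pw, C * patch * patch))
    return patches @ proj                      # (num_patches, D) tokens

def fc_as_1x1_conv(x, w):
    """Difference 2: a bias-free fully connected layer is the same matmul as a
    1x1 Conv2D applied to the tokens laid out as a (D_in, N, 1) feature map."""
    feat = x.T[:, :, None]                     # (D_in, N, 1) "image"
    out = np.einsum('oi,ihw->ohw', w, feat)    # per-pixel matmul = 1x1 conv
    return out[:, :, 0].T                      # back to (N, D_out)

def mlp_relu(x, w1, w2):
    """Difference 3: ReLU in place of GELU, since TFLite GELU is unsupported."""
    return np.maximum(x @ w1, 0.0) @ w2
```

The 1x1-conv trick is numerically exact: `fc_as_1x1_conv(x, w)` matches `x @ w.T` to floating-point precision, so the swap changes nothing in the math while giving the compiler a layer shape it can map onto the NPU.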
For detailed information, please refer to the slides: https://github.com/minchoCoin/stm32n6-transformer/blob/main/assets/stm32n6_transformer.pdf
2026-03-23 2:50 AM
Hi @mincho00,
Thank you for sharing.
I will show this to my colleague working on the compiler. It may be useful for them.
Have a good day,
Julian