2025-09-12 2:08 AM
Hi all,
According to the STM32N6 reference manual (see attached figure), both the diagram and the text description state that the ConvAcc should support 16-bit input × 16-bit weight (16×16) operations.
To verify this, I designed a simple test model (see attached files):
Input: 4×4 tensor (all ones)
Kernel: 3×3 (all ones)
Expected output: 2×2 tensor
When I set the SIMD field to 1, the ConvAcc performs 8-bit input × 8-bit weight (8×8) operations, and the output is correct as expected.
When I set the SIMD field to 2, the ConvAcc performs 16-bit input × 8-bit weight (16×8) operations, and the results are also correct.
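For reference, the expected output is easy to compute on the host: with a 4×4 input of ones and a 3×3 kernel of ones, a plain valid-padding convolution (stride 1, no bias, which is what I assume the test model does) gives a 2×2 output where every element is 9. A minimal sketch in C:

```c
#include <stdint.h>
#include <stdio.h>

/* Reference 2D convolution ("valid" padding, stride 1, no bias) used to
 * check the ConvAcc result: a 4x4 input of ones convolved with a 3x3
 * kernel of ones gives a 2x2 output where every element equals 9. */
#define IN_H 4
#define IN_W 4
#define K    3
#define OUT_H (IN_H - K + 1)
#define OUT_W (IN_W - K + 1)

int main(void)
{
    int16_t input[IN_H][IN_W];
    int16_t weight[K][K];
    int32_t output[OUT_H][OUT_W];

    for (int i = 0; i < IN_H; i++)
        for (int j = 0; j < IN_W; j++)
            input[i][j] = 1;
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            weight[i][j] = 1;

    for (int oy = 0; oy < OUT_H; oy++) {
        for (int ox = 0; ox < OUT_W; ox++) {
            int32_t acc = 0;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    acc += input[oy + ky][ox + kx] * weight[ky][kx];
            output[oy][ox] = acc;
            printf("%ld ", (long)output[oy][ox]);   /* expected: 9 */
        }
        printf("\n");
    }
    return 0;
}
```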
My questions are:
Does the ConvAcc really support 16×16 convolution (16-bit input × 16-bit weight) as described in the reference manual?
If yes, how should I correctly configure the fields of LL_Convacc_InitTypeDef to enable 16×16 operation? (e.g., simd, inbytes_f, outbytes_o, kseten, etc.)
2025-09-12 2:15 AM
Thanks in advance!
2025-09-16 4:52 AM - edited 2025-09-16 6:57 AM
Hello @yangweidage,
End users should not write code for the NPU themselves; we do not support that.
The NPU is meant to be used via the ST Edge AI Core:
https://stedgeai-dc.st.com/assets/embedded-docs/index.html
As of today, the toolchain does not support 16×16 convolution, even though the hardware can, as the documentation points out. When we extend this, we will most likely start by supporting 8×16 convolution.
In any case, as I said, for now the NPU is to be used for neural networks only, with the code generated by the ST Edge AI Core.
Have a good day,
Julian
2025-11-04 3:14 PM
This is not supported, but if you want to venture into unexplored areas, you can do 16×16 with simd=0; it seems to work the same way as 16×8 (which uses 8-bit weights).
deepmode and kseten seem to require 8×8 mode (simd=1) and are not supported in the other modes, so they need to be 0 in 16×16 mode.
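Purely as an illustration of those field settings, and definitely not an ST-supported configuration, a hypothetical init sketch could look like the following. The struct and field names (simd, deepmode, kseten, inbytes_f, outbytes_o) are only the ones discussed in this thread; the header name, the byte-width values for 16-bit data, and the init call are my assumptions, so check them against the actual LL driver.

```c
#include "ll_aton.h"   /* assumption: the NPU LL header that declares LL_Convacc_InitTypeDef */

/* Hypothetical sketch of a 16x16 (16-bit activations x 16-bit weights)
 * ConvAcc setup, based only on the field names discussed in this thread.
 * Values and the init call are assumptions, not an ST-supported recipe. */
void convacc_16x16_sketch(void)
{
    LL_Convacc_InitTypeDef conv_init = {0};

    conv_init.simd       = 0;  /* 0 reportedly selects the non-SIMD (16x16) path      */
    conv_init.deepmode   = 0;  /* deepmode appears to require 8x8 mode (simd = 1)     */
    conv_init.kseten     = 0;  /* kernel-set feature likewise limited to 8x8 mode     */
    conv_init.inbytes_f  = 2;  /* assumption: 2-byte (16-bit) feature/activation data */
    conv_init.outbytes_o = 2;  /* assumption: 2-byte output data                      */

    /* Remaining geometry fields (kernel size, strides, tensor dimensions,
     * weight layout) are omitted; as noted above, the 16-bit weights also
     * have to be rearranged in a special manner. */

    /* LL_Convacc_Init(CONVACC_INSTANCE, &conv_init); // assumed init call, unverified */
    (void)conv_init;
}
```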
I've only tried 16×16 mode with a 1×1 convolution of a 1D vector (that is, just multiplying a 2D matrix by a 1D vector).
Obviously, as this is unsupported by ST, things can go wrong unexpectedly (for example, the weights must be rearranged in a special manner).
If you need 16-bit weights due to precision issues at 8 bits, you can also try Quantization-Aware Training, which may give you an 8-bit model with good enough precision (and faster execution with lower RAM usage).