2025-09-12 2:08 AM
Hi all,
According to the STM32N6 reference manual (see attached figure), both the diagram and the text description state that the ConvAcc should support 16-bit input × 16-bit weight (16×16) operations.
To verify this, I designed a simple test model (see attached files):
Input: 4×4 tensor (all ones)
Kernel: 3×3 (all ones)
Expected output: 2×2 tensor
When I set the SIMD field to 1, the ConvAcc performs 8-bit input × 8-bit weight (8×8) operations, and the output is correct as expected.
When I set the SIMD field to 2, the ConvAcc performs 16-bit input × 8-bit weight (16×8) operations, and the results are also correct.
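For reference, the expected output is easy to compute on the host: with a 4×4 input of ones and a 3×3 kernel of ones, a plain valid-padding convolution (stride 1, no bias, which is what I assume the test model does) gives a 2×2 output where every element is 9. A minimal sketch in C:

```c
#include <stdint.h>
#include <stdio.h>

/* Reference 2D convolution ("valid" padding, stride 1, no bias) used to
 * check the ConvAcc result: a 4x4 input of ones convolved with a 3x3
 * kernel of ones gives a 2x2 output where every element equals 9. */
#define IN_H 4
#define IN_W 4
#define K    3
#define OUT_H (IN_H - K + 1)
#define OUT_W (IN_W - K + 1)

int main(void)
{
    int16_t input[IN_H][IN_W];
    int16_t weight[K][K];
    int32_t output[OUT_H][OUT_W];

    for (int i = 0; i < IN_H; i++)
        for (int j = 0; j < IN_W; j++)
            input[i][j] = 1;
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            weight[i][j] = 1;

    for (int oy = 0; oy < OUT_H; oy++) {
        for (int ox = 0; ox < OUT_W; ox++) {
            int32_t acc = 0;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    acc += input[oy + ky][ox + kx] * weight[ky][kx];
            output[oy][ox] = acc;
            printf("%ld ", (long)output[oy][ox]);   /* expected: 9 */
        }
        printf("\n");
    }
    return 0;
}
```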
My questions are:
Does the ConvAcc really support 16×16 convolution (16-bit input × 16-bit weight) as described in the reference manual?
If yes, how should I correctly configure the fields of LL_Convacc_InitTypeDef to enable 16×16 operation? (e.g., simd, inbytes_f, outbytes_o, kseten, etc.)
2025-09-12 2:15 AM
Thanks in advance!
2025-09-16 4:52 AM - edited 2025-09-16 6:57 AM
Hello @yangweidage,
End users should not write code for the NPU themselves; we do not support that.
The NPU is meant to be used via the ST Edge AI Core:
https://stedgeai-dc.st.com/assets/embedded-docs/index.html
As of today, the toolchain does not support 16×16 convolution, even though the hardware can, as the documentation points out. When we extend this, we will most likely start by supporting 8×16 convolution.
In any case, as I said, for now the NPU is to be used for neural networks only, with the code generated by the ST Edge AI Core.
Have a good day,
Julian
2025-11-04 3:14 PM
This is not supported, but if you want to venture into unexplored areas, you can do 16×16 with simd=0; it seems to work the same way as 16×8 (which uses 8-bit weights).
deepmode and kseten seem to require 8×8 mode (simd=1) and are not supported in the other modes, so they need to be 0 in 16×16 mode.
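Purely as an illustration of those field settings, and definitely not an ST-supported configuration, a hypothetical init sketch could look like the following. The struct and field names (simd, deepmode, kseten, inbytes_f, outbytes_o) are only the ones discussed in this thread; the header name, the byte-width values for 16-bit data, and the init call are my assumptions, so check them against the actual LL driver.

```c
#include "ll_aton.h"   /* assumption: the NPU LL header that declares LL_Convacc_InitTypeDef */

/* Hypothetical sketch of a 16x16 (16-bit activations x 16-bit weights)
 * ConvAcc setup, based only on the field names discussed in this thread.
 * Values and the init call are assumptions, not an ST-supported recipe. */
void convacc_16x16_sketch(void)
{
    LL_Convacc_InitTypeDef conv_init = {0};

    conv_init.simd       = 0;  /* 0 reportedly selects the non-SIMD (16x16) path      */
    conv_init.deepmode   = 0;  /* deepmode appears to require 8x8 mode (simd = 1)     */
    conv_init.kseten     = 0;  /* kernel-set feature likewise limited to 8x8 mode     */
    conv_init.inbytes_f  = 2;  /* assumption: 2-byte (16-bit) feature/activation data */
    conv_init.outbytes_o = 2;  /* assumption: 2-byte output data                      */

    /* Remaining geometry fields (kernel size, strides, tensor dimensions,
     * weight layout) are omitted; as noted above, the 16-bit weights also
     * have to be rearranged in a special manner. */

    /* LL_Convacc_Init(CONVACC_INSTANCE, &conv_init); // assumed init call, unverified */
    (void)conv_init;
}
```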
I've only tried 16×16 mode with a 1×1 convolution of a 1D vector (that is, just multiplying a 2D matrix by a 1D vector).
Obviously, as this is unsupported by ST, things can go wrong unexpectedly (for example, the weights must be rearranged in a special manner).
If you need 16-bit weights due to precision issues at 8 bits, you can also try Quantization-Aware Training, which may give you an 8-bit model with good enough precision (and faster execution with lower RAM usage).