How to make multiplications faster in STM32G series

DKuro.1 · ‎2020-05-10

Hi,

When I take two variables and multiply them in STM32CubeIDE, is the multiplication automatically done by the hardware multiplier in the fastest way, or is there something I need to do/configure to make it faster?

Danish1 · ‎2020-05-10

It depends on which STM32G you have, and what data type your variables are.

STM32G0 only has an integer multiplier. The compiler will automatically use it if your variables are integers.

STM32G4 has an integer multiplier and a single-precision floating-point multiplier.

If your variables are integers, the compiler will automatically use the integer multiplier.

If your variables are single-precision floating-point, and you have enabled floating-point calculations in your compiler (no, I don't know how to specify this in STM32CubeIDE), then it will use the single-precision floating-point multiplier.

But if one or both of the variables are double-precision (and watch out: C treats unsuffixed floating-point constants such as 1.0 as double-precision and promotes the other operand to match, unless told otherwise) then the multiplication must be done in software.

But the quickest test is to see what assembly-code is generated for the multiply. If it is a subroutine-call then it is done in software.
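One way to run that test is to put each kind of multiply in its own small function and read the generated listing. This is only a sketch: the function names are made up for illustration, and the instructions named in the comments are what I would expect from GCC on a Cortex-M4 with the FPU options set, not output copied from a real build.

```c
#include <stdint.h>

/* Integer multiply: expect a single hardware multiply instruction
   (something like mul or muls) and no subroutine call. */
int32_t mul_i32(int32_t a, int32_t b)
{
    return a * b;
}

/* Single-precision multiply: with the FPU enabled in the compiler options,
   expect a vmul.f32 instruction on the G4. Without an FPU (G0), expect a
   call to a library routine instead. */
float mul_f32(float a, float b)
{
    return a * b;
}

/* Double-precision multiply: the single-precision FPU cannot do this,
   so expect a call (bl) to a software routine such as __aeabi_dmul. */
double mul_f64(double a, double b)
{
    return a * b;
}
```

Build the project as usual and look these functions up in the disassembly view or the list file. A bl to a helper routine inside the function means the multiply is done in software.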

Hope this helps,

Danish

berendi · ‎2020-05-10

The compiler options are set up according to the capabilities of the MCU at project creation, no need to change them.

The floating-point unit (if there is one) must be enabled in software. This is taken care of in the generated code, but you must do it yourself if you don't use generated code:

SCB->CPACR |= ((3UL << (10*2))|(3UL << (11*2)));  /* set CP10 and CP11 Full Access */

Check the system_stm32g*.c file for the above line. Even if it appears to be grayed out by some #if directives, it will be compiled in, you can verify it in the list file.
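For reference, the value that line ORs into CPACR can be worked out on any machine. The sketch below only does the bit arithmetic and does not touch hardware; the macro and function names are hypothetical, chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CP10 and CP11 each have a 2-bit access field in CPACR;
   both bits set (0b11) means full access. */
#define CPACR_CP10_FULL (3UL << (10 * 2))   /* bits 21:20 */
#define CPACR_CP11_FULL (3UL << (11 * 2))   /* bits 23:22 */

/* Hypothetical helper: the mask ORed into SCB->CPACR by the line above. */
uint32_t cpacr_fpu_mask(void)
{
    return (uint32_t)(CPACR_CP10_FULL | CPACR_CP11_FULL);
}

/* Hypothetical helper: true if a CPACR value (for example one read in the
   debugger) grants full access to both CP10 and CP11. */
bool cpacr_fpu_enabled(uint32_t cpacr)
{
    uint32_t mask = cpacr_fpu_mask();
    return (cpacr & mask) == mask;
}
```

The mask works out to 0x00F00000, so a CPACR value with those four bits set after the line has run means the FPU access bits are on.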

DKuro.1 · ‎2020-05-10

Hi berendi, thanks for your answer.

I am using the STM32G474 and the datasheet shows that it has a floating point unit. But SCB->CPACR is not initialized in the #ifdef part... strange, right?

What do you recommend I do?

DKuro.1 · ‎2020-05-10

Hi Danish,

thanks for your answer.

I am using the STM32G474 and it seems to have a floating point unit. So do you think it is correct to enable it, write my multiplications in code as usual, and let the compiler work out how to do them in hardware?

If my variables are double precision, can I just cast them?

berendi · ‎2020-05-10

Set a breakpoint on the line. Is the debugger able to stop on that line, or does it skip it? What does the SCB->CPACR register contain before / after that line of code?

DKuro.1 · ‎2020-05-11

It actually passes through there and sets the register SCB->CPACR to 0xF00000.

So it is using the floating point unit, right?

berendi · ‎2020-05-11

Setting the register enables the floating-point unit on the MCU.

Setting the -mfpu=... and -mfloat-abi=hard compiler options lets the compiler generate instructions that actually use it.

The compiler options are set up automatically to use the FPU, don't change them.

If the compiler options are right, then the macros in the #if line get defined at compilation time, and the FPU will be enabled. Because they are defined by the compiler, not in a header file, the editor mistakenly shows them grayed out.
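If you want to see what the compiler itself was told, GCC for ARM predefines a few macros that follow the float options. A minimal sketch, assuming GCC's __SOFTFP__ and __ARM_PCS_VFP macros and the ACLE __ARM_FP macro on a 32-bit ARM target; the function name is made up, and this is a diagnostic aid, not something the generated code needs:

```c
/* Hypothetical helper: reports which floating-point mode this file was
   compiled for, based on compiler-predefined macros. */
const char *fpu_build_mode(void)
{
#if defined(__SOFTFP__)
    /* Soft-float build: floating-point math goes through library calls. */
    return "soft-float: FP math done by library calls";
#elif defined(__ARM_FP) && defined(__ARM_PCS_VFP)
    /* FPU instructions allowed, FP arguments passed in FPU registers. */
    return "hard-float ABI: FPU instructions and FPU argument registers";
#elif defined(__ARM_FP)
    /* FPU instructions allowed, but not the hard-float calling convention. */
    return "FPU instructions, without the hard-float calling convention";
#else
    /* None of the ARM macros are present, e.g. when built on a PC. */
    return "no ARM FPU macros defined";
#endif
}
```

Printing or watching the returned string in a debug build shows which branch the compiler took, independent of what the editor grays out.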

DKuro.1 · ‎2020-05-11

So helpful, thanks berendi!!

Two questions,

1) So, I just need to multiply a*b in my code and that's it?

2) Can the FMAC module make the multiplications faster?

Danish1 · ‎2020-05-11

"if my variables are double precision, can I just cast them?"

Well yes you can. But is that what you want? Casting will do a time-consuming software conversion from a 64-bit double-precision number to a 32-bit single-precision number. Then (if the option is enabled in the compiler) the fast hardware multiply will happen. But then if the result is to be stored in a double-precision variable or compared with a double-precision threshold, it has to be converted in software back to a 64-bit double-precision number.
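To make the conversions visible, here is a sketch of a double multiply routed through single precision. The function name is hypothetical; on a single-precision-only FPU I would expect each cast to become a software conversion call, with only the middle line using the hardware multiplier.

```c
/* Hypothetical helper: multiply two doubles via single precision. */
double mul_via_float(double a, double b)
{
    float fa = (float)a;   /* double -> float conversion */
    float fb = (float)b;   /* double -> float conversion */
    float p  = fa * fb;    /* single-precision multiply */
    return (double)p;      /* float -> double conversion */
}
```

Besides the three conversions, the cast also drops precision: mul_via_float(16777217.0, 1.0) returns 16777216.0, because 2^24 + 1 has no exact single-precision representation.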

So that might end up taking as long as, or longer than, a software double-precision multiply. It's much faster if you store everything in single-precision, so you can minimise the number of conversions, as well as saving memory if you have an array of values. But numeric constants need the suffix f (e.g. 1.0f) to tell the C compiler not to start with a double-precision value and promote / convert.
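A small sketch of the constant-suffix point, using sizeof to show the type the compiler picks for the product. The function names are made up for illustration.

```c
#include <stddef.h>

/* 2.0 has type double, so x is promoted and the product is a double. */
size_t size_of_unsuffixed_product(float x)
{
    return sizeof(x * 2.0);
}

/* 2.0f has type float, so the product stays a float. */
size_t size_of_suffixed_product(float x)
{
    return sizeof(x * 2.0f);
}

/* Single precision end to end: float data, float accumulator and a
   float constant, so no double-precision arithmetic is needed. */
float scaled_sum(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += v[i] * 0.5f;
    }
    return acc;
}
```

GCC also has a -Wdouble-promotion warning that reports implicit float-to-double promotions, which can help find a forgotten f suffix.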

Have you tried looking at the assembly-code generated for one of your multiplies?

I see lines like vmul.f32 s14, s14, s20 and I think the "f32" is a giveaway that this is a hardware single-precision multiply.