What's the correct way to write THUMB/ARM instructions mixed with LL Drivers for the STM32F334?

LS. B.1 · 2021-02-16

I'm writing a very time-critical application for the STM32F334, where I have to control 4 power converters (effectively 3, since one of them is time-shared with another power converter). Recently I came across some ways to optimize code, like this:

//*********** 1360ns ****************
LL_HRTIM_TIM_SetCompare3(HRTIM1, LL_HRTIM_TIMER_A, DEF_TMRA_STATIC_MAX_PWM_CTE+i_CalcDutyA);
LL_HRTIM_TIM_SetCompare1(HRTIM1, LL_HRTIM_TIMER_A, DEF_TMRA_STATIC_MAX_PWM_CTE-i_CalcDutyA);
LL_HRTIM_TIM_SetCompare3(HRTIM1, LL_HRTIM_TIMER_B, DEF_TMRA_STATIC_MAX_PWM_CTE+i_CalcDutyB);
LL_HRTIM_TIM_SetCompare1(HRTIM1, LL_HRTIM_TIMER_B, DEF_TMRA_STATIC_MAX_PWM_CTE-i_CalcDutyB);

//********* Optimized version **** 500ns or less
HRTIM1->sTimerxRegs[HRTIM_TIMERINDEX_TIMER_A].CMP3xR = DEF_TMRA_STATIC_MAX_PWM_CTE+i_CalcDutyA;
HRTIM1->sTimerxRegs[HRTIM_TIMERINDEX_TIMER_A].CMP1xR = DEF_TMRA_STATIC_MAX_PWM_CTE-i_CalcDutyA;
HRTIM1->sTimerxRegs[HRTIM_TIMERINDEX_TIMER_B].CMP3xR = DEF_TMRA_STATIC_MAX_PWM_CTE+i_CalcDutyB;
HRTIM1->sTimerxRegs[HRTIM_TIMERINDEX_TIMER_B].CMP1xR = DEF_TMRA_STATIC_MAX_PWM_CTE-i_CalcDutyB;
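A sketch of why the second form is faster, and how to keep it readable. The LL setters take the timer as a run-time argument and derive the register address from it, so unless the compiler fully inlines and constant-folds every call, each one carries that address computation; the direct form is just a store to a fixed address. A `static inline` helper that takes a pointer to one timer's registers keeps the plain store while still reading like an API. The struct, array and helper below (`hrtim_timer_regs_t`, `timers`, `set_duty_pair`) are hypothetical host-side stand-ins so the logic can be checked on a PC; on the target the parameter would be the CMSIS element type of `sTimerxRegs` (I believe `HRTIM_Timerx_TypeDef` in the F334 header).

```c
#include <stdint.h>

/* Hypothetical host-side stand-in for one HRTIM timer's compare registers
   (the real CMSIS struct has many more fields). */
typedef struct {
    volatile uint32_t CMP1xR;
    volatile uint32_t CMP3xR;
} hrtim_timer_regs_t;

/* Hypothetical "register file": index 0 plays timer A, index 1 timer B. */
static hrtim_timer_regs_t timers[2];

/* Writes a compare pair symmetric around 'center', in the same order as the
   original code (CMP3 first, then CMP1). Each assignment is one volatile
   store; with optimisation enabled the helper is typically inlined. */
static inline void set_duty_pair(hrtim_timer_regs_t *tim, uint32_t center, int32_t duty)
{
    tim->CMP3xR = center + (uint32_t)duty;   /* like ..._MAX_PWM_CTE + i_CalcDutyX */
    tim->CMP1xR = center - (uint32_t)duty;   /* like ..._MAX_PWM_CTE - i_CalcDutyX */
}
```

On the target, the four writes above would become two calls such as `set_duty_pair(&HRTIM1->sTimerxRegs[HRTIM_TIMERINDEX_TIMER_A], DEF_TMRA_STATIC_MAX_PWM_CTE, i_CalcDutyA);` once the parameter type is switched to the CMSIS one.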

Now I'm looking for further optimization strategies, such as writing Thumb/ARM assembly directly, as I've done on most, if not all, of the DSPs I've worked on. So the question is: what is the correct way to mix assembly with C/LL (C syntax) code? Is there any documentation available?

(For example, a fastPID routine using 16-bit Q15 fixed-point arithmetic takes about 1000 ns on this STM32F334 at 72 MHz; on a dsPIC33 the same routine takes *only* 500 ns, despite running at a theoretically lower speed of 40 MIPS.)
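For a like-for-like comparison against hand-written assembly, it helps to have a portable C baseline. Below is a minimal Q15 PID in the incremental form that the CMSIS-DSP documentation gives for arm_pid_q15: y[n] = y[n-1] + A0·e[n] + A1·e[n-1] + A2·e[n-2], with A0 = Kp + Ki + Kd, A1 = -(Kp + 2·Kd), A2 = Kd. It is a sketch, not the fastPID routine mentioned above; the type and function names (`pid_q15_t`, `pid_q15_init`, `pid_q15_step`, `sat_q15`) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical Q15 PID state (incremental form). */
typedef struct {
    int16_t a0, a1, a2;   /* derived coefficients, Q15 */
    int16_t e1, e2;       /* previous errors e[n-1], e[n-2] */
    int16_t y;            /* previous output y[n-1] */
} pid_q15_t;

/* Clamp a 32-bit value to the Q15 (int16) range. */
static int16_t sat_q15(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Precompute A0..A2 once, outside the control loop. */
static void pid_q15_init(pid_q15_t *s, int16_t kp, int16_t ki, int16_t kd)
{
    s->a0 = sat_q15((int32_t)kp + ki + kd);
    s->a1 = sat_q15(-((int32_t)kp + 2 * (int32_t)kd));
    s->a2 = kd;
    s->e1 = s->e2 = s->y = 0;
}

/* One controller step: three MACs, a shift, an add and a saturation. */
static inline int16_t pid_q15_step(pid_q15_t *s, int16_t e)
{
    /* Q15 * Q15 = Q30; a 64-bit accumulator keeps the sum from overflowing. */
    int64_t acc = (int64_t)s->a0 * e
                + (int64_t)s->a1 * s->e1
                + (int64_t)s->a2 * s->e2;
    /* Back to Q15 (assumes arithmetic right shift, as GCC/Clang do), plus y[n-1]. */
    int32_t y = (int32_t)(acc >> 15) + s->y;

    s->e2 = s->e1;
    s->e1 = e;
    s->y  = sat_q15(y);
    return s->y;
}
```

Compiling this with -O2 for the Cortex-M4 and reading the listing is a quick way to check whether the compiler already uses the core's multiply-accumulate instructions before any assembly is written.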

Thanks for any help.

Ozone · 2021-02-16

Both the assembler syntax and the interface between C code and assembler routines (parameter handling) depend on the toolchain you are using.

This is not defined by the C standard.

LS. B.1 · 2021-02-16

I'm using Ac6's System Workbench (Eclipse IDE) + CubeMX.

In the future I'll migrate to STM32CubeIDE.

LDSB.

Andrew Neil · 2021-02-16

As @Ozone says, the syntax for inline/embedded assembler within 'C' code is entirely compiler dependent.

However, ARM do define the ABI (Application Binary Interface), so that should be compatible across compliant toolchains.

https://developer.arm.com/architectures/system-architectures/software-standards/abi

The syntax, rules, restrictions, etc. for inline/embedded assembler (even the name varies!) tend to be very arcane, so I would strongly suggest that you make a separately-built assembler module that you call from 'C'. Then at least the 'C' remains standard & portable.
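With the GNU tools used by System Workbench and STM32CubeIDE, the separate-module approach might look like the sketch below. The assembler source is shown as a comment because it lives in its own file; per the AAPCS the first arguments arrive in r0-r3 and an int32_t result goes back in r0. The routine name `q31_add_sat`, the file name and the `USE_ASM_Q31` build switch are hypothetical; the C version exists only so the same prototype can be built and tested on a PC.

```c
/*
 * Hypothetical assembler module q31_add_sat.s (GNU as, Thumb-2, Cortex-M4).
 * AAPCS: a arrives in r0, b in r1, the result is returned in r0.
 * r0-r3 and r12 may be overwritten freely; r4-r11 would have to be
 * pushed and popped if the routine used them.
 *
 *         .syntax unified
 *         .cpu    cortex-m4
 *         .thumb
 *         .text
 *         .global q31_add_sat
 *         .type   q31_add_sat, %function
 *         .thumb_func
 * q31_add_sat:
 *         qadd    r0, r0, r1      @ r0 = saturate(a + b)
 *         bx      lr
 *         .size   q31_add_sat, . - q31_add_sat
 */
#include <stdint.h>

/* The C side only needs an ordinary prototype; the AAPCS does the rest. */
int32_t q31_add_sat(int32_t a, int32_t b);

#if !defined(USE_ASM_Q31)   /* hypothetical switch: defined when the .s file is linked */
/* Reference C implementation with the same semantics as the QADD above. */
int32_t q31_add_sat(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}
#endif
```

If the file is named with an upper-case `.S`, GCC runs it through the C preprocessor first, so it can share `#define`s with the C code.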

https://www.avrfreaks.net/comment/2800826#comment-2800826

LS. B.1 · 2021-02-16

All right, I found some interesting material (based on Andrew Neil's links):

Releases · ARM-software/abi-aa · GitHub

Procedure Call Standard for the Arm Architecture - pdf, html
Run-time ABI for the Arm Architecture - pdf, html

Writing a separate ".s" file for a very specific part of the code can sometimes be useful, painful as it is to do...

So I'd like to use the ST/ARM-provided C-syntax functions for such jobs, like this one (the asm functions are located in cmsis_gcc.h):

...

/* Derived coefficient A0 */

S->A0 = __QADD16(__QADD16(S->Kp, S->Ki), S->Kd);

...
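What `__QADD16` computes, written out in portable C: per the Armv7E-M QADD16 instruction it performs two independent signed 16-bit additions, one per halfword, each saturated to the int16 range with no carry between lanes. `qadd16_ref` and `sat_add_lane` are hypothetical names for a host-side model, not CMSIS functions; on the F334 the intrinsic itself maps to a single instruction.

```c
#include <stdint.h>

/* Saturating add of the signed 16-bit lane starting at bit 'shift'. */
static int32_t sat_add_lane(uint32_t x, uint32_t y, unsigned shift)
{
    int32_t a = (int32_t)((x >> shift) & 0xFFFFu);
    int32_t b = (int32_t)((y >> shift) & 0xFFFFu);
    if (a >= 0x8000) a -= 0x10000;   /* sign-extend without implementation-defined casts */
    if (b >= 0x8000) b -= 0x10000;
    int32_t s = a + b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return s;
}

/* Host model of QADD16: both halfwords added and saturated independently. */
static uint32_t qadd16_ref(uint32_t x, uint32_t y)
{
    uint32_t lo = (uint32_t)sat_add_lane(x, y, 0)  & 0xFFFFu;
    uint32_t hi = (uint32_t)sat_add_lane(x, y, 16) & 0xFFFFu;
    return (hi << 16) | lo;
}
```

In the quoted A0 line this means the sum of the gains clips at 0x7FFF instead of wrapping to a negative value.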

Thanks for the help!

LDSB

Tesla DeLorean · 2021-02-16

Look at the code the compiler is generating.

Make sure it is in-lining properly, and look for algorithmic shortcuts.
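One way to act on this with GCC: force inlining of small helpers and read the listing. `static inline` alone is generally not inlined at -O0, whereas `__attribute__((always_inline))` is honoured at any optimisation level. The helper and caller (`q15_scale`, `scale_buffer`) are hypothetical, and the commands in the comment assume the arm-none-eabi GNU toolchain.

```c
/*
 * To see what the compiler actually emits for the target, e.g.:
 *   arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 -S control.c      (writes control.s)
 *   arm-none-eabi-objdump -d -S control.o   (source interleaved if compiled with -g)
 */
#include <stdint.h>
#include <stddef.h>

/* GCC/Clang: inline even when optimisation is off. */
#define ALWAYS_INLINE static inline __attribute__((always_inline))

/* Q15 * Q15 -> Q15, truncating (assumes arithmetic right shift, as GCC/Clang do). */
ALWAYS_INLINE int16_t q15_scale(int16_t x, int16_t gain)
{
    int32_t p = ((int32_t)x * gain) >> 15;
    if (p > INT16_MAX) p = INT16_MAX;   /* only -1.0 * -1.0 overflows */
    return (int16_t)p;
}

/* After inlining, the loop body should contain the multiply and shift
   directly, with no 'bl q15_scale' in the listing. */
void scale_buffer(int16_t *buf, size_t n, int16_t gain)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = q15_scale(buf[i], gain);
}
```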

Doing embedded-inline assembler is always a bane to portability, and compiler developers keep changing the rules to accommodate themselves.

Where possible get your critical code into a .s file where you can control the alignments, branching, unrolling, and literal pools.
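Two of those knobs can also be approached from C with GCC, before going all the way to a .s file: placing a hot routine in the F334's CCM SRAM (which, per ST, executes code with zero wait states, whereas flash needs wait states at 72 MHz) and unrolling a loop by hand. The `.ccmram` section name and the `USE_CCMRAM_CODE` switch are assumptions: this only works if your linker script defines such a section with its load address in flash and the startup code copies it to CCM SRAM. `dot_q15_unrolled` is a hypothetical example routine.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__arm__) && defined(USE_CCMRAM_CODE)
/* long_call: a Thumb BL (+/-16 MB) cannot reach CCM SRAM (0x1000_0000)
   from flash (0x0800_0000); noinline keeps the body from being copied
   back into flash-resident callers. */
#define FAST_CODE __attribute__((section(".ccmram"), long_call, noinline))
#else
#define FAST_CODE
#endif

/* Q15 dot product, unrolled by 4 so the compare-and-branch is paid once
   per four multiply-accumulates; a tail loop handles n % 4 leftovers. */
FAST_CODE int64_t dot_q15_unrolled(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;   /* Q30 sum */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc += (int32_t)a[i]     * b[i];
        acc += (int32_t)a[i + 1] * b[i + 1];
        acc += (int32_t)a[i + 2] * b[i + 2];
        acc += (int32_t)a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)
        acc += (int32_t)a[i] * b[i];

    return acc;
}
```

GCC 8 and later also accept `#pragma GCC unroll 4` in front of a plain loop, which leaves the unrolling itself to the compiler.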

Compilers can handle complex register juggling and finding the queen, but when it comes to balancing critical code and algorithms they are pretty shallow.


Ozone · 2021-02-16

Basically all toolchain suppliers stick to ARM's ABI; that is correct.

The biggest problem is usually the core registers you use in the assembler code.

As already noted, the syntax for conveying your intended register usage to the compiler is rather arcane.
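To illustrate the register problem with GCC's extended asm (the GNU toolchain's syntax; other compilers differ): the asm text refers to operands as %0, %1, ..., and the constraint lists after the colons tell the compiler which values go in, which come out, and what else gets clobbered, so it can allocate registers around the block. The `__QADD`/`__SSAT`-style wrappers in cmsis_gcc.h are built the same way. `q15_mul_round` is a hypothetical helper; `__ARM_FEATURE_DSP` is the ACLE macro defined when the DSP instructions are available (as on the Cortex-M4), and the `#else` branch is equivalent C so the sketch also builds on a PC.

```c
#include <stdint.h>

/* Q15 multiply with rounding and saturation. */
static inline int16_t q15_mul_round(int16_t a, int16_t b)
{
#if defined(__GNUC__) && defined(__ARM_FEATURE_DSP)
    int32_t r;
    __asm__ (
        "smlabb %0, %1, %2, %3      \n\t"  /* r = a*b + 0x4000 (bottom halfwords, Q30) */
        "ssat   %0, #16, %0, asr #15"      /* r = saturate16(r >> 15) */
        : "=r"(r)                          /* output: any core register */
        : "r"(a), "r"(b), "r"(0x4000)      /* inputs: any core registers */
        /* No clobber list: only %0 is written, and neither instruction
           changes N/Z/C/V. Plain "=r" (no '&') is safe because the inputs
           are all consumed by the first instruction, before %0 is reused. */
    );
    return (int16_t)r;
#else
    int32_t r = (int32_t)a * b + 0x4000;
    r >>= 15;                        /* assumes arithmetic right shift (GCC/Clang) */
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
#endif
}
```

If the asm used a fixed register, say r4, it would have to be named in a third, clobber list (`: "r4"`), and if %0 were written before every input had been read, the output would need the early-clobber form `"=&r"`; getting either wrong produces bugs that only appear at some optimisation levels.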

I personally haven't done very much assembler coding, and found it quite hard to beat the compiler in efficiency, at least at higher optimisation levels.