System clock speed and its effect on floating point (soft) calculations performance

Greetings.

I'm using an STM32L051 for a project, and part of it relies on single-precision floating-point math (additions, multiplications, sinf(), cosf(), etc.).

My calculation routine takes about 7 ms when running from the 16 MHz HSI without PLL, with 0 wait states and the buffer cache and pre-read enabled.

I'm not pressed for time yet, but I assumed that, if needed, I had some headroom: raising the clock to 32 MHz should cut the calculation time to 3.5 ms. In reality, when I enabled the PLL to get a 32 MHz system clock (with 1 wait state), the calculation time dropped by only 1 ms, to 6 ms. Enabling the prefetch changes nothing either.

What could be the reason for this strange issue?

P.S.

I know that I can drastically increase the performance by switching to, for example, STM32L4 which has a dedicated FPU.


>>What could be the reason for this strange issue?

Flash wait states on the CM0 implementations are brutal.

Run it at 24 MHz with 0-ws, or 27-28 as it might permit, assuming an ~35ns access time.

The L4 at least has a caching/prefetch mechanism that hides the slowness of the underlying memory.
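To put rough numbers on this, here is a small sketch using only the figures quoted in this thread (the ~35 ns access time is the assumption made above, not a datasheet value, and the helper names are made up for illustration). It estimates the highest clock at which one flash access still fits into a single clock period, and converts the two measured run times into CPU cycles.

```c
#include <stdint.h>

/* Highest clock (in MHz) at which one flash access of `access_ns`
 * nanoseconds still fits into a single clock period.
 * Assumption: the access time is the only limiting factor. */
double max_zero_ws_mhz(double access_ns)
{
    return 1000.0 / access_ns;
}

/* CPU cycles consumed by a routine that runs for `ms` milliseconds
 * at a core clock of `mhz` MHz. */
uint32_t cycles_spent(uint32_t mhz, uint32_t ms)
{
    return mhz * 1000u * ms;
}
```

With a 35 ns access time the first helper gives roughly 28.6 MHz, which is where the "27-28" figure comes from. The second shows that the same routine costs about 112,000 cycles at 16 MHz (7 ms) but about 192,000 cycles at 32 MHz (6 ms), so most of the extra clock is spent waiting on flash.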

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Run the code from RAM


Thanks for the explanation and your suggestions! Very much appreciated.

The RM says the maximum clock speed for 0-ws is 16 MHz. Are you suggesting to ignore that and try to run at 24 MHz with 0-ws? What issues can I face in that case?

I tried to place the calculation functions into RAM:

void calc(struct meter_state *state) __attribute__((long_call, noinline, section(".data")));

But I'm getting this kind of error:

error: relocation truncated to fit: R_ARM_THM_CALL against symbol `__aeabi_fmul' defined in .text section in ..../thumb/v6-m/libgcc.a

I guess I'll have to place libgcc and libm into RAM too. 95% of the calculations are in those libraries, so it's a good idea in any case. I tried to do this in the linker script, but still no success (same error). Investigating...

  .data : 
  {
    . = ALIGN(4);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */
 
    libm.a(*)
    libgcc.a(*)
 
    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT> FLASH


I would suggest using integer instead of floating point, and use a "scaled math" approach.

I guess that would require too many changes to the code base, and in my case (it's a small toy project for fun) it would be easier to just upgrade the MCU. That is, unless there is an easy-to-integrate library that can do trigonometric calculations with integers at a precision of at least 0.1 degree. So, thanks, I will try to find one and maybe even try it.

Commercial projects tend to use the integer approach, which allows for a cheaper MCU. If your toy project works with emulation, it's fine. I tend to use an M4F core in such a case.

> Unless there is an easy-to-integrate library that can do trigonometric calculations with integers at a precision of at least 0.1 degree

I re-use older source code of mine, written before the advent of the M4/M4F cores. With 32-bit integers and tabulated sin/cos values, this is quite easy, and can easily keep up with the FPU.

I found a handful of fixed-point arithmetic libraries. Libfixmath looks very interesting. There is also Qfplib, which is an optimized floating-point library for the Cortex-M0. It would be interesting to compare its performance with GCC's implementation; the provided benchmarks do look good.

https://github.com/PetteriAimonen/libfixmath

https://www.quinapalus.com/qfplib.html