System clock speed and its effect on floating point (soft) calculations performance

After Forever · ‎2018-11-08

Greetings.

I'm using an STM32L051 for a project that does some math in single-precision floating point (additions, multiplications, sinf(), cosf(), etc.).

My calculation routine is taking about 7ms when using 16MHz HSI without PLL, 0 wait-state, buffer cache and pre-read enabled.

I'm still not pressed for time, but I thought that, if required, I had some headroom: increasing the clock to 32MHz should cut the calculation time to 3.5ms. In reality, when I enabled the PLL to get a 32MHz system clock (with 1 wait state), the calculation time dropped by just 1ms, so now it's 6ms. Enabling the prefetch changes nothing either.

What could be the reason of this strange issue?

P.S.

I know that I can drastically increase the performance by switching to, for example, STM32L4 which has a dedicated FPU.

Tesla DeLorean · ‎2018-11-08

>>What could be the reason of this strange issue?

Flash wait states on the CM0 implementations are brutal.

Run it at 24 MHz with 0-ws, or 27-28 as it might permit, assuming an ~35ns access time.

The L4 at least has a caching/prefetch mechanism that hides the slowness of the underlying memory.
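The arithmetic behind this can be sketched in a few lines of C. This is only a rough model built from the numbers quoted in this thread (about 7 ms at 16 MHz, about 6 ms at 32 MHz, an assumed ~35 ns flash access time); the function names are made up for illustration and nothing here comes from the reference manual.

```c
#include <stdint.h>

/* Highest clock (in kHz) at which one flash access still fits into a
   single CPU cycle, for an ASSUMED access time in nanoseconds. */
uint32_t max_zero_ws_khz(uint32_t access_ns)
{
    return 1000000u / access_ns;
}

/* CPU cycles consumed by a routine that ran for time_us at clock_mhz. */
uint32_t cycles_spent(uint32_t time_us, uint32_t clock_mhz)
{
    return time_us * clock_mhz;
}

/* Additional cycles the same routine needed at the faster clock setting.
   In this simple model they are attributed entirely to flash wait states. */
uint32_t extra_cycles(uint32_t slow_us, uint32_t slow_mhz,
                      uint32_t fast_us, uint32_t fast_mhz)
{
    return cycles_spent(fast_us, fast_mhz) - cycles_spent(slow_us, slow_mhz);
}
```

With the figures above, cycles_spent(7000, 16) gives 112000 cycles, and extra_cycles(7000, 16, 6000, 32) gives 80000, so roughly 70% more cycles are spent once the wait state is on. max_zero_ws_khz(35) gives 28571, i.e. about 28.5 MHz, which is where the 27-28 MHz guess comes from.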

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Tesla DeLorean · ‎2018-11-08

Run the code from RAM


After Forever · ‎2018-11-08

Thanks for the explanation and your suggestions! Very much appreciated.

The RM says the maximum clock speed for 0-ws is 16 MHz. Are you suggesting to ignore that and try to run at 24 MHz with 0-ws? What issues can I face in that case?

I tried to place the calculation functions into RAM:

void calc(struct meter_state *state) __attribute__((long_call, noinline, section(".data")));

But I'm getting this kind of error:

error: relocation truncated to fit: R_ARM_THM_CALL against symbol `__aeabi_fmul' defined in .text section in ..../thumb/v6-m/libgcc.a

I guess I'll have to place libgcc and libm into RAM too. 95% of the calculations are in those libraries, so it's a good idea in any case. Tried to do it using the linker file, still no success (same error), investigating..

  .data : 
  {
    . = ALIGN(4);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */
 
    libm.a(*)
    libgcc.a(*)
 
    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT> FLASH
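For comparison, here is a minimal sketch of a commonly used arrangement, where RAM-resident code gets its own input section instead of going into .data directly. The RAMFUNC macro, the ".ramfunc" section name and the apply_gain() function are hypothetical; the linker script would need a matching `*(.ramfunc*)` line in an output section that the startup code copies to RAM. As far as I understand it, long_call on a declaration only affects calls made to that function, so calls from it to libgcc helpers in flash are not covered; building the file with GCC's -mlong-calls may help there, but I have not verified that on this part.

```c
/* RAMFUNC and ".ramfunc" are hypothetical names; the section attribute is
   only applied when building for ARM, so the file also compiles on a host. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((noinline, section(".ramfunc")))
#else
#define RAMFUNC /* host build: ordinary placement */
#endif

/* The attribute goes on the declaration, as in the attempt above. */
RAMFUNC float apply_gain(float sample, float gain);

float apply_gain(float sample, float gain)
{
    /* Without an FPU this multiply is compiled into a call to a
       soft-float helper (__aeabi_fmul on ARM EABI targets). */
    return sample * gain;
}
```

The helper itself still lives in flash unless libgcc's code is also moved, so this alone only relocates the caller.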

AvaTar · ‎2018-11-08

I would suggest using integer instead of floating point, and use a "scaled math" approach.
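A minimal sketch of what "scaled math" can look like, assuming a Q16.16 format (16 integer bits, 16 fractional bits). The type and function names are made up for this example, and the format is just one possible choice.

```c
#include <stdint.h>

typedef int32_t q16_t;            /* value = raw / 65536 */
#define Q16_ONE ((q16_t)65536)    /* 1.0 in Q16.16 */

/* Convert a small integer to Q16.16 (no overflow check in this sketch). */
q16_t q16_from_int(int32_t x)
{
    return (q16_t)(x * Q16_ONE);
}

/* Addition and subtraction work directly on the raw values. */
q16_t q16_add(q16_t a, q16_t b)
{
    return a + b;
}

/* Multiplication needs a 64-bit intermediate, then a rescale by 2^16.
   The division truncates toward zero. */
q16_t q16_mul(q16_t a, q16_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (q16_t)(product / Q16_ONE);
}

/* Drop the fractional part (truncates toward zero). */
int32_t q16_to_int(q16_t x)
{
    return x / Q16_ONE;
}
```

For example, q16_mul(q16_from_int(3), Q16_ONE / 2) yields 98304, which is 1.5 in Q16.16. On a core without a 32x32 to 64 bit multiply instruction, the 64-bit product is itself a library call, so the scaling is often chosen to keep products within 32 bits.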

After Forever · ‎2018-11-09

I guess it would require too much change of the code-base and in my case (it's a small toy project for fun) it would be easier just to upgrade the MCU than do those changes. Unless there is some kind of easy-to-integrate library allowing to do trigonometry calculations with integers and at least 0.1 degree precision. So, thanks, I will try to find one and maybe even try it.

AvaTar · ‎2018-11-09

Commercial projects tend to use the integer approach, since it allows a cheaper MCU. If your toy project works with soft-float emulation, that's fine. I tend to use an M4F core in such a case.

> Unless there is some kind of easy-to-integrate library allowing to do trigonometry calculations with integers and at least 0.1 degree precision

I re-use older source code of mine, from before the advent of the M4/M4F cores. With 32-bit integers and tabulated sin/cos values, this is quite easy, and can easily keep up with the FPU.
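Not the original code, but a minimal sketch of the idea, assuming angles in tenths of a degree and results in Q15 (32767 is 1.0). The 5-degree table spacing, the names and the linear interpolation are choices made for this example; with this coarse table the interpolation error is on the order of 0.1% of full scale, and a finer table reduces it.

```c
#include <stdint.h>

#define SIN_STEP_TENTHS 50   /* table spacing: 5.0 degrees */

/* sin(0), sin(5), ..., sin(90 degrees), scaled by 32767 and rounded. */
static const int16_t sin_table_q15[19] = {
        0,  2856,  5690,  8481, 11207, 13848, 16384, 18794, 21062, 23170,
    25101, 26841, 28377, 29697, 30791, 31650, 32269, 32642, 32767
};

/* Sine of an angle given in tenths of a degree, result in Q15. */
int16_t sin_q15(int32_t angle_tenths)
{
    int32_t a = angle_tenths % 3600;      /* reduce to one turn */
    if (a < 0) {
        a += 3600;
    }

    int32_t sign = 1;
    if (a >= 1800) {                      /* second half-turn: negate */
        a -= 1800;
        sign = -1;
    }
    if (a > 900) {                        /* mirror into first quadrant */
        a = 1800 - a;
    }

    int32_t idx = a / SIN_STEP_TENTHS;
    int32_t frac = a % SIN_STEP_TENTHS;
    int32_t y = sin_table_q15[idx];

    /* frac != 0 implies a < 900, so idx + 1 stays inside the table. */
    if (frac != 0) {
        y += ((sin_table_q15[idx + 1] - y) * frac) / SIN_STEP_TENTHS;
    }
    return (int16_t)(sign * y);
}

/* Cosine via the identity cos(x) = sin(x + 90 degrees). */
int16_t cos_q15(int32_t angle_tenths)
{
    return sin_q15(angle_tenths % 3600 + 900);
}
```

For instance, sin_q15(300) returns 16384 (about 0.5) and cos_q15(0) returns 32767. A power-of-two table spacing would turn the divisions into shifts, which matters on cores without a hardware divider.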

After Forever · ‎2018-11-09

Found a handful of fixed-point arithmetic libraries. Libfixmath looks very interesting. There is also Qfplib, which is an optimized floating-point library for the Cortex-M0. It would be interesting to compare its performance with GCC's implementation; the provided benchmarks do look good.

https://github.com/PetteriAimonen/libfixmath

https://www.quinapalus.com/qfplib.html