2018-11-08 08:00 AM
Greetings.
I'm using an STM32L051 for a project, and some of the math uses single-precision floating point (additions, multiplications, sinf(), cosf(), etc.).
My calculation routine takes about 7 ms when running from the 16 MHz HSI without the PLL, with 0 wait states and the buffer cache and pre-read enabled.
I'm not under time pressure yet, but I assumed I had some headroom if needed: raising the clock to 32 MHz should cut the calculation time to about 3.5 ms. In reality, when I enabled the PLL for a 32 MHz system clock (with 1 wait state), the calculation time dropped by only 1 ms, to 6 ms. Enabling the prefetch doesn't change anything either.
What could be the reason for this strange issue?
P.S.
I know that I could drastically increase the performance by switching to, for example, an STM32L4, which has a dedicated FPU.
2018-11-08 08:44 AM
>>What could be the reason for this strange issue?
Flash wait states on the CM0 implementations are brutal.
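A back-of-the-envelope check with the numbers from the original post shows how much the wait state costs. This sketch assumes the routine executes the same instructions at both clock speeds, so every extra cycle at 32 MHz is attributed to flash stalls; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles spent by a routine: MHz * microseconds = cycles. */
static uint32_t cycles_used(uint32_t clock_mhz, uint32_t time_us)
{
    return clock_mhz * time_us;
}

/* Extra cycles at the faster clock. Assumption: the instruction stream is
 * identical, so the difference is time spent waiting on flash (1 WS). */
static uint32_t stall_cycles(uint32_t slow_mhz, uint32_t slow_us,
                             uint32_t fast_mhz, uint32_t fast_us)
{
    uint32_t slow = cycles_used(slow_mhz, slow_us);
    uint32_t fast = cycles_used(fast_mhz, fast_us);
    assert(fast >= slow);
    return fast - slow;
}

/* Run time (microseconds) for a given cycle count at clock_mhz. */
static uint32_t time_us_for(uint32_t cycles, uint32_t clock_mhz)
{
    return cycles / clock_mhz;
}
```

With 7 ms at 16 MHz (112,000 cycles) and 6 ms at 32 MHz (192,000 cycles), stall_cycles(16, 7000, 32, 6000) gives 80,000 extra cycles, roughly 0.7 stall cycles for every cycle of useful work. Without stalls, time_us_for(112000, 32) would be 3500 us (the expected 3.5 ms); if every cycle stalled once, time_us_for(224000, 32) is 7000 us, no faster than 16 MHz at 0 WS.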
Run it at 24 MHz with 0 WS, or at 27-28 MHz if the part permits, assuming a flash access time of ~35 ns.
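The 24 and 27-28 MHz figures follow from treating the flash as a fixed access time that must fit into whole clock cycles. A sketch of that arithmetic, assuming the ~35 ns estimate above (it is not a datasheet value; the guaranteed 0-WS limit is the one in the reference manual); the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Highest clock (kHz) at which one clock period still covers a flash
 * access of access_ns nanoseconds, i.e. 0 wait states. */
static uint32_t max_zero_ws_khz(uint32_t access_ns)
{
    assert(access_ns > 0);
    return 1000000u / access_ns;        /* 1 / t, with t in ns -> kHz */
}

/* Wait states needed at clock_mhz: an access takes
 * ceil(access_ns / period_ns) cycles, with period_ns = 1000 / clock_mhz.
 * The first cycle is the normal access; the rest are wait states. */
static uint32_t wait_states_needed(uint32_t access_ns, uint32_t clock_mhz)
{
    uint32_t cycles = (access_ns * clock_mhz + 999u) / 1000u;  /* ceil */
    return cycles > 0 ? cycles - 1 : 0;
}
```

With 35 ns, max_zero_ws_khz(35) gives 28,571 kHz (about 28.6 MHz), and wait_states_needed(35, f) returns 0 at 24 and 28 MHz but 1 at 32 MHz.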
The L4 at least has a caching/prefetch mechanism that hides the slowness of the underlying memory.
2018-11-08 08:46 AM
Run the code from RAM
2018-11-08 11:42 PM
Thanks for the explanation and your suggestions! Very much appreciated.
The RM says the maximum clock speed for 0 WS is 16 MHz. Are you suggesting I ignore that and try running at 24 MHz with 0 WS? What issues could I face in that case?
I tried to place the calculation functions into RAM:
void calc(struct meter_state *state) __attribute__((long_call, noinline, section(".data")));
But I'm getting errors like this one:
error: relocation truncated to fit: R_ARM_THM_CALL against symbol `__aeabi_fmul' defined in .text section in ..../thumb/v6-m/libgcc.a
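The error is a branch-range problem: calc() now lives in RAM, but the compiler still reaches the soft-float helper __aeabi_fmul (in flash) with a plain Thumb BL, whose signed offset covers only about ±16 MiB. The long_call attribute only changes how calc() itself is called. A small check, assuming the usual STM32L0 memory map (flash at 0x08000000, SRAM at 0x20000000); the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv6-M BL encodes a signed 25-bit, halfword-aligned offset:
 * -16 MiB .. +16 MiB - 2, relative to the BL address + 4 (the Thumb PC). */
static bool thumb_bl_reaches(uint32_t bl_addr, uint32_t target)
{
    assert((bl_addr & 1u) == 0);        /* Thumb instructions are halfword aligned */
    int64_t offset = (int64_t)target - ((int64_t)bl_addr + 4);
    return offset >= -(INT64_C(1) << 24) && offset <= (INT64_C(1) << 24) - 2;
}
```

A BL at 0x20000100 targeting 0x08000200 is about 384 MiB away, so thumb_bl_reaches() returns false, while calls within flash or within RAM are fine. Hence the helpers have to end up in RAM too.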
I guess I'll have to place libgcc and libm into RAM too. 95% of the calculations happen inside those libraries, so it's a good idea in any case. I tried to do it via the linker script, still no success (same error); investigating...
.data :
{
. = ALIGN(4);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
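*libm.a:*(.text .text*) /* code from every member of libm.a */
*libgcc.a:*(.text .text*) /* code from every member of libgcc.a (__aeabi_fmul etc.) */
/* ld uses the first matching statement in script order: if .text comes earlier, its pattern there must skip these, e.g. *(EXCLUDE_FILE(*libm.a:* *libgcc.a:*) .text*) */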
. = ALIGN(4);
_edata = .; /* define a global symbol at data end */
} >RAM AT> FLASH
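For context on ">RAM AT> FLASH": the section is linked to run from RAM but stored in flash, and startup code copies it across before main() runs. ST's startup files typically already do this using _sidata (the load address, defined elsewhere in ST's linker scripts) together with _sdata/_edata, so nothing extra should be needed. The sketch below only illustrates the idea; the function name is hypothetical, and it is exercised here against plain arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Copy an initialized section word by word from its load address (flash)
 * to its run address (RAM). Word copies are fine because the script aligns
 * both ends of the run range with ALIGN(4). On the target the call would
 * look like copy_init_section(&_sdata, &_edata, &_sidata) with
 * extern uint32_t _sdata, _edata, _sidata; (assumed symbol names). */
static void copy_init_section(uint32_t *dst, uint32_t *dst_end,
                              const uint32_t *src)
{
    assert(dst <= dst_end);
    while (dst < dst_end)
        *dst++ = *src++;
}
```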
2018-11-08 11:50 PM
I would suggest using integer arithmetic instead of floating point, with a "scaled math" approach.
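A minimal illustration of scaled math is Q16.16 fixed point: values are stored in an int32_t scaled by 65536. The format and names are my choice for illustration, not from this thread; the right shift relies on GCC's documented sign-extending >> for negative numbers.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q16_t;                  /* Q16.16: real value = raw / 65536 */
#define Q16_ONE ((q16_t)65536)

static q16_t q16_from_int(int32_t x) { return x * Q16_ONE; }

/* Multiply: the 64-bit product has 32 fraction bits; shift back to 16.
 * Rounds toward minus infinity (arithmetic shift on GCC). */
static q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * b) >> 16);
}

/* Divide: pre-scale the dividend so the quotient keeps 16 fraction bits.
 * Truncates toward zero. */
static q16_t q16_div(q16_t a, q16_t b)
{
    assert(b != 0);
    return (q16_t)(((int64_t)a * Q16_ONE) / b);
}
```

For example, q16_mul(q16_from_int(3), Q16_ONE / 2) yields 98304, which is 1.5 in Q16.16. Division is still comparatively expensive on the M0, which has no divide instruction, so it is best kept out of inner loops.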
2018-11-09 12:07 AM
I guess that would require too many changes to the codebase, and in my case (it's a small toy project for fun) it would be easier just to upgrade the MCU than to make those changes. Unless there is some kind of easy-to-integrate library for doing trigonometry with integers at 0.1 degree precision or better. Thanks, I'll look for one and maybe even try it.
2018-11-09 12:49 AM
Commercial projects tend to use the integer approach, since it allows a cheaper MCU. If your toy project works with software floating-point emulation, that's fine. I tend to use an M4F core in such a case.
> Unless there is some kind of easy-to-integrate library allowing to do trigonometry calculations with integers and at least 0.1 degree precision
I reuse older source code of mine, written before the advent of M4/M4F cores. With 32-bit integers and tabulated sin/cos values, this is quite easy and can easily keep up with the FPU.
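A minimal sketch of the tabulated approach, not the poster's actual code (names are hypothetical): angles are integers in tenths of a degree, a 91-entry quarter-wave table holds sin(0..90 degrees) in Q16, and linear interpolation fills in the tenths. By my estimate the interpolation error stays within a few Q16 LSBs (on the order of 5e-5). On the L051 the table would be a precomputed const array in flash; here it is filled once with a Taylor series to avoid depending on libm.

```c
#include <assert.h>
#include <stdint.h>

#define SIN_ONE 65536                   /* 1.0 in Q16 */

static int32_t sin_table[91];           /* sin(0..90 degrees), 1-degree steps, Q16 */

/* Taylor series for sin(x), accurate on [0, pi/2]; only used to build the table. */
static double taylor_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n < 12; ++n) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

static void sin_table_init(void)
{
    const double pi = 3.14159265358979323846;
    for (int d = 0; d <= 90; ++d)
        sin_table[d] = (int32_t)(taylor_sin(d * pi / 180.0) * SIN_ONE + 0.5);
}

/* sin of an angle in tenths of a degree, result in Q16.
 * Quarter-wave symmetry plus linear interpolation between whole degrees. */
static int32_t isin_tenths(int32_t deg10)
{
    assert(sin_table[90] == SIN_ONE);   /* sin_table_init() must run first */
    int32_t a = deg10 % 3600;
    if (a < 0)
        a += 3600;                      /* 0..3599 */
    int negative = 0;
    if (a >= 1800) {                    /* sin(x + 180) = -sin(x) */
        a -= 1800;
        negative = 1;
    }
    if (a > 900)                        /* sin(180 - x) = sin(x) */
        a = 1800 - a;                   /* now 0..900 */
    int32_t idx = a / 10, frac = a % 10;
    int32_t v = sin_table[idx];
    if (frac != 0)
        v += (sin_table[idx + 1] - sin_table[idx]) * frac / 10;
    return negative ? -v : v;
}

/* cos(x) = sin(x + 90 degrees) */
static int32_t icos_tenths(int32_t deg10)
{
    return isin_tenths(deg10 + 900);
}
```

After sin_table_init(), isin_tenths(300) returns 32768 (0.5) and icos_tenths(600) returns the same; everything after the table build is 32-bit integer work.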
2018-11-09 02:25 AM
I found a handful of fixed-point arithmetic libraries; libfixmath looks very interesting. There is also Qfplib, an optimized floating-point library for Cortex-M0. It would be interesting to compare its performance with GCC's implementation; the provided benchmarks do look good.