2022-01-20 09:37 PM
Hi, I am trying to optimise the gait generation code for my robot. It uses a lot of floating-point operations, and I want to use the FPU to accelerate the computation.
As an experiment, I am using the following task:
TickType_t start = xTaskGetTickCount();
while (i < 100000000)
{
    float a = sin(1.7) + sin(0.58) + cos(0.66) / (2 * (atan2(3, 5) + sin(0.11)));
    i++;
}
TickType_t end = xTaskGetTickCount();
sprintf(msg, "%lu\r\n", (unsigned long)(end - start));
HAL_UART_Transmit(&huart3, (uint8_t *)msg, strlen(msg), 15);
i = 0;
I also set up the flags.
But the time taken to execute the task is the same as before I set up these flags.
I assumed that once the flags are set, all the standard C math operations would be compiled into FPU code. Is this assumption wrong?
Please advise!
Thank you
Yours Sincerely,
S.Shyam
2022-01-21 01:23 AM
I think the FPU is used regardless of the __FPU_PRESENT symbol, because it is already defined in
"\Drivers\CMSIS\Device\ST\STM32H7xx\Include\stm32h750xx.h" (this path for STM32H750):
#define __FPU_PRESENT 1 /*!< FPU present */
I think you can evaluate non-FPU performance by setting "Floating-point unit" to "None".
This may also need some additional changes to the HAL driver defines.
For a performance boost, use integer implementations with lookup tables, CORDIC, polynomials, etc.
For sin/cos I use a simple lookup table with a 32-bit phase accumulator.
For atan2 I use the polynomial atan from "Efficient Approximations for the Arctangent Function" and am considering replacing it with a lookup table.
2022-01-21 10:50 PM
Someone told me that I have to use the DSP library from the STM32CubeF7 drivers to work with the FPU.
https://github.com/STMicroelectronics/STM32CubeF7/tree/master/Drivers/CMSIS/DSP/Source
They are not related, right? The DSP library can be used whether or not the FPU is used, correct?
2022-01-22 01:22 AM
What's the difference between "FPU" and "Floating-point ABI"?
2022-01-24 05:10 PM
DSP and FPU are independent. Use of one does not require or imply use of another.
The "Floating-point ABI" setting is the correct place to choose whether the compiler generates FPU instructions or uses emulated floating-point support.
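As a rough illustration, assuming GCC for Arm (arm-none-eabi-gcc, as used by STM32CubeIDE): as far as I know, the "Floating-point unit" setting maps to -mfpu (which FPU instructions exist, e.g. fpv5-sp-d16 for a single-precision Cortex-M7 FPU), while "Floating-point ABI" maps to -mfloat-abi=soft, softfp or hard (whether those instructions are used at all, and how float arguments are passed). The sketch reports the mode the compiler was given through the predefined macros __SOFTFP__, __ARM_PCS_VFP and __ARM_FP; the function name float_abi_description and the REQUIRE_FPU switch are hypothetical.

```c
/* Describes, at compile time, how the compiler was told to handle floating
   point, based on GCC/ACLE predefined macros. */
const char *float_abi_description(void)
{
#if !defined(__arm__)
    return "not a 32-bit Arm target (e.g. a host PC build)";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: float operations become calls to emulation routines */
    return "soft: software emulation, no FPU instructions";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: FPU instructions, float arguments in FPU registers */
    return "hard: FPU instructions, FPU registers for arguments";
#elif defined(__ARM_FP)
    /* -mfloat-abi=softfp: FPU instructions, float arguments in core registers */
    return "softfp: FPU instructions, core registers for arguments";
#else
    return "Arm target without floating-point macros";
#endif
}

/* Optional guard: define REQUIRE_FPU (hypothetical project define) to stop the
   build when single-precision FPU instructions are not enabled.
   In __ARM_FP, bit 2 (0x4) means single precision is supported. */
#if defined(REQUIRE_FPU) && !(defined(__ARM_FP) && (__ARM_FP & 0x4))
#error "FPU instructions not enabled: check -mfpu and -mfloat-abi"
#endif
```

Printing float_abi_description() once over the UART is a quick way to confirm which of the three modes the build really uses.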
2022-01-25 11:20 AM
The line that computes "a" in your sample code is "invariant" (the answer never changes), and the result (stored in "a") is never used. Likewise, the final value of "i" is never used outside the loop. Depending on your optimization setting, that line, and in fact the entire while() loop, may be optimized out of the executable code.
2022-01-29 11:37 PM
After some experiments, these are my observations.
I used the DWT cycle counter to measure the cycles consumed.
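For reference, a minimal sketch of enabling and reading the DWT cycle counter, assuming an ARMv7-M core (Cortex-M4/M7) and writing the architectural register addresses directly instead of going through the CMSIS CoreDebug/DWT structures. The helper names (cycles_elapsed, cyccnt_enable, cyccnt_read) are hypothetical, and the unlock write to the lock access register is something I have seen reported as needed on Cortex-M7 parts rather than a verified requirement. Only cycles_elapsed builds on a host PC; the register access is guarded by __ARM_ARCH_7EM__.

```c
#include <stdint.h>

/* Cycles between two counter readings. Unsigned subtraction wraps modulo 2^32,
   so the result stays correct if CYCCNT overflowed once in between. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

#if defined(__ARM_ARCH_7EM__)   /* Cortex-M4/M7 builds only */
/* ARMv7-M addresses; CMSIS exposes the same registers as CoreDebug->DEMCR,
   DWT->CTRL, DWT->CYCCNT and DWT->LAR. */
#define DEMCR_REG      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL_REG   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT_REG (*(volatile uint32_t *)0xE0001004u)
#define DWT_LAR_REG    (*(volatile uint32_t *)0xE0001FB0u)

void cyccnt_enable(void)
{
    DEMCR_REG |= (1u << 24);      /* TRCENA: enable the DWT unit */
    DWT_LAR_REG = 0xC5ACCE55u;    /* software unlock, reportedly needed on Cortex-M7 */
    DWT_CYCCNT_REG = 0u;          /* start counting from zero */
    DWT_CTRL_REG |= 1u;           /* CYCCNTENA: start the cycle counter */
}

static inline uint32_t cyccnt_read(void)
{
    return DWT_CYCCNT_REG;
}
#endif
```

Usage would be t0 = cyccnt_read(); then the code under test; then n = cycles_elapsed(t0, cyccnt_read()). Measuring an empty region the same way gives the overhead of the reads themselves, which can be subtracted.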
I tried two types of declaration:
// Type 1
float t = 2.45f;
// Type 2
float t = 2.45f;
float32_t t2 = t;
Of the two declarations of t, Type 2 seems to be 20-30 cycles faster, even though I am not even using t2.
Thank you everyone for the advice !
2022-01-30 04:30 AM
I'm late here, but anyway: check the code disassembly to better understand what happened and what you are measuring. Are FPU instructions generated, or function calls to software emulation? Does the assembler code correspond 1:1 to the source code?
In your above loop, chances are high that the compiler optimizes away calculations with unused results.
Declare variables volatile to ensure that the calculations are not optimized away.
The compiler may reorder instructions statically (at compile time) for better performance.
On a Cortex-M7, the CPU may reorder memory accesses dynamically (at execution time) for better performance. This, together with cache behaviour, makes it quite complex to get accurate timing. See __DMB, memory barriers, etc.
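One way to keep the counter reads themselves from being moved by the compiler or the core is to fence them. A sketch, assuming GCC-style inline assembly: on an ARMv7-M target it issues DSB (wait for outstanding memory accesses to complete) and ISB (flush the pipeline); on other targets it falls back to a compiler-only barrier so it still builds on a PC. CMSIS offers __DSB() and __ISB() for the same purpose; timing_barrier and read_counter_fenced are hypothetical names. This does not remove cache effects, so warm-up runs still matter.

```c
#include <stdint.h>

/* Barrier placed around counter reads. */
static inline void timing_barrier(void)
{
#if defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_7M__)
    __asm__ volatile("dsb\n\tisb" ::: "memory");  /* hardware + compiler barrier */
#else
    __asm__ volatile("" ::: "memory");            /* compiler-only barrier */
#endif
}

/* Reads a free-running 32-bit counter with barriers on both sides, so the
   read is not reordered into or out of the measured region. */
static inline uint32_t read_counter_fenced(const volatile uint32_t *counter)
{
    timing_barrier();
    uint32_t value = *counter;
    timing_barrier();
    return value;
}
```

On the target this would be used as t0 = read_counter_fenced(&DWT->CYCCNT); followed by the code under test and a second read, with the difference taken as unsigned 32-bit arithmetic.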
For example, I doubt that in general
> 3. The cycle consumed increases as the magnitude of operand increases(12.45f>2.45f).
hth
KnarfB