IAR: 32-bit SP is not 64-bit DP!

tarzan2 · 2014-11-26

Posted on November 26, 2014 at 14:27

Reading this IAR document:

http://www.iar.com/Global/Resources/Developers_Toolbox/C_Cplusplus_Programming/Improve_performance_of_digital_signal_processing_with_IAR_Embedded_Workbench_for_ARM.pdf

page 3:

arm_sqrt_f32: 52 cycles

sqrt: 752 cycles

The cycle counts are fine. BUT sqrt() works on 64-bit double-precision floating-point numbers.

Try using sqrtf() instead: armcc will use the VSQRT instruction, which takes 14 cycles (25-28 cycles including the function call), so sqrtf() is about twice as fast as the DSP iterative algorithm.

Tesla DeLorean · 2014-11-26

Posted on November 26, 2014 at 15:51

Ok, and your point is what?

They used sqrt() as a software FP solution, because sqrtf() would likely use the FPU in any reasonably chosen library. Try timing a software sqrtf() implementation against an FPU-assisted one.


tarzan2 · 2014-11-26

Posted on November 26, 2014 at 19:08

My point is that although IAR recommends the arm_sqrt function to optimize speed, I recommend sqrtf, which is twice as fast.

arm_sqrt is based on a Newton-Raphson iterative algorithm that uses the FPU for basic maths (mul/add...). Since the FPU has the VSQRT instruction, I cannot see why you wouldn't use it.
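To make the comparison concrete, here is a minimal sketch of a Newton-Raphson square root that needs only multiplies and subtracts. It illustrates the technique only; it is not the CMSIS-DSP source. The name nr_sqrtf, its iterations parameter, and the choice to return 0 for non-positive input are assumptions. It iterates on 1/sqrt(a), so no divide is needed, and starts from the well-known 0x5f3759df bit-trick guess.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximates sqrt(a) by Newton-Raphson on the
 * reciprocal square root. Each step uses only mul/sub (no divide, no VSQRT). */
float nr_sqrtf(float a, int iterations)
{
    if (!(a > 0.0f)) {
        return 0.0f;  /* assumption: non-positive (or NaN) input yields 0 */
    }

    /* Initial guess for 1/sqrt(a) from the float's bit pattern. */
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    float y;
    memcpy(&y, &bits, sizeof y);

    for (int i = 0; i < iterations; ++i) {
        /* y <- y * (1.5 - 0.5 * a * y^2): converges quadratically to 1/sqrt(a) */
        y = y * (1.5f - 0.5f * a * y * y);
    }
    return a * y;  /* sqrt(a) = a * (1 / sqrt(a)) */
}
```

Three or four iterations reach roughly full float precision from this starting guess; each one costs several FPU multiplies, whereas VSQRT.F32 does the whole job in a single instruction.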

I'm sorry, but using double types to bypass a 32-bit FPU is not a good idea. Disabling the FPU in the compiler options makes more sense.
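Double precision often sneaks in by accident rather than by choice. The sketch below (IS_DOUBLE, scale_dp and scale_sp are hypothetical names) uses C11 _Generic to show which expressions end up typed as double: calling sqrt() on a float, or multiplying by an unsuffixed literal such as 0.5.

```c
#include <math.h>

/* Evaluates to 1 if the expression has type double, 0 otherwise.
 * _Generic (C11) looks only at the type; the expression is not evaluated. */
#define IS_DOUBLE(expr) _Generic((expr), double: 1, default: 0)

/* 0.5 is a double literal, so x is promoted and the multiply is done in double. */
float scale_dp(float x) { return x * 0.5; }

/* The f suffix keeps the whole expression in single precision. */
float scale_sp(float x) { return x * 0.5f; }
```

With GCC, the -Wdouble-promotion warning flags implicit float-to-double promotions like the one in scale_dp.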

Without the FPU (and without optimizations), the timings are:

arm_sqrt_f32: 250 cycles

sqrt: 663 cycles

sqrtf: 188 cycles

With the FPU:

arm_sqrt_f32: 55 cycles

sqrt: 655 cycles

sqrtf: 28 cycles
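Taking ratios of the cycle counts just posted (plain arithmetic on those figures) shows where the "twice as fast" comes from:

```latex
\text{With FPU:}\quad
\frac{t_{\mathrm{arm\_sqrt\_f32}}}{t_{\mathrm{sqrtf}}} = \frac{55}{28} \approx 1.96,
\qquad
\frac{t_{\mathrm{sqrt}}}{t_{\mathrm{sqrtf}}} = \frac{655}{28} \approx 23.4

\text{Without FPU:}\quad
\frac{t_{\mathrm{arm\_sqrt\_f32}}}{t_{\mathrm{sqrtf}}} = \frac{250}{188} \approx 1.33
```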

IAR's conclusion:

"From this simple example, we can see that CMSIS-DSP is very easy to use and that it improves the performance significantly."

Nonsense.

frankmeyer9 · 2014-11-26

Posted on November 26, 2014 at 19:19

> arm_sqrt is based on a Newton-Raphson iterative algorithm that uses the FPU for basic maths (mul/add...). Since the FPU has the VSQRT instruction, I cannot see why you wouldn't use it.

Look at the header comment in the implementation file "arm_sqrt_q31.c". It says:

* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0

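Not all of the listed cores have an FPU: the Cortex-M0 and Cortex-M3 have none, and on the Cortex-M4 it is optional (and single precision only).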
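On a core without an FPU, integer arithmetic is one way to avoid software floating point entirely. The sketch below uses the classic digit-by-digit (bit-by-bit) integer square root, a different technique from Newton-Raphson and not the arm_sqrt_q31 source. isqrt32 and HAVE_SP_FPU are hypothetical names; __ARM_FP is the ARM C Language Extensions feature macro, whose 0x4 bit indicates single-precision hardware support.

```c
#include <stdint.h>

/* Compile-time check for a single-precision FPU (ACLE __ARM_FP bitmask),
 * e.g. to choose between sqrtf() and an integer routine. */
#if defined(__ARM_FP) && (__ARM_FP & 0x4)
#define HAVE_SP_FPU 1
#else
#define HAVE_SP_FPU 0
#endif

/* Hypothetical helper: floor(sqrt(n)) using only shifts, adds and compares,
 * so it runs on cores without an FPU or a hardware divider. */
uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = 1u << 30;  /* highest power of four representable in 32 bits */

    /* Start from the highest power of four that is <= n. */
    while (bit > n) {
        bit >>= 2;
    }

    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;  /* this result bit is 1 */
        } else {
            root >>= 1;                /* this result bit is 0 */
        }
        bit >>= 2;
    }
    return root;
}
```

The loop runs at most 16 times for a 32-bit input, one result bit per iteration.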