2014-11-26 05:27 AM
Reading
page 3:
arm_sqrt_f32: 52 cycles
sqrt: 752 cycles
The cycle counts are good, BUT sqrt works on 64-bit double-precision floating-point numbers. Try using sqrtf instead: armcc will then use the VSQRT instruction, which takes 14 cycles (25-28 cycles including the function call) => sqrtf is 2 times faster than the DSP iterative algorithm.
2014-11-26 06:51 AM
Ok, and your point is what?
They used sqrt() as a software FP solution, because sqrtf() would likely use the FPU in any reasonably selected library. Try timing a software sqrtf() implementation vs an FPU-assisted one.
2014-11-26 10:08 AM
My point is that although IAR recommends using the arm_sqrt function to optimize speed, I recommend using sqrtf, which is 2 times faster.
The arm_sqrt is based on a Newton-Raphson iterative algorithm that uses the FPU for basic maths (mul/add...). Since the FPU has the VSQRT instruction, I cannot see why you would not use it.
I'm sorry, but using double types to bypass a 32-bit FPU is not a good idea. Disabling the FPU in the compiler options makes more sense.
Without FPU (and without optimizations), the timings are:
arm_sqrt_f32: 250 cycles
sqrt: 663 cycles
sqrtf: 188 cycles
With FPU:
arm_sqrt_f32: 55 cycles
sqrt: 655 cycles
sqrtf: 28 cycles
IAR conclusion: "From this simple example, we can see that CMSIS-DSP is very easy to use and that it improves the performance significantly."
Nonsense.
2014-11-26 10:19 AM
The arm_sqrt is based on a Newton-Raphson iterative algorithm that uses the FPU for basic maths (mul/add...). Since the FPU has the VSQRT instruction, I cannot see why you would not use it.
Look at the head comment in the implementation file "arm_sqrt_q31.c". It says:
* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
Not all of the listed cores feature an FPU.
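For readers who want to see what the "Newton-Raphson iterative algorithm" discussed in this thread looks like, here is a minimal sketch in C. It is not the CMSIS-DSP source: the function name nr_sqrtf, the initial guess, and the caller-supplied iteration count are all assumptions made for illustration. It only shows why an iterative square root costs several multiply/add/divide steps, where an FPU with VSQRT needs a single instruction.

```c
#include <assert.h>

/* Hypothetical helper for illustration only (not taken from CMSIS-DSP).
 * Single-precision Newton-Raphson square root:
 *     x_{n+1} = 0.5 * (x_n + a / x_n)
 * Every step costs one divide, one add and one multiply. */
static float nr_sqrtf(float a, int iterations)
{
    assert(iterations > 0);

    if (a <= 0.0f) {
        return 0.0f;  /* sketch only: zero and negative inputs yield 0 */
    }

    /* Assumed initial guess: any value >= sqrt(a) converges from above. */
    float x = (a > 1.0f) ? a : 1.0f;

    for (int i = 0; i < iterations; ++i) {
        x = 0.5f * (x + a / x);
    }
    return x;
}
```

With this naive initial guess, inputs close to 1 settle within a handful of iterations, while large inputs need noticeably more, so the cost is data dependent. That fits both sides of the argument above: on a core with an FPU a single sqrtf call is hard to beat, while an iterative routine is still needed on the listed cores that have no FPU.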