CORDIC for phase(atan2) not much faster than math.h atan2

GHrib.1 · ‎2023-12-26

I have two functions to calculate the phase via DFT. In both functions, real and imaginary parts are first calculated, and then the phase is determined using different methods.

The first function uses the standard atan2f function from math.h to calculate the phase. Its execution time is approximately 126 us.

The second function uses the CORDIC phase function, which essentially performs the same task as atan2. Surprisingly, this CORDIC-based approach is not much faster, taking around 121 us.

This result raises questions because, according to the reference manual, the CORDIC calculation should take approximately 10 clock cycles, equivalent to around 50 nanoseconds on my MCU. I am utilizing HAL_CORDIC_CalculateZO() to obtain the results.

Is this expected? Does anyone have an idea how to get closer to the specified calculation time? I would be happy with 10 us.

AScha.3 · ‎2023-12-26

Hi,

I just tried a single CORDIC run with CORDIC_FUNCTION_ARCTANGENT on an H563 at a 250 MHz core clock, with these settings:

+ CORDIC_SCALE_0; q1.31 format for input + output data

+ CORDIC_PRECISION_6CYCLES; /* max precision for q1.31 */

+ optimizer set to -O2 (did you set an optimizer level?)

Results:

648 ns for a 1x CORDIC run

1912 ns for a 10x CORDIC run

3312 ns for a 20x CORDIC run (20 variables in one call)

So roughly 1 us for 1 result and 2 us for 10 results. Each additional result costs about 140 ns here, which is 35 CPU clocks. The HAL call alone is about 500 ns of "waste".

(Edit: corrected the timing for the 250 MHz core clock; it was 200 at first.)
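
The split between fixed call overhead and per-result cost can be checked from the measured totals. The sketch below is plain arithmetic on the numbers quoted in this post; the type and function names are made up for illustration.

```c
#include <stdio.h>

/* One measurement: n results per call and the total time in ns. */
typedef struct {
    int    n;
    double total_ns;
} sample_t;

/* Slope between two samples = incremental cost of one more result (ns). */
static double per_result_ns(sample_t a, sample_t b)
{
    return (b.total_ns - a.total_ns) / (double)(b.n - a.n);
}

/* Intercept = fixed cost of the call itself (ns), given the slope. */
static double call_overhead_ns(sample_t a, double slope_ns)
{
    return a.total_ns - slope_ns * (double)a.n;
}

/* Convert a duration in ns to core clock cycles at the given clock in MHz. */
static double ns_to_cycles(double ns, double core_mhz)
{
    return ns * core_mhz / 1000.0;
}

/* Print the breakdown for the three measurements quoted above. */
static void print_breakdown(void)
{
    sample_t s1 = {1, 648.0}, s10 = {10, 1912.0}, s20 = {20, 3312.0};
    double slope = per_result_ns(s10, s20);
    printf("per result: %.1f ns (%.1f cycles at 250 MHz)\n",
           slope, ns_to_cycles(slope, 250.0));
    printf("call overhead: %.1f ns\n", call_overhead_ns(s1, slope));
}
```

Using the 10x and 20x runs, the slope is (3312 - 1912) / 10 = 140 ns per result, and 648 - 140 = 508 ns remains as overhead of the call, which matches the "about 500 ns" figure.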

GHrib.1 · ‎2023-12-26

Thank you for your answer.
I can confirm that the CORDIC function does indeed take approximately 1 us.
The problem was my time measurement approach: I was measuring the execution time of a function with and without the CORDIC call at the end.
When I commented out the CORDIC call, the compiler likely optimized away the for loop inside the function, since its results were no longer used. That created the false impression that CORDIC was the main contributor to the observed time, when in reality it is the for loop that is "time problematic".