Why does CORDIC performance vary so much with optimizations?
I am using the CORDIC peripheral on an STM32G4 for a project and was curious how the compiler optimization level affects its performance. With no optimizations, a LL_CORDIC_FUNCTION_PHASE calculation completed in 113 cycles. With -O1, the same calculation completed in 31 cycles.
I'm a bit confused by the huge disparity in execution times with and without optimization. Since the trig calculations occur in the CORDIC hardware, I wouldn't expect optimizations to have any effect. Additionally, the LL CORDIC function calls are simply inline functions that access the peripheral registers, so I wouldn't expect there to be much to optimize there either.
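To illustrate what "inlines that access the peripheral registers" means here, this is a minimal host-side sketch of that accessor shape. The `mock_cordic_t` type and the `mock_cordic_*` functions are hypothetical stand-ins so the snippet compiles off-target; they are not the ST LL driver itself, and only the WDATA/RDATA register names are taken from the CORDIC register map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CORDIC register block (not the CMSIS type). */
typedef struct {
    volatile uint32_t CSR;   /* control/status register */
    volatile uint32_t WDATA; /* argument (write data) register */
    volatile uint32_t RDATA; /* result (read data) register */
} mock_cordic_t;

/* Thin accessor: a single store to the write-data register. */
static inline void mock_cordic_write_data(mock_cordic_t *cordic, uint32_t in_data)
{
    assert(cordic != NULL);
    cordic->WDATA = in_data;
}

/* Thin accessor: a single load from the read-data register. */
static inline uint32_t mock_cordic_read_data(const mock_cordic_t *cordic)
{
    assert(cordic != NULL);
    return cordic->RDATA;
}
```

Each accessor boils down to one volatile load or store, which is why I would expect the register accesses themselves to cost the same at any optimization level.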
I took a look at the generated assembly, and while I don't fully understand it, the difference seems to come mainly from additional debug features in the unoptimized build. That would make sense, but I would like some confirmation that this is the case. The timing code is below. Thanks.
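/* Sample the SysTick counter before the CORDIC operation. */
g_start_time = SysTick->VAL;
/* Write the two input arguments, then read back the result. */
LL_CORDIC_WriteData(hcordic.Instance, cordic_input[0]);
LL_CORDIC_WriteData(hcordic.Instance, cordic_input[1]);
cordic_output = (int32_t)LL_CORDIC_ReadData(hcordic.Instance);
/* Sample the counter again after the result has been read. */
g_stop_time = SysTick->VAL;
/* SysTick counts down, so elapsed cycles = start - stop. */
g_elapsed_time = g_start_time - g_stop_time;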
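One caveat about the subtraction above: SysTick is a down-counter that reloads when it reaches zero, so `start - stop` is only correct if the counter did not reload between the two samples. A minimal sketch of a wrap-aware version follows. `systick_elapsed` is a hypothetical helper, not part of any ST library; it assumes at most one reload happened between the samples and that `reload` holds the value programmed into `SysTick->LOAD`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ticks elapsed between two samples of a down-counter
 * that runs from `reload` to 0 and then starts again at `reload`.
 * Assumes the counter reloaded at most once between the two samples. */
static uint32_t systick_elapsed(uint32_t start, uint32_t stop, uint32_t reload)
{
    assert(start <= reload && stop <= reload);
    if (start >= stop) {
        /* No reload in between: plain difference. */
        return start - stop;
    }
    /* Counter passed through 0 and reloaded: one full period is reload + 1 ticks. */
    return start + (reload + 1u) - stop;
}
```

On the target this would be called as `systick_elapsed(g_start_time, g_stop_time, SysTick->LOAD)`. For short measurements like this one the reload case is rare, but handling it avoids an occasional bogus reading.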