CORDIC COS implementation

Sushmita · ‎2024-02-21

https://github.com/Sushmitabaga/CORDIC/tree/main

I was trying to enable the CORDIC on an STM32G4 series device to compare the time taken by the normal cos function and the CORDIC cos function. But I found that both take almost the same time, which should not be possible with the CORDIC.

Time Taken with CORDIC

1.61E-05

Time taken without CORDIC

2.50E-06

Please see the attached GitHub code and the code below:

inline int f32_to_q31( float input )

{

const float Q_31_MAX_F = 1.0f ;// 0x0.FFFFFFp0F;//

const float Q_31_MIN_F = -1.0f;

return (int)roundf(scalbnf(fmaxf(fminf(input,Q_31_MAX_F),Q_31_MIN_F),31));

}

#define pi 3.14159265359

/*****convert notation integer to 32 bits float*********/

#define q31_to_f32(x) ldexp((int32_t)x, -31) //q31 represent no in range -1,1

float cordic_q31_cosf(float x)

{

CORDIC_ConfigTypeDef sConfig;

int32_t input_q31 = f32_to_q31(fmod(x,2.0f*pi)/(2.0f*pi))<<1;

int32_t output_q31;

sConfig.Function = CORDIC_FUNCTION_COSINE;

sConfig.Precision = CORDIC_PRECISION_6CYCLES;

sConfig.Scale = CORDIC_SCALE_0;

sConfig.NbWrite = CORDIC_NBWRITE_1;

sConfig.NbRead = CORDIC_NBREAD_1;

sConfig.InSize = CORDIC_INSIZE_32BITS;

sConfig.OutSize = CORDIC_OUTSIZE_32BITS;

HAL_CORDIC_Configure(&hcordic, &sConfig);

HAL_CORDIC_CalculateZO(&hcordic, &input_q31, &output_q31,1 ,0);

return q31_to_f32(output_q31);

}

int i;

for(i=0;i<361;i++)

{

radian = (float)(i)*2.0*pi/360.0; // degrees to radians (2*pi/360)

DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

start_cyc = DWT->CYCCNT; // cycle counter at start

start_tick= SysTick->VAL;

cordic_cos[i] = cordic_q31_cosf(radian);

stop_tick = SysTick->VAL;

elasped = (float)(start_tick - stop_tick)/170000000.0;

end_cyc = DWT->CYCCNT;

if(end_cyc > start_cyc)

{

diff_cyc = end_cyc - start_cyc;

}

else

{

diff_cyc = end_cyc + (0xFFFFFFFF - start_cyc);

}

timeTaken = (float)diff_cyc/170000000.0;

}

Danish1 · ‎2024-02-22

What actually goes on inside cordic_q31_cosf()?

You have the CORDIC calculation itself, which is the thing you're trying to time.

But you also have conversion to and from single-precision-float.

And a call to set up the cordic cosine calculation HAL_CORDIC_Configure()

None of those are part of the cordic calculation itself, and should not be included in the calculation for the time it takes.

If you're doing loads of calculations that need the speed of cordic, you're best-off keeping the numbers in and out in cordic-friendly format. And you shouldn't keep chopping-and-changing the type of calculation you want from the cordic hardware.
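
To illustrate the point about keeping inputs in a CORDIC-friendly format: the CORDIC cosine function takes its angle as a q1.31 value equal to the angle divided by pi, so an angle held in whole degrees can be converted with integer arithmetic only, without fmod or a floating-point division. A minimal sketch, assuming integer degrees as input; deg_to_q31 is a hypothetical helper name, not something from this thread or from the HAL:

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): convert an angle in whole
 * degrees to the q1.31 value angle/pi used as the CORDIC cosine argument. */
static int32_t deg_to_q31(int32_t deg)
{
    /* Wrap into [-180, 180) so the scaled result fits in 32 bits. */
    deg %= 360;
    if (deg >= 180) {
        deg -= 360;
    } else if (deg < -180) {
        deg += 360;
    }
    /* q1.31 = (deg / 180) * 2^31, computed in 64 bits to avoid overflow. */
    return (int32_t)(((int64_t)deg * 2147483648LL) / 180);
}
```

With something like this, a loop over 0..360 degrees could hand deg_to_q31(i) straight to the CORDIC input; 90 degrees maps to 0x40000000, which is 0.5 in q1.31.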

Sushmita · ‎2024-02-22

What goes on inside cordic_q31_cosf()? The value radian = (float)(i)*2.0*pi/360.0 is passed in.

Without the conversion to and from single-precision float, I'm not getting correct cos(theta) values. For example, cos(50) should be 0.64, but the result is wrong. So converting from the fixed-point integer notation to 32-bit float and back is needed.

cordic_q31_cosf(float x)

{

CORDIC_ConfigTypeDef sConfig;

int32_t input_q31 = f32_to_q31(fmod(x,2.0f*pi)/(2.0f*pi))<<1;

int32_t output_q31;

HAL_CORDIC_CalculateZO(&hcordic, &input_q31, &output_q31,1 ,0);

return q31_to_f32(output_q31);

}

Keeping only this much in the CORDIC function block does not improve the timing either.

LCE · ‎2024-02-22

Have you disabled interrupts?

For time critical measurements I also use the cycle counter, but almost always inside:

__disable_irq();
start_cyc = DWT->CYCCNT;
function_under_test();
end_cyc = DWT->CYCCNT;
__enable_irq();

/* now all cyc calculations */

Everything else doesn't make much sense if you really want to make use of the highest resolution counter.
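
For the cycle calculations themselves, DWT->CYCCNT is a free-running 32-bit counter, so a plain unsigned subtraction already gives the right difference across a single wrap-around and no if/else is needed. The wrap branch in the first post, end_cyc + (0xFFFFFFFF - start_cyc), appears to come out one cycle short. A minimal sketch, with hypothetical helper names and the core clock passed in as a parameter (170 MHz in this thread):

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit up-counter.
 * Unsigned arithmetic is modulo 2^32, so one wrap-around is handled
 * by the subtraction itself. */
static uint32_t cycles_elapsed(uint32_t start_cyc, uint32_t end_cyc)
{
    return end_cyc - start_cyc;
}

/* Convert a cycle count to seconds for a given core clock in Hz. */
static float cycles_to_seconds(uint32_t cycles, uint32_t core_hz)
{
    return (float)cycles / (float)core_hz;
}
```

For example, a start value of 0xFFFFFFF0 and an end value of 0x00000010 give 0x20 (32) cycles, whereas the if/else formula would report 31.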

CMYL · ‎2024-02-22

Hello @Sushmita

Application note AN5325, "How to use the CORDIC to perform mathematical functions on STM32 MCUs", describes some requirements of the CORDIC and evaluates (section 3) the performance of mathematical expressions (cos and sin) compared to an equivalent function from a software library such as math.h.

An example is provided in the STM32CubeG4 MCU Package, under \Projects\NUCLEO-G474RE\Examples_LL\CORDIC\CORDIC_CosSin. This example performs a polar to rectangular conversion using the cosine function.

For cos and sin, a speed-up by a factor of 5 to 10 can be reached when using the CORDIC accelerator versus the math.h and arm_math.h software implementations.

Please try the evaluation example provided in AN5325. If you are still not convinced by the material and results in it, we will need your software implementation of cos/sin to compare it against the CORDIC.

Best regards,

Younes

Sushmita · ‎2024-02-25

The given example uses DMA. I want to try without DMA, using the CORDIC zero-overhead mode.

"For cos and sin, a speed-up by a factor of 5 to 10 can be reached when using the CORDIC accelerator versus the math.h and arm_math.h software implementations." But I am not getting that result.

CMYL · ‎2024-02-27

Hi @Sushmita

The time you measure around cordic_q31_cosf(float x), which calls the HAL_CORDIC_CalculateZO() API, counts all the configuration, data formatting ... and not only the CORDIC computation itself.

For a reasonable performance estimate in polling mode, I point you to the following code example in ~\STM32Cube_FW_G4_V1.5.1\Projects\NUCLEO-G431RB\Examples_LL\CORDIC.

This code is based on the STM32 low-level (LL) driver.

The code is also described in section 3.1 of AN5325 "Acceleration example".

Step 1 is done only once.

Steps 2 and 3 can be measured using DWT or SysTick as follows:

/* Read systick counter */
start_ticks = SysTick->VAL;
/* Write angle */
LL_CORDIC_WriteData(CORDIC, ANGLE_CORDIC);
/* Read cosine */
cosOutput = (int32_t)LL_CORDIC_ReadData(CORDIC);
/* Read sine */
sinOutput = (int32_t)LL_CORDIC_ReadData(CORDIC);
/* Read systick counter */
stop_ticks = SysTick->VAL;
/* Calculate number of cycles elapsed */
elapsed_ticks = start_ticks-stop_ticks;

In your benchmark, I suggest you integrate lines 4, 6 and 8 above.
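
A note on the elapsed_ticks calculation above: SysTick counts down and reloads from its LOAD value when it reaches zero, so start_ticks - stop_ticks is only valid when no reload happened between the two reads. A minimal sketch that also covers a single reload, assuming the reload value is passed in by the caller (on the target it would come from SysTick->LOAD); systick_elapsed is a hypothetical helper name:

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a down-counter that reloads to
 * `reload` after reaching zero. Assumes at most one reload occurred
 * between the two reads. */
static uint32_t systick_elapsed(uint32_t start_ticks,
                                uint32_t stop_ticks,
                                uint32_t reload)
{
    if (start_ticks >= stop_ticks) {
        /* No reload: the counter simply counted down. */
        return start_ticks - stop_ticks;
    }
    /* One reload: count down to zero, one tick to reload, then down again. */
    return start_ticks + (reload + 1u) - stop_ticks;
}
```

With a reload value of 169999 (a 1 ms period at 170 MHz), reading 10 and then 169990 corresponds to 20 elapsed ticks.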

Best regards,

Younes

LCE · ‎2024-02-27

Another example showing that HAL functions often do a little too much if you want speed.

@Sushmita please remove the "solved" tag from my post, I just showed how to best use the cycle counter, not how to get the Cordic faster.

CMYL · ‎2024-02-27

Hi @LCE

Thank you, as you can see ST provides 2 types of drivers:

- HAL is the higher abstraction layer, which makes it easy to write your embedded firmware.

- When optimization matters, low-level (LL) drivers are provided, but they require more knowledge of the hardware peripherals.

Anyway, customers can use both drivers in mixed mode :)

The CubeFW package ( ~\STM32Cube_FW_G4_V1.5.1\Projects\) provides a template showing how to use the 3 driver-modes (HAL, LL or mixed).

@Sushmita, I hope you see performance improvements using the LL drivers?

Best regards,

Younes

LCE · ‎2024-02-27

@CMYL I know about HAL and LL, and that's why I use neither when I need the best performance. I use direct register access instead.

I come from the hardware side (schematics, analog, digital (PLD), PCB layout) and I want to know what's going on inside an SOC anyway. At least as much as possible or necessary.