Optimizing DFT for loop

GHrib.1 · ‎2023-12-29

Do you think it is normal for the function below (arraySize = 32) to take approximately 125 us to execute? The for loop is the main problem, as it takes almost 120 us of that.
I am using an STM32G474ret6u MCU with the clock at 170 MHz, and the optimization is set to -O3.
Does anyone have an idea of which direction I should go in to optimize it? Should I examine the assembler code, or try to use some peripheral like FMAC?

const float reComponent[32] = {1.000000, 0.923880, 0.707107, 0.382683, 0.000000, -0.382683, -0.707107, -0.923880,
		-1.000000, -0.923880, -0.707107, -0.382683, -0.000000, 0.382683, 0.707107, 0.923880, 1.000000, 0.923880,
		0.707107, 0.382683, 0.000000, -0.382683,-0.707107, -0.923880, -1.000000, -0.923880, -0.707107, -0.382683,
		-0.000000, 0.382683, 0.707107, 0.923880 };

const float imComponent[32] = {0.000000, 0.382683, 0.707107, 0.923880, 1.000000, 0.923880, 0.707107, 0.382683, 0.000000,
		-0.382683, -0.707107, -0.923880, -1.000000, -0.923880, -0.707107, -0.382683, -0.000000, 0.382683, 0.707107,
		0.923880, 1.000000, 0.923880, 0.707107, 0.382683, 0.000000, -0.382683, -0.707107, -0.923880, -1.000000,
		-0.923880, -0.707107, -0.382683 };

float DFTphase(uint16_t* inputArray, int arraySize)
{

    //local variables
    float fkRe=0;
    float fkIm=0;
    float phase=0;

    //Computing of Fourier series
    for (int n = 0; n < arraySize; n++)
    {
        fkRe = fkRe + (*inputArray - 2048.0) * reComponent[n];
        fkIm = fkIm + (*inputArray - 2048.0) * imComponent[n];
        //Assign address of next element to pointer inputArray
        inputArray++;
    }

    //Evaluation of phase; atan2f function returns angle in the interval [-PI,PI]
    phase= atan2f(fkRe,fkIm);

    return phase;
}

AScha.3 · ‎2023-12-29

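Your input is uint16_t, so subtracting 2048 as an integer would be much faster, at the same precision, than subtracting 2048.0. That constant is a double, so the whole expression is evaluated in double precision, and the FPU on this core only handles single precision. Try it and tell us how much faster it is.

+

The CORDIC peripheral can do the atan in about 140 ns (I tried on an H563 at 250 MHz), if this helps.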
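
To illustrate the integer-subtraction suggestion above, here is a minimal sketch of one loop term written both ways. The helper names `term_double` and `term_int` are hypothetical and only serve the comparison; they are not from the original code.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Original form: 2048.0 is a double constant, so the subtraction and the
   multiplication are both carried out in double precision. On a core with
   a single-precision-only FPU this ends up in software routines. */
float term_double(uint16_t sample, float coeff)
{
    return (float)((sample - 2048.0) * coeff);
}

/* Suggested form: subtract as a signed integer, convert once to float,
   then multiply in single precision. */
float term_int(uint16_t sample, float coeff)
{
    int32_t centered = (int32_t)sample - 2048;
    return (float)centered * coeff;
}
```

Writing the constant as `2048.0f` would also keep the arithmetic in single precision; the integer version simply avoids one floating-point subtraction per sample as well.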

If you feel a post has answered your question, please click "Accept as Solution".

GHrib.1 · ‎2023-12-29

Crazy :D . The whole function execution time is now approximately 8us (before it was 126us), I am more than pleased with that :D. I double-checked because I couldn't believe it.
Thank you.

Pavel A. · ‎2023-12-29

Also, you can put the const arrays in RAM: fetching from RAM may be faster than fetching from flash.
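
One way to try the RAM suggestion is sketched below, assuming a typical GCC setup where non-const static arrays are placed in RAM. The names `reRam`, `imRam`, `dft_tables_to_ram`, `dft_re_table` and `dft_im_table` are hypothetical; whether this is actually faster depends on the flash wait states and cache configuration, so it should be measured.

```c
#include <stddef.h>
#include <string.h>

#define DFT_SIZE 32

/* Working copies of the coefficient tables. They are not const, so a
   typical linker script places them in RAM rather than in flash. */
static float reRam[DFT_SIZE];
static float imRam[DFT_SIZE];

/* Copy the flash-resident tables into RAM once, e.g. at startup.
   At most DFT_SIZE entries are copied. */
void dft_tables_to_ram(const float *reFlash, const float *imFlash, size_t count)
{
    if (count > DFT_SIZE) {
        count = DFT_SIZE;
    }
    memcpy(reRam, reFlash, count * sizeof(float));
    memcpy(imRam, imFlash, count * sizeof(float));
}

/* Read-only accessors for the RAM copies. */
const float *dft_re_table(void) { return reRam; }
const float *dft_im_table(void) { return imRam; }
```

A simpler variant is to drop the `const` qualifier from `reComponent` and `imComponent`; on common toolchains the startup code then copies the initial values from flash into RAM before `main` runs.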

I'd also suggest massaging the code a bit so it doesn't scratch the reviewer's eye:

#include <assert.h>
#include <math.h>
#include <stdint.h>

float DFTphase(uint16_t* inputArray, int arraySize)
{
    assert(arraySize <= 32);
    float fkRe=0;
    float fkIm=0;

    for (int n = 0; n < arraySize; n++)
    {
        //Signed subtraction: with 2048U the result is unsigned and wraps around for samples below 2048
        float v = (float)((int)inputArray[n] - 2048);
        fkRe += v * reComponent[n];
        fkIm += v * imComponent[n];
    }

    //Evaluation of phase; atan2f function returns angle in the interval [-PI,PI]
    return atan2f(fkRe,fkIm);
}

AScha.3 · ‎2023-12-29

massage

😂
