Time Delay in FFT performance

SA V.1 · 2024-09-24

Hello community,
The CMSIS-DSP library supports FFTs of up to 4096 points, but my project requires a 16384-point FFT. I added some extra library files to my project (written in C and downloaded from GitHub) and successfully performed a 16384-point FFT. However, the FFT takes far too long: around 27 to 30 seconds. Does anyone have a solution to make it more efficient and reduce the time?

SA V.1 · 2024-10-03

With the CMSIS-DSP library, a 4K FFT takes 1.6 seconds to execute; with the instruction and data caches (I-cache and D-cache) enabled, it takes 440 ms. With the extra DSP library files added for a 16,384-point FFT, it takes 27 seconds, or 7 seconds with both caches enabled. Is there any solution?

Andrew Neil · 2024-10-03

So, with cache:

4,096 points takes 440 ms;
16,384 points takes 7 s.

Without cache:

4,096 points takes 1.6 s;
16,384 points takes 27 s.

In both cases, the difference is a factor of 16:

16K points is 4 times as many as 4K points;

4 squared is 16.

Is it a surprise that multiplying the number of points by X multiplies the execution time by X squared?

AScha.3 · 2024-10-03

>Is there Any Solution ???

Yes. First, "a 4K FFT takes ...": which FFT? Complex, real, in-place...?

-- Which data format? float, double, int32?

-- If float: are you using the hardware FPU? Is the FFT library also built for it? (Libraries can be built for software or hardware floating point.)

-- "add extra DSP library": which one, and with which settings?
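On the hardware-float question: with GCC (and Clang), the ACLE predefined macros let a source file report how it was compiled. A hedged sketch; `float_build_mode` is a hypothetical name, and the macro behaviour described in the comments is GCC's. Compiling it with the same flags as the GitHub FFT sources shows whether those sources actually emit FPU instructions:

```c
/* Reports how this translation unit was compiled for floating point.
 * GCC defines __ARM_PCS_VFP for -mfloat-abi=hard and __ARM_FP whenever
 * FPU instructions may be used (hard or softfp); with -mfloat-abi=soft,
 * __ARM_FP is left undefined. On an STM32H7 the flags are typically
 * -mcpu=cortex-m7 -mfpu=fpv5-d16 -mfloat-abi=hard. */
const char *float_build_mode(void)
{
#if defined(__arm__)
#  if defined(__ARM_PCS_VFP)
    return "ARM, hard-float ABI: FPU instructions, float arguments in FPU registers";
#  elif defined(__ARM_FP)
    return "ARM, softfp ABI: FPU instructions, float arguments in integer registers";
#  else
    return "ARM, soft-float: float arithmetic emulated in software";
#  endif
#else
    return "not a 32-bit ARM target";
#endif
}
```

If an FFT source reports soft-float, every multiply and add in the butterflies becomes a library call, which is far slower than a single FPU instruction. A stricter variant would replace the soft-float branch with an `#error` so a mismatched build fails at compile time.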


SA V.1 · 2024-10-03

> Complex FFT, data format float.

> The H7 supports an FPU.

> Extra DSP library: I followed https://github.com/Treeed/Long_FFTs_for_CMSIS_DSP/tree/master

Part of the code:

```c
// Call site
Process_FFT(ADC_TEMP_Ch1);

// FFT processing
void Process_FFT(signed short *pfftin_array)
{
    unsigned short fl_i;

    // Scale the raw samples (0.0001007080078125 = 3.3 / 32768);
    // the f suffix keeps the multiply in single precision
    for (fl_i = 0; fl_i < FFT_SIZE; fl_i++)
    {
        fft_input_buffer[fl_i] = 0.0001007080078125f * pfftin_array[fl_i];
    }

    // Perform the FFT
    arm_rfft_fast_f32_extra(&fft_instance, fft_input_buffer, fft_output_buffer, 0);

    // Visualize the results (you can customize this part as needed)
    for (uint32_t i = 0; i < FFT_SIZE / 2; i++)
    {
        float_t real = fft_output_buffer[i * 2];
        float_t imag = fft_output_buffer[i * 2 + 1];
        float_t magnitude = sqrtf(real * real + imag * imag);

        FFT[index_1] = magnitude;
        index_1++;
        if (index_1 >= FFT_SIZE)
        {
            index_1 = 0;
        }

        // NOTE: this loop is nested inside the bin loop, so it executes
        // (FFT_SIZE / 2) * (FFT_SIZE + 1) times per call; m <= FFT_SIZE also
        // reads FFT[FFT_SIZE], one past the end if FFT has FFT_SIZE elements
        for (int m = 0; m <= FFT_SIZE; m++)
        {
            FFT_VALUE = FFT[m];
            FFT_INPUT = FFT_VALUE * 10;
        }

        HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_13);
    }
}
```

*******************************************************

```c
/**
 * @brief Processing function for the floating-point real FFT.
 * @param[in]  S        points to an arm_rfft_fast_instance_f32_extra structure
 * @param[in]  p        points to the input buffer (the source buffer is modified by this function)
 * @param[out] pOut     points to the output buffer
 * @param[in]  ifftFlag
 *   - value = 0: RFFT
 *   - value = 1: RIFFT
 * @return none
 */
void arm_rfft_fast_f32_extra(
    const arm_rfft_fast_instance_f32_extra *S,
    float32_t *p,
    float32_t *pOut,
    uint8_t ifftFlag)
{
    const arm_cfft_instance_f32_extra *Sint = &(S->Sint);

    /* Calculation of Real FFT */
    if (ifftFlag)
    {
        /* Real FFT compression */
        merge_rfft_f32_extra(S, p, pOut);

        /* Complex radix-4 IFFT process */
        arm_cfft_f32_extra(Sint, pOut, ifftFlag, 1);
    }
    else
    {
        /* Calculation of RFFT of input */
        arm_cfft_f32_extra(Sint, p, ifftFlag, 1);

        /* Real FFT extraction */
        stage_rfft_f32_extra(S, p, pOut);
    }
}
```

*************************************************

```c
/**
 * @brief Processing function for the floating-point complex FFT.
 * @param[in]     S              points to an instance of the floating-point CFFT structure
 * @param[in,out] p1             points to the complex data buffer of size 2*fftLen; processing occurs in place
 * @param[in]     ifftFlag       flag that selects the transform direction
 *   - value = 0: forward transform
 *   - value = 1: inverse transform
 * @param[in]     bitReverseFlag flag that enables / disables bit reversal of the output
 *   - value = 0: disables bit reversal of output
 *   - value = 1: enables bit reversal of output
 * @return none
 */
void arm_cfft_f32_extra(
    const arm_cfft_instance_f32_extra *S,
    float32_t *p1,
    uint8_t ifftFlag,
    uint8_t bitReverseFlag)
{
    uint32_t L = S->fftLen, l;
    float32_t invL, *pSrc;

    if (ifftFlag == 1U)
    {
        /* Conjugate input data: negate every imaginary part */
        pSrc = p1 + 1;
        for (l = 0; l < L; l++)
        {
            *pSrc = -*pSrc;
            pSrc += 2;
        }
    }

    switch (L)
    {
    case 16:
    case 128:
    case 1024:
    case 8192:
    case 65536:
        arm_cfft_radix8by2_f32_extra((arm_cfft_instance_f32_extra *) S, p1);
        break;
    case 32:
    case 256:
    case 2048:
    case 16384:
        arm_cfft_radix8by4_f32_extra((arm_cfft_instance_f32_extra *) S, p1);
        break;
    case 64:
    case 512:
    case 4096:
    case 32768:
        arm_radix8_butterfly_f32(p1, L, (float32_t *) S->pTwiddle, 1);
        break;
    }

    if (bitReverseFlag)
        arm_bitreversal_32_extra((uint32_t *) p1, S->bitRevLength, S->pBitRevTable);

    if (ifftFlag == 1U)
    {
        invL = 1.0f / (float32_t) L;

        /* Conjugate and scale output data */
        pSrc = p1;
        for (l = 0; l < L; l++)
        {
            *pSrc++ *= invL;
            *pSrc = -(*pSrc) * invL;
            pSrc++;
        }
    }
}
```
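The switch in arm_cfft_f32_extra dispatches on the complex length: 16·8^k goes to the radix-8-by-2 routine, 32·8^k to radix-8-by-4, and 64·8^k to the plain radix-8 butterfly, up to 65536. A length that matches no case (for example a mistyped 16383) silently skips all butterfly stages. The helper below is a hypothetical host-side check that reproduces this routing; its names are illustrative and not part of CMSIS or the GitHub repository.

```c
#include <stdint.h>

/* Illustrative labels for the three butterfly routines in the switch. */
typedef enum {
    CFFT_UNSUPPORTED, /* no case matches: butterfly stages are skipped */
    CFFT_RADIX8BY2,   /* 16 * 8^k: 16, 128, 1024, 8192, 65536 */
    CFFT_RADIX8BY4,   /* 32 * 8^k: 32, 256, 2048, 16384 */
    CFFT_RADIX8       /* 64 * 8^k: 64, 512, 4096, 32768 */
} cfft_route;

cfft_route cfft_route_for_length(uint32_t len)
{
    if (len < 16 || len > 65536)
        return CFFT_UNSUPPORTED;
    /* Divide out factors of 8 until the length is 16, 32 or 64. */
    while (len > 64 && len % 8 == 0)
        len /= 8;
    switch (len) {
    case 16: return CFFT_RADIX8BY2;
    case 32: return CFFT_RADIX8BY4;
    case 64: return CFFT_RADIX8;
    default: return CFFT_UNSUPPORTED;
    }
}
```

For example, cfft_route_for_length(16384) returns CFFT_RADIX8BY4 and cfft_route_for_length(16383) returns CFFT_UNSUPPORTED.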

Andrew Neil · 2024-10-03

See the Posting Tips for how to properly post source code:

https://community.st.com/t5/community-guidelines/how-to-write-your-question-to-maximize-your-chances-to-find-a/ta-p/575228