2024-09-24 12:18 AM
Hello community,
The CMSIS DSP library supports FFTs of up to 4096 points, but my project requires a 16384-point FFT. I added some extra library files (written in C, downloaded from GitHub) to my project and successfully performed a 16384-point FFT. However, the FFT takes far too long, around 27 to 30 seconds. Does anyone know how to make it more efficient and reduce the execution time?
2024-10-03 02:38 AM
With the CMSIS DSP library, a 4K FFT takes 1.6 seconds to execute; with the I+D cache enabled, it takes 440 ms. With the extra DSP library files added to perform a 16,384-point FFT, it takes 27 seconds; with the I+D cache enabled, it takes 7 seconds. Is there any solution?
2024-10-03 02:58 AM
So, with cache: 4K takes 440 ms and 16K takes 7 s;
without cache: 4K takes 1.6 s and 16K takes 27 s.
In both cases, the difference is a factor of roughly 16:
16K points is 4 times as many as 4K points;
4 squared is 16.
Is it a surprise that multiplying the number of points by X multiplies the execution time by X squared ?
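For comparison, a textbook FFT costs on the order of N*log2(N) operations rather than N^2, so quadrupling N would be expected to cost roughly 4.7 times as much, not 16 times. A minimal sketch of that arithmetic follows; the helper names are made up for illustration, and cache and memory effects are ignored:

```c
#include <assert.h>
#include <stdio.h>

/* Integer log2 of a power-of-two length (hypothetical helper). */
static unsigned log2_u32(unsigned n)
{
    unsigned bits = 0;
    while (n > 1u) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Ratio of N*log2(N) operation counts between two FFT lengths. */
static double nlogn_ratio(unsigned n_hi, unsigned n_lo)
{
    assert(n_lo > 1u); /* log2(1) == 0 would divide by zero */
    double hi = (double)n_hi * log2_u32(n_hi);
    double lo = (double)n_lo * log2_u32(n_lo);
    return hi / lo;
}

/* Ratio if the cost grew with the square of the length. */
static double quadratic_ratio(unsigned n_hi, unsigned n_lo)
{
    double r = (double)n_hi / (double)n_lo;
    return r * r;
}

/* Print both estimates for the 4K -> 16K case from this thread. */
static void print_scaling(void)
{
    printf("N log N estimate: x%.2f\n", nlogn_ratio(16384u, 4096u));
    printf("N^2 estimate    : x%.2f\n", quadratic_ratio(16384u, 4096u));
}
```

The measured factor of about 16 is therefore much closer to quadratic growth than to what the transform alone should show, which may point at something other than the FFT kernel.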
2024-10-03 03:01 AM
> Is there any solution?
Yes. But first: "a 4K FFT takes ..." --- which FFT? Complex, real, in-place?
-- Which data format? float, double, int32?
-- If float: is the hardware FPU used? Also in the FFT library? (Libraries can be built for software or hardware floating-point multiplication.)
-- "add extra DSP library": which one, and with which settings?
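One way to answer the FPU question from inside the build is to look at the compiler's predefined macros. The sketch below assumes a GCC, Clang, or Arm Compiler 6 toolchain that follows the Arm C Language Extensions (`__ARM_FP`, `__ARM_PCS_VFP`); the function name is hypothetical, and other compilers may use different macros:

```c
#include <assert.h>

/* Report which floating-point mode this translation unit was compiled for.
 * Assumption: the toolchain defines the ACLE macros __ARM_FP and
 * __ARM_PCS_VFP; on a non-ARM host neither is defined. */
static const char *fp_build_mode(void)
{
#if defined(__ARM_FP) && defined(__ARM_PCS_VFP)
    return "hard-float ABI: FPU instructions, FP arguments in FPU registers";
#elif defined(__ARM_FP)
    return "softfp ABI: FPU instructions, FP arguments in core registers";
#elif defined(__arm__) || defined(__thumb__)
    return "soft-float: floating point emulated in software";
#else
    return "not an ARM build";
#endif
}
```

The library and the application need to be compiled with the same floating-point settings; a library built for soft-float emulates every multiply even when the core has an FPU.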
2024-10-03 03:38 AM
> Complex FFT, data format float.
> The H7 supports an FPU.
> Extra DSP library: I followed this --> https://github.com/Treeed/Long_FFTs_for_CMSIS_DSP/tree/master
Some part of the code:
Process_FFT(ADC_TEMP_Ch1);
// FFT processing
void Process_FFT(signed short * pfftin_array)
{
    unsigned short fl_i;
    for (fl_i = 0; fl_i < FFT_SIZE; fl_i++)
    {
        fft_input_buffer[fl_i] = 0.0001007080078125 * pfftin_array[fl_i];
    }
    // Perform the FFT
    arm_rfft_fast_f32_extra(&fft_instance, fft_input_buffer, fft_output_buffer, 0);
    // Visualize the results (you can customize this part as needed)
    for (uint32_t i = 0; i < FFT_SIZE / 2; i++)
    {
        float_t real = fft_output_buffer[i * 2];
        float_t imag = fft_output_buffer[i * 2 + 1];
        float_t magnitude = sqrtf(real * real + imag * imag);
        FFT[index_1] = magnitude;
        index_1++;
        if (index_1 >= FFT_SIZE)
        {
            index_1 = 0;
        }
        for (int m = 0; m <= FFT_SIZE; m++)
        {
            FFT_VALUE = FFT[m];
            FFT_INPUT = FFT_VALUE * 10;
        }
    }
    HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_13);
}
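One thing that stands out in Process_FFT as posted: the `for (m ...)` loop sits inside the per-bin loop, so its body runs FFT_SIZE/2 x (FFT_SIZE + 1) times, which is roughly 134 million iterations for 16384 points and grows with the square of the length. Its `<=` bound also reads FFT[FFT_SIZE], which is one past the end if FFT holds FFT_SIZE elements. Below is a minimal sketch of the same post-processing with the scan moved out of the bin loop, assuming the scan only has to run once per transform. The function name and parameters are placeholders rather than the project's code, the peak search is a stand-in for whatever the display needs, and the sketch stores squared magnitudes to stay self-contained (the `sqrtf` from the post can be kept):

```c
#include <assert.h>
#include <stddef.h>

/* Fill power[] with re^2 + im^2 per bin, then scan the bins once.
 * Returns the largest value after the x10 scaling used in the post. */
static float scan_bins_once(const float *spectrum, size_t n_bins, float *power)
{
    float peak = 0.0f;
    assert(spectrum != NULL && power != NULL);

    /* Pass 1: one squared magnitude per complex bin (interleaved re, im). */
    for (size_t i = 0; i < n_bins; i++) {
        float re = spectrum[2u * i];
        float im = spectrum[2u * i + 1u];
        power[i] = re * re + im * im;
    }

    /* Pass 2: a single scan outside the bin loop, bounded with '<'. */
    for (size_t m = 0; m < n_bins; m++) {
        float scaled = power[m] * 10.0f;
        if (scaled > peak) {
            peak = scaled;
        }
    }
    return peak;
}
```

With this structure the post-processing touches each bin twice, so its cost grows linearly with the FFT length instead of quadratically.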
*******************************************************
/**
@brief Processing function for the floating-point real FFT.
@param[in] S points to an arm_rfft_fast_instance_f32 structure
@param[in] p points to input buffer (Source buffer is modified by this function.)
@param[in] pOut points to output buffer
@param[in] ifftFlag
- value = 0: RFFT
- value = 1: RIFFT
@return none
*/
void arm_rfft_fast_f32_extra(
    const arm_rfft_fast_instance_f32_extra * S,
    float32_t * p,
    float32_t * pOut,
    uint8_t ifftFlag)
{
    const arm_cfft_instance_f32_extra * Sint = &(S->Sint);
    /* Calculation of Real FFT */
    if (ifftFlag)
    {
        /* Real FFT compression */
        merge_rfft_f32_extra(S, p, pOut);
        /* Complex radix-4 IFFT process */
        arm_cfft_f32_extra(Sint, pOut, ifftFlag, 1);
    }
    else
    {
        /* Calculation of RFFT of input */
        arm_cfft_f32_extra(Sint, p, ifftFlag, 1);
        /* Real FFT extraction */
        stage_rfft_f32_extra(S, p, pOut);
    }
}
*************************************************
/**
@brief Processing function for the floating-point complex FFT.
@param[in] S points to an instance of the floating-point CFFT structure
@param[in,out] p1 points to the complex data buffer of size <code>2*fftLen</code>. Processing occurs in-place
@param[in] ifftFlag flag that selects transform direction
- value = 0: forward transform
- value = 1: inverse transform
@param[in] bitReverseFlag flag that enables / disables bit reversal of output
- value = 0: disables bit reversal of output
- value = 1: enables bit reversal of output
@return none
*/
void arm_cfft_f32_extra(
    const arm_cfft_instance_f32_extra * S,
    float32_t * p1,
    uint8_t ifftFlag,
    uint8_t bitReverseFlag)
{
    uint32_t L = S->fftLen, l;
    float32_t invL, * pSrc;
    if (ifftFlag == 1U)
    {
        /* Conjugate input data */
        pSrc = p1 + 1;
        for (l = 0; l < L; l++)
        {
            *pSrc = -*pSrc;
            pSrc += 2;
        }
    }
    switch (L)
    {
    case 16:
    case 128:
    case 1024:
    case 8192:
    case 65536:
        arm_cfft_radix8by2_f32_extra ( (arm_cfft_instance_f32_extra *) S, p1);
        break;
    case 32:
    case 256:
    case 2048:
    case 16384:
        arm_cfft_radix8by4_f32_extra ( (arm_cfft_instance_f32_extra *) S, p1);
        break;
    case 64:
    case 512:
    case 4096:
    case 32768:
        arm_radix8_butterfly_f32 ( p1, L, (float32_t *) S->pTwiddle, 1);
        break;
    }
    if ( bitReverseFlag )
        arm_bitreversal_32_extra ((uint32_t*) p1, S->bitRevLength, S->pBitRevTable);
    if (ifftFlag == 1U)
    {
        invL = 1.0f / (float32_t)L;
        /* Conjugate and scale output data */
        pSrc = p1;
        for (l = 0; l < L; l++)
        {
            *pSrc++ *= invL;
            *pSrc = -(*pSrc) * invL;
            pSrc++;
        }
    }
}
#endif /* defined(ARM_MATH_MVEF) && !defined(ARM_MATH_AUTOVECTORIZE) */
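The length cases in the switch above follow a pattern: every supported length is a power of two, and the remainder of log2(L) divided by 3 decides which kernel runs. A minimal sketch of that mapping follows; the enum and function names are hypothetical and are not part of CMSIS-DSP or the extra library:

```c
#include <assert.h>

typedef enum {
    KERNEL_RADIX8,     /* L = 8^k     : 64, 512, 4096, 32768          */
    KERNEL_RADIX8BY2,  /* L = 2 * 8^k : 16, 128, 1024, 8192, 65536    */
    KERNEL_RADIX8BY4   /* L = 4 * 8^k : 32, 256, 2048, 16384          */
} fft_kernel_t;

/* Pick the kernel family for a power-of-two length L >= 16. */
static fft_kernel_t kernel_for_length(unsigned long L)
{
    unsigned log2_len = 0;

    /* Only powers of two are valid lengths. */
    assert(L >= 16ul && (L & (L - 1ul)) == 0ul);

    while ((1ul << log2_len) < L) {
        log2_len++;
    }

    switch (log2_len % 3u) {
    case 0:  return KERNEL_RADIX8;
    case 1:  return KERNEL_RADIX8BY2;
    default: return KERNEL_RADIX8BY4;
    }
}
```

For example, 16384 = 2^14 = 4 * 8^4 falls in the radix8by4 group, while 8192 = 2^13 = 2 * 8^4 falls in the radix8by2 group. A length that is not a power of two matches no case, so the switch would skip the butterfly stage entirely.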
2024-10-03 03:41 AM
See the Posting Tips for how to properly post source code: