Skip to main content
SA V.1
Associate III
September 24, 2024
Question

Time Delay in FFT performance

  • September 24, 2024
  • 4 replies
  • 4417 views

Hello community,
The CMSIS DSP library supports a 4096-point FFT, but my project requires a 16384-point FFT. I added some extra library files (downloaded from GitHub; these files are written in C) to my project and successfully
performed a 16384-point FFT. However, the issue is that performing the FFT takes too much time, with a delay of around 27 to 30 seconds. Does anyone have a solution to make this more efficient and reduce the time?

 

4 replies

AScha.3
Super User
September 24, 2024

Hi,

16k FFT , is about N*log(N) butterfly ops -> about 230k ; assuming the cpu can do 4Mio /s , your FFT should run in about 100ms or so, not 30 s.

What we talk about : data/FFT in place? only real data ? fixed 16b or float or double data/FFT ?

 

To get best speed, you have to use (for ARM ) optimized FFT, CMSIS DSP library  should be this exactly.

Then your data : in RAM, all caches ON. (I+D)

And code with optimizer on -O2 or -Ofast .

"If you feel a post has answered your question, please click ""Accept as Solution""."
SA V.1
SA V.1Author
Associate III
September 24, 2024

16k FFT , is about N*log(N) butterfly ops -> about 230k ; assuming the cpu can do 4Mio /s , your FFT should run in about 100ms or so, not 30 s.  ---> i didn't understand this calculation ?  

FFT input is float, am using ARM(STM32H745xx muc CortexM7),CMSIS DSP library limit is 4096 points my project requrement 16384 points .

 

AScha.3
Super User
September 24, 2024

Calculation is about : how many operations needed -> (i.e., order n log ⁡ n  or greater)

-> https://en.wikipedia.org/wiki/Fast_Fourier_transform

Just ABOUT how much time it will need, assuming 4 cpu clocks for a MUL ... just to see, what will come out.

Ok, so float (try on H7 : double may be faster, depends on lib .)

only real data or complex ? real maybe : arm_rfft_fast_f32(..) ;

So try at first this, with max. possible - 4k or whatever, to see: you do it right (computes in some ms, then ok.)

Then go for other implementation, if here in CMSIS some limit is bad for you, maybe use ffftw or other, just look on web and try. (You sure about the 4k limit ? No other function in CMSIS with bigger FFT ?

I just see the int16 "limit" :

uint16_t fftLenRFFT

 So max. 65k leght possible here. )

(Btw i used most time my own FFT, not optimized code, but i made it (as a student) and its working correct . )

 

"If you feel a post has answered your question, please click ""Accept as Solution""."
SA V.1
SA V.1Author
Associate III
September 24, 2024

 working on H7 only (STM32H745xx) ,real data  

---> The functions support lengths of [32, 64, 128, ..., 4096]samples.

 

Done with 4K and got the output properly Next Added some library files to perform 16384

and got the result also but consumes more time https://github.com/Treeed/Long_FFTs_for_CMSIS_DSP/tree/master.

so looking for solution ?? or guidence to use fftw ??

 

TDK
September 24, 2024
"If you feel a post has answered your question, please click ""Accept as Solution""."
SA V.1
SA V.1Author
Associate III
October 3, 2024

In the CMSIS DSP library, a 4K FFT takes 1.6 seconds to execute. When I enable the I+D cache, it takes 440 ms. However, when I add extra DSP library files to perform a 16,384-point FFT, it takes 27 seconds to execute. With the I+D cache enabled, it takes 7 seconds. Is there Any Solution ???

Andrew Neil
Super User
October 3, 2024

So, with cache:

  •  4,096 points takes 440 ms;
  • 16,384 points takes 7s

without cache

  •  4,096 points takes 1.6s;
  • 16,384 points takes 27s

In both cases, the difference is a factor of 16:

16K points is 4 times as many as 4K points;

4 squared is 16.

Is it a surprise that multiplying the number of points by X multiplies the execution time by X squared ?

 

A complex system that works is invariably found to have evolved from a simple system that worked.A complex system designed from scratch never works and cannot be patched up to make it work.