Time Delay in FFT performance

SA V.1 · 2024-09-24

Hello community,
The CMSIS DSP library supports FFTs of up to 4096 points, but my project requires a 16384-point FFT. I added some extra library files (downloaded from GitHub; these files are written in C) to my project and successfully performed a 16384-point FFT. However, performing the FFT takes far too long: around 27 to 30 seconds. Does anyone have a solution to make this more efficient and reduce the time?

AScha.3 · 2024-09-24

Hi,

A 16k FFT takes roughly N*log2(N) operations (butterfly steps), so about 230k for N = 16384. Assuming the CPU can do 4 million of them per second, your FFT should run in about 100 ms or so, not 30 s.
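To make the arithmetic concrete, here is a minimal C sketch of the same estimate. The helper names are hypothetical, the length is assumed to be a power of two, and 4 million operations per second is the deliberately pessimistic figure from the post, not a measured H7 number.

```c
/* Back-of-envelope FFT cost: roughly N * log2(N) basic operations.
   This is an order-of-magnitude figure, not a cycle count. */

/* Integer log2 for a power of two (e.g. 16384 -> 14). */
static unsigned ilog2_u(unsigned n)
{
    unsigned k = 0;
    while (n > 1u) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Approximate operation count: N * log2(N). */
static unsigned long fft_op_estimate(unsigned n)
{
    return (unsigned long)n * ilog2_u(n);
}

/* Estimated run time in milliseconds at a given operation rate. */
static double fft_time_ms(unsigned n, double ops_per_second)
{
    return 1000.0 * (double)fft_op_estimate(n) / ops_per_second;
}
```

For N = 16384 this gives 16384 * 14 = 229376 operations, and fft_time_ms(16384, 4e6) comes out at about 57 ms: the same order of magnitude as the ~100 ms estimate, and nowhere near 30 s.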

What are we talking about: is the data/FFT in place? Only real data? Fixed-point 16-bit, float, or double data/FFT?

To get the best speed, you have to use an FFT optimized for ARM; the CMSIS DSP library should be exactly that.

Then your data: keep it in RAM, with all caches ON (I + D).

And compile with the optimizer on: -O2 or -Ofast.
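A quick way to confirm that a build really has the optimizer on is to check the __OPTIMIZE__ macro, which GCC and Clang define for -O1 and above (including -Os and -Ofast). This relies on that compiler-specific macro, and the helper name is hypothetical. On the Cortex-M7, the caches themselves are typically enabled at start-up with the CMSIS-Core calls SCB_EnableICache() and SCB_EnableDCache().

```c
/* Hypothetical helper: returns 1 if this file was compiled with the
   optimizer enabled, 0 otherwise. GCC and Clang define __OPTIMIZE__
   for -O1 and higher (including -Os and -Ofast); other compilers may not. */
static int built_with_optimizer(void)
{
#if defined(__OPTIMIZE__)
    return 1;
#else
    return 0;
#endif
}
```

Logging this value once at start-up makes it obvious if the FFT is being timed in an unoptimized build.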

If you feel a post has answered your question, please click "Accept as Solution".

SA V.1 · 2024-09-24

@AScha.3 wrote:
16k FFT , is about N*log(N) butterfly ops -> about 230k ; assuming the cpu can do 4Mio /s , your FFT should run in about 100ms or so, not 30 s.

I didn't understand this calculation.

The FFT input is float. I am using an Arm MCU (STM32H745xx, Cortex-M7). The CMSIS DSP library limit is 4096 points; my project requires 16384 points.

Andrew Neil · 2024-09-24

@SA V.1 wrote:
FFT input is float,

Beware that standard C libraries often use double. Does the H7's floating-point unit support double?

If not, the calculations will be done in software...
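A minimal C sketch of how double creeps in even when the data is float: an unsuffixed literal such as 0.5 has type double, so float * 0.5 is computed as double, while 0.5f keeps it in single precision. The same applies to sin()/cos() (double) versus sinf()/cosf() (float). The helper names are hypothetical. On the STM32H745's Cortex-M7, which has a double-precision FPU, the promoted math still runs in hardware, but as double-precision operations rather than single-precision ones.

```c
#include <stddef.h>

/* Size of the type that float * 0.5 is evaluated in:
   0.5 is a double literal, so the result type is double. */
static size_t size_with_double_literal(float x)
{
    return sizeof(x * 0.5);
}

/* Size of the type that float * 0.5f is evaluated in:
   0.5f is a float literal, so the result type stays float. */
static size_t size_with_float_literal(float x)
{
    return sizeof(x * 0.5f);
}
```

Compiling with GCC's -Wdouble-promotion flag reports places where a float is implicitly promoted to double.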

@SA V.1 wrote:

I added some extra library files (downloaded from GitHub...

So give a link.

Are those libraries intended for embedded microcontrollers?

SA V.1 · 2024-09-24

Does the H7's floating point unit support double?

Yes: "The Arm® Cortex®-M7 with double-precision FPU processor is the latest generation of Arm processors for embedded systems."

I added these library files, performed the FFT and got the result, but the problem is that it takes too much time. Is there any solution for this?

Link: https://github.com/Treeed/Long_FFTs_for_CMSIS_DSP/tree/master

AScha.3 · 2024-09-24

The calculation is about how many operations are needed (i.e., of order n log n or greater):

-> https://en.wikipedia.org/wiki/Fast_Fourier_transform

It is just a rough estimate of how much time it will need, assuming 4 CPU clocks per multiply, just to see what comes out.

OK, so float. (On the H7, also try double: it may be faster, depending on the library.)

Is it only real data, or complex? For real data, maybe arm_rfft_fast_f32(...).

So first try this at the maximum possible length (4k or whatever) to check that you are doing it right: if it computes in a few ms, it's OK.

Then, if some limit here in CMSIS is a problem for you, go for another implementation: maybe FFTW or another library, just look on the web and try. (Are you sure about the 4k limit? Is there no other function in CMSIS with a bigger FFT?

I just see the uint16_t "limit":

uint16_t fftLenRFFT

So by the type alone, lengths up to 65535 would fit, i.e. 32768 as the largest power of two.)
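A quick check of what the uint16_t type alone allows: 16384 fits easily, but 65536 does not (UINT16_MAX is 65535), so the largest power-of-two length the type can carry is 32768. Whether the library accepts a given length is a separate matter, decided by the lengths it has tables for (the thread mentions 32 to 4096 for the fast real FFT). The helper names are hypothetical.

```c
#include <stdint.h>

/* Checks only whether len fits in a uint16_t length field such as
   fftLenRFFT; it says nothing about which lengths the library supports. */
static int fits_in_uint16(unsigned long len)
{
    return len <= UINT16_MAX;
}

/* Largest power of two representable in uint16_t (32768). */
static uint16_t largest_pow2_u16(void)
{
    uint16_t p = 1;
    while (p <= UINT16_MAX / 2u)
        p = (uint16_t)(p * 2u);
    return p;
}
```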

(By the way, most of the time I used my own FFT: not optimized code, but I wrote it as a student and it works correctly.)

TDK · 2024-09-24

Duplicate:

16384 points of FFT(Fast Fourier Transform) - STMicroelectronics Community

SA V.1 · 2024-09-24

Working on the H7 only (STM32H745xx), real data.

---> "The functions support lengths of [32, 64, 128, ..., 4096] samples."

I did it with 4K and got the output properly. Next, I added some library files to perform 16384 points and got the result too, but it takes much more time. Library: https://github.com/Treeed/Long_FFTs_for_CMSIS_DSP/tree/master

So I am looking for a solution, or for guidance on using FFTW.
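For reference, a common way to get past a fixed maximum FFT length is to split the input into even- and odd-indexed samples, run the existing shorter FFT on each half, and join the two results with one radix-2 decimation-in-time stage. The C sketch below shows only the split and the joining stage on plain interleaved float arrays. The function names are hypothetical, it is not taken from the linked library, and the twiddle table is assumed to be precomputed by the caller.

```c
#include <stddef.h>

/* Splits x (n complex samples, interleaved [re0, im0, re1, im1, ...])
   into the even-indexed and odd-indexed samples, n/2 each. */
static void split_even_odd(const float *x, float *even, float *odd, size_t n)
{
    for (size_t k = 0; k < n / 2; ++k) {
        even[2 * k] = x[4 * k];         /* sample 2k, real */
        even[2 * k + 1] = x[4 * k + 1]; /* sample 2k, imag */
        odd[2 * k] = x[4 * k + 2];      /* sample 2k+1, real */
        odd[2 * k + 1] = x[4 * k + 3];  /* sample 2k+1, imag */
    }
}

/* One radix-2 decimation-in-time stage: combines E = FFT(even samples)
   and O = FFT(odd samples), each of length half, into X = FFT of length
   2*half. tw must hold W^k = exp(-2*pi*i*k / (2*half)) for k = 0..half-1,
   interleaved like the data. */
static void radix2_combine(const float *even_fft, const float *odd_fft,
                           const float *tw, float *out, size_t half)
{
    for (size_t k = 0; k < half; ++k) {
        float er = even_fft[2 * k], ei = even_fft[2 * k + 1];
        float or_ = odd_fft[2 * k], oi = odd_fft[2 * k + 1];
        float wr = tw[2 * k], wi = tw[2 * k + 1];

        /* t = W^k * O[k] (complex multiply) */
        float tr = wr * or_ - wi * oi;
        float ti = wr * oi + wi * or_;

        /* X[k] = E[k] + t,  X[k + half] = E[k] - t */
        out[2 * k] = er + tr;
        out[2 * k + 1] = ei + ti;
        out[2 * (k + half)] = er - tr;
        out[2 * (k + half) + 1] = ei - ti;
    }
}
```

On the H7, the two half-length transforms are where an optimized routine such as arm_cfft_f32 at length 4096 would go, giving a complex FFT of 8192 points. As far as I know, arm_rfft_fast_f32 itself builds a real FFT of length N from a complex FFT of length N/2 plus a post-processing step, so 16384 real samples map onto an 8192-point complex FFT in the same way. The joining stage is only O(N) extra work.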

Andrew Neil · 2024-09-24

@SA V.1 wrote:
Done with 4K and got the output properly Next Added some library files to perform 16384
and got the result also but consumes more time

Well, of course it's going to take more time: you've got four times as many samples to process!
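Under the N*log2(N) cost model, going from 4096 to 16384 points should cost a bit more than four times as much: (16384 * 14) / (4096 * 12), about 4.67. A minimal C sketch of that ratio, assuming power-of-two lengths (helper names are hypothetical):

```c
/* Integer log2 for a power of two (e.g. 4096 -> 12). */
static unsigned ilog2_pow2(unsigned n)
{
    unsigned k = 0;
    while (n > 1u) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Expected run-time ratio large/small under the N * log2(N) model. */
static double fft_cost_ratio(unsigned n_small, unsigned n_large)
{
    return ((double)n_large * ilog2_pow2(n_large)) /
           ((double)n_small * ilog2_pow2(n_small));
}
```

So a well-behaved 16k FFT would be expected to take roughly 4.7 times the 4k time; a jump from milliseconds to 27 to 30 s suggests the extra samples are not the main cause.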

Have you contacted the author of the library - what is their expectation of the performance?

What optimisation level are you using when building the code?

Have you evaluated any other libraries?

@MasterT gave you some suggestions in your other thread; how do they perform?

https://community.st.com/t5/stm32-mcus-products/16384-points-of-fft-fast-fourier-transform/m-p/642837/highlight/true#M236288

AScha.3 · 2024-09-24

So how much time does your 4k FFT take, and the 16k one?
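Timing both lengths the same way makes the comparison meaningful. Below is a minimal, portable C sketch, assuming the FFT call can be wrapped in a small function taking a void pointer. It uses the standard clock(), which works on a PC but may simply return (clock_t)-1 on a bare-metal target; on the Cortex-M7 itself, the DWT cycle counter is the usual choice instead. Function names here are hypothetical.

```c
#include <time.h>

/* Example workload standing in for the FFT call being measured
   (hypothetical; replace with a wrapper around the real FFT call). */
static void busy_work(void *arg)
{
    volatile float *acc = arg;
    for (int i = 0; i < 100000; ++i)
        *acc += 1.0f;
}

/* Average CPU time in milliseconds of fn(arg) over reps calls, measured
   with the standard C clock(). Returns -1.0 if clock() is unavailable. */
static double average_ms(void (*fn)(void *), void *arg, int reps)
{
    if (reps <= 0)
        return 0.0;
    clock_t start = clock();
    if (start == (clock_t)-1)
        return -1.0;
    for (int i = 0; i < reps; ++i)
        fn(arg);
    clock_t end = clock();
    if (end == (clock_t)-1)
        return -1.0;
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling it once with the 4k FFT and once with the 16k FFT wrapped in a function of this shape gives directly comparable numbers.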
