2021-02-19 02:03 AM
Hello,
I calculate an FFT on the STM32F401RE. It works fine so far. Here are the relevant code snippets:
arm_rfft_q31(&S1, (q31_t*)fft_input_buf1, (q31_t*)fft_complex_buf1); //takes about 1760 µs
arm_cmplx_mag_q31((q31_t*)fft_complex_buf1, (q31_t*)fft_amplitude_buf1, (uint32_t)FFT_LENGTH); // takes about 340 µs.
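For comparing runs, it can help to time each call in CPU cycles. The sketch below assumes the Cortex-M4 DWT cycle counter is used on target, read through the CMSIS `DWT->CYCCNT` register. The helper names `elapsed_cycles` and `cycles_to_us` are hypothetical. At 84 MHz, 1760 µs corresponds to about 147840 cycles.

```c
#include <stdint.h>

/* On target (CMSIS core_cm4.h), enable the cycle counter once:
 *   CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
 *   DWT->CYCCNT = 0;
 *   DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
 * then read DWT->CYCCNT before and after each arm_* call. */

/* Elapsed cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so one wrap between reads is fine. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert cycles to microseconds for a core clock in MHz (truncated). */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t core_mhz)
{
    return cycles / core_mhz;
}
```

Counting cycles rather than toggling a GPIO pin keeps the measurement independent of the pin and scope setup.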
I use DMA to collect the ADC data (2048 points) and calculate a 1024-point FFT twice per buffer: once after the first half of the buffer is filled, and once after the complete buffer is filled.
The results look good. However, now that I am trying to optimize the code for performance, things become strange:
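The half/complete scheme above can be sketched as a ping-pong copy out of the circular DMA buffer. In this sketch, half 0 would be handled from `HAL_ADC_ConvHalfCpltCallback()` and half 1 from `HAL_ADC_ConvCpltCallback()`. The function name `adc_half_to_q31` is hypothetical, and so is the conversion. It assumes 12-bit right-aligned samples: the mid-scale offset 2048 is removed and the value is scaled to the full q31 range. The original post does not show how the samples are converted.

```c
#include <stddef.h>
#include <stdint.h>

#define ADC_BUF_LEN 2048u              /* DMA buffer size from the post */
#define FFT_LENGTH  (ADC_BUF_LEN / 2u) /* one FFT per half buffer */

/* Copy one half of the circular DMA buffer into a q31 FFT input buffer.
 * half = 0: first half (half-transfer event)
 * half = 1: second half (transfer-complete event)
 * ASSUMPTION: 12-bit right-aligned samples, offset-binary around 2048. */
static void adc_half_to_q31(const uint16_t *adc_buf, unsigned half, int32_t *q31_out)
{
    const uint16_t *src = adc_buf + (size_t)half * FFT_LENGTH;
    for (size_t i = 0; i < FFT_LENGTH; i++) {
        /* Multiply instead of left-shifting a negative value (UB in C). */
        q31_out[i] = ((int32_t)src[i] - 2048) * (1 << 20);
    }
}
```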
My observation:
uint32_t delay = 20000; // 20000 iterations * ~10 cycles / 84 MHz ≈ 2380 µs
while (delay > 0){
delay --;
}
extern int32_t fft_input_buf1 [FFT_LENGTH];
extern int32_t fft_complex_buf1[FFT_LENGTH*2];
extern int32_t fft_amplitude_buf1[FFT_LENGTH*1];
extern int32_t fft_input_buf2 [FFT_LENGTH];
extern int32_t fft_complex_buf2[FFT_LENGTH*2];
extern int32_t fft_amplitude_buf2[FFT_LENGTH*1];
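The delay loop above is estimated as 20000 iterations times roughly 10 cycles per iteration, divided by 84 MHz, which gives about 2380 µs. The 10 cycles per iteration is read from the comment and is not a measured constant. When optimizing, note that an empty loop on a non-`volatile` counter may be removed entirely by the compiler. In the sketch below, `busy_wait` and `busy_wait_estimate_us` are hypothetical helpers.

```c
#include <stdint.h>

/* Busy-wait. 'volatile' forces every decrement to be performed, so an
 * optimizing compiler cannot drop the loop. The real cost per iteration
 * still depends on compiler, optimization level and Flash wait states. */
static void busy_wait(uint32_t iterations)
{
    volatile uint32_t delay = iterations;
    while (delay > 0u) {
        delay--;
    }
}

/* Estimated duration in microseconds: iterations * cycles_per_iter / f_MHz. */
static uint32_t busy_wait_estimate_us(uint32_t iterations,
                                      uint32_t cycles_per_iter,
                                      uint32_t core_mhz)
{
    return (iterations * cycles_per_iter) / core_mhz;
}
```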
My conclusions:
Finally:
Thanks a lot for your help,
Cheers
2021-02-19 07:38 AM
Hello @felix23,
I recommend referring to the "Digital signal processing for STM32 microcontrollers using CMSIS" application note (AN4841).
It contains a typical example with explicit results that may help you; please check section 4.2.2.
I also recommend downloading X-CUBE-DSPDEMO; it contains several examples you can check.
Chahinez.
2021-02-20 04:02 AM
Hello Chahinez,
Thanks a lot for your hints. I have already read this document and it was helpful. The FFT output is reasonable, and I am happy with the numeric result. I just wonder whether I am doing something wrong, because these strange 900 µs are needed in every second FFT calculation, and only when I calculate both the FFT and the magnitude.
I forgot to mention that I sample the analog data at 102.4 kHz, so the ADC buffer is half filled every 10 ms. This means that an FFT calculation takes place every 10000 µs. Since the calculation of the FFT including the magnitude takes 3570 µs in the worst case, this should not lead to trouble, right?
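The budget can be checked with a few lines of arithmetic. One half buffer of 1024 samples at 102.4 kHz takes 1024 / 102400 Hz = 10 ms, and a 3570 µs worst case uses roughly 36 % of that period. Each half must be processed before the circular DMA wraps around and overwrites it. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Time to fill one half of the DMA buffer, in microseconds. */
static uint32_t half_buffer_period_us(uint32_t samples_per_half, uint32_t fs_hz)
{
    return (uint32_t)(((uint64_t)samples_per_half * 1000000u) / fs_hz);
}

/* CPU load of the FFT path in percent (truncated). */
static uint32_t fft_load_percent(uint32_t worst_case_us, uint32_t period_us)
{
    return (worst_case_us * 100u) / period_us;
}
```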
BR
Felix
2021-02-24 08:36 AM
Hello @felix23,
I suggest the following:
Try disabling/enabling the data cache, instruction cache, and prefetch bits of the Flash access control register (Flash_ACR); please refer to section 3.8.1 of RM0368.
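One way to run this experiment is to toggle each feature independently and re-measure. The sketch below is a minimal version. It uses the FLASH_ACR bit positions for the STM32F4 (PRFTEN bit 8, ICEN bit 9, DCEN bit 10), which the CMSIS device header defines as `FLASH_ACR_PRFTEN`, `FLASH_ACR_ICEN` and `FLASH_ACR_DCEN`. Local names are used here so it also compiles off-target. On target, the assumption is that you pass `&FLASH->ACR`. `art_configure` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACR_PRFTEN (1u << 8)   /* prefetch enable */
#define ACR_ICEN   (1u << 9)   /* instruction cache enable */
#define ACR_DCEN   (1u << 10)  /* data cache enable */

/* Enable or disable prefetch, I-cache and D-cache independently.
 * All other bits (including the LATENCY wait-state field) are preserved. */
static void art_configure(volatile uint32_t *acr, bool prefetch, bool icache, bool dcache)
{
    uint32_t v = *acr & ~(ACR_PRFTEN | ACR_ICEN | ACR_DCEN);
    if (prefetch) v |= ACR_PRFTEN;
    if (icache)   v |= ACR_ICEN;
    if (dcache)   v |= ACR_DCEN;
    *acr = v;
}
```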
With cache and prefetch enabled, it is possible that during the half-complete handler the FFT function was not yet in the cache, so the CPU fetched it from Flash. Right after, in the complete handler, the FFT function instructions were executed from the cache, which could explain the shorter duration.
I suggest adding a cache reset before executing the FFT functions just to confirm this hypothesis.
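A cache reset before each FFT could look like the sketch below. It assumes RM0368's rule that ICRST (bit 11) and DCRST (bit 12) may only be written while the corresponding cache is disabled. The STM32F4 HAL offers equivalent macros (`__HAL_FLASH_INSTRUCTION_CACHE_RESET()`, `__HAL_FLASH_DATA_CACHE_RESET()`) with the same precondition. `flash_caches_reset` is a hypothetical helper; on target you would pass `&FLASH->ACR`.

```c
#include <stdint.h>

#define ACR_ICEN  (1u << 9)
#define ACR_DCEN  (1u << 10)
#define ACR_ICRST (1u << 11)
#define ACR_DCRST (1u << 12)

/* Flush the Flash instruction and data caches, then restore their
 * previous enable state. Other bits (LATENCY, PRFTEN) are untouched. */
static void flash_caches_reset(volatile uint32_t *acr)
{
    const uint32_t enabled = *acr & (ACR_ICEN | ACR_DCEN);

    *acr &= ~(ACR_ICEN | ACR_DCEN);    /* 1. disable both caches */
    *acr |= (ACR_ICRST | ACR_DCRST);   /* 2. assert cache reset */
    *acr &= ~(ACR_ICRST | ACR_DCRST);  /* 3. release cache reset */
    *acr |= enabled;                   /* 4. restore previous enables */
}
```

Calling it at the start of both the half-complete and the complete handler should make the two FFT timings start from the same (empty) cache state. If the extra time then appears in both handlers, that would support the hypothesis.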
Please keep me updated on whether one of these suggestions helped.
Chahinez.