2023-03-20 07:13 PM
The STM32H7 has an FMAC (filter math accelerator), and the FFT is also built on multiply-accumulate operations, much like the filters. So, can the FMAC be tweaked to implement an FFT with minimal MCU intervention?
Also, computing the FFT on the M7 core without a HW accelerator is super slow (a 1024-point Q15 FFT takes 265 us on the STM32F746), and I need it to be at least 7 times faster. So I hope the FMAC can make it much faster.
BTW, the FFT I'm interested in takes 2N real inputs and produces N complex outputs.
2024-03-25 09:11 AM
I don't believe the FMAC can be tweaked to implement an FFT, as it appears to be designed to implement FIR or IIR type filters, which are much simpler in nature.
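To illustrate the difference in structure, here is a minimal sketch in plain C, assuming the generic textbook forms of both operations. It does not use the FMAC register interface, and the names fir_q15_sample, butterfly_q15 and cq15_t are made up for this example. A FIR output is one running sum over consecutive samples, while an FFT butterfly needs a complex multiply and writes two results back in place, which is presumably why a filter engine does not map onto it directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a wide intermediate value to the Q15 range. */
int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One FIR output sample: y[n] = sum over k of b[k] * x[n-k].
 * x_newest points at x[n]; taps-1 older samples must precede it in memory.
 * This is a single linear multiply-accumulate chain. */
int16_t fir_q15_sample(const int16_t *b, const int16_t *x_newest, size_t taps)
{
    assert(b != NULL && x_newest != NULL && taps > 0);
    int64_t acc = 0;
    for (size_t k = 0; k < taps; ++k) {
        acc += (int32_t)b[k] * (int32_t)x_newest[-(ptrdiff_t)k];
    }
    return sat_q15(acc / 32768); /* scale the Q30 sum back to Q15 */
}

/* Hypothetical complex Q15 type for this sketch. */
typedef struct { int16_t re, im; } cq15_t;

/* One radix-2 butterfly with twiddle factor w, scaled by 1/2 per stage:
 * a' = (a + w*b) / 2, b' = (a - w*b) / 2.
 * Two inputs, two outputs, one complex multiply, updated in place. */
void butterfly_q15(cq15_t *a, cq15_t *b, cq15_t w)
{
    assert(a != NULL && b != NULL);
    int64_t t_re = ((int64_t)w.re * b->re - (int64_t)w.im * b->im) / 32768;
    int64_t t_im = ((int64_t)w.re * b->im + (int64_t)w.im * b->re) / 32768;
    int64_t a_re = a->re;
    int64_t a_im = a->im;
    a->re = sat_q15((a_re + t_re) / 2);
    a->im = sat_q15((a_im + t_im) / 2);
    b->re = sat_q15((a_re - t_re) / 2);
    b->im = sat_q15((a_im - t_im) / 2);
}
```

The FIR loop only ever walks forward through one coefficient array and backward through one sample array. The butterfly, by contrast, has to be called on pairs of elements whose spacing changes at every FFT stage.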
From your FFT timings, I suspect you may be able to improve the performance of the H7 processor. When implementing an FFT on this processor, it is important that the input/output data is in memory the processor can access easily. This can be done by placing the data in DTCM memory. If this memory cannot be used, then at least ensure that the ICACHE and DCACHE are enabled. I have personally measured a 512-point FFT taking 27338 cycles, which would be 49.7 us if the processor is running at 550 MHz. Even at this speed, though, it would not meet your requirement of being 7 times faster than what you had (265 us / 7 is about 37.9 us).
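For anyone who wants to redo the arithmetic with their own measurements, here is a small sketch in plain C. The helper names cycles_to_us and target_us are made up for this example, and the only inputs are the figures quoted in this thread (27338 cycles, 550 MHz, 265 us, a factor of 7).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured cycle count to microseconds for a core clock in Hz. */
double cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz > 0);
    return (double)cycles * 1e6 / (double)core_hz;
}

/* Time budget in microseconds for a required speed-up over a baseline. */
double target_us(double baseline_us, double speedup)
{
    assert(speedup > 0.0);
    return baseline_us / speedup;
}
```

With the numbers above, cycles_to_us(27338, 550000000) comes out at roughly 49.7 us, while target_us(265.0, 7.0) is roughly 37.9 us, so the measured time is still above the budget.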