The STM32H7 has an FMAC (filter math accelerator), and the FFT is also built on multiply-accumulate operations, much like filters. So, can the FMAC be tweaked to implement an FFT with minimal MCU intervention?
Also, FFT calculation on the M7 core without a hardware accelerator is very slow (a 1024-point Q15 FFT takes 265 us on an STM32F746). I need it to be at least 7 times faster (roughly 38 us or less), so I hope the FMAC can make it much faster.
BTW, the FFT I'm interested in takes 2N real inputs and produces N complex outputs.