2023-08-01 02:09 PM
Hi, I am working on a project doing realtime FFT processing on an audio signal using an STM32H730 processor with the CMSIS FFT library. I am currently trying to reduce latency by using a smaller block size and a higher overlap between FFTs, and I am looking for ways to optimize the FFT function and reduce its processing overhead.
I wanted to ask whether performance could be improved by replacing the arm_rfft_f32 function with the arm_rfft_f16 function? Would this be faster on the STM32H730 hardware? And would the fixed point functions like arm_rfft_q31 be faster still?
Please let me know. I am also open to any other advice on how to optimize CMSIS FFT processing for STM32H730 hardware. Thanks!
2023-08-01 03:06 PM
The MCU has a cycle counter that can be used to benchmark the efficiency of different algorithms and optimizations, e.g. loop unrolling, induction, optimization, etc.
See DWT CYCCNT
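As a rough illustration of the suggestion above, here is a minimal sketch of timing a function with the DWT cycle counter. The register addresses are the standard Cortex-M debug addresses; the helper names (cycle_counter_init, cycle_counter_read, cycles_elapsed, benchmark_cycles) are made up for this example and are not part of CMSIS. Some Cortex-M7 parts reportedly also need the DWT lock access register unlocked before the counter will run; that step is not shown here. When built for anything other than an ARMv7E-M target, the sketch falls back to clock() so that it still compiles on a PC.

```c
#include <stdint.h>
#include <time.h>

/* Standard Cortex-M debug register addresses and bits. */
#define DEMCR_ADDR          0xE000EDFCu  /* Debug Exception and Monitor Control */
#define DWT_CTRL_ADDR       0xE0001000u  /* DWT control register */
#define DWT_CYCCNT_ADDR     0xE0001004u  /* DWT cycle counter */
#define DEMCR_TRCENA        (1u << 24)   /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA  (1u << 0)    /* enables CYCCNT */

#if defined(__ARM_ARCH_7EM__)
/* Target build (Cortex-M4/M7): access the hardware counter directly. */
#define REG32(addr) (*(volatile uint32_t *)(addr))

void cycle_counter_init(void)
{
    REG32(DEMCR_ADDR) |= DEMCR_TRCENA;           /* power up DWT */
    REG32(DWT_CYCCNT_ADDR) = 0u;                 /* reset the counter */
    REG32(DWT_CTRL_ADDR) |= DWT_CTRL_CYCCNTENA;  /* start counting */
}

uint32_t cycle_counter_read(void)
{
    return REG32(DWT_CYCCNT_ADDR);
}
#else
/* Host fallback (assumption for this sketch): clock() ticks, not CPU cycles. */
void cycle_counter_init(void)
{
}

uint32_t cycle_counter_read(void)
{
    return (uint32_t)clock();
}
#endif

/* Unsigned subtraction stays correct across a single 32-bit wraparound. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Run fn once and return how many counter ticks it took. */
uint32_t benchmark_cycles(void (*fn)(void))
{
    uint32_t start = cycle_counter_read();
    fn();
    return cycles_elapsed(start, cycle_counter_read());
}
```

Typical use would be to call cycle_counter_init() once at startup, wrap each FFT variant in a small void function, and compare the values returned by benchmark_cycles() with the same optimization settings and input data. Keep in mind that the 32-bit counter wraps after 2^32 cycles, so each measured section has to be shorter than that.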
2023-08-03 09:02 AM - edited 2023-08-03 09:15 AM
It is very difficult for us to A/B test this, since we have to use a custom modified version of CMSIS for the FFT library to fit in the on-chip flash. Switching to FFT16 is not trivial, so we wanted to verify that there would be a benefit to doing so before going down that path.
It seems like there should be a direct answer as to whether FFT16 is faster than FFT32 on the STM32H730 hardware. Is anyone able to answer whether FFT16 runs faster than FFT32?