I need the fastest possible implementation of a biquad IIR filter in assembly for the STM32F4. I have to run more than 60 filters at 48 kHz. My current best is 24 cycles per filter using transposed direct form II (TDF2), and 17 cycles on an STM32F7.
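As a reference point, here is a plain-C sketch of the TDF2 biquad recurrence and of the cycle budget that the numbers above imply. It is a baseline to check an assembly version against, not the poster's code. It assumes single-precision floats (the Cortex-M4 FPU is single precision only) and coefficients normalized so that a0 = 1. The names `biquad_tdf2`, `biquad_tdf2_step`, `biquad_bank_step` and `bank_cycles_per_second` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical layout: coefficients normalized so that a0 == 1. */
typedef struct {
    float b0, b1, b2;  /* feed-forward coefficients */
    float a1, a2;      /* feedback coefficients */
    float s1, s2;      /* TDF2 state (delay) variables */
} biquad_tdf2;

/* One sample through one biquad in transposed direct form II:
 *   y  = b0*x + s1
 *   s1 = b1*x - a1*y + s2
 *   s2 = b2*x - a2*y
 */
static inline float biquad_tdf2_step(biquad_tdf2 *f, float x)
{
    float y = f->b0 * x + f->s1;
    f->s1 = f->b1 * x - f->a1 * y + f->s2;
    f->s2 = f->b2 * x - f->a2 * y;
    return y;
}

/* Run n independent filters, each on its own input sample. */
static void biquad_bank_step(biquad_tdf2 *bank, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = biquad_tdf2_step(&bank[i], in[i]);
}

/* Cycles per second needed for a bank of filters at a given sample rate. */
static unsigned long bank_cycles_per_second(unsigned long filters,
                                            unsigned long cycles_per_filter,
                                            unsigned long sample_rate_hz)
{
    return filters * cycles_per_filter * sample_rate_hz;
}
```

For example, `bank_cycles_per_second(60, 24, 48000)` gives 69,120,000 cycles per second, roughly 41% of a 168 MHz STM32F407 before any other work is done.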
I noticed that instruction order matters: a poorly ordered sequence costs noticeably more cycles than a well-scheduled one.
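A likely reason for the ordering effect (an assumption, not something measured here) is that the Cortex-M4 executes in order, so an instruction that needs a floating-point result that is not yet available stalls the pipeline. One general way around this is to interleave two independent filters, so that one filter's independent work fills the gap while the other's result is pending. The C sketch below only illustrates that data flow. In C the compiler makes the final scheduling decision, so any real gain would have to come from hand-scheduled assembly. `tdf2_biquad` and `biquad_tdf2_step_pair` are hypothetical names.

```c
/* Hypothetical layout for a TDF2 biquad (a0 normalized to 1). */
typedef struct {
    float b0, b1, b2, a1, a2;  /* coefficients */
    float s1, s2;              /* TDF2 state */
} tdf2_biquad;

/* Advance two independent biquads (f and g must be distinct) by one sample
 * each. Statements are grouped by stage so that work for f and g alternates:
 * while one filter's result is pending, the other has independent work.
 * Note: s1 is formed as (b1*x + s2) - a1*y, which may round differently in
 * the last bit from b1*x - a1*y + s2. */
static void biquad_tdf2_step_pair(tdf2_biquad *f, tdf2_biquad *g,
                                  float xf, float xg, float *yf, float *yg)
{
    /* Stage 1: outputs, which depend only on the stored state. */
    float fy = f->b0 * xf + f->s1;
    float gy = g->b0 * xg + g->s1;

    /* Stage 2: input-only partial sums, independent of fy and gy.
     * These read the old s2 before stage 3 overwrites it. */
    float fs1 = f->b1 * xf + f->s2;
    float gs1 = g->b1 * xg + g->s2;
    float fs2 = f->b2 * xf;
    float gs2 = g->b2 * xg;

    /* Stage 3: feedback terms, which need the outputs from stage 1. */
    f->s1 = fs1 - f->a1 * fy;
    g->s1 = gs1 - g->a1 * gy;
    f->s2 = fs2 - f->a2 * fy;
    g->s2 = gs2 - g->a2 * gy;

    *yf = fy;
    *yg = gy;
}
```

With 60 filters, a bank would be processed as 30 such pairs per sample.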
I hope there is an assembly enthusiast out there who shares this passion!