STM32F4: Low FIR filter performance using the DSP library

Question asked by schaffhauser.daniel on Feb 20, 2013
Latest reply on Feb 21, 2013 by schaffhauser.daniel
I need to implement a low-pass (LP) filter for a synchronous DSP application running at 400 kSamples/s. To evaluate the performance of the FIR decimation function, I am calling it like this:

arm_fir_decimate_fast_q15(&FIR_Decimator_Instance, inData + (i*32), outData + (i*32), 32);

Initialization is performed using 31 coefficients:

arm_fir_decimate_init_q15(&FIR_Decimator_Instance, 31, 1, (q15_t *)&FIR31Taps[0], &StateBuffer32[0], 32);
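For reference, a minimal sketch of the buffer sizes implied by that init call. The CMSIS-DSP documentation gives the state buffer length as numTaps + blockSize - 1, and the init function reports a length error if blockSize is not a multiple of the decimation factor. The macro and array names below are hypothetical, not the ones from my project, and q15_t is redefined locally only so that the snippet compiles on a PC without arm_math.h.

```c
#include <stdint.h>

/* Parameters taken from the init call above. */
#define NUM_TAPS    31u   /* number of filter coefficients */
#define DECIM_M     1u    /* decimation factor */
#define BLOCK_SIZE  32u   /* input samples per call */

/* State length required by arm_fir_decimate_init_q15. */
#define STATE_LEN   (NUM_TAPS + BLOCK_SIZE - 1u)

/* Stand-in for the CMSIS-DSP typedef (assumption: host build only). */
typedef int16_t q15_t;

/* Hypothetical buffers sized for the call above. */
static q15_t fir_coeffs[NUM_TAPS];
static q15_t fir_state[STATE_LEN];
```

With 31 taps and a block size of 32, the state buffer needs 62 elements.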

I cannot get more than 208980 Samples/s of throughput on an STM32F407VG (Discovery board) running at 168 MHz.

According to ST, a fully optimized routine in C should take 1.625 cycles per filter tap. For a 31-tap filter at 168 MHz, this amounts to a throughput above 3 MSamples/s!
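As a sanity check on these numbers, here is a minimal sketch in plain C that can be compiled on a PC. The helper names are made up for illustration, and it assumes the per-tap cost is the only cost, ignoring function call and loop overhead.

```c
/* Cycles needed per output sample for a given per-tap cost. */
static double cycles_per_sample(double cycles_per_tap, unsigned num_taps)
{
    return cycles_per_tap * (double)num_taps;
}

/* Throughput in samples/s for a given core clock and per-sample cost. */
static double samples_per_second(double cpu_hz, double cyc_per_sample)
{
    return cpu_hz / cyc_per_sample;
}

/* Per-tap cost implied by a measured throughput. */
static double observed_cycles_per_tap(double cpu_hz, double measured_sps,
                                      unsigned num_taps)
{
    return cpu_hz / measured_sps / (double)num_taps;
}
```

With the figures from this post, cycles_per_sample(1.625, 31) gives 50.375 cycles, and samples_per_second(168e6, 50.375) gives about 3.33 MSamples/s. Going the other way, observed_cycles_per_tap(168e6, 208980.0, 31) gives roughly 26 cycles per tap, or about 804 cycles per sample, which is around 16 times slower than the ST figure.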

I've checked the code behind the function and it seems that all the optimizations are implemented (MAC, SIMD, loop unrolling).

Please note that I am not a uC expert. Any help is greatly appreciated!