STM32F4: Low FIR filter performance using the DSP library

danielschaffhauser
Associate
Posted on February 20, 2013 at 10:31

I need to implement a low-pass filter for a synchronous DSP application running at 400 kSamples/s. To evaluate the performance of the FIR/decimation function, I am calling it like this:

arm_fir_decimate_fast_q15(&FIR_Decimator_Instance, inData + (i*32), outData + (i*32), 32);

Initialization is performed using 31 coefficients:

arm_fir_decimate_init_q15(&FIR_Decimator_Instance, 31, 1, (q15_t *)&FIR31Taps[0], &StateBuffer32[0], 32);

I cannot get more than 208980 samples/s of throughput on an STM32F407VG (Discovery board) running at 168 MHz.

According to ST, a fully optimized routine in C should take 1.625 cycles per filter tap. For a 31-tap filter this amounts to a throughput above 3 MSamples/s!
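
As a sanity check on these figures, the arithmetic can be sketched in a few lines of C. The function names below are hypothetical helpers, not part of the DSP library; the only inputs are the numbers quoted in this post (168 MHz core clock, 31 taps, 1.625 cycles per tap, 208980 samples/s measured, 400 kSamples/s required).

```c
#include <stdio.h>

/* Ideal throughput in samples/s, assuming a fixed cost per tap and no other overhead. */
static double ideal_throughput(double core_hz, unsigned num_taps, double cycles_per_tap)
{
    return core_hz / ((double)num_taps * cycles_per_tap);
}

/* Cycles per tap implied by a measured throughput. */
static double measured_cycles_per_tap(double core_hz, double samples_per_s, unsigned num_taps)
{
    return core_hz / samples_per_s / (double)num_taps;
}

/* Cycles available per sample at a required sample rate. */
static double cycle_budget(double core_hz, double required_rate)
{
    return core_hz / required_rate;
}

/* Hypothetical helper: print the three figures for the numbers in this post. */
static void print_report(void)
{
    printf("ideal throughput : %.0f samples/s\n", ideal_throughput(168e6, 31, 1.625));
    printf("measured cost    : %.1f cycles/tap\n", measured_cycles_per_tap(168e6, 208980.0, 31));
    printf("budget at 400 kS : %.0f cycles/sample\n", cycle_budget(168e6, 400e3));
}
```

With these inputs the ideal figure comes out at roughly 3.33 MSamples/s, while the measured 208980 samples/s corresponds to about 26 cycles per tap, around 16 times the quoted 1.625. The budget at 400 kSamples/s is 420 cycles per sample.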

I've checked the code behind the function and it seems that all the optimizations are implemented (MAC, SIMD, loop unrolling).

Please note that I am not a uC expert. Any help is greatly appreciated!

#fir
2 REPLIES
Amel NASRI
ST Employee
Posted on February 20, 2013 at 11:13

Hello Daniel,

The performance results may depend on the toolchain you are using and the options you set (e.g. whether the FPU is used, the optimization level, ...).

Have you taken this into account?

ST.MCU

danielschaffhauser
Associate
Posted on February 21, 2013 at 06:43

Thank you for the prompt response. The compiler's optimization option (GCC 4.7.3) did indeed make the difference. After changing it from "-O0" (none) to "-O3" (most), processing times improved dramatically. Using a FIR/decimator with 127 taps and a decimation factor of 8 via the function call

arm_fir_decimate_fast_q15(&FIR_Decimator_Instance, inData + (i*512), outData + (i*64), 128);

I can now obtain throughput rates above 2 MS/s!

Daniel
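
To cross-check the decimator's output on a PC, or to see where the work goes, a plain C reference can help. The sketch below is a hypothetical helper, not the DSP library's implementation: it assumes samples before the start of the input are zero, aligns each output with input sample n*m (the library's phase convention may differ), and uses a 64-bit accumulator, so it is not expected to be bit-exact with the "fast" Q15 variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate a wide accumulator value to the signed 16-bit (Q15) range. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/*
 * Hypothetical reference FIR decimator in Q15.
 * Computes y[n] = sum over k of h[k] * x[n*m - k], treating x[j] as 0 for j < 0.
 * Returns the number of output samples written (in_len / m).
 */
static size_t ref_fir_decimate_q15(const int16_t *h, size_t num_taps, size_t m,
                                   const int16_t *x, size_t in_len, int16_t *y)
{
    if (m == 0) return 0;                 /* guard against division by zero */
    size_t n_out = in_len / m;
    for (size_t n = 0; n < n_out; ++n) {
        int64_t acc = 0;
        size_t newest = n * m;            /* index of the newest input for this output */
        for (size_t k = 0; k < num_taps && k <= newest; ++k) {
            acc += (int32_t)h[k] * (int32_t)x[newest - k];
        }
        /* Q30 product sum back to Q15; assumes >> on negative values is arithmetic. */
        y[n] = sat_q15(acc >> 15);
    }
    return n_out;
}
```

Each output costs num_taps multiply-accumulates, but only one output is computed per m inputs, so the cost per input sample is num_taps / m. For 127 taps and a decimation factor of 8 that is about 15.9 multiply-accumulates per input sample.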