danielschaffhauser
Associate
February 20, 2013
Question

STM32F4: Low FIR filter performance using the DSP library

Posted on February 20, 2013 at 10:31

I need to implement a low-pass (LP) filter for a synchronous DSP application running at 400 kSamples/s. To evaluate the performance of the FIR/decimation function, I am calling it like this:

arm_fir_decimate_fast_q15(&FIR_Decimator_Instance, inData + (i*32), outData + (i*32), 32);

Initialization is performed using 31 coefficients:

arm_fir_decimate_init_q15(&FIR_Decimator_Instance, 31, 1, (q15_t *)&FIR31Taps[0], &StateBuffer32[0], 32);
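
A minimal sketch of the buffer bookkeeping behind the two calls above. It assumes the sizing rules from the CMSIS-DSP documentation as commonly stated: the state buffer holds numTaps + blockSize - 1 samples, blockSize must be a multiple of the decimation factor M, and each call produces blockSize / M outputs. The helper names are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of q15 samples the state buffer passed to
   the init call needs to hold (assumed rule: numTaps + blockSize - 1). */
size_t decimator_state_len(size_t num_taps, size_t block_size)
{
    assert(num_taps > 0 && block_size > 0);
    return num_taps + block_size - 1;
}

/* Hypothetical helper: output samples written per processing call.
   Returns 0 when block_size is not a whole multiple of the decimation
   factor, which the library is expected to reject at init time. */
size_t decimator_outputs_per_block(size_t block_size, size_t decimation)
{
    if (decimation == 0 || block_size % decimation != 0) {
        return 0;
    }
    return block_size / decimation;
}
```

With 31 taps, a block size of 32 and M = 1, this gives a state buffer of 62 samples and 32 outputs per call, so both the input and the output pointer advance by 32 per iteration.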

I cannot get more than 208980 samples/s throughput on an STM32F407VG (Discovery board) running at 168 MHz.

According to ST, a fully optimized routine in C should take 1.625 cycles per filter tap. For a 31-tap filter at 168 MHz, this amounts to roughly 50 cycles per sample, i.e. a throughput rate above 3 MSamples/s!
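
The arithmetic behind that estimate can be sketched as follows. The function names are hypothetical, and the model ignores loop and call overhead, so it is an upper bound rather than a prediction.

```c
#include <assert.h>

/* Ideal sample rate (samples/s) of a non-decimating FIR filter that
   spends cycles_per_tap CPU cycles on each tap of each sample. */
double fir_max_sample_rate(double cpu_hz, double cycles_per_tap,
                           unsigned num_taps)
{
    assert(num_taps > 0 && cycles_per_tap > 0.0);
    return cpu_hz / (cycles_per_tap * (double)num_taps);
}

/* Inverse view: cycles effectively spent per tap, given a measured
   sample rate on the same filter. */
double fir_cycles_per_tap(double cpu_hz, double measured_rate,
                          unsigned num_taps)
{
    assert(num_taps > 0 && measured_rate > 0.0);
    return cpu_hz / (measured_rate * (double)num_taps);
}
```

For 168 MHz, 1.625 cycles per tap and 31 taps, the first function gives about 3.33 MSamples/s. Feeding the measured 208980 samples/s into the second gives about 26 cycles per tap, roughly 16 times the quoted figure.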

I've checked the code behind the function and it seems that all the optimizations are implemented (MAC, SIMD, loop unrolling).

Please note that I am not a uC expert. Any help is greatly appreciated!

#fir

    2 replies

    Amel NASRI
    Technical Moderator
    February 20, 2013
    Posted on February 20, 2013 at 11:13

    Hello Daniel,

    The performance results may depend on the toolchain you are using and the options you set (Ex: FPU used or not, optimization level...).

    Did you take this into account?

    ST.MCU

    danielschaffhauser
    Associate
    February 21, 2013
    Posted on February 21, 2013 at 06:43

    Thank you for the prompt response. The optimization option of the compiler (GCC 4.7.3) did indeed make the difference. After changing it from "-O0" (none) to "-O3" (most), I obtained dramatically improved processing times. Using a FIR/decimator with 127 taps and a decimation factor of 8 via the function call

    arm_fir_decimate_fast_q15(&FIR_Decimator_Instance, inData + (i*512), outData + (i*64), 128);

    I can now obtain throughput rates above 2 MS/s!

    Daniel
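
Since the whole difference in this thread came down to the optimization level, a benchmark can guard against an accidental -O0 build. The sketch below relies on the `__OPTIMIZE__` macro, which GCC predefines whenever optimization is enabled; the function names are hypothetical, and other compilers may not define the macro.

```c
#include <assert.h>

/* Returns 1 when the translation unit was compiled with optimization
   enabled (GCC defines __OPTIMIZE__ for -O1 and above), 0 otherwise. */
int build_is_optimized(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Call once before timing anything: stops a debug build from silently
   producing misleading throughput numbers. Has no effect when NDEBUG
   is defined, because assert() is then compiled out. */
void require_optimized_build(void)
{
    assert(build_is_optimized() && "benchmark built without optimization");
}
```

The check only says that some optimization level is active. It does not distinguish -O1 from -O3, so the actual flags are still worth confirming in the build log.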