Fast FIR - less than one cycle per tap

gregstm
Senior III
Posted on September 03, 2015 at 16:17

Any interest in a block FIR routine (16 bit data) that can execute (on average) 1 tap in less than 1 cycle?

The catch: It is an 8 tap filter (result not rounded).

The filter has low start and end overhead and can be easily cascaded. Calculating coefficients for cascaded sections is a bit of a mystery though. Data block size has to be a multiple of 8.
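For anyone who wants something to compare against, here is a plain C reference for what an 8 tap block FIR on 16 bit data computes. It is only a sketch for checking results, not the assembly routine itself: the name `fir8_block_ref`, the 32 bit output type and the 7 sample `state` history are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 8

/* Reference 8-tap block FIR: out[i] = sum over k of coef[k] * x[i - k].
 * Hypothetical helper for checking results, not the optimised routine.
 * state[] holds the previous 7 inputs (oldest first) so that consecutive
 * blocks join up without a gap. The output is the raw accumulator
 * (no rounding or scaling); this sketch assumes the coefficients are
 * scaled so that the sum fits in 32 bits. */
void fir8_block_ref(const int16_t *in, int32_t *out, size_t n,
                    const int16_t coef[FIR_TAPS],
                    int16_t state[FIR_TAPS - 1])
{
    assert(n % 8 == 0); /* block size has to be a multiple of 8 */

    for (size_t i = 0; i < n; i++) {
        int32_t acc = 0;
        for (size_t k = 0; k < FIR_TAPS; k++) {
            /* x[i - k] comes from this block, or from the saved history */
            int16_t x = (i >= k) ? in[i - k]
                                 : state[FIR_TAPS - 1 - (k - i)];
            acc += (int32_t)coef[k] * x;
        }
        out[i] = acc;
    }

    /* Keep the last 7 inputs as history for the next block. */
    if (n > 0) {
        for (size_t j = 0; j < FIR_TAPS - 1; j++) {
            state[j] = in[n - (FIR_TAPS - 1) + j];
        }
    }
}
```

Zero the `state` array before the first block, then call the function once per block with the same `state` each time. Feeding it an impulse should give back the coefficients in order, which makes a quick sanity check for any faster version.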

When I finally get my hands on some STM32L4 hardware I can verify/optimise the load/store cycles and tidy up the code.

Not sure how useful this thing is, but it was certainly good assembly practice.

2 REPLIES
gregstm
Senior III
Posted on September 23, 2015 at 11:46

Tested and tweaked on actual STM32L4 hardware now. Result: 15 cycles for 16 taps = 0.9375 cycles per tap. So the M4 can be faster than a standard DSP - at least for an 8 tap filter. Will keep researching how to cascade these things.
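On the cascading side, one direction at least is standard: two FIR sections run back to back behave like a single filter whose coefficients are the convolution of the two coefficient sets (ignoring any truncation between the sections). So two 8 tap sections act like one 15 tap filter. Going the other way, splitting a given long filter into 8 tap sections, amounts to factoring its coefficient polynomial, and that is the hard part. A minimal sketch of the easy direction, where `fir_cascade_coefs` is a hypothetical helper name and 32 bit coefficients are assumed just to keep the products in range:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Equivalent coefficients of two cascaded FIR sections a and b.
 * h must have room for na + nb - 1 entries.
 * Hypothetical helper for illustration; assumes every product and sum
 * fits in 32 bits. */
void fir_cascade_coefs(const int32_t *a, size_t na,
                       const int32_t *b, size_t nb,
                       int32_t *h)
{
    assert(na > 0 && nb > 0);

    for (size_t i = 0; i < na + nb - 1; i++) {
        h[i] = 0;
    }
    /* Plain convolution: h[i + j] collects a[i] * b[j]. */
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            h[i + j] += a[i] * b[j];
        }
    }
}
```

This gives a way to check a cascade: compute `h` for the two sections, then compare the cascade's impulse response against it.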

qwer.asdf
Senior
Posted on September 23, 2015 at 12:21

Thank you, it will certainly be interesting to see. It would be wonderful if you could share the code on GitHub.