Optimizing FIR Filter Multiply Accumulate

JP_ama
Associate III

Hello,

I am trying to optimize a function for a basic FIR filter on an STM32H743. This is the function:

inline float_t FIRFilter(const float_t* inputBuffer, const float_t* firData, const uint16_t size, const uint16_t currentBufferIndex)
{
	float_t output = 0;

	for(int i=0; i<size; i+=4)
	{
		output += inputBuffer[(currentBufferIndex+BUFFER_SIZE-i) % BUFFER_SIZE]*firData[i];
		output += inputBuffer[(currentBufferIndex+BUFFER_SIZE-i-1) % BUFFER_SIZE]*firData[i+1];
		output += inputBuffer[(currentBufferIndex+BUFFER_SIZE-i-2) % BUFFER_SIZE]*firData[i+2];
		output += inputBuffer[(currentBufferIndex+BUFFER_SIZE-i-3) % BUFFER_SIZE]*firData[i+3];
	}

	return output;
}

I tried to use the VFMA.F32 instruction using this:

__STATIC_FORCEINLINE void __VFMAF32(float_t* op1, float_t op2, float_t op3)
{
  __ASM volatile ("vfma.f32 %0, %1, %2" : "=t" (op1) : "t" (op2), "t" (op3) );
}

I then modified the function accordingly:

inline float_t FIRFilter(const float_t* inputBuffer, const float_t* firData, const uint16_t size, const uint16_t currentBufferIndex)
{
	float_t output = 0;

	for(int i=0; i<size; i+=4)
	{
		__VFMAF32(&output, inputBuffer[(currentBufferIndex+BUFFER_SIZE-i) % BUFFER_SIZE], firData[i]);
		__VFMAF32(&output, inputBuffer[(currentBufferIndex+BUFFER_SIZE-i-1) % BUFFER_SIZE], firData[i+1]);
		__VFMAF32(&output, inputBuffer[(currentBufferIndex+BUFFER_SIZE-i-2) % BUFFER_SIZE], firData[i+2]);
		__VFMAF32(&output, inputBuffer[(currentBufferIndex+BUFFER_SIZE-i-3) % BUFFER_SIZE], firData[i+3]);
	}

	return output;
}

Not only is it slower, it also returns the wrong result. I'm not really sure how to use the VFMA.F32 instruction with the inline assembler correctly.

Can anybody help me optimize this function properly?

Thanks!

1 REPLY
Piranha
Chief II

Just use the standard fmaf() function, or use the FIR functions from Arm's CMSIS-DSP library.