How to accelerate a running average?

jogerh · ‎2017-04-13

Posted on April 13, 2017 at 09:10

Dear Community,

To realize a nearly ideal low-pass filter (without phase shift), I need to perform a symmetric running average on the STM32F4 core. This operation takes 3000 sampled points of raw data and, for each filtered point, averages 100 of those raw data points (+/- 50). Thus at least 290000 additions plus 2900 divisions are required. Currently the complete process takes about 30 ms at a 144 MHz clock speed, and I think it can be accelerated significantly. I would be glad if someone could suggest how to speed up this operation.

Here is a piece of code:

short int Even_Moving_average(short int * data_array, short int windowsize, short int position) {

int count = 0;

int average = 0;

//check

if(position - 50 < 0) return 0;

if(position + 50 + 1 > BLOCK_SIZE -1) return 0;

for(count = 0; count < 100 + 1; count++) {

average = average + data_array[count + position - 50];

}

return (short int)(average/101);

}

//call

for(count = 50; count < BLOCK_SIZE - 52; count ++) {

result[count] = Even_Moving_average(BASEwaveOUT,100,count);

}

// takes ~ 30 ms ;-(

Jan Waclawek · ‎2017-04-13

Posted on April 13, 2017 at 09:35

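Remember the sum. Except for the first and last 50 points, which you solve separately as lead-in and lead-out, you then don't need to perform 100 additions for each new point: just subtract the sample that leaves the window and add the one that enters it, i.e. subtract x[pos - 51] and add x[pos + 50] for the 101-point window in your code.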

Try it with pencil and paper on a short average of 3 or 4.

JW
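The hint above can be sketched roughly as follows. This is only an illustration, not the code the original poster ended up with: the function name `moving_average_running_sum` and the macros `HALF_WINDOW` and `WINDOW_LEN` are made up for this example, and it assumes the same 101-point window (position +/- 50) as the code in the question, with points lacking a full window set to 0.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_WINDOW 50                      /* samples on each side of the centre (assumed) */
#define WINDOW_LEN  (2 * HALF_WINDOW + 1)   /* 101 samples per average */

/* Centred moving average using a running sum.
 * Output points that do not have a full window around them are set to 0. */
void moving_average_running_sum(const int16_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = 0;
    }
    if (n < (size_t)WINDOW_LEN) {
        return;                             /* not enough data for even one window */
    }

    /* Lead-in: build the sum of the first full window once. */
    int32_t sum = 0;
    for (size_t i = 0; i < (size_t)WINDOW_LEN; i++) {
        sum += in[i];
    }
    out[HALF_WINDOW] = (int16_t)(sum / WINDOW_LEN);

    /* Slide the window: one subtraction and one addition per output point. */
    for (size_t pos = HALF_WINDOW + 1; pos + HALF_WINDOW < n; pos++) {
        sum -= in[pos - HALF_WINDOW - 1];   /* sample that just left the window */
        sum += in[pos + HALF_WINDOW];       /* sample that just entered it */
        out[pos] = (int16_t)(sum / WINDOW_LEN);
    }
}
```

A 32-bit sum is sufficient here because 101 samples of at most 16 bits each cannot overflow it. Per output point the work drops from 101 additions to one subtraction and one addition, plus the division.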

jogerh · ‎2017-04-13

Posted on April 13, 2017 at 11:25

Dear Jan,

Thank you very much for this hint, it clearly accelerated everything. After implementing your suggestion, this time-intensive step now takes only 3.7 ms. It's now 8 times faster than before, and this is great!

Andrei Chichak · ‎2017-04-13

Posted on April 14, 2017 at 08:56

This statement:

average = average + data_array[count + position - 50];

might be tightened up by using a pointer. Don't allocate space for average and count unless you're going to do the calculation. There's also less messing around with the initial checks if they are brought together into one statement.

#include <stdint.h>

int16_t Even_Moving_average( int16_t data_array[], int16_t windowsize, int16_t position) {

//check

if (((position - 50) < 0) || ((position + 50 + 1) > (BLOCK_SIZE - 1))) {

return 0;

}

int32_t average = 0;

const int16_t* data = &data_array[position - 50];

// sum the 101 samples centred on position; the 32-bit sum cannot overflow

for(int16_t count = 0; count < 100 + 1; count++) {

average = average + *data++;

}

return ((int16_t)(average/101));

}

But make sure that the optimizer is turned on. Your divide is probably being turned into a multiply by a precomputed reciprocal anyway, unless your compiler really sucks.
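To illustrate the divide-to-multiply remark, here is a small sketch of replacing a division by the constant 101 with a multiplication by a fixed-point reciprocal and a shift. The function name `div101_by_multiply` is hypothetical, and the code only claims to be correct for the range stated in the comment; what a given compiler actually emits depends on the compiler and its settings, and the Cortex-M4 also has a hardware divide instruction.

```c
#include <stdint.h>

/* Divide by 101 with a multiply and a shift, truncating toward zero like C's '/'.
 * Assumption: |x| <= 101 * 32768, the largest magnitude that a sum of
 * 101 int16_t samples can reach. */
int32_t div101_by_multiply(int32_t x)
{
    /* ceil(2^32 / 101); adding 1 is valid because 101 does not divide 2^32 */
    const uint64_t magic = (UINT64_C(1) << 32) / 101u + 1u;

    /* Work on the magnitude so that the result truncates toward zero. */
    uint32_t magnitude = (x < 0) ? (uint32_t)(-x) : (uint32_t)x;
    uint32_t quotient  = (uint32_t)(((uint64_t)magnitude * magic) >> 32);

    return (x < 0) ? -(int32_t)quotient : (int32_t)quotient;
}
```

The rounding error introduced by the approximate reciprocal stays below one unit of the quotient for inputs in the stated range, which is why the result matches `x / 101` there.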

It's probably test code, but windowsize is never used.

Andrei

Andrei Chichak · ‎2017-04-14

Posted on April 14, 2017 at 09:04

Orrrrr, there's the

arm_mean_f32

DSP function on the F4. You could use the force.

A

jogerh · ‎2017-04-20

Posted on April 20, 2017 at 10:57

Dear Andrei,

using 'the force' sounds promising - I will try to do so.