How to accelerate a running average?

jogerh
Associate II
Posted on April 13, 2017 at 09:10

Dear Community,

To realize a nearly ideal low-pass filter (without phase shift), I need to perform a symmetric running average on an STM32F4 core. The operation takes 3000 sampled points of raw data and, for each filtered point, averages the surrounding raw data points (+/- 50, so 101 points including the center, as in the code below). That makes roughly 290000 additions plus 2900 divisions. Currently the complete process takes about 30 ms at a 144 MHz clock, and I think it can be significantly accelerated. I would be glad if someone could suggest how to speed this operation up.

Here is a piece of Code:

short int Even_Moving_average(short int * data_array, short int windowsize, short int position) {
 int count = 0;
 int average = 0;

 // check: return 0 when the +/- 50 window would not fit inside the block
 if(position - 50 < 0)       return 0;
 if(position + 50 + 1 > BLOCK_SIZE -1) return 0;

 // sum the 101 samples from position - 50 to position + 50
 for(count = 0; count < 100 + 1; count++) {
  average = average + data_array[count + position - 50];
 }
 return (short int)(average/101);
}

//call
 for(count = 50; count < BLOCK_SIZE - 52; count ++) {
    result[count] = Even_Moving_average(BASEwaveOUT,100,count);
 }
// takes  ~ 30 ms ;-(

5 REPLIES
Jan Waclawek
Senior II
Posted on April 13, 2017 at 09:35

Remember the sum. Except for the first and last 50 points, which you handle separately as lead-in and lead-out, you then don't need to perform 100 additions for each new point: only subtract the sample that drops out of the window, x[pos - 51], and add the one that enters it, x[pos + 50].

Try it with pencil and paper on a short average of 3 or 4.
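
A minimal sketch of the running-sum idea, assuming a window of +/- 50 samples around each point and int16_t data. The function name, the macros and the output range are illustrative choices, not the original poster's final code; the lead-in and lead-out points are simply left untouched here.

```c
#include <stdint.h>
#include <stddef.h>

#define HALF_WINDOW 50                      /* assumed: +/- 50 samples around each point */
#define WINDOW_LEN  (2 * HALF_WINDOW + 1)   /* 101 samples per average */

/* Hypothetical helper: writes the centered moving average of in[] to
   out[HALF_WINDOW .. n - 1 - HALF_WINDOW]. Other elements of out[] are not written. */
void moving_average_running_sum(const int16_t *in, int16_t *out, size_t n)
{
    if (n < (size_t)WINDOW_LEN) {
        return;                             /* not enough samples for a single window */
    }

    /* Build the sum of the first full window once. */
    int32_t sum = 0;
    for (size_t i = 0; i < (size_t)WINDOW_LEN; i++) {
        sum += in[i];
    }
    out[HALF_WINDOW] = (int16_t)(sum / WINDOW_LEN);

    /* Slide the window: one subtraction and one addition per output point. */
    for (size_t pos = HALF_WINDOW + 1; pos + HALF_WINDOW < n; pos++) {
        sum -= in[pos - HALF_WINDOW - 1];   /* sample leaving the window */
        sum += in[pos + HALF_WINDOW];       /* sample entering the window */
        out[pos] = (int16_t)(sum / WINDOW_LEN);
    }
}
```

Done by hand with a window of 3 on the samples 1, 2, 3, 4, 5: the first sum is 1 + 2 + 3 = 6, the next is 6 - 1 + 4 = 9, and the last is 9 - 2 + 5 = 12.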

JW

Posted on April 13, 2017 at 11:25

Dear Jan,

Thank you very much for the hint, this clearly accelerated everything. After implementing your suggestion, this time-intensive step now takes only 3.7 ms. It's now about 8 times faster than before, and this is great!

Posted on April 14, 2017 at 08:56

This statement:

  average = average + data_array[count + position - 50];

might be tightened up by using a pointer. Also, don't declare average and count until you know you're going to do the calculation. And there's less messing around with the initial check if the two tests are brought together into one statement.

#include <stdint.h>

int16_t Even_Moving_average( int16_t data_array[], int16_t windowsize, int16_t position) {
//check
   if (((position - 50) < 0) || ((position + 50 + 1) > (BLOCK_SIZE - 1))) {
      return 0;
   }

   // 32-bit accumulator, initialized here so it is not shadowed by a loop variable
   int32_t average = 0;
   int16_t* data = &data_array[position - 50];

   for(int16_t count = 0; count < 100 + 1; count++) {
      // widen as signed: the samples can be negative
      average = average + (int32_t) *data++;
   }
   return ((int16_t)(average/101));
}

But make sure that the optimizer is turned on. Your divide by a constant is probably being turned into a multiply by a precomputed reciprocal anyway, unless your compiler really sucks.
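
To illustrate what such a divide-to-multiply transformation looks like, here is a hand-written sketch for dividing by 101. The helper name and the constant are illustrative and not taken from any particular compiler's output; the result is only claimed to be exact for sums of 101 int16_t samples. An optimizing compiler would normally do something like this for you, so there is rarely a reason to write it by hand.

```c
#include <stdint.h>

/* ceil(2^32 / 101); illustrative constant, not taken from the thread */
#define RECIP_101 42524429u

/* Hypothetical helper: n / 101 with truncation toward zero.
   Assumed input range: |n| <= 101 * 32768, i.e. any sum of 101 int16_t samples. */
static inline int32_t div101(int32_t n)
{
    /* Work on the magnitude so the result truncates toward zero like C's '/'. */
    uint32_t mag = (n < 0) ? (0u - (uint32_t)n) : (uint32_t)n;

    /* Multiply by the scaled reciprocal and keep the upper 32 bits of the product. */
    uint32_t q = (uint32_t)(((uint64_t)mag * RECIP_101) >> 32);

    return (n < 0) ? -(int32_t)q : (int32_t)q;
}
```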

It's probably test code, but windowsize is never used.

Andrei

Posted on April 14, 2017 at 09:04

Orrrrr, there's the arm_mean_f32 DSP function on the F4. You could use the force.

A

Posted on April 20, 2017 at 10:57

Dear Andrej,

using 'the force' sounds promising - I will try to do so.