2017-04-13 12:10 AM
Dear Community,
To realize a nearly ideal low-pass filter (without phase shift), I need to perform a symmetric running average on an STM32F4 core. The operation takes 3000 sampled points of raw data and, for each filtered point, averages 100 of those raw data points (+/- 50). Thus at least 290000 additions plus 2900 divisions are required. Currently the complete process takes about 30 ms at a 144 MHz clock, and I think it can be accelerated significantly. I would be glad if someone could suggest how to speed up this operation.
Here is a piece of the code:
short int Even_Moving_average(short int * data_array, short int windowsize, short int position) {
    int count = 0;
    int average = 0;
    //check
    if(position - 50 < 0) return 0;
    if(position + 50 + 1 > BLOCK_SIZE -1) return 0;
    for(count = 0; count < 100 + 1; count++) {
        average = average + data_array[count + position - 50];
    }
    return (short int)(average/101);
}
//call
for(count = 50; count < BLOCK_SIZE - 52; count ++) {
    result[count] = Even_Moving_average(BASEwaveOUT,100,count);
}
// takes ~ 30 ms ;-(
2017-04-13 12:35 AM
Remember the sum. Except for the first and last 50 points, which you solve separately as lead-in and lead-out, you then don't need to perform 100 additions for each new point: only subtract x[pos - 50] and add x[pos + 50].
Try it with pencil and paper on a short average of 3 or 4.
JW
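The running-sum idea above can be sketched roughly as follows. This is only an illustration, not the code that was actually used in the thread: the function name moving_average_running_sum and the constants HALF_WINDOW and WINDOW_LEN are made up here, the window is assumed to be the 101 samples that the posted loop actually sums, and the lead-in and lead-out points are simply set to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_WINDOW 50                     /* samples on each side of the centre (assumed) */
#define WINDOW_LEN  (2 * HALF_WINDOW + 1)  /* 101 samples, as summed by the posted loop */

/* Symmetric moving average computed with a running sum.
 * in and out both hold n samples. The first and last HALF_WINDOW
 * outputs are set to 0 (lead-in and lead-out are not filtered here). */
void moving_average_running_sum(const int16_t *in, int16_t *out, size_t n)
{
    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        out[i] = 0;
    }
    if (n < (size_t)WINDOW_LEN) {
        return;                            /* not enough data for one full window */
    }

    /* Sum the first full window once: WINDOW_LEN additions. */
    int32_t sum = 0;
    for (size_t i = 0; i < (size_t)WINDOW_LEN; i++) {
        sum += in[i];
    }
    out[HALF_WINDOW] = (int16_t)(sum / WINDOW_LEN);

    /* Slide the window: one subtraction and one addition per output point. */
    for (size_t pos = (size_t)HALF_WINDOW + 1; pos + HALF_WINDOW < n; pos++) {
        sum -= in[pos - HALF_WINDOW - 1];  /* sample that just left the window */
        sum += in[pos + HALF_WINDOW];      /* sample that just entered the window */
        out[pos] = (int16_t)(sum / WINDOW_LEN);
    }
}
```

With the buffers from the first post, a call would look like moving_average_running_sum(BASEwaveOUT, result, BLOCK_SIZE). The work per output point drops from about 100 additions to two, which is where the speed-up comes from.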
2017-04-13 04:25 AM
Dear Jan,
thank you very much for this hint. It clearly accelerated everything. After implementing your suggestion, this time-intensive step now takes only 3.7 ms. It is now about 8 times faster than before, and this is great!
2017-04-13 11:56 PM
This statement:
average = average + data_array[count + position - 50];
might be tightened up by using a pointer. Don't allocate space for average and count unless you're going to do the calculation. There's also less messing around with the initial checks if they are brought together into one statement.
#include <stdint.h>
int16_t Even_Moving_average(int16_t data_array[], int16_t windowsize, int16_t position) {
    //check
    if (((position - 50) < 0) || ((position + 50 + 1) > (BLOCK_SIZE - 1))) {
        return 0;
    }
    int32_t average = 0; // 32-bit accumulator, declared once so the loop cannot shadow it
    int16_t* data = &data_array[position - 50];
    for (int16_t count = 0; count < 100 + 1; count++) {
        average = average + (int32_t) *data++; // widen each signed sample before adding
    }
    return ((int16_t)(average/101));
}
But make sure that the optimizer is turned on. Your divide is probably being turned into a multiply by a fixed-point reciprocal anyway, unless your compiler really sucks.
It's probably test code, but windowsize is never used.
Andrei
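To illustrate the remark about the divide becoming a multiply, here is a hand-written sketch of dividing by the constant 101 with a reciprocal multiplication. It is not what any particular compiler emits. The function name div101_by_reciprocal, the 40-bit shift and the range limit are assumptions chosen for this example, and in practice you would just write sum / 101 and let the optimizer decide.

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_SHIFT 40
/* ceil(2^40 / 101), evaluated by the compiler as a constant expression */
#define RECIP_101 ((((uint64_t)1 << RECIP_SHIFT) + 100u) / 101u)

/* Divide a window sum by 101, truncating toward zero like C's '/'.
 * Assumed valid range: |sum| <= 101 * 32768, the largest magnitude
 * a sum of 101 int16_t samples can reach. */
int32_t div101_by_reciprocal(int32_t sum)
{
    assert(sum >= -3309568 && sum <= 3309568);

    /* Work on the magnitude so the result truncates toward zero. */
    uint64_t magnitude = (uint64_t)(sum < 0 ? -(int64_t)sum : (int64_t)sum);

    /* One 64-bit multiply and one shift replace the division. */
    int32_t quotient = (int32_t)((magnitude * RECIP_101) >> RECIP_SHIFT);

    return sum < 0 ? -quotient : quotient;
}
```

Whether this beats the hardware divide on a given part depends on the compiler and its settings, so it is worth measuring before relying on it.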
2017-04-14 02:04 AM
Orrrrr, there's the arm_mean_f32 DSP function on the F4. You could use the force.
A
2017-04-20 03:57 AM
Dear Andrej,
using 'the force' sounds promising - I will try to do so.