STM32 get Magnitude after fft

otr · ‎2010-01-17

Posted on January 17, 2010 at 21:19

otr · ‎2011-05-17

Posted on May 17, 2011 at 13:38

STM32 @ 72 MHz, 1024-point FFT, with the FFT coefficients placed in RAM.

The FFT itself takes less than 1.8 ms, but computing the magnitude takes about 90 ms. How can I speed up the function? I only need 16-bit results.

code from the DSP_lib demo:

void powerMag(long nfill, char* strPara)
{
  int32_t lX, lY;
  uint32_t i;

  for (i = 0; i < nfill; i++)
  {
    lX = (lBUFOUT[i] << 16) >> 16; /* sine_cosine --> cos (low 16 bits, sign-extended) */
    lY = (lBUFOUT[i] >> 16);       /* sine_cosine --> sin (high 16 bits) */
    {
      float X = 64*((float)lX)/32768;
      float Y = 64*((float)lY)/32768;
      float Mag = sqrt(X*X + Y*Y)/nfill;
      lBUFMAG[i] = (uint32_t)(Mag*65536);
    }
  }

  /* The demo compared with ==, which tests pointers rather than contents;
     strcmp() needs <string.h>. */
  if (strcmp(strPara, "1SIDED") == 0) onesided(nfill);
}

16-32micros · ‎2011-05-17

Posted on May 17, 2011 at 13:38

Hi,

We have not yet optimized the magnitude function for the STM32. The example demo simply uses the compiler's default floating-point math library, so its speed depends heavily on the toolchain and compiler options (RVMDK, EWARM, GNU, etc.).

Magnitude = sqrt( real ^2 + imaginary ^2)
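Applied to the powerMag code quoted in the question, the float scaling collapses to a single constant. A short derivation, writing $l_X$ and $l_Y$ for the unpacked 16-bit parts and $N$ for nfill (the final cast truncates, so an all-integer version may differ from the float result in the last bit):

```latex
\begin{aligned}
X &= \frac{64\,l_X}{32768} = \frac{l_X}{512}, \qquad Y = \frac{l_Y}{512} \\
\mathrm{Mag} &= \frac{\sqrt{X^2 + Y^2}}{N} = \frac{\sqrt{l_X^2 + l_Y^2}}{512\,N} \\
\mathtt{lBUFMAG}[i] &= 65536 \cdot \mathrm{Mag} = \frac{128}{N}\,\sqrt{l_X^2 + l_Y^2}
\end{aligned}
```

For N = 1024 this is sqrt(lX^2 + lY^2) / 8, which suggests that everything except the square root of a 32-bit integer could be done with integer arithmetic and a shift.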

I recommend that you refer to this post

http://www.st.com/mcu/forums-cat-9362-23.html

from Ivan Mellen, where "magnitude16_16bIn" for 1024 points should take only 220 µs.

Cheers,

STOne-32.

otr · ‎2011-05-17

Posted on May 17, 2011 at 13:38

Amazing!

thanks very much~.~

:D

thomasbretgeld9 · ‎2011-05-17

Posted on May 17, 2011 at 13:38

Hi,

I am currently facing the same problem with the magnitude calculation. The FFT takes around 26000 cycles and the magnitude consumes 330000 cycles (for the first half of the bins only!). Unfortunately, the link to "magnitude16_16bIn" no longer works. Can somebody tell me where to find the post about this function, or any site with more information about it? I am using the CodeSourcery gcc. Google and the forum search didn't help.

Kind regards,

Thomas

Tesla DeLorean · ‎2011-05-17

Posted on May 17, 2011 at 13:38

You could always try Ivan

http://www.embeddedsignals.com/

This is the FFT 2.0, but probably not what you want

https://my.st.com/public/STe2ecommunities/mcu/Lists/ARM%20CortexM3%20STM32/DispForm.aspx?ID=10404&RootFolder=/public/STe2ecommunities/mcu/Lists/ARM CortexM3 STM32/Complexreal, 1632bit, radix42 FFT, windowing, sqrt and magnitude library&Source=https://my.st.com/public/STe2ecommunities/mcu/Lists/ARM%2520CortexM3%2520STM32/Flat.aspx?RootFolder%3D%252Fpublic%252FSTe2ecommunities%252Fmcu%252FLists%252FARM%2520CortexM3%2520STM32%252FComplexreal%252

[DEAD LINK /public/STe2ecommunities/mcu/Lists/ARM%20CortexM3%20STM32/Flat.aspx?RootFolder=/public/STe2ecommunities/mcu/Lists/ARM CortexM3 STM32/Complexreal, 1632bit, radix42 FFT, windowing, sqrt and magnitude library&currentviews=1594]Thread

He has an address in the thread, not clear if the SQRT source is available.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

thomasbretgeld9 · ‎2011-05-17

Posted on May 17, 2011 at 13:38

Hi Clive,

Thanks for the answer. Unfortunately, the Ivan Mellen library seems to be free for personal use only. I have found another solution that works fine for me:

http://www.codecodex.com/wiki/Calculate_an_integer_square_root

That page lists several integer square root algorithms that run much faster than the gcc floating-point implementation. The integer version does the job well and is roughly 12 to 15 times faster than the floating-point approach. The fastest "C" version appears to be the following:

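unsigned int sqrt32(unsigned long n)
{
  unsigned int c = 0x8000; /* bit currently being tested */
  unsigned int g = 0x8000; /* current guess for the root */

  for(;;) {
    if(g*g > n)
      g ^= c;              /* guess too large: clear the bit again */
    c >>= 1;
    if(c == 0)
      return g;
    g |= c;                /* try the next lower bit */
  }
}

Thomas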
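For illustration, a magnitude loop built on that kind of integer square root might look like the sketch below. It is not the function Thomas used (which was not posted). The names isqrt32 and magnitude_q15 are hypothetical, and the packed layout (real part in the low 16 bits, imaginary part in the high 16 bits) is assumed from the powerMag code in the question. Any scaling by nfill is left to the caller.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise integer square root: returns floor(sqrt(n)) for a 32-bit n.
   Same idea as the sqrt32 routine above, written with fixed-width types. */
static uint32_t isqrt32(uint32_t n)
{
    uint32_t bit  = 0x8000u;  /* bit currently being tested */
    uint32_t root = 0x8000u;  /* candidate root with that bit set */

    for (;;) {
        if (root * root > n)
            root ^= bit;      /* candidate too large: clear the bit */
        bit >>= 1;
        if (bit == 0)
            return root;
        root |= bit;          /* try the next lower bit */
    }
}

/* ASSUMPTION: each input word holds the real part in bits 0..15 and the
   imaginary part in bits 16..31, both signed 16-bit, as in powerMag. */
void magnitude_q15(const uint32_t *fft_out, uint16_t *mag, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int32_t re = (int16_t)(fft_out[i] & 0xFFFFu);
        int32_t im = (int16_t)(fft_out[i] >> 16);

        /* Each square is at most 2^30, so the sum fits in 32 bits. */
        uint32_t sum = (uint32_t)(re * re) + (uint32_t)(im * im);

        mag[i] = (uint16_t)isqrt32(sum);  /* at most 46340, fits in 16 bits */
    }
}
```

Calling magnitude_q15(lBUFOUT, mag, nfill / 2) would cover only the first half of the bins, which is usually enough for real-valued input.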

Tesla DeLorean · ‎2011-05-17

Posted on May 17, 2011 at 13:38

// Jack Crenshaw's Integer Square Root
// Assumes unsigned long is 32 bits wide, as it is on the STM32.
unsigned long integer_sqrt(unsigned long a)
{
  unsigned long rem = 0;
  unsigned long root = 0;
  unsigned long divisor = 0;
  int i;

  for(i=0; i<16; i++) // 32-bit, 2 at a time
  {
    root <<= 1;
    rem = ((rem << 2) + (a >> 30)); // bring down the next two bits
    a <<= 2;
    divisor = (root << 1) + 1;
    if (divisor <= rem)
    {
      rem -= divisor;
      root++;
    }
  }

  return(root);
}


thomasbretgeld9 · ‎2011-05-17

Posted on May 17, 2011 at 13:38

Hi Clive,

I have already tried this algorithm. With gcc at -O2, the one I posted is considerably faster: my magnitude function for 512 values takes about 60000 cycles, compared with about 91000 cycles using the algorithm you posted. As the page linked in my previous post explains, which algorithm is faster depends heavily on the compiler and CPU.

By the way, the floating-point version using the compiler libraries takes about 600000 cycles for 256 values, so per value this is a speed-up of about a factor of 20. However, hand-written assembler would probably be significantly faster again.

Thomas
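Since the faster routine depends on compiler and CPU, it may be worth confirming that the candidates return identical results before timing them. Below is a minimal host-side sketch of such a check. Both routines are restated with fixed-width types so the block compiles on its own, and count_mismatches is a hypothetical helper, not part of either posted routine.

```c
#include <stdint.h>

/* Bitwise search, as in the sqrt32 routine posted earlier in the thread. */
static uint32_t isqrt_bitwise(uint32_t n)
{
    uint32_t bit = 0x8000u, root = 0x8000u;
    for (;;) {
        if (root * root > n) root ^= bit;
        bit >>= 1;
        if (bit == 0) return root;
        root |= bit;
    }
}

/* Digit-by-digit method, as in the Crenshaw routine posted earlier. */
static uint32_t isqrt_crenshaw(uint32_t a)
{
    uint32_t rem = 0, root = 0;
    for (int i = 0; i < 16; i++) {        /* two input bits per pass */
        root <<= 1;
        rem = (rem << 2) + (a >> 30);
        a <<= 2;
        uint32_t divisor = (root << 1) + 1;
        if (divisor <= rem) {
            rem -= divisor;
            root++;
        }
    }
    return root;
}

/* Hypothetical helper: counts the inputs in [first, last], visited in
   increments of `step` (must be nonzero), where the routines disagree.
   The 64-bit loop counter avoids wrap-around when last is 0xFFFFFFFF. */
uint32_t count_mismatches(uint32_t first, uint32_t last, uint32_t step)
{
    uint32_t bad = 0;
    for (uint64_t n = first; n <= last; n += step)
        if (isqrt_bitwise((uint32_t)n) != isqrt_crenshaw((uint32_t)n))
            bad++;
    return bad;
}
```

A return value of 0 over the input range of interest means the two routines are interchangeable there, so the choice can be made on cycle counts alone.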