2010-01-17 12:19 PM
STM32 get Magnitude after fft
2011-05-17 04:38 AM
STM32 at 72 MHz, 1024-point FFT, with the FFT coefficients placed in RAM.
The FFT takes less than 1.8 ms, but computing the magnitude takes 90 ms. How can I speed up the function? I only need 16-bit results.
Code from the DSP_lib demo:
void powerMag(long nfill, char* strPara)
{
  int32_t lX, lY;
  uint32_t i;
  for (i = 0; i < nfill; i++)
  {
    lX = (lBUFOUT[i] << 16) >> 16; /* sine_cosine --> cos */
    lY = (lBUFOUT[i] >> 16);       /* sine_cosine --> sin */
    {
      float X = 64*((float)lX)/32768;
      float Y = 64*((float)lY)/32768;
      float Mag = sqrt(X*X + Y*Y)/nfill;
      lBUFMAG[i] = (uint32_t)(Mag*65536);
    }
  }
  if (strPara == "1SIDED") onesided(nfill);
}
2011-05-17 04:38 AM
Hi,
We have not yet optimized the magnitude function for the STM32. In our example demo we use the compiler's default floating-point math library, so the performance depends closely on the toolchain (RVMDK, EWARM, GNU, etc.).
Magnitude = sqrt(real^2 + imaginary^2)
I recommend that you refer to this post from "Ivan Mellen", where "magnitude16_16bIn" for 1024 points should take only 220 µs.
Cheers, STOne-32.
2011-05-17 04:38 AM
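The sum of squares in the magnitude formula above needs no floating point at all. The following is a minimal sketch, assuming the packing used by the powerMag demo (real part in the low 16 bits, imaginary part in the high 16 bits). The names unpack_re, unpack_im and mag_squared are hypothetical and not part of the DSP library.

```c
#include <stdint.h>

/* Hypothetical helpers. Assumed layout: low half-word = real (cos),
 * high half-word = imaginary (sin), both signed 16-bit.
 * The narrowing casts rely on the usual two's-complement wrap-around. */
static int16_t unpack_re(uint32_t packed) { return (int16_t)(packed & 0xFFFFu); }
static int16_t unpack_im(uint32_t packed) { return (int16_t)(packed >> 16); }

/* real^2 + imaginary^2 in integer arithmetic.
 * Each square is at most 32768^2 = 2^30, so the sum is at most 2^31
 * and always fits in an unsigned 32-bit value. */
static uint32_t mag_squared(uint32_t packed)
{
    int32_t re = unpack_re(packed);
    int32_t im = unpack_im(packed);
    return (uint32_t)(re * re) + (uint32_t)(im * im);
}
```

With this in place, the square root is the only expensive step left in the loop.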
Amazing!
Thanks very much~.~ :D
2011-05-17 04:38 AM
Hi,
Currently I am facing the same problem with the magnitude calculations. The FFT takes around 26000 cycles and the magnitude consumes 330000 cycles (only for the first half of the bins!).
Unfortunately the link to "magnitude16_16bIn" is not working anymore. Can somebody tell me where to find the post about this function, or any site where I can find more information about it? I am using the Code Sourcery gcc. Google and the search in this forum didn't help me.
Kind regards,
Thomas
2011-05-17 04:38 AM
You could always try Ivan:
http://www.embeddedsignals.com/
This is the FFT 2.0, but probably not what you want:
https://my.st.com/public/STe2ecommunities/mcu/Lists/ARM%20CortexM3%20STM32/DispForm.aspx?ID=10404&RootFolder=/public/STe2ecommunities/mcu/Lists/ARM CortexM3 STM32/Complexreal, 1632bit, radix42 FFT, windowing, sqrt and magnitude library&Source=https://my.st.com/public/STe2ecommunities/mcu/Lists/ARM%2520CortexM3%2520STM32/Flat.aspx?RootFolder%3D%252Fpublic%252FSTe2ecommunities%252Fmcu%252FLists%252FARM%2520CortexM3%2520STM32%252FComplexreal%252
[DEAD LINK /public/STe2ecommunities/mcu/Lists/ARM%20CortexM3%20STM32/Flat.aspx?RootFolder=/public/STe2ecommunities/mcu/Lists/ARM CortexM3 STM32/Complexreal, 1632bit, radix42 FFT, windowing, sqrt and magnitude library&currentviews=1594] Thread
He has an address in the thread; it is not clear whether the SQRT source is available.
2011-05-17 04:38 AM
Hi Clive,
Thanks for the answer. Unfortunately the Ivan Mellen library seems to be free for personal use only. I have found another solution, which works fine for me:
http://www.codecodex.com/wiki/Calculate_an_integer_square_root
This page lists several integer square root algorithms, which run much faster than the gcc floating-point implementation. It does the job well and is around 12-15 times faster than the floating-point approach. It looks like the fastest "C" version is the following:
unsigned int sqrt32(unsigned long n)
{
  unsigned int c = 0x8000;
  unsigned int g = 0x8000;
  for (;;)
  {
    if (g*g > n)
      g ^= c;
    c >>= 1;
    if (c == 0)
      return g;
    g |= c;
  }
}
Thomas
2011-05-17 04:38 AM
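To show how such a routine could replace the floating-point step, here is a sketch that restates the same bit-by-bit search with fixed-width types and then applies the demo's output scaling. In the float code, the factors 64/32768 and 65536 collapse to 128, so the result is 128 * sqrt(re^2 + im^2) / nfill. The names isqrt32 and scale_magnitude are hypothetical, and the input is assumed to be the already computed sum of squares.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(n)).
 * Tries each result bit from the top down and keeps it only if the
 * square of the guess still does not exceed n. */
static uint16_t isqrt32(uint32_t n)
{
    uint32_t bit = 0x8000u;
    uint32_t guess = 0x8000u;
    for (;;) {
        if (guess * guess > n)
            guess ^= bit;          /* this bit made the guess too large */
        bit >>= 1;
        if (bit == 0)
            return (uint16_t)guess;
        guess |= bit;              /* try the next lower bit */
    }
}

/* Hypothetical helper: integer equivalent of the demo's scaling,
 * (sqrt(power) * 64 / 32768 / nfill) * 65536 = 128 * sqrt(power) / nfill.
 * nfill must be non-zero. */
static uint32_t scale_magnitude(uint32_t power, uint32_t nfill)
{
    return (128u * (uint32_t)isqrt32(power)) / nfill;
}
```

Because the root is truncated before scaling, the lowest bits can differ slightly from the floating-point result.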
// Jack Crenshaw's Integer Square Root
unsigned long integer_sqrt(unsigned long a)
{
  unsigned long rem = 0;
  unsigned long root = 0;
  unsigned long divisor = 0;
  int i;
  for (i = 0; i < 16; i++) // 32-bit, 2 at a time
  {
    root <<= 1;
    rem = ((rem << 2) + (a >> 30));
    a <<= 2;
    divisor = (root << 1) + 1;
    if (divisor <= rem)
    {
      rem -= divisor;
      root++;
    }
  }
  return(root);
}
2011-05-17 04:38 AM
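One portability note on the routine above: the shift by 30 and the 16 iterations assume that unsigned long is exactly 32 bits wide, which holds on the STM32 but not on many 64-bit hosts. A sketch of the same algorithm with fixed-width types, so that it can also be checked on a PC; the name crenshaw_isqrt32 is hypothetical.

```c
#include <stdint.h>

/* Same digit-by-digit method as above, restated with uint32_t so the
 * top two bits are always taken from bit positions 31 and 30. */
static uint32_t crenshaw_isqrt32(uint32_t a)
{
    uint32_t rem = 0;
    uint32_t root = 0;
    for (int i = 0; i < 16; i++) {       /* 32 bits, two bits per step */
        root <<= 1;
        rem = (rem << 2) + (a >> 30);    /* bring down the next two bits */
        a <<= 2;
        uint32_t divisor = (root << 1) + 1;
        if (divisor <= rem) {            /* next result bit is 1 */
            rem -= divisor;
            root++;
        }
    }
    return root;
}
```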
Hi Clive,
I have tried this algorithm already. With gcc and optimization set to -O2, the one that I posted is considerably faster. My magnitude calculation function for 512 values then takes approx. 60000 cycles to complete, compared to approx. 91000 cycles with the algorithm that you posted. As the site linked in my previous post explains, which algorithm is faster depends heavily on the compiler and CPU.
By the way, the floating-point version using the compiler libraries takes approx. 600000 cycles for 256 values, so this is a speed-up of about a factor of 20. However, pure assembly code would probably be significantly faster again.
Thomas
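The cycle counts above cover different numbers of values (256 for the floating-point run, 512 for the integer runs), so the comparison is easier to follow per value. A small sketch of that arithmetic; the function names are hypothetical and the only inputs are the figures quoted in this thread.

```c
/* Average cost of one magnitude value. */
static double cycles_per_value(double total_cycles, double values)
{
    return total_cycles / values;
}

/* Ratio of per-value costs between a slow and a fast implementation. */
static double speedup(double slow_cycles, double slow_values,
                      double fast_cycles, double fast_values)
{
    return cycles_per_value(slow_cycles, slow_values) /
           cycles_per_value(fast_cycles, fast_values);
}
```

With these figures, speedup(600000, 256, 60000, 512) gives the factor of 20 mentioned above: about 2344 cycles per value for the floating-point version against about 117 for sqrt32. The Crenshaw routine, at about 178 cycles per value, works out to roughly 13 times faster than floating point.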