Complex/real, 16/32bit, radix4/2 FFT, windowing, sqrt and magnitude library

imellen · ‎2010-01-18

Posted on January 18, 2010 at 10:54

#great?thanks?

imellen · ‎2011-05-17

Posted on May 17, 2011 at 13:36

Complex/real FFT, 16/32bit FFT, radix4/2 FFT, windowing, sqrt and magnitude functions for Cortex-M3.

Hi,

in last couple of weeks I've got quite a few emails regarding FFT library 2.0.

I found some time to put all together and benchmark all functions.

All functions were written in hand optimized assembly code. Small speed improvement still possible especially for small sizes FFT.

FFT library 2.0 content:

• Windowing functions (e.g. Hamming window)

o Perform speed optimized windowing of input signal before FFT

o 16 to 32 bit version performs proper scaling of 16 bit signal for 32 bit FFT

• FFT functions

o Complex and real FFT, 16 and 32bit FFT versions

o Radix4/2 FFT – sizes 4,8,16,32,64,128,256,512,1024,2048 and 4096

o Inverse FFT available (complex)

o 32 bit FFT increases dynamic range by 90 dB , needs extra 20% to 50% cycles

o Coefficients located in Flash. RAM location means faster FFT for higher latencies.

• Magnitude functions

o Calculate complex frequency magnitude mag=sqrt (re2 + im2)

o Based on custom 32 bit square root algorithm (7 cycles)

o Multiple precision/speed variants for 32 bit frequencies (square root of 64 bit integer based)

As an example I've atached these library fuctions with test program (gcc, Keil, IAR):

Window16to32b_real() ;16bit to 32 bit windowing function

FFT128Real_32b() ;128 points 32 bit real FFT

magnitude32_32bIn() ;complex magnitude calculation, 32 bit precision (64 bit sqrt)

PDF document with benchamrks is also attached.

If you need more information contact me at: imellen at embeddedsignals dot com

Happy New Year,

Ivan

16-32micros · ‎2011-05-17

Posted on May 17, 2011 at 13:36

Dear Ivan,

Thank you a lot for your excellent Contribution ! This seems a great job and this is a nice gift to start 2010 to our friends on this forums.

I will study your implementation and I will feedback you.

Cheers,

STOne-32.

benderb · ‎2011-05-17

Posted on May 17, 2011 at 13:36

Thank you for your contribution!

Would it be possible to provide an separate sqrt() function to be used independent of the magnitude functions? Computing a 16.16 sqrt of a 32 bit number in only 7 cycles is quite a trick!

Thanks

Bernhard

imellen · ‎2011-05-17

Posted on May 17, 2011 at 13:36

Hi,

Let me bring more light into SQRT routines:

All my square root routines are based on approximation, with multiple speed/accuracy/memory size variants:

7 cycles SQRT ( 32Q0 bit input in R0 to 16Q16 output in R0)

- worst case accuracy is P bits from leading 1 in result , this mean that max. error is proportional to result size. For P=8, max error is 0.39%

- if P=8, SQRT of 16 bit number always yields correct integer part (as it is always < 256)

- look up table (LUT) size is 3/4*(2^P) in 16 bit words. E.g for 8 bit precision, LUT size is 192 words = 384 bytes. For P=9 LUT size is 768 bytes.

- example: sqrtP8(12) is 227536 in 16Q16 which is 3.47192. Correct value is 3.4641, error is 0.2258%

- 7 cycles can be achieved only if LDRH instruction takes 1 cycle (pipelining it with other load instructions), 8 cycles otherwise

13 cycles SQRT ( 32Q0 bit input in R0 to 16Q16 output in R0)

- this version uses interpolation to reduce error, while keeping LUT reasonably small

- LUT size is fixed to 192 of 32 bit words (768 bytes)

- worst case accuracy is 18 bits from leading 1 in results

- SQRT from any 32 bit number provides 16 bit integer + at least 2 valid bits in the decimal part

- 2 versions of LUT:

- 1st always provides positive error (truncated integer part is correct),

- 2nd reduces error to 1/2, but error can be negative (truncated integer part can equal expected value - 1)

- 13 cycles assumes 2 cycle LDR, if pipelined, can be reduced to 12 cycles

Cycle count does not include call overhead and LUT address loading (performed only once,reused for multiple sqrt calculations)

Magnitude functions in FFT libray v 2.0 actually use 13 cycle version, not 7 cycle version of SQRT, so their accuracy is at least 18 bits from leading 1 in result.

Bernhard, at this time SQRT routines are available only by email. Please contact me at: imellen at embeddedsignals dot com .

Cheers,

Ivan

otr · ‎2011-05-17

Posted on May 17, 2011 at 13:36

Amazing!

Will you publish the FFT library on the form? 🙂

petter23 · ‎2011-05-17

Posted on May 17, 2011 at 13:36

I cannot find any attachments in the above posts, maybe I am

missing how to find it, but it does not show up under View Properties/Attachments!?

st3 · ‎2011-05-17

Posted on May 17, 2011 at 13:36

Quote:

Will you publish the FFT library on the form?

Errrm... look at the attachments to the opening post?!

Tomas DRESLER · ‎2011-05-17

Posted on May 17, 2011 at 13:36

Looks like the attachments were lost when forum was ported to new system...

Nickname12657_O · ‎2011-05-17

Posted on May 17, 2011 at 13:36

Hi,

Here you go (See attachments).

We are trying to fix the issue of missing attachments from our old forums

Thank you for comprehension.

Cheers,

STOne-32.