Complex/real, 16/32bit, radix4/2 FFT, windowing, sqrt and magnitude library

Complex/real, 16/32bit, radix4/2 FFT, windowing, sqrt and magnitude library

- Hi,

Let me bring more light into SQRT routines:

All my square root routines are based on approximation, with multiple speed/accuracy/memory size variants:

7 cycles SQRT ( 32Q0 bit input in R0 to 16Q16 output in R0)

- worst case accuracy is P bits from leading 1 in result , this mean that max. error is proportional to result size. For P=8, max error is 0.39%

- if P=8, SQRT of 16 bit number always yields correct integer part (as it is always < 256)

- look up table (LUT) size is 3/4*(2^P) in 16 bit words. E.g for 8 bit precision, LUT size is 192 words = 384 bytes. For P=9 LUT size is 768 bytes.

- example: sqrtP8(12) is 227536 in 16Q16 which is 3.47192. Correct value is 3.4641, error is 0.2258%

- 7 cycles can be achieved only if LDRH instruction takes 1 cycle (pipelining it with other load instructions), 8 cycles otherwise

13 cycles SQRT ( 32Q0 bit input in R0 to 16Q16 output in R0)

- this version uses interpolation to reduce error, while keeping LUT reasonably small

- LUT size is fixed to 192 of 32 bit words (768 bytes)

- worst case accuracy is 18 bits from leading 1 in results

- SQRT from any 32 bit number provides 16 bit integer + at least 2 valid bits in the decimal part

- 2 versions of LUT:

- 1st always provides positive error (truncated integer part is correct),

- 2nd reduces error to 1/2, but error can be negative (truncated integer part can equal expected value - 1)

- 13 cycles assumes 2 cycle LDR, if pipelined, can be reduced to 12 cycles

Cycle count does not include call overhead and LUT address loading (performed only once,reused for multiple sqrt calculations)

Magnitude functions in FFT libray v 2.0 actually use 13 cycle version, not 7 cycle version of SQRT, so their accuracy is at least 18 bits from leading 1 in result.

Bernhard, at this time SQRT routines are available only by email. Please contact me at: imellen at embeddedsignals dot com .

Cheers,

Ivan - Hi all,

I've just noticed that Ex_FFT128Real_32b.s files do not describe input and output data properly. I forgot to change data description from complex FFT to real FFT.

void FFT128Real_32b(int *y, int *x);

Original description (wrong):

; x[N] = {re0,im0,re1,im1, ... re(N-1),im(N-1)}

; y[N+2] = {DC,0,reF1,imF1,reF2,imF2,.. reF(N),0 }

New descrition (correct):

N is 128

int x[N] = { x0, x1, x2 ... x(N-1) } ; 32 bit int scaled for FFT, 128 real elements

int y[N+2] = {DC,0,reF1,imF1,reF2,imF2,.. reF(N/2),0 }, 32 bit int, 66 complex elements

Input data restrictions remains the same:

x must be scaled to 1/2 of the full range to avoid overflow (+- 2^30)

regards,

Ivan - Hi all,

I've noticed that attachments are not working very well in the new forum. I put all posted Cortex M3 FFT library files to the embeddedsignals.com website. The exact address is:

www.embeddedsignals.com/ARM.htm

Ivan

Hi,

in last couple of weeks I've got quite a few emails regarding FFT library 2.0.

I found some time to put all together and benchmark all functions.

All functions were written in hand optimized assembly code. Small speed improvement still possible especially for small sizes FFT.

FFT library 2.0 content:

• Windowing functions (e.g. Hamming window)

o Perform speed optimized windowing of input signal before FFT

o 16 to 32 bit version performs proper scaling of 16 bit signal for 32 bit FFT

• FFT functions

o Complex and real FFT, 16 and 32bit FFT versions

o Radix4/2 FFT – sizes 4,8,16,32,64,128,256,512,1024,2048 and 4096

o Inverse FFT available (complex)

o 32 bit FFT increases dynamic range by 90 dB , needs extra 20% to 50% cycles

o Coefficients located in Flash. RAM location means faster FFT for higher latencies.

• Magnitude functions

o Calculate complex frequency magnitude mag=sqrt (re2 + im2)

o Based on custom 32 bit square root algorithm (7 cycles)

o Multiple precision/speed variants for 32 bit frequencies (square root of 64 bit integer based)

As an example I've atached these library fuctions with test program (gcc, Keil, IAR):

Window16to32b_real() ;16bit to 32 bit windowing function

FFT128Real_32b() ;128 points 32 bit real FFT

magnitude32_32bIn() ;complex magnitude calculation, 32 bit precision (64 bit sqrt)

PDF document with benchamrks is also attached.

If you need more information contact me at: imellen at embeddedsignals dot com

Happy New Year,

Ivan