Fast floating point math

linas2 · ‎2013-04-03

Posted on April 03, 2013 at 11:27

Hello, I have run into a problem where I need to do math with single-precision floating point.

It looks like the if(...) condition and printing a floating-point value to uint16_t are very slow.

So I have two questions:

First, what is the fastest way to print a float to an integer? (The float will be between -10 and +10, and that should correspond to 0x0000 and 0xFFFF in the integer.)

Second, how can I read the binary data of a float variable into a u32 variable, so I can make the if(...) condition faster? (I need to set the floating-point value to zero if it is more than 10.0 or less than -10.0.)

(I know the floating-point variable is in memory at address 0x2000000C, so all I need to do is copy the raw binary data to a u32 variable and check the mantissa and exponent to see whether it goes over the limit. But I don't know how to copy 32 bits of data to a 32-bit unsigned integer: if I try to do that with pointers I get an error, or the compiler does the conversion for me and slows my program down.)

Tesla DeLorean · ‎2013-04-03

Posted on April 03, 2013 at 11:39

Doing printf() of anything is pretty slow in the scale of things.

float foo = 1.23;

uint32_t bar;

bar = *((uint32_t *)&foo); // Convert 32-bit float into its binary representation IEEE-754
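
A side note on the cast above: reading a float through a uint32_t pointer breaks C's strict-aliasing rule, so an optimizing compiler is allowed to treat it unexpectedly. A minimal sketch of the same bit copy with memcpy, which compilers commonly reduce to a plain register move. It assumes float is a 32-bit IEEE-754 type, and the helper name float_bits is only illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a 32-bit float.
   float_bits is a hypothetical helper name, not from the thread. */
static inline uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined, unlike *(uint32_t *)&f */
    return u;
}
```

For example, float_bits(1.0f) gives 0x3F800000.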

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

linas2 · ‎2013-04-03

Posted on April 03, 2013 at 11:49

bar = *((uint32_t *)&foo); // Convert 32-bit float into its binary representation IEEE-754

This single line slows my program down from 8KHz cycle speed to 8KHz cycle speed.

I can do tons of integer math with very little speed reduction.

As an example, code like this

a++;
a=a*b+a*c+a*d+a*e;

slows the loop down by a very small amount; I still read 8KHz cycle speed. Do you know any other way to copy binary data from f32 to u32?

frankmeyer9 · ‎2013-04-03

Posted on April 03, 2013 at 11:59

As clive said, avoiding printf() calls would help, especially those with floating point arguments.

For a uint16, you can write your own printf substitute.

it looks like if(...) condition, ... is very slow.

Are you really sure?

I can hardly believe this. But you might try -mfloat-abi=hard to use the FPU directly. And turning on at least minimal optimization (-O1) might help, too.

Tesla DeLorean · ‎2013-04-03

Posted on April 03, 2013 at 12:09

I think one of your problems here is that you don't have a clue what code the compiler is actually generating, or removing via optimization. If you want to play machine cycle level games, then use assembler.

Write some numeric output routines which are efficient, and relate to the range of numbers you wish to represent.
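
As a rough illustration of such a routine, here is a sketch that turns a uint16_t into decimal text without printf(). The function name u16_to_dec and the buffer convention are assumptions made for this example, not something from the thread:

```c
#include <stdint.h>

/* Write v as decimal digits into buf (at least 6 bytes) and
   return the number of characters written, excluding the '\0'.
   u16_to_dec is a hypothetical name used for illustration. */
static int u16_to_dec(uint16_t v, char *buf)
{
    char tmp[5];                /* a uint16_t has at most 5 digits */
    int n = 0;

    do {                        /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0u);

    for (int i = 0; i < n; i++) /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

The caller then sends the buffer to whatever output it has (UART, display), which avoids pulling in the printf() formatting machinery.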


linas2 · ‎2013-04-03

Posted on April 03, 2013 at 12:19

Are you really sure?

I can hardly believe this. But you might try -mfloat-abi=hard to use the FPU directly. And turning on at least minimal optimization (-O1) might help, too.

OK, as it turns out, that was the compiler's personality. If at the end of the while(1) loop I write:

if(phase>10.0)
phase=0;
if(phase<-10.0)
phase=0;

the speed goes down from 8KHz to 44KHz. BUT! If I add an extra line, like

if(faze>10.0)
faze=0;
if(faze<-10.0)
faze=0;
a=a+1; // simple dummy counter with no other use, just counts to 0xFFFF

the speed goes up to 8KHz again. That is strange... But printing a float to an integer still slows my program down. I bet it has something to do with the IAR compiler now, since a single float-to-int expression slows things down by ~10KHz, but if I add a second, unrelated float-to-integer conversion, the speed is the same as with a single float-to-integer conversion. Well, this is no fun!

linas2 · ‎2013-04-03

Posted on April 03, 2013 at 12:19

OK, that was just the compiler skipping code and not compiling it at all: when I added a breakpoint, it showed an error that this code does not exist in the ASM. So yes, if() on a float is slow, and f32 to u32 conversion is slow.

My guess is that floating point is good for MAC, and that's all.

So I'll go back to the old approach: check the mantissa and exponent in integer format, where the ALU is fast enough to do the job.
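
A sketch of what such an integer-side check could look like for the ±10 limit from the first post, assuming 32-bit IEEE-754 floats. The names BITS_10F and out_of_range_bits are made up for this example. Whether it actually beats a plain float compare on a part with an FPU is something to measure rather than assume:

```c
#include <stdint.h>
#include <string.h>

#define BITS_10F 0x41200000u   /* IEEE-754 single-precision pattern of 10.0f */

/* Return 1 if |x| > 10.0f, judged only from the bit pattern.
   Clearing the sign bit gives |x|; for non-negative floats the
   unsigned bit patterns sort in the same order as the values.
   NaN and infinity also report 1, since their patterns are larger. */
static int out_of_range_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);            /* raw bits, no numeric conversion */
    return (u & 0x7FFFFFFFu) > BITS_10F;
}
```

The single masked compare replaces the two separate tests against +10 and -10.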

frankmeyer9 · ‎2013-04-03

Posted on April 03, 2013 at 12:53

if(phase>10.0) ...

Are you aware that you are using double constants?

Rephrasing it to

if(phase>10.0f) ...

could help here.

And, if I remember correctly, the FPU has a VABS instruction, so something like

if (fabsf(phase) > 10.0f) ...

could speed things up, too.
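
Put together as a small function, the suggestion might look like the sketch below. The ±10 limit comes from the first post; the function name zero_if_out_of_range is only illustrative:

```c
#include <math.h>

/* Reset the value to zero when it leaves [-10, +10].
   The f suffix keeps the comparison in single precision. */
static float zero_if_out_of_range(float phase)
{
    if (fabsf(phase) > 10.0f)   /* one compare instead of two */
        phase = 0.0f;
    return phase;
}
```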

linas2 · ‎2013-04-03

Posted on April 03, 2013 at 13:09

OK, thanks for the tip.

I cross-checked the asm code against the loop cycle speed, and now I know the real problem. In the main loop I was doing a floating-point MAC 256 times with ADC data. Only when I write something that actually uses the data from the MAC does the compiler include my floating-point operations in the assembler code. So in short, when I try to print the phase calculated in the main loop to the DAC, it starts to include all the floating-point operations from the main loop. So in general f32 to int32 is fast; the only problem is that I have more floating-point operations elsewhere. So that was a wrong interpretation of what the compiler is doing, and it's doing quite a good job. Here is the original loop, to see what I did...

while(k<128) // precompute the cosine and sine tables
{
    cosinusas[k]=cosf((6.28318531*k*POINT)/N);
    sinusas[k] =sinf((6.28318531*k*POINT)/N);
    k++;
}
while(1)
{
    while(GPIOA->IDR < 32766);
    GPIOD->BSRRL= GPIO_Pin_13;
    while(GPIOA->IDR > 32766);
    while(GPIOA->IDR < 32766);
    GPIOD->BSRRH= GPIO_Pin_13;
    imag=0;
    real=0;
    i=0;
    a=0;
    while(i<128)
    {
        while(GPIOC->IDR < 32766); // wait until the FIFO is not empty
        CLK_LOW;
        CLK_HIGH;
        k=GPIOB->IDR;
        real+=k*cosinusas[i]; // FMAC
        imag-=k*sinusas[i]; // FMAC
        i++;
    }
    faze=faze-(0.2*Angle(imag,real));
    if(fabsf(faze)>10.0f)
        faze=0;
    DAC_DATA=(int)(faze*375+32767);
}

The while(GPIOC->IDR < 32766) wait is for the FIFO buffer that gets data from the ADC, so I read only when the FIFO buffer is not empty; about 700ns is lost in this loop. (The clock is 8MHz, so all the data is loaded in 16us and unloaded in 7us, which is acceptable.) The two FMAC operations in the while(i<128) loop cost me 4us, and the angle calculation, the if(...) condition check and printing the float to the DAC cost me an additional 2us. I don't know, can I do better than this? Or does anyone know a fast and cheap processor for this kind of work?
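
For the float-to-integer step itself, here is a sketch of the mapping described in the first post (-10 to +10 onto 0x0000 to 0xFFFF) with saturation, using float constants only. This is an assumption-laden example: the name phase_to_u16 is made up, and the scale differs from the faze*375+32767 used in the loop above:

```c
#include <stdint.h>

/* Map a phase in [-10, +10] onto the full 16-bit range 0x0000..0xFFFF.
   Inputs outside the range are saturated; phase_to_u16 is a hypothetical name. */
static uint16_t phase_to_u16(float phase)
{
    const float scale = 65535.0f / 20.0f;         /* counts per unit, 3276.75 */
    float code = (phase + 10.0f) * scale + 0.5f;  /* +0.5f rounds to nearest */

    if (code < 0.0f)      code = 0.0f;
    if (code > 65535.0f)  code = 65535.0f;
    return (uint16_t)code;                        /* single float-to-int conversion */
}
```

Because the limits are enforced on the scaled value, the separate range check on the phase is not needed for the output to stay within 16 bits.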

frankmeyer9 · ‎2013-04-03

Posted on April 03, 2013 at 13:43

I guess this is not a commercial project, otherwise project management would rap you on the knuckles.

As the output data for the DAC is 12 bit, you could easily do it with scaled integer math on a cheap M3 or even an M0. But never mind ...
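
One way to read the scaled-integer suggestion, as a rough sketch only: keep the sine and cosine tables as Q15 integers (each entry being 32767 times the real value, rounded) and accumulate with integer multiply-adds. The function name correlate_q15, the Q15 format and the 64-bit accumulators are all assumptions for this example:

```c
#include <stdint.h>

/* Correlate ADC samples with Q15 cosine/sine tables using integer
   multiply-accumulate only. Each product fits in 32 bits
   (65535 * 32767 < 2^31); the sums are kept in 64 bits for headroom. */
static void correlate_q15(const uint16_t *adc, const int16_t *cos_q15,
                          const int16_t *sin_q15, int n,
                          int64_t *real, int64_t *imag)
{
    int64_t re = 0, im = 0;

    for (int i = 0; i < n; i++) {
        re += (int32_t)adc[i] * cos_q15[i];
        im -= (int32_t)adc[i] * sin_q15[i];
    }
    *real = re;   /* both results carry a scale factor of 32767 */
    *imag = im;
}
```

The common scale factor cancels when only the angle of (imag, real) is needed.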

But I suggest appending an f to all non-integral constants. According to the C standard, they are otherwise double by default.
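
A small sketch that makes the difference visible; the function names are invented for the example. On a core whose FPU handles only single precision, the double version would be expected to fall back to software routines:

```c
#include <stddef.h>

/* An unsuffixed constant such as 0.2 has type double; 0.2f has type float. */
static size_t size_of_plain_constant(void)    { return sizeof(0.2);  }
static size_t size_of_suffixed_constant(void) { return sizeof(0.2f); }

/* With a float operand, 0.2 * x is evaluated in double and converted
   back to float on return; 0.2f * x stays in single precision. */
static float scale_double(float x) { return 0.2 * x;  }
static float scale_float(float x)  { return 0.2f * x; }
```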