issue with fmaf function

KwA · ‎2023-11-15

Hello,

I used the fmaf function in my app and found that it doesn't behave as expected:

y = fmaf(a1, a2, a3) should give y = a1*a2 + a3,

but the result I got was y = a1*a2, without the additive term.

Of course, I can use the "classic" expression instead of this function.
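For reference, C99 (7.12.13.1) defines fmaf(x, y, z) as x*y + z computed as if with unlimited precision and rounded once. The "classic" expression rounds the product first, so the two can legitimately differ in the last bit even when fmaf() works correctly; depending on flags such as -ffp-contract, GCC may also turn a*b + c into a fused instruction on its own. A minimal sketch of the two-rounding version (the helper name mul_then_add is hypothetical):

```c
#include <math.h>

/* Hypothetical helper: the "classic" expression with two roundings.
   Storing the product in a volatile float forces it to be rounded to float
   and keeps the compiler from contracting a*b + c into a fused multiply-add. */
float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;   /* first rounding: product rounded to float */
    return p + c;               /* second rounding: sum rounded to float */
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11) the exact result is 2^-24: fmaf(a, a, c) returns 2^-24, while mul_then_add(a, a, c) returns 0.0f, because the 2^-24 part of the product is lost when the product is rounded to float.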

Is this a bug, or have I misunderstood something?

I'm using STM32CubeIDE 1.13.2, and STM32F401ceu...

Greg

Below is also the relevant piece of disassembly:

137 q = fmaf(0.98564736f, D, 280.459f);
080014ea: vldr s2, [pc, #200] ; 0x80015b4 <Sun_Longitude_Deg+332>
080014ee: vldr s1, [r7, #4]
080014f2: vldr s0, [pc, #188] ; 0x80015b0 <Sun_Longitude_Deg+328>
080014f6: bl 0x800c8d0 <fmaf>

fmaf:
0800c8d0: vfma.f32 s0, s0, s1
0800c8d4: bx lr
0800c8d6: movs r0, r0
__ieee754_atan2:

STTwo-32 · ‎2023-11-15

Hello @KwA, and welcome to the ST Community 😊.

I can't reproduce that issue. Can you share your code here so we can find the problem?

Best Regards.

STTwo-32

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

Tesla DeLorean · ‎2023-11-15

Yes, it looks broken, although the Keil output calls __hardfp_fmaf, which is much more complex:

        0x08003eda:    ed9f0a0c    ....    VLDR     s0,[pc,#48] ; [0x8003f0c] = 0x3f7c5363
        0x08003ede:    eddf0a0c    ....    VLDR     s1,[pc,#48] ; [0x8003f10] = 0x42f6e666
        0x08003ee2:    ed9f1a0c    ....    VLDR     s2,[pc,#48] ; [0x8003f14] = 0x438c3ac1
        0x08003ee6:    f000f829    ..).    BL       __hardfp_fmaf ; 0x8003f3c

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Tesla DeLorean · ‎2023-11-15

With GNU 4.93 I tried something similar; there the fmaf call is inlined as a single VFMA.F32:

f = fmaf(0.98564736f, f, 280.459f);

        0x080015e4:    ed9d7a02    ...z    VLDR     s14,[sp,#8]
        0x080015e8:    eddf6a23    ..#j    VLDR     s13,[pc,#140] ; [0x8001678] = 0x3f7c5363
        0x080015ec:    eddf7a23    ..#z    VLDR     s15,[pc,#140] ; [0x800167c] = 0x438c3ac1
        0x080015f0:    481b        .H      LDR      r0,[pc,#108] ; [0x8001660] = 0x80084ef
        0x080015f2:    eee67a87    ...z    VFMA.F32 s15,s13,s14
        0x080015f6:    edcd7a02    ...z    VSTR     s15,[sp,#8]

       printf("%f\n", f);

        0x080015fa:    eddd7a02    ...z    VLDR     s15,[sp,#8]
        0x080015fe:    eeb77ae7    ...z    VCVT.F64.F32 d7,s15
        0x08001602:    ec532b17    S..+    VMOV     r2,r3,d7
        0x08001606:    f002fb81    ....    BL       printf ; 0x8003d0c

KwA · ‎2023-11-15

Hello, yes, I can share it, but I think it is unusable in this form. I now just use the normal expression instead of fmaf; all the fmaf calls are commented out.

Sorry, the code looks terrible; I don't know how to format it here:

#include <math.h>

// D2R() (degrees to radians) is defined elsewhere in the project (not shown).

float Sun_Longitude_Deg(float D) {
    float g, q, L;
    float gr;

    // g, q, and L are in degrees
    // Mean anomaly of the Sun:
    g = 357.529f + 0.98560028f * D;
    //g = fmaf(0.98560028f, D, 357.529f);

    if (g >= 360) //while (g > 360)
    {   //g -= 360;
        g = fmodf(g, 360);
    } else {
        while (g < 0) { g += 360; }
    }

    // Mean longitude of the Sun:
    q = 280.459f + 0.98564736f * D;
    //q = fmaf(0.98564736f, D, 280.459f);
    if (q >= 360) //while (q > 360)
    {   //q -= 360;
        q = fmodf(q, 360);
    } else {
        while (q < 0) { q += 360; }
    }

    // Sun's geocentric apparent ecliptic longitude (adjusted for aberration):
    gr = D2R(g);
    L = q + 1.915f*sinf(gr) + 0.020f*sinf(2*gr);

    //L = fmaf(1.915f, sinf(gr), q);
    //L = fmaf(0.020f, sinf(2*gr), L);

    return L;
}

KwA · ‎2023-11-15

In this disassembly, VFMA operates on three different registers, whereas in mine it operates on only two (s2 is omitted in my case).

Tesla DeLorean · ‎2023-11-15

Yes, I don't know if that's the optimizer trying to be clever. When I had it calling routines, the optimizer was folding constants, juggling registers and inlining code, so I put it in a loop so that the compiler couldn't precompute everything.

The code-paste tool is the icon that looks like </>; hit the ... icon to expand the list of tool icons.

KwA · ‎2023-11-16

Hello Tesla and @STTwo-32. I did some experiments today. I created a fresh project with the code below, which also does the same calculation the normal way to show the difference, and again the function didn't work as it should. It seems to be implemented incorrectly.

Results are the same with GNU Tools for STM32 11.3.rel1 and 10.3-2021.10 (in the disassembly, both also use the same registers in VFMA.F32 as in my first message).

I use hardware FP.

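#include <math.h>

float ffun, fnor;
volatile float fres;

void fmaf_test(void)  // wrapper function added so the snippet compiles on its own
{
	for (int i = 0; i < 10; ++i) {
		for (int j = 0; j < 10; ++j) {
			ffun = fmaf((float)i, (float)j, (float)(2*j)); // library call
			fnor = (float)i * (float)j + (float)(2*j);     // plain expression
			fres = ffun - fnor;                            // should be 0 for every i, j
		}
	}
}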

Greg

STTwo-32 · ‎2023-11-16

Hello again @KwA

I've just tested your code and I can confirm the issue. As a workaround, you can use fma (I've tested it and it works fine).

PS: I did my test using CubeIDE 1.13.2.

Best Regards.

STTwo-32