2023-11-15 09:32 AM
Hello,
I used the fmaf function in my app and found that it doesn't behave as expected:
y = fmaf(a1,a2,a3) should give y = a1*a2 + a3
but the result I got was y = a1*a2, without the additive term.
Of course I can use the "classic" expression instead of this function.
Is this a bug, or have I misunderstood something?
I'm using STM32CubeIDE 1.13.2 and an STM32F401ceu...
Greg
Below is also the disassembled piece of code:
q = fmaf(0.98564736f, D, 280.459f);
080014ea: vldr s2, [pc, #200] ; 0x80015b4 <Sun_Longitude_Deg+332>
080014ee: vldr s1, [r7, #4]
080014f2: vldr s0, [pc, #188] ; 0x80015b0 <Sun_Longitude_Deg+328>
080014f6: bl 0x800c8d0 <fmaf>
fmaf:
0800c8d0: vfma.f32 s0, s0, s1
0800c8d4: bx lr
0800c8d6: movs r0, r0
__ieee754_atan2:
2023-11-15 11:25 AM - edited 2023-11-15 11:25 AM
Hello @KwA and welcome to the ST Community :smiling_face_with_smiling_eyes:.
I can't reproduce that issue. Can you share your code here so we can find the problem?
Best Regards.
STTwo-32
To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.
2023-11-15 11:28 AM
Yes, it looks broken, although the Keil output calls __hardfp_fmaf, which is much more complex:
0x08003eda: ed9f0a0c .... VLDR s0,[pc,#48] ; [0x8003f0c] = 0x3f7c5363
0x08003ede: eddf0a0c .... VLDR s1,[pc,#48] ; [0x8003f10] = 0x42f6e666
0x08003ee2: ed9f1a0c .... VLDR s2,[pc,#48] ; [0x8003f14] = 0x438c3ac1
0x08003ee6: f000f829 ..). BL __hardfp_fmaf ; 0x8003f3c
2023-11-15 11:41 AM
With GNU 4.93 something similar
f = fmaf(0.98564736f, f, 280.459f);
0x080015e4: ed9d7a02 ...z VLDR s14,[sp,#8]
0x080015e8: eddf6a23 ..#j VLDR s13,[pc,#140] ; [0x8001678] = 0x3f7c5363
0x080015ec: eddf7a23 ..#z VLDR s15,[pc,#140] ; [0x800167c] = 0x438c3ac1
0x080015f0: 481b .H LDR r0,[pc,#108] ; [0x8001660] = 0x80084ef
0x080015f2: eee67a87 ...z VFMA.F32 s15,s13,s14
0x080015f6: edcd7a02 ...z VSTR s15,[sp,#8]
printf("%f\n", f);
0x080015fa: eddd7a02 ...z VLDR s15,[sp,#8]
0x080015fe: eeb77ae7 ...z VCVT.F64.F32 d7,s15
0x08001602: ec532b17 S..+ VMOV r2,r3,d7
0x08001606: f002fb81 .... BL printf ; 0x8003d0c
2023-11-15 11:50 AM
Hello, yes, I can share it, but I don't think it is usable in this form. I just use a normal expression instead of fma. All the fma calls are commented out.
Sorry, the code looks terrible. I don't know how to format it here:
float Sun_Longitude_Deg(float D) {
    float g, q, L;
    float gr;
    // g, q, and L are in degrees
    // Mean anomaly of the Sun:
    g = 357.529f + 0.98560028f * D;
    //g = fmaf(0.98560028f , D, 357.529f);
    if (g >= 360) //while (g > 360)
    { //g -= 360;
        g = fmodf(g, 360);
    } else {
        while (g < 0) { g += 360;}
    }
    // Mean longitude of the Sun:
    q = 280.459f + 0.98564736f * D;
    //q = fmaf(0.98564736f, D, 280.459f);
    if (q >= 360) //while (q > 360)
    { //q -= 360;
        q = fmodf(q, 360);
    } else {
        while (q < 0) { q += 360;}
    }
    // Sun's geocentric apparent ecliptic longitude (adjusted for aberration):
    gr = D2R(g);
    L = q + 1.915f*sinf(gr) + 0.020f*sinf(2*gr);
    //L = fmaf(1.915f, sinf(gr), q);
    //L = fmaf(0.020f, sinf(2*gr), L);
    return L;
}
2023-11-15 11:56 AM
In this disassembly it looks like VFMA operates on three different registers, unlike in mine, where it operates on only two (s2 is omitted in my case).
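To make the register observation concrete, here is a minimal sketch in plain C of what the posted stub appears to compute. It assumes the hard-float calling convention (a, b, c arrive in s0, s1, s2) and the documented VFMA.F32 Sd, Sn, Sm behaviour, Sd = Sd + Sn * Sm. The function names are hypothetical and only model the instructions; they are not the library source.

```c
#include <math.h>

/* Hypothetical model of the posted stub "vfma.f32 s0, s0, s1".
   With a in s0 and b in s1, the instruction yields s0 = s0 + s0 * s1. */
static float fmaf_as_disassembled(float a, float b, float c)
{
    (void)c;            /* s2 (the addend) is never read by the stub */
    return a + a * b;   /* a is used as both multiplicand and addend */
}

/* Hypothetical model of the intended result: a * b + c.
   (The real fmaf rounds only once; that detail is ignored here.) */
static float fmaf_as_specified(float a, float b, float c)
{
    return a * b + c;
}
```

Read this way, the stub would return a1 + a1*a2 rather than a1*a2 + a3. For q = fmaf(0.98564736f, D, 280.459f) that is within 1 of a1*a2, so the result can easily look as if the additive term was simply dropped.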
2023-11-15 01:33 PM
Yes, I don't know if that's the optimizer trying to be clever. I had it calling routines, and the optimizer was folding constants, juggling registers and in-lining code. I put it in a loop so the compiler couldn't precompute everything.
The code-paste tool is the icon that looks like </>. Hit the ... icon to expand the tool icon list.
2023-11-16 08:20 AM
Hello Tesla and @STTwo-32. I did some experiments today. I created a fresh project with the code given below (which also does the same calculation the normal way, to show the difference), and again the function didn't work as it should. It seems to be improperly implemented.
Results are the same with GNU Tools for STM32 11.3.rel1 and 10.3-2021.10 (in the disassembly, both also pass the same operands to VFMA.F32 as in my first message).
I use hardware FP.
#include "math.h"
float ffun, fnor;
volatile float fres;
for (int i = 0; i < 10; ++i) {
    for (int j = 0; j < 10; ++j) {
        ffun = fmaf((float)i, (float)j, (float)(2*j));
        fnor = (float)i * (float)j + (float)(2*j);
        fres = ffun - fnor;
    }
}
Greg
2023-11-16 10:25 PM - edited 2023-11-16 10:26 PM
Hello again @KwA
I've just tested your code and I can confirm the issue. As a workaround, you can use fma (I've tested it and it works fine).
PS: I did my test using CubeIDE 1.13.2.
Best Regards.
STTwo-32