Question about the assembly code for floating-point computation: is the FPU not actually being used?

Lingjun Kong · ‎2017-08-03

Posted on August 03, 2017 at 22:11

The original post was too long to process during our migration. Please click on the attachment to read the original post.

Tesla DeLorean · ‎2017-08-03

Posted on August 03, 2017 at 22:50

Your CPU only has a single-precision FPU. This is why the double-precision computations are being done in software; it is using the FPU registers as extra holding space.

The FPU isn't inherently complicated. It only implements fairly basic math functionality, which you can glean from the opcode names. The cleverness will come from how you manage the register resources to minimize the movement in/out and make the math flow. The FPU does not hold higher-precision intermediate results.
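A minimal sketch of how double precision can creep into C code that looks like it only uses float. This is not the original poster's code; the function names are made up for illustration. An unsuffixed constant such as 0.5 has type double, so the whole expression is evaluated in double, which on a single-precision FPU means a call into the software floating-point library.

```c
#include <assert.h>
#include <stddef.h>

/* Stays in single precision: every operand and constant is a float,
   so a single-precision FPU can execute this directly. */
float scale_single(float x)
{
    return x * 0.5f + 1.25f;
}

/* Silently promoted: 0.5 and 1.25 are double constants, so the multiply
   and add are performed in double and the result is converted back. */
float scale_promoted(float x)
{
    return (float)(x * 0.5 + 1.25);
}

/* sizeof reports the type the compiler assigns to each expression. */
size_t single_expr_size(void)
{
    float x = 2.0f;
    return sizeof(x * 0.5f);   /* type is float */
}

size_t promoted_expr_size(void)
{
    float x = 2.0f;
    return sizeof(x * 0.5);    /* type is double */
}

/* Hypothetical self-check helper; call it once at start-up if desired. */
void promotion_self_check(void)
{
    assert(single_expr_size() == sizeof(float));
    assert(promoted_expr_size() == sizeof(double));
}
```

If the toolchain is GCC, the -Wdouble-promotion warning flags the second form. Other compilers may or may not have an equivalent option.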

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Tesla DeLorean · ‎2017-08-03

Posted on August 03, 2017 at 22:52

See FPU Check code in this thread

https://community.st.com/0D50X00009XkYbhSAF

https://community.st.com/0D50X00009XkYbhSAF#comment-151216
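For readers who cannot open the links, here is a rough sketch of one way to check what kind of FPU a Cortex-M part has. It is not the code from the linked thread. It assumes the ARMv7-M register layout (CPACR at 0xE000ED88, MVFR0 at 0xE000EF40) and decodes values passed in as plain integers, so the logic can be tried on a PC; the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Register addresses per the ARMv7-M architecture; only readable on target. */
#define CPACR_ADDR 0xE000ED88u
#define MVFR0_ADDR 0xE000EF40u

typedef enum { FPU_NONE = 0, FPU_SINGLE = 1, FPU_DOUBLE = 2 } fpu_type_t;

/* CP10 and CP11 access fields are CPACR bits [23:20]; both must be 0b11
   (full access) before any FPU instruction can execute. */
int fpu_access_enabled(uint32_t cpacr)
{
    return ((cpacr >> 20) & 0xFu) == 0xFu;
}

/* MVFR0 bits [7:4] describe single-precision support and bits [11:8]
   describe double-precision support; a field value of 2 means supported. */
fpu_type_t fpu_type_from_mvfr0(uint32_t mvfr0)
{
    uint32_t sp = (mvfr0 >> 4) & 0xFu;
    uint32_t dp = (mvfr0 >> 8) & 0xFu;

    if (sp == 2u && dp == 2u) {
        return FPU_DOUBLE;
    }
    if (sp == 2u) {
        return FPU_SINGLE;
    }
    return FPU_NONE;
}

/* On the target, the inputs would be read like this (assumption: running
   privileged on a Cortex-M core that implements these registers):
     uint32_t cpacr = *(volatile uint32_t *)CPACR_ADDR;
     uint32_t mvfr0 = *(volatile uint32_t *)MVFR0_ADDR;                    */
```

As far as I know, a single-precision-only part reads MVFR0 as 0x10110021 and a double-precision part as 0x10110221, but check the TRM for your core before relying on those values.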

Lingjun Kong · ‎2017-08-03

Posted on August 03, 2017 at 23:15

Thanks for your answer, it's helpful. The FPU on the STM32F746 does indeed support only single-precision computation, as the manual states.

Thanks again

Lingjun Kong · ‎2017-08-03

Posted on August 03, 2017 at 23:23

Also, do you think it is more efficient to load the data from the source address, pass it through the FPU registers, and store it to the destination than to move it directly to the target address with LDRD?

What is the compiler's purpose in routing the data through the FPU registers? I don't see any need to hold the data in the FPU.

Tesla DeLorean · ‎2017-08-03

Posted on August 04, 2017 at 01:17

The 2MB version has the newer core and the FPU-D (double-precision FPU). They should have built the original CM7 parts with this, like Atmel did.

Tesla DeLorean · ‎2017-08-03

Posted on August 04, 2017 at 01:24

I'm not going to argue compiler performance/behaviour here. I'm not sure what optimization level was used, but no doubt a hand-crafted routine, written with an understanding of the math/algorithm, could be more efficient.

Bruce Smith's ARM A32 ASSEMBLY LANGUAGE book has some chapters on the FPU/VFP.

http://www.brucesmith.info/arm-a32-assembly-language/

https://www.amazon.com/ARM-A32-Assembly-Language-32-Bit/dp/0992391695?_encoding=UTF8&psc=1

T J · ‎2017-08-03

Posted on August 04, 2017 at 02:10

Clive, are there any other books that you would recommend?

I am working on the STM32F7 (upgrading to the H7 as soon as it's available).

Tesla DeLorean · ‎2017-08-03

Posted on August 04, 2017 at 04:10

On the FPU, I can't say I've found anything I like a lot. I've got books on the old 80x87, and I did try to engage some ARM staff to discuss the internals in a little more detail. I wanted the 'Haynes Manual' but didn't find it; I'm interested in the mechanics. It's a lot less complex than the 80x87 and 6888x type parts, and I have some history with the Intel MMX/SSE vector stuff.

On the Cortex-Mx series I like Joseph Yiu's books; they augment the TRM.

For assembler, J R Gibson:

https://www.abebooks.com/products/isbn/9781447717157

Hohl is overpriced, and the first edition seemed a decade or more out of date when published. The second is better, but obviously targets a college textbook audience; buy the international version from India.

Mazidi is quite good. The first edition had a few issues; I wrote a review on Amazon which seems to have been removed/lost.

https://www.amazon.com/Assembly-Language-Programming-Architecture-books/dp/0997925906

Liked Langbridge

https://www.amazon.com/Professional-Embedded-Development-James-Langbridge/dp/111878894X/

Most cover ARM/Thumb in the classic sense, not so much the Cortex-Mx vectoring, but from a 'how to do things' perspective, if you understand ARM, how it evolved, and the 16-bit ISA, it is not a huge logical leap.

I can't make a case to spend more than $20 USD on any of them.