2017-06-12 11:51 PM
Hello,
I am working on firmware for the STM32F765.
This MCU was chosen for its double-precision floating-point capabilities. I managed to configure the FPU/compiler (ARM GCC 5.4) to generate correct DP native assembly instructions. When debugging, I can verify correct DP calculations. The only thing I cannot manage is to printf results in double precision.
I have tried the format specifiers %f and %lf, but every printf prints only single precision (correctly rounded results, but with too few digits). I am using newlib speed-optimized (NOT nano), but libstdc++ does the same. Has anybody ever seen printf output in double precision for embedded ARM Cortex-M7 code with GCC, and if yes, how do I get it running? Thank you in advance,
Gahlen
2017-06-14 08:21 AM
Some Cortex-M7 parts extend well into the lower Cortex-A range, clock-frequency-wise.
The pressure could come from the Cortex A licensees, or from ARM itself (price difference between a Cortex M and Cortex A license ...).
Just as a reminder, Cortex M0/M0+ are limited to 50MHz clock frequency by ARM's license conditions, not technically.
2017-06-14 10:35 AM
I tried to engage ST and ARM in a discussion about the implementation. ST chose to build their first M7 with the single-precision FPU-S; ATMEL chose the fuller-spec'd FPU-D. I believe it is a die-size issue; my assumption would be that the FPU-D uses about 3x more gates.
32-bit floats are, in my opinion, next to useless and frequently dangerous. People choose floats because it solves the issue of managing the numeric range they think they need (i.e. they get to ignore it), but frequently what they want is for the least-significant digits to be usable, which is the exact opposite of what they get. It also requires significantly more thought than people usually expend on the flow, order, and ranges of the maths across the computation.
I come from a background using the Motorola MC68881/2 and Intel 80x87 FPUs; if someone built the former in current silicon geometries it would be awesome. It had ~2.5x more transistors than the 68K processor, which is a fraction of the transistors in a 2MB FLASH array or 256KB of RAM. Motorola lost the plot with the MC68040's stripped-down FPU. The x86 implementation was perhaps quirkier, but still very solid; back in the day, DOS compilers allowed for 80-bit floating point, Win32 C compilers limited you to 64-bit, while professional FORTRAN tools still supported the higher precision, because that's what's critically important in most cases.
The FPU-S in the M4 and M7 is a check-box item: a handful of people are probably using it for DSP, but few are using it to do real maths (i.e. not 12/16-bit ADC data). Most people I track use the M4 because those parts have more memory; the M3 (STM32F2) would still have traction if not abandoned. The assumption there is that the M4 added a few instructions, and adding the FPU was fractional compared to all the other silicon committed to the peripherals and memory. It is a volume game, and I suspect most don't even use the FPU. I know people using the M3 core and a lot of internal SRAM to solve math problems with 64-bit doubles, people who'd be prime customers for a real FPU.
2017-06-14 10:47 AM
>>
Just as a reminder, Cortex M0/M0+ are limited to 50MHz clock frequency by ARM's license conditions, not technically.
Counterintuitive if you ask me; ARM shouldn't try to cause artificial stratification. The 8-bit/8051 guys have low expectations...
No doubt some more inventive users of the core have evaluated the critical paths.
From ST's perspective the Flash array is going to be the primary limitation, along with the lack of ART or cache to mask its performance.