Dear STM-experts,

I wanted to use the FPU of my STM32F4 (Cortex-M4). To check that it is working properly, I compared my timings with this page:

He uses exactly the same processor and toolchain (with the GCC compiler).

Here is how long it takes with my settings:

Columns: reference // my controller running from flash // my controller running from SRAM

long lX, lY, lZ;
lX = 123L;        // 2 cycles   // 2 cycles    // 5 cycles
lY = 456L;        // 2 cycles   // 3 cycles    // 3 cycles
lZ = lX*lY;       // 5 cycles   // 7 cycles    // 9 cycles
fX = 123.456;     // 3 cycles   // 5 cycles    // 4 cycles
fY = 9.99;        // 3 cycles   // 5 cycles    // 4 cycles
fZ = fX * fY;     // 6 cycles   // 10 cycles   // 10 cycles
fZ = sqrt(fY);    // 20 cycles  // 2742 cycles // 3405 cycles
fZ = sin(1.23);   // 124 cycles // 1918 cycles // 2552 cycles

The settings are:

Arm architecture: v7EM

Arm core type: Cortex-M4

Arm FP ABI type: Soft-FP (or Hard, it doesn't make a huge difference)

Arm FPU Type: FPv4-SP-D16

GCC target: arm-unknown-eabi

So not only is the floating-point arithmetic running slower, but the integer arithmetic is too! And sin and sqrt are horrible!

The overhead of the cycle measurement itself has already been deducted.
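For anyone who wants to reproduce the numbers, here is a minimal sketch of how such a deduction can be done. It assumes the timestamps come from a free-running 32-bit cycle counter (on a Cortex-M4 that would typically be DWT->CYCCNT, but the post does not say which counter was used). The function name net_cycles and the overhead parameter are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: cycles spent in the measured code, with the
   fixed cost of the measurement itself removed.
   start, end : readings of a free-running 32-bit cycle counter
   overhead   : cycles measured for an empty start/stop pair */
uint32_t net_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    /* Unsigned subtraction gives the right difference even if the
       counter wrapped around once between the two readings. */
    uint32_t elapsed = end - start;

    /* Clamp at zero in case jitter makes elapsed smaller than overhead. */
    return (elapsed > overhead) ? (elapsed - overhead) : 0u;
}
```

The overhead value would be obtained once by taking two back-to-back readings with nothing in between.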

The CP10 and CP11 fields both read 0b11, so the FPU should be enabled properly.

Do you have any idea what is wrong with my settings, my toolchain, or anything else?

Thank you so much for your efforts!
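As a quick cross-check of that register state, the test can be written as a small function. This is only a sketch: on the Cortex-M4 the CP10 field sits in CPACR bits 21:20 and CP11 in bits 23:22, and the function name fpu_access_enabled is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CP10 = CPACR bits 21:20, CP11 = CPACR bits 23:22 (Cortex-M4). */
#define CPACR_CP10_CP11_MASK (0xFu << 20)

/* Hypothetical helper: true only when both fields read 0b11 (full access). */
bool fpu_access_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11_MASK) == CPACR_CP10_CP11_MASK;
}
```

On the target, with the CMSIS headers, the argument would be SCB->CPACR. Note that this only confirms the FPU is accessible. It says nothing about whether the compiler actually emits FPU instructions for a given expression.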

Florian

Here are my results with Keil, timing 1000 iterations and subtracting the null-loop time.

With FPU

1765.1 cycles sqrt

4608.1 cycles sin

42.0 cycles sqrtf

96.1 cycles sinf

Without FPU

1746.1 cycles sqrt

4251.2 cycles sin

358.0 cycles sqrtf

924.3 cycles sinf