I want to use the FPU of my STM32F4Discovery, so I set the following compiler options:
-Arm Architecture: v7EM
-Arm Core Type: Cortex-M4
-Arm FPU Type: FPv4-SP-D16
-GCC Target: arm-unknown-eabi
After this, the CP10 and CP11 fields both read 0b11, which should be correct.
But I found some test code on the net (here):
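A minimal sketch of that check, assuming the Coprocessor Access Control Register (CPACR) sits at its usual Cortex-M4 address 0xE000ED88, with CP10 in bits 21:20 and CP11 in bits 23:22. The helper names are made up for illustration, and the functions take the register value as an argument so they can also be tried off-target:

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: standard Cortex-M4 address of the CPACR register. */
#define CPACR_ADDR 0xE000ED88u

/* CP10 occupies bits 21:20 and CP11 bits 23:22; 0b11 in both means full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* Hypothetical helper: true when both the CP10 and CP11 fields read 0b11. */
static bool fpu_access_enabled(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11_FULL) == CPACR_CP10_CP11_FULL;
}

/* Hypothetical helper: value to write back to grant full access to CP10 and CP11. */
static uint32_t fpu_access_grant(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}
```

On the target this would be called as fpu_access_enabled(*(volatile uint32_t *)CPACR_ADDR).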
#define CORE_SysTickEn() (*((u32*)0xE0001000)) = 0x40000001
#define CORE_SysTickDis() (*((u32*)0xE0001000)) = 0x40000000
#define CORE_GetSysTick() (*((u32*)0xE0001004))
float f = 1.01f;
vu32 it = CORE_GetSysTick();
float f2 = f * 2.29f;
vu32 it2 = CORE_GetSysTick() - it;
He measures 11 cycles for this, of which 6 are spent reading the cycle counter.
So the multiplication and assignment take 5 cycles on his board.
But I measure 18 cycles in total, which leaves 12 for the multiplication and assignment.
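The arithmetic behind these numbers, as a small sketch. The 6-cycle overhead is the figure quoted for the other setup and is only assumed to apply to both builds; net_cycles is a made-up helper name:

```c
#include <stdint.h>

/* Hypothetical helper: cycles spent in the measured code, given two counter
   readings and the fixed cost of reading the counter. Unsigned subtraction
   keeps the difference correct across a single 32-bit counter wrap. */
static uint32_t net_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    return (end - start) - overhead;
}
```

With a raw difference of 11 and an overhead of 6 this gives 5, and with a raw difference of 18 it gives 12.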
He also uses the STM32F4Discovery.
Any ideas what could be the reason for that? 12 cycles is definitely too many for this multiplication and assignment...
My disassembly code for this is:
float f2 = f * 2.29f;
ED977A03 vldr s14, [r7, #12]
EDDF7A0B vldr s15, 0x080003F0 <__text_start__+0x5C>
EE677A27 vmul.f32 s15, s14, s15
EDC77A02 vstr s15, [r7, #8]
Thank you for any responses!