STM32F407 FPU clock cycles

Terry Barnaby

I have been testing the performance of some maths functions on an STM32F407 running at 168 MHz. The documentation states that vmul.f32 (the hardware single-precision floating-point multiply instruction) takes 1 clock cycle. Using my simple test program, it looks like it is taking 2 clock cycles. The same test code running a "nop" or an "add" (32-bit integer add) does appear to take 1 cycle, so the test program looks OK, and the CPU clock and FLASH configuration seem OK.

  1. Should vmul.f32 take 1 clock cycle?
  2. If so, any ideas why it appears to be running slowly in my case?

I am using GCC with my own build environment, which I have used for many years.

Terry Barnaby

OK, sorted this. I had missed the note at the bottom of the floating-point instruction cycle table in the ARM Cortex-M4 documentation, which says:

"Floating-point arithmetic data processing instructions, such as add, subtract, multiply, divide, square-root, all forms of multiply with accumulate, as well as conversions of all types take one cycle longer if their result is consumed by the following instruction."

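My simple test code was executing "vmul.f32 s16,s16,s16" repeatedly. Each instruction reads s16, which the previous instruction has just written, so every vmul pays the extra cycle and the test runs at two cycles per instruction.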