SQRT, 32bit X 32bit multiply, 32bitX16bit divide execution time

drobison
Associate II
Posted on January 25, 2013 at 17:28

I have spent a lot of time with the manuals for the STM32F4, but have had no luck finding out how long certain instructions take to execute.  I assume an add takes maybe two CPU cycles.  Moving from one register to another must take only a few cycles too, but on the ST10 I was working with before, a multiply took ten times longer than an add and a division took 15 times longer.  I really need to do square roots too, so I need to know how long that instruction takes to execute.

Have you come across this information in the manuals?  Could you give me a citation?  How long do the different instructions take to execute?

#instruction-time-cpu-cycle-sqrt
6 REPLIES
Martin Davey
Associate III
Posted on January 25, 2013 at 17:45

Hi,

It's ARM you need to get the info from:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html

Cheers,

Martin.

PS square root of float is quoted as 14 cycles.
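
To put a cycle count like that into wall-clock terms, divide by the core clock. The sketch below is only illustrative arithmetic: the helper name `cycles_to_ns` is made up for this example, and the 168 MHz figure in the comment is an assumed core clock, so substitute whatever your part is actually configured to run at.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to nanoseconds,
 * rounded to the nearest ns, for a core clock given in Hz. */
static inline uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz != 0u);  /* a zero clock would divide by zero */
    return ((uint64_t)cycles * 1000000000ull + core_hz / 2u) / core_hz;
}

/* Example (assumed 168 MHz core clock):
 *   cycles_to_ns(14u, 168000000u)  gives 83, i.e. about 83 ns
 */
```

A quoted cycle count is for the instruction alone; fetching operands and storing the result add to it in real code.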

Posted on January 25, 2013 at 17:47

The instruction execution times would be the domain of the ARM documentation.

The processor is also pipelined, so there are latency vs. throughput considerations.

Most register-based instructions are going to be single cycle in terms of entering the pipeline. The divide has a 10-12 cycle latency.

Where you get implementation-specific behaviour is load/store and the buses it has to transact across. You will also need to consider the write buffer, and prefetch, and the interaction with the ART cache in front of the FLASH.

You should use the core cycle counter in the Trace Unit (DWT_CYCCNT) to benchmark the throughput of your code, with your system configuration.
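
A minimal sketch of that approach, assuming an ARMv7-M core (Cortex-M3/M4) where DEMCR sits at 0xE000EDFC and the DWT control and cycle-count registers at 0xE0001000 and 0xE0001004. The `cycle_counter_t` wrapper and the function names are made up for this example; they are not part of CMSIS or any ST library, which expose the same registers as `CoreDebug->DEMCR`, `DWT->CTRL` and `DWT->CYCCNT`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural register addresses on ARMv7-M (Cortex-M3/M4). */
#define DEMCR_ADDR          0xE000EDFCu
#define DWT_CTRL_ADDR       0xE0001000u
#define DWT_CYCCNT_ADDR     0xE0001004u

#define DEMCR_TRCENA        (1u << 24)  /* enables the DWT block */
#define DWT_CTRL_CYCCNTENA  (1u << 0)   /* starts the cycle counter */

/* Hypothetical wrapper: holding pointers lets the same code be
 * exercised against ordinary variables off-target. */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *dwt_ctrl;
    volatile uint32_t *dwt_cyccnt;
} cycle_counter_t;

/* On the target, point the wrapper at the real registers. */
static inline cycle_counter_t cycle_counter_hw(void)
{
    cycle_counter_t cc = {
        (volatile uint32_t *)(uintptr_t)DEMCR_ADDR,
        (volatile uint32_t *)(uintptr_t)DWT_CTRL_ADDR,
        (volatile uint32_t *)(uintptr_t)DWT_CYCCNT_ADDR,
    };
    return cc;
}

/* Enable trace, zero the counter, then start it counting. */
static inline void cycle_counter_start(const cycle_counter_t *cc)
{
    assert(cc != NULL);
    *cc->demcr |= DEMCR_TRCENA;
    *cc->dwt_cyccnt = 0u;
    *cc->dwt_ctrl |= DWT_CTRL_CYCCNTENA;
}

static inline uint32_t cycle_counter_read(const cycle_counter_t *cc)
{
    return *cc->dwt_cyccnt;
}

/* Unsigned subtraction stays correct across one 32-bit wrap. */
static inline uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Typical use on target:
 *   cycle_counter_t cc = cycle_counter_hw();
 *   cycle_counter_start(&cc);
 *   uint32_t t0 = cycle_counter_read(&cc);
 *   ... code under test ...
 *   uint32_t dt = cycles_between(t0, cycle_counter_read(&cc));
 */
```

The measured difference includes the cost of the two counter reads, so time an empty region first and subtract that overhead from the result.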
drobison
Associate II
Posted on January 25, 2013 at 17:57

You have provided a great resource to me.  Where did you find the 14-cycle square root figure?  Is the floating-point unit an ARM design too?

Posted on January 25, 2013 at 18:19

A quick Google search for Cortex-M4 sqrt turns up some cycle counts:

http://www.arm.com/files/pdf/dspconceptsm4presentation.pdf

The FPU attached to the M4 (M4F) is an ARM design.

Posted on January 25, 2013 at 18:35

[DEAD LINK] Duration of FLOAT operations: https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flat.aspx?RootFolder=%2Fpublic%2FSTe2ecommunities%2Fmcu%2FLists%2Fcortex_mx_stm32%2FDuration%20of%20FLOAT%20operations&FolderCTID=0x01200200770978C69A1141439FE559EB459D7580009C4E14902C3CDE46A77F0FFD06506F5B&currentviews=2435

[DEAD LINK] Why is my Cortex-M4 taking too much cycles: https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flat.aspx?RootFolder=%2Fpublic%2FSTe2ecommunities%2Fmcu%2FLists%2Fcortex_mx_stm32%2FWhy%20is%20my%20Cortex-M4%20taking%20too%20much%20cycles&FolderCTID=0x01200200770978C69A1141439FE559EB459D7580009C4E14902C3CDE46A77F0FFD06506F5B&currentviews=213

Martin Davey
Associate III
Posted on January 25, 2013 at 19:03

Hi,

This is the PDF version:

http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439b/DDI0439B_cortex_m4_r0p0_trm.pdf

Section 7-5 (FPU section), VSQRT.F32.

Cheers,

Martin.