How many clock cycles does each assembly instruction require in STM32?

hosseinam1370
Associate II

If I can't find documentation specifying the execution time or the number of clock cycles required for each assembly instruction, and I want to calculate how fast the processor executes instructions, what should I do?

Is there a resource available for calculating the number of clock cycles for each assembly instruction?

Thanks.

Accepted Solution
AScha.3
Chief

This is an ARM core, so look at what ARM gives as timing in its Technical Reference Manuals:

https://developer.arm.com/documentation/#f-navigationhierarchiescontenttype=Technical%20Reference%20Manual&cf-navigationhierarchiesproducts=%20IP%20Products,Processors&numberOfResults=48

And it's not so simple to say just "this instruction takes one cycle", because these CPUs are designed to work together with the compiler that generates the code, so the timing also depends on the compiler settings.

An instruction may be followed by another instruction that can execute at the same time (on dual-issue cores such as the Cortex-M7), or by one that needs a long address (+1 cycle). This depends on how the compiler arranges (schedules) the code, since Cortex-M cores execute instructions in order. It also matters whether the instruction is in a cached area or needs a new flash access (+x wait states). So it is better to look at the effective speed at different optimizer settings than at a single instruction. Most instructions are basically "one cycle", because this is a RISC CPU, but some wait cycles may be added depending on the surrounding code.
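The "effective speed" recommended above can be expressed as an average cycles-per-instruction (CPI) figure and a MIPS figure. A minimal C sketch, assuming you already have the measured cycle count for a code section (for example from the DWT cycle counter mentioned later in this thread) and the number of instructions it executed (for example from a simulator trace); the function names are hypothetical.

```c
#include <stdint.h>

/* Average cycles per instruction for a measured code section.
 * `cycles` is assumed to come from a cycle counter and `instructions`
 * from a disassembly or simulator trace (both supplied by the caller). */
double effective_cpi(uint32_t cycles, uint32_t instructions)
{
    if (instructions == 0u) {
        return 0.0; /* nothing executed: avoid dividing by zero */
    }
    return (double)cycles / (double)instructions;
}

/* Effective throughput in MIPS at a given core clock (HCLK, in Hz):
 * MIPS = f_HCLK * instructions / cycles / 1e6  (i.e. f_HCLK / CPI / 1e6) */
double effective_mips(uint32_t cycles, uint32_t instructions, uint32_t hclk_hz)
{
    if (cycles == 0u) {
        return 0.0; /* no measurement: avoid dividing by zero */
    }
    return (double)hclk_hz * (double)instructions / (double)cycles / 1e6;
}
```

For example, 150 cycles for 100 executed instructions gives a CPI of 1.5, which at a 72 MHz HCLK corresponds to 48 MIPS. Comparing these numbers across optimizer settings is one way to see the effect the answer describes.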


8 Replies
Michal Dudka
Senior III

https://www.st.com/resource/en/programming_manual/pm0056-stm32f10xxx20xxx21xxxl1xxxx-cortexm3-programming-manual-stmicroelectronics.pdf

But be aware that some instructions (for example, those accessing peripheral buses) can take longer, depending on the bus clock and how busy the bus is.

SofLit
ST Employee

For Cortex-M3: https://developer.arm.com/documentation/ddi0337/h/programmers-model/instruction-set-summary/cortex-m3-instructions


Tesla DeLorean

The pipeline generally allows for a throughput of one instruction per cycle.

You should also look at the DWT unit and the CYCCNT register to benchmark code and loops, etc. 
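A minimal sketch of the DWT cycle-counter approach in C, assuming an ARMv7-M core (Cortex-M3/M4/M7) and bare-metal access to the standard debug registers (DEMCR at 0xE000EDFC, DWT_CTRL at 0xE0001000, DWT_CYCCNT at 0xE0001004). With the CMSIS headers this is typically written with `CoreDebug->DEMCR`, `DWT->CTRL` and `DWT->CYCCNT` instead; the macro and function names below are hypothetical. Cortex-M0/M0+ cores have no CYCCNT, and on some Cortex-M7 devices the DWT may have to be unlocked first, so check your device's reference manual.

```c
#include <stdint.h>

/* ARMv7-M debug registers (assumed Cortex-M3/M4/M7, bare metal). */
#define DEMCR       (*(volatile uint32_t *)(uintptr_t)0xE000EDFCu)
#define DWT_CTRL    (*(volatile uint32_t *)(uintptr_t)0xE0001000u)
#define DWT_CYCCNT  (*(volatile uint32_t *)(uintptr_t)0xE0001004u)

#define DEMCR_TRCENA        (1u << 24) /* enables the DWT/ITM blocks */
#define DWT_CTRL_CYCCNTENA  (1u << 0)  /* enables the cycle counter */

/* Enable and reset the cycle counter. Only call this on the target. */
void cycle_counter_init(void)
{
    DEMCR |= DEMCR_TRCENA;           /* power up the trace/debug blocks */
    DWT_CYCCNT = 0u;                 /* start counting from zero */
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;  /* start the counter (counts HCLK cycles) */
}

/* Current value of the free-running 32-bit cycle counter. */
uint32_t cycle_counter_read(void)
{
    return DWT_CYCCNT;
}

/* Cycles between two readings. Unsigned subtraction stays correct even if
 * the counter wrapped around once between `start` and `end`. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

Typical use on the target: call `cycle_counter_init()` once, read the counter before and after the code under test, and pass both readings to `cycles_elapsed()`. The reads themselves cost a few cycles, so timing an empty section first gives a baseline to subtract.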

Pavel A.
Evangelist III

> what should I do?

Please tell us what your real need is. Is it purely academic?


STOne-32

Completely agree with @AScha.3! The ARM Cortex-M core is a RISC architecture, so in terms of MIPS we can assume an average of one instruction per cycle.

As @Tesla DeLorean also said, it is linked to the pipeline: each stage takes 1 HCLK cycle. With more complex pipelines, such as the Cortex-M7, the core is dual-issue, so 2 instructions can execute in the same cycle, but the pipeline is also deeper (6 stages on the Cortex-M7). Branches are the nightmare for pipelines: they are the most costly in terms of cycles because they flush the pipeline.
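As a first-order estimate under the assumptions in these replies (in-order, single-issue core, no caches), the cycle count of a code section can be approximated as the sum of the per-instruction cycle counts from the TRM, plus a pipeline refill for every taken branch, plus memory wait states. The Cortex-M3 TRM lists a taken branch as 1 + P cycles, with the refill P between 1 and 3. This model ignores dual issue and branch prediction on the Cortex-M7, so treat it as a rough sketch rather than an exact formula.

```latex
\begin{aligned}
C_{\text{total}} &\approx \sum_{i=1}^{N} c_i + \sum_{j=1}^{B} P_j + W,
  \qquad 1 \le P_j \le 3 \ \text{(Cortex-M3)} \\
t &\approx \frac{C_{\text{total}}}{f_{\text{HCLK}}}
\end{aligned}
```

Here N is the number of executed instructions, c_i the base cycle count of instruction i (1 for most), B the number of taken branches, P_j the refill cost of branch j, W the extra cycles from flash wait states and bus accesses, and f_HCLK the core clock.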

Hope it helps you.

STOne-32

gregstm
Senior III

You can use a software simulator to step through your assembly code and watch the cycle counter, but it will probably not show all the delays caused by memory accesses. That's where the fun starts: organising the register loads/stores so they don't slow down the other operations as much.