__nop() taking more than one clock cycle?

IJoe.1
Associate II

I decided to run an experiment on my STM32F767ZI microcontroller. I enabled the cycle count register (DWT->CYCCNT) and stepped through the program in Keil. I set the clock to the maximum frequency the PLL allows on this MCU, which is 216 MHz. I found that stepping into __nop() adds 17 to the CYCCNT register. I repeated the experiment several times and got the same result every time.

What might the reason be? I thought __nop() takes only one clock cycle to finish.

1 ACCEPTED SOLUTION


Isn't the design synchronous? The machine cycles shouldn't care if it's 216 or 42 MHz

The debugger is invasive; you need to benchmark normal running operation. Measure the elapsed time for 100x NOP in a loop ten times, then compute the throughput.
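
A minimal sketch of that kind of measurement, not a definitive implementation. It assumes a CMSIS device header (`stm32f7xx.h`, pulled in only when `STM32F767xx` is defined) and uses hypothetical names (`cycles_elapsed`, `measure_nops`, `NOP10`, `NOP100`) that are not from this thread. The code has to run freely, without single-stepping, for the count to mean anything:

```c
#include <stdint.h>

#if defined(STM32F767xx)
#include "stm32f7xx.h"  /* assumption: CMSIS device header (DWT, CoreDebug, __NOP) */
#endif

/* Cycles between two CYCCNT samples. Unsigned subtraction stays correct
   across a single 32-bit wraparound of the counter. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

#if defined(STM32F767xx)
#define NOP10()  do { __NOP(); __NOP(); __NOP(); __NOP(); __NOP(); \
                      __NOP(); __NOP(); __NOP(); __NOP(); __NOP(); } while (0)
#define NOP100() do { NOP10(); NOP10(); NOP10(); NOP10(); NOP10(); \
                      NOP10(); NOP10(); NOP10(); NOP10(); NOP10(); } while (0)

/* Hypothetical helper: cycles taken by 10 x 100 NOPs, loop overhead included. */
uint32_t measure_nops(void)
{
    /* Some Cortex-M7 parts reportedly need the DWT unlocked first
       (lock access register); check the device documentation. */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable the DWT block */
    DWT->CYCCNT = 0u;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             /* start the cycle counter */

    const uint32_t start = DWT->CYCCNT;
    for (uint32_t i = 0u; i < 10u; ++i) {
        NOP100();
    }
    const uint32_t end = DWT->CYCCNT;

    return cycles_elapsed(start, end);
}
#endif
```

Dividing the returned value by 1000 gives an average per NOP. The loop branch and the two counter reads are included in the total, so treat the result as an upper bound rather than an exact figure.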


4 REPLIES
TDK
Guru

Are you stepping through C source or disassembly? The M7 core has a more complicated pipeline and cycles per instruction are not always constant compared to a simpler core like the M4 (where they still may not be constant, but are much more consistent).

The debugger strives to work the same as if it were in run mode, minus the pauses, but it's not always the case. I would imagine comparing CYCCNT before/after a NOP would result in much less than 17 when in run mode.


> The machine cycles shouldn't care if it's 216 or 42 MHz

No, but they will care if the instruction is fetched from FLASH, whose latency (number of wait states) is set to different values for the two frequencies.

Also, on the Cortex-M7, a NOP may execute in 0 cycles.

JW

Yup, this was the answer. The debugger was messing up the cycle count.

I put in 100 NOPs and measured DWT->CYCCNT between the start of the first one and the end of the last one, and ended up with a difference of 93.

I don't know why it's not exactly 100, but it's still consistent with each __nop() taking about one clock cycle.
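
For reference, the arithmetic behind those numbers can be sketched in plain C. The helper names `milli_cycles_per_op` and `cycles_to_ns` are hypothetical, and the sketch assumes the core really is clocked at 216 MHz:

```c
#include <stdint.h>

/* Average cost per instruction, in thousandths of a cycle (integer math only). */
static inline uint32_t milli_cycles_per_op(uint32_t cycles, uint32_t ops)
{
    return (uint32_t)(((uint64_t)cycles * 1000u) / ops);
}

/* Elapsed time in nanoseconds (rounded down) for a cycle count at a given core clock. */
static inline uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000u) / core_hz);
}
```

With the figures from this thread, `milli_cycles_per_op(93u, 100u)` gives 930, i.e. about 0.93 cycles per NOP, and `cycles_to_ns(93u, 216000000u)` gives 430 ns for the whole run. An average below one cycle is at least compatible with the earlier remark that a NOP may execute in 0 cycles on the Cortex-M7, though this measurement alone does not prove it.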