IJoe.1
Associate II
April 12, 2022
Solved

__nop() taking more than one clock cycle?

I ran an experiment on my STM32F767ZI microcontroller. I enabled the cycle count register (DWT->CYCCNT) and stepped through the program in Keil. I set the clock to the maximum the PLL allows on this MCU, 216 MHz. I found that single-stepping a __nop() adds 17 to the CYCCNT register. I repeated the experiment several times and got the same result every time.
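For reference, a minimal sketch of turning on the cycle counter. It uses raw ARMv7-M register addresses instead of the CMSIS `CoreDebug`/`DWT` structures so it stands alone; the function names are hypothetical. Writing the DWT lock-access key is an assumption: it is commonly reported as necessary on Cortex-M7 parts when no debugger is attached.

```c
#include <stdint.h>

/* ARMv7-M architectural register addresses (valid for Cortex-M7). */
#define DEMCR_ADDR      0xE000EDFCu /* Debug Exception and Monitor Control */
#define DWT_CTRL_ADDR   0xE0001000u /* DWT control */
#define DWT_CYCCNT_ADDR 0xE0001004u /* DWT cycle counter */
#define DWT_LAR_ADDR    0xE0001FB0u /* DWT lock access (Cortex-M7) */

#define DEMCR_TRCENA    (1u << 24)  /* global enable for DWT and ITM */
#define DWT_CYCCNTENA   (1u << 0)   /* starts CYCCNT */
#define DWT_LAR_KEY     0xC5ACCE55u /* CoreSight unlock key */

/* Enable the cycle counter. Registers are passed as pointers so the
   sequence can also be exercised off-target with plain variables. */
void cyccnt_enable(volatile uint32_t *demcr, volatile uint32_t *dwt_ctrl,
                   volatile uint32_t *dwt_cyccnt, volatile uint32_t *dwt_lar)
{
    *demcr |= DEMCR_TRCENA;     /* DWT features are disabled until TRCENA is set */
    *dwt_lar = DWT_LAR_KEY;     /* assumption: unlock may be needed without a debugger */
    *dwt_cyccnt = 0u;           /* start counting from zero */
    *dwt_ctrl |= DWT_CYCCNTENA; /* counter now increments on every core clock */
}

/* Cycles between two CYCCNT samples; unsigned subtraction stays
   correct across a single 32-bit wraparound. */
uint32_t cyccnt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

#if defined(__ARM_ARCH_7EM__)
/* On target: pass the real register addresses. */
void cyccnt_enable_on_target(void)
{
    cyccnt_enable((volatile uint32_t *)DEMCR_ADDR,
                  (volatile uint32_t *)DWT_CTRL_ADDR,
                  (volatile uint32_t *)DWT_CYCCNT_ADDR,
                  (volatile uint32_t *)DWT_LAR_ADDR);
}
#endif
```

Reading CYCCNT before and after the code under test and passing both samples to `cyccnt_elapsed` gives the cycle count for that window.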

What could be the reason? I thought __nop() takes only one clock cycle to finish.

TDK
Super User
April 12, 2022

Are you stepping through C source or disassembly? The M7 core has a more complex pipeline, so cycles per instruction are not always constant, unlike on a simpler core such as the M4 (where they may still vary, but are much more consistent).

The debugger strives to behave as it would in run mode, minus the pauses, but that is not always the case. I would expect comparing CYCCNT before and after a NOP in run mode to give much less than 17.

Tesla DeLorean
Best answer
Guru
April 12, 2022

Isn't the design synchronous? The machine cycles shouldn't care if it's 216 or 42 MHz.

The debugger is invasive. You need to benchmark normal running operation: measure the elapsed time for 100 NOPs in a loop, ten times, then compute the throughput.
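A minimal sketch of that benchmark, assuming GCC-style inline assembly (Arm Compiler 6 in Keil also accepts it), that the cycle counter is already enabled, and that the code runs freely with no breakpoints or single-stepping. `benchmark_nops`, `cycles_per_nop`, and the macro names are hypothetical. An empty window (two back-to-back CYCCNT reads) estimates the measurement overhead, which is subtracted before averaging. The first run may be slower while caches warm up, so it is worth looking at all ten raw values, not only the mean.

```c
#include <stddef.h>
#include <stdint.h>

#define NOPS_PER_RUN 100u
#define RUNS         10u

/* Average cycles per NOP over several runs, after subtracting the
   fixed cost of an empty measurement window (two CYCCNT reads). */
double cycles_per_nop(const uint32_t *run_cycles, size_t runs,
                      uint32_t overhead, uint32_t nops_per_run)
{
    if (runs == 0u || nops_per_run == 0u)
        return 0.0;
    uint64_t total = 0u;
    for (size_t i = 0; i < runs; ++i) {
        uint32_t c = run_cycles[i];
        total += (c > overhead) ? (uint64_t)(c - overhead) : 0u;
    }
    return (double)total / ((double)runs * (double)nops_per_run);
}

#if defined(__ARM_ARCH_7EM__)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

/* asm volatile keeps the NOPs from being removed; the "memory" clobber
   stops the compiler from moving the CYCCNT reads across them. */
#define NOP1   __asm volatile ("nop" ::: "memory")
#define NOP10  NOP1; NOP1; NOP1; NOP1; NOP1; NOP1; NOP1; NOP1; NOP1; NOP1
#define NOP100 NOP10; NOP10; NOP10; NOP10; NOP10; NOP10; NOP10; NOP10; NOP10; NOP10

/* Fills out[0..RUNS-1] with raw cycle counts for 100 unrolled NOPs. */
void benchmark_nops(uint32_t out[RUNS], uint32_t *overhead)
{
    uint32_t t0 = DWT_CYCCNT;
    uint32_t t1 = DWT_CYCCNT;
    *overhead = t1 - t0;          /* cost of the measurement itself */

    for (unsigned r = 0; r < RUNS; ++r) {
        t0 = DWT_CYCCNT;
        NOP100;
        t1 = DWT_CYCCNT;
        out[r] = t1 - t0;
    }
}
#endif
```

Calling `cycles_per_nop(out, RUNS, overhead, NOPS_PER_RUN)` on the collected samples then gives the throughput in cycles per NOP.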

IJoe.1
Author
Associate II
April 13, 2022

Yup, this was the answer. The debugger was throwing off the cycle count.

I inserted 100 NOPs and measured DWT->CYCCNT between the start of the first one and the end of the last one, and got a difference of 93.

I don't know why it isn't exactly 100, but it still fits with each __nop() taking about one clock cycle.
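Working through the author's figure: the average cost comes out slightly below one cycle per NOP. If the window also includes the cost of the CYCCNT reads, the NOPs alone averaged even less. A value under 1.0 is possible because the architecture does not guarantee NOP is time-consuming (the processor may remove it from the pipeline before execution), and, as JW notes in his reply, a Cortex-M7 NOP can take 0 cycles.

```latex
\bar{c}_{\text{NOP}} = \frac{\Delta\,\text{CYCCNT}}{N_{\text{NOP}}} = \frac{93}{100} = 0.93\ \text{cycles per NOP} < 1
```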

waclawek.jan
Super User
April 12, 2022

> The machine cycles shouldn't care if it's 216 or 42 MHz

No, but they will care if the instruction is fetched from FLASH, whose latency (number of wait states) is set to different values for the two frequencies.
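To put numbers on this, a sketch assuming the STM32F7 flash wait-state table for a 2.7 to 3.6 V supply (one additional wait state per 30 MHz of HCLK, up to 7 at 216 MHz); other supply voltages use smaller steps, so check the reference manual. `flash_wait_states` and `flash_access_cycles` are hypothetical helpers. A flash read that misses the ART accelerator or the core's cache takes wait states + 1 core clocks, so the same instruction stream can cost more core cycles at 216 MHz than at 42 MHz.

```c
/* Hypothetical helper: flash wait states for a given HCLK in MHz,
   assuming the 2.7-3.6 V row of the STM32F7 table (0 WS up to 30 MHz,
   then one more per 30 MHz step, 216 MHz maximum). */
unsigned flash_wait_states(unsigned hclk_mhz)
{
    if (hclk_mhz == 0u)
        return 0u;
    return (hclk_mhz - 1u) / 30u;
}

/* Core clocks for one flash read that is not served by a cache. */
unsigned flash_access_cycles(unsigned hclk_mhz)
{
    return flash_wait_states(hclk_mhz) + 1u;
}
```

Under these assumptions, `flash_access_cycles(216)` returns 8 while `flash_access_cycles(42)` returns 2.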

Also, on the Cortex-M7 a NOP may even execute in 0 cycles.

JW