2022-04-12 02:45 AM
I ran an experiment on my STM32F767ZI microcontroller. I enabled the cycle counter (DWT->CYCCNT) and single-stepped the program in Keil. I set the clock to the maximum frequency the PLL allows on this MCU, which is 216 MHz. I found that stepping into __nop() adds 17 to the CYCCNT register. I repeated the experiment several times and got the same result every time.
What could the reason be? I thought __nop() would take only one clock cycle to finish.
2022-04-12 05:10 AM
Are you stepping through C source or disassembly? The M7 core has a more complicated pipeline and cycles per instruction are not always constant compared to a simpler core like the M4 (where they still may not be constant, but are much more consistent).
The debugger strives to work the same as if it were in run mode, minus the pauses, but it's not always the case. I would imagine comparing CYCCNT before/after a NOP would result in much less than 17 when in run mode.
2022-04-12 05:34 AM
Isn't the design synchronous? The machine cycles shouldn't care if it's 216 or 42 MHz
The debugger is invasive. You need to benchmark normal running operation: measure the elapsed time for 100x NOP in a loop ten times, then compute the throughput.
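A minimal sketch of that measurement in C, assuming the cycle counter is already enabled as in the original post. The names `MEASURE_ON_TARGET`, `read_cycles`, `NOP`, `measure_100_nops`, `measure_overhead` and `measure_runs` are invented for this example. The address 0xE0001004 is the architectural location of the DWT cycle counter (what CMSIS exposes as DWT->CYCCNT). Without `MEASURE_ON_TARGET` defined, the code falls back to a simulated counter that charges exactly one cycle per NOP, only so the logic can be tried on a PC.

```c
#include <stdint.h>

#ifdef MEASURE_ON_TARGET
/* Target build: read the real counter (assumed to be enabled already). */
#define DWT_CYCCNT_REG (*(volatile uint32_t *)0xE0001004u)
static inline uint32_t read_cycles(void) { return DWT_CYCCNT_REG; }
/* GCC/armclang syntax; with Keil's armcc use the __nop() intrinsic instead. */
#define NOP() __asm volatile ("nop")
#else
/* Host build: simulated counter, one "cycle" per NOP (illustration only). */
static uint32_t fake_cycles;
static inline uint32_t read_cycles(void) { return fake_cycles; }
#define NOP() ((void)fake_cycles++)
#endif

#define NOP10()  do { NOP(); NOP(); NOP(); NOP(); NOP(); \
                      NOP(); NOP(); NOP(); NOP(); NOP(); } while (0)
#define NOP100() do { NOP10(); NOP10(); NOP10(); NOP10(); NOP10(); \
                      NOP10(); NOP10(); NOP10(); NOP10(); NOP10(); } while (0)

/* Cycles spent on 100 straight-line NOPs while the core runs freely
   (no single-stepping). Unsigned subtraction survives counter wraparound. */
uint32_t measure_100_nops(void)
{
    uint32_t start = read_cycles();
    NOP100();
    uint32_t end = read_cycles();
    return end - start;
}

/* Cost of two back-to-back counter reads, to subtract from the result. */
uint32_t measure_overhead(void)
{
    uint32_t start = read_cycles();
    uint32_t end = read_cycles();
    return end - start;
}

/* Repeat the 100-NOP measurement and return the total cycle count. */
uint32_t measure_runs(unsigned runs)
{
    uint32_t total = 0;
    for (unsigned i = 0; i < runs; i++) {
        total += measure_100_nops();
    }
    return total;
}
```

On the target, call `measure_runs(10)` from normally running code and read the result after the fact, for example at a breakpoint placed after the call, so the debugger does not halt the core inside the measured region.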
2022-04-12 05:56 AM
> The machine cycles shouldn't care if it's 216 or 42 MHz
No, but they will care if the instruction is fetched from FLASH, whose latency (number of wait states) is set to different values for the two frequencies.
Also, on the Cortex-M7 a NOP may execute in 0 cycles.
JW
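To illustrate the FLASH latency point, here is a rough sketch of how the required wait states grow with HCLK. It assumes the 2.7 V to 3.6 V supply range, where, as far as I recall from the STM32F76x reference manual, each additional 30 MHz costs one more wait state. Check the table for your own supply voltage. `flash_wait_states` is a made-up helper, not an ST function.

```c
#include <stdint.h>

/* Wait states needed for a given HCLK in MHz.
   Assumption: VDD between 2.7 V and 3.6 V, one extra wait state per 30 MHz
   (0 WS up to 30 MHz, 1 WS up to 60 MHz, ..., 7 WS up to 216 MHz). */
unsigned flash_wait_states(unsigned hclk_mhz)
{
    if (hclk_mhz == 0u) {
        return 0u;
    }
    return (hclk_mhz - 1u) / 30u;
}
```

Under that assumption, 216 MHz needs 7 wait states and 42 MHz needs 1, so a fetch that actually goes to FLASH costs a different number of core cycles at the two frequencies. Caches or the ART accelerator can hide much of this in practice.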
2022-04-13 01:20 AM
Yup, this was the answer. The debugger was messing the clock up.
I put in 100 NOPs and measured the difference in DWT->CYCCNT between the start of the first one and the end of the last one, and ended up with 93.
I don't know why it's not exactly 100, but it is still consistent with each __nop() taking about one clock.
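For reference, the arithmetic behind that result as a small C sketch. The helper names are invented here, and the 216 MHz figure is the core clock from the first post.

```c
#include <stdint.h>

/* Average cycles per instruction from a measured cycle count. */
double cycles_per_instruction(uint32_t cycles, uint32_t instructions)
{
    return (double)cycles / (double)instructions;
}

/* Convert a cycle count to nanoseconds at a given core clock in Hz. */
double cycles_to_ns(uint32_t cycles, double core_hz)
{
    return (double)cycles * 1e9 / core_hz;
}
```

With 93 cycles for 100 NOPs this gives 0.93 cycles per NOP, and 93 cycles at 216 MHz is roughly 430.6 ns. An average below one is at least compatible with the earlier remark that a NOP can cost 0 cycles on the Cortex-M7, though the numbers alone do not prove that this is the cause.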