Why does every single instruction take 12 clock cycles when I time it with the ST-LINK debugger? (dual issue, ST-LINK and the Cortex-M7 pipeline)

Question asked by Lingjun Kong on Aug 22, 2017

The device is an STM32F746NG-DISCO board, and I measure the execution time of a single instruction this way:


int a=0;
int b=255;




I set a breakpoint and watch the value of the States counter in the debugger. This way I found that ADD, SUBS and MOV all take 12 clock cycles.
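One way to separate the fixed cost of halting and resuming the core from the instructions themselves is to measure the same breakpoint pair twice: once with n1 copies of the instruction between the breakpoints and once with n2 copies, then take the difference. This is a minimal sketch that assumes the debugger adds the same overhead to both runs. The helper name and its parameters are hypothetical and are not part of any ST or Arm tool.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an ST or Arm API).
 * n1, cycles_n1: number of copies of the instruction between the two
 *                breakpoints, and the States delta measured for that run.
 * n2, cycles_n2: the same for a longer run (n2 > n1).
 * Assumption: any fixed cost of halting and resuming the core appears in
 * both measurements, so it cancels in the difference. */
static double cycles_per_instruction(uint32_t n1, uint32_t cycles_n1,
                                     uint32_t n2, uint32_t cycles_n2)
{
    assert(n2 > n1);
    assert(cycles_n2 >= cycles_n1);
    return (double)(cycles_n2 - cycles_n1) / (double)(n2 - n1);
}
```

With dual issue, a long run of independent instructions could come out below 1.0 cycle per instruction, which a single-step measurement cannot show. If caches or flash wait states behave differently between the two runs, the overhead will not cancel exactly.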


And if I test with this code:


It's 423 clock cycles.


I know the Cortex-M7 is a dual-issue core, so why does the ST-LINK trace look so inaccurate?


So my questions are:


1. If I step through the code instruction by instruction, most instructions take 12 clock cycles. How many of those cycles are spent by the debugger and tracer? Is it 6 clock cycles (12 clock cycles minus the 6-stage pipeline = 6 clock cycles)?


2. The whole process takes 423 clock cycles (possibly including 6 clock cycles of debugger latency). Could that number be calculated without measuring it in the debugger?
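To time code without halting the core at all, the Cortex-M7's DWT cycle counter (CYCCNT) can be read from the program itself. This is a minimal sketch assuming the CMSIS 5 core header for the STM32F7. The register setup appears only as a comment because it needs the target hardware. The two helpers, whose names are hypothetical, do the arithmetic on the raw counter values. Note that this measures the cycles rather than calculating them. A purely static count is difficult on this core because dual issue, caches and flash wait states can all change the result.

```c
#include <assert.h>
#include <stdint.h>

/* On the target, with the CMSIS 5 device header, the counter is enabled
 * roughly like this (not compiled here; assumption, check your header):
 *   CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  // enable DWT/ITM
 *   DWT->CYCCNT = 0;                                 // reset the counter
 *   DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             // start counting
 * and each timestamp is a plain read of DWT->CYCCNT. */

/* Cycles between two CYCCNT reads. Unsigned 32-bit subtraction stays
 * correct even if the counter wrapped around once in between. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical helper: remove the cost of the measurement itself,
 * obtained beforehand by timing two back-to-back CYCCNT reads. */
static uint32_t cycles_net(uint32_t start, uint32_t end, uint32_t read_overhead)
{
    uint32_t raw = cycles_elapsed(start, end);
    assert(raw >= read_overhead);
    return raw - read_overhead;
}
```

For example, reads of 1000 and 1010 with a calibrated read overhead of 2 give 8 cycles for the code in between.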