The board is an STM32F746NG-DISCO, and I measure the execution time of a single instruction this way:
I set a breakpoint, step, and watch how the value of States changes. Doing this, I found that ADD, SUBS, and MOV each take 12 clock cycles.
And if I test it like this:
it takes 423 clock cycles.
I know the Cortex-M7 is a dual-issue core, but why do the cycle counts reported through the ST-Link look somewhat inaccurate?
So my questions are:
1. If I break on the code instruction by instruction, most instructions take 12 clock cycles. How many of those cycles are added by the debugger and tracer? 6 clock cycles? (12 clock cycles - 6 for the 6-stage pipeline = 6 clock cycles)
2. The whole sequence takes 423 clock cycles (which may include 6 clock cycles of debugger latency). Could that figure be calculated without measuring it in the debugger?
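
To make the arithmetic behind both questions explicit, here is a minimal sketch. It assumes my guess of a fixed 6-cycle debugger overhead per stop is correct, which is exactly the part I am unsure about. The macro and function names are hypothetical and only there for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: fixed cost added by each breakpoint stop.
 * This is my guess from question 1, not a documented figure. */
#define ASSUMED_DEBUG_OVERHEAD_CYCLES 6u

/* Subtract the assumed per-stop overhead from a measured States delta. */
uint32_t corrected_cycles(uint32_t measured, uint32_t overhead)
{
    assert(measured >= overhead); /* a stop cannot cost more than was measured */
    return measured - overhead;
}
```

Under that assumption, `corrected_cycles(12u, ASSUMED_DEBUG_OVERHEAD_CYCLES)` gives 6 for a single stepped instruction, and `corrected_cycles(423u, ASSUMED_DEBUG_OVERHEAD_CYCLES)` gives 417 for the whole sequence. If the overhead is not constant, this simple subtraction does not hold.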