2026-01-29 11:47 PM
My original aim was to prove to myself that I could measure execution time with cycle accuracy. To do that, I wrote 9 'test_dwt' functions that all contain the same code except for the number of NOPs between two reads of the DWT cycle counter (CYCCNT).
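For reference, here is a minimal C sketch of the DWT cycle-counter access these functions rely on. It assumes the standard Armv7-M/Armv8-M memory map: DWT at 0xE0001000 and CYCCNT at offset 0x004. That base matches the literal pool word in the disassembly (halfwords 0x1000 and 0xE000, which objdump shows as "asrs"/"b.n") and the `[r3, #4]` offset. The enable sequence is the generic Arm one (DEMCR.TRCENA, then DWT_CTRL.CYCCNTENA). Any extra device-specific setup is not covered, and all macro and helper names are hypothetical.

```c
#include <stdint.h>

/* Armv7-M / Armv8-M debug register addresses (hypothetical macro names). */
#define DEMCR_ADDR          0xE000EDFCu          /* Debug Exception and Monitor Control */
#define DEMCR_TRCENA        (1u << 24)           /* enables the DWT and ITM units */
#define DWT_BASE            0xE0001000u          /* the literal loaded into r3 */
#define DWT_CTRL_ADDR       (DWT_BASE + 0x000u)
#define DWT_CTRL_CYCCNTENA  (1u << 0)            /* starts the cycle counter */
#define DWT_CYCCNT_ADDR     (DWT_BASE + 0x004u)  /* what ldr rX, [r3, #4] reads */

/* Cycles between two CYCCNT samples. Unsigned 32-bit subtraction is
 * modulo 2^32, so a single counter wrap between samples is harmless.
 * This is the C equivalent of "sub.w r0, r1, r0". */
static inline uint32_t dwt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Target-only: these dereference memory-mapped debug registers. */
static inline void dwt_cyccnt_enable(void)
{
    *(volatile uint32_t *)DEMCR_ADDR |= DEMCR_TRCENA;
    *(volatile uint32_t *)DWT_CYCCNT_ADDR = 0u;            /* optional reset */
    *(volatile uint32_t *)DWT_CTRL_ADDR |= DWT_CTRL_CYCCNTENA;
}

static inline uint32_t dwt_cyccnt_read(void)
{
    return *(volatile uint32_t *)DWT_CYCCNT_ADDR;
}
#endif
```

On a host build only `dwt_elapsed` and the address macros are compiled. The register accessors are guarded so they exist only for M-profile targets.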
This experiment gives unexpected results because the Cortex-M55 has dual-issue capability, as people explained here.
Nevertheless, I am puzzled by the result of test_dwt7 (7 NOPs). All clocks run at 100 MHz. Interrupts, the I-cache and the D-cache are disabled, and every 'test_dwt' function starts with ISB and DSB instructions.
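Below is a minimal sketch of how one such function could be written in C with GCC-style extended inline assembly. The point of this form is that the compiler can neither move the barriers nor add or remove anything between the two CYCCNT reads. It is an assumption about the structure, not the original source. `NOP1`, `NOPS_7` and `test_dwt_nops7` are hypothetical names. One difference from the listed code is that the compiler may load the DWT base address into a register before the ISB/DSB rather than after them.

```c
#include <stdint.h>

#define DWT_BASE 0xE0001000u   /* DWT block; CYCCNT is at offset 4 */

/* Build an exact NOP sequence as a string for the asm template. */
#define NOP1   "nop\n\t"
#define NOPS_7 NOP1 NOP1 NOP1 NOP1 NOP1 NOP1 NOP1

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Target-only: returns CYCCNT(after) - CYCCNT(before) around 7 NOPs. */
uint32_t test_dwt_nops7(void)
{
    uint32_t start, end;
    __asm volatile(
        "isb sy\n\t"
        "dsb sy\n\t"
        "ldr %0, [%2, #4]\n\t"        /* first CYCCNT read  */
        NOPS_7
        "ldr %1, [%2, #4]\n\t"        /* second CYCCNT read */
        : "=&r"(start), "=&r"(end)    /* early-clobber: outputs must not share %2's register */
        : "r"(DWT_BASE)
        : "memory");
    return end - start;               /* modulo 2^32, like sub.w r0, r1, r0 */
}
#endif
```

Changing the number of `NOP1` entries in the macro gives the other variants. The string concatenation guarantees the exact NOP count in the emitted code.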
Below are the results of 5 executions of each function. The functions are executed in order from test_dwt0 to test_dwt7c, and this sequence is repeated 5 times. The results are saved in an array, which I print afterwards:
test_dwt0 : 1 1 1 1 1
test_dwt1 : 1 1 1 1 1
test_dwt2 : 1 1 1 1 1
test_dwt3 : 2 2 2 2 2
test_dwt4 : 2 2 2 2 2
test_dwt5 : 3 3 3 3 3
test_dwt6 : 3 3 3 3 3
test_dwt7 : 27 27 27 27 27
test_dwt8 : 5 5 5 5 5
test_dwt7b : 4 4 4 4 4
test_dwt7c : 4 4 4 4 4
Here is the code of the last 4 functions:
34180824 <test_dwt7>:
34180824: f3bf 8f6f isb sy
34180828: f3bf 8f4f dsb sy
3418082c: f8df 3018 ldr.w r3, [pc, #24] @ 34180848 <.test_dwt7.dwt_base>
34180830: 6858 ldr r0, [r3, #4]
34180832: bf00 nop
34180834: bf00 nop
34180836: bf00 nop
34180838: bf00 nop
3418083a: bf00 nop
3418083c: bf00 nop
3418083e: bf00 nop
34180840: 6859 ldr r1, [r3, #4]
34180842: eba1 0000 sub.w r0, r1, r0
34180846: 4770 bx lr
34180848 <.test_dwt7.dwt_base>:
34180848: 1000 asrs r0, r0, #32
3418084a: e000 b.n 3418084e <test_dwt8+0x2>
3418084c <test_dwt8>:
3418084c: f3bf 8f6f isb sy
34180850: f3bf 8f4f dsb sy
34180854: 4b06 ldr r3, [pc, #24] @ (34180870 <.test_dwt8.dwt_base>)
34180856: 6858 ldr r0, [r3, #4]
34180858: bf00 nop
3418085a: bf00 nop
3418085c: bf00 nop
3418085e: bf00 nop
34180860: bf00 nop
34180862: bf00 nop
34180864: bf00 nop
34180866: bf00 nop
34180868: 6859 ldr r1, [r3, #4]
3418086a: eba1 0000 sub.w r0, r1, r0
3418086e: 4770 bx lr
34180870 <.test_dwt8.dwt_base>:
34180870: 1000 asrs r0, r0, #32
34180872: e000 b.n 34180876 <test_dwt7b+0x2>
34180874 <test_dwt7b>:
34180874: f3bf 8f6f isb sy
34180878: f3bf 8f4f dsb sy
3418087c: f8df 3018 ldr.w r3, [pc, #24] @ 34180898 <.test_dwt7b.dwt_base>
34180880: 6858 ldr r0, [r3, #4]
34180882: bf00 nop
34180884: bf00 nop
34180886: bf00 nop
34180888: bf00 nop
3418088a: bf00 nop
3418088c: bf00 nop
3418088e: bf00 nop
34180890: 6859 ldr r1, [r3, #4]
34180892: eba1 0000 sub.w r0, r1, r0
34180896: 4770 bx lr
34180898 <.test_dwt7b.dwt_base>:
34180898: 1000 asrs r0, r0, #32
3418089a: e000 b.n 3418089e <test_dwt7c+0x2>
3418089c <test_dwt7c>:
3418089c: f3bf 8f6f isb sy
341808a0: f3bf 8f4f dsb sy
341808a4: 4b06 ldr r3, [pc, #24] @ (341808c0 <.test_dwt7c.dwt_base>)
341808a6: 6858 ldr r0, [r3, #4]
341808a8: bf00 nop
341808aa: bf00 nop
341808ac: bf00 nop
341808ae: bf00 nop
341808b0: bf00 nop
341808b2: bf00 nop
341808b4: bf00 nop
341808b6: 6859 ldr r1, [r3, #4]
341808b8: eba1 0000 sub.w r0, r1, r0
341808bc: 4770 bx lr
...
341808c0 <.test_dwt7c.dwt_base>:
341808c0: 1000 asrs r0, r0, #32
341808c2: e000 b.n 341808c6 <test_pmu0+0x2>
What is causing such a large delay in test_dwt7? (I observe a similar effect with the PMU.)
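test_dwt7 and test_dwt7b contain byte-identical instructions (both use ldr.w for the literal load) yet measure 27 and 4 cycles. Within the listing, the only difference between them is their address. The small C sketch below tabulates where each CYCCNT read falls within 16- and 32-byte blocks. It uses the addresses from the disassembly above and the cycle counts from the results table. The block sizes are arbitrary choices for illustration, not known fetch or bus granules of this device, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Data copied from the post: addresses from the disassembly,
 * cycle counts from the results table. */
struct listing_entry {
    const char *name;
    uint32_t first_ldr;   /* address of ldr r0, [r3, #4] */
    uint32_t second_ldr;  /* address of ldr r1, [r3, #4] */
    unsigned cycles;      /* measured CYCCNT difference */
};

static const struct listing_entry entries[] = {
    { "test_dwt7",  0x34180830u, 0x34180840u, 27u },
    { "test_dwt8",  0x34180856u, 0x34180868u,  5u },
    { "test_dwt7b", 0x34180880u, 0x34180890u,  4u },
    { "test_dwt7c", 0x341808a6u, 0x341808b6u,  4u },
};

/* Offset of addr inside an aligned block of `block` bytes
 * (block must be a power of two). */
static uint32_t offset_in_block(uint32_t addr, uint32_t block)
{
    return addr & (block - 1u);
}

/* Print one row per function with the offsets of both CYCCNT reads. */
static void print_alignment_table(void)
{
    printf("%-10s %6s %9s %9s %9s %9s\n", "function", "cycles",
           "ldr1%16", "ldr1%32", "ldr2%16", "ldr2%32");
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++) {
        const struct listing_entry *e = &entries[i];
        printf("%-10s %6u %9lu %9lu %9lu %9lu\n", e->name, e->cycles,
               (unsigned long)offset_in_block(e->first_ldr, 16u),
               (unsigned long)offset_in_block(e->first_ldr, 32u),
               (unsigned long)offset_in_block(e->second_ldr, 16u),
               (unsigned long)offset_in_block(e->second_ldr, 32u));
    }
}
```

For example, the second read in test_dwt7 (0x34180840) starts a 32-byte block. In test_dwt7b (0x34180890) the second read sits 16 bytes into one, although both are 16-byte aligned. Whether this is related to the delay is not established here.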