instruction timing

smead · ‎2011-12-26

Posted on December 27, 2011 at 08:50

I'm debugging on an Olimex STM32-P103. I think I have the PLL set to 8 times the 8 MHz crystal. If so, the instruction timing is pretty poor. Here's my loop:

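        ldr r0, =portc      /* r0 = address of the portc literal below */
        ldr r0, [r0]        /* r0 = 0x4001100c (GPIOC_ODR) */
        mov r1, #0x80       /* bit 7 set */
        mov r2, #0          /* all bits clear */
    loop_io:
        str r1, [r0]        /* drive bit 7 high */
        str r2, [r0]        /* drive bit 7 low */
        b loop_io
    portc: .word 0x4001100c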

smead · ‎2011-12-26

Posted on December 27, 2011 at 08:52

Even this website doesn't work for me (Chrome on Linux). I'm getting 129 ns in the loop, 43 ns per instruction. Why?

smead · ‎2011-12-26

Posted on December 27, 2011 at 08:54

I meant 9 times, for a 72 MHz clock.

Tesla DeLorean · ‎2011-12-27

Posted on December 27, 2011 at 14:00

Ignoring for a second the impact of flash reads (~42 ns), a write through the buffer to an APB location takes 4 cycles.

So optimally your loop is on the order of 9 cycles.

At 72 MHz, one cycle is 13.89 ns.

9 x 13.89 = 125.01 ns
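
The arithmetic above can be checked with a few lines of C. This is only a sketch: the helper names `cycles_to_ns` and `ns_to_cycles` are made up for illustration, and the 72 MHz core clock is taken from the posts rather than measured.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of any ST library. */

/* Duration in nanoseconds of `cycles` core clock cycles at `hz`. */
static inline double cycles_to_ns(uint32_t cycles, double hz)
{
    return (double)cycles * 1e9 / hz;
}

/* Number of core clock cycles (possibly fractional) that fit in `ns`. */
static inline double ns_to_cycles(double ns, double hz)
{
    return ns * hz / 1e9;
}
```

With these, `cycles_to_ns(9, 72e6)` gives 125 ns, and `ns_to_cycles(129.0, 72e6)` puts the measured 129 ns at roughly 9.3 cycles, so the scope reading is within a fraction of a cycle of the 9-cycle estimate.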


smead · ‎2011-12-28

Posted on December 29, 2011 at 04:23

So here's another timing quiz. STM32F103 Olimex board, 72 MHz.

    #ifdef TESTING_TIMING
        ldr r0, =portc
        ldr r0, [r0]
        ldr r1, [r0]
        bic r1, #0x80               /* clear bit 7: scope low */
        str r1, [r0]
    #endif
        ldr r1, =tmuxr_system_time
        ldrd r0, r1, [r1]           /* r1:r0 = current 64-bit time */
        ldr r3, =tick_amount        /* ns per tick */
        ldr r2, [r3]
        mov r3, #0
        adds r2, r2, r0             /* 64-bit add: low word first */
        adc r3, r3, r1              /* then high word plus carry */
        ldr r0, =tmuxr_system_time  /* the address of a 64 bit */
        strd r2, r3, [r0]
    // bl scope_off
    #ifdef TESTING_TIMING
        ldr r0, =portc
        ldr r0, [r0]
        ldr r1, [r0]
        mov r2, #0x80
        orr r1, r2                  /* set bit 7: scope high */
        str r1, [r0]
    #endif

The scope says 840 ns.

Here's the code generated by gcc.

    push {r0}
    ldr r0, [pc, #164]
    ldr r0, [r0, #0]
    ldr r1, [r0, #0]
    bic.w r1, r1, #128
    str r1, [r0, #0]        /* set scope low */
    ldr r1, [pc, #156]
    ldrd r0, r1, [r1]
    ldr r3, [pc, #152]
    ldr r2, [r3, #0]
    mov.w r3, #0
    adds r2, r2, r0
    adc.w r3, r3, r1
    ldr r0, [pc, #136]
    strd r2, r3, [r0]
    ldr r0, [pc, #124]
    ldr r0, [r0, #0]
    ldr r1, [r0, #0]
    mov.w r2, #128
    orr.w r1, r1, r2
    str r1, [r0, #0]        /* set scope high */

That's 15 instructions, for an average of 56 ns per instruction. There is one read of the portc register and 2 stores to it, so 3 APB operations.

Does this timing make sense?

Tesla DeLorean · ‎2011-12-29

Posted on December 29, 2011 at 19:22

As entertaining an exercise as that might be, the literal loads (via PC) are likely to be quite expensive, since they are apt to expose the speed of the flash array, whereas sequential prefetch masks it. The indirection followed by a read-modify-write of the GPIO register also seems rather inefficient. I'd tend to use GPIOx_BSRR and GPIOx_BRR, and cache constants in registers. Still, 60 cycles does seem rather high.
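
The BSRR/BRR suggestion could look something like the following in C. Treat it as a sketch under assumptions: the `gpio_regs` struct, `pin_high`, `pin_low` and the macro names are invented here for illustration (they are not ST's header definitions), and the register order follows the STM32F1 GPIO layout, where ODR sits at offset 0x0C, matching the 0x4001100c address used earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* STM32F1 GPIO register block in address order.
 * Hypothetical type for illustration; ST's own headers use different names. */
typedef struct {
    volatile uint32_t CRL;   /* 0x00 configuration, pins 0-7  */
    volatile uint32_t CRH;   /* 0x04 configuration, pins 8-15 */
    volatile uint32_t IDR;   /* 0x08 input data               */
    volatile uint32_t ODR;   /* 0x0C output data              */
    volatile uint32_t BSRR;  /* 0x10 bit set/reset            */
    volatile uint32_t BRR;   /* 0x14 bit reset                */
    volatile uint32_t LCKR;  /* 0x18 configuration lock       */
} gpio_regs;

#define GPIOC_BASE_ADDR 0x40011000u  /* GPIOC base; ODR is at +0x0C */
#define PIN7_MASK       (1u << 7)    /* same 0x80 mask as in the thread */

/* One store sets the masked pins; no read-modify-write of ODR is needed. */
static inline void pin_high(gpio_regs *gpio, uint32_t mask)
{
    gpio->BSRR = mask;
}

/* One store clears the masked pins. */
static inline void pin_low(gpio_regs *gpio, uint32_t mask)
{
    gpio->BRR = mask;
}
```

On the target, the pointer would be created once, for example `gpio_regs *gpioc = (gpio_regs *)GPIOC_BASE_ADDR;`, and kept in a local so the compiler can hold the base address and mask in registers instead of reloading literals from flash on every access.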

Consider also using the DWT cycle counter (DWT_CYCCNT) to measure instruction cycle counts.
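
A rough C sketch of the DWT approach follows. The register addresses and bit positions are the standard Cortex-M3 ones (DEMCR at 0xE000EDFC with TRCENA in bit 24, DWT_CTRL at 0xE0001000 with CYCCNTENA in bit 0, DWT_CYCCNT at 0xE0001004). The `cyc_counter` struct and the `cyc_*` function names are hypothetical, and the registers are passed in as pointers only so the logic can also be exercised off-target.

```c
#include <stdint.h>

/* Cortex-M3 debug register addresses and enable bits. */
#define DEMCR_ADDR          0xE000EDFCu
#define DWT_CTRL_ADDR       0xE0001000u
#define DWT_CYCCNT_ADDR     0xE0001004u
#define DEMCR_TRCENA        (1u << 24)  /* enables the DWT unit        */
#define DWT_CTRL_CYCCNTENA  (1u << 0)   /* starts the cycle counter    */

/* Hypothetical handle holding pointers to the three registers. */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *ctrl;
    volatile uint32_t *cyccnt;
} cyc_counter;

/* Enable trace, zero the counter, then start counting. */
static inline void cyc_enable(const cyc_counter *c)
{
    *c->demcr |= DEMCR_TRCENA;
    *c->cyccnt = 0;
    *c->ctrl |= DWT_CTRL_CYCCNTENA;
}

/* Read the current cycle count. */
static inline uint32_t cyc_now(const cyc_counter *c)
{
    return *c->cyccnt;
}

/* Unsigned subtraction stays correct across a single 32-bit wraparound. */
static inline uint32_t cyc_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On the target, the handle would be filled with the three addresses cast to `volatile uint32_t *`. Reading `cyc_now` before and after the code under test and passing both values to `cyc_elapsed` gives the cycle count, including the small overhead of the reads themselves.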
