STM32F3 assembly instruction execution timing accuracy

Question asked by tag.aseok on Jan 24, 2017
Latest reply on Jan 29, 2017 by tag.aseok

Hello

Suppose we have this code:

array[0] = GPIOA->IDR;
array[1] = GPIOA->IDR;
...
array[n] = GPIOA->IDR;

Except for the first line, every line is assembled into an LDR and an STRH instruction (checked in the Keil uVision IDE). According to the ARM Cortex-M4 TRM, these take 2 CPU cycles each:

...
LDR     r5, [r0,#0x00]
STRH    r5, [r1,#0xF92]
...

About STR, the ARM Cortex-M4 TRM says:

STR Rx,[Ry,#imm] is always one cycle. This is because the address generation is performed in the initial cycle, and the data store is performed at the same time as the next instruction is executing. If the store is to the write buffer, and the write buffer is full or not enabled, the next instruction is delayed until the store can complete. If the store is not to the write buffer, for example to the Code segment, and that transaction stalls, the impact on timing is only felt if another load or store operation is executed before completion.

And about LDR:

LDR [any] are pipelined when possible. This means that if the next instruction is an LDR or STR, and
the destination of the first LDR is not used to compute the address for the next instruction, then one
cycle is removed from the cost of the next instruction. So, an LDR might be followed by an STR, so
that the STR writes out what the LDR loaded. More multiple LDRs can be pipelined together. Some
optimized examples are:
LDR R0,[R1]; LDR R1,[R2] - normally three cycles total.
LDR R0,[R1,R2]; STR R0,[R3,#20] - normally three cycles total.
LDR R0,[R1,R2]; STR R1,[R3,R2] - normally three cycles total.
LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4] - normally four cycles total.
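The two quoted rules can be turned into a rough cycle counter, which makes the expected numbers easier to check. This is only a minimal sketch: the `mem_op` type and `nominal_cycles` function are hypothetical names made up for illustration, STRH is assumed to behave like STR, and the model assumes zero wait states and no bus or write-buffer stalls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not part of any ARM or ST API. */
typedef enum { OP_LDR, OP_STR } op_kind;

typedef struct {
    op_kind kind;
    int rt;   /* register loaded (LDR) or stored (STR) */
    int rn;   /* base address register */
    int rm;   /* index register, or -1 for an immediate offset */
} mem_op;

/* True if the instruction computes its address from register r. */
static bool uses_for_address(const mem_op *op, int r)
{
    return op->rn == r || op->rm == r;
}

/* Nominal cycles for a back-to-back run of loads and stores.
 * Encodes only the two quoted TRM rules:
 *   - STR Rx,[Ry,#imm] costs 1 cycle;
 *   - any other LDR/STR costs 2 cycles, or 1 when the previous
 *     instruction is an LDR whose destination does not feed this
 *     instruction's address.
 * Assumption: no wait states, no stalls. */
static unsigned nominal_cycles(const mem_op *ops, size_t n)
{
    unsigned total = 0;
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned cost = 2;
        if (ops[i].kind == OP_STR && ops[i].rm < 0) {
            cost = 1;   /* store with immediate offset */
        } else if (i > 0 && ops[i - 1].kind == OP_LDR &&
                   !uses_for_address(&ops[i], ops[i - 1].rt)) {
            cost = 1;   /* pipelined behind the previous LDR */
        }
        total += cost;
    }
    return total;
}
```

Fed the four TRM examples, this model returns 3, 3, 3 and 4 cycles, and it returns 3 for one `LDR r5,[r0,#0x00]` / `STRH r5,[r1,#0xF92]` pair.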

So we can assume that each array[i] = GPIOA->IDR statement should take 3 CPU cycles (or 2, because of pipelining?). Here is the result of executing 1000 such lines, one immediately after another:

f3_prefetch disable: 4495 cpu cycles
f3_prefetch enable: 2503 cpu cycles.
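Divided by the 1000 statements, these totals come to 4.495 cycles per statement with prefetch disabled and 2.503 with it enabled. The helper below just does that division; `cycles_per_statement` is a hypothetical name used only for this illustration.

```c
#include <assert.h>

/* Average cycles per statement from a measured total.
 * Hypothetical helper for illustration only. */
static double cycles_per_statement(unsigned total_cycles, unsigned statements)
{
    assert(statements > 0);   /* avoid dividing by zero */
    return (double)total_cycles / (double)statements;
}
```

For example, `cycles_per_statement(4495, 1000)` and `cycles_per_statement(2503, 1000)` give the two averages above.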

So, how can we interpret these results?

Thanks.

 
