2021-11-30 09:55 AM
I'm trying to determine the exact duration, in clock cycles, of the simple delay loop below:
loop: subs r2, r2, 1
bne loop
I'm using these directives:
.thumb
.syntax unified
The reference manual gives the branch timing as 1 + P, where:
P is the number of cycles required for a pipeline refill. This ranges from 1 to 3 depending on the alignment and width of the target instruction, and whether the processor manages to speculate the address early.
On ARM9 (5-stage pipeline), bne takes 3 cycles in this case and subs takes 1 cycle. The Cortex-M4 has a 3-stage pipeline, but it seems quite similar, since the first two stages look alike. Which value of P applies to this loop?
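To make the 1 + P figure concrete, here is a rough C sketch of the best-case arithmetic for the loop. It assumes things the thread does not confirm for any particular part: subs costs 1 cycle, a taken bne costs 1 + P, the final not-taken bne costs 1, and instruction fetch never stalls (no FLASH wait states, no bus contention). The function name loop_cycles is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Ideal-case cycle count for
 *     loop: subs r2, r2, 1
 *           bne  loop
 * entered with r2 = n (n >= 1).
 * ASSUMPTIONS (not from the thread): subs = 1 cycle, taken bne = 1 + p,
 * final not-taken bne = 1, and instruction fetch never stalls. */
static uint64_t loop_cycles(uint32_t n, uint32_t p)
{
    assert(n >= 1u);                      /* n = 0 would wrap around in r2 */

    uint64_t taken = (uint64_t)n - 1u;    /* bne is taken n - 1 times      */

    return (uint64_t)n                    /* n executions of subs          */
         + taken * (1u + (uint64_t)p)     /* taken branches, 1 + p each    */
         + 1u;                            /* last bne falls through        */
}
```

Under those assumptions the total simplifies to 2n + (n - 1)P, so for n = 1000 it lies between 2999 cycles (P = 1) and 4997 cycles (P = 3). Treat this as a lower bound for real hardware, not a prediction.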
Thanks.
2021-11-30 10:35 AM
Why would it matter? Do you intend to write long and precise delay loops?
This isn't specified exactly because there are too many variables. The core is configurable, and the vendors configure it. The bus fabric and the memories then have a significant impact as well. Just try running that loop on an 'F4: from FLASH at various addresses modulo the FLASH width, with various latency settings, with or without prefetch, and with or without the jump cache; from SRAM1 mapped at 0, from both a halfword-odd and a halfword-even address; from SRAM1 at its "native" address, again both cases; and from SRAM2, again both cases. Add to that the impact of other bus masters. You can easily try all of that yourself.
I'm absolutely sure that if you gave ST enough incentive, expressed as $M++ of purchases, they would merrily fire up their expensive simulators and provide you with an exact answer for the various situations.
JW