LOAD/STORE speed in STM32

arcadiushh · ‎2009-06-11

Posted on June 11, 2009 at 14:48

Welcome,

I have a question: how fast do the LDRB and STRB instructions execute?

I have the following program:

LDRB R2, [R1], #+1

STRB R2, [R0]

LDRB R2, [R1], #+1

STRB R2, [R0]

....etc.

The code above needs 8 cycles, but this other program:

LDRB R2, [R0]

STRB R2, [R1], #+1

LDRB R2, [R0]

STRB R2, [R1], #+1

.... etc.

needs 14 cycles to execute! Why? It is almost the same as the first program.

16-32micros · ‎2011-05-17

Posted on May 17, 2011 at 13:14

Hi,

Could you let me know the contents of R0 and R1 in both cases?

Cheers,

STOne-32.

arcadiushh · ‎2011-05-17

Posted on May 17, 2011 at 13:14

So, in the first program:

R0 contains the address of the high byte of GPIOA ODR, 0x4001080D

R1 contains the address of the memory buffer (in RAM) from which the data is read

In the second program:

R0 contains the address of the low byte of GPIOA IDR, 0x40010808

R2 contains the address of the memory to which the input port data is written
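The two addresses quoted above can be decomposed into base, register offset and byte index. The following is a minimal sketch, assuming the STM32F10x memory map (GPIOA base 0x40010800, IDR at offset 0x08, ODR at offset 0x0C); the helper name `reg_byte_addr` is hypothetical and only illustrates the arithmetic.

```c
#include <stdint.h>

/* Assumed STM32F10x memory map; check the reference manual for your part. */
#define GPIOA_BASE    0x40010800u
#define GPIO_IDR_OFF  0x08u   /* input data register  */
#define GPIO_ODR_OFF  0x0Cu   /* output data register */

/* Hypothetical helper: address of one byte inside a little-endian
 * 32-bit register (byte_index 0 = bits 7:0, 1 = bits 15:8). */
static uint32_t reg_byte_addr(uint32_t base, uint32_t offset, uint32_t byte_index)
{
    return base + offset + byte_index;
}
```

With these assumptions, `reg_byte_addr(GPIOA_BASE, GPIO_ODR_OFF, 1)` gives 0x4001080D and `reg_byte_addr(GPIOA_BASE, GPIO_IDR_OFF, 0)` gives 0x40010808, matching the values in the post.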

16-32micros · ‎2011-05-17

Posted on May 17, 2011 at 13:14

Hi,

Could you repeat the tests at 8 MHz, running from the internal RC oscillator, and see whether the results are the same? Do not add wait states for the Flash; leave them at their default.

Cheers,

STOne-32.

arcadiushh · ‎2011-05-17

Posted on May 17, 2011 at 13:14

The result is the same at 8 MHz. Flash wait states probably only matter on branches. Maybe the GPIO is slow? Writing to the port is faster than reading from it; that is the result I get. Changing the registers has no effect.

ping · ‎2011-05-17

Posted on May 17, 2011 at 13:14

Hi, I am wondering how you measure the number of cycles. This affects your conclusion. :o
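The thread does not say how the cycles were counted. One common option on a Cortex-M3 is the DWT cycle counter; the sketch below assumes the standard ARMv7-M register addresses, and the helper names `cyccnt_start` and `cycles_between` are hypothetical. The registers are passed as pointers so the logic can also be checked off-target.

```c
#include <stdint.h>

/* ARMv7-M (Cortex-M3) debug register addresses, for use on the target. */
#define DEMCR_ADDR       0xE000EDFCu  /* Debug Exception and Monitor Control */
#define DWT_CTRL_ADDR    0xE0001000u  /* DWT control register */
#define DWT_CYCCNT_ADDR  0xE0001004u  /* DWT cycle counter */

#define DEMCR_TRCENA        (1u << 24)  /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA  (1u << 0)   /* starts the cycle counter */

/* Hypothetical helper: enable tracing, clear the counter, start counting. */
static void cyccnt_start(volatile uint32_t *demcr,
                         volatile uint32_t *dwt_ctrl,
                         volatile uint32_t *dwt_cyccnt)
{
    *demcr      |= DEMCR_TRCENA;
    *dwt_cyccnt  = 0;
    *dwt_ctrl   |= DWT_CTRL_CYCCNTENA;
}

/* Unsigned subtraction stays correct across one counter wrap-around. */
static uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On the target, the pointers would be `(volatile uint32_t *)DEMCR_ADDR` and so on, with the counter read immediately before and after the code under test. The two reads add a few cycles of their own, so it helps to measure an empty sequence first and subtract that overhead.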

16-32micros · ‎2011-05-17

Posted on May 17, 2011 at 13:14

Quote:

On 09-06-2009 at 21:18, Anonymous wrote:

So, in the first program:

R0 contains the address of the high byte of GPIOA ODR, 0x4001080D

R1 contains the address of the memory buffer (in RAM) from which the data is read

In the second program:

R0 contains the address of the low byte of GPIOA IDR, 0x40010808

R2 contains the address of the memory to which the input port data is written

I believe the R2 at the end should be R1 and was a typo, right? In that case this is not difficult to explain. I have replaced R1 and R0 with what they point to:

Quote:

LDRB R2, [Internal RAM]

STRB R2, [GPIO DR]

LDRB R2, [Internal RAM]

STRB R2, [GPIO DR]

....etc.

The code above needs 8 cycles, but this other program:

LDRB R2, [GPIO DR]

STRB R2, [Internal RAM]

LDRB R2, [GPIO DR]

STRB R2, [Internal RAM]

.... etc.

needs 14 cycles to execute! Why?

The GPIO data registers are located on the "high-speed APB" bus, behind the AHB-to-APB bridge, whereas the internal RAM sits on the AHB bus. Inside the AHB-to-APB bridge there is a "write buffer" that buffers writes to APB locations. That is why the timing is minimal in the first case: the store to the GPIO is buffered, so the core does not have to wait for it to complete.

In the second case, however, you load from the APB using LDRB, which is a blocking instruction: the core must wait until the data comes back. The high-speed APB needs 1 additional cycle compared to RAM when the APB/AHB prescaler is equal to 1.

Hope this explains your results.

Cheers,

STOne-32.
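For reference, the two cases discussed in this thread can be written as C loops. This is a minimal sketch with hypothetical function names; the port register is passed in as a pointer, and a compiler will not necessarily emit the exact LDRB/STRB sequences from the posts.

```c
#include <stddef.h>
#include <stdint.h>

/* Case 1: RAM -> GPIO. Each step loads a byte from RAM and stores it
 * to the port register; the store goes through the bridge's write buffer. */
static void ram_to_port(volatile uint8_t *port, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *port = src[i];
    }
}

/* Case 2: GPIO -> RAM. Each step loads a byte from the port register,
 * which blocks until the APB read completes, then stores it to RAM. */
static void port_to_ram(uint8_t *dst, const volatile uint8_t *port, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = *port;
    }
}
```

The `volatile` qualifier on the port pointer keeps the compiler from merging or dropping the repeated register accesses.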

arcadiushh · ‎2011-05-17

Posted on May 17, 2011 at 13:14

Wow, this is a bit complex, but now I understand why the read is slower in the second case. Thank you very much! Can I read data from the GPIO any faster, for example with a different load instruction? I have not found anything in the datasheet yet.