More efficient startup code

March 2, 2016
Posted on March 02, 2016 at 22:03

Hi all,

For an application I'm working on, I needed to speed up startup. I looked into the .init and .bss routines and saw that they weren't very efficient. I re-wrote them and got about a 2.2x improvement in startup time for my application. The revised code does the same thing, just in fewer cycles.

Feel free to use this if it is useful to anyone. @ST, feel free to copy it into the library you distribute.

https://gist.github.com/ppannuto/672328eb8184abdb9559

-Pat

#startup-speed-efficiency-fast

Amel NASRI

Technical Moderator

Posted on March 03, 2016 at 16:17

Hi pannuto.pat,

Thanks for sharing your new startup file for gcc.

I would like to understand in what sense the .init and .bss routines are "not very efficient".

Could you also provide more details on the changes you made to decrease the startup time?

-Mayla-

ppannuto (Author)

Associate

Posted on March 03, 2016 at 19:45

Sure. If you compare the two loops, the original code executes several more instructions per iteration: on every iteration it reads the same memory address into the same register, which it doesn't need to do. You can also eliminate the adds that increments the pointer by using the stmia (store multiple, increment after) instruction; for a single store operation, str and stmia both take 2 cycles. (On more powerful cores, Cortex-M3 and up, you would usually use post-indexed addressing, i.e. str r0, [r1], #4, to do this, but post-indexed addressing isn't supported on the M0; stmia is, however.)
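As a rough C-level picture of what "store and increment after" saves, the sketch below contrasts an indexed store followed by a separate add with a single post-incrementing store. The function names are made up for illustration, and the instruction mapping in the comments is only approximate, since the compiler decides what is actually emitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed style: store at base + index, then bump the index separately
   (roughly "str r3, [r0, r1]" followed by "adds r1, r1, #4"). */
static void fill_indexed(uint32_t *base, size_t words, uint32_t value)
{
    size_t i = 0;
    while (i < words) {
        base[i] = value;
        i += 1;
    }
}

/* Post-increment style: the store itself advances the pointer
   (roughly "stmia r0!, {r3}"), so no separate add is needed. */
static void fill_postinc(uint32_t *dst, const uint32_t *end, uint32_t value)
{
    assert(dst <= end); /* caller must pass a valid [dst, end) range */
    while (dst < end) {
        *dst++ = value;
    }
}
```

Both functions leave the same contents in memory; only the way the address advances differs.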

In the old code, the loop part was

    ldr r3, =_sidata
    ldr r3, [r3, r1]
    str r3, [r0, r1]
    adds r1, r1, #4
    ldr r0, =_sdata
    ldr r3, =_edata
    adds r2, r0, r1
    cmp r2, r3
    bcc CopyDataInit

  5 memory operations x 2 cycles each = 10 cycles
+ 3 ALU operations x 1 cycle each = 3 cycles
+ 1 branch operation x 1 cycle (usually) = 1 cycle

For 14 cycles / loop. In the new code, the loop part is

    ldmia r2!, {r3}
    stmia r0!, {r3}
    cmp r0, r1
    bcc CopyDataInitializersLoop

  2 memory/ALU operations x 2 cycles each = 4 cycles
+ 1 ALU operation x 1 cycle each = 1 cycle
+ 1 branch operation x 1 cycle (usually) = 1 cycle

For 6 cycles / loop.
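The two tallies above can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the per-instruction costs are the ones stated in this post, not measurements, and the helper names are hypothetical. The cost of a taken branch in particular depends on the core, so it is kept as a constant you can change after checking your core's technical reference manual.

```c
#include <assert.h>

/* Per-instruction costs as stated in the post (assumptions, not measured). */
enum { MEM_CYCLES = 2, ALU_CYCLES = 1, BRANCH_CYCLES = 1 };

/* Cycles for one pass through a loop, given its instruction mix. */
static unsigned loop_cycles(unsigned mem_ops, unsigned alu_ops, unsigned branches)
{
    return mem_ops * MEM_CYCLES + alu_ops * ALU_CYCLES + branches * BRANCH_CYCLES;
}

/* Total loop cycles to copy data_bytes of .data, one 32-bit word per pass. */
static unsigned copy_cycles(unsigned cycles_per_loop, unsigned data_bytes)
{
    assert(data_bytes % 4 == 0); /* the loops copy whole words */
    return cycles_per_loop * (data_bytes / 4);
}
```

With the figures from the post, loop_cycles(5, 3, 1) gives 14 and loop_cycles(2, 1, 1) gives 6, a per-iteration ratio of about 2.3, which is in the same range as the roughly 2.2x overall improvement reported in the first post.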

Is this clear?

-Pat

Complete Old Copy Data:

    movs r1, #0
    b LoopCopyDataInit
CopyDataInit:
    ldr r3, =_sidata
    ldr r3, [r3, r1]
    str r3, [r0, r1]
    adds r1, r1, #4
LoopCopyDataInit:
    ldr r0, =_sdata
    ldr r3, =_edata
    adds r2, r0, r1
    cmp r2, r3
    bcc CopyDataInit

Complete New Copy Data:

CopyDataInitializersStart:
    ldr r0, =_sdata /* write to this addr */
    ldr r1, =_edata /* until you get to this addr */
    ldr r2, =_sidata /* reading from this addr */
    b CopyDataInitializersEnterLoop
CopyDataInitializersLoop:
    ldmia r2!, {r3}
    stmia r0!, {r3}
CopyDataInitializersEnterLoop:
    cmp r0, r1
    bcc CopyDataInitializersLoop
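To check the claim that the revised loop does the same thing, here is a host-side C model of both listings. It is a sketch under assumptions: the function names are invented, the region bounds are passed as parameters instead of coming from the linker symbols _sdata, _edata and _sidata, and the old version counts in words where the assembly counts in bytes. A C model also cannot show the cost of the literal-pool reloads in the old loop, only its indexing structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the old loop: one shared offset applied to both source and
   destination, with the end check recomputed from the base each pass. */
static void copy_data_old(uint32_t *sdata, const uint32_t *edata,
                          const uint32_t *sidata)
{
    size_t i = 0;                 /* movs r1, #0 */
    while (sdata + i < edata) {   /* adds r2, r0, r1; cmp r2, r3; bcc */
        sdata[i] = sidata[i];     /* ldr r3, [r3, r1]; str r3, [r0, r1] */
        i += 1;                   /* adds r1, r1, #4 */
    }
}

/* Model of the new loop: both pointers advance as part of the load
   and the store, and the bounds are set up once before the loop. */
static void copy_data_new(uint32_t *dst, const uint32_t *end,
                          const uint32_t *src)
{
    assert(dst <= end);           /* caller must pass a valid [dst, end) range */
    while (dst < end) {           /* cmp r0, r1; bcc */
        *dst++ = *src++;          /* ldmia r2!, {r3}; stmia r0!, {r3} */
    }
}
```

Both versions test the bound before the first copy, so an empty .data region is handled without touching memory, matching the branch into the loop condition in the assembly.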
