ppannuto
Associate
March 2, 2016
Question

More efficient startup code

Posted on March 02, 2016 at 22:03

Hi all,

For an application I'm working on, I needed to speed up startup. I looked into the .init and .bss routines and saw that they weren't very efficient. I re-wrote them and got about a 2.2x improvement in startup time for my application. The revised code does the same thing, just in fewer cycles.
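For context, both routines run in the reset handler before main(). One copies the initial values of initialized globals (.data) from their load address in flash to RAM. The other zeroes .bss. Below is a minimal host-runnable C model of those two jobs. It is not the code from the gist. It uses ordinary arrays in place of the linker symbols (_sidata, _sdata, _edata, and the .bss bounds, assumed here to be named _sbss and _ebss as in ST's GCC startup files). The function names copy_data_init and zero_bss are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Host-side model of the two jobs a Cortex-M GCC startup file performs
 * before main(). The pointer arguments stand in for linker symbols
 * (_sidata, _sdata, _edata, and the assumed _sbss/_ebss); here they are
 * plain arrays so the code can run anywhere.
 */

/* Copy the .data initializers from their load image (flash) to RAM. */
static void copy_data_init(uint32_t *dst, const uint32_t *end,
                           const uint32_t *src)
{
    assert(dst <= end);          /* region must not be inverted */
    while (dst < end) {
        *dst++ = *src++;         /* one 32-bit word per iteration */
    }
}

/* Clear the .bss region to zero, one word at a time. */
static void zero_bss(uint32_t *dst, const uint32_t *end)
{
    assert(dst <= end);
    while (dst < end) {
        *dst++ = 0u;
    }
}
```

Calling copy_data_init(ram, ram + n, image) and then zero_bss(bss, bss + m) leaves the RAM copy equal to the image and every .bss word zero. Both loops are word-at-a-time, which is why the per-iteration instruction count dominates startup time.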

Feel free to use this if it's useful to anyone. @ST, feel free to copy it into the library you distribute.

https://gist.github.com/ppannuto/672328eb8184abdb9559

-Pat



    Amel NASRI
    Technical Moderator
    Posted on March 03, 2016 at 16:17

    Hi pannuto.pat,

    Thanks for sharing your new startup file for gcc.

    I would like to understand in what sense the .init and .bss routines "weren't very efficient".

    Could you also give us more details on the changes you made to decrease the startup time?

    -Mayla-

    ppannuto (Author)
    Associate
    Posted on March 03, 2016 at 19:45

    Sure. If you compare the two loops, the original code executes several more instructions per iteration. On every pass it reloads the same constant addresses (_sidata, _sdata, _edata) into registers, which it doesn't need to do. You can also eliminate the adds instructions that advance the offset and form the address by using ldmia/stmia (load/store multiple, increment after) with writeback. For a single register, ldr/ldmia and str/stmia all take 2 cycles. (On more powerful cores, Cortex-M3 and up, you would usually use post-indexed addressing, i.e. str r0, [r1], #4, to do this. Post-indexed addressing isn't supported on the M0, but stmia is.)
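To make the addressing difference concrete, here is a behavioural C model of the two loop shapes. It is not cycle-accurate. ldmia1 and stmia1 are hypothetical helpers that mimic a single-register ldmia/stmia with writeback: transfer one word, then advance the base register by 4 bytes. The parameter names in copy_new follow the registers in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: single-register "ldmia rn!, {rt}".
 * Loads the word at *rn, then advances *rn by one word (4 bytes). */
static uint32_t ldmia1(const uint32_t **rn)
{
    uint32_t value = **rn;
    *rn += 1;
    return value;
}

/* Hypothetical helper: single-register "stmia rn!, {rt}".
 * Stores rt at *rn, then advances *rn by one word (4 bytes). */
static void stmia1(uint32_t **rn, uint32_t rt)
{
    **rn = rt;
    *rn += 1;
}

/* Old shape: fixed bases plus a running offset (r1). The offset needs its
 * own add every pass, plus another add to form the address for the compare.
 * The assembly counts r1 in bytes; here i counts words. */
static void copy_old(uint32_t *sdata, const uint32_t *edata,
                     const uint32_t *sidata)
{
    assert(sdata <= edata);
    size_t i = 0;
    while (sdata + i < edata) {    /* adds r2, r0, r1 ; cmp r2, r3 ; bcc */
        sdata[i] = sidata[i];      /* ldr r3, [r3, r1] ; str r3, [r0, r1] */
        i++;                       /* adds r1, r1, #4 */
    }
}

/* New shape: the address registers advance as a side effect of each
 * transfer, so the loop needs only the two transfers and one compare. */
static void copy_new(uint32_t *r0, const uint32_t *r1, const uint32_t *r2)
{
    assert(r0 <= r1);
    while (r0 < r1) {              /* cmp r0, r1 ; bcc */
        uint32_t r3 = ldmia1(&r2); /* ldmia r2!, {r3} */
        stmia1(&r0, r3);           /* stmia r0!, {r3} */
    }
}
```

Both functions produce the same RAM contents. The difference is only in how many instructions each iteration needs.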

    In the old code, the loop body was:

      ldr   r3, =_sidata
      ldr   r3, [r3, r1]
      str   r3, [r0, r1]
      adds  r1, r1, #4
      ldr   r0, =_sdata
      ldr   r3, =_edata
      adds  r2, r0, r1
      cmp   r2, r3
      bcc   CopyDataInit

      5 memory operations x 2 cycles each      = 10 cycles
    + 3 ALU operations    x 1 cycle each       =  3 cycles
    + 1 branch operation  x 1 cycle (usually)  =  1 cycle

    That is 14 cycles per iteration. In the new code, the loop body is:

      ldmia r2!, {r3}
      stmia r0!, {r3}
      cmp   r0, r1
      bcc   CopyDataInitializersLoop

      2 memory/ALU operations x 2 cycles each      = 4 cycles
    + 1 ALU operation         x 1 cycle each       = 1 cycle
    + 1 branch operation      x 1 cycle (usually)  = 1 cycle

    That is 6 cycles per iteration.
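The per-iteration arithmetic can be checked with a few lines of C. The costs are the simplified ones from the tally: 2 cycles per memory operation, 1 per ALU operation, and a separate charge for the loop-closing branch. cycles_per_iter is a hypothetical helper. The branch cost is a parameter because the Cortex-M0 Technical Reference Manual lists a taken conditional branch at 3 cycles rather than 1. Charging 3 cycles gives 16 and 8, so the new loop comes out between 2x and about 2.3x faster per word under either assumption.

```c
#include <assert.h>

/* Simplified per-instruction costs used in the tally. */
enum { MEM_CYCLES = 2, ALU_CYCLES = 1 };

/* Cycles for one iteration of a loop body, given how many memory and ALU
 * instructions it contains and what the loop-closing branch is charged. */
static int cycles_per_iter(int mem_ops, int alu_ops, int branch_cycles)
{
    assert(mem_ops >= 0 && alu_ops >= 0 && branch_cycles >= 0);
    return mem_ops * MEM_CYCLES + alu_ops * ALU_CYCLES + branch_cycles;
}
```

With the figures above, cycles_per_iter(5, 3, 1) is 14 for the old loop and cycles_per_iter(2, 1, 1) is 6 for the new one. Passing 3 as the branch cost gives 16 and 8.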

    Is this clear?

    -Pat

    Complete Old Copy Data:

      movs  r1, #0
      b     LoopCopyDataInit
    CopyDataInit:
      ldr   r3, =_sidata
      ldr   r3, [r3, r1]
      str   r3, [r0, r1]
      adds  r1, r1, #4
    LoopCopyDataInit:
      ldr   r0, =_sdata
      ldr   r3, =_edata
      adds  r2, r0, r1
      cmp   r2, r3
      bcc   CopyDataInit

    Complete New Copy Data:

    CopyDataInitializersStart:
      ldr   r0, =_sdata   /* write to this addr */
      ldr   r1, =_edata   /* until you get to this addr */
      ldr   r2, =_sidata  /* reading from this addr */
      b     CopyDataInitializersEnterLoop
    CopyDataInitializersLoop:
      ldmia r2!, {r3}     /* r3 = *r2, then r2 += 4 */
      stmia r0!, {r3}     /* *r0 = r3, then r0 += 4 */
    CopyDataInitializersEnterLoop:
      cmp   r0, r1
      bcc   CopyDataInitializersLoop  /* repeat while r0 < r1 (unsigned) */
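One detail worth noting in the new code is the initial b CopyDataInitializersEnterLoop, which mirrors the old code's b LoopCopyDataInit. Because it jumps straight to the comparison, nothing is copied when _sdata equals _edata (an empty .data section). The C sketch below reproduces that control flow with goto so each label lines up with the assembly. copy_enter_at_test is a hypothetical name. An ordinary while (dst < end) loop expresses the same behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the new copy loop's control flow.
 * Returns the number of words copied. */
static size_t copy_enter_at_test(uint32_t *dst, const uint32_t *end,
                                 const uint32_t *src)
{
    size_t copied = 0;
    assert(dst <= end);
    goto enter_loop;               /* b CopyDataInitializersEnterLoop */
loop:                              /* CopyDataInitializersLoop: */
    *dst++ = *src++;               /* ldmia r2!, {r3} ; stmia r0!, {r3} */
    copied++;
enter_loop:                        /* CopyDataInitializersEnterLoop: */
    if (dst < end)                 /* cmp r0, r1 */
        goto loop;                 /* bcc CopyDataInitializersLoop */
    return copied;
}
```

A bottom-tested loop entered at the top (a plain do/while) would instead copy one word even for an empty section and write past _edata. Testing first avoids that at the cost of one extra branch before the loop starts.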