More efficient startup code
‎2016-03-02 1:03 PM
Posted on March 02, 2016 at 22:03
Hi all,
For an application I'm working on, I needed to speed up startup. I looked into the .init and .bss routines and saw that they weren't very efficient. I rewrote them and got about a 2.2x improvement in startup time for my application. The revised code does the same thing, just in fewer cycles.
Feel free to use it if it is useful to anyone. @ST, feel free to copy it into the library you distribute.
https://gist.github.com/ppannuto/672328eb8184abdb9559
-Pat
#startup-speed-efficiency-fast
‎2016-03-03 7:17 AM
Posted on March 03, 2016 at 16:17
Hi pannuto.pat,
Thanks for sharing your new startup file for GCC. I would like to understand in what sense the ".init and .bss routines are not very efficient". Could you also give us more details on the changes you made to decrease the startup time?
-Mayla-
‎2016-03-03 10:45 AM
Posted on March 03, 2016 at 19:45
Sure. If you compare the two loops, the original code executed several more instructions per iteration. On every iteration it re-read the same memory addresses into the same registers, which it didn't need to do. You can also eliminate the adds that increments the pointer by using the stmia (store multiple, increment after) instruction; for a single store operation, str and stmia both take 2 cycles. (On more powerful cores, Cortex-M3 and up, you would usually use post-indexed addressing for this, i.e. str r0, [r1], #4, but that form isn't supported on the M0. stmia is, however.)

In the old code, the loop part was

    ldr r3, =_sidata
    ldr r3, [r3, r1]
    str r3, [r0, r1]
    adds r1, r1, #4
    ldr r0, =_sdata
    ldr r3, =_edata
    adds r2, r0, r1
    cmp r2, r3
    bcc CopyDataInit

      5 memory operations x 2 cycles each = 10 cycles
    + 3 alu operations x 1 cycle each     =  3 cycles
    + 1 branch operation x 1 cycle (usu)  =  1 cycle

For 14 cycles / loop. In the new code, the loop part is

    ldmia r2!, {r3}
    stmia r0!, {r3}
    cmp r0, r1
    bcc CopyDataInitializersLoop

      2 memory/alu operations x 2 cycles each = 4 cycles
    + 1 alu operation x 1 cycle each          = 1 cycle
    + 1 branch operation x 1 cycle (usu)      = 1 cycle

For 6 cycles / loop.

Is this clear?
-Pat

Complete Old Copy Data:

      movs r1, #0
      b LoopCopyDataInit
    CopyDataInit:
      ldr r3, =_sidata
      ldr r3, [r3, r1]
      str r3, [r0, r1]
      adds r1, r1, #4
    LoopCopyDataInit:
      ldr r0, =_sdata
      ldr r3, =_edata
      adds r2, r0, r1
      cmp r2, r3
      bcc CopyDataInit

Complete New Copy Data:

    CopyDataInitializersStart:
      ldr r0, =_sdata   /* write to this addr */
      ldr r1, =_edata   /* until you get to this addr */
      ldr r2, =_sidata  /* reading from this addr */
      b CopyDataInitializersEnterLoop
    CopyDataInitializersLoop:
      ldmia r2!, {r3}
      stmia r0!, {r3}
    CopyDataInitializersEnterLoop:
      cmp r0, r1
      bcc CopyDataInitializersLoop
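For readers who think in C rather than Thumb assembly, here is a rough sketch of what the new loop does, plus the per-word cycle arithmetic from the post. This is only an illustration, not the code from the gist: the function and constant names (copy_data_words, loop_cycles, OLD_CYCLES_PER_WORD, NEW_CYCLES_PER_WORD, copy_data_selfcheck) are made up here, and plain arrays stand in for the linker symbols _sdata, _edata and _sidata so it can run on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word-by-word copy shaped like the new loop: the end address is loaded
 * once, and both pointers advance after each access, which is what
 * ldmia r2!, {r3} / stmia r0!, {r3} do in the assembly version. */
static void copy_data_words(uint32_t *dst, const uint32_t *dst_end,
                            const uint32_t *src)
{
    while (dst < dst_end) {   /* cmp r0, r1 ; bcc loop */
        *dst++ = *src++;      /* ldmia r2!, {r3} ; stmia r0!, {r3} */
    }
}

/* Per-iteration cycle counts as quoted in the post (one word per iteration). */
enum { OLD_CYCLES_PER_WORD = 14, NEW_CYCLES_PER_WORD = 6 };

/* Estimated loop cycles for a .data section of data_bytes bytes
 * (assumed to be a multiple of 4), ignoring the one-time setup. */
static unsigned long loop_cycles(size_t data_bytes, unsigned cycles_per_word)
{
    return (unsigned long)(data_bytes / 4u) * cycles_per_word;
}

/* Host-side check: ordinary arrays stand in for the flash image and RAM. */
static void copy_data_selfcheck(void)
{
    const uint32_t flash_image[3] = { 0x11u, 0x22u, 0x33u };
    uint32_t ram[3] = { 0u, 0u, 0u };

    copy_data_words(ram, ram + 3, flash_image);
    assert(ram[0] == 0x11u && ram[1] == 0x22u && ram[2] == 0x33u);
}
```

With those per-iteration numbers, a 1024-byte .data section would cost about 256 x 14 = 3584 cycles in the old loop and 256 x 6 = 1536 in the new one, a ratio of roughly 2.3x for the loop alone, which is in the same range as the 2.2x overall improvement reported at the top of the thread.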