IAR: cost of branching, how to place a method into RAM and would it improve performance?

arro239 · ‎2014-11-22

Posted on November 22, 2014 at 22:07

I am trying to reduce the duration of my critical zones so I am checking which section of the code costs how many processor cycles.

I am looking at CYCCNT and it seems as if entering or exiting a method alone costs about 30 cycles. I wonder if that has something to do with reading from flash with an ART accelerator miss? So I wonder if placing some of my key methods into RAM would help.

How does this work? I have a section defined in my .icf:

place in RAM_region {section .ram};

and I defined a method like

void m() @ ".ram" {

}

this compiles & links, but fails while flashing with an error saying that I am trying to write an address outside of the flash range. Which makes sense since RAM is not mapped into flash.

How does placing code into RAM work? Am I missing some attributes somewhere or would I have to copy the code from flash into ram manually? Does this have a chance to improve performance? Are there any penalties for going from flash execution into ram and back?
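For reference, the CYCCNT measurement mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming a Cortex-M3/M4 class core where the DWT cycle counter sits at its architecturally defined addresses. The helper names (`cyccnt_enable`, `cycles_between`, `net_cycles`) are made up for illustration and are not part of any vendor library. The two counter reads cost a few cycles themselves, so it may be worth measuring an empty start/stop pair first and subtracting that baseline before blaming the call overhead.

```c
#include <stdint.h>

/* Architecturally defined ARMv7-M debug registers (Cortex-M3/M4). */
#define DEMCR      (*(volatile uint32_t *)(uintptr_t)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)(uintptr_t)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)(uintptr_t)0xE0001004u)

/* Turn on the cycle counter; call once on the target before measuring. */
static inline void cyccnt_enable(void)
{
    DEMCR      |= (1u << 24); /* TRCENA: enable the DWT unit */
    DWT_CYCCNT  = 0u;
    DWT_CTRL   |= 1u;         /* CYCCNTENA: start counting */
}

/* Elapsed cycles between two CYCCNT samples. Unsigned subtraction
   stays correct across a single 32-bit wraparound. */
static inline uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Remove the cost of the measurement itself. `baseline` is what an
   empty start/stop pair reports on the target (hypothetical helper). */
static inline uint32_t net_cycles(uint32_t raw, uint32_t baseline)
{
    return (raw > baseline) ? (raw - baseline) : 0u;
}

/* Intended use on the target:
 *   cyccnt_enable();
 *   uint32_t t0  = DWT_CYCCNT;
 *   m();
 *   uint32_t raw = cycles_between(t0, DWT_CYCCNT);
 */
```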

Tesla DeLorean · ‎2014-11-22

Posted on November 23, 2014 at 02:07

Doesn't IAR use this

__ramfunc void test(void)

{

//...

}

And use the C runtime startup code to copy it to RAM?
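A minimal sketch of that approach, assuming the IAR `__ramfunc` extended keyword and a default startup that copies such functions into RAM before `main` runs. The `RAMFUNC` macro and the `critical_sum` routine are invented here for illustration; the macro only exists so that the same source still builds with other compilers, where the function simply stays in flash.

```c
#include <stdint.h>

/* __ramfunc is an IAR extended keyword; define it away elsewhere so the
   sketch also builds with other compilers (and then runs from flash). */
#if defined(__ICCARM__)
#define RAMFUNC __ramfunc
#else
#define RAMFUNC
#endif

/* Hypothetical time-critical routine, used only for illustration. */
RAMFUNC uint32_t critical_sum(const uint32_t *buf, uint32_t count)
{
    uint32_t sum = 0u;
    for (uint32_t i = 0u; i < count; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

For the hand-made `.ram` section from the original question, the linker configuration would presumably also need that section listed in an `initialize by copy { ... }` directive, so that the image stored in flash is copied to RAM at startup instead of being programmed at a RAM address. Note that flash and SRAM are too far apart on these parts for a direct BL, so calls crossing between the two go through a long branch or veneer, which adds a little cost of its own.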

The ART is generally quite efficient, the flash lines are very wide, and subsequent words are delivered to the prefetch faster than RAM. On the other hand RAM will provide predictable timing.


chen · ‎2014-11-24

Posted on November 24, 2014 at 11:36

Hi

"I am looking at CYCCNT and it seems as if entering or exiting a method alone costs about 30 cycles?"

This sounds about right. It is all the code that is executed to save and restore registers to/from stack.

"So I wonder if placing some of my key methods into RAM would help."

As clive1 says about branch prediction by the ART.

I would have thought not because the processor still has to save/restore registers.

I think register saving is something the compiler does. I know there are compiler directives to make ISRs more efficient (look up 'naked ISR'), but I do not know if there are directives to make function calls more efficient. I know it differs between GNU and IAR.
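One directive that does exist on both toolchains is forced inlining, which removes the call, and with it the register save/restore, for small hot functions. The following is a rough sketch, assuming IAR's `#pragma inline=forced` (written with `_Pragma` so it fits in a macro) and GCC's `always_inline` attribute. The `FORCE_INLINE` macro and `clamp_u32` are made-up names for illustration, and whether the compiler actually inlines is best confirmed in the disassembly.

```c
#include <stdint.h>

/* Hypothetical portability macro: request inlining on IAR and GCC,
   fall back to a plain inline hint elsewhere. */
#if defined(__ICCARM__)
#define FORCE_INLINE _Pragma("inline=forced") static inline
#elif defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Small helper that is a typical inlining candidate: the body is
   cheaper than the call sequence around it. */
FORCE_INLINE uint32_t clamp_u32(uint32_t value, uint32_t lo, uint32_t hi)
{
    if (value < lo) {
        return lo;
    }
    if (value > hi) {
        return hi;
    }
    return value;
}
```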

frankmeyer9 · ‎2014-11-24

Posted on November 24, 2014 at 11:48

...but I do not know if there are directives to make function calls more efficient. I know it differs between GNU and IAR

I expect both gcc and IAR to adhere to the ARM ABI, which allows passing up to 4 parameters in registers (and not on the stack), if possible. And AFAIK, the "scratch registers" R0 ... R3 are used for that purpose. So perhaps reducing the number of function arguments will speed up calls.

The ABI documentation is found on the ARM infocenter webpage.
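To illustrate the argument-count point, here is a small sketch, assuming the usual AAPCS behaviour for word-sized integer arguments. The names `mix6`, `mix_params_t` and `mix_params` are hypothetical. The first version takes six arguments, so the last two would be passed on the stack; the second takes a single pointer, which fits in r0.

```c
#include <stdint.h>

/* Six scalar arguments: under the AAPCS the first four travel in
   r0-r3 and the remaining two are passed on the stack. */
uint32_t mix6(uint32_t a, uint32_t b, uint32_t c,
              uint32_t d, uint32_t e, uint32_t f)
{
    return a + b + c + d + e + f;
}

/* Hypothetical parameter block grouping the same values. */
typedef struct {
    uint32_t a, b, c, d, e, f;
} mix_params_t;

/* One pointer argument: it fits in r0, so the call itself passes
   nothing on the stack. */
uint32_t mix_params(const mix_params_t *p)
{
    return p->a + p->b + p->c + p->d + p->e + p->f;
}
```

The pointer version trades stack pushes at the call site for loads inside the callee, so whether it is actually faster is something to check with CYCCNT rather than assume.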

chen · ‎2014-11-24

Posted on November 24, 2014 at 12:46

Hi

"I expect both gcc and IAR to adhere to the ARM ABI, which allows passing up to 4 parameters in registers (and not on the stack), if possible. And AFAIK, the 'scratch registers' R0 ... R3 are used for that purpose."

Thanks, nice to know. I had been taught that by Feabhas but I could never find that information in the compiler manuals. Nice to know where to look (ABI documents).

"I know it differs between GNU and IAR"

Sorry, I meant the compiler directives differ between GNU and IAR.

Also, the ISR directives differ ('naked' works for GNU but not for IAR, I think).

frankmeyer9 · ‎2014-11-24

Posted on November 24, 2014 at 14:07

Sorry, I meant the compiler directives differ between GNU and IAR.

Also, the ISR directives differ ('naked' works for GNU but not for IAR, I think).

Compilers usually do not interfere with the ABI, so they rarely define directives to change it (like "_cdecl", "_pascal", "_stdcall" and "_fastcall" in the WIN32 world). The ABI is the toolchain vendors' homework, to ensure compatibility with each other, at least at link time. So it is not an important issue for the application developer - except in those cases ...

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439d/BcgdbagaB.html

Tesla DeLorean · ‎2014-11-24

Posted on November 24, 2014 at 17:08

Started on-topic, veered off later.

https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flat.aspx?RootFolder=https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/compensating%20latencies%20on%20STM32F4%20interrupts&FolderCTID=0x01200200770978C69A1141439FE559EB459D7580009C4E14902C3CDE46A77F0FFD06506F5B&cu...
