How to execute code from internal SRAM?

Palacios.Bruno · ‎2013-04-18

Posted on April 18, 2013 at 16:24

Hi guys! I need some help. I'm working with the STM32F4 Discovery and CoIDE 1.7.1, and I want to know how to tell CoIDE to place some code in the internal SRAM, because I need this code to execute as fast as possible.

Thanks for all and best regards!

#stm32f4-discovery #art

Tesla DeLorean · ‎2013-04-18

Posted on April 18, 2013 at 16:51

For GNU/GCC based compilers, surely you can use the __attribute__ directive to place code in specific sections, which you can further describe in the linker script.
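As a minimal sketch of the attribute approach, the routine below is tagged so that GCC emits it into its own section. The section name .execram, the RAM_CODE macro and the sum_squares routine are illustrative assumptions, not anything CoIDE defines; the linker script still has to map that section into SRAM, and the startup code has to get the bytes there before the first call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section name; it must match an entry in the linker script.
 * noinline stops the compiler from folding the body into a caller that
 * lives in flash. */
#define RAM_CODE __attribute__((section(".execram"), noinline))

/* Hypothetical hot routine: sum of squares of n 16-bit samples. */
RAM_CODE int32_t sum_squares(const int16_t *samples, int n)
{
    assert(samples != NULL && n >= 0);

    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)samples[i] * samples[i];
    }
    return acc;
}
```

Callers use sum_squares() like any other function; only its placement changes.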

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

dthedens23 · ‎2013-04-18

Posted on April 18, 2013 at 19:52

Just placing code into RAM may not make it execute faster than executing out of flash because of ART.

When executing out of RAM, you have just converted this fancy Harvard-architecture Cortex-M4 into a von Neumann-architecture M0, with the DCode AHB bus constantly switching back to READ to fetch the next instruction. (That wastes a cycle, and I may be wrong, but I believe it wastes more if there is a pending AHB write.)

Depending on the type of code, it may or may not run faster. I'm sure that a bunch of NOPs will run faster, but real-world code that accesses variables may run slower.

Perhaps you can carefully locate code and data in different memories that sit on different AHB buses.

Read RM0090, sections 2.3.1 and 8.2.1.

Andrew Neil · ‎2013-04-18

Posted on April 18, 2013 at 20:07

''Just placing code into RAM may not make it execute faster than executing out of flash because of ART''

See:

http://eda360insider.wordpress.com/2011/09/22/ingenious-architectural-features-allow-st-micro-to-extract-maximum-performance-from-new-microcontroller-family-based-on-arm-cortex-m4-cost-less-than-6-bucks-in-1000s/

''I want to know what should I do to indicate to CoIDE...''

For CoIDE support, you should be asking CooCox:

http://www.coocox.org/Forum/index.php

Tesla DeLorean · ‎2013-04-18

Posted on April 18, 2013 at 20:31

The ART does a pretty good job of masking the slowness of the flash, but like any other cache it adds a certain amount of unpredictability and variability. One of the clever aspects is the wide flash line width, which gets data quickly to the prefetch path.

The RAM on the F4 doesn't really perform as well as the ARM9 TCM (Tightly Coupled Memory) implementations, which are often used in place of a cache. The F4's CCM (Core Coupled Memory) is fast, but can't execute code. This issue was addressed in the F3 design, but neither part permits this memory to be used for DMA. The lack of contention is beneficial, but it means the RAM areas need to be understood and used for appropriate purposes.

Putting code in RAM, along with the vectors/interrupt handlers, can help eliminate stalls during FLASH erase/write operations.


dthedens23 · ‎2013-04-18

Posted on April 19, 2013 at 01:02

Perhaps I was overthinking the theoretical. And here I go again.

If the code is some giant inline asm code then ART would be re-filling the 64 branch cache lines and one would get stalls. But if it was some loop type code, then the first pass through the loops might have stalls, but then the 64 branch cache lines would be set and subsequent loops would run flat out.

I can see where putting the interrupt vector table in RAM is a win. ART would probably never have these entries in its cache.

I suppose I'll have to run some tests to see what level of improvement is possible.

My biggest concern would be high speed DMA contention with instruction fetches.

Palacios.Bruno · ‎2013-04-20

Posted on April 20, 2013 at 17:45

Hi guys, thank you all for your replies!!

clive1, what should I describe in the linker script? I don't understand that part.

rocketdawg, I will read the RM! Thanks.

neil.andrew, I posted this question in the CooCox forum but nobody answered me, lol.

Maybe I should have told you guys that the goal is to place an MP3 decoder algorithm in RAM to achieve faster decoding of the MP3 stream.

I want to measure the time it takes to decode one stream when the decoder algorithm is stored in RAM and when it is stored in flash.

I read some posts about MP3 decoders on Cortex-M3 architectures, and they all said that the time was reduced. So I want to try.
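One way to make that RAM-versus-flash comparison is to time the same routine with a free-running cycle counter. The sketch below is an assumption about how this could be set up, not something from the thread: measure_cycles and the two function-pointer types are hypothetical names, and the fake_* helpers exist only so the logic can be exercised on a host. On the Cortex-M4 the counter would typically be DWT CYCCNT.

```c
#include <assert.h>
#include <stdint.h>

/* Reads a free-running 32-bit cycle counter. On a Cortex-M4 target this
 * would typically return DWT CYCCNT (0xE0001004) after setting TRCENA
 * (bit 24) in DEMCR (0xE000EDFC) and CYCCNTENA (bit 0) in DWT CTRL
 * (0xE0001000). */
typedef uint32_t (*cycle_reader_fn)(void);

/* Routine under test, e.g. one call that decodes a frame. */
typedef void (*work_fn)(void *ctx);

/* Returns the counter ticks that elapsed during one call of work(ctx). */
uint32_t measure_cycles(work_fn work, void *ctx, cycle_reader_fn read_cycles)
{
    assert(work != 0 && read_cycles != 0);

    uint32_t start = read_cycles();
    work(ctx);
    uint32_t end = read_cycles();

    /* Unsigned subtraction stays correct across a single wraparound. */
    return end - start;
}

/* Host-side stand-ins (hypothetical) so the sketch runs off-target. */
static uint32_t fake_now = 0xFFFFFFF0u;  /* start close to wraparound */
static uint32_t fake_read_cycles(void) { return fake_now; }
static void fake_decode(void *ctx) { fake_now += *(uint32_t *)ctx; }
```

Running the same measurement once with the decoder linked into flash and once with it placed in SRAM gives two numbers that can be compared directly, ideally averaged over many frames.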

Again thank you all!


Tesla DeLorean · ‎2013-04-20

Posted on April 20, 2013 at 20:02

''clive1 what should I describe in the linker script? I don't understand that part.''

The linker script describes regions of MEMORY and the SECTIONS that fall into them; you can also direct specific routines or objects into particular areas of memory. For example, use a section named .execram and put *(.execram) after the *(.bss) directives.
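Code that runs from SRAM still has to be stored in flash and copied across at startup, much as initialized data is. The sketch below assumes a slightly different layout from the placement described above: .execram gets its own output section that runs in RAM and loads from flash. The region names RAM and FLASH, the symbols _sexecram, _eexecram and _siexecram, and the copy_section helper are all hypothetical and would need to match the actual CoIDE linker script and startup file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Linker script fragment this sketch assumes (all names hypothetical):
 *
 *   .execram :
 *   {
 *     . = ALIGN(4);
 *     _sexecram = .;
 *     *(.execram)
 *     . = ALIGN(4);
 *     _eexecram = .;
 *   } > RAM AT> FLASH
 *   _siexecram = LOADADDR(.execram);
 */

/* Copies 32-bit words from the load image (flash) to the run address
 * (RAM) until dst reaches dst_end. */
void copy_section(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/*
 * On the target, the startup code would call this before any RAM routine:
 *
 *   extern uint32_t _sexecram, _eexecram, _siexecram;
 *   copy_section(&_sexecram, &_eexecram, &_siexecram);
 */
```

Functions tagged with the matching section attribute then execute from their SRAM addresses once the copy has run.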


Palacios.Bruno · ‎2013-04-21

Posted on April 21, 2013 at 14:07

Thank you clive1! I'm going to check it out.

Best regards!