How to place and execute STM32 code in SRAM memory with STM32CubeIDE

RomainR. · ‎2023-04-21

This article shows several methods to set up an STM32 project so that part or all of the firmware executes from embedded SRAM. Generally, STM32 devices execute user code directly from their embedded flash memory: the ART Accelerator and the prefetch buffer allow code to execute with minimal flash wait-state latency at the maximum CPU and peripheral frequency.
However, in some cases it is useful to execute part or all of the firmware from SRAM.
An embedded system developer may want to do so for the following reasons:

Improve the execution times of code or interrupts considered critical.
Reduce power consumption by placing the flash memory in low power mode.
Perform other concurrent read/write operations on flash memory, while the CPU is executing code in SRAM.
Speed up the code upload when doing debugging sessions.

All the following examples use:

NUCLEO-U575ZI-Q
STM32CubeIDE and STM32Cube_FW_U5_V1.2.0.
STM32CubeProgrammer

Case 1: Placing and executing a function in SRAM.

This part describes the steps to link and execute a C function in SRAM using STM32CubeIDE. In this example, we create a basic function called Prime_Calc_SRAM() that computes a series of prime numbers and stores them in an array.

1. Run STM32CubeIDE and create a new STM32 project using the NUCLEO-U575ZI-Q board (refer to the step-by-step getting started guide in the STM32 MCU Wiki).

2. Name it, for example, 'Nucleo-U575ZI_SRAM_Exec', and generate the project.
3. Leave all the STM32CubeMX configuration at its default values, except:

Clock configuration: use the MSI and the PLL to obtain HCLK = 160 MHz.

4. In the Project Manager tab, configure CubeMX not to generate the MX_USART1_UART_Init function call.

5. Save the project and generate the code.

6. Build the project to check that there are no errors. You should only see the following warning:

../Core/Src/main.c:382:13: warning: 'MX_USART1_UART_Init' defined but not used [-Wunused-function]

7. Open the main.c file and create the prototype of the Prime_Calc_SRAM() function, which computes prime numbers into an array.
Note: This function is declared with the __attribute__((section(".RamFunc"))) attribute.

/* USER CODE BEGIN PFP */
static void __attribute__((section(".RamFunc"))) Prime_Calc_SRAM(void);
/* USER CODE END PFP */

8. Create the following Private constant to define the size of the prime number array.

/* USER CODE BEGIN PD */
#define PRIM_NUM        64U          /* Size of the prime array */	
/* USER CODE END PD */

9. Create the following private variable.
Note: This array is declared with the __attribute__((aligned(32))) attribute to make sure that it starts on a 32-byte boundary in memory.

/* USER CODE BEGIN PV */
static __attribute__(( aligned(32))) uint32_t primes_ram[PRIM_NUM];
/* USER CODE END PV */
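To confirm that the aligned(32) attribute took effect, the low five bits of the array address can be checked at runtime (the Build Analyzer also shows the address). The is_aligned_32() helper below is a hypothetical sketch, not part of the generated project; PRIM_NUM and primes_ram mirror steps 8 and 9.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIM_NUM 64U /* same value as in step 8 */

/* Same declaration as in step 9 */
static __attribute__((aligned(32))) uint32_t primes_ram[PRIM_NUM];

/* Hypothetical helper: true when ptr lies on a 32-byte boundary,
 * i.e. its five least significant address bits are zero. */
bool is_aligned_32(const void *ptr)
{
  return ((uintptr_t)ptr & 0x1FU) == 0U;
}
```

For example, is_aligned_32(primes_ram) is expected to return true.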

10. Finally, create the Prime_Calc_SRAM() function definition.
This code fills the primes_ram array with PRIM_NUM (64) values: the starting value 1, followed by the first 63 prime numbers.
Note: As in the prototype, the definition carries the __attribute__((section(".RamFunc"))) attribute.

/* USER CODE BEGIN 0 */
/**
  * @brief  Compute an amount of Prime number in SRAM
  * @param  None
  * @retval None
  */
static void __attribute__((section(".RamFunc"))) Prime_Calc_SRAM(void)
{
  /* Entry 0 is the starting value 1, then successive prime numbers */
  primes_ram[0] = 1;
  for (uint32_t i = 1; i < PRIM_NUM;)
  {
    /* Test each candidate j above the previous entry */
    for (uint32_t j = primes_ram[i - 1] + 1;; j++)
    {
      for (uint32_t k = 2; k <= j; k++)
      {
        if (k == j)
        {
          /* No divisor found below j: j is prime */
          primes_ram[i] = j;
          goto nexti;
        }
        if ((j % k) == 0U)
        {
          /* j has a divisor: try the next candidate */
          break;
        }
      }
    }
nexti:
    i++;
  }
}
/* USER CODE END 0 */
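To check the result of Prime_Calc_SRAM() (for example in a unit test, or after copying the array contents from the debugger's Memory view), a small hypothetical helper can verify that entry 0 is the starting value 1 and that each following entry is the next prime after its predecessor. The helper names are illustrative only and are not part of the generated project.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: trial-division primality test */
bool is_prime(uint32_t n)
{
  if (n < 2U)
  {
    return false;
  }
  for (uint32_t d = 2U; d * d <= n; d++)
  {
    if ((n % d) == 0U)
    {
      return false;
    }
  }
  return true;
}

/* Hypothetical helper: true when buf holds 1 followed by consecutive primes */
bool primes_array_is_valid(const uint32_t *buf, size_t len)
{
  if ((buf == NULL) || (len == 0U) || (buf[0] != 1U))
  {
    return false;
  }
  for (size_t i = 1U; i < len; i++)
  {
    if ((buf[i] <= buf[i - 1U]) || !is_prime(buf[i]))
    {
      return false;
    }
    /* No prime may be skipped between two consecutive entries */
    for (uint32_t n = buf[i - 1U] + 1U; n < buf[i]; n++)
    {
      if (is_prime(n))
      {
        return false;
      }
    }
  }
  return true;
}
```

After Prime_Calc_SRAM() has run, primes_array_is_valid(primes_ram, PRIM_NUM) is expected to return true.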

11. Call this function in main.

/* USER CODE BEGIN 2 */
Prime_Calc_SRAM();
/* USER CODE END 2 */

12. Open the linker file STM32U575ZITXQ_FLASH.ld and look at the memory definitions below:

/* Memories definition */
MEMORY
{
  RAM   (xrw)	: ORIGIN = 0x20000000,	LENGTH = 768K
  SRAM4 (xrw)	: ORIGIN = 0x28000000,	LENGTH = 16K
  FLASH (rx)	: ORIGIN = 0x08000000,	LENGTH = 2048K
}

The STM32U5 embeds several memories. These definitions describe the physical memory regions, so that the linker knows where to place each firmware symbol, code and data.
Refer to the Memory organization chapter of the STM32U5 reference manual (RM0456) for a full description of the memory organization of the device.
The linker file then contains predefined sections, and users can create their own sections.
Next, we can see the section dedicated to the interrupt vector table (.isr_vector), placed by default at the beginning of the flash memory, then the program code (.text), also in flash memory, and finally the variables, BSS, heap, and stack (.data, .bss, ._user_heap_stack) in RAM.
Our goal is to use a placement section that is already defined inside the .data section and named .RamFunc (see below):

/* Initialized data sections into "RAM" Ram type memory */
.data :
{
  _sdata = .;        /* create a global symbol at data start */
  *(.data)           /* .data sections */
  *(.data*)          /* .data* sections */
  *(.RamFunc)        /* .RamFunc sections */
  *(.RamFunc*)       /* .RamFunc* sections */
  _edata = .;        /* define a global symbol at data end */
} >RAM AT> FLASH

The >RAM AT> FLASH command tells the linker that the symbols placed in this section run from RAM (their run address) but are stored in flash memory (their load address), so they must be copied from flash memory to RAM during startup initialization.
In our case, the function Prime_Calc_SRAM() is copied from flash memory to SRAM1 after reset.
This copy is performed by the LoopCopyDataInit loop of the Reset_Handler in the startup_stm32u575zitxq.s assembly file.
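For readers less familiar with the assembly startup file, the copy performed by LoopCopyDataInit is roughly equivalent to the C sketch below. The function name copy_data_init() is hypothetical; on target, its three arguments would be the linker symbols _sdata, _edata and _sidata (the load address of .data in flash) defined by the generated linker script.

```c
#include <stdint.h>

/* Hypothetical C equivalent of LoopCopyDataInit: copy the load image of
 * .data (which includes the .RamFunc code) word by word from flash to RAM. */
void copy_data_init(uint32_t *dst, const uint32_t *dst_end,
                    const uint32_t *src)
{
  while (dst < dst_end)
  {
    *dst++ = *src++;
  }
}

/* On target (not compiled here), the call would look like:
 *   extern uint32_t _sidata, _sdata, _edata;
 *   copy_data_init(&_sdata, &_edata, &_sidata);
 */
```

Code and initialized data are copied the same way: to the startup loop, the bytes of Prime_Calc_SRAM() are just more words in the .data load image.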

13. Build the project.
14. Locate the Prime_Calc_SRAM symbol in the Build Analyzer, which gives a hierarchical representation of the map file generated during the link phase.

As described in step 12, the function is stored in flash memory at load address 0x0800317C and copied at startup to its run address 0x2000000C in SRAM1 (the exact addresses depend on your code and build settings).
15. Place two breakpoints:

In Reset_Handler, at the line bl SystemInit in startup_stm32u575zitxq.s
In the Prime_Calc_SRAM() function, at the line primes_ram[0] = 1;

16. Start the STM32CubeIDE debugger. The first breakpoint halts the CPU in Reset_Handler in startup_stm32u575zitxq.s.

17. Open Window -> Show View -> Memory and monitor the address locations of the Prime_Calc_SRAM() function in flash memory and in SRAM1.
18. We can see that the linker placed the function in flash memory, starting at 0x0800317C, with a size of 96 bytes (the size depends on the compiler version and optimization level).

19. Meanwhile, SRAM1 contains random (not yet initialized) data at address 0x2000000C.

20. Check Registers -> General Registers: the program counter (PC) is in flash memory.
This means that the code is executing from flash.

21. Resume (F8) execution. The second breakpoint halts the CPU again at the beginning of the Prime_Calc_SRAM() function in main.c.
At this point, the 96 bytes of code of the Prime_Calc_SRAM() function have been copied from flash memory at 0x0800317C to SRAM at 0x2000000C.
The copy was done by LoopCopyDataInit in startup_stm32u575zitxq.s.

22. Now check the PC again: the function is now executing from SRAM1.
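The same check can also be made from code rather than from the Registers view. Because the linker resolves the function symbol to its run address, comparing that address against the RAM region of the linker script (ORIGIN = 0x20000000, LENGTH = 768K) confirms that calls through it execute from RAM. The helper and macro names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* RAM region as declared in STM32U575ZITXQ_FLASH.ld (step 12) */
#define RAM_REGION_ORIGIN 0x20000000UL
#define RAM_REGION_LENGTH (768UL * 1024UL)

/* Hypothetical helper: true if a code address lies inside the RAM region.
 * Cortex-M function pointers have bit 0 set (Thumb state), so it is masked. */
bool code_addr_in_ram(uintptr_t addr)
{
  addr &= ~(uintptr_t)1U;
  return (addr >= RAM_REGION_ORIGIN) &&
         (addr < RAM_REGION_ORIGIN + RAM_REGION_LENGTH);
}

/* On target (not compiled here):
 *   bool in_ram = code_addr_in_ram((uintptr_t)&Prime_Calc_SRAM);
 */
```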

23. Select Run -> Remove all Breakpoints and stop the debug session.

Conclusion: The GCC __attribute__((section(".RamFunc"))) attribute, combined with the linker subsection definition, makes it easy to place and execute a function or a part of code in SRAM.
Another method is to use the existing __RAM_FUNC define (in stm32u5xx_hal_def.h) to declare a function to be executed in SRAM.

#define __RAM_FUNC HAL_StatusTypeDef  __attribute__((section(".RamFunc")))
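Because this macro already includes the HAL_StatusTypeDef return type, a function declared with it is written without a return type and must return a HAL status. The sketch below illustrates this with a hypothetical Sum_Array_SRAM() function; the enum is a host stand-in mirroring the HAL status values so the snippet compiles on its own. In a Cube project, include the HAL header instead of redefining it.

```c
#include <stddef.h>
#include <stdint.h>

/* Host stand-in for the HAL status type; in a Cube project it comes from
 * stm32u5xx_hal_def.h and must not be redefined. */
typedef enum
{
  HAL_OK      = 0x00U,
  HAL_ERROR   = 0x01U,
  HAL_BUSY    = 0x02U,
  HAL_TIMEOUT = 0x03U
} HAL_StatusTypeDef;

/* Same definition as shown above */
#define __RAM_FUNC HAL_StatusTypeDef __attribute__((section(".RamFunc")))

/* Hypothetical function placed in .RamFunc: the macro supplies the return
 * type, so the result is passed back through *sum. */
__RAM_FUNC Sum_Array_SRAM(const uint32_t *buf, uint32_t len, uint32_t *sum)
{
  if ((buf == NULL) || (sum == NULL))
  {
    return HAL_ERROR;
  }

  uint32_t acc = 0U;
  for (uint32_t i = 0U; i < len; i++)
  {
    acc += buf[i];
  }
  *sum = acc;
  return HAL_OK;
}
```

The plain attribute form is more flexible when the function needs another return type; __RAM_FUNC fits functions that already follow the HAL status convention.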

Case 2: Placing and executing interrupts in SRAM.

This second part describes the steps to link and execute interrupts in SRAM using STM32CubeIDE.
It is based on the same project used in the previous steps.
1. Open Nucleo-U575ZI_SRAM_Exec.ioc to configure an EXTI interrupt for the User button on the NUCLEO-U575ZI-Q board.

2. In the NVIC category, check Enabled for the EXTI Line 13 interrupt. Keep all the other existing settings.
3. Save the project and regenerate the code.

4. Build the project and run the debugger.
5. Place a breakpoint in stm32u5xx_it.c at the line:
HAL_GPIO_EXTI_IRQHandler(USER_BUTTON_Pin);

6. Press the User button (blue button) on the Nucleo board: the CPU halts in EXTI13_IRQHandler() in stm32u5xx_it.c.
7. Check Registers -> General Registers: the PC is in flash memory.
This means that the interrupt handler is executing from flash.

8. Stop the debugger.
9. Modify the EXTI13_IRQHandler definition by applying the same attribute as in the previous case:
add the __attribute__((section(".RamFunc"))) attribute.

/**
  * @brief This function handles EXTI Line13 interrupt.
  */
void __attribute__((section(".RamFunc"))) EXTI13_IRQHandler(void)
{
  /* USER CODE BEGIN EXTI13_IRQn 0 */

  /* USER CODE END EXTI13_IRQn 0 */
  HAL_GPIO_EXTI_IRQHandler(USER_BUTTON_Pin);
  /* USER CODE BEGIN EXTI13_IRQn 1 */

  /* USER CODE END EXTI13_IRQn 1 */
}

10. Rebuild the project and run the debugger. Keep the breakpoint on the same line as in step 5.

11. Press the User button (blue button) on the Nucleo board again: the CPU halts in EXTI13_IRQHandler().
12. Check Registers -> General Registers: the PC is now at address 0x20000070.
This means that the interrupt handler is executing from SRAM1.

13. Select Run -> Remove all Breakpoints and stop the debug session.
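Conclusion: This technique is the same as in the previous case. It can be used to place and execute one or several interrupt handlers in the embedded SRAM. Note that only the handler body is relocated: functions it calls, such as HAL_GPIO_EXTI_IRQHandler(), still execute from flash unless they are also placed in .RamFunc.
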
Case 3: Placing and executing an entire project in SRAM (code and interrupts).

This third part describes the steps to link and execute a full STM32 project in SRAM using STM32CubeIDE.
It is still based on the same project used in the previous steps.

1. Remove the __attribute__((section(".RamFunc"))) attribute from Prime_Calc_SRAM() and from EXTI13_IRQHandler().
Note: For Prime_Calc_SRAM(), you must change both the prototype and the function definition.
2. Open system_stm32u5xx.c and search for the VECT_TAB_SRAM symbol.
Uncomment the #define VECT_TAB_SRAM line (see below):

/************************* Miscellaneous Configuration ************************/
/*!< Uncomment the following line if you need to relocate your vector Table in
     Internal SRAM. */
#define VECT_TAB_SRAM
#define VECT_TAB_OFFSET  0x00000000UL /*!< Vector Table base offset field.
                                   This value must be a multiple of 0x200. */
/******************************************************************************/
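With VECT_TAB_SRAM defined, SystemInit() sets the vector table offset register (SCB->VTOR) to the SRAM1 base address plus VECT_TAB_OFFSET instead of the flash base address. The host-side sketch below mirrors that selection, roughly as the generated system file does it, so the resulting value can be checked. The *_BASE_ADDR macros are stand-ins for the CMSIS device header definitions, and vector_table_address() is a hypothetical helper: on target, the value is written to SCB->VTOR rather than returned.

```c
#include <stdint.h>

/* Stand-ins for the CMSIS device header base addresses (non-secure) */
#define FLASH_BASE_ADDR 0x08000000UL
#define SRAM1_BASE_ADDR 0x20000000UL

/* Same configuration as in system_stm32u5xx.c above */
#define VECT_TAB_SRAM
#define VECT_TAB_OFFSET 0x00000000UL

/* The comment in system_stm32u5xx.c requires a multiple of 0x200 */
_Static_assert((VECT_TAB_OFFSET % 0x200UL) == 0UL,
               "VECT_TAB_OFFSET must be a multiple of 0x200");

/* Hypothetical helper: the value SystemInit() would write to SCB->VTOR */
uint32_t vector_table_address(void)
{
#ifdef VECT_TAB_SRAM
  return (uint32_t)(SRAM1_BASE_ADDR | VECT_TAB_OFFSET);
#else
  return (uint32_t)(FLASH_BASE_ADDR | VECT_TAB_OFFSET);
#endif
}
```

This value must agree with the linker configuration: the RAM linker script used in this case places .isr_vector at the start of RAM, 0x20000000.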

3. Add the following code to the main while(1) loop; it will be useful later:

  /* Infinite loop */
  /* USER CODE BEGIN WHILE */
  while (1)
  {
    HAL_GPIO_TogglePin(LED_BLUE_GPIO_Port, LED_BLUE_Pin);
    HAL_Delay(250);
    /* USER CODE END WHILE */

    /* USER CODE BEGIN 3 */
  }
  /* USER CODE END 3 */

4. In the Project Explorer, locate the linker files generated by CubeMX.

5. CubeMX generates two linker files:

STM32U575ZITXQ_FLASH.ld defines all the physical memories of the device, such as the flash memory and all the SRAM banks, and places the symbols in them.
STM32U575ZITXQ_RAM.ld defines only the RAM memories of the device (see below):

/* Memories definition */
MEMORY
{
  RAM	  (xrw)	: ORIGIN = 0x20000000,	LENGTH = 768K
  SRAM4   (xrw)	: ORIGIN = 0x28000000,	LENGTH = 16K
}

With this linker configuration, all the symbols are placed in the RAM regions, in this case in SRAM1 (0x20000000); you can check the >RAM command at the end of each output section in SECTIONS.
For example, the .isr_vector (interrupts) and .text (code) sections:

/* The startup code into "RAM" Ram type memory */
  .isr_vector :
  {
    KEEP(*(.isr_vector)) /* Startup code */
  } >RAM

  /* The program code and other data into "RAM" Ram type memory */
  .text :
  {
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
…
…
    KEEP (*(.init))
    KEEP (*(.fini))

    _etext = .;        /* define a global symbols at end of code */
  } >RAM

6. Open the project properties: right-click on Nucleo-U575ZI_SRAM_Exec in the Project Explorer and select Properties.
7. Go to C/C++ Build -> Settings and select the Tool Settings tab.
8. Select MCU GCC Linker -> General.
9. Change the Linker Script (-T) path to the following: ${workspace_loc:/${ProjName}/STM32U575ZITXQ_RAM.ld}

10. Click Apply and Close to save the new configuration.
11. Clean the project, then rebuild it.
12. Check the Build Analyzer, which provides an analysis of the new map file generated with the new linker configuration.

The FLASH memory region is no longer present, and all the symbols (interrupt vector table, code, and data) are placed in SRAM1.
13. To make sure the new project executes from SRAM1, we use STM32CubeProgrammer to perform a full chip erase of the flash memory.
14. Open STM32CubeProgrammer, connect to the board, and select the Full chip erase icon in the bottom left corner.

15. The flash memory is erased (every word reads 0xFFFFFFFF).
16. Disconnect the Nucleo board from STM32CubeProgrammer.
17. In STM32CubeIDE, start the debugger and step over the code to check the same features as in cases 1 and 2:

EXTI13_IRQHandler() is reached when the User button is pressed.
The result of the Prime_Calc_SRAM() function.
The blue LED blinks.

All the code is placed and executed from SRAM1 and can be debugged as long as the NUCLEO-U575ZI-Q is powered on.
If the Nucleo board is powered off, the SRAM loses all its data, and you need to start a new debug session to load and execute the code again.
18. Stop the debugger, then press the Reset button on the board (black button) to reboot and restart the firmware. The blue LED2 should blink.
19. The blue LED does not blink: the code is not executed. The board was not powered off, so why does the code not restart?
20. Even though the whole project is configured to be placed and executed in SRAM, every STM32 has a boot mode that selects the memory address from which the processor boots at startup.
For the STM32U5 family, refer to the Boot modes chapter (chapter 4) of RM0456.
An equivalent chapter is present in the reference manual of every other STM32.
Table 25 in RM0456 describes the boot modes when TrustZone is disabled (TZEN = 0, nonsecure mode).

By default, on a new STM32U5, the boot memory is the flash memory.
The selection depends on the level of the BOOT0 pin (PH3); on this board, this pin is connected to VSS through a resistor.
We just need to change NSBOOTADD0[24:0], defined in the user option bytes, so that the boot address is the SRAM1 address, matching our linker configuration.
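According to RM0456, the NSBOOTADD0[24:0] field holds bits [31:7] of the boot address, so a boot address must be aligned on 128 bytes. The sketch below shows the relationship between an address and the field value; the helper names are hypothetical, and the bit layout should be checked against the reference manual of your device. STM32CubeProgrammer displays the field as a full address, which is why step 24 enters 0x20000000 directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per RM0456, NSBOOTADD0[24:0] holds boot address bits [31:7] */
#define NSBOOTADD_SHIFT 7U

/* Hypothetical helper: an address is encodable only if 128-byte aligned */
bool boot_address_is_encodable(uint32_t addr)
{
  return (addr & ((1UL << NSBOOTADD_SHIFT) - 1UL)) == 0UL;
}

/* Hypothetical helper: NSBOOTADD0[24:0] field value for a boot address */
uint32_t nsbootadd_field_from_address(uint32_t addr)
{
  return addr >> NSBOOTADD_SHIFT;
}

/* Hypothetical helper: boot address selected by a field value */
uint32_t address_from_nsbootadd_field(uint32_t field)
{
  return field << NSBOOTADD_SHIFT;
}
```

The SRAM1 base 0x20000000 has its seven low bits clear, so it is a valid boot address.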
21. Open STM32CubeProgrammer and connect to the board.
22. On the left, select the Option bytes icon.
23. Select Boot Configuration; you should see the NSBOOTADD0 field.

24. Change the NSBOOTADD0 address to 0x20000000, and click Apply to write the new option byte value to the flash memory of the STM32.

25. Observe the blue LED2: it blinks again. This is because the STM32U5 was reset after the NSBOOTADD0 write sequence. As the board remained powered on during this sequence, the contents of SRAM1 were not lost, and the firmware restarts its execution from SRAM1.

Conclusion

This tutorial showed the important steps to place and run a project and all its code from SRAM1, which is useful, for example, for:

An in-application programming (IAP) firmware.
A low-power application that powers down the flash memory to reduce power consumption.

This approach requires only minor changes to the STM32CubeIDE configuration.
Other IDEs such as IAR-EWARM or KEIL-ARM can handle the same project configuration; refer to their respective documentation to understand the syntax of their linker files.

etheory · ‎2023-05-09

This article is fantastic, you should share it more widely on the ST Community forum and make it more visible.

RomainR. · ‎2023-05-09

Hi @etheory

Thank you for your feedback, I'm glad this article helps.

There will certainly be other publications; they are planned.

BR

Romain,

etheory · ‎2023-05-09

One quick note on the above.

In my own tests if I use:

static void __attribute__((section(".RamFunc"))) Prime_Calc_SRAM(void)

it doesn't work, if I instead use:

void __attribute__((section(".RamFunc"))) Prime_Calc_SRAM(void)

then it works.

Is there any reason for this?

Thanks.

etheory · ‎2023-05-09

How could I use the above technique to place specific code into CCM RAM? I don't see any logical extension to the above technique.

I note that the .RamFunc definition in the .data section in the linker seems disconnected from the memory RAM definition, so it's unclear how you could successfully add new sections.

RomainR. · ‎2023-05-09

Hi @etheory

Concerning your first note above.

The static C keyword gives the declared function, here Prime_Calc_SRAM(), internal linkage: its scope is limited to the file in which it is defined.

Of course, check in your own test that both the Prime_Calc_SRAM() prototype and the function definition use the same static keyword.
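Another possible cause, offered here as a hedged suggestion rather than a confirmed diagnosis, is inlining: at -O1 and above, GCC may inline a static function that is called only once into its caller (-finline-functions-called-once), and the inlined code then lives in the caller's section, in flash. Adding noinline next to the section attribute keeps a standalone copy in .RamFunc. The sketch below uses a hypothetical Fill_Table_SRAM() function to show the attribute combination.

```c
#include <stdint.h>

#define TABLE_SIZE 8U /* hypothetical size for this sketch */

static uint32_t table_ram[TABLE_SIZE];

/* Prototype and definition carry the same storage class and attributes.
 * noinline keeps GCC from folding this single-use static function into its
 * caller (whose code is in flash); section(".RamFunc") then places the
 * standalone copy in RAM as before. */
static void __attribute__((section(".RamFunc"), noinline)) Fill_Table_SRAM(void);

static void __attribute__((section(".RamFunc"), noinline)) Fill_Table_SRAM(void)
{
  for (uint32_t i = 0U; i < TABLE_SIZE; i++)
  {
    table_ram[i] = i * i; /* arbitrary work standing in for Prime_Calc_SRAM() */
  }
}
```

The Build Analyzer is a quick way to tell the two cases apart: if the symbol no longer appears in .RamFunc after an optimized build, it was most likely inlined.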

Regarding your second point:

If you plan to use Core Coupled Memory or multiple memory banks, you must add a new memory region in the linker file.

As an example:

MEMORY
{
  ...
  CCMRAM (rw) : ORIGIN = 0x10000000, LENGTH = 64K
  ...
}

Then I believe you can change:

} >RAM AT> FLASH

by

} >CCMRAM AT> FLASH

This topic is specific to the GNU linker script syntax, and you should find resources on the web, for example:

Using LD, the GNU linker - Command Language

BR

Romain,

etheory · ‎2023-05-09

I gave it a go myself but couldn't get it working with a Nucleo-G474RE. What I tried was the following:

For the linker script, STM32G474RETX_FLASH.ld, change the default:

/* Memories definition */
MEMORY
{
  RAM    (xrw)    : ORIGIN = 0x20000000,   LENGTH = 128K
  FLASH    (rx)    : ORIGIN = 0x8000000,   LENGTH = 512K
}

to:

/* Memories definition */
MEMORY
{
  CCMRAM (xrw)    : ORIGIN = 0x10000000,   LENGTH = 32K
  RAM    (xrw)    : ORIGIN = 0x20000000,   LENGTH = 128K
  FLASH    (rx)    : ORIGIN = 0x8000000,   LENGTH = 512K
}

Then also add:

/* Used by the startup to initialize ccm_data */
  _ccm_sidata = LOADADDR(.ccm_data);
 
  /* Initialized data sections into "CCMRAM" Ram type memory */
  .ccm_data :
  {
    . = ALIGN(4);
    _ccm_sdata = .;        /* create a global symbol at data start */
    *(.ccm_data)           /* .ccm_data sections */
    *(.ccm_data*)          /* .ccm_data* sections */
    *(.CCMRamFunc)        /* .CCMRamFunc sections */
    *(.CCMRamFunc*)       /* .CCMRamFunc* sections */
 
    . = ALIGN(4);
    _ccm_edata = .;        /* define a global symbol at data end */
 
  } >CCMRAM AT> FLASH

Then add to the startup script startup_stm32g474retx.s:

.global	g_pfnVectors
.global	Default_Handler
 
/* MY MODIFICATION HERE */
.word	_ccm_sidata
.word	_ccm_sdata
.word	_ccm_edata
 
/* start address for the initialization values of the .data section.
defined in linker script */
.word	_sidata
/* start address for the .data section. defined in linker script */
.word	_sdata
/* end address for the .data section. defined in linker script */
.word	_edata
/* start address for the .bss section. defined in linker script */
.word	_sbss
/* end address for the .bss section. defined in linker script */
.word	_ebss

and:

/* Copy the data segment initializers from flash to SRAM */
  ldr r0, =_sdata
  ldr r1, =_edata
  ldr r2, =_sidata
  movs r3, #0
  b	LoopCopyDataInit
 
/* MY MODIFICATION HERE */
/* Copy the data segment initializers from flash to CCM SRAM */
  ldr r0, =_ccm_sdata
  ldr r1, =_ccm_edata
  ldr r2, =_ccm_sidata
  movs r3, #0
  b	LoopCopyDataInit

Then to call my function:

void __attribute__((section(".CCMRamFunc"))) run_test(void)

I can even see the results in the STM32CubeIDE Build Analyzer.

But it doesn't run.

If I switch the code pragma back to:

void __attribute__((section(".RamFunc"))) run_test(void)

it's fine.

So I'm confused. What did I miss?

When I run it through the debugger, the second it calls my function run_test, it immediately hits the function:

void HardFault_Handler(void)

but I don't know why.

RomainR. · ‎2023-05-09

I can suggest this user guide:

OpenSTM32 Community Site | Using CCM Memory

BR

Romain

etheory · ‎2023-05-09

I fixed my bug. Everything above is correct except for the .s file, which is now:

/* Copy the data segment initializers from flash to SRAM */
  ldr r0, =_sdata
  ldr r1, =_edata
  ldr r2, =_sidata
  movs r3, #0
  b	LoopCopyDataInit
 
CopyDataInit:
  ldr r4, [r2, r3]
  str r4, [r0, r3]
  adds r3, r3, #4
 
LoopCopyDataInit:
  adds r4, r0, r3
  cmp r4, r1
  bcc CopyDataInit
 
/* MY CHANGES BELOW */
/* Copy the data segment initializers from flash to CCM SRAM */
  ldr r0, =_ccm_sdata
  ldr r1, =_ccm_edata
  ldr r2, =_ccm_sidata
  movs r3, #0
  b	CCMLoopCopyDataInit
 
CCMCopyDataInit:
  ldr r4, [r2, r3]
  str r4, [r0, r3]
  adds r3, r3, #4
 
CCMLoopCopyDataInit:
  adds r4, r0, r3
  cmp r4, r1
  bcc CCMCopyDataInit

And now it works.
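In the first attempt, the CCM copy instructions sat directly after the unconditional b LoopCopyDataInit, so they were never reached: the CCM SRAM stayed uninitialized, which is consistent with the HardFault when calling run_test(). The fixed version gives each region its own loop. The same logic can be expressed in C as a table of regions; this is a hypothetical sketch, and on target the table entries would be the linker symbols shown in the comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one region stored in flash and run from RAM */
typedef struct
{
  const uint32_t *load; /* load address in flash, e.g. &_ccm_sidata */
  uint32_t *start;      /* run start address,     e.g. &_ccm_sdata  */
  uint32_t *end;        /* run end address,       e.g. &_ccm_edata  */
} copy_region_t;

/* Copy each region completely before moving to the next one, which is what
 * the separate CopyDataInit and CCMCopyDataInit loops do in assembly. */
void copy_regions(const copy_region_t *regions, size_t count)
{
  for (size_t r = 0U; r < count; r++)
  {
    const uint32_t *src = regions[r].load;
    for (uint32_t *dst = regions[r].start; dst < regions[r].end; dst++)
    {
      *dst = *src++;
    }
  }
}

/* On target (not compiled here), the table would use the linker symbols:
 *   extern uint32_t _sidata, _sdata, _edata;
 *   extern uint32_t _ccm_sidata, _ccm_sdata, _ccm_edata;
 *   const copy_region_t regions[] = {
 *     { &_sidata,     &_sdata,     &_edata     },
 *     { &_ccm_sidata, &_ccm_sdata, &_ccm_edata },
 *   };
 *   copy_regions(regions, 2U);
 */
```

With a table, adding another RAM region only needs one more entry instead of another pair of assembly labels.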

etheory · ‎2023-05-09

@RomainR. your link above isn't that useful, since I want to only put code into CCM RAM, since that's the option recommended for fastest performance, and your article says only how to put data there, which doesn't seem to be a good idea. Only data or code should go there, and for best performance, only code. At least that's what the data sheets say.

If people follow my above post, they can successfully do that, and get a nice speed up as a result. They can then choose for code to go in flash (no pragma) SRAM (RamFunc pragma) or CCM SRAM (CCMRamFunc pragma) which is more flexible.

Thanks.

NOTE: putting code into SRAM as per the article actually made my code run SLOWER due to bus contention. Putting data in SRAM and code in CCM SRAM took my runtime usage statistics from 68.1% to 55.68% which is absolutely incredible.

RomainR. · ‎2023-05-09

Your technique has allowed you to improve the performance of your application, that's a very good point. And thank you for sharing it.

In addition, AN4296 covers this topic for STM32F3/G4 devices.

CCM is not available on all products, but I completely agree with you on the interest of being able to place data in SRAM and code in CCM in order to reach the best performance.

Thank you,

BR

Romain,