on 2023-04-21 06:40 AM
/* USER CODE BEGIN PFP */
static void __attribute__((section(".RamFunc"))) Prime_Calc_SRAM(void);
/* USER CODE END PFP */
/* USER CODE BEGIN PD */
#define PRIM_NUM 64U /* Size of the prime array */
/* USER CODE END PD */
/* USER CODE BEGIN PV */
static __attribute__((aligned(32))) uint32_t primes_ram[PRIM_NUM];
/* USER CODE END PV */
/* USER CODE BEGIN 0 */
/**
  * @brief  Compute PRIM_NUM prime numbers in SRAM
  * @param  None
  * @retval None
  */
static void __attribute__((section(".RamFunc"))) Prime_Calc_SRAM(void)
{
  /* Fill primes_ram[] with consecutive primes, found by trial division */
  /* Initial condition */
  primes_ram[0] = 1;
  for (uint32_t i = 1; i < PRIM_NUM;)
  {
    for (uint32_t j = primes_ram[i - 1] + 1;; j++)
    {
      for (uint32_t k = 2; k <= j; k++)
      {
        if (j == k)
        {
          /* No divisor found below j: j is prime */
          primes_ram[i] = j;
          goto nexti;
        }
        if (j % k == 0)
        {
          /* j is composite: try the next candidate */
          break;
        }
      }
    }
    nexti:
    i++;
  }
}
/* USER CODE END 0 */
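Before moving the routine into SRAM, it can help to check the expected array contents on a PC. The following is a minimal host-side sketch of the same trial-division search, written without the goto. The names is_prime_trial and prime_calc_host are hypothetical and are not part of the ST project; the section attribute is left out because it only matters on the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if n >= 2 and no k in 2..n-1 divides n. */
bool is_prime_trial(uint32_t n)
{
  if (n < 2U)
  {
    return false;
  }
  for (uint32_t k = 2U; k < n; k++)
  {
    if (n % k == 0U)
    {
      return false;
    }
  }
  return true;
}

/* Hypothetical host version of the prime search: primes[0] is seeded
   with 1, then each following entry receives the next prime. */
void prime_calc_host(uint32_t *primes, uint32_t count)
{
  if (count == 0U)
  {
    return;
  }
  primes[0] = 1U; /* same initial condition as the article */
  for (uint32_t i = 1U; i < count; i++)
  {
    uint32_t j = primes[i - 1U] + 1U;
    while (!is_prime_trial(j))
    {
      j++;
    }
    primes[i] = j;
  }
}
```

With a 64-entry array, the first entries should read 1, 2, 3, 5, 7, which gives a quick reference when inspecting primes_ram in the debugger.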
/* USER CODE BEGIN 2 */
Prime_Calc_SRAM();
/* USER CODE END 2 */
/* Memories definition */
MEMORY
{
  RAM    (xrw) : ORIGIN = 0x20000000, LENGTH = 768K
  SRAM4  (xrw) : ORIGIN = 0x28000000, LENGTH = 16K
  FLASH  (rx)  : ORIGIN = 0x08000000, LENGTH = 2048K
}
The STM32U5 embeds several memories. These definitions describe their physical locations, so that the linker knows where it can place each section of firmware code and data.
/* Initialized data sections into "RAM" Ram type memory */
.data :
{
  _sdata = .;        /* create a global symbol at data start */
  *(.data)           /* .data sections */
  *(.data*)          /* .data* sections */
  *(.RamFunc)        /* .RamFunc sections */
  *(.RamFunc*)       /* .RamFunc* sections */
  _edata = .;        /* define a global symbol at data end */
} >RAM AT> FLASH
The >RAM AT> FLASH directive tells the linker to give every symbol in this output section a run address in RAM while storing its contents in flash. The startup code then copies the section from flash to RAM before main() runs.
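To make the copy step concrete, here is a minimal C sketch of what the startup code does for a section placed with >RAM AT> FLASH. It is an illustration under assumptions, not the actual ST startup file (which is written in assembly): the function name copy_init_words is hypothetical, and plain arrays stand in for flash and RAM so that the sketch also runs on a PC.

```c
#include <stdint.h>

/* Hypothetical helper: copy 32-bit words from the load address (flash
   image) to the run address range [run_start, run_end) in RAM. */
void copy_init_words(uint32_t *run_start,
                     const uint32_t *run_end,
                     const uint32_t *load_start)
{
  uint32_t *dst = run_start;
  const uint32_t *src = load_start;

  while (dst < run_end)
  {
    *dst++ = *src++;
  }
}
```

On the target, the arguments would be the addresses of the linker symbols _sdata and _edata, plus the load address of .data (named _sidata in the ST startup files). Because .RamFunc is collected inside .data, the functions tagged with that section are copied by the same loop.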
#define __RAM_FUNC HAL_StatusTypeDef __attribute__((section(".RamFunc")))
/**
  * @brief This function handles EXTI Line13 interrupt.
  */
void __attribute__((section(".RamFunc"))) EXTI13_IRQHandler(void)
{
  /* USER CODE BEGIN EXTI13_IRQn 0 */
  /* USER CODE END EXTI13_IRQn 0 */
  HAL_GPIO_EXTI_IRQHandler(USER_BUTTON_Pin);
  /* USER CODE BEGIN EXTI13_IRQn 1 */
  /* USER CODE END EXTI13_IRQn 1 */
}
This third part describes the steps to link and execute a full STM32 project in SRAM using STM32CubeIDE.
It is still based on the same project used in the previous steps.
/************************* Miscellaneous Configuration ************************/
/*!< Uncomment the following line if you need to relocate your vector Table in
     Internal SRAM. */
#define VECT_TAB_SRAM
#define VECT_TAB_OFFSET  0x00000000UL /*!< Vector Table base offset field.
                                           This value must be a multiple of 0x200. */
/******************************************************************************/
3. Add the following code in the main while(1) loop, it will be useful later:
/* Infinite loop */
/* USER CODE BEGIN WHILE */
while (1)
{
  HAL_GPIO_TogglePin(LED_BLUE_GPIO_Port, LED_BLUE_Pin);
  HAL_Delay(250);
  /* USER CODE END WHILE */
4. Check and locate in Project Explorer the linker file generated by CubeMX.
/* Memories definition */
MEMORY
{
  RAM    (xrw) : ORIGIN = 0x20000000, LENGTH = 768K
  SRAM4  (xrw) : ORIGIN = 0x28000000, LENGTH = 16K
}
This linker configuration has no FLASH region, so every symbol is placed in RAM, here SRAM1 (0x2000 0000). You can check this in the >RAM directive that closes each output section in SECTIONS.
/* The startup code into "RAM" Ram type memory */
.isr_vector :
{
  KEEP(*(.isr_vector)) /* Startup code */
} >RAM
/* The program code and other data into "RAM" Ram type memory */
.text :
{
  *(.text)           /* .text sections (code) */
  *(.text*)          /* .text* sections (code) */
  …
  …
  KEEP (*(.init))
  KEEP (*(.fini))
  _etext = .;        /* define a global symbols at end of code */
} >RAM
6. Open Project Properties, right click on Nucleo-U575ZI_SRAM_Exec in Project Explorer.
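As a side note on the VECT_TAB_SRAM and VECT_TAB_OFFSET defines enabled earlier in this part, the sketch below shows how a vector table address can be derived from them. It assumes that the generated SystemInit() combines a base address with VECT_TAB_OFFSET and writes the result to SCB->VTOR; this is an assumption to check against your own system file. The helper names and SRAM1_BASE_ADDR are hypothetical, and the base value is simply the RAM ORIGIN from the linker script.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM1_BASE_ADDR  0x20000000UL /* assumption: RAM ORIGIN from the linker script */
#define VECT_TAB_ALIGN   0x200UL      /* offset granularity stated in the file comment */

/* Hypothetical check: the offset must be a multiple of 0x200. */
bool vect_tab_offset_is_valid(uint32_t offset)
{
  return (offset % VECT_TAB_ALIGN) == 0U;
}

/* Hypothetical helper: value that would be written to the vector table
   offset register for a given base address and offset. */
uint32_t vtor_value(uint32_t base, uint32_t offset)
{
  return base | offset;
}
```

With the offset of 0x00000000UL used in this project, the vector table address is simply the start of SRAM1, which matches the .isr_vector section being the first one placed in RAM.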
This article is fantastic, you should share it more widely on the ST Community forum and make it more visible.
Hi @etheory (Community Member)
Thank you for your feedback, I'm glad if this article helps.
For sure, other publications will follow, they are already planned.
BR
Romain,
One quick note on the above.
In my own tests if I use:
static void __attribute__((section(".RamFunc"))) Prime_Calc_SRAM(void)
it doesn't work, if I instead use:
void __attribute__((section(".RamFunc"))) Prime_Calc_SRAM(void)
then it works.
Is there any reason for this?
Thanks.
How could I use the above technique to place specific code into CCM RAM? I don't see any logical extension to the above technique.
I note that the .RamFunc definition in the .data section in the linker seems disconnected from the memory RAM definition, so it's unclear how you could successfully add new sections.
Hi @etheory (Community Member)
Concerning your first note above.
The static C keyword limits the scope of the declared function, here Prime_Calc_SRAM(): the function is only visible inside the source file that defines it.
In your own test, check that both the Prime_Calc_SRAM() prototype and the function definition use the same static keyword.
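A minimal sketch of a consistent declaration follows. The names RAM_FUNC, add_in_ram and call_ram_func are hypothetical. The noinline attribute is an addition of this sketch: an optimizing compiler is free to inline a static function into its caller, in which case the code would not run from the .RamFunc section. Whether that explains the behavior reported above is not confirmed in this thread.

```c
#include <stdint.h>

/* Hypothetical macro: apply the section placement only when building
   for an Arm target with GCC, so the sketch also compiles on a PC. */
#if defined(__arm__) && defined(__GNUC__)
#define RAM_FUNC __attribute__((section(".RamFunc"), noinline))
#else
#define RAM_FUNC
#endif

/* Prototype: static, same attributes as the definition below. */
static RAM_FUNC uint32_t add_in_ram(uint32_t a, uint32_t b);

/* Caller in the same source file, the only place a static function
   can be called from. */
uint32_t call_ram_func(uint32_t a, uint32_t b)
{
  return add_in_ram(a, b);
}

/* Definition: static again, matching the prototype. */
static RAM_FUNC uint32_t add_in_ram(uint32_t a, uint32_t b)
{
  return a + b;
}
```

Checking the map file or the Build Analyzer shows whether the function still exists as a separate symbol in RAM after optimization.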
And your second point:
If you plan to use Core Coupled Memory or multiple memory banks, you must add a new memory region in the linker file.
As an example:
MEMORY
{
...
CCMRAM (rw) : ORIGIN = 0x10000000, LENGTH = 64K
...
}
And now I believe you can change:
} >RAM AT> FLASH
to:
} >CCMRAM AT> FLASH
This topic is specific to GNU linker script syntax, and you should find resources on the web.
Using LD, the GNU linker - Command Language
BR
Romain,
I gave it a go myself but couldn't get it working with a Nucleo-G474RE. What I tried was the following:
For the linker script, STM32G474RETX_FLASH.ld, change the default:
/* Memories definition */
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 512K
}
to:
/* Memories definition */
MEMORY
{
CCMRAM (xrw) : ORIGIN = 0x10000000, LENGTH = 32K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 512K
}
Then also add:
/* Used by the startup to initialize ccm_data */
_ccm_sidata = LOADADDR(.ccm_data);
/* Initialized data sections into "CCMRAM" Ram type memory */
.ccm_data :
{
. = ALIGN(4);
_ccm_sdata = .; /* create a global symbol at data start */
*(.ccm_data) /* .ccm_data sections */
*(.ccm_data*) /* .ccm_data* sections */
*(.CCMRamFunc) /* .CCMRamFunc sections */
*(.CCMRamFunc*) /* .CCMRamFunc* sections */
. = ALIGN(4);
_ccm_edata = .; /* define a global symbol at data end */
} >CCMRAM AT> FLASH
Then add to the startup script startup_stm32g474retx.s:
.global g_pfnVectors
.global Default_Handler
/* MY MODIFICATION HERE */
.word _ccm_sidata
.word _ccm_sdata
.word _ccm_edata
/* start address for the initialization values of the .data section.
defined in linker script */
.word _sidata
/* start address for the .data section. defined in linker script */
.word _sdata
/* end address for the .data section. defined in linker script */
.word _edata
/* start address for the .bss section. defined in linker script */
.word _sbss
/* end address for the .bss section. defined in linker script */
.word _ebss
and:
/* Copy the data segment initializers from flash to SRAM */
ldr r0, =_sdata
ldr r1, =_edata
ldr r2, =_sidata
movs r3, #0
b LoopCopyDataInit
/* MY MODIFICATION HERE */
/* Copy the data segment initializers from flash to CCM SRAM */
ldr r0, =_ccm_sdata
ldr r1, =_ccm_edata
ldr r2, =_ccm_sidata
movs r3, #0
b LoopCopyDataInit
Then to call my function:
void __attribute__((section(".CCMRamFunc"))) run_test(void)
I even get the results visible in the STM32CubeIDE Build Analyzer:
But it doesn't run.
If I switch the code pragma back to:
void __attribute__((section(".RamFunc"))) run_test(void)
it's fine.
So I'm confused. What did I miss?
When I run it through the debugger, the second it calls my function run_test, it immediately hits the function:
void HardFault_Handler(void)
but I don't know why.
I fixed my bug. Everything above is correct except for the .s file, which is now:
/* Copy the data segment initializers from flash to SRAM */
ldr r0, =_sdata
ldr r1, =_edata
ldr r2, =_sidata
movs r3, #0
b LoopCopyDataInit
CopyDataInit:
ldr r4, [r2, r3]
str r4, [r0, r3]
adds r3, r3, #4
LoopCopyDataInit:
adds r4, r0, r3
cmp r4, r1
bcc CopyDataInit
/* MY CHANGES BELOW */
/* Copy the data segment initializers from flash to CCM SRAM */
ldr r0, =_ccm_sdata
ldr r1, =_ccm_edata
ldr r2, =_ccm_sidata
movs r3, #0
b CCMLoopCopyDataInit
CCMCopyDataInit:
ldr r4, [r2, r3]
str r4, [r0, r3]
adds r3, r3, #4
CCMLoopCopyDataInit:
adds r4, r0, r3
cmp r4, r1
bcc CCMCopyDataInit
And now it works.
@RomainR. your link above isn't that useful, since I want to only put code into CCM RAM, since that's the option recommended for fastest performance, and your article says only how to put data there, which doesn't seem to be a good idea. Only data or code should go there, and for best performance, only code. At least that's what the data sheets say.
If people follow my above post, they can successfully do that, and get a nice speed up as a result. They can then choose for code to go in flash (no pragma) SRAM (RamFunc pragma) or CCM SRAM (CCMRamFunc pragma) which is more flexible.
Thanks.
NOTE: putting code into SRAM as per the article actually made my code run SLOWER due to bus contention. Putting data in SRAM and code in CCM SRAM took my runtime usage statistics from 68.1% to 55.68% which is absolutely incredible.
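The three placements described in this post can be sketched with a pair of macros. This is only an illustration: the macro names IN_SRAM and IN_CCM and the three functions are hypothetical, and the section names assume the linker script and startup changes shown earlier in this thread. On a PC build the macros expand to nothing, so the sketch stays compilable off target.

```c
#include <stdint.h>

/* Hypothetical placement macros, active only for Arm GCC builds. */
#if defined(__arm__) && defined(__GNUC__)
#define IN_SRAM __attribute__((section(".RamFunc")))
#define IN_CCM  __attribute__((section(".CCMRamFunc")))
#else
#define IN_SRAM
#define IN_CCM
#endif

/* No attribute: the function stays in flash (default placement). */
uint32_t scale_flash(uint32_t x)
{
  return x * 3U;
}

/* Copied to SRAM at startup together with .data. */
IN_SRAM uint32_t scale_sram(uint32_t x)
{
  return x * 5U;
}

/* Copied to CCM SRAM at startup by the extra copy loop. */
IN_CCM uint32_t scale_ccm(uint32_t x)
{
  return x * 7U;
}
```

Keeping the attributes behind macros makes it easy to move a function between memories and measure each variant, since the best placement depends on bus contention in the actual application.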
Your technique has allowed you to improve the performance of your application, that's a very good point. And thank you for sharing it.
In addition, application note AN4296 covers this topic for STM32F3/G4 devices.
CCM is not available on all products, but I completely agree with you on the interest of being able to place data in SRAM and code in CCM in order to reach the best performance.
Thank you,
BR
Romain,