2022-03-17 04:32 PM
I have several arrays of data, and one or two of them have to stay inside the TCM part while the others can go wherever there is space. I don't want fixed memory locations, but I do want to influence the order in which the compiler (or linker) allocates them in memory.
My current code flips between terrible and superb performance simply depending on where in the code I make changes and how the arrays are then moved around in memory.
I want some control back.
2022-03-17 06:49 PM
You can use an attribute to specify which section of memory the array is placed in; the sections themselves are defined in the linker script.
This explains how to place an array in CCMRAM using GCC based tools (e.g. STM32CubeIDE). The concept is the same for any other memory region (e.g. DTCM).
https://www.openstm32.org/Using%2BCCM%2BMemory
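For example, a minimal sketch following the pattern in that article (it assumes your linker script defines a .ccmram output section collecting *(.ccmram) input sections, as shown on that page; myBuffer is just an illustration):

#include <stdint.h>

/* Place this buffer in CCMRAM instead of the default RAM region */
uint8_t myBuffer[1024] __attribute__((section(".ccmram")));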
2022-03-21 03:29 AM
Dear TDK, your answer is spot on. Thanks a lot.
I guess this is the "best" way to influence where variables are placed in memory.
For future reference (probably for my future self), I will add some of my thoughts.
First: the linker file created by the Cube tools (MX and IDE) only defines FLASH and RAM regions, so I had to define the memory in more detail. For my µC (STM32F767) I took the details from the memory map in the reference manual (RM0410, p. 76, keyword "Memory Map").
Let me elaborate a bit more on my specific issue, since it directly influenced the path I chose.
My code has a rather complex FSM and a multitude of sensors. As much of the sensor data as possible is served via DMA. The heart of the code is an extensive mathematical algorithm which of course must be fast. Since the algorithm accesses a lot of memory repeatedly (e.g. matrix calculations), DTCM memory (or cache) is a major performance driver.
On the other hand, there are many (large) data buffers, some of them supplied by DMA, for which DTCM and cache are counterproductive. That is why I need control over where variables go in memory.
So I changed my memory regions as follows (in LinkerScript.ld):
MEMORY (OLD)
{
  RAM   (xrw) : ORIGIN = 0x20000000, LENGTH = 512K
  FLASH (rx)  : ORIGIN = 0x8000000,  LENGTH = 2048K
}
MEMORY (NEW)
{
  DTCM  (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
  RAM   (xrw) : ORIGIN = 0x20020000, LENGTH = 368K
  RAM2  (xrw) : ORIGIN = 0x2007C000, LENGTH = 16K
  FLASH (rx)  : ORIGIN = 0x8000000,  LENGTH = 2048K
}
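As a sanity check on the new regions: 0x20000000 + 128K = 0x20020000, 0x20020000 + 368K = 0x2007C000, and 0x2007C000 + 16K = 0x20080000, so DTCM, RAM and RAM2 tile the F767's 512K of SRAM exactly, with no gaps or overlaps.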
Without any further change, all variables are still placed in RAM, which now starts at 0x20020000, so the very precious DTCM is left completely free.
Now I defined a section called "dtcm" in the SECTIONS part of the linker script:
/* Tightly Coupled Data Memory */
dtcm :
{
  . = ALIGN(4);
  _sdtcm = .;    /* create a global symbol at dtcm start */
  *(.dtcm)
  *(.dtcm*)
  . = ALIGN(4);
  _edtcm = .;    /* create a global symbol at dtcm end */
} >DTCM
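Note: as written, this creates a regular loadable section. Since my DTCM variables carry no initializers, marking the section (NOLOAD) would be the cleaner form, so the tools do not try to store its (zero) contents in the image; the _sdtcm/_edtcm symbols can then be used in the startup code to clear the region, since the default startup loop only clears .bss. An untested variant:

/* NOLOAD: reserve the space, but keep the (zero) contents
   out of the image; startup code must clear _sdtcm.._edtcm. */
dtcm (NOLOAD) :
{
  . = ALIGN(4);
  _sdtcm = .;
  *(.dtcm)
  *(.dtcm*)
  . = ALIGN(4);
  _edtcm = .;
} >DTCM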
To place a variable in the DTCM space, an attribute is added at the end of its declaration (note the leading dot in the section name, so that it matches the *(.dtcm) input-section patterns above):
static float32_t m1d[16 * 16] __attribute__((section(".dtcm")));
Overall it is a bit of a messy process, since the attribute is copied behind every variable and does not improve readability, but it does work.
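One way to take a little of the clutter out would be a convenience macro (the DTCM_DATA name here is just an illustration):

#include "arm_math.h"   /* CMSIS-DSP, for float32_t */

/* Illustrative shorthand for DTCM placement */
#define DTCM_DATA __attribute__((section(".dtcm")))

static float32_t m1d[16 * 16] DTCM_DATA;
static float32_t m2d[16 * 16] DTCM_DATA;   /* hypothetical second matrix */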
As far as I could gather from the linker script documentation, it should be possible to define memory allocation according to file name (see the untested sketch below). In a more perfect world, I would specify that all variables inside files matching the pattern algo_*.o are placed inside DTCM, with the possibility to override this inside those files. It would make the code a bit more readable. Unfortunately I could not figure out how to have a default definition for .bss and then a file-specific .bss on top of it.
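For reference, the ld manual does allow file-name patterns in input-section rules, and ld assigns each input section to the first rule that matches, so ordering matters. An untested sketch of what this might look like (algo_*.o is the hypothetical file pattern from above; only .bss is handled here, since initialized .data in RAM would additionally need an AT> FLASH load address and a startup copy loop):

/* Untested: must come before the generic .bss rule, because ld
   uses the first matching input-section statement. */
dtcm (NOLOAD) :
{
  . = ALIGN(4);
  *algo_*.o(.bss .bss* COMMON)   /* all zero-initialized data from algo_*.o */
  *(.dtcm .dtcm*)                /* plus explicitly tagged variables */
  . = ALIGN(4);
} >DTCM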
So for the time being I stick with my solution and accept the cluttering of my source code.
2022-03-29 05:23 AM
As a further note for future users:
The way I described above should in principle work. Yet in my case I ran into issues, which is not surprising with ST's HAL, so beware if you go this route. In general it is a good idea to get rid of all HAL drivers, as their code quality and documentation are terrible.
In the meantime I will change my strategy: variables are allocated to the DTCM by default, and large buffers with limited r/w activity are moved out into the SRAM1 area instead.
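A sketch of what that inverted setup could look like (untested; the region and section names are my own):

MEMORY
{
  RAM   (xrw) : ORIGIN = 0x20000000, LENGTH = 128K   /* DTCM is now the default */
  SRAM1 (xrw) : ORIGIN = 0x20020000, LENGTH = 368K
  SRAM2 (xrw) : ORIGIN = 0x2007C000, LENGTH = 16K
  FLASH (rx)  : ORIGIN = 0x8000000,  LENGTH = 2048K
}

/* Collect explicitly tagged buffers in SRAM1 */
sram1 (NOLOAD) :
{
  . = ALIGN(4);
  *(.sram1)
  *(.sram1*)
  . = ALIGN(4);
} >SRAM1

with the large buffers tagged the same way as before:

static uint8_t sensor_buf[4096] __attribute__((section(".sram1")));   /* hypothetical DMA buffer */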