2020-06-18 06:25 AM
Does anyone know if it's possible to have a dedicated memory area on the Cortex for loading functions dynamically at runtime? Something like an externally compiled C function whose binary is sent to the microcontroller through a communication interface (serial, I2C, etc.), where a custom bootloader writes this data to a memory section. Afterwards, knowing where the new function is located in memory, the main program can call it.
I am not that familiar with ARM processors, so I don't know where I should start if I want to achieve this.
Thanks
2020-06-18 07:34 AM
Code overlays are a very old technique for managing a limited memory address space. The concept goes back to minicomputers in the 1960s. There is not much use for it now with 32-bit addressing.
The short answer is yes, it is possible, but there are some difficulties. Most compilers/linkers today don't support overlays, though you might be able to accomplish this with a complicated linker map. The STM32L4 parts typically have an SRAM2 region from which code can be run. You can allocate a fixed base address for the overlay function and swap code in as needed to change functions. Referencing globals in the main application is complicated, so you may need to assign fixed SRAM1 areas (or even unused SRAM2) to hold whatever needs to be passed back and forth.
Jack Peacock
2020-06-18 10:55 PM
It is possible, but some steps depend on the toolchain used. I'm using gcc.
Bootloaders actually do something like this, the main difference is that an application started by a bootloader does not return (but I've seen exceptions).
I'd suggest you create separate projects for the overlays. If you are lucky, your IDE creates two variants of the linker script (*.ld), one for running from flash, and another for running from RAM. In the one called *RAM.ld there is a block near the top
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
RAM2 (xrw) : ORIGIN = 0x10000000, LENGTH = 32K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 1024K
}
As Jack Peacock has pointed out above, SRAM2 is unused, so you can put the overlay code there. Just delete the RAM line and rename RAM2 to RAM. You can remove the FLASH line as well; it is not referenced anywhere. Set the linker to use this file as its linker script. In Eclipse-based IDEs this can usually be found under project properties / C/C++ build settings / configuration: [all configurations] / tool settings / linker / general / linker script.
Delete all source files from the project, including the startup*.s assembly code. Now you can create your own functions, write a few silly ones like
int foo(int a, int b, int c) {
return a + b + c;
}
int bar(int a, int b) {
return a - b;
}
You need a jump table at a fixed location so that the main application can find the function entry addresses. ARM programs have a vector table at the beginning which serves a similar function: the hardware looks up the initial stack pointer and the addresses of the reset and interrupt handlers at fixed memory locations. This table is gone now with the startup*.s file, so write one as a C array of function pointers to replace it.
At this point you might want to review the function pointers topic in a C textbook if you haven't used them much before.
Look at the linker script once again, just below the memory definitions
SECTIONS
{
.isr_vector :
{
. = ALIGN(4);
KEEP(*(.isr_vector))
. = ALIGN(4);
} >RAM
It tells the linker to put the section named ".isr_vector" at the beginning of the image. So create an array of function pointers and tell the compiler to place it in the ".isr_vector" section:
__attribute__ ((section(".isr_vector"),used))
void (* const vectors[])(void) = {
(void (*const)(void))foo,
(void (*const)(void))bar,
};
Now you should be able to compile and link the overlay image and load it at 0x10000000. The linker might complain about missing main() or Reset_Handler() functions. There might be a way to silence it, or you can just provide empty stubs for them.
Then the main program can access the functions through a set of function pointers. You have to set the proper type for each of the functions to be able to pass arguments and get return values.
int (*foo)(int, int, int) = *(int (**)(int, int, int))0x10000000; // table entry 0
int (*bar)(int, int) = *(int (**)(int, int))0x10000004; // table entry 1
int x = foo(1, 2, 3);
int y = bar(4, 5);
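A variation that avoids the casts (a sketch, not taken from the thread): both projects could include a shared header that describes the table as a struct of correctly typed function pointers. The names `struct overlay_api`, `overlay_table` and `overlay_api_at()` are hypothetical. On the target, `overlay_table` would additionally carry `__attribute__ ((section(".isr_vector"),used))` and the application would call `overlay_api_at((const void *)0x10000000)`; here it is pointed at the local table so the sketch also builds and runs on a PC.

```c
#include <stddef.h>

/* Hypothetical shared header (overlay_api.h): one correctly typed
 * pointer per exported function, in table order. */
struct overlay_api {
    int (*foo)(int, int, int);
    int (*bar)(int, int);
};

/* --- overlay side --- */
static int foo_impl(int a, int b, int c) { return a + b + c; }
static int bar_impl(int a, int b) { return a - b; }

/* On the target: add __attribute__((section(".isr_vector"), used))
 * so the linker places the table at the start of the image. */
const struct overlay_api overlay_table = { foo_impl, bar_impl };

/* --- main application side --- */
/* Interpret the start of the loaded image as the shared table.
 * On the target, base would be (const void *)0x10000000. */
static const struct overlay_api *overlay_api_at(const void *base)
{
    return (const struct overlay_api *)base;
}
```

Since both sides are compiled against the same struct, the member order is the interface; appending new entries at the end keeps older applications working.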
2020-06-30 06:48 AM
Thanks for the input, I found a project that does what I intend to do. [link]
It has a script that compiles the code into an image to be loaded. With a set of methods I can then load the image into RAM, but when I try to call the function pointer it goes into the HardFault handler. The INVSTATE flag is set to 1 in the UsageFault Status Register (UFSR). What could be causing this problem?
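To check the fault from the handler, a minimal sketch assuming a Cortex-M3/M4 class core: the UFSR is the upper halfword of the Configurable Fault Status Register (CFSR, at 0xE000ED28; CMSIS exposes it as `SCB->CFSR`), so INVSTATE shows up as bit 17 of the CFSR. Cortex-M only executes Thumb code, and one common cause of INVSTATE is branching through a pointer whose bit 0 is clear, for example a table entry holding a plain even address or an unprogrammed zero. The helper names below are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CFSR address on ARMv7-M; on the target read it with
 * *(volatile uint32_t *)SCB_CFSR_ADDR inside the fault handler. */
#define SCB_CFSR_ADDR  0xE000ED28u
#define CFSR_INVSTATE  (1u << 17)   /* UFSR bit 1 */

/* True if the CFSR value reports an attempt to leave Thumb state. */
static bool cfsr_invstate(uint32_t cfsr)
{
    return (cfsr & CFSR_INVSTATE) != 0;
}

/* A valid Cortex-M branch target must have bit 0 set (Thumb). */
static bool is_thumb_address(uint32_t addr)
{
    return (addr & 1u) != 0;
}

/* Mark an even code address as a Thumb entry point. */
static uint32_t with_thumb_bit(uint32_t addr)
{
    return addr | 1u;
}
```

Addresses the compiler and linker emit for Thumb functions already have bit 0 set, so a clear bit 0 usually points at the wrong table entry or at data rather than code.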
Just as additional information, I am using the arm-atollic toolchain.
2020-07-02 06:40 AM
I found that the problem is that I cannot use the `.data` section of my generated image. Before executing any function in the binary image, the approach from the mentioned project loads the address of the relative offset table, which locates the data section, into register `R9` (the static base register). The following code is what the script generates (I have added some comments on what I understand/think it does):
push {r9, lr} ; Save R9 and LR to stack
push {r0, r1} ; Save R0 and R1 to stack
mov r1, #0x1c ; Load 0x1C (28) into R1
ldr r1, [r1] ; Load the word stored at address 0x1C into R1 (???)
mov r0, PC ; Copy PC into R0
blx r1 ; Call the address loaded into R1 (??)
mov r9, r0 ; Save value of R0 in R9
pop {r0, r1} ; Restore old values of R0, R1
bl functionToExecute ; Execute the function from the image
pop {r9, pc} ; Restore R9 and return (saved LR is popped into PC)
Currently I am able to execute functions from the binary image, but only if I skip the `blx R1` instruction. I debugged my code and apparently this instruction doesn't work as expected: it executes the instructions at absolute memory address `0x21`, which I think is not what should happen.
2020-07-02 07:42 AM
The compiler uses absolute addresses by default, so if you move the block it won't work. Some compilers support the Linux way of using only relative jumps; it's advanced programming...
2020-07-02 09:52 AM
Jumps are relative, but pointers are absolute on ARM. gcc -fPIC puts all pointer constants, i.e. addresses of variables, into a table called the GOT (global offset table), and (combined with -msingle-pic-base) generates code that loads them through [R9+#offset]. Pointers in the GOT must be adjusted by the loader to whatever addresses the code and data ended up at, and missing ones filled in.
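The loader-side adjustment can be sketched as follows, assuming the simplest case: one contiguous image linked at `link_base`, copied to `load_base`, with GOT entries holding absolute link-time addresses. `got_relocate()` is a hypothetical helper, not the API of the project mentioned above; a real loader usually relocates code and data by different offsets and resolves the remaining entries against the main application's symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Rebase every GOT entry that points into the image as linked,
 * [link_base, link_base + image_size), to where the image now sits.
 * Returns the number of entries that point elsewhere; those still
 * have to be filled in by the loader. */
static size_t got_relocate(uint32_t *got, size_t entries,
                           uint32_t link_base, uint32_t image_size,
                           uint32_t load_base)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < entries; i++) {
        uint32_t addr = got[i];
        /* Unsigned wrap-around also rejects addr < link_base. */
        if (addr - link_base < image_size) {
            got[i] = addr - link_base + load_base;
        } else {
            unresolved++;
        }
    }
    return unresolved;
}
```

After the fixup, R9 would be pointed at `got` before calling into the image.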
The vector at 0x1C corresponding to exception 7 is unused on Cortex-M, I suspect that it is supposed to point to a function that does the GOT fixup based on the PC value in R0. (Why can't it use LR?)
2020-07-02 12:55 PM
Indeed, R9 must point to the local GOT of the image. With the API provided by the mentioned project I get the absolute address where the GOT is allocated in RAM. Afterwards I modified the original script of the project, since I couldn't make it work "as is".
What I am doing is implementing this temporary fix:
register int * r9_register asm ("r9"); //Assign a variable to r9
asm("stmdb sp!, {r9}"); //Backup R9
r9_register = (int *) localGOTValue; //Point R9 to GOT
dynamicFunctionPtr(); //Executes my function from Ram
asm("ldmia.w sp!, {r9}"); //Restore Value of R9
So far it works; I will probably run the unit tests from the project to verify it.
As a further question: how could I move the value of the variable "localGOTValue" directly into R9 (with an asm instruction) instead of having to declare the register variable (line 1)?
2020-07-05 01:07 AM
> So far it works
Even if you enable optimizations? Because then it stops working for me. I could not find a way to get the compiler to load anything into r9_register.
Besides, I would not mess with the stack frame in C code. Too dangerous if the optimizer starts to reorder stuff.
So I came up with this
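asm volatile(
"mov r9, %[replace_r9]\n\t" // point R9 at the image's GOT
"blx %[jumpaddress]" // indirect call: BL only accepts a label
:
: [replace_r9] "r"(localGOTValue), [jumpaddress] "r"(dynamicFunctionPtr)
// R9 and LR are overwritten; the callee may also clobber the AAPCS
// caller-saved registers, the flags and memory
: "r0", "r1", "r2", "r3", "r9", "r12", "lr", "cc", "memory");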
which apparently does what you want even with -O3. But it would become messy if there were arguments to the function.
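For the case with arguments, one possible approach (a sketch, assuming GCC or Clang targeting 32-bit ARM, with R9 not reserved in the main application build through options such as -ffixed-r9) is to bind the arguments to R0/R1 with local register variables, which GCC documents as supported for asm operands, and to list every register the AAPCS lets the callee clobber. `call_with_got()` and `sub_example()` are hypothetical names; the `#else` branch only exists so the sketch can be compiled and tested on a non-ARM host.

```c
#include <stddef.h>

/* Call fn(a, b) inside the loaded image with R9 pointing at its GOT.
 * Arguments go in R0/R1 and the result comes back in R0 (AAPCS). */
static int call_with_got(int (*fn)(int, int), void *got, int a, int b)
{
#if defined(__arm__)
    register int r0 __asm__("r0") = a;
    register int r1 __asm__("r1") = b;
    __asm__ volatile(
        "mov r9, %[got]\n\t"
        "blx %[fn]"
        : "+r"(r0), "+r"(r1)            /* R0/R1: args in, R0: result out */
        : [got] "r"(got), [fn] "r"(fn)
        : "r2", "r3", "r9", "r12", "lr", "cc", "memory");
    return r0;
#else
    /* Host build for testing only: there is no R9 to switch. */
    (void)got;
    return fn(a, b);
#endif
}

/* Stand-in for a function exported by the image. */
static int sub_example(int a, int b) { return a - b; }
```

It is still worth checking the generated disassembly at each optimization level you intend to use.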
2021-03-23 05:48 AM
Just an update on the post.
Your suggestion works as expected with -O3, but I found that with -Os it messed up the stack. At first, I added the `optimize("O0")` attribute to the function. After a while, I decided to test whether the code works with Clang (-Oz) as well, and, not to my surprise, the compiler again did some strange optimizations.
As I wanted reliable behaviour independent of the compiler (gcc/clang), I implemented this snippet in assembly and call it from the C code, which gives me total control over the low-level implementation.