Hardfault when accessing a global variable in a position independent program compiled with -fPIE

matteoricciutelli_enel · ‎2023-08-02

Hello board,

I am trying to implement position-independent firmware to support a dual-slot FOTA strategy:

The application firmware can be stored in either of two memory areas. It does not know at compile time where it will be stored, so it must be able to run regardless of its position.
A small bootloader is in charge of properly launching the firmware (the bootloader knows where the application firmware is stored).

I followed the steps described in chapter 2.8 of UM2609, the STM32CubeIDE user guide, which covers this exact requirement: https://www.st.com/resource/en/user_manual/um2609-stm32cubeide-user-guide-stmicroelectronics.pdf

The application firmware is compiled with the -fPIE option
The instruction "bl __libc_init_array" is removed from the application firmware startup code
.got is placed inside the .text section of the main firmware linker script, between GOT_START and GOT_END
The stack pointer is loaded with _estack in the application firmware reset handler
A RAM section is dedicated to holding the application firmware vector table
The bootloader copies the application firmware vector table to RAM and applies the appropriate offset to each entry
The bootloader deinitializes all peripherals and the HAL, then jumps to the application firmware
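
The vector table step above could be sketched roughly as follows. This is a host-testable illustration, not the code from this project: the function name relocate_vector_table and the delta parameter are hypothetical. It assumes entry 0 is the initial stack pointer (a RAM address that must not be shifted) and that reserved slots are stored as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy a Cortex-M style vector table and rebase its handler addresses.
 * Entry 0 is the initial stack pointer (a RAM address), so it is copied
 * unchanged; entries equal to 0 (reserved slots) are left at 0.
 * delta = actual load address - link address (may be negative).
 * Hypothetical helper, for illustration only. */
static void relocate_vector_table(uint32_t *dst, const uint32_t *src,
                                  size_t count, int32_t delta)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < count; i++) {
        uint32_t entry = src[i];

        if (i != 0 && entry != 0u) {
            /* Unsigned wraparound makes a negative delta work as expected.
             * An even delta keeps the Thumb bit (bit 0) of each handler. */
            entry += (uint32_t)delta;
        }
        dst[i] = entry;
    }
}
```

On the target, dst would be the dedicated RAM section, and the core would still need to be told to use that copy (on Cortex-M parts that have it, through the VTOR register) before interrupts are enabled.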

At this point:

The main firmware reset handler is reached,
Startup code executes,
When it reaches the function HAL_Init inside main, the application firmware goes into hardfault upon accessing a global variable whose loaded address is outside the RAM address space. [screenshot: instruction that triggers hardfault]
Funnily enough, the location of this exact variable is displayed correctly by the expression monitor.

At this point I think I may be missing something regarding the relocation of global variables, or am I doing something else wrong?

I hope one of you can shed light on this mystery, as online information on this subject is very scarce!

Regards, Matteo

matteoricciutelli_enel · ‎2023-08-10

++++ UPDATE ++++

I have discovered that with the -fpie option disabled, the code runs even if I load it at a different address from the one assigned by the linker script, and it does not crash the way it did with -fpie enabled. What I changed:

Removed -fPIE from compiler options
I eliminated the global offset table
Vector table relocated to RAM, with all its addresses shifted by the bootloader according to where the firmware is loaded.
Relocated the .rodata, .init, .preinit and .fini sections to RAM

Things work like a charm, but I do not understand:

Why does my compiled code seem to be position independent by default, when none of -fpic, -fpie or similar options are enabled?

Bob S · ‎2023-08-10

Do you have the program loaded into both areas? Especially if the same image is loaded into both areas, code running in the 2nd area may "call" functions in the first area.

For example, code is compiled/linked to start at 0x0800 0000. Function MyInit() is at 0x0800 1000. Code is loaded at 0x0800 0000 (1st area) and 0x0801 0000 (2nd area). Yes, I know this doesn't allow for the bootloader that is probably really at 0x0800 0000. So the bootloader starts running the program in the 2nd area. It gets to the MyInit() call and calls the (fixed) address of MyInit() at 0x0800 1000, NOT what would be the position-independent address of 0x0801 1000.
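
The arithmetic in this example can be checked with a small host-side sketch. The helper names below are hypothetical and only model the address calculation; they do not execute any firmware.

```c
#include <assert.h>
#include <stdint.h>

/* Where a symbol really lives once the image is loaded at load_base
 * instead of the link_base it was linked for. Hypothetical helper. */
static uint32_t runtime_address(uint32_t link_addr, uint32_t link_base,
                                uint32_t load_base)
{
    assert(link_addr >= link_base);
    return load_base + (link_addr - link_base);
}

/* An absolute call always jumps to the link-time address, so it only
 * lands on the intended copy when the image sits at its link address. */
static int absolute_call_hits_own_copy(uint32_t link_addr, uint32_t link_base,
                                       uint32_t load_base)
{
    return runtime_address(link_addr, link_base, load_base) == link_addr;
}
```

With the numbers from the example, MyInit() linked at 0x08001000 in an image linked for 0x08000000 but loaded at 0x08010000 actually lives at 0x08011000, so an absolute call to 0x08001000 ends up in the first area.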

matteoricciutelli_enel · ‎2023-08-10

Hi, no. The flash is formatted beforehand and I only upload the "application-fw" binary "manually" using STM32CubeProgrammer. Then I upload and debug the bootloader using STM32CubeIDE (I also load the application-fw debug information with it, but without uploading the application, of course).

The main firmware and the bootloader share the stm32 libraries initializations but then they call different functions and print different texts on the terminal.

In more detail:

The "application" linker settings would place it at 0x8020000
The application .data, .rodata, .init, .preinit and .fini sections are also relocated to RAM by linker settings
The application vector table is loaded from ram at an absolute address
I upload the application compiled .bin file at 0x8010000 using STM32CubeProgrammer (so there is a 0x10000 offset with the original linker placement).
The bootloader copies the IRQ table it finds in flash to RAM, at the address known to the application, and shifts the IRQ table addresses by -0x10000.
I set the bootloader to launch an application at 0x8010000

Bob S · ‎2023-08-10

Probably not the issue, but did you modify the startup code to copy "rodata" from Flash into RAM? I don't THINK the stock startup will do that.

The assembly code you showed looks a bit odd, especially the "add r3, pc" line. Does that source code generate the same assembly without the -fPIE flag? Though even subtracting out the PC value it STILL doesn't give the correct address.

matteoricciutelli_enel · ‎2023-08-11

In the newer firmware (the one without -fpie) I do copy rodata from flash to RAM; in the older one I did not.
In the current firmware (no -fpie) the global variable address is fetched correctly: there is no "add r3, pc", and the variable address is loaded by adding a fixed offset to the PC.

Apart from why -fpie was shifting the load addresses in that manner, I would also like to know how the code without -fpie is still somewhat position independent, since I see that function calls and jumps are always performed relative to the program counter!

matteoricciutelli_enel · ‎2023-08-11

Another update: regarding my first firmware, the one with -fpie, I now notice that the disassembly loads the global offset table from exactly 0x10000 away from where it is actually located. 0x10000 is, coincidentally, the offset at which I load the firmware with respect to the original linker placement...

So I wonder, it looks like the program expects the GOT to be placed relatively to the PC with a fixed offset, but if the firmware gets uploaded in a different place than the linker's expected location, the GOT location calculated in that way becomes invalid.

What is the point of having position-independent code if the global offset table is loaded in a position-dependent manner? Am I missing something?

matteoricciutelli_enel · ‎2023-08-11

So, to answer my last message, the key is to use -fpic in conjunction with -msingle-pic-base (and remove -fpie). This option makes the compiler reference the GOT through a fixed register (R9 by default).

For this option to be effective, the bootloader must load the correct RAM address of the GOT (already relocated to RAM and patched with the offset where necessary) into R9 right before launching the application firmware.

(Another way would be to populate R9 in the startup code of the firmware itself)
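
A minimal host-testable sketch of the "relocate to RAM and patch where necessary" step, under stated assumptions: the image_layout_t type and copy_and_patch_got function are hypothetical names, and the rule used here (shift only entries that point into the linked flash range, leave RAM addresses alone) is one plausible reading of "where necessary", not a confirmed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of where the image was linked and loaded. */
typedef struct {
    uint32_t link_base;   /* flash address the image was linked for */
    uint32_t image_size;  /* size of the flash image in bytes */
    uint32_t load_base;   /* flash address the image was written to */
} image_layout_t;

/* Copy the GOT into RAM, rebasing only the entries that point into the
 * linked flash range. Entries that point to RAM (.data/.bss) are copied
 * unchanged, because RAM contents do not move with the flash slot. */
static void copy_and_patch_got(uint32_t *ram_got, const uint32_t *flash_got,
                               size_t count, const image_layout_t *img)
{
    assert(ram_got != NULL && flash_got != NULL && img != NULL);

    /* Modular arithmetic: also correct when load_base < link_base. */
    uint32_t delta = img->load_base - img->link_base;

    for (size_t i = 0; i < count; i++) {
        uint32_t entry = flash_got[i];

        /* Single unsigned compare: true only if
         * link_base <= entry < link_base + image_size. */
        if (entry - img->link_base < img->image_size) {
            entry += delta;
        }
        ram_got[i] = entry;
    }
}
```

On the target, the address of ram_got is what would be placed in R9 before branching to the application.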

I am still experiencing some problems, though. I will update the post as soon as I discover something more.

Tesla DeLorean · ‎2023-08-11

Been a while since I've spent a lot of time on ELF objects at this level. LINUX would allow for relocation, etc.

In KEIL the ARM ABI used R9 as a means of communicating data addressing, where the code / data live at unrelated addresses, and there might be multiple instances / threads, i.e. code exists once, data exists in multiple different and unrelated contexts. In STM32 usage I've seen this with several of the .STLDR (External Loaders) when built with Keil using address independent options, and perhaps not adequately addressed by STM32 Cube Programmer or ST-LINK Utilities in furnishing the execution environment.

https://stackoverflow.com/questions/7879278/arm-register-r9-in-the-linux-kernel


matteoricciutelli_enel · ‎2023-08-21

So, the problem now seems to be that static variables are somehow not listed in the .got.

When trying to access one of these static zero-initialized variables (so living in the .bss section), the program goes into hardfault because it loads it from the wrong address.

This is the content of the .got section: [screenshot of .got contents]

Since .bss starts at 0x20000f20 and ends at 0x200015d8, you can see that several entries belong to the .bss section, but others do not appear, like this static array: [screenshot of the static array]

When the program tries to access this variable it does so at 0x1fff15d8, which is exactly 0x10000 away from where it actually is (and 0x10000 is the offset by which I upload the code to test the position independence).
This is because said variable is loaded by adding an offset to the program counter (PC). If it were loaded via the GOT this would not happen!
So why are local static variables not addressed via the GOT while global variables are? Is there a way to ensure GOT coverage for all static variables?
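
The 0x10000 error described above follows directly from the arithmetic of a PC-relative access, as this host-side sketch illustrates. The function names and the PC values used in the note below are hypothetical; only 0x200015d8, 0x1fff15d8 and the 0x10000 shift come from the observations in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A PC-relative access bakes (variable address - instruction address)
 * into the code at link time, so the computed address moves together
 * with the code, even when the variable itself (in RAM) does not. */
static uint32_t pc_relative_target(uint32_t pc_at_runtime, uint32_t linked_pc,
                                   uint32_t linked_var_addr)
{
    uint32_t baked_offset = linked_var_addr - linked_pc; /* fixed at link time */
    return pc_at_runtime + baked_offset;
}

/* A GOT-based access reads the absolute address from a table entry, so
 * the result does not depend on where the code is executing. */
static uint32_t got_target(const uint32_t *got, size_t index)
{
    assert(got != NULL);
    return got[index];
}
```

For instance, with an assumed instruction linked at 0x08021000 but executing at 0x08011000 (0x10000 lower), a variable at 0x200015d8 is computed as 0x1fff15d8 through the PC-relative path, while the GOT path still yields 0x200015d8.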