Bootloader code hard faults intermittently when jumping to application

shane mattner · 2023-07-29

I developed a bootloader firmware application last month that worked fine. I tested it dozens, if not hundreds, of times on the Nucleo-F303RE board.

This morning I tried using that same code with a custom STM32F301C8TX board and it hard faulted over and over when trying to jump to the application code. I double and triple checked the memory bounds to make sure everything was aligned (0x5000 NVIC offset). Nothing worked. The main application worked fine when I downloaded it directly through the debugger in CubeIDE.

Then I retested the F303RE bootloader code I previously developed and had the same issue: hard faults when trying to jump to the main application. After a bunch of debugging and trying different code I found online, I went back to my same code and, lo and behold, it worked! So then I tried again with the custom STM32F301 board and it worked, too! Hallelujah!

But after developing other parts of the bootloader and testing again it seems that this code is back to throwing hard faults. Anyone have an idea what might be happening? I've tried:

- no optimization
- full optimization
- changing memory bounds addresses
- Restarting the computers involved (laptop for building firmware, Pi for sending bootloader commands to STM32)

void bootloader_start_app(void)
{

	printf("Starting user application...\n\r");
	// Set the Main Stack Pointer (MSP) to the value at the specified address in flash memory
	__set_MSP(*(uint32_t *)APP_FLASH_ADDR);

	// Get the address of the reset handler function from the flash memory
	uint32_t reset_handler_address = *(volatile uint32_t *)(APP_FLASH_ADDR + 4);

	// Declare a function pointer to the reset handler function
	void (*app_reset_handler)(void);

	// Assign the address of the reset handler to the function pointer
	app_reset_handler = (void *)reset_handler_address;

	// Call the user application's reset handler function to start the application
	app_reset_handler();
}

Any ideas or suggestions on what to investigate would be much appreciated!

Pavel A. · 2023-07-29

> it hard faulted over and over when trying to jump to the application code.

This is good. If the problem reproduces in 100% of cases, it should be easy to spot. Now is the time to get to know the debugger, especially its hard fault analyzer tool. What does it show? Can you make sense of it?
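For anyone without the fault analyzer to hand, here is a minimal sketch of decoding the Cortex-M3/M4 Configurable Fault Status Register by hand. The bit positions are the architectural ones; the helper name `cfsr_describe` and the table layout are made up for illustration and are not part of CMSIS or the HAL.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One named bit of the CFSR (SCB->CFSR, address 0xE000ED28). */
typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

static const cfsr_flag_t cfsr_flags[] = {
    /* MemManage fault status (bits 0..7) */
    { 1u << 0,  "IACCVIOL"    }, { 1u << 1,  "DACCVIOL"  },
    { 1u << 3,  "MUNSTKERR"   }, { 1u << 4,  "MSTKERR"   },
    { 1u << 5,  "MLSPERR"     }, { 1u << 7,  "MMARVALID" },
    /* BusFault status (bits 8..15) */
    { 1u << 8,  "IBUSERR"     }, { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR"  },
    { 1u << 12, "STKERR"      }, { 1u << 13, "LSPERR"    },
    { 1u << 15, "BFARVALID"   },
    /* UsageFault status (bits 16..31) */
    { 1u << 16, "UNDEFINSTR"  }, { 1u << 17, "INVSTATE"  },
    { 1u << 18, "INVPC"       }, { 1u << 19, "NOCP"      },
    { 1u << 24, "UNALIGNED"   }, { 1u << 25, "DIVBYZERO" },
};

/* Hypothetical helper: writes the names of all set flags into buf,
 * separated by spaces, and returns how many flags were set. */
int cfsr_describe(uint32_t cfsr, char *buf, size_t len)
{
    int count = 0;

    if (len == 0) {
        return 0;
    }
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if (cfsr & cfsr_flags[i].mask) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", cfsr_flags[i].name);
            count++;
        }
    }
    return count;
}
```

On the target, the value to pass in would be `SCB->CFSR`, read inside `HardFault_Handler`. For a bootloader jump, `INVSTATE` is worth looking for in particular: it is what you get when branching to an address whose Thumb bit (bit 0) is clear, for example because the word read from the vector table was not the one intended.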

A hint: local variables may sit on the stack. When you arbitrarily move the main stack pointer, guess what happens to the stack. What if a systick interrupt occurs before the main app sets its vector table?
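To make the hint concrete, here is a rough sketch of a jump sequence that reads both vector-table words before touching the stack pointer, masks interrupts, and points VTOR at the application before branching. Treat it as one possible shape under stated assumptions, not a verified fix for this board: the value of `APP_FLASH_ADDR`, the `BL_*` memory limits (guessed for a 64 KB flash / 16 KB SRAM part), and the helper `app_vectors_plausible` are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions, not taken from the thread: adjust to the real part and linker script. */
#define APP_FLASH_ADDR 0x08005000u /* assumed: flash base 0x08000000 + 0x5000 offset */
#define BL_FLASH_END   0x08010000u /* assumed: 64 KB of flash */
#define BL_SRAM_BASE   0x20000000u
#define BL_SRAM_END    0x20004000u /* assumed: 16 KB of SRAM */

/* Hypothetical sanity check on the first two words of the application's vector table. */
bool app_vectors_plausible(uint32_t initial_sp, uint32_t reset_handler)
{
    /* Initial SP must land in SRAM (top of RAM is allowed) and be word aligned. */
    if (initial_sp <= BL_SRAM_BASE || initial_sp > BL_SRAM_END ||
        (initial_sp & 0x3u) != 0u) {
        return false;
    }
    /* Reset handler must have the Thumb bit set... */
    if ((reset_handler & 1u) == 0u) {
        return false;
    }
    /* ...and point into the application region, not the bootloader. */
    if (reset_handler < APP_FLASH_ADDR || reset_handler >= BL_FLASH_END) {
        return false;
    }
    return true;
}

#if defined(__ARM_ARCH_7EM__)  /* target-only part: built for Cortex-M4 */
#include "stm32f3xx.h"         /* CMSIS device header: SCB, SysTick, intrinsics */

__attribute__((noreturn)) void bootloader_start_app(void)
{
    /* Read everything needed while the bootloader's stack is still in use. */
    const uint32_t *vectors = (const uint32_t *)APP_FLASH_ADDR;
    uint32_t app_sp    = vectors[0];
    uint32_t app_reset = vectors[1];

    if (!app_vectors_plausible(app_sp, app_reset)) {
        for (;;) { } /* placeholder: real code would report the error */
    }

    /* No interrupt may fire between here and the application's own setup.
     * PRIMASK stays set, so the application has to call __enable_irq(). */
    __disable_irq();
    SysTick->CTRL = 0u;          /* stop the bootloader's tick */
    SCB->VTOR = APP_FLASH_ADDR;  /* use the application's vector table */
    __DSB();
    __ISB();

    /* Set MSP and branch in one asm statement, so no C local is
     * accessed after the stack pointer has moved. */
    __asm volatile ("msr msp, %0\n\t"
                    "bx  %1"
                    : : "r"(app_sp), "r"(app_reset) : "memory");
    for (;;) { } /* not reached */
}
#endif
```

The plausibility check can be exercised on a PC, since it is plain C. It would, for instance, reject an image linked for a part with more RAM than the board actually has, which is one way the same binary can behave differently on two boards. Peripherals and pending NVIC interrupts left over from the bootloader are not cleaned up here and may need their own handling.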

No amount of testing can prove that code is bug-free. Testing can only prove the existence of bugs.

shane mattner · 2023-07-29

Thank you @Pavel A. I think I managed to solve the issue by making the reset handler address and associated function pointer `static volatile`, so they no longer live on the stack. The code below seems to work, for now...

static volatile uint32_t reset_handler_address = (APP_FLASH_ADDR + 4);
static volatile void (*app_reset_handler)(void);
void bootloader_start_app(void)
{

	printf("Starting user application...\n\r");
	// Set the Main Stack Pointer (MSP) to the value at the specified address in flash memory
	__set_MSP(*(uint32_t *)APP_FLASH_ADDR);

	// Assign the address of the reset handler to the function pointer
	app_reset_handler = (void *)*(volatile uint32_t *)reset_handler_address;

	// Call the user application's reset handler function to start the application
	app_reset_handler();
}

Here are a few links I referenced, in case anyone else runs into this issue:
https://interrupt.memfault.com/blog/cortex-m-hardfault-debug#bfsr

https://community.st.com/t5/stm32cubeide-mcu/hard-fault-when-running-code-generated-by-stm32cube-ide-for/td-p/175170

https://community.st.com/t5/stm32-mcu-products/after-calling-set-msp-the-local-variables-get-corrupted/td-p/90447