Custom bootloader on the STM32L4

Stanford · 2023-06-10

Here's how I implemented a custom bootloader using the STM32L475. I thought I would share since I have seen so many posts from people having issues invoking their application from their bootloader.

I attempted to post this as an idea in the knowledgebase, however it kept giving me an invalid link.

In my case, the bootloader is located at 0x08000000-0x0801FFFF (128K). My bootloader uses interrupts, the UART, timers, I2C, and the USB host (MSC class), so that application code can be downloaded either via a proprietary UART protocol or via a .hex file on a USB thumb drive. All of that easily fits within the 128K space.

My application code occupies the remaining 896KB, starting at 0x08020000. I defined an application header at 0x08020000-0x080207FF (2KB) in the linker file, so technically my application starts at 0x08020800. The application header contains the firmware version and a CRC hash for the entire application space.

The bootloader will look at those two items to decide if there is a valid application. The bootloader will compute the CRC hash through application space and compare the result with the CRC stored in the application header. If they match, then the application is valid.
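The validity check described above could be sketched roughly as follows. This is only an illustration, not the code from my bootloader: the CRC-32 variant (the zlib polynomial), the position of the CRC at header offset 8 (assuming .fw_version is placed first) and the function names are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From the post: a 2 KB header precedes the application image. */
#define APP_HDR_SIZE        0x800u
/* Assumption: .fw_version (4 x uint16_t) comes first, so .crc sits at offset 8. */
#define APP_HDR_CRC_OFFSET  8u

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (assumed variant). */
static uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Read the stored CRC from the header as a little-endian 32-bit value (assumption). */
static uint32_t app_stored_crc(const uint8_t *region)
{
    const uint8_t *p = region + APP_HDR_CRC_OFFSET;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* region points at the header start; region_len covers header plus image. */
static bool app_is_valid(const uint8_t *region, size_t region_len)
{
    if (region_len <= APP_HDR_SIZE) {
        return false;
    }
    uint32_t computed = crc32_compute(region + APP_HDR_SIZE,
                                      region_len - APP_HDR_SIZE);
    return computed == app_stored_crc(region);
}
```

On the target, region would be (const uint8_t *)0x08020000 and region_len the full 896KB. An erased header reads back as 0xFFFFFFFF, which will not match the computed CRC of a real image.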

(Note: if an application firmware update is needed, the application "erases itself" simply by erasing the 2KB flash page holding its header (0x08020000-0x080207FF) and then resetting the chip via HAL_NVIC_SystemReset(). This causes the bootloader's CRC hash check to fail, so the bootloader stays resident and waits for instructions over the UART. At startup it also enables USB Host MSC to look for a USB thumb drive and a possible .hex file.)
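A minimal sketch of that "erase the header and reset" step, assuming the standard STM32L4 HAL flash API and 2KB pages. The helper names are hypothetical, and the HAL part only compiles inside a CubeMX project (where USE_HAL_DRIVER is defined):

```c
#include <stdint.h>

#define FLASH_BASE_ADDR   0x08000000u  /* start of internal flash */
#define FLASH_PAGE_BYTES  0x800u       /* 2 KB flash page */
#define APP_HEADER_ADDR   0x08020000u  /* application header, from the post */

/* Page index (counted from the start of bank 1) of a flash address. */
static uint32_t flash_page_index(uint32_t addr)
{
    return (addr - FLASH_BASE_ADDR) / FLASH_PAGE_BYTES;
}

#ifdef USE_HAL_DRIVER   /* skipped when building on a host PC */
#include "stm32l4xx_hal.h"

/* Hypothetical helper: erase the header page, then reset into the bootloader. */
void app_request_firmware_update(void)
{
    FLASH_EraseInitTypeDef erase = {0};
    uint32_t page_error = 0;

    erase.TypeErase = FLASH_TYPEERASE_PAGES;
    erase.Banks     = FLASH_BANK_1;   /* assumes 0x08020000 is in bank 1 */
    erase.Page      = flash_page_index(APP_HEADER_ADDR);
    erase.NbPages   = 1;

    HAL_FLASH_Unlock();
    if (HAL_FLASHEx_Erase(&erase, &page_error) == HAL_OK) {
        HAL_FLASH_Lock();
        HAL_NVIC_SystemReset();       /* does not return */
    }
    HAL_FLASH_Lock();                 /* erase failed: stay in the application */
}
#endif
```

With the addresses above, the header lands on page 64.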

In the system_stm32l4xx.c file within the application code, make sure to uncomment:

#define USER_VECT_TAB_ADDRESS

and then set the following for the vector table base and offsets (in my case):

#define VECT_TAB_BASE_ADDRESS  0x08020000U

and

#define VECT_TAB_OFFSET     0x00000800U
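As a quick check on these two values: SystemInit() combines them into the address written to SCB->VTOR, and that address has to respect the Cortex-M4 alignment rule for the vector table (table size rounded up to a power of two, at least 128 bytes). The sketch below assumes 98 table entries (16 system exceptions plus 82 interrupts); the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define VECT_TAB_BASE_ADDRESS  0x08020000u
#define VECT_TAB_OFFSET        0x00000800u

/* The value that ends up in SCB->VTOR. */
static uint32_t vtor_value(void)
{
    return VECT_TAB_BASE_ADDRESS | VECT_TAB_OFFSET;
}

/* Required alignment in bytes: entries * 4, rounded up to a power of two, minimum 128. */
static uint32_t vtor_required_alignment(uint32_t num_vectors)
{
    uint32_t align = 128u;
    while (align < num_vectors * 4u) {
        align <<= 1;
    }
    return align;
}

/* True if the vector table address satisfies the alignment rule. */
static bool vtor_is_aligned(uint32_t vtor, uint32_t num_vectors)
{
    return (vtor & (vtor_required_alignment(num_vectors) - 1u)) == 0u;
}
```

For 98 entries the required alignment is 512 bytes, so a 2KB header offset (0x800) is fine. A header size that is not a multiple of 512 would not be.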

In the application linker file:

MEMORY
{
  RAM     (xrw)  : ORIGIN = 0x20000000,  LENGTH = 96K 
  RAM2    (xrw)  : ORIGIN = 0x10000000,  LENGTH = 32K  
  APP_HEADER (xr)   : ORIGIN = 0x08020000,  LENGTH = 2K
  FLASH    (rx)   : ORIGIN = 0x08020800,  LENGTH = 894K
}
 
.fw_version :
{
 KEEP (*(.fw_version))
 . = ALIGN(4) ;
} >APP_HEADER
.crc :
{
 KEEP (*(.crc))
 . = ALIGN(4) ;
} >APP_HEADER

Note that my bootloader linker file is simply the default. And USER_VECT_TAB_ADDRESS remains commented out.

Somewhere in your application C/C++ code, define the fw version and CRC storage for the application header.

const uint16_t __attribute__((section(".fw_version"))) versionInfo[4] =
 { VERSION_FW_MAJOR, VERSION_FW_MINOR };
const uint16_t __attribute__((section(".crc"))) crc[2] =
 { 0x0000, 0x0000 };

Compile/link-time computation and insertion of the CRC/version into the application .hex is beyond the scope of this post, but it involves an external command-line program (run from the post-build steps of my application project) that computes the CRC of the application.hex file and inserts it into application.hex at the address of the .crc/.fw_version sections within the application header. The post-build steps then invoke another program (hexmate) to merge the bootloader.hex and application.hex files, and the resulting .hex file is supplied to the contract manufacturer for board programming.

In the application, the initial stack pointer value is always the first 32 bits at 0x08020800. The reset vector is at 0x08020804, from which the bootloader will fetch the jump address.

The application invocation within the bootloader is simply:

#define BOARD_APP_FLASH_BASE_ADDRESS  (0x08020000)
#define BOARD_APP_FLASH_HDR_SIZE    (0x800)
 
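HAL_DeInit();
/* Load the application's initial stack pointer from the first word of its vector table. */
__set_MSP(*(__IO uint32_t *)(BOARD_APP_FLASH_BASE_ADDRESS + BOARD_APP_FLASH_HDR_SIZE));
/* The second word is the application's reset vector. */
void (*appResetVector)(void) = (void (*)(void))(*(__IO uint32_t *)(BOARD_APP_FLASH_BASE_ADDRESS +
   BOARD_APP_FLASH_HDR_SIZE + 4));
appResetVector();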

Note that in my case the initial MSP values of the bootloader and the application happen to be the same, but you should not assume that. For safety, I set the MSP explicitly from the value stored at 0x08020800. The application assumes that the SP has been reset as it would be at power-on/reset, which lets it use the full stack size it has defined. Without the __set_MSP() call, the SP at the jump to appResetVector() would be whatever the bootloader had left it at, and the application would start from that location within the stack space. Just be aware of this: if your application invocation routine is deeply nested in bootloader code with a lot declared on the stack, skipping this step effectively takes that stack space away from the application.
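Before setting the MSP and jumping, a bootloader can also sanity-check the two words it is about to use. The following is a sketch, not part of my bootloader: the helper name is hypothetical, and it assumes the application stack lives in the 96K RAM region from the linker file above.

```c
#include <stdbool.h>
#include <stdint.h>

#define APP_VECTOR_ADDR  0x08020800u  /* header base + header size, from the post */
#define APP_FLASH_END    0x08100000u  /* one past the end of the 1 MB flash */
#define SRAM1_START      0x20000000u  /* RAM origin from the linker file */
#define SRAM1_END        0x20018000u  /* 0x20000000 + 96K */

/* Hypothetical helper: check the first two vector-table words before using them. */
static bool app_vectors_look_sane(uint32_t initial_sp, uint32_t reset_vector)
{
    /* The stack grows downward, so the initial SP may equal the end of RAM. */
    bool sp_ok = (initial_sp > SRAM1_START) && (initial_sp <= SRAM1_END) &&
                 ((initial_sp & 3u) == 0u);
    /* Cortex-M code addresses carry the Thumb bit (bit 0 set). */
    bool thumb_ok = (reset_vector & 1u) != 0u;
    uint32_t entry = reset_vector & ~1u;
    bool pc_ok = (entry >= APP_VECTOR_ADDR) && (entry < APP_FLASH_END);
    return sp_ok && thumb_ok && pc_ok;
}
```

Erased flash reads back as 0xFFFFFFFF for both words and is rejected by the stack pointer check. If the application places its stack in RAM2 (0x10000000), the range check would need to be extended.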

Also, there is no need to globally disable interrupts (such as with __disable_irq()) before invoking the application vector. The HAL_DeInit() call forces all peripherals to their reset state. After that function executes, global interrupts are still enabled, but no peripherals will interrupt since they are in their power-on/reset state. There is also no need to call __enable_irq() at startup within your application, since interrupts are enabled by default at power-on/reset of the processor.

Both the bootloader and the application projects were generated with STM32CubeMX. I only use the HAL and do not call LL directly or poke at any registers, except for the bootloader snippet above (__set_MSP).

(Note: my application uses all five (5) UART/USARTS peripherals using IT/DMA, GPIOs, I2C, ADC channels using DMA, RTC with VBAT, and all classes of USB activated (but only use MSC and HID). Pretty much a fully-loaded chip. The bootloader uses all of the above except ADC).

Hope this helps and provides some insight.

I welcome any feedback from the above. This is my first bootloader on the STM32.

JJordan · 2024-03-02

Really clean and concise, thank you.
In your bootloader, did you define HAL_MspDeInit() to do any additional user-defined, peripheral-specific DeInit actions?

tjaekel · 2024-03-02

A pretty cool and well-done description of what a bootloader would do. Nice job.

How do you activate such a bootloader?
And yes, as JJordan has asked: how do you make sure all already-used pins and peripherals are DeInitialized (to avoid conflicts with the bootloader)?

But why develop your own bootloader when there is already one in the chip (the BOOT pin)?

For a "reliable" bootloader you might want these features:

Have "magic" code in the image to flash, so that you know: "yes, it is intended for this platform".
Check the size of the image to flash: if it is too large, it does not make sense to start (this needs "image size" information in the BIN file itself).
Potentially, you could check whether the image to flash has a "valid" vector table, i.e. whether it starts "reasonably".
And you need "flow control" to receive the image file:
It can be large (larger than you can store entirely in internal SRAM). So you also need the "counterpart" on the PC, e.g. a tool like "dfu-util.exe":
The bootloader receives a packet of the new code image, stores it in SRAM, erases the internal MCU flash sector and programs it with the new content.
All this takes time: you have to "pause" the transmission of packets over the course of the entire BIN file. That means flow control between the flash tool on the PC ("dfu-util.exe") and the bootloader running inside the MCU.
And how do you decide whether the bootloader should be activated on every reset and power-up?
You need something to determine when to activate the bootloader.
For instance, the Portenta-H7 Arduino bootloader has such a feature: press the reset button twice to activate the bootloader. It uses registers in the MCU which are not reset, such as the RTC BKP registers.
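To illustrate the last point, here is a minimal sketch of a boot decision based on a value that survives reset. It is not the Portenta-H7 implementation; the magic value and names are invented for the example. On an STM32 with the HAL, the register value could be read with HAL_RTCEx_BKUPRead().

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary, hypothetical marker written to a backup register to request the bootloader. */
#define BOOT_MAGIC_STAY  0xB007104Du

typedef enum {
    BOOT_RUN_APP,
    BOOT_STAY_IN_BOOTLOADER
} boot_action_t;

/* Decide what to do at startup from the backup register value and the image check. */
static boot_action_t boot_decide(uint32_t backup_reg, bool app_valid)
{
    if (backup_reg == BOOT_MAGIC_STAY) {
        return BOOT_STAY_IN_BOOTLOADER;   /* explicit request wins */
    }
    return app_valid ? BOOT_RUN_APP : BOOT_STAY_IN_BOOTLOADER;
}
```

The bootloader would clear the register once it has seen the request, so that the next reset starts the application again.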

You can have a look at the Portenta-H7 Arduino bootloader (the source code has since been published).

I would say a "nicely working" bootloader is a very tough thing to implement. You did it very well, especially in describing how it works. But do you also cover the other issues, like programming via USB or UART with flow control, and checking whether the image is "potentially" correct? And what about the counterpart needed on the PC side?

BTW: the STM32 internal bootloader (activated via BOOT) works pretty well. The Arduino bootloader, e.g. for the Portenta-H7 (an STM32H7xx MCU), also works pretty well. So, why do I need a "new" bootloader?

And when it comes to MCUs with "TrustZone" (security): how do you handle the secure vs. non-secure parts?