How does one design a circuit where an STM32H7 is never unrecoverable?

NAbun.1 · ‎2022-10-14

Per section 2.6 of RM0433 (https://www.st.com/resource/en/reference_manual/rm0433-stm32h742-stm32h743753-and-stm32h750-value-line-advanced-armbased-32bit-mcus-stmicroelectronics.pdf), when the STM32H7 boots up it latches the boot pin and then goes to the memory address that is coded in the BOOT_ADDx option bytes.

At the factory they are set to the following: BOOT_ADD0 is somewhere in flash memory, and BOOT_ADD1 is set to system memory's system bootloader.

Ideally, I think one would never change BOOT_ADD1, so that the system bootloader always stays reachable if the flash becomes corrupted during an over-the-air reprogramming. However, given that BOOT_ADD1 is modifiable, if one were to modify BOOT_ADD1 to also point to some corrupted flash memory, how would one recover the chip at that point?

The easy answer is to make unmodifiable code, but I want to design something that is robust to egregious failures.

I want to make a schematic where we can always recover the chip in the event of corruption (because there will be a lot of over the air programming with a possibility for failure) and I am wondering if that is even possible?

Pavel A. · ‎2022-10-16

> The easy answer is to make unmodifiable code, but I want to design something that is robust to egregious failures.

Go for unmodifiable code. Else there always remains a way to brick the MCU.

Danish1 · ‎2022-10-16

It is always possible to intentionally make a device unrecoverable - just erase or put bad code into a device, write-protect it, then set code readout protection to level 2.

How “invasive” can you be when trying to recover a device that has (for whatever reason) stopped functioning? My personal preference is to have some way to attach a JTAG or SWD interface e.g. STLINK and not rely on ST’s bootloader. I don’t expect a customer to be able to un-brick a device, but one that is brought back to the factory can be reprogrammed by me.

What failure(s) are you trying to guard against?

1. Incomplete or corrupt download over-the-air?
2. Incomplete writing of the downloaded firmware into FLASH
3. Buggy/bad code being downloaded

If you have enough nonvolatile memory to separate the downloading from writing into main “executing” FLASH then a small, and never updated, bootloader of your own can protect against 1 and 2. It needs to be able to check if freshly downloaded code is good and complete e.g. by checksums. And if it is good, whether it is identical to what is in main FLASH.

All the bootloader does is: If download is incomplete/corrupt/same - run existing code. Otherwise copy download into main FLASH. Then execute main FLASH.
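The decision described above could be sketched roughly as follows. This is only an illustration that runs on plain buffers, not real FLASH access: the names `staged_image_t`, `boot_decide` and `crc32_calc` are made up for the example, and the CRC-32 is just one possible choice of checksum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcome of the bootloader's check; names are illustrative. */
typedef enum {
    BOOT_RUN_EXISTING,  /* download incomplete, corrupt or identical: leave main FLASH alone */
    BOOT_COPY_THEN_RUN  /* download is good and different: copy it to main FLASH, then run */
} boot_action_t;

/* Hypothetical description of the staged download (assumption, not an ST structure). */
typedef struct {
    const uint8_t *data;   /* where the download was stored */
    size_t expected_len;   /* length announced by the sender */
    size_t received_len;   /* bytes actually stored */
    uint32_t expected_crc; /* checksum announced by the sender */
} staged_image_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), standing in for "checksums". */
uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Decide what to do at boot, given the staged download and the current main image. */
boot_action_t boot_decide(const staged_image_t *staged,
                          const uint8_t *main_flash, size_t main_len)
{
    /* Failure 1: incomplete download. */
    if (staged->expected_len == 0 || staged->received_len != staged->expected_len) {
        return BOOT_RUN_EXISTING;
    }
    /* Failure 1 again: corrupt download. */
    if (crc32_calc(staged->data, staged->received_len) != staged->expected_crc) {
        return BOOT_RUN_EXISTING;
    }
    /* Same as what is already in main FLASH: nothing to do. */
    if (staged->received_len == main_len &&
        memcmp(staged->data, main_flash, main_len) == 0) {
        return BOOT_RUN_EXISTING;
    }
    return BOOT_COPY_THEN_RUN;
}
```

Because the staged copy is compared against main FLASH at every boot, an interrupted copy (failure 2) leaves the two different and is simply repeated on the next boot.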

If you have enough spare FLASH on your stm32 then the download can be into that piece of FLASH. But if you put it on removable storage e.g. a micro-SD card then it opens up an upgrade/recovery path of “put in an SD card with good firmware” rather than having to be over-the-air. And this can recover from 3 as well as 1 and 2.

Hope this helps,

Danish

NAbun.1 · ‎2022-10-18

Hello Pavel and Danish,

Regardless of intention, the hope is to make a design as robust as possible.

I am trying to protect against

1. Corrupt download over the air
2. Corrupt writing of downloaded firmware into flash
3. Download and writing of incorrect (but technically valid from a CRC perspective) firmware that causes a crash (buggy code)

I will not have physical access to the boards, so I need a robust recovery method that is possible to perform remotely (using only the boot pin and the reset pin, and a bootloader UART, unless there are other pins that would help), so no SD card.

Essentially, I want to have two images in flash with the ability to automatically switch between them in the event of one of them crashing. Let's say that image A is the default image and we perform a successful over the air update of image B. We verify the boot functionality of image B by changing the BOOT_ADD0 to the address of image B. While in image B, we want to now perform the update to image A. A possible case is that we complete a download of the incorrect (older but valid) firmware, change BOOT_ADD0 back to image A, reset, and then crash due to the out of date firmware on image A. With the boot pin low, we are stuck on this crashing image A.

The recovery method here would then be to use boot pin to invoke the system bootloader, and try and flash the less robust way through the system bootloader. I would rather have this be a fail-safe, not the first line of defense.

If the flash had infinite write endurance, one could make a small application specific bootloader that just always alternates between the images by flipping a bit at each boot. But the flash only has 10k cycles of endurance so for applications that may be turning on and off many times a day, it isn't viable. One thought I had was for the small application bootloader to use the random number generator to choose between the images and then you would have a good chance of loading the "good image" after having programmed a "bad" one, but that isn't a 100% foolproof method.
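To put rough numbers on both ideas, here is a small back-of-the-envelope sketch. It assumes one flash write cycle is consumed per boot for the bit-flip scheme, and a fair 50/50 choice between a good and a bad image for the random scheme; the function names are made up for the example.

```c
#include <stdint.h>

/* Days until the endurance budget is used up, assuming one write cycle per boot. */
uint32_t days_until_endurance_exhausted(uint32_t endurance_cycles,
                                        uint32_t boots_per_day)
{
    if (boots_per_day == 0) {
        return UINT32_MAX; /* never boots, never wears out */
    }
    return endurance_cycles / boots_per_day;
}

/* Probability that a fair random pick has chosen the bad image on every one
 * of `boots` consecutive boots, i.e. the good image has not run yet. */
double prob_good_image_not_yet_booted(unsigned boots)
{
    double p = 1.0;
    for (unsigned i = 0; i < boots; i++) {
        p *= 0.5;
    }
    return p;
}
```

With 10k cycles and, say, 20 power cycles a day, the bit-flip scheme would use up the budget in 500 days. With the random pick, the chance of still being stuck on the bad image after 10 boots is 1/1024: small, but never zero.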

I also don't know if there is a way to use the watchdogs to modify flash and then that could be a way to only modify the bit that causes you to swap between images when there is a crash (which would hopefully be less than 10k times), but I couldn't see a way.

Another thought is to change the BOOT_ADDx values to use the boot pin as an image selector. Then if we get into the situation above, we could just toggle the boot pin to get back to the good image. But this would mean that if we have code that works on boot, but has a buggy edge case that manifests much later after programming and starts to continually crash, we don't have a way to access the system bootloader, since we essentially took away the bootloader-selection purpose of the boot pin.

I'm not entirely sure how JTAG flashing works since programmers like the ST-Link don't use the boot pin and therefore have some other way of accessing the flash, but maybe something like that would be helpful (but I also don't want to have to implement another micro to use JTAG).

Pavel A. · ‎2022-10-18

> One thought I had was for the small application bootloader to use the random number generator to choose between the images

You can make an external circuit to toggle BOOT0. It can also serve as a watchdog to detect failure to start the app or failure of the bootloader. Then it can automatically toggle BOOT0 and reset.
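As a rough model of what such an external circuit would do (a simulation of the logic only, not a hardware design), assume the MCU must send a heartbeat within some timeout, and that a missed heartbeat makes the circuit flip BOOT0 and pulse reset. All names and the timeout value are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state of the external supervisor circuit. */
typedef struct {
    uint32_t timeout_ms;         /* heartbeat deadline */
    uint32_t ms_since_heartbeat; /* time since the MCU last signalled it is alive */
    bool boot0_high;             /* level currently driven onto BOOT0 */
    uint32_t resets_issued;      /* number of reset pulses generated so far */
} supervisor_t;

/* The MCU signalled that the bootloader or app is running normally. */
void supervisor_heartbeat(supervisor_t *s)
{
    s->ms_since_heartbeat = 0;
}

/* Advance time. Returns true if the deadline was missed, in which case the
 * supervisor flips BOOT0 and issues a reset so the other boot address is tried. */
bool supervisor_tick(supervisor_t *s, uint32_t elapsed_ms)
{
    s->ms_since_heartbeat += elapsed_ms;
    if (s->ms_since_heartbeat < s->timeout_ms) {
        return false;
    }
    s->boot0_high = !s->boot0_high;
    s->resets_issued++;
    s->ms_since_heartbeat = 0; /* restart the deadline after the reset */
    return true;
}
```

Each missed deadline alternates BOOT0, so a start-up that hangs on one boot address is retried on the other without any flash write.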

But, as you know, both boot addresses BOOT_ADDx can be changed by code, so much for using the BOOT0 pin.

>I'm not entirely sure how JTAG flashing works since programmers like the ST-Link don't use the boot pin and therefore have some other way of accessing the flash

The debugger block of STM32 gives full access to almost everything. It is convenient for development and recovery, but for the end product it is a wide-open backdoor.

Think well whether you want to allow it in the production state.

Newer STM32s have password-protected entry to debug mode, which makes the debugger interface attractive for recovery - but not STM32H7, unfortunately.

NAbun.1 · ‎2022-10-21

Hello Pavel. I was thinking about using an external flip-flop to toggle a GPIO (not the boot pin, which I would save for the actual system bootloader) that an application bootloader could look at before choosing which image to jump to. The problem is that it is extra hardware that would have to be included in every circuit this micro is in.
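A minimal sketch of that selection logic, assuming the flip-flop output toggles once per reset and the application bootloader maps the pin level directly to an image (the names and the pin-to-image mapping are made up for the example):

```c
#include <stdbool.h>

typedef enum { IMAGE_A, IMAGE_B } image_t;

/* Application bootloader: choose the image from the selector GPIO level. */
image_t select_image(bool selector_pin_high)
{
    return selector_pin_high ? IMAGE_B : IMAGE_A;
}

/* External flip-flop: its output toggles on every reset. */
bool flip_flop_after_reset(bool q)
{
    return !q;
}

/* Count how many resets are needed before the good image is selected,
 * starting from flip-flop state q with exactly one bad image. */
unsigned resets_until_good_image(bool q, image_t bad_image)
{
    unsigned resets = 0;
    while (select_image(q) == bad_image) {
        q = flip_flop_after_reset(q);
        resets++;
    }
    return resets;
}
```

In this model one reset is always enough to leave a crashing image, and no flash is written to do it.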

Regarding the debugger block, I don't mind having a wide open backdoor in this case because the boards this circuit would be for are for applications where no person can access it (which is why I am trying to make it so robust to begin with).