How to perform independent CRC on internal flash in STM32H7 (avoiding flash ECC / bus fault)?

MWalk.3 · ‎2021-06-25

I have a bootloader that attempts to verify the integrity of an application image before it transfers execution to the application. It does this by performing a CRC (in software) on the image. However, if the internal flash is corrupted somewhere inside the application in a way that results in a double ECC error, then a bus fault exception is thrown as soon as the CRC reads that location.

I've tried several methods of handling the bus fault, but so far none have worked. Does anyone have a suggestion on how to handle ECC errors in internal flash?

Things I've Tried that Did not Work

Skip the Offending Instruction

In the bus fault handler, increment the application's stacked program counter and return. This unfortunately didn't work because the CRC algorithm compiled into a single instruction that loads the offending location and post-increments the pointer at the same time (LDRB.W r2, [r3], #1). Skipping that instruction means the pointer never advances, so the loop reads the same bad location again and we just end up in an infinite loop. Although I could rewrite the algorithm to avoid this issue, it would likely become something specialized just for this chip family and issue, which is something I wish to avoid.

Enable Internal Flash Interrupts

In the hope that enabling the flash interrupts (for both double and single ECC errors) would somehow circumvent the bus fault, I configured them. Unfortunately this just results in the bus fault first, then the interrupt handler, and still no way to correct the issue or make forward progress.

Disable Interrupts/Faults

Pavel A in https://community.st.com/s/question/0D53W00000Jkw8ZSAR/stm32h7-flash-write-returns-ok-yet-hard-fault-during-readback suggested that disabling faults with a "cpsid if" might work. Unfortunately that just leads to a double fault and lockup.

Things I'll Try Next

Using the flash CRC engine to do a CRC precheck. If it reports a CRC failure instead of raising a bus fault, then I can tell whether the memory region is safe to access.
Issue a flash erase command inside the bus fault handler.

Reproducing this Issue

If anyone wants to play, the easiest way I've found of causing this corruption is to double program a flash word.

I ran across this forum post as well, but as far as I can tell no one has a good answer there either: https://community.st.com/s/question/0D50X0000AX8Hm3SQF/stm32h7-internal-flash-error.

MWalk.3 · ‎2021-06-25

I have a small update; using the CRC engine inside the flash does not result in any error being generated while it is reading over the affected region. As in, the CRC operation completes successfully (CRCEND is set and CRCRDERR is clear), no interrupts are generated, and no status flags are raised in SR.

I could change the protective CRC to an Ethernet CRC (which is what the engine performs) but then my application build process for just the H7 will be different. Doable, but extremely annoying.

waclawek.jan · ‎2021-06-26

I don't quite understand what you want to accomplish. Do you want to continue with the CRC calculation regardless of the fault?

I would set a flag in the fault handler, then check this flag in the CRC calculation loop after reading a word from memory and abort the loop if the flag is set.
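
A minimal sketch of the flag-and-abort loop suggested above. The names `g_flash_fault` and `crc32_guarded` are hypothetical, and the CRC parameters (bitwise reflected CRC-32, polynomial 0xEDB88320) are an assumption, since the thread does not say which CRC the bootloader uses. It also assumes the bus fault handler sets the flag and arranges for execution to continue past the faulting load; as reported elsewhere in this thread, simply returning from the handler re-executes the load.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: the bus fault handler is assumed to set it to true. */
volatile bool g_flash_fault = false;

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320); assumed, not from the thread.
 * Returns false, leaving *crc_out untouched, if a fault was flagged mid-read. */
bool crc32_guarded(const volatile uint8_t *data, size_t len, uint32_t *crc_out)
{
    uint32_t crc = 0xFFFFFFFFu;

    g_flash_fault = false;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = data[i];      /* this read may trigger the bus fault */
        if (g_flash_fault) {
            return false;            /* byte is not trustworthy; give up on the image */
        }
        crc ^= byte;
        for (int bit = 0; bit < 8; bit++) {
            uint32_t mask = 0u - (crc & 1u);   /* all ones if the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    *crc_out = crc ^ 0xFFFFFFFFu;
    return true;
}
```

Because both the data pointer and the flag are volatile, the compiler has to perform the read and then re-check the flag on every iteration, so the loop exits on the first flagged byte even if the load and the pointer increment are emitted as one instruction.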

JW

Pavel A. · ‎2021-06-26

> Pavel A in https://community.st.com/s/question/0D53W00000Jkw8ZSAR/stm32h7-flash-write-returns-ok-yet-hard-fault-during-readback suggested that disabling faults with a "cpsid if" might work. Unfortunately that just leads to a double fault and lockup.

Sorry, I forgot to mention: this trick requires enabling a separate handler for the bus fault or usage fault (whichever happens), so that it won't immediately escalate to a hard fault.

Which STM32H7 do you have? Some older models have errata (2.2.17) that simply says:

Workaround

Do not use the Flash memory CRC calculation feature

-- pa

MWalk.3 · ‎2021-06-28

waclawek -- In my application I wish to either continue the CRC calculation regardless of ECC errors or escape the CRC calculation so that I can return an error. Detecting that the fault is precise, happened in the flash memory space, was an ECC error, and ideally that it occurred in the CRC function, then setting a flag in the fault handler, incrementing the thread PC, and returning to the thread will work, but that's somewhat complex and invasive. That being said, it's looking increasingly like that's my best path forward.
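
The checks listed above could be sketched roughly as follows. This is an illustration under assumptions, not a tested handler: `recover_flash_read_fault`, `g_flash_fault` and the flash address window are hypothetical names and values, while the CFSR bit positions (PRECISERR at bit 9, BFARVALID at bit 15) and the stacked PC slot (word 6 of the exception frame) follow the Armv7-M architecture. A real handler would read SCB->CFSR and SCB->BFAR, locate the stacked frame from MSP or PSP, fetch the halfword at the stacked PC, and clear the fault status bits afterwards. The sketch does not check that the fault came from the CRC function or that it was specifically an ECC error.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFSR_PRECISERR  (1u << 9)   /* BFSR.PRECISERR as seen in CFSR */
#define CFSR_BFARVALID  (1u << 15)  /* BFSR.BFARVALID as seen in CFSR */
#define FRAME_PC        6           /* frame order: R0,R1,R2,R3,R12,LR,PC,xPSR */

/* Assumed flash window (2 MB at 0x08000000); check the reference manual. */
#define FLASH_START 0x08000000u
#define FLASH_END   0x08200000u

volatile bool g_flash_fault = false;  /* hypothetical flag polled by the CRC loop */

/* A Thumb instruction is 32 bits wide when bits [15:11] of its first
 * halfword are 0b11101, 0b11110 or 0b11111; otherwise it is 16 bits. */
static uint32_t thumb_insn_size(uint16_t first_halfword)
{
    return ((first_halfword >> 11) >= 0x1Du) ? 4u : 2u;
}

/* Returns true if the fault was a precise bus fault on a flash address.
 * In that case the flag is set and the stacked PC is moved past the
 * faulting instruction, whose first halfword is passed in as insn. */
bool recover_flash_read_fault(uint32_t cfsr, uint32_t bfar,
                              uint32_t *frame, uint16_t insn)
{
    bool precise = (cfsr & CFSR_PRECISERR) && (cfsr & CFSR_BFARVALID);

    if (!precise || bfar < FLASH_START || bfar >= FLASH_END) {
        return false;                 /* not ours; let the normal fault path run */
    }
    g_flash_fault = true;
    frame[FRAME_PC] += thumb_insn_size(insn);
    return true;
}
```

Skipping the load leaves its destination register and any post-incremented pointer unchanged, so the interrupted code has to check the flag before using the value it thinks it just read.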

Pavel -- I do have separate handlers for the faults, if I don't disable them I see that I'm in the Bus Fault handler. But if I disable the fault handlers with the "cpsid f" then the system enters lockup because it has nothing to handle the fault. Unless I'm misreading what you suggested in the other thread, it seemed like you managed to get the CPU to retire the offending instruction in the offending thread, and proceed in the offending thread without escalating to lockup, but while leaving the hardfault pending. I might be missing a system configuration register setting, but my system does not behave that way. Mine will not retire the instruction without proceeding to the hardfault handler, and only then because I was explicitly incrementing the PC. Just returning from the fault handler attempts the offending instruction again and we just bounce between the application and the fault handler.

I have to make this work on STM32H743, STM32H753, and STM32H7A3. For the 74 and 75 we're in the latest stepping.

It turns out the built-in flash CRC calculation won't work for my needs anyway, mostly because it returns a CRC without having checked the ECC data, so I'm still at risk of hard faults when I attempt to use the data.

If wishes were hardware; I wish there was a bit in the flash that I could set to have it just return whatever data it had regardless of error correction, and then throw a flash interrupt for me to do something else if I need to. Much like how the RAM ECC works.

Pavel A. · ‎2021-06-28

Well then this is bad news for me. I hoped that enabling a separate bus fault handler would prevent escalation to a hard fault...

What if you move the offending code *and vectors* to RAM?

--pa