STM32: avoid deadloops when intercepting (expected) ECC errors while reading flash memory

MrJorge · ‎2025-05-09

I'm using STM32 with the HAL and LL drivers (the H7 and G4 families in particular, but I think this is a general question) and I'm trying to avoid getting stuck in repeated faults when reading a flash location with broken ECC.

In my application, a read of a broken flash location can happen.

In TrapHandler I'm able to intercept the error, report it to flash driver, clear error flags, and avoid any reset.

However, when returning from TrapHandler, execution resumes at the same flash read instruction that generated the fault. It reads the same location again, so another fault is generated, in a loop.

Is there a way to continue with the execution in a portable way after encountering this fault?

For a deeper understanding, this is one of the specific use cases in which I would need the above behavior:

When in bootloader, before jumping to application, I calculate the CRC of application flash, compare it with the one stored to another flash location, and jump to application only if they match.

However it may happen that a flash location is broken (e.g. due to an error in the application, or to a sudden power loss while writing/erasing) generating an ECC error.

When encountering this error, I report the faults to the flash driver which results in a failed CRC calculation, and I would like to proceed with my algorithm.

However, even if I'm correctly detecting the error, clearing the flags and returning from the fault handler, the flash driver reads the broken location again, another fault is immediately generated, and I get stuck on the same memcpy instruction forever, without being able to proceed.

This is a simplified code for the flash read, with the Fls_SetEccError called inside TrapHandler to report errors to the flash driver. However, if an ECC error is encountered, I'll get stuck forever in the memcpy operation.

To break the flash ECC, I can perform two writes with different values on the same flash location and then try to read it.

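/* Set from TrapHandler, hence volatile */
volatile bool_t Fls_bEccError = FALSE;

bool_t Fls_Read( uint32_t u32StartAddress, uint32_t u32Length, uint8_t *pu8Buffer ) {
    /* Plain copy from flash; a broken ECC location faults inside memcpy */
    memcpy( pu8Buffer, ( const void * )u32StartAddress, u32Length );
    /* TRUE when no ECC error was reported during the copy */
    return !Fls_bEccError;
}

/* Called from TrapHandler to report an ECC error to the flash driver */
void Fls_SetEccError( void ) {
    Fls_bEccError = TRUE;
}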

Thank you in advance for your help

Tesla DeLorean · ‎2025-05-09

Not sure there's an easy answer. You can check the stack frame to recover the context and advance the instruction pointer based on the opcode. Or recognize it is in this particular loop.
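A minimal sketch of the "advance the instruction pointer based on the opcode" part, assuming the fault is precise so that the stacked PC points at the faulting load. In Thumb-2, the first halfword tells whether the instruction is 16 or 32 bits wide. The helper names `thumb_insn_size` and `thumb_next_pc` are hypothetical, not part of any ST or CMSIS API.

```c
#include <stdint.h>

/* Size in bytes (2 or 4) of the Thumb instruction whose first halfword is
 * hw1. A first halfword whose top five bits are 0b11101, 0b11110 or
 * 0b11111 starts a 32-bit Thumb-2 instruction; anything else is 16-bit. */
static uint32_t thumb_insn_size(uint16_t hw1)
{
    uint16_t top5 = (uint16_t)(hw1 >> 11);
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Hypothetical helper: address of the instruction that follows the one
 * located at pc, given that instruction's first halfword. */
static uint32_t thumb_next_pc(uint32_t pc, uint16_t hw1)
{
    return pc + thumb_insn_size(hw1);
}
```

Skipping the load this way leaves its destination register holding stale data, so the caller still needs a flag from the handler to know the data is invalid.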


TDK · ‎2025-05-09

Consider resetting instead and using a magic value in memory which indicates an error was encountered during a read to flash location X.
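One way this could look, as a rough sketch: the fault handler stores a record in RAM that survives the reset, and the bootloader checks it at startup. The magic value, the type and the function names below are assumptions for illustration. On target the record would have to live in a RAM section that the startup code does not initialise (linker-specific, not shown), and the handler would call `NVIC_SystemReset()` after storing it.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLS_ECC_MAGIC 0xECC0DEADu   /* arbitrary marker value (assumption) */

typedef struct {
    uint32_t magic;     /* FLS_ECC_MAGIC when the record is valid */
    uint32_t address;   /* flash address whose read faulted */
} Fls_EccRecord_t;

/* On target: place in a no-init RAM section so it survives a reset. */
static Fls_EccRecord_t Fls_EccRecord;

/* Called from the fault handler just before requesting the reset. */
void Fls_EccRecord_Store(uint32_t address)
{
    Fls_EccRecord.address = address;
    Fls_EccRecord.magic   = FLS_ECC_MAGIC;
}

/* Called early after reset. Returns true and the failing address if the
 * previous run ended with an ECC error, and consumes the record. */
bool Fls_EccRecord_Take(uint32_t *address)
{
    if (Fls_EccRecord.magic != FLS_ECC_MAGIC) {
        return false;
    }
    *address = Fls_EccRecord.address;
    Fls_EccRecord.magic = 0u;
    return true;
}
```

In the bootloader use case, a `true` result from `Fls_EccRecord_Take` would simply be treated as a failed CRC check.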


Tesla DeLorean · ‎2025-05-09

Not sure there's an SEH (Structured Exception Handler) in the try/catch sense.

The stated issue is that the return simply retries the operation, hoping the handler has fixed the issue, say pulling in virtual memory, or it digs into the context to fix or emulate the opcode that faulted, and advance. Making a general handler would be quite a task, making something relatively selective/specific, perhaps not so.

In that loop you could have it jump to a different location upon return, say breaking from the loop, by modifying the PC in the stacked context.

You have to return, rather than have the handler jump directly, as the machine has to unstack the context and MCU/NVIC internal states.
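A sketch of the frame patching itself, assuming the basic eight-word Cortex-M exception frame (r0-r3, r12, lr, pc, xPSR) and a handler that already has a pointer to it. Obtaining that pointer needs a small asm shim that picks MSP or PSP from EXC_RETURN, which is not shown. `Fault_RedirectReturn` is a hypothetical name.

```c
#include <stdint.h>

/* Word offsets inside the basic exception stack frame. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR
};

/* ICI/IT state lives in xPSR bits [26:25] and [15:10]. */
#define XPSR_ICI_IT_MASK 0x0600FC00u

/* Make the exception return land on resume_addr instead of the
 * instruction that faulted. */
void Fault_RedirectReturn(uint32_t *frame, uint32_t resume_addr)
{
    /* The stacked return address is halfword aligned: drop the Thumb bit
     * that a function or label address taken in C usually carries. */
    frame[FRAME_PC] = resume_addr & ~1u;

    /* Clear any IT-block or interrupted load-multiple state so it is not
     * applied to the first instruction at the new location. */
    frame[FRAME_XPSR] &= ~XPSR_ICI_IT_MASK;
}
```

Only the PC and xPSR change here. SP and the callee-saved registers keep whatever values the interrupted code had, so the resume address should belong to the function that was actually executing the faulting load. That is one reason a hand-written read loop is easier to recover from than a fault inside the library memcpy.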


Pavel A. · ‎2025-05-09

Since you haven't cured the ECC error (erase & rewrite the whole sector), the error condition remains. As Tesla wrote, there's no easy answer. For one, you can mask the memory fault permanently and resume running, leaving the proper fix for later.

waclawek.jan · ‎2025-05-09

If you use asm to read the FLASH (and at least partially also for the ISR), you know which register contains the offending address, so you can increment it or set it to a safe value before returning (modifying the register directly or modifying its copy on the stack, depending on which register it is).

JW

MrJorge · ‎2025-05-14

My target would be to avoid a reset in presence of an ECC error during a Fls_Read, which should only produce E_NOT_OK as result of the corrupted memory location (result which I was able to obtain on other MCUs without complex trap management and stack operations).

Here is a small update based on my research (see the code below):

I found a way to avoid the reset and still detect errors: disable the faults before the flash access and enable them again afterwards, while checking the flash peripheral status registers to detect errors. However, I do not like this approach, since I would be afraid of losing other, possibly dangerous, faults (and because, if my understanding is correct, it would also disable interrupts during the operation).

Std_eReturn_t Fls_Read( uint32_t u32StartAddress, uint32_t u32Length, uint8_t *pu8Buffer )
{
    Std_eReturn_t eRet = E_OK;

    // Set FAULTMASK = 1: raises the execution priority to -1,
    // which masks all exceptions except NMI (interrupts included)
    __set_FAULTMASK( 1 );

    // Set BFHFNMIGN: data BusFaults caused by loads/stores are ignored
    // while running at priority -1 or -2 (FAULTMASK, HardFault, NMI)
    SCB->CCR |= SCB_CCR_BFHFNMIGN_Msk;

    // Flash read simply copies bytes from flash to buffer
    memcpy( pu8Buffer, ( const void * )u32StartAddress, u32Length );

    // TODO: check for an ECC double-bit error and set eRet = E_NOT_OK if found
    // (errors are reported on SR1/2 and on ECC_FA1/2)

    // Clear BFHFNMIGN bit
    SCB->CCR &= ~( SCB_CCR_BFHFNMIGN_Msk );

    // Set FAULTMASK = 0: exceptions are taken normally again
    __set_FAULTMASK( 0 );

    return eRet;
}

A suitable alternative, which seems far less dangerous, would be to insert a label just after the memcpy operation in the Fls driver and, inside TrapHandler, modify the return point after fault management so that the application software resumes from that label (and not from the memcpy operation) if the fault is recognized as belonging to Fls_Read.

However, I was not able to change the next instruction to be executed from inside TrapHandler, so that the program counter resumes from a different point after fault management.

If any of you know how to accomplish this, it would be very helpful.

Thank you again for your time


waclawek.jan · ‎2025-05-17

Interesting, I wasn't aware of the existence of the SCB_CCR.BFHFNMIGN feature/bit. Thanks.

And your NMI handler was just a return, then?

> A suitable alternative, which seems far less dangerous, would be to insert a label just after the memcpy operation in the Fls driver and, inside TrapHandler, modify the return point after fault management so that the application software resumes from that label (and not from the memcpy operation) if the fault is recognized as belonging to Fls_Read.

As I've said above (although I suggested something slightly different: just avoiding the repeated reads, and possibly setting some flag), doing this sort of thing implies using asm, as it's straight against the grain of any higher-level language.

JW

Pavel A. · ‎2025-05-17

> as it's straight against the grain of any higher-level language.

Maybe this can be implemented on the C level using setjmp/longjmp (exit the NMI exception to a trampoline that will call longjmp)
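A rough sketch of how that could look, with assumptions stated up front: the NMI/fault handler rewrites the stacked PC so that the exception returns into `Fls_RecoverTrampoline`, which then runs in thread mode and calls `longjmp`. The handler side is not shown, and all names below are hypothetical.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

jmp_buf Fls_RecoverEnv;                   /* context saved before the read */
volatile bool Fls_RecoverArmed = false;   /* true only while a read runs */

/* Target of the patched exception return. It must never return normally,
 * because it was entered without a valid return address. */
void Fls_RecoverTrampoline(void)
{
    if (Fls_RecoverArmed) {
        longjmp(Fls_RecoverEnv, 1);
    }
    for (;;) {
        /* Fault outside a guarded read: nothing safe to resume to. */
    }
}

/* Copies len bytes; returns false if the trampoline aborted the copy. */
bool Fls_ReadGuarded(void *dst, const void *src, size_t len)
{
    if (setjmp(Fls_RecoverEnv) != 0) {
        /* Arrived here through longjmp: the read faulted. */
        Fls_RecoverArmed = false;
        return false;
    }
    Fls_RecoverArmed = true;
    memcpy(dst, src, len);
    Fls_RecoverArmed = false;
    return true;
}
```

Because `longjmp` restores SP and the callee-saved registers saved by `setjmp`, this avoids the stack problem of resuming at a label after a fault inside memcpy. The handler should only redirect to the trampoline when `Fls_RecoverArmed` is set.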

TDK · ‎2025-05-18

There was another thread recently where the user said that after exiting NMI, the program jumped to the code after the command which produced the ECC error. I wonder which is correct, or what the difference in results is attributed to.
