Bus fault Accessing flash

andy_long · 2024-02-15

Hi,

MCU: STM32H7

I get a bus fault exception when I read bank 2 flash memory. I can read bank 1 with no issues. The flash status register SR2 shows an ECC double-bit error. This is a new PCB, and I can flash the hex file to bank 1 successfully.

Even a J-Link memory read showed a read error (screenshot below):

I erased the entire chip using "JFlashLite.exe", and after that it started working normally. Why is this so? Are we supposed to erase the entire flash on every new MCU?

Regards,

Andy

Pavel A. · 2024-02-15

Do a mass erase.

> I erased the entire chip using "JFlashLite.exe", and after that it started working normally. Why is this so?

Because the ECC can go haywire. In short: a flash mass erase (aka full chip erase) internally resets the ECC, and the flash should behave well until the next disturbance.

What could have happened to a new PCB? No idea. Maybe static electricity, radiation, etc., or a power failure during programming or erase.

Tesla DeLorean · 2024-02-15

No mention of a specific part number; please provide the fully qualified part designation.

There's a memory location in OTP that should specify the size in KB of the FLASH memory tested at the factory. This is frequently a significant subset of the memory on the die.
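
As an illustration of how that reported size might be used, here is a minimal sketch in C that checks whether an address falls inside the reported flash range before code touches it. The function name and the way the size is passed in are hypothetical; on the target the size in KB would be read from the device's flash size location (see the reference manual for its address), and `0x08000000` is assumed as the flash base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed start of the main flash on STM32 parts. */
#define FLASH_BASE_ADDR 0x08000000u

/* Hypothetical helper: true if addr lies inside the flash range that the
 * device reports as tested. size_kb is the value read from the device's
 * flash size location, passed in here so the logic can be checked off-target. */
bool address_in_reported_flash(uint32_t addr, uint16_t size_kb)
{
    uint32_t size_bytes = (uint32_t)size_kb * 1024u;

    if (addr < FLASH_BASE_ADDR) {
        return false;
    }
    /* Compare offsets rather than end addresses to avoid overflow. */
    return (addr - FLASH_BASE_ADDR) < size_bytes;
}
```

For a part reporting 2048 KB, an address at the start of bank 2 (`0x08100000`) is inside the range; for a part reporting 1024 KB it is not.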

It should work fine for all memory you've erased and written. If you're touching memory areas that you haven't written correctly, faults from ECC failures shouldn't be unexpected.

andy_long · 2024-02-16

Thanks @Tesla DeLorean and @Pavel A. for your replies.

Full part number is STM32H743XI.

Are you saying that a full chip erase should be part of the production programming SOP?

Also, how can we recover if such an error happens in the field? What are the best practices?

Pavel A. · 2024-02-16

> Are you saying that a full chip erase should be part of the production programming SOP?

I'm not a specialist in production, but I would say yes, unless it has too heavy an impact on programming time. In that case I'd try to optimize by running a quick test from RAM: if the code can read the whole flash, it reads as empty (all FFs) with no exceptions, and the option bytes are at their defaults, skip the mass erase.
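
A minimal sketch of the "reads as empty" part of such a test, in C. The function name is hypothetical, and the sketch only covers the blank scan: on the target, reading a location with a bad ECC raises a bus fault instead of returning a value, so the real test would also need a fault handler that counts a fault as "not blank", plus a separate check of the option bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if every 32-bit word in the region reads as
 * erased (0xFFFFFFFF). Stops at the first programmed word.
 * Assumption: a faulting read is caught elsewhere and treated as a failure. */
bool flash_region_is_blank(const volatile uint32_t *start, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        if (start[i] != 0xFFFFFFFFu) {
            return false;
        }
    }
    return true;
}
```

If the scan returns false, or a fault occurs during it, the programming flow would fall back to the mass erase.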

> how can we recover if such an error happens in the field? What are the best practices?

A good question... To answer it, one should know more about your product. The best practice is, of course, to plan the recovery beforehand. The usual approach of swapping the processor board in the field often does not work well with small microcontrollers like the STM32. They are so small that hardware designers are tempted to solder them deep into large, expensive boards (sometimes several of them), so servicing them means replacing the whole board. For more complex and expensive controllers (iMX...), designers use SoMs that can be easily detached. But only a few manufacturers make SoMs for STM32.

andy_long · 2024-02-16

Thanks @pavel for your answer.

So, is there anything we can do in the firmware, in the bus fault exception handler, for recovery?

I.e., if an ECC error occurs in flash, is there no way to recover?

Pavel A. · 2024-02-16

What does 'recovery' mean in this question? If you are asking whether the code can resume and continue after a bus fault exception, then yes. I posted an example of "probing" a memory location not long ago.

IIRC, erasing a single sector or bank also resets the bad ECC in that sector/bank; a full erase is not required.

For example, a bootloader can start a watchdog and keep a restart counter in SRAM before checking the main app and jumping to it. If the restart counter reaches a certain limit but the main app still does not start, the bootloader can erase the sectors occupied by the main app, then scan the erased sectors and verify that the erase succeeded and that there are no exceptions. If this fails, the hardware should be replaced; otherwise, do a normal software update.
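
The decision logic of that scheme can be sketched roughly as follows, in C. All names and the limit of 3 restarts are hypothetical, and the sketch assumes the counter lives in SRAM that survives a watchdog reset and that the main app clears it once it is running. The watchdog setup, the sector erase, and the fault-tolerant scan are hardware-specific and left out; their outcomes are passed in as flags.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RESTARTS 3u  /* assumed limit, pick to suit the product */

typedef enum { BOOT_RUN_APP, BOOT_ERASE_APP } boot_step_t;
typedef enum { RECOVERY_SOFTWARE_UPDATE, RECOVERY_REPLACE_HARDWARE } recovery_result_t;

/* Called by the bootloader on every start, after the watchdog is armed.
 * Counts the attempt and returns BOOT_RUN_APP, or resets the counter and
 * returns BOOT_ERASE_APP once the limit has been reached. */
boot_step_t boot_check_restart_counter(uint32_t *restart_counter)
{
    if (*restart_counter >= MAX_RESTARTS) {
        *restart_counter = 0;
        return BOOT_ERASE_APP;
    }
    (*restart_counter)++;
    return BOOT_RUN_APP;
}

/* Called after the app sectors were erased and scanned.
 * erase_ok: the erase reported no error.
 * scan_ok:  every erased word read back as blank with no exception. */
recovery_result_t boot_evaluate_erase(bool erase_ok, bool scan_ok)
{
    return (erase_ok && scan_ok) ? RECOVERY_SOFTWARE_UPDATE
                                 : RECOVERY_REPLACE_HARDWARE;
}
```

With these assumptions, three consecutive failed starts are tolerated; on the fourth the bootloader erases the app sectors and then either waits for a normal software update or reports the hardware as faulty.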

andy_long · 2024-02-17

Thanks @Pavel A. Got it..