2024-07-11 07:31 AM
Hello,
We have a device that runs on an STM32G0. I have a bootloader at the beginning of the flash, then 2 pages for config, and the rest of the flash for the firmware. The bootloader basically checks whether the two config pages are OK and then starts the firmware. I now have 3 devices that suddenly stopped working, and all have the same issue:
The device starts in bootloader, prints some messages (UART) and then freezes and restarts when the watchdog times out. Based on the messages I see at the output, I know that the only thing the bootloader does before freezing is reading the two config flash pages (copying the content to a buffer with memcpy).
I see two possibilities:
1) the bootloader (code in flash) is somehow corrupt; however, I've write-protected the flash where the bootloader resides...
2) reading the internal flash (config pages) creates an interrupt (hard fault maybe?) and the bootloader is stuck in the busy loop of the interrupt handler (I have no output there unfortunately).
The devices have readout protection enabled (RDP level 1), so I cannot reflash a modified bootloader to debug the problem. I did remove the readout protection on one device and reflashed it, and it worked without problems afterwards. I've now run a stress test on that device by saving the config pages many times (already over 30'000 times, and it still works without any problems)...
Any hints what I could do to find the problem?
2024-07-15 12:19 AM
I've just noticed this:
> I've written the config pages about 50'000 times
The endurance of 'G0 FLASH is 10 kcycles, see datasheet. So what happened is that you've damaged the FLASH by cycling it far beyond its rated endurance. The symptoms appear to confirm this: worn-out FLASH tends to leak charge from the memory cells, i.e. it "forgets after some time" - exactly as in your case, verify was OK immediately after the write, but after several hours the leak was bad enough to result in an uncorrectable double-error and throw the NMI.
JW
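To put the endurance figure in perspective, here is a minimal sketch of the erase-count arithmetic. It assumes the 2 KB page size of the STM32G0 and a hypothetical scheme in which each config save appends a fixed-size record and the page is erased only once it is full. `erases_needed` and `within_endurance` are illustrative names, not anything from the original firmware.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES  2048u   /* STM32G0 flash page size */
#define ENDURANCE_CYCLES 10000u  /* endurance figure quoted from the datasheet */

/* Erase cycles one page sees after `writes` config saves, assuming each save
 * appends a record of `record_size` bytes and the page is erased only when it
 * is full. A record that fills the whole page (or an invalid size) degenerates
 * to one erase per save. */
uint32_t erases_needed(uint32_t writes, uint32_t record_size)
{
    if (record_size == 0u || record_size >= PAGE_SIZE_BYTES) {
        return writes;
    }
    uint32_t slots = PAGE_SIZE_BYTES / record_size;
    return (writes + slots - 1u) / slots;   /* round up */
}

/* Nonzero if the resulting erase count stays within the rated endurance. */
int within_endurance(uint32_t writes, uint32_t record_size)
{
    return erases_needed(writes, record_size) <= ENDURANCE_CYCLES;
}
```

With one erase per save, 50'000 saves mean 50'000 erase cycles on the page. With 64-byte records appended to a page (32 slots), the same 50'000 saves need 1563 erases.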
2024-07-15 03:55 AM
Thanks a lot for your help!
Erasing a flash page and then reading it before writing does not cause the ECC double bit error, I've tested it multiple times (I've also erased the page and powered the device off, it still works the next power on).
A reset or power loss during the erase/write, however, can sometimes cause the ECC error. I reproduced the error on a new device by resetting it during erase/write and then modified the bootloader by adding
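/* ECCD set: the NMI came from an uncorrectable (double-bit) flash ECC error */
if (READ_BIT(FLASH->ECCR, FLASH_ECCR_ECCD) != 0U) {
    SET_BIT(FLASH->ECCR, FLASH_ECCR_ECCD); /* write 1 to clear the flag */
    return;
}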
to the NMI_Handler to ignore the error and then I was able to read and correct the config page and startup normally.
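The same idea can be extended a little so that the bootloader knows a read was bad instead of silently ignoring it. A minimal sketch: the handler records the error in a flag, and the code that copies the page checks the flag afterwards. `nmi_note_ecc_error`, `read_config_page` and `g_ecc_double_error` are hypothetical names, and `ECCD_FLAG` stands in for `FLASH_ECCR_ECCD` from the device header so the snippet compiles on its own. On the target you would call `nmi_note_ecc_error(&FLASH->ECCR)` from inside `NMI_Handler`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for FLASH_ECCR_ECCD from the CMSIS device header (assumed bit 31). */
#define ECCD_FLAG (UINT32_C(1) << 31)

/* Set by the NMI path, checked by the reader. */
static volatile bool g_ecc_double_error = false;

/* Intended to be called from NMI_Handler with the address of FLASH->ECCR.
 * Returns true if the NMI was caused by a flash ECC double error. */
bool nmi_note_ecc_error(volatile uint32_t *eccr)
{
    if ((*eccr & ECCD_FLAG) != 0u) {
        *eccr |= ECCD_FLAG;          /* on hardware, writing 1 clears the flag */
        g_ecc_double_error = true;   /* remember that the data read was bad */
        return true;
    }
    return false;
}

/* Copy a config page into RAM. Returns false if an ECC double error was
 * flagged while copying, in which case the buffer must not be trusted. */
bool read_config_page(void *dst, const void *src, size_t len)
{
    g_ecc_double_error = false;
    memcpy(dst, src, len);
    return !g_ecc_double_error;
}
```

If `read_config_page` returns false, the bootloader can treat the page as corrupt and erase and rewrite it, rather than acting on whatever landed in the buffer.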
2024-07-15 09:33 AM - edited 2024-07-15 09:34 AM
While power loss may undoubtedly result in double-error/NMI and you certainly need to cater for that, you definitely should honor the endurance values, too.
JW
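One common way to cater for power loss is to keep two copies of the config and let the bootloader pick the newest one that still checks out. The sketch below assumes exactly that: each of the two config pages holds a record with a sequence counter and a CRC-32. The thread does not say how the two pages are actually used, so `config_record_t`, `config_is_valid` and `config_select` are hypothetical, and the 24-byte payload is an arbitrary placeholder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-flash layout of one config copy. */
typedef struct {
    uint32_t sequence;     /* incremented on every save */
    uint8_t  payload[24];  /* placeholder for the real settings */
    uint32_t crc;          /* CRC-32 over sequence and payload */
} config_record_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table. */
uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* A copy is valid if the stored CRC matches the CRC of everything before it. */
bool config_is_valid(const config_record_t *rec)
{
    return crc32_bytes((const uint8_t *)rec,
                       offsetof(config_record_t, crc)) == rec->crc;
}

/* Return the newest valid copy, or NULL if both copies are corrupt. */
const config_record_t *config_select(const config_record_t *a,
                                     const config_record_t *b)
{
    bool a_ok = config_is_valid(a);
    bool b_ok = config_is_valid(b);
    if (a_ok && b_ok) {
        return (a->sequence >= b->sequence) ? a : b;
    }
    if (a_ok) {
        return a;
    }
    return b_ok ? b : NULL;
}
```

If a reset hits while one copy is being erased or written, that copy fails the check and the other one is used. On the 'G0 the read of the damaged copy can itself raise the ECC NMI, so the NMI handler still has to cope with that case.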
2024-07-16 12:51 AM
The 50'000 writes were just a test; normally, the config gets written only a couple of times during the lifetime of the device. We've analyzed some other devices that were on the same "system" and they all had write counts < 50.