2023-09-28 03:14 AM
We're using the STM32WB55CE across a range of products with a mostly shared codebase. Thousands of devices are deployed and mostly very stable, but one particular device (the one with the largest code base) is causing a lot of grief.
After a few weeks of "life", the device will not turn on. Power cycle does nothing. We read protect all the devices so can't attach a debugger (ofc, can't reproduce the issue when not protected, that would be too easy). But we can still read the option bytes and some of the registers.
Reprogramming the device ALWAYS fixes the issue: RDP to 0xAA (erase), reprog, all works straight away. But we have no confidence that it will stay fixed.
What I can see is that the devices are going into LOCKUP with PC 0xfffffffe. I've got a range of theories about how this can happen, but I'm not sure if any are correct. These are currently my main two:
- A bug in this version of the code is causing the flash at 0x8000000 to be erased. We use EEPROM flash emulation and have bootloader capabilities built in, so the code to erase pages is in there. It's highly unlikely, but possible, that stack corruption or a null pointer is causing a jump that erases the very start of the flash.
- Boot pin H3 (GPIO) is high on reboot. We now change the option byte to ignore this but, as far as I'm aware, that flag was at its default on all "dying" devices. I was sure that default was set to checked, but reading back the option bytes of "dead" devices. If this was set, could it be booting into RAM and executing random code that would somehow corrupt the processor itself? That seems unlikely too.
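For the first theory, the link between erased flash and LOCKUP can be sketched as follows. Erased flash reads back as 0xFFFFFFFF, so both the initial stack pointer and the reset vector would be 0xFFFFFFFF, and every fault vector would be invalid as well. As far as I know, 0xFFFFFFFE is also the address the core reports whenever it is in lockup, so the PC value alone does not prove the flash was erased. The check below is only an illustration: the helper name and the address ranges (512 KB flash, 256 KB SRAM) are assumptions to verify against the datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map for a 512 KB STM32WB55CE; verify against the datasheet. */
#define FLASH_BASE_ADDR 0x08000000u
#define FLASH_END_ADDR  0x08080000u /* exclusive */
#define SRAM_BASE_ADDR  0x20000000u
#define SRAM_END_ADDR   0x20040000u /* exclusive */

/* Hypothetical helper: returns true if the first two words of a vector
 * table (initial SP, reset vector) look like a real image rather than
 * erased flash. */
static bool vector_table_plausible(uint32_t initial_sp, uint32_t reset_vector)
{
    /* The stack grows downwards, so the initial SP may sit one past the
     * last RAM byte. It must be at least word aligned. */
    bool sp_ok = (initial_sp > SRAM_BASE_ADDR) &&
                 (initial_sp <= SRAM_END_ADDR) &&
                 ((initial_sp & 0x3u) == 0u);

    /* Bit 0 of the reset vector is the Thumb bit and must be set. */
    bool thumb = (reset_vector & 0x1u) != 0u;
    uint32_t target = reset_vector & ~0x1u;
    bool pc_ok = thumb &&
                 (target >= FLASH_BASE_ADDR) &&
                 (target < FLASH_END_ADDR);

    return sp_ok && pc_ok;
}
```

A bootloader stage that survives the erase could run a check like this on the application's vector table before jumping to it, and refuse to jump when it fails.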
Does anyone have any experience with LOCKUPs and what causes them? I've tried to research but all very vague.
Thanks
2023-09-28 06:03 AM
> Does anyone have any experience with LOCKUPs and what causes them? I've tried to research but all very vague.
The STM32 doesn't have critical flaws that cause lockups every once in a while, so you're unlikely to find anything by searching for that. There's no magic solution here, just gotta find the bug in the program and fix it.
Seems like you are on the right path. EEPROM emulation having a bug and erasing necessary flash is certainly worth investigating. Perhaps toggle a pin at the very start of your code to confirm the flash has been erased.
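One way to test the EEPROM emulation theory in the field is to put a guard in front of every page erase issued by that code path, so a stray call can never reach the start of flash and leaves a trace when it tries. This is a minimal sketch, not ST's EEPROM emulation API: the function name, the counter, and the page range 120..123 are hypothetical and would need to match your real linker map.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the EEPROM emulation area occupies flash pages
 * 120..123. Replace with the pages reserved in your linker script. */
#define EE_FIRST_PAGE 120u
#define EE_LAST_PAGE  123u

/* Number of erase requests rejected by the guard. Storing or reporting
 * this value gives evidence that something tried to erase code pages. */
static uint32_t g_blocked_erase_count;

/* Returns true only for pages inside the EEPROM emulation area.
 * Call this before the real erase routine and skip the erase on false. */
static bool erase_page_allowed(uint32_t page)
{
    if (page >= EE_FIRST_PAGE && page <= EE_LAST_PAGE) {
        return true;
    }
    g_blocked_erase_count++;
    return false;
}
```

The bootloader's own update path legitimately erases application pages, so it would need a separate, explicitly unlocked route rather than this guard.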
PH3 being high on reset would send it into the system bootloader instead of user code. But if that were the case, it's unclear how the device would function at all.
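To check what a given set of option bytes would do with PH3 high, the boot selection can be written out as a small decision function. This is a sketch of the boot mode table as I understand it from the reference manual, so verify it against RM0434 before relying on it. The enum and function names are made up for illustration, and it ignores any empty-flash check the part may perform.

```c
#include <stdbool.h>

typedef enum {
    BOOT_MAIN_FLASH,
    BOOT_SYSTEM_MEMORY,
    BOOT_SRAM1
} boot_area_t;

/* Hypothetical helper. Arguments are the raw option bit values
 * (nSWBOOT0, nBOOT0, nBOOT1) and the level of the PH3/BOOT0 pin. */
static boot_area_t boot_area(bool nswboot0, bool nboot0, bool nboot1,
                             bool ph3_high)
{
    /* nSWBOOT0 = 1: BOOT0 is taken from the PH3 pin.
     * nSWBOOT0 = 0: BOOT0 is taken from the inverted nBOOT0 option bit. */
    bool boot0 = nswboot0 ? ph3_high : !nboot0;

    if (!boot0) {
        return BOOT_MAIN_FLASH;
    }
    /* With BOOT0 active, nBOOT1 picks system memory (1) or SRAM1 (0). */
    return nboot1 ? BOOT_SYSTEM_MEMORY : BOOT_SRAM1;
}
```

Under these assumptions, booting from RAM requires nBOOT1 to be cleared in addition to PH3 being high. With nBOOT1 set, PH3 high leads to the system bootloader.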