STM32 is getting stuck/frozen and enables read-out protection at runtime

Vidhi_V · ‎2023-11-30

Hi STM team,

We are using the STM32L476ZET6 microcontroller in a custom embedded device that runs from either a backup battery (4.4 V) or an external constant 12 V DC supply. So far the STM32L476ZET6 has worked well for our business use case. Recently, however, we observed a blocker issue in long-running operation that is affecting our field deployment, so this is very critical for us and we need your quick support.

Background :
In our use case, if the user removes the external supply, the device runs on the backup battery (4.4 V) for a defined number of minutes and then goes into deep sleep mode. We have defined a few wake-up events that wake the device so it can report data to the cloud. This lets the device run for a long time on the backup battery.

Problem Statement :

Recently, during long runs on backup battery power, we observed that some devices woke up on a wake-up event and then got stuck/frozen. Please note that all of these devices had been working fine for a long time before the issue appeared. We did an RCA with an ST-Link and found that the MCU's RDP (Read-out Protection) flag had changed from level 0 (0xAA) to 0xFF.

We need your help to resolve the below queries.

1. From the software perspective, we configure just two option bytes: IWDG_STOP at boot-up and BFB2 at the time of an OTA upgrade/flash. As this MCU is running on a backup battery, is there any chance that a power fluctuation during option byte configuration causes the MCU to set RDP to 0xFF to protect the flash from corruption?

2. We went through the STM32 user manual but couldn't find the situations in which the MCU can set the RDP flag to 0xFF. Can you please help identify the possible cases?

3. There are many places in our software where we access the MCU flash right after power-up. Is there any chance that, due to power fluctuations or a low battery, these places get hit multiple times and corrupt a flash operation, which eventually sets the RDP flag to 0xFF?

4. Is there any specific power sequence in the MCU that can trigger the RDP issue? Our code does not access the RDP bit.

5. Our existing software does not check the voltage before performing flash operations. Do you recommend defining a voltage threshold for flash operations to protect the MCU flash from corruption?

6. At what voltage level can the STM32 perform flash operations successfully? Are any voltage limits defined? Is there a chance that, if flash is accessed below a certain voltage, the MCU enables the RDP flag to protect the chip?

Since this is a major blocker, your early input will help us a lot. Let us know if you need any further information.

Thanks

Vidhi V.

TDK · ‎2023-11-30

> 1. From the software perspective, we configure just two option bytes: IWDG_STOP at boot-up and BFB2 at the time of an OTA upgrade/flash. As this MCU is running on a backup battery, is there any chance that a power fluctuation during option byte configuration causes the MCU to set RDP to 0xFF to protect the flash from corruption?

Yes, interrupting power during OB configuration can cause it to remain in the erased state (0xFF).

> 2. We went through the STM32 user manual but couldn't find the situations in which the MCU can set the RDP flag to 0xFF. Can you please help identify the possible cases?

Power loss during programming and code bugs are the possible causes.

> 3. There are many places in our software where we access the MCU flash right after power-up. Is there any chance that, due to power fluctuations or a low battery, these places get hit multiple times and corrupt a flash operation, which eventually sets the RDP flag to 0xFF?

Yes.

> 4. Is there any specific power sequence in the MCU that can trigger the RDP issue? Our code does not access the RDP bit.

~~It is not published, but~~ I suspect the OB are stored on an internal flash page which is not accessible to the user. As with programming any flash, it is first erased entirely and then programmed. Presumably, changing any option bit could leave the others corrupted if power is lost partway through.

Edit: actually, yes this is stated explicitly in the RM:

> 5. Our existing software does not check the voltage before performing flash operations. Do you recommend defining a voltage threshold for flash operations to protect the MCU flash from corruption?

Not specifically, but stable voltage is going to be required. Hard to make any guarantees on program behavior without that.
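If a guard is wanted anyway, one possible shape is to refuse option byte writes unless a short window of supply readings all sit above a limit. This is only a sketch: the threshold value and the name `supply_stable_for_write` are hypothetical, the real limit should come from the datasheet's operating range, and the samples would come from whatever supply measurement the board already provides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimum supply in millivolts; NOT a datasheet value. */
#define OB_WRITE_MIN_MV 3000u

/* Return true only if every sample in the window is at or above the
 * threshold, so one good reading on a sagging battery does not pass. */
static bool supply_stable_for_write(const uint16_t *samples_mv, uint32_t count)
{
    if (samples_mv == NULL || count == 0u) {
        return false; /* no evidence of a stable supply: refuse the write */
    }
    for (uint32_t i = 0u; i < count; i++) {
        if (samples_mv[i] < OB_WRITE_MIN_MV) {
            return false;
        }
    }
    return true;
}
```

The caller would skip the option byte write (and retry on a later boot) whenever this returns false.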

> 6. At what voltage level can the STM32 perform flash operations successfully? Are any voltage limits defined? Is there a chance that, if flash is accessed below a certain voltage, the MCU enables the RDP flag to protect the chip?

~~Pretty sure this is stated somewhere, perhaps the datasheet. I will edit when I find it.~~

There doesn't appear to be any voltage restriction that I could find.

Also of note, to see if the option bytes really did get erased, you should be able to look at the complementary option bytes and check whether they are the bitwise inverse of the real ones. If everything is 0xFF, it all just got left in the erased state, presumably due to power loss.
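A minimal sketch of that check in C, assuming each option byte entry is stored as a 32-bit value followed by its 32-bit complement (take the exact addresses and layout from the reference manual). `ob_classify` and `ob_state_t` are hypothetical names for illustration, not part of any ST library; the two words would typically be read with a debugger.

```c
#include <stdint.h>

/* Possible states of one option byte word and its complement. */
typedef enum {
    OB_ERASED,   /* both words read 0xFFFFFFFF: erased, never reprogrammed */
    OB_VALID,    /* complement is the bitwise inverse of the value */
    OB_MISMATCH  /* anything else: partially programmed or corrupted */
} ob_state_t;

/* Classify a raw option byte pair.
 * 'value' is the option word, 'complement' is its stored complement. */
static ob_state_t ob_classify(uint32_t value, uint32_t complement)
{
    if (value == 0xFFFFFFFFu && complement == 0xFFFFFFFFu) {
        return OB_ERASED;
    }
    if (value == (uint32_t)~complement) {
        return OB_VALID;
    }
    return OB_MISMATCH;
}
```

An `OB_ERASED` result for every entry would match the power-loss explanation; `OB_VALID` with an unexpected RDP value would point toward a software write instead.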

If you feel a post has answered your question, please click "Accept as Solution".

Vidhi_V · ‎2023-12-01

Thank you for your quick response and support.

During testing, we found one custom API that accesses flash after configuring a few flash parameters. This API retries its flash operations, so in worst/failure cases an operation can be performed more than once. Can you confirm whether, while performing these operations, the MCU can trigger the flash lock if anything goes wrong? I have attached a screenshot below of our custom API code, with the suspect portion highlighted.

Thanks,

Vidhi V.

TDK · ‎2023-12-01

> Can you confirm whether, while performing these operations, the MCU can trigger the flash lock if anything goes wrong?

Oh, actually let me walk back what I wrote, partially.

If a flash operation (non-option bytes) is interrupted, the option bytes should not be modified. That is only a possibility during option byte operations. So I don't think the code you posted could cause this.

That said, there are a number of threads over the years with issues like this claiming that the MCU is setting RDP=0xFF, but in my opinion none of these users actually posted very convincing evidence that the chip was to blame.

I would be interested to hear if the entire option byte section was erased, or if it was programmed with valid values (complementary values are the inverse of the real ones).


Vidhi_V · ‎2023-12-04

Hi,

I really appreciate your quick responses.

This is what I understood from your clarification: if we perform memory operations other than option byte operations under shaky/low power, the MCU won't necessarily set the RDP. It comes into the picture only when something goes wrong during OB configuration. Please correct me if I'm wrong.

I've attached 2 screenshots where we are clearing the IWDG_STOP user OB flag. These operations are in the main function, right where the MCU initialization happens. According to your clarification, there is a chance of setting RDP if anything goes wrong while performing these sequences. Can you please confirm?

Also, the reason we suspect the MCU chip of setting the RDP, rather than a code bug, is that neither in this operation sequence nor anywhere else in the software do we touch the OPTIONBYTE_RDP flag; we only use the OPTIONBYTE_USER flag. Do you think OPTIONBYTE_USER configuration can indirectly corrupt OPTIONBYTE_RDP?

For your information, at the time of this issue, the entire option byte section was erased.

Thank you

Vidhi

TDK · ‎2023-12-04

> If we perform memory operations other than option byte operations under shaky/low power, the MCU won't necessarily set the RDP. It comes into the picture only when something goes wrong during OB configuration.

Correct. Regular flash operations will not affect option bytes.

> According to your clarification, there is a chance of setting RDP if anything goes wrong while performing these sequences. Can you please confirm?

Correct.

> Do you think OPTIONBYTE_USER configuration can indirectly corrupt OPTIONBYTE_RDP?

Absolutely. Whenever ANY option byte is set, the entire option byte area is erased (including RDP and whatever else is in there), then reprogrammed. At the hardware level, they cannot be modified one at a time. See the excerpt above from the reference manual.

> For your information, at the time of this issue, the entire option byte section was erased.

Thanks. This lines up with what we think is happening--power loss during OB programming.

It's best to avoid writing to OB at all during production, if it can be avoided. Not sure about your particular situation, of course, but it seems reasonable to either always or never disable IWDG in stop mode.
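If an option byte write at boot cannot be removed entirely, one way to shrink the exposure is to compare the stored value first and only run the erase/program cycle when it actually differs. A rough sketch, assuming the current option word has already been read (for example from the flash option register) and assuming IWDG_STOP sits at bit 17; the bit position and the name `ob_write_needed` are assumptions to be checked against the reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of IWDG_STOP in the option word; verify in the RM. */
#define OPT_IWDG_STOP_BIT (1u << 17)

/* Return true only when the stored IWDG_STOP bit differs from the desired
 * state, so normal boots skip the option byte erase/program cycle. */
static bool ob_write_needed(uint32_t current_optr, bool want_iwdg_stop_set)
{
    bool is_set = (current_optr & OPT_IWDG_STOP_BIT) != 0u;
    return is_set != want_iwdg_stop_set;
}
```

With this guard, a device whose option bytes were set once in the factory never rewrites them in the field, which is the situation described above.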


Vidhi_V · ‎2023-12-05

Hi,

Thank you for further clarifications.

Can you please simplify the sentence below so I can understand exactly what you mean?

> but it seems reasonable to either always or never disable IWDG in stop mode.

Also, in one instance, we observed that the MCU got stuck during initialization and then reset. On debugging, we found that a hard fault was being triggered. Is there a chance to have HardFault triggered in case of the wrong OB configuration?

TDK · ‎2023-12-06

> Can you please simplify the sentence below so I can understand exactly what you mean?

>> but it seems reasonable to either always or never disable IWDG in stop mode.

Paraphrased as:

Never set option bytes during production. Choose an IWDG_STOP value and leave it alone.

> Is there a chance to have HardFault triggered in case of the wrong OB configuration?

Probably. I don't see how it could happen as a direct result, but certainly there is code logic that would result in a hard fault if OB isn't set up as expected. If the device is already bricked due to the OB being messed up, and that's the only time you saw that behavior, I wouldn't worry too much about it, although attaching a debugger and seeing the nature of the hard fault seems fast and prudent.


Vidhi_V · ‎2023-12-06

Okay. Thanks for simplifying.

I understand what you said about HardFault. Speaking of which, if the MCU gets stuck (like an infinite loop) or a hard fault occurs (like a crash) during the execution of business logic, I assume the STM32 hardware watchdog will trigger and reset the MCU. Can you please clarify if it is otherwise?

TDK · ‎2023-12-06

Correct. If enabled, the watchdog will reset the chip even if it's in the hard fault handler. Of course, if it just ends up there again after reset, it doesn't help your cause much.
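One common way to cope with that reset loop is to count consecutive watchdog resets and fall back to a reduced "safe" startup once a limit is reached. The sketch below is illustrative only: the limit, the function names, and the idea of keeping the counter somewhere that survives reset (for example a backup register) are all assumptions, and whether the last reset came from the watchdog would be read from the reset cause flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit on back-to-back watchdog resets. */
#define MAX_CONSECUTIVE_WDG_RESETS 3u

/* Compute the new persistent counter value from the previous one and the
 * cause of the latest reset. Any non-watchdog reset clears the streak. */
static uint32_t update_wdg_reset_count(uint32_t previous, bool reset_was_watchdog)
{
    if (!reset_was_watchdog) {
        return 0u;
    }
    if (previous < UINT32_MAX) {
        return previous + 1u; /* saturate instead of wrapping to zero */
    }
    return previous;
}

/* Decide whether to skip the normal application path after this reset. */
static bool should_enter_safe_mode(uint32_t count)
{
    return count >= MAX_CONSECUTIVE_WDG_RESETS;
}
```

In safe mode the firmware could, for instance, avoid the code path that faults and only report the failure.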
