STM32F103 - Code flash getting erased automatically

PJosh.8 · ‎2019-03-28

Hi,

Our STM application is a smart bike IoT device that lets a user track his/her vehicle.

Some of our devices deployed in the field have stopped functioning. On debugging, we found that the entire STM flash had been erased.

What could be the possible reason for the STM flash getting erased?

Additional info:

We use the STM internal flash for storing certain critical parameters, which are typically written 10 times a day.
Additionally, we have a second MCU that is used for updating the STM's firmware. We have placed extra checks here to ensure it does not accidentally bootload the STM.

What measures should we take to prevent such an erase?

Thanks in advance,

Prathamesh

S.Ma · ‎2019-03-28

Have you implemented the HW watchdog, or ensured that Vdd is always within the operating range? The reset or power-on/power-off reset implementation must ensure that the MCU voltage is in the operating range.

A flash erase, or other hardware that draws high current (lights, motor), may pull more current and cause voltage droops.
If the supply can be removed transiently, that could also cause a droop.
Someone could be using an ESD gun on the bike.

In the general case, Vdd has 3 ranges: very low voltage, where the chip does not work;

nominal voltage, where the chip is guaranteed to work.

Between the two, the chip's behaviour is "unknown" and it could be "drunk".

This is the reason why there is a power-on reset and a power-off reset (for voltage droops) to take care of.
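
As a rough illustration of the three ranges, here is a minimal C sketch that classifies a measured Vdd and refuses flash erase/program unless the supply is nominal with some headroom. The function names and the millivolt thresholds are assumptions for illustration only; take the real limits from the datasheet of the exact part, and note that the upper Vdd limit is ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three supply ranges described above. */
typedef enum { VDD_TOO_LOW, VDD_UNDEFINED, VDD_NOMINAL } vdd_range_t;

/* Assumed limits in millivolts; replace with the datasheet values
 * for the exact part and allow for measurement error. */
#define VDD_DEAD_BELOW_MV   1800u  /* assumption: chip does not run below this */
#define VDD_NOMINAL_MIN_MV  2000u  /* assumption: lower end of guaranteed range */

static vdd_range_t classify_vdd(uint32_t vdd_mv)
{
    if (vdd_mv < VDD_DEAD_BELOW_MV)  return VDD_TOO_LOW;
    if (vdd_mv < VDD_NOMINAL_MIN_MV) return VDD_UNDEFINED;
    return VDD_NOMINAL;
}

/* Hypothetical guard: allow erase/program only when the supply is nominal
 * plus headroom, since the operation itself draws extra current. */
static bool flash_op_allowed(uint32_t vdd_mv, uint32_t headroom_mv)
{
    assert(headroom_mv < 1000u);
    return vdd_mv >= VDD_NOMINAL_MIN_MV + headroom_mv;
}
```

On real hardware the voltage would have to come from a measurement or a supply monitor; a software check like this only narrows the window and does not replace a proper reset circuit.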

Holding the reset pin asserted until the supply is valid is one way to make sure the chip only runs in valid conditions.

Another typical case: if someone runs a chip from a CR2032 coin cell and there is too big a current surge, the battery's series resistance causes a large voltage drop, which easily demonstrates the problem.

In some cases, we fill the unused flash memory with an opcode that causes a reset or exception in case the program counter derails.
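
One way to do the filling is as a post-build step on the host, padding the firmware image before it is programmed. The sketch below is only an illustration: `pad_with_traps` is a hypothetical helper, and the choice of the Thumb `UDF #0xFF` encoding (0xDEFF) as the trap opcode is an assumption. On a Cortex-M3 an undefined instruction raises a fault, so the fault handler would still need to trigger the reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Thumb "UDF #0xFF" (permanently undefined instruction), assumed here
 * as the trap opcode for the unused flash area. */
#define TRAP_OPCODE 0xDEFFu

/* Fill image[used .. total) with the trap opcode, stored little-endian.
 * 'used' and 'total' are byte counts and must be half-word aligned. */
static void pad_with_traps(uint8_t *image, size_t used, size_t total)
{
    assert(image != NULL);
    assert(used <= total && used % 2 == 0 && total % 2 == 0);
    for (size_t i = used; i < total; i += 2) {
        image[i]     = (uint8_t)(TRAP_OPCODE & 0xFFu);  /* low byte first */
        image[i + 1] = (uint8_t)(TRAP_OPCODE >> 8);     /* then high byte */
    }
}
```

The same effect can usually be had from the linker script or the image conversion tool; the point is only that erased flash (all 0xFF) is not left as the fill value.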

All these are learnt by customer field returns making a valuable QA test list over time which improves the ruggedness of the end product.

Again, this is just a hypothetical scenario for the present case, but it makes a good checklist for any design.

S.Ma · ‎2019-03-28

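Note: Some STM32 parts have an analog supply monitor (a kind of analog watchdog) for this purpose, in case the watchdog is not enough. On STM32 this is the PVD (programmable voltage detector); WWDG is the window watchdog, which is a timing watchdog.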

PJosh.8 · ‎2019-03-29

> Have you implemented the HW watchdog, or ensured that Vdd is always within the operating range?

I don't think so. Do you mean something similar to the brown-out detection some processors provide?

> All these are learnt by customer field returns making a valuable QA test list over time which improves the ruggedness of the end product.

Yes! The 3 range Vdd input was very insightful. We will bear that in mind in future hardware revisions.

Meanwhile, we are eliminating all runtime use of the STM code flash and have pushed the update to a select few devices. We'll observe whether the issue recurs.

S.Ma · ‎2019-03-29

A watchdog requires a specific SW implementation: basically you have a reset countdown (say 500 ms), and the SW must restart this countdown from one place in the code. The problem is that if the core goes crazy, 500 ms is a lot of crazy code executed before the reset.
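
To make the countdown concrete, here is a small C sketch that works out a prescaler and reload value for a given timeout. It assumes a watchdog with a 12-bit down-counter, dividers from /4 to /256, and timeout = (reload + 1) * divider / clock; `wdg_compute` and `wdg_cfg_t` are hypothetical names, and the 40 kHz clock in the usage note is only a nominal figure, since the internal low-speed oscillator can be far from its nominal frequency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  prescaler_code; /* 0 => /4, 1 => /8, ... 6 => /256 */
    uint16_t reload;         /* 12-bit down-counter reload value */
} wdg_cfg_t;

/* Pick the smallest divider that can reach timeout_ms.
 * Assumed model: timeout = (reload + 1) * divider / clock_hz. */
static bool wdg_compute(uint32_t timeout_ms, uint32_t clock_hz, wdg_cfg_t *out)
{
    assert(out != NULL && clock_hz > 0);
    for (uint8_t code = 0; code <= 6; code++) {
        uint32_t divider = 4u << code;
        uint64_t ticks = (uint64_t)timeout_ms * clock_hz / (1000u * divider);
        if (ticks >= 1 && ticks <= 4096) {
            out->prescaler_code = code;
            out->reload = (uint16_t)(ticks - 1);
            return true;
        }
    }
    return false; /* timeout not reachable with this clock */
}
```

For a 500 ms timeout and a 40 kHz clock this selects the /8 divider with a reload of 2499. Because the clock tolerance is wide, the real timeout can differ noticeably, so the refresh period in the code should leave a generous margin.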

That is why there is also the analog watchdog which triggers if the MCU voltage drops too low.

The worst case would be if the supply is cut just as a flash erase occurs: is there enough juice in the decoupling caps to complete the erase/write? Is there a backup sector so that, when a write or erase fails, there is a recovery path in place?
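
A backup sector can be sketched as two slots that are written alternately, each protected by a sequence number and a CRC. The code below is a host-side simulation under stated assumptions: two RAM buffers stand in for two flash pages, and `record_t`, `params_save`, and `params_load` are hypothetical names, not part of any ST library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_WORDS 4u

typedef struct {
    uint32_t seq;               /* increases with every save */
    uint32_t data[PARAM_WORDS]; /* the critical parameters */
    uint32_t crc;               /* CRC-32 over seq and data */
} record_t;

/* Two RAM buffers stand in for two separate flash pages. */
static record_t slots[2];

static uint32_t crc32_calc(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* seq values 0 and 0xFFFFFFFF are reserved so blank memory never validates. */
static bool slot_valid(const record_t *r)
{
    if (r->seq == 0u || r->seq == 0xFFFFFFFFu) return false;
    return r->crc == crc32_calc(r, offsetof(record_t, crc));
}

/* Index of the newest valid slot, or -1 if neither slot is valid. */
static int newest_valid(void)
{
    bool v0 = slot_valid(&slots[0]), v1 = slot_valid(&slots[1]);
    if (v0 && v1) return slots[1].seq > slots[0].seq ? 1 : 0;
    if (v0) return 0;
    if (v1) return 1;
    return -1;
}

/* Save into the slot that does NOT hold the newest valid copy, so a
 * failed erase/write can only damage the older copy. */
static void params_save(const uint32_t data[PARAM_WORDS])
{
    int cur = newest_valid();
    int dst = (cur == 0) ? 1 : 0;
    record_t r;
    r.seq = (cur < 0) ? 1u : slots[cur].seq + 1u;
    memcpy(r.data, data, sizeof r.data);
    r.crc = crc32_calc(&r, offsetof(record_t, crc));
    memset(&slots[dst], 0xFF, sizeof slots[dst]); /* simulated page erase */
    slots[dst] = r;                               /* simulated programming */
}

/* Copy out the newest valid parameters; false if no valid copy exists. */
static bool params_load(uint32_t data[PARAM_WORDS])
{
    int cur = newest_valid();
    if (cur < 0) return false;
    memcpy(data, slots[cur].data, sizeof slots[cur].data);
    return true;
}
```

If power is lost during a save, the interrupted slot fails its CRC check and `params_load` falls back to the previous copy. On the real device the two slots would need to sit in different flash pages, away from the code, so that erasing one can never touch the other.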

CYap.1 · ‎2021-11-21

Did you ever solve the issue through one of your changes or track down its root cause?