EEPROM emulator corrupts STM32G431KBU6 program

WSpar.1 · ‎2023-10-30

Hi all,

We have deployed a lot of devices built around an STM32G431KBU6 controller.
Each device basically has a foot switch to turn a 12 V DC motor on/off and a rotary switch to control the speed.

We use the EEPROM emulation to save the speed setting.
If the speed setting differs from the stored value and no new speed change is detected within a minute, we store the new value (to save unnecessary writes).
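The deferred-write rule described above can be sketched roughly as follows. This is only an illustration, not the poster's actual code: `speed_saver_t`, `speed_saver_poll` and `SETTLE_MS` are hypothetical names, and the tick is assumed to be a free-running millisecond counter such as the value returned by `HAL_GetTick()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SETTLE_MS 60000u  /* one minute without changes, as described in the post */

typedef struct {
    uint16_t stored;       /* value currently held in emulated EEPROM */
    uint16_t pending;      /* latest value seen on the rotary switch */
    uint32_t last_change;  /* tick (ms) of the most recent change */
    bool     dirty;        /* pending differs from stored */
} speed_saver_t;

/* Hypothetical helper: call on every main-loop pass with the current speed
 * and tick. Returns true exactly when the caller should write `pending`
 * to the emulated EEPROM. Assumes that write then succeeds. */
static bool speed_saver_poll(speed_saver_t *s, uint16_t speed, uint32_t now_ms)
{
    if (speed != s->pending) {
        s->pending = speed;
        s->last_change = now_ms;   /* restart the settle timer */
    }
    s->dirty = (s->pending != s->stored);

    /* Unsigned subtraction keeps working across tick wraparound. */
    if (s->dirty && (uint32_t)(now_ms - s->last_change) >= SETTLE_MS) {
        s->stored = s->pending;
        s->dirty = false;
        return true;
    }
    return false;
}
```

With this shape, turning the knob away and back to the stored value within the minute results in no write at all.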

Clients mostly only set the speed setting once when they first use the product.

So an emulated EEPROM write is almost never done, only an emulated EEPROM read at start-up of the device.

Somehow support receives a lot of complaints about non-responding devices. Only reprogramming the device helps, but this is a temporary fix.

Sometimes clients tell us they had a power outage before these issues appear.

We can not reproduce the issue, maybe because our mains power is very clean?

Can power glitches (on the 12 V power jack) cause flash program corruption?

Bob S · ‎2023-11-02

> Unlock/Lock direct before and after the write.

Your code should NEVER unlock the flash. The EEPROM emulation calls should handle that (I think).

> Detect setting changes and only write and unlock Flash ...... detect stable voltages. (BOR detection)

This is useless. The only way you can guarantee stable voltages for the duration of the FLASH erase/write cycle is to have an EXTERNAL brown-out detector (not the on-chip one) and enough bulk capacitance to provide power to the CPU after input power goes away so that the FLASH operations can complete. Once the CPU starts a FLASH write operation, there is nothing you can do in software to prevent flaky power from interfering with the operation and corrupting the data.

And once again, I don't think your "store config data" operation is what is corrupting the FLASH. That only happens once, right? I think the issue is the EEPROM emulation init code that ON EVERY SINGLE POWER ON writes to FLASH.

FLASH should be stable (good/valid) regardless of good/bad power if code never attempts to write to it. It is only when FLASH write operations happen with "bad" power that things can get corrupted.

And if it is your code that is getting corrupted, then there is nothing you can do. Period. Except maybe provide a way for the customer to force the internal boot loader to run and re-program the chip from scratch (I wouldn't want MY customers doing that). If it is only the configuration data that gets corrupted, then as @Tesla DeLorean said you need to have some kind of default/fall-back config that will allow the system to run, and then allow the customer to re-program their specific configuration.
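A minimal sketch of the default/fall-back idea for the configuration case, assuming the stored speed is a small integer with a known valid range. The names `speed_or_default`, `SPEED_MIN`, `SPEED_MAX` and `SPEED_DEFAULT` and the range values are hypothetical; `read_status` stands for whatever status the EEPROM emulation read call returns, with 0 assumed to mean success.

```c
#include <stdint.h>

#define SPEED_MIN     1u   /* hypothetical valid range */
#define SPEED_MAX     10u
#define SPEED_DEFAULT 5u   /* hypothetical safe default */

/* Returns the stored speed if the read succeeded and the value is plausible,
 * otherwise a safe default so the device still runs. */
static uint16_t speed_or_default(int read_status, uint16_t value)
{
    if (read_status != 0) {
        return SPEED_DEFAULT;          /* read failed: do not hang, fall back */
    }
    if (value < SPEED_MIN || value > SPEED_MAX) {
        return SPEED_DEFAULT;          /* e.g. erased 0xFFFF or garbage */
    }
    return value;
}
```

The point of the design is that a failed or implausible read degrades to a working device instead of an endless error loop.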

MM..1 · ‎2023-11-02

@WSpar.1 wrote:
> It basically has a foot switch to turn on/off

There is your power outage: how do you prevent the user from switching off at the same moment as an EEWrite?
Your design needs a redesign, but start by locating what in the flash is corrupted.

WSpar.1 · ‎2023-12-04

Hi all,

I got one bricked device back from the client and saved both the entire flash and my EEPROM pages.
Attached is the EEPROM export starting from 0x0801b000 to the end of the flash region.
I noticed that at 0x0801b800 it restarts the headers, or is this a guard page?

I can now inject the hex dump into freshly programmed devices and reproduce the bricking.
I cannot reproduce how the EEPROM pages got corrupted, but at least I can now start writing code that should be able to repair this.

Can someone here have a look at the file?
Should EE_Init be able to handle this?

Another issue I face now is that debugging inside STM32CubeIDE is probably erasing the entire flash,
because in debug mode everything seems to work again.
It is only when injecting the hex file with STM32CubeProgrammer that I can brick a device.

But I would like to verify where it is hanging in the code.

MM..1 · ‎2023-12-04

Is it right to say that your bricked device jumps to the error handler after EE_Init?

From the hex file it isn't clear what is failing.

The IDE or the programmer can be set to erase the full flash or only the required part.

Debugging can be started without a flash erase; set it in the debug configuration.

Trace EE_Init for the error source after reflashing the stored hex dump...

WSpar.1 · ‎2023-12-08

So far I have not found an option to skip programming when starting a debug session in STM32CubeIDE.

Also, an error in EE_Init should turn on a red LED, but nobody has ever reported that LED turning on.
Could it be an NMI / ECCD on the G4 series?

It is really difficult to find out what is going on here.
I have read the EEPROM emulation documentation multiple times. I see some points I can improve on, but in my opinion they cannot be linked to the real issue.

The metal foot pedal on the device has a TVS diode, but other than that it goes straight to an input pin on the uC.
Maybe I have ESD issues here?

For now I am trying to collect bricked devices and make full flash dumps of them to figure out what is going on.

If I look at the ST G4 EEPROM example, it is implemented basically the same way as what I have now:

A Flash_unlock() call early in main, with the flash left unlocked for a long time

And error handling:

static void Error_Handler(void)
{
  while(1)
  {
    /* Toggle LED_KO (Red) fast */
    BSP_LED_Toggle(LED_KO);
    HAL_Delay(40);
  }
}

So the examples are not very good.
Does anyone know of a really good example implementation?

MM..1 · ‎2023-12-08

In the debug configuration, set Download to false.

Piranha · ‎2023-12-09

> Should EE_Init be able to handle this?

The initialization function should be capable of getting the memory into a usable state regardless of its previous state. In the worst case it can lose the data, but it must not brick the device under any circumstances. But I'm talking about code written by competent people, not the HAL/Cube broken bloatware...

There is also a chance that the tools are lying. You can make and flash a small test firmware which checks what the actual state of the suspicious 0xFF bytes really is. Probably those are invalidated with 0x00 or have ECC errors.
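A rough sketch of such a check, written so the scanning logic can also be exercised on a host buffer. `flash_scan_t` and `scan_doublewords` are hypothetical names. It reads in 64-bit units because the G4 flash is programmed in doublewords. One caveat, stated as an assumption to verify in the reference manual: on the real device, reading a doubleword with an uncorrectable ECC error is expected to raise an NMI, so the test firmware would also need an NMI handler that records the event instead of hanging.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t erased;  /* doublewords reading all ones (erased state) */
    size_t zeroed;  /* doublewords reading all zeros (often "invalidated") */
    size_t other;   /* anything else: real data or partial corruption */
} flash_scan_t;

/* Classify `count` doublewords starting at `base`. */
static flash_scan_t scan_doublewords(const volatile uint64_t *base, size_t count)
{
    flash_scan_t r = { 0, 0, 0 };
    for (size_t i = 0; i < count; i++) {
        uint64_t w = base[i];          /* on target this is the actual flash read */
        if (w == UINT64_MAX) {
            r.erased++;
        } else if (w == 0u) {
            r.zeroed++;
        } else {
            r.other++;
        }
    }
    return r;
}
```

On the target, `base` would point at the start of the emulated EEPROM region (0x0801b000 in this thread) and `count` would cover the pages of interest; the counts can then be compared with what the programmer tool shows for the same range.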

WSpar.1 · ‎2023-12-11

I totally overlooked that option, thank you.
When I'm not entirely sure my project code is identical to the dump, what will happen in that case?

So the client dump.bin I made is programmed back into the device, and then I start debugging in STM32CubeIDE with code open that I'm not entirely sure is exactly the same code.

When pausing the debugger, will it simply not map the halt accurately to the right line of code?

WSpar.1 · ‎2023-12-11

The documentation is not very clear, but the provided examples make it worse.
And so far, I haven't found any good example that implements the necessary features.

- Do I get a cleanup signal when it is time to clean up? Or is it handled by default?
- What happens when power is removed while a page erase is still in progress?
- How to deal with ECC errors? They say one error can be repaired; is that done automatically? Two errors cannot be repaired and result in an endless while loop?

Where are the ECC flags mapped?

I ask because I received a bricked device and made a dump from 0x08000000 to the end of flash, but somehow this dump works again.

Did I maybe clear some ECC flags during the entire chip erase?
Is there a nice way to trigger ECC errors?
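On the question of where the ECC flags live: on this family they are expected to be reported in the FLASH_ECCR register, with one flag for a corrected single-bit error (ECCC), one for a detected double-bit error (ECCD), and the failing address in ADDR_ECC. The sketch below decodes a raw register value. The bit positions are assumptions taken from memory and should be checked against the STM32G4 reference manual or the device header before use; `ecc_status_t` and `decode_eccr` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED FLASH_ECCR layout: verify against the reference manual. */
#define ECCR_ADDR_MASK 0x0007FFFFu  /* ADDR_ECC, assumed bits 18:0 */
#define ECCR_ECCC      (1u << 30)   /* assumed: single error corrected */
#define ECCR_ECCD      (1u << 31)   /* assumed: double error detected */

typedef struct {
    bool     corrected;      /* a single-bit error was corrected */
    bool     uncorrectable;  /* a double-bit error was detected */
    uint32_t offset;         /* failing address field as reported by hardware */
} ecc_status_t;

/* Decode a raw FLASH_ECCR value into its interesting fields. */
static ecc_status_t decode_eccr(uint32_t eccr)
{
    ecc_status_t s;
    s.corrected     = (eccr & ECCR_ECCC) != 0u;
    s.uncorrectable = (eccr & ECCR_ECCD) != 0u;
    s.offset        = eccr & ECCR_ADDR_MASK;
    return s;
}
```

On the target, the argument would be the value read from `FLASH->ECCR`, for example inside an NMI handler, so that a bricked unit can at least report which flash offset failed.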

MM..1 · ‎2023-12-11

It is really hard to search for bugs when you don't have the code... but you can simply reflash only the code and leave the EEPROM part untouched, or back it up beforehand. There are many ways to do this; the ST-LINK Utility or the programmer works well...