X-CUBE-EEPROM - EE_Init(): can I remove step 8?

Giambar · ‎2023-12-01

The function EE_Status EE_Init(EE_Erase_type EraseType) writes 0 to flash at the end:

/*********************************************************************/
/* Step 8: Perform dummy write '0' to get rid of potential */
/* instability of line value 0xFFFFFFFF consecutive to a */
/* reset during write here */
/* Only needed if recovery transfer did not occured */
/*********************************************************************/
if (recoverytransfer == 0U)
{
  status = VerifyPagesFullWriteVariable(0U, 0U);

My system is reset frequently, and these resets sometimes cause an ECCD error.

For this reason I'd like to remove this step.

What do you suggest?

Bob S · ‎2023-12-01

Remove it at your own risk. That step guards against a reset (or power failure) during a FLASH write operation that leaves the FLASH location only partially programmed. The location may then read as 0xFFFFFFFFFFFFFFFF while in reality some bits are borderline programmed to zero. If you try to program that location (again) with a non-zero value, some bits that you expect to be "1" may end up crossing that borderline threshold and read as "0", and you get corrupted data. This was explained better by one of the ST employees a while back.

The drawback is that every time your program runs, it writes to the next available location in the simulated EEPROM area, thus causing more FLASH writes than you may expect over the lifetime of the device. But that is the tradeoff for trying to use FLASH as EEPROM.
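A toy model may make the failure mode described above concrete. This is only a sketch, not how the hardware works internally: `cell_t`, `cell_after_interrupted_write`, `cell_program` and the `marginal` mask are invented for illustration, and the rule that any later programming pulse tips borderline bits to 0 is an assumption that mirrors the description above.

```c
#include <stdint.h>

/* Toy model of one 64-bit flash line (hypothetical, for illustration only). */
typedef struct {
    uint64_t stored;   /* value the line currently reads as */
    uint64_t marginal; /* bits that read 1 but were left borderline
                          by an interrupted write */
} cell_t;

/* A line as it might look after a reset in the middle of a write:
 * it still reads as erased, but some bits are borderline. */
cell_t cell_after_interrupted_write(uint64_t marginal_bits)
{
    cell_t c = { UINT64_MAX, marginal_bits };
    return c;
}

/* Programming can only move bits from 1 to 0. Assumption of this model:
 * any further programming pulse also tips the borderline bits to 0. */
void cell_program(cell_t *c, uint64_t data)
{
    c->stored &= data & ~c->marginal;
    c->marginal = 0;
}
```

In this model, programming 0xFF into a line whose bit 4 is borderline leaves 0xEF, whereas programming 0 (the step 8 dummy write) always leaves 0, so the line ends up in a known state.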

Giambar · ‎2023-12-02

Hi Bob, thank you for your answer.

I understand that if I skip this write of zero, the first write that follows can be wrong.

In my case, a solution for the first write could be:

1- write the data

2- read back the data

3- if it does not match the written data, call EE_DeleteCorruptedFlashAddress and repeat the write.
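The three steps above can be sketched roughly as follows. This is a host-side sketch under stated assumptions, not the X-CUBE-EEPROM API: `storage_ops_t`, `write_verified` and the `mock_*` test double are hypothetical names, and in a real port the hooks would wrap the library's write and read calls and whatever invalidation routine is used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical storage hooks: stand-ins for the real write, read and
 * invalidate calls. They are NOT the X-CUBE-EEPROM API. */
typedef struct {
    bool (*write)(uint16_t id, uint32_t value);
    bool (*read)(uint16_t id, uint32_t *value);
    void (*invalidate)(uint16_t id);
} storage_ops_t;

/* Write, read back, and on any failure or mismatch invalidate the
 * record and retry, up to max_attempts times. */
bool write_verified(const storage_ops_t *ops, uint16_t id, uint32_t value,
                    unsigned max_attempts)
{
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        uint32_t readback = 0;
        if (ops->write(id, value) && ops->read(id, &readback) &&
            readback == value) {
            return true;
        }
        ops->invalidate(id);
    }
    return false;
}

/* In-memory test double: the first mock_faults_left writes are
 * corrupted by clearing bit 0 of the stored value. */
static uint32_t mock_cell;
static bool mock_valid;
static unsigned mock_faults_left;

static bool mock_write(uint16_t id, uint32_t value)
{
    (void)id;
    if (mock_faults_left > 0) {
        mock_faults_left--;
        mock_cell = value & ~1u;
    } else {
        mock_cell = value;
    }
    mock_valid = true;
    return true;
}

static bool mock_read(uint16_t id, uint32_t *value)
{
    (void)id;
    if (!mock_valid) {
        return false;
    }
    *value = mock_cell;
    return true;
}

static void mock_invalidate(uint16_t id)
{
    (void)id;
    mock_valid = false;
}

const storage_ops_t mock_ops = { mock_write, mock_read, mock_invalidate };

void mock_reset(unsigned faults)
{
    mock_cell = 0;
    mock_valid = false;
    mock_faults_left = faults;
}
```

With one injected fault the second attempt succeeds; if every attempt is corrupted, the function gives up and returns false, which the caller has to handle.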

What do you think? I just want to check that I've understood the matter.

Piranha · ‎2023-12-02

Verification and invalidation+repetition in case of failure on a write operation is exactly the solution, which should have been implemented in that library. Wearing out a flash on every initialization, like ST are doing, is a terribly bad "solution". If you implement this, you can safely remove the detrimental "Step 8". Just take a note that at any time the read operation can potentially detect a CRC mismatch, return EE_NO_DATA and your application code has to deal with it properly.

Bob S · ‎2023-12-03

Glossing over the rant, as @Piranha said:

> Just take a note that at any time the read operation can potentially detect a CRC mismatch, return EE_NO_DATA and your application code has to deal with it properly.

Because even if you do read back the value you wrote and it matches, it may not STAY that way. Partially programmed bits may read as "1" for a while, and then start reading as "0" some (indeterminate) time later. ST's method avoids this. If your code can recover from that, then, sure, skip step 8. Maybe store 2 copies of each "thing".
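One way to read "store 2 copies" is sketched below: try the primary record, then the backup, then fall back to a default. The names `read_fn`, `read_redundant` and the `sim_*` test double are hypothetical; the read hook is assumed to return true only when the record exists and passes its integrity check (for the real library that would mean a read that did not report EE_NO_DATA).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical read hook: returns true and fills *value only when the
 * record exists and passes its integrity check. */
typedef bool (*read_fn)(uint16_t id, uint32_t *value);

/* Try the primary copy, then the backup copy, then a caller-supplied
 * default, so the application always gets a usable value. */
uint32_t read_redundant(read_fn read, uint16_t primary_id,
                        uint16_t backup_id, uint32_t fallback)
{
    uint32_t value;
    if (read(primary_id, &value)) {
        return value;
    }
    if (read(backup_id, &value)) {
        return value;
    }
    return fallback;
}

/* In-memory test double with a per-record valid flag. */
#define SIM_RECORDS 4u
static uint32_t sim_value[SIM_RECORDS];
static bool sim_valid[SIM_RECORDS];

void sim_store(uint16_t id, uint32_t value, bool valid)
{
    if (id < SIM_RECORDS) {
        sim_value[id] = value;
        sim_valid[id] = valid;
    }
}

bool sim_read(uint16_t id, uint32_t *value)
{
    if (id >= SIM_RECORDS || !sim_valid[id]) {
        return false;
    }
    *value = sim_value[id];
    return true;
}
```

The cost is twice the flash traffic per stored value, so this trades wear for robustness, much like step 8 itself.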

Giambar · ‎2023-12-03

At this point the best solution for my application is to write 0 not at every power-on, but only just before the first data write. This way I reduce both the exposure to resets and the problem of partially programmed bits.
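A minimal sketch of that deferred dummy write, assuming the library is modified so the dummy write can be triggered from outside EE_Init(). The names `ee_hooks_t`, `ee_lazy_t`, `ee_lazy_init`, `ee_lazy_write` and the `sim_*` counters are hypothetical and only model the control flow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks: dummy_write stands in for the step 8 write of 0,
 * write for the normal variable write. Neither is a real library call. */
typedef struct {
    bool (*dummy_write)(void);
    bool (*write)(uint16_t id, uint32_t value);
} ee_hooks_t;

typedef struct {
    const ee_hooks_t *hooks;
    bool dummy_pending; /* true until the deferred dummy write succeeds */
} ee_lazy_t;

/* Called once at start-up: nothing is written to flash here. */
void ee_lazy_init(ee_lazy_t *ee, const ee_hooks_t *hooks)
{
    ee->hooks = hooks;
    ee->dummy_pending = true;
}

/* The dummy write happens only before the first real write after boot. */
bool ee_lazy_write(ee_lazy_t *ee, uint16_t id, uint32_t value)
{
    if (ee->dummy_pending) {
        if (!ee->hooks->dummy_write()) {
            return false; /* still pending, retried on the next call */
        }
        ee->dummy_pending = false;
    }
    return ee->hooks->write(id, value);
}

/* Counting test doubles. */
unsigned sim_dummy_count;
unsigned sim_write_count;

static bool sim_dummy(void)
{
    sim_dummy_count++;
    return true;
}

static bool sim_write(uint16_t id, uint32_t value)
{
    (void)id;
    (void)value;
    sim_write_count++;
    return true;
}

const ee_hooks_t sim_hooks = { sim_dummy, sim_write };
```

A boot that never writes data then costs no flash write at all. In the original code the dummy write is only needed when no recovery transfer occurred, so a real version would set `dummy_pending` from that condition.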

Thank you all for helping me understand this aspect of eeprom emulation!

Piranha · ‎2023-12-05

I know, and that is exactly why the CRC is there. ST's code has a CRC and validates it on every read. Therefore the "faded" bits will be detected with high enough probability and the affected record will be invalidated. That is why I pointed out that the application code has to deal with such a situation gracefully.

SBone.3 · ‎2024-01-16

Hi, I have the same issue that @Giambar reported. A power failure during step 8 of EE_Init() caused an ECCD error, which ended up triggering an NMI.

The code in my NMI handler is the one provided by ST to handle such cases, but it seems to have a bug: the program ended up in the while(1) at the end of the handler, and even a power cycle did not fix it (the MCU was stuck there permanently).

Here's the code:

void NVMSP_NMIHandlerCb(void)
{
    uint32_t Address;
    /* Check if NMI is due to flash ECCD (error detection) */
    if (__HAL_FLASH_GET_FLAG(FLASH_FLAG_ECCD)) {
        if (EE_IsCleanupPhase()) {
            if ((EE_GetAddressRead() >= START_PAGE_ADDRESS) && (EE_GetAddressRead() <= END_EEPROM_ADDRESS)) {
                /* Delete the corrupted flash address */
                Address = (uint32_t) (EE_GetAddressRead() & 0xFFFFFFF0);
                if (EE_DeleteCorruptedFlashAddress((uint32_t) Address) == EE_OK) {
                    /* Resume execution if deletion succeeds */
                    return;
                }
                /* If we do not succeed to delete the corrupted flash address */
                /* This might be because we try to write 0 at a line already considered at 0 which is a forbidden operation */
                /* This problem triggers PROGERR, PGAERR and PGSERR flags */
                else {
                    /* We check if the flags concerned have been triggered */
                    if ((__HAL_FLASH_GET_FLAG(FLASH_FLAG_PROGERR)) && (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PGAERR)) && (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PGSERR))) {
                        /* If yes, we clear them */
                        __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_PROGERR);
                        __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_PGAERR);
                        __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_PGSERR);

                        /* And we exit from NMI without doing anything */
                        /* We do not invalidate that line because it is not programmable at 0 till the next page erase */
                        /* The only consequence is that this line will trigger a new NMI later */
                        return;
                    }
                }
            }
        }
        else {
            __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_ECCD);
            return;
        }
    }

    /* Go to infinite loop when NMI occurs in case:
     - ECCD is raised in eeprom emulation flash pages but corrupted flash address deletion fails (except PROGERR, PGAERR and PGSERR)
     - ECCD is raised out of eeprom emulation flash pages
     - no ECCD is raised */

    /* Go to infinite loop when NMI occurs */
    while (1) {

    }
}

Debugging the issue, I found that the following call failed (because the offending flash location was already at 0):

EE_DeleteCorruptedFlashAddress((uint32_t) Address) == EE_OK

The code then fell through to:

if ((__HAL_FLASH_GET_FLAG(FLASH_FLAG_PROGERR)) && (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PGAERR)) && (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PGSERR))) {

But this check also failed, because only the PROGERR flag was set in the flash status register, not all three.

As a result, the function fell into the while(1) forever.

I think the line above should be changed so that if at least one of these errors is present, they are cleared and the handler returns:

if ((__HAL_FLASH_GET_FLAG(FLASH_FLAG_PROGERR)) || (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PGAERR)) || (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PGSERR))) {

What do you think about it?
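The difference between the two conditions can be checked on a host with plain bit masks. The `SIM_FLAG_*` values below are hypothetical bit positions chosen for illustration, not the real register layout, and `clear_prog_errors` only models the effect of clearing on a plain variable, not how the hardware flags are actually cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only; the real values
 * come from the device header. */
#define SIM_FLAG_PROGERR (1u << 0)
#define SIM_FLAG_PGAERR  (1u << 1)
#define SIM_FLAG_PGSERR  (1u << 2)
#define SIM_PROG_ERRORS  (SIM_FLAG_PROGERR | SIM_FLAG_PGAERR | SIM_FLAG_PGSERR)

/* Equivalent of the original check: all three flags must be set. */
bool all_prog_errors_set(uint32_t sr)
{
    return (sr & SIM_PROG_ERRORS) == SIM_PROG_ERRORS;
}

/* Equivalent of the proposed check: at least one flag is set. */
bool any_prog_error_set(uint32_t sr)
{
    return (sr & SIM_PROG_ERRORS) != 0u;
}

/* Models clearing all three flags on a simulated status value. */
uint32_t clear_prog_errors(uint32_t sr)
{
    return sr & ~SIM_PROG_ERRORS;
}
```

With only PROGERR set, `all_prog_errors_set` is false (the path that ends in the while(1)), while `any_prog_error_set` is true, which matches the behaviour I observed.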