Handling ECCD error during flash read on stm32l4

seutpxle · ‎2016-10-09

Posted on October 09, 2016 at 20:03

Hello,

I'm building an application on an ST STM32L4 that stores information in the internal flash (the L4 has ECC flash). The information is valuable, so flash programming must be robust against unexpected shutdowns, including ones caused by an external watchdog that may reset the MCU at any point in time.

Reference manual RM0351, chapter 3.3.2, states that when an ECCD error is detected, the flash peripheral generates an NMI. What is the proper way to handle this interrupt when the ECCD error was detected while reading a faulty double-word?

This problem occurs when the MCU is reset while the flash peripheral is in the middle of programming a double-word: the ECC has not been written yet, so it is inconsistent with the double-word that was being written.

There is no mention of handling this problem in ST's HAL code or in the reference manual.

Thanks,

MC.

#stm32l4

Lukasz Nowak · ‎2017-08-22

Posted on August 23, 2017 at 01:36

I have just hit exactly the same problem - power cut during flash programming (I am periodically storing log data in the embedded flash).

I looked through the code a bit, and it turns out that HAL_FLASH_IRQHandler() does handle FLASH_FLAG_ECCD. When I call HAL_FLASH_IRQHandler() from NMI_Handler(), it clears the ECCD error flag and the program can move on.

This, however, can have serious consequences. For instance, if the ECCD error occurs when reading the .text section from flash, you probably do not want to continue. One could add a check that the error comes from a known user data address in flash, but that would have to go inside HAL_FLASH.
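As a rough illustration of the address check described above, here is a minimal sketch that decodes a FLASH_ECCR value and decides whether the fault lies in a user data region. The bit positions are my reading of the FLASH_ECCR layout in RM0351 for a dual-bank L4 and should be verified against your device header. The memory map constants and the function name eccd_in_user_data() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed FLASH_ECCR layout (RM0351, dual-bank STM32L4); verify for your part. */
#define ECCR_ADDR_MASK   0x0007FFFFu  /* ADDR_ECC[18:0]: failing offset within the bank */
#define ECCR_BANK2_BIT   (1u << 19)   /* BK_ECC: failure was in bank 2 */
#define ECCR_SYSF_BIT    (1u << 20)   /* SYSF_ECC: failure was in system flash */
#define ECCR_ECCD_BIT    (1u << 31)   /* ECCD: double-bit error detected */

/* Hypothetical memory map: 1 MB part, last 64 KB reserved for user data. */
#define FLASH_BASE_ADDR  0x08000000u
#define BANK_SIZE_BYTES  0x00080000u
#define USER_DATA_START  0x080F0000u
#define USER_DATA_END    0x080FFFFFu

/* Returns true only if ECCD is set and the failing address is in the user
 * data region. On target, pass FLASH->ECCR as the first argument. */
static bool eccd_in_user_data(uint32_t eccr, uint32_t *fault_addr)
{
    if ((eccr & ECCR_ECCD_BIT) == 0u) {
        return false;               /* NMI has some other cause */
    }
    if ((eccr & ECCR_SYSF_BIT) != 0u) {
        return false;               /* system flash: never resume */
    }

    uint32_t addr = FLASH_BASE_ADDR + (eccr & ECCR_ADDR_MASK);
    if ((eccr & ECCR_BANK2_BIT) != 0u) {
        addr += BANK_SIZE_BYTES;
    }
    *fault_addr = addr;

    return (addr >= USER_DATA_START) && (addr <= USER_DATA_END);
}
```

The NMI handler would resume only when this returns true (after clearing ECCD), and halt or reset in every other case, so a fault in .text is never silently ignored.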

It would be good if ST could provide an example of handling such an ECCD failure in the official code. Power failure during programming seems like a common enough scenario to warrant official coverage. With the current example code, the MCU is essentially bricked when a flash programming error occurs (it just freezes inside NMI_Handler() on reads).

I would say that it would probably require adding a HAL_FLASH_Read() function, instead of direct access from user code.

Geoffrey1 · ‎2019-01-20

I'm curious how you ended up handling this. I'm also logging data and worry about what might happen in case of an ECC problem due to reset or power glitch. The code to read/write the data log is quite localized so I was considering using a global flag to let the NMI handler "know" that it's ok to move on. (Sort of an "uncritical" section for reading log data).
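For what it's worth, the global-flag idea could be sketched like this. All names are hypothetical. The NMI handler is assumed to have already confirmed that ECCD is set and to clear it before returning, and on target a barrier such as __DSB() after the read may be needed so the NMI is taken before the flag is dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags shared between the log reader and the NMI handler. */
static volatile bool ecc_tolerant_read = false; /* true only while reading log data */
static volatile bool ecc_error_seen    = false; /* set by the NMI handler */

/* To be called from NMI_Handler() once ECCD is confirmed.
 * Returns true if the handler may clear the flag and return,
 * false if the fault happened outside the tolerant window. */
static bool nmi_may_resume(void)
{
    if (!ecc_tolerant_read) {
        return false;           /* e.g. fault in code or constants: do not resume */
    }
    ecc_error_seen = true;      /* tell the reader its data is garbage */
    return true;
}

/* Reads one 64-bit log entry. Returns false if an ECC fault was reported
 * during the read, in which case *out is left untouched. */
static bool log_read_entry(const volatile uint64_t *src, uint64_t *out)
{
    ecc_error_seen    = false;
    ecc_tolerant_read = true;
    uint64_t value = *src;      /* on target this read may raise the NMI */
    ecc_tolerant_read = false;

    if (ecc_error_seen) {
        return false;
    }
    *out = value;
    return true;
}
```

Keeping the window around a single read keeps the "uncritical" section as small as possible, so a fault anywhere else still stops the system.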

Geoffrey1 · ‎2019-01-20

A bit more digging. Here's how the STM32L4 EEPROM emulation software handles it. The NMI handler calls the following routine to write 0 to the offending double word (the only modification allowed to already-programmed memory).

Not quite where I will do it in my application, but a good enough hint on how to proceed. Fortunately, 0 is not a legal log value in my case, so it's easy to detect and discard.

/**
  * @brief  Delete corrupted Flash address, can be called from NMI. No Timeout.
  * @param  Address Address of the FLASH Memory to delete
  * @retval EE_Status
  *           - EE_OK: on success
  *           - EE error code: if an error occurs
  */
EE_Status DeleteCorruptedFlashAddress(uint32_t Address)
{
  uint32_t dcachetoreactivate = 0U;
  EE_Status status = EE_OK;
 
  /* Deactivate the data cache if they are activated to avoid data misbehavior */
  if(READ_BIT(FLASH->ACR, FLASH_ACR_DCEN) != RESET)
  {
    /* Disable data cache  */
    __HAL_FLASH_DATA_CACHE_DISABLE();
    dcachetoreactivate = 1U;
  }
 
  /* Set FLASH Programmation bit */
  SET_BIT(FLASH->CR, FLASH_CR_PG);
 
  /* Program double word of value 0 */
  *(__IO uint32_t*)(Address) = (uint32_t)0U;
  *(__IO uint32_t*)(Address+4U) = (uint32_t)0U;
 
  /* Wait programmation completion */
  while(__HAL_FLASH_GET_FLAG(FLASH_FLAG_BSY))
  {
  }
 
  /* Check if error occured */
  if((__HAL_FLASH_GET_FLAG(FLASH_FLAG_OPERR))  || (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PROGERR)) ||
     (__HAL_FLASH_GET_FLAG(FLASH_FLAG_WRPERR)) || (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PGAERR))  ||
     (__HAL_FLASH_GET_FLAG(FLASH_FLAG_SIZERR)) || (__HAL_FLASH_GET_FLAG(FLASH_FLAG_PGSERR)))
  {
    status = EE_DELETE_ERROR;
  }
 
  /* Check FLASH End of Operation flag  */
  if (__HAL_FLASH_GET_FLAG(FLASH_FLAG_EOP))
  {
    /* Clear FLASH End of Operation pending bit */
    __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_EOP);
  }
 
  /* Clear FLASH Programmation bit */
  CLEAR_BIT(FLASH->CR, FLASH_CR_PG);
 
  /* Flush the caches to be sure of the data consistency */
  if(dcachetoreactivate == 1U)
  {
    /* Reset data cache */
    __HAL_FLASH_DATA_CACHE_RESET();
    /* Enable data cache */
    __HAL_FLASH_DATA_CACHE_ENABLE();
  }
 
  /* Clear FLASH ECCD bit */
  __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_ECCD);
 
  return status;
}

KStew.1 · ‎2023-09-29

I tried the following and it does not recover: the NMI handler gets triggered again and again.

if (__HAL_FLASH_GET_FLAG(FLASH_FLAG_ECCR_ERRORS) != 0) // Check if an ECC error flag (ECCC or ECCD) is set
{
  HAL_FLASH_IRQHandler();
  __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_ECCR_ERRORS);
}

Has anyone successfully recovered from an ECCD error (handling the corrupt bytes and returning)?

KStew.1 · ‎2023-09-29

Which chip are you using? Could I see your NMI handler? I called the HAL_FLASH_IRQHandler but it isn't resolving the issue.

wogr · ‎2023-10-10

I had a similar problem recently; maybe this helps you find the solution. I'm working on an STM32G0B1. There was a corrupted flash address that caused an NMI when it was read, but in the debugger I couldn't see any flag set. After writing "0" to this double word, the NMI was still triggered. After hours of debugging I found out that the ECCD flag in the ECC2R register was set. This register isn't shown in the Cube debugger. After clearing this flag, the NMI was no longer triggered.
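In case it helps, clearing ECCD in both registers could look roughly like the sketch below. The bit positions are assumptions based on the FLASH_ECCR layout and should be checked against the reference manual for your part (ECC2R in particular). The helper name ecc_clear_eccd() is hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions in FLASH_ECCR / FLASH_ECC2R; verify for your device. */
#define ECC_ECCCIE  (1u << 24)  /* ECC correction interrupt enable (read/write) */
#define ECC_ECCC    (1u << 30)  /* single-bit error corrected (write 1 to clear) */
#define ECC_ECCD    (1u << 31)  /* double-bit error detected (write 1 to clear) */

/* Clears ECCD in one ECC register. Only ECCCIE is written back, so a pending
 * ECCC flag is not cleared as a side effect of a read-modify-write. */
static void ecc_clear_eccd(volatile uint32_t *ecc_reg)
{
    *ecc_reg = (*ecc_reg & ECC_ECCCIE) | ECC_ECCD;
}
```

On a dual-bank G0 the NMI handler would then call it once per bank, for example ecc_clear_eccd(&FLASH->ECCR) followed by ecc_clear_eccd(&FLASH->ECC2R), assuming the device header exposes both registers under those names.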

dtarrago · ‎2023-12-21

Hello,

I'm facing the same issue.

Has anyone found a workaround for this?