How to inject ECC Error into RAM on the STM32H75

ajs1288 · 2024-09-06

Hi,

I’ve been trying to simulate a RAM ECC single and double error for testing on the STM32H753. I’ve seen these other two posts that have a similar issue to mine but with no clear solution:

https://community.st.com/t5/stm32-mcus-products/how-to-test-handling-of-h7-ecc-in-sram/td-p/105854

https://community.st.com/t5/stm32-mcus-embedded-software/stm32h750-ramecc-testing/td-p/696069

I have referenced this example code to set up monitoring and configuration just for AXI SRAM.

My RAMECC initialization and callback function look like this:

static void MX_RAMECC_Init(void)
{

  /* USER CODE BEGIN RAMECC_Init 0 */

  /* USER CODE END RAMECC_Init 0 */

  /* USER CODE BEGIN RAMECC_Init 1 */

  /* USER CODE END RAMECC_Init 1 */

  /** Initialize RAMECC1 M1 : AXI SRAM
  */
  hramecc1_m1.Instance = RAMECC1_Monitor1;
  if (HAL_RAMECC_Init(&hramecc1_m1) != HAL_OK)
  {
    Error_Handler();
  }
  /* USER CODE BEGIN RAMECC_Init 2 */
  if (HAL_RAMECC_EnableNotification(&hramecc1_m1, (RAMECC_IT_MONITOR_SINGLEERR_R | RAMECC_IT_MONITOR_DOUBLEERR_R)) != HAL_OK)
  {
    Error_Handler();
  }
  if (HAL_RAMECC_StartMonitor(&hramecc1_m1) != HAL_OK)
  {
    Error_Handler();
  }
  HAL_NVIC_SetPriority(ECC_IRQn, 0, 0);  // Set the priority level
  HAL_NVIC_EnableIRQ(ECC_IRQn);          // Enable the interrupt
  /* USER CODE END RAMECC_Init 2 */

}

void HAL_RAMECC_DetectErrorCallback(RAMECC_HandleTypeDef *hramecc)
{
  if ((HAL_RAMECC_GetRAMECCError(hramecc) & HAL_RAMECC_SINGLEERROR_DETECTED) != 0U)
  {
    RAMECCSingleErrorDetected++;
  }

  if ((HAL_RAMECC_GetRAMECCError(hramecc) & HAL_RAMECC_DOUBLEERROR_DETECTED) != 0U)
  {
    RAMECCDoubleErrorDetected++;
  }

  hramecc->RAMECCErrorCode = HAL_RAMECC_NO_ERROR;
  HAL_GPIO_TogglePin(GPIOE, GPIO_PIN_1);
}
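
The counting logic in the callback above can also be checked off-target. The following is a minimal sketch, not part of the HAL: `ecc_counters_t`, `ecc_count_errors`, `ERR_SINGLE` and `ERR_DOUBLE` are hypothetical names, and the bit values 0x1 and 0x2 are assumed stand-ins for the HAL single and double error codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the HAL error-code bits (values are assumptions). */
#define ERR_SINGLE 0x1U
#define ERR_DOUBLE 0x2U

/* Hypothetical container for the two counters used in the callback. */
typedef struct {
    uint32_t single_errors;
    uint32_t double_errors;
} ecc_counters_t;

/* Update the counters from one snapshot of the error code.
   Taking the code as a parameter keeps the logic testable without hardware. */
void ecc_count_errors(ecc_counters_t *c, uint32_t error_code)
{
    assert(c != NULL);
    if ((error_code & ERR_SINGLE) != 0U) {
        c->single_errors++;
    }
    if ((error_code & ERR_DOUBLE) != 0U) {
        c->double_errors++;
    }
}
```

On the target, the callback would pass the result of a single HAL_RAMECC_GetRAMECCError() call to such a function, so both checks see the same snapshot.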

I also put this function into stm32h7xx_it.c as done in the example code:

void ECC_IRQHandler(void)
{
  /* Pass the same handle that was initialized in MX_RAMECC_Init() */
  HAL_RAMECC_IRQHandler(&hramecc1_m1);
}

I’ve read through AN5342 and specifically section 2.5 about ECC testing. I’ve attempted to read uninitialized RAM from a cold boot by throwing this in my main function after MX_RAMECC_Init() is called:

  // Start and end addresses for AXI SRAM
  volatile uint32_t *baseAddress = (volatile uint32_t *)0x24000000U;
  const uint32_t endAddress = 0x2407FFFFU;
  volatile uint32_t currentData;
  while (baseAddress < (volatile uint32_t *)endAddress)
  {
    // Read the memory at the current address
    currentData = *baseAddress;
    baseAddress++;
  }

An ECC error does get detected, and when I step through in the debugger both the handler and the callback are called, but the status register holds an odd value, something like “01000100000000001110001110000000” in binary. Bit 0, 1, or 2 should be set to indicate a single or double bit error, but none of them is. As a result, HAL_RAMECC_IRQHandler never sets the corresponding RAMECCErrorCode, and execution passes through the callback without counting any errors. The debugger also gets stuck on the address where the error occurred. Is this only because I’m reading from uninitialized RAM, or am I doing something wrong? I want a test that is guaranteed to behave as expected.
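
To double-check a raw status value like the one quoted above, the bit test can be done on the host. This is only a sketch: `parse_bin32`, `sr_error_flags` and the `SR_*` masks are hypothetical names, and the flag layout (bits 0 to 2) is taken from the description in this post rather than from a device header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag positions in the monitor status register (from the post):
   bits 0, 1 and 2 are the error flags. */
#define SR_FLAG_BIT0 (1U << 0)
#define SR_FLAG_BIT1 (1U << 1)
#define SR_FLAG_BIT2 (1U << 2)

/* Convert a string of '0'/'1' characters (MSB first, 1 to 32 digits)
   to an unsigned value. Returns 0 on success, -1 on malformed input. */
int parse_bin32(const char *bits, uint32_t *out)
{
    assert(bits != NULL && out != NULL);
    uint32_t value = 0U;
    size_t n = 0;
    for (; bits[n] != '\0'; n++) {
        if (n >= 32 || (bits[n] != '0' && bits[n] != '1')) {
            return -1;
        }
        value = (value << 1) | (uint32_t)(bits[n] - '0');
    }
    if (n == 0) {
        return -1;
    }
    *out = value;
    return 0;
}

/* Keep only the three error flag bits of a status value. */
uint32_t sr_error_flags(uint32_t sr)
{
    return sr & (SR_FLAG_BIT0 | SR_FLAG_BIT1 | SR_FLAG_BIT2);
}
```

Feeding the quoted binary string through parse_bin32() and then sr_error_flags() should return 0, which matches the observation that none of the three flags is set.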

AN5342 also mentions that it is possible to create an ECC error by writing the same address twice without erasing between writes. I tried this without success; neither the callback nor the handler was ever called. Here’s an example of something I attempted, which was also put into my main function:

  volatile uint32_t *ptr;
  uint32_t original_value = 0x12345678;  // Some initial value
  uint32_t new_value = 0x87654321;       // A new value to overwrite the original
  // Example address in SRAM
  ptr = (volatile uint32_t *)0x24077778U;
  uint32_t value;
  *ptr = original_value;
  // Write the new value to the same address without erasing
  *ptr = new_value;
  // Reading from this address may now trigger an ECC error
  value = *ptr;

I was able to manipulate the status register and the respective bits for a single and double bit error. I then called the handler to see if everything proceeded as it should. It does and passes through the callback as intended, counting the errors. Here’s what I threw in my main to achieve this:

  hramecc1_m1.Instance->SR |= 0x1;
  HAL_RAMECC_IRQHandler(&hramecc1_m1);
  hramecc1_m1.Instance->SR |= 0x2;
  HAL_RAMECC_IRQHandler(&hramecc1_m1);

It was reassuring to see this pass, but it only emulates the ECC flags in software. What I want is to actually inject an error into RAM.

Is there a way to intentionally inject an ECC error into RAM?

Christophe VRIGNAUD · 2024-09-26

Hello,

First of all, on the STM32H753xI, the RAM ECC can't be turned off. It can't be enabled/disabled like on the STM32H5 or STM32U5.

You can refer to Guide: Injecting and Handling ECC Errors in RAM and Flash on STM32H7 - STMicroelectronics Community

You will find the description of what to do to detect RAM (but also flash) ECC errors.

At the end of the article, the way to get a RAM ECC error is given: if the RAM is not initialized (not filled with zeroes) at power-on, its content is random. Writing zeroes to the RAM updates the ECC information stored for each memory location. If this initialization is not done before starting the monitor, there is a very high chance that an error will be detected very quickly when reading from the RAM.
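
For reference, the zero-fill initialization described above could look roughly like the sketch below. `ram_zero_fill64` is a hypothetical helper, not an ST function, and the 64-bit write width is an assumption based on the AXI SRAM computing ECC per 64-bit word; check AN5342 and the reference manual for the RAM region you are testing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write zero to `words` consecutive 64-bit words starting at `start`.
   Full-width writes let the controller store a fresh ECC for each word
   (assumed 64-bit ECC granularity for AXI SRAM). */
void ram_zero_fill64(volatile uint64_t *start, size_t words)
{
    assert(start != NULL);
    for (size_t i = 0; i < words; i++) {
        start[i] = 0U;
    }
}

/* Possible use on the target, with the AXI SRAM range quoted in the question
   (0x24000000 to 0x2407FFFF, i.e. 0x80000 bytes):

   ram_zero_fill64((volatile uint64_t *)0x24000000UL,
                   0x80000U / sizeof(uint64_t));

   This must run before anything (stack, .data, .bss) lives in that region,
   so it normally belongs in early startup code. */
```

Skipping this step for the region under test is what leaves stale ECC bits in place after a cold boot, which is the error source the article relies on.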

But it's not possible to predict if a single or a double error will be detected.

The other way to generate an error, and be sure of getting a single or a double error, is with the debugger: stop the execution, manually change the content of the RAM, and run again.

On the ECC subject: AN5342