ITCM and DTCM RAM ECC monitors triggering HAL_RAMECC_DetectErrorCallback

20jmorrison · 2024-09-20

MCU:

STM32H753VI

Scenario:

I am working on the error management system for our firmware. Part of that is handling ECC errors in flash and RAM. I have been able to successfully inject and handle flash ECC errors, but am having trouble with RAM ECC.

Problem:

Requirements specify that I must monitor all regions of RAM with ECC enabled. In the .ioc file, I enabled monitoring for every available region of RAM under the RAMECC section. In main.c, I enable notifications and start each monitor. I also implemented HAL_RAMECC_DetectErrorCallback(). However, I have encountered something strange with both RAMECC1_Monitor2 (mapped to ITCM-RAM) and RAMECC1_Monitor4 (mapped to D1TCM-RAM). When stepping through the program with the debugger, I notice that HAL_RAMECC_DetectErrorCallback() keeps getting triggered by both of those monitors. When I check the RAMECC handle that is passed to the callback, its fields have the following values:

State: HAL_RAMECC_STATE_RESET

ErrorCode: 0

RAMECCErrorCode: 0

I cannot figure out why HAL_RAMECC_DetectErrorCallback is being triggered if the RAMECC handle indicates that there is no ECC error.

What I've tried:

Writing all zeros to both the ITCM and DTCM regions of RAM on startup, before initializing the RAM ECC monitors. Here is the relevant code:

	volatile uint32_t* itcm_ram_base = (volatile uint32_t*)D1_ITCMRAM_BASE;  // Base address of ITCM-RAM
	volatile uint32_t* dtcm_ram_base = (volatile uint32_t*)D1_DTCMRAM_BASE;  // Base address of DTCM-RAM
	uint32_t itcm_ram_size = 0x10000;  // Size of ITCM-RAM in bytes (64 KB)
	uint32_t dtcm_ram_size = 0x20000;  // Size of DTCM-RAM in bytes (128 KB)
	// Iterate through the memory region and initialize with 0
	for (uint32_t i = 0; i < (itcm_ram_size / sizeof(uint32_t)); i++)
	{
		itcm_ram_base[i] = 0;  // Initialize ITCM-RAM with zeroes (or any known value)
	}

	for (uint32_t i = 0; i < (dtcm_ram_size / sizeof(uint32_t)); i++)
	{
		dtcm_ram_base[i] = 0;  // Initialize DTCM-RAM with zeroes (or any known value)
	}

I've noticed that if I run this code, HAL_RAMECC_DetectErrorCallback is no longer called. If I then comment this code out and run again, HAL_RAMECC_DetectErrorCallback is called again (which I would expect), but this time the RAMECC handle passed to the callback indicates that an ECC error has actually been detected.

Question:

- Why are both the ITCM and DTCM regions of RAM triggering HAL_RAMECC_DetectErrorCallback even though no ECC error has been detected?

- Why is it that when I initialize both the ITCM and DTCM regions of RAM, run the program, and then comment out that initialization, an ECC error is then detected?

While I could just leave in the code that sets ITCM and DTCM RAM to 0 on startup, I would like to know why only these two regions cause HAL_RAMECC_DetectErrorCallback to trigger even when no ECC error is detected.

Thank you!

Bubbles · 2024-12-10

Hi @20jmorrison

This could be related to the access size.

ITCM is 64 bits wide and DTCM is 32 bits wide.

If an access to one of these memories is narrower than the memory width, the memory controller performs a read-modify-write.

Because the memory was not initialized, the read generates an ECC error.

When the same operation is done after the memory has been initialized (zeros written at the full memory width), the read no longer generates an error.
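To make the read-modify-write point concrete, here is a toy host-side model of a single 64-bit ECC-protected word. It is only a sketch of the behavior described above, not the RAMECC hardware or the HAL: ecc_word_t, ecc_write64 and ecc_write8 are hypothetical names, and the idea that the write phase of a read-modify-write stores a fresh, valid ECC is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one 64-bit word plus its ECC state (hypothetical, host-side only). */
typedef struct {
    uint64_t data;        /* stored data bits */
    bool     ecc_valid;   /* false until a write has stored a matching ECC */
    unsigned ecc_errors;  /* number of reads that saw a non-matching ECC */
} ecc_word_t;

/* Full-width write: data and ECC are replaced together, so no read is needed. */
void ecc_write64(ecc_word_t *w, uint64_t value)
{
    w->data = value;
    w->ecc_valid = true;
}

/* Sub-width write: the whole word is read, one byte is merged in, and the
 * word is written back. The read phase is where an uninitialized word
 * reports an ECC error. */
void ecc_write8(ecc_word_t *w, unsigned byte_index, uint8_t value)
{
    assert(byte_index < 8u);
    if (!w->ecc_valid) {
        w->ecc_errors++;  /* read phase hit a word whose ECC was never written */
    }
    unsigned shift = byte_index * 8u;
    w->data = (w->data & ~((uint64_t)0xFFu << shift)) | ((uint64_t)value << shift);
    w->ecc_valid = true;  /* assumption: the write phase stores a fresh ECC */
}
```

In this model, a byte write to a word that has never been written at full width counts one error, while the same byte write after ecc_write64 counts none, which matches the two cases reported in the question.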

The correct approach is to initialize the memory (writing at the full memory width, to avoid a read-modify-write) before enabling the ECC monitoring.
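A minimal sketch of such an initialization, assuming 64-bit stores for ITCM and 32-bit stores for DTCM as per the widths quoted above. The helper names fill_region64 and fill_region32 are hypothetical, and whether the compiler really emits a single full-width store for each uint64_t assignment is an assumption worth checking in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a region using only 64-bit stores (hypothetical helper, intended for ITCM). */
void fill_region64(volatile uint64_t *base, size_t size_bytes, uint64_t value)
{
    /* Base and size must be multiples of 8 so that no partial word is left over. */
    assert(((uintptr_t)base % sizeof(uint64_t)) == 0u);
    assert((size_bytes % sizeof(uint64_t)) == 0u);
    for (size_t i = 0; i < size_bytes / sizeof(uint64_t); i++) {
        base[i] = value;
    }
}

/* Fill a region using only 32-bit stores (hypothetical helper, intended for DTCM). */
void fill_region32(volatile uint32_t *base, size_t size_bytes, uint32_t value)
{
    /* Base and size must be multiples of 4 so that no partial word is left over. */
    assert(((uintptr_t)base % sizeof(uint32_t)) == 0u);
    assert((size_bytes % sizeof(uint32_t)) == 0u);
    for (size_t i = 0; i < size_bytes / sizeof(uint32_t); i++) {
        base[i] = value;
    }
}
```

On the target, the calls would presumably look like fill_region64((volatile uint64_t *)D1_ITCMRAM_BASE, 0x10000, 0) and fill_region32((volatile uint32_t *)D1_DTCMRAM_BASE, 0x20000, 0), using the sizes from the original post. They would need to run before the RAMECC monitors are started and before anything that must be preserved is placed in those regions.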

BR,

J
