
STM32H74 ECC Errors

andy_long
Associate III

Hi,

andy_long_0-1712312892764.png

I have some questions about the ECC functionality on the STM32H74x.

  1. How can we know whether the data can be corrected in the DEDF and DEBWDF cases?
  2. I always see the DEDF and SEDF fields set. These bits are already set when execution reaches main(), so I cleared them by writing 0x03 to M3SR and M4SR (with execution stopped at the red line). However, the bits were set again as soon as I stepped to the next instruction (see the snapshots below). Why is that?

andy_long_0-1712315036609.png

andy_long_1-1712315066857.png

 

9 REPLIES
SofLit
ST Employee

Hello,

A single-bit ECC error is detected and automatically corrected by hardware.

A double-bit ECC error is detected but not corrected.

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.
PS: Be polite in your reply. Otherwise, it will be reported as inappropriate and you will be permanently blacklisted from my help/support.

Thank you for your prompt reply, @SofLit. I had to update my question; could you please have another look?

Why does the flowchart check "if the data can be corrected" for a double error if a double error cannot be corrected?

You didn't mention the document you're referring to. After some searching, it turns out that you're referring to AN5342, "How to use error correction code (ECC) management for internal memories protection on STM32 MCUs".

@Bubbles could help you on this.


Sorry about that, @SofLit. Yes, you are right: the document is AN5342.

Hi @andy_long,

The label is not "if the data can be corrected" but "can correct data be obtained". This covers cases where the application implements some extra redundancy: for example, all the critical data are stored twice in two different locations, or there is a way to reconstruct the missing data from the remaining data, or the data can be recomputed (for example, it is an intermediate result of a cryptographic operation).

For sure, SECDED by itself cannot recover the content of a location where a 2-bit error was detected.
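
To illustrate the "stored twice" idea, here is a minimal sketch, not taken from AN5342. The record layout, the checksum, and the names `critical_record_t`, `record_store` and `record_restore` are all hypothetical. It assumes the double-error handler already knows which copy faulted, so the faulted copy is never read again, only overwritten from the other copy once that copy passes its checksum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_WORDS 4u

/* Hypothetical record: payload plus a checksum so a copy can be validated. */
typedef struct {
    uint32_t payload[PAYLOAD_WORDS];
    uint32_t checksum;
} critical_record_t;

/* Simple rotate-and-XOR checksum; a real design would likely use a CRC. */
uint32_t record_checksum(const critical_record_t *r)
{
    uint32_t sum = 0xA5A5A5A5u;
    for (size_t i = 0; i < PAYLOAD_WORDS; ++i) {
        sum = ((sum << 5) | (sum >> 27)) ^ r->payload[i];
    }
    return sum;
}

/* Write the same data to both copies (assumed to live in different RAM areas). */
void record_store(critical_record_t *primary, critical_record_t *mirror,
                  const uint32_t data[PAYLOAD_WORDS])
{
    assert(primary != NULL && mirror != NULL && primary != mirror);
    for (size_t i = 0; i < PAYLOAD_WORDS; ++i) {
        primary->payload[i] = data[i];
    }
    primary->checksum = record_checksum(primary);
    *mirror = *primary;
}

/* After a double error in `faulted`: validate `other`, then overwrite the
 * faulted copy completely. Returns false if no correct data can be obtained. */
bool record_restore(critical_record_t *faulted, const critical_record_t *other)
{
    if (record_checksum(other) != other->checksum) {
        return false; /* the second copy is damaged too */
    }
    *faulted = *other; /* a full rewrite also regenerates the ECC bits */
    return true;
}
```

If `record_restore` returns false, the answer to "can correct data be obtained" is no, and the application has to fall back to whatever its safe state is.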

BR,

J

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

Bubbles
ST Employee

PS: Regarding the second question, maybe your code is reading some uninitialized memory. Each time the read is done, the ECC is checked and the error pops up again. Clearing the error flags won't help; you need to rewrite the faulty memory location.
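
One common way to avoid this is to write every ECC-protected RAM location once at startup, before anything reads it. The following is only a sketch: `ecc_ram_init` is a hypothetical helper, and it assumes that full 64-bit aligned writes cover the ECC word of the memory in question. The actual ECC word width differs between the RAMs, so check the reference manual for the one you use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a RAM region with zeros using full 64-bit writes, so that every
 * ECC word in the region gets valid check bits.
 * Assumption: `start` is 8-byte aligned and `bytes` is a multiple of 8. */
void ecc_ram_init(volatile uint64_t *start, size_t bytes)
{
    assert(((uintptr_t)start % sizeof(uint64_t)) == 0u);
    assert((bytes % sizeof(uint64_t)) == 0u);

    for (size_t i = 0; i < bytes / sizeof(uint64_t); ++i) {
        start[i] = 0u; /* write only, the old content is never read */
    }
}
```

On a target this would be called early in startup with the start address and size of each RAM region (for example taken from linker symbols, which are not shown here). The region holding the active stack needs extra care, since it is already in use when C code runs.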


@Bubbles Thank you for your replies. Point 1 is clear.

The code snippet I showed runs before the scheduler is started, which means no other code is running concurrently. Some memories are not initialized, but my understanding is that you only get a single/double error when you actually read the uninitialized memory.

In my case, as mentioned before, I am clearing the status registers (but not rewriting the memory location) where the red line shows, and a single step in the debugger shows the single-error and double-error bits set again. Is this because the previously faulty memory location has not been rewritten (even though I am not reading that location)?

Bubbles
ST Employee

Hello @andy_long ,

The source code alone doesn't tell me much; I'd need to see the disassembly and the CPU registers step by step to get the full picture.

But there is not much depth to the ECC topic; it's really simple. Each time a memory location is read, the ECC is checked. The location may be more than one word; it can even be 128 bits, depending on the memory type. If a single error is corrected, it is only corrected in the data that was read, not in the original location. To prevent a double-bit error from developing, it's advised to take the corrected data and rewrite the faulty location, which removes the single-bit error. This way, if another error occurs later, it will again be only a single-bit error.

You said you are not reading the same location, but two adjacent variables in SRAM may in some cases share the same ECC.
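
As a rough sketch of both points, assuming a 64-bit ECC word (`ECC_WORD_BYTES` is an assumption, and all function names are hypothetical): `shares_ecc_word` shows why two different variables can hit the same ECC check, and `ecc_scrub` shows the read-then-rewrite step on one ECC word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ECC word size in bytes; it depends on the memory type. */
#define ECC_WORD_BYTES 8u

/* Start address of the ECC word that contains `addr`. */
uintptr_t ecc_word_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(ECC_WORD_BYTES - 1u);
}

/* True if two addresses are protected by the same ECC check bits. */
bool shares_ecc_word(uintptr_t a, uintptr_t b)
{
    return ecc_word_base(a) == ecc_word_base(b);
}

/* Scrub one ECC word: the read returns the corrected value (for a single
 * error), and writing it back stores clean data with fresh check bits. */
void ecc_scrub(volatile uint64_t *word)
{
    assert(((uintptr_t)word % ECC_WORD_BYTES) == 0u);
    uint64_t corrected = *word;
    *word = corrected;
}
```

With these assumptions, two 32-bit variables at 0x24000000 and 0x24000004 share one ECC word, so reading either of them checks the ECC of both.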

BR,

J


andy_long
Associate III

@Bubbles 

Thanks... I will have a look