STM32H755 RAMECC DTCM unexpected behavior

tkee2023
Associate

Reference documents: STM32H755 reference manual - RM0399, ECC implementation guide for STM devices - AN5342

I have been implementing a handler for errors triggered by RAMECC on an STM32H755 Nucleo dev board. I test the handler by enabling RAMECC interrupts and error latching per monitor (the "global interrupts" are left disabled to keep control on a per-monitor basis) and then reading uninitialized memory regions until an interrupt is triggered.

In my testing, all of the 32-bit regions, the AXI-SRAM, and the ITCM-RAM behave as expected: the word offset is latched into the FAR register, and one or two 32-bit words are latched in the FDRL and FDRH registers as appropriate (FDRH is unused in the 32-bit memory space). The FDRL and FDRH values can be verified by examining the memory referenced by the FAR offset (this offset points to a word, not a byte, so it must be scaled by the word size to get the byte address; see AN5342). If a double bit flip is found, the FDRH/FDRL registers latch the current value at the address. If a single bit flip is found, the corrected value is latched. The corrected value differs from memory by only one bit, so the FAR value can still be verified by comparison.
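The FAR scaling described above can be sketched as a small helper. This is a minimal illustration, not ST HAL code: the function name `ramecc_far_to_byte_address` and its parameters are hypothetical, and the region base address and word size must be taken from RM0399 and AN5342 for the monitor in question.

```c
#include <stdint.h>

/*
 * Convert a latched FAR word offset into a byte address.
 * Hypothetical helper: region_base and word_size_bytes are supplied by the
 * caller and must match the memory region covered by the monitor.
 */
static inline uint32_t ramecc_far_to_byte_address(uint32_t region_base,
                                                  uint32_t far_value,
                                                  uint32_t word_size_bytes)
{
    /* FAR counts words, not bytes, so scale by the word size. */
    return region_base + far_value * word_size_bytes;
}
```

For a 32-bit region the call would use a word size of 4, for example `ramecc_far_to_byte_address(base, far, 4u)`; the handler can then read that address and compare it with FDRL.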

The ECC monitors for the DTCM (ECC domain 1, monitors 3 and 4), however, do not behave as expected. The documentation implies that the two monitors split responsibility for the DTCM space, i.e. monitor 3 is responsible for the first 64 KiB of RAM, while monitor 4 is responsible for the second 64 KiB. This matters because it affects the word offset: monitor 3's failing address register value is an offset that should be added to the start of the first region (0x20000000), while monitor 4's FAR value should be added to the start of the second region (0x20010000). Furthermore, since the DTCM is a 64-bit memory space, any ECC error should cause the monitors to latch two 32-bit words in the FDRL and FDRH registers.

In testing, the fault registers associated with the DTCM monitors show the following behavior. First, the FDRH registers are never set; they are locked to 0x00. Second, when an ECC error is detected by either monitor, the other monitor usually detects an error as well, with both recording the same offset in their FAR registers (occasionally the monitors “desync”, with different FARs being recorded). Examining the memory via the memory browser at the offset added to the base address of 0x20000000 shows that the value latched by monitor 3 in its FDRL is present in the first 4 bytes pointed to by its FAR, while the next 4 bytes match the FDRL latched by monitor 4 (see attached image). This is unexpected, as the base address for monitor 4 should be 0x20010000, yet the data found when using 0x20010000 as the base does not resemble the latched value at all. Furthermore, when testing addresses higher in the memory space, both monitors' FAR values exceed the 64 KiB length limit. Instead, the correct data is found by assuming that both monitors cover the full 128 KiB of memory and use 0x20000000 as the base address, with monitor 4 having a further 4-byte offset.

The monitors’ offsets point to 64 bit words, yet they behave as if they are 32 bit word monitors (acting as the SRAM monitors) that cover the entire 128 KiB space, with monitor 3 recording the first 32 bits and monitor 4 recording the last 32 bits.

I have not been able to find any references to this issue online. I suspect it is either an error in the documentation or an undocumented hardware bug. Has anyone else seen this behavior? Aside from disabling the ECC interrupts in DTCM, the other easy option is to force a reset on any bit flip in the DTCM. The harder approach is to implement a fix where both monitors are checked to get the full 64-bit word to write back. This may be difficult because the monitors occasionally “desync” and their FARs differ. I will update the post if I find a solution.

1 ACCEPTED SOLUTION

tkee2023
Associate

Update - it appears that I was wrong in some of my assumptions. First, the DTCM is on a 64-bit bus, but the actual accesses are via two separate 32-bit buses. The regions covered by the DTCM monitors are interleaved, with the first 4 bytes of each 64-bit word handled by DTCM0 and the last 4 bytes handled by DTCM1. This is in contrast to the SRAM monitors, where the memory handled by each monitor is contiguous: SRAM1_0 covers the first 64 KiB, SRAM1_1 covers the last 64 KiB. Writing to the address indicated by the FAR via a 32-bit access fixes the issues that I have seen.
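The interleaving can be sketched as follows. This is only an illustration built on the observations in this thread, namely that each FAR value indexes a 64-bit word from 0x20000000 and that DTCM1 covers the upper 4 bytes of that word; the type and function names are hypothetical and are not part of the ST HAL. The scaling should be checked against RM0399 and AN5342 before use.

```c
#include <stdint.h>

#define DTCM_BASE_ADDR 0x20000000u  /* DTCM start address used in this thread */

/* Hypothetical identifiers for the two DTCM monitors (not ST HAL names). */
typedef enum { DTCM_HALF_0 = 0, DTCM_HALF_1 = 1 } dtcm_half_t;

/*
 * Byte address of the 32-bit word flagged by one DTCM monitor.
 * Assumption from the thread: FAR indexes 64-bit words; DTCM0 holds the
 * first 4 bytes of each word and DTCM1 holds the last 4 bytes.
 */
static inline uint32_t dtcm_fault_address(dtcm_half_t half, uint32_t far_value)
{
    uint32_t offset = far_value * 8u;   /* start of the 64-bit word */
    if (half == DTCM_HALF_1) {
        offset += 4u;                   /* upper 32-bit half */
    }
    return DTCM_BASE_ADDR + offset;
}

/* Rewrite the flagged word with a single 32-bit store. */
static inline void dtcm_rewrite_word(volatile uint32_t *word, uint32_t value)
{
    *word = value;
}
```

In a handler, the address from `dtcm_fault_address` would be cast to `volatile uint32_t *` and passed to `dtcm_rewrite_word` together with the value to restore, so that each monitor's half is written back separately rather than as one 64-bit store.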

2 REPLIES
FBL
ST Employee

Hello @Tim Kuzmenkov​,

First, let me thank you for your feedback.

Second, as stated in RM0399, page 111, the CPU uses the 2x32-bit DTCM bus for accessing data in the DTCM. The 2x32-bit DTCM bus allows load/load and load/store instruction pairs to be dual-issued on the DTCM memory.

Also, as you mentioned, in Table 11. ECC controller mapping, we can find the address offset for each DTCM RAM monitor.

In addition, AN5342 section 3.1.3, Interpreting FAR, gives the following formula for computing the physical address of the failure: Address = 0x2000 0000 + FADD × word size in bytes = 0x2000 0018. So I think it makes sense that the failing address corresponds to where the error occurred.
