Disable Flash ECC NMI Issue

Markus8494
Associate II

Hello,

I am currently developing a Bootloader for an STM32H562. The Bootloader checks whether a valid application is present by calculating a checksum over the application area. To keep ECC errors from interrupting the checksum calculation, I disable the ECC NMI for this procedure (see attached code snippet image "Code_CRC.png").

Summary of the planned procedure:
* disable ECC NMI
* read data from the flash memory
* enable ECC NMI
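
The three steps above can be sketched roughly as follows. This is only a host-side illustration, not the code from the attached image: `mock_sbs_eccnmir`, `MOCK_ECCNMI_MASK_EN` and `checksum_with_nmi_masked` are hypothetical names, the bit position is an assumption to be checked against the device header, and a plain additive sum stands in for the real checksum. On the target, the mock variable would be replaced by the real SBS_ECCNMIR register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for SBS_ECCNMIR (assumption: on target, use the real register). */
static volatile uint32_t mock_sbs_eccnmir = 0U;

/* Assumed placeholder for the ECCNMI_MASK_EN bit; verify against the device header. */
#define MOCK_ECCNMI_MASK_EN (1UL << 0)

/* Sums 'words' 32-bit words while the (mock) ECC NMI mask bit is set. */
uint32_t checksum_with_nmi_masked(const uint32_t *data, size_t words)
{
    assert(data != NULL);
    uint32_t sum = 0U;

    mock_sbs_eccnmir |= MOCK_ECCNMI_MASK_EN;   /* 1. disable ECC NMI */

    for (size_t i = 0; i < words; i++)
    {
        sum += data[i];                        /* 2. read data from the flash memory */
    }

    mock_sbs_eccnmir &= ~MOCK_ECCNMI_MASK_EN;  /* 3. enable ECC NMI again */

    return sum;
}
```

The mask is cleared again on the only exit path, so the NMI is never left disabled after the checksum has been computed.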

Unfortunately, the NMI is still generated even though I have disabled its generation by the bit "ECCNMI_MASK_EN" in the "SBS_ECCNMIR" register.

In my test case I write 4 words at the flash address 0x0800'8000. Then I write 4 different words again at this location (without erasing) which seems to cause an ECC double bit error at this location.
Now I reset the processor and execute the procedure in the code snippet. I set a breakpoint at line #111.
With my debugger I can verify that the "ECCNMI_MASK_EN" bit is set to 1 (=> see attached image "SBS_Regs.png")
The function in line #111 simply does *(uint32*)0x08008000. If I step over this line, the processor gets caught in the NMI handler. The Flash double bit error flag "ECCD" in register "FLASH_ECCDETR" is set. (=> see attached image "FLASH_ECCD_Err.png").

Why is the NMI still generated?

During this procedure, the ICACHE and the Flash prefetch are active. Could this cause the issue?
Edit: Tried it with disabled ICACHE and disabled Flash prefetch => does not make a difference.

Thank you,
Markus

1 ACCEPTED SOLUTION
Markus8494
Associate II

I think I may have found the reason why it is not possible to disable the ECC double bit error NMI.

In 7.9.10 of RM0481 (Rev 1) I found the following information:

[...]
When the ECCD flag is raised, an NMI is generated, it can be masked in SBS registers (SBS FLITF ECC NMI mask register (SBS_ECCNMIR)) for data access (OTP, data area, RO data).
[...]

I am calculating the checksum in "normal" code space in flash, so I assume that's why setting this bit has no effect in my application. Maybe an ST employee can confirm that.

 

My solution was to handle this with 2 (static) global variables and some code in the NMI handler, similar to the suggestion of @Pavel A. :

volatile bool ignoreEccErr = false;
volatile bool eccDoubleBitErrOccurred = false;

/**
 * NMI handler.
 * Handles ECC errors during Bootloader checksum calculation.
 * Returns from NMI if it is caused by an ECC double bit error and the 'Ignore ECC errors' flag is set.
 * Never returns in all other cases.
 */
void __attribute__((used)) NMI_Handler(void)
{
    /* Return from the NMI only if ECC errors are being ignored and a double-bit ECC error is flagged */
    if (ignoreEccErr && (READ_BIT(FLASH->ECCDETR, FLASH_ECCR_ECCD) != 0U))
    {
        /* Clear the detection flag (by writing 1) so that subsequent double-bit ECC errors can be detected */
        SET_BIT(FLASH->ECCDETR, FLASH_ECCR_ECCD);
        eccDoubleBitErrOccurred = true;
        return;
    }
    
    while (1)
    {
    }
}

 

The application verification function now looks like this:

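bool isValidApplication(void)
{
    bool isValid;
    /* Clear any stale result from a previous verification run */
    eccDoubleBitErrOccurred = false;
    ignoreEccErr = true;
    /* CRC calculation and verification; reads from program flash memory */
    isValid = isCrcCorrect();
    ignoreEccErr = false;

    /* Reject the application if any double-bit ECC error was seen, even if the CRC happens to match */
    if (eccDoubleBitErrOccurred)
    {
        isValid = false;
    }
    return (isValid);
}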

 

In my test case, this solution seems to work. After the NMI handler returns, the function that read the bad memory gets the bad value, adds it to the checksum, and continues with the next memory address. The checksum calculation finishes and, because the 'eccDoubleBitErrOccurred' flag is set, the verification result is forced to invalid (to rule out an accidentally correct CRC despite ECC errors in the application area).
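
The flag pattern described above can also be tried on a host machine. The following is only a sketch under stated assumptions: `simulated_nmi`, `simulated_flash_read` and `verify_image` are hypothetical helpers, the NMI is modeled as an ordinary function call, and a plain additive sum stands in for the CRC. It illustrates the control flow only, not the hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static volatile bool ignoreEccErr = false;
static volatile bool eccDoubleBitErrOccurred = false;

/* Hypothetical stand-in for the ECC branch of the NMI handler. */
static void simulated_nmi(void)
{
    if (ignoreEccErr)
    {
        eccDoubleBitErrOccurred = true;
    }
}

/* Hypothetical flash read: the word at 'bad_index' mimics a double-bit ECC error. */
static uint32_t simulated_flash_read(const uint32_t *mem, size_t i, size_t bad_index)
{
    if (i == bad_index)
    {
        simulated_nmi();  /* the handler runs, then the read completes with bad data */
    }
    return mem[i];
}

/* Returns true only if the sum matches AND no simulated ECC error was seen. */
bool verify_image(const uint32_t *mem, size_t words, uint32_t expected_sum, size_t bad_index)
{
    assert(mem != NULL);
    uint32_t sum = 0U;

    eccDoubleBitErrOccurred = false;  /* clear any stale result */
    ignoreEccErr = true;
    for (size_t i = 0; i < words; i++)
    {
        sum += simulated_flash_read(mem, i, bad_index);
    }
    ignoreEccErr = false;

    return (sum == expected_sum) && !eccDoubleBitErrOccurred;
}
```

Passing a `bad_index` outside the image (for example, equal to `words`) models a clean flash; any index inside the image forces the result to invalid even when the sum matches.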


4 REPLIES 4
TDK
Guru

What is the utility of intentionally producing and then ignoring ECC errors?

Markus8494
Associate II

This is just a *test case* for my Bootloader. In normal operation, the Bootloader shall program the whole application including the application size and the checksum and it should work without any problems.

But a power failure could, for example, occur during a flash program operation. In that case, the last written quad-word might be incorrectly programmed.

Because my Bootloader checks the application right after booting, an ECC error in the application flash area would cause an NMI reset and therefore a boot loop. This would render the device unusable, as it would no longer be possible to download a new application.

So I would like to disable the ECC NMI during the application verification procedure (to avoid the NMI reset) and re-enable it after the verification has completed.

 

Pavel A.
Evangelist III

NMI is an exception. You can return from it as usual, but modify the PC and other registers as needed to avoid touching bad memory again. Something like setjmp/longjmp. 
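
The setjmp/longjmp idea can be illustrated on a host machine as below. This is a sketch of the control flow only, with hypothetical names (`recovery_point`, `guarded_read`, `sum_or_abort`) and a function call standing in for the fault. On a Cortex-M target, calling longjmp directly inside the NMI handler would not perform an exception return, so a real implementation would more likely need to adjust the stacked return address instead; treat that part as an assumption to verify against the architecture documentation.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Recovery point that the simulated fault jumps back to. */
static jmp_buf recovery_point;

/* Hypothetical read: the word at 'bad_index' triggers the simulated fault. */
static uint32_t guarded_read(const uint32_t *mem, size_t i, size_t bad_index)
{
    if (i == bad_index)
    {
        longjmp(recovery_point, 1);  /* abandon the read and unwind to setjmp */
    }
    return mem[i];
}

/* Sums the image; returns false (leaving *out_sum untouched) if a read faulted. */
bool sum_or_abort(const uint32_t *mem, size_t words, size_t bad_index, uint32_t *out_sum)
{
    assert(mem != NULL && out_sum != NULL);
    volatile uint32_t sum = 0U;  /* volatile: modified between setjmp and longjmp */

    if (setjmp(recovery_point) != 0)
    {
        return false;  /* reached via longjmp: the bad word was never consumed */
    }

    for (size_t i = 0; i < words; i++)
    {
        sum += guarded_read(mem, i, bad_index);
    }

    *out_sum = sum;
    return true;
}
```

Unlike a handler that simply returns, this variant stops at the first bad word instead of feeding the bad value into the checksum.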
