NMI error with STM32H563VGT MCU

massimoperdigo · ‎2025-02-17

Hi everyone!

I’m facing random NMI errors in my project using the STM32H563 MCU, and I’m trying to identify the root cause.

System Overview

My project uses Ethernet (ETH) with NetX Duo and ThreadX with static configuration.
No RAMCFG (SRAMx, BKPSRAM) is used, and FLASH is not enabled in CubeMX.
I am not using HSE, but HSI instead, as I previously suspected an HSE-related issue.
Not using ICACHE/DCACHE (although I know it is highly recommended for ETH).
Not using the MPU.

I am by no means an expert in NMI errors, so I want to ensure I am not overlooking other possible causes.

Could these errors be caused by my custom memory partitioning?

MEMORY
{
/* RAM       ( xrw) : ORIGIN = 0x20000000 , LENGTH = 640K */
RAM1         ( xrw) : ORIGIN = 0x20000000 , LENGTH = 100K 
RAM_APP      ( xrw) : ORIGIN = 0x20019000 , LENGTH = 210K 
RAM_NEXDUO   ( xrw) : ORIGIN = 0x2004D800 , LENGTH = 330K
FLASH        ( rx ) : ORIGIN = 0x08000000, LENGTH = 946K
COLORS       ( rx ) : ORIGIN = 0x080EC800, LENGTH = 70K
ROOT_CA      ( rx ) : ORIGIN = 0x080FE000, LENGTH =  4K 
DEV_CERT     ( rx ) : ORIGIN = 0x080FF000, LENGTH =  4K
}

How can I debug the exact source of the NMI?

I’d really appreciate any insights, suggestions, or debugging strategies to help pinpoint the issue. Thanks in advance for your help!
I will provide any feedback or relevant information!

SofLit · ‎2025-02-17

Hello,

From your linker file, it seems you are handling some Flash stuff (read/write) in your application:

FLASH        ( rx ) : ORIGIN = 0x08000000, LENGTH = 946K
COLORS       ( rx ) : ORIGIN = 0x080EC800, LENGTH = 70K
ROOT_CA      ( rx ) : ORIGIN = 0x080FE000, LENGTH =  4K 
DEV_CERT     ( rx ) : ORIGIN = 0x080FF000, LENGTH =  4K
}

So first check whether any Flash ECCD (double ECC error) errors are being detected.

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.
PS:
1 - This is NOT an online support (https://ols.st.com) but a collaborative space.
2 - Please be polite in your reply. Otherwise, it will be reported as inappropriate and you will be permanently blacklisted from my help.

Sarra.S · ‎2025-02-17

Hello @massimoperdigo,

Also, from searching the RM0481, NMIs are linked to 3 cases:

1 - Flash ECC double error: check Handling ECC errors in STM32H5 series: Reading unw... - STMicroelectronics Community
2 - If an HSE clock failure occurs when the CSS is enabled, the CSS generates an interrupt that causes the automatic generation of an NMI (since you're not using HSE, I guess this one is eliminated)
3 - System configuration, boot, and security (SBS)

In each case, you need to monitor the status of the relevant registers when the NMI occurs!

Also, you can check this article in case you'll be using MPU in the future: How to avoid a HardFault when ICACHE is enabled on... - STMicroelectronics Community


massimoperdigo · ‎2025-02-17

Hello,

Thank you for your quick response.

Why would I have this type of error?

Would this be one way to check it?

void NMI_Handler(void) {
    // Check if an ECC double error occurred in SRAM2, SRAM3, or BKPSRAM
    // although i do not have this type of RAM activated
    if (RAMCFG->MISR & RAMCFG_MISR_DED) {
        uint32_t error_address = RAMCFG->MDEAR;  // Read the failing address
        RAMCFG->MICR |= RAMCFG_MICR_CDED;       // Clear the ECC error flag

        SEGGER_RTT_printf(0, "SRAM ECC Double Error at 0x%08X\n", error_address);
    }

    // If the NMI was caused by HSE clock security failure
    // the same as before, I am using HSI
    if (RCC->CIFR & RCC_CIFR_HSECSSF) {
        RCC->CICR |= RCC_CICR_HSECSSC;  // Clear HSE clock security flag
    }

    // Handle FLASH ECC Double Error (ECCD)
    if (FLASH->ECCR & FLASH_ECCR_ECCD) {
        uint32_t error_address = FLASH->ECCDR;  // Read failing address from ECCDR
        // Clear the FLASH ECC error flag
        FLASH->ECCR |= FLASH_ECCR_ECCD;
        SEGGER_RTT_printf(0, "FLASH ECC Double Error at 0x%08X\n", error_address);

        // If the error is caused by an unprogrammed OTP read:
        if ((error_address >= 0x08FFF000) && (error_address <= 0x08FFF7FF)) {
            SEGGER_RTT_printf(0, "Virgin OTP read error detected!\n");
        }
    }
}

Thank you!

SofLit · ‎2025-02-17

Hello,

@massimoperdigo wrote:

Why would I have this type of error?

Maybe you have written to the Flash without erasing it first.

Also refer to this article pointed out by @Sarra.S


massimoperdigo · ‎2025-02-17

Hello, Sarra,

Thanks for your point of view!

I was wondering how I can monitor the last error source:

System configuration, boot, and security (SBS)

Thank you!

massimoperdigo · ‎2025-02-17

Oh, thank you for this article!

Do you mean writing to the Flash when flashing with the debugger?
In the end, everything is stored in the Flash with attributes, except for ROOT_CA, whose first 6 bytes are flashed with the J-Link tools like this:

device STM32H563VG
SelectInterface SWD
Speed 4000
erase

; Load application
loadfile build/Program.hex

; Load MAC address binary file (6 bytes)
loadfile build/mac_address.bin, 0x080FE000

; Verify contents in memory
mem 0x080FE000, 20

As you can see, I always do an erase before flashing.

massimoperdigo · ‎2025-02-19

Hey, Sarra,

Could you please point out whether I am accessing the registers correctly with this implementation?

Is there something that is not correct or missing?

void NMI_Handler(void) {
    // Check if an ECC double error occurred in SRAM2, SRAM3, or BKPSRAM
    // although i do not have this type of RAM activated
    if (RAMCFG->MISR & RAMCFG_MISR_DED) {
        uint32_t error_address = RAMCFG->MDEAR;  // Read the failing address
        RAMCFG->MICR |= RAMCFG_MICR_CDED;       // Clear the ECC error flag

        SEGGER_RTT_printf(0, "SRAM ECC Double Error at 0x%08X\n", error_address);
    }

    // If the NMI was caused by HSE clock security failure
    // the same as before, I am using HSI
    if (RCC->CIFR & RCC_CIFR_HSECSSF) {
        RCC->CICR |= RCC_CICR_HSECSSC;  // Clear HSE clock security flag
    }

    // Handle FLASH ECC Double Error (ECCD)
    if (FLASH->ECCR & FLASH_ECCR_ECCD) {
        uint32_t error_address = FLASH->ECCDR;  // Read failing address from ECCDR
        // Clear the FLASH ECC error flag
        FLASH->ECCR |= FLASH_ECCR_ECCD;
        SEGGER_RTT_printf(0, "FLASH ECC Double Error at 0x%08X\n", error_address);

        // If the error is caused by an unprogrammed OTP read:
        if ((error_address >= 0x08FFF000) && (error_address <= 0x08FFF7FF)) {
            SEGGER_RTT_printf(0, "Virgin OTP read error detected!\n");
        }
    }
}

Thanks

massimoperdigo · ‎2025-02-19

Hi, SofLit.

I have tried what Sarra explained in their post:

void NMI_Handler(void)
{
  if((FLASH->ECCDR && 0xFF))
  {
    //the memory is empty 
    //ECC error due to access to uninitialized memory
    
    //Clear the ECCD flag
    FLASH->ECCDETR |= (1<<31);
  }
  else
  {
    //ECC error detected a true failure
    while (1)
    {
    }
  }
}

However, when the NMI fires, the first condition is not met, and my program gets stuck.
When I instead force the ECCD flag to be cleared unconditionally, the NMI does not seem to occur again:

void NMI_Handler(void)
{
  FLASH->ECCDETR |= (1<<31);
}

Could you please explain this to me? I am not able to fully understand it.

Thanks!