STM32H7S XSPI1 Memory Fault Despite Errata Workaround Implementation

JenishR
Associate

System Overview

Hardware Configuration

Microcontroller: STM32H7S3L8H6

Memory Architecture

XSPI1 (Flash, MX25UW25645GXDI00): 32MB, accessed via GPIOP/GPIOO

  • Clock: PLL2S source, Prescaler = 3
flash_xspi_handle.Instance = config->instance;
flash_xspi_handle.Init.FifoThresholdByte = 1;
flash_xspi_handle.Init.MemoryMode = HAL_XSPI_SINGLE_MEM;
flash_xspi_handle.Init.MemoryType = HAL_XSPI_MEMTYPE_MACRONIX;
flash_xspi_handle.Init.MemorySize = HAL_XSPI_SIZE_256MB;
flash_xspi_handle.Init.ChipSelectHighTimeCycle = 2;
flash_xspi_handle.Init.FreeRunningClock = HAL_XSPI_FREERUNCLK_DISABLE;
flash_xspi_handle.Init.ClockMode = HAL_XSPI_CLOCK_MODE_0;
flash_xspi_handle.Init.WrapSize = HAL_XSPI_WRAP_NOT_SUPPORTED;
flash_xspi_handle.Init.ClockPrescaler = 3;
flash_xspi_handle.Init.SampleShifting = HAL_XSPI_SAMPLE_SHIFT_NONE;
flash_xspi_handle.Init.DelayHoldQuarterCycle = HAL_XSPI_DHQC_ENABLE;
flash_xspi_handle.Init.ChipSelectBoundary = HAL_XSPI_BONDARYOF_NONE;
flash_xspi_handle.Init.MaxTran = 0;
flash_xspi_handle.Init.Refresh = 0;
flash_xspi_handle.Init.MemorySelect = HAL_XSPI_CSSEL_NCS1;

XSPI2 (HyperRAM, W956D8MBYA5I): 8MB, accessed via GPION

  • Clock: PLL2S source, Prescaler = 1
ram_xspi_handle.Instance = config->instance;
ram_xspi_handle.Init.FifoThresholdByte = 1;
ram_xspi_handle.Init.MemoryMode = HAL_XSPI_SINGLE_MEM;
ram_xspi_handle.Init.MemoryType = HAL_XSPI_MEMTYPE_HYPERBUS;
ram_xspi_handle.Init.MemorySize = HAL_XSPI_SIZE_64MB;
ram_xspi_handle.Init.ChipSelectHighTimeCycle = 5;
ram_xspi_handle.Init.FreeRunningClock = HAL_XSPI_FREERUNCLK_DISABLE;
ram_xspi_handle.Init.ClockMode = HAL_XSPI_CLOCK_MODE_0;
ram_xspi_handle.Init.WrapSize = HAL_XSPI_WRAP_NOT_SUPPORTED;
ram_xspi_handle.Init.ClockPrescaler = 1;
ram_xspi_handle.Init.SampleShifting = HAL_XSPI_SAMPLE_SHIFT_NONE;
ram_xspi_handle.Init.DelayHoldQuarterCycle = HAL_XSPI_DHQC_DISABLE;
ram_xspi_handle.Init.ChipSelectBoundary = HAL_XSPI_BONDARYOF_NONE;
ram_xspi_handle.Init.MaxTran = 0;
ram_xspi_handle.Init.Refresh = 355;
ram_xspi_handle.Init.MemorySelect = HAL_XSPI_CSSEL_NCS1;

MPU Region Map

  • I-Cache: Enabled (instruction fetch optimization)
  • D-Cache: Enabled (data fetch optimization)
  • Both caches are critical for maintaining performance when the 600 MHz core accesses external memory

[Image: MPU region map]

System Clocking

  • CPU Core: 600 MHz
  • HCLK (AHB): 300 MHz
  • Both XSPI instances: PLL2S clock source
    • PLL2S = 200 MHz
    • HyperRAM (XSPI2) = 100 MHz
    • Flash (XSPI1) = 50 MHz

Power Domains

  • I/O Supply (XSPI signals): 1.8V
  • Core Supply: 1.8V

Issue Description

Under some circumstances, I read garbage data from the external flash memory during normal execution of the application. In-depth details are given further below.

As recommended, I have implemented the I/O compensation cell workaround described in the STM32H7Rxx/7Sxx errata sheet for the duty-cycle skew issue. However, I still experience intermittent memory faults when accessing data in the external flash on XSPI1.

Temperature Correlation:

The error occurs significantly more often when the MCU is warm, even slightly above room temperature. This strongly suggests data corruption during XSPI1 communication.

Implemented Workaround

I have implemented the compensation cell adjustment described in the errata (linked HERE) as follows:

void xspi_configure_compensation_cells(void)
{
    const board_config_t *board = board_defs_get_config();

    // Configure compensation cells for XSPI1
    if (board->xspi1.instance != NULL)
    {
        HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI1_CELL, SBS_IO_CELL_CODE, 0U, 0U);
        HAL_SBS_EnableCompensationCell(SBS_IO_XSPI1_CELL);
        while (HAL_SBS_GetCompensationCellReadyStatus(SBS_IO_XSPI1_CELL_READY) != 1U)
        {
            // Wait for compensation cell ready
        }

        // ERRATA: I/O compensation duty-cycle skew workaround
        // Apply compensation values adjustment to prevent duty-cycle skew and jitter
        // Read boot-time compensation values from SBS_CCVALR register
        uint32_t boot_psrc=(SBS->CCVALR >> SBS_CCVALR_XSPI1_PSRC_Pos) & 0xFU;
        uint32_t boot_nsrc=(SBS->CCVALR >> SBS_CCVALR_XSPI1_NSRC_Pos) & 0xFU;

        // Apply compensation adjustment per errata specification:
        // SW_Psrc=boot_PSRC - 2
        // SW_Nsrc=boot_NSRC + 2
        int32_t adj_psrc=(int32_t)boot_psrc + XSPI_COMP_PSRC_ADJUSTMENT;
        int32_t adj_nsrc=(int32_t)boot_nsrc + XSPI_COMP_NSRC_ADJUSTMENT;

        // Clamp values to valid 4-bit range [0, 15]
        if (adj_psrc < 0)
        {
            adj_psrc=0;
        }
        if (adj_psrc > 15)
        {
            adj_psrc=15;
        }
        if (adj_nsrc < 0)
        {
            adj_nsrc=0;
        }
        if (adj_nsrc > 15)
        {
            adj_nsrc=15;
        }

        // Write adjusted compensation values to SBS_CCSWVALR register
        HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI1_CELL, SBS_IO_REGISTER_CODE, adj_nsrc, adj_psrc);
        HAL_SBS_EnableIOSpeedOptimize(SBS_IO_XSPI1_HSLV);
    }

    // Configure compensation cells for XSPI2
    if (board->xspi2.instance != NULL)
    {
        HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI2_CELL, SBS_IO_CELL_CODE, 0U, 0U);
        HAL_SBS_EnableCompensationCell(SBS_IO_XSPI2_CELL);
        while (HAL_SBS_GetCompensationCellReadyStatus(SBS_IO_XSPI2_CELL_READY) != 1U)
        {
            // Wait for compensation cell ready
        }

        // ERRATA: I/O compensation duty-cycle skew workaround
        // Apply compensation values adjustment to prevent duty-cycle skew and jitter
        // Read boot-time compensation values from SBS_CCVALR register
        uint32_t boot_psrc=(SBS->CCVALR >> SBS_CCVALR_XSPI2_PSRC_Pos) & 0xFU;
        uint32_t boot_nsrc=(SBS->CCVALR >> SBS_CCVALR_XSPI2_NSRC_Pos) & 0xFU;

        // Apply compensation adjustment per errata specification:
        // SW_Psrc=boot_PSRC - 2
        // SW_Nsrc=boot_NSRC + 2
        int32_t adj_psrc=(int32_t)boot_psrc + XSPI_COMP_PSRC_ADJUSTMENT;
        int32_t adj_nsrc=(int32_t)boot_nsrc + XSPI_COMP_NSRC_ADJUSTMENT;

        // Clamp values to valid 4-bit range [0, 15]
        if (adj_psrc < 0)
        {
            adj_psrc=0;
        }
        if (adj_psrc > 15)
        {
            adj_psrc=15;
        }
        if (adj_nsrc < 0)
        {
            adj_nsrc=0;
        }
        if (adj_nsrc > 15)
        {
            adj_nsrc=15;
        }

        // Write adjusted compensation values to SBS_CCSWVALR register
        HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI2_CELL, SBS_IO_REGISTER_CODE, adj_nsrc, adj_psrc);
        HAL_SBS_EnableIOSpeedOptimize(SBS_IO_XSPI2_HSLV);
    }
}

Root Cause Analysis of Main Issue:

When the application runs, I intermittently encounter a hard fault with the following characteristics.

Please refer to the following fault captures, which are explained in detail below:
IMAGE 1:
[Image: fault capture]
IMAGE 2:
[Image: fault capture with disassembly view]

Understanding the Memory Fault Above

This is a pointer corruption issue caused by corrupted data being read from external flash memory. Let me break down the failure sequence:

The fault is triggered when the MCU executes the following line of C code:

__HAL_SD_DISABLE_IT(_cyhal_sdio_handle, SDMMC_IT_SDIOIT);

void stm32_cyhal_sdio_irq_handler(void)
{
    /* NOTE: _cyhal_sdio_handle is dereferenced in the next two lines,
       before the NULL check in the if() below */
    uint32_t sta_reg = _cyhal_sdio_handle->Instance->STA;
    cyhal_sdio_t* obj = (cyhal_sdio_t*)_cyhal_sdio_handle->Context;

    if ((_cyhal_sdio_handle != NULL) &&
        (__HAL_SD_GET_FLAG(_cyhal_sdio_handle, SDMMC_STA_SDIOIT) != RESET))
    {
        /* Notify the registered callback of the SDIO card interrupt */
        ((cyhal_sdio_event_callback_t)obj->callback)(obj->callback_arg,
                                                     CYHAL_SDIO_CARD_INTERRUPT);

        /* Clear the interrupt */
        __HAL_SD_CLEAR_FLAG(_cyhal_sdio_handle, SDMMC_FLAG_SDIOIT);

        /* Mask interrupt, to be unmasked later by Tx Path */
        __HAL_SD_DISABLE_IT(_cyhal_sdio_handle, SDMMC_IT_SDIOIT); // <- FAULT AT THIS LINE
        /* ...... */
    }
}

The Execution Flow

Understanding the execution flow of the above code with the disassembly:

[Image: disassembly view of the faulting code]

This single line of C code, __HAL_SD_DISABLE_IT(_cyhal_sdio_handle, SDMMC_IT_SDIOIT), performs a read-modify-write of Instance->MASK. Its instruction sequence starts with the four load instructions below, which is where the failure occurs (see Image 2 for a clear reference):

ldr r3, [pc, #196] // Load address of _cyhal_sdio_handle
ldr r3, [r3, #0] // Dereference _cyhal_sdio_handle
ldr r3, [r3, #0] // Access _cyhal_sdio_handle->Instance
ldr r2, [r3, #60] // Read Instance->MASK (at offset 60)

Detailed Breakdown

Pointer Dereferences

1. Get the address of the global variable _cyhal_sdio_handle:

ldr r3, [pc, #196]
// Equivalent to: &_cyhal_sdio_handle
// R3 = 0x24021480 (address where the handle pointer is stored)

2. Dereference to get the handle value:

ldr r3, [r3, #0]
// Equivalent to: _cyhal_sdio_handle
// R3 = *0x24021480 = 0x24001b04 (the actual sdio handle)

3. Access the Instance member (first member of the struct, offset 0):

ldr r3, [r3, #0]
// Equivalent to: _cyhal_sdio_handle->Instance
// R3 = *0x24001b04 = 0x48002400 (Instance address)

4. Read the MASK register (at offset 60 bytes, i.e. 0x3C, in SDMMC_TypeDef):

ldr r2, [r3, #60]
// Equivalent to: _cyhal_sdio_handle->Instance->MASK
// R2 = *(0x48002400 + 60) = 0x40013a (MASK value) 

Why This Particular Line Fails

The Corruption Point

The very first instruction reads from the literal pool in external flash:

ldr r3, [pc, #196]  // Reading from 0x900D1A78 in external flash

This literal pool entry should contain the address of _cyhal_sdio_handle (0x24021480), but because the read from external flash is corrupted, the CPU gets an unknown value instead, which starts the domino effect.

The Cascade Effect

[Image: cascade of corrupted register values]

Why This Causes a Fault

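The corrupted address 0xFA7BAFFF points to unmapped/protected memory space. When the CPU attempts to access it, the access is rejected (as a MemManage fault if the MPU blocks it, or as a BusFault if nothing is mapped there), and this escalates to a hard fault when those fault handlers are not enabled.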

The corrupted pointer value (0xFA7BAFFF) indicates that the data read from the external flash via XSPI1 is incorrect. This is not a pointer that exists in my code; it is garbage data resulting from communication errors.


VIDEO

A video showing the corrupted R3 value, along with the disassembly view and the call stack, is available HERE.

Thank you

Any guidance would be greatly appreciated. The thermal sensitivity suggests that the compensation workaround is not fully addressing signal-integrity issues at the I/O level, although there may also be a different problem.

1 REPLY
STOne-32
ST Employee

Dear @JenishR,

First, thank you for the detailed description, including the code snippets and video. At first analysis, it is unlikely that you are facing the errata case, as your external flash is running at 50 MHz and the RAM at 100 MHz. The errata case is seen only when running at 150 MHz up to 200 MHz in DDR mode. As temperature is affecting your case, I suggest checking your schematics and PCB, as well as the compatibility with the external memories, and sharing them with us. Let us know.

Regards,

STOne-32