STM32H7S XSPI1 Memory Fault Despite Errata Workaround Implementation [Video Attached]

JenishR
Associate

System Overview

Hardware Configuration

Microcontroller: STM32H7S3L8R6

Memory Architecture

XSPI1 (Flash, MX25UW25645GXDI00): 256 Mbit (32 MB), accessed via GPIOP/GPIOO

  • Clock: PLL2S source, Prescaler = 3
flash_xspi_handle.Instance = config->instance;
flash_xspi_handle.Init.FifoThresholdByte = 1;
flash_xspi_handle.Init.MemoryMode = HAL_XSPI_SINGLE_MEM;
flash_xspi_handle.Init.MemoryType = HAL_XSPI_MEMTYPE_MACRONIX;
flash_xspi_handle.Init.MemorySize = HAL_XSPI_SIZE_256MB;
flash_xspi_handle.Init.ChipSelectHighTimeCycle = 2;
flash_xspi_handle.Init.FreeRunningClock = HAL_XSPI_FREERUNCLK_DISABLE;
flash_xspi_handle.Init.ClockMode = HAL_XSPI_CLOCK_MODE_0;
flash_xspi_handle.Init.WrapSize = HAL_XSPI_WRAP_NOT_SUPPORTED;
flash_xspi_handle.Init.ClockPrescaler = 3;
flash_xspi_handle.Init.SampleShifting = HAL_XSPI_SAMPLE_SHIFT_NONE;
flash_xspi_handle.Init.DelayHoldQuarterCycle = HAL_XSPI_DHQC_ENABLE;
flash_xspi_handle.Init.ChipSelectBoundary = HAL_XSPI_BONDARYOF_NONE;
flash_xspi_handle.Init.MaxTran = 0;
flash_xspi_handle.Init.Refresh = 0;
flash_xspi_handle.Init.MemorySelect = HAL_XSPI_CSSEL_NCS1;

XSPI2 (HyperRAM, W956D8MBYA5I): 64 Mbit (8 MB), accessed via GPION

  • Clock: PLL2S source, Prescaler = 1
ram_xspi_handle.Instance = config->instance;
ram_xspi_handle.Init.FifoThresholdByte = 1;
ram_xspi_handle.Init.MemoryMode = HAL_XSPI_SINGLE_MEM;
ram_xspi_handle.Init.MemoryType = HAL_XSPI_MEMTYPE_HYPERBUS;
ram_xspi_handle.Init.MemorySize = HAL_XSPI_SIZE_64MB;
ram_xspi_handle.Init.ChipSelectHighTimeCycle = 5;
ram_xspi_handle.Init.FreeRunningClock = HAL_XSPI_FREERUNCLK_DISABLE;
ram_xspi_handle.Init.ClockMode = HAL_XSPI_CLOCK_MODE_0;
ram_xspi_handle.Init.WrapSize = HAL_XSPI_WRAP_NOT_SUPPORTED;
ram_xspi_handle.Init.ClockPrescaler = 1;
ram_xspi_handle.Init.SampleShifting = HAL_XSPI_SAMPLE_SHIFT_NONE;
ram_xspi_handle.Init.DelayHoldQuarterCycle = HAL_XSPI_DHQC_DISABLE;
ram_xspi_handle.Init.ChipSelectBoundary = HAL_XSPI_BONDARYOF_NONE;
ram_xspi_handle.Init.MaxTran = 0;
ram_xspi_handle.Init.Refresh = 355;
ram_xspi_handle.Init.MemorySelect = HAL_XSPI_CSSEL_NCS1;
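
For reference, the XSPI prescaler field divides the kernel clock by (ClockPrescaler + 1), so the two settings above mean divide-by-4 for the flash and divide-by-2 for the HyperRAM. A minimal sketch of that arithmetic follows; the helper name xspi_clock_hz is hypothetical, and the 200 MHz used in the usage note is only a placeholder because the actual PLL2S frequency is not stated in this post.

```c
#include <stdint.h>

/* XSPI interface clock for a given kernel clock and HAL ClockPrescaler value.
 * The divider is (prescaler + 1), so ClockPrescaler = 3 divides by 4 and
 * ClockPrescaler = 1 divides by 2. */
static uint32_t xspi_clock_hz(uint32_t kernel_clock_hz, uint32_t clock_prescaler)
{
    return kernel_clock_hz / (clock_prescaler + 1U);
}
```

With a placeholder 200 MHz kernel clock, xspi_clock_hz(200000000U, 3U) gives 50 MHz for XSPI1 and xspi_clock_hz(200000000U, 1U) gives 100 MHz for XSPI2. Substitute the real PLL2S output to get the actual bus frequencies.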

MPU Region Map

  • I-Cache: Enabled (instruction fetch optimization)
  • D-Cache: Enabled (data fetch optimization)
  • Both caches critical for maintaining performance with external memory access at 600 MHz

JenishR_0-1763522271836.png

System Clocking

  • CPU Core: 600 MHz
  • HCLK (AHB): 300 MHz
  • Both XSPI instances: PLL2S clock source

Power Domains

  • I/O Supply (XSPI signals): 1.8V
  • Core Supply: 3.3V
  • Enabled power domains: XSPIM1, XSPIM2

Issue Description

I have implemented the I/O compensation cell workaround described in the STM32H7Rxx/7Sxx errata sheet for the duty-cycle skew issue. However, I am still seeing intermittent memory faults when reading data from the external flash on XSPI1.

Fault Symptoms

When the application runs, I encounter hard faults with the following characteristics:

Debugger Information:

  • Instruction: ldr r2, [r3, #60]
  • Register r3: 0xfa7bafff
  • Result: Memory fault
  • The image below shows the fault in the disassembly view
  • ldr r3, [pc, #196] is supposed to load 0x24021480 into r3
  • ldr r3, [r3, #0] is then supposed to load the actual address of _cyhal_sdio_handle, 0x24001b04, into r3 (this clearly goes wrong because the value loaded by the preceding ldr r3, [pc, #196] was not correct)
    JenishR_1-1763522298550.png
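
To check what the flash actually holds at the literal-pool slot, it helps to work out which address ldr r3, [pc, #196] reads. A small sketch of that calculation is below. The helper name thumb_literal_address is hypothetical, and the instruction address in the usage note is only an example, since the exact address of that ldr is not written out in this post.

```c
#include <stdint.h>

/* Address read by a Thumb "ldr rX, [pc, #imm]" located at insn_addr.
 * In Thumb state the PC reads as the instruction address + 4, and that
 * value is rounded down to a word boundary before the immediate is added. */
static uint32_t thumb_literal_address(uint32_t insn_addr, uint32_t imm)
{
    uint32_t base = (insn_addr + 4U) & ~3U;
    return base + imm;
}
```

For an ldr at an example address of 0x900d19b8, thumb_literal_address(0x900D19B8U, 196U) returns 0x900d1a80. Reading that word in the debugger's memory view and comparing it with the same offset in the programmed image should show whether the flash content or the read path is at fault.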

The issue shows that:

  • Code is executing from external flash (0x90xxxxxx region)
  • An invalid/corrupted pointer value (0xfa7bafff) is being read from memory; this value should never appear there
  • Attempting to dereference this corrupted pointer causes immediate fault

Temperature Correlation

Critical observation: This error appears significantly more frequently when the external flash IC is heated, even slightly above room temperature. This strongly suggests ongoing data corruption issues during XSPI1 communication.

Implemented Workaround

I have implemented the compensation cell adjustment described in the errata (linked HERE) as follows:

void xspi_configure_compensation_cells(void)
{
    const board_config_t *board = board_defs_get_config();

    // Configure compensation cells for XSPI1
    if (board->xspi1.instance != NULL)
    {
        HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI1_CELL, SBS_IO_CELL_CODE, 0U, 0U);
        HAL_SBS_EnableCompensationCell(SBS_IO_XSPI1_CELL);
        while (HAL_SBS_GetCompensationCellReadyStatus(SBS_IO_XSPI1_CELL_READY) != 1U)
        {
            // Wait for compensation cell ready
        }

        // ERRATA: I/O compensation duty-cycle skew workaround
        // Apply compensation values adjustment to prevent duty-cycle skew and jitter
        // Read boot-time compensation values from SBS_CCVALR register
        uint32_t boot_psrc = (SBS->CCVALR >> SBS_CCVALR_XSPI1_PSRC_Pos) & 0xFU;
        uint32_t boot_nsrc = (SBS->CCVALR >> SBS_CCVALR_XSPI1_NSRC_Pos) & 0xFU;

        // Apply compensation adjustment per errata specification:
        // SW_Psrc = boot_PSRC - 2
        // SW_Nsrc = boot_NSRC + 2
        int32_t adj_psrc = (int32_t)boot_psrc + XSPI_COMP_PSRC_ADJUSTMENT;
        int32_t adj_nsrc = (int32_t)boot_nsrc + XSPI_COMP_NSRC_ADJUSTMENT;

        // Clamp values to valid 4-bit range [0, 15]
        if (adj_psrc < 0)
        {
            adj_psrc = 0;
        }
        if (adj_psrc > 15)
        {
            adj_psrc = 15;
        }
        if (adj_nsrc < 0)
        {
            adj_nsrc = 0;
        }
        if (adj_nsrc > 15)
        {
            adj_nsrc = 15;
        }

        // Write adjusted compensation values to SBS_CCSWVALR register
        HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI1_CELL, SBS_IO_REGISTER_CODE, (uint32_t)adj_nsrc, (uint32_t)adj_psrc);
        HAL_SBS_EnableIOSpeedOptimize(SBS_IO_XSPI1_HSLV);
    }

    // Configure compensation cells for XSPI2
    if (board->xspi2.instance != NULL)
    {
        HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI2_CELL, SBS_IO_CELL_CODE, 0U, 0U);
        HAL_SBS_EnableCompensationCell(SBS_IO_XSPI2_CELL);
        while (HAL_SBS_GetCompensationCellReadyStatus(SBS_IO_XSPI2_CELL_READY) != 1U)
        {
            // Wait for compensation cell ready
        }

        // ERRATA: I/O compensation duty-cycle skew workaround
        // Apply compensation values adjustment to prevent duty-cycle skew and jitter
        // Read boot-time compensation values from SBS_CCVALR register
        uint32_t boot_psrc = (SBS->CCVALR >> SBS_CCVALR_XSPI2_PSRC_Pos) & 0xFU;
        uint32_t boot_nsrc = (SBS->CCVALR >> SBS_CCVALR_XSPI2_NSRC_Pos) & 0xFU;

        // Apply compensation adjustment per errata specification:
        // SW_Psrc = boot_PSRC - 2
        // SW_Nsrc = boot_NSRC + 2
        int32_t adj_psrc = (int32_t)boot_psrc + XSPI_COMP_PSRC_ADJUSTMENT;
        int32_t adj_nsrc = (int32_t)boot_nsrc + XSPI_COMP_NSRC_ADJUSTMENT;

        // Clamp values to valid 4-bit range [0, 15]
        if (adj_psrc < 0)
        {
            adj_psrc = 0;
        }
        if (adj_psrc > 15)
        {
            adj_psrc = 15;
        }
        if (adj_nsrc < 0)
        {
            adj_nsrc = 0;
        }
        if (adj_nsrc > 15)
        {
            adj_nsrc = 15;
        }

        // Write adjusted compensation values to SBS_CCSWVALR register
        HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI2_CELL, SBS_IO_REGISTER_CODE, (uint32_t)adj_nsrc, (uint32_t)adj_psrc);
        HAL_SBS_EnableIOSpeedOptimize(SBS_IO_XSPI2_HSLV);
    }
}
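
The adjust-and-clamp step is identical for both cells, so it can be pulled into one pure helper and checked on a host PC without the hardware. This is only a sketch: xspi_comp_adjust is a hypothetical name, and the two macro values are assumed to be -2 and +2 to match the comments in the function above (their real definitions are not shown in this post).

```c
#include <stdint.h>

/* Assumed values, matching the "-2 / +2" comments in the function above. */
#define XSPI_COMP_PSRC_ADJUSTMENT (-2)
#define XSPI_COMP_NSRC_ADJUSTMENT (2)

/* Apply a signed offset to a 4-bit compensation code and clamp the
 * result to the valid range [0, 15]. */
static uint32_t xspi_comp_adjust(uint32_t boot_code, int32_t delta)
{
    int32_t adjusted = (int32_t)(boot_code & 0xFU) + delta;

    if (adjusted < 0)
    {
        adjusted = 0;
    }
    if (adjusted > 15)
    {
        adjusted = 15;
    }
    return (uint32_t)adjusted;
}
```

For example, xspi_comp_adjust(boot_psrc, XSPI_COMP_PSRC_ADJUSTMENT) and xspi_comp_adjust(boot_nsrc, XSPI_COMP_NSRC_ADJUSTMENT) would replace the two blocks of if statements per cell. Logging the boot codes next to the adjusted ones also makes it easy to see whether the clamp is ever being hit.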

Root Cause Analysis

The corrupted pointer value (0xfa7bafff) indicates that data being read from the external flash via XSPI1 is incorrect. This is not a pointer that exists in my code - it's garbage data resulting from communication errors.

The sequence of events:

  1. CPU executes code from external flash successfully (address 0x900d19b8)
  2. CPU attempts to load data from external flash (a pointer or structure member)
  3. XSPI1 returns corrupted data instead of the correct value
  4. CPU loads the corrupted value (0xfa7bafff) into register r3
  5. CPU attempts to dereference this invalid pointer
  6. A memory fault occurs
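
One way to confirm step 3 independently of the crash is to compare a range of the memory-mapped flash against a known-good copy of the same image and count the mismatches, ideally while heating the flash. A minimal sketch under stated assumptions: xspi_count_mismatches is a hypothetical helper, and on the target the D-cache would need to be invalidated for the range first (for example with the CMSIS SCB_InvalidateDCache_by_Addr call) or the range mapped non-cacheable, otherwise the reads may be served from cache and never reach XSPI1.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare `words` 32-bit words read from `mapped` against a known-good
 * `reference` copy. Returns the number of mismatching words. If first_bad
 * is non-NULL and at least one mismatch is found, the index of the first
 * mismatching word is stored there; otherwise *first_bad is left untouched. */
static size_t xspi_count_mismatches(const volatile uint32_t *mapped,
                                    const uint32_t *reference,
                                    size_t words, size_t *first_bad)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < words; i++)
    {
        if (mapped[i] != reference[i])
        {
            if (mismatches == 0 && first_bad != NULL)
            {
                *first_bad = i;
            }
            mismatches++;
        }
    }
    return mismatches;
}
```

Running this in a loop from internal RAM and logging the count per pass would show whether the error rate tracks temperature, and the first mismatching index shows whether errors cluster at particular addresses.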

A video showing the corrupted r3 value, along with the disassembly view and call stack, is HERE

ANY GUIDANCE would be greatly appreciated. The thermal sensitivity suggests the compensation workaround is not fully addressing the signal integrity issues at the I/O level. There may also be a different, unrelated problem.
