STM32U5 OSPI BusFault in memory mapped mode on unaligned writes

burn_
Associate III

Good day,

I'm experiencing an unexplained bus fault when the MCU writes to unaligned (not aligned to 32-bit) addresses in the OSPI memory region. I've written a minimal implementation that shows where the issue occurs. See below.

I'm using the STM32U585 with two IS66WVS4M8BLL quad SPI modules. The OctoSPI peripheral is configured in Dual-Quad Mode. See the image for the settings as configured in STM32CubeMX. I use memory mapped mode. I've enabled DQS for writes as per 2.6.1 of the errata. I've tried different MPU cache configurations for this area: no MPU, MPU_DEVICE_nGnRnE, and (MPU_WRITE_THROUGH | MPU_NON_TRANSIENT | MPU_W_ALLOCATE | MPU_R_ALLOCATE).

[Image: burn__0-1743753407398.png (OCTOSPI settings as configured in STM32CubeMX)]

Here's the minimal implementation. Note that 8-bit writes work if the previous write was aligned. See comments surrounded by ***, focussing on Test 2.

  // Test 1: Unaligned 8-bit access
  // *** This first test works without issues ***
  volatile uint32_t sramByteLoopIndex;
  volatile __IO uint8_t *mem_addr_byte;
  volatile uint8_t sramTestValByte;

  // Writing Sequence (8-bit, unaligned pattern)
  mem_addr_byte = (uint8_t *)(OCTOSPI1_BASE);
  for (sramByteLoopIndex = 0; sramByteLoopIndex < OSPI_MEM_SIZE_BYTES;
       sramByteLoopIndex++)
  {
    sramTestValByte = (uint8_t)(sramByteLoopIndex &
                                0xFF); // Use lower 8 bits of index as pattern
    *mem_addr_byte = sramTestValByte;
    mem_addr_byte += 1;
  }

  // Reading Sequence (8-bit, unaligned check)
  mem_addr_byte = (uint8_t *)(OCTOSPI1_BASE);
  for (sramByteLoopIndex = 0; sramByteLoopIndex < OSPI_MEM_SIZE_BYTES;
       sramByteLoopIndex++)
  {
    sramTestValByte = (uint8_t)(sramByteLoopIndex & 0xFF);
    assert_param(*mem_addr_byte == sramTestValByte);
    mem_addr_byte += 1;
  }

  // Test 2: Unaligned Copy within SRAM
  // *** Here things are not well ***
  volatile __IO uint8_t *src_addr_byte;
  volatile __IO uint8_t *dst_addr_byte;
  volatile uint8_t test_pattern_byte;
  const uint32_t copy_size_bytes = 512; // Size of data to copy
  // Ensure source and destination are unaligned and distinct
  const uint32_t src_offset = 1; // Unaligned source start
  const uint32_t dst_offset =
      (OSPI_MEM_SIZE_BYTES / 2) + 3; // Unaligned destination start

  // Ensure offsets are valid and don't cause wrap-around issues with copy size
  if ((src_offset + copy_size_bytes < OSPI_MEM_SIZE_BYTES) &&
      (dst_offset + copy_size_bytes < OSPI_MEM_SIZE_BYTES))
  {
    // Dummy aligned write first
    // *** Removing this line causes BusFault on the write in code block 1 below ***
    *((volatile __IO uint8_t *)OCTOSPI1_BASE) = 0; 
    
    // 1. Fill source region with a pattern
    src_addr_byte = (uint8_t *)(OCTOSPI1_BASE + src_offset);
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
      test_pattern_byte =
          (uint8_t)((sramByteLoopIndex + 0xAA) & 0xFF); // Arbitrary pattern
      *src_addr_byte = test_pattern_byte; // *** BusFault here on first iteration if the dummy write is commented out, otherwise no issues here ***
      src_addr_byte++;
    }

    // 2. Perform byte-by-byte unaligned copy (Isolated Read/Write)
    src_addr_byte = (uint8_t *)(OCTOSPI1_BASE + src_offset);
    dst_addr_byte = (uint8_t *)(OCTOSPI1_BASE + dst_offset);
    volatile uint8_t temp_byte; // Temporary variable
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
      temp_byte = *src_addr_byte;
      __asm__ __volatile__ ("nop"); // Tried adding NOP as per 2.6.10 of errata, even though we are not using DTR
      *dst_addr_byte = temp_byte; // *** BusFault here on first iteration if the dummy aligned write is present ***
      src_addr_byte++;
      dst_addr_byte++;
    }

    // 3. Verify destination region
    // *** We never get here ***
    dst_addr_byte = (uint8_t *)(OCTOSPI1_BASE + dst_offset);
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
      test_pattern_byte =
          (uint8_t)((sramByteLoopIndex + 0xAA) & 0xFF); // Expected pattern
      assert_param(*dst_addr_byte == test_pattern_byte);
      dst_addr_byte++;
    }
  }
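
One experiment that might help narrow this down is to replace the byte stores in Test 2 with aligned 32-bit read-modify-write accesses, so the OSPI region only ever sees word-aligned writes. Below is a minimal sketch of that idea, not a confirmed fix. It assumes a little-endian core, and `write_byte_via_word` / `read_byte_via_word` are hypothetical helper names, not HAL functions. The read-modify-write is not atomic, so it is only suitable for a single-context test like the one above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store one byte using only aligned 32-bit accesses.
 * Assumes little-endian layout, so byte k of a word sits in bits [8k+7:8k].
 * `base` must itself be 32-bit aligned (e.g. the start of the mapped region). */
void write_byte_via_word(volatile uint32_t *base, size_t byte_offset,
                         uint8_t value)
{
    volatile uint32_t *word = base + (byte_offset / 4u);
    unsigned shift = (unsigned)(byte_offset % 4u) * 8u;

    uint32_t w = *word;                  /* aligned 32-bit read           */
    w &= ~(UINT32_C(0xFF) << shift);     /* clear the target byte lane    */
    w |= (uint32_t)value << shift;       /* insert the new byte           */
    *word = w;                           /* aligned 32-bit write          */
}

/* Hypothetical helper: load one byte using only an aligned 32-bit read. */
uint8_t read_byte_via_word(const volatile uint32_t *base, size_t byte_offset)
{
    unsigned shift = (unsigned)(byte_offset % 4u) * 8u;
    uint32_t w = base[byte_offset / 4u];
    return (uint8_t)(w >> shift);
}
```

In the test above, this would mean calling `write_byte_via_word((volatile uint32_t *)OCTOSPI1_BASE, src_offset + sramByteLoopIndex, test_pattern_byte)` in place of the direct `*src_addr_byte = ...` store. If the fault disappears, that would suggest the problem is tied to sub-word writes rather than to the addresses themselves.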

I can post my memory mapped initialisation code, if needed.

Any suggestions of what I might be missing, or how to move forward?
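
To make the fault easier to discuss, it may help to decode the BusFault status bits of the Configurable Fault Status Register (CMSIS `SCB->CFSR`) in the fault handler. The sketch below only decodes a raw register value, so it can be checked off-target. The bit positions follow the Arm BusFault Status Register layout, while `busfault_info_t` and `decode_busfault` are hypothetical names introduced here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* BusFault status bits inside CFSR (BFSR occupies bits 15:8). */
#define CFSR_IBUSERR     (UINT32_C(1) << 8)   /* instruction bus error        */
#define CFSR_PRECISERR   (UINT32_C(1) << 9)   /* precise data bus error       */
#define CFSR_IMPRECISERR (UINT32_C(1) << 10)  /* imprecise data bus error     */
#define CFSR_UNSTKERR    (UINT32_C(1) << 11)  /* fault on exception return    */
#define CFSR_STKERR      (UINT32_C(1) << 12)  /* fault on exception entry     */
#define CFSR_BFARVALID   (UINT32_C(1) << 15)  /* BFAR holds the fault address */

/* Hypothetical summary of the bits most relevant to a faulting store. */
typedef struct {
    bool precise;     /* fault is tied to the stacked PC's instruction */
    bool imprecise;   /* fault was reported after the store retired    */
    bool bfar_valid;  /* the BusFault Address Register can be trusted  */
} busfault_info_t;

/* Decode a raw CFSR value into the flags above. */
busfault_info_t decode_busfault(uint32_t cfsr)
{
    busfault_info_t info;
    info.precise    = (cfsr & CFSR_PRECISERR) != 0u;
    info.imprecise  = (cfsr & CFSR_IMPRECISERR) != 0u;
    info.bfar_valid = (cfsr & CFSR_BFARVALID) != 0u;
    return info;
}
```

In a BusFault or HardFault handler this could be called as `decode_busfault(SCB->CFSR)`. When `bfar_valid` is set, `SCB->BFAR` holds the faulting address. When only `imprecise` is set, the stacked PC may point past the store that actually caused the fault.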

@Alex - APMemory have you ever seen something like this?

1 REPLY
Alex - APMemory
Senior II

Hi, 

I'm not sure how to help.

I've no idea if ISSI QSPI works. 

I'm also not sure I understand your configuration well. You could use our QSPI (APS1604M..., APS6404L..., SOP8/USON8) or our OPI (APS6408L, ...BGA24) parts, and the whole setup should be available in Cube.

If you use two QSPI devices for some reason, I guess you need a QSPI setup with 2 CE lines to activate one memory or the other.

Regards

Alex

PS: Looking at Mouser, it seems the APMemory 64Mb QSPI is half the price for twice the density compared with the ISSI 32Mb.

https://eu.mouser.com/c/?q=IS66WVS4M8BLL

https://eu.mouser.com/c/?q=aps6404L-3SQ