STM32U5 OSPI BusFault in memory mapped mode on unaligned writes

burn_
Associate III

Good day,

I'm experiencing an unexplained bus fault when the MCU writes to unaligned (not aligned to 32-bit) addresses in the OSPI memory region. I've written a minimal implementation that shows where the issue occurs. See below.

I'm using the STM32U585 with two IS66WVS4M8BLL quad SPI modules. The OctoSPI peripheral is configured in Dual-Quad Mode. See the image for the settings as configured in STM32CubeMX. I use memory mapped mode. I've enabled DQS for writes as per 2.6.1 of the errata. I've tried different MPU cache configurations for this area: no MPU, MPU_DEVICE_nGnRnE, and (MPU_WRITE_THROUGH | MPU_NON_TRANSIENT | MPU_W_ALLOCATE | MPU_R_ALLOCATE).

burn__0-1743753407398.png

Here's the minimal implementation. Note that 8-bit writes work if the previous write was aligned. See comments surrounded by ***, focussing on Test 2.

  // Test 1: Unaligned 8-bit access
  // *** This first test works without issues ***
  volatile uint32_t sramByteLoopIndex;
  volatile __IO uint8_t *mem_addr_byte;
  volatile uint8_t sramTestValByte;

  // Writing Sequence (8-bit, unaligned pattern)
  mem_addr_byte = (uint8_t *)(OCTOSPI1_BASE);
  for (sramByteLoopIndex = 0; sramByteLoopIndex < OSPI_MEM_SIZE_BYTES;
       sramByteLoopIndex++)
  {
    sramTestValByte = (uint8_t)(sramByteLoopIndex &
                                0xFF); // Use lower 8 bits of index as pattern
    *mem_addr_byte = sramTestValByte;
    mem_addr_byte += 1;
  }

  // Reading Sequence (8-bit, unaligned check)
  mem_addr_byte = (uint8_t *)(OCTOSPI1_BASE);
  for (sramByteLoopIndex = 0; sramByteLoopIndex < OSPI_MEM_SIZE_BYTES;
       sramByteLoopIndex++)
  {
    sramTestValByte = (uint8_t)(sramByteLoopIndex & 0xFF);
    assert_param(*mem_addr_byte == sramTestValByte);
    mem_addr_byte += 1;
  }

  // Test 2: Unaligned Copy within SRAM
  // *** Here things are not well ***
  volatile __IO uint8_t *src_addr_byte;
  volatile __IO uint8_t *dst_addr_byte;
  volatile uint8_t test_pattern_byte;
  const uint32_t copy_size_bytes = 512; // Size of data to copy
  // Ensure source and destination are unaligned and distinct
  const uint32_t src_offset = 1; // Unaligned source start
  const uint32_t dst_offset =
      (OSPI_MEM_SIZE_BYTES / 2) + 3; // Unaligned destination start

  // Ensure offsets are valid and don't cause wrap-around issues with copy size
  if ((src_offset + copy_size_bytes < OSPI_MEM_SIZE_BYTES) &&
      (dst_offset + copy_size_bytes < OSPI_MEM_SIZE_BYTES))
  {
    // Dummy aligned write first
    // *** Removing this line causes BusFault on the write in code block 1 below ***
    *((volatile __IO uint8_t *)OCTOSPI1_BASE) = 0; 
    
    // 1. Fill source region with a pattern
    src_addr_byte = (uint8_t *)(OCTOSPI1_BASE + src_offset);
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
      test_pattern_byte =
          (uint8_t)((sramByteLoopIndex + 0xAA) & 0xFF); // Arbitrary pattern
      *src_addr_byte = test_pattern_byte; // *** BusFault here on first iteration if the dummy write is commented out, otherwise no issues here ***
      src_addr_byte++;
    }

    // 2. Perform byte-by-byte unaligned copy (Isolated Read/Write)
    src_addr_byte = (uint8_t *)(OCTOSPI1_BASE + src_offset);
    dst_addr_byte = (uint8_t *)(OCTOSPI1_BASE + dst_offset);
    volatile uint8_t temp_byte; // Temporary variable
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
      temp_byte = *src_addr_byte;
      __asm__ __volatile__ ("nop"); // Tried adding NOP as per 2.6.10 of errata, even though we are not using DTR
      *dst_addr_byte = temp_byte; // *** BusFault here on first iteration if the dummy aligned write is present ***
      src_addr_byte++;
      dst_addr_byte++;
    }

    // 3. Verify destination region
    // *** We never get here ***
    dst_addr_byte = (uint8_t *)(OCTOSPI1_BASE + dst_offset);
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
      test_pattern_byte =
          (uint8_t)((sramByteLoopIndex + 0xAA) & 0xFF); // Expected pattern
      assert_param(*dst_addr_byte == test_pattern_byte);
      dst_addr_byte++;
    }
  }

I can post my memory mapped initialisation code, if needed.

Any suggestions of what I might be missing, or how to move forward?

@Alex - APMemory have you ever seen something like this?
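
A general aid for narrowing down a BusFault like the one described above is to read the fault status registers in the fault handler. The sketch below decodes the BusFault bits of the Cortex-M CFSR register (BFSR occupies CFSR[15:8]). The names `busfault_decode`, `busfault_print` and `busfault_info_t` are made up for illustration and are not part of any ST or CMSIS API; on target, the two inputs would be read from CMSIS `SCB->CFSR` and `SCB->BFAR`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* BusFault status bits inside CFSR (BFSR is CFSR[15:8]). */
#define CFSR_IBUSERR     (UINT32_C(1) << 8)   /* fault on instruction fetch      */
#define CFSR_PRECISERR   (UINT32_C(1) << 9)   /* precise data bus error          */
#define CFSR_IMPRECISERR (UINT32_C(1) << 10)  /* imprecise data bus error        */
#define CFSR_UNSTKERR    (UINT32_C(1) << 11)  /* fault on exception return       */
#define CFSR_STKERR      (UINT32_C(1) << 12)  /* fault on exception entry        */
#define CFSR_LSPERR      (UINT32_C(1) << 13)  /* fault on lazy FP state save     */
#define CFSR_BFARVALID   (UINT32_C(1) << 15)  /* BFAR holds the faulting address */

/* Hypothetical container for the decoded flags. */
typedef struct {
    bool instruction;
    bool precise;
    bool imprecise;
    bool unstacking;
    bool stacking;
    bool lazy_fp;
    bool address_valid;
    uint32_t address;   /* only meaningful when address_valid is true */
} busfault_info_t;

/* Decode raw CFSR/BFAR values into individual flags. */
static busfault_info_t busfault_decode(uint32_t cfsr, uint32_t bfar)
{
    busfault_info_t info;
    info.instruction   = (cfsr & CFSR_IBUSERR) != 0u;
    info.precise       = (cfsr & CFSR_PRECISERR) != 0u;
    info.imprecise     = (cfsr & CFSR_IMPRECISERR) != 0u;
    info.unstacking    = (cfsr & CFSR_UNSTKERR) != 0u;
    info.stacking      = (cfsr & CFSR_STKERR) != 0u;
    info.lazy_fp       = (cfsr & CFSR_LSPERR) != 0u;
    info.address_valid = (cfsr & CFSR_BFARVALID) != 0u;
    info.address       = info.address_valid ? bfar : 0u;
    return info;
}

/* Print a one-line summary, e.g. over a debug UART or semihosting. */
static void busfault_print(const busfault_info_t *info)
{
    assert(info != NULL);
    printf("BusFault:%s%s%s%s%s%s",
           info->instruction ? " IBUSERR" : "",
           info->precise     ? " PRECISERR" : "",
           info->imprecise   ? " IMPRECISERR" : "",
           info->unstacking  ? " UNSTKERR" : "",
           info->stacking    ? " STKERR" : "",
           info->lazy_fp     ? " LSPERR" : "");
    if (info->address_valid) {
        printf(" BFAR=0x%08lX", (unsigned long)info->address);
    }
    printf("\n");
}
```

On target this could be called from `BusFault_Handler` as `busfault_decode(SCB->CFSR, SCB->BFAR)`. A fault on a store may be reported as imprecise, in which case BFAR is not valid and the stacked PC may not point at the faulting instruction.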

1 ACCEPTED SOLUTION


Not sure if this helps.

It seems something is going wrong on the STM32 MCU side. Below are comments from our team:

1. Since Test 1 passes, basic read/write in the dual-quad SPI configuration is working. Effectively, it's OPI with DMM = 1 (dual-memory mode).

2. According to the data alignment constraints, it becomes halfword-addressable memory, not byte-addressable.

3. Test 2 doesn't seem to honor the constraints, as it touches an odd address in the very first PSRAM read. I suspect the MCU cache masks the unaligned accesses in Test 1, where the cache is always filled by the previous aligned access. In other words, the cache absorbs the unaligned accesses, so the PSRAM sees no accesses to odd addresses. Once this assumption is broken (the cache is disabled, or an odd address is accessed while the cache is cold), unaligned accesses reach the PSRAM.
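
The halfword constraint in points 2 and 3 suggests one possible software workaround: route byte writes through aligned 16-bit read-modify-write accesses, so that no odd address and no odd length reaches the PSRAM. The following is a minimal sketch under that assumption and was not proposed in this thread. The helper names (`psram_write_byte`, `psram_write`, `psram_read_byte`) are hypothetical. It assumes that aligned halfword accesses are acceptable to the OCTOSPI in this mode, that the compiler emits a single 16-bit access for each `volatile uint16_t` access, and that the even address maps to the low byte (little-endian, as on Cortex-M). Whether it avoids the BusFault on real hardware is untested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one byte at byte offset `offset` using an aligned 16-bit read. */
static inline uint8_t psram_read_byte(const volatile uint16_t *base, size_t offset)
{
    assert(base != NULL);
    uint16_t hw = base[offset / 2u];
    return (offset & 1u) ? (uint8_t)(hw >> 8) : (uint8_t)(hw & 0x00FFu);
}

/* Write one byte at byte offset `offset` by read-modify-write of the
 * containing halfword, so only aligned 16-bit accesses are issued. */
static inline void psram_write_byte(volatile uint16_t *base, size_t offset, uint8_t value)
{
    assert(base != NULL);
    volatile uint16_t *hw = &base[offset / 2u];
    uint16_t cur = *hw;                                   /* aligned 16-bit read */
    if (offset & 1u) {
        cur = (uint16_t)((cur & 0x00FFu) | ((uint16_t)value << 8)); /* odd: high byte */
    } else {
        cur = (uint16_t)((cur & 0xFF00u) | value);                  /* even: low byte */
    }
    *hw = cur;                                            /* aligned 16-bit write */
}

/* Copy `len` bytes from ordinary RAM to byte offset `dst_off`, using
 * read-modify-write only for an odd leading or trailing byte. */
static inline void psram_write(volatile uint16_t *base, size_t dst_off,
                               const uint8_t *src, size_t len)
{
    size_t i = 0;
    assert(base != NULL && (src != NULL || len == 0u));

    if (len > 0u && (dst_off & 1u)) {          /* leading byte at odd address */
        psram_write_byte(base, dst_off, src[0]);
        i = 1u;
    }
    for (; i + 1u < len; i += 2u) {            /* whole aligned halfwords */
        base[(dst_off + i) / 2u] =
            (uint16_t)(src[i] | ((uint16_t)src[i + 1u] << 8));
    }
    if (i < len) {                             /* trailing byte at even address */
        psram_write_byte(base, dst_off + i, src[i]);
    }
}
```

On target, `base` would be `(volatile uint16_t *)OCTOSPI1_BASE`. With the cache enabled, the transactions that actually reach the OCTOSPI may differ from the accesses issued by the core, so the effect should be checked with the cache configuration in use.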

 

Relevant constraints:

AlexAPMemory_0-1744038729529.jpeg

The MCU cache (write-through by default) is pre-warmed by a write:

AlexAPMemory_1-1744038748465.jpeg

 

 


10 REPLIES
Alex - APMemory
Senior II

Hi, 

I'm not sure how to help.

I've no idea if ISSI QSPI works. 

I'm also not sure I fully understand your configuration. You could use our QSPI (APS1604M..., APS6404L..., SOP8/USON8) or our OPI (APS6408L, ...BGA24) parts, and the full setup should be available in Cube.

If you use 2 QSPI devices for some reason, I guess you need a QSPI setup with 2 CE lines to activate one memory or the other.

Regards

Alex

PS: Looking at Mouser, it seems the APMemory 64Mb QSPI part is half the price of the ISSI 32Mb part, for twice the density.

https://eu.mouser.com/c/?q=IS66WVS4M8BLL

https://eu.mouser.com/c/?q=aps6404L-3SQ

 

Hi @Alex - APMemory , thanks for your thoughts. 
Wish I had known about the listed AP parts at hardware design time - will definitely consider them if we do a hardware revision.

As for the 2 QSPI modules: Dual-quad SPI splits up a byte and writes 4 bits to each RAM module.

Alex - APMemory
Senior II

You are right about dual quad, this should work

 

Let me share an overview of the supported devices/SoCs:

https://wiki.st.com/stm32mcu/index.php?title=Introduction_to_external_serial_memories_XSPI_interoperability_for_STM32&oldid=63515

AlexAPMemory_0-1744011169802.png

APMemory IoT RAM Solution

STM32 MCU family | HPI/OPI | OPI | QSPI SDR | QSPI DDR
STM32L4Rx | - | ✓* | - | -
STM32L5, STM32L4P5/Q5, STM32U575/585, STM32H5 | -
STM32H7A3/B3, STM32H72x/3x | -✓*
STM32U59x/U5Ax, STM32U5Fx/U5Gx, STM32H7Rx/Sx, STM32N6 |
All STM32 supporting NOR QSPI | - | - | ✓* | -

APMemory device:
HPI/OPI: 256Mb~512Mb, 1.8V, BGA24/WLCSP; APS256XXN-OBR/OB9-..., APS512XXN-OBR/OB9-…; 20 pins, up to 1GB/s
OPI: 64Mb~512Mb, 1.8V~3V, BGA24/WLCSP; APS6408L-xOBM-..., APS12808L-xOBM-BA, APS12808O-OBR-WB, APS25608N-OBR-BD, APS51208N-OBR-BD; 11 pins, up to 400MB/s
QSPI SDR: 16Mb~128Mb, 1.8V~3V, SOP8/USON8/WLCSP; APS1604M-xSQR-…, APS6404L-xSQR-..., APS12808O-SQRH-WA; 6 pins, up to 72MB/s
QSPI DDR: 128Mb, 1.8V, WLCSP; APS12808O-DQ-WA; 7 pins, up to 166MB/s

Thanks for this, @Alex - APMemory . Helpful for when revision time rolls around.

burn_
Associate III

As for the original issue I'm facing, perhaps someone from ST can weigh in? Based on other threads, perhaps @KDJEM.1 or  @mƎALLEm has seen something like this before?

@BDoon.1, did you perhaps come across something like this while working with the OSPI peripheral? (based on this thread)

BDoon.1
Associate III

@burn_ I did experience something like this.  If you look in the STM32U585 Errata, in the OCTOSPI section, there are a number of items related to 4-byte boundaries.  2.6.7 "Read data corruption after a few bytes are skipped when crossing a four-byte boundary" in particular seems bad, but 2.6.5 and 2.6.10 are also concerning for an application that just wants to treat the memory like it's directly addressable.

Almost all of the workarounds listed for these issues involve enabling the DCache.  Which is what I did, and it did resolve the issues I was seeing.

Hi @Alex - APMemory ,

Thanks for reaching out to your team. Your description of the problem makes perfect sense. The implication is that I'm also likely losing data when an odd number of bytes is written.

Another helpful clarification is that it is halfword-addressable memory (I assumed word-addressable would be the solution).

I'll construct an example with caching disabled and see if test 1 then also fails.

Hi @BDoon.1 ,

Thanks for your input.

I was convinced that the issues listed in the errata were the cause in my case as well. However, the problem persisted despite extensive experimentation with the cache settings.