2025-04-04 2:58 AM
Good day,
I'm experiencing an unexplained bus fault when the MCU writes to unaligned (not aligned to 32-bit) addresses in the OSPI memory region. I've written a minimal implementation that shows where the issue occurs. See below.
I'm using the STM32U585 with two IS66WVS4M8BLL quad SPI modules. The OctoSPI peripheral is configured in Dual-Quad Mode. See the image for the settings as configured in STM32CubeMX. I use memory mapped mode. I've enabled DQS for writes as per 2.6.1 of the errata. I've tried different MPU cache configurations for this area: no MPU, MPU_DEVICE_nGnRnE, and (MPU_WRITE_THROUGH | MPU_NON_TRANSIENT | MPU_W_ALLOCATE | MPU_R_ALLOCATE).
Here's the minimal implementation. Note that 8-bit writes work if the previous write was aligned. See comments surrounded by ***, focussing on Test 2.
// Test 1: Unaligned 8-bit access
// *** This first test works without issues ***
volatile uint32_t sramByteLoopIndex;
volatile __IO uint8_t *mem_addr_byte;
volatile uint8_t sramTestValByte;
// Writing Sequence (8-bit, unaligned pattern)
mem_addr_byte = (uint8_t *)(OCTOSPI1_BASE);
for (sramByteLoopIndex = 0; sramByteLoopIndex < OSPI_MEM_SIZE_BYTES;
     sramByteLoopIndex++)
{
    sramTestValByte = (uint8_t)(sramByteLoopIndex & 0xFF); // Use lower 8 bits of index as pattern
    *mem_addr_byte = sramTestValByte;
    mem_addr_byte += 1;
}
// Reading Sequence (8-bit, unaligned check)
mem_addr_byte = (uint8_t *)(OCTOSPI1_BASE);
for (sramByteLoopIndex = 0; sramByteLoopIndex < OSPI_MEM_SIZE_BYTES;
     sramByteLoopIndex++)
{
    sramTestValByte = (uint8_t)(sramByteLoopIndex & 0xFF);
    assert_param(*mem_addr_byte == sramTestValByte);
    mem_addr_byte += 1;
}
// Test 2: Unaligned Copy within SRAM
// *** Here things are not well ***
volatile __IO uint8_t *src_addr_byte;
volatile __IO uint8_t *dst_addr_byte;
volatile uint8_t test_pattern_byte;
const uint32_t copy_size_bytes = 512; // Size of data to copy
// Ensure source and destination are unaligned and distinct
const uint32_t src_offset = 1; // Unaligned source start
const uint32_t dst_offset = (OSPI_MEM_SIZE_BYTES / 2) + 3; // Unaligned destination start
// Ensure offsets are valid and don't cause wrap-around issues with copy size
if ((src_offset + copy_size_bytes < OSPI_MEM_SIZE_BYTES) &&
    (dst_offset + copy_size_bytes < OSPI_MEM_SIZE_BYTES))
{
    // Dummy aligned write first
    // *** Removing this line causes BusFault on the write in code block 1 below ***
    *((volatile __IO uint8_t *)OCTOSPI1_BASE) = 0;
    // 1. Fill source region with a pattern
    src_addr_byte = (uint8_t *)(OCTOSPI1_BASE + src_offset);
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
        test_pattern_byte = (uint8_t)((sramByteLoopIndex + 0xAA) & 0xFF); // Arbitrary pattern
        *src_addr_byte = test_pattern_byte; // *** BusFault here on first iteration if the dummy write is commented out, otherwise no issues here ***
        src_addr_byte++;
    }
    // 2. Perform byte-by-byte unaligned copy (Isolated Read/Write)
    src_addr_byte = (uint8_t *)(OCTOSPI1_BASE + src_offset);
    dst_addr_byte = (uint8_t *)(OCTOSPI1_BASE + dst_offset);
    volatile uint8_t temp_byte; // Temporary variable
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
        temp_byte = *src_addr_byte;
        __asm__ __volatile__ ("nop"); // Tried adding NOP as per 2.6.10 of errata, even though we are not using DTR
        *dst_addr_byte = temp_byte; // *** BusFault here on first iteration if the dummy aligned write is present ***
        src_addr_byte++;
        dst_addr_byte++;
    }
    // 3. Verify destination region
    // *** We never get here ***
    dst_addr_byte = (uint8_t *)(OCTOSPI1_BASE + dst_offset);
    for (sramByteLoopIndex = 0; sramByteLoopIndex < copy_size_bytes;
         sramByteLoopIndex++)
    {
        test_pattern_byte = (uint8_t)((sramByteLoopIndex + 0xAA) & 0xFF); // Expected pattern
        assert_param(*dst_addr_byte == test_pattern_byte);
        dst_addr_byte++;
    }
}
I can post my memory mapped initialisation code, if needed.
Any suggestions of what I might be missing, or how to move forward?
@Alex - APMemory have you ever seen something like this?
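One possible way to narrow this down is to decode the BusFault bits of SCB->CFSR in the fault handler and check whether the fault is precise (BFAR holds the faulting address) or imprecise. The sketch below is an assumption-laden illustration, not code from this project: the bit positions are the architectural BFSR layout (bits [15:8] of CFSR) on Armv7-M/Armv8-M Mainline, while `busfault_info_t` and `decode_busfault` are hypothetical names. It takes the register value as a plain integer so it can be compiled on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* BFSR occupies bits [15:8] of CFSR (architectural layout). */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting address */

/* Hypothetical helper type: summary of the BusFault-related CFSR bits. */
typedef struct {
    bool precise;
    bool imprecise;
    bool bfar_valid;
} busfault_info_t;

/* Hypothetical helper: decode a raw CFSR value.
 * On target this would be called with SCB->CFSR. */
static busfault_info_t decode_busfault(uint32_t cfsr)
{
    busfault_info_t info;
    info.precise    = (cfsr & CFSR_PRECISERR) != 0u;
    info.imprecise  = (cfsr & CFSR_IMPRECISERR) != 0u;
    info.bfar_valid = (cfsr & CFSR_BFARVALID) != 0u;
    return info;
}
```

On target, something like `decode_busfault(SCB->CFSR)` inside BusFault_Handler (or HardFault_Handler if BusFault is not enabled separately) would show whether SCB->BFAR can be trusted to point at the offending OSPI address.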
2025-04-04 3:36 AM
Hi,
I'm not sure how to help.
I have no idea whether the ISSI QSPI part works in this setup.
I'm also not sure I understand your configuration. You could use our QSPI (APS1604M..., APS6404L..., SOP8/USON8) or our OPI (APS6408L, ...BGA24) parts, and the full setup should be available in Cube.
If you are using two QSPI devices for some reason, I guess you need a QSPI setup with 2 CE signals to activate one memory or the other.
Regards
Alex
PS: Looking at Mouser, it seems the APMemory 64Mb QSPI is half the price of the ISSI 32Mb part, with twice the density.
https://eu.mouser.com/c/?q=IS66WVS4M8BLL
https://eu.mouser.com/c/?q=aps6404L-3SQ
2025-04-07 12:06 AM
Hi @Alex - APMemory , thanks for your thoughts.
Wish I had known about the listed AP parts at hardware design time - will definitely consider them if we do a hardware revision.
As for the 2 QSPI modules: Dual-quad SPI splits up a byte and writes 4 bits to each RAM module.
2025-04-07 2:15 AM
You are right about dual quad, this should work.
Let me share an overview of the supported devices/SoCs:
APMemory IoT RAM Solution
STM32 MCU family | HPI/OPI | OPI | QSPI SDR | QSPI DDR |
STM32L4Rx | - | ✓* | - | - |
STM32L5, STM32L4P5/Q5, STM32U575/585, STM32H5 | - | ✓ | ✓ | ✓ |
STM32H7A3/B3, STM32H72x/3x | - | ✓ | ✓* | ✓ |
STM32U59x/U5Ax, STM32U5Fx/U5Gx, STM32H7Rx/Sx, STM32N6 | ✓ | ✓ | ✓ | ✓ |
All STM32 supporting NOR QSPI | - | - | ✓* | - |
APMemory device | 256Mb~512Mb, 1.8V, BGA24/WLCSP: APS256XXN-OBR/OB9-..., APS512XXN-OBR/OB9-… | 64Mb~512Mb, 1.8V~3V, BGA24/WLCSP: APS6408L-xOBM-..., APS12808L-xOBM-BA, APS12808O-OBR-WB, APS25608N-OBR-BD, APS51208N-OBR-BD | 16Mb~128Mb, 1.8V~3V, SOP8/USON8/WLCSP: APS1604M-xSQR-…, APS6404L-xSQR-..., APS12808O-SQRH-WA | 128Mb, 1.8V, WLCSP: APS12808O-DQ-WA |
| 20 pins, up to 1 GB/s | 11 pins, up to 400 MB/s | 6 pins, up to 72 MB/s | 7 pins, up to 166 MB/s |
2025-04-07 5:42 AM
Thanks for this, @Alex - APMemory . Helpful for when revision time rolls around.
2025-04-07 6:04 AM - edited 2025-04-07 6:10 AM
As for the original issue I'm facing, perhaps someone from ST can weigh in? Based on other threads, perhaps @KDJEM.1 or @mƎALLEm has seen something like this before?
@BDoon.1, did you perhaps come across something like this while working with the OSPI peripheral? (based on this thread)
2025-04-07 8:14 AM
Not sure if this helps.
It seems something is being done wrong on the STM32 MCU side. Below are comments from our team:
1. Since Test 1 is good, basic read/write in the dual-quad SPI configuration is working. Effectively, it's OPI with DMM = 1 (dual-memory mode).
2. According to the data alignment constraints, it becomes halfword-addressable memory, not byte-addressable.
3. Test 2 doesn't seem to honor the constraints, as it touches an odd address in the very first PSRAM read. I suspect it is the MCU cache that shadows the unaligned accesses in Test 1, where the cache is always filled by the previous aligned access. In other words, the cache absorbs the unaligned accesses, so the PSRAM sees no accesses to odd addresses. Once this assumption is broken (because the cache is disabled, or an odd address is accessed while the cache is cold), there will be unaligned accesses to the PSRAM.
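To make the constraint in points 2 and 3 concrete, here is a minimal sketch of a check for whether an access is "halfword-clean", i.e. starts at an even byte offset and covers an even, non-zero number of bytes. The function name is hypothetical, and the rule itself is an assumption taken from the description above (halfword-addressable memory in dual-memory mode), not something verified on hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if an access of `len` bytes at byte offset
 * `offset` into the memory-mapped region starts on a halfword boundary
 * and covers whole halfwords only. */
static bool dmm_access_is_halfword_clean(uint32_t offset, size_t len)
{
    return ((offset & 1u) == 0u) && (len > 0u) && ((len & 1u) == 0u);
}
```

By this check, Test 2's `src_offset = 1` fails, and so does `dst_offset = (OSPI_MEM_SIZE_BYTES / 2) + 3` for any memory size that is a multiple of 4. A single-byte access fails it at any offset, which fits the suspicion that Test 1 only works because the cache hides the byte accesses from the PSRAM.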
Relevant constraints:
(default write-through) MCU cache is preheated by write:
2025-04-08 5:33 AM
@burn_ I did experience something like this. If you look in the STM32U585 Errata, in the OCTOSPI section, there are a number of items related to 4-byte boundaries. 2.6.7 "Read data corruption after a few bytes are skipped when crossing a four-byte boundary" in particular seems bad, but 2.6.5 and 2.6.10 are also concerning for an application that just wants to treat the memory like it's directly addressable.
Almost all of the workarounds listed for these issues involve enabling the DCache, which is what I did, and it resolved the issues I was seeing.
2025-04-09 1:34 AM
Hi @Alex - APMemory ,
Thanks for reaching out to your team. Your description of the problem makes perfect sense. The implication is that I'm also likely losing data when an odd number of bytes is written.
Another helpful clarification is that it is halfword-addressable memory (I assumed word-addressable would be the solution).
I'll construct an example with caching disabled and see if test 1 then also fails.
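If the memory really is halfword-addressable, one possible workaround for single-byte writes is a read-modify-write of the containing halfword, so that only aligned 16-bit accesses are issued. The following is a sketch under assumptions: the helper names are hypothetical, the byte order is little-endian (as on Cortex-M), and it has only been reasoned about against an ordinary RAM buffer, not the OCTOSPI region. It is also not atomic, so it would need protection if an interrupt or DMA can touch the same halfword.

```c
#include <stdint.h>

/* Hypothetical helper: write one byte at byte offset `offset` using only
 * aligned 16-bit accesses. `base` must be halfword-aligned; on target it
 * would point at the memory-mapped region, here it can be any buffer. */
static void psram_write_u8_rmw(volatile uint16_t *base, uint32_t offset,
                               uint8_t value)
{
    volatile uint16_t *hw = &base[offset >> 1];
    uint16_t cur = *hw; /* aligned 16-bit read */

    if ((offset & 1u) != 0u) {
        /* Odd offset: replace the high byte (little-endian). */
        cur = (uint16_t)((cur & 0x00FFu) | ((uint16_t)value << 8));
    } else {
        /* Even offset: replace the low byte. */
        cur = (uint16_t)((cur & 0xFF00u) | value);
    }
    *hw = cur; /* aligned 16-bit write */
}

/* Hypothetical helper: read one byte back via an aligned 16-bit read. */
static uint8_t psram_read_u8(const volatile uint16_t *base, uint32_t offset)
{
    uint16_t cur = base[offset >> 1];
    return (uint8_t)(((offset & 1u) != 0u) ? (cur >> 8) : (cur & 0xFFu));
}
```

Running Test 2 through helpers like these, with the cache disabled, would be one way to check whether the faults come from odd-address or odd-length accesses reaching the PSRAM.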
2025-04-09 1:42 AM
Hi @BDoon.1 ,
Thanks for your input.
I was convinced that the issues listed in the errata were my issue as well. However, in my case the problem persisted despite extensively playing with the cache settings.