H7 QUADSPI weirdness

Tesla DeLorean · ‎2019-09-30

Seen this on two board designs, about two years apart in age, and across many steppings of the STM32H7.

The H743I-EVAL and H745I-DISCO have 1Gb (128MB) of Micron QSPI flash, one as a dual-die single chip, the other as two SOP16 packages.

Using the BSP code I can access the parts, both via the Read and Memory Mapped methods.

If I do a bytewise read of the entire array in memory-mapped mode, the first pass returns the wrong CRC32, and the second pass hangs on the next byte read, at 0x90000000. When I stop execution, it keeps landing in SysTick_Handler and never reads any memory.

If I do half the array, or 128MB - 32 bytes, I get the correct CRC, and can do it multiple times.

...
	status = BSP_QSPI_EnableMemoryMappedMode();
	if (status == QSPI_ERROR)
		puts("QSPI Error");
	else
		puts("QSPI Ok");
	
	printf("%u\n", FlashSize);
	printf("CRC:%08X\n", CRC32(0xFFFFFFFF, FlashSize, (void *)0x90000000)); // Returns wrong sum
	printf("CRC:%08X\n", CRC32(0xFFFFFFFF, FlashSize, (void *)0x90000000)); // Hangs
	printf("CRC:%08X\n", CRC32(0xFFFFFFFF, FlashSize, (void *)0x90000000));
	printf("CRC:%08X\n", CRC32(0xFFFFFFFF, FlashSize, (void *)0x90000000));
	printf("CRC:%08X\n", CRC32(0xFFFFFFFF, FlashSize, (void *)0x90000000));
...
 
//******************************************************************************
 
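// Nibble-wise (4 bits per step) table-driven CRC-32, reflected polynomial 0xEDB88320.
// No final inversion is applied; the caller supplies the initial value.
uint32_t CRC32(uint32_t Crc, uint32_t Size, const uint8_t *Buffer)
{
  static const uint32_t CrcTable[16] = {
    0x00000000,0x1DB71064,0x3B6E20C8,0x26D930AC,0x76DC4190,0x6B6B51F4,0x4DB26158,0x5005713C,
    0xEDB88320,0xF00F9344,0xD6D6A3E8,0xCB61B38C,0x9B64C2B0,0x86D3D2D4,0xA00AE278,0xBDBDF21C };

  while(Size--)
  {
    Crc = Crc ^ (uint32_t)*Buffer++; // Fold the next byte into the low bits

    Crc = (Crc >> 4) ^ CrcTable[Crc & 0x0F]; // Low nibble
    Crc = (Crc >> 4) ^ CrcTable[Crc & 0x0F]; // High nibble
  }

  return(Crc);
}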
 
//****************************************************************************

It is as if the QUADSPI state machine malfunctions in getting the end bytes, or sends something breaking the Micron parts.

134217696 <-- 32 bytes less

CRC:E741A92C <-- Works

CRC:E741A92C

Infinite loop...

134217697 <--- 31 bytes less

CRC:1F3DC2C7 <- First completes correctly, but then locks

134217728 <-- Whole array

CRC:C3C26772 <- First completes incorrectly, and next pass locks

READ ID (QUAD)

MT25QL 512 (DUAL BANK) / MT25TL01G

UID-0 : 44 00 2E EF 92 00 13 FA FF 26 00 DE A8 FE ED 6C

UID-1 : 44 00 2E EF 92 00 09 12 00 11 00 45 F5 FE 0A B6

Size 134217728

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Andreas Bolsch · ‎2019-10-01

That's a well-known problem. Don't ever touch the last few bytes (with regard to FSIZE) in memory-mapped mode. Or, increase the FSIZE field beyond the actual capacity. I'm not quite sure about the reason, but probably the prefetch (which is nowhere specified precisely) interferes with the address decoder in some nasty way. No problem on F4 and F7, though.

Tesla DeLorean · ‎2019-10-01

After I had posted, I did expand the size from 128MB to 256MB on the QSPI configuration side, which did seem to make it happy, but this is pretty sloppy. Impacts Rev Z thru Rev V, not mentioned in the errata.
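
The arithmetic behind that change can be sketched as below. The QUADSPI_DCR register encodes capacity as 2^(FSIZE+1) bytes, so raising the field by one doubles the mapped window. The helper names are hypothetical and not part of ST's HAL or BSP; this is only a host-side sketch of the calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an ST API): FSIZE value for a power-of-two
 * capacity, using the QUADSPI_DCR encoding bytes = 2^(FSIZE + 1). */
static uint32_t qspi_fsize_exact(uint32_t bytes)
{
    uint32_t bits = 0;

    assert(bytes >= 2u && (bytes & (bytes - 1u)) == 0u); /* power of two only */
    while ((bytes >> bits) != 1u)
        bits++;                 /* bits ends up as log2(bytes) */
    return bits - 1u;
}

/* Workaround value: claim one extra address bit so the controller's
 * idea of "end of memory" sits beyond the real device. */
static uint32_t qspi_fsize_padded(uint32_t bytes)
{
    return qspi_fsize_exact(bytes) + 1u;
}
```

For the 128MB (134217728 byte) parts here, the exact value is 26 and the padded value is 27, i.e. a 256MB window. Software then has to keep its own accesses inside the real 128MB.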

The prefetch looks to be 32 bytes, empirically. Widening the MPU window has no impact.

@STOne-32


Andreas Bolsch · ‎2019-10-01

Yes, no indication in errata sheet ... Except there is a similar problem with OctoSPI in L4+, and there it's mentioned in the corresponding errata sheet.

Tesla DeLorean · ‎2019-10-01

L4+ ES0393

https://www.st.com/content/ccc/resource/technical/document/errata_sheet/group0/ef/74/70/f6/89/a9/42/3f/DM00371862/files/DM00371862.pdf/jcr:content/translations/en.DM00371862.pdf

2.7.13 Memory-mapped read of the last memory space byte not possible in SDR octal mode

Description

Memory-mapped read of the last byte of the memory space defined through the DEVSIZE[4:0] bitfield of the OCTOSPI_DCR1 register spuriously always returns zero. A subsequent memory-mapped read not separated from the previous memory-mapped read with a command causes the AHB interface to hang, with the HREADY flag never set.

Note: This failure does not occur in DDR octal mode.

Workaround

Apply one of the following measures:

• Avoid reading the last byte of the memory space through memory-mapped access. Use indirect read instead.
• Set DEVSIZE value so that the memory space it defines exceeds the memory size, then handle the memory boundary by software.
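
"Handle the memory boundary by software" could look something like the sketch below. The function names are hypothetical, and 0x90000000 is simply the mapped base address used earlier in this thread. The point is only that once the size field overstates the device, every mapped access needs its own range check against the real capacity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QSPI_MAPPED_BASE 0x90000000u /* mapped window base used in this thread */

/* Hypothetical guard: true when [offset, offset + len) lies inside the
 * real device. Written so the comparison cannot overflow. */
static bool qspi_range_ok(uint32_t offset, uint32_t len, uint32_t real_size)
{
    return len <= real_size && offset <= real_size - len;
}

/* Hypothetical accessor: mapped address for a checked range. */
static uintptr_t qspi_mapped_addr(uint32_t offset, uint32_t len, uint32_t real_size)
{
    assert(qspi_range_ok(offset, len, real_size));
    return (uintptr_t)QSPI_MAPPED_BASE + offset;
}
```

With the window padded beyond the device, an out-of-range offset would otherwise reach addresses the flash does not implement, so the check has to happen before the pointer is formed.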

Seeing this with dual bank DDR on the H7. Tested on Z and V step; I would need to check X step, but I'm assuming it was not flagged, so wasn't addressed.

It seems to hang the primary execution, while interrupts still seem to fire, at least in the debugger.

The routine seems to die on the 0x90000000 read


Tesla DeLorean · ‎2019-10-01

QSPIHandle.Init.FlashSize = POSITION_VAL(MT25TL01G_FLASH_SIZE); // - 1;

Running the H745I-DISCO at 450 MHz

CRC:7DE4758D

134217728 Bytes, 536871068 Cycles

112.50 MBps Read

SUM:B7152F3B

Infinite loop...
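
The read rate in that log follows directly from the byte and cycle counts. A minimal sketch of the arithmetic, assuming the cycle count was taken against the 450 MHz core clock (the function name is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second for a transfer that took `cycles` core clock ticks
 * at a core clock of `core_hz`. */
static double read_rate_bytes_per_s(uint64_t bytes, uint64_t cycles, uint64_t core_hz)
{
    assert(cycles > 0u);
    return (double)bytes * (double)core_hz / (double)cycles;
}
```

Plugging in 134217728 bytes, 536871068 cycles and 450000000 Hz gives roughly 112.5 million bytes per second, which is about four core cycles per byte and agrees with the 112.50 MBps line if MB is taken as 10^6 bytes.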


Amel NASRI · ‎2019-10-04

Hello @Community member ,

I confirm that there is a limitation on reading the last byte of the QuadSPI memory in memory-mapped mode.

This will be described in the STM32H7 errata sheet.

There are still some possible workarounds:

• Double the FSIZE, then manage the real memory boundaries in software (I think you tested it already).
• For the CRC check, you can use indirect mode.
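
One way to combine the two ideas for a CRC check is sketched below: read everything except the last 32 bytes through the mapped window, and fetch the tail through an indirect read. The 32-byte figure is the span reported in this thread, not a documented value. The callback type, the function names, and the fake_flash stand-in are all hypothetical, and how the callback switches the peripheral between memory-mapped and indirect mode is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QSPI_TAIL_BYTES 32u /* span reported as unsafe in this thread */

/* Hypothetical callback: indirect (command-level) read of len bytes at a
 * flash offset. Returns 0 on success. Any mode switching is its job. */
typedef int (*indirect_read_fn)(uint32_t offset, uint8_t *buf, uint32_t len);

/* Same nibble-wise CRC-32 as posted earlier in the thread. */
static uint32_t crc32_nibble(uint32_t crc, const uint8_t *p, uint32_t n)
{
    static const uint32_t table[16] = {
        0x00000000u, 0x1DB71064u, 0x3B6E20C8u, 0x26D930ACu,
        0x76DC4190u, 0x6B6B51F4u, 0x4DB26158u, 0x5005713Cu,
        0xEDB88320u, 0xF00F9344u, 0xD6D6A3E8u, 0xCB61B38Cu,
        0x9B64C2B0u, 0x86D3D2D4u, 0xA00AE278u, 0xBDBDF21Cu };

    while (n--) {
        crc ^= *p++;
        crc = (crc >> 4) ^ table[crc & 0x0Fu];
        crc = (crc >> 4) ^ table[crc & 0x0Fu];
    }
    return crc;
}

/* CRC the head through the mapped pointer, the tail through read_tail. */
static int crc32_avoiding_tail(const uint8_t *mapped, uint32_t size,
                               indirect_read_fn read_tail, uint32_t *crc_out)
{
    uint8_t tail[QSPI_TAIL_BYTES];
    uint32_t head, crc;

    assert(size >= QSPI_TAIL_BYTES);
    head = size - QSPI_TAIL_BYTES;

    crc = crc32_nibble(0xFFFFFFFFu, mapped, head);
    if (read_tail(head, tail, QSPI_TAIL_BYTES) != 0)
        return -1;
    *crc_out = crc32_nibble(crc, tail, QSPI_TAIL_BYTES);
    return 0;
}

/* Host-side stand-in for the flash, only so the sketch can be exercised
 * without hardware. */
static const uint8_t fake_flash[] = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGH";

static int fake_indirect_read(uint32_t offset, uint8_t *buf, uint32_t len)
{
    if (len > sizeof fake_flash || offset > sizeof fake_flash - len)
        return -1;
    memcpy(buf, fake_flash + offset, len);
    return 0;
}
```

Because the CRC is a running value, splitting the input this way gives the same result as one pass over the whole array, so the check stays comparable with a reference sum.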

Sorry for the inconvenience it creates for you and thanks for bringing it to our attention.

-Amel


Tesla DeLorean · ‎2019-10-04

The oddness extends out to 32 bytes, not sure how that comports with your understanding of the IP or implementation details.

I was using the CRC here as a means to exercise the memory array, I have a version that does the block/command level read. Basically for performance and integrity checking as I'm coding some external loaders.

It is an effective Design-For-Test technique as you don't need to do the byte-for-byte comparison, or hold the whole pattern in memory.

Yes, pushing out FSIZE addresses this. Do you have any data on a full 256MB (2Gbit) implementation? I don't have memory devices that cover that amount of space currently, and I suspect the memories being used might object if the range of their internal addressing was exceeded. I'll perhaps experiment.

@Andreas Bolsch I'm assuming you and others have encountered this previously, can you cite any threads/discussion or github commentary?


Andreas Bolsch · ‎2019-10-05

That's first mentioned here: http://openocd.zylin.com/#/c/4321/1//COMMIT_MSG as of Jan. 2018.

The "That's a well known problem" might have been slightly exaggerated: It took me several days and numerous and rather frustrating experiments to become convinced it's really a silicon bug and not a software bug ...

Tesla DeLorean · ‎2019-10-05

>>might have been slightly exaggerated

That's Ok, I took it as you'd battled with it before. I'm a little surprised it wasn't picked up by the validation engineers over multiple steppings.

The way the processor stops is the scariest aspect of this.

Got a SOIC socket on a NUCLEO currently to do experiments with assorted parts. Got a couple of boards with dual memories. Some of the mass-erase times are quite high, and the two devices have different completion times. Looking at ST's BSP code, it only seems to spin on the primary device.
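
A minimal sketch of a busy test that covers both devices, assuming a two-byte status read in dual-flash mode returns one status byte per device and that bit 0 is the write-in-progress flag. The function name is hypothetical and this is not taken from ST's BSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SR_WIP 0x01u /* write-in-progress flag, bit 0 of the status register */

/* sr[0]: status byte from the first flash, sr[1]: from the second.
 * Busy if either device still reports write in progress. */
static bool dual_flash_busy(const uint8_t sr[2])
{
    assert(sr != NULL);
    return ((sr[0] | sr[1]) & SR_WIP) != 0u;
}
```

An erase or program wait loop would keep polling until this returns false, so the slower of the two devices sets the completion time.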
