STM32H75x QSPI flash memory mapped mode questions

regjoe
Senior II

Hi,

I have to port a huge existing application from an F4 to an H753 board. The F4 board has PNOR flash; the H7 board has dual QSPI flash.

One task parses XML configuration data from the PNOR flash. A second task stores data, e.g. log data, in the same PNOR.

Parsing the configuration data uses string functions that require a memory address, while writing to QSPI requires indirect mode. Mixing the two would require some kind of flash memory management, which I guess is a bit tricky to handle.

I started using indirect mode only and DMA to read chunks of data into memory buffers before parsing it. This works fine. A QSPI read request can suspend an ongoing erase operation, so it is not blocked or delayed. Great.

But compared to the PNOR read, the latency of reading data chunks into buffers in indirect mode reduces the data processing performance. Therefore I want to check whether memory-mapped mode is worth the time spent investigating it.

I've read that QSPI read has a prefetch buffer. Is this mechanism available for data reads too, or only for XIP? The H75x has a 32-byte FIFO, so I think this may speed up sequential memory-mapped reads, e.g. as done by strcmp(). Any experience?

I had problems (and also read about them here in this forum) when switching the QSPI interface from memory-mapped mode to indirect mode and vice versa. Does an MCU with an OSPI or XSPI interface have advantages for this application?

Thank you

7 REPLIES
KDJEM.1
ST Employee

Hello @regjoe ;

 

What do you mean by PNOR? Are you using FSMC interface with STM32F4?

When memory-mapped mode is used, a prefetching mechanism fully managed by the hardware optimizes read and execution performance from the external Quad-SPI memory. The FIFO is used as a prefetch buffer to anticipate linear reads. Any access to QUADSPI_DR in this mode returns zero. Please look at the application note "Introduction to Quad-SPI interface for STM32 MCUs and MPUs".

You need to respect the QUADSPI internal timing criticality described in the errata sheet.

For that, additional code must be executed upon reset and upon switching from memory-mapped mode to any other mode:

[Image: KDJEM1_0-1776157197137.png]

 

Thank you.

Kaouthar

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

regjoe
Senior II

Hello @KDJEM.1 

Yes, it's parallel NOR flash connected to the F4 via FMC.

What does "linear reads" mean? IMHO a FIFO is not a cache: it implies that bytes or words can only be read sequentially and that no random access to the FIFO is possible, right?

AN4760 tells me "Data are prefetched continuously while the FIFO is not full, when a discontinuous access is detected, the QUADSPI rises chip-select and starts a new read operation without sending the command but sending directly the new address."

So if, for example, I read a byte at offsets 0, 4, 8, 12, ... from the start address, does the QSPI start a new read sequence even though the requested data is already present in the FIFO?

When a QSPI memory address is read, is the entire FIFO always filled with 32 bytes? What happens to the CPU: is it stalled until the first byte is in the FIFO, or until the FIFO is full?

What about the cache? I guess it should be disabled for memory-mapped reads?

Some years ago I checked out a QSPI memory-mapped demo for an H75x evaluation board, but I failed to switch back from memory-mapped to indirect mode. Do you think the code in the errata may fix that? Or has the demo been improved? If so, I'll re-install it and check out the demo once again.

Thank you

  

KDJEM.1
ST Employee

Hello @regjoe;

 

Yes, please look at RM0433, specifically sections 23.3.12 "QUADSPI use" and 23.3.13 "Sending the instruction only once".

The FIFO threshold and DMA request generation are mainly relevant in indirect mode, as documented in the reference manual.

If the burst size is 16 bytes, the FIFO threshold should be set to 16 bytes.

  • In the read operation, when the FIFO accumulates 16 bytes of data, it triggers an interrupt or a DMA transfer to read the data.
  • In the write operation, the FIFO threshold should be set to ensure that there is at least 16 bytes of space available. When there is enough space, it triggers an interrupt or a DMA transfer to write the data.
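
As a minimal sketch of the threshold setting described above, the helper below encodes a threshold given in bytes into the FTHRES field of QUADSPI_CR. The bit position (bits 12:8) and the "threshold minus one" encoding are assumptions to verify against RM0433, and the macro and function names are hypothetical, not part of any ST header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the FTHRES field in QUADSPI_CR (bits 12:8 on STM32H7).
 * Verify against RM0433 before relying on it. */
#define QSPI_CR_FTHRES_POS  8u
#define QSPI_CR_FTHRES_MASK (0x1Fu << QSPI_CR_FTHRES_POS)

/* Return a copy of cr with the FIFO threshold set to threshold_bytes (1..32).
 * Assumption: the field stores the threshold minus one. */
static uint32_t qspi_cr_with_fifo_threshold(uint32_t cr, uint32_t threshold_bytes)
{
    assert(threshold_bytes >= 1u && threshold_bytes <= 32u);
    cr &= ~QSPI_CR_FTHRES_MASK;                          /* clear old threshold */
    cr |= (threshold_bytes - 1u) << QSPI_CR_FTHRES_POS;  /* encode new one */
    return cr;
}
```

For a 16-byte burst, the register would be updated with something like `QUADSPI->CR = qspi_cr_with_fifo_threshold(QUADSPI->CR, 16u);` while the peripheral is idle. When the ST HAL is used, the same value is normally supplied through the init structure instead.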

FAQ: How to configure FIFO with the OCTOSPI interface - STMicroelectronics Community

For the cache recommendation, please look at AN4839 section 4 Mistakes to avoid and tips.

 

Thank you.

Kaouthar


Pavel A.
Super User

@regjoe  STM32H753 has FMC similar to F4, so your board designer could just use the same PNOR and relieve you from the hassle.

If you can change the STM32H753 to another MCU that has OSPI, you can put the ext. flash into memory-mapped mode. OSPI can both read and write in memory-mapped mode.

We need at least 256 Mbyte of flash, so we use two 1 Gbit QSPI flash devices in dual QSPI mode.

I did some tests on the STM32H753 Eval2 with QSPI and PNOR flash.

I get approx. 86 Mbyte/s with DMA reads in indirect mode at a 100 MHz clock, probably limited by internal bus saturation.
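
To put that figure in context, here is a rough sketch of the theoretical peak data rate. It assumes single data rate (one transfer per clock edge pair) and ignores command, address and dummy cycles as well as chip-select gaps; the function name is hypothetical. In dual-flash quad mode there are 8 data lines in total.

```c
#include <assert.h>
#include <stdint.h>

/* Peak payload rate in bytes per second for an SDR serial-flash bus.
 * Assumptions: one bit per data line per clock, no protocol overhead. */
static uint64_t qspi_peak_bytes_per_s(uint64_t clock_hz, uint32_t data_lines)
{
    assert(data_lines > 0u);
    return clock_hz * (uint64_t)data_lines / 8u;
}
```

With a 100 MHz clock and 8 data lines this gives 100 Mbyte/s, so 86 Mbyte/s would be about 86 % of the peak under these assumptions.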

In memory-mapped mode (QSPI_MemoryMappedDual demo) I get 25 Mbyte/s when reading byte-wise.

When reading every 2nd, 4th, 8th byte and so on, the read performance drops somewhat, down to 10 Mbyte/s.

When reading every 16th, 32nd byte and so on, the performance is 1.3 Mbyte/s in all cases.
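
A strided read test of this kind could look roughly like the sketch below. It is not the code used for the measurements above; the function names are hypothetical, and the timing source (for example a cycle counter) is left to the caller. The throughput helper counts only the bytes actually read, which is an assumption about how the figures were computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read every stride-th byte of a region. The XOR checksum is returned so
 * that the compiler cannot drop the reads; reads_done receives the number
 * of bytes actually read. */
static uint8_t read_with_stride(const volatile uint8_t *base, size_t len,
                                size_t stride, size_t *reads_done)
{
    assert(base != NULL && stride > 0u && reads_done != NULL);
    uint8_t acc = 0u;
    size_t count = 0u;
    for (size_t i = 0u; i < len; i += stride) {
        acc ^= base[i];
        count++;
    }
    *reads_done = count;
    return acc;
}

/* Bytes per microsecond equals Mbyte/s (with 1 Mbyte = 10^6 bytes). */
static double mbyte_per_s(size_t bytes, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return (double)bytes / elapsed_us;
}
```

On the target, `base` would point into the memory-mapped QSPI region, and the call would be wrapped between two timer reads for each stride of interest.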

I checked with a scope and found that the H7 releases NCS in this case. I tried FIFO sizes of 1, 8 and 16 without any change in speed or behaviour.

The H7RS would be an option, but there are rumours that programming in memory-mapped mode is unreliable, at least on H73x.

On our board the MT25QL1G flash has the fastest erase and programming speed of all the flash devices I tested. The two QSPI flash devices are erased and programmed in parallel in order to speed up programming.

 

 

Pavel A.
Super User

256 MBytes of XML configuration data? Can you find a "streaming" parser that can work with sequential indirect read?

 

The application contains a self-made parser which is a mess. It's not my code and I don't want to touch it, except for changing the hard-coded flash memory addresses to offsets which are then added to a base address for physical memory access.

I am now thinking about reading the configuration data in chunks from flash into SDRAM. I guess the parser will work much faster if the XML is located in RAM. Log data can then still be written and sectors erased in the same flash, because the QSPI always stays in indirect mode.
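
The chunked copy could be sketched as follows. This is only an illustration: `flash_read_fn`, `flash_copy_to_ram` and the `sim_flash` stand-in are hypothetical names. On the target, the callback would wrap an indirect-mode (DMA) QSPI read and `dst` would point into SDRAM; the simulated flash here exists only so the sketch can run on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: read len bytes starting at flash offset `offset`
 * into dst. Returns 0 on success, non-zero on error. */
typedef int (*flash_read_fn)(uint32_t offset, uint8_t *dst, size_t len);

/* Copy `total` bytes from flash offset `offset` to dst in chunks of at most
 * `chunk` bytes. Returns 0 on success or the first error code. */
static int flash_copy_to_ram(flash_read_fn read, uint32_t offset,
                             uint8_t *dst, size_t total, size_t chunk)
{
    assert(read != NULL && dst != NULL && chunk > 0u);
    size_t done = 0u;
    while (done < total) {
        size_t n = total - done;
        if (n > chunk) {
            n = chunk;               /* last chunk may be shorter */
        }
        int rc = read(offset + (uint32_t)done, dst + done, n);
        if (rc != 0) {
            return rc;
        }
        done += n;
    }
    return 0;
}

/* Host-side stand-in for the QSPI device, for trying the sketch on a PC. */
static uint8_t sim_flash[64];
static unsigned sim_read_calls;

static int sim_flash_read(uint32_t offset, uint8_t *dst, size_t len)
{
    if ((size_t)offset + len > sizeof sim_flash) {
        return -1;                   /* out-of-range access */
    }
    memcpy(dst, sim_flash + offset, len);
    sim_read_calls++;
    return 0;
}
```

Because the copy takes flash offsets rather than absolute addresses, it fits the offset-plus-base scheme mentioned above: the parser would then be given the RAM buffer as its base address.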