QUAD-SPI too busy!

Albi G. · ‎2020-06-10

Hi guys,

I am using the QuadSPI peripheral to interface with an FPGA. I need raw throughput with alternating write and read cycles of some number of bytes. I am just doing a proof of concept right now.

Since QUADSPI->CCR must not be modified while BUSY==1, the busy flag effectively limits the command rate. That is useful in principle, since I obviously need to wait for a command to finish before sending the next one.

My configuration is:

4 bit wide Instruction
Skip directly to 4bit wide Data-transfer
Min CS-High time = 0 (1 cycle)
IndirectRead / IndirectWrite mode, no DMA for now.
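
For reference, a minimal sketch of how the configuration above could be packed into a QUADSPI_CCR value. The macro and function names are made up for illustration, and the bit positions are the ones I believe the reference manual gives for QUADSPI_CCR (INSTRUCTION [7:0], IMODE [9:8], ADMODE [11:10], DMODE [25:24], FMODE [27:26]); please verify them against the RM for your device.

```c
#include <stdint.h>

/* Assumed QUADSPI_CCR field positions; check your reference manual. */
#define CCR_INSTRUCTION_POS  0u
#define CCR_IMODE_POS        8u
#define CCR_ADMODE_POS      10u
#define CCR_DMODE_POS       24u
#define CCR_FMODE_POS       26u

#define MODE_NONE            0u  /* phase skipped */
#define MODE_4LINES          3u  /* phase sent on four lines */
#define FMODE_INDIRECT_WRITE 0u
#define FMODE_INDIRECT_READ  1u

/* Hypothetical helper: quad instruction, no address, quad data. */
static inline uint32_t qspi_ccr_quad_instr_data(uint8_t instruction, uint32_t fmode)
{
    return ((uint32_t)instruction << CCR_INSTRUCTION_POS)
         | (MODE_4LINES << CCR_IMODE_POS)
         | (MODE_NONE   << CCR_ADMODE_POS)
         | (MODE_4LINES << CCR_DMODE_POS)
         | ((fmode & 3u) << CCR_FMODE_POS);
}
```

Alternating write and read then only differs in the FMODE argument, e.g. qspi_ccr_quad_instr_data(0x01, FMODE_INDIRECT_WRITE) followed by qspi_ccr_quad_instr_data(0x01, FMODE_INDIRECT_READ); the instruction value 0x01 is just a placeholder.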

Now, unfortunately, the BUSY flag is rather slow: it takes 6-7 clocks after CS rises before BUSY clears. That really hurts the achievable command rate!

By 6-7 clocks I mean peripheral clocks after the prescaler. (At a 1 MHz SPI clock, BUSY stays 1 for 6 µs after CS rises. At 10 MHz, it's 0.6 µs.)
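
To put numbers on this, here is a small host-side C sketch that converts the idle clocks into time and into a ceiling on the command rate. The function names are made up for illustration, and the number of active clocks per command is an input you have to supply yourself; nothing here touches hardware.

```c
#include <stdint.h>

/* Idle time in nanoseconds for `idle_clocks` SPI clocks at `sclk_hz` (must be non-zero). */
static inline uint64_t qspi_idle_ns(uint32_t idle_clocks, uint32_t sclk_hz)
{
    return ((uint64_t)idle_clocks * 1000000000ULL) / sclk_hz;
}

/* Upper bound on commands per second when each command occupies
   `active_clocks` and is followed by `idle_clocks` of dead time.
   The sum of both clock counts must be non-zero. */
static inline uint32_t qspi_max_cmd_rate(uint32_t active_clocks,
                                         uint32_t idle_clocks,
                                         uint32_t sclk_hz)
{
    return sclk_hz / (active_clocks + idle_clocks);
}
```

For example, qspi_idle_ns(6, 1000000) gives 6000 ns and qspi_idle_ns(6, 10000000) gives 600 ns, matching the figures above. A short command of, say, 4 active clocks followed by 6 idle clocks spends more time waiting than transferring.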

What I don't understand is that this behavior makes the minimum CS-high setting kind of useless. (I never get a case where CS is high for less than 6 cycles between commands.)

This holds true for consecutive reads, consecutive writes, and alternating read-write.

What could be going wrong?

This is my example code. Not using the library, but my code hopefully reads kind of intuitively...

Measured through my debug GPIO: 16.7 µs from command start to BUSY=0

Chip select Low for 11µs

Give me a faster BUSY, please :(

BTW: I also get the same behavior if I configure the QSPI differently:

skip Instruction
Send one byte address
skip directly to Data

== Same timing(-problems)

thanks

Andreas Bolsch · ‎2020-06-11

There is nothing wrong, but only a misconception regarding the design goal of the QSPI. From the RM: "The QUADSPI is a specialized communication interface targeting single, dual or quad SPI Flash memories. ...". One might add: "For anything else, your mileage may vary." The whole chapter in the RM deals only with attaching flash memory, that is clear enough.

For the intended use, it's the performance in memory mapped mode which really matters. Indirect read/write are intended for programming the flash (this includes status read, write enabling, ...). This is inherently quite slow, hence e.g. command startup time (or in your case return to idle state) doesn't matter that much when a whole page is sent to the flash in a single operation and then the chip has to be polled until programming has finished.

CSHT is certainly relevant only in memory mapped mode.

In your case, the FMC or PSSI (on H7A/B) might be a feasible option, except that if you intend to use only 4-bit parallel data, the necessary packing/unpacking might be an obstacle.

Albi G. · ‎2020-06-11

So this is more of a Quad-cripple than a Quad-SPI?

The memory-mapped mode is read-only, which is totally useless for me :( One can't use SRAMs or FRAMs in QSPI memory-mapped mode.

There shouldn't be a conceptual reason for this behavior besides the intended purpose. This really seems like just a faulty/lazy implementation of the hardware state machine. I mean, if BUSY=1, something must actually be busy. What is it?

Unfortunately, my package does not have the FMC, and I can't really switch chips due to price and board space.

The Quad-SPI should really be just a quad SPI, nothing more. This is frustrating. I appreciate the command configuration on top of that, but not when it is so crippling.

My intended purpose was to use the QSPI as a high-throughput device.

I have all 5 ADCs running at full speed. I use DMA to put the results next to each other in memory and after that, another DMA should be triggered to write those values to QSPI which is attached to an FPGA/coprocessor. After the QSPI-write-DMA finishes, the results must be read back via DMA.

All my plans ruined.

!!UNDOCUMENTED!!

Andreas Bolsch · ‎2020-06-11

Sorry for the bad news ...

You might try not to wait for BUSY to be reset but instead for TCF becoming set and then set ABORT right away. This might save a few cycles, but I doubt it will be fast enough for your purpose.
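
A minimal sketch of that idea, written against a stand-in register struct so it compiles on a host. The struct, macro and function names are made up for illustration; on target you would use the QUADSPI instance and bit definitions from the CMSIS device header. The bit positions (TCF = SR bit 1, CTCF = FCR bit 1, ABORT = CR bit 1) are what I believe the RM specifies; please double-check.

```c
#include <stdint.h>

/* Stand-in for the first QUADSPI registers (host-side illustration only). */
typedef struct {
    volatile uint32_t CR;   /* control register */
    volatile uint32_t DCR;  /* device configuration register */
    volatile uint32_t SR;   /* status register */
    volatile uint32_t FCR;  /* flag clear register */
} qspi_regs_t;

#define QSPI_CR_ABORT  (1u << 1)
#define QSPI_SR_TCF    (1u << 1)
#define QSPI_FCR_CTCF  (1u << 1)

/* Wait for transfer complete instead of BUSY == 0, then request an abort. */
static inline void qspi_finish_on_tcf(qspi_regs_t *q)
{
    while ((q->SR & QSPI_SR_TCF) == 0u) {
        /* spin until the transfer-complete flag is set */
    }
    q->FCR = QSPI_FCR_CTCF;   /* clear TCF */
    q->CR |= QSPI_CR_ABORT;   /* ask the peripheral to return to idle */
}
```

Whether this actually shortens the dead time has to be measured on the hardware.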

Albi G. · ‎2020-06-11

Tried that, TCF and BUSY are one and the same for this purpose. Same useless 6 cycle delay.

berendi · ‎2020-06-11

Please post the contents of the QSPI registers before and after issuing the command.

Albi G. · ‎2020-06-11

Easy enough.

This is the initialization code and the loop that just fills the FIFO. This results in 1 Instruction + 1 Data Byte (4 clks) + 6 useless waiting clocks.

The QUADSPI registers contain the values below, read with the debugger at the bkpt instruction:

QUADSPI->CR = 0xa6000011;
QUADSPI->DCR = 0x1f0000;
//QUADSPI->SR = 4;
//QUADSPI->FCR = 0;
QUADSPI->CCR = 0x3000301;
QUADSPI->AR = 0;
QUADSPI->ABR = 0;
//QUADSPI->DR = 0;
QUADSPI->PSMKR = 0;
QUADSPI->PSMAR = 0;
QUADSPI->PIR = 0;
QUADSPI->LPTR = 0;
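
For readers who do not want to decode the hex by hand, here is a small host-side sketch that splits CR, DCR and CCR into the fields relevant to this discussion. The struct and function names are made up, and the field positions are the ones I believe the reference manual gives; treat them as assumptions and verify against the RM for your device.

```c
#include <stdint.h>

typedef struct {
    uint32_t prescaler;   /* CR[31:24]: SPI clock = kernel clock / (prescaler + 1) */
    uint32_t sshift;      /* CR[4]: sample shift */
    uint32_t enabled;     /* CR[0]: EN */
    uint32_t fsize;       /* DCR[20:16]: flash size field */
    uint32_t csht;        /* DCR[10:8]: CS high for at least csht + 1 cycles */
    uint32_t instruction; /* CCR[7:0] */
    uint32_t imode;       /* CCR[9:8]: 0 = none, 3 = four lines */
    uint32_t admode;      /* CCR[11:10]: 0 = no address phase */
    uint32_t dmode;       /* CCR[25:24]: 0 = none, 3 = four lines */
    uint32_t fmode;       /* CCR[27:26]: 0 = indirect write, 1 = indirect read */
} qspi_cfg_t;

/* Hypothetical helper: extract the fields from raw register values. */
static inline qspi_cfg_t qspi_decode(uint32_t cr, uint32_t dcr, uint32_t ccr)
{
    qspi_cfg_t c;
    c.prescaler   = (cr  >> 24) & 0xFFu;
    c.sshift      = (cr  >>  4) & 0x1u;
    c.enabled     =  cr         & 0x1u;
    c.fsize       = (dcr >> 16) & 0x1Fu;
    c.csht        = (dcr >>  8) & 0x7u;
    c.instruction =  ccr        & 0xFFu;
    c.imode       = (ccr >>  8) & 0x3u;
    c.admode      = (ccr >> 10) & 0x3u;
    c.dmode       = (ccr >> 24) & 0x3u;
    c.fmode       = (ccr >> 26) & 0x3u;
    return c;
}
```

Under those assumptions, qspi_decode(0xa6000011, 0x1f0000, 0x3000301) yields a prescaler of 166 (kernel clock divided by 167), CSHT = 0, a quad instruction phase, no address phase, a quad data phase, and indirect write mode.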

I am not sure what "after the command" means in this context. There is no "after the command" here: the restart is "instantaneous", since data is available and that is what triggers a transfer as per the datasheet... after 6 idle QSPI clocks.

Arnon · ‎2020-06-11

Hi @Albi G.

I don't have answers to your question

but I share your frustration

I am trying to do the same: use the QSPI for FPGA communication, using indirect mode, data phase only.

My problem with the busy bit is that sometimes it stays "set" forever (while the interface is not active).

--Arnon

Andreas Bolsch · ‎2020-06-11

If you refer to indirect read mode: That's a known bug on the H753 etc., see errata sheet, 2.5.3. As the QSPI interface seems to be almost identical across the various devices, I wouldn't be surprised if this is present on other devices as well.

Arnon · ‎2020-06-11

thank you @Andreas Bolsch

Yes, I do refer to indirect read mode.

The errata states "slave mode", while I have configured the CPU to be the master.

I am using HAL_QSPI_Receive_DMA(&hqspi, buffer) to trigger the read.

--Arnon