STM32F7, QUADSPI and the data cache - odd behaviour

andy · 2018-06-12

Posted on June 12, 2018 at 12:41

Hi all,

I have a board using the STM32F765, which uses the QSPI interface to fetch data from an FPGA. The normal sequence of operation is:

1) A timer in the FPGA causes it to perform a sequence of events which produce a chunk of data, which is stored in internal FPGA memory.
2) The FPGA asserts an output indicating to the STM32 that data is available.
3) This assertion causes an EXTI interrupt, which kicks off a chain of events which use the QSPI interface to read data from the FPGA. (The FPGA is programmed to emulate a serial Flash memory.)
4) The QSPI interface is set to indirect read mode, and DMA2 is used to read data from the QSPI FIFO and write it into standard (not DTCM) RAM.
5) Once each DMA transfer is complete, the ISR cleans and invalidates the relevant portions of the data cache (by address).
6) There are usually several blocks of data to read at a time. Once they've all been read, the FPGA is reset (via a separate SPI interface), and a flag is set to indicate to the main application that a block of data is available to process.
7) The main application processes the data, then sits and waits for the next block, and so on.

The problem I'm seeing is that, just occasionally, the QSPI chip select goes active after all the data has been read from the FPGA. The next time the QSPI interface is used, its status register indicates that it's busy and has a full FIFO, as if a read operation has been started, but no DMA has been set up to actually put the results somewhere.

I've spent the last day or so using a scope and some GPIO signals to determine what is happening when, and here's where it gets really interesting. I now know that the spurious QSPI activation is not caused by the code which intentionally initiates QSPI reads. Instead, three conditions must be met in order to cause it:

1) The main code must be actually accessing the data from the FPGA, which is in cacheable RAM (SRAM1);
2) The data cache must be enabled;
3) The SysTick interrupt handler must have just exited.

The SysTick handler is very simple; usually it just sets a few flags and increments some counters, and occasionally it generates some debug output (though this has no effect on whether the spurious QSPI event is triggered). Nevertheless, QSPI CS goes low within a few nanoseconds of the handler exiting.

If I turn the data cache off, then all is well and there are no spurious QSPI events.

If I leave it on while data is being fetched from the FPGA, but turn it off while processing, then that's OK too.

Calling SCB_CleanInvalidateDCache() at the end of the SysTick handler makes no difference.

Putting the data from the FPGA into DTCM RAM does fix the symptoms, but I don't know why.

My ISR normally leaves the QSPI interface enabled. If I turn QSPI off by writing 0 to the control register once it's finished with, then this does prevent spurious QSPI transactions from occurring - but since I don't know why they're occurring in the first place, I can't be sure there isn't something else bad also happening for the same reason.

I never get a spurious QSPI event when the main loop is sitting waiting for new data to arrive; only while it's actually working on that data. This is actually quite a short window of time; if I turn off hardware floating point support, which makes the processing take longer, then spurious QSPI events can occur within a wider time window after each block is received.

So, I have a few workarounds for the spurious QSPI events: turn off the data cache (at least while the data is being processed), move the data into DTCM, or turn off the QSPI interface when it's not being used. None of these really explain the problem, though they do make the symptom go away.

My best guess is that exiting the SysTick handler while the cache contains data from SRAM1 is causing a number of cache operations to occur, and one of these is writing to QUADSPI->AR, or triggering the QSPI interface in some other way.

It *almost* feels like some obscure erratum, i.e. 'QSPI interface can be triggered by data cache operations on return from interrupts', but I'd rather fix my code than blame it on something that's 'clearly' a hardware bug that nobody else seems to have noticed!

Any suggestions please, experts?

David Littell · 2018-06-12

Posted on June 12, 2018 at 15:17

I'm sure this isn't related to your actual problem but note that calling SCB_CleanInvalidateDCache() can be dangerous to your inbound data if a cache line overlaps a portion of your data buffer. In the case of a dirty line it'll push the whole line to memory, potentially writing stale data from the overlapping portion of the cache line to the recently DMA'ed data in memory. That's why it's recommended to size and align inbound DMA buffers to cache-line boundaries. You then only need to invalidate the cache for inbound data, never flush. Flushing is only needed for outbound (via DMA) data.
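To make the sizing and alignment advice concrete, here is a minimal sketch, assuming a 32-byte D-cache line (as on the Cortex-M7) and GCC/Clang attribute syntax. The names `rx_buf`, `RX_PAYLOAD_BYTES` and `rx_buf_dma_complete` are made up for illustration and are not from the original code. The CMSIS call is guarded so that it is only compiled when the device header has been included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 32u  /* assumed D-cache line size in bytes (Cortex-M7) */

/* Round a byte count up to a whole number of cache lines. */
#define DCACHE_ROUND_UP(n) (((n) + DCACHE_LINE - 1u) & ~(DCACHE_LINE - 1u))

#define RX_PAYLOAD_BYTES 100u  /* hypothetical block size */

/* Hypothetical receive buffer: it starts on a line boundary and occupies
 * whole lines, so no unrelated variable can share a cache line with it. */
static uint8_t rx_buf[DCACHE_ROUND_UP(RX_PAYLOAD_BYTES)]
    __attribute__((aligned(DCACHE_LINE)));

/* Returns 1 if [addr, addr + len) starts and ends on cache-line boundaries. */
static int dcache_range_is_line_aligned(uintptr_t addr, size_t len)
{
    return (addr % DCACHE_LINE == 0u) && (len % DCACHE_LINE == 0u);
}

/* Call after an inbound DMA transfer completes, before the CPU reads rx_buf. */
static void rx_buf_dma_complete(void)
{
    /* Invalidate-only is safe because the range covers whole lines. */
    assert(dcache_range_is_line_aligned((uintptr_t)rx_buf, sizeof rx_buf));
#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U)
    /* CMSIS-Core call; compiled only when the device header is present. */
    SCB_InvalidateDCache_by_Addr((uint32_t *)rx_buf, (int32_t)sizeof rx_buf);
#endif
}
```

With a 100-byte payload the buffer is padded to 128 bytes, so the invalidate never touches a line that also holds other data.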

andy · 2018-06-12

Posted on June 12, 2018 at 16:19

Thanks for the tip; I've changed the structure and location of the data received via QSPI in order to align it to 32 byte boundaries, and now call SCB_InvalidateDCache_by_Addr() after each transfer, which should be safe (right?). Sadly it hasn't fixed the problem, though I've no particular reason to suspect that it would in this case.

[Update]: Some further amusing observations:

Compiling with optimization level 1 (instead of 0) makes the problem occur much more frequently, which is useful for debugging. It also makes the problem occur independently of the SysTick interrupt, so whatever the root cause is, it's probably nothing to do with the contents of the ISR - just the fact that *something* got executed.
Putting __ISB() / __DSB() anywhere in or around the main loop seems to make the problem go away. I suspect this is to do with optimisation of functions that contain assembler, rather than the normal effect of these instructions.
In fact, various ridiculously trivial code changes in all sorts of places make the problem go away.

Right now I've spent a couple of days going round in circles, and I'm at the point of just turning off the data cache and taking the performance hit. I just wish I could find what it is that initiates an unwanted QSPI transaction; then at least I'd have something to work back from.

Christensen.Tyler · 2018-07-25

Were you able to figure this out? I'm having similar issues where an unrelated interrupt causes the QuadSPI peripheral to activate and clock an extra byte into the FIFO, which messes up the next reception. I've spent days trying to figure out precisely what does it, and it's just not clear why this is happening. In my case it's very hard to debug because if I change just about anything, even adding a NOP in a lot of places, the problem goes away, so I really need to identify the root cause directly.

I have many work-arounds, but they are all random in nature and don't actually explain what the problem is, which is a bit scary on this firmware project.

Jaroslav BECKA · 2018-07-26

Hi,

I am convinced that this can be caused by speculative read accesses to Normal memory regions performed by the Cortex-M7 core.

The QSPI memory area is located within the "External RAM" region (0x6000 0000 - 0x9FFF FFFF) and is Normal memory by default. This means the CPU can reorder memory accesses and perform speculative reads to this area.

If you use MPU to configure the QSPI memory region to either Device or Strongly-ordered, the problem should disappear.

You can find more about this in AN4861: LCD-TFT display controller (LTDC) on STM32 MCUs, in AN4838: Managing memory protection unit (MPU) in STM32 MCUs, or in the Cortex-M7 programming manual (PM0253).
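As a rough illustration of such an MPU setup, the sketch below builds the MPU_RASR value for a 256 MB region at 0x90000000 (the QSPI memory-mapped window on the STM32F7) as either Device or Strongly-ordered. The helper names, the choice of full-access permissions, the XN bit and the region number are assumptions for the example, not a recommended configuration. The register-writing part uses CMSIS-Core names and is only compiled when the device header is included.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions */
#define RASR_ENABLE   (1u << 0)
#define RASR_SIZE(n)  ((uint32_t)(n) << 1)   /* region size = 2^(n+1) bytes */
#define RASR_B        (1u << 16)             /* bufferable */
#define RASR_AP_FULL  (3u << 24)             /* read/write, all privilege levels */
#define RASR_XN       (1u << 28)             /* no instruction fetches */

#define QSPI_MAP_BASE       0x90000000u      /* QSPI memory-mapped window */
#define QSPI_MAP_SIZE_FIELD 27u              /* 2^28 bytes = 256 MB */

enum mem_type { MEM_STRONGLY_ORDERED, MEM_DEVICE };

/* Build RASR with TEX=0, C=0 and B=0 (Strongly-ordered) or B=1 (Device). */
static uint32_t qspi_region_rasr(enum mem_type type)
{
    uint32_t rasr = RASR_ENABLE | RASR_SIZE(QSPI_MAP_SIZE_FIELD)
                  | RASR_AP_FULL | RASR_XN;
    if (type == MEM_DEVICE) {
        rasr |= RASR_B;
    }
    assert((rasr & (1u << 17)) == 0u);  /* C bit must stay clear: not cacheable */
    return rasr;
}

#if defined(__MPU_PRESENT) && (__MPU_PRESENT == 1U)
/* Hypothetical helper: program one MPU region using CMSIS-Core registers. */
static void qspi_region_apply(uint32_t region_number, enum mem_type type)
{
    MPU->CTRL = 0u;                      /* disable the MPU while reconfiguring */
    MPU->RNR  = region_number;
    MPU->RBAR = QSPI_MAP_BASE;
    MPU->RASR = qspi_region_rasr(type);
    MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
    __DSB();
    __ISB();
}
#endif
```

Setting XN would break execute-in-place from QSPI, so it only makes sense if, as here, nothing is executed from that window.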

I hope this will help you to solve the issue.

Jaroslav

Christensen.Tyler · 2018-07-26

I'm not even using the QuadSPI peripheral in memory-mapped mode, so I don't think the 0x90000000 QuadSPI memory space matters to me. The QuadSPI configuration registers are in the 0xA0001000 space, and those are what I believe are getting tripped up. Specifically, I *think* AR is getting written to, which triggers bus activity. I can't be sure this isn't a side effect of something else, but it makes sense: I can see AR change in the debugger, and that would logically have the effect of clocking an extra byte into the FIFO. This memory region is already mapped as Device-type by default.

Regardless, I did try both Device-type and Strongly-ordered for the 0x60000000 - 0x9FFFFFFF region through MPU configuration, and it had no effect on my problem.

Jaroslav BECKA · 2018-07-27

Would you describe the issue in more detail, please?

What are the symptoms? Is the debugger connection lost?

Otherwise, if something writes to the AR register, a watchpoint should reveal that.

Christensen.Tyler · 2018-07-27

The exact symptom is that using the ST VCP USB driver, when I plug in the USB cable the first time after boot everything is fine. Unplug it and plug it in the second time, and the QuadSPI peripheral clocks in one byte out of nowhere. USB still works, everything else is fine. Debugger remains connected, the program continues to run, if the code didn't notice that extra byte arrive nothing observable would even happen. But, obviously my code is using QuadSPI and it is problematic to get a random rogue byte into the QuadSPI RX FIFO. If I add even just one __ASM("NOP"); anywhere in most of the USB VCP library, the problem goes away. Changing almost any single line of code makes the problem go away so it is an extremely unlikely and subtle timing bug of some sort. Makes it very hard to debug because anything you try "fixes" the problem even though the problem remains unknown.

I disabled my own QuadSPI memory task so that the QuadSPI peripheral is silent during operation, and then set a memory read/write breakpoint on the entire QuadSPI register space (64 bytes monitored from 0xA0001000, which extends beyond the registers). When I plug in the cable the second time, I can see the AR and Status registers change in the peripheral watch window as a result of the bug, but the breakpoint does not fire. This means code isn't directly changing the contents of that register; something within the processor is changing it as a result of some other action.

I also ran the QuadSPI CS line back to a GPIO and set that up as a GPIO interrupt so that I can essentially create a break point on the CS line going low, and it trips up at a function call (specifically when calling HAL_PCD_EP_Open although I don't think it's a direct bug in that region of code, otherwise adding a NOP anywhere would not fix the problem, it's much lower level and subtle than that).

CHead · 2018-08-09

I am seeing a similar problem but with writes to an actual SPI Flash. Occasionally the address register spontaneously changes to zero (you can actually see the value zero if you read back AR), and some of the time when this happens, address zero is sent on the bus instead of the proper address. I introduced a PRIMASK-based critical section starting before the write to AR and ending after BUSY went high, and this seems to eliminate the problem of the wrong address being sent. AR still sometimes changes to the wrong value, but apparently late enough that it doesn’t affect the signals on the bus.
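A PRIMASK-based critical section of the kind described above could be sketched as follows. This is an illustration under assumptions, not the code from the actual project: the function and helper names are hypothetical, the registers are passed in as pointers (on target, something like `&QUADSPI->AR` and `&QUADSPI->SR`), and a spin limit is added so the loop cannot hang with interrupts masked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QSPI_SR_BUSY (1u << 5)  /* BUSY flag in QUADSPI_SR */

#if defined(__ARM_ARCH_7EM__)
/* Cortex-M build: read the current PRIMASK, then mask interrupts. */
static inline uint32_t irq_save(void)
{
    uint32_t primask;
    __asm volatile ("mrs %0, primask" : "=r" (primask));
    __asm volatile ("cpsid i" ::: "memory");
    return primask;
}
static inline void irq_restore(uint32_t primask)
{
    __asm volatile ("msr primask, %0" :: "r" (primask) : "memory");
}
#else
/* Host build: there are no interrupts to mask, so these are no-ops. */
static inline uint32_t irq_save(void) { return 0u; }
static inline void irq_restore(uint32_t primask) { (void)primask; }
#endif

/* Write the address register and hold interrupts off until BUSY is seen.
 * Returns 1 if BUSY went high within max_spins polls, 0 otherwise. */
static int qspi_write_ar_locked(volatile uint32_t *ar,
                                const volatile uint32_t *sr,
                                uint32_t address, uint32_t max_spins)
{
    int busy_seen = 0;

    assert(ar != NULL && sr != NULL);

    uint32_t primask = irq_save();     /* critical section begins */
    *ar = address;
    for (uint32_t i = 0u; i < max_spins; ++i) {
        if ((*sr & QSPI_SR_BUSY) != 0u) {
            busy_seen = 1;
            break;
        }
    }
    irq_restore(primask);              /* previous mask state restored */
    return busy_seen;
}
```

Restoring the saved PRIMASK value, rather than unconditionally re-enabling interrupts, keeps the helper safe to call from code that already runs with interrupts masked.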

I am also using data in system SRAM with the D-cache enabled.

Something that I tried that didn't help was using the MPU to force either the QUADSPI control registers or the memory mapping region (which I don’t use) to be strongly ordered, on the thought that either a random cache fill might be touching the memory mapping region and causing AR to be modified (which would be theoretically legal, albeit odd, since the memory mapping region is by default Normal-type memory), or that the cacheability attributes on the control registers might not be how ARMv7-M architecturally defines them (as Device-type memory).

JMund · 2019-01-10

Did anybody figure out anything with this? We are experiencing random writes to different parts of memory that cause our application to crash after 3-5 days. It's incredibly strange and, as noted, shifting any memory layout seems to affect crash frequency.

It's extremely frustrating, as it's next to impossible to debug; however, it is predictable/random in nature.

I.e. three identical units plugged in at different times but soft reset at the exact same time will all crash at e.g. 3.12500 days and then crash at 1.5 days... and then one will go to 3 days but the rest will reset at 1.5 days.