SCB_InvalidateDCache_by_Addr not operating correctly

FrankNatoli · ‎2022-05-05

Using twin STM32H7B3I EVAL boards to develop master and slave firmware.

Found that the slave board was experiencing overruns on SPI slave receive.

Revised the SPI slave implementation to use DMA.

Found bizarre data corruption after some number of good packets had been moved from the EVAL master to the EVAL slave.

SCB_InvalidateDCache_by_Addr was called after each DMA receive completion to invalidate the data cache for the receive buffer.

However, once the data cache was disabled completely, the corruption of received DMA packets ceased.

Are there any timing or special considerations for the use of SCB_InvalidateDCache_by_Addr?

Pavel A. · ‎2022-05-05

> SCB_InvalidateDCache_by_Addr was called after each DMA receive completion

Try to call it [also] before starting DMA.

Try using the MPU to make the RX buffers non-cacheable.

Tesla DeLorean · ‎2022-05-05

Watch for issues related to 32-byte alignment: the structure in question must be aligned, the maintained range must cover it completely, and the operation must not cause collateral damage to abutting/surrounding structures that share its cache lines.
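A minimal host-side sketch of that alignment point, assuming the 32-byte line size of the Cortex-M7 D-cache: a by-address clean or invalidate works on whole lines, so it touches every line that overlaps the buffer. The helpers `dcache_span` and `span_spills_over` are hypothetical names for illustration, not CMSIS functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u  /* Cortex-M7 D-cache line size in bytes */

/* Whole cache lines covering [addr, addr + size): first byte and length. */
typedef struct {
    uintptr_t start;
    uintptr_t length;
} dcache_span_t;

static dcache_span_t dcache_span(uintptr_t addr, uintptr_t size)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    dcache_span_t span;
    span.start = addr & ~mask;                        /* round start down */
    uintptr_t end = (addr + size + mask) & ~mask;     /* round end up */
    span.length = end - span.start;
    return span;
}

/* True if the covering lines also hold bytes outside the buffer, so an
   invalidate on the buffer could discard a neighbor's unsaved writes. */
static bool span_spills_over(uintptr_t addr, uintptr_t size)
{
    dcache_span_t span = dcache_span(addr, size);
    return span.start != addr || span.length != size;
}
```

When both the address and the size are multiples of 32, the span is exactly the buffer. Otherwise the first or last line also holds neighboring data, and invalidating it can silently discard that neighbor's pending writes.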

Piranha · ‎2022-05-05

A recent example of incorrect cache management:

https://community.st.com/s/question/0D53W00001UZLndSAH/stm32h7-cache-and-dma-transfer-error-rate

A guide to proper cache management:

https://community.st.com/s/question/0D53W00000oXSzySAG/different-cache-behavior-between-stm32h7-and-stm32f7

A topic about cache eviction:

https://community.st.com/s/question/0D50X0000C9hGozSQE/weird-cache-writeback-behavior-for-stm32f7508

Piranha · ‎2022-05-05

> Try to call it [also] before starting DMA.

It's not a matter of experimenting with one option or another. It simply must be called before starting reception into a particular buffer. Also, invalidating a second time after the reception is useless. It can make a difference only if the CPU accesses the data buffer while DMA is processing it, but then the code is broken anyway.

FrankNatoli · ‎2022-05-06

I will revise and advise of results.

However, apparently I do not understand what it means to "invalidate" a cache.

My understanding, apparently incorrect, was:

1. The CPU has a cache of memory bytes
2. DMA writes to those memory bytes, unbeknownst to the CPU
3. Assume no call to SCB_InvalidateDCache_by_Addr
4. The program accesses those memory bytes but gets what is in the cache, not what DMA wrote

I thought that calling SCB_InvalidateDCache_by_Addr at step #3 would avoid the problem at step #4.

If I move the call to SCB_InvalidateDCache_by_Addr to between steps #1 and #2, there is still the possibility of the CPU placing data in the data cache before reaching step #4, hence the same problem.

However, I will revise as suggested and advise of results.

Thanks for your time.

FrankNatoli · ‎2022-05-06

Well, here are the results for three software configurations:

1. Calling SCB_InvalidateDCache_by_Addr immediately before calling HAL_SPI_Receive_DMA: abject failure, resulting in a DMA timeout on the second packet moved from SPI master to SPI slave.
2. Calling SCB_InvalidateDCache_by_Addr after DMA completes the transfer: many successful DMA transfers, then corrupted receive data and failure.
3. Never enabling the D-cache, never calling SCB_InvalidateDCache_by_Addr: total success, thousands of DMA transfers without problem.

The SPI master sends a 32-byte command packet to the SPI slave, with the final byte being a checksum.

The SPI master may then send a 288-byte (256 + 32) data packet to the SPI slave, with the final byte being a checksum and the leading 256 bytes being data to write to external flash in the SPI slave system.
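The thread does not say which checksum the final byte carries. As a hypothetical sketch only, assuming it is the 8-bit (modulo 256) sum of all preceding bytes, the slave could check a received 32-byte command or 288-byte data packet like this; `packet_checksum` and `packet_checksum_ok` are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical convention: last byte == (sum of all preceding bytes) mod 256. */
static uint8_t packet_checksum(const uint8_t *pkt, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        sum = (uint8_t)(sum + pkt[i]);   /* wraps modulo 256 */
    }
    return sum;
}

/* True if the final byte matches the assumed checksum of the bytes before it. */
static bool packet_checksum_ok(const uint8_t *pkt, size_t len)
{
    return len >= 1 && pkt[len - 1] == packet_checksum(pkt, len);
}
```

Under this assumed convention an all-zero packet also passes the check, which is worth keeping in mind when receive buffers are zeroed before DMA starts.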

Originally, the SPI slave ran without DMA, using interrupts, but experienced overruns.

That is why I switched the SPI slave to DMA, only to be confronted with this annoying data corruption. It clearly has something to do with data cache operations, but I cannot resolve it other than by disabling the data cache entirely.

The SPI slave calls osEventFlagsSet when HAL_SPI_RxCpltCallback occurs.

The thread waiting in osEventFlagsWait then proceeds; in scenario #2 above, it then calls SCB_InvalidateDCache_by_Addr, and only then looks at the data written by DMA.

Pavel A. · ‎2022-05-06

@FrankNatoli

> My understanding, apparently incorrect, was:

Generally what you wrote looks correct, but consider what happens if the CPU wrote into the buffer area before the DMA transfer starts (or while DMA is ongoing).

Because of eviction, those writes can make it into RAM after DMA has stored its data, and a subsequent invalidate won't help. But as Piranha wrote, "then the code is broken anyway".

Some people on this forum have come to the conclusion that the most robust and efficient way to deal with this is to make the buffer area non-cacheable via the MPU (to avoid disabling the D-cache entirely), and that the performance gain from caching the buffers is not worth it.

Of course, as Tesla D. noted, make sure that the buffer area is properly aligned to the cache-line size (32 bytes).
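For reference, a non-cacheable region on the Cortex-M7 is Normal memory with TEX=001, C=0, B=0. With the STM32 HAL this corresponds to an `MPU_Region_InitTypeDef` with `TypeExtField = MPU_TEX_LEVEL1`, `IsCacheable = MPU_ACCESS_NOT_CACHEABLE` and `IsBufferable = MPU_ACCESS_NOT_BUFFERABLE`, passed to `HAL_MPU_ConfigRegion`. The host-side sketch below uses no HAL and hypothetical helper names; it only computes the ARMv7-M `MPU_RASR` value for such a region, which makes the two MPU constraints explicit: the size must be a power of two of at least 32 bytes, and the base address must be aligned to that size.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions (Cortex-M7). */
#define RASR_XN_POS      28u  /* execute never */
#define RASR_AP_POS      24u  /* access permission, 3 bits */
#define RASR_TEX_POS     19u  /* type extension, 3 bits */
#define RASR_S_POS       18u  /* shareable */
#define RASR_C_POS       17u  /* cacheable */
#define RASR_B_POS       16u  /* bufferable */
#define RASR_SIZE_POS     1u  /* region size = 2^(SIZE + 1) bytes */
#define RASR_ENABLE_POS   0u

/* SIZE field for a power-of-two region of at least 32 bytes, else -1. */
static int mpu_size_field(uint32_t size_bytes)
{
    if (size_bytes < 32u || (size_bytes & (size_bytes - 1u)) != 0u) {
        return -1;
    }
    int bits = 0;
    while (size_bytes > 1u) {   /* bits = log2(size_bytes) */
        size_bytes >>= 1;
        bits++;
    }
    return bits - 1;            /* e.g. 32 KB = 2^15 -> SIZE = 14 */
}

/* RASR for an enabled, full-access, execute-never region of Normal,
   non-cacheable, non-shareable memory (TEX=001, C=0, B=0).
   Returns false if the size is invalid or the base is not size-aligned. */
static bool mpu_rasr_noncacheable(uint32_t base, uint32_t size_bytes, uint32_t *rasr)
{
    int size_field = mpu_size_field(size_bytes);
    if (size_field < 0 || (base & (size_bytes - 1u)) != 0u) {
        return false;
    }
    *rasr = (1u << RASR_XN_POS)
          | (3u << RASR_AP_POS)        /* AP=011: full access */
          | (1u << RASR_TEX_POS)       /* TEX=001 */
          | (0u << RASR_S_POS)
          | (0u << RASR_C_POS)         /* C=0: not cacheable */
          | (0u << RASR_B_POS)         /* B=0: not bufferable */
          | ((uint32_t)size_field << RASR_SIZE_POS)
          | (1u << RASR_ENABLE_POS);
    return true;
}
```

On the target, the base and size would be those of the memory area holding the DMA buffers (for example, whatever region the `.dma_buffer` section is linked into); that placement is an assumption, since the thread does not show the linker setup.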

Pavel A. · ‎2022-05-06

Apologies for bringing up this topic again... speculative reads.

Could a speculative read be issued to the cached RX buffer during DMA and pull incomplete data into the D-cache?

FrankNatoli · ‎2022-05-07

Only one thread manipulates the two buffers (command and flash write data), so all the accesses are localized; there are no mystery CPU reads or writes.

I do, however, zero both buffers before starting either DMA transfer.

But I was calling SCB_InvalidateDCache_by_Addr, not SCB_CleanDCache_by_Addr, either before starting or after finishing DMA; see the two scenarios above.

Zeroing the buffers almost certainly leaves dirty cache lines. Perhaps those are mishandled because I invalidate rather than clean?
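Frank's question can be explored with a toy model. The sketch below is a host-side simulation, not CMSIS code, and every name in it is hypothetical: a single 32-byte write-back, write-allocate cache line sits in front of a 32-byte "RAM" buffer, `apply` stands in for clean/invalidate by address, and `run_rx` replays the receive sequence from this thread (zero the buffer, let DMA fill RAM, read the result), optionally injecting an eviction or an early refill of the line while DMA owns the buffer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32

/* Toy model: one write-back, write-allocate cache line in front of a
   32-byte RAM buffer. Not CMSIS code; all names are hypothetical. */
typedef struct {
    uint8_t ram[LINE_SIZE];   /* what the DMA controller reads and writes */
    uint8_t line[LINE_SIZE];  /* what the CPU sees while the line is valid */
    bool valid;
    bool dirty;
} model_t;

typedef enum { OP_NONE, OP_CLEAN, OP_INVALIDATE, OP_CLEAN_INVALIDATE } cache_op_t;

static void apply(model_t *m, cache_op_t op)
{
    /* Clean: write dirty data back to RAM, keep the line. */
    if ((op == OP_CLEAN || op == OP_CLEAN_INVALIDATE) && m->valid && m->dirty) {
        memcpy(m->ram, m->line, LINE_SIZE);
        m->dirty = false;
    }
    /* Invalidate: drop the line; any dirty data in it is discarded. */
    if (op == OP_INVALIDATE || op == OP_CLEAN_INVALIDATE) {
        m->valid = false;
        m->dirty = false;
    }
}

static uint8_t cpu_read(model_t *m)
{
    if (!m->valid) {                       /* miss: fill the line from RAM */
        memcpy(m->line, m->ram, LINE_SIZE);
        m->valid = true;
        m->dirty = false;
    }
    return m->line[0];
}

/* One reception into a zero-filled buffer. `before`/`after` are the cache
   operations done around the transfer; the two flags inject events that can
   happen while DMA owns the buffer. Returns what the CPU finally reads. */
static uint8_t run_rx(cache_op_t before, bool spec_fill, bool evict, cache_op_t after)
{
    model_t m = {0};                       /* RAM starts out as 0x00 */
    memset(m.line, 0x00, LINE_SIZE);       /* zeroing the buffer allocates */
    m.valid = true;                        /* a dirty line (write-allocate) */
    m.dirty = true;

    apply(&m, before);
    if (spec_fill) {
        (void)cpu_read(&m);                /* line refilled before DMA data lands */
    }
    memset(m.ram, 0xA5, LINE_SIZE);        /* DMA stores the packet, bypassing cache */
    if (evict) {
        apply(&m, OP_CLEAN_INVALIDATE);    /* eviction: write back if dirty, then drop */
    }
    apply(&m, after);
    return cpu_read(&m);                   /* 0xA5 means the packet was seen */
}
```

In this model, invalidating only after the transfer works until an eviction happens to fall between the DMA write and the invalidate, which is consistent with "many successful DMA transfers, then corrupted receive data". Invalidating (or cleaning and invalidating) before the transfer leaves no dirty line to evict, while cleaning after the transfer writes the stale zeros over the packet. The second invalidate after the transfer only changes the outcome when the line is refilled while DMA owns the buffer, which is the speculative-read case raised above.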

The buffers are most certainly 32-byte aligned, and an integral multiple of 32 bytes in size:

#define DMA_BUFFER _Pragma("location=\".dma_buffer\"")

static DMA_BUFFER ALIGN_32BYTES(uint8_t spiReadData[SPI_MESSAGE_DATA_MAX + SPI_MESSAGE_DMA_MODULUS]);

static DMA_BUFFER ALIGN_32BYTES(SPI_REQUEST_STRUCT spiRequestData);

I'll look into MPU and non-cacheable, thanks.