Maintaining CPU data cache coherence for DMA buffers

Piranha
Chief II

This topic is inspired by discussions on the ST forum and the ARM forum, where proper cache maintenance was worked out and a real-life example of a speculative read was detected. There is also another discussion, where a real-life example of cache eviction was detected.

For Tx (from memory to peripheral) transfers the maintenance is rather simple:

// Application code.
GenerateDataToTransmit(pbData, nbData);
// Prepare and start the DMA Tx transfer.
SCB_CleanDCache_by_Addr(pbData, nbData);
DMA_TxStart(pbData, nbData);

For Rx (from peripheral to memory) transfers the maintenance is a bit more complex:

#define ALIGN_BASE2_CEIL(nSize, nAlign)  ( ((nSize) + ((nAlign) - 1)) & ~((nAlign) - 1) )
 
uint8_t abBuffer[ALIGN_BASE2_CEIL(67, __SCB_DCACHE_LINE_SIZE)] __ALIGNED(__SCB_DCACHE_LINE_SIZE);
 
// Prepare and start the DMA Rx transfer.
SCB_InvalidateDCache_by_Addr(abBuffer, sizeof(abBuffer));
DMA_RxStart(abBuffer, sizeof(abBuffer));
 
// Later, when the DMA has completed the transfer.
size_t nbReceived = DMA_RxGetReceivedDataSize();
SCB_InvalidateDCache_by_Addr(abBuffer, nbReceived);
// Application code.
ProcessReceivedData(abBuffer, nbReceived);

The first cache invalidation, done before the DMA transfer, ensures that during the transfer the cache holds no dirty lines associated with the buffer, which could otherwise be written back to memory by cache eviction. The second cache invalidation, done after the DMA transfer, ensures that any cache lines which could have been filled from memory by speculative reads during the transfer are discarded. Therefore cache invalidation for Rx buffers must be done both before and after the DMA transfer, and skipping either of them will lead to Rx buffer corruption.

Doing cache invalidation on an arbitrary buffer can corrupt adjacent memory before and after that buffer. To ensure this does not happen, the buffer has to fill an exact integer number of cache lines, which means that both the buffer address and the buffer size must be aligned to the cache line size. The CMSIS-defined constant for the data cache line size is __SCB_DCACHE_LINE_SIZE, which is 32 bytes for the Cortex-M7 processor. __ALIGNED() is a CMSIS-defined macro for aligning the address of a variable, and ALIGN_BASE2_CEIL() is a custom macro, which rounds an arbitrary number up to the nearest multiple of a base-2 number. In this example 67 is rounded up to a multiple of 32, so the buffer size becomes 96 bytes.
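To spell out the rounding arithmetic, here are the expansions as compile-time checks (a minimal sketch, assuming C11 and the ALIGN_BASE2_CEIL() macro defined above):

#include <assert.h>

// (67 + 31) & ~31 = 98 & ~31 = 96: the 3 bytes beyond 64 cost a whole third line.
static_assert(ALIGN_BASE2_CEIL(67, 32) == 96, "67 rounds up to 96");
// An exact multiple is left unchanged.
static_assert(ALIGN_BASE2_CEIL(64, 32) == 64, "64 stays 64");
// Even a single byte over a multiple costs a whole extra cache line.
static_assert(ALIGN_BASE2_CEIL(65, 32) == 96, "65 rounds up to 96");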

Unfortunately, for Cortex-M processors ARM doesn't provide a clear explanation or example, but it does provide a short explanation for the Cortex-A and Cortex-R series processors.


Many ST examples for the Cortex-M7 configure DMA memory regions as non-cacheable in the MPU instead of doing cache maintenance. Take a look at the Ethernet examples - linker script, MPU configuration and variable definition. Also read AN4838 and AN4839 for more details. A minimal sketch of such an MPU configuration is shown below.
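Here is a minimal sketch of configuring one MPU region as non-cacheable normal memory with the CMSIS-Core ARMv7-M MPU helpers; the region number, base address and size are assumptions and must match the buffer placement in your linker script:

#include "stm32f7xx.h"  // device header; pulls in CMSIS-Core incl. mpu_armv7.h

void MPU_ConfigNonCacheableRegion(void)
{
  ARM_MPU_Disable();

  // Region 0: 32 KB at 0x20020000 (hypothetical DMA buffer area).
  // Normal memory, TEX=1 C=0 B=0 => non-cacheable; XN set for a data region.
  ARM_MPU_SetRegionEx(0UL,
                      ARM_MPU_RBAR(0UL, 0x20020000UL),
                      ARM_MPU_RASR(1UL,              /* DisableExec      */
                                   ARM_MPU_AP_FULL,  /* AccessPermission */
                                   1UL,              /* TypeExtField     */
                                   0UL,              /* IsShareable      */
                                   0UL,              /* IsCacheable      */
                                   0UL,              /* IsBufferable     */
                                   0UL,              /* SubRegionDisable */
                                   ARM_MPU_REGION_SIZE_32KB));

  // Enable the MPU and keep the default memory map outside the defined regions.
  ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk);
}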

Didier_G
ST Employee

Multiple cases:

1) RX buffer non-cacheable

Rationale: no real CPU processing on it
Alignment: probably the non-cacheable region is already aligned to a multiple of cache line, so no real constraint (except maybe DMA constraint)
CPU can consume the buffer without any prerequisite

2) RX buffer cacheable

Rationale: CPU processing on it
Alignment: probably the cacheable region is already aligned to a multiple of cache line, so no real constraint (except maybe DMA constraint)
CPU can consume the buffer with a prerequisite: once the DMA transfer is over, the RX buffer area needs to be invalidated.

3) TX buffer non-cacheable

Rationale: no real CPU processing on it
Alignment: probably the non-cacheable region is already aligned to a multiple of cache line, so no real constraint (except maybe DMA constraint)
CPU can submit the buffer to the HW (DMA here) without any prerequisite

4) TX buffer cacheable

Rationale: CPU processing on it
Alignment: probably the cacheable region is already aligned to a multiple of cache line, so no real constraint (except maybe DMA constraint)
CPU can submit the buffer to the HW (DMA here) with a prerequisite: the TX buffer area needs to be cleaned & invalidated (aka flushed).

On top, the cache policy (write-through, write-back) may save some actions, e.g. the "clean & invalidate before submitting to the HW" step is unnecessary for write-through, but the actions listed above are generally applicable (write-back). The prerequisites map onto CMSIS-Core calls as sketched below.
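A minimal sketch of the prerequisites above in CMSIS-Core terms (the buffer name and size are illustrative):

#include "stm32f7xx.h"  // device header providing the SCB_* cache functions

static uint8_t abBuf[96] __ALIGNED(__SCB_DCACHE_LINE_SIZE);

void CacheMaintenanceSketch(void)
{
  // Cases 1) and 3), non-cacheable buffers: no cache maintenance at all.

  // Case 2), RX cacheable: invalidate once the DMA transfer is over
  // (and, per the OP, also before starting it).
  SCB_InvalidateDCache_by_Addr(abBuf, sizeof(abBuf));

  // Case 4), TX cacheable: clean & invalidate before submitting to the HW.
  SCB_CleanInvalidateDCache_by_Addr(abBuf, sizeof(abBuf));
}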

 

2) RX buffer cacheable

This is incomplete, as explained in the OP and subsequent posts. Invalidation needs to be done before the DMA transfer to address cache line eviction, and after the DMA completes to address speculative accesses. Further, the buffer should be aligned and sized to the cache line geometry.

 

4) TX buffer cacheable

The invalidate step isn't needed, only the clean. It's also a Really Good Idea for TX buffers to be aligned and sized to the cache line geometry. (Although Piranha will debate the "and sized" portion, I do it for simplicity and consistency.)
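For instance, reusing the OP's ALIGN_BASE2_CEIL() macro and illustrative DMA_TxStart() helper, a Tx buffer aligned and sized to the cache line geometry could look like this (the 100-byte payload size is an arbitrary example):

uint8_t abTxBuffer[ALIGN_BASE2_CEIL(100, __SCB_DCACHE_LINE_SIZE)] __ALIGNED(__SCB_DCACHE_LINE_SIZE);

// 100 bytes round up to 128, i.e. exactly four 32-byte cache lines.
SCB_CleanDCache_by_Addr(abTxBuffer, sizeof(abTxBuffer));
DMA_TxStart(abTxBuffer, sizeof(abTxBuffer));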

 

Read the OP and Piranha's subsequent posts multiple times as needed until you get it.

Invalidation needs to be done before the DMA transfer to address cache line eviction

What are you trying to achieve by doing so? Eviction due to dirty lines (slide 31 of http://events17.linuxfoundation.org/sites/events/files/slides/slides_17.pdf)? But you have fixed the rules: the buffer is cache-line aligned and its size is a multiple of the cache line. This is a good practice to avoid fragmented cache lines.

Slides 2 & 3 are important to consider: the statements about elaborate caches don't always apply the same way; some assumptions may shorten/enhance your life, typically the real use of the buffer and its life cycle.

 

The invalidate step isn't needed, only the clean.

It is general practice to clean & invalidate; a sketch in CMSIS-Core terms follows the links below.

https://stackoverflow.com/questions/76155579/whats-the-point-of-cache-clean-and-invalidate-in-arm-cortex-processors

https://stackoverflow.com/questions/77677914/cache-clean-invalidate
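For the Tx case, the difference between the two operations can be sketched like this (a minimal sketch, reusing the illustrative abTxBuffer from the earlier example):

// Clean only: dirty lines are written back to memory; the now-clean lines
// stay in the cache, so a later CPU read of the buffer can still hit.
SCB_CleanDCache_by_Addr(abTxBuffer, sizeof(abTxBuffer));

// Clean & invalidate: dirty lines are written back AND discarded from the
// cache, freeing the lines but forcing a refill on the next CPU read.
SCB_CleanInvalidateDCache_by_Addr(abTxBuffer, sizeof(abTxBuffer));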

 

 

Read the OP and Piranha's subsequent posts multiple times as needed until you get it.

Please keep such sentences to yourself.