FATFS f_open fails when data cache is enabled

SPfis.1 · ‎2022-08-29

When the data cache is enabled, for example in main:

/* Enable D-Cache---------------------------------------------------------*/
  SCB_EnableDCache();

I think I am supposed to uncomment line 63 in sd_diskio.c to activate cache maintenance:

/* USER CODE BEGIN enableSDDmaCacheMaintenance */
#define ENABLE_SD_DMA_CACHE_MAINTENANCE  1 
/* USER CODE END enableSDDmaCacheMaintenance */

That way the following code is activated after reading from the SD card via DMA:

#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
            /*
            the SCB_InvalidateDCache_by_Addr() requires a 32-Byte aligned address,
            adjust the address and the D-Cache size to invalidate accordingly.
            */
            alignedAddr = (uint32_t)buff & ~0x1F;
            SCB_InvalidateDCache_by_Addr((uint32_t*)alignedAddr, count*BLOCKSIZE + ((uint32_t)buff - alignedAddr));

I found that my 32-byte aligned buffer is no longer aligned by the time it has passed through FATFS and reaches that function in sd_diskio.c. FATFS probably uses some of the leading bytes itself. Those leading bytes are overwritten by the cache invalidate function.

Therefore the f_open call fails.
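To make the "leading bytes" concrete, here is a host-side sketch of the address arithmetic from the snippet above. It does not touch any cache. The names `CACHE_LINE`, `span_t` and `invalidate_span` are hypothetical, and the 32-byte line size is the value assumed by the comment in sd_diskio.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* assumption: Cortex-M7 D-cache line size */

/* Hypothetical type: bytes outside the buffer that share its cache lines. */
typedef struct {
    uintptr_t start;  /* line-aligned address where maintenance begins */
    size_t    lead;   /* foreign bytes in front of the buffer */
    size_t    trail;  /* foreign bytes behind the buffer */
} span_t;

/* Computes which neighbouring bytes a whole-line invalidate would cover. */
static span_t invalidate_span(uintptr_t buff, size_t len)
{
    assert(len > 0);
    const uintptr_t mask = (uintptr_t)CACHE_LINE - 1u;
    uintptr_t end = buff + len;
    span_t s;
    s.start = buff & ~mask;                           /* same as buff & ~0x1F */
    s.lead  = (size_t)(buff - s.start);
    s.trail = (size_t)(((end + mask) & ~mask) - end); /* round end up to a line */
    return s;
}
```

For a 512-byte sector read into a buffer at 0x20000004, this reports 4 leading and 28 trailing bytes that live in the same cache lines as the buffer. Any of those bytes that are dirty in the cache when the invalidate runs are discarded along with the buffer's lines.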

I could fix this behavior by adding the two lines:

alignedAddr = (uint32_t)buff & ~0x1F;
SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, ((uint32_t)buff - alignedAddr));

right before the DMA setup function in sd_diskio.c, around line 216:

if(BSP_SD_ReadBlocks_DMA((uint32_t*)buff,
                             (uint32_t) (sector),
                             count) == MSD_OK)

I have not checked yet whether it needs to be added elsewhere as well.

Side note: I recently discovered another problem with the same function; see "Read From SD Card gets stuck in a While-Loop".

Tesla DeLorean · ‎2022-08-29

The brokenness of this all has been pointed out before.

SCB_InvalidateDCache_by_Addr() can do collateral damage to structures at either end of unaligned buffers.

Generally FATFS passes through buffers, unless it has to decompose things to get sectors and sector alignment.

If you use f_read() and f_write() for small or unaligned data in memory, and/or on the disk, it will be very slow. If you want any level of performance you should buffer data on the application side so that it is written in large, sector-aligned blocks that are multiples of the sector size.
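A minimal sketch of that application-side buffering, assuming 512-byte sectors. Everything here is hypothetical illustration: `sector_writer_t`, `sw_write`, `sw_flush` and the `sink_fn` callback are made-up names, and `count_sink` is only a stand-in for a real sink that would wrap f_write().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u  /* assumption: 512-byte sectors */

/* Hypothetical sink; a real one would call f_write() on an open file. */
typedef int (*sink_fn)(const uint8_t *data, size_t len, void *ctx);

typedef struct {
    _Alignas(32) uint8_t buf[SECTOR_SIZE];  /* also cache-line aligned */
    size_t  used;
    sink_fn sink;
    void   *ctx;
} sector_writer_t;

/* Demo sink: only counts the bytes it receives. */
static int count_sink(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}

static void sw_init(sector_writer_t *w, sink_fn sink, void *ctx)
{
    assert(w != NULL && sink != NULL);
    w->used = 0;
    w->sink = sink;
    w->ctx  = ctx;
}

/* Collects small records and hands only full sectors to the sink. */
static int sw_write(sector_writer_t *w, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    while (len > 0) {
        size_t room = SECTOR_SIZE - w->used;
        size_t n = (len < room) ? len : room;
        memcpy(w->buf + w->used, p, n);
        w->used += n;
        p += n;
        len -= n;
        if (w->used == SECTOR_SIZE) {
            if (w->sink(w->buf, SECTOR_SIZE, w->ctx) != 0) {
                return -1;
            }
            w->used = 0;
        }
    }
    return 0;
}

/* Writes out the final partial sector, if any (call before closing). */
static int sw_flush(sector_writer_t *w)
{
    if (w->used == 0) {
        return 0;
    }
    int rc = w->sink(w->buf, w->used, w->ctx);
    if (rc == 0) {
        w->used = 0;
    }
    return rc;
}
```

With this in place, six 100-byte records result in one 512-byte call to the sink plus one 88-byte call at flush time, instead of six small writes.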

Piranha · ‎2022-08-29

The cache maintenance is still not correct. Look here for more details and an example:

https://community.st.com/s/question/0D53W00001Z9K9TSAV/maintaining-cpu-data-cache-coherence-for-dma-buffers

alignedAddr = (uint32_t)buff & ~0x1F;

This has not been necessary since this commit from 2018. Unfortunately, for the F7 series ST is still shipping a ridiculously old and broken CMSIS-Core version that predates that change, but for the H7 series they ship a version that includes it.
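As a rough host-side model of why the manual alignment becomes redundant: the sketch below assumes the newer by-address loop adds the buffer's offset within its first cache line to the size before stepping line by line, while the older loop uses the size unchanged. Both behaviours are stated here as assumptions, not quoted from CMSIS-Core, and `lines_visited` and `lines_needed` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define LINE 32  /* assumption: 32-byte D-cache line */

/* Model only: counts how many lines a by-address loop would visit.
   adjust_size = 0 models the assumed older loop, 1 the assumed newer one. */
static unsigned lines_visited(uint32_t addr, int32_t dsize, int adjust_size)
{
    assert(dsize > 0);
    int32_t op_size = dsize;
    if (adjust_size) {
        op_size += (int32_t)(addr & (uint32_t)(LINE - 1));
    }
    unsigned n = 0;
    while (op_size > 0) {
        n++;
        op_size -= LINE;
    }
    return n;
}

/* Number of cache lines the buffer really occupies. */
static unsigned lines_needed(uint32_t addr, int32_t dsize)
{
    uint32_t first = addr / LINE;
    uint32_t last  = (addr + (uint32_t)dsize - 1u) / LINE;
    return (unsigned)(last - first + 1u);
}
```

Under these assumptions, a 512-byte buffer at 0x20000004 occupies 17 lines; the unadjusted loop visits only 16 and misses the last one unless the caller pads the size by hand, while the adjusted loop visits all 17.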

SPfis.1 · ‎2022-08-29

Thank you for the clarification. I read your article; thank you for taking the time. I will have a look at the version for the F7 series, thanks.

EDIT:

I guess you meant this file of the F7 Series:

https://github.com/STMicroelectronics/STM32CubeF7/blob/master/Middlewares/Third_Party/FatFs/src/drivers/sd_diskio_dma_template_bspv2.c

It contains the same code as in the file I used for the H7.

SPfis.1 · ‎2022-08-29

Thank you for your response. You are right: I used the aligned buffer only for f_read and f_write. My aligned buffer is not used in f_open. Is there a proper way to align the buffer inside FATFS?

thln47 · ‎2022-08-30

Hello everyone,

I have the same problem using FatFs to manage files with an STM32H743 on an FRAM device (256 KB).

It is a DCache coherency problem when using DMA transfers, combined with buffer data alignment (32 bytes).

I wrote the low-level read and write functions as shown below:

For read function:

...
/*
  alignedAddr requires a 32-Byte aligned address,
  adjust the address and the D-Cache size to invalidate/clean accordingly.
*/
uint32_t alignedAddr = (uint32_t)RxBuffer & ~(__SCB_DCACHE_LINE_SIZE - 1);
SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, RxBufferSize + __SCB_DCACHE_LINE_SIZE);
 
wTransferState = TRANSFER_WAIT;
status = HAL_SPI_Receive_DMA(hspi, RxBuffer, RxBufferSize);
 
/* Invalidate cache prior to access by CPU */
SCB_InvalidateDCache_by_Addr((uint32_t*)alignedAddr, RxBufferSize + __SCB_DCACHE_LINE_SIZE);
 
while (wTransferState == TRANSFER_WAIT) {
}
...

For write function:

...
 
/* Clean cache prior to access by DMA */
uint32_t alignedAddr = (uint32_t)TxBuffer & ~(__SCB_DCACHE_LINE_SIZE - 1);
SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, TxBufferSize + __SCB_DCACHE_LINE_SIZE);
 
wTransferState = TRANSFER_WAIT;
status = HAL_SPI_Transmit_DMA(hspi, TxBuffer, TxBufferSize);
while (wTransferState == TRANSFER_WAIT) {
}
...

After the DMA transfer, the D-Cache update does not erase data before or after the buffer (for a non-aligned buffer).

Is this the right way to do it?

Should I protect these operations with a critical section in case an interrupt occurs?

Thanks for your replies. Bye

SPfis.1 · ‎2022-08-31

IMO the important part is that the buffer is aligned to 32 bytes and its length is a multiple of 32. That ensures that the clean and invalidate operations do not touch any "leftovers". Since the FATFS read buffer is aligned only by good luck, and the important "leftover" in the read operation would be destroyed by the invalidate operation, we can add an additional clean before the Receive_DMA call (calling the clean operation only when it is needed, to save some operations).

uint32_t alignedAddr = (uint32_t)RxBuffer & ~(__SCB_DCACHE_LINE_SIZE - 1);
if(((uint32_t)RxBuffer - alignedAddr) > 0) {
   SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, ((uint32_t)RxBuffer - alignedAddr));
}

As for your question about the critical section, I would be interested in the answer myself ... I would guess it is not needed, because another interrupt is not allowed to write to the same buffer, and calls to the Receive_DMA and Transmit_DMA functions should be protected by a mutex at a higher software level.

thln47 · ‎2022-08-31

To summarize, if I understood correctly:

A - For the read function, we must do the following steps:

A1 - Check whether RxBuffer is misaligned with respect to 32 bytes; if it is, clean the DCache to avoid losing data when the DCache is invalidated after the DMA transfer.

uint32_t alignedAddr = (uint32_t)RxBuffer & ~(__SCB_DCACHE_LINE_SIZE - 1);
if(((uint32_t)RxBuffer - alignedAddr) > 0) {
   SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, RxBufferSize + __SCB_DCACHE_LINE_SIZE);
}

A2 - Run the DMA transfer: device to memory.

wTransferState = TRANSFER_WAIT;
status = HAL_SPI_Receive_DMA(hspi, RxBuffer, RxBufferSize);

A3 - Wait for the DMA transfer to complete.

while (wTransferState == TRANSFER_WAIT) {
}

A4 - Invalidate the DCache so that it is refreshed with the new data from memory.

SCB_InvalidateDCache_by_Addr((uint32_t*)alignedAddr, ((uint32_t)RxBuffer - alignedAddr));

B - For the write function:

B1 - Clean DCache to be sure that memory and DCache have the same data.

uint32_t alignedAddr = (uint32_t)TxBuffer & ~(__SCB_DCACHE_LINE_SIZE - 1);
SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, TxBufferSize + __SCB_DCACHE_LINE_SIZE);

B2 - Run the DMA transfer: memory to device.

wTransferState = TRANSFER_WAIT;
status = HAL_SPI_Transmit_DMA(hspi, TxBuffer, TxBufferSize);

B3 - Wait for the DMA transfer to complete.

while (wTransferState == TRANSFER_WAIT) {
}

C - Regarding the critical section, I think you are right (SPfis.1), but I am unsure what happens if the DCache line is used for another memory location inside an interrupt that occurs between steps A1 and A3.

Let me know if I misunderstood. Thanks!

thln47 · ‎2022-08-31

Thanks a lot @SPfis.1 for your answer,

But I have a question about cleaning the DCache. Are you sure about your code?

SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, ((uint32_t)RxBuffer - alignedAddr));

Don't you think the size should be equal to or greater than RxBufferSize plus the buffer overlap before and after the 32-byte alignment?

SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, RxBufferSize + __SCB_DCACHE_LINE_SIZE);

Piranha · ‎2022-09-04

The address and size of the Rx buffers themselves must be aligned to the cache line size. Those recalculations of address and size are not necessary and cannot solve the problem. Cleaning before the Rx will not solve the problem either. If some variable in the "leftover" region is modified by the CPU during the Rx process, the invalidation after Rx will corrupt it.
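One common way to meet that requirement when the caller's buffer cannot be controlled is to receive into a dedicated, line-aligned scratch buffer and copy the result out. The following is only a host-side sketch of that idea, assuming a 32-byte cache line and 512-byte sectors. `dma_scratch`, `is_dma_safe`, `read_via_scratch` and the `xfer_fn` hook are hypothetical names; `demo_fill` stands in for the real DMA read plus cache maintenance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32u  /* assumption: Cortex-M7 D-cache line size */
#define ROUND_UP_LINE(n) \
    (((size_t)(n) + CACHE_LINE - 1u) & ~(size_t)(CACHE_LINE - 1u))

/* Starts on a line boundary and spans a whole number of lines, so
   cache maintenance on it never touches unrelated variables. */
static _Alignas(32) uint8_t dma_scratch[ROUND_UP_LINE(512u)];

/* Hypothetical hook: performs the transfer into dst (DMA on a target). */
typedef int (*xfer_fn)(uint8_t *dst, size_t len);

/* Demo hook for host testing: fills the destination with a pattern. */
static int demo_fill(uint8_t *dst, size_t len)
{
    memset(dst, 0xA5, len);
    return 0;
}

static int is_dma_safe(const void *p, size_t len)
{
    return ((uintptr_t)p % CACHE_LINE == 0u) && (len % CACHE_LINE == 0u);
}

/* Transfers directly when dst is line aligned in address and size,
   otherwise goes through the scratch buffer and copies the data out. */
static int read_via_scratch(void *dst, size_t len, xfer_fn xfer)
{
    assert(xfer != NULL);
    if (is_dma_safe(dst, len)) {
        return xfer((uint8_t *)dst, len);
    }
    if (len > sizeof dma_scratch) {
        return -1;  /* caller must split the request */
    }
    if (xfer(dma_scratch, len) != 0) {
        return -1;
    }
    memcpy(dst, dma_scratch, len);
    return 0;
}
```

The extra memcpy costs time, which is the trade-off for not having to reason about what else shares a cache line with an arbitrary caller buffer.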

My article explains it all in detail and gives an example of proper cache maintenance for DMA buffers. Why don't you read it?