[BUG] SD_read in sd_diskio.c fails with DMA in cached memory areas

SStor
Senior

Hello,

There seems to be a problem in SD_read() when it reads data via DMA into cached buffers.

I've enabled cache handling with

#define ENABLE_SD_DMA_CACHE_MAINTENANCE 1

but there is a serious problem with calling SCB_InvalidateDCache_by_Addr() on the 32-byte-aligned address after the DMA transfer.

Because the address is rounded down to a 32-byte boundary, SCB_InvalidateDCache_by_Addr() also invalidates cache lines that extend up to 31 bytes before and after the DMA buffer. Any data in those bytes that has not yet been written back to RAM is lost, which corrupts memory and can cause a crash.

I think this data has to be cleaned from the cache to RAM before the DMA transfer is started.
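To make the overshoot concrete, here is a small host-side sketch of the address arithmetic involved. The names `CACHE_LINE`, `cache_span_t` and `cache_span_for` are hypothetical and not part of sd_diskio.c or CMSIS; the 32-byte line size is the Cortex-M7 value assumed throughout this thread.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 32u  /* assumed Cortex-M7 D-cache line size */

/* Hypothetical helper type: the cache-line-aligned range covering a buffer. */
typedef struct {
    uintptr_t start;      /* buffer address rounded down to a line boundary */
    size_t    length;     /* bytes from start to the end of the last touched line */
    size_t    head_extra; /* foreign bytes sharing the buffer's first line */
    size_t    tail_extra; /* foreign bytes sharing the buffer's last line */
} cache_span_t;

static cache_span_t cache_span_for(uintptr_t addr, size_t size)
{
    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1u);
    cache_span_t s;
    uintptr_t end    = addr + size;                      /* one past the last buffer byte */
    uintptr_t end_up = (end + (CACHE_LINE - 1u)) & mask; /* rounded up to a line boundary */

    s.start      = addr & mask;
    s.length     = (size_t)(end_up - s.start);
    s.head_extra = (size_t)(addr - s.start);
    s.tail_extra = (size_t)(end_up - end);
    return s;
}
```

For a 512-byte block at 0x20000004 this reports 4 foreign bytes in front of the buffer and 28 behind it. Those are the bytes that an invalidate can discard if they are still dirty in the cache.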

Here is a working SD_read function:

DRESULT SD_read(BYTE lun, BYTE *buff, DWORD sector, UINT count)
{
  DRESULT res = RES_ERROR;
  uint8_t ret = MSD_ERROR; /* function scope: also used by the scratch-buffer path below */
  uint32_t timer;
#if (osCMSIS < 0x20000U)
  osEvent event;
#else
  uint16_t event;
  osStatus_t status;
#endif
#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
  uint32_t alignedAddr;
#endif
  /*
  * ensure the SDCard is ready for a new operation
  */

  if (SD_CheckStatusWithTimeout(SD_TIMEOUT) < 0)
  {
    return res;
  }

#if defined(ENABLE_SCRATCH_BUFFER)
  if (!((uint32_t)buff & 0x3))
  {
#endif
#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
    alignedAddr = (uint32_t)buff & ~0x1F;
    /* Alternative: clean the whole aligned buffer from the data cache */
    //SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, count*BLOCKSIZE + ((uint32_t)buff - alignedAddr));
    /* Clean the cache line holding foreign data BEFORE the DMA buffer */
    SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, 32);
    /* Clean the cache line holding foreign data BEHIND the DMA buffer */
    SCB_CleanDCache_by_Addr((uint32_t*)(((uint32_t)buff + count*BLOCKSIZE) & ~0x1F), 32);
#endif
    /* Fast path because the destination buffer is correctly aligned */
    ret = BSP_SD_ReadBlocks_DMA((uint32_t*)buff, (uint32_t)(sector), count);
 
    if (ret == MSD_OK) {
#if (osCMSIS < 0x20000U)
    /* wait for a message from the queue or a timeout */
    event = osMessageGet(SDQueueID, SD_TIMEOUT);
 
    if (event.status == osEventMessage)
    {
      if (event.value.v == READ_CPLT_MSG)
      {
        timer = osKernelSysTick();
        /* block until SDIO IP is ready or a timeout occur */
        while(osKernelSysTick() - timer <SD_TIMEOUT)
#else
          status = osMessageQueueGet(SDQueueID, (void *)&event, NULL, SD_TIMEOUT);
          if ((status == osOK) && (event == READ_CPLT_MSG))
          {
            timer = osKernelGetTickCount();
            /* block until SDIO IP is ready or a timeout occur */
            while(osKernelGetTickCount() - timer <SD_TIMEOUT)
#endif
            {
              if (BSP_SD_GetCardState() == SD_TRANSFER_OK)
              {
                res = RES_OK;
#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
                /*
                the SCB_InvalidateDCache_by_Addr() requires a 32-Byte aligned address,
                adjust the address and the D-Cache size to invalidate accordingly.
                */
                alignedAddr = (uint32_t)buff & ~0x1F;
                SCB_InvalidateDCache_by_Addr((uint32_t*)alignedAddr, count*BLOCKSIZE + ((uint32_t)buff - alignedAddr));
#endif
                break;
              }
            }
#if (osCMSIS < 0x20000U)
          }
        }
#else
      }
#endif
    }
 
#if defined(ENABLE_SCRATCH_BUFFER)
    }
    else
    {
      /* Slow path, fetch each sector a part and memcpy to destination buffer */
      int i;
 
      for (i = 0; i < count; i++)
      {
        ret = BSP_SD_ReadBlocks_DMA((uint32_t*)scratch, (uint32_t)sector++, 1);
        if (ret == MSD_OK )
        {
          /* wait until the read is successful or a timeout occurs */
#if (osCMSIS < 0x20000U)
          /* wait for a message from the queue or a timeout */
          event = osMessageGet(SDQueueID, SD_TIMEOUT);
 
          if (event.status == osEventMessage)
          {
            if (event.value.v == READ_CPLT_MSG)
            {
              timer = osKernelSysTick();
              /* block until SDIO IP is ready or a timeout occur */
              while(osKernelSysTick() - timer <SD_TIMEOUT)
#else
                status = osMessageQueueGet(SDQueueID, (void *)&event, NULL, SD_TIMEOUT);
              if ((status == osOK) && (event == READ_CPLT_MSG))
              {
                timer = osKernelGetTickCount();
                /* block until SDIO IP is ready or a timeout occur */
                ret = MSD_ERROR;
                while(osKernelGetTickCount() - timer < SD_TIMEOUT)
#endif
                {
                  ret = BSP_SD_GetCardState();
 
                  if (ret == MSD_OK)
                  {
                    break;
                  }
                }
 
                if (ret != MSD_OK)
                {
                  break;
                }
#if (osCMSIS < 0x20000U)
              }
            }
#else
          }
#endif
#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
          /*
          *
          * invalidate the scratch buffer before the next read to get the actual data instead of the cached one
          */
          SCB_InvalidateDCache_by_Addr((uint32_t*)scratch, BLOCKSIZE);
#endif
          memcpy(buff, scratch, BLOCKSIZE);
          buff += BLOCKSIZE;
        }
        else
        {
          break;
        }
      }
 
      if ((i == count) && (ret == MSD_OK ))
        res = RES_OK;
    }
#endif
  return res;
}

7 REPLIES
KTrac
Associate

Thank you for pointing out this bug.

I can confirm this method works on a custom-made STM32H733 board with a 16 GB SD card.

Zhi Pang
Associate III

I can confirm this bug still exists.

Environment:

STM32CubeIDE Version: 1.7.0

STM32CubeH7 Firmware Package V1.9.0

Hardware:

STM32H743IIT6 on custom board

microSD 8GBytes HC-Ⅰ Class 4

Software:

✔ FreeRTOS 10.3.1 with CMSIS-RTOS V2

✔ FATFS R0.12c in SD Card Mode with [Use dma template => Enable]

✔ SDMMC1 in SD 4 bits Wide bus Mode with [SDMMC1 global interrupt]

and other irrelevant settings.

Generated Code Change:

sd_diskio.c

#define ENABLE_SD_DMA_CACHE_MAINTENANCE 1

#define ENABLE_SCRATCH_BUFFER

I run this code in defaultTask, which has .stack_size = 256 * 4:

FATFS *fs = &SDFatFS;
retSD = f_mount(fs, SDPath, 1);
if(retSD != FR_OK)
{
    //PrintError
}
retSD = f_getfree(SDPath, &freeCluster, &fs);
if(retSD != FR_OK)
{
    //PrintError
}

No error occurs. I even get the correct free space of the SD card.

Then I try to open a test file.

retSD = f_open(&SDFile, "test.txt", FA_OPEN_ALWAYS|FA_WRITE);
if(retSD != FR_OK)
{
    //PrintError
}

And I always get FR_INT_ERR as the result.

After some tracing, I found that f_mount didn't initialize SDFatFS (the work area) properly, although it returned FR_OK.

In f_mount => find_volume => fs->database = bsect + sysect; (line 3151), the data start sector is calculated correctly.

But after move_window(fs, bsect + 1) == FR_OK (line 3172), the fs->database variable is zero again.

The move_window function doesn't contain any code that changes fs->database. It does, however, call SD_read once.

And the SCB_InvalidateDCache_by_Addr call in SD_read changes the value as a side effect.
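The effect can be reproduced on a host with a toy model of a single write-back cache line. This is only a sketch: `toy_line_t` and all function names are hypothetical, and it assumes that a field such as fs->database shares a 32-byte line with the start of the DMA buffer.

```c
#include <stdint.h>
#include <string.h>

#define LINE 32u

/* Toy model: one 32-byte line of RAM plus a write-back cache line over it. */
typedef struct {
    uint8_t ram[LINE];
    uint8_t cache[LINE];
    int     valid;  /* cache holds a copy of the line */
    int     dirty;  /* cache copy is newer than RAM */
} toy_line_t;

static void line_fill(toy_line_t *l)
{
    if (!l->valid) { memcpy(l->cache, l->ram, LINE); l->valid = 1; l->dirty = 0; }
}

static void cpu_write(toy_line_t *l, unsigned off, uint8_t v)
{
    line_fill(l);
    l->cache[off] = v;
    l->dirty = 1;               /* write-back: RAM is not updated yet */
}

static uint8_t cpu_read(toy_line_t *l, unsigned off)
{
    line_fill(l);
    return l->cache[off];
}

/* DMA bypasses the cache and writes straight to RAM. */
static void dma_write(toy_line_t *l, unsigned off, uint8_t v) { l->ram[off] = v; }

static void clean(toy_line_t *l)
{
    if (l->valid && l->dirty) { memcpy(l->ram, l->cache, LINE); l->dirty = 0; }
}

/* Invalidate drops the line, including any write that was still pending. */
static void invalidate(toy_line_t *l) { l->valid = 0; l->dirty = 0; }

/* Byte 0 stands in for the neighbouring field, byte 8 for the first DMA buffer byte.
 * Returns the neighbour's value as the CPU sees it after the DMA read. */
static uint8_t neighbour_after_dma_read(int clean_first)
{
    toy_line_t l;
    memset(&l, 0, sizeof l);
    cpu_write(&l, 0, 0xAB);     /* CPU sets the neighbour; value lives only in the cache */
    if (clean_first) clean(&l); /* the proposed fix */
    dma_write(&l, 8, 0x55);
    invalidate(&l);             /* what SD_read does after the transfer */
    return cpu_read(&l, 0);
}
```

Without the clean the neighbour reads back as 0, with the clean it keeps 0xAB, which matches the observation that fs->database falls back to zero.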

Thanks to @SStor's solution, this memory corruption bug is solved.

DMA combined with the D-cache easily causes trouble, and I will keep testing other FatFS functions.

:D

mantisrobot
Associate III

I think I just ran into the same issue. I was getting some strange crashes, and adding your code to the SD_read() function seems to have fixed them. Why is this still an issue?

#if defined(ENABLE_SCRATCH_BUFFER)
  if (!((uint32_t)buff & 0x3))
  {
#endif
#if (ENABLE_SD_DMA_CACHE_MAINTENANCE == 1)
    alignedAddr = (uint32_t)buff & ~0x1F;
    /* Clean whole aligned buffer from data cache */
    //SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, count*BLOCKSIZE + ((uint32_t)buff - alignedAddr));
    /* Clean data cache to write additional aligned data BEFORE DMA buffer */
    SCB_CleanDCache_by_Addr((uint32_t*)alignedAddr, 32);
    /* Clean data cache to write additional aligned data BEHIND DMA buffer */
    SCB_CleanDCache_by_Addr((uint32_t*)(((uint32_t)buff + count*BLOCKSIZE) & ~0x1F), 32);
#endif
    /* Fast path cause destination buffer is correctly aligned */
    ret = BSP_SD_ReadBlocks_DMA((uint32_t*)buff, (uint32_t)(sector), count);
 
    if (ret == MSD_OK) {

GMene.785
Associate II

HOLYYYYYYYYYYYYY SSTOR,

It's 2025 and I ran into this bug. Thank you sir, I love you, I was going nuts.

What is the issue and what is the "bug"?

Using cached memory together with DMA does need cache maintenance, yes. This is the "cache coherency" problem, and it has to be handled in SW via Clean and Invalidate.
Yes, a cache maintenance function works on whole cache lines: it writes dirty lines back to memory ("clean") or marks lines invalid so that they are re-read from memory ("invalidate"). Even if it invalidates a bit more, including cache lines "outside" the used memory region, it should not hurt (it just slows things down a bit, because next time the entire cache line is read again, even if it was not updated in memory).

There are some "golden rules", when using DMA:

  • if the DMA transfers from cached memory to something else, e.g. the SD card, we need a "clean" before the DMA is kicked off. This makes sure that the cache content is written to memory, so that the DMA sees the correct data in memory.
  • if the DMA transfers from an external device into cached memory, it needs an "invalidate" after the DMA transaction is done. This tells the MCU that the cache content is out of date relative to the updated memory.
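The two rules above can be sketched as a pair of wrappers. Everything here is hypothetical scaffolding so that it builds on a host: `cache_clean` and `cache_invalidate` stand in for SCB_CleanDCache_by_Addr() and SCB_InvalidateDCache_by_Addr(), the `dma_start_*` functions are empty placeholders, and the buffer is assumed to be 32-byte aligned with a length that is a multiple of 32.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for the CMSIS cache calls; they only count invocations here. */
static int clean_calls, invalidate_calls;
static void cache_clean(void *addr, size_t len)      { (void)addr; (void)len; clean_calls++; }
static void cache_invalidate(void *addr, size_t len) { (void)addr; (void)len; invalidate_calls++; }

/* Hypothetical DMA hooks; a real driver would program the peripheral here. */
static void dma_start_mem_to_periph(const void *src, size_t len) { (void)src; (void)len; }
static void dma_start_periph_to_mem(void *dst, size_t len)       { (void)dst; (void)len; }

/* Rule 1: memory -> peripheral. Clean first so the DMA reads current data from RAM. */
static void dma_send(void *buf, size_t len)
{
    cache_clean(buf, len);
    dma_start_mem_to_periph(buf, len);
}

/* Rule 2: peripheral -> memory. Invalidate afterwards so the CPU re-reads RAM. */
static void dma_receive(void *buf, size_t len)
{
    dma_start_periph_to_mem(buf, len);
    /* ... wait for the transfer-complete event here ... */
    cache_invalidate(buf, len);
}
```

With an unaligned buffer these wrappers are not sufficient on their own, because the first and last cache line may also hold unrelated data.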

But why should there be a "bug"? The "bug" is maybe in the SW/FW: the cache maintenance is not done properly.

You have two options to avoid this "bug":

  • you can disable DCache in general, run all code without any DCache enabled - reasonable to test and compare
  • configure the MPU and specify that the memory region for the SD card buffers is un-cached

The SW/FW can crash/fail when the "cache maintenance" is not done (or incorrectly done). In this case, the MCU uses "wrong" (not updated) data, because nobody told the MCU that the cache is now "not in sync" with the memory content (for a read done with DMA).

If you forget to "clean" before a DMA write transaction - the SD card might see wrong data (or commands).

Bear also in mind:
When using DMA - there are DMA descriptors in memory (not cache), so that the DMA engine knows what to do.
Another "golden rule":

  • DMA descriptors should be placed in un-cached memory (MPU configured so that the memory holding the DMA descriptors is un-cached).
  • Otherwise: you have to "clean" also the memory regions for the DMA descriptors (in addition to the DMA buffers) - before starting a DMA (for Read and Write).

But the FW/SW should work if caches are disabled (e.g. cache maintenance functions not used).

Even cleaning and invalidating "too much" should not crash the FW/SW.
There is not a memory corruption: it is potentially a "cache out-of-sync with memory" issue.

BTW: I think SCB_InvalidateDCache_by_Addr() already aligns the address for you. Even if not, the HW only ever invalidates entire cache lines.
Just make sure you invalidate all cache lines of the entire DMA read buffer, not just a single cache line.

 

I think the issue is collateral damage to areas or structures abutting the DMA buffer, i.e. when the buffer is unaligned, you mask the address and expand the scope of the invalidate.

If you align the buffer, not a problem
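As a rough illustration of an aligned buffer, assuming a C11 compiler: `sd_rx_buf`, `CACHE_LINE` and `ROUND_UP_LINE` are hypothetical names, and `BLOCKSIZE` is redefined here only so that the sketch stands alone.

```c
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 32u
#define BLOCKSIZE  512u  /* SD sector size, redefined for this sketch */

/* Round a byte count up to a whole number of cache lines. */
#define ROUND_UP_LINE(n) (((n) + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u))

/* Starts on a line boundary and occupies whole lines, so no other
 * variable can share a cache line with it. */
static alignas(32) uint8_t sd_rx_buf[ROUND_UP_LINE(4u * BLOCKSIZE)];
```

This only covers buffers the application owns. As the move_window trace earlier in the thread shows, FatFs also calls SD_read for its own work area.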

On the F7 you can use DTCM.

One should avoid the unbounded DCache Invalidate.

Other issues could be avoided or reduced by doing a DCache Clean earlier.

GMene.785
Associate II

Thanks for your help, guys!

I’m still not sure why the code generated by CubeMX isn’t properly aligning the buffer.

I uncommented the ENABLE_SD_DMA_CACHE_MAINTENANCE define, so I thought it would work, but it’s still not behaving as expected.

Anyway, I really appreciate your time, gentlemen!