SD Card + RTOS on F7 processor won't work with D-Cache enabled! (My fix below)

mantisrobot
Associate III

*** EDIT ***

It seems this was a memory issue rather than a D-Cache issue; the D-Cache option fixed it as a side effect!

*** EDIT ***

I have a project setup as follows:

STM32F765IIKx

RTOS

SD Card on SDMMC1 configured as SD 4 bits Wide, DMA Enabled

I followed this example; however, after a day of head scratching with nothing working, I found that the only way to get it to work was to disable the D-Cache. Once I had found this issue, I remembered seeing the option for cache maintenance within the "sd_diskio.c" file.

/*
 * when using cacheable memory region, it may be needed to maintain the cache
 * validity. Enable the define below to activate a cache maintenance at each
 * read and write operation.
 * Notice: This is applicable only for cortex M7 based platform.
 */
/* USER CODE BEGIN enableSDDmaCacheMaintenance */
 #define ENABLE_SD_DMA_CACHE_MAINTENANCE  1
/* USER CODE END enableSDDmaCacheMaintenance */
 
/*
* Some DMA requires 4-Byte aligned address buffer to correctly read/write data,
* in FatFs some accesses aren't thus we need a 4-byte aligned scratch buffer to correctly
* transfer data
*/
/* USER CODE BEGIN enableScratchBuffer */
//#define ENABLE_SCRATCH_BUFFER
/* USER CODE END enableScratchBuffer */

On a punt, I decided to set this option and re-enable the D-Cache, and this now works.

So why is it that the F7 processor also requires this cache maintenance setting for SD to work properly? I've not used any cache maintenance with my UART DMA routines and they work fine on this F7 processor; however, I did need UART DMA cache maintenance on another project using an H7 processor.

*EDIT*

I also needed this config switch set! I had removed it thinking it was only the D-Cache issue but this is also required for correct operation.

/* USER CODE BEGIN enableScratchBuffer */
#define ENABLE_SCRATCH_BUFFER
/* USER CODE END enableScratchBuffer */
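
For readers wondering what a scratch buffer buys you, the idea is a bounce buffer: when the caller's buffer is not 4-byte aligned, each block is transferred into a word-aligned buffer first and then copied with the CPU. The sketch below is a host-side illustration only, not the code from sd_diskio.c. The names `fake_dma_read_block`, `read_blocks`, `scratch` and `BLOCK_SIZE` are hypothetical, and the DMA transfer is simulated with `memcpy` from an in-memory card image.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512U /* assumed SD block size in bytes */

/* Word-aligned bounce buffer that the (simulated) DMA always targets. */
static uint32_t scratch[BLOCK_SIZE / 4U];

/* Hypothetical stand-in for a DMA block read. It refuses destinations
 * that are not 4-byte aligned and returns -1 in that case. */
static int fake_dma_read_block(const uint8_t *card, uint32_t sector, void *dst)
{
    if (((uintptr_t)dst & 0x3U) != 0U) {
        return -1;
    }
    memcpy(dst, card + (size_t)sector * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}

/* Read `count` blocks into `buff`, bouncing each block through `scratch`
 * when the caller's buffer is not 4-byte aligned. Returns 0 on success. */
static int read_blocks(const uint8_t *card, uint8_t *buff,
                       uint32_t sector, uint32_t count)
{
    int unaligned = (((uintptr_t)buff & 0x3U) != 0U);

    for (uint32_t i = 0; i < count; i++) {
        uint8_t *dst = buff + (size_t)i * BLOCK_SIZE;
        if (unaligned) {
            /* DMA into the aligned scratch buffer, then copy with the CPU. */
            if (fake_dma_read_block(card, sector + i, scratch) != 0) {
                return -1;
            }
            memcpy(dst, scratch, BLOCK_SIZE);
        } else {
            /* Aligned destination: transfer straight into the caller's buffer. */
            if (fake_dma_read_block(card, sector + i, dst) != 0) {
                return -1;
            }
        }
    }
    return 0;
}
```

The cost is one extra copy per block on the unaligned path, which is why the aligned path skips the scratch buffer entirely.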


Matt.

1 ACCEPTED SOLUTION

Accepted Solutions
SofLit
ST Employee

You can either set up a non-cached memory region, which is the simplest way but decreases CPU performance,

or do cache maintenance as described in section 3.2, "Example for cache maintenance and data coherency":

"The data coherency between the core and the DMA is ensured by:

1. Either making the SRAM1 buffers not cacheable

2. Or making the SRAM1 buffers cache enabled with write-back policy, with the coherency ensured by software (clean or invalidate D-Cache)

3. Or modifying the SRAM1 region in the MPU attribute to a shared region.

4. Or making the SRAM1 buffer cache enabled with write-through policy."

Note: write-through policy is not recommended for F7; see erratum 2.1.1, "Cortex®-M7 data corruption when using data cache configured in write-through".

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.
PS: This is NOT an online support (https://ols.st.com) but a collaborative space. So please be polite in your reply. Otherwise, it will be reported as inappropriate and you will be permanently blacklisted from my help/support.


7 REPLIES
SofLit
ST Employee

Dear @mantisrobot​ ,

Please refer to the following sections of AN4839, "Level 1 cache on STM32F7 Series and STM32H7 Series":

3.2 Example for cache maintenance and data coherency

4 Mistakes to avoid and tips

SofLit

mantisrobot
Associate III

Hi,

Thanks for the tips and link!

First of all, I have been getting inconsistent results as per my post above, so it seems it wasn't a D-Cache problem but rather a memory problem, and the D-Cache setting fixed it as a side effect!

I have read through the tips and mistakes section and I think I'm doing things correctly, so for a UART DMA write I would do something like this:

#define BUFFER_SIZE 100
// align the buffer to 32 bytes and round its size up to a multiple of 32 bytes (one cache line)
ALIGN_32BYTES(uint8_t dmaBuffer[(BUFFER_SIZE + 31U) & ~(uint32_t)0x1F]);
 
// is the D-Cache enabled?
if ((SCB->CCR & SCB_CCR_DC_Msk) != 0U)
{
	// clean every cache line covering the buffer so the DMA reads up-to-date data from SRAM
	// (the buffer is already 32-byte aligned, so no address masking is needed)
	SCB_CleanDCache_by_Addr((uint32_t *)dmaBuffer, sizeof(dmaBuffer));
}
 
// start DMA transmit
HAL_UART_Transmit_DMA(&usart6, dmaBuffer, BUFFER_SIZE);

And for receive:

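#define BUFFER_SIZE 100
// align the buffer to 32 bytes and round its size up to a multiple of 32 bytes (one cache line)
ALIGN_32BYTES(uint8_t dmaRxBuffer[(BUFFER_SIZE + 31U) & ~(uint32_t)0x1F]);
 
// is the D-Cache enabled?
if ((SCB->CCR & SCB_CCR_DC_Msk) != 0U)
{
	// invalidate (not clean) every cache line covering the buffer so that no stale or
	// dirty line can be written back over the data the DMA is about to store
	SCB_InvalidateDCache_by_Addr((uint32_t *)dmaRxBuffer, sizeof(dmaRxBuffer));
}
 
// start DMA rx
HAL_UARTEx_ReceiveToIdle_DMA(&usart6, dmaRxBuffer, BUFFER_SIZE);
// the half-transfer interrupt belongs to the DMA handle, not the UART handle
__HAL_DMA_DISABLE_IT(usart6.hdmarx, DMA_IT_HT);
// invalidate the buffer again once reception has completed (e.g. in
// HAL_UARTEx_RxEventCallback()) and before the CPU reads the received data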
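
The alignment arithmetic used in the two snippets above can be checked on a host machine. The sketch below is an illustration only: `cache_align_range` and `cache_range_is_exclusive` are hypothetical helper names, and the 32-byte line size is that of the Cortex-M7 D-Cache. The first helper expands an arbitrary buffer to whole cache lines; the second reports whether a buffer owns its cache lines outright, which matters for invalidate because invalidating a line shared with other variables discards their pending writes as well.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32U /* Cortex-M7 D-Cache line size in bytes */

typedef struct {
    uintptr_t start; /* address of the first cache line touched (32-byte aligned) */
    uint32_t  size;  /* number of bytes to maintain (multiple of 32) */
} cache_range_t;

/* Expand [addr, addr + len) outwards to whole cache lines. */
static cache_range_t cache_align_range(uintptr_t addr, uint32_t len)
{
    assert(len > 0U);
    uintptr_t mask  = ~(uintptr_t)(CACHE_LINE - 1U);
    uintptr_t start = addr & mask;                            /* round down */
    uintptr_t end   = (addr + len + CACHE_LINE - 1U) & mask;  /* round up */
    cache_range_t r = { start, (uint32_t)(end - start) };
    return r;
}

/* Returns 1 when the buffer starts on a cache-line boundary and spans a whole
 * number of lines, i.e. no other data shares its cache lines. */
static int cache_range_is_exclusive(uintptr_t addr, uint32_t len)
{
    return ((addr % CACHE_LINE) == 0U) && ((len % CACHE_LINE) == 0U);
}
```

For example, a 100-byte buffer at address 0x20000064 touches the lines from 0x20000060 up to (but not including) 0x200000E0, so 128 bytes have to be maintained. Declaring the buffer aligned and padded, as in the snippets above, makes it exclusive.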

However, reading these two lines makes me think I should not be using the D-Cache clean method, and should instead be setting up non-cached memory regions? I'm not sure where to start with that.

• Always better to use non-cacheable regions for DMA buffers. The software can use the MPU to set up non-cacheable memory block to use as a shared memory between the CPU and DMA.

• Do not enable cache for the memory that is being used extensively for a DMA operation.

SofLit
ST Employee

You can either set up a non-cached memory region, which is the simplest way but decreases CPU performance,

or do cache maintenance as described in section 3.2, "Example for cache maintenance and data coherency":

"The data coherency between the core and the DMA is ensured by:

1. Either making the SRAM1 buffers not cacheable

2. Or making the SRAM1 buffers cache enabled with write-back policy, with the coherency ensured by software (clean or invalidate D-Cache)

3. Or modifying the SRAM1 region in the MPU attribute to a shared region.

4. Or making the SRAM1 buffer cache enabled with write-through policy."

Note: write-through policy is not recommended for F7; see erratum 2.1.1, "Cortex®-M7 data corruption when using data cache configured in write-through".
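
As a rough illustration of option 1, the sketch below builds the ARMv7-M `MPU_RASR` attribute word for a Normal, non-cacheable region (TEX=001, C=0, B=0) with full access and execution disabled. It is a host-side sketch for checking the bit arithmetic, not ST's reference code: the macro and function names are hypothetical, and on the target the resulting fields would usually be filled in through the HAL's `MPU_Region_InitTypeDef` and `HAL_MPU_ConfigRegion()` rather than written by hand.

```c
#include <stdint.h>

/* MPU_RASR bit fields as defined by the ARMv7-M architecture. */
#define RASR_ENABLE   (1UL << 0)
#define RASR_SIZE(n)  (((uint32_t)(n) & 0x1FUL) << 1)  /* region is 2^(n+1) bytes */
#define RASR_TEX(t)   (((uint32_t)(t) & 0x7UL) << 19)
#define RASR_AP_FULL  (3UL << 24)                      /* read/write for all modes */
#define RASR_XN       (1UL << 28)                      /* no instruction fetches */

/* Attribute word for a Normal, non-cacheable region of 2^size_log2 bytes.
 * The C (bit 17), B (bit 16) and S (bit 18) bits are deliberately left at 0.
 * Returns 0 for sizes the MPU cannot express (minimum region is 32 bytes). */
static uint32_t mpu_rasr_noncacheable(unsigned size_log2)
{
    if (size_log2 < 5U || size_log2 > 32U) {
        return 0U;
    }
    return (uint32_t)(RASR_XN | RASR_AP_FULL | RASR_TEX(1U) |
                      RASR_SIZE(size_log2 - 1U) | RASR_ENABLE);
}

/* The region base address must be aligned to the region size. */
static int mpu_base_is_valid(uint32_t base, unsigned size_log2)
{
    if (size_log2 < 5U || size_log2 > 32U) {
        return 0;
    }
    if (size_log2 == 32U) {
        return base == 0U;
    }
    return (base & (uint32_t)((1UL << size_log2) - 1UL)) == 0U;
}
```

Because regions are powers of two and must be size-aligned, the DMA buffers are normally gathered into one dedicated linker section that the region covers, rather than being scattered through SRAM.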

mantisrobot
Associate III

Ok,

So currently I'm using method 2 right?

2. Or making the SRAM1 buffers cache enabled with write-back policy, with the coherency ensured by software (clean or invalidate D-Cache)

Other than simplicity is there any advantage to method 1 or 3? Other than the performance loss.

Option 4 is out due to Errata 2.1.1

SofLit
ST Employee

"So currently I'm using method 2 right?"

Answer: Yes, since you are using SCB_CleanDCache_by_Addr()/SCB_InvalidateDCache_by_Addr().

"Other than simplicity is there any advantage to method 1 or 3? Other than the performance loss."

Answer: Just simplicity, at the cost of CPU performance for non-cached regions.
mantisrobot
Associate III

Thanks.

I'm not sure how to use the MPU yet so I'll stick with method 2 for now.

I'm currently developing a program that uses an RTOS, and I'm using the FATFS SD middleware driver within STM32CubeIDE. I notice that within the SD disk IO routines there are while loops with timeouts that can take up to 30 seconds (default), but there is no yield to the OS.

I've included an example of the function SD_CheckStatusWithTimeout() and added a comment with an osDelay(1) inside the loop. I'm curious why this isn't added for the RTOS implementation when a loop could block for so long?

static int SD_CheckStatusWithTimeout(uint32_t timeout)
{
  uint32_t timer;
  /* block until SDIO peripheral is ready again or a timeout occur */
#if (osCMSIS <= 0x20000U)
  timer = osKernelSysTick();
  while( osKernelSysTick() - timer < timeout)
#else
  timer = osKernelGetTickCount();
  while( osKernelGetTickCount() - timer < timeout)
#endif
  {
    if (BSP_SD_GetCardState() == SD_TRANSFER_OK)
    {
      return 0;
    }
    // ***********************************************************************************
    // SHOULDN'T THIS BE HERE TO GIVE CONTROL BACK TO THE SCHEDULER?
    osDelay(1);
    // ***********************************************************************************
  }
 
  return -1;
}
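
The loop shape being asked about can be exercised off-target. The sketch below is a host-side illustration with the RTOS and the SD driver replaced by function pointers; `wait_ready_with_timeout` and the `sim_*` test doubles are hypothetical names. On the target, `now` would correspond to osKernelGetTickCount(), `ready` to the BSP_SD_GetCardState() == SD_TRANSFER_OK check, and `yield` to osDelay(1), which assumes the call is made from a thread with the kernel running. Whether the middleware omits the yield deliberately is not something this thread establishes.

```c
#include <stdint.h>

typedef uint32_t (*tick_fn)(void);  /* returns the current tick count */
typedef int      (*ready_fn)(void); /* non-zero once the device is ready */
typedef void     (*yield_fn)(void); /* gives the CPU to other tasks */

/* Poll `ready` until it succeeds or `timeout` ticks elapse, yielding between
 * polls. Returns 0 on success and -1 on timeout. */
static int wait_ready_with_timeout(uint32_t timeout, tick_fn now,
                                   ready_fn ready, yield_fn yield)
{
    uint32_t start = now();

    /* Unsigned subtraction keeps the elapsed time correct when the tick
     * counter wraps around. */
    while ((uint32_t)(now() - start) < timeout) {
        if (ready()) {
            return 0;
        }
        yield(); /* let other tasks run instead of spinning */
    }
    return -1;
}

/* Host-side test doubles: each yield advances the simulated tick by one,
 * and the device becomes ready after `sim_ready_after` yields. */
static uint32_t sim_tick;
static uint32_t sim_yields;
static uint32_t sim_ready_after;

static uint32_t sim_now(void)   { return sim_tick; }
static int      sim_ready(void) { return sim_yields >= sim_ready_after; }
static void     sim_yield(void) { sim_tick++; sim_yields++; }
```

With a one-tick delay per poll, the card state is checked at most once per tick, so other tasks of equal or lower priority get the rest of each tick.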

SofLit
ST Employee

For MPU usage, I suggest referring to AN4838, "Managing memory protection unit in STM32 MCUs".

I suggest opening another thread for the latter issue and closing this thread (cache usage with SD card) by selecting the best answer for you.