One-time UART DMA data mismatch

robotwhisperer
Associate II

Hello all, 

This is my first post on here, if I am in error at any point please direct me to any relevant posting guidelines. 

I am developing a sender-receiver solution involving two STM32H753 microcontrollers, both on ST's NUCLEO-STM32H753 boards. I am using USART3 to transmit and receive one byte using DMA. The sender transmits one byte (for example 'S'); the receiver receives one byte and checks whether it matches 'S'. If it matches, it toggles the yellow on-board LED. If it does not match, it toggles the green on-board LED. This check is done in the HAL_UART_RxCpltCallback() function. 

I am facing an issue where the first time HAL_UART_RxCpltCallback() triggers after the receiver is reset, it toggles the green LED once (the received byte did not match the expected data), and from then on it toggles only the yellow LED (the received byte matches the expected data, which is the desired behaviour). When I debug the receiver, the problem does not appear: it only ever toggles the yellow LED, i.e., the received byte always matches the expected data, 'S'. 

So, in the debugger, everything works fine. But when not debugging, the first iteration of data does not match but all subsequent ones do. 

Below I have snippets for the UART Callback functions and main functions for the transmitter and receiver. I also have a screenshot of STM32CubeIDE that shows in the debug view the matching data in rx_buffer on the first iteration. 

// ------------------------------------------------------------------------------
// Receiver Code:
/* USER CODE BEGIN 0 */
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) {
	if (huart == &huart3) {
		uart_rx_complete = 1;
		callback_count++;
		if (rx_buffer[0] == 'S') {
			HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin); // Matched!
		} else {
			HAL_GPIO_TogglePin(LD1_GPIO_Port, LD1_Pin); // Did not match!
		}
	}
	SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buffer, 1);
	HAL_UART_Receive_DMA(&huart3, rx_buffer, 1);
}
/* USER CODE END 0 */

/**
 * @brief  The application entry point.
 * @retval int
 */
int main(void) {

	/* USER CODE BEGIN 1 */

	/* USER CODE END 1 */

	/* MPU Configuration--------------------------------------------------------*/
	MPU_Config();

	/* Enable the CPU Cache */

	/* Enable I-Cache---------------------------------------------------------*/
	SCB_EnableICache();

	/* Enable D-Cache---------------------------------------------------------*/
	SCB_EnableDCache();

	/* MCU Configuration--------------------------------------------------------*/

	/* Reset of all peripherals, Initializes the Flash interface and the Systick. */
	HAL_Init();

	/* USER CODE BEGIN Init */

	/* USER CODE END Init */

	/* Configure the system clock */
	SystemClock_Config();

	/* USER CODE BEGIN SysInit */

	/* USER CODE END SysInit */

	/* Initialize all configured peripherals */
	MX_GPIO_Init();
	MX_DMA_Init();
	MX_USART6_UART_Init();
	MX_USART3_UART_Init();
	/* USER CODE BEGIN 2 */
	SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buffer, 1);
    HAL_UART_Receive_DMA(&huart3, rx_buffer, 1);

	/* USER CODE END 2 */

	/* Infinite loop */
	/* USER CODE BEGIN WHILE */
	while (1) {

		/* USER CODE END WHILE */

		/* USER CODE BEGIN 3 */
	}
	/* USER CODE END 3 */
}

// ------------------------------------------------------------------------------
// Transmitter Code 
/* USER CODE BEGIN 0 */
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart) {
    if (huart == &huart3) {
        uart_tx_complete = 1;
        callback_count++;
        HAL_GPIO_TogglePin(LD3_GPIO_Port, LD3_Pin);
    }
}
/* USER CODE END 0 */

/**
  * @brief  The application entry point.
  * @retval int
  */
int main(void)
{

  /* USER CODE BEGIN 1 */

  /* USER CODE END 1 */

  /* MPU Configuration--------------------------------------------------------*/
  MPU_Config();

  /* Enable the CPU Cache */

  /* Enable I-Cache---------------------------------------------------------*/
  SCB_EnableICache();

  /* Enable D-Cache---------------------------------------------------------*/
  SCB_EnableDCache();

  /* MCU Configuration--------------------------------------------------------*/

  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();

  /* USER CODE BEGIN Init */

  /* USER CODE END Init */

  /* Configure the system clock */
  SystemClock_Config();

  /* USER CODE BEGIN SysInit */

  /* USER CODE END SysInit */

  /* Initialize all configured peripherals */
  MX_GPIO_Init();
  MX_DMA_Init();
  MX_USART3_UART_Init();
  /* USER CODE BEGIN 2 */
	uint8_t data[1] = {'S'};
  /* USER CODE END 2 */

  /* Infinite loop */
  /* USER CODE BEGIN WHILE */
	while (1) {
		if (uart_tx_complete) {
			SCB_CleanDCache_by_Addr((uint32_t*)data, 1);
			if (HAL_UART_Transmit_DMA(&huart3, data, 1) != HAL_OK) {
				HAL_GPIO_WritePin(LD3_GPIO_Port, LD3_Pin, 1);
				Error_Handler();
			}
		}
		HAL_GPIO_TogglePin(LD1_GPIO_Port, LD1_Pin);
		HAL_Delay(500);
    /* USER CODE END WHILE */

    /* USER CODE BEGIN 3 */
	}
  /* USER CODE END 3 */
}

 

[Screenshot: STM32CubeIDE debug session, "Screenshot from 2025-09-05 10-48-10.png"]
 

Accepted Solutions
bmckenney
Associate III
	SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buffer, 1);

 I suggest you move this to precede the read of rx_buffer[0] (up at the top of the if() block). When you do the read it's been a "long time" since the Invalidate, and it's not unlikely some neighboring variable caused a re-read of the cache line. 

If I can, I set aside one of the alternate SRAM blocks (SRAM1, e.g.) for DMA buffers, and just keep it non-cacheable. A bit wasteful, maybe, but saves a lot of headaches.
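
To illustrate why the position of the invalidate matters, here is a host-side toy model of a single cache line. It is only a sketch: `toy_line_t`, `cpu_read`, `dma_write` and `invalidate` are made-up names, not HAL or CMSIS calls, and the model ignores write-back behaviour entirely. It just shows that a CPU read of a neighbouring byte between the invalidate and the DMA write is enough to leave a stale copy of rx_buffer[0] in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u /* Cortex-M7 D-cache line size in bytes */

/* Toy model: one 32-byte line of SRAM plus the CPU's cached copy of it. */
typedef struct {
    uint8_t sram[LINE_SIZE];   /* what the DMA controller reads and writes */
    uint8_t cached[LINE_SIZE]; /* what the CPU sees while the line is valid */
    bool    valid;
} toy_line_t;

/* CPU read: a miss refills the whole line from SRAM, a hit returns the copy. */
static uint8_t cpu_read(toy_line_t *l, unsigned idx) {
    assert(idx < LINE_SIZE);
    if (!l->valid) {
        memcpy(l->cached, l->sram, LINE_SIZE);
        l->valid = true;
    }
    return l->cached[idx];
}

/* DMA write: goes straight to SRAM and never touches the cache. */
static void dma_write(toy_line_t *l, unsigned idx, uint8_t v) {
    assert(idx < LINE_SIZE);
    l->sram[idx] = v;
}

/* Stand-in for SCB_InvalidateDCache_by_Addr() on this one line. */
static void invalidate(toy_line_t *l) { l->valid = false; }

/* Order used in the posted callback: invalidate, re-arm, read much later. */
static uint8_t invalidate_then_wait(void) {
    toy_line_t l = {0};
    invalidate(&l);         /* done when the receive was armed */
    (void)cpu_read(&l, 1);  /* a neighbouring variable pulls the line back in */
    dma_write(&l, 0, 'S');  /* the byte arrives in SRAM */
    return cpu_read(&l, 0); /* cache hit on the stale copy */
}

/* Suggested order: invalidate immediately before reading the buffer. */
static uint8_t invalidate_before_read(void) {
    toy_line_t l = {0};
    (void)cpu_read(&l, 1);  /* neighbour access still happens */
    dma_write(&l, 0, 'S');
    invalidate(&l);         /* first thing done in the Rx-complete callback */
    return cpu_read(&l, 0); /* miss, refilled from SRAM */
}
```

In this model `invalidate_then_wait()` returns the old 0 and `invalidate_before_read()` returns 'S', which matches the "first byte wrong, later bytes right" symptom closely enough to explain the suggestion above.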

8 REPLIES
robotwhisperer
Associate II

I checked again, and it looks like the problem does exist in debug too. If the first point I break at is line 73, the if statement `if (rx_buffer[0] == 'S')`, then the debugger shows that rx_buffer[0] is 0x0. This is strange, because I would expect the DMA callback to trigger only once data has been received, and in that case I would expect the buffer to hold whatever was received over UART. So for the first iteration, the data is not showing up. 

TDK
Super User

In addition to invalidating before you read the data, you also need to ensure rx_buffer is cache-line aligned and that nothing else occupies that cache line. The easiest way to do this is to align it to 32 bytes and make it the size of a cache line.
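
A minimal sketch of such a declaration, assuming a C11 compiler and the 32-byte D-cache line of the Cortex-M7. `DCACHE_LINE_SIZE` is a name made up for this example, not a HAL or CMSIS macro.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 D-cache line size in bytes */

/* Only rx_buffer[0] is used for the 1-byte transfer. The remaining bytes are
 * padding so that no other variable can end up in the same cache line. */
alignas(DCACHE_LINE_SIZE) volatile uint8_t rx_buffer[DCACHE_LINE_SIZE];

/* Fails the build if someone later resizes the buffer to a partial line. */
static_assert(sizeof rx_buffer % DCACHE_LINE_SIZE == 0,
              "DMA buffer must be a whole number of cache lines");
```

With the buffer declared `volatile`, the HAL calls need a cast, e.g. `HAL_UART_Receive_DMA(&huart3, (uint8_t *)rx_buffer, 1)`, and the invalidate would then cover the whole line (`sizeof rx_buffer`) rather than 1 byte.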

The required alignment is 32 bytes, which is also the minimum granularity of a cache maintenance operation, so surrounding data in the same line is subject to collateral damage.

DMA for ONE byte seems to introduce a lot of friction for zero benefit.

Check the error-handling paths, i.e. where the UART status reports noise, framing or parity errors, and check the return values from the HAL_UART_... calls.
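
As a rough sketch of what that check could look like: the helper below turns a HAL UART error code into readable text. `uart_error_to_string` is a hypothetical helper, and the fallback `#define` values are what I believe stm32h7xx_hal_uart.h uses, so verify them against your HAL version. On target the code would come from `HAL_UART_GetError(huart)`, typically inside `HAL_UART_ErrorCallback()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fallback values for host builds only. On target these macros come from the
 * HAL header; the numbers here are an assumption to be checked against it. */
#ifndef HAL_UART_ERROR_NONE
#define HAL_UART_ERROR_NONE 0x00000000U
#define HAL_UART_ERROR_PE   0x00000001U /* parity error */
#define HAL_UART_ERROR_NE   0x00000002U /* noise error */
#define HAL_UART_ERROR_FE   0x00000004U /* framing error */
#define HAL_UART_ERROR_ORE  0x00000008U /* overrun error */
#define HAL_UART_ERROR_DMA  0x00000010U /* DMA transfer error */
#endif

/* Hypothetical helper: write a space-separated list of the error bits set in
 * err into out (at most n bytes, always NUL-terminated). */
static int uart_error_to_string(uint32_t err, char *out, size_t n) {
    assert(out != NULL && n > 0);
    return snprintf(out, n, "%s%s%s%s%s%s",
                    (err == HAL_UART_ERROR_NONE) ? "none" : "",
                    (err & HAL_UART_ERROR_PE)  ? "parity "  : "",
                    (err & HAL_UART_ERROR_NE)  ? "noise "   : "",
                    (err & HAL_UART_ERROR_FE)  ? "framing " : "",
                    (err & HAL_UART_ERROR_ORE) ? "overrun " : "",
                    (err & HAL_UART_ERROR_DMA) ? "dma "     : "");
}
```

The posted receiver also ignores the return value of `HAL_UART_Receive_DMA()`; comparing it against `HAL_OK`, as the transmitter already does for `HAL_UART_Transmit_DMA()`, would catch a failed re-arm.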

This fixed it. Could you point me to some resources for doing what you are suggesting, to have all DMA buffers in a non cacheable region of memory? 

Thanks! 

Yes, want to read what's in memory rather than still in the cache.

Also needs to be volatile to force compiler to read new content.

Generally, yes it's better to have uncached regions, on the F7 this could be done using DTCM, but the H7 is more aggravating in this regard, so cache coherency must be managed on input/output buffers appropriately.

The structures need to be on 32-byte boundaries. Invalidate blows away pending writes, so avoid placing buffers in structures next to other variables you'd be using in close proximity, e.g. a FIFO buffer with head/tail pointers falling within the same 32-byte cache line.
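
One way to lay out such a FIFO so the bookkeeping never shares a line with the DMA data, as a sketch only: `rx_fifo_t`, `FIFO_SIZE` and `DCACHE_LINE_SIZE` are hypothetical names, and a C11 compiler is assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 D-cache line size in bytes */
#define FIFO_SIZE        64u /* hypothetical size, a multiple of the line size */

/* Hypothetical receive FIFO: the DMA-written data and the CPU-written
 * head/tail indices are forced onto separate cache lines. */
typedef struct {
    alignas(DCACHE_LINE_SIZE) volatile uint8_t  data[FIFO_SIZE]; /* DMA target */
    alignas(DCACHE_LINE_SIZE) volatile uint32_t head;            /* CPU only */
    volatile uint32_t tail;                                      /* CPU only */
} rx_fifo_t;

static_assert(FIFO_SIZE % DCACHE_LINE_SIZE == 0,
              "data must span whole cache lines");
static_assert(offsetof(rx_fifo_t, data) % DCACHE_LINE_SIZE == 0,
              "data must start on a cache line");
static_assert(offsetof(rx_fifo_t, head) >= offsetof(rx_fifo_t, data) + FIFO_SIZE,
              "head/tail must not share a line with data");
```

With this layout an invalidate over `data` cannot discard a pending write to `head` or `tail`, at the cost of some padding at the end of the struct.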

Like I said, there's a lot of friction to using 1-byte DMA, so better to do enough to reward the effort.

I don't have my materials here, but the three pieces are:
1) Add a new section to the linker (.ld) file and put it into RAM_D2 (0x30000000). This might resemble

.dma_buffers (NOLOAD) :
{
*(.dma_buffers)
} > RAM_D2

2) Declare the relevant variables such that they're put into that section, e.g.

uint8_t __attribute__((section(".dma_buffers"))) rx_buffer[5];

3) Configure RAM_D2 in the MPU as "shareable, noncacheable". As I was trying to remember all the details for the bare-metal sequence I ran across a blog post (here) which says you can do it via CubeMX (.ioc file), if you're using that. If you use the HAL there's a function HAL_MPU_ConfigRegion (here). AppNote AN4838 (here) has more than you ever wanted to know about the MPU.

bmckenney
Associate III

3A) The MPU setup might resemble:

// 16KB DMA buffer region at the beginning of RAM_D2
#define DMABUF_BASE     0x30000000UL         // RAM_D2 from the .ld
#define DMABUF_LOG_SIZE 13u                  // log2(16K)-1
MPU->RNR = (1u << MPU_RNR_REGION_Pos);       // Select Region 1
MPU->RBAR = DMABUF_BASE | (0*MPU_RBAR_VALID_Msk); // Address, VALID=0 to use RNR
MPU->RASR =
     (3u << MPU_RASR_AP_Pos) |                // Full access per DDI0403E Table B3-15
     (1u << MPU_RASR_TEX_Pos) | (0*MPU_RASR_C_Msk) | (0*MPU_RASR_B_Msk) | // Non-cacheable per Table B3-13
     MPU_RASR_S_Msk |                         // Shareable
     (0u << MPU_RASR_SRD_Pos) |               // All subregions enabled
     (DMABUF_LOG_SIZE << MPU_RASR_SIZE_Pos) | // Size 16K
     MPU_RASR_ENABLE_Msk; // Enable Region
MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk; // Enable MPU
__DSB(); __ISB();                            // Barriers so the new MPU settings take effect before continuing

[I don't have an appropriate MCU here to try this on, but I think it's about right.]
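
The SIZE arithmetic in the snippet above (region size = 2^(SIZE+1), so 16 KB gives 13) can be checked on a PC. The sketch below recomputes the SIZE field and the resulting RASR value using the ARMv7-M bit positions. `mpu_size_field` and `mpu_rasr_noncacheable` are made-up helper names, and the `RASR_*_POS` constants are local copies of the positions, not the CMSIS macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions (local copies for a host build). */
#define RASR_ENABLE_POS 0u
#define RASR_SIZE_POS   1u
#define RASR_S_POS      18u
#define RASR_TEX_POS    19u
#define RASR_AP_POS     24u

/* SIZE field for a region of `bytes` bytes: log2(bytes) - 1.
 * Returns -1 if bytes is not a power of two or is below the 32-byte minimum. */
static int mpu_size_field(uint32_t bytes) {
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return -1;
    }
    int exponent = 0;
    while ((bytes >>= 1) != 0u) {
        exponent++;
    }
    return exponent - 1;
}

/* RASR value for a full-access, shareable, Normal non-cacheable region
 * (AP=3, TEX=1, C=0, B=0, S=1), all subregions enabled, region enabled. */
static uint32_t mpu_rasr_noncacheable(uint32_t bytes) {
    int size = mpu_size_field(bytes);
    assert(size >= 0);
    return (3u << RASR_AP_POS) | (1u << RASR_TEX_POS) | (1u << RASR_S_POS) |
           ((uint32_t)size << RASR_SIZE_POS) | (1u << RASR_ENABLE_POS);
}
```

Two consequences worth keeping in mind: a region size that is not a power of two cannot be described by a single region, and the base address written to RBAR has to be aligned to the region size (0x30000000 is fine for 16 KB).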