Tilen MAJERLE

Efficiently use DMA with UART RX on STM32

Discussion created by Tilen MAJERLE Moderator on Aug 7, 2017
Latest reply on Sep 21, 2017 by Andrew Sund

There have been questions about how to read UART data efficiently with DMA when you don't know how many bytes to expect.

 

The U(S)ART peripheral works well with the RXNE (Receive Not Empty) interrupt, which handles each byte separately. In this case, every received byte is processed by the CPU, which jumps to the UART interrupt service routine each time. To free the CPU for other work while UART data arrives at high speed, we can use DMA (Direct Memory Access) to offload it. Think of DMA as a co-processor that can only transfer data between memory locations, in our case between the UART peripheral data register and a temporary DMA buffer that we assign to it.
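To get a feel for the per-byte interrupt load, here is a minimal sketch in plain C that runs on a PC without any STM32 headers. It assumes 8N1 framing (10 bits on the wire per byte) and a continuously loaded line; the function name `rxne_interrupts_per_second` is made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: number of RXNE interrupts per second produced by a
 * continuous byte stream. bits_per_frame is 10 for 8N1
 * (1 start bit + 8 data bits + 1 stop bit). */
uint32_t rxne_interrupts_per_second(uint32_t baudrate, uint32_t bits_per_frame) {
    if (bits_per_frame == 0) {
        return 0;                       /* Avoid division by zero */
    }
    return baudrate / bits_per_frame;   /* One interrupt per received byte */
}
```

Under these assumptions, a saturated 115200 baud link interrupts the CPU 11520 times per second, and a 921600 baud link 92160 times per second.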

In general, before you start DMA, you have to tell it how many bytes to transfer before it signals that the transfer is done. This event is called Transfer Complete (TC). However, UART can receive data at any time, and nothing in the UART specification tells us when bytes will arrive or how many.

 

Problem you face

Imagine we receive between 10 and 20 bytes of data every 5 minutes, without knowing the exact number each time. We have to tell DMA how many bytes to receive before it raises the transfer complete notification that lets the CPU read the data. We can set DMA to receive 10 bytes, and then 10 bytes again, but if we receive 14 bytes, we miss 4 of them. Actually, they will be in the buffer, but we won't be notified that DMA has 4 bytes in memory. We then wait 5 minutes for the next packet, and only its first 6 bytes complete the transfer and flush the old data along with them. This can lead to timeouts if our high-level protocol is packet based with a command->response approach, as is common for protocols used over RS-485.
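The arithmetic of this example can be checked with a small host-side sketch. It only models the counting, not the hardware, and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: with a fixed DMA transfer length, TC fires once per
 * full block. Bytes after the last full block sit in the buffer unnoticed. */
size_t stranded_bytes(size_t received, size_t dma_len) {
    if (dma_len == 0) {
        return 0;                       /* Avoid division by zero */
    }
    return received % dma_len;
}

/* Hypothetical helper: bytes of the NEXT packet that must arrive before
 * TC finally fires for the stranded bytes. */
size_t bytes_until_flush(size_t received, size_t dma_len) {
    size_t stranded = stranded_bytes(received, dma_len);
    return stranded ? dma_len - stranded : 0;
}
```

For 14 received bytes and a DMA length of 10, `stranded_bytes` gives 4 and `bytes_until_flush` gives 6, matching the numbers in the text.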

 

Solution we provide

We can use a very useful feature of the UART peripheral called IDLE line detection. An idle line is detected on RX when no byte has been received for more than one byte time. So, if we receive 10 bytes back to back (no delay), the IDLE line is detected when the 11th byte should have arrived but did not.
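As a rough guide to how quickly the IDLE event follows the last byte, the sketch below computes the duration of one frame on the wire. It assumes the idle condition is flagged about one frame time after the last byte, as described above, and 8N1 framing (10 bits per frame); `frame_time_us` is a name invented for this example.

```c
#include <stdint.h>

/* Hypothetical helper: duration of one UART frame in microseconds,
 * rounded down. bits_per_frame is 10 for 8N1. */
uint32_t frame_time_us(uint32_t baudrate, uint32_t bits_per_frame) {
    if (baudrate == 0) {
        return 0;                       /* Avoid division by zero */
    }
    /* 64-bit intermediate so the multiplication cannot overflow */
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000u) / baudrate);
}
```

At 115200 baud this is about 86 microseconds, so under these assumptions the end of a packet is reported almost immediately compared with waiting for the next packet.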

We can force DMA to raise the transfer complete interrupt by disabling the DMA stream by hand, that is, by clearing the enable bit in the stream control register. DMA then generates the interrupt (if it is enabled), and we can read the number of bytes still to be received from the stream's NDTR register. From that, we can calculate how many elements were already received.
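The calculation itself is a single subtraction. A minimal host-side sketch, where `configured` stands for the value written to NDTR before the stream was enabled and `ndtr` for the value read back in the interrupt (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes already transferred by DMA.
 * configured = length programmed into NDTR before enabling the stream,
 * ndtr       = remaining count read from NDTR after the stream stopped. */
size_t dma_bytes_received(size_t configured, uint32_t ndtr) {
    if (ndtr > configured) {
        return 0;                       /* Inconsistent input, report nothing */
    }
    return configured - ndtr;
}
```

For a stream configured for 20 bytes that stops with NDTR at 6, this returns 14.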

 

If we repeat our example of receiving between 10 and 20 bytes (14 in this case), we set DMA to receive 20 bytes. Since only 14 bytes arrive, we get IDLE line detection on the UART RX line. With the IDLE interrupt enabled, we can handle it and manually disable the DMA stream.
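The sequence of events can be walked through on a PC with a small mock of the DMA stream. This is only a simulation under simplified assumptions (the struct and all `mock_` functions are invented here and do not correspond to any ST driver API); it models the counter and the flags, nothing else.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_DMA_LEN 20                 /* DMA configured for 20 bytes */

/* Hypothetical software model of one DMA stream */
typedef struct {
    uint8_t  buf[MOCK_DMA_LEN];         /* DMA target buffer */
    uint32_t ndtr;                      /* Bytes still to be received */
    int      enabled;                   /* Models the stream enable bit */
    int      tc_flag;                   /* Models the transfer complete flag */
} mock_stream_t;

/* Arm the stream for a full-length transfer */
void mock_start(mock_stream_t *s) {
    s->ndtr = MOCK_DMA_LEN;
    s->enabled = 1;
    s->tc_flag = 0;
}

/* One byte arrives from the "UART" and is stored by the "DMA" */
void mock_rx_byte(mock_stream_t *s, uint8_t b) {
    if (!s->enabled || s->ndtr == 0) {
        return;                         /* Stream stopped, byte is not stored */
    }
    s->buf[MOCK_DMA_LEN - s->ndtr] = b;
    s->ndtr--;
    if (s->ndtr == 0) {                 /* Buffer full: normal completion */
        s->enabled = 0;
        s->tc_flag = 1;
    }
}

/* IDLE line handler: disabling the stream by hand raises TC */
void mock_idle(mock_stream_t *s) {
    s->enabled = 0;
    s->tc_flag = 1;
}

/* TC handler: returns the number of received bytes and re-arms the stream */
size_t mock_tc_handler(mock_stream_t *s) {
    size_t len = MOCK_DMA_LEN - s->ndtr;
    mock_start(s);
    return len;
}
```

Feeding 14 bytes into the mock leaves `tc_flag` at 0; calling `mock_idle` sets it, and `mock_tc_handler` then reports 14 bytes and restores the counter to 20 for the next packet.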

 

Example code

I made two versions of the example code, one using SPL and one using LL drivers. Both use the same hardware setup, with UART TX = PA2 and UART RX = PA3. These pins are routed to the VCP (virtual COM port) on Nucleo-64 boards (tested on F401 and F411).

 

/**
* Example was tested on Nucleo-F411 and Nucleo-F401 where VCP is connected to:
*   - USART2, TX: PA2, RX: PA3, used baudrate: 115200
*   - USART2 DMA: DMA1 Stream5 Channel 4
*/

/* Include core modules */
#include "stm32f4xx.h"
#include "string.h"

/* Receive buffer for DMA */
#define DMA_RX_BUFFER_SIZE          64
uint8_t DMA_RX_Buffer[DMA_RX_BUFFER_SIZE];

/* Buffer after received data */
#define UART_BUFFER_SIZE            256
uint8_t UART_Buffer[UART_BUFFER_SIZE];
volatile size_t Write, Read;        /* Shared between DMA interrupt and main loop */

USART_InitTypeDef USART_InitStruct;
DMA_InitTypeDef DMA_InitStruct;
GPIO_InitTypeDef GPIO_InitStruct;
NVIC_InitTypeDef NVIC_InitStruct;

int main(void) {
    /* Initialize system */
    SystemInit();
   
    /* Enable clocks for GPIOA, DMA1 and USART2 */
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_DMA1EN;
    RCC->APB1ENR |= RCC_APB1ENR_USART2EN;
    (void)RCC->AHB1ENR;
    (void)RCC->APB1ENR;
   
    /* Set alternate functions */
    GPIO_PinAFConfig(GPIOA, GPIO_PinSource2, GPIO_AF_USART2);
    GPIO_PinAFConfig(GPIOA, GPIO_PinSource3, GPIO_AF_USART2);
   
    /* Init GPIO pins */
    GPIO_StructInit(&GPIO_InitStruct);
    GPIO_InitStruct.GPIO_Pin = GPIO_Pin_2 | GPIO_Pin_3;
    GPIO_InitStruct.GPIO_Mode = GPIO_Mode_AF;
    GPIO_InitStruct.GPIO_OType = GPIO_OType_PP;
    GPIO_InitStruct.GPIO_PuPd = GPIO_PuPd_UP;
    GPIO_InitStruct.GPIO_Speed = GPIO_Speed_100MHz;
    GPIO_Init(GPIOA, &GPIO_InitStruct);
   
    /* Configure UART setup */
    USART_StructInit(&USART_InitStruct);
    USART_InitStruct.USART_BaudRate = 115200;   /* Matches the baudrate stated in the header comment */
    USART_InitStruct.USART_HardwareFlowControl = USART_HardwareFlowControl_None;
    USART_InitStruct.USART_Mode = USART_Mode_Tx | USART_Mode_Rx;
    USART_InitStruct.USART_Parity = USART_Parity_No;
    USART_InitStruct.USART_StopBits = USART_StopBits_1;
    USART_InitStruct.USART_WordLength = USART_WordLength_8b;
    USART_Init(USART2, &USART_InitStruct);
   
    /* Enable global interrupts for USART */
    NVIC_InitStruct.NVIC_IRQChannel = USART2_IRQn;
    NVIC_InitStruct.NVIC_IRQChannelCmd = ENABLE;
    NVIC_InitStruct.NVIC_IRQChannelPreemptionPriority = 0;
    NVIC_InitStruct.NVIC_IRQChannelSubPriority = 1;
    NVIC_Init(&NVIC_InitStruct);
   
    /* Enable USART */
    USART_Cmd(USART2, ENABLE);
    USART_DMACmd(USART2, USART_DMAReq_Rx, ENABLE);
    /* Enable IDLE line detection for DMA processing */
    USART_ITConfig(USART2, USART_IT_IDLE, ENABLE);
   
    /* Configure DMA for USART RX, DMA1, Stream5, Channel4 */
    DMA_StructInit(&DMA_InitStruct);
    DMA_InitStruct.DMA_Channel = DMA_Channel_4;
    DMA_InitStruct.DMA_Memory0BaseAddr = (uint32_t)DMA_RX_Buffer;
    DMA_InitStruct.DMA_BufferSize = DMA_RX_BUFFER_SIZE;
    DMA_InitStruct.DMA_PeripheralBaseAddr = (uint32_t)&USART2->DR;
    DMA_InitStruct.DMA_DIR = DMA_DIR_PeripheralToMemory;
    DMA_InitStruct.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
    DMA_InitStruct.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Byte;
    DMA_InitStruct.DMA_MemoryInc = DMA_MemoryInc_Enable;
    DMA_InitStruct.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
    DMA_Init(DMA1_Stream5, &DMA_InitStruct);
   
    /* Enable global interrupts for DMA stream */
    NVIC_InitStruct.NVIC_IRQChannel = DMA1_Stream5_IRQn;
    NVIC_InitStruct.NVIC_IRQChannelCmd = ENABLE;
    NVIC_InitStruct.NVIC_IRQChannelPreemptionPriority = 0;
    NVIC_InitStruct.NVIC_IRQChannelSubPriority = 0;
    NVIC_Init(&NVIC_InitStruct);
   
    /* Enable transfer complete interrupt */
    DMA_ITConfig(DMA1_Stream5, DMA_IT_TC, ENABLE);
    DMA_Cmd(DMA1_Stream5, ENABLE);
   
    while (1) {
        /**
         * Loop data back to UART data register
         */

        while (Read != Write) {                 /* Do it until buffer is empty */
            USART2->DR = UART_Buffer[Read++];   /* Start byte transfer */
            while (!(USART2->SR & USART_SR_TXE));   /* Wait till finished */
            if (Read == UART_BUFFER_SIZE) {     /* Wrap read index at end of buffer */
                Read = 0;
            }
        }
    }
}

/**
* \brief       Global interrupt handler for USART2
*/

void USART2_IRQHandler(void) {
    /* Check for IDLE flag */
    if (USART2->SR & USART_FLAG_IDLE) {         /* We want IDLE flag only */
        /* This part is important */
        /* Clear IDLE flag by reading status register first */
        /* And follow by reading data register */
        volatile uint32_t tmp;                  /* Must be volatile to prevent optimizations */
        tmp = USART2->SR;                       /* Read status register */
        tmp = USART2->DR;                       /* Read data register */
        (void)tmp;                              /* Prevent compiler warnings */
        DMA1_Stream5->CR &= ~DMA_SxCR_EN;       /* Disabling DMA will force transfer complete interrupt if enabled */
    }
}

/**
* \brief       Global interrupt handler for DMA1 stream5
* \note        Except memcpy, there is no functions used to
*/

void DMA1_Stream5_IRQHandler(void) {
    size_t len, tocopy;
    uint8_t* ptr;
   
    /* Check transfer complete flag */
    if (DMA1->HISR & DMA_FLAG_TCIF5) {
        DMA1->HIFCR = DMA_FLAG_TCIF5;           /* Clear transfer complete flag */
       
        /* Calculate number of bytes actually transferred by DMA so far */
        /**
         * Transfer could be completed by 2 events:
         *  - All data actually transferred (NDTR = 0)
         *  - Stream disabled inside USART IDLE line detected interrupt (NDTR != 0)
         */

        len = DMA_RX_BUFFER_SIZE - DMA1_Stream5->NDTR;
        tocopy = UART_BUFFER_SIZE - Write;      /* Get number of bytes we can copy to the end of buffer */
       
        /* Check how many bytes to copy */
        if (tocopy > len) {
            tocopy = len;
        }
       
        /* Write received data for UART main buffer for manipulation later */
        ptr = DMA_RX_Buffer;
        memcpy(&UART_Buffer[Write], ptr, tocopy);   /* Copy first part */
       
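        /* Correct values for remaining data */
        Write += tocopy;
        if (Write >= UART_BUFFER_SIZE) {        /* Wrap index; Read never equals UART_BUFFER_SIZE, */
            Write = 0;                          /* so Write must not rest there either */
        }
        len -= tocopy;
        ptr += tocopy;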
       
        /* If still data to write for beginning of buffer */
        if (len) {
            memcpy(&UART_Buffer[0], ptr, len);      /* Don't care if we override Read pointer now */
            Write = len;
        }
       
        /* Prepare DMA for next transfer */
        /* Important! DMA stream won't start if all flags are not cleared first */
        DMA1->HIFCR = DMA_FLAG_DMEIF5 | DMA_FLAG_FEIF5 | DMA_FLAG_HTIF5 | DMA_FLAG_TCIF5 | DMA_FLAG_TEIF5;
        DMA1_Stream5->M0AR = (uint32_t)DMA_RX_Buffer;   /* Set memory address for DMA again */
        DMA1_Stream5->NDTR = DMA_RX_BUFFER_SIZE;    /* Set number of bytes to receive */
        DMA1_Stream5->CR |= DMA_SxCR_EN;            /* Start DMA transfer */
    }
}
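The two-step copy in the DMA handler above is a ring buffer write with wrap-around. It can be tried on a PC with the following sketch, which is an illustration under stated assumptions and not the code used in the examples: `ring_write` is a hypothetical helper, and it expects `len <= cap` and `*write < cap`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src into a ring buffer of size
 * cap, starting at index *write and wrapping to the start when the end is
 * reached. Returns 0 on success, -1 if the arguments are out of range. */
int ring_write(uint8_t *ring, size_t cap, size_t *write,
               const uint8_t *src, size_t len) {
    size_t first;

    if (cap == 0 || len > cap || *write >= cap) {
        return -1;
    }

    first = cap - *write;               /* Room up to the end of the buffer */
    if (first > len) {
        first = len;
    }
    memcpy(&ring[*write], src, first);          /* First part, up to the end */
    memcpy(&ring[0], src + first, len - first); /* Remainder, from the start */

    *write = (*write + len) % cap;      /* Index always stays below cap */
    return 0;
}
```

Writing 4 bytes at index 6 of an 8-byte ring places two bytes at indexes 6 and 7, two at indexes 0 and 1, and leaves the write index at 2. Keeping the write index strictly below the buffer size matters for a reader that loops while `Read != Write` and wraps its own index to 0.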

The code below was generated with STM32CubeMX and later modified to use LL drivers.

/* Includes ------------------------------------------------------------------*/
#include "main.h"
#include "stm32f4xx_hal.h"
#include "stm32f4xx_ll_dma.h"
#include "stm32f4xx_ll_usart.h"
#include "stm32f4xx_ll_gpio.h"
#include "stm32f4xx_ll_rcc.h"
#include "stm32f4xx_ll_bus.h"
#include "string.h"

void SystemClock_Config(void);

#define DMA_RX_BUFFER_SIZE          64
uint8_t DMA_RX_Buffer[DMA_RX_BUFFER_SIZE];

#define UART_BUFFER_SIZE            256
uint8_t UART_Buffer[UART_BUFFER_SIZE];
volatile size_t Read, Write;

LL_USART_InitTypeDef USART_InitStruct;
LL_DMA_InitTypeDef DMA_InitStruct;

int main(void) {
    /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
    HAL_Init();
    SystemClock_Config();

    /* Enable all clocks */
    LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_GPIOA);
    LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_DMA1);
    LL_APB1_GRP1_EnableClock(LL_APB1_GRP1_PERIPH_USART2);
   
    /* Init GPIO pins */
    LL_GPIO_SetAFPin_0_7(GPIOA, LL_GPIO_PIN_2, GPIO_AF7_USART2);
    LL_GPIO_SetAFPin_0_7(GPIOA, LL_GPIO_PIN_3, GPIO_AF7_USART2);
   
    LL_GPIO_SetPinMode(GPIOA, LL_GPIO_PIN_2, LL_GPIO_MODE_ALTERNATE);
    LL_GPIO_SetPinOutputType(GPIOA, LL_GPIO_PIN_2, LL_GPIO_OUTPUT_PUSHPULL);
    LL_GPIO_SetPinPull(GPIOA, LL_GPIO_PIN_2, LL_GPIO_PULL_UP);
    LL_GPIO_SetPinSpeed(GPIOA, LL_GPIO_PIN_2, LL_GPIO_SPEED_FREQ_HIGH);
    LL_GPIO_SetPinMode(GPIOA, LL_GPIO_PIN_3, LL_GPIO_MODE_ALTERNATE);
    LL_GPIO_SetPinOutputType(GPIOA, LL_GPIO_PIN_3, LL_GPIO_OUTPUT_PUSHPULL);
    LL_GPIO_SetPinPull(GPIOA, LL_GPIO_PIN_3, LL_GPIO_PULL_UP);
    LL_GPIO_SetPinSpeed(GPIOA, LL_GPIO_PIN_3, LL_GPIO_SPEED_FREQ_HIGH);
   
    /* Configure USART */
    LL_USART_StructInit(&USART_InitStruct);
    USART_InitStruct.BaudRate = 115200;
    USART_InitStruct.DataWidth = LL_USART_DATAWIDTH_8B;
    USART_InitStruct.HardwareFlowControl = LL_USART_HWCONTROL_NONE;
    USART_InitStruct.OverSampling = LL_USART_OVERSAMPLING_16;
    USART_InitStruct.Parity = LL_USART_PARITY_NONE;
    USART_InitStruct.StopBits = LL_USART_STOPBITS_1;
    USART_InitStruct.TransferDirection = LL_USART_DIRECTION_TX_RX;
    LL_USART_Init(USART2, &USART_InitStruct);
   
    /* Enable USART and enable interrupt for IDLE line detection */
    LL_USART_Enable(USART2);
    LL_USART_EnableDMAReq_RX(USART2);
    LL_USART_EnableIT_IDLE(USART2);
   
    /* Enable USART global interrupts */
    NVIC_SetPriority(USART2_IRQn, 1);
    NVIC_EnableIRQ(USART2_IRQn);
   
    /* Configure DMA for USART RX */
    LL_DMA_StructInit(&DMA_InitStruct);
    DMA_InitStruct.Channel = LL_DMA_CHANNEL_4;
    DMA_InitStruct.Direction = LL_DMA_DIRECTION_PERIPH_TO_MEMORY;
    DMA_InitStruct.MemoryOrM2MDstAddress = (uint32_t)DMA_RX_Buffer;
    DMA_InitStruct.NbData = DMA_RX_BUFFER_SIZE;
    DMA_InitStruct.MemoryOrM2MDstIncMode = LL_DMA_MEMORY_INCREMENT;
    DMA_InitStruct.PeriphOrM2MSrcAddress = (uint32_t)&USART2->DR;
    LL_DMA_Init(DMA1, LL_DMA_STREAM_5, &DMA_InitStruct);
   
    LL_DMA_EnableIT_TC(DMA1, LL_DMA_STREAM_5);
    LL_DMA_EnableStream(DMA1, LL_DMA_STREAM_5);
   
    /* Enable global DMA stream interrupts */
    NVIC_SetPriority(DMA1_Stream5_IRQn, 0);
    NVIC_EnableIRQ(DMA1_Stream5_IRQn);
   
    while (1) {
        if (Read != Write) {
            LL_USART_TransmitData8(USART2, UART_Buffer[Read++]);
            while (!LL_USART_IsActiveFlag_TXE(USART2)) {}
            if (Read == UART_BUFFER_SIZE) {
                Read = 0;
            }
        }
    }
}

void USART2_IRQHandler(void) {
    if (LL_USART_IsActiveFlag_IDLE(USART2)) {
        LL_USART_ClearFlag_IDLE(USART2);
        LL_DMA_DisableStream(DMA1, LL_DMA_STREAM_5);
    }
}

void DMA1_Stream5_IRQHandler(void) {
    size_t len, tocopy;
    uint8_t* ptr;
   
    if (LL_DMA_IsActiveFlag_TC5(DMA1)) {
        LL_DMA_ClearFlag_TC5(DMA1);

        len = DMA_RX_BUFFER_SIZE - DMA1_Stream5->NDTR;
        tocopy = UART_BUFFER_SIZE - Write;      /* Get number of bytes we can copy to the end of buffer */
       
        /* Check how many bytes to copy */
        if (tocopy > len) {
            tocopy = len;
        }
       
        /* Write received data for UART main buffer for manipulation later */
        ptr = DMA_RX_Buffer;
        memcpy(&UART_Buffer[Write], ptr, tocopy);   /* Copy first part */
       
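        /* Correct values for remaining data */
        Write += tocopy;
        if (Write >= UART_BUFFER_SIZE) {        /* Wrap index; Read never equals UART_BUFFER_SIZE, */
            Write = 0;                          /* so Write must not rest there either */
        }
        len -= tocopy;
        ptr += tocopy;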
       
        /* If still data to write for beginning of buffer */
        if (len) {
            memcpy(&UART_Buffer[0], ptr, len);      /* Don't care if we override Read pointer now */
            Write = len;
        }
       
        /* Prepare DMA for next transfer */
        /* Important! DMA stream won't start if all flags are not cleared first */
        DMA1->HIFCR = DMA_FLAG_DMEIF1_5 | DMA_FLAG_FEIF1_5 | DMA_FLAG_HTIF1_5 | DMA_FLAG_TCIF1_5 | DMA_FLAG_TEIF1_5;
        DMA1_Stream5->M0AR = (uint32_t)DMA_RX_Buffer;   /* Set memory address for DMA again */
        DMA1_Stream5->NDTR = DMA_RX_BUFFER_SIZE;    /* Set number of bytes to receive */
        DMA1_Stream5->CR |= DMA_SxCR_EN;            /* Start DMA transfer */
    }
}

void SystemClock_Config(void) {
  RCC_OscInitTypeDef RCC_OscInitStruct;
  RCC_ClkInitTypeDef RCC_ClkInitStruct;

    /**Configure the main internal regulator output voltage
    */

  __HAL_RCC_PWR_CLK_ENABLE();

  __HAL_PWR_VOLTAGESCALING_CONFIG(PWR_REGULATOR_VOLTAGE_SCALE1);

    /**Initializes the CPU, AHB and APB busses clocks
    */

  RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI;
  RCC_OscInitStruct.HSIState = RCC_HSI_ON;
  RCC_OscInitStruct.HSICalibrationValue = 16;
  RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
  RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSI;
  RCC_OscInitStruct.PLL.PLLM = 16;
  RCC_OscInitStruct.PLL.PLLN = 400;
  RCC_OscInitStruct.PLL.PLLP = RCC_PLLP_DIV4;
  RCC_OscInitStruct.PLL.PLLQ = 4;
  if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK)
  {
     
  }

    /**Initializes the CPU, AHB and APB busses clocks
    */

  RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK|RCC_CLOCKTYPE_SYSCLK
                              |RCC_CLOCKTYPE_PCLK1|RCC_CLOCKTYPE_PCLK2;
  RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
  RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
  RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV2;
  RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1;

  if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_3) != HAL_OK)
  {
     
  }

    /**Configure the Systick interrupt time
    */

  HAL_SYSTICK_Config(HAL_RCC_GetHCLKFreq()/1000);

    /**Configure the Systick
    */

  HAL_SYSTICK_CLKSourceConfig(SYSTICK_CLKSOURCE_HCLK);

  /* SysTick_IRQn interrupt configuration */
  HAL_NVIC_SetPriority(SysTick_IRQn, 0, 0);
}

 

I hope this helps you understand what is needed for efficient DMA reception from UART.

 

Best regards,

Tilen
