Tilen MAJERLE

Efficiently use DMA with UART RX on STM32

Discussion created by Tilen MAJERLE Moderator on Aug 7, 2017
Latest reply on Feb 14, 2018 by hamid hassannejad

There have been questions about how to read UART data efficiently using DMA when you don't know in advance how many bytes to expect.

 

The U(S)ART peripheral works well when the RXNE (Receive Not Empty) interrupt is used to handle each byte separately. In this case, the CPU processes every received byte by jumping to the UART interrupt service routine. To free the CPU for other work while UART data arrives at high speed, we can use DMA (Direct Memory Access). Think of DMA as a co-processor that can only transfer data between memory locations; in our case, between the UART data register and a temporary buffer we assign to it.

In general, before you start a DMA transfer, you have to tell it how many bytes to transfer before it signals that it is done. This event is called Transfer Complete (TC). However, a UART can receive data at any time, and we usually do not know when or how many bytes will arrive.

 

Problem you face

Imagine we receive between 10 and 20 bytes of data every 5 minutes, without knowing the exact number each time. We have to tell the DMA how many bytes to receive before it raises the transfer complete notification that lets the CPU read the data. We could set the DMA to receive 10 bytes, and then 10 bytes again, but if 14 bytes arrive, 4 of them are stranded: they are in the buffer, but we are not notified about them. We would then wait 5 minutes until the next packet arrives, at which point the 4 old bytes are flushed together with the first 6 bytes of new data. This can lead to timeouts if the high-level protocol is packet based with a command->response approach, as is common on RS-485.
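
To make the arithmetic concrete, here is a small host-side sketch in plain C (no STM32 registers involved) of what a fixed-length DMA transfer would report. The type dma_report_t and the function dma_fixed_block_report are hypothetical names introduced only for illustration, and block_size is assumed to be greater than zero.

```c
#include <stddef.h>

/* Hypothetical model: the DMA raises Transfer Complete only after
 * every full block of block_size bytes (block_size must be > 0). */
typedef struct {
    size_t reported;    /* Bytes the CPU has been notified about */
    size_t stranded;    /* Bytes in the buffer without any notification */
} dma_report_t;

dma_report_t dma_fixed_block_report(size_t received, size_t block_size) {
    dma_report_t r;
    r.reported = (received / block_size) * block_size;  /* Only full blocks trigger TC */
    r.stranded = received % block_size;                 /* Remainder waits for more data */
    return r;
}
```

With 14 received bytes and a block size of 10, this model reports 10 bytes and leaves 4 stranded, which matches the case described above.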

 

Solution we provide

We can use a very useful feature of the UART peripheral called IDLE line detection. An idle line is detected when no new byte is received on the RX line for more than one byte time. So if we receive 10 bytes back to back, the IDLE line is detected at the point where the 11th byte would have been received but was not.

We can force the DMA to raise the transfer complete interrupt by disabling the DMA stream manually, that is, by clearing the enable bit in the stream control register. The DMA then raises the interrupt (if it is enabled), and we can read the NDTR register of the DMA stream to find out how many bytes were still to be received. From that, we can calculate how many bytes were actually received.
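
For a rough feel for the timescale, one byte time can be computed from the baud rate. This is only a sketch: it assumes 8N1 framing (10 bits per byte on the wire, which is an assumption and not stated above), and the helper name uart_frame_time_us is hypothetical.

```c
#include <stdint.h>

/* Duration of one UART frame in microseconds.
 * bits_per_frame is assumed to be 10 for 8N1 (start + 8 data + stop). */
double uart_frame_time_us(uint32_t baudrate, uint32_t bits_per_frame) {
    return (double)bits_per_frame * 1000000.0 / (double)baudrate;
}
```

At 115200 baud this gives roughly 87 us per byte, so the IDLE event would be expected on about that timescale after the last received byte.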

 

If we repeat our example of receiving between 10 and 20 bytes (14 in this case), we would set the DMA to receive 20 bytes. Since only 14 bytes arrive, the IDLE line is then detected on the UART RX line. With the IDLE interrupt enabled, we can handle it and manually disable the DMA stream.
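
The byte count calculation can be sketched on its own, independent of any hardware. The function name dma_bytes_received is hypothetical; ndtr stands for the value read from the stream's NDTR register after the stream has been disabled, and is assumed to be no larger than buffer_size.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes the DMA has written so far.
 * buffer_size: length programmed into the DMA before the transfer started
 * ndtr:        remaining count read back from the NDTR register
 * Assumes ndtr <= buffer_size. */
size_t dma_bytes_received(size_t buffer_size, uint32_t ndtr) {
    return buffer_size - (size_t)ndtr;
}
```

For the example above, the DMA is programmed for 20 bytes and NDTR reads 6 after the IDLE event, so the result is 14.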

 

Example code

I made 2 versions of the example code, using the SPL and LL drivers. Both use the same hardware setup with UART TX = PA2 and UART RX = PA3. These pins are connected to the VCP on Nucleo-64 boards (tested on F401 and F411).

 

/**
* Example was tested on Nucleo-F411 and Nucleo-F401 where VCP is connected to:
*   - USART2, TX: PA2, RX: PA3, used baudrate: 115200
*   - USART2 DMA: DMA1 Stream5 Channel 4
*/

/* Include core modules */
#include "stm32f4xx.h"
#include "string.h"

/* Receive buffer for DMA */
#define DMA_RX_BUFFER_SIZE          64
uint8_t DMA_RX_Buffer[DMA_RX_BUFFER_SIZE];

/* Buffer after received data */
#define UART_BUFFER_SIZE            256
uint8_t UART_Buffer[UART_BUFFER_SIZE];
volatile size_t Write, Read;                    /* Shared between main loop and DMA interrupt */

USART_InitTypeDef USART_InitStruct;
DMA_InitTypeDef DMA_InitStruct;
GPIO_InitTypeDef GPIO_InitStruct;
NVIC_InitTypeDef NVIC_InitStruct;

int main(void) {
    /* Initialize system */
    SystemInit();
   
    /* Init GPIO pins for UART */
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_DMA1EN;
    RCC->APB1ENR |= RCC_APB1ENR_USART2EN;
    (void)RCC->AHB1ENR;
    (void)RCC->APB1ENR;
   
    /* Set alternate functions */
    GPIO_PinAFConfig(GPIOA, GPIO_PinSource2, GPIO_AF_USART2);
    GPIO_PinAFConfig(GPIOA, GPIO_PinSource3, GPIO_AF_USART2);
   
    /* Init GPIO pins */
    GPIO_StructInit(&GPIO_InitStruct);
    GPIO_InitStruct.GPIO_Pin = GPIO_Pin_2 | GPIO_Pin_3;
    GPIO_InitStruct.GPIO_Mode = GPIO_Mode_AF;
    GPIO_InitStruct.GPIO_OType = GPIO_OType_PP;
    GPIO_InitStruct.GPIO_PuPd = GPIO_PuPd_UP;
    GPIO_InitStruct.GPIO_Speed = GPIO_Speed_100MHz;
    GPIO_Init(GPIOA, &GPIO_InitStruct);
   
    /* Configure UART setup */
    USART_StructInit(&USART_InitStruct);
    USART_InitStruct.USART_BaudRate = 115200;   /* Matches the baudrate stated in the header comment */
    USART_InitStruct.USART_HardwareFlowControl = USART_HardwareFlowControl_None;
    USART_InitStruct.USART_Mode = USART_Mode_Tx | USART_Mode_Rx;
    USART_InitStruct.USART_Parity = USART_Parity_No;
    USART_InitStruct.USART_StopBits = USART_StopBits_1;
    USART_InitStruct.USART_WordLength = USART_WordLength_8b;
    USART_Init(USART2, &USART_InitStruct);
   
    /* Enable global interrupts for USART */
    NVIC_InitStruct.NVIC_IRQChannel = USART2_IRQn;
    NVIC_InitStruct.NVIC_IRQChannelCmd = ENABLE;
    NVIC_InitStruct.NVIC_IRQChannelPreemptionPriority = 0;
    NVIC_InitStruct.NVIC_IRQChannelSubPriority = 1;
    NVIC_Init(&NVIC_InitStruct);
   
    /* Enable USART */
    USART_Cmd(USART2, ENABLE);
    USART_DMACmd(USART2, USART_DMAReq_Rx, ENABLE);
    /* Enable IDLE line detection for DMA processing */
    USART_ITConfig(USART2, USART_IT_IDLE, ENABLE);
   
    /* Configure DMA for USART RX, DMA1, Stream5, Channel4 */
    DMA_StructInit(&DMA_InitStruct);
    DMA_InitStruct.DMA_Channel = DMA_Channel_4;
    DMA_InitStruct.DMA_Memory0BaseAddr = (uint32_t)DMA_RX_Buffer;
    DMA_InitStruct.DMA_BufferSize = DMA_RX_BUFFER_SIZE;
    DMA_InitStruct.DMA_PeripheralBaseAddr = (uint32_t)&USART2->DR;
    DMA_InitStruct.DMA_DIR = DMA_DIR_PeripheralToMemory;
    DMA_InitStruct.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
    DMA_InitStruct.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Byte;
    DMA_InitStruct.DMA_MemoryInc = DMA_MemoryInc_Enable;
    DMA_InitStruct.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
    DMA_Init(DMA1_Stream5, &DMA_InitStruct);
   
    /* Enable global interrupts for DMA stream */
    NVIC_InitStruct.NVIC_IRQChannel = DMA1_Stream5_IRQn;
    NVIC_InitStruct.NVIC_IRQChannelCmd = ENABLE;
    NVIC_InitStruct.NVIC_IRQChannelPreemptionPriority = 0;
    NVIC_InitStruct.NVIC_IRQChannelSubPriority = 0;
    NVIC_Init(&NVIC_InitStruct);
   
    /* Enable transfer complete interrupt */
    DMA_ITConfig(DMA1_Stream5, DMA_IT_TC, ENABLE);
    DMA_Cmd(DMA1_Stream5, ENABLE);
   
    while (1) {
        /**
         * Loop data back to UART data register
         */

        while (Read != Write) {                 /* Do it until buffer is empty */
            USART2->DR = UART_Buffer[Read++];   /* Start byte transfer */
            while (!(USART2->SR & USART_SR_TXE));   /* Wait till finished */
            if (Read == UART_BUFFER_SIZE) {     /* Wrap read index at end of buffer */
                Read = 0;
            }
        }
    }
}

/**
* \brief       Global interrupt handler for USART2
*/

void USART2_IRQHandler(void) {
    /* Check for IDLE flag */
    if (USART2->SR & USART_FLAG_IDLE) {         /* We want IDLE flag only */
        /* This part is important */
        /* Clear IDLE flag by reading status register first */
        /* And follow by reading data register */
        volatile uint32_t tmp;                  /* Must be volatile to prevent optimizations */
        tmp = USART2->SR;                       /* Read status register */
        tmp = USART2->DR;                       /* Read data register */
        (void)tmp;                              /* Prevent compiler warnings */
        DMA1_Stream5->CR &= ~DMA_SxCR_EN;       /* Disabling DMA will force transfer complete interrupt if enabled */
    }
}

/**
* \brief       Global interrupt handler for DMA1 stream5
* \note        No functions other than memcpy are called
*/

void DMA1_Stream5_IRQHandler(void) {
    size_t len, tocopy;
    uint8_t* ptr;
   
    /* Check transfer complete flag */
    if (DMA1->HISR & DMA_FLAG_TCIF5) {
        DMA1->HIFCR = DMA_FLAG_TCIF5;           /* Clear transfer complete flag */
       
        /* Calculate number of bytes actually transferred by DMA so far */
        /**
         * Transfer could be completed by 2 events:
         *  - All data actually transferred (NDTR = 0)
         *  - Stream disabled inside USART IDLE line detected interrupt (NDTR != 0)
         */

        len = DMA_RX_BUFFER_SIZE - DMA1_Stream5->NDTR;
        tocopy = UART_BUFFER_SIZE - Write;      /* Get number of bytes we can copy to the end of buffer */
       
        /* Check how many bytes to copy */
        if (tocopy > len) {
            tocopy = len;
        }
       
        /* Write received data for UART main buffer for manipulation later */
        ptr = DMA_RX_Buffer;
        memcpy(&UART_Buffer[Write], ptr, tocopy);   /* Copy first part */
       
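        /* Correct values for remaining data */
        Write += tocopy;
        if (Write == UART_BUFFER_SIZE) {        /* Wrap write index when data ends exactly at end of buffer */
            Write = 0;
        }
        len -= tocopy;
        ptr += tocopy;

        /* If there is still data, write it to the beginning of the buffer */
        if (len) {
            memcpy(&UART_Buffer[0], ptr, len);      /* Don't care if we overwrite unread data now */
            Write = len;
        }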
       
        /* Prepare DMA for next transfer */
        /* Important! DMA stream won't start if all flags are not cleared first */
        DMA1->HIFCR = DMA_FLAG_DMEIF5 | DMA_FLAG_FEIF5 | DMA_FLAG_HTIF5 | DMA_FLAG_TCIF5 | DMA_FLAG_TEIF5;
        DMA1_Stream5->M0AR = (uint32_t)DMA_RX_Buffer;   /* Set memory address for DMA again */
        DMA1_Stream5->NDTR = DMA_RX_BUFFER_SIZE;    /* Set number of bytes to receive */
        DMA1_Stream5->CR |= DMA_SxCR_EN;            /* Start DMA transfer */
    }
}
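
The two-part copy in the DMA handler can be tried on a PC as a standalone function. This is a minimal sketch, not the handler itself: ring_write is a hypothetical helper, it assumes len <= ring_size and write < ring_size, and like the handler it does not protect unread data from being overwritten.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies len bytes from src into ring[ring_size], starting at index write,
 * and returns the new write index.
 * Assumes len <= ring_size and write < ring_size. */
size_t ring_write(uint8_t* ring, size_t ring_size, size_t write,
                  const uint8_t* src, size_t len) {
    size_t first = ring_size - write;           /* Space up to the end of the ring */
    if (first > len) {
        first = len;
    }
    memcpy(&ring[write], src, first);           /* First part, up to the end */
    if (len > first) {
        memcpy(&ring[0], src + first, len - first); /* Remainder, from the start */
    }
    return (write + len) % ring_size;           /* Becomes 0 when data ends exactly at the end */
}
```

Writing 5 bytes at index 6 of an 8-byte ring places 2 bytes at the end and 3 at the start, and the returned write index is 3.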

The code below was generated with STM32CubeMX and later modified to use LL drivers.

/* Includes ------------------------------------------------------------------*/
#include "main.h"
#include "stm32f4xx_hal.h"
#include "stm32f4xx_ll_dma.h"
#include "stm32f4xx_ll_usart.h"
#include "stm32f4xx_ll_gpio.h"
#include "stm32f4xx_ll_rcc.h"
#include "stm32f4xx_ll_bus.h"
#include "string.h"

void SystemClock_Config(void);

#define DMA_RX_BUFFER_SIZE          64
uint8_t DMA_RX_Buffer[DMA_RX_BUFFER_SIZE];

#define UART_BUFFER_SIZE            256
uint8_t UART_Buffer[UART_BUFFER_SIZE];
volatile size_t Read, Write;

LL_USART_InitTypeDef USART_InitStruct;
LL_DMA_InitTypeDef DMA_InitStruct;

int main(void) {
    /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
    HAL_Init();
    SystemClock_Config();

    /* Enable all clocks */
    LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_GPIOA);
    LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_DMA1);
    LL_APB1_GRP1_EnableClock(LL_APB1_GRP1_PERIPH_USART2);
   
    /* Init GPIO pins */
    LL_GPIO_SetAFPin_0_7(GPIOA, LL_GPIO_PIN_2, GPIO_AF7_USART2);
    LL_GPIO_SetAFPin_0_7(GPIOA, LL_GPIO_PIN_3, GPIO_AF7_USART2);
   
    LL_GPIO_SetPinMode(GPIOA, LL_GPIO_PIN_2, LL_GPIO_MODE_ALTERNATE);
    LL_GPIO_SetPinOutputType(GPIOA, LL_GPIO_PIN_2, LL_GPIO_OUTPUT_PUSHPULL);
    LL_GPIO_SetPinPull(GPIOA, LL_GPIO_PIN_2, LL_GPIO_PULL_UP);
    LL_GPIO_SetPinSpeed(GPIOA, LL_GPIO_PIN_2, LL_GPIO_SPEED_FREQ_HIGH);
    LL_GPIO_SetPinMode(GPIOA, LL_GPIO_PIN_3, LL_GPIO_MODE_ALTERNATE);
    LL_GPIO_SetPinOutputType(GPIOA, LL_GPIO_PIN_3, LL_GPIO_OUTPUT_PUSHPULL);
    LL_GPIO_SetPinPull(GPIOA, LL_GPIO_PIN_3, LL_GPIO_PULL_UP);
    LL_GPIO_SetPinSpeed(GPIOA, LL_GPIO_PIN_3, LL_GPIO_SPEED_FREQ_HIGH);
   
    /* Configure USART */
    LL_USART_StructInit(&USART_InitStruct);
    USART_InitStruct.BaudRate = 115200;
    USART_InitStruct.DataWidth = LL_USART_DATAWIDTH_8B;
    USART_InitStruct.HardwareFlowControl = LL_USART_HWCONTROL_NONE;
    USART_InitStruct.OverSampling = LL_USART_OVERSAMPLING_16;
    USART_InitStruct.Parity = LL_USART_PARITY_NONE;
    USART_InitStruct.StopBits = LL_USART_STOPBITS_1;
    USART_InitStruct.TransferDirection = LL_USART_DIRECTION_TX_RX;
    LL_USART_Init(USART2, &USART_InitStruct);
   
    /* Enable USART and enable interrupt for IDLE line detection */
    LL_USART_Enable(USART2);
    LL_USART_EnableDMAReq_RX(USART2);
    LL_USART_EnableIT_IDLE(USART2);
   
    /* Enable USART global interrupts */
    NVIC_SetPriority(USART2_IRQn, 1);
    NVIC_EnableIRQ(USART2_IRQn);
   
    /* Configure DMA for USART RX */
    LL_DMA_StructInit(&DMA_InitStruct);
    DMA_InitStruct.Channel = LL_DMA_CHANNEL_4;
    DMA_InitStruct.Direction = LL_DMA_DIRECTION_PERIPH_TO_MEMORY;
    DMA_InitStruct.MemoryOrM2MDstAddress = (uint32_t)DMA_RX_Buffer;
    DMA_InitStruct.NbData = DMA_RX_BUFFER_SIZE;
    DMA_InitStruct.MemoryOrM2MDstIncMode = LL_DMA_MEMORY_INCREMENT;
    DMA_InitStruct.PeriphOrM2MSrcAddress = (uint32_t)&USART2->DR;
    LL_DMA_Init(DMA1, LL_DMA_STREAM_5, &DMA_InitStruct);
   
    LL_DMA_EnableIT_TC(DMA1, LL_DMA_STREAM_5);
    LL_DMA_EnableStream(DMA1, LL_DMA_STREAM_5);
   
    /* Enable global DMA stream interrupts */
    NVIC_SetPriority(DMA1_Stream5_IRQn, 0);
    NVIC_EnableIRQ(DMA1_Stream5_IRQn);
   
    while (1) {
        if (Read != Write) {
            LL_USART_TransmitData8(USART2, UART_Buffer[Read++]);
            while (!LL_USART_IsActiveFlag_TXE(USART2)) {}
            if (Read == UART_BUFFER_SIZE) {
                Read = 0;
            }
        }
    }
}

void USART2_IRQHandler(void) {
    if (LL_USART_IsActiveFlag_IDLE(USART2)) {
        LL_USART_ClearFlag_IDLE(USART2);
        LL_DMA_DisableStream(DMA1, LL_DMA_STREAM_5);
    }
}

void DMA1_Stream5_IRQHandler(void) {
    size_t len, tocopy;
    uint8_t* ptr;
   
    if (LL_DMA_IsActiveFlag_TC5(DMA1)) {
        LL_DMA_ClearFlag_TC5(DMA1);

        len = DMA_RX_BUFFER_SIZE - DMA1_Stream5->NDTR;
        tocopy = UART_BUFFER_SIZE - Write;      /* Get number of bytes we can copy to the end of buffer */
       
        /* Check how many bytes to copy */
        if (tocopy > len) {
            tocopy = len;
        }
       
        /* Write received data for UART main buffer for manipulation later */
        ptr = DMA_RX_Buffer;
        memcpy(&UART_Buffer[Write], ptr, tocopy);   /* Copy first part */
       
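        /* Correct values for remaining data */
        Write += tocopy;
        if (Write == UART_BUFFER_SIZE) {        /* Wrap write index when data ends exactly at end of buffer */
            Write = 0;
        }
        len -= tocopy;
        ptr += tocopy;

        /* If there is still data, write it to the beginning of the buffer */
        if (len) {
            memcpy(&UART_Buffer[0], ptr, len);      /* Don't care if we overwrite unread data now */
            Write = len;
        }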
       
        /* Prepare DMA for next transfer */
        /* Important! DMA stream won't start if all flags are not cleared first */
        DMA1->HIFCR = DMA_FLAG_DMEIF1_5 | DMA_FLAG_FEIF1_5 | DMA_FLAG_HTIF1_5 | DMA_FLAG_TCIF1_5 | DMA_FLAG_TEIF1_5;
        DMA1_Stream5->M0AR = (uint32_t)DMA_RX_Buffer;   /* Set memory address for DMA again */
        DMA1_Stream5->NDTR = DMA_RX_BUFFER_SIZE;    /* Set number of bytes to receive */
        DMA1_Stream5->CR |= DMA_SxCR_EN;            /* Start DMA transfer */
    }
}

void SystemClock_Config(void) {
  RCC_OscInitTypeDef RCC_OscInitStruct;
  RCC_ClkInitTypeDef RCC_ClkInitStruct;

    /**Configure the main internal regulator output voltage
    */

  __HAL_RCC_PWR_CLK_ENABLE();

  __HAL_PWR_VOLTAGESCALING_CONFIG(PWR_REGULATOR_VOLTAGE_SCALE1);

    /**Initializes the CPU, AHB and APB busses clocks
    */

  RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI;
  RCC_OscInitStruct.HSIState = RCC_HSI_ON;
  RCC_OscInitStruct.HSICalibrationValue = 16;
  RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
  RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSI;
  RCC_OscInitStruct.PLL.PLLM = 16;
  RCC_OscInitStruct.PLL.PLLN = 400;
  RCC_OscInitStruct.PLL.PLLP = RCC_PLLP_DIV4;
  RCC_OscInitStruct.PLL.PLLQ = 4;
  if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK)
  {
     
  }

    /**Initializes the CPU, AHB and APB busses clocks
    */

  RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK|RCC_CLOCKTYPE_SYSCLK
                              |RCC_CLOCKTYPE_PCLK1|RCC_CLOCKTYPE_PCLK2;
  RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
  RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
  RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV2;
  RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1;

  if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_3) != HAL_OK)
  {
     
  }

    /**Configure the Systick interrupt time
    */

  HAL_SYSTICK_Config(HAL_RCC_GetHCLKFreq()/1000);

    /**Configure the Systick
    */

  HAL_SYSTICK_CLKSourceConfig(SYSTICK_CLKSOURCE_HCLK);

  /* SysTick_IRQn interrupt configuration */
  HAL_NVIC_SetPriority(SysTick_IRQn, 0, 0);
}

 

Edit 1: Added support for non-interrupt polling for devices with no IDLE line

 

After a nice discussion in the comments, and to support other devices where DMA streams are not available (a different DMA structure), here is an implementation that uses neither IDLE line detection nor any DMA interrupts.

The DMA works in circular mode, and the user is responsible for reading data from the DMA buffer periodically; otherwise data will be lost, because the DMA overwrites data that has not been read yet.
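
The polling idea can be tried on a PC before moving to hardware. The following is a host-side sketch under stated assumptions: all sim_ names are hypothetical, the DMA buffer is a plain array filled by hand, and the current write position is passed in as a parameter instead of being read from the DMA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_DMA_SIZE 8

uint8_t sim_dma_buf[SIM_DMA_SIZE];      /* Stands in for the circular DMA buffer */
uint8_t sim_out[64];                    /* Collects the bytes handed to processing */
size_t  sim_out_len;
size_t  sim_old_pos;                    /* Position processed up to on the last call */

/* Appends processed bytes to sim_out, clamped to the space left */
void sim_process(const uint8_t* d, size_t len) {
    if (len > sizeof(sim_out) - sim_out_len) {
        len = sizeof(sim_out) - sim_out_len;
    }
    memcpy(&sim_out[sim_out_len], d, len);
    sim_out_len += len;
}

/* pos is the current DMA write position (buffer size minus remaining count) */
void sim_check(size_t pos) {
    if (pos == sim_old_pos) {
        return;                         /* Nothing new */
    }
    if (pos > sim_old_pos) {            /* New data is one contiguous region */
        sim_process(&sim_dma_buf[sim_old_pos], pos - sim_old_pos);
    } else {                            /* Write position wrapped around */
        sim_process(&sim_dma_buf[sim_old_pos], SIM_DMA_SIZE - sim_old_pos);
        if (pos > 0) {
            sim_process(&sim_dma_buf[0], pos);
        }
    }
    sim_old_pos = pos;
}
```

If the position moves by a full buffer or more between two calls, this scheme cannot tell, which is why the function has to be called often enough.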

 

Most of the useful comments are inside the code. The example was developed on the NUCLEO-L073 development board, but the concept should work on every STM32.

 

The example reads data over USART2 (VCP on NUCLEO) and does a simple loopback at 921600 baud. The function usart_process_data can be used for user processing. In this example, data is simply written back to the USART data register.

 

I always use this approach where IDLE line detection is not available. This is usually the case on smaller devices (though not always), where large amounts of data over UART are less common.

 

#include "main.h"

/* List of private functions */
static void LL_Init(void);
void SystemClock_Config(void);
static void MX_USART2_UART_Init(void);
void check_dma_receive(void);

int
main(void) {
    LL_Init();
    SystemClock_Config();
   
    MX_USART2_UART_Init();                      /* Init USART 2 with DMA channel included */
   
    while (1) {
        /**
         * Periodically check for USART data over DMA
         * This function can be called also from interrupt (periodic timer, etc) or single thread in RTOS systems
         */

        check_dma_receive();                    /* Periodically check for new DMA data */
    }
}

/**
* \brief           DMA buffer size
* \note            Depends on how fast you process it and stream length you may expect on DMA
*/

#define DMA_RX_BUFF_SIZE            16         /* DMA buffer size, make it aligned to 2 bytes */
uint8_t dma_rx_buff[DMA_RX_BUFF_SIZE];


/**
* \brief           Actually process USART data received over UART
* \note            Called from \ref check_dma_receive function
* \param[in]       d: Pointer to data to process
* \param[in]       len: Length of data in units of bytes
*/

void
usart_process_data(const void* d, uint32_t len) {
    const uint8_t* b = d;
    while (len--) {                             /* Simply process all bytes */
        LL_USART_TransmitData8(USART2, *b++);   /* And send them over USART */
        while (!LL_USART_IsActiveFlag_TXE(USART2));
    }
}

/**
* \brief           Periodically check for new incoming data from DMA
* \note            Call this function either in main loop (one thread in RTOS, or main while loop) or from timer interrupt
* \note            This function must be called faster than time used to fill entire DMA buffer
*                  Frequency of this function call depends on:
*                    - USART baudrate
*                    - USART number of bytes at a time (stream length if exists)
*                    - DMA RX buffer size
*
* Example uses 921600 baudrate configuration, which means that every byte takes around 10us (10 bits per byte with 8N1 framing)
* If you call this function every 1ms, up to about 100 bytes can arrive between two calls, so the buffer must be larger than that; a 1k buffer leaves a comfortable margin.
* This applies if you have a big stream of RX data on USART. In case you have a stream of 10 bytes every 20ms,
* you can decrease buffer size.
*
* Example would be ~100-200 bytes (depends on NMEA active statements) long stream of data over GPS (NMEA) up to every 100ms
*/

void
check_dma_receive(void) {
    static uint32_t old_pos;
    uint32_t pos;
   
    /**
     * When DMA is active, LL_DMA_GetDataLength will return remaining elements to transfer (in our case bytes)
     * Subtraction is used to invert to number of bytes currently written in buffer
     */

    pos = sizeof(dma_rx_buff) - LL_DMA_GetDataLength(DMA1, LL_DMA_CHANNEL_5);   /* Get current position */
    if (pos != old_pos) {                       /* Something has changed */
        if (pos > old_pos) {                    /* New data is in one linear region, simple buffer management */
            usart_process_data(&dma_rx_buff[old_pos], pos - old_pos);  /* Process data */
        } else {                                /* DMA write position has already wrapped around */
            usart_process_data(&dma_rx_buff[old_pos], sizeof(dma_rx_buff) - old_pos);  /* Send remaining data to the end of buffer */
            old_pos = 0;                        /* Go to beginning */
            if (pos > 0) {                      /* Still something to read? */
                usart_process_data(&dma_rx_buff[0], pos);   /* Process remaining bytes */
            }
        }
        old_pos = pos;                          /* Update new value for next call */
    }
}
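
The sizing rule from the comment above check_dma_receive can be written down as arithmetic. This is a sketch assuming 10 bits per byte on the wire (8N1); the function name max_bytes_per_poll is hypothetical.

```c
#include <stdint.h>

/* Upper bound on the number of bytes that can arrive between two polls,
 * rounded up. Assumes 10 bits per byte on the wire (8N1 framing). */
uint32_t max_bytes_per_poll(uint32_t baudrate, uint32_t poll_interval_us) {
    uint64_t bits_x_us = (uint64_t)baudrate * poll_interval_us;
    uint64_t denom = 10ULL * 1000000ULL;    /* 10 bits per byte, 1e6 us per second */
    return (uint32_t)((bits_x_us + denom - 1ULL) / denom);
}
```

At 921600 baud and a 1 ms polling interval the bound is 93 bytes, so the DMA buffer has to be larger than that, plus whatever margin covers jitter in the polling period.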

/**
* \brief           Init USART2 at 921600 bauds, TX = PA2, RX = PA3
* \note            This configuration is used on Nucleo-L073 board
*/

static void
MX_USART2_UART_Init(void) {
    LL_USART_InitTypeDef USART_InitStruct;
    LL_GPIO_InitTypeDef GPIO_InitStruct;
   
    /* Init with LL driver */
    /* DMA controller clock enable */
    LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_DMA1);
    /* Peripheral clock enable */
    LL_APB1_GRP1_EnableClock(LL_APB1_GRP1_PERIPH_USART2);
    LL_IOP_GRP1_EnableClock(LL_IOP_GRP1_PERIPH_GPIOA);

    GPIO_InitStruct.Pin = LL_GPIO_PIN_2;
    GPIO_InitStruct.Mode = LL_GPIO_MODE_ALTERNATE;
    GPIO_InitStruct.Speed = LL_GPIO_SPEED_FREQ_VERY_HIGH;
    GPIO_InitStruct.OutputType = LL_GPIO_OUTPUT_PUSHPULL;
    GPIO_InitStruct.Pull = LL_GPIO_PULL_UP;
    GPIO_InitStruct.Alternate = LL_GPIO_AF_4;
    LL_GPIO_Init(GPIOA, &GPIO_InitStruct);

    GPIO_InitStruct.Pin = LL_GPIO_PIN_3;
    GPIO_InitStruct.Mode = LL_GPIO_MODE_ALTERNATE;
    GPIO_InitStruct.Speed = LL_GPIO_SPEED_FREQ_VERY_HIGH;
    GPIO_InitStruct.OutputType = LL_GPIO_OUTPUT_PUSHPULL;
    GPIO_InitStruct.Pull = LL_GPIO_PULL_UP;
    GPIO_InitStruct.Alternate = LL_GPIO_AF_4;
    LL_GPIO_Init(GPIOA, &GPIO_InitStruct);

    LL_DMA_SetPeriphRequest(DMA1, LL_DMA_CHANNEL_5, LL_DMA_REQUEST_4);
    LL_DMA_SetDataTransferDirection(DMA1, LL_DMA_CHANNEL_5, LL_DMA_DIRECTION_PERIPH_TO_MEMORY);
    LL_DMA_SetChannelPriorityLevel(DMA1, LL_DMA_CHANNEL_5, LL_DMA_PRIORITY_LOW);
    LL_DMA_SetMode(DMA1, LL_DMA_CHANNEL_5, LL_DMA_MODE_CIRCULAR);
    LL_DMA_SetPeriphIncMode(DMA1, LL_DMA_CHANNEL_5, LL_DMA_PERIPH_NOINCREMENT);
    LL_DMA_SetMemoryIncMode(DMA1, LL_DMA_CHANNEL_5, LL_DMA_MEMORY_INCREMENT);
    LL_DMA_SetPeriphSize(DMA1, LL_DMA_CHANNEL_5, LL_DMA_PDATAALIGN_BYTE);
    LL_DMA_SetMemorySize(DMA1, LL_DMA_CHANNEL_5, LL_DMA_MDATAALIGN_BYTE);
    LL_DMA_SetMemoryAddress(DMA1, LL_DMA_CHANNEL_5, (uint32_t)dma_rx_buff);
    LL_DMA_SetDataLength(DMA1, LL_DMA_CHANNEL_5, sizeof(dma_rx_buff));
    LL_DMA_SetPeriphAddress(DMA1, LL_DMA_CHANNEL_5, (uint32_t)&USART2->RDR);

    USART_InitStruct.BaudRate = 921600;
    USART_InitStruct.DataWidth = LL_USART_DATAWIDTH_8B;
    USART_InitStruct.StopBits = LL_USART_STOPBITS_1;
    USART_InitStruct.Parity = LL_USART_PARITY_NONE;
    USART_InitStruct.TransferDirection = LL_USART_DIRECTION_TX_RX;
    USART_InitStruct.HardwareFlowControl = LL_USART_HWCONTROL_NONE;
    USART_InitStruct.OverSampling = LL_USART_OVERSAMPLING_16;
    LL_USART_Init(USART2, &USART_InitStruct);
    LL_USART_DisableOverrunDetect(USART2);
    LL_USART_ConfigAsyncMode(USART2);
    LL_USART_Enable(USART2);
    LL_USART_EnableDMAReq_RX(USART2);
   
    LL_DMA_EnableChannel(DMA1, LL_DMA_CHANNEL_5);

    /* DMA interrupt init (not required for polling: no DMA interrupt source is enabled in this example) */
    /* DMA1_Channel4_5_6_7_IRQn interrupt configuration */
    NVIC_SetPriority(DMA1_Channel4_5_6_7_IRQn, 0);
    NVIC_EnableIRQ(DMA1_Channel4_5_6_7_IRQn);
}

static void
LL_Init(void) {
    LL_APB2_GRP1_EnableClock(LL_APB2_GRP1_PERIPH_SYSCFG);
    LL_APB1_GRP1_EnableClock(LL_APB1_GRP1_PERIPH_PWR);

    /* System interrupt init */
    NVIC_SetPriority(SVC_IRQn, 0);
    NVIC_SetPriority(PendSV_IRQn, 0);
    NVIC_SetPriority(SysTick_IRQn, 0);
}

/**
* System Clock Configuration
*/

void
SystemClock_Config(void) {
    LL_FLASH_SetLatency(LL_FLASH_LATENCY_1);
    if (LL_FLASH_GetLatency() != LL_FLASH_LATENCY_1) {
        while (1);
    }
    LL_PWR_SetRegulVoltageScaling(LL_PWR_REGU_VOLTAGE_SCALE1);
    LL_RCC_HSI_Enable();

    /* Wait till HSI is ready */
    while (LL_RCC_HSI_IsReady() != 1);
    LL_RCC_HSI_SetCalibTrimming(16);
    LL_RCC_PLL_ConfigDomain_SYS(LL_RCC_PLLSOURCE_HSI, LL_RCC_PLL_MUL_4, LL_RCC_PLL_DIV_2);
    LL_RCC_PLL_Enable();

    /* Wait till PLL is ready */
    while (LL_RCC_PLL_IsReady() != 1);
    LL_RCC_SetAHBPrescaler(LL_RCC_SYSCLK_DIV_1);
    LL_RCC_SetAPB1Prescaler(LL_RCC_APB1_DIV_1);
    LL_RCC_SetAPB2Prescaler(LL_RCC_APB2_DIV_1);
    LL_RCC_SetSysClkSource(LL_RCC_SYS_CLKSOURCE_PLL);

    /* Wait till System clock is ready */
    while (LL_RCC_GetSysClkSource() != LL_RCC_SYS_CLKSOURCE_STATUS_PLL);
    LL_Init1msTick(32000000);
    LL_SYSTICK_SetClkSource(LL_SYSTICK_CLKSOURCE_HCLK);
    LL_SetSystemCoreClock(32000000);
    LL_RCC_SetUSARTClockSource(LL_RCC_USART2_CLKSOURCE_PCLK1);

    /* SysTick_IRQn interrupt configuration */
    NVIC_SetPriority(SysTick_IRQn, 0);
}

 

Edit 2: An example with HT and TC DMA interrupts + IDLE line detection

On devices where IDLE line detection is available, the approach below can be used to handle UART RX data with DMA. Every DMA in STM32 supports half-transfer and transfer complete interrupts, so the only question is whether IDLE line detection is supported in a particular family.

 

This example uses a single buffer that acts like a double buffer. You have to make sure that processing the data in the interrupt takes less time than filling half of the buffer, or data may be lost (overwritten). IDLE line detection is added to detect when no more data is arriving on the USART and to trigger immediate processing of the remaining data not yet handled by the HT/TC interrupts.

 

The example was developed on the Nucleo-F401 board, using USART2 (connected to the VCP) at 921600 baud.

 

The function process_usart_data can be called from 3 different places:

  1. When half-transfer DMA interrupt occurs
  2. When transfer complete DMA interrupt occurs
  3. When IDLE line is detected on UART RX line
    1. If the IDLE line is detected just after an HT/TC interrupt, no data should be processed, because the DMA interrupt has already handled it.
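
The interplay of the three events can be sketched as a small host-side state machine. This is an illustration only: the sim_ names and the event enum are hypothetical, the buffer is shrunk to 8 bytes, and the function returns how many bytes a handler would process instead of processing them.

```c
#include <stdint.h>

#define SIM_BUF_SIZE 8U

typedef enum { EV_HT, EV_TC, EV_IDLE } sim_event_t;

uint16_t sim_pos_old;                   /* Start of the not yet processed region */

/* Returns how many bytes the handler for this event would process.
 * ndtr is the remaining-transfer count read when the event is handled. */
uint16_t sim_on_event(sim_event_t ev, uint16_t ndtr) {
    uint16_t pos, len;

    if (ev == EV_TC) {
        pos = SIM_BUF_SIZE;             /* TC: process up to the end of the buffer */
    } else {
        pos = (uint16_t)(SIM_BUF_SIZE - ndtr);
    }
    len = (pos > sim_pos_old) ? (uint16_t)(pos - sim_pos_old) : 0U;
    sim_pos_old = (ev == EV_TC) ? 0U : pos; /* After TC the DMA restarts at index 0 */
    return len;
}
```

Feeding it IDLE with 5 remaining, then HT with 4 remaining, then IDLE again with 4 remaining yields 3, 1 and 0 bytes: the IDLE event that follows directly after HT finds nothing left to process.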

I always use this approach with STM32 where IDLE line is available.

 

#include "main.h"

static void LL_Init(void);
void SystemClock_Config(void);
static void MX_USART2_UART_Init(void);

int
main(void) {
    LL_Init();
    SystemClock_Config();

    /* Initialize all configured peripherals */
    MX_USART2_UART_Init();                      /* Init USART and DMA and start receiving */
                                               
    while (1) {

    }
}

/**
* \brief           Process DMA received data
* \note            This example will do a loopback and send them back to USART
* \param[in]       data: Pointer to data to send
* \param[in]       len: Number of bytes to process
*/

void
process_usart_data(const void* data, uint16_t len) {
    const uint8_t* b = data;
    while (len--) {
        LL_USART_TransmitData8(USART2, *b++);
        while (!LL_USART_IsActiveFlag_TXE(USART2));
    }
}

/**
* \brief           DMA buffer size
* \note            Size depends on your speed and processing power
*/

#define DMA_RX_BUFF_SIZE            64         /* DMA buffer size, make it aligned to 2 bytes */
uint8_t dma_rx_buff[DMA_RX_BUFF_SIZE];

uint16_t old_pos;

/**
* \brief           Handles USART DMA interrupt
* \note            Handles HT/TC interrupts only
*/

void
DMA1_Stream5_IRQHandler(void) {
    uint16_t pos;
    if (LL_DMA_IsActiveFlag_TC5(DMA1)) {        /* In case of transfer complete flag, process array to the end only */
        pos = 0;
    } else {
        pos = LL_DMA_GetDataLength(DMA1, LL_DMA_STREAM_5);  /* Get remaining data in buffer */
    }
    pos = sizeof(dma_rx_buff) - pos;            /* Get current position in buffer */
    if (LL_DMA_IsActiveFlag_TC5(DMA1)) {        /* Transfer complete */
        LL_DMA_ClearFlag_TC5(DMA1);             /* Clear transfer complete flag */
        process_usart_data(&dma_rx_buff[old_pos], pos - old_pos);
        old_pos = 0;                            /* Reset old position for next read */
    } else if (LL_DMA_IsActiveFlag_HT5(DMA1)) { /* Half transfer complete */
        LL_DMA_ClearFlag_HT5(DMA1);             /* Clear half transfer flag */
        process_usart_data(&dma_rx_buff[old_pos], pos - old_pos);
        old_pos = pos;                          /* Remember position for next time */
    }
}

/**
* \brief           USART 2 interrupt handler
* \note            IDLE line detection is triggered in case there is no data on line for more than 1 frame
*/

void
USART2_IRQHandler(void) {
    uint16_t pos;
    /**
     * Check for idle line detection on USART
     */

    if (LL_USART_IsActiveFlag_IDLE(USART2)) {
        LL_USART_ClearFlag_IDLE(USART2);
        pos = sizeof(dma_rx_buff) - LL_DMA_GetDataLength(DMA1, LL_DMA_STREAM_5);
       
        /**
         * In case IDLE line is detected just after DMA HT/TC,
         * there is no data to read because DMA HT/TC is triggered before
         */

        if ((pos - old_pos) > 0) {              /* Anything to process? */
            process_usart_data(&dma_rx_buff[old_pos], pos - old_pos);
        }
        old_pos = pos;
    }
}

/**
* \brief           Init USART2 at 921600 bauds, TX = PA2, RX = PA3, Nucleo-F401 board setup
*/

static void
MX_USART2_UART_Init(void) {
    LL_USART_InitTypeDef USART_InitStruct;

    LL_GPIO_InitTypeDef GPIO_InitStruct;

    LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_GPIOA);
    LL_APB1_GRP1_EnableClock(LL_APB1_GRP1_PERIPH_USART2);
    LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_DMA1);

    GPIO_InitStruct.Pin = LL_GPIO_PIN_2 | LL_GPIO_PIN_3;
    GPIO_InitStruct.Mode = LL_GPIO_MODE_ALTERNATE;
    GPIO_InitStruct.Speed = LL_GPIO_SPEED_FREQ_VERY_HIGH;
    GPIO_InitStruct.OutputType = LL_GPIO_OUTPUT_PUSHPULL;
    GPIO_InitStruct.Pull = LL_GPIO_PULL_UP;
    GPIO_InitStruct.Alternate = LL_GPIO_AF_7;
    LL_GPIO_Init(GPIOA, &GPIO_InitStruct);

    /* USART2_RX Init */
    LL_DMA_SetChannelSelection(DMA1, LL_DMA_STREAM_5, LL_DMA_CHANNEL_4);
    LL_DMA_SetDataTransferDirection(DMA1, LL_DMA_STREAM_5, LL_DMA_DIRECTION_PERIPH_TO_MEMORY);
    LL_DMA_SetStreamPriorityLevel(DMA1, LL_DMA_STREAM_5, LL_DMA_PRIORITY_LOW);
    LL_DMA_SetMode(DMA1, LL_DMA_STREAM_5, LL_DMA_MODE_CIRCULAR);
    LL_DMA_SetPeriphIncMode(DMA1, LL_DMA_STREAM_5, LL_DMA_PERIPH_NOINCREMENT);
    LL_DMA_SetMemoryIncMode(DMA1, LL_DMA_STREAM_5, LL_DMA_MEMORY_INCREMENT);
    LL_DMA_SetPeriphSize(DMA1, LL_DMA_STREAM_5, LL_DMA_PDATAALIGN_BYTE);
    LL_DMA_SetMemorySize(DMA1, LL_DMA_STREAM_5, LL_DMA_MDATAALIGN_BYTE);
    LL_DMA_DisableFifoMode(DMA1, LL_DMA_STREAM_5);
    LL_DMA_SetMemoryAddress(DMA1, LL_DMA_STREAM_5, (uint32_t)dma_rx_buff);
    LL_DMA_SetDataLength(DMA1, LL_DMA_STREAM_5, sizeof(dma_rx_buff));
    LL_DMA_SetPeriphAddress(DMA1, LL_DMA_STREAM_5, (uint32_t)&USART2->DR);
    LL_DMA_EnableIT_HT(DMA1, LL_DMA_STREAM_5);
    LL_DMA_EnableIT_TC(DMA1, LL_DMA_STREAM_5);
    LL_DMA_EnableStream(DMA1, LL_DMA_STREAM_5);

    USART_InitStruct.BaudRate = 921600;
    USART_InitStruct.DataWidth = LL_USART_DATAWIDTH_8B;
    USART_InitStruct.StopBits = LL_USART_STOPBITS_1;
    USART_InitStruct.Parity = LL_USART_PARITY_NONE;
    USART_InitStruct.TransferDirection = LL_USART_DIRECTION_TX_RX;
    USART_InitStruct.HardwareFlowControl = LL_USART_HWCONTROL_NONE;
    USART_InitStruct.OverSampling = LL_USART_OVERSAMPLING_16;
    LL_USART_Init(USART2, &USART_InitStruct);
    LL_USART_ConfigAsyncMode(USART2);
    LL_USART_EnableDMAReq_RX(USART2);
    LL_USART_EnableIT_IDLE(USART2);
    LL_USART_Enable(USART2);

    /* Make them the same priority, no preemption allowed */
    NVIC_SetPriority(DMA1_Stream5_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(), 0, 0));
    NVIC_EnableIRQ(DMA1_Stream5_IRQn);
    NVIC_SetPriority(USART2_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(), 0, 0));
    NVIC_EnableIRQ(USART2_IRQn);
}

static void
LL_Init(void) {
    NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_4);
    NVIC_SetPriority(MemoryManagement_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(),0, 0));
    NVIC_SetPriority(BusFault_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(),0, 0));
    NVIC_SetPriority(UsageFault_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(),0, 0));
    NVIC_SetPriority(SVCall_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(),0, 0));
    NVIC_SetPriority(DebugMonitor_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(),0, 0));
    NVIC_SetPriority(PendSV_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(),0, 0));
    NVIC_SetPriority(SysTick_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(),0, 0));
}

/**
* System Clock Configuration
*/

void
SystemClock_Config(void) {
    LL_FLASH_SetLatency(LL_FLASH_LATENCY_2);
    if (LL_FLASH_GetLatency() != LL_FLASH_LATENCY_2) {
        while (1);
    }
    LL_PWR_SetRegulVoltageScaling(LL_PWR_REGU_VOLTAGE_SCALE2);
    LL_RCC_HSI_SetCalibTrimming(16);
    LL_RCC_HSI_Enable();
    /* Wait till HSI is ready */
    while(LL_RCC_HSI_IsReady() != 1);
    LL_RCC_PLL_ConfigDomain_SYS(LL_RCC_PLLSOURCE_HSI, LL_RCC_PLLM_DIV_8, 84, LL_RCC_PLLP_DIV_2);
    LL_RCC_PLL_Enable();

    /* Wait till PLL is ready */
    while(LL_RCC_PLL_IsReady() != 1);
    LL_RCC_SetAHBPrescaler(LL_RCC_SYSCLK_DIV_1);
    LL_RCC_SetAPB1Prescaler(LL_RCC_APB1_DIV_2);
    LL_RCC_SetAPB2Prescaler(LL_RCC_APB2_DIV_1);
    LL_RCC_SetSysClkSource(LL_RCC_SYS_CLKSOURCE_PLL);

    /* Wait till System clock is ready */
    while(LL_RCC_GetSysClkSource() != LL_RCC_SYS_CLKSOURCE_STATUS_PLL);
    LL_Init1msTick(84000000);
    LL_SYSTICK_SetClkSource(LL_SYSTICK_CLKSOURCE_HCLK);
    LL_SetSystemCoreClock(84000000);
    LL_RCC_SetTIMPrescaler(LL_RCC_TIM_PRESCALER_TWICE);

    /* SysTick_IRQn interrupt configuration */
    NVIC_SetPriority(SysTick_IRQn, NVIC_EncodePriority(NVIC_GetPriorityGrouping(),0, 0));
}

 

I hope this helps you understand what is needed for efficient DMA reception from UART.

 

Best regards,

Tilen
