
UART Rx with DMA - missing bytes

Posted on May 01, 2014 at 23:28

Hello there,

I have an interesting problem. I am writing a hardware abstraction wrapper for UARTs on an STM32F427, and I want this particular implementation to use DMA for receiving data. I am not using DMA to transmit data. Occasionally and unpredictably, the UART DMA request happens (I know this because the DMAxStreamy->NDTR register decrements), but the byte from the UART is not copied into the buffer.

As part of my abstraction interface, I need to determine how many valid bytes I have in my buffer. To help determine this, I read the NDTR register. The error described above occurs when I read this register at about the same moment that data is arriving at the UART. Could the read itself be causing the issue? I have pasted configuration code and the code for reading the data below. Thank you for the help!



typedef struct
{
    volatile uint8_t buffer[HAL_DMA_IN_BUFFER_SIZE];
    volatile uint8_t *nextUnused;
    DMA_Stream_TypeDef *dma_stream;
} HAL_DmaInBufferType;

typedef struct
{
    uint8_t buffer[HAL_DMA_OUT_BUFFER_SIZE];
    uint8_t *nextUnsentByte;
    uint8_t *nextOpenByte;
    uint16_t size;
} HAL_DmaOutBufferType;

static HAL_DmaInBufferType _uartDmaBufferIn = {.nextUnused = NULL, .dma_stream = NULL };

static HAL_DmaOutBufferType _uartDmaBufferOut = {.nextUnsentByte = NULL, .nextOpenByte = NULL, .buffer = {0xA5}, .size = 0 };

int HAL_UartInitializeDevice()
{
    DMA_InitTypeDef  DMA_InitStructure;
    USART_TypeDef *uart_base = UART4;

    ///- enable the UART blocks clock input
    RCC_APB1PeriphClockCmd( RCC_APB1Periph_UART4,  ENABLE );

    ///- choose the correct clock, stream, and channel
    _uartDmaBufferIn.dma_stream = DMA1_Stream2;
    DMA_InitStructure.DMA_Channel = DMA_Channel_4;
    DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)&UART4->DR;
    RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_DMA1, ENABLE);

    ///- Fill buffer with watermarks
    memset((void *)_uartDmaBufferIn.buffer, 0xA5, HAL_DMA_IN_BUFFER_SIZE);

    ///- Deinit the DMA stream
    DMA_DeInit(_uartDmaBufferIn.dma_stream);  /* line missing from the post; presumed call */

    ///- configure the DMA
    DMA_InitStructure.DMA_DIR = DMA_DIR_PeripheralToMemory;
    DMA_InitStructure.DMA_Memory0BaseAddr = (uint32_t)&_uartDmaBufferIn.buffer[0];
    DMA_InitStructure.DMA_BufferSize = HAL_DMA_IN_BUFFER_SIZE;
    DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
    DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;
    DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Byte;
    DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
    DMA_InitStructure.DMA_Mode = DMA_Mode_Circular;
    DMA_InitStructure.DMA_Priority = DMA_Priority_VeryHigh;
    DMA_InitStructure.DMA_FIFOMode = DMA_FIFOMode_Disable;
    DMA_InitStructure.DMA_MemoryBurst = DMA_MemoryBurst_Single;
    DMA_InitStructure.DMA_PeripheralBurst = DMA_PeripheralBurst_Single;

    ///- initialize the DMA
    DMA_Init(_uartDmaBufferIn.dma_stream, &DMA_InitStructure);

    ///- enable the DMA RX Stream
    DMA_Cmd(_uartDmaBufferIn.dma_stream, ENABLE);

    ///- enable the USART Rx DMA request
    USART_DMACmd(uart_base, USART_DMAReq_Rx, ENABLE);  /* line missing from the post; presumed call */

    ///- disable DMA Stream Transfer Complete interrupt
    DMA_ITConfig(_uartDmaBufferIn.dma_stream, DMA_IT_TC, DISABLE);

    ///- initialize the buffers
    _uartDmaBufferIn.nextUnused = _uartDmaBufferIn.buffer;
    _uartDmaBufferOut.nextOpenByte = _uartDmaBufferOut.buffer;
    _uartDmaBufferOut.nextUnsentByte = _uartDmaBufferOut.buffer;

    ///- Enable the Tx interrupt
    USART_ITConfig( uart_base, USART_IT_TXE, ENABLE );

    ///- Enable for Tx
    NVIC_InitTypeDef NVIC_InitStructure = {
        .NVIC_IRQChannelPreemptionPriority = 0x01,
        .NVIC_IRQChannelSubPriority = 0x01,
    };
    NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
    NVIC_InitStructure.NVIC_IRQChannel = UART4_IRQn;
    NVIC_Init(&NVIC_InitStructure);  /* line missing from the post; presumed call */

    ///- Configure UART
    USART_InitTypeDef UART_InitStructure;
    USART_StructInit( &UART_InitStructure );
    UART_InitStructure.USART_BaudRate = baud_rate;
    UART_InitStructure.USART_WordLength = USART_WordLength_8b;
    UART_InitStructure.USART_StopBits = USART_StopBits_1;
    UART_InitStructure.USART_Parity = USART_Parity_No;
    UART_InitStructure.USART_HardwareFlowControl = USART_HardwareFlowControl_None;
    UART_InitStructure.USART_Mode = USART_Mode_Tx | USART_Mode_Rx;

    // now let's apply it all
    USART_Init( uart_base, &UART_InitStructure );
    USART_Cmd( uart_base, ENABLE );
}


int HAL_UartRecv(char *str, int str_len )
{
    uint32_t nextOpen, arrayBoundary;
    int readCount = 0;

    ///- calculate first invalid address
    arrayBoundary = (uint32_t)_uartDmaBufferIn.buffer + HAL_DMA_IN_BUFFER_SIZE;

    ///- calculate DMA pointer to next open byte
    nextOpen = arrayBoundary - _uartDmaBufferIn.dma_stream->NDTR;

    ///- copy bytes into buffer
    while(readCount < str_len)
    {
        ///- no more unread bytes are available
        if((uint32_t)_uartDmaBufferIn.nextUnused == nextOpen)
        {
            break;  /* body missing from the post; presumed */
        }

        ///- copy data over
        *str++ = (char) *_uartDmaBufferIn.nextUnused++;
        readCount++;  /* line missing from the post; presumed */

        ///- check array boundary
        if((uint32_t)_uartDmaBufferIn.nextUnused >= arrayBoundary)
        {
            _uartDmaBufferIn.nextUnused = _uartDmaBufferIn.buffer;
        }
    }

    ///- return the read count
    return readCount;
}
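
The bookkeeping in HAL_UartRecv can also be written with indices instead of raw addresses, which makes it easy to check on a host machine. What follows is only a sketch: the names `dma_write_index` and `dma_unread_count` are hypothetical and not part of the posted code, and `ndtr` is assumed to be a snapshot of the stream's NDTR register taken by the caller while the stream runs in circular mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the next element the DMA will write, derived from an NDTR
 * snapshot. In circular mode NDTR counts down from buf_size and then
 * reloads, so buf_size - ndtr is the write position. The modulo also
 * maps a snapshot of 0 back to index 0. */
size_t dma_write_index(uint32_t ndtr, size_t buf_size)
{
    assert(buf_size > 0 && ndtr <= buf_size);
    return (buf_size - ndtr) % buf_size;
}

/* Number of elements between the reader's index and the DMA write
 * index, allowing for wrap-around. Assumes the DMA has not lapped
 * the reader; an overrun cannot be detected from NDTR alone. */
size_t dma_unread_count(size_t write_idx, size_t read_idx, size_t buf_size)
{
    assert(write_idx < buf_size && read_idx < buf_size);
    return (write_idx + buf_size - read_idx) % buf_size;
}
```

With a 64-byte buffer, an NDTR snapshot of 60 gives a write index of 4, so a reader sitting at index 0 would see 4 unread bytes. Note that this arithmetic says nothing about whether the newest byte has actually reached memory, which is the question raised in this thread.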


Posted on May 02, 2014 at 10:38

NDTR appears to be decremented *before* the transfer occurs.

In an experiment I triggered the DMA from SPI receiver set to as fast as possible, and read in both NDTR and the target memory:


    testbuffer1[4] = DMA1_Stream3->NDTR;

    testbuffer1[5] = rxBuffer;

    testbuffer1[6] = DMA1_Stream3->NDTR;

    testbuffer1[7] = rxBuffer;

    testbuffer1[8] = DMA1_Stream3->NDTR;

    testbuffer1[9] = rxBuffer;

    testbuffer1[10] = DMA1_Stream3->NDTR;

    testbuffer1[11] = rxBuffer;

 [ etc...]

This compiled, fairly predictably, into:


 80014a0:    6850          ldr    r0, [r2, #4]

 80014a2:    6821          ldr    r1, [r4, #0]

 80014a4:    6118          str    r0, [r3, #16]

 80014a6:    6159          str    r1, [r3, #20]

 80014a8:    6850          ldr    r0, [r2, #4]

 80014aa:    6821          ldr    r1, [r4, #0]

 80014ac:    6198          str    r0, [r3, #24]

 80014ae:    61d9          str    r1, [r3, #28]

 80014b0:    6850          ldr    r0, [r2, #4]

 80014b2:    6821          ldr    r1, [r4, #0]

 80014b4:    6218          str    r0, [r3, #32]

 80014b6:    6259          str    r1, [r3, #36]    ; 0x24


i.e. something like 4 cycles per "sample", if I count it right, maybe one cycle more if a DMA access occurs. (The SPI transmitter was fed by a different stream of the same DMA (of lower priority), so that might have interfered in some way.)

There were 2 samples with NDTR already decremented and memory unchanged, when APB was set to HCLK/4. When APB was set to HCLK/8, there were 4 such samples.

I wouldn't call this exactly a bug, but it is definitely an error in the documentation: RM0090 rev. 6 clearly states in 10.3.2 that "Each DMA transfer consists of three operations: a loading [...] a storage [...] a post-decrement of the DMA_SxNDTR register [...]", and also in the description of the DMA_SxNDTR register in 10.5.6, "This register decrements after each DMA transfer".

I'd say a workaround might lie in reading some "safe-to-read" USART register (e.g. _BRR) before reading the buffer, so that the processor has to wait until the DMA completes the APB transaction. This would work only if it is certain that, at the moment the processor sees a changed NDTR, the DMA has already started the APB access; that is something I can't tell, and only the ST crew could.
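
A rough sketch of that ordering follows. It has not been verified on hardware and carries the same caveat as the paragraph above: it can only help if the DMA has already started its APB access by the time the changed NDTR is visible to the processor. The function name `snapshot_write_index` and its parameters are hypothetical; on a real target `ndtr` would point at the stream's NDTR register and `sync_reg` at a side-effect-free register on the same APB bus, such as the USART's BRR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the DMA write index for a circular buffer of buf_size
 * elements, with a dummy peripheral read placed between sampling
 * NDTR and any later access to the buffer. */
size_t snapshot_write_index(const volatile uint32_t *ndtr,
                            const volatile uint32_t *sync_reg,
                            size_t buf_size)
{
    assert(ndtr != NULL && sync_reg != NULL && buf_size > 0);

    /* 1. Sample the remaining-transfer count first. */
    uint32_t remaining = *ndtr;
    assert(remaining <= buf_size);

    /* 2. Dummy read of a harmless register on the peripheral's bus.
     *    The pointer is volatile, so the compiler must emit the load.
     *    The hope (an assumption, see above) is that this load cannot
     *    complete before the DMA's pending bus access has finished. */
    uint32_t dummy = *sync_reg;
    (void)dummy;

    /* 3. Only now convert the snapshot into a buffer index. */
    return (buf_size - remaining) % buf_size;
}
```

The caller would then read the buffer only up to the returned index, never beyond it.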

Consider posting a support request to ST.


Posted on March 22, 2017 at 19:03

I've just been hit by the same deficiency.

I understand that contrary to the mentioned documentation, NDTR is decremented upon start of the peripheral transfer i.e. at P-to-M transfers upon the start of the whole process. I understand that data visibility by processor or other AHB masters is then delayed by any delay on the peripheral port (usually APB bridge), plus any latency due to concurrent DMA processes, plus any latency due to target memory port bus contention.

However, it's important to know exactly when the memory side has been safely written.

In P-to-M transfers, is it safe to assume the memory side transfer is finished if NDTR indicates one more transfer has started? And, if FIFO is used, is it safe to assume that the FIFO has been completely transferred to the memory if NDTR indicates one transfer beyond the FIFO threshold? Willing to discuss this further in depth as this is a vital question for reading circularly filled buffers on-the-fly, reducing latencies otherwise involved when waiting for the interrupts, while still maintaining long enough buffers for (infrequent) processing delays.

Btw, in circular mode, does the occurrence of the half/full-transfer interrupt guarantee that the data for that half of the buffer are fully visible in memory at that point?

In any case, the documentation involved ought to be corrected to match the actual behaviour.

Can ST please comment?


Jan Waclawek

Posted on March 22, 2017 at 19:35

There were comments and questions in the old ST forums (which seem to have been dumped when they changed the interface) which said the CMISS OTG code was seriously deficient and locked up if used for anything but fairly small, trivial data flows.

Previous posts got no reply or attention from ST, so I doubt they are motivated to fix it.

I know one guy who just decided to add an RS232 driver chip and use the USART.

Posted on March 22, 2017 at 19:36

Someone wrote some better code for ChibiOS. I have not checked it out yet, but it looked well thought out.

Posted on March 22, 2017 at 19:52

The issue discussed in this thread is a DMA issue - entirely unrelated to OTG, and only coincidentally related to USART.


"I understand that contrary to the mentioned documentation, NDTR is decremented upon start of the peripheral transfer i.e. at P-to-M transfers upon the start of the whole process."

This may only be correct for non-byte width transfers. See below.

"In P-to-M transfers, is it safe to assume the memory side transfer is finished if NDTR indicates one more transfer has started?"

This has been my experience with half-word (16-bit wide) circular DMA ADC buffers. The half-word pointed to by NDTR can be transiently incorrect if read on-the-fly, but as soon as NDTR pointer moves to the next word/half-word (whatever you have set up), then the data seems solid and correct.

I cannot really comment on byte-level circular NDTR as currently I only use that for incoming audio data streamed in from a serial port and don't "peek" at the data other than at transfer half-complete and transfer complete interrupts (and then, I only look at the inactive buffer half).
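
For reference, the "only look at the inactive half" approach can be sketched as below. This is an illustration under stated assumptions, not code from any poster: the type and function names are hypothetical, the buffer is assumed to have an even number of elements, and the caller is assumed to invoke it from the half-transfer or transfer-complete handler. Whether those interrupts guarantee that the data is already visible in memory is the open question asked earlier in this thread; the sketch does not settle it.

```c
#include <assert.h>
#include <stddef.h>

/* Which interrupt fired (hypothetical names). */
typedef enum {
    RX_EVT_HALF_TRANSFER,     /* first half filled, DMA now in second half */
    RX_EVT_TRANSFER_COMPLETE  /* second half filled, DMA wrapped to first  */
} rx_event_t;

/* A contiguous region of the circular buffer, in elements. */
typedef struct {
    size_t offset;
    size_t length;
} rx_region_t;

/* Returns the half of the buffer the DMA is NOT currently writing. */
rx_region_t rx_inactive_half(rx_event_t evt, size_t buf_size)
{
    assert(buf_size >= 2 && buf_size % 2 == 0);

    rx_region_t region;
    region.length = buf_size / 2;
    region.offset = (evt == RX_EVT_HALF_TRANSFER) ? 0 : buf_size / 2;
    return region;
}
```

The price of this scheme is latency: data is only processed once a full half of the buffer has been received.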

It could be that only non-byte-wide circular buffers are affected. In other words, the byte pointed to by NDTR may have already been correctly written, BUT the half-word or word pointed to by NDTR may have only been partially written.

I've already debugged my asynchronous circular half-word ADC DMA buffer processor algorithm and ignore the half-word pointed to by NDTR, so unfortunately I don't really have a need to do an experiment to test this theory. But it would be interesting to know.
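
One possible reading of "ignore the element pointed to by NDTR" is to hold back the newest element that NDTR has counted and to treat only the older ones as settled. The sketch below assumes that reading; the name `dma_settled_count` is hypothetical, `ndtr` is a snapshot of the NDTR register in circular mode, and all indices are in elements (bytes, half-words or words, whichever the stream is configured for).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of unread elements that can be treated as settled, holding
 * back the most recently counted one because its memory write may
 * still be in flight when NDTR has already been decremented. */
size_t dma_settled_count(uint32_t ndtr, size_t read_idx, size_t buf_size)
{
    assert(buf_size > 0 && ndtr <= buf_size && read_idx < buf_size);

    /* Position the DMA will write next, according to NDTR. */
    size_t write_idx = (buf_size - ndtr) % buf_size;

    /* Elements NDTR claims have been transferred since read_idx. */
    size_t counted = (write_idx + buf_size - read_idx) % buf_size;

    /* Keep one element back as a guard. */
    return (counted > 0) ? counted - 1 : 0;
}
```

The trade-off is that the last element of a burst is not released until the next one arrives (or until some timeout path reads it anyway), so this suits continuous streams such as ADC data better than short UART messages.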