2014-05-01 02:28 PM
Hello there,
I have an interesting problem. I am writing a hardware abstraction wrapper for the UARTs on an STM32F427, and I want this particular implementation to use DMA for receiving data (I am not using DMA to transmit). Occasionally and unpredictably, the UART DMA request occurs (I know this because the DMAxStreamy->NDTR register decrements), but the byte from the UART is not copied into the buffer. As part of my abstraction interface, I need to determine how many valid bytes are in my buffer, and to do that I read the NDTR register. The error occurs when I read this register at roughly the same moment data is arriving at the UART. Could this be causing the issue? I have pasted the configuration code and the code for reading the data below. Thank you for the help!
#include <string.h>
#include "stm32f4xx.h"

#define HAL_DMA_IN_BUFFER_SIZE 400
#define HAL_DMA_OUT_BUFFER_SIZE 400

typedef struct
{
    volatile uint8_t buffer[HAL_DMA_IN_BUFFER_SIZE];
    volatile uint8_t * nextUnused;
    DMA_Stream_TypeDef * dma_stream;
} HAL_DmaInBufferType;

typedef struct
{
    uint8_t buffer[HAL_DMA_OUT_BUFFER_SIZE];
    uint8_t * nextUnsentByte;
    uint8_t * nextOpenByte;
    uint16_t size;
} HAL_DmaOutBufferType;

static HAL_DmaInBufferType _uartDmaBufferIn = { .nextUnused = NULL, .dma_stream = NULL };
static HAL_DmaOutBufferType _uartDmaBufferOut = { .nextUnsentByte = NULL, .nextOpenByte = NULL, .buffer = {0xA5}, .size = 0 };

int HAL_UartInitializeDevice(uint32_t baud_rate)
{
    DMA_InitTypeDef DMA_InitStructure;
    USART_TypeDef *uart_base = UART4;

    ///- enable the UART block's clock input
    RCC_APB1PeriphClockCmd( RCC_APB1Periph_UART4, ENABLE );

    ///- choose the correct clock, stream, and channel
    _uartDmaBufferIn.dma_stream = DMA1_Stream2;
    DMA_InitStructure.DMA_Channel = DMA_Channel_4;
    DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)&UART4->DR;
    RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_DMA1, ENABLE);

    ///- fill buffer with watermarks
    memset((void *)_uartDmaBufferIn.buffer, 0xA5, HAL_DMA_IN_BUFFER_SIZE);

    ///- deinit the DMA stream
    DMA_DeInit(_uartDmaBufferIn.dma_stream);

    ///- configure the DMA
    DMA_InitStructure.DMA_DIR = DMA_DIR_PeripheralToMemory;
    DMA_InitStructure.DMA_Memory0BaseAddr = (uint32_t)&_uartDmaBufferIn.buffer[0];
    DMA_InitStructure.DMA_BufferSize = HAL_DMA_IN_BUFFER_SIZE;
    DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
    DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;
    DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Byte;
    DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
    DMA_InitStructure.DMA_Mode = DMA_Mode_Circular;
    DMA_InitStructure.DMA_Priority = DMA_Priority_VeryHigh;
    DMA_InitStructure.DMA_FIFOMode = DMA_FIFOMode_Disable;
    DMA_InitStructure.DMA_MemoryBurst = DMA_MemoryBurst_Single;
    DMA_InitStructure.DMA_PeripheralBurst = DMA_PeripheralBurst_Single;

    ///- initialize the DMA
    DMA_Init(_uartDmaBufferIn.dma_stream, &DMA_InitStructure);

    ///- enable the DMA RX stream
    DMA_Cmd(_uartDmaBufferIn.dma_stream, ENABLE);

    ///- enable the USART Rx DMA request
    USART_DMACmd(uart_base, USART_DMAReq_Rx, ENABLE);

    ///- disable the DMA stream Transfer Complete interrupt
    DMA_ITConfig(_uartDmaBufferIn.dma_stream, DMA_IT_TC, DISABLE);

    ///- initialize the buffers
    _uartDmaBufferIn.nextUnused = _uartDmaBufferIn.buffer;
    _uartDmaBufferOut.nextOpenByte = _uartDmaBufferOut.buffer;
    _uartDmaBufferOut.nextUnsentByte = _uartDmaBufferOut.buffer;

    ///- enable the Tx interrupt
    USART_ITConfig( uart_base, USART_IT_TXE, ENABLE );

    ///- enable the NVIC channel for Tx
    NVIC_InitTypeDef NVIC_InitStructure = {
        .NVIC_IRQChannelPreemptionPriority = 0x01,
        .NVIC_IRQChannelSubPriority = 0x01,
    };
    NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
    NVIC_InitStructure.NVIC_IRQChannel = UART4_IRQn;
    NVIC_Init(&NVIC_InitStructure);

    ///- configure the UART
    USART_InitTypeDef UART_InitStructure;
    USART_StructInit( &UART_InitStructure );
    UART_InitStructure.USART_BaudRate = baud_rate;
    UART_InitStructure.USART_WordLength = USART_WordLength_8b;
    UART_InitStructure.USART_StopBits = USART_StopBits_1;
    UART_InitStructure.USART_Parity = USART_Parity_No;
    UART_InitStructure.USART_HardwareFlowControl = USART_HardwareFlowControl_None;
    UART_InitStructure.USART_Mode = USART_Mode_Tx | USART_Mode_Rx;

    // now let's apply it all
    USART_Init( uart_base, &UART_InitStructure );
    USART_Cmd( uart_base, ENABLE );

    return 0;
}

int HAL_UartRecv(char *str, int str_len )
{
    uint32_t nextOpen, arrayBoundary;
    int readCount = 0;

    ///- calculate the first invalid address
    arrayBoundary = (uint32_t)_uartDmaBufferIn.buffer + HAL_DMA_IN_BUFFER_SIZE;

    ///- calculate the DMA pointer to the next open byte
    nextOpen = arrayBoundary - _uartDmaBufferIn.dma_stream->NDTR;

    ///- copy bytes into the caller's buffer
    while(readCount < str_len)
    {
        ///- no more unread bytes are available
        if((uint32_t)_uartDmaBufferIn.nextUnused == nextOpen)
        {
            break;
        }

        ///- copy data over
        *str++ = (char) *_uartDmaBufferIn.nextUnused++;
        readCount++;

        ///- check the array boundary
        if((uint32_t)_uartDmaBufferIn.nextUnused >= arrayBoundary)
        {
            _uartDmaBufferIn.nextUnused = _uartDmaBufferIn.buffer;
        }
    }

    ///- return the read count
    return readCount;
}
2014-05-02 01:38 AM
NDTR appears to be decremented *before* the transfer occurs.
In an experiment I triggered the DMA from the SPI receiver, set to run as fast as possible, and read both NDTR and the target memory in rapid succession:
[...]
testbuffer1[4] = DMA1_Stream3->NDTR;
testbuffer1[5] = rxBuffer;
testbuffer1[6] = DMA1_Stream3->NDTR;
testbuffer1[7] = rxBuffer;
testbuffer1[8] = DMA1_Stream3->NDTR;
testbuffer1[9] = rxBuffer;
testbuffer1[10] = DMA1_Stream3->NDTR;
testbuffer1[11] = rxBuffer;
[etc...]
This compiled, fairly predictably, into:
[...]
80014a0:  6850  ldr r0, [r2, #4]
80014a2:  6821  ldr r1, [r4, #0]
80014a4:  6118  str r0, [r3, #16]
80014a6:  6159  str r1, [r3, #20]
80014a8:  6850  ldr r0, [r2, #4]
80014aa:  6821  ldr r1, [r4, #0]
80014ac:  6198  str r0, [r3, #24]
80014ae:  61d9  str r1, [r3, #28]
80014b0:  6850  ldr r0, [r2, #4]
80014b2:  6821  ldr r1, [r4, #0]
80014b4:  6218  str r0, [r3, #32]
80014b6:  6259  str r1, [r3, #36]  ; 0x24
[etc...]
i.e. something like 4 cycles per "sample", if I count it right, maybe one cycle more if a DMA access occurs. (The SPI transmitter was fed by a different stream of the same DMA, at lower priority, so that might have interfered in some way.) With APB set to HCLK/4, there were 2 samples with NDTR already decremented and memory unchanged. With APB set to HCLK/8, there were 4 such samples.
I wouldn't call this exactly a bug, but it is definitely an error in the documentation: RM0090 rev. 6 clearly states in 10.3.2 that "Each DMA transfer consists of three operations: a loading [...] a storage [...] a post-decrement of the DMA_SxNDTR register [...]", and the description of the DMA_SxNDTR register in 10.5.6 says "This register decrements after each DMA transfer".
A workaround might be to read some "safe-to-read" USART register (e.g. BRR) before reading the buffer, so that the processor has to wait until the DMA completes its APB transaction. However, this would work only if it is certain that, at the moment the processor sees a changed NDTR, the DMA has already started the APB access, which is something I can't tell and only the ST crew could. Consider posting a support request to ST.
JW
2017-03-22 12:03 PM
I have just been hit by the same deficiency.
My understanding is that, contrary to the documentation mentioned above, NDTR is decremented at the start of the peripheral transfer, i.e. for P-to-M transfers at the start of the whole process. I also understand that visibility of the data to the processor or other AHB masters is then delayed by any latency on the peripheral port (usually the APB bridge), plus any latency due to concurrent DMA streams, plus any latency due to bus contention on the target memory port.
However, it is important to know exactly when the memory side has been safely written.
In P-to-M transfers, is it safe to assume the memory-side transfer is finished if NDTR indicates that one more transfer has started? And, if the FIFO is used, is it safe to assume that the FIFO has been completely transferred to memory if NDTR indicates one transfer beyond the FIFO threshold? I am willing to discuss this further in depth, as it is a vital question for reading circularly filled buffers on the fly: doing so avoids the latency of waiting for the interrupts, while still allowing buffers long enough to absorb (infrequent) processing delays.
Also, in circular mode, does the occurrence of the half-transfer/transfer-complete interrupt guarantee that the data for that half of the buffer is fully visible in memory at that point?
In any case, the relevant documentation ought to be corrected to reflect the actual behaviour.
Could ST please comment?
Thanks,
Jan Waclawek
2017-03-22 12:35 PM
There were comments and questions in the old ST forums (which seem to have been dumped when the interface changed) saying that the CMSIS OTG code was seriously deficient and locked up if used for anything but fairly small, trivial data flows.
Previous posts got no reply or attention from ST, so I doubt they are motivated to fix it.
I know one guy who just decided to add an RS232 driver chip and use the USART.
2017-03-22 12:36 PM
Someone wrote better code for ChibiOS; I have not checked it out yet, but it looked well thought out.
2017-03-22 12:52 PM
The issue discussed in this thread is a DMA issue - entirely unrelated to OTG, and only coincidentally related to USART.
JW
2018-11-07 09:48 AM
"I understand that contrary to the mentioned documentation, NDTR is decremented upon start of the peripheral transfer i.e. at P-to-M transfers upon the start of the whole process."
This may be correct only for transfers wider than a byte; see below.
"In P-to-M transfers, is it safe to assume the memory side transfer is finished if NDTR indicates one more transfer has started?"
This has been my experience with half-word (16-bit wide) circular DMA ADC buffers. The half-word pointed to by NDTR can be transiently incorrect if read on the fly, but as soon as the NDTR pointer moves on to the next word/half-word (whichever you have set up), the data seems solid and correct.
I cannot really comment on byte-wide circular buffers, as I currently use them only for incoming audio data streamed in from a serial port, and I don't "peek" at the data except at the half-transfer and transfer-complete interrupts (and even then, I only look at the inactive half of the buffer).
It could be that only non-byte-wide circular buffers are affected. In other words, the byte pointed to by NDTR may have already been correctly written, BUT the half-word or word pointed to by NDTR may have only been partially written.
I have already debugged my asynchronous processing algorithm for the circular half-word ADC DMA buffer, and it ignores the half-word pointed to by NDTR, so unfortunately I have no real need to run an experiment to test this theory. But it would be interesting to know.