2021-10-04 07:34 AM
We have run into a problem where a new run of boards with STM32F429ZIT MCUs has been losing a byte (always exactly one byte, which seems odd) from the UART data that is received via DMA.
We send packets of UART data to the MCU, and the DMA controller transfers the received bytes into memory. When we read the data back from the DMA buffer, we occasionally find a packet with a missing byte. The failure rate is about 1 in 10 megabytes, or 1 in 50,000 packets.
I have scoped the UART communications at the point of failure. The failed data packet looks perfect at the pin (so the UART signal is ruled out), and there are no transients on VDD or 3V3. Several people have looked at the design and have not found anything obviously wrong (though of course something is).
We have solved this issue by using retry logic in software, but management is demanding an answer as to why the new board run is having this issue.
So I have two questions:
1: Has anyone encountered an issue like this, and do you have any advice on what the root cause could be?
2: From your experience, is such a failure rate of 1 in 10 Megabytes or 1 in 50,000 packets considered a normal or expected problem with these kinds of systems? In other words, did we just get lucky in the past with not needing retry logic for this kind of system?
Thanks for the help!
2021-10-04 07:53 AM
You shouldn't see failures like this. Are you sure there's not a race condition in the code? How are you receiving data? DMA with always-receiving circular buffer or is the UART inactive between receptions? Is the missing byte always the first byte?
2021-10-04 07:58 AM
What do you mean by "missing byte" exactly? Post example.
How exactly are you using the DMA? Post code.
Which USART, what baudrate, which DMA, what is the system clock, what other peripherals use DMA?
Do you have an USART errors handler in place? Are there USART overruns?
JW
2021-10-04 08:14 AM
UART data from the RX pin (pin 102) goes straight to memory via DMA; my understanding is that because of this we have no visibility until we read the data from memory. There certainly could be a race condition, but the software is the same between board runs and the previous boards do not fail. The lost byte is at a random position in the packet. Packet size is typically 172 bytes, and we lose a byte somewhere in the middle.
2021-10-04 09:18 AM
"What do you mean by "missing byte" exactly? Post example."
We have a packet that looks like this in the logic analyzer:
The first byte 0xAB is the packet size of 171 (actual packet size is 172 with size byte included). When this packet is read from the DMA we will get all of the same data less 1 byte somewhere in the middle.
"How exactly are you using the DMA? Post code."
Dma2Init(); is called once in main.c. I believe this is interrupt driven. Note that we have not had an issue with this code on past boards.
#define DMA2_C
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
// CMSIS includes
//
#include <stm32f429xx.h>
// application includes
//
#include "qhal_uart.h"
#include "Dma2.h"
// Module function prototypes
//
void DMA2_Stream2_IRQHandler(void);
void Dma2Init(void);
// Local module variables
//
uint32_t dma2Stream2Error;
// Privately imported functions and variables
//
extern uint8_t us1_rxbuf[US1BUFS_LEN];
void DMA2_Stream2_IRQHandler(void)
/*
// Handles enabled interrupts from DMA2, Stream2
//
//
// Receives: not applicable to an ISR
//
// returns: not applicable to an ISR
*/
{
    uint32_t dma2Irq;
    dma2Irq = (DMA2->LISR & DMA2_S2_IRQ_MASK);
    // if one of the 2 buffers has been filled reset its target address
    if((dma2Irq & DMA_LISR_TCIF2) > 0)
    {
        // if the current target buffer is 1 reset target zero's address
        if((DMA2_Stream2->CR & DMA_SxCR_CT) > 0)
            DMA2_Stream2->M0AR = ((uint32_t) &us1_rxbuf[0]);
        // else the current target buffer is 0 so reset target one's address
        else
            DMA2_Stream2->M1AR = ((uint32_t) &us1_rxbuf[HALF_MAX_US1BUFS]);
    }
    // if DMA errors occurred flag them for the application to handle
    if((dma2Irq & (DMA_LISR_TEIF2 | DMA_LISR_DMEIF2)) > 0)
        dma2Stream2Error = dma2Irq;
    // clear the interrupts
    DMA2->LIFCR = dma2Irq;
}
void Dma2Init(void)
/*
// Initializes DMA2 to stream USART1 Rx data directly from USART1 to the Rx
// buffer.
//
// Configures Stream 2 to use channel 4 (USART1 RX), single transfers of
// 8 bits in size from USART1 to memory in double buffer mode
//
// Receives: void
//
// returns: void
*/
{
    // The initial write of DMA2_Stream2->CR to zero sets the following:
    //
    //   Stream Channel 0
    //   MBURST: Single transfer
    //   PBURST: Single transfer
    //   CT 0, current target is buffer 0
    //   PL 0, lowest stream priority level
    //   DBM: Single buffer mode
    //   MSIZE: 8 bits
    //   PSIZE: 8 bits
    //   MINC 0, memory pointer is fixed
    //   PINC 0, peripheral pointer is fixed
    //   CIRC 0, circular mode disabled
    //   DIR 0, Peripheral to Memory
    //   PFCTRL 0, DMA controller as flow controller
    //   xxIE 0, No interrupts enabled
    //   EN 0, Stream is disabled
    //
    // The required changes are made after stream disable is confirmed
    DMA2_Stream2->CR = 0;
    while(DMA2_Stream2->CR & 1) // wait here for EN bit to go low
        ;
    DMA2->LISR = 0; // reset transfer status bits
    DMA2->HISR = 0;
    // set the peripheral port register address (USART1_DR)
    DMA2_Stream2->PAR = (USART1_BASE + 4);
    // set the memory buffer address
    DMA2_Stream2->M0AR = (uint32_t) (&us1_rxbuf[0]);
    DMA2_Stream2->M1AR = (uint32_t) (&us1_rxbuf[HALF_MAX_US1BUFS]);
    // number of data items to be transferred
    DMA2_Stream2->NDTR = HALF_MAX_US1BUFS;
    DMA2_Stream2->CR = (4 << 25); // set stream2 to use channel 4
    DMA2_Stream2->CR |= (3 << 16); // set stream2 priority to very high
    // use direct mode not FIFO mode
    DMA2_Stream2->FCR = 0;
    // stream configuration
    DMA2_Stream2->CR |= DMA_SxCR_DBM; // enable stream2 double buffer mode
    DMA2_Stream2->CR |= DMA_SxCR_MINC; // memory increment mode
    DMA2_Stream2->CR |= DMA_SxCR_TCIE; // transfer complete interrupt enable
    DMA2_Stream2->CR |= DMA_SxCR_EN; // enable stream 2
    // reset DMA2, Stream2 error indicator
    dma2Stream2Error = 0;
    NVIC_SetPriority(DMA2_Stream2_IRQn, 10); // set DMA2 Stream2's IRQ priority
    NVIC_EnableIRQ(DMA2_Stream2_IRQn); // and unmask it
}
"Which USART, what baudrate, which DMA, what is the system clock, what other peripherals use DMA?"
USART1 (Pin 120 for RX and pin 136 for TX on LQFP 144 pin package). DMA2 presumably. 2 Megabaud. System clock 8MHz, and I don't think we use DMA for anything else since there are no other references to it.
"Do you have an USART errors handler in place? Are there USART overruns?"
We did not have any error handlers in place until now where we implemented retry logic. When a packet read from DMA does not equal the size specified by the packet size byte we retry. It's possible there are overruns I have not checked.
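For reference, the length check described above could look roughly like the sketch below. It is not the actual firmware: `packet_length_ok` is a hypothetical name, and the only assumption is the framing already described in this post (the first byte holds the number of bytes that follow it, so 0xAB means a 172-byte packet).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the original firmware.
 * Assumes byte 0 holds the number of bytes that follow it, so a
 * complete packet is (size byte value + 1) bytes long. */
bool packet_length_ok(const uint8_t *pkt, size_t received_len)
{
    if (pkt == NULL || received_len == 0u) {
        return false;   /* nothing received, the size byte cannot be read */
    }
    /* expected total length = payload length + the size byte itself */
    size_t expected_len = (size_t)pkt[0] + 1u;
    return received_len == expected_len;
}
```

A check like this only detects that a byte went missing so the packet can be retried; it says nothing about why the byte was lost.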
2021-10-04 09:48 AM
> The first byte 0xAB is the packet size of 171 (actual packet size is 172 with size byte included). When this packet is read from the DMA we will get all of the same data less 1 byte somewhere in the middle.
How do you know where the end of the packet is? Is that byte missing, or does it just have an unexpected value?
> 2 Megabaud. System clock 8MHz,
8 MHz, really? That would leave only 40 system cycles to process an Rx byte (10 bit times at 2 Mbaud is 5 µs), so I wouldn't be surprised if this were the problem.
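The cycle budget can be worked out with a small calculation like the sketch below. `cycles_per_rx_byte` is a hypothetical helper, and the number of bit times per frame is an assumption about the framing (10 for 8N1: one start bit, eight data bits, one stop bit), which has not been stated in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: CPU clock cycles that elapse per received byte.
 * bits_per_frame is an assumption about the UART framing,
 * e.g. 10 for 8N1 (1 start + 8 data + 1 stop). */
uint32_t cycles_per_rx_byte(uint32_t sysclk_hz, uint32_t baud,
                            uint32_t bits_per_frame)
{
    if (baud == 0u || bits_per_frame == 0u) {
        return 0u;      /* avoid dividing by zero on nonsense input */
    }
    /* bytes per second = baud / bits_per_frame, therefore
     * cycles per byte  = sysclk_hz * bits_per_frame / baud */
    return (uint32_t)(((uint64_t)sysclk_hz * bits_per_frame) / baud);
}
```

With the figures quoted here, `cycles_per_rx_byte(8000000, 2000000, 10)` evaluates to 40.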
What are the values of HALF_MAX_US1BUFS/US1BUFS_LEN?
How does "main" pick the values from the buffer?
> It's possible there are overruns I have not checked.
Then check them (I mean, set up USART interrupt and enable it only for the errors). It may be revealing.
JW
(Btw. not setting DMA to Circular mode is somewhat confusing, but the Double-Buffer mode enables Circular in hardware.)
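As a rough illustration of how "main" could find the DMA write position in double-buffer mode, here is a minimal sketch. It assumes the layout from Dma2Init() above (target 0 is the first half of the buffer, target 1 the second half, and NDTR is reloaded with the half length for each half); `dbm_write_index` is a hypothetical helper, not something from the posted code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index into the whole Rx buffer at which the DMA
 * will store the next byte.
 *   ct       - value of the CT bit (0 = target 0, nonzero = target 1)
 *   ndtr     - remaining item count for the current half (NDTR)
 *   half_len - length of each half (HALF_MAX_US1BUFS in the posted code) */
size_t dbm_write_index(uint32_t ct, uint32_t ndtr, size_t half_len)
{
    if (ndtr > half_len) {
        ndtr = (uint32_t)half_len;          /* defensive clamp */
    }
    size_t base = ct ? half_len : 0u;       /* start of the active half */
    return base + (half_len - ndtr);        /* bytes already written to it */
}
```

On the target, CT and NDTR live in two different registers and are read at different times, so a real reader would probably need to re-read them until two consecutive samples agree; otherwise a half-buffer switch between the two reads gives a wrong index.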
2021-10-04 09:58 AM
>>Note that we have not had an issue with this code on past boards.
The code doesn't seem to use volatile variables where appropriate.
There doesn't seem to be any useful reason to use double buffers to describe one contiguous memory buffer.
It should be using circular mode.
There is no indication of how large the buffer is, or of the relationship between its size and the failure points in the data stream.
>>2 Megabaud. System clock 8MHz
Unlikely to get 2Mbaud from 8 MHz bus clocks.
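To put a number on that, here is a small sketch of the upper bound. It assumes the STM32F4 USART relation baud = f_ck / (8 × (2 − OVER8) × USARTDIV) with USARTDIV no smaller than 1, where f_ck is the clock feeding the USART; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. With USARTDIV >= 1 the highest reachable rate is
 * f_ck/16 with 16x oversampling (OVER8 = 0) or f_ck/8 with 8x (OVER8 = 1). */
uint32_t usart_max_baud(uint32_t f_ck_hz, bool over8)
{
    return f_ck_hz / (over8 ? 8u : 16u);
}

/* True if the requested rate does not exceed that upper bound.
 * This checks the bound only, not whether the divider hits it exactly. */
bool usart_baud_reachable(uint32_t f_ck_hz, bool over8, uint32_t baud)
{
    return baud != 0u && baud <= usart_max_baud(f_ck_hz, over8);
}
```

Under these assumptions an 8 MHz USART clock tops out at 500 kbaud with 16x oversampling and 1 Mbaud with 8x, so 2 Mbaud would need at least a 16 MHz clock on the bus that feeds USART1.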
2021-10-04 11:26 AM
If the missed byte is midstream, probably not a synchronization issue.
It could be a clock issue. 2 MBaud is workable, but the tolerance between chips needs to be adequate. Expect issues if you're using HSI.
I use the STM32F4 UART at ~5 MBaud and never had issues like this, even with significant amounts of data.
Look at the NF and FE bits in USARTx_SR which will be set if the clock is mismatched.
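A minimal sketch of decoding those flags follows. The bit positions are those of USART_SR on the STM32F4; the masks are defined locally so the example builds on a PC, and the counter struct and function name are hypothetical. In the CMSIS device header the equivalent masks are USART_SR_PE, USART_SR_FE, USART_SR_NE and USART_SR_ORE, so the flag the reference manual calls NF is spelled NE there.

```c
#include <stdint.h>

/* Error flag positions in USART_SR (STM32F4). Local names for a host build;
 * on the target use the CMSIS masks USART_SR_PE/FE/NE/ORE instead. */
#define SR_PE   (1u << 0)   /* parity error */
#define SR_FE   (1u << 1)   /* framing error */
#define SR_NF   (1u << 2)   /* noise detected (USART_SR_NE in CMSIS) */
#define SR_ORE  (1u << 3)   /* overrun error */

/* Hypothetical bookkeeping structure for the application. */
typedef struct {
    uint32_t parity;
    uint32_t framing;
    uint32_t noise;
    uint32_t overrun;
} usart_err_counts_t;

/* Count each error flag that is set in a sampled SR value. */
void usart_count_errors(uint32_t sr, usart_err_counts_t *counts)
{
    if (sr & SR_PE)  { counts->parity++;  }
    if (sr & SR_FE)  { counts->framing++; }
    if (sr & SR_NF)  { counts->noise++;   }
    if (sr & SR_ORE) { counts->overrun++; }
}
```

On the target this would typically be fed with a read of `USART1->SR` from the USART1 interrupt handler, with the EIE bit in USART_CR3 set so that framing, noise and overrun errors raise an interrupt while DMA reception is active. Nonzero framing or noise counts would point toward a clock mismatch, while overrun counts would point toward bytes not being moved out of the data register in time.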
2021-10-04 01:22 PM
Can you elaborate on the NF and FE bits? I cannot find these in my source code. ty
2021-10-04 01:40 PM