UART misaligns received data after previous receive timeout

schperplata
Associate III
Posted on November 02, 2017 at 12:35

Hello,

I'm having a problem making my UART communication robust against errors (two STM32F07xx custom boards communicating through UART3, HAL + CubeMX). I have a custom protocol (quite simple): fixed message length plus acknowledge.

During normal runtime, I can correctly receive and transmit data. If I intentionally (or unintentionally) insert some delay in the code (elsewhere, not related to the communication part of the project), the receive part of the two boards should:

- get UART receive timeout,

- log an error into error log buffer,

- flush data register and

- continue to receive/send/communicate.

In reality it does everything, except that any data received after the first timeout is misaligned. For example,

during normal communication with no errors, this is received:

0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF

On error, when data is not received and the UART driver reports a receive timeout (aka HAL_TIMEOUT), all the data is received correctly but with an additional byte at the beginning, causing the data to be misaligned and therefore all wrong:

0x00 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF

What could cause this additional byte in the received data after a previous RX timeout?

I receive data in blocking mode, using:

rx_status = HAL_UART_Receive(&huart3, data, MSG_SIZE, UART_BU_TO_CU_RX_TIMEOUT);

I tried to flush the data register with all the functions I could find (I think the HAL documentation is a little unclear on how to properly use these macros):

__HAL_UART_FLUSH_DRREGISTER(&huart3);

__HAL_UART_SEND_REQ(&huart3, UART_RXDATA_FLUSH_REQUEST);

#flush #uart #timeout
AvaTar
Lead
Posted on November 02, 2017 at 12:52

Some part of your post seemingly got lost.

I will not comment on Cube code.

However, in my experience, robustness in UART communication can only come from a higher implementation level, i.e. your protocol. Hardware support is rather rudimentary.

You can transmit packets with a defined start and end (character or timing),  and have a checksum or CRC for each packet.

And you might need a request/response mechanism, to request a re-transmission of corrupted or lost packets.
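A minimal sketch of such a packet, assuming the fixed 10-byte payload from this thread wrapped in a start marker, a simple two's-complement checksum, and an end marker. The marker values and all names here (FRAME_SOF, FRAME_EOF, frame_build, frame_is_valid) are made up for illustration and are not part of the original protocol or of HAL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SOF     0x7EU                /* hypothetical start marker */
#define FRAME_EOF     0x7FU                /* hypothetical end marker */
#define PAYLOAD_SIZE  10U                  /* fixed message length from the thread */
#define FRAME_SIZE    (PAYLOAD_SIZE + 3U)  /* SOF + payload + checksum + EOF */

/* Two's-complement checksum: payload bytes plus checksum add up to 0 (mod 256). */
static uint8_t frame_checksum(const uint8_t *payload, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + payload[i]);
    }
    return (uint8_t)(0U - sum);
}

/* Wrap a fixed-size payload as SOF | payload | checksum | EOF. */
static void frame_build(const uint8_t payload[PAYLOAD_SIZE], uint8_t frame[FRAME_SIZE])
{
    assert(payload != NULL && frame != NULL);
    frame[0] = FRAME_SOF;
    memcpy(&frame[1], payload, PAYLOAD_SIZE);
    frame[1U + PAYLOAD_SIZE] = frame_checksum(payload, PAYLOAD_SIZE);
    frame[FRAME_SIZE - 1U] = FRAME_EOF;
}

/* Return 1 if both markers are in place and the checksum matches, else 0. */
static int frame_is_valid(const uint8_t frame[FRAME_SIZE])
{
    uint8_t sum = 0;
    assert(frame != NULL);
    if (frame[0] != FRAME_SOF || frame[FRAME_SIZE - 1U] != FRAME_EOF) {
        return 0;
    }
    for (size_t i = 1; i <= PAYLOAD_SIZE + 1U; i++) {
        sum = (uint8_t)(sum + frame[i]);   /* payload bytes, then checksum */
    }
    return sum == 0U;
}
```

With this layout, a buffer shifted by one stray leading byte fails the marker check instead of being taken as valid data. Since the payload is binary, the marker values can also occur inside it, so the checksum is still needed to reject a false start.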

Posted on November 02, 2017 at 13:04

Are you sure it's not the transmitter's fault? I.e. where exactly was the delay which caused the timeout?

Is UART_BU_TO_CU_RX_TIMEOUT calculated with regard to MSG_SIZE?

JW

schperplata
Associate III
Posted on November 02, 2017 at 13:29

Answers for both of you, Waclawek.Jan and meyer.frank:

It looks like my protocol is doing OK. It is reporting that data is not received correctly, because it actually isn't. No UART error is detected except this user-generated timeout.

I make sure that correct data is sent to the HAL_UART_Transmit() function, but I don't have an oscilloscope or logic analyzer to test what is actually coming out of the UART pins. The link between the two boards is further improved with an RS-422 converter, which so far hasn't caused any problems.

Transmit and receive timeouts are surely long enough (100 ms for 10 bytes at 115200 baud).
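As a rough sanity check of that figure, the time on the wire can be computed from the baud rate. This is only a sketch: it assumes 8N1 framing (10 bits on the wire per byte; the thread only says "no parity"), and uart_wire_time_us is a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire for n_bytes, in microseconds, rounded up.
 * bits_per_byte is 10 for 8N1 (start bit + 8 data bits + stop bit). */
static uint32_t uart_wire_time_us(uint32_t n_bytes, uint32_t bits_per_byte, uint32_t baud)
{
    uint64_t bits;
    assert(baud != 0U);
    bits = (uint64_t)n_bytes * bits_per_byte;
    return (uint32_t)((bits * 1000000ULL + baud - 1U) / baud);
}
```

For 10 bytes at 115200 baud this gives a little under 1 ms, so a 100 ms timeout covers the transfer itself with a wide margin. It says nothing about delays in the other board's code, which is the case that triggers the timeout here.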

About the delay: it will be removed from the code after this UART thing is solved. So far it was caused by a GPIO expander board with a bug in the code, which produced a delay longer than the UART timeout, which in turn caused lost communication due to this data misalignment. Anyway, I test this delay with a button press and I expect communication to continue after the blocking delay is over. So far I had to manually reset the system to re-establish communication, which in my case is unacceptable - I need a reliable system.

schperplata
Associate III
Posted on November 04, 2017 at 09:38

Just to confirm, I re-checked my protocol and I can't see any bugs that could cause such behaviour, e.g. sending one additional byte or filling the internal software buffer wrongly.

If I get a chance to borrow an oscilloscope I will take a look at the RS-422 converter, but I think this must be something related to the HAL UART driver.

Anyone had similar problems?

Posted on November 04, 2017 at 14:23

I don't see the value of flushing here. You're not getting an overrun error.

Expecting multiple characters to arrive synchronized and with fixed length seems massively optimistic. Robustness doesn't come from hoping things will work properly.

>>

Anyone had similar problems?

Not really, but I use different methods so I don't

If you are blocking, why not sit in a loop probing USART->SR.RXNE and reading USART->DR into a buffer. You get to see each character arrive, you can resync your data based on the start pattern/preamble, and you get to see when you are done.
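The polling idea above could look roughly like the following. This is a host-testable sketch, not target code: the byte source is abstracted behind two callbacks (on the target, ready() would test the RXNE flag and read() would read the data register), and PREAMBLE, byte_source_t, mem_source_t and receive_message are all hypothetical names, with max_polls standing in for a real timeout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREAMBLE 0xA5U   /* hypothetical start byte, not in the original protocol */

/* Hypothetical byte source; on the target these would wrap the UART registers. */
typedef struct {
    int     (*ready)(void *ctx);   /* non-zero when a byte is waiting */
    uint8_t (*read)(void *ctx);    /* fetch that byte */
    void     *ctx;
} byte_source_t;

/* In-memory stand-in for the UART, so the loop can be exercised on a PC. */
typedef struct { const uint8_t *data; size_t len; size_t pos; } mem_source_t;
static int     mem_ready(void *ctx) { mem_source_t *m = ctx; return m->pos < m->len; }
static uint8_t mem_read(void *ctx)  { mem_source_t *m = ctx; return m->data[m->pos++]; }

/* Poll for one message: drop bytes until PREAMBLE is seen, then collect len bytes.
 * Returns 0 on success, -1 if max_polls iterations pass first. */
static int receive_message(const byte_source_t *src, uint8_t *buf, size_t len,
                           uint32_t max_polls)
{
    size_t count = 0;
    int synced = 0;

    assert(src != NULL && buf != NULL && len > 0U);
    while (max_polls-- > 0U) {
        uint8_t b;
        if (!src->ready(src->ctx)) {
            continue;                 /* nothing received yet, keep polling */
        }
        b = src->read(src->ctx);
        if (!synced) {
            synced = (b == PREAMBLE); /* stray bytes before the preamble are dropped */
            continue;
        }
        buf[count++] = b;
        if (count == len) {
            return 0;
        }
    }
    return -1;
}
```

Because every call starts unsynchronized, a leftover byte from an earlier, timed-out transfer is skipped rather than shifting the next message by one position.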

If you are still losing data you want to use a scope or logic analyzer to better understand the failure, using a GPIO and your own detection of the failure to trigger the capture. Trying random solutions to a problem you don't understand will just be a waste of time.

schperplata
Associate III
Posted on November 04, 2017 at 19:28

Expecting multiple characters to arrive synchronized and with fixed length seems massively optimistic. Robustness doesn't come from hoping things will work properly.

Well, I am not an expert in serial communication or protocols, but I don't think I am making such a despicable mistake here; I am counting on the UART peripheral and the HAL drivers. I will write more details about my protocol, and you can then let me know if there is something important that I don't understand.

Two identical boards are communicating through UART at 115200 baud, with no parity or other checking; all messages have a fixed length of 10 bytes. The master further communicates with a PC over a second UART, which so far hasn't caused me any problems.

The master board is controlled by the PC and can send data to the slave at any time, since the slave has the RXNE interrupt enabled. The master always expects a response (acknowledgement - ACK) from the slave and does nothing other than wait for it. If the slave does not respond in time, this is considered a slave error, which is sent to the PC and handled later. If the ACK from the slave is not OK (like a data error or other slave error), the master forwards this error to the PC. Once the ACK is received (or a timeout occurs), the slave should be ready for new data.

On the slave end, once the RXNE interrupt is triggered, 10 bytes are received using the standard HAL_UART_Receive() function with a timeout. If everything is OK (no RX timeout or other error), this data is put into a buffer which is handled outside of the ISR. Once the data from the buffer is processed, the slave sends a response (acknowledge) and other data, again with a length of 10 bytes. If there is some error (timeout or other UART error), the buffer is flushed and the slave sends back a NACK so the master/PC can handle it.

Here is the problem. If something in the slave generates a delay larger than the master receive timeout, an additional byte appears in the master's UART RX. When the next data is sent by the master, the slave receives it correctly and sends back an ACK. The master reads this ACK, but since a timeout was generated before, the ACK message has one faulty extra byte attached, leaving all data as garbage.

I checked that:

- master sends correct data

- slave receives correct data

- slave responds with correct data

- master receives data correctly if there is no receive timeout generated before. After a timeout, all data is preceded by an additional 0x00 byte.

It is true that this protocol does not have robust message wrapping like TCP. This is simple and straightforward communication, and the real question should be (at least I think so) where this byte is coming from.

Anyway, I am very happy if anyone can give me any advice how communication between boards should go. I can see and understand it on paper, but implementing it in reality with HAL/STD drivers... completely different story. Let me know!
Posted on November 06, 2017 at 07:56

... but I am counting on uart peripheral and HAL drivers.

I'd say forget the latter. Regardless of quality, Cube will not provide an implementation for your protocol.

Always include the 'rainy day' scenarios, where characters get lost and corrupted. If your code gets stuck or de-synchronized, the protocol is inappropriate.

Here is the problem. If something in the slave generates a delay larger than the master receive timeout, an additional byte appears in the master's UART RX. When the next data is sent by the master, the slave receives it correctly and sends back an ACK. The master reads this ACK, but since a timeout was generated before, the ACK message has one faulty extra byte attached, leaving all data as garbage.

I suggest a state-machine design for the protocol driver, i.e. two coupled state-machines.

If you define a protocol with defined start and end characters, you can just discard any 'stray character' you receive outside your protocol.

You buy robustness with (reasonable) overhead.
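One way to picture the receive side of such a state machine is sketched below. It assumes a made-up frame layout of start marker, 10 payload bytes, a two's-complement checksum, and an end marker; the marker values and all names (RX_SOF, RX_EOF, rx_parser_t, rx_reset, rx_feed) are hypothetical and not taken from any existing protocol:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_SOF          0x7EU   /* hypothetical start marker */
#define RX_EOF          0x7FU   /* hypothetical end marker */
#define RX_PAYLOAD_SIZE 10U     /* fixed message length from the thread */

typedef enum { RX_WAIT_SOF, RX_PAYLOAD, RX_CHECKSUM, RX_WAIT_EOF } rx_state_t;

typedef struct {
    rx_state_t state;
    uint8_t    payload[RX_PAYLOAD_SIZE];
    size_t     count;   /* payload bytes stored so far */
    uint8_t    sum;     /* running sum of payload and checksum bytes */
} rx_parser_t;

/* Go back to hunting for a start marker (also call this on a receive timeout). */
static void rx_reset(rx_parser_t *p)
{
    assert(p != NULL);
    p->state = RX_WAIT_SOF;
    p->count = 0;
    p->sum = 0;
}

/* Feed one received byte. Returns 1 when a complete, valid frame is in p->payload. */
static int rx_feed(rx_parser_t *p, uint8_t b)
{
    assert(p != NULL);
    switch (p->state) {
    case RX_WAIT_SOF:
        if (b == RX_SOF) {
            p->count = 0;
            p->sum = 0;
            p->state = RX_PAYLOAD;
        }
        break;                       /* any other byte is a stray and is dropped */
    case RX_PAYLOAD:
        p->payload[p->count++] = b;
        p->sum = (uint8_t)(p->sum + b);
        if (p->count == RX_PAYLOAD_SIZE) {
            p->state = RX_CHECKSUM;
        }
        break;
    case RX_CHECKSUM:
        p->sum = (uint8_t)(p->sum + b);
        p->state = RX_WAIT_EOF;
        break;
    case RX_WAIT_EOF: {
        int ok = (b == RX_EOF) && (p->sum == 0U);
        rx_reset(p);                 /* ready for the next frame either way */
        return ok;
    }
    }
    return 0;
}
```

Fed with a stray 0x00 followed by a well-formed frame, the parser ignores the stray byte and still reports the frame. A frame with a bad checksum or a missing end marker is dropped, and the parser goes back to waiting for the next start marker.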

Posted on November 06, 2017 at 10:41

I suggest a state-machine design for the protocol driver, i.e. two coupled state-machines.

If you define a protocol with defined start and end characters, you can just discard any 'stray character' you receive outside your protocol.

You buy robustness with (reasonable) overhead

Yes. I have decided to rewrite this protocol or use some other one (looking at https://github.com/min-protocol/min). I've spent a few hours and I can't find any case (or 'bug') that could generate this additional-byte scenario, so the conclusion is that I am screwing things up with inappropriate data handling and a badly designed protocol. Still, the thing I am afraid of is not the protocol, but the low-level UART implementation, the sending and receiving functions.

Let me know, if there is anything out there with good instructions for low level firmware code.

Thank you so far!

Posted on November 06, 2017 at 11:20

Still, the thing I am afraid of is not the protocol, but the low-level UART implementation, the sending and receiving functions.

My first suggestion would be to drop the Cube/HAL straightjacket, and go for a simple, interrupt-based low-level layer.

Not sure if you ever need to port your code, but a strict layering and separation of tasks/modules reflecting your protocol needs is helpful in many regards. Whenever I looked at Cube code, it just stood in my way ...

Yes. I have decided to rewrite this protocol or use some other one (looking at https://github.com/min-protocol/min).

I think your requirements are more similar to xmodem or zmodem, rather than USB-related ones.

Or, look at the Modbus ASCII protocol. Specs are freely available.

I mean, the design ideas are probably relevant for you, not the application or protocol/data contents.
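To illustrate the kind of design idea meant here, this is a sketch of Modbus-ASCII-style framing: a ':' start character, the message bytes as uppercase hex characters, an LRC (two's complement of the 8-bit sum of the message bytes), and a CR LF terminator. The function names lrc8 and ascii_frame_encode are made up for this example, and the sketch covers only the framing, not the Modbus message contents:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LRC: two's complement of the 8-bit sum of all message bytes. */
static uint8_t lrc8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return (uint8_t)(0U - sum);
}

/* Encode data as ':' + hex(data) + hex(LRC) + CR LF, NUL-terminated.
 * Returns the frame length without the NUL, or 0 if out is too small. */
static size_t ascii_frame_encode(const uint8_t *data, size_t len,
                                 char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;
    uint8_t lrc;

    assert(data != NULL && out != NULL);
    if (out_size < 2U * len + 6U) {   /* ':' + 2*len + 2 LRC chars + CR + LF + NUL */
        return 0;
    }
    out[n++] = ':';
    for (size_t i = 0; i < len; i++) {
        out[n++] = hex[data[i] >> 4];
        out[n++] = hex[data[i] & 0x0FU];
    }
    lrc = lrc8(data, len);
    out[n++] = hex[lrc >> 4];
    out[n++] = hex[lrc & 0x0FU];
    out[n++] = '\r';
    out[n++] = '\n';
    out[n] = '\0';
    return n;
}
```

Because ':' and CR LF can never appear inside the hex-encoded body, a receiver can always resynchronize on the next ':' after a lost or extra byte, at the cost of roughly doubling the number of bytes on the wire.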