Handling UART errors in HAL

maple · 2021-03-23

I am struggling to establish robust UART communication using HAL. It also seems I am not alone: there have been many similar questions here, all closed without an answer. Here are some of them:

https://community.st.com/s/question/0D50X00009XkfAH/handling-uart-errors-with-hal

https://community.st.com/s/question/0D50X00009XkfFg/stm32-hal-uart-error-managment

https://community.st.com/s/question/0D50X00009XkgNBSAZ/hal-uarts-and-overrun-errors

https://community.st.com/s/question/0D50X00009Xkfom/stm32f407-uart-error-handling

In our system we have multiple UART channels between several MCUs. The data is passed in variable-length packets, using normal-mode DMA for transmission and circular DMA for reception. Since low latency is important for us, we also have the receiver timeout (RTO) enabled so that we can read partial buffers when there are pauses in reception.
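A minimal, hardware-independent sketch of the partial-buffer read described above. The names `rx_ring_t` and `rx_ring_read` are hypothetical, and the DMA write position `head` is assumed to be computed by the caller as the buffer size minus the remaining DMA transfer count (with HAL, that count would typically be read with `__HAL_DMA_GET_COUNTER`). It only illustrates the wrap-around logic, not a complete driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader state for one circular DMA receive buffer. */
typedef struct {
    const uint8_t *buf;  /* buffer the DMA writes into in circular mode */
    size_t size;         /* total buffer size in bytes */
    size_t tail;         /* index of the next unread byte */
} rx_ring_t;

/*
 * Copy the bytes between the reader position (tail) and the DMA write
 * position (head) into out, following the wrap-around at the buffer end.
 * Returns the number of bytes copied; stops early if out is full.
 * Intended to be called from the RTO, half-transfer and transfer-complete
 * events, so that a pause on the line flushes a partial buffer.
 */
size_t rx_ring_read(rx_ring_t *r, size_t head, uint8_t *out, size_t out_cap)
{
    assert(r->size > 0 && head < r->size && r->tail < r->size);

    size_t n = 0;
    while (r->tail != head && n < out_cap) {
        out[n++] = r->buf[r->tail];
        r->tail = (r->tail + 1) % r->size;
    }
    return n;
}
```

Note that this sketch cannot detect the DMA lapping the reader: if more than `size` bytes arrive between two calls, the oldest data is silently lost, so the buffer has to be sized for the worst-case service latency.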

Problem 1: The recently updated HAL treats RTO as an error and stops communication. We sort of solved this by adding user code to USARTx_IRQHandler() that clears the flag before the default HAL processing runs. An ugly solution to an artificially created problem.

Problem 2: It seems HAL uses the same UART_DMAError function for both hdmatx->XferErrorCallback and hdmarx->XferErrorCallback. This function stops BOTH the Tx and Rx DMA transfers, regardless of which one had the error. As a result, an Rx error on device A terminates its transmission as well, which causes an Rx error on device B, which then terminates its own DMA too.

At this point both devices are trying to restart the UART, and then simple timing decides the outcome: if the receiver is ready when transmission begins, everything is OK. If Tx begins first, the receiver seems to catch the middle of an incoming frame and raises an error again.

Question 1: Is there an example of correct HAL error handling? Every single piece of code I found on the web does something like calling Error_Handler() in main, which is not really "handling".

Question 2: Why is RTO treated as an error in HAL? And not just any error, but a "blocking" one that requires stopping everything right away.

Question 3: Are there really any Tx errors? I went through all the register descriptions and all I could see are Rx-related interrupts, like parity, framing, overrun etc. Furthermore, in HAL_UART_IRQHandler() all the errors are processed in the Rx half of the code; the transmission half only checks for the TC flag. On the other hand, HAL_UART_ERROR_DMA looks like an "all in one" code for Rx and Tx, and there is no way to tell the difference in the user callback function.
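To make the distinction in Question 3 concrete, here is a minimal sketch of how an error code could be sorted into recovery actions inside a user error callback. It is an illustration under assumptions, not HAL's own behaviour: the `UART_ERR_*` bit values are assumed to match the `HAL_UART_ERROR_*` definitions in the L4 HAL header and should be checked against the HAL version in use, and `uart_action_t` and `uart_classify_error` are hypothetical names. The policy (ignore RTO, restart only reception on line errors, restart both directions on a DMA error) is just one possible choice.

```c
#include <stdint.h>

/* Assumed to mirror HAL_UART_ERROR_* in stm32l4xx_hal_uart.h; verify locally. */
#define UART_ERR_PE   0x01u  /* parity error */
#define UART_ERR_NE   0x02u  /* noise error */
#define UART_ERR_FE   0x04u  /* framing error */
#define UART_ERR_ORE  0x08u  /* overrun error */
#define UART_ERR_DMA  0x10u  /* DMA transfer error, Rx or Tx */
#define UART_ERR_RTO  0x20u  /* receiver timeout */

/* Hypothetical recovery actions, ordered from mildest to most drastic. */
typedef enum {
    UART_ACT_NONE,          /* no error reported */
    UART_ACT_IGNORE,        /* RTO only: a pause on the line, not a fault */
    UART_ACT_RESTART_RX,    /* line error: restart reception, keep Tx running */
    UART_ACT_RESTART_BOTH   /* DMA error: direction unknown, restart Rx and Tx */
} uart_action_t;

/* Map an error bit mask to the most drastic action that any set bit requires. */
uart_action_t uart_classify_error(uint32_t code)
{
    if (code == 0u) {
        return UART_ACT_NONE;
    }
    if (code & UART_ERR_DMA) {
        /* The code alone does not say whether the Rx or Tx DMA failed. */
        return UART_ACT_RESTART_BOTH;
    }
    if (code & (UART_ERR_PE | UART_ERR_NE | UART_ERR_FE | UART_ERR_ORE)) {
        return UART_ACT_RESTART_RX;
    }
    return UART_ACT_IGNORE;  /* only UART_ERR_RTO (or unknown bits) remain */
}
```

In a real callback the argument would be the handle's error code (for example from `HAL_UART_GetError()`). The DMA case stays coarse for exactly the reason given in Question 3: the code itself carries no Rx/Tx distinction.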

Pavel A. · 2021-03-23

Which STM32? Does your model have UARTs with RTO?

There is a set of very good examples of UARTs with DMA by Tilen Majerle.

If you haven't seen these yet, do it now.

-- pa

maple · 2021-03-23

We have a mix of STM32L431CC, STM32L4Q5CG and STM32F303RET6, as well as some non-STM devices and even USB-UART dongles bridging to PC. The STM devices do have RTO and we are using it already, after hacking stm32lxx_hal_uart.c to disable default handling of it as an error.

The application note by Majerle is a perfect example of what I was talking about - completely devoid of any error handling.

There are tons of examples like that on the web, good for teaching students but useless in the real world.

waclawek.jan · 2021-03-23

Cube/HAL inevitably implements only one or a limited number of usage models; that's what abstraction is all about. In this particular case, the model is "UART mostly works, and if anything happens, all communication is off, let's start from scratch". This is not a bad model for things like bootloaders, where the user can simply retry by replugging in the unlikely case of an error.

There are people who try to wrestle a different model into Cube/HAL's framework, and sometimes (maybe most of the time - usually we see only the failures here) they succeed. I don't recommend it, though.

UART is what, 3 registers, and DMA another 4? OK, I'm exaggerating here, but it's not rocket science either.

JW

HPATH.1 · 2022-11-28

No one has answered this yet. 😏