HAL_UART Interrupt mode driver can lose characters and stop Rx forever

After 'fighting' with the speed (performance) of the HAL_UART drivers used in Interrupt mode (_IT) in a complex FW system (using RTOS and many other interrupts and peripherals), I have realized three issues with the default HAL UART drivers:

  1. They are not really optimized for high UART baud rates (transmission speed)
  2. They are 'tricky' to use in a 'mixed' system with RTOS threads and Interrupts - they have a 'race condition' and potentially UART Rx can be/remain disabled forever
  3. If you hit an 'Overrun' condition (driver too slow, INT handler postponed) - the Rx remains disabled forever (nobody tells you - you find out only by reading the source code of the drivers!)

Even if you think the UART works fine (with the USB ST-Link VCP, e.g. the new NUCLEO-H743ZI2 with ST-Link V3 seems to work up to 10,000,000 baud) - in interactive use (!) - it fails when sending files from a terminal (e.g. TeraTerm) or sending UART commands from a Python script at 'full speed'.

There will be a UART speed (baud rate) which the drivers (and the system, with RTOS) cannot keep up with: at best you lose characters (the ST-Link VCP does not have flow control), and in the worst case Rx stops forever.
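To put a number on 'cannot keep up': a rough sketch of the time budget per received character. It assumes an 8N1 frame (1 start + 8 data + 1 stop = 10 bits on the wire; the frame format is my assumption, adjust it for your setup). The helper name uart_char_time_ns is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Time between two back-to-back characters, in nanoseconds.
 * bits_per_frame is an assumption of this sketch: 10 for 8N1. */
static uint32_t uart_char_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    assert(baud != 0u);
    /* 64-bit intermediate: bits * 1e9 does not fit into 32 bits */
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000000ull) / baud);
}
```

With these assumptions, 3,686,400 baud gives about 2.7 us per character and 10,000,000 baud gives 1 us. With single-character Rx interrupts, the INT entry, the handler and the re-enable of Rx all have to fit roughly into that window, for every character, otherwise you run into the Overrun case.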

My Approach:

a) I use RTOS with two threads: one for UART Tx, one for UART Rx.

b) The threads are synchronized by giving a Semaphore from INT in order to release the threads (if Tx has completed or Rx has a character received).

c) I want to have single-character UART Rx, so that I can send back (as remote echo), therefore I call the HAL_UART_Receive_IT(huart, uartRxChr, 1); always for one character to be received.

d) Because a completed UART Rx disables the interrupt, you have to call HAL_UART_Receive_IT(huart, uartRxChr, 1); again and again, after every single character received.

e) Calling this Rx enable function again from an RTOS thread can create a huge latency (and overhead), potentially losing characters. So I tried to call it again in the UART Interrupt handler instead (after/during the Rx INT processing - initialize the Rx INT again for the next character, for an endless Rx).

But this is 'tricky' and can potentially fail! (see below - a race condition due to a single, shared UART Tx and Rx driver)

So, what can happen are these cases:

  1. Your system cannot keep up with the UART Rx speed - you hit the 'Overrun' condition (ORE). If you do not have a UART ErrorHandler, the Rx Interrupt stays disabled (unless you relaunch the receive function).
  2. So, you try to start the UART Rx again, with calling HAL_UART_Receive_IT(huart, uartRxChr, 1); . But you do this, when the context is still the MCU Interrupt (via/in this void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) handler). Bear in mind, this code is part of the real MCU INT handler!
  3. The UART Tx is launched from the RTOS thread. So, it runs outside the MCU INT context. And here you can have a 'race condition'!

So, if you get UART Rx characters very fast (by sending a file - you will not notice it when typing interactively in a terminal), you can lose characters, enter this 'overrun' condition, and worse:

If your UART Tx thread wants to send, it calls the HAL_UART_Transmit_IT(&huart3, buf, (uint16_t)len); function and it enters the driver.

But this driver is a 'shared' driver, it handles Tx and Rx and it has therefore a LOCK mechanism: __HAL_LOCK(huart);

Assume you are running inside this LOCKed region when a new UART Rx INT kicks in, and at its end you try to relaunch the UART Rx Interrupt (because it was stopped) - from inside the MCU INT handler. It will see that the driver is LOCKed (sure, Tx was interrupted while inside this region), so the call fails (rejected as busy) and the Rx is not initialized again. Your receiver stops forever.
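The interleaving described above can be played through on a PC with a small model. This is only a sketch: uart_model_t and all the model_ functions are stand-ins invented for this example, not the HAL types or the HAL code. The rearm_pending flag is one possible way to at least notice the rejected call and retry it later from a thread; it is not what I ended up doing (see Solution).

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side model only; these are stand-ins, not the HAL types. */
typedef enum { DRV_OK, DRV_BUSY } drv_status_t;

typedef struct {
    bool locked;        /* models the shared lock taken by __HAL_LOCK(huart) */
    bool rx_armed;      /* true while the Rx interrupt is enabled */
    bool rearm_pending; /* set when a re-arm from the ISR was rejected */
} uart_model_t;

/* Models a receive request: rejected while the shared lock is held. */
static drv_status_t model_receive_it(uart_model_t *u)
{
    if (u->locked) {
        return DRV_BUSY;
    }
    u->rx_armed = true;
    return DRV_OK;
}

/* Models the Rx-complete callback running in interrupt context. */
static void model_rx_complete_isr(uart_model_t *u)
{
    u->rx_armed = false;            /* a completed receive disables Rx */
    if (model_receive_it(u) != DRV_OK) {
        u->rearm_pending = true;    /* remember the failure, do not ignore it */
    }
}

/* Models the Tx call from the thread. rx_irq_inside_lock injects the
 * Rx interrupt while the lock is held (the problematic interleaving). */
static void model_transmit_it(uart_model_t *u, bool rx_irq_inside_lock)
{
    assert(!u->locked);             /* the model has no nested Tx calls */
    u->locked = true;
    if (rx_irq_inside_lock) {
        model_rx_complete_isr(u);
    }
    u->locked = false;
}

/* Thread-level recovery: retry a re-arm that the ISR could not do. */
static void model_service_pending(uart_model_t *u)
{
    if (u->rearm_pending && model_receive_it(u) == DRV_OK) {
        u->rearm_pending = false;
    }
}
```

If the Rx interrupt lands inside the locked region, rx_armed stays false after the Tx call returns, which is the 'Rx stops forever' case, unless something like model_service_pending() runs afterwards. The retry still costs thread latency, so characters can be lost in between.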

Solution:

The idea to do both Tx and Rx inside the RTOS threads worries me due to latency, overhead and late reaction on a received character. I want to have a real, endless UART Rx interrupt, in the background, with a large Circular Buffer.

So, I decided to do this:

a) remove the code inside the HAL drivers which would disable the Rx: leave the Rx INT active and running all the time, without the need to call HAL_UART_Receive_IT again and again

b) remove the UART Rx disable on the 'overrun' condition, best to have an UART ErrorHandler (actually, the INT flags should be cleared properly already)

c) write a new, separate UART Rx driver which is not shared with the Tx driver, so that it does not need this __HAL_LOCK(huart); mechanism and the Tx and Rx drivers are separated and used exclusively. When doing this: make sure the overhead is small, the speed is high and the number of instructions is reduced (I do just the mandatory stuff for the UART mode I use/need, not all the different options, checks etc.).

d) place the drivers and handlers into the ITCM memory: the cache on the CM7 is nice, but if you get a 'cache miss' on the UART INT handler code, you pay an additional latency penalty. So, I use this __attribute__, e.g. as HAL_StatusTypeDef __attribute__((section(".itcmram"))) HAL_UART_Receive_IT, in order to place all this code into the ITCM memory (assuming execution is fastest there, and it leaves more cache for other code). Just make sure your startup code copies this code from Flash ROM into ITCM memory.
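The data path behind a) to c) could look roughly like this: a single-producer/single-consumer Circular Buffer that is filled from the Rx INT and drained by the Rx thread, with no lock shared with Tx. This is a minimal sketch, not my actual driver: rx_ring_t, the function names and the buffer size are made up for the example, and it assumes a single core with exactly one writer (the INT) and one reader (the thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 1024u   /* assumed size; must be a power of two */
static_assert((RX_BUF_SIZE & (RX_BUF_SIZE - 1u)) == 0u,
              "RX_BUF_SIZE must be a power of two");

typedef struct {
    volatile uint32_t head;   /* written only by the Rx interrupt */
    volatile uint32_t tail;   /* written only by the reader thread */
    uint32_t overflows;       /* characters dropped because the buffer was full */
    uint8_t data[RX_BUF_SIZE];
} rx_ring_t;

/* Called from the Rx interrupt with the byte read from the data register.
 * Rx is never disabled here: a full buffer drops the byte and counts it. */
static void rx_ring_put_from_isr(rx_ring_t *r, uint8_t byte)
{
    uint32_t head = r->head;
    if ((head - r->tail) == RX_BUF_SIZE) {
        r->overflows++;
        return;
    }
    r->data[head & (RX_BUF_SIZE - 1u)] = byte;
    r->head = head + 1u;      /* publish only after the data is stored */
}

/* Called from the reader thread; returns false when the buffer is empty. */
static bool rx_ring_get(rx_ring_t *r, uint8_t *out)
{
    uint32_t tail = r->tail;
    if (tail == r->head) {
        return false;
    }
    *out = r->data[tail & (RX_BUF_SIZE - 1u)];
    r->tail = tail + 1u;
    return true;
}
```

The indices run freely and are masked only on access, so full and empty can be told apart without wasting a slot. On the target, the ISR-side function is the one that would get the ITCM section attribute from d). The overflows counter makes lost characters visible instead of silent.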

Results: with these changes I could double my maximum working baud rate to 3,686,400 baud. The test criterion is to send a huge file from TeraTerm without losing any character. BTW: a nice Circular Buffer for the UART Rx is also in place, so that all received characters from the file are stored there: a file with (at least) this number of characters 'must' work without losing any character. And it does now (but not before!).

BTW: it is also 'helpful' to think about how to assign INT priorities and RTOS thread priorities, and whether or not you want a task context switch triggered from an Interrupt etc. This 'system design' also has a huge impact on the performance and on whether you lose characters or not (I do not care if my UART Tx falls behind; most important is that I am able to receive all characters without 'gaps').

Conclusions:

  • the HAL UART drivers are not really optimized for maximum UART speed
  • they are 'tricky' when you call their functions inside INT handlers
  • they are 'tricky' in terms of 'dead lock' situations, e.g. in combination with an RTOS
  • nobody tells you that entering an Overrun condition will disable the Rx Interrupt (and you would not expect to hit this case, so you have no UART ErrorHandler coded)
  • having a look into the driver implementation helps to understand 'your' system and to spot potential issues; creating your own 'optimized' (stripped down) drivers, separating Tx and Rx etc., can be a bit of work but gives a huge performance boost in the end

1 REPLY
MarkC
Associate

Hi Torsten.

I totally agree. I've been around quite a few of the manufacturers' SDKs, and they all seem to have these weirdly contrived setups for requesting a receive, and for enabling/disabling the receive path for each requested set of characters. Why don't they have the conceptually simple circular buffer between interrupt and main thread or task? Then we wouldn't have to re-invent the wheel for every new SDK we use. :(

Cheers,

Mark