
System freezes, stuck in USART interrupt

The MCU is an STM32F469, running Azure RTOS ThreadX with FatFS.
The system fetches data through 3 UART ports and does logging (saving data into a file).
Logging is set up to run once per hour, writing about 1000 bytes.
UART read and write both use DMA.
The system sometimes gets into a frozen state after 1~2 days.
Here frozen means the UI becomes unresponsive (not responding to keypresses). Logging also stops.
Note that the freezing only happens when logging is running.
If logging is turned off, the system can keep running for a very long time.
The debugger shows this stack:
HAL_UART_IRQHandler() at stm32f4xx_hal_uart.c:2,489
<signal handler called>() at 0xfffffffd
HAL_UARTEx_ReceiveToIdle_DMA() at stm32f4xx_hal_uart.c:1,855
Also, the USART_SR value is 0xd8, which means the ORE (overrun error) flag is set.
I have watched a few freezing cases and the stack is always the same, stuck in HAL_UART_IRQHandler().
It looks like the USART interrupt fires again and again and cannot be cleared.
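
For anyone checking the 0xd8 reading, here is a small host-side sketch that decodes a USART_SR value into flag names. The bit positions follow the STM32F4 reference manual; the `SR_*` macro names and `decode_usart_sr()` are made up for this example and are not HAL or CMSIS identifiers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* USART_SR bit positions on STM32F4 (names here are illustrative only). */
#define SR_PE   (1u << 0)  /* parity error */
#define SR_FE   (1u << 1)  /* framing error */
#define SR_NF   (1u << 2)  /* noise detected */
#define SR_ORE  (1u << 3)  /* overrun error */
#define SR_IDLE (1u << 4)  /* idle line detected */
#define SR_RXNE (1u << 5)  /* receive data register not empty */
#define SR_TC   (1u << 6)  /* transmission complete */
#define SR_TXE  (1u << 7)  /* transmit data register empty */

/* Writes the names of all set flags (highest bit first, space separated)
 * into out and returns how many flags were written. */
int decode_usart_sr(uint32_t sr, char *out, size_t out_len)
{
    static const struct { uint32_t mask; const char *name; } flags[] = {
        { SR_TXE, "TXE" }, { SR_TC, "TC" }, { SR_RXNE, "RXNE" }, { SR_IDLE, "IDLE" },
        { SR_ORE, "ORE" }, { SR_NF, "NF" }, { SR_FE, "FE" }, { SR_PE, "PE" },
    };
    int count = 0;
    size_t used = 0;

    if (out_len > 0) {
        out[0] = '\0';
    }
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if ((sr & flags[i].mask) == 0u) {
            continue;
        }
        int n = snprintf(out + used, out_len - used, "%s%s",
                         count ? " " : "", flags[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            break;  /* output buffer too small, stop appending */
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling `decode_usart_sr(0xD8, buf, sizeof buf)` fills `buf` with `TXE TC IDLE ORE`, so ORE and IDLE are both set while RXNE is clear.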

A very interesting thing: if I attach a debugger to a frozen system to inspect it (with a special setting so the target is not reset), then as soon as I press the resume button the system recovers from the freeze and runs normally: the UI is responsive again and logging continues.


Does anyone have any idea how this could happen?

The bug is likely in your code somewhere buried deep among the other logic. ORE is likely a symptom not the cause. What could prevent UART interrupts from happening in your code for more than one char duration? Where are other threads at when this happens?

Can always set up UART in circular mode, never disable it, and handle characters as they come in. No possibility of ORE with that scheme.
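
A rough sketch of the consumer side of that scheme, written so it runs on a PC. It assumes the DMA stream writes into a ring buffer in circular mode and that the application polls the write position. On the target that position would come from the stream's remaining-transfer counter (NDTR, which the HAL exposes through `__HAL_DMA_GET_COUNTER()`). The names `ring_drain`, `write_pos_from_remaining`, `collector_t` and `collect_byte` are hypothetical helpers for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback that receives each byte taken out of the ring. */
typedef void (*byte_sink_t)(uint8_t byte, void *ctx);

/* Converts the DMA "remaining transfers" count into a write index.
 * In circular mode the count runs from ring_size down to 1 and reloads,
 * so ring_size - remaining is the next slot the DMA will write. */
size_t write_pos_from_remaining(size_t ring_size, size_t remaining)
{
    return (ring_size - remaining) % ring_size;
}

/* Hands every unread byte between *read_pos and write_pos to sink,
 * wrapping at the end of the buffer. Returns the number of bytes consumed. */
size_t ring_drain(const uint8_t *ring, size_t ring_size,
                  size_t write_pos, size_t *read_pos,
                  byte_sink_t sink, void *ctx)
{
    size_t consumed = 0;

    while (*read_pos != write_pos) {
        sink(ring[*read_pos], ctx);
        *read_pos = (*read_pos + 1u) % ring_size;
        consumed++;
    }
    return consumed;
}

/* Example sink: appends bytes to a flat buffer, dropping them once full. */
typedef struct {
    uint8_t data[128];
    size_t  len;
} collector_t;

void collect_byte(uint8_t byte, void *ctx)
{
    collector_t *c = (collector_t *)ctx;

    if (c->len < sizeof c->data) {
        c->data[c->len++] = byte;
    }
}
```

The receiver is started once and left running. The application only calls something like `ring_drain()` from the idle-line, half-transfer and transfer-complete events, or from a periodic task. The ring has to be large enough that the DMA cannot lap the reader between two drains, otherwise unread bytes are silently overwritten.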

If you feel a post has answered your question, please click "Accept as Solution".

Thanks for the help.

The UARTs are using DMA for receiving, so interrupts do not need to happen on every character. The interrupt is triggered only when an error occurs. The errors can be FE, ORE or PE, though PE won't happen as we don't enable the parity bit. Sometimes the errors are FE + ORE together. Sometimes it is just ORE.

A little more information: when the freeze happens with the debugger attached, I tried pause-run-pause-run a few times. Every time it stops inside HAL_UART_IRQHandler(), while the context (the task being interrupted) varies among a few different cases, all of which appear normal. This probably means the system is running normally except that it is being interrupted indefinitely by the UART because of the error(s).

Back to the UART IRQ handler: there is code to handle the various errors. Experiments show FE and ORE happen quite often (unfortunately), and they are almost always handled properly, except in this case.
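
For reference, on the STM32F4 the PE, FE, NF, ORE and IDLE flags are cleared by a read of USART_SR followed by a read of USART_DR, and `__HAL_UART_CLEAR_OREFLAG()` performs that sequence. Below is a host-side model of that behaviour, only to illustrate why an error interrupt keeps firing if the sequence is never completed. `usart_model_t`, `model_read_sr()`, `model_read_dr()` and `handle_errors()` are invented for this sketch; they are not the HAL handler and do not claim to reproduce what the inherited code does.

```c
#include <stdint.h>

#define SR_PE   (1u << 0)
#define SR_FE   (1u << 1)
#define SR_NF   (1u << 2)
#define SR_ORE  (1u << 3)
#define SR_IDLE (1u << 4)
#define SR_RXNE (1u << 5)
#define SR_ERROR_MASK (SR_PE | SR_FE | SR_NF | SR_ORE)

/* Host-side stand-in for the peripheral; not a real HAL or CMSIS type. */
typedef struct {
    uint32_t sr;            /* status register contents */
    uint32_t dr;            /* data register contents */
    int      sr_read_armed; /* set by an SR read, consumed by the next DR read */
} usart_model_t;

uint32_t model_read_sr(usart_model_t *u)
{
    u->sr_read_armed = 1;
    return u->sr;
}

uint32_t model_read_dr(usart_model_t *u)
{
    if (u->sr_read_armed) {
        /* SR read followed by DR read clears the error and IDLE flags. */
        u->sr &= ~(SR_ERROR_MASK | SR_IDLE);
    }
    u->sr &= ~SR_RXNE;      /* a DR read always clears RXNE */
    u->sr_read_armed = 0;
    return u->dr;
}

/* Error path of a hypothetical handler: returns the error flags it saw
 * and clears them with the SR-then-DR sequence. */
uint32_t handle_errors(usart_model_t *u)
{
    uint32_t errors = model_read_sr(u) & SR_ERROR_MASK;

    if (errors != 0u) {
        (void)model_read_dr(u);
    }
    return errors;
}
```

In this model a handler that returns without the DR read leaves ORE set, so the interrupt would be taken again immediately. Whether that is what happens in the real system still needs to be confirmed on the target.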

BTW, your advice to "set up UART in circular mode, never disable it, and handle characters as they come in" is great, and it is also what I normally do. This code does not work that way, though. It reconfigures the UART every time it starts an Rx or Tx, which must be the cause of so many errors needing to be handled.

However, this is inherited code and it has worked OK until now. I'll see whether it can be fixed easily or whether a complete rewrite is needed.



> Experiment shows FE and ORE happen quite often (unfortunately)

Probably should solve the frame errors first. It is not normal for those to occur. Perhaps the clocks are slightly mismatched. HSE driven by a crystal is going to be more accurate than HSI. The reference manual has a lot more discussion on acceptable clock mismatch and how to improve it.
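
To put rough numbers on the mismatch, here is a small sketch of the baud rate arithmetic, assuming 16x oversampling (OVER8 = 0), where the actual baud rate is the USART kernel clock divided by the BRR value. The function names are made up, and the 45 MHz clock and 115200 baud mentioned afterwards are example values only, not taken from this thread.

```c
#include <stdint.h>

/* BRR value for 16x oversampling: fck / baud, rounded to nearest. */
uint32_t usart_brr_over16(uint32_t fck_hz, uint32_t baud)
{
    return (fck_hz + baud / 2u) / baud;
}

/* Baud rate the peripheral actually generates for a given BRR value. */
double usart_actual_baud(uint32_t fck_hz, uint32_t brr)
{
    return (double)fck_hz / (double)brr;
}

/* Signed deviation of the actual baud rate from nominal, in percent. */
double baud_error_percent(double actual, double nominal)
{
    return (actual - nominal) / nominal * 100.0;
}

/* Worst case seen by the receiver: divider rounding error plus the
 * oscillator tolerance of both ends, all in percent. The tolerances are
 * inputs because they depend on the clock source and the datasheet. */
double worst_case_mismatch_percent(double rounding_error_percent,
                                   double local_tolerance_percent,
                                   double remote_tolerance_percent)
{
    double rounding = rounding_error_percent < 0.0 ? -rounding_error_percent
                                                   : rounding_error_percent;
    return rounding + local_tolerance_percent + remote_tolerance_percent;
}
```

With a 45 MHz clock and 115200 baud, the divider comes out as 391 and the rounding error is about -0.1 %, so the divider itself contributes little in that example. The oscillator tolerance of each end adds on top of that, which is why the clock source matters, and the total has to stay inside the receiver tolerance given in the reference manual.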
