2024-02-01 05:15 PM
One very interesting thing: if the system is frozen and I connect a debugger to inspect it (with a special setting so the target is not reset), then as soon as I press the resume button the system recovers from the freeze and runs normally: the UI is responsive and logging continues.
Does anyone have any idea how this could happen?
2024-02-01 06:54 PM
The bug is likely buried somewhere deep in your code among the other logic. ORE is likely a symptom, not the cause. What could prevent UART interrupts from happening in your code for more than one char duration? Where are the other threads when this happens?
Can always set up UART in circular mode, never disable it, and handle characters as they come in. No possibility of ORE with that scheme.
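A minimal, hardware-independent sketch of the "handle characters as they come in" part, assuming DMA keeps filling a circular buffer in the background. The name `ring_read` and its parameters are hypothetical, not from any library. On an STM32 with the HAL, the write position would typically be derived from the DMA remaining-transfer count, something like `size - __HAL_DMA_GET_COUNTER(huart->hdmarx)`, but check that against your own HAL version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the bytes that arrived in a circular receive
 * buffer since the last call.
 *   ring, ring_size : the buffer DMA writes into (never stopped)
 *   read_pos        : index of the next unread byte, updated in place
 *   write_pos       : index where DMA will write next
 *   out, out_cap    : destination and its capacity
 * Returns the number of bytes copied. Wrap-around is handled by the
 * modulo step, so the caller never has to restart or reconfigure reception. */
size_t ring_read(const uint8_t *ring, size_t ring_size, size_t *read_pos,
                 size_t write_pos, uint8_t *out, size_t out_cap)
{
    size_t count = 0;

    if (ring_size == 0) {
        return 0;
    }
    write_pos %= ring_size; /* defensive: keep the index inside the buffer */

    while (*read_pos != write_pos && count < out_cap) {
        out[count++] = ring[*read_pos];
        *read_pos = (*read_pos + 1) % ring_size;
    }
    return count;
}
```

Call it periodically, or from the half-transfer, transfer-complete and idle-line events if you use them. The only requirement is that it runs often enough that DMA does not lap the read position.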
2024-02-04 05:31 PM
Thanks for the help.
The UARTs are using DMA for receiving, so interrupts need not happen on every char. The interrupt is triggered only when errors happen. The errors can be FE, ORE or PE, though PE won't happen as we don't enable the parity bit. Sometimes the errors are FE + ORE together. Sometimes it is just ORE.
A little more information: when the freeze happens and the debugger is attached, I tried pause-run-pause-run a few times. Every time it stops inside HAL_UART_IRQHandler(), while the context (the task being interrupted) changes among a few different cases, and they all appear normal. This probably means the system is otherwise running normally but is being interrupted indefinitely by the UART because of the error(s).
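To illustrate what I mean by "interrupted indefinitely", here is a toy host-side model, not the real hardware or HAL code. It assumes the interrupt stays pending for as long as an error flag is set, so a handler that never clears the flag is re-entered forever and starves the tasks. All names and bit positions (`fake_uart_t`, `FAKE_FLAG_ORE`, `run_cycles`) are made up for the illustration. How a flag is really cleared depends on the MCU family, so see the reference manual for that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only, not a real register layout. */
#define FAKE_FLAG_FE  (1u << 0)
#define FAKE_FLAG_ORE (1u << 1)
#define FAKE_ERR_MASK (FAKE_FLAG_FE | FAKE_FLAG_ORE)

typedef struct {
    uint32_t status;          /* pending error flags */
    uint32_t handler_entries; /* how many times the handler ran */
} fake_uart_t;

/* Model of the error handler: it either clears the pending flags or,
 * in the buggy case, returns with them still set. */
void fake_irq_handler(fake_uart_t *u, bool clear_flags)
{
    u->handler_entries++;
    if (clear_flags) {
        u->status &= ~FAKE_ERR_MASK;
    }
}

/* Run `budget` cycles. A cycle goes to the handler whenever a flag is
 * pending, otherwise to the tasks. Returns the cycles the tasks received. */
uint32_t run_cycles(fake_uart_t *u, bool clear_flags, uint32_t budget)
{
    uint32_t task_cycles = 0;

    for (uint32_t i = 0; i < budget; i++) {
        if (u->status & FAKE_ERR_MASK) {
            fake_irq_handler(u, clear_flags);
        } else {
            task_cycles++;
        }
    }
    return task_cycles;
}
```

With one flag set at the start, the clearing variant spends a single cycle in the handler and the rest in the tasks, while the non-clearing variant never leaves the handler. This is only one possible explanation for the freeze, not a diagnosis.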
Back to the UART IRQ handler: there is code to handle the various errors. Experiment shows FE and ORE happen quite often (unfortunately), and they are almost always handled properly, except in this case.
BTW, your advice to "set up UART in circular mode, never disable it, and handle characters as they come in" is great, and it is also what I normally do. This code does not work that way, though. It reconfigures the UART every time it starts an Rx or Tx, which must be why so many errors need to be handled.
However, this is inherited code and it has worked OK until now. I'll see whether it can be fixed easily or whether a complete rewrite is needed.
2024-02-04 05:59 PM
> Experiment shows FE and ORE happen quite often (unfortunately)
Probably should solve the frame errors first. It is not normal for those to occur. Perhaps clocks are slightly mismatched. HSE driven by crystal is going to be more accurate than HSI. Lots of other discussion in reference manual on acceptable clock mismatch and how to improve it.
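A rough way to put numbers on a clock mismatch, as a sketch only. It assumes 16x oversampling with baud = f_clk / divider, which is how many STM32 USARTs are specified, so verify it for your part. The function names are hypothetical, and the 16 MHz clock and 1% clock deviation used in the note below are example inputs, not measurements from this system.

```c
#include <stdint.h>

/* Integer baud divider for 16x oversampling, rounded to nearest.
 * Assumption: baud = f_clk / divider. */
uint32_t usart_divider(uint32_t fclk_hz, uint32_t baud)
{
    return (fclk_hz + baud / 2u) / baud;
}

/* Baud rate actually generated when the real clock is fclk_hz. */
double actual_baud(double fclk_hz, uint32_t divider)
{
    return fclk_hz / (double)divider;
}

/* Signed deviation from the nominal baud rate, in percent. */
double baud_error_percent(double actual, double nominal)
{
    return (actual - nominal) / nominal * 100.0;
}

/* Approximate sampling-point drift, in bit times, at the middle of the
 * last bit of a frame. frame_bits counts start + data + stop bits.
 * The receiver resynchronizes on each start bit, so the error only
 * accumulates within one frame. */
double drift_at_last_bit(double error_percent, unsigned frame_bits)
{
    return ((double)frame_bits - 0.5) * error_percent / 100.0;
}
```

For a nominal 16 MHz clock and 115200 baud the divider comes out as 139, which is about -0.08% off with a perfect clock. If the clock runs 1% fast, the error becomes about +0.92%, or roughly 0.09 bit times of drift by the stop bit of a 10-bit frame. The mismatch of both ends adds up, so compare the total against the receiver tolerance table in the reference manual.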