2023-12-20 10:45 AM
Hello,
I am working with a custom board that communicates through a USB-to-USART IC (FT234XD-R). Upon power-up, the board operates as expected and I can communicate with the device. After some time (usually 60 s or so, but not perfectly periodic), the device stops responding. A logic analyzer shows that the FT234 is still sending data to the MCU, but the MCU is not responding. The graph below shows a power-up, the start of communications, the MCU ceasing to respond, and the reset button restoring communications indefinitely. I have left the board running for 24 hours after a reset and did not see the issue again.
Pressing the hardware reset button on the board resets the MCU, after which I no longer have any issues. Here are the debugging techniques I have tried so far:
Current band-aid
Enabling the independent watchdog fixes the issue, but I would like to resolve the root cause rather than apply a band-aid.
Background information
STMCUBE v 1.14
USART2 with RX and TX both using the DMA on different channels.
Configuration
Initialization
HAL_UARTEx_ReceiveToIdle_DMA(&huart2, (uint8_t*)uartRxBuffer, RX_BUFFER_MAX);
Callback structure
Debug mode w/ breakpoints
When operating in debug mode, I am unable to recreate the issue. Everything works as intended after power-up, and I never have to press the hardware reset.
Setting an LED flash if the UART config changes or the code breaks out of the main loop
void Error_Handler(void)
{
  /* USER CODE BEGIN Error_Handler_Debug */
  /* User can add his own implementation to report the HAL error return state */
  __disable_irq();
  TIM21->CCR1 = 60000;  /* set a fixed LED PWM compare value as a visible error indicator */
  while (1)
  {
  }
  /* USER CODE END Error_Handler_Debug */
}
This debugging method did not reveal any relevant information. The UART would stop responding, but the code never entered the error handler or broke out of the main loop.
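A complementary check to watching for configuration changes is to verify, from the main loop, that the enable bits ReceiveToIdle DMA reception relies on are still set. The sketch below is a minimal, off-target version assuming STM32L0-style bit positions (UE, RE and IDLEIE in USART CR1, DMAR in CR3, EN in the DMA channel's CCR); verify them against your reference manual. `rx_snapshot_t`, `rx_path_faults()` and the `RXCHK_*`/`RX_FAULT_*` names are hypothetical. Depending on the DMA mode, HAL can legitimately stop the channel around an Rx event until reception is re-armed, so a fault that persists across several checks is the interesting case.

```c
#include <stdint.h>

/* Enable bits assumed necessary for ReceiveToIdle DMA reception
   (STM32L0-style layout; check your device's reference manual). */
#define RXCHK_CR1_UE      (1u << 0)  /* USART enable */
#define RXCHK_CR1_RE      (1u << 2)  /* receiver enable */
#define RXCHK_CR1_IDLEIE  (1u << 4)  /* idle-line interrupt enable */
#define RXCHK_CR3_DMAR    (1u << 6)  /* DMA enable for reception */
#define RXCHK_CCR_EN      (1u << 0)  /* DMA channel enable */

/* Fault bits returned by rx_path_faults(); 0 means nothing looks disabled. */
#define RX_FAULT_UART_OFF   (1u << 0)
#define RX_FAULT_RX_OFF     (1u << 1)
#define RX_FAULT_IDLE_OFF   (1u << 2)
#define RX_FAULT_DMAREQ_OFF (1u << 3)
#define RX_FAULT_DMACH_OFF  (1u << 4)

/* Copies of the registers, taken with plain reads (never RDR). */
typedef struct {
    uint32_t usart_cr1;
    uint32_t usart_cr3;
    uint32_t dma_ccr;   /* CCR of the DMA channel serving USART2_RX */
} rx_snapshot_t;

/* Reports which parts of the receive path appear to be switched off. */
static uint32_t rx_path_faults(const rx_snapshot_t *s) {
    uint32_t f = 0;
    if (!(s->usart_cr1 & RXCHK_CR1_UE))     f |= RX_FAULT_UART_OFF;
    if (!(s->usart_cr1 & RXCHK_CR1_RE))     f |= RX_FAULT_RX_OFF;
    if (!(s->usart_cr1 & RXCHK_CR1_IDLEIE)) f |= RX_FAULT_IDLE_OFF;
    if (!(s->usart_cr3 & RXCHK_CR3_DMAR))   f |= RX_FAULT_DMAREQ_OFF;
    if (!(s->dma_ccr & RXCHK_CCR_EN))       f |= RX_FAULT_DMACH_OFF;
    return f;
}
```

On the board, the snapshot would be filled from `USART2->CR1`, `USART2->CR3` and the RX DMA channel's `CCR`, and a non-zero result could drive the LED instead of the config-change test.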
Let me know if more information is needed. Thanks for stopping by!
Isaac.
2023-12-20 11:05 AM
Check that the UART isn't holding any sticky errors like noise, parity, framing, overrun, etc. Clear those errors if pending.
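A minimal sketch of that check, assuming an STM32L0-class USART (TIM21 in the error handler suggests an L0 part, but that is an assumption) where the PE, FE, NE and ORE flags sit in bits 0 to 3 of `USART_ISR` and are cleared by writing 1 to the same bit positions in `USART_ICR`. `fake_usart_t` and `fake_write_icr()` are hypothetical stand-ins so the logic can run on a PC; verify the bit layout against your reference manual.

```c
#include <stdint.h>

/* Error flags in USART_ISR (assumed STM32L0 layout); the matching
   clear bits in USART_ICR use the same positions. */
#define UERR_PE   (1u << 0)  /* parity error */
#define UERR_FE   (1u << 1)  /* framing error */
#define UERR_NE   (1u << 2)  /* noise detected */
#define UERR_ORE  (1u << 3)  /* overrun error */
#define UERR_MASK (UERR_PE | UERR_FE | UERR_NE | UERR_ORE)

/* Hypothetical stand-in for the peripheral registers. */
typedef struct {
    uint32_t isr;
    uint32_t icr;
} fake_usart_t;

/* Models the write-1-to-clear behaviour of ICR for the error flags. */
static void fake_write_icr(fake_usart_t *u, uint32_t val) {
    u->icr = val;
    u->isr &= ~(val & UERR_MASK);
}

/* Clears any pending error flags and returns which ones were set.
   Other ISR bits (RXNE, IDLE, ...) are left untouched. */
static uint32_t clear_sticky_errors(fake_usart_t *u) {
    uint32_t pending = u->isr & UERR_MASK;
    if (pending != 0u) {
        fake_write_icr(u, pending);
    }
    return pending;
}
```

On the target, the write would be `USART2->ICR = pending;`, or the HAL macro `__HAL_UART_CLEAR_FLAG(&huart2, UART_CLEAR_OREF)` when only overrun needs clearing. Logging the returned mask before clearing also shows which error, if any, preceded the lock-up.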
2023-12-20 12:05 PM
I'm assuming that HAL_UARTEx_RxEventCallback stops being called after a while.
You don't check the HAL status when you call HAL_UARTEx_ReceiveToIdle_DMA. If it returns HAL_BUSY, then Rx interrupts are no longer enabled, so you're pretty much dead in the water.
See this wiki and project, which show how to set a flag in order to recover if HAL_UARTEx_ReceiveToIdle_DMA does return HAL_BUSY.
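A minimal sketch of the flag-based recovery idea, written so the logic can be exercised off-target; it is not the code from the linked wiki. `status_t`, `fake_start_rx()`, `on_rx_event()`, `service_rx_rearm()` and `rx_rearm_pending` are hypothetical names, and `fake_start_rx()` only stands in for HAL_UARTEx_ReceiveToIdle_DMA by returning scripted results.

```c
#include <stdbool.h>

/* Hypothetical stand-in for HAL_StatusTypeDef. */
typedef enum { ST_OK = 0, ST_ERROR = 1, ST_BUSY = 2 } status_t;

/* Hypothetical fake for the start-reception call: returns the scripted
   results in order, then ST_OK once the script is exhausted. */
static status_t fake_results[8];
static int fake_count = 0;
static int fake_next = 0;
static int fake_calls = 0;

static status_t fake_start_rx(void) {
    fake_calls++;
    return (fake_next < fake_count) ? fake_results[fake_next++] : ST_OK;
}

/* Set when a (re)start attempt failed, i.e. reception is NOT running. */
static volatile bool rx_rearm_pending = false;

/* Call from the Rx event callback once the received bytes are consumed. */
static void on_rx_event(status_t (*start_rx)(void)) {
    if (start_rx() != ST_OK) {
        rx_rearm_pending = true;
    }
}

/* Call on every main-loop pass: retry only if the last attempt failed. */
static void service_rx_rearm(status_t (*start_rx)(void)) {
    if (rx_rearm_pending && start_rx() == ST_OK) {
        rx_rearm_pending = false;
    }
}
```

On the board, `start_rx` could be a small wrapper such as `return HAL_UARTEx_ReceiveToIdle_DMA(&huart2, (uint8_t*)uartRxBuffer, RX_BUFFER_MAX) == HAL_OK ? ST_OK : ST_BUSY;`, passed to `on_rx_event()` from HAL_UARTEx_RxEventCallback and to `service_rx_rearm()` from the main loop. Retrying only when the flag is set avoids calling the start function while a reception is already in progress.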
https://github.com/karlyamashita/Nucleo-G431RB_Three_UART/wiki
2023-12-20 02:21 PM
Hello all,
Thank you for your responses. Here is an update on data gathering. What complicates the issue is that I cannot recreate it in debug mode, so I've had to use flags and visual indicators to see whether certain blocks of code run.
Question 1: Could it be my code outside of the UART COMS?
Answer 1: When I am not communicating with the board, I do not see a reset, so the watchdog is being petted and all is well. I do not believe there is problematic code outside of what I have presented. If the board were reset by the watchdog, we would see an anomaly in the heartbeat pulse, shown in red below.
Question 2 (Tesla DeLorean): Is the UART holding any sticky errors like noise, parity, framing, overrun, etc.?
Answer 2: I checked that the UART error callback is not being called and that the interrupt is enabled. (The watchdog is now disabled so I can get a glimpse into the issue.) From the graph, you can see that COMS start and the MCU eventually stops responding. My UART error callback turns off the heartbeat LED PWM, but the graph shows that this block of code is never reached.
Question 3 (Karl Yamashita): HAL_UARTEx_ReceiveToIdle_DMA returns HAL_BUSY, leaving reception dead in the water.
Answer 3: As a quick check of whether this is the culprit, I put HAL_UARTEx_ReceiveToIdle_DMA directly in my main loop so that reception is continually re-initialized. From the graph, you can see that the issue still arises.
Great callout. I will be sure to add this redundancy check to my code.
2023-12-21 08:10 AM
Please see the response below.
2023-12-21 08:11 AM
Thanks for the comment! Please see the response below.
2023-12-21 10:09 AM
Yeah, what I'd want to see are the registers from the DMA and UART peripheral units in the failing and non-failing conditions, so I could better understand the internal context, rather than observations at the pin level.
Print them out via a secondary debug channel, not screenshots in the debugger. Watch out: reading DR/RDR or the FIFOs has side effects on the status flags, etc.
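A minimal sketch of such a dump, assuming the snapshot is copied from USART2's ISR, CR1 and CR3 plus the RX DMA channel's CCR and CNDTR. RDR is deliberately left out of the snapshot because reading it clears RXNE. `uart_dma_regs_t` and `format_regs()` are hypothetical names; the resulting line would go out over whatever secondary channel is available (a second UART, SWO, etc.).

```c
#include <stdint.h>
#include <stdio.h>

/* Register copies taken with plain reads; RDR is never read. */
typedef struct {
    uint32_t isr, cr1, cr3;       /* USART2 ISR, CR1, CR3 */
    uint32_t dma_ccr, dma_cndtr;  /* RX DMA channel CCR and remaining count */
} uart_dma_regs_t;

/* Formats one snapshot as a single text line tagged with `tag`.
   Returns snprintf's result (length of the full line). */
static int format_regs(char *buf, size_t len, const char *tag,
                       const uart_dma_regs_t *r) {
    return snprintf(buf, len,
                    "%s ISR=%08lX CR1=%08lX CR3=%08lX CCR=%08lX CNDTR=%lu\r\n",
                    tag,
                    (unsigned long)r->isr, (unsigned long)r->cr1,
                    (unsigned long)r->cr3, (unsigned long)r->dma_ccr,
                    (unsigned long)r->dma_cndtr);
}
```

Capturing one line shortly after power-up and another once the MCU stops responding would give the failing/non-failing comparison described above; for example, a CNDTR that stays fixed while the FT234 keeps sending would suggest the DMA is no longer moving data.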