
Modbus RTU Slave Stops Responding – Only Recovers After Power Cycle

cuong
Associate II

[Issue] One Modbus RTU Slave Stops Responding After Several Months – Only Recovers After Power Cycle (STM32G4 / UART Interrupt)

Hello ST Community,

I’d like to ask for help with a real issue we're facing in the field.

System Overview:

  • We are using STM32G474 as a Modbus RTU slave.

  • UART is configured in interrupt mode using HAL_UART_Receive_IT().

  • There are 5 slaves on the same RS485 line, with Modbus IDs 1 to 5.

  • The master sends regular requests to all 5 slaves continuously.

Possible Causes:

  • We suspect that a UART error such as a Framing Error or Overrun may have occurred and caused the UART peripheral to hang.

  • We do not use HAL_UART_ErrorCallback(), and we do not call HAL_UART_Abort() in our code.

  • In HAL_UART_RxCpltCallback(), we always call HAL_UART_Receive_IT() again, so we believed it would keep working, but somehow it stops responding after a long time.


Questions:

  1. Has anyone experienced a similar issue where a UART stops responding after months of operation?

  2. Should we always implement HAL_UART_ErrorCallback() and call HAL_UART_Abort() to ensure recovery from errors like FE, ORE, NE, PE?

  3. Is it advisable to add a software reset mechanism or watchdog recovery in case the UART gets stuck and stops responding?

We’re looking for a robust and long-term solution, since this is for field-deployed devices that cannot be manually reset easily.

Any insights, experiences, or advice would be greatly appreciated.

Thank you very much!

5 REPLIES
Andrew Neil
Super User

@cuong wrote:
  • We suspect that a UART error like Framing Error or Overrun may have occurred, which caused the UART peripheral to hang,


So make sure that you handle all such errors.

Pay attention to HAL return codes.

Test by deliberately injecting such errors.

Have some way to record & recover diagnostics.

Implement a "watchdog"
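
A minimal sketch of what such a "watchdog" could look like at the communication level, assuming the firmware can tell when a valid request addressed to this slave has arrived. All names here (`comm_watchdog_t`, `comm_wd_*`) are hypothetical and not part of the HAL; the tick is assumed to be a free-running millisecond counter such as the value returned by `HAL_GetTick()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical communication watchdog; names are illustrative only. */
typedef struct {
    uint32_t last_frame_tick; /* tick (ms) of the last valid frame for this slave */
    uint32_t timeout_ms;      /* silence longer than this requests a recovery */
    uint32_t recovery_count;  /* diagnostic: number of recoveries requested */
} comm_watchdog_t;

void comm_wd_init(comm_watchdog_t *wd, uint32_t now, uint32_t timeout_ms)
{
    wd->last_frame_tick = now;
    wd->timeout_ms = timeout_ms;
    wd->recovery_count = 0u;
}

/* Call whenever a valid request for this slave has been received. */
void comm_wd_kick(comm_watchdog_t *wd, uint32_t now)
{
    wd->last_frame_tick = now;
}

/* Call periodically from the main loop. Returns true when the caller
 * should recover the UART. Unsigned subtraction keeps the comparison
 * correct when the tick counter wraps around. */
bool comm_wd_expired(comm_watchdog_t *wd, uint32_t now)
{
    if ((uint32_t)(now - wd->last_frame_tick) < wd->timeout_ms) {
        return false;
    }
    wd->last_frame_tick = now; /* restart the window: one request per timeout */
    wd->recovery_count++;
    return true;
}
```

In the main loop, a check along the lines of `if (comm_wd_expired(&wd, HAL_GetTick()))` could abort and restart reception, for example with `HAL_UART_AbortReceive()` followed by `HAL_UART_Receive_IT()`, and `recovery_count` can be read back later as a field diagnostic. The timeout should be longer than the master's longest polling gap, otherwise a quiet bus triggers needless recoveries.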

 

A complex system that works is invariably found to have evolved from a simple system that worked.
A complex system designed from scratch never works and cannot be patched up to make it work.
mbarg.1
Senior III

I do have many projects with Modbus RTU and no problems.

One reason is that we use DMA transfers on top of a sound OS (ThreadX, aka Azure RTOS).

All my IoT-like apps have an IWDG, just in case a power glitch happens. It is not intended to recover from Modbus errors, and it does not monitor the serial lines.

Modbus errors must be handled at protocol level, and that can be easily achieved.
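
One piece of protocol-level handling is rejecting any frame whose CRC does not match, so that a corrupted frame is simply dropped. A minimal sketch, assuming a complete frame is already in a buffer; the function names `modbus_crc16` and `modbus_frame_ok` are hypothetical, while the CRC parameters (reflected polynomial 0xA001, initial value 0xFFFF, low byte sent first) are those of Modbus RTU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF. */
uint16_t modbus_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            } else {
                crc = (uint16_t)(crc >> 1);
            }
        }
    }
    return crc;
}

/* Accept a frame only if it is at least address + function + 2 CRC bytes
 * long and its trailing CRC (low byte first) matches the computed one. */
bool modbus_frame_ok(const uint8_t *frame, size_t len)
{
    if (len < 4u) {
        return false;
    }
    uint16_t expected = modbus_crc16(frame, len - 2u);
    uint16_t received = (uint16_t)(frame[len - 2u] |
                                   ((uint16_t)frame[len - 1u] << 8));
    return expected == received;
}
```

A slave would typically stay silent on a frame that fails this check and leave the retry to the master's timeout.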

A USART with DMA can run serial data transfers in the background with 100% reliability.

That is valid both for slave and master instances.

@Andrew Neil

Why does the UART recover by itself after a Framing Error?

I'm using STM32 (HAL) at 115200 baud.
I deliberately caused a Framing Error by sending data at the wrong baud rate.
As expected, the UART stopped receiving during the error.
But after I switched back to the correct baud rate, the UART started receiving data normally again,
even though I did not implement HAL_UART_ErrorCallback() or explicitly clear any error flags.

Is this expected behavior?
Does the internal UART peripheral automatically reset itself when the baud rate matches again?

The error callback only signals errors. The logic in the USART is deterministic: after N clock cycles it gets back to a steady state.

N can be huge, but N exists by design.

The only ways to hang a USART are changed register values or clock glitches. Both can be caused by static electricity, which is common in industrial environments, or by power-induced transients.

Although N exists, there are some register combinations that can leave the USART stuck, and if it is stuck you may get no callback at all.

You will need more simulations before you dismiss this case and have your design approved for prime time.

Remember that working with a debugger is different from real operation, which makes things more difficult: if it happens while debugging, it also happens in standalone mode, but not the other way around.

Should you decide to go further, please describe your test environment and hardware setup.

We all have good crystal balls, but reducing uncertainty will lead to better advice.


@cuong wrote:

 Why does the UART recover by itself after a Framing Error?


The UART itself does recover - it'll be something in your code that gets "stuck".
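
One way code can get "stuck" in this pattern, offered as a guess rather than a diagnosis of the firmware in question: the call that re-arms reception returns a status other than `HAL_OK` (`HAL_UART_Receive_IT()` returns a `HAL_StatusTypeDef`), the return value is ignored, and nothing ever tries again. A minimal sketch of tracking that outcome so the main loop can retry; `rx_rearm_t` and the `rx_rearm_*` functions are hypothetical names, not HAL API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for re-arming an interrupt-driven receive. */
typedef struct {
    volatile bool pending; /* true while a failed re-arm still needs a retry */
    uint32_t failures;     /* diagnostic: number of failed re-arm attempts */
} rx_rearm_t;

/* Record the outcome of a re-arm attempt. `started` is true when the
 * receive was started successfully. */
void rx_rearm_record(rx_rearm_t *r, bool started)
{
    if (started) {
        r->pending = false;
    } else {
        r->pending = true;
        r->failures++;
    }
}

/* Poll from the main loop: true means reception is not armed and the
 * caller should try to start it again. */
bool rx_rearm_needed(const rx_rearm_t *r)
{
    return r->pending;
}
```

In the receive-complete and error callbacks, the result of the re-arm call would be passed in, for example `rx_rearm_record(&r, HAL_UART_Receive_IT(&huart, buf, len) == HAL_OK)` with `huart`, `buf` and `len` standing in for the application's own handle and buffer. The main loop would repeat the same call whenever `rx_rearm_needed(&r)` is true.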
