
STM32 + Quectel BG95 Modem: System hangs after ~60 days - UART Idle Line setup never returns

GR88_gregni
Associate III

Background

We have a battery-powered device using:

  • MCU: STM32 (STM32L4 series)
  • Modem: Quectel BG95 (cellular)
  • Communication: UART with Idle Line Interrupt (HAL_UARTEx_ReceiveToIdle_IT)
  • Power mode: the device sleeps periodically and wakes up to communicate with the modem

Architecture Pattern

Our communication pattern:

  1. Device wakes from low-power mode
  2. Call a low-level AT command function that:
    • Enables UART RX Interrupt with Idle Line detection (HAL_UARTEx_ReceiveToIdle_IT)
    • Sends the AT command
    • Waits for the response with a timeout
    • Disables the interrupt (HAL_UART_Abort_IT)
  3. Process response
  4. Go back to sleep

This means we re-enable and disable the UART Idle Line interrupt on every transaction (potentially hundreds of times per day).

UART + Ring Buffer Setup:

#include <stdbool.h>
#include <stdint.h>
#include "lwrb/lwrb.h"

#define RING_BUFFER_SIZE 2048
#define ISR_BUFFER_SIZE  1024

typedef struct {
    lwrb_t rb;                              // lwrb ring buffer
    volatile bool rx_data_ready_flag;       // Flag set by ISR
    uint8_t rb_buffer[RING_BUFFER_SIZE];    // Ring buffer storage
    uint8_t isr_buffer[ISR_BUFFER_SIZE];    // Intermediate buffer for Idle Line ISR
} uart_rb_t;

uart_rb_t modem_rb;

 

ISR Callback:

void HAL_UARTEx_RxEventCallback(UART_HandleTypeDef *huart, uint16_t Size) {
    if (huart == &MODEM_UART) {
        // Save data from ISR buffer to ring buffer
        lwrb_write(&modem_rb.rb, modem_rb.isr_buffer, Size);
        modem_rb.rx_data_ready_flag = true;
        
        // Re-enable Idle Line interrupt
        int retries = 10;
        do {
            if (HAL_UARTEx_ReceiveToIdle_IT(&MODEM_UART, 
                                            modem_rb.isr_buffer,
                                            sizeof(modem_rb.isr_buffer)) == HAL_OK) {
                break;
            }
            retries--;
        } while (retries > 0);
    }
}
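
The callback stores each chunk with lwrb_write() but ignores its return value, which is the number of bytes actually written; if the ring buffer is full, the remainder is dropped silently. As a sketch of overflow accounting, here is a minimal single-producer/single-consumer ring buffer written from scratch as a stand-in for lwrb (spsc_rb_t and its functions are hypothetical, not lwrb's API). It uses C11 atomics to hand the indices between the ISR side and the main-loop side, and it counts every byte it had to drop.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal SPSC ring buffer, a stand-in for lwrb used only to
 * show overflow accounting. One slot is kept empty, so at most size - 1
 * bytes can be stored at once. */
typedef struct {
    uint8_t *data;
    size_t size;
    _Atomic size_t r;          /* read index: written only by the consumer */
    _Atomic size_t w;          /* write index: written only by the producer */
    _Atomic uint32_t dropped;  /* bytes the producer had no room for */
} spsc_rb_t;

static void spsc_init(spsc_rb_t *rb, uint8_t *storage, size_t size) {
    rb->data = storage;
    rb->size = size;
    atomic_init(&rb->r, 0);
    atomic_init(&rb->w, 0);
    atomic_init(&rb->dropped, 0);
}

/* Producer (e.g. the RX event callback): store what fits, count the rest. */
static size_t spsc_write(spsc_rb_t *rb, const uint8_t *src, size_t len) {
    size_t w = atomic_load_explicit(&rb->w, memory_order_relaxed);
    size_t r = atomic_load_explicit(&rb->r, memory_order_acquire);
    size_t used = (w >= r) ? (w - r) : (rb->size - r + w);
    size_t n = rb->size - 1 - used;            /* free slots */
    if (n > len) n = len;
    for (size_t i = 0; i < n; i++) {
        rb->data[w] = src[i];
        w = (w + 1) % rb->size;
    }
    atomic_store_explicit(&rb->w, w, memory_order_release); /* publish data */
    atomic_fetch_add_explicit(&rb->dropped, (uint32_t)(len - n),
                              memory_order_relaxed);
    return n;
}

/* Consumer (main loop): read up to len bytes, return how many were read. */
static size_t spsc_read(spsc_rb_t *rb, uint8_t *dst, size_t len) {
    size_t r = atomic_load_explicit(&rb->r, memory_order_relaxed);
    size_t w = atomic_load_explicit(&rb->w, memory_order_acquire);
    size_t avail = (w >= r) ? (w - r) : (rb->size - r + w);
    size_t n = (avail < len) ? avail : len;
    for (size_t i = 0; i < n; i++) {
        dst[i] = rb->data[r];
        r = (r + 1) % rb->size;
    }
    atomic_store_explicit(&rb->r, r, memory_order_release); /* free slots */
    return n;
}
```

In the real callback, the equivalent check would be comparing lwrb_write()'s return value with Size and incrementing a counter that the main loop can log.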

 

The Problem

After ~60 days of continuous operation, the device completely froze with no recovery.

Symptoms:

  • The system printed the last log line: >>StartParse >>>>
  • Then complete silence, with no further output
  • No HardFault was triggered (we have a HardFault handler that logs and resets, and it was never called)
  • The device required a power cycle

Last logs before the hang:

<<EndParse <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>StartParse >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"AT+QISTATE=1,1" --> [response received OK]

<<EndParse <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>StartParse >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"AT+QISTATE=1,1" --> [response received OK]

<<EndParse <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>StartParse >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"AT+QISTATE=1,1" --> [response received OK]

<<EndParse <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>StartParse >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[SYSTEM FROZE HERE - no further output]

Notice: The command string was not printed in the last call, suggesting the code hung before that point.
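
The low-level function prints the command with printf("\"%s\" --> ", command_to_send), which has no trailing newline. If stdout is line-buffered or fully buffered (this depends on how the C library is retargeted to the log UART; with newlib it is common), that text can sit in the library's buffer until the next '\n' is printed, so a hang right after this call would hide the command string even though the printf() itself executed. A minimal sketch of two ways to rule this out using only standard C: disable stdout buffering once at start-up, or flush after a partial line. log_make_unbuffered() and log_command() are hypothetical helper names.

```c
#include <stdio.h>

/* Make stdout unbuffered so every printf() is handed to the low-level
 * write routine immediately, even without a trailing '\n'.
 * Call once at start-up, before anything else is printed.
 * Returns 0 on success, as setvbuf() does. */
int log_make_unbuffered(void) {
    return setvbuf(stdout, NULL, _IONBF, 0);
}

/* Alternative: keep buffering, but flush explicitly after a partial line. */
void log_command(const char *cmd) {
    printf("\"%s\" --> ", cmd);
    fflush(stdout);
}
```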

Code Structure (simplified)

Parent function:

uint32_t tick = HAL_GetTick();
do {
    return_code = modem_send_command_wait_parse_result(
        "AT+QISTATE=1,1", 
        "+QISTATE:", 
        /* parsing params */,
        300 /* timeout ms */
    );
    
    if (condition_met) break;
    HAL_Delay(100);
    
} while (HAL_GetTick() - tick < 15000);  // 15 second outer timeout
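
The outer loop above (and the receive loop in the low-level function) measures time as HAL_GetTick() - tick. With uint32_t operands the subtraction is performed modulo 2^32, so the difference is the true elapsed time even if the counter wrapped in between, as long as the interval is shorter than the wrap period. The sketch below uses hypothetical helper names and plain uint32_t values standing in for HAL_GetTick(); it contrasts that pattern with a comparison against an absolute deadline, which does break near the wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe: unsigned subtraction yields elapsed ms modulo 2^32. */
static bool tick_elapsed(uint32_t now, uint32_t start, uint32_t timeout_ms) {
    return (uint32_t)(now - start) >= timeout_ms;
}

/* Counter-example: start + timeout_ms can wrap to a small value, so this
 * reports "expired" immediately when start is close to 0xFFFFFFFF. */
static bool tick_elapsed_naive(uint32_t now, uint32_t start, uint32_t timeout_ms) {
    return now >= start + timeout_ms;
}
```

For example, with start = 0xFFFFFF9C (100 ms before the wrap) and a 300 ms timeout, tick_elapsed() correctly reports "not expired" 150 ms later, while tick_elapsed_naive() already reports "expired" at start itself.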

 

Low-level function structure:

int modem_send_command_wait_parse_result(..., int timeout, ...) {
    // Local buffers
    char formatted_command[1024] = {0};
    char buffer_final[2048] = {0};
    unsigned int total_bytes = 0;
    
    printf(">>StartParse >>>>\n");
    if (command_to_send != NULL) {
        printf("\"%s\" --> ", command_to_send);
    }
    
    // Enable UART RX Interrupt with Idle-line detection
    int retries = 10;
    do {
        if (HAL_UARTEx_ReceiveToIdle_IT(&MODEM_UART, 
                                        modem_rb.isr_buffer,
                                        sizeof(modem_rb.isr_buffer)) == HAL_OK) {
            break;
        }
        retries--;
    } while (retries > 0);
    
    if (retries == 0) {
        return_code = -1;
    }
    
    uint32_t tick = HAL_GetTick();
    
    // Main receive loop with timeout
    while (return_code > 0 && ((HAL_GetTick() - tick) < timeout)) {
        // Check flag and read from ring buffer
        if (modem_rb.rx_data_ready_flag) {
            modem_rb.rx_data_ready_flag = false;
            
            int bytes = lwrb_read(&modem_rb.rb, 
                                  &buffer_final[total_bytes],
                                  sizeof(buffer_final) - total_bytes);
            total_bytes += bytes;
        }
        
        // Parse response, check for expected strings, etc.
        // ...
    }
    
    // Cleanup
    HAL_UART_Abort_IT(&MODEM_UART);
    lwrb_reset(&modem_rb.rb);
    
    return return_code;
}
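
In the receive loop above, lwrb_read() is given sizeof(buffer_final) - total_bytes as its limit, so it may fill all 2048 bytes of buffer_final and leave no terminating '\0'. If the elided parsing step uses C string functions such as strstr(), it could then read past the array. A minimal sketch of a bounded append that always reserves room for the terminator; append_bounded() is a hypothetical helper, and with lwrb the equivalent change would be passing sizeof(buffer_final) - 1 - total_bytes to lwrb_read().

```c
#include <stddef.h>
#include <string.h>

/* Append up to len bytes from src to buf, always keeping one byte for the
 * '\0' terminator. *used tracks the current string length.
 * Returns the number of bytes actually copied. */
static size_t append_bounded(char *buf, size_t buf_size, size_t *used,
                             const char *src, size_t len) {
    if (buf_size == 0 || *used >= buf_size - 1) {
        return 0;                          /* buffer already full */
    }
    size_t room = buf_size - 1 - *used;
    size_t n = (len < room) ? len : room;
    memcpy(buf + *used, src, n);
    *used += n;
    buf[*used] = '\0';                     /* buf stays a valid C string */
    return n;
}
```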

 

My Questions

  1. Is re-enabling the UART Idle Line interrupt on every transaction a valid approach? Could repeatedly calling HAL_UARTEx_ReceiveToIdle_IT (with a 1 KB buffer) → HAL_UART_Abort_IT → HAL_UARTEx_ReceiveToIdle_IT (hundreds of times per day, so many thousands of times over 60 days) corrupt the UART peripheral state?
  2. When we enable Idle Line reception in the low-level function, HAL_UARTEx_ReceiveToIdle_IT sometimes returns something other than HAL_OK (HAL_BUSY or HAL_ERROR), and the retry loop kicks in. Is this expected behavior, or does it indicate an underlying UART state problem that could accumulate over time?
  3. Is there a race condition in the flag handling? rx_data_ready_flag is set in the ISR and cleared in the main loop without atomic operations. Could this cause issues?
     // Main loop (non-atomic):
     if (modem_rb.rx_data_ready_flag) {        // Read
         modem_rb.rx_data_ready_flag = false;  // Write - ISR could interrupt here!
     }
  4. Could printf() cause the hang? We use printf extensively for debugging over a separate UART. Could a printf buffer overflow or blocking UART TX freeze the system without triggering an exception?
  5. HAL_GetTick() overflow handling: with the default 1 ms tick, HAL_GetTick() wraps around after about 49.7 days (2^32 ms). Our timeout check is (HAL_GetTick() - tick) < timeout. Is this safe across the wrap?
  6. Ring buffer overflow: if the 2 KB ring buffer fills up and data is lost, could the expected response string never arrive, leading to a timeout? Though that case should be caught by the timeout logic...
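
One way to make the check-and-clear on rx_data_ready_flag a single indivisible step is a C11 atomic exchange. A minimal sketch, assuming a toolchain with <stdatomic.h>; on cores with exclusive load/store instructions, such as the Cortex-M4 in the STM32L4, this is typically lock-free. rx_ready, isr_signal_rx() and main_take_rx() are hypothetical names standing in for the flag and the code that touches it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for modem_rb.rx_data_ready_flag. */
static atomic_bool rx_ready = false;

/* ISR side: called from the RX event callback after data is stored. */
static void isr_signal_rx(void) {
    atomic_store_explicit(&rx_ready, true, memory_order_release);
}

/* Main-loop side: reads and clears the flag in one atomic step.
 * Returns true if the ISR signalled since the last call. */
static bool main_take_rx(void) {
    return atomic_exchange_explicit(&rx_ready, false, memory_order_acquire);
}
```

As far as the simplified snippet shows, the original loop clears the flag before lwrb_read() drains the ring buffer, so data from an ISR that fires between the check and the clear is still picked up by that same read; the atomic version mainly makes the intent explicit.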

Request for advice:

  • Are we using UART Idle Line correctly for this use case (repeated enable/disable)?
  • Is the flag handling race condition a real concern, or is it benign?
  • Any known issues with long-term UART peripheral usage on STM32?
  • Could the OpenLog dev board, which we connect to our device to collect logs onto an SD card, be causing the issue?
GR88_gregni
Associate III

@TDK @Andrew Neil I ran my program in debug mode and saw that, when it called HAL_UARTEx_ReceiveToIdle_IT, execution stopped inside this function in stm32l4xx_hal_uart.c:

void HAL_UART_IRQHandler(UART_HandleTypeDef *huart)
{
  uint32_t isrflags   = READ_REG(huart->Instance->ISR);
  uint32_t cr1its     = READ_REG(huart->Instance->CR1); // Stopped here
  uint32_t cr3its     = READ_REG(huart->Instance->CR3);

  uint32_t errorflags;
  uint32_t errorcode;
.....
}

The call stack in debug mode is the following (innermost frame first):
HAL_UART_IRQHandler()                    <- stopped at the line marked above
USART1_IRQHandler()                      -> HAL_UART_IRQHandler(&huart1);
UART_Start_Receive_IT()                  -> /* Computation of UART mask to apply to RDR register */
                                            UART_MASK_COMPUTATION(huart);
HAL_UARTEx_ReceiveToIdle_IT()
modem_send_command_wait_parse_result()
modem_open_connection()
udp_FSM_open_socket()

In the USART1 ISR register, the only bits set are:
  • IDLE
  • RXNE
  • TC
  • TXE
  • EOBF
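
When a raw ISR value is copied from the debugger instead of read flag by flag, a small host-side decoder can map it back to names. A sketch, assuming the USART_ISR bit positions for bits 0 to 12 from the STM32L4 reference manual (RM0351); usart_isr_names and usart_isr_decode() are made-up names for illustration. The five flags listed above correspond to the raw value 0x10F0.

```c
#include <stddef.h>
#include <stdint.h>

/* Names of USART_ISR bits 0..12 (assumed RM0351 layout), indexed by bit. */
static const char *const usart_isr_names[] = {
    "PE", "FE", "NF", "ORE", "IDLE", "RXNE", "TC", "TXE",
    "LBDF", "CTSIF", "CTS", "RTOF", "EOBF"
};

/* Store pointers to the names of the set flags in out[] (at most max
 * entries), lowest bit first, and return how many were stored. */
static size_t usart_isr_decode(uint32_t isr, const char **out, size_t max) {
    size_t count = 0;
    size_t nbits = sizeof usart_isr_names / sizeof usart_isr_names[0];
    for (size_t bit = 0; bit < nbits && count < max; bit++) {
        if (isr & (1u << bit)) {
            out[count++] = usart_isr_names[bit];
        }
    }
    return count;
}
```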

Core registers at that point:
sp = 0x20016f60
lr = 134363963 (0x08023B3B)

Disassembly shows this:

2294        uint32_t cr1its     = READ_REG(huart->Instance->CR1);
0802b8f6:   ldr     r3, [r7, #4]
0802b8f8:   ldr     r3, [r3, #0]   <- highlighted by the debugger
0802b8fa:   ldr     r3, [r3, #0]
0802b8fc:   str.w   r3, [r7, #224]  @ 0xe0



I can't understand why the command is not printed in modem_send_command_wait_parse_result(). In debug mode nothing looks wrong: the command is passed into the function normally and I can see it there, but it does not appear in the logs. I also can't understand why the MCU gets stuck there and sits idle, doing nothing, for hours.

When I press Run (the play button) in the debugger, the code continues correctly. I also saw that in the low-level function modem_send_command_wait_parse_result(), HAL_UARTEx_ReceiveToIdle_IT() fails on the first attempt, and execution stopped during the second attempt; pressing Run again lets it continue.