STM32L4 + Cellular Modem: IWDG vs Long Blocking Operations Dilemma

GR88_gregni · 2025-08-07

Hello STM32 Community,

I'm developing a cellular IoT device using STM32L4 and facing an architectural challenge regarding IWDG implementation with long cellular communication timeouts. I'd appreciate your expertise and recommendations.

System Overview:

Hardware: STM32L4 custom board with cellular modem (Quectel BG95/BG96)
Communication: UART-based AT commands with IDLE line interrupt (no DMA)
Application: Periodic sensor data transmission to server via UDP
Power Management: Device enters STOP2 mode between measurement cycles

IWDG Configuration:

Clock Source: LSI (32kHz)
Timeout: ~32.8 seconds maximum (IWDG_PRESCALER_256, Reload=4095)
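Behavior: Once enabled, cannot be disabled; frozen during STOP modes (this relies on the IWDG_STOP option bit being cleared, as the IWDG otherwise keeps counting in Stop mode)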
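
For reference, the ~32.8 s figure follows from timeout = (reload + 1) × prescaler / f_LSI. The sketch below (the helper name `iwdg_timeout_ms` is hypothetical) reproduces that arithmetic. Because the LSI is not a precise clock, a refresh period well under the nominal value is the safer target.

```c
#include <stdint.h>

/* Hypothetical helper: nominal IWDG timeout in milliseconds, using
 * timeout = (reload + 1) * prescaler / f_LSI.
 * lsi_hz is the nominal LSI frequency; the real frequency varies between
 * parts and with temperature, so leave margin when choosing a refresh period. */
static uint32_t iwdg_timeout_ms(uint32_t prescaler, uint32_t reload, uint32_t lsi_hz)
{
    /* 64-bit intermediate avoids overflow of prescaler * reload * 1000. */
    uint64_t ticks = (uint64_t)(reload + 1u) * prescaler;
    return (uint32_t)((ticks * 1000u) / lsi_hz);
}
```

With the configuration above, `iwdg_timeout_ms(256, 4095, 32000)` gives 32768 ms.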

The Challenge: My low-level AT command function implements timeout-based communication to prevent infinite loops:

    int <function_name>(const char *command, const char *expected_response,
                        /* other params */, uint32_t timeout_ms) {
        int result = -1;                      // assume failure until the expected response arrives
        uint32_t start_time = HAL_GetTick();

        // Send AT command via UART

        while ((HAL_GetTick() - start_time) < timeout_ms) {   // unsigned subtraction is wrap-safe
            if (rx_data_ready_flag) {
                // Copy UART response to buffer (keep it NUL-terminated for strstr)
                // Check for expected response patterns
            }

            if (strstr(response_buffer, expected_response)) {
                // Parse response
                result = 0;                   // success
                break;
            }

            // Handle error responses, retries, etc.
        }

        return result;
    }

The Problem: Some legitimate cellular operations require timeouts exceeding the IWDG maximum:

Network Time Protocol (NTP) synchronization: Up to 125 seconds
Network registration in poor coverage: Up to 60 seconds

Currently, the IWDG triggers a reset during these legitimate operations, causing communication failures.

Questions for the Community:

Error Masking Concern: If I add HAL_IWDG_Refresh(&hiwdg) inside the timeout loop every 25 seconds, could this mask genuine errors? For example, what if strstr() misbehaves, or the system enters an unexpected state but keeps looping?
Architecture Recommendations: What's the best practice for handling this scenario? For example, should I implement progress-based IWDG refresh (only refresh when receiving data)?
Real-World Experience: For those working with cellular IoT applications, how do you balance IWDG protection with legitimate long network operations?
Debugging Implications: If I implement periodic IWDG refresh, what debugging strategies would you recommend to ensure I'm not masking critical issues?
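
On the error-masking question, one option is to bound the refresh by the operation's own time budget rather than by received data alone, since a purely progress-based refresh could starve the IWDG while the modem is legitimately silent. A minimal sketch, assuming the current tick is passed in from `HAL_GetTick()`; the `wdg_guard_*` names are hypothetical. If the loop overruns its declared budget (for example because the timeout check itself is broken), refreshes stop and the IWDG resets the MCU once its own timeout expires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard for one long operation: the hardware refresh is only
 * permitted while the operation is still inside its declared budget. */
typedef struct {
    uint32_t start_ms;     /* tick when the operation started */
    uint32_t budget_ms;    /* operation timeout plus a small margin */
    uint32_t last_feed_ms; /* tick of the last permitted refresh */
    uint32_t interval_ms;  /* refresh period, well below the IWDG timeout */
} wdg_guard_t;

static void wdg_guard_start(wdg_guard_t *g, uint32_t now_ms,
                            uint32_t budget_ms, uint32_t interval_ms)
{
    g->start_ms = now_ms;
    g->budget_ms = budget_ms;
    g->last_feed_ms = now_ms;
    g->interval_ms = interval_ms;
}

/* Returns true when the caller should refresh the IWDG now
 * (on target: HAL_IWDG_Refresh(&hiwdg)). Unsigned subtraction keeps the
 * comparisons correct across a 32-bit tick wraparound. */
static bool wdg_guard_should_refresh(wdg_guard_t *g, uint32_t now_ms)
{
    if ((uint32_t)(now_ms - g->start_ms) >= g->budget_ms) {
        return false; /* overran its own budget: let the IWDG fire */
    }
    if ((uint32_t)(now_ms - g->last_feed_ms) < g->interval_ms) {
        return false; /* not due yet */
    }
    g->last_feed_ms = now_ms;
    return true;
}
```

For the NTP case, a budget of 125000 ms plus a few seconds of margin and a 20000 ms interval would keep the IWDG fed only while the operation is within its expected duration.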

Current Timeout Examples:

Standard AT commands: 300-5000ms
Network registration: 60,000ms
NTP time sync: 125,000ms
Modem initialization: 30,000ms

The system works reliably when IWDG is disabled during development, but I need it enabled for production deployment to handle potential firmware hangs, memory corruption, or hardware issues. The device operates in remote locations where manual recovery isn't feasible, making robust watchdog implementation critical. However, cellular connectivity can be unpredictable, and legitimate operations sometimes require extended timeouts.

Any insights, experiences, or architectural recommendations would be greatly appreciated!

Best regards,

NG

Andrew Neil · 2025-08-07

Don't use blocking delays.
If you really must use blocking delays, divide them into "chunks" of less than the IWDG timeout.
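
If a blocking wait is unavoidable, the chunking can live in one small helper. A minimal sketch (the helper name is hypothetical) that hands out slices no longer than `chunk_ms`; the commented usage assumes `HAL_Delay()` and `HAL_IWDG_Refresh(&hiwdg)` on target.

```c
#include <stdint.h>

/* Hypothetical helper: returns the next slice of a long wait, never longer
 * than chunk_ms, and decrements *remaining_ms. Returns 0 when the wait is
 * complete. chunk_ms must be nonzero and should sit well below the IWDG
 * timeout (e.g. 25000 for a ~32.8 s IWDG).
 *
 * Assumed usage on target:
 *   uint32_t left = 125000u, step;
 *   while ((step = delay_next_chunk(&left, 25000u)) != 0u) {
 *       HAL_Delay(step);
 *       HAL_IWDG_Refresh(&hiwdg);
 *   }
 */
static uint32_t delay_next_chunk(uint32_t *remaining_ms, uint32_t chunk_ms)
{
    uint32_t step = (*remaining_ms < chunk_ms) ? *remaining_ms : chunk_ms;
    *remaining_ms -= step;
    return step;
}
```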

A complex system that works is invariably found to have evolved from a simple system that worked.
A complex system designed from scratch never works and cannot be patched up to make it work.

GR88_gregni · 2025-08-07

I am using a state machine that handles the communication process with the modem. The blocking delays are necessary because if the modem doesn't successfully obtain an IPv4/IPv6 address, the system cannot proceed to the next step.

Regarding chunking the timeouts: The modem genuinely requires up to 60 seconds for successful network connection, and up to 125 seconds for NTP synchronization. If I were to chunk these operations with my current implementation, I would need to:

Split the 60-second timeout into two 30-second calls
Split the 125-second timeout into four calls of roughly 31 seconds each

However, this approach has a significant drawback: if the first chunk times out, I need to ensure the low-level function continues listening for the response rather than giving up. This would require substantial code restructuring and make the implementation much more complex and less readable.
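
Continuing to listen across chunks mostly comes down to keeping the accumulated response outside the per-chunk wait, so that a chunk ending without a match reports "still pending" rather than failure. A minimal sketch under that assumption; `at_wait_t`, `at_wait_begin()` and `at_wait_poll()` are hypothetical names, and `rx`/`rx_len` stand for whatever the UART IDLE-line handler has collected since the previous call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { AT_PENDING, AT_OK, AT_TIMEOUT } at_status_t;

/* Hypothetical receive context that survives across chunked waits, so a
 * chunk that ends without the expected text does not discard data. */
typedef struct {
    char buf[256];
    size_t len;
    uint32_t start_ms;
    uint32_t timeout_ms; /* overall limit, e.g. 125000 for NTP */
} at_wait_t;

static void at_wait_begin(at_wait_t *w, uint32_t now_ms, uint32_t timeout_ms)
{
    w->len = 0u;
    w->buf[0] = '\0';
    w->start_ms = now_ms;
    w->timeout_ms = timeout_ms;
}

/* Appends newly received bytes (pass "" and 0 when nothing arrived) and
 * checks for the expected text. Returns AT_PENDING while still waiting, so
 * the caller can refresh the IWDG and call again. */
static at_status_t at_wait_poll(at_wait_t *w, const char *rx, size_t rx_len,
                                const char *expected, uint32_t now_ms)
{
    size_t room = sizeof w->buf - 1u - w->len;
    if (rx_len > room) {
        rx_len = room; /* excess bytes are dropped rather than overflowing */
    }
    memcpy(w->buf + w->len, rx, rx_len);
    w->len += rx_len;
    w->buf[w->len] = '\0'; /* keep strstr() on a terminated string */

    if (strstr(w->buf, expected) != NULL) {
        return AT_OK;
    }
    if ((uint32_t)(now_ms - w->start_ms) >= w->timeout_ms) {
        return AT_TIMEOUT;
    }
    return AT_PENDING;
}
```

The caller would invoke `at_wait_poll()` from its loop or state machine, refreshing the IWDG between calls, until it returns `AT_OK` or `AT_TIMEOUT`; the overall limit stays in one place.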

That's why I'm asking: if I simply refresh the IWDG counter inside the existing blocking delay (within the while (timeout) loop), could this potentially mask errors that would otherwise cause the MCU to reset? My primary goal is to prevent the MCU from getting stuck while still allowing legitimate long operations to complete.

In other words, I want to ensure I'm not trading IWDG timeout issues for the risk of masking genuine system failures.
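
A common way to keep a refresh from hiding a fault is to refresh from a single place, and only when every monitored activity has checked in since the previous refresh, so a loop that merely keeps spinning cannot keep the MCU alive on its own. A minimal sketch; the activity names and helpers are hypothetical. It fits a non-blocking main loop best, and if check-ins come from interrupts, the flags should be cleared with interrupts masked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check-in bits, one per activity the watchdog vouches for. */
#define WDG_CHECKIN_MAIN_LOOP (1u << 0)
#define WDG_CHECKIN_MODEM_SM  (1u << 1)
#define WDG_CHECKIN_SENSORS   (1u << 2)
#define WDG_CHECKIN_ALL (WDG_CHECKIN_MAIN_LOOP | WDG_CHECKIN_MODEM_SM | WDG_CHECKIN_SENSORS)

static volatile uint32_t wdg_checkins;

/* Called by each activity when it has made genuine progress. */
static void wdg_checkin(uint32_t bit)
{
    wdg_checkins |= bit;
}

/* Called from exactly one place (e.g. the main loop). Returns true, and
 * clears the flags, only when every activity has checked in; the caller then
 * performs the real refresh (HAL_IWDG_Refresh(&hiwdg) on target). */
static bool wdg_all_checked_in(void)
{
    if ((wdg_checkins & WDG_CHECKIN_ALL) != WDG_CHECKIN_ALL) {
        return false;
    }
    wdg_checkins = 0u;
    return true;
}
```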

Andrew Neil · 2025-08-07

Using a State Machine should make it easy to use non-blocking delays!

@GR88_gregni wrote:
The blocking delays are necessary because if the modem doesn't successfully obtain an IPv4/IPv6 address, the system cannot proceed to the next step.

That doesn't follow at all - that can certainly be handled without blocking delays.
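
As an illustration of that point, a registration wait can be expressed as a step function that returns immediately, so the main loop keeps running (and refreshing the IWDG) for the whole 60 s. A minimal sketch; the state names and the `registered` input, which would come from parsing the modem's response, are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { REG_IDLE, REG_WAITING, REG_DONE, REG_FAILED } reg_state_t;

typedef struct {
    reg_state_t state;
    uint32_t start_ms;
    uint32_t timeout_ms;
} reg_sm_t;

/* Starts a registration attempt; on target, the AT command that triggers or
 * queries registration would be sent here. */
static void reg_start(reg_sm_t *sm, uint32_t now_ms, uint32_t timeout_ms)
{
    sm->state = REG_WAITING;
    sm->start_ms = now_ms;
    sm->timeout_ms = timeout_ms;
}

/* One non-blocking step. 'registered' is the hypothetical result of parsing
 * the modem's response since the previous step. */
static reg_state_t reg_step(reg_sm_t *sm, uint32_t now_ms, bool registered)
{
    if (sm->state == REG_WAITING) {
        if (registered) {
            sm->state = REG_DONE;
        } else if ((uint32_t)(now_ms - sm->start_ms) >= sm->timeout_ms) {
            sm->state = REG_FAILED;
        }
    }
    return sm->state;
}
```

A main loop would call `reg_step(&sm, HAL_GetTick(), registered)` on each pass and move on to the next step of the connection sequence only once it returns `REG_DONE`.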

@GR88_gregni wrote:
if the first chunk times out, I need to ensure the low-level function continues listening for the response rather than giving up. This would require substantial code restructuring and make the implementation much more complex and less readable.

It shouldn't do; e.g.,

    while ((HAL_GetTick() - start_time) < timeout_ms) {
        if (rx_data_ready_flag) {
            // Copy UART response to buffer
            // Check for expected response patterns
        }
        
        if (strstr(response_buffer, expected_response)) {
            // Parse response and return success
            break;
        }

        if( time_to_update_wd() )
        {
            // Update the WD
        }
        
        // Handle error responses, retries, etc.
    }
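
`time_to_update_wd()` is not defined in the snippet above. One possible (hypothetical) implementation is an interval check on the tick; this version takes the current tick as a parameter so it can be exercised off-target, whereas on target it could read `HAL_GetTick()` itself. The 10 s interval is an assumption, chosen to sit well inside the ~32.8 s IWDG timeout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical refresh period, well below the ~32.8 s IWDG timeout to leave
 * margin for LSI frequency tolerance. */
#define WD_UPDATE_INTERVAL_MS 10000u

static uint32_t wd_last_update_ms;

/* Returns true (and records the time) once WD_UPDATE_INTERVAL_MS has elapsed
 * since the last update. Unsigned subtraction is wrap-safe. */
static bool time_to_update_wd(uint32_t now_ms)
{
    if ((uint32_t)(now_ms - wd_last_update_ms) >= WD_UPDATE_INTERVAL_MS) {
        wd_last_update_ms = now_ms;
        return true;
    }
    return false;
}
```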