Application based on LoRaWAN_End_Node_FreeRTOS hangs sporadically

JBive.1
Associate II

Hi all,

We have a sporadic issue that is difficult to debug. Could you give us some advice, solutions or hints? Any input is welcome!

We are working on an application based on the LoRaWAN_End_Node_FreeRTOS project from STM32CubeWL V1.3.0 (board: Nucleo-WL55JC1). We frequently make small updates and rebuild the application (build system: CMake). The application usually runs for many days without issues. However, some builds show sporadic errors: the application hangs with interrupts disabled, so the watchdog (IWDG) resets the MCU. We are certain that the program does not get stuck in an interrupt or exception handler, because we instrumented all of them.

Software components used:

  • FreeRTOS V10.2.1
  • LoRaWAN stack by ST
  • UTIL_ADV_TRACE (HAL_UART_Transmit_DMA)
  • UTIL_TIMER
  • UTIL_LPM
  • IWDG

All of these components are taken from the STM32CubeWL V1.3.0 package.

One specific build of the application shows the following behavior: after reset it joins the LoRaWAN network and transmits/receives data packets every 60 s; at an uptime of 541 s, the application hangs after waking up from STOP mode and the IWDG resets the MCU. We ran some experiments with this build to try to catch the bug. Observations:

  • The error happens about 4 out of 5 times after 541 seconds (reproducible).
  • If the error does not happen, the application just runs normally (2 days and longer).
  • When a debugger (STM32CubeIDE) is connected -> no error, or the error happens later (we cannot tell which).
  • Changing a startup delay (a constant in the code) by 1 second -> no error, or the error happens later.
  • Changing something in the code, e.g. in a function that is never called -> no error, or the error happens later. So the error behavior changes when the firmware image changes, even though the program flow is the same?
  • Stack overflow detection for the FreeRTOS tasks is enabled. No stack overflow was observed.
  • The size of the main stack (MSP) was increased by many KB -> did not help.
  • Changing the optimization level from -Og to -O2 -> no error, or the error happens later.
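Since the hang coincides with wake-up from STOP mode, one thing that could be checked is how the RTOS tick count is corrected after a tickless sleep. FreeRTOS does this with vTaskStepTick(), which (when configASSERT() is defined) asserts that the jump does not go past the next task unblock time. Whether this matters here depends on how the project's tickless-idle hook is implemented, which is not shown in this thread. The sketch below assumes a hypothetical hook that timestamps STOP entry and exit with a free-running 32-bit low-power counter; ticks_to_step() and its parameters are illustrative names, not CubeWL or FreeRTOS APIs.

```c
#include <stdint.h>

/* Host stand-in; on the target TickType_t comes from FreeRTOS portmacro.h. */
typedef uint32_t TickType_t;

/*
 * Hypothetical helper for a custom tickless-idle hook (assumption).
 * t_enter / t_exit: readings of an assumed free-running 32-bit low-power
 * counter taken just before entering and just after leaving STOP mode.
 * timer_hz / tick_hz: counter frequency and RTOS tick rate.
 * Returns the tick count to pass to vTaskStepTick(), never more than
 * expected_idle, so the tick count cannot jump past the next unblock time.
 */
TickType_t ticks_to_step(uint32_t t_enter, uint32_t t_exit,
                         uint32_t timer_hz, uint32_t tick_hz,
                         TickType_t expected_idle)
{
    /* Unsigned subtraction stays correct across one counter wrap-around. */
    uint32_t elapsed = t_exit - t_enter;

    /* 64-bit intermediate so elapsed * tick_hz cannot overflow. */
    uint64_t ticks = ((uint64_t)elapsed * tick_hz) / timer_hz;

    if (ticks > expected_idle) {
        ticks = expected_idle; /* woke later than planned: clamp */
    }
    return (TickType_t)ticks;
}
```

Clamping keeps the tick count consistent with the scheduler's expectations but silently drops time if the MCU sleeps longer than planned; counting how often the clamp triggers (for example via UTIL_ADV_TRACE) would show whether that happens around 541 s.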

Thanks!
JBive.1
Associate II

Update:

We managed to attach the debugger and halt the target after the application had hung (at an uptime of 541 s) but before the watchdog reset it. (As mentioned above, if the debugger is connected from the start, the application does not hang.)

We found that the application is stuck in an endless loop in xTaskResumeAll() (Middlewares/Third_Party/FreeRTOS/Source/tasks.c).
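On the FreeRTOS Cortex-M3/M4 ports, critical sections (taskENTER_CRITICAL(), which xTaskResumeAll() uses around its loops) mask interrupts by raising BASEPRI to configMAX_SYSCALL_INTERRUPT_PRIORITY rather than by setting PRIMASK. So when the target is halted in that loop, reading PRIMASK and BASEPRI in the debugger tells a FreeRTOS critical section (BASEPRI nonzero) apart from a global __disable_irq() (PRIMASK = 1). A related check is whether every ISR that calls a ...FromISR() API has a priority numerically at or above configMAX_SYSCALL_INTERRUPT_PRIORITY; the FreeRTOS documentation describes misconfigured interrupt priorities as a frequent cause of corrupted scheduler state. The helpers below are a host-side sketch for decoding such values; the function names are hypothetical, and the sketch assumes 4 implemented NVIC priority bits (as on the STM32WL's Cortex-M4) with all bits used for preemption priority.

```c
#include <stdbool.h>
#include <stdint.h>

#define NVIC_PRIO_BITS 4u /* assumption: 4 implemented priority bits */

/* Shift an NVIC priority level (0 = most urgent) into the 8-bit register
 * format used by BASEPRI and configMAX_SYSCALL_INTERRUPT_PRIORITY. */
static uint8_t prio_to_reg(uint8_t level)
{
    return (uint8_t)(level << (8u - NVIC_PRIO_BITS));
}

/* True if an IRQ at `level` cannot preempt a core whose PRIMASK and
 * BASEPRI hold the given values (as read in the debugger). */
bool irq_is_masked(uint32_t primask, uint32_t basepri, uint8_t level)
{
    if (primask & 1u) {
        return true; /* PRIMASK blocks all configurable-priority exceptions */
    }
    /* BASEPRI == 0 means no masking; otherwise every exception whose
     * priority value is >= BASEPRI (same or lower urgency) is blocked. */
    return basepri != 0u && prio_to_reg(level) >= (basepri & 0xFFu);
}

/* True if an ISR at `level` may call FreeRTOS ...FromISR() functions:
 * its priority value must be >= configMAX_SYSCALL_INTERRUPT_PRIORITY. */
bool isr_may_call_freertos(uint8_t level, uint8_t max_syscall_prio_reg)
{
    return prio_to_reg(level) >= max_syscall_prio_reg;
}
```

For example, with configMAX_SYSCALL_INTERRUPT_PRIORITY = 0x50 (level 5, used here purely as an example value), an ISR at level 2 that calls a FreeRTOS API would fail isr_may_call_freertos(2, 0x50). Defining configASSERT() also makes the port check this at run time through vPortValidateInterruptPriority().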

Any ideas?

Has anyone successfully built an application from the above components?

Thanks!

@JBive.1 wrote:

We found that the application gets stuck in xTaskResumeAll() in tasks.c in an endless loop


So did you go on to find what was causing it to get stuck there?

What prevents it from exiting that loop?

Have you tried the FreeRTOS forums for help with that?
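One way to make "what prevents it from exiting that loop?" concrete: the first loop in xTaskResumeAll() keeps moving tasks off xPendingReadyList until that list's item count reaches zero, so it can keep spinning if the list's bookkeeping has been corrupted. The sketch below walks a circular doubly linked list built the way FreeRTOS lists are (a sentinel end marker, next/previous links, an item count) and reports whether links and count agree. The node_t/list_t types are simplified stand-ins, not FreeRTOS's ListItem_t/List_t, so on the real target the same checks would have to be done by hand or with a debugger script.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for FreeRTOS ListItem_t / List_t (not the real types). */
typedef struct node {
    struct node *next;
    struct node *prev;
} node_t;

typedef struct {
    size_t count; /* like uxNumberOfItems */
    node_t end;   /* sentinel end marker, like xListEnd */
} list_t;

void list_init(list_t *l)
{
    l->count = 0;
    l->end.next = &l->end;
    l->end.prev = &l->end;
}

/* Insert n just before the sentinel, i.e. at the tail. */
void list_push(list_t *l, node_t *n)
{
    n->next = &l->end;
    n->prev = l->end.prev;
    l->end.prev->next = n;
    l->end.prev = n;
    l->count++;
}

/* Follow at most count + 1 links so a corrupted list cannot hang the check.
 * Returns true if every back-link matches and the walk reaches the
 * sentinel again after exactly `count` items. */
bool list_is_consistent(const list_t *l)
{
    const node_t *p = &l->end;
    for (size_t i = 0; i <= l->count; i++) {
        const node_t *nx = p->next;
        if (nx == NULL || nx->prev != p) {
            return false; /* forward and backward links disagree */
        }
        if (nx == &l->end) {
            return i == l->count; /* back at the sentinel: counts must match */
        }
        p = nx;
    }
    return false; /* more items than `count` claims, or a cycle */
}
```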


@JBive.1 wrote:

We are positive that the program does not get stuck in an interrupt/exception-handler, because we instrumented them all.


Have you instrumented the whole application? 
Does that show anything different between working & non-working scenarios?
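A low-cost way to instrument the whole application is a trail of checkpoint IDs kept in RAM that the startup code does not clear, so the last steps before the hang can be read back after the IWDG reset or from the halted target. A minimal sketch, assuming the linker script provides a `.noinit` section (an assumption; it has to be added if the project's script lacks one) and that this RAM is not cleared by the reset itself; the crumb_* names and the checkpoint IDs are hypothetical.

```c
#include <stdint.h>

/* Assumption: the linker script defines a .noinit section that the startup
 * code neither zeroes nor initialises. On a host build the macro is empty. */
#if defined(__arm__)
#define NOINIT __attribute__((section(".noinit")))
#else
#define NOINIT
#endif

#define CRUMB_MAGIC 0xC0FFEE42u /* arbitrary "contents are valid" marker */
#define CRUMB_DEPTH 16u         /* power of two keeps the modulo consistent */
#define CRUMB_NONE  0xFFFFu     /* returned when no entry exists */

typedef struct {
    uint32_t magic;
    uint32_t next;             /* total number of marks written */
    uint16_t ids[CRUMB_DEPTH]; /* ring of the most recent checkpoint IDs */
} crumb_log_t;

static crumb_log_t crumbs NOINIT;

/* Call once, early in main(). Keeps the old trail after a warm reset
 * (e.g. IWDG) and starts a fresh one after power-on, when RAM is random. */
void crumb_init_after_reset(void)
{
    if (crumbs.magic != CRUMB_MAGIC) {
        crumbs.magic = CRUMB_MAGIC;
        crumbs.next = 0u;
    }
}

/* Record a checkpoint. Not atomic: an ISR marking in the middle of a task's
 * mark can overwrite one entry, which is usually acceptable for a debug trail. */
void crumb_mark(uint16_t id)
{
    crumbs.ids[crumbs.next % CRUMB_DEPTH] = id;
    crumbs.next++;
}

/* Return the n-th most recent checkpoint (0 = latest), or CRUMB_NONE. */
uint16_t crumb_last(uint32_t n)
{
    if (n >= CRUMB_DEPTH || n >= crumbs.next) {
        return CRUMB_NONE;
    }
    return crumbs.ids[(crumbs.next - 1u - n) % CRUMB_DEPTH];
}
```

After an IWDG reset, the trail would be read with crumb_last(0) to crumb_last(CRUMB_DEPTH - 1) and printed before any new marks are written. Wrapping crumb_mark() in a critical section would make it exact, but would also change the timing of the very bug being chased.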


@JBive.1 wrote:
  • When a debugger (STM32CubeIDE) is connected -> no error or error happens later (?).

When you say "connected", is that just when it's physically attached, or during an active debug session?

Remember that, during an active debug session, the device won't actually be going to sleep - so that may be a clue ... ?