Bug in LoRaWAN software expansion for STM32Cube

Richard Cunliffe
Associate

I am using V1.1.5 of the LoRa software expansion for STM32Cube and have come across an issue which leaves the stack unusable until reset.

Although the problem was first encountered in our application project, I verified that the problem also occurred when using the AT command line slave project that is shipped with the package, thus making it easy to reproduce.

The steps that reproduce the problem are:

1. AT+NJM=1 to enable OTAA join mode.
2. AT+CFM=1 to enable confirmed messages.
3. AT+JOIN to join the network.
4. AT+NJS=? to query join status (repeat until joined).
5. Disable the gateway.
6. AT+SEND=1,1 to send a message.
7. The send will time out because no ACK is received.
8. Enable the gateway.
9. AT+JOIN to rejoin the network.
10. AT+NJS=? to query join status (repeat until joined).
11. AT+SEND=2,2 to send a message. Returns AT_ERROR.

After this, any AT+SEND or AT+JOIN commands will return AT_ERROR and will continue to do so until the stack is restarted.

This was found to be because the MAC state (the LoRaMacState variable) was stuck at 0x0001 (LORAMAC_TX_RUNNING) instead of having cleared to 0x0000 (LORAMAC_IDLE), effectively locking the stack up as busy.
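To illustrate why a state value that never clears blocks every later request, here is a minimal sketch of a busy check. The two state values are the ones quoted above; everything prefixed with sketch_ or SKETCH_ is a hypothetical stand-in for illustration and is not the actual LoRaMac.c API.

```c
#include <assert.h>
#include <stdint.h>

/* State values as quoted in the post. */
#define LORAMAC_IDLE        0x0000u
#define LORAMAC_TX_RUNNING  0x0001u

/* Hypothetical status codes, used by this sketch only. */
typedef enum { SKETCH_STATUS_OK, SKETCH_STATUS_BUSY } sketch_status_t;

static uint32_t LoRaMacState = LORAMAC_IDLE;

/* Hypothetical request entry point: refuses work while a TX is flagged as running. */
sketch_status_t sketch_mac_request(void)
{
    if ((LoRaMacState & LORAMAC_TX_RUNNING) == LORAMAC_TX_RUNNING) {
        return SKETCH_STATUS_BUSY;  /* reported as AT_ERROR in the scenario above */
    }
    LoRaMacState |= LORAMAC_TX_RUNNING;
    return SKETCH_STATUS_OK;
}

/* Stands in for the MacDone handling that returns the state to idle. */
void sketch_mac_done(void)
{
    LoRaMacState &= ~LORAMAC_TX_RUNNING;
}

/* Walks through the lock-up: requests are rejected until "done" is signalled. */
void sketch_demo_lockup(void)
{
    assert(sketch_mac_request() == SKETCH_STATUS_OK);
    assert(sketch_mac_request() == SKETCH_STATUS_BUSY);
    sketch_mac_done();
    assert(sketch_mac_request() == SKETCH_STATUS_OK);
}
```

If the "done" step is never reached, every subsequent request in this model returns the busy status, which matches the behaviour seen after step 11.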

This in turn was traced to the AckTimeoutTimer.IsRunning flag being set to 1, which in OnRadioRxDone() prevents the AT+JOIN procedure from fully completing:

  

    if( AckTimeoutTimer.IsRunning == false )
    {
        // Procedure is completed when the AckTimeoutTimer is not running anymore
        LoRaMacFlags.Bits.MacDone = 1;

AckTimeoutTimer is not used by the join process, but it is used during confirmed sends. The failed confirmed send (step 6 above) left AckTimeoutTimer.IsRunning at 1 once the process had completed. That caused no problem at the time because nothing was received, so OnRadioRxDone() was never called.

The issue looks to be with the timer implementation in timeServer.c.

When a timer is started, it is inserted into a list of active timers that is managed by the firmware, ordered by when the timers are scheduled to expire (head of list = first to expire). Each timer has an .IsRunning flag associated with it, but rather than being set for every active timer, it appears that only the timer at the head of the list gets its flag set.

When a timer expires, the TimerIrqHandler() calls the associated callback function for the timer and the timer is removed from the list. A timer can also be stopped before it has expired by a manual call to TimerStop() which, once again, sees it removed from the list.

Each timer callback function begins with a TimerStop() call for that timer, which implies that the timer has to be manually stopped (much like an interrupt flag has to be cleared by an ISR). Calling TimerStop() for a specified timer will search the list of timers for that timer and, if found, remove it from the list and clear its .IsRunning flag. However, because the TimerIrqHandler() function which invoked the callback has already removed the timer from the list, the TimerStop() function effectively does nothing because the timer is no longer in the list. Crucially, the removal of the timer from the list in the TimerIrqHandler() function does not also clear the .IsRunning flag, hence the AckTimeoutTimer’s flag remaining set despite the timer having expired.
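The sequence described above can be reproduced in a small model. This is a simplified sketch and not the timeServer.c source: all Sketch-prefixed names are hypothetical, the list is not kept in expiry order, and the flag is set on start for every timer to keep the model short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the time server's timer object. */
typedef struct SketchTimer {
    bool IsRunning;
    void (*Callback)(void);
    struct SketchTimer *Next;
} SketchTimer;

static SketchTimer *ListHead = NULL;

/* Simplification: insert at the head and mark the timer as running. */
void SketchTimerStart(SketchTimer *t)
{
    t->Next = ListHead;
    t->IsRunning = true;
    ListHead = t;
}

/* Removes the timer and clears its flag, but only if it is still in the list. */
void SketchTimerStop(SketchTimer *t)
{
    for (SketchTimer **p = &ListHead; *p != NULL; p = &(*p)->Next) {
        if (*p == t) {
            *p = t->Next;
            t->Next = NULL;
            t->IsRunning = false;
            return;
        }
    }
    /* Not found: nothing is done, so a stale IsRunning flag survives. */
}

/* Expiry path: unlinks the head timer without touching IsRunning, then runs its callback. */
void SketchTimerIrqHandler(void)
{
    SketchTimer *expired = ListHead;
    if (expired == NULL) {
        return;
    }
    ListHead = expired->Next;
    expired->Next = NULL;
    if (expired->Callback != NULL) {
        expired->Callback();
    }
}

static SketchTimer AckTimeoutTimer;

/* Mirrors the callback pattern described above: stop the timer first. */
static void SketchOnAckTimeout(void)
{
    SketchTimerStop(&AckTimeoutTimer);
}

/* Start the timer, let it expire, and observe the stale flag. */
void SketchDemoStaleFlag(void)
{
    AckTimeoutTimer.Callback = SketchOnAckTimeout;
    SketchTimerStart(&AckTimeoutTimer);
    SketchTimerIrqHandler();
    assert(ListHead == NULL);           /* the timer is no longer scheduled */
    assert(AckTimeoutTimer.IsRunning);  /* yet the flag still reads as running */
}
```

In this model the stop call inside the callback finds nothing to remove, so the flag is left set after expiry, which is the condition that later blocks OnRadioRxDone().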

Although this appears fairly fundamental, the only place a timer's .IsRunning flag is checked outside of the timer functions themselves is the OnRadioRxDone() function in LoRaMac.c, which is why it has not been noticed more readily.

Three different fixes were tested and verified to overcome the issue:

1. Explicit clearing of the timer's .IsRunning flag in the TimerStop() function (regardless of whether the timer has already been removed from the list).
2. Make the existing TimerExists() function in timeServer.c global and replace the check of the AckTimeoutTimer.IsRunning flag in OnRadioRxDone() with a test of TimerExists(&AckTimeoutTimer).
3. Remove the deletion of the timer from the list in TimerIrqHandler().
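As a rough illustration of the second fix, the sketch below decides completion from list membership instead of the flag. The Sketch-prefixed names are hypothetical and the list is a simplified model; the real TimerExists() in timeServer.c may differ in signature and detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the time server's timer object. */
typedef struct SketchTimer {
    bool IsRunning;
    struct SketchTimer *Next;
} SketchTimer;

static SketchTimer *ListHead = NULL;

/* Walks the active list; true only if the timer is still scheduled. */
bool SketchTimerExists(const SketchTimer *t)
{
    for (const SketchTimer *cur = ListHead; cur != NULL; cur = cur->Next) {
        if (cur == t) {
            return true;
        }
    }
    return false;
}

/* Completion check that trusts list membership rather than the flag. */
bool SketchProcedureDone(const SketchTimer *ackTimer)
{
    return !SketchTimerExists(ackTimer);
}

/* A timer with a stale flag that is not in the list counts as finished. */
void SketchDemoExistsCheck(void)
{
    SketchTimer stale = { .IsRunning = true, .Next = NULL };
    assert(SketchProcedureDone(&stale));
}
```

Note that the stale flag itself is left untouched in this variant; only the check that consumes it changes.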

 

Solution 2 carries the least risk because it addresses one specific instance, but unlike the other two it does not actually clear the AckTimeoutTimer.IsRunning flag. I therefore think solution 1 is the best: it clears the flag and is still low risk (the flag should be cleared when the timer is stopped).
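A minimal sketch of what solution 1 could look like, assuming a simplified singly linked timer list. The Sketch-prefixed names are hypothetical and this is not the patched timeServer.c code; the only point is that the flag is cleared unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the time server's timer object. */
typedef struct SketchTimer {
    bool IsRunning;
    struct SketchTimer *Next;
} SketchTimer;

static SketchTimer *ListHead = NULL;

/* Unlinks the timer if it is still listed, then always clears the flag. */
void SketchTimerStopFixed(SketchTimer *t)
{
    for (SketchTimer **p = &ListHead; *p != NULL; p = &(*p)->Next) {
        if (*p == t) {
            *p = t->Next;
            t->Next = NULL;
            break;
        }
    }
    t->IsRunning = false;  /* cleared whether or not the timer was still in the list */
}

/* A timer already removed by the expiry path still ends up with a cleared flag. */
void SketchDemoFixedStop(void)
{
    SketchTimer expired = { .IsRunning = true, .Next = NULL };
    SketchTimerStopFixed(&expired);
    assert(!expired.IsRunning);
}
```

Because each timer callback already begins with a stop call, a change of this shape would clear the flag on every expiry without touching the callers.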

Amel NASRI
ST Employee

Hi @Richard Cunliffe,

Many updates were made in version 1.2.0 of the I-CUBE-LRWAN package.

Could you please try it and let us know whether the same issue still occurs with this version?

-Amel


FFrum
Associate

Hello.

We ran into the same issue last week using version 1.2.0 of the package, and came to the very same conclusion as @Richard Cunliffe: the stack relies on the IsRunning flag of the time server.

We solved it independently (unfortunately) by patching the time server code as in proposed solution 1.

Hope this helps.

-F

Marcuka
Associate III

Hello, when can I expect an official new release of the software with this bug fixed?

One more question: if I send confirmed uplink messages, how can my application be informed that no response was received for a sent message?