LoRaWAN stops sending data after a month or so

Sebastian Miller · 2018-04-03

Posted on April 03, 2018 at 13:30

Hi,

We are working on a sensor using LoRaWAN for communication.

Our setup is a custom board that uses Murata's CMWX1ZZABZ module for communication. This is the same module as on the B-L072Z-LRWAN1 Discovery Kit. The sensor is connected to the LoRaWAN module with two wires: one output to activate the sensors and one input to read the sensor status.

For the LoRaWAN code, we used the End_Node project of I-CUBE-LRWAN v1.1.1 (01-June-2017) as a base. The main differences from the original project are:

- Disabled VCOM and the corresponding UART (for power saving)

- Added ADC configuration

- Added an interrupt for the sensor status GPIO

- Changed the application data transmission duty cycle (the transmit period) to 10 minutes.

The STM32 enters STOP mode using WFI. It is woken up either by the 10-minute timer or by the sensor status interrupt. On wake-up, if the sensor status pin has changed, the new value is sent. Otherwise, a heartbeat packet is sent.
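Here is a minimal sketch of that wake-up decision. It assumes the firmware stores the sensor level it last reported and compares it with the pin on every wake-up, instead of relying only on the interrupt flag. The names `packet_t` and `choose_packet` are hypothetical and are not part of I-CUBE-LRWAN.

```c
#include <stdbool.h>

/* Hypothetical packet types for the application described above. */
typedef enum { PKT_NONE, PKT_SENSOR_STATUS, PKT_HEARTBEAT } packet_t;

/*
 * Decide what to send after a wake-up from STOP mode.
 * last_sent_level: sensor level carried by the last status packet
 * pin_level:       sensor level read from the GPIO right now
 * timer_expired:   true if the 10-minute timer caused (or coincided with) the wake-up
 */
packet_t choose_packet(bool last_sent_level, bool pin_level, bool timer_expired)
{
    if (pin_level != last_sent_level) {
        /* Level differs from what the network last saw: report it,
           even if the wake-up was attributed to the timer. */
        return PKT_SENSOR_STATUS;
    }
    if (timer_expired) {
        return PKT_HEARTBEAT;
    }
    /* Spurious wake-up (e.g. a glitch that settled back): send nothing. */
    return PKT_NONE;
}
```

Because the function compares against the last reported level, a pin change is still reported even if its interrupt coincided with the timer wake-up or was missed. In that cycle, the status packet also serves as the heartbeat.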

We flashed 40 devices about 2 months ago as a trial. After about a month, the heartbeat of some devices suddenly stopped, and the only way to get them going again is to reset them. So far, 7 of the 40 devices have failed. We have not managed to reproduce the problem under debug conditions.

We feel quite stuck, because it is very difficult to find the origin of the problem. All debug output was disabled to save power, and connecting a debugger resets the device.

We are unsure whether the problem is in the firmware or the hardware.

Has anybody seen a similar problem before? Does anybody have suggestions for debugging such a problem efficiently?

Thanks!

#stm32l0 #lorawan #lora #murata-cmwx1zzabz #murata #i-cube-lrwan

Tesla DeLorean · 2018-04-03

Posted on April 03, 2018 at 15:29

You will need to find ways to accelerate the failure, i.e. wake up more often and make the sensor trigger more often.
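If the failure is time-related, one way to accelerate it is shown in the sketch below. It assumes all firmware timing goes through a single tick function. A debug-only offset makes the 32-bit millisecond counter start 60 s before it wraps. `DEBUG_TICK_ROLLOVER`, `TICK_TEST_OFFSET` and `app_tick_ms` are hypothetical names. The raw value would come from something like `HAL_GetTick()`.

```c
#include <stdint.h>

/* Enabled here for demonstration; normally set only in debug build flags. */
#define DEBUG_TICK_ROLLOVER 1

#ifdef DEBUG_TICK_ROLLOVER
/* Start 60 s before the 32-bit counter wraps, so a rollover bug shows up
   after about a minute instead of about 49.7 days. */
#define TICK_TEST_OFFSET (UINT32_MAX - 60000u)
#else
#define TICK_TEST_OFFSET 0u
#endif

/* raw_ms: free-running millisecond count since boot.
   Unsigned addition wraps modulo 2^32, exactly like the real counter. */
uint32_t app_tick_ms(uint32_t raw_ms)
{
    return raw_ms + (uint32_t)TICK_TEST_OFFSET;
}
```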

Add some form of instrumentation, and run some units on bench power in the lab. Look for loops where the firmware could get stuck, e.g. the Hard Fault handler.
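The sketch below shows one form of low-cost instrumentation that needs no UART. Keep a small record in RAM that survives a reset, and update a "breadcrumb" at each stage of the main loop. Before resetting, the fault handler stores the faulting PC. At the next boot, the record can go out in the first uplink. The struct, magic value and function names are hypothetical. Placing the record in a RAM section that the startup code does not zero (e.g. a `.noinit` section in the linker script) is an assumption about your toolchain. On a PC it is just a global.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRASH_MAGIC 0xC0FFEE01u /* hypothetical marker for a valid record */

typedef struct {
    uint32_t magic;      /* CRASH_MAGIC when the record is valid */
    uint32_t breadcrumb; /* last main-loop stage reached */
    uint32_t fault_pc;   /* PC at the time of the fault */
} crash_record_t;

/* On target, place in a non-zeroed section, e.g.
   __attribute__((section(".noinit"))) plus a linker script entry. */
crash_record_t g_crash;

/* Call at each stage of the main loop (cheap: one store). */
void crash_breadcrumb(uint32_t stage) { g_crash.breadcrumb = stage; }

/* Call from the fault handler just before NVIC_SystemReset(). */
void crash_record_fault(uint32_t pc)
{
    g_crash.fault_pc = pc;
    g_crash.magic = CRASH_MAGIC;
}

/* Call once at boot: if the previous run ended in a recorded fault, copy the
   record out, invalidate it, and return true. */
bool crash_take_report(crash_record_t *out)
{
    if (g_crash.magic != CRASH_MAGIC) {
        return false;
    }
    *out = g_crash;
    g_crash.magic = 0;
    return true;
}
```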

Use an external device to monitor the unit and flag when it stops or stops following its expected behavior.

Check whether the problem occurs when the periodic wake-up and the sensor trigger coincide. The cause could be a logic/exit-path issue or a race condition. Check all error paths, and have a means to report them, e.g. by bringing up a USART later if/when needed.
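The sketch below shows a common way to avoid that race, using hypothetical names. ISRs only set bits in a shared event word. The main loop reads and clears the word with interrupts masked, and sleeps only if nothing is pending. On Cortex-M, WFI still wakes the core when an interrupt becomes pending while PRIMASK is set, so the check-then-sleep sequence cannot lose an event. `IRQ_DISABLE`, `IRQ_ENABLE` and `WAIT_FOR_INTERRUPT` are stand-ins. On the target they would map to CMSIS `__disable_irq()`, `__enable_irq()` and the STOP-mode entry (e.g. `HAL_PWR_EnterSTOPMode()` with WFI). Here they are no-ops so the logic compiles on a PC.

```c
#include <stdint.h>

/* Host stand-ins (assumption); replace with the CMSIS/HAL calls on target. */
#define IRQ_DISABLE()        ((void)0)
#define IRQ_ENABLE()         ((void)0)
#define WAIT_FOR_INTERRUPT() ((void)0)

#define EVT_TIMER  (1u << 0) /* set by the 10-minute timer callback */
#define EVT_SENSOR (1u << 1) /* set by the sensor GPIO EXTI callback */

static volatile uint32_t g_events;

/* Called from ISRs: only record the event, no radio work here. Assumes both
   ISRs share one NVIC priority, so neither can preempt the other's update. */
void event_post(uint32_t evt)
{
    g_events |= evt;
}

/* Main loop: atomically take all pending events, or sleep until one arrives.
   Several bits may be set at once (e.g. timer and sensor coincide). */
uint32_t event_wait(void)
{
    for (;;) {
        IRQ_DISABLE();
        uint32_t evts = g_events;
        g_events = 0;
        if (evts != 0u) {
            IRQ_ENABLE();
            return evts;
        }
        /* Interrupts are masked, but a pending IRQ still ends WFI, so an
           event arriving after the check above is not lost. */
        WAIT_FOR_INTERRUPT();
        IRQ_ENABLE(); /* the pending ISR runs here and sets its bit */
    }
}
```

Treating the returned value as a set, rather than as a single wake-up reason, handles the coincident case explicitly.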

If available, use a secondary alarm that you push forward on each wake-up to act as a longer-term watchdog.
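The sketch below illustrates the secondary-alarm idea. It assumes a second RTC alarm, or another timer that keeps running in STOP mode, fires independently of the LoRaWAN timer. Every healthy cycle pushes the deadline further out. If the alarm ever fires past the deadline, the application is assumed to be stuck and resets. The function names are hypothetical. Times are seconds from a free-running 32-bit counter, and the comparison stays correct across a wrap.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_PERIOD_S       600u               /* 10-minute application period */
#define WATCHDOG_MARGIN_S (3u * TX_PERIOD_S) /* tolerate two missed cycles */

static uint32_t g_deadline_s;

/* Call after every successful wake-up/transmit cycle. */
void soft_wdg_kick(uint32_t now_s)
{
    g_deadline_s = now_s + WATCHDOG_MARGIN_S; /* wraps modulo 2^32: fine */
}

/* Call from the secondary alarm handler; true means "reset the MCU"
   (e.g. NVIC_SystemReset()). The signed difference is wrap-safe while the
   deadline is less than 2^31 s away. */
bool soft_wdg_expired(uint32_t now_s)
{
    return (int32_t)(now_s - g_deadline_s) >= 0;
}
```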

If you use dynamic memory allocation, check for fragmentation and resource leaks.
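The sketch below is a minimal way to spot leaks. It assumes every allocation goes through one wrapper. The wrapper counts outstanding blocks and records the high-water mark, and both numbers could go into the heartbeat. If the outstanding count keeps growing over days, something is leaking. `tracked_malloc` and `tracked_free` are hypothetical names. This does not measure fragmentation directly. For that, newlib's `mallinfo()` can be polled if your C library provides it.

```c
#include <stddef.h>
#include <stdlib.h>

static size_t g_live_blocks; /* allocations not yet freed */
static size_t g_peak_blocks; /* highest value g_live_blocks has reached */

void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        g_live_blocks++;
        if (g_live_blocks > g_peak_blocks) {
            g_peak_blocks = g_live_blocks;
        }
    }
    return p;
}

void tracked_free(void *p)
{
    if (p != NULL) {
        g_live_blocks--;
        free(p);
    }
}

size_t tracked_live_blocks(void) { return g_live_blocks; }
size_t tracked_peak_blocks(void) { return g_peak_blocks; }
```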

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

T J · 2018-04-03

Posted on April 04, 2018 at 01:49

(I have only done a few hours of work with LoRa.)

Do you expect a response to your packet?

It sounds like the unit wants to transmit a sensor packet but cannot, hence no heartbeat packet.

'it is woken up either by the 10 minute timer, or the sensor status'

My guess is that it's not waking up.

How do you set the 10-minute timer? (I can imagine many ways for that to fail.) I think this could be your problem.
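The sketch below illustrates one such failure mode. This mode is hypothetical and is not confirmed anywhere in this thread. Suppose the next wake-up is programmed as an absolute time-of-day RTC alarm, with the alarm set to ignore the date. Then the addition must wrap at midnight, or else 23:55 + 10 min becomes 24:05, which the alarm never matches. All names are hypothetical.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400u

typedef struct { uint8_t h, m, s; } tod_t; /* time of day as the alarm sees it */

static uint32_t tod_to_s(tod_t t) { return t.h * 3600u + t.m * 60u + t.s; }

/* Next alarm time = now + delay, wrapped into 00:00:00..23:59:59. */
tod_t next_alarm(tod_t now, uint32_t delay_s)
{
    uint32_t t = (tod_to_s(now) + delay_s) % SECONDS_PER_DAY;
    tod_t a = { (uint8_t)(t / 3600u), (uint8_t)((t / 60u) % 60u), (uint8_t)(t % 60u) };
    return a;
}
```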

You could toggle an I/O pin on each wake-up and set up a scope to trigger on that event.

I think you will see that the wake-up has failed.

Low power? Is it low voltage too?

How low can the supply voltage go before the internal RAM becomes flaky? I think this could be your problem.
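The ADC is already configured, so one cheap check is to measure the internal reference (VREFINT) and include the computed supply voltage in each heartbeat. The last packets before a failure would then show the voltage trend. The formula uses the factory calibration value VREFINT_CAL, which STM32L0 devices record at VDDA = 3.0 V. Check your part's datasheet for the calibration address and conditions. `vdda_mv` and `APP_VREFINT_CAL_MV` are hypothetical names. The sketch also assumes the same 12-bit ADC resolution as the calibration.

```c
#include <stdint.h>

#define APP_VREFINT_CAL_MV 3000u /* VDDA at which VREFINT_CAL was taken (STM32L0) */

/* vrefint_cal:  factory calibration word read from system memory
   vrefint_data: ADC reading of the VREFINT channel
   Returns VDDA in millivolts, or 0 if the reading is invalid. */
uint32_t vdda_mv(uint16_t vrefint_cal, uint16_t vrefint_data)
{
    if (vrefint_data == 0u) {
        return 0u;
    }
    return (APP_VREFINT_CAL_MV * (uint32_t)vrefint_cal) / vrefint_data;
}
```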

How are you supplying the power?

(Stabbing in the dark here.) If the STOP-mode registers are corrupted, it will hang.

Do you fully reload the STOP-mode configuration each time? I think this could be your problem.

Can you install a wake-up button?

Does it wake up every time you press it?

AVI-crak · 2018-04-03

Posted on April 04, 2018 at 03:10

The time counter is 32 bits wide and counts in 1 ms steps.

It wraps through zero after about one month and 18 days (2^32 ms ≈ 49.7 days).

There is no simple fix, because almost everything related to timekeeping uses a 32-bit counter.

One option is to force a software reset from a timer once a month.
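For timeout checks in your own code, wrap-safe arithmetic avoids the reset. It does not help with library code you cannot change. The trick is to compare elapsed time using unsigned subtraction instead of comparing absolute tick values. The sketch below contrasts the two forms, using hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fragile: start + period overflows near the wrap, so the timeout can
   fire immediately. */
bool timeout_naive(uint32_t now_ms, uint32_t start_ms, uint32_t period_ms)
{
    return now_ms >= start_ms + period_ms;
}

/* Wrap-safe: unsigned subtraction gives the true elapsed time modulo 2^32,
   correct as long as the real interval is shorter than about 49.7 days. */
bool timeout_safe(uint32_t now_ms, uint32_t start_ms, uint32_t period_ms)
{
    return (uint32_t)(now_ms - start_ms) >= period_ms;
}
```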

devman · 2018-12-25

Hi,

We have encountered a problem like yours, and it hasn't been resolved yet. Our setup uses two kinds of custom board, both with the Murata CMWX1ZZABZ module for communication. One kind, which we call the “sender”, has two temperature sensors connected to the module via an I2C bus. We call the other kind the “receiver”.

We are developing our code on mbed and are using SX1276GenericLib by Helmut Tschemernjak.

The library link is https://os.mbed.com/users/Helmut64/code/SX1276GenericLib/ .

These two boards behave as follows.

1. A “sender” reads data from its sensors and sends it to the “receiver”.
2. After sending, the “sender” module goes into sleep mode for 15 minutes.
3. After the 15-minute sleep, the “sender” wakes up and repeats step 1.

The “receiver” runs continuously so that it can receive data at any time.

We flashed 43 devices as “senders” and one device as the “receiver”. Within 20 days, the heartbeat of 40 devices suddenly stopped. The only way to get them going again is to reset them. The supply voltage at the time of stopping is around 2.5 V.

We are now trying to find the cause using a debugger and by inspecting the circuit, but we still haven't solved the problem.

We are unsure whether the problem lies in the firmware, the hardware, or both.

Have you already solved your problem, or found out anything more about it? Please share any information, however trivial.

We really appreciate your help.

Thank you.

Tesla DeLorean · 2018-12-26

How long do they run without failure when powered from a bench or other continuous supply?

Make sure you don't have a 49-50 day rollover or an equivalent issue.


Piranha · 2019-01-20

...or a 24-25 day rollover, if using a signed 32-bit integer.
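Both figures follow from the counter width at 1 ms resolution. A signed counter goes negative after half the range:

$$
\frac{2^{32}\ \text{ms}}{86\,400\,000\ \text{ms/day}} \approx 49.7\ \text{days},\qquad
\frac{2^{31}\ \text{ms}}{86\,400\,000\ \text{ms/day}} \approx 24.9\ \text{days}.
$$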

MAlke · 2019-03-25

Hi Sebastian.

We have the same issue as you. We used the Ping-Pong example and modified it for our needs, and the same thing happens: the unit just stops working.

We have a sensor that wakes up every two minutes and sends out a packet. A logger captures these packets and saves the data. The logger is always in RX mode. When it receives a packet from a sensor, it switches to TX mode. When it has finished transmitting, it switches back to RX mode and waits for the next sensor packet.

The sensor and logger software were both based on the Ping-Pong example code.

At random times, the logger and the sensor just stop working. I left a logger running under the debugger, and a day later it stopped. The debugger showed that it had stopped because of a hard fault.

I've read quite a lot about hard faults and what may cause them, but I still can't resolve the issue.

All I can think of is that the SRAM gets corrupted, but I don't know how or where in my code, and I don't know how to keep it from happening.
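The stack-overflow form of SRAM corruption can be tested with stack painting. Early at boot, fill the unused stack (the part below the current stack pointer) with a known pattern. Later, count how much of it is still intact. If the intact region shrinks to zero, the stack has grown into whatever lies below it. This sketch paints an ordinary array so it can run on a PC. On the target, the region would be the space between the end of `.bss`/heap and the stack. The linker symbols that delimit it depend on your linker script, so their names are an assumption. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu

/* Fill the region with the paint pattern before it is used. */
void stack_paint(uint32_t *lo, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lo[i] = STACK_PAINT;
    }
}

/* The stack grows down from the top of the region, so untouched words sit at
   the low end. Returns how many words at the bottom were never overwritten. */
size_t stack_unused_words(const uint32_t *lo, size_t words)
{
    size_t n = 0;
    while (n < words && lo[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```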

Did you get to resolve your issue Sebastian?

Tesla DeLorean · 2019-03-25

You need to review the faulting code to see what it does.

The Cortex-M0 (CM0) is extremely sensitive to alignment. Watch for 16-, 32- or 64-bit reads/writes through byte pointers that are not suitably aligned.
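The sketch below shows the safe alternative to casting a byte pointer, such as a pointer into a received payload, to a wider type. Assemble the value from individual bytes instead. This works at any alignment and also fixes the byte order explicitly. Little-endian payload encoding is an assumption here, and `read_u32_le`/`write_u32_le` are hypothetical helpers. `memcpy` into a local variable is an equally valid choice if native byte order is acceptable.

```c
#include <stdint.h>

/* Risky on Cortex-M0: *(const uint32_t *)(buf + 1) faults because the
   address is not 4-byte aligned. Byte-wise access never faults. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

void write_u32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```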


T J · 2019-03-25

I know of one issue that would cause it.

This is where it fails, just before the hard fault:

void SX1276ReadFifo( uint8_t *buffer, uint8_t size ); // variable-size read; size is a uint8_t, so at most 255 bytes

The rot starts at the buffer declaration. You have to be careful with buffers:

memset( RxTxBuffer, 0, ( size_t )RX_BUFFER_SIZE );

// If there is an overrun, whatever is declared right after this buffer gets corrupted.

This is the fix:

#define RX_BUFFER_SIZE 256 // must be full length, or an errant reception will overrun the following bytes
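An alternative or complement to a full-length buffer is to check the received length before copying. Then a corrupted or unexpected length cannot overrun the variables that follow the buffer. The sketch below is a host-testable version. The 64-byte buffer size is chosen only to demonstrate the check. `rx_copy_payload` and `fifo_read_stub` are hypothetical, and on the target the copy would be the `SX1276ReadFifo()` call. Oversized packets are dropped rather than truncated, which is a design choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUFFER_SIZE 64u /* deliberately smaller than 256 to show the check */

static uint8_t RxTxBuffer[RX_BUFFER_SIZE];

/* Host stand-in for reading the radio FIFO (hypothetical). */
static void fifo_read_stub(const uint8_t *fifo, uint8_t *dst, uint8_t len)
{
    memcpy(dst, fifo, len);
}

/* Copy a received payload of rx_len bytes into RxTxBuffer.
   Returns the number of bytes copied, or 0 if the packet was too large. */
size_t rx_copy_payload(const uint8_t *fifo, uint8_t rx_len)
{
    if (rx_len > sizeof(RxTxBuffer)) {
        return 0; /* drop it: copying would overrun the buffer */
    }
    fifo_read_stub(fifo, RxTxBuffer, rx_len);
    return rx_len;
}
```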