
LoRaWAN stops sending data after a month or so

Sebastian Miller
Associate
Posted on April 03, 2018 at 13:30

Hi,

We are working on a sensor using LoRaWAN for communication.

Our setup is a custom board using the CMWX1ZZABZ module from Murata for the communication part (this module is the same as the one on the B-L072Z-LRWAN1 Discovery Kit). The sensor is connected to the LoRaWAN module with 2 wires: 1 output to activate the sensors, 1 input to get the sensor status.

For the LoRaWAN code, we used the End_Node project of I-CUBE-LRWAN v1.1.1 (01-June-2017) as a base. The main differences from the original project are:

- Disabled vcom and corresponding UART (for power saving)

- Added ADC configuration

- Added an interrupt for the sensor status GPIO

- The application data transmission duty cycle is changed to 10 minutes.

The STM32 goes to STOP mode using WFI, so it is woken up either by the 10-minute timer or by the sensor status interrupt. On wake-up, if the sensor status pin has changed, the new value is sent; otherwise a heartbeat packet is sent.

We flashed 40 devices about 2 months ago as a trial. After about a month, the heartbeat of some devices suddenly stops, and the only way to get them going again is to reset them. So far, 7 devices out of the 40 have failed, and we have not managed to reproduce the problem under debug conditions.

We feel quite blocked, as it is very difficult to find the origin of the problem: all debug output was disabled to save power, and connecting a debugger resets the device.

We are unsure if the problem could be in the firmware or the hardware.

Has anybody seen a similar problem before? Or does anybody have suggestions for debugging such a problem efficiently?

Thanks!

#stm32lo #lorawan #lora #murata-cmwx1zzabz #murata #i-cube-lrwan
14 REPLIES
MAlke
Associate II

Hi TJ

RX_BUFFER_SIZE is defined to 256 already, but I still get hard faults.

>>but I still get hard faults

So you're going to need to chase that to ground. There's a whole reason the micro-controller has a bunch of registers to define the causal reason for the fault, and it's not to spin in a while(1) loop.
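As a rough illustration of chasing a hard fault without a debugger attached: every Cortex-M core pushes eight registers onto the active stack on exception entry, and the stacked PC points at (or near) the faulting instruction regardless of which fault status registers the particular core implements. The sketch below only shows how that frame could be copied into a record for later reporting. The names `fault_frame_t`, `last_fault` and `fault_capture` are hypothetical, and the small assembly shim that passes the stack pointer from the HardFault handler is omitted.

```c
#include <stdint.h>

/* Registers pushed by the core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Hypothetical record that the application could report after the fault,
 * for example in the next uplink or after a watchdog reset. */
static volatile fault_frame_t last_fault;

/* `sp` is the stack pointer at exception entry: MSP or PSP, selected by
 * bit 2 of the EXC_RETURN value held in LR inside the handler. */
static void fault_capture(const uint32_t *sp)
{
    last_fault.r0   = sp[0];
    last_fault.r1   = sp[1];
    last_fault.r2   = sp[2];
    last_fault.r3   = sp[3];
    last_fault.r12  = sp[4];
    last_fault.lr   = sp[5];  /* return address of the interrupted function */
    last_fault.pc   = sp[6];  /* address of the instruction that was executing */
    last_fault.xpsr = sp[7];
}
```

Looking up the stored `pc` and `lr` values in the linker map file is usually enough to name the function that faulted and its caller.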

S.Ma
Principal

Sometimes there are variables measuring time in 16 or 32 bits, and after a long enough run time these variables will overflow. If the code uses a delta (difference) to trigger an event, it might get stuck, just as GPS receivers might malfunction for the same reason on April 6th... Check around the overflows.
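A minimal sketch of the overflow problem described above, assuming a free-running 32-bit tick counter (at 1 ms per tick such a counter wraps after roughly 49.7 days). The function names are hypothetical and not taken from I-CUBE-LRWAN. The first version compares an elapsed time computed by unsigned subtraction, which survives a wrap; the second compares against an absolute deadline and can fire early or stay stuck once the counter wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe: unsigned subtraction is modulo 2^32, so the elapsed time is
 * correct across one counter overflow, as long as the real elapsed time
 * is shorter than 2^32 ticks. */
static bool timer_elapsed(uint32_t now, uint32_t start, uint32_t period)
{
    return (uint32_t)(now - start) >= period;
}

/* Fragile variant for comparison: the absolute deadline may wrap to a
 * small value, or `now` may wrap before the deadline is ever checked. */
static bool timer_elapsed_fragile(uint32_t now, uint32_t start, uint32_t period)
{
    uint32_t deadline = (uint32_t)(start + period);
    return now >= deadline;
}
```

With `start = 0xFFFFFC00`, `period = 0x200` and a counter that has already wrapped to `now = 0x10`, the first function reports that the period has elapsed, while the second keeps returning false until the counter climbs all the way back to `0xFFFFFE00`.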

T J
Lead
You may want to extend it to 1024 bytes, fill it with zero and test for nonzero occasionally…
Heaven forbid some engineer is overrunning a buffer…
I shifted the buffer position in ram to before my debug Uart Tx buffer, so if it overruns, I am likely to overwrite it anyhow.
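The zero-fill idea above could look roughly like this. It is only a sketch: `RX_BUFFER_SIZE` is the 256-byte value mentioned earlier in the thread, while `RX_GUARD_SIZE`, `rx_area` and the two functions are hypothetical names. The payload area and the guard bytes live in one array so that the guard is guaranteed to sit directly behind the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUFFER_SIZE 256u  /* size mentioned in the thread */
#define RX_GUARD_SIZE  768u  /* hypothetical padding, 1024 bytes in total */

/* Payload area followed directly by the guard region. */
static uint8_t rx_area[RX_BUFFER_SIZE + RX_GUARD_SIZE];

/* Zero the whole area once at start-up. */
static void rx_guard_init(void)
{
    memset(rx_area, 0, sizeof rx_area);
}

/* Returns true if any guard byte is nonzero, meaning something wrote
 * past the first RX_BUFFER_SIZE bytes. */
static bool rx_guard_overrun(void)
{
    for (size_t i = RX_BUFFER_SIZE; i < sizeof rx_area; i++) {
        if (rx_area[i] != 0) {
            return true;
        }
    }
    return false;
}
```

Calling `rx_guard_overrun()` from the main loop or before each heartbeat would flag an overrun soon after it happens. A zero-filled guard cannot detect an overrun that itself writes zeros, so a nonzero fill pattern may be worth considering.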
MAlke
Associate II

Hi I found my issue

In the RxDone routine, I had a memcopy(buffer, payload, payload_size). The variable buffer is an array that is 10 bytes long. The value payload_size and the pointer to payload are passed to RxDone. If payload_size is bigger than 10, I get a case where I'm writing over the RAM of other variables. This caused the hard fault.
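One way to guard against the overrun described above is to clamp the copy length to the size of the destination. This is a sketch, not the poster's actual fix: `app_rx_buffer` and `store_rx_payload` are hypothetical names, and the 10-byte size is taken from the description.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APP_RX_BUFFER_LEN 10u  /* matches the 10-byte array described above */

static uint8_t app_rx_buffer[APP_RX_BUFFER_LEN];

/* Copies at most APP_RX_BUFFER_LEN bytes and returns the number of bytes
 * actually stored, so the caller can detect truncation by comparing the
 * result with payload_size. */
static size_t store_rx_payload(const uint8_t *payload, size_t payload_size)
{
    size_t n = payload_size;

    if (payload == NULL) {
        return 0;
    }
    if (n > sizeof app_rx_buffer) {
        n = sizeof app_rx_buffer;  /* clamp instead of overrunning */
    }
    memcpy(app_rx_buffer, payload, n);
    return n;
}
```

Called from the receive callback, a return value smaller than `payload_size` indicates that the frame was longer than expected and was truncated rather than written over neighbouring variables.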

Thank you