
Need M0+ recovery solution without resetting M4

OrnellaBenzi
Associate III

Hi! We are developing a sensitive device using the STM32WB55, running the concurrent Thread+BLE binary on the M0+ (we use the concurrent binary because we'll need both BLE and Thread in the near future, although the most important role is Thread communication, which is the only one we are using right now).
Thread works fine, but after some hours in the field (not in the lab), the M0+ gets stuck in a state where it no longer acks the commands from our app on the M4. Resetting the whole device makes it functional again, but that is not an option in the field. We need a way of resetting the M0+ without touching the M4 to make it work again.
The symptom is simply that the M0+ stops acking any command after a while. It is not consistent: it can happen after 4 h or after 12 h. It is not reproducible in the lab; it only happens in the field. So we need to implement a recovery for when it happens.
Any hint is welcome.

 

16 REPLIES

External chips like BlueNRG have a reset pin. Strange to omit such a function in the embedded software.

OrnellaBenzi
Associate III

Hi @Christophe Arnal , thank you for your response. The binary we are using is stm32wb5x_BLE_Thread_dynamic_fw.bin (version 1.21.0) 


I will add to the log the values you suggested and update when the issue arises again with as much information as possible about the problem.

Regards

Hi,

BlueNRG is a Single Core Device.

Similarly to STM32WB55, the reset acts on the full device.

 

Regards.

RiceCorn
Associate

Hi,

I have a similar problem with the v1.21.0 stm32wb5x_Zigbee_FFD_fw.bin stack: the M0 stops acking M4 API calls after communicating for a few hours.

We found that it does not happen with the v1.22.0 stack, and that it is tied to the "send packet" wireless activity.

Maybe the wireless environment in the field is much more complicated than in the lab in your case?

We're still working with the v1.21.0 stack (we hit another issue with the v1.22.0 stack) and are trying to reduce wireless activity to avoid the issue.

Hope to see a solution here.

Hi @Christophe Arnal , we could reproduce the bug in the field and dump the four 32-bit values at the SRAM2A base address:
'sram2a': ['0x1170FD0F', '0x00000000', '0x000003B3', '0x20037BF0']
'stack_version': '1.21.0'
'stack_type': 81

And we also confirmed that the exact problem is that the M4 never gets the Ack from the M0+ after an OT Command. We modified the firmware to be able to get a timeout waiting for the ack instead of being stuck in that line forever. 

static uint8_t Wait_Getting_Ack_From_M0(uint32_t timeout)
{
    uint8_t result = 1; // timeout
    if (xSemaphoreTake(OtCmdAckSem, timeout) == pdPASS)
    {
        m0_ack_failed = 0;
        result = 0; // success
    }
    else
    {
        // TODO
        m0_ack_failed = 1;
    }
    return result;
}
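
One way to fill in the TODO branch is to capture the same four SRAM2A words at the moment the ack times out, so every field failure leaves a trace. Below is a minimal sketch, not tested on hardware. The names `cpu2_dump_debug_words`, `SRAM2A_BASE_ADDR` and `CPU2_DEBUG_WORDS` are made up for illustration, and the base address 0x20030000 is an assumption to check against the reference manual for your part.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: SRAM2A starts at 0x20030000 on the STM32WB55.
 * Verify this against the reference manual before relying on it. */
#define SRAM2A_BASE_ADDR  0x20030000UL
#define CPU2_DEBUG_WORDS  4U

/* Copy the first four 32-bit words found at `base` into `out`.
 * `base` is a parameter (rather than a hard-coded address) so the
 * function can also be exercised on a host PC with a plain array. */
static void cpu2_dump_debug_words(const volatile uint32_t *base,
                                  uint32_t out[CPU2_DEBUG_WORDS])
{
    for (size_t i = 0; i < CPU2_DEBUG_WORDS; i++) {
        out[i] = base[i];
    }
}
```

On the target, the call in the timeout branch would look like `cpu2_dump_debug_words((const volatile uint32_t *)SRAM2A_BASE_ADDR, words);`, followed by whatever logging the application already has.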

 

Christophe Arnal
ST Employee

Hello,

The first 32-bit value, 0x1170FD0F, indicates that the CM0+ is in its HardFault interrupt handler. You can find a short description of this in AN5289 rev18, Chapter 4.8.2.

As long as the CM0+ is in HardFault, there is no way to send any kind of command to the CM0+. As I already reported, the HW architecture does not make it possible to reset only the CM0+.

I strongly doubt there is any kind of workaround on the CM4 side that would keep the CM0+ from getting into such a HardFault.

I am currently checking with the team whether this HardFault is already known and, if so, which CM0+ wireless firmware version first contains the fix.

You will almost certainly have to upgrade the CM0+ wireless firmware. Obviously, if this is possible, upgrading to the latest version is always the best option.

Regards.
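
For readers landing here with the same symptom, the check described above can be sketched on the CM4 side roughly as follows. This is only an illustration under assumptions: the value 0x1170FD0F is the one from the dump in this thread, while `recovery_action_t`, `cpu2_in_hardfault` and `choose_recovery` are hypothetical names, not part of any ST library.

```c
#include <stdint.h>
#include <stdbool.h>

/* Value seen in the dump in this thread; per the explanation above it
 * marks the CM0+ as being in its HardFault handler. */
#define CPU2_HARDFAULT_KEYWORD 0x1170FD0FUL

/* Hypothetical set of actions the CM4 application could take. */
typedef enum {
    RECOVERY_NONE,         /* ack received, nothing to do */
    RECOVERY_RETRY,        /* ack missing, but no HardFault reported */
    RECOVERY_SYSTEM_RESET  /* CM0+ in HardFault: only a full device reset helps */
} recovery_action_t;

/* True when the first word at the SRAM2A base holds the HardFault keyword. */
static bool cpu2_in_hardfault(uint32_t first_sram2a_word)
{
    return first_sram2a_word == CPU2_HARDFAULT_KEYWORD;
}

/* Decide what to do after waiting for an ack from the CM0+. */
static recovery_action_t choose_recovery(bool ack_timed_out,
                                         uint32_t first_sram2a_word)
{
    if (!ack_timed_out) {
        return RECOVERY_NONE;
    }
    return cpu2_in_hardfault(first_sram2a_word) ? RECOVERY_SYSTEM_RESET
                                                : RECOVERY_RETRY;
}
```

Since the CM0+ cannot be reset on its own, `RECOVERY_SYSTEM_RESET` would mean saving whatever state the application needs and then resetting the whole device, for example with the CMSIS `NVIC_SystemReset()` call.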

Hello,

An issue has been confirmed in one SW module of the CPU2 wireless firmware that, in some very specific timing corner cases, may lead to a HardFault. The log you provided confirms that the HardFault is generated from this module.

In Stack Version 1.22, since the fix was not yet available, this SW module was rolled back to an older version so that the HardFault issue is not present. This version is perfectly operational (the updates that had been implemented in this SW module are not mandatory for the product).

In Stack Version 1.23, this SW module has been fully fixed.

Since the OpenThread TAG is the same in V1.21, V1.22 and V1.23, I would recommend going with at least Stack V1.23.

Of course, it would be great to move to the latest version, V1.24, but be aware that the OpenThread TAG has changed, so depending on your application, you may be impacted.

Regards.
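
If it helps, a fleet that mixes firmware versions could check at boot whether the CM0+ stack meets the recommended minimum of V1.23. The sketch below is only the comparison logic. `stack_version_t` and both function names are hypothetical, and how the version numbers are obtained on the CM4 (for instance through the SHCI wireless firmware info in STM32CubeWB) is an assumption to verify against your SDK.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for the CM0+ stack version, e.g. 1.21.0. */
typedef struct {
    uint8_t major;
    uint8_t minor;
    uint8_t sub;
} stack_version_t;

/* True when `v` is greater than or equal to major.minor.sub. */
static bool stack_version_at_least(stack_version_t v,
                                   uint8_t major, uint8_t minor, uint8_t sub)
{
    if (v.major != major) {
        return v.major > major;
    }
    if (v.minor != minor) {
        return v.minor > minor;
    }
    return v.sub >= sub;
}

/* V1.23 is the first version in which the faulty module is reported
 * as fully fixed in this thread. */
static bool stack_meets_recommended_minimum(stack_version_t v)
{
    return stack_version_at_least(v, 1U, 23U, 0U);
}
```

A device that fails this check could log a warning or flag itself for a wireless firmware update. Note that V1.22 would fail the check even though the HardFault is not expected there, because the recommendation above is V1.23 or later.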