STM32WB55 CPU2 Corrupts TL_RefTable while configuring BLE advertising (stack v1.15.0)

BStic.2 · ‎2023-03-07

I've seen this issue in various forms over the last 2 years, and it seems that CPU2 is corrupting the TL_RefTable structure while advertising is being configured. We are using v1.15.0 of the stack on an STM32WB55. The failure occurs every few hours while running the following sequence:

aci_gap_set_non_discoverable();
// Wait 40 seconds
aci_hal_set_tx_power_level(1, powLevel);
aci_gap_set_discoverable(ADV_IND, RateMin, RateMax, addrType, NO_WHITE_LIST_USE, 0, NULL, 0, NULL, 0, 0);
aci_gap_update_adv_data(/* manufacturer-specific data */);
aci_gap_update_adv_data(/* name */);
aci_gap_delete_ad_type(AD_TYPE_TX_POWER_LEVEL);
// Wait 20 seconds

Then the sequence repeats until we get a connection. If I just let it run (overnight, for example), TL_RefTable gets corrupted and triggers an MPU fault on the next advertising sequence.
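One way to narrow down when the corruption happens is to snapshot the table right after start-up and compare it before each advertising sequence. The sketch below is only an illustration: `ref_table_t` is a hypothetical stand-in for the real table layout, the function names are invented, and it assumes the table's pointers are written once at start-up and never legitimately change afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size and layout; the real table is defined by the stack. */
#define REF_TABLE_ENTRIES 8u

typedef struct {
    const void *entries[REF_TABLE_ENTRIES];
} ref_table_t;

/* Private copy of the table taken while it is known to be good. */
static ref_table_t g_ref_snapshot;

/* Call once, after the table has been fully initialised. */
void ref_table_snapshot(const ref_table_t *table)
{
    assert(table != NULL);
    for (size_t i = 0; i < REF_TABLE_ENTRIES; i++) {
        g_ref_snapshot.entries[i] = table->entries[i];
    }
}

/* Returns the index of the first entry that differs from the snapshot,
 * or -1 if every entry still matches. */
int ref_table_first_mismatch(const ref_table_t *table)
{
    assert(table != NULL);
    for (size_t i = 0; i < REF_TABLE_ENTRIES; i++) {
        if (table->entries[i] != g_ref_snapshot.entries[i]) {
            return (int)i;
        }
    }
    return -1;
}
```

Calling `ref_table_first_mismatch()` before and after each step of the sequence would show which step precedes the corruption, instead of only finding out at the MPU fault on the next cycle.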

Why is this happening? What can I do to stop it?

AM.12 · ‎2023-03-23

Hello @BStic.2 ,

Do you mean CFG_PRIVACY?

When I run in debug mode I don't see any problems, which makes this difficult for me to debug.

BStic.2 · ‎2023-03-23

Yes, my privacy is set to PRIVACY_ENABLED for random resolvable addresses, a.k.a. LE Privacy. If I disable it, the problem goes away.

AM.12 · ‎2023-03-23

Hi @BStic.2 ,

In my configuration it's disabled, but I still face this problem. Strangely, when I run in debug mode I almost never see it; it shows up about 1 time in 1000.

Remy ISSALYS · ‎2023-03-24

Hello,

In most cases, the CPU2 hard fault occurs because CPU1 has corrupted the flash memory: when CPU2 then tries to access the flash, it generates the hard fault. That's why I asked you to check this point.
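As a rough illustration of one possible way to check this point, the application can compute a CRC over a flash region while it is known to be good and compare it again later. This is only a sketch under assumptions: the function names are hypothetical, the region to check is left to the caller, and a software CRC-32 (reflected polynomial 0xEDB88320) is used here rather than any particular hardware peripheral.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320).
 * Pass 0 as crc for a fresh calculation. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0u);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Apply the polynomial only when the low bit is set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* True if the region still has the CRC recorded while it was known good. */
bool region_unchanged(const uint8_t *region, size_t len, uint32_t baseline)
{
    return crc32_update(0u, region, len) == baseline;
}
```

Recording the baseline once and calling `region_unchanged()` periodically from CPU1 would show whether the region has changed between two points in the test.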

Can you confirm that the screenshot in your post, where the TL_RefTable contains the hardfault keyword, was taken with the BLE Full Stack v1.15.0.3? If not, can you share a screenshot that contains the TL_RefTable and give the corresponding stack version?

Best Regards

BStic.2 · ‎2023-03-24

Thank you for that information. Yes, the TL_RefTable from the original post is v1.15.0.3. Here is another one from v1.16.0.4. I just took a screenshot of the state of the IDE. TL_RefTable is in the lower right corner.

Just as an FYI, the code I am running never writes to flash.

Thanks,

Ben

Remy ISSALYS · ‎2023-03-27

Hello,

Thanks for your response and your screenshot. I have some follow-up questions. Are you using the whitelist? Did you reproduce the issue on a Nucleo board or on your own board? If possible, can you share the source code of your application?

Best Regards

BStic.2 · ‎2023-03-27

Hello. Yes, we were configuring the resolving list, and we did have a mode where we enabled the whitelist, but in this example we were not entering that mode, so all of the advertising configuration used NO_WHITE_LIST_USE for Advertising_Filter_Policy. Essentially, if we lost communication due to a connection timeout, we would enable the whitelist for a short time to prefer previously connected devices. If nothing connected, we would disable the whitelist and transition to a period of slow advertising, followed by a period of no advertising. This example just runs the period of slow advertising followed by no advertising, and repeats that over and over.
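The whitelist decision described above can be sketched roughly as follows. This is not our actual code: the enum, the function name, and the window length parameter are hypothetical and only illustrate the rule. The one value taken from a specification is the HCI "Connection Timeout" reason code, 0x08.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCI disconnect reason "Connection Timeout" (Bluetooth Core spec). */
#define HCI_REASON_CONNECTION_TIMEOUT 0x08u

/* Hypothetical names; the application maps these to the stack's
 * Advertising_Filter_Policy values. */
typedef enum {
    FILTER_NONE,      /* advertise to everyone */
    FILTER_WHITELIST  /* prefer previously connected devices */
} adv_filter_t;

/* Use the whitelist only for a short window after a connection timeout;
 * in every other case advertise without it. */
adv_filter_t choose_adv_filter(uint8_t disconnect_reason,
                               uint32_t since_disconnect_s,
                               uint32_t whitelist_window_s)
{
    bool timed_out = (disconnect_reason == HCI_REASON_CONNECTION_TIMEOUT);
    bool in_window = (since_disconnect_s < whitelist_window_s);
    return (timed_out && in_window) ? FILTER_WHITELIST : FILTER_NONE;
}
```

In the failing example the result is always `FILTER_NONE`, since the whitelist mode is never entered.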

This is on our custom board; we are using the STM32WB55VG.

Full source is proprietary, but I could update the support ticket (00174181) with the code that manages the advertising if you like.

Thanks again.

Ben

Remy ISSALYS · ‎2023-03-28

Hello,

Yes, please share the code that implements your scenario in the support ticket so that I can try to reproduce the issue on my side. Please also give all the details of the commands and parameters used in your application. How do you manage the whitelist, and when do you add your device to it? When the hard fault occurs there is no connection, only advertising without the whitelist followed by no advertising, repeated; is that right? How long does it take to reproduce the issue?

Best Regards

BStic.2 · ‎2023-03-28

I have uploaded the code to the ticket. This issue takes a long time to reproduce on my setup. Sometimes it happens within hours; it could take up to a day. Rarely was I able to go overnight without the issue occurring. I tried various things, like changing the way the advertising data was set or disabling STOP2 mode, but the only thing that made the problem go away was to stop using random resolvable addresses and rely only on my static random address. Just to be clear, we are a peripheral and we only support 1 connection at a time.

The code should have all the parameters passed to the BLE interface functions. If there is something missing you would like to see, shoot me a message.

I have created a basic flow chart of what our device is doing in this state.

At the end of a connection, we fast advertise for a bit, then take a break and shift to slow advertising. We enable and disable slow advertising until a new connection is established. We are a battery-limited device, so we are trying to optimize current consumption while not connected to anything. We are not connected when this issue occurs; we are just advertising.
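The flow above can be approximated by a small state machine. This is a simplified sketch, not the code uploaded to the ticket: the state names, the timing struct, and the function are hypothetical, and the durations are parameters (the 40 seconds off and 20 seconds on from the original post would be the slow-cycle values; the fast advertising duration is whatever the application uses).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ADV_CONNECTED,  /* a central is connected, no advertising */
    ADV_FAST,       /* fast advertising right after a disconnect */
    ADV_OFF,        /* break: not discoverable */
    ADV_SLOW        /* slow advertising */
} adv_state_t;

typedef struct {
    uint32_t fast_s;  /* how long to fast advertise (application-specific) */
    uint32_t off_s;   /* length of each break */
    uint32_t slow_s;  /* length of each slow advertising period */
} adv_timing_t;

/* Returns the next state given the current one, the time spent in it,
 * and whether a connection is currently established. */
adv_state_t adv_next_state(adv_state_t state, uint32_t elapsed_s,
                           bool connected, const adv_timing_t *timing)
{
    assert(timing != NULL);
    if (connected) {
        return ADV_CONNECTED;
    }
    switch (state) {
    case ADV_CONNECTED: /* link just ended */
        return ADV_FAST;
    case ADV_FAST:
        return (elapsed_s >= timing->fast_s) ? ADV_OFF : ADV_FAST;
    case ADV_OFF:
        return (elapsed_s >= timing->off_s) ? ADV_SLOW : ADV_OFF;
    case ADV_SLOW:
        return (elapsed_s >= timing->slow_s) ? ADV_OFF : ADV_SLOW;
    }
    return state;
}
```

The issue occurs while cycling between `ADV_OFF` and `ADV_SLOW`, with `connected` false the whole time.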

Remy ISSALYS · ‎2023-03-29

Hello,

Thanks for the information provided; the investigation is ongoing. When the hard fault occurs, does any connection happen during your test? According to your flow chart, you don't use the whitelist in fast advertising mode; is that right? In your setup, how many devices are bonded? Can you share, via the online ticket, a binary that runs on a Nucleo board, implements your scenario, and reproduces the issue?

Best Regards