
Fatal connection bug in bluenrg2 V2.X stack causes device lockup

donpedro
Associate II

Hello all,

I have found a fatal flaw in the 2.x BT library that comes with the latest SDK:

STSW-BLUENRG1-DK 3.2.1

Setup: bluenrg2, external balun

Problem: When a device connects using initial connection parameters with very small connection intervals (min = 7ms), the stack sometimes ends up in a connection limbo state where the stack (and the app) still believes the link is up although the remote end has already dropped it.

It appears to be a race and is triggered like this:

1) The stack emits a normal hci_le_connection_complete_event() with status 0 (zero).

2) The remote end (a dongle in a PC) clearly signals at the HCI level that the connection first succeeded with status 0 (success), but shortly afterwards it gets a "disconnection complete" with error code 0x3e, "Reason: Connection Failed to be Established". This is a normal scenario if connection timeouts occur during the initial connection itself, and it has also been observed on other stacks/chipsets.

3) Locally (on the bluenrg2), the stack does _not_ signal "hci_disconnection_complete_event", which leaves it in a bad (fatal) state.

4) No matter what is done on the remote end (e.g. eject dongle), the link never drops on the bluenrg2 with "hci_disconnection_complete_event".

5) If a disconnect is forced on the bluenrg2 by calling hci_disconnect(), still nothing happens.

It would seem that the very short connection intervals cause the connection process itself to fail midway. This in turn seems to trigger a race, or hit an unhandled state, inside the bluenrg2 bluetooth v2.X stack. I have seen problems with short connection intervals on other stacks and chipsets, and that in itself is not a big problem. The state lockup is a big problem however, and the only way to get out of it is a reset of the chip.
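Since a chip reset is the only way out, one possible stopgap is an application-side watchdog that notices when a requested disconnect is never confirmed. The following is only a sketch under assumptions: all `link_wd_*` names, the state enum and the millisecond tick source are hypothetical and not part of the SDK. The application would call these helpers from its own connection complete and disconnection complete callbacks and right after it calls hci_disconnect(); deciding when to request the disconnect (for example after an application-level inactivity timeout) is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link states tracked by the application, not by the stack. */
typedef enum {
    LINK_IDLE,          /* no connection known to the application */
    LINK_CONNECTED,     /* connection complete seen with status 0 */
    LINK_DISCONNECTING  /* disconnect requested, confirmation pending */
} link_state_t;

typedef struct {
    link_state_t state;
    uint32_t     disconnect_requested_ms; /* tick when disconnect was requested */
    uint32_t     timeout_ms;              /* how long to wait for confirmation */
} link_watchdog_t;

void link_wd_init(link_watchdog_t *w, uint32_t timeout_ms)
{
    assert(w != NULL);
    w->state = LINK_IDLE;
    w->disconnect_requested_ms = 0;
    w->timeout_ms = timeout_ms;
}

/* Call from the connection complete callback. */
void link_wd_on_connection_complete(link_watchdog_t *w, uint8_t status)
{
    assert(w != NULL);
    if (status == 0) {
        w->state = LINK_CONNECTED;
    }
}

/* Call right after the application has requested a disconnect. */
void link_wd_on_disconnect_requested(link_watchdog_t *w, uint32_t now_ms)
{
    assert(w != NULL);
    if (w->state == LINK_CONNECTED) {
        w->state = LINK_DISCONNECTING;
        w->disconnect_requested_ms = now_ms;
    }
}

/* Call from the disconnection complete callback. */
void link_wd_on_disconnection_complete(link_watchdog_t *w)
{
    assert(w != NULL);
    w->state = LINK_IDLE;
}

/* True if a requested disconnect has gone unconfirmed for timeout_ms.
 * The unsigned subtraction keeps the check valid across tick wraparound. */
bool link_wd_reset_needed(const link_watchdog_t *w, uint32_t now_ms)
{
    assert(w != NULL);
    if (w->state != LINK_DISCONNECTING) {
        return false;
    }
    return (uint32_t)(now_ms - w->disconnect_requested_ms) >= w->timeout_ms;
}
```

The main loop would poll link_wd_reset_needed() and reset the chip when it returns true. This does not fix the lockup, it only bounds how long the device stays stuck.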

1) It can be triggered fairly easily with most devices. I can make it fail after very few tries with e.g. a "BT-400" from Asus (Broadcom/Cypress chipset).

2) It can be reproduced with any example peripheral project from the SDK. Just a few connections and it goes south.

3) The reason you do not normally see this is that e.g. iPhone and Android devices never do initial connections with such low intervals. They use high intervals and then re-negotiate after connection.
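For anyone trying to reproduce this from the central side: HCI expresses connection intervals in units of 1.25 ms, with an allowed range of 0x0006 (7.5 ms) to 0x0C80 (4 s), so the "7ms" above presumably corresponds to the 7.5 ms floor. A small helper like the one below converts a requested interval to HCI units and clamps it to that range. The function and macro names are made up for illustration and are not taken from the SDK.

```c
#include <assert.h>
#include <stdint.h>

#define CONN_INTERVAL_UNIT_US   1250u    /* one HCI unit = 1.25 ms */
#define CONN_INTERVAL_MIN_UNITS 0x0006u  /* 7.5 ms, lowest value HCI allows */
#define CONN_INTERVAL_MAX_UNITS 0x0C80u  /* 4000 ms, highest value HCI allows */

/* Compile-time check that the minimum really is 7.5 ms. */
static_assert(CONN_INTERVAL_MIN_UNITS * CONN_INTERVAL_UNIT_US == 7500u,
              "minimum connection interval must be 7.5 ms");

/* Convert an interval in microseconds to HCI units (rounding down),
 * clamped to the range HCI allows. */
uint16_t conn_interval_units_from_us(uint32_t interval_us)
{
    uint32_t units = interval_us / CONN_INTERVAL_UNIT_US;

    if (units < CONN_INTERVAL_MIN_UNITS) {
        units = CONN_INTERVAL_MIN_UNITS;
    }
    if (units > CONN_INTERVAL_MAX_UNITS) {
        units = CONN_INTERVAL_MAX_UNITS;
    }
    return (uint16_t)units;
}
```

With this, a request for 7000 us is clamped up to 6 units (7.5 ms), while something like 30000 us maps to 24 units, which is closer to the intervals phones typically start with before re-negotiating.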

Any light on this or a fix/workaround would be great. Our device is suffering due to this exotic but fairly obvious bug.

Thanks,

/pedro

1 REPLY
donpedro
Associate II

Hello all,

Still no sign of progress on this.

It seems that the HCI version of the interface is simply broken.

The workaround would be to use the GAP version of the interface instead.

The only problem with the GAP interface is that it leaves less space available for advertising data.

That makes it a non-option for us, as we need all the space.

I have also experimented with mixing the two APIs to get the wanted result, but I am not sure whether that will cause new problems.

It is always a challenge when working with closed-source black-box code.

Thanks,

/pedro