Why does BLE Data Transmission block Arm® Cortex®-M4?

Aishwarya
Associate III

STM32WB controllers are dual-core processors: an Arm® Cortex®-M4 core running at 64 MHz for the application and an Arm® Cortex®-M0+ core running at 32 MHz for the network or BLE stack.

To transmit data over BLE, the aci_gatt_update_char_value() function is used, and it blocks the Arm® Cortex®-M4 core.

Why does the BLE transmission function block the Arm® Cortex®-M4?

Is there an approach or method to transmit data via BLE without blocking the Arm® Cortex®-M4?

16 REPLIES
jro
Associate III

Hi @OCatt.1​ 

Just taken a look at this, though not with a debugger, so this will be a bit theoretical ...

I only ever use aci_gatt_update_char_value() to change a Characteristic's value. There's also an aci_gatt_write_char_value(), but I've no real idea what the difference is: it seems to be connection-based rather than service-based, and maybe starts the process but doesn't complete it. Both call hci_send_req(). The documentation says aci_gatt_write_char_value() generates an event, but not aci_gatt_update_char_value(), which just returns an error code. I notice that Christophe Arnal only mentioned aci_gatt_update_char_value() above.

As far as I can see hci_cmd_resp_wait() is poorly conceived. Although it nominally has a timeout, the ST-supplied implementation ignores it, and even if your implementation uses it, there's no official way to figure out if the wait completed or timed out, as the enum concerned only has WAIT and RELEASE values. In hci_send_req() the outermost while() loop spins until it gets a response, so the system can very readily get terminally wedged.
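To make the timeout point concrete, here is a minimal sketch of a wait/release pair that does report whether it was released or timed out. It is not ST's implementation: the names `wait_result_t`, `resp_wait_with_timeout()`, `resp_release()` and `fake_tick()` are hypothetical, and the tick source is passed in as a function pointer so the sketch can run off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: unlike a WAIT/RELEASE-only enum, it lets the
 * caller tell a real response apart from a timeout. */
typedef enum { WAIT_RELEASED, WAIT_TIMED_OUT } wait_result_t;

/* Set from the "response received" path (an interrupt handler on target). */
volatile bool resp_released = false;

/* Hypothetical release hook: call it when the response event arrives. */
void resp_release(void)
{
    resp_released = true;
}

/* Spin until released or until timeout_ticks have elapsed.
 * get_tick is any monotonically increasing tick source. */
wait_result_t resp_wait_with_timeout(uint32_t timeout_ticks,
                                     uint32_t (*get_tick)(void))
{
    assert(get_tick != NULL);
    const uint32_t start = get_tick();

    while (!resp_released) {
        /* Unsigned subtraction stays correct across tick wrap-around. */
        if ((uint32_t)(get_tick() - start) >= timeout_ticks) {
            return WAIT_TIMED_OUT;
        }
    }
    resp_released = false; /* consume the release for the next command */
    return WAIT_RELEASED;
}

/* Off-target stand-in for a tick source: advances by one on every call. */
uint32_t fake_tick(void)
{
    static uint32_t ticks = 0;
    return ticks++;
}
```

On target, a millisecond tick source such as HAL_GetTick() could be passed instead of `fake_tick`, and the caller could then treat WAIT_TIMED_OUT as an error rather than spinning forever.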

You've not said which function you're calling, but if it's not aci_gatt_update_char_value() you might want to give that a try...

Hi @jro​ ,

Thank you for the response!

Apologies, I am using the aci_gatt_update_char_value() function in order to change the Characteristic's value.

I have managed to circumvent the issue by implementing a short timeout within the outermost while() loop so as to break out of it, whilst also removing the hci_cmd_resp_wait() call. This doesn't seem like the intended/optimal use case. Do you mind me asking how you went about resolving this issue for yourself? Was the FreeRTOS implementation you mentioned previously the solution?

My current project serves as a file/data transfer between a PC app and an external device, with the STM32WB55 Nucleo acting as the BLE link. The app transfers data by updating the Characteristic value on the Nucleo, which then forwards the received data via UART to my device. The device responds with an Ack, and on receiving that serial data (HW_UART_Receive_IT) the Nucleo updates the Characteristic value via aci_gatt_update_char_value(), so that the PC app receives the callback for the Characteristic "ValueChanged" event and sends the next packet.

I mention this for context as to my current situation, and to ask about throughput. Currently this process is extremely slow: each packet carries 253 bytes and there is an interval of approximately 0.5 seconds between packets, which as you can imagine causes even small files to take a very long time. Processing on both the app and the external device side is very quick and shouldn't be causing a delay anywhere close to this, which leaves the Bluetooth processing as the likely cause.
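For scale, the figures quoted above work out to roughly 506 bytes/s. The sketch below just does that arithmetic; the function names are hypothetical, and the 100 kB file size used in the note after it is an assumed example, not a figure from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Number of packets needed for a file, rounding up the last partial packet. */
uint32_t packets_needed(uint32_t file_bytes, uint32_t payload_bytes)
{
    assert(payload_bytes > 0u);
    return (file_bytes + payload_bytes - 1u) / payload_bytes;
}

/* Total transfer time when one packet is sent every interval_ms. */
uint32_t transfer_time_ms(uint32_t file_bytes, uint32_t payload_bytes,
                          uint32_t interval_ms)
{
    return packets_needed(file_bytes, payload_bytes) * interval_ms;
}

/* Effective throughput for one payload per interval, in bytes per second. */
uint32_t throughput_bytes_per_s(uint32_t payload_bytes, uint32_t interval_ms)
{
    assert(interval_ms > 0u);
    return (payload_bytes * 1000u) / interval_ms;
}
```

With 253-byte packets every 500 ms, `throughput_bytes_per_s(253, 500)` gives 506, and an assumed 100 000-byte file needs 396 packets, or about 198 seconds.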

I have ensured 2M PHY is used and I am using Write without response/notify.

Any help you can provide me to both of these matters would be incredibly appreciated.

Regards.

jro
Associate III

Hi @OCatt.1​ 

I don't recall actually having had an issue, apart from having to replace the weak definitions of [s]hci_cmd_resp_wait() and [s]hci_cmd_resp_release(). This really only helped me free up the CPU so other FreeRTOS tasks could run, I don't think it'll help your problem. As far as I can see, so long as you do a well-formed call which ends up in hci_send_req(), it should return very quickly, even if the result is an error.

When you say you have to break out of the outermost while() loop, is that because hci_cmd_resp_wait() is stalled, or because it gets released but there's then nothing useful in HciCmdEventQueue? I've forgotten the details, but I seem to recall there's a bunch of tricky setup required to get the IPCC interrupt to vector into the BLE handler code. IIRC there was a weak definition somewhere which allowed the code to compile correctly while totally failing to operate... Again, doesn't sound like your issue, as you're getting things to work, albeit slowly.

From what you said above it sounds as if you might be waiting for data to be acknowledged before sending more (not totally clear, though) - I think that would definitely cause throughput problems.

I'm getting about 128kB/s on a 1Mbit/s link. I have one task reading a Flash sector (512B), then copying it 128B at a time to my Characteristic's buffer, then sending a request to the BLE task (via a FreeRTOS queue) to update the value on CPU2 from that buffer. The BLE task does that by calling aci_gatt_update_char_value(), which takes however long and may return an error if CPU2 is "too busy" (I'm vague on the details of that!). If I get an error, I just retry "immediately", within the constraints of FreeRTOS's queueing and scheduling overhead.
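The chunk-and-retry flow described above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: `send_in_chunks()`, `update_fn` and `flaky_update()` are hypothetical names, the update call is passed in as a function pointer standing in for whatever wraps aci_gatt_update_char_value(), and the FreeRTOS queue between the two tasks is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 128u /* notification payload size used in the post */

/* Hypothetical update hook: returns 0 on success, non-zero when busy. */
typedef int (*update_fn)(const uint8_t *data, size_t len);

/* Send buf in CHUNK_SIZE pieces, retrying a failed chunk immediately.
 * Gives up on a chunk after max_retries retries.
 * Returns the number of bytes successfully handed over. */
size_t send_in_chunks(const uint8_t *buf, size_t len, update_fn update,
                      unsigned max_retries)
{
    assert(buf != NULL && update != NULL);
    size_t sent = 0;

    while (sent < len) {
        const size_t remaining = len - sent;
        const size_t n = remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE;
        unsigned retries = 0;

        while (update(buf + sent, n) != 0) {
            if (retries++ >= max_retries) {
                return sent; /* stop at the first chunk that cannot be sent */
            }
        }
        sent += n;
    }
    return sent;
}

/* Off-target stand-in for a busy stack: every odd-numbered call fails. */
unsigned flaky_update_calls = 0;

int flaky_update(const uint8_t *data, size_t len)
{
    (void)data;
    (void)len;
    flaky_update_calls++;
    return (flaky_update_calls % 2u == 1u) ? 1 : 0;
}
```

Bounding the retries is a design choice of this sketch, not something the post describes; it keeps a persistently failing update from turning into another endless loop.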

My target client is an iOS app, so I'm limited to a 185-byte MTU, hence the convenient choice of 128-byte notification payloads. I've done a little debug using Windows: the throughput on that is OK, though it can take an age to connect for some reason.

Hi @jro​ ,

When you say you have to break out of the outermost while() loop, is that because hci_cmd_resp_wait() is stalled, or because it gets released but there's then nothing useful in HciCmdEventQueue?

The situation I'm getting appears to be that there is nothing useful in HciCmdEventQueue, thus causing the while loop to go infinite.

 I seem to recall there's a bunch of tricky setup required to get the IPCC interrupt to vector into the BLE handler code.

Thank you for letting me know of this, I will give this area a look to see if it could allow me to at least not have to brute force out of the loop!

From what you said above it sounds as if you might be waiting for data to be acknowledged before sending more (not totally clear, though) - I think that would definitely cause throughput problems.

I did have a feeling that this was the case, however I was unsure as to why it would be as Write without response was being used?

On the off chance it provides any information, when sniffing the Bluetooth packets during a transfer I see the following sequence:

Direction, Protocol, Length, Description, Time difference from previous packet

  • PC -> Module, ATT, 256, Write Command, +0,
  • Module->PC, HCI_EVT, 8, Number of completed packets, ~+50ms,
  • Module->PC, HCI_ACL, 32, [Reassembled in last packet], ~+80ms, (+3ms to +9ms between each of these, varying),

I believe this to be some data (or Ack) when the module changes the char value, it is repeated 9 times per sent packet, with the final item in the sequence being where it is reassembled.

  • Module->PC, ATT, 16, Handle Value Notification, ~+1ms (~+120ms from HCI_EVT packet)

Following this final packet, the next Write Command occurs (next packet sent), following an ~+45ms delay from the Handle Value Notification packet.

I am unsure where I really ought to be looking for the delays, as there appears to be a significant delay at several points: in the packet from the app being received by the module, in the Ack being sent by the module to the app (serial comms between the module and the connected device take <5ms), and between the Ack packet and the following packet being sent. The app fires the next packet as soon as the "Value changed" event occurs; it is a C# app using "WriteValueAsync(Buffer, GattWriteOption.WriteWithoutResponse);". This makes it very hard to pinpoint a specific place to look.

I really appreciate the help you've provided so far, thank you a lot JRO!

jro
Associate III

Definitely worth chasing why HciCmdEventQueue has nothing (useful) in it - there is only one place I can find that hci_cmd_resp_release() is called, which is in TlEvtReceived() and there's something put in HciCmdEventQueue there!

Have you done an aci_gatt_exchange_config() after establishing the connection? Re-assembling 9 packets sounds as if the MTU is still at 23 bytes, so your data is getting fragmented for transmission. CFG_BLE_MAX_ATT_MTU is set in app_conf.h: mine's 156, which seems to be the stack default. You may still get fragmentation as 253 bytes don't fit into 156, and even if you set an MTU of >=253 bytes it may not be accepted; but it's worth checking.
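As a rough check on the MTU numbers: a Write Command or Handle Value Notification carries a 3-byte ATT header (1-byte opcode plus 2-byte attribute handle), so the usable payload is the ATT MTU minus 3. The helper names below are hypothetical. Assuming the sniffer's ATT length includes that header, the 256-byte Write Command seen earlier matches a 253-byte payload.

```c
#include <assert.h>
#include <stdint.h>

#define ATT_HEADER_LEN  3u  /* 1-byte opcode + 2-byte attribute handle */
#define ATT_DEFAULT_MTU 23u /* minimum/default ATT MTU on LE */

/* Largest value that fits in one notification or Write Command. */
uint16_t att_max_payload(uint16_t mtu)
{
    assert(mtu >= ATT_DEFAULT_MTU);
    return (uint16_t)(mtu - ATT_HEADER_LEN);
}

/* Smallest ATT MTU that carries payload_len bytes in a single PDU. */
uint16_t att_min_mtu_for(uint16_t payload_len)
{
    assert(payload_len <= UINT16_MAX - ATT_HEADER_LEN);
    const uint16_t needed = (uint16_t)(payload_len + ATT_HEADER_LEN);
    return needed < ATT_DEFAULT_MTU ? (uint16_t)ATT_DEFAULT_MTU : needed;
}
```

So an MTU of 156 leaves 153 bytes of payload, the default of 23 leaves 20, and a 253-byte payload needs a negotiated MTU of at least 256.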

I don't think a long Connection Interval is relevant, but that might be worth playing with, just to prove it's not the blocker.

Hi @jro​ ,

Thanks for that insight, it's actually led to me finding something, though not a solution.

I tested calling aci_gatt_update_char_value() immediately when the packet sent by the app is received by the module (within STM_Event_Handler()), rather than sending it over serial and waiting for a serial response. When called this way the event works perfectly fine and the process functions correctly. This suggests that the issue lies in the fact that aci_gatt_update_char_value() is called within the serial interrupt callback (UART_RxCpltCallback(), called via HW_UART_Receive_IT()).

Could it be that for some reason when called within this callback, the event is blocked or cut off before occurring? If this is the case, why has every piece of example software provided by ST that I've seen using this sort of feature got this process occurring this way with the call occurring within this callback?

To try to overcome this I had the callback set a flag, with an if statement checking this flag within the infinite while loop in main. However, this didn't work (it seemingly never called the function), and I'm unsure whether this while loop actually runs the way it does in other projects. Would this be where the Sequencer is used instead?

I'm not sure how to use this sequencer yet. I assume it's just a call to UTIL_SEQ_SetTask()?

I've checked CFG_BLE_MAX_ATT_MTU, which is set to 300. Could that be causing it to not be accepted? I'll check different values.

I call aci_gatt_exchange_config() within the HCI_LE_CONNECTION_COMPLETE_SUBEVT_CODE event in SVCCTL_App_Notification(), so I'd assume the MTU would update. I also tested (in the current state) altering BLE_DEFAULT_ATT_MTU within ble_bufsize.h, which had no effect on the fragmentation.

Having used CubeMX, I have FAST_CONN_ADV_INTERVAL set to 400 for Min/Max and LP_CONN_ADV_INTERVAL set to 600; MAX_CONN_EVENT_LENGTH is at 0xFFFFFFFF.

jro
Associate III

I suspect calling aci_gatt_update_char_value() in an ISR is likely to cause problems - certainly not something I do, but then with FreeRTOS I don't need to... I assume your flag is marked volatile so the foreground code doesn't optimise away the check.
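A minimal sketch of the flag-based deferral discussed here, with the interrupt side only setting a volatile flag and the foreground code doing the actual update. All names are hypothetical: `uart_rx_complete_isr()` stands in for the UART receive-complete callback, and `do_characteristic_update()` stands in for the code that would call aci_gatt_update_char_value().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* volatile so the foreground check is not optimised away. */
volatile bool rx_pending = false;

/* Counts foreground updates so the sketch can be checked off-target. */
unsigned updates_done = 0;

/* Interrupt context: only record that work is pending, do no BLE calls. */
void uart_rx_complete_isr(void)
{
    rx_pending = true;
}

/* Stand-in for the characteristic update done in the foreground. */
void do_characteristic_update(void)
{
    updates_done++;
}

/* Call once per main-loop iteration. Returns true if an update was run. */
bool main_loop_poll(void)
{
    if (!rx_pending) {
        return false;
    }
    rx_pending = false; /* clear first so a new interrupt is not lost */
    do_characteristic_update();
    return true;
}
```

This only helps if the foreground loop really reaches the poll. In a sequencer-based project the same idea would presumably be expressed by having the interrupt request a task instead of setting a bare flag, but that is an assumption rather than something verified here.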

TBH I've not really looked at the ST examples and use of the sequencer in any detail, so can't really comment on how they get their examples to work. I thought their sequencer ran entirely in the foreground, and just flagged some "tasks" as stalled if e.g. the BLE stack was processing a request; the interrupt then clears the flag when the request is completed. But I could be wrong.

I don't think the advertising interval has anything to do with the connection interval.

aci_gatt_exchange_config() returns the agreed MTU somewhere in its response structure, so it should be easy enough to check. There are points in the code where although a packet could be >255 bytes, the length parameter is a uint8_t - hopefully that isn't causing the fragmentation! Easy to set 150, anyway, which should be acceptable to most clients.