Why does BLE Data Transmission block Arm® Cortex®-M4?

Aishwarya
Associate III

STM32WB controllers are dual-core processors, with an Arm® Cortex®-M4 core running at 64 MHz for the application and an Arm® Cortex®-M0+ core running at 32 MHz for the network (BLE) stack.

To transmit data over BLE, the function aci_gatt_update_char_value() is used, and it blocks the Arm® Cortex®-M4 core.

Why does the BLE transmission function block the Arm® Cortex®-M4?

Is there an approach or method to transmit data via BLE without blocking the Arm® Cortex®-M4?

16 REPLIES
Remi QUINTIN
ST Employee

How do you detect the M4 core is blocked?

This is not expected. The M4 and M0+ cores are two separate and independent cores. The only shared resources are the shared SRAM buffers used for IPCC and the flash memory. Shared resources are protected via semaphores.

Maybe the M4 core is waiting for an event from the M0+ core?

jro
Associate III

It's blocked because the supplied implementation of hci_cmd_resp_wait() in hci_tl.c has an INFINITE while loop (ignoring the timeout parameter!) which spins until hci_cmd_resp_release() is called from the interrupt-level code when CPU2 finishes its work.

Both hci_cmd_resp_ definitions are WEAK, so you can provide your own, and perform other functions "inside" hci_cmd_resp_wait(). This is non-trivial...
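For anyone landing here later, a minimal bare-metal sketch of the override idea described above. It assumes the two weak functions have the prototypes void hci_cmd_resp_wait(uint32_t timeout) and void hci_cmd_resp_release(uint32_t flag); check hci_tl.h in your firmware package. The flag cmd_rsp_released, the hook app_background_step and the demo_background_step function are hypothetical names for illustration, not part of the ST stack. Treat it as a starting point under those assumptions, not as the ST implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: set by the release function, consumed by the wait. */
static volatile uint8_t cmd_rsp_released;

/* Hypothetical hook: one short, non-blocking slice of application work.
 * It must NOT issue any BLE stack command (one is already in flight). */
static void (*app_background_step)(void);

/* Replaces the weak default: still waits for CPU2, but runs the hook on
 * every pass instead of spinning on an empty loop. */
void hci_cmd_resp_wait(uint32_t timeout)
{
    (void)timeout;                  /* not used in this sketch */
    while (!cmd_rsp_released) {
        if (app_background_step != NULL) {
            app_background_step();
        }
    }
    /* Consume the flag on exit, so a release that arrives before the
     * wait starts is not lost. */
    cmd_rsp_released = 0;
}

/* Replaces the weak default: called from interrupt context once CPU2
 * has answered the command. */
void hci_cmd_resp_release(uint32_t flag)
{
    (void)flag;
    cmd_rsp_released = 1;
}

/* Host-side demonstration only: counts its calls and, on the third one,
 * stands in for the interrupt that reports CPU2's answer. */
static unsigned background_calls;

static void demo_background_step(void)
{
    /* The wait loop only calls the hook while the response is pending. */
    assert(!cmd_rsp_released);
    background_calls++;
    if (background_calls == 3u) {
        hci_cmd_resp_release(0u);
    }
}
```

On target you would point app_background_step at your own work function and let the real interrupt path call hci_cmd_resp_release(). Keep each step short, because the BLE command cannot return until the step has finished.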

Aishwarya
Associate III

Thanks a lot @jro, that helped.

It is surprising to see that community members understand the environment and the problems that might arise better than ST employees themselves.

Are there any application notes or similar documents that elaborate on such system functions, and on how far a developer can modify them without "breaking" them? Due to the lack of official documentation, and even of comments in the generated code, it is very challenging to clearly understand the control flow around and into such functions.

With an expression of gratitude...

Christophe Arnal
ST Employee

Hello,

When aci_gatt_update_char_value() is called to transmit data, two things happen:

1/ The command is sent from CPU1 to CPU2 and is acknowledged by CPU2. The data is stored in our internal buffers.

2/ The data is sent over BLE when possible.

The first step may hold CPU1 until CPU2 acknowledges the command, whereas the second step is of course not blocking CPU1 execution.

So, CPU2 does not block CPU1 during execution, but some synchronization is required when sending a command.

Here is some background :

An application code should look like:

.....
aci_***();
...
...
aci_***();
... 
aci_***();

Obviously, in your application, you expect each aci_***() command to be completed on return before moving forward in the code and sending a new one. This is especially true when the command returns some status.

On a single core, there is no question.

On a dual core, the command is executed on the second core. However, the application shall work in the same way, so it still does not expect the aci_***() command to return before it has been acknowledged by CPU2.

When the weak functions hci_cmd_resp_wait() and hci_cmd_resp_release() are not implemented in the application, the default behavior is that the HCI Transport Layer is blocking until the CPU2 provides the acknowledge. The overhead of the IPCC communication between the two cores is a few tens of µs.

So, there should not be any significant performance issue in the application when using the default implementation.

For applications running at 64 MHz where every ten µs matters, we implemented a mechanism that suspends the current process so that another one can run.

This is achieved with hci_cmd_resp_wait() and hci_cmd_resp_release(). You may check the BLE_HeartRateFreeRTOS project where OS semaphores are used.

When using an OS and the dedicated Semaphores, the current task is pending until CPU2 provides the acknowledgment and CPU1 may run any other OS tasks in the meantime.

When using a bare-metal implementation, such a feature is usually not available. However, the sequencer (which is a simple bare-metal packaging and NOT a custom OS) provides similar functionality, which is used in all our non-OS examples.

By default, there should be no reason to change the implementation, although this is possible.
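To illustrate the idea of waiting on an event while other bare-metal tasks keep running, here is a self-contained miniature. It is NOT the ST sequencer and does not use its API: every mini_seq_* name, EVT_HCI_CMD_RSP and demo_sensor_task are hypothetical, and the prototypes of the two hci_cmd_resp_ functions are assumed to match hci_tl.h. It only sketches the principle under those assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MINI_SEQ_MAX_TASKS 8u
#define EVT_HCI_CMD_RSP    0u   /* hypothetical event id */

typedef void (*mini_seq_task_t)(void);

static mini_seq_task_t   tasks[MINI_SEQ_MAX_TASKS];
static volatile uint32_t pending_tasks;   /* bit n set: task n wants to run */
static volatile uint32_t pending_events;  /* bit n set: event n was signalled */

/* Note: on a real MCU the read-modify-write operations on the two bitmasks
 * need a critical section, because interrupts also touch them. */

void mini_seq_register(uint32_t id, mini_seq_task_t fn)
{
    assert(id < MINI_SEQ_MAX_TASKS && fn != NULL);
    tasks[id] = fn;
}

void mini_seq_set_task(uint32_t id)  { pending_tasks  |= (1u << id); }
void mini_seq_set_event(uint32_t id) { pending_events |= (1u << id); }

/* Run every pending task once, lowest id first. */
static void mini_seq_run_pending(void)
{
    for (uint32_t id = 0u; id < MINI_SEQ_MAX_TASKS; id++) {
        uint32_t bit = 1u << id;
        if ((pending_tasks & bit) != 0u && tasks[id] != NULL) {
            pending_tasks &= ~bit;
            tasks[id]();
        }
    }
}

/* Hold the caller until the event arrives, running other tasks meanwhile. */
void mini_seq_wait_event(uint32_t id)
{
    uint32_t bit = 1u << id;
    while ((pending_events & bit) == 0u) {
        mini_seq_run_pending();
    }
    pending_events &= ~bit;     /* consume the event */
}

/* Application-side replacements for the weak transport-layer functions. */
void hci_cmd_resp_wait(uint32_t timeout)
{
    (void)timeout;
    mini_seq_wait_event(EVT_HCI_CMD_RSP);
}

void hci_cmd_resp_release(uint32_t flag)
{
    (void)flag;
    mini_seq_set_event(EVT_HCI_CMD_RSP);
}

/* Host-side demonstration only: a task that re-arms itself once and, on
 * its second run, stands in for the interrupt reporting CPU2's answer. */
static unsigned sensor_runs;

static void demo_sensor_task(void)
{
    sensor_runs++;
    if (sensor_runs < 2u) {
        mini_seq_set_task(1u);
    } else {
        hci_cmd_resp_release(0u);
    }
}
```

The point of the design is that the task that issued the BLE command is held inside hci_cmd_resp_wait(), while any other pending task still gets CPU time. Those other tasks must not issue BLE commands themselves while the wait is in progress.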

The mechanism is described in AN5289.

In rev5, you may find:

  • an MSC flow description on p. 136
  • an API description on p. 138 (note: I noticed a typo where shci_cmd_resp_wait() is described twice; the second description actually relates to shci_cmd_resp_release(). This will be fixed in a later release.)

The exact same information is given for hci_cmd_resp_wait() on pages 143 and 144.

You may also check the descriptions given in the header files hci_tl.h and shci_tl.h in the Cube firmware package.

Please let us know which information you are missing in AN5289 so that we can improve the description.

Regards.

Sorry, what is the solution to the original issue of the transmission causing blocking?

I am currently having this exact issue, but I'm unable to find a resolution.

Christophe Arnal
ST Employee

Hello,

Could you please provide more details on your issue?

Basically, there is no blocking on the CPU1 due to BLE transmission on CPU2.

Regards.

jro
Associate III

Basically, there is blocking in the CPU1 stack - you've contradicted your previous post, @Christophe Arnal! You said "When the weak functions hci_cmd_resp_wait() and hci_cmd_resp_release() are not implemented in the application, the default behavior is that the HCI Transport Layer is blocking until the CPU2 provides the acknowledge". Which is true.

@OCatt.1, you'll have to implement your own copies of the above functions, such that when execution reaches your hci_cmd_resp_wait() your application loops doing anything except BLE stack calls (because it's already doing one...), and returns when your hci_cmd_resp_release() has been called (which is done under interrupt).

As I noted previously, this is non-trivial. I've done it by implementing my system in FreeRTOS, so the BLE stack has its own task, and no other task cares if it's blocked. This is complex and in any case I can't post the code because I did it for work purposes! I think the ST sequencer is intended to provide this sort of capability at a very basic level - you'll find it in Utilities/sequencer/stm32_seq.c and .h, several examples use it.

Hi @jro,

Thanks for the response and for the depth you've gone into with your answer, it's been very helpful.

In my case, within the hci_send_req function, the while(local_cmd_status == HCI_TL_CmdBusy) loop stalls because the inner loop is never entered (no events are raised), so the local_cmd_status value never changes. I originally thought the hci_cmd_resp_wait() function was causing the stall (never exited because no events are raised), but I then realised that both are part of the same problem.

Is there a reason this situation arises when changing the characteristic value? Should I be receiving an event in this situation?

I have yet to try the sequencer and will do so shortly. In the meantime, I was planning to implement a timeout in hci_cmd_resp_wait() (possibly using the timeout parameter that currently does nothing...) that releases the wait and changes the local_cmd_status value at the same time. However, I'm unsure whether this may cause issues elsewhere.

Any advice you could give me would be greatly appreciated.

Hi @Christophe Arnal,

For more details on my issue please read the below response to jro, as well as the post found [HERE].

Thank you.