STM32WB05 vs STM32WB09 for continuous BLE data streaming – single-core suitability?

vkosuri
Associate III

Hi ST community,

We are currently evaluating STM32WB05 vs STM32WB09 for an application involving continuous BLE data streaming from multiple sensors, and would appreciate guidance from experts who have worked with STM32WB0 series.

The reason we are considering the STM32WB05 over the STM32WB09 is that the STM32WB05 comes at a lower cost.

In brief, our application sensors are:

  • 2x IMU LSM6DSO sensors (high-frequency sampling)

  • 1x magnetometer

  • 1x analog force sensor

  • A single BLE connection (no mesh or multi-node), with frequent sync occurring approximately every three minutes.

Key Questions

1. WB05 vs WB09 for continuous BLE streaming

Both MCUs seem to have similar BLE capabilities (Bluetooth 5.4, 2 Mbps PHY), but:

  • Does WB09 (larger RAM/Flash) provide significant advantage for:

    • Continuous streaming

    • Buffering

    • Reducing packet drops?

  • Has anyone faced RAM bottlenecks or BLE throughput instability on the WB05 or WB09 in such use cases?

2. Single-core architecture suitability

Both WB05 and WB09 are single-core Cortex-M0+ with radio coprocessor.

In real-world use:

  • Is single-core sufficient for:

    • Continuous IMU acquisition (SPI/I2C)

    • BLE notifications at high rate?

  • Any known limitations when CPU is handling:

    • sensor read + packetization + BLE stack?

3. BLE timing / sync optimization

Looking for best practices on:

  • Handling continuous BLE notifications without drops

  • Optimizing:

    • Connection interval

    • MTU size

    • Packet batching strategy

  • Recommended approach:

    • Send per-sample vs batch multiple samples?

4. STM32CubeWB0 stack usage

We have reviewed:

  • STM32CubeWB0 Getting Started

  • STM32CubeWB0 GitHub examples

Observations:

  • STM32_BLE middleware handles full BLE stack

  • HAL + LL + middleware layered architecture

Questions:

  • Any recommended BLE example as baseline for streaming use case?

  • Is BLE_Skeleton sufficient, or better to start from another example?

5. HAL vs LL for performance-critical path

From documentation:

  • HAL = easier, portable

  • LL = better performance

For this use case:

  • Recommended approach:

    • HAL only?

    • HAL + LL mix (e.g., LL for SPI/DMA)?

6. FreeRTOS vs bare-metal

  • Is FreeRTOS recommended or avoided for BLE-heavy applications on WB0?

  • Any known issues with:

    • latency

    • BLE timing interference?

What we are trying to achieve

  • Stable continuous BLE streaming (no packet loss)

  • Efficient CPU usage on single-core

  • Clean architecture for sensor + BLE sync

Looking for suggestions on:

  • MCU selection between WB05 vs WB09

  • BLE tuning parameters for 2x LSM6DSO + magnetometer + force sensor (no audio)

  • Firmware architecture (ISR / DMA / buffering)

  • Any pitfalls to avoid

 

Thanks in advance!

 

 

1 ACCEPTED SOLUTION

Imen.D
ST Employee

Hello @vkosuri ,

Thank you for taking the time to post with details.

Please find the answers below for each point:

## 1. WB05 vs WB09 for continuous BLE streaming

Since you will use a very simple scenario, the resources available on STM32WB05 are likely sufficient.

There is no expected advantage in using STM32WB09 over STM32WB05. The Bluetooth LE stack configuration (used features, buffers, etc.) may need to be tuned.

## 2. Single-core architecture suitability

The performance of STM32WB05/WB09 is usually sufficient to handle sensor data streaming. However, this depends on the target data rate and on the implementation of sensor readout (e.g. whether DMA is used or not, whether filters are applied, and so on).

The advantage of the STM32WB0 stack over most other Bluetooth devices is that it is able to dynamically increase the number of packets in a connection event, to sustain a high data rate (if this is also supported by the Central device).

## 3. BLE timing / sync optimization

- **Connection interval**: it can be tuned depending on the required latency. A short connection interval provides better latency, but it increases power consumption and CPU load. A short connection interval may also be required to achieve a higher and more stable throughput.

- **MTU size / Data length**: it is recommended to use a high ATT MTU size to reduce protocol overhead. It is also recommended to increase the maximum Link Layer PDU size to the maximum supported value (typically 251 bytes for LE 1M PHY with Data Length Extension enabled).

A commonly used configuration is:

  • ATT MTU = 247 bytes  
  • LL Data Length = 251 bytes  

  so that one full ATT payload plus headers can fit efficiently into a single PDU.

- **Packet batching strategy**: it is preferable to collect multiple samples and send them in larger packets, which is more efficient than sending many small packets.

Note: BLE notifications are typically used to achieve high data throughput. However, at very high data rates, the BLE standard does not guarantee that all notifications are delivered to the application. This needs to be taken into account at the application level (for example, by using sequence numbers, application‑level ACKs, or loss‑tolerant encoding).
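The batching and sequence-number advice above can be sketched in plain C. This is a minimal, hardware-independent illustration under assumptions not stated in the thread: 12-byte IMU samples and a 16-bit sequence number header. With ATT MTU = 247, a notification carries 247 − 3 = 244 bytes of application payload (3 bytes of ATT opcode + attribute handle), and the 251-byte LL data length accommodates the 4-byte L2CAP header plus the 247-byte ATT PDU. The `batcher_push` helper and all sizes are hypothetical names for illustration; the actual send call would be a GATT notification from the ST BLE stack.

```c
#include <stdint.h>
#include <string.h>

#define ATT_MTU          247
#define NOTIF_PAYLOAD    (ATT_MTU - 3)   /* 244 usable bytes per notification */
#define SAMPLE_SIZE      12              /* assumption: 6x int16 accel + gyro */
#define HDR_SIZE         2               /* 16-bit sequence number */
#define SAMPLES_PER_PKT  ((NOTIF_PAYLOAD - HDR_SIZE) / SAMPLE_SIZE)  /* = 20 */

typedef struct {
    uint8_t  buf[NOTIF_PAYLOAD];
    uint16_t seq;    /* lets the receiver detect dropped notifications */
    uint8_t  count;  /* samples accumulated so far */
} batcher_t;

/* Append one sample; returns 1 when the packet is full and ready to send. */
static int batcher_push(batcher_t *b, const uint8_t sample[SAMPLE_SIZE])
{
    memcpy(&b->buf[HDR_SIZE + b->count * SAMPLE_SIZE], sample, SAMPLE_SIZE);
    if (++b->count < SAMPLES_PER_PKT)
        return 0;
    /* stamp a little-endian sequence number into the first two bytes */
    b->buf[0] = (uint8_t)(b->seq & 0xFF);
    b->buf[1] = (uint8_t)(b->seq >> 8);
    b->seq++;
    b->count = 0;
    return 1;  /* caller now sends buf, e.g. via a GATT notification */
}
```

With these assumed sizes, each notification carries 20 samples, so the notification rate is 1/20th of the sample rate, which is the efficiency gain batching buys over per-sample sends.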

## 4. STM32CubeWB0 stack usage

**BLE_SerialPort_Server** is one of the simplest applications that can be used as a starting point, including for streaming use cases.

**BLE_Skeleton** is not a BLE application example but just an "almost empty" project, since it does not contain any application code using BLE.

## 5. HAL vs LL for performance-critical path

The recommended approach depends on whether the code needs to be more optimized or easier to develop and maintain.

- **HAL** functions make using the peripherals easier and more portable but require more Flash space and add some overhead.
- **LL** APIs may sometimes be required to reach maximum performance and minimize code size, at the cost of more detailed low‑level configuration.

## 6. FreeRTOS vs bare-metal

FreeRTOS typically increases Flash and RAM usage. It may also have a minor negative impact on CPU load due to context switching. I would not recommend using it if the primary goal is to maximize efficiency of CPU and memory usage and the application remains relatively simple.

 

Hope this answers your questions. If this is the case, please close this post by clicking the "Accept as Solution" button on my reply. This will help other members of the community find this response more quickly.

When your question is answered, please close this topic by clicking "Accept as Solution".
Thanks
Imen


4 REPLIES

@Imen.D wrote:

- **Connection interval**: it can be tuned depending on the required latency.


@vkosuri But remember that it is the Central which makes the final decision on what Connection Interval to actually use - it might not honour the value requested by the Peripheral.

This is particularly true of phones, tablets, etc ...

A complex system that works is invariably found to have evolved from a simple system that worked.
A complex system designed from scratch never works and cannot be patched up to make it work.
Andrew Neil
Super User

@vkosuri wrote:
  • Recommended approach:

    • HAL only?

    • HAL + LL mix (e.g., LL for SPI/DMA)?


It has been said that "premature optimisation is a root of all kinds of evils"

So probably the best approach is to start with HAL, and see what performance you get:

  • If the performance is fine, you're a winner.
  • If there's a performance issue, then look into it and determine exactly where the issue(s) is/are - and solve them specifically.

 


Hi @vkosuri 

When your questions are answered, please close this topic by clicking "Accept as Solution". This will help other users find that answer faster.

If you still have questions or issues, don't hesitate to return to the Community with a new thread, or as a continuation of this one when it relates to the same initial topic.
