2026-03-24 10:48 PM - last edited on 2026-03-25 7:14 AM by Andrew Neil
Hi ST community,
We are currently evaluating STM32WB05 vs STM32WB09 for an application involving continuous BLE data streaming from multiple sensors, and would appreciate guidance from experts who have worked with the STM32WB0 series.
We are leaning toward the STM32WB05 over the STM32WB09 because the STM32WB05 comes at a lower cost.
2x IMU LSM6DSO sensors (high-frequency sampling)
1x magnetometer
1x analog force sensor
A single BLE connection (no mesh or multi-node), with frequent sync occurring approximately every three minutes.
Both MCUs seem to have similar BLE capabilities (Bluetooth 5.4, 2 Mbps PHY), but:
Does WB09 (larger RAM/Flash) provide significant advantage for:
Continuous streaming
Buffering
Reducing packet drops?
Has anyone faced RAM bottlenecks or BLE throughput instability on WB05 or WB09 in such use cases?
Both WB05 and WB09 are single-core Cortex-M0+ with radio coprocessor.
In real-world use:
Is single-core sufficient for:
Continuous IMU acquisition (SPI/I2C)
BLE notifications at high rate?
Any known limitations when CPU is handling:
sensor read + packetization + BLE stack?
Looking for best practices on:
Handling continuous BLE notifications without drops
Optimizing:
Connection interval
MTU size
Packet batching strategy
Recommended approach:
Send per-sample vs batch multiple samples?
We have reviewed:
STM32CubeWB0 Getting Started
STM32CubeWB0 GitHub examples
Observations:
STM32_BLE middleware handles full BLE stack
HAL + LL + middleware layered architecture
Questions:
Any recommended BLE example as baseline for streaming use case?
Is BLE_Skeleton sufficient, or better to start from another example?
From documentation:
HAL = easier, portable
LL = better performance
For this use case:
Recommended approach:
HAL only?
HAL + LL mix (e.g., LL for SPI/DMA)?
Is FreeRTOS recommended or avoided for BLE-heavy applications on WB0?
Any known issues with:
latency
BLE timing interference?
Stable continuous BLE streaming (no packet loss)
Efficient CPU usage on single-core
Clean architecture for sensor + BLE sync
MCU selection between WB05 vs WB09
BLE application parameter tuning for 2x ST LSM6DSO + ST magnetometer + force sensor (no audio)
Firmware architecture (ISR / DMA / buffering)
Any pitfalls to avoid
Thanks in advance!
2026-03-30 3:11 AM
Hello @vkosuri ,
Thank you for taking the time to post with details.
Please find the answers to each point below:
Since you will use a very simple scenario, the resources available on STM32WB05 are likely sufficient.
There is no expected advantage in using STM32WB09 over STM32WB05. The Bluetooth LE stack configuration (used features, buffers, etc.) may need to be tuned.
The performance of STM32WB05/WB09 is usually sufficient to handle sensor data streaming. However, this depends on the target data rate and on the implementation of sensor readout (e.g. whether DMA is used or not, whether filters are applied, and so on).
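To illustrate the point about readout implementation: one common way to decouple sensor acquisition from the BLE side on a single core is a single-producer/single-consumer ring buffer, where the DMA-complete or data-ready interrupt pushes samples and the main loop pops them for packetization. The following is only a minimal sketch in portable C. `sample_t`, `ring_push`, and `ring_pop` are hypothetical names (not part of STM32CubeWB0), and the 6-axis sample layout is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAPACITY 64u /* hypothetical depth; must be a power of two */
static_assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u,
              "RING_CAPACITY must be a power of two");

/* Assumed layout of one IMU sample (3-axis accel + 3-axis gyro). */
typedef struct {
    int16_t accel[3];
    int16_t gyro[3];
} sample_t;

typedef struct {
    sample_t slots[RING_CAPACITY];
    volatile uint32_t head; /* written only by the producer (ISR) */
    volatile uint32_t tail; /* written only by the consumer (main loop) */
} ring_t;

/* Producer side: returns false (sample dropped) when the buffer is full. */
static bool ring_push(ring_t *r, const sample_t *s)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_CAPACITY) {
        return false;
    }
    r->slots[head & (RING_CAPACITY - 1u)] = *s;
    r->head = head + 1u; /* publish only after the slot is written */
    return true;
}

/* Consumer side: returns false when there is nothing to read. */
static bool ring_pop(ring_t *r, sample_t *out)
{
    uint32_t tail = r->tail;
    if (tail == r->head) {
        return false;
    }
    *out = r->slots[tail & (RING_CAPACITY - 1u)];
    r->tail = tail + 1u;
    return true;
}
```

The indices run freely and wrap modulo 2^32, so no separate "full" flag is needed. This sketch assumes exactly one producer and one consumer on a single core where aligned 32-bit stores are atomic; counting the `false` returns from `ring_push` gives a quick indication of whether the main loop is keeping up.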
The advantage of the STM32WB0 stack over most other Bluetooth devices is that it is able to dynamically increase the number of packets in a connection event, to sustain a high data rate (if this is also supported by the Central device).
- **Connection interval**: it can be tuned depending on the required latency. A short connection interval provides better latency, but it increases power consumption and CPU load. A short connection interval may also be required to achieve a higher and more stable throughput.
- **MTU size / Data length**: it is recommended to use a high ATT MTU size to reduce protocol overhead. It is also recommended to increase the maximum Link Layer PDU size to the maximum supported value (typically 251 bytes for LE 1M PHY with Data Length Extension enabled).
A commonly used configuration is:
so that one full ATT payload plus headers can fit efficiently into a single PDU.
- **Packet batching strategy**: it is preferable to collect multiple samples and send them in larger packets, which is more efficient than sending many small packets.
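As a rough illustration of the sizing and batching points above: with a 251-byte Link Layer payload, the 4-byte L2CAP header leaves 247 bytes for ATT, and the 3-byte notification header (opcode plus handle) leaves 244 bytes of characteristic value per packet. The sketch below does that arithmetic and packs fixed-size samples into one notification-sized buffer. The 12-byte sample size, the 2-byte sequence header, and all function and type names are assumptions for illustration, not STM32_BLE APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LL_MAX_PAYLOAD     251u /* max LL data payload with DLE */
#define L2CAP_HEADER         4u /* length (2) + channel ID (2) */
#define ATT_NOTIFY_HEADER    3u /* opcode (1) + attribute handle (2) */

#define SAMPLE_BYTES        12u /* assumption: 6 x int16 per IMU sample */
#define BATCH_HEADER_BYTES   2u /* assumption: 16-bit sequence counter */
#define BATCH_MAX_BYTES    244u /* value bytes per notification at MTU 247 */

/* Largest ATT MTU that still fits in one LL PDU of the given payload size. */
static uint16_t att_mtu_for_ll_payload(uint16_t ll_payload)
{
    assert(ll_payload > L2CAP_HEADER);
    return (uint16_t)(ll_payload - L2CAP_HEADER);
}

/* Largest notification value for a given ATT MTU. */
static uint16_t notify_value_max(uint16_t att_mtu)
{
    assert(att_mtu > ATT_NOTIFY_HEADER);
    return (uint16_t)(att_mtu - ATT_NOTIFY_HEADER);
}

typedef struct {
    uint8_t  buf[BATCH_MAX_BYTES];
    uint16_t len; /* bytes used, including the header */
    uint16_t seq; /* sequence number for the next batch */
} batch_t;

/* Start a new batch: write the little-endian sequence header. */
static void batch_begin(batch_t *b)
{
    b->buf[0] = (uint8_t)(b->seq & 0xFFu);
    b->buf[1] = (uint8_t)(b->seq >> 8);
    b->len = BATCH_HEADER_BYTES;
    b->seq++;
}

/* Append one sample; returns true when no further sample fits,
 * i.e. the batch should be sent and batch_begin() called again. */
static bool batch_add(batch_t *b, const uint8_t sample[SAMPLE_BYTES])
{
    assert(b->len >= BATCH_HEADER_BYTES);
    assert(b->len + SAMPLE_BYTES <= BATCH_MAX_BYTES);
    memcpy(&b->buf[b->len], sample, SAMPLE_BYTES);
    b->len = (uint16_t)(b->len + SAMPLE_BYTES);
    return (b->len + SAMPLE_BYTES) > BATCH_MAX_BYTES;
}
```

With these assumed sizes, 20 samples fit in one notification (2 + 20 x 12 = 242 bytes). If the Central negotiates a smaller MTU or data length, `BATCH_MAX_BYTES` would need to shrink accordingly.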
Note: BLE notifications are typically used to achieve high data throughput. However, at very high data rates, the BLE standard does not guarantee that all notifications are delivered to the application. This needs to be handled at application level (for example, with sequence numbers, application-level ACKs, or loss-tolerant encoding).
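A minimal sketch of the sequence-number idea on the receiving side, assuming each notification carries a 16-bit counter that the sender increments by one per packet. `seq_tracker_t` and `seq_update` are hypothetical names, and the sketch assumes packets arrive in order and that fewer than 65536 are lost in a row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t expected; /* sequence number we expect next */
    uint32_t lost;     /* running total of missing packets */
    bool     started;  /* false until the first packet is seen */
} seq_tracker_t;

/* Feed the sequence number of each received packet.
 * Returns how many packets were missed just before this one. */
static uint16_t seq_update(seq_tracker_t *t, uint16_t seq)
{
    assert(t != NULL);
    uint16_t gap = 0u;
    if (t->started) {
        /* Unsigned subtraction modulo 2^16 handles counter wrap-around. */
        gap = (uint16_t)(seq - t->expected);
        t->lost += gap;
    }
    t->started = true;
    t->expected = (uint16_t)(seq + 1u);
    return gap;
}
```

The receiver can then decide what to do with a non-zero gap: log it, interpolate, or request a resend over a separate characteristic.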
**BLE_SerialPort_Server** is one of the simplest applications that can be used as a starting point, including for streaming use cases.
**BLE_Skeleton** is not a BLE application example but just an "almost empty" project, since it does not contain any application code using BLE.
The recommended approach depends on whether the code needs to be more optimized or easier to develop and maintain.
- **HAL** functions make using the peripherals easier and more portable but require more Flash space and add some overhead.
- **LL** APIs may sometimes be required to reach maximum performance and minimize code size, at the cost of more detailed low‑level configuration.
FreeRTOS typically increases Flash and RAM usage. It may also have a minor negative impact on CPU load due to context switching. I would not recommend using it if the primary goal is to maximize efficiency of CPU and memory usage and the application remains relatively simple.
I hope this answers your questions. If so, please close this post by clicking the "Accept as Solution" button on my reply. This will help other members of the community find this response more quickly.
2026-03-30 3:18 AM
@Imen.D wrote:
- **Connection interval**: it can be tuned depending on the required latency.
@vkosuri But remember that it is the Central which makes the decision on what Connection Interval to actually use - it might not observe the value requested by the Device.
This is particularly true of phones, tablets, etc ...
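Because the interval actually in use may differ from the one requested, it can help to size the per-event workload from the negotiated value rather than the requested one. The sketch below is only an illustration: it converts to the 1.25 ms units used for connection intervals (valid range 6 to 3200, i.e. 7.5 ms to 4 s) and estimates how many bytes and packets must leave per connection event to keep up with a given sensor data rate. The function names and the example figures are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CONN_INTERVAL_MIN_UNITS    6u /* 7.5 ms */
#define CONN_INTERVAL_MAX_UNITS 3200u /* 4 s */

/* Convert microseconds to 1.25 ms units, rounding down and clamping
 * to the range allowed for connection intervals. */
static uint16_t conn_interval_units_from_us(uint32_t us)
{
    uint32_t units = us / 1250u;
    if (units < CONN_INTERVAL_MIN_UNITS) {
        units = CONN_INTERVAL_MIN_UNITS;
    }
    if (units > CONN_INTERVAL_MAX_UNITS) {
        units = CONN_INTERVAL_MAX_UNITS;
    }
    return (uint16_t)units;
}

/* Bytes that must be sent per connection event (rounded up) so the
 * link keeps pace with a source producing bytes_per_second. */
static uint32_t bytes_per_event(uint32_t bytes_per_second,
                                uint16_t interval_units)
{
    assert(interval_units >= CONN_INTERVAL_MIN_UNITS);
    assert(interval_units <= CONN_INTERVAL_MAX_UNITS);
    uint64_t byte_us = (uint64_t)bytes_per_second * interval_units * 1250u;
    return (uint32_t)((byte_us + 999999u) / 1000000u);
}

/* Notifications needed per event for that many bytes (rounded up). */
static uint32_t packets_per_event(uint32_t bytes_needed,
                                  uint16_t value_bytes_per_packet)
{
    assert(value_bytes_per_packet > 0u);
    return (bytes_needed + value_bytes_per_packet - 1u)
           / value_bytes_per_packet;
}
```

For example, a hypothetical 20 kB/s stream on a 30 ms interval (24 units) needs 600 bytes per event, or 3 notifications of up to 244 bytes each. If a phone imposes a longer interval, the same calculation shows how many more packets per event the link has to sustain.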
2026-03-30 3:24 AM
@vkosuri wrote:
Recommended approach:
HAL only?
HAL + LL mix (e.g., LL for SPI/DMA)?
It has been said that "premature optimisation is a root of all kinds of evils".
So probably the best approach is to start with HAL, and see what performance you get:
2026-03-31 10:43 AM
Hi @vkosuri
When your questions are answered, please close this topic by clicking "Accept as Solution". This will help other users find that answer faster.
If you still have questions or issues, don't hesitate to return to the Community with a new thread, or to continue this one if they relate to the same initial topic.