2026-05-06 8:16 AM - edited 2026-05-06 8:24 AM
Dear all,
I think I have discovered a bug in the CMSIS headers supplied for the STM32H7RSx and STM32H7x MCUs.
In "Drivers/CMSIS/Device/ST/STM32H7RSxx/Include/stm32h7rsxx.h", there are definitions for ATOMIC_SET_BIT() and ATOMIC_CLEAR_BIT() among others, see https://github.com/STMicroelectronics/cmsis-device-h7rs/blob/7e6e213ddc397c76622a0aca2f623ccc3b34c010/Include/stm32h7rsxx.h#L162:
/* Use of CMSIS compiler intrinsics for register exclusive access */
/* Atomic 32-bit register access macro to set one or several bits */
#define ATOMIC_SET_BIT(REG, BIT) \
  do { \
    uint32_t val; \
    do { \
      val = __LDREXW((__IO uint32_t *)&(REG)) | (BIT); \
    } while ((__STREXW(val,(__IO uint32_t *)&(REG))) != 0U); \
  } while(0)

The "register" in the description and the '__IO' qualifier in the code hint that this can be used for peripheral registers. This suspicion is correct, e.g., see https://github.com/STMicroelectronics/stm32h7rsxx-hal-driver/blob/1bde483bc7ab4883c2ef643e5a16ad6b303a631f/Src/stm32h7rsxx_hal_uart.c#L4214:
static void UART_TxISR_8BIT(UART_HandleTypeDef *huart)
{
  /* Check that a Tx process is ongoing */
  if (huart->gState == HAL_UART_STATE_BUSY_TX)
  {
    if (huart->TxXferCount == 0U)
    {
      /* Disable the UART Transmit Data Register Empty Interrupt */
      ATOMIC_CLEAR_BIT(huart->Instance->CR1, USART_CR1_TXEIE_TXFNFIE);

Note that the ATOMIC_CLEAR_BIT() implementation, like ATOMIC_SET_BIT() above, uses the LDREX instruction.
STM32H7RSx and STM32H7x MCUs host ARM Cortex M7 cores, which follow the ARMv7-M architecture.
The ARMv7-M Architecture Reference Manual (https://developer.arm.com/documentation/ddi0403/d/Application-Level-Architecture/ARM-Architecture-Memory-Model/Synchronization-and-semaphores/Load-Exclusive-and-Store-Exclusive-usage-restrictions) states:
"LDREX and STREX operations must be performed only on memory with the Normal memory attribute."
Most code examples for STM32H7RSx and STM32H7x (e.g., https://github.com/STMicroelectronics/STM32CubeH7RS/blob/4891e67739a01faea3c35d7a9bdccea6970266fd/Projects/STM32H7S78-DK/Examples/GPIO/GPIO_IOToggle/Appli/Src/main.c#L171) don't define memory properties for the peripherals region (0x4000_0000 .. 0x5FFF_FFFF for STM32H7RSx) with an explicit MPU region. Instead, they call HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT), which enables the implicit MPU region -1 for privileged code. Privileged code therefore accesses the peripherals with the default memory properties, which can be found in the MCU's reference manual. For the peripherals region, that is the Device memory attribute.
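To make the default-map argument concrete, here is a minimal sketch that classifies an address by the architectural ARMv7-M default memory map. The helper name `default_mem_type` and the enum are hypothetical, and the sketch deliberately ignores any explicit MPU regions, which take priority over the default map when they cover an address.

```c
#include <stdint.h>

/* Memory types of the ARMv7-M default (background) memory map. */
typedef enum {
    MEM_NORMAL,
    MEM_DEVICE,
    MEM_STRONGLY_ORDERED
} mem_type_t;

/* Hypothetical helper: classify an address by the architectural default map.
 * Explicit MPU regions are NOT considered here. */
static mem_type_t default_mem_type(uint32_t addr)
{
    if (addr < 0x40000000u) return MEM_NORMAL;            /* Code and SRAM        */
    if (addr < 0x60000000u) return MEM_DEVICE;            /* Peripheral           */
    if (addr < 0xA0000000u) return MEM_NORMAL;            /* External RAM         */
    if (addr < 0xE0000000u) return MEM_DEVICE;            /* External device      */
    if (addr < 0xE0100000u) return MEM_STRONGLY_ORDERED;  /* Private peripheral bus */
    return MEM_DEVICE;                                    /* Vendor system region */
}
```

Under this model, `default_mem_type(0x40004C00u)` (UART4 CR1 on the STM32H7S3) returns `MEM_DEVICE`, so an LDREX on that address targets memory that is not Normal.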
Therefore, I conclude that this violates the ARM architecture reference manual. This is supported by this post (https://community.st.com/t5/stm32-mcus-products/ldrex-instruction-on-the-stm32h755-when-mpu-activated-causes/m-p/174176/highlight/true#M36012) and by an ST employee in the accepted answer of that same topic:
"We don't implement global monitor on STM32H7. We recommend to use the HW semaphore for synchronization."
I happened to run into this using an MPU config that defines an explicit MPU region including the peripherals region with the Device memory attribute set: the target froze on a "LDREX [0x4000_4C00, #0]" instruction (UART4 CR1 on STM32H7S3). Normal debugger access was not possible anymore (I had to use HW instruction tracing to figure out where it hung). The corresponding code for that instruction was in "stm32h7rsxx_hal_uart.c" in UART_TxISR_8BIT():
  /* Disable the UART Transmit Data Register Empty Interrupt */
  ATOMIC_CLEAR_BIT(huart->Instance->CR1, USART_CR1_TXEIE_TXFNFIE);

Interestingly, this freeze only happened after two days of running without problems. When I halt the MCU at a random moment while it is not frozen, put a breakpoint on that instruction and continue, the breakpoint hits and I can continue without encountering any freeze. So it seems to freeze only very rarely at that point.
Can someone from ST comment on this and confirm whether this is indeed a violation/bug, and if so, what the suggested short-term and long-term fixes are?
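For reference, one commonly used way to avoid exclusive accesses on peripheral registers is to do the read-modify-write with interrupts masked. The sketch below is an assumption on my side, not a fix endorsed by ST: the names `irq_save`, `irq_restore`, `irqsafe_set_bits` and `irqsafe_clear_bits` are hypothetical, and the host branch only models PRIMASK so the code compiles off-target. It protects against interrupts on the same core only, not against DMA or another bus master writing the same register.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7EM__)
#include "stm32h7rsxx.h"  /* CMSIS: __get_PRIMASK(), __set_PRIMASK(), __disable_irq() */
#else
/* Host-side stand-in for PRIMASK; a model only, for off-target builds. */
static uint32_t model_primask;
#endif

/* Mask interrupts and return the previous PRIMASK value. */
static inline uint32_t irq_save(void)
{
#if defined(__ARM_ARCH_7EM__)
    uint32_t primask = __get_PRIMASK();
    __disable_irq();
    return primask;
#else
    uint32_t primask = model_primask;
    model_primask = 1u;
    return primask;
#endif
}

/* Restore PRIMASK, so nested use keeps interrupts masked if they already were. */
static inline void irq_restore(uint32_t primask)
{
#if defined(__ARM_ARCH_7EM__)
    __set_PRIMASK(primask);
#else
    model_primask = primask;
#endif
}

/* Set bits with a plain load/store pair inside a short critical section. */
static inline void irqsafe_set_bits(volatile uint32_t *reg, uint32_t bits)
{
    uint32_t primask = irq_save();
    *reg |= bits;
    irq_restore(primask);
}

/* Clear bits with a plain load/store pair inside a short critical section. */
static inline void irqsafe_clear_bits(volatile uint32_t *reg, uint32_t bits)
{
    uint32_t primask = irq_save();
    *reg &= ~bits;
    irq_restore(primask);
}
```

On target, a call would look like `irqsafe_clear_bits(&huart->Instance->CR1, USART_CR1_TXEIE_TXFNFIE);`. The trade-off is a few cycles of added interrupt latency in exchange for not depending on exclusive-monitor behaviour for Device memory.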
2026-05-06 8:57 AM
Regardless of whatever is said online, if you test it, you will find that STREX/LDREX work as intended when accessing peripheral memory, including USART registers.
If you are concerned about functionality, there is no bug to address.
2026-05-06 9:12 AM
This has already been discussed, IIRC when this HAL code was introduced in the STM32F4 library. STM32H7R/S is a different architecture. If you see problems like freezing, maybe this deserves a second look.
> this freeze only happened after two days of running without problems
So the freeze only occurred once, or you can repro it within ~ 2 days?
2026-05-07 12:04 AM
Yes, I can consistently reproduce the freeze after ~2 days.
2026-05-07 12:06 AM
I know many occasions where people said "I tested, it works" while they're actually in undefined behavior according to the reference manuals. One day, they regret it. That day seems to be here now.
2026-05-07 6:57 AM - edited 2026-05-07 7:00 AM
Can you put together a compact example that reproduces the freeze on an STM32H7S eval board, or another H7 Nucleo board without external memories (H72/73, H7A/B...)? Then we could escalate this to ST support.
2026-05-08 1:32 AM
Dear Pavel,
So far I haven't managed to do that. This was first noticed on a large code base, and only recently was I able to get an instruction trace of it. But now that I have an idea what the problem might be, I will definitely give this another attempt. Maybe I should create a program that intensively runs these LDREX/STREX instructions concurrently on a peripheral register (loop in main function + high frequency timer interrupt routine).
2026-05-09 4:28 PM
Perhaps the following contrived scenario can result in a "live freeze":
- Unhandled NVIC interrupt occurs spuriously, it triggers again immediately after returning from the handler.
- Interrupt breaks the exclusive monitor activated by LDREX, so the following STREX fails.
- This repeats indefinitely in the do ... while loop in ATOMIC_SET_BIT.
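The scenario above can be illustrated with a toy host-side model. Everything here is hypothetical (`model_t`, `model_ldrex`, `model_strex` and the bounded retry loop); it takes the premise as given that an interrupt lands between every load-exclusive and store-exclusive, and it says nothing about whether real hardware behaves that way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a local exclusive monitor; not real hardware behaviour. */
typedef struct {
    uint32_t reg;                /* the "register" being modified           */
    bool     monitor_armed;      /* set by load-exclusive                   */
    bool     irq_always_pending; /* models the re-triggering interrupt      */
} model_t;

/* Load-exclusive: read the value and arm the monitor. */
static uint32_t model_ldrex(model_t *m)
{
    m->monitor_armed = true;
    return m->reg;
}

/* Store-exclusive: returns 0 on success, 1 on failure, like STREX. */
static uint32_t model_strex(model_t *m, uint32_t val)
{
    if (!m->monitor_armed) {
        return 1u;
    }
    m->reg = val;
    m->monitor_armed = false;
    return 0u;
}

/* Point between LDREX and STREX where an interrupt may be taken;
 * taking it clears the monitor, as the scenario assumes. */
static void model_interrupt_window(model_t *m)
{
    if (m->irq_always_pending) {
        m->monitor_armed = false;
    }
}

/* Bounded variant of the retry loop: true if the store eventually succeeded. */
static bool model_set_bit_bounded(model_t *m, uint32_t bit, uint32_t max_tries)
{
    for (uint32_t i = 0; i < max_tries; ++i) {
        uint32_t val = model_ldrex(m) | bit;
        model_interrupt_window(m);
        if (model_strex(m, val) == 0u) {
            return true;
        }
    }
    return false; /* an unbounded do ... while would still be spinning here */
}
```

With `irq_always_pending` set, the bounded loop gives up and leaves the register unchanged, which is where the unbounded loop in ATOMIC_SET_BIT would spin forever. A retry limit like this is only a diagnostic aid for spotting the condition, not a fix.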
2026-05-10 8:15 AM
Thanks for bringing that up (which sort of is a biggy for my own source base). Stellar observation.
Now it would be interesting to see what would happen if you use the MPU to use a non-DEVICE memory type for the device region, like TEX = 000, C = 0, B = 0.
I do have the sneaking suspicion that the language there was written against the background of implementing the exclusive monitor in the cache logic only (similar to RISC-V).
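For anyone trying this experiment, a small decoder for the ARMv7-M MPU TEX/C/B attribute bits may help when picking a setting. This is a sketch based on my reading of the ARMv7-M attribute encoding; the function and enum names are hypothetical. Note that under this encoding TEX = 000, C = 0, B = 0 decodes to Strongly-ordered, which is not Device but is not Normal either; Normal non-cacheable would be TEX = 001, C = 0, B = 0.

```c
#include <stdint.h>

typedef enum {
    ATTR_STRONGLY_ORDERED,
    ATTR_DEVICE,
    ATTR_NORMAL,
    ATTR_OTHER   /* reserved or implementation defined encodings */
} attr_type_t;

/* Hypothetical helper: decode ARMv7-M MPU TEX[2:0], C and B into a memory type. */
static attr_type_t decode_tex_c_b(uint8_t tex, uint8_t c, uint8_t b)
{
    tex &= 0x7u;
    c &= 0x1u;
    b &= 0x1u;

    if (tex & 0x4u) {
        return ATTR_NORMAL;  /* TEX = 1BB: cached Normal memory */
    }
    switch (tex) {
    case 0u:
        if (c) {
            return ATTR_NORMAL;  /* write-through or write-back */
        }
        return b ? ATTR_DEVICE : ATTR_STRONGLY_ORDERED;
    case 1u:
        /* C=0,B=0: Normal non-cacheable; C=1,B=1: Normal write-back, write-allocate */
        return (c == b) ? ATTR_NORMAL : ATTR_OTHER;
    case 2u:
        return (c == 0u && b == 0u) ? ATTR_DEVICE : ATTR_OTHER;
    default:
        return ATTR_OTHER;   /* TEX = 011 is reserved */
    }
}
```

Whether mapping a peripheral region as Normal memory is safe is a separate question, since Normal memory permits speculative reads and access reordering that peripherals may not tolerate.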
2026-05-10 8:20 AM
ARMv8-M is even more restrictive:
"The only memory types for which it is architecturally guaranteed that a global exclusive monitor is implemented are:
• Inner Shareable, Inner Write-Back, Outer Write-Back Normal memory with Read allocation hint and Write allocation hint and not transient.
• Outer Shareable, Inner Write-Back, Outer Write-Back Normal memory with Read allocation hint and Write allocation hints and not transient."
This means that if you have PSRAM on OSPI/HSPI configured as Write-Through, you cannot rely on LDREX/STREX there.