CubeMX Generated ThreadSafe/stm32_lock.h - Potential Bug

BCowp.1
Associate II

I've burned a significant amount of time tracking down an issue, so I thought I would share in the hope of helping others and maybe improving ST's provided newlib locks.

(This is probably in the wrong place, please move it as required.)

I'm using the ThreadSafe files (stm32_lock.h & newlib_lock_glue.c) generated by CubeMX 6.6.1 to make newlib thread safe.

I'm utilising Strategy 4: "Allow lock usage from interrupts. Implemented using FreeRTOS locks." (because I'm using FreeRTOS but also the ST USB device library which by default calls malloc from an ISR).

Strategy 4 implements newlib's __retarget_lock_etc suite of hooks to lock critical sections inside newlib.

To test how robust this solution is, I created 3 (the exact number isn't critical, but I had 3 LEDs for debugging...) threads which all repeatedly malloc() and free() in infinite loops.
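Something along these lines is enough to reproduce it (an illustrative sketch only, not my exact code; the allocation size, stack depth and priorities are arbitrary):

#include <stdlib.h>
#include "FreeRTOS.h"
#include "task.h"

/* Stress task: hammer the newlib heap so the __retarget_lock hooks contend. */
static void MallocStressTask(void *argument)
{
  (void)argument;
  for (;;)
  {
    void *p = malloc(64); /* routed through newlib and therefore through stm32_lock */
    if (p != NULL)
    {
      free(p);
    }
  }
}

/* Create a few competing instances during system start-up. */
void StartMallocStress(void)
{
  for (int i = 0; i < 3; i++)
  {
    xTaskCreate(MallocStressTask, "mstress", 256, NULL, tskIDLE_PRIORITY + 1, NULL);
  }
}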

To my dismay it would not be long before a crash/instability occurred.

Often the particular "canary" to be killed would be the STM32_LOCK_ASSERT_VALID_NESTING_LEVEL assertion inside stm32_lock_acquire() or stm32_lock_release().

I tracked the issue down to corrupted/illogical values in the contents of the lock (LockingData_t), in particular the nesting level itself.

/**
  * @brief Acquire STM32 lock
  * @param lock The lock to acquire
  */
static inline void stm32_lock_acquire(LockingData_t *lock)
{
  STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
  STM32_LOCK_ASSERT_VALID_NESTING_LEVEL(lock);
  lock->basepri[lock->nesting_level++] = taskENTER_CRITICAL_FROM_ISR();
}

It turns out (upon inspecting the assembly) that GCC decided to carry out the lock->nesting_level++ increment BEFORE disabling interrupts (i.e. before setting the BASEPRI interrupt mask in my case).

Therefore it was possible for multiple contending threads to each increment lock->nesting_level in their call to stm32_lock_acquire() before the leading thread would eventually turn off interrupts, by which point the nesting_level value would be higher than it logically should be. Shenanigans soon follow.

The incrementing of nesting_level should be inside the critical region, but in my case GCC had other ideas.

The fix for me was simple:

/**
  * @brief Acquire STM32 lock
  * @param lock The lock to acquire
  */
static inline void stm32_lock_acquire(LockingData_t *lock)
{
  STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
  STM32_LOCK_ASSERT_VALID_NESTING_LEVEL(lock);
  lock->basepri[lock->nesting_level] = taskENTER_CRITICAL_FROM_ISR();
  lock->nesting_level++;
}

This convinced the compiler to increment inside the critical region, with interrupts safely off.

I would recommend ST make this change to their released code.

4 REPLIES
Piranha
Chief II

The reported bug is a consequence of the developer not understanding how the C language actually works. The reason is that the assignment operator = is not a sequence point, so the compiler is free to perform the nesting_level++ side effect before the function call on the right-hand side of the assignment. Here is a discussion of why that is so.

Apart from the already reported bug, the approach for strategy 4 is also somewhat broken by design. Why is there an array which saves multiple BASEPRI values? Except for the first taskENTER_CRITICAL_FROM_ISR() call, all of the nested calls will return the same configMAX_SYSCALL_INTERRUPT_PRIORITY constant value anyway. Save it only after the first call and restore it with a taskEXIT_CRITICAL_FROM_ISR() call only when the nesting level counter has reached zero again. When this is implemented, there is no need to limit the nesting levels to two and the limit can be raised to UINT8_MAX. It's the same logic which is implemented for strategy 2.

In addition, the code can use FreeRTOS's portASSERT_IF_INTERRUPT_PRIORITY_INVALID() macro to assert on calls from illegal interrupts with priority levels above configMAX_SYSCALL_INTERRUPT_PRIORITY.

As a minor memory usage optimization, the BASEPRI values can also be safely stored in uint8_t variables, because there isn't a single STM32 microcontroller, or any Cortex-M core at all, that uses more than the 8 LSBs of that register.
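Putting those three points together, a minimal sketch of how strategy 4 could look (untested; it assumes LockingData_t is reduced to a single saved uint8_t BASEPRI plus a uint8_t counter, reuses the existing STM32_LOCK_* helper macros, and relies on taskENTER_CRITICAL_FROM_ISR() returning the previous BASEPRI on the Cortex-M ports):

#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

typedef struct
{
  uint8_t basepri; /* BASEPRI of the outermost caller, saved once */
  uint8_t counter; /* nesting counter, 0 = lock not held */
} LockingData_t;

static inline void stm32_lock_acquire(LockingData_t *lock)
{
  portASSERT_IF_INTERRUPT_PRIORITY_INVALID(); /* catch calls from interrupts above configMAX_SYSCALL_INTERRUPT_PRIORITY */
  UBaseType_t basepri = taskENTER_CRITICAL_FROM_ISR(); /* mask interrupts BEFORE touching the lock data */
  STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
  if (lock->counter == 0)
  {
    lock->basepri = (uint8_t)basepri; /* only the outermost BASEPRI needs saving */
  }
  else if (lock->counter == UINT8_MAX)
  {
    STM32_LOCK_BLOCK();
  }
  lock->counter++;
}

static inline void stm32_lock_release(LockingData_t *lock)
{
  STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
  if (lock->counter == 0)
  {
    STM32_LOCK_BLOCK();
  }
  lock->counter--;
  if (lock->counter == 0)
  {
    taskEXIT_CRITICAL_FROM_ISR(lock->basepri); /* restore interrupts only when fully unnested */
  }
}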

Piranha
Chief II

And now let's take a look at the code for strategy 2.

 

static inline void stm32_lock_acquire(LockingData_t *lock)
{
  uint8_t flag = (uint8_t)(__get_PRIMASK() & 0x1); /* PRIMASK.PM */
  __disable_irq();
  __DSB();
  __ISB();
  STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
  if (lock->counter == 0)
  {
    lock->flag = flag;
  }
  else if (lock->counter == UINT8_MAX)
  {
    STM32_LOCK_BLOCK();
  }
  lock->counter++;
}

There is no need for the & 0x1 masking of the PRIMASK value, because that register has just that single bit in hardware and CMSIS documents that __get_PRIMASK() returns 0 or 1 anyway. Also, there is absolutely no need for any barrier instructions after __disable_irq(). This is documented very clearly in "Application Note 321 ARM Cortex-M Programming Guide to Memory Barrier Instructions", section "4.8 Disabling interrupts using CPS and MSR instructions".
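A minimal sketch of the acquire function with both simplifications applied (untested; the structure and the existing LockingData_t fields are otherwise unchanged):

static inline void stm32_lock_acquire(LockingData_t *lock)
{
  uint8_t primask = (uint8_t)__get_PRIMASK(); /* PRIMASK has only one implemented bit */
  __disable_irq();                            /* takes effect immediately, no barriers required */
  STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
  if (lock->counter == 0)
  {
    lock->flag = primask;
  }
  else if (lock->counter == UINT8_MAX)
  {
    STM32_LOCK_BLOCK();
  }
  lock->counter++;
}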

 

static inline void stm32_lock_release(LockingData_t *lock)
{
  STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
  if (lock->counter == 0)
  {
    STM32_LOCK_BLOCK();
  }
  lock->counter--;
  if (lock->counter == 0 && lock->flag == 0)
  {
    __enable_irq();
  }
}

The conditional interrupt enabling with && lock->flag == 0 and __enable_irq() is inefficient. Instead, the same should be done like this:

  if (lock->counter == 0)
  {
    __set_PRIMASK(lock->flag);
  }

Here is a discussion with more details on the same topic.
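For completeness, the whole release function with that change applied would look like this (untested sketch):

static inline void stm32_lock_release(LockingData_t *lock)
{
  STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
  if (lock->counter == 0)
  {
    STM32_LOCK_BLOCK();
  }
  lock->counter--;
  if (lock->counter == 0)
  {
    __set_PRIMASK(lock->flag); /* restore the caller's original PRIMASK state unconditionally */
  }
}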

Piranha
Chief II

And an even better question for ST is why that "ThreadSafe" folder with its files is available only through CubeMX. Why isn't it available as a normal software component in the firmware packages and on GitHub in the "Utilities" directory?

unturned3
Associate II

I just ran into this issue... it seems it still hasn't been fixed as of today (CubeMX version 6.11.1).

Can someone from ST kindly take a look?