cancel
Showing results for 
Search instead for 
Did you mean: 

[BUG] STM32 HAL driver lock mechanism is not interrupt safe

Piranha
Chief II

STM32 HAL driver library is full of flawed and sub-optimal constructs. The most common one, which impacts almost all drivers, is the lock mechanism. It's a bad and limiting design and getting rid of it requires a major rewrite, but the worst fact is that it's not even interrupt safe and therefore doesn't provide locking for which it was introduced. The current __HAL_LOCK() (reformatted) code looks like this:

#define __HAL_LOCK(__HANDLE__)             \
do{                                        \
	if((__HANDLE__)->Lock == HAL_LOCKED)   \
	{                                      \
		return HAL_BUSY;                   \
	}                                      \
	else                                   \
	{                                      \
		(__HANDLE__)->Lock = HAL_LOCKED;   \
	}                                      \
}while (0U)

Between testing and setting the ->Lock an interrupt can happen and also test and set the ->Lock. Therefore both - main thread and interrupt - will continue execution as if the object was unlocked and the interrupt will unlock it before the main thread has completed it's "locked" part, which makes it even more prone to next interrupt calls. The same will happen when higher priority interrupt interrupts lower priority interrupt. Additionally the ->Lock variable is not marked as volatile and therefore is prone to reorder by compiler optimization.

The proposed fix is simple and requires adding of only a few lines of code in stm32XXxx_hal_def.h files for all STM32 series:

#define __HAL_LOCK(__HANDLE__)                \
do {                                          \
	uint32_t rPriMask = __get_PRIMASK();      \
	__disable_irq();                          \
	if ((__HANDLE__)->Lock == HAL_UNLOCKED) { \
		(__HANDLE__)->Lock = HAL_LOCKED;      \
		__set_PRIMASK(rPriMask);              \
	} else {                                  \
		__set_PRIMASK(rPriMask);              \
		return HAL_BUSY;                      \
	}                                         \
} while (0)
 
typedef volatile enum {
	HAL_UNLOCKED = 0x00U,
	HAL_LOCKED   = 0x01U
} HAL_LockTypeDef;

Thus this will make that bad construct at least interrupt safe and actually provide locking as was intended.

Note that __HAL_UNLOCK() code doesn't need modifications as it already is atomic.

17 REPLIES 17

That's a very good idea with a #define to adjust the behavior of __HAL_LOCK/__HAL_UNLOCK mechanism.

Several levels from aborting (current behavior), waiting/blocking, complete disable etc. would be conceivable.

It's on the developer to implement a correct application structure and avoiding concurrent access to same hardware.

There are several synchronisation mechanisms (e.g. flags and semaphores) to deserialize hardware access in one task.

__HAL_LOCK/__HAL_UNLOCK should help to find the problem in application structure but should not try to solve it automatically.

MMAST.1
Associate II

Hello,

an overview on the lock mechanism coming soon in STM32Cube Package :

1- Critical section : The critical section mechanism is based on the use of the stack and the restore primask mechanism instead of enabling IRQs on the Exit CS phase.

Typical use of this method is illustrated in the pseudo code below:

HAL_StatusTypeDef HAL_PPP_Process (PPP_HandleTypeDef *hppp, __PARAMS__)

{

__HAL_ENTER_CRITICAL_SECTION();

 /* Protected resources */

 __HAL_EXIT_CRITICAL_SECTION();

}

The Enter/Exit CS functions are implemented macros in the stm32ynxx_hal_def.h file as follows for both bare metal and RTOS cases:

#if (USE_RTOS == 1)

 #define __HAL_ENTER_CRITICAL_SECTION() OsEnterCriticalSection()

 #define __HAL_EXIT_CRITICAL_SECTION()  OsExitCriticalSection()

#else

#define __HAL_ENTER_CRITICAL_SECTION()     \

   uint32_t PriMsk;                       \

   PriMsk = __get_PRIMASK();              \

     __set_PRIMASK(1);                    \

#define __HAL_EXIT_CRITICAL_SECTION()      \

   __set_PRIMASK(PriMsk);                 \

#endif

2- Lock mechanism : ,The lock object is an entity allocated in the peripherals drivers handles and defined for each standalone process, for full duplex processes with simultaneous transfer, 2 lock objects shall be used. For peripheral with sub instances (Channels, Endpoints….etc) a lock object per sub-instance shall be defined. 

a lock macro is used before starting any process as follows :

HAL_StatusTypeDef HAL_PPP_Process (PPP_HandleTypeDef *hppp, __PARAMS__, uint32_t Timeout)

{

 uint32_t tickstart = HAL_GetTick();

 if(__ARGS__ == WRONG_PARAMS)

 {

   hppp->ErrorCode = HAL_PPP_ERROR_PARAM;

   return HAL_ERROR;

 }

 if(HAL_Lock (hppp->iLock) == HAL_LOCKED)

 {

   return HAL_BUSY;

 }

(...)

Lock methods for ARMv7/ ARMv8

/**

 * @brief  Attempts to acquire the lock.

 * @param  lock   Pointer to variable used for the lock.

* @details This in an interrupt safe function that can be used as a mutex.

           The lock variable shall remain in scope until the lock is released.

           Will not block if another thread has acquired the lock.

 * @returns HAL_LOCKED if everything successful, HAL_UNLOCK if lock is taken.

 */

__STATIC_INLINE HAL_LockStateTypeDef HAL_Lock(__IO uint32_t *lock)

{

   do {

       /* Return if the lock is taken by a different thread */

       if(__LDREXW(lock) != HAL_UNLOCKED) {

           return HAL_LOCKED;

       }

       /* Attempt to take the lock */

   } while(__STREXW(HAL_LOCKED, lock) != 0);

   /* Do not start any other memory access until memory barrier is complete */

   __DMB();

   return HAL_UNLOCKED;

}

/**

 * @brief  Free the given lock.

 * @param  lock   Pointer to variable used for the lock.

 */

__STATIC_INLINE void HAL_UnLock(uint32_t *lock)

{

   /* Ensure memory operations complete before releasing lock*/

   __DMB();

   *lock = HAL_UNLOCKED;

}

Lock methods for ARMv6

/**

 * @brief  Attempts to acquire the lock.

 * @param  lock   Pointer to variable used for the lock.

 * @details This in an interrupt safe function that can be used as a mutex.

           The lock variable shall remain in scope until the lock is released.

           Will not block if another thread has acquired the lock.

 * @ returns HAL_LOCKED if everything successful, HAL_UNLOCK if lock is taken.

 */

__STATIC_INLINE HAL_LockStateTypeDef HAL_Lock(__IO uint32_t *lock)

{

 uint32_t oldvalue;

 __HAL_SAVE_PRIMASK();

 __HAL_ENTER_CRITICAL_SECTION();

 oldvalue = *lock;

 if(*lock == HAL_UNLOCKED) 

 {                              

   *lock = HAL_LOCKED;                            

 }

 __HAL_EXIT_CRITICAL_SECTION();   

 return (oldvalue);  

}

/**

 * @brief  Free the given lock.

 * @param  lock   Pointer to variable used for the lock.

 */

__STATIC_INLINE void HAL_UnLock(__IO uint32_t *lock)

{

   *lock = HAL_UNLOCKED;

}

the above implementations are used in non RTOS env. when RTOS is used (USE_RTOS), the lock is simply a semaphore take (unlock = semaphore release)

this way, when a process is locked in RTOS env. the current process is pended till the semaphore is freed, then the process resume once the semaphore is released.

Rds

Hi @MMAST.1​ 

Thanks for posting the update and the opportunity to review It here.

There are some areas needing some more work. Please accept my comments constructively….

This is a summary of the LDREX and STREX instructions:

  1. The LDREX syntax (simplified): LDREX Rt, [Rn], performs these steps:
    1. Loads the data (Rt) from memory address (Rn)
    2. Set the exclusive access tag
  2. The exclusive access tag is cleared by:
    1. Exception entry or exit, or
    2. Executing STREX or CLREX
  3. The STREX syntax (simplified): STREX Rd, Rt, [Rn], performs these steps:
    1. If the exclusive access tag is set
      1. Store the data (Rt) to memory address (Rn)
      2. Assign status (Rd) = 0
    2. Else
      1. Assign status (Rd) = 1
    3. Clear the exclusive access tag
  4. There is an exclusive access tag per address for memories with a Shared TLB attribute and a single exclusive access tag for all other memories.

For the MCUs equipped with the LDREX and STREX instructions, this is what your HAL_Lock function does:

  1. Loop
    1. Read value from lock’s memory address
    2. Set the exclusive access tag
    3. If the value != HAL_UNLOCKED 
      1. Return HAL_LOCKED
    4. If the exclusive access tag is set, 
      1. Write HAL_LOCKED to lock’s memory address
      2. Status = 0
    5. Else
      1. Status = 1
    6. Clear the exclusive access tag
    7. Break if status == 0
  2. Data memory barrier

These are its outcomes:

  1. If lock was already HAL_LOCKED or the caller’s thread is pre-empted and another thread takes the lock, then return HAL_LOCKED to indicate a different thread has the lock, i.e. the peripheral’s busy.
  2. Otherwise, loop until it’s atomically changed lock from HAL_UNLOCKED to HAL_LOCKED, then return HAL_UNLOCKED to indicate it’s successfully taken the lock. Predominant case is the thread is not pre-empted and so the loop is not taken.

PROBLEM #1. The HAL_Lock function’s detail description “The lock variable shall remain in scope until the lock is released�? is incorrect. It either obtains the lock for its thread or it detects another thread has it.

PROBLEM #2. The HAL_Lock function’s returns description is incorrect/inaccurate, and the HAL_UNLOCKED and HAL_LOCKED returns do not describe the function’s operation well and so a casual reader might incorrectly assume it is only reading the lock. It would read better if it returned HAL_LOCKED if it obtained the lock and HAL_BUSY otherwise.

PROBLEM #3. For the MCUs without LDREX and STREX instructions, the HAL_Lock function would execute faster and the code would be smaller if “if(*lock == HAL_UNLOCKED)�? were replaced with “if(oldvalue == HAL_UNLOCKED)�?. Remember *lock is volatile and the compiler has to load it from memory again. But you have already read it to oldvalue which could be a register, and would still be smaller code if it were local, and you have already entered a critical section.

REQUEST #1. Please add a method to turn off HAL’s locks.

I layer my apps so the calls of each peripheral drivers are single-threaded, or each direction is single-threaded if the peripheral supports simultaneous receive and transmit, and so my apps never see a busy error. If my app needed to output from more than one task, those tasks send to a task dedicated to the peripheral (or its output channel if it is duplex) where it is queued and started it as soon as the last output finishes or immediately if no output is in progress. Similar for receive, if my app needs to send received data to different tasks, a task dedicated to the peripheral would interrogate the data or check an application mode (with suitable protection) to determine where and forward it.

In summary, I design my apps to always work correctly.

Further, my company choose the smallest/cheapest part to do a job. So I want to save easy cycles.

Your method to disable HAL locks might be like this:

  1. In the HAL Settings pane of Cube’s Project Manager/Code Generator screen, add a check box labelled: “Enable Lock checking�? and default it Enabled.
  2. Add a macro assigned the state to the stm32h7xx_hal_conf.h file it generates.
  3. Refactor the HAL code to use the macro.

REQUEST #2. Please add a method to turn off HAL’s parameter checking. I’ve debugged. I’m accepting the MCU may be struck by a sub-atomic particles. I accept the risks. Please turn them off the same way as the HAL’s locks.

I do not have a good grasp why HAL locks are necessary. But clearly they are, else other developers would be asking for ways to turn them off too.

THOUGH #1. What does a task dedicated a peripheral or one of its channels look like? As example, this is one of my go-to methods for a dedicated task to handle a peripheral’s transmit channel:

  1. A call-based function enqueues the output. Queue may be OS or not. If not, mutex if protection is necessary. Dimensioned queue per throughput and burstiness. Block the caller or drop on abnormal queue full per requirements. Notify the task if no output is in progress, with critical section to avoid race.
  2. A transmit complete interrupt notifies the task.
  3. On notify, the task starts next output.

THOUGHT #2. Does HAL have locks only because we can’t engineer our apps properly?

If you develop apps with more than one thread accessing a peripheral or one of its channels, I’ll poke with some tongue-in-cheek questions…

  1. What is your go-to plan for its throwing BUSY? Just drop? I hope not.
  2. So you have decided BUSY is normal. You are designing your app for it. What are your go-to methods?
  3. Do you wait-retry until not BUSY? You should not do that in interrupt.
  4. Does the peripheral driver need to be called from interrupt? If that were true and it throws BUSY, do you pole and burn cycles there? Or, because it had to be started form interrupt, do you start a timer as delay and retry from the timer’s interrupt?
  5. Or as workaround, if BUSY is thrown in interrupt, do you work-around by notifying a task to retry? But if you did that, why do you not reduce your app’s paths of execution and make it only notify, and then the BUSY wouldn’t occur?

Post your thoughts.

Thanks a lot Alister, actually your feedback are more than appreciated. as I mentionned the listing I write is an overview of the update. we will take care of your feedback to improve the mechanism. thanks again

Piranha
Chief II

In addition to what Alister said...

#define __HAL_ENTER_CRITICAL_SECTION()     \
   uint32_t PriMsk;                       \

This will get you in a trouble if the lock/unlock will be necessary multiple times in a single function/block. Also it's not clear what __HAL_SAVE_PRIMASK(); does if __HAL_ENTER_CRITICAL_SECTION(); also does the same. Probably something like this should be introduced and put at the top of the function:

#define __HAL_DECLARE_CRITICAL_SECTION() uint32_t PriMsk

Or another simpler solution is to implement one global critical section nesting counter, as it's done in FreeRTOS, for example.

> when a process is locked in RTOS env. the current process is pended till the semaphore is freed

So for a non-RTOS environment HAL_Lock() would be non-blocking and returning BUSY, but for RTOS it would be blocking and not failing. That is inconsistent and leads to confusion and significantly different usage in each case. I mostly agree to Alister that the HAL_Lock() is unnecessary and damaging. And yes - there is no really a sane scenario for what to do when HAL_Lock() returns BUSY anyway. Managing access to a peripheral is a task for a higher platform layer code, not the driver.

The API that use lock can't be called from interrupt context, as lock is implemented with a RTOS semaphore.

A semaphore can be released from an IT but can't be taken (an IT can't be delayed).

The HAL_UART_DMAStop source code clearly explain this:

/* The Lock is not implemented on this API to allow the user application

   to call the HAL UART API under callbacks HAL_UART_TxCpltCallback() / HAL_UART_RxCpltCallback():

   when calling HAL_DMA_Abort() API the DMA TX/RX Transfer complete interrupt is generated

   and the correspond call back is executed HAL_UART_TxCpltCallback() / HAL_UART_RxCpltCallback()

   */

For example in UART HAL the following functions use lock:

HAL_UART_RegisterCallback

HAL_UART_UnRegisterCallback

HAL_UART_Transmit_IT

HAL_UART_Receive_IT

HAL_UART_Transmit_DMA

HAL_UART_Receive_DMA

HAL_UART_DMAPause

HAL_UART_DMAResume

HAL_LIN_SendBreak

HAL_MultiProcessor_EnterMuteMode

HAL_MultiProcessor_ExitMuteMode

HAL_HalfDuplex_EnableTransmitter

HAL_HalfDuplex_EnableReceiver

So most of the API is not usable from interrupt.

Perhaps the HAL API should report an error if such an API is called from an interrupt context

Some RTOS rises an assert when a forbiden API is used in interrupt context: The HAL could build on that.

MMAST.1
Associate II

Dear All,

Based on the different gathered feedbacks from this Forum and other feedbacks sources and the full analysis of the different calls to the HAL_Lock() and the issues mentioned regarding this topic shows that the __HAL_LOCK() and __HAL_UnLOCK() are not used always as "standard" lock mechanism for critical sections properly but rather as a special state machine to reject launching same HAL processes in several statements in the current HAL , thus the following updates have been introduced on the HAL to fix the issue related to this topic.

  1. Reject launching same HAL processes (ongoing):

Fix

  • Rely rather on the native state machines rather than the HAL_LOCK/HAL_UNLOCK

Example:

HAL_StatusTypeDef HAL_UART_Transmit_IT(UART_HandleTypeDef *huart, const uint8_t *pData, uint16_t Size)

{

   if (huart->gState == HAL_UART_STATE_READY)

   {  ...}

}

2- Protect changing the common state machine for several processes (Already implemented on uart and full deployment ongoing):

Fix: have a state machine per independent process

Example:

typedef struct __UART_HandleTypeDef

{

 (…)

 HAL_LockTypeDef          Lock;                   /*!< Locking object                    */

 __IO HAL_UART_StateTypeDef   gState;             /*!< UART state information related to global Handle management

                                                         and also related to Tx operations. This parameter

                                                         can be a value of @ref HAL_UART_StateTypeDef */

 __IO HAL_UART_StateTypeDef   RxState;            /*!< UART state information related to Rx operations. This

                                                         parameter can be a value of @ref HAL_UART_StateTypeDef */

} UART_HandleTypeDef;

3 - Protect checking and modifying the state machine by locking the check and set statement within lock mechanism based on __LDREXH / __STREXH (only series based on CM0 core come with an implementation around the enable/disable irq: (will be deployed on next HAL releases):

#define HAL_CHECK_AND_SET_STATE(__HANDLE__, __PPP_STATE_FIELD__, __PPP_CONDITIONAL_STATE__, __PPP_NEW_STATE__)  \

 do {                                                      \

  do{                                                     \

   /* Return HAL_BUSY if the status is not ready */                              \

   if (__LDREXW((__IO uint32_t *)&(__HANDLE__)->__PPP_STATE_FIELD__) != (uint32_t)(__PPP_CONDITIONAL_STATE__)) \

   {                                                     \

    return HAL_BUSY;                                             \

   }                                                     \

   /* if state is ready then attempt to change the state to the new one */                  \

  } while(__STREXW((uint32_t)(__PPP_NEW_STATE__), (__IO uint32_t *)&((__HANDLE__)->__PPP_STATE_FIELD__)) != 0); \

  /* Do not start any other memory access until memory barrier is complete */                 \

  __DMB();                                                   \

 }while(0)

4 - Protect common processes register update (Already implemented):

Fix:

Add new macros in the CMSIS device files for atomic bit and registers modifications (based on __LDREXH / __STREXH (only series based on CM0 core come with an implementation around the enable/disable irq)

§ ATOMIC_SET_BIT(REG, BIT)                           

§ ATOMIC_MODIFY_REG(REG, CLEARMSK, SETMASK)                        

§ ATOMIC_SETH_BIT(REG, BIT)                          

§ ATOMIC_CLEARH_BIT(REG, BIT)                        

§ ATOMIC_MODIFYH_REG(REG, CLEARMSK, SETMASK)                  

Example:

HAL_StatusTypeDef HAL_UART_Receive_DMA(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size)

{

 (…)

       /* Enable the UART Receiver Timeout Interrupt */

       ATOMIC_SET_BIT(huart->Instance->CR1, USART_CR1_RTOIE); => ATOMIC access for the USART_CR1_RTOIE bit

   (…)

}

Thanks and Regards

Maher

Thomas LB
Associate III

Hello @MMAST.1,

Can you share the yearly update on this topic ?

I'm still struggling with very rare bug on a module <-> mcu communication based on.
I'm using raw C, no OS, no multi-thread. 

TX (in Cube MX: Preemption Priority 2 UART and DMA):

HAL_UART_Transmit_DMA() wait for a flag set by

HAL_UART_TxCpltCallback()

 

RX: (in Cube MX: Preemption Priority 2 UART and DMA, DMA Circular, Overrun: Disable, DMA on RX Error: Enable)

HAL_UARTEx_ReceiveToIdle_DMA()

 

I also implemented a ErrorCallback like this:
void HAL_UART_ErrorCallback(UART_HandleTypeDef *huart)
{
HAL_UART_AbortReceive(&huart2);
HAL_UART_DeInit(&huart2);
// MX_USART2_UART_Init();
// Ringbuf_Init(); // restart HAL_UARTEx_ReceiveToIdle_DMA(&UART, aRXBufferUser, RxBuf_SIZE);
}
As you can see I first tried with to restart in the ErrorCallback but was still facing some very rare lock (my timeout (implemented so that if response is not received within x seconds, the module is turn off and will retry next loop) is not firing, my code just hang somewhere, so I decided to just try to stop is a clean way, benefit from the timeout (applicative driven using a custom counter decreased in SysTick_Handler()) and restart next loop.
At this point I don't know if this code is still prone to issue of lock and I'm not even able to tell if this is related to HAL_LOCK mechanism described here but at least I can tell that the code in my latest version (stm32l0xx_hal_def.f) is the same as the one said to be buggy in 2020

 

#define __HAL_LOCK(__HANDLE__)
	do{
		if((__HANDLE__)->Lock == HAL_LOCKED)
		{
		return HAL_BUSY;
		}
		else
		{
		(__HANDLE__)->Lock = HAL_LOCKED;
		}
    }while (0)

#define __HAL_UNLOCK(__HANDLE__)
	do{
		(__HANDLE__)->Lock = HAL_UNLOCKED;
	}while (0)

 

More detailed about my issue here

Thank you for any help