cancel
Showing results for 
Search instead for 
Did you mean: 

[BUG] STM32 HAL driver lock mechanism is not interrupt safe

Piranha
Chief II

STM32 HAL driver library is full of flawed and sub-optimal constructs. The most common one, which impacts almost all drivers, is the lock mechanism. It's a bad and limiting design and getting rid of it requires a major rewrite, but the worst fact is that it's not even interrupt safe and therefore doesn't provide locking for which it was introduced. The current __HAL_LOCK() (reformatted) code looks like this:

#define __HAL_LOCK(__HANDLE__)             \
do{                                        \
	if((__HANDLE__)->Lock == HAL_LOCKED)   \
	{                                      \
		return HAL_BUSY;                   \
	}                                      \
	else                                   \
	{                                      \
		(__HANDLE__)->Lock = HAL_LOCKED;   \
	}                                      \
}while (0U)

Between testing and setting the ->Lock an interrupt can happen and also test and set the ->Lock. Therefore both - main thread and interrupt - will continue execution as if the object was unlocked and the interrupt will unlock it before the main thread has completed it's "locked" part, which makes it even more prone to next interrupt calls. The same will happen when higher priority interrupt interrupts lower priority interrupt. Additionally the ->Lock variable is not marked as volatile and therefore is prone to reorder by compiler optimization.

The proposed fix is simple and requires adding of only a few lines of code in stm32XXxx_hal_def.h files for all STM32 series:

#define __HAL_LOCK(__HANDLE__)                \
do {                                          \
	uint32_t rPriMask = __get_PRIMASK();      \
	__disable_irq();                          \
	if ((__HANDLE__)->Lock == HAL_UNLOCKED) { \
		(__HANDLE__)->Lock = HAL_LOCKED;      \
		__set_PRIMASK(rPriMask);              \
	} else {                                  \
		__set_PRIMASK(rPriMask);              \
		return HAL_BUSY;                      \
	}                                         \
} while (0)
 
typedef volatile enum {
	HAL_UNLOCKED = 0x00U,
	HAL_LOCKED   = 0x01U
} HAL_LockTypeDef;

Thus this will make that bad construct at least interrupt safe and actually provide locking as was intended.

Note that __HAL_UNLOCK() code doesn't need modifications as it already is atomic.

17 REPLIES 17
SStor
Senior

I think __HAL_LOCK/__HAL_LOCK would be unnecessary if user calls HAL functions in the correct way (no concurrent access to same hardware ressource from multiple threads/ISR). And therefore all additional overhead in this function would be also unnecessary.

So __HAL_LOCK should never return HAL_BUSY because the calling HAL function will be aborted with this error and further following errors!

A loop would indicate the problem at the debugging immediately (causing a deadlock on concurrent access):

#define __HAL_LOCK(__HANDLE__)                     \
do{                                                \
	if((__HANDLE__)->Lock != HAL_LOCKED)       \
	{                                          \
		(__HANDLE__)->Lock = HAL_LOCKED;   \
		break;                             \
	}                                          \
}while (1U)
 

alister
Lead

This isn't the fix IMO.

__HAL_LOCK should not be used by interrupt.

Apps should be sensibly threaded, i.e. by either entirely by one thread, or (general case) if the driver/device has receive and transmit channels and receive uses interrupt, one thread does overall init and receive and the other does transmit.

ONLY if a critical section's necessary during de/re-init, it should be plainly written and visible in the code, not hidden in a macro.

MMAST.1
Associate II

Hello,

This point is already on the top of the improvement that will be introduced shortly in our HAL according to HAL evolution and improvement roadmap . Note that the Lock mechanism approach will be fully reworked to distinguish between Lock process and critical section (lock resources)

So actually, in the HAL we will apply three mechanisms :

  • State machine (Functional sequence)
  • Lock process (lock full operation, prevent spurious process overlap on the same hardware instance or sub instance)
  • critical section : prevent two or more ‘threads’ accessing the same data in parallel

Thus the old HAL_Lock/Unlock macros will be reworked and extended. Keep tuned!

Rds

Maher

Piranha
Chief II

The comments from @SStor​ and @alister​ rises another general question - which HAL functions are allowed to be called from interrupt context? Obviously the HAL_PPP_IRQHandler() is allowed because that is it's sole purpose, but what about other functions? The guys assume that no other function is (intended to be) allowed, but is that true? I can't find clear explanation anywhere. For example, let's look at UM1905. The "Figure 7: HAL driver model" at page 60 seems to show that HAL_PPP_Process() can be called from interrupt context. A bit lower is the text:

Basically, the HAL driver APIs are called from user files and optionally from interrupt handlers file when the APIs based on the DMA or the PPP peripheral dedicated interrupts are used.

This says that it's allowed but doesn't specify which ones. @MMAST.1​, someone from ST should clarify this here in a forum and also document it explicitly!

Piranha
Chief II

Related topics...

https://community.st.com/s/question/0D50X0000C8eyb9SQA/bug-the-hallockx-macro-needs-a-lock

This topic cites HAL code comment, which shows that HAL_UART_DMAStop() is intended to be called from interrupt context:

https://community.st.com/s/question/0D50X0000C3CcW6SQK/can-we-call-hallock-within-interrupt-

MMAST.1
Associate II

All HAL APIs can be called from ISR context except polling process ones like HAL_UART_Transmit as they are in polling model. This is said the HAL is built on a full process model (start --> End of operation) with an asynchronous notification in case of DMA/IT model thus those processes are protected through either state machine or lock mechanism while the process is busy. once a process has finished or forced to abort it is unlocked and other process on the same resources (Instances, Channel..etc) can take place. processes can be launched of course in parallel when they acts on different instances or sub-process (channel, endpoint, Tx/Rx transmission/reception...etc)

Rds

Maher

DOsbo
Senior

Thanks @MMAST.1​. I'm glad to hear this issue will be fixed soon. I noticed ST was working on it back in 2016, so I assume it's being rolled up with another major release.

https://community.st.com/s/question/0D50X00009XkeOGSAZ/questions-surrounding-hallock

Cheers

David

DOsbo
Senior

Thanks for the heads-up @Piranha​ .

The lock mechanism is a protection against incorrect implementation?

Don't lose sight of its purpose and that developers are asking too for a leaner/faster HAL.

Developers want means to disable locks and argument checking.

In Cube perhaps it'd be configured in the HAL Settings pane in Project Manager.

It's macro, e.g. HAL_DISABLE_CHECKS, default 0 (checks are on), would be added to stm32h7xx_hal_conf.h.