Does USB device code for STM32 really allocate dynamic memory from interrupts?

freddie_chopin · ‎2017-01-13

Posted on January 13, 2017 at 18:29

Today in one of my projects I've enabled a new feature of my RTOS (

) which ensures some functions are not used in interrupts. This checks - among others - for any use of mutexes from interrupts, as it is a logical error to do that (first of all - interrupts cannot ''block'' and additionally using mutexes in interrupts violates the concept of ''ownership'').

You may imagine how surprised I was when the critical error was raised even before the application would fully start... See this call stack:

It turns out that ST's code for USB device (CDC, audio, HID, DFU, MSC) in STM32 happily allocates dynamic memory in interrupts... The path is like this:

OTG_FS_IRQHandler() interrupt,
HAL_PCD_IRQHandler(),
HAL_PCD_SetupStageCallback(),
USBD_LL_SetupStage(),
USBD_StdDevReq(),
USBD_SetConfig(),
USBD_SetClassConfig(),
indirect call of a function stored in ''Init'' field of USBD_ClassTypeDef struct.

The ''Init'' pointer in

USBD_ClassTypeDef struct is initialized with an address of functions like

USBD_CDC_Init(), and these function - almost right at the beginning - call malloc() via USBD_malloc() macro...

As malloc() is generally _NOT_ reentrant, it has to be protected from concurrent access from multiple contexts/threads. If a real-time operating system is used, this protection is usually implemented with a mutex or pausing the scheduler (preventing context switches). In single threaded applications malloc() is usually not protected at all, because most of the people are reasonable enough and don't even consider using malloc() in interrupts. In both of these cases adding ST's USB device code sets the application on a path to certain failure if it actually uses dynamic allocation anywhere else:

when locking is done with mutexes in a multithreaded system - any attempt to block in interrupt handler (like locking a locked mutex used to serialize access to malloc()) is undefined behaviour and will most likely severely corrupt the system,
worse - as mutexes serializing access to malloc() are usually recursive, such call can lead to a typical race condition, causing heap corruption,
when locking is done by pausing the scheduler in a multithreaded system - interrupts are not affected by that locking mechanism, malloc() may be re-entered from two different contexts, leading to heap corruption,
in a single-threaded application with no locking - an allocation in interrupt during an allocation in main thread will lead to heap corruption.

This whole discussion obviously also applies to free(), which is called in interrupt context via at least 3 different code paths...

#malloc #memory-corruption #race-condition #heap #rtos #usb

Andrei Chichak · ‎2018-01-20

Posted on January 20, 2018 at 19:55

I spent a few weeks tracking this down as well.

I am integrating a USB CDC device into a system that uses RTEMS as the RTOS.

Once I tracked down the cause, malloc in an interrupt context, I checked with some of my friends that do embedded systems for medical devices, fitness trackers, consumer goods, and the sort, and the general consensus was 'THEY DID WHAT? NO don't ever use malloc in an interrupt routine. What were they thinking?'

To be sure that it wasn't just the unanimous opinion of 6 random embedded programming professionals, I went to the authors of RTEMS to understand what there problem was. They write:

It is definitely returning NULL because

you shouldn't malloc from an ISR. malloc() is a non-deterministic operation. I am

surprised that code was designed to do that.

My workaround came down to a few lines of code that avoided using malloc and free. I noticed that the malloc just allocated enough memory for a single

USBD_HandleTypeDef, the code used one instance of it, and then freed it when the USB disconnected. This could EASILY be accomplished by using a single global variable in my space:

USBD_HandleTypeDef

usbHandle;

Then, in usb_cdc.c, instead of doing the malloc:

pdev->

pClassData

= (

void

*) &usbHandle;

and instead of doing the free:

pdev->

pClassData

= NULL;

This will always work.

There are a few more damning issues with the original design. Even though malloc is called in an interrupt routine, the null result gives a return code of -1 to the caller of USBD_CDC_INIT, but it does not get checked or acted upon. The null pointer just gets used, even though the lower level routines tried to tell the callers that there was an error.

Finally, the various functions are set up to return an uint8_t but actually return a USBD_StatusTypeDef enum - sometimes. Enums are not uint8_ts, they are enums. If you want to return an enum, return an enum.

Once again, this code sucked up a lot of unbillable time.

Andrei

Pavel A. · ‎2018-01-21

Posted on January 21, 2018 at 16:01

In the specific case of ST device-side examples this allocation does not actually use malloc in Cube generated project (as in the CubeMX repository v, 1.9.0). Instead, it creates 'USBD_static_malloc' which just returns a static buffer. And it never can fail.

For one instance it is OK

But as general design issue, all this should be moved out of the interrupt handler to a 'task' (or whatever it's called in your OS).

- pa

Andrei Chichak · ‎2018-01-22

Posted on January 22, 2018 at 20:57

From the description in the app note, all that they are doing is allocating a total of 16 bytes.

I wasted a lot of time trying to second guess what they are doing for a total of 16 bytes.

Now I'm even more pissed off.

A

jschloss · ‎2018-12-29

This just broke my code too, after I turned on the thread safety locks in newlib (__malloc_lock) for freertos. This is very broken, malloc isn't threadsafe or reentrant on newlib by default! I hacked a static array for CDC driver like the earlier posters, and it seems to be working now, but this is a big deal. The code is not safe as-is.

Is there a better place to report bugs in the HAL drivers? Newlib's heap does not support being used like this.

pshankar · ‎2019-03-15

Thank you, I ran into the same issue after overloading my malloc statements to make it thread safe and ran into the same issue. Resorting to static memory allocation now,

Andrei Chichak · ‎2019-08-12

Hey, let's bring this up again.

I managed to shoot myself in the foot by using the method in UM1734 at face value.

UM1734 gives a static allocation of 4 uint32_t memory units (16 bytes). The device descriptor for a USB CDC, where this code is used commonly, needs 540 bytes at this time.

So there are a couple of problems here, the standard malloc function allocates N bytes, not N uint32_ts, so 4 times as much space is allocated than necessary. Next, UM1734 doesn't give enough background to understand the semantics of allocating just 4 units, so the unwary user (waves hand) will use the code as is.

So if you are using the USB middleware and have set it up to allocate memory in a static manner, please take another look at your USBD_static_malloc routine and the value of MAC_STATIC_ALLOC_SIZE and make sure that it is okay. (put a breakpoint in USB_static_malloc and check the value of size).

Sigh.

Andrei

waclawek.jan · ‎2019-08-12

https://community.st.com/s/question/0D50X0000BCNOz7SQH/why-does-the-hal-usb-cdc-driver-call-malloc-in-an-isr-context-again

anonymous.8 · ‎2019-08-12

On 2019-08-12 3:46 p.m., ST Community wrote:

AToth · ‎2021-08-10

MISRA C: avoid malloc at all cost.

USB configuration is known in advance so everything can be allocated during compilation.

Again, poor/bloated SW from ST.

Lix Paulian · ‎2021-08-10

MISRA-C, they're living in the past. The point here is not necessarily that they use malloc() and free(), but that they do it on an interrupt context!