How to use FreeRTOS message buffers on STM32H7 dual-core devices

Adam BERLINGER · 2021-10-07

1. Introduction and theory

When running an instance of FreeRTOS on each core, we can implement inter-process communication (IPC) using FreeRTOS message buffers, as described here:
https://www.freertos.org/2020/02/simple-multicore-core-to-core-communication-using-freertos-message-buffers.html
Most of the implementation is already provided by FreeRTOS. We only need to implement two notifications between the cores:

When a new message is available in the buffer
When the message buffer is released by the reader

We must implement these notifications for each message buffer we use. For bidirectional communication, we need a separate message buffer for each direction.
To send these notifications we can use the hardware semaphore (HSEM) peripheral. However, an HSEM can only signal the notification itself and carries no additional data. Basic bidirectional communication therefore needs 4 different HW semaphores (2 notifications for each of the 2 message buffers). There are 32 in total, so 4 isn't that much.
Alternatively, we can implement a "control" message buffer through which we pass this information. This becomes quite useful when we want to implement multiple message buffers between cores and different tasks.
Please note that each buffer should be written by only a single task. The same goes for reading.
When implementing the "control" message buffer, we pass the following information in the message:

Which message buffer is being notified
What is the notification (new message available or message released)

So when a new "data" message is sent, the "data" message buffer is populated first, then the "control" message buffer is populated, and finally the HSEM notification is generated.
Of course, there could be an issue when the "control" message buffer becomes full. Handling such a case can be difficult, and it might be easier to make the control buffer large enough. Having space for 2 events for each "data" message buffer can be a good starting point. This is a disadvantage compared to the solution without a "control" message buffer.
However, we can detect this issue and at least report an error (instead of ending up in some undefined behavior). The "control" message buffer is released in the HSEM interrupt handler, so giving this interrupt a high priority can also help.
(In theory you could try to implement a separate notification for the "control" message buffer, but then you also need to make sure that only one thread is waiting for the "control" message buffer. This solution might bring even more complexity.)
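As a rough aid for choosing the "control" buffer size, the sketch below computes the storage needed for a given number of "data" message buffers. It assumes a 32-bit target where amp_ctrl_msg_t occupies 8 bytes, that FreeRTOS stores a 4-byte length in front of every message (the default length type on Cortex-M), and that one byte of a statically allocated buffer always stays unused. The helper name ctrl_buffer_bytes and the constants are illustrative and are not part of the example package.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes on a 32-bit Cortex-M target (not measured on hardware). */
#define CTRL_MSG_BYTES       8u  /* buffer handle (4 bytes) + is_receive (4 bytes) */
#define LENGTH_PREFIX_BYTES  4u  /* per-message length stored by FreeRTOS */
#define RESERVED_BYTES       1u  /* one byte of the storage is never filled */

/* Bytes of storage needed so that `events_per_buffer` control messages per
   "data" message buffer fit at the same time. The result is rounded up to a
   multiple of 4 because the storage is declared as a uint32_t array. */
static uint32_t ctrl_buffer_bytes(uint32_t data_buffers, uint32_t events_per_buffer)
{
    uint32_t messages = data_buffers * events_per_buffer;
    uint32_t bytes = messages * (CTRL_MSG_BYTES + LENGTH_PREFIX_BYTES) + RESERVED_BYTES;

    return (bytes + 3u) & ~3u;
}
```

For example, with 4 "data" message buffers and 2 events each, ctrl_buffer_bytes(4, 2) gives 100 bytes under these assumptions.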
We also need to take care of where the message buffers are allocated. The best place is D3 SRAM (starting at 0x38000000). This memory is accessible by both cores and is retained when one of the cores goes into low-power mode. Since we need to allocate multiple handles and buffers, the easiest approach is to define a global structure and place it at the beginning of D3 SRAM.
Here we assume that the code compiled for CM4 and for CM7 uses the same size and alignment for the structure.
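One way to catch a layout mismatch early is a compile-time check that is built into both the CM7 and the CM4 project. The sketch below uses a small stand-in structure (shared_layout_demo_t, a hypothetical name) instead of the real shared structure; the expected size and offset are values chosen for this stand-in only.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Hypothetical stand-in for the shared structure. Only fixed-width members
   are used, so both cores should see the same layout. */
typedef struct {
    uint32_t cm7_to_cm4_handle;      /* handle stored as a 32-bit value */
    uint32_t cm4_to_cm7_handle;
    uint32_t cm7_to_cm4_buffer[16];
    uint32_t cm4_to_cm7_buffer[16];
} shared_layout_demo_t;

/* Expected layout, kept in a header that both projects include. */
#define SHARED_DEMO_SIZE           136u
#define SHARED_DEMO_BUFFER_OFFSET  8u

/* The build of either core fails if its compiler lays the structure out differently. */
static_assert(sizeof(shared_layout_demo_t) == SHARED_DEMO_SIZE,
              "shared structure size differs from the agreed value");
static_assert(offsetof(shared_layout_demo_t, cm7_to_cm4_buffer) == SHARED_DEMO_BUFFER_OFFSET,
              "shared structure member offset differs from the agreed value");

/* Small helper so the size can also be inspected at run time. */
static size_t shared_layout_demo_size(void)
{
    return sizeof(shared_layout_demo_t);
}
```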

2. Example implementation

The example is created for the STM32H745-Discovery board, but it should be easy to port to other boards/devices.
It is partially based on cm_ipc.c/.h available in the STM32Cube_FW_H7_V1.9.0 firmware package (STM32Cube_FW_H7_V1.9.0\Projects\STM32H747I-DISCO\Demonstrations\extra_modules\STIPC).
Here we focus on the implementation with the "control" message buffer (FreeRTOS_MultipleMessageBuffers). The examples also contain a simpler version with a single data message buffer.
In the application, each core periodically sends a message on channel 0. When a core receives a message from channel 0, it toggles an LED. The CM4 intentionally adds a delay after receiving a message. This tests the case where the CM7 thread is waiting for available space in the message buffer.
On channel 1, the CM4 sends a message when the user button on the board is pressed. When the CM7 receives the message, it toggles another LED.

2.1 STM32CubeMX configuration

In STM32CubeMX we need to enable the FreeRTOS middleware for both cores. The CMSIS version shouldn't matter, since we will be calling the FreeRTOS API directly. We also need to enable the HSEM interrupt for both cores, in NVIC1 (CM7) and NVIC2 (CM4). This basic configuration should be sufficient.
In this example, the MPU of both CM7 and CM4 prevents access to the FLASH and RAM memory used by the other core, except for D3 SRAM. This helps to avoid and detect any unwanted access caused by a wrong configuration.

2.2 Adding cm_ipc.c/.h files

You can find the cm_ipc.c/.h implementation in the attached examples.
For sending the "control" messages we use the following structure:

typedef struct {
	MessageBufferHandle_t buffer;
	uint32_t is_receive;
}amp_ctrl_msg_t;

For the buffer allocation we define shared_ram_t structure that is placed in D3 SRAM:

typedef struct {
	MessageBufferHandle_t cm7_to_cm4_handle;
	MessageBufferHandle_t cm4_to_cm7_handle;
	StaticMessageBuffer_t cm7_to_cm4_xmsg;
	StaticMessageBuffer_t cm4_to_cm7_xmsg;
	uint32_t cm7_to_cm4_buffer[IPC_CHANNEL_BUFFER_SIZE/4];
	uint32_t cm4_to_cm7_buffer[IPC_CHANNEL_BUFFER_SIZE/4];
}ipc_channel_t;

typedef struct {
	/* Control message buffers */
	MessageBufferHandle_t cm7_to_cm4_handle;
	MessageBufferHandle_t cm4_to_cm7_handle;
	StaticMessageBuffer_t cm7_to_cm4_xmsg;
	StaticMessageBuffer_t cm4_to_cm7_xmsg;
	uint32_t cm7_to_cm4_buffer[CM7_TO_CM4_CTRL_SIZE/4];
	uint32_t cm4_to_cm7_buffer[CM4_TO_CM7_CTRL_SIZE/4];

	ipc_channel_t channels[IPC_NUMBER_OF_CHANNELS];
}shared_ram_t;

In this example, to simplify configuration and initialization, all "data" buffers have the same size, so we can simply define them as an array.
Here is the handler that is called from the HSEM interrupt and processes incoming "control" messages:

static void prvCoreInterruptHandler(int ctrl)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    amp_ctrl_msg_t ctrl_msg;

    /* Nothing to do while the "control" receive buffer is not set */
    if (!xrx_ctrl_buf) {
        return;
    }

    /* Process all pending "control" messages sent by the other core */
    while (xMessageBufferReceiveFromISR(xrx_ctrl_buf, &ctrl_msg, sizeof(amp_ctrl_msg_t), &xHigherPriorityTaskWoken) == sizeof(amp_ctrl_msg_t)) {
        if (ctrl_msg.is_receive) {
            /* The other core wrote a message: notify the local reader */
            xMessageBufferSendCompletedFromISR(ctrl_msg.buffer,
                                               &xHigherPriorityTaskWoken);
        }
        else {
            /* The other core read a message: notify the local writer */
            xMessageBufferReceiveCompletedFromISR(ctrl_msg.buffer,
                                                  &xHigherPriorityTaskWoken);
        }
    }
    /* Normal FreeRTOS yield from interrupt semantics, where
       xHigherPriorityTaskWoken is initialized to pdFALSE and will then get set
       to pdTRUE if the interrupt safe API unblocks a task that has a priority
       above that of the currently executing task. */
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

And below is function used for sending the events:

void vGenerateRemoteInterrupt(void * xUpdatedMessageBuffer, int is_receive)
{
    MessageBufferHandle_t xUpdatedBuffer =
        (MessageBufferHandle_t)xUpdatedMessageBuffer;
    amp_ctrl_msg_t ctrl_msg;

    /* Make sure the buffer contents are written before the other core is notified */
    __DSB();
    /* Updates of the "control" buffers themselves do not generate events */
    if (xUpdatedBuffer != xtx_ctrl_buf && xUpdatedBuffer != xrx_ctrl_buf)
    {
        ctrl_msg.buffer = xUpdatedBuffer;
        ctrl_msg.is_receive = is_receive;
        if (xMessageBufferSend(xtx_ctrl_buf, &ctrl_msg, sizeof(amp_ctrl_msg_t), 0) != sizeof(amp_ctrl_msg_t)) {
            /* Control message buffer overflow */
            Error_Handler();
        }

        /* Take the HW semaphore with the process ID */
        if (HAL_HSEM_Take(HSEM_TX_ID, HSEM_PROCESS) == HAL_OK)
        {
            /* Release the HW semaphore; the release generates the notification */
            HAL_HSEM_Release(HSEM_TX_ID, HSEM_PROCESS);
        }
    }
}

Here we can detect when the "control" message buffer overflows and call an error handler.

2.3 Modifying FreeRTOSConfig.h

In FreeRTOSConfig.h we need to override some macro definitions. There is also an additional C macro that is checked in cm_ipc.c to make sure this modification is done properly:

/* USER CODE BEGIN Defines */
/* Section where parameter definitions can be added (for instance, to override default ones in FreeRTOS.h) */
void vGenerateRemoteInterrupt(void * xUpdatedMessageBuffer, int is_receive);
#define sbSEND_COMPLETE_FROM_ISR sbSEND_COMPLETED_FROM_ISR
#define sbSEND_COMPLETED( pxStreamBuffer ) vGenerateRemoteInterrupt( pxStreamBuffer, 1 )
#define sbSEND_COMPLETED_FROM_ISR( pxStreamBuffer, pxHigherPriorityTaskWoken ) vGenerateRemoteInterrupt( pxStreamBuffer, 1 )
#define sbRECEIVE_COMPLETED( pxStreamBuffer ) vGenerateRemoteInterrupt( pxStreamBuffer, 0 )
#define sbRECEIVE_COMPLETED_FROM_ISR( pxStreamBuffer, pxHigherPriorityTaskWoken )  vGenerateRemoteInterrupt( pxStreamBuffer, 0 )
#define IPC_CHECK_sbSEND_COMPLETED
/* USER CODE END Defines */

Note: there seems to be a slight typo or inconsistent naming in the FreeRTOS code, where sbSEND_COMPLETE_FROM_ISR is used instead of sbSEND_COMPLETED_FROM_ISR. This is why the first define above maps one name to the other.
Without these overrides, FreeRTOS will try to pass notifications to local threads and some nasty things can happen (in some cases it can cause e.g. CM7 to execute CM4 tasks). This is why there is also the IPC_CHECK_sbSEND_COMPLETED macro, which is used to check that those FreeRTOS callbacks are overridden. This can be useful when copying cm_ipc.c/.h to a new project. If the macro is not defined, cm_ipc.c will fail with a compile error.

2.4 Modifying the linker script

Modifying the linker script is quite easy. For both cores, we need to make sure we modify the linker script ending with "_FLASH.ld" and add the ".shared_ram" section after the interrupt vector:

  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  .shared_ram (NOLOAD) :
  {
  	*(.shared_ram)
  } >RAM_D3

For Cortex-M4 we need to define the D3 SRAM (RAM_D3 region):

MEMORY
{
FLASH (rx)     : ORIGIN = 0x08100000, LENGTH = 1024K
RAM (xrw)      : ORIGIN = 0x10000000, LENGTH = 288K
RAM_D3 (xrw)   : ORIGIN = 0x38000000, LENGTH = 64K
}

In STM32CubeIDE, in Window > Show view > Build Analyzer, we can check whether the data is placed at 0x38000000.
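On the C side, a variable can be assigned to the ".shared_ram" section with the GCC section attribute. This is a minimal sketch, assuming a GCC-compatible toolchain; the stand-in type shared_ram_demo_t, the variable name and the helper are hypothetical, and the attached examples may declare the shared structure differently.

```c
#include <assert.h>
#include <stdint.h>

#define D3_SRAM_BASE 0x38000000u  /* start of D3 SRAM, as used in this article */

/* Hypothetical stand-in for the real shared structure. */
typedef struct {
    uint32_t storage[16];
} shared_ram_demo_t;

/* Ask the compiler to emit the variable into the ".shared_ram" input section.
   The linker script then maps that section to the RAM_D3 region. */
shared_ram_demo_t shared_ram_demo __attribute__((section(".shared_ram")));

/* Run-time counterpart of the Build Analyzer check: returns 1 if the given
   address is the first address of D3 SRAM. */
static int is_at_d3_base(const void *address)
{
    return (uintptr_t)address == (uintptr_t)D3_SRAM_BASE;
}
```

On the target, is_at_d3_base(&shared_ram_demo) could be checked once at startup on both cores before the buffers are used.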

2.5 Initializing buffers and exchanging messages

In the cm_ipc.c/.h files there are the functions ipc_init and ipc_start. ipc_init initializes the message buffers and ipc_start enables the HSEM interrupts. ipc_init should be called first, and ipc_start should be called once FreeRTOS is already running.
In our example the buffers are initialized by CM7, and CM4 is released after ipc_init has been executed.
For sending and receiving data there are the ipc_send and ipc_receive functions. Both take the channel number as the first parameter.
In both cases we should check the return value to see whether any data was sent/received successfully. In the example this is done only for the receive part.

3. Adding more channels and improving the example

When using multiple channels, it might be a good idea to get rid of the "control" message buffer. This allows us to send messages from interrupts and tasks in parallel, assuming each interrupt/task uses a separate channel.
We could do this by dedicating 4 more HSEM semaphores to each bidirectional channel, but with 32 semaphores in total we would soon run out (at most 8 such channels).
Instead, we can "mimic" the HSEM notification events by using two 32-bit values stored in SRAM. Let's call them "remote_toggle" and "local_toggle". Each event has its own bit in both values. "local_toggle" is modified only by the current CPU and "remote_toggle" only by the other CPU (so on the other CPU the two values are swapped). We send an event by toggling its bit in "local_toggle". The remote end detects incoming events by XORing "remote_toggle" and "local_toggle", and acknowledges an event by toggling the same bit in its own "local_toggle" (the sender's "remote_toggle"), which makes the two bits equal again.
When "toggling" the bits, we should make sure only one thread/interrupt accesses "local_toggle" at a time. Since we also want to send messages from interrupt context, the only safe way is to disable interrupts globally. However, the toggle procedure itself should be short (3 instructions, excluding the enabling/disabling of interrupts).
In a demanding real-time application, we can instead disable only the low-priority interrupts via priority masking. However, in that case we can't send messages from the interrupts that are not disabled/masked, or we would need to implement a separate channel for such interrupts (using dedicated HSEM semaphores).
This mechanism is implemented in the third example, and an additional function ipc_sendmsg_irq is added to send messages from interrupt routines.
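The toggle scheme can be tried out on a PC before moving it to the target. The following is a minimal host-side sketch and not the code from the third example: the type and function names, the rx_mask field and the "still unacknowledged" check in toggle_send are assumptions made for illustration. On the device the two words would live in shared SRAM, and the read-modify-write of local_toggle would be protected by disabling or masking interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two words in shared memory; each one is written by exactly one core. */
typedef struct {
    volatile uint32_t cm7_toggle;
    volatile uint32_t cm4_toggle;
} toggle_words_t;

/* The same two words as seen from one core. */
typedef struct {
    volatile uint32_t *local_toggle;         /* written by this core only */
    const volatile uint32_t *remote_toggle;  /* written by the other core only */
    uint32_t rx_mask;                        /* events this core receives */
} toggle_view_t;

/* Send an event by toggling its bit in local_toggle. Returns false if the
   previous event on this bit has not been acknowledged yet. */
static bool toggle_send(const toggle_view_t *v, unsigned event)
{
    uint32_t bit = 1u << event;

    if ((*v->local_toggle ^ *v->remote_toggle) & bit) {
        return false;
    }
    *v->local_toggle ^= bit;  /* critical section on the real target */
    return true;
}

/* Incoming events are the bits that differ between the two words. */
static uint32_t toggle_pending(const toggle_view_t *v)
{
    return (*v->local_toggle ^ *v->remote_toggle) & v->rx_mask;
}

/* Acknowledge an event by toggling the same bit on the receiving side. */
static void toggle_ack(const toggle_view_t *v, unsigned event)
{
    *v->local_toggle ^= 1u << event;  /* critical section on the real target */
}

/* Walk-through: event 0 goes from CM7 to CM4, event 1 is reserved for the
   opposite direction. Returns 1 if every step behaves as described. */
static int toggle_demo(void)
{
    toggle_words_t w = { 0u, 0u };
    toggle_view_t cm7 = { &w.cm7_toggle, &w.cm4_toggle, 0x2u };
    toggle_view_t cm4 = { &w.cm4_toggle, &w.cm7_toggle, 0x1u };

    if (!toggle_send(&cm7, 0u)) return 0;        /* CM7 raises event 0 */
    if (toggle_pending(&cm4) != 0x1u) return 0;  /* CM4 sees it as incoming */
    if (toggle_pending(&cm7) != 0u) return 0;    /* it is not incoming for CM7 */
    if (toggle_send(&cm7, 0u)) return 0;         /* still unacknowledged */
    toggle_ack(&cm4, 0u);                        /* CM4 acknowledges */
    if (toggle_pending(&cm4) != 0u) return 0;    /* nothing pending any more */
    return toggle_send(&cm7, 0u) ? 1 : 0;        /* the bit can be used again */
}
```

Because every word has a single writer, no lock is needed between the cores; only the accesses within one core have to be serialized.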

4. Examples

There are 3 examples in the package:

FreeRTOS_MessageBuffer - simple example implementing only one bi-directional buffer between cores
FreeRTOS_MultipleMessageBuffers - advanced example using multiple message buffers implementing "control" message buffer
FreeRTOS_MultipleMessageBuffers2 - improved example with ability to send messages from interrupt routines using the "toggle" mechanism

All examples are made for STM32H745-Discovery board, but they could be easily ported to other boards.
Download examples here