Best/easiest way to share data between cores in STM32H745

RMcCa · ‎2019-10-23

I am working with a dual core H7 for the first time and am very confused about how to share data between the two cores. I would like to use the M4 as a slave/background processor for the M7, where the M7 alerts the M4 when there is new data to process and the M4 signals back when it is finished.

I can see how to use a semaphore to signal back and forth to synchronize the reading and writing - that part is fairly clear, but what is the best way to declare & use two common buffers? I tried adding the resource manager to both cores in MX-Cube, but it is incomplete and tries to include header files that don't exist anywhere in the repository, and I couldn't find any useful documentation about the resource manager on the web.

I tried doing something simple and created shared.c & shared.h in the Common directory that contain just the variable declarations for the buffers, and then added #include "shared.h" to both main.c files, similar to how global variables can be shared between .c files in a single project. The code compiles, and neither the compiler nor the linker complains when I alter one of the variables declared in shared.c in either main.c. Does that mean it might work? If so, then all I'll need to add is the semaphore interrupts for both cores.

berendi · ‎2019-10-23

No, I'm afraid just putting variable definitions in a common .c source file won't work.

Code and data for both cores are linked separately, and they are put into separate memory areas for each core, even if they are defined in a common source file. CM7 variables are placed in DTCM RAM at 0x20000000, CM4 variables go into D2 SRAM at 0x30000000 (aliased to 0x10000000).

You have to explicitly specify the memory address for the shared data structure(s), and either do some linker trickery to have them placed at the exact same address for both cores, or simply access them through a fixed pointer.

struct shared_data {
    /* shared data goes here */
};
 
volatile struct shared_data * const shared_ptr = (struct shared_data *)0x30040000;

Chapter 2.4 of the reference manual recommends using AHB SRAM3 at 0x30040000 as shared memory between the two cores, so I've used that address, but it's not a hard rule, as there are example projects that use SRAM4 at 0x38001000 for this purpose.

You should ensure that the memory occupied by shared data is not touched by the linker, i.e. it doesn't appear in the linker definition for either core.
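Since the linker knows nothing about a region accessed through a fixed pointer, it cannot warn you when the struct outgrows it. A minimal sketch of compile-time checks that both cores could share in a common header is shown here. The struct members are hypothetical, and the 32 KB size for SRAM3 is an assumption that should be verified against the reference manual for your part.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Base address taken from the post above. The size is an ASSUMPTION
 * (32 KB for SRAM3); check the memory map in the reference manual. */
#define SHARED_BASE 0x30040000u
#define SHARED_SIZE (32u * 1024u)

/* Hypothetical layout. Fixed-width types only, so that the CM7 and CM4
 * builds agree on the size and offset of every member. */
struct shared_data {
    uint32_t status;
    uint32_t payload[64];
};

/* Fail the build on either core if the struct no longer fits the region. */
static_assert(sizeof(struct shared_data) <= SHARED_SIZE,
              "shared_data does not fit in the reserved region");

/* Fail the build if the compiler inserted padding we did not expect. */
static_assert(offsetof(struct shared_data, payload) == 4,
              "unexpected padding before payload");

/* The fixed base address must satisfy the struct's alignment. */
static_assert(SHARED_BASE % _Alignof(struct shared_data) == 0,
              "SHARED_BASE is not suitably aligned");
```

Because the checks live next to the struct definition, a layout change that breaks the assumptions stops the build on both cores instead of silently corrupting memory at run time.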

RMcCa · ‎2019-10-23

Thank you for the concise answer. I was in the process of trying to add a common segment to both linker scripts, but it wasn't working right. I'll use your answer.

RMcCa · ‎2019-10-23

I hope STM is working on expanding the documentation and mx-cube examples for the H7 series. I know these chips are new and very complicated, but the mx-cube example code has an incomplete & untested feel to it & the integration in Eclipse is very confusing. I'm the sort of person that likes to dig into things and figure them out myself, but after a whole day of digging thru data sheets and importing examples to read, I was no less confused.

savvn001 · ‎2019-12-06

Hi, any updates on this? I'm in the same boat and am pretty much stumped when it comes to using this dual core MCU. CubeMX examples are very limited and there's not really any documentation or other examples.

TSimo.1 · ‎2021-01-15

In case anyone is still interested in this topic, this is what I did, and it seems to work. Many thanks to berendi for the recommendation, which I followed.

I wanted to share data between the 2 cores, CM4 and CM7, in an STM32H745, using the STM32CubeIDE to program a NUCLEO-H745ZI-Q board.

First, I defined the following struct and pointer in main.c for each of the cores. I used the same names in main.c for both CM4 and CM7. This made it easier for me to keep everything straight in my head. Note that I used the SRAM4 address at 0x38001000.

/* USER CODE BEGIN PV */
 
// inter-core buffers
struct shared_data
{
	uint8_t sts_4to7; // status: 0 = empty, 1 = has data, 2 = locked (CM4-CM7)
	uint8_t sts_7to4; // status: 0 = empty, 1 = has data, 2 = locked (CM7-CM4)
	uint32_t M4toM7[64]; // 256 bytes from CM4 to CM7
	uint32_t M7toM4[64]; // 256 bytes from CM7 to CM4
};
 
// pointer to shared_data struct (inter-core buffers and status)
volatile struct shared_data * const xfr_ptr = (struct shared_data *)0x38001000;
 
/* USER CODE END PV */

Next, to test the concept, I used the virtual UART connected to CM4. From a computer, I sent data to CM4, put the data into the CM4 to CM7 buffer, retrieved it from the CM7 to CM4 buffer, and printed it to the computer.

For this to work, I needed code in CM7 to copy the data from the CM4 to CM7 buffer to the CM7 to CM4 buffer. I wrote the following functions, which can be used in other applications.

/* USER CODE BEGIN PFP */
 
uint32_t * get_M4(void); // get data from M4 to M7
 
void put_M7(uint32_t buffer[64]); // put data from M7 to M4
 
/* USER CODE END PFP */
 
/* USER CODE BEGIN 4 */
 
uint32_t * get_M4(void) // get data from M4 to M7 buffer
{
	static uint32_t buffer[64]; // buffer to receive data
	if (xfr_ptr->sts_4to7 == 1) // if M4 to M7 buffer has data
	{
		xfr_ptr->sts_4to7 = 2; // lock the M4 to M7 buffer
		for(int n = 0; n < 64; n++)
		{
			buffer[n] = xfr_ptr->M4toM7[n]; // transfer data
			xfr_ptr->M4toM7[n] = 0; // clear M4 to M7 buffer
		}
		xfr_ptr->sts_4to7 = 0; // M4 to M7 buffer is empty
	}
	return buffer; // return the buffer (pointer)
}
 
void put_M7(uint32_t buffer[64]) // send data from M7 to M4
{
	if (xfr_ptr->sts_7to4 == 0) // if M7 to M4 buffer is empty
	{
		xfr_ptr->sts_7to4 = 2; // lock the M7 to M4 buffer
		for(int n = 0; n < 64; n++)
		{
			xfr_ptr->M7toM4[n] = buffer[n]; // transfer data
			buffer[n] = 0; // clear the caller's source buffer
		}
		xfr_ptr->sts_7to4 = 1; // M7 to M4 buffer has data
	}
}
 
/* USER CODE END 4 */

The get_M4() function checks to see if the M4 to M7 buffer has data, changes the status to locked, retrieves the data, clears the buffer, and changes the status to empty.

The put_M7 function checks to see if the M7 to M4 buffer is empty, changes the status to locked, stores the data, clears the incoming buffer, and changes the status to has data.
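The same hand-off can be sketched with only two states and a return value that tells the caller whether a transfer actually happened (get_M4() above returns its static buffer even when nothing new arrived). This is a sketch under assumptions, not a drop-in replacement: the names mbox_put and mbox_get are hypothetical, a static struct stands in for the fixed SRAM4 address so the code also builds on a PC, and the C11 fences stand in for whatever barrier you use on target (CMSIS `__DMB()` would be the usual candidate). If the D-cache is enabled on the CM7, the shared region would also need to be non-cacheable or maintained explicitly, which is not shown here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_WORDS 64u

enum { MBOX_EMPTY = 0, MBOX_FULL = 1 };

/* One mailbox per direction; only the producer sets FULL and only the
 * consumer sets EMPTY, so no third "locked" state is required. */
struct mailbox {
    volatile uint32_t status;
    volatile uint32_t data[MBOX_WORDS];
};

/* Host stand-in. On target this would be a fixed-address pointer such as
 * (struct mailbox *)0x38001000 instead of a linker-placed variable. */
static struct mailbox mailbox_storage;
static struct mailbox * const mbox = &mailbox_storage;

/* Producer side. Returns false if the consumer has not emptied the box. */
bool mbox_put(const uint32_t *src)
{
    if (mbox->status != MBOX_EMPTY) {
        return false;
    }
    for (uint32_t n = 0; n < MBOX_WORDS; n++) {
        mbox->data[n] = src[n];
    }
    /* Make sure the payload is written before the flag flips. */
    atomic_thread_fence(memory_order_release);
    mbox->status = MBOX_FULL;
    return true;
}

/* Consumer side. Returns false (and leaves dst untouched) if no data. */
bool mbox_get(uint32_t *dst)
{
    if (mbox->status != MBOX_FULL) {
        return false;
    }
    /* Do not read the payload before the flag has been observed. */
    atomic_thread_fence(memory_order_acquire);
    for (uint32_t n = 0; n < MBOX_WORDS; n++) {
        dst[n] = mbox->data[n];
    }
    /* Finish reading before handing the box back to the producer. */
    atomic_thread_fence(memory_order_release);
    mbox->status = MBOX_EMPTY;
    return true;
}
```

Returning a bool lets the main loop act only on fresh data, and keeping the caller's buffer untouched on failure avoids processing the same block twice.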

Finally, the following code in CM7 uses these 2 functions to retrieve the data from CM4 and return it to CM4.

/* USER CODE BEGIN 1 */
 
	uint32_t * xfr_data; // pointer to transfer data
	uint8_t sts_xfr = 0; // xfr_data status: 0 = empty, 1 = has data
 
	// initialize inter-core status flags
	xfr_ptr->sts_4to7 = 0;
	xfr_ptr->sts_7to4 = 0;
 
  /* USER CODE END 1 */
 
  /* USER CODE BEGIN WHILE */
  while (1)
  {
    /* USER CODE END WHILE */
 
    /* USER CODE BEGIN 3 */
 
	  // if xfr_data buffer is empty and M4 to M7 buffer has data
	  if (sts_xfr == 0 && xfr_ptr->sts_4to7 == 1)
	  {
		  xfr_data = get_M4(); // get data sent from M4 to M7
		  sts_xfr = 1; // set xfr_data status to has data
	  }
	  // if transfer data buffer has data and M7 to M4 buffer is empty
	  if (sts_xfr == 1 && xfr_ptr->sts_7to4 == 0)
	  {
		  put_M7(xfr_data); // copy data to M7 to M4 buffer
		  sts_xfr = 0; // set xfr_data status to empty
	  }
  }
 
  /* USER CODE END 3 */

The main loop checks to see if the CM4 to CM7 buffer has data and the xfr_data buffer is empty. If so, the get_M4() function retrieves the data to the xfr_data buffer and sets the xfr_data buffer status to has data. Next, if the xfr_data buffer has data and the CM7 to CM4 buffer is empty, the put_M7 function puts the data from the xfr_data buffer into the CM7 to CM4 buffer and sets the xfr_data buffer status to empty.

I’m not the most experienced C programmer, but this is simple and it works. I like simple.

Please comment if I have made any mistakes or if you have a better way of moving data from one core to the other.

Tonatiuh · ‎2021-01-17

Hi, I use the following code, with "placement new", on an STM32H745ZI board.

#include <new>
 
//BANK SRAM4
#define MEMORY_SHARED_INIT_ADDRESS 0x38000000
 
volatile void *pointerShared = (void *) MEMORY_SHARED_INIT_ADDRESS;
 
MemoryShared *memoryShared = new ( (void *) pointerShared) MemoryShared();
 
MemoryShared *MemoryShared::GetInstance() {
	return memoryShared;
}

MemoryShared.hpp:

class MemoryShared {
	private:
	Control control;
 
	public:
	static MemoryShared *GetInstance();
 
	MemoryShared() {
 
	}
 
	~MemoryShared() {
 
	}
 
	Control &getControl() {
		return control;
	}
};

Finally, in main.cpp (CM7 and CM4), I use:

MemoryShared *shared = MemoryShared::GetInstance();

All objects and members of MemoryShared can be shared, as long as they are monolithic data such as constant buffer arrays, const instances, etc.

K.Ata15 · ‎2021-03-09

Hey,

uint8_t sts_4to7; // status: 0 = empty, 1 = has data, 2 = locked (CM4-CM7)

uint8_t sts_7to4; // status: 0 = empty, 1 = has data, 2 = locked (CM7-CM4)

You are not actually locking anything with those flags, as each core will need to connect to the SRAM to do the read/write. The sync mechanism should be via HSEM, as it is a separate hardware connection to the SRAM comms interface.

It is not very well explained in the ST docs, but imagine the SRAM is a single chip, connected via SPI as slave, and you have two masters on the bus (the cores).

Only one can talk to it at any given time, this is why the HSEM lock/unlock is needed.
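For reference, a rough sketch of guarding a shared variable with a hardware semaphore through the HAL might look like the following. A lock of this kind matters most when both cores modify the same data (a read-modify-write), which is the case shown. HAL_HSEM_FastTake() and HAL_HSEM_Release() are the HAL calls as I understand them; the semaphore index, the shared_counter variable, and the shared_increment() helper are hypothetical. The #else branch contains host stand-ins so the sketch compiles off-target; they are not the real HAL. On target the HSEM clock also has to be enabled first (I believe via `__HAL_RCC_HSEM_CLK_ENABLE()`).

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef USE_HAL_DRIVER
#include "stm32h7xx_hal.h"
#else
/* Host stand-ins so the sketch builds on a PC. NOT the real HAL. */
typedef enum { HAL_OK = 0, HAL_ERROR = 1 } HAL_StatusTypeDef;
static uint8_t fake_sem[32];
static HAL_StatusTypeDef HAL_HSEM_FastTake(uint32_t SemID)
{
    if (fake_sem[SemID]) {
        return HAL_ERROR;          /* already held by someone */
    }
    fake_sem[SemID] = 1;
    return HAL_OK;
}
static void HAL_HSEM_Release(uint32_t SemID, uint32_t ProcessID)
{
    (void)ProcessID;
    fake_sem[SemID] = 0;
}
#endif

/* Hypothetical: any free semaphore index that both cores agree on. */
#define HSEM_ID_SHARED 1u

/* Stand-in for a variable living in the shared SRAM region. */
volatile uint32_t shared_counter;

/* Increment the shared counter under the semaphore.
 * Returns false without touching the counter if the other core holds it. */
bool shared_increment(void)
{
    if (HAL_HSEM_FastTake(HSEM_ID_SHARED) != HAL_OK) {
        return false;              /* caller may retry later */
    }
    shared_counter++;              /* read-modify-write, now exclusive */
    HAL_HSEM_Release(HSEM_ID_SHARED, 0);
    return true;
}
```

The non-blocking take keeps the main loop responsive: if the other core holds the semaphore, the caller simply tries again on the next pass.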

TSimo.1 · ‎2021-03-11

Thanks for your input. I understand what you are saying about the lock status not being foolproof. I found some information and examples for HSEM, which I will look at when I have time.

The way I am using the shared buffers, locking is not necessary. I use the buffers to move data in only one direction. For example, if I want to send data from CM4 to CM7, I put data in the CM4-CM7 buffer only if the status is 0 (empty). After copying the data to the buffer, I change the status to 1 (has data). If the get function in CM7 sees that the status is 1 (I should use an interrupt for this), I get the data and change the status back to 0. If I want to send data from CM7 to CM4, I use the same process but a different buffer (CM7-CM4). Therefore, changing the status to 2 (locked) is an unnecessary step in my functions.

This may not be the best way of passing data between cores, but it seems to be foolproof and simple.

K.Ata15 · ‎2021-03-12

Hey,

If it works for your project, great. I have also found that using the HSEM is not always needed while sharing a UART. I guess there is basic logic to protect against two masters accessing bus domains at the same time. Maybe HSEM is needed to ensure no data corruption, IDK.

K.