About the SRAM4 access size from M4 and A7 cores

mikesponk2 · ‎2023-05-26

Many thanks for your help!

Currently, we are using the SRAM4 memory as a shared RAM mapped between the M4 and the A7 cores.

We have a Linux application that writes full 32-bit values from the A7 core, as in the following snippet:

*(volatile uint32_t *)p_sram4_mapped = 0x1441cddc; // (core A7)

*(volatile uint32_t *)p_sram4_mapped = 0xa55a3663; // (core A7)

...

The M4 application that uses the same shared memory may read a 32-bit word composed of 16-bit halves that come from different A7 write operations, e.g. when reading this way:

const uint32_t read_value = *(volatile uint32_t *)p_sram4_mapped; // (core M4)

read_value may be correct, e.g. 0x1441cddc or 0xa55a3663,

but read_value may also be a hybrid of two 16-bit halves, e.g. 0x14413663 or 0xa55acddc.

The same happens in the opposite direction, when the M4 writes and the A7 reads.
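To make the symptom concrete, here is a minimal single-threaded sketch of how such hybrids can arise if a 32-bit word is actually read as two 16-bit halves while the other core rewrites it in between. The names split_word_t, store_halves and torn_read are hypothetical and only model the effect; this is not code from the application.

```c
#include <stdint.h>

/* Hypothetical model: one shared 32-bit word seen as two 16-bit halves. */
typedef struct {
    uint16_t lo;  /* bits 15..0  */
    uint16_t hi;  /* bits 31..16 */
} split_word_t;

/* Store a 32-bit value into the model, one half at a time. */
static inline void store_halves(split_word_t *w, uint32_t value)
{
    w->lo = (uint16_t)(value & 0xFFFFu);
    w->hi = (uint16_t)(value >> 16);
}

/* Simulate a reader that loads the low half, is overtaken by a complete
 * write from the other core, and only then loads the high half. */
static inline uint32_t torn_read(split_word_t *w, uint32_t concurrent_write)
{
    uint16_t lo = w->lo;               /* first 16-bit access            */
    store_halves(w, concurrent_write); /* other core updates the word    */
    uint16_t hi = w->hi;               /* second 16-bit access           */
    return ((uint32_t)hi << 16) | lo;  /* mix of old low and new high    */
}
```

Starting from 0x1441cddc with a concurrent write of 0xa55a3663, this model returns 0xa55acddc; with the two values swapped it returns 0x14413663, i.e. the two hybrids listed above.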

I wonder if there is a way to make the read/write operations from the A7/M4 cores atomic on a 32-bit size instead of a 16-bit size.

Many thanks again for your help!

PatrickF · ‎2023-05-26

Hi @mikesponk2 ,

As the SRAMs are 32 bits wide and the buses are 32 bits or wider, there should be no issue with a 32-bit value.

The 'hybrid' values you read are probably due to the use of a pointer value that is not aligned to a multiple of 4.

I also think your way of working is not very robust and might be prone to future issues that are hard to debug.

We recommend using RPMsg/OpenAMP, which uses IPCC interrupts to synchronize data sharing between the cores. Some examples are provided.

Please have a look at https://wiki.st.com/stm32mpu/wiki/Exchanging_buffers_with_the_coprocessor

Regards.

mikesponk2 · ‎2023-05-26

Hi, PatrickF!

Many thanks for your reply!

I understand the issue with the pointer alignment; I am pretty sure that the pointer value I used in my simple test is aligned to a multiple of 4: it points to the very start of the SRAM4 area, i.e. 0x10050000.
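As a quick sanity check of the address itself, a runtime alignment test could be sketched as follows. The helper name is_aligned is hypothetical. Note that it only checks the pointer value; it says nothing about the alignment the compiler assumes for the pointed-to type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the address held in p is a multiple of
 * 'alignment' (e.g. 4 for a 32-bit word). */
static inline bool is_aligned(const volatile void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

For the SRAM4 base address used here, is_aligned((void *)(uintptr_t)0x10050000u, 4) is true, whereas an address such as 0x10050002 would fail the check.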

I am currently porting a legacy application, where a GUI thread (that runs now in A7) interacts with a real-time thread (that runs now on M4); they were originally running in the same core, so I am trying to replicate the same structure as in the original application just splitting the application between the STM32MP1 cores.

In the original application, the threads communicate using a structure made of full 32-bit words, where access was guaranteed to be atomic for each element of the structure, and I would like to keep the same behavior on the new platform.

Furthermore, I already looked at OpenAMP, but using it would require rewriting some of the business logic of the application, while right now I would like to just make things happen with the same structure as the original.

So, if the pointer value is 4 bytes aligned, do you have any idea about the reasons for the bus access operations to be (apparently) 16-bit wide instead?

Many thanks again!

Michele

PatrickF · ‎2023-05-26

Hi,

The issue must come from the Linux side, as I'm 100% sure that neither the Cortex-M4 nor the AHB bus breaks an LDM 32-bit read instruction at an aligned address.

On Linux, did you build a custom driver, or are you using tricks in user space (like mmap) to access absolute addresses (which is usually forbidden and could explain some strange behavior)?

Btw, have you looked at the generated assembly code on the M4 and A7 sides? Maybe some weird compiler optimization breaks the 32-bit access into two instructions.

Regards.

mikesponk2 · ‎2023-05-26

Hi, PatrickF!

Many thanks for your help and good ideas!

I admit that I am using mmap to access absolute addresses from user space.

I am not sure how this could affect the R/W operations, but I'll take a close look at it.

I'll also take a look at the generated code on the M4 and A7 cores, looking for compiler weirdness.
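For readers unfamiliar with the user-space approach mentioned here, a minimal sketch of mapping a physical address with mmap might look like this. It assumes the mapping goes through /dev/mem, which the thread does not state, and the helper names page_base and map_phys_words are hypothetical. As noted above, this kind of access is usually discouraged.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Round a physical address down to the start of its page
 * (page_size is assumed to be a power of two). */
static inline uint64_t page_base(uint64_t phys, uint64_t page_size)
{
    return phys & ~(page_size - 1);
}

/* Assumption: the region is reachable through /dev/mem.
 * Returns a pointer to 'phys' in this process, or NULL on failure. */
static inline volatile uint32_t *map_phys_words(uint64_t phys, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;

    uint64_t base = page_base(phys, (uint64_t)ps);
    size_t delta = (size_t)(phys - base);   /* offset inside the page */

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *m = mmap(NULL, len + delta, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, (off_t)base);
    close(fd);                              /* the mapping stays valid */
    if (m == MAP_FAILED)
        return NULL;

    return (volatile uint32_t *)((uint8_t *)m + delta);
}
```

A call such as map_phys_words(0x10050000u, 4096) would then yield a pointer to the start of SRAM4, provided the process has the required privileges and the kernel allows the mapping.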

mikesponk2 · ‎2023-05-26

Hi, PatrickF!

(edited: typos)

Many thanks for your suggestions!

You were right: the A7 compiler did not assume that the pointer was aligned merely because the base address was aligned. I had to declare the pointed-to object explicitly as "aligned(4)" so that the memory accesses are full 32-bit accesses.

I checked the generated assembly and saw that on the A7 the opcodes of the memory access instructions were annotated as "@ unaligned".

I suppose this means that the ldr instruction is/may be broken into smaller accesses to guarantee the addresses' alignment (as stated e.g. in https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965).

So, I added the aligned(4) directive to the definition of the pointed-to structure, and this way the access instruction is no longer marked as unaligned in the generated assembly code.

Now the opcodes generated are e.g. simple ldr instructions: I think that I will be able to access my 32-bit words from both cores without glitches now:

typedef struct _TipoStructInterProcessCommunication
{
   uint32_t ui32_num_of_ipc_shared_areas;
   volatile uint32_t ui32_sync_word_dsp;
   volatile uint32_t ui32_sync_word_arm;
   //...
   uint32_t ui32_first_byte_after_ipc_areas_desc;
} __attribute__((packed, aligned(4))) TipoStructInterProcessCommunication;

@ ../src/shared_mem/ipc.c:368:            ui32_other_side_act_value=((volatile TipoStructInterProcessCommunication* )p_ipc)->ui32_sync_word_dsp;
   .loc 1 368 38 view .LVU72
   ldr   r4, [r3, #4]   @ ui32_other_side_act_value, MEM[(volatile struct TipoStructInterProcessCommunication *)p_ipc.11_6].ui32_sync_word_dsp

Many thanks again for your help!

mikesponk2 · ‎2023-06-05

I can confirm that with __attribute__((packed, aligned(4))) on the shared structure, 32-bit shared access from both cores works as expected, and no hybrid words are read.
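The effect of the attribute can be checked at compile time. The sketch below assumes GCC or Clang and assumes that the structure was previously declared with packed alone, which the thread does not show. The type and field names are shortened, hypothetical stand-ins for the real ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed "before" case: packed alone lowers the alignment of the
 * structure to 1, so the compiler may not use plain word accesses. */
typedef struct {
    uint32_t count;
    volatile uint32_t sync_word_dsp;
    volatile uint32_t sync_word_arm;
} __attribute__((packed)) ipc_packed_only_t;

/* "After" case from the thread: packed layout, 4-byte alignment. */
typedef struct {
    uint32_t count;
    volatile uint32_t sync_word_dsp;
    volatile uint32_t sync_word_arm;
} __attribute__((packed, aligned(4))) ipc_packed_aligned_t;

/* Compile-time checks of what the compiler is told to assume. */
_Static_assert(_Alignof(ipc_packed_only_t) == 1,
               "packed alone gives byte alignment");
_Static_assert(_Alignof(ipc_packed_aligned_t) == 4,
               "aligned(4) restores word alignment");
_Static_assert(offsetof(ipc_packed_aligned_t, sync_word_dsp) == 4,
               "second word sits at offset 4");
_Static_assert(sizeof(ipc_packed_aligned_t) == 12,
               "three 32-bit words, no padding");
```

The offset of 4 for the second field matches the ldr r4, [r3, #4] instruction in the listing above. Keeping such assertions next to the shared structure on both the A7 and M4 builds is one way to catch an accidental change of layout or alignment early.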

Many thanks again for your help!