How to give execution permission to the M4 core for DDR RAM addresses?

DGree.2
Associate III

I have a large firmware image that the MCU must execute. It will not fit into SRAM1-4, so I am trying to move the entire code (or a portion of it) to DDR RAM. I have no problems reading and writing DDR memory from the MCU as long as it's just data, but for some reason, when the PC jumps to an address in DDR, execution halts (or we get a HardFault, depending on whether a debugger is attached to the MCU). If I move this section to SRAM, everything works fine (except that my MCU firmware is then limited to 384 kB).

I have set up the linker script so that the MCU code's entry point and Reset_Handler are in SRAM, and one particular linker section is in DDR RAM (0xC5000000, for example). I've verified that the M4's ELF file contains the proper sections.
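A minimal GCC sketch of routing a function into a dedicated DDR section. The section name .ddr_text, the function, and the linker-script lines in the comment are hypothetical placeholders, not the project's real names; only the 0xC5000000 address comes from this post, and the 16 MB length is an assumption.

```c
#include <stdint.h>

/* Hypothetical section name. The M4 linker script must map it to DDR,
 * for example (illustrative only):
 *   MEMORY   { DDR (rx) : ORIGIN = 0xC5000000, LENGTH = 16M }
 *   SECTIONS { .ddr_text : { *(.ddr_text*) } > DDR }
 * The A7 then loads this ELF segment to 0xC5000000 before releasing the M4. */
__attribute__((section(".ddr_text"), noinline))
uint32_t checksum_in_ddr(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += buf[i];          /* simple byte sum, just to have some code */
    return sum;
}
```

Calling checksum_in_ddr() from code in SRAM is exactly the kind of branch into DDR that fails in the scenario described here until execution from that address range is permitted.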

When I run the program, the A7 MPU properly loads the raw data of the M4's code into 0xC5000000, and I can confirm with a debugger that these addresses contain valid Cortex-M4 instructions.

After that, the A7 releases the MCU HOLD on the co-processor. With a debugger attached to the M4 core, I can see that the MCU starts at its Reset_Handler (in SRAM1) and executes code properly until it reaches a branch instruction to an address in DDR RAM. At that point, the MCU hangs. Without a debugger attached, the MCU takes a HardFault instead. Even if I manually replace the instruction at the jump target with a NOP, the result is the same. For some reason it refuses to execute code in DDR RAM.
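One way to tell which mechanism is refusing the fetch, sketched under assumptions: on a Cortex-M4, a fetch from a region marked Execute Never raises a MemManage fault (CFSR.IACCVIOL), while an error response from the bus (for example from a firewall such as the TZC) raises a BusFault (CFSR.IBUSERR). If those handlers are not enabled, either one escalates to HardFault with HFSR.FORCED set. The helper names below are hypothetical; on target the inputs would be read from SCB->CFSR and SCB->HFSR (CMSIS core_cm4.h) inside the HardFault handler.

```c
#include <stdint.h>

/* Bit positions in the Cortex-M4 CFSR and HFSR registers. */
#define CFSR_IACCVIOL  (1u << 0)   /* MemManage: instruction fetch not permitted (e.g. XN) */
#define CFSR_IBUSERR   (1u << 8)   /* BusFault: instruction fetch got a bus error */
#define HFSR_FORCED    (1u << 30)  /* a configurable fault was escalated to HardFault */

/* Hypothetical classification of a failed instruction fetch. */
enum fetch_fault {
    FETCH_FAULT_NONE,        /* no fetch-related fault bit set */
    FETCH_FAULT_PERMISSION,  /* MPU / default memory map refused the fetch */
    FETCH_FAULT_BUS,         /* the interconnect returned an error */
};

static enum fetch_fault classify_fetch_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_IACCVIOL)
        return FETCH_FAULT_PERMISSION;
    if (cfsr & CFSR_IBUSERR)
        return FETCH_FAULT_BUS;
    return FETCH_FAULT_NONE;
}

/* Nonzero when the HardFault was escalated, so CFSR holds the real cause.
 * For IACCVIOL the faulting address is the stacked PC, not MMFAR. */
static int hardfault_was_forced(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0;
}
```

A permission result points at the M4-side memory attributes; a bus result would point at the path to the DDR controller instead.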

I have configured the MMU so that this section of DDR RAM has EXECUTE permission, as well as RW for the priv_t and user_t access types (as is done in CMSIS core_ca.h). I am not sure whether making it Cacheable matters (since the MCU does not see the cache), but I have tried both ways. I am using 1 MB sections, not 64 kB or 4 kB pages. I've tried various section types defined in CMSIS core_ca.h, such as section_normal (which is executable and read/writable). Nothing seems to work. Example:

#define M4CODE_BASE 0xC5000000
#define M4CODE_SIZE 0x01000000
MMU_TTSection(TTB_BASE, M4CODE_BASE, M4CODE_SIZE / 0x100000, Sect_Normal);

I suspect that the TZC is the peripheral that controls the MCU's permission to execute code in the DDR address space, since it filters I/O to and from the DDR controller. So I've configured the TZC to allow all R/W accesses to all addresses for all peripherals, although I'm not 100% confident I've done this correctly. The TZC can allow or deny read or write accesses, but it doesn't have any option for Execute. Am I missing something? Here is how I've done it:

// Disable Trust Zone
RCC->TZCR = 0;
 
// Allow read/write for all securable peripherals (top 6 bits are reserved)
TZPC->DECPROT0 = 0x03FFFFFF;
 
// TZC AXI port 1 and 2 clocks enable
RCC->MP_APB5ENSETR = RCC_MP_APB5ENSETR_TZC1EN;
RCC->MP_APB5ENSETR = RCC_MP_APB5ENSETR_TZC2EN;
 
// Read/write enable on all nsaid
TZC->REG_ID_ACCESSO = 0xFFFFFFFF;
 
// bit 30 / 31 => Secure Global Enable : read / write
// bit 0 / 1 => Region Enable for filter 0/1
TZC->REG_ATTRIBUTESO = TZC->REG_ATTRIBUTESO | (1 << 0) | (1 << 1) | (1 << 30) | (1 << 31);
 
// Enable Filter 0 and 1
TZC->GATE_KEEPER = TZC->GATE_KEEPER | (1 << 0) | (1 << 1);
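For reference, a host-side sketch of how the region 0 values above are composed, assuming the Arm TZC-400 bit layout (REGION_ATTRIBUTES: bits [3:0] filter enable, bit 30 secure read, bit 31 secure write; REGION_ID_ACCESS: bits [15:0] NSAID read enable, bits [31:16] NSAID write enable). The macro and function names are hypothetical. It also shows why there is no Execute option: at the bus level an instruction fetch is simply a read, so read permission is what the TZC grants or denies.

```c
#include <assert.h>
#include <stdint.h>

/* TZC-400 REGION_ATTRIBUTES_n layout (hypothetical macro names). */
#define TZC_ATTR_FILTER_EN(f)  (1u << (f))   /* bits [3:0]: enable region on filter f */
#define TZC_ATTR_S_RD_EN       (1u << 30)    /* secure read enable */
#define TZC_ATTR_S_WR_EN       (1u << 31)    /* secure write enable */

/* REGION_ATTRIBUTES value: secure R/W, enabled on filters 0..nfilters-1. */
static uint32_t tzc_region_attributes(unsigned nfilters)
{
    uint32_t attr = TZC_ATTR_S_RD_EN | TZC_ATTR_S_WR_EN;
    assert(nfilters >= 1u && nfilters <= 4u);
    for (unsigned f = 0; f < nfilters; f++)
        attr |= TZC_ATTR_FILTER_EN(f);
    return attr;
}

/* REGION_ID_ACCESS value: one bit per NSAID (0..15), reads in
 * bits [15:0] and writes in bits [31:16]. */
static uint32_t tzc_id_access(uint16_t nsaid_rd_mask, uint16_t nsaid_wr_mask)
{
    return ((uint32_t)nsaid_wr_mask << 16) | nsaid_rd_mask;
}
```

tzc_region_attributes(2) yields 0xC0000003, the same bits the snippet above ORs into REG_ATTRIBUTESO, and tzc_id_access(0xFFFF, 0xFFFF) gives the 0xFFFFFFFF written to REG_ID_ACCESSO.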

Accepted Solution
DGree.2
Associate III

Aha! I figured it out.

The Cortex-M4 core has an MPU, which needs to be configured in order to allow execution from DDR RAM. I set up a region with execute permission (XN bit in RASR set to 0) and enabled the MPU.
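A minimal sketch of the register values for such a region, assuming region 0 covers the whole 512 MB window 0xC0000000-0xDFFFFFFF with full access and Normal, non-cacheable attributes (TEX=001, S=C=B=0). The attribute choice is an assumption; the field layout follows the ARMv7-M MPU_RBAR/MPU_RASR registers. The helper names are hypothetical, and the target-side writes are shown in a comment using the CMSIS core_cm4.h names.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RBAR / MPU_RASR field positions (Cortex-M4). */
#define RBAR_VALID      (1u << 4)   /* use the REGION field of RBAR */
#define RASR_XN_POS     28u         /* 1 = Execute Never */
#define RASR_AP_POS     24u         /* access permissions */
#define RASR_TEX_POS    19u         /* type extension (with S, C, B) */
#define RASR_SIZE_POS   1u          /* region size = 2^(SIZE+1) bytes */
#define RASR_ENABLE     1u

/* SIZE field for a power-of-two region of at least 32 bytes. */
static uint32_t mpu_size_field(uint32_t size_bytes)
{
    uint32_t shift = 0;
    assert(size_bytes >= 32u && (size_bytes & (size_bytes - 1u)) == 0u);
    while ((1u << shift) != size_bytes)
        shift++;
    return shift - 1u;
}

/* RBAR: base must be aligned to the region size; region number 0..7. */
static uint32_t mpu_rbar(uint32_t base, uint32_t region)
{
    assert(region < 8u);
    return (base & ~0x1Fu) | RBAR_VALID | region;
}

/* RASR for an executable, full-access, Normal non-cacheable region. */
static uint32_t mpu_rasr_exec(uint32_t size_bytes)
{
    return (0u << RASR_XN_POS)     /* XN = 0: instruction fetch allowed */
         | (3u << RASR_AP_POS)     /* AP = 0b011: full access */
         | (1u << RASR_TEX_POS)    /* TEX = 001, S = C = B = 0 */
         | (mpu_size_field(size_bytes) << RASR_SIZE_POS)
         | RASR_ENABLE;
}

/* On the M4 (CMSIS core_cm4.h), roughly:
 *   MPU->RBAR = mpu_rbar(0xC0000000u, 0u);
 *   MPU->RASR = mpu_rasr_exec(0x20000000u);   // 512 MB
 *   MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
 *   __DSB();
 *   __ISB();
 */
```

For this window the sketch gives RBAR = 0xC0000010 and RASR = 0x03080039.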

Interestingly, it seems that the MPU does not need to be configured for the SRAM1-4 regions. However, I am not finding much documentation about the MP1's M4 MPU (possibly because "MPU" is hard to search for, since it more commonly refers to “Micro-Processor Unit”, not “Memory Protection Unit”). It seems to behave like the MPU in other Cortex-M4 MCUs (https://www.st.com/resource/en/application_note/an4838-managing-memory-protection-unit-in-stm32-mcus-stmicroelectronics.pdf).
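The SRAM behaviour is consistent with the ARMv7-M default memory map. With the MPU disabled (or, for privileged code, as the background map when MPU_CTRL.PRIVDEFENA is set), the Code (0x00000000-0x1FFFFFFF), SRAM (0x20000000-0x3FFFFFFF) and external RAM (0x60000000-0x9FFFFFFF) regions allow instruction fetch, while 0xC0000000-0xDFFFFFFF is a Device region marked Execute Never. A small host-side sketch of that rule; the function name is hypothetical, and the usage note assumes the M4 sees SRAM1 at 0x10000000 with an alias at 0x30000000 (please check the STM32MP1 memory map).

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M default memory map: eight 512 MB regions selected by address
 * bits [31:29]. Only Code, SRAM and the two external RAM regions permit
 * instruction fetch; Peripheral, Device and System regions are XN. */
static bool default_map_executable(uint32_t addr)
{
    switch (addr >> 29) {
    case 0:  /* 0x00000000-0x1FFFFFFF Code */
    case 1:  /* 0x20000000-0x3FFFFFFF SRAM */
    case 3:  /* 0x60000000-0x7FFFFFFF external RAM */
    case 4:  /* 0x80000000-0x9FFFFFFF external RAM */
        return true;
    default: /* 2: Peripheral, 5-6: external Device, 7: System */
        return false;
    }
}
```

default_map_executable(0x10000000) and default_map_executable(0x30000000) return true, while default_map_executable(0xC5000000) returns false, which matches code running from SRAM but faulting in DDR until an MPU region overrides the default attributes.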

The latest version of MP1Cube contains HAL_MPU_* functions that make this easier to set up.

It also seems that the MMU does not need to be configured for this region; the MPU alone suffices to allow the M4 to execute code from 0xC0000000-0xDFFFFFFF. The TZC does need to be configured so that the M4 has read access, of course.

PatrickF
ST Employee

Hi @DGree.2,

Glad to see you found a solution to your issue.

You can find additional information about the Cortex-M4 MPU in PM0214, the STM32 Cortex®-M4 MCUs and MPUs programming manual.

The Cortex-A7 MMU (which is inside the core) only affects Cortex-A7 accesses; it does not affect accesses from any other bus master (e.g. the Cortex-M4).

Anyway, I need to emphasize that, as the Cortex-M4 has no caches and the path to DDR is not optimized, executing code from DDR will greatly reduce the performance of the Cortex-M4 itself (probably by a factor of 10 or more). It could also degrade Linux performance, because of the many 32-bit single accesses interleaved with the existing traffic (DDR is optimized for large chunks of data).

Regards.
