Configure Cortex-M7 Settings (Cache, TCM, MPU) for STM32H747 Dual-Core Project

KAnahar
Associate III

Hello everyone,

I’m working on an STM32H747IIT6-based project that uses both the Cortex-M7 and Cortex-M4 cores. Under Pinout & Configuration > System Core > Cortex_M7 (and similarly Cortex_M4) in STM32CubeIDE, there are several options, such as:

  • CPU I-Cache / CPU D-Cache
  • MPU (Memory Protection Unit)
  • TCM (ITCM / DTCM) Enable
  • Background Region Privileged Access
  • MPU Instruction Access
  • Cache and MPU Region Settings

I’d like to understand the best practices for configuring these settings, especially when running multiple middleware components like FreeRTOS, LwIP (Ethernet), USB, I2C, and an SD card interface. Here are some questions I have:

  1. Cache Enable/Disable: Under which circumstances should I enable both Instruction Cache and Data Cache? What are the typical pitfalls (e.g., DMA conflicts) when D-Cache is enabled?

  2. MPU Configuration: If I plan to use the MPU, how do I decide which memory regions need different cache policies or access permissions? Are there any recommended templates or application notes for setting up MPU regions on the STM32H747?

  3. TCM (ITCM / DTCM): What is the practical approach to using tightly coupled memory on the STM32H7? For instance, should I place time-critical code in ITCM, or should I leave everything in AXI SRAM?

  4. Background Region Privileged Access: What scenarios require enabling this setting, and how does it affect system performance or safety?

  5. DMA Considerations: When using Ethernet, USB, or SD card (all DMA-driven), how should I handle cache maintenance (clean/invalidate) to avoid data corruption or stale data in buffers?

  6. Dual-Core Synchronization: Since I also have a Cortex-M4 core, are there any special considerations for cache/MPU setup that apply to the M4 domain or shared memory regions?

 

Any guidance, best practices, or references (e.g., ST Application Notes, example projects) would be greatly appreciated. If you have specific example linker scripts, MPU configurations, or sample code that demonstrate these setups, I’d love to check them out.

Thanks in advance for your help!


Accepted Solution
Saket_Om
ST Employee

Hello @KAnahar 

The I-cache can be enabled systematically for the best system performance, but execution time then varies depending on whether an access hits or misses in the cache. If code needs to be deterministic or to have the lowest possible latency, it is better to place it in the ITCM, which has a fast, fixed response time. Frequently called routines can also go there. The same reasoning applies to the D-cache, with the same effect on determinism: critical or frequently used data (C heap, C stack) should be placed in the DTCM for the best performance and/or determinism.
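As a rough illustration of the TCM placement described above, here is a minimal sketch using GCC section attributes. The section names `.itcm_text` and `.dtcm_data`, the macros `ITCM_CODE` / `DTCM_DATA`, and the small filter routine are all hypothetical: the linker script has to map those sections to the ITCM and DTCM address ranges, and the startup code has to copy the ITCM code and the initialized DTCM data from flash before they are used. None of that is generated automatically.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section names: they must match output sections that the
 * linker script places in ITCM / DTCM. On a non-ARM host build the macros
 * expand to nothing, so the logic can be unit-tested off target. */
#if defined(__GNUC__) && defined(__arm__)
#define ITCM_CODE __attribute__((section(".itcm_text"), noinline))
#define DTCM_DATA __attribute__((section(".dtcm_data")))
#else
#define ITCM_CODE
#define DTCM_DATA
#endif

#define FIR_TAPS 4u

/* Frequently accessed data: coefficients and delay line kept in DTCM. */
DTCM_DATA static int32_t fir_coeff[FIR_TAPS] = {1, 2, 2, 1};
DTCM_DATA static int32_t fir_state[FIR_TAPS];

/* Time-critical routine (for example called from a sampling interrupt):
 * running it from ITCM gives a fixed fetch latency, independent of the
 * I-cache state. */
ITCM_CODE int32_t fir_step(int32_t sample)
{
    /* Shift the delay line by one position. */
    for (size_t i = FIR_TAPS - 1u; i > 0u; --i) {
        fir_state[i] = fir_state[i - 1u];
    }
    fir_state[0] = sample;

    /* Multiply-accumulate over all taps. */
    int32_t acc = 0;
    for (size_t i = 0u; i < FIR_TAPS; ++i) {
        acc += fir_coeff[i] * fir_state[i];
    }
    return acc;
}
```

Only code and data that really need fixed latency should go there, since TCM is small compared with AXI SRAM; everything else can stay in cached memory.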

On the H7, the MPU must be programmed carefully (see AN4838) because the CPU has caches and can also perform speculative accesses. The usual approach is to define a background region with non-executable, strongly ordered attributes and then, on top of it, regions for the memories and peripherals actually used. A region used as a buffer for data exchange (USB, Ethernet, etc.) should be declared shareable in the MPU. On the Cortex-M7, shareable memory is treated as non-cacheable by default, so this effectively disables the cache for that region and removes the need for cache maintenance operations. It is of course possible to keep the region non-shareable (and therefore cacheable), but this requires manual cache maintenance (clean before the DMA reads the buffer, invalidate after the DMA has written it) together with synchronization barriers.
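To make the attribute combinations above more concrete, here is a small sketch that encodes them into the ARMv7-M `MPU_RASR` register layout (XN, AP, TEX, S, C, B, SRD, SIZE, ENABLE). The struct `mpu_region_attr_t`, the function `mpu_rasr_encode` and the two example regions are hypothetical names for illustration, not HAL API; in a CubeMX project the same choices are normally made through the fields of `MPU_Region_InitTypeDef` and `HAL_MPU_ConfigRegion()`. The subregion-disable value `0x87` for the background region and the 32 KB buffer size are assumptions and should be checked against the generated code and the memory map of the actual project. The buffer region uses TEX=1, C=0, B=0 (explicitly non-cacheable normal memory) with the shareable bit set, which is one common choice rather than the only valid one.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the ARMv7-M MPU_RASR register. */
#define RASR_XN_POS   28u
#define RASR_AP_POS   24u
#define RASR_TEX_POS  19u
#define RASR_S_POS    18u
#define RASR_C_POS    17u
#define RASR_B_POS    16u
#define RASR_SRD_POS   8u
#define RASR_SIZE_POS  1u
#define RASR_ENABLE    1u

/* Hypothetical helper type: one MPU region's attributes. */
typedef struct {
    uint8_t size_log2;         /* region size is 2^size_log2 bytes (5..32) */
    uint8_t ap;                /* 0 = no access, 3 = full access           */
    uint8_t tex;               /* type extension field                     */
    uint8_t shareable;         /* S bit                                    */
    uint8_t cacheable;         /* C bit                                    */
    uint8_t bufferable;        /* B bit                                    */
    uint8_t execute_never;     /* XN bit                                   */
    uint8_t subregion_disable; /* one bit per 1/8 of the region            */
} mpu_region_attr_t;

/* Pack the attributes into a RASR value with the region enabled. */
uint32_t mpu_rasr_encode(const mpu_region_attr_t *r)
{
    assert(r->size_log2 >= 5u && r->size_log2 <= 32u);
    assert(r->ap <= 7u && r->tex <= 7u);

    return ((uint32_t)(r->execute_never & 1u) << RASR_XN_POS)
         | ((uint32_t)r->ap                   << RASR_AP_POS)
         | ((uint32_t)r->tex                  << RASR_TEX_POS)
         | ((uint32_t)(r->shareable & 1u)     << RASR_S_POS)
         | ((uint32_t)(r->cacheable & 1u)     << RASR_C_POS)
         | ((uint32_t)(r->bufferable & 1u)    << RASR_B_POS)
         | ((uint32_t)r->subregion_disable    << RASR_SRD_POS)
         | ((uint32_t)(r->size_log2 - 1u)     << RASR_SIZE_POS) /* SIZE = log2 - 1 */
         | RASR_ENABLE;
}

/* Background region: whole 4 GB space, strongly ordered, no access,
 * execute-never. SRD = 0x87 is an assumed value that leaves some
 * subregions on the default memory map. */
const mpu_region_attr_t background_region = {
    .size_log2 = 32u, .ap = 0u, .tex = 0u,
    .shareable = 1u, .cacheable = 0u, .bufferable = 0u,
    .execute_never = 1u, .subregion_disable = 0x87u,
};

/* DMA exchange buffer (assumed 32 KB): normal memory, non-cacheable,
 * shareable, full access, execute-never. */
const mpu_region_attr_t dma_buffer_region = {
    .size_log2 = 15u, .ap = 3u, .tex = 1u,
    .shareable = 1u, .cacheable = 0u, .bufferable = 0u,
    .execute_never = 1u, .subregion_disable = 0x00u,
};
```

With these inputs, `mpu_rasr_encode(&background_region)` gives `0x1004873F` and `mpu_rasr_encode(&dma_buffer_region)` gives `0x130C001D`. The buffer region needs a higher region number than the background region so that its attributes take priority where the two overlap.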

On dual-core systems the issue is the same: it is better to give shared regions the shareable attribute. Otherwise, even though the M4 has no integrated cache, synchronization barriers are still required to make sure that the data has actually been drained from the write buffer to the shared memory.
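For the case where a shared region is left cacheable on the M7 side, the sequence "write, clean, barrier, then signal" could look roughly like the sketch below. The type `mailbox_t`, the functions `clean_range` and `mailbox_publish`, and the host-side counters are hypothetical; only `SCB_CleanDCache_by_Addr()` and `__DSB()` are real CMSIS calls, and they are compiled in only when `CORE_CM7` is defined, as it is in ST dual-core projects. Placing the mailbox in a RAM that both cores can reach is left to the linker script and is not shown. The 32-byte line size is that of the Cortex-M7 L1 D-cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 L1 D-cache line size in bytes */

#if defined(CORE_CM7)
#include "stm32h7xx.h"
static inline void clean_dcache(void *addr, size_t len)
{
    SCB_CleanDCache_by_Addr((uint32_t *)addr, (int32_t)len);
}
static inline void data_sync_barrier(void) { __DSB(); }
#else
/* Host stand-ins so the sequence can be exercised off target. */
static size_t host_cleaned_bytes;
static unsigned host_barriers;
static inline void clean_dcache(void *addr, size_t len)
{
    (void)addr;
    host_cleaned_bytes += len;
}
static inline void data_sync_barrier(void) { host_barriers++; }
#endif

/* Hypothetical mailbox: aligned and sized to whole cache lines so that
 * cleaning it never touches unrelated data sharing a line. */
typedef struct {
    _Alignas(32) volatile uint32_t ready;
    uint8_t payload[60];
} mailbox_t;

/* Clean every cache line overlapping [p, p + len). */
void clean_range(void *p, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = (uintptr_t)p & mask;
    uintptr_t end = ((uintptr_t)p + len + DCACHE_LINE_SIZE - 1u) & mask;
    clean_dcache((void *)start, (size_t)(end - start));
}

/* M7 side: make the payload visible in memory before raising the flag. */
void mailbox_publish(mailbox_t *mb, const void *data, size_t len)
{
    assert(len <= sizeof mb->payload);

    memcpy(mb->payload, data, len);
    clean_range(mb, sizeof *mb);   /* push payload out of the D-cache  */
    data_sync_barrier();           /* wait until the writes have landed */

    mb->ready = 1u;
    clean_range((void *)&mb->ready, sizeof mb->ready);
    data_sync_barrier();
}
```

On the M4 side, the reader would poll `ready`, issue its own barrier, and only then read the payload. If the region is declared shareable (non-cacheable) in the M7 MPU instead, the two clean calls become unnecessary and only the barriers remain.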

If your question is answered, please close this topic by clicking "Accept as Solution".

Thanks
Omar

mƎALLEm
ST Employee

Hello,

See also: 

AN4839 "Level 1 cache on STM32F7 Series and STM32H7 Series"

AN4891 "STM32H72x, STM32H73x, and single-core STM32H74x/75x system architecture and performance" (it also applies to dual-core products)

AN5557 "STM32H745/755 and STM32H747/757 lines dual-core architecture"

AN5617 "STM32H745/755 and STM32H747/757 lines inter-processor communications"

AN5361 "Getting started with projects based on dual-core STM32H7 microcontrollers in STM32CubeIDE"

See also this thread for the default MPU configuration set for the CM7 in CubeMX (background configuration).
