on 2025-12-15 2:24 AM - edited on 2025-12-15 2:25 AM by Laurids_PETERSE
This article examines cache coherency issues specific to STM32 high-performance microcontrollers, describes the reasons behind their occurrence, and outlines how the memory protection unit (MPU) serves as the primary tool for configuring memory attributes to mitigate these problems. Practical implementation examples are also provided.
High-performance STM32 MCUs, which include the STM32F7, STM32H7, and STM32N6 series, offer substantial performance gains, largely attributed to their integrated level 1 (L1) ICACHE and DCACHE. These caches minimize memory latency by storing frequently used instructions and data close to the CPU. However, this performance enhancement introduces a critical challenge known as cache coherency, especially in systems where multiple bus masters, such as the CPU and DMA controllers, access the same memory regions.
Figure 1. STM32H7Rx/7Sx system architecture, AN6062
Cache coherency issues arise when the view of memory differs between masters.
Failure to manage coherency leads to data corruption, unpredictable behavior, and notoriously difficult-to-debug errors.
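The two failure modes can be illustrated with a toy model that runs on a host PC. This is only a sketch under simplifying assumptions: a single 32-byte line, a write-back cache in front of it, and a DMA master that bypasses the cache. The type `toy_line_t` and all function names are hypothetical and are not part of any ST or CMSIS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u /* one cache line, matching the 32-byte alignment used in this article */

/* Toy model: one SRAM line with one write-back cache line in front of it. */
typedef struct {
    uint8_t sram[LINE_SIZE];  /* what the DMA controller sees */
    uint8_t cache[LINE_SIZE]; /* what the CPU sees while the line is valid */
    bool valid;               /* the cache holds a copy of the line */
    bool dirty;               /* the cached copy is newer than SRAM */
} toy_line_t;

/* CPU read: served from the cache on a hit, filled from SRAM on a miss. */
static uint8_t cpu_read(toy_line_t *m, uint32_t off) {
    assert(off < LINE_SIZE);
    if (!m->valid) {
        memcpy(m->cache, m->sram, LINE_SIZE);
        m->valid = true;
        m->dirty = false;
    }
    return m->cache[off];
}

/* CPU write (write-back, write-allocate): only the cache is updated. */
static void cpu_write(toy_line_t *m, uint32_t off, uint8_t value) {
    (void)cpu_read(m, off); /* allocate the line on a miss */
    m->cache[off] = value;
    m->dirty = true;
}

/* DMA accesses go straight to SRAM and never consult the cache. */
static uint8_t dma_read(const toy_line_t *m, uint32_t off) {
    assert(off < LINE_SIZE);
    return m->sram[off];
}

static void dma_write(toy_line_t *m, uint32_t off, uint8_t value) {
    assert(off < LINE_SIZE);
    m->sram[off] = value;
}

/* Clean: write a dirty line back to SRAM. */
static void cache_clean(toy_line_t *m) {
    if (m->valid && m->dirty) {
        memcpy(m->sram, m->cache, LINE_SIZE);
        m->dirty = false;
    }
}

/* Invalidate: drop the cached copy so the next CPU read refetches from SRAM. */
static void cache_invalidate(toy_line_t *m) {
    m->valid = false;
    m->dirty = false;
}
```

In this model, a `cpu_write()` is invisible to `dma_read()` until `cache_clean()` runs, and a `dma_write()` is invisible to `cpu_read()` until `cache_invalidate()` runs. These correspond to the CPU-write/DMA-read and DMA-write/CPU-read problems discussed in this article.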
The MPU is a crucial component of the high-performance core. While its primary role is memory protection (preventing unauthorized access between memory regions), it is also the mechanism used to define memory attributes for different regions of the memory map. These attributes dictate how the memory system, including the caches, interacts with that region.
Key MPU attributes for cache coherency management are the type extension (TEX), cacheable (C), bufferable (B), and shareable (S) fields.
By configuring MPU regions covering shared memory areas (like DMA buffers), developers can dictate the caching behavior and enforce coherency.
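To make the attribute fields more concrete, the following host-side sketch packs TEX, C, B, and S into the layout of the ARMv7-M `MPU_RASR` register, which is the MPU found on the Cortex-M7 in STM32F7 and STM32H7. It assumes the ARMv7-M bit positions; cores with an ARMv8-M MPU encode memory attributes differently, so this does not carry over to them. The names `mem_attr_t`, `rasr_encode`, and the `ATTR_*` constants are hypothetical helpers, not HAL or CMSIS identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the ARMv7-M MPU_RASR fields used here. */
#define RASR_ENABLE_POS 0u
#define RASR_SIZE_POS   1u  /* region size = 2^(SIZE + 1) bytes */
#define RASR_B_POS      16u /* bufferable */
#define RASR_C_POS      17u /* cacheable */
#define RASR_S_POS      18u /* shareable */
#define RASR_TEX_POS    19u /* type extension, 3 bits */
#define RASR_AP_POS     24u /* access permission, 3 bits */
#define RASR_XN_POS     28u /* execute never */

/* Hypothetical container for the four memory attribute fields. */
typedef struct {
    uint8_t tex;
    uint8_t c;
    uint8_t b;
    uint8_t s;
} mem_attr_t;

/* The three attribute sets discussed in this article. */
static const mem_attr_t ATTR_DEVICE        = { 0u, 0u, 1u, 1u }; /* device, bufferable */
static const mem_attr_t ATTR_WRITE_THROUGH = { 0u, 1u, 0u, 0u }; /* normal, write-through */
static const mem_attr_t ATTR_WRITE_BACK    = { 1u, 1u, 1u, 0u }; /* normal, write-back, write-allocate */

/* Build an MPU_RASR value for an enabled region with no subregions disabled. */
static uint32_t rasr_encode(mem_attr_t a, uint32_t size_field, uint32_t ap, bool xn) {
    assert(a.tex <= 7u && a.c <= 1u && a.b <= 1u && a.s <= 1u);
    assert(size_field >= 4u && size_field <= 31u); /* 32 bytes up to 4 GB */
    assert(ap <= 7u);
    return ((uint32_t)xn << RASR_XN_POS)
         | (ap << RASR_AP_POS)
         | ((uint32_t)a.tex << RASR_TEX_POS)
         | ((uint32_t)a.s << RASR_S_POS)
         | ((uint32_t)a.c << RASR_C_POS)
         | ((uint32_t)a.b << RASR_B_POS)
         | (size_field << RASR_SIZE_POS)
         | (1u << RASR_ENABLE_POS);
}
```

For example, `rasr_encode(ATTR_WRITE_BACK, 9u, 3u, true)` describes a 1 KB, full-access, execute-never, write-back region. In a real project, the HAL function `HAL_MPU_ConfigRegion()` performs the equivalent packing from the `MPU_Region_InitTypeDef` fields.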
Below are the primary strategies, along with examples based on the STM32H7. The examples assume that txBuffer and rxBuffer are located within a specific SRAM region (for example, starting at 0x30000000) and are aligned to 32 bytes.
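The alignment assumption can be expressed directly in the buffer declarations. The sketch below uses the GCC/Clang `aligned` attribute and a compile-time check on the buffer size. The 128-byte size and the helper `is_cache_line_aligned` are illustrative assumptions. Placing the buffers at a specific address additionally requires a section attribute and a matching linker script entry, which are project-specific and only hinted at in the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32  /* alignment assumed throughout this article */
#define BUFFER_SIZE     128 /* example size, a multiple of the line size */

/* On the target, a section attribute would be added here as well, for example
 * __attribute__((section(".dma_buffer"))), where ".dma_buffer" is a hypothetical
 * section that the linker script maps into the shared SRAM region. */
#define DMA_BUFFER_ATTR __attribute__((aligned(CACHE_LINE_SIZE)))

uint8_t txBuffer[BUFFER_SIZE] DMA_BUFFER_ATTR;
uint8_t rxBuffer[BUFFER_SIZE] DMA_BUFFER_ATTR;

/* A buffer that is not a whole number of lines shares its last line with other data. */
static_assert(BUFFER_SIZE % CACHE_LINE_SIZE == 0,
              "DMA buffer size must be a multiple of the cache line size");

/* Run-time check, useful for buffers that are passed in by pointer. */
static bool is_cache_line_aligned(const void *p) {
    return ((uintptr_t)p % CACHE_LINE_SIZE) == 0u;
}
```

Keeping both the start address and the size on 32-byte boundaries means that no unrelated variable shares a cache line with a DMA buffer.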
The first strategy is to make the shared region non-cacheable. This is the simplest and safest approach, because it prevents the CPU from caching the shared region altogether.
Configure the MPU region containing the DMA buffers as non-cacheable (for example, normal non-cacheable or, often preferred for peripherals and shared buffers, device memory). All CPU accesses then bypass the DCACHE, ensuring that the CPU and DMA always see the same data in SRAM.
For the MPU configuration, device memory (TEX=0b000, C=0, B=1) provides stricter ordering and is often suitable for peripheral or shared buffers. The bufferable attribute allows writes to complete faster from the CPU’s perspective.
#include "stm32h7xx_hal.h" // Include appropriate HAL header

// Assume txBuffer/rxBuffer are within 1 KB starting at 0x30000000
#define SHARED_MEM_BASE ((uint32_t)0x30000000) // Example base address
#define SHARED_MEM_SIZE MPU_REGION_SIZE_1KB    // Example size

void MPU_Config_Device_Bufferable(void) {
    MPU_Region_InitTypeDef MPU_InitStruct = {0};

    /* Disable MPU before configuration */
    HAL_MPU_Disable();

    /* Configure the MPU region as Device Bufferable */
    MPU_InitStruct.Enable = MPU_REGION_ENABLE;
    MPU_InitStruct.Number = MPU_REGION_NUMBER0; // Use an available region number
    MPU_InitStruct.BaseAddress = SHARED_MEM_BASE;
    MPU_InitStruct.Size = SHARED_MEM_SIZE;
    MPU_InitStruct.SubRegionDisable = 0x00; // No subregions disabled
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS; // Full R/W access
    MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE; // Or DISABLE if no code executes here

    /* Memory Attributes: Device Bufferable */
    MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0; // TEX[2:0] = 000
    MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE; // C = 0
    MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE; // B = 1
    MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE; // S = 1 (Recommended for Device)
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    /* Enable the MPU */
    HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT); // Or MPU_HFNMI_PRIVDEF
}
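Pros: simple and safe; no software cache maintenance is required.
Cons: potentially slower CPU access to the shared region, because every access bypasses the DCACHE.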
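When choosing the base address and size of a shared region, keep in mind that the ARMv7-M MPU only accepts power-of-two region sizes of at least 32 bytes, with the base address aligned to the region size. The following host-side sketch checks these constraints and derives the SIZE field value. The function name `mpu_size_field` is hypothetical, and the sketch does not handle the 4 GB region size.

```c
#include <assert.h>
#include <stdint.h>

/* Return the MPU_RASR SIZE field for a region, or -1 if the base/size pair
 * is not valid for an ARMv7-M MPU. Region size = 2^(SIZE + 1) bytes. */
static int mpu_size_field(uint32_t base, uint32_t size_bytes) {
    if (size_bytes < 32u) {
        return -1; /* smaller than the minimum region size */
    }
    if ((size_bytes & (size_bytes - 1u)) != 0u) {
        return -1; /* not a power of two */
    }
    if ((base & (size_bytes - 1u)) != 0u) {
        return -1; /* base address not aligned to the region size */
    }

    int log2_size = 0;
    while ((size_bytes >> log2_size) != 1u) {
        log2_size++;
    }
    assert(log2_size >= 5 && log2_size <= 31);
    return log2_size - 1;
}
```

For the 1 KB region at 0x30000000 used in the examples, the function returns 9, the same numeric value as the HAL constant `MPU_REGION_SIZE_1KB`. A base of 0x30000010 with the same size is rejected because it is not aligned to 1 KB.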
The second strategy, write-through caching, is a compromise in which CPU writes update both the cache and main memory.
To do this, an MPU region is configured as cacheable with a write-through policy (TEX=0b000, C=1, B=0, S=0). Because every CPU write also reaches SRAM, this solves the CPU-write/DMA-read problem without software cleaning. However, the DMA-write/CPU-read problem remains and requires software invalidation.
MPU configuration for the write-through policy:
#include "stm32h7xx_hal.h"

#define SHARED_MEM_BASE ((uint32_t)0x30000000)
#define SHARED_MEM_SIZE MPU_REGION_SIZE_1KB

void MPU_Config_Cacheable_WT(void) {
    MPU_Region_InitTypeDef MPU_InitStruct = {0};

    HAL_MPU_Disable();

    /* Configure the MPU region as Cacheable Write-Through */
    MPU_InitStruct.Enable = MPU_REGION_ENABLE;
    MPU_InitStruct.Number = MPU_REGION_NUMBER1; // Use a different region number
    MPU_InitStruct.BaseAddress = SHARED_MEM_BASE;
    MPU_InitStruct.Size = SHARED_MEM_SIZE;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;

    /* Memory Attributes: Normal, Write-Through, No Write-Allocate */
    MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0; // TEX[2:0] = 000
    MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE; // C = 1
    MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE; // B = 0
    MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE; // S = 0
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
Software maintenance:
// In DMA RX Complete Callback:
// Invalidate D-Cache for the rxBuffer BEFORE CPU reads it
SCB_InvalidateDCache_by_Addr((uint32_t*)rxBuffer, BUFFER_SIZE);
// Now CPU can read rxBuffer
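Pros: CPU reads are cached, and no cache clean is needed before the DMA reads CPU-written data.
Cons: the DCACHE must still be invalidated in software before the CPU reads DMA-written data.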
The third strategy aims for maximum CPU performance by allowing full caching with a write-back policy (TEX=0b001, C=1, B=1, S=0), but it requires careful software cache maintenance.
#include "stm32h7xx_hal.h"

#define SHARED_MEM_BASE ((uint32_t)0x30000000)
#define SHARED_MEM_SIZE MPU_REGION_SIZE_1KB

void MPU_Config_Cacheable_WB(void) {
    MPU_Region_InitTypeDef MPU_InitStruct = {0};

    HAL_MPU_Disable();

    /* Configure the MPU region as Cacheable Write-Back */
    MPU_InitStruct.Enable = MPU_REGION_ENABLE;
    MPU_InitStruct.Number = MPU_REGION_NUMBER2; // Use another region number
    MPU_InitStruct.BaseAddress = SHARED_MEM_BASE;
    MPU_InitStruct.Size = SHARED_MEM_SIZE;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;

    /* Memory Attributes: Normal, Write-Back, Write-Allocate */
    MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1; // TEX[2:0] = 001
    MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE; // C = 1
    MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE; // B = 1
    MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE; // S = 0
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);

    // Optional but recommended: Invalidate D-Cache once after enabling
    // if its state is unknown.
    // SCB_InvalidateDCache();
}
Software maintenance:
#define BUFFER_SIZE 128
// Assume txBuffer/rxBuffer declared and aligned(32)
// Before starting DMA TX:
// CPU writes to txBuffer...
SCB_CleanDCache_by_Addr((uint32_t*)txBuffer, BUFFER_SIZE);
// Start DMA TX...
// In DMA RX Complete Callback:
// DMA finishes writing to rxBuffer...
SCB_InvalidateDCache_by_Addr((uint32_t*)rxBuffer, BUFFER_SIZE);
// CPU reads rxBuffer...
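Pros: highest potential performance, because both CPU reads and writes are cached.
Cons: complex to manage; software must clean the DCACHE before the DMA reads CPU-written data and invalidate it before the CPU reads DMA-written data.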
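Cache maintenance by address operates on whole cache lines, which is why the examples assume 32-byte-aligned buffers. The host-side sketch below computes the line-aligned range that a clean or invalidate of an arbitrary buffer would actually touch, and reports whether the buffer owns its lines exclusively. The names `cache_range_t`, `cache_align_range`, and `cache_range_is_exclusive` are hypothetical helpers for illustration, not CMSIS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u /* alignment assumed throughout this article */

typedef struct {
    uintptr_t addr; /* start of the first cache line touched */
    size_t size;    /* number of bytes covered, a multiple of the line size */
} cache_range_t;

/* Expand [addr, addr + size) to the cache lines that fully contain it. */
static cache_range_t cache_align_range(uintptr_t addr, size_t size) {
    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
    assert(size > 0u);
    uintptr_t start = addr & mask;
    uintptr_t end = (addr + size + (CACHE_LINE_SIZE - 1u)) & mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* True if the buffer starts and ends on line boundaries, so that maintenance
 * on it cannot affect neighboring variables. */
static bool cache_range_is_exclusive(uintptr_t addr, size_t size) {
    return (addr % CACHE_LINE_SIZE == 0u) && (size % CACHE_LINE_SIZE == 0u);
}
```

For example, a 10-byte buffer at 0x30000004 expands to the 32-byte line at 0x30000000. Invalidating that line would also discard any dirty data that other variables hold in the same line, which is the main reason to keep DMA buffers aligned to and sized in multiples of 32 bytes.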
The L1 caches on the STM32 are vital for performance but necessitate careful management of cache coherency, particularly when DMA controllers share memory with the CPU. The MPU is the essential tool for defining memory region attributes and implementing a coherency strategy.
The choice of configuration for the shared region depends critically on the application’s specific requirements, data access patterns, and the developer’s tolerance for complexity. Each configuration has its pros and cons. For example, configuring the shared region as non-cacheable is simple and safe, but CPU access to the region is potentially slower. In contrast, a fully cacheable region with software maintenance offers the highest potential performance but is complex to manage.
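The trade-off can be summarized as a small lookup of the software maintenance that each strategy requires. This is a sketch that restates the three strategies of this article in code form; the enum, struct, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The three strategies for a shared CPU/DMA region. */
typedef enum {
    REGION_NON_CACHEABLE = 0, /* device or normal non-cacheable */
    REGION_WRITE_THROUGH,     /* cacheable, write-through */
    REGION_WRITE_BACK,        /* cacheable, write-back */
    REGION_POLICY_COUNT
} region_policy_t;

typedef struct {
    bool clean_before_dma_read;      /* CPU wrote the buffer, DMA will read it */
    bool invalidate_after_dma_write; /* DMA wrote the buffer, CPU will read it */
} maintenance_t;

static const maintenance_t MAINTENANCE[REGION_POLICY_COUNT] = {
    [REGION_NON_CACHEABLE] = { false, false }, /* accesses bypass the DCACHE */
    [REGION_WRITE_THROUGH] = { false, true },  /* SRAM is current, cache may be stale */
    [REGION_WRITE_BACK]    = { true,  true },  /* both directions need maintenance */
};

/* Look up the software maintenance required for a given strategy. */
static maintenance_t required_maintenance(region_policy_t policy) {
    assert((unsigned)policy < (unsigned)REGION_POLICY_COUNT);
    return MAINTENANCE[policy];
}
```

A driver could consult such a table to decide whether to call `SCB_CleanDCache_by_Addr()` before starting a DMA transmit and `SCB_InvalidateDCache_by_Addr()` in the receive-complete callback.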
Regardless of the chosen strategy, meticulous MPU configuration and thorough testing are essential for building robust and reliable STM32 high-performance applications.