2017-11-27 02:15 AM
Dear All,
I have some questions concerning the cache of the Cortex-M7 & H7 microcontrollers.
I use an external SDRAM, and the data and instruction caches are activated. Now, if I define the region as 'shareable', I lose a factor of 3 in system performance (just as if the cache were not activated). Has anyone experienced this? Here is my MPU initialization.
Shareable ... but bad performance
static void _MPU_Configuration(void) {
    MPU->CTRL = 0x00000000;      // Disable the MPU
    // Attributes for the SDRAM area (0xD0000000)
    MPU->RNR  = 0x00000000;      // Region 0
    MPU->RBAR = 0xD0000000;      // Address
    MPU->RASR = (0<<28)          // XN: 0 executable
              | (3<<24)          // AP: 11 read-write
              | (0<<19)          // TEX: 000 normal
              | (1<<18)          // S: 1 shareable
              | (1<<17)          // C: 1 cacheable
              | (0<<16)          // B: 0 non bufferable
              | (0<<8)           // Sub-region disable
              | (22<<1)          // 8-MB
              | (1<<0);          // Region enabled
    MPU->CTRL = (1<<2)           // PRIVDEFENA: default map is the background for privileged accesses
              | (1<<1)           // HFNMIENA: MPU stays enabled during HardFault and NMI
              | (1<<0);          // MPU enabled
    MEMO_SYNC_BARRIER;
    DATA_SYNC_BARRIER;
    INST_SYNC_BARRIER;
    // Enable branch prediction
    // Normally not necessary (always on)
    SCB->CCR |= (1<<18);
    DATA_SYNC_BARRIER;
}
Non-shareable ... and great performance
static void _MPU_Configuration(void) {
    MPU->CTRL = 0x00000000;      // Disable the MPU
    // Attributes for the SDRAM area (0xD0000000)
    MPU->RNR  = 0x00000000;      // Region 0
    MPU->RBAR = 0xD0000000;      // Address
    MPU->RASR = (0<<28)          // XN: 0 executable
              | (3<<24)          // AP: 11 read-write
              | (0<<19)          // TEX: 000 normal
              | (0<<18)          // S: 0 non shareable
              | (1<<17)          // C: 1 cacheable
              | (0<<16)          // B: 0 non bufferable
              | (0<<8)           // Sub-region disable
              | (22<<1)          // 8-MB
              | (1<<0);          // Region enabled
    MPU->CTRL = (1<<2)           // PRIVDEFENA: default map is the background for privileged accesses
              | (1<<1)           // HFNMIENA: MPU stays enabled during HardFault and NMI
              | (1<<0);          // MPU enabled
    MEMO_SYNC_BARRIER;
    DATA_SYNC_BARRIER;
    INST_SYNC_BARRIER;
    // Enable branch prediction
    // Normally not necessary (always on)
    SCB->CCR |= (1<<18);
    DATA_SYNC_BARRIER;
}
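For anyone comparing the two configurations: the two RASR words differ only in bit 18 (S). Below is a minimal sketch of how the word is assembled from its fields, so the two values can be checked on a PC without the target. The helper names (rasr_size_field, rasr_encode) are made up for illustration and are not part of CMSIS or any ST library; the bit positions are the ones used in the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: SIZE field of MPU_RASR for a power-of-two region.
 * The region covers 2^(SIZE+1) bytes, so 8 MB (2^23) gives SIZE = 22. */
static uint32_t rasr_size_field(uint32_t bytes)
{
    /* Regions must be a power of two and at least 32 bytes. */
    assert(bytes >= 32u && (bytes & (bytes - 1u)) == 0u);
    uint32_t n = 0u;
    while ((1u << n) < bytes) {
        n++;
    }
    return n - 1u;
}

/* Hypothetical helper: build an enabled-region RASR word from its fields,
 * using the same bit positions as the posted configuration. */
static uint32_t rasr_encode(uint32_t xn, uint32_t ap, uint32_t tex,
                            uint32_t s, uint32_t c, uint32_t b,
                            uint32_t srd, uint32_t bytes)
{
    return ((xn  & 0x01u) << 28)            /* XN: execute never       */
         | ((ap  & 0x07u) << 24)            /* AP: access permission   */
         | ((tex & 0x07u) << 19)            /* TEX: type extension     */
         | ((s   & 0x01u) << 18)            /* S: shareable            */
         | ((c   & 0x01u) << 17)            /* C: cacheable            */
         | ((b   & 0x01u) << 16)            /* B: bufferable           */
         | ((srd & 0xFFu) << 8)             /* SRD: sub-region disable */
         | (rasr_size_field(bytes) << 1)    /* SIZE                    */
         | 1u;                              /* ENABLE                  */
}
```

With these helpers, rasr_encode(0, 3, 0, 1, 1, 0, 0, 8u*1024u*1024u) gives 0x0306002D for the shareable variant and the same call with s = 0 gives 0x0302002D, so the only difference between the slow and the fast setup is 0x00040000.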
Thank you for your advice,
Best regards
Edo.
#m7-and-h7-cache-(sharable-or-not-)
2017-11-29 04:03 PM
Correct, but the register you mentioned in the ARM documentation is indicated to be optional. In another ARM document, "System Memory Model", ARM indicates this (at the level of the MPU):
S is for shareable or not!
MPU->RASR = (0<<28)          // XN: 0 executable
          | (3<<24)          // AP: 11 read-write
          | (0<<19)          // TEX: 000 normal
          | (1<<18)          // S: 1 shareable
          | (1<<17)          // C: 1 cacheable
          | (0<<16)          // B: 0 non bufferable
          | (0<<8)           // Sub-region disable
          | (22<<1)          // 8-MB
          | (1<<0);          // Region enabled
EF
2017-11-29 05:40 PM
Correct, but the register you mentioned in the ARM documentation is indicated to be optional.
That means the implementor (here: ST) may choose not to include it. In that case there's no relief from the Shareable => Non-Cacheable rule.
In another ARM document
That's a higher-level, more generic document; more precisely, it's an excerpt (or "application note" if you will) from the ARMv7-M ARM. That describes the basic foundation, a generic processor with many options left to implementations; it is the bottom of the hierarchy of documents, topped by the RM and DS of a particular product.
At that level, there's no explicit connection between shareability and cacheability yet, just a set of requirements for the implementation - see screenshot below. The details of what shareability implies (with regard to exclusive instructions, for example) are described in that ARM; the details of cacheability are not dealt with at all, so it may simply be a signal output from the processor, with the decision of what to do with it left to one of the implementation levels.
There are several implementations of ARMv7-M: Cortex-M3, Cortex-M4, Cortex-M7. These are more specific, so their specifics are described in other documents, called TRMs (Technical Reference Manuals).
For example, the Cortex-M4 TRM basically just says that there is an MPU (optional, i.e. the implementor may omit it) and refers directly to the ARMv7-M ARM. That means the implementor may choose what to do with those signals - cacheability is contained in the AHB signals, so they could presumably have implemented a cache external to the processor, somewhere at the memory boundary for example. It's quite unlikely anybody would do that, as, frankly, generic caches are not very useful in true *controller* applications; it's just that with increasing performance and resources people tend to (ab)use these chips for general-purpose computation, visual user interfaces and similar.
If we go one more step down the hierarchy from generic to concrete, i.e. look at the choices ST made when implementing the Cortex-M4 in the STM32, these are described in PM0214, saying:
"In STM32 implementations, the shareability and cache policy attributes do not affect the system behavior. However, using these settings for the MPU regions can make the application code more portable."
Now ARM created the Cortex-M7 as the top-notch implementation of ARMv7-M, and decided to add a cache (it may have been by popular demand, but maybe not: I am not very well versed in AXIM, but I'd guess it would be vastly inefficient without caching). Having added a cache, it was now up to them to decide how to ensure cache coherency for shareable memory regions, and they went for the simplest solution: shareable is not cacheable. This is documented in the Cortex-M7 TRM, from which I quoted initially.
JW