
M7 and H7 cache (shareable performance)

Franzi.Edo
Senior
Posted on November 27, 2017 at 11:15

Dear All,

I have some questions about the caches of the Cortex-M7 and H7 microcontrollers.

I use an external SDRAM with both the data and instruction caches enabled. If I define the SDRAM region as 'shareable', I lose a factor of 3 in system performance (just as if the cache were not enabled). Does anyone have experience with this? Here is my MPU initialization.

Shareable ... but bad performance

static void _MPU_Configuration(void) {
      MPU->CTRL = 0x00000000;     // Disable the MPU

      // Attributes for the SDRAM area (0xD0000000)
      MPU->RNR  = 0x00000000;     // Region 0
      MPU->RBAR = 0xD0000000;     // Base address
      MPU->RASR = (0<<28)         // XN: 0 executable
               | (3<<24)          // AP: 011 full read-write access
               | (0<<19)          // TEX: 000 normal (write-through with C = 1, B = 0)
               | (1<<18)          // S: 1 shareable
               | (1<<17)          // C: 1 cacheable
               | (0<<16)          // B: 0 non-bufferable
               | (0<<8)           // SRD: no sub-region disabled
               | (22<<1)          // SIZE: 2^(22+1) = 8 MB
               | (1<<0);          // Region enabled

      MPU->CTRL = (1<<2)          // PRIVDEFENA: default memory map as background for privileged accesses
               | (1<<1)           // HFNMIENA: MPU enabled during HardFault, NMI and FAULTMASK handlers
               | (1<<0);          // MPU enabled

      MEMO_SYNC_BARRIER;
      DATA_SYNC_BARRIER;
      INST_SYNC_BARRIER;

      // Enable branch prediction
      // Normally not necessary (always on)
      SCB->CCR |= (1<<18);
      DATA_SYNC_BARRIER;
}

Non-shareable ... and great performance

static void _MPU_Configuration(void) {
      MPU->CTRL = 0x00000000;     // Disable the MPU

      // Attributes for the SDRAM area (0xD0000000)
      MPU->RNR  = 0x00000000;     // Region 0
      MPU->RBAR = 0xD0000000;     // Base address
      MPU->RASR = (0<<28)         // XN: 0 executable
               | (3<<24)          // AP: 011 full read-write access
               | (0<<19)          // TEX: 000 normal (write-through with C = 1, B = 0)
               | (0<<18)          // S: 0 non-shareable
               | (1<<17)          // C: 1 cacheable
               | (0<<16)          // B: 0 non-bufferable
               | (0<<8)           // SRD: no sub-region disabled
               | (22<<1)          // SIZE: 2^(22+1) = 8 MB
               | (1<<0);          // Region enabled

      MPU->CTRL = (1<<2)          // PRIVDEFENA: default memory map as background for privileged accesses
               | (1<<1)           // HFNMIENA: MPU enabled during HardFault, NMI and FAULTMASK handlers
               | (1<<0);          // MPU enabled

      MEMO_SYNC_BARRIER;
      DATA_SYNC_BARRIER;
      INST_SYNC_BARRIER;

      // Enable branch prediction
      // Normally not necessary (always on)
      SCB->CCR |= (1<<18);
      DATA_SYNC_BARRIER;
}

Thank you for your advice,

Best regards

   Edo.

Posted on November 30, 2017 at 00:03

Correct, but the register you mentioned in the ARM documentation is indicated to be optional. In another ARM document, "System Memory Model", ARM states the following (at the level of the MPU):

[Screenshot: excerpt from the ARM "System Memory Model" document]

S selects whether the region is shareable or not!

      MPU->RASR = (0<<28)         // XN: 0 executable
               | (3<<24)          // AP: 011 full read-write access
               | (0<<19)          // TEX: 000 normal (write-through with C = 1, B = 0)
               | (1<<18)          // S: 1 shareable
               | (1<<17)          // C: 1 cacheable
               | (0<<16)          // B: 0 non-bufferable
               | (0<<8)           // SRD: no sub-region disabled
               | (22<<1)          // SIZE: 2^(22+1) = 8 MB
               | (1<<0);          // Region enabled

EF

Posted on November 30, 2017 at 01:40

Correct, but the register you mentioned in the ARM documentation is indicated to be optional.

That means the implementer (here: ST) may choose not to include it. In that case, there's no relief from the Shareable => Non-cacheable rule.

In another ARM document

That's a higher-level, more generic document; more precisely, it's an excerpt (or "application note", if you will) from the ARMv7-M ARM (Architecture Reference Manual). That describes the basic foundation: a generic processor with many options left to the implementations. It is the bottom of the hierarchy of documents, topped by the RM (reference manual) and DS (datasheet) of a particular product.

At that level, there's no explicit connection between shareability and cacheability yet, just a set of requirements for the implementation (see the screenshot below). What shareability implies (with regard to exclusive instructions, for example) is described in that ARM, but the details of cacheability are not dealt with at all. Cacheability may therefore simply be a signal output by the processor, and what to do with it is left to the implementation levels.

[Screenshot: ARMv7-M ARM excerpt listing the requirements for the implementation]

There are several implementations of ARMv7-M: Cortex-M3, Cortex-M4 and Cortex-M7. These are more specific, so their specifics are described in separate documents called TRMs (Technical Reference Manuals).

For example, the Cortex-M4 TRM basically just says that there is an MPU (optional, i.e. the implementer may omit it) and refers directly to the ARMv7-M ARM. That means the implementer may choose what to do with those signals: cacheability is conveyed in the AHB signals, so they could presumably have implemented a cache external to the processor, for example somewhere at the memory boundary. It's quite unlikely anybody would do that, as, frankly, generic caches are not very useful in true *controller* applications; it's just that with increasing performance and resources, people tend to (ab)use these chips for general-purpose computation, visual user interfaces and similar. If we go one more step down the hierarchy from generic to concrete, i.e. look at the choices ST made when implementing the Cortex-M4 in the STM32, these are described in PM0214, which says:

In STM32 implementations, the shareability and cache policy attributes do not affect the system behavior. However, using these settings for the MPU regions can make the application code more portable.

Now ARM created the Cortex-M7 as the top-notch implementation of ARMv7-M and decided to add a cache (that may have been popular demand, or maybe not: I am not very versed in AXIM, but I'd guess it would be vastly inefficient without caching). By adding a cache, it was up to them to decide how to ensure cache coherency of shareable memory regions, and they went for the simplest solution: shareable is not cacheable. This is documented in the Cortex-M7 TRM, which I quoted initially.

JW