Franzi.Edo
Senior
November 27, 2017
Question

M7 and H7 cache (sharable performance)

  • 3 replies
  • 3085 views
Posted on November 27, 2017 at 11:15

Dear All,

I have some questions concerning the cache of the Cortex-M7 and H7 microcontrollers.

I use an external SDRAM, and the data and instruction caches are activated. If I define the region as 'shareable', I lose a factor of 3 in system performance (just as if the cache were not activated). Does anyone have experience with this? Here is my MPU initialization.

Shareable ... but poor performance

static void _MPU_Configuration(void) {

      MPU->CTRL = 0x00000000;     // Disable the MPU

      // Attributes for the SDRAM area (0xD0000000)
      MPU->RNR  = 0x00000000;     // Region 0
      MPU->RBAR = 0xD0000000;     // Base address
      MPU->RASR = (0<<28)         // XN: 0 executable
                | (3<<24)         // AP: 011 read-write (full access)
                | (0<<19)         // TEX: 000 normal
                | (1<<18)         // S: 1 shareable
                | (1<<17)         // C: 1 cacheable
                | (0<<16)         // B: 0 non bufferable
                | (0<<8)          // SRD: no sub-region disabled
                | (22<<1)         // SIZE: 22, i.e. 2^(22+1) bytes = 8 MB
                | (1<<0);         // Region enabled

      MPU->CTRL = (1<<2)          // PRIVDEFENA: default memory map as background region
                | (1<<1)          // HFNMIENA: MPU stays enabled in HardFault and NMI handlers
                | (1<<0);         // MPU enabled

      MEMO_SYNC_BARRIER;
      DATA_SYNC_BARRIER;
      INST_SYNC_BARRIER;

      // Enable branch prediction
      // Normally not necessary (always on)
      SCB->CCR |= (1<<18);
      DATA_SYNC_BARRIER;
}

Non-shareable ... and great performance

static void _MPU_Configuration(void) {

      MPU->CTRL = 0x00000000;     // Disable the MPU

      // Attributes for the SDRAM area (0xD0000000)
      MPU->RNR  = 0x00000000;     // Region 0
      MPU->RBAR = 0xD0000000;     // Base address
      MPU->RASR = (0<<28)         // XN: 0 executable
                | (3<<24)         // AP: 011 read-write (full access)
                | (0<<19)         // TEX: 000 normal
                | (0<<18)         // S: 0 non shareable
                | (1<<17)         // C: 1 cacheable
                | (0<<16)         // B: 0 non bufferable
                | (0<<8)          // SRD: no sub-region disabled
                | (22<<1)         // SIZE: 22, i.e. 2^(22+1) bytes = 8 MB
                | (1<<0);         // Region enabled

      MPU->CTRL = (1<<2)          // PRIVDEFENA: default memory map as background region
                | (1<<1)          // HFNMIENA: MPU stays enabled in HardFault and NMI handlers
                | (1<<0);         // MPU enabled

      MEMO_SYNC_BARRIER;
      DATA_SYNC_BARRIER;
      INST_SYNC_BARRIER;

      // Enable branch prediction
      // Normally not necessary (always on)
      SCB->CCR |= (1<<18);
      DATA_SYNC_BARRIER;
}
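
For readers comparing the two listings: they differ only in the S bit (bit 18) of MPU->RASR. The following is a minimal host-side sketch, not part of the original code, that rebuilds both RASR values from their fields so the difference can be checked. The helper name rasr_cacheable_region and the RASR_* macros are made up for illustration; the bit positions are the ones used in the listings above.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in MPU_RASR, as used in the listings above. */
#define RASR_XN_POS    28u
#define RASR_AP_POS    24u
#define RASR_TEX_POS   19u
#define RASR_S_POS     18u
#define RASR_C_POS     17u
#define RASR_B_POS     16u
#define RASR_SRD_POS    8u
#define RASR_SIZE_POS   1u
#define RASR_ENABLE     1u

/* Hypothetical helper: RASR value for an executable, read-write, normal,
 * cacheable, non-bufferable region of 2^size_log2 bytes.
 * The SIZE field encodes the region size as 2^(SIZE+1) bytes. */
uint32_t rasr_cacheable_region(unsigned size_log2, int shareable)
{
    assert(size_log2 >= 5u && size_log2 <= 32u);   /* 32 bytes up to 4 GB */
    uint32_t size_field = (uint32_t)size_log2 - 1u;

    return (0u << RASR_XN_POS)                      /* executable          */
         | (3u << RASR_AP_POS)                      /* read-write          */
         | (0u << RASR_TEX_POS)                     /* TEX = 000           */
         | ((shareable ? 1u : 0u) << RASR_S_POS)    /* S bit under test    */
         | (1u << RASR_C_POS)                       /* cacheable           */
         | (0u << RASR_B_POS)                       /* non bufferable      */
         | (0u << RASR_SRD_POS)                     /* all sub-regions on  */
         | (size_field << RASR_SIZE_POS)
         | RASR_ENABLE;
}
```

For the 8 MB SDRAM region, rasr_cacheable_region(23, 1) gives 0x0306002D and rasr_cacheable_region(23, 0) gives 0x0302002D, so everything else in the two configurations is identical.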

Thank you for your advice,

Best regards

   Edo.

#m7-and-h7-cache-(sharable-or-not-)
This topic has been closed for replies.


Nesrine M_O
Associate
November 27, 2017
Posted on November 27, 2017 at 13:05

Hi Franzi.Edo,

I recommend having a look at this application note:

http://www.st.com/content/ccc/resource/technical/document/application_note/group0/0d/b5/e7/b7/47/0c/4a/ae/DM00306681/files/DM003066pdf/jcr:content/translations/en.DM003066pdf

The application note is provided with this embedded software package:

http://www.st.com/content/st_com/en/products/embedded-software/mcus-embedded-software/stm32-embedded-software/stm32cube-embedded-software-expansion/x-cube-perf-h7.html

The package includes the stm32h7x3_cpu_perf project, which demonstrates the performance of CPU memory accesses in different configurations, with code execution and data storage in different memory locations, using the L1 cache.

-Nesrine-

waclawek.jan
Super User
November 27, 2017
Posted on November 27, 2017 at 13:32

From the Cortex-M7 TRM:

"By default, only Normal, Non-shareable memory regions can be cached in the RAMs. Caching only takes place if the appropriate cache is enabled and the memory type is cacheable. Shared cacheable memory regions can be cached if CACR.SIWT is set to 1."

JW
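
As an illustration of the TRM passage above, here is a minimal sketch of setting CACR.SIWT. It is written against a plain pointer so it can be tried on a host; the function names are made up for illustration. The assumptions to verify against the TRM and your CMSIS headers are that CACR sits at 0xE000EF9C, that SIWT is bit 0, and that CMSIS exposes the register as SCB->CACR. As far as I know, with SIWT set the shared cacheable regions are treated as write-through, so performance may still differ from the non-shareable write-back case.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the Cortex-M7 TRM; check against your device headers. */
#define CACR_ADDRESS    0xE000EF9Cu   /* L1 Cache Control Register        */
#define CACR_SIWT_MSK   (1u << 0)     /* Shared cacheable-is-WT (D-cache) */

/* Hypothetical helper: set SIWT without touching the other CACR bits. */
void cacr_set_siwt(volatile uint32_t *cacr)
{
    assert(cacr != 0);
    *cacr |= CACR_SIWT_MSK;
}

/* Hypothetical helper: report whether SIWT is currently set. */
int cacr_siwt_is_set(const volatile uint32_t *cacr)
{
    assert(cacr != 0);
    return (*cacr & CACR_SIWT_MSK) != 0u;
}
```

On the target this would be called as cacr_set_siwt((volatile uint32_t *)CACR_ADDRESS), followed by the usual data and instruction barriers. Check the TRM for any restriction on when the register may be changed relative to enabling the caches.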

Franzi.Edo
Senior
November 27, 2017
Posted on November 27, 2017 at 16:21

Thank you Jan. That clarifies the situation.

Best regards

  Edo

T J
Senior III
November 28, 2017
Posted on November 28, 2017 at 01:28

What is the result?

Is it a fix, or is there a problem?

Sharing cache? Why would you need to do that?

Do you share with one or more DMAs or programs?

Are you discussing the security side?

Are you trying to offer APIs, but hide the code?

Sorry that I don't understand, but I would like to.

I have made a 208pin PCB where the H7 seems to be pin for pin and pin function for function drop in replacement

Franzi.Edo
Senior
November 29, 2017
Posted on November 29, 2017 at 19:20

Hi Clive,

Thank you for your comment. The gate count is not enough to explain such bad performance. In my tests, I set the Shareable bit but I use only one master, the CPU (the DMA is disabled). The performance of the cached SDRAM without the Shareable bit is great (10-15% slower than the internal SRAM).

The real effect of this bit on the memory bandwidth is still not clear to me.

Anyway, thanks

  Edo

Tesla DeLorean
Guru
November 29, 2017
Posted on November 29, 2017 at 20:33

On F4 parts, which don't cache SDRAM, execution speed is 6x slower. Try doing an XCHG to memory on your x86 box for some whiplash-inducing stalls.

Perhaps there are some 'ARM Certified Engineers' who can better explain the ramifications, or expected performance, of 'shareable'.

If you use DMA there are going to be coherency issues. You really want to use internal single-cycle memory for those tasks, and save the cache for computationally important data in slower memory.
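
If a DMA buffer does end up in cached memory, the usual approach is to clean or invalidate the cache over the buffer, and that works on whole cache lines. Below is a minimal host-side sketch of the address arithmetic only, assuming the 32-byte D-cache line of the Cortex-M7. The type cache_range_t and the function cache_align_range are made up for illustration. On the target, the resulting range would be passed to the CMSIS maintenance calls (SCB_CleanDCache_by_Addr or SCB_InvalidateDCache_by_Addr, assuming your CMSIS version provides them).

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u   /* Cortex-M7 D-cache line size in bytes */

/* Hypothetical type: a buffer range widened to whole cache lines. */
typedef struct {
    uintptr_t start;    /* rounded down to a line boundary            */
    uint32_t  length;   /* multiple of DCACHE_LINE_SIZE covering data */
} cache_range_t;

/* Hypothetical helper: widen [addr, addr + size) to cache-line bounds. */
cache_range_t cache_align_range(uintptr_t addr, uint32_t size)
{
    assert(size > 0u);
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + size + DCACHE_LINE_SIZE - 1u) & mask;

    cache_range_t range = { start, (uint32_t)(end - start) };
    return range;
}
```

Note that invalidating a widened range also discards any unrelated data sharing the first or last line, which is one reason DMA buffers are often aligned to and padded to 32 bytes.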
