Can someone explain the MPU configuration needed for LWIP?

eduardo_reis · ‎2024-01-30

In "How do I create a project for STM32H7 with Ethernet and LwIP stack working?" by @Adam BERLINGER, we have the following MPU configuration:

Cortex-M7 configuration

This step can be skipped when using Cortex-M4 core.

Enable ICache and DCache.
Enable memory protection unit (MPU) in “Background Region Privileged access only + MPU Disabled ...” mode. Configure regions according to the picture below:

Can someone explain that in more detail?

Where does it say these regions should be protected from MPU?

What is the rationale behind each of those MPU regions?

Many thanks,

Eduardo.

tjaekel · ‎2024-01-30

The RM does not talk about LwIP memory configurations.

My "brief" overview about MPU and why it is needed for the LwIP stack:

The MPU is not just a "Memory Protection Unit" (sure, you can configure some regions as not writeable, or as not executable). The important thing here for LwIP is: it also configures which regions are cached, which cache policy applies, and which regions are not cached at all!

LwIP uses the ETH driver, and the ETH interface is DMA based. A DMA needs descriptors ("instructions") sitting in RAM, as well as the buffers the transfer goes from/to. The MCU has to update these descriptors before it kicks off a DMA transfer (the DMA is like another "master", another "core").

This leads to the "cache coherency" topic:

A DMA engine, e.g. in the ETH device, needs descriptors ("instructions") sitting in RAM.
But the MCU writes/updates such descriptors through its data cache.
With a write-back cache this means: the memory itself is not yet written/updated, the new values still sit only in the cache of the MCU (not in memory).
But the DMA engine of the ETH device reads only from memory, so it would not see any updates to the DMA descriptors (ETH DMA has no clue what sits in the MCU cache).
BTW: the same applies to data buffers:
If the MCU writes a new ETH packet, but it sits only in the data cache, the ETH device would send old data from memory. Or vice versa: the ETH DMA has placed new data in RAM, but the MCU still reads stale data from its cache because the cache was not invalidated (see the "cache clean" and "cache invalidate" functions).
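To make the "cache clean" / "cache invalidate" remark a bit more concrete, here is a minimal sketch (not from the original post). Cache maintenance on the Cortex-M7 works on whole 32-byte cache lines, so a buffer has to be widened to line boundaries first. The helper name `cache_range_for` and the struct are made up for illustration; the CMSIS calls appear only in the comment, and assume `core_cm7.h` is available on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 data cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u

/* Smallest run of whole cache lines that covers a buffer (hypothetical helper type). */
typedef struct {
    uintptr_t start;  /* rounded down to a cache line boundary */
    size_t    length; /* multiple of DCACHE_LINE_SIZE          */
} cache_range_t;

/* Widen [addr, addr + len) to cache line boundaries. */
cache_range_t cache_range_for(uintptr_t addr, size_t len)
{
    assert(len > 0);
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & ~mask;
    uintptr_t end   = (addr + len + mask) & ~mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* On the target (assumption: CMSIS core_cm7.h), the range would be used like this:
 *
 *   MCU wrote a buffer, DMA is about to read it:
 *       SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.length);
 *
 *   DMA wrote a buffer, MCU is about to read it:
 *       SCB_InvalidateDCache_by_Addr((uint32_t *)r.start, (int32_t)r.length);
 */
```

For example, a 60-byte buffer starting at 0x30000004 is widened to 64 bytes starting at 0x30000000. Note that invalidating a widened range also discards whatever else shares those cache lines, which is one reason DMA buffers are usually aligned to 32 bytes, or simply placed in a non-cached region as described in this post.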

Therefore, in combination with DMAs, esp. the ETH DMA used by LwIP, the MPU is used to define regions in order:

To specify a region which is not cached (or at least uses a "write-through" policy), so that the DMA can see what the MCU has written as new "instructions".
As I understand it: at least the DMA descriptors for the ETH have to sit in an "un-cached" (or at least "write-through") region. This is what the MPU config is used for here: to specify the "cache behavior" of a region. (Write-through only helps in the MCU-to-DMA direction; whatever the DMA writes still needs a cache invalidate before the MCU reads it.)
The same can apply to the transfer buffers: when ETH DMA and MCU both need to see the "latest" data in memory, the buffer region may need to be excluded from the cached RAM.
Configuring regions as "un-cached" via the MPU saves a lot of function calls, e.g. to clean or invalidate the MCU cache.

Without the MPU configured, and with the DCache enabled, the entire RAM is cacheable by default. The MPU regions now define exceptions: "this region has a different cache policy", or even "this region is never cached."

Otherwise, the DMA (not being aware of the MCU caches) and the MCU (using the DCache) would not be "up-to-date" with each other (not in sync, not cache coherent between MCU and DMA as "two different master cores").

So, in the case of LwIP, the MPU is mainly configured to "control" the cache functionality, esp. to "disable" caching for a region. Otherwise, MCU and DMA would not "understand" each other, because the MCU "looks through" its DCache while the ETH DMA looks directly at memory (whose content may not be updated yet).
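As a rough illustration of what such a "cache policy exception" looks like at register level, the sketch below packs region attributes into an ARMv7-M MPU_RASR value. The bit positions are those of the architecture; the function names `mpu_size_field` and `mpu_rasr` are made up here, and the attribute values in the usage note are assumptions, since the picture with the actual settings is not part of this thread.

```c
#include <assert.h>
#include <stdint.h>

/* SIZE field of MPU_RASR: the region spans 2^(SIZE+1) bytes, minimum 32 bytes. */
uint32_t mpu_size_field(uint64_t bytes)
{
    /* Region sizes must be a power of two between 32 bytes and 4 GB. */
    assert(bytes >= 32u && bytes <= (1ull << 32) && (bytes & (bytes - 1u)) == 0);
    uint32_t bits = 0;
    while ((1ull << bits) < bytes) {
        bits++;
    }
    return bits - 1u;
}

/* Pack the attribute fields into an MPU_RASR value (ARMv7-M bit layout). */
uint32_t mpu_rasr(uint64_t bytes, uint32_t srd, uint32_t tex,
                  uint32_t c, uint32_t b, uint32_t s,
                  uint32_t ap, uint32_t xn)
{
    return ((xn  & 0x1u)  << 28)          /* XN: instruction fetch disabled */
         | ((ap  & 0x7u)  << 24)          /* AP: access permissions         */
         | ((tex & 0x7u)  << 19)          /* TEX: type extension            */
         | ((s   & 0x1u)  << 18)          /* S: shareable                   */
         | ((c   & 0x1u)  << 17)          /* C: cacheable                   */
         | ((b   & 0x1u)  << 16)          /* B: bufferable                  */
         | ((srd & 0xFFu) << 8)           /* SRD: subregion disable mask    */
         | (mpu_size_field(bytes) << 1)   /* SIZE                           */
         | 0x1u;                          /* ENABLE                         */
}
```

With these assumptions, a 256-byte "normal, non-cacheable" region with full access (TEX=1, C=0, B=0, AP=3) gives `mpu_rasr(256, 0x00, 1, 0, 0, 0, 3, 0) == 0x0308000F`. In a CubeMX/HAL project the same fields are filled in through `MPU_Region_InitTypeDef` rather than by hand.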

Suggestion:
Compare the MPU regions with the __attribute__ annotations used in the LwIP/ETH driver code (to place data in a dedicated piece of memory) and with the linker script, where the matching sections are defined, e.g. where the ETH DMA descriptors or the ETH transfer buffers are located. Everything is "cross-connected": the memory "reservations" made there must be reflected in the MPU region definitions (start address and size, plus cache policy). All three must match: source code (e.g. __attribute__ to select the section or memory address), linker script (sections for DMA descriptors and buffers) and the MPU region config. Changing just one of them will break everything.
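One way to check that "all must match" rule is a small sanity test like the sketch below. It is only an illustration: the function names are hypothetical, and the addresses in the usage note are placeholders that have to be replaced with the values from your own linker script and MPU setup. It checks two things: that a region is legal for the ARMv7-M MPU (power-of-two size of at least 32 bytes, base aligned to the size), and that an object placed by the linker lies completely inside the region.

```c
#include <stdbool.h>
#include <stdint.h>

/* A region is legal if its size is a power of two >= 32 bytes
 * and its base address is a multiple of that size. */
bool mpu_region_is_valid(uint32_t base, uint64_t size)
{
    bool power_of_two = size != 0 && (size & (size - 1u)) == 0;
    return power_of_two && size >= 32u && (base % size) == 0;
}

/* True if the object [addr, addr + len) lies completely inside the region. */
bool mpu_region_covers(uint32_t base, uint64_t size, uint32_t addr, uint32_t len)
{
    uint64_t obj_end    = (uint64_t)addr + len;   /* 64-bit to avoid wrap-around */
    uint64_t region_end = (uint64_t)base + size;
    return addr >= base && obj_end <= region_end;
}
```

Assuming, purely as an example, a 256-byte non-cached region at 0x30040000 and two 96-byte descriptor tables at 0x30040000 and 0x30040060, both `mpu_region_covers()` calls return true. A table moved by the linker to 0x30040100 would fall outside the region and silently become cacheable again, which is exactly the kind of mismatch described above.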

Pavel A. · ‎2024-01-30

>Where does it say these regions should be protected from MPU?

Actually these MPU region settings are used to disable data caching on memory areas shared with the ETH via its own DMA. By default this memory is cacheable. Other attributes are less important. Note that region #1 has TEX level 1, which redefines the meaning of the Shareable/Cacheable/Bufferable bits. Please take your time and read the STM32H7 programming manual (or the original ARM Cortex-M7 documentation).
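To illustrate how the TEX level changes the meaning of the C and B bits, here is a small decoder for the ARMv7-M encoding. It is a sketch for reading along with the manual, not ST code; the enum and function names are invented for this example.

```c
#include <stdint.h>

typedef enum {
    MEM_STRONGLY_ORDERED,           /* TEX=0 C=0 B=0 */
    MEM_DEVICE_SHAREABLE,           /* TEX=0 C=0 B=1 */
    MEM_NORMAL_WRITE_THROUGH,       /* TEX=0 C=1 B=0 */
    MEM_NORMAL_WRITE_BACK,          /* TEX=0 C=1 B=1 */
    MEM_NORMAL_NON_CACHEABLE,       /* TEX=1 C=0 B=0 */
    MEM_NORMAL_WRITE_BACK_ALLOCATE, /* TEX=1 C=1 B=1, write and read allocate */
    MEM_DEVICE_NON_SHAREABLE,       /* TEX=2 C=0 B=0 */
    MEM_NORMAL_SPLIT_POLICY,        /* TEX=0b1xx: separate outer/inner cache policies */
    MEM_RESERVED_OR_IMPL_DEFINED    /* every other combination */
} mem_type_t;

/* Decode the TEX, C and B fields of an MPU region into a memory type. */
mem_type_t mpu_decode(uint32_t tex, uint32_t c, uint32_t b)
{
    if (tex & 0x4u) {
        return MEM_NORMAL_SPLIT_POLICY;
    }
    uint32_t key = ((tex & 0x3u) << 2) | ((c & 0x1u) << 1) | (b & 0x1u);
    switch (key) {
    case 0x0: return MEM_STRONGLY_ORDERED;
    case 0x1: return MEM_DEVICE_SHAREABLE;
    case 0x2: return MEM_NORMAL_WRITE_THROUGH;
    case 0x3: return MEM_NORMAL_WRITE_BACK;
    case 0x4: return MEM_NORMAL_NON_CACHEABLE;
    case 0x7: return MEM_NORMAL_WRITE_BACK_ALLOCATE;
    case 0x8: return MEM_DEVICE_NON_SHAREABLE;
    default:  return MEM_RESERVED_OR_IMPL_DEFINED;
    }
}
```

So "not cacheable, not bufferable" means strongly-ordered memory at TEX level 0, but normal non-cacheable memory at TEX level 1: `mpu_decode(0, 0, 0)` versus `mpu_decode(1, 0, 0)`.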

The region #0 is a common ST recommendation for any project on STM32H7, to avoid unwanted access to address ranges of external memories when these memories are not present. This region covers the whole 4GB space, with 4 "holes" punched (subregion disable mask 0b10000111) where the internal memories and peripheral registers are. The same is also applicable to STM32F7.

Tesla DeLorean · ‎2024-01-30

Disable caching/buffering of RAM shared for DMA read/write from the Ethernet, saves lots of coherency issues

Disabling caching of areas used by the MCU to build descriptors, which the Ethernet peripheral needs to read and walk, but not write.

Pavel A. · ‎2024-01-30

ETH also writes to the descriptors. Clears the OWN bit, writes status, timestamps and so on.

eduardo_reis · ‎2024-01-30

Hello @Pavel A. and @Tesla DeLorean. Thank you for your replies.

It sounds like Region #0 is not really required. What about regions #1 and #2?

I did try to read through the RM, but it is too dense. For someone with my level of expertise, it would take forever to digest it and make sense of this information just to find my way around it. So, I am asking for help to speed this process up. I would appreciate it if someone could trace the decision behind each of those parameters back to the RM. I imagine there are other people like me who would also appreciate such an explanation, as a stepping stone in learning STM32 development.

Does the RM really mention the LwIP memory configuration? I could not find references to "LwIP" in it.

tjaekel · ‎2024-01-30

This thread talks about the MPU config as well:

Nucleo-h743zi2 Ethernet Ping issue - STMicroelectronics Community

eduardo_reis · ‎2024-02-16

Hello @Pavel A.

Could you help me to understand this part?

@Pavel A. wrote:
The region #0 is a common ST recommendation for any project on STM32H7, to avoid unwanted access to address ranges of external memories when these memories are not present. This region covers the whole 4GB space, with 4 "holes" punched (subregion disable mask 0b10000111) where the internal memories and peripheral registers are. The same is also applicable to STM32F7.

After watching this video on MPU Sub-Region setting, my understanding is that the 4GB memory region is divided into 8 parts of size 0x20000000, hence Region #0 protects [0x20000000 - 0x9FFFFFFF].

From the RM0468, that seems to prevent access (ALL ACCESS NOT PERMITTED) to many if not all peripherals. What is the point of this? USART and TIMs are part of my application.

I know this Region #0 recommendation works, since it is also part of many sample codes. But there is something I am missing to fully get it.

Cheers,

Eduardo.

eduardo_reis · ‎2024-02-16

Hello @tjaekel, thank you so much for your extensive answer. It was very much appreciated and clarifying.

I will follow the suggestions and see if I can solve the underlying problem I had behind this question.

Cheers, and thank you.

Eduardo.

Pavel A. · ‎2024-02-16

@eduardo_reis Yes, an MPU region can be divided into 8 equal parts (subregions), provided the region size is at least 256 bytes. Each 1/8th part can be turned into a hole that exposes whatever lies underneath: a lower-numbered region or, as in the case of region #0, the hardware-defined default memory map. A bit with value 1 in the subregion disable mask means a hole in the corresponding 1/8th part. A hole means that the MPU region does not "catch" accesses to that address range but passes them through to the underlying region. The holes in the 4GB region #0 are from 0x00000000 to 0x5FFFFFFF, which covers the internal memories and peripherals, and from 0xE0000000 to 0xFFFFFFFF, which covers the ARM-defined registers (not shown in your pictures). So all the UARTs and TIMs remain accessible.
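The subregion arithmetic can be checked with a few lines of C. This is only a sketch of the rule described above, with a made-up function name; it models a single region and says whether that region "catches" a given address or lets it fall through a hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the region applies to addr: addr is inside the region and
 * its 1/8th subregion is NOT disabled (bit = 0 in the SRD mask). */
bool mpu_region_applies(uint32_t base, uint64_t size, uint8_t srd, uint32_t addr)
{
    assert(size >= 256u);                      /* smaller regions have no subregions */
    if (addr < base || (uint64_t)addr - base >= size) {
        return false;                          /* outside the region altogether */
    }
    uint64_t sub_size = size / 8u;             /* 0x20000000 for a 4GB region */
    uint32_t sub = (uint32_t)(((uint64_t)addr - base) / sub_size);
    return ((srd >> sub) & 0x1u) == 0;         /* bit set means hole */
}
```

For region #0 (base 0, size 4GB, mask 0b10000111 = 0x87), subregions 0, 1, 2 and 7 are holes, so the region only applies from 0x60000000 to 0xDFFFFFFF. An address in the peripheral range such as 0x40011000 falls through (`mpu_region_applies(0, 1ull << 32, 0x87, 0x40011000)` is false), while 0x60000000 is caught.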

Sorry if this is hard to grasp. That's what it is.