LwIP / CubeIDE V6.5.0, STM32H735_DISCO (https://github.com/stm32-hotspot/STM32H7-LwIP-Examples): .Rx_PoolSection (formerly .RxArraySection) seems not to be guarded by the MPU against cache coherency issues. Is this the right way?

Johi
Senior III

I have compared two STM32H735_DISCO_Eth examples, one built under CubeIDE V6.2.1 and a more recent one built with CubeIDE V6.5.0 (thanks to @Pavel A. for providing the reference to the new examples on GitHub).

The V6.5.0 example moved the Rx_PoolSection/RxArraySection from AHB D2 RAM to AXI D1 RAM. The MPU configuration in the example no longer seems to protect the Rx_PoolSection/RxArraySection area from cache coherency problems. Can such protection be omitted, and if so, why?

I made a very detailed .doc describing my findings; it is attached to this post.

8 REPLIES
Piranha (Accepted Solution)
Chief II

Your analysis seems to be correct. Similarly, I took a deeper look at some examples from the Cube packages...

Let's start with an example for NUCLEO-H743ZI. The address of RX_POOL is 0x30000400 for the GCC, Keil and IAR compilers. An offset of 0x400 is reasonable because it provides 1 KB of space for the descriptors. But then in the MPU configuration we have the address 0x30004000, which is wrong. There are two possibilities for how this happened. The first and obvious one is a simple typo, but the fact that the number still has 8 digits is suspicious. So should we change it to 0x30000400? No, because the address must also be aligned to the MPU region size, which is 16 KB = 0x4000 in this case. So now we see where the 0x4000 offset most likely came from - someone "fixed" the address to be aligned to the MPU region size and just didn't care about the consequences. Of course, the correct address is 0x30000000, and the descriptors must be configured in a region with a higher number than the region for the RX_POOL memory for the priorities to act as required.
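
For illustration, here is a minimal sketch of what the corrected setup could look like, following the CubeMX-generated MPU_Config() pattern; the strongly ordered attributes for the descriptor region mirror ST's usual choice, and the exact sizes are assumptions:

```c
/* A minimal sketch of the corrected MPU setup, following the CubeMX-generated
 * MPU_Config() pattern. Region sizes and descriptor attributes are
 * illustrative assumptions, not ST's actual code. */
static void MPU_Config(void)
{
  MPU_Region_InitTypeDef MPU_InitStruct = {0};

  HAL_MPU_Disable();

  /* Region 0: 16 KB at 0x30000000 (not 0x30004000!) covering both the
     descriptors and RX_POOL; normal, non-cacheable memory. */
  MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
  MPU_InitStruct.Number           = MPU_REGION_NUMBER0;
  MPU_InitStruct.BaseAddress      = 0x30000000;
  MPU_InitStruct.Size             = MPU_REGION_SIZE_16KB;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL1;
  MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /* Region 1: 1 KB for the DMA descriptors at the start of the same RAM,
     strongly ordered. The higher region number wins on overlap, which is
     exactly the priority ordering described above. */
  MPU_InitStruct.Number       = MPU_REGION_NUMBER1;
  MPU_InitStruct.Size         = MPU_REGION_SIZE_1KB;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
  MPU_InitStruct.IsShareable  = MPU_ACCESS_SHAREABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
```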

Then they ported this to the STM32H735G-DK board. They moved RX_POOL to 0x30000200 for the GCC and Keil compilers, left it at the previous address for the IAR compiler (descriptors also at the previous addresses) and, of course, did not fix the region address and order in the MPU configuration.

On top of that, at least the latest CubeMX v6.7.0 does not generate the additional RX_POOL declaration code at all, and because of that the pool is placed in the .bss section, which is again wrong.
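
For reference, in ST's examples that declaration boils down to an override of lwIP's LWIP_DECLARE_MEMORY_ALIGNED macro, roughly like this (a sketch in GCC syntax; the section name must match the one in the linker script):

```c
/* Sketch (GCC syntax): route lwIP's pool declarations into the dedicated
   linker section so RX_POOL does not end up in the default .bss. */
#define LWIP_DECLARE_MEMORY_ALIGNED(variable_name, size) \
  uint8_t variable_name[(size)] __attribute__((section(".Rx_PoolSection")))
```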

Because none of these broken examples actually worked, the developers tried to solve it by adding a D-cache invalidation in the HAL_ETH_RxLinkCallback() function. But, of course, they didn't do that flawlessly either. Correct cache maintenance also requires a D-cache invalidation before reception - in this code, in the HAL_ETH_RxAllocateCallback() function. Anyway, if there is cache maintenance code, there is no need for an MPU configuration for the RX_POOL memory. Doing both for the same purpose just shows that the developers don't understand what they are doing.
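
As a sketch of what complete Rx-side maintenance would look like in ST's zero-copy callbacks (the allocation and pbuf-chaining details from ethernetif.c are elided; ETH_RX_BUFFER_SIZE is the buffer size macro from ST's ETH configuration):

```c
/* Sketch of complete Rx-side cache maintenance in ST's zero-copy callbacks;
   allocation/chaining details from ethernetif.c are elided. */

void HAL_ETH_RxAllocateCallback(uint8_t **buff)
{
  /* ... allocate a buffer from RX_POOL into *buff, as in ST's example ... */
  if (*buff != NULL)
  {
    /* Invalidate BEFORE handing the buffer to the DMA, so no dirty cache
       line can later be evicted over the incoming frame. */
    SCB_InvalidateDCache_by_Addr((uint32_t *)*buff, ETH_RX_BUFFER_SIZE);
  }
}

void HAL_ETH_RxLinkCallback(void **pStart, void **pEnd, uint8_t *buff, uint16_t Length)
{
  /* Invalidate AFTER reception too, before the CPU reads the frame. */
  SCB_InvalidateDCache_by_Addr((uint32_t *)buff, Length);
  /* ... chain the pbuf into the pStart/pEnd list, as in ST's example ... */
}
```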

And, as if that weren't enough, I will remind you of another D-cache related flaw, which was reported long ago. Regardless of whether the driver uses an MPU configuration or cache maintenance functions for the Rx buffers, that cannot solve the cache coherence problems for the Tx buffers. First, Tx buffers can come from different sources, including the RX_POOL, when received packets are "returned" to the output. Second, the PBUF_ROM/PBUF_REF pbuf types can point to arbitrary memory addresses. Therefore the Tx buffers always need a D-cache clean operation, performed in the low_level_output() function.
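
Concretely, that clean is a short loop over the outgoing pbuf chain; a sketch (descriptor handling elided):

```c
/* Sketch of the Tx-side D-cache clean inside lwIP's low_level_output();
   descriptor setup and the actual transmission are elided. */
static err_t low_level_output(struct netif *netif, struct pbuf *p)
{
  for (struct pbuf *q = p; q != NULL; q = q->next)
  {
    /* Write back any cached payload so the ETH DMA reads the real data.
       Note: depending on the CMSIS version, the address/size may need to
       be aligned manually to the 32-byte cache line. */
    SCB_CleanDCache_by_Addr((uint32_t *)q->payload, q->len);
  }

  /* ... fill the Tx descriptors and start transmission, as in ST's code ... */
  return ERR_OK;
}
```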

Finally, I'll remind you that configuring specific memory regions for data buffers as non-cacheable is generally a very poor choice performance-wise. For a proper and decent solution, I recommend reading my article "Maintaining CPU data cache coherence for DMA buffers". Additionally, this topic has some useful related information.

@Imen DAHMEN, @Amel NASRI, yeah, yet another set of severe issues!

MSG_ST
ST Employee

Hi @Johi,

You could refer to the official STM32CubeH7 release available here, and you will find that the Rx_PoolSection is still declared in RAM_D2. However, I admit that the MPU management for this section is not very efficient.

So, please note these improvements planned for the next maintenance releases:

  • An MPU region (cacheable) will be added to manage RX pool section.
  • The mismatch between the IAR and CubeIDE address configurations will be fixed.

Regards

Mahdy

> An MPU region (cacheable) will be added to manage RX pool section.

All of the normal memory is cacheable by default. So what is the purpose of adding a cacheable region? There isn't one - it's just more complete nonsense from you... It actually needs to be a non-cacheable region.

Anyway, this still cannot fix the issues with Tx buffers and PBUF_ROM/PBUF_REF buffers, which I described in my previous post.

Piranha
Chief II

A slight update from my side on this broken junk. In my previous post I assumed that the 16 KB non-cacheable MPU memory region is meant for the Rx buffers. Taking a deeper look, one can see that ST defines LWIP_RAM_HEAP_POINTER to the same 0x30004000 address and that the MPU configuration code has a comment "... for LwIP RAM heap which contains the Tx buffers", which most likely means that the real intention was to make the lwIP heap memory non-cacheable and "suitable" for Tx buffers. This raises two severe issues:

  1. Assuming that all Tx buffers come from the lwIP heap memory is just plain wrong. Tx buffers can come from the heap, memory pools, the Rx pool, FLASH and basically any address via the PBUF_ROM/PBUF_REF pbuf types.
  2. There is no MPU configuration region for the Rx buffer pool and, as explained in the previous post, ST's code does not implement correct D-cache maintenance. As a result, the received data buffers will be corrupted by D-cache evictions.

A quick and simple workaround is to put the RX_POOL at address 0x30000400 and create a non-cacheable MPU region at address 0x30000000 with a size of 32 KB. That makes the Rx buffers non-cacheable and keeps the heap memory configuration as intended by ST. But such a workaround is still incomplete because, as I said, ST's assumption about the Tx buffers is just plain wrong.
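
In terms of the MPU sketch earlier in this thread, the workaround amounts to a single wider region, roughly like this (the function name is made up; attributes are illustrative):

```c
/* Sketch of the workaround: one 32 KB non-cacheable region covering the
   descriptors, the relocated RX_POOL at 0x30000400 and ST's lwIP heap.
   The function name is hypothetical; it follows the CubeMX MPU_Config()
   shape shown earlier. */
static void MPU_Config_Workaround(void)
{
  MPU_Region_InitTypeDef MPU_InitStruct = {0};

  HAL_MPU_Disable();

  MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
  MPU_InitStruct.Number           = MPU_REGION_NUMBER0;
  MPU_InitStruct.BaseAddress      = 0x30000000;
  MPU_InitStruct.Size             = MPU_REGION_SIZE_32KB;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL1;   /* normal memory */
  MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
```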

> ST's assumption about the Tx buffers is just plain wrong.

Need to see how this is addressed in the Azure/NetX drivers.

Obviously, if the ETH moves data using its DMA, and this DMA cannot access some MCU memory areas... maybe the low-level output function should check that the Tx buffers are in "good" memory, and fail otherwise.

The issue here is not accessibility, but cacheability. The whole point of making the lwIP heap non-cacheable is to "ensure" that the Tx buffers come from a non-cacheable memory region. As explained in my D-cache maintenance article, all of this broken nonsense can be removed after adding a D-cache clean to the Tx code. Apparently even a single trivial line of code is too much for ST's "geniuses"...

Although less serious, DMA accessibility is also a problem that ST hasn't really addressed. At least on some parts, there are memories that are not accessible by the Ethernet DMA. For example, on the STM32H750, the Ethernet DMA can access any internal or external memory except the TCM (which is CPU or MDMA only). To ST's credit, this limitation of their driver is actually documented in at least one place.

The normal solution is to copy either the offending pbuf or the entire chain. Copying the entire chain is simpler and likely not too wasteful, as the remainder of the chain is most likely just protocol headers. Here's an example in a TI driver for a part whose Ethernet DMA can't access flash memory. (One issue with it is that it allocates the new pbuf from the Rx pool (PBUF_POOL) instead of the heap (PBUF_RAM), which is normally used for Tx.)
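
In lwIP terms, copying the whole chain can be done with pbuf_clone(); a sketch, where dma_can_access() and ensure_dma_reachable() are hypothetical names for illustration:

```c
/* Sketch: clone the whole chain into heap (PBUF_RAM) pbufs when any payload
   lies in memory the ETH DMA cannot read. dma_can_access() is a made-up
   predicate for the part's memory map (e.g. "not in TCM/flash"). */
#include <stdbool.h>
#include "lwip/pbuf.h"

extern bool dma_can_access(const void *addr);  /* hypothetical helper */

static struct pbuf *ensure_dma_reachable(struct pbuf *p)
{
  for (struct pbuf *q = p; q != NULL; q = q->next)
  {
    if (!dma_can_access(q->payload))
    {
      /* pbuf_clone() allocates and copies the entire chain in one call;
         it returns NULL on out-of-memory, which the caller must handle. */
      return pbuf_clone(PBUF_RAW, PBUF_RAM, p);
    }
  }
  return p;  /* chain is already DMA-reachable; transmit it as-is */
}
```

Note that if a clone is transmitted, the driver must free it itself after transmission; the original chain remains owned and freed by the lwIP stack as usual.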

On the topic of copying, a driver with a non-blocking output function needs to check each pbuf in the chain with PBUF_NEEDS_COPY to see if it needs to copy it before queuing it for transmission (instead of just incrementing its reference count). I don't know of any publicly available netif drivers that need to do this and handle it correctly, so here's an example of proper queuing in the ARP module. ST's output function looks like it's blocking, but IIRC it actually isn't because of the initial semaphore count bug that they're refusing to fix.
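
A sketch of that check for a non-blocking driver (queue_for_tx() is a hypothetical driver enqueue function; PBUF_NEEDS_COPY(), pbuf_clone() and pbuf_ref() are standard lwIP 2.1 APIs):

```c
/* Sketch of the PBUF_NEEDS_COPY check for a non-blocking output path. */
#include <stdbool.h>
#include "lwip/pbuf.h"

extern err_t queue_for_tx(struct pbuf *p);  /* hypothetical enqueue */

static err_t queue_pbuf_for_tx(struct pbuf *p)
{
  bool needs_copy = false;

  /* If any pbuf in the chain may be modified by its owner before the DMA
     has read it, the driver must queue a private copy of the chain. */
  for (struct pbuf *q = p; q != NULL; q = q->next)
  {
    if (PBUF_NEEDS_COPY(q))
    {
      needs_copy = true;
      break;
    }
  }

  if (needs_copy)
  {
    p = pbuf_clone(PBUF_RAW, PBUF_RAM, p);
    if (p == NULL)
    {
      return ERR_MEM;
    }
  }
  else
  {
    pbuf_ref(p);  /* just hold a reference until transmission completes */
  }

  return queue_for_tx(p);
}
```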

mslugx
Associate II

Hi all, I just discovered this issue on my H743ZI2 board as well. I read through all of the replies, but I am still not sure what to do to fix this issue. Can someone tell me what the fix is? Thank you.