Linker file bugs in LwIP HTTP Server examples for GCC

Andrei Chichak · 2021-01-22

There are some problems with the .ld files for SW4STM32 for various processors in the LwIP demo projects in the CubeMX HAL distribution files.

For instance, in F7 version 1.16.0 for the 746 and 767 Nucleo boards, the RAM section overlaps the Ethernet DMA Rx and Tx descriptor sections and the Rx data array section (Memory_B[123]).

The 767 file declares a stack area of 0x1000 (4096) bytes, but the Tx data array (Memory_B4) extends into that area.
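
The collision is easy to confirm with a little address arithmetic. The C sketch below models linker MEMORY entries as half-open address ranges. The Memory_B4 origin and length (0x2007D8D0, 0x17D0) and the stack sitting at the top of SRAM (ending at 0x20080000) are assumed values for illustration only, so substitute the numbers from your own linker script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [origin, origin + length), like a MEMORY entry. */
typedef struct {
    uint32_t origin;
    uint32_t length;
} mem_range;

/* One past the last byte of the range. */
static uint32_t range_end(mem_range r)
{
    assert(r.length <= UINT32_MAX - r.origin); /* no 32-bit wrap-around */
    return r.origin + r.length;
}

/* Two half-open ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(mem_range a, mem_range b)
{
    return a.origin < range_end(b) && b.origin < range_end(a);
}

/* Number of bytes the two ranges share (0 if they are disjoint). */
static uint32_t overlap_bytes(mem_range a, mem_range b)
{
    if (!ranges_overlap(a, b))
        return 0;
    uint32_t lo = a.origin > b.origin ? a.origin : b.origin;
    uint32_t hi = range_end(a) < range_end(b) ? range_end(a) : range_end(b);
    return hi - lo;
}

/* ASSUMED layout: take the real numbers from your STM32F767 .ld file. */
#define RAM_END    0x20080000u /* top of SRAM, where the stack starts */
#define STACK_SIZE 0x1000u     /* 4096-byte stack area */

static const mem_range memory_b4  = { 0x2007D8D0u, 0x17D0u }; /* Tx data array */
static const mem_range stack_area = { RAM_END - STACK_SIZE, STACK_SIZE };
```

With these assumed numbers the Tx array ends at 0x2007F0A0 while the stack area starts at 0x2007F000, so `overlap_bytes(memory_b4, stack_area)` reports 0xA0 (160) bytes of collision.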

This would cause some very subtle failures, since these sections are accessed by DMA. It's not like you can check to see what code is screwing up your variables, because there is no code.

Andrei (from The Great White North)

Piranha · 2021-01-23

Indeed, but note that this is only one of a million other flaws, including much more critical ones:

https://community.st.com/s/question/0D50X0000BOtfhnSQB/how-to-make-ethernet-and-lwip-working-on-stm32

Anyway, I will add this one to the list... :)

Andrei Chichak · 2021-01-25

I don't know if this is an issue or not: in the 767 LwIP HTTP server example, the author sets up the MPU for the ETH control and data regions.

The implementation takes the top 16K of SRAM and marks it non-cacheable and non-bufferable. Then a higher-priority 256-byte region is overlaid for the control information (the descriptors), which is non-cacheable but bufferable.
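
To make the overlay concrete, here is a host-side C model (not HAL code) of the two MPU regions described above. It relies on two ARMv7-M MPU rules: the region size is encoded in the RASR SIZE field as 2^(SIZE+1) bytes, and where enabled regions overlap, the region with the higher number takes priority. The base address 0x2007C000 (the top 16K of the F767's SRAM) and the region numbers 0 and 1 are assumptions; on the target the equivalent settings would be applied through the HAL's MPU_Region_InitTypeDef and HAL_MPU_ConfigRegion().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified host-side model of an ARMv7-M MPU region (attributes only). */
typedef struct {
    unsigned number;   /* region number: higher number wins on overlap */
    uint32_t base;     /* must be aligned to size */
    uint32_t size;     /* power of two, at least 32 bytes */
    bool cacheable;
    bool bufferable;
} mpu_region;

/* RASR.SIZE encoding: region size in bytes = 2^(SIZE + 1). */
static uint32_t rasr_size_field(uint32_t size)
{
    assert(size >= 32u && (size & (size - 1u)) == 0u);
    uint32_t bits = 0;
    while ((1u << bits) < size)
        bits++;
    return bits - 1u;
}

/* The MPU requires the base address to be a multiple of the region size. */
static bool region_is_aligned(const mpu_region *r)
{
    return (r->base & (r->size - 1u)) == 0u;
}

static bool region_contains(const mpu_region *r, uint32_t addr)
{
    return addr >= r->base && addr - r->base < r->size;
}

/* Region whose attributes apply to addr, or NULL when no region matches
   (the default memory map then applies). */
static const mpu_region *resolve(const mpu_region *regs, size_t n, uint32_t addr)
{
    const mpu_region *hit = NULL;
    for (size_t i = 0; i < n; i++)
        if (region_contains(&regs[i], addr) &&
            (hit == NULL || regs[i].number > hit->number))
            hit = &regs[i];
    return hit;
}

/* ASSUMED layout: 16K base region plus a 256-byte overlay at the same base. */
static const mpu_region eth_regions[] = {
    { 0, 0x2007C000u, 16u * 1024u, false, false }, /* whole top 16K */
    { 1, 0x2007C000u, 256u,        false, true  }, /* descriptor overlay */
};
```

In this model an address near the top of SRAM, such as 0x2007FFF0, where a stack placed at the end of RAM would live, resolves to region 0, i.e. the stack inherits the non-cacheable, non-bufferable attributes.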

The weird part is that the 16K non-cacheable, non-bufferable region covers the stack as well.

I'm not sure if this is an issue. I tried to figure out what the default memory protection for SRAM is, but the docs really aren't clear on that.

Piranha · 2022-07-20

ST modified the files, but did not fix the absurd bugs...

https://community.st.com/s/question/0D53W00001fRfQ5SAK/linker-script-specifies-txdescripsection-memoryb2-but-txdescripsection-ends-up-in-ram-section-lwiphttpservernetcoonrtos-on-stmf767zi

MSG_ST · 2022-11-28

Hello,

Thank you for your feedback; it helps us continuously improve.

@Andrei Chichak The reason for marking the top 16K of SRAM as non-cacheable and non-bufferable is that we need to define a pool section there, where the LwIP RAM heap pointer is located. To ensure that the CPU's and the DMA's data/instructions stay synchronized, we configure it as non-cacheable and non-bufferable.

Below is the same answer we also posted in this thread: https://community.st.com/s/feed/0D53W00001fRfQ5SAK

There are confirmed issues with the LwIP_HTTP_Server_Netconn_RTOS application on STM32F7.

Here is the delta between F7 version 1.17.0 and the correction to be made in STM32F767ZITx_FLASH.ld:

Tx descriptor region:

.TxDescripSection (NOLOAD) : { *(.DMATxDscrTab_section) } >Memory_B2

will be replaced by:

.TxDescripSection (NOLOAD) : { *(.TxDescripSection) } >Memory_B2

RAM region length:

RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 497K

497K = 128K(DTCM) + 368K(SRAM1) + 1K(For Descriptors Section starting from 0x2007C000)

Sorry for the late answer and thank you again for your cooperation.

Regards

Mahdy

Piranha · 2022-12-04

So you will put the descriptors in the "Memory_B2" region, but also add space for them in the "RAM" region... Maybe try thinking before you write such absurd nonsense?
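
The objection can be checked with the numbers from the proposed fix itself. The C sketch below only redoes that arithmetic: 497K starting at 0x20000000 ends at 0x2007C400, so the last 1K of "RAM" is the same address range in which the descriptor section starts (0x2007C000), and those bytes end up described by two MEMORY regions at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KB(n) ((uint32_t)(n) * 1024u)

/* Numbers quoted in the proposed STM32F767ZITx_FLASH.ld change. */
#define RAM_ORIGIN  0x20000000u
#define RAM_LENGTH  (KB(128) + KB(368) + KB(1)) /* DTCM + SRAM1 + 1K */
#define DESC_ORIGIN 0x2007C000u                 /* descriptor section start */

static_assert(RAM_LENGTH == KB(497), "128K + 368K + 1K should be 497K");

/* One past the last byte of the proposed RAM region. */
static uint32_t ram_end(void)
{
    return RAM_ORIGIN + RAM_LENGTH;
}

/* True if the descriptor start address also lies inside RAM, i.e. the
   same bytes are handed to two different MEMORY regions. */
static bool ram_covers_descriptors(void)
{
    return DESC_ORIGIN >= RAM_ORIGIN && DESC_ORIGIN < ram_end();
}
```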

And is it too hard for ST's "developers" to realize that there is no need for separate memory sections and regions for Rx and Tx descriptors? Put both of those in a single section and region!
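
A sketch of the single-section approach, assuming GCC. The descriptor type eth_dma_desc (assumed here to be eight 32-bit words), the ring sizes, the section name .EthDescSection and the ETH_DESC region are hypothetical stand-ins; on the target you would use the HAL's ETH_DMADescTypeDef (check its real size in your HAL version) and your own names. The 1K limit is the space ST reserved for descriptors in the fix quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL stand-in for the HAL descriptor type; only its size and
   alignment matter for the memory layout. */
typedef struct {
    volatile uint32_t word[8];
} eth_dma_desc;

#define RX_DESC_COUNT    4u    /* assumed ring sizes */
#define TX_DESC_COUNT    4u
#define DESC_REGION_SIZE 1024u /* the 1K reserved for descriptors */

/* Both rings in one object, so one input section, one output section and
   one MEMORY region cover them. A matching GNU ld line could be:
     .EthDescSection (NOLOAD) : { *(.EthDescSection) } >ETH_DESC
   where ETH_DESC is a hypothetical 1K MEMORY region. */
typedef struct {
    eth_dma_desc rx[RX_DESC_COUNT];
    eth_dma_desc tx[TX_DESC_COUNT];
} eth_desc_tables;

static_assert(sizeof(eth_desc_tables) <= DESC_REGION_SIZE,
              "descriptor rings must fit in the reserved region");

/* 32-byte alignment matches the Cortex-M7 D-cache line size. */
__attribute__((section(".EthDescSection"), aligned(32)))
eth_desc_tables eth_desc;
```

Because the C attribute and the linker-script line name the same input section, a mismatch like the `.DMATxDscrTab_section` one above cannot silently push half of the descriptors into "RAM".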

Also, is it too hard for them to understand that DTCM is a significantly different memory and should not be merged with SRAM? Separate it and put the stack at the end of DTCM, at 0x20020000. One could also put the descriptors in DTCM, since that memory is not cached anyway, but there is an erratum possibly related to that.
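
The separated layout can be written down directly. The sketch below records the F767 RAM blocks mentioned in this thread (128K DTCM at 0x20000000, 368K SRAM1 right after it, 16K SRAM2 at 0x2007C000) as three regions and computes the initial stack pointer for a stack at the end of DTCM. The region names are hypothetical; the GNU ld equivalent, using the built-in ORIGIN() and LENGTH() functions, is given in the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KB(n) ((uint32_t)(n) * 1024u)

/* Hypothetical region names. Linker-script equivalent:
     MEMORY {
       DTCM  (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
       SRAM1 (xrw) : ORIGIN = 0x20020000, LENGTH = 368K
       SRAM2 (xrw) : ORIGIN = 0x2007C000, LENGTH = 16K
     }
     _estack = ORIGIN(DTCM) + LENGTH(DTCM);
*/
typedef struct {
    const char *name;
    uint32_t origin;
    uint32_t length;
} mem_region;

static const mem_region f767_ram[] = {
    { "DTCM",  0x20000000u, KB(128) },
    { "SRAM1", 0x20020000u, KB(368) },
    { "SRAM2", 0x2007C000u, KB(16)  },
};

static_assert(KB(128) + KB(368) + KB(16) == KB(512), "512K of RAM in total");

static uint32_t region_end(const mem_region *r)
{
    return r->origin + r->length;
}

/* Full-descending stack: SP starts one past the last byte of DTCM. */
static uint32_t dtcm_stack_top(void)
{
    return region_end(&f767_ram[0]);
}

/* True if every region starts exactly where the previous one ends. */
static bool regions_contiguous(const mem_region *r, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (r[i].origin != region_end(&r[i - 1]))
            return false;
    return true;
}
```

The blocks being contiguous is what makes merging them into one "RAM" possible in the first place; keeping them as separate regions is what lets the linker script state explicitly which data goes into the uncached DTCM.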

Andrei Chichak · 2022-12-04

Perhaps you can tone it down just a hell of a lot.

You might have a point, but being an ******* about it just turns everybody off and now nobody will be interested in anything that you have said.

Please don't comment on any of the items that I have posted ever again.

ST staff, can you take care of this person somehow?

Piranha · 2023-05-21

After taking a deeper look at a similar topic, it turns out the 16 KB memory region is not for the Rx buffer pool. Instead, for H7 they have implemented incomplete D-cache maintenance code, and for the F7 examples there is not even that. Knowing this, it becomes clear why for F7 they merge DTCM with SRAM: the beginning of the merged "RAM" region then consists of the non-cacheable DTCM. And since the RX_POOL and all other RAM data most likely (but not guaranteed) fit into that DTCM part, the Ethernet Rx buffers happen to work properly.

All in all, it is one stupidity "solved" by another. If placing the Rx buffers in DTCM were a conscious decision, it could be done in a civilized way by simply creating a dedicated section in the linker script, as is done for the descriptors.
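
A sketch of that alternative, assuming GCC: the Rx buffers get their own section, mapped explicitly to DTCM in the linker script, and a small range check can confirm at startup that they really landed there instead of relying on the merged "RAM" happening to start with DTCM. The buffer count and size, the array name rx_buffers, the section name .RxBufSection and the DTCM region name are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DTCM_ORIGIN 0x20000000u
#define DTCM_LENGTH (128u * 1024u)

#define RX_BUF_COUNT 4u    /* hypothetical */
#define RX_BUF_SIZE  1536u /* hypothetical, whole 32-byte cache lines */

static_assert(RX_BUF_SIZE % 32u == 0u, "keep buffers cache-line sized");

/* Dedicated section, mapped explicitly in the linker script, e.g.:
     .RxBufSection (NOLOAD) : { *(.RxBufSection) } >DTCM */
__attribute__((section(".RxBufSection"), aligned(32)))
uint8_t rx_buffers[RX_BUF_COUNT][RX_BUF_SIZE];

/* True if the whole object [addr, addr + len) lies inside DTCM. */
static bool in_dtcm(uintptr_t addr, uint32_t len)
{
    return addr >= DTCM_ORIGIN && len <= DTCM_LENGTH &&
           addr - DTCM_ORIGIN <= DTCM_LENGTH - len;
}
```

On the target, something like `assert(in_dtcm((uintptr_t)rx_buffers, sizeof rx_buffers));` early in startup would catch a layout change; on a host build that particular check fails by design, since the array is not at a DTCM address.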

In addition, CubeMX v6.8.1 with CubeF7 1.17.0 shows the warning "The ETH can work only when RAM is pointing at 0x24000000" for F7 devices, which is nonsense because that address is in the reserved range for these devices. Moreover, it does not include the memory regions and sections necessary for Ethernet in the generated linker script at all.

Pavel A. · 2023-05-21

Well, then F7 has a very strange DTCM: tightly coupled to the CPU, but accessible by a peripheral DMA (does this mean shareable?).

Piranha · 2023-05-22

The Cortex-M7 CPU core is the same on F7, H7 and other manufacturers' MCUs. The TCM memories are still connected directly to the core, but take a careful look at the core's internal TCM controller: it has an additional AHBS bus meant exactly for DMAs. So the question becomes what the AHBS is connected to on each device.

As can be seen in AN4667 "Figure 1. STM32F7 Series system architecture", on F7 the AHBS bus is connected as a slave to the main AHB bus matrix and is accessible from all bus masters. And, as can be seen in AN4891 "Figure 2. STM32H74x and STM32H75x system architecture", on H7 the AHBS bus is connected to MDMA and therefore only that peripheral can access TCM memories.

Here is ARM's documentation about the AHBS and its configuration. By default, AHBS accesses have a lower priority than software accesses, so software latency should not be impacted significantly. Also note that the DTCM is connected by a 2x32-bit interface, which even enables two simultaneous 32-bit accesses if there is no data dependency.