STM32H7 Ethernet Transmit DMA with LwIP - Another Ethernet driver problem

ADunc.1 · ‎2025-04-15

Not a bug as such, but a nuisance nonetheless, and likely to catch out those less familiar with the quirks of the H7 Ethernet MAC DMA...

If you set up Ethernet and LwIP the recommended way (as partially done by STM32CubeMX), with the Ethernet transmit and receive descriptors, receive buffers and LwIP memory pools within Ethernet DMA reachable RAM (RAM D2, 0x30000000 ...), then for the most part everything works.

If you use the LwIP sockets API, it also works, because the sockets API copies data to be sent on a socket into LwIP buffers, ensuring that the data is in Ethernet DMA reachable memory.

BUT, if you use the netconn API, which is much faster and lighter weight than the sockets API, functions like netconn_write do not copy the send data unless you ask them to. The data makes it all the way to the Ethernet driver as zero copy, which is nice. Unfortunately, it is likely that the sent data is on the stack of the calling task or on the general heap, which makes it unreachable for the Ethernet DMA.

This results in an Ethernet DMA error, after which Ethernet is disabled and no longer works.

As a minimum, a really nice improvement to the driver would be to check the memory range of the transmit data is Ethernet DMA reachable and assert. At least that way you could understand what you have done and save hours of debugging.

A better improvement would be to copy the memory at that point into a LwIP buffer before calling Ethernet transmit. It is a LwIP driver, so no harm there. That is a much better solution than having to do it all through user code. And generally only the user data part of the pbuf chain is not already in the right RAM.

Pavel A. · ‎2025-04-15

> As a minimum, a really nice improvement to the driver would be to check the memory range of the transmit data is Ethernet DMA reachable and assert

Good idea. Low level output function in ethernetif.c can check the addresses and assert. Copying requires more thought and management of TX buffer memory.
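A minimal sketch of such a check, for illustration only. The helper name `eth_dma_reachable` and both macros are hypothetical. The base address is the D2 address quoted above; the 288 KB size is an assumed example and should be taken from the reference manual or linker script of the actual part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: one contiguous DMA-reachable window starting at the D2 base
 * address mentioned in the post. The size is an example value only. */
#define ETH_DMA_RAM_BASE  0x30000000u
#define ETH_DMA_RAM_SIZE  (288u * 1024u)

/* Hypothetical helper: returns true only if the whole range
 * [addr, addr + len) lies inside the window. Written with subtraction
 * so that addr + len can never overflow. */
static bool eth_dma_reachable(uintptr_t addr, size_t len)
{
    if (addr < ETH_DMA_RAM_BASE) {
        return false;
    }
    uintptr_t offset = addr - ETH_DMA_RAM_BASE;
    if (offset > ETH_DMA_RAM_SIZE) {
        return false;
    }
    return len <= ETH_DMA_RAM_SIZE - offset;
}
```

In the low level output function, while walking the pbuf chain with a pointer `q`, the check could then be something like `assert(eth_dma_reachable((uintptr_t)q->payload, q->len));` so that a bad buffer stops at the call site instead of silently killing the DMA.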

ADunc.1 · ‎2025-04-15

Copying is not too difficult, I don't think. The driver already loops through the pbuf chain, creating a matching tx buffer set. If it discovers that a pbuf is pointing to out-of-range RAM, it can allocate a new pbuf, copy that data into it, and point the tx buffer at that one instead. Of course it needs to keep track of the pbufs created so it can free them, but that could be done by chaining them as they are created, so only one free is required.

There is a risk of an allocation failure of course...

pbufs are guaranteed to be allocated in DMA reachable RAM if LwIP memory has been put in the correct region, as is required for zero copy receive buffers and as is done in all the examples and by STM32CubeMX.
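The idea can be sketched on a host machine as below. This is not driver code: `struct seg` is a simplified stand-in for one entry of the tx buffer set, `bounce_pool` stands in for freshly allocated pbufs, and `dma_reachable` simply treats the pool itself as the only reachable region so the sketch can run anywhere. All of these names are hypothetical. On a target, the reachability test would compare against the real RAM window and the copies would come from LwIP's allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for one tx buffer entry built from a pbuf
 * (NOT the real lwIP or HAL structure). */
struct seg {
    struct seg *next;
    const void *payload;
    size_t      len;
};

#define BOUNCE_SLOTS 4
#define BOUNCE_SIZE  1536

/* Hypothetical bounce pool; on target it would live in DMA-reachable RAM. */
static uint8_t bounce_pool[BOUNCE_SLOTS][BOUNCE_SIZE];

/* Host-side stand-in: "reachable" means "entirely inside bounce_pool". */
static bool dma_reachable(const void *p, size_t len)
{
    uintptr_t a    = (uintptr_t)p;
    uintptr_t base = (uintptr_t)bounce_pool;
    if (a < base || a - base > sizeof bounce_pool) {
        return false;
    }
    return len <= sizeof bounce_pool - (a - base);
}

/* Walk the chain. Reachable segments are left alone (zero copy is kept);
 * unreachable ones are copied into the next free slot and repointed.
 * Returns the number of slots used, or -1 if the pool is exhausted or a
 * segment does not fit, in which case the caller should drop the frame. */
static int fixup_chain(struct seg *head)
{
    assert(head != NULL);
    int used = 0;
    for (struct seg *q = head; q != NULL; q = q->next) {
        if (dma_reachable(q->payload, q->len)) {
            continue;
        }
        if (used == BOUNCE_SLOTS || q->len > BOUNCE_SIZE) {
            return -1; /* the allocation-failure case mentioned above */
        }
        memcpy(bounce_pool[used], q->payload, q->len);
        q->payload = bounce_pool[used];
        used++;
    }
    return used;
}
```

In a real driver the copies must stay alive until the transmit completes, which is where chaining the newly allocated pbufs together and freeing them once would come in.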

Pavel A. · ‎2025-04-15

Yes, chaining pbufs is a clever trick!

ASEHST · ‎2025-04-16

Hello,

Thank you for this insightful analysis and report.

The zero-copy feature is designed to enhance performance by eliminating unnecessary data copying, which may take extra execution time.

However, it can lead to issues if the data resides in memory regions inaccessible to the Ethernet DMA. To ensure proper functionality, it's crucial for users to verify that the transmission data is in a DMA-accessible memory region, typically SRAM. If the data is not accessible, using the NETCONN_COPY parameter in the netconn_write API will copy the data into SRAM, making it accessible to the DMA and preventing transmission errors.

With regards,

ADunc.1 · ‎2025-04-16

I agree about zero copy. Zero copy would still be achieved by the driver if the data is in Ethernet DMA reachable RAM. But by also copying data that is out of bounds, all bases are covered. The copy needs to be made somewhere, either by the user or by the driver.

Expecting the user to know where RAM is allocated, and then decide whether they need NETCONN_COPY, is not ideal. It makes user code microcontroller dependent. Or you just need to always copy to be safe.

It is just a suggestion that, for a handful of lines of code in the transmit function, the driver could be robust, work in all situations, allow platform independent user code and still be zero copy where possible.