Ethernet MAC, TCP Checksum offload stops working

LKert · ‎2022-07-13

Hi,

I'm trying to push a lot of data over Ethernet using TCP. It works for a while, then the TCP checksums disappear; the IPv4 header checksums are still there. This happens after anywhere from 1-2 MB to 25 GB, depending on whether the client on the PC is written in Python or C. It happens more often with Python, but I don't think that is relevant.

In RM0090 there's a statement "You must make sure the Transmit FIFO is deep enough to store a complete frame before that frame is transferred to the MAC Core transmitter. If the FIFO depth is less than the input Ethernet frame size, the payload (TCP/UDP/ICMP) checksum insertion function is bypassed and only the frame’s IPv4 Header checksum is modified, even in Store-and-forward mode." <- this describes exactly what I'm seeing. The problem is that once it starts to happen, it never recovers on its own. The same thing happens with one 60-byte packet a second.

How can one make sure the FIFO is deep enough?

Or is this always true because I'm using TCP with a 1460-byte MSS -> 1516-byte maximum frame size, which always fits into the FIFO (by the time the previous frame has been completely sent to the MAC Core transmitter)?
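The frame-size arithmetic behind this question can be sketched as a small host-side calculation. This is not driver code. The header sizes assume IPv4 and TCP without options and no VLAN tag, and the 2 KB Tx FIFO depth (TX_FIFO_BYTES) is an assumption that should be checked against the reference manual for the exact part. All names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes: IPv4 and TCP headers without options, untagged Ethernet. */
enum {
    ETH_HDR_BYTES  = 14,   /* dst MAC + src MAC + EtherType */
    IPV4_HDR_BYTES = 20,
    TCP_HDR_BYTES  = 20,
    ETH_FCS_BYTES  = 4,    /* normally appended by the MAC on the wire */
    TX_FIFO_BYTES  = 2048  /* assumption: verify in the reference manual */
};

/* Bytes handed to the MAC for one TCP segment with `mss` payload bytes. */
static uint32_t frame_bytes_for_mss(uint32_t mss)
{
    return ETH_HDR_BYTES + IPV4_HDR_BYTES + TCP_HDR_BYTES + mss;
}

/* True if such a frame fits into the (assumed) Tx FIFO depth. */
static bool frame_fits_tx_fifo(uint32_t mss)
{
    return frame_bytes_for_mss(mss) <= TX_FIFO_BYTES;
}
```

Under these assumptions an MSS of 1460 gives 1514 bytes (1518 on the wire including the FCS), which is below 2048, so a single full-sized frame should fit.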

MCU is STM32F427ZG

This is the config:

ETH->MACCR |= (ETH_MACCR_FES |
               ETH_MACCR_ROD |
               ETH_MACCR_IPCO |
               ETH_MACCR_DM);

ETH->MACFFR = 0;
ETH->MACFCR = 0;

ETH->DMAOMR = (ETH_DMAOMR_DTCEFD |
               ETH_DMAOMR_RSF |
               ETH_DMAOMR_TSF |
               ETH_DMAOMR_OSF);

ETH->DMABMR = (ETH_DMABMR_AAB |
               ETH_DMABMR_FB |
               ETH_DMABMR_RTPR_4_1 |
               (32 << 17) | // RX burst length
               (32 << 8) |  // TX burst length
               ETH_DMABMR_USP |
               ETH_DMABMR_EDE);

kind regards,

Lorand

LCE · ‎2022-11-23

Hello @LKert

have you found a solution for that problem?

Same here with an STM32F767.

I just found that the TCP header checksum suddenly becomes 0, then all applications fail.

I am also using TCP streaming for lots of data; HTTP behaves the same.

LCE · ‎2022-11-24

I checked all relevant registers and the descriptors again; everything is set as it should be.

LKert · ‎2022-11-24

Hi @Community member,

sadly no, I've tried everything but nothing helped.

In the end I enabled the software checksum for now, hoping that someone here might have an idea. The software solution is significantly slower on the F427 I'm using, though: about 30% less throughput, 4 MB/s with the offload versus 2.7 MB/s in software.
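To illustrate what the software fallback has to do for every segment, here is a minimal sketch of the Internet checksum from RFC 1071. It is not lwIP's implementation, and the function names are made up. In lwIP the switch between hardware and software is normally made with the CHECKSUM_GEN_* options (for example CHECKSUM_GEN_TCP) in lwipopts.h. A real TCP checksum also covers the pseudo-header, which is why the sum function takes a seed.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum over a byte buffer, folded to 16 bits (RFC 1071).
 * `sum` is a seed, e.g. the partial sum of the TCP pseudo-header. */
static uint16_t inet_sum16(const uint8_t *data, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];  /* big-endian 16-bit word */
        data += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)data[0] << 8;              /* pad odd byte with zero */
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);        /* fold carries back in */
    }
    return (uint16_t)sum;
}

/* Checksum field value: ones' complement of the folded sum. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    return (uint16_t)~inet_sum16(data, len, 0);
}
```

Every payload byte is touched once by the CPU, which fits the throughput drop when the offload is switched off.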

For me the checksum calculation stops if I'm pushing a lot of data through, which I am, downloading 100s of GBs of data.

If you find anything, please let me know as well, I'm pushing to get 7-8MB/s minimum in the coming year, but the project is shelved at the moment.

LCE · ‎2022-11-24

Thanks for the info.

I'll let you know if I find the solution.

Until now I have checked:

the TX descriptors (in F7: DESC0 = Status: CIC bits set)
DMA registers
lwIP: TCP header checksum = 0 before handed to MAC
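The descriptor check from the list above could look roughly like this. It is a sketch only: the bit positions (CIC in bits 23:22 of TDES0, value 3 meaning full IP header, payload and pseudo-header checksum insertion) are taken from my reading of the reference manuals and should be verified for the exact part. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of Tx descriptor word 0 (TDES0); verify in the RM. */
#define TDES0_CIC_SHIFT 22u
#define TDES0_CIC_MASK  (3u << TDES0_CIC_SHIFT)
#define TDES0_CIC_FULL  (3u << TDES0_CIC_SHIFT)  /* header + payload + pseudo-header */

/* Extract the 2-bit checksum insertion control field. */
static unsigned tdes0_cic(uint32_t tdes0)
{
    return (tdes0 & TDES0_CIC_MASK) >> TDES0_CIC_SHIFT;
}

/* True if the descriptor requests full hardware checksum insertion. */
static bool tdes0_full_offload(uint32_t tdes0)
{
    return (tdes0 & TDES0_CIC_MASK) == TDES0_CIC_FULL;
}
```

Calling tdes0_full_offload() on each descriptor's first word right before ownership is given to the DMA is one way to rule out a descriptor that lost its CIC bits.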

I just found a TX packet in Wireshark that did not have the correct length, so the MAC didn't calculate the checksum, and then the transfer broke down.

The TX packet with incorrect length:

"good" TCP header
but with checksum = 0
TCP data was crap, looked like parts of the IP header (e.g. included the MAC address)
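A check like the following could flag such frames automatically in captured data. It is a minimal sketch with made-up names, assuming untagged Ethernet II frames carrying IPv4. It is meant for frames as seen on the wire; with offload enabled, a zero checksum in the buffer before the MAC is expected. Frames of 60 bytes or less may contain padding, so the length comparison is relaxed for those.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    FRAME_OK            = 0,
    FRAME_NOT_IPV4_TCP  = 1 << 0,
    FRAME_TRUNCATED     = 1 << 1,
    FRAME_LEN_MISMATCH  = 1 << 2,  /* IP total length disagrees with frame length */
    FRAME_TCP_CSUM_ZERO = 1 << 3
};

/* Read a big-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Inspect one frame (without FCS) and return a bitmask of FRAME_* flags. */
static unsigned check_tcp_frame(const uint8_t *f, size_t len)
{
    const size_t eth = 14;  /* Ethernet header length, no VLAN tag assumed */
    if (len < eth + 20) return FRAME_TRUNCATED;
    if (rd16(f + 12) != 0x0800 || (f[eth] >> 4) != 4 || f[eth + 9] != 6)
        return FRAME_NOT_IPV4_TCP;

    size_t ihl = (size_t)(f[eth] & 0x0F) * 4;   /* IPv4 header length in bytes */
    size_t ip_total = rd16(f + eth + 2);        /* IPv4 total length field */
    if (ihl < 20 || len < eth + ihl + 20) return FRAME_TRUNCATED;

    unsigned flags = FRAME_OK;
    if (ip_total + eth > len || (len > 60 && ip_total + eth != len))
        flags |= FRAME_LEN_MISMATCH;
    if (rd16(f + eth + ihl + 16) == 0)          /* TCP checksum field */
        flags |= FRAME_TCP_CSUM_ZERO;
    return flags;
}
```

A frame like the one described here would be expected to come back with both FRAME_LEN_MISMATCH and FRAME_TCP_CSUM_ZERO set.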

LCE · ‎2022-11-25

No solution yet.

Some more testing, but before that I "separated" the internal memories via linker script to make sure there are no leaks / spill-overs / whatsoever. So the lwIP's heap and the local buffers used as source for sending TCP packets, as well as STM32 stack and heap are in their own memory areas.

Still the same problem:

all of a sudden TCP packets sent from STM32 have a TCP header checksum = 0.

Windows side doesn't like that, no more ACKs coming back to STM32.

Then TCP aborts the connection and TCP hangs completely.

UDP is still working, and so is the rest of the STM32. Only everything related to TCP (streaming and HTTP) is dead.

lwIP stats don't show any errors.

This usually happens after lots of data have been transmitted.

Right now at about 8 GB, still running...

waclawek.jan · ‎2022-11-25

Maybe this.

JW

LCE · ‎2022-11-25

Thanks Jan, but I found that info in the RM, and already checked lwIP.

And right before handing a TCP segment to the MAC it resets the header checksum:

seg->tcphdr->chksum = 0;

in tcp_out.c, tcp_output_segment(), called by tcp_output()

It's amazing that it works for millions of packets most of the time, then all of a sudden, BAM...

LCE · ‎2022-11-27

So, I played around a lot with:

1) checking the map file where variables are placed

2) changing linker file and placing variables in different sections

Result of 1): with the standard linker file (all SRAM is treated the same), the compiler doesn't care about DTCM / SRAM1 / SRAM2, so buffers and variables cross these borders.

Result of 2): this had a huge impact: it got worse, which confirms that we have an SRAM placement problem.

All variables and buffers used by ETH DMA must be in the same RAM section, it seems.

Here's another discussion confirming that:

https://en-nut-discussion.egnite.narkive.com/B2Hlv0eb/stm32f756-strange-effect

I'll check what's best, where to put what.

The problem: I need huge TX buffers (as much as possible of F767's 512 kB).
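A runtime check for buffers that straddle a RAM boundary could be sketched like this. The address map below (DTCM 128 KB at 0x20000000, SRAM1 368 KB at 0x20020000, SRAM2 16 KB at 0x2007C000) is my assumption for the F76x family and must be verified against the reference manual and the linker script. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    size;
} ram_region_t;

/* Assumed STM32F76x internal RAM map; verify before relying on it. */
static const ram_region_t regions[] = {
    { "DTCM",  0x20000000u, 128u * 1024u },
    { "SRAM1", 0x20020000u, 368u * 1024u },
    { "SRAM2", 0x2007C000u,  16u * 1024u },
};

/* Index of the region that fully contains [addr, addr + len), or -1 if the
 * range lies outside all regions or crosses a region boundary. */
static int region_containing(uint32_t addr, uint32_t len)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        uint32_t off = addr - regions[i].base;  /* only used if addr >= base */
        if (addr >= regions[i].base && off < regions[i].size &&
            len <= regions[i].size - off) {
            return (int)i;
        }
    }
    return -1;
}
```

On the target, each descriptor and buffer address handed to the ETH DMA could be passed in as (uint32_t)(uintptr_t)ptr together with its length, trapping on a return value of -1.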

LCE · ‎2022-11-27

Best result so far after many tries, still running after 30 minutes (~6 GB via TCP):

no DTCM used whatsoever, zero, nada
SRAM2: only ETH descriptors
SRAM1: everything else, incl. heap and stack

Even placing heap and stack into DTCM didn't work that well.

But why?