STM32F7: ETH TCP checksum offload fails

LCE · ‎2022-11-30

Hello,

ST staff and all Ethernet experts, please.

STM32F767,

Nucleo-144, and custom board

no OS

lwIP 2.1.3, IPv4 only

STM32CubeIDE

no ETH interrupts used

Application:

Industrial frontend,

streaming "audio" data from SAIs via ethernet,

at high data rates, for long periods of time (weeks)

TCP is a must, losing packets is not an option.

Audio streaming mostly uses UDP, and they use interpolation

to mask lost packets - we are not allowed to do that.

Problem:

ETH transmission: TCP header checksum is = ZERO = 0.

Depending on settings below (SRAM usage, CPU clock),

this happens after a few MB, or many GB of data,

at 25.6 Mbps it's running sometimes for hours,

sometimes it stops after a few minutes.

IPv4 header checksum is okay (at least not 0).

All checked with Wireshark.

Then the PC side stops ACKnowledging,

then lwIP shuts down TCP.

Checked:

Same behaviour on Nucleo and custom board.
Checked on 2 different PCs.
LwIP stats don't show any errors.
"Transmit Store and Forward" is set in DMAOMR.
Transmit FIFO is deep enough for packets (1514 B, no bigger packets).
Checksum offload (CIC = ETH_DMATXDESC_CIC_TCPUDPICMP_FULL) is activated in all TX descriptor Status registers (BTW, there's a documentation error in RM0410 which says CIC bits are 28..27 in DESC1, page 1785).
Header checksum fields from lwIP are definitely = 0 before being handed to the ETH DMA.
Payload checksum error status bit in the Transmit Status vector is NEVER set.
Memory barriers are used as recommended (DMB, DSB).
RM says reasons for checksum failure might be:
- no end of frame written to FIFO
- incorrect length
None of this happens, I checked all descriptors.
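
For reference, the descriptor hand-off described in the checklist above can be sketched as follows. This is a minimal sketch, not the code from the attached ethernetif.c: `tx_desc_t` and `tx_desc_prepare()` are hypothetical names, the bit masks are written out so the block compiles on its own (they should match the `ETH_DMATXDESC_*` values in the HAL), and `atomic_thread_fence()` is only a portable stand-in for `__DMB()` / `__DSB()` on the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* TDES0 / TDES1 bit fields of the enhanced TX descriptor. */
#define TDES0_OWN       (1UL << 31)  /* descriptor owned by the DMA */
#define TDES0_LS        (1UL << 29)  /* last segment of the frame */
#define TDES0_FS        (1UL << 28)  /* first segment of the frame */
#define TDES0_CIC_FULL  (3UL << 22)  /* IP header + payload checksum, pseudo-header included */
#define TDES0_TCH       (1UL << 20)  /* second address chained */
#define TDES1_TBS1_MASK 0x1FFFUL     /* buffer 1 size, 13 bits */

/* Hypothetical layout for this sketch: first four descriptor words only. */
typedef struct {
    volatile uint32_t tdes0;  /* status / control */
    volatile uint32_t tdes1;  /* buffer sizes */
    volatile uint32_t tdes2;  /* buffer 1 address */
    volatile uint32_t tdes3;  /* next descriptor address (chained mode) */
} tx_desc_t;

/* Queue one single-buffer frame.
 * Returns 0 on success, -1 if the descriptor is still owned by the DMA
 * or the length does not fit into TBS1. */
int tx_desc_prepare(tx_desc_t *d, const void *buf, uint32_t len)
{
    assert(d != NULL && buf != NULL);

    if ((d->tdes0 & TDES0_OWN) != 0U || len == 0U || len > TDES1_TBS1_MASK) {
        return -1;
    }

    d->tdes2 = (uint32_t)(uintptr_t)buf;
    d->tdes1 = len;

    /* Control bits first, keeping only TCH; OWN is still clear here. */
    d->tdes0 = (d->tdes0 & TDES0_TCH) | TDES0_FS | TDES0_LS | TDES0_CIC_FULL;

    /* All fields must be in memory before ownership changes
     * (stand-in for __DMB() on the target). */
    atomic_thread_fence(memory_order_seq_cst);
    d->tdes0 |= TDES0_OWN;

    /* Ownership must be in memory before the transmit poll demand
     * write that would follow on the target (stand-in for __DSB()). */
    atomic_thread_fence(memory_order_seq_cst);
    return 0;
}
```

The point of the ordering is that the DMA never sees OWN set on a descriptor whose CIC, length or address fields are still in flight; whether that is related to the zero checksum here is not established.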

Findings:

Settings so far with an impact on the "zero-checksum-failure":

CPU clock -> lower = better
usage of internal SRAM memory areas, use of DTCM / SRAM1

Best "setup" until now:

CPU clock reduced to 192 MHz (216 is max for F767)
no use of DTCM - which makes it lose 1/4 of internal SRAM

-> Much better, but still not perfect, failed on one board after 1 hour or so.

Having used FPGAs for years, I had the hope of leaving these

"assumed race conditions" behind (naive me...). At least in the

FPGA you can get control of these problems.

For the final product, right now it seems the STM32F7 is not an option.

Which is sad, after having spent a lot of time on that one, having the

firmware at about 99% finished.

So, what am I doing wrong?

Or is there a known issue?

Source code of ethernetif.c etc. attached.

P.S.: I spammed the code with lots of __DSB()... I think I can remove many of these. But as it seems to be memory related, I had some hope.

LCE · ‎2022-11-30

BTW, I am not the only one who encountered that problem:

https://community.st.com/s/question/0D53W00001fRaz3SAC/ethernet-mac-tcp-checksum-offload-stops-working

And I wanted to add:

instruction cache and data cache are disabled (checked via SCB->CCR)

waclawek.jan · ‎2022-11-30

If you are willing to test wild hypotheses, try reverting to the state with many erroneous transmissions (so that you are confident that the error will occur very quickly), and after that switch DMAOMR.OSF off.

JW

waclawek.jan · ‎2022-11-30

I just posted the documentation error as a separate thread.

JW

LCE · ‎2022-11-30

Hi @Community member ,

thanks for the idea, right now I am so desperate that I welcome every idea, no matter how wild. And playing with that bit doesn't sound too wild.

The problem is that I don't know how the firmware can detect that error.

So first I have to find a way to detect that error, because I don't get any info back from the TX descriptors, and the MAC cannot write the TCP checksum back into its source buffer.

So I see that error only when it's too late: in my PC application (scope / analyser), in Wireshark, and in my UART output, which reports that lwIP closed the connection after not getting ACKs for too long.

Right now I can only reset the OSF bit manually - when it's much too late.

Any ideas?

Just tried without the OSF bit:

nice, the zero-checksum error now comes after a few packets!

Need to think about that...

Right now the only solution that comes to my mind:

make 1 pbuf out of the 2 that lwIP builds for header and data. Ouch...
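
What "one buffer instead of two" would involve can be sketched like this. It is only an illustration under assumptions: `seg_t` is a hypothetical stand-in for the `next` / `payload` / `len` fields of lwIP's `struct pbuf`, and `chain_flatten()` is not an lwIP function (on real pbufs, `pbuf_copy_partial()` does a comparable copy).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the pbuf fields used here. */
typedef struct seg {
    struct seg *next;     /* next segment in the chain, or NULL */
    const void *payload;  /* data of this segment */
    uint16_t    len;      /* length of this segment in bytes */
} seg_t;

/* Copy a whole chain (header segment + data segment, ...) into one
 * contiguous buffer so that a single TX descriptor can describe the frame.
 * Returns the number of bytes copied, or 0 if the chain does not fit. */
size_t chain_flatten(const seg_t *chain, uint8_t *dst, size_t dst_size)
{
    size_t used = 0;

    assert(dst != NULL);

    for (const seg_t *s = chain; s != NULL; s = s->next) {
        if (s->len == 0U) {
            continue;               /* nothing to copy from this segment */
        }
        if (s->len > dst_size - used) {
            return 0;               /* would overflow the destination */
        }
        memcpy(dst + used, s->payload, s->len);
        used += s->len;
    }
    return used;
}
```

The cost is one extra memcpy per frame, which at these data rates is exactly the "ouch"; it also does not explain why two descriptors per frame would fail in the first place.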

LCE · ‎2022-11-30

It's such a stupid error, on my 2nd PC it's running flawlessly for almost 5 hours now.

EDIT: 5.5 hours, and zero checksum again.

That's really a pain in... everywhere.

Piranha · ‎2022-12-01

I've been running Iperf2 TCP full-duplex at ~190 Mbps on F76x for weeks non-stop moving terabytes of data and running over 2^32 packets for both Rx and Tx without any issues. Therefore the issue should be a software issue and at least you can stop worrying about the MCU not being usable.

Even if the device sends a corrupted packet or several, it's almost impossible to accidentally break a TCP connection. That just doesn't sound right. But what exactly stops working? A single TCP connection, TCP subsystem, IP stack, driver, hardware? When it stops... Can the device create a new TCP connection? Does ICMP ping work? Does the driver still receive and can send packets?

I'll look into code and report here.

LCE · ‎2022-12-01

Hello @Piranha ,

thanks for chiming in!

And thanks for the good news about your long-term testing!

So there's hope, and it's "only" something stupid in my software.

This morning I had some hope: in case of a SAI buffer overflow, the SAI RX complete ISR might have grabbed some other buffer already given to the ETH DMA. I removed that, but the problem remains.

I went through all ISRs again, even put in some IRQ blocking in some functions (preparing ETH TX descriptors, because these are pointing to the SAI buffers).

Here is some more info:

a) As soon as TCP header checksum = 0, the PC side stops ACK'ing, which seems to lead to a TCP timeout, the streaming connection's error callback is called and tells me via UART:

eErrIn = -13 = Connection aborted

b) Even after that streaming server stop, LwIP stats show that there are no lwIP errors, and 2 TCP servers / listening PCBs active:

the "streaming" PCB (restarted as soon as the old one's off, as it should be, by software feature), and
an HTTP server for control (via GET / PUT).

c) Now when I send another TCP/HTTP command, in Wireshark I see that the SYN from the PC gets through, and even that the MCU sends a SYN / ACK, but again with checksum = 0, and again this packet is not ACK'd. The MCU sends TCP retransmissions, same problem, checksum = 0.

So TCP is alive, but never gets any ACK due to sending checksum = 0.

d) UDP is still working, as I have seen from PTP still running in the background, and so does a UDP echo server which I can start manually from UART.

But: UDP header checksum = 0! Which is not the case before the "checksum breakdown".

e) All MCU register settings after "checksum breakdown" are still the same, I just compared these again with a still running version.

f) TX descriptors also show no errors, although IHE / IPE might have been expected.

LCE · ‎2022-12-01

Until now I had not found the Wireshark feature that verifies the checksum,

now that I turned it on, it gave me a good laugh because of the comment in brackets:

Checksum: 0x0000 incorrect, should be 0x242e (maybe caused by "TCP checksum offload"?)
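
The "should be" value that Wireshark prints is the standard Internet checksum (RFC 1071) over the IPv4 pseudo-header plus the TCP segment. A minimal sketch of that calculation, useful for checking a captured frame by hand; the function names are made up for this example and the packet behind 0x242e is not reproduced here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum, read as
 * big-endian 16-bit words. */
static uint32_t sum16(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1U) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2U;
    }
    if (len == 1U) {
        sum += (uint32_t)p[0] << 8;  /* odd trailing byte, zero-padded */
    }
    return sum;
}

/* Fold the carries back in and take the one's complement. */
static uint16_t fold(uint32_t sum)
{
    while ((sum >> 16) != 0U) {
        sum = (sum & 0xFFFFU) + (sum >> 16);
    }
    return (uint16_t)~sum;
}

/* Plain Internet checksum over a byte range (e.g. an IPv4 header). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    return fold(sum16(0U, data, len));
}

/* TCP checksum for IPv4: pseudo-header (src, dst, zero, protocol 6,
 * TCP length) followed by the TCP header and payload. The checksum
 * field inside seg must be zero when computing the value to send. */
uint16_t tcp_checksum_ipv4(const uint8_t src[4], const uint8_t dst[4],
                           const uint8_t *seg, uint16_t seg_len)
{
    uint32_t sum = 0U;

    assert(src != NULL && dst != NULL && seg != NULL);

    sum = sum16(sum, src, 4U);
    sum = sum16(sum, dst, 4U);
    sum += 6U;        /* zero byte + protocol number (TCP) */
    sum += seg_len;   /* TCP length: header + payload */
    sum = sum16(sum, seg, seg_len);
    return fold(sum);
}
```

Running `tcp_checksum_ipv4()` over a received segment that still contains its checksum yields 0 when the checksum is valid, which is the check the PC side does before it ACKs.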

LCE · ‎2022-12-01

BTW, I forgot to show the MAC and DMA register setup.

Maybe there's something wrong?

ETH->
 
MACCR   = 0000CE0C
         speed:  100M
         duplex: FULL
         transmitter: ON
         receiver:    ON
         Interframe Gap = 96 bit times
         IPv4 checksum offload: ON
         Retry OFF (half-duplex)
MACSR   = 00000000
MACFFR  = 00000051
MACHTHR = 00000000
MACHTLR = 00000000
MACMIIAR = 00000050
MACMIIDR = 0000782D
MACFCR  = 00000000
 
DMASR   = 00660404
DMAIER  = 0001A041
DMAOMR  = 02202006
        RSF
        TSF
        ST
        OSF
        SR
DMABMR   = 02C12080
DMARDLAR = 2007C000
DMATDLAR = 2007D800
DMACHTDR = 2007E1D8

I have to admit that I don't really understand the MACFCR flow control register. Have to read more...
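
To cross-check the hand-decoded MACCR and DMAOMR values in the dump above, here is a small host-side decoder. It is a sketch with hypothetical helper names; the bit positions are the ones I read from the register descriptions and cover only the bits named in the dump. As for MACFCR, a value of 0 simply leaves transmit and receive flow control disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} bit_name_t;

/* Only the MACCR bits mentioned in the dump. */
static const bit_name_t maccr_bits[] = {
    { 1UL << 2,  "RE   receiver enable" },
    { 1UL << 3,  "TE   transmitter enable" },
    { 1UL << 9,  "RD   retry disable (half-duplex only)" },
    { 1UL << 10, "IPCO IPv4 checksum offload" },
    { 1UL << 11, "DM   full duplex" },
    { 1UL << 14, "FES  100 Mbit/s" },
};

/* Only the DMAOMR bits mentioned in the dump. */
static const bit_name_t dmaomr_bits[] = {
    { 1UL << 1,  "SR   start receive" },
    { 1UL << 2,  "OSF  operate on second frame" },
    { 1UL << 13, "ST   start transmission" },
    { 1UL << 21, "TSF  transmit store and forward" },
    { 1UL << 25, "RSF  receive store and forward" },
};

/* Print every named bit that is set; return how many were set. */
static int print_set_bits(const char *reg, uint32_t value,
                          const bit_name_t *table, size_t n)
{
    int count = 0;

    assert(reg != NULL && table != NULL);
    printf("%s = %08lX\n", reg, (unsigned long)value);
    for (size_t i = 0; i < n; i++) {
        if ((value & table[i].mask) != 0U) {
            printf("    %s\n", table[i].name);
            count++;
        }
    }
    return count;
}

int decode_maccr(uint32_t value)
{
    return print_set_bits("MACCR", value, maccr_bits,
                          sizeof maccr_bits / sizeof maccr_bits[0]);
}

int decode_dmaomr(uint32_t value)
{
    return print_set_bits("DMAOMR", value, dmaomr_bits,
                          sizeof dmaomr_bits / sizeof dmaomr_bits[0]);
}

/* Interframe gap, MACCR bits 19:17: 0 means 96 bit times, each step is 8 less. */
unsigned maccr_ifg_bit_times(uint32_t value)
{
    return 96U - 8U * (unsigned)((value >> 17) & 7U);
}
```

Calling `decode_maccr(0x0000CE0C)` and `decode_dmaomr(0x02202006)` lists the same settings as the dump, so at least the decoding of those two registers is consistent.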