STM32H753 Ethernet TX corruption

Werdf
Associate

Hello,

I am currently experiencing some unusual Ethernet transmission issues (STM32 to PC) with a few of our custom PCBs. We use the STM32H753II and the KSZ8081RNB PHY. The KSZ8081RNB is clocked from MCO2 and connected to the STM32's RMII interface.

The IP/TCP/UDP checksum check fails mostly because 1 to 20 bytes of the payload are corrupted or, more rarely, because of corrupted protocol headers. It doesn't happen regularly, but it happens too often.

This means that, for TCP, two or more retransmissions are sometimes needed.

Bad UDP packet:

[Screenshot]

Good UDP packet:

[Screenshot]

Sometimes, completely corrupted packets are sent. They do not originate from the application: even when I paused the application, I could still see them.

[Screenshot: completely corrupted packet]

For testing, the PCB and the PC are connected directly. Using different cables does not solve the issue; only using a different PCB does.

I set up the LwIP example project in STM32CubeIDE 1.19 as described in GitHub STM32H7-LwIP-Examples#how-to-create-project-from-scratch.

The firmware from this project was used to generate the traffic shown in the screenshots. After I increased the payload size from 20 to 620 bytes, the issue occurred more frequently.

On all our PCBs, the DBGMCU->IDCODE register shows revision ID 0x2003 (revision "V") and device ID 0x450.
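For anyone comparing boards: both values come from separate fields of the same 32-bit register. A minimal host-side sketch of the decoding, assuming the STM32H7 layout (REV_ID in bits [31:16], DEV_ID in bits [11:0]). The helper names are hypothetical; on the target the raw value would be read from DBGMCU->IDCODE, and the HAL also offers HAL_GetREVID() and HAL_GetDEVID().

```c
#include <stdint.h>

/* Hypothetical helpers. STM32H7 DBGMCU_IDCODE layout (assumed):
   REV_ID = bits [31:16], DEV_ID = bits [11:0], bits [15:12] reserved. */
static uint16_t idcode_rev_id(uint32_t idcode)
{
    return (uint16_t)(idcode >> 16);
}

static uint16_t idcode_dev_id(uint32_t idcode)
{
    /* Mask off the reserved bits [15:12] as well as the REV_ID field. */
    return (uint16_t)(idcode & 0x0FFFu);
}
```

With a hypothetical raw value such as 0x20030450, this decodes to REV_ID 0x2003 and DEV_ID 0x450, matching the values reported above.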

Things I tried that didn't make a difference:

- disabling or enabling the D-/I-cache

- placing the Ethernet buffers (normal, non-cacheable) and descriptors (strongly ordered) in AXI SRAM or in D2 SRAM, with matching MPU regions configured

- different SYSCLK frequencies (480 MHz, 400 MHz, 240 MHz)

- forcing address-aligned beats / transmit store-and-forward mode for the Ethernet DMA

- enabling the Ethernet error interrupts (none were ever triggered)

- disabling checksum offloading

- outputting the Ethernet data via another interface (in my case, UART) right before HAL_ETH_Transmit. The UART data always look fine, but the Ethernet data do not.
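Since the MPU setup is a place where small slips are easy to make (wrong TEX/C/B bits, a region size that isn't a power of two, or a base address not aligned to the region size), here is a host-side sketch of how the two memory types from the list map onto the Cortex-M7 (ARMv7-M) MPU_RASR register. The constant and function names are hypothetical; on the target this is normally done with HAL_MPU_ConfigRegion(). Execute-never and full read/write access are assumed, and the S bit is left at 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions (Cortex-M7). */
#define RASR_XN_Pos      28u
#define RASR_AP_Pos      24u
#define RASR_TEX_Pos     19u
#define RASR_SIZE_Pos     1u
#define RASR_ENABLE_Pos   0u

#define AP_FULL_ACCESS  0x3u   /* privileged and unprivileged read/write */

typedef enum {
    MEM_STRONGLY_ORDERED,      /* TEX=000, C=0, B=0 (e.g. descriptors) */
    MEM_NORMAL_NONCACHEABLE    /* TEX=001, C=0, B=0 (e.g. buffers)     */
} mem_type_t;

/* RASR SIZE field for a power-of-two region of at least 32 bytes,
   or -1 if the size cannot be encoded. Region size = 2^(SIZE+1). */
static int rasr_size_field(uint32_t size_bytes)
{
    if (size_bytes < 32u || (size_bytes & (size_bytes - 1u)) != 0u)
        return -1;
    int log2 = 0;
    while ((1u << log2) != size_bytes)
        log2++;
    return log2 - 1;
}

/* An MPU region's base address must be aligned to the region size. */
static bool region_base_ok(uint32_t base, uint32_t size_bytes)
{
    return (base & (size_bytes - 1u)) == 0u;
}

/* RASR value with XN set, full access, no subregions disabled and the
   region enabled; returns 0 if the size cannot be encoded. */
static uint32_t build_rasr(uint32_t size_bytes, mem_type_t type)
{
    int size_field = rasr_size_field(size_bytes);
    if (size_field < 0)
        return 0u;
    uint32_t tex = (type == MEM_NORMAL_NONCACHEABLE) ? 1u : 0u;
    /* S, C and B stay 0 for both memory types handled here. */
    return (1u << RASR_XN_Pos)
         | (AP_FULL_ACCESS << RASR_AP_Pos)
         | (tex << RASR_TEX_Pos)
         | ((uint32_t)size_field << RASR_SIZE_Pos)
         | (1u << RASR_ENABLE_Pos);
}
```

For example, a 32 KiB normal non-cacheable region encodes as 0x1308001D and a 256-byte strongly-ordered region as 0x1300000F; comparing such values with what the MPU registers actually contain at run time is one way to rule out a configuration slip.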

Thoughts:

The correct payload is used to compute the IP/TCP/UDP checksums. In addition, the Ethernet frame has its own checksum, which is not displayed in Wireshark; if it fails, the packet is dropped before it reaches Wireshark. These checksums are computed in the STM32 MAC.

If the packet becomes corrupted during transfer (e.g. from the MAC to the PHY), the Ethernet frame checksum check should at least fail. However, this does not happen. So something is happening in the MAC.
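The reasoning above relies on the IP/UDP/TCP checksums being computed over the correct data. To recheck a captured header or payload offline, here is a minimal sketch of the RFC 1071 Internet checksum (the function name is hypothetical). A header whose checksum field is correct yields 0 when the whole header, including that field, is summed. UDP and TCP additionally cover a pseudo-header (addresses, protocol, length), which this sketch does not build.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's complement of the one's-complement sum
   of the data taken as big-endian 16-bit words. An odd trailing byte is
   treated as the high byte of a final word padded with 0.
   The 32-bit accumulator is ample for Ethernet-sized buffers. */
static uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For example, the 20-byte IPv4 header 45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7 sums to 0; with its checksum bytes zeroed, the function returns 0xB861.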

I am 90% sure that the issue is not software-related, but rather an STM problem.

However, I don't know how to investigate it any further.

I would be very grateful for any additional information or hints.

Maybe related to stm32h743-ethernet-tx-corruption

waclawek.jan
Super User

> issues with a few of our custom PCBs

To me, this sounds like a power supply (decoupling?) and/or clock issue. Describe these, and/or check/validate them on the hardware. Consider that the clock you generate at MCO2 may not be stable or symmetrical enough.
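One quick sanity check on the clock side is the nominal frequency and tolerance of what MCO2 feeds to the PHY: the RMII specification calls for a 50 MHz REF_CLK within ±50 ppm. A minimal sketch, assuming MCO2 outputs an integer division of some source clock whose own error (for example the crystal tolerance) is known in ppm; the function names and the example frequencies are hypothetical. It only covers the average frequency. Jitter and duty-cycle symmetry, which the reply also mentions, have to be measured on the board.

```c
/* RMII reference clock: 50 MHz, +/-50 ppm. */
#define RMII_REF_CLK_HZ     50000000.0
#define RMII_TOLERANCE_PPM  50.0

/* Deviation of a clock from the nominal RMII reference, in ppm. */
static double rmii_ppm_error(double actual_hz)
{
    return (actual_hz - RMII_REF_CLK_HZ) / RMII_REF_CLK_HZ * 1e6;
}

/* Expected MCO2 output for a source clock, an integer prescaler and the
   source's own frequency error in ppm (hypothetical model). */
static double mco2_output_hz(double source_hz, unsigned prescaler, double source_ppm)
{
    return source_hz * (1.0 + source_ppm * 1e-6) / (double)prescaler;
}

/* Non-zero if the average frequency is inside the RMII tolerance. */
static int rmii_clock_ok(double actual_hz)
{
    double err = rmii_ppm_error(actual_hz);
    return err >= -RMII_TOLERANCE_PPM && err <= RMII_TOLERANCE_PPM;
}
```

For example, 400 MHz divided by 8 gives exactly 50 MHz, and a source that is 20 ppm fast still gives a 20 ppm error at the output, whereas 480 MHz divided by 10 (48 MHz) is far outside the tolerance.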

Note that the interface between the FIFO and the MAC also serves as a clock-domain-crossing interface: the DMA/FIFO runs on the internal eth_hclk clock, but the Tx portion of the MAC runs on the eth_mii_tx_clk clock (which is derived from the ETH_RMII_REF_CLK clock within the RCC). It's not entirely clear from the description in the RM, but I can envisage that the Ethernet frame CRC is calculated as the data are transferred across this interface, so the CRC calculator may be in the RMII clock domain; i.e., if there's data corruption at that interface, the CRC calculator will see the already-corrupted data.

That the corrupted data contain far too many 0x55 bytes is a puzzling fact, though, and doesn't support the above theory very well.
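To put numbers on both observations (the 1 to 20 corrupted bytes and the many 0x55 bytes), the payload logged over UART can be compared with the one captured in Wireshark. A minimal host-side sketch; the struct and function names are hypothetical, and it assumes both payloads have already been extracted into byte arrays of equal length.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t first;       /* offset of the first differing byte */
    size_t last;        /* offset of the last differing byte */
    size_t n_diff;      /* number of differing bytes */
    size_t n_diff_55;   /* differing bytes whose captured value is 0x55 */
} corruption_report_t;

/* Compares the expected payload (e.g. as logged over UART) with the captured
   one. Returns 0 if they are identical, 1 otherwise; the report is filled in. */
static int compare_payloads(const uint8_t *expected, const uint8_t *captured,
                            size_t len, corruption_report_t *rep)
{
    rep->first = 0;
    rep->last = 0;
    rep->n_diff = 0;
    rep->n_diff_55 = 0;

    for (size_t i = 0; i < len; i++) {
        if (expected[i] == captured[i])
            continue;
        if (rep->n_diff == 0)
            rep->first = i;
        rep->last = i;
        rep->n_diff++;
        if (captured[i] == 0x55u)
            rep->n_diff_55++;
    }
    return rep->n_diff != 0;
}
```

Here last - first + 1 gives the length of the corrupted span, and n_diff_55 relative to n_diff gives the share of corrupted bytes that read 0x55, which makes the pattern easier to compare across many captured packets.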

JW

Thank you very much for your response.

The power supply looks stable to me. My colleagues in the hardware department assure me that there cannot be a problem with the power supply because the same supply is used in all products and product groups.
I will dig out the PCB schematics so I can give a detailed description.

We haven’t had a chance to check the ETH_REF_CLK clock yet, but we will.

Your note is helpful. It provides valuable insights, but I need time to think about what it actually means.

Werdf

It doesn't mean too much; I was just trying to somehow justify my rather weak theory that it's an unstable clock or otherwise hardware-related. The symptoms you've presented are rather puzzling.

By "power supply" I mean more the way power and ground are distributed to all VDD/GND pins (including decoupling and any significant external power sink in the vicinity), rather than the supply circuit itself.

JW

LCE
Principal II

I once had similar problems because the STM32's MAC was configured differently from the PHY's actual connection in terms of speed (10M/100M) and duplex. It worked most of the time, but there were lots of TCP retransmissions.

You'd better compare your MAC setup with the PHY registers that show the actual connection status.
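If auto-negotiation is used, the mode the link actually ended up in can be resolved from the standard IEEE 802.3 Clause 22 registers 4 (auto-negotiation advertisement) and 5 (link partner ability) and then compared with the MAC setting, for example as read back with HAL_ETH_GetMACConfig() (Speed and DuplexMode fields). A minimal sketch with hypothetical names; it assumes auto-negotiation has completed on both sides (if the partner does not negotiate, parallel detection falls back to half duplex and these registers don't tell the whole story), and it ignores 100BASE-T4. The KSZ8081 also reports the resolved mode in a vendor-specific register, documented in its datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Technology ability bits in Clause 22 registers 4 (ANAR) and 5 (ANLPAR). */
#define AN_10BASET_HD    (1u << 5)
#define AN_10BASET_FD    (1u << 6)
#define AN_100BASETX_HD  (1u << 7)
#define AN_100BASETX_FD  (1u << 8)

typedef struct {
    bool speed_100;    /* true: 100 Mbit/s, false: 10 Mbit/s */
    bool full_duplex;
    bool valid;        /* false if no common ability was found */
} link_mode_t;

/* Highest common ability, in 802.3 priority order:
   100BASE-TX FD > 100BASE-TX HD > 10BASE-T FD > 10BASE-T HD. */
static link_mode_t resolve_link(uint16_t anar, uint16_t anlpar)
{
    uint16_t common = anar & anlpar;
    link_mode_t m = { false, false, true };

    if (common & AN_100BASETX_FD)      { m.speed_100 = true;  m.full_duplex = true;  }
    else if (common & AN_100BASETX_HD) { m.speed_100 = true;  m.full_duplex = false; }
    else if (common & AN_10BASET_FD)   { m.speed_100 = false; m.full_duplex = true;  }
    else if (common & AN_10BASET_HD)   { m.speed_100 = false; m.full_duplex = false; }
    else                               { m.valid = false; }
    return m;
}

/* True if the MAC's speed/duplex setting matches the negotiated link. */
static bool mac_matches_phy(link_mode_t phy, bool mac_speed_100, bool mac_full_duplex)
{
    return phy.valid
        && phy.speed_100 == mac_speed_100
        && phy.full_duplex == mac_full_duplex;
}
```

For example, ANAR 0x01E1 with ANLPAR 0x45E1 resolves to 100 Mbit/s full duplex, so a MAC left at half duplex would be flagged as a mismatch.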

Pavel A.
Super User

> In addition, the Ethernet frame has its own checksum, which is not displayed in Wireshark; if it fails, the packet is dropped before it reaches Wireshark. These checksums are computed in the STM32 MAC.

> If the packet becomes corrupted during transfer (e.g. from the MAC to the PHY), the Ethernet frame checksum check should at least fail. However, this does not happen. So something is happening in the MAC.

If Wireshark runs as root and enables promiscuous receive, it can still see packets with a bad FCS (the Ethernet frame's own checksum), even if the FCS is not shown. It depends on the capabilities of the host adapter. In Linux, you can tell the adapter to pass the FCS up to software, and tell Wireshark to show the FCS. To be sure whether the corruption occurs in the MAC or on the way to the PHY, enabling the FCS display would help.
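Once frames are captured together with their FCS (on Linux typically via the NIC's rx-fcs and rx-all features in ethtool, where the driver supports them, plus Wireshark's Ethernet preference to assume frames carry an FCS), the FCS can also be rechecked offline. A minimal bitwise sketch of the IEEE 802.3 CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF); the function names are hypothetical. If the corrupted frames still carry a valid FCS, that would fit the theory above that the CRC is calculated after the corruption happens.

```c
#include <stddef.h>
#include <stdint.h>

/* IEEE 802.3 CRC-32 as used for the Ethernet FCS. Bitwise (slow but short). */
static uint32_t eth_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Checks a captured frame whose last 4 bytes are the FCS, stored in the
   order they appear on the wire (least significant byte first).
   `len` includes the FCS. Returns non-zero if the FCS matches. */
static int eth_fcs_ok(const uint8_t *frame, size_t len)
{
    if (len < 4)
        return 0;
    uint32_t crc = eth_crc32(frame, len - 4);
    uint32_t fcs = (uint32_t)frame[len - 4]
                 | ((uint32_t)frame[len - 3] << 8)
                 | ((uint32_t)frame[len - 2] << 16)
                 | ((uint32_t)frame[len - 1] << 24);
    return crc == fcs;
}
```

The standard check value applies: the CRC of the ASCII string "123456789" is 0xCBF43926, which is a quick way to confirm the routine before running it over captured frames.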