Very slow Ethernet connection on ETH1

rtie
Associate II

We have some devices based on STM32MP257 SoCs that connect to a 100 Mbit Ethernet network via ETH1, where ETH1 goes straight to an RMII PHY. Pins for ETH2 are reserved, but are not used/not connected. Some of our devices work rock-solid as expected, but others exhibit massive performance problems on the Ethernet connection.

After lots of research, we think that our devices are affected by a silicon bug described in errata sheet ES0598 as "ETH1 RMII mode could have CRC errors". Our chip revisions are all rev Y, which is affected by the issue. The errata sheet says that ETH1 should be routed through ETHSW to work around the silicon bug. To test whether the workaround works in our case, I've tried to enable ETHSW, but failed so far because STM32CubeMX forces me to free some pins before it allows me to enable ETHSW in RMII mode. It seems that ETH1 and ETH3 must be used together if ETHSW is enabled. The pins for ETH3, however, are already used for other hardware functions.

So, my goal is to

- enable ETHSW in RMII mode,
- route ETH1 through ETHSW to work around the silicon bug,
- use the same set of pins as used by ETH1 in direct mode,
- don't use ETH3 (ETH2 would be OK, though), and
- keep all other pins unaffected by this change.

Is this possible at all?

Erwan SZYMANSKI
ST Employee

Hello @rtie ,
Yes you can absolutely do it, this is the typical workaround for this issue.

The X-LINUX-TSNSWCH package will give you everything you need to get the switch working. There are also some words about the switch configuration in this article.

The goal is simple: you need to use the newly created switch interface as if it were a single ETH1 interface.
You will see that the X-LINUX package brings a lot of services and scripts, including TSN stuff. In your case, you are free not to use them at all and to keep only the part that deals with building, deploying and starting the switch driver.

In summary, the switch part is quite simple:

Kind regards,
Erwan.

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hi,

thank you for your answer, and sorry for my late reply.

I've tried to configure the system as suggested, but didn't succeed because STM32CubeMX doesn't allow me to. We have hardware that was designed to use the pins assigned to ETH1 and ETH2 in RMII mode without ETHSW in the way. Now we'd like to try the ETHSW workaround, but without changing the pin assignment. It would be best if ETHSW could be enabled transparently. Is this possible or does enabling ETHSW imply a hardware change?

Can you tell us something about the impact on power consumption when enabling ETHSW? We need to implement a network standby mode and make sure our device doesn't exceed a certain power budget. If ETHSW is known to cause a significant increase of power consumption, we might not be able to use it and would need a hardware revision anyway.

Hello @rtie ,
The good news is that the pin-out is the same whether or not the switch is used; it is fully transparent for you.

Concerning the low-power question: power consumption will not be affected much, but the low-power sequence is a bit less straightforward than in the standard RMII case.

Depending on your timing constraints, what I would suggest is to wait for the next STM32MP2 cut, which fixes the RMII hardware issue and will be released soon. That will be easier than modifying all your software for a temporary workaround. By the way, even if you hit the ETH1 RMII CRC issue, you can still prototype your product with the current cut (it may cause some TCP retries or lost UDP packets), and the problem will disappear with the new cut. This way, you will be able to test the low-power mode you want to reach, and more, without trouble.

Kind regards,
Erwan.

[Edit: you can already order the new MP2 cut]


I see, thank you very much for your help! We'll discuss internally how to proceed.

Hello Erwan,

I am using a chip from last year's revision with ETH1, but not in switch mode. So far, I haven't observed many abnormalities.

I would like to understand this errata better: Is this issue merely sporadic, or could it become severe once triggered under specific conditions? I want to make sure I fully grasp the failure characteristics.

Thank you~

Hello @bugman,

Sure, let me detail a bit more.
The issue is not present 100% of the time. We can even say that it is quite rare to run into it. It basically depends on the individual SoC, the PCB layout, and external parameters such as temperature (the higher the temperature, the less likely the issue is to appear), pressure...

Another thing: the higher VDDCORE is, the lower the probability of seeing the issue.

It is this combination of parameters that can make the issue appear, in particular because they indirectly influence the edge switching speed and the phase of an internal clock (inside the SoC, in the MAC IP).

The behavior of this internal clock is what is corrected in the next cut, but it is entirely possible (and highly likely) that you will not observe the issue on your current SoC revision.

You just have to keep this errata in mind so that one day, if you begin to see strange TCP retries or UDP packets dropped due to CRC errors on your ETH1 interface, you know this errata exists.

I hope I answered your question.

Kind regards,
Erwan.


Unfortunately, the problem wasn't so rare on our side. Possibly, this is due to the parameters you have mentioned, such as VDDCORE voltage and other settings set to unfavorable values. We have played with some of these a bit at a point when we still thought we might have a PCB design or power supply problem, but observed no differences while trying.

Anyway, after having related the observed network performance problems to the errata, we have tested with 17 SoCs in depth. It turned out that 10 of them were affected by the errata, to different degrees of severity. The other 7 devices work without any problem.

On affected devices, the 100 MBit connection is much slower than expected (only a few kB/s on some of our devices, a few hundred kB/s on others). There are no relevant entries in the kernel log. On some of the affected devices, the error counters shown by ifconfig stay consistently at all zero, on other devices they consistently rise quickly. Reducing the line speed to 10 MBit via ethtool tends to keep Ethernet working reliably, though, so this *might* be another workaround (don't take my word for it, though).
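The error counters that ifconfig displays are also exposed by the Linux kernel as plain text files under /sys/class/net/<interface>/statistics/. A minimal sketch in C for reading them could look like this. The interface name `eth0` used in the comments and the helper names `stat_path` and `read_counter` are assumptions for illustration and do not come from this thread:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Build "/sys/class/net/<ifname>/statistics/<counter>" into buf.
 * Example (assumed names): stat_path(buf, sizeof buf, "eth0", "rx_crc_errors").
 * Returns 0 on success, -1 if the path does not fit into buf. */
int stat_path(char *buf, size_t len, const char *ifname, const char *counter)
{
    int n = snprintf(buf, len, "/sys/class/net/%s/statistics/%s",
                     ifname, counter);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Read one unsigned decimal counter from a sysfs-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_counter(const char *path, uint64_t *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    unsigned long long v = 0;
    int parsed = (fscanf(f, "%llu", &v) == 1);
    fclose(f);

    if (!parsed)
        return -1;
    *value = (uint64_t)v;
    return 0;
}
```

Reading `rx_crc_errors` and `rx_frame_errors` (frame alignment errors) once per second during a transfer and printing the differences would show whether the counters move under load. As described above, the counters stay at zero on some affected devices, so unchanged counters do not rule out the erratum.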

Cooling these chips down consistently made things worse, to the point that no data was transmitted at all. Applying heat, for example by blowing hot air at the chip or simply by stressing the CPU a bit, somewhat improved reliability. The unaffected SoCs can still work at full 100 MBit/s even at temperatures below 0°C.

While testing with freeze spray, we have discovered 4 devices that worked fine under "normal" conditions, but started to fail when cooling them down. The worst of these 4 started to degrade performance below 38°C already, the most stable of them at below 15°C. (We have read out the internal temperature sensors to figure out these temperatures.)

So, in a set of 17 devices, we had 6 devices with obvious problems, and 4 more devices with hidden problems.

@rtie ,
Yes you are right, my formulation was not perfect.

Basically, on a given PCB / layout, we observe that if the issue tends to appear on one chip, the probability of finding it with other chips on the same layout is high.

We can distinguish 3 cases:

  • Transparent: we do not fall at all in the issue
  • Bad: the issue is almost 100% present no matter the conditions
  • Mid: the issue is at a limit where temperature and VDDCORE value will have a huge impact on the OK/KO state.
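One way to apply these three cases when screening boards is to test each device twice, once under favourable conditions (warm, higher VDDCORE) and once under unfavourable ones (cooled down). The following is only a simplified sketch of that idea; the enum, the function name `classify_eth1` and the two-measurement scheme are assumptions for illustration, not an ST procedure:

```c
#include <stdbool.h>

/* The three cases from the list above. */
enum eth1_case {
    ETH1_TRANSPARENT, /* link works under all tested conditions */
    ETH1_MID,         /* result depends on temperature / VDDCORE */
    ETH1_BAD          /* link fails under all tested conditions */
};

/* ok_favourable:   throughput test passed while warm / at higher VDDCORE
 * ok_unfavourable: throughput test passed while cooled down
 * Both inputs are hypothetical pass/fail results of your own test. */
enum eth1_case classify_eth1(bool ok_favourable, bool ok_unfavourable)
{
    if (ok_favourable && ok_unfavourable)
        return ETH1_TRANSPARENT;
    if (!ok_favourable && !ok_unfavourable)
        return ETH1_BAD;
    return ETH1_MID;
}
```

A device that passes at room temperature but fails with freeze spray would be reported as `ETH1_MID`, which matches the "hidden" failures described earlier in the thread.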

Anyway, the issue comes from our SoC, and we are really sorry for the impact on your project. My explanation was meant more to tell @bugman that in some cases he may not see the issue on his platform, but that does not mean that there is no HW SoC issue.

Kind regards,
Erwan.


Hello,

I was also bitten by this bug (interestingly, I see alignment errors, not CRC errors), and it was solved by using the switch. I needed to fix it in my FSBL, so if you are interested, this snippet can help you start the switch for just the ETH1 port in RMII mode:

// We need ker_ethsw at 125 MHz
// and ker_ethswref at 50 MHz (only if we provide the RMII clock).
RCC->ETHSWCFGR = RCC_ETHSWCFGR_ETHSWEN | RCC_ETHSWCFGR_ETHSWMACEN;
// 0x54200818 <= 0x66
(void)RCC->ETHSWCFGR; // read the register back before continuing
SYSCFG->ETH1CR = 0x13; // RGMII, internal clk; 0x54233000
SYSCFG->ETHSWCR = 4; // RMII, internal 125MHz, external refclk; 0x54233800
volatile uint32_t *deip = (volatile uint32_t *)0x5c000000;
// Byte offsets; divided by 4 below because deip is indexed in 32-bit words.
#define FESn_PORT_STATER(N) (0x200000 + (N) * 0x10000)
deip[FESn_PORT_STATER(0) / 4] = 0x120; // GMII
deip[FESn_PORT_STATER(2) / 4] = 0x200; // MII
#define FES_RMII_STS 0x1000402
deip[FES_RMII_STS / 4] = 1;
Martin
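To see which addresses the snippet above actually writes to, the index arithmetic can be reproduced on a host PC. This is only a sketch: the base address and offsets are copied from the snippet, while the names `DEIP_BASE` and `deip_word_address` are made up for illustration.

```c
#include <stdint.h>

#define DEIP_BASE 0x5c000000u  /* base address used by the snippet */
#define FESn_PORT_STATER(N) (0x200000u + (N) * 0x10000u)
#define FES_RMII_STS 0x1000402u

/* Byte address accessed by deip[byte_offset / 4] when deip is a
 * uint32_t pointer: the integer division drops the low two bits
 * of the offset, so the access is always 32-bit aligned. */
uint32_t deip_word_address(uint32_t byte_offset)
{
    return DEIP_BASE + (byte_offset / 4u) * 4u;
}
```

With these values, port 0 resolves to 0x5c200000 and port 2 to 0x5c220000. The offset 0x1000402 is not a multiple of 4, so the last write lands on the word at 0x5d000400. The post does not say whether that is intentional, so it may be worth checking against the reference manual before reusing the snippet.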