2024-12-05 10:01 AM - edited 2024-12-05 10:42 AM
I'm working on a custom PCB with an STM32F429ZIET6 MCU interfacing with a LAN8742A ethernet PHY. We use only static IPs and UDP communication. Our design has 28 devices, each set to a unique IP on the network, which will be controlled over a proprietary protocol using UDP. The issue is that ethernet sometimes takes up to 15 minutes to start communicating. In some setups the device behaves more reliably. For example, connecting a device to a switch that has only a PC attached to it gives the best success rate. If we set up the same device on a router running DHCP, the issue becomes much more frequent: probably 80%+ of the time the device takes 3.5 - 15 minutes to begin communicating.
What I've observed in software is that the link status comes "up" within the first 5 seconds of boot, and most of the time instantly. If this happens, communication is successful. We're currently setting up a test to run a device for 5 hours and verify that once the link status is "up" and the device is communicating over ethernet, it does not lose ethernet communication later.
I'm using STM32CubeIDE with FreeRTOS 10.3.1 (CMSIS v1), the LWIP library, and the LAN8742 driver / BSP_COMPONENT_DRIVER. The only modification to the ethernet setup is the pins used for communication.
When I read the BSR from the LAN8742A, the value of the register on boot is either decimal 30765 (0x782D, 0b111100000101101) or decimal 30729 (0x7809, 0b111100000001001). Within the first 5 seconds the register will either change to 30765, and ethernet communication will be successful, or remain at 30729 for up to 15 minutes and only then change to 30765, indicating "link is up", after which ethernet communication is successful.
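For reference, here is a minimal sketch that decodes the two BSR values quoted above. It assumes the standard IEEE 802.3 clause 22 layout of the Basic Status Register (register 1); the type `bsr_flags_t` and the function `bsr_decode` are hypothetical names for illustration and are not part of the ST LAN8742 driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic Status Register (register 1) bit masks, IEEE 802.3 clause 22 layout. */
#define BSR_EXTENDED_CAPS     (1u << 0)
#define BSR_JABBER_DETECT     (1u << 1)
#define BSR_LINK_STATUS       (1u << 2)
#define BSR_AUTONEG_ABILITY   (1u << 3)
#define BSR_REMOTE_FAULT      (1u << 4)
#define BSR_AUTONEG_COMPLETE  (1u << 5)

/* Hypothetical helper type, for illustration only. */
typedef struct {
    bool link_up;           /* bit 2: link is established        */
    bool autoneg_complete;  /* bit 5: auto-negotiation finished  */
    bool remote_fault;      /* bit 4: link partner reports fault */
} bsr_flags_t;

/* Extract the flags that matter for "is the link up yet?" from a raw BSR value. */
static bsr_flags_t bsr_decode(uint16_t bsr)
{
    bsr_flags_t f;
    f.link_up          = (bsr & BSR_LINK_STATUS) != 0;
    f.autoneg_complete = (bsr & BSR_AUTONEG_COMPLETE) != 0;
    f.remote_fault     = (bsr & BSR_REMOTE_FAULT) != 0;
    return f;
}
```

Decoded this way, 30729 (0x7809) has both the link status and auto-negotiation complete bits clear, while 30765 (0x782D) has both set; the two values differ only in bits 2 and 5. So the slow case looks like auto-negotiation not completing, rather than a link that is up but not passing traffic.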
I'm running tests to confirm that FreeRTOS is not throttling the ethernet initialization process in any way. Any insight into what is required for link status to be "up" is greatly appreciated. We're combing the datasheet in search of anything we can use.
Note: The test setup does not change between "successful" runs (ethernet communication within 5 seconds of booting the device) and "unsuccessful" ones. We see worse results with a router running DHCP than with only a switch, but we can get failures either way.
See attached schematic for ethernet PHY.
2024-12-05 04:19 PM
Does the software detect speed and duplex correctly?
Are there some boards that are always faster or slower to connect? In other words, can this be related to the quality of the board?
2024-12-06 05:23 AM
The speed and duplex aren't negotiated until after link status is "up". Once the link status is "up" I have confirmed the speed and duplex are set correctly.
We're now running tests overnight to check whether some boards are slower or faster to connect.
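For anyone wanting to check the negotiated mode directly, here is a minimal sketch of decoding speed and duplex from the PHY. It assumes the LAN8742A reports the result in bits 4:2 of its PHY Special Control/Status Register (register 31), with bit 12 flagging auto-negotiation done, which is my reading of the datasheet and should be verified there. The macro names, `phy_mode_t` and `scsr_decode` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed LAN8742A register 31 layout; verify against the datasheet. */
#define PHY_SCSR_REG        31u
#define PHY_SCSR_AUTODONE   (1u << 12)
#define PHY_SCSR_HCD_SHIFT  2u
#define PHY_SCSR_HCD_MASK   7u

/* Hypothetical result type, for illustration only. */
typedef struct {
    bool     valid;         /* false if the speed field holds an unknown code */
    bool     autoneg_done;  /* bit 12 of the register                         */
    bool     full_duplex;
    unsigned speed_mbps;    /* 10 or 100 when valid                           */
} phy_mode_t;

/* Decode a raw register 31 value into speed and duplex. */
static phy_mode_t scsr_decode(uint16_t scsr)
{
    phy_mode_t m = { false, false, false, 0u };
    m.autoneg_done = (scsr & PHY_SCSR_AUTODONE) != 0;

    switch ((scsr >> PHY_SCSR_HCD_SHIFT) & PHY_SCSR_HCD_MASK) {
    case 1u: m.valid = true; m.speed_mbps = 10u;  m.full_duplex = false; break;
    case 5u: m.valid = true; m.speed_mbps = 10u;  m.full_duplex = true;  break;
    case 2u: m.valid = true; m.speed_mbps = 100u; m.full_duplex = false; break;
    case 6u: m.valid = true; m.speed_mbps = 100u; m.full_duplex = true;  break;
    default: break; /* no valid speed indication yet */
    }
    return m;
}
```

A value with `valid == false` simply means the PHY has not settled on a mode yet, which is what one would expect while the BSR still reads 30729.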
2024-12-08 12:39 AM
I complained about similar problems to an ST developer, but I was told that fixing FreeRTOS is not their focus; they suggested migrating to AZRTOS, which is their choice.
Some time ago they posted this info, but it went unnoticed by most of us.
AZRTOS is definitely infinitely more stable and reliable than FreeRTOS, but NetX Duo is not yet 100% bug free, so use it with caution.
Try it yourself and share bugs with all of us; it can save a lot of time.
2024-12-08 01:38 AM
Is the VSS pad properly soldered?
Is 3.3V power supply rock stable? Are you sure about the choke?
Is the 25MHz oscillator input rock stable? Is the REFCLKOUT output rock stable 50MHz?
Are the straps properly configured (check against DS)? Does the software generate reset at the time when the 3.3VPHY voltage is already up and stable?
Is the 1.2V core voltage rock stable?
Do the LEDs reflect your "link goes up late" observation?
To take the link-related portion of software out of the equation, write a simple polled readout of the PHY's registers, and observe that on both a "known good" board like Nucleo or EVAL, and some of your problematic boards.
JW
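A polled readout along the lines suggested above could be sketched as follows. This is only an illustration: `mdio_read_fn` is a hypothetical callback that you would implement on top of whatever MDIO access your code base provides, and `sim_mdio_read` with `sim_regs` is a stand-in so the logic can be tried on a PC. The register numbers 0, 1, 4 and 5 follow IEEE 802.3 clause 22; register 31 is assumed to be the LAN8742A special control/status register.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical MDIO read callback: returns 0 on success, non-zero on error. */
typedef int (*mdio_read_fn)(uint8_t phy_addr, uint8_t reg, uint16_t *value);

#define PHY_DUMP_COUNT 5u

/* BCR, BSR, auto-neg advertisement, link partner ability, vendor status (assumed). */
static const uint8_t phy_dump_regs[PHY_DUMP_COUNT] = { 0u, 1u, 4u, 5u, 31u };

typedef struct {
    int      primed;                /* 0 until the first successful poll */
    uint16_t last[PHY_DUMP_COUNT];  /* values seen on the previous poll  */
} phy_dump_t;

/* Read all watched registers and print those that changed since the last call.
 * Returns the number of changed registers, or -1 on an MDIO error. */
static int phy_dump_poll(phy_dump_t *st, mdio_read_fn rd, uint8_t addr,
                         uint32_t now_ms)
{
    int changed = 0;
    for (unsigned i = 0; i < PHY_DUMP_COUNT; i++) {
        uint16_t v;
        if (rd(addr, phy_dump_regs[i], &v) != 0) {
            return -1;
        }
        if (!st->primed || v != st->last[i]) {
            printf("[%lu ms] reg %u = 0x%04X\n", (unsigned long)now_ms,
                   (unsigned)phy_dump_regs[i], (unsigned)v);
            st->last[i] = v;
            changed++;
        }
    }
    st->primed = 1;
    return changed;
}

/* Simulated PHY for trying the logic on a PC; replace with real MDIO access. */
static uint16_t sim_regs[32];

static int sim_mdio_read(uint8_t phy_addr, uint8_t reg, uint16_t *value)
{
    (void)phy_addr;
    if (reg >= 32u) {
        return -1;
    }
    *value = sim_regs[reg];
    return 0;
}
```

Called every 100 ms or so from a bare loop, with no RTOS or LwIP running, this gives a timestamped log of when the BSR changes, which can be compared between a known-good board and a problematic one on the same cable and switch port.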
2024-12-11 01:49 AM
After the basic ETH MAC setup (GPIO, ETH DMA SW reset, MDIO), the first thing I do is give the LAN8742A a software reset via the BCR register, then wait a little for the PHY to recover.
Then I write the enable bit for auto-negotiation, then get into a loop to check auto-negotiation result (with timeout).
This has worked on all ST-eval-boards, and a custom board with F767, with all kinds of address setups (fixed, DHCP, AutoIP). But I never had more than 3 boards at a time connected.
Not using HAL, so if you do, check all the steps in HAL... good nerves and luck!
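The sequence described above (software reset via BCR, enable auto-negotiation, poll for the result with a timeout) might look roughly like the sketch below. It is not the poster's actual code. The `phy_io_t` callbacks, the result enum and the `sim_*` functions are hypothetical; the bit positions follow IEEE 802.3 clause 22. Setting the restart bit together with the enable bit is my own assumption, as the post only mentions the enable bit.

```c
#include <stdint.h>

#define PHY_REG_BCR           0u
#define PHY_REG_BSR           1u
#define BCR_SOFT_RESET        (1u << 15)  /* self-clearing */
#define BCR_AUTONEG_ENABLE    (1u << 12)
#define BCR_RESTART_AUTONEG   (1u << 9)   /* assumption: also restart */
#define BSR_AUTONEG_COMPLETE  (1u << 5)
#define PHY_POLL_STEP_MS      10u

/* Hypothetical I/O hooks; each returns 0 on success. */
typedef struct {
    int  (*read)(uint8_t reg, uint16_t *val);
    int  (*write)(uint8_t reg, uint16_t val);
    void (*delay_ms)(uint32_t ms);
} phy_io_t;

typedef enum {
    PHY_OK = 0, PHY_IO_ERROR, PHY_RESET_TIMEOUT, PHY_AUTONEG_TIMEOUT
} phy_result_t;

static phy_result_t phy_reset_and_negotiate(const phy_io_t *io,
                                            uint32_t reset_timeout_ms,
                                            uint32_t aneg_timeout_ms)
{
    uint16_t v;
    uint32_t waited;

    /* 1. Software reset, then wait for the bit to clear itself. */
    if (io->write(PHY_REG_BCR, BCR_SOFT_RESET) != 0) return PHY_IO_ERROR;
    for (waited = 0; ; waited += PHY_POLL_STEP_MS) {
        if (io->read(PHY_REG_BCR, &v) != 0) return PHY_IO_ERROR;
        if ((v & BCR_SOFT_RESET) == 0) break;
        if (waited >= reset_timeout_ms) return PHY_RESET_TIMEOUT;
        io->delay_ms(PHY_POLL_STEP_MS);
    }

    /* 2. Enable (and restart) auto-negotiation, keeping the other BCR bits. */
    v = (uint16_t)(v | BCR_AUTONEG_ENABLE | BCR_RESTART_AUTONEG);
    if (io->write(PHY_REG_BCR, v) != 0) return PHY_IO_ERROR;

    /* 3. Poll BSR until auto-negotiation completes or the timeout expires. */
    for (waited = 0; ; waited += PHY_POLL_STEP_MS) {
        if (io->read(PHY_REG_BSR, &v) != 0) return PHY_IO_ERROR;
        if (v & BSR_AUTONEG_COMPLETE) return PHY_OK;
        if (waited >= aneg_timeout_ms) return PHY_AUTONEG_TIMEOUT;
        io->delay_ms(PHY_POLL_STEP_MS);
    }
}

/* Simulated PHY for trying the logic on a PC; replace with real MDIO access. */
static uint16_t sim_bcr;
static uint16_t sim_bsr = 0x7809;
static int sim_reads_until_done = 3;  /* BSR reads before negotiation "completes" */

static int sim_read(uint8_t reg, uint16_t *val)
{
    if (reg == PHY_REG_BCR) {
        *val = sim_bcr;
    } else if (reg == PHY_REG_BSR) {
        if (sim_reads_until_done > 0 && --sim_reads_until_done == 0) {
            sim_bsr = 0x782D;
        }
        *val = sim_bsr;
    } else {
        *val = 0;
    }
    return 0;
}

static int sim_write(uint8_t reg, uint16_t val)
{
    /* The simulated reset finishes immediately, so the reset bit reads back 0. */
    if (reg == PHY_REG_BCR) sim_bcr = (uint16_t)(val & ~BCR_SOFT_RESET);
    return 0;
}

static void sim_delay(uint32_t ms) { (void)ms; }
```

On a timeout it is probably better to keep polling in the background than to give up, since the original report shows the link eventually coming up on its own after several minutes.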
2024-12-11 03:54 AM
> the first thing I do is giving the LAN8742A a software reset via BCR register
I don't have my ETH-PHY-related sources at hand but I don't think I do that, once I do hardware reset.
ST example code used to be spectacularly bad in that after reset/setup it simply waited for a couple of seconds and then checked the link status in the PHY *once*, giving up if it was not connected, and that was all. This worked somehow, if there was a cable connected to a fast enough switch. This was in the SPL times; I have had no reason to check Cube.
JW