
STM32F429 + LAN8742A - reinitialize LWIP/network after ESD shock

jp9
Associate II

EDIT: 

After doing some more research and testing, I've discovered that doing a software reset of the STM32 actually pulls the nRST pin itself low briefly. I did not realize this.  I assumed a software reset was purely internal.

In my previous testing, I noticed that the LAN8742A PHY was recovering after resetting only the STM32, so I thought something in the startup initialization sequence was bringing the PHY back. However, I now realize that the PHY is also being hard reset.

(more details about this in latest reply below).

ORIGINAL MESSAGE:

Hello. We have a controller using an STM32F429 with a LAN8742A ethernet PHY. That part of the design is roughly based on the STM32F429 Discovery board. Long story short: what is the correct procedure for reinitializing everything related to ethernet and LWIP? We are experiencing ethernet lockups associated with electrical noise. Resetting the STM32 without resetting the PHY seems to recover communication, but since this controller is running a piece of equipment, we can't do a full reset while it is running.

Details:

We are running FreeRTOS with LWIP, with a communication protocol using a single UDP socket.  This runs great, but occasionally a controller will stop responding to UDP requests.  We have done extensive bench testing of leaving controllers running for days and they never lose communication.  However, these controllers are in HVAC units, so there is some electrical noise from contactors and other equipment.

I have been able to reproduce the issue with ESD discharges to a board.  About 1 out of 10 times, a light ESD shock to the board in the area of the ethernet jack and PHY is enough to lock up the ethernet so that it no longer responds.  Everything else on the controller is still running and otherwise working - the ethernet simply stops responding.

We do not have a separate hardware reset line for the PHY - it is tied to the global reset - so we are not able to hard reset the PHY.  Doing a software reset of the PHY does not help.

One clue is that I *was* able to recover communication by triggering a chip reset of the STM32. In this case, the LAN8742A is *not* hard reset, but the STM32 obviously goes through its startup and initialization. After this, the ethernet works. So I'm convinced that I should be able to re-initialize the ethernet-related stuff and recover without doing a full reset. However, this isn't quite working. I've done what I thought was needed to bring down and reinitialize everything ethernet-related, but after that process, the MACCR register is all 0s, so nothing is enabled. I suspect something is not in a state where it can be configured when I'm trying to configure it: either something doesn't have a clock when it needs one, or *does* have a clock when it shouldn't.

What is the correct sequence to fully re-initialize ethernet and LWIP as it would be after startup?  Here is what I have now:

void ethernet_phy_reset_thread(void const * argument)
{

  uint32_t oldRxCount = 0;
  ETH_MACConfigTypeDef MACConf = {0};
  for(;;)
  {

    phyResetTimer++;
    // FIXME: this is probably *not* the ideal way to determine if we are down
    if(getRxUnicastGoodFrameCount() != oldRxCount){
      oldRxCount = getRxUnicastGoodFrameCount();
      phyResetTimer = 0;
    }

    // if(phyResetTimer > PHY_RESET_INTERVAL * 60){
    // FIXME: short timer for testing
    if(phyResetTimer >10){
      phyResetInProgress = true;
      phyResetTimer = 0;
      phyResetCount++;
// FIXME: FIXME: FIXME: take out!!! - testing CPU reset(without hard resetting the PHY) to see if it is the CPU or the PHY that is locked
      // NVIC_SystemReset();
      // while(1);
     
     
      // Bring netif down
      netif_set_down(&gnetif);
      netif_set_link_down(&gnetif);

      // Deinit ETH peripheral
      HAL_ETH_DeInit(&heth);

      osDelay(2);
      HAL_ETH_MspDeInit(&heth);
      osDelay(2);
      // Optionally re-init ETH GPIO pins
      HAL_ETH_MspInit(&heth);
      osDelay(2);
      LAN8742_DeInit(&LAN8742);
      osDelay(2);
     
      // Toggle RMII interface
      SYSCFG->PMC &= ~SYSCFG_PMC_MII_RMII_SEL;
      osDelay(2);
      SYSCFG->PMC |= SYSCFG_PMC_MII_RMII_SEL;
      osDelay(2);
      LAN8742_Init(&LAN8742);
      osDelay(2);



      // Reinitialize DMA descriptors(must be done after de-initializing ETH peripheral)
      heth.Init.RxDesc = DMARxDscrTab;
      heth.Init.TxDesc = DMATxDscrTab;
      heth.Init.RxBuffLen = ETH_RX_BUFFER_SIZE;

      // Reinit ETH peripheral
      HAL_ETH_Init(&heth);

      // Reinit MAC config if needed
      HAL_ETH_GetMACConfig(&heth, &MACConf);
      MACConf.DuplexMode = ETH_FULLDUPLEX_MODE; // or as detected
      MACConf.Speed = ETH_SPEED_100M;           // or as detected
      HAL_ETH_SetMACConfig(&heth, &MACConf);

      // Start ETH interrupts
      HAL_ETH_Start_IT(&heth);

      // Bring netif up
      netif_set_up(&gnetif);
      netif_set_link_up(&gnetif);
     

      phyResetInProgress = false;
    }

    osDelay(1000);
  }
}
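The FIXME in the thread above notes that watching the good-unicast-frame counter is probably not the ideal way to detect a dead interface. The same idea can at least be isolated into a small, testable state machine. A minimal sketch, assuming the counter is a free-running `uint32_t` sampled once per poll (as `getRxUnicastGoodFrameCount()` appears to be); `rx_watchdog_t` and its functions are hypothetical names. Note that a quiet network also produces no unicast frames, so this only tells "idle" from "dead" if something (for example a periodic request from the host) guarantees traffic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports a stall when a free-running RX frame counter
 * has not changed for `limit` consecutive polls. */
typedef struct {
    uint32_t last_count;  /* counter value seen on the previous poll */
    uint32_t idle_polls;  /* consecutive polls without a change */
    uint32_t limit;       /* idle polls before reporting a stall */
} rx_watchdog_t;

void rx_watchdog_init(rx_watchdog_t *wd, uint32_t first_count, uint32_t limit)
{
    wd->last_count = first_count;
    wd->idle_polls = 0;
    wd->limit = limit;
}

/* Call once per poll with the current counter value. Returns true when
 * idle_polls reaches the limit, then restarts the count, so a persistent
 * stall is reported again every `limit` polls. Any change in the counter,
 * including a wrap from 0xFFFFFFFF to 0, counts as activity. */
bool rx_watchdog_poll(rx_watchdog_t *wd, uint32_t count)
{
    if (count != wd->last_count) {
        wd->last_count = count;
        wd->idle_polls = 0;
        return false;
    }
    if (++wd->idle_polls >= wd->limit) {
        wd->idle_polls = 0;
        return true;
    }
    return false;
}
```

In the thread above, a limit of 11 corresponds to the existing `phyResetTimer > 10` check with a 1-second `osDelay()`.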

 

Pavel A.
Super User

Why do you reset the PHY if you found that it is alive and only the STM32 should be reset? Reset of the PHY can take several seconds (with auto link detection) so avoid it if possible. Try to re-initialize the MAC like when the link goes down and up again: stop the ETH then start again. Check the PHY to ensure that it still senses the link.
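Checking whether the PHY still senses the link can be done over MDIO with the standard IEEE 802.3 Clause 22 Basic Status Register (register 1): bit 2 is Link Status and bit 5 is Auto-Negotiation Complete. Link Status latches low, so a single read after a glitch can report "down" even if the link has since recovered; reading twice gives the current state. A minimal host-runnable sketch, assuming a hypothetical `mdio_read_fn` callback (on the target it would wrap the HAL's PHY register read) plus a small fake so it runs off-target:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IEEE 802.3 Clause 22 Basic Status Register (register 1) bits. */
#define PHY_REG_BSR          1u
#define BSR_LINK_STATUS      (1u << 2) /* latches low until read */
#define BSR_AUTONEG_COMPLETE (1u << 5)

/* Hypothetical MDIO read callback. Returns 0 on success. */
typedef int (*mdio_read_fn)(uint32_t reg, uint32_t *value, void *ctx);

/* Returns 1 if the link is currently up, 0 if down, -1 on MDIO error.
 * The first read clears any latched "link was lost" indication; the second
 * read reflects the present state. */
int phy_link_is_up(mdio_read_fn rd, void *ctx, bool *autoneg_done)
{
    uint32_t bsr;
    if (rd(PHY_REG_BSR, &bsr, ctx) != 0) return -1; /* discard latched value */
    if (rd(PHY_REG_BSR, &bsr, ctx) != 0) return -1;
    if (autoneg_done) *autoneg_done = (bsr & BSR_AUTONEG_COMPLETE) != 0;
    return (bsr & BSR_LINK_STATUS) ? 1 : 0;
}

/* Host-side fake for running the sketch off-target: returns queued values. */
typedef struct { const uint32_t *vals; int n; int i; } fake_mdio_t;

int fake_read(uint32_t reg, uint32_t *value, void *ctx)
{
    fake_mdio_t *f = ctx;
    (void)reg;
    if (f->i >= f->n) return -1; /* simulate an MDIO failure */
    *value = f->vals[f->i++];
    return 0;
}
```

The register values used with the fake are illustrative only; on the board they would come from the LAN8742A itself.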

 

jp9
Associate II

Thank you for the reply.

I had added the PHY reset because I wanted to effectively do the full initialization sequence that is done on powerup.  If this takes a few extra seconds then that's not a problem.  My main goal is to be able to recover communication without having to physically power cycle the unit, and without having to do a full chip reset of the STM32. 

The project does have a "link thread" (see below) that detects a change in the link state and sets the MAC config based on what the PHY reports. With this, after the ESD event happens, the device switches from 100/FULL to 10/HALF. Presumably this is what the PHY renegotiates (or otherwise switches) to, and then the link thread changes the STM32's ethernet MAC to match what the PHY is reporting.

However, after this happens, I can no longer communicate with the device over ethernet. Previously I had discovered an issue with ST's ethernet driver (related to an erratum) that was preventing my device from working at 10/half, so I figured fixing that would fix my problem. However, I have verified that my device now communicates just fine at 10/half if I specifically configure the other end to use that. So it's not an issue specifically with 10/half. It's almost like there is a mismatch between what the PHY is doing and what the STM32's ethernet is doing.

If the link thread (below) is incorrect, then I can certainly change it to see if that fixes the issue. I only added the separate, full ethernet reset thread as a "nuclear" option to re-initialize everything ethernet-related, since the link thread itself didn't seem to do it.

Thanks!

 

void ethernet_link_thread( void const * argument )
{
//  ETH_MACConfigTypeDef MACConf = {0};
//  int32_t PHYLinkState = 0U;
  // MACConf and PHYLinkState are still used below, so presumably they are now declared at file scope.
  uint32_t linkchanged = 0U, speed = 0U, duplex = 0U;
  struct netif *netif = (struct netif *) argument;

  for(;;)
  {
    if(phyResetInProgress)
    {
      // If a PHY reset is in progress, don't read from the PHY until it is all done
      osDelay(100);
      continue;
    }
    PHYLinkState = LAN8742_GetLinkState(&LAN8742);

    if(netif_is_link_up(netif) && (PHYLinkState <= LAN8742_STATUS_LINK_DOWN))
    {
      // Link lost: stop the MAC and take the interface down
      PHYLinkStateDownCount++;
      HAL_ETH_Stop_IT(&heth);
      netif_set_down(netif);
      netif_set_link_down(netif);
    }
    else if(!netif_is_link_up(netif) && (PHYLinkState > LAN8742_STATUS_LINK_DOWN))
    {
      // Link came up: translate the PHY's resolved mode into MAC settings
      switch (PHYLinkState)
      {
      case LAN8742_STATUS_100MBITS_FULLDUPLEX:
        duplex = ETH_FULLDUPLEX_MODE;
        speed = ETH_SPEED_100M;
        linkchanged = 1;
        break;
      case LAN8742_STATUS_100MBITS_HALFDUPLEX:
        duplex = ETH_HALFDUPLEX_MODE;
        speed = ETH_SPEED_100M;
        linkchanged = 1;
        break;
      case LAN8742_STATUS_10MBITS_FULLDUPLEX:
        duplex = ETH_FULLDUPLEX_MODE;
        speed = ETH_SPEED_10M;
        linkchanged = 1;
        break;
      case LAN8742_STATUS_10MBITS_HALFDUPLEX:
        duplex = ETH_HALFDUPLEX_MODE;
        speed = ETH_SPEED_10M;
        linkchanged = 1;
        break;
      default:
        break;
      }

      if(linkchanged)
      {
        PHYLinkStateUpCount++;

        /* Read the current MAC config, apply the PHY's speed/duplex, restart the MAC */
        HAL_ETH_GetMACConfig(&heth, &MACConf);
        MACConf.DuplexMode = duplex;
        MACConf.Speed = speed;
        HAL_ETH_SetMACConfig(&heth, &MACConf);
        HAL_ETH_Start_IT(&heth);
        netif_set_up(netif);
        netif_set_link_up(netif);
        if(!dhcpEnabled.currentValue){
          setIpAddress();
        }
      }
    }

    osDelay(100);
  }
}
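One detail in `ethernet_link_thread` above: `linkchanged` is set in the `switch` but never cleared, so after the first link-up, a later status that is above `LAN8742_STATUS_LINK_DOWN` but not one of the four resolved modes (for example an "autonegotiation not done" status, if the driver returns one) would restart the MAC with the previous pass's speed/duplex. A minimal sketch of a mapping that reports "no resolved link" explicitly instead; the enums are hypothetical stand-ins for the LAN8742 driver status codes and the HAL speed/duplex constants:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the LAN8742 driver status codes and the HAL
 * speed/duplex constants; only the mapping logic matters here. */
typedef enum {
    PHY_LINK_DOWN,
    PHY_100_FULL,
    PHY_100_HALF,
    PHY_10_FULL,
    PHY_10_HALF,
    PHY_AUTONEG_NOT_DONE
} phy_status_t;

typedef enum { MAC_10M, MAC_100M } mac_speed_t;
typedef enum { MAC_HALF, MAC_FULL } mac_duplex_t;

/* Maps one PHY status to a MAC speed/duplex. Returns false (and leaves the
 * outputs untouched) for anything that is not a resolved link, so the caller
 * never starts the MAC with settings left over from a previous pass. */
bool phy_status_to_mac(phy_status_t st, mac_speed_t *speed, mac_duplex_t *duplex)
{
    switch (st) {
    case PHY_100_FULL: *speed = MAC_100M; *duplex = MAC_FULL; return true;
    case PHY_100_HALF: *speed = MAC_100M; *duplex = MAC_HALF; return true;
    case PHY_10_FULL:  *speed = MAC_10M;  *duplex = MAC_FULL; return true;
    case PHY_10_HALF:  *speed = MAC_10M;  *duplex = MAC_HALF; return true;
    default:           return false; /* link down or not yet resolved */
    }
}
```

In the existing thread, the simpler equivalent is to set `linkchanged = 0;` at the top of the `else if` branch.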

 

Pavel A.
Super User

"after the ESD event happens, the device will switch from 100/FULL to 10/HALF.  Presumably this is what the PHY renegotiates(or otherwise switches) to"

This is interesting. Of course, if the other side (a switch?) does not change its settings, the speed indicated by the PHY is wrong and this won't work. Can you look further into the PHY registers and tell whether it indicates a negotiation error? Can it be asked to repeat negotiation? If the PHY interrupt is connected to the STM32, try enabling it and see whether the interrupt is triggered when the link fails.
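In Clause 22 terms, asking the PHY to repeat negotiation means writing the Basic Control Register (register 0) with bit 12 (Auto-Negotiation Enable) and bit 9 (Restart Auto-Negotiation, self-clearing) set. The helper below only computes the value to write back after reading BCR; it also clears power-down (bit 11) and isolate (bit 10), which would otherwise keep the PHY from negotiating, and leaves soft reset (bit 15) clear. The function name is hypothetical.

```c
#include <stdint.h>

/* IEEE 802.3 Clause 22 Basic Control Register (register 0) bits. */
#define BCR_SOFT_RESET      (1u << 15)
#define BCR_AUTONEG_ENABLE  (1u << 12)
#define BCR_POWER_DOWN      (1u << 11)
#define BCR_ISOLATE         (1u << 10)
#define BCR_RESTART_AUTONEG (1u << 9)

/* Given the current BCR value, returns the value to write to (re)start
 * autonegotiation: AN enabled, restart requested, power-down and isolate
 * cleared, soft reset not requested. Other bits are preserved. */
uint16_t bcr_restart_autoneg(uint16_t bcr)
{
    bcr &= (uint16_t)~(BCR_SOFT_RESET | BCR_POWER_DOWN | BCR_ISOLATE);
    bcr |= (uint16_t)(BCR_AUTONEG_ENABLE | BCR_RESTART_AUTONEG);
    return bcr;
}
```

Whether the PHY then advertises 100/FULL still depends on its configured abilities, so a restart alone cannot get past a mode that has autonegotiation disabled.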

 

jp9
Associate II

Before adding the full "ethernet_phy_reset_thread" function (first post), I added code to "ethernet_link_thread" to tell the LAN8742A to renegotiate whenever it detected a link state change. It went through autonegotiation again (which takes a couple of seconds) and still came up as 10/HALF. I'm not sure why the PHY seems to get locked into that mode instead of negotiating back up to 100/FULL, as it started out. However, once it is in this state (after an ESD event it negotiates down to 10/HALF and doesn't respond), if I do a software chip reset of the STM32, the ethernet is back to 100/FULL when it comes back up.

So what is happening during a full reset of the STM32 that isn't happening when I reset only the ethernet-related portion?

I am monitoring the link state on the controller using the LAN8742_GetLinkState(...) function, and I also check it on the switch side. 

I have not been specifically checking the PHY's registers or the interrupt state, but I can look into that. I know no interrupts are being specifically enabled on the LAN8742A.

The fact that a full chip reset (of the STM32, not the PHY) recovers it almost makes it feel like either an IO pin configuration issue or a clock issue. On another microcontroller (an AVR), I have seen the port config bits actually get "scrambled" as a result of electrical noise. I'm not saying that's exactly what is going on in this case, but it's something I've considered.
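One way to test the "scrambled port configuration" theory without resetting anything is to snapshot the GPIO registers after the event and compare them with the expected RMII setup. On the STM32F4, MODER holds 2 bits per pin (0b10 selects alternate function) and AFR[0]/AFR[1] hold a 4-bit AF number per pin (pins 0-7 and 8-15); the Ethernet pins use AF11. The sketch below checks one pin from raw register values. PA1 (ETH_RMII_REF_CLK, mentioned later in the thread) is the example; the other RMII pins depend on the board, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GPIO_MODE_AF_BITS 0x2u  /* MODER field value for alternate function */
#define ETH_AF            11u   /* Ethernet alternate function on STM32F4 */

/* Returns true if `pin` is configured as alternate function `af`, given raw
 * snapshots of the port's MODER and AFR[0]/AFR[1] registers. */
bool gpio_pin_is_af(uint32_t moder, const uint32_t afr[2], unsigned pin, unsigned af)
{
    if (pin > 15u) return false;
    uint32_t mode = (moder >> (2u * pin)) & 0x3u;               /* 2 bits/pin */
    uint32_t sel  = (afr[pin / 8u] >> (4u * (pin % 8u))) & 0xFu; /* 4 bits/pin */
    return mode == GPIO_MODE_AF_BITS && sel == af;
}
```

On the target, the inputs would be `GPIOA->MODER` and `{GPIOA->AFR[0], GPIOA->AFR[1]}` from the CMSIS device header.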

Thanks again!

edit: to add to this - the nINT pin on the LAN8742A is being used for the ethernet REFCLK back to the STM32, so I don't have a dedicated interrupt pin to be able to read.  But I should be able to read the registers to see if any interrupt is being raised.
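Without a dedicated nINT pin, the LAN8742A's Interrupt Source Flag Register (register 29, 0x1D) can still be polled over MDIO, and a small decoder makes a register dump easier to read. The bit names below are as I recall them from ST's lan8742.h (`LAN8742_INT_1` to `LAN8742_INT_8`) and should be checked against the datasheet. The flags are likely cleared by the read, so it is safer to take one snapshot and decode that. The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* LAN8742A ISFR (register 29, 0x1D) bit names; verify against the datasheet. */
static const struct { uint16_t mask; const char *name; } isfr_bits[] = {
    { 0x0100u, "WoL event" },
    { 0x0080u, "ENERGYON" },
    { 0x0040u, "auto-negotiation complete" },
    { 0x0020u, "remote fault" },
    { 0x0010u, "link down" },
    { 0x0008u, "auto-negotiation LP acknowledge" },
    { 0x0004u, "parallel detection fault" },
    { 0x0002u, "auto-negotiation page received" },
};

/* Fills `out` with the names of the flags set in `isfr` (highest bit first)
 * and returns how many were written, never more than `max`. */
size_t isfr_decode(uint16_t isfr, const char *out[], size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof isfr_bits / sizeof isfr_bits[0] && n < max; i++) {
        if (isfr & isfr_bits[i].mask) out[n++] = isfr_bits[i].name;
    }
    return n;
}
```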

jp9
Associate II

Another interesting clue.  I put an oscilloscope on the REFCLK output of the LAN8742A.  Normally this outputs a clean 50MHz clock to the STM32.  After the ESD shock, the REFCLK signal goes away and just stays high, and does not recover.  

I'm not seeing any bits in any registers on the LAN8742A that would control this, but I do see where this clock output would be disabled if the PHY is in a "general powerdown" state, so I'll check for that.

Here is the relevant section of the schematic, showing the LAN8742A circuit.  

[Image: schematic of the STM32F429 / LAN8742A circuit]

Pavel A.
Super User

So is the refclk from the PHY used as the clock (HSE) to the STM32?

 

jp9
Associate II

The 50MHz clock (ETH_REF_CLK) is supplied from the LAN8742A to PA1 on the STM32F429 for the ETH_RMII_REF_CLK signal.


The STM32's HSE is a 25MHz crystal, with a 32.768kHz crystal for LSE.

When the ethernet communication loss occurs, the rest of the controller keeps running: IO still works and the touchscreen display still works. But ethernet stops working (presumably because it no longer has the 50MHz REF_CLK to sync with).
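A missing REF_CLK may also explain the all-zero MACCR from the first post. As I understand RM0090, the Ethernet DMA software-reset bit (bit 0 of ETH_DMABMR) only self-clears once the reset has completed in all of the MAC's clock domains, including the RMII-clocked ones, and `HAL_ETH_Init()` polls that bit with a timeout and returns `HAL_ERROR` before it writes the MAC configuration. One plausible reading, then, is that the re-init sequence fails at that step and the ignored return value hides it. A minimal host-side sketch of the same bounded wait (the function name is hypothetical; a fake register stands in for `ETH->DMABMR`):

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_DMABMR_SR_BIT (1u << 0)  /* DMA bus mode register: software reset */

/* Polls *reg until `mask` clears or `max_polls` reads have been made.
 * On the target, reg would point at ETH->DMABMR and the bound would be a
 * tick-based timeout. Returns true if the bits cleared in time. */
bool wait_bits_clear(const volatile uint32_t *reg, uint32_t mask, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((*reg & mask) == 0u) return true;
    }
    return false; /* still set: e.g. no RMII REF_CLK, so the reset never finishes */
}
```

On the board, checking `if (HAL_ETH_Init(&heth) != HAL_OK)` and looking at `heth.ErrorCode` (a timeout code, in the HAL version this thread appears to use) would confirm or rule this out.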

I've done some additional testing, reading all of the LAN8742A's registers to see what changes between working and not working. When it is working, SMR (the Special Modes Register) is 0x60E0, indicating all modes + autoneg available. After the event, SMR is 0x6000, indicating 10/half, autoneg disabled. This is configured using the strap resistors on the PHY; it is not set separately in software. I did add code to *force* these back to all modes if it detects this has changed, but I still don't have my 50MHz REF_CLK out.
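The two SMR values decode as follows, assuming the LAN8742A layout of MODE[2:0] in bits 7:5 and PHYAD[4:0] in bits 4:0: 0x60E0 is MODE 111 ("all capable, autoneg enabled") and 0x6000 is MODE 000 ("10BASE-T half duplex, autoneg disabled"), which matches the 10/HALF link seen after the event. A minimal decoder, with mode names paraphrased from the datasheet's MODE table (worth double-checking against your revision); the function names are hypothetical.

```c
#include <stdint.h>

/* LAN8742A Special Modes Register (register 18): MODE[2:0] = bits 7:5,
 * PHYAD[4:0] = bits 4:0. */
static const char *const smr_mode_names[8] = {
    "10BASE-T half duplex, autoneg disabled",
    "10BASE-T full duplex, autoneg disabled",
    "100BASE-TX half duplex, autoneg disabled",
    "100BASE-TX full duplex, autoneg disabled",
    "100BASE-TX half duplex advertised, autoneg enabled",
    "repeater mode, autoneg enabled",
    "power-down mode",
    "all capable, autoneg enabled",
};

unsigned smr_mode(uint16_t smr)  { return (smr >> 5) & 0x7u; }
unsigned smr_phyad(uint16_t smr) { return smr & 0x1Fu; }
const char *smr_mode_name(uint16_t smr) { return smr_mode_names[smr_mode(smr)]; }

/* Returns `smr` with MODE forced to 0b111 (all capable); other bits kept. */
uint16_t smr_force_all_capable(uint16_t smr)
{
    return (uint16_t)((smr & ~(0x7u << 5)) | (0x7u << 5));
}
```

My reading of the datasheet is that the PHY only acts on a newly written MODE value after a soft reset (BCR bit 15), so a forced write would normally be followed by one.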

The only other thing I can see that would kill the 50MHz REF_CLK is if the LAN8742A were told to power down, but it is not. I also added code to check this and make sure it is *not* in a power-down mode.
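There are two register-level power-down indications to check in a snapshot: the standard Clause 22 BCR power-down bit (register 0, bit 11) and the LAN8742A SMR MODE value 110 ("power-down mode", same MODE field as above). A minimal sketch that flags either one; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BCR_POWER_DOWN      (1u << 11) /* Clause 22 BCR (register 0) bit 11 */
#define SMR_MODE_POWER_DOWN 0x6u       /* LAN8742A SMR MODE[2:0] = 110 */

/* True if either the BCR power-down bit or the SMR power-down mode is
 * active, given raw snapshots of the two registers. */
bool phy_reports_power_down(uint16_t bcr, uint16_t smr)
{
    bool bcr_pd = (bcr & BCR_POWER_DOWN) != 0;
    bool smr_pd = ((smr >> 5) & 0x7u) == SMR_MODE_POWER_DOWN;
    return bcr_pd || smr_pd;
}
```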

I'm having a hard time scoping the 25MHz crystal on the PHY, as attaching the scope seems to throw it off quite a bit.  

 

jp9
Associate II

After doing some more research and testing, I've discovered that doing a software reset of the STM32 actually pulls the nRST pin itself low briefly. I did not realize this.  I assumed a software reset was purely internal.

In my previous testing, I noticed that the LAN8742A PHY was recovering after resetting only the STM32, so I thought something in the startup initialization sequence was bringing the PHY back. However, I now realize that the PHY is also being hard reset.

So I would still prefer not to have to reset the whole controller to recover the PHY, as I don't want to interrupt the operation of the equipment it is controlling.

Simply doing a software reset of the LAN8742A does not recover the REF_CLK output.  According to the datasheet, there are several bits in several registers that are not reset from a software reset, but I'm not seeing anything that would be specifically related to enabling or disabling the REF_CLK.  The only thing I can see that I can somewhat control that is the powerdown mode.  I've tried enabling and then disabling the powerdown mode, but this does not bring back the REF_CLK.
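For reference, a PHY soft reset is BCR (register 0) bit 15, which the PHY clears by itself when the reset has finished. Polling for that with a bound at least confirms the PHY is still answering over MDIO (which is clocked by the STM32's MDC, not by REF_CLK), even though, as described above, the soft reset does not bring REF_CLK back. A sketch with hypothetical `mdio_bus_t` callbacks and a host-side fake PHY:

```c
#include <stdint.h>

#define PHY_REG_BCR    0u
#define BCR_SOFT_RESET (1u << 15)  /* self-clearing when the reset completes */

/* Hypothetical MDIO accessors; on the target these would wrap the HAL's PHY
 * register read/write calls. Each returns 0 on success. */
typedef struct {
    int (*read)(void *ctx, uint32_t reg, uint32_t *val);
    int (*write)(void *ctx, uint32_t reg, uint32_t val);
    void *ctx;
} mdio_bus_t;

/* Sets BCR bit 15 and polls until the PHY clears it, at most max_polls reads.
 * Returns 0 on success, -1 on MDIO error, -2 on timeout. */
int phy_soft_reset(const mdio_bus_t *bus, uint32_t max_polls)
{
    uint32_t bcr;
    if (bus->read(bus->ctx, PHY_REG_BCR, &bcr) != 0) return -1;
    if (bus->write(bus->ctx, PHY_REG_BCR, bcr | BCR_SOFT_RESET) != 0) return -1;
    for (uint32_t i = 0; i < max_polls; i++) {
        if (bus->read(bus->ctx, PHY_REG_BCR, &bcr) != 0) return -1;
        if ((bcr & BCR_SOFT_RESET) == 0u) return 0;
    }
    return -2;
}

/* Host-side fake: after a reset request, BCR reads back with bit 15 set until
 * the busy_len-th read. busy_len == 0 models a PHY that never completes. */
typedef struct { uint32_t bcr; uint32_t busy_reads; uint32_t busy_len; } fake_phy_t;

int fake_bcr_read(void *ctx, uint32_t reg, uint32_t *val)
{
    fake_phy_t *p = ctx;
    (void)reg;
    if (p->busy_reads > 0u && --p->busy_reads == 0u) p->bcr &= ~BCR_SOFT_RESET;
    *val = p->bcr;
    return 0;
}

int fake_bcr_write(void *ctx, uint32_t reg, uint32_t val)
{
    fake_phy_t *p = ctx;
    (void)reg;
    p->bcr = val;
    if (val & BCR_SOFT_RESET) p->busy_reads = p->busy_len;
    return 0;
}
```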

I was also wondering if I could disable/enable the internal 1.2V regulator in software, and maybe this would recover it.  It looks like this is only controlled by the config strap resistors, but I was wondering if maybe one of the "undefined" bits in one of the registers controls this(or even the REF_CLK enable/disable)?

Thanks!

Pavel A.
Super User

It's probably worth contacting Microchip support, or asking on their forum, about this.

"doing a software reset of the STM32 actually pulls the nRST pin itself low briefly."

Yes. Designers should check carefully whether they want this outgoing reset to propagate to other devices, especially things like GPS receivers that take a long time to reach a working state.