STM32H743 Nucleo - Ethernet stops working

My project uses the CubeMX RTOS LwIP web server demo. It works fine, except that after a long time, often hours, the Ethernet peripheral stops working. The failure is random: sometimes recv_err = netconn_recv(conn, &inbuf); does not return ERR_OK, but far more often the ETH simply stops receiving. The ETH interrupt is no longer generated, and the MCU has to be reset. The RTOS is still alive; only the ETH peripheral seems to hang.

When this happens, I cannot see any ETH I/O registers in the debugger. The block looks disabled or hung and no longer responds. The bus interface to the ETH seems to be locked up.

It happens when I let the web browser display the dynamic Task View with an HTTP auto-refresh (every second) for a very long time.

The ETH MAC peripheral seems to hang. It always happens; only the time it keeps working is random. It dies after thousands of requests and packets and after hours of running well, so it should not be an 'out of memory' issue.

Shortly before it stops working, the web page auto-refresh seems to need much more time (a huge delay) before the request is answered. Nothing else is running, and the FW should be doing the same thing all the time.

ETH on the STM32H7 is too unreliable when running for a long time (endlessly) with periodic HTTP requests. I have no clue how to debug this or what to look for once the ETH stops working (it is no longer accessible via the debugger).
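
One way to at least catch the moment of failure is a small receive-stall monitor. The following is only a minimal sketch, not part of the HAL or LwIP: the type eth_rx_monitor_t and all three functions are hypothetical names, and the timestamps are assumed to come from a millisecond tick such as HAL_GetTick(). With an auto-refresh every second, a silence of several seconds on the receive path would be suspicious.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the STM32 HAL or LwIP. */
typedef struct {
    uint32_t last_rx_ms;   /* tick of the most recently received frame */
    uint32_t timeout_ms;   /* silence longer than this counts as a stall */
    uint32_t stall_count;  /* number of stalls detected so far */
} eth_rx_monitor_t;

void eth_rx_monitor_init(eth_rx_monitor_t *m, uint32_t now_ms, uint32_t timeout_ms)
{
    m->last_rx_ms = now_ms;
    m->timeout_ms = timeout_ms;
    m->stall_count = 0u;
}

/* Call from the receive path, wherever a frame is handed to the stack. */
void eth_rx_monitor_on_frame(eth_rx_monitor_t *m, uint32_t now_ms)
{
    m->last_rx_ms = now_ms;
}

/* Call periodically from a low-priority task. Returns true on a stall. */
bool eth_rx_monitor_check(eth_rx_monitor_t *m, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across tick wrap-around. */
    if ((uint32_t)(now_ms - m->last_rx_ms) > m->timeout_ms) {
        m->stall_count++;
        m->last_rx_ms = now_ms;  /* re-arm: report at most once per timeout */
        return true;
    }
    return false;
}
```

When eth_rx_monitor_check() returns true, the firmware could log the uptime and the number of requests served so far before resetting. Since the ETH registers are apparently no longer reachable at that point, the sketch deliberately does not touch the peripheral itself.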

12 REPLIES
JMund
Associate II

Just for the record, we experience the same issues with Ethernet: it becomes slow to respond before finally locking up and being reset by the WDT. Both CycloneTCP and LwIP show the same issue...

If anyone has a workaround please share...

Kind Regards,

Jon Mundall

Dear Jon,

I think the problem is a different one.

The ETH_SWRESET_TIMEOUT value in the stm32h7xx_hal_eth.c file is large.

Alternatively, I changed the WDT period to more than 8 seconds.
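
For picking a watchdog period such as the 8 seconds mentioned above, the reload value can be computed rather than guessed. This is only a sketch under assumptions: it assumes the independent watchdog (IWDG) clocked from the LSI at a nominal 32 kHz (which watchdog is actually used is not stated here), and the function name iwdg_reload_for_timeout and the macro IWDG_RELOAD_MAX are hypothetical. It uses the usual IWDG relation timeout = prescaler * (reload + 1) / f_LSI with a 12-bit reload register.

```c
#include <stdint.h>

#define IWDG_RELOAD_MAX 4095u  /* hypothetical name; the reload field is 12 bits */

/*
 * Returns the reload value for the requested timeout,
 * or -1 if the timeout does not fit with this prescaler.
 * timeout = prescaler * (reload + 1) / lsi_hz
 */
int32_t iwdg_reload_for_timeout(uint32_t timeout_ms, uint32_t prescaler, uint32_t lsi_hz)
{
    uint64_t num = (uint64_t)timeout_ms * lsi_hz;  /* 64-bit to avoid overflow */
    uint64_t den = (uint64_t)prescaler * 1000u;
    uint64_t ticks;

    if (den == 0u) {
        return -1;
    }
    ticks = (num + den - 1u) / den;  /* round up so the period is never shorter */
    if (ticks == 0u || (ticks - 1u) > IWDG_RELOAD_MAX) {
        return -1;
    }
    return (int32_t)(ticks - 1u);
}
```

With a prescaler of 256 and a 32 kHz LSI, a request for 8000 ms gives a reload value of 999. The LSI frequency varies noticeably between parts and over temperature, so some margin on top of the computed value is advisable.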

Best Regards,

I have not investigated further, but I am still 'thinking': there might be something going on in my/your network which hits 'a bug' (or rather a 'gap') in the TCP/IP stack implementation.

For instance: I use Power Line Modems in my home network, plus a mix of Apple devices, Android, Windows PCs ... Maybe these 'guys' send network packets which are 'non-standard', or even 'corrupted packets' might slip through (especially with Power Line Modems). The SW implementation is not able to cope with them, and the result is a hang: either the SW ends up in a wrong state because "this packet is not understood" properly, or a corrupted packet triggers a wrong action and state.

My first step to investigate further would be: use a very simple network. Ideally, use fixed IP addresses on the device and the PC and connect them with a direct cable (no switches, routers, gateways, no DNS or DHCP servers, no Audinate/AES67 audio in the network ...). Does it work?

If I change back to my "real" network, does it fail? What could be going on in my network (e.g. traffic like SNMP, Multicast, Broadcast ... which is not enabled in my SW TCP/IP stack but could be present in my network and hit my network stack software)?
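
To see what kind of traffic actually reaches the device in the "real" network, the receive path could count frames by destination type and EtherType. This is only a rough sketch: eth_frame_stats_t and eth_frame_stats_update are hypothetical names, and the function is assumed to be called with a pointer to the raw Ethernet frame (starting at the destination MAC) before it is passed to the stack. VLAN-tagged frames simply land in the "other" counter here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counters, not part of LwIP or the HAL. */
typedef struct {
    uint32_t unicast;
    uint32_t multicast;
    uint32_t broadcast;
    uint32_t ipv4;
    uint32_t arp;
    uint32_t ipv6;
    uint32_t other_type;
    uint32_t runt;  /* shorter than the 14-byte Ethernet header */
} eth_frame_stats_t;

void eth_frame_stats_update(eth_frame_stats_t *s, const uint8_t *frame, size_t len)
{
    static const uint8_t bcast[6] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    uint16_t ethertype;

    if (len < 14u) {
        s->runt++;
        return;
    }

    /* Destination MAC: all ones is broadcast, group bit set is multicast. */
    if (memcmp(frame, bcast, sizeof bcast) == 0) {
        s->broadcast++;
    } else if ((frame[0] & 0x01u) != 0u) {
        s->multicast++;
    } else {
        s->unicast++;
    }

    /* EtherType is big-endian at offset 12. */
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    switch (ethertype) {
    case 0x0800u: s->ipv4++; break;
    case 0x0806u: s->arp++; break;
    case 0x86DDu: s->ipv6++; break;
    default:      s->other_type++; break;
    }
}
```

Comparing these counters between the direct-cable setup and the "real" network would show whether the hang correlates with broadcast or multicast load or with unexpected EtherTypes.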

Everything I have debugged so far points to the same result: the network SW stack has entered a "deadlock", a wrong state, potentially due to 'seeing a network frame' which is not really expected, not handled properly, etc. So I partly blame my network (using 'non-common' components like Power Line Modems and Broadcast/Multicast streams) or the network stack software (which might not have been tested in every imaginable network situation).