STM32F7 lwIP, TCP, dupacks and delayed retransmit - w/ source code

LCE
Principal

Hello,

I'm working with an STM32F767, on a custom PCB and on a Nucleo-144. Both show the same behavior, so it's probably not hardware / board related.

Source code:

  • Bare metal, no OS
  • SAI to ETH, via DMA
  • ETH: zero copy Tx & Rx
  • HTTP server (TCP) works perfectly
  • PTP via UDP works perfectly
  • Debugging with UART, Wireshark, Jperf.

Problem:

  • streaming / sending TCP data from STM32 to Jperf server, 6.4 Mbit/s
  • every now and then there's a dupack, then fast retransmission, okay...
  • but usually after a few seconds there are lots of dupacks (>3), and after too many of them a retransmission occurs with a delay of > 400 ms, killing my application
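
For readers less familiar with the terms: fast retransmit is normally triggered by the third duplicate ACK (RFC 5681), while a retransmission that shows up hundreds of milliseconds later usually comes from the retransmission timer instead. Below is a minimal sketch of the duplicate-ACK counting only. `ack_tracker_t` and `ack_tracker_on_ack()` are hypothetical names for illustration, not lwIP's `struct tcp_pcb` or its API, and the sketch ignores sequence-number wraparound and the extra conditions a real stack checks (no payload, unchanged window, data outstanding).

```c
#include <stdint.h>
#include <stdbool.h>

#define DUPACK_THRESHOLD 3u   /* fast-retransmit threshold from RFC 5681 */

/* Hypothetical, simplified sender state (not lwIP's struct tcp_pcb). */
typedef struct {
    uint32_t last_ack;   /* most recent ACK number seen */
    uint8_t  dupacks;    /* consecutive duplicates of last_ack */
} ack_tracker_t;

/* Feed one received ACK number.
 * Returns true exactly once per loss event: when the third duplicate arrives. */
static bool ack_tracker_on_ack(ack_tracker_t *t, uint32_t ackno)
{
    if (ackno == t->last_ack) {
        if (t->dupacks < UINT8_MAX) {
            t->dupacks++;
        }
        return t->dupacks == DUPACK_THRESHOLD;
    }

    /* A different ACK number: treat it as new data acknowledged and reset. */
    t->last_ack = ackno;
    t->dupacks  = 0;
    return false;
}
```

If the dupack count in Wireshark goes far beyond three without a retransmission following, the sender either never saw those ACKs or could not get the retransmitted frame onto the wire.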

I played with the lwIP timers and the TCP retransmission timers. That reduced the delay to about 400 ms; before that it was sometimes 1.5 s.

I turned off everything in the source code that might block.

I changed all ETH transfers to zero copy, TX with interrupt for non-blocking PTPd.

Any ideas?

Edit: I uploaded the most important source code parts - the complete zero copy (mostly non-Cube/HAL) ethernetif.c / .h

Edit 2: I uploaded the updated source, having removed some blunders...

For now TCP is running smoothly at 25.6 Mbit/s.

1 ACCEPTED SOLUTION

Accepted Solutions
LCE
Principal

At least I found the cause of the late collision error:

It's the HAL / Cube setup of the LAN8742, or rather its reading back of the PHY settings.

I used the "old" STM32F7 Cube HAL stuff for setting this up - the last remaining part where I use HAL / Cube for ethernet...

It reads back the wrong PHY register and then sets the MAC's duplex mode wrongly in MACCR.

Aaaaaarrrghhh....!

I found that because the web told me that the most probable cause of a late collision error is that one side has the wrong duplex mode set.

Then I found that someone in another forum complained about the wrong register reading in HAL / Cube for LAN8742.

I hope that and the code above might help other people.

Here's the wrong and the corrected version from stm32f7xx_hal_eth.c:

#define PHY_REGNEW_SSR					((uint16_t)31)			/* PHY Special Control/Status Register (LAN8742 register 31) */
#define PHY_REGNEW_SSR_SPD_100M			((uint32_t)1 <<  3)		/* Speed Indication field (bits 4:2), bit 3: 100BASE-TX */
#define PHY_REGNEW_SSR_DUPL_FULL		((uint32_t)1 <<  4)		/* Speed Indication field (bits 4:2), bit 4: full duplex */
 
#if( 0 )
/* 2022-10-01
 * HAL reading back wrong register for status of LAN8742!
 */
	/* WRONG */
		/* Read the result of the auto-negotiation */
		if((HAL_ETH_ReadPHYRegister(heth, PHY_SR, &phyreg)) != HAL_OK)
		{
			/* In case of write timeout */
			err = ETH_ERROR;
 
			/* Config MAC and DMA */
			ETH_MACDMAConfig(heth, err);
 
			/* Set the ETH peripheral state to READY */
			heth->State = HAL_ETH_STATE_READY;
 
			/* Return HAL_ERROR */
			return HAL_ERROR;
		}
 
		/* Configure the MAC with the Duplex Mode fixed by the auto-negotiation process */
		if((phyreg & PHY_DUPLEX_STATUS) != (uint32_t)RESET)
		{
			/* Set Ethernet duplex mode to Full-duplex following the auto-negotiation */
			(heth->Init).DuplexMode = ETH_MODE_FULLDUPLEX;
		}
		else
		{
			/* Set Ethernet duplex mode to Half-duplex following the auto-negotiation */
			(heth->Init).DuplexMode = ETH_MODE_HALFDUPLEX;
		}
		/* Configure the MAC with the speed fixed by the auto-negotiation process */
		if((phyreg & PHY_SPEED_STATUS) == PHY_SPEED_STATUS)
		{
			/* Set Ethernet speed to 10M following the auto-negotiation */
			(heth->Init).Speed = ETH_SPEED_10M;
		}
		else
		{
			/* Set Ethernet speed to 100M following the auto-negotiation */
			(heth->Init).Speed = ETH_SPEED_100M;
		}
#else
	/* CORRECT */
		/* Read the result of the auto-negotiation */
		if( HAL_ETH_ReadPHYRegister(heth, PHY_REGNEW_SSR, &phyreg) != HAL_OK )
		{
			/* In case of read timeout */
			err = ETH_ERROR;
 
			/* Config MAC and DMA */
			ETH_MACDMAConfig(heth, err);
 
			/* Set the ETH peripheral state to READY */
			heth->State = HAL_ETH_STATE_READY;
 
			/* Return HAL_ERROR */
			return HAL_ERROR;
		}
 
		/* Configure the MAC with the Duplex Mode fixed by the auto-negotiation process */
		if( phyreg & PHY_REGNEW_SSR_DUPL_FULL )
		{
			/* Set Ethernet duplex mode to Full-duplex following the auto-negotiation */
			(heth->Init).DuplexMode = ETH_MODE_FULLDUPLEX;
		}
		else
		{
			/* Set Ethernet duplex mode to Half-duplex following the auto-negotiation */
			(heth->Init).DuplexMode = ETH_MODE_HALFDUPLEX;
		}
		/* Configure the MAC with the speed fixed by the auto-negotiation process */
		if( phyreg & PHY_REGNEW_SSR_SPD_100M )
		{
			/* Set Ethernet speed to 100M following the auto-negotiation */
			(heth->Init).Speed = ETH_SPEED_100M;
		}
		else
		{
			/* Set Ethernet speed to 10M following the auto-negotiation */
			(heth->Init).Speed = ETH_SPEED_10M;
		}
#endif
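
The corrected branch tests two single bits. As a cross-check, here is a minimal sketch that decodes the whole Speed Indication field (bits 4:2 of LAN8742 register 31) and derives the matching MACCR bits. The function and type names are hypothetical. The field encoding and the MACCR bit positions (DM = bit 11, FES = bit 14) are taken from my reading of the LAN8742 datasheet and the STM32F7 reference manual, so please verify them against your documents.

```c
#include <stdint.h>
#include <stdbool.h>

/* LAN8742 register 31, Speed Indication field (bits 4:2), assumed encoding:
 *   001 = 10BASE-T half duplex     101 = 10BASE-T full duplex
 *   010 = 100BASE-TX half duplex   110 = 100BASE-TX full duplex */
#define SSR_SPEED_IND_SHIFT  2u
#define SSR_SPEED_IND_MASK   0x7u

/* MACCR bits (assumed positions, check the reference manual). */
#define MACCR_DM   ((uint32_t)1 << 11)   /* duplex mode: 1 = full duplex */
#define MACCR_FES  ((uint32_t)1 << 14)   /* fast ethernet speed: 1 = 100 Mbit/s */

typedef struct {
    bool valid;        /* false if the field holds no known link mode */
    bool speed_100m;
    bool full_duplex;
} link_mode_t;

/* Decode the negotiated link mode from the raw register value. */
static link_mode_t lan8742_decode_ssr(uint32_t phyreg)
{
    link_mode_t m = { false, false, false };

    switch ((phyreg >> SSR_SPEED_IND_SHIFT) & SSR_SPEED_IND_MASK) {
    case 0x1: m.valid = true;                                            break;
    case 0x5: m.valid = true; m.full_duplex = true;                      break;
    case 0x2: m.valid = true; m.speed_100m = true;                       break;
    case 0x6: m.valid = true; m.speed_100m = true; m.full_duplex = true; break;
    default:  break;   /* e.g. auto-negotiation not finished yet */
    }
    return m;
}

/* Return maccr with DM and FES updated; unchanged if the mode is invalid. */
static uint32_t maccr_apply_link(uint32_t maccr, link_mode_t m)
{
    if (!m.valid) {
        return maccr;
    }
    maccr &= ~(MACCR_DM | MACCR_FES);
    if (m.full_duplex) { maccr |= MACCR_DM;  }
    if (m.speed_100m)  { maccr |= MACCR_FES; }
    return maccr;
}
```

Rejecting unknown field values avoids silently falling back to 10M half duplex when the register is read before auto-negotiation has completed.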

8 REPLIES 8
LCE
Principal

Just added the most important source code parts ...

By "killing my application" I just mean that the data transfer from SAI to ETH stops, because the RAM is not big enough to buffer such a long gap.

When I stop SAI DMA and restart ETH TX, it works again for a few seconds.

Further observations:

  • It stops much quicker with a higher data rate
  • I checked every function with the CPU cycle counter; nothing blocks long enough to explain the delayed retransmission
  • I see that the TCP PCB's send buffer (snd_buf) fills up, and so do the unsent and unacked queues
  • after the delayed retransmission, these queues are perfectly "emptied" as they should
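
For anyone who wants to repeat the cycle-counter check, here is a minimal sketch of the arithmetic, assuming the counter is the Cortex-M DWT cycle counter (CMSIS `DWT->CYCCNT`) and a core clock of 216 MHz. Both are assumptions, and the helper names are hypothetical. The hardware reads are only shown in comments so the helpers stay testable on a PC.

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction gives the right result across a single wraparound. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to microseconds for a given core clock in Hz. */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000u) / core_hz);
}

/* On the target this would be used roughly like this (CMSIS names):
 *
 *   uint32_t t0 = DWT->CYCCNT;
 *   function_under_test();
 *   uint32_t us = cycles_to_us(cycles_elapsed(t0, DWT->CYCCNT), 216000000u);
 *
 * The counter has to be enabled once beforehand, and at 216 MHz it wraps
 * about every 19.9 s, so only shorter intervals can be measured this way. */
```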
LCE
Principal

So, tcp_output() somehow doesn't work on the unsent queues.

I put some error messages in tcp_output() / tcp_output_segment(), nothing.

I checked the lwIP mem stats, all below maximum.

I read back the PHY registers, 100M full duplex.

The problem right now is that I have the feeling that I have looked everywhere...

Not everywhere:

I just started checking TX descriptor word 0 (the status word) for errors:

packets get lost / TX is aborted because of a late collision error.

Okay, at least I know why the packet's gone...

But that still doesn't explain this long retransmission delay.
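
A minimal sketch of such a status check on a completed TX descriptor follows. The bit positions are the ones I know from the STM32F7 reference manual for TDES0 (OWN = bit 31, ES = bit 15, LCA = bit 11, NC = bit 10, LCO = bit 9, EC = bit 8); treat them as assumptions and verify them. `tx_err_stats_t` and `tx_desc_check()` are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

/* TDES0 status bits (assumed positions, check the reference manual). */
#define TDES0_OWN  ((uint32_t)1 << 31)  /* descriptor still owned by the DMA */
#define TDES0_ES   ((uint32_t)1 << 15)  /* error summary */
#define TDES0_LCA  ((uint32_t)1 << 11)  /* loss of carrier */
#define TDES0_NC   ((uint32_t)1 << 10)  /* no carrier */
#define TDES0_LCO  ((uint32_t)1 <<  9)  /* late collision */
#define TDES0_EC   ((uint32_t)1 <<  8)  /* excessive collisions */

/* Hypothetical error counters, e.g. for printing over UART. */
typedef struct {
    uint32_t late_collision;
    uint32_t excessive_collision;
    uint32_t carrier;
    uint32_t other;
} tx_err_stats_t;

/* Inspect the status word of one TX descriptor.
 * Returns true if the frame finished with an error and updates the counters. */
static bool tx_desc_check(uint32_t tdes0, tx_err_stats_t *stats)
{
    if ((tdes0 & TDES0_OWN) != 0u) {
        return false;                     /* not finished yet, status invalid */
    }
    if ((tdes0 & TDES0_ES) == 0u) {
        return false;                     /* sent without error */
    }

    if (tdes0 & TDES0_LCO) {
        stats->late_collision++;
    } else if (tdes0 & TDES0_EC) {
        stats->excessive_collision++;
    } else if (tdes0 & (TDES0_LCA | TDES0_NC)) {
        stats->carrier++;
    } else {
        stats->other++;
    }
    return true;
}
```

Any collision-related error on a link that is supposed to be full duplex points to a duplex mismatch between MAC and PHY, or between the two link partners.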

LCE
Principal

The problem with lwIP's super late retransmission remains...

LCE
Principal

The problem with lwIP's super late retransmission still remains...

But it's not that urgent anymore, because I haven't seen a retransmission for some days now.

I have uploaded some updated source code for those who are interested in zero-copy ethernet IO for STM32F7 without OS, and mostly without HAL.

I stole it myself from everywhere and mixed it up... :D

Hi LCE,

I'm trying to use lwIP with zero-copy Tx on an STM32H7. Did you succeed in doing it? Do you have a code example?

For now, when I use zero-copy Tx, I have an issue in my implementation of low_level_output: I don't have enough DMA descriptors available.
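
With zero-copy Tx every pbuf in a chain needs its own DMA descriptor, so one common approach is to count the chain first and refuse the frame if not enough descriptors are free. A minimal sketch, assuming that approach: `struct pbuf_stub` is a stand-in for the two fields of lwIP's `struct pbuf` used here, and the function names are hypothetical. In real code lwIP's `pbuf_clen()` should give the same count.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Stand-in for lwIP's struct pbuf, reduced to the fields needed here. */
struct pbuf_stub {
    struct pbuf_stub *next;   /* next buffer in the chain, NULL at the end */
    uint16_t          len;    /* payload length of this buffer */
};

/* Number of buffers in the chain = number of TX descriptors required. */
static uint16_t chain_len(const struct pbuf_stub *p)
{
    uint16_t n = 0;
    while (p != NULL) {
        n++;
        p = p->next;
    }
    return n;
}

/* True if the whole chain fits into the descriptors that are currently free.
 * If it does not, the output function should return an error rather than
 * overwrite descriptors the DMA still owns. */
static bool tx_can_queue(const struct pbuf_stub *p, uint16_t free_descriptors)
{
    return chain_len(p) <= free_descriptors;
}
```

TCP segments sent without copying typically arrive as chains of at least two pbufs (headers plus referenced payload), so the TX ring usually needs more descriptors than the number of frames in flight.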

Hi,

I reduced the retransmission time by shortening the TCP timer interval in tcp_priv.h:

#ifndef TCP_TMR_INTERVAL
#define TCP_TMR_INTERVAL 25 /* The TCP timer interval in milliseconds; user-modified, was 250. */
#endif /* TCP_TMR_INTERVAL */
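
Why this helps, as far as I read the lwIP sources: tcp_priv.h derives the fast timer interval as TCP_TMR_INTERVAL and the slow timer interval as 2 * TCP_TMR_INTERVAL, and the retransmission timeout is counted in slow timer ticks. With the default of 250 ms a timeout can therefore only expire in 500 ms steps. The sketch below just does that arithmetic; the function names are hypothetical, and the tick model is my assumption about lwIP's behavior.

```c
#include <stdint.h>

/* Fast timer interval in ms (assumed to equal TCP_TMR_INTERVAL). */
static uint32_t tcp_fast_interval_ms(uint32_t tmr_interval_ms)
{
    return tmr_interval_ms;
}

/* Slow timer interval in ms (assumed to be 2 * TCP_TMR_INTERVAL). */
static uint32_t tcp_slow_interval_ms(uint32_t tmr_interval_ms)
{
    return 2u * tmr_interval_ms;
}

/* Retransmission timeout in ms for a timeout expressed in slow timer ticks. */
static uint32_t rto_ms(uint32_t rto_ticks, uint32_t tmr_interval_ms)
{
    return rto_ticks * tcp_slow_interval_ms(tmr_interval_ms);
}

/* Because the definition is wrapped in #ifndef, the value can presumably be
 * set in lwipopts.h instead of editing tcp_priv.h:
 *
 *   #define TCP_TMR_INTERVAL 25
 */
```

With 25 ms the resolution of the retransmission timer becomes 50 ms, at the cost of calling the TCP timer functions ten times as often.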

I use an STM32H725IGK6 + ADIN1100 PHY. Code generated by CubeMX. No RTOS, with ETH and lwIP.

Another timing improvement was to disable the Nagle algorithm (so that short messages are sent immediately) by calling tcp_nagle_disable(pcb) on the connection's PCB. lwIP defines it as:

#define tcp_nagle_disable(pcb) tcp_set_flags(pcb, TF_NODELAY)