Program stuck in netconn_connect() - possible solution

_AdamNtrx
Associate III

Hello! I recently discovered that my program (an STM32 TCP client) gets stuck in the netconn_connect() function for about 5 minutes if the server I want to connect to is unavailable when the function is called. Only after that time does the program enter the

if ((tcp_rexmit_rto_prepare(pcb) == ERR_OK) || ((pcb->unacked == NULL) && (pcb->unsent != NULL))) {

statement (located in the tcp_slowtmr() function). Once that happens, the debugger shows the following:

[Image: TCP PCB shown in debugger]

It's the tcp_rexmit_rto_prepare(pcb) condition that lets the program enter the if statement. The debugger also shows that nrtx is equal to 0, which means there were no retransmissions during those 5 minutes (rtime=575 slow-timer ticks, which at the default 500 ms per tick is about 287 seconds).
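
The conversion from rtime to seconds can be double-checked with a few lines of C. This is only a minimal sketch: it assumes lwIP's default TCP_TMR_INTERVAL of 250 ms (so tcp_slowtmr() runs every 500 ms), and the macro and function names are local stand-ins rather than lwIP symbols. If lwipopts.h overrides the timer interval, the result changes accordingly.

```c
#include <stdint.h>

/* Assumed lwIP defaults: the TCP timer fires every 250 ms and
 * tcp_slowtmr() runs on every second tick, i.e. every 500 ms.
 * Both macros are local stand-ins for TCP_TMR_INTERVAL and
 * TCP_SLOW_INTERVAL, not the real lwIP definitions. */
#define ASSUMED_TCP_TMR_INTERVAL_MS  250u
#define ASSUMED_TCP_SLOW_INTERVAL_MS (2u * ASSUMED_TCP_TMR_INTERVAL_MS)

/* Convert a pcb->rtime value (counted in slow-timer ticks) to milliseconds.
 * A negative value is treated here as "retransmission timer stopped". */
static uint32_t rtime_ticks_to_ms(int16_t rtime_ticks)
{
    if (rtime_ticks < 0) {
        return 0u;
    }
    return (uint32_t)rtime_ticks * ASSUMED_TCP_SLOW_INTERVAL_MS;
}
```

With the value from the debugger, rtime_ticks_to_ms(575) gives 287500 ms, which matches the roughly 287 seconds mentioned above.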

Interestingly, the program can be "unfrozen" much sooner by pinging the client device or by replugging the Ethernet cable: netconn_connect() then finally returns and the device can try to connect to the server again. The netconn I used was of the blocking type. A non-blocking one leaves netconn_connect() immediately and returns ERR_ISCONN, but that doesn't help with reestablishing the connection to the server if netconn_connect() was called while the server wasn't available.

Possible solution

I tried many things to fix the problem, including disabling the MPU (which I had configured according to this guide) and changing all LwIP semaphore timeouts from infinite to a few hundred milliseconds. What finally helped was changing the ethernetif_input() function in ethernetif.c:

/**
 * @brief This function should be called when a packet is ready to be read
 * from the interface. It uses the function low_level_input() that
 * should handle the actual reception of bytes from the network
 * interface. Then the type of the received packet is determined and
 * the appropriate input function is called.
 *
 * @param argument pointer to the lwIP network interface structure (struct netif) for this ethernetif
 */
void ethernetif_input(void* argument)
{
  struct pbuf *p = NULL;
  struct netif *netif = (struct netif *) argument;

/* OLD FOR LOOP */
//  for( ;; )
//  {
//    if (osSemaphoreAcquire(RxPktSemaphore, TIME_WAITING_FOR_INPUT) == osOK)
//    {
//      do
//      {
//        p = low_level_input( netif );
//        if (p != NULL)
//        {
//          if (netif->input( p, netif) != ERR_OK )
//          {
//            pbuf_free(p);
//          }
//        }
//      } while(p!=NULL);
//    }
//  }


/* NEW FOR LOOP */
  for( ;; )
  {
    /* The return value is not checked, so the code below also runs when the wait times out. */
    osSemaphoreAcquire(RxPktSemaphore, TIME_WAITING_FOR_INPUT/*100*/); //both timeouts seem to work fine

    LOCK_TCPIP_CORE();
    HAL_ETH_ReleaseTxPacket(&heth); //release earlier transmitted packets
    UNLOCK_TCPIP_CORE();

    do
    {
      p = low_level_input(netif);
      if (p != NULL)
      {
        if (netif->input(p, netif) != ERR_OK)
        {
          pbuf_free(p);
        }
      }
    } while(p != NULL);
  }
}

Calling HAL_ETH_ReleaseTxPacket() before the do...while loop finally allowed the program to retransmit the SYN packet properly and not get stuck in netconn_connect(). As far as I understand, the old SYN packet, sent while the server was down, was still held by the Ethernet DMA and kept a reference to the output segment. That made tcp_rexmit_rto_prepare() return ERR_VAL, so tcp_slowtmr() would not enter the if statement mentioned at the beginning of this post. HAL_ETH_ReleaseTxPacket() makes the program acknowledge that the SYN packet has been sent, so retransmitting it becomes possible. This also explains why pinging helps: HAL_ETH_ReleaseTxPacket() is called whenever the device transmits (in low_level_output()), so the STM32 releases old Tx packets when it responds to a ping.
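
To illustrate the mechanism as I understand it, here is a small stand-alone model in C. It is a sketch under assumptions, not lwIP or HAL code: all model_* names are hypothetical, the error values are arbitrary, and the "busy" test is assumed to boil down to checking whether the first pbuf of the unacked segment has a reference count other than 1 (that is, whether the driver still holds a reference taken when the frame was queued for transmission).

```c
#include <stddef.h>

/* Simplified stand-ins for the lwIP structures; only the fields needed here. */
struct model_pbuf { unsigned ref; };            /* reference count of the packet buffer */
struct model_seg  { struct model_pbuf *p; };    /* one queued TCP segment */
struct model_pcb  { struct model_seg *unacked; };

/* Arbitrary values for this model, not the lwIP error codes. */
enum { MODEL_ERR_OK = 0, MODEL_ERR_VAL = -1 };

/* Assumption: the driver takes an extra reference while the frame
 * sits in a DMA descriptor waiting to be reported as sent. */
static void model_driver_transmit(struct model_pbuf *p)
{
    p->ref++;
}

/* Models the effect attributed to HAL_ETH_ReleaseTxPacket(): the
 * completed frame is handed back and the driver's reference is dropped. */
static void model_driver_release(struct model_pbuf *p)
{
    if (p->ref > 1u) {
        p->ref--;
    }
}

/* Models the check described for tcp_rexmit_rto_prepare(): refuse to
 * requeue the segment while someone other than the stack references it. */
static int model_rexmit_rto_prepare(const struct model_pcb *pcb)
{
    if (pcb->unacked == NULL) {
        return MODEL_ERR_VAL;   /* nothing to retransmit */
    }
    if (pcb->unacked->p->ref != 1u) {
        return MODEL_ERR_VAL;   /* segment still busy in the driver */
    }
    return MODEL_ERR_OK;
}
```

In this model, a segment whose pbuf went through model_driver_transmit() keeps failing model_rexmit_rto_prepare() until model_driver_release() is called. That mirrors the observed behaviour: no retransmission happens until something triggers the release of the old Tx packet.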

EDIT 1: A short, finite timeout in the new for loop of ethernetif_input() seems to be more reliable and "unfreezes" the program faster than a long or infinite timeout.
