cancel
Showing results for 
Search instead for 
Did you mean: 

32F417 - how to prevent low_level_output() (in ethernetif.c) overwriting the last DMA buffer?

PHolt.1
Senior III

It seems that one has to wait on the three TPS bits = 000 in DMASR.

I fixed the problem by putting

 if ( (((heth->Instance)->DMASR) & (0x7 << 20)) != 0 ) {}

at the end of HAL_ETH_TransmitFrame().

That loop should not hang unless the silicon is defective - right?

This is the code which is all over the internet, and I think the check of whether the buffer is available is in the wrong place. It should be BEFORE the memcpy().

      /* Check if the length of data to copy is bigger than Tx buffer size*/
      while( (byteslefttocopy + bufferoffset) > ETH_TX_BUF_SIZE )
      {
        /* Copy data to Tx buffer*/
        memcpy( (u8_t*)((u8_t*)buffer + bufferoffset), (u8_t*)((u8_t*)q->payload + payloadoffset), (ETH_TX_BUF_SIZE - bufferoffset) );
 
        /* Point to next descriptor */
        DmaTxDesc = (ETH_DMADESCTypeDef *)(DmaTxDesc->Buffer2NextDescAddr);
 
        /* Check if the buffer is available */
        if((DmaTxDesc->Status & ETH_DMATxDesc_OWN) != (u32)RESET)
        {
          errval = ERR_USE;
          goto error;
        }
 
        buffer = (u8 *)(DmaTxDesc->Buffer1Addr);
 
        byteslefttocopy = byteslefttocopy - (ETH_TX_BUF_SIZE - bufferoffset);
        payloadoffset = payloadoffset + (ETH_TX_BUF_SIZE - bufferoffset);
        framelength = framelength + (ETH_TX_BUF_SIZE - bufferoffset);
        bufferoffset = 0;
      }

13 REPLIES 13
Piranha
Chief II

> The requirement for a full printf is due to needing uint63 and float support, ... However I am not sure if there is a full-feature printf which doesn't use the heap, and is itself thread-safe.

This is an absolute proof that you haven't even read the link I gave you about it. And you have not looked at another two implementations given to you by people on EEVblog forum, which, by the way, are the same two implementations recommended by Dave Nadler. Though one of the ones given by me, is anyway based on those other two, but has evolved further. So, you have an endless time to spend on fighting with a code written by idiots, but you have no time to read and learn about a better solutions people give you all around the internet?

> Regarding low_level_input, my mutexes are based on your code here

But, of course, you haven't read my latest post in that topic, written specifically as an answer for you.

> I guess LWIP monitors the input buffer structure to see if new data has arrived?

I literally answered that in my previous post in this topic.

> However, I never managed to work out what controls LWIP's tx buffers.

There are no Tx specific buffers in lwIP itself. Any PBUF, that has to be transmitted, becomes a "Tx buffer".

> My code (implemented by someone else; I don't really understand it) came from the ST Cube examples. It seems to work ok. Do you think there is something wrong with it?

It's an idiot code by ST and it's documented in my Ethernet issue list topic what is wrong with it. Wasn't it said million times all around the internet and thousand times for you specifically that in RTOS environment calling lwIP raw APIs without core locking is forbidden?

> Maybe a mutex is needed somewhere. I see ethernetif_update_config is indeed called from another RTOS thread.

Your kde_DHCP_task() is another thread - it's not the lwIP core thread! And it's not some arbitrary mutex, which you have to use, but the special core locking macro.

How many times everyone have to repeat the same things over and over again for you to actually read that information?

PHolt.1
Senior III

OK thanks. I have removed mutexes from low_level_input and low_level_output and all is running ok.

"Your kde_DHCP_task() is another thread - it's not the lwIP core thread! And it's not some arbitrary mutex, which you have to use, but the special core locking macro."

I am using LWIP core locking, so isn't that ok now?

"> I guess LWIP monitors the input buffer structure to see if new data has arrived?

I literally answered that in my previous post in this topic."

I am sorry but I am a bit stupid, and I cannot find it.

"There are no Tx specific buffers in lwIP itself. Any PBUF, that has to be transmitted, becomes a "Tx buffer"."

That is news to me, and I have read a vast amount of stuff on LWIP. Much of it talks about lwipopts.h values and such. But it doesn't matter, because what I have works, I am nowhere near clever enough to write the code for zero-copy operation.

Regarding performance, sure, UDP is one-way. But 250kbytes/sec is ok for now. Any dramatic speedup would need interrupt-based rx, but that can be done anytime later.

My SPI flash filesystem has a read speed of 1000kbytes/sec and a write speed of 30kbytes/sec. I have done extensive tests with an SPI RAM and the speed is 2000kbytes/sec. If you built a FatFS filesystem in there the speed would be more like 1000 again. So I am not far off.

Regarding my too-fast input concern: in this architecture, packet filtering is not done at a low level. It is done inside LWIP. So everything seen at the RJ45 socket goes all the way up to LWIP. Doesn't this cause a problem, if you are on a LAN with loads of devices on it? Maybe an ethernet switch doesn't route packets to the box unless addressed by MAC# ?

Another factor for zero-copy is that the RTOS uses the 64k CCM and that doesn't support DMA. AFAIK, LWIP buffers are not located within any of the RTOS stacks,

If somebody would like to have a go at overhauling my code, I am happy to pay for their time, make a donation to their favourite cat rescue charity, etc : - )

PHolt.1
Senior III

I have restructured the code so that the link change is done in the same RTOS task as the low level input check, which is done at 100Hz, and the link change check is done at 1Hz. This should avoid any thread safety issue.

/**
  * This function is the ethernetif_input task. It uses the function low_level_input()
  * that should handle the actual reception of bytes from the network
  * interface. Then the type of the received packet is determined and
  * the appropriate input function is called.
  *
  * This is a standalone RTOS task so is a forever loop.
  * Could be done with interrupts but then we have the risk of hanging the KDE with fast input.
  * Mutexes removed, based on
  * https://community.st.com/s/question/0D73W000001P2M3SAK
  *
  */
void ethernetif_input( void * argument )
{
	struct pbuf *p;
	struct netif *netif = (struct netif *) argument;
	uint32_t link_change_check_count = 1000/ETH_IN_POLL_PERIOD;	// to get 1000ms = 1 sec
 
	do
    {
		p = low_level_input( netif );
 
		if (p!=NULL)
		{
			if (netif->input( p, netif) != ERR_OK )
			{
				pbuf_free(p);
			}
		}
 
		osDelay(10);	// this has a dramatic effect on ETH speed, both ways
 
		// Do ETH link status change check
 
		link_change_check_count--;
		if (link_change_check_count==0)
		{
			// reload counter
			link_change_check_count = 1000/ETH_IN_POLL_PERIOD;
 
			// Get most recently recorded link status
			bool net_up = netif_is_link_up(&g_netconf_netif);
 
			// Read the physical link status
			ethernetif_set_link(&g_netconf_netif);
 
			// Has the link status changed
			if (net_up != netif_is_link_up(&g_netconf_netif))
			{
				ethernetif_update_config(&g_netconf_netif);
 
				if (net_up) {
				   // Link was up so must have dropped
				   debug_thread("Ethernet link down");
				}
				else {
				   // Link was down so must be up - restart DHCP
				   debug_thread("Ethernet link up");
				   restart_DHCP();
				}
			}
		}
 
    } while(true);
 
}
 
 
/**
  * @brief Should allocate a pbuf and transfer the bytes of the incoming
  * packet from the interface into the pbuf.
  *
  * @param netif the lwip network interface structure for this ethernetif
  * @return a pbuf filled with the received packet (including MAC header)
  *         NULL on memory error
  */
static struct pbuf * low_level_input(struct netif *netif)
{
  struct pbuf *p = NULL, *q = NULL;
  uint16_t len = 0;
  uint8_t *buffer;
  __IO ETH_DMADescTypeDef *dmarxdesc;
  uint32_t bufferoffset = 0;
  uint32_t payloadoffset = 0;
  uint32_t byteslefttocopy = 0;
  uint32_t i=0;
  
  /* get received frame */
 
  HAL_StatusTypeDef status = IF_HAL_ETH_GetReceivedFrame(&EthHandle);
 
  if (status != HAL_OK) {
    return NULL;		// Return if no RX data
  }
 
  /* Obtain the size of the packet and put it into the "len" variable. */
  len = EthHandle.RxFrameInfos.length;
  buffer = (uint8_t *)EthHandle.RxFrameInfos.buffer;
  
  if (len > 0)
  {
    /* We allocate a pbuf chain of pbufs from the Lwip buffer pool */
    p = pbuf_alloc(PBUF_RAW, len, PBUF_POOL);
  }
  
  if (p != NULL)
  {
    dmarxdesc = EthHandle.RxFrameInfos.FSRxDesc;
    bufferoffset = 0;
    
    for(q = p; q != NULL; q = q->next)
    {
      byteslefttocopy = q->len;
      payloadoffset = 0;
      
      /* Check if the length of bytes to copy in current pbuf is bigger than Rx buffer size */
      while( (byteslefttocopy + bufferoffset) > ETH_RX_BUF_SIZE )
      {
        /* Copy data to pbuf */
        memcpy( (uint8_t*)((uint8_t*)q->payload + payloadoffset), (uint8_t*)((uint8_t*)buffer + bufferoffset), (ETH_RX_BUF_SIZE - bufferoffset));
        
        /* Point to next descriptor */
        dmarxdesc = (ETH_DMADescTypeDef *)(dmarxdesc->Buffer2NextDescAddr);
        buffer = (uint8_t *)(dmarxdesc->Buffer1Addr);
        
        byteslefttocopy = byteslefttocopy - (ETH_RX_BUF_SIZE - bufferoffset);
        payloadoffset = payloadoffset + (ETH_RX_BUF_SIZE - bufferoffset);
        bufferoffset = 0;
      }
 
      /* Copy remaining data in pbuf */
      memcpy( (uint8_t*)((uint8_t*)q->payload + payloadoffset), (uint8_t*)((uint8_t*)buffer + bufferoffset), byteslefttocopy);
      bufferoffset = bufferoffset + byteslefttocopy;
    }
  }
 
  /* Release descriptors to DMA */
  /* Point to first descriptor */
  dmarxdesc = EthHandle.RxFrameInfos.FSRxDesc;
  /* Set Own bit in Rx descriptors: gives the buffers back to DMA */
  for (i=0; i< EthHandle.RxFrameInfos.SegCount; i++)
  {  
	__DMB();
    dmarxdesc->Status |= ETH_DMARXDESC_OWN;
    dmarxdesc = (ETH_DMADescTypeDef *)(dmarxdesc->Buffer2NextDescAddr);
  }
 
  /* Clear Segment_Count */
  EthHandle.RxFrameInfos.SegCount =0;
  
  /* When Rx Buffer unavailable flag is set: clear it and resume reception */
  if ((EthHandle.Instance->DMASR & ETH_DMASR_RBUS) != (uint32_t)RESET)  
  {
    /* Clear RBUS ETHERNET DMA flag */
    EthHandle.Instance->DMASR = ETH_DMASR_RBUS;
    /* Resume DMA reception */
    EthHandle.Instance->DMARPDR = 0;
  }
  return p;
}

Piranha
Chief II

As this is relevant for the code in this topic... There is a new topic, in which all of the issues with ST's link state management and DHCP code were reported:

https://community.st.com/s/question/0D53W00001sGiChSAK/cubemx-lwip-ethernetlinkthread-bug

The main issues are related to an improper use of HAL and lwIP APIs regarding multi-threading and a few other issues. There are also some links, which show how to use lwIP core thread locking properly.

> But 250kbytes/sec is ok for now.

It was already explained in an earlier post:

"And it's not only about an average data rate - the frames can come one per day, but if they all come in your 10 ms period then the first ones fill the buffers and the rest of frames are lost."

> Maybe an ethernet switch doesn't route packets to the box unless addressed by MAC# ?

And this was answered even before you asked:

"Also I recommend reading my posts in this topic to learn a better understanding of how an actual Ethernet network works and what impact it has on your device."