I may have found an error in the STM32H7 Ethernet Driver when receiving multiple frames.

MDS · ‎2018-10-17

I have attached a trace file showing the error.

When ethernetif_input() receives multiple frames in its for() loop, the 2nd frame is assigned the 1st frame's buffer so the 1st frame is processed twice. Then every following frame in the receive loop uses the wrong receive buffer, the previous message buffer.. In the case when I sent 10 UDP frames that were received within the for() loop, the 1st frame is correct, the 2nd is a duplicate of the 1st, then remaining are assigned the wrong buffer, and the last frame was never received. This is very repeatable. It happens whenever 2 or more frames are received by ethernetif_input() . Just being connected to a busy Ethernet is sufficient to receive multiple frames.

The attached file is too long to post inline.

One more issue to consider in the ethernetif_input() for() loop:

void ethernetif_input( void const * argument )

{

struct pbuf *p;

struct netif *netif = (struct netif *) argument;

for( ;; )

{

if( osSemaphoreWait( RxPktSemaphore, TIME_WAITING_FOR_INPUT ) == osOK )

{

do

{

p = low_level_input( netif );

if (p != NULL)

{

if (netif->input( p, netif) != ERR_OK )

{

pbuf_free(p);

}

/* Build Rx descriptor to be ready for next data reception */

HAL_ETH_BuildRxDescriptors(&heth);

}while(p!=NULL);

The call to netif->input( p, netif) only places the p into the tcpip_input() mbox. The Rx_Buff in the payload has not been processed. When HAL_ETH_BuildRxDescriptors(&heth) is called, the Rx_Buff is returned to DMA OWN, and the DMA can reuse the Rx_Buff before or while the IP stack is processing the contents, which can lead to corruption.

MDS · ‎2018-10-18

Upon further investigation, I found the cause of the Rx_Buff misalignment issue and a fix. The cause is in the function HAL_ETH_GetRxDataBuffer() in the file stm327xx_hal_eth.c, at the top of the function as shown below with the fix.

HAL_StatusTypeDef HAL_ETH_GetRxDataBuffer(ETH_HandleTypeDef *heth, ETH_BufferTypeDef *RxBuffer)

{

ETH_RxDescListTypeDef *dmarxdesclist = &heth->RxDescList;

uint32_t descidx = dmarxdesclist->FirstAppDesc;

uint32_t index, accumulatedlen = 0, lastdesclen;

__IO const ETH_DMADescTypeDef *dmarxdesc = (ETH_DMADescTypeDef *)dmarxdesclist->RxDesc[descidx];

ETH_BufferTypeDef *rxbuff = RxBuffer;

if(dmarxdesclist->AppDescNbr ==0)

{

// Raytheon: If another frame is ready, FirstAppDesc will be updated,

// so the values set upon entry still access the previous Rx_Buff, which is wrong.

if(HAL_ETH_IsRxDataAvailable(heth) == 0)

{

/* No data to be transferred to the application */

return HAL_ERROR;

}

// Raytheon: FirstAppDesc may have been changed by HAL_ETH_IsRxDataAvailable(), making descidx old

descidx = dmarxdesclist->FirstAppDesc; // refresh

dmarxdesc = (ETH_DMADescTypeDef *)dmarxdesclist->RxDesc[descidx]; // refresh

The problem is that descidx and dmarxdesc are set BEFORE calling HAL_ETH_IsRxDataAvailable(). This sets the values to those of the last process receive Rx_Buff. When another frame is ready, calling HAL_ETH_IsRxDataAvailable() will update FirstAppDesc so descidx and dmarxdesc MUST be reloaded to access the new Rx_Buff. It is better just to wait until after the call and then set the values as shown above. This issue is fixed.

The 2nd issue of the dmarxdesc and its Rx_Buff being returned to DMA OWN after placing the pbuf into the tcpip_input mbox still exists, and is something to be aware of. By returning the dmarxdesc to DMA ownership, the Rx_Buff can be over written before, or while, it is going up the IP stack. It appears that the p->payload is not freed until the application reads the net_conn. At any time before the payload is processed, the Ether DMA can overwrite all or part of the payload with a new receive frame. As the netbuf is for the old payload, the corrupted payload could cause the application or net_conn layer to crash as the overwritten payload most likely is not the size of the previous frame, and the new payload may not even be for this application. Solving this looks like a major rewrite.

Imen.D · ‎2018-12-21

Hello @MDS ,

We have reproduced the issue of redundant Ethernet packet when decreasing the UDP interval under 100 us. It’s due that when the new data sent by the UDP sender transferred from PHY to system memory, the RX DMA descriptors not yet updated and they still contain the address of last received packet. And in this case the application won’t read the new arrived data.

Could you please try with the attached ethernetif.c and let me know if it fixes your problem.

Kind Regards,

Imen.

When your question is answered, please close this topic by clicking "Accept as Solution".
Thanks
Imen

MDS · ‎2018-12-21

The above attached file has major differences throughout the file. My ethernetif.c is copyright 2018 and it from CubeMxH7 V1.3.0. The file above is copyright 2017 and has so many differences, I don't know which ones I need to look at. It will not build as the Ethernet handle has a different name than the rest of my LwIP source code. What part of the above file am I suppose to review?

Is this fix for the packet being returned to DMA owned before it is processed by the stack and therefore possibly overwritten, or the redundant Ethernet packet caused be back-to-back UDP packets.

My proposed fix for the redundant Ethernet packet is in HAL_ETH_GetRxDataBuffer() in the file stm327xx_hal_eth.c. It is really simple...just 2 lines (see above).

I don't see a fix for returning a packet buffer to DMA owned before it is processed, except just have a lot of buffers.

Harrold · ‎2019-02-07

Good point. The function name HAL_ETH_IsRxDataAvailable() gives the impression it will only check for data available, but as you pointed out, the function can (and will) also change some of the dmarxdesc values.

Did you noticed HAL_ETH_IsRxDataAvailable() is also called from within HAL_ETH_IRQHandler() ?

Dave Nadler · ‎2019-03-15

@Imen DAHMEN , @MDS - Have these two issues (Ethernet support bugs) been resolved?

If so, in which version of CubeMX?

Thanks!

Best Regards, Dave

Harrold · ‎2019-03-18

I installed CubeMX 5.1.0, but I don't see any fix in the new generated code related to this two issues. We still use modified versions of stm32h7xx_hal_eth.c and ethernetif.c to make it work.

Harrold · ‎2019-03-18

I installed CubeMX 5.1.0, but I don't see any fix in the new generated code related to this two issues. We still use modified versions of stm32h7xx_hal_eth.c and ethernetif.c to make it work.

Dave Nadler · ‎2019-03-18

@Harrold , @MDS - Did you fix both the "wrong descriptor" and "pbuf freed before use" bugs? Sorry, incorrect: Also, on reviewing the ST-provided HAL_ETH_IRQHandler, this code assumes that there is only one new frame per interrupt, while the documentation clearly states the ISR is required to process multiple frames. Did anyone fix this 3rd bug?

Thanks,

Best Regards, Dave

Harrold · ‎2019-03-19

I used the information from MDS (many thanks MDS!) to update stm32h7xx_hal_eth.c, not only adapted HAL_ETH_GetRxDataBuffer(), but it turned out we also must update HAL_ETH_IRQHandler() to prevent freezing after some times. This ISR-function also calls HAL_ETH_IsRxDataAvailable() and maybe I'm wrong but I really don't think this function should be called from within an ISR. This (extra) check should (and will be) be done later. So I comment out the to call HAL_ETH_IsRxDataAvailable() and this solved freezing.

Here are the modification we made in stm32h7xx_hal_eth.c, but of course we prefer a 'real' solution from ST.

Index: stm32h7xx_hal_eth.c
===================================================================
--- stm32h7xx_hal_eth.c	(revision 2244)
+++ stm32h7xx_hal_eth.c	(revision 2245)
@@ -901,7 +901,11 @@
       return HAL_ERROR;
     }
   }
-  
+
+  // Refresh descidx and dmarxdesc after the call to HAL_ETH_IsRxDataAvailable() (HS 8-2-2019)
+  descidx = dmarxdesclist->FirstAppDesc; // refresh
+  dmarxdesc = (ETH_DMADescTypeDef *)dmarxdesclist->RxDesc[descidx]; // refresh
+
   /* Get intermediate descriptors buffers: in case of the Packet is splitted into multi descriptors */
   for(index = 0; index < (dmarxdesclist->AppDescNbr - 1); index++)
   {    
@@ -1004,7 +1008,10 @@
       return HAL_ERROR;
     }
   }
-  
+
+  // Refresh descidx after the call to HAL_ETH_IsRxDataAvailable() (HS 8-2-2019)
+  descidx = dmarxdesclist->FirstAppDesc; // refresh
+
   /* Get index of last descriptor */
   INCR_RX_DESC_INDEX(descidx, (dmarxdesclist->AppDescNbr-1));
   /* Point to last descriptor */
@@ -1036,6 +1043,9 @@
       return HAL_ERROR;
     }
   }
+
+  // Refresh descidx after the call to HAL_ETH_IsRxDataAvailable() (HS 8-2-2019)
+  descidx = dmarxdesclist->FirstAppDesc; // refresh
   
   /* Get index of last descriptor */
   INCR_RX_DESC_INDEX(descidx, ((dmarxdesclist->AppDescNbr) - 1U));
@@ -1166,7 +1176,9 @@
     if(__HAL_ETH_DMA_GET_IT_SOURCE(heth, ETH_DMACIER_RIE)) 
     {      
       /* Call this function to update handle fields */
-      if(HAL_ETH_IsRxDataAvailable(heth) == 1)
+      // No, don't call this function from within a IRQ function
+      // because it update handle fields! (HS 8-2-2019)
+//    if(HAL_ETH_IsRxDataAvailable(heth) == 1)
       {
         /* Receive complete callback */
         HAL_ETH_RxCpltCallback(heth);

Regarding ethernetif.c, we made a lot of changes. In essence we replaced the zero-copy mechanism, because in our opinion, it simply cannot work the way it is currently implemented. Also for this issue we are looking forward for a real solution from ST. See also here.

With this two updates, our lwip stack looks pretty stable.

Regarding your comment about HAL_ETH_IRQHandler() which assumes there is only one new frame, I'm not sure whether there is a problem. Function ethernetif_input() in ethernetif.c contains a do-while-loop where it handles all received frames at once. But please let me know if you find a situation where this could fail.