cancel
Showing results for 
Search instead for 
Did you mean: 

I may have found an error in the STM32H7 Ethernet Driver when receiving multiple frames.

MDS
Associate III

I have attached a trace file showing the error.

When ethernetif_input() receives multiple frames in its for() loop, the 2nd frame is assigned the 1st frame's buffer so the 1st frame is processed twice. Then every following frame in the receive loop uses the wrong receive buffer, the previous message buffer.. In the case when I sent 10 UDP frames that were received within the for() loop, the 1st frame is correct, the 2nd is a duplicate of the 1st, then remaining are assigned the wrong buffer, and the last frame was never received. This is very repeatable. It happens whenever 2 or more frames are received by ethernetif_input() . Just being connected to a busy Ethernet is sufficient to receive multiple frames.

The attached file is too long to post inline.

One more issue to consider in the ethernetif_input() for() loop:

void ethernetif_input( void const * argument )

{

struct pbuf *p;

struct netif *netif = (struct netif *) argument;

for( ;; )

{

   if( osSemaphoreWait( RxPktSemaphore, TIME_WAITING_FOR_INPUT ) == osOK )

  {

   do

   {

     p = low_level_input( netif );

    if (p != NULL)

    {

       if (netif->input( p, netif) != ERR_OK )

      {

         pbuf_free(p);

      }

   }

/* Build Rx descriptor to be ready for next data reception */

HAL_ETH_BuildRxDescriptors(&heth);

}while(p!=NULL);

The call to netif->input( p, netif) only places the p into the tcpip_input() mbox. The Rx_Buff in the payload has not been processed. When HAL_ETH_BuildRxDescriptors(&heth) is called, the Rx_Buff is returned to DMA OWN, and the DMA can reuse the Rx_Buff before or while the IP stack is processing the contents, which can lead to corruption.

30 REPLIES 30

H7 issues listed here:

  1. Hard binding buffers to rx-descriptors, restarting rx-DMA before it’s buffer’s been processed and resulting corruption <– My changes fix this.
  2. "Following frame in the receive loop uses the wrong receive buffer" <– I haven't seen this fail. Probably fixed before H7_FW V1.5.

H7 Issues listed by Piranha at https://community.st.com/s/question/0D50X0000BOtfhnSQB:

  1. Missing compiler and CPU memory barriers <- I haven't seen this fail.
  2. MMC counter interrupts not masked <- I haven't seen this fail.
  3. lwIP driver Rx buffers released while still in use (https://community.st.com/s/question/0D50X0000BozSc5SQE) <- Duplicate of the issue I'd fixed above.
  4. lwIP driver Tx missing D-cache clean operation (https://community.st.com/s/question/0D50X0000AnsIJeSQM) <- My ethernetif.c contains a different fix coded by a colleague. It defines a heap (lwip_custom_ram_heap) for tx buffers, which we configure non-cachable with MPU.
  5. HAL driver wrong Rx channel descriptor tail pointer (https://community.st.com/s/question/0D50X0000Bea7dNSQQ) <- My changes fix this by different technique.
  6. lwIP driver Rx data buffers not aligned to cache line size <- My changes fix this by different technique.
  7. Other lwIP API <- I haven't seen or perhaps I've fixed or perhaps these pertain example code. I use only Cube H7_FW V1.5 generated code. My changes work.

Further comments:

  1. My ethernetif.c is for a custom board. My changes attached my earlier posts are specifically to the ethernet receive. To use my change you must merge the receive parts of my ethernetif.c to yours.
  2. Reading FW_H7 V1.6.0, I observe two notable changes: (a) MX_LWIP_Init is called from main and (b) TCPIP_THREAD_PRIO is changed from osPriorityHigh to osPriorityBelowNormal. I haven't properly checked or tested these yet. Time poor.

@alister​ - Thanks for the exhaustive reply!

I also noticed the init changes, which to me at first glance look incorrect:

  • (a) init should be after scheduler is running I think (not sure, need to check); at minimum ethernet interrupt must not be enabled prior associated semaphores and queues are actually up and running.
  • (b) I don't know why one would want to reduce the LwIP thread priority, also seems incorrect...
Piranha
Chief II

In reply to @Dave Nadler​ and @alister​ regarding my issue list. Alister's code probably fixes points 3-6, if the implementation is correct. Points 1, 2 and 7 are still broken. Check out my updated topic, as I've added more detailed description of points 1 and 2, and updated information on one of the most serious bugs in non-H7 series, which was recently fixed by ST. As for the point 7, here is an example from alister's code:

void ethernet_link_thread(void const * argument)
{
  struct netif *netif = (struct netif *) argument;
  
  for(;;)
  {
  HAL_ETH_Start_IT(&heth);
  netif_set_up(netif);
  netif_set_link_up(netif);
  osDelay(100);
  }
}

It's not allowed to call RAW API functions from other threads without proper protection. It's written clearly in lwIP documentation to which I've provided links in my topic. And the use of netif_set_up() is wrong - is't not for a link status. Also the code doesn't check the actual link and calls all of the functions all the time in a loop. This code is completely broken!

@Piranha​, thanks for the feedback.

>Alister's code probably fixes points 3-6, if the implementation is correct. Points 1, 2 and 7 are still broken.

I'd inspected the ST's Cube H7_FW V1.5.0 source code and fixed every problem I'd found. Cube H7_FW V1.6.0 is the same as V1.5.0.

Point 1, Missing compiler and CPU memory barriers.

I've not seen this fail. I'm loath to add its cycles if it's not really warranted.

Could you provide more information please?

Point 2, MMC counter interrupts not masked.

The ST's Cube H7_FW V1.5.0 does not use the MCC and at this time I'm not adding it.

So this is safe.

Point 7, Calling lwIP RAW API without protection.

Yes this slipped through. I replace ethernet_link_thread with EthPhy for my custom phy. I should have commented ethernet_link_thread out.

I haven't shared EthPhy as it's not the topic.

Another bug in ethernet_link_thread is initializing the link up before it knows its up and so if it's down the up's spurious.

This weekend I added links to detailed descriptions for points 1 and 2 in my issue list topic. Check it out! 🙂

M.Holch
Associate II

now a year later... the bug still happens :(

We are working on a STM32H7 product, and are thinking about change to other vendor that will listen.. i fear we are not the only one thinking about this, and i hate if we are to stop using the STM32 :(

>now a year later... the bug still happens :(

Please list the H7 FW version that's still not working.

Have you tried https://community.st.com/s/question/0D50X0000C6eNNSSQ2/bug-fixes-stm32h7-ethernet?

Hello All,

Our team are working to resolve Ethernet issues.

The Ethernet driver will be fully reworked for multiple fixes by 21Q2.

Thank you for your patience while we work on this.

Imen

When your question is answered, please close this topic by clicking "Accept as Solution".
Thanks
Imen

21Q2 is over. Is it fixed already? And if not, then what is the latest ETA?

IIRC there was intent to provide some fix or change for ThreadX integration.

I really hope that ST would split the ETH driver to a chip-specific and multiple "middleware" parts, as demonstrated here.

ST will own the chip-specific ("HAL") module.

Each network stack (LwIP, FreeRTOS, ThreadX ...) will get their own "middleware" part which deals with memory and DMA descriptors.

90% of bugs and user complaints refer to this "middleware" part so outsourcing it will benefit both ST and users.

-- pa