Corrupted ETH Rx Buffer in STM32H7

DWeb_2 · ‎2021-06-08

Hello everyone,

I'm running a webserver on an STM32H7. The code was generated with CubeIDE 1.5.0, using LwIP V2.1.2 and the Firmware-Package V1.8.0. So far, I have had no problems receiving or transmitting smaller files, in the x-xx kB range, from the MCU to a client via Ethernet.

When the client requests 3 rather large files in succession, the Rx ETH buffers get corrupted. In this example, the resource chart.js is transferred successfully (see the network trace below). The requests directly following it cannot be served properly. The responses should be in the *** kB range, but my server can't interpret the HTTP query and sends back an error message (~100 B).

This is a more detailed view with Wireshark. The transmission for the first resource (chart.js) works (on port 30986 here):

The GETs that follow fail. Further investigation shows that the beginning of the HTTP request line and some header fields are overwritten with 0's. This is unprocessed data directly out of the LwIP stack:

The rest of the frame is OK and has valid data. I have never seen this phenomenon before:

The ETH hardware is configured like this:

The MPU treats the Rx-Buffers as Strongly ordered, and the Rx+Tx descriptors as Device Memory:
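As a reference for the memory types named here, this is a minimal sketch of how the TEX/C/B fields of an ARMv7-M MPU region map to them. The enum and function names are made up for illustration and are not part of the HAL or CMSIS; only the most common encodings are covered, everything else is reported as "other".

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_DEVICE_SHAREABLE,
    MEM_DEVICE_NON_SHAREABLE,
    MEM_NORMAL_NON_CACHEABLE,
    MEM_NORMAL_WRITE_THROUGH,
    MEM_NORMAL_WRITE_BACK,
    MEM_NORMAL_WRITE_BACK_ALLOCATE,
    MEM_OTHER                      /* reserved or not covered by this sketch */
} mem_type_t;

/* Decode the TEX (3 bits), C and B fields of an ARMv7-M MPU region. */
static mem_type_t mpu_memory_type(unsigned tex, unsigned c, unsigned b)
{
    assert(tex <= 7u && c <= 1u && b <= 1u);

    if (tex == 0u && c == 0u && b == 0u) return MEM_STRONGLY_ORDERED;
    if (tex == 0u && c == 0u && b == 1u) return MEM_DEVICE_SHAREABLE;
    if (tex == 0u && c == 1u && b == 0u) return MEM_NORMAL_WRITE_THROUGH;
    if (tex == 0u && c == 1u && b == 1u) return MEM_NORMAL_WRITE_BACK;
    if (tex == 1u && c == 0u && b == 0u) return MEM_NORMAL_NON_CACHEABLE;
    if (tex == 1u && c == 1u && b == 1u) return MEM_NORMAL_WRITE_BACK_ALLOCATE;
    if (tex == 2u && c == 0u && b == 0u) return MEM_DEVICE_NON_SHAREABLE;
    return MEM_OTHER;
}
```

With the setup described above, the Rx buffer region would decode as TEX=0, C=0, B=0 and the descriptor region as one of the two Device encodings. Neither type is cached, so in this configuration the D-cache should not be involved in accesses to those regions.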

Here is the snippet where "buf" from above was inspected, with a breakpoint on line 10:

recv_err = netconn_recv(conn, &inbuf);
 
if(recv_err != ERR_OK)
{
    webservState = WEBS_NETWORK_ERR;
    break;
}
 
netbuf_data(inbuf, (void*)&buf, (u16_t*)&buflen);
webservState = WEBS_HANDLE_FILLING;
break;

I've read quite a lot of forum posts about the ETH driver and its problems, but I could not resolve my problem with this knowledge. I'm out of ideas as to what the cause of this might be.

Does anyone have a tip on where to dig deeper with debugging? Or has anyone seen similar behavior?

Walid ZRELLI · ‎2021-06-11

Hello @DWeb_2 ,

This behavior might indicate a buffer overflow/stack overflow. If you're using FreeRTOS, I recommend that you activate stack overflow detection (option 2).

In fact, when swapping a task out of the Running state the RTOS kernel can check the last 16 bytes within the valid stack range to ensure that these known values have not been overwritten by the task or interrupt activity.

Take a look at this link https://www.freertos.org/Stacks-and-stack-overflow-checking.html for more information.
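To make the idea behind this check concrete, here is a minimal host-side sketch of the technique. It is not FreeRTOS code: the function names and the plain byte array standing in for a task stack are assumptions for illustration. The only detail taken from FreeRTOS is the fill value 0xa5 (tskSTACK_FILL_BYTE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u  /* same value as FreeRTOS' tskSTACK_FILL_BYTE */
#define CHECK_BYTES     16u    /* number of bytes inspected at the stack limit */

/* Fill the whole stack with a known pattern, as is done at task creation. */
static void stack_init(uint8_t *stack, size_t size)
{
    assert(stack != NULL && size >= CHECK_BYTES);
    memset(stack, STACK_FILL_BYTE, size);
}

/* On Cortex-M the stack grows downward, so the last bytes of the valid
 * range are the ones at the lowest addresses of the array. If any of
 * them no longer holds the fill pattern, the stack has been overrun. */
static bool stack_overflowed(const uint8_t *stack, size_t size)
{
    assert(stack != NULL && size >= CHECK_BYTES);
    for (size_t i = 0; i < CHECK_BYTES; i++) {
        if (stack[i] != STACK_FILL_BYTE) {
            return true;
        }
    }
    return false;
}
```

Note what this kind of check cannot see: it only inspects the end of each task stack at context-switch time. A write that lands in memory outside any task stack, such as a separately allocated buffer, leaves the pattern intact and goes unnoticed.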

Otherwise, you could use watchpoints to stop the debugger on a read/write access to a given address and catch the culprit in the act. I'd recommend activating them just before the issue appears, as the data structure could be accessed frequently during normal code execution. Take a look at this link: https://mcuoneclipse.com/2012/04/29/watchpoints-data-breakpoints-in-mcu10/

I hope this helps.

Best regards,

Walid

Pavel A. · ‎2021-06-11

> I've read quite a lot of forum posts about the ETH driver and its problems, but I could not resolve my problem with this knowledge.

Yes, there are still quite a lot of known issues to resolve.

IIRC, ST said something about plans to revise the ETH driver for Azure RTOS and other integrations, but there is no ETA and that remains to be seen.

If you're using FreeRTOS, have a look at their TCP/IP layer and their version of the STM32 ETH driver.

--pa

DWeb_2 · ‎2021-06-13

Hi Walid,

Thanks for your response and the links and information you provided. I forgot to mention that we use the FreeRTOS port from CubeIDE on our system.

I tried the hook function you mentioned, but I did not observe any unusual behavior. In fact, we had already implemented a basic monitoring function that uses FreeRTOS functions to read out the stack high watermark. According to those numbers, everything is OK as well:

I also enabled watchpoints on the first 4 bytes of the TCP payload (within the Rx buffers). However, execution never stopped at the problematic write:

extern uint8_t Rx_Buff[ETH_RX_DESC_CNT][ETH_RX_BUFFER_SIZE];
volatile uint32_t* rx0Ptr = (uint32_t*)&Rx_Buff[0][54];
volatile uint32_t* rx1Ptr = (uint32_t*)&Rx_Buff[1][54];
volatile uint32_t* rx2Ptr = (uint32_t*)&Rx_Buff[2][54];
volatile uint32_t* rx3Ptr = (uint32_t*)&Rx_Buff[3][54];

With these settings, the data breakpoint did not seem to fire. I set an instruction breakpoint where I first encountered the problem and ran into the same issue, without the debugger having stopped beforehand at the point where the actual 0 write happened:

Is it possible that the problematic write occurs while the Rx DMA is streaming, and that the processor can't evaluate the data breakpoint in that time frame?

Thanks a lot for your help and tips
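On the watchpoint question: Cortex-M data watchpoints are DWT comparators that match accesses made by the core itself, so a write performed by another bus master such as the ETH DMA would not be expected to trigger them. A software check can complement the watchpoint in that case. The following is a minimal sketch, assuming the received data is available as a byte pointer and length (as returned by netbuf_data above); both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the consecutive zero bytes at the start of a buffer. */
static size_t leading_zero_bytes(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    size_t n = 0;
    while (n < len && data[n] == 0) {
        n++;
    }
    return n;
}

/* An HTTP request line starts with a method token such as "GET",
 * so a zero byte at offset 0 is never valid request data. */
static int request_start_is_zeroed(const uint8_t *data, size_t len)
{
    return len > 0 && leading_zero_bytes(data, len) > 0;
}
```

Calling request_start_is_zeroed() right after netbuf_data() and placing a breakpoint on the failing branch stops the debugger only for corrupted requests. Logging the return value of leading_zero_bytes() together with the buffer address may show whether the damage has a fixed length or alignment.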

DWeb_2 · ‎2021-06-13

Hi Pavel,

Thanks for your reply and the information. I look forward to the new release. We will try out the FreeRTOS TCP/IP stack.

Piranha · ‎2021-07-17

> I'm out of ideas what the cause of this might be.

The absolute incompetence of the developers...

> Has anyone a tip where to dive deeper with debugging?

https://community.st.com/s/question/0D50X0000C6eNNSSQ2/bug-fixes-stm32h7-ethernet

https://community.st.com/s/question/0D50X0000BOtfhnSQB/how-to-make-ethernet-and-lwip-working-on-stm32