
Ethernet TX doesn't work because ETH_DMATxDescListInit() doesn't initialize the entire ETH_DMADescTypeDef

Robert Sexton
Associate II

The officially recommended recipe for lwIP on STM32H7 puts the Ethernet DMA descriptors in SRAM1. SRAM1 is not zeroed by the default startup code.

It's unsafe for the network drivers to assume that buffers and critical data structures live in .bss.

Fix: ETH_DMATxDescListInit() should zero out the data structures with memset().

static void ETH_DMATxDescListInit(ETH_HandleTypeDef *heth)
{
  ETH_DMADescTypeDef *dmatxdesc;
  uint32_t i;
 
  // There is no guarantee that the Tx Descriptor is in pre-initialized memory
  // zero it out to be completely sure. 
  memset(&heth->TxDescList,0,sizeof(ETH_TxDescListTypeDef));
 
  /* Fill each DMATxDesc descriptor with the right values */
  for(i=0; i < (uint32_t)ETH_TX_DESC_CNT; i++)
.
.
.
 

10 REPLIES
Piranha
Chief II

You have misunderstood the data structures. ETH_TxDescListTypeDef is not a descriptor structure for hardware, but a software-only structure, in which the driver keeps additional variables related to the descriptor list. The descriptors themselves are zeroed in the code immediately following the code you presented:

https://github.com/STMicroelectronics/STM32CubeH7/blob/0a714a644cb53c0233f8cdb41a1685c64d554166/Drivers/STM32H7xx_HAL_Driver/Src/stm32h7xx_hal_eth.c#L2970-L2973

Robert Sexton
Associate II

The descriptors aren't the problem. The rest of the structure is uninitialized. That breaks ETH_Prepare_Tx_Descriptors():

 /* Current Tx Descriptor Owned by DMA: cannot be used by the application  */
  if ((READ_BIT(dmatxdesc->DESC3, ETH_DMATXNDESCWBF_OWN) == ETH_DMATXNDESCWBF_OWN)
      || (dmatxdesclist->PacketAddress[descidx] != NULL))
  {
 

https://github.com/STMicroelectronics/STM32CubeH7/blob/0a714a644cb53c0233f8cdb41a1685c64d554166/Drivers/STM32H7xx_HAL_Driver/Src/stm32h7xx_hal_eth.c#L3058
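
To make the failure mode concrete, here is a minimal host-side sketch of the check quoted above. It is not the HAL code: the types DmaDesc and TxDescList, the constants TX_DESC_CNT and DESC_OWN, and both function names are hypothetical stand-ins, and the partial init only models what the thread describes (descriptors zeroed, pointers stored, PacketAddress[] and BuffersInUse left alone).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_DESC_CNT 4U           /* stand-in for ETH_TX_DESC_CNT */
#define DESC_OWN    0x80000000U  /* stand-in for ETH_DMATXNDESCWBF_OWN */

/* Hypothetical, cut-down models of the HAL types; not the real definitions. */
typedef struct { uint32_t DESC0, DESC1, DESC2, DESC3; } DmaDesc;

typedef struct {
  DmaDesc  *TxDesc[TX_DESC_CNT];         /* written by the init loop */
  uint32_t  CurTxDesc;
  uint32_t *PacketAddress[TX_DESC_CNT];  /* NOT written by the init loop */
  uint32_t  BuffersInUse;                /* NOT written by the init loop */
} TxDescList;

/* Models the init described in this thread: zero each descriptor and store a
   pointer to it, but leave the rest of the software-only list untouched. */
static void desc_list_init_partial(TxDescList *list, DmaDesc *ring)
{
  for (uint32_t i = 0; i < TX_DESC_CNT; i++) {
    memset(&ring[i], 0, sizeof ring[i]);
    list->TxDesc[i] = &ring[i];
  }
  list->CurTxDesc = 0;
}

/* Models the availability test: a slot is usable only if the DMA does not own
   the descriptor AND no packet address is recorded for it. Returns 1 if usable. */
static int tx_slot_available(const TxDescList *list, uint32_t idx)
{
  const DmaDesc *d = list->TxDesc[idx];
  if (((d->DESC3 & DESC_OWN) == DESC_OWN) || (list->PacketAddress[idx] != NULL)) {
    return 0;
  }
  return 1;
}
```

If the list lives in memory that still holds garbage, PacketAddress[idx] is very likely non-NULL after desc_list_init_partial(), so tx_slot_available() reports the slot as busy even though the descriptor itself is clean.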

alister
Lead

The loop in the ETH_DMATxDescListInit function firstly initialises the descriptor at heth->Init.TxDesc + i and then saves a pointer to it at heth->TxDescList.TxDesc[i].

Using the WRITE_REG macro to write heth->TxDescList.TxDesc[i] helps neither the code nor the reader, and it's bad practice because it's not writing a register. Good code ought to be simple and consistent, do everything for a reason, and do nothing for no reason.

If by "uninitialized structure" you mean the elements of the array at heth->TxDescList.TxDesc are not pointing to the descriptors properly, the problem might be one of these:

  1. HAL_ETH_Init has not been called,
  2. HAL_ETH_Init is being passed incorrect initialization data,
  3. HAL_ETH_Transmit or HAL_ETH_Transmit_IT is being passed an incorrect handle, or
  4. That array in the handle has become corrupt somehow.
LCE
Principal

I think the OP is right. He says the descriptor init is okay, but the addresses in the struct, like dmatxdesclist->PacketAddress, are never initialized before being used.

But I just took a quick look at it...

Alister, your comments about WRITE_REG are spot-on. It's bad practice.

That said, this isn't a theory. I wasted a few days of engineering time debugging this, and I have the memory dumps and workarounds. The array in question is corrupted because it's in uninitialized memory, and the init code doesn't init the entire data structure.

The code never initializes PacketAddress[] or BuffersInUse. You can verify this with a quick search of the code.

Looking at it again, it makes even less sense. The for() loop just zeros things out and adds a pointer to an array. Better just to memset() the whole thing and then initialize the TxDesc[] array and the hardware.

ETH_DMARxDescListInit() has the same poor, inefficient style. It explicitly zeros a bunch of structure items when it should just memset() the whole thing. That would be smaller code, too.
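
A rough sketch of the "memset() first, then fill in pointers" shape, under the same caveats as any host-side model: DmaDesc, TxDescList, TX_DESC_CNT and desc_list_init_full() are hypothetical stand-ins, not the HAL definitions. It also assumes all-bits-zero is a NULL pointer, which holds on Cortex-M.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_DESC_CNT 4U  /* stand-in for ETH_TX_DESC_CNT */

/* Hypothetical, cut-down models of the HAL types; not the real definitions. */
typedef struct { uint32_t DESC0, DESC1, DESC2, DESC3; } DmaDesc;

typedef struct {
  DmaDesc  *TxDesc[TX_DESC_CNT];
  uint32_t  CurTxDesc;
  uint32_t *PacketAddress[TX_DESC_CNT];
  uint32_t  BuffersInUse;
} TxDescList;

/* Clear everything up front so no field depends on where the linker put the
   handle, then record the descriptor pointers. */
static void desc_list_init_full(TxDescList *list, DmaDesc *ring)
{
  /* Software-only bookkeeping: PacketAddress[], BuffersInUse, CurTxDesc, ... */
  memset(list, 0, sizeof *list);

  /* Descriptor ring. In the real driver the descriptor fields are volatile,
     so field-by-field writes may be preferable there; memset keeps this
     model short. */
  memset(ring, 0, TX_DESC_CNT * sizeof ring[0]);

  for (uint32_t i = 0; i < TX_DESC_CNT; i++) {
    list->TxDesc[i] = &ring[i];
  }
}
```

The point of the ordering is that any field added to the list structure later is covered automatically, instead of relying on someone remembering to extend the loop.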

Sem A.
Associate III

This forum thread was marked by the moderator as needing a little more investigation, so a Support case was created in your name and will be handled off-line. Should anyone have any similar questions, please feel free to open a Support case directly at your myST portal: https://st.com/ols

I use neither this Ethernet driver nor your Cube version. You have the code and the code contains all the answers. You have to read it.

Not saying the Ethernet driver is right or wrong. But your observations and theories about initialization seem incorrect to me.

Search your code for the line that defines your Ethernet driver's handle. Define means where its storage in memory is reserved. It should look like this:

ETH_HandleTypeDef heth;

That code is defining the handle in the .bss or COMMON. COMMON is part of the .bss section. The .bss is cleared to zeros by special start-up code that executes before main(). The ETH_HandleTypeDef is a structure. A pointer to heth is passed to each of the driver's public functions. TxDescList is in that structure. Clearing .bss before executing main is standard C. The Ethernet driver is not responsible for that. Good code knows its responsibilities. Not saying the code is good or bad. It would just be bad for the code not to know that it is not responsible for something and to act as though it is.

Debugging improves with practice. You want to isolate a problem's cause and you want to take the fewest steps to do it. Steps that halve a domain are good. So check

  1. Your handle is declared in .bss, and
  2. Your .bss is cleared to zeros before main.

We are wasting time looking at the Ethernet driver if the handle is not defined in .bss or if the .bss is not being cleared before main.

You can find where the handle is defined by inspecting your map file.

You can check that the handle (and the rest of .bss) is cleared before main by executing to main in the debugger and inspecting the memory in a memory window, or by inspecting the heth structure in an expressions window.
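
If a debugger memory window is inconvenient, the same check can be done in code. This is only a sketch: region_is_zero() is a hypothetical helper, not part of the HAL or lwIP.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every byte in [p, p + len) is zero, 0 otherwise.
   An empty region counts as zero. */
static int region_is_zero(const void *p, size_t len)
{
  const uint8_t *bytes = (const uint8_t *)p;
  for (size_t i = 0; i < len; i++) {
    if (bytes[i] != 0U) {
      return 0;
    }
  }
  return 1;
}
```

Called at the very top of main(), before any driver init, something like region_is_zero(&heth, sizeof heth) tells you whether the start-up code actually cleared the handle, wherever the linker placed it.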

Has anyone else ever seen this driver execute reliably? That would be helpful to know too.

I know what's going on, where everything is in memory, how C initialization works, and why the code broke on my part. That's how I debugged the issue. As far as I'm concerned, this issue is root caused and fixed.

I will double-check everything to be absolutely sure of what I saw. I know that it's possible to make mistakes on day three of a debug. I know that I found garbage in that structure that was killing packet transmission, and when I added a memset() my stack started working.

Hopefully somebody will find this posting and not waste time.

I've got a business-critical lwIP application with almost a decade in the field and hundreds of millions of operational hours. It's running on TI, and I have never had to fix a problem with vendor library code.

I'm going to push back my deployment schedule to allow for more rigorous acceptance testing. I've already found other weaknesses in the network stack that need to be run down before mass production, and I will do so. I suspect that I'm seeing issues that have already been reported.

The H7 is a good design. It's got some innovative features that will improve my system. I'm sure I can get it where it needs to be.