STM32F429 + NetX Duo - Heavy traffic problems

Davide Dalfra · ‎2025-07-07

Hello Folks

I'm looking for suggestions or hints on the best way to troubleshoot a strange problem we're experiencing.
We're running Azure RTOS ThreadX + NetX Duo (version 6.1.0) with an MQTT client that subscribes to a broker, waits for a message and then replies.
As PHY, we're using the LAN8742A.

The problem is that while the PC-based application is flooding the STM32F4 board with messages, the board suddenly gets stuck somewhere on the NetX side.

The rest of the application keeps running correctly. To troubleshoot the issue (which is highly reproducible) we have:

Increased the IP packet pool
Increased the number of RX descriptors
Disabled all the other tasks, leaving only the NetX IP instance and the task running the MQTT client (wait for a message, then reply with a static message saying "Hello")
Forced the link speed from 100 Mbit/s down to 10 Mbit/s

None of these attempts gave us a clue about what's happening. The only thing we've noticed is that once this happens, the ETH ISR no longer triggers.

For reference, we're sending bursts of 10 messages, one every 10 ms. After 4 or 5 bursts, the IP stack gets stuck and we also lose the ping (the PC is pinging the board).

Any suggestions?

Regards

Davide

Ozone · ‎2025-07-13

Not a Cube/HAL user.

I think you would need to look into the source code and check the semantics of those error codes.
If you try a file search for the error numbers, keep in mind that they might be defined in hexadecimal ...

mbarg.1 · ‎2025-07-14

@Davide Dalfra : Let me first point out that I am not with STM, even if after 30 years of working with STM and STM devices I feel like a small part of STM history.

The switch to FreeRTOS+ is an STM decision that I strongly disagree with; it is not my decision, and it impacts only new devices (those coming later this year and on the STM roadmap).

Coming to the code:

void HAL_ETH_IRQHandler(ETH_HandleTypeDef *heth){ ...

demultiplexes the various Ethernet interrupts and calls the corresponding functions:

...
    /* Receive complete callback */
    HAL_ETH_RxCpltCallback(heth);
....
    /* Transfer complete callback */
    HAL_ETH_TxCpltCallback(heth);
...
    /* Ethernet DMA Error callback */
    HAL_ETH_ErrorCallback(heth);
....

Now HAL_ETH_RxCpltCallback() and HAL_ETH_TxCpltCallback() multiplex these two interrupts again, each ORing one bit into the global variable nx_driver_information.nx_driver_information_deferred_events.

nx_driver_information.nx_driver_information_deferred_events is then used in _nx_driver_deferred_processing() inside the Eth thread:

static VOID  _nx_driver_deferred_processing(NX_IP_DRIVER *driver_req_ptr)
{

  TX_INTERRUPT_SAVE_AREA

  ULONG       deferred_events;


  /* Disable interrupts.  */
  TX_DISABLE

  /* Pickup deferred events.  */
  deferred_events =  nx_driver_information.nx_driver_information_deferred_events;
  nx_driver_information.nx_driver_information_deferred_events =  0;

  /* Restore interrupts.  */
  TX_RESTORE

  /* Check for a transmit complete event.  */
  if(deferred_events & NX_DRIVER_DEFERRED_PACKET_TRANSMITTED)
  {

    /* Process transmitted packet(s).  */
    HAL_ETH_ReleaseTxPacket(&eth_handle);
  }

  /* Check for received packet.  */
  if(deferred_events & NX_DRIVER_DEFERRED_PACKET_RECEIVED)
  {

    /* Process received packet(s).  */
    _nx_driver_hardware_packet_received();
  }

  /* Mark request as successful.  */
  driver_req_ptr->nx_ip_driver_status =  NX_SUCCESS;
}

Here TX and RX are demultiplexed again into dedicated functions: TX releases the used packets and RX retrieves the received packets.

All this demux-mux is useless and wastes clock cycles in the interrupt.

Also, ORing is not good practice, as more packets can come in while the Eth thread is executing; they can pile up and crash the HAL driver. I replaced the single bit with a semaphore that counts the interrupts, and now I can be sure that all packets are retrieved correctly.
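
To illustrate the difference between a single ORed bit and a count, here is a minimal host-side sketch in plain C (no HAL or ThreadX involved). The names flag_isr, flag_take, count_isr and count_take are hypothetical, and the plain counter only stands in for a counting semaphore; it is not the actual driver code.

```c
#include <stdint.h>

#define EVT_RX 0x1u

/* Flag model: the "ISR" ORs one bit, so several interrupts arriving before
   the thread runs collapse into a single pending event. */
typedef struct { uint32_t events; } flag_model_t;

static void flag_isr(flag_model_t *m) { m->events |= EVT_RX; }

/* Returns how many events the thread can see: 0 or 1, never more. */
static unsigned flag_take(flag_model_t *m)
{
    unsigned seen = (m->events & EVT_RX) ? 1u : 0u;
    m->events = 0;
    return seen;
}

/* Counter model: stands in for a counting semaphore, one count per interrupt. */
typedef struct { uint32_t count; } count_model_t;

static void count_isr(count_model_t *m) { m->count++; }

/* Returns the exact number of interrupts since the last call. */
static unsigned count_take(count_model_t *m)
{
    unsigned seen = m->count;
    m->count = 0;
    return seen;
}
```

With three interrupts arriving back to back, flag_take() reports one event while count_take() reports three. Whether the collapsed flag actually loses packets depends on whether the handler drains every pending descriptor on each wakeup, so treat this only as an illustration of the counting argument.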

Recovering from an Eth error requires resetting all the hardware with the HAL, stopping any ongoing activity, and re-initializing and restarting every process: a waste of several seconds at least if you use DHCP, SLAAC and other system functions. It is better to avoid errors and to trigger a complete system reset with the IWDG when something goes wrong, monitoring the restart reason to log non-power-related restarts.
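
As a rough sketch of the "monitor the restart reason" idea, the function below decodes a saved copy of the RCC_CSR register into a reset cause. The bit positions are my reading of the STM32F4 reference manual and should be checked against your device; the enum and function names are hypothetical. It runs on a host because it takes the register value as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* RCC_CSR reset flags; bit positions assumed from the STM32F4 reference manual. */
#define CSR_BORRSTF  (1u << 25)  /* brown-out reset      */
#define CSR_PINRSTF  (1u << 26)  /* NRST pin reset       */
#define CSR_PORRSTF  (1u << 27)  /* power-on/down reset  */
#define CSR_SFTRSTF  (1u << 28)  /* software reset       */
#define CSR_IWDGRSTF (1u << 29)  /* independent watchdog */
#define CSR_WWDGRSTF (1u << 30)  /* window watchdog      */
#define CSR_LPWRRSTF (1u << 31)  /* low-power reset      */

typedef enum {
    RESET_UNKNOWN, RESET_POWER, RESET_PIN, RESET_SOFTWARE,
    RESET_IWDG, RESET_WWDG, RESET_LOW_POWER
} reset_cause_t;

/* Several flags can be set at once (the pin flag often accompanies others),
   so the most specific causes are checked first. */
static reset_cause_t reset_cause_from_csr(uint32_t csr)
{
    if (csr & CSR_IWDGRSTF) return RESET_IWDG;
    if (csr & CSR_WWDGRSTF) return RESET_WWDG;
    if (csr & CSR_LPWRRSTF) return RESET_LOW_POWER;
    if (csr & CSR_SFTRSTF)  return RESET_SOFTWARE;
    if (csr & (CSR_PORRSTF | CSR_BORRSTF)) return RESET_POWER;
    if (csr & CSR_PINRSTF)  return RESET_PIN;
    return RESET_UNKNOWN;
}

/* Only non-power-related restarts are worth logging. */
static bool reset_should_be_logged(reset_cause_t cause)
{
    return cause != RESET_POWER;
}
```

On the target, the value would be read from RCC->CSR early at boot and the flags cleared afterwards (for example with __HAL_RCC_CLEAR_RESET_FLAGS()), so that the next restart reports only its own cause.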

Mike

Davide Dalfra · ‎2025-07-15

@mbarg.1 Thanks for your detailed response — much appreciated.

Even though I’ve only been in the ST world for 11 years, I’ve already seen so many changes that a bit of hope still remains: maybe they’ll reconsider and roll back to ThreadX. I’ve heard some rumors about plans to integrate FileX (not sure about NetX) into the new operating system (FreeRTOS+) and even into bare-metal implementations, and I still ask myself why.

Going back to the hot topic, I just have one last question: why only one counting semaphore?

I was thinking more of having two (one for HAL_ETH_RxCpltCallback and the other for HAL_ETH_TxCpltCallback). That way I can loop in _nx_driver_deferred_processing until both semaphores are empty.
A better approach might also be to differentiate the events in the Eth working thread, like "deferred_rx" and "deferred_tx", but this does not seem to be well supported in NetX since, according to the documentation, the deferred event is used for both TX and RX.

In any case, I am going to dig into this over the next few days.

Thanks again for your support.

Davide

mbarg.1 · ‎2025-07-15

@Davide Dalfra :

Rx data rate is externally driven, i.e. it is the network that sends data to you, and you do not know how much or when. In my apps I want to be ready to handle as much as possible with no crashes and minimum losses.

Tx data rate is app dependent, i.e. the app decides how much to send and when. Moreover, if Ethernet Tx is not ready, the current implementation discards packets without reporting an error (an awful implementation); you can easily check whether Tx is ready and avoid it.

I use two threads, one for servicing Rx interrupts and one for Tx, plus I have the option of also keeping the original Azure RTOS thread for backward compatibility. The Rx thread has the lowest priority number (that is, the highest priority in ThreadX), as it is the most critical one.

Once you no longer have to demux the interrupts, you do not need to go through _nx_driver_deferred_processing; you can move straight to _nx_ip_packet_receive, using the event NX_IP_RECEIVE_EVENT instead of NX_IP_DRIVER_DEFERRED_EVENT.

In my 2+1 (optional) approach I bypass most of the NetX layer-by-layer processing. Using EthRxSemaphore to wake my EthRxThread, I keep receiving as long as there are received packets and use a switch to call the most suitable function, possibly waking up some other thread for low-priority functions like SNTP, DNS, DHCP, ICMPv6, DHCPv6 ..., while always keeping a NetX-compatible format.
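
The "switch to call the most suitable function" step could look roughly like the sketch below, which classifies a raw Ethernet frame by its EtherType. This is only an assumption about the general shape, not the actual EthRxThread code: classify_frame and frame_kind_t are hypothetical names, and VLAN-tagged frames are not handled.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN  14u      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define ETHERTYPE_IPV4  0x0800u
#define ETHERTYPE_ARP   0x0806u
#define ETHERTYPE_IPV6  0x86DDu

typedef enum {
    FRAME_TOO_SHORT, FRAME_ARP, FRAME_IPV4, FRAME_IPV6, FRAME_OTHER
} frame_kind_t;

/* Reads the big-endian EtherType at offset 12 and maps it to a frame kind.
   A receive thread could switch on the result to pick its handler. */
static frame_kind_t classify_frame(const uint8_t *frame, size_t len)
{
    if (len < ETH_HEADER_LEN) {
        return FRAME_TOO_SHORT;
    }

    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

    switch (ethertype) {
    case ETHERTYPE_ARP:  return FRAME_ARP;
    case ETHERTYPE_IPV4: return FRAME_IPV4;
    case ETHERTYPE_IPV6: return FRAME_IPV6;
    default:             return FRAME_OTHER;
    }
}
```

In a receive loop, the thread would call this once per retrieved frame and either handle the frame inline or hand it to a lower-priority thread, depending on the kind.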

The saving is approximately 45% (ARP) to 80% (SNTPv6) of thread busy time, plus lower stack requirements and lower cyclomatic complexity. The driving reason is that once you are no longer using 100% legacy code, you can rework any part of the code as long as you keep compatibility.

I look forward to continuing the discussion if you are interested.

mike

Davide Dalfra · ‎2025-07-18

@mbarg.1

I think I'm almost done on the RX side, and now I see more clearly the muxing you were talking about.
The approach of an IST (Interrupt Service Task, at least that is what I usually call it) is clearer.

Now, moving on to the TX side, I see something more than what we discussed before.
1) In _nx_driver_hardware_packet_send I see:

if(HAL_ETH_Transmit_IT(&eth_handle, &TxPacketCfg))
{
  return(NX_DRIVER_ERROR);
}

which looks weak to me. A better approach could be:

  while(HAL_ETH_Transmit_IT(&eth_handle, &TxPacketCfg))
  {
    if(HAL_ETH_GetError(&eth_handle) & HAL_ETH_ERROR_BUSY)
    {

      if(nx_driver_information.nx_driver_information_deferred_events & NX_DRIVER_DEFERRED_PACKET_TRANSMITTED)
      {
        HAL_ETH_ReleaseTxPacket(&eth_handle);
      }
      else
      {
        tx_thread_relinquish();
      }
    }
    else
    {
      return(NX_DRIVER_ERROR);
    }
  }

Do you agree? Even with this approach I see the same issue of freeing packets based on the ORed flag; it should be unified in a single place.

2) Regarding the corresponding IST for TX: since the TX request goes through the same function as in point 1, are you using the TX IST only for freeing the packets (HAL_ETH_ReleaseTxPacket(&eth_handle);)?

Regards
Davide

mbarg.1 · ‎2025-07-19

@Davide Dalfra :

1) No, I would never modify the STM HAL code: sooner or later they will change it and all your code will be invalid, so you will have to rework it. The same applies if you decide to move or reuse part of your code on a different processor.

I found it better to work only on nx_stm32_eth_driver.c, adding a thread used only for Tx cleanup, with lower priority, fired by a dedicated semaphore (an event could probably work as well):

void HAL_ETH_TxCpltCallback(ETH_HandleTypeDef *heth) {

	/*	increase semaphore count	*/
	_tx_semaphore_put(&hEthMb.mb_eth_tx_semaphore);
	/*	macro for statistical and delay measurements - empty in final code	*/
	NX_ETH_TX_INTERRUPT_INC
}

You will need to exclude the original from the build and replace it with your code.

void Eth_Tx_Thread_Entry() {

	/*	this is ethernet TX processing thread		*/
	debugT("THREAD - Eth_Tx_Thread - START OK - pri %d @ %ld\n", (tx_thread_identify())->tx_thread_priority, tx_time_get());
	for (;;) {

		/*	we have one semaphore count per packet, so we keep going until all are used		*/
		/*	@todo		ADD periodic processing using timeout		*/
		tx_semaphore_get(&hEthMb.mb_eth_tx_semaphore, TX_WAIT_FOREVER);
		/* Process transmitted packet(s).  */
		HAL_ETH_ReleaseTxPacket(&heth);
		/*	macro for statistics and delay measurements		*/
		NX_ETH_TX_THREAD_INC
	}

}

With this you will never get an error and there is no need to worry about it: if Eth crashes for a non-software reason, the whole CPU crashes and the watchdog resets it.

2) Another weak point is in

static UINT  _nx_driver_hardware_packet_send(NX_PACKET *packet_ptr)
{
  NX_PACKET       *pktIdx;
  UINT            buffLen = 0;
.....

  TxPacketCfg.Length = buffLen;
  TxPacketCfg.TxBuffer = Txbuffer;
  TxPacketCfg.pData = (uint32_t *)packet_ptr;

  if(HAL_ETH_Transmit_IT(&eth_handle, &TxPacketCfg))
  {
    return(NX_DRIVER_ERROR);
  }
  return(NX_SUCCESS);
}

Before giving up, I prefer to try once more, in case some other thread is using the Eth driver and is not yet done:

static UINT  _nx_driver_hardware_packet_send(NX_PACKET *packet_ptr)
{

  NX_PACKET       *pktIdx;
  UINT            buffLen = 0;

  ETH_BufferTypeDef Txbuffer[ETH_TX_DESC_CNT];
  memset(Txbuffer, 0 , ETH_TX_DESC_CNT*sizeof(ETH_BufferTypeDef));

  if(packet_ptr->nx_packet_union_next.nx_packet_tcp_queue_next == ((NX_PACKET *)NX_PACKET_FREE)){
	  Error_HandlerT("NX_PACKET_ALLOCATED\n");
  }
  int i = 0;

  for (pktIdx = packet_ptr;pktIdx != NX_NULL ; pktIdx = pktIdx -> nx_packet_next)
  {
    if (i >= ETH_TX_DESC_CNT)
    {
      return NX_DRIVER_ERROR;
    }

    Txbuffer[i].buffer = pktIdx->nx_packet_prepend_ptr;
    Txbuffer[i].len = (pktIdx -> nx_packet_append_ptr - pktIdx->nx_packet_prepend_ptr);
    buffLen += (pktIdx -> nx_packet_append_ptr - pktIdx->nx_packet_prepend_ptr);

    if(i>0)
    {
      Txbuffer[i-1].next = &Txbuffer[i];
    }

    if (pktIdx-> nx_packet_next == NULL)
    {
      Txbuffer[i].next = NULL;
    }

    i++;
#if defined (__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U)
    SCB_CleanDCache_by_Addr((uint32_t*)(pktIdx -> nx_packet_data_start), pktIdx -> nx_packet_data_end - pktIdx -> nx_packet_data_start);
#endif
  }

#ifdef NX_ENABLE_INTERFACE_CAPABILITY
  if (packet_ptr -> nx_packet_interface_capability_flag & (NX_INTERFACE_CAPABILITY_TCP_TX_CHECKSUM |
                                                           NX_INTERFACE_CAPABILITY_UDP_TX_CHECKSUM |
                                                             NX_INTERFACE_CAPABILITY_ICMPV4_TX_CHECKSUM |
                                                               NX_INTERFACE_CAPABILITY_ICMPV6_TX_CHECKSUM))
  {
    TxPacketCfg.ChecksumCtrl = ETH_CHECKSUM_IPHDR_PAYLOAD_INSERT_PHDR_CALC;
  }
  else if (packet_ptr -> nx_packet_interface_capability_flag & NX_INTERFACE_CAPABILITY_IPV4_TX_CHECKSUM)
  {
    TxPacketCfg.ChecksumCtrl = ETH_CHECKSUM_IPHDR_INSERT;
  }
#else
  TxPacketCfg.ChecksumCtrl = ETH_CHECKSUM_DISABLE;
#endif /* NX_ENABLE_INTERFACE_CAPABILITY */

  TxPacketCfg.Length = buffLen;
  TxPacketCfg.TxBuffer = Txbuffer;
  TxPacketCfg.pData = (uint32_t *)packet_ptr;
  if(packet_ptr->nx_packet_next != NULL){
	  Error_HandlerT("_nx_driver_hardware_packet_send\n");
  }

  if(HAL_ETH_Transmit_IT(&eth_handle, &TxPacketCfg))
  {
    NX_ETH_TX_RETRY_INC;
    tx_thread_sleep(1);
    if(HAL_ETH_Transmit_IT(&eth_handle, &TxPacketCfg))
    {
      NX_ETH_TX_PACKET_FAIL_INC
      return(NX_DRIVER_ERROR);
    }
  }
  return(NX_SUCCESS);
}
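
The retry-once logic above can be seen as a special case of a bounded retry. The sketch below is a host-side illustration with hypothetical names (send_with_retry, try_send_fn, backoff_fn, fake_link_t): the transmit call and the sleep are passed in as function pointers, so nothing here is HAL or ThreadX API. On the target, the hooks would presumably wrap HAL_ETH_Transmit_IT() and tx_thread_sleep(1).

```c
#include <stdbool.h>

/* Hypothetical hooks: try_send returns true on success; backoff yields the CPU. */
typedef bool (*try_send_fn)(void *ctx);
typedef void (*backoff_fn)(void *ctx);

/* Tries up to max_attempts times, backing off between attempts.
   Returns the attempt number that succeeded, or 0 if all attempts failed,
   so the loop can never spin forever. */
static unsigned send_with_retry(try_send_fn try_send, backoff_fn backoff,
                                void *ctx, unsigned max_attempts)
{
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_send(ctx)) {
            return attempt;
        }
        if (attempt < max_attempts) {
            backoff(ctx);   /* no point in waiting after the last attempt */
        }
    }
    return 0;
}

/* Host-side stand-in for a transmitter that stays busy for a few polls. */
typedef struct { unsigned busy_left; unsigned backoffs; } fake_link_t;

static bool fake_try_send(void *ctx)
{
    fake_link_t *link = ctx;
    if (link->busy_left > 0) {
        link->busy_left--;
        return false;       /* still busy */
    }
    return true;
}

static void fake_backoff(void *ctx)
{
    ((fake_link_t *)ctx)->backoffs++;
}
```

With max_attempts set to 2 this behaves like the code above: one retry after a single backoff, then a reported failure. A larger bound trades latency in the sending thread for fewer dropped packets.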

Mike

Davide Dalfra · ‎2025-08-16

Hello Mike

I hope this finds you well. It took a while, but the implementation and fixes are finally done. I'm now starting a full set of tests to see how it performs.

We are also going to run a few tests with "WANulator"; I'll keep you posted.

Thanks again

Davide