
32F417: is there a very quick way to check if there is some ETH RX data available?

PHolt.1
Senior III

I am using a simple polled ETH RX process. The 10ms poll period limits the data rate to 250kbytes/sec, which is more than enough for the application, but it does introduce a latency of up to 10ms.

Please don't tell me to use interrupt driven RX unless you want to write and debug the code for me : - )

I am looking for a way to skip that 10ms wait if there is some more RX data. That would achieve pretty much the same thing, especially as this thread runs with a high RTOS priority. But the function low_level_input is pretty long. It calls HAL_ETH_GetReceivedFrame which is a lot of code, and all of it runs just to find out there is nothing to do.
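
For reference, the RX task is essentially just this (a simplified sketch; FreeRTOS assumed, and ethernetif_input stands for whatever the port's wrapper around low_level_input is called in your project):

#include "FreeRTOS.h"
#include "task.h"
#include "lwip/netif.h"
 
void ethernetif_input(struct netif *netif);   /* port wrapper that calls low_level_input */
 
static void eth_rx_task(void *arg)
{
    struct netif *netif = (struct netif *)arg;
 
    for (;;)
    {
        ethernetif_input(netif);          /* ends up in HAL_ETH_GetReceivedFrame() */
        vTaskDelay(pdMS_TO_TICKS(10));    /* fixed 10 ms poll -> up to 10 ms latency */
    }
}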

Is there some ETH register bit which can be checked to see if more data has arrived?

This is the function in question; only at the end does it return when there is no RX data.

 
HAL_StatusTypeDef HAL_ETH_GetReceivedFrame(ETH_HandleTypeDef *heth)
{
  uint32_t framelength = 0U;
 
  /* Check if segment is not owned by DMA */
  __DMB();
  if(((heth->RxDesc->Status & ETH_DMARXDESC_OWN) == (uint32_t)RESET))
  {
    /* Check if last segment */
    if(((heth->RxDesc->Status & ETH_DMARXDESC_LS) != (uint32_t)RESET))
    {
      /* increment segment count */
      (heth->RxFrameInfos).SegCount++;
 
      /* Check if last segment is first segment: one segment contains the frame */
      if ((heth->RxFrameInfos).SegCount == 1U)
      {
        (heth->RxFrameInfos).FSRxDesc = heth->RxDesc;
      }
 
      heth->RxFrameInfos.LSRxDesc = heth->RxDesc;
 
      /* Get the Frame Length of the received packet: subtract the 4 CRC bytes */
      framelength = (((heth->RxDesc)->Status & ETH_DMARXDESC_FL) >> ETH_DMARXDESC_FRAMELENGTHSHIFT) - 4U;
      heth->RxFrameInfos.length = framelength;
 
      /* Get the address of the buffer start address */
      heth->RxFrameInfos.buffer = ((heth->RxFrameInfos).FSRxDesc)->Buffer1Addr;
      /* point to next descriptor */
      heth->RxDesc = (ETH_DMADescTypeDef*) ((heth->RxDesc)->Buffer2NextDescAddr);
 
      /* Return function status */
      return HAL_OK;
    }
    /* Check if first segment */
    else if((heth->RxDesc->Status & ETH_DMARXDESC_FS) != (uint32_t)RESET)
    {
      (heth->RxFrameInfos).FSRxDesc = heth->RxDesc;
      (heth->RxFrameInfos).LSRxDesc = NULL;
      (heth->RxFrameInfos).SegCount = 1U;
      /* Point to next descriptor */
      heth->RxDesc = (ETH_DMADescTypeDef*) (heth->RxDesc->Buffer2NextDescAddr);
    }
    /* Check if intermediate segment */
    else
    {
      (heth->RxFrameInfos).SegCount++;
      /* Point to next descriptor */
      heth->RxDesc = (ETH_DMADescTypeDef*) (heth->RxDesc->Buffer2NextDescAddr);
    }
  }
 
  /* Return function status */
  return HAL_ERROR;
}
 


> all of it runs just to find out there is nothing to do

No. If there's no Rx packet, the very first condition is false and the function returns immediately.

If you want to save time, delete the barrier; it's not needed on Cortex-M4.
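
If you really want a dedicated quick check, the same OWN-bit test from the function above can stand on its own (just a sketch; the helper name is mine):

static inline int eth_rx_pending(const ETH_HandleTypeDef *heth)
{
    /* OWN bit clear = the DMA has handed this descriptor back to the CPU,
       i.e. at least one received segment is waiting to be collected */
    return ((heth->RxDesc->Status & ETH_DMARXDESC_OWN) == 0U);
}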

JW

PHolt.1
Senior III

You are right - thanks. It skips most of it.

My plan is to set a flag when data has been received and clear it on a timeout, say 10ms. While that flag is set, the RX will be polled at 1ms intervals (resulting data rate = 1200kbytes/sec). Once the flag is cleared, the RX will be polled at 10ms intervals (resulting data rate = 250kbytes/sec, but jumping straight back up to 1200kbytes/sec as soon as data arrives). It's a reasonably simple way to get a high ETH transfer speed without the complexity of interrupt-driven RX, and without excessively fast polling, which would slow down other RTOS tasks.

EDIT: done and works perfectly.

Another reason for doing this is that the whole low-level input loop runs in a standalone RTOS task, and I have stuff like link status change detection in there too. That conveniently avoids thread safety issues.
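
In code, the change amounts to replacing the fixed 10ms delay in the loop sketched above with something like this (again just a sketch, FreeRTOS assumed; eth_rx_pending() is the OWN-bit peek from the reply above, and heth is the HAL Ethernet handle, name assumed):

/* replacement for the body of eth_rx_task()'s for(;;) loop */
static TickType_t last_rx;                     /* tick of the last received frame */
 
if (eth_rx_pending(&heth))
{
    ethernetif_input(netif);                   /* drain what has arrived */
    last_rx = xTaskGetTickCount();
}
 
if ((xTaskGetTickCount() - last_rx) < pdMS_TO_TICKS(10))
    vTaskDelay(pdMS_TO_TICKS(1));              /* frames seen recently: fast 1 ms poll */
else
    vTaskDelay(pdMS_TO_TICKS(10));             /* ~10 ms with nothing: back to slow poll */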

Re the barrier, "Piranha" here might disagree, and "Piranha" being an unhappy fish is not something I want to risk : - )

Piranha
Chief II

My article explains it all, but, of course, you didn't read it. It instructs putting a barrier after checking the OWN bit, but you have put it before the check. It also says that the Cortex-M7 is currently the only core that re-orders memory transactions and performs speculative data reads, and therefore the only one that requires either the DMB instruction or an appropriate MPU configuration. It also says that the code still needs a compiler barrier to stop the compiler from reordering accesses to the other descriptor words relative to the volatile Status word holding the OWN bit, though in this particular case the code logic itself (the controlling expression of an if statement is a sequence point) doesn't allow such reordering. On top of that, the code unnecessarily reads the volatile Status word 3 times.

uint32_t Status = heth->RxDesc->Status;   /* read the volatile Status word once */
__COMPILER_BARRIER();                     /* keep the compiler from reordering around it */
if (!(Status & ETH_DMARXDESC_OWN))        /* OWN clear: descriptor (and its data) is ours */

About the throttling delay code... Imagine the code checks and determines that there is no new data, and right after that the data comes in. The code will still wait out the full 10 ms delay, so that data will be processed only after up to just under 10 ms.

> That conveniently avoids thread safety issues.

It doesn't, because with NO_SYS=0 the lwIP core thread ("tcpip_thread") runs in parallel anyway.

PHolt.1
Senior III

I have removed the __DMB.

I have made the 10ms delay adaptive, dropping to 2ms if data arrives and reverting to 10ms after 10ms of no data. Data rate is now 1MB/sec, which is much faster than needed.

What do you mean "runs in parallel anyway". You said before that LWIP will never run more than one instance of low_level_output.

I have

/**
 * NO_SYS==1: Provides VERY minimal functionality. Otherwise,
 * use lwIP facilities.
 */
#define NO_SYS                  	0
 
// Flag to make LWIP API thread-safe. The netconn and socket APIs are claimed
// to be thread-safe anyway. The raw API is never thread-safe.
#define LWIP_TCPIP_CORE_LOCKING   	1

Piranha
Chief II

> I have made the 10ms delay adaptive, dropping to 2ms if data arrives

It's impossible without a race condition. In my previous post I gave you an example of how it will still wait the longer time when the data has just arrived. Anyway, that is an unnecessary contraption. Normally such things are solved by setting proper priorities for RTOS threads. Just set the Ethernet processing thread to a lower priority than other, more important threads and the RTOS will deal with it.
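
For example (FreeRTOS; task names, stack sizes and priority values are placeholders):

/* latency-critical work gets the higher priority; the Ethernet poll task
   runs only when nothing more important is ready */
xTaskCreate(control_task, "ctrl", 512, NULL, tskIDLE_PRIORITY + 3, NULL);
xTaskCreate(eth_rx_task,  "eth",  512, NULL, tskIDLE_PRIORITY + 1, NULL);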

> What do you mean "runs in parallel anyway". You said before that LWIP will never run more than one instance of low_level_output.

So there is the internal lwIP core thread, your Ethernet processing thread and your other threads. The lwIP core API cannot be called from multiple threads simultaneously. When LWIP_TCPIP_CORE_LOCKING is enabled, the lwIP core thread uses the core locking macros to get exclusive access when it calls the core API. And, of course, all other threads that need to call the lwIP core API must also use the same core locking macros. The low_level_output() function is called only with the core locked. Therefore, if all threads obey the rules, it will not be called in parallel. If they don't obey the rules, there will be far bigger problems and the code is broken - like it is with ST's code.
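
For example, an application thread calling the raw API directly has to look roughly like this (sketch only; the pcb and its handling are placeholders):

#include "lwip/tcpip.h"
#include "lwip/tcp.h"
 
void app_send(struct tcp_pcb *pcb, const void *data, u16_t len)
{
    LOCK_TCPIP_CORE();                                   /* exclusive access to the lwIP core */
    tcp_write(pcb, data, len, TCP_WRITE_FLAG_COPY);
    tcp_output(pcb);
    UNLOCK_TCPIP_CORE();
}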

> // Flag to make LWIP API thread-safe.

No, it doesn't do that. I already explained it in your other topic.