Ethernet performance & packet loss

ranran
Senior II

Hello,

We are trying to measure our Ethernet performance (STM32H7).

We are using a UDP echo with LwIP, and we inject UDP packets from a PC into the STM32H7.

The performance is OK with most packet sizes, but when we decrease the packet size below ~300 bytes, we start to see packet loss.
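For reference, our echo is essentially the standard LwIP raw-API UDP echo pattern. A minimal sketch (the port number and function names here are illustrative, not from a specific ST example):

```c
#include "lwip/udp.h"    /* LwIP raw API */
#include "lwip/pbuf.h"

#define ECHO_PORT 7      /* assumption: classic echo service port */

/* Receive callback: send every datagram straight back to its source. */
static void udp_echo_recv(void *arg, struct udp_pcb *pcb, struct pbuf *p,
                          const ip_addr_t *addr, u16_t port)
{
    if (p != NULL) {
        udp_sendto(pcb, p, addr, port);  /* echo the payload back unchanged */
        pbuf_free(p);                    /* we own the pbuf, so release it */
    }
}

void udp_echo_init(void)
{
    struct udp_pcb *pcb = udp_new();
    if (pcb != NULL) {
        udp_bind(pcb, IP_ADDR_ANY, ECHO_PORT);
        udp_recv(pcb, udp_echo_recv, NULL);
    }
}
```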

My questions are:

  1. Is there a benchmark document for the STM32H7 showing the expected Ethernet performance?
  2. Is there a loopback application example using HAL APIs (without LwIP)?

Thank you,

Ran

13 REPLIES

> A typical Cortex-M3/M4 is more powerful than an Intel 80486, which was able to run Windows 95, and a Cortex-M7 is on par with a Pentium II. On the RAM side, a few tens of KB for Ethernet and IP stack buffers are more than enough for a high-performance implementation, and modern MCUs have much more internal RAM, not to mention the possibility of adding external RAM.

You are comparing apples to oranges here.

First, Ethernet was 10 Mbit/s at the time of the 486. And second, even then the network chips provided significant buffering and preprocessing capacity to reduce the real-time requirements on the main processor.

Surely Cortex-M4 controllers can digest a moderate amount of Ethernet traffic. But can they handle the average 1000 Mbit/s office network?

Usually they are already drowned by the irrelevant background chatter at the link layer.

Dear Piranha,

thank you for your demo. I'm glad to know that there are no hardware-related limitations.

However, I cannot run the test code because I'm using the STM32F4 series. Can you share the source code? In that case we could modify it to run on the F4, or you could share an explanation of the changes in the ST driver (if you have modified it rather than completely rewritten it).

Piranha
Chief II

> But can they handle the average 1000 Mbit/s office network? Usually they are already drowned by the irrelevant background chatter at the link layer.

It seems that this is a rather popular fundamental misconception about Ethernet networking. I've seen it in posts from @Ozone, @Community member and other users. In my previous post I already touched on the "switch vs hub" question, but let's expand on it.

Long ago, networked devices were connected with network hubs and all had to work synchronously at the same speed and mode (full or half duplex). As a hub is a simple repeater, all connected devices in a single collision domain shared the total throughput, and every frame was sent to every device. But that was 20+ years ago, when 10 Mbps networks were the most popular ones! In the 21st century hubs are obsolete and are no longer used or produced.

Nowadays all Ethernet networks are connected with network switches, which are totally different "store and forward" type devices. A switch has internal RAM, where it captures an incoming frame, processes the MAC addresses and transmits the frame on the corresponding port. To do this, the switch also keeps a MAC address table in RAM, where it stores the learned addresses of network devices. That's why even simple switches have specifications such as MAC address table size, maximum frame size and frame forwarding rate. Because of the "store and forward" technique, a switch can connect devices with different speeds and modes. And, because of MAC address learning, a switch almost never "spams" devices with irrelevant traffic (see the toy sketch below).
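To make the learn-and-forward behaviour concrete, here is a toy sketch of the per-frame logic in C (purely illustrative, not how any real switch ASIC or firmware is implemented):

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024   /* toy MAC address table capacity */
#define NUM_PORTS  8

struct mac_entry { uint8_t mac[6]; int port; int valid; };
static struct mac_entry table[TABLE_SIZE];

static unsigned hash_mac(const uint8_t mac[6]) {
    unsigned h = 0;
    for (int i = 0; i < 6; i++) h = h * 31u + mac[i];
    return h % TABLE_SIZE;
}

/* Per frame: learn the source address, then forward as narrowly as possible. */
void switch_frame(const uint8_t dst[6], const uint8_t src[6],
                  int in_port, void (*send)(int port))
{
    /* 1. Learn: remember which port the source address lives behind. */
    struct mac_entry *e = &table[hash_mac(src)];
    memcpy(e->mac, src, 6);
    e->port  = in_port;
    e->valid = 1;

    /* 2. Forward: a known unicast destination gets the frame on one port only. */
    struct mac_entry *d = &table[hash_mac(dst)];
    if (d->valid && memcmp(d->mac, dst, 6) == 0) {
        if (d->port != in_port)
            send(d->port);
    } else {
        /* Unknown or broadcast destinations are flooded to all other ports.
           This flooding is the only "spam" the other devices ever see. */
        for (int p = 0; p < NUM_PORTS; p++)
            if (p != in_port)
                send(p);
    }
}
```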

Therefore a typical office network is not 1000 Mbps, nor 100 Mbps, nor any other particular speed, but a mix of different speeds, modes and physical interfaces, including optical and wireless. Every link between two devices has its own speed and mode. If a desktop PC uploads a large file to some NAS (network-attached storage) through a 1 Gbps interface at almost full speed, and the same PC through the same network interface streams a live 128 kbps MP3 stream to a 100 Mbps capable MCU based device connected to that same switch, then the MCU device needs to be able to handle only that MP3 stream. It doesn't even need to be able to handle the full 100 Mbps, but only the data rate needed for its actual task and a bit more (128 kbps is just 16 KB/s, roughly a dozen full-size frames per second). Even more: the MCU device will not even be aware of, or able to detect, the fact that the other PC-NAS traffic exists.

The only exception, the "background" traffic that is delivered to all devices in a network, is broadcast frames (targeted at all devices). These are required mostly for the discovery tasks of DHCP, ARP, DNS and some other protocols. As stated before, these range from a few frames per minute to a few frames per second. To put that in perspective, my STM32F7 based demonstration firmware is capable of handling 8000+ unicast (targeted at the particular device) TCP frames per second at a 33% CPU load. Broadcast frames are almost always UDP, and UDP processing takes roughly half the CPU that TCP does. Dropping irrelevant frames early in the IP stack takes even less CPU (a sketch of such a filter is below). Therefore broadcast or "background" traffic really is nothing for Cortex-M class devices.
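As an illustration of that early dropping, a driver-level filter can reject frames before the stack spends any real time on them. A hedged sketch against the LwIP 2.x protocol headers (the function name and the exact hook point in the driver are my own, not a fixed LwIP API):

```c
#include <string.h>
#include "lwip/netif.h"
#include "lwip/pbuf.h"
#include "lwip/prot/ethernet.h"  /* struct eth_hdr, ETHTYPE_* */

/* Early RX filter: called from a hypothetical ethernetif_input() with the
   raw frame in 'p', before the pbuf is handed to the IP stack. */
static int frame_is_relevant(struct netif *netif, struct pbuf *p)
{
    const struct eth_hdr *hdr = (const struct eth_hdr *)p->payload;

    /* Keep unicast frames addressed to our own MAC. */
    if (memcmp(&hdr->dest, netif->hwaddr, ETH_HWADDR_LEN) == 0)
        return 1;

    /* For broadcast/multicast, keep only the EtherTypes we actually use. */
    if (hdr->dest.addr[0] & 0x01) {
        u16_t type = lwip_ntohs(hdr->type);
        return (type == ETHTYPE_ARP) || (type == ETHTYPE_IP);
    }

    return 0;  /* everything else is dropped here, almost for free */
}
```

On top of that, the STM32 MAC itself has perfect/hash destination-address filtering, so much of this chatter can be discarded in hardware before the CPU ever sees it.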

> You are comparing apples to oranges here.

Kind of, but we can compare how much sugar and fat they contain, i.e. compare CPU performance, RAM size and bus speeds, which in modern MCUs are pretty similar to those of the old PCs.

> even then the network chips provided significant buffer and preprocessing capacities to reduce the real-time requirements for the main processor

That is exactly what the STM32's integrated ETH peripheral does. It has its own dedicated scatter/gather DMA, which makes it possible to implement zero-copy drivers. And it has a hardware checksum verifying/inserting capability for Ethernet, ICMP, UDP and TCP frames. Plus an IEEE 1588 (PTP) hardware time-stamping and clock generation engine for those who need such a feature.
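For example, on the F4/F7 series the checksum offload is a one-line choice in the classic HAL init structure. A minimal sketch (field values such as RMII and the MAC address are assumptions for illustration; the H7 HAL is reworked and uses different structures):

```c
#include "stm32f7xx_hal.h"  /* classic ETH HAL (F1/F2/F4/F7); H7 differs */

static ETH_HandleTypeDef heth;
static uint8_t mac_addr[6] = {0x02, 0x00, 0x00, 0x00, 0x00, 0x01};

void eth_init_with_offload(void)
{
    heth.Instance             = ETH;
    heth.Init.MACAddr         = mac_addr;
    heth.Init.AutoNegotiation = ETH_AUTONEGOTIATION_ENABLE;
    heth.Init.MediaInterface  = ETH_MEDIA_INTERFACE_RMII;  /* board dependent */
    heth.Init.RxMode          = ETH_RXINTERRUPT_MODE;
    heth.Init.PhyAddress      = 0;                         /* board dependent */
    /* Let the MAC verify and insert IP/ICMP/UDP/TCP checksums in hardware,
       so the IP stack can skip its software checksum loops. */
    heth.Init.ChecksumMode    = ETH_CHECKSUM_BY_HARDWARE;
    HAL_ETH_Init(&heth);
}
```

With this set, LwIP's CHECKSUM_GEN_*/CHECKSUM_CHECK_* options in lwipopts.h should be configured to match, so the stack does not compute the same checksums again in software.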

> But can it handle the average 1000MBit/s office network ?

Short answer: it can handle an average office network easily, but it doesn't need to be able to handle 1000 Mbit/s of data traffic! A more detailed explanation is in my other post below.