[bug fixes] STM32H7 Ethernet

alister · ‎2020-02-10

@Amel NASRI, @ranran, @Piranha, @Harrold, @Pavel A.

V2 of my fixes and improvements to H7_FW V1.5.0/V1.6.0 Ethernet...

Changes include

Decoupling receive buffers from receive descriptors so buffers may be held for any length of time without choking receive.
Optionally queuing transmit, so transmit doesn't need to block until complete.
Many bug fixes.

Find full details and source in the attached zip. V1 was posted to another developer's question. Please post any questions about V2 here.

Piranha · ‎2020-07-23

Hi, guys, and sorry for a long delay. I should really limit my time and effort spent on hopeless users here and concentrate more on specific useful topics...

Regarding overall design, I can say that descriptor lists are queues which are partly managed by hardware. Therefore I'm not using any additional queues for either Rx or Tx. The lwIP memory pool for Rx and TxQueue for Tx are unnecessary. You don't need lwIP pool management for your array, which is managed by hardware and your code anyway. And there is no real sense in an additional Tx queue, because it also has a limit, exactly like the descriptor array. If it's not enough, just increase the number of descriptors. In my driver I just have 3 separate arrays for Rx (descriptors, data buffers and pbuf_custom) and 1 array (descriptors) for Tx. The Rx arrays have the same number of elements, but keeping them separate makes them more efficient regarding size/alignment and makes it possible to put them in different memories/locations.

For this to work, I added an additional pointer member at the end of the descriptor structure. I use it to attach a pbuf/pbuf_custom to a descriptor. While descriptors are used and recycled incrementally, the use and release of Rx data buffer and pbuf_custom structure pairs depends on application code and is not deterministic, but the pairs are always fully synchronized: the indexes are always the same. That way, when the pbuf_free_custom_fn() is called, I calculate the respective data buffer index from the pbuf pointer, because at that point the payload member contains junk. And then I just attach the released pair to the next free Rx descriptor.
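A minimal sketch of the index calculation described above, not Piranha's actual driver code. The names rx_pbuf, rx_desc, rx_index_from_pbuf() and rx_attach() are hypothetical, rx_pbuf is only a stand-in for lwIP's struct pbuf_custom, and the hardware descriptor words are not modelled. The array sizes are the demo figures quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_COUNT     16u    /* Rx descriptors/buffers, as in the demo firmware figures */
#define RX_BUF_SIZE  1536u  /* bytes per Rx data buffer */

/* Hypothetical stand-in for lwIP's struct pbuf_custom. */
struct rx_pbuf {
    void *payload;
};

/* Hypothetical descriptor: hardware words first, extra pointer at the end. */
struct rx_desc {
    uint32_t hw[4];          /* hardware-defined words, layout not modelled here */
    struct rx_pbuf *pbuf;    /* software-only member: the attached pbuf */
};

/* Three separate arrays; a buffer and its pbuf always share the same index. */
static struct rx_desc rx_descs[RX_COUNT];
static uint8_t        rx_bufs[RX_COUNT][RX_BUF_SIZE];
static struct rx_pbuf rx_pbufs[RX_COUNT];

/* Recover the pair index from the pbuf pointer alone (payload may be junk). */
size_t rx_index_from_pbuf(const struct rx_pbuf *p)
{
    ptrdiff_t idx = p - rx_pbufs;
    assert(idx >= 0 && (size_t)idx < RX_COUNT);
    return (size_t)idx;
}

/* Attach a released buffer/pbuf pair to a free descriptor. */
void rx_attach(struct rx_desc *d, struct rx_pbuf *p)
{
    size_t i = rx_index_from_pbuf(p);
    p->payload = rx_bufs[i];                     /* restore a valid payload pointer */
    d->hw[0] = (uint32_t)(uintptr_t)rx_bufs[i];  /* buffer address word (32-bit target assumed) */
    d->pbuf = p;
}
```

In a real driver the free callback would call something like rx_attach() with the next free descriptor; which descriptor that is, and how ownership is handed back to the DMA, is left out here.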

For Tx the pbuf segment count is not deterministic because it depends on many factors. Just as an example, the stack itself adds the combined Ethernet+IP+UDP/TCP header in front of the sent data by adding an additional pbuf at the front all the time. Counting pbufs before queuing is necessary, but, as a pbuf chain typically has 1-3 segments, the CPU cycles spent on that are negligible. If there are not enough descriptors, just drop the frame altogether. TCP will re-transmit, and UDP and Ethernet itself don't have to guarantee delivery anyway; it's the same situation as with a broken network. Stopping the whole tcpip_thread() can potentially be even worse than dropping some frames under abnormally high load. As alister said, descriptors are cheap! One can set their numbers to tens or even hundreds, if necessary. For example, my demo firmware uses 16 Rx descriptors (with 1536 B data buffers) and 48 Tx descriptors.
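The count-then-drop policy above can be sketched roughly as follows. This is an illustration only: struct seg is a hypothetical stand-in for lwIP's struct pbuf (in a real driver lwIP's pbuf_clen() does the counting), and seg_count() and tx_try_queue() are made-up names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for lwIP's struct pbuf: only the chain link and length. */
struct seg {
    struct seg *next;
    size_t len;
};

/* Count the segments in a chain; one Tx descriptor is needed per segment. */
size_t seg_count(const struct seg *p)
{
    size_t n = 0;
    for (; p != NULL; p = p->next) {
        n++;
    }
    return n;
}

/*
 * Queue the whole frame or nothing. Returns true and reserves the descriptors
 * if the chain fits; returns false (frame dropped) if it does not.
 */
bool tx_try_queue(const struct seg *chain, size_t *free_descs)
{
    size_t need = seg_count(chain);
    if (need == 0 || need > *free_descs) {
        return false;   /* drop: TCP re-transmits, UDP makes no guarantee */
    }
    *free_descs -= need;
    return true;
}
```

Checking the count before touching any descriptor keeps a frame from being half-queued, which is the point of counting first rather than filling descriptors until they run out.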

P.S. Of course, ask for more details if/when necessary. :)

alister · ‎2020-07-23

Really thought-provoking ideas. Thanks for sharing.

The results of Piranha's effort are at https://community.st.com/s/question/0D50X0000AhNBoWSQW/actually-working-stm32-ethernet-and-lwip-demonstration-firmware.

@SLuka.1 this is an answer to your post. I'll add it's not unexpected for tx_cycles to exceed rx_cycles because rx is an external event that the software merely responds to, whereas tx needs to be prepared by the app and scheduled (especially TCP) by the stack.

ADunc.1 · ‎2020-09-06

FYI V1.8.0 STM32CubeH7 has some minor changes to the ethernet driver. I didn't study them in much detail at all. Looks to address some of the issues but does not appear to have changed much at all.

Geraldo Pereira · ‎2020-10-28

Hi Alister,

Thank you for your work. Do you have anything like this for the F429ZI or F767ZI board? I have not yet been able to generate and run examples in CubeMX that use Ethernet, RTOS and LwIP.

Thanks,

Geraldo.

alister · ‎2020-10-28

Hi Geraldo

I've nothing on F4. The shop I work at has F7 with working Ethernet, including fixes to rx-DMA, rx-buffer management, auto-negotiation, and maybe some others. Its FW_F7 version is more than 3 years old. I can't share it.

Check @Piranha's many threads and posts. Here are some:

https://community.st.com/s/question/0D50X0000BOtfhnSQB/how-to-make-ethernet-and-lwip-working-on-stm32

https://community.st.com/s/question/0D50X0000AhNBoWSQW/actually-working-stm32-ethernet-and-lwip-demonstration-firmware

https://community.st.com/s/question/0D50X0000B2AG7FSQW/ethernet-send-complete-interrupt-not-triggered-in-stm32f7

Check searches of the community:

https://community.st.com/s/global-search/stm32f4%20ethernet

https://community.st.com/s/global-search/stm32f7%20ethernet

Check Google:

https://www.google.com/search?q=%22stm32f7%22+ethernet+driver+fixes&oq=%22stm32f7%22+ethernet+driver+fixes

Does anyone following this thread have a link to F4/F7 fixes they can share?

[EDIT] You could clarify if your boards are purported to be supported by the examples, and how it doesn't work. Also as the focus of this thread is H7 ethernet, please post to an existing F4 and/or F7 thread describing the same problems or, if you can't find one, start another.

Geraldo Pereira · ‎2020-10-29

Hi Alister,

Thanks for your answer. I'll read the documents.

By the way, I've done some tests. For example, in CubeMX I tried configuring ETH, RTOS with CMSIS_v1 and LwIP. But with the generated code no IP address is assigned. So I'm not sure about the correct configuration to get the system running with RTOS and LwIP. I'm analysing the LwIP_HTTP_Server_Netconn_RTOS example. I saw it uses CMSIS_v1, so I tried configuring a project in CubeMX to use RTOS with CMSIS_v1 and LwIP. The project compiles and runs, but the IP is not assigned (with or without DHCP). I think it's a problem with the DHCP or LwIP configuration, so I'll check.

I'm an electrical engineering student in Brazil in a critical situation. I have to finish a prototype that uses a socket to send information to a remote server. I tried a Texas board, but I decided to change and try STM. But it's very difficult to develop a project without an official tutorial explaining, for example, "A simple socket example in CubeMX using RTOS and LwIP". You have to read a lot of documentation that doesn't reference the real versions of the packages used, for example CMSIS, LwIP, etc. So it's hard, but I'll try.

Thanks, and if you have any suggestion for preparing a simple project configuration based on F429ZI or F767ZI to use RTOS and sockets and could send it to me, that would be great. My email is geraldo@mlink.com.br. I'll search the links and open a new thread for F4 or F7.

Below is the code that I've tested without success. The generated code calls MX_LWIP_Init(), so I decided to adapt the code from the LwIP_HTTP_Server_Netconn_RTOS example. But it didn't run. It compiles but the IP is not assigned.

/* USER CODE END 4 */

/* USER CODE BEGIN Header_StartDefaultTask */
/**
  * @brief  Function implementing the defaultTask thread.
  * @param  argument: Not used
  * @retval None
  */
/* USER CODE END Header_StartDefaultTask */
void StartDefaultTask(void const * argument)
{
  /* init code for LWIP */
  // MX_LWIP_Init();
  /* USER CODE BEGIN 5 */

  /* Create tcp_ip stack thread */
  tcpip_init(NULL, NULL);

  /* Initialize the LwIP stack */
  Netif_Config();

  /* Notify user about the network interface config */
  User_notification(&gnetif);

#ifdef USE_DHCP
  /* Start DHCPClient */
  osThreadDef(DHCP, DHCP_thread, osPriorityBelowNormal, 0, configMINIMAL_STACK_SIZE * 2);
  osThreadCreate (osThread(DHCP), &gnetif);
#endif

  /* Infinite loop */
  for(;;)
  {
    /* Delete the Init Thread */
    osThreadTerminate(NULL);
  }
  /* USER CODE END 5 */
}

Thanks,

Geraldo.

elmood · ‎2020-10-31

I found this thread based on the thread: "How to make Ethernet and lwIP working on STM32" in which the OP frustratingly points out a lot of defects in the STM32 code but refuses to post any actual code... what the actual heck? So thank you alister for making the world a better place! Does the code posted here by alister cover all the topics mentioned in the other thread?

I'm working on the STM32F746 with CubeMX 5.6.1 and STM32Cube MCU package 1.16.0. I built working ethernet code with a simple http server and DHCP. I'm planning to try patching code based on the examples here, but just wondering the best way to do that since I know this is designed for the H7. Are there any tips for doing this successfully?

Also, I see references to "iperf2" but have never used this before... I'd like to test the performance and also measure the CPU overhead. If anyone has ideas on how to set this up with the stock firmware and LwIP it would be very helpful! I want to have a solid base on which to build Ethernet-based projects and just want to make sure I'm not running into any issues later. I'll gladly publish my base code for the STM32F746ZG Nucleo-144 dev board if I can get some help to patch the necessary bits.

alister · ‎2020-10-31

>thank you alister for making the world a better place

Thank you elmood.

>what the actual heck?

You've more than you've paid for. He's shared a lot of clues.

>"iperf2"

Cube generates it at Middlewares\Third_Party\LwIP\src\apps\lwiperf\lwiperf.c. lwIP v2.0.3 supports TCP server only. V2.1.2 supports TCP client and server. There’s UDP support at https://savannah.nongnu.org/patch/?9751. Its tool can be found at https://sourceforge.net/projects/iperf2/. iperf-2.0.9-win64 works with lwIP v2.0.3, or iperf-2.0.14a-win.exe with -C (compatibility).

>Does the code posted here by alister cover all the topics mentioned in the other thread?

It's on my todo list to check a potential lockup when an MMC count reaches 2^31, mentioned at https://community.st.com/s/question/0D50X0000AIdSc0SQF/unwanted-interrupts-for-ethernetmac-mmc.

The code works as is. It must have an MPU region for the DMA descriptors. You could code it to work without one, but I think that's a poor choice and performance wouldn't improve. It's a poor choice because there's a principle in software that bugs are proportional to lines of code, and that doesn't mean skimp on comments. It means you want a job done once and used many times.

>STM32F746

Similar, but different. Please create a new thread for it.

MBigy.1 · ‎2020-11-09

(edit: sorry I have answered in the wrong position, and I cannot delete this. I have moved my answer below!)

MBigy.1 · ‎2020-11-09

Hi @ASar, I tried to integrate @alister's bugfixes on H7_FW V1.8, but with the TCP_WND of 11680 bytes in lwipopts.h, as you used, I can't get over 55 Mbps RX (i.e. very far from the 90+ Mbps you are achieving).

I can get about 90 Mbps only if TCP_WND is over 20kB (e.g. 23360). However, I get the same speed on the ST stock FW 1.8 (i.e. without bugfixes), with that same buffer size.

I'm using lwiperf.c (which is present in the lwIP 2.1.2 package) and FreeRTOS 10.2.1. On the PC side I use iperf 2.0.9. The project runs on a Nucleo board with an STM32H743, with the standard LAN8742A, though the PHY should make (almost) no difference. The project does nothing else apart from initialization and starting the lwiperf server.

What configuration / setup do you use to achieve your speed?

Thanks!

UPDATEs:

Using the -P option on iperf (number of parallel client threads to run) I do indeed get the maximum 94 Mb/s as the sum when I use 2 or more threads (with the 11.68k TCP_WND).

Still I don't understand why a single connection cannot reach the whole bandwidth.

The IDLE task running time is larger than 85% during the test, i.e. no CPU starvation. The task running time is reset after getting the task stats, using the function shown here: https://forums.freertos.org/t/how-to-reset-vtaskgetruntimestats/7861/3.
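On the single-connection question, one back-of-envelope check (a possible explanation under stated assumptions, not a diagnosis) is that one TCP connection can have at most one receive window in flight per round trip, so throughput is bounded by TCP_WND / RTT. The sketch below only does that arithmetic with the figures quoted in this post. The function names are made up, and the "RTT" here is an effective value inferred from the measurement (it would include ACK and stack latency), not something that was measured.

```c
/*
 * Upper bound on single-connection TCP throughput in Mbit/s:
 * at most one window per round trip, capped at the line rate.
 */
double tcp_window_limit_mbps(double wnd_bytes, double rtt_ms, double line_mbps)
{
    double mbps = (wnd_bytes * 8.0) / (rtt_ms * 1000.0);
    return mbps < line_mbps ? mbps : line_mbps;
}

/* Effective round-trip time (ms) implied by a window and an observed rate. */
double implied_rtt_ms(double wnd_bytes, double observed_mbps)
{
    return (wnd_bytes * 8.0) / (observed_mbps * 1000.0);
}
```

With TCP_WND = 11680 bytes and 55 Mbps observed, implied_rtt_ms() gives roughly 1.7 ms. At that same effective round trip, a 23360-byte window, or two parallel connections of 11680 bytes each, would be bounded at about 110 Mbps, which is above the 94 Mbps line rate. That matches the pattern reported here, but it does not show where the 1.7 ms comes from.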