
STM32CubeMX and STM32F4xx still low TCP/IP transmit performance

PZhan.11
Associate II

Hi all,

I am using an STM32F407ZG and an STM32F429IG to build two similar control boards. The main function is as follows: a host sends a large amount of data to the STM32, about 1 MB (STM32F407 with FSMC/SRAM) or 32 MB (STM32F429 with FMC/SDRAM), and it is stored in memory. Then, after a control command, the STM32 drives GPIOs to control another board. The problem: both STM32F4 MCUs are advertised as supporting up to 100 Mbps Ethernet, but I only get about 100 kbps.

The base code is generated with STM32CubeMX, with FreeRTOS (CMSIS_v2 API), LwIP 2.1.2 and mbedTLS (used for encryption) enabled. The firmware package is STM32Cube_FW_F4_V1.25.2 and the PHY is a LAN8720A. I configured three tasks:

1. the default task, with a 1024-word stack, for TCP handling

2. a low-priority idle task that broadcasts the MCU's IP address and port (the IP is assigned by DHCP) over UDP; it blocks on a binary semaphore once a TCP connection is accepted.

3. a real-time task for GPIO output, downgraded to normal priority for testing (currently an empty task that just calls osDelay)

I have Googled around, and most tips are about send throughput; receive throughput seems to be discussed much less.

[Low TCP/IP transmit perormance with STM32CubeMX and STM32F429ZG](https://community.st.com/s/question/0D50X00009XkZaV/low-tcpip-transmit-perormance-with-stm32cubemx-and-stm32f429zg)

talks about memcpy performance, but in my test scenario I do not even use FSMC/FMC memory. I did, however, change the memory map:

1. all code and RW/ZI data are moved into CCM for better performance

2. ethernetif, lwip, mem and memp are kept in SRAM1 (starting at 0x20000000) so they are DMA-accessible

3. SRAM2 (starting at 0x2001C000) is left for other purposes.
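Item 2 matters because not every RAM is reachable by the Ethernet DMA. As a hypothetical helper (not part of the HAL or lwIP), the check below uses the STM32F407 addresses: SRAM1 is 112 KB at 0x20000000, SRAM2 is 16 KB at 0x2001C000, and the 64 KB CCM data RAM at 0x10000000 is connected only to the CPU data bus, so the DMA cannot use it. The F429's additional SRAM3 and external FSMC/FMC memory are deliberately left out of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* STM32F407 on-chip RAM map. The macro names are hypothetical, chosen so
 * they do not clash with the CMSIS device header. SRAM3 (F429) is not modelled. */
#define F407_CCM_START   0x10000000u
#define F407_CCM_SIZE    (64u * 1024u)
#define F407_SRAM1_START 0x20000000u
#define F407_SRAM1_SIZE  (112u * 1024u)
#define F407_SRAM2_START 0x2001C000u
#define F407_SRAM2_SIZE  (16u * 1024u)

/* SRAM1 and SRAM2 are contiguous, so a single range check covers both. */
static_assert(F407_SRAM1_START + F407_SRAM1_SIZE == F407_SRAM2_START,
              "SRAM2 should start right after SRAM1");

/* True if [addr, addr + len) lies entirely inside [start, start + size). */
static bool range_inside(uint32_t addr, uint32_t len, uint32_t start, uint32_t size)
{
    return len > 0u && addr >= start &&
           (uint64_t)addr + len <= (uint64_t)start + size;
}

/* Hypothetical check for a DMA descriptor or frame buffer:
 * the Ethernet DMA can reach SRAM1/SRAM2, but not CCM. */
static bool eth_dma_reachable(uint32_t addr, uint32_t len)
{
    return range_inside(addr, len, F407_SRAM1_START,
                        F407_SRAM1_SIZE + F407_SRAM2_SIZE);
}

/* True if the whole range lies in the CCM data RAM (CPU-only memory). */
static bool in_ccm(uint32_t addr, uint32_t len)
{
    return range_inside(addr, len, F407_CCM_START, F407_CCM_SIZE);
}
```

In firmware such a check could run once at start-up as an assertion on the addresses of the DMA descriptor and buffer arrays, casting each address to `uint32_t` on the 32-bit target.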

I built a request/reply server on the MCU with the netconn API. When the payload is smaller than TCP_MSS it is fast enough, but when I send a 32 KB data block it slows down.
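A request larger than TCP_MSS arrives in several segments, so a request/reply server has to accumulate them before it can reply: with TCP_MSS = 1460, a 32 KB block needs at least 23 segments. With netconn this typically means looping on `netconn_recv()` and walking each netbuf with `netbuf_data()`/`netbuf_next()`. The sketch below shows only the accumulation logic in portable C; the 4-byte big-endian length prefix, the 32 KB limit and all names are assumptions for illustration, not the poster's actual protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed limit: one 32 KB data block per request. */
#define REQ_MAX_PAYLOAD (32u * 1024u)

/* Hypothetical framing: 4-byte big-endian payload length, then the payload. */
typedef struct {
    uint8_t  hdr[4];
    size_t   hdr_got;
    uint32_t payload_len;
    size_t   payload_got;
    uint8_t  payload[REQ_MAX_PAYLOAD];
} req_accum;

typedef enum { REQ_NEED_MORE, REQ_COMPLETE, REQ_TOO_BIG } req_status;

static void req_reset(req_accum *a)
{
    a->hdr_got = 0;
    a->payload_len = 0;
    a->payload_got = 0;
}

/* Feed one received chunk (for example the data of one netbuf).
 * *used reports how many bytes were consumed, so bytes that belong to
 * the next request are left for the caller. */
static req_status req_feed(req_accum *a, const uint8_t *data, size_t len,
                           size_t *used)
{
    assert(a != NULL && data != NULL && used != NULL);
    size_t i = 0;
    while (a->hdr_got < sizeof a->hdr && i < len)
        a->hdr[a->hdr_got++] = data[i++];
    if (a->hdr_got < sizeof a->hdr) {
        *used = i;
        return REQ_NEED_MORE;
    }
    a->payload_len = ((uint32_t)a->hdr[0] << 24) | ((uint32_t)a->hdr[1] << 16) |
                     ((uint32_t)a->hdr[2] << 8) | (uint32_t)a->hdr[3];
    if (a->payload_len > REQ_MAX_PAYLOAD) {
        *used = i;
        return REQ_TOO_BIG;
    }
    size_t want = a->payload_len - a->payload_got;
    size_t take = (len - i < want) ? len - i : want;
    memcpy(a->payload + a->payload_got, data + i, take);
    a->payload_got += take;
    *used = i + take;
    return (a->payload_got == a->payload_len) ? REQ_COMPLETE : REQ_NEED_MORE;
}
```

Chunk boundaries can fall anywhere, even inside the length prefix, which is why the header bytes are collected one at a time rather than read in one go.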

I have increased TCP_MSS to 1460 and TCP_WND to 11*TCP_MSS. Nothing else I have tried has helped.
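The window setting can be sanity-checked with two pieces of arithmetic. A single TCP connection can have at most one receive window in flight per round trip, so throughput is bounded by TCP_WND * 8 / RTT. Separately, lwIP's compile-time sanity checks in init.c (unless LWIP_DISABLE_TCP_SANITY_CHECKS is set) expect TCP_WND to fit in PBUF_POOL_SIZE * (PBUF_POOL_BUFSIZE - protocol headers). The sketch below models both in plain C; the 54-byte header total (Ethernet 14 + IPv4 20 + TCP 20, no padding or options) and the pool values used with it are assumptions, not the poster's CubeMX settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed per-frame header overhead: plain Ethernet + IPv4 + TCP,
 * no padding and no options. */
#define HDR_ETH  14u
#define HDR_IPV4 20u
#define HDR_TCP  20u

/* At most one receive window can be in flight per round trip, so this is an
 * upper bound on single-connection throughput, in bits per second. */
static double tcp_max_bps(uint32_t wnd_bytes, double rtt_s)
{
    assert(rtt_s > 0.0);
    return (double)wnd_bytes * 8.0 / rtt_s;
}

/* Same idea as lwIP's TCP sanity check: can the pbuf pool hold a full
 * window once protocol headers are subtracted from each pool buffer? */
static bool pool_covers_window(uint32_t tcp_wnd, uint32_t pool_size,
                               uint32_t pool_bufsize)
{
    const uint32_t hdrs = HDR_ETH + HDR_IPV4 + HDR_TCP;
    if (pool_bufsize <= hdrs)
        return false;
    return (uint64_t)pool_size * (pool_bufsize - hdrs) >= tcp_wnd;
}
```

With TCP_WND = 11 * 1460 = 16060 bytes and a LAN round trip of about 1 ms, the window alone allows roughly 128 Mbps; at 100 kbps the same window would correspond to a round trip of about 1.3 s, so on a LAN the window size by itself is unlikely to be the bottleneck. With a hypothetical pool of 8 buffers of 1516 bytes, however, the pool holds only 8 * 1462 = 11696 bytes, less than one window.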

My latest experiment showed that when I add some printf calls inside low_level_input to inspect incoming packets, performance suddenly rises to 500 kbytes/s, i.e. 4 Mbps. It is incomprehensible, as if I had fallen into the quantum domain :(. I also tried sprintf and other methods to inspect incoming packets, but no luck.

So how can I get anywhere near 100 Mbps?

Accepted Solutions
MWB_CHa
ST Employee

Hello @PZhan.1,

The issue you describe is one of the known limitations of the legacy ETH HAL driver: it copies every buffer in full, which degrades performance.

The solution is indeed to use a zero-copy method to improve performance, and that is what has been done in the new ETH HAL driver, which you can find in STM32H7 CubeFW release V1.10.0 (also available on GitHub: https://github.com/STMicroelectronics/STM32CubeH7).

Please note that this ETH HAL driver update breaks compatibility and requires some rework at the application level. More details are presented in the post here.

Kind Regards,
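To make the copy versus zero-copy distinction concrete, here is a self-contained model with hypothetical names and no HAL or lwIP calls. In the copy path the CPU copies every received frame out of the DMA buffer before returning the buffer to the MAC; in the zero-copy path the stack is handed a reference to the DMA buffer itself, and the buffer goes back to the DMA ring only when the stack releases it. In lwIP that release step can be implemented with a custom pbuf (`struct pbuf_custom` and `pbuf_alloced_custom()`, with `LWIP_SUPPORT_CUSTOM_PBUF` enabled); this sketch does not claim to reproduce ST's new driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUF_COUNT 4
#define RX_BUF_SIZE  1536u

/* Hypothetical model of the MAC's receive ring. owned_by_dma[i] is true
 * while buffer i is free for the MAC to fill. */
static uint8_t rx_buf[RX_BUF_COUNT][RX_BUF_SIZE];
static size_t  rx_len[RX_BUF_COUNT];
static bool    owned_by_dma[RX_BUF_COUNT] = { true, true, true, true };
static size_t  bytes_copied; /* bytes copied by the CPU on the receive path */

/* A frame as handed to the stack; rx_index is -1 for a private copy. */
typedef struct {
    const uint8_t *payload;
    size_t len;
    int rx_index;
} rx_packet;

/* Simulates the MAC writing a frame and handing the buffer to software. */
static void dma_receive(int idx, const uint8_t *frame, size_t len)
{
    assert(owned_by_dma[idx] && len <= RX_BUF_SIZE);
    memcpy(rx_buf[idx], frame, len);
    rx_len[idx] = len;
    owned_by_dma[idx] = false;
}

/* Copy path: copy the frame out, then return the buffer to the DMA at once. */
static rx_packet rx_copy(int idx, uint8_t *dst)
{
    assert(!owned_by_dma[idx]);
    memcpy(dst, rx_buf[idx], rx_len[idx]);
    bytes_copied += rx_len[idx];
    owned_by_dma[idx] = true;
    return (rx_packet){ dst, rx_len[idx], -1 };
}

/* Zero-copy path: hand out a reference to the DMA buffer itself. */
static rx_packet rx_zero_copy(int idx)
{
    assert(!owned_by_dma[idx]);
    return (rx_packet){ rx_buf[idx], rx_len[idx], idx };
}

/* Release hook, the role a custom pbuf free function would play:
 * only now does a zero-copy buffer go back to the DMA ring. */
static void rx_release(rx_packet *p)
{
    if (p->rx_index >= 0)
        owned_by_dma[p->rx_index] = true;
    p->payload = NULL;
    p->len = 0;
    p->rx_index = -1;
}
```

The trade-off shows up in `owned_by_dma`: zero copy removes the per-frame copy, but a buffer held by the stack is unavailable to the MAC, so the ring needs enough buffers to cover what the stack keeps queued (for example, up to one TCP window).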

2 REPLIES

Hello @PZhan.1,

Your question has been raised internally. I will keep you posted.

Thanks for your contribution.

BeST Regards,

Walid
