TCP RSTs, LWIP OS mode and TCPIP core locking

Manu Abraham · 2020-09-16

Hi Folks,

I have spent quite a while trying to trace TCP RSTs occurring on the STM32H7, specifically the H743.

I raised the issue on the individual mailing lists; people there are quite frustrated with ST.

I could sense a lot of frustration from different people, and some made odd comments. :(

My setup is CubeH7 1.8.0 with FreeRTOS and LWIP.

In a private email, one of the LWIP folks pointed me to:

https://www.nongnu.org/lwip/2_0_x/pitfalls.html

https://www.nongnu.org/lwip/2_0_x/group__lwip__os.html

and

https://www.nongnu.org/lwip/2_0_x/group__lwip__opts__lock.html#ga8e46232794349c209e8ed4e9e7e4f011

before I can even start debugging where the problem originates.

The whole idea is to use NO_SYS 0 and enable LWIP_TCPIP_CORE_LOCKING.

This led to a whole lot of other issues.

Looking into the ST port, LWIP_TCPIP_CORE_LOCKING does not seem to be ported?

Any suggestions, folks?

Thanks

Manu

Piranha · 2020-09-16

You already know my topic... Unless those problems are fixed, there is no point in "debugging" higher-layer code!

https://community.st.com/s/question/0D53W00000DHBWhSAP/lwip-and-sntp

> I eventually ended up with a thread doing sntp_send(), which calls udp_sendto(), plus a sleep() in the thread.

> It appears to be working. There's no locking in there at the moment, but the pcb isn't shared, so I guess it's probably not an issue there.

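That's not how it works, and it's still broken. lwIP has an internal core thread running, which calls RAW API functions. Other threads are not allowed to call RAW API functions without using core locking or another safe technique. Not sharing the pcb doesn't help, because udp_sendto() also goes through core state that the core thread uses, such as the netif and the ARP table.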

And my topic already has links to the Multithreading and Common pitfalls documentation. An excerpt from it:

> As such, the list of functions that may be called from other threads or an ISR is very limited! Only functions from these API header files are thread-safe: ...

What exactly is not clear in these two sentences?

https://community.st.com/s/question/0D53W00000DJTq3SAH/stm32h7-lwip-and-multi-threading-done-right-

As you can see, the whole Ethernet/lwIP implementation is totally broken and ST doesn't care...

> Can you please point me to at least one properly implemented LWIP sample application code base?

Here is an example of a correct usage of RAW API:

http://lwip.100.n7.nabble.com/UDP-and-Raw-API-lwip-running-with-RTOS-td33221.html

> Do the problems that you raised in the past exist in the CubeH7 1.7.0 package as well?

ST's HAL/CubeMX developers do not understand interrupt-safe and thread-safe code, asynchronous code or non-trivial problem reports; they do not read documentation, and their development process is ridiculously slow. They have not been able to solve those problems since the beginning of STM32 in 2007. Believing that a miracle will happen and they will solve them is totally naive.

PHolt.1 · 2022-07-11

I have been digging into this, and I am using exactly the same scenario, i.e.:

> The whole idea is to use NO_SYS 0 and enable LWIP_TCPIP_CORE_LOCKING.

It looks like you have to implement some macros or functions that create mutexes.

FreeRTOS can supply mutexes, and these work very well.

It looks like you have to implement these:

#define LWIP_MEM_FREE_PROTECT() sys_mutex_lock(&mem_mutex)
#define LWIP_MEM_FREE_UNPROTECT() sys_mutex_unlock(&mem_mutex)

There appear to be example implementations in sys_arch.c, under the Mutexes section.

I find all this very confusing because, e.g., here

https://www.nongnu.org/lwip/2_0_x/pitfalls.html

it states:

> In OS mode, Callback-style APIs AND Sequential-style APIs can be used. Sequential-style APIs are designed to be called from threads other than the TCPIP thread, so there is nothing to consider here.

The implication is that these do not need mutex protection. And indeed, stepping through code such as

fd = socket(AF_INET, SOCK_DGRAM, 0);

shows that some form of locking is used, deep down.

It calls conn = (struct netconn *)memp_malloc(MEMP_NETCONN);

which does SYS_ARCH_PROTECT(old_level);

but I can't see that defined anywhere, so that is probably what needs to be done if you are setting LWIP_TCPIP_CORE_LOCKING to 1.

What I can't find anywhere is whether the "sequential" functions are thread-safe with or without LWIP_TCPIP_CORE_LOCKING.

And where does SYS_ARCH_PROTECT fit into all this, since it seems unrelated to LWIP_TCPIP_CORE_LOCKING?

PHolt.1 · 2022-07-15

I did solve the LWIP thread-safety issue (with LWIP_TCPIP_CORE_LOCKING = 1).

Details here:

https://www.eevblog.com/forum/microcontrollers/any-stm-32f4-eth-lwip-freertos-mbedtls-experts-here-(not-free-advice)/msg4298761/#msg4298761

Piranha · 2022-07-23

The latest lwIP documentation has more information, and that link is already present in my first post in this topic and in my list of Ethernet/lwIP issues:

https://www.nongnu.org/lwip/2_1_x/multithreading.html

The Socket API is built on top of the Netconn API and netifapi_***() calls, and those use tcpip_send_msg_wait_sem() and tcpip_api_call() underneath. As the code shows, those functions are thread-safe both with and without core locking, but because core locking is more efficient, it is the default and recommended mode. I also enable LWIP_TCPIP_CORE_LOCKING_INPUT and set the network input thread to a higher priority than the lwIP core thread. Note that the netifapi_***() functions are thread-safe versions of netif_***() calls, though not every netif_***() call has one.
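Collected in one place, the configuration described above might look like the following lwipopts.h fragment. It is a sketch: the option names are lwIP's own, SYS_LIGHTWEIGHT_PROT is added because it is the option that enables SYS_ARCH_PROTECT, and the priority value is a placeholder to adapt to the application's RTOS priority scheme.

```c
/* lwipopts.h fragment (sketch): OS mode with core locking. */
#define NO_SYS                         0  /* run lwIP with an RTOS */
#define LWIP_TCPIP_CORE_LOCKING        1  /* API calls take the core lock instead of messaging the tcpip thread */
#define LWIP_TCPIP_CORE_LOCKING_INPUT  1  /* tcpip_input() processes packets under the core lock in the caller's thread */
#define SYS_LIGHTWEIGHT_PROT           1  /* enable SYS_ARCH_PROTECT() critical sections */
#define TCPIP_THREAD_PRIO              3  /* placeholder: lwIP core thread priority */

/* The Ethernet input thread is created by the port (e.g. ethernetif.c),
 * not by lwIP, so its priority (higher than TCPIP_THREAD_PRIO) is set there. */
```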

LWIP_TCPIP_CORE_LOCKING provides "slow" protection for thread safety, which is relevant only for NO_SYS=0 (with an RTOS). SYS_ARCH_PROTECT provides "fast" protection for memory allocation and other critical sections, which can also be relevant for NO_SYS=1 (no RTOS, or lwIP called from only a single thread). While technically it can also be implemented as a mutex, it is typically implemented as a global interrupt disable. That's why it's called "arch": it is specific to the CPU architecture. The best option for FreeRTOS on Cortex-M is to implement it like this in the sys_arch.h file:

#define SYS_ARCH_DECL_PROTECT(lev)    UBaseType_t lev
#define SYS_ARCH_PROTECT(lev)         lev = taskENTER_CRITICAL_FROM_ISR()
#define SYS_ARCH_UNPROTECT(lev)       taskEXIT_CRITICAL_FROM_ISR(lev)

Note that on Cortex-M this code works correctly in both interrupt and thread contexts.

PHolt.1 · 2022-07-23

The issue with disabling interrupts is that it is likely to create large ISR latencies. Judging by the amount of code involved, these could be tens or hundreds of microseconds.

As I just posted in the other thread, the solution needs two more mutexes, but they cannot use the same macros as the ones used for the API.

https://community.st.com/s/question/0D50X0000BOtUflSQF/bug-stm32-lwip-ethernet-driver-rx-deadlock

I did find text indicating that LWIP is already thread-safe (for the socket and netconn APIs, not for the raw API) even without LWIP_TCPIP_CORE_LOCKING=1, but it was very ambiguous. Thanks for the clarification.

FWIW, I found that mutexing the two low-level functions or using CORE_LOCKING has no measurable effect on ETH performance. A FreeRTOS mutex costs only a microsecond or two (on a 168 MHz 32F417).

Right now I am trying to get more detail on the __DMB fix. For example, the code here

https://github.com/STMicroelectronics/STM32CubeF4/blob/4aba24d78fef03d797a82b258f37dbc84728bbb5/Projects/STM32F429ZI-Nucleo/Applications/LwIP/LwIP_HTTP_Server_Netconn_RTOS/Src/ethernetif.c#L235

doesn't have it, and that is close to what I am using.

Piranha · 2022-07-24

SYS_ARCH_PROTECT does not lock for a long time. It's mostly used for short, deterministic sections. One exception I see is mem_malloc(), but it implements temporary unlocking here and here; a comment in the latter even specifically says "prevent high interrupt latency...". Another is memp_overflow_check_all(), but that is basically only for extreme-level debugging, and you have been warned. To me it seems that code without that overflow checking should not lock for more than a few microseconds, and mostly stays in the nanosecond range.