2022-09-17 03:20 AM - edited 2023-08-07 07:49 PM
This bug is present in ethernetif.c files for lwIP generated by CubeMX for the newer reworked ETH drivers.
Problem
When using CMSIS-RTOSv2, CubeMX generates the following code in the function low_level_output():
pbuf_ref(p);
HAL_ETH_Transmit_IT(&heth, &TxConfig);
while(osSemaphoreAcquire(TxPktSemaphore, TIME_WAITING_FOR_INPUT)!=osOK)
{
}
HAL_ETH_ReleaseTxPacket(&heth);
The TxPktSemaphore is initialized with both maximum and (wrongly) initial count of 1. Because the semaphore count is already 1, the osSemaphoreAcquire() call returns immediately and the HAL_ETH_ReleaseTxPacket() can release the packet before the transmission has actually been completed. While the Tx complete interrupt happens after the osSemaphoreAcquire() call, the code continues running in this crippled mode, corrupting packets and driver/stack state.
When the Tx complete interrupt happens before the osSemaphoreAcquire() call, the osSemaphoreRelease() call in the Tx complete interrupt tries to increase the semaphore count, but fails to do it because the count already is at the maximum. At this moment the code actually corrects the wrongly initialized semaphore count and the osSemaphoreAcquire() call starts to actually wait for the Tx complete interrupt. That means the packets will not be corrupted further, but the driver/stack state can potentially already be damaged and is not guaranteed to recover.
After the semaphore count is corrected or if CMSIS-RTOSv1 is used, another issue can happen. The HAL_ETH_Transmit_IT() function can return an error if, for example, there are not enough descriptors available at the moment. In this case the transmission will not be started at all and the osSemaphoreAcquire() call will wait forever. This means that the function low_level_output() and therefore the whole lwIP core is deadlocked.
In addition, if the low_level_output() function is designed to block until the packet transmission is complete, there is absolutely no point in using the pbuf_ref() and the respective pbuf_free() in HAL_ETH_TxFreeCallback() function. Incrementing the PBUF reference count is necessary for a design, where the low_level_output() function doesn't block and the PBUF is released asynchronously.
The RxPktSemaphore has exactly the same wrong semaphore initial count, but, apart from a single additional useless loop iteration in the ethernetif_input() function, the bug has no critical consequences.
Solution
First, initialize the semaphores with a count of 0 in function low_level_init(). Therefore this CubeMX script in ethernetif_***.ftl files:
[#if cmsis_version = "v2"]
RxPktSemaphore = osSemaphoreNew(1, 1, NULL);
TxPktSemaphore = osSemaphoreNew(1, 1, NULL);
[#else][#-- else cmsis_version --]
RxPktSemaphore = xSemaphoreCreateBinary();
TxPktSemaphore = xSemaphoreCreateBinary();
[/#if][#-- endif cmsis_version --]
Must be changed to this:
[#if cmsis_version = "v1"]
RxPktSemaphore = osSemaphoreCreate(osSemaphore(RxSem), 1);
TxPktSemaphore = osSemaphoreCreate(osSemaphore(TxSem), 1);
/* Decrease the semaphores' initial count from 1 to 0 */
osSemaphoreWait(RxPktSemaphore, 0);
osSemaphoreWait(TxPktSemaphore, 0);
[#else][#-- else cmsis_version --]
RxPktSemaphore = osSemaphoreNew(1, 0, NULL);
TxPktSemaphore = osSemaphoreNew(1, 0, NULL);
[/#if][#-- endif cmsis_version --]
In addition to fixing the bugs, I've made other improvements. The CMSIS-RTOSv1 code was using FreeRTOS API, which was functionally correct but inconsistent. The code examples are also using the same inconsistent mix of those two APIs. Also I swapped the order of [if-else] blocks for the CubeMX script to a more logical one and consistent with other parts of the script. I would recommend ST to keep the same order everywhere, which is not the case currently.
Second, the return value of the HAL_ETH_Transmit_IT() function must be checked and, if something other than HAL_OK was returned, the osSemaphoreAcquire() and HAL_ETH_ReleaseTxPacket() calls must be skipped and instead the pbuf_free() must be called to revert the increment done by the earlier pbuf_ref() call, if it is used.
pbuf_ref(p);
if (HAL_ETH_Transmit_IT(&heth, &TxConfig) == HAL_OK) {
while(osSemaphoreAcquire(TxPktSemaphore, TIME_WAITING_FOR_INPUT)!=osOK)
{
}
HAL_ETH_ReleaseTxPacket(&heth);
} else {
pbuf_free(p);
}
Additional information
Although with incomplete description of consequences and their severity, this bug and a correct solution has also been reported there:
https://github.com/STMicroelectronics/STM32CubeF4/issues/127
And the fact of a deadlock happening is also reported there:
https://github.com/STMicroelectronics/STM32CubeH7/issues/224
2023-05-02 02:13 PM
Here is a patch file for ethernetif.c
diff --git "a/LWIP/Target/ethernetif.c" "b/LWIP/Target/ethernetif.c"
index 88f5e6e..c689082 100644
--- "a/LWIP/Target/ethernetif.c"
+++ "b/LWIP/Target/ethernetif.c"
@@ -3,11 +3,11 @@
******************************************************************************
* File Name : ethernetif.c
* Description : This file provides code for the configuration
- * of the Target/ethernetif.c MiddleWare.
+ * of the ethernetif.c MiddleWare.
******************************************************************************
* @attention
*
- * Copyright (c) 2022 STMicroelectronics.
+ * Copyright (c) 2023 STMicroelectronics.
* All rights reserved.
*
* This software is licensed under terms that can be found in the LICENSE file
@@ -33,7 +33,7 @@
/* Within 'USER CODE' section, code will be kept by default at each generation */
/* USER CODE BEGIN 0 */
-
+#include "App.h"
/* USER CODE END 0 */
/* Private define ------------------------------------------------------------*/
@@ -102,7 +102,18 @@ ETH_DMADescTypeDef DMARxDscrTab[ETH_RX_DESC_CNT]; /* Ethernet Rx DMA Descriptor
ETH_DMADescTypeDef DMATxDscrTab[ETH_TX_DESC_CNT]; /* Ethernet Tx DMA Descriptors */
/* USER CODE BEGIN 2 */
-
+CCMRAM StaticSemaphore_t RxSemCB;
+osSemaphoreAttr_t RxAttr = {
+ .name = "RxSemaphore",
+ .cb_mem = &RxSemCB,
+ .cb_size = sizeof(RxSemCB),
+};
+CCMRAM StaticSemaphore_t TxSemCB;
+osSemaphoreAttr_t TxAttr = {
+ .name = "TxSemaphore",
+ .cb_mem = &TxSemCB,
+ .cb_size = sizeof(TxSemCB),
+};
/* USER CODE END 2 */
osSemaphoreId RxPktSemaphore = NULL; /* Semaphore to signal incoming packets */
@@ -244,10 +255,10 @@ static void low_level_init(struct netif *netif)
#endif /* LWIP_ARP */
/* create a binary semaphore used for informing ethernetif of frame reception */
- RxPktSemaphore = osSemaphoreNew(1, 0, NULL);
+ RxPktSemaphore = osSemaphoreNew(1, 0, &RxAttr);//StaticSemaphore_t
/* create a binary semaphore used for informing ethernetif of frame transmission */
- TxPktSemaphore = osSemaphoreNew(1, 0, NULL);
+ TxPktSemaphore = osSemaphoreNew(1, 0, &TxAttr);
/* create the task that handles the ETH_MAC */
/* USER CODE BEGIN OS_THREAD_NEW_CMSIS_RTOS_V2 */
@@ -382,7 +393,10 @@ static err_t low_level_output(struct netif *netif, struct pbuf *p)
pbuf_ref(p);
+ __DSB();__ISB();
HAL_ETH_Transmit_IT(&heth, &TxConfig);
+ __DSB();__ISB();
+
while(osSemaphoreAcquire(TxPktSemaphore, TIME_WAITING_FOR_INPUT)!=osOK)
{
2023-05-06 06:51 AM
Moving some data to TCM/CCM changes the speed of execution and therefore can indirectly hide or reveal some bugs, but it cannot fix those. The lines 65 and 67 are unnecessary. Instead the code must check the return status of the HAL_ETH_Transmit_IT() function as reported in the main post of this topic.
Also, as reported in another topic, the rewritten HAL ETH driver for F series has a bunch of flaws, including a broken internal ETH_Prepare_Tx_Descriptors() function, which is an absolutely critical issue.
2023-05-06 07:59 AM
OK, let me summarise this topic, for those with an IQ below 200.
New driver released.
Full of bugs.
Lots of bugs listed by description (always leaving out key details)
No actual fixed source published anywhere.
If I am wrong, for the 32F417, I will eat my board...
2023-06-09 05:50 AM
I'm not sure if I'm correct here.
I use STM32F407 with freertos and lwip. Try to setup an UDP server example. Init is ok but with the first answer from the server code blocks at mem_free called from HAL_ETH_ReleaseTxPacket. The client receives the correct message but afterwards the function HAL_ETH_ReleaseTxPacket is called a second time and then I got a "LWIP_MEM_ILLEGAL_FREE("mem_free: illegal memory: double free");" because mem->used is 0.
It is the same if I use CMSYS V1 or V2 and also with initializing the semaphores to 0
2023-09-24 10:14 PM - edited 2023-09-24 10:31 PM
Package: H7, V1.11.1
uC: STM32H725IGKX
I'm running into the same bug as described by this topic using Freemodbus: the transmission is getting stuck after a sending data for a few minutes.
After implementing the suggested fix the microcontroller it is crashing on some internal FreeRTOS queue configASSERT(...) - again after running a few minutes.
So this bug still persists and isn't fixed
Stacktrace:
xQueueSemaphoreTake() at queue.c:1,433 0x80114d6
osSemaphoreAcquire() at cmsis_os2.c:1,609 0x80104da
low_level_output() at ethernetif.c:412 0x800f884
ethernet_output() at ethernet.c:312 0x801e8d0
etharp_output_to_arp_index() at etharp.c:769 0x801cb94
etharp_output() at etharp.c:868 0x801ccf2
ip4_output_if_src() at ip4.c:1,007 0x801da22
ip4_output_if() at ip4.c:818 0x801d8de
tcp_output_segment() at tcp_out.c:1,607 0x801b37c
tcp_output() at tcp_out.c:1,360 0x801afda
xMBTCPPortSendResponse() at porttcp.c:313 0x801fa4a
eMBTCPSend() at mbtcp.c:150 0x801f56c
eMBPoll() at mb.c:402 0x801f42c
modbus_task() at modbus.c:59 0x80018f4
pxPortInitialiseStack() at port.c:214 0x80132dc
xQueueSemaphoreTake() at queue.c:1,433 0x80114d6:
BaseType_t xQueueSemaphoreTake( QueueHandle_t xQueue, TickType_t xTicksToWait )
{
BaseType_t xEntryTimeSet = pdFALSE;
TimeOut_t xTimeOut;
Queue_t * const pxQueue = xQueue;
#if( configUSE_MUTEXES == 1 )
BaseType_t xInheritanceOccurred = pdFALSE;
#endif
/* Check the queue pointer is not NULL. */
configASSERT( ( pxQueue ) );
/* Check this really is a semaphore, in which case the item size will be
0. */
configASSERT( pxQueue->uxItemSize == 0 ); <<-- !!!!! This is line 433
Last stuck position before going into FreeRTOS calls:
2023-09-27 09:07 AM
I'm trying to implement this fix/improvement even though I don't face any problems but I'm having some compilation errors.
I'm compiling the code using F4 1.27.1 HAL Firmware Package, using CMSISv1 but I'm getting the following error:
src/ethernetif.c:242:40: note: in expansion of macro 'osSemaphore'
242 | RxPktSemaphore = osSemaphoreCreate(osSemaphore(RxSem), 1);
| ^~~~~~~~~~~
lib/FreeRTOS/include/cmsis_os.h:665:2: error: 'os_semaphore_def_TxSem' undeclared (first use in this function); did you mean 'os_semaphore_def'?
665 | &os_semaphore_def_##name
I replace directly in the code because I'm using VSC+PlatformIO instead. The code I have is:
RxPktSemaphore = osSemaphoreCreate(osSemaphore(RxSem), 1);
TxPktSemaphore = osSemaphoreCreate(osSemaphore(TxSem), 1);
osSemaphoreWait(RxPktSemaphore, 0);
osSemaphoreWait(TxPktSemaphore, 0);
Where the RxSem and TxSem objects come from?
Thanks!
2023-09-27 09:15 AM
I believe I forgot to put definitions:
osSemaphoreDef(RxSem);
osSemaphoreDef(TxSem);
RxPktSemaphore = osSemaphoreCreate(osSemaphore(RxSem), 1);
TxPktSemaphore = osSemaphoreCreate(osSemaphore(TxSem), 1);
osSemaphoreWait(RxPktSemaphore, 0);
osSemaphoreWait(TxPktSemaphore, 0);
2024-07-20 03:36 AM
Old thread, yet i was analyzing this and asked myself whether calling HAL_ETH_Transmit_IT() in ethernetif.c with the stack variable TxConfig is the right thing to do. I remember from using HAL_UART_Transmit_IT() that we can't do it. Those interrupt procedures are asynchronous. And with some ethernet traffic and rapid task switching i thought there may be another problem.
It might work though as the calling thread blocks anyway when waiting for the semaphore. In order to not block one has to find a better place for TxConfig.
Regards, Dieter