2019-08-26 08:45 AM
Hello,
I've finally managed to debug a nasty situation where my SSL client task suddenly "hangs" and takes no further action.
It turns out that the task hangs while taking a semaphore with an infinite timeout inside the LwIP code.
#1
I was quite surprised to see that the semaphore is taken with an infinite timeout. That means the task will never resume and never learn that something is wrong (it is basically trying to take a semaphore in order to send a message through LwIP). Shouldn't such code always be written in a non-blocking manner, so that it returns within some finite time if the semaphore is not available?
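To illustrate the kind of bounded wait meant here, a minimal sketch in portable C. It is not LwIP code: `timed_acquire_fn` is a hypothetical stand-in for a timed RTOS call such as osSemaphoreAcquire(sem, timeout), and `bounded_sem_wait`, `fake_sem_t` and `fake_acquire` are names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a timed RTOS acquire: returns true if the
 * semaphore was obtained within timeout_ms. */
typedef bool (*timed_acquire_fn)(void *sem, uint32_t timeout_ms);

typedef enum { WAIT_OK = 0, WAIT_TIMEOUT = -1 } wait_result_t;

/* Wait in slices of slice_ms and give up once budget_ms has been spent,
 * so the caller always regains control and can report an error.
 * A budget of 0 returns WAIT_TIMEOUT without trying. */
wait_result_t bounded_sem_wait(timed_acquire_fn acquire, void *sem,
                               uint32_t slice_ms, uint32_t budget_ms)
{
    uint32_t waited = 0;

    assert(acquire != NULL && slice_ms > 0);
    while (waited < budget_ms) {
        uint32_t step = budget_ms - waited;
        if (step > slice_ms) {
            step = slice_ms;
        }
        if (acquire(sem, step)) {
            return WAIT_OK;
        }
        waited += step;
    }
    return WAIT_TIMEOUT;
}

/* Test double: becomes free after calls_until_free failed attempts;
 * a negative value means it is never released. */
typedef struct { int calls_until_free; } fake_sem_t;

bool fake_acquire(void *sem, uint32_t timeout_ms)
{
    fake_sem_t *s = (fake_sem_t *)sem;

    (void)timeout_ms;
    if (s->calls_until_free < 0) {
        return false;
    }
    if (s->calls_until_free == 0) {
        return true;
    }
    s->calls_until_free--;
    return false;
}
```

With a wrapper like this, the calling task gets WAIT_TIMEOUT back after at most budget_ms and can close the connection or retry instead of staying blocked.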
#2
How is it possible that a library as well known as LwIP contains code like this?
AFAIK, blocking calls should be avoided, particularly if they can potentially block forever.
#3
Are there any other options to prevent such situations (maybe a task watchdog, supervision from another task, etc.)? If you can point me to useful info, I'd much appreciate it.
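As a rough sketch of the "supervision from another task" idea: each supervised task bumps a heartbeat counter, and a supervisor task periodically checks that every counter has moved. All names here (`task_monitor_t`, `monitor_kick`, `monitor_check`, `all_tasks_alive`) are hypothetical, and the code is plain C with no RTOS calls, so it only shows the bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t heartbeat; /* incremented by the supervised task */
    uint32_t last_seen;          /* heartbeat value at the previous check */
    uint32_t missed;             /* consecutive checks without progress */
    uint32_t max_missed;         /* checks tolerated before "stalled" */
} task_monitor_t;

void monitor_init(task_monitor_t *m, uint32_t max_missed)
{
    assert(m != NULL && max_missed > 0);
    m->heartbeat = 0;
    m->last_seen = 0;
    m->missed = 0;
    m->max_missed = max_missed;
}

/* Called by the supervised task once per loop iteration. */
void monitor_kick(task_monitor_t *m)
{
    m->heartbeat++;
}

/* Called by the supervisor at a fixed period.
 * Returns false once the task has made no progress for max_missed checks. */
bool monitor_check(task_monitor_t *m)
{
    uint32_t now = m->heartbeat;

    if (now != m->last_seen) {
        m->last_seen = now;
        m->missed = 0;
        return true;
    }
    if (m->missed < m->max_missed) {
        m->missed++;
    }
    return m->missed < m->max_missed;
}

/* Checks every monitor (no short-circuit, so all counters are updated). */
bool all_tasks_alive(task_monitor_t *mons, size_t count)
{
    bool alive = true;

    for (size_t i = 0; i < count; i++) {
        alive = monitor_check(&mons[i]) && alive;
    }
    return alive;
}
```

One possible design is for the supervisor to refresh the hardware watchdog only while all_tasks_alive() returns true, so a task stuck in an infinite wait eventually leads to a reset instead of a silent hang.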
#4
There is a setting in the LwIP code that enables/disables TCP/IP core locking: LWIP_TCPIP_CORE_LOCKING.
Does anyone have any idea what happens if I disable that setting?
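For reference, a sketch of where this option would sit in lwipopts.h. The values are illustrative assumptions, not a verified configuration. LWIP_SO_SNDTIMEO and LWIP_SO_RCVTIMEO are further lwIP options (send/receive timeouts on netconns and sockets) that may be worth evaluating; whether they cover the hang described here is not confirmed.

```c
/* lwipopts.h (fragment); values are illustrative assumptions */

/* 1: API calls run in the caller's context while holding the core lock.
 * 0: API calls are passed as messages to tcpip_thread, and the caller
 *    waits on a semaphore until the operation has completed. */
#define LWIP_TCPIP_CORE_LOCKING  1

/* Enable per-connection send and receive timeouts. */
#define LWIP_SO_SNDTIMEO         1
#define LWIP_SO_RCVTIMEO         1
```

With LWIP_SO_SNDTIMEO enabled, a send timeout in milliseconds can be set on a netconn with netconn_set_sendtimeout(conn, 5000), where conn and the 5000 ms value are placeholders.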
Thanks in advance,
regards,
Bully.
Call Stack (from bottom -> up):
in sys_arch_sem_wait() at sys_arch.c:322 0x802ecc6 :
#if (osCMSIS < 0x20000U)
    while(osSemaphoreWait (*sem, osWaitForever) != osOK);
    return (osKernelSysTick() - starttime);
#else
    while(osSemaphoreAcquire(*sem, osWaitForever) != osOK);
    return (osKernelGetTickCount() - starttime);
#endif
in lwip_netconn_do_write() at api_msg.c:1675 0x801ff28
/**
 * Send some data on a TCP pcb contained in a netconn
 * Called from netconn_write
 *
 * @param m the api_msg_msg pointing to the connection
 */
void
lwip_netconn_do_write(void *m)
{
  struct api_msg *msg = (struct api_msg*)m;
  if (ERR_IS_FATAL(msg->conn->last_err)) {
    msg->err = msg->conn->last_err;
  } else {
    if (NETCONNTYPE_GROUP(msg->conn->type) == NETCONN_TCP) {
#if LWIP_TCP
      if (msg->conn->state != NETCONN_NONE) {
        /* netconn is connecting, closing or in blocking write */
        msg->err = ERR_INPROGRESS;
      } else if (msg->conn->pcb.tcp != NULL) {
        msg->conn->state = NETCONN_WRITE;
        /* set all the variables used by lwip_netconn_do_writemore */
        LWIP_ASSERT("already writing or closing", msg->conn->current_msg == NULL &&
          msg->conn->write_offset == 0);
        LWIP_ASSERT("msg->msg.w.len != 0", msg->msg.w.len != 0);
        msg->conn->current_msg = msg;
        msg->conn->write_offset = 0;
#if LWIP_TCPIP_CORE_LOCKING
        if (lwip_netconn_do_writemore(msg->conn, 0) != ERR_OK) {
          LWIP_ASSERT("state!", msg->conn->state == NETCONN_WRITE);
          UNLOCK_TCPIP_CORE();
          sys_arch_sem_wait(LWIP_API_MSG_SEM(msg), 0);
          LOCK_TCPIP_CORE();
          LWIP_ASSERT("state!", msg->conn->state != NETCONN_WRITE);
        }
#else /* LWIP_TCPIP_CORE_LOCKING */
        lwip_netconn_do_writemore(msg->conn);
#endif /* LWIP_TCPIP_CORE_LOCKING */
        /* for both cases: if lwip_netconn_do_writemore was called, don't ACK the APIMSG
           since lwip_netconn_do_writemore ACKs it! */
        return;
      } else {
        msg->err = ERR_CONN;
      }
#else /* LWIP_TCP */
      msg->err = ERR_VAL;
#endif /* LWIP_TCP */
#if (LWIP_UDP || LWIP_RAW)
    } else {
      msg->err = ERR_VAL;
#endif /* (LWIP_UDP || LWIP_RAW) */
    }
  }
  TCPIP_APIMSG_ACK(msg);
}
.....
2022-05-27 06:50 AM
Hello,
Recently I encountered exactly the same issue with LwIP.
Did you find a proper solution to prevent the application from hanging, and what was the main reason?
Best Regards,
PKEY
2022-05-27 07:47 AM
Hello,
We haven't found a solution, but we have also changed the network connection type...
Regards.