2024-03-08 03:18 PM
Hi,
I have configured an ethernet application with LWIP and without OS using RAW API, the connection to a server running on a net assistant tool keeps aborting after 19 seconds of running.
With DHCP disabled, the application's ip address is configured to 192.168.3.168, while the server is running with ip address of 192.168.3.40 and port of 5189. The pings to the both ip addresses are OK, the server can be connected from another PC with ip address of 192.168.3.127.
The execution sequence of the code is as below:
tcp_err(pcb, TcpErrorCallBack);
tcp_sent(pcb, TcpSentCallBack);
tcp_recv(pcb, TcpRecvedCallBack);
tcp_poll(pcb, TcpPollCallBack, 10);
tcp_bind(pcb, &localAddr, 0);
err_t err = tcp_connect(pcb, &remoteAddr, remotePort, TcpConnectedCallBack);
The callback function TcpConnectedCallBack() never gets called, and the application has never connected to the server. tcp_poll() is used to execute tcp_connect() every 5 seconds. After the first execution of tcp_connect, all other executions of tcp_connect returns -10 (Conn already established). Sometimes, the first execution of tcp_connect returns -4 (Routing problem). Each time when the first execution of tcp_connect returns ERR_OK,the TcpErrorCallBack gets called with error code of -13(Connection aborted) 19 seconds after its running with following call stack trace:
The tcp_pcb used for tcp_connect has the following contents after the execution of tcp_connect:
When tracking down the execution of tcp_connect, I got this:
It seems there are incoming data packets:
Having done some experiments, I found whether or not I use tcp_poll or tcp_bind, they all do not make difference.
The main.c which includes all the application code attached. The ioc file is also attached.
After 10 hours debugging, I am nearly desperate about this, so please help!
Regards
Chao
E:\SLC\Designs\Software\STM32\TcpF767ZI\Core\Src\main.c
E:\SLC\Designs\Software\STM32\TcpF767ZI\TcpF767ZI.ioc
2024-03-09 01:27 PM
How does one quote just part of a post on this board ?
"Today, I checked the code for initialization and that in while(1) loop in main.c against two Cube examples targeted on different boards" Which Cube examples ? Were these examples FreeRTOS or no OS ?
"I did a reset on LWIP configuration, then disabled DHCP, chose Lan8742 in Platform Settings, enabled LWIP_RAW APIs, and changed nothing else, I got the same result: the server running in a socket tool did not show any new connection, and the F767's Tcp connection was aborted after 19 seconds."
It's the weekend here. I'll be working on F767 networking this week. Please upload this code so that I can debug and test it on Monday.
BTW: have you looked at using any other network stacks ?
2024-03-09 10:53 PM
Which Cube examples ?
LwIP_HTTP_Server_Raw, on STM324x9I-EVAL board
LwIP_TCP_Echo_Client, on STM324x9I-EVAL board
Were these examples FreeRTOS or no OS ?
No RTOS for the both.
My test code (main.c) is attached for your reference. I have run the code on two Nucleo-F767ZI boards, they behave the same: when you build and run, the tcp_connect would return -4 (route error), but once you have run the debugger, step into and through tcp_connect, you won't get -4 any more.
Good luck.
2024-03-10 11:21 AM
I'll look at it this week.
2024-03-10 12:46 PM
After enabling all the TCP debug options in LWLP's Key Options configuration, I managed to get the debug messages at runtime, please see attached files. tcp_debug_1.log is for running LWIP_TCP APIs and tcp_debug_2.log is for running after enabling memory relevant LWIP options (i.e. MEMP_MEM_MALLOC etc.). The major difference between the two runnings is that IP address of 192.168.31.1 dissapears along with the following debug message:
ethernet_input: dest:0hx:0hx:0hx:0hx:0hx:0hx, src:0hx:0hx:0hx:0hx:0hx:0hx, type:ff
pbuf_remove_header: old 0x20002030 new 0x2000203e (14)
etharp_update_arp_entry: 192.168.31.1 - d4:da:21:6b:3f:47
etharp_find_entry: found empty entry 1
etharp_find_entry: no empty entry found and not allowed to recycle
etharp_input: incoming ARP request
etharp_input: ARP request was not for us.
Above message appears many times in tcp_debug_1.log while in tcp_debug_2.log the following messages are frequently seen:
ethernet_input: dest:0hx:0hx:0hx:0hx:0hx:0hx, src:0hx:0hx:0hx:0hx:0hx:0hx, type:ff
pbuf_remove_header: old 0x20001800 new 0x2000180e (14)
etharp_update_arp_entry: 192.168.3.1 - c0:d1:93:da:6a:f4
etharp_find_entry: found matching entry 0
etharp_update_arp_entry: updating stable entry 0
etharp_input: incoming ARP request
etharp_input: ARP request was not for us.
However, the IP address of 192.168.31.1 seems non-existent:
192.168.3.40 is the IP address of the server created by the socket tool. The application didn't send out any data within the while(1) loop.
After the LWIP_TCP API test, I did another test using LWIP_RAW APIs. Like in LWIP_TCP API test, the connection with server was not established either in LWIP_RAW API test.
In LWIP_RAW API test, a short string is sent out to the server every second by using raw_send(), but the following messages show pbuf_alloc problem:
app_run(): tick = 1s
raw_sendto
pbuf_add_header: old 0x200026c0 new 0x200026ac (20)
pbuf_remove_header: old 0x200026ac new 0x200026c0 (20)
pbuf_add_header: old 0x200026c0 new 0x200026ac (20)
ip4_output_if: st0
IP header:
+-------------------------------+
| 4 | 5 | 0x00 | 45 | (v, hl, tos, len)
+-------------------------------+
| 0 |000| 0 | (id, flags, offset)
+-------------------------------+
| 255 | 6 | 0x0000 | (ttl, proto, chksum)
+-------------------------------+
| 168 | 3 | 168 | 192 | (src)
+-------------------------------+
| 40 | 3 | 168 | 192 | (dest)
+-------------------------------+
ip4_output_if: call netif->output()
pbuf_add_header: old 0x200026ac new 0x2000269e (14)
ethernet_output: sending packet 0x20002678
sct calling h=ip_reass_tmr t=0 arg=0x80180c8
tcpip: ip_reass_tmr()
sys_timeout: 0x20000958 abs_time=4183 handler=ip_reass_tmr arg=0x80180c8
sct calling h=etharp_tmr t=0 arg=0x80180d4
tcpip: etharp_tmr()
etharp_timer
sys_timeout: 0x20000d78 abs_time=4198 handler=etharp_tmr arg=0x80180d4
app_run(): tick = 2s
raw_sendto
pbuf_add_header: old 0x2000269e new 0x2000268a (20)
pbuf_remove_header: old 0x2000268a new 0x2000269e (20)
pbuf_add_header: old 0x2000269e new 0x2000268a (20)
ip4_output_if: st0
IP header:
+-------------------------------+
| 4 | 5 | 0x00 | 45 | (v, hl, tos, len)
+-------------------------------+
| 1 |000| 0 | (id, flags, offset)
+-------------------------------+
| 255 | 6 | 0x0000 | (ttl, proto, chksum)
+-------------------------------+
| 168 | 3 | 168 | 192 | (src)
+-------------------------------+
| 40 | 3 | 168 | 192 | (dest)
+-------------------------------+
ip4_output_if: call netif->output()
pbuf_add_header: failed as 0x2000267c < 0x20002688 (not enough space for new header size)
ethernet_output: could not allocate room for header.
sct calling h=ip_reass_tmr t=0 arg=0x80180c8
tcpip: ip_reass_tmr()
sys_timeout: 0x20000958 abs_time=5183 handler=ip_reass_tmr arg=0x80180c8
sct calling h=etharp_tmr t=0 arg=0x80180d4
tcpip: etharp_tmr()
etharp_timer
sys_timeout: 0x20000d78 abs_time=5198 handler=etharp_tmr arg=0x80180d4
app_run(): tick = 3s
raw_sendto
pbuf_add_header: failed as 0x20002676 < 0x20002688 (not enough space for new header size)
pbuf_alloc(length=0)
pbuf_alloc(length=0) == 0x20000d98
pbuf_chain: 0x20000d98 references 0x20002678
raw_sendto: added header pbuf 0x20000d98 before given pbuf 0x20002678
pbuf_add_header: old 0x20000dcc new 0x20000db8 (20)
ip4_output_if: st0
IP header:
+-------------------------------+
| 4 | 5 | 0x00 | 45 | (v, hl, tos, len)
+-------------------------------+
| 2 |000| 0 | (id, flags, offset)
+-------------------------------+
| 255 | 6 | 0x0000 | (ttl, proto, chksum)
+-------------------------------+
| 168 | 3 | 168 | 192 | (src)
+-------------------------------+
| 40 | 3 | 168 | 192 | (dest)
+-------------------------------+
ip4_output_if: call netif->output()
pbuf_add_header: old 0x20000db8 new 0x20000daa (14)
ethernet_output: sending packet 0x20000d98
pbuf_free(0x20000d98)
pbuf_free: deallocating 0x20000d98
pbuf_free: 0x20002678 has ref 1, ending here.
sct calling h=ip_reass_tmr t=19 arg=0x80180c8
tcpip: ip_reass_tmr()
sys_timeout: 0x20000958 abs_time=6183 handler=ip_reass_tmr arg=0x80180c8
sct calling h=etharp_tmr t=18 arg=0x80180d4
tcpip: etharp_tmr()
etharp_timer
sys_timeout: 0x20000d78 abs_time=6198 handler=etharp_tmr arg=0x80180d4
Then I set the min heap size from 0x400 to 0x4000, and min stack size from 0x200 to 0x800, running the application again and I didn't see the difference. Replace raw_send() with raw_sendto() (raw_connect() is not called in this case), no incoming message gets printed out in the server side.
The code is as follow:
#define USE_RAW_API
struct raw_pcb* rawPcb;
struct tcp_pcb* tcpPcb;
struct udp_pcb* udpPcb;
ip_addr_t localAddr;
ip_addr_t remoteAddr;
uint16_t remotePort;
uint32_t startTick;
int secondCount;
char str[512];
uint8_t connected;
struct pbuf* pBuffer;
/* USER CODE END PV */
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);
static void MX_GPIO_Init(void);
static void MX_USART3_UART_Init(void);
static void MX_RTC_Init(void);
/* USER CODE BEGIN PFP */
void app_init();
void app_run();
err_t TcpRecvedCallBack(void *arg, struct tcp_pcb* pcb, struct pbuf* p, err_t err);
err_t TcpSentCallBack(void *arg, struct tcp_pcb* pcb, uint16_t len);
err_t TcpConnectedCallBack(void *arg, struct tcp_pcb* pcb, err_t err);
void TcpErrorCallBack(void *arg, err_t err);
err_t TcpPollCallBack(void *arg, struct tcp_pcb* tpcb);
u8_t RawRecvedCallBack(void *arg, struct raw_pcb* pcb, struct pbuf* p, const ip_addr_t* remote_addr);
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_USART3_UART_Init();
MX_LWIP_Init();
MX_RTC_Init();
app_init();
while (1)
{
app_run();
MX_LWIP_Process();
}
}
/* USER CODE BEGIN 4 */
void tcp_app_init()
{
tcpPcb = tcp_new();
if (tcpPcb == NULL)
{
printf("app_init: pcb = null\n");
}
tcp_err(tcpPcb, TcpErrorCallBack);
tcp_sent(tcpPcb, TcpSentCallBack);
tcp_recv(tcpPcb, TcpRecvedCallBack);
tcp_poll(tcpPcb, TcpPollCallBack, 10);
// tcp_bind(tcpPcb, &localAddr, 0);
err_t err = tcp_connect(tcpPcb, &remoteAddr, remotePort, TcpConnectedCallBack);
printf("\ntcp_app_init: tcp_connect returned err = %d\n\n", err);
}
void raw_app_init()
{
rawPcb = raw_new(IP_PROTO_TCP);
pBuffer = pbuf_alloc(PBUF_TRANSPORT, 2048, PBUF_RAM);
raw_bind(rawPcb, &localAddr);
err_t err = raw_connect(rawPcb, &remoteAddr);
printf("\nraw_app_init: raw_connect returned err = %d\n\n", err);
if (err == ERR_OK)
{
connected = 1;
}
raw_recv(rawPcb, RawRecvedCallBack, NULL);
}
void udp_app_init()
{
}
void app_init()
{
localAddr.addr = LWIP_MAKEU32(192, 168, 3, 168);
remoteAddr.addr = LWIP_MAKEU32(192, 168, 3, 40);
remotePort = 5189;
connected = 0;
startTick = HAL_GetTick();
secondCount = 0;
#ifdef USE_TCP_API
tcp_app_init();
#elif defined(USE_RAW_API)
raw_app_init();
#elif defined(USE_UDP_API)
udp_app_init();
#else
#endif
}
void app_run()
{
if ((HAL_GetTick() - startTick) > 1000)
{
secondCount++;
startTick = HAL_GetTick();
HAL_GPIO_TogglePin(LD1_GPIO_Port, LD1_Pin);
#ifdef USE_TCP_API
printf("\napp_run(): tick = %ds, pcb->state = %d (0-CLOSED 2-SYN_SENT 3-SYN_RCVD 4-ESTABLISHED)\n\n", secondCount, tcpPcb->state);
#elif defined(USE_RAW_API)
sprintf(str, "\napp_run(): tick = %ds\n\n", secondCount);
printf("%s", str);
if (connected == 1)
{
uint16_t len = strlen(str);
memcpy(pBuffer->payload, (uint8_t*)str, len);
pBuffer->len = len;
pBuffer->tot_len = len;
raw_send(rawPcb, pBuffer);
// raw_sendto(rawPcb, pBuffer, &remoteAddr);
}
#elif defined(USE_UDP_API)
#else
#endif
}
}
err_t TcpPollCallBack(void *arg, struct tcp_pcb* tpcb)
{
// printf("TcpPollCallBack: connected = %d\n", connected);
// err_t err = tcp_connect(tcpPcb, &remoteAddr, remotePort, TcpConnectedCallBack);
// printf("TcpPollCallBack: connected = %d, tcp_connect returned err = %d\n", connected, err);
return ERR_OK;
}
void TcpErrorCallBack(void *arg, err_t err)
{
printf("\nTcpErrorCallBack: err = %d\n\n", err);
}
err_t TcpRecvedCallBack(void *arg, struct tcp_pcb* pcb, struct pbuf* p, err_t err)
{
printf("\nTcpRecvedCallBack: err = %d\n\n", err);
return ERR_OK;
}
u8_t RawRecvedCallBack(void *arg, struct raw_pcb* pcb, struct pbuf* p, const ip_addr_t* remote_addr)
{
printf("RawRecvedCallBack: received %d bytes - %s", p->len, (char*)p->payload);
return 0;
}
err_t TcpSentCallBack(void *arg, struct tcp_pcb* pcb, uint16_t len)
{
printf("\nTcpSentCallBack: len = %d\n\n", len);
return ERR_OK;
}
err_t TcpConnectedCallBack(void *arg, struct tcp_pcb* pcb, err_t err)
{
connected = 1;
printf("\nTcpConnectedCallBack: err = %d\n\n", err);
return ERR_OK;
}
void print_status(uint8_t status)
{
switch (status)
{
case 1: printf("\nethernet_link_status_updated(): netif is up\n\n"); break;
case 2: printf("\nethernet_link_status_updated(): netif is down\n\n"); break;
}
}
PUTCHAR_PROTOTYPE
{
HAL_UART_Transmit(&huart3, (uint8_t *)&ch, 1, 0xFFFF);
return ch;
}
/* USER CODE END 4 */
Ethernet global Interrupt is NOT enabled. And didn't change the GPIO Settings of the Nucleo board. LAN8742 is chosen.
In the Ethernet Configuration for ETH, there is a warning:
"The ETH can work only when RAM is pointing at 0x24000000"
I don't know what to do to avoid any potential problems because of this warning. I tried to change the First Tx Descriptor Address and First Rx Descriptor Address (just under the warning) from 0x2007c0a0 and 0x2007c000 to 0x2407c0a0 and 0x2407c000, but CubeMX does not allow me to modify them.
Chao
2024-03-11 12:49 AM
It's quite a lot (well documented) you are posting here, and working through that takes too much time for most people.
Linker script: That's why you want to stop using Cube at some point.
For Lwip I have some heavily modified linker scripts, to make sure the ethernet descriptors and buffers are placed correctly.
I recently worked more with H7 which has different descriptor and internal SRAM setup.
Things you might want to check with your problems:
- as said above, variable / descr. placement
- check if the old TCP connections were really closed
- number of connections your TCP server accepts, and number of current connections (..\Middlewares\Third_Party\LwIP\src\include\lwip\opt.h : MEMP_NUM_TCP_PCB, MEMP_NUM_TCP_PCB_LISTEN)
- TCP pbufs really all freed correctly?
- RX buffer settings, pool size, ...
PS: I just remember the main reason I switched from F7 to H7 (apart from HyperRam support via OCTOSPI):
I had some hardware IP / TCP checksum problems, probably coming from some unaligned http messages that lwip created, but I never found out for sure. System ran for minutes or hours sometimes, and then bang HW checksum was 0 and TCP failed. That only happened with small CGI / SSI http data packets while TCP streaming audio data. With the http stuff off, it never happened.
So check via Wireshark if things like that happen.
2024-03-12 09:12 AM
Thank you for your reply.
This post is indeed quite long, therefore I wrote a new post for the connection issue at here :
Regards
Chao