2024-12-12 03:00 PM - last edited on 2024-12-12 11:49 PM by Andrew Neil
Split from: https://community.st.com/t5/stm32-mcus-embedded-software/hardfault-udp-client/m-p/716640
I have the same problem. @LEAMtw, I hope you don't mind if I add details here for my scenarios.
Thread #1 [main] 1 [core: 0] (Suspended : Signal : SIGTRAP:Trace/breakpoint trap)
MemManage_Handler() at stm32h7xx_it.c:105 0x8016138
<signal handler called>() at 0xffffffe9
ethernet_input() at ethernet.c:113 0x802e6c4
ethernetif_input() at ethernetif.c:347 0x8023e18
MX_LWIP_Process() at lwip.c:156 0x8023b6a
Firmware::MainRuntimeLoop() at firmware.cc:101 0x80034de
Boilerplate_Loop() at boilerplate.cc:16 0x80011d6
main() at main.c:131 0x8015458
When debugging, I noticed that `p` buffer is valid, however, the payload is invalid.
Name : p
Details:0x30004140 <memp_memory_RX_POOL_base+15680>
Default:0x30004140 <memp_memory_RX_POOL_base+15680>
Decimal:805323072
Hex:0x30004140
Binary:110000000000000100000101000000
Octal:06000040500
Name : payload
Details:0x9a000000
Default:0x9a000000
Decimal:-1711276032
Hex:0x9a000000
Binary:10011010000000000000000000000000
Octal:023200000000
In my case I am not using FreeRTOS, I am using a bare-metal approach instead.
I have a STM32H723ZG. This is how things are configured.
## MPU
2024-12-12 11:58 PM - edited 2024-12-13 12:49 AM
@eduardo_reis wrote:
I have the same problem. @LEAMtw, I hope you don't mind if I add details here for my scenarios.
It may be similar symptoms, but not necessarily the same problem. So best to have your own thread - then you can mark your own solution when done.
@eduardo_reis wrote:
In my case I am not using FreeRTOS, I am using a bare-metal approach instead.
Could be quite a significant difference!
Have you tried the previous suggestions:
@eduardo_reis wrote:
When debugging, I noticed that `p` buffer is valid, however, the payload is invalid.
Are you passing auto data, so that the data is no longer valid by the time LwIP tries to access it?
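To make the lifetime question concrete, here is a minimal host-side sketch. It is not LwIP code: `tx_queue_ref`, `tx_queue_copy`, `build_and_queue` and `tx_pending` are hypothetical helpers standing in for any stack that transmits some time after the caller has returned. The first helper keeps only the caller's pointer, the second copies the bytes into storage it owns.

```c
#include <stddef.h>
#include <string.h>

#define TX_MAX 64

/* Hypothetical deferred-transmit slot; stands in for a queued packet. */
static const unsigned char *tx_ref;   /* borrowed pointer: caller must keep the data alive */
static unsigned char tx_copy[TX_MAX]; /* owned copy: stays valid after the caller returns */
static size_t tx_len;

/* Risky when `data` is an automatic (stack) variable: only the pointer is
 * stored, so it dangles as soon as the calling function returns. */
void tx_queue_ref(const unsigned char *data, size_t len)
{
    tx_ref = data;
    tx_len = len;
}

/* Safe for automatic data: the bytes are copied into static storage.
 * Returns 0 on success, -1 if the data does not fit. */
int tx_queue_copy(const unsigned char *data, size_t len)
{
    if (len > TX_MAX) {
        return -1;
    }
    memcpy(tx_copy, data, len);
    tx_len = len;
    return 0;
}

/* A caller whose buffer lives on the stack; copying makes this safe. */
int build_and_queue(void)
{
    unsigned char msg[4] = {0xDE, 0xAD, 0xBE, 0xEF};
    return tx_queue_copy(msg, sizeof msg);
}

/* What the deferred transmitter would read later. */
const unsigned char *tx_pending(size_t *len)
{
    *len = tx_len;
    return tx_copy;
}
```

If `build_and_queue` had called `tx_queue_ref` instead, the stored pointer would refer to a dead stack frame by the time the data was read, which is one way to end up with a valid descriptor and an invalid payload.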
2024-12-13 05:49 AM - last edited on 2024-12-13 06:35 AM by Andrew Neil
Duplicate - merged.
2024-12-13 06:39 AM
@eduardo_reis wrote:
I have the same problem.
:
When debugging, I noticed that `p` buffer is valid,
That's a different problem: In @LEAMtw's case, the pointer was invalid, and accessing the invalid memory caused a Hard Fault.
Also, that case was using FreeRTOS - which you're not.
2024-12-13 07:11 AM - edited 2024-12-13 07:45 AM
@Andrew Neil thank you for creating the split. I ended up duplicating my post because I didn't get the notification about it, and I thought my original response hadn't gone through when I didn't see it here. It indeed makes more sense to have it as its own thread.
- Increase stack size
- Enable diagnostics
By stack size, do you mean LwIP's MEM_SIZE? How should I estimate the stack size? I have it set to 14 KB, which seems to be way above my needs.
I enabled the following debug flags:
/*----- Default Value for ETHARP_DEBUG: LWIP_DBG_OFF ---*/
#define ETHARP_DEBUG LWIP_DBG_ON
/*----- Default Value for NETIF_DEBUG: LWIP_DBG_OFF ---*/
#define NETIF_DEBUG LWIP_DBG_ON
/*----- Default Value for PBUF_DEBUG: LWIP_DBG_OFF ---*/
#define PBUF_DEBUG LWIP_DBG_ON
/*----- Default Value for RAW_DEBUG: LWIP_DBG_OFF ---*/
#define RAW_DEBUG LWIP_DBG_ON
/*----- Default Value for MEM_DEBUG: LWIP_DBG_OFF ---*/
#define MEM_DEBUG LWIP_DBG_ON
/*----- Default Value for MEMP_DEBUG: LWIP_DBG_OFF ---*/
#define MEMP_DEBUG LWIP_DBG_ON
/*----- Default Value for UDP_DEBUG: LWIP_DBG_OFF ---*/
#define UDP_DEBUG LWIP_DBG_ON
and got the following output on the serial monitor when initialising my application.
10:38:53.370 -> Assertion "udp_remove: invalid pcb" failed at line 1185 in ../Middlewares/Third_Party/LwIP/src/core/udp.c
Despite the assertion above, the application initialised and seems to run fine. I have to start publishing UDP packets of a certain size to make it crash after some time.
I don't see any new message when it crashes.
2024-12-19 10:32 AM - edited 2024-12-19 11:54 AM
GPT gave me a lead on a few other things to investigate, and one of them seems to have fixed this issue.
I had a concurrency problem. My main loop called `MX_LWIP_Process`, while an interrupt handler was sending packets through LwIP. At some point, the interrupt would preempt `MX_LWIP_Process`, and things would be messed up from then on.
Instead of dealing with synchronization (semaphores, etc.), I simply moved the `MX_LWIP_Process` call to the beginning of the interrupt handler, which has worked well so far in my application.
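For comparison, a common alternative on bare-metal setups is to keep every network-stack call in one execution context: the interrupt only records that data is ready, and the main loop does both the processing and the send. The sketch below is a host-runnable illustration under that assumption, not my actual firmware. `lwip_process_stub` and `udp_send_stub` are hypothetical stand-ins for `MX_LWIP_Process` and the real send routine, and `data_ready_isr` stands in for the interrupt handler.

```c
#include <stdbool.h>

/* Set by the interrupt, consumed by the main loop. */
static volatile bool send_pending = false;

/* Counters so the behaviour can be observed on a host build. */
static int process_calls = 0;
static int send_calls = 0;

/* Hypothetical stand-in for MX_LWIP_Process(). */
static void lwip_process_stub(void) { process_calls++; }

/* Hypothetical stand-in for the routine that builds and sends a UDP packet. */
static void udp_send_stub(void) { send_calls++; }

/* Interrupt handler: makes no network-stack calls, only raises a flag. */
void data_ready_isr(void)
{
    send_pending = true;
}

/* One pass of the main loop: all stack work happens here, in one context. */
void main_loop_iteration(void)
{
    lwip_process_stub();

    if (send_pending) {
        /* Clear before sending so an interrupt that fires during the send
         * is picked up on the next pass instead of being lost. */
        send_pending = false;
        udp_send_stub();
    }
}
```

Note that a single flag coalesces interrupts that arrive between two passes into one send. If every event must produce its own packet, a counter or a small ring buffer would be needed instead.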
I am still getting this failed assertion in the LwIP debug output.
10:38:53.370 -> Assertion "udp_remove: invalid pcb" failed at line 1185 in ../Middlewares/Third_Party/LwIP/src/core/udp.c
I still don't understand why, given that my assumption about this being a concurrency problem is correct, the issue did not happen all the time but only in some use cases.
In any case, I am marking this as solved. Thank you so much to everyone.
PS: I am un-marking this as the solution because I found that the problem persists in another test case. So it turns out this solution is only partial.