
STM32F4 LwIP memp Pbuf Linked List Pointer Corruption Causes Hard Fault

nathan23
Associate
Posted on November 16, 2016 at 17:00

Hello STM32 group.  I don't post a lot, but frequently read and benefit from the conversations on this board so I wanted to update you on something we've recently spent quite a bit of time debugging.

We created an application based on the STM32F4x7_ETH_LwIP_V1.1.1 project.  (Apparently if you use the cube utility the default LwIP settings are configured the same way.  I haven't verified this.)

The problem presented as seemingly random hard faults.  If we pinged the controller continuously, it might happen 4 or 5 times a day.  The faults usually came from the memp_tab[type] = memp->next; statement inside memp_malloc(), reached through the msg = (struct tcpip_msg *)memp_malloc(MEMP_TCPIP_MSG_INPKT); call in the tcpip_input() function in tcpip.c.  By that point memp->next usually pointed to some random, inaccessible address.  Occasionally the faults came from elsewhere.

The SYS_LIGHTWEIGHT_PROT option in lwipopts.h was switched off.  Enabling it fixed our problem.  Apparently the buffer allocation or memory allocation function was getting interrupted partway through updating its free list.  Switching the option on protects the allocation/deallocation critical regions.
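To make the suspected failure concrete, here is a minimal host-side sketch of a memp-style free list. It is not the lwIP source: the names (struct node, pool, pop_read, pop_commit, prot_depth) are made up for illustration, and the preemption is simulated by splitting one pop into two steps. It assumes the root cause is one context being preempted between reading the list head and unlinking it, which was never confirmed on the target.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side model of a memp-style free list; every name is illustrative. */
struct node { struct node *next; };

#define POOL_SIZE 4
static struct node pool[POOL_SIZE];
static struct node *free_head;
static int prot_depth;              /* stand-in for a real critical section */

static void pool_init(void)
{
    free_head = NULL;
    for (int i = POOL_SIZE - 1; i >= 0; i--) {
        pool[i].next = free_head;   /* push so pool[0] ends up at the head */
        free_head = &pool[i];
    }
    prot_depth = 0;
}

/* Step 1 of a pop: read the current head. */
static struct node *pop_read(void) { return free_head; }

/* Step 2 of a pop: unlink the element that step 1 returned. */
static void pop_commit(struct node *n)
{
    if (n != NULL) {
        free_head = n->next;
    }
}

/* Unprotected: context B runs a whole pop between A's two steps. */
static int race_hands_out_same_element(void)
{
    pool_init();
    struct node *a = pop_read();    /* A reads the head, then is preempted */
    struct node *b = pop_read();    /* B pops the same element... */
    pop_commit(b);
    pop_commit(a);                  /* ...and A resumes with a stale pointer */
    return a == b;                  /* both contexts now own one element */
}

/* Protected: both steps run as one unit, as SYS_ARCH_PROTECT() would ensure. */
static struct node *pop_protected(void)
{
    prot_depth++;                   /* on target: SYS_ARCH_PROTECT(lev) */
    assert(prot_depth == 1);        /* nobody else is inside the region */
    struct node *n = pop_read();
    pop_commit(n);
    prot_depth--;                   /* on target: SYS_ARCH_UNPROTECT(lev) */
    return n;
}

static int protected_pops_are_distinct(void)
{
    pool_init();
    struct node *a = pop_protected();
    struct node *b = pop_protected();
    return a != NULL && b != NULL && a != b;
}
```

In the unprotected case both contexts end up holding the same element. Once one of them frees it while the other is still writing packet data into it, the next pointer stored in that element gets overwritten, which would match a later memp->next pointing at a random address. With #define SYS_LIGHTWEIGHT_PROT 1 in lwipopts.h, lwIP wraps these regions in SYS_ARCH_PROTECT()/SYS_ARCH_UNPROTECT().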

Hope this helps.
3 REPLIES
Posted on November 16, 2016 at 17:29

Thanks for the info.

/** SYS_LIGHTWEIGHT_PROT
 * define SYS_LIGHTWEIGHT_PROT in lwipopts.h if you want inter-task protection
 * for certain critical regions during buffer allocation, deallocation and memory
 * allocation and deallocation.
 */

So, are you using some kind of multitasking, RTOS or similar? Context is important here.

JW

nathan23
Associate
Posted on November 16, 2016 at 20:34

Hello Jan,

I'm using the STM32F407VG processor, LwIP 1.4.1, FreeRTOS 7.3.0 and the CM3 port for STemWinLibrary522 (because the CM4 port requires the FPU to be on and I'm not using it... another thing you can spend a few days trying to figure out if you don't pay close attention!).

I wasn't able to actually watch it happen, so I can't put a finger on the exact source of the problem.  DMA handles the incoming packets from the PHY, and the Ethernet task, which does the allocation/deallocation to process the incoming packets (and where I assume the corruption occurs), is the highest-priority task.  My assumption is that the Eth/DMA interrupt occasionally fires while the code is executing a critical section of an allocation/deallocation function.  Before I could research that theory, it became obvious that enabling SYS_LIGHTWEIGHT_PROT had fixed the issue.  I'm running way behind, so I'm off to the next problem.
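For anyone wondering what the option actually hooks into: with SYS_LIGHTWEIGHT_PROT enabled, lwIP calls sys_arch_protect() and sys_arch_unprotect(), which the port's sys_arch.c has to provide. Below is a rough sketch of how a FreeRTOS port might implement them. I have not checked it against the sys_arch.c shipped with the ST example, so treat it as an assumption. To keep it buildable on a PC, vPortEnterCritical() and vPortExitCritical() are replaced by counting stand-ins and sys_prot_t is typedef'd locally; on the target those come from the FreeRTOS port layer and the lwIP port's cc.h.

```c
#include <assert.h>

/* Host stand-ins (assumption): on the target these are supplied by the
 * FreeRTOS port layer and mask interrupts instead of counting. */
static int critical_nesting;

static void vPortEnterCritical(void)
{
    critical_nesting++;
}

static void vPortExitCritical(void)
{
    assert(critical_nesting > 0);   /* exits must be paired with enters */
    critical_nesting--;
}

/* Normally defined by the lwIP port's cc.h; int is assumed here. */
typedef int sys_prot_t;

/* Called by lwIP's SYS_ARCH_PROTECT() before touching a shared free list. */
sys_prot_t sys_arch_protect(void)
{
    vPortEnterCritical();
    return 1;                       /* value is unused by this sketch */
}

/* Called by lwIP's SYS_ARCH_UNPROTECT() once the list update is done. */
void sys_arch_unprotect(sys_prot_t pval)
{
    (void)pval;                     /* nesting is tracked by the RTOS port */
    vPortExitCritical();
}
```

One thing to keep in mind: on Cortex-M, FreeRTOS critical sections only mask interrupts at or below configMAX_SYSCALL_INTERRUPT_PRIORITY, so an Ethernet interrupt configured above that level would not be held off by this.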

Hope this helps.

N

Posted on November 17, 2016 at 00:24

Nathan,

thanks.

Jan