
HardFault after several hours with CDC USB_HOST_M7, USB_OTG_HS

roblahnst
Associate II

Product: RIVERDI RVT70HSSNWC00-B with STM32H757XIH6

STM32Cube FW_H7 V1.11.1
TouchGFX 4.25.0
CubeIDE 1.16.1
FreeRTOS 10.3.1, CMSIS_V2

I am using the code attached in usb_host.c to read data from a USB Virtual COM barcode reader. The .ioc file is also attached.

The code runs fine, but after some hours I get a hard fault. This happens only when the USB scanner is plugged in; with nothing plugged into USB it runs fine. I don't do anything with the barcode reader. It is just plugged in when the hard fault occurs.

I already tried the following:

  • Increasing USBH_CDC_BUFFER_SIZE from 1024 to 2048
  • Changing #define USBH_PROCESS_PRIO from osPriorityLow to osPriorityNormal
  • Varying #define USBH_PROCESS_STACK_SIZE ((uint16_t)4*2048) between 512 and 8192
  • Updating to STM32Cube FW_H7 V1.11.2

Any help would be very much appreciated.

TDK
Guru

Where does the hard fault occur? What is the stack trace?

What is the nature of the hard fault? The Fault Analyzer in STM32CubeIDE can help here.

If you feel a post has answered your question, please click "Accept as Solution".

The hard fault occurs at:

Program Counter (PC):  0x080A70CE

Disassembly:

0x080A70C4:   bl      0x0809675E <lwip_htonl>
0x080A70C8:   mov     r4, r0
0x080A70CA:   ldr     r3, [r7, #32]
0x080A70CC:   ldr     r3, [r3, #12]
0x080A70CE:   ldr     r3, [r3, #4]    <-- HARD FAULT occurs here
0x080A70D0:   mov     r0, r3

 

At the time of the crash:

stacked_pc  = 0x080A70CE  (crash instruction)
stacked_lr  = 0x080A70C9
stacked_r3  = 0x50E76AFE  <-- invalid pointer causing the fault

The faulting instruction attempts to dereference r3, which points to an invalid memory address.
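For readers who want to capture the same stacked values in their own fault handler, here is a minimal sketch of how the basic Cortex-M exception frame can be read. The names stacked_frame_t and read_stacked_frame are made up for this illustration, and the small assembly shim that passes the active stack pointer (MSP or PSP) from HardFault_Handler is assumed and not shown.

```c
#include <stdint.h>

/* Basic exception frame pushed by Cortex-M hardware on exception entry,
 * in ascending address order. With an active FPU context more words
 * follow, but these first eight stay the same. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

/* Hypothetical helper: copy the eight stacked words so they can be logged
 * or inspected in a debugger. `sp` is the stack pointer that was active
 * when the fault was taken. */
static stacked_frame_t read_stacked_frame(const uint32_t *sp)
{
    stacked_frame_t f = {
        .r0  = sp[0], .r1 = sp[1], .r2 = sp[2], .r3   = sp[3],
        .r12 = sp[4], .lr = sp[5], .pc = sp[6], .xpsr = sp[7],
    };
    return f;
}
```

Because the faulting ldr never completed, the stacked r3 still holds the base address that was about to be dereferenced.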

Fault Status Registers:

CFSR  = 0x01000000 (UsageFault: UNALIGNED access)
HFSR  = 0x40000000 (Forced Hard Fault)
MMFAR = 0x00000000 (No valid memory fault address)
BFAR  = 0x00000000 (No valid bus fault address)

Interpretation:

• UNALIGNED bit set in CFSR → Indicates an unaligned memory access or invalid pointer dereference.

• The forced hard fault is triggered due to an unrecoverable error.

• No memory management or bus fault addresses were captured.

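The fault occurs inside the LwIP stack, most likely during a pbuf (packet buffer) operation while handling network traffic. The pointer r3 = 0x50E76AFE is invalid: it is not word-aligned, and it lies outside the valid memory regions (SRAM, QSPI, etc.), in the 0x40000000-0x5FFFFFFF peripheral range of the default Cortex-M memory map. Unaligned accesses to Device memory always raise a UsageFault, which matches the UNALIGNED flag.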
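The register interpretation above can be reproduced with a small decoder. This is only a sketch: the bit positions follow the Cortex-M7 CFSR/HFSR layout, while the macro names, fault_summary_t and summarize_fault are invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Selected bits of SCB->CFSR (MMFSR in bits 0-7, BFSR in 8-15, UFSR in 16-31). */
#define CFSR_MMARVALID  (1u << 7)   /* MMFAR holds a valid address */
#define CFSR_BFARVALID  (1u << 15)  /* BFAR holds a valid address  */
#define CFSR_UNALIGNED  (1u << 24)  /* UsageFault: unaligned access */
/* Selected bit of SCB->HFSR. */
#define HFSR_FORCED     (1u << 30)  /* escalated configurable fault */

typedef struct {
    bool mem_fault, bus_fault, usage_fault;
    bool unaligned, mmfar_valid, bfar_valid, forced;
} fault_summary_t;

/* Hypothetical helper: summarize the raw register values. */
static fault_summary_t summarize_fault(uint32_t cfsr, uint32_t hfsr)
{
    fault_summary_t s;
    s.mem_fault   = (cfsr & 0x000000FFu) != 0u;
    s.bus_fault   = (cfsr & 0x0000FF00u) != 0u;
    s.usage_fault = (cfsr & 0xFFFF0000u) != 0u;
    s.unaligned   = (cfsr & CFSR_UNALIGNED) != 0u;
    s.mmfar_valid = (cfsr & CFSR_MMARVALID) != 0u;
    s.bfar_valid  = (cfsr & CFSR_BFARVALID) != 0u;
    s.forced      = (hfsr & HFSR_FORCED) != 0u;
    return s;
}
```

On the target the inputs would be SCB->CFSR and SCB->HFSR. With the values from this crash (0x01000000 and 0x40000000), only usage_fault, unaligned and forced come out true, so MMFAR and BFAR carry no useful address.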

But how are LwIP and CDC connected?

 

Chasing a pointer in a structure. Look at source line.

ptr->ptr->thing

How is that value placed in structure?

Perhaps a memory or pool allocation that fails.

CDC may be sharing resources with LwIP. Perhaps a memory/resource leak on the CDC side.
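One way to catch a bad link earlier than the hard fault is to sanity-check the pointer before following it. The sketch below is an assumption-heavy illustration: addr_is_plausible and the region table are invented here, and the internal RAM ranges listed are the usual STM32H747/757 ones. They should be replaced with whatever the project's linker script actually uses, including any external SDRAM that holds the LwIP heap.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { uint32_t start; uint32_t size; } mem_region_t;

/* Assumed RAM map; adjust to the linker script of the actual project. */
static const mem_region_t k_ram_regions[] = {
    { 0x20000000u, 128u * 1024u },  /* DTCM         */
    { 0x24000000u, 512u * 1024u },  /* AXI SRAM     */
    { 0x30000000u, 288u * 1024u },  /* D2 SRAM1..3  */
    { 0x38000000u,  64u * 1024u },  /* D3 SRAM4     */
};

/* Returns true if addr is aligned to `align` bytes (0 skips the alignment
 * check) and falls inside one of the listed RAM regions. */
static bool addr_is_plausible(uint32_t addr, uint32_t align)
{
    if (align != 0u && (addr % align) != 0u) {
        return false;
    }
    for (size_t i = 0; i < sizeof k_ram_regions / sizeof k_ram_regions[0]; i++) {
        uint32_t off = addr - k_ram_regions[i].start;  /* wraps if addr < start */
        if (off < k_ram_regions[i].size) {
            return true;
        }
    }
    return false;
}
```

On the target this could be used as a debug check such as addr_is_plausible((uint32_t)(uintptr_t)ptr->ptr, 4u) with a breakpoint on failure, so the debugger stops while the code that wrote the bad value is still close by.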

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

It seems it is here:

080a70ca: ldr r3, [r7, #32]   ; load useg
080a70cc: ldr r3, [r3, #12]   ; useg->tcphdr
080a70ce: ldr r3, [r3, #4]    ; useg->tcphdr->seqno --> HARDFAULT!

in tcp_out.c:

        /* In the case of fast retransmit, the packet should not go to the tail
         * of the unacked queue, but rather somewhere before it. We need to check for
         * this case. -STJ Jul 27, 2004 */
        if (TCP_SEQ_LT(lwip_ntohl(seg->tcphdr->seqno), lwip_ntohl(useg->tcphdr->seqno))) {
          /* add segment to before tail of unacked list, keeping the list sorted */
          struct tcp_seg **cur_seg = &(pcb->unacked);
          while (*cur_seg &&
                 TCP_SEQ_LT(lwip_ntohl((*cur_seg)->tcphdr->seqno), lwip_ntohl(seg->tcphdr->seqno))) {
            cur_seg = &((*cur_seg)->next );
          }
          seg->next = (*cur_seg);
          (*cur_seg) = seg;
        } else {
          /* add segment to tail of unacked list */
          useg->next = seg;
          useg = useg->next;
        }
      }

 

Apparently this is a known issue => https://savannah.nongnu.org/bugs/?59831

So an issue with tcphdr

It's walking a list, so I'd perhaps suspect unanticipated concurrent behaviour due to RTOS, either switching tasks, interrupt, or something manipulating the list outside of normal flow.

Perhaps needs to be done within a critical section?
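As a rough illustration of the idea, here is the same kind of sorted list insert wrapped in a lock. Everything in it is a stand-in: struct seg, list_lock/list_unlock and insert_sorted_locked are hypothetical names, and the counter only marks where a real mutex or critical section would go. In lwIP itself the usual options are LOCK_TCPIP_CORE()/UNLOCK_TCPIP_CORE() (when LWIP_TCPIP_CORE_LOCKING is enabled) or handing the call to the tcpip thread with tcpip_callback().

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for a queued TCP segment. */
struct seg {
    struct seg *next;
    uint32_t seqno;
};

/* Hypothetical lock: a real build would take an RTOS mutex here. */
static int g_lock_depth;
static void list_lock(void)   { g_lock_depth++; }
static void list_unlock(void) { g_lock_depth--; }

/* Wrap-safe "a < b" for 32-bit sequence numbers. */
static int seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Insert `seg` into the list at *head, keeping it sorted by seqno.
 * The whole walk-and-link sequence runs under the lock so that no other
 * task can observe or modify a half-updated list. */
static void insert_sorted_locked(struct seg **head, struct seg *seg)
{
    list_lock();
    struct seg **cur = head;
    while (*cur != NULL && seq_lt((*cur)->seqno, seg->seqno)) {
        cur = &(*cur)->next;
    }
    seg->next = *cur;
    *cur = seg;
    list_unlock();
}
```

The lock only helps if every task that touches the list uses it, including any code that frees segments.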

 
