2024-05-08 11:34 PM
My project crashes at a random time while sending network traffic (after running for an hour or longer). I want to debug this without keeping a debugger connected or sitting there waiting. So I want to record some data, e.g. when the HardFault_Handler is called, to identify the location in the code where the fault happened.
To do so, here is my approach:
All works so far, just:
A Hard Fault can be asynchronous (imprecise) or synchronous (precise). If an asynchronous Hard Fault happens (and it does for me):
More details can be found here:
https://interrupt.memfault.com/blog/cortex-m-hardfault-debug
So, my question:
How to debug an "asynchronous" Hard Fault? How to turn it into a "synchronous" event?
(I tried disabling the ICache, but it made no real difference: the reported PC is still somewhat later than the crash location. It can already be inside another function call, with no indication of where that call was made from.)
Here, my implementation details:
1. Cause a Hard Fault (intentionally):
#if 1
    //force a Hard Fault to check our "cr" command
    //volatile, so the compiler cannot optimize the write away
    volatile unsigned long *addr = (volatile unsigned long *)(0x08000000 + 0x02000000);
    *addr = 0x11223344; //write to invalid address
#if 1
    {
        int i;
        for (i = 0; i < 100; i++)
            __NOP(); //the Hard Fault Handler comes here, delayed, imprecise!
    }
#endif
#endif
2. Add a Hard Fault Handler and forward to the function recording the "stack frame":
    .section .text.Default_Handler,"ax",%progbits
Default_Handler:
    TST LR, #4
    ITE EQ
    MRSEQ R0, MSP
    MRSNE R0, PSP
    B HardFault_Handler_C
Infinite_Loop:
    b Infinite_Loop
    .size Default_Handler, .-Default_Handler
Remark: I use this Default_Handler for all handlers defined as "weak", so it is also the one triggered by a HardFault.
3. The HardFault_Handler_C:
void __USED HardFault_Handler_C(unsigned long *hardfault_args)
{
    volatile uint32_t *rtcBkpReg = (volatile uint32_t *)&RTC_START_BKP_REG;
    rtcBkpReg += 15; //skip the syscfg

    //hardfault_args points to the stacked exception frame:
    //stacked_r0 = hardfault_args[0];
    //stacked_r1 = hardfault_args[1];
    //stacked_r2 = hardfault_args[2];
    //stacked_r3 = hardfault_args[3];
    //stacked_r12 = hardfault_args[4];
    *rtcBkpReg++ = hardfault_args[5]; //LR
    *rtcBkpReg++ = hardfault_args[6]; //PC
    *rtcBkpReg++ = hardfault_args[7]; //XPSR
    //volatile, so the fault status registers are really read
    *rtcBkpReg++ = *((volatile unsigned long *)0xE000ED28); //CFSR
    *rtcBkpReg++ = *((volatile unsigned long *)0xE000EFA8); //ABFSR
    //*rtcBkpReg = *((volatile unsigned long *)0xE000ED2C); //HFSR ?

    ////NVIC_SystemReset();
    while (1) {
        __NOP();
    }
}
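For reference, the indices 5, 6 and 7 used above follow the basic 8-word Cortex-M exception stack frame (R0, R1, R2, R3, R12, LR, PC, xPSR). The following is only a sketch of that layout, assuming no FPU context is stacked; the names FRAME_*, CrashRecord and crash_record_from_frame are made up for illustration and are not part of my project. It uses uint32_t instead of unsigned long so it also behaves the same when compiled on a 64-bit host.

```c
#include <stdint.h>

/* Word offsets within the basic exception stack frame (lowest address first).
 * Assumption: basic frame only, no extended FPU frame. */
enum {
    FRAME_R0 = 0,
    FRAME_R1,
    FRAME_R2,
    FRAME_R3,
    FRAME_R12,
    FRAME_LR,    /* index 5: LR of the interrupted code */
    FRAME_PC,    /* index 6: stacked return address */
    FRAME_XPSR,  /* index 7: stacked program status register */
    FRAME_WORDS  /* number of words in the basic frame (8) */
};

/* Hypothetical record holding the three stacked values the handler saves. */
typedef struct {
    uint32_t lr;
    uint32_t pc;
    uint32_t xpsr;
} CrashRecord;

/* Copy LR, PC and xPSR out of a stacked frame (pointer taken from MSP or PSP). */
static CrashRecord crash_record_from_frame(const uint32_t *frame)
{
    CrashRecord rec;
    rec.lr   = frame[FRAME_LR];
    rec.pc   = frame[FRAME_PC];
    rec.xpsr = frame[FRAME_XPSR];
    return rec;
}
```

With named offsets, a wrong index (e.g. reading R12 instead of LR) is easier to spot than with bare numbers.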
Remark: I store some data from the stack frame in the RTC backup registers so that it survives a press of the Reset button (but not a power cycle, which clears the RTC backup registers if power is interrupted long enough).
So, after a reset I just print what was recorded (via the command "cr"):
void SYSCFG_printCrashInfo(EResultOut out)
{
    uint32_t *rtcBkpReg = (uint32_t *)&RTC_START_BKP_REG;
    rtcBkpReg += 15; //skip the syscfg

    /* we print the RTC backup registers */
    print_log(out, " 0 : 0x%08lx\r\n", *rtcBkpReg++);
    print_log(out, " 1 : 0x%08lx\r\n", *rtcBkpReg++);
    print_log(out, " 2 : 0x%08lx\r\n", *rtcBkpReg++);
    print_log(out, " 3 : 0x%08lx\r\n", *rtcBkpReg++);
    print_log(out, " 4 : 0x%08lx\r\n", *rtcBkpReg++);
}
All works fine in the debugger: when I set a breakpoint on the faulting code and step through it, everything looks reasonable (and correct).
But when I run at full speed (without the debugger), the recorded PC is completely different. It does not make sense and does not help me find the faulting location in the code.
I can make it "more reasonable" with the __NOP() loop right after the faulting instruction: now the reported PC is inside the __NOP() loop. So this is a clear indication that the HardFault is "asynchronous" (it arrives "much" later, and the recorded PC is not the one that caused the fault).
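The recorded CFSR value (entry 3 in the printout) should confirm this: in the BusFault part of CFSR, bit 9 is PRECISERR and bit 10 is IMPRECISERR. Below is a minimal sketch of how the recorded word could be classified after the reset. The bit positions are the architectural ones; the enum and the function name classify_bus_fault are made up for illustration, and checking the imprecise bit first is my own choice.

```c
#include <stdint.h>

/* BusFault status bits inside CFSR (BFSR occupies CFSR bits 15:8). */
#define CFSR_IBUSERR     (1UL << 8)   /* instruction bus error */
#define CFSR_PRECISERR   (1UL << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1UL << 10)  /* imprecise data bus error */
#define CFSR_UNSTKERR    (1UL << 11)  /* fault on exception return unstacking */
#define CFSR_STKERR      (1UL << 12)  /* fault on exception entry stacking */
#define CFSR_BFARVALID   (1UL << 15)  /* BFAR holds the faulting address */

#define CFSR_BFSR_MASK   (0xFFUL << 8)

/* Hypothetical classification of a recorded CFSR value. */
typedef enum {
    BUS_FAULT_NONE,      /* no BusFault bit set */
    BUS_FAULT_PRECISE,   /* stacked PC points at the faulting instruction */
    BUS_FAULT_IMPRECISE, /* stacked PC is somewhere after the faulting store */
    BUS_FAULT_OTHER      /* some other BusFault bit (IBUSERR, STKERR, ...) */
} BusFaultKind;

static BusFaultKind classify_bus_fault(uint32_t cfsr)
{
    /* Imprecise is checked first: if it is set, the stacked PC cannot be
     * assumed to be the location of the faulting access. */
    if (cfsr & CFSR_IMPRECISERR) {
        return BUS_FAULT_IMPRECISE;
    }
    if (cfsr & CFSR_PRECISERR) {
        return BUS_FAULT_PRECISE;
    }
    if (cfsr & CFSR_BFSR_MASK) {
        return BUS_FAULT_OTHER;
    }
    return BUS_FAULT_NONE;
}
```

A recorded CFSR of 0x00000400 would classify as BUS_FAULT_IMPRECISE, while 0x00008200 (PRECISERR plus BFARVALID) would classify as BUS_FAULT_PRECISE.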
How to make the HardFault a synchronous event?