2021-05-07 09:32 AM
My code, which is in progress, stopped working with a hard fault. This is on an STM32G431 with several timers running and two DMA channels. I can single-step through the code with breakpoints on the timer IRQs and the DMA IRQs, and it works for several minutes, then jumps to the hard fault.
I've been trying to use an App note from Segger, AN00016, to debug this.
Here are the general registers:
General Registers General Purpose and FPU Register Group
r0 0x1af0 (Hex)
r1 0x48000000 (Hex)
r2 1
r3 0x602b5300 (Hex)
r4 0x1eb (Hex)
r5 0
r6 8
r7 0x40013400 (Hex)
r8 0x400 (Hex)
r9 0x800 (Hex)
r10 0x100 (Hex)
r11 1
r12 0
sp 0x20007ed0
lr 0xfffffff1 (Hex)
pc 0x800413c <HardFault_Handler>
xpsr 1627389955 (0x61000003)
msp 536903376 (0x20007ed0)
psp 0
Per the app note the lr register bit 2 is 0, so the main stack is reporting the fault information. The sp is at 0x20007ed0 and the stack contents are:
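The lr check above can be written down as a small decoder. This is only a sketch based on the Cortex-M4 EXC_RETURN bit layout; the function and macro names are made up for illustration and do not come from the app note.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN is the value the core places in LR on exception entry. */
#define EXC_RETURN_BASIC_FRAME (1u << 4) /* 1 = 8-word frame, 0 = FPU state stacked too */
#define EXC_RETURN_THREAD      (1u << 3) /* 1 = Thread mode was interrupted, 0 = a handler was */
#define EXC_RETURN_PSP         (1u << 2) /* 1 = frame is on the PSP, 0 = frame is on the MSP */

/* True if the fault frame was pushed on the process stack. */
bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_PSP) != 0u;
}

/* True if the faulting code was running in Thread mode (not inside an ISR). */
bool exc_return_from_thread(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_THREAD) != 0u;
}

/* True if only the basic 8-word frame (r0-r3, r12, lr, pc, xPSR) was stacked. */
bool exc_return_basic_frame(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_BASIC_FRAME) != 0u;
}
```

For lr = 0xfffffff1 this reports the MSP, a basic 8-word frame, and a return to Handler mode, which would mean the fault was raised while another exception handler was running.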
memory      value     app note ref
0x20007ED0  00001AF0  r0
0x20007ED4  48000000  r1
0x20007ED8  00000001  r2
0x20007EDC  602B5300  r3
0x20007EE0  00000000  r12
0x20007EE4  080086B7  lr
0x20007EE8  602B5300  pc
0x20007EEC  6000000F  xPSR
Per the app note the first four values are r0-r3 and the fifth value should be r12.
If I use this, the previous pc value is 0x602B5300. This is not in code; it is in the FSMC bank 1 memory area, which puts me at a dead end ... ;(
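For reference, here is a rough sketch of decoding the stacked frame, including the xPSR word. The struct and function names are illustrative only. With the values above, the stacked xPSR of 0x6000000F decodes to exception number 15 (SysTick on Cortex-M) with the T bit clear, which would be consistent with a branch to an even address from inside that handler, though that is an inference and not something the registers prove.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic exception frame, in the order the core stacks it (lowest address first). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Bits 8:0 of xPSR (IPSR): the exception active when the frame was pushed.
 * 0 means Thread mode, 3 is HardFault, 15 is SysTick. */
uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/* Bit 24 of xPSR (T bit). It must be 1 for code to execute on Cortex-M. */
bool xpsr_thumb_bit(uint32_t xpsr)
{
    return (xpsr & (1u << 24)) != 0u;
}

/* Copy the eight stacked words, starting at the stacked SP, into the struct. */
fault_frame_t fault_frame_from_words(const uint32_t words[8])
{
    fault_frame_t f;
    f.r0   = words[0];
    f.r1   = words[1];
    f.r2   = words[2];
    f.r3   = words[3];
    f.r12  = words[4];
    f.lr   = words[5];
    f.pc   = words[6];
    f.xpsr = words[7];
    return f;
}
```

On the target, the `words` pointer would be the stacked SP (0x20007ed0 here); on a PC the same functions can be fed values copied from the debugger's memory view.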
Is there any guidance on how to better debug a hard fault? I realize it is most likely an uninitialized pointer, a peripheral access to invalid memory, or a buffer overrun. I'm looking for these things, but it is a needle in a haystack. I was looking for a procedure that is a little more deterministic.
Any suggestions are appreciated.
Thanks,
2021-05-07 10:03 AM
There are routines I've posted to automatically output register/stack content.
PC suggests you perhaps popped something off the stack, or did a 'blx r3' using ASCII data, or something else unhelpful.
Look at what subroutine LR suggests is the origin, and what function is being called.
Walk back up the stack identifying pointers and subroutines (PC and LR pushes); this might help you understand the call tree and the parameters passed.
Walk your own code to understand the flow/logic.
Add sanity checking in the routines/logic implicated.
Add telemetry output so you can establish flow, stack depth, and general integrity as it approaches the fault.
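Walking the stack for pushed return addresses can be partly automated. The following is a minimal sketch, not the routines referred to above: it flags stack words that are odd (Thumb bit set) and fall inside the flash range. The function names and the flash base/size parameters are assumptions for illustration; take the real bounds from your linker script.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A pushed LR normally has bit 0 set (Thumb) and points into flash. */
bool looks_like_return_address(uint32_t word, uint32_t flash_base, uint32_t flash_size)
{
    return (word & 1u) != 0u &&
           word >= flash_base &&
           (word - flash_base) < flash_size;
}

/* Scan n_words of stack and record the indices of candidate return addresses.
 * Returns how many indices were written to out_idx (at most out_max). */
size_t find_return_candidates(const uint32_t *stack, size_t n_words,
                              uint32_t flash_base, uint32_t flash_size,
                              size_t *out_idx, size_t out_max)
{
    size_t found = 0;
    for (size_t i = 0; i < n_words && found < out_max; ++i) {
        if (looks_like_return_address(stack[i], flash_base, flash_size)) {
            out_idx[found++] = i;
        }
    }
    return found;
}
```

Candidates are only hints: odd constants that happen to land in the flash range will also match, so each hit still has to be checked against the disassembly or map file.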
2021-05-07 11:25 AM
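The SCB fault status registers (CFSR and HFSR, plus MMFAR and BFAR) will typically provide the best info in the case of an uninitialized pointer or reading out of bounds. They will likely tell you the type of fault and, when the corresponding address-valid bit is set, the address of the offending access.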
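As a rough illustration, the CFSR value (read from SCB->CFSR at 0xE000ED28 in the debugger or in the fault handler) can be turned into flag names like this. The bit positions follow the ARMv7-M definition of CFSR; the table and function names are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} fault_flag_t;

/* CFSR = MMFSR (bits 7:0) | BFSR (bits 15:8) | UFSR (bits 31:16). */
static const fault_flag_t cfsr_flags[] = {
    {1u << 0,  "IACCVIOL"},   {1u << 1,  "DACCVIOL"},
    {1u << 3,  "MUNSTKERR"},  {1u << 4,  "MSTKERR"},
    {1u << 5,  "MLSPERR"},    {1u << 7,  "MMARVALID"},
    {1u << 8,  "IBUSERR"},    {1u << 9,  "PRECISERR"},
    {1u << 10, "IMPRECISERR"},{1u << 11, "UNSTKERR"},
    {1u << 12, "STKERR"},     {1u << 13, "LSPERR"},
    {1u << 15, "BFARVALID"},  {1u << 16, "UNDEFINSTR"},
    {1u << 17, "INVSTATE"},   {1u << 18, "INVPC"},
    {1u << 19, "NOCP"},       {1u << 24, "UNALIGNED"},
    {1u << 25, "DIVBYZERO"},
};

/* Write the names of all set flags into out (up to out_max); return the count. */
size_t cfsr_flag_names(uint32_t cfsr, const char **out, size_t out_max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; ++i) {
        if ((cfsr & cfsr_flags[i].mask) != 0u && n < out_max) {
            out[n++] = cfsr_flags[i].name;
        }
    }
    return n;
}
```

MMFAR and BFAR only hold a meaningful address when MMARVALID or BFARVALID is reported, so check those flags before trusting either register.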
2021-05-07 01:07 PM
One way to proceed is to have a look at the instruction before the address in lr (which has its LSB set, as Cortex-M runs permanently in Thumb mode). That is a subroutine call, and the target of that call is the routine which caused the problem. I'm quite willing to bet that it's a bx r3, and the result of a function pointer call. From the mixed source/disasm view, find out in which routine that is.
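To find that instruction, a small helper like the following can do the address arithmetic. It is only a sketch with made-up function names: it clears the Thumb bit and steps back by the size of the call, which is 2 bytes for a register call such as blx r3 and 4 bytes for a direct bl.

```c
#include <stdint.h>

/* Address execution would have returned to: LR with the Thumb bit cleared. */
uint32_t return_address_from_lr(uint32_t lr)
{
    return lr & ~1u;
}

/* Call site if the call was a 16-bit "blx <reg>" (function pointer call). */
uint32_t call_site_if_blx_reg(uint32_t lr)
{
    return return_address_from_lr(lr) - 2u;
}

/* Call site if the call was a 32-bit "bl <label>" (direct call). */
uint32_t call_site_if_bl(uint32_t lr)
{
    return return_address_from_lr(lr) - 4u;
}
```

With the stacked lr of 0x080086B7, the return address is 0x080086B6, so the disassembly to inspect is at 0x080086B4 (register call) or 0x080086B2 (direct call).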
JW