2024-08-13 11:41 PM
Hello
After running the code for about one to two hours, I always get a HardFault exception. The register values read inside the HardFault handler's while loop are:
HFSR=0x4000 0000
CFSR=0x8200
BFAR=0x20020000
MMFAR=0x20020000
AFSR=0
Reading SP and the memory around it shows:
SP=0x2001ff40
*(SP)=8
*(SP-1)=8
*(SP-2)=1
*(SP-3)=2
*(SP-4)=2
*(SP-5)=2
*(SP-6)=0
*(SP-7)=0
*(SP-8)=0x2001ffc0
*(SP-9)=0x8012ae8
What is going on here? How can I properly recover from this situation?
2024-10-02 01:51 AM
Problem with this exception was solved.
The cause was EMC interference from a DC/DC converter in close proximity to the board with this microcontroller. After replacing the DC/DC converter with a different one, the problem was gone.
2024-08-14 12:57 AM - edited 2024-08-14 04:18 AM
KB: How to debug a HardFault on an Arm Cortex®-M STM32
https://interrupt.memfault.com/blog/cortex-m-hardfault-debug
You can use the Hard Fault analyzer integrated in CubeIDE to get a friendlier view of the fault state.
You can use the CubeIDE Build Analyzer to find which function lives at a certain address (unlike the disassembly view, this doesn't require an active debug session).
Possibly (if I've decoded the data correctly), you have a divide-by-zero error occurring at 0x8012ae8.
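Looking up "which function lives at a certain address" boils down to finding the function whose address range contains that address. A minimal sketch of that search, assuming a hypothetical symbol table sorted by start address (in practice it would be built from the .map file or from `arm-none-eabi-nm -n -S` output, not hard-coded); `symbol_t` and `find_symbol` are illustrative names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry. Thumb function symbols usually have
 * bit 0 set in the ELF/nm output; clear it when filling `start`. */
typedef struct {
    uint32_t    start; /* first address of the function */
    uint32_t    size;  /* size in bytes */
    const char *name;
} symbol_t;

/* Binary search over non-overlapping entries sorted by start address.
 * Returns the entry whose [start, start + size) range contains addr,
 * or NULL if no function covers it. */
const symbol_t *find_symbol(const symbol_t *table, size_t count, uint32_t addr)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < table[mid].start) {
            hi = mid;                 /* address is below this function */
        } else if (addr >= table[mid].start + table[mid].size) {
            lo = mid + 1;             /* address is above this function */
        } else {
            return &table[mid];       /* address falls inside this function */
        }
    }
    return NULL;
}
```

With a table whose first entry starts at 0x08020000, an address such as 0x8012ae8 returns NULL, i.e. there is no application code at that address.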
2024-08-14 02:15 AM
Thanks for the fast reply.
I do not use CubeIDE for this project; I use Atollic TrueSTUDIO.
How did you get the idea that it is a divide-by-zero problem?
I mean:
HFSR=0x4000 0000 -> I have a FORCED hard fault
CFSR=0x0000 8200 -> PRECISERR and BFARVALID, which means the address in BFAR is valid
BFAR=0x20020000
I assume there was an access to this location, presumably a read. In my linker .ld file I have: _estack = 0x20020000
Is this connected in some way?
Also, I do not have any code at address 0x8012ae8. According to the .list file and the settings in the .ld file, my code starts at 0x08020000.
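The decoding above can be done mechanically. A minimal C sketch, assuming an ARMv7-M (Cortex-M3/M4) core: it names the HFSR/CFSR bits discussed in this thread and flags a precise bus error at or beyond the end of RAM. Treating `_estack` as the end of RAM matches the .ld value quoted above but is an assumption about this project; `decode_fault` is a hypothetical helper, not part of any ST or CMSIS API. Note that for CFSR=0x8200 only bit 9 (PRECISERR) and bit 15 (BFARVALID) are set; DIVBYZERO is CFSR bit 25 and is clear.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit masks from the ARMv7-M System Control Block (Cortex-M3/M4).
 * CFSR = UFSR (bits 31..16) | BFSR (bits 15..8) | MMFSR (bits 7..0). */
#define HFSR_FORCED      (1u << 30) /* configurable fault escalated to HardFault */
#define CFSR_MMARVALID   (1u << 7)  /* MMFAR holds a valid address */
#define CFSR_PRECISERR   (1u << 9)  /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10) /* imprecise data bus error */
#define CFSR_STKERR      (1u << 12) /* bus fault while stacking on exception entry */
#define CFSR_BFARVALID   (1u << 15) /* BFAR holds a valid address */
#define CFSR_DIVBYZERO   (1u << 25) /* UFSR bit 9; only raised if CCR.DIV_0_TRP is set */

/* Prints a summary (assumes puts/printf are retargeted, e.g. to a UART)
 * and returns true for a precise bus error whose address is >= ram_end. */
bool decode_fault(uint32_t hfsr, uint32_t cfsr, uint32_t bfar, uint32_t ram_end)
{
    if (hfsr & HFSR_FORCED)        puts("HFSR.FORCED: escalated configurable fault");
    if (cfsr & CFSR_PRECISERR)     puts("CFSR.PRECISERR: precise data bus error");
    if (cfsr & CFSR_IMPRECISERR)   puts("CFSR.IMPRECISERR: imprecise data bus error");
    if (cfsr & CFSR_STKERR)        puts("CFSR.STKERR: fault while stacking the exception frame");
    if (cfsr & CFSR_DIVBYZERO)     puts("CFSR.DIVBYZERO: integer divide by zero");
    if (!(cfsr & CFSR_MMARVALID))  puts("MMARVALID clear: MMFAR value is not meaningful");

    bool bad_addr = (cfsr & CFSR_PRECISERR) && (cfsr & CFSR_BFARVALID) && bfar >= ram_end;
    if (bad_addr) {
        printf("BFAR=0x%08lX is at or past the end of RAM (0x%08lX)\n",
               (unsigned long)bfar, (unsigned long)ram_end);
    }
    return bad_addr;
}
```

On target, the arguments would come from the CMSIS registers `SCB->HFSR`, `SCB->CFSR` and `SCB->BFAR`, e.g. `decode_fault(SCB->HFSR, SCB->CFSR, SCB->BFAR, 0x20020000u)`.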
2024-08-14 02:51 AM
TrueSTUDIO has the same fault analyzer as CubeIDE.
> I do not have any code on address 0x8012ae8.
This is likely the culprit. A stack overwrite?
2024-08-14 03:30 AM
> How did you get to idea that it is a divide-by-zero problem?
My mistake. I searched for the CM4 CFSR bit definitions but landed on the CM3 page instead.
2024-08-14 03:49 AM - edited 2024-08-14 06:01 AM
Isn't your stack dump showing the wrong addresses? The stack (in Cortex-M4) grows downwards. If you want to see what was pushed onto the stack by the exception (esp. the PC), you should be looking at SP+n, not at SP-n. That's why the only value that looks like a code address doesn't make sense (the PC should be available at *((uint32_t*)SP + 6), unless I'm wrong again).
That's why it's simpler to just make use of the Hard fault analyzer / GUI debugger, avoiding all these easy-to-make mistakes.
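For reference, a sketch of reading the stacked frame at SP+n in C, assuming a Cortex-M4 built with GCC. The enum follows the architectural layout of the basic exception frame; `stacked_pc`, `stacked_lr` and `hard_fault_c` are hypothetical names, and the naked `HardFault_Handler` that captures MSP/PSP before anything else is pushed is a widely used pattern, not code from this project. The target-only part is guarded so the helpers also compile on a host.

```c
#include <stdint.h>

/* Word offsets of the basic frame pushed by Cortex-M3/M4 hardware on
 * exception entry. The stack grows downward, so after stacking SP points
 * at R0 and the rest of the frame sits at SP+1 .. SP+7 (higher addresses).
 * With the FPU extended frame these 8 words are unchanged; the FP
 * registers follow above them. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3, FRAME_R12,
    FRAME_LR,   /* SP + 5 */
    FRAME_PC,   /* SP + 6: faulting instruction for precise faults */
    FRAME_XPSR  /* SP + 7 */
};

/* Offset the pointer first, then dereference: same as *(sp + 6). */
uint32_t stacked_pc(const uint32_t *sp) { return sp[FRAME_PC]; }
uint32_t stacked_lr(const uint32_t *sp) { return sp[FRAME_LR]; }

#if defined(__ARM_ARCH_7EM__) /* target-only part (Cortex-M4, GCC) */
/* Hypothetical C handler: receives the SP captured before it pushed anything. */
__attribute__((used)) void hard_fault_c(const uint32_t *sp)
{
    volatile uint32_t pc = stacked_pc(sp); /* inspect these in the debugger */
    volatile uint32_t lr = stacked_lr(sp);
    (void)pc;
    (void)lr;
    for (;;) { }
}

/* Naked handler: bit 2 of EXC_RETURN (in LR) selects MSP (0) or PSP (1). */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4       \n"
        "ite   eq           \n"
        "mrseq r0, msp      \n"
        "mrsne r0, psp      \n"
        "b     hard_fault_c \n");
}
#endif
```

Inside the while loop of an ordinary C handler, SP may already be below the exception frame because the compiler pushed registers on entry, which is one more way a manual readout can be off.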
2024-08-14 05:09 AM
Looking at the call stack when the error happens can give you insight into whether it's a stack overflow. If stack variables are corrupted, an out-of-bounds write is likely at fault.
Does your code do dynamic memory allocation (malloc/free)?
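One common way to check for a stack overflow, or for a stack/heap collision (in many GCC-based STM32 projects the `malloc` heap grows upward toward the stack), is stack painting: fill the unused stack with a known pattern early on and later count how much of it is still intact. A minimal sketch; the fill value and function names are hypothetical, and on target `bottom`/`top` would come from linker symbols whose names depend on your .ld file.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu /* arbitrary fill pattern (an assumption) */

/* Fill [bottom, top) with the pattern. On target, only paint the region
 * below the current SP, never the frames that are already in use. */
void stack_paint(uint32_t *bottom, uint32_t *top)
{
    for (uint32_t *p = bottom; p < top; ++p) {
        *p = STACK_FILL;
    }
}

/* The stack grows downward from `top`, so untouched words remain at the
 * bottom. Returns how many words from the bottom still hold the pattern,
 * i.e. the remaining headroom; 0 means the stack reached (or passed) bottom. */
size_t stack_headroom_words(const uint32_t *bottom, const uint32_t *top)
{
    size_t n = 0;
    while (bottom + n < top && bottom[n] == STACK_FILL) {
        ++n;
    }
    return n;
}
```

Checking the headroom periodically (for example from the main loop) shows whether it shrinks toward zero before the fault appears.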
2024-08-14 05:54 AM
OK, I had it wrong: I was decrementing instead of incrementing. I will correct that in my code.
Yes, I will use the fault analyzer when debugging this problem. I did not even know that such a tool existed. Thanks to all of you for sharing this with me.
I will be able to work on the system on Friday, and I hope to have more information about this exception then.
2024-08-14 06:01 AM
I allocate a small amount of memory with malloc at the initialization stage, and it is never released.