How to diagnose a Hard Fault Exception on STM32F407IGT
‎2024-08-13 11:41 PM
Hello
After the code has been running for about one to two hours, I always get a Hard Fault exception. The registers read out in the Hard Fault while loop are:
HFSR=0x40000000
CFSR=0x8200
BFAR=0x20020000
MMFAR=0x20020000
AFSR=0
Readout of the SP register shows:
SP=0x2001ff40
*(SP)=8
*(SP-1)=8
*(SP-2)=1
*(SP-3)=2
*(SP-4)=2
*(SP-5)=2
*(SP-6)=0
*(SP-7)=0
*(SP-8)=0x2001ffc0
*(SP-9)=0x8012ae8
What is going on here? How can I recover properly from this situation?
- Labels: STM32F4 Series
Accepted Solutions
‎2024-10-02 1:51 AM
The problem with this exception has been solved.
The cause was EMC interference from a DC/DC converter in close proximity to the board carrying this microcontroller. After replacing the DC/DC converter with a different one, the problem was gone.
‎2024-08-14 12:57 AM - edited ‎2024-08-14 4:18 AM
KB: How to debug a HardFault on an Arm Cortex®-M STM32
https://interrupt.memfault.com/blog/cortex-m-hardfault-debug
You can use the hard fault analyzer integrated in CubeIDE to get a friendlier view of the state.
You can use the CubeIDE build analyzer to find which function lives at a certain address (this doesn't require an active debug session, unlike the disassembly view).
Possibly (if I've decoded the data correctly), you have a divide-by-zero error occurring at 0x8012ae8.
- Please post an update with details once you've solved your issue. Your experience may help others.
‎2024-08-14 2:15 AM
Thanks for the fast reply.
I do not use CubeIDE for this project, I use Atollic TrueSTUDIO.
How did you arrive at the idea that it is a divide-by-zero problem?
I mean:
HFSR=0x40000000 -> I have a FORCED hard fault
CFSR=0x00008200 -> PRECISERR and BFARVALID are set, which means the address in BFAR is valid
BFAR=0x20020000
I assume there was an access to this location, presumably a read. In my linker .ld file I have: _estack = 0x20020000
Is this connected in some way?
Also, I do not have any code at address 0x8012ae8. According to the .list file and the settings in the .ld file, my code starts at 0x08020000.
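The register decoding discussed above can be sketched in C so it is easy to re-check by hand. This is a minimal sketch, not part of the original posts: the bit positions are the ARMv7-M (Cortex-M3/M4) fault status definitions, while the names `fault_summary_t`, `decode_fault` and `addr_in_main_sram` are hypothetical, and the SRAM range is an assumption inferred from `_estack = 0x20020000` in the linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fault status bit positions as defined for ARMv7-M (Cortex-M3/M4). */
#define HFSR_FORCED     (1UL << 30) /* configurable fault escalated to HardFault */
#define CFSR_MMARVALID  (1UL << 7)  /* MMFAR holds a valid address */
#define CFSR_PRECISERR  (1UL << 9)  /* precise data bus error */
#define CFSR_BFARVALID  (1UL << 15) /* BFAR holds a valid address */
#define CFSR_DIVBYZERO  (1UL << 25) /* integer divide by zero (UsageFault) */

/* ASSUMPTION: main SRAM spans 0x20000000..0x2001FFFF, inferred from
 * _estack = 0x20020000 in the linker script quoted in the thread. */
#define SRAM_START 0x20000000UL
#define SRAM_END   0x20020000UL /* first address past the end of SRAM */

/* Hypothetical summary type holding only the flags discussed in the thread. */
typedef struct {
    bool forced;
    bool precise_bus;
    bool bfar_valid;
    bool mmar_valid;
    bool div_by_zero;
} fault_summary_t;

/* Decode the raw HFSR and CFSR values into individual flags. */
fault_summary_t decode_fault(uint32_t hfsr, uint32_t cfsr)
{
    fault_summary_t s;
    s.forced      = (hfsr & HFSR_FORCED) != 0;
    s.precise_bus = (cfsr & CFSR_PRECISERR) != 0;
    s.bfar_valid  = (cfsr & CFSR_BFARVALID) != 0;
    s.mmar_valid  = (cfsr & CFSR_MMARVALID) != 0;
    s.div_by_zero = (cfsr & CFSR_DIVBYZERO) != 0;
    return s;
}

/* True if addr lies inside the assumed main SRAM range. */
bool addr_in_main_sram(uint32_t addr)
{
    return addr >= SRAM_START && addr < SRAM_END;
}
```

With the values from this thread, `decode_fault(0x40000000, 0x00008200)` reports a forced hard fault caused by a precise bus error with a valid BFAR and no divide-by-zero, and MMARVALID is clear, so the MMFAR value should not be relied upon. `addr_in_main_sram(0x20020000)` is false under the stated assumption, which is consistent with an access to the first word past the end of RAM.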
‎2024-08-14 2:51 AM
TrueSTUDIO has a fault analyzer, the same as in CubeIDE. [video]
> I do not have any code on address 0x8012ae8.
This likely is the culprit. Stack overwrite?
‎2024-08-14 3:30 AM
> How did you get to idea that it is a divide-by-zero problem?
My mistake. I searched for the CM4 CFSR bit definitions but got the CM3 page instead.
‎2024-08-14 3:49 AM - edited ‎2024-08-14 6:01 AM
Isn't your stack dump showing the wrong addresses? The stack (in Cortex-M4) grows downwards. If you want to see what was pushed on the stack by the exception (esp. the PC), you should be looking at SP+n, not at SP-n. That's why the only value that looks like a code address doesn't make sense (the PC should be available at *((uint32_t*)SP + 6), unless I'm wrong again).
That's why it's simpler to just make use of the Hard fault analyzer / GUI debugger, avoiding all these easy-to-make mistakes.
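For anyone who still wants to read the stacked registers by hand, here is a minimal sketch of the basic Cortex-M exception frame layout (eight words, no FPU state). The names `exception_frame_t` and `read_exception_frame` are hypothetical. The pointer passed in is assumed to be the stack pointer value at exception entry (MSP or PSP), not the SP read later inside a C handler, because the handler's own prologue may already have pushed more data.

```c
#include <stdint.h>

/* Basic exception frame as stacked by the core, lowest address first.
 * With FPU state stacking active the frame is longer; that case is not
 * handled in this sketch. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* address of the faulting (or next) instruction */
    uint32_t xpsr;
} exception_frame_t;

/* Copy the eight stacked words starting at frame_sp. The words sit at
 * increasing addresses (SP+0 .. SP+7), so the index is added to the
 * pointer before dereferencing. */
exception_frame_t read_exception_frame(const uint32_t *frame_sp)
{
    exception_frame_t f;
    f.r0   = frame_sp[0];
    f.r1   = frame_sp[1];
    f.r2   = frame_sp[2];
    f.r3   = frame_sp[3];
    f.r12  = frame_sp[4];
    f.lr   = frame_sp[5];
    f.pc   = frame_sp[6];
    f.xpsr = frame_sp[7];
    return f;
}
```

The stacked PC can then be looked up in the .list or .map file to find the function that was executing when the fault occurred.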
‎2024-08-14 5:09 AM
Looking at the call stack when the error happens can give you insight, for example into whether it's a stack overflow. If stack variables are corrupted, an out-of-bounds write is likely at fault.
Does your code do dynamic memory allocation (malloc/free)?
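One common way to test the stack overflow hypothesis on a long-running system is stack painting: fill the unused stack region with a known pattern at startup and later count how much of the pattern is still intact. The following is only a sketch of that idea; the names `paint_stack`, `stack_unused_words` and `STACK_PAINT` are hypothetical, and on a real target the painted region must lie below the current stack pointer so that live data is not overwritten.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFUL /* arbitrary fill pattern */

/* Fill `words` 32-bit words, starting at the lowest stack address,
 * with the paint pattern. */
void paint_stack(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_PAINT;
    }
}

/* Count how many words, starting from the lowest address, still hold
 * the pattern. The stack grows downwards, so usage eats the pattern
 * from the top; a result of 0 means the stack reached its limit. */
size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    while (n < words && lowest[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

Checking the remaining count periodically, or after a fault, shows how close the stack came to its lower limit during the run.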
‎2024-08-14 5:54 AM
OK, I did it wrong: I was decrementing instead of incrementing. I will correct that in my code.
Yes, I will continue debugging this problem with the fault analyzer. I did not even know that such a tool exists. Thank you all for sharing this with me.
I will be able to work on the system on Friday and I hope I will have more information about this exception.
‎2024-08-14 6:01 AM
I allocate a small amount of memory with malloc at the initialization stage, and it is never released.
