
How to diagnose a Hard Fault Exception on STM32F407IGT

SMali.3
Associate III

Hello

After running the code for about one to two hours I always get a Hard Fault exception. The registers read out in the Hard Fault handler's while loop are:

HFSR=0x4000 0000

CFSR=0x8200

BFAR=0x20020000

MMFAR=0x20020000

AFSR=0

 

Readout of the SP register shows:

SP=0x2001ff40

*(SP)=8

*(SP-1)=8

*(SP-2)=1

*(SP-3)=2

*(SP-4)=2

*(SP-5)=2

*(SP-6)=0

*(SP-7)=0

*(SP-8)=0x2001ffc0

*(SP-9)=0x8012ae8

What is going on here? How can I recover properly from this situation?

 

8 REPLIES 8
BarryWhit
Senior III

KB: How to debug a HardFault on an Arm Cortex®-M STM32 

https://interrupt.memfault.com/blog/cortex-m-hardfault-debug

 

You can use CubeIDE's integrated hard fault analyzer to get a friendlier view of the state.

You can use CubeIDE's build analyzer to find which function lives at a certain address (this doesn't require an active debug session, unlike the disassembly view).

 

 

Possibly (If I've decoded the data correctly), you have a divide-by-zero error occurring at 0x8012ae8.

- If someone's post helped resolve your issue, please thank them by clicking "Accept as Solution".
- Please post an update with details once you've solved your issue. Your experience may help others.

Thanks for the fast reply.

I do not use CubeIDE for this project, I use Atollic TrueSTUDIO.

How did you arrive at the idea that it is a divide-by-zero problem?

I mean:

HFSR=0x4000 0000 -> I have a FORCED hard fault

CFSR=0x0000 8200 -> PRECISERR and BFARVALID, which means the address in BFAR is valid

BFAR=0x20020000

I assume there was an access to this location, presumably a read. In my linker .ld file I have: _estack = 0x20020000

Is this connected in some way?

Also, I do not have any code at address 0x8012ae8. According to the .list file and the settings in the .ld file, my code starts at 0x08020000.
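The manual decoding above can be cross-checked with a small helper. This is only a sketch: the function and table names (`decode_cfsr`, `hfsr_is_forced`, `cfsr_bits`) are made up for illustration, while the bit positions are the ARMv7-M ones shared by Cortex-M3 and Cortex-M4. It takes raw register values as arguments, so it can be run on a PC against values copied from the debugger. Note also that on the STM32F407 the main SRAM ends at 0x2001FFFF, so a BFAR of 0x20020000 is the first address past the end of RAM, the same value as `_estack`.

```c
#include <stddef.h>
#include <stdint.h>

/* One named flag in the Configurable Fault Status Register (CFSR). */
typedef struct {
    uint32_t    mask;
    const char *name;
} fault_bit_t;

/* CFSR = MMFSR (bits 7:0) | BFSR (bits 15:8) | UFSR (bits 31:16). */
static const fault_bit_t cfsr_bits[] = {
    { UINT32_C(1) << 0,  "IACCVIOL"    },
    { UINT32_C(1) << 1,  "DACCVIOL"    },
    { UINT32_C(1) << 3,  "MUNSTKERR"   },
    { UINT32_C(1) << 4,  "MSTKERR"     },
    { UINT32_C(1) << 5,  "MLSPERR"     },
    { UINT32_C(1) << 7,  "MMARVALID"   },
    { UINT32_C(1) << 8,  "IBUSERR"     },
    { UINT32_C(1) << 9,  "PRECISERR"   },
    { UINT32_C(1) << 10, "IMPRECISERR" },
    { UINT32_C(1) << 11, "UNSTKERR"    },
    { UINT32_C(1) << 12, "STKERR"      },
    { UINT32_C(1) << 13, "LSPERR"      },
    { UINT32_C(1) << 15, "BFARVALID"   },
    { UINT32_C(1) << 16, "UNDEFINSTR"  },
    { UINT32_C(1) << 17, "INVSTATE"    },
    { UINT32_C(1) << 18, "INVPC"       },
    { UINT32_C(1) << 19, "NOCP"        },
    { UINT32_C(1) << 24, "UNALIGNED"   },
    { UINT32_C(1) << 25, "DIVBYZERO"   },
};

/* Stores the names of the set CFSR flags (lowest bit first) in names[]
 * and returns how many were stored, at most max_names. */
int decode_cfsr(uint32_t cfsr, const char **names, int max_names)
{
    int count = 0;
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; ++i) {
        if ((cfsr & cfsr_bits[i].mask) != 0 && count < max_names) {
            names[count++] = cfsr_bits[i].name;
        }
    }
    return count;
}

/* HFSR bit 30 (FORCED): a configurable fault was escalated to HardFault. */
int hfsr_is_forced(uint32_t hfsr)
{
    return (hfsr & (UINT32_C(1) << 30)) != 0;
}
```

With the values from this thread, `decode_cfsr(0x8200, ...)` reports PRECISERR and BFARVALID, and `hfsr_is_forced(0x40000000)` is true, which matches the reading above: a precise bus fault on a data access to the address held in BFAR, escalated to a hard fault.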

Pavel A.
Evangelist III

TrueSTUDIO has a fault analyzer, the same as in CubeIDE.

I do not have any code at address 0x8012ae8.

This is likely the culprit. A stack overwrite?

How did you arrive at the idea that it is a divide-by-zero problem?

My mistake. I searched for the CM4 CFSR bit definitions but got the CM3 page instead.

BarryWhit
Senior III

Isn't your stack dump showing the wrong addresses? The stack on the Cortex-M4 grows downwards, so to see what the exception pushed onto the stack (especially the PC) you should be looking at SP+n, not at SP-n. That's why the only value that looks like a code address doesn't make sense. The stacked PC should be available at *((uint32_t*)SP + 6), unless I'm wrong again.

 

That's why it's simpler to just use the hard fault analyzer or the GUI debugger, which avoids all these easy-to-make mistakes.
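For anyone who does want to read the stacked registers by hand, here is a minimal sketch of the layout. The struct and function names (`exception_frame_t`, `read_exception_frame`) are hypothetical. It assumes `sp` is the stack pointer value exactly as it was at exception entry, before the handler's own prologue pushed anything, and that it is the stack that was actually in use (MSP or PSP). The eight words shown are the basic frame; if the FPU context was active, further words follow them, but these first eight stay in the same place.

```c
#include <stdint.h>

/* Basic frame pushed by a Cortex-M3/M4 on exception entry.
 * The first member sits at the lowest address, i.e. at SP + 0. */
typedef struct {
    uint32_t r0;    /* SP + 0 words */
    uint32_t r1;    /* SP + 1 */
    uint32_t r2;    /* SP + 2 */
    uint32_t r3;    /* SP + 3 */
    uint32_t r12;   /* SP + 4 */
    uint32_t lr;    /* SP + 5: return address of the interrupted function */
    uint32_t pc;    /* SP + 6: address of the faulting instruction (precise faults) */
    uint32_t xpsr;  /* SP + 7 */
} exception_frame_t;

/* Copies the stacked registers, reading upwards from sp (SP + n, not SP - n). */
exception_frame_t read_exception_frame(const uint32_t *sp)
{
    exception_frame_t frame = {
        sp[0], sp[1], sp[2], sp[3],
        sp[4], sp[5], sp[6], sp[7]
    };
    return frame;
}
```

Because the function only reads memory through a pointer, it can be tried on a PC with an array of words copied from a memory dump. On the target, the pointer would have to be captured at the very start of the fault handler, typically in a few lines of assembly, before any C code changes SP.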

TDK
Guru

Looking at the call stack when the error happens can give you insight into whether it's a stack overflow. If stack variables are corrupted, an out-of-bounds write is likely at fault.

Does your code do dynamic memory allocation? (malloc/free)
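One common way to check for a stack overflow when the fault takes hours to appear is stack painting: fill the unused part of the stack with a known pattern at startup and later count how much of the pattern is still intact. The sketch below is an illustration of that idea, not something from this project. The names `STACK_PAINT`, `stack_paint` and `stack_unused_words` are made up, and the region bounds are passed in as plain pointers so the code can be tried on a PC with an ordinary array. On the target, `low` would be the lowest stack address from the linker script, and painting must stop below the current SP so the running function's own frame is not overwritten.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill pattern; any value unlikely to occur in real data will do. */
#define STACK_PAINT UINT32_C(0xDEADBEEF)

/* Fills the words in [low, high) with the paint pattern. */
void stack_paint(uint32_t *low, uint32_t *high)
{
    for (uint32_t *p = low; p < high; ++p) {
        *p = STACK_PAINT;
    }
}

/* Counts the untouched words starting from the low end of the region.
 * The stack grows downwards, so the paint is consumed from the high end;
 * a result of 0 means the stack has reached (or gone past) its limit. */
size_t stack_unused_words(const uint32_t *low, const uint32_t *high)
{
    size_t unused = 0;
    while (low + unused < high && low[unused] == STACK_PAINT) {
        ++unused;
    }
    return unused;
}
```

Calling `stack_unused_words` periodically, or from the fault handler, gives the smallest margin the stack ever had. A margin that shrinks to zero points to an overflow; a healthy margin points instead to an out-of-bounds write somewhere else.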

If you feel a post has answered your question, please click "Accept as Solution".

OK, I did it wrong: I was decrementing instead of incrementing. I will correct that in my code.

Yes, I will continue debugging this problem with the fault analyzer. I did not even know that such a tool existed. Thank you all for sharing this with me.

I will be able to work on the system on Friday and I hope I will have more information about this exception.

SMali.3
Associate III

I allocate a small amount of memory with malloc at the initialization stage, and it is never released.