How to diagnose a Hard Fault Exception on STM32F407IGT

SMali.3 · ‎2024-08-13

Hello

After the code has been running for about one to two hours, I always get a Hard Fault exception. The registers read out in the Hard Fault while loop are:

HFSR=0x4000 0000

CFSR=0x8200

BFAR=0x20020000

MMFAR=0x20020000

AFSR=0

Readout of the SP register shows:

SP=0x2001ff40

*(SP)=8

*(SP-1)=8

*(SP-2)=1

*(SP-3)=2

*(SP-4)=2

*(SP-5)=2

*(SP-6)=0

*(SP-7)=0

*(SP-8)=0x2001ffc0

*(SP-9)=0x8012ae8

What is going on here? How do I recover properly from this situation?

SMali.3 · ‎2024-10-02

The problem with this exception has been solved.

The cause was EMC interference from a DC/DC converter located close to the board carrying this microcontroller. After replacing the DC/DC converter with a different one, the problem was gone.

BarryWhit · ‎2024-08-14

KB: How to debug a HardFault on an Arm Cortex®-M STM32

https://interrupt.memfault.com/blog/cortex-m-hardfault-debug

You can use the CubeIDE integrated hard fault analyzer to get a friendlier view of the state.

You can use the CubeIDE build analyzer to find which function lives at a certain address (this doesn't require an active debug session, unlike the disassembly view).

~~Possibly (If I've decoded the data correctly), you have a divide-by-zero error occurring at 0x8012ae8.~~

- If someone's post helped resolve your issue, please thank them by clicking "Accept as Solution".
- Please post an update with details once you've solved your issue. Your experience may help others.

SMali.3 · ‎2024-08-14

Thanks for the fast reply.

I do not use CubeIDE for this project; I use Atollic TrueSTUDIO.

How did you get the idea that it is a divide-by-zero problem?

I mean:

HFSR=0x4000 0000 -> I have a FORCED hard fault

CFSR=0x0000 8200 -> PRECISERR and BFARVALID are set, which means the address in BFAR is valid

BFAR=0x20020000
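The decoding above can be reproduced mechanically. The following is a minimal sketch, assuming the ARMv7-M CFSR bit layout (MMFSR in bits 0-7, BFSR in bits 8-15, UFSR in bits 16-31); the table and the `cfsr_decode` helper are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per CFSR status flag (ARMv7-M layout assumed). */
typedef struct {
    uint32_t    mask;
    const char *name;
} fault_flag_t;

static const fault_flag_t cfsr_flags[] = {
    /* MemManage status, CFSR bits 0-7 */
    { 1u << 0,  "IACCVIOL"    },
    { 1u << 1,  "DACCVIOL"    },
    { 1u << 3,  "MUNSTKERR"   },
    { 1u << 4,  "MSTKERR"     },
    { 1u << 5,  "MLSPERR"     },
    { 1u << 7,  "MMARVALID"   },
    /* BusFault status, CFSR bits 8-15 */
    { 1u << 8,  "IBUSERR"     },
    { 1u << 9,  "PRECISERR"   },
    { 1u << 10, "IMPRECISERR" },
    { 1u << 11, "UNSTKERR"    },
    { 1u << 12, "STKERR"      },
    { 1u << 13, "LSPERR"      },
    { 1u << 15, "BFARVALID"   },
    /* UsageFault status, CFSR bits 16-31 */
    { 1u << 16, "UNDEFINSTR"  },
    { 1u << 17, "INVSTATE"    },
    { 1u << 18, "INVPC"       },
    { 1u << 19, "NOCP"        },
    { 1u << 24, "UNALIGNED"   },
    { 1u << 25, "DIVBYZERO"   },
};

/* HFSR bit 30: a configurable fault was escalated to HardFault. */
#define HFSR_FORCED (1u << 30)

/* Writes the names of the flags set in cfsr into names[] (lowest bit
 * first) and returns how many were written, at most max. */
size_t cfsr_decode(uint32_t cfsr, const char *names[], size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if ((cfsr & cfsr_flags[i].mask) != 0 && n < max) {
            names[n++] = cfsr_flags[i].name;
        }
    }
    return n;
}
```

With the values from this thread, `cfsr_decode(0x00008200, ...)` yields PRECISERR and BFARVALID, and `0x40000000 & HFSR_FORCED` is non-zero, which matches the reading of a precise bus fault escalated to a hard fault.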

I assume there was an access to this location, presumably a read. In my linker .ld file I have: _estack = 0x20020000

Is there some connection here?

Also I do not have any code at address 0x8012ae8. According to the .list file and the settings in the .ld file, my code starts at 0x08020000.
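On the question about `_estack`: a small sketch of the address arithmetic, assuming the main SRAM of the STM32F407 is 128 KB starting at 0x20000000 (an assumption taken from the usual memory map and consistent with `_estack = 0x20020000`; check it against the linker script). The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed main SRAM region: 128 KB at 0x20000000. */
#define MAIN_SRAM_BASE 0x20000000u
#define MAIN_SRAM_SIZE (128u * 1024u)

/* True if addr lies inside the assumed main SRAM region. */
bool addr_in_main_sram(uint32_t addr)
{
    return addr >= MAIN_SRAM_BASE && (addr - MAIN_SRAM_BASE) < MAIN_SRAM_SIZE;
}

/* First address past the end of the region; this is where a
 * full-descending stack pointer starts before anything is pushed. */
uint32_t main_sram_end(void)
{
    return MAIN_SRAM_BASE + MAIN_SRAM_SIZE;
}
```

Under that assumption, `main_sram_end()` equals 0x20020000, so `_estack` and BFAR both point at the first address just past the end of RAM. The last valid word is at 0x2001FFFC, so any load or store at 0x20020000 itself falls outside RAM, which would be consistent with a precise bus fault reported at that address.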

Pavel A. · ‎2024-08-14

TrueSTUDIO has a fault analyzer, the same as in CubeIDE.

> I do not have any code at address 0x8012ae8.

This likely is the culprit. Stack overwrite?

BarryWhit · ‎2024-08-14

> How did you get the idea that it is a divide-by-zero problem?

My mistake. I searched for the CM4 CFSR bit definitions but got the CM3 page instead.

BarryWhit · ‎2024-08-14

Isn't your stack dump showing the wrong addresses? The stack (on Cortex-M4) grows downwards. If you want to see what was pushed on the stack by the exception (esp. the PC), you should be looking at SP+n, not SP-n. That's why the only value that looks like a code address doesn't make sense (the PC should be available at *((uint32_t*)SP + 6), unless I'm wrong again).

That's why it's simpler to just make use of the Hard fault analyzer / GUI debugger, avoiding all these easy-to-make mistakes.
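For anyone who still wants to read the frame by hand, here is a minimal sketch of the commonly used pattern, assuming GCC (which TrueSTUDIO uses) and the basic 8-word exception frame that the core pushes on entry. The names `exception_frame_t`, `read_exception_frame` and `hard_fault_c` are hypothetical; the target-only part is guarded so the rest also compiles on a PC.

```c
#include <stdint.h>

/* Registers pushed by the core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* Copies the stacked registers, reading upwards from the stack pointer
 * value that was active when the fault was taken. */
exception_frame_t read_exception_frame(const uint32_t *sp)
{
    exception_frame_t f;
    f.r0   = sp[0];
    f.r1   = sp[1];
    f.r2   = sp[2];
    f.r3   = sp[3];
    f.r12  = sp[4];
    f.lr   = sp[5];
    f.pc   = sp[6];  /* address of the faulting instruction (precise faults) */
    f.xpsr = sp[7];
    return f;
}

#if defined(__GNUC__) && defined(__arm__) && defined(__thumb__)
/* Target-only part. Called from the naked handler below with the
 * stack pointer that was in use when the fault occurred. */
__attribute__((used)) void hard_fault_c(const uint32_t *sp)
{
    volatile exception_frame_t frame = read_exception_frame(sp);
    (void)frame;          /* inspect 'frame' in the debugger */
    for (;;) { }
}

/* Naked so that no compiler prologue moves SP before it is captured. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4       \n" /* EXC_RETURN bit 2: 0 = MSP, 1 = PSP */
        "ite   eq           \n"
        "mrseq r0, msp      \n"
        "mrsne r0, psp      \n"
        "b     hard_fault_c \n");
}
#endif
```

Capturing SP in a naked handler matters because an ordinary C handler may push registers or reserve locals first, so the SP value seen inside its while loop no longer points at the start of the frame.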

TDK · ‎2024-08-14

Looking at the call stack when the error happens can give you insight, for example into whether it's a stack overflow. If stack variables are corrupted, an out-of-bounds write is likely at fault.

Does your code do dynamic memory allocation? (malloc/free)

If you feel a post has answered your question, please click "Accept as Solution".

SMali.3 · ‎2024-08-14

OK, I did it wrong: I was decrementing instead of incrementing. I will correct that in my code.

Yes, I will use the fault analyzer when I continue debugging this problem. I did not even know that such a tool exists. Thanks to all of you for sharing this with me.

I will be able to work on the system on Friday, and I hope to have more information about this exception then.

SMali.3 · ‎2024-08-14

I allocate a small amount of memory with malloc at the initialization stage, and it is never released.
