SMali.3
Associate II
August 14, 2024
Solved

How to diagnose a Hard Fault Exception on STM32F407IGT


Hello

After running the code for about one to two hours I always get a Hard Fault exception. The register values read out in the Hard Fault while loop are:

HFSR=0x4000 0000

CFSR=0x8200

BFAR=0x20020000

MMFAR=0x20020000

AFSR=0

 

Readout of the SP register shows:

SP=0x2001ff40

*(SP)=8

*(SP-1)=8

*(SP-2)=1

*(SP-3)=2

*(SP-4)=2

*(SP-5)=2

*(SP-6)=0

*(SP-7)=0

*(SP-8)=0x2001ffc0

*(SP-9)=0x8012ae8

What is going on here? How can I recover properly from this situation?

 


 

5 replies

BarryWhit
Lead
August 14, 2024

KB: How to debug a HardFault on an Arm Cortex®-M STM32 

https://interrupt.memfault.com/blog/cortex-m-hardfault-debug

 

You can use the integrated hard fault analyzer in CubeIDE to get a friendlier view of the fault state.

You can use the CubeIDE build analyzer to find which function lives at a certain address (unlike the disassembly view, this doesn't require an active debug session).

 

 

Possibly (If I've decoded the data correctly), you have a divide-by-zero error occurring at 0x8012ae8.

"- If someone's post helped resolve your issue, please thank them by clicking ""Accept as Solution"".- Please post an update with details once you've solved your issue. Your experience may help others."
SMali.3 (Author)
Associate II
August 14, 2024

Thanks for the fast reply.

I do not use CubeIDE for this project, I use Atollic TrueSTUDIO.

How did you arrive at the idea that it is a divide-by-zero problem?

I mean:

HFSR=0x4000 0000 -> I have a FORCED hard fault

CFSR=0x0000 8200 -> PRECISERR and BFARVALID, which means the address in BFAR is valid

BFAR=0x20020000

I assume there was an access to this location, presumably a read. In my linker .ld file I have: _estack = 0x20020000

Is there some connection between the two?

Also, I do not have any code at address 0x8012ae8. According to the .list file and the settings in the .ld file, my code starts at 0x08020000.
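
For reference, the decoding above can be sketched in C. This is a minimal sketch, not a definitive implementation: the bit positions follow the ARMv7-M fault status register layout, while the helper names (`is_escalated_precise_bus_fault`, `print_fault_summary`, `fault_decode_selfcheck`) are hypothetical and only for illustration. The functions take plain values, so they can be checked on a PC with the numbers from this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fault status bits as defined by the ARMv7-M architecture. */
#define HFSR_FORCED      (UINT32_C(1) << 30) /* configurable fault escalated to HardFault */
#define CFSR_MMARVALID   (UINT32_C(1) << 7)  /* MMFAR holds a valid address */
#define CFSR_PRECISERR   (UINT32_C(1) << 9)  /* precise data bus error */
#define CFSR_IMPRECISERR (UINT32_C(1) << 10) /* imprecise data bus error */
#define CFSR_BFARVALID   (UINT32_C(1) << 15) /* BFAR holds a valid address */
#define CFSR_DIVBYZERO   (UINT32_C(1) << 25) /* UsageFault: divide by zero */

/* Hypothetical helper: nonzero when the values describe a precise bus fault
 * that was escalated to HardFault and the address in BFAR can be trusted. */
static int is_escalated_precise_bus_fault(uint32_t hfsr, uint32_t cfsr)
{
    return (hfsr & HFSR_FORCED) != 0 &&
           (cfsr & CFSR_PRECISERR) != 0 &&
           (cfsr & CFSR_BFARVALID) != 0;
}

/* Hypothetical helper: print a short summary. 'estack' is the value of the
 * _estack linker symbol, passed in as a number to keep the sketch portable. */
static void print_fault_summary(uint32_t hfsr, uint32_t cfsr,
                                uint32_t bfar, uint32_t estack)
{
    if (is_escalated_precise_bus_fault(hfsr, cfsr)) {
        printf("Precise bus fault at 0x%08lx%s\n", (unsigned long)bfar,
               bfar == estack ? " (equal to _estack)" : "");
    }
    if ((cfsr & CFSR_MMARVALID) == 0) {
        printf("MMARVALID is clear, so the MMFAR value is not meaningful\n");
    }
    if ((cfsr & CFSR_DIVBYZERO) == 0) {
        printf("DIVBYZERO is clear, so this is not a divide-by-zero fault\n");
    }
}

/* Self-check with the register values posted in this thread. */
static void fault_decode_selfcheck(void)
{
    assert(is_escalated_precise_bus_fault(UINT32_C(0x40000000), UINT32_C(0x8200)));
    print_fault_summary(UINT32_C(0x40000000), UINT32_C(0x8200),
                        UINT32_C(0x20020000), UINT32_C(0x20020000));
}
```

With the posted values, only PRECISERR and BFARVALID are set in CFSR, and the faulting address equals `_estack`, which fits an access that ran off the top of the stack region.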

BarryWhit
Lead
August 14, 2024

How did you arrive at the idea that it is a divide-by-zero problem?

My mistake. I searched for the CM4 CFSR bit definitions but got the CM3 page instead.
Pavel A.
Super User
August 14, 2024

TrueSTUDIO has a fault analyzer, same as in CubeIDE.

I do not have any code on address 0x8012ae8. 

This likely is the culprit. Stack overwrite?

BarryWhit
Lead
August 14, 2024

Isn't your stack dump showing the wrong addresses? The stack (in Cortex-M4) grows downwards, so the values pushed on the stack by the exception entry (especially the PC) are at SP+n, not at SP-n. That's why the only value that looks like a code address doesn't make sense. The PC should be available at ((uint32_t*)SP)[6] (unless I'm wrong again).
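
A minimal sketch of reading the stacked frame, assuming the basic eight-word exception frame (no floating-point state pushed). The enum and function names are hypothetical. The `frame` pointer is assumed to be the stack pointer value captured at handler entry (MSP or PSP, depending on EXC_RETURN), because the SP seen inside a C handler may already have been moved by the function prologue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets in the basic exception frame, lowest address first. */
enum {
    FRAME_R0 = 0,
    FRAME_R1,
    FRAME_R2,
    FRAME_R3,
    FRAME_R12,
    FRAME_LR,   /* LR of the interrupted code */
    FRAME_PC,   /* return address, at or near the faulting instruction */
    FRAME_XPSR,
    FRAME_WORDS /* number of words in the basic frame (8) */
};

/* Hypothetical helper: stacked PC, i.e. frame[6], at a positive offset. */
static uint32_t stacked_pc(const uint32_t *frame)
{
    assert(frame != NULL);
    return frame[FRAME_PC];
}

/* Hypothetical helper: stacked LR, i.e. frame[5]. */
static uint32_t stacked_lr(const uint32_t *frame)
{
    assert(frame != NULL);
    return frame[FRAME_LR];
}
```

The stacked PC can then be looked up in the .list or .map file to find the function that was executing when the fault occurred.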

 

That's why it's simpler to just use the hard fault analyzer or the GUI debugger, which avoids all these easy-to-make mistakes.
SMali.3 (Author)
Associate II
August 14, 2024

OK, I did it wrong: I was decrementing instead of incrementing. I will correct that in my code.

Yes, I will use the fault analyzer when debugging this problem. I did not even know that such a tool exists. Thank you all for sharing this with me.

I will be able to work on the system on Friday, and I hope to have more information about this exception then.

TDK
August 14, 2024

Looking at the call stack when the error happens can give you insight, for example into whether it's a stack overflow. If stack variables are corrupted, there is likely an out-of-bounds write at fault.

Does your code do dynamic memory allocation? (malloc/free)
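
One common way to check for a stack overflow without a debugger is to "paint" the unused part of the stack with a known pattern and later count how much of it is still intact. The following is only a sketch under stated assumptions: the function names and the fill value are hypothetical, and on a real target the painted region must lie strictly below the current stack pointer, between the lowest stack address from the linker script and SP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL UINT32_C(0xDEADBEEF) /* arbitrary pattern, an assumption */

/* Fill 'words' words starting at the lowest stack address with the pattern. */
static void stack_paint(uint32_t *low, size_t words)
{
    assert(low != NULL);
    for (size_t i = 0; i < words; i++) {
        low[i] = STACK_FILL;
    }
}

/* Count words, from the lowest address upwards, that still hold the pattern.
 * Since the stack grows downwards, this is the headroom that was never used.
 * A result of 0 suggests the stack reached or passed its lower limit. */
static size_t stack_unused_words(const uint32_t *low, size_t words)
{
    assert(low != NULL);
    size_t n = 0;
    while (n < words && low[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Checking the headroom periodically, or in the fault handler, gives a hint whether the stack ever grew down to its limit during the hours before the fault.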

"If you feel a post has answered your question, please click ""Accept as Solution""."
SMali.3 (Author)
Associate II
August 14, 2024

I allocate a small amount of memory with malloc at the initialization stage, and it is never released.

SMali.3 (Author, Best answer)
Associate II
October 2, 2024

The problem with this exception was solved.

The cause was EMC interference from a DC/DC converter in close proximity to the board with this microcontroller. After replacing the DC/DC converter with another one, the problem was gone.

 

BarryWhit
Lead
October 2, 2024

Are you sure it was EMI? Excessive switching noise on the SMPS output could also cause glitches, for example.
SMali.3 (Author)
Associate II
October 2, 2024

I assume it is EMI because the power supply in question had no direct connection to the microcontroller except for ground. The microcontroller is supplied from another SMPS, which has worked fine with the component for more than a decade.

I did not measure with a spectrum analyzer because of lack of time. Maybe I will do that at some point in the future.

BarryWhit
Lead
October 2, 2024

Fair. It would have been interesting to verify by shielding with an improvised can and seeing whether the issue went away.