How to debug hard fault handler exceptions?

danielblesener9 · 2013-03-25

Posted on March 26, 2013 at 02:34

Hey all, thanks for taking the time to help.

I have no idea how to track down what's going on. Opening the debugger and exploring seems logical, but I have no idea what to do with the fault addresses. I am not sure what they mean.

Is there a systematic way to solve these problems, or is it just experience? I have no clue what to do with any hard faults/mem faults/bus faults.

If you don't mind taking a moment, could you explain how to methodically solve these problems? I am fairly new to the programming world.

The last address in the hard fault status register was 0x40000000.

The call stack reads SP: 2000FD08h

Attached is a file showing the fault reporting window

Any advice?

danielblesener9 · 2013-04-20

Posted on April 20, 2013 at 22:42

Thanks for the responses. I am still struggling with this hard fault problem though.

I believe the problem is in hardware. I have stripped the program down from every angle, yet I can still get the hard fault if I wait long enough. Does anybody know of a hardware situation that would cause a hard fault in the STM32?

danielblesener9 · 2013-04-20

Posted on April 20, 2013 at 22:55

I am even considering ESD problems. I have no idea where to look. Some pins on the micro route directly to push buttons. Could this cause hard faults?

Tesla DeLorean · 2013-04-20

Posted on April 21, 2013 at 02:07

While it might be convenient to suspect silicon issues as the cause of your problem, this is usually not the case. The hardware is well validated, the internal timing well characterized, and the critical paths well known. Things still apt to bite you would be issues in the flash or prefetch paths, where the wait states are insufficient or the clocks are too fast. These would typically present as gross failures; other design issues would already have been encountered by other people.
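To illustrate the wait-state point, here is a minimal sketch of a sanity check that the configured flash latency suits the system clock. The step of one extra wait state per 24 MHz (0 up to 24 MHz, 1 up to 48 MHz, 2 up to 72 MHz) is an assumption taken from the STM32F1 reference manual; other families and supply voltages use different tables, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: STM32F1 figures, one extra flash wait state per 24 MHz of
 * SYSCLK. Check the table in the reference manual of your own part. */
#define FLASH_HZ_PER_WAIT_STATE 24000000u

/* Minimum flash latency (wait states) needed at a given SYSCLK. */
uint32_t required_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz == 0u) {
        return 0u;
    }
    return (sysclk_hz - 1u) / FLASH_HZ_PER_WAIT_STATE;
}

/* True when the configured latency is enough for the clock in use. */
bool flash_latency_ok(uint32_t configured_ws, uint32_t sysclk_hz)
{
    return configured_ws >= required_wait_states(sysclk_hz);
}

/* Hypothetical hook: call once after clock setup to stop early if the
 * latency is too low, instead of chasing random faults later. */
void check_flash_latency(uint32_t configured_ws, uint32_t sysclk_hz)
{
    assert(flash_latency_ok(configured_ws, sysclk_hz));
}
```

On an F1 the configured value would come from the LATENCY bits of FLASH->ACR, read back after the clock tree is set up.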

Let's assume your problems are like most others. Hard faults tend to catch gross errors, not subtle errors of logic or algorithm, so they should be a lot easier to pin down and resolve. You need to understand where in your code they occurred and what you were doing at the time. The predominant causes of hard faults are broken or invalid pointers, out-of-scope accesses, stack corruption, and the stack and heap colliding with each other. These can be tracked with asserts, by validating structures at key points or periodically, and by adding guard zones and checking them. Expect the points of failure to be similar: either occurring at the same places (i.e., use of particular subroutines or code), or while doing the same things (i.e., returning from a subroutine, using particular pointers).
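A minimal sketch of the guard-zone idea, assuming a buffer you suspect is being overrun: known patterns sit on either side of the payload and are checked with an assert at key points. The pattern value, the sizes and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS   4u
#define GUARD_PATTERN 0xDEADBEEFu   /* hypothetical marker value */

typedef struct {
    uint32_t head[GUARD_WORDS];     /* guard zone before the payload */
    uint8_t  data[64];              /* the buffer actually in use */
    uint32_t tail[GUARD_WORDS];     /* guard zone after the payload */
} guarded_buf_t;

/* Fill both guard zones with the known pattern. */
void guard_init(guarded_buf_t *b)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        b->head[i] = GUARD_PATTERN;
        b->tail[i] = GUARD_PATTERN;
    }
}

/* True while neither guard zone has been overwritten. */
bool guard_intact(const guarded_buf_t *b)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (b->head[i] != GUARD_PATTERN || b->tail[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}

/* Call at key points or periodically: execution stops at the first check
 * after the corruption rather than at some later, unrelated hard fault. */
void guard_check(const guarded_buf_t *b)
{
    assert(guard_intact(b));
}
```

Calling guard_check() after each routine that writes the buffer narrows an overrun down to the code that ran between the last good check and the first failed one. The same pattern can be painted at the bottom of the stack to catch the stack and heap colliding.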

If the failure points appear random, consider whether they occur immediately after an interrupt service, whether those services exceed the resources available at the time, or whether they interact with other tasks. Instrument the interrupt entry/exit so you can place them on the timeline of the failures.
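One way to instrument interrupt entry/exit, sketched under assumptions: a small ring buffer of timestamped events that each ISR writes on entry and exit, which can be read in the debugger or dumped from the fault handler to see which interrupts ran just before the failure. The names, sizes and event encoding are hypothetical, and the timestamp source is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_LEN  32u        /* must be a power of two */
#define TRACE_EXIT 0x8000u    /* hypothetical flag: set on ISR exit */

typedef struct {
    uint32_t time;            /* caller-supplied timestamp */
    uint16_t event;           /* e.g. IRQ number, TRACE_EXIT on the way out */
} trace_entry_t;

static volatile trace_entry_t trace_log[TRACE_LEN];
static volatile uint32_t trace_head;   /* total events written so far */

/* Record one event, overwriting the oldest once the buffer is full. */
void trace_record(uint32_t time, uint16_t event)
{
    uint32_t slot = trace_head & (TRACE_LEN - 1u);
    trace_log[slot].time  = time;
    trace_log[slot].event = event;
    trace_head = trace_head + 1u;
}

/* Entry 'back' steps before the latest (0 = latest). */
trace_entry_t trace_recent(uint32_t back)
{
    assert(back < TRACE_LEN && back < trace_head);
    uint32_t slot = (trace_head - 1u - back) & (TRACE_LEN - 1u);
    trace_entry_t e = { trace_log[slot].time, trace_log[slot].event };
    return e;
}
```

In an ISR this would be trace_record(now, irq) on entry and trace_record(now, irq | TRACE_EXIT) before returning, where now is a hypothetical read of a free-running timer. If interrupts can nest, trace_record() needs a short critical section (for example CMSIS __disable_irq()/__enable_irq()) so two writers cannot claim the same slot.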

The resource-rich developer would use hardware trace to map the route to the failure; the poorer one would output telemetry, walk back the stack and call trees, and use fault handlers that output useful diagnostics.
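A minimal sketch of a fault handler that outputs useful diagnostics, assuming a Cortex-M3/M4 and GCC. On exception entry the core pushes R0-R3, R12, LR, PC and xPSR; for a precise fault the stacked PC is the instruction that faulted. The CFSR and HFSR registers in the System Control Block say what kind of fault it was, and MMFAR or BFAR hold the data address involved only when the matching VALID bit in CFSR is set. Only a few CFSR bits are decoded here, the function names are hypothetical, printf() must be retargeted on the target, and the Cortex-M0 in an STM32F0 has none of these fault status registers. With CMSIS headers, SCB->CFSR, SCB->HFSR, SCB->MMFAR and SCB->BFAR can replace the raw addresses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Register order of the frame the core stacks on exception entry. */
enum { FR_R0, FR_R1, FR_R2, FR_R3, FR_R12, FR_LR, FR_PC, FR_XPSR, FR_COUNT };

/* A few CFSR/HFSR bits (ARMv7-M). */
#define CFSR_MMARVALID   (1u << 7)   /* MMFAR holds the faulting address */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise: stacked PC only near the culprit */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting address */
#define CFSR_UNDEFINSTR  (1u << 16)  /* undefined instruction */
#define CFSR_INVSTATE    (1u << 17)  /* e.g. call through a bad function pointer */
#define HFSR_FORCED      (1u << 30)  /* escalated from a bus/mem/usage fault */

/* True (and *addr set) only when the core latched a valid data address. */
bool fault_address(uint32_t cfsr, uint32_t mmfar, uint32_t bfar, uint32_t *addr)
{
    if (cfsr & CFSR_MMARVALID) { *addr = mmfar; return true; }
    if (cfsr & CFSR_BFARVALID) { *addr = bfar;  return true; }
    return false;
}

/* Print the stacked registers and fault status; returns the stacked PC so
 * it can be looked up in the map file or disassembly. */
uint32_t fault_report(const uint32_t *frame, uint32_t cfsr, uint32_t hfsr,
                      uint32_t mmfar, uint32_t bfar)
{
    uint32_t addr;
    assert(frame != NULL);
    printf("PC=%08" PRIx32 " LR=%08" PRIx32 " xPSR=%08" PRIx32 "\n",
           frame[FR_PC], frame[FR_LR], frame[FR_XPSR]);
    printf("CFSR=%08" PRIx32 " HFSR=%08" PRIx32 "%s\n", cfsr, hfsr,
           (hfsr & HFSR_FORCED) ? " (forced)" : "");
    if (fault_address(cfsr, mmfar, bfar, &addr)) {
        printf("fault address=%08" PRIx32 "\n", addr);
    } else {
        printf("no valid fault address latched\n");
    }
    return frame[FR_PC];
}

#if defined(__GNUC__) && (defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__))
#define REG32(a) (*(volatile uint32_t *)(a))

/* Called from the assembly stub below with a pointer to the stacked frame. */
__attribute__((used)) void hard_fault_c(uint32_t *frame)
{
    fault_report(frame, REG32(0xE000ED28u),   /* CFSR  */
                        REG32(0xE000ED2Cu),   /* HFSR  */
                        REG32(0xE000ED34u),   /* MMFAR */
                        REG32(0xE000ED38u));  /* BFAR  */
    for (;;) { }                              /* halt here for the debugger */
}

/* EXC_RETURN bit 2 in LR says whether the frame is on MSP or PSP. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst lr, #4     \n"
        "ite eq         \n"
        "mrseq r0, msp  \n"
        "mrsne r0, psp  \n"
        "b hard_fault_c \n");
}
#endif
```

The target-only part is guarded so the decoding can also be exercised on a PC. With BFARVALID set, an address such as the 0x40000000 in the original question would be the address the faulting load or store used (the base of the Cortex-M peripheral region); without the VALID bit, the address register contents are not meaningful.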

If you still think things are random, and unrelated to anything your code is doing, start looking at the supplies, the clocks, the PLL stability. Check currents and temperature. Check soldering and mechanical issues.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

danielblesener9 · 2013-04-22

Posted on April 22, 2013 at 17:16

Clive, thanks as always for the help. I will look into the different aspects of your response and let you know how it goes.

emalund · 2013-04-22

Posted on April 22, 2013 at 17:24

I am not working with ST right now, but what I just did with another processor was to insert this in the hard fault ISR:

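#include <stdbool.h>

volatile bool blah = false;   /* volatile so a debugger write is seen */

void HardFault_Handler(void)
{
    /* .. */
    if (blah)
    {
        return;               /* step out from here to the faulting code */
    }
    while (1) { }             /* otherwise stay in the handler */
}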

Set a breakpoint on "if (blah)", change blah to true in the debugger, and step out to the place where the processor jumped to the hard fault.

danielblesener9 · 2013-05-02

Posted on May 03, 2013 at 00:32

The error came from defining this variable: EXT VOL uint16_t VirtAddVarTab[NUM_FLASH_VARS];

Yes, just by defining it. This is part of the eeprom.c file. This did not happen with older versions of ST's library. A co-worker of mine working on the STM32F0 series right now also ran into this problem. All we have to do is take out this definition and the hard fault no longer occurs, so it looks like a bug in ST's library.

Tesla DeLorean · 2013-05-02

Posted on May 03, 2013 at 01:01

The EEPROM Emulation is a rather awkward abstraction, in my opinion.

It's hard to see how the definition alone would cause a fault unless the array is unreasonably large; more likely it's the accesses to that memory, or an alignment issue.
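To illustrate the alignment possibility: the Cortex-M0 in an STM32F0 takes a hard fault on any unaligned halfword or word access, while a Cortex-M3/M4 tolerates most of them. Below is a sketch of two hypothetical helpers: one checks a pointer's alignment, the other reads a uint16_t from an address that may be odd, as code walking a byte buffer might.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if p is a multiple of 'align' bytes (align must be a power of two). */
bool is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return ((uintptr_t)p & (align - 1u)) == 0u;
}

/* Read a 16-bit value from any address: memcpy lets the compiler use
 * accesses that are safe at odd addresses, instead of a halfword load
 * that faults on a Cortex-M0 when the address is not 2-byte aligned. */
uint16_t read_u16(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

An assert(is_aligned(p, 2u)) placed before a suspect uint16_t access turns a hard fault on the M0 into a failed assert at the offending line.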