Resolving Hard Fault with STM32F407

bcornwellmott · ‎2019-01-02

I've been working on debugging a hard fault occurring on an STM32F4-discovery board.

For context, I'm working with a custom USB library which is used to communicate with an Android device in Accessory mode (this code uses usbh_core, etc. and is similar to the usbh_cdc code), and I'm also using a fatfs library to read from and write to an SD card. The hard fault occurs when I unplug or re-plug my USB connection.

There is way too much code to share, but the issue is more how to debug the hard fault than anything else. I've tried outputting the stack and key variables in the stack, as well as using _Unwind_Backtrace, without any real luck.

Here is what the hard fault handler outputs:

SCB->HFSR = 0x40000000
SCB->CFSR = 0x00008200
SCB->BFAR = 0x20020ae8
SCB->MMFAR = 0x20020ae8
MSP = 0x2001fdc0
PSP = 0x00000000

Here are the first 8 values in the stack

R0 = 0x00000000
R1 = 0x2001fdc0
R2 = 0x2000c938
R3 = 0x20020adc
R12 = 0x2000cd38
LR = 0xfffffff9
PC = 0x20000adc
PSR = 0x00000001

Here is the output of the backtrace:

Backtrace:
    #0: program counter at 080e989c
    #1: program counter at 080e999c

With this as the relevant info from the .MAP file

 .text          0x080e9744      0xc30 obj//04-ProgramCode/Errors/Programming.o
                0x080e9838                trace_fcn
                0x080e9888                print_backtrace_here
                0x080e98a8                RebootToBootloader
                0x080e9974                discoveryIgnoreHandler
                0x080e997c                discoveryIgnore2Handler
                0x080e9984                discoveryDMAHandler
                0x080e998c                discoveryHARD_FAULTHandler
                0x080e9e54                discoveryMPU_FAULTHandler
 
.data.impure_data
                0x20000678      0x428 C:\Progra~1\SCICOS~1.1\contrib\E4coder\E4CODE~1\sdk\DISCOV~1\GNU_TO~1/arm-none-eabi/lib/thumb\libc.a(lib_a-impure.o)
 .data._impure_ptr
                0x20000aa0        0x4 C:\Progra~1\SCICOS~1.1\contrib\E4coder\E4CODE~1\sdk\DISCOV~1\GNU_TO~1/arm-none-eabi/lib/thumb\libc.a(lib_a-impure.o)
                0x20000aa0                _impure_ptr
 .data          0x20000aa4        0x0 C:\Progra~1\SCICOS~1.1\contrib\E4coder\E4CODE~1\sdk\DISCOV~1\GNU_TO~1/arm-none-eabi/lib/thumb\libc.a(lib_a-init.o)
 .data          0x20000aa4        0x0 C:\Progra~1\SCICOS~1.1\contrib\E4coder\E4CODE~1\sdk\DISCOV~1\GNU_TO~1/arm-none-eabi/lib/thumb\libc.a(lib_a-malloc.o)
 .data          0x20000aa4        0x0 C:\Progra~1\SCICOS~1.1\contrib\E4coder\E4CODE~1\sdk\DISCOV~1\GNU_TO~1/arm-none-eabi/lib/thumb\libc.a(lib_a-mallocr.o)
 .data.__malloc_av_
                0x20000aa4      0x408 C:\Progra~1\SCICOS~1.1\contrib\E4coder\E4CODE~1\sdk\DISCOV~1\GNU_TO~1/arm-none-eabi/lib/thumb\libc.a(lib_a-mallocr.o)
                0x20000aa4                __malloc_av_
 .data.__malloc_trim_threshold
                0x20000eac        0x4 C:\Progra~1\SCICOS~1.1\contrib\E4coder\E4CODE~1\sdk\DISCOV~1\GNU_TO~1/arm-none-eabi/lib/thumb\libc.a(lib_a-mallocr.o)
                0x20000eac                __malloc_trim_threshold
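The two backtrace addresses can be resolved by hand against the .text symbols listed above. Here is a minimal sketch of that lookup; the table is copied from the map excerpt, while symbol_for_pc and OBJ_TEXT_END are names made up for illustration. With this table, 0x080e989c falls inside print_backtrace_here and 0x080e999c inside discoveryHARD_FAULTHandler, so both frames belong to the fault-reporting path rather than to the code that faulted.

```c
#include <stddef.h>
#include <stdint.h>

/* Symbol start addresses copied from the .text excerpt of the map file. */
struct map_symbol {
    uint32_t    addr;
    const char *name;
};

static const struct map_symbol map_symbols[] = {
    { 0x080e9838u, "trace_fcn" },
    { 0x080e9888u, "print_backtrace_here" },
    { 0x080e98a8u, "RebootToBootloader" },
    { 0x080e9974u, "discoveryIgnoreHandler" },
    { 0x080e997cu, "discoveryIgnore2Handler" },
    { 0x080e9984u, "discoveryDMAHandler" },
    { 0x080e998cu, "discoveryHARD_FAULTHandler" },
    { 0x080e9e54u, "discoveryMPU_FAULTHandler" },
};

/* End of this object's .text contribution: 0x080e9744 + 0xc30 (from the map). */
#define OBJ_TEXT_END 0x080ea374u

/*
 * Return the name of the symbol with the highest start address that is
 * <= pc, or NULL if pc lies outside the listed range. The table must be
 * sorted by ascending address (hypothetical helper, not part of any library).
 */
const char *symbol_for_pc(uint32_t pc)
{
    const char *best = NULL;

    if (pc >= OBJ_TEXT_END)
        return NULL;

    for (size_t i = 0; i < sizeof map_symbols / sizeof map_symbols[0]; i++) {
        if (map_symbols[i].addr <= pc)
            best = map_symbols[i].name;
    }
    return best;
}
```

On a PC with the toolchain installed, arm-none-eabi-addr2line run against the ELF file gives the same answer with source line numbers, provided the build has debug info.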

The code for my backtrace is something I found on Stack Exchange:

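#include <stdio.h>
#include <unwind.h>

/* Called by the unwinder once per frame; d points to the running frame depth. */
_Unwind_Reason_Code trace_fcn(struct _Unwind_Context *ctx, void *d)
{
    int *depth = (int *)d;
    char msg[80];

    /* _Unwind_GetIP returns an unwinder word type, so cast it to match %08lx. */
    snprintf(msg, sizeof msg, "\t#%d: program counter at %08lx\n",
             *depth, (unsigned long)_Unwind_GetIP(ctx));
    SendAsyncPriority(msg);   /* my own output routine */
    (*depth)++;
    return _URC_NO_REASON;
}

void print_backtrace_here(void)
{
    int depth = 0;
    _Unwind_Backtrace(&trace_fcn, &depth);
}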

As you can see, the HFSR and CFSR indicate that the BFAR value is valid (I disable the write buffer via SCnSCB->ACTLR to ensure it is), but the BFAR value is a RAM-style address, not an address in program memory (as I have normally seen). However, the RAM only goes to 0x20020000, so this address points beyond the end of RAM, past the top of my stack.
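For reference, the CFSR value can be decoded bit by bit. The sketch below uses the Cortex-M4 fault status bit positions; describe_cfsr and the table layout are illustrative names, not part of CMSIS. For CFSR = 0x00008200 it reports PRECISERR and BFARVALID, meaning a precise data bus error with a valid BFAR. The HFSR value 0x40000000 is bit 30 (FORCED), meaning a configurable fault was escalated to a hard fault.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct flag_name {
    uint32_t    mask;
    const char *name;
};

/* CFSR = MMFSR (bits 0-7) | BFSR (bits 8-15) | UFSR (bits 16-31). */
static const struct flag_name cfsr_flags[] = {
    { 1ul << 0,  "IACCVIOL"    }, { 1ul << 1,  "DACCVIOL"  },
    { 1ul << 3,  "MUNSTKERR"   }, { 1ul << 4,  "MSTKERR"   },
    { 1ul << 5,  "MLSPERR"     }, { 1ul << 7,  "MMARVALID" },
    { 1ul << 8,  "IBUSERR"     }, { 1ul << 9,  "PRECISERR" },
    { 1ul << 10, "IMPRECISERR" }, { 1ul << 11, "UNSTKERR"  },
    { 1ul << 12, "STKERR"      }, { 1ul << 13, "LSPERR"    },
    { 1ul << 15, "BFARVALID"   }, { 1ul << 16, "UNDEFINSTR"},
    { 1ul << 17, "INVSTATE"    }, { 1ul << 18, "INVPC"     },
    { 1ul << 19, "NOCP"        }, { 1ul << 24, "UNALIGNED" },
    { 1ul << 25, "DIVBYZERO"   },
};

/* Write the space-separated names of all set CFSR flags into out. */
void describe_cfsr(uint32_t cfsr, char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len == 0)
        return;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if ((cfsr & cfsr_flags[i].mask) == 0)
            continue;
        int n = snprintf(out + used, out_len - used, "%s%s",
                         used ? " " : "", cfsr_flags[i].name);
        if (n < 0 || (size_t)n >= out_len - used)
            return;               /* output truncated; stop here */
        used += (size_t)n;
    }
}
```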

I'm guessing that somewhere, I'm loading a stack value, addressing somewhere beyond this value, then trying to access it as a function or something, but I have no idea how I could be doing that in my code, and no clue as to what part of my code might be doing that.

Any suggestions on how to debug this would be really useful. I'm thinking that getting a functional backtrace would give me a clue as to what section of code is causing this issue, but for whatever reason the backtrace stops at the hard fault handler.

S.Ma · ‎2019-01-02

Have you tried running in debug mode and looking at the stack pointer value and the debugger's call stack window, which shows where the exception originated?

Are you running an RTOS?

For debugging, add a counter that increments each time you plug or unplug the USB.

Are you also hot plugging/unplugging the SD card?

Last question: is the issue repeatable, or does it only happen with a certain probability?

bcornwellmott · ‎2019-01-02

Thanks for the quick response.

I'm not sure I follow what you're asking with the first question. I am not using an IDE, so I don't have a debug stack window or anything to assist me.

I am not running an RTOS.

I'm not hot swapping the SD card (at least, not when the issue occurs).

The issue has a probability to happen. I think it has something to do with an interrupt being triggered at an inopportune time, but can't be sure.

Can you explain more about the counter? When would I start it, what would it tell me, etc?

Thanks again

Tesla DeLorean · ‎2019-01-02

General things to watch would be stack and heap size.

On GNU/GCC there are a couple of complaints about malloc() being used in an interrupt handler.

For Hard Faults I decompose the frame, and look at the faulting instruction(s) and the registers.

If this is a NULL pointer, add some sanity check so it is not used.
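A rough sketch of what decomposing the frame can look like, assuming a GCC toolchain: the core pushes R0-R3, R12, LR, PC and PSR in that order on exception entry, and a small naked handler passes the address of that frame to C before any prologue moves the stack pointer. The names exception_frame, format_exception_frame and hard_fault_c are illustrative, and HardFault_Handler is the usual CMSIS vector name, which may differ in your startup file.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Register order of the basic exception frame stacked by a Cortex-M core. */
struct exception_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
};

/* Format the stacked registers; returns the snprintf result. */
int format_exception_frame(const struct exception_frame *f,
                           char *buf, size_t len)
{
    return snprintf(buf, len,
                    "R0=%08lx R1=%08lx R2=%08lx R3=%08lx "
                    "R12=%08lx LR=%08lx PC=%08lx PSR=%08lx",
                    (unsigned long)f->r0,  (unsigned long)f->r1,
                    (unsigned long)f->r2,  (unsigned long)f->r3,
                    (unsigned long)f->r12, (unsigned long)f->lr,
                    (unsigned long)f->pc,  (unsigned long)f->psr);
}

#if defined(__thumb2__) && defined(__GNUC__)
/* Hypothetical C handler supplied by the application. */
void hard_fault_c(const struct exception_frame *frame);

/*
 * Bit 2 of EXC_RETURN (in LR) tells which stack holds the frame:
 * 0 = MSP, 1 = PSP. The frame address is passed in R0 untouched.
 */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4        \n"
        "ite   eq            \n"
        "mrseq r0, msp       \n"
        "mrsne r0, psp       \n"
        "b     hard_fault_c  \n");
}
#endif
```

The stacked PC is the instruction to disassemble; for a precise bus fault it is the load or store that faulted. Reading the stack pointer from inside an ordinary C handler instead can give a shifted frame, because the compiler's prologue may already have pushed registers.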


bcornwellmott · ‎2019-01-02

Thanks for the response Clive. With your feedback, I think I might have identified the issue.

My stack and heap sizes don't seem to be the issue, as my pointers don't seem to be at or near the limits.

I think your comment about malloc in an interrupt handler points to the problem, as I just recently added a timer interrupt to call my USBH Process function (which processes all the USB connection stuff). When the USB connects, the interrupt will eventually call the InterfaceInit function of my USBH class, which has a malloc in it. I've printed out my entire stack and have noticed that the memory locations nearer to the top of the stack point to malloc functions, and as I move further down I see my InterfaceInit function.

Do you know of any workaround for the malloc issue? If I can't find a solution, I could try just skipping malloc for this call (I think I can just have a static instantiation and not be too concerned about memory), but I feel like that's pretty hacky, and I can see myself having to malloc in the USB timer interrupt in the future.
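For illustration, the static-instantiation idea could be sketched as a tiny fixed pool, so that InterfaceInit never touches the heap. Everything here is hypothetical: accessory_handle_t, its fields, the pool size, and the function names are placeholders, not part of the USB host library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HANDLE_POOL_SIZE 2u          /* assumption: at most two handles live */

/* Hypothetical per-interface state that was previously malloc'ed. */
typedef struct {
    uint8_t rx_buf[64];
    uint8_t in_ep;
    uint8_t out_ep;
} accessory_handle_t;

static accessory_handle_t handle_pool[HANDLE_POOL_SIZE];
static bool               handle_used[HANDLE_POOL_SIZE];

/* Return a free slot from the static pool, or NULL if all are taken. */
accessory_handle_t *handle_alloc(void)
{
    for (size_t i = 0; i < HANDLE_POOL_SIZE; i++) {
        if (!handle_used[i]) {
            handle_used[i] = true;
            return &handle_pool[i];
        }
    }
    return NULL;
}

/* Release a slot obtained from handle_alloc; other pointers are ignored. */
void handle_free(accessory_handle_t *h)
{
    for (size_t i = 0; i < HANDLE_POOL_SIZE; i++) {
        if (h == &handle_pool[i]) {
            handle_used[i] = false;
            return;
        }
    }
}
```

Like malloc, this is only safe if alloc and free are always called from the same context (always the interrupt, or always the main loop), or with the interrupt masked around the calls.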

Thank you

Pavel A. · ‎2019-01-02

> I just recently added a timer interrupt to call my USBH Process function

It has "Process" in its name for a reason. It's a HUGE HINT: don't call me from interrupt!

-- pa

bcornwellmott · ‎2019-01-03

Great!

Have you got a suggestion for how to have a function called every 50 us consistently without an interrupt?
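One common pattern, sketched here under assumptions, keeps the timer interrupt but has it only count ticks, while the main loop does the actual processing (and therefore any malloc). The names timer_tick_isr, service_pending_ticks and usb_process_stub are placeholders; the stub stands in for the real USBH Process call.

```c
#include <stdint.h>

static volatile uint32_t ticks_raised;   /* written only by the timer ISR  */
static uint32_t          ticks_handled;  /* written only by the main loop  */
static uint32_t          process_calls;  /* demo counter for the stub      */

/* Stand-in for the real USB host process function (hypothetical). */
static void usb_process_stub(void)
{
    process_calls++;
}

/* Timer interrupt body: record that a tick happened and return at once. */
void timer_tick_isr(void)
{
    ticks_raised++;
}

/*
 * Call this from the main loop. It runs the process function once per
 * tick that has not been handled yet and returns how many it ran.
 * Each counter has a single writer, so no read-modify-write is shared
 * between the ISR and the main loop.
 */
uint32_t service_pending_ticks(void)
{
    uint32_t ran = 0;

    while (ticks_handled != ticks_raised) {
        ticks_handled++;
        usb_process_stub();
        ran++;
    }
    return ran;
}
```

The trade-off is jitter: the call happens as soon as the main loop comes around, so the timing is only as regular as the longest main-loop iteration, which may or may not be acceptable for a 50 us period.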