Debugging Intermittent Hard Fault on Reset - HELP!

AVoel.1 · 2022-01-21

We've got code that has been working for months. We updated the HAL drivers and the IDE, and now we have a gremlin: intermittent hard faults after an external hardware reset, but only when we use STM32CubeProgrammer (not the IDE) to program the MCU, and seemingly only after a full chip erase before programming. We've instrumented the HardFault handler, and we see the attached output when it happens:

Two questions. First, this output looks very strange, especially the PC being at address 0x20000000. I've tested the tracing code with intentional hard faults and it works fine, so I think this address is real in some sense.
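On the PC question, it may help to recall where that value comes from. On exception entry a Cortex-M4 pushes an eight-word basic frame onto the active stack, and LR holds an EXC_RETURN value whose bit 2 says whether that stack was MSP or PSP. The "PC" in a HardFault dump is normally the stacked word at offset 0x18 of that frame. If the handler reads the frame from the wrong stack, the "PC" can be unrelated data, so it is worth confirming that part first. The host-side sketch below only models the frame layout and the EXC_RETURN checks; the names StackedFrame, readFrame(), frameOnPsp() and frameHasFpState() are ours. On target, a small assembly shim (not shown) would test LR bit 2 and pass MSP or PSP in as `sp`.

```cpp
#include <cstdint>

// Basic exception stack frame pushed on exception entry, in ARMv7-M order.
// If EXC_RETURN bit 4 is 0, an extended frame with S0-S15 and FPSCR follows.
struct StackedFrame {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    // LR of the interrupted code
    uint32_t pc;    // stacked return address (often the faulting instruction;
                    // for imprecise faults it can be a later one)
    uint32_t xpsr;  // program status at the time of the exception
};

// EXC_RETURN bit 2: 0 = frame was pushed to MSP, 1 = frame was pushed to PSP.
bool frameOnPsp(uint32_t excReturn) {
    return (excReturn & (1u << 2)) != 0;
}

// EXC_RETURN bit 4: 0 = extended (FPU) frame, 1 = basic frame.
bool frameHasFpState(uint32_t excReturn) {
    return (excReturn & (1u << 4)) == 0;
}

// Copy the eight basic-frame words starting at the stack pointer that
// EXC_RETURN selected.
StackedFrame readFrame(const uint32_t* sp) {
    return StackedFrame{sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7]};
}
```

With a trustworthy frame, a PC of 0x20000000 would mean the core really did branch to the first word of SRAM; the stacked LR then hints at which code made that branch.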

Second, does anyone know how to programmatically generate a call stack from the HardFault handler? I've seen posts using GCC's _Unwind_Backtrace, but none of them explain how any of it works.
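_Unwind_Backtrace generally needs the code to be built with unwind tables (for example GCC's -funwind-tables) and a valid unwind context, which is awkward to set up inside a fault handler. A simpler technique that many fault-logging tools use instead is a heuristic stack scan: walk the stack upward from the faulting SP and keep every word that looks like a Thumb return address (bit 0 set and inside the code region). The sketch below shows that swapped-in technique, not _Unwind_Backtrace. The flash bounds and the function name scanForReturnAddresses() are hypothetical; take the real code range from your linker script.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// HYPOTHETICAL code-region bounds (512 KB flash at 0x08000000);
// replace with the real .text range from your linker script.
constexpr uint32_t kFlashStart = 0x08000000u;
constexpr uint32_t kFlashEnd   = 0x08080000u;  // one past the last byte

// Heuristic backtrace: keep stack words that look like Thumb return
// addresses. This is not a true unwind; stale values left on the stack by
// earlier calls will also be reported.
std::vector<uint32_t> scanForReturnAddresses(const uint32_t* sp,
                                             std::size_t nWords,
                                             std::size_t maxFrames) {
    std::vector<uint32_t> out;
    for (std::size_t i = 0; i < nWords && out.size() < maxFrames; ++i) {
        const uint32_t w = sp[i];
        const bool thumb = (w & 1u) != 0;
        const bool inFlash = w >= kFlashStart && w < kFlashEnd;
        if (thumb && inFlash) {
            out.push_back(w & ~1u);  // clear the Thumb bit before symbolizing
        }
    }
    return out;
}
```

The candidate addresses can then be symbolized offline, for example with `arm-none-eabi-addr2line -e app.elf 0x08000100`, and the false positives discarded by hand.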

Any help would be greatly appreciated! We are burning days trying to figure this out.

waclawek.jan · 2022-01-21

Which STM32?

What is "external hardware reset"?

HFSR.DEBUGEVT being set appears to indicate that a debug event occurred, and several DFSR bits are set too, which points the same way. In other words, you appear to have a debugger attached. How to interpret PC=2000'0000 in this context, I don't know. Do you attempt to run code from RAM?
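To make the dump easier to read, the two status registers can be decoded by name. The bit positions below are taken from the ARMv7-M Architecture Reference Manual; the helper names are ours. As we read the manual, DEBUGEVT is set when a debug event escalates to HardFault, for example a BKPT executed while both halting debug and the DebugMonitor exception are disabled. The DFSR bits are sticky (write one to clear), so some of them may be left over from an earlier debug session rather than caused by this particular fault.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// On target: HFSR is at 0xE000ED2C, DFSR at 0xE000ED30 (ARMv7-M).
constexpr uint32_t HFSR_VECTTBL  = 1u << 1;   // bus fault on vector-table read
constexpr uint32_t HFSR_FORCED   = 1u << 30;  // configurable fault escalated
constexpr uint32_t HFSR_DEBUGEVT = 1u << 31;  // debug event escalated to HardFault

constexpr uint32_t DFSR_HALTED   = 1u << 0;   // halt or step request
constexpr uint32_t DFSR_BKPT     = 1u << 1;   // BKPT instruction or FPB match
constexpr uint32_t DFSR_DWTTRAP  = 1u << 2;   // DWT watchpoint
constexpr uint32_t DFSR_VCATCH   = 1u << 3;   // vector catch
constexpr uint32_t DFSR_EXTERNAL = 1u << 4;   // external debug request (EDBGRQ)

struct FlagName { uint32_t mask; const char* name; };

// Join the names of all set flags with single spaces.
template <std::size_t N>
std::string decodeFlags(uint32_t value, const FlagName (&table)[N]) {
    std::string out;
    for (const FlagName& f : table) {
        if (value & f.mask) {
            if (!out.empty()) out += ' ';
            out += f.name;
        }
    }
    return out;
}

std::string decodeHfsr(uint32_t v) {
    static const FlagName t[] = {
        {HFSR_VECTTBL, "VECTTBL"}, {HFSR_FORCED, "FORCED"}, {HFSR_DEBUGEVT, "DEBUGEVT"}};
    return decodeFlags(v, t);
}

std::string decodeDfsr(uint32_t v) {
    static const FlagName t[] = {
        {DFSR_HALTED, "HALTED"}, {DFSR_BKPT, "BKPT"}, {DFSR_DWTTRAP, "DWTTRAP"},
        {DFSR_VCATCH, "VCATCH"}, {DFSR_EXTERNAL, "EXTERNAL"}};
    return decodeFlags(v, t);
}
```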

JW

AVoel.1 · 2022-01-22

The MCU is an STM32L452 in a WLCSP64 package. No, we don't run code from RAM, and certainly not from the first word of RAM. The only time we come anywhere near that is when transferring control from our bootloader to our app:

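    __disable_irq();
    auto p = cmd_handler.ProgParams();
    // Read both vector-table words while the bootloader's stack is still valid
    auto stackTop    = *(__IO uint32_t*) p.addr;
    auto jumpAddress = *(__IO uint32_t*)(p.addr + 4);
    auto JumpToApplication = (void(*)(void)) jumpAddress;
    // Switch MSP last: after this, stack-resident locals can no longer be trusted
    __set_MSP(stackTop);
    JumpToApplication();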

But that code hasn't even run yet when we get our Hard Fault.
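Since the problem seems to follow a full chip erase, a cheap guard on the bootloader side is to sanity-check the application's vector table before jumping. This is a hypothetical addition, not the poster's code: the function appVectorsLookValid() and the memory-map constants are our assumptions (512 KB flash, and 160 KB of SRAM1 plus SRAM2 treated as contiguous from 0x20000000); confirm them against the datasheet and linker script. Erased flash reads 0xFFFFFFFF, which fails both checks.

```cpp
#include <cstdint>

// HYPOTHETICAL STM32L452 memory map; verify against datasheet and linker script.
constexpr uint32_t kFlashStart = 0x08000000u;
constexpr uint32_t kFlashEnd   = 0x08080000u;  // exclusive
constexpr uint32_t kRamStart   = 0x20000000u;
constexpr uint32_t kRamEnd     = 0x20028000u;  // exclusive; a full-descending
                                               // stack may start exactly here

// Word 0 of the vector table is the initial MSP, word 1 the reset handler.
bool appVectorsLookValid(uint32_t initialSp, uint32_t resetVector) {
    // Initial SP must point into RAM and be 8-byte aligned (AAPCS).
    const bool spOk = initialSp > kRamStart && initialSp <= kRamEnd &&
                      (initialSp & 0x7u) == 0;
    // Reset vector must have the Thumb bit set and target flash.
    const uint32_t target = resetVector & ~1u;
    const bool pcOk = (resetVector & 1u) != 0 &&
                      target >= kFlashStart && target < kFlashEnd;
    return spOk && pcOk;
}
```

On target it would be called with the same two words the jump code reads, `*(__IO uint32_t*)p.addr` and `*(__IO uint32_t*)(p.addr + 4)`, and the bootloader would stay resident instead of jumping when it returns false.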

Your comment about the debugger is interesting. We had noticed the DEBUGEVT bit but hadn't paid much attention to it. Yes, an ST-Link is attached, but we aren't in debug mode: it is only still connected because we had just programmed the part with STM32CubeProgrammer and then disconnected. It may be a development-tool bug related to the debug port. We will try physically disconnecting the ST-Link and see whether we can still reproduce the problem. For what it's worth, we cannot reproduce this problem in debug mode when running from the IDE.

Does that provide any clues?

AVoel.1 · 2022-01-22

I forgot to clarify. "external hardware reset" means pulling the RESET pin low.
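The distinction between a pin reset and a power-on reset may matter here: as we understand the ARMv7-M reset model, an NRST pin reset is a system (local) reset, which leaves the debug registers alone, while a power-on reset clears them. Logging the reset-cause flags at startup would confirm which kind of reset preceded each fault. The bit layout below is our reading of RCC_CSR for the STM32L4 family (RM0394) and should be checked against the reference manual; the flags stay set until cleared, on the L4 by setting RMVF in the same register. The function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// ASSUMED RCC_CSR reset-flag bit positions (STM32L4, RM0394); verify first.
std::string decodeResetFlags(uint32_t csr) {
    struct Flag { int bit; const char* name; };
    static const Flag flags[] = {
        {24, "FWRST"},    // firewall reset
        {25, "OBLRST"},   // option-byte loader reset
        {26, "PINRST"},   // NRST pin
        {27, "BORRST"},   // brown-out reset
        {28, "SFTRST"},   // software reset (e.g. NVIC_SystemReset)
        {29, "IWDGRST"},  // independent watchdog
        {30, "WWDGRST"},  // window watchdog
        {31, "LPWRRST"},  // low-power reset
    };
    std::string out;
    for (const Flag& f : flags) {
        if (csr & (1u << f.bit)) {
            if (!out.empty()) out += ' ';
            out += f.name;
        }
    }
    return out;
}

// True if the NRST pin flag is the only reset flag set.
bool isPinResetOnly(uint32_t csr) {
    const uint32_t allFlags = 0xFFu << 24;
    return (csr & allFlags) == (1u << 26);
}
```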

I've been trying to absorb the debug architecture by reading the ARM®v7-M Architecture Reference Manual, and it's very complicated. One question that comes to mind is: why would a debug event be triggered when we're not in debug mode? It seems that an external agent like STM32CubeProgrammer could set up the registers this way through the attached ST-Link, but why would it? (I am still a little hazy on all this.)
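Part of the answer may be that "debug mode" in the IDE and the core's debug logic being enabled are not the same thing. DHCSR (0xE000EDF0) shows whether halting debug is enabled: C_DEBUGEN can only be set by a debugger through the debug port, but software can read it. If the fault handler logs DHCSR and C_DEBUGEN reads 1 after the programmer has disconnected, the tool left halting debug enabled. As we read the ARMv7-M manual, C_DEBUGEN is cleared only by a power-on reset, not by a pin reset. The sketch below decodes the register; bit positions are from the ARMv7-M manual and the function names are ours.

```cpp
#include <cstdint>
#include <string>

// DHCSR (0xE000EDF0) bit positions from the ARMv7-M ARM.
constexpr uint32_t DHCSR_C_DEBUGEN = 1u << 0;

std::string decodeDhcsr(uint32_t v) {
    struct Flag { int bit; const char* name; };
    static const Flag flags[] = {
        {0, "C_DEBUGEN"},   {1, "C_HALT"},    {2, "C_STEP"},
        {3, "C_MASKINTS"},  {5, "C_SNAPSTALL"},
        {16, "S_REGRDY"},   {17, "S_HALT"},   {18, "S_SLEEP"},
        {19, "S_LOCKUP"},   {24, "S_RETIRE_ST"}, {25, "S_RESET_ST"},
    };
    std::string out;
    for (const Flag& f : flags) {
        if (v & (1u << f.bit)) {
            if (!out.empty()) out += ' ';
            out += f.name;
        }
    }
    return out;
}

// True if a debugger has left halting debug enabled.
bool haltingDebugEnabled(uint32_t v) {
    return (v & DHCSR_C_DEBUGEN) != 0;
}
```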

Tesla DeLorean · 2022-01-22

Perhaps have your own Hard Fault handler output diagnostic data, dump memory areas, or instructions.

Check what's happening with the BOOT0 pin and the option bytes. Check the L4 errata.

Check that the vectors are not corrupted.


AVoel.1 · 2022-01-23

It's already generating diagnostic data; see the original post. I've added DEMCR to the list, and we will retest. We've checked the option bytes. The BOOT0 pin is always in the same state, yet the problem is intermittent, and the same goes for the vectors. Checking the errata is a good idea, though.
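Assuming the register meant here is DEMCR (Debug Exception and Monitor Control Register, 0xE000EDFC), the sketch below decodes it by name. Its VC_* vector-catch bits ask the core to halt on reset or on specific faults when halting debug is enabled, MON_* controls the DebugMonitor exception, and TRCENA enables the DWT/ITM blocks. As we read the ARMv7-M manual, DEMCR is also reset only on power-on, so a value written by a programming tool could survive a pin reset. Bit positions are from the ARMv7-M manual; the function names are ours.

```cpp
#include <cstdint>
#include <string>

// DEMCR (0xE000EDFC) bit positions from the ARMv7-M ARM.
std::string decodeDemcr(uint32_t v) {
    struct Flag { int bit; const char* name; };
    static const Flag flags[] = {
        {0, "VC_CORERESET"}, {4, "VC_MMERR"},   {5, "VC_NOCPERR"},
        {6, "VC_CHKERR"},    {7, "VC_STATERR"}, {8, "VC_BUSERR"},
        {9, "VC_INTERR"},    {10, "VC_HARDERR"},
        {16, "MON_EN"},      {17, "MON_PEND"},  {18, "MON_STEP"},
        {19, "MON_REQ"},     {24, "TRCENA"},
    };
    std::string out;
    for (const Flag& f : flags) {
        if (v & (1u << f.bit)) {
            if (!out.empty()) out += ' ';
            out += f.name;
        }
    }
    return out;
}

// True if any vector-catch bit (VC_CORERESET or VC_MMERR..VC_HARDERR) is set.
bool anyVectorCatch(uint32_t v) {
    return (v & 0x7F1u) != 0;
}
```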