The program enters HardFault_Handler, but no valid stack information can be found

xtxChino · ‎2025-04-28

Hello, my program ends up in the HardFault_Handler function. Usually I can use the fault interrupt to check whether a variable in some function has overflowed, but this time the function it jumps from is irregular and I do not know how to diagnose it. Please help me; I will provide any information you need.

It finally jumped into the DMA function, but I have no idea why that would trigger HardFault_Handler, let alone why the UART_WaitOnFlagUntilTimeout function is called.

Sometimes my program jumps directly from the UART_WaitOnFlagUntilTimeout function to HardFault_Handler at the line marked below, and I know through debugging that the huart's Instance is NULL, which I do not understand.

if (Timeout != HAL_MAX_DELAY)
{
  if (((HAL_GetTick() - Tickstart) > Timeout) || (Timeout == 0U)) // this line jumps to HardFault_Handler
  {

    return HAL_TIMEOUT;
  }

I am using an STM32G431RBT6 with three UARTs and their DMA.

Thanks.

xtxChino · ‎2025-05-06

I found that the problem is indeed an overflow on the stack, but not because the stack allocation was too small.

I used sprintf to format a string before sending it, but my char array was too small, so sprintf wrote past the end of it. I trusted the size of my char array too much and did not understand how sprintf works, which led to this error. Thank you to everyone involved in this discussion.
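A minimal sketch of the usual bounded alternative to the sprintf call described above. snprintf takes the buffer size, never writes past it, and returns the length the complete string would have needed, so truncation can be detected. The helper name format_reading and the message format are hypothetical and are not taken from the original project.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats a reading into buf without ever writing
 * past buf[size - 1]. Returns 0 on success, -1 if the text did not fit
 * (it was truncated) or if snprintf reported an error. */
int format_reading(char *buf, size_t size, int channel, int value)
{
    /* snprintf writes at most `size` bytes, including the terminating
     * NUL, and returns the length the full string would have had. */
    int needed = snprintf(buf, size, "CH%d=%d\r\n", channel, value);

    if (needed < 0 || (size_t)needed >= size) {
        return -1; /* buffer too small for this message */
    }
    return 0;
}
```

Passing sizeof on the destination array (for example format_reading(msg, sizeof msg, 1, 1234)) keeps the limit in step with the array if its size changes later.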


Ozone · ‎2025-04-29

Do you know this document? https://www.keil.com/appnotes/files/apnt209.pdf

The SCB registers, as described there, will tell you more about the cause and location of the fault.
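As a rough sketch of what reading those registers looks like in code: on ARMv7-M the Configurable Fault Status Register (SCB->CFSR in CMSIS, at address 0xE000ED28) carries the UsageFault flags in its upper halfword. The function name usage_fault_reason is made up for this illustration, and it only reports the first flag it finds.

```c
#include <stdint.h>

/* UsageFault flags in the upper halfword of CFSR (ARMv7-M). */
#define CFSR_UNDEFINSTR (1UL << 16) /* undefined instruction           */
#define CFSR_INVSTATE   (1UL << 17) /* invalid state, e.g. T bit clear */
#define CFSR_INVPC      (1UL << 18) /* invalid PC load on return       */
#define CFSR_NOCP       (1UL << 19) /* coprocessor (FPU) not enabled   */
#define CFSR_UNALIGNED  (1UL << 24) /* unaligned access trap           */
#define CFSR_DIVBYZERO  (1UL << 25) /* divide-by-zero trap             */

/* Hypothetical helper: returns the name of the first UsageFault flag
 * set in cfsr, or "none" if no UsageFault flag is set. */
const char *usage_fault_reason(uint32_t cfsr)
{
    if (cfsr & CFSR_UNDEFINSTR) return "UNDEFINSTR";
    if (cfsr & CFSR_INVSTATE)   return "INVSTATE";
    if (cfsr & CFSR_INVPC)      return "INVPC";
    if (cfsr & CFSR_NOCP)       return "NOCP";
    if (cfsr & CFSR_UNALIGNED)  return "UNALIGNED";
    if (cfsr & CFSR_DIVBYZERO)  return "DIVBYZERO";
    return "none";
}
```

On the target this would be called as usage_fault_reason(SCB->CFSR) from inside the fault handler, or the register can simply be inspected in the debugger.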

Pavel A. · ‎2025-04-29

Some compilers optimize no-return functions so that the call stack cannot be seen in the debugger.

Try changing HardFault_Handler:

void HardFault_Handler(void)
{
  // make compiler believe this can return
  static volatile int junk = 0;
  while(!junk) { __NOP(); }
}

or:

void HardFault_Handler(void)
{
  __BKPT(0);
}

xtxChino · ‎2025-04-29

Thank you for your reply. I tested the two functions you gave, and the behavior did not change. I captured a more detailed screenshot of the register state, which includes a view showing Instance as NULL. Incidentally, I use the GCC compiler with the HAL G4 v1.6.1 package. I can trigger this error reliably.

xtxChino · ‎2025-04-29

I tried more on Keil, and if I run on Keil without disabling a DMA loop sampling (yes, I forgot to say I have another DMA loop sampling), it gets into the fault faster.

After commenting out the ADC sampling startup code, I can trigger HardFault_Handler again by following the previous steps with CLion + the GCC compiler, but the stack still gives no clue. By the way, I have read the manual. If I have missed anything, please tell me.

https://www.keil.com/appnotes/files/apnt209.pdf

Ozone · ‎2025-04-29

This is what the appnote says:

If the occurrences are seemingly random, you might have a stack size problem.
BTW, reasons 'a' and 'c' in this screenshot mean your code tried to call ARM code (instead of Thumb-2). But that would happen rather synchronously, so it seems unlikely.

> I tried more on Keil, and if I run on Keil without disabling a DMA loop sampling (yes, I forgot to say I have another DMA loop sampling), it gets into the fault faster.

Built with another toolchain (Keil)?
Anyway, you can try to increase the stack size, somewhere in the project settings.
In particular, printf-style formatting library functions, semihosting, or FPU usage can use a lot of stack and cause problems.
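One common way to check whether the stack size is the issue is "stack painting": fill the stack region with a known pattern at startup, then later count how many words still hold it. The sketch below shows only the bookkeeping on an ordinary array; the function names and the pattern value are illustrative assumptions, and on a real target the region bounds would come from toolchain-specific linker symbols.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_PAINT 0xDEADBEEFu /* arbitrary, unlikely-to-occur pattern */

/* Fill the (not yet used) stack region with the paint pattern. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* Count the words at the low end of the region that still hold the
 * pattern. Cortex-M stacks grow downward, so the never-touched words
 * sit at the lowest addresses. A result near zero means the stack has
 * come close to (or past) its limit at some point. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

Note that this measures stack depth only. A write past the end of a local array, as with an undersized sprintf buffer, corrupts stack contents without necessarily lowering this watermark.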

Pavel A. · ‎2025-04-29

Great, so you confirmed the call stack: something is going on in HAL_DMA_Abort (line 248), called while UART_WaitOnFlagUntilTimeout is on the stack. This is something to ponder.

DMA is dangerous: it writes to memory independently of the CPU and can overwrite things.

xtxChino · ‎2025-04-29

I will try stopping all DMA and check again.

xtxChino · ‎2025-04-29

Yes, when I use Keil, the ADC's DMA makes my program hard-fault sooner.

By the way, the errors I traced back with the CmBacktrace library are as follows:

Firmware name: Hello, hardware version: 1, software version: 2
Fault on interrupt or bare metal(no OS) environment
=================== Registers information ====================
R0 : 00000000 R1 : 00000040 R2 : 0000000d R3 : 00000020
R12: ffffffff LR : 080034bb PC : 08002286 PSR: 200b0000
==============================================================
Usage fault is caused by attempts to switch to an invalid state (e.g., ARM)
Show more call stack info by run: addr2line -e Hello.elf -afpiC 08002286 080034ba 0800596e

The result of addr2line parsing is consistent with the following figure

Ozone · ‎2025-04-29

> The result of addr2line parsing is consistent with the following figure ...

The relevant information in this picture is unreadably small. Anyway ...

> Usage fault is caused by attempts to switch to an invalid state (e.g., ARM)
> Show more call stack info by run: addr2line -e Hello.elf -afpiC 08002286 080034ba 0800596e

I suppose this refers to the same lines as in the post timestamped "2025-04-29 2:20 AM" (it's a shame ST doesn't number posts in a thread).

As you might know, ARM cores only fetch instructions from even addresses, so they use the LSB of a vector or branch target to denote the instruction set of the routine (ARM or Thumb). And Cortex-M only supports Thumb.
Almost certainly your stack becomes corrupted, and the return address is overwritten with a "random" value.
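To make the two checks concrete, here is a small sketch. It assumes the PSR value printed by CmBacktrace is the stacked xPSR, in which bit 24 is the Thumb (T) bit; the function names are invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

#define XPSR_T_BIT (1UL << 24) /* Thumb state bit in xPSR */

/* True if the T bit is set in an xPSR value. On Cortex-M it must
 * always be set; executing with it clear raises an INVSTATE fault. */
bool xpsr_thumb_bit_set(uint32_t xpsr)
{
    return (xpsr & XPSR_T_BIT) != 0;
}

/* True if a branch target or return address has its LSB set, which
 * selects Thumb state when the address is loaded into the PC. */
bool is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0;
}
```

With the values from the dump above, PSR 200b0000 has the T bit clear, which matches the "invalid state" usage fault, while LR 080034bb has its LSB set as a valid Thumb return address should.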

As said, try increasing the stack size.

You could profile your code to see what exactly happens. There are some good (but expensive) tools to do that.
Or go the cheaper route and use GPIO pins and a scope / logic analyzer, instrumenting the relevant routines and calls.
The sporadic occurrence suggests it is related to interrupts.