How to debug a HardFault on an Arm Cortex®-M STM32

STea · ‎2024-06-27

Introduction

Firmware developers targeting an STM32 MCU based on an Arm Cortex®-M core need to keep an eye on memory accesses, hardware availability, clocks, and power to avoid issues whose root cause can be hard to trace.

Nevertheless, any of us can eventually hit a dead end where we need to debug and overcome a HardFault.

In this article, we explain how to debug faults on Arm Cortex®-M based STM32 devices. In the process, we learn about fault registers, how to automate fault analysis, and figure out ways to recover from some faults without rebooting the MCU. We include practical examples, with a step-by-step walkthrough on how to investigate them.

Table of contents
1. HardFault definition
2. Causes of HardFault
3. Types of HardFault
3.1. MemManage Fault
3.2. Bus Fault
3.3. Usage Fault
4. Determining the fault cause
5. Configurable Fault Status Register
6. Debugging the HardFault
7. Halting and determining the core register state
8. How to recover the call stack
8.1. Fault analyzer
9. Examples
9.1. Read data access
9.2. Observations
9.3. Write Data Access
Conclusion
Related links

1. HardFault definition

A HardFault is a type of fault that occurs on Arm Cortex®-M processors, which are commonly used in microcontroller applications. It indicates a serious failure in the system, and it is triggered by various conditions that the processor cannot handle through other exception mechanisms.
In other words, a HardFault is an exception that occurs because of an error during exception processing, or because an exception cannot be managed by any other exception mechanism. HardFaults have a fixed priority of -1, meaning they have higher priority than any exception with configurable priority.

2. Causes of HardFault

If you end up in the HardFault handler during development, your firmware potentially has one of the following issues:

Executing an undefined instruction
Executing program code from a memory region marked as eXecute Never (XN)
Writing to a memory region marked as read-only
Accessing an invalid memory location
Accessing a privileged-access-only register from unprivileged software
Accessing an unaligned memory location

3. Types of HardFault

In fact, a HardFault exception can be raised because another fault, whose own handler is not enabled, was escalated to HardFault. Below are the exceptions that can be escalated to HardFault. We take the Cortex®-M4 based MCUs as an example; the list can be extended for Cortex®-M33 based MCUs.

3.1. MemManage Fault

An exception that occurs because of a memory protection related fault. The MPU or the fixed memory protection constraints determine this fault, for both instruction and data memory transactions. This fault is always used to abort instruction accesses to Execute Never (XN) memory regions.

3.2. Bus Fault

An exception that occurs because of a memory related fault for an instruction or data memory transaction. This might be from an error detected on a bus in the memory system.

3.3. Usage Fault

An exception that occurs because of a fault related to instruction execution.

Fault exceptions other than HardFault can be enabled by setting their enable bits (USGFAULTENA, BUSFAULTENA, MEMFAULTENA) in the System Handler Control and State Register (SCB->SHCSR).
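As a minimal sketch of that register write, the helper below sets the three enable bits and leaves the other bits untouched. The helper name and the SHCSR_* macros are illustrative and not taken from the original project; the bit positions (16, 17, 18) are those of the Armv7-M SHCSR. On target, assuming the CMSIS device header is included, the call would be enable_configurable_faults(&SCB->SHCSR).

```c
#include <stdint.h>

/* Fault enable bits of SHCSR (Armv7-M bit positions). */
#define SHCSR_MEMFAULTENA (UINT32_C(1) << 16)
#define SHCSR_BUSFAULTENA (UINT32_C(1) << 17)
#define SHCSR_USGFAULTENA (UINT32_C(1) << 18)

/* Hypothetical helper: enables the MemManage, BusFault and UsageFault
 * handlers through a read-modify-write of the register that shcsr points to.
 * All other bits keep their previous value. */
static inline void enable_configurable_faults(volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```

Once these handlers are enabled, a fault of the corresponding type is taken by its own handler instead of being escalated to HardFault, provided the handler's priority allows it to run.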

4. Determining the fault cause

To help detect what type of error was encountered in the fault handler, the Cortex®-M3 and Cortex®-M4 processors also have a number of Fault Status Registers (FSRs) and Fault Address Registers (FARs) that are used for fault analysis.

5. Configurable Fault Status Register

The CFSR indicates the cause of a MemManage fault, BusFault, or UsageFault and can be further divided into three registers: MMFSR, BFSR, and UFSR. Besides accessing CFSR as a 32-bit word, each part of the CFSR can be accessed using byte and half-word transfers. There is no CMSIS-Core symbol for the divided MMFSR, BFSR, and UFSR.
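Since there is no dedicated symbol for the three sub-registers, one option is to read CFSR as a word and split it in software. The sketch below assumes the Armv7-M layout (MMFSR in bits 7:0, BFSR in bits 15:8, UFSR in bits 31:16); the function names are illustrative. On target, the input would typically be SCB->CFSR.

```c
#include <stdint.h>

/* MemManage Fault Status Register: CFSR bits 7:0. */
static inline uint8_t cfsr_mmfsr(uint32_t cfsr)
{
    return (uint8_t)(cfsr & 0xFFu);
}

/* BusFault Status Register: CFSR bits 15:8. */
static inline uint8_t cfsr_bfsr(uint32_t cfsr)
{
    return (uint8_t)((cfsr >> 8) & 0xFFu);
}

/* UsageFault Status Register: CFSR bits 31:16. */
static inline uint16_t cfsr_ufsr(uint32_t cfsr)
{
    return (uint16_t)(cfsr >> 16);
}
```

For example, a CFSR value of 0x00008200 yields a BFSR of 0x82, that is, PRECISERR and BFARVALID set.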

This table shows the types of fault, the handler used for the fault, the corresponding fault status register, and the register bit that indicates that the fault has occurred. See the configurable fault status register (CFSR; UFSR+BFSR+MMFSR) on page 237 and HFSR on page 241 of the programming manual PM0214 for more information.

6. Debugging the HardFault

In order to debug the HardFault, we need to recover the useful information found in the above-mentioned registers. Furthermore, we need to recover the call stack as well as the core registers to get the instruction that caused the HardFault.

Note: For better debuggability, the optimization level must be set to 0 or 1. To make sure we see the actual behavior of the CPU, we set the optimization level to 0:

7. Halting and determining the core register state

To analyze the HardFault, we need to halt the execution and get the values of the core registers.

Below is an example of how we can do this:
Set a software breakpoint in your HardFault handler:

void HardFault_Handler(void)
{
  /* USER CODE BEGIN HardFault_IRQn 0 */
    #ifdef DEBUG
     __BKPT(0);
    #endif
  /* USER CODE END HardFault_IRQn 0 */
  while (1)
  {
    /* USER CODE BEGIN W1_HardFault_IRQn 0 */
    /* USER CODE END W1_HardFault_IRQn 0 */
  }
}

Alternatively, we can set the option halt in exception in CubeIDE in the debug configuration:

Now, we can visualize the content of core registers and get the stack pointer value:

8. How to recover the call stack

To recover the call stack and examine the CPU state that led to the HardFault, we need to do the following:

1. Extract the value of the stack pointer (SP) from the CPU registers.
2. Copy the SP address and paste it into the memory view.
3. Read the eight consecutive words starting at that address as follows:

1st address for R0
2nd address for R1
3rd address for R2
4th address for R3
5th address for R12
6th address for LR
7th address for PC
8th address for xPSR

4. Copy the PC content, paste it into the disassembly viewer, and look at the instruction that caused the HardFault.
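The same decoding can be sketched in C. The type and function names below are illustrative, and the layout assumes the basic eight-word frame without floating-point context. The second helper relies on bit 2 of the EXC_RETURN value held in LR inside the handler, which indicates whether the frame was pushed on the main stack (0) or the process stack (1).

```c
#include <stdint.h>

/* Registers pushed by the core on exception entry, lowest address first
 * (basic frame, no floating-point context). */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* return address of the interrupted code */
    uint32_t xpsr;
} exception_frame_t;

/* Hypothetical helper: copies the eight stacked words starting at sp. */
static inline exception_frame_t read_exception_frame(const volatile uint32_t *sp)
{
    exception_frame_t f = { sp[0], sp[1], sp[2], sp[3],
                            sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* Returns 1 if EXC_RETURN says the frame is on the process stack (PSP),
 * 0 if it is on the main stack (MSP). */
static inline int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & (UINT32_C(1) << 2)) != 0u;
}
```

In a bare-metal project that only uses the main stack, the SP value captured at handler entry is the address to pass in. If the handler has already pushed registers of its own, SP has moved and the frame starts above it.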

8.1. Fault analyzer

The fault analyzer gives information about the fault that occurred, based on the fault related registers.
Note: The fault analyzer feature is not available for Cortex M0 based microcontrollers.

9. Examples

In this section, we go through three examples of an access to a reserved address, 0x00100000:

Trigger a BusFault (if enabled) or HardFault by reading from a reserved address
Trigger a BusFault (if enabled) or HardFault by writing to a reserved address
Trigger a BusFault (if enabled) or HardFault by executing at a reserved address

To run each test, uncomment the corresponding define and keep the other two commented out.

#include "main.h"

/* Private includes ----------------------------------------------------------*/
/* Private typedef -----------------------------------------------------------*/
/* Private define ------------------------------------------------------------*/
#define ADDRESS                     0x00100000

#define READ_DATA_ACCESS                /*Uncomment only to run Test1*/
//#define WRITE_DATA_ACCESS             /*Uncomment only to run Test2*/
//#define FETCH_INSTRUCTION_ACCESS      /*Uncomment only to run Test3*/

/* Private macro -------------------------------------------------------------*/
/* Private variables ---------------------------------------------------------*/
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);

/* Private user code ---------------------------------------------------------*/

/**
  * @brief  The application entry point.
  * @retval int
  */
int main(void)
{
  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();

  /* Configure the system clock */
  SystemClock_Config();

  int r;
  volatile unsigned int* p;
  int (*pF)(void);

  #ifdef READ_DATA_ACCESS
    p = (unsigned int*)ADDRESS;     // reserved address
    r = *p;                         // read from reserved address
  #endif
  #ifdef WRITE_DATA_ACCESS
    //SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk ; // Disable Write Buffer
    p = (unsigned int*)ADDRESS;     // reserved address
    *p = 0xDEEDBEEF;                // write to reserved address
  #endif
  #ifdef FETCH_INSTRUCTION_ACCESS
    pF = (int(*)(void))(ADDRESS+1); // reserved address
    r = pF();                       // fetch instruction from reserved address
  #endif

  while (1)
  {

  }
}

9.1. Read data access

9.2. Observations

When running the project, a HardFault is triggered.

The first step is to check the Fault Analyzer and Debug windows.

They show very useful information about the fault:

(1): The HardFault was caused by the escalation of another exception, a BusFault.

(2) (3): The BusFault was caused by a precise data access to the address 0x00100000.

(4): PC = 0x0800022E is the value of PC just before the exception occurred (the address of the instruction that caused the fault).
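The checks that lead to these observations can also be done in code from the raw HFSR and CFSR values. The sketch below is illustrative only: the names are hypothetical, and the bit positions are those of the Armv7-M registers (FORCED is HFSR bit 30; IBUSERR, PRECISERR, IMPRECISERR, and BFARVALID are CFSR bits 8, 9, 10, and 15). It covers the three bus fault flavors exercised by the tests in this section.

```c
#include <stdint.h>

#define HFSR_FORCED      (UINT32_C(1) << 30)
#define CFSR_IBUSERR     (UINT32_C(1) << 8)
#define CFSR_PRECISERR   (UINT32_C(1) << 9)
#define CFSR_IMPRECISERR (UINT32_C(1) << 10)
#define CFSR_BFARVALID   (UINT32_C(1) << 15)

typedef enum {
    BUS_FAULT_NONE,
    BUS_FAULT_INSTRUCTION, /* IBUSERR: fault on instruction fetch */
    BUS_FAULT_PRECISE,     /* PRECISERR: stacked PC is the faulting instruction */
    BUS_FAULT_IMPRECISE    /* IMPRECISERR: stacked PC is unrelated to the fault */
} bus_fault_kind_t;

/* Returns 1 if the HardFault is an escalated configurable fault. */
static inline int hardfault_is_escalated(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}

/* Returns 1 if BFAR holds the address of the faulting access. */
static inline int bfar_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0u;
}

/* Picks the first bus fault flag found in CFSR. */
static inline bus_fault_kind_t classify_bus_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_PRECISERR)   { return BUS_FAULT_PRECISE; }
    if (cfsr & CFSR_IMPRECISERR) { return BUS_FAULT_IMPRECISE; }
    if (cfsr & CFSR_IBUSERR)     { return BUS_FAULT_INSTRUCTION; }
    return BUS_FAULT_NONE;
}
```

On target, the inputs would typically be SCB->HFSR and SCB->CFSR, and BFAR would only be read when bfar_is_valid() returns 1.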

9.3. Write Data Access

When running the project, a HardFault is triggered due to a BusFault escalation.

This time, the BusFault is caused by an imprecise data access (no fault address is recorded in BFAR).

The value of PC just before the exception occurred is 0x08000232.

This is not the address of the faulting write instruction, which is 0x08000230.

Note: The BusFault is raised some instructions after the write instruction.

IMPRECISERR = 1: Imprecise data access violation. The return address is not related to the fault.

BFARVALID = 0: BFAR is not valid.

A bus fault becomes imprecise because of the write buffers in the processor bus interface.

Debugging imprecise bus faults is a bit harder than debugging precise ones because, by the time the bus fault exception is triggered, the processor could have executed several more instructions, including branch instructions.

If the branch target can be reached via several paths, it can be hard to tell where the faulting memory access took place.

To help with debugging such situations, you can disable the write buffer using the DISDEFWBUF bit in the Auxiliary Control Register (ACTLR).

To disable the write buffer, uncomment the SCnSCB->ACTLR line in the WRITE_DATA_ACCESS block of the example.

After disabling the write buffer, the fault becomes precise.

BFARVALID = 1: BFAR is valid (it contains the address of the location that generated the BusFault).

The value of PC just before the exception occurred, shown in the debug window (it can also be found in the exception frame on the stack), now points to the faulting instruction.

Note: To make imprecise faults easier to debug, instead of disabling the write buffer we can use the MPU (memory protection unit) to change the memory type of a region that contains the reserved address to Strongly Ordered.
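As a rough sketch of that MPU approach on a Cortex®-M3/M4 (Armv7-M MPU), the helper below builds a region attribute (RASR) value that selects the Strongly Ordered memory type, which corresponds to TEX = 0, C = 0, and B = 0. The helper and macro names are illustrative, and the choice of full access permissions is an assumption made for this example.

```c
#include <stdint.h>

#define RASR_ENABLE   (UINT32_C(1) << 0) /* region enable */
#define RASR_SIZE_POS 1                  /* SIZE field, bits 5:1 */
#define RASR_AP_POS   24                 /* AP field, bits 26:24 */
#define RASR_AP_FULL  UINT32_C(3)        /* read/write for all privilege levels */

/* Hypothetical helper: RASR value for an enabled, full-access,
 * Strongly Ordered region of 2^size_log2 bytes (size_log2 from 5 to 32).
 * TEX, C and B are left at 0, which selects Strongly Ordered. */
static inline uint32_t rasr_strongly_ordered(unsigned size_log2)
{
    return (RASR_AP_FULL << RASR_AP_POS)
         | ((uint32_t)(size_log2 - 1u) << RASR_SIZE_POS)
         | RASR_ENABLE;
}
```

For a 1 MB region covering 0x00100000, rasr_strongly_ordered(20) would be written to MPU->RASR after selecting a region in MPU->RNR and writing the base address 0x00100000 to MPU->RBAR. The MPU is then enabled through MPU->CTRL, typically with PRIVDEFENA set so that the default memory map still applies outside the region.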

Fetch instruction Access

FORCED = 1: BusFault escalated to HardFault.

IBUSERR = 1: BusFault on instruction prefetch.

The address of the location that generated the BusFault is found in PC (PC = 0x00100000), not in BFAR (unlike the precise data error).

The BusFault is triggered immediately on execution at an invalid address.
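The example calls ADDRESS+1, yet the reported PC is 0x00100000. This is because bit 0 of a branch target on Cortex®-M only selects the Thumb instruction set and is not part of the fetch address. A one-line sketch of that relationship, with an illustrative helper name:

```c
#include <stdint.h>

/* Returns the address the core actually fetches from when branching to a
 * function pointer value: bit 0 (the Thumb bit) is cleared. */
static inline uint32_t thumb_fetch_address(uint32_t function_pointer_value)
{
    return function_pointer_value & ~UINT32_C(1);
}
```

Calling thumb_fetch_address(0x00100001) gives 0x00100000, which matches the PC value observed in this test.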

Conclusion

In this article, we gave an overview of some basic debugging techniques for fault handling. They can be useful to any embedded engineer developing an application on most STM32 MCUs based on Arm Cortex®-M3/M4 CPUs. On Cortex®-M33 or Cortex®-M7 based MCUs, there can be other sources of HardFault, such as secure faults and faults related to cache management.