
How to debug a HardFault on an Arm Cortex®-M STM32

STea
ST Employee

Introduction

A firmware developer targeting an STM32 MCU based on an Arm Cortex®-M core needs to keep an eye on memory accesses, hardware availability, clocks, and power to avoid issues whose root cause can be hard to trace.

Even so, any of us can end up at a dead end where a HardFault has to be debugged and resolved.

In this article, we explain how to debug faults on Arm Cortex®-M based STM32 devices. In the process, we learn about fault registers, how to automate fault analysis, and figure out ways to recover from some faults without rebooting the MCU. We include practical examples, with a step-by-step walkthrough on how to investigate them.


1. HardFault definition

A HardFault is a type of fault that occurs on Arm Cortex®-M processors, which are commonly used in microcontroller applications. It indicates a serious failure in the system, and it is triggered by various conditions that the processor cannot handle through other exception mechanisms.
In other words, a HardFault is an exception that occurs because of an error during exception processing, or because an exception cannot be managed by any other exception mechanism. HardFaults have a fixed priority of -1, meaning they have higher priority than any exception with configurable priority.

2. Causes of HardFault

If you get into the HardFault handler on your firmware during development, you potentially have one of the following issues:

  • Executing an undefined instruction
  • Executing program code from a memory region marked as eXecute Never (XN)
  • Writing to a memory region marked as read-only
  • Accessing an invalid memory location
  • Accessing a privileged-access-only register from unprivileged software
  • Accessing an unaligned memory location


3. Types of HardFault

In practice, a HardFault is often raised because another fault, whose own handler is not enabled, has been escalated to HardFault. The exceptions below can be escalated in this way. We take the Cortex®-M4 based MCUs as an example; the list is longer for the Cortex®-M33 based MCUs.


3.1. MemManage Fault

An exception that occurs because of a memory protection related fault. The MPU or the fixed memory protection constraints determine this fault, for both instruction and data memory transactions. This fault is always used to abort instruction accesses to Execute Never (XN) memory regions.

​3.2. Bus Fault

An exception that occurs because of a memory related fault for an instruction or data memory transaction. This might be from an error detected on a bus in the memory system.​

3.3. Usage Fault

An exception that occurs because of a fault related to instruction execution.​

 

STea_0-1715262749764.jpeg


The fault exceptions other than HardFault can be enabled by setting their enable bits (USGFAULTENA, BUSFAULTENA, MEMFAULTENA) in the System Handler Control and State Register (SCB->SHCSR), as shown below:
 
 STea_1-1715262749770.jpeg
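
In code, enabling the three handlers comes down to setting three bits. The following is a minimal sketch, assuming the Cortex®-M3/M4 bit positions (MEMFAULTENA = bit 16, BUSFAULTENA = bit 17, USGFAULTENA = bit 18). The helper name enable_configurable_faults and the local macros are hypothetical; on a real project you would normally use the CMSIS masks SCB_SHCSR_MEMFAULTENA_Msk, SCB_SHCSR_BUSFAULTENA_Msk, and SCB_SHCSR_USGFAULTENA_Msk directly. The register is passed as a pointer only so that the logic can also be exercised off-target.

```c
#include <stdint.h>

/* SHCSR enable bits on Cortex-M3/M4 (local names, assumed for illustration). */
#define SHCSR_MEMFAULTENA (UINT32_C(1) << 16)
#define SHCSR_BUSFAULTENA (UINT32_C(1) << 17)
#define SHCSR_USGFAULTENA (UINT32_C(1) << 18)

/* Hypothetical helper: enables the MemManage, BusFault, and UsageFault
 * handlers so that these faults are no longer escalated to HardFault
 * merely because their handler is disabled.
 * On target, call it as enable_configurable_faults(&SCB->SHCSR). */
void enable_configurable_faults(volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```

With the handlers enabled, the fault enters MemManage_Handler, BusFault_Handler, or UsageFault_Handler, which already narrows down the cause. Escalation to HardFault can still happen, for example when the fault occurs at a priority that prevents the configurable handler from running.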


 

4. Determining the fault cause 

To help detect what type of error was encountered in the fault handler, the Cortex®-M3 and Cortex®-M4 processors also have a number of Fault Status Registers (FSRs) and Fault Address Registers (FARs) that are used for fault analysis.


5. Configurable Fault Status Register

The CFSR indicates the cause of a MemManage fault, BusFault, or UsageFault and is divided into three sub-registers: MMFSR (bits 7:0), BFSR (bits 15:8), and UFSR (bits 31:16). Besides accessing CFSR as a 32-bit word, each part of the CFSR can be accessed using byte and half-word transfers. There is no CMSIS-Core symbol for the separate MMFSR, BFSR, and UFSR.

STea_5-1715262749776.jpeg
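
Because CMSIS-Core exposes only the full 32-bit SCB->CFSR, the sub-registers are usually extracted with shifts and masks. The sketch below assumes the Cortex®-M3/M4 layout described above; the type CfsrParts and the function split_cfsr are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Sub-register layout of CFSR on Cortex-M3/M4. */
typedef struct {
    uint8_t  mmfsr;  /* bits  7:0  MemManage Fault Status Register  */
    uint8_t  bfsr;   /* bits 15:8  BusFault Status Register         */
    uint16_t ufsr;   /* bits 31:16 UsageFault Status Register       */
} CfsrParts;

/* Hypothetical helper: splits a CFSR value into its three parts.
 * On target, call it as split_cfsr(SCB->CFSR). */
CfsrParts split_cfsr(uint32_t cfsr)
{
    CfsrParts parts;
    parts.mmfsr = (uint8_t)(cfsr & 0xFFu);
    parts.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    parts.ufsr  = (uint16_t)(cfsr >> 16);
    return parts;
}
```

The CFSR flags are sticky: they stay set until software writes 1 to them, so it is worth capturing the value at the start of the fault handler and clearing it once it has been logged.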

The table below shows the types of fault, the handler used for each fault, the corresponding fault status register, and the register bit that indicates that the fault has occurred. See the configurable fault status register (CFSR; UFSR+BFSR+MMFSR) on page 237 and HFSR on page 241 of the programming manual PM0214 for more information.

hardfault regs.PNG
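
The bit-to-fault mapping in the table can be turned into a small lookup that names the flags set in a captured CFSR value. This is a sketch under the assumption of the Cortex®-M3/M4 bit positions; CfsrFlag, list_cfsr_flags, and hardfault_is_escalated are hypothetical names, not part of CMSIS or the HAL.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    mask;   /* bit position inside the 32-bit CFSR */
    const char *name;   /* architectural flag name */
} CfsrFlag;

/* Flag positions for Cortex-M3/M4; MLSPERR and LSPERR exist only with an FPU. */
static const CfsrFlag CFSR_FLAGS[] = {
    { UINT32_C(1) << 0,  "IACCVIOL"    }, { UINT32_C(1) << 1,  "DACCVIOL"   },
    { UINT32_C(1) << 3,  "MUNSTKERR"   }, { UINT32_C(1) << 4,  "MSTKERR"    },
    { UINT32_C(1) << 5,  "MLSPERR"     }, { UINT32_C(1) << 7,  "MMARVALID"  },
    { UINT32_C(1) << 8,  "IBUSERR"     }, { UINT32_C(1) << 9,  "PRECISERR"  },
    { UINT32_C(1) << 10, "IMPRECISERR" }, { UINT32_C(1) << 11, "UNSTKERR"   },
    { UINT32_C(1) << 12, "STKERR"      }, { UINT32_C(1) << 13, "LSPERR"     },
    { UINT32_C(1) << 15, "BFARVALID"   }, { UINT32_C(1) << 16, "UNDEFINSTR" },
    { UINT32_C(1) << 17, "INVSTATE"    }, { UINT32_C(1) << 18, "INVPC"      },
    { UINT32_C(1) << 19, "NOCP"        }, { UINT32_C(1) << 24, "UNALIGNED"  },
    { UINT32_C(1) << 25, "DIVBYZERO"   },
};

/* HFSR.FORCED (bit 30): the HardFault is an escalated configurable fault,
 * so the real cause is reported in CFSR. */
int hardfault_is_escalated(uint32_t hfsr)
{
    return (hfsr & (UINT32_C(1) << 30)) != 0;
}

/* Stores the names of the flags set in cfsr (lowest bit first) into names[]
 * and returns how many were stored, at most max_names. */
size_t list_cfsr_flags(uint32_t cfsr, const char *names[], size_t max_names)
{
    size_t count = 0;
    for (size_t i = 0; i < sizeof CFSR_FLAGS / sizeof CFSR_FLAGS[0]; ++i) {
        if ((cfsr & CFSR_FLAGS[i].mask) != 0 && count < max_names) {
            names[count++] = CFSR_FLAGS[i].name;
        }
    }
    return count;
}
```

For example, a CFSR value of 0x00008200 yields PRECISERR and BFARVALID, which is the combination reported for a precise data access fault.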

6. Debugging the HardFault

In order to debug the HardFault, we need to recover the useful information found in the above-mentioned registers. Furthermore, we need to recover the call stack as well as the core registers to get the instruction that caused the HardFault. 

Note: For better debuggability, the compiler optimization level should be set to 0 or 1. To make sure we observe the actual behavior of the CPU, we set the optimization level to 0:

STea_2-1716228854110.png

 


7. Halting and determining the core register state


To analyze the HardFault, we need to halt the execution and read the values of the core registers.

Below is an example of how to do this.
Set a software breakpoint in your HardFault handler:

void HardFault_Handler(void)
{
  /* USER CODE BEGIN HardFault_IRQn 0 */
#ifdef DEBUG
  __BKPT(0);
#endif
  /* USER CODE END HardFault_IRQn 0 */
  while (1)
  {
    /* USER CODE BEGIN W1_HardFault_IRQn 0 */
    /* USER CODE END W1_HardFault_IRQn 0 */
  }
}


Alternatively, we can enable the halt in exception option in the debug configuration of STM32CubeIDE:

STea_1-1716228776247.png

 

Now, we can visualize the content of core registers and get the stack pointer value:

STea_0-1716228602733.png

 

8. How to recover the call stack

To recover the call stack and examine the CPU state that led to the HardFault, we need to do the following:

  1. Read the value of the stack pointer (SP) from the CPU registers.
  2. Paste the SP address into the memory view.
  3. Read the eight consecutive 32-bit words starting at SP as follows:
  • 1st word (SP + 0x00): R0
  • 2nd word (SP + 0x04): R1
  • 3rd word (SP + 0x08): R2
  • 4th word (SP + 0x0C): R3
  • 5th word (SP + 0x10): R12
  • 6th word (SP + 0x14): LR
  • 7th word (SP + 0x18): PC
  • 8th word (SP + 0x1C): xPSR

  STea_3-1716229017345.png

  4. Copy the stacked PC value into the disassembly view to see the instruction that caused the HardFault.

    STea_4-1716229144420.png
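
The eight words listed above map directly onto a C structure, which is convenient when the frame has to be read by firmware instead of through the memory view. The following is a minimal sketch; ExceptionFrame, read_exception_frame, and frame_is_on_psp are hypothetical names. It assumes that sp holds the stack pointer value at exception entry, before the handler has pushed anything of its own. Obtaining that value inside a handler needs a few lines of target-specific assembly, which are not shown here.

```c
#include <stdint.h>

/* Basic exception frame pushed by the core on exception entry,
 * in ascending address order starting at SP. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* return address of the interrupted function */
    uint32_t pc;    /* address of the instruction that was interrupted */
    uint32_t xpsr;
} ExceptionFrame;

/* EXC_RETURN (the LR value inside the handler), bit 2:
 * 0 = the frame was stacked on MSP, 1 = the frame was stacked on PSP. */
int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & (UINT32_C(1) << 2)) != 0;
}

/* Copies the eight stacked words starting at sp into a structure. */
ExceptionFrame read_exception_frame(const uint32_t *sp)
{
    ExceptionFrame frame = {
        sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7]
    };
    return frame;
}
```

If the application uses an RTOS, threads typically run on PSP, so checking EXC_RETURN before choosing the stack pointer avoids decoding the wrong stack.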

     

8.1. Fault analyzer

The fault analyzer gives information about the fault that occurred, based on the fault related registers. 
Note: The fault analyzer feature is not available for Cortex M0 based microcontrollers.

STea_5-1716231132407.png

 

9. Examples

In this section we go through three examples of an access to a reserved address, 0x00100000:

  • Trigger a BusFault (if enabled) or HardFault by reading from a reserved address
  • Trigger a BusFault (if enabled) or HardFault by writing to a reserved address
  • Trigger a BusFault (if enabled) or HardFault by executing at a reserved address
STea_6-1716232752375.png

To run each test, uncomment the corresponding define.

#include "main.h"

/* Private includes ----------------------------------------------------------*/
/* Private typedef -----------------------------------------------------------*/
/* Private define ------------------------------------------------------------*/
#define ADDRESS 0x00100000

#define READ_DATA_ACCESS               /* Uncomment only to run Test1 */
//#define WRITE_DATA_ACCESS            /* Uncomment only to run Test2 */
//#define FETCH_INSTRUCTION_ACCESS     /* Uncomment only to run Test3 */

/* Private macro -------------------------------------------------------------*/
/* Private variables ---------------------------------------------------------*/
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);

/* Private user code ---------------------------------------------------------*/

/**
  * @brief  The application entry point.
  * @retval int
  */
int main(void)
{
  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();

  /* Configure the system clock */
  SystemClock_Config();

  int r;
  volatile unsigned int* p;
  int (*pF)(void);

#ifdef READ_DATA_ACCESS
  p = (unsigned int*)ADDRESS;        // reserved address
  r = *p;                            // read from reserved address
#endif

#ifdef WRITE_DATA_ACCESS
  //SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk; // Disable Write Buffer
  p = (unsigned int*)ADDRESS;        // reserved address
  *p = 0xDEEDBEEF;                   // write to reserved address
#endif

#ifdef FETCH_INSTRUCTION_ACCESS
  pF = (int(*)(void))(ADDRESS+1);    // reserved address
  r = pF();                          // fetch instruction from reserved address
#endif

  while (1)
  {
  }
}

9.1. Read data access

STea_7-1716232929587.png

9.2. Observations

When running the project, a HardFault is triggered.

The first step is to check the Fault Analyzer and Debug windows, which show useful information about the fault:

(1): The HardFault was caused by the escalation of another exception, a BusFault.

(2) (3): The BusFault was caused by a precise data access to the address 0x00100000.

(4): PC = 0x0800022E is the value of the PC just before the exception occurred, that is, the address of the instruction that caused the fault.

STea_8-1716233062624.png


9.3. Write Data Access

STea_0-1716311241462.png

 

When running the project, a HardFault is triggered due to a BusFault escalation.

This time the BusFault is caused by an imprecise data access (no address is recorded, because BFAR is not valid).

The value of the PC just before the exception occurred is 0x08000232.

This is not the address of the faulting write, which is 0x08000230.

Note: The BusFault is raised some instructions after the write instruction.

 

STea_1-1716311419524.png

IMPRECISERR = 1: imprecise data access violation. The return address is not related to the fault.

BFARVALID = 0: BFAR is not valid.

A bus fault becomes imprecise because of the write buffers in the processor bus interface.

Debugging imprecise bus faults is a bit harder than debugging precise bus faults because, by the time the bus fault exception is triggered, the processor could have executed several instructions, including branch instructions.

If the branch target can be reached via several paths, it can be hard to tell where the faulting memory access took place.

To help with debugging such situations, you can disable the write buffer using the DISDEFWBUF bit in the Auxiliary Control Register (ACTLR).

To disable the write buffer in the example, uncomment the SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk; line (line 40 of the example source).

STea_2-1716311555136.png

After disabling the write buffer, the fault becomes precise.

BFARVALID = 1 BFAR is valid (contains the address of the location that generated a BusFault).

The value of the PC just before the exception occurred, shown in the debug window (it can also be found in the stacked exception frame), now points to the faulting instruction.
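
Disabling the write buffer as described above is a single bit set in ACTLR. The sketch below assumes the Cortex®-M3/M4 position of DISDEFWBUF (bit 1); the helper name disable_default_write_buffer and the local macro are hypothetical, and the CMSIS equivalent is the line already present in the example, SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk. The register is passed as a pointer only so that the logic can be checked off-target.

```c
#include <stdint.h>

/* ACTLR.DISDEFWBUF on Cortex-M3/M4 (local name, assumed for illustration). */
#define ACTLR_DISDEFWBUF (UINT32_C(1) << 1)

/* Hypothetical helper: disables write buffering for default memory map
 * accesses so that BusFaults on writes are reported as precise.
 * On target, call it as disable_default_write_buffer(&SCnSCB->ACTLR). */
void disable_default_write_buffer(volatile uint32_t *actlr)
{
    *actlr |= ACTLR_DISDEFWBUF;
}
```

Every buffered write now stalls the core until it completes, which costs performance, so this setting is best kept to debug builds.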

Note: To make imprecise faults easier to debug, instead of disabling the write buffer we can use the MPU (memory protection unit) to change the memory type of the region that contains the reserved address to Strongly Ordered.
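
As a rough illustration of the MPU approach, the sketch below computes the RBAR and RASR values for a Strongly Ordered region, assuming the ARMv7-M MPU register layout used by Cortex®-M3/M4 (TEX = 0, C = 0, B = 0 selects Strongly Ordered). The helper names rasr_strongly_ordered and rbar_for_region are hypothetical, and the choice of full access permissions and XN = 1 is an assumption made for this example, not something the article prescribes.

```c
#include <stdint.h>

/* RASR field positions (ARMv7-M MPU). */
#define RASR_ENABLE    (UINT32_C(1) << 0)
#define RASR_SIZE(n)   ((uint32_t)(n) << 1)    /* region size = 2^(n+1) bytes */
#define RASR_TEX(n)    ((uint32_t)(n) << 19)
#define RASR_AP_FULL   (UINT32_C(3) << 24)     /* read/write, all privilege levels */
#define RASR_XN        (UINT32_C(1) << 28)     /* no instruction fetch */

/* RASR value for an enabled Strongly Ordered region of 2^size_log2 bytes.
 * The C (bit 17) and B (bit 16) attribute bits are left at 0 on purpose. */
uint32_t rasr_strongly_ordered(uint32_t size_log2)
{
    return RASR_XN | RASR_AP_FULL | RASR_TEX(0) |
           RASR_SIZE(size_log2 - 1u) | RASR_ENABLE;
}

/* RBAR value selecting the region number through the VALID bit (bit 4).
 * The base address must be aligned to the region size. */
uint32_t rbar_for_region(uint32_t base, uint32_t region)
{
    return (base & ~UINT32_C(0x1F)) | (UINT32_C(1) << 4) | (region & 0xFu);
}
```

For a 1 MB region starting at the reserved address, the values would be written on target as MPU->RBAR = rbar_for_region(0x00100000, 0) and MPU->RASR = rasr_strongly_ordered(20), followed by enabling the MPU through MPU->CTRL with the default memory map kept for privileged code. Because this sketch sets XN, an instruction fetch from the region would be reported as a MemManage fault instead of a BusFault.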

9.4. Fetch instruction access

STea_3-1716311762352.png

 

FORCED = 1: BusFault escalated to HardFault.

IBUSERR = 1: BusFault on instruction prefetch.

The address of the location that generated the BusFault is given by the stacked PC (PC = 0x00100000), not by BFAR (unlike the precise data error).

The BusFault is triggered immediately on execution at an invalid address.
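
The three examples differ in where the faulting address can be found, and this can be summarized in a small decision function. This is a sketch assuming the Cortex®-M3/M4 CFSR bit positions; FaultAddrSource and bus_fault_address are hypothetical names, and stacked_pc stands for the PC value read from the exception frame.

```c
#include <stdint.h>

#define CFSR_IBUSERR     (UINT32_C(1) << 8)
#define CFSR_PRECISERR   (UINT32_C(1) << 9)
#define CFSR_IMPRECISERR (UINT32_C(1) << 10)
#define CFSR_BFARVALID   (UINT32_C(1) << 15)

typedef enum {
    FAULT_ADDR_UNKNOWN,    /* imprecise fault: no address is recorded        */
    FAULT_ADDR_FROM_BFAR,  /* precise data access: BFAR holds the address    */
    FAULT_ADDR_FROM_PC     /* instruction fetch: the stacked PC is the address */
} FaultAddrSource;

/* Decides where the faulting address of a BusFault comes from and stores it
 * in *addr (0 when it cannot be determined). */
FaultAddrSource bus_fault_address(uint32_t cfsr, uint32_t bfar,
                                  uint32_t stacked_pc, uint32_t *addr)
{
    if ((cfsr & CFSR_PRECISERR) != 0 && (cfsr & CFSR_BFARVALID) != 0) {
        *addr = bfar;
        return FAULT_ADDR_FROM_BFAR;
    }
    if ((cfsr & CFSR_IBUSERR) != 0) {
        *addr = stacked_pc;
        return FAULT_ADDR_FROM_PC;
    }
    *addr = 0u;
    return FAULT_ADDR_UNKNOWN;
}
```

With the values from the examples, the read test resolves to BFAR (0x00100000), the write test with the write buffer enabled resolves to unknown, and the fetch test resolves to the stacked PC (0x00100000).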

Conclusion

In this article we presented some basic debugging techniques for fault handling. They can be useful for any embedded engineer developing an application on most STM32 MCUs based on Arm Cortex®-M3/M4 CPUs. On Cortex®-M33 or Cortex®-M7 based MCUs, there can be other sources of HardFault, such as secure faults and faults related to cache management.

 


Comments

The Cortex-M0(+) is more fussy with memory alignment, so pointers that fetch doubles or uint64_t can be particularly problematic, say with unaligned structs in memory, files, or serial data streams.

Things like LDRD/STRD

Add a Handler that provides information from products in the field, or that you can't readily debug in person.

Believe me a while(1) loop that dies silently won't help your Technical Support Engineers identify modes of failure. If you can't learn something from a customer call, it's a wasted engagement.

https://github.com/cturvey/RandomNinjaChef/blob/main/KeilHardFault.c

Also for Error_Handler(), use the __FILE__, __LINE__ form so you know where it came from.

Version history
Last update:
‎2024-08-26 07:38 AM
Updated by: