How to debug a HardFault on an Arm Cortex®-M STM32
on 2024-06-27 05:00 AM - edited on 2024-08-26 07:38 AM by Laurids_PETERSE
Introduction
A firmware developer targeting an STM32 MCU based on an Arm Cortex®-M core needs to keep an eye on memory accesses, hardware availability, clocks, and power to avoid issues whose root cause can be hard to trace.
Nevertheless, any of us can end up at a dead end where we need to debug and overcome a HardFault.
In this article, we explain how to debug faults on Arm Cortex®-M based STM32 devices. In the process, we learn about fault registers, how to automate fault analysis, and figure out ways to recover from some faults without rebooting the MCU. We include practical examples, with a step-by-step walkthrough on how to investigate them.
Table of contents
- Introduction
- Table of contents
- 1. HardFault definition
- 2. Causes of HardFault
- 3. Types of HardFault
- 3.1. MemManage Fault
- 3.2. Bus Fault
- 3.3. Usage Fault
- 4. Determining the fault cause
- 5. Configurable Fault Status Register
- 6. Debugging the HardFault
- 7. Halting and determining the core register state
- 8. How to recover the call stack
- 8.1. Fault analyzer
- 9. Examples
- 9.1. Read data access
- 9.2. Observations
- 9.3. Write Data Access
- Conclusion
- Related links
1. HardFault definition
A HardFault is a type of fault that occurs on Arm Cortex®-M processors, which are commonly used in microcontroller applications. It indicates a serious failure in the system, and it is triggered by various conditions that the processor cannot handle through other exception mechanisms.
In other words, a HardFault is an exception that occurs because of an error during exception processing, or because an exception cannot be managed by any other exception mechanism. HardFaults have a fixed priority of -1, meaning they have higher priority than any exception with configurable priority.
2. Causes of HardFault
If your firmware ends up in the HardFault handler during development, you potentially have one of the following issues:
- Executing an undefined instruction
- Executing program code from a memory region marked as eXecute Never (XN)
- Writing to a memory region marked as read-only
- Accessing an invalid memory location
- Accessing a privileged-access-only register from unprivileged software
- Accessing an unaligned memory location
3. Types of HardFault
A HardFault is not always raised directly. It can also be the result of another fault exception that is not enabled and is therefore escalated to HardFault. Below are the exceptions that can be escalated to HardFault. We take the Cortex®-M4 based MCUs as an example; the list is longer for Cortex®-M33 based MCUs.
3.1. MemManage Fault
An exception that occurs because of a memory protection related fault. The MPU or the fixed memory protection constraints determine this fault, for both instruction and data memory transactions. This fault is always used to abort instruction accesses to Execute Never (XN) memory regions.
3.2. Bus Fault
An exception that occurs because of a memory related fault for an instruction or data memory transaction. This might be from an error detected on a bus in the memory system.
3.3. Usage Fault
An exception that occurs because of a fault related to instruction execution.
The fault exceptions other than HardFault can be enabled by setting their enable bits (USGFAULTENA, BUSFAULTENA, MEMFAULTENA) in the System Handler Control and State Register (SCB->SHCSR).
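As a minimal sketch of that write, the helper below sets the three enable bits. The bit positions (16, 17, 18) are the Armv7-M ones used on Cortex®-M3/M4; the function name and the pointer parameter are hypothetical and only there so the logic can be exercised off target. On the MCU you would pass `&SCB->SHCSR` or use the CMSIS `SCB_SHCSR_*_Msk` masks directly.

```c
#include <stdint.h>

/* Enable bits in SCB->SHCSR on Armv7-M (Cortex-M3/M4). */
#define SHCSR_MEMFAULTENA ((uint32_t)1u << 16)
#define SHCSR_BUSFAULTENA ((uint32_t)1u << 17)
#define SHCSR_USGFAULTENA ((uint32_t)1u << 18)

/* Hypothetical helper: sets the three enable bits in the register that
 * `shcsr` points to, leaving all other bits untouched.
 * On target, call it as enable_configurable_faults(&SCB->SHCSR). */
void enable_configurable_faults(volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```

With these bits set, a BusFault, MemManage fault, or UsageFault is taken by its own handler instead of being escalated to HardFault, provided its priority allows it to preempt the running code.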
4. Determining the fault cause
To help detect what type of error was encountered in the fault handler, the Cortex®-M3 and Cortex®-M4 processors also have a number of Fault Status Registers (FSRs) and Fault Address Registers (FARs) that are used for fault analysis.
5. Configurable Fault Status Register
The Configurable Fault Status Register (CFSR) indicates the cause of a MemManage fault, BusFault, or UsageFault and is divided into three sub-registers: MMFSR, BFSR, and UFSR. Besides accessing CFSR as a 32-bit word, each part of the CFSR can be accessed using byte and half-word transfers. There is no CMSIS-Core symbol for the individual MMFSR, BFSR, and UFSR.
This table shows the types of fault, the handler used for the fault, the corresponding fault status register, and the register bit that indicates that the fault has occurred. See the configurable fault status register (CFSR; UFSR+BFSR+MMFSR) on page 237 and HFSR on page 241 of the programming manual PM0214 for more information.
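Because CMSIS-Core only exposes the combined register, a small helper can split a CFSR value into its three parts. The sketch below assumes the Armv7-M layout (MMFSR in bits 7:0, BFSR in bits 15:8, UFSR in bits 31:16); the type and function names are hypothetical.

```c
#include <stdint.h>

/* The three sub-registers packed into the 32-bit CFSR. */
typedef struct {
    uint8_t  mmfsr; /* CFSR[7:0]:   MemManage Fault Status Register */
    uint8_t  bfsr;  /* CFSR[15:8]:  BusFault Status Register        */
    uint16_t ufsr;  /* CFSR[31:16]: UsageFault Status Register      */
} cfsr_parts_t;

/* Hypothetical helper: splits a CFSR value read from SCB->CFSR. */
cfsr_parts_t cfsr_split(uint32_t cfsr)
{
    cfsr_parts_t parts;
    parts.mmfsr = (uint8_t)(cfsr & 0xFFu);
    parts.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    parts.ufsr  = (uint16_t)(cfsr >> 16);
    return parts;
}
```

On target this would be called as `cfsr_split(SCB->CFSR)`. The CFSR bits are sticky and are cleared by writing 1 to them, so a handler that recovers from a fault should clear them after reading.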
6. Debugging the HardFault
To debug the HardFault, we need to recover the information held in the registers mentioned above. We also need to recover the call stack and the core registers to find the instruction that caused the HardFault.
Note: For better debuggability, the compiler optimization level should be set to 0 or 1. To make sure we are visualizing the actual behavior of the CPU, we set the optimization level to 0 in this article.
7. Halting and determining the core register state
To analyze the HardFault, we need to halt the execution and read the values of the core registers.
Below is an example of how to do this.
Set a software breakpoint in the HardFault handler:
void HardFault_Handler(void)
{
  /* USER CODE BEGIN HardFault_IRQn 0 */
#ifdef DEBUG
  __BKPT(0); /* Halt here when a debugger is attached */
#endif
  /* USER CODE END HardFault_IRQn 0 */
  while (1)
  {
    /* USER CODE BEGIN W1_HardFault_IRQn 0 */
    /* USER CODE END W1_HardFault_IRQn 0 */
  }
}
Alternatively, we can enable the halt on exception option in the debug configuration in CubeIDE.
Once execution is halted, we can view the content of the core registers and get the stack pointer value.
8. How to recover the call stack
To recover the call stack and examine the CPU state that led to the HardFault, we need to do the following:
- Extract the value of the stack pointer (SP) from the CPU registers
- Copy the SP address and paste it in the memory view
- Read the eight 32-bit words starting at that address as follows:
- 1st word (SP + 0x00): R0
- 2nd word (SP + 0x04): R1
- 3rd word (SP + 0x08): R2
- 4th word (SP + 0x0C): R3
- 5th word (SP + 0x10): R12
- 6th word (SP + 0x14): LR
- 7th word (SP + 0x18): PC
- 8th word (SP + 0x1C): xPSR
- Copy the PC content and paste it in the disassembly view to see the instruction that caused the HardFault
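The same eight-word layout can be read in code. The sketch below is an illustration, not the article's project code: the struct and function names are hypothetical, and it assumes `sp` holds the stack pointer value as it was at exception entry (before the handler pushed anything) and that no floating-point context was stacked. Bit 2 of the EXC_RETURN value found in LR inside the handler tells which stack holds the frame.

```c
#include <stdint.h>

/* Basic exception frame pushed by the core on exception entry. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* EXC_RETURN bit 2: 0 = frame is on the main stack (MSP),
 * 1 = frame is on the process stack (PSP). */
int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0u;
}

/* Hypothetical helper: copies the eight stacked words starting at `sp`. */
exception_frame_t read_exception_frame(const uint32_t *sp)
{
    exception_frame_t frame;
    frame.r0   = sp[0];
    frame.r1   = sp[1];
    frame.r2   = sp[2];
    frame.r3   = sp[3];
    frame.r12  = sp[4];
    frame.lr   = sp[5];
    frame.pc   = sp[6]; /* return address stacked at exception entry */
    frame.xpsr = sp[7];
    return frame;
}
```

On target, `sp` would typically come from `__get_MSP()` or `__get_PSP()` in a small assembly or naked wrapper that runs before any C prologue.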
8.1. Fault analyzer
The fault analyzer gives information about the fault that occurred, based on the fault related registers.
Note: The fault analyzer feature is not available for Cortex M0 based microcontrollers.
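When no debugger view is available, a first-level summary can be derived from the HardFault Status Register in software. This is only a rough sketch of that idea, not how the CubeIDE fault analyzer is implemented; the enum and function names are hypothetical, and the bit positions (VECTTBL bit 1, FORCED bit 30, DEBUGEVT bit 31) are the Armv7-M ones.

```c
#include <stdint.h>

typedef enum {
    HF_NONE,     /* no cause bit set                                  */
    HF_FORCED,   /* escalated configurable fault: inspect CFSR next   */
    HF_VECTTBL,  /* bus fault while reading the vector table          */
    HF_DEBUGEVT  /* debug event, e.g. BKPT without a debugger attached */
} hardfault_cause_t;

/* Hypothetical helper: classifies a value read from SCB->HFSR. */
hardfault_cause_t hfsr_cause(uint32_t hfsr)
{
    if (hfsr & ((uint32_t)1u << 30)) {
        return HF_FORCED;
    }
    if (hfsr & ((uint32_t)1u << 1)) {
        return HF_VECTTBL;
    }
    if (hfsr & ((uint32_t)1u << 31)) {
        return HF_DEBUGEVT;
    }
    return HF_NONE;
}
```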
9. Examples
In this section we go through three examples of an access to a reserved address, 0x00100000:
- Trigger a BusFault (if enabled) or HardFault by reading from a reserved address
- Trigger a BusFault (if enabled) or HardFault by writing to a reserved address
- Trigger a BusFault (if enabled) or HardFault by executing at a reserved address
To run each test, uncomment the corresponding define and leave the other two commented out.
#include "main.h"
/* Private includes ----------------------------------------------------------*/
/* Private typedef -----------------------------------------------------------*/
/* Private define ------------------------------------------------------------*/
#define ADDRESS 0x00100000
#define READ_DATA_ACCESS /*Uncomment only to run Test1*/
//#define WRITE_DATA_ACCESS /*Uncomment only to run Test2*/
//#define FETCH_INSTRUCTION_ACCESS /*Uncomment only to run Test3*/
/* Private macro -------------------------------------------------------------*/
/* Private variables ---------------------------------------------------------*/
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);
/* Private user code ---------------------------------------------------------*/
/**
* @brief The application entry point.
* @retval int
*/
int main(void)
{
  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();
  /* Configure the system clock */
  SystemClock_Config();
  int r;
  volatile unsigned int* p;
  int (*pF)(void);
#ifdef READ_DATA_ACCESS
  p = (unsigned int*)ADDRESS; // reserved address
  r = *p; // read from reserved address
#endif
#ifdef WRITE_DATA_ACCESS
  //SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk ; // Disable Write Buffer
  p = (unsigned int*)ADDRESS; // reserved address
  *p = 0xDEEDBEEF; // write to reserved address
#endif
#ifdef FETCH_INSTRUCTION_ACCESS
  pF = (int(*)(void))(ADDRESS+1); // reserved address (+1 sets the Thumb bit)
  r = pF(); // fetch instruction from reserved address
#endif
  while (1)
  {
  }
}
9.1. Read data access
9.2. Observations
When running the project, a HardFault is triggered.
The first step is to check the Fault Analyzer and Debug windows, which show useful information about the fault:
(1): The HardFault was caused by escalation of another exception, a BusFault.
(2) (3): The BusFault was caused by a precise data access to address 0x00100000.
(4): PC = 0x0800022E is the value of PC just before the exception occurred, that is, the address of the instruction that caused the fault.
9.3. Write Data Access
When running the project, a HardFault is triggered by escalation of a BusFault.
This time the BusFault is caused by an imprecise data access, so no address is reported in BFAR.
The value of PC just before the exception occurred is 0x08000232.
This is not the address of the faulting write instruction, which is at 0x08000230.
Note: The BusFault is raised some instructions after the write instruction.
- IMPRECISERR = 1: imprecise data access violation. The return address is not related to the fault.
- BFARVALID = 0: BFAR is not valid.
A bus fault becomes imprecise because of the write buffers in the processor bus interface: the write is buffered and the processor keeps executing while the bus transaction completes.
Debugging imprecise bus faults is harder than debugging precise bus faults because, by the time the bus fault exception is triggered, the processor could have executed several instructions, including branch instructions.
If the branch target can be reached via several paths, it can be hard to tell where the faulting memory access took place.
To help with debugging such situations, you can disable the write buffer using the DISDEFWBUF bit in the Auxiliary Control Register (ACTLR).
To disable the write buffer, uncomment the SCnSCB->ACTLR line at the start of the WRITE_DATA_ACCESS block in the example code.
After disabling the write buffer, the fault becomes precise.
- BFARVALID = 1: BFAR is valid (contains the address of the location that generated the BusFault).
The value of PC just before the exception occurred, shown in the debug window (it can also be found in the stacked exception frame), now points to the faulting instruction.
Note:
To make imprecise faults easier to debug, instead of disabling the write buffer we can use the MPU (memory protection unit) to change the memory type of the region that contains the reserved address to Strongly Ordered.
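As an illustration of the MPU alternative, the sketch below only computes the value for the MPU Region Attribute and Size Register (MPU_RASR) of a Strongly Ordered region. It assumes the Armv7-M MPU layout (ENABLE bit 0, SIZE bits 5:1, AP bits 26:24, with TEX, C, and B all zero meaning Strongly Ordered); the function name is hypothetical. Selecting the region number, writing the base address to MPU_RBAR, and enabling the MPU are not shown.

```c
#include <stdint.h>

/* Field positions in MPU_RASR on Armv7-M. */
#define RASR_ENABLE   ((uint32_t)1u << 0)
#define RASR_SIZE_POS 1u
#define RASR_AP_FULL  ((uint32_t)3u << 24) /* AP = 0b011: full access */

/* Hypothetical helper: RASR value for an enabled, full-access,
 * Strongly Ordered region of 2^size_log2 bytes (size_log2 in 5..32).
 * TEX, C and B are left at zero, which selects Strongly Ordered. */
uint32_t mpu_rasr_strongly_ordered(uint32_t size_log2)
{
    uint32_t size_field = size_log2 - 1u; /* region size = 2^(SIZE + 1) */
    return RASR_AP_FULL | (size_field << RASR_SIZE_POS) | RASR_ENABLE;
}
```

For a 1 MB region covering 0x00100000, `size_log2` is 20 and the base address written to MPU_RBAR must be aligned to the region size. The access to the reserved address still faults; the point is only that the fault is then reported precisely.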
Fetch instruction Access
- FORCED = 1: BusFault escalated to HardFault.
- IBUSERR = 1: BusFault on instruction prefetch.
The address of the location that generated the BusFault is given by the PC value (PC = 0x00100000), not by BFAR (unlike the precise data error).
The BusFault is triggered as soon as execution is attempted at the invalid address.
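The three cases seen in these examples differ only in a few BFSR bits, so they can be told apart in software. The following is a sketch under the Armv7-M bit layout (IBUSERR bit 0, PRECISERR bit 1, IMPRECISERR bit 2, BFARVALID bit 7 of BFSR, which sits in CFSR bits 15:8); the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    BUS_NONE,      /* none of the three bits is set                      */
    BUS_IFETCH,    /* IBUSERR: fault on instruction fetch                */
    BUS_PRECISE,   /* PRECISERR: stacked PC points to the faulting insn  */
    BUS_IMPRECISE  /* IMPRECISERR: stacked PC is unrelated to the fault  */
} bus_fault_kind_t;

/* Hypothetical helper: classifies the BusFault recorded in `cfsr` and
 * reports through `bfar_valid` whether BFAR holds the faulting address. */
bus_fault_kind_t classify_bus_fault(uint32_t cfsr, int *bfar_valid)
{
    uint32_t bfsr = (cfsr >> 8) & 0xFFu;

    *bfar_valid = (bfsr & 0x80u) != 0u; /* BFARVALID */
    if (bfsr & 0x01u) {
        return BUS_IFETCH;
    }
    if (bfsr & 0x02u) {
        return BUS_PRECISE;
    }
    if (bfsr & 0x04u) {
        return BUS_IMPRECISE;
    }
    return BUS_NONE;
}
```

A handler would read BFAR only when `bfar_valid` is set, and fall back to the stacked PC for an instruction fetch fault.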
Conclusion
In this article we presented some basic debugging techniques for fault handling. They can be useful to any embedded engineer developing applications on STM32 MCUs based on Arm Cortex®-M3/M4 CPUs. On Cortex®-M33 or Cortex®-M7 based MCUs there can be other sources of HardFault, such as secure faults and faults related to cache management.
Related links
Some more general links on debugging Cortex-M Hard Faults:
https://community.arm.com/support-forums/f/embedded-forum/3257/debugging-a-cortex-m0-hard-fault
The Cortex-M0(+) is more fussy with memory alignment, so pointers that fetch doubles or uint64_t can be particularly problematic, say with unaligned structs in memory, files, or serial data streams.
Things like LDRD/STRD
Add a Handler that provides information from products in the field, or that you can't readily debug in person.
Believe me a while(1) loop that dies silently won't help your Technical Support Engineers identify modes of failure. If you can't learn something from a customer call, it's a wasted engagement.
https://github.com/cturvey/RandomNinjaChef/blob/main/KeilHardFault.c
Also for Error_Handler(), use the __FILE__, __LINE__ form so you know where it came from.
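A minimal sketch of that form is shown below. The names `error_handler_at`, `ERROR_HANDLER`, `g_err_file`, and `g_err_line` are hypothetical and not part of the CubeMX-generated code; what to do with the recorded location (log it, keep it in retained RAM, reset) is left to the application.

```c
#include <stdint.h>

/* Last recorded error location, readable from a debugger or a log task. */
const char *volatile g_err_file;
volatile uint32_t g_err_line;

/* Hypothetical handler: records where the error was raised. */
void error_handler_at(const char *file, uint32_t line)
{
    g_err_file = file;
    g_err_line = line;
    /* On target: report or store the location here, then halt or reset. */
}

/* Captures the call site automatically. */
#define ERROR_HANDLER() error_handler_at(__FILE__, (uint32_t)__LINE__)
```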