on 2024-06-27 05:00 AM - edited on 2024-08-26 07:38 AM by Laurids_PETERSE
A firmware developer targeting an STM32 MCU based on an Arm® Cortex®-M core needs to keep an eye on memory access, hardware availability, clocks, and power to avoid issues whose root cause can be hard to trace.
Nevertheless, any of us can eventually hit a dead end where we need to debug and resolve a HardFault.
In this article, we explain how to debug faults on Arm Cortex®-M based STM32 devices. In the process, we learn about fault registers, how to automate fault analysis, and figure out ways to recover from some faults without rebooting the MCU. We include practical examples, with a step-by-step walkthrough on how to investigate them.
A HardFault is a type of fault that occurs on Arm Cortex®-M processors, which are commonly used in microcontroller applications. It indicates a serious failure in the system and is triggered by various conditions that the processor cannot handle through other exception mechanisms.
In other words, a HardFault is an exception that occurs because of an error during exception processing, or because an exception cannot be managed by any other exception mechanism. HardFaults have a fixed priority of -1, meaning they have higher priority than any exception with configurable priority.
If you get into the HardFault handler on your firmware during development, you potentially have one of the following issues:
In fact, a HardFault can also be raised by escalation: when a fault other than HardFault occurs and its handler is not enabled, it escalates to HardFault. Below are the exceptions that can be escalated to HardFault. We take the Cortex®-M4 based MCUs as an example; the list can be extended for the Cortex®-M33 based MCUs.
MemManage: An exception that occurs because of a memory protection related fault. The fixed memory protection constraints determine this fault, for both instruction and data memory transactions. This fault is always used to abort instruction accesses to Execute Never (XN) memory regions.
BusFault: An exception that occurs because of a memory related fault for an instruction or data memory transaction. This might be from an error detected on a bus in the memory system.
UsageFault: An exception that occurs because of a fault related to instruction execution.
Fault exceptions other than HardFault can be enabled by setting their enable bits (USGFAULTENA, BUSFAULTENA, MEMFAULTENA) in the System Handler Control and State Register (SCB->SHCSR).
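As a minimal sketch of enabling the three configurable fault handlers: the helper name enable_configurable_faults and the locally defined bit masks are illustrative assumptions, and the register is passed as a pointer only so that the logic can be exercised off-target.

```c
#include <stdint.h>

/* Enable bits in SCB->SHCSR on Cortex-M3/M4. */
#define SHCSR_MEMFAULTENA ((uint32_t)1u << 16)
#define SHCSR_BUSFAULTENA ((uint32_t)1u << 17)
#define SHCSR_USGFAULTENA ((uint32_t)1u << 18)

/* Hypothetical helper: sets the three enable bits and leaves all
 * other bits of the register untouched.
 * On target, pass &SCB->SHCSR. */
static inline void enable_configurable_faults(volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_USGFAULTENA | SHCSR_BUSFAULTENA | SHCSR_MEMFAULTENA;
}
```

With the CMSIS device header included, the equivalent one-liner would be SCB->SHCSR |= SCB_SHCSR_USGFAULTENA_Msk | SCB_SHCSR_BUSFAULTENA_Msk | SCB_SHCSR_MEMFAULTENA_Msk; once enabled, each fault is routed to its own handler instead of escalating to HardFault.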
To help detect what type of error was encountered in the fault handler, the Cortex®-M3 and Cortex®-M4 processors also have a number of Fault Status Registers (FSRs) and Fault Address Registers (FARs) that are used for fault analysis.
The CFSR indicates the cause of a MemManage fault, BusFault, or UsageFault and can be further divided into three registers. Besides accessing CFSR as a 32-bit word, each part of the CFSR can be accessed using byte and half-word transfers. There is no CMSIS-Core symbol for the divided MMFSR, BFSR, and UFSR.
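The split of the CFSR into its three parts can be sketched with plain shifts and masks on the 32-bit value. The accessor names below are illustrative assumptions, not CMSIS symbols.

```c
#include <stdint.h>

/* CFSR layout: MMFSR in bits [7:0], BFSR in bits [15:8],
 * UFSR in bits [31:16]. */
static inline uint8_t cfsr_mmfsr(uint32_t cfsr)
{
    return (uint8_t)(cfsr & 0xFFu);
}

static inline uint8_t cfsr_bfsr(uint32_t cfsr)
{
    return (uint8_t)((cfsr >> 8) & 0xFFu);
}

static inline uint16_t cfsr_ufsr(uint32_t cfsr)
{
    return (uint16_t)(cfsr >> 16);
}
```

On target, the argument would typically be the value read from SCB->CFSR at the start of the fault handler.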
This table shows the types of fault, the handler used for the fault, the corresponding fault status register, and the register bit that indicates that the fault has occurred. See the configurable fault status register (CFSR; UFSR+BFSR+MMFSR) on page 237 and HFSR on page 241 of the programming manual PM0214 for more information.
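To illustrate how the HFSR is read in practice, the sketch below classifies a HardFault from the HFSR value alone. The enum and function names are assumptions made for this example; the bit positions are those of the Cortex-M3/M4 HFSR.

```c
#include <stdint.h>

#define HFSR_VECTTBL  ((uint32_t)1u << 1)   /* bus fault on a vector table read        */
#define HFSR_FORCED   ((uint32_t)1u << 30)  /* configurable fault escalated            */
#define HFSR_DEBUGEVT ((uint32_t)1u << 31)  /* debug event while halting debug is off  */

typedef enum {
    HF_UNKNOWN,
    HF_VECTOR_TABLE,
    HF_ESCALATED,
    HF_DEBUG_EVENT
} hardfault_cause_t;

/* Hypothetical helper: returns the first cause flagged in HFSR.
 * When the result is HF_ESCALATED, the CFSR holds the original cause. */
static hardfault_cause_t hardfault_cause(uint32_t hfsr)
{
    if (hfsr & HFSR_FORCED)   return HF_ESCALATED;
    if (hfsr & HFSR_VECTTBL)  return HF_VECTOR_TABLE;
    if (hfsr & HFSR_DEBUGEVT) return HF_DEBUG_EVENT;
    return HF_UNKNOWN;
}
```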
In order to debug the HardFault, we need to recover the useful information found in the above-mentioned registers. Furthermore, we need to recover the call stack as well as the core registers to get the instruction that caused the HardFault.
Note: For better debuggability, the optimization level must be set to 0 or 1. To make sure we observe the actual behavior of the CPU, we set the optimization level to 0.
To analyze the HardFault, we need to halt the execution and read the values of the core registers.
Below is an example of how to do this:
Set a software breakpoint in your HardFault handler:
void HardFault_Handler(void)
{
  /* USER CODE BEGIN HardFault_IRQn 0 */
#ifdef DEBUG
  __BKPT(0);
#endif
  /* USER CODE END HardFault_IRQn 0 */
  while (1)
  {
    /* USER CODE BEGIN W1_HardFault_IRQn 0 */
    /* USER CODE END W1_HardFault_IRQn 0 */
  }
}
Alternatively, we can set the option halt in exception in CubeIDE in the debug configuration:
Now, we can visualize the content of core registers and get the stack pointer value:
To recover the call stack and examine the CPU state that led to the HardFault, we need to do the following:
The fault analyzer gives information about the fault that occurred, based on the fault related registers.
Note: The fault analyzer feature is not available for Cortex M0 based microcontrollers.
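When the fault analyzer is not available, the stacked core registers can be read manually from the exception frame. The following is a minimal sketch, assuming the basic eight-word frame (no floating-point context); the type and function names are hypothetical.

```c
#include <stdint.h>

/* Basic exception frame pushed by hardware on exception entry,
 * lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* EXC_RETURN bit 2 selects the stack that holds the frame:
 * 0 = main stack (MSP), 1 = process stack (PSP). */
static inline uint32_t select_frame_sp(uint32_t exc_return,
                                       uint32_t msp, uint32_t psp)
{
    return (exc_return & (1u << 2)) ? psp : msp;
}

/* Copy the eight stacked words into a named structure. */
static inline exception_frame_t read_exception_frame(const uint32_t *sp)
{
    exception_frame_t f = { sp[0], sp[1], sp[2], sp[3],
                            sp[4], sp[5], sp[6], sp[7] };
    return f;
}
```

On target, the EXC_RETURN value is in LR on entry to the handler, so a short assembly wrapper (not shown here) is usually needed to capture it before any C code runs. The pc field of the frame then points at, or just after, the instruction that was executing when the fault was taken.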
In this section, we go through three examples of an access to a reserved address, 0x00100000:
To run a test, uncomment the corresponding define and leave the other two commented out.
#include "main.h"
/* Private includes ----------------------------------------------------------*/
/* Private typedef -----------------------------------------------------------*/
/* Private define ------------------------------------------------------------*/
#define ADDRESS 0x00100000
#define READ_DATA_ACCESS /*Uncomment only to run Test1*/
//#define WRITE_DATA_ACCESS /*Uncomment only to run Test2*/
//#define FETCH_INSTRUCTION_ACCESS /*Uncomment only to run Test3*/
/* Private macro -------------------------------------------------------------*/
/* Private variables ---------------------------------------------------------*/
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);
/* Private user code ---------------------------------------------------------*/
/**
  * @brief The application entry point.
  * @retval int
  */
int main(void)
{
  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();
  /* Configure the system clock */
  SystemClock_Config();
  int r;
  volatile unsigned int* p;
  int (*pF)(void);
#ifdef READ_DATA_ACCESS
  p = (unsigned int*)ADDRESS; // reserved address
  r = *p; // read from reserved address
#endif
#ifdef WRITE_DATA_ACCESS
  //SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk ; // Disable Write Buffer
  p = (unsigned int*)ADDRESS; // reserved address
  *p = 0xDEEDBEEF; // write to reserved address
#endif
#ifdef FETCH_INSTRUCTION_ACCESS
  pF = (int(*)(void))(ADDRESS+1); // reserved address
  r = pF(); // fetch instruction from reserved address
#endif
  while (1)
  {
  }
}
When running the project with READ_DATA_ACCESS defined (Test1), a HardFault is triggered.
The first step is to check the Fault Analyzer and Debug windows.
They provide very useful information related to the fault.
(1): The HardFault was caused by the escalation of another exception, a BusFault.
(2) (3): The BusFault was caused by a precise data access to the address 0x100000.
(4): PC = 0x0800022E is the value of PC just before the exception occurred (the address of the instruction that caused the fault).
When running the project with WRITE_DATA_ACCESS defined (Test2), a HardFault is triggered due to a BusFault escalation.
This time, the BusFault is caused by an imprecise data access (no address is specified in BFAR).
The value of PC just before the exception occurred is PC = 0x08000232.
This is not the address of the faulting write instruction, which is 0x08000230.
Note: The BusFault is raised some instructions after the write instruction.
IMPRECISERR = 1
Imprecise data access violation. Return address not related to fault
BFARVALID = 0
BFAR not valid
A bus fault becomes imprecise because of the write buffers in the processor bus interface.
Debugging imprecise bus faults is a bit harder than debugging precise bus faults because, by the time the bus fault exception is triggered, the processor could have executed several instructions, including branch instructions.
If the branch target can be reached via several paths, it can be hard to tell where the faulting memory access took place.
To help with debugging such situations, you can disable the write buffer using the DISDEFWBUF bit in the Auxiliary Control register.
To disable the write buffer, uncomment the SCnSCB->ACTLR line at the start of the WRITE_DATA_ACCESS block in the code above.
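Since disabling the write buffer is a debugging aid, it can be convenient to be able to switch it on and off. The following is a minimal sketch, assuming the Cortex-M3/M4 ACTLR layout where DISDEFWBUF is bit 1; the helper names are hypothetical, and the register is passed as a pointer so the logic can be checked off-target.

```c
#include <stdint.h>

/* DISDEFWBUF is bit 1 of the Auxiliary Control Register on Cortex-M3/M4. */
#define ACTLR_DISDEFWBUF ((uint32_t)1u << 1)

/* Disable write buffering so that data bus faults are reported precisely.
 * On target, pass &SCnSCB->ACTLR. */
static inline void write_buffer_disable(volatile uint32_t *actlr)
{
    *actlr |= ACTLR_DISDEFWBUF;
}

/* Restore the default behavior (write buffer in use). */
static inline void write_buffer_enable(volatile uint32_t *actlr)
{
    *actlr &= ~ACTLR_DISDEFWBUF;
}
```

Disabling the write buffer slows down stores, because each store must complete before the next instruction executes, so this setting is usually kept out of release builds.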
After disabling the write buffer, the fault becomes precise.
BFARVALID = 1: BFAR is valid (it contains the address of the location that generated the BusFault).
The value of PC just before the exception occurred, shown in the debug window (it can also be found in the exception frame on the stack), now points to the faulty instruction.
Note: To make imprecise faults easier to debug, instead of disabling the write buffer, we can use the MPU (memory protection unit) to change the memory type of a region that contains the reserved address to Strongly Ordered.
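As an illustration of the MPU approach, the sketch below builds the attribute word for a Strongly Ordered region. It assumes the ARMv7-M MPU register layout (MPU_RASR) and full read/write access permissions; the macro and function names are hypothetical, and programming the region number and base address, as well as enabling the MPU, is not shown.

```c
#include <stdint.h>

/* Field positions in MPU_RASR (ARMv7-M MPU). */
#define RASR_ENABLE   ((uint32_t)1u << 0)
#define RASR_SIZE(n)  ((uint32_t)(n) << 1)   /* region size = 2^(n+1) bytes */
#define RASR_B        ((uint32_t)1u << 16)   /* bufferable                  */
#define RASR_C        ((uint32_t)1u << 17)   /* cacheable                   */
#define RASR_TEX(t)   ((uint32_t)(t) << 19)
#define RASR_AP_FULL  ((uint32_t)3u << 24)   /* read/write, all privilege levels */

/* Strongly Ordered memory is encoded as TEX = 0b000, C = 0, B = 0,
 * so the C and B bits are simply left clear. */
static inline uint32_t rasr_strongly_ordered(uint32_t size_field)
{
    return RASR_AP_FULL | RASR_TEX(0u) | RASR_SIZE(size_field) | RASR_ENABLE;
}
```

For example, rasr_strongly_ordered(19) describes a 1 MB region, which would cover the reserved address used in the tests if the region base is set to 0x00100000.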
Fetch instruction Access
FORCED = 1
BusFault escalated to HardFault.
IBUSERR = 1
BusFault on instruction prefetch.
The address of the location that generated the BusFault is given by the stacked PC (PC = 0x00100000), not by BFAR (unlike the precise data error).
The BusFault is triggered immediately on execution at an invalid address.
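The three tests differ in where the faulting address can be found. A minimal sketch of that decision, using the BFSR bit positions within the CFSR, is shown here; the function name and its parameters are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* BFSR flags, expressed as bit positions within the 32-bit CFSR. */
#define CFSR_IBUSERR     ((uint32_t)1u << 8)
#define CFSR_PRECISERR   ((uint32_t)1u << 9)
#define CFSR_IMPRECISERR ((uint32_t)1u << 10)
#define CFSR_BFARVALID   ((uint32_t)1u << 15)

/* Returns true and writes *addr when the faulting address is known.
 * Returns false for an imprecise fault, where it was not captured. */
static bool busfault_address(uint32_t cfsr, uint32_t bfar,
                             uint32_t stacked_pc, uint32_t *addr)
{
    if ((cfsr & CFSR_PRECISERR) && (cfsr & CFSR_BFARVALID)) {
        *addr = bfar;        /* precise data access: BFAR holds the address */
        return true;
    }
    if (cfsr & CFSR_IBUSERR) {
        *addr = stacked_pc;  /* instruction fetch: stacked PC is the address */
        return true;
    }
    return false;            /* imprecise data access */
}
```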
In this article, we presented some basic debugging techniques for fault handling. They can be useful for any embedded engineer developing any kind of application on most STM32 MCUs based on Arm Cortex®-M3/M4 CPUs. On Cortex®-M33 or Cortex®-M7 based MCUs, there can be other sources of HardFault, such as secure faults and faults related to cache management.
Some more general links on debugging Cortex-M Hard Faults:
https://community.arm.com/support-forums/f/embedded-forum/3257/debugging-a-cortex-m0-hard-fault
The Cortex-M0(+) is more fussy with memory alignment, so pointers that fetch doubles or uint64_t can be particularly problematic, for example with unaligned structs in memory, files, or serial data streams.
Things like LDRD/STRD
Add a handler that provides information from products in the field, or from ones that you can't readily debug in person.
Believe me, a while(1) loop that dies silently won't help your Technical Support Engineers identify modes of failure. If you can't learn something from a customer call, it's a wasted engagement.
https://github.com/cturvey/RandomNinjaChef/blob/main/KeilHardFault.c
Also, for Error_Handler(), use the __FILE__, __LINE__ form so you know where it came from.
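A rough sketch of the __FILE__, __LINE__ idea follows. The names error_site_t, g_last_error, error_handler_at, and ERROR_HANDLER are hypothetical and chosen so that they do not clash with the generated Error_Handler(); how the recorded site is reported (UART, retained RAM, and so on) is left open.

```c
/* Call site of the most recent error. */
typedef struct {
    const char *file;
    int line;
} error_site_t;

/* Hypothetical storage for the last recorded site. */
static volatile error_site_t g_last_error;

/* Record where the error was raised. */
static void error_handler_at(const char *file, int line)
{
    g_last_error.file = file;
    g_last_error.line = line;
    /* On target: report or store the site here, then halt or reset. */
}

/* The macro captures the caller's file and line automatically. */
#define ERROR_HANDLER() error_handler_at(__FILE__, __LINE__)
```

A call such as if (status != HAL_OK) { ERROR_HANDLER(); } then records the exact source location instead of ending silently in a while (1) loop.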