Cortex-M7 Hard Fault Handler - FreeRTOS Aware

Garnett.Robert · ‎2025-04-12

Hi All,

We all know how hard it can be to track down the cause of hard faults, particularly random or intermittent ones that occur when the debugger isn't connected, and particularly when using FreeRTOS in a complex system with many tasks, timers and queues. To assist, I have written a hard fault handler that prints all the MCU core registers, the floating point registers (if the FPU is in use), the special registers and the FreeRTOS task that was running when the hard fault occurred. It also lets you define memory ranges against which the program counter, link register and stack addresses are checked, to flag an address that is definitely invalid.
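As a rough illustration of the address-range check described above, the sketch below tests whether a stacked PC or stack pointer value falls inside a known memory region. The hf_ names, the region tables and the region sizes are assumptions made for this example (they are not taken from the attached files) and would have to match your own linker script.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous address range: [start, start + size). */
typedef struct {
    uint32_t start;
    uint32_t size;
} hf_mem_range_t;

/* ASSUMED example ranges; replace with values from your linker script. */
static const hf_mem_range_t hf_code_ranges[] = {
    { 0x08000000u, 0x00020000u },  /* internal flash (assumed size) */
    { 0x90000000u, 0x01000000u },  /* memory-mapped external flash (assumed size) */
};

static const hf_mem_range_t hf_ram_ranges[] = {
    { 0x20000000u, 0x00020000u },  /* DTCM RAM (assumed size) */
    { 0x24000000u, 0x00080000u },  /* AXI SRAM (assumed size) */
};

/* Returns true if addr lies inside any of the n ranges. */
static bool hf_addr_in_ranges(uint32_t addr, const hf_mem_range_t *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Subtract the start first so the comparison cannot overflow. */
        if (addr >= r[i].start && (addr - r[i].start) < r[i].size) {
            return true;
        }
    }
    return false;
}

/* A stacked PC outside every code range is definitely invalid. */
bool hf_pc_is_plausible(uint32_t pc)
{
    return hf_addr_in_ranges(pc, hf_code_ranges,
                             sizeof hf_code_ranges / sizeof hf_code_ranges[0]);
}

/* A stack pointer outside every RAM range is definitely invalid. */
bool hf_sp_is_plausible(uint32_t sp)
{
    return hf_addr_in_ranges(sp, hf_ram_ranges,
                             sizeof hf_ram_ranges / sizeof hf_ram_ranges[0]);
}
```

A check like this can only say that an address is definitely wrong; an address inside a valid range may still be the wrong one.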

I have checked it on a range of fault types and it seems to work OK. I'm not an ARM assembler expert; my assembler knowledge is quite limited, so I would appreciate it if the assembler/MCU experts could critique it and suggest fixes and mods.

I have also written a small module that generates hard faults for testing purposes. To make the handler easier to use, I have provided all the required files, including my startup code, interrupt handlers and linker script, in the attached zip file.

The output of the handler from TeraTerm is:

==============================

***** HardFault Occurred *****
Link Reg Value (Lockup Addr): 0xFFFFFFED
Stack frame: PSP Process Stack Ptr (Thread mode)

Fault status registers:
HFSR (HardFault Status):          0x00000000
CFSR (Configurable Fault Status): 0x00000082
MMFAR (MemManage Fault Addr):     0x00000000
BFAR (BusFault Addr):             0x00000000
AFSR (Auxiliary Fault Status):    0x00000000

MemManage Fault:
  - MMFAR valid (0x00000000)
  - Data access violation

HardFault details:
R0 : 0x00000006
R1 : 0x00001F40
R2 : 0xDEADBEEF
R3 : 0x00000000
R12: 0xDD6CE856
Stacked PC:  0x900441A2
Stacked PSR: 0x41000000
Stacked LR:  0x900442DD

Special registers:
CONTROL: 0x00000000
PRIMASK: 0x00000000
BASEPRI: 0x00000000
FAULTMASK: 0x00000000

FPU register dump:
FPSCR: 0x00000000
S0-S1:  0x00000000 0x00000000
S2-S3:  0x00000000 0x00000000
S4-S5:  0x00000000 0x00000000
S6-S7:  0x00000000 0x00000000
S8-S9:  0x00000000 0x00000000
S10-S11: 0x00000000 0x3F800000
S12-S13: 0x4A989680 0x40000000
S14-S15: 0x4E64E1C0 0x1C9C3800

Double precision registers:
D0:  0.000000 (0x000000000000000lX)
D1:  0.000000 (0x000000000000000lX)
D2:  0.000000 (0x000000000000000lX)
D3:  0.000000 (0x000000000000000lX)
D4:  0.000000 (0x000000000000000lX)
D5:  0.007812 (0x000000000000000lX)
D6:  2.000001 (0x000000000000000lX)
D7:  0.000000 (0x000000000000000lX)
----------------------------

freeRTOS Task Status:
----------------------------
Task Name: monitorTask
Task State: 0
Task Priority: 24
Task Stack High Water Mark: 77
Task Handle Addr: 2405e5e0
==== Hard Fault Report End ====
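
Two of the values in the report above can be decoded by hand. The sketch below, which is not the code from the attached handler, decodes the EXC_RETURN value held in the link register on exception entry and tests individual CFSR bits, using the bit positions defined by the ARMv7-M architecture. The hf_ function and type names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* MemManage bits in the low byte of CFSR (ARMv7-M). */
#define CFSR_IACCVIOL   (1u << 0)   /* instruction access violation */
#define CFSR_DACCVIOL   (1u << 1)   /* data access violation */
#define CFSR_MMARVALID  (1u << 7)   /* MMFAR holds a valid address */

/* BusFault bits in the second byte of CFSR. */
#define CFSR_PRECISERR  (1u << 9)   /* precise data bus error */
#define CFSR_BFARVALID  (1u << 15)  /* BFAR holds a valid address */

/* Decoded view of an EXC_RETURN value. */
typedef struct {
    bool uses_psp;     /* bit 2 set: frame was stacked on the PSP */
    bool thread_mode;  /* bit 3 set: exception was taken from Thread mode */
    bool fp_frame;     /* bit 4 clear: extended frame with FPU registers */
} hf_exc_return_t;

hf_exc_return_t hf_decode_exc_return(uint32_t lr)
{
    hf_exc_return_t info;
    info.uses_psp    = (lr & (1u << 2)) != 0u;
    info.thread_mode = (lr & (1u << 3)) != 0u;
    info.fp_frame    = (lr & (1u << 4)) == 0u;
    return info;
}

/* True if every bit in flag is set in cfsr. */
bool hf_cfsr_has(uint32_t cfsr, uint32_t flag)
{
    return (cfsr & flag) == flag;
}

/* CFSR is three status registers packed into one word. */
uint8_t  hf_cfsr_mmfsr(uint32_t cfsr) { return (uint8_t)(cfsr & 0xFFu); }
uint8_t  hf_cfsr_bfsr(uint32_t cfsr)  { return (uint8_t)((cfsr >> 8) & 0xFFu); }
uint16_t hf_cfsr_ufsr(uint32_t cfsr)  { return (uint16_t)(cfsr >> 16); }
```

For the values in the report, 0xFFFFFFED decodes to Thread mode, PSP, with an FPU frame stacked, and 0x00000082 has DACCVIOL and MMARVALID set, which agrees with the "Data access violation" and "MMFAR valid" lines.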

Information on using the handler can be found in the comment block at the start of HardFault_HandlerFreeRTOS.c

I hope people find this useful.

*This message has been edited to comply with the ST Community Terms and Conditions.

STOne-32 · ‎2025-04-16

Dear @Garnett.Robert ,

Thank you for this valuable contribution; it is very much appreciated by our ST Community. It would be great if you could post the same message as a comment/reply on this Knowledge Article: How to debug a HardFault on an Arm® Cortex®-M STM3... - STMicroelectronics Community

That way the topic stays in a single thread in future and gets more interaction from our members.

Thanks,

STOne-32.

Pavel A. · ‎2025-04-17

+1 Just a thought: calling printf in the context of the HardFault handler does not look like a good idea (other than for pure demonstration).

AMars.4 · ‎2025-04-17

Using SWO should be OK though, right?

Garnett.Robert · ‎2025-04-23

Using printf will only be a problem if the stack used for interrupts (the MSP stack) is corrupt or has overrun, because printf uses this stack when it is called from the hard fault handler.

This handler is designed for capturing hard faults in test systems where the faults occur infrequently and apparently at random, so that normal debugging is pretty useless. If the MSP stack is being corrupted then printf will either print rubbish or cause another hard fault. There is not much you can do about this, but it will indicate that the MSP stack has probably been corrupted or has overflowed because too many interrupts were nested.

The idea is to connect a PC to the virtual COM port (or a COM port if available) and print the output to a program like TeraTerm, where it can be logged to a file. So printf is required to get the data out; not using printf defeats its purpose. If you don't use printf then you have to use a debugger to examine the registers and memory, which is often impractical. If the hard fault is data driven, then allowing the hard fault handler to reset the system so it can continue to run enables logging of multiple hard faults, along with a timestamp if time is provided by the RTC peripheral.
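
One way to keep a running record across the reset mentioned above is a small structure in RAM that the startup code does not clear. The sketch below shows only the bookkeeping. The hf_log_t layout, the magic value and the function names are assumptions for illustration and are not part of the attached handler; on a target the structure would also have to be placed in a no-init linker section, and the handler would then trigger the reset (for example with the CMSIS NVIC_SystemReset() call).

```c
#include <stdbool.h>
#include <stdint.h>

#define HF_LOG_MAGIC 0x48464C47u  /* arbitrary marker value (assumed) */

/* Fault record intended to survive a warm reset. */
typedef struct {
    uint32_t magic;        /* HF_LOG_MAGIC once the record is initialised */
    uint32_t fault_count;  /* hard faults seen since power-up */
    uint32_t last_pc;      /* stacked PC of the most recent fault */
    uint32_t last_cfsr;    /* CFSR value of the most recent fault */
    uint32_t last_time;    /* caller-supplied timestamp, e.g. from the RTC */
} hf_log_t;

/* After power-up the RAM content is undefined, so trust it only if
 * the marker is present. */
bool hf_log_is_valid(const hf_log_t *log)
{
    return log->magic == HF_LOG_MAGIC;
}

/* Called from the fault handler just before the reset is requested. */
void hf_log_record(hf_log_t *log, uint32_t pc, uint32_t cfsr, uint32_t now)
{
    if (!hf_log_is_valid(log)) {
        log->magic = HF_LOG_MAGIC;
        log->fault_count = 0u;
    }
    log->fault_count++;
    log->last_pc = pc;
    log->last_cfsr = cfsr;
    log->last_time = now;
}
```

After the reset, the application can check hf_log_is_valid() early in main() and print or store the record before carrying on.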

I have used printf within the hard fault handler for infrequent, random faults and have found it to be very reliable. No interrupts are used for printf, only the stack; provided the stack is intact and each printf message is short, printf does not seem to cause problems.
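
If the stack used by printf is still a worry, one alternative is to format the register values by hand and push the characters out through a polled output routine. This is only a sketch of that idea and is not what the attached handler does; hf_u32_to_hex, hf_print_reg and the hf_putc_fn hook are hypothetical names, and the hook would have to be supplied by you (for example a routine that waits for the UART transmit register to empty and writes one byte).

```c
#include <stdint.h>

/* Hypothetical hook: sends one character using polled I/O. */
typedef void (*hf_putc_fn)(char c);

/* Writes "0x", eight uppercase hex digits and a terminating NUL
 * into out, which must hold at least 11 bytes. */
void hf_u32_to_hex(uint32_t value, char out[11])
{
    static const char digits[] = "0123456789ABCDEF";

    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 8; i++) {
        /* Most significant nibble first. */
        out[2 + i] = digits[(value >> (28 - 4 * i)) & 0xFu];
    }
    out[10] = '\0';
}

/* Emits "<name>: 0xXXXXXXXX\r\n" one character at a time. */
void hf_print_reg(const char *name, uint32_t value, hf_putc_fn put)
{
    char hex[11];

    hf_u32_to_hex(value, hex);
    while (*name != '\0') {
        put(*name++);
    }
    put(':');
    put(' ');
    for (int i = 0; hex[i] != '\0'; i++) {
        put(hex[i]);
    }
    put('\r');
    put('\n');
}
```

This needs only a few bytes of stack per call, at the cost of losing printf's formatting.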

I have a #define PRINT_HARD_FAULTS 1 to enable or disable printing for use with the debugger.

As for SWO, I have redirected the printout to it, but the debugger must be connected and the output to the SWO Data Console is not logged, although you can export it manually in CubeIDE. Printf is still used for SWO, so the issues with printf are the same as when using the COM port.

There is no rule against using printf; you just need to understand that it can cause more hard faults, print garbage, or both if the MSP becomes corrupt and points to non-RAM, or to RAM that the MPU defines as not usable. If for some reason the fault changed the MSP from the prescribed RAM to some other RAM, printf will probably work, although the RAM it uses will of course be overwritten.

To make your system robust when you have a number of tasks and much non-blocking I/O that could behave badly, use all the memory areas at your disposal and protect them with the Memory Protection Unit (MPU).

I stick my privileged mode stack and heap (MSP) in the first 8k of the DTCM RAM at 0x20000000 and set the MPU for privileged-only access for this area. This means that if naughty, unprivileged code tries to access this memory you will get an MPU fault and thus won't corrupt this stack. (The MPU only checks accesses made by the CPU, so it does not police DMA masters.) You can do this by:

  /* Configure the MPU region covering the privileged stack and heap in DTCM RAM (privileged access only) */
  MPU_InitStruct.Enable = MPU_REGION_ENABLE;
  MPU_InitStruct.Number = MPU_REGION_NUMBER11;
  MPU_InitStruct.BaseAddress = 0x20000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_16KB;
  MPU_InitStruct.AccessPermission = MPU_REGION_PRIV_RW;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
  MPU_InitStruct.SubRegionDisable = 0x00;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);
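
When picking the base address and size for a region like this, keep in mind that the ARMv7-M MPU only accepts power-of-two sizes of at least 32 bytes, with the base address aligned to the region size. The sketch below checks those two rules and computes the architectural SIZE field (log2 of the size, minus one). The mpu_ helper names are assumptions for this example and are not HAL functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if base/size satisfy the ARMv7-M MPU region rules:
 * size is a power of two, at least 32 bytes, and base is size-aligned. */
bool mpu_region_is_valid(uint32_t base, uint32_t size_bytes)
{
    if (size_bytes < 32u) {
        return false;
    }
    if ((size_bytes & (size_bytes - 1u)) != 0u) {
        return false;  /* not a power of two */
    }
    return (base & (size_bytes - 1u)) == 0u;
}

/* Returns the region SIZE field, where size = 2^(SIZE + 1),
 * or 0xFF if size_bytes is not a legal region size. */
uint8_t mpu_size_field(uint32_t size_bytes)
{
    uint8_t log2 = 0u;

    if (size_bytes < 32u || (size_bytes & (size_bytes - 1u)) != 0u) {
        return 0xFFu;
    }
    while ((1u << log2) < size_bytes) {
        log2++;
    }
    return (uint8_t)(log2 - 1u);
}
```

For example, a 16 KB region at 0x20000000 passes the check and has a SIZE field of 0x0D, whereas a 16 KB region at 0x20001000 would be rejected because the base is not aligned to 16 KB.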

With the M7 you have 16 memory regions you can define for the MPU, and it is very handy for stopping silly things happening. You can do the same sort of thing with the ITCM and of course the flash area. It's easy to use, doesn't cost anything and works well.

Hard faults are a pain, but really they can be your friend by protecting you from the forces of evil and darkness. Using printf is OK; you just need to understand its limitations.

It is possible to implement per task memory protection using FreeRTOS-MPU with the MPU, but I haven't tried this. If I give it a whirl I will post something about it.

Standard printf has its limitations for debugging. It uses blocking I/O, which makes it slow, so for time-critical debugging it can be useless. To get around this I have written a DMA version that uses non-blocking DMA for the I/O and a separate task, with queues passing the data from the fast task to the printdma task. It is very useful for logging where printf cannot be used. It is not suitable for small systems and it requires FreeRTOS, so it also has its limitations, but I have found it invaluable for tracking issues in more complex systems using Wi-Fi or Ethernet.
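
The general shape of such a deferred logger can be sketched with a plain ring buffer: the fast task copies its message into a slot and returns straight away, and a slower consumer drains the slots to the output. This is not my printdma code, just a minimal host-side illustration. The log_ring_ names and the slot sizes are assumptions, and on a real target the buffer would be replaced by a FreeRTOS queue, with the consumer task starting a DMA transfer for each message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LOG_SLOTS   8u   /* ring size; one slot is kept free (assumed value) */
#define LOG_MSG_MAX 64u  /* bytes per message including the NUL (assumed value) */

typedef struct {
    char     msg[LOG_SLOTS][LOG_MSG_MAX];
    uint32_t head;  /* next slot to write */
    uint32_t tail;  /* next slot to read */
} log_ring_t;

/* Producer side: never blocks. Returns false (message dropped)
 * when the ring is full. Long messages are truncated. */
bool log_ring_push(log_ring_t *r, const char *text)
{
    uint32_t next = (r->head + 1u) % LOG_SLOTS;

    if (next == r->tail) {
        return false;
    }
    strncpy(r->msg[r->head], text, LOG_MSG_MAX - 1u);
    r->msg[r->head][LOG_MSG_MAX - 1u] = '\0';
    r->head = next;
    return true;
}

/* Consumer side: copies the oldest message into out.
 * Returns false when there is nothing to send. */
bool log_ring_pop(log_ring_t *r, char out[LOG_MSG_MAX])
{
    if (r->tail == r->head) {
        return false;
    }
    memcpy(out, r->msg[r->tail], LOG_MSG_MAX);
    r->tail = (r->tail + 1u) % LOG_SLOTS;
    return true;
}
```

As written this is not safe between tasks or interrupts; it only shows the hand-off. Dropping messages when the ring is full is a design choice that keeps the fast task from ever waiting on the output.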

I know there are people who will criticise my use of FreeRTOS, but if you want to use easy graphics like TouchGFX then FreeRTOS is mandatory. Anyway, I like FreeRTOS for complex systems, so I will stick with it.