STM32NUCLEO F767 Hardfault Handler on new project generated in Cube IDE

raffin · ‎2020-05-19

After updating the Cube MX, I started a new project for the STM32NUCLEO F767ZI.

To the default settings I added FreeRTOS and LwIP through Cube MX.

Then I set TIM2 as timebase source. Finally I generated the code.

Surprisingly, when I start debugging I always get a Hard Fault.

The hard fault happens at random points during the peripheral init, and in one case it did not happen at all. I am investigating further, but any ideas would be appreciated.

turboscrew · ‎2020-05-19

Well, if it's about the core, read ARM documentation. If it's about SoC peripherals outside the core, read STM documentation.

In this case, start with ARM (ARMv7-M ARM, https://static.docs.arm.com/ddi0403/eb/DDI0403E_B_armv7m_arm.pdf)

If the other faults are not enabled, they are escalated to hard fault. To prevent that, enable the other faults in SHCSR: at least usage faults, bus faults and memory management faults.
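As a minimal sketch of the SHCSR step: the three enable bits sit at bit positions 16 to 18 (MEMFAULTENA, BUSFAULTENA, USGFAULTENA). The macro and function names below are made up for illustration; the CMSIS headers have their own names for these masks.

```c
#include <stdint.h>

/* Fault-enable bits in SHCSR (System Handler Control and State Register).
 * Names are illustrative, not the CMSIS ones. */
#define SHCSR_MEMFAULTENA (UINT32_C(1) << 16) /* MemManage fault enable */
#define SHCSR_BUSFAULTENA (UINT32_C(1) << 17) /* BusFault enable */
#define SHCSR_USGFAULTENA (UINT32_C(1) << 18) /* UsageFault enable */

/* Returns the given SHCSR value with the three configurable faults
 * enabled, leaving every other bit untouched. */
static uint32_t shcsr_enable_faults(uint32_t shcsr)
{
    return shcsr | SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}

/* On the target, with the CMSIS device header included, the call
 * would look like:
 *     SCB->SHCSR = shcsr_enable_faults(SCB->SHCSR);
 */
```

This would normally go early in main(), before the peripheral init code runs, so that a fault during init already reaches its own handler.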

Then you have three more exception vectors, but you can direct them all to the same handler that:

first checks what kind of fault it is (CFSR),
checks LR for the exception return code to see which stack contains the stack frame (ARMv7-M ARM, B1.5.8 Exception return behavior),
looks at that stack pointer (MSP or PSP),
checks whether the fault is "precise". If so, BFAR shows the address whose access caused the fault,
finds the exception frame from the SP upwards, and the exception return address in it (ARMv7-M ARM, B1.5.6 Exception entry behavior, PushStack() pseudo code).
The faulting instruction is either at the return address or is the previous instruction (depending...)
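The checks above can be sketched as plain C helpers. This is only an illustration: the function and macro names are made up, and a real handler still needs a small assembly stub to pass LR and the selected stack pointer in. The bit positions are the architectural ones: bit 2 of the exception return code selects MSP or PSP, and in CFSR bit 9 is PRECISERR and bit 15 is BFARVALID.

```c
#include <stdint.h>

/* Word offsets inside the basic exception frame, counted from the
 * stack pointer at exception entry: r0, r1, r2, r3, r12, lr, pc, xpsr. */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
       FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR };

#define EXC_RETURN_SPSEL (UINT32_C(1) << 2)  /* 0: frame on MSP, 1: frame on PSP */
#define CFSR_PRECISERR   (UINT32_C(1) << 9)  /* precise data bus error */
#define CFSR_IMPRECISERR (UINT32_C(1) << 10) /* imprecise data bus error */
#define CFSR_BFARVALID   (UINT32_C(1) << 15) /* BFAR holds a valid address */

/* Step 2: which stack holds the exception frame? */
static int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_SPSEL) != 0;
}

/* Step 4: BFAR is only meaningful for a precise bus fault with BFARVALID set. */
static int bfar_is_usable(uint32_t cfsr)
{
    return (cfsr & CFSR_PRECISERR) != 0 && (cfsr & CFSR_BFARVALID) != 0;
}

/* Step 5: the exception return address is the stacked PC. */
static uint32_t stacked_return_address(const uint32_t *frame)
{
    return frame[FRAME_PC];
}
```

Storing the results in volatile variables and putting a breakpoint right after them is enough to inspect everything from the debugger.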

You could read those registers and have a breakpoint after the reads. Then you can check the values with a debugger.

raffin · ‎2020-05-20

Thank you turboscrew,

looking at the CFSR I found that bit 0 of the UFSR (Usage Fault Status Register) is set, which means "The processor has attempted to execute an undefined instruction". In one case I found bit 1 set: Instruction executed with invalid EPSR.T or EPSR.IT.

The fault is not precise. It happens in the init functions (e.g. HAL_UART_MspInit, HAL_PCD_MspInit, HAL_GPIO_Init,...) and in some cases it does not happen.

The PC and LR registers point to valid addresses related to these init functions.
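For reference, a small sketch of how those UFSR bits can be picked out of a CFSR value. UFSR is the upper halfword of CFSR, so UFSR bit 0 (UNDEFINSTR) is CFSR bit 16 and UFSR bit 1 (INVSTATE) is CFSR bit 17. The helper names are made up for illustration.

```c
#include <stdint.h>

/* UFSR occupies the upper halfword of CFSR (bits 31:16). */
static uint16_t ufsr_from_cfsr(uint32_t cfsr)
{
    return (uint16_t)(cfsr >> 16);
}

/* Returns the name of the lowest set UFSR flag, or "none". */
static const char *ufsr_flag_name(uint16_t ufsr)
{
    if (ufsr & (1u << 0)) return "UNDEFINSTR"; /* undefined instruction */
    if (ufsr & (1u << 1)) return "INVSTATE";   /* invalid EPSR.T or EPSR.IT */
    if (ufsr & (1u << 2)) return "INVPC";      /* invalid PC load on exception return */
    if (ufsr & (1u << 3)) return "NOCP";       /* coprocessor access not allowed */
    if (ufsr & (1u << 8)) return "UNALIGNED";  /* unaligned access trap */
    if (ufsr & (1u << 9)) return "DIVBYZERO";  /* divide by zero trap */
    return "none";
}
```

With a CFSR value of 0x00010000 this gives UNDEFINSTR, and with 0x00020000 it gives INVSTATE, which are the two cases described here.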

It is related to FreeRTOS, since it happens when I enable it. I wonder if this could be related to a problem with the MPU on this specific Nucleo board, since I can't believe that such a simple and common recipe does not work:

0) Take the NUCLEO F767 Board.

1) Load the basic configuration for the Nucleo F767ZI on Cube MX

2) Add FreeRTOS (CMSIS_V1 or CMSIS_V2 is the same)

3) Set TIM2 (or TIM1 is the same) as timebase source.

4) Generate, build and run

I expect this must work. Am I putting too much confidence in the CubeMX tool?

turboscrew · ‎2020-05-20

I don't really know how reliable the tools are, but a usage fault is quite often caused by a wrong address in branches or returns. The Cortex-M cores have the common Thumb instruction set that is also used in bigger ARMs. The bigger devices often have several instruction sets: ARM, Thumb, Jazelle, ... ARM and Thumb instruction sets can be in use simultaneously. The least significant bit in the address tells whether the target is ARM code (bit = 0) or Thumb (bit = 1). Even if the Cortex-M cores do not support the ARM instruction set, some "oddities" are still present.

If the last bit of a target address is zero, that would usually mean that the next instruction is an ARM instruction. But since Cortex-M does not support ARM instructions, it throws a usage fault.
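A tiny sketch of that check, for example to apply to a function pointer value, a vector table entry or a stacked LR read in the debugger. The helper name and the example addresses are made up for illustration.

```c
#include <stdint.h>

/* Returns 1 if the branch target has the Thumb bit (bit 0) set,
 * 0 if it would be interpreted as a switch to ARM state. */
static int has_thumb_bit(uint32_t target)
{
    return (int)(target & 1u);
}

/* The address the core actually fetches from: bit 0 is only a
 * state indicator and is not part of the instruction address. */
static uint32_t fetch_address(uint32_t target)
{
    return target & ~UINT32_C(1);
}
```

So a pointer value such as 0x08000401 is a valid Thumb target that executes from 0x08000400, while 0x08000400 used as a target of an interworking branch would raise the INVSTATE usage fault on a Cortex-M.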

Check that the code is really compiled totally for Thumb instruction set.

raffin · ‎2020-05-21

Thank you again Turboscrew,

all the code has been compiled with Thumb instruction set.

I suppose that the problem is related to the clock stability at the very beginning, during the peripheral init.

I halved the Sysclock from 96 MHz to 48 MHz and the hard fault disappeared. Nevertheless I still have some trouble in the SystemClock_Config() function, where HAL_RCC_ClockConfig(..) hangs in an infinite loop. A delay (while (cnt)cnt--) before the SystemClock_Config call has fixed this.
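One note on the delay loop, as a sketch only: with optimization enabled the compiler may remove an empty countdown entirely unless the counter is volatile. The function name and the count are made up for illustration, and the real duration depends on the core clock and the compiler settings.

```c
#include <stdint.h>

/* Crude busy-wait: counts down from 'count' to zero.
 * 'volatile' forces every decrement to really happen, so the
 * loop is not optimized away. Returns the number of iterations. */
static uint32_t crude_delay(uint32_t count)
{
    volatile uint32_t cnt = count;
    uint32_t iterations = 0;

    while (cnt) {
        cnt--;
        iterations++;
    }
    return iterations;
}

/* On the target, the call would sit before the clock setup, e.g.:
 *     crude_delay(100000u);   // hypothetical count
 *     SystemClock_Config();
 */
```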