Hard fault caused by imprecise bus fault when handler code order is swapped

Dr.1 · ‎2023-02-08

Hi,

We have a problem where we think that a memory alignment issue is causing a hard fault, but can't work out why.

We are using:

STM32F765
Atollic 9.3.0 (for legacy purposes)
No operating system
ST-LINK/V2 for debug

The following code is what we label as bad. This code will run on the debugger, but will not run from a power-up reset.

Taken from P5003_stm32f7xx_it.c:

/**
  * @brief This function handles Pendable request for system service.
  */
void PendSV_Handler(void)
{
  /* USER CODE BEGIN PendSV_IRQn 0 */

  /* USER CODE END PendSV_IRQn 0 */
  /* USER CODE BEGIN PendSV_IRQn 1 */

  /* USER CODE END PendSV_IRQn 1 */
}

/**
  * @brief This function handles System tick timer.
  */
#warning "SysTick_Handler before peripheral interrupt handlers == cold start fail"
#warning "SysTick_Handler before PendSV_Handler == cold start ok"
void SysTick_Handler(void)
{
  HAL_IncTick();
}

The code below is labelled as good, as it runs both on the debugger and from a power-up reset. The only difference is that we have swapped the two handlers SysTick_Handler() and PendSV_Handler() around.

#warning "SysTick_Handler before peripheral interrupt handlers == cold start fail"
#warning "SysTick_Handler before PendSV_Handler == cold start ok"
void SysTick_Handler(void)
{
  HAL_IncTick();
}

/**
  * @brief This function handles Pendable request for system service.
  */
void PendSV_Handler(void)
{
  /* USER CODE BEGIN PendSV_IRQn 0 */

  /* USER CODE END PendSV_IRQn 0 */
  /* USER CODE BEGIN PendSV_IRQn 1 */

  /* USER CODE END PendSV_IRQn 1 */
}

/**
  * @brief This function handles System tick timer.
  */

The attached Stm32f765AlignmentProblem.Zip contains good and bad folders, each of which contains the .map, .elf, .list and .hex files along with the P5003_stm32f7xx_it.c file that we think is causing the problem. We have so far determined that an imprecise bus fault is occurring, but don't know why swapping these two handlers causes it to stop. Is it now being masked until we add more code to cause it to come back again?

Any help in understanding the problem would be greatly appreciated.

Note that we have tried the -falign-functions gcc compiler option, and this fixes the problem with the bad code, but only because it has realigned it.

KnarfB · ‎2023-02-08

Could it be a side effect of some earlier bug, like a wild pointer access/buffer overflow?

Have you tried analysis with some advanced HardFault handler like https://www.segger.com/downloads/application-notes/AN00016 ?

Are you able to attach to the faulted but still running processor and analyze?
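As a rough illustration of what such a fault analysis looks at, here is a minimal sketch that decodes a raw value read from SCB->CFSR (address 0xE000ED28 on ARMv7-M). The bit positions are the architectural ones; the names cfsr_describe and cfsr_is_imprecise_bus_fault are made up for this example and are not part of CMSIS or the SEGGER application note. It is written so it can also be compiled on a PC and fed a value copied from the debugger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Selected bits of SCB->CFSR on ARMv7-M (BusFault status is bits 8..15). */
#define CFSR_IBUSERR     (1u << 8)   /* instruction fetch bus error          */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error               */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error             */
#define CFSR_UNSTKERR    (1u << 11)  /* bus fault on exception return        */
#define CFSR_STKERR      (1u << 12)  /* bus fault on exception entry         */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting address      */
#define CFSR_UNALIGNED   (1u << 24)  /* unaligned access usage fault         */

typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

static const cfsr_flag_t k_cfsr_flags[] = {
    { CFSR_IBUSERR,     "IBUSERR"     },
    { CFSR_PRECISERR,   "PRECISERR"   },
    { CFSR_IMPRECISERR, "IMPRECISERR" },
    { CFSR_UNSTKERR,    "UNSTKERR"    },
    { CFSR_STKERR,      "STKERR"      },
    { CFSR_BFARVALID,   "BFARVALID"   },
    { CFSR_UNALIGNED,   "UNALIGNED"   },
};

/* True when the imprecise data bus error bit is set. */
bool cfsr_is_imprecise_bus_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0u;
}

/* Writes the names of the set flags into buf, separated by spaces.
 * Returns the number of names written; stops early if buf is too small. */
int cfsr_describe(uint32_t cfsr, char *buf, size_t len)
{
    assert(buf != NULL && len > 0u);

    int    count = 0;
    size_t used  = 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof k_cfsr_flags / sizeof k_cfsr_flags[0]; i++) {
        if ((cfsr & k_cfsr_flags[i].mask) == 0u) {
            continue;
        }
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", k_cfsr_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            break; /* buffer full, output truncated */
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Keep in mind that with IMPRECISERR set, BFAR is not valid and the stacked PC does not point at the store that caused the fault, only at some instruction executed after it.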

hth

KnarfB

Dr.1 · ‎2023-02-13

Thanks for the info. We are currently reviewing the application note AN00016 to see if we can shed further light on this unusual bug. There are a lot of pointers/buffers in our code, so maybe there is an issue there, but the fact that it runs on the debugger but not standalone points more towards a timing issue or maybe something to do with the reset logic.
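Since the failure only appears standalone, one way to get at the fault registers without a debugger attached is to save them from the HardFault handler into RAM and read them back after a warm reset. Below is a minimal sketch of that idea, not tested on the STM32F765. The names fault_record_t, FAULT_RECORD_MAGIC, fault_record_save and fault_record_take are hypothetical. On the target the record would have to sit in a section that the startup code does not zero (the section name depends on the linker script), and the arguments would come from SCB->CFSR, SCB->HFSR, SCB->BFAR and the stacked exception frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAULT_RECORD_MAGIC 0xFA17C0DEu /* arbitrary marker, assumption */

typedef struct {
    uint32_t magic;      /* FAULT_RECORD_MAGIC when the record is valid */
    uint32_t cfsr;       /* copy of SCB->CFSR                           */
    uint32_t hfsr;       /* copy of SCB->HFSR                           */
    uint32_t bfar;       /* copy of SCB->BFAR                           */
    uint32_t stacked_lr; /* LR from the exception stack frame           */
    uint32_t stacked_pc; /* PC from the exception stack frame           */
} fault_record_t;

/* Called from the fault handler. frame points at the eight words the core
 * pushed on exception entry: r0, r1, r2, r3, r12, lr, pc, xpsr. */
void fault_record_save(fault_record_t *rec, uint32_t cfsr, uint32_t hfsr,
                       uint32_t bfar, const uint32_t *frame)
{
    assert(rec != NULL && frame != NULL);

    rec->cfsr       = cfsr;
    rec->hfsr       = hfsr;
    rec->bfar       = bfar;
    rec->stacked_lr = frame[5];
    rec->stacked_pc = frame[6];
    rec->magic      = FAULT_RECORD_MAGIC; /* written last: marks record valid */
}

/* Called early after reset. Copies a valid record to out, invalidates the
 * stored one, and returns true; returns false if no record is present. */
bool fault_record_take(fault_record_t *rec, fault_record_t *out)
{
    assert(rec != NULL && out != NULL);

    if (rec->magic != FAULT_RECORD_MAGIC) {
        return false;
    }
    *out = *rec;
    rec->magic = 0u;
    return true;
}
```

SRAM contents normally survive a system reset but not a power cycle, so the handler would have to trigger the reset itself (or print the record directly) after saving. The record could then be sent out over a UART or similar on the next boot.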

FBL · ‎2023-02-16

Hello @Dr.1

Weird behavior. It is possible that the code itself is causing the problem. It's also possible that the issue is being masked and may reappear in the future, as you mentioned.

It seems that the hard fault you are experiencing is related to memory alignment. It is odd that the order in which the interrupt handlers are defined affects the alignment of the code. You may need to test your code further to find the root cause.

Using the -falign-functions compiler option forces the compiler to align functions on specific address boundaries, which can help with memory alignment issues. However, it's important to note that this is not a guaranteed solution and may not work in all cases.
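For comparison, alignment can also be requested for a single function with the GCC/Clang aligned attribute rather than for the whole build. The sketch below shows that and a small check of where a function actually ended up. The handler name demo_systick_handler and the helpers function_code_address and function_is_aligned are placeholders for illustration; in the real project the addresses can simply be compared in the good and bad .map files.

```c
#include <assert.h>
#include <stdint.h>

static volatile uint32_t demo_tick_count; /* stands in for the HAL tick */

/* Ask the compiler to place this one function on an 8-byte boundary. */
__attribute__((aligned(8)))
void demo_systick_handler(void)
{
    demo_tick_count++;
}

/* Address of the first instruction of fn. On Thumb targets bit 0 of a
 * function pointer is the Thumb state bit, so it is masked off here. */
uintptr_t function_code_address(void (*fn)(void))
{
    return (uintptr_t)fn & ~(uintptr_t)1u;
}

/* Returns 1 if fn starts on a multiple of alignment (a power of two). */
int function_is_aligned(void (*fn)(void), uintptr_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (function_code_address(fn) & (alignment - 1u)) == 0u;
}
```

Like -falign-functions, this only changes where the code is placed. If the underlying cause is something else, for example a stray write as suggested earlier in the thread, it may just hide the symptom again.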
