STM32F746 Disco HardFault

paul2
Senior
Posted on April 30, 2017 at 15:09

Hi,

I'm creating an emWin application on an STM32F746 Disco board, using OpenSTM32. I took the STemWin example application as a starting point. Occasionally (once every 1-3 hours of operation) I run into a hard fault (stm32f7xx_it.c -> HardFault_Handler()).

The default HardFault_Handler doesn't give me a root cause, so I used the description below to extend the HardFault_Handler.

https://blog.frankvh.com/2011/12/07/cortex-m3-m4-hard-fault-handler/

I have implemented the code below in startup_stm32f746xx.s.

HardFault_Handler:
  TST LR, #4
  ITE EQ
  MRSEQ R0, MSP
  MRSNE R0, PSP
  B hard_fault_handler_c

Now when a hard fault occurs, the MCU is stuck at the line "TST LR, #4", and I'm not sure what I'm doing wrong. Maybe this assembler code is not compatible with the Cortex-M7?

What is the best and fastest way to track down the actual hard fault root cause on a Cortex M7?

Thanks. 


9 REPLIES 9
Posted on May 01, 2017 at 05:21

>> What is the best and fastest way to track down the actual hard fault root cause on a Cortex M7?

Look at the registers, and dig into the stack? You are trying to identify the faulting instruction, the registers it uses, and how the code may have come to this point. If it is a pointer, add sanity checking before you use it, catch it before it faults, and understand how this invocation differs from all the others that succeed in the preceding hours.

The "TST LR, #4" should be innocuous in any case. Do you have a breakpoint placed there?
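To make "dig into the stack" concrete, here is a minimal sketch of what the C side of such a handler could look like. It assumes the assembly stub from the first post passes MSP or PSP in R0; the names fault_frame_t, decode_fault_frame and g_last_fault are made up for illustration and are not from the linked blog post.

```c
#include <stdint.h>

/* The eight words a Cortex-M core pushes on exception entry, in
 * ascending address order starting at the active stack pointer. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Copy the stacked words into a structure with named fields. */
void decode_fault_frame(const uint32_t *sp, fault_frame_t *out)
{
    out->r0   = sp[0];
    out->r1   = sp[1];
    out->r2   = sp[2];
    out->r3   = sp[3];
    out->r12  = sp[4];
    out->lr   = sp[5];   /* return address of the interrupted function */
    out->pc   = sp[6];   /* address of the faulting instruction */
    out->xpsr = sp[7];
}

/* Hypothetical storage, meant to be inspected from the debugger. */
volatile fault_frame_t g_last_fault;

/* Called from the assembly stub with R0 = MSP or PSP. */
void hard_fault_handler_c(const uint32_t *sp)
{
    fault_frame_t frame;

    decode_fault_frame(sp, &frame);
    g_last_fault = frame;

    for (;;) {
        /* Halt here and look at g_last_fault.pc in the debugger. */
    }
}
```

Note that R4-R11 are not part of the hardware-stacked frame. If a value such as R7 is needed, the stub would have to hand it over itself, for example with a "MOV R1, R7" before the branch and a second parameter on the C function.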

Tips, buy me a coffee, or three.. PayPal Venmo Up vote any posts that you find helpful, it shows what's working..
Posted on May 01, 2017 at 17:24

If you use dynamic memory allocation, i.e. malloc()/free(), make sure it is not giving you a NULL pointer, and that you don't have a fragmentation or memory leak issue with respect to the heap.
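One way to catch a failed allocation early is to route all allocations through a small wrapper. This is only a sketch: checked_malloc and g_alloc_failures are hypothetical names, and a real project might prefer to breakpoint or log inside the failure branch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter, readable from the debugger or a status page. */
volatile unsigned long g_alloc_failures;

/* malloc() wrapper that records a failure at the point where it happens,
 * instead of letting a NULL pointer travel further into the application. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL) {
        g_alloc_failures++;   /* good place for a breakpoint */
    }
    return p;
}
```

The caller still has to test the returned pointer; the wrapper only makes sure a heap exhaustion after hours of operation does not go unnoticed.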

Posted on May 05, 2017 at 21:19


Hi Clive,

Having changed the hard fault handler, I'm getting closer to the root cause of the hard fault that occurs once every 1-3 hours.

It seems to be RTOS related.

Disassembly:

634           if( listIS_CONTAINED_WITHIN( NULL, &( pxTimer->xTimerListItem ) ) == pdFALSE )
080338f2:   ldr     r3, [r7, #36]   ; 0x24
080338f4:   ldr     r3, [r3, #20]

Code:

if( listIS_CONTAINED_WITHIN( NULL, &( pxTimer->xTimerListItem ) ) == pdFALSE )
{
    /* The timer is in a list, remove it. */
    ( void ) uxListRemove( &( pxTimer->xTimerListItem ) );
}

We are also using HAL lib timers (e.g. TIM2). Is it possible that the HAL timer is conflicting with an OS timer?

Posted on May 05, 2017 at 21:37

Is there any periodic process that allocates memory on the heap and doesn't free it? And do you allocate enough stack space for your tasks? With STemWin in particular the call depth can be very large, and some functions are even recursive.

Check the stack usage with an RTOS stack-aware trace (or the FreeRTOS uxTaskGetSystemState function) and increase any stack that is close to or at 100%. A stack overflow is hard to spot because its effects look random.

Otherwise FreeRTOS and STemWin are rock stable.
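For reference, the idea behind this kind of stack usage check can be sketched in a few lines. FreeRTOS fills each task stack with the byte 0xA5 at creation, and the high-water mark is the number of words that still hold that pattern. The function below is an illustration of the technique with a made-up name (stack_unused_words), not FreeRTOS code; on a real target uxTaskGetStackHighWaterMark() reports this per task.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill pattern as a 32-bit word (FreeRTOS uses the byte 0xA5). */
#define STACK_FILL_WORD 0xA5A5A5A5u

/* Count the words that were never overwritten, scanning upward from the
 * lowest stack address. On Cortex-M the stack grows downward, so the
 * lowest addresses are the last ones to be used. */
size_t stack_unused_words(const uint32_t *stack_low, size_t depth_words)
{
    size_t unused = 0;

    while (unused < depth_words && stack_low[unused] == STACK_FILL_WORD) {
        unused++;
    }
    return unused;
}
```

A result of 0 means the task has used its whole stack at some point and has very likely overflowed it.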

Posted on May 05, 2017 at 22:17

Knowing r3 and r7 would be instructive.

I suspect pxTimer is invalid. Sanity check the value prior to use, and look at where it is coming from. Is it passed as a parameter? Is it on the stack, etc.?
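A sanity check of this kind could look roughly like the sketch below. The address range is an assumption (320 KB of internal SRAM starting at 0x20000000, as on the STM32F746) and has to be adapted if objects live elsewhere, for example in external SDRAM. The name ram_pointer_plausible is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map: internal SRAM only. Adjust to the linker script. */
#define RAM_START 0x20000000u
#define RAM_END   0x20050000u   /* exclusive */

/* Return true if addr is word aligned and lies inside the assumed RAM
 * range. This cannot prove a pointer is correct, it only rejects values
 * that are obviously wrong, such as NULL or 0xffffffff. */
bool ram_pointer_plausible(uintptr_t addr)
{
    if ((addr & 3u) != 0u) {
        return false;
    }
    return addr >= RAM_START && addr < RAM_END;
}
```

Calling it as ram_pointer_plausible((uintptr_t)pxTimer) before the pointer is dereferenced, with a breakpoint or diagnostic on failure, shows which caller handed over the bad value.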

Posted on May 07, 2017 at 08:26

Hi,

I did some testing for a few days in order to reproduce the hard fault and to check whether the hard fault handler points to the same location each time.

Over the last few days a hard fault occurred three times, and the handler always gave me the location below:

235           ( pxList->uxNumberOfItems )--;
08031842:   ldr     r3, [r7, #12]
08031844:   ldr     r3, [r3, #0]

stacked_r3    unsigned int    0xffffffff (Hex)
stacked_pc    unsigned int    0x8031844 (Hex)

The pointer in r3 seems to be invalid. I don't have a stacked_r7 implemented in the hard fault handler, so that information is missing.

I'm not sure how to track down the source of this pointer.

As for periodic processes: besides the touch screen, we have a CAN interface running.

Would it be a good idea to implement uxTaskGetSystemState and capture the r7 value?

Posted on May 08, 2017 at 01:10

Place a breakpoint in the HardFault_Handler, and review the registers in the debug window.

Sanity check pxList prior to using it. Print out a diagnostic, or breakpoint, if it fails. Work back up the call tree instrumenting and adding sanity checking.
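Besides the core registers, the Configurable Fault Status Register (CFSR, at 0xE000ED28 on Cortex-M3/M4/M7) is worth a look in the debug window, because it states what kind of fault was escalated to a hard fault. The sketch below decodes a CFSR value into a short description; the table covers only the more common bits, and the names cfsr_bit_t and cfsr_first_cause are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t mask;
    const char *name;
} cfsr_bit_t;

/* Subset of the CFSR flags: MemManage (bits 0-7), BusFault (bits 8-15)
 * and UsageFault (bits 16-31). */
static const cfsr_bit_t cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL: instruction access violation" },
    { 1u << 1,  "DACCVIOL: data access violation" },
    { 1u << 4,  "MSTKERR: MemManage fault on exception stacking" },
    { 1u << 8,  "IBUSERR: instruction bus error" },
    { 1u << 9,  "PRECISERR: precise data bus error" },
    { 1u << 10, "IMPRECISERR: imprecise data bus error" },
    { 1u << 12, "STKERR: bus fault on exception stacking" },
    { 1u << 16, "UNDEFINSTR: undefined instruction" },
    { 1u << 17, "INVSTATE: invalid EPSR state" },
    { 1u << 18, "INVPC: invalid PC load" },
    { 1u << 24, "UNALIGNED: unaligned access" },
    { 1u << 25, "DIVBYZERO: divide by zero" },
};

/* Return a description of the lowest-numbered known flag that is set. */
const char *cfsr_first_cause(uint32_t cfsr)
{
    size_t i;

    for (i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if ((cfsr & cfsr_bits[i].mask) != 0u) {
            return cfsr_bits[i].name;
        }
    }
    return "no decoded CFSR bit set";
}
```

On the target, the value would be read with something like *(volatile uint32_t *)0xE000ED28 inside the fault handler. When the BFARVALID bit (bit 15) is set, the BFAR register at 0xE000ED38 holds the address that caused the bus fault.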

Tomas DRESLER
Senior II
Posted on May 09, 2017 at 14:20

One question: what do you do in the timer elapsed callback? The registered function runs on the timer task's stack, and that stack may be too small for what the function does.

Check the RTOS statistics and the stack allocation for this system thread.

Posted on May 22, 2017 at 10:59

Hi,

Indeed, after some debugging and RTOS digging it turned out that the timer stack was defined too small. We had made some changes in the timer callback, which caused the timer task's stack to overflow. I changed configTIMER_TASK_STACK_DEPTH to a value of 256 (instead of 128).

Problem solved.
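For anyone landing here with the same symptom, the change boils down to a FreeRTOSConfig.h fragment like the one below. Only the first line is from this thread; the other two settings are optional additions suggested here as an assumption about what helps to catch this class of problem earlier.

```c
/* FreeRTOSConfig.h fragment */

/* Stack of the timer service task, in words (not bytes). Timer callbacks
 * run on this stack. The value was 128 before the fix. */
#define configTIMER_TASK_STACK_DEPTH          256

/* Optional: let the kernel check for stack overflows on context switch.
 * The application must then provide vApplicationStackOverflowHook(). */
#define configCHECK_FOR_STACK_OVERFLOW        2

/* Optional: make uxTaskGetStackHighWaterMark() available. */
#define INCLUDE_uxTaskGetStackHighWaterMark   1
```

On a Cortex-M port a stack word is 4 bytes, so 256 corresponds to 1024 bytes of stack for the timer task.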