2017-04-30 06:09 AM
Hi,
I'm creating an emWin application on an STM32F746 Disco board, using OpenSTM32. I took the STemWin example application as a starting point. Occasionally (once every 1-3 hours of operation) I run into a Hard Fault (stm32f7xx_it.c -> HardFault_Handler()). The default HardFault_Handler doesn't give me a root cause, so I used the description below to extend it.
https://blog.frankvh.com/2011/12/07/cortex-m3-m4-hard-fault-handler/
I have implemented the code below in startup_stm32f746xx.s.
HardFault_Handler:
TST LR, #4
ITE EQ
MRSEQ R0, MSP
MRSNE R0, PSP
B hard_fault_handler_c
Now when a hard fault occurs the MCU is stuck at the line "TST LR, #4". I'm not sure what I'm doing wrong. Maybe this assembler code is not compatible with Cortex M7?
What is the best and fastest way to track down the actual hard fault root cause on a Cortex M7?
Thanks.
2017-05-22 03:59 AM
Hi,
Indeed, after some debugging and RTOS digging it turned out that the timer stack was defined too small. We had made some changes in the timer callback, which caused a timer stack overflow. I changed configTIMER_TASK_STACK_DEPTH to 256 (instead of 128).
Problem solved.
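For anyone landing here with the same symptom, the change amounts to something like the fragment below. This is only a sketch of a FreeRTOSConfig.h excerpt: the 256 comes from the post above, while enabling configCHECK_FOR_STACK_OVERFLOW is an extra suggestion that was not part of the original fix.

```c
/* FreeRTOSConfig.h (fragment) */

/* Stack depth of the timer service task, in words (4 bytes each on
   Cortex-M), not bytes. The value was 128 before the fix. */
#define configTIMER_TASK_STACK_DEPTH    256

/* Assumption, not from the thread: let the kernel check task stacks on
   every context switch. Method 2 also verifies the fill pattern at the
   end of the stack. */
#define configCHECK_FOR_STACK_OVERFLOW  2
```

With the overflow check enabled, FreeRTOS expects the application to provide vApplicationStackOverflowHook(), which is a convenient place for a breakpoint. The check is not guaranteed to catch every overflow, but it usually points at the offending task much sooner than a hard fault does.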
2017-04-30 08:21 PM
>> What is the best and fastest way to track down the actual hard fault root cause on a Cortex M7?
Look at the registers, and dig into the stack? You are trying to identify the faulting instruction, the registers it uses, and how the code may have come to this point. If it is a pointer, add sanity checking before you use it, catch it before it faults, and understand how this invocation differs from all the others that succeed in the preceding hours.
The "TST LR, #4" should be innocuous in any case. Do you have a breakpoint placed there?
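As a rough illustration of "dig into the stack": on exception entry the core pushes eight words (R0-R3, R12, LR, PC, xPSR) onto the active stack, and the assembly stub in the first post passes a pointer to them in R0. A minimal sketch of the C side could look like this. The type fault_frame_t and the helper decode_fault_frame() are hypothetical names, not taken from the blog post or from ST's code.

```c
#include <stdint.h>

/* The eight words the Cortex-M core stacks automatically on exception
   entry, in ascending address order from the stacked SP. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} fault_frame_t;

/* Hypothetical helper: copy the stacked words into named fields. */
static fault_frame_t decode_fault_frame(const uint32_t *stacked_sp)
{
    fault_frame_t f;
    f.r0  = stacked_sp[0];
    f.r1  = stacked_sp[1];
    f.r2  = stacked_sp[2];
    f.r3  = stacked_sp[3];
    f.r12 = stacked_sp[4];
    f.lr  = stacked_sp[5];   /* return address of the faulting function */
    f.pc  = stacked_sp[6];   /* address of the faulting instruction */
    f.psr = stacked_sp[7];
    return f;
}

/* Reached from the assembly stub with R0 = MSP or PSP. */
void hard_fault_handler_c(const uint32_t *stacked_sp)
{
    /* volatile so the copy stays visible to the debugger */
    volatile fault_frame_t frame = decode_fault_frame(stacked_sp);
    (void)frame;
    for (;;) {
        /* halt here and inspect 'frame' in the debugger */
    }
}
```

Note that R4-R11 (including R7) are not part of this hardware-stacked frame, so they have to be read from the live registers in the debugger or captured separately in the assembly stub.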
2017-05-01 08:24 AM
If you use dynamic memory allocation, i.e. malloc()/free(), make sure it is not giving you a NULL pointer, and that you don't have a fragmentation or memory leak issue with respect to the heap.
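One way to make a failed allocation visible is to route calls through a small wrapper. The sketch below is only an illustration: checked_malloc() and alloc_failures are hypothetical names, and on a target without stdio the fprintf() would be replaced by whatever logging is available.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter: watch it in the debugger or print it periodically. */
static unsigned long alloc_failures;

/* Hypothetical wrapper: behaves like malloc(), but records and reports
   every failed request together with the caller's name. */
static void *checked_malloc(size_t size, const char *who)
{
    void *p = malloc(size);
    if (p == NULL) {
        alloc_failures++;
        fprintf(stderr, "malloc(%lu) failed in %s\n",
                (unsigned long)size, who);
    }
    return p;
}
```

The caller still has to test the returned pointer before using it. The wrapper only ensures that a failure leaves a trace instead of passing silently.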
2017-05-05 02:19 PM
Hi Clive,
Having changed the hard fault handler, I'm getting closer to the root cause of the hard fault that occurs once every 1-3 hours.
Seems to be RTOS related.
Disassembly:
634       if( listIS_CONTAINED_WITHIN( NULL, &( pxTimer->xTimerListItem ) ) == pdFALSE )
080338f2:   ldr     r3, [r7, #36]   ; 0x24
080338f4:   ldr     r3, [r3, #20]
Code:
if( listIS_CONTAINED_WITHIN( NULL, &( pxTimer->xTimerListItem ) ) == pdFALSE )
{
    /* The timer is in a list, remove it. */
    ( void ) uxListRemove( &( pxTimer->xTimerListItem ) );
}
We are also using HAL lib timers (e.g. TIM2). Is it possible that the HAL timer is conflicting with an OS timer?
2017-05-05 02:37 PM
Is there any periodic process that allocates memory on the heap and doesn't free it? Or do you allocate enough stack space for your tasks? With STemWin in particular the call depth can be very large, and some functions are even recursive.
Check the stack usage with an RTOS stack-aware trace (or the FreeRTOS uxTaskGetSystemState function) and increase any stack that is close to or at 100%. A stack overflow is very peculiar and has random effects.
Otherwise FreeRTOS and STemWin are rock solid.
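For background, the stack usage figures FreeRTOS reports (the high water mark returned by uxTaskGetStackHighWaterMark(), or the usStackHighWaterMark field filled in by uxTaskGetSystemState()) rely on a simple fill-pattern technique: the stack is painted with a known byte at task creation, and the untouched part is counted later. The sketch below shows the idea on a plain buffer. The function names are hypothetical, and it is not the kernel's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u   /* FreeRTOS uses the same fill value */

/* Paint the whole stack buffer before the task starts running. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count the bytes at the low end that still hold the pattern. The stack
   grows downward on Cortex-M, so this is the space that was never used. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

A result at or near zero means the task has used, or overrun, its entire stack and the stack should be enlarged. Keep in mind that FreeRTOS reports its high water mark in words rather than bytes.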
2017-05-05 03:17 PM
Knowing r3 and r7 would be instructive.
Suspect pxTimer is invalid. Sanity check the value prior to use, and look at where it is coming from. Passed as a parameter? On the stack, etc.
2017-05-07 01:26 AM
Hi,
I did some testing for a few days in order to reproduce the hard fault and to check whether the hard fault handler points to the same location.
Over the last few days a hard fault occurred three times, and the handler always gave me the location below:
235       ( pxList->uxNumberOfItems )--;
08031842:   ldr     r3, [r7, #12]
08031844:   ldr     r3, [r3, #0]
stacked_r3 unsigned int 0xffffffff (Hex)
stacked_pc unsigned int 0x8031844 (Hex)
Pointer r3 seems to be invalid. I don't have a stacked_r7 implemented in the hard fault handler, so that information is missing.
I'm not sure how to trace down the source of this pointer.
Regarding periodic processes: besides the touch screen we have a CAN interface running.
Maybe it's a good idea to implement uxTaskGetSystemState and get the stacked r7 value?
2017-05-07 06:10 PM
Place a breakpoint in the HardFault_Handler, and review the registers in the debug window.
Sanity check pxList prior to using it. Print out a diagnostic, or breakpoint, if it fails. Work back up the call tree instrumenting and adding sanity checking.
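A sanity check of this kind can be as simple as testing that the pointer value is word-aligned and falls inside RAM. The sketch below assumes the 320 KB SRAM window of the STM32F746 starting at 0x20000000. Verify both limits against your linker script. ptr_looks_valid() is a hypothetical helper, not a FreeRTOS or HAL function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed RAM window, adjust to match the linker script. */
#define RAM_START 0x20000000u
#define RAM_END   0x20050000u   /* exclusive upper bound, 320 KB assumed */

/* Hypothetical helper: true if the address is inside RAM and aligned
   to a 4-byte boundary, as a pointer to a FreeRTOS list should be. */
static bool ptr_looks_valid(uintptr_t addr)
{
    return addr >= RAM_START && addr < RAM_END && (addr % 4u) == 0u;
}
```

It would be called as ptr_looks_valid((uintptr_t)pxList) just before the pointer is dereferenced, with a diagnostic print or breakpoint on failure. The value 0xffffffff seen in stacked_r3 fails this test, so the fault would be caught one step earlier, with the call chain still intact.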
2017-05-09 05:20 AM
One question: what do you do in the timer elapsed callback? The registered function runs on the timer task's stack, and that may be too small for what the function does.
Check RTOS statistics and stack allocation for this system thread.