2019-01-23 08:50 AM
Hello,
I'm having a hard time debugging a hard fault I'm experiencing.
The hard fault occurs in the handler of an external interrupt. Here's its code:
extern "C" {
void EXTI2_IRQHandler(void)
{
    static uint32_t last_sample_time = 0;
    if (EXTI_GetITStatus(EXTI_Line2) != RESET) {
        EXTI->PR = EXTI_Line2;
        volatile uint32_t time = TIM2->CNT;
        if (static_cast<uint16_t>(time - last_sample_time) < 184) {
            return;
        }
        last_sample_time = time;
        generator.getOutput();
    }
}
}
The Fault Analyzer of Atollic TrueStudio says that the fault occurs in the
if (EXTI_GetITStatus(EXTI_Line2) != RESET) {
line. Specifically, it points to this instruction as being the culprit:
08001c66: cbz r0, 0x8001c84 <EXTI2_IRQHandler()+40>
Here's the disassembly code around that line:
EXTI2_IRQHandler():
08001c5c: push {r4, r5, lr}
08001c5e: sub sp, #12
225 if (EXTI_GetITStatus(EXTI_Line2) != RESET) {
08001c60: movs r0, #4
08001c62: bl 0x8000a0c <EXTI_GetITStatus>
08001c66: cbz r0, 0x8001c84 <EXTI2_IRQHandler()+40>
226 EXTI->PR = EXTI_Line2;
08001c68: movs r2, #4
08001c6a: ldr r3, [pc, #268] ; (0x8001d78 <EXTI2_IRQHandler()+284>)
08001c6c: str r2, [r3, #20]
227 volatile uint32_t time = TIM2->CNT;
08001c6e: mov.w r3, #1073741824 ; 0x40000000
08001c72: ldr r3, [r3, #36] ; 0x24
08001c74: str r3, [sp, #4]
The cause of the fault appears to be an attempt to switch into invalid state (INVSTATE).
And here are register values at the time of the fault, as pointed out by the Fault Analyzer:
sp(MSP) 0x2001ffb4
r0 0x0
r1 0x0
r2 0x20000494
r3 0xed
r12 0x46
lr 0x8000
0x8001c67
0x20000148
The one thing that stands out to me is the link register, because it looks to me like an incorrect return address.
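As a rough aid for reading those values: on Cortex-M, INVSTATE is flagged when the core tries to execute with the Thumb (T) bit clear, which typically happens when a return or interworking branch loads an address whose bit 0 is 0. The sketch below shows that kind of sanity check. The helper names and the flash address window are assumptions made for illustration; they are not taken from the original project.

```cpp
#include <cstdint>

// ASSUMPTION: flash window of 1 MiB at 0x08000000; adjust to the actual part.
constexpr std::uint32_t kFlashBase = 0x08000000u;
constexpr std::uint32_t kFlashSize = 0x00100000u;

// Bit 0 of an interworking branch target becomes the T bit, so it must be 1.
constexpr bool has_thumb_bit(std::uint32_t addr) {
    return (addr & 1u) != 0u;
}

// Inside a handler, LR may hold an EXC_RETURN value (top four bits all set),
// for example 0xFFFFFFF1, 0xFFFFFFF9 or 0xFFFFFFFD.
constexpr bool is_exc_return(std::uint32_t lr) {
    return (lr & 0xF0000000u) == 0xF0000000u;
}

// True if the address lies inside the assumed flash window.
constexpr bool in_flash(std::uint32_t addr) {
    return addr >= kFlashBase && (addr - kFlashBase) < kFlashSize;
}

// A saved LR looks plausible if it is an EXC_RETURN value,
// or a Thumb address that points into flash.
constexpr bool lr_looks_valid(std::uint32_t lr) {
    return is_exc_return(lr) || (has_thumb_bit(lr) && in_flash(lr));
}

// In a stacked xPSR, bit 24 is the T bit and is expected to be set.
constexpr bool xpsr_thumb_set(std::uint32_t xpsr) {
    return (xpsr & (1u << 24)) != 0u;
}
```

Under these assumptions, `lr_looks_valid(0x8000)` is false (bit 0 clear and outside flash), while an address such as `0x8001c67` passes.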
Although I have to admit that I'm a bit confused, because the fault seems to appear during a pretty standard procedure (checking the interrupt flag and then clearing it accordingly). Is it possible that the root of the problem is somewhere else in the code and it just materializes in this line?
I should add that this interrupt happens very often (the frequency of the external clock that triggers the EXTI is over 1 MHz). The fault usually occurs after a few seconds of running the program, which means that the above piece of code is called millions of times before it eventually crashes.
Any help would be greatly appreciated.
2019-01-23 09:02 AM
A 1 MHz (1 us) interrupt rate is excessive.
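To put a number on that, here is a back-of-the-envelope cycle budget. The 168 MHz core clock is an assumption (the thread does not state the clock), and the exit cost is only approximated as equal to the roughly 12-cycle entry latency of a Cortex-M3/M4 running from zero-wait-state memory; all names are illustrative.

```cpp
#include <cstdint>

// ASSUMPTION: core clock of 168 MHz; substitute the real value.
constexpr std::uint32_t kCoreClockHz = 168000000u;
// Interrupt rate mentioned in the thread (about 1 MHz).
constexpr std::uint32_t kIrqRateHz = 1000000u;
// ASSUMPTION: about 12 cycles for exception entry plus about 12 for exit.
constexpr std::uint32_t kEntryExitCycles = 12u + 12u;

// Core cycles available between two consecutive interrupts.
constexpr std::uint32_t cycles_per_irq(std::uint32_t core_hz, std::uint32_t irq_hz) {
    return core_hz / irq_hz;
}

// Cycles left for the handler body and all other code after entry/exit overhead;
// returns 0 if the overhead alone already exceeds the interrupt period.
constexpr std::uint32_t handler_budget(std::uint32_t core_hz, std::uint32_t irq_hz) {
    return cycles_per_irq(core_hz, irq_hz) > kEntryExitCycles
               ? cycles_per_irq(core_hz, irq_hz) - kEntryExitCycles
               : 0u;
}

static_assert(cycles_per_irq(kCoreClockHz, kIrqRateHz) == 168u, "168 cycles per interrupt");
static_assert(handler_budget(kCoreClockHz, kIrqRateHz) == 144u, "144 cycles after overhead");
```

With these assumed figures, everything (the handler body, the call to `EXTI_GetITStatus`, and the main loop) has to fit in about 144 cycles per interrupt.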
Would check the stack isn't corrupted. Check MSP and PSP.
For GNU check the heap hasn't crashed into the stack, or otherwise failed.
The LR doesn't look right at all, and will definitely crash when the call unwinds.
The PC typically points to the instruction after the fault.
Point of fault is likely symptom, not cause. Instrument better, and sanity check frequently (ie heap, stack depth, pointers, etc).
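One common way to sanity check stack depth is to paint the stack region with a known pattern at startup and later count how many words are still untouched. A minimal sketch of the idea follows; the function names and the pattern are hypothetical, and on a real target the region bounds would come from linker symbols, which are not shown here.

```cpp
#include <cstddef>
#include <cstdint>

// Arbitrary fill pattern (an assumption; any unlikely value works).
constexpr std::uint32_t kStackPaint = 0xDEADBEEFu;

// Fill the region with the pattern. On target this would run early at startup
// and only cover the part of the stack below the current stack pointer.
inline void paint_stack(std::uint32_t* lowest, std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) {
        lowest[i] = kStackPaint;
    }
}

// The Cortex-M stack grows downward, so the words at the lowest addresses are
// used last. Count how many of them still hold the pattern.
inline std::size_t unused_stack_words(const std::uint32_t* lowest, std::size_t words) {
    std::size_t n = 0;
    while (n < words && lowest[n] == kStackPaint) {
        ++n;
    }
    return n;
}
```

If `unused_stack_words` ever returns 0, or a value much smaller than expected, the stack has reached (or nearly reached) the bottom of its region at some point.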
2019-01-23 09:02 AM
For debugging, do the following:
// Globals are zero-initialized at startup. Do not reset last_sample_time
// inside the handler, or the elapsed-time check would be meaningless.
uint32_t last_sample_time;
uint32_t time;
void EXTI2_IRQHandler(void)
{
    if (EXTI_GetITStatus(EXTI_Line2) != RESET) {
        EXTI->PR = EXTI_Line2;
        time = TIM2->CNT;
        if ((time - last_sample_time) < 184) {
            return;
        }
        last_sample_time = time;
        generator.getOutput();
    }
}
The static cast, the static within the interrupt, and the volatile local variable seem unfamiliar to me.
Try something simpler with globals (assuming there are no globals with the same names; change the names if there are).
2019-01-23 09:20 AM
static indicates the value is retained between calls, and thus a global variable, and not thread safe. Shouldn't be an issue unless static pool is getting corrupted, and wouldn't cause a fault directly.
volatile indicates that the value must be re-read from memory with each use. It seems unnecessary here: the count is volatile, but the variable won't change outside normal program flow.
Would expect C++ related failures elsewhere.
2019-01-23 10:08 AM
Thanks a lot.
I know the interrupt rate is excessive, but it's unfortunately a design requirement (I have to process incoming data that comes with each external clock cycle and output the result of some calculations).
Stack overflow seems unlikely, because I'm using very little RAM and this chip has plenty of it. I'm also not using the heap at all (no dynamic memory allocation).
If the PC points to the instruction after the fault, does it mean that the previous instruction (the one causing the fault) is the branch to EXTI_GetITStatus():
08001c62: bl 0x8000a0c <EXTI_GetITStatus>
or the return from the EXTI_GetITStatus() function?
I'll look into possible causes elsewhere. I'm avoiding pointers/references whenever possible, also I try to implement boundary checks every time I read/write from an array, but there must be something I've missed.
Wonder what could cause a corruption of the link register at the beginning of this ISR though - it almost has to be another interrupt firing and corrupting the stack, right?
2019-01-23 10:20 AM
I added the volatile in case compiler optimizations were messing something up, but yes, it's completely unnecessary.
The static variable is only accessed within this routine, so thread safety shouldn't be an issue in this case, I think.
What would be the first C++ related failures that come to mind? I'm using it mainly for the convenience of classes, but I'm not using polymorphism/inheritance, exceptions, or any quirky template stuff.