
Critical stack corruption during CAN Interrupt (Silicon bug?)

rickard
Associate II
Posted on April 30, 2008 at 07:00

rickard
Associate II
Posted on May 17, 2011 at 09:52

Further tests have shown that the stack corruption can also appear during an str instruction.

One interesting thing I found is that the value the stack pointer ends up with depends on the offset used in the sp-relative access:

ldr r0,[r13,#0x14] => sp = 0x40017F50

ldr r3,[r13,#0x0C] => sp = 0x40017FA4

str r3,[r13,#0x14] => sp = 0x40017F50

str r3,[r13,#0x14] => sp = 0x40017F50

ldr r3,[r13,#0x0C] => sp = 0x40017FA4

Thus offset 0x14 sets sp to 0x40017F50 and offset 0x0C sets sp to 0x40017FA4.

However, I have no clue what's causing this.

mark9
Associate II
Posted on May 17, 2011 at 09:52

You need to set the FIQ bit (the F bit in the CPSR): 1 = disabled, 0 = enabled.

To disable FIQ:

MSR CPSR_c, #Mode_FIQ|I_BIT

I noticed in your screenshot that FIQ is enabled during the interrupt, because the FIQ bit is 0. You might have code that is unintentionally enabling FIQ.
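To make the bit test above concrete, here is a minimal sketch of how a CPSR value read from a debugger can be decoded. It assumes the classic ARM CPSR layout (I bit = bit 7, F bit = bit 6, mode in bits 0 to 4). The helper names are hypothetical and not taken from any vendor header.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR control-field layout on classic ARM cores (ARM7TDMI, ARM9). */
#define CPSR_I_BIT     0x80u  /* 1 = IRQ masked (disabled) */
#define CPSR_F_BIT     0x40u  /* 1 = FIQ masked (disabled) */
#define CPSR_MODE_MASK 0x1Fu  /* processor mode field      */
#define CPSR_MODE_FIQ  0x11u
#define CPSR_MODE_IRQ  0x12u

/* Hypothetical helpers: interpret a CPSR value copied from the debugger. */
static bool cpsr_fiq_enabled(uint32_t cpsr)
{
    return (cpsr & CPSR_F_BIT) == 0u;   /* F = 0 means FIQ can be taken */
}

static bool cpsr_irq_enabled(uint32_t cpsr)
{
    return (cpsr & CPSR_I_BIT) == 0u;   /* I = 0 means IRQ can be taken */
}

static uint32_t cpsr_mode(uint32_t cpsr)
{
    return cpsr & CPSR_MODE_MASK;
}
```

For example, a CPSR whose low byte is 0x92 means IRQ mode with IRQ masked and FIQ still enabled, while 0xD2 means IRQ mode with both masked.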

rickard
Associate II
Posted on May 17, 2011 at 09:52

I've disabled the FIQ, but I still get the stack problem. I have no idea how to fix this problem...

rickard
Associate II
Posted on May 17, 2011 at 09:52

I’ve found the error.

There is nothing wrong with sp; the debugging tool was displaying the wrong value while in IRQ mode.

The problem was related to GnuArm 4.1.1.

GnuArm has a bug with interrupts.

GCC will generate an IRQ function like this:

void Interrupt()

{

sub lr, lr, #4 ; 0x4 // fix lr for interrupt

stmdb sp!, {r0, r1, r2, r3, r4, r5, r6, r7, ip, lr}

…..

….

….

ldmia sp!, {r0, r1, r2, r3, r4, r5, r6, r7, ip, lr}

subs pc, lr, #4 // pc = lr - 4 again, so lr is adjusted twice: wrong!

}

The problem is that lr is decreased by 8 instead of 4. This causes the instruction before the interrupt to be run twice, which leads to unstable behaviour.
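The arithmetic can be checked with a small host-side sketch. It assumes ARM state (4-byte instructions) and the architectural rule that, on IRQ entry, LR_irq holds the address of the next instruction to execute plus 4. The function names and the example address are hypothetical.

```c
#include <stdint.h>

/* Value the core places in LR_irq when an IRQ is taken in ARM state:
   the address of the next instruction to execute, plus 4. */
static uint32_t lr_on_irq_entry(uint32_t next_insn_addr)
{
    return next_insn_addr + 4u;
}

/* Correct return: adjust LR exactly once. */
static uint32_t return_pc_single_adjust(uint32_t lr_irq)
{
    return lr_irq - 4u;
}

/* Faulty sequence from the generated code: the prologue subtracts 4
   and the epilogue subtracts 4 again. */
static uint32_t return_pc_double_adjust(uint32_t lr_irq)
{
    uint32_t lr = lr_irq - 4u;  /* prologue: sub  lr, lr, #4 */
    return lr - 4u;             /* epilogue: subs pc, lr, #4 */
}
```

If the interrupted code was about to execute the instruction at 0x1008, LR_irq is 0x100C. A single adjustment returns to 0x1008, while the double adjustment returns to 0x1004, the instruction that had already completed.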

To solve this, I declared the function as “naked” and made macros for storing and reloading the registers:

void Interrupt()

{

sub lr, lr, #4 ; 0x4

stmdb sp!, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, ip, lr}

ldmia sp!, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, ip, pc}^ // '^' also restores cpsr from spsr on return

}

This solved the problem with the wrong lr. However, when the function is declared naked, the compiler omits the instructions that reserve stack space for local variables. This corrupted the stack after the interrupt.

The function should look like:

void Interrupt()

{

sub lr, lr, #4 ; 0x4

stmdb sp!, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, ip, lr}

sub sp, sp, #28 // reserve 28 bytes on the stack for local variables

add sp, sp, #28 // release the 28 bytes reserved for local variables

ldmia sp!, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, ip, pc}^ // '^' also restores cpsr from spsr on return

}

The problem is that the macro can never know how much stack space needs to be reserved, so hard-coding the sub/add pair with 28 bytes is not a good solution. A change in the code may require more bytes.

Has anyone solved this GnuArm bug, made a nice workaround, or have any thoughts about it?

[ This message was edited by: rickard.thorstensson on 30-04-2008 13:35 ]
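One workaround commonly used for this kind of problem, though not confirmed anywhere in this thread, is to keep the naked function as a thin assembly wrapper and move the real work into an ordinary C function. The compiler then emits the normal prologue and epilogue for that function, so stack space for locals is reserved automatically and no hard-coded size is needed. The sketch below assumes GCC, ARM state and the usual procedure-call convention (r0-r3 and ip are caller-saved); `can_irq_entry`, `can_irq_body` and `can_irq_count` are hypothetical names, and acknowledging the interrupt controller is left out.

```c
#include <stdint.h>

/* Hypothetical counter so the handler has an observable effect. */
volatile uint32_t can_irq_count;

/* Ordinary C function: the compiler builds its own stack frame here,
   so local variables are safe to use. */
void can_irq_body(void)
{
    uint32_t local_copy = can_irq_count;  /* a local variable */
    can_irq_count = local_copy + 1u;
}

#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
/* Naked wrapper: contains only hand-written assembly, no C statements. */
__attribute__((naked)) void can_irq_entry(void)
{
    __asm__ volatile(
        "sub   lr, lr, #4             \n\t" /* adjust lr exactly once       */
        "stmfd sp!, {r0-r3, ip, lr}   \n\t" /* save caller-saved regs + lr  */
        "bl    can_irq_body           \n\t" /* callee preserves r4-r11      */
        "ldmfd sp!, {r0-r3, ip, pc}^  \n\t" /* return, cpsr <- spsr         */
    );
}
#endif
```

The wrapper would be installed as the IRQ vector target in place of the compiler-generated interrupt function. Only the registers the callee is allowed to clobber are saved, because `can_irq_body` restores the rest itself.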