
Critical stack corruption during CAN Interrupt (Silicon bug?)

rickard
Associate II
Posted on April 30, 2008 at 07:00


rickard
Associate II
Posted on May 17, 2011 at 09:52

I found some strange behavior on the ARM STR912FW44, revision G.

While executing code in a CAN interrupt, the stack pointer (sp) is increased by 472, which leaves the registers corrupted after the interrupt.

The assembler instruction that corrupts the sp is:

ldr r0,[r13,#0x14]

which means that r0 should be equal to the value located at sp (r13) + 0x14. This value is 5, and r0 is set to 5. At the same time the sp is increased by 472, from 0x40017D78 to 0x40017F50.

This stack corruption has only been seen in the CAN interrupt.

I have attached screenshots showing registers and assembler instructions. "Stack ok" shows the state when everything is fine, and "Stack nok" shows the state after the stack pointer has been changed.

Several other ldr instructions that read data from the stack have been executed in the interrupt without changing the stack pointer.

What is this? Is it a silicon limitation? Is there a logical explanation?

mark9
Associate II
Posted on May 17, 2011 at 09:52

Do you have FIQ enabled?

It is interesting that SP[0], SP[0x14] and SP[0x18] do not change when the SP changes.

It almost looks like the lowest 12 bits of R14 got copied into R13.
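The hypothesis above can be written down as a small check. This is only a sketch: the two SP values are the ones from the first post, but the R14 value is visible only in the attached screenshots, so the caller has to supply it, and the function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if `sp_after` keeps the upper 20 bits of `sp_before`
 * but carries the low 12 bits of `lr` (R14).
 * The helper name is hypothetical; it is not part of any ST or ARM library.
 */
bool low12_copied_from_lr(uint32_t sp_before, uint32_t sp_after, uint32_t lr)
{
    const uint32_t low12 = 0xFFFu;

    return ((sp_after & ~low12) == (sp_before & ~low12)) &&
           ((sp_after & low12) == (lr & low12));
}
```

For the reported values 0x40017D78 and 0x40017F50 the upper 20 bits (0x40017) are indeed identical, so the check comes down to whether R14 ended in 0xF50 at that moment.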

truf9
Associate II
Posted on May 17, 2011 at 09:52

Hello,

I also had problems with the CAN interrupt.

I work with Keil uVision.

It provides two interrupt handler files: IRQ.s and 91x_vect.s.

With the original IRQ.s I ended up in undefined_Handler, reset_Handler and so on.

I moved the IRQ handler from 91x_vect.s to IRQ.s, and since then the application no longer runs into the error handlers.

Have a look at the documentation on nested interrupts. When nested interrupts are used, the stack size has to be enlarged.

At what point are interrupts enabled in your interrupt handler?

Good Luck

rickard
Associate II
Posted on May 17, 2011 at 09:52

lakata:

The FIQ is enabled but not used. Could this cause a problem?

Your observation that the 12 bits of R14 got copied into R13 is interesting.

TRuf:

I'm using GCC and I don't use 91x_vect.s or IRQ.s. Did you do anything other than just move the code?

Where can I find the documentation on nested interrupts?

The interrupt is re-enabled at the end of the CAN interrupt by setting VIC0->VAR = 0x0000. Is this correct, or do I have to disable interrupts at the beginning of the CAN interrupt?

I attach my startup file. Maybe something is wrong in it.

truf9
Associate II
Posted on May 17, 2011 at 09:52

Hello,

I only moved the code for the interrupt handler.

For documentation look at:

http://infocenter.arm.com/help/index.jsp

Nested interrupts are covered under RealView Compilation Tools / Developer Guide.

Have a look at:

PrimeCell Vectored Interrupt Controller (PL190)

Writing to VIC0->VAR acknowledges the interrupt: it clears the address of the currently active ISR.

Interrupts are enabled in the Current Program Status Register (CPSR) as follows:

MSR CPSR_c, #0x1F ; Switch to SYS Mode and enable IRQ

Is your IRQ handler running in IRQ mode or in SYS mode?

rickard
Associate II
Posted on May 17, 2011 at 09:52

I ran a new test in which I put a different function in the CAN interrupt.

The instruction that changes the stack pointer is:

ldr r3,[r13,#0x0C]

The stack pointer changes from 0x40017DCC to 0x40017FA4, which is an increase of 472, the same as in my previous test.

Note that the PC is at a different address than before, r3 is used instead of r0, and the offset differs, but the stack pointer is still increased by 472 at an ldr instruction that reads data from the stack pointer plus an offset. The same instruction is also found at 0x1D414. At that location the stack pointer is not changed, but at 0x1D38 it is.
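To double-check the arithmetic for both tests, here is a minimal sketch in C. The addresses are the ones quoted in this thread; the function name is only illustrative.

```c
#include <stdint.h>

/*
 * Signed change of the stack pointer in bytes.
 * Unsigned subtraction wraps, so the cast also gives the right
 * result if the pointer moved downwards.
 */
int32_t sp_delta(uint32_t sp_before, uint32_t sp_after)
{
    return (int32_t)(sp_after - sp_before);
}
```

sp_delta(0x40017D78, 0x40017F50) and sp_delta(0x40017DCC, 0x40017FA4) both give 472 (0x1D8), so the jump is the same size in the two tests even though the starting addresses differ.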

I've attached screenshots showing registers and assembler instructions. 1_Stack_Ok shows when the stack is correct and 2_Stack_Nok shows when the stack is corrupted.

Truf:

I'm running the IRQ handler in IRQ mode. Is this the correct setup?

.equ Mode_IRQ, 0x12

msr CPSR_c, #Mode_IRQ|I_BIT|F_BIT

mov r13, r0

sub r0, r0, #IRQ_Stack_Size

I do not want to use nested interrupts. I don't think I need to use MSR CPSR_c, #0x1F then. Right?

[ This message was edited by: rickard.thorstensson on 25-04-2008 14:43 ]

truf9
Associate II
Posted on May 17, 2011 at 09:52

.equ Mode_IRQ, 0x12

msr CPSR_c, #Mode_IRQ|I_BIT|F_BIT

mov r13, r0

sub r0, r0, #IRQ_Stack_Size

-> This sets up the stack pointer for IRQ mode in the startup file.

I do not want to use nested interrupts. I don't think I need to use MSR CPSR_c, #0x1F then. Right?

-> Yes.

mark9
Associate II
Posted on May 17, 2011 at 09:52

I would argue that either a FIQ interrupt is happening inside your IRQ ISR, or there is a silicon bug.

Do you have a dummy FIQ ISR handler installed at 0x1C?

rickard
Associate II
Posted on May 17, 2011 at 09:52

At 0x1C I have this instruction:

ldr pc,0x3C ; pc,FIQ_Addr

and at 0x3C:

FIQ_Addr: dcd 0x54

This is a dummy FIQ ISR as you mentioned.
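For reference, the debugger shows the load target as an absolute address (0x3C), while the instruction itself encodes a PC-relative offset. A minimal sketch of that calculation, assuming ARM state (where a read of PC yields the instruction address plus 8); the function name is illustrative only:

```c
#include <stdint.h>

/*
 * Offset encoded in "ldr pc, [pc, #offset]" for an instruction at
 * `instr_addr` that loads its target from `literal_addr`.
 * In ARM state the PC reads as the instruction address + 8.
 */
uint32_t pc_relative_offset(uint32_t instr_addr, uint32_t literal_addr)
{
    return literal_addr - (instr_addr + 8u);
}
```

With the FIQ vector at 0x1C and FIQ_Addr at 0x3C this gives 0x18, so the vector would be written as ldr pc, [pc, #0x18].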

In my startup file I have this line:

MSR CPSR_c, #Mode_FIQ|I_BIT|F_BIT

and Mode_FIQ is defined as:

.equ Mode_FIQ, 0x11 

If I want to disable the FIQ interrupt, can I just remove

MSR CPSR_c, #Mode_FIQ|I_BIT|F_BIT

or is there something else I have to do?
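It may help to decode what the CPSR_c values mentioned in this thread actually select. The sketch below assumes the usual definitions I_BIT = 0x80 and F_BIT = 0x40 (CPSR bits 7 and 6). The startup file's own definitions are not shown here, so treat those two values as an assumption. The helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: the standard ARM CPSR bit positions. */
#define I_BIT     0x80u  /* bit 7: when set, IRQ is masked */
#define F_BIT     0x40u  /* bit 6: when set, FIQ is masked */
#define MODE_MASK 0x1Fu  /* bits 4..0: processor mode */

/* Mode numbers as quoted in this thread. */
#define MODE_FIQ  0x11u
#define MODE_IRQ  0x12u
#define MODE_SYS  0x1Fu

/* Name of the processor mode selected by a CPSR_c value. */
const char *mode_name(uint32_t cpsr_c)
{
    switch (cpsr_c & MODE_MASK) {
    case MODE_FIQ: return "FIQ";
    case MODE_IRQ: return "IRQ";
    case MODE_SYS: return "SYS";
    default:       return "other";
    }
}

/* True when the value leaves IRQ masked (disabled). */
bool irq_masked(uint32_t cpsr_c) { return (cpsr_c & I_BIT) != 0u; }

/* True when the value leaves FIQ masked (disabled). */
bool fiq_masked(uint32_t cpsr_c) { return (cpsr_c & F_BIT) != 0u; }
```

Under these assumptions, MODE_FIQ | I_BIT | F_BIT is 0xD1: it selects FIQ mode with both IRQ and FIQ masked, which is what a startup file typically does while setting up that mode's stack. The 0x1F value discussed earlier selects SYS mode with both bits clear, so IRQ and FIQ are unmasked.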

If this is a silicon bug, somebody else must have seen it, because it makes the CPU completely useless. Does the STR912 have a problem handling many instructions in an interrupt?

mirou, eris: Have any of you moderators seen this problem?