what can cause the STR912 to lock up so that the JTAG can't halt it?

mark9 · ‎2007-08-18

Posted on August 18, 2007 at 17:23

mark9 · ‎2011-05-17

Posted on May 17, 2011 at 09:46

This has happened a few times, and I don't know what is causing it. Something bad happens in my code and it hangs the ARM core, such that the JLINK debugger (in IAR EWARM) cannot recover at all. It can't halt the core. Once the part is in this state, I cannot reprogram it using EWARM (I have to use CAPS+Raisonance and erase the entire flash first).

What sort of software command could cause such a problem?

I have narrowed down what triggers this issue. If I compile my timer ISR using "high" optimization, the CPU locks up, while if it is compiled with "medium" optimization, it works fine.

ebrombaugh · ‎2011-05-17

Posted on May 17, 2011 at 09:46

Wow - good investigation. Looks like you've definitely uncovered some sort of pipelining conflict, either with the ARM processor or with the peripheral bus interfaces.

I suppose you could narrow it down to the ARM core or the peripherals by trying the same instruction sequence, but use SRAM or Flash instead of the ADC. But that's beside the point I suppose - now we know not to do this.

I also had a lockup case which required using CAPS to erase, but I think it was one of the documented issues. Nothing so subtle as a pipeline optimization problem.

Eric

mark9 · ‎2011-05-17

Posted on May 17, 2011 at 09:46

Well, after a lot of hacking, I think I found the hardware issue on the STR912. I'm pretty sure this is a hardware bug. The problem is that the CPU locks up, and the way CPUs usually lock up is some timing bug in the memory pipeline. And when you halt the CPU using the debugger, the pipeline gets flushed so it is impossible to debug pipeline issues using a stepwise debugger.

The bug is pretty consistent. It seems to be related to a read of the hardware ADC register immediately after a write to internal SRAM (LDRH after STR). It comes down to these instructions:

Does not work:

LDR R2,??adc_address ;; 0x5c00a01c

STR R3,[R0, #+0] ;; save adcBufferWritePtr to internal SRAM

LDRH R2,[R2, #+0] ;; read adc from hardware

Does not work (added NOP to beginning; more NOPs don't help either):

NOP

LDR R2,??adc_address ;; 0x5c00a01c

STR R3,[R0, #+0] ;; save adcBufferWritePtr to internal SRAM

LDRH R2,[R2, #+0] ;; read adc from hardware

Works! (added NOP after store):

LDR R2,??adc_address ;; 0x5c00a01c

STR R3,[R0, #+0] ;; save adcBufferWritePtr

NOP

LDRH R2,[R2, #+0] ;; read adc

Works (reordered STR and LDR)

STR R3,[R0, #+0] ;; save adcBufferWritePtr

LDR R2,??adc_address ;; 0x5c00a01c

LDRH R2,[R2, #+0] ;; read adc

Works (got rid of the STR; in fact, it is not even necessary, but that is a different story):

LDR R2,??adc_address ;; 0x5c00a01c

LDRH R2,[R2, #+0] ;; read adc
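As a rough illustration only (not the code from this project), the "NOP after store" ordering that works above could be requested from C along the following lines. The helper name `store_then_read16`, the `PIPELINE_NOP` macro, and the GCC-style inline assembly are all assumptions made for this sketch. The thread uses IAR EWARM, whose inline assembly syntax may differ, and the compiler can still place the address load wherever it likes, so the generated listing would have to be checked by hand.

```c
#include <stdint.h>

/* Hypothetical macro: one NOP that the compiler may not move memory
 * accesses across. On non-ARM (host) builds it expands to nothing so
 * the sketch still compiles. GCC-style syntax is assumed here. */
#if defined(__GNUC__) && defined(__arm__)
#define PIPELINE_NOP() __asm__ volatile ("nop" ::: "memory")
#else
#define PIPELINE_NOP() ((void)0)
#endif

/* Hypothetical helper: write a word to SRAM, execute one NOP, then read
 * a 16-bit register. This aims at the STR / NOP / LDRH ordering from the
 * listing above, but does not guarantee the exact instruction sequence. */
static inline uint16_t store_then_read16(volatile uint32_t *sram_slot,
                                         uint32_t value,
                                         const volatile uint16_t *reg)
{
    *sram_slot = value;   /* STR: save the value to SRAM          */
    PIPELINE_NOP();       /* NOP: separate the store from the read */
    return *reg;          /* LDRH: read the 16-bit register        */
}
```

On the target, `reg` would presumably be the ADC address from the listing (0x5c00a01c) cast to `const volatile uint16_t *`, and `sram_slot` the variable that holds the buffer write pointer.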

I'm trying to get confirmation from ST Tech Support on this problem so that it doesn't happen again. I don't see it in the Errata.

-Mark

mark9 · ‎2011-05-17

Posted on May 17, 2011 at 09:46

After more experiments, I found it is just the timing of the two loads (the LDR and the LDRH):

Fails:

LDR R2,??adc_address ;; 0x5c00a01c

NOP

LDRH R2,[R2, #+0] ;; read adc from hardware

Works:

LDR R2,??adc_address ;; 0x5c00a01c

LDRH R2,[R2, #+0] ;; read adc from hardware

Works:

LDR R2,??adc_address ;; 0x5c00a01c

NOP

LDRH R2,[R2, #+0] ;; read adc from hardware

However, I can't reproduce this outside of the ISR.

-Mark

mark9 · ‎2011-05-17

Posted on May 17, 2011 at 09:46

In case anyone is wondering, this problem also happens on the 'FA (rev G) part. We finally got samples yesterday and tried them out.

I've got an email dialog going with an ST engineer on this ... they don't seem to recognize this as a known issue.

The troubling thing is that I can't reproduce this problem if it is not inside the ISR.

Eric, good idea. I'll change the address to flash, SRAM, external SRAM, or another APB periph address and see if it still happens.
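A minimal sketch of how that address-swapping experiment might be set up in C is shown here. The function name `probe_target` and its parameters are hypothetical, and the compiler is not guaranteed to emit the same STR/LDRH pairing as the failing listing, so the disassembly would need checking. Since the lockup has only been seen inside the ISR, the probe would presumably have to be called from the timer ISR as well.

```c
#include <stdint.h>

/* Hypothetical probe: repeat the "store to SRAM, then halfword read"
 * pattern against an arbitrary target address, so the same code can be
 * pointed at the ADC register, flash, internal SRAM, external SRAM, or
 * another APB peripheral. Returns the sum of the values read so the
 * reads have a visible result. */
static uint32_t probe_target(volatile uint32_t *sram_slot,
                             const volatile uint16_t *target,
                             uint32_t iterations)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < iterations; ++i) {
        *sram_slot = i;   /* store to SRAM                      */
        sum += *target;   /* halfword read right after the store */
    }
    return sum;
}
```

If the core only hangs when `target` points at the ADC register, that would suggest the peripheral bus rather than the ARM core itself; if it hangs for every target, the core or the SRAM path would be the more likely suspect.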

-Mark