
Problem with STM32F407 RAM

jforrest
Associate
Posted on April 20, 2012 at 09:21

I have recently assembled 6 PCBs with STM32F407IGT6 micros on them.

I seem to be having a problem reading from RAM.

This was first noticed when one of the boards was hard faulting and returning a non-precise bus error.

When this was tracked down through the stack pointer/program counter the problem occurred when data was being sent down USART2.

Instead of looking at 0x40004400 for the status register, the micro appeared to be trying to look at 0x48004400, which of course is outside the register address range. This didn't happen every time, just sometimes.

We put a conditional break point just before the register read checking for 0x48004400. When this broke we looked at the disassembly. It had a line:

ldr r0, [r4, #0]

When I looked at the memory location pointed to by r4 the memory said 0x40004400 however the value that ended up in r0 was 0x48004400. An extra bit had been set for no apparent reason.
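To make the "extra bit" concrete, here is a minimal sketch that XORs the expected and observed values to show which bit differs. The helper names `flipped_bits` and `lowest_set_bit` are made up for illustration and are not part of the original firmware.

```c
#include <stdint.h>

/* Return a mask of the bits that differ between the value that should
 * have been loaded and the value that actually ended up in the register. */
uint32_t flipped_bits(uint32_t expected, uint32_t observed)
{
    return expected ^ observed;
}

/* Return the index of the lowest set bit in mask, or -1 if mask is 0. */
int lowest_set_bit(uint32_t mask)
{
    for (int bit = 0; bit < 32; bit++) {
        if (mask & (UINT32_C(1) << bit)) {
            return bit;
        }
    }
    return -1;
}
```

For the values above, `flipped_bits(0x40004400, 0x48004400)` gives 0x08000000, which is bit 27 only.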

The following code was then added as the first thing the micros did after initialisation:

// in INTs.
#define MEM_SIZE 2048

volatile uint32_t memtest[MEM_SIZE];
volatile uint32_t mem_errors;

void setMem(uint32_t val)
{
    uint32_t i;
    for (i = 0; i < MEM_SIZE; i++)
    {
        memtest[i] = val;
    }
}

void galTest()
{
    uint32_t testCell;
    setMem(0);
    for (testCell = 0; testCell < MEM_SIZE; testCell++)
    {
        uint32_t compare;
        memtest[testCell] = ~memtest[testCell];
        for (compare = 0; compare < MEM_SIZE; compare++)
        {
            if (compare == testCell)
            {
                continue;
            }
            uint32_t cmpVal = memtest[compare];
            uint32_t cellVal = memtest[testCell];
            if (cmpVal != 0)
            {
                mem_errors++;
            }
            else if (cellVal != ~cmpVal)
            {
                mem_errors++;
            }
        }
        memtest[testCell] = ~memtest[testCell];
    }
}

void doMemTest()
{
    mem_errors = 0;
    gpio_off(&runup1_pin);
    while (1)
    {
        galTest();
        if (mem_errors > 0)
            gpio_on(&runup1_pin);
    }
}

On 2 boards, mem_errors stays 0 for at least half an hour of run time.

On the other 4 boards mem_errors keeps ramping up. Slow on some boards, faster on others.

When debugging on the worst of these boards, if I pause at any point in time and look at memtest, approximately half the values in the array will be zero and half of the values in the array will be 0x08000000. The same bit that was causing the hard fault is set. This appears to be happening when the value is loaded from RAM into a register. Then when the value is stored from the register back into RAM it stays as 0x08000000.
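The test above only counts errors. A variant that also records which bits went wrong would confirm whether it is always the same bit. The following is a rough sketch of that idea, not the code that was actually run; `mem_report_t` and `scan_for_bad_bits` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t error_count;  /* number of words that differed from expected */
    uint32_t bad_bits;     /* OR of every differing bit seen in the scan */
} mem_report_t;

/* Read every word in buf once and compare it against expected,
 * accumulating both an error count and a mask of the bits that differed. */
mem_report_t scan_for_bad_bits(const volatile uint32_t *buf, size_t len,
                               uint32_t expected)
{
    mem_report_t report = {0, 0};
    for (size_t i = 0; i < len; i++) {
        uint32_t diff = buf[i] ^ expected;
        if (diff != 0) {
            report.error_count++;
            report.bad_bits |= diff;
        }
    }
    return report;
}
```

If `bad_bits` only ever comes back as 0x08000000 after a zero fill, the fault is confined to bit 27, which is what the paused memory view suggests.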

Has anyone seen anything like this? It looks like a problem with the ICs themselves. I even swapped a good IC and a bad IC to make sure it was the micro and not the PCB that was causing the problem.

We are probably just going to buy some more micros, change them over and hope the problem goes away. But if anyone has any good ideas...
21 REPLIES 21
frankmeyer9
Associate II
Posted on April 20, 2012 at 10:59

I would suspect a suboptimal hardware design.

From the sheer number of VDD/GND pins this controller has, I'd guess they supply different parts of the chip. If so, suboptimal routing (high resistance or inductance in the copper traces) would affect specific parts of the controller.

Can you try your application on a known-good reference design like the STM32F4-Discovery board? If it works there, that is a hint in this direction.

Posted on April 20, 2012 at 13:29

It seems odd that a stuck-at fault would be the same on multiple parts. Makes me think it's more of a critical path. Have you checked your clocks, and PLL, via the MCO pin? What speed are you running these parts? How about if you back off the speed by 10 or 20%? What voltage are you running the parts at?
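As a rough aid for picking a reduced clock: on these parts SYSCLK from the main PLL is the input clock divided by PLLM, multiplied by PLLN, then divided by PLLP. The sketch below just does that arithmetic. The 8 MHz crystal and the divider values in the note are assumptions for illustration; the thread does not say what the boards actually use.

```c
#include <stdint.h>

/* SYSCLK in Hz when driven from the main PLL:
 *   VCO input  = f_in / PLLM
 *   VCO output = VCO input * PLLN
 *   SYSCLK     = VCO output / PLLP
 * Range limits on the dividers and the VCO are not checked here. */
uint32_t pll_sysclk_hz(uint32_t f_in_hz, uint32_t pllm,
                       uint32_t plln, uint32_t pllp)
{
    uint32_t vco_in_hz = f_in_hz / pllm;
    uint32_t vco_out_hz = vco_in_hz * plln;
    return vco_out_hz / pllp;
}
```

Assuming an 8 MHz input with PLLM = 8 and PLLP = 2, PLLN = 336 gives 168 MHz and PLLN = 288 gives 144 MHz, about 14% slower.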

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
jforrest
Associate
Posted on April 23, 2012 at 02:17

Have checked clocks and power (both 3V3 and the 1V2 core)

All look good.

We're running the micros at 3V3 and 168MHz.

Have slowed the clock down by more than half and it makes no difference at all.

To see if it was a problem with the PCB or the micro I took a PCB that was working and one that wasn't and swapped the micros. The problem followed the micro itself.

This bug is not on the discovery board (as it wouldn't be if there was a problem with this batch of micros) and it's not on 2 out of our 6 micros.

As far as the routing goes, all VDD and GND pins have their own via near the pin to the power planes. Each power pin has its decoupling cap under the micro right next to the via. So I don't believe it's sub-optimal routing, especially since when I swap the micros the problem follows the micro and doesn't stay with the board.

frankmeyer9
Associate II
Posted on April 23, 2012 at 08:48

At first, I'm not a hardware guy, so I'm speaking partly of secondary experience.

> The problem followed the micro itself.

That's a clue, but not proof. The micros are not necessarily defective; there are often tolerances in the power requirement versus clock frequency.

> Have slowed the clock down by more than half and it makes no difference at all.

Have you tried slowing down the clock even more?

I would suggest something significantly slower, say 16 or 24MHz.

If it's a commercial project, you might ask ST for help. Taking the strong competition in the ARM Cortex market into consideration, ST might be happy to keep a customer.

My company has a good relationship with a competitor of ST, mainly in the 8-bit field. Their micros are not actually good, but the service is.

emalund
Associate III
Posted on April 30, 2012 at 15:12

> At first, I'm not a hardware guy, so I'm speaking partly of secondary experience.

well, I very much suspect hard... naah more specific LAYOUT. Check to see that EACH AND EVERY Vdd pin has a decoupling cap with traces no longer than 1cm.

> Have slowed the clock down by more than half and it makes no difference at all.

It should not; layout problems do not relate to the clock frequency, but to rise and fall times.

Erik

holzleitner
Associate II
Posted on August 08, 2012 at 18:35

We have the same problem with "ldr r0, [r4, #0]"!

Do you have any solution for this problem?

Posted on August 08, 2012 at 19:38

> We have the same problem with "ldr r0, [r4, #0]"! Do you have any solution for this problem?

Perhaps you can expand a little on your particular circumstances? Is this on some custom board? What are the markings on the affected chip?

If it's a custom board, what kind of bulk capacitance do you have on the VCAPx pins, and have you verified the value of the placed parts?
holzleitner
Associate II
Posted on August 08, 2012 at 20:03

It is a custom board.

µC STM32F205VET6 Rev Y

VCAP 2.2 uF 10 V 0603

We have tested 3 boards. 2 of them have this problem.

The Hard Fault occurs occasionally (not every time).

ldr r4, [pc, #108]   <-- Value 0x40020C00

movs r1, #32

ldr r3, [r4, #0]   <-- Hard Fault r4 = 0x46202120 (this value corresponds to the asm "movs r1, #32")

When debugging with single step everything works.

When we insert a NOP before these instructions it works.

Since my last post we found something:

Decreasing the Sys-Clock frequency from 120MHz to 118MHz prevents the Hard Fault.

So we think we have a problem with the Oscillator.
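The observation that the bad r4 value matches "movs r1, #32" can be checked by hand. The 16-bit Thumb encoding of MOVS Rd, #imm8 is 0x2000 | (Rd << 8) | imm8, so "movs r1, #32" encodes as 0x2120, which is the low halfword of 0x46202120. A small sketch of that check follows; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Encode the 16-bit Thumb instruction MOVS Rd, #imm8.
 * Rd must be r0..r7 and imm8 must fit in 8 bits. */
uint16_t thumb_movs_imm8(unsigned rd, unsigned imm8)
{
    assert(rd < 8 && imm8 < 256);
    return (uint16_t)(0x2000u | (rd << 8) | imm8);
}

/* Low halfword of a 32-bit word, i.e. the halfword stored at the
 * lower address in little-endian memory. */
uint16_t low_halfword(uint32_t word)
{
    return (uint16_t)(word & 0xFFFFu);
}
```

Here `low_halfword(0x46202120)` equals `thumb_movs_imm8(1, 32)`, which is consistent with the literal load having returned bytes from the instruction stream instead of the 0x40020C00 literal.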

emalund
Associate III
Posted on August 08, 2012 at 20:03

> If it's a custom board, what kind of bulk capacitance do you have on the VCAPx pins, and have you verified the value of the placed parts?

It is not enough to have them, they must be properly laid out.

I have had problems with a chip where the i.. eh, layout person had a 4cm trace to a VCC decoupling cap. Verified by everything being great after soldering a decoupling cap directly across the chip.

Layout, these days with (sub)nanosecond rise and fall times, IS "rocket science"

Erik