
Problem with STM32F407 RAM

jforrest
Associate
Posted on April 20, 2012 at 09:21

I have recently assembled 6 PCBs with STM32F407IGT6 micros on them.

I seem to be having a problem reading from RAM.

This was first noticed when one of the boards was hard faulting and returning a non-precise bus error.

When this was tracked down through the stack pointer/program counter the problem occurred when data was being sent down USART2.

Instead of looking at 0x40004400 for the status register, the micro appeared to be trying to look at 0x48004400, which of course is outside the register address range. This didn't happen every time, just sometimes.

We put a conditional break point just before the register read checking for 0x48004400. When this broke we looked at the disassembly. It had a line:

ldr r0, [r4, #0]

When I looked at the memory location pointed to by r4 the memory said 0x40004400 however the value that ended up in r0 was 0x48004400. An extra bit had been set for no apparent reason.
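To make the "extra bit" explicit, the two addresses can be XORed. The sketch below is only an illustration; the helper names flipped_bits and lowest_set_bit are made up for this example and are not part of the original firmware.

```c
#include <stdint.h>

/* Hypothetical helper: mask of the bits that differ between the
   value expected in the register and the value actually observed. */
static uint32_t flipped_bits(uint32_t expected, uint32_t observed)
{
    return expected ^ observed;
}

/* Hypothetical helper: index of the lowest set bit in mask,
   or -1 if no bit is set. */
static int lowest_set_bit(uint32_t mask)
{
    int bit;

    for (bit = 0; bit < 32; bit++)
    {
        if (mask & (UINT32_C(1) << bit))
        {
            return bit;
        }
    }
    return -1;
}
```

For the values in this post, flipped_bits(0x40004400, 0x48004400) gives 0x08000000, i.e. only bit 27 differs.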

The following code was then added as the first thing the micros did after initialisation:

// in INTs.

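#include <stdint.h>

#define MEM_SIZE 2048

volatile uint32_t memtest[MEM_SIZE];
volatile uint32_t mem_errors;

/* Fill the whole test array with val. */
void setMem(uint32_t val)
{
    uint32_t i;

    for (i = 0; i < MEM_SIZE; i++)
    {
        memtest[i] = val;
    }
}

/* Galloping test: invert one cell at a time, then check that every
   other cell still reads zero and that the inverted cell reads back
   as all ones. */
void galTest(void)
{
    uint32_t testCell;

    setMem(0);
    for (testCell = 0; testCell < MEM_SIZE; testCell++)
    {
        uint32_t compare;

        memtest[testCell] = ~memtest[testCell];
        for (compare = 0; compare < MEM_SIZE; compare++)
        {
            if (compare == testCell)
            {
                continue;
            }
            uint32_t cmpVal = memtest[compare];
            uint32_t cellVal = memtest[testCell];
            if (cmpVal != 0)
            {
                /* A cell that should be zero is not. */
                mem_errors++;
            }
            else if (cellVal != ~cmpVal)
            {
                /* The inverted cell is no longer all ones. */
                mem_errors++;
            }
        }
        /* Restore the cell under test before moving on. */
        memtest[testCell] = ~memtest[testCell];
    }
}

/* Run the test forever; runup1_pin is switched on once any error
   has been counted. */
void doMemTest(void)
{
    mem_errors = 0;
    gpio_off(&runup1_pin);
    while (1)
    {
        galTest();
        if (mem_errors > 0)
        {
            gpio_on(&runup1_pin);
        }
    }
}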

On 2 boards, mem_errors stays 0 for at least half an hour of run time.

On the other 4 boards mem_errors keeps ramping up, slowly on some boards and faster on others.

When debugging on the worst of these boards, if I pause at any point in time and look at memtest, approximately half the values in the array will be zero and half of the values in the array will be 0x08000000. The same bit that was causing the hard fault is set. This appears to be happening when the value is loaded from RAM into a register. Then when the value is stored from the register back into RAM it stays as 0x08000000.
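One way to check the "corrupted on load, then stored back" theory might be to read a cell twice before anything is written to it. The sketch below is an assumption about how that could be done, not code from the original test; the names cell_state and classify_cell are hypothetical.

```c
#include <stdint.h>

/* Hypothetical classification of a single RAM cell. */
typedef enum
{
    CELL_OK,          /* both reads returned the expected value        */
    CELL_READ_GLITCH, /* the two reads disagree: transient read error  */
    CELL_STORED_BAD   /* both reads agree on a wrong value: the bad
                         value really is held in RAM                    */
} cell_state;

/* Read the cell twice without writing in between and compare the
   results against the expected value. */
static cell_state classify_cell(const volatile uint32_t *cell,
                                uint32_t expected)
{
    uint32_t first = *cell;
    uint32_t second = *cell;

    if (first == expected && second == expected)
    {
        return CELL_OK;
    }
    if (first != second)
    {
        return CELL_READ_GLITCH;
    }
    return CELL_STORED_BAD;
}
```

Note that the line memtest[testCell] = ~memtest[testCell] in the test reads and then writes the same cell, so a single bad read there would be latched into RAM, which would be consistent with the 0x08000000 values seen in the debugger.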

Has anyone seen anything like this? It looks like a problem with the ICs themselves. I even swapped a good IC and a bad IC to make sure it was the micro and not the PCB that was causing the problem.

We are probably just going to buy some more micros, change them over and hope the problem goes away. But if anyone has any good ideas...
21 REPLIES
holzleitner
Associate II
Posted on August 08, 2012 at 21:02

The traces between the µC and the VCAPs are ~3mm.

We have replaced the VCAPs several times.

We also added a resistor to increase the ESR.

When we change the flash wait state from 3 to 4, the hard fault doesn't occur.

Also disabling the instruction cache helps.

Posted on August 08, 2012 at 21:06

I was trying to glean several pieces of information, and posed the questions deliberately.

In fact I think we've obtained a number of useful observations, including that this was an F2 device on a custom board.

Someone last week had a problem with the VCAP's being placed as 2.2nF parts, and having seriously aberrant behaviour.

The one that interests me the most is the value loaded by R4. If this is the case I'd say there is a serious problem with the ART device. You look to be receiving the prefetch value for an instruction, and not the value you want.

I'd start by disabling ART, and making sure that the flash wait states were at least 4 or 5 cycles @ 120 MHz.

I think if it were a systemic Cortex M3/M4 problem we'd be seeing it a lot more.
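As a rough sketch of "disable ART and raise the wait states": on the F2/F4 both are controlled through the FLASH_ACR register. The bit positions below are my understanding of that register layout and should be checked against the reference manual for the exact part; the function name acr_without_art is made up for this example, and it only computes the new register value rather than touching hardware.

```c
#include <stdint.h>

/* Assumed FLASH_ACR bit layout (verify against the reference manual). */
#define ACR_LATENCY_MASK UINT32_C(0x7)      /* wait states, bits 2:0  */
#define ACR_PRFTEN       (UINT32_C(1) << 8) /* prefetch enable        */
#define ACR_ICEN         (UINT32_C(1) << 9) /* instruction cache      */
#define ACR_DCEN         (UINT32_C(1) << 10)/* data cache             */

/* Hypothetical helper: take the current ACR value, switch off
   prefetch and both caches, and set the requested wait states. */
static uint32_t acr_without_art(uint32_t acr, uint32_t wait_states)
{
    acr &= ~(ACR_PRFTEN | ACR_ICEN | ACR_DCEN | ACR_LATENCY_MASK);
    acr |= wait_states & ACR_LATENCY_MASK;
    return acr;
}
```

On target, the result would be written back to FLASH->ACR (or whatever name the vendor header uses) and read back to confirm the new latency has taken effect before the clock is raised.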

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
Posted on August 08, 2012 at 21:16

Thanks Alex, messages crossed on the wire.

For the F2

The Flash memory access time is adjusted to fHCLK frequency (0 wait state from 0 to 30 MHz, 1 wait state from 30 to 60 MHz, 2 wait states from 60 to 90 MHz and 3 wait states from 90 to 120 MHz).

For the F4

The Flash memory access time is adjusted to fHCLK frequency (0 wait state from 0 to 30 MHz, 1 wait state from 30 to 60 MHz, 2 wait states from 60 to 90 MHz, 3 wait states from 90 to 120 MHz, 4 wait states from 120 to 150 MHz, and 5 wait states from 150 to 168 MHz).

At 120 MHz you're pushing the boundaries of the flash access speed, 33ns being quite optimistic (35-42). I'd probably err on the 4 wait states, and let the ART compensate.
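The arithmetic behind the 33 ns figure can be sketched as follows. The table is the F4 one quoted above; treating each upper bound as inclusive is an assumption, and the function names f4_wait_states and flash_window_ns are invented for this example.

```c
/* Wait states for the F4 per the table quoted above.
   Upper bounds are assumed to be inclusive.
   Returns -1 for frequencies above 168 MHz. */
static int f4_wait_states(unsigned hclk_mhz)
{
    if (hclk_mhz > 168) return -1;
    if (hclk_mhz <= 30) return 0;
    if (hclk_mhz <= 60) return 1;
    if (hclk_mhz <= 90) return 2;
    if (hclk_mhz <= 120) return 3;
    if (hclk_mhz <= 150) return 4;
    return 5;
}

/* Time the flash is given to complete one access, in nanoseconds:
   (wait_states + 1) HCLK cycles. hclk_mhz must be non-zero. */
static double flash_window_ns(unsigned hclk_mhz, unsigned wait_states)
{
    return (wait_states + 1) * 1000.0 / hclk_mhz;
}
```

At 120 MHz, 3 wait states give the flash 4 cycles, about 33.3 ns, while 4 wait states give 5 cycles, about 41.7 ns, which is the upper end of the 35-42 ns range mentioned.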

Posted on August 08, 2012 at 21:23

http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/ERRATA_SHEET/DM00027213.pdf

The ART Accelerator prefetch queue instruction is not supported when VDD is lower than 2.1 V.

A critical path? What if the frequency is a tad above 120 MHz?

emalund
Associate III
Posted on August 08, 2012 at 21:46

Just happened to see this

Problem with STM32F407

We're running the micros at 3V3 and 168MHz.

When we change the flash wait state from 3 to 4, the hard fault doesn't occur

For the F4

The Flash memory access time is adjusted to fHCLK frequency (0 wait state from 0 to 30 MHz, 1 wait state from 30 to 60 MHz, 2 wait states from 60 to 90 MHz, 3 wait states from 90 to 120 MHz, 4 wait states from 120 to 150 MHz, and 5 wait states from 150 to 168 MHz).

Erik

holzleitner
Associate II
Posted on August 08, 2012 at 21:49

Thanks for your reply!

The power supply is 3.3 V.

On another PCB we get an unaligned hard fault at ldr r4, [pc, #108] (the first instruction of the assembler code above).

We cooled down the oscillator (slowing down its frequency) and the hard fault doesn't occur. The hard fault also doesn't occur when we decrease the PLL frequency.

That would suggest the frequency is too high.

But when we increase the system clock on the working PCB to 125 MHz (by changing the PLL), everything works fine. Could that be down to manufacturing tolerance of the silicon?

Tomorrow we will measure the exact frequency from the oscillator.
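For reference, the relationship between the oscillator and the system clock when the PLL is changed can be sketched like this. It assumes the usual F2/F4 main PLL structure (input divided by M, multiplied by N, divided by P); the function name and the example factors are hypothetical and are not the values used on these boards.

```c
#include <stdint.h>

/* Hypothetical helper: system clock in Hz from the oscillator
   frequency and the PLL factors, assuming SYSCLK = (osc / M) * N / P.
   Returns 0 if M or P is zero. */
static uint32_t pll_sysclk_hz(uint32_t osc_hz, uint32_t m,
                              uint32_t n, uint32_t p)
{
    if (m == 0 || p == 0)
    {
        return 0;
    }
    /* 64-bit intermediate so osc_hz * n cannot overflow. */
    return (uint32_t)(((uint64_t)osc_hz * n) / ((uint64_t)m * p));
}
```

With an 8 MHz crystal, for example, M = 8, N = 240, P = 2 would give 120 MHz and N = 250 would give 125 MHz; any error in the oscillator frequency is scaled by the same ratio.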

Posted on August 08, 2012 at 21:57

Just happened to see this

Erik, be aware that Alex is using an F205 device.
Posted on August 08, 2012 at 22:09

Tomorrow we will measure the exact frequency from the oscillator.

Measuring it directly may load the oscillator and shift its frequency. I suggest you instead feed one of the internal clocks, or perhaps the PLL divided by 2, out on the MCO pin and measure that.

CMOS silicon processes try to fabricate the wafer/die so that it falls within a certain window. The corners of this window are bounded by how fast or slow the P and N transistors are. Critical paths in silicon designs are where there is a long propagation time in the signal from one synchronous flip-flop to the next. If the signal fails to meet the required setup/hold times, the circuit will malfunction. This therefore limits the upper frequency at which a design will function. Voltage and temperature will also impact the speed. The errata suggests that part of the ART unit on the Y revision die fails as the input voltage drops, which is indicative of a marginal timing issue.

emalund
Associate III
Posted on August 09, 2012 at 15:07

Erik, be aware that Alex is using an F205 device.

Then why is the thread titled

Problem with STM32F407 RAM

emalund
Associate III
Posted on August 09, 2012 at 15:07

Sorry, double posted. The usual: got an error, reposted, and found out that the post with the error did go through.