STM32F405: random hard, usage, bus or memory faults

s239955_stm1 · ‎2013-12-05

Posted on December 05, 2013 at 12:17

Hey there,

Our problem is that we get random faults (memory, bus or usage)!

The fault handlers are called sporadically, anywhere from 10 s to several hours apart! But we also have PCBs without any issues!

We are using a custom environment with a STM32F405ZGT and an external PSRAM!

On the software side we use the Keil OS without MicroLIB (because we use a C++ library).

We use dynamic memory extensively, but only in external RAM! All other RAM accesses go to the internal one.

We also use the SPI2 interface with DMA at high priority!

If we stop in an error handler, we see invalid addresses in the PC (like: 0xE0C001E0; 0x804AED2; 0xFFFFFFFE; 0x46604630 or 0x00000000).
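
When the stacked PC is already garbage, the Cortex-M4 Configurable Fault Status Register (SCB->CFSR at 0xE000ED28) is often more informative than the PC itself. The following is a minimal sketch, not code from this thread: `decode_cfsr` is a hypothetical helper that turns a CFSR value read in the fault handler into the names of the set status bits, using the bit positions from the ARMv7-M architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One CFSR status bit and its architectural name. */
typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_bit_t;

static const cfsr_bit_t cfsr_bits[] = {
    /* MemManage fault status (bits 0..7) */
    { 1u << 0,  "IACCVIOL"    }, { 1u << 1,  "DACCVIOL"  },
    { 1u << 3,  "MUNSTKERR"   }, { 1u << 4,  "MSTKERR"   },
    { 1u << 5,  "MLSPERR"     }, { 1u << 7,  "MMARVALID" },
    /* BusFault status (bits 8..15) */
    { 1u << 8,  "IBUSERR"     }, { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR"  },
    { 1u << 12, "STKERR"      }, { 1u << 13, "LSPERR"    },
    { 1u << 15, "BFARVALID"   },
    /* UsageFault status (bits 16..31) */
    { 1u << 16, "UNDEFINSTR"  }, { 1u << 17, "INVSTATE"  },
    { 1u << 18, "INVPC"       }, { 1u << 19, "NOCP"      },
    { 1u << 24, "UNALIGNED"   }, { 1u << 25, "DIVBYZERO" },
};

/* Hypothetical helper: writes the space-separated names of all set bits
 * into out (truncating at whole names) and returns how many were written. */
size_t decode_cfsr(uint32_t cfsr, char *out, size_t out_len)
{
    size_t count = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if ((cfsr & cfsr_bits[i].mask) == 0)
            continue;
        size_t used = strlen(out);
        size_t need = strlen(cfsr_bits[i].name) + (used ? 1 : 0);
        if (used + need + 1 > out_len)
            break;                      /* no room for another name */
        if (used)
            strcat(out, " ");
        strcat(out, cfsr_bits[i].name);
        count++;
    }
    return count;
}
```

On the target, the value would come from `SCB->CFSR`, captured at the top of the handler before anything clears it. Logging it together with the stacked PC and LR for every crash may show whether the faults share a common first cause.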

Does anybody have an idea?

#stm32f4 #stm32 #fault

d2399 · ‎2013-12-06

Posted on December 06, 2013 at 14:41

Hi,

I work with sebbo on this issue!

Currently I have 6 PCBs running at 130 MHz! Since 12 o'clock I've had no issues on them.

(I'll let them run for some more hours to confirm this.)

Then I have 4 PCBs running with a 168 MHz system clock, on which I've slowed down every FSMC clock parameter by 1 HCLK! In this case I got the errors!

I'm also unsure whether it is a HW or SW issue!?

d2399 · ‎2013-12-06

Posted on December 06, 2013 at 14:51

Another observation: we have PCBs that run at 21 degrees with no issues but crash at -10 degrees!

We also did tests with some capacitors (33 pF) on the control lines to the RAM... that shifts the issue a little bit!

The issue now occurs less often... but that cannot be a solution for the issue!

I believe the capacitors and the temperature shift the timing a little bit!

waclawek.jan · ‎2013-12-06

Posted on December 06, 2013 at 15:21

Do you set the FSMC pins' speed at 100MHz?

Do you switch on the compensation cell?

[EDIT] Have you looked into grounding and decoupling issues at and around the MCU and PSRAM (i.e. experimenting with ground "enforced" by external wires, adding small ceramic capacitors as close to the supply pins as possible)?

JW

d2399 · ‎2013-12-09

Posted on December 09, 2013 at 10:03

Hi Jan,

>Do you set the FSMC pins' speed at 100MHz?

Yes, we set the speed to 100 MHz!

>Do you switch on the compensation cell?

No... we had not switched on the compensation cell!

This morning I switched on the CMP_PD bit and the issue occurs much more rarely!

It looks much better...
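
For readers following along, enabling the I/O compensation cell comes down to setting CMP_PD in SYSCFG_CMPCR and waiting for the READY flag. The sketch below assumes the bit positions from the STM32F4 reference manual (CMP_PD is bit 0, READY is bit 8). `enable_compensation_cell` is a hypothetical helper that takes the register address as a parameter so that it can also be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMPCR_CMP_PD  (1u << 0)  /* 1 = compensation cell enabled        */
#define CMPCR_READY   (1u << 8)  /* read-only: 1 = compensation cell ready */

/* Hypothetical helper: sets CMP_PD and polls READY at most max_polls times.
 * Returns true once READY is seen, false on timeout. */
bool enable_compensation_cell(volatile uint32_t *cmpcr, uint32_t max_polls)
{
    *cmpcr |= CMPCR_CMP_PD;

    for (uint32_t i = 0; i < max_polls; i++) {
        if (*cmpcr & CMPCR_READY)
            return true;
    }
    return false;
}
```

On the target this would be called as `enable_compensation_cell(&SYSCFG->CMPCR, 100000)` after the SYSCFG clock has been enabled with `RCC_APB2PeriphClockCmd(RCC_APB2Periph_SYSCFG, ENABLE)`. The Standard Peripheral Library also offers `SYSCFG_CompensationCellCmd(ENABLE)` for the same purpose. As far as I know, the reference manual restricts the cell to supply voltages between 2.4 V and 3.6 V, so that is worth checking against the board.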

>experimenting with ground "enforced" by external wires, adding small ceramic capacitors as close to supply pins as possible)

We tried many such things but with no success. But maybe it is a combination of many little things!?

Sabbo is testing the CCM memory, which we were not using before! That was also a good hint from clive! Sabbo gets some "memory management faults" after a short while if the stacks are placed in the CCM memory!

Third, we are trying to get the trace lines running for post-mortem analysis!

Tesla DeLorean · ‎2013-12-09

Posted on December 09, 2013 at 13:16

How about looking at the external bus with a logic analyzer, and triggering on the faults?

CCM won't support DMA, but is a good place for stacks and variables.

External memory has significantly lower bandwidth than internal RAM; where possible, DMA into internal RAM, as there will be less/shorter contention.
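
A cheap guard against accidentally handing a CCM address to the DMA controller is to check buffer addresses at start-up or in debug builds. The sketch below is an illustration only: the address ranges are my reading of the STM32F405xG memory map (1 MB flash, 64 KB CCM, 128 KB SRAM, FSMC bank 1), and `classify_address` and `dma_can_reach` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    REGION_FLASH,
    REGION_CCM,
    REGION_SRAM,
    REGION_FSMC_BANK1,
    REGION_OTHER
} mem_region_t;

/* Address ranges assumed from the STM32F405xG memory map. */
mem_region_t classify_address(uint32_t addr)
{
    if (addr >= 0x08000000u && addr <= 0x080FFFFFu) return REGION_FLASH;
    if (addr >= 0x10000000u && addr <= 0x1000FFFFu) return REGION_CCM;
    if (addr >= 0x20000000u && addr <= 0x2001FFFFu) return REGION_SRAM;
    if (addr >= 0x60000000u && addr <= 0x6FFFFFFFu) return REGION_FSMC_BANK1;
    return REGION_OTHER;
}

/* True if the whole buffer lies in internal SRAM or in FSMC bank 1.
 * CCM is rejected because it is not connected to the DMA bus masters. */
bool dma_can_reach(uint32_t addr, uint32_t len)
{
    if (len == 0 || len - 1u > UINT32_MAX - addr)
        return false;                   /* empty or wrapping buffer */

    mem_region_t first = classify_address(addr);
    mem_region_t last  = classify_address(addr + (len - 1u));

    return first == last &&
           (first == REGION_SRAM || first == REGION_FSMC_BANK1);
}
```

The same classifier can be applied to a faulting PC: an address that is neither in flash nor in RAM usually points to a corrupted return address or function pointer rather than to a bad instruction.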

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

d2399 · ‎2013-12-09

Posted on December 09, 2013 at 17:31

>CCM won't support DMA, but is a good place for stacks and variables.

Yes... we've moved the stacks to CCM, but the issue is still alive!

>External Memory has significantly lower bandwidth than internal RAM, where possible DMA into internal RAM, there will be less/shorter contention.

The external RAM is already used for the heap only...

(Currently we can't get our ULINK pro to run with the trace option!? What is the best way: 4-wire ETM trace? We get "Data Stream Error" after a few seconds!)

>How about looking at the external bus with a logic analyzer, and triggering on the faults?

I think that's not an option for us; we have no experience with that.

Tesla DeLorean · ‎2013-12-09

Posted on December 09, 2013 at 17:40

>How about looking at the external bus with a logic analyzer, and triggering on the faults?

>I think that's not an option for us; we have no experience with that.

Unfortunately, if you have a timing issue this might be the most effective way to nail it down. If you can't rent or attach an analyzer, consider using a multi-channel scope to look at the control signals and validate the device timing against the specs for the external memory.

The processor trace may tell you how you arrived at a particular event, as would adequate telemetry, but external bus activity and DMA may not be apparent, e.g. the initiation of a DMA transfer, or multiple active transfers, etc.

Consider derating the settings for the external memory, and reducing the SPI bandwidth.


s239955_stm1 · ‎2013-12-13

Posted on December 13, 2013 at 12:17

First of all, thanks a lot for your help. You definitely brought us closer to a solution, but the story is not finished yet.

Something new:

We reviewed the configuration of the FSMC and found two things enabled that we are not using: the data/address multiplexer and the wait signal. If I disable just the data/address mux, I get an interesting effect. The RAM is still accessible (the RAM is tested successfully at every start-up), but the application crashes much earlier than with the mux enabled. Decreasing the clock of the RAM module has no effect on this behaviour.

Over the last few days we ran the following tests to narrow the failure down further:

The library we are using (the C++ library, see the first message in the thread) needs a lot of heap, which we currently place in the external PSRAM. We have now built a SW version in which we map this heap to the internal SRAM and map all other data and the stacks to the CCM. In the external RAM we keep only uncritical data, for example buffers for DMA data, but no pointers or stacks. With this SW we have so far not detected any failures or crashes. As a next step, to narrow it down further, we mapped two stacks to the external PSRAM. This SW runs for just a few minutes.

So we are quite sure that the core of the problem is the connection between the controller and the PSRAM. But so far we have not been able to nail down the cause. On one device we saw thin spikes on #CE, but we are not sure about that because they could be coupling effects from the oscilloscope. The logic analyser also shows those spikes. The HW configuration follows the guidelines from ST. Adding small caps or resistors on the lines brought no improvement.

As the next test step we changed the controller to an STM32F427ZG with the PSRAM, but the problem is just the same.

In the next days we want to try the STM32F405ZG with an SRAM and the new STM32F427ZG with an SDRAM.

Do you have any suggestions regarding the noise on the RAM lines?

What influence does the multiplexer have on the communication?

Can you give us some advice on how to configure the FSMC registers for the PSRAM (IS66WVE2M16DBLL)?

Tesla DeLorean · ‎2013-12-14

Posted on December 14, 2013 at 14:18

The part doesn't use multiplexing (ie where address and data lines share pins at different phases during access)

SRAM or PSRAM configurations should be appropriate.

What does the current configuration look like? 70 ns timing?

What does the memtest code look like? This should really be able to catch failures if done aggressively. Is it done in assembler? Does it write random patterns?
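
To illustrate what a more aggressive test could look like, here is a minimal sketch in C (not the posters' actual start-up test, and not assembler). It combines a pseudo-random pattern test with an "own address" test that tends to expose address-line problems. The function names, the xorshift32 generator and the 16-bit access width are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Small xorshift32 PRNG; the state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills the region with a pseudo-random sequence, then regenerates the
 * same sequence and compares. Returns the number of mismatching words. */
size_t memtest_random(volatile uint16_t *base, size_t words,
                      uint32_t seed, unsigned passes)
{
    size_t errors = 0;

    for (unsigned p = 0; p < passes; p++) {
        uint32_t start = (seed + p) ? (seed + p) : 1u;
        uint32_t s = start;

        for (size_t i = 0; i < words; i++)
            base[i] = (uint16_t)xorshift32(&s);

        s = start;                       /* replay the same sequence */
        for (size_t i = 0; i < words; i++) {
            if (base[i] != (uint16_t)xorshift32(&s))
                errors++;
        }
    }
    return errors;
}

/* Writes a value derived from each word's index, then verifies it.
 * Stuck or shorted address lines make distinct indices alias. */
size_t memtest_address(volatile uint16_t *base, size_t words)
{
    size_t errors = 0;

    for (size_t i = 0; i < words; i++)
        base[i] = (uint16_t)(i ^ (i >> 16));

    for (size_t i = 0; i < words; i++) {
        if (base[i] != (uint16_t)(i ^ (i >> 16)))
            errors++;
    }
    return errors;
}
```

On the target, `base` would point into the FSMC bank (for example `(volatile uint16_t *)0x60000000` for NORSRAM1). A test like this only exercises the CPU path, so running it in a loop while the SPI2 DMA is active, and at the temperature where the boards fail, is probably closer to the real load than a single pass at start-up.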


d2399 · ‎2013-12-16

Posted on December 16, 2013 at 09:18

That's our basic configuration:

p.FSMC_AddressSetupTime = 0x4;       // (1/168MHz) * 4 -> 23.8 ns
p.FSMC_AddressHoldTime = 0x4;        // (1/168MHz) * 4 -> 23.8 ns
p.FSMC_DataSetupTime = 0x04;         // (1/168MHz) * 4 -> 23.8 ns
p.FSMC_BusTurnAroundDuration = 0x0;  // no HCLK cycles added
p.FSMC_CLKDivision = 0x1;            // clock divided by 2
p.FSMC_DataLatency = 0x2;            // 1/(168MHz/(FSMC_CLKDivision+1)) * (2+2) -> 47.6 ns
p.FSMC_AccessMode = FSMC_AccessMode_A;

FSMC_NORSRAMInitStructure.FSMC_Bank = FSMC_Bank1_NORSRAM1;
FSMC_NORSRAMInitStructure.FSMC_DataAddressMux = FSMC_DataAddressMux_Enable;
FSMC_NORSRAMInitStructure.FSMC_MemoryType = FSMC_MemoryType_PSRAM;
FSMC_NORSRAMInitStructure.FSMC_MemoryDataWidth = FSMC_MemoryDataWidth_16b;
FSMC_NORSRAMInitStructure.FSMC_BurstAccessMode = FSMC_BurstAccessMode_Disable;
FSMC_NORSRAMInitStructure.FSMC_AsynchronousWait = FSMC_AsynchronousWait_Disable;
FSMC_NORSRAMInitStructure.FSMC_WaitSignalPolarity = FSMC_WaitSignalPolarity_Low;
FSMC_NORSRAMInitStructure.FSMC_WrapMode = FSMC_WrapMode_Disable;
FSMC_NORSRAMInitStructure.FSMC_WaitSignalActive = FSMC_WaitSignalActive_BeforeWaitState;
FSMC_NORSRAMInitStructure.FSMC_WriteOperation = FSMC_WriteOperation_Enable;
FSMC_NORSRAMInitStructure.FSMC_WaitSignal = FSMC_WaitSignal_Enable;
FSMC_NORSRAMInitStructure.FSMC_ExtendedMode = FSMC_ExtendedMode_Disable;
FSMC_NORSRAMInitStructure.FSMC_WriteBurst = FSMC_WriteBurst_Disable;
FSMC_NORSRAMInitStructure.FSMC_ReadWriteTimingStruct = &p;
FSMC_NORSRAMInitStructure.FSMC_WriteTimingStruct = &p;

Then I tested with

p.FSMC_AddressSetupTime = 0xB;       // 0xB = 11 HCLK cycles: (1/168MHz) * 11 -> 65.5 ns
p.FSMC_AddressHoldTime = 0xB;        // 0xB = 11 HCLK cycles: (1/168MHz) * 11 -> 65.5 ns
p.FSMC_DataSetupTime = 0x0B;         // 0xB = 11 HCLK cycles: (1/168MHz) * 11 -> 65.5 ns
p.FSMC_BusTurnAroundDuration = 0x0;  // no HCLK cycles added
p.FSMC_CLKDivision = 0x1;            // clock divided by 2
p.FSMC_DataLatency = 0x0;
p.FSMC_AccessMode = FSMC_AccessMode_A;

FSMC_NORSRAMInitStructure.FSMC_DataAddressMux = FSMC_DataAddressMux_Disable;

In both configurations we have the issue!
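
As a cross-check on the arithmetic, the conversion between HCLK cycles and nanoseconds can be sketched as below. The helper names are hypothetical, and the functions only convert units. How the FSMC maps a register value to the real length of each phase depends on the access mode and should be taken from the reference manual. Note that 0xB is 11, which at 168 MHz is about 65.5 ns, while a 70 ns target (the figure asked about earlier in the thread) needs at least 12 cycles.

```c
#include <stdint.h>

/* Duration of a number of HCLK cycles in picoseconds (rounded to nearest).
 * Picoseconds are used to avoid floating point on the target. */
uint32_t cycles_to_ps(uint32_t cycles, uint32_t hclk_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000000ull + hclk_hz / 2u)
                      / hclk_hz);
}

/* Smallest number of HCLK cycles that lasts at least ns nanoseconds. */
uint32_t ns_to_min_cycles(uint32_t ns, uint32_t hclk_hz)
{
    return (uint32_t)(((uint64_t)ns * hclk_hz + 999999999ull)
                      / 1000000000ull);
}
```

For example, `cycles_to_ps(4, 168000000)` gives 23810 ps (the 23.8 ns of the basic configuration), `ns_to_min_cycles(70, 168000000)` gives 12, and `ns_to_min_cycles(70, 130000000)` gives 10. The boards running at 130 MHz with unchanged register values therefore see every phase stretched by roughly 29 %, which would be consistent with a marginal timing.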