Hard-to-catch fault

root · 2013-06-04

Posted on June 04, 2013 at 09:50

Hello,

I have a strange issue. When I really stress my system (it handles serial packets, extracts the relevant data, and sends it out over serial; I send thousands of messages without waiting for a reply), I sometimes get, unpredictably, an invalid PC load usage fault or an invalid state usage fault.
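For reference, fault messages like these are usually derived from the UsageFault bits of the Cortex-M3 Configurable Fault Status Register (CFSR, at 0xE000ED28). Below is a minimal host-side sketch of that decoding, assuming the handler has already read CFSR. The enum and function names are hypothetical, not part of any library. When the UsageFault handler is not enabled separately, these faults escalate to HardFault (HFSR.FORCED set), but CFSR still records the cause. Note that INVPC is set when an exception return fails the EXC_RETURN integrity checks.

```c
#include <stdint.h>

/* UsageFault status bits as seen in the 32-bit CFSR (UFSR = CFSR[31:16]). */
#define CFSR_UNDEFINSTR (1u << 16) /* undefined instruction */
#define CFSR_INVSTATE   (1u << 17) /* instruction executed with EPSR.T == 0 */
#define CFSR_INVPC      (1u << 18) /* bad EXC_RETURN on exception return */
#define CFSR_NOCP       (1u << 19) /* coprocessor access */
#define CFSR_UNALIGNED  (1u << 24) /* unaligned access */
#define CFSR_DIVBYZERO  (1u << 25) /* divide by zero (when DIV_0_TRP is enabled) */

typedef enum {
    UF_NONE, UF_UNDEFINSTR, UF_INVSTATE, UF_INVPC,
    UF_NOCP, UF_UNALIGNED, UF_DIVBYZERO
} usage_fault_t;

/* Returns the first UsageFault cause found in cfsr (several bits can be set). */
usage_fault_t first_usage_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_UNDEFINSTR) return UF_UNDEFINSTR;
    if (cfsr & CFSR_INVSTATE)   return UF_INVSTATE;
    if (cfsr & CFSR_INVPC)      return UF_INVPC;
    if (cfsr & CFSR_NOCP)       return UF_NOCP;
    if (cfsr & CFSR_UNALIGNED)  return UF_UNALIGNED;
    if (cfsr & CFSR_DIVBYZERO)  return UF_DIVBYZERO;
    return UF_NONE; /* no UsageFault bit: check MMFSR/BFSR or HFSR instead */
}
```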

When I get an invalid PC load usage fault, the program counter is something like 0x0 or 0x5, or sometimes a RAM address, even though I have no code in RAM. Looking at the stack trace, I suspect the stack pointer is being corrupted somewhere: some of the registers hold flash addresses of branch code, and LR contains garbage that is neither a flash code address nor a RAM address.

Here are my stack traces:

****************************
HARD FAULT !
Stack = 0x20000660
Invalid PC load usage fault at
Program counter = 0x200082B0
Stack frame :
R0 = 0x400264B8
R1 = 0x20008318
R2 = 0x3C
R3 = 0x200082B0
R12 = 0x0
LR = 0x8002417
PC = 0x200082B0
PSR = 0x20008318
****************************

Or:

****************************
HARD FAULT !
Stack = 0x20000688
Invalid state usage fault at
Program counter = 0x20008270
Stack frame :
R0 = 0x20008288
R1 = 0x20008EA8
R2 = 0x3C
R3 = 0x200082B0
R12 = 0x0
LR = 0x20008270
PC = 0x20008270
PSR = 0x20000200
****************************

Or again:

****************************
HARD FAULT !
Stack = 0x20000670
Invalid PC load usage fault at
Program counter = 0x1
Stack frame :
R0 = 0x0
R1 = 0x80023D7
R2 = 0x8003B26
R3 = 0x21000200
R12 = 0x0
LR = 0x8003279
PC = 0x1
PSR = 0x200082B0
****************************
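A quick plausibility check on a printed frame can help separate "the code really branched somewhere bad" from "the frame was read from the wrong place". Below is a minimal sketch, assuming the basic 8-word Cortex-M3 exception frame and a flash window of 256 KB at 0x08000000. The FLASH_START/FLASH_END values are assumptions to adjust to the actual part, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flash window: 256 KB at 0x08000000 (adjust for the actual part). */
#define FLASH_START 0x08000000u
#define FLASH_END   0x08040000u /* exclusive */
#define XPSR_T_BIT  (1u << 24)  /* Thumb state bit, must be 1 on Cortex-M */

/* Basic frame pushed on exception entry, lowest address (SP) first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

/* True if the frame is consistent with Thumb code running from flash:
   stacked PC inside flash and halfword aligned, stacked xPSR.T set. */
bool frame_looks_sane(const stacked_frame_t *f)
{
    bool pc_in_flash = f->pc >= FLASH_START && f->pc < FLASH_END;
    bool pc_aligned  = (f->pc & 1u) == 0u;
    bool thumb_state = (f->xpsr & XPSR_T_BIT) != 0u;
    return pc_in_flash && pc_aligned && thumb_state;
}
```

All three traces above fail this check. That narrows the search, but on its own it does not distinguish a genuine bad branch from a misread frame.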

As far as I can tell (it is very hard to track down), the problem seems to occur on exit from the service call (SVC) interrupt following a malloc call, although there are around 10,000 malloc calls without any problem first.
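Since the fault seems tied to malloc, one way to check whether heap blocks are being overrun (as opposed to the stacks) is a debug wrapper that surrounds each allocation with canary words. Below is a minimal sketch. guarded_malloc, guarded_check, guarded_free and the canary value are hypothetical. The wrapper does nothing about reentrancy: if malloc can be reached from both thread code and an interrupt or SVC, calls still need to be serialized, for example by masking interrupts around them.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_CANARY 0xA5A5A5A5u /* arbitrary pattern */

/* Header stored in front of each user block; 8 bytes keeps 8-byte
   alignment if malloc itself returns 8-byte aligned blocks. */
typedef struct {
    uint32_t size;   /* requested size in bytes */
    uint32_t canary; /* detects underruns */
} guard_hdr_t;

void *guarded_malloc(size_t n)
{
    guard_hdr_t *h = malloc(sizeof *h + n + sizeof(uint32_t));
    if (h == NULL)
        return NULL;
    h->size = (uint32_t)n;
    h->canary = GUARD_CANARY;
    uint32_t tail = GUARD_CANARY;
    /* The trailing canary may be unaligned, so copy it bytewise. */
    memcpy((unsigned char *)(h + 1) + n, &tail, sizeof tail);
    return h + 1;
}

/* Returns 1 if both canaries around p are intact, 0 otherwise. */
int guarded_check(const void *p)
{
    const guard_hdr_t *h = (const guard_hdr_t *)p - 1;
    uint32_t tail;
    memcpy(&tail, (const unsigned char *)p + h->size, sizeof tail);
    return h->canary == GUARD_CANARY && tail == GUARD_CANARY;
}

void guarded_free(void *p)
{
    if (p != NULL)
        free((guard_hdr_t *)p - 1);
}
```

Calling guarded_check on live blocks (for example at SVC exit) would show whether a heap overrun precedes the fault.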

My process stacks are far from full (always at least half empty), and I have 8 KB of system stack. The hard fault happens on the user stack, but again, it seems to trigger when registers are popped at service call exit.
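Since the fault is reported on the user (process) stack, it matters which stack the handler reads the frame from. On handler entry, LR holds an EXC_RETURN value, and bit 2 of it says whether the frame was pushed to PSP or MSP. Reading the frame from the wrong stack can produce frames that look shifted or full of garbage. Below is a host-side sketch of that decoding for a Cortex-M3 without FPU (function names hypothetical).

```c
#include <stdbool.h>
#include <stdint.h>

#define EXC_RETURN_SPSEL  (1u << 2) /* 1: frame on PSP, 0: frame on MSP */
#define EXC_RETURN_THREAD (1u << 3) /* 1: return to Thread mode, 0: Handler mode */

/* True if the exception frame was pushed to the process stack (PSP). */
bool frame_on_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_SPSEL) != 0u;
}

/* True if the exception was taken from (and returns to) Thread mode. */
bool returns_to_thread(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_THREAD) != 0u;
}

/* Without an FPU only these three EXC_RETURN values are valid; an
   exception return with any other value fails with an INVPC UsageFault. */
bool exc_return_valid(uint32_t exc_return)
{
    return exc_return == 0xFFFFFFF1u  /* Handler mode, MSP */
        || exc_return == 0xFFFFFFF9u  /* Thread mode, MSP */
        || exc_return == 0xFFFFFFFDu; /* Thread mode, PSP */
}
```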

I have spent about 10 hours trying to fix this with no luck so far. Do you guys have any advice for me?

Thomas.

#dma2

root · 2013-06-04

Posted on June 04, 2013 at 12:41

Hello,

I'm using TASKING for ARM. I don't have any active breakpoints, only the default views open (which refresh only when the process stops).

I have now processed more than 2M packets at full speed, and it is still working well (the J-Link is physically connected but not attached).

Thomas.

root · 2013-06-04

Posted on June 04, 2013 at 13:25

Hello,

It has now been running for more than half an hour, has processed millions of packets, and is still running.

What do I do now? Just classifying it as a Schroedinbug and moving on doesn't seem like a clever idea...

Thomas.

jpeacock2399 · 2013-06-04

Posted on June 04, 2013 at 16:14

I've had similar problems with DMA2. In my case it manifests as spurious interrupts from the NVIC. I found that turning on the FIFO mode in DMA fixes the problem, same as you. There are some known errata on DMA2, perhaps there are some additional problems with the FIFO unit not yet documented.
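For reference, on the STM32F2/F4 DMA, "FIFO mode" means setting DMDIS in the stream's DMA_SxFCR register (which disables direct mode) and choosing a threshold with FTH[1:0]. These bits can only be written while the stream is disabled. Below is a small sketch that builds the value. The bit positions are as I recall them from the reference manual (verify against RM0033), and the macro and function names are hypothetical, not the ST library's.

```c
#include <stdint.h>

/* DMA_SxFCR fields (assumed positions): DMDIS = bit 2, FTH = bits 1:0. */
#define DMA_FCR_DMDIS      (1u << 2) /* 1 = direct mode disabled, FIFO used */
#define DMA_FCR_FTH_QUART  0u        /* threshold 1/4 full */
#define DMA_FCR_FTH_HALF   1u        /* threshold 1/2 full */
#define DMA_FCR_FTH_3QUART 2u        /* threshold 3/4 full */
#define DMA_FCR_FTH_FULL   3u        /* threshold full */

/* Value to write to DMA_SxFCR (stream disabled) to enable FIFO mode. */
uint32_t dma_fcr_fifo_mode(uint32_t threshold)
{
    return DMA_FCR_DMDIS | (threshold & 0x3u);
}
```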

Jack Peacock

root · 2013-06-04

Posted on June 04, 2013 at 16:40

Hello,

I turned on FIFO, but still have the hard faults.

Thomas.

Amel NASRI · 2013-07-03

Posted on July 03, 2013 at 09:45

Hello Thomas,

Could you please let me know which product you are using?

Thanks,

-Mayla-


root · 2013-08-14

Posted on August 14, 2013 at 18:22

Hello,

I'm using an STM32F205RCT6. I already use the same part in other projects.

I'm trying to use this code for a new project (for now it's a simple serial-to-serial pass-through, for testing).

These hard faults are VERY hard to fix because they seem to happen anywhere in the code, and the optimization level seems to change how often they occur (on some code it runs quite well at optimization level 0, pretty much the same at level 2, doesn't work at all at level 2, and works for a few iterations at level 3 before it crashes). The fault also seems to happen a LOT less often when the debugger is not connected.

All this makes me think there is a timing problem somewhere (the debugger probably affects the speed a bit too), but where?

Now I almost always get a wrong PC of 0x01, and it still looks like a faulty stack: everything seems shifted by one position, e.g. the stacked PSR contains a memory address when the hard fault triggers.
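One way to test the "shifted frame" idea is to dump a few more words around the reported stack pointer and look for where a plausible xPSR actually sits. On a Cortex-M3, a stacked xPSR from Thumb code has the T bit (bit 24) set and bits 23:16 clear. Below is a heuristic sketch (function names hypothetical).

```c
#include <stddef.h>
#include <stdint.h>

/* Heuristic: T bit (24) set and reserved bits 23:16 clear. */
static int looks_like_xpsr(uint32_t w)
{
    return (w & (1u << 24)) != 0u && (w & 0x00FF0000u) == 0u;
}

/* Returns the index of the first xPSR-like word, or -1 if none.
   In a correctly located 8-word frame the xPSR is at index 7. */
long find_xpsr_candidate(const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (looks_like_xpsr(words[i]))
            return (long)i;
    return -1;
}
```

Fed the eight words of the third trace in order (R0 to PSR), it returns 3. The value printed as R3, 0x21000200, is the only one that looks like an xPSR, which would fit the frame being read from the wrong offset. Treat this as a hint only, since ordinary data can match the pattern.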

I really need some help, or at least some advice on how to track this down!

Thomas.

Tesla DeLorean · 2013-08-14

Posted on August 14, 2013 at 19:25

Can you fill the stack or add some guards to make sure it's not going wrong there?
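Filling ("painting") a stack with a known pattern at startup and later checking how much of it has been overwritten is a common way to do what is suggested here. Below is a minimal sketch, assuming a full-descending stack (as on Cortex-M) and an arbitrary fill value. The names are hypothetical. The same check on the lowest few words also works as an overflow guard.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu /* arbitrary fill pattern */

/* Fill the whole stack area with the pattern before it is used. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_PAINT;
}

/* The stack grows down from base + words, so untouched words stay at the
   low end. Returns the number of words used at some point (high-water mark). */
size_t stack_high_water(const uint32_t *base, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && base[untouched] == STACK_PAINT)
        untouched++;
    return words - untouched;
}
```

A stored value that happens to equal the pattern can make the result slightly low, so treat it as an estimate. If stack_high_water ever approaches words, the stack is at risk of overflowing.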

I remember a number of people tickling a prefetch problem with GNU/GCC compilers. The ART seems to have a critical-path erratum, normally tickled with supplies below 2.1 V. I'll see if I can pin down a citation for the generated code; it was a PC-relative LDR, as I recall.

Does attaching a debugger mess with the supply voltage?

http://www.st.com/st-web-ui/static/active/en/resource/technical/document/errata_sheet/DM00027213.pdf

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Tesla DeLorean · 2013-08-14

Posted on August 14, 2013 at 19:27

https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flat.aspx?RootFolder=https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Problem%20with%20STM32F407%20RAM&FolderCTID=0x01200200770978C69A1141439FE559EB459D7580009C4E14902C3CDE46A77F0FFD06506F5B&currentviews=819


root · 2013-08-16

Posted on August 16, 2013 at 10:46

Hello,

The power supply is 3.3 V from an M5239 LDO (powered by USB or a lab power supply). That's the same power stage I use on pretty much every project I design, and I've never had a problem with it.

The compiler is TASKING's, which is supposedly not derived from GCC/GNU.

I will run some tests with some of the optimizations disabled (ART, prefetch, etc.).

Thomas.

[EDIT] I'm running it at 120 MHz with "only" 3 flash wait states; I'll also try 4 wait states.

But I have already tested it at 40 MHz with 1 wait state, which means a lower flash speed, and got the same result.
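For comparison, the STM32F2 reference manual (RM0033) wait-state table for a 2.7 to 3.6 V supply, as I recall it, adds one wait state per 30 MHz of HCLK (0 WS up to 30 MHz, 3 WS from 90 to 120 MHz). Below is a sketch encoding that rule. The function name is hypothetical, and lower supply voltages need more wait states than this.

```c
#include <stdint.h>

/* Minimum flash wait states for STM32F2 at VDD 2.7-3.6 V (assumed table):
   0 WS for HCLK <= 30 MHz, +1 WS for each further 30 MHz step. */
uint32_t f2_flash_wait_states(uint32_t hclk_hz)
{
    if (hclk_hz == 0u)
        return 0u;
    return (hclk_hz - 1u) / 30000000u;
}
```

If the table is as stated, f2_flash_wait_states(120000000) gives 3 and f2_flash_wait_states(40000000) gives 1, so both configurations above are already at the minimum for a 3.3 V supply.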

root · 2013-08-16

Posted on August 16, 2013 at 11:03

Well, I just tried with 5 flash wait states and all optimizations disabled (no cache, no prefetch), with exactly the same result (invalid state usage hard fault at PC = 0x0).
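The "5 wait states, no cache, no prefetch" configuration corresponds to a FLASH_ACR value with only the LATENCY field set. Below is a sketch that builds the register value. The bit positions are as listed in RM0033 to the best of my knowledge (LATENCY = bits 2:0, PRFTEN = bit 8, ICEN = bit 9, DCEN = bit 10). The names are hypothetical and deliberately differ from the CMSIS macros.

```c
#include <stdint.h>

#define ACR_LATENCY_MASK 0x7u       /* bits 2:0: flash wait states */
#define ACR_PRFTEN       (1u << 8)  /* prefetch enable */
#define ACR_ICEN         (1u << 9)  /* ART instruction cache enable */
#define ACR_DCEN         (1u << 10) /* ART data cache enable */

/* Builds a FLASH_ACR value; each flag is 0 (off) or non-zero (on). */
uint32_t flash_acr_value(uint32_t wait_states, int prefetch, int icache, int dcache)
{
    uint32_t acr = wait_states & ACR_LATENCY_MASK;
    if (prefetch) acr |= ACR_PRFTEN;
    if (icache)   acr |= ACR_ICEN;
    if (dcache)   acr |= ACR_DCEN;
    return acr;
}
```

For example, flash_acr_value(5, 0, 0, 0) gives 0x5. When raising the clock, the new latency should be written (and read back) before switching to the faster clock.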

So it is not related to flash speed.

Thomas.