2020-11-08 09:03 AM
I probably should have posted this many weeks ago. After the hell I've been through, I'm posting it as a question that I now believe I've answered myself, in the hope of saving someone else a few weeks of the same. Here goes.
For the past 3 months we've been finalizing a design built around an STM32F205. We're using both a 64-pin and a 100-pin version of this chip in 2 different versions of the product. Unlike our last product, this time we went without an external clock crystal (HSE) and went instead with the internal 16 MHz HSI clock source. We set the system clock at 120 MHz - the rated speed of this chip family. We don't use USB so timing precision wasn't top of mind.
After we got up the learning curve and had the application code working, we started getting intermittent lock-ups....hard faults. Sometimes it would happen immediately, sometimes it would take a few minutes, and sometimes it would go hours and even days....and then lock up.
We edited and refined the application code endlessly looking for the culprit. We wrung out practically every possible bad coding practice and loose end we could find. Nothing would stop the lock-ups which remained unpredictable but inevitable.
After weeks of having tried everything else, including reverting all the code from C++ back to C (didn't help), we finally dropped the clock rate from 120 to 100 MHz, adjusted the timer settings accordingly, and boom.....no more lock-ups.
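For anyone making the same change, here is a minimal sketch of the prescaler arithmetic behind "adjusted the timer settings accordingly". The function name `timer_prescaler` and the 1 MHz tick used below are hypothetical, not taken from our code. The timer input clock is passed in as a parameter because it depends on the APB prescaler setup, which isn't described here.

```c
#include <stdint.h>

/* Hypothetical helper: returns the value to load into a 16-bit timer
 * prescaler register (PSC) so that the counter ticks at tick_hz.
 * The hardware divides the timer input clock by (PSC + 1).
 * Returns -1 if the division is not exact or does not fit in 16 bits. */
int32_t timer_prescaler(uint32_t timer_clk_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u || timer_clk_hz < tick_hz) {
        return -1;
    }
    if (timer_clk_hz % tick_hz != 0u) {
        return -1;                      /* no exact integer divider */
    }
    uint32_t divider = timer_clk_hz / tick_hz;
    if (divider > 65536u) {
        return -1;                      /* PSC is only 16 bits wide */
    }
    return (int32_t)(divider - 1u);
}
```

With a 1 MHz tick, a timer clocked at 120 MHz needs a prescaler of 119, while the same timer clocked at 100 MHz needs 99.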
I've looked high and low on the internet and have not found any clear cautionary guidance about this kind of problem and this solution. I've no idea if this is commonplace or just some obscure problem with this chip model.
So much time spent on something with such a simple solution. So it goes.
Cheers,
Morten
2020-11-08 10:06 AM
After a hard fault, you should be able to get some more information as to why it happened by looking at the SCB fault status registers. One possibility is that the flash wait states are set too low for the speed/voltage you're running at, which can cause this behavior.
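As a rough sketch of what looking at those registers can mean in practice, the helper below turns a captured Configurable Fault Status Register value (`SCB->CFSR` in CMSIS) into readable flag names. The bit positions are the ones the Cortex-M3 defines; the names `describe_cfsr` and `fault_bit_t` are made up for this example. On the target you would pass in the value saved by your hard fault handler.

```c
#include <stdint.h>
#include <stdio.h>

/* One CFSR flag: its bit mask and a human-readable name. */
typedef struct {
    uint32_t mask;
    const char *name;
} fault_bit_t;

/* Cortex-M3 CFSR layout: MemManage in bits 0-7, BusFault in bits 8-15,
 * UsageFault in bits 16-31. */
static const fault_bit_t cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL: instruction access violation" },
    { 1u << 1,  "DACCVIOL: data access violation" },
    { 1u << 3,  "MUNSTKERR: MemManage fault on exception return" },
    { 1u << 4,  "MSTKERR: MemManage fault on exception entry" },
    { 1u << 8,  "IBUSERR: instruction bus error" },
    { 1u << 9,  "PRECISERR: precise data bus error" },
    { 1u << 10, "IMPRECISERR: imprecise data bus error" },
    { 1u << 11, "UNSTKERR: bus fault on exception return" },
    { 1u << 12, "STKERR: bus fault on exception entry" },
    { 1u << 16, "UNDEFINSTR: undefined instruction" },
    { 1u << 17, "INVSTATE: invalid execution state" },
    { 1u << 18, "INVPC: invalid PC load" },
    { 1u << 19, "NOCP: no coprocessor" },
    { 1u << 24, "UNALIGNED: unaligned access" },
    { 1u << 25, "DIVBYZERO: divide by zero" },
};

/* Prints one line per fault flag set in cfsr and returns how many were set. */
int describe_cfsr(uint32_t cfsr, FILE *out)
{
    int count = 0;
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            fprintf(out, "%s\n", cfsr_bits[i].name);
            count++;
        }
    }
    return count;
}
```

Faults caused by marginal flash timing tend to show up as a changing mix of flags (undefined instruction, invalid state, bus errors) rather than one repeatable cause, so an inconsistent pattern from run to run is itself a hint.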
2020-11-08 11:41 AM
Yes, since we are using Crossworks we have had access to a decent debugger and related tools plus we added code to the Hard Fault handler to "trap" the registers for post-fault analysis. It was all very educational yet the results remained ambiguous. Went chasing down a good many rabbit holes. Never did find the "one big thing" in the code that could explain the hard faulting.
Although I've years of programming and product development under my belt, the learning curve never ends and I readily admit to not appreciating the significance of certain clock settings including the wait states. It's now fairly clear that we had unknowingly run at too high a clock rate for the wait state setting....and needed to either back down on the clock rate or increase the wait state.
And thanks for the reply.
Cheers,
Morten
2020-11-08 12:53 PM
It still may be coincidental. I personally would try to wire up an oscillator to HSE, and I would review power supply and ground, especially if there are power circuits around.
I would recommend trying the software with as few modifications as possible on a "known good" board, but the 'F2 probably has only the EVAL board; although on a 'F4 Nucleo the target could probably be swapped for a 'F2.
JW
2020-11-08 01:05 PM
There is one Nucleo-144 also. :)
2020-11-08 01:10 PM
The F2 definitely had critical path issues with the ART / Prefetch implementation.
We migrated one design to the F4, and clocked the primary F2 design at 64 MHz to accommodate some 512 kHz and 256 kHz clock generation requirements.
2020-11-09 12:00 AM
I stand corrected.
Thanks.
JW
2020-11-09 05:12 AM
Thanks to your heads up regarding wait states, it appears we were shooting ourselves in the foot all along. The STM32F2 documentation clearly shows the need for 3 wait states when running with a 90-120 MHz clock. Our clock init code had only 2 wait states. It works with just 2, but not reliably. We are now running 3 wait states and haven't had a single fault in over 24 hours of non-stop running.
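For reference, here is a minimal sketch of the wait state lookup we should have done up front. The 90-120 MHz band needing 3 wait states is from the documentation as noted above. The lower bands assume the same 30 MHz steps, which is my reading of the 2.7-3.6 V column; lower supply voltages need more wait states, so check the reference manual for your own conditions. The function name `flash_wait_states` is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: number of flash wait states for a given HCLK,
 * assuming an STM32F2 supplied at 2.7-3.6 V, where each wait state
 * covers a further 30 MHz:
 *   0 WS up to 30 MHz, 1 WS up to 60 MHz,
 *   2 WS up to 90 MHz, 3 WS up to 120 MHz.
 * Returns -1 above the 120 MHz rated maximum. */
int flash_wait_states(uint32_t hclk_hz)
{
    if (hclk_hz > 120000000u) {
        return -1;
    }
    if (hclk_hz == 0u) {
        return 0;
    }
    /* Subtracting 1 makes the upper edge of each band inclusive. */
    return (int)((hclk_hz - 1u) / 30000000u);
}
```

The result goes into the LATENCY field of FLASH_ACR before switching to the faster clock. This returns 3 for both 120 MHz and 100 MHz. So dropping to 100 MHz with only 2 wait states was still outside the documented range; it just left enough margin that the faults stopped showing up.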