Previous Post Title:
Help me ST community... you're my only hope. STM32F0 'resets' (???) on interrupt?
I wrote most of this earlier today, but have since solved the problem, after 5-6 days of it completely preventing ongoing development. Hopefully this might help someone, and there's still a question to be answered by ST!!!
Okay, between the two of us working on this we have more than 50 years' experience in the embedded industry. Personally, we each have around seven years' experience working with STM32F1s and, over the last 18 months, the STM32F0s. My workmate, the electronics engineer who designed these boards, has 30+ years as an electronics engineer, mostly working with processors & FPGAs that make the F1 & F0 look like featherweight contenders. I'm saying this purely to provide some context that we are not your average Joe Bloggs hobbyist... although over the last week I have definitely felt like one.
I have been absolutely humbled by this problem. We have exhausted every possibility we can think of... with the exception of one or two which I will outline below.
We have two boards that have been created. Both are designed to use the STM32F091CC for development, and then also the 030 and 051 for release.
This problem only seems to exist on board #1. Board #2 uses the identical processor and a very similar electronics design, but it is a different design on a different PCB. The details here are from board #1.
The external resonator has been replaced a couple of times, and the HSI has also been used. The system init call was generated using the Excel spreadsheet "STM32F0xx_Clock_Configuration_V1.0.1".
Since I can't work out how to paste formatted source code, I need to include a screenshot... which is a bit of a pain. This is about the simplest example I've boiled it down to that demonstrates the problem.
So, I can connect to the processor, load code, and begin debugging... Board #2, with the exact same processor, runs happily, though the code isn't achieving a great deal. Great! Board #1, however, has some issues. I've made up two units of board #1, and both show the same symptoms. I've also got two of board #2, neither of which has any problems... So the immediate thing is to think that it's a board design issue, eh? The processor hasn't changed. The code hasn't changed, but the board layout/design has. That must be it... except we can't figure out how that is possible.
Now, specifically, the problem seems to occur when the SysTick interrupt fires (although I suspect any interrupt will cause this). Here's a snapshot of the system state just as the SysTick counter reloads and sets its interrupt flag, on Board #1:
One more step takes us way the hell out into no-man's-land...
Running the exact same code on Board #2, just one more step takes us exactly where we expect to end up.
What the hell is going on here...?! So far we have:
- Replaced the resonator
- Used the internal oscillator
- Reduced clock speeds
- Replaced decoupling capacitors, and tried changing values (remembering that the design surrounding the two processors on the different boards is identical)
- Replaced the processor... twice
- Bypassed the 12V to 3V3 power supply completely
- Monitored supply pins using micro probes, looking for supply glitches
- Tested various software combinations on different boards
And that's ignoring all the software investigation that took place as well... vector table placement, clock setup, etc... Because a month ago (I left this bit out), it was working... Well, at least my development board was. And now it's not. So what's changed... let's spend copious amounts of time looking through source changes.
And as luck would have it, in the last two hours since starting to write this, we've solved the problem. But because the whole thing is so messed up and unusual it didn't occur to us to look there just yet. As is often the case, explaining a problem in-depth, anticipating questions that might get asked and trying to head those off, resulted in the discovery of the cause.
It turns out R8 was missing (context shortly). How did we work this out? Well, as I was writing this I started to wonder why this weird area of memory looked a little too structured... Oh wait, that's system memory. That's just weird... Why are we in system memory after an interrupt? Okay, so that's the bootloader... wait, what? That's just weird... Oh well, I'll just check the Boot pins. Yep, looks fine... I'll just have a look at the PCB as well. Hey, why the hell is R8 not on the board? It's not marked as DNF... It is on the BOM, but it wasn't in the part designator list that had been entered into Digikey against each part. The other 25-odd 10K resistors were listed properly on the packaging received from DK, but R8 wasn't there... and so the numpty technician didn't fit it when building the prototypes. Which I think was fair enough, in a way... since it was me. Dammit.
But, the reason for continuing this post is to help some other poor sod (kiwi/aussie slang for "poor *******") who ends up in a similar situation. The question now being...
Why does the processor jump to System Memory when it hits the SysTick interrupt event? And why does it do that even after it has booted and has already sampled the state of the BOOT0 pin?
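A likely explanation, offered as a sketch and not as an official answer from ST: the Cortex-M0 core in the STM32F0 has no vector table offset register, so exception vectors are always fetched from the bottom of the address space. The STM32F0 decides which memory appears there through the MEM_MODE bits of SYSCFG_CFGR1, and after reset those bits take the value selected by the boot configuration. Assuming R8 is the BOOT0 pull-down (the post implies this but never states it) and assuming the debugger started the application by setting the program counter directly, the code would run from flash while address 0 still showed system memory, so the SysTick vector at offset 0x3C would point into the bootloader. The helper names below are hypothetical, and the bit encoding should be checked against the reference manual for the exact part.

```c
#include <stdint.h>

/* Which memory the STM32F0 aliases at address 0x00000000. */
typedef enum {
    ALIAS_MAIN_FLASH,
    ALIAS_SYSTEM_MEMORY,
    ALIAS_SRAM
} boot_alias_t;

/* Decode the MEM_MODE field, bits [1:0] of SYSCFG_CFGR1:
 * x0 = main flash, 01 = system memory, 11 = embedded SRAM.
 * (Hypothetical helper; takes the raw register value.) */
static boot_alias_t decode_mem_mode(uint32_t syscfg_cfgr1)
{
    uint32_t mode = syscfg_cfgr1 & 0x3u;

    if (mode == 0x1u) {
        return ALIAS_SYSTEM_MEMORY;
    }
    if (mode == 0x3u) {
        return ALIAS_SRAM;
    }
    return ALIAS_MAIN_FLASH; /* 00 and 10 both select main flash */
}

/* Byte offset of an exception's entry in the vector table.
 * Each entry is 4 bytes and the table sits at address 0.
 * SysTick is exception number 15. */
static uint32_t vector_offset(uint32_t exception_number)
{
    return 4u * exception_number;
}
```

On the target, passing the value of SYSCFG->CFGR1 (read in the debugger or in code) to decode_mem_mode would show whether the alias is the culprit. If it reports system memory while the application is running from flash, every interrupt will be vectored through the bootloader's table.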
As is customary in this industry, I first found this problem last weekend... the night before the "big demo". Why hadn't I noticed this beforehand? No idea... It also explains why the units would drop offline even after running for a few minutes. Well, it doesn't really... but it does given how even a SysTick interrupt was causing this issue. I also found that, I believe, any interrupt (or at least the ADC and timers) was having a similar effect. Thankfully for me, it still worked three out of five times. Turn it off and back on and it'd work... It was enough to give the demo. Phew.
So, maybe this will help someone. But bloody hell... seriously? WTF. I could completely understand if it was going into ISP mode at the time of power on. But part-way through operation?
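If the memory aliased at address 0 is indeed what went wrong, one defensive option is to select main flash explicitly early in startup, before any interrupt is enabled. This is a belt-and-braces sketch under that assumption, not a substitute for fitting the pull-down. The macro and function names are hypothetical; on the target the equivalent would be a read-modify-write of SYSCFG_CFGR1 after enabling the SYSCFG peripheral clock.

```c
#include <stdint.h>

/* MEM_MODE occupies bits [1:0] of SYSCFG_CFGR1 (hypothetical macro name). */
#define MEM_MODE_MASK 0x3u

/* Return the register value with MEM_MODE cleared to 00, which selects
 * main flash at address 0. All other bits are left untouched. */
static uint32_t select_main_flash_alias(uint32_t syscfg_cfgr1)
{
    return syscfg_cfgr1 & ~MEM_MODE_MASK;
}
```

Written this way the bit manipulation can be checked on a PC; on the device the returned value would be written back to the register before interrupts are enabled.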
I'd appreciate some insight as to why this was taking place, even though it's not an issue any longer...
Hopefully you found this amusing :-) I know I sure as hell didn't... until about 30 minutes ago.