
The MCU requires NRST to be toggled after power on

RobertK
Associate III

Hi all, I have a problem with a new version of an existing PCB using the STM32F334C8T6 microcontroller.

Known working code is loaded onto the new PCB, power is applied, and the MCU starts up the clock and then seems to sit idle. If the NRST pin is then driven low and released, the MCU starts up and begins operating normally. Holding NRST low on power up and then releasing it after a delay does not work; the MCU must attempt to start up and fail first. NRST is connected to a 10kΩ resistor to +3V3 and a 100nF capacitor to 0V. BOOT0 is connected to a 10kΩ resistor to 0V.

PXL_20230922_103355658.jpg

The signals here (top to bottom) are:
old PCB +3V3
old PCB NRST
new PCB +3V3
new PCB NRST
While both boards were powered at the same time the new board has an additional SMPS stage which adds a small delay to the rise of the +3V3 rail.

PXL_20230927_100625674.jpg

The signals here (top to bottom) are:
old PCB NRST
old PCB 8.000MHz crystal
new PCB NRST
new PCB 8.000MHz crystal
I've gone through a couple of stages of being suspicious of the clock as it gets enabled by the code, but then the code seems to stop/stall. However the scope traces don't show any significant problems. I've also compared the clock signal after the reset with the MCU running code and it looks identical to the above scope trace where the MCU is stalled. 

The new PCB does have a different 8.000MHz crystal so I may have got the load capacitors wrong. The new board is using this crystal from JLC with 2x 22pF 0402 C0G load capacitors. Is this correct? Should I be using a different value?

There are other minor differences between the two boards, but nothing that would suggest this issue. e.g. some pins no longer have a 1kΩ resistor linking them even though one pin was never enabled/driven in the code.

I'm running out of hair to tear out so welcome any suggestions of what the issue might be or further debugging steps. Thanks!

36 REPLIES
BarryWhit
Senior III

I have no idea why this is happening. Just a few things I notice:

1. I second the admonition from above that you should not probe your crystal: those nodes are high impedance, and the additional capacitance from the probe will disrupt operation (at the very least, you can't rely on the frequency you measure). You can use the MCO (microcontroller clock output) feature of the RCC to route a clock out to a pin for probing. That should not affect the operation of the crystal circuit itself.

2. The NRST RC circuit you described has a time constant of 1ms (10kΩ × 100nF), which means that the overall transition lasts around 3ms. Some datasheets (out there in the world) tell you to assert reset for some period of time (which can last ms), but none of them want you to feed such a slow transition to the input. I haven't checked whether NRST has a Schmitt-trigger input, but in some parallel universe your MCU may not like this.

3. Your PCB layout does not follow many of the recommended practices set out in AN2867.

In particular, you have not used a GND guard ring, and the traces from the MCU to the oscillator look longer than they need to be (the appnote includes an example gerber).

4. The layout issues may be exacerbated if you are near any source of EMI. Fluorescent lights? Old test equipment? Try another room away from equipment. If anyone on your team has a magnetic personality, send them out to pick up lunch. Try turning off the lights (just a shot in the dark, lolz).
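
For point 2, here is a quick sketch of the arithmetic behind the 3ms figure. It assumes an ideal RC charge curve driven by a step on +3V3, and it ignores both the MCU's internal NRST pull-up and the finite rise time of the rail (both are simplifications). The function names are only illustrative.

```python
import math

R_PULLUP_OHMS = 10e3   # 10 kΩ pull-up to +3V3 (value from the original post)
C_NRST_FARADS = 100e-9 # 100 nF to 0 V (value from the original post)


def rc_time_constant(r_ohms, c_farads):
    """Time constant tau = R * C, in seconds."""
    return r_ohms * c_farads


def time_to_fraction(r_ohms, c_farads, fraction):
    """Time for an ideal RC charge curve to reach `fraction` of its final value."""
    return -rc_time_constant(r_ohms, c_farads) * math.log(1.0 - fraction)


tau = rc_time_constant(R_PULLUP_OHMS, C_NRST_FARADS)
print(f"tau = {tau * 1e3:.2f} ms")

# The fractions below are arbitrary sample points, not datasheet thresholds.
for fraction in (0.5, 0.7, 0.95):
    t = time_to_fraction(R_PULLUP_OHMS, C_NRST_FARADS, fraction)
    print(f"NRST reaches {fraction:.0%} of the rail after {t * 1e3:.2f} ms")
```

The 95% point lands at about three time constants, which is where the "around 3ms" comes from. The actual input thresholds of NRST have to come from the datasheet.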

 

I have no idea if any of this would solve your problem. But do let us know what you find. Don't despair, but keep at it, you'll figure it out eventually.

 

Update: another thing to try is to butcher one of your boards and jerry-rig an external clock source for the MCU from a frequency generator, if you have one. Then configure your RCC to use "HSE Bypass" mode. Assuming you haven't introduced a new problem doing this, if your problem disappears you have pretty much isolated the issue to your crystal circuit. If not, you may have ruled it out as the source of the problem.

- If a post has answered your question, please acknowledge the help you received by clicking "Accept as Solution".
- Once you've solved your issue, please consider posting a summary of any additional details you've learned. Your new knowledge may help others in the future.
RobertK
Associate III

Hi BarryWhit,

1. I don't really have much access to the firmware; my company has very few firmware resources.

2. The 10kΩ and 100nF on NRST is from a previous version of this PCB (by my predecessor) with ~700 working boards out in the world. I always thought it was pretty standard even though the STM32 has an internal pull-up on NRST. I can see that the Nucleo-64 dev board has a 100kΩ and 100nF on NRST, and the Nucleo-144 just has the 100nF. While I doubt this is the root cause, I'll probably change the 10kΩ to 100kΩ in the next order.

3. I have yet to see a single board meet all of the 'recommended' practices of AN2867. And while I don't have a complete guard ring, I have most of one as well as an unbroken ground layer under the crystal, although I don't quite see how I can complete the guard ring as MCU pin 4 is not GND. The crystal traces are laid out as a differential pair and I have the capacitors mirrored. I doubt this is the issue either, and it certainly isn't the root cause of my problems.

4. I have tested these boards in a couple of different locations around our offices and have noticed nothing location specific. Nor are there any obvious noise sources that could cause any issues. I'm in an ex-barn surrounded on 3 sides by fields and have a cabinetry workshop over the road. But before you point at them, I've had the problem when they've been away with the machines off.

I think I mentioned it in the discussion but I'll reiterate here:
Changing the capacitors helps. I originally used 22pF caps, which with the 8pF crystal load did not work. I changed to 6.8pF and kept the crystal and most boards of the latest batch worked (23 of 30). I've now swapped most of these boards to 8.2pF and I'm up to 25 of 30. 3 of the remaining faulty boards just don't start up regardless of what capacitor I try but will operate after a reset. The last 2 boards don't work even after a reset.
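
As a rough sanity check on those capacitor swaps, the usual load formula CL = C1·C2/(C1+C2) + Cstray can be evaluated for each value tried. This is only a sketch: the 3-6pF stray capacitance range is a guess rather than a measurement, and the function name is made up for illustration.

```python
CRYSTAL_CL_PF = 8.0            # load capacitance listed for the new crystal
STRAY_RANGE_PF = (3.0, 6.0)    # ASSUMPTION: guessed PCB + pin stray capacitance


def effective_load_pf(c1_pf, c2_pf, c_stray_pf):
    """Load seen by the crystal: series combination of C1 and C2, plus stray."""
    return (c1_pf * c2_pf) / (c1_pf + c2_pf) + c_stray_pf


for cap_pf in (22.0, 6.8, 8.2):
    low = effective_load_pf(cap_pf, cap_pf, STRAY_RANGE_PF[0])
    high = effective_load_pf(cap_pf, cap_pf, STRAY_RANGE_PF[1])
    print(f"2 x {cap_pf:4.1f} pF -> load {low:.1f} to {high:.1f} pF "
          f"(crystal wants {CRYSTAL_CL_PF:.0f} pF)")
```

Under that assumption the 22pF caps present roughly 14-17pF to an 8pF crystal, while 6.8pF and 8.2pF land much closer to the target.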

I will need approximately 500 of these boards next year and I don't want to have to overorder and bin 20%. It's hard not to despair a little: I've ordered approximately 70 PCBAs now and they've all had this issue, and despite the advice here I'm no closer to solving it. Can't say STMs and crystals are on my Christmas card list.

BarryWhit
Senior III

> I don't have a complete guard ring

I don't see a guard ring; all I see is a ground pour that apparently covers the entire layer. How important that is, I can't tell you.

 

> I have yet to see a single board meet all of the 'recommended' practices of AN2867.

Of course I can't be sure what the issue actually is (if I knew, I'd tell you, honest). But you have the chip at 45deg yet kept the crystal at 90. That pushes the crystal away from the chip.

Compare the trace length in your example

BarryWhit_0-1720048892302.png

vs. the recommended layout

Traces.jpg

There's barely any trace between the chip/osc footprints there, while your layout has like a small runway. That could be significant. On the very next page, the appnote shows an example of a bad layout, and calls out the traces as too long - for a layout with about half the trace length you used.

 

> NRST .. with ~700 working boards out in the world

You're probably right. 

 

 > I think I mentioned it in the discussion but I'll reiterate here: Changing the capacitors helps

That's indeed an indication that the crystal is at fault, so I suggest you test with an external clock. You say you have no access to the firmware, but you don't need it. Create a test project with HSE Bypass mode and see if the problem disappears. If a bad board immediately becomes rock solid when you replace the crystal with a clock source, you've made significant progress towards identifying the root cause. Or see if using HSI instead makes the problem go away (as others suggested).

 

You're using a no-name crystal, which has a one-page datasheet mostly in Mandarin (I assume). And the Cl is given as "6~20pf or specify". I have no idea what that means, do you? The recommended Cl values are usually precisely specified for a given crystal part number and frequency; it can't be such a wide range. The ST appnote gives specific combinations of part number and Cl (for LSE), for example.

 

The next thing I would try would be to ditch the Chinese crystal. Find a Nucleo that uses your STM32 family (or any Nucleo really), look at its design files and order a handful of the same part number.

At the very least (If you're forced to match an existing footprint), stick with well-known manufacturers for crystals. There is a list here in AN2867 (page 29).

It would cost you $20 to try 2-3-4 other kinds of crystals. And I rank your success probability as fair.

 

BarryWhit
Senior III

> The new board is using this crystal from JLC

I just realized what this means. So you're using a different crystal (a seemingly obscure Chinese one) from the known-good layout? Which crystal did the old one use? What vendor?

RobertK
Associate III

X1 in your example layout from AN2867 is the LSE 32.768kHz crystal; I'm using the HSE crystal pins, which would be the X2 traces. My traces are 11.3 & 14.3mm long, which is a tiny fraction (well under 1/1,000) of the 8MHz wavelength, so reflections shouldn't be an issue, and it's reasonably well shielded and treated as a diff pair so noise should be pretty minimal. Basically I find it hard to see that the layout could be the problem, especially considering what was in the original design.

Originally (and working in ~350 boards) the crystal was this one here, with the following layout:

RobertK_0-1720700131891.png

RobertK_3-1720700214462.png

 

RobertK_2-1720700198015.png

RobertK_1-1720700159843.png

It's a bit hard to see but the traces are 14.0mm and 28.2mm long. The crystal has a Cload of 18pF, Cshunt 7pF, ESR of 60Ω and has 30pF external load caps. The fact this has been extremely reliable despite almost deliberately breaking every single guidance in AN2867 means I'm very suspicious of claims that having layout exactly like AN2867 matters.

As these boards are being assembled by JLC I've been using their library/stores which has no 'reputable' brands for 3.2x2.5mm 8MHz crystals. The one I chose here does list the Cload in the product listing as 8pF although it took me a while to see it. That is my fault, I didn't choose the best crystal to start with. But even with 8pF plugged into the external load capacitor formula and guessing a Cstray of around 3-6pF I calculated ~6-8pF external load capacitors which should have a working crystal oscillator. But I do not.
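
For reference, this is a sketch of that calculation done the other way round: it solves the load formula CL = C1·C2/(C1+C2) + Cstray for two equal external capacitors, which gives C = 2 × (CL − Cstray). The stray range is the guessed 3-6pF, and the helper name is purely illustrative.

```python
STRAY_GUESSES_PF = (3.0, 6.0)  # ASSUMPTION: guessed stray capacitance, not measured


def external_cap_pf(cl_pf, c_stray_pf):
    """Value of each of two equal load capacitors for a target crystal CL."""
    if c_stray_pf >= cl_pf:
        raise ValueError("stray capacitance already meets or exceeds the target CL")
    return 2.0 * (cl_pf - c_stray_pf)


for label, cl_pf in (("new crystal, CL 8 pF", 8.0), ("original crystal, CL 18 pF", 18.0)):
    values = [external_cap_pf(cl_pf, stray) for stray in STRAY_GUESSES_PF]
    print(f"{label}: external caps between {min(values):.0f} and {max(values):.0f} pF")
```

With those guesses the 8pF crystal wants something between 4pF and 10pF per capacitor, and the 18pF crystal between 24pF and 30pF, so a 6-8pF estimate sits inside the expected window.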

Sorry for the delay in updates, I've been focusing on getting this board through a test run at an EMC test house. It mostly passed, so that's one headache dodged. yay!

Today I've replaced the crystal with this one here. Abracon are reputable, it has the same Cload of 18pF as my working crystal, Cshunt is lower at 2pF and, worryingly, ESR is 500Ω. Unfortunately its gm_crit looks too high for the oscillator to start reliably, and in practice I have found that to be the case.
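
The gm_crit comparison can be sketched with the formula from AN2867, gm_crit = 4 × ESR × (2πF)² × (C0 + CL)², using the figures quoted above for the original crystal and the Abracon part. The oscillator transconductance `ASSUMED_GM_MA_PER_V` is a placeholder assumption, not a value checked against the STM32F334 datasheet, so substitute the real figure. AN2867 asks for a gain margin (gm / gm_crit) above 5.

```python
import math

FREQ_HZ = 8e6
ASSUMED_GM_MA_PER_V = 10.0  # ASSUMPTION: placeholder, take the real gm from the datasheet
MIN_GAIN_MARGIN = 5.0       # minimum gain margin recommended in AN2867


def gm_crit_ma_per_v(esr_ohms, freq_hz, c0_pf, cl_pf):
    """Critical transconductance in mA/V: 4 * ESR * (2*pi*F)^2 * (C0 + CL)^2."""
    c_total_farads = (c0_pf + cl_pf) * 1e-12
    gm_crit = 4.0 * esr_ohms * (2.0 * math.pi * freq_hz) ** 2 * c_total_farads ** 2
    return gm_crit * 1e3  # A/V -> mA/V


# (ESR in ohms, shunt capacitance C0 in pF, load capacitance CL in pF)
crystals = {
    "original": (60.0, 7.0, 18.0),
    "Abracon": (500.0, 2.0, 18.0),
}

for name, (esr, c0, cl) in crystals.items():
    crit = gm_crit_ma_per_v(esr, FREQ_HZ, c0, cl)
    margin = ASSUMED_GM_MA_PER_V / crit
    verdict = "ok" if margin > MIN_GAIN_MARGIN else "too low"
    print(f"{name}: gm_crit = {crit:.3f} mA/V, gain margin = {margin:.1f} ({verdict})")
```

With these inputs gm_crit comes out at about 0.38 mA/V for the original crystal and about 2.0 mA/V for the Abracon part, so the higher ESR costs roughly a factor of five in margin whatever the real gm turns out to be.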

I guess I'm going to have to go back to the large original crystal and figure out how to jam a 12.5x5mm crystal into a 3.2x2.5mm space.

BarryWhit
Senior III

> As these boards are being assembled by JLC I've been using their library/stores which has no 'reputable' brands for 3.2x2.5mm 8MHz crystals.

 

JLC have a "global sourcing" option. If availability were the problem, you could order from any large distributor (DigiKey, etc.) through them.

 

> I guess I'm going to have to go back to the large original crystal and figure out how to jam a 12.5x5mm crystal into a 3.2x2.5mm space.

 

Several others have suggested testing with HSI (the internal oscillator) as the clock source to try and remove the crystal from the picture and see if it is indeed the issue. I've also suggested several times that injecting an external clock and configuring RCC in HSE bypass mode would be useful (but perhaps give no more info than just using HSI). Any particular reason why you haven't done this test?

 

Also, it might be a good idea to contact ST Online Support and ask their advice directly (hopefully they'll do more than point you at the appnote).

BarryWhit
Senior III

@Peter BENSCH , OP is really taking a beating with his board. Can you help?

Peter BENSCH
ST Employee

Oh, the thread is already quite long... thanks for looping me in.

What stands out to me at first glance is the unsuitable layout. @BarryWhit has already inserted the correct excerpts from AN2867, but there is much more to read (between the lines) in AN2867. I have mentioned it several times here in the community (e.g., here, there, and another one), and maybe we should create a Knowledge Base article about it: it is important not to have traces under the crystal, and the crystal should be surrounded by separate GND - underneath and around it, which in turn should be connected to the GND pin of the STM32 in the shortest possible way. This is nicely shown as a white area in the inserted layout example of the AN2867, where the GND around and under the crystal is separated from the rest of the GND.

In the layout of @RobertK, there is a large, continuous GND plane where all sorts of currents can pulse along and disturb the crystal.

It has been asked several times here in the Community whether the oscillators of the STM32 are really that sensitive, because very often only a crystal is taken, two load capacitors are attached, and it works. Yes, maybe on the lab bench and a little in the wild - but guaranteed under all temperature and other environmental conditions?

Unfortunately, I have had to observe over the past decades that the topic of crystals, just like power supply or pin protection (ESD protection, etc.), rarely receives the necessary attention.

Hope that helps?

Regards
/Peter

In order to give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.
LCE
Principal

@Peter BENSCH 

> Unfortunately, I have had to observe over the past decades that the topic of crystals, just like power supply or pin protection (ESD protection, etc.), rarely receives the necessary attention.

 

I generally agree.

I currently have a custom board here with 3 ICs all using the same type of 25 MHz crystal:
STM32H733 in LQFP-144
KSZ8863 (PHY)
LAN9512 (USB-ETH bridge)

I have taken great care with the layout, all 3 ICs are working with that crystal.

BUT checking with the scope at the corresponding crystal IOs of the 3 ICs, I observe that the STM32 "OSC" signals are barely visible on the scope, with only a few 10s of mVpp, whereas the other 2 ICs produce much higher voltage swings with at least a few 100s of mVpp.

I changed the capacitors, and even have a series R in the path (bridged for now); that didn't change the level much.

So I'd say it is also a problem related to the STM's "weak" oscillator circuit, which makes it harder for the user to make it work, and any mistakes are not so easily forgiven as with other ICs.

BarryWhit
Senior III

@Peter BENSCH 

OP is having trouble with HSE (less sensitive to layout than LSE), at room temp (85C not the issue for now), and only during power-up - toggling NRST makes it reliably start. And this issue occurs with multiple boards.

Do you honestly feel that layout issues are a likely root cause (and not just sub-optimal) given that? 
