SPI MISO "broken" after a while

_Daniel_ · ‎2024-09-21

Hello,

I have a Nucleo-F439ZI as SPI master connected to four Nucleo-F030R8 SPI slaves.

Initially, the SPI communication was working perfectly fine. But after a while, the master received only zeros on the MISO line, even though the slaves were sending nonzero data (verified with a logic analyzer).

I first suspected a bug in my code, a wrong SPI configuration, wiring issues and so on. But I soon noticed that SPI1_DR always reads zero, even when the MISO pin (PA6) is tied to 3.3V! The port config looks fine (alternate function 5, push-pull, no pull-up/pull-down). SCLK and MOSI are also working perfectly fine.

So I assumed a broken MCU and replaced the master Nucleo board with a new one, and everything was working fine again. I ran a test, sending/receiving more than 100,000,000 SPI frames successfully without a single error (verified by a software CRC check at the protocol layer).

One day later, I powered up the setup again without any change to HW or SW, and now the second board/MCU was broken. Same issue: SPI1_DR reads only 0, no matter what the actual input is. I tried a third Nucleo board and it was working fine again.

I checked the GPIOA_IDR.IDR6 and it perfectly reflects the logic level applied to the pin, so, it's not a broken port input. I also tried to use another port pin as MISO input (PB4 [AF5]), but still the same. Looks like the input path is broken somewhere after the port multiplexer.

Since the original SW is quite complex, using HAL, SPI with DMA and complex logic, I created a small test project configuring the GPIOs and SPI manually and just sending/receiving SPI data in a while-loop with polling the SPI1_SR.RXNE flag. Same behavior here: working well on a new Nucleo board, not receiving any data on the first two "broken" boards.
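
For readers who want to reproduce such a test, a polling transfer of the kind described might look roughly like the sketch below. It is not the original test project: `spi_regs_t`, `spi_transfer_byte` and the `max_spins` bound are hypothetical names introduced here, and the register block is passed as a pointer so the logic can also be run against a plain struct on a host. Only the SR bit positions (RXNE = bit 0, TXE = bit 1) and the CR1/CR2/SR/DR register order are taken from the STM32F4 reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First four SPI registers in STM32F4 order (hypothetical type name). */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
    volatile uint32_t SR;
    volatile uint32_t DR;
} spi_regs_t;

#define SKETCH_SR_RXNE (1u << 0) /* receive buffer not empty */
#define SKETCH_SR_TXE  (1u << 1) /* transmit buffer empty */

/* Send one byte and wait for the byte clocked in at the same time.
 * Returns false if a flag does not appear within max_spins polls,
 * so a dead peripheral cannot hang the loop. */
bool spi_transfer_byte(spi_regs_t *spi, uint8_t tx, uint8_t *rx,
                       uint32_t max_spins)
{
    assert(spi != NULL && rx != NULL);

    uint32_t spins = max_spins;
    while ((spi->SR & SKETCH_SR_TXE) == 0u) {
        if (spins-- == 0u) {
            return false;
        }
    }
    spi->DR = tx;               /* starts the 8 clock pulses */

    spins = max_spins;
    while ((spi->SR & SKETCH_SR_RXNE) == 0u) {
        if (spins-- == 0u) {
            return false;
        }
    }
    *rx = (uint8_t)spi->DR;     /* reading DR clears RXNE on hardware */
    return true;
}
```

On the target, `spi` would point at the SPI1 register block. When run against a plain struct, DR simply echoes the value written to it, so the function only exercises the flag handling.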

I also could not find any erratum that would explain this behavior.

I should mention that I use a ~45 cm ribbon cable to connect the SPI master to the slaves, which I know is not really state of the art. However, the signals look good on the oscilloscope and logic analyzer. The layout of the ribbon cable is: CS-GND-MOSI-GND-MISO-GND-SCLK-GND.

Another observation: When I change the SPI mode from 0 to 3 (CPOL=1, CPHA=1) and tie the MISO pin to either GND or 3.3V, SPI1_DR reads whatever the initial status of the pin is when the SW is started:

- MISO connected to GND => start SW => DR reads 0 => connect MISO to 3.3V => DR still reads 0

- MISO connected to 3.3V => start SW => DR reads 0xFF => connect MISO to GND => DR still reads 0xFF

("start SW" means pressing "Reset the chip and restart debug session" followed by "Resume" in CubeIDE)

In SPI mode 0, it always reads 0, regardless of the initial state of MISO.

The settings are:

f_PCLK2 = 42MHz
SPI Prescaler 32 => Baudrate = 1.3125MHz
Frame Format = Motorola
Data Size = 8 Bits
CPOL = 0
CPHA = 0
CRC = Disabled
NSS controlled by SW
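
As a quick cross-check of the baud rate above: the STM32F4 SPI clock is f_PCLK / 2^(BR+1), where BR is the 3-bit prescaler field in SPI1_CR1 (prescaler 32 corresponds to BR = 4). The helper name `spi_baud_hz` below is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* SPI clock in Hz for a given peripheral clock and CR1.BR field value.
 * BR = 0 divides by 2, BR = 7 divides by 256. */
uint32_t spi_baud_hz(uint32_t pclk_hz, uint8_t br_field)
{
    assert(br_field <= 7u);
    return pclk_hz >> (br_field + 1u);
}
```

With f_PCLK2 = 42 MHz and BR = 4 this gives 1,312,500 Hz, which matches the 1.3125 MHz stated above and corresponds to a bit period of roughly 762 ns.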

GPIOA Config:

MODER6 = 0x2
OTYPER6 = 0x0
OSPEEDR6 = 0x3
PUPDR6 = 0x0
AFRL6 = 0x5
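
To make the field values above easier to compare against raw register dumps, the sketch below shifts them to their positions for a given pin: 2 bits per pin in MODER, OSPEEDR and PUPDR, 1 bit per pin in OTYPER, and 4 bits per pin in AFRL (pins 0 to 7). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-register contribution of one pin's configuration. */
typedef struct {
    uint32_t moder;
    uint32_t otyper;
    uint32_t ospeedr;
    uint32_t pupdr;
    uint32_t afrl;
} gpio_fields_t;

/* Shift each field value to its bit position for `pin` (0..7). */
gpio_fields_t gpio_pin_fields(unsigned pin, uint32_t mode, uint32_t otype,
                              uint32_t speed, uint32_t pull, uint32_t af)
{
    assert(pin <= 7u); /* AFRL only covers pins 0..7 */

    gpio_fields_t f;
    f.moder   = (mode  & 0x3u) << (pin * 2u);
    f.otyper  = (otype & 0x1u) << pin;
    f.ospeedr = (speed & 0x3u) << (pin * 2u);
    f.pupdr   = (pull  & 0x3u) << (pin * 2u);
    f.afrl    = (af    & 0xFu) << (pin * 4u);
    return f;
}
```

For PA6 with the values listed (mode 2, speed 3, AF 5), the expected contributions are 0x00002000 in MODER, 0x00003000 in OSPEEDR and 0x05000000 in AFRL.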

SPI1_CR1 Config:

BIDIMODE = 0x0
BIDIOE = 0x0
CRCEN = 0x0
CRCNEXT = 0x0
DFF = 0x0
RXONLY = 0x0
SSM = 0x1
SSI = 0x1
LSBFIRST = 0x0
SPE = 0x1
BR = 0x4
MSTR = 0x1
CPOL = 0x0
CPHA = 0x0

SPI1_CR2 = 0x0, SPI1_I2SCFGR = 0x0, SPI1_I2SPR = 0x0
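
For comparison with a raw register read, the CR1 fields above can be packed into a single value using the STM32F4 SPI_CR1 bit positions (CPHA = bit 0 up to BIDIMODE = bit 15). `spi_cr1_fields_t` and `spi_cr1_pack` are names invented for this sketch, not part of any ST header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One member per SPI_CR1 bit field (hypothetical helper type). */
typedef struct {
    unsigned bidimode, bidioe, crcen, crcnext, dff, rxonly;
    unsigned ssm, ssi, lsbfirst, spe, br, mstr, cpol, cpha;
} spi_cr1_fields_t;

/* Pack the fields into the 16-bit CR1 layout. */
uint16_t spi_cr1_pack(const spi_cr1_fields_t *f)
{
    assert(f != NULL);
    return (uint16_t)(((f->bidimode & 1u) << 15) |
                      ((f->bidioe   & 1u) << 14) |
                      ((f->crcen    & 1u) << 13) |
                      ((f->crcnext  & 1u) << 12) |
                      ((f->dff      & 1u) << 11) |
                      ((f->rxonly   & 1u) << 10) |
                      ((f->ssm      & 1u) << 9)  |
                      ((f->ssi      & 1u) << 8)  |
                      ((f->lsbfirst & 1u) << 7)  |
                      ((f->spe      & 1u) << 6)  |
                      ((f->br       & 7u) << 3)  |
                      ((f->mstr     & 1u) << 2)  |
                      ((f->cpol     & 1u) << 1)  |
                      ((f->cpha     & 1u) << 0));
}
```

With the values listed (SSM, SSI, SPE and MSTR set, BR = 4), SPI1_CR1 should read 0x0364 in mode 0 and 0x0367 in mode 3.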

The board is powered by a stable laboratory power supply with 8V on VIN (JP1 = OFF, JP3 = 5-6).

The boards were purchased from DigiKey, so they are not counterfeits.

A similar issue was observed here:

https://www.mikrocontroller.net/topic/549031

(unfortunately in German, and with no solution except replacing the MCU)

and here:

https://community.st.com/t5/stm32-mcus-products/stm32f205-spi-data-register-always-zero-as-master/m-p/479853#M166224

(but also here, the solution was just to replace the chip)

What could be the reason for the broken MISO input?

Is there any recommended external circuit to protect the input?

If it had happened only once, I would just replace the chip and not bother any further. But since it happened twice in a week, I have to find the root cause and a solution...

PGump.1 · ‎2024-09-23

It is true that ESD can trigger Latch-up. However, Latch-up has its own properties.

Overvoltage? What is overvoltage? In the case of Latch-up, overvoltage is proportional (although, not linear) to VCC. During the Power On cycle of VCC, Latch-up immunity goes from Weak to Strong.

And no doubt too weak for this MISO pin...

Protection devices that offer a fixed voltage threshold often don't help.

I hope this helps someone.

Kind regards
Pedro

AI = Artificial Intelligence, NI = No Intelligence, RI = Real Intelligence.

_Daniel_ · ‎2024-09-24

@PGump.1, thanks a lot for your comments. I am keeping latch-up in mind.

Actually, the pure ESD theory makes less and less sense... the maximum voltage measured at the MISO pin (~3.7V) is always way below the maximum rating for five-volt-tolerant port pins (VDD + 4V).

Also, it looks like the internal protection diodes of the port pin are still working properly: I connected a power supply with a 3mA current limit to the MISO pin of a damaged board and slowly increased / decreased the voltage beyond the max./min. voltage ratings. When coming close to the limits (6.8V for max. / -0.2V for min.), the current slowly increases => the internal diodes start conducting!?

If an ESD event had damaged the MISO input, shouldn't it also have damaged the diodes?

Regarding latch-up, I found the following recommendations:

- Power-on sequence => with SPI, both master and slave have inputs, so I cannot think of a meaningful power-on sequence...
- Changing the connection setup => would be the very last option, not easy to achieve
- Adding series resistors to the data signals to limit the current => that's an option
- Adding suppressor diodes => also an option, but most probably not effective during power ramp-up
- Adding series resistors to the power supply lines => maybe an option, but unnecessary power dissipation

Something that might be worth mentioning with respect to the power-on sequence: since the Nucleo board is missing the external crystal (X3) and is instead clocked from the MCO of the embedded ST-Link, I had an issue where the MCO clock comes very late (~2s after power-on) when no USB cable is connected to the ST-Link. So I increased the HSE startup timeout to 3000ms. During this time, the SPI lines might be floating and the SPI cell not yet clocked / powered...

PGump.1 · ‎2024-09-24

@_Daniel_ wrote:
I have a Nucleo-F439ZI as SPI master connected to four Nucleo-F030R8 SPI slaves.

Each of these 5 boards has its own VCC. How do you keep the VCCs in sync?

Kind regards
Pedro


unsigned_char_array · ‎2024-09-24

@_Daniel_ wrote:
Initially, the SPI communication was working perfectly fine. But after a while, the master received only zeros on MISO line even though

Can you define "after a while"? Did it happen while it was running? If so, how long did it take? If not, how many power cycles did it take before the problem occurred?

If it occurred after a power cycle, perhaps some registers are not properly cleared by the reset, or some clock domains are not properly synced. Sometimes STM32 MCUs require some extra resetting of peripherals or multiple attempts at initialization before they function correctly. If it occurred while it was running, I have no explanation.

Do you have a schematic of your setup? I recommend series termination at the output pins, i.e. a series resistor at the master's MOSI pin and a series resistor at each slave's MISO pin. If the power drops, the short-circuit current is lower, and at high SPI frequencies there are fewer reflections.
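
To get a feel for the numbers, the sketch below estimates the worst-case current through a series resistor and the RC time constant it adds to the line. The 330 Ohm and 100 pF figures used in the note after the code are purely illustrative assumptions, not measurements from this setup, and the function names are hypothetical. The voltage drop of any clamp diode is ignored, which makes the current estimate conservative.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case current in microamps through a series resistor when the
 * full voltage difference v_mv (millivolts) appears across it. */
uint32_t series_current_ua(uint32_t v_mv, uint32_t r_ohm)
{
    assert(r_ohm > 0u);
    return (v_mv * 1000u) / r_ohm;
}

/* RC time constant in picoseconds: ohms times picofarads gives ps. */
uint32_t rc_tau_ps(uint32_t r_ohm, uint32_t c_pf)
{
    return r_ohm * c_pf;
}
```

For example, 3.3V across 330 Ohm limits the current to 10 mA, and 330 Ohm into an assumed 100 pF of cable and pin capacitance gives a time constant of 33 ns, which is small compared with the roughly 762 ns bit period at 1.3125 MHz. The acceptable injected current per pin has to be taken from the datasheet.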

Kudo posts if you have the same problem and kudo replies if the solution works.
Click "Accept as Solution" if a reply solved your problem. If no solution was posted please answer with your own.

_Daniel_ · ‎2024-09-24

Hmm... All boards are supplied from the same V_IN via the ribbon cable (the same cable that also carries the SPI data lines; 4 wires in parallel for a bigger cross-section). The VCCs are generated separately by the onboard voltage regulators and are not necessarily 100% in sync...

(VCC here is connected to V_IN of the Nucleo boards)

_Daniel_ · ‎2024-09-24

@unsigned_char_array wrote:
Can you define "after a while"? Did it happen while it was running? If so how long did it take? If not how many power cycles did it take before the problem occurred?

Not exactly. I have the feeling that it happened during a power cycle, but unfortunately I cannot say for sure. It took around 10 power cycles (estimated).

I tried resetting SPI1 via APB2RSTR.SPI1RST, but it didn't solve the problem. It looks like the MISO input is permanently damaged.

There is no additional circuitry on the SPI lines: a 1:4 direct connection of SCLK, MOSI and MISO without termination. Only for CS are there some logic gates to make the slave address configurable via DIP switch.

PGump.1 · ‎2024-09-24

The answer is that the VCCs are NOT sync'ed. Using the laws of current flow, the boards will attempt equilibrium by pushing and pulling current through your IO lines. Unless you have made allowances for this in your design, this WILL put undue stress on the IO lines. Is it enough to over-stress - that is a matter of luck!!!

It is unfortunate that ST doesn't publish enough on this subject to allow for informed choices.

However, the subject is common to CMOS fabricators. You will find that sites like www.ti.com, www.nxp.com, and others are much more informative on the subject of Latch-up...

I hope this helps someone.

Kind regards
Pedro


unsigned_char_array · ‎2024-09-25

You want to prevent phantom powering your MCUs and prevent latch-up.

I have some ideas:

- Delay in master to wait before driving the SCLK, MOSI and CSN pins. This is not a guarantee as you don't know if and how long slaves take to power on, but it gives time for the VCC of the slaves to stabilize before sending anything. It won't help with powering off.
- Pullups on CSN on slaves. Prevents slaves from thinking they are selected before they are connected to master or before master.
- Series resistors. Limits current. Can also help limit reflections and limit crosstalk.
- Enable brown out in your MCU. Powers off the MCU if VCC gets too low. This stops any pin from driving when you power everything off. Might help when powering everything off.
- External clamping diodes, often come in a package specifically designed for a particular interface (such as USB or I2C).
- True open drain interface. Puts limits on speed and requires additional circuitry.
- Switch to an interface designed to go off board such as RS485.
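
The first idea (hold the bus pins back until the slaves have had time to power up) could be sketched as below. This is only an illustration under assumptions: the hold-off time is a free parameter, the pin assignment PA5/PA6/PA7 for SCLK/MISO/MOSI is the common SPI1 mapping and only PA6 is confirmed in this thread, and all names are hypothetical. The functions only compute values; writing the result to GPIOA->MODER is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { BUS_HIGH_Z, BUS_ACTIVE } bus_state_t;

/* Keep the bus released until hold_off_ms have passed since power-on.
 * Unsigned subtraction keeps this correct across tick counter wrap. */
bus_state_t bus_state_for_uptime(uint32_t now_ms, uint32_t power_on_ms,
                                 uint32_t hold_off_ms)
{
    return ((uint32_t)(now_ms - power_on_ms) >= hold_off_ms) ? BUS_ACTIVE
                                                             : BUS_HIGH_Z;
}

/* New MODER value for pins 5..7: input (00) while released,
 * alternate function (10) once the bus may be driven. */
uint32_t moder_for_spi_pins(uint32_t moder, bus_state_t state)
{
    assert(state == BUS_HIGH_Z || state == BUS_ACTIVE);

    moder &= ~(0x3Fu << 10); /* clear the mode fields of pins 5, 6, 7 */
    if (state == BUS_ACTIVE) {
        moder |= (0x2u << 10) | (0x2u << 12) | (0x2u << 14);
    }
    return moder;
}
```

As noted in the list, a fixed delay is no guarantee, because the master cannot know how long the slaves actually need.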


_Daniel_ · ‎2024-09-25

@PGump.1 wrote:
The answer is that the VCCs are NOT sync'ed. Using the laws of current flow, the boards will attempt equilibrium by pushing and pulling current through your IO lines. Unless you have made allowances for this in your design, this WILL put undue stress on the IO lines. Is it enough to over-stress - that is a matter of luck!!

Good point!

@unsigned_char_array wrote:
You want to prevent phantom powering your MCUs and prevent latch-up.
I have some ideas:

Thanks, I will pick up some of them.

Thanks to all for all the valuable comments!

For now, I will create a new piggy-back PCB for the master eval board, with series resistors, pull-downs, TVS diodes and the possibility to provide an external 3V3 supply. I will let you know in 1-2 weeks whether it helped :)

(This is just a prototype setup for a proof of concept; it would be sufficient if it lasts for a few months. The remaining comments will go into the final design.)

Daniel