SPI MISO "broken" after a while

_Daniel_ · ‎2024-09-21

Hello,

I have a Nucleo-F439ZI as SPI master connected to four Nucleo-F030R8 SPI slaves.

Initially, the SPI communication was working perfectly fine. But after a while, the master received only zeros on the MISO line, even though the slaves were sending data != 0 (verified by logic analyzer).

I first suspected a bug in my code, a wrong SPI configuration, wiring issues and so on. But I soon noticed that SPI1_DR always reads zero, even when the MISO pin (PA6) is tied to 3.3V! The port config looks fine (alternate function 5, push-pull, no pull-up/pull-down). Also, SCLK and MOSI are working perfectly fine.

So I assumed a broken MCU and replaced the master Nucleo board with a new one, and everything was working fine again. I ran a test, sending/receiving more than 100,000,000 SPI frames successfully without a single error (verified by a SW CRC check on the protocol layer).

One day later, I powered up the setup again without any change to HW or SW, and now the second board/MCU was broken. Same issue: SPI1_DR reads only 0, no matter what the actual input is. I tried a third Nucleo board and it was working fine again.

I checked GPIOA_IDR.IDR6 and it perfectly reflects the logic level applied to the pin, so it's not a broken port input. I also tried another port pin as MISO input (PB4 [AF5]), but the result is the same. It looks like the input path is broken somewhere after the port multiplexer.

Since the original SW is quite complex (HAL, SPI with DMA, and complex logic), I created a small test project that configures the GPIOs and SPI manually and just sends/receives SPI data in a while-loop, polling the SPI1_SR.RXNE flag. Same behavior here: working well on a new Nucleo board, not receiving any data on the first two "broken" boards.
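
For readers who want to reproduce such a test, a polled single-byte transfer could look roughly like the sketch below. This is not the original test code: `spi_regs_t` is a hypothetical stand-in for the first four SPI registers so the function can also be exercised on a host, and the `max_polls` timeout is an added assumption (the test described above simply polls in a loop).

```c
#include <stdint.h>

/* Hypothetical stand-in for the first SPI registers (offsets 0x00..0x0C).
 * On the target, a pointer to the real peripheral block would be passed. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
    volatile uint32_t SR;
    volatile uint32_t DR;
} spi_regs_t;

#define SPI_SR_RXNE_BIT (1u << 0)  /* receive buffer not empty */
#define SPI_SR_TXE_BIT  (1u << 1)  /* transmit buffer empty */

/* Sends one byte and reads back the byte clocked in on MISO.
 * Returns 0 on success, -1 if a flag was not set within max_polls reads. */
int spi_transfer_byte(spi_regs_t *spi, uint8_t tx, uint8_t *rx,
                      uint32_t max_polls)
{
    uint32_t n;

    /* Wait until the transmit buffer can take a new byte. */
    for (n = 0; (spi->SR & SPI_SR_TXE_BIT) == 0u; n++) {
        if (n >= max_polls) {
            return -1;
        }
    }
    spi->DR = tx;  /* writing DR starts the 8-bit frame */

    /* Wait until the received byte is available. */
    for (n = 0; (spi->SR & SPI_SR_RXNE_BIT) == 0u; n++) {
        if (n >= max_polls) {
            return -1;
        }
    }
    *rx = (uint8_t)spi->DR;  /* on hardware, reading DR clears RXNE */
    return 0;
}
```

On a "broken" board as described here, such a function would return 0 (RXNE is set as expected) while `*rx` stays 0x00 regardless of the level on MISO.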

Also I could not find any errata that would explain this behavior.

I have to mention that I use a ~45cm ribbon cable to connect the SPI master to the slaves, which I know is not really state-of-the-art. However, the signals look good on the oscilloscope and logic analyzer. The layout of the ribbon cable is: CS-GND-MOSI-GND-MISO-GND-SCLK-GND.

Another observation: when I change the SPI mode from 0 to 3 (CPOL=1, CPHA=1) and tie the MISO pin to either GND or 3.3V, SPI1_DR reads whatever state the pin was in when the SW was started:

- MISO connected to GND => start SW => DR reads 0 => connect MISO to 3.3V => DR still reads 0

- MISO connected to 3.3V => start SW => DR reads 0xFF => connect MISO to GND => DR still reads 0xFF

("start SW" means pressing "Reset the chip and restart debug session" followed by "Resume" in CubeIDE)

In SPI mode 0, it always reads 0, regardless of the initial state of MISO.

The settings are:

f_PCLK2 = 42MHz
SPI Prescaler 32 => Baudrate = 1.3125MHz
Frame Format = Motorola
Data Size = 8 Bits
CPOL = 0
CPHA = 0
CRC = Disabled
NSS controlled by SW
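
As a quick cross-check of the baud rate above, the SPI clock follows from the peripheral clock and the 3-bit BR field, with a divider of 2^(BR+1). The helper name `spi_baud_hz` is made up for this illustration.

```c
#include <stdint.h>

/* SPI clock for a given peripheral clock and CR1.BR field (0..7).
 * The divider is 2^(BR+1): BR=0 -> /2, BR=4 -> /32, BR=7 -> /256. */
uint32_t spi_baud_hz(uint32_t pclk_hz, uint32_t br_field)
{
    return pclk_hz >> ((br_field & 0x7u) + 1u);
}
```

With f_PCLK2 = 42 MHz and BR = 0x4, this gives 1,312,500 Hz, matching the 1.3125MHz stated above.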

GPIOA Config:

MODER6 = 0x2
OTYPER6 = 0x0
OSPEEDR6 = 0x3
PUPDR6 = 0x0
AFRL6 = 0x5
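
To make the field values above concrete, the sketch below shows where they land inside the 32-bit GPIO registers for pin 6, assuming the usual STM32F4 layout (2 bits per pin in MODER/OSPEEDR/PUPDR, 4 bits per pin in AFRL). The helper names are hypothetical.

```c
#include <stdint.h>

/* Returns reg with the width-bit field at bit position pos replaced by value.
 * Assumes width < 32. */
uint32_t set_field(uint32_t reg, unsigned pos, unsigned width, uint32_t value)
{
    uint32_t mask = ((1u << width) - 1u) << pos;
    return (reg & ~mask) | ((value << pos) & mask);
}

enum { MISO_PIN = 6 };

/* MODER6 = 0x2: alternate function mode, bits 13:12. */
uint32_t pa6_moder(uint32_t reg)
{
    return set_field(reg, MISO_PIN * 2u, 2u, 0x2u);
}

/* OSPEEDR6 = 0x3: highest speed setting, bits 13:12. */
uint32_t pa6_ospeedr(uint32_t reg)
{
    return set_field(reg, MISO_PIN * 2u, 2u, 0x3u);
}

/* AFRL6 = 0x5: alternate function 5, bits 27:24. */
uint32_t pa6_afrl(uint32_t reg)
{
    return set_field(reg, MISO_PIN * 4u, 4u, 0x5u);
}
```

Starting from an all-zero register, these produce 0x00002000, 0x00003000 and 0x05000000 respectively, which can be compared against a raw register dump from the debugger.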

SPI1_CR1 Config:

BIDIMODE = 0x0
BIDIOE = 0x0
CRCEN = 0x0
CRCNEXT = 0x0
DFF = 0x0
RXONLY = 0x0
SSM = 0x1
SSI = 0x1
LSBFIRST = 0x0
SPE = 0x1
BR = 0x4
MSTR = 0x1
CPOL = 0x0
CPHA = 0x0

SPI1_CR2 = 0x0, SPI1_I2SCFGR = 0x0, SPI1_I2SPR = 0x0
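
For comparison with a raw register read, the CR1 fields listed above can be packed into a single value. The sketch assumes the SPI_CR1 bit layout from the STM32F4 reference manual (CPHA at bit 0 up to BIDIMODE at bit 15); the function name is hypothetical, and only the non-zero fields from the list are set.

```c
#include <stdint.h>

/* Bit positions within SPI_CR1 (assumed STM32F4 layout). */
enum {
    CR1_CPHA_POS = 0,
    CR1_CPOL_POS = 1,
    CR1_MSTR_POS = 2,
    CR1_BR_POS   = 3,   /* 3-bit field, bits 5:3 */
    CR1_SPE_POS  = 6,
    CR1_SSI_POS  = 8,
    CR1_SSM_POS  = 9
};

/* Packs the master configuration from the list above:
 * SSM=1, SSI=1, SPE=1, BR=4, MSTR=1, all other fields 0,
 * with CPOL/CPHA passed in to select the SPI mode. */
uint16_t spi_cr1_master_value(unsigned cpol, unsigned cpha)
{
    uint16_t v = 0u;
    v |= (uint16_t)(1u << CR1_SSM_POS);
    v |= (uint16_t)(1u << CR1_SSI_POS);
    v |= (uint16_t)(1u << CR1_SPE_POS);
    v |= (uint16_t)(4u << CR1_BR_POS);
    v |= (uint16_t)(1u << CR1_MSTR_POS);
    v |= (uint16_t)((cpol & 1u) << CR1_CPOL_POS);
    v |= (uint16_t)((cpha & 1u) << CR1_CPHA_POS);
    return v;
}
```

Under these assumptions, mode 0 corresponds to SPI1_CR1 = 0x0364 and mode 3 to 0x0367.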

The board is powered by a stable laboratory power supply with 8V on VIN (JP1 = OFF, JP3 = 5-6).

The boards were purchased from Digi-Key, so they are not counterfeits.

A similar issue was observed here:

https://www.mikrocontroller.net/topic/549031

(unfortunately in German, and with no solution other than replacing the MCU)

and here:

https://community.st.com/t5/stm32-mcus-products/stm32f205-spi-data-register-always-zero-as-master/m-p/479853#M166224

(but also here, the solution was just to replace the chip)

What could be the reason for the broken MISO input?

Is there any recommendation for an external circuit to protect the input?

If it happened only once, I would just replace the chip and not bother anymore. But if it happens twice in a week, I'll have to find the root cause and a solution.

PGump.1 · ‎2024-09-22

Sounds like you are the victim of LATCH-UP. You need to change the hardware design of the connection and/or the power-on sequence. You will just keep breaking boards until you fix it...

Kind regards
Pedro

AI = Artificial Intelligence, NI = No Intelligence, RI = Real Intelligence.

LCE · ‎2024-09-22

So it looks like only the "alternate function path" was damaged. I'd be interested to know how that can happen.

Anyway, whenever you connect anything from outside the board, always use at least some ESD protection.
For high-speed SPI you can use the low-capacitance TVS diodes used for USB or other "user interfaces".
And do this for every pin, including the data lines!

waclawek.jan · ‎2024-09-22

Does SCLK work properly on the damaged boards?

JW

_Daniel_ · ‎2024-09-23

Yes, if it is caused by external ESD, I would have expected the Schmitt trigger of the port to break, but not something "deep inside the chip"...

I'll try to add some TVS diodes, thanks!

_Daniel_ · ‎2024-09-23

SCLK (and also MOSI) still work properly on the damaged boards.

Also, the RXNE flag is set as expected; the only problem is that DR always reads 0x00.

waclawek.jan · ‎2024-09-23

> SCLK (and also MOSI) still work properly on the damaged boards

Is SCLK=PA5?

Can you try to move it to PB3? (there may be some solder bridge on the Nucleo board on that pin, as it's TRACESWO).

JW

PGump.1 · ‎2024-09-23

That is definitely Latch-up.

@_Daniel_ you need to do some reading on how to design hardware to avoid Latch-up.

Good luck.

Kind regards
Pedro

PGump.1 · ‎2024-09-23

Also, don't fall into the trap where you think that ESD and Latch-up are the same thing - they are NOT!

Kind regards
Pedro

LCE · ‎2024-09-23

As far as I understand, latch-up is a fault caused by overvoltage, possibly due to ESD.

@PGump.1 so how would latch-up protection differ from standard ESD protection with TVS diodes (plus whatever else the circuit needs: series resistors, inductors, common-mode chokes, ...)?