Unstable SPI Slave

Nikolaj_TL
Associate III

I have a custom board with an STM32WB55 (master) and an STM32F031 (slave). They communicate over SPI using DMA. Unfortunately, there are communication problems.

 

To identify the problem I have simplified the setup. I used an ST NUCLEO-F031K6 and an ST NUCLEO-WB55 board to test the communication (SCLK, MISO and MOSI are used, 250 kHz clock frequency). Both use the SPI examples provided by ST (from STM32Cube_FW_F0_V1.11.5 and STM32Cube_FW_WB_V1.19.0).
The slave is set up to use interrupts instead of DMA to get the simplest setup. The only modification made to the original examples is that the master sends 8-byte messages and the slave echoes what it receives, in order to test the slave device. Approximately 200 bits out of 10kB from the slave are erroneous. The communication is monitored using a logic analyser and the data is analysed with Python.
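
For readers who want to reproduce this kind of measurement, here is a minimal Python sketch of one way to count erroneous bits. It is not the script used in this test. The function names and the sample bytes are made up for illustration, and it assumes the expected and captured byte streams have already been exported from the logic analyser as equal-length sequences.

```python
def count_bit_errors(expected, actual):
    """Count differing bits between two equal-length byte sequences."""
    if len(expected) != len(actual):
        raise ValueError("sequences must have the same length")
    # XOR leaves a 1 in every bit position where the two bytes differ.
    return sum(bin(e ^ a).count("1") for e, a in zip(expected, actual))


def bit_error_rate(expected, actual):
    """Fraction of bits that differ (0.0 for empty input)."""
    total_bits = 8 * len(expected)
    if total_bits == 0:
        return 0.0
    return count_bit_errors(expected, actual) / total_bits


# Hypothetical sample data, not taken from the capture in this thread.
expected = bytes([0x01, 0x02, 0xFF])
actual = bytes([0x01, 0x03, 0x0F])
errors = count_bit_errors(expected, actual)
rate = bit_error_rate(expected, actual)
```

Note that a byte-wise comparison like this reports many bit errors when the stream is merely shifted by one byte, so a high count does not by itself point to electrical noise.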

 

Is this success/failure rate expected? Is this something that you have come across before? Do you have suggestions for solutions?

 

Here is an example of an error from the test:

Nikolaj_TL_0-1723122821437.png

(the slave actually echoes the message received +1, in order to easily compare the messages)

12 REPLIES
TDK
Guru

> Is this success/failure rate expected? Is this something that you have come across before? Do you have suggestions for solutions?

You shouldn't be having random bits be incorrect. Noise can be a problem with long lead lines, if your clock rate is high. Using a CS pin is typically preferred, as it can allow for re-synchronization between master/slave.

> (the slave actually echoes the message received +1, in order to easily compare the messages)

But your output typically shows the same values (e.g. 0x01/0x01), not X/X+1 (0x01/0x02)?

Perhaps zoom in on a transaction that failed within the logic analyzer to see and show more of what's going on.

If you feel a post has answered your question, please click "Accept as Solution".
Saket_Om
ST Employee

Hello @Nikolaj_TL 

The success/failure rate you are experiencing is not typical for reliable SPI communication. Please make sure you have a stable connection between the boards and use short wires.

If your question is answered, please close this topic by clicking "Accept as Solution".

Thanks
Omar
Nikolaj_TL
Associate III

Hello @TDK

I have added CS (NSS) and I still get the same number of errors. I zoomed in on a transaction that failed and see that the slave TxBuffer is shifted oddly, and that this also affects the following transmissions from the slave.

Nikolaj_TL_1-1724765836299.png

In the test setup the master sends x1, x2, x3 ... The slave receives this and updates the TxBuffer: x1+1, x2+1, x3+1 ... Because the slave's response is delayed by one, the messages should always correspond to each other.
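
The relationship described above can be checked automatically on the captured data. The following is a minimal Python sketch, not the actual analysis script. The function name, the `delay` and `offset` parameters, and the sample bytes are illustrative assumptions, and `delay` is counted in whatever unit one list entry represents in the exported capture.

```python
def find_echo_mismatches(mosi, miso, delay=1, offset=1):
    """Return (index, expected, actual) for every MISO entry that is not
    the MOSI entry sent `delay` positions earlier plus `offset` (mod 256)."""
    mismatches = []
    for i in range(delay, min(len(mosi), len(miso))):
        expected = (mosi[i - delay] + offset) & 0xFF
        if miso[i] != expected:
            mismatches.append((i, expected, miso[i]))
    return mismatches


# Hypothetical data: the slave repeats 0x12 instead of sending 0x13.
mosi = [0x10, 0x11, 0x12, 0x13, 0x14]
miso = [0x00, 0x11, 0x12, 0x12, 0x14]
bad = find_echo_mismatches(mosi, miso)
```

The first `delay` entries are skipped because the slave has not received anything to echo yet.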


I have made sure that I have the best possible connection, using good (and new) jumper wires and a simple setup.

Nikolaj_TL_2-1724766051350.png

Do you have any suggestions as to where to debug next?

TDK
Guru

 

 

Probably just a software bug that you need to work through.

The fact that the slave responds with consecutive duplicate bytes suggests maybe it's not fast enough to send data to the SPI before the master starts to read it, so it repeats the previous byte.

Try slowing down the clock rate by a factor of 10. If errors disappear, probably the slave code is too slow.

If you feel a post has answered your question, please click "Accept as Solution".
Nikolaj_TL
Associate III

Hello @TDK,

I have already lowered the baudrate as much as possible. 

Master: The master SYSCLK is 64MHz, the APB1CLK is also 64MHz, and the prescaler is SPI_BAUDRATEPRESCALER_256 (the default in the example is 32). This results in an SPI clock frequency of 250kHz.

Slave: The slave runs at 48MHz and the APB1CLK also runs at 48MHz.
So from what I can tell the slave should have plenty of time to respond to the master.
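
The timing budget implied by these numbers can be worked out with a few lines of Python. This is only a rough sketch: the function name is made up, it assumes 8-bit frames, and it ignores interrupt latency, callback overhead and any other interrupts on the slave.

```python
def spi_timing(master_pclk_hz, prescaler, slave_clk_hz, bits_per_frame=8):
    """Return (SCK frequency in Hz, frame time in microseconds,
    slave clock cycles available per frame)."""
    sck_hz = master_pclk_hz / prescaler
    frame_time_us = bits_per_frame * 1e6 / sck_hz
    slave_cycles_per_frame = bits_per_frame * prescaler * slave_clk_hz / master_pclk_hz
    return sck_hz, frame_time_us, slave_cycles_per_frame


# Values quoted in this post: 64 MHz master clock, prescaler 256, 48 MHz slave.
sck_hz, frame_time_us, cycles = spi_timing(64_000_000, 256, 48_000_000)
```

With these inputs the SPI clock is 250 kHz, one byte takes 32 µs on the wire, and the slave has about 1536 clock cycles per byte.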


I have set up a slave debug pin (Slave_Debug_Pin) that toggles when the slave enters and exits the SPI interrupt callback, in order to investigate the problem further:

Nikolaj_TL_1-1724826827119.png

 

When a failure occurs, it is after a transfer where the interrupt callback is not called successfully:

Nikolaj_TL_2-1724826944147.png

 

In the next transaction after the missing callback, the interrupt callback is called in the middle of the transaction:
Nikolaj_TL_0-1724825845436.png
The same applies to the next few transfers, after which the slave comes back into sync.

TDK
Guru

 

 

 

Looks like a byte is dropped or missed. At least, that would explain the behavior.

In the previous transactions, are all 8 bytes present?

TDK_0-1724853847500.png

Consider sending 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 from the slave for every transaction in order to debug the problem better. You will be able to see exactly where things go wrong.
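
A check for this fixed pattern could look like the following minimal Python sketch. The names and the sample capture are hypothetical, and it assumes the MISO bytes have been exported from the logic analyser as one flat sequence starting at a transaction boundary.

```python
PATTERN = bytes(range(1, 9))  # 0x01 .. 0x08, repeated for every transaction


def check_fixed_pattern(miso, pattern=PATTERN):
    """Return (index, expected, actual) for every byte that deviates
    from the repeating pattern."""
    return [
        (i, pattern[i % len(pattern)], b)
        for i, b in enumerate(miso)
        if b != pattern[i % len(pattern)]
    ]


# Hypothetical capture: one good transaction, then a made-up extra byte
# (0xAA) that shifts everything after it by one position.
capture = bytes([1, 2, 3, 4, 5, 6, 7, 8, 0xAA, 1, 2, 3, 4, 5, 6, 7])
deviations = check_fixed_pattern(capture)
first_bad = deviations[0] if deviations else None
```

The first reported index is the interesting one. Once the stream has shifted, every later byte is flagged as well.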

If you feel a post has answered your question, please click "Accept as Solution".
Nikolaj_TL
Associate III

In the transaction you have marked, all bytes are present:

 

Nikolaj_TL_1-1724909793296.png

 

Nikolaj_TL
Associate III

As suggested I have set the slave to send: 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08.

Here are the results:

Nikolaj_TL_1-1724911683099.png

The pictures and the Logic Analyser file are attached.

Here is a snippet of some of the transactions:

0.png    11.png   22.png 
33.png   44.png   5Nikolaj_TL_2-1724912171324.png ...

TDK
Guru

Okay, so there's a spurious 0x05 appearing between transactions, sometimes. This explains the behavior and why bytes are shifted.

Let's assume the microcontroller is just doing what the code tells it to do, which is a very safe assumption. Where could this 0x05 be coming from? If I had to guess, probably it's the first bytes of the second word you're sending, which means the slave is underflowing. This explanation matches up with the spurious 0xFA in the opening post.

The problem and solution here are likely going to be in the slave code. It looks like the 0x05 is not ready in time. But with 10ms per transaction, that is plenty of time. Can you share the slave code?

 

If you have batches of 8 bytes, it might make more sense to keep CS low during the entire 8 bytes, rather than raising it on each individual byte.
