STM32F031 SPI slave BSY flag occasionally does not reset

henk23 · ‎2016-05-09

Posted on May 09, 2016 at 12:53

In our project we have two boards communicating with each other over SPI. A controller board based on an STM32F405 is the SPI master for a sensor board that exists in two variants: one based on an STM32F071, the other on an STM32F031. They share the same SPI slave code (apart from pin assignment and peripheral number), but we only see the problem on the 031.

Once every ~10000 transactions, the SPI peripheral will not reset the BSY bit, even though it is obvious that it should. I have conducted a number of experiments which leave me slightly baffled. I have observed/tried the following:

The slave receives the message exactly as intended. I have explicitly tested this with the last bit being 0, being 1, equal to the second-to-last bit, and different from the second-to-last bit.
When it occurs, TXE is 1 and DMA is complete.
Slowing down SPI does not make a difference.
Making the master clock out extra bytes does not change the situation. In fact, it gives results that baffle me even more.

This picture shows a situation where the problem occurs. The DMA buffer scheduled on the slave is all 0xAA except the last 7 bytes, which count down 0x61, 0x51, 0x41, etc. The master clocks five extra bytes. Instead of repeating the last byte, the slave digs further back into history and repeats the 0x31, three bytes before the end of the DMA buffer.

(Image obviously does not show SPI/DMA status, but when the problem occurs the sensor board prints the status of TXE, BSY and DMA to a terminal and the controller board generates a trigger at the next transaction)

To summarize:

Happens on 031 but not on 071
TXE is 1, DMA is complete, BSY will not reset
No indication that an SCK pulse was missed

What is going on here?

#stm32f031 #spi #slave

henk23 · ‎2016-05-31

Posted on May 31, 2016 at 15:30

For anyone running into this problem and finding this thread through google:

I have had contact with ST support, made it possible for them to reproduce the problem using a nucleo board and a discovery board, and just now received the verdict: it is a known HW bug that is not (yet) in the errata list.

They suggest a workaround keeping track of RXNE (as that is reliable as opposed to BSY). As I am using DMA to a circular buffer for reception, I cannot use the flag, but I can use the amount of unconsumed data in the DMA buffer. I am not yet in a hurry to implement it because the workaround with nSS going high (marking the end of transfer according to the master) also works, even though I cannot distinguish between the HW bug and a real glitch on the SPI lines this way.
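A minimal sketch of the "unconsumed data in the DMA buffer" idea mentioned above, assuming a circular reception buffer and a DMA remaining-transfer counter that counts down from the buffer size (CNDTR on the STM32F0). The function names, the read index, and the expected transaction length are hypothetical and only illustrate the arithmetic; this is not the poster's actual code.

```c
#include <stdint.h>

/* Hypothetical helper: number of received bytes the application has not
 * yet consumed from a circular DMA reception buffer.
 *
 * buf_size      - total size of the circular buffer in bytes
 * dma_remaining - DMA remaining-transfer counter (assumed to count down
 *                 from buf_size and reload in circular mode)
 * read_idx      - index of the next byte the application will read
 */
static uint32_t rx_unconsumed(uint32_t buf_size, uint32_t dma_remaining,
                              uint32_t read_idx)
{
    /* Index at which the DMA will store the next received byte. */
    uint32_t write_idx = (buf_size - dma_remaining) % buf_size;

    /* Distance from read position to write position, with wrap-around. */
    return (write_idx + buf_size - read_idx) % buf_size;
}

/* Hypothetical end-of-transaction test that does not look at BSY:
 * the transaction is complete once expected_len bytes are waiting. */
static int rx_transaction_done(uint32_t buf_size, uint32_t dma_remaining,
                               uint32_t read_idx, uint32_t expected_len)
{
    return rx_unconsumed(buf_size, dma_remaining, read_idx) >= expected_len;
}
```

Note that with this arithmetic a completely full buffer is indistinguishable from an empty one, so the buffer should be larger than the longest expected transaction.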

banyaszg · ‎2016-06-17

Posted on June 17, 2016 at 10:30

I ran into the same problem. Unfortunately I didn't find your post until I pinpointed the source of my problem.

In my case I have an STM32F030-based board connected as an SPI slave to a Cortex-A based host. My communication protocol has a receiving part followed by a transmitting part. The error mainly occurs at the end of the transmission, even though the TXE flag indicates a successful transmission. The error is more frequent for me than for you, and it is present in both IT and DMA mode.

In my project we use the HAL driver. With HAL (v1.3.1) this problem is much more severe. At the end of an SPI reception or transmission, the implementation waits for the BSY flag (in SPI_EndRxTransaction() and SPI_EndRxTxTransaction()). These wait functions are called from the interrupt handler (SPI or DMA interrupt, depending on the chosen communication mode).

To make matters worse, STM32Cube assigns priority 0 to these interrupts, so waiting for a stuck flag blocks interrupt handling. There is a timeout in the wait code, but it relies on SysTick (also priority 0), which cannot run while the handler is waiting. So in practice only the next SPI communication unblocks the system.

Originally I noticed data loss in the SPI communication and anomalies in interrupt handling. These were my main issues, and my investigation led me to the BSY flag and the HAL error.

As a fix, I disabled the BSY wait in SPI slave mode in the upper-level functions. Since then the communication seems to be stable.
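For cases where a flag wait has to stay inside an interrupt handler, here is a minimal sketch of a wait bounded by a poll count instead of a tick timer, so that it cannot hang when SysTick is unable to preempt. The helper name and parameters are hypothetical and this is not HAL code; the mask 0x80 used in the description corresponds to the BSY bit (bit 7) of SPI_SR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the HAL: poll a status register until
 * all bits in mask read 0, giving up after max_polls reads.
 * The bound is a plain loop count, so it does not depend on SysTick or
 * any other interrupt and still expires inside a priority-0 handler. */
static bool wait_flag_clear(const volatile uint32_t *status_reg,
                            uint32_t mask, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((*status_reg & mask) == 0u) {
            return true;   /* flag cleared within the bound */
        }
    }
    return false;          /* flag still set: treat it as stuck */
}
```

The poll count would have to be chosen from the CPU clock and the longest expected frame time; a false return then means "assume the transfer ended" rather than an error, since BSY is known to stick.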

LoriB · ‎2018-11-09

I am facing the exact same issue on an STM32F779.

I found that it is now listed in the device errata document:

BSY bit may stay high at the end of a data transfer in Slave mode

Description

The BSY flag may sporadically remain high at the end of a data transfer in slave mode. This occurs upon coincidence of internal CPU clock and external SCK clock provided by master.

In such an event, if the software only relies on BSY flag to detect the end of SPI slave data transaction (for example to enter low-power mode or to change data line direction in half-duplex bidirectional mode), the detection fails.

As a conclusion, the BSY flag is unreliable for detecting the end of data transactions.

Workaround

Depending on SPI operating mode, use the following means for detecting the end of transaction:

• When NSS hardware management is applied and NSS signal is provided by master, use NSS flag.

• In SPI receiving mode, use the corresponding RXNE event flag.

• In SPI transmit-only mode, use the BSY flag in conjunction with a timeout expiry event. Set the timeout such as to exceed the expected duration of the last data frame and start it upon TXE event that occurs with the second bit of the last data frame. The end of the transaction corresponds to either the BSY flag becoming low or the timeout expiry, whichever happens first.

Prefer one of the first two measures to the third as they are simpler and less constraining.

Alternatively, apply the following sequence to ensure reliable operation of the BSY flag in SPI transmit mode:

1. Write last data to data register

2. Poll the TXE flag until it becomes high, which occurs with the second bit of the data frame transfer

3. Disable SPI by clearing the SPE bit mandatorily before the end of the frame transfer

4. Poll the BSY bit until it becomes low, which signals the end of transfer

Note: The alternative method can only be used with relatively fast CPU speeds versus relatively slow SPI clocks or/and long last data frames. The faster is the software execution, the shorter can be the duration of the last data frame.
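The transmit-only workaround quoted above ("BSY low or timeout expiry, whichever happens first") could be sketched roughly as follows. This assumes a free-running 32-bit tick counter is available; all names here are hypothetical, and the one-tick margin in the timeout calculation is an arbitrary choice rather than a value from the errata.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TX_PENDING, TX_DONE_BSY_LOW, TX_DONE_TIMEOUT } tx_end_t;

/* Hypothetical helper: timeout in timer ticks that exceeds the duration
 * of one data frame of frame_bits bits clocked at sck_hz, measured with
 * a timer running at tick_hz. Rounds up and adds one tick of margin. */
static uint32_t frame_timeout_ticks(uint32_t frame_bits, uint32_t sck_hz,
                                    uint32_t tick_hz)
{
    uint64_t ticks = ((uint64_t)frame_bits * tick_hz + sck_hz - 1u) / sck_hz;
    return (uint32_t)ticks + 1u;
}

/* Hypothetical helper: decide whether the transaction has ended.
 * start_tick is the tick captured at the TXE event of the last frame.
 * Unsigned subtraction keeps the comparison correct across counter
 * wrap-around. */
static tx_end_t tx_end_check(bool bsy_set, uint32_t start_tick,
                             uint32_t now_tick, uint32_t timeout_ticks)
{
    if (!bsy_set) {
        return TX_DONE_BSY_LOW;     /* BSY went low: normal end */
    }
    if ((uint32_t)(now_tick - start_tick) >= timeout_ticks) {
        return TX_DONE_TIMEOUT;     /* BSY stuck: timeout marks the end */
    }
    return TX_PENDING;              /* last frame may still be shifting */
}
```

The caller would capture start_tick at the TXE event of the last frame and then call tx_end_check repeatedly until it returns something other than TX_PENDING.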

Using the HAL, and since I rely on hardware NSS, I tried removing the BSY flag check from the SPI_EndRxTransaction function.

With this change the problem seems to be gone.

static HAL_StatusTypeDef SPI_EndRxTransaction(SPI_HandleTypeDef *hspi,  uint32_t Timeout, uint32_t Tickstart)
{
  if ((hspi->Init.Mode == SPI_MODE_MASTER) && ((hspi->Init.Direction == SPI_DIRECTION_1LINE)
                                               || (hspi->Init.Direction == SPI_DIRECTION_2LINES_RXONLY)))
  {
    /* Disable SPI peripheral */
    __HAL_SPI_DISABLE(hspi);
  }
 
  /* Control the BSY flag: check disabled, because BSY is unreliable in
     slave mode (see the errata text above) */
//  if (SPI_WaitFlagStateUntilTimeout(hspi, SPI_FLAG_BSY, RESET, Timeout, Tickstart) != HAL_OK)
//  {
//    SET_BIT(hspi->ErrorCode, HAL_SPI_ERROR_FLAG);
//    return HAL_TIMEOUT;
//  }
 
  if ((hspi->Init.Mode == SPI_MODE_MASTER) && ((hspi->Init.Direction == SPI_DIRECTION_1LINE)
                                               || (hspi->Init.Direction == SPI_DIRECTION_2LINES_RXONLY)))
  {
    /* Empty the FRLVL fifo */
    if (SPI_WaitFifoStateUntilTimeout(hspi, SPI_FLAG_FRLVL, SPI_FRLVL_EMPTY, Timeout, Tickstart) != HAL_OK)
    {
      SET_BIT(hspi->ErrorCode, HAL_SPI_ERROR_FLAG);
      return HAL_TIMEOUT;
    }
  }
  return HAL_OK;
}

banyaszg · ‎2018-11-09

I made the same modification. The SPI_EndRxTxTransaction function had a similar BSY check, so I removed that too.

The software has been in use since then without any issue.