STM32F7 Full Duplex SPI Slave - Daisy Chain with Shared Buffer

cheibriados · 2020-03-25

I'm currently working on a device that acts as a slave on a SPI bus, with the intention of being daisy-chained with a number of other devices. The bus master sends out a stream of data with 3 bytes meant for each slave, so the total number of bytes is [3 x Number of Slaves]. See below for a generalized image of the daisy-chain setup.
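The framing can also be modelled in a few lines of host-side C. This is only a sketch under one assumption: each slave behaves as an ideal 3-byte delay line that keeps whatever it holds when NSS rises. The names `slave_model_t`, `slave_clock_byte`, `slave_latch` and `chain_transfer` are hypothetical and not part of any ST library.

```c
#include <stdint.h>
#include <stddef.h>

#define BYTES_PER_SLAVE 3u

/* Hypothetical model of one slave: an ideal 3-byte delay line. */
typedef struct {
    uint8_t delay[BYTES_PER_SLAVE]; /* bytes waiting to be shifted out */
    size_t  head;                   /* index of the oldest byte */
} slave_model_t;

/* Clock one byte through a slave and return the byte it shifts out. */
uint8_t slave_clock_byte(slave_model_t *s, uint8_t in)
{
    uint8_t out = s->delay[s->head];
    s->delay[s->head] = in;
    s->head = (s->head + 1u) % BYTES_PER_SLAVE;
    return out;
}

/* Copy the 3 bytes a slave holds (oldest first), as it would on NSS rising. */
void slave_latch(const slave_model_t *s, uint8_t out[BYTES_PER_SLAVE])
{
    for (size_t i = 0; i < BYTES_PER_SLAVE; i++) {
        out[i] = s->delay[(s->head + i) % BYTES_PER_SLAVE];
    }
}

/* Run one master transaction of 3 * n_slaves bytes through the chain.
 * miso_last receives what the last slave in the chain shifts out. */
void chain_transfer(slave_model_t *slaves, size_t n_slaves,
                    const uint8_t *mosi, uint8_t *miso_last)
{
    size_t total = BYTES_PER_SLAVE * n_slaves;
    for (size_t i = 0; i < total; i++) {
        uint8_t b = mosi[i];
        for (size_t k = 0; k < n_slaves; k++) {
            b = slave_clock_byte(&slaves[k], b); /* output feeds next slave */
        }
        miso_last[i] = b;
    }
}
```

With two zero-initialized slaves and a master stream of 0x01 to 0x06, this model leaves 0x04, 0x05, 0x06 in the first slave and 0x01, 0x02, 0x03 in the second, so the bytes sent first end up in the device farthest from the master.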

So far, I've had limited success by starting a DMA transfer in which TX and RX share one buffer, with the RX pointer offset by 3 bytes. Because the transaction length is arbitrary, I attach an interrupt to the rising edge of the hardware NSS pin, act on the data there, and then reset the DMA and SPI. Note that the snippet below doesn't do anything with the data yet; it simply resets the bus.

static volatile uint8_t buf[128] = { 0 };
 
#define SPI_MAX_BYTES 125 // Buffer size - 3, due to RX pointer offset
 
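// Defined further down; declared here so the callback can call it
void reset_spi_and_dma(void);

// Fires on the NSS rising edge, i.e. at the end of a transaction
void HAL_GPIO_EXTI_Callback(uint16_t GPIO_Pin)
{
  (void)GPIO_Pin; // assumes the NSS line is the only EXTI source enabled
  reset_spi_and_dma();
}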
 
int main(void)
{
  SCB_EnableICache();
 
  HAL_Init();
 
  SystemClock_Config();
  MX_GPIO_Init();
  MX_DMA_Init();
  MX_SPI1_Init();
 
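  // TX reads from buf[0], RX writes from buf[3]; the casts drop 'volatile' for the HAL prototype
  HAL_SPI_TransmitReceive_DMA(&hspi1, (uint8_t *)buf, (uint8_t *)buf + 3, SPI_MAX_BYTES);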
 
  while (1) {  }
}
 
// Global interrupt clear flags for DMA streams
#define DMA_FLAG_GL0_4                         0x0000003DU
#define DMA_FLAG_GL1_5                         0x00000F40U
#define DMA_FLAG_GL2_6                         0x003D0000U
#define DMA_FLAG_GL3_7                         0x0F400000U
 
void reset_spi_and_dma(void)
{
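	// Disable the DMA streams first and wait until the hardware confirms it
	__HAL_DMA_DISABLE(&hdma_spi1_rx);
	__HAL_DMA_DISABLE(&hdma_spi1_tx);
	while ((hdma_spi1_rx.Instance->CR | hdma_spi1_tx.Instance->CR) & DMA_SxCR_EN) { }

	// Then clear the pending interrupt flags for DMA2 Stream 0 and Stream 3
	// (stopping a stream mid-transfer can itself set its transfer-complete flag)
	DMA2->LIFCR = DMA_FLAG_GL0_4 | DMA_FLAG_GL3_7;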
 
	// Disable SPI1 peripheral
	__HAL_SPI_DISABLE(&hspi1);
 
	// Disable SPI1 DMA on RX and TX
	CLEAR_BIT(hspi1.Instance->CR2, SPI_CR2_RXDMAEN | SPI_CR2_TXDMAEN);
 
	// Set NDTR (DMA_SxNDTR) to the number of bytes to transfer
	hdma_spi1_rx.Instance->NDTR = SPI_MAX_BYTES;
	hdma_spi1_tx.Instance->NDTR = SPI_MAX_BYTES;
 
	// Reset SPI1 via RCC, which also clears the TXFIFO
	RCC->APB2RSTR |= RCC_APB2RSTR_SPI1RST;
	RCC->APB2RSTR &= ~RCC_APB2RSTR_SPI1RST;
 
	// Reinitialize SPI1
	MX_SPI1_Init();
 
	// Reenable DMA
	__HAL_DMA_ENABLE(&hdma_spi1_rx);
	__HAL_DMA_ENABLE(&hdma_spi1_tx);
 
	// Enable SPI1 DMA on RX and TX
	SET_BIT(hspi1.Instance->CR2, SPI_CR2_RXDMAEN | SPI_CR2_TXDMAEN);
 
	// Reenable SPI1
	__HAL_SPI_ENABLE(&hspi1);
}

The DMA streams for RX and TX on SPI1 are configured as circular with a data width of 1 byte. The DMA FIFO is disabled because, as I understand it, it adds latency to the DMA transaction for single-byte transfers.

My understanding of how this should work is:

1. NSS goes low and the hardware SPI transaction starts
2. Data is clocked out on the rising clock edge from the TX pointer (buf[0])
3. Data is clocked in on a falling clock edge to the RX pointer (buf[3])
4. TX and RX pointers are incremented (to buf[1] and buf[4] respectively)
5. This process repeats until NSS goes high and we manually stop/reset the peripherals
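The behaviour these steps describe can be written down as a small host-side model. It is only a sketch of the intended result, assuming each received byte reaches memory before the byte 3 positions later is fetched for transmission; `ideal_transfer` and `RX_OFFSET` are hypothetical names, not HAL symbols.

```c
#include <stdint.h>
#include <stddef.h>

#define RX_OFFSET 3u /* RX pointer starts 3 bytes ahead of TX, as in the post */

/*
 * Idealized model (assumption): for each byte the slave shifts out buf[i]
 * while shifting in the master's byte, and the received byte is stored at
 * buf[i + RX_OFFSET] before any later TX byte is fetched.
 * buf must hold at least len + RX_OFFSET bytes.
 */
void ideal_transfer(uint8_t *buf, const uint8_t *mosi, uint8_t *miso, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        miso[i] = buf[i];             /* TX pointer: byte shifted out */
        buf[i + RX_OFFSET] = mosi[i]; /* RX pointer: byte shifted in  */
    }
}
```

Starting from a zeroed buffer and a master stream of 0x01 to 0x06, this model shifts out 0x00, 0x00, 0x00, 0x01, 0x02, 0x03 within the same transaction, which is the 3-byte delay the daisy chain needs.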

While this does somewhat work, it appears that the TX lags the RX by a full transaction. If I break the code after the first transaction, I see that buf is filled with the data as expected (confirmed with logic analyzer) but only 0x00 was shifted out for each clock cycle. Another transaction results in new data shifted in to buf but the data shifted out is what was in the buffer previously.

As an example, we see the following

Transaction 1)

Master MOSI: 0x01, 0x02, 0x03, 0x04, 0x05, 0x06

Slave MISO: 0x00, 0x00, 0x00, 0x00, 0x00, 0x00

[buf state after transaction = { 0x00, 0x00, 0x00, 0x01, 0x02, 0x03... }]

Transaction 2)

Master MOSI: 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C

Slave MISO: 0x00, 0x00, 0x00, 0x01, 0x02, 0x03

[buf state after transaction = { 0x00, 0x00, 0x00, 0x07, 0x08, 0x09... }]

It almost seems like it's not actually operating on the memory pointers for TX and RX and is instead buffering data up prior to the transaction starting somehow, and only committing new data back to the buffer after the transaction is finished. Digging through the reference manual and looking at the HAL code didn't seem to point me in any obvious directions.

Is what I'm trying to do feasible? And if so, is there something I'm missing with configuration of the DMA to allow this to happen? Happy to offer any more information about the HAL configuration or any other relevant information.

berendi · 2020-03-26

> it will always try to have 3 bytes in the FIFO? Which would be fine, except it looks like when it transmits the first byte, the TXE interrupt (and subsequent DMA request) are serviced before the byte in the RXFIFO is committed to the specified memory pointer.

There are some subtleties here.

At the beginning of the transfer, when the first byte is put into the TX FIFO, it will be immediately (within 1 or 2 cycles) transferred to the shift register, no longer taking up a FIFO slot. So there can be 3 bytes in the FIFO, and a fourth one in the shift register.
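The effect of that prefetch can be illustrated with a deliberately simplified host-side model. It assumes the TX side always keeps `depth` bytes fetched ahead (4 here: three in the FIFO plus one in the shift register) and that a freed slot is refilled before the matching RX byte is stored. The real arbitration between the two DMA streams is not modelled, and `prefetch_transfer`, `RX_OFFSET` and `TX_DEPTH_MAX` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define RX_OFFSET    3u /* RX pointer offset used in the question */
#define TX_DEPTH_MAX 8u /* upper bound for the modelled prefetch depth */

/*
 * Simplified model (assumption): 'depth' TX bytes are fetched from buf
 * before the first clock edge, and one more is fetched each time a byte
 * has been shifted out. buf must hold at least len + depth and
 * len + RX_OFFSET bytes.
 */
void prefetch_transfer(uint8_t *buf, const uint8_t *mosi, uint8_t *miso,
                       size_t len, size_t depth)
{
    uint8_t queue[TX_DEPTH_MAX];
    size_t next_fetch = 0; /* next buf index the TX side will read */

    if (depth == 0 || depth > TX_DEPTH_MAX) {
        return;
    }

    /* Before NSS goes low the TX side has already filled its queue. */
    for (; next_fetch < depth; next_fetch++) {
        queue[next_fetch % depth] = buf[next_fetch];
    }

    for (size_t i = 0; i < len; i++) {
        miso[i] = queue[i % depth];         /* byte i is shifted out       */
        queue[i % depth] = buf[next_fetch]; /* freed slot is refilled      */
        next_fetch++;
        buf[i + RX_OFFSET] = mosi[i];       /* RX byte is stored afterward */
    }
}
```

Run twice on the same zeroed buffer with a depth of 4, first with 0x01 to 0x06 and then with 0x07 to 0x0C, the model reproduces the traces from the question: all zeros on MISO in the first transaction, then 0x00, 0x00, 0x00, 0x01, 0x02, 0x03 in the second. In this model the TX side reads each location before the RX side has written it, so it only ever sees data from the previous transaction.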

> short of bit-banging the SPI transactions

Bit banging with DMA could be actually viable, but it has a couple of restrictions.

> set the priority of the TX DMA request much lower than the RX DMA request in the hopes that RX always gets handled first, but this seems like a pretty fragile solution.

I have my doubts too.

> I've got a pin-compatible H7-series in mind

If STM32H7 is an option, go and order it now. It has a much more versatile SPI controller.

On the H7, set SPI_CFG1_DSIZE to 24 bits, and set SPI_CFG1_UDRCFG (behavior of the slave transmitter at an underrun condition) to 01, "slave repeats lastly received data frame from master". Problem solved.
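As a rough sketch of what that means in register terms, the helper below computes a CFG1 value with a 24-bit frame and UDRCFG = 01. The field positions are my reading of the STM32H7 SPI_CFG1 layout and should be checked against the reference manual; in a real project the SPI_CFG1_* macros from the device header would replace the local defines. `cfg1_for_daisy_chain` is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed field layout of SPI_CFG1 on the STM32H7 (verify before use). */
#define CFG1_DSIZE_POS   0u
#define CFG1_DSIZE_MASK  (0x1Fu << CFG1_DSIZE_POS)
#define CFG1_UDRCFG_POS  9u
#define CFG1_UDRCFG_MASK (0x3u << CFG1_UDRCFG_POS)

#define FRAME_BITS            24u  /* one 3-byte slot per frame */
#define UDRCFG_REPEAT_LAST_RX 0x1u /* 01: repeat lastly received frame */

/* Return cfg1 with DSIZE and UDRCFG replaced and all other bits kept. */
uint32_t cfg1_for_daisy_chain(uint32_t cfg1)
{
    cfg1 &= ~(CFG1_DSIZE_MASK | CFG1_UDRCFG_MASK);
    /* DSIZE encodes the frame length minus one, so 24 bits is 23. */
    cfg1 |= ((FRAME_BITS - 1u) << CFG1_DSIZE_POS) & CFG1_DSIZE_MASK;
    cfg1 |= (UDRCFG_REPEAT_LAST_RX << CFG1_UDRCFG_POS) & CFG1_UDRCFG_MASK;
    return cfg1;
}
```

On the target this would be applied along the lines of `SPI1->CFG1 = cfg1_for_daisy_chain(SPI1->CFG1);`, presumably while the SPI is still disabled, since the configuration registers are meant to be written before the peripheral is enabled.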


berendi · 2020-03-25

You have disabled the DMA FIFO, but you have certainly not disabled the 32-bit SPI TX FIFO, because it cannot be disabled. The SPI will keep issuing DMA requests for the transmit channel until it considers the FIFO full.

Buffering in the SPI RX FIFO can be sort of disabled through the SPI_CR2_FRXTH bit, which controls whether the RXNE event occurs when there are 1 or 2 bytes in the RX FIFO, but it might not be enough when the offset is only 3 bytes.

cheibriados · 2020-03-26

Hmm, ok, so if I'm understanding correctly, the TXFIFO for the SPI peripheral is 32 bits and doesn't have a configurable threshold like the RXFIFO does (I had already set the CR2_FRXTH bit). In the reference manual for the STM32F7 (RM0431) it says that the TXE interrupt fires whenever the FIFO isn't more than half full, so for a byte-width transaction size, that means it will always try to have 3 bytes in the FIFO? Which would be fine, except it looks like when it transmits the first byte, the TXE interrupt (and subsequent DMA request) are serviced before the byte in the RXFIFO is committed to the specified memory pointer.

Seems like my only option (short of bit-banging the SPI transactions) is to set the priority of the TX DMA request much lower than the RX DMA request in the hopes that RX always gets handled first, but this seems like a pretty fragile solution. Would moving to a faster controller (I've got a pin-compatible H7-series in mind) make it any better, or would that just mean the out-of-order DMA transactions happen twice as fast?
