STM32H7x SPI Slave FIFO data loading

vegac
Associate II

Hello,

We have been trying to use the STM32H747I-DISCO to act as a SPI slave which should respond to a command byte, behaving similarly to how a sensor or EEPROM might.

That is, the first byte sent from the master will be a command and our slave device should respond to this command. Unfortunately, the best we are able to do is respond to this command 3 bytes late.

In the following, the top is what we'd like, the bottom is what we are currently seeing.

 

2024-06-21-17-11-06-image.png

It appears that, to get the slave to send back any data, the TXFIFO must have at least 32 bits loaded into it at the start of the transfer. It doesn't matter whether I do this via a single 32-bit register write

hspi5.Instance->TXDR = 0x89ABCDEF; // 32 bits of "junk"

or four 8-bit writes

*(volatile uint8_t *)&hspi5.Instance->TXDR = 0x89; // "junk" byte 1
*(volatile uint8_t *)&hspi5.Instance->TXDR = 0xAB; // "junk" byte 2
*(volatile uint8_t *)&hspi5.Instance->TXDR = 0xCD; // "junk" byte 3
*(volatile uint8_t *)&hspi5.Instance->TXDR = 0xEF; // "junk" byte 4

Either way, we see the behavior in the bottom half of the image.
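
A note on the byte-wide form: for the store to be 8 bits wide, the cast has to apply to the address of TXDR, not to the value read from it. As far as I understand the H7 SPI, the width of the store to TXDR is what determines how many data frames get pushed into the TXFIFO, so this detail matters. Below is a minimal host-side sketch of the pointer pattern. `mock_spi_t`, `txdr_byte_ptr` and `write_tx_byte` are hypothetical names made up for illustration, and the mock does not model the FIFO itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the SPI register block; only TXDR is modelled. */
typedef struct {
    volatile uint32_t TXDR;
} mock_spi_t;

/* Return a byte-wide view of TXDR: take the register's ADDRESS, then cast. */
static volatile uint8_t *txdr_byte_ptr(mock_spi_t *spi)
{
    assert(spi != 0);
    return (volatile uint8_t *)&spi->TXDR;
}

/* Perform one 8-bit store through the byte view. */
static void write_tx_byte(mock_spi_t *spi, uint8_t value)
{
    *txdr_byte_ptr(spi) = value;
}
```

On the target, the same shape would be `*(volatile uint8_t *)&hspi5.Instance->TXDR = value;`. The `volatile` qualifier keeps the compiler from merging or dropping consecutive stores to the same address.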

In more detail, if we do what we'd expect to be correct and load the TXFIFO with a single 8-bit write:

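// Make 8-bit pointers to the TX and RX data registers.
// volatile so the compiler keeps every access to the registers.
volatile uint8_t* spi_rx_ptr;
spi_rx_ptr = (volatile uint8_t*)&(hspi5.Instance->RXDR);

volatile uint8_t* spi_tx_ptr;
spi_tx_ptr = (volatile uint8_t*)&(hspi5.Instance->TXDR);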

// Disable and re-enable the SPI peripheral
CLEAR_BIT(hspi5.Instance->CR1, SPI_CR1_SPE);
SET_BIT(hspi5.Instance->CR1, SPI_CR1_SPE);

// Set junk byte so we can receive from master
*spi_tx_ptr = 0xAB; // "junk" byte 1

// Wait for RXP (blocking)
while ((hspi5.Instance->SR & SPI_FLAG_RXP) == 0);

// Get the command from the master
uint8_t command_byte = *spi_rx_ptr;

// Normally here we'd switch on the command to change behavior,
// but that's added complexity not needed for this question.


// Wait for TXP so we can load the first (possibly only)
// byte to be transmitted in response to the command.
while ((hspi5.Instance->SR & SPI_FLAG_TXP) == 0);


// Load the byte (We have a little extra logic here
// for stepping though a LUT, incrementing the
// address if it's multi-byte, but, again, let's keep this minimal)
// and just hard code the response for testing
*spi_tx_ptr = 0xCD;


// We have some more logic after this to wait for the end of the
// transfer. For testing, this can be replaced by
// Wait for EOT (blocking)
while ((hspi5.Instance->SR & SPI_FLAG_EOT) == 0);
// Though this may only work once after each reboot.

We only see

 

2024-06-24-08-29-28-image.png

The `0xAB` of the 1st load to the TXFIFO works, but the second (setting it to `0xCD`) doesn't, instead sending out `0xFF` on MISO until the end of the transfer.

If instead we change the code to

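// Make 8-bit pointers to the TX and RX data registers.
// volatile so the compiler keeps every access to the registers.
volatile uint8_t* spi_rx_ptr;
spi_rx_ptr = (volatile uint8_t*)&(hspi5.Instance->RXDR);

volatile uint8_t* spi_tx_ptr;
spi_tx_ptr = (volatile uint8_t*)&(hspi5.Instance->TXDR);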

// Disable and re-enable the SPI peripheral
CLEAR_BIT(hspi5.Instance->CR1, SPI_CR1_SPE);
SET_BIT(hspi5.Instance->CR1, SPI_CR1_SPE);

// Set junk byte so we can receive from master
*spi_tx_ptr = 0x12; // "junk" byte 1
*spi_tx_ptr = 0x34; // "junk" byte 2
*spi_tx_ptr = 0x56; // "junk" byte 3
*spi_tx_ptr = 0x78; // "junk" byte 4

// Wait for RXP (blocking)
while ((hspi5.Instance->SR & SPI_FLAG_RXP) == 0);

// Get the command from the master
uint8_t command_byte = *spi_rx_ptr;

// Wait for TXP so we can load the response
while ((hspi5.Instance->SR & SPI_FLAG_TXP) == 0);

// Load the response
*spi_tx_ptr = 0xCD;

// Wait for EOT (blocking)
while ((hspi5.Instance->SR & SPI_FLAG_EOT) == 0);

We are able to at least send the response:

 

2024-06-24-08-46-26-image.png

For added context, here is an abridged version of our init function (removed pin setup, error handling, etc.)

hspi5.Instance = SPI5;
hspi5.Init.Mode = SPI_MODE_SLAVE;
hspi5.Init.Direction = SPI_DIRECTION_2LINES;
hspi5.Init.DataSize = SPI_DATASIZE_8BIT;
hspi5.Init.CLKPolarity = SPI_POLARITY_LOW;
hspi5.Init.CLKPhase = SPI_PHASE_1EDGE;
hspi5.Init.NSS = SPI_NSS_HARD_INPUT;
hspi5.Init.FirstBit = SPI_FIRSTBIT_MSB;
hspi5.Init.TIMode = SPI_TIMODE_DISABLE;
hspi5.Init.CRCCalculation = SPI_CRCCALCULATION_DISABLE;
hspi5.Init.CRCPolynomial = 0x0;
hspi5.Init.NSSPMode = SPI_NSS_PULSE_DISABLE;
hspi5.Init.NSSPolarity = SPI_NSS_POLARITY_LOW;
hspi5.Init.FifoThreshold = SPI_FIFO_THRESHOLD_01DATA;
hspi5.Init.TxCRCInitializationPattern
= SPI_CRC_INITIALIZATION_ALL_ZERO_PATTERN;
hspi5.Init.RxCRCInitializationPattern
= SPI_CRC_INITIALIZATION_ALL_ZERO_PATTERN;
hspi5.Init.MasterSSIdleness = SPI_MASTER_SS_IDLENESS_00CYCLE;
hspi5.Init.MasterInterDataIdleness = SPI_MASTER_INTERDATA_IDLENESS_00CYCLE;
hspi5.Init.MasterReceiverAutoSusp = SPI_MASTER_RX_AUTOSUSP_DISABLE;
hspi5.Init.MasterKeepIOState = SPI_MASTER_KEEP_IO_STATE_DISABLE;
hspi5.Init.IOSwap = SPI_IO_SWAP_DISABLE;

// Set TSER = 0, TSIZE = 0 (RM0399 53.11.2)
hspi5.Instance->CR2 = 0;

Also note that SPI5 is clocked directly from HSE (25 MHz), as Table 120 "SPI dynamic characteristics" of the STM32H747xI/G datasheet implies that a full-duplex slave can't run above ~30 MHz (voltage dependent). I'm not sure, though, whether that limit refers to the peripheral clock or to the input clock from the master.

I am using the CM7 exclusively, with the CM4 asleep. The CM7 is clocked as high as possible, at 480MHz.

I have tested with the master clock set as low as 100 Hz and as high as 10 MHz, and the behavior remains the same, only breaking down at higher clock speeds.

When testing on the STM32F412G-DISCOVERY, with its different SPI peripheral, we had very similar code working completely, albeit only up to a ~1 MHz SPI clock. Since there is very little time between receiving the command byte and when we need to load the response data, we moved to the H747 in hopes of achieving 5 MHz or greater. Note that in the full code we were bottlenecked by the time to run the FSM which determines what data should be sent in response to the command byte.
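
To put a rough number on "very little time": if the master clocks bytes back-to-back with no gap (an assumption, as the master's inter-byte timing is not stated above), the slave has at most about one SCK bit period between sampling the last command bit and having to drive the first response bit. The sketch below converts that into CPU cycles. The function names are made up for illustration, and the result is only an upper bound, since it ignores any synchronization latency between the SPI clock domain and the bus.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles that elapse during one SCK bit period (integer division). */
static uint32_t cpu_cycles_per_sck_bit(uint32_t cpu_hz, uint32_t sck_hz)
{
    assert(sck_hz > 0);
    return cpu_hz / sck_hz;
}

/* CPU cycles that elapse while one whole data frame is shifted. */
static uint32_t cpu_cycles_per_frame(uint32_t cpu_hz, uint32_t sck_hz,
                                     uint32_t bits_per_frame)
{
    return cpu_cycles_per_sck_bit(cpu_hz, sck_hz) * bits_per_frame;
}
```

With the CM7 at 480 MHz this gives 480 cycles per bit at 1 MHz SCK, 96 cycles per bit at 5 MHz, and 48 cycles per bit at 10 MHz. Reading RXDR, running the FSM and writing TXDR all have to fit inside that window for the response to land in the byte immediately after the command.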

3 REPLIES
vegac
Associate II

I was wrong about 32 bits needing to be loaded. 16 bits is enough, and makes us only 1 byte late. This is still an issue, but I'd think it points to a race condition with the peripheral rather than something with the FIFO?

vegac
Associate II

I believe this is the same issue seen in https://stackoverflow.com/questions/73037098/stm32-spi-slave-response-to-master-lags-for-several-bytes but that individual was able to rely on using the enabled CRC to act as a buffer byte between the command and loading the data. We don't have that capability. We *must* load the data byte after knowing the command, but must send it as the 1st byte after the command.

vegac
Associate II

We have found that doing

 

    MODIFY_REG(hspi5.Instance->CFG1, SPI_CFG1_UDRCFG, 0);
    MODIFY_REG(hspi5.Instance->CFG1, SPI_CFG1_UDRDET, 1);

 

at the end of our init function allows us to operate in 8-bit mode correctly (that is, we can respond in the 1st byte); however, it's very slow. We also don't understand why this helps. If we load data into UDRDR, we never see it (note that setting UDRCFG to 0b00 should be the mode where UDRDR is used on underrun), and I don't understand why the setting of UDRDET would matter here either.
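
One thing worth checking about the two `MODIFY_REG` lines: the macro's third argument is a mask of bits to OR in, not a field value, so passing `1` sets bit 0 of CFG1 rather than writing 1 into the UDRDET field. The host-side sketch below reproduces the macro's read-modify-write shape on a plain variable to show the effect. The field positions (UDRCFG at bits 10:9, UDRDET at bits 12:11) are my assumption of the H7 layout and should be verified against the device header and RM0399. `modify_reg`, `apply_as_posted` and `set_field` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED field layout of SPI_CFG1 (verify against the device header). */
#define UDRCFG_POS   9u
#define UDRCFG_MASK  (0x3u << UDRCFG_POS)
#define UDRDET_POS   11u
#define UDRDET_MASK  (0x3u << UDRDET_POS)

/* Same read-modify-write shape as CMSIS MODIFY_REG, on a plain value. */
static uint32_t modify_reg(uint32_t reg, uint32_t clear_mask, uint32_t set_mask)
{
    return (reg & ~clear_mask) | set_mask;
}

/* The two lines from the post, applied in order to a copy of CFG1. */
static uint32_t apply_as_posted(uint32_t cfg1)
{
    cfg1 = modify_reg(cfg1, UDRCFG_MASK, 0u);
    cfg1 = modify_reg(cfg1, UDRDET_MASK, 1u);
    return cfg1;
}

/* Write `value` into a field: shift it to the field position first. */
static uint32_t set_field(uint32_t reg, uint32_t mask, uint32_t pos,
                          uint32_t value)
{
    assert(((value << pos) & ~mask) == 0u); /* value must fit the field */
    return modify_reg(reg, mask, (value << pos) & mask);
}
```

Under those assumed positions, the posted lines leave both UDRCFG and UDRDET at 0b00 and set bit 0 of CFG1, which falls in the DSIZE field and is already 1 for 8-bit frames. If the intent was UDRDET = 0b01, the set mask would need to be shifted into place, for example with `SPI_CFG1_UDRDET_0` from the device header.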

 

If we load only one "junk" TX byte, which allows us to respond correctly, we can only get reliable SPI communication up to ~2 MHz. Adding one extra junk byte, so that there are two bytes in the buffer, lets us operate above 10 MHz.

 

// Disable and re-enable the SPI peripheral
CLEAR_BIT(hspi5.Instance->CR1, SPI_CR1_SPE);
SET_BIT(hspi5.Instance->CR1, SPI_CR1_SPE);

#if 0
// If we do only 1 byte, max 2 MHz
*spi_tx_ptr = 0x89; // Single junk byte
#else
// If we do 2 bytes, max 10+ MHz
*spi_tx_ptr = 0xCD; // "junk" byte 1
*spi_tx_ptr = 0xEF; // "junk" byte 2
#endif

// Wait for RXP (blocking)
while ((hspi5.Instance->SR & SPI_FLAG_RXP) == 0);

// ... Code from the original post

 

 

In short, loading 8 bits works but is only reliable up to a 2 MHz master clock. Loading 16 bits is fast enough, but this still makes us a byte late.

vegac_0-1719408481106.png

We still need the response to appear where the 2nd "junk" byte (0xEF, in this example) is currently seen on MISO.