How can I verify that the SPI transmit call has loaded TX data into the peripheral and that it is ready for the master to read?

MFolk.1 · ‎2023-01-06

I am using the STM32L4S9AII6 clocked at 120 MHz. This device is used as a slave to a master that is clocked at a much higher frequency. They communicate over the SPI1 bus. I am trying to optimize the rate of communication between these two devices.

How do I verify that the SPI transmit API call has loaded the TX data into the SPI peripheral and that it is ready for the master to read? I need to set a flag (GPO pin) to the master when the slave has finished processing a command and is ready for another command. When I set this "finished processing" flag immediately after the HAL_SPI_TransmitReceive_IT() API call, the master sends the next command and simultaneously attempts to retrieve the data requested by the last command. However, most often the slave misses the deadline, and the data intended for the master doesn't show up until the next command cycle.

Obviously the MCU is not able to get the data to the SPI peripheral fast enough before the master comes asking for this data. So I need a way to check/validate that the data is on the SPI peripheral TX register before setting this flag to tell the master "Your data is here, come and get it".

Any suggestions on how to do this?

Thanks!

KnarfB · ‎2023-01-06

4 bytes fit exactly into the 32-bit SPI FIFO. If you configure SPI with STM32CubeMX for polling, and a GPIO, say PB3, as a GPIO output, the following should work:

SPI1->DR = 0x2211;  // SPI DR is 16-bit, pack 2 bytes
SPI1->DR = 0x4433;
SPI1->CR2 &= ~SPI_CR2_FRXTH;  // 0: RXNE event 16-bit
SPI1->CR1 |= SPI_CR1_SPE;     // SPI enable
GPIOB->BSRR = GPIO_PIN_3;     // tell master

The master will then issue 32 clock cycles for the data transfer:

while(!(SPI1->SR & SPI_SR_RXNE));
uint16_t d1d0 = SPI1->DR;  // read 1st 2 bytes
while(!(SPI1->SR & SPI_SR_RXNE));
uint16_t d3d2 = SPI1->DR;  // read 2nd 2 bytes
GPIOB->BRR = GPIO_PIN_3;     // reset flag
SPI1->CR1 &= ~SPI_CR1_SPE;     // SPI disable

At the end, d1d0 and d3d2 should contain the next request from the master.
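The byte packing used in the DR writes and reads above can be sketched as two small helpers. This assumes 8-bit data size with data packing, where the low byte of a 16-bit DR access is the first one on the wire (worth confirming against the reference manual for your configuration). The names spi_pack16 and spi_unpack16 are made up for illustration; they are not HAL or CMSIS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two bytes into one 16-bit word for a packed write to SPIx->DR.
 * 'first' is the byte meant to go out on the wire first, so it is placed
 * in the low half of the word (assumption: low byte is shifted out first). */
static uint16_t spi_pack16(uint8_t first, uint8_t second)
{
    return (uint16_t)((uint16_t)first | ((uint16_t)second << 8));
}

/* Split a 16-bit word read from SPIx->DR back into wire order:
 * out[0] is the byte received first, out[1] the byte received second. */
static void spi_unpack16(uint16_t word, uint8_t out[2])
{
    assert(out != NULL);
    out[0] = (uint8_t)(word & 0xFFu);
    out[1] = (uint8_t)(word >> 8);
}
```

With these, the first DR write in the snippet would be spi_pack16(0x11, 0x22), and a value read into d1d0 can be split into its two request bytes with spi_unpack16.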

If blocking RXNE polling is not feasible in your use case, you might switch to DMA: see the timing diagram in Figure 467, "Slave full-duplex communication", in the RM0351 reference manual, and implement the Rx transfer complete interrupt for evaluating the response and preparing the next data.

hth

KnarfB

S.Ma · ‎2023-01-06

Well, first I assume the protocol is MCU friendly: that NSS on the slave MCU can trigger an EXTI interrupt on its rising edge, when the packet exchange is over and the packet needs to be processed. This defines the minimum time NSS should remain high before the next data exchange.

Use DMA for RX and TX data buffers in cyclic mode. I assume packet length varies.

Depending on your SPI version, you may have to reset the SPI FIFOs and flush them between packet transmissions.

The average data rate increases with bigger packets.
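As a rough illustration of why bigger packets raise the average rate: if every packet pays a fixed overhead (NSS high time, interrupt latency, processing) on top of its wire time, the overhead is amortized over more bits. The helper below is a sketch; effective_bps is a made-up name, and any SCK or overhead figures you feed it are your own assumptions, since the thread does not state them.

```c
#include <assert.h>
#include <stdint.h>

/* Effective payload rate in bits per second for one packet exchange.
 * packet_bytes: payload bytes per packet
 * sck_hz:       SPI clock frequency in Hz
 * overhead_s:   fixed per-packet overhead in seconds (hypothetical figure) */
static double effective_bps(uint32_t packet_bytes, double sck_hz, double overhead_s)
{
    assert(packet_bytes > 0 && sck_hz > 0.0 && overhead_s >= 0.0);
    double bits = 8.0 * (double)packet_bytes;
    double wire_time_s = bits / sck_hz;       /* time spent clocking bits */
    return bits / (wire_time_s + overhead_s); /* overhead is paid once per packet */
}
```

For example, with an assumed 10 MHz SCK and 5 us of per-packet overhead, a 4-byte packet spends more time on overhead than on the wire, while a 64-byte packet spends most of its time transferring data.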

AScha.3 · ‎2023-01-06

>Obviously the MCU is not able to get the data to the SPI peripheral fast enough before the master comes asking for this data.

So you need a faster CPU, or you need to slow down the communication.

KnarfB · ‎2023-01-06

The HAL_SPI_TransmitReceive_IT call configures and finally enables the SPI peripheral. Enabling it sets the TXE flag, which raises an interrupt; the HAL interrupt handler then fills the data register DR. At high frequencies the interrupt handling might be too slow, so the next statement (the GPIO write) might execute before DR is filled.
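One way to close that race is to look at the TX FIFO level before raising the GPIO. On the STM32L4 SPI, the status register reports it in the FTLVL field (bits 12:11 of SPI_SR). The sketch below only decodes a status value, so it compiles on a host; on target the mask would come from the CMSIS device header (SPI_SR_FTLVL) and the value from SPI1->SR. The SKETCH_ macros, spi_txfifo_level and reply_is_queued are illustrative names. Which level means "all four bytes are loaded" depends on when the first frame moves into the shift register, so treat the threshold as an assumption to confirm with a logic analyzer.

```c
#include <assert.h>
#include <stdint.h>

/* FTLVL field of SPI_SR (bits 12:11), redefined locally so the sketch
 * builds without the device header. */
#define SKETCH_SPI_SR_FTLVL_Pos 11u
#define SKETCH_SPI_SR_FTLVL_Msk (0x3u << SKETCH_SPI_SR_FTLVL_Pos)

/* FIFO transmission level encoding: 00 empty, 01 1/4, 10 1/2, 11 full. */
typedef enum {
    TXFIFO_EMPTY   = 0,
    TXFIFO_QUARTER = 1,
    TXFIFO_HALF    = 2,
    TXFIFO_FULL    = 3
} txfifo_level_t;

/* Extract the TX FIFO level from a raw SPI_SR value. */
static txfifo_level_t spi_txfifo_level(uint32_t sr)
{
    return (txfifo_level_t)((sr & SKETCH_SPI_SR_FTLVL_Msk) >> SKETCH_SPI_SR_FTLVL_Pos);
}

/* True once the TX FIFO holds at least 'min_level' worth of data.
 * The caller chooses min_level; the right value is an assumption to verify. */
static int reply_is_queued(uint32_t sr, txfifo_level_t min_level)
{
    assert(min_level <= TXFIFO_FULL);
    return spi_txfifo_level(sr) >= min_level;
}
```

On target, the "tell master" GPIO write would then come only after reply_is_queued(SPI1->SR, ...) returns true, instead of directly after the HAL call.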

How long is the data, just 1 or 2 bytes? Then there is little advantage in using the HAL_SPI_TransmitReceive_IT call. You could use the blocking call, which guarantees that SPI DR is filled before it returns.

Anyway, HAL is not the best choice when trying to optimize the rate of communication. It might be better to use HAL just for setup and configuration and to do the data transfer and GPIO handling at register level.

Timing/logic analyzer diagrams would be helpful for further analysis.

hth

KnarfB

MFolk.1 · ‎2023-01-06

"Well, first I assume the protocol is MCU friendly: that NSS on the slave MCU can trigger an EXTI interrupt on its rising edge, when the packet exchange is over and the packet needs to be processed. This defines the minimum time NSS should remain high before the next data exchange."

I am using the hardware NSS, and the SPI is set up in full-duplex mode. You've misunderstood a bit what I'm trying to do. I am not bunching transactions back-to-back. Rather, I extract a command to do work from each transaction and produce data that must be ready to be read back by the master by the time the master is allowed to send the next transaction. Therefore, synchronizing on NSS does not solve the problem and is unrelated to the problem statement.
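The command/response pipelining described here (the reply to command N is clocked out during transaction N+1) can be modeled on a host as follows. This is only a simulation sketch: slave_t, process_command, slave_on_rx_complete and master_exchange are hypothetical names, and the "invert each byte" handler is a stand-in for the real work.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 4u

typedef struct {
    uint8_t tx[FRAME_LEN]; /* reply staged for the next transaction */
    int     ready;         /* mirrors the "finished processing" GPIO */
} slave_t;

/* Hypothetical command handler: the reply is the command with every byte inverted. */
static void process_command(const uint8_t cmd[FRAME_LEN], uint8_t reply[FRAME_LEN])
{
    for (size_t i = 0; i < FRAME_LEN; i++) {
        reply[i] = (uint8_t)~cmd[i];
    }
}

/* Called when a 4-byte exchange has completed (e.g. from an RX-complete handler). */
static void slave_on_rx_complete(slave_t *s, const uint8_t rx[FRAME_LEN])
{
    s->ready = 0;               /* drop the flag while working */
    process_command(rx, s->tx); /* compute and stage the reply */
    /* On target: load s->tx into the SPI TX path here, and only then... */
    s->ready = 1;               /* ...raise the flag for the master */
}

/* One full-duplex exchange as seen from the master: it receives whatever the
 * slave staged earlier while sending the new command. */
static void master_exchange(slave_t *s, const uint8_t cmd[FRAME_LEN], uint8_t got[FRAME_LEN])
{
    assert(s->ready);               /* master only starts when the flag is set */
    memcpy(got, s->tx, FRAME_LEN);  /* data staged after the previous command */
    slave_on_rx_complete(s, cmd);
}
```

The key ordering is inside slave_on_rx_complete: the flag goes up only after the reply is staged, which is the property the real driver has to guarantee at the peripheral level.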

"Use DMA for RX and TX data buffers in cyclic mode. I assume packet length varies."

I have tried this using both DMA and IT, and I have the same problem in both cases. The packet length does not vary. Each transaction (Tx and Rx) is 4 bytes.

"Depending on your SPI version, you may have to reset the SPI FIFOs and flush them between packet transmissions."

In what way(s) does it depend on my SPI version? How would flushing the Tx FIFO help here? I have proven that data is transmitting just fine... eventually. I just need to optimize the timing so that I can tell the master to start a transmission as soon as the MCU (slave) is ready with data to send. In other words, I don't want to just guess how long the MCU needs to have the Tx data ready and insert a timer/sleep routine. That approach is not safe and certainly not optimized for timing.

MFolk.1 · ‎2023-01-06

"You could use the blocking call which guarantees SPI DR to be filled before return."

Each transaction is 4 bytes, so maybe a blocking call would work. However, I thought the blocking call would stall the task until the entire transaction is complete. As this is a slave device with other work to do when not servicing commands from master, you can see how I don't want to block this task. So are you saying that using the blocking call only stalls the task until DR is filled (doesn't wait for transaction with master to complete)?

"Anyway, HAL is not the best choice when trying to optimize the rate of communication. It might be better using HAL just for setup and config and doing the data transfer and GPIO at register level."

I agree, but I am a bit under the gun right now, as I have a lot of people waiting on me for this. Do you have a link to any examples where someone has done this from scratch on this architecture? I haven't manually written communication port drivers since college, and I don't have a lot of time to dust off the cobwebs. With that said, if your answer to my previous question is what I want to hear (use the blocking function), then that might be sufficient.

Thanks!

MFolk.1 · ‎2023-01-06

Never mind my previous question about the blocking API call. I verified via testing that the blocking call will stall the task until the entire transaction with master is complete, making this approach a definite no go for my application. I guess I'll stop using HAL calls and find a way to write my own drivers from scratch. Any examples or help here would be greatly appreciated.

S.Ma · ‎2023-01-06

Then give details:

STM32L4S @ 120 MHz

SPI slave mode, 4 bytes exchange with master.

4 wire interface with bidir exchange.

SCK clock speed (MHz) = ?

WCET = ?

Minimum time NSS is high on the line = ? us

How long is the SPI interrupt in the worst case?

Is the protocol tolerant of communication errors? What happens if, say, the slave powers up later than the host and starts the SPI slave in the middle of a packet?
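Once those numbers are known, the timing budget they imply can be checked with a few lines of arithmetic. The helpers below are a sketch with made-up names (transfer_time_ns, cycles_to_ns, slack_ns); the SCK frequency, WCET and NSS high time are left as inputs because the thread does not give them.

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire for 'bits' clock cycles at 'sck_hz', rounded up to whole ns. */
static uint64_t transfer_time_ns(uint32_t bits, uint32_t sck_hz)
{
    assert(sck_hz > 0);
    return ((uint64_t)bits * 1000000000ull + sck_hz - 1u) / sck_hz;
}

/* Convert a CPU cycle count (e.g. a measured WCET) to ns, rounded up. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (cycles * 1000000000ull + cpu_hz - 1u) / cpu_hz;
}

/* Slack between the minimum NSS high time and the slave's worst-case
 * processing time. A negative result means the reply cannot be guaranteed
 * to be ready before the next transaction starts. */
static int64_t slack_ns(uint64_t nss_high_min_ns, uint64_t wcet_ns)
{
    return (int64_t)nss_high_min_ns - (int64_t)wcet_ns;
}
```

For instance, a 4-byte exchange is 32 SCK cycles, and a WCET measured in CPU cycles at 120 MHz converts with cycles_to_ns(cycles, 120000000u); if slack_ns comes out negative, the master has to wait for the ready flag rather than rely on a fixed gap.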
