SPI too slow



I am using SPI on an STM32H74II MCU.

First, I am using software NSS rather than hardware NSS, because hardware NSS doesn't work.

I need to transmit 1 byte and then receive 1 byte, one command right after the other. See the simple code below:

  status = HAL_SPI_Transmit(&hspi3, buffer, 1, 1);
  status = HAL_SPI_Receive(&hspi3, buffer, 1, 1);

I am monitoring the SPI lines with a scope, and I measure 900 us between NSS going active and the MISO data.

I need all of this to run and finish in under 30 us.

What can I do to run this simple code within the required timing?

Best regards

Jawad Khaleel


If you need strict timings, you should read and write directly to the SPI data register and not use any HAL function calls.

//pseudo code
SPI->DATAREGISTER = byte_to_send;
// Poll the status register, so you know when transmit/receive is done.
received_byte = SPI->DATAREGISTER;

In the end, the problem might be somewhere other than those lines of code.

What speed are you clocking the bus at?
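The clock rate sets a hard floor on the transfer time, so it is worth checking before blaming the software. As a rough sketch (the function names are made up for illustration, and NSS setup time and any gap between the two bytes are ignored), the wire time for a 1-byte write followed by a 1-byte read, 16 clocks in total, can be estimated like this:

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire, in nanoseconds, for `bits` SCK cycles at `sck_hz`.
 * Ignores NSS setup/hold time and any gap between bytes. */
uint64_t spi_wire_time_ns(uint32_t bits, uint32_t sck_hz)
{
    assert(sck_hz != 0u);
    return ((uint64_t)bits * 1000000000ULL) / sck_hz;
}

/* Lowest SCK frequency in Hz (rounded up) that fits `bits` clocks
 * into a budget of `budget_us` microseconds. */
uint32_t spi_min_sck_hz(uint32_t bits, uint32_t budget_us)
{
    assert(budget_us != 0u);
    return (uint32_t)(((uint64_t)bits * 1000000ULL + budget_us - 1u) / budget_us);
}
```

With these, `spi_wire_time_ns(16, 1000000)` gives 16000 ns, and `spi_min_sck_hz(16, 30)` gives 533334 Hz. So for the 30 us budget in the question, any SCK comfortably above roughly 0.5 MHz leaves the wire time well inside the limit, and a 900 us delay would have to come from software overhead, not from the bus.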

Couldn't you use the HAL_SPI_TransmitReceive() variant and specify a slightly bigger buffer?


SPI isn't slow. HAL is slow. Rule #1: You can have either strict timings or use HAL in a project, never both.

Sending a byte using HAL on STM32F4 [scope trace image]

Using the register interface



Thanks for your reply.

Do you have example code that uses the SPI registers for transmit and receive?

Best Regards

Jawad Khaleel


Use the 4-wire SPI method for transmit and receive. The master triggers the transaction by writing to DR, and the transaction is complete when RXNE is set. Be careful with a single byte, though: as a test I would write 32 bits, which may be the hardware FIFO size.

I am using the HAL to transfer data chunks at 12 MHz over a meter; the chunks are big enough that the start/stop delay becomes a second-order annoyance.

@berendi Thank you so much for posting these 'scope traces and absolutely/definitively/positively showing HAL's performance limitations.

I've "ranted" about HAL many times, both here and in my GitHub open-source repositories. That HAL is slow is completely obvious even from a casual inspection of its source code, and I've done some rudimentary timing tests. But for anyone who questions it, this proves it once and for all.

HAL is fine for "easy"/quick prototyping. I put "easy" in quotes because I don't really think it's that much easier than register-level coding, particularly when using a library such as the one I posted on GitHub (search my other posts on this site). Mainly it's that HAL is what Cube<whatever> produces. And if the product you spec'd from ST's line has FLASH, RAM, and CPU cycles to waste, a HAL implementation might be sufficient (once you get past the CUBE and HAL bugs).

For production quality, high-performance systems you really need register-level code. And the problem is ST's lack of documentation and examples (despite years of requests for those) which makes writing such code difficult.

I hate to point this out on an ST website, but the newer ATMEGAs are actually faster than even your fast example! And I'm sure even an 8-pin ATTINY13 would outstrip the HAL one!


Hi JKhal,

I had the same problem with incoming SPI data that was too fast for the “HAL_SPI_Receive(…)” function.

Reading the SPI register directly reduced the read errors dramatically, but I still had some errors.

After more research, I found the solution: you also need to instruct the compiler to optimise your SPI read function for speed (“-Ofast”).

See attached code of my function…

/*
  * >>  GetTel - Function  >>
  *  The incoming SPI data is very fast (6MHz clock), so we need to optimize
  *  our code in order to read the data without error.
  *  The Get-Telemetry- Function is optimized in 2 ways :
  *  ----------------------------------------------------
  *  1. The incoming data is read directly from the SPI- registers (SPI1->DR)
  *	    The "HAL_SPI_Receive(...)"- function is NOT used because it is too slow.
  *  2. Additionally the GCC - compiler needs to be instructed to optimize
  *     the compiled code for maximum speed ("-Ofast")
  *  Notes:
  *    *  The Altitude- Telemetry- data- packet starts with an 'AC' - byte,
  *  	  and it is 37 bytes long.
  *    *  The Other- Telemetry- data- packet starts with an 'AA' - byte.
  *    *  I'm using SPI1
  *    *  You also need to Enable the SPI after initialization like this:
  *  	  __HAL_SPI_ENABLE(&hspi1);
  */
__attribute__((optimize("-Ofast"))) void GetTel(uint8_t StBy)
{
	do  // Find the next Telemetry data packet
	{
		while ((SPI1->SR & SPI_FLAG_RXNE) == 0);  //Wait for Data Ready
		Tel_Rx[0] = SPI1->DR;	// Read Data Register Directly
	} while (Tel_Rx[0] != StBy); // repeat until correct start byte is found
	for (int x = 1; x < 37; x++)   // Capture the rest of the Telemetry data packet
	{
		while ((SPI1->SR & SPI_FLAG_RXNE) == 0);  // Wait for Data Ready
		Tel_Rx[x] = SPI1->DR;	// Read Data Register Directly
	} // for 1 to 36
}
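
The framing logic in GetTel (skip bytes until the start byte, then capture the rest of the 37-byte packet) can also be checked off-target. The following is only a sketch under my own assumptions: `byte_source_fn`, `tel_capture`, and `mem_source_t` are hypothetical names, and the register poll is replaced by a callback so the same loop can be fed from a buffer on a PC. Unlike the original, it returns false when the source runs dry instead of waiting forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TEL_PACKET_LEN 37u  /* packet length stated in the notes above */

/* Hypothetical byte source: stores one byte and returns true, or returns
 * false when no more data is available. On target this would wrap the
 * RXNE poll and the DR read. */
typedef bool (*byte_source_fn)(void *ctx, uint8_t *out);

/* Discard bytes until `start` is seen, then capture the rest of the packet.
 * Returns false if the source runs dry before the packet is complete. */
bool tel_capture(byte_source_fn next, void *ctx, uint8_t start,
                 uint8_t packet[TEL_PACKET_LEN])
{
    uint8_t b;
    assert(next != NULL && packet != NULL);
    do {
        if (!next(ctx, &b)) return false;
    } while (b != start);
    packet[0] = b;
    for (size_t i = 1; i < TEL_PACKET_LEN; i++) {
        if (!next(ctx, &packet[i])) return false;
    }
    return true;
}

/* Host-side helper: feeds bytes from an in-memory buffer. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} mem_source_t;

bool mem_source_next(void *ctx, uint8_t *out)
{
    mem_source_t *s = ctx;
    if (s->pos >= s->len) return false;
    *out = s->data[s->pos++];
    return true;
}
```

Feeding it a buffer that has a couple of junk bytes in front of an 'AC' byte should leave 'AC' in `packet[0]` and the following 36 bytes after it.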

OK, so my application was for a “Receive Only Slave” SPI.

You were asking for an example of Tx and Rx with SPI registers.

The following code segment should give you an idea >>

while (((SPI1->SR)&(SPI_FLAG_TXE)) == 0);  //Wait for Tx buffer Empty before next Write
SPI1->DR = TxByte;	//Write to Data Register Directly
while (((SPI1->SR)&(SPI_FLAG_RXNE)) == 0);  //Wait for Data Ready to Read
RxByte = SPI1->DR;	//Read Data Register Directly
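
If a hung bus must not lock up the program, the same four steps can be wrapped in a function with bounded polling. This is a sketch, not drop-in code: `spi_regs_t` is a hypothetical stand-in for the CMSIS `SPI_TypeDef` so that the logic compiles anywhere, and the bit positions are assumed to match the STM32F4 status register. On the STM32H7 from the original question the SPI block is a different design with separate TXDR and RXDR registers and different flag names, so check the reference manual for that part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two registers used above. On target, use
 * the CMSIS SPI_TypeDef and bit definitions from the device header. */
typedef struct {
    volatile uint32_t SR;
    volatile uint32_t DR;
} spi_regs_t;

#define SPI_SR_RXNE_BIT (1u << 0)  /* assumed: RXNE is SR bit 0, as on STM32F4 */
#define SPI_SR_TXE_BIT  (1u << 1)  /* assumed: TXE is SR bit 1, as on STM32F4 */

/* Poll SR until `flag` is set, giving up after `max_polls` reads. */
static bool spi_wait_flag(const spi_regs_t *spi, uint32_t flag, uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (spi->SR & flag) return true;
    }
    return false;
}

/* Full-duplex exchange of one byte. Returns false if either wait times out. */
bool spi_transfer_byte(spi_regs_t *spi, uint8_t tx, uint8_t *rx, uint32_t max_polls)
{
    assert(spi != 0 && rx != 0);
    if (!spi_wait_flag(spi, SPI_SR_TXE_BIT, max_polls)) return false;
    spi->DR = tx;            /* on hardware, this write starts the transfer in master mode */
    if (!spi_wait_flag(spi, SPI_SR_RXNE_BIT, max_polls)) return false;
    *rx = (uint8_t)spi->DR;  /* on hardware, this read clears RXNE */
    return true;
}
```

The poll limit is a plain loop count, not a time, so it would need to be chosen for the CPU clock and SCK rate in use. A call would look like `spi_transfer_byte(&regs, TxByte, &RxByte, 1000)`, where `regs` is whatever register block the project maps the peripheral to.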