SPI bandwidth using HAL?

Gpeti · ‎2019-06-10

I'm testing the SPI capabilities of the STM32H7, using the SPI examples provided in STM32CubeH7. I may not keep this code in my own development; right now the goal is to understand how SPI works and what bandwidth I can get in the different modes (with DMA, with cache enabled or not, etc.).

I'd like to share the figures I've computed, as they don't seem very high. In the example, if I understood correctly, the CPU runs at 400 MHz and the SPI bus at 100 MHz.

For polling mode I measured the number of cycles taken by the call to HAL_SPI_TransmitReceive.

For DMA I measured between the call to HAL_SPI_TransmitReceive_DMA and the call to the transfer-complete callback.

Cycle measurements were made with SysTick clocked from the internal clock. Since no low-power mode is used, it should be accurate.
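One thing to watch with this method: SysTick counts down and reloads, so the elapsed cycle count has to account for a wrap. Below is a minimal sketch of that arithmetic. The function name and the `reload` parameter are hypothetical, and it assumes the counter wraps at most once between the two samples (for example, with a 1 ms tick at 400 MHz the reload value would be 399,999, so anything longer than one period would also need the tick count).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Elapsed cycles between two samples of a down-counter that reloads
 * to `reload` after reaching 0 (SysTick behaves this way).
 * Assumption: the counter wrapped at most once between the samples.
 */
static uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t reload)
{
    assert(start <= reload && end <= reload);

    if (start >= end) {
        /* No wrap: the counter simply decreased from start to end. */
        return start - end;
    }
    /* One wrap: start -> 0, reload event, then reload -> end. */
    return start + (reload + 1u) - end;
}
```

With a reload of 399,999, samples of 100,000 (start) and 300,000 (end) give 200,000 elapsed cycles, because the counter passed through zero once.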

I've just modified ST's examples to send a buffer of 1KB.

I get around 200,000 CPU cycles in polling mode, which means around 2 MB/s.

And around 3 MB/s in DMA mode.
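For reference, the conversion from cycles to throughput can be sketched as follows. The helper name is hypothetical; the inputs are the figures from this post (1 KB buffer taken as 1024 bytes, about 200,000 cycles, 400 MHz core clock).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a transfer measured in CPU cycles into bytes per second.
 * bytes / (cycles / cpu_hz) == bytes * cpu_hz / cycles
 */
static uint64_t throughput_bytes_per_s(uint32_t bytes, uint32_t cycles,
                                       uint32_t cpu_hz)
{
    assert(cycles > 0u);
    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return ((uint64_t)bytes * cpu_hz) / cycles;
}
```

Calling `throughput_bytes_per_s(1024u, 200000u, 400000000u)` gives 2,048,000 bytes/s, i.e. the roughly 2 MB/s quoted for polling mode.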

Since the SPI clock runs at 100 MHz I would have expected much more, especially in DMA mode. What do you think? Is there something wrong in my test procedure?

Tesla DeLorean · ‎2019-06-10

ST's SPI peripheral is a depressingly sad implementation; for most devices the chip select line has to be managed manually.

What prescaler are you using? Have you looked at the external SPI clock with a scope, or at the data utilization with a logic analyzer?

The numbers quoted seem more aligned with a 25 MHz SCLK.

S.Ma · ‎2019-06-10

I use SPI + DMA most of the time. The cache won't be a major booster.

The time will be lost in controlling NSS manually, kicking off the DMA, taking the interrupt when the block transfers (RX and TX) complete, and raising NSS.

HAL and LL won't have the same performance: HAL activates all possible SPI and DMA interrupts, even unused ones (such as DMA half transfer), in case you register a specific callback. So if performance is the priority, use LL.

Note that if you go above ~48 MHz, you have to look carefully at the GPIO drive strength and signal integrity.

Another SPI-like protocol is QSPI, which maps external QSPI flash into the STM32 memory space. These flashes typically run at up to 133 MHz for sending commands and 54 MHz for reads (example here: https://www.micron.com/products/nor-flash/serial-nor-flash/part-catalog/mt25ql128aba1ew7-0sit ).

In the 2019 wish list, I forgot to ask for a QSPI slave function, to have a high-speed CMOS-level DDR link between MCUs. 😏

Also, DMA only moves data blocks; you still need to create, process, or transfer them. This can be done in the background or with sequential waits.

Total performance requires good SW architecture. In my case, I implement a state machine on SPI that is triggered by interrupts from multiple sources.

Gpeti · ‎2019-06-10

Well, you are probably right. I was thinking that in DMA mode the performance does not depend much on code quality, since the HW works by itself once the communication has been started. However, I had a look at the SPI and clock settings in ST's application, and I don't really understand some of them.

Is it correct to say that the maximum frequency of the SPI SCK (clock) signal is half the frequency that the SPI peripheral is fed with, i.e. the APB2 peripheral clock?

Gpeti · ‎2019-06-10

Thanks for your feedback. I know about QSPI, I've also planned to test it. I didn't know about these limitations above a given frequency.

S.Ma · ‎2019-06-10

For the STM32L4R5, the SPI master goes up to SYSCLK/2 and slaves go up to SYSCLK/4.

For power reasons, SYSCLK is 48 MHz, so my SPI runs at 12 MHz (single master, multiple STM32 slaves).

Gpeti · ‎2019-06-10

When you say SPI master, are you talking about the maximum frequency of the SPI clock signal or about the SPI peripheral clock? I guess it is the SCK clock.

The STM32H7 documentation specifies a maximum kernel clock of 200 MHz for the SPI1 peripheral at the highest voltage. So this means the SPI SCK signal can go up to 100 MHz, if I'm correct.

However, in STM32CubeMX, when selecting the Nucleo-H743, the maximum SPI peripheral clock is indicated as 120 MHz, which would lead to a maximum theoretical bandwidth of 60 MHz. Kind of confusing; I have to dig deeper into this topic.

Gpeti · ‎2019-06-11

For the record, I analysed the ST SPI example code a bit further. The spi_ker_ck signal is connected to pll1_q_clk (the default) with a PLL setting that gives 200 MHz, and the baud rate prescaler is set to 8, which leads to 25 MHz, consistent with my measurements.

Changing the prescaler to 4 leads to a bandwidth close to 50 MHz.

However, changing the prescaler to 2 does not work (I receive crap).
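To make the relation between kernel clock, prescaler and best-case throughput explicit, here is a small sketch. The function names are hypothetical, and the byte rate assumes 8-bit frames sent back to back with no idle time between them, which real transfers will not quite reach.

```c
#include <assert.h>
#include <stdint.h>

/*
 * SCK frequency from the SPI kernel clock and the baud rate prescaler.
 * Assumption: the prescaler is a power of two between 2 and 256.
 */
static uint32_t spi_sck_hz(uint32_t kernel_hz, uint32_t prescaler)
{
    assert(prescaler >= 2u && prescaler <= 256u);
    assert((prescaler & (prescaler - 1u)) == 0u); /* power of two */
    return kernel_hz / prescaler;
}

/*
 * Best-case payload rate in one direction: one bit per SCK cycle,
 * 8 bits per byte, no gaps between frames (an idealisation).
 */
static uint32_t spi_max_bytes_per_s(uint32_t sck_hz)
{
    return sck_hz / 8u;
}
```

With a 200 MHz kernel clock, a prescaler of 8 gives 25 MHz and a ceiling of 3.125 MB/s, which matches the roughly 3 MB/s I measured in DMA mode. A prescaler of 4 gives 50 MHz and 6.25 MB/s, and a prescaler of 2 would give 100 MHz and 12.5 MB/s.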