STM32F103 HAL_SPI_TransmitReceive throughput much lower than expected – non-continuous SCK

Zaeem-Ahmed
Associate II

Hello everyone,

I am using SPI2 of STM32F103RCT6 to communicate with an AT45DB641E DataFlash device. I am trying to understand why the effective SPI data rate is significantly lower than what I expect based on the configured SPI clock.

Hardware setup

  • MCU: STM32F103RCT6 (Cortex-M3)
  • SPI instance: SPI2
  • SPI clock: 9 MHz
  • Data frame: 8 bits
  • SPI mode: 2-line unidirectional (full-duplex hardware)
  • NSS: Software-controlled GPIO
  • Core clock: 72 MHz
  • Custom PCB (STM32 + AT45 connected directly)
  • Schematics attached

Test description

I am reading the status register of the AT45DB641E.
Once the read command is sent and SS is held low, the flash continuously outputs two status bytes as long as clock pulses are provided.

I measured the execution time of HAL_SPI_TransmitReceive() by placing timestamps before and after the function call, using the DWT cycle counter (DWT->CYCCNT) available on the Cortex-M3.

Here is the code used for testing. The tx and rx buffers are global uint8_t arrays of size 5000:

  /* Enable trace and debug block */
  CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
  /* Reset the cycle counter */
  DWT->CYCCNT = 0;
  /* Enable the cycle counter */
  DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

  uint32_t previous_count=0;
  uint32_t diff=0;
  tx_buffer[0] = status_reg_read_opcode;
  // starting to receive status byte continuously
  HAL_GPIO_WritePin(GPIOB, GPIO_PIN_12, GPIO_PIN_RESET);
  previous_count = DWT->CYCCNT;
  HAL_SPI_TransmitReceive(&hspi2, tx_buffer, rx_buffer, 5000, 1000);
  diff = DWT->CYCCNT - previous_count;
  HAL_GPIO_WritePin(GPIOB, GPIO_PIN_12, GPIO_PIN_SET);


static void MX_SPI2_Init(void)
{
  /* SPI2 parameter configuration*/
  hspi2.Instance = SPI2;
  hspi2.Init.Mode = SPI_MODE_MASTER;
  hspi2.Init.Direction = SPI_DIRECTION_2LINES;
  hspi2.Init.DataSize = SPI_DATASIZE_8BIT;
  hspi2.Init.CLKPolarity = SPI_POLARITY_LOW;
  hspi2.Init.CLKPhase = SPI_PHASE_1EDGE;
  hspi2.Init.NSS = SPI_NSS_SOFT;
  hspi2.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_4;
  hspi2.Init.FirstBit = SPI_FIRSTBIT_MSB;
  hspi2.Init.TIMode = SPI_TIMODE_DISABLE;
  hspi2.Init.CRCCalculation = SPI_CRCCALCULATION_DISABLE;
  hspi2.Init.CRCPolynomial = 10;
  if (HAL_SPI_Init(&hspi2) != HAL_OK)
  {
    Error_Handler();
  }
}


I have also attached my full main.c file.

Measured results

Below is the data I collected manually from the Expressions window in STM32CubeIDE (debug mode).

No. of bytes | Cycles taken | Actual time (us) | Calculated data rate (bytes/s) | Calculated time (us) | Time difference (us) | Scaling factor (actual / calculated)
1000         | 270711       | 3759.8           | 1125000                        | 888.8                | 2870.9               | 4.229
5000         | 1351484      | 18770.6          | 1125000                        | 4444.4               | 14326.2              | 4.223

From this, it is clear that the actual transfer time is ~4.2× slower than the theoretical SPI transfer time.
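
The arithmetic behind these figures can be reproduced with a few lines of host-side C. This is only a sketch: the helper names are made up for illustration, and it assumes the 72 MHz core clock and 9 MHz SCK stated above, with the ideal case being a gap-free SCK (8 clock periods per byte).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a DWT->CYCCNT difference to nanoseconds for a given core clock. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz != 0U);
    return ((uint64_t)cycles * 1000000000ULL) / core_hz;
}

/* Ideal transfer time in nanoseconds: 8 SCK periods per byte, no gaps. */
uint64_t ideal_transfer_ns(uint32_t n_bytes, uint32_t sck_hz)
{
    assert(sck_hz != 0U);
    return ((uint64_t)n_bytes * 8ULL * 1000000000ULL) / sck_hz;
}
```

For the 1000-byte measurement, `cycles_to_ns(270711, 72000000)` gives 3,759,875 ns and `ideal_transfer_ns(1000, 9000000)` gives 888,888 ns, a ratio of roughly 4.23.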

Logic analyzer observations

I then captured the SPI signals using a logic analyzer (the attached .sr file can be opened in PulseView).

Key observations:

  • The SCK is not continuous
  • The clock appears to be generated byte-by-byte
  • There is a noticeable gap between successive bytes, almost 2.9 us.
  • It looks like software is triggering the clock rather than the SPI peripheral running continuously (Please correct me if my interpretation is wrong.)

SPI_transmitReceive_API_5000bytes_waveform.PNG

My current understanding / assumptions

  • SPI clock is generated only when data is written to SPI->DR
  • To receive data, the master must transmit dummy bytes
  • The slave (AT45) only shifts data when clock is present

In HAL_SPI_TransmitReceive(), the while loop:

  • Polls TXE
  • Polls RXNE
  • Checks timeout

while ((hspi->TxXferCount > 0U) || (hspi->RxXferCount > 0U))
    {
      /* Check TXE flag */
      if ((__HAL_SPI_GET_FLAG(hspi, SPI_FLAG_TXE)) && (hspi->TxXferCount > 0U) && (txallowed == 1U))
      {
        *(__IO uint8_t *)&hspi->Instance->DR = *((const uint8_t *)hspi->pTxBuffPtr);
        hspi->pTxBuffPtr++;
        hspi->TxXferCount--;
        /* Next Data is a reception (Rx). Tx not allowed */
        txallowed = 0U;
      }

      /* Wait until RXNE flag is reset */
      if ((__HAL_SPI_GET_FLAG(hspi, SPI_FLAG_RXNE)) && (hspi->RxXferCount > 0U))
      {
        (*(uint8_t *)hspi->pRxBuffPtr) = hspi->Instance->DR;
        hspi->pRxBuffPtr++;
        hspi->RxXferCount--;
        /* Next Data is a Transmission (Tx). Tx is allowed */
        txallowed = 1U;
      }
      if ((((HAL_GetTick() - tickstart) >=  Timeout) && ((Timeout != HAL_MAX_DELAY))) || (Timeout == 0U))
      {
        hspi->State = HAL_SPI_STATE_READY;
        __HAL_UNLOCK(hspi);
        return HAL_TIMEOUT;
      }
    }
  }

These software checks introduce a delay between successive writes to SPI->DR. This delay causes gaps in SCK, reducing effective throughput.
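
To put a number on that overhead, the DWT measurements can be turned into a per-byte cycle budget. The sketch below is host-side C with hypothetical helper names; it assumes the 72 MHz core and 9 MHz SCK from the setup above. Note that in the loop quoted above, txallowed is cleared after every write and only set again after the received byte has been read, so the next byte cannot be queued while the current one is still shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Core cycles needed to shift one 8-bit frame when SCK runs without gaps. */
uint32_t cycles_per_byte_on_wire(uint32_t core_hz, uint32_t sck_hz)
{
    assert(sck_hz != 0U);
    return 8U * (core_hz / sck_hz);
}

/* Average core cycles spent per byte beyond the pure shifting time. */
uint32_t overhead_cycles_per_byte(uint32_t total_cycles, uint32_t n_bytes,
                                  uint32_t core_hz, uint32_t sck_hz)
{
    assert(n_bytes != 0U);
    uint32_t per_byte = total_cycles / n_bytes;
    uint32_t on_wire = cycles_per_byte_on_wire(core_hz, sck_hz);
    return (per_byte > on_wire) ? (per_byte - on_wire) : 0U;
}
```

With the measured values, one byte occupies the bus for 64 core cycles, while about 270 cycles elapse per byte, leaving roughly 206 cycles of software overhead. At 72 MHz that is about 2.86 us, in line with the gap of almost 2.9 us seen on the logic analyzer.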

My questions

  1. Why is the official STM32 HAL SPI API unable to keep the clock running continuously at the configured 9 MHz? Otherwise, why would it offer 9 MHz?
  2. Is SPI clock generation strictly tied to writes to SPI->DR?
  3. Is there any way (using HAL) to keep SCK running continuously while receiving data?
  4. Where can I study the internal hardware of the STM32F103 SPI peripheral in more detail like the complete logic diagram or gates circuit to check how the clock is controlled?
  5. Where can I study the hardware of the SPI peripheral of the STM32F103RCT6 in more detail than the reference manual? Maybe I can get a logic diagram or circuit diagram to see in more detail how the clock is being gated.
  6. What standard does the logic of the SPI hardware follow? Maybe I can go through some SPI standards and understand this behaviour.

Motivation

I understand this might seem like going into excessive detail, but this is purely for learning purposes. I want to understand how SPI really works at the hardware level — not just from an API point of view, but the actual mechanics behind clock generation, data shifting, and timing.

Any insights, corrections, or references would be greatly appreciated.

Thank you for your time.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Andrew Neil
Super User

Yes, this is a well-known "feature".

 


@Zaeem-Ahmed wrote:
  1. Why is the official STM32 HAL SPI API unable to keep the clock running continuously at the configured 9 MHz? Otherwise, why would it offer 9 MHz?

The HAL source is available for you to inspect.

You will see that HAL_SPI_TransmitReceive() does quite a lot of work between the end of one byte* and the start of the next.

 You can improve matters by increasing the compiler optimisation level.

For maximum throughput, consider using DMA.

 

* Or whatever size transfer you are doing

 


@Zaeem-Ahmed wrote:

2. Is SPI clock generation strictly tied to writes to SPI->DR?


Yes.

This is fundamental to the operation of SPI; not specific to STM32: there is exactly one clock per bit transmitted.

If no bit is transmitted, there is no clock.

 


@Zaeem-Ahmed wrote:

3. Is there any way (using HAL) to keep SCK running continuously while receiving data?


No - see above

 


@Zaeem-Ahmed wrote:

4. Where can I study the internal hardware of the STM32F103 SPI peripheral in more detail like the complete logic diagram or gates circuit to check how the clock is controlled?

5. Where can I study the hardware of the SPI peripheral of the STM32F103RCT6 in more detail than the reference manual? Maybe I can get a logic diagram or circuit diagram to see in more detail how the clock is being gated.


The Reference Manual for the chip is the definitive document - ST (like other manufacturers) don't publish internal details.

You could try looking on the 'Documentation' tab of the Product Page to see if there are any useful Application Notes...

https://www.st.com/en/microcontrollers-microprocessors/stm32f103rc.html#documentation

 


@Zaeem-Ahmed wrote:

6. What standard does the logic of the SPI hardware follow? Maybe I can go through some SPI standards and understand this behaviour.

SPI is a long-established and widely used standard. See, e.g.,

https://en.wikipedia.org/wiki/Serial_Peripheral_Interface

 


@Zaeem-Ahmed wrote:

I am using SPI2 of STM32F103RCT6


Are you sure it's a genuine ST chip? 

This part is widely cloned by third parties, and sometimes clones are passed off as ST originals.

A complex system that works is invariably found to have evolved from a simple system that worked.
A complex system designed from scratch never works and cannot be patched up to make it work.


11 REPLIES 11
Ozone
Principal III

> 2. Is SPI clock generation strictly tied to writes to SPI->DR?

Yes. SCLK is directly tied to an ongoing transmission.

> 3. Is there any way (using HAL) to keep SCK running continuously while receiving data?

Find out what the HAL code does in between the transmissions.
As mentioned, the code is available for inspection.

I don't know what you exactly want to achieve in your application.
But check the SPI slave datasheet (AT45DB641), and see if you can organize read/write transfers in blocks, and initiate them with DMA.
Most memory device slaves have some "read block" command, where the master only transmits the start address and the slave returns values from incrementing addresses in the following reads, until /SS is deasserted.

Hi Andrew, thanks for your reply.
This is a custom board, and the board manufacturer populated the ST chip. I don't remember the link provided to the PCB manufacturer for this chip, so I can't yet confirm whether the chip is original, but I'll verify it and let you know.
Most probably it is original, but is there anything in the observed behaviour that made you think my chip might be a copy?

Hi, thanks for your reply,
I have littleFS running on my flash, and the library I used provides SPI interface function prototypes that I implemented using the ST functions.
LittleFS stores its metadata on the flash and performs dozens of writes/reads before actually storing/reading the required data.
In short, I want to make SPI transmission and reception as fast as I can using this controller. That is why I developed this test, to see how things actually work.


@Zaeem-Ahmed wrote:

is there anything in the observed behaviour that made you think that my chip might be a copy? 


No - the delay you describe is down to the HAL software overhead.


> LittleFS stores its metadata on the flash and performs dozens of writes/reads before actually storing/reading the required data.

If Flash/EEPROM writes (on the SPI slave) are involved, the issue might be a bit different.
Erase/program takes quite some time, usually a few milliseconds per element (byte, word), up to hundreds of milliseconds per sector or chip.
Your code probably uses a fixed delay; an SPI read to check a busy flag on the slave would show up in the trace.

But if that is the case, you need to check the datasheet. The device might refuse some or all read/write accesses during an ongoing operation. Or you might be able to access other addresses in the meantime (for an EEPROM-like device).
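
As an illustration of polling a busy flag instead of using a fixed delay, here is a minimal host-testable sketch. It assumes that bit 7 of the first status byte is the RDY/BUSY flag (1 = ready), which should be checked against the AT45DB641E datasheet. The callback type, function names, and the fake reader are hypothetical; on the target, the callback would perform one status register read over SPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AT45_STATUS_RDY_MASK 0x80U  /* assumed: bit 7 of status byte 1, 1 = ready */

/* Hypothetical callback: does one status register read, returns the first status byte. */
typedef uint8_t (*read_status_fn)(void *ctx);

/* Poll the status byte until the device reports ready or max_polls is used up. */
bool at45_wait_ready(read_status_fn read_status, void *ctx, uint32_t max_polls)
{
    assert(read_status != NULL);
    for (uint32_t i = 0U; i < max_polls; ++i) {
        if ((read_status(ctx) & AT45_STATUS_RDY_MASK) != 0U) {
            return true;
        }
    }
    return false;  /* still busy after max_polls reads */
}

/* Host-side stand-in for the SPI read: reports busy until the counter reaches 0. */
uint8_t fake_read_status(void *ctx)
{
    uint32_t *remaining_busy = ctx;
    if (*remaining_busy > 0U) {
        (*remaining_busy)--;
        return 0x00U;
    }
    return AT45_STATUS_RDY_MASK;
}
```

Bounding the number of polls keeps a missing or faulty device from hanging the caller; a time-based limit would work equally well.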

Thanks for your valuable insights. Yes, the other delays do exist, but according to the datasheet the environment in which I performed the tests is free of them.
For now I am focusing on minimizing the delay, assuming there is none from the flash (status register read), and on making the bare SPI run at its maximum speed.

Hi Andrew,
I want to use the SPI with DMA, and the same concept should apply there. Even if we are only receiving data over SPI, we still have to send data to keep the clock running (in SPI master mode), right?
I was confident about this until I read this misleading line in the STM32F1 reference manual (RM0008):
"When the SPI is used only to receive data, it is possible to enable only the SPI Rx DMA channel."
I quoted this line from section 25.3.9, "SPI communication using DMA", on page 719. Is this a typo, or is it written in a different context?

 

> I was confident about this until I read this misleading line in the STM32F1 reference manual (RM0008): "When the SPI is used only to receive data, it is possible to enable only the SPI Rx DMA channel."

I would recommend reading a good tutorial about SPI and the basic concepts behind it.
The "S" stands for synchronous, which means that transmission and reception always happen at the same time, and are driven by the same clock signal.

The terms "transmission" and "reception" always relate to the one device you look at (master or slave). 
It is reversed for the other side, hence the signal names "MOSI" and "MISO".
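
The simultaneous nature of the exchange can be shown with a toy model of two 8-bit shift registers. This is plain host-side C, not STM32 code; the function name is made up, and it simply assumes MSB-first shifting with one bit moving in each direction per clock.

```c
#include <assert.h>
#include <stdint.h>

/* Clock 8 bits through two connected shift registers, MSB first.
 * On every clock the master's top bit goes out on MOSI while the
 * slave's top bit comes in on MISO, so both directions move together. */
void spi_exchange_byte(uint8_t *master_sr, uint8_t *slave_sr)
{
    assert(master_sr != 0 && slave_sr != 0);
    for (int bit = 0; bit < 8; ++bit) {
        uint8_t mosi = (uint8_t)((*master_sr >> 7) & 1U);
        uint8_t miso = (uint8_t)((*slave_sr >> 7) & 1U);
        *master_sr = (uint8_t)((*master_sr << 1) | miso);
        *slave_sr = (uint8_t)((*slave_sr << 1) | mosi);
    }
}
```

After eight clocks the two registers have swapped contents. This is why a master that only wants to read still has to shift something out, even if it is a dummy byte.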

To come back to the RM statement above: of course you can "drop" the unused DMA direction; just don't configure or enable the related functionality.
I have an application with an F303 as SPI slave, where this device only transmits to the master. Thus I only set up Tx DMA.

I would suggest to look for a SPI / DMA example project for your MCU, because the configuration can get a bit tricky.
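
As a rough starting point, a full-duplex DMA transfer with the HAL might look like the sketch below. It builds only inside an STM32F1 HAL project and has not been run on the board from this thread. It assumes that both SPI2 DMA channels (Rx and Tx) have been configured and linked to hspi2, for example through CubeMX, and that their interrupts are enabled. The wrapper name spi2_transfer_dma and the two flags are made up for illustration; the HAL calls and callbacks are the standard ones. In this full-duplex master setup the Tx channel is what feeds the bytes that generate SCK.

```c
#include "stm32f1xx_hal.h"

extern SPI_HandleTypeDef hspi2;          /* set up by MX_SPI2_Init() */

/* Hypothetical flags, set from the HAL callbacks below. */
static volatile uint8_t spi2_xfer_done;
static volatile uint8_t spi2_xfer_error;

/* Called by the HAL when a TransmitReceive_DMA transfer has completed. */
void HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *hspi)
{
    if (hspi->Instance == SPI2) {
        spi2_xfer_done = 1U;
    }
}

/* Called by the HAL on an SPI or DMA error. */
void HAL_SPI_ErrorCallback(SPI_HandleTypeDef *hspi)
{
    if (hspi->Instance == SPI2) {
        spi2_xfer_error = 1U;
    }
}

/* Blocking wrapper for illustration: starts the DMA transfer and waits for it. */
HAL_StatusTypeDef spi2_transfer_dma(uint8_t *tx, uint8_t *rx, uint16_t len,
                                    uint32_t timeout_ms)
{
    HAL_StatusTypeDef status;
    uint32_t start = HAL_GetTick();

    spi2_xfer_done = 0U;
    spi2_xfer_error = 0U;

    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_12, GPIO_PIN_RESET);  /* assert chip select */
    status = HAL_SPI_TransmitReceive_DMA(&hspi2, tx, rx, len);

    while ((status == HAL_OK) && (spi2_xfer_done == 0U)) {
        if (spi2_xfer_error != 0U) {
            status = HAL_ERROR;
        } else if ((HAL_GetTick() - start) >= timeout_ms) {
            (void)HAL_SPI_DMAStop(&hspi2);                   /* give up on timeout */
            status = HAL_TIMEOUT;
        }
    }

    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_12, GPIO_PIN_SET);    /* release chip select */
    return status;
}
```

With the buffers from the original test, the call would be something like spi2_transfer_dma(tx_buffer, rx_buffer, 5000, 1000). In a real application the wait loop would normally be replaced by other work, with the completion handled in the callback.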