cancel
Showing results for 
Search instead for 
Did you mean: 

32F417 SPI running at half the speed it should

PHolt.1
Senior II

Hello,

I am using a chopped-down function from the Cube HAL code to talk to a serial FLASH. The original function has a mass of options which were checked at runtime - the usual thing for HAL code...

It's all working fine. SPI2, 21MHz (the fastest I can go with PCLK1=PCLK2=42MHz).

But testing the speed, I am seeing 600us time taken to read 512 bytes. The SPI speed limit should be about 200us. The actual time, obtained by waggling a pin, is 600us. Can anyone see anything obviously wrong with this code? As I say, it works perfectly.

// Used only for SPI2. Mode is always SPI_MODE_MASTER and 2LINE.
 
//__attribute__((optimize("O0")))
HAL_StatusTypeDef B_HAL_SPI_TransmitReceive(SPI_HandleTypeDef *hspi, uint8_t *pTxData, uint8_t *pRxData, uint16_t Size)
{
  uint16_t             initial_TxXferCount;
  uint32_t             tmp_mode;
  HAL_SPI_StateTypeDef tmp_state;
 
  /* Variable used to alternate Rx and Tx during transfer */
  uint32_t             txallowed = 1U;
  HAL_StatusTypeDef    errorcode = HAL_OK;
 
  /* Init temporary variables */
  tmp_state           = hspi->State;
  tmp_mode            = hspi->Init.Mode;
  initial_TxXferCount = Size;
 
  if (!((tmp_state == HAL_SPI_STATE_READY) || \
        ((tmp_mode == SPI_MODE_MASTER) && (hspi->Init.Direction == SPI_DIRECTION_2LINES) && (tmp_state == HAL_SPI_STATE_BUSY_RX))))
  {
    errorcode = HAL_BUSY;
    goto error;
  }
 
  if ((pTxData == NULL) || (pRxData == NULL) || (Size == 0U))
  {
    errorcode = HAL_ERROR;
    goto error;
  }
 
  /* Don't overwrite in case of HAL_SPI_STATE_BUSY_RX */
  if (hspi->State != HAL_SPI_STATE_BUSY_RX)
  {
    hspi->State = HAL_SPI_STATE_BUSY_TX_RX;
  }
 
  /* Set the transaction information */
  hspi->ErrorCode   = HAL_SPI_ERROR_NONE;
  hspi->pRxBuffPtr  = (uint8_t *)pRxData;
  hspi->RxXferCount = Size;
  hspi->RxXferSize  = Size;
  hspi->pTxBuffPtr  = (uint8_t *)pTxData;
  hspi->TxXferCount = Size;
  hspi->TxXferSize  = Size;
 
  /*Init field not used in handle to zero */
  hspi->RxISR       = NULL;
  hspi->TxISR       = NULL;
 
  /* Check if the SPI is already enabled */
  if ((hspi->Instance->CR1 & SPI_CR1_SPE) != SPI_CR1_SPE)
  {
    __HAL_SPI_ENABLE(hspi);
  }
 
  	/* Transmit and Receive data in 8 Bit mode */
 
    // The need for this initial byte is unknown
    if (initial_TxXferCount == 0x01U)
    {
      *((__IO uint8_t *)&hspi->Instance->DR) = (*hspi->pTxBuffPtr);
      hspi->pTxBuffPtr++;
      hspi->TxXferCount--;
    }
 
    while ((hspi->TxXferCount > 0U) || (hspi->RxXferCount > 0U))
    {
      /* Check TXE flag */
      if ((__HAL_SPI_GET_FLAG(hspi, SPI_FLAG_TXE)) && (hspi->TxXferCount > 0U) && (txallowed == 1U))
      {
        *(__IO uint8_t *)&hspi->Instance->DR = (*hspi->pTxBuffPtr);
        hspi->pTxBuffPtr++;
        hspi->TxXferCount--;
        /* Next Data is a reception (Rx). Tx not allowed */
        txallowed = 0U;
      }
 
      /* Wait until RXNE flag is reset */
      if ((__HAL_SPI_GET_FLAG(hspi, SPI_FLAG_RXNE)) && (hspi->RxXferCount > 0U))
      {
        (*(uint8_t *)hspi->pRxBuffPtr) = hspi->Instance->DR;
        hspi->pRxBuffPtr++;
        hspi->RxXferCount--;
        /* Next Data is a Transmission (Tx). Tx is allowed */
        txallowed = 1U;
      }
 
    }
 
    // Clear overrun flag in 2 Lines communication mode because received is not read
    // For 45DBxx the Init.Direction is always SPI_DIRECTION_2LINES and is set up in b_main.c
    //if (hspi->Init.Direction == SPI_DIRECTION_2LINES)
    //{
    	__HAL_SPI_CLEAR_OVRFLAG(hspi);
    //}
 
error :
  hspi->State = HAL_SPI_STATE_READY;
  return errorcode;
}

25 REPLIES 25
TDK
Guru

Probably just the code unable to keep up with the rate at which the peripheral needs data. Examining the SPI clock line would confirm.

Higher optimization should change the result, or more streamlined code.

If you feel a post has answered your question, please click "Accept as Solution".

For high data rates use DMA.

> // The need for this initial byte is unknown

This is part of the problem. The second part is txallowed.

First stems from attempts of the authors to achieve maximum throughput by keeping Tx full, second is correction of the problem with Rx overflow which results from the first, resulting in certain cases in underutilizing the Tx buffer and sometimes (probably in your case) even in long gaps.

This in addition to poor optimization and/or other loads in the system. 

If you want to understand the code above and its limitations, toggle a GPIO at various places of that code, and observe together with SPI waveform on a LA.

JW

PHolt.1
Senior II

Is there a correct version of this function somewhere?

I don't think it is the CPU not keeping up, because the CPU is running at 168MHz, while the SPI is doing only 2 megabytes/sec.

It seems certain that at various points they are mis-using the double buffering in the USART, and allowing the whole TX to become empty while waiting for data, or vice versa.

I checked the SPI2 clock; it is 21MHz. I measured the 600us, very consistently, across this block

0693W00000HrLTXQA3.png

> Is there a correct version of this function somewhere?

It *is* correct, it's just it's lengthy. For high data rates use DMA.

> I don't think it is the CPU not keeping up

Show us the waveforms.

JW

PHolt.1
Senior II

It is simply impossible for the CPU to not be keeping up in such a tight loop. I watch the GPIO on a digital scope - just over 600us, with occassional longer ones due to interrupts, RTOS, etc. That's while I have a RTOS task doing 100k reads of the same page of the serial flash, so I get a decent trace.

I can show you that waveform but it is just a 600us pulse.

It's a factor of 3.

I think something funny is happening and it is probably waiting on the wrong thing. But I can't quite unravel the code. I've been doing embedded since 1970s (assembler) so I know how a UART works, but this code is just weird. I know with SPI you have to shift "something" out to generate clocks to receive the bit you want to receive, which produces some weird code, but this one I don't get, especially the 1-byte case. I posted about that bit before (here or EEBLOG) and nobody knew what it did.

You mentioned "sometimes (probably in your case) even in long gaps." which is probably exactly what is happening.

The bottleneck must be the long loop to receive the 512 bytes. That should just be sending out 512 bytes of "anything" to generate the 512x8 clock pulses, but you have to wait for TX to have room before you can load the next byte, but that is all that you need to do with TX. The RX data should be read out whenever there is anything in the RX buffer. The code must be doing this otherwise it would not work at all.

I think the problem is in the order in which they test TX space and then test RX data available. One should always test both. If TX has space, chuck a byte in there. If RX has a byte, get it out. there is absolutely nothing to lose doing that. I suspect they are wasting one byte's time for every two. But I can't quite get my head around it...

PHolt.1
Senior II

I've been looking at the code some more and I think they are doing exactly what I say they should be doing : - )

They have a while() loop running until both counters go to zero.

They have if() checks on tx buffer having space and rx buffer daving data, so these tests will always happen every time around the loop, so that loop is running solidly, not hanging anywhere.

What I don't get is why have the toggling flag "txallowed". It should not be needed. There is never any harm in checking either TX or RX. But if I remove that check, it hangs.

Basically what they seem to be doing is not generating clocks (which is what TX does) until an RX byte has been received. I might toggle a GPIO there and see what frequency that is happening at.

> I can show you that waveform but it is just a 600us pulse.

I mean, SCK.

JW

PHolt.1
Senior II

Just checked the clock on a scope. Definitely 21MHz. 47ns period.

Removing the txallowed test hangs it, which is curious. I wonder if SPI_FLAG_TXE or SPI_FLAG_RXNE are doing something strange. May be related to them testing the one-byte case.

BTW I do wish I could run SPI2 at 42MHz (the serial flash will run faster than that) but there seems no way if the two APB clocks are both 42MHz (and I need that for the desired baud rate range with the four UARTs I am using). And SPI1 is lost due to pin allocation...

> Definitely 21MHz.

Continuous, uninterrupted? Then you are not transferring 512 bytes.

Otherwise, there are several possibilities:

  • there are delays between bytes
  • there are ocassional lengthy delays caused by interrupts
  • combination of both

You should see this with carefully observing the waveform. Not just the beginning of it.

JW