Low Level SPI transaction on STM32F439z

CMott · ‎2019-03-07

I've been messing with the LL drivers for SPI and have run into some confusion.

I have HCLK set to 100 MHz and PCLK1 set to 25 MHz.

I set up the bus using the code generated by STM32CubeMX:

static void MX_SPI1_Init(void)
{
 
  /* USER CODE BEGIN SPI1_Init 0 */
 
  /* USER CODE END SPI1_Init 0 */
 
  LL_SPI_InitTypeDef SPI_InitStruct = {0};
 
  LL_GPIO_InitTypeDef GPIO_InitStruct = {0};
 
  /* Peripheral clock enable */
  LL_APB2_GRP1_EnableClock(LL_APB2_GRP1_PERIPH_SPI1);
  
  LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_GPIOA);
  /**SPI1 GPIO Configuration  
  PA5   ------> SPI1_SCK
  PA6   ------> SPI1_MISO 
  */
  GPIO_InitStruct.Pin = LL_GPIO_PIN_5|LL_GPIO_PIN_6;
  GPIO_InitStruct.Mode = LL_GPIO_MODE_ALTERNATE;
  GPIO_InitStruct.Speed = LL_GPIO_SPEED_FREQ_VERY_HIGH;
  GPIO_InitStruct.OutputType = LL_GPIO_OUTPUT_PUSHPULL;
  GPIO_InitStruct.Pull = LL_GPIO_PULL_NO;
  GPIO_InitStruct.Alternate = LL_GPIO_AF_5;
  LL_GPIO_Init(GPIOA, &GPIO_InitStruct);
 
  /* USER CODE BEGIN SPI1_Init 1 */
 
  /* USER CODE END SPI1_Init 1 */
  /* SPI1 parameter configuration*/
  SPI_InitStruct.TransferDirection = LL_SPI_FULL_DUPLEX;
  SPI_InitStruct.Mode = LL_SPI_MODE_MASTER;
  SPI_InitStruct.DataWidth = LL_SPI_DATAWIDTH_16BIT;
  SPI_InitStruct.ClockPolarity = LL_SPI_POLARITY_LOW;
  SPI_InitStruct.ClockPhase = LL_SPI_PHASE_2EDGE;
  SPI_InitStruct.NSS = LL_SPI_NSS_SOFT;
  SPI_InitStruct.BaudRate = LL_SPI_BAUDRATEPRESCALER_DIV2;
  SPI_InitStruct.BitOrder = LL_SPI_MSB_FIRST;
  SPI_InitStruct.CRCCalculation = LL_SPI_CRCCALCULATION_DISABLE;
  SPI_InitStruct.CRCPoly = 10;
  LL_SPI_Init(SPI1, &SPI_InitStruct);
  LL_SPI_SetStandard(SPI1, LL_SPI_PROTOCOL_MOTOROLA);
  /* USER CODE BEGIN SPI1_Init 2 */
 
  /* USER CODE END SPI1_Init 2 */
 
}

I then enable the bus in main

LL_SPI_Enable(SPI1);

I then run a tight loop

 while (1)
  {
    /* USER CODE END WHILE */
 
	HAL_GPIO_WritePin(SPI1_CS_GPIO_Port, SPI1_CS_Pin, GPIO_PIN_RESET);
	SPI1_Receive();
	HAL_GPIO_WritePin(SPI1_CS_GPIO_Port, SPI1_CS_Pin, GPIO_PIN_SET);
    /* USER CODE END WHILE */
}

Which finally calls my spi function

static inline void SPI1_Receive(void){
	while(!LL_SPI_IsActiveFlag_TXE(SPI1));
	LL_SPI_TransmitData16(SPI1,0xA0A0);
	while(!LL_SPI_IsActiveFlag_RXNE(SPI1));
	LL_SPI_ReceiveData16(SPI1);
}

Note that I have removed all data handling to rule out any issues there.

I based this code on figure 253 of RM0090.

https://www.st.com/content/ccc/resource/technical/document/reference_manual/3d/6d/5a/66/b4/99/40/d4/DM00031020.pdf/files/DM00031020.pdf/jcr:content/translations/en.DM00031020.pdf

I end up with a 2.5 us delay after 3 transmissions. Don't worry about the signal level of the clock looking weird, I'm using a probe with a long grounding lead that's coupling to a lot of noise.

I ran into this problem originally when I tried hiding the latency of two consecutive transactions by sending two transmissions and then gathering the data afterward.

static void SPI1_Receive_2_pipelined(void){
	while(!LL_SPI_IsActiveFlag_TXE(SPI1));
	LL_SPI_TransmitData16(SPI1,0xA0A0);
	while(!LL_SPI_IsActiveFlag_TXE(SPI1));
	LL_SPI_TransmitData16(SPI1,0xA0A0);
	while(!LL_SPI_IsActiveFlag_RXNE(SPI1));
	LL_SPI_ReceiveData16(SPI1);
	while(!LL_SPI_IsActiveFlag_RXNE(SPI1));
	LL_SPI_ReceiveData16(SPI1);
}

As a result, after the first 4 transmissions, every other transmission shows a long delay between its two data words. This significantly reduces the timing slack for picking up the first data word before it gets overwritten. It also modulates the sampling rate of the SPI peripheral, which I wanted to be as deterministic and stable as possible.

Eventually the systick handler fires during this slack time and I miss the first word of data, deadlocking on the second while loop.

What is causing this delay? I went with a pure blocking implementation since it was simple to debug, and I need to toggle CS to reset the state machine of the SPI slave. Would a DMA-based implementation prevent this? I am unfamiliar with this DMA engine, and wanted to check that I'm not missing a simple solution before diving into that.
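Independently of the root cause, the hang itself can be turned into a reportable error by bounding the wait. The following is a minimal sketch, not part of the LL drivers: `wait_flag_bounded`, `flag_fn`, and `flag_after_n` are hypothetical names, and the poll budget is an arbitrary assumption. On the target, the predicate would wrap something like `LL_SPI_IsActiveFlag_RXNE(SPI1)`; here a fake flag stands in so the logic can be exercised on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate type: returns true once the awaited flag is set. */
typedef bool (*flag_fn)(void *ctx);

/* Poll the flag at most max_polls times.
   Returns true if the flag was seen, false if the budget ran out. */
static bool wait_flag_bounded(flag_fn flag, void *ctx, uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (flag(ctx)) {
            return true;
        }
    }
    return false; /* caller can release CS and restart the transaction */
}

/* Fake flag for host-side experiments: reads false until the counter
   behind ctx has been decremented to zero, then reads true. */
static bool flag_after_n(void *ctx)
{
    uint32_t *remaining = (uint32_t *)ctx;
    if (*remaining == 0u) {
        return true;
    }
    --*remaining;
    return false;
}
```

With a bounded wait, a missed word shows up as a `false` return instead of a permanent stall in the second `while` loop, which makes the failure easier to count and to correlate with interrupt activity.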

Bob S · ‎2019-03-08

In the first test, where you sent/received one 16-bit value at a time, maybe you could toggle another GPIO in the while() loop waiting for the RXNE flag. Or pulse a GPIO after you have read the RX data. That will tell you if the delay that you see from the SPI clock burst to the rising edge of the chip select is due to some (unexplained) delay in RX data available, or somewhere else.

The image from the first test also looks suspicious regarding the duration of the clock burst. In the first test, the time scale is 1us/div, and the duration looks to be around 3/5 of a division, or 0.8us. In the second test the time scale is 2.5us/div, and what I presume is the full 16-bit clock burst is about 3/5 of one division, which is about 1.5us. With PCLK1 = 25MHz and SPI prescaler DIV2 I would expect around 1.3us for the 16-bit clock burst. So the 2nd image makes sense. Did you change SPI clock frequencies between these tests?

CMott · ‎2019-03-08

Double-checking, I realized that the SPI bus is clocked by PCLK2, which was set to 50 MHz. The clock period is 40 ns for all traces, so a 16-bit transaction would be about 700 ns. In the second example I was attempting two 16-bit SPI transactions back to back, which is why most transactions look like they are about 1.4 us. However, every third trace splits those two transactions. Good point on using more GPIO for debug. I'll give that a try.
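The burst-length arithmetic discussed here can be checked with a small helper. This is only a sketch: `spi_frame_time_ns` is a hypothetical name, not an LL function, and the prescaler is passed as its plain numeric value (2 for `LL_SPI_BAUDRATEPRESCALER_DIV2`).

```c
#include <stdint.h>

/* Duration of one frame's clock burst in nanoseconds.
   pclk_hz   : clock feeding the SPI peripheral (PCLK2 for SPI1 here)
   prescaler : numeric baud-rate divider (2, 4, ..., 256)
   bits      : frame length in bits (8 or 16)
   SCK period = prescaler / pclk_hz, so a frame lasts bits * prescaler / pclk_hz. */
static uint64_t spi_frame_time_ns(uint32_t pclk_hz, uint32_t prescaler, uint32_t bits)
{
    return (1000000000ULL * prescaler * bits) / pclk_hz;
}
```

For PCLK2 = 50 MHz, a divider of 2 and 16-bit frames this gives 640 ns per frame; the same call with 25 MHz gives 1280 ns, which is the figure expected had SPI1 been clocked from PCLK1.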

For now I disabled interrupts in the critical section of the code, which prevents the deadlock. I also added a timer-based interrupt that sets a flag that the main while loop blocks on. I get fairly consistent samples down to a period of 6.5 us. Below that I start to see much greater variability.

S.Ma · ‎2019-03-08

Feels wrong. Ok, this is the SPI Gen1 without FIFO.

When the SPI is ready to digest the next data to transmit, TXE is set.

When all the SPI clocks have been sent, the incoming RXNE flag is set.

TXE is first, RXNE is later.

With TXE you can manage to write DR while the previous written data is still being pushed.

With RXNE set, the communication has idled.

If you are looking at maximum throughput, you must use DMA TX AND RX channels on 2 different buffers (transmit and receive simultaneously).

In the SPI V2 (like in STM32L4), there is a FIFO that in some cases improves throughput even when DMA is not available.
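The TXE/RXNE behaviour described in this post can be illustrated with a deliberately simplified software model of a FIFO-less SPI: one TX buffer, one shift register, one RX buffer. All names (`spi_model`, `model_write`, and so on) are hypothetical, and the overrun rule is an assumption based on my reading of RM0090: if a frame completes while the previous word is still unread, the new word is dropped and an overrun flag is set. It is a host-side sketch, not a description of the silicon.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     tx_buf_full; /* TXE is the inverse of this */
    bool     shifting;    /* a frame is currently on the wire */
    bool     rx_full;     /* RXNE */
    bool     overrun;     /* OVR */
    uint16_t tx_buf, shift_out, rx_buf;
} spi_model;

static bool model_txe(const spi_model *s) { return !s->tx_buf_full; }

/* Write DR: the word goes straight to the shift register if it is idle,
   otherwise it waits in the single TX buffer. */
static void model_write(spi_model *s, uint16_t v)
{
    if (!s->shifting) { s->shift_out = v; s->shifting = true; }
    else              { s->tx_buf = v;    s->tx_buf_full = true; }
}

/* One frame time elapses; rx is the word sampled from MISO. */
static void model_frame_done(spi_model *s, uint16_t rx)
{
    if (!s->shifting) return;
    if (s->rx_full) s->overrun = true;            /* unread word: new one is lost */
    else { s->rx_buf = rx; s->rx_full = true; }
    if (s->tx_buf_full) { s->shift_out = s->tx_buf; s->tx_buf_full = false; }
    else s->shifting = false;                     /* nothing queued: bus idles */
}

/* Read DR: returns the buffered word and clears RXNE. */
static uint16_t model_read(spi_model *s) { s->rx_full = false; return s->rx_buf; }
```

In this model, two back-to-back writes followed by two completed frames with no read in between leave only one word in the RX buffer. After that single read, RXNE stays clear, so a second `while (!RXNE)` loop would never exit, which is consistent with the deadlock reported in the original post.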

SKALE · ‎2019-03-22

Hi,

I am getting the same issue. I receive correct data when I use 2 transmits.

Actually, when I checked on a DSO, the SPI clock gets disabled while receiving data. I am using PCLK2 = 16 MHz.

Please suggest some solution.

Thank You.

waclawek.jan · ‎2019-03-22

Post complete compilable program exhibiting the problem.

JW

S.Ma · ‎2019-03-22

Have you tried to run it with interrupt disabled?

SKALE · ‎2019-03-24

Yes, but it shows the same issue. If I transmit dummy data in a second transmit, it also gives correct data. I think it needs clocks after the transmission to receive data, which the receive command is unable to generate.
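That observation fits how a full-duplex SPI master works: SCK only runs while the master is shifting a frame out, so every word to be received needs a write, even if the written value is a dummy. A minimal sketch follows, assuming a slave that answers a command in the following frame. `spi_xfer16_fn`, `spi_command_then_read`, `SPI_DUMMY_WORD`, and the fake slave are all hypothetical names for illustration; on the target, the transfer callback would wrap the TXE wait, the data write, the RXNE wait, and the data read.

```c
#include <stdint.h>

/* Hypothetical full-duplex transfer: sends tx, returns the word clocked in. */
typedef uint16_t (*spi_xfer16_fn)(void *ctx, uint16_t tx);

#define SPI_DUMMY_WORD 0xFFFFu /* assumption: the slave ignores this value */

/* Send a command, then clock out one dummy word to fetch the reply. */
static uint16_t spi_command_then_read(spi_xfer16_fn xfer, void *ctx, uint16_t cmd)
{
    (void)xfer(ctx, cmd);             /* frame 1: command out, stale data in */
    return xfer(ctx, SPI_DUMMY_WORD); /* frame 2: dummy out, reply in */
}

/* Fake slave for host-side experiments: in each frame it shifts out the
   bitwise inverse of the word it received in the previous frame. */
typedef struct { uint16_t pending; } fake_slave;

static uint16_t fake_slave_xfer(void *ctx, uint16_t tx)
{
    fake_slave *s = (fake_slave *)ctx;
    uint16_t out = s->pending;
    s->pending = (uint16_t)~tx;
    return out;
}
```

Whether this matches a particular device depends on its protocol, so the slave's datasheet timing diagram is the thing to check.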