STM32H743XI Multi SPI Circular DMA with GPIO manipulation

CMA · 2020-09-12

Hi, I want to achieve the behaviour in the title: communicating with multiple SPI sensors on 5 SPI buses at the highest possible throughput and with minimal CPU load. I first tried the following with 1 bus.

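void cpltCB(SPI_HandleTypeDef *hspi);
uint8_t RXBUFF[8];

int main(void) {
   ...
   // requires USE_HAL_SPI_REGISTER_CALLBACKS = 1 in stm32h7xx_hal_conf.h
   HAL_SPI_RegisterCallback(&hspi1, HAL_SPI_RX_COMPLETE_CB_ID, cpltCB);
   HAL_SPI_Receive_DMA(&hspi1, RXBUFF, 1); // SPI and DMA set up to transfer by WORD in a circular manner
   ...
}

// HAL passes the SPI handle to every registered callback
void cpltCB(SPI_HandleTypeDef *hspi) {
   static uint32_t sensorID;
   (void)hspi; // only hspi1 uses this callback

   switch (sensorID) {
      case 0:
      HAL_GPIO_WritePin(SENSOR0_CS_PORT, SENSOR0_CS_PIN, GPIO_PIN_SET);   // deselect sensor 0 (CS high)
      // do something with the WORD
      HAL_GPIO_WritePin(SENSOR1_CS_PORT, SENSOR1_CS_PIN, GPIO_PIN_RESET); // select sensor 1 (CS low)
      sensorID = 1;
      break;

      case 1:
      HAL_GPIO_WritePin(SENSOR1_CS_PORT, SENSOR1_CS_PIN, GPIO_PIN_SET);   // deselect sensor 1 (CS high)
      // do something with the WORD
      HAL_GPIO_WritePin(SENSOR0_CS_PORT, SENSOR0_CS_PIN, GPIO_PIN_RESET); // select sensor 0 (CS low)
      sensorID = 0;
      break;

      default:
      sensorID = 0;
      break;
   }
}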

I noticed that a few clock cycles pass before the cpltCB interrupt fires, so the received bytes end up shifted later in the array by a random amount, depending on when the interrupt runs. I then tried the following:

void halfCpltCB(SPI_HandleTypeDef *hspi);
void cpltCB(SPI_HandleTypeDef *hspi) { (void)hspi; } // never called in this implementation
uint8_t RXBUFF[8];

int main(void) {
   ...
   HAL_SPI_RegisterCallback(&hspi1, HAL_SPI_RX_HALF_COMPLETE_CB_ID, halfCpltCB);
   HAL_SPI_RegisterCallback(&hspi1, HAL_SPI_RX_COMPLETE_CB_ID, cpltCB);
   HAL_SPI_Receive_DMA(&hspi1, RXBUFF, 2); // 2 WORDS this time
   ...
}

void halfCpltCB(SPI_HandleTypeDef *hspi) {
   static uint32_t sensorID;
   (void)hspi; // only hspi1 uses this callback

   __HAL_DMA_DISABLE(hspi1.hdmarx); // disables DMA
   switch (sensorID) {
      case 0:
      HAL_GPIO_WritePin(SENSOR0_CS_PORT, SENSOR0_CS_PIN, GPIO_PIN_SET);   // drive CS high when 1 WORD is received
      // do something with the WORD
      HAL_GPIO_WritePin(SENSOR1_CS_PORT, SENSOR1_CS_PIN, GPIO_PIN_RESET); // select the other slave
      sensorID = 1;
      break;

      case 1:
      HAL_GPIO_WritePin(SENSOR1_CS_PORT, SENSOR1_CS_PIN, GPIO_PIN_SET);   // drive CS high when 1 WORD is received
      // do something with the WORD
      HAL_GPIO_WritePin(SENSOR0_CS_PORT, SENSOR0_CS_PIN, GPIO_PIN_RESET); // select the other slave
      sensorID = 0;
      break;

      default:
      sensorID = 0;
      break;
   }
   __HAL_DMA_ENABLE(hspi1.hdmarx); // re-enables DMA
}

The WORD is still shifted in the buffer.

Any hints on how I can achieve what I want?

Thanks in advance.

TDK · 2020-09-12

It sounds like you're trying to address multiple slaves on the same SPI data/clock lines without any gap between frames to allow CS to be switched. The STM32 isn't really set up for timing this tight. There will always be some delay between the event that triggers the interrupt and the interrupt handler actually executing. You can minimize it by writing your own interrupt handlers and dropping HAL, but it will still be there.
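To put the latency point in numbers: every core cycle spent between the end-of-transfer event and the CS write lets the SPI clock keep running. The helper below is a back-of-the-envelope sketch; its name is hypothetical, and the clock and latency figures in the usage note are assumptions, not measurements from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many SPI clock periods (bits) elapse while the
 * CPU spends `latency_cycles` core cycles getting from the end-of-transfer
 * event into the callback body. Result is rounded down. */
uint32_t spi_bits_during_latency(uint32_t latency_cycles,
                                 uint32_t cpu_hz,
                                 uint32_t spi_hz)
{
    assert(cpu_hz > 0u);
    /* bits = latency_cycles * spi_hz / cpu_hz, in 64 bits so the
     * product cannot overflow before the division. */
    return (uint32_t)(((uint64_t)latency_cycles * spi_hz) / cpu_hz);
}
```

Assuming a 480 MHz core clock, a 24 MHz SPI clock and roughly 200 cycles of interrupt entry plus HAL overhead, `spi_bits_during_latency(200, 480000000, 24000000)` returns 10: more than one 8-bit frame would be clocked before the callback can touch CS, which is consistent with the random shift described above.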

Personally, if I had to do this, I would address the slaves one at a time: assert CS and start a DMA transfer, then in the transfer-complete callback de-assert CS and move on to the next slave. Written with speed in mind, there will be minimal time between transactions.
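A minimal sketch of that one-slave-at-a-time sequencing, assuming the caller makes the GPIO and DMA calls. The `slave_seq` and `seq_step` types and both functions are hypothetical; keeping the state machine free of HAL calls only makes it testable off-target.

```c
#include <assert.h>
#include <stddef.h>

/* One bus with n_slaves slaves, addressed strictly one at a time. */
typedef struct {
    size_t n_slaves;  /* e.g. 2 per bus in this thread */
    size_t current;   /* slave whose transfer is in flight */
} slave_seq;

/* What the transfer-complete callback should do next. */
typedef struct {
    size_t deselect;  /* drive this slave's CS high */
    size_t select;    /* then drive this slave's CS low and start its DMA */
} seq_step;

/* Before the first transfer: returns the slave to select first. */
size_t seq_begin(slave_seq *s)
{
    assert(s->n_slaves > 0u);
    s->current = 0u;
    return s->current;
}

/* Call once per transfer-complete interrupt. */
seq_step seq_on_complete(slave_seq *s)
{
    seq_step step;
    step.deselect = s->current;
    s->current = (s->current + 1u) % s->n_slaves;
    step.select = s->current;
    return step;
}

/* On the target the callback might look roughly like this (not compiled
 * here; cs_port, cs_pin, spi1_seq, rx_buf and len are hypothetical names):
 *
 *   seq_step st = seq_on_complete(&spi1_seq);
 *   HAL_GPIO_WritePin(cs_port[st.deselect], cs_pin[st.deselect], GPIO_PIN_SET);
 *   // consume the received data
 *   HAL_GPIO_WritePin(cs_port[st.select], cs_pin[st.select], GPIO_PIN_RESET);
 *   HAL_SPI_Receive_DMA(hspi, rx_buf, len);
 */
```

Because CS only changes while no transfer is running, interrupt latency lengthens the gap between transactions but can no longer shift data inside a frame.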

There are other ways to do tight timing like this, but it's going to be difficult with many different slaves, as opposed to 1 or 2.

If you feel a post has answered your question, please click "Accept as Solution".

CMA · 2020-09-12

Thank you.

I've tried the solution you suggested. The chain now works, but for some reason the while(1) loop in main() never seems to be reached.

void spi1CpltCB(SPI_HandleTypeDef *hspi);
void spi2CpltCB(SPI_HandleTypeDef *hspi);
void spi3CpltCB(SPI_HandleTypeDef *hspi);
void spi4CpltCB(SPI_HandleTypeDef *hspi);
void spi5CpltCB(SPI_HandleTypeDef *hspi);
uint8_t spi1RxBuff[8];
uint8_t spi2RxBuff[8];
uint8_t spi3RxBuff[8];
uint8_t spi4RxBuff[8];
uint8_t spi5RxBuff[8];
uint32_t IDX_MAIN, IDX_SPI; // to monitor update frequencies

int main(void) {
   ...
   HAL_SPI_RegisterCallback(&hspi1, HAL_SPI_RX_COMPLETE_CB_ID, spi1CpltCB);
   HAL_SPI_RegisterCallback(&hspi2, HAL_SPI_RX_COMPLETE_CB_ID, spi2CpltCB);
   HAL_SPI_RegisterCallback(&hspi3, HAL_SPI_RX_COMPLETE_CB_ID, spi3CpltCB);
   HAL_SPI_RegisterCallback(&hspi4, HAL_SPI_RX_COMPLETE_CB_ID, spi4CpltCB);
   HAL_SPI_RegisterCallback(&hspi5, HAL_SPI_RX_COMPLETE_CB_ID, spi5CpltCB);
   HAL_SPI_Receive_DMA(&hspi1, spi1RxBuff, 1); // SPI and DMA set up to transfer by WORD, non-circular, DMA1_Stream0
   HAL_SPI_Receive_DMA(&hspi2, spi2RxBuff, 1); // SPI and DMA set up to transfer by WORD, non-circular, DMA1_Stream1
   HAL_SPI_Receive_DMA(&hspi3, spi3RxBuff, 1); // SPI and DMA set up to transfer by WORD, non-circular, DMA1_Stream2
   HAL_SPI_Receive_DMA(&hspi4, spi4RxBuff, 1); // SPI and DMA set up to transfer by WORD, non-circular, DMA1_Stream3
   HAL_SPI_Receive_DMA(&hspi5, spi5RxBuff, 1); // SPI and DMA set up to transfer by WORD, non-circular, DMA1_Stream4

   while (1) {
      IDX_MAIN++; // to see if while(1) is reached
   }
   ...
}

void spi1CpltCB(SPI_HandleTypeDef *hspi) {
   static uint32_t sensorID;

   switch (sensorID) {
      case 0:
      HAL_GPIO_WritePin(SPI1_SENSOR0_CS_PORT, SPI1_SENSOR0_CS_PIN, GPIO_PIN_SET);   // deselect sensor 0
      // do something with the WORD
      HAL_GPIO_WritePin(SPI1_SENSOR1_CS_PORT, SPI1_SENSOR1_CS_PIN, GPIO_PIN_RESET); // select sensor 1
      sensorID = 1;
      break;

      case 1:
      HAL_GPIO_WritePin(SPI1_SENSOR1_CS_PORT, SPI1_SENSOR1_CS_PIN, GPIO_PIN_SET);   // deselect sensor 1
      // do something with the WORD
      HAL_GPIO_WritePin(SPI1_SENSOR0_CS_PORT, SPI1_SENSOR0_CS_PIN, GPIO_PIN_RESET); // select sensor 0
      IDX_SPI++; // to measure SPI transfer frequency
      sensorID = 0;
      break;

      default:
      sensorID = 0;
      break;
   }

   HAL_SPI_Receive_DMA(hspi, spi1RxBuff, 1); // nested SPI DMA requests (hspi == &hspi1)
}

void spi2CpltCB(SPI_HandleTypeDef *hspi) {...} // written similarly
void spi3CpltCB(SPI_HandleTypeDef *hspi) {...} // written similarly
void spi4CpltCB(SPI_HandleTypeDef *hspi) {...} // written similarly
void spi5CpltCB(SPI_HandleTypeDef *hspi) {...} // written similarly

I get nothing in spi3RxBuff, spi4RxBuff or spi5RxBuff, and IDX_SPI increments at roughly 40 kHz. Given that SPI1 and SPI2 seem to work as intended, am I simply seeing the two running HAL_SPI_Receive_DMA chains eat up all the CPU time at 40 kHz?

The last thing I tried was chaining the callbacks of all 5 buses together, so that the DMA requests go SPI1 > 2 > 3 > 4 > 5 > 1 > ..., and the whole thing worked, with IDX_MAIN and IDX_SPI incrementing at ~700 kHz and ~10 kHz respectively.
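Assuming each bus keeps its own two-sensor state machine (as in spi1CpltCB) and its callback then starts the next bus, the chain visits the ten (bus, sensor) pairs in a fixed order. This hypothetical helper spells that order out; the names and constants are illustrative, not taken from the original code.

```c
#include <assert.h>
#include <stdint.h>

#define N_BUSES          5u  /* SPI1..SPI5 */
#define SENSORS_PER_BUS  2u

typedef struct {
    uint32_t bus;     /* 0 = SPI1 ... 4 = SPI5 */
    uint32_t sensor;  /* 0 or 1 on that bus */
} chain_slot;

/* The (bus, sensor) pair addressed by the k-th transfer of the
 * SPI1 > 2 > 3 > 4 > 5 > 1 chain, assuming each bus toggles its own
 * sensor after every completion and then starts the next bus. */
chain_slot chain_slot_at(uint32_t k)
{
    chain_slot s;
    s.bus    = k % N_BUSES;
    s.sensor = (k / N_BUSES) % SENSORS_PER_BUS;
    return s;
}

/* Bus whose transfer the completion callback of `bus` starts next. */
uint32_t chain_next_bus(uint32_t bus)
{
    return (bus + 1u) % N_BUSES;
}
```

Only one transfer is ever in flight, so the interrupt rate is bounded by that of a single bus; the trade-off is that the five buses no longer run in parallel, and each sensor is read once every ten completions.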

Should I stick with the last option? I'll also check whether moving to LL or direct register manipulation improves the frequencies.
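Both LL and direct register access end up writing the GPIO port's BSRR register, which HAL_GPIO_WritePin() also uses: bits 15:0 set the selected pins and bits 31:16 reset them, in a single store. If both CS lines of a bus happen to sit on the same port (an assumption; the thread does not say), deselecting one sensor and selecting the other can be one write. The hypothetical helper below only computes the value.

```c
#include <assert.h>
#include <stdint.h>

/* Value for a GPIO BSRR write: pins in `set_pins` go high, pins in
 * `reset_pins` go low, both in one store. Pin arguments are bit masks,
 * like the HAL GPIO_PIN_x macros. */
uint32_t bsrr_value(uint16_t set_pins, uint16_t reset_pins)
{
    return (uint32_t)set_pins | ((uint32_t)reset_pins << 16);
}
```

On the target this would be used as `SENSOR_CS_PORT->BSRR = bsrr_value(SENSOR0_CS_PIN, SENSOR1_CS_PIN);`, where SENSOR_CS_PORT is a hypothetical shared port. Compared with two HAL_GPIO_WritePin() calls, this removes two function calls and one bus write from every callback.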

Thanks again.

waclawek.jan · 2020-09-13

There's no point in using DMA to transfer a single word; it only wastes more time. IIRC, the SPI on the H7 supports 32-bit frames. You might also want to use hardware NSS toggling; read the SPI chapter in the RM (reference manual).
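Following the 32-bit frame suggestion, and assuming a 4-byte sensor transaction, MSB-first bit order and an SPI instance that actually supports a 32-bit data size (check the RM), one frame and one data-register read cover the whole transaction. The first byte on the wire lands in bits 31:24 of the frame. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split one MSB-first 32-bit SPI frame back into wire-order bytes:
 * out[0] is the first byte clocked in, out[3] the last. */
void unpack_frame_msb_first(uint32_t word, uint8_t out[4])
{
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}
```

If the DMA stores the frame as a 32-bit word, the little-endian Cortex-M7 places bits 7:0 at the lowest address, so reading that buffer byte by byte returns the bytes in reverse wire order; unpacking from the word avoids that surprise.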

JW

MM..1 · 2020-09-13

I think the optimizer removes your code from main(); you need to mark the IDX variables as __IO or volatile.
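A minimal sketch of that suggestion, reusing the thread's counter names. CMSIS defines `__IO` as `volatile`, so either spelling works on the target; plain `volatile` is used here so the sketch also builds on a host. `main_loop_body` is a hypothetical stand-in for the body of the while(1) loop.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent to "__IO uint32_t" on the target. */
volatile uint32_t IDX_MAIN;  /* incremented in main()'s loop */
volatile uint32_t IDX_SPI;   /* incremented in an SPI callback */

/* With volatile, every call performs a real load and store, so a debugger
 * watching IDX_MAIN sees it move. Without it, the compiler may keep the
 * value in a register inside an endless loop and never write it back. */
void main_loop_body(void)
{
    IDX_MAIN++;
}
```

Note that volatile only guarantees the accesses happen; it does not make `++` atomic, which is fine here because each counter is written from a single context.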

CMA · 2020-09-13

Let me try just that to see if polling would actually be faster. Thanks.

CMA · 2020-09-13

Negative: neither variable was removed by the compiler. What actually happened was that the line

HAL_SPI_Receive_DMA(&hspi3, spi3RxBuff, 1);

was not even reached.

MM..1 · 2020-09-13

If you really need hardware-based, software-independent, DMA-driven multi-SPI, then I think you need CS generated by PWM timer channels, calculated and synced to the SPI speed...
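The "calculated" part of that idea is mostly arithmetic: the CS low window has to be a whole number of timer ticks equal to the frame length in SPI clocks. Below is a hypothetical sizing helper, assuming the timer and SPI kernel clocks come from a common source so their ratio stays fixed. It does not configure any timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: timer ticks for a CS low window of exactly `bits` SPI
 * clock periods. Returns false if that window is not a whole number of
 * timer ticks, in which case the timer edge cannot line up exactly with
 * the SPI frame. */
bool cs_low_ticks(uint32_t timer_hz, uint32_t spi_hz, uint32_t bits,
                  uint32_t *ticks)
{
    uint64_t num = (uint64_t)bits * timer_hz;  /* 64-bit: avoids overflow */

    if (spi_hz == 0u || num % spi_hz != 0u) {
        return false;
    }
    *ticks = (uint32_t)(num / spi_hz);
    return true;
}
```

With example values only (a 240 MHz timer clock and a 24 MHz SPI clock), a 32-bit frame needs 320 ticks; an 8-bit frame with a 200 MHz timer and a 24 MHz SPI clock has no exact tick count, so the helper returns false.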

TDK · 2020-09-13

Your original post made me think it was many slaves on a single SPI. Looks like it's 1 slave per SPI.

> The behaviour was that the line HAL_SPI_Receive_DMA(&hspi3, spi3RxBuff, 1); was not even reached.

The CPU has limited power. Calling an interrupt at the end of every byte transfer is going to prevent it from doing anything useful if the SPI clock is high.
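A rough way to check this for a given configuration is to multiply the interrupt rate by the cost of each callback. The function name is hypothetical and the cycle counts in the usage note are assumptions; the real cost of the HAL DMA IRQ handler plus the callback would have to be measured on the target, for example with the DWT cycle counter.

```c
#include <assert.h>
#include <stdint.h>

/* Rough share of the core (in percent, rounded down) spent in completion
 * interrupts when each of `n_buses` buses raises one interrupt per
 * `bits_per_irq` bits, and each interrupt costs `cycles_per_irq` cycles.
 * Results above 100 mean the CPU cannot keep up. */
uint32_t isr_load_percent(uint32_t spi_hz, uint32_t bits_per_irq,
                          uint32_t n_buses, uint32_t cycles_per_irq,
                          uint32_t cpu_hz)
{
    assert(bits_per_irq > 0u && cpu_hz > 0u);
    uint64_t irq_per_s = (uint64_t)spi_hz / bits_per_irq * n_buses;
    return (uint32_t)(irq_per_s * cycles_per_irq * 100u / cpu_hz);
}
```

With assumed numbers (12 MHz SPI clock, one interrupt per 8-bit frame, 5 buses, 100 cycles per interrupt, 480 MHz core), `isr_load_percent(12000000, 8, 5, 100, 480000000)` returns 156, i.e. more than the whole CPU. One interrupt per 32-bit frame on a single bus at 1 MHz with 500 cycles per interrupt gives about 3.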


CMA · 2020-09-13

In the end it's 2 slaves per bus, hence the little state machine in the cpltCB that switches between the CS lines.

I was under the impression that the cpltCB is lightweight enough not to fully occupy the CPU. I'll try decreasing the SPI frequency to see if that improves the behaviour.

Thanks again.