Delays between SPI transfers on STM32F411 using DMA
‎2025-01-14 02:16 AM - last edited on ‎2025-01-14 02:44 AM by waclawek.jan
Hello,
I am seeking help understanding, and hopefully fixing, a timing problem. I am writing data to a 32-channel SPI DAC, 3 bytes per channel. To update all channels simultaneously, I trigger the LDAC line after all 32 channels have been written. What I see is large delays between the 3-byte transmissions. I am using DMA and interrupts.
The SPI port is set to 24 MHz and the micro to 75 MHz, using the PLL fed from HSI. The 3 bytes transfer fine at 24 MHz, taking approx. 950 ns. I then see a 2.5 µs delay before the next 3 bytes are sent. This pushes the overall time beyond my spec: the full 32 channels take approx. 105 µs, and I want it closer to 32 µs. I'm sure I am overlooking something, but it currently has me stumped and my project is starting to slip. Any help much appreciated.
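The figures quoted above can be cross-checked with a little arithmetic. The following is only a sketch, assuming a 24 MHz SCLK with no gaps between bits; the helper names `SpiWireTimeNs` and `FramedTimeNs` are made up for illustration and are not part of the project.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nanoseconds needed to clock `bytes` bytes out at
 * `sclk_hz`, assuming back-to-back bits with no idle time in between. */
static uint32_t SpiWireTimeNs(uint32_t bytes, uint32_t sclk_hz)
{
    assert(sclk_hz != 0u);                     /* avoid division by zero */
    uint64_t bits = (uint64_t)bytes * 8u;
    return (uint32_t)((bits * 1000000000ull) / sclk_hz);
}

/* Hypothetical helper: total time for `frames` separate transfers,
 * each one followed by a fixed gap before the next transfer starts. */
static uint32_t FramedTimeNs(uint32_t frames, uint32_t frame_ns, uint32_t gap_ns)
{
    return frames * (frame_ns + gap_ns);
}
```

With the measured values, `FramedTimeNs(32, 950, 2500)` gives 110400 ns, roughly the 105 figure quoted if it is read as microseconds. `SpiWireTimeNs(96, 24000000)` gives 32000 ns, which is the gap-free lower bound for all 32 frames.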
/**************** SPI DMA setup *******************/
void InitSPI3 ( void )
{
GPIO_InitTypeDef GPIO_InitStructure;
SPI_InitTypeDef SPI_InitStruct;
RCC_APB1PeriphClockCmd(RCC_APB1Periph_SPI3, ENABLE);
RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_GPIOC, ENABLE);
RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_DMA1, ENABLE);
// configure pins used by SPI - PC10 = SCK, PC11 = MISO, PC12 = MOSI
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_10 | GPIO_Pin_12;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_100MHz;
GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_UP; //GPIO_PuPd_NOPULL; //;
GPIO_Init(GPIOC, &GPIO_InitStructure);
GPIO_PinAFConfig(GPIOC, GPIO_PinSource10, GPIO_AF_SPI3);
GPIO_PinAFConfig(GPIOC, GPIO_PinSource12, GPIO_AF_SPI3);
SPI_I2S_DeInit(SPI3);
SPI_InitStruct.SPI_Direction = SPI_Direction_1Line_Tx; //SPI_Direction_2Lines_FullDuplex;
SPI_InitStruct.SPI_Mode = SPI_Mode_Master;
SPI_InitStruct.SPI_DataSize = SPI_DataSize_8b; //SPI_DataSize_16b
SPI_InitStruct.SPI_NSS = SPI_NSS_Soft; // SPI_NSS_Hard
SPI_InitStruct.SPI_CPHA = SPI_CPHA_2Edge; // falling edge clock
SPI_InitStruct.SPI_CPOL = SPI_CPOL_Low; // clock idles low; with CPHA_2Edge this is SPI mode 1
SPI_InitStruct.SPI_BaudRatePrescaler = SPI_BaudRatePrescaler_2; // SPI at max 25Mbps
SPI_InitStruct.SPI_FirstBit = SPI_FirstBit_MSB;
SPI_InitStruct.SPI_CRCPolynomial = 0; // NO CRC used
SPI_Init(SPI3, &SPI_InitStruct); // Initialise
SPI_Cmd(SPI3, ENABLE);
DMA_InitTypeDef DMA_InitStructure;
NVIC_InitTypeDef NVIC_InitStructure;
DMA_DeInit(DMA1_Stream7);
DMA_StructInit(&DMA_InitStructure);
// Configure DMA1 Channel0 - TX Stream 7
DMA_DeInit(DMA1_Stream7);
DMA_InitStructure.DMA_Channel = DMA_Channel_0;
DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t)&SPI3->DR;
DMA_InitStructure.DMA_DIR = DMA_DIR_MemoryToPeripheral;
DMA_InitStructure.DMA_Memory0BaseAddr = (uint32_t)&SPITxBuffer[0]; //array data transmitted
DMA_InitStructure.DMA_BufferSize = sizeof(SPITxBuffer); //Buffer size
DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;
DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Byte;
DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
DMA_InitStructure.DMA_Mode = DMA_Mode_Normal;
DMA_InitStructure.DMA_Priority = DMA_Priority_VeryHigh; //DMA_Priority_High;
DMA_InitStructure.DMA_FIFOMode = DMA_FIFOMode_Disable;
// DMA_InitStructure.DMA_PeripheralBurst = DMA_PeripheralBurst_Single;
// DMA_InitStructure.DMA_MemoryBurst = DMA_MemoryBurst_Single;
DMA_Init(DMA1_Stream7, &DMA_InitStructure); //Initialise the DMA
DMA_ITConfig(DMA1_Stream7, DMA_IT_TC, ENABLE); // enable transfer complete interrupt
// SPI interrupt enable - dont really need if we are just transmitting in a loop
NVIC_PriorityGroupConfig(NVIC_PriorityGroup_2); //SPI3_IRQn
NVIC_InitStructure.NVIC_IRQChannel = DMA1_Stream7_IRQn;
NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0;
NVIC_InitStructure.NVIC_IRQChannelSubPriority = 1;
NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
NVIC_Init(&NVIC_InitStructure);
DMA_ClearFlag(DMA1_Stream7, DMA_FLAG_TCIF7); // DMA_ClearFlag expects a DMA_FLAG_x value, not DMA_IT_x
DMA_ClearITPendingBit(DMA1_Stream7, DMA_IT_TCIF7);
DMA_Cmd(DMA1_Stream7, ENABLE);
SPI_I2S_DMACmd(SPI3, SPI_I2S_DMAReq_Tx, ENABLE);
}
/***************** SPI DMA ISR *****************/
void SPI3_DMA1_ISR()
{
// Test on DMA Stream Transfer Complete interrupt
if (DMA_GetITStatus(DMA1_Stream7, DMA_IT_TCIF7))
{
// Clear DMA Stream Transfer Complete interrupt pending bit
DMA_ClearITPendingBit(DMA1_Stream7, DMA_IT_TCIF7);
}
}
/********************** code that sets up the bytes and performs the 3 byte DMA transfer ************/
error_t WriteAD5383( uint8_t chanid, uint16_t value )
{
error_t dacerr = ERR_NONE;
/* Check channel is valid */
if( chanid >= 32) // valid channels are 0..31
{
return ERR_INVALID_CHANNEL;
}
/*
* Form data for the AD5383 device
* Bit:  23   22   21  20..16  15    14    13 ...... 2  1  0
* Use:  A/B  R/W  0   A4..A0  REG1  REG0  D11 ..... D0 X  X
*
* X = don't care
* A/B = Register select
* Ax = Input channel address
* Dx = data bits
*/
// add reg bits
value = value << 2;
/* channel address A4..A0 in the low five bits of byte 0 */
SPITxBuffer[0] = (uint8_t)chanid;
/* DAC value bits 11:6 in byte 1 and Add Reg 0/1 bits */
SPITxBuffer[1] = (uint8_t)(value >> 8) | 0xC0;
/* DAC value bits 5:0 in byte 2 */
SPITxBuffer[2] = (uint8_t)(value | 0x00);
// Wait if TXE indicates busy
while (SPI_I2S_GetFlagStatus(SPI3, SPI_I2S_FLAG_TXE) == RESET); // wait for empty buffer TODO timeout
/* Assert chip select */
SetDIO( DIOID_DAC_CS, DIOSTATE_ASSERT );
DMA_Cmd(DMA1_Stream7, DISABLE);
DMA1_Stream7->M0AR = (uint32_t)SPITxBuffer;
DMA1_Stream7->NDTR = sizeof(SPITxBuffer);
DMA_Cmd(DMA1_Stream7, ENABLE);
// Wait until data is sent
// while (SPI_I2S_GetFlagStatus(SPI3, SPI_I2S_FLAG_BSY) == SET);
while(!SPI_I2S_GetFlagStatus(SPI3, SPI_I2S_FLAG_TXE));
// Deassert chip select
SetDIO( DIOID_DAC_CS, DIOSTATE_DEASSERT );
return dacerr;
}
/*********** code that performs the 3 bytes writes, looped for the 32 channels ***********/
void InitDACs( void )
{
SetDIO (DIOID_DAC_LDAC, DIOSTATE_DEASSERT);
for (uint8_t i = 0; i < 32; i++) // channels 0..31
{
WriteAD5383(i,0x07FF);
}
// Set LDAC to update all DAC channels simultaneously
SetDIO (DIOID_DAC_LDAC, DIOSTATE_ASSERT);
}
/******************* clock setup ***************/
void InitRCC ( void )
{
RCC_DeInit();
FLASH->ACR |= FLASH_ACR_LATENCY_4WS; // four wait states
FLASH->ACR |= FLASH_ACR_PRFTEN; // prefetch enable
FLASH->ACR |= FLASH_ACR_ICEN; // instruction cache enable
FLASH->ACR |= FLASH_ACR_DCEN; // data cache enable
FLASH_OB_Unlock();
RCC_HSEConfig(RCC_HSE_OFF);
RCC_HSICmd(ENABLE);
PWR->CR=0x11; // Set power scale register on.
while( RCC_GetFlagStatus(RCC_FLAG_HSIRDY) == RESET);
// PLL_M = 16, PLL_N 400, PLL_P = 4, PLL_Q = 9
RCC_PLLConfig(RCC_PLLSource_HSI, 16, 400, 4, 9); // HSI 16 MHz / 16 = 1 MHz PLL input, x400 / 4 = 100 MHz
RCC_SYSCLKConfig( RCC_SYSCLKSource_PLLCLK ); // Use PLL as system clock
RCC_PLLCmd( ENABLE ); // Enable PLL
PWR->CR=0x11; // Set power scale register on.
while( RCC_GetFlagStatus(RCC_FLAG_PLLRDY) == RESET ) continue; // Wait till PLL is ready
while( RCC_GetSYSCLKSource() != 0x08 ) continue; // Wait till PLL is used as system clock
RCC_HCLKConfig(RCC_SYSCLK_Div1); // HCLK = SYSCLK = 100 MHz
RCC_PCLK1Config(RCC_HCLK_Div2); // PCLK1 = HCLK/2, APB1 = 50 MHz
RCC_PCLK2Config(RCC_HCLK_Div1); // PCLK2 = HCLK, APB2 = 100 MHz
}
Labels: STM32F4 Series
‎2025-01-14 02:47 AM
Assemble all 32x3 bytes into a single buffer and send it using a single DMA transfer. Otherwise the overhead of the function calls etc. causes the delay you observe.
JW
PS. To post code, use </> at the top of the editor. I edited your initial post just to do that.
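A minimal sketch of the single-buffer approach suggested above, assuming the same 3-byte frame layout that `WriteAD5383()` builds in the original post. The names `DacFrameBuffer`, `PackDacFrame` and `PackAllDacFrames` are hypothetical, and whether the DAC accepts 32 frames without toggling its chip select in between has to be checked against its datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DAC_CHANNELS     32u
#define DAC_FRAME_BYTES  3u

/* Hypothetical buffer holding all 32 frames back to back (96 bytes). */
static uint8_t DacFrameBuffer[DAC_CHANNELS * DAC_FRAME_BYTES];

/* Pack one 24-bit frame the same way the original WriteAD5383() does:
 * byte 0 = channel address, byte 1 = REG1/REG0 set plus the upper data
 * bits, byte 2 = the lower data bits followed by two don't-care bits. */
static void PackDacFrame(uint8_t *frame, uint8_t chanid, uint16_t value)
{
    assert(chanid < DAC_CHANNELS);
    uint16_t shifted = (uint16_t)(value << 2);    /* make room for X X */
    frame[0] = chanid;
    frame[1] = (uint8_t)((shifted >> 8) | 0xC0u);
    frame[2] = (uint8_t)(shifted & 0xFFu);
}

/* Fill `buf` with one frame per channel and return the byte count that
 * would be handed to the DMA stream as the transfer length. */
static size_t PackAllDacFrames(uint8_t *buf, const uint16_t values[DAC_CHANNELS])
{
    for (uint8_t ch = 0; ch < DAC_CHANNELS; ch++)
    {
        PackDacFrame(&buf[ch * DAC_FRAME_BYTES], ch, values[ch]);
    }
    return DAC_CHANNELS * DAC_FRAME_BYTES;
}
```

The returned length (96) would replace `sizeof(SPITxBuffer)` when loading `NDTR`, and the buffer address would go into `M0AR`, so the stream is started once for all channels instead of once per channel.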
‎2025-01-14 02:55 AM
Thanks JW. I didn't realise the overhead would be of that magnitude. Does that mean I should be using the DMA FIFO?
I assume I can do this under a single SPI CS and don't need to set it high and low between each transfer?
‎2025-01-14 03:17 AM
> Thanks JW. I didn't realise the overhead would be in that magnitude.
Do you use compiler optimization?
> Does that mean I should be using the DMA FIFO?
That won't help you if you wait for the end of the transfer after every 3 bytes, and it will bring complications such as alignment requirements.
> I assume I can do this under a single SPI CS and don't need it set it high and low between each transfer?
I don't know; that depends on the chip you are controlling, and I did not read its datasheet. But I would be surprised if you needed to toggle any signal between every 3 bytes.
JW
‎2025-01-14 03:46 AM
@cpalmer54 wrote: I didn't realise the overhead would be in that magnitude.
You're using HAL.
The overheads in HAL setting up & starting a transfer can, indeed, be very large - just take a look at the source code!
‎2025-01-14 03:52 AM
I'm not using HAL and never do. It's CMSIS.
‎2025-01-14 04:13 AM - edited ‎2025-01-14 04:13 AM
Oops - sorry.
But may still be worth looking into the CMSIS functions to see what overheads they add ...
‎2025-01-14 04:24 AM - edited ‎2025-01-14 04:26 AM
> I think aside from using DMA circular mode I am already adhering to all the other points you mention.
I never mentioned using DMA in circular mode. [EDIT] oh now I see you replied to kerawill1122 - that's probably an AI-generated post attempting (and for some reason failing, at least so far) to inject spam here. We've seen many such recently. [/EDIT]
What I said is that you should make the memory buffer 3x32 bytes long, fill it in one go and perform one single DMA transfer to transmit all 3x32 bytes at once. And that you should read the target chip's datasheet to find out exactly how the chipselect/framing/load/whatever-it-is-called signal should be placed; I don't believe you'd need to toggle it after every three bytes.
JW
PS. Nominal one: CMSIS does not define peripheral-API functions. This is probably the deprecated SPL.
‎2025-01-14 04:43 AM
Thanks. That response was not directed towards you. I was answering another person, but I see that reply has now been deleted by them. What you first advised makes sense to me. That's what I will try, and then report back the actual results.