SPI with DMA interrupt handling for SPI transfer complete

UFB · ‎2023-02-17

I'm having trouble understanding how the transfer complete interrupt is working.

I have set up SPI with DMA to speed up data transfer to an LCD display.

The SPI clock is 18MHz, the System clock is 72MHz.

From the book Mastering STM32 my understanding is that the SPI_TX_COMPLETE_CB should be used to signal the completion.

It is also mentioned that the callback for the SPI is taken care of in the DMA interrupt.

So I have only enabled DMA Interrupts and no SPI Interrupts.

I have copied large parts of the software and they seem to be using DMA in polling mode.

magenta trace is SPI-CLK

dark blue trace is MOSI/MISO

Here is the main part of the code I am using to produce the scope screendumps:

static void IL9341_WriteData(uint8_t *buff, size_t buff_size)
{
        IL9341_Select();
        IL9341_DC_Set();
        flag=0;
        HAL_SPI_Transmit_DMA(&IL9341_SPI_PORT, buff, buff_size);
 //       while (IL9341_SPI_PORT.hdmatx->State != HAL_DMA_STATE_READY)  <== this does not work
 //          {}
        IL9341_UnSelect();
}
void IL9341_Init(void)
{
 
    memset(disp_buf, 0, sizeof(disp_buf)); // fast way to fill buffer with 0
 
//    HAL_Delay(25);
    IL9341_RST_Clr();
//    HAL_Delay(25);
    IL9341_RST_Set();
//    HAL_Delay(50);
 
 
    IL9341_WriteCommand(IL9341_COLMOD);         //      Set color mode
 
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);  // Test pin is yellow trace
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    IL9341_WriteSmallData(IL9341_COLOR_MODE_16bit);
    IL9341_WriteCommand(0xB2);                              //      Porch control
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
 
    {
            uint8_t data[] = {0x0C, 0x0C, 0x00, 0x33, 0x33};
            IL9341_WriteData(data, sizeof(data));
    }
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
}
 
void spi_tx_complete(SPI_HandleTypeDef *hspi) {  // signature expected by HAL_SPI_RegisterCallback
 
    HAL_GPIO_TogglePin(Diag_GPIO_Port, Diag_Pin);  // Test pin is light blue trace
    HAL_GPIO_TogglePin(Diag_GPIO_Port, Diag_Pin);
    flag=1;
}
 
int main(void)
{
...
 
  if((HAL_SPI_RegisterCallback(&hspi1,HAL_SPI_TX_COMPLETE_CB_ID,spi_tx_complete)==HAL_ERROR)) {
	  Error_Handler();
  }
while (1)
  {
	  IL9341_Init();
 
  }
}

Using the CubeIDE, I have looked at the code in

Drivers/STM32F1xx_HAL_Driver/Src/stm32f1xx_hal_spi.c

and

Drivers/STM32F1xx_HAL_Driver/Src/stm32f1xx_hal_dma.c

My remaining Questions are:

1) is it correct to register the SPI callback or should I register DMA callback instead?

2) why is there such a long delay in the yellow trace after the DMA-transfer?

It seems that something is stuck in low level (interrupt?) routines

as the end of the yellow trace coincides with the transfer-complete signal (light blue trace)

The delay seems to be similar to the delay between "normal" (non-dma) SPI transactions.

There is no checking with a while(flag==0){;} of the "complete" flag in the IL9341_WriteData routine,

yet it returns after completion. Does this not contradict the use of interrupts? Or is the system just slow?

UFB · ‎2023-02-17

Correction: Instead of "Test pin is light blue" it should read "Diag pin is light blue"

The colour refers to the scope traces, of course

UFB · ‎2023-02-17

Something very strange going on. I have done some more testing and added the check for transfer complete

static void IL9341_WriteData(uint8_t *buff, size_t buff_size)
{
        IL9341_Select();
        IL9341_DC_Set();
        flag=0;
        HAL_SPI_Transmit_DMA(&IL9341_SPI_PORT, buff, buff_size);
        while(flag==0){}
        IL9341_UnSelect();
}

For clarity I inserted double pulses before and after the transfer:

IL9341_WriteCommand(0xB2);                              //      Porch control
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
 
    {
            uint8_t data[] = {0x0C, 0x0C, 0x00, 0x33, 0x33};
            IL9341_WriteData(data, sizeof(data));
    }
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);
    HAL_GPIO_TogglePin(Test_GPIO_Port, Test_Pin);

and now I get this picture:

So now it works as expected, albeit a little slow - for the 2.25µs data burst it takes 13.75µs to signal completion.
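As a rough cross-check of those numbers, here is a minimal sketch of the arithmetic. It assumes 8-bit frames with no gaps between bytes, and the helper names are made up for illustration. Five bytes at 18MHz come out at about 2.22µs, which agrees with the measured burst. The remaining 11.5µs corresponds to roughly 828 CPU cycles at 72MHz, presumably spent in interrupt entry, the HAL handlers and the callback.

```c
#include <stdint.h>

/* Duration of an SPI burst in nanoseconds, counting clocked bits only
 * (assumes 8-bit frames and no idle time between bytes). */
static inline uint32_t spi_burst_ns(uint32_t nbytes, uint32_t sck_hz)
{
    return (uint32_t)(((uint64_t)nbytes * 8u * 1000000000ull) / sck_hz);
}

/* Number of CPU cycles that fit into a time span given in nanoseconds. */
static inline uint32_t ns_to_cpu_cycles(uint32_t ns, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)ns * cpu_hz) / 1000000000ull);
}
```

For example, `spi_burst_ns(5, 18000000u)` gives the burst length, and `ns_to_cpu_cycles(13750u - 2250u, 72000000u)` converts the measured overhead into cycles.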

waclawek.jan · ‎2023-02-17

Is it a genuine STM32, or a clone on a BluePill?

Cube is open source, so you can look up what it does.

In the first case, the Tx DMA transfer probably finished somewhere between the first and second toggle. The Tx DMA Transfer Complete interrupt then fires, and inside it Cube probably waits until SPI BUSY ends; I wouldn't be surprised if it does so in a loop. I can't explain why the delay is so long, but honestly I am completely uninterested in wading through Cube's bloatware to find out, especially for the ancient 'F1, and especially if it's not even an STM32. Please do so yourself if you are interested: toggle a pin at interesting points in the Cube code, for example, and observe.
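To illustrate what "waiting until SPI BUSY ends" means at register level, here is a small sketch. It is not Cube's actual code. The bit positions are those of TXE and BSY in SPI_SR as listed in RM0008, and the `SKETCH_` names and the helper are hypothetical, chosen so they do not clash with the CMSIS device header.

```c
#include <stdbool.h>
#include <stdint.h>

/* TXE (bit 1) and BSY (bit 7) of SPI_SR, per RM0008. The SKETCH_ prefix
 * avoids clashing with the names in the CMSIS device header. */
#define SKETCH_SPI_SR_TXE (1u << 1)
#define SKETCH_SPI_SR_BSY (1u << 7)

/* True once the transmit buffer is empty AND the shift register has
 * clocked out its last bit, i.e. the bus is really idle. */
static inline bool spi_tx_finished(uint32_t sr)
{
    return ((sr & SKETCH_SPI_SR_TXE) != 0u) && ((sr & SKETCH_SPI_SR_BSY) == 0u);
}
```

On the target this would be polled as `while (!spi_tx_finished(SPI1->SR)) {}` before releasing chip select. The Tx DMA Transfer Complete event only says that the last byte was written to the data register, not that it has left the pin.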

IMO the most reasonable thing to do is to dump Cube and instead, write your own code. It may take some time and some effort, but it pays off in the long term. Using genuine chips with real documentation often pays off, too.

And use Rx to determine end of SPI transfer.

JW

UFB · ‎2023-02-17

jan, thanks for your comment.

Of course it is a bluepill and the bluepill-diagnostics reveal that it was falsely advertised as genuine.

Another one is on the way, this time "guaranteed" genuine - we'll see.

This strange behaviour could have to do with it being a clone.

STM32 is entirely new for me, so I'm glad that there is an IDE, even if it is bloated.

I doubt that I will become familiar enough to write my own code any time soon.

Here I'm migrating some old Atmel code to STM32 for a little display project.

Atmel with the TFT is too slow updating the display (the original Atmel is using 14-segment LED display).

So I'm trying to see if it will work with SPI+DMA.

I don't know what you mean by "using Rx to determine the end of SPI", the receiving end is a TFT.

I could easily calculate the end of transfer, given that I know the data size and the clock rate.

That might save me a few µs.

From glancing through the code, it seems to me that the "Cube's bloatware" is smart enough to catch any premature new transfer, before the previous one is completely finished.

In that case I need not be concerned with breaking something if I fire the data too rapidly.

waclawek.jan · ‎2023-02-17

The 'F1 family was the first STM32 and as such it suffers from some teething problems. The most annoying one is the GPIO which contrary to other STM32 families has a strange and rigid assignment of peripheral pads to pins, leading to conflicts and allowing to move ("remap") assignments only as a group.

Unless you are on a tight budget, a Nucleo64 with perhaps a 'F411 or 'F407 may be a better choice. Besides a more capable mcu you also get an onboard STLink, i.e. debugging capability. For a slightly higher price there's the F429 Disco with an onboard LCD with the same ILI9341 controller.

> Here I'm migrating some old Atmel code to STM32

Atmel was a company manufacturing among other things half a dozen of entirely different mcu families - MARC-4 4-bitters, classic 8051 derivatives, 8-bit AVR, their derivative xMegas, 32-bit AVR32 and ARMv7/Cortex-M based families.

I suspect you are talking about the classic AVRs, perhaps ATMegas. And the code you are about to port is most likely written in C and based on direct register access. STM32 is not that dissimilar in basic things such as SPI - you set the baudrate divisor register, set Master, CPOL/CPHA, and enable SPI in the control register, and then it's enough to write into the data register to see the bits falling out of the pins :) If you have debugging, you can play with all this in the debugger, without actually writing any code.

What's trickier is that prior to accessing the SPI registers you need to enable the SPI clock in RCC. But it's a single bit, not anything complicated. You also need to set up the respective GPIO pins, again after having enabled the respective GPIO in RCC. GPIO is slightly more complex - and as I've said, in 'F1 somewhat weird - but still really manageable after reading the GPIO chapter (especially the registers description) in RM0008.

Then you may decide to run the whole mcu faster than the default 16MHz. That's not necessary for experiments, but may be more fun perhaps. Again, read the Clocks (RCC) chapter in the RM and also don't forget to set the FLASH latency appropriately, and you're good to go.

DMA is then only the last step. But I'd recommend you to get the display going without it first, and add it after that, perhaps mastering interrupts meantime.

This all sounds overwhelming and like a lot of things to do, but it pays off in the long run and it's also lots of fun, I promise.

Or, just stick to clicking in CubeMX.

JW

UFB · ‎2023-02-17

Jan, I'm indeed talking about the 8-bit AVRs.

If the STM32 is somewhat similar, I might end up modifying bits directly if speed or space requires it, just as I did with the AVRs.

For the time being I will hang on to the Cube-IDE - it looks like the flash size is sufficient to bear the overhead. Since DMA now works, it also should speed up things by a factor of at least 10 for large chunks of data (like icons or large fonts).

With the Atmel I had to flip pixels individually with every SPI transfer; you can imagine how slow that is.

Anyway, thanks for your input!

waclawek.jan · ‎2023-02-18

DMA won't buy you speed as such. It allows SPI transfers to run while the processor does something else, but at the cost of DMA setup, and you also have to write the program in such a way that the processor does reasonable work during that time.

In many applications, there's no need for that at all.

Here is a non-Cube polled-SPI demo for the Disco-F429 I've mentioned above. And this is what it does (sorry for the horrible quality but you get the gist). Sure, the 'F429 there runs at 180MHz but SPI there is set to 11.25MHz so that's even slower than what you use now.

JW

S.Ma · ‎2023-02-18

Maybe late to the party, but when using SPI Master with DMA, I only use the DMA RX interrupt, never the TX.

This is because a transfer is started by writing to the DR, while the bi-directional transfer ends when the reception completes (even if the received data is not used).

UFB · ‎2023-02-18

@Jan - thanks for the example. It brought two things to my attention:

1) Initialization seems tricky; everybody does it in a different way, even with undocumented commands.

2) I had missed that the max SPI clock is 10MHz - I saw 40ns somewhere and that threw me off.

As to the speed increase, I would hope that writing a 32x32 block of pixels (16384 bits) would take about 1.6ms in DMA mode, whereas 2048 individual writes (each 5µs) would take more than 10ms - ok, so it is not quite a factor of 10.
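The estimate can be written out as a small sketch. It assumes 16 bits per pixel, a 10MHz SPI clock, back-to-back bytes in DMA mode and a flat 5µs per individually written byte. The function names are hypothetical.

```c
#include <stdint.h>

/* Number of bits in a w x h pixel block at bpp bits per pixel. */
static inline uint32_t block_bits(uint32_t w, uint32_t h, uint32_t bpp)
{
    return w * h * bpp;
}

/* Time in microseconds to clock out `bits` back to back at sck_hz. */
static inline uint32_t dma_block_us(uint32_t bits, uint32_t sck_hz)
{
    return (uint32_t)(((uint64_t)bits * 1000000ull) / sck_hz);
}

/* Time in microseconds when every byte is a separate write that costs
 * us_per_byte, overhead included. */
static inline uint32_t single_writes_us(uint32_t bits, uint32_t us_per_byte)
{
    return (bits / 8u) * us_per_byte;
}
```

With these assumptions a 32x32 block is 16384 bits, which takes about 1638µs as one DMA burst against 10240µs as 2048 single writes, a factor of roughly 6.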

@S.Ma - yes the transmit interrupt does not make much sense here - I am thinking of moving the check of the flag to the beginning of the dma-write. That way the code would only block if I write too fast (while another write is pending).
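A minimal sketch of that "check at the beginning" idea, assuming a single-core MCU where the flag is set only from the main code and cleared only from the completion callback. The type and function names are hypothetical and not part of the HAL.

```c
#include <stdbool.h>

/* Gate that tracks whether a DMA transfer is still in flight. `volatile`
 * is needed because the flag is cleared from interrupt context. */
typedef struct {
    volatile bool busy;
} spi_dma_gate_t;

/* Returns true and marks the gate busy if no transfer is pending;
 * returns false if the previous transfer has not completed yet. */
static inline bool gate_try_acquire(spi_dma_gate_t *g)
{
    if (g->busy) {
        return false;
    }
    g->busy = true;
    return true;
}

/* Meant to be called from the transfer-complete callback. */
static inline void gate_release(spi_dma_gate_t *g)
{
    g->busy = false;
}
```

In IL9341_WriteData this would become `while (!gate_try_acquire(&gate)) {}` before HAL_SPI_Transmit_DMA, with `gate_release(&gate)` in the completion callback. Two things change with this layout: the buffer has to stay valid until the callback runs, so a local array like `data[]` in IL9341_Init no longer works, and IL9341_UnSelect has to move into the callback as well.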