RhSilicon
Lead
June 27, 2023
Question

Do not use HAL_SPI_Transmit() for high performance (DMA may not be as necessary)


Hi,

I'm using an STM32F407VGT6 at 168 MHz, with SPI3 clocked at 21 MHz (SPI1, which can run at up to 42 MHz, is in use elsewhere).

I ran some tests with a 2.4" ILI9341 SPI display (320x240 pixels). Filling the whole screen using HAL_SPI_Transmit() takes 191 ms.

But if you write to the SPI data register directly and do only what is necessary (bare metal?), the time drops to 58 ms.

Is that a 3.3x speedup? (191/58 ≈ 3.29)

(I tried switching from 8-bit to 16-bit mode on the fly, but I haven't managed it yet, so I don't know whether the 58 ms would go down any further.)
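As a sanity check on the 58 ms figure, the raw wire time can be estimated from the numbers in the post: 320x240 pixels, 16 bits per pixel, 21 MHz SPI clock. The helper below is a hypothetical illustration (the name `spi_fill_time_us` is not from the library) and ignores all software overhead and inter-byte gaps.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum time, in microseconds, needed to clock `pixels` RGB565 pixels
 * (16 bits each) out over SPI at `spi_hz`. Software overhead is ignored;
 * the integer division truncates any fraction of a microsecond. */
uint32_t spi_fill_time_us(uint32_t pixels, uint32_t spi_hz)
{
    assert(spi_hz > 0u);
    uint64_t bits = (uint64_t)pixels * 16u;
    return (uint32_t)((bits * 1000000u) / spi_hz);
}
```

For 320*240 pixels at 21 MHz this gives about 58.5 ms, which is in line with the measured 58 ms. If that estimate holds, the bare-metal loop is already close to the limit of the 21 MHz clock, and 16-bit mode would probably not reduce the time much.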

 

 

 

#define I_don_t_want_to_lose_performance_using_HAL 1

void ILI9341_FillRect(uint16_t x, uint16_t y, uint16_t w, uint16_t h,
		uint16_t color) {
	// clipping
	if ((x >= ILI9341_WIDTH) || (y >= ILI9341_HEIGHT))
		return;
	if ((x + w - 1) >= ILI9341_WIDTH)
		w = ILI9341_WIDTH - x;
	if ((y + h - 1) >= ILI9341_HEIGHT)
		h = ILI9341_HEIGHT - y;

	ILI9341_Select();
	ILI9341_SetAddressWindow(x, y, x + w - 1, y + h - 1);

	uint8_t data[] = { color >> 8, color & 0xFF };

	HAL_GPIO_WritePin(ILI9341_DC_GPIO_Port, ILI9341_DC_Pin, GPIO_PIN_SET);

	for (y = h; y > 0; y--) {
		for (x = w; x > 0; x--) {

#if I_don_t_want_to_lose_performance_using_HAL != 1
			HAL_SPI_Transmit(&ILI9341_SPI_PORT, data, sizeof(data),
			HAL_MAX_DELAY);
#else
			*((__IO uint8_t*) &ILI9341_SPI_PORT.Instance->DR) = data[0];

			while(!__HAL_SPI_GET_FLAG(&ILI9341_SPI_PORT, SPI_FLAG_TXE));

			*((__IO uint8_t*) &ILI9341_SPI_PORT.Instance->DR) = data[1];

			while(!__HAL_SPI_GET_FLAG(&ILI9341_SPI_PORT, SPI_FLAG_TXE));
#endif
		}
	}

	ILI9341_Unselect();
}

 

 

It might be worth reviewing all the STM32 libraries that use SPI, such as the display drivers.

 Original full library here


4 replies

KnarfB
Super User
June 27, 2023

One API call per pixel is nonsense. Compare for larger arrays. Anyway, consider using DMA.
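A minimal sketch of what "compare for larger arrays" could look like: the colour is replicated into a small buffer once, and the transmit function is then called once per chunk instead of once per pixel. `FILL_CHUNK_PIXELS`, `spi_tx_fn`, `fill_chunked` and `count_bytes_tx` are hypothetical names introduced for illustration. On the target, the hook would presumably wrap a single `HAL_SPI_Transmit()` call; here it is a byte counter so the sketch can run on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILL_CHUNK_PIXELS 64u   /* hypothetical buffer size, in pixels */

/* Hypothetical transmit hook. On the target this would wrap one call to
 * HAL_SPI_Transmit(&ILI9341_SPI_PORT, buf, len, HAL_MAX_DELAY). */
typedef void (*spi_tx_fn)(const uint8_t *buf, size_t len);

/* Host-side stand-in that only counts the bytes it is asked to send. */
size_t g_bytes_sent;
void count_bytes_tx(const uint8_t *buf, size_t len)
{
    (void)buf;
    g_bytes_sent += len;
}

/* Send `pixels` copies of an RGB565 colour, high byte first, using one
 * transmit call per chunk. Returns the number of transmit calls made. */
size_t fill_chunked(uint16_t color, uint32_t pixels, spi_tx_fn tx)
{
    uint8_t buf[FILL_CHUNK_PIXELS * 2u];
    size_t calls = 0;

    assert(tx != NULL);

    /* Fill the buffer once with the repeated colour. */
    for (size_t i = 0; i < FILL_CHUNK_PIXELS; i++) {
        buf[2u * i]      = (uint8_t)(color >> 8);
        buf[2u * i + 1u] = (uint8_t)(color & 0xFFu);
    }

    /* Send full chunks, then whatever is left over. */
    while (pixels > 0u) {
        uint32_t n = (pixels < FILL_CHUNK_PIXELS) ? pixels : FILL_CHUNK_PIXELS;
        tx(buf, (size_t)n * 2u);
        pixels -= n;
        calls++;
    }
    return calls;
}
```

With a 64-pixel buffer, a full-screen fill needs 1200 calls instead of 76800, so the per-call overhead of the HAL is spread over 128 bytes each time.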

hth

KnarfB

 

RhSilicon
Lead
June 28, 2023

Perhaps because the ILI9341 in SPI mode takes only two bytes of color data per pixel, programmers are discouraged from customizing the routine to use DMA.

Also, library authors often have a background in generic environments such as Arduino, where a generic API is used so that the code stays portable across MCUs, usually from different manufacturers.

"And I am not even using C++, the file is named .cpp since the Arduino classes require it."

I found several libraries claiming to make use of DMA, but using HAL_SPI_Transmit() instead of HAL_SPI_Transmit_DMA().

Anyway, I had already noticed this HAL overhead with timer interrupts, but now I'm realizing that the HAL code probably slows down all the peripherals, because of the many adaptations inserted to keep the code generic.

I found a tutorial on using SPI with DMA that might be interesting:

"By using HAL_SPI_Transmit_DMA() instead of HAL_SPI_Transmit() we tell the SPI peripheral to start transmitting s_TransferBuffer using the background direct memory access mechanism and immediately return control. When the first half of the buffer is transmitted, the DMA will raise an interrupt resulting in a call to HAL_SPI_TxHalfCpltCallback() that will generate the first half of the next frame while the current frame is still being transferred. When the entire buffer is transferred, HAL_SPI_TxCpltCallback() will get called and will generate the second half of the frame (while the DMA is already sending the first half of it). This results in 100% uninterrupted transmission without any gaps"

Using the SPI interface on STM32 devices

KnarfB
Super User
June 28, 2023

It is easy to fill the entire screen with a single DMA transfer. Just disable the source address increment and do something like

 uint16_t color = 0xA005;
 HAL_SPI_Transmit_DMA(&hspi1, (void*)&color, 320*240);

which will kick-off a huge number of 16-bit transfers.

KnarfB

RhSilicon
Lead
June 28, 2023

"When you couldn't write solution, just solder it" :rolling_on_the_floor_laughing:

waclawek.jan
Super User
June 28, 2023

Yes, the maximum number of transfers in both the single-port DMA in 'L4 and dual-port DMA in 'F4 is 65535, and 240*320 is somewhat more. But this is an extreme case.
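To illustrate the 65535 limit: a full-screen fill of 240*320 = 76800 items has to be split into at least two DMA transfers. The helpers below are a hypothetical sketch (names are not from the HAL) that only computes the chunk sizes. On the target, the next chunk would presumably be started from `HAL_SPI_TxCpltCallback()`. Note also that the `Size` parameter of `HAL_SPI_Transmit_DMA()` is a `uint16_t`, so passing 76800 directly would be truncated.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_MAX_ITEMS 65535u   /* limit of the 16-bit DMA transfer counter */

/* Number of data items to program for the next DMA transfer,
 * given how many items are still left to send. */
uint32_t dma_next_chunk(uint32_t remaining)
{
    return (remaining > DMA_MAX_ITEMS) ? DMA_MAX_ITEMS : remaining;
}

/* Number of separate DMA transfers needed to send `total` items. */
uint32_t dma_transfers_needed(uint32_t total)
{
    uint32_t count = 0;

    while (total > 0u) {
        uint32_t n = dma_next_chunk(total);
        assert(n > 0u && n <= DMA_MAX_ITEMS);   /* guarantees progress */
        total -= n;
        count++;
    }
    return count;
}
```

For the full screen this yields one transfer of 65535 items followed by one of 11265 items.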

Some (most?) LCD controllers let you define a rectangular area, so that transfers end up only in that area. This can be useful, e.g., for using DMA to transfer a whole letter when displaying text.

JW

TDK
June 29, 2023

What you wrote will probably work okay if the SPI clock rate is high, but TXE=1 does not mean the transmission is complete. You should wait for BSY=0 after the final transfer before you pull CS high.
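A minimal sketch of that ordering: wait for TXE before each write, then after the last byte wait for TXE and finally for BSY to clear before CS is released. `spi_regs_t` is a hypothetical stand-in for the SPI status and data registers so the loop can be exercised on a PC; on the STM32F4 the corresponding bits are TXE (bit 1) and BSY (bit 7) of SPI_SR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SR_TXE (1u << 1)   /* transmit buffer empty */
#define SR_BSY (1u << 7)   /* SPI still shifting data out */

/* Hypothetical stand-in for the SPI status and data registers. */
typedef struct {
    volatile uint32_t sr;
    volatile uint32_t dr;
} spi_regs_t;

/* Write `len` bytes and return only once the last bit has left the wire,
 * so the caller can safely deassert chip select afterwards. */
void spi_write_blocking(spi_regs_t *spi, const uint8_t *data, size_t len)
{
    assert(spi != NULL && data != NULL);

    for (size_t i = 0; i < len; i++) {
        while ((spi->sr & SR_TXE) == 0u) { }   /* wait for room in the buffer */
        spi->dr = data[i];
    }

    while ((spi->sr & SR_TXE) == 0u) { }       /* last byte has left the buffer */
    while ((spi->sr & SR_BSY) != 0u) { }       /* last byte is fully clocked out */
}
```

In the fill routine from the first post, this would mean adding the final BSY wait just before `ILI9341_Unselect()`.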

 
