Do not use HAL_SPI_Transmit() for high performance (DMA may not be as necessary)

RhSilicon · ‎2023-06-27

Hi,

I'm using an STM32F407VGT6 at 168MHz, with SPI3 clocked at 21MHz (SPI1, which can run at up to 42MHz, is in use elsewhere).

I ran some tests with a 2.4" 320x240 ILI9341 SPI display: filling the whole screen using HAL_SPI_Transmit() takes 191ms.

But if you write only what is necessary directly to the SPI registers (bare metal?), it takes 58ms.

Is that a 3.3x speedup? (191/58 ≈ 3.29, i.e. about 229% faster)

(I tried to switch from 8-bit to 16-bit mode on the fly, but I haven't managed it yet, so I don't know whether the 58ms will go down.)
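As a sanity check on the 58ms figure, here is a minimal sketch that computes the lower bound the bus itself imposes: the number of bits to clock out divided by the SCK frequency. The helper name spi_min_fill_time_us is hypothetical, and the calculation ignores command overhead and any gaps between frames.

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound, in microseconds, for clocking `pixels` pixels of
 * `bits_per_pixel` bits each over an SPI bus running at `sck_hz`.
 * Hypothetical helper: ignores command bytes and inter-frame gaps. */
static uint32_t spi_min_fill_time_us(uint32_t pixels, uint32_t bits_per_pixel,
                                     uint32_t sck_hz)
{
    assert(sck_hz > 0u);
    uint64_t bits = (uint64_t)pixels * bits_per_pixel;
    return (uint32_t)((bits * 1000000u) / sck_hz);
}
```

For 320*240 pixels at 16 bits per pixel and 21MHz this gives 58514 µs, about 58.5ms. If that reasoning holds, the polled loop is already keeping the bus essentially saturated, so switching to 16-bit frames would not be expected to shave much off; a faster SCK would.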

#define I_don_t_want_to_lose_performance_using_HAL 1

void ILI9341_FillRect(uint16_t x, uint16_t y, uint16_t w, uint16_t h,
		uint16_t color) {
	// clipping
	if ((x >= ILI9341_WIDTH) || (y >= ILI9341_HEIGHT))
		return;
	if ((x + w - 1) >= ILI9341_WIDTH)
		w = ILI9341_WIDTH - x;
	if ((y + h - 1) >= ILI9341_HEIGHT)
		h = ILI9341_HEIGHT - y;

	ILI9341_Select();
	ILI9341_SetAddressWindow(x, y, x + w - 1, y + h - 1);

	uint8_t data[] = { color >> 8, color & 0xFF };

	HAL_GPIO_WritePin(ILI9341_DC_GPIO_Port, ILI9341_DC_Pin, GPIO_PIN_SET);

	for (y = h; y > 0; y--) {
		for (x = w; x > 0; x--) {

#if I_don_t_want_to_lose_performance_using_HAL != 1
			HAL_SPI_Transmit(&ILI9341_SPI_PORT, data, sizeof(data),
			HAL_MAX_DELAY);
#else
			*((__IO uint8_t*) &ILI9341_SPI_PORT.Instance->DR) = data[0];

			while(!__HAL_SPI_GET_FLAG(&ILI9341_SPI_PORT, SPI_FLAG_TXE));

			*((__IO uint8_t*) &ILI9341_SPI_PORT.Instance->DR) = data[1];

			while(!__HAL_SPI_GET_FLAG(&ILI9341_SPI_PORT, SPI_FLAG_TXE));
#endif
		}
	}

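	// Wait until the last byte has left the shift register before releasing CS
	while (__HAL_SPI_GET_FLAG(&ILI9341_SPI_PORT, SPI_FLAG_BSY));

	ILI9341_Unselect();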
}

It might be very interesting to review all the STM32 libraries that use SPI, such as the display ones.

Original full library here

KnarfB · ‎2023-06-27

One API call per pixel is nonsense. Compare for larger arrays. Anyway, consider using DMA.

hth

KnarfB

RhSilicon · ‎2023-06-28

"When you couldn't write solution, just solder it" :rolling_on_the_floor_laughing:

RhSilicon · ‎2023-06-28

Perhaps the fact that the ILI9341 in SPI mode takes only two bytes of color data per pixel discourages programmers from customizing the routine to use DMA.

Also, maybe whoever writes these libraries usually has a background in generic systems like Arduino, where a generic API is used so the code stays portable to other MCUs, usually from different manufacturers.

"And I am not even using C++, the file is named .cpp since the Arduino classes require it."

I found several libraries claiming to make use of DMA, but using HAL_SPI_Transmit() instead of HAL_SPI_Transmit_DMA().

Anyway, I had already noticed this HAL problem with timer interrupts, but now I'm realizing that the HAL code must be slowing down all the peripherals, because of the many adaptations added to keep the code generic.

I found a tutorial on using SPI with DMA that might be interesting:

"By using HAL_SPI_Transmit_DMA() instead of HAL_SPI_Transmit() we tell the SPI peripheral to start transmitting s_TransferBuffer using the background direct memory access mechanism and immediately return control. When the first half of the buffer is transmitted, the DMA will raise an interrupt resulting in a call to HAL_SPI_TxHalfCpltCallback() that will generate the first half of the next frame while the current frame is still being transferred. When the entire buffer is transferred, HAL_SPI_TxCpltCallback() will get called and will generate the second half of the frame (while the DMA is already sending the first half of it). This results in 100% uninterrupted transmission without any gaps"

Using the SPI interface on STM32 devices
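The half-buffer scheme described in the quoted tutorial can be modelled on a host PC. The sketch below is not target code: simulate_circular_dma is a hypothetical stand-in for a DMA stream assumed to run in circular mode, and fill_half plays the role of the frame generator that the tutorial calls from HAL_SPI_TxHalfCpltCallback() and HAL_SPI_TxCpltCallback().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN  8u               /* whole transfer buffer */
#define HALF_LEN (BUF_LEN / 2u)   /* each half is refilled separately */

static uint8_t s_buf[BUF_LEN];
static uint8_t s_next;            /* next value the generator will produce */

/* Refill one half of the buffer with fresh data. */
static void fill_half(size_t offset)
{
    for (size_t i = 0; i < HALF_LEN; i++) {
        s_buf[offset + i] = s_next++;
    }
}

/* Host-side model of a circular DMA stream: copies the buffer to `out`
 * item by item for `passes` full passes, refilling each half right after
 * it has been consumed. `out` must hold passes * BUF_LEN bytes. */
static void simulate_circular_dma(uint8_t *out, size_t passes)
{
    assert(out != NULL);
    s_next = 0;
    fill_half(0);                 /* prime both halves before starting */
    fill_half(HALF_LEN);

    size_t n = 0;
    for (size_t p = 0; p < passes; p++) {
        for (size_t i = 0; i < BUF_LEN; i++) {
            out[n++] = s_buf[i];
            if (i == HALF_LEN - 1u) {
                fill_half(0);          /* role of HAL_SPI_TxHalfCpltCallback() */
            }
            if (i == BUF_LEN - 1u) {
                fill_half(HALF_LEN);   /* role of HAL_SPI_TxCpltCallback() */
            }
        }
    }
}
```

In this model the output stream stays continuous across buffer wrap-arounds because each half is rewritten only while the other half is being sent. On real hardware the refill also has to finish before the DMA reaches that half again.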

KnarfB · ‎2023-06-28

It is easy to fill the entire screen with a single DMA. Just disable source address increment and do something like

    uint16_t color = 0xA005;
    HAL_SPI_Transmit_DMA(&hspi1, (void*)&color, 320*240);

which will kick-off a huge number of 16-bit transfers.

KnarfB

TDK · ‎2023-06-28

Careful, 240*320 > 65535, which is the limit on DMA transfers on the F4. The size parameter to HAL_SPI_Transmit_DMA is a uint16_t.

Still doable if you cast things to uint32_t instead.


waclawek.jan · ‎2023-06-28

Hi @KnarfB ,

> HAL_SPI_Transmit_DMA(&hspi1, (void*)&color, 320*240);

Did you actually try this? And guess why I am asking ;‑)

JW

KnarfB · ‎2023-06-28

Guess I see your point. I tried it on an STM32L4 with a smaller size parameter. 320*240 is too large and will overflow. Two transfers should do it, right?

KnarfB

waclawek.jan · ‎2023-06-28

Yes, the maximum number of transfers in both the single-port DMA in 'L4 and dual-port DMA in 'F4 is 65535, and 240*320 is somewhat more. But this is an extreme case.
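Splitting a full-screen fill into transfers that respect the 65535 limit could be sketched like this. All names here (send_in_chunks, send_chunk_fn, count_items) are hypothetical; on the target, the callback would be a wrapper that starts HAL_SPI_Transmit_DMA() with the given count and waits for the transfer to complete, while count_items is only a host-side stub for checking the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_MAX_ITEMS 65535u   /* the DMA item counter is 16 bits wide */

/* Hypothetical callback: starts one transfer of `count` items and
 * returns once it has completed. */
typedef void (*send_chunk_fn)(uint16_t count, void *ctx);

/* Split `total` items into transfers of at most DMA_MAX_ITEMS each.
 * Returns the number of transfers issued. */
static uint32_t send_in_chunks(uint32_t total, send_chunk_fn send, void *ctx)
{
    assert(send != 0);
    uint32_t chunks = 0;
    while (total > 0u) {
        uint16_t n = (total > DMA_MAX_ITEMS) ? (uint16_t)DMA_MAX_ITEMS
                                             : (uint16_t)total;
        send(n, ctx);
        total -= n;
        chunks++;
    }
    return chunks;
}

/* Host-side stub: just adds up how many items were "sent". */
static void count_items(uint16_t count, void *ctx)
{
    *(uint32_t *)ctx += count;
}
```

For 320*240 = 76800 halfword transfers this issues two chunks, 65535 and then 11265.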

Some (most?) LCD controllers let you define a rectangular area, and then transfers end up only in that area. This can be useful e.g. to use DMA to transfer a whole letter when displaying text.
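On the ILI9341 the rectangle is set with the Column Address Set (0x2A) and Page Address Set (0x2B) commands, each followed by a big-endian start and end coordinate, and pixel data then follows a Memory Write (0x2C) command. A minimal sketch of packing those four parameter bytes is shown here; the helper name ili9341_pack_range and the macro names are hypothetical, and actually sending the bytes is left to whatever SPI routine is in use.

```c
#include <assert.h>
#include <stdint.h>

#define ILI9341_CMD_CASET 0x2Au   /* Column Address Set */
#define ILI9341_CMD_PASET 0x2Bu   /* Page Address Set */
#define ILI9341_CMD_RAMWR 0x2Cu   /* Memory Write */

/* Pack an inclusive [start, end] coordinate range into the 4 parameter
 * bytes that follow CASET or PASET: start MSB, start LSB, end MSB, end LSB. */
static void ili9341_pack_range(uint16_t start, uint16_t end, uint8_t out[4])
{
    assert(start <= end);
    out[0] = (uint8_t)(start >> 8);
    out[1] = (uint8_t)(start & 0xFFu);
    out[2] = (uint8_t)(end >> 8);
    out[3] = (uint8_t)(end & 0xFFu);
}
```

For a single character cell, the window would be packed once for the columns and once for the rows, and the glyph bitmap could then go out as one DMA transfer well under the 65535 limit.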

JW

waclawek.jan · ‎2023-06-29

> Still doable if you cast things to uint32_t instead.

How? SPI is limited to 16-bit data, and the DMA count limit applies to the peripheral-side transfers, i.e. the number of transfers into SPI. Even if you swapped the direction and put SPI_DR on the memory side and the memory buffer on the peripheral side, and even though the dual-port DMA in the 'F4 *can* actually split words into two halfwords when the FIFO is switched on, upon a trigger it would transfer the word from the peripheral side (i.e. memory) to the FIFO, and with the FIFO threshold set to the lowest level it would in turn transfer *both* halfwords in rapid succession to the memory side (i.e. SPI_DR). While that is still somewhat feasible, as SPI Tx is double-buffered, it also means that you could not use the DMA trigger from SPI (as it is directly the TXE flag, which is raised while one of those two buffers, i.e. the transmit shift register, is still not emptied). It might be pulled off by triggering it from a well-calculated timer, but the whole contraption would be fragile and not worth the hassle at all... just to be able to fill one screenful in one DMA run...

Why am I writing all this? Because yes, I was thinking about the same yesterday, when replying to @KnarfB ... :‑)

JW