STM32L4 SPI transfer using HAL takes a long time (DMA and IT)

I am using an STM32L496 and implementing the SPI1 port to communicate with an external device.

I have used CubeMX to set things up with SPI_NSS_HARD_OUTPUT and an external pull-up on the SPI1_NSS pin. I have connected SPI1_MOSI to SPI1_MISO to read the transmitted data back.

When using HAL_SPI_TransmitReceive_DMA() or HAL_SPI_TransmitReceive_IT(), all the data is sent and received. The 32 bits are transferred within about 16us, as SPI1_SCK runs at 2MHz. SPI1_NSS goes low when __HAL_SPI_ENABLE() is called by HAL_SPI_TransmitReceive_DMA() or HAL_SPI_TransmitReceive_IT(). But after this it takes about 200us before SPI1_NSS goes high, which happens when __HAL_SPI_DISABLE() is called from within HAL_SPI_RxCpltCallback().
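As a sanity check on the 16us figure, here is a minimal sketch of the arithmetic. The function name spi_wire_time_us is made up for illustration and is not part of HAL. It only counts the time the bits spend on the wire, with no software overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire, in microseconds, for `bits` clocked out at `sck_hz`.
 * Hypothetical helper for illustration only; not a HAL function. */
uint32_t spi_wire_time_us(uint32_t bits, uint32_t sck_hz)
{
    assert(sck_hz != 0u);
    /* Use 64-bit math and round up so a partial microsecond is not lost. */
    return (uint32_t)(((uint64_t)bits * 1000000u + sck_hz - 1u) / sck_hz);
}
```

For 32 bits at 2MHz this gives 16, so almost all of the roughly 200us during which SPI1_NSS stays low is spent outside the actual transfer.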

Why is there such a long delay?

There is no difference between HAL_SPI_TransmitReceive_DMA() and HAL_SPI_TransmitReceive_IT().

When I use SPI_NSS_SOFT instead of SPI_NSS_HARD_OUTPUT, drive SPI1_NSS low before calling HAL_SPI_TransmitReceive_DMA() or HAL_SPI_TransmitReceive_IT(), and drive it high once HAL_SPI_GetState() no longer reports SPI1 as busy, I also see a delay of about 200us. Why?



This delay of about 200us is caused by the interrupt handling as implemented within HAL.

Part of it is the useless handling of the Half Complete interrupt (DMA_IT_HT). This interrupt can't be disabled beforehand: HAL_DMA_Start_IT() enables it whenever hdma->XferHalfCpltCallback is not NULL, and HAL_SPI_TransmitReceive_DMA() always sets that callback.
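The decision described above can be modelled roughly as follows. This is a simplified stand-alone model, not the HAL source: the type model_dma_t, the MODEL_IT_* flags and both functions are hypothetical names chosen for illustration, and the real driver writes to DMA channel registers instead of a plain field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interrupt flags standing in for DMA_IT_TC / DMA_IT_HT / DMA_IT_TE. */
#define MODEL_IT_TC (1u << 0)
#define MODEL_IT_HT (1u << 1)
#define MODEL_IT_TE (1u << 2)

/* Hypothetical, stripped-down stand-in for a DMA handle. */
typedef struct model_dma {
    void (*half_cplt_cb)(struct model_dma *dma); /* like XferHalfCpltCallback */
    uint32_t enabled_its;                        /* models the enable bits */
} model_dma_t;

/* Models the start routine: half-transfer is enabled only if a callback is set. */
void model_dma_start_it(model_dma_t *dma)
{
    assert(dma != NULL);
    if (dma->half_cplt_cb != NULL) {
        dma->enabled_its = MODEL_IT_TC | MODEL_IT_HT | MODEL_IT_TE;
    } else {
        dma->enabled_its = MODEL_IT_TC | MODEL_IT_TE;
    }
}

/* Placeholder callback, standing in for the one the SPI driver installs. */
void model_half_cplt(model_dma_t *dma)
{
    (void)dma;
}
```

Because the SPI driver installs the half-complete callback itself before starting the DMA, the first branch is always taken in this model, so the extra interrupt fires even for a 4-byte transfer.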

The largest part is simply the handling of the DMA_IT_TC interrupt within HAL.

My own code (as part of the interrupt handling) took less than 4us.