
STM32F4 data length assert failed for HAL_SPI_TransmitReceive_DMA

Jaroslaw Hill
Associate II

Hello,

I think I have an interesting case, but let me start from the beginning.

I use STM32F446ZEJx and HAL Drivers from STM32Cube_FW_F4_V1.28.3.

When I use 

HAL_StatusTypeDef HAL_SPI_TransmitReceive_DMA(SPI_HandleTypeDef *hspi, const uint8_t *pTxData, uint8_t *pRxData, uint16_t Size)

with Size=5, it first checks its arguments

if ((pTxData == NULL) || (pRxData == NULL) || (Size == 0U))
  return HAL_ERROR;

and then copies Size into the handle

/* Set the transaction information */
hspi->ErrorCode = HAL_SPI_ERROR_NONE;
hspi->pTxBuffPtr = (const uint8_t *)pTxData;
hspi->TxXferSize = Size;
hspi->TxXferCount = Size;
hspi->pRxBuffPtr = (uint8_t *)pRxData;
hspi->RxXferSize = Size;
hspi->RxXferCount = Size;

later it configures the DMA channels,

Rx

/* Enable the Rx DMA Stream/Channel  */
HAL_DMA_Start_IT(hspi->hdmarx, (uint32_t)&hspi->Instance->DR, (uint32_t)hspi->pRxBuffPtr, hspi->RxXferCount)

Tx

/* Enable the Tx DMA Stream/Channel  */
HAL_DMA_Start_IT(hspi->hdmatx, (uint32_t)hspi->pTxBuffPtr, (uint32_t)&hspi->Instance->DR, hspi->TxXferCount)

where there is another check of the DataLength parameter

HAL_StatusTypeDef HAL_DMA_Start_IT(DMA_HandleTypeDef *hdma, uint32_t SrcAddress, uint32_t DstAddress, uint32_t DataLength)

  /* Check the parameters */
  assert_param(IS_DMA_BUFFER_SIZE(DataLength));
  // #define IS_DMA_BUFFER_SIZE(SIZE) (((SIZE) >= 0x01U) && ((SIZE) < 0x10000U))

which triggers the assertion

Assert failed in file ../Drivers/STM32F4xx_HAL_Driver/Src/stm32f4xx_hal_dma.c on line 459
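
The range accepted by this check can be evaluated on a host machine. This is only a minimal sketch: the macro body is copied from the comment quoted above, and `dma_length_is_valid` is a hypothetical helper name, not part of the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the HAL comment quoted above. */
#define IS_DMA_BUFFER_SIZE(SIZE) (((SIZE) >= 0x01U) && ((SIZE) < 0x10000U))

/* Hypothetical helper: returns true when assert_param() would pass. */
static bool dma_length_is_valid(uint32_t data_length)
{
    return IS_DMA_BUFFER_SIZE(data_length);
}
```

A length of 5 passes and a length of 0 does not. Assuming the handle's transfer counters are 16-bit like the Size argument, they can never reach 0x10000, so zero is the only value that can make this assertion fail.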

And here is the problem: what could happen in the meantime?

The quite obvious answer is an interrupt that changed the structure fields

hspi->TxXferCount = 0;
hspi->RxXferCount = 0;

but the SPI and DMA interrupts are enabled at the end, after the configuration that failed.

The stack and heap are big enough, there are no nested interrupts, and a failed assert stops execution and waits for the watchdog reset.

 

The way I use the HAL library is summarized below. The same callback is registered for success and for error; on error I clean up using HAL_SPI_Abort. Is that enough?

HAL_SPI_RegisterCallback(self->Spi,HAL_SPI_TX_RX_COMPLETE_CB_ID,SpiCallback);
HAL_SPI_RegisterCallback(self->Spi,HAL_SPI_ERROR_CB_ID         ,SpiCallback);

// SpiCallback uses osSemaphoreRelease to resume the SpiTransaction

static U8 SpiTransaction(tTmc* const self, U8* const data, U16 size)
	{
	assert_param(self);
	if(HAL_SPI_TransmitReceive_DMA(self->Spi,data,data,size)!=HAL_OK)
		return 2;
	__HAL_DMA_DISABLE_IT(self->Spi->hdmarx,DMA_IT_HT); // mask half transfer interrupt
	if(osSemaphoreAcquire(self->SpiSync,5)!=osOK)
		{
		HAL_SPI_Abort(self->Spi);
		osSemaphoreAcquire(self->SpiSync,0);
		return 3;
		}
	return HAL_SPI_GetError(self->Spi);
	};

The problem happens once every few days; normally it works, and a transmission is executed every 200 ms.

Has anyone had a similar issue?

 

Best Regards

JH

1 ACCEPTED SOLUTION

Accepted Solutions

Hello,

One more comment about the DMA error, since previously I had only found a workaround.

Now I know what was causing the DMA error:

__HAL_DMA_DISABLE_IT(self->Spi->hdmarx,DMA_IT_HT); // mask half transfer interrupt

Modifying the RX DMA control register while the DMA TX and RX interrupts were enabled was the root cause of the race condition.

Any interrupt can fire at the moment when the RX DMA channel is disabled while the TX channel continues transmitting.

Doing it inside a critical section was OK, but in the end I decided that masking the half-transfer interrupt isn't worth the performance benefit.
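
One plausible way such a race can show up is a lost update, assuming the macro performs a plain read-modify-write on the stream control register. The sketch below is a host-side model, not HAL code: `sim_cr`, `sim_isr`, `sim_irq_window`, `mask_ht` and the bit positions are all hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR_EN   (1U << 0)  /* stream enable (illustrative bit position) */
#define CR_HTIE (1U << 3)  /* half-transfer interrupt enable (illustrative) */

static uint32_t sim_cr;         /* simulated DMA stream control register */
static bool     sim_irq_masked; /* simulated global interrupt mask */

/* Simulated ISR: disables the stream, as an abort or error path might. */
static void sim_isr(void)
{
    sim_cr &= ~CR_EN;
}

/* In this model an interrupt can only fire while interrupts are unmasked. */
static void sim_irq_window(void)
{
    if (!sim_irq_masked) {
        sim_isr();
    }
}

/* Read-modify-write with an interrupt window between the read and the write. */
static void mask_ht(bool use_critical_section)
{
    if (use_critical_section) {
        sim_irq_masked = true;
    }
    uint32_t tmp = sim_cr;      /* read */
    sim_irq_window();           /* an interrupt may land here */
    tmp &= ~CR_HTIE;            /* modify */
    sim_cr = tmp;               /* write back, possibly a stale value */
    if (use_critical_section) {
        sim_irq_masked = false;
        sim_irq_window();       /* the deferred interrupt runs afterwards */
    }
}
```

Without the critical section, the stale write-back sets the enable bit again after the simulated ISR cleared it. With it, the ISR's change survives. On the target, the masked region would typically be bracketed by an RTOS critical section or by saving and restoring the interrupt mask; that mapping is an assumption here, not something confirmed in this thread.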


Saket_Om
ST Employee

Hello @Jaroslaw Hill 

Did you try increasing the heap and stack?

I would recommend protecting the whole SPI transaction with a mutex. Since the HAL stores the transfer state inside SPI_HandleTypeDef, concurrent access from another task/context during start, timeout, or abort can corrupt that state and lead to rare failures like this. A mutex ensures that only one transaction can use the SPI handle at a time, which can eliminate this kind of race condition.

Mikk Leini
Senior III

There are strange things in the code:

1. Why do you disable DMA HT interrupt after starting transmission?

2. Why do you abort the SPI transfer after starting it? Abort is needed when a transfer times out or an error occurs (although it's a slightly gray area).

3. The semaphore is not released; instead, a second acquire is attempted when the first acquire failed. Very strange.

4. Why use semaphores on abort, but not on transfer? Is it possible that multiple tasks call SpiTransaction? That can create a lot of chaos. If SpiTransaction is called from one task only, then there is no need for semaphores or mutexes at all. If you have a single chip on the SPI bus, then a clean architecture would be to let only one task talk to that chip and handle thread safety at the task-to-task interface level.

5. HAL_SPI_GetError() returns 32-bit value, but SpiTransaction is U8.
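
Regarding point 5, a rough sketch of how the wrapper could report its own status separately from the full 32-bit error word. All names here (`TxnStatus`, `TxnResult`, `txn_from_hal_error`, `truncate_to_u8`) are hypothetical and not part of the HAL or of the original code.

```c
#include <stdint.h>

/* Hypothetical status codes for the wrapper; not HAL definitions. */
typedef enum {
    TXN_OK = 0,
    TXN_START_FAILED,
    TXN_TIMEOUT,
    TXN_HAL_ERROR
} TxnStatus;

/* Carries the wrapper status and the untruncated 32-bit HAL error word. */
typedef struct {
    TxnStatus status;
    uint32_t  hal_error;
} TxnResult;

/* Builds a result from the value returned by HAL_SPI_GetError(). */
static TxnResult txn_from_hal_error(uint32_t hal_error)
{
    TxnResult r;
    r.status    = (hal_error == 0U) ? TXN_OK : TXN_HAL_ERROR;
    r.hal_error = hal_error;
    return r;
}

/* What an 8-bit return type keeps of the same value. */
static uint8_t truncate_to_u8(uint32_t hal_error)
{
    return (uint8_t)hal_error;
}
```

With a U8 return type, any error bit above bit 7 is dropped, and an error word of 2 or 3 cannot be told apart from the wrapper's own `return 2` and `return 3`. Keeping the two values in separate fields avoids both problems.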

 

Jaroslaw Hill
Associate II

Hello,

thank you for your replies.

I checked the heap and the stack, they both are big enough and not corrupted.

All SPI operations are executed under a mutex, so there is no risk of concurrent access.

The semaphore is used to wake the executing task up as soon as possible after a completed transaction or after an error.

Finally, I figured out the scenario behind the failed assertion

assert_param(IS_DMA_BUFFER_SIZE(DataLength));

inside

HAL_SPI_TransmitReceive_DMA

I assumed that only one callback (complete or error) is invoked at the end of a transfer, but this is not true in the case of a DMA error or an SPI error.

A DMA transfer error can invoke both the transfer-complete and the transfer-error callbacks inside the same interrupt. That is not much of a problem, because my code doesn't run in between.

An SPI error triggers a deferred DMA abort in the interrupts for the RX and TX channels. Each channel invokes its own error callback, and between them there is time for my code to resume. Sometimes it starts a new transmission before the second error callback from the previous transmission arrives. That callback releases the semaphore again, so the semaphore is ready to acquire before the second transmission has finished. The third transmission then starts too early and triggers the assertion.

The simplest solution was to not use the error callback at all and to abort the transmission only after a long enough timeout.

Honestly speaking, I don't know how to use an error callback that resumes a task immediately without the risk of being notified twice.
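
The interleaving described above can be modelled on a host machine. This is a simplified sketch, not HAL code: `SimSpi` is a reduced stand-in for the handle, every `sim_*` function is hypothetical, and it assumes that the late abort path resets both transfer counters, as suspected in the original question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for the two counters in the SPI handle (not the real struct). */
typedef struct {
    uint16_t TxXferCount;
    uint16_t RxXferCount;
} SimSpi;

/* Models the bookkeeping at the top of HAL_SPI_TransmitReceive_DMA. */
static void sim_set_transaction(SimSpi *s, uint16_t size)
{
    s->TxXferCount = size;
    s->RxXferCount = size;
}

/* Assumed behaviour of a late abort/error callback from the previous
   transfer: it resets both counters. */
static void sim_late_abort_callback(SimSpi *s)
{
    s->TxXferCount = 0U;
    s->RxXferCount = 0U;
}

/* Models the length check on the value handed to HAL_DMA_Start_IT. */
static bool sim_dma_start_ok(const SimSpi *s)
{
    return (s->RxXferCount >= 0x01U) && ((uint32_t)s->RxXferCount < 0x10000U);
}

/* Returns true when the simulated start passes the length check. */
static bool sim_start(SimSpi *s, uint16_t size, bool stale_callback_fires)
{
    sim_set_transaction(s, size);
    if (stale_callback_fires) {
        sim_late_abort_callback(s); /* lands between bookkeeping and DMA start */
    }
    return sim_dma_start_ok(s);
}
```

In this model a start with Size=5 passes when no stale callback interferes and fails when one lands between the bookkeeping and the DMA start, which matches the observed assertion.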

 
