
STM32U575 FMAC DMA wrong amount of bytes


I have experimented with STM32U575Z using FMAC.

I run the FMAC doing a convolution (sFmacConfig.Filter = FMAC_FUNC_CONVO_FIR)

I run the FMAC in DMA mode.

I registered a HAL_FMAC_OUTPUT_DATA_READY callback.

But it is never triggered.

According to AN5593 Rev 4, GPDMA1 should be set to "half word" transfers for the FMAC to work.

When reading through the source code in the file "stm32u5xx_hal_fmac.c", I found this:

/**
  * @brief  Register the new output buffer, update DMA configuration if needed and change the FMAC state.
  * @param  hfmac pointer to a FMAC_HandleTypeDef structure that contains
  *         the configuration information for FMAC module.
  * @param  pOutput New output vector.
  * @param  pOutputSize Size of the output vector (if the vector can't
  *         be entirely filled, pOutputSize will be updated with the number
  *         of data read from FMAC).
  * @retval HAL_StatusTypeDef HAL status
  */
static HAL_StatusTypeDef FMAC_ConfigFilterOutputBufferUpdateState(FMAC_HandleTypeDef *hfmac, int16_t *pOutput,
                                                                  uint16_t *pOutputSize)
{
  HAL_StatusTypeDef status;
  /* Reset the current size */
  hfmac->OutputCurrentSize = 0U;
  /* ... lines omitted in this excerpt ... */
      status = HAL_DMA_Start_IT(hfmac->hdmaOut, (uint32_t)&hfmac->Instance->RDATA, \
                                (uint32_t)pOutput, (uint32_t)(4UL * (*pOutputSize)));
  /* ... lines omitted in this excerpt ... */
}

In line 2270 of that file (the HAL_DMA_Start_IT call in this snippet), the DMA is set up with 4 times the size of the output vector: (4UL * (*pOutputSize)).

The code is evidently written for a fixed data size of 32 bits per DMA transfer.

But the FMAC is designed for 16-bit transfers.

As a result, the DMA_complete_Callback is never called.

Is this wrong? Should the number of bytes transferred here be changed to 2 * pOutputSize? Or do I have to change the DMA transfer width to 32 bits (contrary to the recommendation of AN5593)?
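
The mismatch described above can be checked with plain arithmetic. The following is only a sketch, not HAL code: the helper names are hypothetical, and it assumes that the length passed to HAL_DMA_Start_IT is a byte count and that each FMAC output sample is moved as one 16-bit half-word.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of the HAL. */

/* Byte count the HAL programs for n output samples
   (the 4UL * (*pOutputSize) expression quoted above). */
static uint32_t dma_bytes_programmed(uint16_t n_samples)
{
  return (uint32_t)(4UL * n_samples);
}

/* Bytes actually moved when each sample is one 16-bit half-word transfer. */
static uint32_t dma_bytes_produced(uint16_t n_samples)
{
  return (uint32_t)(2UL * n_samples);
}

/* Returns 1 if the produced bytes reach the programmed count,
   i.e. the DMA transfer could complete; 0 otherwise. */
static int dma_would_complete(uint16_t n_samples)
{
  return dma_bytes_produced(n_samples) >= dma_bytes_programmed(n_samples);
}
```

Under these assumptions, for any non-zero output size the DMA is left waiting for the second half of the programmed byte count, which would match the callback never firing.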

ST Employee

Dear Johannes,

Did you check the FMAC example project provided with STM32CubeU5 in ~\Projects\NUCLEO-U575ZI-Q\Examples\FMAC?

It demonstrates the following:

  • Polling mode is used to transfer input data from memory to the FMAC peripheral.
  • DMA mode is used to transfer output data from FMAC peripheral to memory, so that CPU is offloaded.

Tell me if it answers your question, otherwise we need to investigate in depth the implementation of the



Yes, I took that example as the base for my code.

I use DMA for FMAC data input and output. My data is longer than the FMAC can hold in its internal memory (752 input points, 111 coefficients).

The result is correct.

The "only" issue is that the DMA for the output is set up with a byte count twice as high as necessary, because the number of bytes given to the DMA is calculated as "output vector length" times 4 (32-bit).

But the DMA is configured for 16-bit transfers.

Typical DMA start sequences (for example, in the DAC driver) look like this:

      /* Length should be converted to number of bytes */
      if (hdac->DMA_Handle1->Init.SrcDataWidth == DMA_SRC_DATAWIDTH_WORD)
      {
        /* Word -> Bytes */
        LengthInBytes = Length * 4U;
      }
      else if (hdac->DMA_Handle1->Init.SrcDataWidth == DMA_SRC_DATAWIDTH_HALFWORD)
      {
        /* Halfword -> Bytes */
        LengthInBytes = Length * 2U;
      }
      else /* Bytes */
      {
        /* Same size already expressed in Bytes */
        LengthInBytes = Length;
      }
      /* Enable the DMA channel */
      status = HAL_DMA_Start_IT(hdac->DMA_Handle1, (uint32_t)pData, tmpreg, LengthInBytes);

So it checks which data transfer width is selected and calculates the number of bytes for the DMA accordingly.
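
The same width-dependent conversion can be written as a small standalone helper. This is a minimal sketch for illustration: `transfer_width_t` and `length_in_bytes` are hypothetical names, not HAL types or functions, and stand in for the SrcDataWidth check in the DAC code above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the DMA source data width setting;
   this is not the HAL's own type. */
typedef enum
{
  WIDTH_BYTE,
  WIDTH_HALFWORD,
  WIDTH_WORD
} transfer_width_t;

/* Convert a number of elements into a number of bytes,
   depending on the configured transfer width. */
static uint32_t length_in_bytes(uint32_t n_elements, transfer_width_t width)
{
  switch (width)
  {
    case WIDTH_WORD:
      return n_elements * 4U;   /* Word -> Bytes */
    case WIDTH_HALFWORD:
      return n_elements * 2U;   /* Halfword -> Bytes */
    default:
      return n_elements;        /* Already expressed in bytes */
  }
}
```

With a half-word configuration, a vector of 641 samples would give 1282 bytes rather than the 2564 bytes that a fixed factor of 4 produces.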

Unfortunately, for the FMAC DMA output this is fixed to the vector size times 4:

      status = HAL_DMA_Start_IT(hfmac->hdmaOut, (uint32_t)&hfmac->Instance->RDATA, \
                                (uint32_t)pOutput, (uint32_t)(4UL * (*pOutputSize)));

At the moment, I work around this issue as follows:

	FMAC_FilterConfigTypeDef sFmacConfig;
	/* Set the coefficient buffer base address */
	sFmacConfig.CoeffBaseAddress = 0;
	/* Set the coefficient buffer size to the number of coeffs */
	sFmacConfig.CoeffBufferSize = COEF_NUM;
	/* Set the Input buffer base address to the next free address */
	sFmacConfig.InputBaseAddress = COEF_NUM;
	/* Set the input buffer size greater than the number of coeffs */
	sFmacConfig.InputBufferSize = COEF_NUM;
	/* Set the input watermark to zero since we are using DMA */
	sFmacConfig.InputThreshold = 0;
	/* Set the Output buffer base address to the next free address */
	sFmacConfig.OutputBaseAddress = 2*COEF_NUM;
	/* Set the output buffer size */
	sFmacConfig.OutputBufferSize = 256-(2*COEF_NUM);
	/* Set the output watermark to zero since we are using DMA */
	sFmacConfig.OutputThreshold = 0;
	/* No A coefficients since FIR */
	sFmacConfig.pCoeffA = NULL;
	sFmacConfig.CoeffASize = 0;
	/* Pointer to the coefficients in memory */
	sFmacConfig.pCoeffB = convol_coef;
	/* Number of coefficients */
	sFmacConfig.CoeffBSize = COEF_NUM;
	/* Select FIR filter function */
	sFmacConfig.Filter = FMAC_FUNC_CONVO_FIR;
	/* Enable DMA input transfer */
	sFmacConfig.InputAccess = FMAC_BUFFER_ACCESS_DMA;
	/* Enable DMA output transfer */
	sFmacConfig.OutputAccess = FMAC_BUFFER_ACCESS_DMA;
	/* Enable clipping of the output at 0x7FFF and 0x8000 */
	sFmacConfig.Clip = FMAC_CLIP_ENABLED;
	/* P parameter contains number of coefficients */
	sFmacConfig.P = COEF_NUM;
	/* Q parameter is not used */
	sFmacConfig.Q = 0; //not used
	/* R parameter contains the post-shift value (none) */
	sFmacConfig.R = 0;
	/* Configure the FMAC */
	if (HAL_FMAC_FilterConfig(&hfmac, &sFmacConfig) != HAL_OK) Error_Handler();
	/* Workaround for HAL_FMAC_FilterStart: the DMA length is multiplied by 4, but the transfer is 16-bit, so the byte count is 2 times too high */
	uint16_t outputsize = (SIG_NUM - COEF_NUM) / 2;
	if (HAL_FMAC_FilterStart(&hfmac, outputvalues, &outputsize) != HAL_OK) Error_Handler();
	uint16_t inputsize = SIG_NUM;
	if (HAL_FMAC_AppendFilterData(&hfmac, convol_data, &inputsize) != HAL_OK) Error_Handler();

In the line that sets outputsize, I divide the number of output values by two as a workaround.
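
The effect of the division by two can be worked through numerically. The sketch below assumes that SIG_NUM and COEF_NUM equal the 752 input points and 111 coefficients mentioned earlier, and that the DMA moves one 16-bit half-word per sample; the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, taken from the sizes mentioned in the post. */
#define SIG_NUM  752U
#define COEF_NUM 111U

/* Size passed to HAL_FMAC_FilterStart by the workaround. */
static uint16_t workaround_output_size(void)
{
  return (uint16_t)((SIG_NUM - COEF_NUM) / 2U);
}

/* Number of half-word samples moved by the time the DMA
   reaches the programmed count of 4 * size bytes. */
static uint32_t samples_transferred(uint16_t size)
{
  return (uint32_t)((4UL * size) / 2UL);
}
```

With these values SIG_NUM - COEF_NUM is 641, which is odd, so the integer division yields 320 and the DMA would complete after 640 samples, one fewer than 641. For an even difference the workaround transfers exactly the intended number of samples.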