What is a good buffering method for real-time DSP (I2S, DMA)?

Mnemocron · ‎2019-12-02

I have a custom board with an STM32F412re and a TLV320aic Codec, where the Codec is in Master mode and the I2S2 Interface of the STM32 in Full Duplex Slave mode.

So far I have the I2S2 interface configured to send and receive the data over DMA.

On DMA transfer complete Interrupt I copy the input buffer (uint16_t) to the output buffer.

This way I have a passthrough of the audio signal.

I have now also implemented the float32_t FIR example from CMSIS/DSP. I have verified that the FIR filter works (the output amplitude shows a low-pass characteristic). However, I am seeing (and hearing) gaps in the audio signal.

(blue: Input, 1Vpp, 1kHz. green/pink: Line Out L/R)

I suspect that this happens because I execute the FIR filter inside the DMA ISR, which is not the way to do it. The next step is to implement a buffering method so that the FIR filter (not an FFT, as in the comments) runs in the main loop.

How can this be achieved? What is good practice? Is there an app-note or a white paper explaining how to implement a buffer for this application?

My first approach (main and interrupt):

// main.c
 
 
/* USER CODE BEGIN PV */
// four alternating buffers
uint16_t pTxRxData_A[DSP_BUFFERSIZE];
uint16_t pTxRxData_B[DSP_BUFFERSIZE];
uint16_t pTxRxData_C[DSP_BUFFERSIZE];
uint16_t pTxRxData_D[DSP_BUFFERSIZE];
 
// a merry-go-round for the four buffers, holding a pointer to the first element of each
volatile uint16_t* buffer_merry_go_around[4] = {pTxRxData_A, pTxRxData_B, pTxRxData_C, pTxRxData_D};
volatile uint8_t pDataIndex_Rx  = 3;
volatile uint8_t pDataIndex_DSP_in  = 2;   // increment to get Rx buffer
volatile uint8_t pDataIndex_DSP_out = 1;   // 
volatile uint8_t pDataIndex_Tx  = 0;   // increment to get DSP buffer
volatile uint8_t newDataReadyFlag = 0; // ISR Flag to main
 
/* USER CODE BEGIN 3 */
/* Use MAIN Loop to process buffered Data */
if(newDataReadyFlag){
	newDataReadyFlag = 0;
	DSP_Process_Data((uint16_t*)buffer_merry_go_around[pDataIndex_DSP_in], (uint16_t*)buffer_merry_go_around[pDataIndex_DSP_out], DSP_BLOCK_SIZE);
}

// stm32f4xx_it.c
 
/**
  * @brief This function handles DMA1 stream4 global interrupt.
  */
void DMA1_Stream4_IRQHandler(void)
{
  /* USER CODE BEGIN DMA1_Stream4_IRQn 0 */
	dmaTransferComplete ++;
 
  /* USER CODE END DMA1_Stream4_IRQn 0 */
  HAL_DMA_IRQHandler(&hdma_spi2_tx);
  /* USER CODE BEGIN DMA1_Stream4_IRQn 1 */
	HAL_I2S_DMAStop(&hi2s2);
	pDataIndex_Rx = (pDataIndex_Rx+1) % 4;
	pDataIndex_Tx = (pDataIndex_Tx+1) % 4;
	pDataIndex_DSP_in = (pDataIndex_DSP_in+1) % 4;
	pDataIndex_DSP_out = (pDataIndex_DSP_out+1) % 4;
	
	HAL_I2S_DMAStop(&hi2s2);
	HAL_I2SEx_TransmitReceive_DMA(&hi2s2, (uint16_t*)buffer_merry_go_around[pDataIndex_Tx], (uint16_t*)buffer_merry_go_around[pDataIndex_Rx], DSP_BUFFERSIZE);
	newDataReadyFlag ++;
}

This turns out to be even worse. The passthrough now has gaps too, and the FIR amplitude stays the same. This is why I suspect that the buffers are getting mixed up somewhere (e.g. the DMA output buffer is not the FIR output).

My second approach is similar, with two alternating buffer arrays[2].

Both approaches can be found here:

https://github.com/mnemocron/P5-DSP-Board-mdk/tree/rotating-buffers/Src

https://github.com/mnemocron/P5-DSP-Board-mdk/tree/switching-buffers/Src

Mnemocron · ‎2019-12-04

The input and output speed are the same. The TLV320 is in Master mode and outputs a 12.288MHz clock to the STM32. Rx/Tx happens at the same time.

The goal is to implement an FIR Filter (as opposed to an FFT in the replies to my question) so I need to modify the data.

I was now able to implement the DMA buffer with an interrupt on half-full. Another issue was that I had set up the FIR filter with the wrong parameters.

The DSP routine (including the FIR) is still called in the ISR, since its processing time is much less than the buffer length times the sampling period.

DMA runs on pRxData[2 * DSP_BUFFERSIZE] and pTxData[2 * DSP_BUFFERSIZE]:

void DMA1_Stream3_IRQHandler(void)
{
  /* USER CODE BEGIN DMA1_Stream3_IRQn 0 */
  /* DMA STREAM 3 is I2S2 RX complete */
  dmaTransferComplete ++;
 
  /* USER CODE END DMA1_Stream3_IRQn 0 */
  HAL_DMA_IRQHandler(&hdma_i2s2_ext_rx);
  /* USER CODE BEGIN DMA1_Stream3_IRQn 1 */
 
  // arm_fir_f32() is called in this function
  DSP_Process_Data( (pRxData + buffer_offset), (pTxData + buffer_offset), DSP_BUFFERSIZE);
 
  if(!buffer_offset)
    buffer_offset = DSP_BUFFERSIZE;
  else
    buffer_offset = 0;
 
  /* USER CODE END DMA1_Stream3_IRQn 1 */
}
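
The post does not say which FIR parameters were wrong, so the following is only a sketch of one thing block-wise FIR filtering depends on: the filter history has to survive from one block to the next, otherwise every block starts with a transient. In CMSIS-DSP that history lives in the state buffer passed to arm_fir_init_f32(), whose length is numTaps + blockSize - 1. The plain-C version below loosely imitates that layout. NUM_TAPS, BLOCK_SIZE, fir_state, fir_coeffs and fir_block() are hypothetical names, and the three coefficients are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

#define NUM_TAPS   3u   /* hypothetical filter length */
#define BLOCK_SIZE 4u   /* hypothetical samples per block */

/* Last NUM_TAPS-1 inputs of the previous block, followed by the current block. */
static float fir_state[NUM_TAPS - 1u + BLOCK_SIZE];

/* Made-up low-pass coefficients with unity DC gain. */
static const float fir_coeffs[NUM_TAPS] = { 0.25f, 0.5f, 0.25f };

/* Filter one block; the history is kept in fir_state between calls. */
void fir_block(const float *in, float *out)
{
    /* Append the new block after the saved history. */
    memcpy(&fir_state[NUM_TAPS - 1u], in, BLOCK_SIZE * sizeof(float));

    for (size_t n = 0; n < BLOCK_SIZE; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < NUM_TAPS; k++) {
            /* x[n - k] is stored at index (NUM_TAPS - 1 + n - k). */
            acc += fir_coeffs[k] * fir_state[NUM_TAPS - 1u + n - k];
        }
        out[n] = acc;
    }

    /* Keep the newest NUM_TAPS-1 inputs as history for the next call. */
    memmove(&fir_state[0], &fir_state[BLOCK_SIZE],
            (NUM_TAPS - 1u) * sizeof(float));
}
```

Feeding a constant input of 4.0 twice shows the effect: the first block ramps up (1.0, 3.0, 4.0, 4.0) because the history starts at zero, while the second block starts directly at 4.0 because the history was kept.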

Ozone · ‎2019-12-02

I can't comment on Cube/HAL code. I have my reasons for not using it ...

But I have had somewhat similar projects involving a realtime FFT.

Audio was sampled at 44.1 kHz and copied via DMA into an FFT-sized sample buffer.

The DMA was configured for a TC (transfer complete) interrupt, which fires when the buffer is full.

The TC interrupt code copied the data to the actual FFT buffer, perhaps doing a simple data conversion (int->float) on the fly, and set a ready flag.

The main loop did the FFT conversion using the FFT buffer, while the next sample buffer was already being filled simultaneously.
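
The scheme described above (convert and copy in the interrupt, process in the main loop) can be sketched roughly as follows. This is a host-side illustration, not the original SPL code: FRAME_LEN, dma_buf, work_buf, frame_ready, on_dma_complete(), main_loop_poll() and the stand-in process_frame() are all hypothetical, and the peak detector merely takes the place of the FFT.

```c
#include <stdint.h>
#include <stddef.h>

#define FRAME_LEN 8u                       /* hypothetical frame length */

static int16_t dma_buf[FRAME_LEN];          /* would be filled by the DMA */
static float work_buf[FRAME_LEN];           /* private copy for the main loop */
static volatile uint8_t frame_ready = 0;    /* set in the ISR, cleared in main */
static float last_peak = 0.0f;              /* result of the stand-in processing */

/* Body of the transfer-complete interrupt: convert, copy, flag. */
void on_dma_complete(void)
{
    for (size_t i = 0; i < FRAME_LEN; i++) {
        work_buf[i] = (float)dma_buf[i] / 32768.0f;   /* int16 -> [-1, 1) */
    }
    frame_ready = 1;
}

/* Stand-in for the real processing: find the peak magnitude. */
static void process_frame(const float *buf, size_t n)
{
    float peak = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = (buf[i] < 0.0f) ? -buf[i] : buf[i];
        if (a > peak) {
            peak = a;
        }
    }
    last_peak = peak;
}

/* One pass of the main loop; returns 1 if a frame was processed. */
int main_loop_poll(void)
{
    if (!frame_ready) {
        return 0;
    }
    frame_ready = 0;
    process_frame(work_buf, FRAME_LEN);
    return 1;
}
```

Note the limit of this simple version: the main loop has to finish with work_buf before the next transfer-complete interrupt overwrites it, so the processing time still has to fit within one frame period.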

The code is based on the SPL (Standard Peripheral Library).

Not sure if you need a float FIR implementation. The CMSIS DSP lib has integer-based (fixed-point) implementations as well, I believe.

Danish1 · ‎2019-12-02

I've used DMA to a circular buffer of length exactly twice the amount of data I choose to process in one chunk.

When I get the DMA half-transfer interrupt I know the first half has good data (and the second half is filling up). So I can do what I want with the first-half data.

When I get the DMA transfer complete interrupt I know the second half has good data. As I set the buffer to circular, I know the first-half is now filling up while I process the second-half data.
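
As a rough illustration of the two paragraphs above, here is a minimal host-side sketch of the half-buffer scheme. The names rx_buf, tx_buf, BLOCK_SIZE, process_block(), on_half_transfer() and on_transfer_complete() are hypothetical, and halving the amplitude is only a placeholder for real processing. On the target, the two handlers would be called from the DMA half-transfer and transfer-complete interrupts (with the F4 HAL these are probably reached through HAL_I2SEx_TxRxHalfCpltCallback and HAL_I2SEx_TxRxCpltCallback, but check your HAL version).

```c
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 4u   /* samples per half; tiny value for illustration only */

/* Circular DMA buffers, each exactly two blocks long. */
static int16_t rx_buf[2 * BLOCK_SIZE];
static int16_t tx_buf[2 * BLOCK_SIZE];

/* Placeholder processing step: halve the amplitude of each sample. */
static void process_block(const int16_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (int16_t)(in[i] / 2);
    }
}

/* Half-transfer: the first half holds fresh data, the second is filling. */
void on_half_transfer(void)
{
    process_block(&rx_buf[0], &tx_buf[0], BLOCK_SIZE);
}

/* Transfer complete: the second half holds fresh data, the first is filling. */
void on_transfer_complete(void)
{
    process_block(&rx_buf[BLOCK_SIZE], &tx_buf[BLOCK_SIZE], BLOCK_SIZE);
}
```

Because the DMA runs in circular mode, nothing here ever stops or restarts a transfer; each handler only touches the half that the DMA is not currently using.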

If you want to use more than 2 buffers, then you should use the double-buffer* mode so DMA can continue while you are updating the pointers for the currently-inactive buffer. Sorry I don't know how to do it with HAL - I generally go direct to registers because I find the Reference Manual better documented than HAL.

*Present on stm32f415 so I assume '412 has it as well.

Hope this helps,

Danish

waclawek.jan · ‎2019-12-02

> On DMA transfer complete Interrupt I copy the input buffer (uint16_t) to the output buffer.

> This way I have a passthrough of the audio signal.

Do you have to modify the data during the copy? Do the input and output devices run on different clocks?

If both answers are "no", then forget about copying the data, just start two circular DMAs of the same length, one receiving, one transmitting, into the same buffer.

That's it, and you can completely forget about the stuff then.

If you need to modify the data - e.g. amplify/attenuate - it's still two circular DMAs, but to/from two buffers, with the copy in the half-transfer interrupt, as Danish said above. There's again no reason to stop/start the DMAs ever.

If the clocks differ, that's the most tricky one - but only in the math involved in the copy; it's still two circular DMAs, never to be stopped/restarted.

In the latter two cases, the copy and/or processing performed in the half-full interrupt needs to be carefully benchmarked so that it always fits into the slot given by the data rate. The impact of higher-priority interrupts MUST be included in its worst-case scenario. Calculating/measuring execution times and latencies is a basic skill of the mcu programmer.
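
To put a number on "the slot given by the data rate": one half-buffer of N stereo frames at sample rate fs has to be processed within N / fs seconds. The small helper below is a sketch of that arithmetic. The function names are hypothetical, and the figures used afterwards (128 frames, 48 kHz, 100 MHz core clock) are assumptions for illustration, not values confirmed in this thread.

```c
#include <stdint.h>

/* Time available to process one block, in microseconds (rounded down). */
uint32_t block_budget_us(uint32_t frames_per_block, uint32_t sample_rate_hz)
{
    return (uint32_t)(((uint64_t)frames_per_block * 1000000u) / sample_rate_hz);
}

/* The same budget expressed in CPU cycles (rounded down). */
uint32_t block_budget_cycles(uint32_t frames_per_block,
                             uint32_t sample_rate_hz,
                             uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)frames_per_block * cpu_hz) / sample_rate_hz);
}

/* Returns 1 if the measured worst case (handler plus any higher-priority
 * interrupts that can preempt it) fits into the budget, 0 otherwise. */
int fits_budget(uint32_t worst_case_cycles, uint32_t budget_cycles)
{
    return worst_case_cycles <= budget_cycles;
}
```

With the assumed figures, block_budget_us(128, 48000) gives 2666 microseconds and block_budget_cycles(128, 48000, 100000000) gives 266666 cycles. Count stereo frames here, not 16-bit words: an interleaved L/R buffer of 256 words holds only 128 frames.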

In any case, FFT just picks data from the input buffer to its own as Ozone said (unless this step can be omitted and data can be processed directly as they arrive, in which case you simply leave them in the inbound buffer), and processes - this can be triggered by the interrupt and processed elsewhere. If FFT can't keep up with the data rate, then it's just that data which is corrupted, not the feedthrough.

JW

Mnemocron · ‎2019-12-04

DMA Half-Transfer Interrupt was what I needed. Thank you for pointing this out.

waclawek.jan · ‎2019-12-04

So problem solved?

If so, please select your post as Best so that the thread is marked as solved.

JW