
Problem with I2S TX / RX Circular DMA on STM32H743ZI2 board

JOatl
Associate II

I have a problem with DMA on this board that I did not have on an F407 running Cube-generated code.

I2S2 is configured as a half-duplex master and I2S3 as a half-duplex slave (fed by the clock and WS of I2S2). The plan was to have a MEMS microphone on the slave (I2S3) and a DAC on the master (I2S2).

The I/O buffers for the transfers are placed in D2 RAM. The D-cache is invalidated for the RX buffer before starting the circular receive DMA, and cleaned for the TX buffer before starting the circular transmit DMA.
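A minimal sketch of the range arithmetic behind that cache maintenance, assuming the Cortex-M7's 32-byte D-cache line. dcache_line_range is a hypothetical helper, not a CMSIS function: it widens a buffer's address range to whole cache lines before the result is handed to SCB_InvalidateDCache_by_Addr or SCB_CleanDCache_by_Addr (depending on the CMSIS version, those functions may already do this rounding internally). Widening an invalidate also discards anything else that shares the first or last line, which is why the buffers themselves should be line-aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 D-cache line size on the STM32H7, in bytes. */
#define DCACHE_LINE 32u

/* Hypothetical helper: widen [addr, addr + size) to whole cache lines so a
 * clean/invalidate by address covers every byte of the buffer. */
static void dcache_line_range(uintptr_t addr, size_t size,
                              uintptr_t *start, size_t *len)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE - 1u);
    uintptr_t first = addr & ~mask;                /* round start down to a line */
    uintptr_t end = (addr + size + mask) & ~mask;  /* round end up to a line */
    *start = first;
    *len = (size_t)(end - first);
}
```

On the target this would be used as, for example, dcache_line_range((uintptr_t)rxBuf, sizeof(rxBuf), &start, &len) followed by SCB_InvalidateDCache_by_Addr((uint32_t *)start, (int32_t)len).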

This is where I have been having issues. I have a tried-and-tested (on the F407) FIFO setup in a while loop that grabs received data and feeds it through to TX. The problem is that the DMA appears to block and disrupt the while loop and the callbacks, causing unexpected orders of code execution and intermittent hanging.
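For reference, a sketch of the kind of block FIFO described above, assuming 64-sample blocks (one DMA half-buffer) and a power-of-two FIFO length. The names sample_fifo, fifo_push_block and fifo_pop_block are hypothetical. Free-running 32-bit counters make the fill level a plain subtraction that stays correct across wrap-around, and the pop side outputs silence on underrun instead of replaying stale samples. As in the original code, both sides are meant to run from the main loop; if either moves into an ISR, the counters need to be volatile and updated only after the memcpy.

```c
#include <stdint.h>
#include <string.h>

#define FIFO_LEN  256u  /* samples; must be a power of two */
#define BLOCK_LEN  64u  /* samples per DMA half-buffer (assumption) */

typedef struct {
    uint16_t data[FIFO_LEN];
    uint32_t w;  /* total samples written (free-running) */
    uint32_t r;  /* total samples read (free-running) */
} sample_fifo;

/* Samples currently stored; unsigned subtraction handles counter wrap. */
static uint32_t fifo_level(const sample_fifo *f) { return f->w - f->r; }

/* Copy one block in. Because w is always a multiple of BLOCK_LEN and FIFO_LEN
 * is a multiple of BLOCK_LEN, a block never straddles the end of data[].
 * Returns 0 and drops the block if the FIFO is full. */
static int fifo_push_block(sample_fifo *f, const uint16_t *src)
{
    if (FIFO_LEN - fifo_level(f) < BLOCK_LEN) return 0;
    memcpy(&f->data[f->w & (FIFO_LEN - 1u)], src, BLOCK_LEN * sizeof *src);
    f->w += BLOCK_LEN;
    return 1;
}

/* Copy one block out; on underrun, write silence and return 0. */
static int fifo_pop_block(sample_fifo *f, uint16_t *dst)
{
    if (fifo_level(f) < BLOCK_LEN) {
        memset(dst, 0, BLOCK_LEN * sizeof *dst);
        return 0;
    }
    memcpy(dst, &f->data[f->r & (FIFO_LEN - 1u)], BLOCK_LEN * sizeof *dst);
    f->r += BLOCK_LEN;
    return 1;
}
```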

The I and D caches are enabled. Could this be a cache maintenance problem? If I disable the caches, I still have issues.

__SECTION_RAM_D2
uint16_t txBuf[128];
__SECTION_RAM_D2
uint16_t rxBuf[128];

// Written by the DMA callbacks and polled in main(), so they must be volatile.
volatile uint8_t txstate = 0;
volatile uint8_t rxstate = 0;

#define LED_TIMEOUT (PCM_SAMPLE_RATE / (128 / 4)) // 1 second.

uint32_t LED3Timeout = 0;
uint32_t LED4Timeout = 0;
uint32_t LED5Timeout = 0;
uint32_t LED6Timeout = 0;

uint16_t fifobuf[256];
uint8_t fifo_w_ptr = 0; // uint8_t indices wrap at 256, matching fifobuf's length.
uint8_t fifo_r_ptr = 0;
uint8_t fifo_read_enabled = 0;
 
 
 
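int main(void)
{
  SCB_EnableICache();
  SCB_EnableDCache();

  HAL_Init();
  SystemClock_Config();

  MX_GPIO_Init();
  MX_DMA_Init();
  MX_I2S2_Init();
  MX_I2S3_Init();

  // Size is counted in samples of the configured I2S data width. Check that it
  // spans the whole 128-entry buffer, since the loop below treats [0..63] and
  // [64..127] as the halves reported by the half/full-complete callbacks.
  SCB_InvalidateDCache_by_Addr((uint32_t*)(((uint32_t)rxBuf) & ~(uint32_t)0x1F), sizeof(rxBuf));
  HAL_I2S_Receive_DMA(&hi2s3, &rxBuf[0], 64);

  SCB_CleanDCache_by_Addr((uint32_t*)(((uint32_t)txBuf) & ~(uint32_t)0x1F), sizeof(txBuf));
  HAL_I2S_Transmit_DMA(&hi2s2, &txBuf[0], 64);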
 
  while (1)
  {
    if (txstate == 1)
    {
      if (fifo_read_enabled == 1)
      {
        // 128 bytes = 64 samples = first half of txBuf.
        memcpy(&txBuf[0], &fifobuf[fifo_r_ptr], 128);
        fifo_r_ptr += 64;
      }
      SCB_CleanDCache_by_Addr((uint32_t*)(((uint32_t)txBuf) & ~(uint32_t)0x1F), sizeof(txBuf) / 2);
      txstate = 0;
    }

    if (rxstate == 1)
    {
      SCB_InvalidateDCache_by_Addr((uint32_t*)(((uint32_t)rxBuf) & ~(uint32_t)0x1F), sizeof(rxBuf) / 2);
      memcpy(&fifobuf[fifo_w_ptr], &rxBuf[0], 128);
      fifo_w_ptr += 64;
      if (fifo_w_ptr - fifo_r_ptr > 128)
      {
        fifo_read_enabled = 1;
      }
      rxstate = 0;
    }

    if (txstate == 2)
    {
      if (fifo_read_enabled == 1)
      {
        // 128 bytes = 64 samples = second half of txBuf.
        memcpy(&txBuf[64], &fifobuf[fifo_r_ptr], 128);
        fifo_r_ptr += 64;
      }
      SCB_CleanDCache_by_Addr((uint32_t*)(((uint32_t)&txBuf[64]) & ~(uint32_t)0x1F), sizeof(txBuf) / 2);
      txstate = 0;
    }

    if (rxstate == 2)
    {
      SCB_InvalidateDCache_by_Addr((uint32_t*)(((uint32_t)&rxBuf[64]) & ~(uint32_t)0x1F), sizeof(rxBuf) / 2);
      memcpy(&fifobuf[fifo_w_ptr], &rxBuf[64], 128);
      fifo_w_ptr += 64;
      rxstate = 0;
    }
  }
}

Here are the callbacks:

void HAL_I2S_TxHalfCpltCallback(I2S_HandleTypeDef *hi2s)
{
	
	// Error LED.
	if (LED3Timeout > 0)
	{
		LED3Timeout--;
	}
	if (txstate != 0)
	{
		HAL_GPIO_WritePin(LD1_GPIO_Port, LD1_Pin, GPIO_PIN_SET);
		LED3Timeout = LED_TIMEOUT;
	}
	else if (LED3Timeout == 0)
	{
		HAL_GPIO_WritePin(LD1_GPIO_Port, LD1_Pin, GPIO_PIN_RESET);
	}
	
	// Ready to transmit
	txstate = 1;
	
}
void HAL_I2S_RxHalfCpltCallback(I2S_HandleTypeDef *hi2s)
{
	
	// Error LED.
	if (LED4Timeout > 0)
	{
		LED4Timeout--;
	}
	if (rxstate != 0)
	{
		HAL_GPIO_WritePin(LD1_GPIO_Port, LD1_Pin, GPIO_PIN_SET);
		LED4Timeout = LED_TIMEOUT;
	}
	else if (LED4Timeout == 0)
	{
		HAL_GPIO_WritePin(LD1_GPIO_Port, LD1_Pin, GPIO_PIN_RESET);
	}
	
	// Ready to receive
	rxstate = 1;
	
}
void HAL_I2S_TxCpltCallback(I2S_HandleTypeDef *hi2s)
{
	
	// Error LED.
	if (LED5Timeout > 0)
	{
		LED5Timeout--;
	}
	if (txstate != 0)
	{
		HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_SET);
		LED5Timeout = LED_TIMEOUT;
	}
	else if (LED5Timeout == 0)
	{
		HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_RESET);
	}
	
	// Ready to transmit
	txstate = 2;
	
}
void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s)
{
	
	// Error LED.
	if (LED6Timeout > 0)
	{
		LED6Timeout--;
	}
	if (rxstate != 0)
	{
		HAL_GPIO_WritePin(LD3_GPIO_Port, LD3_Pin, GPIO_PIN_SET);
		LED6Timeout = LED_TIMEOUT;
	}
	else if (LED6Timeout == 0)
	{
		HAL_GPIO_WritePin(LD3_GPIO_Port, LD3_Pin, GPIO_PIN_RESET);
	}
	
	// Ready to receive
	rxstate = 2;
	
}

TDK
Guru

You should ensure txBuf and rxBuf do not share cache lines with each other or with other data. Aligning each buffer to a 32-byte boundary and making its size a multiple of 32 bytes does this.
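A sketch of that alignment rule, assuming GCC/Clang attribute syntax (CMSIS's __ALIGNED(32) expands to the same attribute on GCC). The poster's __SECTION_RAM_D2 placement macro is left out so the sketch builds on a host; on the target it would precede each declaration as before. The static assertions break the build if either buffer stops being a whole number of 32-byte lines.

```c
#include <stdint.h>

#define DCACHE_LINE 32u  /* Cortex-M7 D-cache line size in bytes */

/* Start each buffer on a cache-line boundary so no other variable can share
 * its first line. */
static uint16_t txBuf[128] __attribute__((aligned(DCACHE_LINE)));
static uint16_t rxBuf[128] __attribute__((aligned(DCACHE_LINE)));

/* ...and end each buffer on a line boundary so nothing shares its last line. */
_Static_assert(sizeof(txBuf) % DCACHE_LINE == 0, "txBuf must be a multiple of 32 bytes");
_Static_assert(sizeof(rxBuf) % DCACHE_LINE == 0, "rxBuf must be a multiple of 32 bytes");
```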

And if you disable cache (and cache-related functions), then it clearly can't be a cache issue.

> I have the problem that my DMA appears to block and mess up the while loop and each of the callbacks creating unexpected orders of code execution and intermittent hanging.

What does "DMA appears to block" mean? The DMA operates independently of the CPU; it can't be blocking anything. What unexpected order of code execution do you see? Note that if the half- and full-transfer-complete callbacks fire in quick succession, your code will fail, but it seems you're setting an error flag in that case.
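One way to make back-to-back half/full callbacks detectable, rather than letting the second one overwrite a single state flag: a sketch with hypothetical names, in which every TX callback increments a volatile free-running counter and the main loop counts how many events it has serviced. It assumes the first event after starting the circular DMA is the half-transfer callback, so events alternate between the first and second halves of the buffer.

```c
#include <stdint.h>

/* Written only by the DMA callbacks (half and full complete). */
static volatile uint32_t tx_events;
/* Written only by the main loop. */
static uint32_t tx_serviced;

/* Call from both HAL_I2S_TxHalfCpltCallback and HAL_I2S_TxCpltCallback. */
static void on_tx_half_or_full(void) { tx_events++; }

/* Events not yet handled; a value above 1 means the CPU fell behind the DMA. */
static uint32_t tx_pending(void) { return tx_events - tx_serviced; }

/* Half (0 = first, 1 = second) freed by the oldest unserviced event. */
static unsigned tx_next_half(void) { return (unsigned)(tx_serviced & 1u); }

/* Call after refilling and cleaning that half. */
static void tx_mark_serviced(void) { tx_serviced++; }
```

In the main loop, while tx_pending() is non-zero, refill half tx_next_half() of txBuf, clean it from the D-cache, and call tx_mark_serviced().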

JOatl
Associate II

> And if you disable cache (and cache-related functions), then it clearly can't be a cache issue.

I have tried calling SCB_DisableICache and SCB_DisableDCache and removing SCB_InvalidateDCache_by_Addr and SCB_CleanDCache_by_Addr from the code above, with no success. I don't know whether another Cube-generated function does anything cache-related.

> What does "DMA appears to block" mean? The DMA operates independently of the cpu, it can't be blocking things. What unexpected order of code execution do you see? Note that if the half and full tx complete appear quickly in succession, your code will fail, but it seems like you're setting an error flag in that case.

Given that the master is clocking the slave and sharing WS, I would expect the DMA callbacks to run in roughly the order TX half complete, RX half complete, TX complete, RX complete, with the corresponding work (including resetting the flag) done in the main while loop. This is the behaviour I observed on the F407. On the H7, with or without the caches enabled, all of the error flags I implemented to spot missed callbacks are triggered (the LEDs light up).

I set up a log of the order of execution; here is a running snapshot of 25 states:

DMA_RX_HALF
DMA_TX_HALF
DMA_RX_FULL
DMA_RX_HALF
DMA_TX_FULL
DMA_RX_FULL
DMA_TX_HALF
WHILE_LOOP_RX_FULL
DMA_RX_HALF
DMA_TX_FULL
DMA_RX_FULL
DMA_TX_HALF
DMA_RX_HALF
DMA_TX_FULL
DMA_RX_FULL
DMA_TX_HALF
WHILE_LOOP_TX_HALF
DMA_RX_HALF
DMA_TX_FULL
DMA_RX_FULL
DMA_TX_HALF
DMA_RX_HALF
DMA_TX_FULL
DMA_RX_FULL
DMA_TX_HALF

The CPU clock is 480 MHz, and with minimal processing in the while loop I don't understand why the main code isn't able to keep up.

Here are the first 25 states after a CPU reset:

DMA_TX_HALF DMA_TX_FULL DMA_TX_HALF DMA_TX_FULL WHILE_LOOP_TX_HALF DMA_TX_HALF WHILE_LOOP_TX_HALF DMA_TX_FULL WHILE_LOOP_TX_FULL DMA_TX_HALF WHILE_LOOP_TX_HALF DMA_TX_FULL WHILE_LOOP_TX_FULL DMA_TX_HALF DMA_RX_HALF DMA_TX_FULL DMA_RX_FULL DMA_TX_HALF DMA_RX_HALF DMA_TX_FULL DMA_RX_FULL DMA_TX_HALF DMA_RX_HALF WHILE_LOOP_TX_HALF DMA_TX_FULL DMA_RX_FULL

Edit: the state machine works as expected (for TX only) if I never call HAL_I2S_Receive_DMA(). Only when I bring in RX from the slave using DMA does the main CPU code stop being executed.

> CPU clock is 480 MHz and with minimal processing within the while loop I don't understand why the main CPU code isn't able to run smoothly.

What's the WS clock frequency?

Toggle a pin on entry to and exit from the two ISRs, and observe it with a logic analyzer or oscilloscope.
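A sketch of such a probe, assuming the STM32 BSRR layout (writing bit n sets pin n, writing bit n+16 resets it), so each edge costs a single store. gpio_bsrr_t is a hypothetical stand-in type so the code compiles on a host; on the target, write GPIOx->BSRR directly. The IRQ handler and DMA handle names in the comment are illustrative; use the ones CubeMX generated.

```c
#include <stdint.h>

/* Hypothetical stand-in for a GPIO port's BSRR register. */
typedef struct { volatile uint32_t BSRR; } gpio_bsrr_t;

/* Drive the probe pin high: low half of BSRR sets pins. */
static inline void probe_high(gpio_bsrr_t *port, uint16_t pin_mask)
{
    port->BSRR = pin_mask;
}

/* Drive the probe pin low: high half of BSRR resets pins. */
static inline void probe_low(gpio_bsrr_t *port, uint16_t pin_mask)
{
    port->BSRR = (uint32_t)pin_mask << 16;
}

/* Pattern for each DMA stream IRQ handler (names are illustrative):
 *   void DMA1_Stream0_IRQHandler(void)
 *   {
 *       probe_high(PROBE_PORT, PROBE_PIN);
 *       HAL_DMA_IRQHandler(&hdma_spi3_rx);
 *       probe_low(PROBE_PORT, PROBE_PIN);
 *   }
 * The pulse width is the ISR duration; the pulse spacing is the IRQ rate. */
```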

JW

15.42 kHz WS, expected 48 kHz (correct on F407)

1 MHz I2S data clock, expected 3.067 MHz

Interrupt calls for stream0 and stream1 vary between 170 kHz and 210 kHz.

If I comment out HAL_I2S_Receive_DMA, I get a more consistent 240 kHz interrupt frequency.
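As a cross-check of these numbers, assuming 32-bit channel slots (64 bit clocks per WS period, which matches the expected ~3.07 MHz at 48 kHz): the measured WS and bit clock are both low by roughly the same factor of 3, which suggests the frame format is as configured and the shared clock source or divider is the thing to look at.

```c
#include <stdint.h>

/* Bit clock of a stereo I2S frame: fs x 2 channels x bits per channel slot. */
static uint32_t i2s_bclk_hz(uint32_t fs_hz, uint32_t bits_per_slot)
{
    return fs_hz * 2u * bits_per_slot;
}
```

With these assumptions, i2s_bclk_hz(48000, 32) is 3,072,000 Hz, while the measured 15.42 kHz WS gives 986,880 Hz, consistent with the ~1 MHz seen on the scope.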

Here is my clock setup from CubeMX:

(Edit: I have lowered the MCU clock to 120 MHz; the I2S clocks are still incorrect.)

[Screenshot: CubeMX clock configuration]

TDK
Guru

> Given that the master is clocking the slave and sharing WS I would expect the DMA callbacks to run something like TX half cmpt, RX half cmpt, TX cmpt, RX cmpt and the relevant functionality (resetting the int flag) triggered within the main while loop.

I would verify this is the case by observing the lines.

It's also possible that receive data is missed. Data received after the first HAL_I2S_Receive_DMA completes but before the next one starts is lost. This is one reason that using a circular buffer can be better.

> Interrupt calls for stream0 and stream1 vary between 170 kHz and 210 kHz

This is fast. I would toggle a GPIO pin within the interrupt to check that the CPU has enough time. I would also guess that at this rate you're losing data due to the condition above.

JOatl
Associate II

It is fast, and not intentional. On my F407 project with the same CubeMX setup, each interrupt takes 2 µs on each stream, with about 320 µs between interrupt calls. On the H7 project (with the same I2S configuration and an almost identical PLL clock), each interrupt now takes 2.5 µs but there are only 10 µs between calls, and my I2S clocks run at only 1/3 of the values I configured them for (observed on a scope).

JOatl
Associate II

I have noticed that changing the value of D2PPRE2 in CubeMX directly affects the frequency of the DMA interrupts.

Edit: even after removing DMA and using interrupts, I still get 1/3 of the I2S clock speed I configured in CubeMX, and I can't immediately spot any issues in the initialisation code. This is very frustrating given that it worked fine on the F407.

> 15.42 kHz WS, expected 48 kHz (correct on F407)

> 1 MHz I2S data clock, expected 3.067 MHz

> Interrupt calls for stream0 and stream1 vary between 170 kHz and 210 kHz

The interrupt rate has to be WS/[buffer halfsize], so something is very broken.
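To put numbers on that, a sketch assuming 16-bit stereo samples (one buffer element per channel sample) and a circular DMA spanning the whole 128-element buffer: each stream raises a callback every half buffer, i.e. every 32 frames, so at 48 kHz each stream should call back at about 1.5 kHz, more than a hundred times slower than the 170 to 210 kHz measured.

```c
/* Expected half/full-transfer callback rate for ONE circular DMA stream:
 * one callback each time half of the buffer has been transferred.
 * Assumes one buffer element per channel sample. */
static double dma_callback_rate_hz(double fs_hz, unsigned buf_elems, unsigned channels)
{
    unsigned frames_per_half = buf_elems / 2u / channels;  /* WS periods per half */
    return fs_hz / frames_per_half;
}
```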

Check your clock settings. Read out, check and post the content of the relevant I2S/DMAMUX/DMA registers. Make sure the ISRs properly clear the interrupt sources. Using Cube is no excuse; it's your code now.
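One clock check that fits here, as a sketch: for the STM32H7 I2S with the master clock output disabled (MCKOE = 0), the reference manual gives fs = i2s_ker_ck / (channel_bits x 2 x (2 x I2SDIV + ODD)), with channel_bits 16 or 32 depending on CHLEN, and HAL_I2S_Init picks I2SDIV/ODD from the kernel clock frequency it assumes is selected. The code below mirrors that rounding; treat the exact HAL arithmetic as an assumption to check against your HAL source, and 61.44 MHz as an example kernel clock only. If the clock actually reaching the SPI/I2S kernel differs from the value the divider was computed for, fs scales by the same ratio: a kernel clock three times slower than assumed gives exactly a third of the WS rate.

```c
#include <stdint.h>

/* I2S prescaler fields: I2SDIV and ODD. */
typedef struct { uint32_t div; uint32_t odd; } i2s_prescaler_t;

/* Choose I2SDIV/ODD for a target fs, MCKOE = 0, rounding to nearest
 * (computed x10, +5, /10, in the style of the HAL). */
static i2s_prescaler_t i2s_prescaler(uint32_t ker_ck_hz, uint32_t fs_hz, uint32_t channel_bits)
{
    uint32_t tmp = (((ker_ck_hz / (channel_bits * 2u)) * 10u) / fs_hz) + 5u;
    tmp /= 10u;                                   /* total divider 2*I2SDIV + ODD */
    i2s_prescaler_t p = { (tmp - (tmp & 1u)) / 2u, tmp & 1u };
    return p;
}

/* Resulting WS rate for a given kernel clock and prescaler. */
static uint32_t i2s_fs_hz(uint32_t ker_ck_hz, uint32_t channel_bits, i2s_prescaler_t p)
{
    return ker_ck_hz / (channel_bits * 2u * (2u * p.div + p.odd));
}
```

For example, a 61.44 MHz kernel clock with 32-bit channels gives I2SDIV = 10, ODD = 0 and exactly 48 kHz; feeding the same prescaler a third of that clock yields 16 kHz.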

JW