STM32H7 cache and DMA transfer error rate.

Aleks
Associate III

Hi.

It seems that data in cached memory is sometimes corrupted.

DMA transfers data from SPI to SRAM2 (0x30000000). I use a ping-pong buffer in SRAM2 (two identical buffers). A thread processes the first part of the buffer while the second part is being filled with data over DMA.

Each DMA transaction writes 35 bytes to the buffer. The next transaction writes the next 35 bytes, and so on, 72 times. After 35*72 bytes have been written into the first part of the ping-pong buffer, reception switches to the second part. Both parts are 35*72 bytes.

The start addresses of both buffer parts are aligned to 32 bytes (but the size of the buffer is not aligned to 32 and access from the thread is one-byte aligned).

Of course, the cache is invalidated before processing: the buffer part handed to the thread (35*72 bytes) is invalidated. But sometimes the data in the buffer is still not valid.

Usually the bad data corresponds to one whole DMA transaction (all 35 bytes), but cases with one or a few corrupted bytes occur too.

When the cache is disabled, there is not a single error over a very long period of time. When the cache is enabled, the error rate is about 1e-6. That is too much for my application.

Additional considerations

  1. I use about 5 kBytes of the 16 kByte cache in total, and the OS uses some cache too. The same code without the OS runs properly with the cache enabled.
  2. Invalidating the full cache solves this problem (but other problems may come up, may they not?).

Do you have any suggestions?

Regards, Aleks.

1 ACCEPTED SOLUTION

Accepted Solutions
Piranha
Chief II

> but size of buffer is not aligned to 32 and access from thread is one byte aligned

Then align the size also. In your case 35*72=2520 B. The closest aligned size is 2528 B. You don't have to use those 8 bytes but they have to be there so that they take up the space and invalidation doesn't damage other variables. One can afford wasting 2*8 bytes on H7 series.
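A minimal sketch of such a padded layout, assuming a C++11 or later compiler. The names `RxHalf`, `rxBuf` and `roundUpToCacheLine` are made up for illustration, and placing the array in SRAM2 (linker section attribute) is left out because it is toolchain specific:

```cpp
#include <cstddef>
#include <cstdint>

// D-cache line size on Cortex-M7, in bytes.
constexpr std::size_t kCacheLine = 32;

// Round n up to the next multiple of the cache line size.
constexpr std::size_t roundUpToCacheLine(std::size_t n)
{
    return (n + kCacheLine - 1) / kCacheLine * kCacheLine;
}

constexpr std::size_t kFrameLen = 35;       // bytes per DMA transaction (from the post)
constexpr std::size_t kFramesPerHalf = 72;  // transactions per half (from the post)
constexpr std::size_t kPayload = kFrameLen * kFramesPerHalf;     // 2520 bytes of data
constexpr std::size_t kHalfSize = roundUpToCacheLine(kPayload);  // 2528 bytes reserved

// One half of the ping-pong buffer (hypothetical type). alignas makes the
// compiler align the start to 32 bytes and add tail padding, so sizeof is
// a multiple of 32 and no other variable can share the last cache line.
struct alignas(kCacheLine) RxHalf
{
    std::uint8_t frames[kFramesPerHalf][kFrameLen];
};

static_assert(sizeof(RxHalf) == kHalfSize, "each half must cover whole cache lines");
static_assert(alignof(RxHalf) == kCacheLine, "each half must start on a cache line");

// Both halves. On target this array would also need to be placed in SRAM2.
RxHalf rxBuf[2];
```

With this layout an invalidate by address over `sizeof(RxHalf)` bytes touches only lines that belong to that half.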

> Of course invalidate cache before processing is performed.

Invalidation must be done before passing the buffers to the DMA - before the reception on a particular buffer is started. Doing it after the reception is too late and is a flaw. The reason is a cache eviction process.

Read my comment carefully:

https://community.st.com/s/question/0D53W00000oXSzySAG/different-cache-behavior-between-stm32h7-and-stm32f7

Detailed example of cache eviction:

https://community.st.com/s/question/0D50X0000C9hGozSQE/weird-cache-writeback-behavior-for-stm32f7508
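To illustrate the eviction argument, here is a toy host-side model, not real hardware behaviour: a single 32-byte line of a write-back, write-allocate cache in front of SRAM. `ToyLine` and `finalByteSeenByCpu` are hypothetical names, and the point in time at which the eviction happens is chosen by hand, whereas on the real core it is unpredictable.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of one cache line in front of one 32-byte block of SRAM.
struct ToyLine
{
    std::array<std::uint8_t, 32> sram{};   // what the DMA reads and writes
    std::array<std::uint8_t, 32> cache{};  // the CPU-side copy
    bool valid = false;
    bool dirty = false;

    void fillIfNeeded()
    {
        if (!valid) { cache = sram; valid = true; }  // line fill on a miss
    }
    void cpuWrite(std::size_t i, std::uint8_t v)
    {
        fillIfNeeded();
        cache[i] = v;
        dirty = true;   // write-back: SRAM is not updated yet
    }
    std::uint8_t cpuRead(std::size_t i)
    {
        fillIfNeeded();
        return cache[i];
    }
    void dmaWrite(std::size_t i, std::uint8_t v) { sram[i] = v; }  // bypasses the cache
    void evict()
    {
        if (valid && dirty) { sram = cache; }  // the whole dirty line is written back
        valid = false;
        dirty = false;
    }
    void invalidate()   // discard the line without writing it back
    {
        valid = false;
        dirty = false;
    }
};

// Returns the byte the CPU finally reads at index 0 of the receive buffer.
std::uint8_t finalByteSeenByCpu(bool invalidateBeforeDma)
{
    ToyLine line;
    line.cpuWrite(0, 0xAA);   // CPU modified the previous frame in place: dirty line
    if (invalidateBeforeDma) { line.invalidate(); }
    line.dmaWrite(0, 0x55);   // DMA receives new data into SRAM
    line.evict();             // the cache makes room at some point during reception
    line.invalidate();        // invalidate after reception, before processing
    return line.cpuRead(0);
}
```

In this model `finalByteSeenByCpu(false)` returns the stale `0xAA`, because the dirty line was written back over the freshly received data, while `finalByteSeenByCpu(true)` returns the received `0x55`.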

6 REPLIES

TDK
Guru

Invalidate the first half of the buffer after it's received but before you read it. Do the same with the second half. Do not globally invalidate the cache as this causes other issues. You will need to align each half of the buffer so that it starts and ends on cache line (32-byte) boundaries.

I am sorry for my unclear note. sizeof(buffers) is not aligned to 32 bytes, but the addresses of all DMA buffers are aligned to 32 bytes. So it is OK.

I am confused by your suggestion

> Invalidation must be done before passing the buffers to the DMA - before the reception on a particular buffer is started

because the application note https://www.st.com/resource/en/application_note/dm00272913-level-1-cache-on-stm32f7-series-and-stm32h7-series-stmicroelectronics.pdf states: "After the DMA transfer complete, when reading the data from the peripheral, the software must perform a cache invalidate before reading the DMA updated memory region."

And that is exactly what I do. But I will read your materials and try to follow them. My current sequence:

  1. Clean the DMA TX buffer before transmitting it to SPI.
  2. Receive data from SPI over DMA into the DMA RX buffer (72 times).
  3. Swap the DMA RX buffer from RX1 to RX2.
  4. Set up DMA for RX2.
  5. Send an event to the OS and receive the event in the thread.
  6. Invalidate the cache by address for RX1.
  7. Check the data in the buffer (errors are caught here) and convert the data to little endian.
  8. RX2 is being filled...

Aleks
Associate III

This is exactly what is done.

Aleks
Associate III

Piranha and TDK, thanks a lot!

It seems the reason is that my code sequentially reads and writes the buffer, i.e. the CPU reads, modifies and writes data in a cacheable buffer.

I suppose these are the possible issues:

  1. A cache eviction occurs when the CPU writes to the buffer; at this moment previously cached and modified (endianness changed) data is refilled from SRAM with the original data.
  2. The core misbehaves according to the errata https://www.state-machine.com/doc/ARM-AT610-611.pdf (1259864 Data corruption in a sequence of Write-Through stores and loads).
  3. The code should contain ISB/DMB/DSB instructions at the appropriate places (in the read-modify-write-read chain).
  4. Some other issue.

When I remove the writes to the buffer, the errors disappear, all other conditions being equal.

DSB instructions after the read and write operations do not resolve the problem.

for (size_t cnt = 0; cnt < lenBuf; ++cnt)
{
    // crc check
    uint32_t *ptr = reinterpret_cast<uint32_t *>(&(bufRd[cnt][1])); // byte 1 - dummy
    uint32_t len = static_cast<uint32_t>(Mcp3914::lenCrcSpiPart);   // adc data
    uint32_t crcCalculated = HAL_CRC_Calculate(phcrc, ptr, len);    // CRC16

    uint16_t crc16; // CRC16 for SPI frame
    crc16 = bufRd[cnt][Mcp3914::lenBufDmaSpi - 2] << 8;
    crc16 |= bufRd[cnt][Mcp3914::lenBufDmaSpi - 1];

    uint16_t *crcIsInvalid = reinterpret_cast<uint16_t *>(&(bufRd[cnt][Mcp3914::lenBufDmaSpi - 2]));
    // place crc status to "crc16" field
    *crcIsInvalid = crc16 ^ static_cast<uint16_t>(crcCalculated);

    // to little endian data
    int32_t *pAdcData;
    int32_t littleEndianData;
    for (size_t ch = 0; ch < Mcp3914::channelsTotal * Mcp3914::lenChannelAdc; ch += Mcp3914::lenChannelAdc)
    {
        pAdcData = reinterpret_cast<int32_t *>(&bufRd[cnt][ch + 1]);
        littleEndianData = static_cast<int32_t>(__REV(*pAdcData));
        *pAdcData = littleEndianData;
    }
}

Here bufRd is the DMA buffer.

There is not a single error if the cache is disabled for the SRAM buffer.
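Since the errors go away when nothing is written to the buffer, one option is to never write to the DMA buffer from the CPU and to decode into a separate array instead. A sketch of that idea follows. The frame layout (one dummy byte, then big-endian 4-byte words, 8 channels) is only my assumption based on the loop above; the constants are placeholders and not the real Mcp3914 values, and `decodeFrame` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

// Placeholder constants (assumed, see the note above).
constexpr std::size_t kFrameLen = 35;        // bytes per DMA transaction
constexpr std::size_t kChannels = 8;         // assumed number of ADC channels
constexpr std::size_t kBytesPerChannel = 4;  // assumed bytes per channel word

// Decode one received frame into out[] without writing to the frame itself,
// so the CPU never creates dirty cache lines inside the DMA buffer.
void decodeFrame(const std::uint8_t (&frame)[kFrameLen], std::int32_t (&out)[kChannels])
{
    for (std::size_t ch = 0; ch < kChannels; ++ch)
    {
        // Byte 0 of the frame is the dummy byte, channel data starts at byte 1.
        const std::uint8_t *p = &frame[1 + ch * kBytesPerChannel];

        // Assemble the big-endian word byte by byte. This avoids unaligned
        // 32-bit loads and does not depend on the byte order of the CPU.
        const std::uint32_t v = (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
                                (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
        out[ch] = static_cast<std::int32_t>(v);
    }
}
```

The CRC status would then also go into a separate variable instead of being stored back into the last two bytes of the frame.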

Why does invalidating the full cache resolve the problem?

Kind regards, Aleks. 

Hi Piranha!

I carefully read your comments in the topic. I now invalidate the buffers before starting to accept data from DMA and after reception is completed. The problem seems to be solved.

Yes, even for the receive part the invalidation must be done before passing the buffers to DMA, because otherwise cache eviction can damage the receive buffers by writing back dirty lines during reception.

So invalidation must be done before starting reception and after reception is completed.
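The resulting order of operations could look roughly like the sketch below. It is host-side and only records the call order. `invalidateDCache`, `startDmaReceive` and `processHalf` are hypothetical stand-ins: on the target the first would wrap the CMSIS call `SCB_InvalidateDCache_by_Addr` and the second the HAL or LL call that arms the DMA stream. The half size of 2528 bytes is the padded size suggested earlier in the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::vector<std::string> callLog;  // records the order of operations

// Hypothetical stand-ins for the target functions (they only log here).
void invalidateDCache(void *addr, std::size_t len)
{
    (void)addr; (void)len;
    callLog.push_back("invalidate");
}
void startDmaReceive(void *addr, std::size_t len)
{
    (void)addr; (void)len;
    callLog.push_back("dma_start");
}
void processHalf(const std::uint8_t *data, std::size_t len)
{
    (void)data; (void)len;
    callLog.push_back("process");
}

constexpr std::size_t kHalfSize = 2528;  // 35*72 rounded up to a multiple of 32
alignas(32) std::uint8_t rx[2][kHalfSize];

// Hand half idx to the DMA. Invalidating first drops any lines the CPU may
// have left dirty there, so they cannot be written back during reception.
void armHalf(std::size_t idx)
{
    invalidateDCache(rx[idx], kHalfSize);
    startDmaReceive(rx[idx], kHalfSize);
}

// Called when reception into half done has completed.
void onHalfComplete(std::size_t done)
{
    armHalf(done ^ 1u);                     // the other half goes to the DMA
    invalidateDCache(rx[done], kHalfSize);  // drop lines fetched during reception
    processHalf(rx[done], kHalfSize);       // now the CPU reads fresh data
}
```

Calling `armHalf(0)` once at start-up and `onHalfComplete()` from each completion event gives: invalidate, start DMA, then for every completed half invalidate and start the other half, invalidate the completed half, process it.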

Thank you for the response.