Mdi c
Associate III
June 14, 2020
Question

STM32H7 SAI + DMA + Cache enabled: nothing works as before


Hello, I am working on a project where I implemented various peripherals, including a SAI configured in DMA mode. I defined a 1024-sample buffer and used the half-complete and full-complete callbacks to fill it. Everything worked perfectly. However, I quickly realised that the performance of the H7 was incredibly disappointing, so I decided to enable the caches to look for improvements.

I followed the advice of calling SCB_CleanDCache_by_Addr after every half-buffer fill, but my application now suffers from worse problems. When I look at the output of the SAI peripheral, the waveforms look like noise instead of sine waves. Decreasing the buffer length has an effect on the audio output: the smaller the buffer, the "cleaner" the output. This makes me think of a cache coherency issue.

Do you have any idea what the reason could be?

thanks


8 replies

TDK
June 14, 2020

I agree it seems like a cache issue. Make sure your buffer is aligned to a 32-byte cache line boundary. Invalidate (not clean) each half of the buffer after DMA has written to it but before reading it with the CPU.

If you feel a post has answered your question, please click "Accept as Solution".
Mdi c (Author)
Associate III
June 14, 2020

Hi @TDK, thanks. Just for information, I call the function

HAL_SAI_Transmit_DMA(&hsai_BlockA3, (uint8_t *)sai_buff, BUFFER_SIZE);

in the main before the while(1) loop.

The data is aligned to 32 bytes.

I thought that SCB_CleanDCache_by_Addr would be enough, but you recommend using SCB_InvalidateDCache instead? Should this also be done periodically, at every half-buffer cycle?

thank you

Tesla DeLorean
Guru
June 15, 2020

Clean is like a flush: content on the CPU side is pushed out to memory.

Invalidate should be used very carefully; use the By Addr version (SCB_InvalidateDCache_by_Addr) so as not to break your code/stack.

TDK
June 14, 2020

Yes.

When DMA puts data into memory, the CPU doesn't know. If that line of memory is already in the CPU cache, the CPU will think it has the right values even though it does not. To avoid this and load the new data from memory, you need to invalidate those data cache addresses.

Clean is something different. Look up clean vs invalidate on Google.
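
To make the clean vs. invalidate distinction concrete, here is a minimal host-side sketch in C. It models a single write-back cache line in front of a single memory word; it is an illustration only, not how the Cortex-M7 cache is actually implemented, and every name in it (toy_line_t, toy_clean, toy_invalidate, toy_demo, ...) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: ONE write-back cache line in front of ONE memory word. */
typedef struct {
    int32_t memory;   /* what DMA and peripherals see (SRAM) */
    int32_t cached;   /* what the CPU sees while the line is valid */
    bool    valid;    /* the line holds a copy of memory */
    bool    dirty;    /* the cached copy is newer than memory */
} toy_line_t;

/* CPU read: served from the cache when the line is valid. */
static int32_t cpu_read(toy_line_t *l)
{
    if (!l->valid) {            /* miss: fetch from memory */
        l->cached = l->memory;
        l->valid  = true;
        l->dirty  = false;
    }
    return l->cached;
}

/* CPU write: lands in the cache only (write-back policy). */
static void cpu_write(toy_line_t *l, int32_t v)
{
    l->cached = v;
    l->valid  = true;
    l->dirty  = true;
}

/* DMA accesses bypass the cache entirely. */
static void    dma_write(toy_line_t *l, int32_t v) { l->memory = v; }
static int32_t dma_read(const toy_line_t *l)       { return l->memory; }

/* Clean: push a dirty line out to memory (the role of SCB_CleanDCache_by_Addr). */
static void toy_clean(toy_line_t *l)
{
    if (l->valid && l->dirty) {
        l->memory = l->cached;
        l->dirty  = false;
    }
}

/* Invalidate: drop the cached copy (the role of SCB_InvalidateDCache_by_Addr). */
static void toy_invalidate(toy_line_t *l)
{
    l->valid = false;
    l->dirty = false;
}

/* Walks through the transmit and receive cases; returns 0 on success. */
int toy_demo(void)
{
    toy_line_t tx = {0};
    cpu_write(&tx, 123);
    assert(dma_read(&tx) == 0);     /* DMA still sees the old memory content */
    toy_clean(&tx);
    assert(dma_read(&tx) == 123);   /* after clean, DMA sees the new sample */

    toy_line_t rx = {0};
    (void)cpu_read(&rx);            /* the line is now cached with value 0 */
    dma_write(&rx, 456);
    assert(cpu_read(&rx) == 0);     /* CPU reads stale data from the cache */
    toy_invalidate(&rx);
    assert(cpu_read(&rx) == 456);   /* after invalidate, CPU refetches */
    return 0;
}
```

In this model a transmit buffer (CPU writes, DMA reads) needs a clean, while a receive buffer (DMA writes, CPU reads) needs an invalidate.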
berendi
Principal
June 15, 2020

Another solution is to disable caching just on the buffer memory.

https://community.st.com/s/question/0D70X000007Q6Aw/stm32f7-using-spi-slave-with-dma-to-set-a-flag-ie-no-interrupts

This is my preferred method for smaller buffers, because I can just set it up once and forget about it. The drawback is the stricter alignment rule: a 1 KB buffer, for example, must be aligned to 1 KB.
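
The alignment rule mentioned above comes from the ARMv7-M MPU: a region must have a power-of-two size of at least 32 bytes, and its base address must be aligned to that size. Below is a small host-side sketch of that check. The helper names (mpu_region_ok, mpu_region_size_for) are hypothetical and not part of any ST or CMSIS API, and the addresses in the usage note are example values only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if size is a power of two and at least 32 bytes (the ARMv7-M MPU minimum). */
static bool mpu_size_ok(uint32_t size)
{
    return size >= 32u && (size & (size - 1u)) == 0u;
}

/* True if [base, base + size) can be described by a single MPU region. */
bool mpu_region_ok(uintptr_t base, uint32_t size)
{
    return mpu_size_ok(size) && (base & (uintptr_t)(size - 1u)) == 0u;
}

/* Smallest valid MPU region size that covers `bytes` bytes. */
uint32_t mpu_region_size_for(uint32_t bytes)
{
    assert(bytes > 0u && bytes <= 0x80000000u);
    uint32_t size = 32u;
    while (size < bytes) {
        size <<= 1;   /* next power of two */
    }
    return size;
}
```

For instance, a buffer of 1024 int32_t samples occupies 4096 bytes, so mpu_region_size_for(4096) gives 4096 and the buffer would have to sit on a 4 KB boundary: mpu_region_ok(0x24001000, 4096) holds, while mpu_region_ok(0x24000400, 4096) does not. On the STM32 HAL the region itself would then be configured through HAL_MPU_ConfigRegion().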

Mdi c (Author)
Associate III
June 15, 2020

Thanks @berendi, but doesn't this mean that I would lose the calculation performance I need on that buffer, which is the reason why I want to enable the cache in the first place?

Mdi c (Author)
Associate III
June 15, 2020

Hello everyone, thank you for your answers.

OK, I think I got it. For the SAI interface, since the transfer is CPU -> peripheral, I need to do a cache clean to update the SRAM content with the new, correct data (so that the SAI gets the updated data). For the ADC and SPI RX working in DMA mode, I instead need to invalidate the cache so that the CPU looks for new data in SRAM rather than in its local cache. I guess it is correct to perform these operations inside the callback functions?

All variables involved in this need to be aligned to 32 bytes.

I will give it a try asap.

Mdi c (Author)
Associate III
June 15, 2020

So I have implemented the following. At every interrupt from a DMA peripheral I invalidate the cache, and I can correctly see the data in the debugger. However, I am still struggling to make the SAI work. For this, in the two callbacks (half / full complete) I added the clean cache function after the calculation of the new samples, but I get just noise at the output. The data is correctly aligned to 32 bytes:

ALIGN_32BYTES (int32_t sai_buff[BUFFER_SIZE]);

The size of the buffer does affect the shape of the output. Is there anything I can do to verify that the buffer variable is properly managed by the cache?

Mdi c (Author)
Associate III
June 15, 2020

uff..hold on a moment .. SCB_CleanDCache_by_Addr (uint32_t *addr, int32_t dsize)

dsize is length in bytes... :\
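
Since dsize is a byte count, the length passed for one half of the buffer has to be derived from the element size and not from the sample count. A minimal sketch of that calculation follows. The helper half_buffer_range and the struct cache_range_t are hypothetical; BUFFER_SIZE is assumed to be 1024 samples as in the first post, and the int32_t element type is taken from the sai_buff declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 1024u   /* samples; value assumed from the first post */

/* Address and byte length of one half of a sample buffer. */
typedef struct {
    uint32_t *addr;    /* start of the half, as expected by the cache call */
    int32_t   bytes;   /* length in BYTES, not in samples */
} cache_range_t;

/* half = 0 for the first half (half-complete callback), 1 for the second. */
cache_range_t half_buffer_range(int32_t *buf, size_t n_samples, int half)
{
    assert(buf != NULL && n_samples % 2u == 0u && (half == 0 || half == 1));
    size_t half_samples = n_samples / 2u;
    cache_range_t r;
    r.addr  = (uint32_t *)(buf + (size_t)half * half_samples);
    r.bytes = (int32_t)(half_samples * sizeof buf[0]);
    return r;
}

/* On the target, the half-complete callback could then end with something like:
 *     cache_range_t r = half_buffer_range(sai_buff, BUFFER_SIZE, 0);
 *     SCB_CleanDCache_by_Addr(r.addr, r.bytes);
 */
```

With 1024 int32_t samples each half is 2048 bytes. If the sample count of a half (512) were passed as dsize, only the first quarter of that half would be cleaned.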

Piranha
Principal III
June 15, 2020

> dsize is length in bytes... :\

Yes, but those cache management functions, together with the CPU hardware, effectively expand the begin/end addresses of that range to the next lower/higher multiples of 32 bytes. Read my discussion with Pavel here:

https://community.st.com/s/question/0D50X0000AnsIJeSQM/how-to-get-ethernet-working-again-after-upgrading-to-firmware-fwh7v140-

And also my posts here:

https://community.st.com/s/question/0D50X0000CEr14wSQB/h745-memory-regions-attributes-cache

Note that BUFFER_SIZE must also be a multiple of 32. The same applies when operating on half of the buffer, which means that the total buffer size must be a multiple of 64. But that's only for receive buffers, on which SCB_InvalidateDCache_by_Addr() must be called. On transmit buffers SCB_CleanDCache_by_Addr() must be called, which doesn't need any special alignment. That's because the alignment and size of the receive buffers already rule out invalidate/clean conflicts, and therefore flushing some additional data on the sides of the buffer will not damage anything.

Note that both of those cache management functions must be called before passing the buffer to the peripheral. Yes, that's also true for receive buffer invalidation, because otherwise cache eviction can corrupt the received data by writing back dirty lines, which can happen while reception is in progress.
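
The range expansion and the size rules described in this post can be checked with a few lines of host-side C. This is only a sketch: it assumes the multiples of 32 and 64 are meant in bytes (the Cortex-M7 D-cache line is 32 bytes), and the names line_span_t, affected_lines, invalidate_is_exact and rx_double_buffer_ok are hypothetical helpers, not CMSIS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start;   /* first byte affected */
    uintptr_t end;     /* one past the last byte affected */
} line_span_t;

/* Whole cache lines touched by a by-address operation on [addr, addr + bytes). */
line_span_t affected_lines(uintptr_t addr, uint32_t bytes)
{
    assert(bytes > 0u);
    line_span_t s;
    s.start = addr & ~(uintptr_t)(CACHE_LINE - 1u);                      /* round down */
    s.end   = (addr + bytes + CACHE_LINE - 1u) & ~(uintptr_t)(CACHE_LINE - 1u); /* round up */
    return s;
}

/* True if invalidating [addr, addr + bytes) cannot discard neighbouring data. */
bool invalidate_is_exact(uintptr_t addr, uint32_t bytes)
{
    line_span_t s = affected_lines(addr, bytes);
    return s.start == addr && s.end == addr + bytes;
}

/* True if both halves of a receive buffer can be invalidated independently. */
bool rx_double_buffer_ok(uintptr_t addr, uint32_t total_bytes)
{
    return total_bytes > 0u
        && total_bytes % (2u * CACHE_LINE) == 0u
        && invalidate_is_exact(addr, total_bytes / 2u)
        && invalidate_is_exact(addr + total_bytes / 2u, total_bytes / 2u);
}
```

As an example, an operation on 10 bytes starting at 0x24000004 touches the whole line from 0x24000000 to 0x24000020. A 4096-byte receive buffer at 0x24000000 passes rx_double_buffer_ok, while the same buffer at 0x24000010, or a 96-byte buffer, does not.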