2025-10-10 8:24 AM
Hello,
Using a NUCLEO H7S3L8, I'm trying to run a test that sends a buffer in memory to the SAI for audio playback through an external DAC, using circular DMA with a linked list. The configuration is based on other tutorials and the reference manual, but it always throws a User Error flag and calls the DMA's IRQ handler right after the HAL_DMAEx_List_Start_IT() call (whether done on its own after the necessary configuration, or as part of HAL_SAI_Transmit_DMA()).
Even after several changes, the result is always the same. The only change that seemed to have any effect was setting the Transfer Event mode to trigger the TC flag for each linked list item, which confirmed that a transfer might have been completed despite the error flag.
I have also seen GPDMA configured in circular mode using both Standard Request Mode and a linked list, but I haven't had any success with that myself (not even the TC flag is raised).
In case it causes any complications, the board is also configured as a USB Audio device (left over from a previous test, not currently used), and the SAI is configured for 16 bits at 48 kHz.
Below are images of the configuration:
I'll also attach main.c of the application project, which contains the only relevant user code.
Any help or advice would be greatly appreciated.
2025-10-12 2:34 PM
So... it seems to work.
But I changed many things... take a look and try it.
(Core clock etc. lowered to 360 MHz, just to avoid any problems.)
2025-10-10 11:10 AM
Hi,
>I'm trying to send a buffer in memory over to the SAI for audio playback, using an external DAC.
So what should it do finally? Audio player?
I made an audio player with an H743. It plays wav/flac/mp3 from SD card and USB; the DACs are ES9038 + TDA1387.
And DMA is normal mode, circular.
So why this linked list? (just makes it more complicated, imho.)
2025-10-11 1:05 PM
@AScha.3 wrote: So what should it do finally? Audio player?
Yes, the end result I have in mind is a USB audio interface that uses the Nucleo's secondary USB port for power and for receiving data, which is then forwarded to the SAI via DMA and finally to an external PCM5102 DAC (on a WMCU-5120 breakout board, to be exact).
@AScha.3 wrote: So why this linked list? (just makes it more complicated, imho.)
GPDMA on the newer STM32 MCUs (U5, H5, H7RS, etc.) seems to differ quite a bit from the DMA controller used on the H743 and others (from my limited experience with a Nucleo H743ZI). The functionality for circular DMA without using any linked lists is now considered a legacy approach, and not available as far as I can tell.
Unfortunately, that makes a linked list necessary. The only "alternative" I considered was using non-circular DMA to send bulk-like transfers that fill the SAI's FIFO, which is then played out while waiting for the next refill.
From my tests, that may work when transmitting an unchanged buffer repeatedly every couple of ms.
However, when gathering USB data into one half of a larger buffer, transmitting that half when full, and then filling the other half (simulating circular double buffering), only noise akin to a square wave is produced. I assume this is due to intermittent silence when the SAI is starved of data to transmit.
So, besides being unorthodox, it's also non-functional.
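For comparison, the circular double buffering that this workaround tries to imitate can be sketched in plain C as below. This is only a host-side illustration under assumptions: the buffer size, fetch_audio(), on_half_complete() and on_full_complete() are hypothetical names, not taken from the project. On the target, the two on_* functions would presumably be called from HAL_SAI_TxHalfCpltCallback() and HAL_SAI_TxCpltCallback(), while the circular DMA keeps reading play_buf without being restarted.

```c
#include <stdint.h>
#include <stddef.h>

#define FRAMES_PER_HALF  48u                      /* 1 ms at 48 kHz (assumed size) */
#define SAMPLES_PER_HALF (FRAMES_PER_HALF * 2u)   /* stereo, 16-bit samples */

/* The whole array is what a circular DMA channel would read continuously. */
int16_t play_buf[2u * SAMPLES_PER_HALF];

/* Hypothetical audio source; a counter ramp stands in for USB data here. */
static int16_t next_sample;

static void fetch_audio(int16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = next_sample++;
    }
}

/* Refill one half of the buffer: 0 = first half, 1 = second half. */
static void refill_half(unsigned half)
{
    fetch_audio(&play_buf[half * SAMPLES_PER_HALF], SAMPLES_PER_HALF);
}

/* DMA finished reading the first half and is now reading the second. */
void on_half_complete(void)
{
    refill_half(0u);
}

/* DMA finished reading the second half and wrapped to the first. */
void on_full_complete(void)
{
    refill_half(1u);
}
```

The point of the scheme is that the CPU only ever writes the half the DMA is not reading, so the SAI is never left without data between transfers.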
Sorry for the long answer, but I wanted to offer any information that might be relevant.
2025-10-11 1:20 PM
>Sorry for the long answer
No, that's fine. So I know more...
I have a NUCLEO H7S3L8, so if you give me your project (maybe main.c + .ioc could be enough),
I could try it.
(I had a similar thing running on an H563, with a linked list, just to test it. + Azure RTOS.)
2025-10-11 1:56 PM
Yes, sure thing, I'll post both files.
Also, the section below was changed from the CubeMX-generated linker script for XSPI2, to accommodate a larger non-cacheable buffer.
__RAM_BEGIN    = 0x24000000;
__RAM_SIZE     = 0x71800;
__RAM_NONCACHEABLEBUFFER_SIZE = 0x800;
Thank you for your help and time!
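As a quick sanity check on those three values, the sketch below works out where the non-cacheable region would sit and how much 16-bit stereo audio fits in it. It assumes the region is placed directly after __RAM_SIZE bytes of RAM, which the excerpt does not show; the function names are made up for illustration.

```c
#include <stdint.h>

#define RAM_BEGIN          0x24000000u   /* __RAM_BEGIN */
#define RAM_SIZE           0x71800u      /* __RAM_SIZE */
#define NONCACHEABLE_SIZE  0x800u        /* __RAM_NONCACHEABLEBUFFER_SIZE */

/* Assumption: the non-cacheable region starts right after the RAM region. */
uint32_t noncacheable_begin(void)
{
    return RAM_BEGIN + RAM_SIZE;
}

uint32_t noncacheable_end(void)
{
    return noncacheable_begin() + NONCACHEABLE_SIZE;
}

/* Number of stereo frames of 16-bit samples that fit in the region. */
uint32_t frames_that_fit(void)
{
    return (uint32_t)(NONCACHEABLE_SIZE / (2u * sizeof(int16_t)));
}

/* Playback time of a full region, in microseconds (rounded down). */
uint32_t region_duration_us(uint32_t sample_rate_hz)
{
    return (uint32_t)((uint64_t)frames_that_fit() * 1000000u / sample_rate_hz);
}
```

Under that assumption the region spans 0x24071800 to 0x24072000 and holds 512 stereo frames, which is a little under 10.7 ms of audio at 48 kHz.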
2025-10-13 2:51 AM
Yes, it does work now. And checking the changed settings one by one, it looks like the main culprit was D-Cache being on. It seems that just putting the buffer that will be used for the DMA transmission in a non-cacheable area, as declared by the CubeMX linker script, isn't enough for circular mode using GPDMA.
If I find a way to keep the D-Cache enabled with the rest of the configuration still correct (for the sake of processing other data faster), I'll reply to this post.
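One possible direction for keeping the D-Cache on is explicit cache maintenance on the DMA buffer. The Cortex-M7 data cache works on 32-byte lines, so the range handed to a clean operation should cover whole lines. The helper below is a hypothetical host-side sketch of that alignment step only; on the target the result would presumably be passed to the CMSIS SCB_CleanDCache_by_Addr() function before the DMA reads the buffer. It has not been tried with this project.

```c
#include <stdint.h>
#include <stddef.h>

#define DCACHE_LINE 32u   /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t addr;   /* start address, rounded down to a cache line */
    size_t    size;   /* length in bytes, a multiple of the line size */
} cache_range_t;

/* Expand [addr, addr + size) so that it covers whole cache lines. */
cache_range_t cache_align_range(uintptr_t addr, size_t size)
{
    const uintptr_t mask  = ~(uintptr_t)(DCACHE_LINE - 1u);
    const uintptr_t start = addr & mask;
    const uintptr_t end   = (addr + size + DCACHE_LINE - 1u) & mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}
```

A buffer that is itself aligned to 32 bytes and sized as a multiple of 32 bytes comes back unchanged, which avoids cleaning or invalidating lines shared with unrelated variables.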
Still, thank you, your input was very helpful.
2025-10-13 5:06 AM - edited 2025-10-13 5:19 AM
After I got it to the same state as you had, it seemed to be a problem with the data size and the DMA.
So I switched off the D-cache, to exclude any problems from that side, and the MPU as well (because it's useless then).
In my "experiments" for my player I also had to tinker around for a few hours to get the linked list working, so I know the "problems" (this beast is so complex, and together with the SAI and its FIFO... even more so).
So I changed the data type to int16, which is the usual standard for PCM and what a DAC expects to get;
then let the DMA load the FIFO in its optimum format, for the fastest transfer;
and reduced its priority, as it's not needed here (the audio data rate is "slow" for this kind of CPU, and together with the FIFO there will never be a delay/dropout problem, so it's better to let the things that may need to be fast, like interrupts from other signals, run at higher priority).
+
And about the D-cache: just leave it off if you don't need the last few percent of performance the CPU can deliver.
I tested it with my audio player, where the problem is: in the SAI DMA callbacks the data is moved to the play buffer, but this data has to be calculated by the CPU first (a lot of calculation for flac or mp3), and then come the biquad filters, different for each stream (I have 3, going to 3 DACs).
So there are 3 possible ways of handling the data:
1 - no D-cache, no MPU
2 - D-cache ON, handling with cache management, no MPU
3 - D-cache ON, handling with MPU
- 3: useless, because the MPU can only define the area so that no cache is used there; but this is where the calculation is running...
- 2: so the calculation is faster, but the cache management also needs extra time
- 1: no problems, but also no D-cache
I tested "1" against "2" at 200 MHz core clock, flac decoding, 4K samples x 3 channels, data as int32, filter calculation in double float. It was about:
4.8 ms no d-cache
4.1 ms D-cache with management
So here, with a lot of calculation "in place", all that the D-cache gains is roughly 15% (4.8 ms down to 4.1 ms).
If there is not much calculation with filters or a decoder, using NO D-cache would be faster!
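For reference, the two timings above can be turned into percentages in two ways, shown in the small sketch below (the function names are just for illustration). Only the 4.8 ms and 4.1 ms figures come from the measurement; everything else is plain arithmetic.

```c
/* Percentage of processing time saved relative to the baseline. */
double time_saved_percent(double base_ms, double new_ms)
{
    return 100.0 * (base_ms - new_ms) / base_ms;
}

/* Percentage increase in throughput relative to the baseline. */
double speedup_percent(double base_ms, double new_ms)
{
    return 100.0 * (base_ms / new_ms - 1.0);
}
```

With 4.8 ms and 4.1 ms this gives a little under 15% less time per block, or about 17% more throughput, depending on which way the gain is expressed.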
