STM32F407ZET6: Is it possible for multiple streams of a DMA to run in parallel?

Neolithic · 2021-06-01

Hi, I am using the STM32F407ZET6 microcontroller and I want to use multiple streams of DMA1. Is it possible to trigger two different streams of the same DMA to transfer data to two different peripherals simultaneously (i.e., in parallel)?

In the advanced AHB bus matrix I see that each DMA has only two lines into the matrix, one for memory and one for peripherals. This suggests to me that at most two streams can run in parallel at any time, and perhaps only if neither of them is doing a memory<->peripheral transfer. Is this assumption correct? In other words, judging by the AHB matrix, it seems that two streams could run in parallel if they only do memory-to-memory and peripheral-to-peripheral transfers, but if either of them does a memory<->peripheral transfer, that single transfer would use both the DMA's memory and peripheral interfaces, which would probably make parallel operation impossible. Can you shed some light on this?

I would appreciate some guidance on this topic, as I could not find satisfactory information on it. And if running streams in parallel depends on bus bandwidth, how is that bandwidth divided among multiple channels sharing a single bus? If there is any example of this, I would be thankful. For reference, I have put the AHB matrix below:

Neolithic · 2022-02-04

The reason I ask is that I have implemented a non-preemptive scheduler, which means there is no context switching between tasks and hence no concurrent execution. However, I wanted to make at least two data transfers run in parallel, say one memory-to-memory and one memory-to-peripheral, using different RAM regions (CCRAM, SRAM1, SRAM2, etc.) and external flash, so that I can exploit the full capacity of the hardware to make the LCD unit more responsive while background activities keep going without much interruption. This would be less of a problem with FreeRTOS or similar, since my high-priority background work would then always get priority; however, non-preemptive schedulers have their own advantages as well. That is the sole reason.

waclawek.jan · 2022-02-05

Oh, I see. I was afraid you wanted to invent some scheme relying on perfect synchronicity of external signals, e.g. running two SPI masters synchronously.

> i can exploit the full capacity of the hardware

In that regard, even running an M2M transfer in DMA2 (only DMA2 is capable of M2M) while simultaneously serving one or even several peripherals (such as SPI, I2C or UART, at a reasonable pace) from the same DMA2 would not really hurt performance. Setting the peripheral stream's priority higher than the M2M stream's ensures the peripheral is served in time, whereas the relatively rare peripheral transfers are not that detrimental to the M2M's performance. For example, with the system clock at 160 MHz, an M2M transfer takes say 5-6 cycles if there's no "obstacle", giving roughly 27-32 Mtransfers/s; whereas serving an SPI in 16-bit mode at 20 Mbaud means 1.25 Mtransfers/s per direction, or 2.5 Mtransfers/s with both Rx and Tx on DMA, possibly reducing the M2M's performance by up to about 10%. And it may be even less: given the FIFOs and the dual-port design of the DMA, together with the direct DMA-to-APB connection, you may get away even with setting the SPI's priority lower, but whether that is viable depends on the particular setup.
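The bandwidth estimate in this post can be reproduced with a few lines of host-side C. This is a back-of-envelope sketch only: the 160 MHz clock, the 5-6 cycles per M2M element and the 20 Mbaud 16-bit SPI are the example figures from the post, and the assumption that each SPI request costs the M2M stream exactly one transfer slot is a simplification (FIFO packing and the separate peripheral port can make the real cost lower). Note that 20 Mbaud with 16-bit frames is 1.25 M frames/s per direction, i.e. 2.5 Mtransfers/s if both Rx and Tx are served by DMA. The function names are hypothetical.

```c
#include <assert.h>

/* Elements per second one stream can move if each element costs
   `cycles_per_transfer` system clocks (an assumed, uncontended figure). */
double m2m_rate(double sysclk_hz, double cycles_per_transfer)
{
    assert(cycles_per_transfer > 0.0);
    return sysclk_hz / cycles_per_transfer;
}

/* DMA requests per second generated by an SPI: one request per frame and
   per direction served by DMA (1 = Tx only, 2 = Tx and Rx). */
double spi_rate(double baud, unsigned bits_per_frame, unsigned directions)
{
    assert(bits_per_frame > 0u);
    return baud / bits_per_frame * directions;
}

/* Fraction of M2M throughput lost if every SPI request takes the place of
   exactly one M2M transfer. Crude: ignores FIFO packing and port overlap. */
double m2m_slowdown(double m2m_per_s, double spi_per_s)
{
    return spi_per_s / m2m_per_s;
}
```

For example, `m2m_slowdown(m2m_rate(160e6, 6.0), spi_rate(20e6, 16u, 2u))` comes out at about 0.094, a bit under 10%; with 5 cycles per M2M element it drops to about 0.078.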

On the other hand, avoiding conflicts between even two DMAs on the buses is hard, as there are more bus masters (mainly the processor, plus HS-USB and ETH, and in higher-end 'F4 and 'F7 also DMA2D and LTDC), and there are only 3 available memories (4 in 'F42x/43x), one of which is the F(S)MC with its own timing issues. Note that CCM is not available to DMA in the 'F4.
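Since the original question mentions CCRAM as a possible buffer location, a cheap guard is to check buffer addresses before starting a stream. The sketch below assumes the STM32F405/407 RAM map from RM0090 (64 KB CCM at 0x10000000, 112 KB SRAM1 at 0x20000000, 16 KB SRAM2 at 0x2001C000); the `F407_*` names and helper functions are hypothetical, and other 'F4 parts (e.g. 'F42x/43x with SRAM3) need different ranges. `separate_srams()` reflects the RM0090 note quoted later in this thread that different masters can access different SRAMs concurrently.

```c
#include <stdbool.h>
#include <stdint.h>

/* STM32F405/407 RAM map per RM0090 (assumed; verify for your part). */
#define F407_CCM_BASE   0x10000000u
#define F407_CCM_SIZE   0x00010000u /* 64 KB, CPU data bus only, no DMA */
#define F407_SRAM1_BASE 0x20000000u
#define F407_SRAM1_SIZE 0x0001C000u /* 112 KB */
#define F407_SRAM2_BASE 0x2001C000u
#define F407_SRAM2_SIZE 0x00004000u /* 16 KB */

typedef enum { REGION_OTHER, REGION_CCM, REGION_SRAM1, REGION_SRAM2 } region_t;

static bool in_range(uint32_t addr, uint32_t base, uint32_t size)
{
    /* written as a subtraction so base + size cannot overflow */
    return addr >= base && addr - base < size;
}

region_t region_of(uint32_t addr)
{
    if (in_range(addr, F407_CCM_BASE, F407_CCM_SIZE))     return REGION_CCM;
    if (in_range(addr, F407_SRAM1_BASE, F407_SRAM1_SIZE)) return REGION_SRAM1;
    if (in_range(addr, F407_SRAM2_BASE, F407_SRAM2_SIZE)) return REGION_SRAM2;
    return REGION_OTHER;
}

/* Rejects empty buffers, buffers that wrap the address space, and any
   buffer with at least one byte inside CCM. */
bool dma_buffer_ok(uint32_t addr, uint32_t len)
{
    if (len == 0u)
        return false;
    uint32_t last = addr + (len - 1u);
    if (last < addr)
        return false;
    return last < F407_CCM_BASE || addr >= F407_CCM_BASE + F407_CCM_SIZE;
}

/* True if both addresses are in SRAM but in different SRAM blocks, so two
   bus masters using them need not contend for the same RAM. */
bool separate_srams(uint32_t a, uint32_t b)
{
    region_t ra = region_of(a);
    region_t rb = region_of(b);
    bool a_sram = (ra == REGION_SRAM1 || ra == REGION_SRAM2);
    bool b_sram = (rb == REGION_SRAM1 || rb == REGION_SRAM2);
    return a_sram && b_sram && ra != rb;
}
```

On the target, a buffer pointer would be passed as `(uint32_t)(uintptr_t)buf`.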

JW

Danish1 · 2022-02-05

I suppose one thing that makes me think multiple simultaneous paths in the bus matrix might be possible is the distinction between SRAM1 and SRAM2.

If only one of SRAM1 and SRAM2 may be accessed at a given time, then why specify that they are distinct?

Looking at the Reference Manual RM0090 Rev 15, p. 68/1745, I see in Section 2.3.1 Embedded SRAM:

> The AHB masters support concurrent SRAM accesses (from the Ethernet or the USB OTG HS): for instance, the Ethernet MAC can read/write from/to SRAM2 while the CPU is reading/writing from/to SRAM1 or SRAM3.

(My bold)

(SRAM3 is only on stm32f42x / stm32f43x)

(But why mention Ethernet and USB explicitly? I know they have their own DMA controllers; are theirs privileged above DMA1 and DMA2?)

I suspect that there's only one access to the AHB matrix from e.g. DMA2, so any other concurrent transfer could not be on DMA2.

Hope this helps frame the next question,

Danish

waclawek.jan · 2022-02-05

> But why explicitly Ethernet and USB

Both are high-traffic interfaces, where large packets of data "appear suddenly" (after the hardware has finished the checksum check, having stored them in its internal FIFOs). And indeed, their DMAs are capable of bursts, potentially blocking other users of a given RAM for prolonged periods. That's why it's good to use a separate SRAM for their data, if one is available.

> I suspect that there's only one access to the AHB matrix from e.g. DMA2, so any other concurrent transfer could not be on DMA2.

No, there are two. Read my posts above, DMA chapter in RM0090 and AN4031.

JW

Danish1 · 2022-02-06

Hi Jan,

> No, there are two. Read my posts above, DMA chapter in RM0090 and AN4031.

What I meant was that I don’t expect DMA2 to be able to allocate e.g. Stream* 1 to one of those interfaces and Stream 2 to the other. If two streams are both ready to transfer, I expect DMA2 to apply priority/round-robin and do one after the other.

Whereas if DMA1 and DMA2 both want the AHB, and their paths are independent, then the AHB may allow both concurrently.

*Or is it Channels on this STM32?

Danish

waclawek.jan · 2022-02-06

> What I meant was that I don’t expect DMA2 to be able to allocate e.g. Stream* 1 to one of those interfaces and Stream 2 to the other.

The two ports have separate arbiters. Thus, while one stream transfers on the memory port, another can simultaneously transfer on the peripheral port. This figure from AN4031 illustrates it:

> *or is it Channels on this stm32

Wherever the dual-port DMA is used ('F2/'F4/'F7, and the "general purpose" DMA in 'H7), the transfer element is called a Stream. In the single-port DMA (all other STM32s) it's a Channel. Don't ask me...
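On the dual-port DMA, the "channel" still exists as the CHSEL field of each stream's DMA_SxCR register, which selects which of 8 request lines drives that stream. The sketch below packs CHSEL, the priority level (PL) and the other fields needed for the scenario discussed earlier in the thread (SPI stream at very high priority, M2M stream at low priority) into a control-register value. Field positions are transcribed from RM0090's DMA_SxCR description; the `CR_*` macros and `dma_cr()` are hypothetical stand-ins for the `DMA_SxCR_*` macros in the CMSIS device header, and the channel number 3 in the usage note is only a placeholder to be looked up in RM0090's DMA request mapping table. FIFO configuration (DMA_SxFCR), which M2M mode needs because direct mode is not allowed there, is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* DMA_SxCR field positions transcribed from RM0090. In firmware, prefer the
   DMA_SxCR_* macros from the CMSIS device header; these are stand-ins. */
#define CR_EN        (1u << 0)   /* set last, after PAR/M0AR/NDTR/FCR */
#define CR_TCIE      (1u << 4)
#define CR_DIR_POS   6u          /* 2 bits */
#define CR_PINC      (1u << 9)
#define CR_MINC      (1u << 10)
#define CR_PSIZE_POS 11u         /* 2 bits */
#define CR_MSIZE_POS 13u         /* 2 bits */
#define CR_PL_POS    16u         /* 2 bits */
#define CR_CHSEL_POS 25u         /* 3 bits: request line 0..7 */

enum dma_dir  { DIR_P2M = 0, DIR_M2P = 1, DIR_M2M = 2 };
enum dma_prio { PL_LOW = 0, PL_MEDIUM = 1, PL_HIGH = 2, PL_VERY_HIGH = 3 };
enum dma_size { SZ_8 = 0, SZ_16 = 1, SZ_32 = 2 };

/* Packs a DMA_SxCR value with the transfer-complete interrupt enabled.
   EN stays clear so the caller can finish configuring the stream first. */
uint32_t dma_cr(unsigned chsel, enum dma_prio pl, enum dma_dir dir,
                enum dma_size psize, enum dma_size msize,
                bool pinc, bool minc)
{
    uint32_t cr = CR_TCIE;
    cr |= (uint32_t)(chsel & 7u) << CR_CHSEL_POS;
    cr |= ((uint32_t)pl & 3u) << CR_PL_POS;
    cr |= ((uint32_t)dir & 3u) << CR_DIR_POS;
    cr |= ((uint32_t)psize & 3u) << CR_PSIZE_POS;
    cr |= ((uint32_t)msize & 3u) << CR_MSIZE_POS;
    if (pinc)
        cr |= CR_PINC;
    if (minc)
        cr |= CR_MINC;
    return cr;
}
```

For instance, `dma_cr(3u, PL_VERY_HIGH, DIR_M2P, SZ_16, SZ_16, false, true)` gives 0x06032C50 for a 16-bit memory-to-peripheral stream, and `dma_cr(0u, PL_LOW, DIR_M2M, SZ_32, SZ_32, true, true)` gives 0x00005690 for a word-wide M2M copy.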

JW
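The per-port arbitration described in the last reply can be illustrated with a toy model. In the sketch below, a simplification rather than a cycle-accurate description of the hardware, each of the 8 streams either has an access pending or not, and that access needs either the memory port or the peripheral port. Each port is arbitrated independently: the highest software priority level (PL) wins, and on equal PL the lower-numbered stream wins, which is the fixed hardware tie-break RM0090 describes (rather than round-robin). The struct and function names are hypothetical. In the real DMA, a peripheral-to-memory stream reads on the peripheral port and later writes out of its FIFO on the memory port, so it competes on different ports at different moments; the model represents only each stream's next access.

```c
#include <stdbool.h>

#define NSTREAMS 8

enum port { PORT_MEM = 0, PORT_PERIPH = 1 };

struct stream_req {
    bool pending;       /* stream has an access waiting */
    unsigned pl;        /* software priority, 0 = low .. 3 = very high */
    enum port port;     /* which master port the next access needs */
};

/* Picks the stream granted port p: highest PL wins; on equal PL the
   lower-numbered stream wins. Returns -1 if no stream wants the port. */
int arbitrate(const struct stream_req s[NSTREAMS], enum port p)
{
    int winner = -1;
    for (int i = 0; i < NSTREAMS; i++) {
        if (!s[i].pending || s[i].port != p)
            continue;
        /* strict '>' keeps the earlier (lower-numbered) stream on ties */
        if (winner < 0 || s[i].pl > s[winner].pl)
            winner = i;
    }
    return winner;
}
```

For example, with stream 0 (PL 0) and stream 5 (PL 1) waiting for the memory port and stream 3 (PL 3) waiting for the peripheral port, `arbitrate()` grants the memory port to stream 5 and the peripheral port to stream 3 in the same step, which is the parallelism described above.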