2015-03-20 05:58 AM
I know DMA acts as a channel that bypasses the CPU while data is transferred between memory and a peripheral, so in the meantime the CPU can execute useful operations.
But my question is this:
If a DMA transfer is in progress, the data bus is held by the DMA controller. So if the CPU needs to fetch or store data for any operation, it needs access to the same data bus that the DMA transfer is using. How exactly can we say that DMA offloads the CPU?
#spi #stm32 #dma #bus-matrix
2015-03-20 07:37 AM
DMA can contend with the CPU for bus resources, but the bus matrix localizes traffic so that this occurs less frequently. The transfer is done more rapidly and uses fewer resources, and it can run at rates that exceed the CPU's ability to perform the same operation.
If the CPU does all the work, you're sucking up its entire bandwidth, both execution and data. With a load-store architecture the CPU has to read everything into itself and then write it back out. DMA effectively copies the data point to point or, where the same buses are involved, via a temporary holding register.
DMA is pointless for single-byte transfer lengths: you'll spend more time setting it up than doing actual work. Get to 100+ bytes and the setup time is a small fraction of the total. With circular modes there's only one setup penalty.
2015-03-20 10:34 AM
On the newer STM32s (F2 and later) there are multiple SRAM regions which can operate in parallel, so the CPU avoids a DMA collision by operating out of CCM or SRAM2 while the DMA transfers to SRAM1. With a good allocation strategy the DMA becomes an essentially "free" operation, with no impact on CPU operation.
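A minimal sketch of such an allocation, assuming a GCC-style toolchain. The section name `.ccmram` is hypothetical: it only works if your own linker script maps a section of that name onto the CCM region, and your startup code has to zero it if you rely on that. The buffer names, the buffer sizes and the `accumulate` helper are also made up for illustration. On a host build the macro expands to nothing, so the code still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: ".ccmram" is a hypothetical output section that your own
   linker script places in CCM. It is not defined by any ST header. */
#if defined(__GNUC__) && defined(__arm__)
#define CPU_ONLY_DATA __attribute__((section(".ccmram")))
#else
#define CPU_ONLY_DATA /* host build: ordinary placement */
#endif

#define N_SAMPLES 64u

/* Working data touched only by the CPU. DMA cannot reach CCM, so nothing
   that a DMA stream reads or writes may be placed here. */
CPU_ONLY_DATA static uint16_t filter_state[N_SAMPLES];

/* DMA target buffer, left in the default data region
   (assumed here to be SRAM1 in the linker script). */
static uint16_t adc_samples[N_SAMPLES];

/* CPU-side processing: copies finished samples into CCM and sums them.
   Once the copy is done, further CPU work on filter_state stays in CCM
   while a DMA stream refills adc_samples in SRAM1. */
static uint32_t accumulate(const uint16_t *samples, size_t n)
{
    assert(n <= N_SAMPLES);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        filter_state[i] = samples[i];
        sum += filter_state[i];
    }
    return sum;
}
```

The point of the split is only where the data lives: the code is ordinary C, and the parallelism comes from the two buffers sitting behind different bus matrix slaves.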
Also, on later STM32s the DMA can perform packing/unpacking and FIFO burst modes to minimize bus collisions. For example, when transferring 16 bytes to a USART the DMA can burst transfer four 32-bit words (assuming the buffer is aligned) from RAM and then unpack 16 bytes to the USART. On the F1 this would require 16 DMA bus accesses, one per byte. That is one possible collision versus 16 possible collisions.
In my experience, one of the more significant ways to offload the CPU is using DMA to transfer long strings to a USART. Console output can be time intensive when it costs an interrupt per byte, especially at higher speeds. At 115 kbaud the interrupt rate is around 11 thousand per second when streaming data at full speed, and four times as many when running an RS-485 interface at 460 kbit/s (which leaves only about 22 µs of IRQ service time per byte). DMA reduces that to one interrupt (DMA complete) or two (adding USART transmission complete for the last byte shifted out) per transfer. Any time you can reduce interrupts by several orders of magnitude, it will make a difference.
Jack Peacock
2015-03-25 06:09 AM
I've read in some blogs that in burst mode a block of data is transferred, which makes the CPU inactive.
But as per your comment, the CPU can still work with the data in the second SRAM region (CCM SRAM). Am I right?
2015-03-25 07:19 AM
Yes, that's one of ST's big selling points, the bus matrix. I can run code from flash, keep heap and stack in CCM, run ADC DMA conversions to SRAM1 and USART DMA transfers from SRAM2, all in parallel. CPU and DMA don't collide since they are on separate buses (of course, I/O does collide if it is on the same APB or AHB bus, but that's minor).
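To put rough numbers on the USART example from earlier in the thread, here is a small arithmetic sketch. It assumes 10 bits on the wire per byte (8N1 framing) and that one bus grant moves either a single byte or a full 16-byte burst (four 32-bit words). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 8N1 framing, 1 start + 8 data + 1 stop = 10 bits per byte. */
#define BITS_PER_FRAME 10u

/* Bytes per second at a given baud rate. With interrupt-driven output
   this is also the interrupt rate when streaming at full speed. */
static uint32_t bytes_per_second(uint32_t baud)
{
    return baud / BITS_PER_FRAME;
}

/* Time available per byte, in nanoseconds, on a fully loaded line. */
static uint32_t ns_per_byte(uint32_t baud)
{
    assert(baud > 0);
    return (uint32_t)((1000000000ull * BITS_PER_FRAME) / baud);
}

/* Number of memory-side bus grants needed to move total_bytes when each
   grant moves bytes_per_grant bytes (rounded up). */
static uint32_t bus_grants(uint32_t total_bytes, uint32_t bytes_per_grant)
{
    assert(bytes_per_grant > 0);
    return (total_bytes + bytes_per_grant - 1u) / bytes_per_grant;
}
```

With these assumptions, `bytes_per_second(115200)` gives 11520 and `ns_per_byte(460800)` gives 21701, in line with the "around 11 thousand per second" and "about 22 µs" figures above. `bus_grants(16, 1)` is 16 and `bus_grants(16, 16)` is 1, which is the 16-versus-1 collision count for the 16-byte USART transfer.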
Even when burst mode does collide with the CPU, the arbitration time is reduced, since there is one arbitration between CPU and DMA for up to 4 words at a time, as opposed to an arbitration for each byte (as many as 32:1 reduction) for, say, a USART TX DMA transfer.
The CPU can work with any memory region. DMA has some limits (no CCM access, DMA1 can't do memory-to-memory), but those limitations are minor and easily worked around.
Jack Peacock
2015-03-25 08:37 AM
Thanks,
So does the DMA controller in ST parts support all of the modes below?
1. Cycle stealing mode
2. Burst mode
3. Transparent mode
2015-03-25 08:48 AM
Pretty sure it doesn't do (1). It either gets unfettered access or, in the case of contention, it arbitrates or simply waits for the other master to finish what it's doing. It's going to be hard to model or predict in absolute terms, as a lot of things interact and there are caches and write buffers involved.
It doesn't do any anti-phase access compared to the CPU. The CPU will be delayed if it contends with an ongoing DMA burst.