I'm about to start implementing timer-driven DMA bit-banging of a single GPIO pin, and I have a bunch of questions. I'm throwing them out here first in case people know the answers to some (or all) of them and can save me time on discovering things the hard way. I don't expect anyone to do my work or research for me - I'm prepared to bash my head against the wall and find everything out the hard way, but I would appreciate any insights people have. To give back to the community, I will update this thread with the answers I discover over the next few days as the project progresses to completion.
1) (Not directly relevant to this project, but I am interested in the answer anyway) - Is it better to use a timer event to trigger the peripheral (say, a DAC) directly so that it pulls data via a DMA request, or to use the timer interrupt to generate a DMA request that pushes data to the peripheral? Is one more temporally accurate (I'm assuming triggering the DAC directly is)? Does one cause less bus contention?
Answer (per Clive's answer below): It is better to use a timer event to trigger the peripheral to do a DMA pull rather than using a timer interrupt to initiate a DMA push to the peripheral.
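To make the recommended arrangement concrete, here is a hedged register-level sketch of the trigger chain for the DAC case: TIM6's update event drives TRGO, TRGO triggers DAC channel 1, and the DAC raises its own DMA request to pull the next sample, with no interrupt anywhere in the path. This assumes the CMSIS device header (`stm32f4xx.h`) and the RM0090 request mapping (DAC1 on DMA1 Stream 5, Channel 7); register values are illustrative, not a drop-in driver.

```c
#include "stm32f4xx.h"   /* assumption: CMSIS device header is available */

static uint16_t wave[64];  /* sample buffer in regular SRAM (not CCM!) */

void dac_timer_dma_init(void)
{
    RCC->APB1ENR |= RCC_APB1ENR_TIM6EN | RCC_APB1ENR_DACEN;
    RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN;

    TIM6->PSC = 0;
    TIM6->ARR = 83;                        /* sample rate = timer clock / 84 */
    TIM6->CR2 = TIM_CR2_MMS_1;             /* MMS=010: update event -> TRGO */

    /* DAC1 is served by DMA1 Stream5 / Channel7 (RM0090 request table) */
    DMA1_Stream5->PAR  = (uint32_t)&DAC->DHR12R1;
    DMA1_Stream5->M0AR = (uint32_t)wave;
    DMA1_Stream5->NDTR = 64;
    DMA1_Stream5->CR   = (7u << 25)                  /* channel 7 */
                       | DMA_SxCR_MINC | DMA_SxCR_CIRC
                       | DMA_SxCR_MSIZE_0 | DMA_SxCR_PSIZE_0  /* 16-bit */
                       | DMA_SxCR_DIR_0              /* memory-to-peripheral */
                       | DMA_SxCR_EN;

    DAC->CR = DAC_CR_DMAEN1 | DAC_CR_TEN1  /* TSEL1=000: TIM6 TRGO trigger */
            | DAC_CR_EN1;
    TIM6->CR1 = TIM_CR1_CEN;               /* start the timer last */
}
```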
2) I moved my existing, working DMA buffer arrays to CCM to reduce bus contention and everything died (no bus faults or hard faults - the program kept running fine, there was just no more DMA activity). Am I missing something obvious?
Answer (per Jan's answer below): Yes, I was missing that DMA from CCM memory is impossible. From RM0090 (STM32F4xx Reference Manual - Page 61): "The 64-Kbyte CCM (core coupled memory) data RAM is not part of the bus matrix and can be accessed only through the CPU."
3) For minimum bus access contention and/or best performance is it better to put only DMA buffers in CCM, or DMA buffers + heap + stack in CCM, or heap + stack in CCM and DMA buffers in SRAM 2? Chip is an STM32F429. In this thread, Clive and Mr. Peacock come to basically opposite conclusions.
Answer (per Mr. Peacock in the referenced thread):
CCM: Heap and stacks (No bus contention with DMA accesses)
SRAM1: Globals and large arrays (Data caching has benefit here)
SRAM2: DMA buffers (DMA doesn't care about lack of data caching and it avoids some contention with SRAM1)
BKPSRAM: Non-volatile variables
From AN4031 (Using the STM32F2, STM32F4 and STM32F7 Series DMA controller - Page 32):
"Best DMA throughput configuration: When using STM32F4xx with reduced AHB frequency while DMA is servicing a high-speed peripheral, it is recommended to put the stack and heap in the CCM (which can be addressed directly by the CPU through D-bus) instead of putting them on the SRAM, which would create an additional concurrency between CPU and DMA accessing the SRAM memory."
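For anyone wanting to apply Mr. Peacock's layout, here is a hedged sketch of how the variable placement side looks with GCC section attributes. ST's generated linker scripts usually define a `.ccmram` output section at 0x10000000; the `.sram2_bss` name here is a hypothetical section you would have to add to the .ld file yourself at 0x2001C000 (the start of SRAM2 on the F429). Moving the stack into CCM is done in the linker script, e.g. by setting `_estack` to the top of CCM (0x10010000 for the 64-Kbyte block), not in C code.

```c
#include <stdint.h>

/* CPU-only scratch data: fine in CCM, invisible to the DMA controllers */
__attribute__((section(".ccmram")))
static uint8_t scratch[4096];

/* DMA buffer: must live in SRAM1/SRAM2, never in CCM.
   ".sram2_bss" is an assumed custom output section in the linker script. */
__attribute__((section(".sram2_bss")))
static uint16_t dma_buf[1024];
```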
4) Can timer-driven memory-to-GPIO DMA be configured in CubeMX?
Answer: No, this cannot currently be done in CubeMX (as of version 4.22.1).
5) Can DMA be used with GPIO ODR port bit-banding? If so, is DMA from memory to GPIO ODR port bit-banding considered memory-to-peripheral or memory-to-memory?
Answer (per Clive's answer below): Apparently yes, if you are willing to lose atomicity. It is considered a memory-to-memory transfer.
6) Since I am driving a single pin, it is quite wasteful to DMA an entire byte to the GPIO bit-banding register. In my ideal scenario (which quite possibly may not be technically feasible), I'd like to store my output bit pattern as individual bits in 32-bit words, then DMA the 32-bit words of the bit-banded expansion of the source array to the 32-bit word of the GPIO ODR port bit-banded expansion corresponding to my output pin. This would reduce the bit pattern source data array size requirements by a factor of 8. Basically, what I am thinking of doing is this:
Word array in memory -> 32x word-array expansion in bit-banding land -> DMA -> Single word in 32x word-array in GPIO ODR bit-banding land -> GPIO output pin
Is this possible?
Answer: Unknown. I have abandoned this goal, as I found that I could shift a few pins around and free up an SPI port, which will do the job much more cleanly than DMA-driven GPIO bit-banging. This "double bit-banding DMA" question seems to come up every few years (see example here) and never gets definitively answered.
Based on this thread, it seems like it should be possible to do a timer-driven double-bit-banding DMA transfer from memory to a GPIO pin.