2025-05-29 1:25 PM - last edited on 2025-05-29 1:39 PM by mƎALLEm
I'm working on a project which uses DMA to pull data from an external chip controlled by the FMC into a buffer in AXI SRAM, and I'm seeing much lower throughput than expected. It looks like the FMC is idle for long periods of time, and I'm looking for some insight into the cause of the downtime. Where are the delays being introduced? Are they inherent to the FMC peripheral, caused by its current settings, or a result of how it's being accessed? Also, which clock feeds the DMA controller on the H7 series, and could the DMA clock also be limiting performance?
Additional details:
The FMC is configured for asynchronous accesses to SRAM with an 8-bit data bus and a 12-bit address bus (full config is pasted below). The FMC is clocked at 200 MHz, the core is clocked at 400 MHz, and I'm not sure about the DMA controller clock since it doesn't show up in any clock diagrams I can find. I'm using development hardware (an old Nucleo-H743ZI2) and just monitoring the FMC outputs directly with a logic analyzer.
I've attached 3 captures of the address LSB and chip enable lines. The first image shows the FMC when driven by the DMA controller doing 8-bit accesses, the second shows the FMC when driven by the DMA controller doing 32-bit accesses, and the third shows the FMC when driven by the core doing 64-bit accesses (which appear to be split into two 32-bit transactions; I assume this is a limitation of the instructions available to the core). The 32-bit and 64-bit accesses show that the peripheral is capable of chaining transactions back to back, but in all three captures there's significant downtime between accesses. I was running into bandwidth limitations on the logic analyzer, so these numbers are approximate: the long idle periods in these images are ~70-90 ns, and the short idle periods in the 64-bit transaction are ~40 ns.
FMC configuration:
hsram1.Instance = FMC_NORSRAM_DEVICE;
hsram1.Extended = FMC_NORSRAM_EXTENDED_DEVICE;
hsram1.Init.NSBank = FMC_NORSRAM_BANK1;
hsram1.Init.DataAddressMux = FMC_DATA_ADDRESS_MUX_DISABLE;
hsram1.Init.MemoryType = FMC_MEMORY_TYPE_SRAM;
hsram1.Init.MemoryDataWidth = FMC_NORSRAM_MEM_BUS_WIDTH_8;
hsram1.Init.BurstAccessMode = FMC_BURST_ACCESS_MODE_DISABLE;
hsram1.Init.WaitSignalPolarity = FMC_WAIT_SIGNAL_POLARITY_LOW;
hsram1.Init.WaitSignalActive = FMC_WAIT_TIMING_BEFORE_WS;
hsram1.Init.WriteOperation = FMC_WRITE_OPERATION_ENABLE;
hsram1.Init.WaitSignal = FMC_WAIT_SIGNAL_DISABLE;
hsram1.Init.ExtendedMode = FMC_EXTENDED_MODE_DISABLE;
hsram1.Init.AsynchronousWait = FMC_ASYNCHRONOUS_WAIT_DISABLE;
hsram1.Init.WriteBurst = FMC_WRITE_BURST_DISABLE;
hsram1.Init.ContinuousClock = FMC_CONTINUOUS_CLOCK_SYNC_ONLY;
hsram1.Init.WriteFifo = FMC_WRITE_FIFO_ENABLE;
hsram1.Init.PageSize = FMC_PAGE_SIZE_NONE;
/* Timing */
Timing.AddressSetupTime = 2;
Timing.AddressHoldTime = 1;
Timing.DataSetupTime = 2;
Timing.BusTurnAroundDuration = 0;
Timing.CLKDivision = 0;
Timing.DataLatency = 0;
Timing.AccessMode = FMC_ACCESS_MODE_A;
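As a rough sanity check on these timings, the nominal access time and the cost of the idle gaps can be estimated with a few lines of arithmetic. This is only a minimal sketch under a simplified model: it assumes each external 8-bit read takes about ADDSET + DATAST FMC clock cycles (ignoring any extra cycles the FMC may insert) and that one idle gap follows each group of back-to-back accesses. The function names `fmc_access_ns` and `fmc_throughput_mbps` are hypothetical, not part of the HAL.

```c
/* Estimated duration of one external bus access, in nanoseconds.
 * Assumed model: an asynchronous read lasts (ADDSET + DATAST) FMC clock cycles. */
double fmc_access_ns(double fmc_clk_mhz, unsigned addset, unsigned datast)
{
    double cycle_ns = 1000.0 / fmc_clk_mhz;   /* 200 MHz -> 5 ns per cycle */
    return (double)(addset + datast) * cycle_ns;
}

/* Estimated effective throughput in MB/s (1 MB = 1e6 bytes).
 * bytes_per_group: external 8-bit accesses chained back to back
 *                  (1 for byte-wide DMA, 4 for word-wide DMA on an 8-bit bus).
 * idle_ns:         measured idle time following each group. */
double fmc_throughput_mbps(double access_ns, unsigned bytes_per_group, double idle_ns)
{
    double group_ns = access_ns * (double)bytes_per_group + idle_ns;
    return (double)bytes_per_group * 1000.0 / group_ns;   /* bytes/ns -> MB/s */
}
```

With the values from this post (200 MHz, ADDSET 2, DATAST 2, and an idle gap of about 80 ns), this model gives 20 ns per byte access, roughly 25 MB/s for 32-bit DMA and 10 MB/s for 8-bit DMA, against 50 MB/s if there were no idle time at all.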
DMA Configuration:
hdma_memtomem_dma1_stream0.Instance = DMA1_Stream0;
hdma_memtomem_dma1_stream0.Init.Request = DMA_REQUEST_MEM2MEM;
hdma_memtomem_dma1_stream0.Init.Direction = DMA_MEMORY_TO_MEMORY;
hdma_memtomem_dma1_stream0.Init.PeriphInc = DMA_PINC_ENABLE;
hdma_memtomem_dma1_stream0.Init.MemInc = DMA_MINC_ENABLE;
hdma_memtomem_dma1_stream0.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;
hdma_memtomem_dma1_stream0.Init.MemDataAlignment = DMA_MDATAALIGN_WORD; //Set this to DMA_MDATAALIGN_BYTE for 8-bit DMA configuration
hdma_memtomem_dma1_stream0.Init.Mode = DMA_NORMAL;
hdma_memtomem_dma1_stream0.Init.Priority = DMA_PRIORITY_MEDIUM;
hdma_memtomem_dma1_stream0.Init.FIFOMode = DMA_FIFOMODE_ENABLE;
hdma_memtomem_dma1_stream0.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_FULL;
hdma_memtomem_dma1_stream0.Init.MemBurst = DMA_MBURST_SINGLE;
hdma_memtomem_dma1_stream0.Init.PeriphBurst = DMA_PBURST_SINGLE;
Transaction code (only one of the following is used at a time):
HAL_DMA_Start(&hdma_memtomem_dma1_stream0, 0x60000000, (uint32_t)(&data_buffer[0]), 256); //8 bit DMA
HAL_DMA_Start(&hdma_memtomem_dma1_stream0, 0x60000000, (uint32_t)(&data_buffer[0]), 256); //32 bit DMA
for (;;) //64 bit core-controlled
{
*((uint64_t*)&data_buffer[0]) = *(uint64_t*)0x60000000;
}
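One caveat with a core-driven loop like the one above: neither pointer is `volatile`, so an optimizing compiler is allowed to hoist or drop the repeated load, which would change what shows up on the logic analyzer. A minimal sketch of a volatile-qualified read, assuming the same 0x60000000 bank address used in this post; `FMC_BANK1_BASE_ADDR` and `fmc_read_u64` are hypothetical names, not HAL identifiers:

```c
#include <stdint.h>

/* Assumed base address of the FMC NOR/SRAM bank, taken from the transaction code above. */
#define FMC_BANK1_BASE_ADDR 0x60000000UL

/* Read 64 bits through a volatile pointer so the compiler must
 * perform the load on every call instead of caching the value. */
uint64_t fmc_read_u64(const volatile uint64_t *src)
{
    return *src;
}
```

On the target this would be called as `fmc_read_u64((const volatile uint64_t *)FMC_BANK1_BASE_ADDR)` inside the loop. Whether the core issues it as one 64-bit access or two 32-bit accesses still depends on the compiler and the instructions it chooses.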