DMA latency

Lee3 · 2024-10-03

I'm working through issues on an STM32G4 where at times I'm not getting the reaction time from the DMA that I would like. I read AN2548, and it is a very helpful resource. However, I'm seeing slightly worse response times than AN2548 would suggest, and I'm wondering why.

I set up a simple program to test latency: a DMAMUX request generator triggers off EXTI and the DMA sends a burst of on-off values to a GPIO. I've set the CPU to execute NOPs during this burst so that there is no bus contention. I found that the initial response takes about 11 cycles and each repeated value takes 8 cycles at 170 MHz, both with no jitter. From AN2548, it seems I should expect 6 cycles for each.

In addition, AN2548 suggests "The reasonable and recommended safety margin for this occasion is 50%, leaving one‑third of the total bus capacity in reserve". I'm wondering how to calculate this capacity. Is it 170e6/8, given the apparent 8 cycles per transfer?

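  uint32_t gpio_a_bsrr[2] = {2, 2<<16};            // BSRR values: set PA1, then reset PA1
  DMAMUX1_Channel0->CCR = 1;                       // request ID 1 = request generator 0
  DMAMUX1_RequestGenerator0->RGCR =
      (1 << DMAMUX_RGxCR_GPOL_Pos)                 // trigger on rising edge
    | (0 << DMAMUX_RGxCR_SIG_ID_Pos)               // trigger input 0 (EXTI line 0)
    | (31 << DMAMUX_RGxCR_GNBREQ_Pos)              // GNBREQ + 1 = 32 requests per trigger
    | DMAMUX_RGxCR_GE;                             // enable the generator
  DMA1_Channel1->CMAR = (uint32_t)gpio_a_bsrr;     // source: the two-word pattern in RAM
  DMA1_Channel1->CPAR = (uint32_t)&GPIOA->BSRR;    // destination: GPIOA set/reset register
  DMA1_Channel1->CNDTR = 2;                        // two items, repeated by circular mode
  DMA1_Channel1->CCR = DMA_CCR_CIRC | DMA_CCR_DIR | DMA_CCR_EN | DMA_CCR_MINC
                     | DMA_CCR_MSIZE_1 | DMA_CCR_PSIZE_1;  // memory-to-peripheral, 32-bit both sides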
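
For reference, the cycle counts above can be converted to time and to a per-transfer rate with the small host-side sketch below. The helper names are made up for illustration, and the rate is only the 170e6/8 figure proposed in the question; whether that is the right definition of "bus capacity" in the AN2548 sense is exactly what is being asked.

```c
#include <stdint.h>

/* Convert a number of clock cycles to nanoseconds at the given clock. */
double cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    return (double)cycles * 1e9 / (double)clock_hz;
}

/* Transfers per second if every transfer takes a fixed number of cycles.
 * This is the "170e6/8" estimate from the question, not a figure from AN2548. */
double transfers_per_second(uint32_t clock_hz, uint32_t cycles_per_transfer)
{
    return (double)clock_hz / (double)cycles_per_transfer;
}
```

With the measured numbers, cycles_to_ns(11, 170000000) is about 64.7 ns for the first transfer, cycles_to_ns(8, 170000000) is about 47.1 ns per repeated transfer, and transfers_per_second(170000000, 8) is 21.25 million transfers per second.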

PGump.1 · 2024-10-03

Hi,

How have you set the CPU to do nops?

Is the EXTI resampled before it triggers?

Peripherals and memory can be located on different buses, each with its own timing. Have you looked at that?

Your test is measuring the Read & Write cycle of the DMA operation. Perhaps ST is measuring the latency for only the Read portion...

Kind regards
Pedro

waclawek.jan · 2024-10-03

While ST does not publish enough details to be sure about the timing at this level, I believe what you see as "extra" latency (i.e. 11-6=5 cycles at the first transfer) is a combination of the synchronization delay between the pin and DMAMUX and the delay imposed by the output buses (including the AHB-to-APB bridge).

What you then see as "extra cycles" in the burst is not exactly "latency", but a consequence of the burst mechanism. The burst is not generated in the DMA as such, but in DMAMUX: the DMA signals the end of each transfer, and from that DMAMUX generates the next request signal back to the DMA. This is IMO what takes some extra cycles compared to M2M transfers, for which requests are "generated" directly in the DMA (btw. try that for comparison).
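
A minimal sketch of such an M2M comparison, under several assumptions: the CMSIS device header is available, the DMA1 clock is already enabled, PA1 is already configured as an output, and the function and buffer names are hypothetical. Memory-to-memory mode cannot be combined with circular mode, so the 32-value pattern is unrolled into a buffer; the transfer starts as soon as the channel is enabled, so there is no EXTI trigger and only the spacing between writes is comparable.

```c
#include "stm32g4xx.h"

#define BURST_LEN 32u

/* Hypothetical buffer: alternating set/reset words for PA1. */
static uint32_t gpio_a_bsrr_burst[BURST_LEN];

/* Hypothetical helper: run one software-started M2M burst to GPIOA->BSRR. */
void start_m2m_burst(void)
{
    for (uint32_t i = 0; i < BURST_LEN; i++) {
        gpio_a_bsrr_burst[i] = (i & 1u) ? (2u << 16) : 2u;  /* set PA1, reset PA1, ... */
    }

    DMA1_Channel1->CCR = 0;                                 /* disable before reconfiguring */
    DMA1_Channel1->CMAR = (uint32_t)gpio_a_bsrr_burst;      /* source buffer in RAM */
    DMA1_Channel1->CPAR = (uint32_t)&GPIOA->BSRR;           /* destination register */
    DMA1_Channel1->CNDTR = BURST_LEN;                       /* 32 transfers, no circular mode */
    DMA1_Channel1->CCR = DMA_CCR_MEM2MEM | DMA_CCR_DIR | DMA_CCR_MINC
                       | DMA_CCR_MSIZE_1 | DMA_CCR_PSIZE_1; /* 32-bit, read from CMAR side */
    DMA1_Channel1->CCR |= DMA_CCR_EN;                       /* transfer starts immediately */
}
```

Comparing the pin period from this burst with the 8 cycles seen in the DMAMUX-generated burst would show how much of the spacing comes from the request handshake.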

Except for possible contention on the AHB/APB bridge, the throughput of the bus(es) does not depend on what you observe here, which is mainly the processing and synchronization of signals outside of the buses.

JW