Optimizing ADC DMA transfer and IRQ latency

tarzan · 2023-02-27

Hello,

I have the following config:

STM32H743

- Core 400MHz

- AXI/AHB 200MHz

- ADC clocked by the PLL at 144MHz, then divided by 4 (async mode) => 36MHz

- TIM15 triggering ADC conversions

ADC1 in scan mode converting 2 channels. Data sent to SRAM1 using DMA1s1

ADC2 in scan mode converting 3 channels. Data sent to SRAM1 using DMA1s2

ADC3 in scan mode converting 4 channels. Data sent to SRAM1 using DMA2s3

I'm only using the IRQ of DMA2. Since that ADC/DMA pair has the most channels to convert (4), when entering the DMA "transfer complete" IRQ callback, I'm sure all the other data are in place.
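The reasoning above (the longest scan sequence finishes last) can be sketched as a small calculation. This is only an illustration: the per-channel cost of 14.5 ADC clock cycles is an assumption, chosen so that the 4-channel case lands near the 1611ns quoted in this post, and the helper name scan_time_ns is hypothetical. It ignores trigger latency and any DMA arbitration delay.

```c
#include <assert.h>
#include <stdint.h>

/* ADC kernel clock from the post: 144 MHz PLL output divided by 4. */
#define ADC_CLOCK_HZ 36000000u

/* ASSUMPTION: sampling + conversion cost per channel, in half ADC clock
 * cycles (29 half-cycles = 14.5 cycles). Not a value given in the post. */
#define HALF_CYCLES_PER_CHANNEL 29u

/* Duration in ns of one scan sequence of n_channels, assuming every ADC
 * uses the same clock, resolution and sampling time. */
static inline uint32_t scan_time_ns(uint32_t n_channels)
{
    assert(n_channels > 0u);
    uint64_t half_cycles = (uint64_t)n_channels * HALF_CYCLES_PER_CHANNEL;
    return (uint32_t)((half_cycles * 1000000000ull) / (2ull * ADC_CLOCK_HZ));
}
```

With identical settings, scan_time_ns(2) < scan_time_ns(3) < scan_time_ns(4), so the 4-channel ADC3 sequence ends last. That ordering no longer holds if the sampling time, resolution or oversampling differ between the ADCs.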

All code, interrupt routines and the interrupt vector table are in ITCM RAM. All variables are in DTCM RAM except the DMA destination buffer, which is located in SRAM because the DMA cannot access the DTCM.

I don't use any of the HAL functions except for the init. The code is optimized for speed.

When entering the DMA interrupt, I set a GPIO to measure the time between the TIM15 trigger (start of all ADC conversions) and the DMA interrupt. I'm using some tricks to ensure that the optimizer doesn't reorder my code.

I have 1794ns between the TIM15 trigger and the GPIO rising. The ADC trigger + sampling + conversion theoretically lasts 1611ns. So 183ns are lost in DMA transfer, IRQ latency and setting the GPIO. This is 73 CPU cycles, or 36 AXI/AHB cycles. I've tried to use BDMA for ADC3 but the result is a little bit slower.
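For reference, the arithmetic behind these figures can be written out as a short sketch. The helper names overhead_ns and ns_to_cycles are hypothetical; the numbers are the ones quoted above (1794ns measured, 1611ns theoretical, 400MHz core, 200MHz AXI/AHB).

```c
#include <assert.h>
#include <stdint.h>

/* Overhead = measured trigger-to-GPIO time minus the theoretical ADC time. */
static inline uint32_t overhead_ns(uint32_t measured_ns, uint32_t adc_ns)
{
    assert(measured_ns >= adc_ns);
    return measured_ns - adc_ns;
}

/* Convert a duration in ns to whole clock cycles (rounded down). */
static inline uint32_t ns_to_cycles(uint32_t ns, uint32_t clock_hz)
{
    return (uint32_t)(((uint64_t)ns * clock_hz) / 1000000000ull);
}
```

Here overhead_ns(1794, 1611) gives 183, ns_to_cycles(183, 400000000) gives 73 and ns_to_cycles(183, 200000000) gives 36, matching the values in the post.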

1/ Is this a good score, or is there any way to make things a little bit faster?

In the interrupt routine, I start by copying all ADC data from SRAM to a DTCM RAM buffer. It lasts 155 CPU cycles (17cy/word).

2/ These 17 CPU cycles (8 AXI/AHB cycles) look quite long for a simple copy. Is everything normal?
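One way to cross-check a per-word figure like this, independently of the GPIO, is the Cortex-M cycle counter (DWT CYCCNT). The sketch below is an assumption about how it could be done, not the method used for the numbers above: the helper names are hypothetical, the register addresses are the architectural ARMv7-M ones, and the hardware access is only compiled for an ARMv7E-M target.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles between two 32-bit counter snapshots. Unsigned
 * subtraction stays correct across a single counter wrap-around. */
static inline uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Average cost per copied word, in tenths of a cycle (172 means 17.2). */
static inline uint32_t tenths_of_cycle_per_word(uint32_t cycles, uint32_t words)
{
    assert(words > 0u);
    return (uint32_t)(((uint64_t)cycles * 10u) / words);
}

#if defined(__ARM_ARCH_7EM__)
/* Architectural ARMv7-M debug registers (not STM32-specific). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

/* Enable the trace block, then the cycle counter itself. */
static inline void cycle_counter_enable(void)
{
    DEMCR    |= (1u << 24); /* TRCENA */
    DWT_CTRL |= (1u << 0);  /* CYCCNTENA */
}

static inline uint32_t cycle_counter_read(void)
{
    return DWT_CYCCNT;
}
#endif
```

On the target, one would call cycle_counter_enable() once at init, read the counter before and after the copy, and pass the two snapshots to elapsed_cycles(). With the figures from the post, tenths_of_cycle_per_word(155, 9) gives 172, i.e. about 17.2 cycles per word.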

C code of the IRQ callback:

void DMA2_Stream3_IRQHandler(void)
{
  register uint32_t chan;
  register uint32_t square;
 
  mTP34_ON;                            // This is my GPIO. Direct write to register.
  asm volatile("" ::: "memory");       // Compiler barrier: avoid code re-ordering by the optimizer.
 
  // Copy 9 uint32_t words from SRAM to DTCM RAM.
  adcData_dtcm[1] = adc.array[1]; // ADC1, DMA1s1
  adcData_dtcm[2] = adc.array[2]; // ADC1, DMA1s1
 
  adcData_dtcm[3] = adc.array[3]; // ADC2, DMA1s2
  adcData_dtcm[4] = adc.array[4]; // ADC2, DMA1s2
  adcData_dtcm[5] = adc.array[5]; // ADC2, DMA1s2
 
  adcData_dtcm[6] = adc.array[6]; // ADC3, DMA2s3
  adcData_dtcm[7] = adc.array[7]; // ADC3, DMA2s3
  adcData_dtcm[8] = adc.array[8]; // ADC3, DMA2s3
  adcData_dtcm[9] = adc.array[9]; // ADC3, DMA2s3
 
  asm volatile("" ::: "memory");
  mTP3_OFF;
  asm volatile("" ::: "memory");

S.Ma · 2023-02-27

"I'm only using the IRQ of DMA2. Since that ADC/DMA pair has the most channels to convert (4), when entering the DMA "transfer complete" IRQ callback, I'm sure all the other data are in place."

Why? The ADCs have a programmable sample-and-hold time, which can range from a few cycles to 500+ cycles. Are all the ADCs running with the same clock source, prescaler, resolution, oversampling, and sample-and-hold values?

tarzan · 2023-02-28

Yes, the config is exactly the same except for the DMA channel / stream.