2025-08-01 2:09 PM
I am measuring the time it takes to do ADC conversions on an STM32H753ZI Nucleo board, using the function HAL_ADC_Start_DMA(). This converts 6 channels of ADC3, at 16 bits, with DMA. The one function call handles everything including the DMA transfer. Then I wait for the DMA to complete via the interrupt callback HAL_ADC_ConvCpltCallback(). I am surprised how long this is taking and would like to ask if anyone has ideas on how to speed it up, or whether this is expected. I set this up using CubeMX as follows:
ADC clock is 10 MHz. (I think the max is 12 MHz for the 16-bit ADC with the LQFP144 package.)
CPU clock is 100 MHz.
ADC3 channels 0 through 5 all have a sample time of 2.5 ADC clocks.
ADC3 is set for 16 bits, so the conversion time is 8.5 ADC clocks, I believe.
That is a total of 11 ADC clock cycles per channel, or 1.1 usec.
So for 6 channels the total time should be about 1.1 usec x 6 plus DMA time, call it 7-8 usec. But the time I measure from HAL_ADC_Start_DMA() to HAL_ADC_ConvCpltCallback() is about 40 usec. I am compiling in release mode with the default optimization (-Os).
I normally avoid optimization, but I tried -O2 and the measured time decreased to 28 usec. I also tried a 200 MHz CPU clock (ADC clock still 10 MHz), and it decreased further to 22 usec. But this is still slow compared with the underlying hardware. Any thoughts are appreciated, and thanks for the help.
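For reference, one way to take this kind of measurement is with the Cortex-M7 DWT cycle counter. A minimal sketch (hadc3 and adcBuf are placeholder CubeMX-style names, which may differ from the actual project):

```c
#include "stm32h7xx_hal.h"

extern ADC_HandleTypeDef hadc3;

static uint16_t adcBuf[6];
static volatile uint32_t t_start, t_cycles;

void measure_adc_dma(void)
{
    /* Enable the DWT cycle counter (trace must be enabled first). */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;

    t_start = DWT->CYCCNT;
    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcBuf, 6);
}

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* Elapsed CPU cycles; divide by SystemCoreClock for seconds. */
    t_cycles = DWT->CYCCNT - t_start;
}
```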
2025-08-01 3:44 PM
Putting code into ITCMRAM will help quite a bit. Put the vector table and the callback routines in there.
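For example, with GCC a function can be pinned into ITCM via a section attribute. A sketch; ".itcmram" is a placeholder that has to match a section your linker script actually maps to ITCM (and copies from flash at startup):

```c
/* Place the hot callback in ITCM RAM so it runs with zero wait states.
 * The section name must exist in your linker script; ".itcmram" is a
 * placeholder here. */
__attribute__((section(".itcmram")))
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* ... handle the finished conversions ... */
}
```

Relocating the vector table additionally means copying it into ITCM and pointing SCB->VTOR at the copy.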
Using DTCMRAM for the stack will help.
Disabling the half-complete callback, if possible, will help. You should be able to mask the half-transfer interrupt after HAL_ADC_Start_DMA() returns but before the callback fires, as sketched below.
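A sketch, assuming the usual CubeMX-style names (hadc3 and adcBuf are placeholders):

```c
#include "stm32h7xx_hal.h"

extern ADC_HandleTypeDef hadc3;
static uint16_t adcBuf[6];

void adc_start_no_ht(void)
{
    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcBuf, 6);
    /* Mask the half-transfer interrupt so only transfer-complete fires. */
    __HAL_DMA_DISABLE_IT(hadc3.DMA_Handle, DMA_IT_HT);
}
```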
Disabling interrupts entirely and polling for completion would avoid a lot of the slowdown. But now we're deviating from how HAL expects things to be run. There are sacrifices to be made (size, speed) for the niceties of HAL.
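A rough sketch of that polling approach, reusing the declarations from the sketch above (hadc3, adcBuf, and the DMA1_Stream0_IRQn line are placeholders; HAL's state machine isn't designed for this, so treat it as illustrative):

```c
void adc_read_polled(void)
{
    DMA_HandleTypeDef *hdma = hadc3.DMA_Handle;

    /* Keep the stream interrupt from firing; the flags still get set. */
    HAL_NVIC_DisableIRQ(DMA1_Stream0_IRQn);

    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcBuf, 6);

    /* Busy-wait on this stream's transfer-complete flag. */
    while (!__HAL_DMA_GET_FLAG(hdma, __HAL_DMA_GET_TC_FLAG_INDEX(hdma)))
        ;

    /* Stop so HAL's internal ADC/DMA state is reset for the next call.
     * If D-cache is enabled, invalidate adcBuf before reading it. */
    HAL_ADC_Stop_DMA(&hadc3);
}
```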
I'm surprised the build settings made that much difference: 40 us at the default -Os down to 28 us with -O2 (and 22 us with the faster CPU clock). That's a lot of overhead.
2025-08-01 5:28 PM
I think you will see a better average sample rate with a higher number of samples; the overhead of setting up DMA is only incurred once per call. That overhead is vastly improved by optimization, as you have seen.
For me, the real benefit of DMA is that it allows the STM32's Arm core to do other things while the ADC (or another slow peripheral) works as fast as it can.
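To illustrate the amortization, a sketch assuming continuous conversion mode is enabled in CubeMX (names and sizes are placeholders):

```c
#include "stm32h7xx_hal.h"

#define NUM_CH    6
#define NUM_SCANS 64

extern ADC_HandleTypeDef hadc3;
static uint16_t adcBlock[NUM_SCANS * NUM_CH];

void adc_capture_block(void)
{
    /* One call sets up DMA once; the ADC then free-runs through
     * NUM_SCANS * NUM_CH = 384 conversions before the complete callback
     * fires, so the setup cost is paid per block, not per scan. */
    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcBlock, NUM_SCANS * NUM_CH);
}
```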
2025-08-01 9:29 PM
Option -Os optimizes for minimal size, not for maximum speed.
2025-08-01 11:03 PM
Using HAL callbacks has an inherent overhead. Put a breakpoint in the raw interrupt handler (in some *_it.c file) and follow the path through the HAL.
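For example, in a CubeMX-generated project the raw handler is a thin wrapper like the following (the stream and the handle name hdma_adc3 depend on your configuration). A breakpoint here marks the start of the HAL path: DMA IRQ -> HAL_DMA_IRQHandler -> the ADC's XferCpltCallback -> HAL_ADC_ConvCpltCallback.

```c
/* In stm32h7xx_it.c (generated; names depend on the project). */
void DMA1_Stream0_IRQHandler(void)
{
    HAL_DMA_IRQHandler(&hdma_adc3);
}
```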
Ironically, you will be better off polling the DMA completion flag. Of course, this keeps the CPU busy whilst polling.
If you do periodic measurements with circular DMA, these interrupts are less of a problem because measurements and callbacks will overlap. The latency itself still remains.
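A sketch, assuming circular DMA plus continuous conversion mode (names are placeholders):

```c
#include "stm32h7xx_hal.h"

#define NUM_CH 6

extern ADC_HandleTypeDef hadc3;
static uint16_t adcRing[2 * NUM_CH];   /* two halves, ping-pong */

void adc_start_circular(void)
{
    /* Started once; the DMA then wraps around the buffer forever. */
    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcRing, 2 * NUM_CH);
}

void HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* First half, adcRing[0 .. NUM_CH-1], is stable: process it while
     * the ADC fills the second half. */
}

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* Second half, adcRing[NUM_CH .. 2*NUM_CH-1], is stable. */
}
```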
hth
KnarfB