2025-08-01 2:09 PM
I am measuring the time it takes to do ADC conversions on an STM32H753ZI Nucleo board, using the function HAL_ADC_Start_DMA(). This converts 6 channels of ADC3, at 16 bits, with DMA. The one function call handles everything including the DMA transfer. Then I wait for the DMA to complete via the interrupt callback HAL_ADC_ConvCpltCallback(). I am surprised how long this is taking and would like to ask if anyone has ideas on how to speed it up, or whether this is expected. I set this up using CubeMX as follows:
ADC clock is 10 MHz. (I think the max is 12 MHz for the 16-bit ADC with the LQFP144 package.)
CPU clock is 100 MHz.
ADC3 channels 0 through 5 all have a sample time of 2.5 ADC clocks.
ADC3 is set for 16 bits, so the conversion time is 8.5 ADC clocks, I believe.
That is a total of 11 ADC clock cycles per channel, or 1.1 usec.
So for 6 channels the total time should be about 1.1 usec x 6 plus DMA time, call it 7-8 usec. But the time I measure from HAL_ADC_Start_DMA() to HAL_ADC_ConvCpltCallback() is about 40 usec. I am compiling in release mode with the default optimization (-Os).
I normally avoid optimization, but I tried -O2 and the measured time decreased to 28 usec. I also tried a 200 MHz CPU clock (ADC clock still 10 MHz), and it decreased further to 22 usec. But this is still slow compared with the underlying hardware. Any thoughts are appreciated, and thanks for the help.
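For reference, one way to take this kind of measurement is with the Cortex-M7 DWT cycle counter. A minimal sketch (hadc3 and adcBuf are placeholder CubeMX-style names, which may differ from the actual project):

```c
#include "stm32h7xx_hal.h"

extern ADC_HandleTypeDef hadc3;

static uint16_t adcBuf[6];
static volatile uint32_t t_start, t_cycles;

void measure_adc_dma(void)
{
    /* Enable the DWT cycle counter (trace must be enabled first). */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0;
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;

    t_start = DWT->CYCCNT;
    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcBuf, 6);
}

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* Elapsed CPU cycles; divide by SystemCoreClock for seconds. */
    t_cycles = DWT->CYCCNT - t_start;
}
```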
2025-08-01 3:44 PM
Putting code into ITCMRAM will help quite a bit. Put the vector table and the callback routines in there.
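For example, with GCC a function can be pinned into ITCM via a section attribute. A sketch; ".itcmram" is a placeholder that has to match a section your linker script actually maps to ITCM (and copies from flash at startup):

```c
/* Place the hot callback in ITCM RAM so it runs with zero wait states.
 * The section name must exist in your linker script; ".itcmram" is a
 * placeholder here. */
__attribute__((section(".itcmram")))
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* ... handle the finished conversions ... */
}
```

Relocating the vector table additionally means copying it into ITCM and pointing SCB->VTOR at the copy.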
Using DTCMRAM for the stack will help.
Disabling the half-complete callback, if possible, will help. You should be able to mask the half-transfer interrupt after HAL_ADC_Start_DMA() returns but before the callback fires, as sketched below.
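A sketch, assuming the usual CubeMX-style names (hadc3 and adcBuf are placeholders):

```c
#include "stm32h7xx_hal.h"

extern ADC_HandleTypeDef hadc3;
static uint16_t adcBuf[6];

void adc_start_no_ht(void)
{
    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcBuf, 6);
    /* Mask the half-transfer interrupt so only transfer-complete fires. */
    __HAL_DMA_DISABLE_IT(hadc3.DMA_Handle, DMA_IT_HT);
}
```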
Disabling interrupts entirely and polling for completion would avoid a lot of the slowdown. But now we're deviating from how HAL expects things to be run. There are sacrifices to be made (size, speed) for the niceties of HAL.
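A rough sketch of that polling approach, reusing the declarations from the sketch above (hadc3, adcBuf, and the DMA1_Stream0_IRQn line are placeholders; HAL's state machine isn't designed for this, so treat it as illustrative):

```c
void adc_read_polled(void)
{
    DMA_HandleTypeDef *hdma = hadc3.DMA_Handle;

    /* Keep the stream interrupt from firing; the flags still get set. */
    HAL_NVIC_DisableIRQ(DMA1_Stream0_IRQn);

    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcBuf, 6);

    /* Busy-wait on this stream's transfer-complete flag. */
    while (!__HAL_DMA_GET_FLAG(hdma, __HAL_DMA_GET_TC_FLAG_INDEX(hdma)))
        ;

    /* Stop so HAL's internal ADC/DMA state is reset for the next call.
     * If D-cache is enabled, invalidate adcBuf before reading it. */
    HAL_ADC_Stop_DMA(&hadc3);
}
```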
I'm surprised the build settings made that much difference: 40 us at the default -Os down to 28 us with -O2 (and 22 us with the faster CPU clock). That's a lot of overhead.
2025-08-01 5:28 PM
I think you will see a better average sample rate with a higher number of samples; the overhead of setting up DMA is only incurred once per call. That overhead is vastly improved by optimization, as you have seen.
For me, the real benefit of DMA is that it allows the STM32's Arm core to do other things while the ADC (or another slow peripheral) works as fast as it can.
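To illustrate the amortization, a sketch assuming continuous conversion mode is enabled in CubeMX (names and sizes are placeholders):

```c
#include "stm32h7xx_hal.h"

#define NUM_CH    6
#define NUM_SCANS 64

extern ADC_HandleTypeDef hadc3;
static uint16_t adcBlock[NUM_SCANS * NUM_CH];

void adc_capture_block(void)
{
    /* One call sets up DMA once; the ADC then free-runs through
     * NUM_SCANS * NUM_CH = 384 conversions before the complete callback
     * fires, so the setup cost is paid per block, not per scan. */
    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcBlock, NUM_SCANS * NUM_CH);
}
```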
2025-08-01 9:29 PM
Option -Os optimizes for minimal size, not for maximum speed.
2025-08-01 11:03 PM
Using HAL callbacks has an inherent overhead. Put a breakpoint in the raw interrupt handler (in some *_it.c file) and follow the path through the HAL.
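For example, in a CubeMX-generated project the raw handler is a thin wrapper like the following (the stream and the handle name hdma_adc3 depend on your configuration). A breakpoint here marks the start of the HAL path: DMA IRQ -> HAL_DMA_IRQHandler -> the ADC's XferCpltCallback -> HAL_ADC_ConvCpltCallback.

```c
/* In stm32h7xx_it.c (generated; names depend on the project). */
void DMA1_Stream0_IRQHandler(void)
{
    HAL_DMA_IRQHandler(&hdma_adc3);
}
```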
Ironically, you will be better off polling the DMA completion flag. Of course, this keeps the CPU busy whilst polling.
If you do periodic measurements with circular DMA, these interrupts are less of a problem because measurements and callbacks will overlap. The latency itself still remains.
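A sketch, assuming circular DMA plus continuous conversion mode (names are placeholders):

```c
#include "stm32h7xx_hal.h"

#define NUM_CH 6

extern ADC_HandleTypeDef hadc3;
static uint16_t adcRing[2 * NUM_CH];   /* two halves, ping-pong */

void adc_start_circular(void)
{
    /* Started once; the DMA then wraps around the buffer forever. */
    HAL_ADC_Start_DMA(&hadc3, (uint32_t *)adcRing, 2 * NUM_CH);
}

void HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* First half, adcRing[0 .. NUM_CH-1], is stable: process it while
     * the ADC fills the second half. */
}

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* Second half, adcRing[NUM_CH .. 2*NUM_CH-1], is stable. */
}
```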
hth
KnarfB