Fast ADC transfer and accumulate to an array - STM32U595

Anton_2 · 2024-09-30

Hi ST community,

I'm struggling to reach the required transfer-and-accumulate rate from the ADCs to an array. At the moment I'm focusing on the data transfer phase, which seems to be the slowest one.

Situation:

Using STM32CubeIDE, version 1.14.0.

Using 3 ADCs (ADC1, ADC2 and ADC4) to acquire 3 signals simultaneously; the start of conversion is triggered by a timer event.

Data is transferred in the form array[idx] = array[idx] + ADCdata in the HAL_ADC_ConvCpltCallback callback. When ADCx_seqPhase == 0, the ADC data is transferred directly (zeroing phase).

Array indexes are managed outside of this function.

The code is:

void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
	if(hadc->Instance == ADC1)
	{
		if (ADC1_seqPhase == 0)
		{
			S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] =  ADC1->DR;
		}
		else
		{
			S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] +=  ADC1->DR;		
		}
	}

	if(hadc->Instance == ADC2)
	{
		if (ADC1_seqPhase == 0)	
		{
			S.array[ altArrayACQ_ADC2 ][Array_Pointer-1] = ADC2->DR;
		}
		else
		{
			S.array[ altArrayACQ_ADC2 ][Array_Pointer-1] += ADC2->DR;
		}

	}


	if(hadc->Instance == ADC4)
	{
		if (ADC1_seqPhase == 0)	
		{
			S.array[ altArrayACQ_ADC4 ][Array_Pointer-1] = ADC4->DR;

		}
		else
		{
			S.array[ altArrayACQ_ADC4 ][Array_Pointer-1] += ADC4->DR;
		}
	}
}

Looking at the compiled listing (only for ADC1, to avoid repeating the same code 3 times), I see lots of low-level instructions, which may explain why performance is quite low.

		if (ADC1_seqPhase == 0)							
 8003c66:	4b56      	ldr	r3, [pc, #344]	; (8003dc0 <HAL_ADC_ConvCpltCallback+0x16c>)
 8003c68:	781b      	ldrb	r3, [r3, #0]
 8003c6a:	2b00      	cmp	r3, #0
 8003c6c:	d110      	bne.n	8003c90 <HAL_ADC_ConvCpltCallback+0x3c>
		{
			S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] =  ADC1->DR;			
 8003c6e:	4a53      	ldr	r2, [pc, #332]	; (8003dbc <HAL_ADC_ConvCpltCallback+0x168>)
 8003c70:	4b54      	ldr	r3, [pc, #336]	; (8003dc4 <HAL_ADC_ConvCpltCallback+0x170>)
 8003c72:	781b      	ldrb	r3, [r3, #0]
 8003c74:	461c      	mov	r4, r3
 8003c76:	4b54      	ldr	r3, [pc, #336]	; (8003dc8 <HAL_ADC_ConvCpltCallback+0x174>)
 8003c78:	881b      	ldrh	r3, [r3, #0]
 8003c7a:	3b01      	subs	r3, #1
 8003c7c:	6c12      	ldr	r2, [r2, #64]	; 0x40
 8003c7e:	4853      	ldr	r0, [pc, #332]	; (8003dcc <HAL_ADC_ConvCpltCallback+0x178>)
 8003c80:	f44f 7196 	mov.w	r1, #300	; 0x12c
 8003c84:	fb04 f101 	mul.w	r1, r4, r1
 8003c88:	440b      	add	r3, r1
 8003c8a:	f840 2023 	str.w	r2, [r0, r3, lsl #2]
 8003c8e:	e01e      	b.n	8003cce <HAL_ADC_ConvCpltCallback+0x7a>
		}
		else
		{
			S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] +=  ADC1->DR;			
 8003c90:	4b4a      	ldr	r3, [pc, #296]	; (8003dbc <HAL_ADC_ConvCpltCallback+0x168>)
 8003c92:	6c19      	ldr	r1, [r3, #64]	; 0x40
 8003c94:	4b4b      	ldr	r3, [pc, #300]	; (8003dc4 <HAL_ADC_ConvCpltCallback+0x170>)
 8003c96:	781b      	ldrb	r3, [r3, #0]
 8003c98:	461c      	mov	r4, r3
 8003c9a:	4b4b      	ldr	r3, [pc, #300]	; (8003dc8 <HAL_ADC_ConvCpltCallback+0x174>)
 8003c9c:	881b      	ldrh	r3, [r3, #0]
 8003c9e:	3b01      	subs	r3, #1
 8003ca0:	484a      	ldr	r0, [pc, #296]	; (8003dcc <HAL_ADC_ConvCpltCallback+0x178>)
 8003ca2:	f44f 7296 	mov.w	r2, #300	; 0x12c
 8003ca6:	fb04 f202 	mul.w	r2, r4, r2
 8003caa:	4413      	add	r3, r2
 8003cac:	f850 2023 	ldr.w	r2, [r0, r3, lsl #2]
 8003cb0:	4b44      	ldr	r3, [pc, #272]	; (8003dc4 <HAL_ADC_ConvCpltCallback+0x170>)
 8003cb2:	781b      	ldrb	r3, [r3, #0]
 8003cb4:	461c      	mov	r4, r3
 8003cb6:	4b44      	ldr	r3, [pc, #272]	; (8003dc8 <HAL_ADC_ConvCpltCallback+0x174>)
 8003cb8:	881b      	ldrh	r3, [r3, #0]
 8003cba:	3b01      	subs	r3, #1
 8003cbc:	440a      	add	r2, r1
 8003cbe:	4843      	ldr	r0, [pc, #268]	; (8003dcc <HAL_ADC_ConvCpltCallback+0x178>)
 8003cc0:	f44f 7196 	mov.w	r1, #300	; 0x12c
 8003cc4:	fb04 f101 	mul.w	r1, r4, r1
 8003cc8:	440b      	add	r3, r1
 8003cca:	f840 2023 	str.w	r2, [r0, r3, lsl #2]


	}

Is there a cleverer way to get a faster transfer and accumulate of ADC values? I've looked at DMA but, as far as I understand, it has no accumulate feature.

Best regards,

Anton

Uwe Bonnes · 2024-09-30

Did you have a look at the oversampling mode of the ADCs?

TDK · 2024-09-30

ADCs have oversampling, which would be better in terms of performance. Or use DMA and accumulate in the half-complete or full-complete callbacks. Having a callback for every conversion will kill performance.
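A minimal sketch of the "DMA plus accumulate in the half/full-complete callbacks" idea, written as plain C so it can be tried off-target. The names dma_buf, acc, DMA_BUF_LEN, accumulate_block, on_half_complete and on_full_complete are hypothetical and do not come from the thread. It assumes the DMA runs in circular mode with word-sized transfers; in a HAL project the two functions would be called from HAL_ADC_ConvHalfCpltCallback() and HAL_ADC_ConvCpltCallback() after starting the ADC with HAL_ADC_Start_DMA().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_BUF_LEN 8u                  /* hypothetical: samples per DMA buffer */
#define HALF_LEN    (DMA_BUF_LEN / 2u)

/* Buffer the DMA writes into (circular mode assumed); one per ADC in practice. */
volatile uint32_t dma_buf[DMA_BUF_LEN];

/* Running sums, one slot per sample position in the buffer. */
uint32_t acc[DMA_BUF_LEN];

/* Copy (first pass) or add (later passes) `count` samples starting at `offset`. */
void accumulate_block(size_t offset, size_t count, int first_pass)
{
    assert(offset + count <= DMA_BUF_LEN);
    for (size_t i = offset; i < offset + count; ++i) {
        uint32_t sample = dma_buf[i];   /* single read of the DMA data */
        acc[i] = first_pass ? sample : acc[i] + sample;
    }
}

/* Half-complete event: the first half of the buffer is no longer being written. */
void on_half_complete(int first_pass)
{
    accumulate_block(0u, HALF_LEN, first_pass);
}

/* Full-complete event: the second half of the buffer is no longer being written. */
void on_full_complete(int first_pass)
{
    accumulate_block(HALF_LEN, HALF_LEN, first_pass);
}
```

The point of the split is that the CPU handles one interrupt per half buffer instead of one per conversion, and the loop over a contiguous block is easy for the compiler to optimize.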

More specifics on capture rate and accumulation needs would be helpful. Figure out what you want first, then figure out how to get it.

If you feel a post has answered your question, please click "Accept as Solution".

Anton_2 · 2024-10-01

Thanks for the replies.

The ADCs are already oversampled, but I need to acquire the signal at different times rather than acquire the same signal with more resolution.

The ADC is clocked at the maximum clock and the sample time is at its minimum.

This is what I have to do in the system:

do 10 times
generate stimulus ( DAC )  -> acquire 3 response signals -> store and accumulate in array

This is necessary because of how the system physically behaves and to mitigate noise.

More precisely, the stimulus is made of a few hundred points, so each array has the same length.

The array-of-arrays structure provides a pair of arrays for each signal, one being acquired and one being processed; altArrayACQ_ADCx selects the array used for acquisition.
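To make the scheme above concrete, here is a rough off-target sketch for one signal: each completed sweep is folded into the acquisition row, and after the last sweep the two rows swap roles. The type PingPong and the functions fold_sweep and swap_rows are hypothetical names, the two-row layout is an assumption, and the scratch buffer stands for wherever one sweep of samples lands (for example a DMA buffer). The row length of 300 is taken from the multiply by 300 in the listing; the sweep count of 10 is from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_POINTS 300u   /* row length; the listing multiplies the row index by 300 */
#define N_SWEEPS 10u    /* "do 10 times" from the description */

/* Hypothetical layout for ONE signal: one row is acquired while the other is processed. */
typedef struct {
    uint32_t row[2][N_POINTS];
    uint8_t  acq;       /* row being acquired (the role of altArrayACQ_ADCx) */
} PingPong;

/* Fold one completed sweep into the acquisition row.
 * Sweep 0 overwrites (the "zeroing phase"); later sweeps add. */
void fold_sweep(PingPong *p, const uint32_t *scratch, unsigned sweep)
{
    assert(sweep < N_SWEEPS);
    uint32_t *dst = p->row[p->acq];
    for (size_t i = 0; i < N_POINTS; ++i) {
        dst[i] = (sweep == 0u) ? scratch[i] : dst[i] + scratch[i];
    }
}

/* After the last sweep: return the finished row and acquire into the other one. */
const uint32_t *swap_rows(PingPong *p)
{
    const uint32_t *done = p->row[p->acq];
    p->acq = (uint8_t)(p->acq ^ 1u);
    return done;
}
```

Because sweep 0 overwrites instead of adding, the acquisition row never needs a separate clearing pass between measurements.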

@TDK do you mean to acquire the signal in a separate array (always the same one, one for each ADC) with DMA and then add it to the desired array after all points are acquired? If so, I have to check whether that is compatible with the logic of the program. Based on the new info I've shared, do you think it is still meaningful?

TDK · 2024-10-01

> do you mean to acquire the signal on a separate array ( always the same, one for each ADC ) with DMA and then add to the desired array after all points are acquired?

Yes. But summing values has the same effect as oversampling so I don't see why it's needed or helpful here in place of oversampling.

> Based on the new info I've shared, do you think it is still meaningful?

I think you should decide what you want first, then find out how to program it.

If your goal is measuring accurate DC values, high sample time and oversampling is the best way to get there.


Anton_2 · 2024-10-02

In this specific case, summing is not the same as oversampling, because the signal is different each time. This is how this specific measurement works.

Otherwise I agree that, generally speaking and under some assumptions, summing has the same effect as oversampling.

I'll have a look at DMA and how to implement it in this measuring scheme.

I'm still curious how a simple register transfer and addition (yes, it has variable indexes) is converted into 25 low-level instructions. Is this considered normal? I'm wondering whether some bounds checks are performed on the indexes or whether there are penalties for accessing some RAM areas, but I'm not skilled enough to understand what is actually happening at the low level.

TDK · 2024-10-02

I don't want to unduly belabor the point, but I do want to stress that oversampling is exactly the same as taking the measurement multiple times and computing the average, regardless of what the signal is doing. That is what it is doing under the hood--taking multiple measurements, summing, and optionally dividing them by some factor. It's not extending the sampling time or changing conversion or changing the nature of the measurement in any way.
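As a rough software model of the operation described above (sum several conversions, then optionally divide by a power of two), assuming 16-bit raw conversions; the function name oversample and its parameters are hypothetical and this is not ST's implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one oversampled result: sum `ratio` consecutive conversions,
 * then right-shift by `shift` bits (the optional division). */
uint32_t oversample(const uint16_t *conv, size_t ratio, unsigned shift)
{
    assert(ratio > 0u && shift < 32u);
    uint32_t sum = 0u;
    for (size_t i = 0; i < ratio; ++i) {
        sum += conv[i];
    }
    return sum >> shift;
}
```

With a shift of 0 the result is the plain sum, which is the same arithmetic as accumulating in software; whether the hardware can group the conversions the way a given measurement needs depends on how the conversions are triggered.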

If your concern is performance, that is the way to go.

> I'm still curious how a simple register transfer and addition (yes, it has variable indexes) is converted into 25 low-level instructions. Is this considered normal?

Compiling with optimizations (e.g. with the Release configuration) will produce more optimized code. The line in question has a lot going on within it. Optimizations could probably reduce the instruction count by 40% or so. Looks like it's loading the array value twice.

> S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] += ADC1->DR;
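For what it's worth, the listing contains no bounds checks: the instructions are address arithmetic (row index times 300, plus the column, scaled by 4) and reloads of the same globals. A small sketch of the same store-or-accumulate step with each global read once and the element address computed once follows. The globals below are stand-ins that reuse the thread's names; their widths are guessed from the ldrb/ldrh loads in the listing, while volatile, ROWS and the function name store_or_accumulate are assumptions. The parameter dr stands in for the value read from ADC1->DR.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS     2u     /* hypothetical row count */
#define N_POINTS 300u   /* row length implied by the multiply by 300 in the listing */

/* Stand-ins for the thread's globals (types assumed, see above). */
struct { uint32_t array[ROWS][N_POINTS]; } S;
volatile uint8_t  ADC1_seqPhase;
volatile uint8_t  altArrayACQ_ADC1;
volatile uint16_t Array_Pointer;

/* Store on the zeroing phase, otherwise add to the existing value. */
void store_or_accumulate(uint32_t dr)
{
    uint8_t  row = altArrayACQ_ADC1;                 /* read each global once */
    uint16_t col = (uint16_t)(Array_Pointer - 1u);
    assert(row < ROWS && col < N_POINTS);

    uint32_t *slot = &S.array[row][col];             /* address computed once */
    *slot = (ADC1_seqPhase == 0u) ? dr : *slot + dr;
}
```

An optimizing build usually performs this kind of hoisting by itself, so the sketch mainly shows what the optimized code is expected to boil down to.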


Anton_2 · 2024-10-03

Well, just using the Release configuration reduced the number of instructions in the callback by 75%...

Overall, the firmware runs a bit more than twice as fast.