2024-09-30 02:04 PM
Hi ST community,
I'm struggling to reach the required transfer-and-accumulate rate from the ADCs to an array. At the moment I'm focusing on the data transfer phase, which seems to be the slowest one.
Situation:
using STM32CubeIDE version 1.14.0
using 3 ADCs (ADC1, ADC2, ADC4) to acquire 3 signals simultaneously; the ADC conversions are triggered by a timer event.
The data is transferred as array[idx] = array[idx] + ADCdata in the HAL_ADC_ConvCpltCallback callback. When ADCx_seqPhase == 0 (zeroing phase), the ADC data is stored directly instead of being added.
Array indexes are managed outside of this function.
The code is:
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc)
{
    /* Store (zeroing phase) or accumulate the latest conversion of the ADC
       that raised the callback into its acquisition row, at Array_Pointer-1.
       Note: all three branches test ADC1_seqPhase. */
    if (hadc->Instance == ADC1)
    {
        if (ADC1_seqPhase == 0)
        {
            S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] = ADC1->DR;
        }
        else
        {
            S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] += ADC1->DR;
        }
    }
    else if (hadc->Instance == ADC2)
    {
        if (ADC1_seqPhase == 0)
        {
            S.array[ altArrayACQ_ADC2 ][Array_Pointer-1] = ADC2->DR;
        }
        else
        {
            S.array[ altArrayACQ_ADC2 ][Array_Pointer-1] += ADC2->DR;
        }
    }
    else if (hadc->Instance == ADC4)
    {
        if (ADC1_seqPhase == 0)
        {
            S.array[ altArrayACQ_ADC4 ][Array_Pointer-1] = ADC4->DR;
        }
        else
        {
            S.array[ altArrayACQ_ADC4 ][Array_Pointer-1] += ADC4->DR;
        }
    }
}
Looking at the compiled listing (only for ADC1, to avoid repeating the same code three times), I see a lot of low-level instructions, which may explain why performance is so low.
if (ADC1_seqPhase == 0)
8003c66: 4b56 ldr r3, [pc, #344] ; (8003dc0 <HAL_ADC_ConvCpltCallback+0x16c>)
8003c68: 781b ldrb r3, [r3, #0]
8003c6a: 2b00 cmp r3, #0
8003c6c: d110 bne.n 8003c90 <HAL_ADC_ConvCpltCallback+0x3c>
{
S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] = ADC1->DR;
8003c6e: 4a53 ldr r2, [pc, #332] ; (8003dbc <HAL_ADC_ConvCpltCallback+0x168>)
8003c70: 4b54 ldr r3, [pc, #336] ; (8003dc4 <HAL_ADC_ConvCpltCallback+0x170>)
8003c72: 781b ldrb r3, [r3, #0]
8003c74: 461c mov r4, r3
8003c76: 4b54 ldr r3, [pc, #336] ; (8003dc8 <HAL_ADC_ConvCpltCallback+0x174>)
8003c78: 881b ldrh r3, [r3, #0]
8003c7a: 3b01 subs r3, #1
8003c7c: 6c12 ldr r2, [r2, #64] ; 0x40
8003c7e: 4853 ldr r0, [pc, #332] ; (8003dcc <HAL_ADC_ConvCpltCallback+0x178>)
8003c80: f44f 7196 mov.w r1, #300 ; 0x12c
8003c84: fb04 f101 mul.w r1, r4, r1
8003c88: 440b add r3, r1
8003c8a: f840 2023 str.w r2, [r0, r3, lsl #2]
8003c8e: e01e b.n 8003cce <HAL_ADC_ConvCpltCallback+0x7a>
}
else
{
S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] += ADC1->DR;
8003c90: 4b4a ldr r3, [pc, #296] ; (8003dbc <HAL_ADC_ConvCpltCallback+0x168>)
8003c92: 6c19 ldr r1, [r3, #64] ; 0x40
8003c94: 4b4b ldr r3, [pc, #300] ; (8003dc4 <HAL_ADC_ConvCpltCallback+0x170>)
8003c96: 781b ldrb r3, [r3, #0]
8003c98: 461c mov r4, r3
8003c9a: 4b4b ldr r3, [pc, #300] ; (8003dc8 <HAL_ADC_ConvCpltCallback+0x174>)
8003c9c: 881b ldrh r3, [r3, #0]
8003c9e: 3b01 subs r3, #1
8003ca0: 484a ldr r0, [pc, #296] ; (8003dcc <HAL_ADC_ConvCpltCallback+0x178>)
8003ca2: f44f 7296 mov.w r2, #300 ; 0x12c
8003ca6: fb04 f202 mul.w r2, r4, r2
8003caa: 4413 add r3, r2
8003cac: f850 2023 ldr.w r2, [r0, r3, lsl #2]
8003cb0: 4b44 ldr r3, [pc, #272] ; (8003dc4 <HAL_ADC_ConvCpltCallback+0x170>)
8003cb2: 781b ldrb r3, [r3, #0]
8003cb4: 461c mov r4, r3
8003cb6: 4b44 ldr r3, [pc, #272] ; (8003dc8 <HAL_ADC_ConvCpltCallback+0x174>)
8003cb8: 881b ldrh r3, [r3, #0]
8003cba: 3b01 subs r3, #1
8003cbc: 440a add r2, r1
8003cbe: 4843 ldr r0, [pc, #268] ; (8003dcc <HAL_ADC_ConvCpltCallback+0x178>)
8003cc0: f44f 7196 mov.w r1, #300 ; 0x12c
8003cc4: fb04 f101 mul.w r1, r4, r1
8003cc8: 440b add r3, r1
8003cca: f840 2023 str.w r2, [r0, r3, lsl #2]
}
Is there a smarter way to transfer and accumulate ADC values faster? I've looked at DMA, but as far as I understand it has no accumulate feature.
Best regards,
Anton
2024-09-30 02:58 PM
Did you have a look at the oversampling mode of the ADCs?
2024-09-30 03:00 PM
The ADCs have hardware oversampling, which would be better in terms of performance. Alternatively, use DMA and accumulate in the half-complete or full-complete callbacks. Having a callback for every conversion will kill performance.
More specifics on capture rate and accumulation needs would be helpful. Figure out what you want first, then figure out how to get it.
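A minimal, hardware-independent sketch of accumulating per half-buffer, assuming the DMA is configured for half-word transfers into a circular buffer that holds one full stimulus (N_POINTS samples) per ADC. The names `N_POINTS`, `accumulate_block`, and `dma_buf` are hypothetical. In firmware, `HAL_ADC_ConvHalfCpltCallback` would call it for the first half and `HAL_ADC_ConvCpltCallback` for the second half, so the CPU is interrupted twice per buffer instead of once per conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one stimulus of a few hundred points, as described in the thread. */
#define N_POINTS 300u

/*
 * Adds n DMA samples, starting at point index `first`, into the accumulator.
 * When `zeroing` is nonzero (first pass), the samples overwrite the accumulator
 * instead of being added, like the ADCx_seqPhase == 0 case of the original callback.
 * Intended use (sketch, not compiled here):
 *   half-complete callback: accumulate_block(acc, dma_buf, 0,            N_POINTS / 2, zeroing);
 *   full-complete callback: accumulate_block(acc, dma_buf, N_POINTS / 2, N_POINTS / 2, zeroing);
 */
static void accumulate_block(uint32_t *acc, const uint16_t *dma_buf,
                             size_t first, size_t n, int zeroing)
{
    uint32_t *dst = &acc[first];
    const uint16_t *src = &dma_buf[first];

    if (zeroing) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] += src[i];
    }
}
```

With a one-shot (non-circular) capture per pass, a single call with `first = 0` and `n = N_POINTS` in the complete callback does the same job; each ADC would use its own DMA buffer and accumulator row.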
2024-10-01 01:47 PM
Thanks for the replies.
The ADCs are already oversampled, but I need to acquire the signal at different times rather than acquire the same signal with more resolution.
The ADCs are clocked at the maximum clock and the sample time is at its minimum.
This is what I have to do in the system:
do 10 times:
generate stimulus (DAC) -> acquire 3 response signals -> store and accumulate in array
This is necessary because of how the system physically behaves and to mitigate noise.
More precisely, the stimulus is made of a few hundred points, so each array has the same length.
The array-of-arrays structure holds a pair of arrays for each signal, one being acquired and one being processed: altArrayACQ_ADCx selects the array used for acquisition.
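The scheme above (10 passes, a zeroing first pass, and a pair of rows per signal selected by altArrayACQ_ADCx) could be organized roughly as follows once a full pass is available in a buffer. This is a sketch under stated assumptions: rows 2s and 2s+1 are assumed to form the pair for signal s, and `Buffers`, `accumulate_pass`, and `swap_acq_rows` are hypothetical names.

```c
#include <stdint.h>

#define N_SIGNALS 3u
#define N_POINTS  300u  /* assumption: row length implied by the 0x12c multiplier in the listing */
#define N_PASSES  10u

/* Assumed layout: rows 2*s and 2*s+1 are the acquisition/processing pair of signal s. */
typedef struct {
    uint32_t array[2u * N_SIGNALS][N_POINTS];
} Buffers;

/* Adds one pass of raw samples into each signal's current acquisition row.
   Pass 0 overwrites the row (zeroing phase); later passes accumulate. */
static void accumulate_pass(Buffers *b, const uint8_t acq_row[N_SIGNALS],
                            uint16_t raw[N_SIGNALS][N_POINTS], unsigned pass)
{
    for (unsigned s = 0; s < N_SIGNALS; s++) {
        uint32_t *row = b->array[acq_row[s]];
        for (unsigned k = 0; k < N_POINTS; k++)
            row[k] = (pass == 0u) ? raw[s][k] : row[k] + raw[s][k];
    }
}

/* After N_PASSES passes, hand the finished rows over to processing by
   switching each signal to the other row of its pair (2s <-> 2s+1). */
static void swap_acq_rows(uint8_t acq_row[N_SIGNALS])
{
    for (unsigned s = 0; s < N_SIGNALS; s++)
        acq_row[s] ^= 1u;
}
```

Keeping the pass counter outside the per-sample path means the zeroing decision is made once per pass rather than once per conversion.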
@TDK do you mean to acquire the signal into a separate array (always the same one, one for each ADC) with DMA, and then add it to the destination array after all points are acquired? If so, I have to check whether that is compatible with the program logic. Based on the new info I've shared, do you think it still makes sense?
2024-10-01 08:21 PM
> do you mean to acquire the signal into a separate array (always the same one, one for each ADC) with DMA, and then add it to the destination array after all points are acquired?
Yes. But summing values has the same effect as oversampling, so I don't see why it's needed or helpful here instead of oversampling.
> Based on the new info I've shared, do you think it still makes sense?
I think you should decide what you want first, then find out how to program it.
If your goal is measuring accurate DC values, a long sample time and oversampling are the best way to get there.
2024-10-02 02:09 PM
In this specific case, summing is not the same as oversampling, because the signal is different each time. This is how this specific measurement works.
Otherwise I agree that, generally speaking and under some assumptions, summing has the same effect as oversampling.
I'll look into DMA and how to fit it into this measurement scheme.
I'm still curious how a simple register transfer and addition (yes, it has variable indexes) turns into 25 low-level instructions. Is this considered normal? I'm wondering whether some bounds checks are performed on the indexes, or whether there are penalties for accessing some RAM areas, but I'm not skilled enough to understand what is actually happening at the low level.
2024-10-02 02:34 PM
I don't want to unduly belabor the point, but I do want to stress that oversampling is exactly the same as taking the measurement multiple times and computing the average, regardless of what the signal is doing. That is what it is doing under the hood--taking multiple measurements, summing, and optionally dividing them by some factor. It's not extending the sampling time or changing conversion or changing the nature of the measurement in any way.
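As a simplified model (not the exact hardware behavior; rounding and trigger options are described in the device reference manual), the two schemes discussed here differ only in which conversions end up in the same sum: the oversampler combines `ratio` consecutive conversions into one output value, while summing across passes adds conversion k of every repetition, keeping every point of the response. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified oversampler model: each output is the sum of `ratio` consecutive
   conversions, right-shifted by `shift` bits (truncating). Returns the number
   of outputs, i.e. n_in / ratio. */
static size_t oversample(const uint16_t *in, size_t n_in, unsigned ratio,
                         unsigned shift, uint32_t *out)
{
    size_t n_out = n_in / ratio;
    for (size_t j = 0; j < n_out; j++) {
        uint32_t sum = 0;
        for (unsigned r = 0; r < ratio; r++)
            sum += in[j * ratio + r];
        out[j] = sum >> shift;
    }
    return n_out;
}

/* Summing across passes: point k of each of n_passes repeated acquisitions
   (stored back to back, n_points each) is added into out[k]. */
static void sum_across_passes(const uint16_t *passes, size_t n_passes,
                              size_t n_points, uint32_t *out)
{
    for (size_t k = 0; k < n_points; k++) {
        uint32_t sum = 0;
        for (size_t p = 0; p < n_passes; p++)
            sum += passes[p * n_points + k];
        out[k] = sum;
    }
}
```

Both are plain sums; whether they give the same result depends only on whether the conversions being combined sample the same point of the response.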
If your concern is performance, that is the way to go.
> I'm still curious how a simple register transfer and addition (yes, it has variable indexes) turns into 25 low-level instructions. Is this considered normal?
Compiling with optimizations (e.g. with the Release configuration) will produce more optimized code. The line in question has a lot going on within it. Optimizations could probably reduce the instruction count by 40% or so. It looks like it's computing the array index twice: once to load the element and again to store it.
> S.array[ altArrayACQ_ADC1 ][Array_Pointer-1] += ADC1->DR;
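A minimal sketch of the source-level equivalent, which helps even without optimization: read the indexes once and form the element address once, so the `+=` does not re-evaluate them. The globals mirror the names in the thread, but their types and sizes are assumptions (6 rows of 300 words, inferred from the 0x12c multiplier and the word-sized stores in the listing), and `dr` stands in for a single read of `ADC1->DR`.

```c
#include <stdint.h>

#define N_ROWS   6u    /* assumption: 3 signals x 2 rows each */
#define N_POINTS 300u  /* assumption: row length from the 0x12c multiplier */

/* Hypothetical stand-ins for the globals used in the thread. */
static uint32_t array[N_ROWS][N_POINTS];
static uint8_t  altArrayACQ_ADC1;
static uint16_t Array_Pointer;   /* must be >= 1 when a sample is stored */

/* `dr` stands in for one read of ADC1->DR; `zeroing` for ADC1_seqPhase == 0. */
static void store_sample(uint32_t dr, int zeroing)
{
    /* Each index is read once and the element address is computed once. */
    uint32_t *p = &array[altArrayACQ_ADC1][Array_Pointer - 1u];

    if (zeroing)
        *p = dr;
    else
        *p += dr;
}
```

With optimization enabled, the compiler typically performs this common-subexpression elimination on its own.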
2024-10-03 02:03 PM
Well, just using the Release configuration reduced the number of instructions in the callback by 75%...
Overall, the firmware runs a bit more than twice as fast.