2025-05-04 5:27 AM
Hi folks.
I am trying to optimize the following piece of code for execution time:
for (uint32_t i = 6 + adc_data_index; i < 35 + adc_data_index; i++)
{
    raw[0] += (adc_data[i]);
    raw[1] += (adc_data[i + 35]);
    raw[2] += (adc_data[i + 70]);
    raw[3] += (adc_data[i + 115]);
}
For now it takes 3.5 microseconds at a 250 MHz clock.
I want to reduce that by at least a factor of 2.
Do you have any ideas?
What I tried:
1. Changing the optimization level to -Ofast
2. Using a pointer
3. I also thought about using FMAC and DFSDM
How can I achieve that?
Thanks
Yonatan
2025-05-04 7:15 AM
Suggestion: avoid computations inside the loop, for example by replacing the index i with an array pointer computed before the loop runs, and run the loop counter down to zero so the end-of-loop check is cheaper.
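A minimal sketch of that suggestion, assuming adc_data holds uint16_t samples and raw holds uint32_t sums (the original post does not show the types). The function name sum_channels_countdown and its parameters are hypothetical. The four start addresses are computed once before the loop, and the counter runs from 29 down to zero so the exit test is a compare against zero:

```c
#include <stdint.h>

/* Hypothetical helper; element types are assumed, not taken from the post. */
void sum_channels_countdown(const uint16_t *adc_data, uint32_t adc_data_index,
                            uint32_t raw[4])
{
    /* Compute the four start addresses once, outside the loop. */
    const uint16_t *p0 = &adc_data[6 + adc_data_index];
    const uint16_t *p1 = p0 + 35;
    const uint16_t *p2 = p0 + 70;
    const uint16_t *p3 = p0 + 115;  /* offset kept exactly as in the original loop */

    /* Accumulate in locals so the sums can stay in registers. */
    uint32_t s0 = raw[0], s1 = raw[1], s2 = raw[2], s3 = raw[3];

    /* 29 iterations (i = 6 .. 34 in the original), counted down to zero. */
    for (uint32_t n = 29; n != 0; n--) {
        s0 += *p0++;
        s1 += *p1++;
        s2 += *p2++;
        s3 += *p3++;
    }

    /* Write the results back once, after the loop. */
    raw[0] = s0;
    raw[1] = s1;
    raw[2] = s2;
    raw[3] = s3;
}
```

Whether this is actually faster than the original depends on what the compiler already does at the chosen optimization level, so it is worth measuring both versions.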
2025-05-04 7:31 AM
2025-05-04 8:02 AM - edited 2025-05-04 8:03 AM
> 3.5 micro-second at 250 MHz clock
So that's 875 cycles for 116 (4*29) summations, or about 7.5 cycles per summation. There is probably some improvement to be made.
Storing raw and adc_data in DTCMRAM will help.
Enabling data cache if not already enabled will help a lot.
Executing the function out of ITCMRAM will also help.
Looking at the disassembly will be the most useful step here, to understand what the compiler is doing and to see what is unnecessary. That can help guide you to the right solution. I imagine using a pointer for access and comparing that pointer against a constant end pointer, rather than comparing the loop variable to 35 + X, will help a bit.
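As a rough sketch of the pointer idea, assuming uint16_t samples and uint32_t sums (the types are not shown in the original post; sum_channels_ptr is a hypothetical name): a single pointer walks the first channel, the other three channels are read at fixed offsets from it, and the loop ends when the pointer reaches an end pointer computed once up front.

```c
#include <stdint.h>

/* Hypothetical helper; element types are assumed, not taken from the post. */
void sum_channels_ptr(const uint16_t *adc_data, uint32_t adc_data_index,
                      uint32_t raw[4])
{
    /* Start of the first channel's window and its one-past-the-end address. */
    const uint16_t *p = adc_data + adc_data_index + 6;
    const uint16_t *const end = p + 29;  /* 35 - 6 = 29 samples per channel */

    /* Local accumulators avoid a load/store of raw[] on every iteration. */
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;

    while (p != end) {
        s0 += p[0];
        s1 += p[35];
        s2 += p[70];
        s3 += p[115];  /* offset kept exactly as in the original loop */
        p++;
    }

    /* Add the partial sums to the existing totals, as the original += does. */
    raw[0] += s0;
    raw[1] += s1;
    raw[2] += s2;
    raw[3] += s3;
}
```

Comparing the disassembly of this version with the original loop should show whether the compiler was already producing something equivalent.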