STM32H723: How to optimize summation of an array?
‎2025-05-04 5:27 AM
Hi folks.
I am trying to optimize (by time) the following piece of code.
for (uint32_t i = 6 + adc_data_index; i < 35 + adc_data_index; i++)
{
    raw[0] += (adc_data[i]);
    raw[1] += (adc_data[i + 35]);
    raw[2] += (adc_data[i + 70]);
    raw[3] += (adc_data[i + 115]);
}
Right now it takes 3.5 microseconds at a 250 MHz clock.
I want to reduce that by at least a factor of 2.
Do you have any ideas?
What I tried:
1. Changing the optimization level to -Ofast
2. Using pointers
3. I also thought about using the FMAC and the DFSDM
How can I achieve that?
Thanks
Yonatan
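One variant that may be worth measuring is to accumulate into local variables and write raw[] only once after the loop, so the compiler is free to keep the four sums in registers. The sketch below is only an illustration: the element types (uint16_t samples, uint32_t sums) and the function name sum_windows are assumptions, since the thread does not show the declarations, and whether it is actually faster depends on the compiler and on where the buffers live.

```c
#include <stdint.h>

/* Hypothetical helper: same arithmetic as the loop in the question.
 * Assumed types: 16-bit ADC samples, 32-bit accumulators. */
void sum_windows(const uint16_t *adc_data, uint32_t adc_data_index,
                 uint32_t raw[4])
{
    /* First sample of the first window, as in the original loop. */
    const uint16_t *p = &adc_data[6 + adc_data_index];
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;

    /* 35 - 6 = 29 iterations, same offsets as the original code. */
    for (uint32_t n = 0; n < 29; n++) {
        s0 += p[n];
        s1 += p[n + 35];
        s2 += p[n + 70];
        s3 += p[n + 115];
    }

    /* Single read-modify-write per output element. */
    raw[0] += s0;
    raw[1] += s1;
    raw[2] += s2;
    raw[3] += s3;
}
```

The result is identical to the original loop; only the number of accesses to raw[] changes.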
Labels: DFSDM, STM32H7 Series
‎2025-05-05 4:49 AM
> It saved me ~250 ns
Oh my, that's disappointing... :(
‎2025-05-05 4:51 AM - edited ‎2025-05-05 4:52 AM
It might help to place at least the iteration variable i and the destination buffer raw[] in DTCM.
Using the data cache might also help.
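A minimal sketch of the DTCM idea with GCC, assuming the linker script defines an output section named .dtcmram that is mapped to the DTCM region (that section name, the IN_DTCM macro and the helper functions are hypothetical, not taken from the thread). With optimization enabled, a local loop counter normally lives in a register, so only raw[] is placed explicitly.

```c
#include <stdint.h>

/* Hypothetical section name: the linker script must provide a matching
 * output section located in DTCM. On non-ARM hosts the macro is empty
 * so the file still builds for testing. */
#if defined(__arm__) && defined(__GNUC__)
#define IN_DTCM __attribute__((section(".dtcmram")))
#else
#define IN_DTCM
#endif

/* Destination buffer placed in DTCM (assumed 32-bit accumulators). */
IN_DTCM static uint32_t raw[4];

/* A custom section is typically neither zeroed like .bss nor copied
 * like .data by the default startup code, so clear it explicitly. */
void raw_clear(void)
{
    for (uint32_t k = 0; k < 4; k++) {
        raw[k] = 0;
    }
}

/* Add one sample to the selected accumulator. */
void raw_accumulate(uint32_t channel, uint16_t sample)
{
    raw[channel] += sample;
}

/* Read back one accumulator. */
uint32_t raw_get(uint32_t channel)
{
    return raw[channel];
}
```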
‎2025-05-05 4:55 AM
Hi @TDK
This looks good.
I cannot use the DTCM because I use DMA for the ADC samples.
Instead, I tried to put the time-consuming function in the ITCM.
However, I am encountering a very strange phenomenon.
It works with the debugger (after flashing) but not after power is reconnected.
I did it in 3 steps:
1. Added this above the time-consuming function
__attribute__((section(".itcmram")))
2. Added to the linker script new section:
_siitcmram = LOADADDR(.itcmram);
.itcmram :
{
    . = ALIGN(4);
    _sitcmram = .; /* create a global symbol at data start */
    *(.itcmram) /* .data sections */
    *(.itcmram*) /* .data* sections */
    . = ALIGN(4);
    _eitcmram = .; /* define a global symbol at data end */
} >ITCMRAM
3. Then added to startup.s:
CopyDataInitITCM:
    ldr r3, =_siitcmram
    ldr r3, [r3, r1]
    str r3, [r0, r1]
    adds r1, r1, #4
LoopCopyDataInitITCM:
    ldr r0, =_sitcmram
    ldr r3, =_eitcmram
    adds r2, r0, r1
    cmp r2, r3
    bcc CopyDataInitITCM
    ldr r2, =_sbss
    b LoopFillZerobss
What am I missing?
Again, it works after flashing the MCU over SWD, and also after disconnecting the debugger.
However, after a power cycle it looks like it is stuck (the LED is not blinking).
I can't debug it to find where it is stuck, because it runs whenever I use the debugger.
What do you all think?
‎2025-05-05 5:07 AM - edited ‎2025-05-05 5:10 AM
Just a question: are you summing the ADC results to get an average, or to get better resolution?
Why not let the ADC do the summing, by using oversampling?
Or use the DFSDM to get a filtered average?
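For readers unfamiliar with hardware oversampling: the ADC accumulates a configured number of consecutive conversions and can right-shift the sum before delivering a single result. The sketch below is only a software model of that arithmetic, not peripheral configuration code; the function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Software model of an ADC oversampler (illustration only):
 * sum `ratio` consecutive samples, then shift right by `right_shift`.
 * A shift of log2(ratio) yields an average, a smaller shift keeps
 * extra resolution bits. */
uint32_t oversample_model(const uint16_t *samples, uint32_t ratio,
                          uint32_t right_shift)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < ratio; i++) {
        acc += samples[i];
    }
    return acc >> right_shift;
}
```

Note that the model sums consecutive samples; it has no notion of skipping part of a buffer.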
‎2025-05-05 5:29 AM
When it's stuck, to find out where it is, attach a debugger and debug without downloading the code or resetting the device. You can access these options within the debugger configuration.
‎2025-05-05 12:56 PM - edited ‎2025-05-05 12:56 PM
> I can't debug it to find where is it because when using the debugger it runs.
Debug without (re)downloading the program.
* First, download the program by any means (debugger or CubeProgrammer).
* Next, configure the debugger to load symbols only, without writing to flash.
* Power cycle the board.
* Start debugging (remember that the code is already flashed).
* Check that you stop at main and that your itcmram code has been copied properly (disassemble it, put a breakpoint there).
* Run, and see where it gets stuck.
‎2025-05-05 10:54 PM
You are right. My problem is that I don't want all the samples, just part of them.
Regarding the DFSDM: can I use it to sum just part of the samples?
‎2025-05-06 12:23 AM
For anyone wondering, I had a bug in the startup file.
This code worked for me:
// Above the c function
__attribute__((section(".itcmram")))

// In the linker script file:
_siITCMRAM = LOADADDR(.ITCMRAM);
.ITCMRAM :
{
    . = ALIGN(4);
    _sITCMRAM = .; /* create a global symbol at itcmram start */
    *(.itcmram) /* .itcmram sections */
    *(.itcmram*) /* .itcmram* sections */
    . = ALIGN(4);
    _eITCMRAM = .; /* define a global symbol at itcmram end */
} >ITCMRAM AT> FLASH

// In startup file: (after LoopFillZerobss)
    ldr r0, =_sITCMRAM
    ldr r1, =_eITCMRAM
    ldr r2, =_siITCMRAM
    movs r3, #0
    b LoopCopyITCMRAM
CopyITCMRAM:
    ldr r4, [r2, r3]
    str r4, [r0, r3]
    adds r3, r3, #4
LoopCopyITCMRAM:
    adds r4, r0, r3
    cmp r4, r1
    bcc CopyITCMRAM
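For comparison, the startup loop above does the same job as a word-by-word copy in C. This is a sketch only: copy_words is a hypothetical helper, not part of the poster's project, and if used on the target it would itself have to live in flash and run before the first call into ITCM.

```c
#include <stdint.h>

/* Copy 32-bit words from `src` to [dst, dst_end), like the
 * CopyITCMRAM / LoopCopyITCMRAM pair in the startup file. */
void copy_words(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* On the target, assuming the linker symbols from the script above,
 * the call would look like:
 *
 *   extern uint32_t _sITCMRAM, _eITCMRAM, _siITCMRAM;
 *   copy_words(&_sITCMRAM, &_eITCMRAM, &_siITCMRAM);
 */
```

Here _siITCMRAM is the load address in flash (from LOADADDR), while _sITCMRAM and _eITCMRAM bound the run-time location in ITCM; without AT> FLASH in the linker script there is no flash copy to load from after a power cycle.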
‎2025-05-06 3:53 AM
Thanks for coming back with the working code!
How's the timing with the function in ITCM RAM?
‎2025-05-06 4:31 AM - edited ‎2025-05-06 4:32 AM
It saved me roughly another 500 ns.
(BTW, I also enabled the ICache, so the "profit" is small.)
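To put the reported times in perspective, they can be converted to CPU cycles at the 250 MHz clock mentioned in the question. The helper below is a hypothetical illustration using integer arithmetic (fractions of a cycle are truncated). At 250 MHz the original 3.5 microseconds is 875 cycles for 29 iterations of 4 additions, i.e. about 7.5 cycles per addition, and a 500 ns saving is 125 cycles.

```c
#include <stdint.h>

/* Convert a duration in nanoseconds to CPU cycles at `clock_mhz`.
 * Integer division truncates any fractional cycle. */
uint32_t ns_to_cycles(uint32_t ns, uint32_t clock_mhz)
{
    return (uint32_t)(((uint64_t)ns * clock_mhz) / 1000u);
}

/* Example: ns_to_cycles(3500, 250) gives 875 cycles. */
```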
