STM32H723: How to optimize summation of an array?

yonatan
Associate III

Hi folks.

I am trying to optimize the execution time of the following piece of code:

    for (uint32_t i = 6 + adc_data_index; i < 35 + adc_data_index; i++) {
        raw[0] += adc_data[i];
        raw[1] += adc_data[i + 35];
        raw[2] += adc_data[i + 70];
        raw[3] += adc_data[i + 115];
    }

Currently it takes 3.5 µs with a 250 MHz clock.

I want to reduce that by at least a factor of 2.
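Converting these figures into CPU cycles gives the budget being discussed. This is simple arithmetic on the numbers above, assuming the loop runs the 29 iterations written (i = 6 … 34 relative to adc_data_index) with four additions each:

```latex
\begin{aligned}
t \cdot f &= 3.5\,\mu\text{s} \times 250\,\text{MHz} = 875 \text{ cycles} \\
N &= (35 - 6) \times 4 = 116 \text{ additions} \\
875 / 116 &\approx 7.5 \text{ cycles per sample} \\
\text{target: } 875 / 2 &\approx 438 \text{ cycles} \approx 3.8 \text{ cycles per sample}
\end{aligned}
```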

Do you have any ideas?

What I tried:

1. Changing the optimization level to -Ofast

2. Using pointers

3. I also considered the FMAC and the DFSDM
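A sketch of one common refinement (not something reported in this thread): hoist the four base pointers out of the loop and accumulate into local variables, writing raw[] back once at the end. Depending on how adc_data and raw are declared (for example if raw is volatile, or has the same element type as adc_data), the compiler may otherwise have to load and store raw[k] on every iteration. The element types (uint16_t samples, uint32_t sums) and the function name sum_windows are assumptions; the offsets are copied from the loop in the question.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: same result as the loop in the question.
 * Offsets 0, 35, 70 and 115 are taken as-is from that loop. */
void sum_windows(const uint16_t *adc_data, uint32_t adc_data_index,
                 uint32_t raw[4])
{
    /* Hoist the four base pointers out of the loop. */
    const uint16_t *p0 = adc_data + adc_data_index + 6;
    const uint16_t *p1 = p0 + 35;
    const uint16_t *p2 = p0 + 70;
    const uint16_t *p3 = p0 + 115;

    /* Local accumulators can live in registers for the whole loop. */
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;

    for (uint32_t n = 0; n < 29; n++) {   /* 29 = 35 - 6 samples per window */
        s0 += p0[n];
        s1 += p1[n];
        s2 += p2[n];
        s3 += p3[n];
    }

    /* Write back once, adding to the previous contents like the original. */
    raw[0] += s0;
    raw[1] += s1;
    raw[2] += s2;
    raw[3] += s3;
}
```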

How can I achieve that?

Thanks

Yonatan


It saved me ~250 ns.

Oh my, that's disappointing... :(

LCE
Principal II

It might help to place at least the loop variable i and the destination buffer raw[] in DTCM.

Enabling the data cache might help as well.
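One caveat if the data cache is enabled while DMA keeps writing adc_data: the CPU may read stale cached values, so the buffer is normally invalidated before it is read. The Cortex-M7 data cache works in 32-byte lines, so it is safest if the DMA buffer starts on a line boundary and occupies whole lines; otherwise invalidating it can discard unrelated variables that share a line. The sketch below shows that layout; ADC_SAMPLES, adc_buf and the macro names are hypothetical, and the CMSIS call is only shown in a comment (check its exact signature in your CMSIS version).

```c
#include <stdint.h>
#include <stddef.h>

#define DCACHE_LINE_SIZE 32u   /* Cortex-M7 data cache line size in bytes */

/* Round a byte count up to a whole number of cache lines. */
#define CACHE_ROUND_UP(bytes) \
    ((((bytes) + DCACHE_LINE_SIZE - 1u) / DCACHE_LINE_SIZE) * DCACHE_LINE_SIZE)

#define ADC_SAMPLES 150u       /* hypothetical number of 16-bit samples */

/* DMA target buffer: aligned to a cache line and padded to whole lines,
 * so invalidating it never touches neighbouring variables. */
static uint16_t adc_buf[CACHE_ROUND_UP(ADC_SAMPLES * sizeof(uint16_t)) / sizeof(uint16_t)]
    __attribute__((aligned(DCACHE_LINE_SIZE)));

/* On the target (CMSIS core_cm7.h), before the CPU reads new DMA data:
 *   SCB_InvalidateDCache_by_Addr((uint32_t *)adc_buf, (int32_t)sizeof adc_buf);
 */
```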

yonatan
Associate III

Hi @TDK 

This looks good.

I cannot use the DTCM because I use the DMA for the ADC samples.

Instead, I tried to put the time-consuming function in the ITCM.

However, I am encountering a very strange phenomenon.

It works with the debugger (after flashing), but not after a power cycle.

I did it in 3 steps:

1. Added this attribute above the time-consuming function:

__attribute__((section(".itcmram")))

2. Added a new section to the linker script:

    _siitcmram = LOADADDR(.itcmram);

    .itcmram :
    {
      . = ALIGN(4);
      _sitcmram = .;       /* create a global symbol at data start */
      *(.itcmram)          /* .data sections */
      *(.itcmram*)         /* .data* sections */
      . = ALIGN(4);
      _eitcmram = .;       /* define a global symbol at data end */
    } >ITCMRAM

3. Then added this to startup.s:

    CopyDataInitITCM:
      ldr r3, =_siitcmram
      ldr r3, [r3, r1]
      str r3, [r0, r1]
      adds r1, r1, #4

    LoopCopyDataInitITCM:
      ldr r0, =_sitcmram
      ldr r3, =_eitcmram
      adds r2, r0, r1
      cmp r2, r3
      bcc CopyDataInitITCM
      ldr r2, =_sbss
      b LoopFillZerobss

What am I missing?

Again, it works after flashing the MCU via SWD, and also after disconnecting the debugger.

However, after a power reset it seems to be stuck (the LED is not blinking).

I can't debug it to find where it is stuck, because it runs when I use the debugger.

What do you all think?

Just a question: are you summing the ADC results for averaging, or for better resolution?

Why not let the ADC do the summing itself, by oversampling?


Or use the DFSDM to get a filtered average?
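To illustrate what the oversampling suggestion computes: with hardware oversampling enabled and no right shift configured, each result the ADC delivers is the sum of a fixed number (the ratio) of consecutive conversions. The host-side model below is a hypothetical sketch of that arithmetic only (oversample_blocks is not a HAL function), assuming the accumulated sum fits in the data register. It also shows the limitation relevant to this question: every conversion in a block is included, so skipping the first samples of a window is not something this model can express.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of hardware oversampling with no right shift:
 * every `ratio` consecutive conversions produce one summed result.
 * Writes n / ratio results to `out` and returns that count.
 * `ratio` must be greater than zero. */
size_t oversample_blocks(const uint16_t *conv, size_t n, size_t ratio,
                         uint32_t *out)
{
    size_t blocks = n / ratio;
    for (size_t b = 0; b < blocks; b++) {
        uint32_t acc = 0;
        for (size_t k = 0; k < ratio; k++)
            acc += conv[b * ratio + k];   /* all samples in the block count */
        out[b] = acc;
    }
    return blocks;
}
```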

If you feel a post has answered your question, please click "Accept as Solution".

When it's stuck, to find out where it is, attach a debugger and debug without downloading the code or resetting the device. You can access these options within the debugger configuration.


I can't debug it to find where is it because when using the debugger it runs.

Debug without (re)downloading the program.

* First, download the program by any means (debugger or STM32CubeProgrammer).

* Next, configure the debugger to load symbols only, without writing to flash. 

* Power cycle the board.

* Start debugging (remember that the code is already flashed).

* Check that you stop at main and that your itcmram code has been copied properly (disassemble it, put a breakpoint there).

* Run, see where it gets stuck.


You are right. My problem is that I don't want all the samples, only some of them.

Regarding the DFSDM: can I use it to sum only some of the samples?

For anyone wondering: I had a bug in the startup file.

This code worked for me:

    // Above the C function:
    __attribute__((section(".itcmram")))

    // In the linker script file:
    _siITCMRAM = LOADADDR(.ITCMRAM);

    .ITCMRAM :
    {
      . = ALIGN(4);
      _sITCMRAM = .;       /* create a global symbol at itcmram start */
      *(.itcmram)          /* .itcmram sections */
      *(.itcmram*)         /* .itcmram* sections */
      . = ALIGN(4);
      _eITCMRAM = .;       /* define a global symbol at itcmram end */
    } >ITCMRAM AT> FLASH

    // In the startup file (after LoopFillZerobss):
      ldr r0, =_sITCMRAM
      ldr r1, =_eITCMRAM
      ldr r2, =_siITCMRAM
      movs r3, #0
      b LoopCopyITCMRAM

    CopyITCMRAM:
      ldr r4, [r2, r3]
      str r4, [r0, r3]
      adds r3, r3, #4

    LoopCopyITCMRAM:
      adds r4, r0, r3
      cmp r4, r1
      bcc CopyITCMRAM
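Comparing this with the first attempt, the linker script now ends with `>ITCMRAM AT> FLASH`. That gives the section a load address in flash, so `_siITCMRAM` points at a copy of the code that survives a power cycle. Without it, GNU ld most likely sets the load address equal to the run address in ITCM RAM, which would explain why the code only worked right after the debugger had written the ELF straight into RAM. The startup loop then copies the section word by word; the C sketch below does the same thing (copy_itcm is a hypothetical name, and on the target this copy must finish before any ITCM function is called).

```c
#include <stdint.h>

/* C equivalent of the CopyITCMRAM loop: copy 32-bit words from the load
 * address `src` (in flash) to the run range [dst, dst_end) in ITCM RAM.
 * Like the asm, it keeps copying while dst + offset < dst_end. */
void copy_itcm(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* On the target, the linker symbols would be passed in, e.g.:
 *   extern uint32_t _siITCMRAM, _sITCMRAM, _eITCMRAM;
 *   copy_itcm(&_siITCMRAM, &_sITCMRAM, &_eITCMRAM);
 */
```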
LCE
Principal II

Thanks for coming back with the working code!

How's the timing with the function in ITCM RAM?

yonatan
Associate III

It saved me roughly another 500 ns.

(BTW, I had also enabled the ICache, so the gain from ITCM is small.)