
Processing time with STM32H7A3 differs depending on the variable.

htsuj.1
Associate II

Hello.

I am developing with a STM32H7A3.

When I run my test code on the microcontroller, the processing time with a global variable is about twice as long as with a local variable.

Is this caused by the processing performance of the microcontroller, or is there a setting I am missing? Any information would be appreciated.

Below is the test code.

The processing time differs depending on whether "gs_test" is declared as a local variable or as a global variable.

0693W00000LwNR5QAN.png

8 REPLIES
Nikita91
Lead II

What is your optimization level?

Where is your global variable declared (in another source file)?

Inspect the generated code in the .list file (in the Debug folder).

Danish1
Lead II

Global variables are always at the same place in memory. So accessing them just involves the cpu accessing the memory from that address.

Local variables are stored in a "stack frame", in other words a certain offset from the stack-pointer. This is necessary because languages like C allow functions to be called from anywhere including themselves (recursion) and each time the local variable is unique* and altering it won't affect the value for other invocations of the function.

So the cpu has to add this offset to its current stack-pointer and only then does it have the memory-location of the variable.

This extra calculation can make local variables slower.

But.

Things can change and even go the other way when you turn on optimisation.

And local variables tend to have much reduced risk of programmer-errors (i.e. bugs).

So except for very specific code where you absolutely must hand-optimise cycles, stick with what is "logically" correct; generally this means local.

Also note that the STM32H7 family is very complicated, and access time can vary depending on which memory bank a variable lives in and whether it stays in the cache.

*Unless the local variable is declared as static. But that isn't what is normally wanted.

Hope this helps,

Danish
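To illustrate the point above about each invocation getting its own local variable, and the footnote about static, here is a minimal sketch in C. The function names are hypothetical and do not come from the original test code.

```c
/* Each call gets its own copy of `local` in its own stack frame. */
int recurse_with_local(int n)
{
    int local = n;
    if (n > 0) {
        recurse_with_local(n - 1);   /* inner calls use their own `local` */
    }
    return local;                    /* still n for this invocation */
}

/* A static local exists once, at a fixed address, shared by all calls. */
int recurse_with_static(int n)
{
    static int shared;
    shared = n;
    if (n > 0) {
        recurse_with_static(n - 1);  /* inner calls overwrite `shared` */
    }
    return shared;                   /* value left by the innermost call */
}
```

Calling recurse_with_local(3) returns 3, while recurse_with_static(3) returns 0, because the innermost call was the last one to write the single shared copy.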

Thanks, Nikita91.

To keep development and debugging simple, we build without optimization.

Also, global variables are declared in the same source file and are placed in AXI_SRAM.

Postscript:

The operating frequency of CPU, AXI, AHB, TIM is set to 280MHz and the operating frequency of APB is set to 140MHz.

Thanks, Danish.

I'm sorry that this question was difficult to understand.

I am asking why the processing time changes depending on whether "gs_test" is placed on the stack or in the AXI_SRAM area.

Following on from the previous answer: is access to AXI_SRAM slow even when the CPU and AXI operating frequencies match?

Where is the stack? If it is also in AXI_SRAM, the memory itself makes no difference.

Maybe the compiler better optimizes local variables (keeps them in register).

Often, for a global variable, the code first loads the address of the variable and then accesses that address, so one access to the variable costs two memory accesses.

Again: compare the assembly code generated in both cases. It's the best way to find out what's going on.

TDK
Guru

The stack (and therefore local variables) is typically in DTCMRAM, which is faster than AXI_SRAM.

Your linker file will show where these things are.
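Besides reading the linker file, one way to check at run time where a variable ended up is to compare its address against the memory map. The sketch below is only an illustration: the helper name region_of is hypothetical, and the base addresses and sizes are assumptions that should be checked against your own linker script and the reference manual.

```c
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    size;
} mem_region_t;

/* ASSUMED ranges; verify them against your linker script. */
static const mem_region_t k_regions[] = {
    { "DTCM",     0x20000000u,  128u * 1024u },
    { "AXI_SRAM", 0x24000000u, 1024u * 1024u },
};

/* Return the name of the region containing addr, or "other". */
const char *region_of(uint32_t addr)
{
    for (unsigned i = 0; i < sizeof k_regions / sizeof k_regions[0]; i++) {
        /* Unsigned subtraction wraps when addr < base, so one compare suffices. */
        if (addr - k_regions[i].base < k_regions[i].size) {
            return k_regions[i].name;
        }
    }
    return "other";
}
```

On the target, region_of((uint32_t)(uintptr_t)&gs_test) can be evaluated once for the global and once for the local version of the variable, and the two results compared in the debugger.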


Thanks, TDK.

When I placed the global variable "gs_test" in the DTCMRAM area and measured the processing time, there was no change in the processing time compared to the AXI_SRAM area.

Compare assembly output:

To access a global variable, its address (a 32-bit value) must first be loaded into a register, and then the variable's value is read indirectly through that register. That makes two instructions, and the second depends on the first (i.e. the address load has to complete before the data load can start). There may be cases where the variable's address is already in a register from a previous access, so the penalty is not observed in every case.

For a local variable (at least when the offset from the current top of stack to the variable is small), a single SP-relative load instruction does the whole job, so this will generally be faster.

But in both cases there is always the possibility that the variable is held in a register (as already mentioned above), although this is certainly less likely for a global variable. The net effect is hard to predict and depends on optimization level, compiler, surrounding code and so on, so without looking at the assembly output this is all rather speculative.
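As a rough sketch of the two cases compared above, the following C code uses a global and a local counter in otherwise identical loops. The original test code was posted as an image and is not reproduced here, so the names and the loop body are assumptions. The comments describe what an unoptimized Thumb-2 build typically generates, not the output of any particular compiler run.

```c
#include <stdint.h>

#define N_ITER 1000u

/* Hypothetical stand-in for the global "gs_test"; volatile forces a
 * real memory access on every use, whatever the optimization level. */
volatile uint32_t gs_test_global;

uint32_t sum_with_global(void)
{
    gs_test_global = 0;
    for (uint32_t i = 0; i < N_ITER; i++) {
        /* Typically: load the variable's address from a literal pool,
         * then load, add and store through that address. */
        gs_test_global += i;
    }
    return gs_test_global;
}

uint32_t sum_with_local(void)
{
    volatile uint32_t gs_test_local = 0;
    for (uint32_t i = 0; i < N_ITER; i++) {
        /* Typically: one load and one store at a fixed offset from the
         * stack pointer or frame pointer, with no address load. */
        gs_test_local += i;
    }
    return gs_test_local;
}
```

Both functions compute the same result, so any timing difference between them comes from how the variable is addressed and where it is placed. Compiling both and comparing the two loops in the .list file shows whether the extra address load is actually present in a given build.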