
How does the stack work?


Hi there,

As I remember it, the stack grows downward from the top, which means an earlier local variable should be located at a higher address.

I have the following code in the main() function in the ARM-MDK IDE:



uint8_t v1;
uint8_t v2;
uint8_t v3;
LOG_DBG("&v1[%p] &v2[%p] &v3[%p]\r\n", &v1, &v2, &v3);



When I set the optimization to "Level 0(-O0)", the output is:



 &v1[20000BEC] &v2[20000BE8] &v3[20000BE4]



But when I set the optimization to "Level 3(-O3)", the output changes to:



 &v1[20000BE4] &v2[20000BE8] &v3[20000BEC]



I have two questions:

1. I defined the variables as uint8_t, so each should occupy only one byte. Why are the addresses 4 bytes apart?

2. Why does the direction in which the stack addresses increase differ between optimization levels?


Lead II

The presence and ordering of a procedure's local data on the stack do not depend on the order of the declarations. The compiler may arrange the data in any way it finds reasonable, place a variable in a register, or remove it completely if it is not needed. The same is true of data declared outside of functions.

The compiler is, however, required to preserve the exact order of the fields/members in a structure (and also to guarantee the proper alignment of every field, as required by the calling convention). The order of a function's arguments is preserved as well, but on the ARM architecture up to 4 arguments may be passed in registers, so these are not pushed onto the stack.
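To see the structure guarantee in practice, here is a minimal sketch using offsetof. The struct and function names are hypothetical, chosen only for illustration, and the exact offsets and padding will depend on your compiler and target. Only the ordering is guaranteed by the language.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct, for illustration only. */
struct sample {
    uint8_t  first;
    uint32_t second;
    uint8_t  third;
};

/* The first member of a struct always sits at offset 0. */
static_assert(offsetof(struct sample, first) == 0,
              "first member is at offset 0");

/* Print each member's offset and report whether the offsets increase
 * in declaration order. Returns 1 if they do, 0 otherwise. */
int members_in_declaration_order(void)
{
    size_t a = offsetof(struct sample, first);
    size_t b = offsetof(struct sample, second);
    size_t c = offsetof(struct sample, third);

    /* Padding between members is chosen by the compiler, so the
     * printed numbers may differ from one target to another. */
    printf("first@%lu second@%lu third@%lu sizeof=%lu\n",
           (unsigned long)a, (unsigned long)b, (unsigned long)c,
           (unsigned long)sizeof(struct sample));

    return (a < b) && (b < c);
}
```

If you need three bytes in a known order, putting them in a struct (or an array) gives you that guarantee. Separate local variables do not.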


Thanks for your reply.

Now I understand the order of the addresses, but what about the spacing? Why does the MCU allocate 4 bytes for a uint8_t variable? Doesn't that waste a lot of stack space?

BTW, I also tried the code on onlineGDB, and the output is:



&v1[0x7ffee6e738b5] &v2[0x7ffee6e738b6] &v3[0x7ffee6e738b7]



Here the addresses are allocated byte by byte and are not aligned to 4.

Does that mean all of this is decided by the compiler, NOT by the C language?

Principal III

About "uint8_t, it should occupy only one byte, why are there 4 bytes?":

The CPU is a 32-bit machine, so its "natural" size for numbers is 4 bytes.

Loading a byte takes the same time on the bus as loading a word, provided the byte sits at a 4-byte-aligned address. If it does not, the "natural" 32-bit memory value is loaded and then shifted right to bring the byte into the position where a single byte value is expected.

So it depends on the optimizer setting and on what is desired: more speed or more compact code?

For best speed, every variable is put at a 32-bit-aligned address, with some unused padding bytes to get it there.

The load then takes 1 clock, with no need to shift the byte into the right position.

If memory space is tight, bytes are placed with no unused space between them, but loading and storing them may then be slower.

So with the optimizer setting you decide which is preferred: speed or smaller code.
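The size of the type and the spacing between separate variables can be checked independently. The sketch below (function names are hypothetical) shows that a uint8_t itself is one byte and that array elements are packed, while the gap between two separately declared locals is whatever the compiler chose. Treat the second function as an observation tool only, since the language gives no guarantee about its result beyond the two objects being distinct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wherever uint8_t exists, it occupies exactly one byte. */
static_assert(sizeof(uint8_t) == 1, "uint8_t occupies one byte");

/* Distance in bytes between consecutive elements of a uint8_t array.
 * Array elements are contiguous, so this is always 1. */
size_t byte_array_stride(void)
{
    uint8_t bytes[3] = {0, 0, 0};
    return (size_t)((unsigned char *)&bytes[1] - (unsigned char *)&bytes[0]);
}

/* Distance in bytes between two separately declared locals.
 * The result is compiler- and option-dependent (1, 4, or anything else);
 * the addresses are compared as integers to avoid subtracting pointers
 * to unrelated objects. */
size_t separate_locals_gap(void)
{
    volatile uint8_t v1 = 0;
    volatile uint8_t v2 = 0;
    uintptr_t a = (uintptr_t)&v1;
    uintptr_t b = (uintptr_t)&v2;
    return (size_t)((a > b) ? (a - b) : (b - a));
}
```

So the 4-byte spacing seen in the original output is a placement decision for separate variables, not a change in the size of uint8_t.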

If you feel a post has answered your question, please click "Accept as Solution".

@AScha.3 Yes, you are right.

What I still don't understand is that I tried both -O0 and -O3, and in both cases each variable is given 4 bytes. I expected the compiler to allocate 1 byte with -O0 and 4 bytes with -O3.

It may be more accurate to say that the -O levels optimize code, not data RAM usage. It is also worth noting that uint8_t is not a native type: it is an alias, and each compiler's headers may give it a different alignment. The native type is unsigned char. Try comparing with that.
Third, your code doesn't show where the variables are declared. They look global, not local. Global variables are stored from the start of RAM, whereas local variables may live only in registers (no RAM at all) or on the stack at the end of RAM.

Edit: I skipped over the part where you say the code is in the main function, so the variables are local. But variables declared in main are another special case ...
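The suggested comparison between uint8_t and unsigned char can be done at compile time. This is a minimal sketch assuming a C11 compiler (for static_assert, _Alignof and _Generic); the function name is hypothetical. On many toolchains uint8_t is simply a typedef of unsigned char, but that is an implementation choice, so the function reports what your headers actually do.

```c
#include <assert.h>
#include <stdint.h>

/* If uint8_t exists, it has the same size and alignment as unsigned char. */
static_assert(sizeof(uint8_t) == sizeof(unsigned char),
              "uint8_t and unsigned char have the same size");
static_assert(_Alignof(uint8_t) == _Alignof(unsigned char),
              "uint8_t and unsigned char have the same alignment");

/* Returns 1 if uint8_t is the very same type as unsigned char on this
 * toolchain, 0 if it is some other 8-bit unsigned type. */
int uint8_is_unsigned_char(void)
{
    return _Generic((uint8_t)0, unsigned char: 1, default: 0);
}
```

If both assertions compile, swapping uint8_t for unsigned char in the test code should not change the layout by itself.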

Well, what the optimizer will or won't do is complex. Take a look at which options are available and enabled when you choose just a "simple"-looking -O2 :

And about the RAM usage: as @MM..1 said, what gets optimized first is the code. For what it does with variables and their space, try to find and read the documentation for the compiler you use.

Also, because 32 bits is the "natural" size for a CPU with 32-bit registers, code generation might simply go this way: variables always get a 32-bit-aligned address, and only optimizing for minimum memory usage might change this.
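To check the two outputs quoted earlier in the thread, the arithmetic can be written down as a small sketch. The helper names are hypothetical, and the addresses are passed as plain integers (uint64_t, so the 64-bit onlineGDB values fit) rather than as pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of alignment, 0 otherwise.
 * alignment must be non-zero. */
int is_aligned(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0);
    return (addr % alignment) == 0;
}

/* Absolute distance in bytes between two addresses. */
uint64_t address_gap(uint64_t a, uint64_t b)
{
    return (a > b) ? (a - b) : (b - a);
}
```

With the values from the posts above, is_aligned(0x20000BEC, 4) is true and address_gap(0x20000BEC, 0x20000BE8) is 4 for the MDK build, while is_aligned(0x7ffee6e738b5, 4) is false and address_gap(0x7ffee6e738b5, 0x7ffee6e738b6) is 1 for the onlineGDB build.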
