Cortex-M0 Thumb alignment

francesco_gritti
Associate II

Hi everybody,

I am using an STM32F042G6 MCU for an automotive project, and since it doesn't have much flash (32 kB) I am trying to reduce code size as much as possible.

Looking at the linker file generated by STM32CubeMX, I see that instructions are aligned to 4 bytes. Since the M0 architecture uses the Thumb instruction set, which is 16 bits wide, I was wondering whether instructions can be aligned to 2 bytes, saving some flash. Since I am not very experienced in this area of MCUs, I would like to know whether this is correct or whether I am missing something. Part of the linker script is included below. I understand why .isr_vector is aligned to 4 bytes, but I don't quite see why .text is.

Thank you in advance!

 

 

SECTIONS
{
  /* The startup code into "FLASH" Rom type memory */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data into "FLASH" Rom type memory */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH

...

 

Accepted Solutions

Because of the literal pools interleaved with the code.

The immediate load instructions can handle many patterns, but some 32-bit values need to be loaded via PC-relative loads from a nearby literal pool, and those pool entries must be word aligned.

Shrinking the alignment is not going to save a significant amount of space. A CM0 faults on misaligned loads/stores.
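To make the literal-pool point concrete, here is a minimal C sketch. The function name load_magic and the constant are made up for illustration, and the assembly in the comment is only what a compiler typically produces for a Thumb-1 target, not guaranteed output; check your own disassembly.

```c
#include <stdint.h>

/* Hypothetical example: 0x12345678 does not fit the 8-bit immediate of a
 * Thumb-1 MOVS, so for a Cortex-M0 build the compiler typically emits
 * something along these lines:
 *
 *     ldr   r0, .Lpool      ; PC-relative load from the literal pool
 *     bx    lr
 *     .align 2              ; pad to a 4-byte boundary
 * .Lpool:
 *     .word 0x12345678      ; 32-bit pool entry placed next to the code
 *
 * LDR (literal) forms its address from the PC rounded down to a multiple
 * of 4 plus a word offset, so the pool entry has to be word aligned even
 * though the instructions themselves only need halfword alignment.
 */
uint32_t load_magic(void)
{
    return 0x12345678u;
}
```

Small constants such as 0 to 255 can be encoded directly in the instruction and need no pool entry at all.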


6 REPLIES

francesco_gritti
Associate II

Thank you so much @Tesla DeLorean, that makes perfect sense.

> Shrinking the alignment is not going to save significant amount of space.

because it's not true that

>> [I see that] instructions are aligned to 4 bytes.

Alignment in the linker script, as you've presented above, affects only the symbols and addresses aligned at those points; mostly it just adds padding between the sections.

The compiler imposes some (and not insignificant) alignment, too; but again, individual instructions are not aligned to anything more than 16 bits. Have a look at the map file and at the disassembly.
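As a rough illustration of how little is at stake, the arithmetic behind `. = ALIGN(4);` can be sketched in C. The helper names align_up and align_padding are invented for this example and the addresses are arbitrary. Each ALIGN(4) pads the location counter once, by at most 3 bytes, so the four statements in the script excerpt above could cost 12 bytes in the worst case.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of
 * two). This mirrors what ALIGN(n) does to the linker's location counter. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Number of padding bytes inserted when the location counter is aligned
 * at a single point in the script. */
static uint32_t align_padding(uint32_t addr, uint32_t align)
{
    return align_up(addr, align) - addr;
}
```

For example, align_padding(0x08000C02, 4) is 2 and align_padding(0x08000C02, 2) is 0, so switching that one statement to ALIGN(2) would save two bytes at that point and nothing elsewhere.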

JW

Looping and prefetch will work better with 4-byte alignment, as the internal design and pipelining are optimized that way. The 2-byte instructions/opcodes get you the space efficiency, but the CM0(+) throws a bunch of that potential away because it tends to need more instructions to do the same thing as a CM3/CM4 design.

Even in situations where you can use 2-byte alignment, reading a spanned 4-byte / 32-bit value takes twice as long.

None of the Cortex-M parts are tolerant of misaligned LDRD/STRD, a particular issue for pointers/structures using doubles or int64_t's.
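When a 32-bit value really does live at an address that may be misaligned (a packed protocol buffer, for instance), one common way to avoid a faulting word access in C is to go through memcpy. This is only a sketch; the helper names read_u32_unaligned and write_u32_unaligned are invented here.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address that may not be 4-byte aligned.
 * memcpy makes no alignment assumption about src, so the compiler is free
 * to use narrower accesses instead of a single word load. */
static uint32_t read_u32_unaligned(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Store a 32-bit value to an address that may not be 4-byte aligned. */
static void write_u32_unaligned(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}
```

Casting the byte pointer to uint32_t * and dereferencing it instead is what tends to produce the misaligned word access described above.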

 

What can make a substantial difference in code space is using subroutines for repetitive code sequences instead of massive runs of linear code, i.e. avoid very large scopes in subroutines or loops.

Efficient libraries, one function per object, and effective dead code elimination can make a substantial difference.

Float libraries on CM0(+) can be a big consumer of code resources; watch especially printf() / scanf().


Sure. For simple debugging purposes I have implemented my own printf() function. I have also seen that if you use floats in the code, it is important to tag every literal with "f"; otherwise the compiler may treat the literals as double and pull in code for converting between float and double and for performing double calculations, which on a 32 kB MCU takes up a good amount of memory.
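A small C sketch of the literal-suffix effect follows. The function names are made up for illustration. On a part without an FPU such as the Cortex-M0, the double version is the one that would be expected to drag in the double-precision runtime helpers (for ARM EABI toolchains, routines such as __aeabi_f2d, __aeabi_dmul and __aeabi_d2f).

```c
/* Hypothetical scaling helpers showing how the literal suffix changes the
 * arithmetic type of the whole expression. */

/* 1.5 has type double: x is converted to double, the multiply is done in
 * double, and the result is converted back to float on return. */
static float scale_double_literal(float x)
{
    return x * 1.5;
}

/* 1.5f has type float: the multiply stays in single precision. */
static float scale_float_literal(float x)
{
    return x * 1.5f;
}
```

Both functions return the same value for simple inputs, so the difference only shows up in code size and speed. GCC's -Wdouble-promotion warning can help catch the unsuffixed case.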

Since I am implementing a custom bootloader, I was playing around with the linker script and was a bit confused about why it was necessary to align to 4 bytes, but you both made good arguments.

Thank you!

ST has a free license for KEIL on CM0(+) parts.

Generally it produces considerably smaller code from the compiler/linker, with support for dead code elimination, multi-pass working set reduction, and compression of statics.

Worth evaluating against GNU/GCC. At least compare and contrast.
