
How does optimization level affect performance and memory usage?

Jordanstanley
Visitor

I’d like to understand the practical trade-offs — for example, how each level influences execution speed, flash usage, and RAM consumption. Also, are there any best practices or common pitfalls when choosing an optimization level for STM32 projects?

2 REPLIES
Andrew Neil
Super User

This is not really specific to STM32 - there are plenty of general references on this.

https://en.wikipedia.org/wiki/Optimizing_compiler

 

It will, of course, depend on the particular compiler.

For GCC, see:

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

You will see that some options target code size, and others target execution speed.

 

A big problem with optimisation is that it makes debugging harder - the higher the optimisation level, the less the generated code resembles your source, so single-stepping and watching variables become unreliable.

Probably the biggest pitfall is that optimisation is likely to show up flaws in your code - you'll find loads of forum posts along the lines of "my code was working fine, then I turned on optimisation, and it stopped working".

A common example is where you haven't put 'volatile' on variables that should have it - typically flags shared with an interrupt handler. You may get away with that with no optimisation, but not with optimisation, because the optimiser is then free to cache the value in a register.
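
A minimal sketch of that failure mode (bare-metal style; the ISR name is just an example):

#include <stdint.h>

/* Flag written from an interrupt handler and polled from main().
   Without 'volatile' the optimiser may read it once, keep it in a
   register, and spin forever at -O1 and above. */
static volatile uint8_t data_ready = 0;

void EXTI0_IRQHandler(void)      /* example ISR name */
{
    data_ready = 1;
}

int main(void)
{
    while (!data_ready) {
        /* wait - the flag is re-read each iteration because it is volatile */
    }
    /* ...process the data... */
    return 0;
}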

 

A complex system that works is invariably found to have evolved from a simple system that worked.
A complex system designed from scratch never works and cannot be patched up to make it work.
markkavin
Associate II

Optimization levels are essentially trade-offs between code size, execution speed, and debugging convenience:

  • -O0: No optimization. Code is large and slow, but debugging is straightforward (variables and line steps behave as expected).

  • -O1/-O2: Progressive optimizations that reduce flash size and improve performance. -O2 is usually a good balance for release builds.

  • -Os: Optimizes for size, which often reduces flash usage significantly while still running fast. Good for memory-constrained MCUs.

  • -O3: Aggressive speed optimization. Can increase flash size and sometimes even reduce performance on Cortex-M (due to pipeline effects or larger code footprint). Rarely needed in embedded unless you have a very compute-heavy routine.
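
For context, the level is just a compiler flag. With the GNU Arm toolchain a compile line might look like this (the target flags and file names are only placeholders; in STM32CubeIDE the equivalent setting is in the MCU GCC Compiler > Optimization project options):

arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -Os -g -c main.c -o main.o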

RAM usage is mostly unaffected, except that inlining or loop unrolling at higher levels may increase stack use in certain functions.

Best practices:

  • Use -O0 during development/debugging.

  • For production, test with -O2 or -Os to balance speed and size.

  • Profile critical routines separately — you can apply function-specific attributes (e.g. __attribute__((optimize("O3")))) if only a few parts need max performance; see the sketch after this list.

  • Always regression-test after changing optimization — higher levels can expose hidden issues with uninitialized variables, volatile misuse, or timing assumptions.
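
To illustrate the per-function attribute mentioned above, a minimal sketch (the function itself is just an example; the point is the GCC-specific attribute, while the rest of the file can still be built with -Os):

#include <stdint.h>

/* Hot loop compiled at -O3 even when the project default is -Os.
   GCC-specific; other compilers have their own pragmas. */
__attribute__((optimize("O3")))
void scale_samples(int16_t *buf, uint32_t len, int16_t gain)
{
    for (uint32_t i = 0; i < len; i++) {
        buf[i] = (int16_t)((buf[i] * gain) >> 8);
    }
}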

Common pitfall: relying on code execution order in ways the compiler may change. Always use volatile for hardware registers and shared variables with ISRs to ensure correctness across optimization levels.
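
As a small example of the hardware-register case (the address and bit are made up; the CMSIS device headers already declare peripheral registers volatile via __IO, so this mainly matters for hand-rolled pointers):

#include <stdint.h>

/* Must be volatile: otherwise the compiler may read the register once
   and turn the polling loop into an infinite loop at higher -O levels. */
#define STATUS_REG  (*(volatile uint32_t *)0x40004400U)  /* placeholder address */
#define READY_FLAG  (1U << 7)                            /* placeholder bit */

void wait_until_ready(void)
{
    while ((STATUS_REG & READY_FLAG) == 0U) {
        /* each iteration performs a fresh read of the register */
    }
}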