2025-09-26 9:51 AM - last edited on 2025-09-26 10:12 AM by Andrew Neil
I’d like to understand the practical trade-offs — for example, how each level influences execution speed, flash usage, and RAM consumption. Also, are there any best practices or common pitfalls when choosing an optimization level for STM32 projects?
2025-09-26 10:12 AM
This is not really specific to STM32 - there are plenty of general references on this.
https://en.wikipedia.org/wiki/Optimizing_compiler
It will, of course, depend on the particular compiler.
For GCC, see:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
You will see that some options target code size, and others target execution speed.
A big problem with optimisation is that it makes debugging harder: variables get optimised away, code gets reordered, and single-stepping no longer follows the source. The higher the optimisation level, the harder debugging gets.
Probably the biggest pitfall is that optimisation is likely to expose flaws in your code - you'll find loads of forum posts along the lines of "my code was working fine, then I turned on optimisation, and it wouldn't work any more".
A common example is where you haven't put 'volatile' on variables which should have it. You may get away with that with no optimisation, but it will break as soon as the optimiser starts caching values in registers.
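A minimal sketch of that failure mode (the handler and variable names here are just illustrative): an ISR sets a flag that the mainline code polls. Without 'volatile', the compiler is allowed to read the flag once, keep it in a register, and spin forever once optimisation is enabled.

```c
#include <stdbool.h>

/* Shared between an ISR and mainline code, so it must be volatile:
   the qualifier forces a fresh load from memory on every access. */
static volatile bool g_data_ready = false;

void EXTI0_IRQHandler(void)      /* illustrative interrupt handler */
{
    g_data_ready = true;
}

void wait_for_data(void)
{
    /* Without 'volatile' on g_data_ready, -O1 and above may compile
       this into an infinite loop that never re-reads the flag. */
    while (!g_data_ready) {
        /* busy-wait (or sleep with __WFI() until an interrupt) */
    }
    g_data_ready = false;        /* consume the event */
}
```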
2025-09-26 11:04 AM
Optimization levels are essentially trade-offs between code size, execution speed, and debugging convenience:
-O0: No optimization. Code is large and slow, but debugging is straightforward (variables stay live and single-stepping follows the source).
-O1/-O2: Progressive optimizations that reduce flash size and improve performance. -O2 is usually a good balance for release builds.
-Os: Optimizes for size, which often reduces flash usage significantly while still running fast. Good for memory-constrained MCUs.
-O3: Aggressive speed optimization. Can increase flash size and sometimes even reduce performance on Cortex-M (due to pipeline effects or larger code footprint). Rarely needed in embedded unless you have a very compute-heavy routine.
RAM usage is mostly unaffected, except that inlining or loop unrolling at higher levels may increase stack use in certain functions.
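As a quick way to quantify the flash/RAM side of these trade-offs, you can compile the same sources at different levels and compare section sizes. A sketch assuming the GNU Arm toolchain and a Cortex-M4 target (adjust -mcpu and file names to your project):

```sh
# Build one translation unit at two optimisation levels
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 -g -c main.c -o main_O2.o
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -Os -g -c main.c -o main_Os.o

# 'text' approximates flash use for code; 'data' + 'bss' is static RAM
arm-none-eabi-size main_O2.o main_Os.o
```

(Stack usage, as noted above, is not captured by this; for that you can look at GCC's -fstack-usage output or measure at run time.)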
Best practices:
Use -O0 during development/debugging (GCC's -Og is a good middle ground: it enables cheap optimizations while keeping the debugging experience usable).
For production, test with -O2 or -Os to balance speed and size.
Profile critical routines separately: you can apply function-specific attributes (e.g. __attribute__((optimize("O3")))) if only a few parts need max performance - see the sketch after this list.
Always regression-test after changing optimization — higher levels can expose hidden issues with uninitialized variables, volatile misuse, or timing assumptions.
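A sketch of that per-function approach (GCC-specific, and the filter routine below is just a placeholder for your own hot code): the project as a whole can be built with -Os while one compute-heavy function gets -O3.

```c
#include <stdint.h>
#include <stddef.h>

/* GCC-specific: compile only this function at -O3, regardless of the
   project-wide level. Note the attribute is not portable - Clang,
   for example, ignores it. */
__attribute__((optimize("O3")))
void smooth_samples(const int16_t *in, int16_t *out, size_t n)
{
    /* Placeholder compute-heavy loop; at -O3 the compiler is free to
       unroll it aggressively, trading code size for speed. */
    for (size_t i = 2; i < n; ++i) {
        out[i] = (int16_t)((in[i] + 2 * in[i - 1] + in[i - 2]) / 4);
    }
}
```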
Common pitfall: relying on a particular execution order in ways the compiler is free to change. Always use volatile for hardware registers and for variables shared with ISRs to ensure correctness across optimization levels; a minimal register example follows.
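For the hardware-register half of that advice, a minimal sketch (the register address and bit mask are invented for illustration; real code should use the vendor's CMSIS definitions instead of hand-written addresses):

```c
#include <stdint.h>

/* Hypothetical memory-mapped status register and ready bit. */
#define STATUS_REG   (*(volatile uint32_t *)0x40011000u)
#define TX_EMPTY     (1u << 7)

void wait_tx_empty(void)
{
    /* The volatile-qualified pointer forces a fresh read of the register
       on every iteration; without it the optimiser could hoist the load
       out of the loop and spin forever. */
    while ((STATUS_REG & TX_EMPTY) == 0u) {
        /* busy-wait */
    }
}
```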